[01:10:26] 10serviceops, 10Performance-Team: Migrate WMF production from PHP 7.4 to PHP 8.1 - https://phabricator.wikimedia.org/T319432 (10Jdforrester-WMF)
[01:12:11] 10serviceops, 10Performance-Team: Migrate WMF production from PHP 7.4 to PHP 8.1 - https://phabricator.wikimedia.org/T319432 (10Jdforrester-WMF)
[07:32:43] <_joe_> IMHO an operator is a system component and given it modifies how kubernetes "operates" it should be under the control of whoever runs the cluster, in general
[07:32:49] <_joe_> exceptions can be made case by case
[07:33:21] <_joe_> everyone: by noon I'll merge https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/837495 and the corresponding first commit of sextant
[07:33:29] <_joe_> unless someone objects
[07:49:51] 10serviceops, 10API Platform, 10SRE: Block non-browser requests that use generic user agent (UA) headers - https://phabricator.wikimedia.org/T319423 (10Joe) FWIW we're banning more generic UAs via dynamic requestctl rules; our rule of thumb is to start rate-limiting requests from a specific UA only when it s...
[08:26:33] Morning :)
[08:56:52] good morning
[09:22:16] hello folks
[09:33:21] <_joe_> elukey: good morning
[09:33:33] <_joe_> but I guess you're not here to have a casual chat :)
[09:36:08] nono it was just a simple "hello"
[09:36:14] I don't have anything weird to raise :D
[09:36:19] (yet)
[09:36:31] hey o/
[09:36:37] today I am going to test istio 1.15.3 on minikube so I may report some horrors :D
[09:42:01] <_joe_> indeed
[09:42:04] <_joe_> brave soul
[10:03:44] maybe it is trivial, but does anybody know where we store the config for https://docker-registry.wikimedia.org/pause ?
[10:04:22] looking into gerrit/phab/etc.. didn't lead to meaningful results, only some old posts from Yuvi working on it for cloud
[10:05:26] <_joe_> elukey: I think akosiaris imported it directly
[10:05:44] <_joe_> it's a container that's built from scratch, but anyways alex will remember
[10:05:51] <_joe_> elukey: why are you asking?
[10:07:33] _joe_ me and Janis are wondering if we need any update to it for 1.23
[10:08:41] <_joe_> elukey: go on and import it :)
[10:08:47] I don't think we totally require it. But if we can't figure out where it came from it might be a good reason to find out now :)
[10:09:08] <_joe_> jayme: I'm pretty sure we imported it from gcr.io
[10:09:46] <_joe_> gcr.io/google_containers/pause-amd64 is the current name IIRC
[10:09:54] _joe_ we could also explicitly add it to production-images if it is not a big one
[10:10:14] <_joe_> elukey: it's not debian based, it's FROM scratch IIRC
[10:11:16] sure sure I mean we can explicitly build the (I guess) Go binaries plus add a docker file etc..
[10:11:21] like we do with the rest
[10:12:40] <_joe_> https://hub.docker.com/layers/jenner/k8s.gcr.io-pause/3.1/images/sha256-fcaff905397ba63fd376d0c3019f1f1cb6e7506131389edbcb3d22719f1ae54d?context=explore
[10:12:52] <_joe_> ok
[10:13:12] elukey: I just rebased https://gerrit.wikimedia.org/r/c/operations/puppet/+/855012/2 - if you have a minute. Should be the usual "kind of noop"
[10:15:35] 10serviceops, 10Release Pipeline (Blubber), 10Release-Engineering-Team (Priority Backlog 📥): Buildkit erroring with "cannot reuse body, request must be retried" upon multi-platform push - https://phabricator.wikimedia.org/T322453 (10JMeybohm) >>! In T322453#8381008, @dduvall wrote: > @JMeybohm can you provid...
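(A minimal sketch of what "importing the pause image directly" into the internal registry could look like; the image name and destination path come from the chat above, but the tag and the exact procedure are assumptions, not taken from the log:)

    # Pull the upstream from-scratch pause image (name per the chat; tag assumed from the 3.1 link above)
    docker pull gcr.io/google_containers/pause-amd64:3.1
    # Retag it for the internal registry (destination path is hypothetical)
    docker tag gcr.io/google_containers/pause-amd64:3.1 docker-registry.wikimedia.org/pause:3.1
    # Push it to the internal registry (requires push credentials)
    docker push docker-registry.wikimedia.org/pause:3.1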
[10:18:54] 10serviceops, 10Prod-Kubernetes, 10Kubernetes: Support multiple kubernetes versions with puppet - https://phabricator.wikimedia.org/T278329 (10JMeybohm) 05Open→03Resolved a:03JMeybohm We now have the exact major.minor version as an enum type in puppet and with dedicated apt components
[10:18:59] 10serviceops, 10Foundational Technology Requests, 10Prod-Kubernetes, 10Shared-Data-Infrastructure, and 2 others: Update Kubernetes clusters to v1.23 - https://phabricator.wikimedia.org/T307943 (10JMeybohm)
[10:21:04] 10serviceops, 10Machine-Learning-Team: Fix calico, cfssl-issuer and knative-serving Helm dependencies - https://phabricator.wikimedia.org/T303279 (10JMeybohm) Fixed for cfssl-issuer in chart version 0.3.0
[10:22:01] 10serviceops, 10CFSSL-PKI, 10Infrastructure-Foundations, 10Prod-Kubernetes, and 2 others: Update cfssl-issuer to cert-manager 1.8.x - https://phabricator.wikimedia.org/T310486 (10JMeybohm) 05Open→03Resolved
[10:22:06] 10serviceops, 10Foundational Technology Requests, 10Prod-Kubernetes, 10Shared-Data-Infrastructure, and 2 others: Update Kubernetes clusters to v1.23 - https://phabricator.wikimedia.org/T307943 (10JMeybohm)
[10:22:48] 10serviceops, 10Prod-Kubernetes, 10Kubernetes: Define priorityClassName for istio and cert-manager deployments - https://phabricator.wikimedia.org/T310618 (10JMeybohm) p:05Triage→03Low
[10:23:42] I honestly don't remember how the pause container was added
[10:25:34] akosiaris: I'll try to import the pause image to production-images if everybody agrees, so we have it there
[10:27:15] <_joe_> elukey: please do
[10:27:43] <_joe_> I'm not sure where the go sources for pause are available
[10:29:07] super I'll open a task after the Istio fun
[10:29:55] jayme: +1ed, nice! Are you going to deprecate the calico-future component as well?
[10:30:36] oh, yes yes. Let's stick with the same versioning system for everything
[10:34:54] <_joe_> elukey: the pause code is C, not go. https://github.com/kubernetes/kubernetes/tree/master/build/pause/linux
[10:37:11] ahh okok
[10:37:16] https://github.com/kubernetes/kubernetes/blob/master/build/pause/linux/pause.c#L67 - this is great
[10:37:24] hmmm, it could be a result of the building process then. I probably built the debian packages back then (it anyway requires docker) and imported it
[10:48:04] <_joe_> akosiaris: what is?
[10:49:51] the pause container ima ge
[10:49:53] image*
[10:50:15] you probably end up with it in the docker image repo after doing the kubernetes build process
[11:23:09] jayme: Do you know if it's possible to reference a service's name in one of its values.yaml files ?
[11:25:05] Example would be https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/855535/1/helmfile.d/services/mw-web/values.yaml#5 if we could move that to https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/855535/1/helmfile.d/services/_mediawiki-common_/global.yaml and do something like servergroup: kube-${servicename}
[11:39:07] _joe_: question, the patch you are about to merge, will it require changes to existing charts?
[11:39:18] I am trying to coordinate some future work, that is all
[11:40:15] <_joe_> effie: no, not immediately
[11:40:31] <_joe_> but new stuff should start using it i guess
[11:41:01] ok next question, if we have a new chart in the works, do we recommend they start from scratch and avoid any future edits ?
[11:41:16] (after the patch is merged)
[11:41:50] <_joe_> depends on where you are in the process
[11:41:57] very very early
[11:42:02] <_joe_> if you just ran scaffolding, probably it's better to re-run it
[11:42:15] lovely, thank you
[12:01:32] claime: I guess that depends on what you mean by $servicename
[12:02:49] <_joe_> uhm I have a BIG doubt. How do I tell helmfile to only deploy to the canaries?
[12:04:06] <_joe_> I thought --state-values-set 'releases=["canary"]' would work
[12:05:28] --selector name=canary
[12:06:09] <_joe_> sigh
[12:06:11] <_joe_> yes
[12:06:13] <_joe_> thanks
[12:06:57] jayme: yeah, currently the closest is Release.Namespace. I can push it through a set: in the helmfile, but apparently can't use it in the yaml, even using yaml.gotmpl (or at least it didn't work when I tried)
[12:08:08] you can't use it in the values.yaml but you could set php.servergroup directly in helmfile.yaml I guess
[12:08:25] Yes
[12:09:04] Apparently there's some https://helmfile.readthedocs.io/en/latest/#values-files-templates but I haven't managed to make it work yet
[12:09:22] Probably a version issue, or just me not understanding the doc well
[12:09:38] you don't need the values.yaml at all IIUC
[12:12:15] if I'm not mistaken you can do something like:
[12:14:30] releases:
[12:14:32] - name: foo
[12:14:33] set:
[12:14:34] - name: php.servergroup
[12:14:36] value: {{ .Release.Namespace }}
[12:15:37] Yes
[13:26:17] 10serviceops, 10Prod-Kubernetes, 10Kubernetes: Switch to cgroup v2 and systemd as cgroup driver for docker and kubelet - https://phabricator.wikimedia.org/T313473 (10JMeybohm) a:03JMeybohm
[13:26:48] 10serviceops, 10Prod-Kubernetes, 10Kubernetes, 10Patch-For-Review: Reserve resources for system daemons on kubernetes nodes - https://phabricator.wikimedia.org/T277876 (10JMeybohm)
[13:26:52] 10serviceops, 10Foundational Technology Requests, 10Prod-Kubernetes, 10Shared-Data-Infrastructure, and 2 others: Update Kubernetes clusters to v1.23 - https://phabricator.wikimedia.org/T307943 (10JMeybohm)
[13:55:02] 10serviceops, 10Prod-Kubernetes, 10Kubernetes: Reserve resources for system daemons on kubernetes nodes - https://phabricator.wikimedia.org/T277876 (10JMeybohm) This might be helpful to calculate initial values: https://cloud.google.com/kubernetes-engine/docs/concepts/cluster-architecture#memory_cpu
[13:57:13] jayme: o/ qq - is there a quick way to generate the Chart.lock file or is it manual? (I'd like to fix the knative chart so we can close the task)
[14:01:17] elukey: helm dependency update
[14:01:59] ah so I fix the Chart config and run it
[14:06:38] yep. And maybe you will run into issues with CI. That's because I committed the chart.tgz as well
[14:08:09] so I ran helm3 package charts/knative-serving-crds originally to get the .tgz, but now if I run helm3 dependency update charts/knative-serving I get a failure
[14:08:19] it tries to look for charts/knative-serving/charts/knative-serving-crds
[14:10:33] ah, yeah...I remember. I'm in a meeting currently, can I come back to you in ~30m?
[14:10:49] sure sure
[14:11:48] the tgz you get from running helm dependency update as well, if the dependency is not a local URI
[14:19:51] ack yes
[14:27:30] elukey: do you have a CR already for this?
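(A minimal sketch of the canary-only deploy discussed above, using the --selector tip from the chat; the environment name and the invocation context are assumptions:)

    # Only act on the release named "canary"; "staging" is just an example environment
    helmfile -e staging --selector name=canary diff
    helmfile -e staging --selector name=canary apply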
[14:28:05] as first step I'd remove the "repository:" field for the dependency in knative-serving and then run "helm dependency update"
[14:28:37] jayme: yeah I did that but nothing has been really generated (I ran it from the main dir of the repo though)
[14:29:09] not sure if I have to initialize a helm3 repo in some way, never done it locally
[14:31:35] ah, yeah..I now remember 🤦
[14:32:57] kind of...
[14:33:35] IIRC "repository" needs to be specified or else helm just tries to look the dependency up in the ./charts/ directory
[14:34:24] but you can't specify "file://" because we block that in CI
[14:36:17] (will try in a bit)
[14:41:23] so "repository: https://helm-charts.wikimedia.org/stable" is the way to go I think and that should also pass CI
[14:42:08] if you're up to it you could just commit Chart.yaml and Chart.lock and we merge https://gerrit.wikimedia.org/r/c/operations/docker-images/docker-report/+/826859 to see if it DTRT
[15:08:40] <_joe_> jayme: do we have a dashboard showing the available/reserved memory/cpu in k8s?
[15:17:58] <_joe_> ok everyone: I merged the new scaffolding structure; I'll post soon enough(TM) a rake task to help with basic conversion of our charts
[15:23:56] hello, for https://gerrit.wikimedia.org/r/c/operations/puppet/+/855059, I just need to restart keyholder service, is that correct?
[15:24:06] just granting a new group access to a key
[15:25:14] <_joe_> 301 #sre
[15:25:16] <_joe_> :P
[15:25:29] <_joe_> no I think you also need to rearm keyholder
[15:33:07] jayme: back sorry, you are saying "repository: https://etc.." in the Chart.yaml? And then helm update dep ?
[15:42:16] yea
[15:43:57] _joe_: do you mean in terms of requests from pods vs. usable?
[15:44:09] yeah
[15:44:10] <_joe_> jayme: yes
[15:45:11] no, I don't think so
[15:49:59] I'm not even sure that allocatable resources are exposed currently
[15:50:58] probably something that could be covered with kube-state-metrics
[16:54:38] <_joe_> jayme: kubectl describe node gives you that information
[17:02:54] 10serviceops, 10Prod-Kubernetes, 10Kubernetes, 10Patch-For-Review: Import istio 1.1x (k8s 1.23 dependency) - https://phabricator.wikimedia.org/T322193 (10elukey) I was able to deploy main/ml istio configs on minikube today, and both worked after some tweaks: * I had to apply the following diff to avoid er...
[17:03:09] preliminary tests with istio 1.15.3 look good :)
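(A minimal sketch of the Chart.lock workflow agreed above, assuming the knative-serving dependency in its Chart.yaml has been pointed at the public chart repository; the dependency version shown is a placeholder, not taken from the log:)

    # Hypothetical Chart.yaml excerpt for charts/knative-serving:
    #   dependencies:
    #     - name: knative-serving-crds
    #       version: "x.y.z"                                  # placeholder
    #       repository: https://helm-charts.wikimedia.org/stable
    # Regenerate Chart.lock and fetch the dependency .tgz into the chart's charts/ directory:
    helm3 dependency update charts/knative-serving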