[00:00:34] mutante: have a good vacation!
[00:08:01] brennen: thank you:)) I will actually sign out from IRC in a while. But you can find my number on office wiki on the contact page if needed! cheers
[00:08:19] i will endeavor to not do that. :)
[00:08:31] ;)
[01:40:15] serviceops, SRE, Traffic, envoy, Patch-For-Review: Upgrade Envoy to supported version - https://phabricator.wikimedia.org/T300324 (RLazarus)
[01:46:55] serviceops, SRE, Traffic, envoy, Patch-For-Review: Upgrade Envoy to supported version - https://phabricator.wikimedia.org/T300324 (RLazarus) 1.15.4 is still running in a few places on k8s -- after bumping the default version, I rolled out all services where that was the only diff. Some servic...
[01:49:07] serviceops, SRE, Traffic, envoy, Patch-For-Review: Upgrade Envoy to supported version - https://phabricator.wikimedia.org/T300324 (RLazarus)
[07:23:14] good morning folks
[07:23:23] opened the super controversial https://gerrit.wikimedia.org/r/c/operations/docker-images/production-images/+/767924 for istio's install-cni
[07:23:32] so we can discuss it and see what's best
[07:23:51] (I'll keep researching today for alternatives)
[08:00:28] serviceops, Product-Infrastructure-Team-Backlog, SRE, Maps (Geoshapes), and 2 others: New Service Request geoshapes - https://phabricator.wikimedia.org/T274388 (MSantos) >>! In T274388#7751113, @akosiaris wrote: >>>! In T274388#7744335, @MSantos wrote: >>> Set up the traffic layer to send traffic...
[08:15:41] serviceops, decommission-hardware: decommission rdb100[56].eqiad.wmnet - https://phabricator.wikimedia.org/T273139 (akosiaris)
[08:15:47] serviceops, decommission-hardware: decommission rdb100[56].eqiad.wmnet - https://phabricator.wikimedia.org/T273139 (akosiaris) Stalled→Open
[08:24:36] serviceops, Product-Infrastructure-Team-Backlog, SRE, Maps (Geoshapes), and 2 others: New Service Request geoshapes - https://phabricator.wikimedia.org/T274388 (akosiaris) >>! In T274388#7752324, @MSantos wrote: >>>! In T274388#7751113, @akosiaris wrote: >>>>! In T274388#7744335, @MSantos wrote:...
[08:30:43] serviceops, decommission-hardware: decommission mw130[2-6].eqiad.wmnet - https://phabricator.wikimedia.org/T303027 (akosiaris)
[08:30:51] serviceops, decommission-hardware: decommission mw130[2-6].eqiad.wmnet - https://phabricator.wikimedia.org/T303027 (akosiaris) p:Triage→High
[09:12:43] serviceops, decommission-hardware: decommission rdb100[56].eqiad.wmnet - https://phabricator.wikimedia.org/T273139 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by akosiaris@cumin1001 for hosts: `rdb[1005-1006].eqiad.wmnet` - rdb1005.eqiad.wmnet (**PASS**) - Downtimed host on Icinga...
[11:30:59] Good morning. I'm looking for guidance on how best to progress with this deployment chart for datahub: https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/764375
[11:31:47] It runs locally on minikube, but I'm wondering whether it's ready for me to merge and begin working with helmfile and the staging cluster.
[12:49:40] btullis: I think this now needs a review from one of us (as in serviceops) as well as a service-deployment request (if that does not already exist): https://phabricator.wikimedia.org/project/profile/1305/
[12:50:55] I can take a look at all the info you added (thx) at https://phabricator.wikimedia.org/T301454 on Monday I guess
[13:01:16] serviceops, Data-Engineering, Event-Platform, Sustainability (Incident Followup): eventgate-* tls telemetry is disabled - https://phabricator.wikimedia.org/T303042 (JMeybohm)
[13:24:16] jayme: Thanks ever so much. I'll create a service deployment request now.
[13:25:15] btullis: nice. From skimming your last comments it seems that they contain a lot of information useful to the service request as well (like what is expected to get traffic etc.)
[13:31:24] serviceops, Prod-Kubernetes, Kubernetes, Machine-Learning-Team (Active Tasks), Patch-For-Review: Move kubernetes workers to bullseye and docker to overlayfs - https://phabricator.wikimedia.org/T300744 (JMeybohm)
[13:32:13] serviceops, Prod-Kubernetes, Patch-For-Review: setup/install kubernetes20[1(89)|2(012)] - https://phabricator.wikimedia.org/T302208 (JMeybohm)
[13:32:18] serviceops, Prod-Kubernetes, Kubernetes, Machine-Learning-Team (Active Tasks), Patch-For-Review: Move kubernetes workers to bullseye and docker to overlayfs - https://phabricator.wikimedia.org/T300744 (JMeybohm)
[13:33:35] serviceops, Prod-Kubernetes, Kubernetes: setup/install kubernetes10[18-22] - https://phabricator.wikimedia.org/T293728 (JMeybohm)
[13:33:39] serviceops, Prod-Kubernetes, Kubernetes, Machine-Learning-Team (Active Tasks), Patch-For-Review: Move kubernetes workers to bullseye and docker to overlayfs - https://phabricator.wikimedia.org/T300744 (JMeybohm)
[13:48:25] serviceops, Prod-Kubernetes, Kubernetes, Machine-Learning-Team (Active Tasks), Patch-For-Review: Move kubernetes workers to bullseye and docker to overlayfs - https://phabricator.wikimedia.org/T300744 (JMeybohm)
[13:49:43] serviceops, decommission-hardware, Patch-For-Review: decommission mw130[2-6].eqiad.wmnet - https://phabricator.wikimedia.org/T303027 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by akosiaris@cumin1001 for hosts: `mw[1302-1306].eqiad.wmnet` - mw1302.eqiad.wmnet (**PASS**) - Downtim...
[13:50:26] serviceops, decommission-hardware, Patch-For-Review: decommission mw130[2-6].eqiad.wmnet - https://phabricator.wikimedia.org/T303027 (akosiaris)
[13:52:24] serviceops, decommission-hardware: decommission kubernetes100[1-4] - https://phabricator.wikimedia.org/T303044 (JMeybohm) Open→Stalled
[13:52:26] serviceops, decommission-hardware: decommission kubernetes200[1-4] - https://phabricator.wikimedia.org/T303045 (JMeybohm) Open→Stalled
[13:53:10] serviceops, decommission-hardware: decommission kubernetes200[1-4] - https://phabricator.wikimedia.org/T303045 (JMeybohm)
[13:53:14] serviceops, Prod-Kubernetes, Patch-For-Review: setup/install kubernetes20[1(89)|2(012)] - https://phabricator.wikimedia.org/T302208 (JMeybohm)
[13:53:18] serviceops, Prod-Kubernetes, Kubernetes, Machine-Learning-Team (Active Tasks), Patch-For-Review: Move kubernetes workers to bullseye and docker to overlayfs - https://phabricator.wikimedia.org/T300744 (JMeybohm)
[13:53:28] serviceops, decommission-hardware: decommission kubernetes100[1-4] - https://phabricator.wikimedia.org/T303044 (JMeybohm)
[13:53:32] serviceops, Prod-Kubernetes, Kubernetes: setup/install kubernetes10[18-22] - https://phabricator.wikimedia.org/T293728 (JMeybohm)
[13:53:36] serviceops, Prod-Kubernetes, Kubernetes, Machine-Learning-Team (Active Tasks), Patch-For-Review: Move kubernetes workers to bullseye and docker to overlayfs - https://phabricator.wikimedia.org/T300744 (JMeybohm)
[13:54:39] serviceops, Prod-Kubernetes, Kubernetes, Machine-Learning-Team (Active Tasks), Patch-For-Review: Move kubernetes workers to bullseye and docker to overlayfs - https://phabricator.wikimedia.org/T300744 (JMeybohm)
[13:56:07] serviceops, Prod-Kubernetes, Patch-For-Review: setup/install kubernetes20[1(89)|2(012)] - https://phabricator.wikimedia.org/T302208 (JMeybohm) Open→Resolved a:JMeybohm parent and decom tasks created/updated, closing this
[13:56:11] serviceops, Prod-Kubernetes, Kubernetes, Machine-Learning-Team (Active Tasks), Patch-For-Review: Move kubernetes workers to bullseye and docker to overlayfs - https://phabricator.wikimedia.org/T300744 (JMeybohm)
[13:56:16] serviceops, DC-Ops, SRE, ops-codfw, Kubernetes: (Need By: TBD) rack/setup/install kubernetes20[19|2(012)] - https://phabricator.wikimedia.org/T299470 (JMeybohm)
[13:56:55] serviceops, Prod-Kubernetes, Kubernetes, Machine-Learning-Team (Active Tasks), Patch-For-Review: Move kubernetes workers to bullseye and docker to overlayfs - https://phabricator.wikimedia.org/T300744 (JMeybohm)
[15:00:01] serviceops, Continuous-Integration-Infrastructure, SRE, Patch-For-Review, Release-Engineering-Team (Seen): replace doc1001.eqiad.wmnet with a buster VM and create the codfw equivalent - https://phabricator.wikimedia.org/T247653 (Majavah)
[15:02:25] jayme: Many thanks again. I've created that service request: https://phabricator.wikimedia.org/T303049
[15:02:55] I'll add some comments on my own patch which might help with the context.
[15:11:01] serviceops, Data-Catalog, Data-Engineering, SRE, Service-deployment-requests: New Service Request: DataHub - https://phabricator.wikimedia.org/T303049 (JMeybohm)
[15:11:36] serviceops, Data-Catalog, Data-Engineering, SRE, Service-deployment-requests: New Service Request: DataHub - https://phabricator.wikimedia.org/T303049 (BTullis) How can I tell what the source IP address(es) of my services will be, as seen by the back-end data stores? Will these be predicatabl...
[15:18:42] serviceops, Data-Catalog, Data-Engineering, SRE, Service-deployment-requests: New Service Request: DataHub - https://phabricator.wikimedia.org/T303049 (BTullis) The diagram doesn't cover prometheus support, but it is included. I have added: `prometheus.io/port: 4318` and `prometheus.io/scrap...
[15:25:32] serviceops, Continuous-Integration-Infrastructure, SRE, Patch-For-Review, Release-Engineering-Team (Seen): replace doc1001.eqiad.wmnet with a buster VM and create the codfw equivalent - https://phabricator.wikimedia.org/T247653 (Krinkle)
[16:27:31] Has anyone got any ideas why my helm-linter isn't working in Jenkins? I get this:
[16:27:35] https://www.irccloud.com/pastebin/ePHJkQfP/
[16:28:13] Happens every time: https://integration.wikimedia.org/ci/job/helm-lint/6898/console from https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/764375
[16:37:00] btullis: is there any use of "match" inside the chart?
[16:37:27] it should give the same error if you run rake locally though
[16:37:38] rake run_locally['default']
[16:37:57] there may be something.match and something is ni
[16:37:59] *nil
[16:40:23] Yes, there are quite a few `match:` but most of these are in the common-templates, which are just symlinks to files that I haven't modified. There are some `matchLabels:` in the files that I've added though...
[16:41:46] btullis: one thing that I usually do is to use the helm3 binary (you can download it easily) like the following
[16:41:56] helm3 lint 'charts/name-of-the-chart'
[16:42:01] I have run that rake task and can verify that it fails locally too. Thanks.
[16:42:44] and helm3 template 'charts/name-of-the-chart'
[16:43:07] in theory the first should pass and the latter should give you the yaml config without emitting errors
[16:43:22] if they return any issue there is something not set properly
[16:43:43] (in theory the chart should be self sufficient and work with the base values.yaml shipped)
[16:43:52] btullis: --^
[16:44:00] Oh that `helm template` command is really useful. I didn't know about that. Yes, both of those pass.
[16:44:31] interesting
[16:46:17] I wonder if it's the `helmfile lint`?
[16:47:45] Looks promising.
[16:47:50] https://www.irccloud.com/pastebin/CY3zF3iE/
[16:48:36] yeah you still don't have any helmfile config, this is just the chart no?
[16:49:04] ah no there is also the helmfile config
[16:49:44] Yes, but only a basic one. I've added the production values (but no secrets) to it.
[16:51:12] btullis: it shouldn't be the helmfile config, but a quick way to check it is to remove the helmfile configs and run rake locally
[16:51:27] (it seems to be just helm lint itself with the chart)
[16:52:39] Agreed. I just tried removing helmfile.d/services/datahub and re-running. No difference.
[16:55:37] I tried to download the patch and I don't find a 'match' in the code, only matchLabels, so I am wondering if helm-lint likes the sub-charts
[17:02:03] btullis: there is a match inside the Rakefile, it is probably the culprit
[17:02:54] mmmm or not, weird, it looks like it's working
[17:06:29] elukey: Thanks. It makes sense that this should crash it though, I think. I've specified dependencies that don't have `repository` keys. They just exist as directories within charts/ and these are found automatically by helm.
[17:06:58] btullis: I was able to run the linter with --trace
[17:07:01] and it stops at
[17:07:01] NoMethodError: undefined method `match' for nil:NilClass
[17:07:01] /src/Rakefile:53:in `block (3 levels) in '
[17:07:29] so yes it is the repository
[17:11:31] Great, thanks. So I wonder if the right thing is to put a conditional statement into that function? Or do I *actually* need to use an http URL I wonder?
[17:14:32] btullis: look at charts/knative-serving/Chart.yaml, this is how I have done it
[17:14:40] and IIRC other charts do the same
[17:16:49] OK, cool. Thanks. I don't understand how that doesn't raise the exception though, because it doesn't match http://
[17:16:50] Anyway, will give it a go.
[17:23:06] Oh: https://helm.sh/docs/topics/charts/#chart-dependencies
[17:23:07] > These dependencies can be dynamically linked using the dependencies field in Chart.yaml or brought in to the charts/ directory and managed manually.
[17:23:32] Maybe I'm trying to do both here and I can remove them from the top level Chart.yaml
[17:24:19] btullis: good point yes
[17:27:05] btullis: I just realized that in the knative-serving chart I have "dependecies"
[17:27:12] * elukey cries in a corner
[17:27:29] ahahaha calico and cfssl-issuer as well
[17:28:02] :-) I lost 30 minutes over `prequisites` yesterday.
[17:30:17] wow what a long standing bug
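
Editor's note: the crash diagnosed above (a regex match called on a missing `repository` value) can be illustrated with a small sketch. This is not the Rakefile's code, which is Ruby and is not shown in the log. It is a Python analogue under stated assumptions: the regex `HTTP_REPO`, the function name `remote_repositories`, and the sample dependency entries are all hypothetical, and the dependencies are assumed to be already parsed from a Chart.yaml into dicts.

```python
import re

# Hypothetical pattern: the regex actually used in the Rakefile is not
# shown in the log, only that it is applied to each `repository` value.
HTTP_REPO = re.compile(r"^https?://")


def remote_repositories(dependencies):
    """Return the repository URLs that look like http(s) remotes.

    `dependencies` mirrors the list under `dependencies:` in a Chart.yaml,
    already parsed into dicts. Entries without a `repository` key (for
    example sub-charts kept as directories under charts/) are skipped
    rather than handed to the regex.
    """
    remotes = []
    for dep in dependencies:
        repo = dep.get("repository")  # None when the key is absent
        if repo is None:
            continue  # guard: matching against None would raise
        if HTTP_REPO.match(repo):
            remotes.append(repo)
    return remotes


# Illustrative data only; names, versions and URL are made up.
deps = [
    {"name": "vendored-subchart", "version": "0.1.0"},  # no repository key
    {"name": "remote-subchart", "version": "1.2.3",
     "repository": "https://example.org/charts"},
]

print(remote_repositories(deps))
```

Without the `None` check, the first entry would reach the regex with no value, which is the Python counterpart of the `undefined method 'match' for nil:NilClass` error in the trace. It may also explain why charts with a misspelled `dependecies` key never hit the error: a linter that looks up `dependencies` would find no list to iterate over.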