[06:37:56] serviceops, Data-Engineering-Planning, Event-Platform Value Stream (Sprint 10), Patch-For-Review, Service-deployment-requests: New Service Request mediawiki-page-content-change-enrichment - https://phabricator.wikimedia.org/T330507 (Joe) >>! In T330507#8735183, @Ottomata wrote: > @Joe we disc...
[07:29:22] serviceops, DBA, Data Pipelines, Data-Engineering-Planning, and 10 others: eqiad row B switches upgrade - https://phabricator.wikimedia.org/T330165 (Jelto)
[07:33:46] good morning :)
[07:33:54] going to dist-upgrade kafka-main2003
[07:35:19] <_joe_> 😱
[07:35:31] <_joe_> 🏃
[07:35:31] thanks for the support <3
[07:35:38] * _joe_ afk
[07:35:42] (3 nodes to go, after that I'll be done)
[07:36:21] serviceops, Data-Persistence, SRE, Datacenter-Switchover, and 2 others: March 2023 Datacenter Switchover - https://phabricator.wikimedia.org/T327920 (ayounsi)
[07:36:44] serviceops, DBA, Data Pipelines, Data-Engineering-Planning, and 10 others: eqiad row B switches upgrade - https://phabricator.wikimedia.org/T330165 (ayounsi) Open→Resolved a:ayounsi Thanks again everybody!
[07:58:33] serviceops, Prod-Kubernetes, Kubernetes, Patch-For-Review: Migrate charts away from deprecated topology annotations - https://phabricator.wikimedia.org/T325066 (JMeybohm)
[08:04:47] kafka-main2003 up with bullseye, cluster is recovering
[08:05:03] (two nodes to go, will do them later on)
[08:26:22] o/ I've been trying to add the envoy sidecar container (only for listeners, not to expose a public port) after a couple of patches I always forget something (i.e. last thing I missed is the configmaps to expose envoy config)
[08:27:25] so far I added the container with mesh.deployment.container, the volume with mesh.deployment.volume, the configmap with mesh.configuration.configmap, the egress rules with mesh.networkpolicy.egress
[08:28:43] and now wondering if I'm missing something, esp mesh.name.annotations or if that is included in some other ways
[08:33:56] <_joe_> dcausse: that seems all, the annotations are usually included in the deployment annotations to make sure it restarts on a configmap change
[08:34:30] _joe_: thanks!
[08:35:34] <_joe_> dcausse: I hope the modules are understandable as a structure, if you struggle to understand how to use something please let me know
[08:35:38] <_joe_> I want to improve the UX
[08:36:19] ah now I see mesh.name.annotations is included from base.meta.pod_annotations
[08:38:34] _joe_: I think they are understandable, but for someone like me still not very familiar with all the k8s concepts it's easy to miss something
[08:39:25] <_joe_> dcausse: yeah I would love to get to a point where you can run something like
[08:39:35] <_joe_> sextant add-module mesh charts/mychart
[08:39:47] <_joe_> and that will automatically add the right stuff at the right places
[08:40:00] <_joe_> but there's only so much we can do to mask complexity :)
[08:41:29] that would be nice indeed, but sound complex when I see all the ifs I had to add in the chart to support various scenario
[09:28:51] folks I created https://gerrit.wikimedia.org/r/c/operations/puppet/+/904062 to ease the process of working with Redis misc nodes
[09:28:59] if it is not a good idea I'll abandon :)
[09:29:24] I'll also create a task to update wikitech with the procedure to rollout a new Redis key
[09:29:39] with gotchas etc.., so in the future we'll be more ready
[09:37:10] serviceops: Update the Wikitech page of the Redis misc cluster with a sound procedure to rollout a new password - https://phabricator.wikimedia.org/T333432 (elukey)
[09:37:18] created also --^
[09:44:15] serviceops, DBA, Data Pipelines, Data-Engineering-Planning, and 10 others: eqiad row B switches upgrade - https://phabricator.wikimedia.org/T330165 (fgiunchedi)
[11:00:18] serviceops, MW-on-K8s, SRE, Traffic, and 3 others: Migrate internal traffic to k8s - https://phabricator.wikimedia.org/T333120 (Clement_Goubert) `mw-api-int` and `mw-api-int-ro` services now in production, we can proceed with creating the envoy listeners in https://gerrit.wikimedia.org/r/c/operat...
[11:04:48] serviceops, MW-on-K8s, SRE, Traffic, and 3 others: Migrate internal traffic to k8s - https://phabricator.wikimedia.org/T333120 (Clement_Goubert)
[12:27:08] Hey any objections to deploy latest version of restbase now ?
[12:39:33] serviceops, Foundational Technology Requests, Prod-Kubernetes, Shared-Data-Infrastructure, and 2 others: Post Kubernetes v1.23 cleanup - https://phabricator.wikimedia.org/T328291 (JMeybohm)
[12:39:35] serviceops, Prod-Kubernetes, Kubernetes: Migrate charts away from deprecated topology annotations - https://phabricator.wikimedia.org/T325066 (JMeybohm) Open→Resolved I've removed the puppet code responsible for registering the labels and unlabeled all nodes in all clusters accordingly ` kub...
[12:40:57] serviceops, SRE, Thumbor, Thumbor Migration, Platform Team Workboards (Platform Engineering Reliability): Thumbor-k8s performance improvements - https://phabricator.wikimedia.org/T333445 (hnowlan)
[12:50:58] serviceops, Foundational Technology Requests, Prod-Kubernetes, Shared-Data-Infrastructure, and 2 others: Post Kubernetes v1.23 cleanup - https://phabricator.wikimedia.org/T328291 (JMeybohm)
[12:51:36] serviceops, Data-Engineering-Planning, Event-Platform Value Stream (Sprint 10), Patch-For-Review, Service-deployment-requests: New Service Request mediawiki-page-content-change-enrichment - https://phabricator.wikimedia.org/T330507 (Ottomata) > Is that correct? Correct!
[13:15:11] serviceops, Data-Engineering-Planning, Event-Platform Value Stream (Sprint 10), Patch-For-Review, Service-deployment-requests: New Service Request mediawiki-page-content-change-enrichment - https://phabricator.wikimedia.org/T330507 (JMeybohm) >>! In T330507#8737310, @Joe wrote: > if that's th...
[13:40:07] hi folks
[13:40:12] going to upgrade kafka-main2002
[13:40:49] ack
[13:44:31] akosiaris: o/ when you have a moment https://gerrit.wikimedia.org/r/c/operations/puppet/+/904062
[13:54:21] serviceops, SRE, Thumbor, Thumbor Migration, and 2 others: Thumbor-k8s performance improvements - https://phabricator.wikimedia.org/T333445 (akosiaris)
[14:05:29] serviceops, MW-on-K8s, SRE, Traffic, and 3 others: Migrate internal traffic to k8s - https://phabricator.wikimedia.org/T333120 (Clement_Goubert)
[14:07:47] serviceops, SRE, Thumbor, Thumbor Migration, and 2 others: Thumbor-k8s performance improvements - https://phabricator.wikimedia.org/T333445 (akosiaris) I 've updated a bit the Thumbor dashboard. Aside from some performance changes (e.g. collapsing most rows by default) the main diff is adding 2 v...
[14:13:41] why do we even have the replicas tbh
[14:13:53] why do we even have persistence in fact...
[14:14:04] questions for another time
[14:25:04] :D
[14:25:10] kafka-main2002 up with bullseye
[14:25:14] only one left on buster
[14:25:44] elukey: gg!
[14:31:52] we're getting [critical][main] [source/server/server.cc:113] error initializing configuration '/etc/envoy/envoy.yaml': Invalid path: /etc/envoy/ssl/ca.crt for the envoy sidecar
[14:32:12] note that we don't set .Values.mesh.public_port (we only need listeners)
[14:43:26] wondering if the mesh.configuration.configmap template should be exposing the puppet_ca file even if public_port is not set
[15:02:10] we should probably also start using the wmf-certificate package, so the ca bundle contains the PKI root ca cert and not the puppet ca one only
[15:05:56] going to move kafka-main1001 to bullseye
[15:05:58] last one
[15:10:54] serviceops, SecTeam-Processed, Security, Vuln-Infoleak: changeprop-jobqueue password leaked on phabricator - https://phabricator.wikimedia.org/T332598 (sbassett)
[15:35:59] serviceops: Migrate kafka-main to bullseye - https://phabricator.wikimedia.org/T332013 (elukey) Open→Resolved a:elukey ` elukey@cumin1001:~$ sudo cumin 'A:kafka-main' 'cat /etc/debian_version' 10 hosts will be targeted: kafka-main[2001-2005].codfw.wmnet,kafka-main[1001-1005].eqiad.wmnet OK to pro...
[15:35:59] kafka main clusters both on bullseye :)
[15:37:30] elukey: from you I would not have expected that...
[15:38:03] 'A:kafka-main and A:bullseye' :D
[15:38:16] and 0 with A:buster
[15:38:39] * elukey stares at Riccardo
[15:38:59] >_>
[15:39:09] well done elukey <3
[16:02:34] serviceops, Data-Engineering-Planning, Event-Platform Value Stream, Epic: New Service Request: flink-kubernetes-operator - https://phabricator.wikimedia.org/T333464 (Ottomata)
[16:04:15] serviceops, Data-Engineering-Planning, Event-Platform Value Stream (Sprint 10), Patch-For-Review, Service-deployment-requests: New Service Request mediawiki-page-content-change-enrichment - https://phabricator.wikimedia.org/T330507 (Ottomata)
[16:04:21] serviceops, Data-Engineering-Planning, Event-Platform Value Stream (Sprint 10), Patch-For-Review, Service-deployment-requests: New Service Request mediawiki-page-content-change-enrichment - https://phabricator.wikimedia.org/T330507 (Ottomata)
[16:04:24] serviceops, Data-Engineering-Planning, Event-Platform Value Stream, Epic: New Service Request: flink-kubernetes-operator - https://phabricator.wikimedia.org/T333464 (Ottomata)
[16:04:31] serviceops, Data-Engineering-Planning, Event-Platform Value Stream (Sprint 10), Patch-For-Review, Service-deployment-requests: New Service Request mediawiki-page-content-change-enrichment - https://phabricator.wikimedia.org/T330507 (Ottomata)
[16:04:38] serviceops, Data-Engineering-Planning, Event-Platform Value Stream (Sprint 10), Patch-For-Review, Service-deployment-requests: New Service Request mediawiki-page-content-change-enrichment - https://phabricator.wikimedia.org/T330507 (Ottomata) Done: {T333464}
[16:04:45] serviceops, Data-Engineering-Planning, Event-Platform Value Stream (Sprint 10), Patch-For-Review, Service-deployment-requests: New Service Request mediawiki-page-content-change-enrichment - https://phabricator.wikimedia.org/T330507 (Ottomata)
[16:05:36] serviceops, Data-Engineering-Planning, Event-Platform Value Stream (Sprint 10), Patch-For-Review, Service-deployment-requests: New Service Request mediawiki-page-content-change-enrichment - https://phabricator.wikimedia.org/T330507 (Ottomata)
[16:22:52] serviceops, Data-Engineering-Planning, Event-Platform Value Stream, Epic, Patch-For-Review: New Service Request: flink-kubernetes-operator - https://phabricator.wikimedia.org/T333464 (Ottomata)
[16:24:54] serviceops, Data-Engineering-Planning, Event-Platform Value Stream, Epic, Patch-For-Review: New Service Request: flink-kubernetes-operator - https://phabricator.wikimedia.org/T333464 (Ottomata)
[16:28:06] serviceops, Data-Engineering-Planning, Epic, Event-Platform Value Stream (Sprint 10), Patch-For-Review: New Service Request: flink-kubernetes-operator - https://phabricator.wikimedia.org/T333464 (Ottomata)
[16:43:16] I'd like to pool thumbor-k8s in codfw again to 50/50 metal/k8s briefly to test something, any objections? I expect it to be in for about 10 minutes
[16:52:13] <_joe_> hnowlan: go, and let confctl log the action
[17:00:11] nice improvements to the dashboard akosiaris!
[17:40:18] serviceops, SRE, Traffic, VPS-project-Codesearch, Patch-For-Review: Consider using BindsTo instead of Requires to declare dependencies between systemd unit - https://phabricator.wikimedia.org/T284555 (BCornwall)
[17:55:14] serviceops, Patch-For-Review, Release Pipeline (Blubber), Release-Engineering-Team (Priority Backlog 📥): Buildkit erroring with "cannot reuse body, request must be retried" upon multi-platform push - https://phabricator.wikimedia.org/T322453 (dduvall) I was able to reproduce the problem locally a...
[18:12:49] serviceops, Keyholder, SRE, VPS-project-Codesearch, Patch-For-Review: Consider using BindsTo instead of Requires to declare dependencies between systemd unit - https://phabricator.wikimedia.org/T284555 (BCornwall)
[18:13:41] serviceops, Keyholder, SRE, VPS-project-Codesearch, Patch-For-Review: Consider using BindsTo instead of Requires to declare dependencies between systemd unit - https://phabricator.wikimedia.org/T284555 (BCornwall) Removing the Traffic team as our services have been rolled out with the change....
[18:13:54] serviceops, Keyholder, SRE, VPS-project-Codesearch, Patch-For-Review: Consider using BindsTo instead of Requires to declare dependencies between systemd unit - https://phabricator.wikimedia.org/T284555 (BCornwall) In progress→Open
[18:14:55] <_joe_> hnowlan: how did the test go?
[18:15:24] serviceops, Keyholder, SRE, VPS-project-Codesearch, Patch-For-Review: Consider using BindsTo instead of Requires to declare dependencies between systemd unit - https://phabricator.wikimedia.org/T284555 (BCornwall) a:BCornwall→None
[18:22:10] serviceops, SRE, Traffic-Icebox, conftool: confd's watch functionality appears to be partially broken when interacting with etcd 3.x - https://phabricator.wikimedia.org/T260889 (BCornwall) @Joe, thank you for all the work on this ticket! Would you say that this is resolved since the CRs have all...
[18:22:26] serviceops, SRE, Traffic-Icebox, conftool: confd's watch functionality appears to be partially broken when interacting with etcd 3.x - https://phabricator.wikimedia.org/T260889 (BCornwall) Open→Stalled a:Joe
[18:29:02] serviceops, Infrastructure-Foundations, PyBal, SRE, SRE-tools: Applications and scripts need to be able to understand the pooled status of servers in our load balancers. - https://phabricator.wikimedia.org/T239392 (BCornwall)
[18:43:52] _joe_: not much improvement unfortunately