[07:33:13] serviceops, MW-on-K8s, SRE, Traffic, and 3 others: Migrate internal traffic to k8s - https://phabricator.wikimedia.org/T333120 (Joe)
[07:56:01] hello folks
[07:56:30] I am checking something about changeprop, and I noticed that its consumer groups seem to be all zookeeper-based (the old ones, not really supported anymore..)
[07:56:35] does it ring a bell?
[07:57:04] (I mean if in the past anybody raised the concern and there was an attempt to upgrade the consumer groups)
[08:07:08] <_joe_> elukey: there was no attempt
[08:07:33] serviceops, DBA, Data Pipelines, Data-Engineering-Planning, and 10 others: eqiad row B switches upgrade - https://phabricator.wikimedia.org/T330165 (Marostegui)
[08:07:34] <_joe_> and I assumed they were still the main mechanism on our kafka version
[08:08:16] in theory no, now the consumer groups have a special topic to commit their offsets to, without the need of zookeeper
[08:08:40] I don't find the in code/config the specific bits of how the old consumer is set though
[08:09:24] I am asking since if we want to upgrade kafka in the future, this may need to be fixed
[08:21:03] serviceops, DBA, Data Pipelines, Data-Engineering-Planning, and 10 others: eqiad row B switches upgrade - https://phabricator.wikimedia.org/T330165 (Marostegui)
[08:31:18] I am probably wrong, I don't find clear evidence of the zookeeper usage in code etc..
[08:31:44] kafka consumer-groups --list returns an empty list on kafka main though
[08:31:47] weird
[08:57:04] serviceops, DBA, Data Pipelines, Data-Engineering-Planning, and 10 others: eqiad row B switches upgrade - https://phabricator.wikimedia.org/T330165 (fgiunchedi)
[09:13:20] serviceops, Infrastructure-Foundations, Prod-Kubernetes, SRE, and 2 others: Agree strategy for Kubernetes BGP peering to top-of-rack switches - https://phabricator.wikimedia.org/T306649 (ayounsi) I pondered multiple options for the Netbox `server_bgp` custom field, feedback from ServiceOps welcom...
[09:43:49] folks if you are ok I'd try to dist-upgrade kafka-main1003 to bullseye
[09:44:10] with the procedure in https://phabricator.wikimedia.org/T332013#8724091
[09:50:00] _joe_ --^ :)
[09:50:47] <_joe_> elukey: makes sense
[09:51:53] <3
[10:02:12] serviceops, DBA, Data Pipelines, Data-Engineering-Planning, and 10 others: eqiad row B switches upgrade - https://phabricator.wikimedia.org/T330165 (hnowlan)
[10:02:32] serviceops, DBA, Data Pipelines, Data-Engineering-Planning, and 10 others: eqiad row B switches upgrade - https://phabricator.wikimedia.org/T330165 (ArielGlenn)
[10:03:15] serviceops, DBA, Data Pipelines, Data-Engineering-Planning, and 10 others: eqiad row B switches upgrade - https://phabricator.wikimedia.org/T330165 (hnowlan)
[10:06:14] serviceops, MW-on-K8s, SRE, Traffic, and 3 others: Serve production traffic via Kubernetes - https://phabricator.wikimedia.org/T290536 (Clement_Goubert)
[10:06:26] serviceops, MW-on-K8s, SRE, Traffic, and 4 others: Migrate internal traffic to k8s - https://phabricator.wikimedia.org/T333120 (Clement_Goubert) Open→In progress p:Triage→Medium a:Clement_Goubert
[10:12:03] kafka-main1003 rebooting in bullseye
[10:19:31] <_joe_> \o/
[10:19:57] <_joe_> still didn't reboot AFAICT :P
[10:20:33] yes yes the /etc/network/interfaces was not right
[10:21:00] <_joe_> oh right, "predictable" network interfaces
[10:21:12] <_joe_> vgutierrez: ^^ I know you're another fan
[10:21:13] still is sigh
[10:21:27] _joe_: huge fan indeed
[10:22:00] <_joe_> in fact, it's on us. They said they'd be "predictable", not "undertandable" or "stable"
[10:22:37] ok now it should be better
[10:22:39] predictable as long as you don't upgrade the kernel or systemd
[10:24:33] kafka is recovering
[10:24:43] totally forgot about this madness, I'll add it to the guide
[10:26:24] _joe_ seems all good from my pov, do you have a min to double check?
[10:27:02] <_joe_> elukey: gimme 5 mins
[10:31:55] serviceops: Migrate kafka-main to bullseye - https://phabricator.wikimedia.org/T332013 (elukey) kafka-main1003 up and running with bullseye after a dist-upgrade. The only thing not listed to do is to update `/etc/network/interfaces` since with the upgrade of systemd we have renamings etc.. (so by default aft...
[10:39:27] <_joe_> elukey: seems allright to me
[10:40:16] thanks for checking :)
[10:40:24] going afk for lunch, so far all metrics are good
[10:42:18] <_joe_> jayme: I was thinking we could merge https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/902078/ in the afternoon if you don't have any objections
[11:02:38] If anyone was feeling adventurous today I'd like to try rolling out this soon https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/899654
[11:05:27] hnowlan: memory: 750Gi << O_O
[11:06:26] <_joe_> that's like 5 modern servers :)
[11:06:33] lmao
[11:07:00] <_joe_> yeah our kube workers have 128 gb of ram
[11:07:16] <_joe_> so that's 6 workers worth of RAM
[11:07:29] fixed ;D
[11:08:10] :D
[11:08:14] oops :)
[11:08:29] https://frinkiac.com/meme/S08E04/710959/m/VGh1bWJvciAqQ09VTEQqIHVzZSA3NTBHaQpwZXIgcG9k
[11:08:39] _joe_: I don't have any
[11:08:56] hnowlan: x)
[11:11:01] serviceops, Infrastructure-Foundations, Prod-Kubernetes, SRE, and 2 others: Agree strategy for Kubernetes BGP peering to top-of-rack switches - https://phabricator.wikimedia.org/T306649 (cmooney) Personally I think it's a big conceptual change to introduce a second separate automation-pipeline fo...
[11:13:23] serviceops, Infrastructure-Foundations, Prod-Kubernetes, SRE, and 2 others: Agree strategy for Kubernetes BGP peering to top-of-rack switches - https://phabricator.wikimedia.org/T306649 (cmooney) On the Netbox side I'm happy with the current status, or having it as a dropdown. I think it's good...
[11:34:15] serviceops, DBA, Data Pipelines, Data-Engineering-Planning, and 10 others: eqiad row B switches upgrade - https://phabricator.wikimedia.org/T330165 (Jelto)
[11:48:15] serviceops, SRE: mw2420-mw2451 service implementation tracking - https://phabricator.wikimedia.org/T326363 (Clement_Goubert) Open→Resolved
[11:48:20] serviceops, DC-Ops, SRE, ops-codfw: Q3:rack/setup/install mw2420-mw2451 - https://phabricator.wikimedia.org/T326362 (Clement_Goubert)
[13:56:00] serviceops, MW-on-K8s, SRE, Traffic, and 3 others: Migrate internal traffic to k8s - https://phabricator.wikimedia.org/T333120 (JJMC89)
[14:07:15] hnowlan: how is a 200Gi worth of ram pod scheduled on wikikube? Am I reading it right?
[14:08:47] ah no wait it is resource quotas
[14:08:59] yeah
[14:09:02] okok now it makes sense :D
[14:09:05] pebkac :)
[14:12:27] my own pebkac was trying to assign 750Gi * 7 to each pod in the earlier patchset :D
[14:15:27] ahahah yes
[14:15:28] :D
[14:15:38] Download more RAM hnowlan
[14:17:10] RAM doubler was never ported to linux :(
[14:57:32] serviceops, Data-Persistence, SRE, Datacenter-Switchover, and 2 others: March 2023 Datacenter Switchover - https://phabricator.wikimedia.org/T327920 (Trizek-WMF)
[14:57:44] serviceops, SRE, CommRel-Specialists-Support (Jan-Mar-2023), Datacenter-Switchover: CommRel support for March 2023 Datacenter Switchover - https://phabricator.wikimedia.org/T328287 (Trizek-WMF) In progress→Resolved A post-action document has been created. There is nothing special to highl...
[15:08:55] serviceops, Data-Engineering-Planning, Event-Platform Value Stream (Sprint 10), Patch-For-Review, Service-deployment-requests: New Service Request mediawiki-page-content-change-enrichment - https://phabricator.wikimedia.org/T330507 (Ottomata) > why is kafka-main a better fit than kafka-jumbo?...
[15:12:39] serviceops, Data-Engineering-Planning, Event-Platform Value Stream (Sprint 10), Patch-For-Review, Service-deployment-requests: New Service Request mediawiki-page-content-change-enrichment - https://phabricator.wikimedia.org/T330507 (Ottomata) Also, clearly we will not be ready to deploy this...
[15:38:24] serviceops, Prod-Kubernetes, Kubernetes: kubernetes[2023-2024].codfw.wmnet,kubernetes[1023-1024].eqiad.wmnet are using devicemapper instead of overlay2 - https://phabricator.wikimedia.org/T332803 (JMeybohm) Open→Resolved a:JMeybohm `profile::docker::engine` now forces overlay2 storage_dri...
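Editor's note on the RAM arithmetic in the 11:05-11:07 and 14:12 messages: the sketch below only reproduces the back-of-the-envelope numbers quoted in the channel. It assumes the "128 gb" per kube worker can be treated as 128 GiB and ignores system overhead and allocatable limits; `workers_needed` is a hypothetical helper, not something from deployment-charts.

```python
import math

# Assumption: the "128 gb" per kube worker quoted in the log, treated as GiB.
WORKER_RAM_GIB = 128


def workers_needed(request_gib: float, worker_ram_gib: float = WORKER_RAM_GIB) -> int:
    """Number of whole workers whose combined RAM covers the request."""
    return math.ceil(request_gib / worker_ram_gib)


# The mistaken 750Gi request from the first patchset.
print(workers_needed(750))      # 6

# The "750Gi * 7" variant mentioned later in the log.
print(workers_needed(750 * 7))  # 42
```

This matches the "6 workers worth of RAM" estimate in the log; real scheduling would also depend on per-node allocatable memory and namespace resource quotas, which this sketch does not model.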
[15:51:25] <_joe_> omg ram doubler
[15:51:30] <_joe_> what memories
[15:54:06] serviceops, MW-on-K8s, SRE, Traffic, and 3 others: Migrate internal traffic to k8s - https://phabricator.wikimedia.org/T333120 (Clement_Goubert)
[17:20:35] serviceops, Infrastructure-Foundations, Prod-Kubernetes, SRE, and 2 others: Agree strategy for Kubernetes BGP peering to top-of-rack switches - https://phabricator.wikimedia.org/T306649 (ayounsi) > Aside from duplication of code what are the blockers to having the Kubernetes groups also in Homer?...
[18:05:47] actually I think linux does have something like ram doubler if you dig for it :)
[18:07:00] zram + zswap
[19:18:52] <_joe_> bblack: https://www.amazon.com/Connectix-Ram-Doubler/dp/B000JWFY50 sadly currently unavailable
[19:21:52] lol
[19:22:29] QEMM had similar stuff for old x86/dos-ish machines
[19:22:56] https://en.wikipedia.org/wiki/QEMM#MagnaRAM
[19:27:43] <_joe_> haha the name is amazing
[19:29:37] <_joe_> "magna" in Rome's vernacular means "eat!" (imperative) or "eater" - so that reads "ram eater"
[19:29:50] <_joe_> which seems very descriptive of its function
[19:30:12] <_joe_> bblack: I was reading about https://en.wikipedia.org/wiki/SoftRAM instead
[19:32:13] nice one!
[19:32:22] "placebo software"
[22:09:52] serviceops, DBA, Data Pipelines, Data-Engineering-Planning, and 10 others: eqiad row B switches upgrade - https://phabricator.wikimedia.org/T330165 (Volans) >>! In T330165#8731601, @Stashbot wrote: > {nav icon=file, name=Mentioned in SAL (#wikimedia-operations), href=https://sal.toolforge.org/log...
[23:17:31] serviceops, DBA, Data Pipelines, Data-Engineering-Planning, and 10 others: eqiad row B switches upgrade - https://phabricator.wikimedia.org/T330165 (colewhite)