[07:57:45] serviceops: docker-registry.wikimedia.org keeps serving bad blobs - https://phabricator.wikimedia.org/T390251#10697400 (elukey) @Dzahn Thanks for the reports! I quickly checked all the github reports and it seems that those people got errors when pushing the image, while our issue is that we cannot pull it f...
[09:44:45] serviceops, Maps: Upgrade maps servers to bullseye - https://phabricator.wikimedia.org/T327513#10697660 (elukey) Open→Invalid Please see T381565
[09:45:27] serviceops, Maps: tegola-vector-tiles: load balancing reads between postgres servers - https://phabricator.wikimedia.org/T286494#10697668 (elukey) Open→Resolved a:elukey This has already been implemented :)
[09:59:30] serviceops, Maps (Kartotherian): kartotherian CI tests failing missing debian stretch - https://phabricator.wikimedia.org/T336680#10697728 (elukey) Open→Resolved a:elukey We are now on Bookworm and k8s, the issue is not there anymore.
[10:02:32] serviceops, ci-test-error, Maps (Kartotherian): kartotherian CI tests failing missing debian stretch - https://phabricator.wikimedia.org/T336680#10697751 (hashar)
[10:05:20] serviceops, Discovery-Search (2025.03.22 - 2025.04.11): Migrate discovery-search jobs to mw-cron - https://phabricator.wikimedia.org/T388538#10697765 (Clement_Goubert) That `xargs` invoc should still work under `mw-cron`, we would need to add `ts` to the image but that's about it. I tested it inside a `m...
[10:11:09] serviceops, collaboration-services, Infrastructure-Foundations, SRE, SRE-tools: Create a cookbook to automate gerrit's switchover - https://phabricator.wikimedia.org/T260666#10697808 (LSobanski) Duplicate→Open This is separate from the activity in {T387833} so let's keep it open.
[10:33:27] serviceops: docker-registry.wikimedia.org keeps serving bad blobs - https://phabricator.wikimedia.org/T390251#10697895 (akosiaris) scap backport worked fine this time around.
Scott's right, this is a very infuriating Heisenbug. Recapping a bit: * We haven't seen this up until now before Friday 2025-03-28, a...
[10:37:57] serviceops: docker-registry.wikimedia.org keeps serving bad blobs - https://phabricator.wikimedia.org/T390251#10697903 (elukey) >>! In T390251#10697895, @akosiaris wrote: > * The nginx cache being full as a theory is debunked by the timing of T390251#10695479, at which point both registries in codfw were in...
[10:58:21] serviceops, Citoid, Editing-team, RESTBase Sunsetting, and 3 others: Switch from restbase to api gateway for Citoid - https://phabricator.wikimedia.org/T361576#10697957 (Mvolz)
[10:58:50] serviceops, Citoid, Editing-team, RESTBase Sunsetting, and 3 others: Switch from restbase to api gateway for Citoid - https://phabricator.wikimedia.org/T361576#10697960 (Mvolz)
[11:24:08] serviceops, Discovery-Search (2025.03.22 - 2025.04.11): Migrate discovery-search jobs to mw-cron - https://phabricator.wikimedia.org/T388538#10698040 (Clement_Goubert) Ok, after a few test changes (escaping interleaved quotes through 3 template languages is no fun), I simplified the invocation for my tes...
[13:36:49] serviceops: docker-registry.wikimedia.org keeps serving bad blobs - https://phabricator.wikimedia.org/T390251#10699052 (elukey) @Scott_French @dancy some people deployed today and there was no sign of the issue, we checked namespace events etc.. but everything looked good. As Alex mentioned, in /var/log/ngi...
[13:41:53] serviceops, Page Content Service, RESTBase Sunsetting, Content-Transform-Team (Work In Progress): Rollout more wikis: week 3 - https://phabricator.wikimedia.org/T390724 (Jgiannelos) NEW
[13:42:11] serviceops, Page Content Service, RESTBase Sunsetting, Content-Transform-Team (Work In Progress): Rollout more wikis: week 3 - https://phabricator.wikimedia.org/T390724#10699125 (Jgiannelos)
[13:59:48] serviceops, Page Content Service, RESTBase Sunsetting, Content-Transform-Team (Work In Progress): Rollout more wikis: week 3 - https://phabricator.wikimedia.org/T390724#10699240 (MSantos)
[13:59:55] serviceops, Page Content Service, RESTBase Sunsetting, Content-Transform-Team (Work In Progress): Rollout more wikis: week 3 - https://phabricator.wikimedia.org/T390724#10699241 (MSantos) LGTM.
[14:06:31] serviceops, Patch-For-Review: docker-registry.wikimedia.org keeps serving bad blobs - https://phabricator.wikimedia.org/T390251#10699281 (elukey) Last famous words! The debug logging is too verbose and it fills up the root partition in some hours, so we cannot really leave it running. sigh.
[14:17:39] serviceops, Discovery-Search (2025.03.22 - 2025.04.11): Migrate discovery-search jobs to mw-cron - https://phabricator.wikimedia.org/T388538#10699313 (EBernhardson) The other important detail that `ts` adds is it prefixes the wiki name to the logs, with 4 of them running in parallel this is important to...
[14:25:33] serviceops, Page Content Service, RESTBase Sunsetting, Content-Transform-Team (Work In Progress): Rollout more wikis: week 3 - https://phabricator.wikimedia.org/T390724#10699385 (hnowlan) SGTM 🎉
[14:25:44] serviceops, Page Content Service, RESTBase Sunsetting, Content-Transform-Team (Work In Progress): Rollout more wikis: week 3 - https://phabricator.wikimedia.org/T390724#10699386 (hnowlan)
[14:47:46] serviceops, collaboration-services, Data-Platform-SRE, Prod-Kubernetes, and 2 others: Ensure all required kubectl versions are installed on deploy hosts - https://phabricator.wikimedia.org/T388388#10699474 (jhathaway) @kamila This patch is now rolled out, so https://gerrit.wikimedia.org/r/c/opera...
[15:02:35] serviceops, collaboration-services, Data-Platform-SRE, Prod-Kubernetes, and 2 others: Ensure all required kubectl versions are installed on deploy hosts - https://phabricator.wikimedia.org/T388388#10699573 (kamila) Oh, thank you @jhathaway , excellent timing :D Much appreciated!
[15:11:34] serviceops, Abstract Wikipedia team: Provide guidance on how to use apache bench to benchmark requests not through SSL for production services - https://phabricator.wikimedia.org/T390099#10699640 (ecarg) Hi~ it's looking like we need some guidance on this :) how can we easily route to spare servers to pe...
[15:12:52] serviceops, collaboration-services, Data-Platform-SRE, Prod-Kubernetes, and 2 others: Ensure all required kubectl versions are installed on deploy hosts - https://phabricator.wikimedia.org/T388388#10699648 (kamila) p:High→Low The hard-coded hack works, so kubectl no longer complains. I'm...
[15:14:58] serviceops, function-orchestrator, Abstract Wikipedia team (25Q4 (Apr–Jun)), OKR-Work: Wire up the Wikifunctions-specific memcached pool to the function-orchestrator service - https://phabricator.wikimedia.org/T390744 (Jdforrester-WMF) NEW
[15:17:23] serviceops, function-orchestrator, Abstract Wikipedia team (25Q4 (Apr–Jun)), OKR-Work: Wire up the Wikifunctions-specific memcached pool to the function-orchestrator service - https://phabricator.wikimedia.org/T390744#10699720 (Jdforrester-WMF) p:Triage→Medium
[15:20:36] serviceops, DC-Ops, ops-codfw, SRE: Q3:rack/setup/install wikikube-worker2248-2331, wikikube-ctrl2004-2005 - https://phabricator.wikimedia.org/T384970#10699729 (elukey) @Jhancock.wm the server is provisioned, please go ahead!
[15:33:29] serviceops, collaboration-services, Data-Platform-SRE, Prod-Kubernetes, Kubernetes: Ensure all required kubectl versions are installed on deploy hosts - https://phabricator.wikimedia.org/T388388#10699811 (jhathaway) @kamila I would expect @JMeybohm's patch, https://gerrit.wikimedia.org/r/c/op...
[15:49:27] serviceops, function-orchestrator, Abstract Wikipedia team (25Q4 (Apr–Jun)), OKR-Work: Wire up the Wikifunctions-specific memcached pool to the function-orchestrator service - https://phabricator.wikimedia.org/T390744#10699920 (Jdforrester-WMF) Note that this work was originally in scope for the...
[15:52:43] serviceops, DC-Ops, ops-codfw, SRE: Q3:rack/setup/install wikikube-worker2248-2331, wikikube-ctrl2004-2005 - https://phabricator.wikimedia.org/T384970#10699927 (elukey) >>! In T384970#10699729, @elukey wrote: > @Jhancock.wm the server is provisioned, please go ahead! Taking it back, I think that...
[16:15:11] serviceops, Abstract Wikipedia team: Provide guidance on how to use apache bench to benchmark requests not through SSL for production services - https://phabricator.wikimedia.org/T390099#10700043 (akosiaris) >>!
In T390099#10699640, @ecarg wrote: > Hi~ it's looking like we need some guidance on this :) h...
[16:42:52] so.. wikikube has k8s-ingress-wikikube-ro and k8s-ingress-wikikube-rw DNS names in the repo in templates/wmnet, but k8s-ingress-dse and k8s-ingress-ml have -staging and -serve instead. should the aux cluster copy one or the other? https://gerrit.wikimedia.org/r/c/operations/dns/+/1132699/1/templates/wmnet#907
[17:21:40] mutante: the dse ones
[17:21:43] or the ml ones
[17:21:49] wikikube is an exception here
[17:22:33] that's the general rule, but here specifically, the -ro and -rw is specific to active/active and active/passive services
[17:22:58] akosiaris: after talking more about it elsewhere we figured it's only a matter of time until a service wants to be on aux but needs active/passive .. which made us think we should go with -ro and -rw
[17:23:06] which led to https://gerrit.wikimedia.org/r/c/operations/puppet/+/1133176/4/hieradata/common/service.yaml
[17:23:22] herron: cc^
[17:23:41] if you already have a use case, sure.
[17:24:14] codesearch can be active/active but I guess at one point phabricator could be the one that needs active/passive
[17:24:31] we were going back and forth while talking about it :)
[19:05:52] serviceops, Patch-For-Review: Migrate mw-script to PHP 8.1 - https://phabricator.wikimedia.org/T387917#10700770 (Scott_French) I've added a Note to https://wikitech.wikimedia.org/wiki/Maintenance_scripts in advance of tomorrow's switch.
[19:06:03] serviceops, Patch-For-Review: Migrate mw-script to PHP 8.1 - https://phabricator.wikimedia.org/T387917#10700771 (Scott_French)
[19:43:57] serviceops, Data-Engineering, Data-Engineering-Radar, Dumps-Generation, MediaWiki-Platform-Team: Migrate WMF production from PHP 7.4 to PHP 8.1 - https://phabricator.wikimedia.org/T319432#10700944 (Jdforrester-WMF)
[21:49:48] serviceops: mwscript-cleanup.service failure - https://phabricator.wikimedia.org/T390790#10701546 (Dzahn)
[21:50:42] serviceops, SRE: mwscript-cleanup.service failure - https://phabricator.wikimedia.org/T390790#10701547 (Dzahn)
[21:53:00] serviceops, SRE: mwscript-cleanup.service failure - https://phabricator.wikimedia.org/T390790#10701553 (Dzahn) https://alerts.wikimedia.org/?q=%40state%3Dactive&q=%40cluster%3Dwikimedia.org&q=alertname%3DCheck%20unit%20status%20of%20mwscript-cleanup
[21:56:32] serviceops, SRE: mwscript-cleanup.service failure - https://phabricator.wikimedia.org/T390790#10701559 (Dzahn) p:Triage→Medium