[03:06:45] serviceops: docker-registry.wikimedia.org keeps serving bad blobs - https://phabricator.wikimedia.org/T390251#10728426 (Scott_French) This evening, I tried to pull together a rough timeline of the issues between 14:00 and 17:00 UTC today in https://phabricator.wikimedia.org/P74821 (note: time goes backwards!...
[09:28:57] serviceops: docker-registry.wikimedia.org keeps serving bad blobs - https://phabricator.wikimedia.org/T390251#10729005 (elukey) I think as well that Dragonfly may need to be checked.. Affected by digest mismatch: ` root@wikikube-worker1256:/var/lib/dragonfly-dfdaemon/logs# grep 52c8ffea230bf6fd62801737a3713...
[10:48:37] serviceops, Deployments, Shellbox, Wikibase-Quality-Constraints, and 5 others: Burst of GuzzleHttp Exception for http://localhost:6025/call/constraint-regex-checker - https://phabricator.wikimedia.org/T371633#10729200 (Lucas_Werkmeister_WMDE) >>! In T371633#10628754, @karapayneWMDE wrote: > -...
[11:14:05] serviceops, Shellbox, SyntaxHighlight, MW-1.44-notes (1.44.0-wmf.24; 2025-04-08), Wikimedia-production-error: Shellbox bubbles GuzzleHttp\Exception\ConnectException when it should probably wrap it in a ShellboxError? - https://phabricator.wikimedia.org/T374117#10729286 (hashar) Resolved→...
[11:19:20] serviceops, Shellbox, SyntaxHighlight, MW-1.44-notes (1.44.0-wmf.24; 2025-04-08), Wikimedia-production-error: Shellbox bubbles GuzzleHttp\Exception\ConnectException when it should probably wrap it in a ShellboxError? - https://phabricator.wikimedia.org/T374117#10729321 (hashar) Open→...
[11:32:09] serviceops, MediaWiki-Platform-Team: MediaWikiCronJobFailed - https://phabricator.wikimedia.org/T391574#10729339 (Clement_Goubert) Open→In progress p:Triage→High a:Clement_Goubert That's me, I'll take care of it.
[13:28:16] serviceops, MW-on-K8s, Patch-For-Review, Release-Engineering-Team (Priority Backlog 📥): Migrate mwdebug* hosts to PHP8.1 - https://phabricator.wikimedia.org/T391452#10729673 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jiji@cumin1002 for host mwdebug1002.eqiad.wmnet with O...
[13:39:46] serviceops, Page Content Service, RESTBase Sunsetting, Code-Health-Objective, and 2 others: Move PCS endpoints behind API Gateway - https://phabricator.wikimedia.org/T264670#10729700 (Jgiannelos)
[13:48:10] serviceops, DC-Ops, decommission-hardware, ops-codfw, SRE: decommission mw2278, mw2279 - https://phabricator.wikimedia.org/T391001#10729755 (Jhancock.wm) Open→Resolved a:Jhancock.wm
[13:53:54] serviceops, DC-Ops, ops-codfw, SRE: hw troubleshooting: hard down for wikikube-worker2142 - https://phabricator.wikimedia.org/T391341#10729811 (Jhancock.wm) starting with firmware updates. hopefully we'll get a more concise error.
[13:54:11] serviceops, MW-on-K8s, Patch-For-Review, Release-Engineering-Team (Priority Backlog 📥): Migrate mwdebug* hosts to PHP8.1 - https://phabricator.wikimedia.org/T391452#10729812 (jijiki) Open→In progress p:Triage→Medium
[13:54:21] serviceops, MW-on-K8s, Patch-For-Review, Release-Engineering-Team (Priority Backlog 📥): Migrate mwdebug* hosts to PHP8.1 - https://phabricator.wikimedia.org/T391452#10729817 (jijiki)
[13:57:58] serviceops, MW-on-K8s, Patch-For-Review, Release-Engineering-Team (Priority Backlog 📥): Migrate mwdebug* hosts to PHP8.1 - https://phabricator.wikimedia.org/T391452#10729823 (jijiki) mwdebug1002 is done. I will reimage mwdebug2002 today as well, and if nothing comes up until next week, I will mov...
[15:03:49] serviceops, DC-Ops, ops-codfw, SRE: hw troubleshooting: hard down for wikikube-worker2142 - https://phabricator.wikimedia.org/T391341#10730203 (Jhancock.wm) the NIC card has perished. I am opening a return with Dell.
[15:07:15] serviceops, DC-Ops, ops-codfw, SRE: hw troubleshooting: hard down for wikikube-worker2142 - https://phabricator.wikimedia.org/T391341#10730222 (Clement_Goubert) RIP. Thanks.
[15:11:09] hey folks, should we give any thought about trying https://gitlab.wikimedia.org/repos/data-engineering/service-utils for some services like changeprop?
[15:11:15] to get rid of service-runner
[15:11:43] serviceops, Wikidata, Wikidata Integration in Wikimedia projects: Migrate wikidata_dev_team jobs to mw-cron - https://phabricator.wikimedia.org/T388543#10730289 (karapayneWMDE)
[15:16:02] serviceops, DC-Ops, ops-codfw, SRE: hw troubleshooting: hard down for wikikube-worker2142 - https://phabricator.wikimedia.org/T391341#10730305 (Jhancock.wm) SR208354425. np!
[15:25:40] elukey: yes, but it's not up to SRE to get this done. It's proposed to newer projects right now. However it isn't ready. That being said, next FY there is a KR to either adopt it properly (perhaps with shared ownership) or create a properly supported one by SREs. And then tell people to use that instead of service-runner
[15:26:03] ahhh TIL, nice!
[15:30:06] serviceops, MW-on-K8s, Patch-For-Review, Release-Engineering-Team (Priority Backlog 📥): Migrate mwdebug* hosts to PHP8.1 - https://phabricator.wikimedia.org/T391452#10730396 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jiji@cumin1002 for host mwdebug2002.codfw.wmnet wi...
[15:56:35] serviceops, DC-Ops, ops-eqiad, SRE: relocate (3) service-ops hosts out of eqiad D6 - https://phabricator.wikimedia.org/T391599 (RobH) NEW
[15:57:45] serviceops, DC-Ops, ops-eqiad, SRE: relocate (3) service-ops hosts out of eqiad D6 - https://phabricator.wikimedia.org/T391599#10730520 (RobH) p:Triage→High a:Kappakayala @Kappakayala I think you'd be the person to triage this within #service-ops and assign a point of contact for fe...
[16:04:27] serviceops, DC-Ops, ops-eqiad, SRE: relocate (3) service-ops hosts out of eqiad D6 - https://phabricator.wikimedia.org/T391599#10730569 (Clement_Goubert) Tagging @eevans for sessionstore host. For `wikikube-worker` they can change VLAN/IP and move rack. Just tell us when you want to move them so...
[16:30:30] serviceops, MW-on-K8s, Patch-For-Review, Release-Engineering-Team (Priority Backlog 📥): Migrate mwdebug* hosts to PHP8.1 - https://phabricator.wikimedia.org/T391452#10730701 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jiji@cumin1002 for host mwdebug2002.codfw.wmnet with O...
[16:52:09] serviceops, MW-on-K8s, Release-Engineering-Team, Patch-For-Review: Refactor scap's kubernetes DeploymentsConfig to support selection of image kinds - https://phabricator.wikimedia.org/T389499#10730830 (Scott_French) Open→Resolved p:Triage→Medium Since no further work / cleanup...
[16:54:17] serviceops, Scap, Patch-For-Review: Migrate scap's maintenance script invocations to PHP 8.1 - https://phabricator.wikimedia.org/T390225#10730861 (Scott_French) Open→Resolved With the override cleaned up, there should be nothing else to do here.
[17:49:33] serviceops, Deployments, Shellbox, Wikibase-Quality-Constraints, and 6 others: Burst of GuzzleHttp Exception for http://localhost:6025/call/constraint-regex-checker - https://phabricator.wikimedia.org/T371633#10731089 (dancy) Thanks to everyone who worked to take care of these errors!
[19:24:28] serviceops: docker-registry.wikimedia.org keeps serving bad blobs - https://phabricator.wikimedia.org/T390251#10731311 (Scott_French) I took a closer look at the registry nginx access logs this morning, in hopes that I could understand a bit better (1) what the dragonfly fetch looks like on that side and (2)...
[23:21:56] serviceops, Content-Transform-Team, MediaWiki-Engineering, MW-Interfaces-Team, and 3 others: Transition parsoidtest1001 to PHP 8.1 - https://phabricator.wikimedia.org/T380485#10731980 (Scott_French) Following up here after implementation discussion has largely shifted to {T386246}. In short, the...