[00:23:00] Yippee, build fixed!
[00:23:00] Project beta-update-databases-eqiad build #90722: FIXED in 3 min 0 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/90722/
[02:21:53] Phabricator (Upstream), Upstream: Missing space between two Diffusion strings / messages - https://phabricator.wikimedia.org/T304205#11535459 (Pppery) Not technically the same link (one is the upstream, one is the WMF fork), but https://gitlab.wikimedia.org/repos/phabricator/phabricator/-/blob/wmf/stable...
[03:06:49] Release-Engineering-Team, Scap: On deployment server, skip fetching MediaWiki branches/tags we have no use for - https://phabricator.wikimedia.org/T414920#11535498 (bd808) >>! In T414920#11532819, @hashar wrote: > `wmf/next` might be needed though. `wmf/next` is the [[https://www.mediawiki.org/wiki/Wiki...
[03:23:53] Beta-Cluster-Infrastructure, GitLab, m3api: Unblock IPs for Beta Cluster access - https://phabricator.wikimedia.org/T414864#11535505 (bd808) The runners you would like unblocked are hosted on Digital Ocean. I do not think that it would be reasonable to open the Beta Cluster to the full DO IPv4 addres...
[06:55:08] GitLab (Account Approval), Release-Engineering-Team: Requesting GitLab account activation for sanjaydevs - https://phabricator.wikimedia.org/T415010 (Sanjaydevs) NEW
[08:23:24] Release-Engineering-Team (Priority Backlog 📥), Essential-Work, Release, Train Deployments: 1.46.0-wmf.12 deployment blockers - https://phabricator.wikimedia.org/T413803#11535763 (IKhitron) Could you please merge the https://gerrit.wikimedia.org/r/c/mediawiki/extensions/GlobalWatchlist/+/1229000?...
[09:14:18] Release-Engineering-Team (Priority Backlog 📥), Essential-Work, Patch-For-Review, Release, Train Deployments: 1.46.0-wmf.12 deployment blockers - https://phabricator.wikimedia.org/T413803#11535882 (Aklapper) @IKhitron: Hi, that patch first needs a review by someone who knows that code area, to...
[09:20:14] Release-Engineering-Team (Priority Backlog 📥), Essential-Work, Patch-For-Review, Release, Train Deployments: 1.46.0-wmf.12 deployment blockers - https://phabricator.wikimedia.org/T413803#11535896 (IKhitron) >>! In T413803#11535882, @Aklapper wrote: > @IKhitron: Hi, that patch first needs a re...
[09:34:18] GitLab (Account Approval), Release-Engineering-Team (Doing 😎): Requesting GitLab account activation for okerekechinweotito - https://phabricator.wikimedia.org/T414995#11535963 (Aklapper) Open→Resolved a:Aklapper Account approved - happy hacking!
[09:34:21] GitLab (Account Approval), Release-Engineering-Team (Doing 😎): Requesting GitLab account activation for sanjaydevs - https://phabricator.wikimedia.org/T415010#11535967 (Aklapper) Open→Resolved a:Aklapper Account approved - happy hacking!
[10:11:28] FIRING: InstanceDown: Project deployment-prep instance deployment-sessionstore06 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[10:11:39] Beta-Cluster-Infrastructure: Project deployment-prep instance deployment-sessionstore06 is down - https://phabricator.wikimedia.org/T415021 (wmcs-alerts) NEW
[10:16:28] RESOLVED: InstanceDown: Project deployment-prep instance deployment-sessionstore06 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[10:33:46] Phabricator maintenance bot: Maintenance_bot no longer runs new_wiki_handler job - https://phabricator.wikimedia.org/T415028 (Urbanecm) NEW
[11:12:36] Phabricator, MediaWiki-Platform-Team (Q3 Kanban Board): Silent bulk update for MediaWiki Platform - https://phabricator.wikimedia.org/T414778#11536397 (OWresch-WMF) Thanks @Aklapper ! I will do it by hand until I found the culprit -- wish me luck.
[11:12:39] Phabricator, MediaWiki-Platform-Team (Q3 Kanban Board): Silent bulk update for MediaWiki Platform - https://phabricator.wikimedia.org/T414778#11536398 (OWresch-WMF) Open→Resolved
[11:22:57] Continuous-Integration-Config, Excimer, LuaSandbox, MediaWiki-Platform-Team, Developer Productivity: Improve CI for PECL packages - https://phabricator.wikimedia.org/T277063#11536517 (OWresch-WMF)
[11:49:44] Phabricator maintenance bot, Patch-For-Review: Maintenance_bot no longer runs new_wiki_handler job - https://phabricator.wikimedia.org/T415028#11536876 (Urbanecm) Okay, after a bunch of changes (see MR), the code works again: ` tools.phabbot@tools-bastion-15:~$ toolforge jobs run newwikis --image python...
[12:03:10] Phabricator maintenance bot: Maintenance_bot no longer runs new_wiki_handler job - https://phabricator.wikimedia.org/T415028#11536945 (Urbanecm) Open→Resolved ` tootools.phabbot@tools-bastion-15:~/phabbot$ toolforge jobs load jobs.yaml...
[12:39:29] Phabricator maintenance bot: Maintenance_bot no longer runs new_wiki_handler job - https://phabricator.wikimedia.org/T415028#11537059 (A_smart_kitten) FWIW my understanding (based on [[https://wm-bot.wmcloud.org/browser/index.php?start=09%2F14%2F2025&end=09%2F15%2F2025&display=%23wikimedia-tech|this IRC...
[12:52:21] hi, is someone able to merge and deploy https://gerrit.wikimedia.org/r/c/integration/config/+/1228566 please?
[13:23:05] Release-Engineering-Team, Scap, ServiceOps new: scap --unlock-all asks for confirmation even with --bg flag - https://phabricator.wikimedia.org/T415062#11537262 (Clement_Goubert) a:dancy
[13:23:51] Release-Engineering-Team, Scap, ServiceOps new: scap --unlock-all asks for confirmation even with --bg flag - https://phabricator.wikimedia.org/T415062#11537267 (Clement_Goubert)
[13:24:02] Release-Engineering-Team, Scap, ServiceOps new, ServiceOps-good-first-task, and 3 others: Add scap lock/unlock steps to sre.switchdc.mediawiki cookbook - https://phabricator.wikimedia.org/T330996#11537269 (Clement_Goubert)
[13:32:21] Phabricator maintenance bot: Maintenance_bot no longer runs new_wiki_handler job - https://phabricator.wikimedia.org/T415028#11537290 (Ladsgroup) Yup
[15:10:20] (PS2) Prashant_32194: fresh-install: fix fresh-node24 checksum and install validation [fresh] - https://gerrit.wikimedia.org/r/1227819 (https://phabricator.wikimedia.org/T414756)
[15:51:04] (merge) dancy: Renamed mpic repository to test-kitchen [repos/releng/gitlab-trusted-runner] - https://gitlab.wikimedia.org/repos/releng/gitlab-trusted-runner/-/merge_requests/139 (https://phabricator.wikimedia.org/T407808) (owner: sfaci)
[16:02:44] Release-Engineering-Team (Radar), Ceph, ServiceOps new, SRE-swift-storage, and 2 others: Move the docker registry's /restricted prefix to Docker Distribution backed up by Ceph - https://phabricator.wikimedia.org/T412951#11538047 (dancy) Thanks for the report @elukey. This sounds very promising!
[16:09:21] Beta-Cluster-Infrastructure, GitLab, m3api: Unblock IPs for Beta Cluster access - https://phabricator.wikimedia.org/T414864#11538068 (dancy) >>! In T414864#11532038, @LucasWerkmeister wrote: > FWIW, I’ve tried to get m3api-oauth2 CI running on the WMCS runners instead (`wmcs` tag), but so far haven’t...
[16:19:30] Beta-Cluster-Infrastructure, GitLab, m3api: Unblock IPs for Beta Cluster access - https://phabricator.wikimedia.org/T414864#11538114 (dancy) >>! In T414864#11535504, @bd808 wrote: > The runners you would like unblocked are hosted on Digital Ocean. I do not think that it would be reasonable to open th...
[16:46:57] (open) dancy: scap lock: Add --yes flag for --unlock-all mode [repos/releng/scap] - https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/1069 (https://phabricator.wikimedia.org/T330996 https://phabricator.wikimedia.org/T415062)
[16:46:58] (update) dancy: scap lock: Add --yes flag for --unlock-all mode [repos/releng/scap] - https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/1069 (https://phabricator.wikimedia.org/T330996 https://phabricator.wikimedia.org/T415062)
[16:49:18] Beta-Cluster-Infrastructure, GitLab, m3api: Unblock IPs for Beta Cluster access - https://phabricator.wikimedia.org/T414864#11538312 (bd808) >>! In T414864#11538114, @dancy wrote: > For this to be feasible we would need to set up an egress gateway node on the D.O. side and use the //DOKS routing agen...
[16:50:23] (merge) dancy: scap lock: Add --yes flag for --unlock-all mode [repos/releng/scap] - https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/1069 (https://phabricator.wikimedia.org/T330996 https://phabricator.wikimedia.org/T415062)
[16:50:54] (open) dancy: Release 4.233.0 [repos/releng/scap] - https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/1070
[16:53:28] (merge) dancy: Release 4.233.0 [repos/releng/scap] - https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/1070
[16:55:06] Release-Engineering-Team (Radar), Ceph, ServiceOps new, SRE-swift-storage, and 3 others: Move the docker registry's /restricted prefix to Docker Distribution backed up by Ceph - https://phabricator.wikimedia.org/T412951#11538348 (MatthewVernon) One further note - cluster-wide metrics on sync dela...
[17:01:55] Release-Engineering-Team, Scap, ServiceOps new, Patch-For-Review: scap --unlock-all asks for confirmation even with --bg flag - https://phabricator.wikimedia.org/T415062#11538372 (dancy) @Clement_Goubert Please use `scap lock --unlock-all --yes ` (note: `--bg` is not passed here).
[17:02:08] (CR) Hashar: [C:+2] Zuul: [mediawiki/extensions/CampaignEvents] Add Wikibase phan dependency [integration/config] - https://gerrit.wikimedia.org/r/1228595 (https://phabricator.wikimedia.org/T411829) (owner: Daimona Eaytoy)
[17:02:17] (CR) Hashar: [C:+2] Zuul: [mediawiki/extensions/ContentStabilization] Add BlueSpiceSmartList [integration/config] - https://gerrit.wikimedia.org/r/1228475 (owner: Hslater)
[17:03:38] (Merged) jenkins-bot: Zuul: [mediawiki/extensions/CampaignEvents] Add Wikibase phan dependency [integration/config] - https://gerrit.wikimedia.org/r/1228595 (https://phabricator.wikimedia.org/T411829) (owner: Daimona Eaytoy)
[17:03:50] (Merged) jenkins-bot: Zuul: [mediawiki/extensions/ContentStabilization] Add BlueSpiceSmartList [integration/config] - https://gerrit.wikimedia.org/r/1228475 (owner: Hslater)
[17:04:59] CI seem to be ignoring `recheck`s on https://gerrit.wikimedia.org/r/c/mediawiki/core/+/1228173/comments/09389706_8ca7f270 ...I feel like I must be missing something about why they aren't working, but I'm not sure what.
[17:05:14] (PS2) Hashar: Zuul: Add IPReputation as dependency for IPInfo [integration/config] - https://gerrit.wikimedia.org/r/1228566 (https://phabricator.wikimedia.org/T410618) (owner: Kosta Harlan)
[17:05:33] (CR) Hashar: [C:+2] Zuul: Add IPReputation as dependency for IPInfo [integration/config] - https://gerrit.wikimedia.org/r/1228566 (https://phabricator.wikimedia.org/T410618) (owner: Kosta Harlan)
[17:06:59] (Merged) jenkins-bot: Zuul: Add IPReputation as dependency for IPInfo [integration/config] - https://gerrit.wikimedia.org/r/1228566 (https://phabricator.wikimedia.org/T410618) (owner: Kosta Harlan)
[17:07:30] (PS1) Unorthodox: Test: My first Gerrit contribution [integration/config] - https://gerrit.wikimedia.org/r/1229156
[17:07:39] (update) dancy: reggie: set safe-to-evict: "false" [repos/releng/gitlab-cloud-runner] - https://gitlab.wikimedia.org/repos/releng/gitlab-cloud-runner/-/merge_requests/512 (https://phabricator.wikimedia.org/T406733) (owner: jelto)
[17:07:40] (close) dancy: reggie: set safe-to-evict: "false" [repos/releng/gitlab-cloud-runner] - https://gitlab.wikimedia.org/repos/releng/gitlab-cloud-runner/-/merge_requests/512 (https://phabricator.wikimedia.org/T406733) (owner: jelto)
[17:09:11] (open) dancy: prod.tfvars: Reduce prometheus_retention to 90 days [repos/releng/gitlab-cloud-runner] - https://gitlab.wikimedia.org/repos/releng/gitlab-cloud-runner/-/merge_requests/547
[17:09:13] (update) dancy: prod.tfvars: Reduce prometheus_retention to 90 days [repos/releng/gitlab-cloud-runner] - https://gitlab.wikimedia.org/repos/releng/gitlab-cloud-runner/-/merge_requests/547
[17:11:19] (merge) dancy: prod.tfvars: Reduce prometheus_retention to 90 days [repos/releng/gitlab-cloud-runner] - https://gitlab.wikimedia.org/repos/releng/gitlab-cloud-runner/-/merge_requests/547
[17:19:43] (re my last message) okayy it seems like CI maybe didn't like that another comment was submitted at the same time as the `recheck` comment. i commented `recheck` again but with that as the only comment, and it seems like it's been picked up by CI this time
[17:27:13] (CR) Daimona Eaytoy: "Hmmm this doesn't seem to be working for some reason: https://integration.wikimedia.org/ci/job/mwext-phan-php83/12487/console" [integration/config] - https://gerrit.wikimedia.org/r/1228595 (https://phabricator.wikimedia.org/T411829) (owner: Daimona Eaytoy)
[17:40:03] maintenance-disconnect-full-disks build 773895 integration-agent-docker-1056 (/: 36%, /srv: 99%, /var/lib/docker: 36%): OFFLINE due to disk space
[17:45:03] maintenance-disconnect-full-disks build 773896 integration-agent-docker-1056 (/: 36%, /srv: 32%, /var/lib/docker: 35%): RECOVERY disk space OK
[18:02:59] (PS2) Hashar: Add --shell to start an user shell [integration/quibble] - https://gerrit.wikimedia.org/r/1229064
[18:03:35] (CR) Hashar: [C:+2] "Deployed" [integration/config] - https://gerrit.wikimedia.org/r/1228475 (owner: Hslater)
[18:03:38] (CR) Hashar: [C:+2] "Deployed" [integration/config] - https://gerrit.wikimedia.org/r/1228566 (https://phabricator.wikimedia.org/T410618) (owner: Kosta Harlan)
[18:03:41] (CR) Hashar: [C:+2] "Deployed" [integration/config] - https://gerrit.wikimedia.org/r/1228595 (https://phabricator.wikimedia.org/T411829) (owner: Daimona Eaytoy)
[18:05:42] (CR) CI reject: [V:-1] Add --shell to start an user shell [integration/quibble] - https://gerrit.wikimedia.org/r/1229064 (owner: Hashar)
[18:07:46] (Abandoned) Hashar: Test: My first Gerrit contribution [integration/config] - https://gerrit.wikimedia.org/r/1229156 (owner: Unorthodox)
[18:08:44] A_smart_kitten gotta be one word (no other sentence) I think
[18:09:01] and it must be its own comment I think (not a reply?)
[18:09:29] paladox: IIRC other words are okay since https://gerrit.wikimedia.org/r/c/integration/config/+/487037
[18:10:10] & FWIW the reply at https://gerrit.wikimedia.org/r/c/mediawiki/core/+/1228173/comments/09389706_8ca7f270 seemed to work
[18:11:05] my working theory is that the regex at https://gerrit.wikimedia.org/g/integration/config/+/bba9993cb7f935c294534de168c380a556c89647/zuul/layout.yaml#610 doesn't (always?) match when multiple comments are submitted at once
[18:13:49] I would try and test out if the regex could be changed to support when multiple comments are left at once, but I'm not sure how to find an example of the input string that the regex is being tested against
[18:31:48] (PS3) Hashar: Add --shell to start an user shell [integration/quibble] - https://gerrit.wikimedia.org/r/1229064
[18:50:02] !log Unblock 152.7.0.0/16 (T415100)
[18:50:03] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[18:50:11] (CR) CI reject: [V:-1] Add --shell to start an user shell [integration/quibble] - https://gerrit.wikimedia.org/r/1229064 (owner: Hashar)
[19:05:36] !log Rebooted deployment-cache-text08 to see if the mystery haproxy startup failure would go away (T415100)
[19:05:39] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[19:07:19] haproxy is dead on both deployment-cache-* nodes and I am not seeing why yet. This means that all of Beta Cluster is functionally down with the CDN edge not starting.
[19:08:43] Beta-Cluster-Infrastructure, GitLab, m3api: Unblock running tests against Beta Cluster from Digital Ocean GitLab CI runners - https://phabricator.wikimedia.org/T414864#11538924 (bd808)
[19:11:12] Beta-Cluster-Infrastructure: Puppet agent failure detected on instance deployment-puppetserver-1 in project deployment-prep - https://phabricator.wikimedia.org/T414934#11538935 (bd808) Open→Invalid Already resolved. `lang=shell-session bd808@deployment-puppetserver-1.deployment-prep.eqiad1:~$ sud...
[19:17:45] Beta-Cluster-Infrastructure: Project deployment-prep instance deployment-sessionstore06 is down - https://phabricator.wikimedia.org/T415021#11538946 (bd808) p:Triage→Medium Looks to be a duplicate of the behavior from {T412774}. The instance is up, but something spiked it's load to a point where Prom...
[19:24:31] Beta-Cluster-Infrastructure, Traffic: HAProxy failing to start on deployment-cache-text08 and deployment-cache-upload08 - https://phabricator.wikimedia.org/T415113 (bd808) NEW
[19:24:43] Beta-Cluster-Infrastructure, Traffic: HAProxy failing to start on deployment-cache-text08 and deployment-cache-upload08 - https://phabricator.wikimedia.org/T415113#11538979 (bd808) p:Triage→High
[19:33:46] Beta Cluster's CDN is down (T415113). I need to eat before my brain can think of anything new to try to get it back up. Help is very welcome.
[19:33:47] T415113: HAProxy failing to start on deployment-cache-text08 and deployment-cache-upload08 - https://phabricator.wikimedia.org/T415113
[19:53:28] FIRING: PuppetAgentFailure: Puppet agent failure detected on instance deployment-cache-text08 in project deployment-prep - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentFailure
[19:53:36] Beta-Cluster-Infrastructure: Puppet agent failure detected on instance deployment-cache-text08 in project deployment-prep - https://phabricator.wikimedia.org/T415115 (wmcs-alerts) NEW
[19:54:01] Release-Engineering-Team (Radar), Ceph, ServiceOps new, SRE-swift-storage, and 3 others: Move the docker registry's /restricted prefix to Docker Distribution backed up by Ceph - https://phabricator.wikimedia.org/T412951#11539048 (elukey) >>! In T412951#11538047, @dancy wrote: > Thanks for the rep...
[19:54:37] Beta-Cluster-Infrastructure, Traffic: HAProxy failing to start on deployment-cache-text08 and deployment-cache-upload08 - https://phabricator.wikimedia.org/T415113#11539051 (ssingh) ` Jan 20 16:51:40 deployment-cache-text08 systemd[1]: haproxy.service: Control process exited, code=exited, status=1/FAILUR...
[19:55:25] Beta-Cluster-Infrastructure, Traffic: HAProxy failing to start on deployment-cache-text08 and deployment-cache-upload08 - https://phabricator.wikimedia.org/T415113#11539054 (ssingh) ` sukhe@deployment-cache-text08:~$ sudo haproxy -f /etc/haproxy/conf.d/tls.cfg [NOTICE] (17847) : haproxy version is 2.8...
[19:58:28] FIRING: [2x] PuppetAgentFailure: Puppet agent failure detected on instance deployment-cache-text08 in project deployment-prep - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentFailure
[20:19:19] Beta-Cluster-Infrastructure, Traffic: HAProxy failing to start on deployment-cache-text08 and deployment-cache-upload08 - https://phabricator.wikimedia.org/T415113#11539242 (ssingh) @SLyngshede-WMF: See if you can find time to look into this when you come online, or I will tomorrow. Thanks!
[20:25:30] Beta-Cluster-Infrastructure, Traffic: HAProxy failing to start on deployment-cache-text08 and deployment-cache-upload08 - https://phabricator.wikimedia.org/T415113#11539251 (bd808) Running the config check with all of the config files gives a different error: `lang=shell-session,counterexample bd808@depl...
[20:28:33] Beta-Cluster-Infrastructure, Traffic: HAProxy failing to start on deployment-cache-text08 and deployment-cache-upload08 - https://phabricator.wikimedia.org/T415113#11539257 (bd808) >>! In T415113#11539251, @bd808 wrote: > I wonder if that `lua.check_traffic_class` method is coming from a private location...
[20:29:18] Beta-Cluster-Infrastructure, Traffic: HAProxy failing to start on deployment-cache-text08 and deployment-cache-upload08 because of missing `traffic_class.lua` library - https://phabricator.wikimedia.org/T415113#11539262 (bd808)
[20:38:13] !log Cherry picked https://gerrit.wikimedia.org/r/c/operations/puppet/+/1229186 (T415113)
[20:38:15] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[20:38:15] T415113: HAProxy failing to start on deployment-cache-text08 and deployment-cache-upload08 because of missing `traffic_class.lua` library - https://phabricator.wikimedia.org/T415113
[20:38:33] Continuous-Integration-Config, Epic, PHP 8.5 support: Make PHP 8.5 voting on development (master) branch of MW ecosystem (core, vendor, extensions, skins, libraries) in CI - https://phabricator.wikimedia.org/T411814#11539287 (Umherirrender)
[20:40:37] Beta-Cluster-Infrastructure, Traffic, Patch-For-Review: HAProxy failing to start on deployment-cache-text08 and deployment-cache-upload08 because of missing `traffic_class.lua` library - https://phabricator.wikimedia.org/T415113#11539294 (bd808) `lang=shell-session bd808@deployment-cache-text08.deplo...
[20:44:25] Beta-Cluster-Infrastructure, Traffic, Patch-For-Review: HAProxy failing to start on deployment-cache-text08 and deployment-cache-upload08 because of missing `traffic_class.lua` library - https://phabricator.wikimedia.org/T415113#11539309 (bd808) Open→In progress a:bd808 Cherry-pick has th...
[20:53:28] FIRING: [2x] PuppetAgentFailure: Puppet agent failure detected on instance deployment-cache-text08 in project deployment-prep - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentFailure
[20:53:36] Beta-Cluster-Infrastructure: Puppet agent failure detected on instance deployment-cache-upload08 in project deployment-prep - https://phabricator.wikimedia.org/T415133 (wmcs-alerts) NEW
[21:01:02] Beta-Cluster-Infrastructure, Traffic, Patch-For-Review: HAProxy failing to start on deployment-cache-text08 and deployment-cache-upload08 because of missing `traffic_class.lua` library - https://phabricator.wikimedia.org/T415113#11539426 (bd808)
[21:01:05] Beta-Cluster-Infrastructure, Traffic: Puppet agent failure detected on instance deployment-cache-text08 in project deployment-prep - https://phabricator.wikimedia.org/T415115#11539430 (bd808) →Duplicate dup:T415113
[21:01:16] Beta-Cluster-Infrastructure: Puppet agent failure detected on instance deployment-cache-upload08 in project deployment-prep - https://phabricator.wikimedia.org/T415133#11539432 (bd808) →Duplicate dup:T415113
[21:11:28] RESOLVED: [2x] PuppetAgentFailure: Puppet agent failure detected on instance deployment-cache-text08 in project deployment-prep - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentFailure
[21:15:58] Release-Engineering-Team (Radar), Ceph, ServiceOps new, SRE-swift-storage, and 3 others: Move the docker registry's /restricted prefix to Docker Distribution backed up by Ceph - https://phabricator.wikimedia.org/T412951#11539487 (dancy) >>! In T412951#11539048, @elukey wrote: > Thanks! When you h...
[21:18:59] Beta-Cluster-Infrastructure: Project deployment-prep instance deployment-sessionstore06 is down - https://phabricator.wikimedia.org/T415021#11539495 (bd808) `sudo journalctl --since "2026-01-20 10:05:00" --until "2026-01-20 10:15:00"` turned up the kernel oom-killer going after cassandra in the time range wh...
[21:57:27] Release-Engineering-Team (Priority Backlog 📥), Essential-Work, Release, Train Deployments: 1.46.0-wmf.12 deployment blockers - https://phabricator.wikimedia.org/T413803#11539595 (matmarex) Thanks for catching that GlobalWatchlist bug @IKhitron. As far as I can see, the bug is that, when using the...
[22:01:30] Release-Engineering-Team (Radar), Ceph, ServiceOps new, SRE-swift-storage, and 3 others: Move the docker registry's /restricted prefix to Docker Distribution backed up by Ceph - https://phabricator.wikimedia.org/T412951#11539601 (Scott_French) Thank you very much @elukey - that's great news! +1...
[22:17:59] Release-Engineering-Team (Priority Backlog 📥), Essential-Work, Release, Train Deployments: 1.46.0-wmf.12 deployment blockers - https://phabricator.wikimedia.org/T413803#11539622 (IKhitron) >>! In T413803#11539595, @matmarex wrote: > Thanks for catching that GlobalWatchlist bug @IKhitron. As far a...
[22:25:21] Release-Engineering-Team (Priority Backlog 📥), Essential-Work, Release, Train Deployments: 1.46.0-wmf.12 deployment blockers - https://phabricator.wikimedia.org/T413803#11539642 (bd808) >>! In T413803#11539622, @IKhitron wrote: > And the Beta wikis are down now, there is just not enough time. Th...
[22:25:45] Release-Engineering-Team: 500 error "attempt to write a readonly database" on releng-data.wmcloud.org - https://phabricator.wikimedia.org/T415139 (vaughnwalters) NEW
[22:56:33] Phabricator: Update to Phorge/Arcanist upstream (2026.xx) - https://phabricator.wikimedia.org/T410849#11539717 (Aklapper)
[22:56:37] Phabricator (Upstream), Release-Engineering-Team (Doing 😎), Upstream: "Call to phutil_nonempty_string() expected null or a string, got: bool" rendering alt text of image file - https://phabricator.wikimedia.org/T352170#11539718 (Aklapper)
[23:02:20] Release-Engineering-Team: 500 error "attempt to write a readonly database" on releng-data.wmcloud.org - https://phabricator.wikimedia.org/T415139#11539727 (thcipriani) Crawled to death. Added some blocks. Should be back!
[23:02:38] Release-Engineering-Team (Doing 😎), Essential-Work: 500 error "attempt to write a readonly database" on releng-data.wmcloud.org - https://phabricator.wikimedia.org/T415139#11539728 (thcipriani) Open→Resolved a:thcipriani
[23:04:44] Release-Engineering-Team (Doing 😎), Essential-Work: 500 error "attempt to write a readonly database" on releng-data.wmcloud.org - https://phabricator.wikimedia.org/T415139#11539734 (SLong-WMF) So was the root cause here that external crawlers were causing an overload of our ability to handle traffic,...