[00:06:21] FIRING: [2x] PrometheusK8sCertExpirySoon: Prometheus k8s certificate is about to expire - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/PrometheusK8sCertExpirySoon - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPrometheusK8sCertExpirySoon
[00:55:15] Tool-humaniki-2: More intuitive charts for languages stats - https://phabricator.wikimedia.org/T425297#11885069 (Bugreporter2) @Danya - can you post screenshots?
[06:27:20] (open) samwilson: Run CI on WMCS [toolforge-repos/ocr] - https://gitlab.wikimedia.org/toolforge-repos/ocr/-/merge_requests/5
[06:34:06] (merge) samwilson: Run CI on WMCS [toolforge-repos/ocr] - https://gitlab.wikimedia.org/toolforge-repos/ocr/-/merge_requests/5
[06:41:38] Tool-wmf-openapi-linter, MediaWiki-REST-API, MW-Interfaces-Team (MWI-Sprint-32 (2026-04-21 to 2026-05-05)), OKR-Work, Patch-For-Review: Add linting rules for operations and paths - https://phabricator.wikimedia.org/T422504#11885218 (KBach) @KineticPelagic @AGhirelli-WMF thank you for bringing...
[06:55:17] cloud-services-team, Toolforge, Patch-For-Review: [components-api] Queue builds when the build queue is full - https://phabricator.wikimedia.org/T402568#11885224 (DamianZaremba) In the last few weeks this has caused an outage on ClueBot NG twice (as dependency updates happen). Latest was from this m...
[06:58:52] cloud-services-team, Cloud-VPS, Community-Tech, Wikimedia OCR, Essential-Work: [Cloud VPS alert][wikisource] Puppet failure on kraken-ocr.wikisource.eqiad1.wikimedia.cloud - https://phabricator.wikimedia.org/T424818#11885226 (Samwilson) I also can't log in. I guess SSH keys aren't being copie...
[07:05:21] (merge) samwilson: Remove the Tesseract frk model (it has been superseded by deu_latf) [toolforge-repos/ocr] - https://gitlab.wikimedia.org/toolforge-repos/ocr/-/merge_requests/4 (owner: sweil)
[07:06:06] cloud-services-team, Cloud-VPS, Community-Tech, Wikimedia OCR, Essential-Work: [Cloud VPS alert][wikisource] Puppet failure on kraken-ocr.wikisource.eqiad1.wikimedia.cloud - https://phabricator.wikimedia.org/T424818#11885243 (sweil) A recreate with Debian trixie would be fine for me. Maybe th...
[07:27:39] !log tools.cluebotng Deployment completed: https://github.com/cluebotng/component-configs/actions/runs/25252392662 (https://github.com/cluebotng/component-configs/commits/7352cd4f730ca9f5c276772f0b338230989feef4)
[07:27:42] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.cluebotng/SAL
[07:36:34] RESOLVED: ToolforgeWebHighErrorRate: High 5xx rate on Toolforge web services #page - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeWebHighErrorRate - https://grafana.wmcloud.org/d/toolforge-k8s-haproxy/infra-k8s-haproxy?var-frontend=k8s-ingress-https&var-backend=k8s-ingress-http&var-cluster=prometheus-toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeWebHighErrorRate
[07:44:33] FIRING: ToolforgeWebHighErrorRate: High 5xx rate on Toolforge web services #page - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeWebHighErrorRate - https://grafana.wmcloud.org/d/toolforge-k8s-haproxy/infra-k8s-haproxy?var-frontend=k8s-ingress-https&var-backend=k8s-ingress-http&var-cluster=prometheus-toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeWebHighErrorRate
[07:50:10] Cloud-Services: Migrate the toolhub's Docker image to Trixie - https://phabricator.wikimedia.org/T425303 (elukey) NEW The #Cloud-Services project tag is not intended to have any tasks. Please check the list on https://phabricator.wikimedia.org/project/profile/832/ and replace it with a more specific proj...
[07:54:33] RESOLVED: ToolforgeWebHighErrorRate: High 5xx rate on Toolforge web services #page - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeWebHighErrorRate - https://grafana.wmcloud.org/d/toolforge-k8s-haproxy/infra-k8s-haproxy?var-frontend=k8s-ingress-https&var-backend=k8s-ingress-http&var-cluster=prometheus-toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeWebHighErrorRate
[08:07:33] FIRING: ToolforgeWebHighErrorRate: High 5xx rate on Toolforge web services #page - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeWebHighErrorRate - https://grafana.wmcloud.org/d/toolforge-k8s-haproxy/infra-k8s-haproxy?var-frontend=k8s-ingress-https&var-backend=k8s-ingress-http&var-cluster=prometheus-toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeWebHighErrorRate
[08:21:32] FIRING: InstanceDown: Project cloudinfra instance mx-out05 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[08:21:32] FIRING: TargetDown: Job mtail-mx-out is unreachable in project cloudinfra instance mx-out05 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTargetDown
[08:31:32] RESOLVED: InstanceDown: Project cloudinfra instance mx-out05 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[08:31:32] RESOLVED: TargetDown: Job mtail-mx-out is unreachable in project cloudinfra instance mx-out05 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTargetDown
[08:52:31] cloud-services-team, Toolforge: toolforge webservice restart does not wait for pod to be ready, only running - https://phabricator.wikimedia.org/T425215#11885769 (taavi) IIRC the health checks were added much later compared to the current restart polling code, so not exactly intentional at least.
[09:03:31] cloud-services-team, Cloud-VPS, Community-Tech, Wikimedia OCR, Essential-Work: [Cloud VPS alert][wikisource] Puppet failure on kraken-ocr.wikisource.eqiad1.wikimedia.cloud - https://phabricator.wikimedia.org/T424818#11885867 (taavi) Open→Resolved a:taavi The machine was in the...
[10:21:16] Tool-wsindex, Wikisource Reader App (Android): Display extra information from Wikidata in Book Details Screen - https://phabricator.wikimedia.org/T406245#11886151 (Bodhisattwa)
[10:21:28] Tool-wsindex, Wikisource Reader App (Android): Display extra information from Wikidata in Book Details Screen - https://phabricator.wikimedia.org/T406245#11886152 (Bodhisattwa) Open→Resolved
[10:51:56] FIRING: [2x] SystemdUnitDown: The service unit backup_vms.service is in failed status on host cloudbackup1004. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudbackup1004 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[11:46:31] cloud-services-team, Cloud-VPS, Infrastructure-Foundations, Puppet-Core: Add more rspec test to the puppet code - https://phabricator.wikimedia.org/T289668#11886378 (MoritzMuehlenhoff) Open→Declined We have plenty of spec tests, so this umbrella task feels a little too broad. If speci...
[12:20:05] (CR) CI reject: [V:-1] Localisation updates from https://translatewiki.net. [labs/tools/commons-mass-description] - https://gerrit.wikimedia.org/r/1282331 (owner: L10n-bot)
[12:31:52] Tool-curator: Test Bun for Toolforge container using cookiecutter template - https://phabricator.wikimedia.org/T425350 (DaxServer) NEW
[12:36:23] Tool-wmf-openapi-linter, MediaWiki-REST-API, MW-Interfaces-Team (MWI-Sprint-32 (2026-04-21 to 2026-05-05)), OKR-Work, Patch-For-Review: Add linting rules for operations and paths - https://phabricator.wikimedia.org/T422504#11886503 (AGhirelli-WMF) Thank you so much @KBach !
[12:37:10] Tool-curator: Test Bun for Toolforge container using cookiecutter template - https://phabricator.wikimedia.org/T425350#11886504 (DaxServer) Open→In progress p:Triage→Medium a:DaxServer
[12:37:36] (approved) aghirelli: Add rules for operations and paths (T422504) [toolforge-repos/wmf-openapi-linter] - https://gitlab.wikimedia.org/toolforge-repos/wmf-openapi-linter/-/merge_requests/17 (owner: kineticpelagic)
[12:38:01] (merge) aghirelli: Add rules for operations and paths (T422504) [toolforge-repos/wmf-openapi-linter] - https://gitlab.wikimedia.org/toolforge-repos/wmf-openapi-linter/-/merge_requests/17 (owner: kineticpelagic)
[12:38:40] Tool-wmf-openapi-linter, MediaWiki-REST-API, MW-Interfaces-Team (MWI-Sprint-32 (2026-04-21 to 2026-05-05)), OKR-Work, Patch-For-Review: Add linting rules for operations and paths - https://phabricator.wikimedia.org/T422504#11886512 (AGhirelli-WMF) In progress→Resolved
[12:46:56] FIRING: [2x] SystemdUnitDown: The systemd unit backup_vms.service on node cloudbackup1004 has been failing for more than two hours. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudbackup1004 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[13:21:56] FIRING: [2x] SystemdUnitDown: The systemd unit backup_vms.service on node cloudbackup1004 has been failing for more than two hours. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudbackup1004 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[13:42:14] RESOLVED: SystemdUnitDown: The systemd unit backup_vms.service on node cloudbackup1004 has been failing for more than two hours. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudbackup1004 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[13:46:56] RESOLVED: [2x] SystemdUnitDown: The service unit backup_vms.service is in failed status on host cloudbackup1004. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudbackup1004 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[14:29:08] cloud-services-team, Toolforge: Make toolsbeta paging alerts less confusing - https://phabricator.wikimedia.org/T424814#11886876 (taavi)
[14:36:41] cloud-services-team, Toolforge, Infrastructure-Foundations, SRE: Adjust WMCS Gitlab CI/CD repo to stop using mirrors.wikimedia.org - https://phabricator.wikimedia.org/T423596#11886916 (MoritzMuehlenhoff) p:Triage→Medium
[14:48:04] cloud-services-team, Puppet-Infrastructure: Prevent catalog breakage on cloud instances by decoupling core cloud puppetmaster from custom puppetmasters - https://phabricator.wikimedia.org/T227029#11886967 (LSobanski) Untagging IF, please reach out if our input is needed.
[14:50:27] cloud-services-team, Infrastructure Security: ops/puppet: generalize systemd resource control for users - https://phabricator.wikimedia.org/T215401#11886970 (LSobanski)
[15:48:37] Tool-wikicordo: Self-nomination filter for deletion requests in Wikicordo - https://phabricator.wikimedia.org/T425361 (Josve05a) NEW
[15:49:30] Tool-wmf-openapi-linter, MediaWiki-REST-API, MW-Interfaces-Team (MWI-Sprint-32 (2026-04-21 to 2026-05-05)), OKR-Work: Add linting rules for operations and paths - https://phabricator.wikimedia.org/T422504#11887198 (KineticPelagic)
[16:12:09] Tool-wmf-openapi-linter, MediaWiki-REST-API, MW-Interfaces-Team: linter: Refactor test suite to use helper spec builders instead of inline YAML strings - https://phabricator.wikimedia.org/T425068#11887300 (KineticPelagic)
[17:12:05] Tool-wikicordo: Self-nomination filter for deletion requests in Wikicordo - https://phabricator.wikimedia.org/T425361#11887421 (Josve05a) Example case: https://commons.wikimedia.org/wiki/Commons:Deletion_requests/File:Lee_Teng-Hui_signature.png
[18:54:38] (open) don-vip: Draft: Add NASA GAPE [toolforge-repos/spacemedia] - https://gitlab.wikimedia.org/toolforge-repos/spacemedia/-/merge_requests/33
[19:03:14] (approved) aaron: feat(linter): unify and improve example validation rules (T423411) [toolforge-repos/wmf-openapi-linter] - https://gitlab.wikimedia.org/toolforge-repos/wmf-openapi-linter/-/merge_requests/18 (owner: aghirelli)
[19:08:10] RECOVERY - all nova flavors are assigned necessary properties on cloudcontrol1007 is OK: All flavors are assigned to aggregates https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Host_aggregates
[20:01:43] (update) aghirelli: feat(linter): unify and improve example validation rules (T423411) [toolforge-repos/wmf-openapi-linter] - https://gitlab.wikimedia.org/toolforge-repos/wmf-openapi-linter/-/merge_requests/18
[20:09:49] (update) aghirelli: feat(linter): unify and improve example validation rules (T423411) [toolforge-repos/wmf-openapi-linter] - https://gitlab.wikimedia.org/toolforge-repos/wmf-openapi-linter/-/merge_requests/18
[20:11:55] (update) aghirelli: feat(linter): unify and improve example validation rules (T423411) [toolforge-repos/wmf-openapi-linter] - https://gitlab.wikimedia.org/toolforge-repos/wmf-openapi-linter/-/merge_requests/18
[20:16:20] (merge) aghirelli: feat(linter): unify and improve example validation rules (T423411) [toolforge-repos/wmf-openapi-linter] - https://gitlab.wikimedia.org/toolforge-repos/wmf-openapi-linter/-/merge_requests/18
[20:24:49] (update) danyya: Draft: Split views urls [toolforge-repos/humaniki] - https://gitlab.wikimedia.org/toolforge-repos/humaniki/-/merge_requests/32
[20:27:39] (update) danyya: Split views urls [toolforge-repos/humaniki] - https://gitlab.wikimedia.org/toolforge-repos/humaniki/-/merge_requests/32
[20:27:42] (merge) danyya: Split views urls [toolforge-repos/humaniki] - https://gitlab.wikimedia.org/toolforge-repos/humaniki/-/merge_requests/32
[20:33:22] (open) danyya: Draft: Home page [toolforge-repos/humaniki] - https://gitlab.wikimedia.org/toolforge-repos/humaniki/-/merge_requests/33
[20:46:13] (update) danyya: Draft: Home page [toolforge-repos/humaniki] - https://gitlab.wikimedia.org/toolforge-repos/humaniki/-/merge_requests/33
[20:58:18] Tool-wmf-openapi-linter, MediaWiki-REST-API, MW-Interfaces-Team (MWI-Sprint-32 (2026-04-21 to 2026-05-05)), OKR-Work: Add the remaining linting rules - https://phabricator.wikimedia.org/T422600#11887863 (KineticPelagic) Open→In progress
[22:28:01] (update) ejegg: Add basic undo functionality [toolforge-repos/centralnotice-banner-editor] - https://gitlab.wikimedia.org/toolforge-repos/centralnotice-banner-editor/-/merge_requests/61 (https://phabricator.wikimedia.org/T421061)
[22:31:34] Tool-centralnotice-banner-editor, Patch-For-Review: Add undo functionality to the editor - https://phabricator.wikimedia.org/T421061#11887980 (Ejegg) OK, I've added another commit to the MR adding a redo button. It seems to work well in basic testing. After making multiple changes then undoing them, you...