Wikimedia IRC logs browser - #wikimedia-operations

Displaying 1104 items:

2026-02-24 00:00:05 <jouncebot> Deploy window Web Team deployment window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260224T0000)
2026-02-24 00:03:15 <jinxer-wm> FIRING: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-web releases routed via main at codfw: 11.15% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=codfw%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy
2026-02-24 00:06:15 <jinxer-wm> FIRING: MediaWikiLatencyExceeded: p75 latency high: codfw mw-web releases routed via main (k8s) 897.1ms - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=codfw%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
2026-02-24 00:16:15 <jinxer-wm> RESOLVED: MediaWikiLatencyExceeded: p75 latency high: codfw mw-web releases routed via main (k8s) 862.4ms - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=codfw%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
2026-02-24 00:18:52 <wikibugs> (PS1) Pppery: Generate our own logo thumbnails rather than using MediaWiki's [mediawiki-config] - https://gerrit.wikimedia.org/r/1242542 (https://phabricator.wikimedia.org/T414048)
2026-02-24 00:19:48 <wikibugs> (CR) CI reject: [V: -1] Generate our own logo thumbnails rather than using MediaWiki's [mediawiki-config] - https://gerrit.wikimedia.org/r/1242542 (https://phabricator.wikimedia.org/T414048) (owner: Pppery)
2026-02-24 00:19:58 <wikibugs> (PS2) Pppery: Generate our own logo thumbnails rather than using MediaWiki's [mediawiki-config] - https://gerrit.wikimedia.org/r/1242542 (https://phabricator.wikimedia.org/T414048)
2026-02-24 00:20:47 <wikibugs> (PS3) Pppery: Generate our own logo thumbnails rather than using MediaWiki's [mediawiki-config] - https://gerrit.wikimedia.org/r/1242542 (https://phabricator.wikimedia.org/T414048)
2026-02-24 00:20:49 <wikibugs> (CR) CI reject: [V: -1] Generate our own logo thumbnails rather than using MediaWiki's [mediawiki-config] - https://gerrit.wikimedia.org/r/1242542 (https://phabricator.wikimedia.org/T414048) (owner: Pppery)
2026-02-24 00:25:06 <wikibugs> (CR) Pppery: "Adding some people who looked at https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/1217282 as reviewers for this follow-up." [mediawiki-config] - https://gerrit.wikimedia.org/r/1242542 (https://phabricator.wikimedia.org/T414048) (owner: Pppery)
2026-02-24 00:28:17 <wikibugs> (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Wednesday, February 25 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#dep" [mediawiki-config] - https://gerrit.wikimedia.org/r/1233674 (https://phabricator.wikimedia.org/T413951) (owner: STran)
2026-02-24 00:29:17 <wikibugs> (CR) STran: [C: +1] IPInfo: Grant ipinfo-view-arbitrary-ip to checkuser group [mediawiki-config] - https://gerrit.wikimedia.org/r/1242424 (https://phabricator.wikimedia.org/T374718) (owner: Kosta Harlan)
2026-02-24 00:36:23 <jinxer-wm> FIRING: GnmiTargetDown: asw1-22-ulsfo is unreachable through gNMI - https://wikitech.wikimedia.org/wiki/Network_telemetry#Troubleshooting - https://grafana.wikimedia.org/d/eab73c60-a402-4f9b-a4a7-ea489b374458/gnmic - https://alerts.wikimedia.org/?q=alertname%3DGnmiTargetDown
2026-02-24 00:38:33 <wikibugs> (PS1) Catrope: Remove workaround for T370517, no longer needed [mediawiki-config] - https://gerrit.wikimedia.org/r/1242543 (https://phabricator.wikimedia.org/T370517)
2026-02-24 00:39:01 <wikibugs> (PS1) TrainBranchBot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1242544
2026-02-24 00:39:01 <wikibugs> (CR) TrainBranchBot: [C: +2] Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1242544 (owner: TrainBranchBot)
2026-02-24 00:40:09 <wikibugs> (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Thursday, February 26 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal" [mediawiki-config] - https://gerrit.wikimedia.org/r/1242543 (https://phabricator.wikimedia.org/T370517) (owner: Catrope)
2026-02-24 00:41:22 <jinxer-wm> RESOLVED: GnmiTargetDown: asw1-22-ulsfo is unreachable through gNMI - https://wikitech.wikimedia.org/wiki/Network_telemetry#Troubleshooting - https://grafana.wikimedia.org/d/eab73c60-a402-4f9b-a4a7-ea489b374458/gnmic - https://alerts.wikimedia.org/?q=alertname%3DGnmiTargetDown
2026-02-24 00:47:03 <wikibugs> (CR) Dzahn: [C: +2] "Jelto, when I tried to deploy this to staging I got this effect where the command line just sits there for a long time until it eventually" [deployment-charts] - https://gerrit.wikimedia.org/r/1240412 (https://phabricator.wikimedia.org/T414098) (owner: Dzahn)
2026-02-24 00:48:15 <jinxer-wm> RESOLVED: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-web releases routed via main at codfw: 21.47% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=codfw%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy
2026-02-24 00:51:00 <wikibugs> (Merged) jenkins-bot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1242544 (owner: TrainBranchBot)
2026-02-24 00:56:38 <wikibugs> (PS2) Aaron Schulz: Copy rest_v1-wikimedia.json to standard-docroot [mediawiki-config] - https://gerrit.wikimedia.org/r/1224228 (https://phabricator.wikimedia.org/T396807)
2026-02-24 01:00:46 <wikibugs> (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Tuesday, February 24 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-" [mediawiki-config] - https://gerrit.wikimedia.org/r/1224228 (https://phabricator.wikimedia.org/T396807) (owner: Aaron Schulz)
2026-02-24 01:02:15 <jinxer-wm> FIRING: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-web releases routed via main at codfw: 24.15% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=codfw%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy
2026-02-24 01:08:52 <wikibugs> (PS1) TrainBranchBot: Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1242559
2026-02-24 01:08:52 <wikibugs> (CR) TrainBranchBot: [C: +2] Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1242559 (owner: TrainBranchBot)
2026-02-24 01:12:15 <jinxer-wm> RESOLVED: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-web releases routed via main at codfw: 22.72% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=codfw%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy
2026-02-24 01:22:15 <jinxer-wm> FIRING: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-web releases routed via main at codfw: 23.08% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=codfw%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy
2026-02-24 01:32:15 <jinxer-wm> RESOLVED: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-web releases routed via main at codfw: 24.11% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=codfw%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy
2026-02-24 01:33:11 <wikibugs> (Merged) jenkins-bot: Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1242559 (owner: TrainBranchBot)
2026-02-24 01:36:40 <jinxer-wm> FIRING: SystemdUnitFailed: send_tile_invalidations.service on maps1011:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
2026-02-24 01:40:40 <wikibugs> (PS1) Matthias Mullie: Squashed diff to master [extensions/ReaderExperiments] (wmf/1.46.0-wmf.16) - https://gerrit.wikimedia.org/r/1242568
2026-02-24 01:41:33 <logmsgbot> !log pt1979@cumin2002 START - Cookbook sre.network.tls for network device asw1-23-ulsfo
2026-02-24 01:41:43 <logmsgbot> !log pt1979@cumin2002 END (FAIL) - Cookbook sre.network.tls (exit_code=99) for network device asw1-23-ulsfo
2026-02-24 01:42:15 <wikibugs> (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Tuesday, February 24 UTC morning backport window](https://wikitech.wikimedia.org/wiki/Deployments#deployc" [extensions/ReaderExperiments] (wmf/1.46.0-wmf.16) - https://gerrit.wikimedia.org/r/1242568 (owner: Matthias Mullie)
2026-02-24 01:50:12 <wikibugs> (PS3) Aaron Schulz: Copy rest_v1-wikimedia.json to standard-docroot [mediawiki-config] - https://gerrit.wikimedia.org/r/1224228 (https://phabricator.wikimedia.org/T418188)
2026-02-24 01:50:42 <wikibugs> (CR) ArielGlenn: [C: +1] "Dunno why gerrit removed my vote. Still valid." [deployment-charts] - https://gerrit.wikimedia.org/r/1239669 (owner: Daniel Kinzler)
2026-02-24 01:50:52 <logmsgbot> !log pt1979@cumin2002 START - Cookbook sre.network.tls for network device asw1-23-ulsfo
2026-02-24 01:50:55 <wikibugs> (PS2) Aaron Schulz: Switch math sandbox specs to plain wikimedia.org [mediawiki-config] - https://gerrit.wikimedia.org/r/1224253 (https://phabricator.wikimedia.org/T418188)
2026-02-24 01:51:02 <logmsgbot> !log pt1979@cumin2002 END (FAIL) - Cookbook sre.network.tls (exit_code=99) for network device asw1-23-ulsfo
2026-02-24 01:52:36 <wikibugs> (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Tuesday, February 24 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-" [mediawiki-config] - https://gerrit.wikimedia.org/r/1224253 (https://phabricator.wikimedia.org/T418188) (owner: Aaron Schulz)
2026-02-24 02:01:14 <wikibugs> (PS1) Aaron Schulz: [DNM] Simplify spec-json-wikimedia route and use meta.wikimedia.org [deployment-charts] - https://gerrit.wikimedia.org/r/1242576 (https://phabricator.wikimedia.org/T418188)
2026-02-24 02:03:15 <jinxer-wm> FIRING: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-web releases routed via main at codfw: 22.15% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=codfw%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy
2026-02-24 02:08:15 <jinxer-wm> RESOLVED: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-web releases routed via main at codfw: 24.89% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=codfw%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy
2026-02-24 02:08:22 <jinxer-wm> FIRING: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
2026-02-24 02:08:50 <wikibugs> (PS1) TrainBranchBot: Branch commit for wmf/1.46.0-wmf.17 [core] (wmf/1.46.0-wmf.17) - https://gerrit.wikimedia.org/r/1242581 (https://phabricator.wikimedia.org/T413808)
2026-02-24 02:08:52 <wikibugs> (CR) TrainBranchBot: [C: +2] Branch commit for wmf/1.46.0-wmf.17 [core] (wmf/1.46.0-wmf.17) - https://gerrit.wikimedia.org/r/1242581 (https://phabricator.wikimedia.org/T413808) (owner: TrainBranchBot)
2026-02-24 02:19:42 <jinxer-wm> FIRING: [2x] CoreRouterInterfaceDown: Core router interface down - pfw1-codfw:reth2 (fasw1-f5 2x25G) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=pfw1-codfw:9804 - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown
2026-02-24 02:22:12 <wikibugs> (Merged) jenkins-bot: Branch commit for wmf/1.46.0-wmf.17 [core] (wmf/1.46.0-wmf.17) - https://gerrit.wikimedia.org/r/1242581 (https://phabricator.wikimedia.org/T413808) (owner: TrainBranchBot)
2026-02-24 02:22:58 <wikibugs> ops-ulsfo, SRE, DC-Ops, Infrastructure-Foundations, netops: ULSFO:Switch refresh diagram - https://phabricator.wikimedia.org/T408511#11643245 (Papaul) Both switches are now running version 25.10.2. Still can not get the Cookbook sre.network.tls to pass on asw1-23-ulsfo.
2026-02-24 02:33:22 <jinxer-wm> RESOLVED: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
2026-02-24 02:38:40 <jinxer-wm> FIRING: [2x] SystemdUnitFailed: wmf_auto_restart_prometheus-blazegraph-exporter-wdqs-blazegraph.service on wdqs2008:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
2026-02-24 03:00:05 <jouncebot> Deploy window Automatic branching of MediaWiki, extensions, skins, and vendor – see Heterogeneous deployment/Train deploys (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260224T0300)
2026-02-24 03:33:15 <jinxer-wm> FIRING: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-web releases routed via main at codfw: 24.61% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=codfw%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy
2026-02-24 03:48:15 <jinxer-wm> RESOLVED: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-web releases routed via main at codfw: 21.4% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=codfw%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy
2026-02-24 04:00:04 <jouncebot> Deploy window Automatic deployment of MediaWiki, extensions, skins, and vendor to testwikis only – see Heterogeneous deployment/Train deploys (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260224T0400)
2026-02-24 04:15:15 <wikibugs> (PS1) Aaron Schulz: [DNM] Add growthexperiments.v0 to $wgRestSandboxSpecs [mediawiki-config] - https://gerrit.wikimedia.org/r/1242613 (https://phabricator.wikimedia.org/T414470)
2026-02-24 05:00:05 <jouncebot> Deploy window Automatic removal of all obsolete MediaWiki versions from the deployment and bare metal servers (except the most-recent obsolete version) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260224T0500)
2026-02-24 05:01:12 <logmsgbot> !log mwpresync@deploy2002 Pruned MediaWiki: 1.46.0-wmf.14 (duration: 01m 10s)
2026-02-24 05:08:22 <jinxer-wm> FIRING: [4x] CoreRouterInterfaceDown: Core router interface down - cr1-codfw:et-1/0/2 (Transport: cr1-eqiad:et-1/1/2 (Arelion, IC-374549) {#12267}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown
2026-02-24 05:36:40 <jinxer-wm> FIRING: SystemdUnitFailed: send_tile_invalidations.service on maps1011:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
2026-02-24 05:54:29 <wikibugs> (PS1) Marostegui: Revert "pc1011,pc2011: Disable notifications" [puppet] - https://gerrit.wikimedia.org/r/1242748
2026-02-24 05:55:24 <wikibugs> (CR) Marostegui: [C: +2] Revert "pc1011,pc2011: Disable notifications" [puppet] - https://gerrit.wikimedia.org/r/1242748 (owner: Marostegui)
2026-02-24 05:56:03 <logmsgbot> !log marostegui@cumin1003 START - Cookbook sre.mysql.pool pool pc1011: Repooling pc1 after migration to Debian trixie
2026-02-24 05:56:03 <logmsgbot> !log marostegui@cumin1003 START - Cookbook sre.mysql.parsercache
2026-02-24 05:56:17 <logmsgbot> !log marostegui@cumin1003 END (PASS) - Cookbook sre.mysql.parsercache (exit_code=0)
2026-02-24 05:56:17 <logmsgbot> !log marostegui@cumin1003 END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool pc1011: Repooling pc1 after migration to Debian trixie
2026-02-24 06:04:48 <wikibugs> (CR) Marostegui: "@fceratto@wikimedia.org could you please review this? I'd like to push it before the week ends." [puppet] - https://gerrit.wikimedia.org/r/1240680 (https://phabricator.wikimedia.org/T285079) (owner: Marostegui)
2026-02-24 06:06:48 <logmsgbot> !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Schema change
2026-02-24 06:08:02 <marostegui> !log Deploy schema change on dbstore1007:3314 T415786
2026-02-24 06:08:06 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2026-02-24 06:08:06 <stashbot> T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786
2026-02-24 06:17:27 <wikibugs> (PS1) Marostegui: site.pp: Reorganize pc8 host. [puppet] - https://gerrit.wikimedia.org/r/1242764
2026-02-24 06:18:15 <wikibugs> (CR) Marostegui: "This is a noop" [puppet] - https://gerrit.wikimedia.org/r/1242764 (owner: Marostegui)
2026-02-24 06:18:17 <wikibugs> (CR) Marostegui: [C: +2] site.pp: Reorganize pc8 host. [puppet] - https://gerrit.wikimedia.org/r/1242764 (owner: Marostegui)
2026-02-24 06:38:40 <jinxer-wm> FIRING: [2x] SystemdUnitFailed: wmf_auto_restart_prometheus-blazegraph-exporter-wdqs-blazegraph.service on wdqs2008:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
2026-02-24 06:59:36 <kostajh> jouncebot: nowandnext
2026-02-24 06:59:37 <jouncebot> No deployments scheduled for the next 0 hour(s) and 0 minute(s)
2026-02-24 06:59:37 <jouncebot> In 0 hour(s) and 0 minute(s): MediaWiki infrastructure (UTC early) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260224T0700)
2026-02-24 06:59:37 <jouncebot> In 0 hour(s) and 0 minute(s): Primary database switchover (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260224T0700)
2026-02-24 07:00:02 <wikibugs> (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Tuesday, February 24 UTC morning backport window](https://wikitech.wikimedia.org/wiki/Deployments#deployc" [mediawiki-config] - https://gerrit.wikimedia.org/r/1242424 (https://phabricator.wikimedia.org/T374718) (owner: Kosta Harlan)
2026-02-24 07:00:05 <jouncebot> Deploy window MediaWiki infrastructure (UTC early) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260224T0700)
2026-02-24 07:00:05 <jouncebot> marostegui, Amir1, and federico3: Your horoscope predicts another Primary database switchover deploy. May Zuul be (nice) with you. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260224T0700).
2026-02-24 07:02:42 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2247 (T415786)', diff saved to https://phabricator.wikimedia.org/P88998 and previous config saved to /var/cache/conftool/dbconfig/20260224-070241-marostegui.json
2026-02-24 07:02:46 <stashbot> T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786
2026-02-24 07:17:50 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2247', diff saved to https://phabricator.wikimedia.org/P88999 and previous config saved to /var/cache/conftool/dbconfig/20260224-071750-marostegui.json
2026-02-24 07:20:37 <wikibugs> (CR) Muehlenhoff: [C: +2] admin: hashar: disable fetch.prunetags [puppet] - https://gerrit.wikimedia.org/r/1242261 (https://phabricator.wikimedia.org/T418085) (owner: Hashar)
2026-02-24 07:28:00 <wikibugs> (CR) Muehlenhoff: [C: +1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/1242473 (https://phabricator.wikimedia.org/T418010) (owner: Eevans)
2026-02-24 07:29:57 <wikibugs> (CR) Muehlenhoff: [C: +2] Remove obsolete acct toil class [puppet] - https://gerrit.wikimedia.org/r/1242292 (owner: Muehlenhoff)
2026-02-24 07:32:59 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2247', diff saved to https://phabricator.wikimedia.org/P89000 and previous config saved to /var/cache/conftool/dbconfig/20260224-073258-marostegui.json
2026-02-24 07:37:37 <wikibugs> SRE, ServiceOps new, ServiceOps-Mediawiki: Migrate Service Ops Docker images running in production away from Bullseye - https://phabricator.wikimedia.org/T418200 (JMeybohm) NEW
2026-02-24 07:38:23 <wikibugs> (CR) Muehlenhoff: [C: +2] Remove obsolete config override for git protocol v2 [puppet] - https://gerrit.wikimedia.org/r/1242299 (owner: Muehlenhoff)
2026-02-24 07:44:02 <wikibugs> (CR) Muehlenhoff: [C: +2] Remove support for prometheus node exporter 0.17 [puppet] - https://gerrit.wikimedia.org/r/1242297 (owner: Muehlenhoff)
2026-02-24 07:47:48 <wikibugs> (CR) Gehel: [C: +1] Move an HDFS journalnode to a newer host [puppet] - https://gerrit.wikimedia.org/r/1242508 (https://phabricator.wikimedia.org/T414948) (owner: Btullis)
2026-02-24 07:48:03 <wikibugs> (CR) Gehel: [C: +1] Move a second journalnode to a newer host [puppet] - https://gerrit.wikimedia.org/r/1242511 (https://phabricator.wikimedia.org/T414948) (owner: Btullis)
2026-02-24 07:48:07 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2247 (T415786)', diff saved to https://phabricator.wikimedia.org/P89001 and previous config saved to /var/cache/conftool/dbconfig/20260224-074806-marostegui.json
2026-02-24 07:48:11 <stashbot> T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786
2026-02-24 07:48:24 <logmsgbot> !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2248.codfw.wmnet with reason: Maintenance
2026-02-24 07:48:32 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db2248 (T415786)', diff saved to https://phabricator.wikimedia.org/P89002 and previous config saved to /var/cache/conftool/dbconfig/20260224-074831-marostegui.json
2026-02-24 07:48:54 <wikibugs> (CR) Muehlenhoff: [C: +2] udev: Remove support for buster [puppet] - https://gerrit.wikimedia.org/r/1242294 (owner: Muehlenhoff)
2026-02-24 07:49:28 <wikibugs> (CR) Gehel: "Can we remove the nodes from the network topology AND put them in setup at the same time? Or do we need to apply the network topology firs" [puppet] - https://gerrit.wikimedia.org/r/1242513 (https://phabricator.wikimedia.org/T414948) (owner: Btullis)
2026-02-24 07:49:54 <wikibugs> (CR) Brouberol: [C: +1] Move an HDFS journalnode to a newer host [puppet] - https://gerrit.wikimedia.org/r/1242508 (https://phabricator.wikimedia.org/T414948) (owner: Btullis)
2026-02-24 07:50:11 <wikibugs> (CR) Gehel: [C: +1] Add the configuration for the new dse-k8s worker nodes that were an-worker [puppet] - https://gerrit.wikimedia.org/r/1242514 (https://phabricator.wikimedia.org/T414948) (owner: Btullis)
2026-02-24 07:50:13 <wikibugs> (CR) Brouberol: [C: +1] Move a second journalnode to a newer host [puppet] - https://gerrit.wikimedia.org/r/1242511 (https://phabricator.wikimedia.org/T414948) (owner: Btullis)
2026-02-24 07:50:50 <wikibugs> (CR) Brouberol: [C: +1] Prepare to decom the old an-worker hosts [puppet] - https://gerrit.wikimedia.org/r/1242513 (https://phabricator.wikimedia.org/T414948) (owner: Btullis)
2026-02-24 07:51:44 <wikibugs> (CR) Brouberol: "Once these hosts get configured as dse-k8s-workers, containers will start getting scheduled on them. Can we make sure that non of them hav" [puppet] - https://gerrit.wikimedia.org/r/1242514 (https://phabricator.wikimedia.org/T414948) (owner: Btullis)
2026-02-24 07:52:52 <wikibugs> (CR) Brouberol: Add the new druid-internal servers to site.pp and preseed.yaml (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1242529 (https://phabricator.wikimedia.org/T417430) (owner: Btullis)
2026-02-24 07:53:10 <wikibugs> (CR) Gehel: [C: -1] Add the new druid-internal servers to site.pp and preseed.yaml (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1242529 (https://phabricator.wikimedia.org/T417430) (owner: Btullis)
2026-02-24 07:53:11 <wikibugs> (CR) Brouberol: [C: +1] Add dbstore1010 to site.pp and preseed.yaml [puppet] - https://gerrit.wikimedia.org/r/1242533 (https://phabricator.wikimedia.org/T417948) (owner: Btullis)
2026-02-24 07:56:26 <wikibugs> (PS2) Matthias Mullie: Squashed diff to master [extensions/ReaderExperiments] (wmf/1.46.0-wmf.16) - https://gerrit.wikimedia.org/r/1242568
2026-02-24 07:56:48 <wikibugs> (CR) Muehlenhoff: [C: +2] base::kernel: Unconditionally use the autoremove logic [puppet] - https://gerrit.wikimedia.org/r/1239696 (owner: Muehlenhoff)
2026-02-24 08:00:05 <jouncebot> Amir1, Urbanecm, and awight: #bothumor Q:How do functions break up? A:They stop calling each other. Rise for UTC morning backport window deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260224T0800).
2026-02-24 08:00:05 <jouncebot> Pppery, matthiasmullie, and kostajh: A patch you scheduled for UTC morning backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
2026-02-24 08:00:16 <matthiasmullie> o/
2026-02-24 08:01:59 <wikibugs> (CR) Muehlenhoff: [C: +2] Unconditionally install puppet-module-puppetlabs-augeas-core [puppet] - https://gerrit.wikimedia.org/r/1239889 (owner: Muehlenhoff)
2026-02-24 08:04:27 <kostajh> hi
2026-02-24 08:05:15 <kostajh> I will sync out my config patch towards the end of this window
2026-02-24 08:05:21 <kostajh> stepping away for a while now, though
2026-02-24 08:05:55 <matthiasmullie> I'll get started with my patch now
2026-02-24 08:06:19 <wikibugs> (CR) Muehlenhoff: [C: +2] nftables: Remove support for buster [puppet] - https://gerrit.wikimedia.org/r/1219877 (owner: Muehlenhoff)
2026-02-24 08:06:20 <wikibugs> (CR) TrainBranchBot: [C: +2] "Approved by mlitn@deploy2002 using scap backport" [extensions/ReaderExperiments] (wmf/1.46.0-wmf.16) - https://gerrit.wikimedia.org/r/1242568 (owner: Matthias Mullie)
2026-02-24 08:06:53 <wikibugs> (CR) Slyngshede: [C: +1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/1242407 (owner: Muehlenhoff)
2026-02-24 08:07:50 <wikibugs> (Merged) jenkins-bot: Squashed diff to master [extensions/ReaderExperiments] (wmf/1.46.0-wmf.16) - https://gerrit.wikimedia.org/r/1242568 (owner: Matthias Mullie)
2026-02-24 08:08:16 <logmsgbot> !log mlitn@deploy2002 Started scap sync-world: Backport for [[gerrit:1242568|Squashed diff to master]]
2026-02-24 08:10:13 <wikibugs> (CR) Federico Ceratto: [C: +1] "Puppet is now enabled (doublechecked using sr.puppet(sr.remote().query('db2230*')).check_enabled() ) so the CI can be run again." [puppet] - https://gerrit.wikimedia.org/r/1240680 (https://phabricator.wikimedia.org/T285079) (owner: Marostegui)
2026-02-24 08:10:16 <logmsgbot> !log mlitn@deploy2002 mlitn: Backport for [[gerrit:1242568|Squashed diff to master]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
2026-02-24 08:11:44 <logmsgbot> !log mlitn@deploy2002 mlitn: Continuing with sync
2026-02-24 08:15:39 <logmsgbot> !log mlitn@deploy2002 Finished scap sync-world: Backport for [[gerrit:1242568|Squashed diff to master]] (duration: 07m 23s)
2026-02-24 08:17:09 <matthiasmullie> I'm done - outstanding patches: pppery (not here?) & kostajh (will get to his near end of window)
2026-02-24 08:17:53 <matthiasmullie> Oh, I just realized I have another one :D
2026-02-24 08:18:15 <wikibugs> (PS1) Matthias Mullie: Minerva TOC: reserve space for the article page heading button [extensions/MobileFrontend] (wmf/1.46.0-wmf.16) - https://gerrit.wikimedia.org/r/1243004 (https://phabricator.wikimedia.org/T417932)
2026-02-24 08:20:18 <wikibugs> (CR) TrainBranchBot: [C: +2] "Approved by mlitn@deploy2002 using scap backport" [extensions/MobileFrontend] (wmf/1.46.0-wmf.16) - https://gerrit.wikimedia.org/r/1243004 (https://phabricator.wikimedia.org/T417932) (owner: Matthias Mullie)
2026-02-24 08:29:51 <wikibugs> (PS4) Federico Ceratto: service, trafficserver: Prepare "linked-artifacts" k8s pod [puppet] - https://gerrit.wikimedia.org/r/1227851 (https://phabricator.wikimedia.org/T414112)
2026-02-24 08:33:30 <wikibugs> (Merged) jenkins-bot: Minerva TOC: reserve space for the article page heading button [extensions/MobileFrontend] (wmf/1.46.0-wmf.16) - https://gerrit.wikimedia.org/r/1243004 (https://phabricator.wikimedia.org/T417932) (owner: Matthias Mullie)
2026-02-24 08:33:49 <logmsgbot> !log mlitn@deploy2002 Started scap sync-world: Backport for [[gerrit:1243004|Minerva TOC: reserve space for the article page heading button (T417932)]]
2026-02-24 08:33:53 <stashbot> T417932: [Minerva TOC] design feedback - https://phabricator.wikimedia.org/T417932
2026-02-24 08:34:36 <wikibugs> (PS1) Slyngshede: Allow blacklisting of domains for signup [software/bitu] - https://gerrit.wikimedia.org/r/1243007 (https://phabricator.wikimedia.org/T418201)
2026-02-24 08:35:40 <logmsgbot> !log mlitn@deploy2002 mlitn: Backport for [[gerrit:1243004|Minerva TOC: reserve space for the article page heading button (T417932)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
2026-02-24 08:36:27 <logmsgbot> !log mlitn@deploy2002 mlitn: Continuing with sync
2026-02-24 08:40:07 <wikibugs> (CR) Muehlenhoff: [C: +2] Remove HPSA RAID support [puppet] - https://gerrit.wikimedia.org/r/1237499 (owner: Muehlenhoff)
2026-02-24 08:40:22 <logmsgbot> !log mlitn@deploy2002 Finished scap sync-world: Backport for [[gerrit:1243004|Minerva TOC: reserve space for the article page heading button (T417932)]] (duration: 06m 33s)
2026-02-24 08:40:26 <stashbot> T417932: [Minerva TOC] design feedback - https://phabricator.wikimedia.org/T417932
2026-02-24 08:42:05 <matthiasmullie> I'm done - outstanding patches: pppery (not here?) & kostajh (will get to his near end of window)
2026-02-24 08:45:30 <wikibugs> SRE, Infrastructure-Foundations: decom cookbook used Junos commands on a Nokia switch - https://phabricator.wikimedia.org/T417428#11643702 (ayounsi) Open→Resolved This is done.
2026-02-24 08:56:34 <wikibugs> (CR) Arnaudb: [C:+2] gerrit: prevent NodeTextfileStale alert on nft throttling [alerts] - https://gerrit.wikimedia.org/r/1242413 (https://phabricator.wikimedia.org/T418139) (owner: Arnaudb)
2026-02-24 08:57:37 <kostajh> ok, syncing my patch now
2026-02-24 08:57:42 <wikibugs> (CR) TrainBranchBot: [C:+2] "Approved by kharlan@deploy2002 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/1242424 (https://phabricator.wikimedia.org/T374718) (owner: Kosta Harlan)
2026-02-24 08:58:09 <wikibugs> (Merged) jenkins-bot: gerrit: prevent NodeTextfileStale alert on nft throttling [alerts] - https://gerrit.wikimedia.org/r/1242413 (https://phabricator.wikimedia.org/T418139) (owner: Arnaudb)
2026-02-24 08:58:40 <wikibugs> (Merged) jenkins-bot: IPInfo: Grant ipinfo-view-arbitrary-ip to checkuser group [mediawiki-config] - https://gerrit.wikimedia.org/r/1242424 (https://phabricator.wikimedia.org/T374718) (owner: Kosta Harlan)
2026-02-24 08:58:58 <logmsgbot> !log kharlan@deploy2002 Started scap sync-world: Backport for [[gerrit:1242424|IPInfo: Grant ipinfo-view-arbitrary-ip to checkuser group (T374718)]]
2026-02-24 08:59:03 <stashbot> T374718: Allow Special:IPInfo to return IP information of arbitrary addresses for users with the correct permissions - https://phabricator.wikimedia.org/T374718
2026-02-24 09:00:52 <wikibugs> (CR) Marostegui: "recheck" [puppet] - https://gerrit.wikimedia.org/r/1240680 (https://phabricator.wikimedia.org/T285079) (owner: Marostegui)
2026-02-24 09:01:10 <wikibugs> (CR) Marostegui: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1240680 (https://phabricator.wikimedia.org/T285079) (owner: Marostegui)
2026-02-24 09:01:12 <logmsgbot> !log kharlan@deploy2002 kharlan: Backport for [[gerrit:1242424|IPInfo: Grant ipinfo-view-arbitrary-ip to checkuser group (T374718)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
2026-02-24 09:04:18 <logmsgbot> !log kharlan@deploy2002 kharlan: Continuing with sync
2026-02-24 09:07:34 <wikibugs> (PS4) Marostegui: mariadb: Alert on pt-heartbeat not running [puppet] - https://gerrit.wikimedia.org/r/1240680 (https://phabricator.wikimedia.org/T285079)
2026-02-24 09:08:27 <logmsgbot> !log kharlan@deploy2002 Finished scap sync-world: Backport for [[gerrit:1242424|IPInfo: Grant ipinfo-view-arbitrary-ip to checkuser group (T374718)]] (duration: 09m 29s)
2026-02-24 09:08:32 <stashbot> T374718: Allow Special:IPInfo to return IP information of arbitrary addresses for users with the correct permissions - https://phabricator.wikimedia.org/T374718
2026-02-24 09:09:36 <wikibugs> (CR) Marostegui: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1240680 (https://phabricator.wikimedia.org/T285079) (owner: Marostegui)
2026-02-24 09:09:42 <jinxer-wm> FIRING: [4x] CoreRouterInterfaceDown: Core router interface down - cr1-codfw:et-1/0/2 (Transport: cr1-eqiad:et-1/1/2 (Arelion, IC-374549) {#12267}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown
2026-02-24 09:11:58 <wikibugs> (CR) Marostegui: "PCC reran: https://puppet-compiler.wmflabs.org/output/1240680/5893/" [puppet] - https://gerrit.wikimedia.org/r/1240680 (https://phabricator.wikimedia.org/T285079) (owner: Marostegui)
2026-02-24 09:12:33 <wikibugs> (PS1) Brouberol: growhbook: disable frontend telemetry [deployment-charts] - https://gerrit.wikimedia.org/r/1243023 (https://phabricator.wikimedia.org/T418211)
2026-02-24 09:13:48 <wikibugs> (CR) Jaime Nuche: [C:+1] "Thank you Daniel!" [puppet] - https://gerrit.wikimedia.org/r/1242483 (https://phabricator.wikimedia.org/T418109) (owner: Dzahn)
2026-02-24 09:17:27 <wikibugs> (CR) Muehlenhoff: [C:+2] analytics::cluster::packages::common: Remove support for buster [puppet] - https://gerrit.wikimedia.org/r/1237493 (owner: Muehlenhoff)
2026-02-24 09:25:42 <wikibugs> (PS1) Muehlenhoff: apt: Remove support for Buster [puppet] - https://gerrit.wikimedia.org/r/1243035
2026-02-24 09:28:21 <wikibugs> (CR) Muehlenhoff: [C:+1] "Looks good!" [software/bitu] - https://gerrit.wikimedia.org/r/1243007 (https://phabricator.wikimedia.org/T418201) (owner: Slyngshede)
2026-02-24 09:28:34 <wikibugs> (CR) Muehlenhoff: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1243035 (owner: Muehlenhoff)
2026-02-24 09:28:50 <wikibugs> (PS5) Ayounsi: WIP: create cookbook to depool all services in a given rack [cookbooks] - https://gerrit.wikimedia.org/r/1239896 (https://phabricator.wikimedia.org/T327300)
2026-02-24 09:30:28 <wikibugs> SRE, Bitu, Infrastructure-Foundations, Patch-For-Review: wikimedia-l was signed up for a developer account - https://phabricator.wikimedia.org/T418201#11644006 (Peachey88)
2026-02-24 09:30:50 <wikibugs> (CR) Brouberol: [C:-1] "Now that the signature of the `get_next_clusters_nodes` method has been changed, the change needs to be reflected here as well" [cookbooks] - https://gerrit.wikimedia.org/r/1235113 (https://phabricator.wikimedia.org/T410577) (owner: Ryan Kemper)
2026-02-24 09:31:46 <wikibugs> (PS6) Ayounsi: WIP: create cookbook to depool all services in a given rack [cookbooks] - https://gerrit.wikimedia.org/r/1239896 (https://phabricator.wikimedia.org/T327300)
2026-02-24 09:34:19 <wikibugs> (CR) Arnaudb: gerrit: swap gerrit-spare and gerrit-replica (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1242269 (https://phabricator.wikimedia.org/T406334) (owner: Arnaudb)
2026-02-24 09:34:33 <wikibugs> (CR) Slyngshede: [C:+2] Allow blacklisting of domains for signup [software/bitu] - https://gerrit.wikimedia.org/r/1243007 (https://phabricator.wikimedia.org/T418201) (owner: Slyngshede)
2026-02-24 09:34:38 <wikibugs> (PS7) Ayounsi: WIP: create cookbook to depool all services in a given rack [cookbooks] - https://gerrit.wikimedia.org/r/1239896 (https://phabricator.wikimedia.org/T327300)
2026-02-24 09:35:24 <wikibugs> (PS1) Brouberol: deployment_server: provision the dse-k8s opensearch-operator-3 kubeconfigs [puppet] - https://gerrit.wikimedia.org/r/1243041 (https://phabricator.wikimedia.org/T418176)
2026-02-24 09:36:08 <wikibugs> (PS1) Filippo Giunchedi: hieradata: route toolhub probe alerts to wmcs [puppet] - https://gerrit.wikimedia.org/r/1243042 (https://phabricator.wikimedia.org/T316682)
2026-02-24 09:36:40 <jinxer-wm> FIRING: SystemdUnitFailed: send_tile_invalidations.service on maps1011:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
2026-02-24 09:37:16 <wikibugs> (PS1) Muehlenhoff: ferm: Remove obsolete OS check [puppet] - https://gerrit.wikimedia.org/r/1243045
2026-02-24 09:37:26 <wikibugs> (Merged) jenkins-bot: Allow blacklisting of domains for signup [software/bitu] - https://gerrit.wikimedia.org/r/1243007 (https://phabricator.wikimedia.org/T418201) (owner: Slyngshede)
2026-02-24 09:38:00 <wikibugs> (PS1) Brouberol: dse-k8s: define the opensearch-operator-3 namespace to all clusters [deployment-charts] - https://gerrit.wikimedia.org/r/1243046 (https://phabricator.wikimedia.org/T418176)
2026-02-24 09:38:54 <wikibugs> (CR) Dpogorzelski: [C:+2] kserve: fix dependency on cert-manager [deployment-charts] - https://gerrit.wikimedia.org/r/1242439 (owner: Dpogorzelski)
2026-02-24 09:40:07 <wikibugs> (PS8) Ayounsi: WIP: create cookbook to depool all services in a given rack [cookbooks] - https://gerrit.wikimedia.org/r/1239896 (https://phabricator.wikimedia.org/T327300)
2026-02-24 09:40:36 <wikibugs> (CR) Muehlenhoff: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1243045 (owner: Muehlenhoff)
2026-02-24 09:41:58 <wikibugs> (PS1) Awight: Subreferencing pilot wikis, phase 2 [mediawiki-config] - https://gerrit.wikimedia.org/r/1243047 (https://phabricator.wikimedia.org/T418209)
2026-02-24 09:42:10 <logmsgbot> !log fceratto@cumin1003 START - Cookbook sre.ganeti.makevm for new host dborch1003.eqiad.wmnet
2026-02-24 09:42:12 <logmsgbot> !log fceratto@cumin1003 START - Cookbook sre.dns.netbox
2026-02-24 09:42:21 <wikibugs> (PS9) Ayounsi: WIP: create cookbook to depool all services in a given rack [cookbooks] - https://gerrit.wikimedia.org/r/1239896 (https://phabricator.wikimedia.org/T327300)
2026-02-24 09:42:49 <wikibugs> (PS1) Muehlenhoff: mtail: Use the Debian version of mtail universally [puppet] - https://gerrit.wikimedia.org/r/1243048
2026-02-24 09:43:11 <wikibugs> (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Tuesday, February 24 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deplo" [mediawiki-config] - https://gerrit.wikimedia.org/r/1243047 (https://phabricator.wikimedia.org/T418209) (owner: Awight)
2026-02-24 09:43:43 <logmsgbot> !log slyngshede@cumin1003 START - Cookbook sre.hosts.reimage for host cp2045.codfw.wmnet with OS trixie
2026-02-24 09:43:44 <wikibugs> (CR) Majavah: ferm: Remove obsolete OS check (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1243045 (owner: Muehlenhoff)
2026-02-24 09:44:52 <logmsgbot> !log fceratto@cumin1003 END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
2026-02-24 09:45:21 <logmsgbot> !log fceratto@cumin1003 START - Cookbook sre.dns.netbox
2026-02-24 09:45:33 <logmsgbot> !log fceratto@cumin1003 END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
2026-02-24 09:45:37 <logmsgbot> !log fceratto@cumin1003 END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host dborch1003.eqiad.wmnet
2026-02-24 09:45:42 <logmsgbot> !log fceratto@cumin1003 START - Cookbook sre.ganeti.makevm for new host dborch1003.eqiad.wmnet
2026-02-24 09:45:45 <logmsgbot> !log fceratto@cumin1003 START - Cookbook sre.dns.netbox
2026-02-24 09:47:00 <wikibugs> (CR) Marostegui: "As you +1ed and the PCC looks good on db2230 I am merging!" [puppet] - https://gerrit.wikimedia.org/r/1240680 (https://phabricator.wikimedia.org/T285079) (owner: Marostegui)
2026-02-24 09:47:10 <wikibugs> (CR) Marostegui: [C:+2] mariadb: Alert on pt-heartbeat not running [puppet] - https://gerrit.wikimedia.org/r/1240680 (https://phabricator.wikimedia.org/T285079) (owner: Marostegui)
2026-02-24 09:48:07 <wikibugs> (CR) Muehlenhoff: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1243048 (owner: Muehlenhoff)
2026-02-24 09:50:23 <wikibugs> (CR) Majavah: hieradata: route toolhub probe alerts to wmcs (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1243042 (https://phabricator.wikimedia.org/T316682) (owner: Filippo Giunchedi)
2026-02-24 09:51:14 <wikibugs> (CR) Arnaudb: [C:+2] gerrit: alert for broken replication [alerts] - https://gerrit.wikimedia.org/r/1242399 (https://phabricator.wikimedia.org/T418084) (owner: Arnaudb)
2026-02-24 09:51:39 <logmsgbot> fceratto@cumin1003 makevm (PID 535318) is awaiting input
2026-02-24 09:51:41 <wikibugs> (PS1) Dpogorzelski: ml-serve-eqiad: k8s upgrade [puppet] - https://gerrit.wikimedia.org/r/1243053
2026-02-24 09:53:50 <wikibugs> (PS1) Dpogorzelski: ml-serve-eqiad: k8s upgrade [deployment-charts] - https://gerrit.wikimedia.org/r/1243054
2026-02-24 09:53:59 <logmsgbot> !log fceratto@cumin1003 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM dborch1003.eqiad.wmnet - fceratto@cumin1003"
2026-02-24 09:54:04 <logmsgbot> !log fceratto@cumin1003 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM dborch1003.eqiad.wmnet - fceratto@cumin1003"
2026-02-24 09:54:04 <logmsgbot> !log fceratto@cumin1003 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
2026-02-24 09:54:04 <logmsgbot> !log fceratto@cumin1003 START - Cookbook sre.dns.wipe-cache dborch1003.eqiad.wmnet on all recursors
2026-02-24 09:54:08 <logmsgbot> !log fceratto@cumin1003 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dborch1003.eqiad.wmnet on all recursors
2026-02-24 09:54:34 <logmsgbot> !log dpogorzelski@cumin1003 START - Cookbook sre.k8s.pool-depool-cluster depool all services in eqiad/ml-serve-eqiad: maintenance
2026-02-24 09:54:35 <logmsgbot> !log fceratto@cumin1003 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM dborch1003.eqiad.wmnet - fceratto@cumin1003"
2026-02-24 09:54:40 <logmsgbot> !log fceratto@cumin1003 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM dborch1003.eqiad.wmnet - fceratto@cumin1003"
2026-02-24 09:55:21 <logmsgbot> !log dpogorzelski@cumin1003 END (PASS) - Cookbook sre.k8s.pool-depool-cluster (exit_code=0) depool all services in eqiad/ml-serve-eqiad: maintenance
2026-02-24 09:55:40 <wikibugs> (Merged) jenkins-bot: gerrit: alert for broken replication [alerts] - https://gerrit.wikimedia.org/r/1242399 (https://phabricator.wikimedia.org/T418084) (owner: Arnaudb)
2026-02-24 09:55:42 <logmsgbot> !log dpogorzelski@cumin1003 conftool action : set/pooled=false; selector: dnsdisc=recommendation-api,name=codfw
2026-02-24 09:55:52 <icinga-wm> PROBLEM - PyBal backends health check on lvs2014 is CRITICAL: PYBAL CRITICAL - CRITICAL - wdqs-main_443: Servers wdqs2007.codfw.wmnet, wdqs2011.codfw.wmnet, wdqs2010.codfw.wmnet are marked down but pooled https://wikitech.wikimedia.org/wiki/PyBal
2026-02-24 09:55:56 <wikibugs> (PS2) Elukey: profile::httpbb::docker-registry: improve tests [puppet] - https://gerrit.wikimedia.org/r/1242452 (https://phabricator.wikimedia.org/T414576)
2026-02-24 09:56:26 <wikibugs> (CR) Elukey: "Reworked all the tests and tested them on cumin1003 :)" [puppet] - https://gerrit.wikimedia.org/r/1242452 (https://phabricator.wikimedia.org/T414576) (owner: Elukey)
2026-02-24 09:56:38 <wikibugs> (CR) Btullis: [C:+2] Move an HDFS journalnode to a newer host [puppet] - https://gerrit.wikimedia.org/r/1242508 (https://phabricator.wikimedia.org/T414948) (owner: Btullis)
2026-02-24 09:56:52 <icinga-wm> RECOVERY - PyBal backends health check on lvs2014 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal
2026-02-24 09:57:14 <wikibugs> (PS10) Ayounsi: WIP: create cookbook to depool all services in a given rack [cookbooks] - https://gerrit.wikimedia.org/r/1239896 (https://phabricator.wikimedia.org/T327300)
2026-02-24 09:57:38 <icinga-wm> PROBLEM - PyBal backends health check on lvs2013 is CRITICAL: PYBAL CRITICAL - CRITICAL - wdqs-main_443: Servers wdqs2015.codfw.wmnet, wdqs2012.codfw.wmnet, wdqs2008.codfw.wmnet are marked down but pooled https://wikitech.wikimedia.org/wiki/PyBal
2026-02-24 09:57:41 <logmsgbot> fceratto@cumin1003 makevm (PID 535318) is awaiting input
2026-02-24 09:59:52 <icinga-wm> PROBLEM - PyBal backends health check on lvs2014 is CRITICAL: PYBAL CRITICAL - CRITICAL - wdqs-main_443: Servers wdqs2013.codfw.wmnet, wdqs2021.codfw.wmnet, wdqs2015.codfw.wmnet, wdqs2012.codfw.wmnet, wdqs2014.codfw.wmnet, wdqs2022.codfw.wmnet, wdqs2008.codfw.wmnet are marked down but pooled https://wikitech.wikimedia.org/wiki/PyBal
2026-02-24 10:00:23 <wikibugs> (CR) Filippo Giunchedi: hieradata: route toolhub probe alerts to wmcs (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1243042 (https://phabricator.wikimedia.org/T316682) (owner: Filippo Giunchedi)
2026-02-24 10:00:30 <wikibugs> (CR) JMeybohm: [C:+1] profile::httpbb::docker-registry: improve tests [puppet] - https://gerrit.wikimedia.org/r/1242452 (https://phabricator.wikimedia.org/T414576) (owner: Elukey)
2026-02-24 10:00:36 <logmsgbot> !log dpogorzelski@cumin1003 START - Cookbook sre.k8s.wipe-cluster Wipe the K8s cluster ml-serve-eqiad: Kubernetes upgrade
2026-02-24 10:01:07 <wikibugs> (CR) Elukey: [C:+2] profile::httpbb::docker-registry: improve tests [puppet] - https://gerrit.wikimedia.org/r/1242452 (https://phabricator.wikimedia.org/T414576) (owner: Elukey)
2026-02-24 10:01:15 <wikibugs> (CR) Thiemo Kreuz (WMDE): [C:+1] Subreferencing pilot wikis, phase 2 [mediawiki-config] - https://gerrit.wikimedia.org/r/1243047 (https://phabricator.wikimedia.org/T418209) (owner: Awight)
2026-02-24 10:02:09 <wikibugs> (CR) Hashar: gerrit: alert for broken replication (1 comment) [alerts] - https://gerrit.wikimedia.org/r/1242399 (https://phabricator.wikimedia.org/T418084) (owner: Arnaudb)
2026-02-24 10:02:58 <logmsgbot> !log fceratto@cumin1003 START - Cookbook sre.hosts.reimage for host dborch1003.eqiad.wmnet with OS trixie
2026-02-24 10:02:58 <wikibugs> (CR) Dpogorzelski: [C:+2] ml-serve-eqiad: k8s upgrade [deployment-charts] - https://gerrit.wikimedia.org/r/1243054 (owner: Dpogorzelski)
2026-02-24 10:03:36 <icinga-wm> PROBLEM - PyBal backends health check on lvs1019 is CRITICAL: PYBAL CRITICAL - CRITICAL - inference_30443: Servers ml-serve1010.eqiad.wmnet, ml-serve1009.eqiad.wmnet, ml-serve1007.eqiad.wmnet, ml-serve1003.eqiad.wmnet, ml-serve1006.eqiad.wmnet, ml-serve1004.eqiad.wmnet are marked down but pooled: k8s-ingress-ml-serve_31443: Servers ml-serve1007.eqiad.wmnet, ml-serve1005.eqiad.wmnet, ml-serve1003.eqiad.wmnet, ml-serve1011.eqiad.wmnet, ml-s
2026-02-24 10:03:36 <icinga-wm> .eqiad.wmnet, ml-serve1002.eqiad.wmnet are marked down but pooled: ml-ctrl_6443: Servers ml-serve-ctrl1002.eqiad.wmnet are marked down but pooled https://wikitech.wikimedia.org/wiki/PyBal
2026-02-24 10:03:38 <icinga-wm> PROBLEM - PyBal backends health check on lvs1020 is CRITICAL: PYBAL CRITICAL - CRITICAL - ml-ctrl_6443: Servers ml-serve-ctrl1002.eqiad.wmnet are marked down but pooled: k8s-ingress-ml-serve_31443: Servers ml-serve1008.eqiad.wmnet, ml-serve1009.eqiad.wmnet, ml-serve1005.eqiad.wmnet, ml-serve1011.eqiad.wmnet, ml-serve1006.eqiad.wmnet, ml-serve1002.eqiad.wmnet are marked down but pooled: inference_30443: Servers ml-serve1008.eqiad.wmnet, ml
2026-02-24 10:03:38 <icinga-wm> 07.eqiad.wmnet, ml-serve1005.eqiad.wmnet, ml-serve1006.eqiad.wmnet, ml-serve1011.eqiad.wmnet, ml-serve1002.eqiad.wmnet are marked down but pooled https://wikitech.wikimedia.org/wiki/PyBal
2026-02-24 10:05:13 <wikibugs> (CR) Dpogorzelski: [C:+2] ml-serve-eqiad: k8s upgrade [puppet] - https://gerrit.wikimedia.org/r/1243053 (owner: Dpogorzelski)
2026-02-24 10:05:34 <logmsgbot> dpogorzelski@cumin1003 wipe-cluster (PID 551887) is awaiting input
2026-02-24 10:06:19 <wikibugs> (CR) Dpogorzelski: [V:+2 C:+2] ml-serve-eqiad: k8s upgrade [deployment-charts] - https://gerrit.wikimedia.org/r/1243054 (owner: Dpogorzelski)
2026-02-24 10:06:53 <wikibugs> SRE, ServiceOps new, Kubernetes, Patch-For-Review: Failing docker registry httpbb tests - https://phabricator.wikimedia.org/T414576#11644188 (elukey) Open→Resolved ` elukey@cumin1003:~$ sudo httpbb --hosts registry2004.codfw.wmnet /srv/deployment/httpbb-tests/docker-registry/test_dock...
2026-02-24 10:13:22 <jinxer-wm> FIRING: SLOMetricAbsent: wdqs-main-update-lag codfw - https://slo.wikimedia.org/?search=wdqs-main-update-lag - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent
2026-02-24 10:17:36 <logmsgbot> !log dpogorzelski@deploy2002 helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
2026-02-24 10:17:46 <logmsgbot> !log dpogorzelski@deploy2002 helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
2026-02-24 10:17:55 <logmsgbot> !log dpogorzelski@deploy2002 helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
2026-02-24 10:18:20 <logmsgbot> dpogorzelski@cumin1003 wipe-cluster (PID 551887) is awaiting input
2026-02-24 10:18:23 <logmsgbot> !log dpogorzelski@deploy2002 helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
2026-02-24 10:18:53 <logmsgbot> !log dpogorzelski@deploy2002 helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
2026-02-24 10:19:22 <logmsgbot> !log dpogorzelski@deploy2002 helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
2026-02-24 10:19:38 <wikibugs> (CR) Arnaudb: [C:+2] gerrit: alert for broken replication (1 comment) [alerts] - https://gerrit.wikimedia.org/r/1242399 (https://phabricator.wikimedia.org/T418084) (owner: Arnaudb)
2026-02-24 10:21:39 <wikibugs> (PS3) Arnaudb: gerrit: remove code for having multiple daemon users [puppet] - https://gerrit.wikimedia.org/r/1242467 (https://phabricator.wikimedia.org/T338470) (owner: Dzahn)
2026-02-24 10:21:45 <logmsgbot> !log dpogorzelski@deploy2002 helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
2026-02-24 10:21:52 <wikibugs> (PS3) Dzahn: releases: upgrade Java version from 17 to 21 [puppet] - https://gerrit.wikimedia.org/r/1242483 (https://phabricator.wikimedia.org/T418109)
2026-02-24 10:21:53 <logmsgbot> !log dpogorzelski@deploy2002 helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
2026-02-24 10:22:01 <logmsgbot> !log dpogorzelski@deploy2002 helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
2026-02-24 10:22:14 <wikibugs> (CR) Arnaudb: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1242467 (https://phabricator.wikimedia.org/T338470) (owner: Dzahn)
2026-02-24 10:22:46 <logmsgbot> !log dpogorzelski@deploy2002 helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
2026-02-24 10:22:56 <logmsgbot> !log dpogorzelski@deploy2002 helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
2026-02-24 10:23:42 <logmsgbot> !log dpogorzelski@deploy2002 helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
2026-02-24 10:23:53 <logmsgbot> !log dpogorzelski@deploy2002 helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
2026-02-24 10:23:58 <logmsgbot> !log dpogorzelski@deploy2002 helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
2026-02-24 10:24:10 <logmsgbot> !log dpogorzelski@deploy2002 helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
2026-02-24 10:24:13 <logmsgbot> !log dpogorzelski@deploy2002 helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
2026-02-24 10:27:24 <wikibugs> (PS1) MVernon: apus: add two new frontends in codfw [puppet] - https://gerrit.wikimedia.org/r/1243066 (https://phabricator.wikimedia.org/T416387)
2026-02-24 10:27:27 <wikibugs> (PS1) MVernon: apus: remove two codfw frontends for decom [puppet] - https://gerrit.wikimedia.org/r/1243067 (https://phabricator.wikimedia.org/T416387)
2026-02-24 10:28:46 <logmsgbot> !log dpogorzelski@deploy2002 helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
2026-02-24 10:28:47 <logmsgbot> !log dpogorzelski@deploy2002 helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
2026-02-24 10:30:00 <wikibugs> (CR) MVernon: "One query, which may be me missing something :)" [puppet] - https://gerrit.wikimedia.org/r/1242473 (https://phabricator.wikimedia.org/T418010) (owner: Eevans)
2026-02-24 10:30:58 <wikibugs> (CR) Marostegui: [C:+1] apus: add two new frontends in codfw [puppet] - https://gerrit.wikimedia.org/r/1243066 (https://phabricator.wikimedia.org/T416387) (owner: MVernon)
2026-02-24 10:32:41 <wikibugs> (CR) MVernon: [C:+2] apus: add two new frontends in codfw [puppet] - https://gerrit.wikimedia.org/r/1243066 (https://phabricator.wikimedia.org/T416387) (owner: MVernon)
2026-02-24 10:33:41 <wikibugs> (PS1) Muehlenhoff: Remove OS check for nrpe2nodexp [puppet] - https://gerrit.wikimedia.org/r/1243068
2026-02-24 10:34:01 <icinga-wm> PROBLEM - Blazegraph Port for wdqs-blazegraph on wdqs2008 is CRITICAL: connect to address 127.0.0.1 and port 9999: Connection refused https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Runbook
2026-02-24 10:34:39 <icinga-wm> RECOVERY - PyBal backends health check on lvs2013 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal
2026-02-24 10:34:40 <logmsgbot> !log slyngshede@cumin1003 END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp2045.codfw.wmnet with OS trixie
2026-02-24 10:34:57 <icinga-wm> RECOVERY - PyBal backends health check on lvs2014 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal
2026-02-24 10:34:59 <icinga-wm> RECOVERY - Blazegraph Port for wdqs-blazegraph on wdqs2008 is OK: TCP OK - 0.000 second response time on 127.0.0.1 port 9999 https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Runbook
2026-02-24 10:35:10 <logmsgbot> !log slyngshede@cumin1003 START - Cookbook sre.hosts.reimage for host cp2045.codfw.wmnet with OS trixie
2026-02-24 10:36:14 <logmsgbot> !log slyngshede@cumin1003 END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp2045.codfw.wmnet with OS trixie
2026-02-24 10:38:40 <jinxer-wm> FIRING: [2x] SystemdUnitFailed: wmf_auto_restart_prometheus-blazegraph-exporter-wdqs-blazegraph.service on wdqs2008:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
2026-02-24 10:38:41 <wikibugs> (PS1) Muehlenhoff: syslog::remote: Remove buster workarounds [puppet] - https://gerrit.wikimedia.org/r/1243069
2026-02-24 10:41:42 <logmsgbot> !log dpogorzelski@deploy2002 helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
2026-02-24 10:42:48 <wikibugs> SRE, SRE-Access-Requests: Requesting access to deployment for Eileen McFarland - https://phabricator.wikimedia.org/T418221 (EMcFarland-WMF) NEW
2026-02-24 10:44:15 <wikibugs> (CR) Clément Goubert: [C:+1] envoy: Allow inboundonly drain and support min wait time [docker-images/production-images] - https://gerrit.wikimedia.org/r/1242462 (https://phabricator.wikimedia.org/T364245) (owner: Scott French)
2026-02-24 10:44:33 <logmsgbot> !log dpogorzelski@deploy2002 helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
2026-02-24 10:45:11 <wikibugs> (CR) Clément Goubert: [C:+1] mesh: Set traffic_direction to INBOUND on local TLS listeners [deployment-charts] - https://gerrit.wikimedia.org/r/1242518 (https://phabricator.wikimedia.org/T364245) (owner: Scott French)
2026-02-24 10:46:16 <wikibugs> (PS1) Marostegui: db2230: Enable notifications [puppet] - https://gerrit.wikimedia.org/r/1243070 (https://phabricator.wikimedia.org/T285079)
2026-02-24 10:46:27 <wikibugs> (CR) Clément Goubert: [C:+1] mesh: Support injection of extra env vars into envoy container [deployment-charts] - https://gerrit.wikimedia.org/r/1242520 (https://phabricator.wikimedia.org/T364245) (owner: Scott French)
2026-02-24 10:46:44 <wikibugs> (CR) Clément Goubert: [C:+1] mediawiki: Bump mesh.configuration and mesh.deployment [deployment-charts] - https://gerrit.wikimedia.org/r/1242521 (https://phabricator.wikimedia.org/T364245) (owner: Scott French)
2026-02-24 10:48:03 <wikibugs> (CR) Muehlenhoff: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1243068 (owner: Muehlenhoff)
2026-02-24 10:48:06 <wikibugs> (Abandoned) Clément Goubert: mw-debug: Immediately drain envoy on termination [deployment-charts] - https://gerrit.wikimedia.org/r/1242354 (https://phabricator.wikimedia.org/T364245) (owner: Clément Goubert)
2026-02-24 10:48:22 <jinxer-wm> RESOLVED: SLOMetricAbsent: wdqs-main-update-lag codfw - https://slo.wikimedia.org/?search=wdqs-main-update-lag - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent
2026-02-24 10:49:24 <logmsgbot> dpogorzelski@cumin1003 wipe-cluster (PID 551887) is awaiting input
2026-02-24 10:49:54 <wikibugs> (CR) Muehlenhoff: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1243069 (owner: Muehlenhoff)
2026-02-24 10:51:04 <logmsgbot> !log fceratto@cumin1003 END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dborch1003.eqiad.wmnet with OS trixie
2026-02-24 10:51:04 <logmsgbot> !log fceratto@cumin1003 END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=97) for new host dborch1003.eqiad.wmnet
2026-02-24 10:51:06 <logmsgbot> !log dpogorzelski@deploy2002 helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'article-descriptions' for release 'main' .
2026-02-24 10:51:18 <logmsgbot> !log fceratto@cumin1003 START - Cookbook sre.ganeti.makevm for new host dborch1003.eqiad.wmnet
2026-02-24 10:51:19 <logmsgbot> !log fceratto@cumin1003 START - Cookbook sre.dns.netbox
2026-02-24 10:51:34 <logmsgbot> !log dpogorzelski@deploy2002 helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'article-models' for release 'main' .
2026-02-24 10:51:52 <logmsgbot> !log dpogorzelski@deploy2002 helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
2026-02-24 10:52:02 <logmsgbot> !log dpogorzelski@deploy2002 helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'edit-check' for release 'main' .
2026-02-24 10:52:17 <logmsgbot> !log dpogorzelski@deploy2002 helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'llm' for release 'main' .
2026-02-24 10:53:38 <logmsgbot> !log dpogorzelski@deploy2002 helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'logo-detection' for release 'main' .
2026-02-24 10:53:45 <logmsgbot> !log dpogorzelski@deploy2002 helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'ores-legacy' for release 'main' .
2026-02-24 10:54:03 <wikibugs> (CR) Clément Goubert: [C:+1] mesh: Support injection of extra env vars into envoy container (1 comment) [deployment-charts] - https://gerrit.wikimedia.org/r/1242520 (https://phabricator.wikimedia.org/T364245) (owner: Scott French)
2026-02-24 10:54:04 <wikibugs> (CR) Muehlenhoff: [C:+1] "Note that Bookworm doesn't provide Java 21 natively, this is a locally maintained component. We're updating it mostly to allow the Gerrit " [puppet] - https://gerrit.wikimedia.org/r/1242483 (https://phabricator.wikimedia.org/T418109) (owner: Dzahn)
2026-02-24 10:54:09 <logmsgbot> !log dpogorzelski@deploy2002 helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'readability' for release 'main' .
2026-02-24 10:54:14 <wikibugs> (CR) Clément Goubert: [C:+1] mw-debug: Pilot new drain configuration [deployment-charts] - https://gerrit.wikimedia.org/r/1242522 (https://phabricator.wikimedia.org/T364245) (owner: Scott French)
2026-02-24 10:54:23 <logmsgbot> !log dpogorzelski@deploy2002 helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
2026-02-24 10:54:42 <logmsgbot> !log dpogorzelski@deploy2002 helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
2026-02-24 10:54:55 <logmsgbot> !log dpogorzelski@deploy2002 helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revise-tone-task-generator' for release 'main' .
2026-02-24 10:55:05 <logmsgbot> !log dpogorzelski@deploy2002 helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revision-models' for release 'main' .
2026-02-24 10:55:33 <logmsgbot> !log dpogorzelski@deploy2002 helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
2026-02-24 10:55:51 <logmsgbot> !log dpogorzelski@deploy2002 helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
2026-02-24 10:56:11 <logmsgbot> !log dpogorzelski@deploy2002 helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
2026-02-24 10:56:23 <logmsgbot> !log dpogorzelski@deploy2002 helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
2026-02-24 10:56:35 <icinga-wm> RECOVERY - PyBal backends health check on lvs1019 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal
2026-02-24 10:56:39 <icinga-wm> RECOVERY - PyBal backends health check on lvs1020 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal
2026-02-24 10:56:49 <logmsgbot> !log dpogorzelski@deploy2002 helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
2026-02-24 10:56:52 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1252 (T415786)', diff saved to https://phabricator.wikimedia.org/P89003 and previous config saved to /var/cache/conftool/dbconfig/20260224-105651-marostegui.json
2026-02-24 10:56:56 <stashbot> T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786
2026-02-24 10:57:03 <logmsgbot> !log dpogorzelski@deploy2002 helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
2026-02-24 10:57:05 <logmsgbot> fceratto@cumin1003 makevm (PID 606166) is awaiting input
2026-02-24 10:57:20 <logmsgbot> !log dpogorzelski@deploy2002 helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
2026-02-24 10:58:57 <logmsgbot> !log dpogorzelski@cumin1003 END (PASS) - Cookbook sre.k8s.wipe-cluster (exit_code=0) Wipe the K8s cluster ml-serve-eqiad: Kubernetes upgrade
2026-02-24 10:59:15 <logmsgbot> !log dpogorzelski@cumin1003 START - Cookbook sre.k8s.pool-depool-cluster pool all services in eqiad/ml-serve-eqiad: maintenance
2026-02-24 11:00:05 <jouncebot> Deploy window MediaWiki infrastructure (UTC mid-day) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260224T1100)
2026-02-24 11:00:12 <logmsgbot> !log dpogorzelski@cumin1003 END (PASS) - Cookbook sre.k8s.pool-depool-cluster (exit_code=0) pool all services in eqiad/ml-serve-eqiad: maintenance
2026-02-24 11:00:17 <logmsgbot> !log dpogorzelski@cumin1003 conftool action : set/pooled=true; selector: dnsdisc=recommendation-api,name=eqiad
2026-02-24 11:00:23 <logmsgbot> !log dpogorzelski@cumin1003 conftool action : set/pooled=true; selector: dnsdisc=recommendation-api,name=codfw
2026-02-24 11:01:39 <icinga-wm> PROBLEM - PyBal backends health check on lvs2013 is CRITICAL: PYBAL CRITICAL - CRITICAL - wdqs-main_443: Servers wdqs2008.codfw.wmnet, wdqs2012.codfw.wmnet, wdqs2013.codfw.wmnet, wdqs2015.codfw.wmnet, wdqs2022.codfw.wmnet are marked down but pooled https://wikitech.wikimedia.org/wiki/PyBal
2026-02-24 11:01:57 <icinga-wm> PROBLEM - PyBal backends health check on lvs2014 is CRITICAL: PYBAL CRITICAL - CRITICAL - wdqs-main_443: Servers wdqs2012.codfw.wmnet, wdqs2013.codfw.wmnet, wdqs2015.codfw.wmnet, wdqs2022.codfw.wmnet are marked down but pooled https://wikitech.wikimedia.org/wiki/PyBal
2026-02-24 11:02:41 <wikibugs> (CR) Muehlenhoff: Inform about gitlab profile updating quirks (1 comment) [software/bitu] - https://gerrit.wikimedia.org/r/1242389 (https://phabricator.wikimedia.org/T416898) (owner: Slyngshede)
2026-02-24 11:04:30 <wikibugs> SRE, Infrastructure-Foundations, ServiceOps new, ServiceOps-Datastores: Upgrade Kafka to version 3.5 - https://phabricator.wikimedia.org/T416669#11644404 (JMeybohm)
2026-02-24 11:07:39 <icinga-wm> RECOVERY - PyBal backends health check on lvs2013 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal
2026-02-24 11:07:57 <icinga-wm> RECOVERY - PyBal backends health check on lvs2014 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal
2026-02-24 11:08:29 <wikibugs> (PS1) JMeybohm: sre.k8s.pool-depool-node: Fix type annotation [cookbooks] - https://gerrit.wikimedia.org/r/1243075 (https://phabricator.wikimedia.org/T410537)
2026-02-24 11:12:00 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1252', diff saved to https://phabricator.wikimedia.org/P89005 and previous config saved to /var/cache/conftool/dbconfig/20260224-111159-marostegui.json
2026-02-24 11:12:24 <logmsgbot> !log mvernon@cumin2002 conftool action : set/weight=40; selector: service=apus,name=apus-fe2004.codfw.wmnet
2026-02-24 11:12:39 <icinga-wm> PROBLEM - PyBal backends health check on lvs2013 is CRITICAL: PYBAL CRITICAL - CRITICAL - wdqs-main_443: Servers wdqs2013.codfw.wmnet, wdqs2015.codfw.wmnet, wdqs2022.codfw.wmnet are marked down but pooled https://wikitech.wikimedia.org/wiki/PyBal
2026-02-24 11:12:57 <icinga-wm> PROBLEM - PyBal backends health check on lvs2014 is CRITICAL: PYBAL CRITICAL - CRITICAL - wdqs-main_443: Servers wdqs2013.codfw.wmnet, wdqs2015.codfw.wmnet, wdqs2022.codfw.wmnet are marked down but pooled https://wikitech.wikimedia.org/wiki/PyBal
2026-02-24 11:13:20 <logmsgbot> !log slyngshede@cumin1003 START - Cookbook sre.hosts.reimage for host cp2045.codfw.wmnet with OS trixie
2026-02-24 11:13:43 <logmsgbot> !log mvernon@cumin2002 conftool action : set/weight=40; selector: service=apus,name=apus-fe2005.codfw.wmnet
2026-02-24 11:13:55 <logmsgbot> !log mvernon@cumin2002 conftool action : set/pooled=yes; selector: service=apus,name=apus-fe2004.codfw.wmnet
2026-02-24 11:14:01 <logmsgbot> !log mvernon@cumin2002 conftool action : set/pooled=yes; selector: service=apus,name=apus-fe2005.codfw.wmnet
2026-02-24 11:14:03 <logmsgbot> !log dpogorzelski@cumin1003 START - Cookbook sre.k8s.pool-depool-cluster pool all services in eqiad/ml-serve-eqiad: maintenance
2026-02-24 11:14:03 <logmsgbot> !log dpogorzelski@cumin1003 END (PASS) - Cookbook sre.k8s.pool-depool-cluster (exit_code=0) pool all services in eqiad/ml-serve-eqiad: maintenance
2026-02-24 11:14:43 <wikibugs> (PS2) Fabfur: hiera: test haproxy 3.0 on cp7001 [puppet] - https://gerrit.wikimedia.org/r/1242427
2026-02-24 11:15:18 <wikibugs> (CR) Muehlenhoff: [C:+2] Run IDM spec tests on Bookworm/Trixie [puppet] - https://gerrit.wikimedia.org/r/1240840 (owner: Muehlenhoff)
2026-02-24 11:17:45 <wikibugs> SRE, SRE-Access-Requests, Gerrit-Privilege-Requests, Release-Engineering-Team, Security-Team: Request membership in wmf-deployment group for Rsilvola - https://phabricator.wikimedia.org/T418004#11644447 (IBerker-WMF) As Riku's manager, I approve.
2026-02-24 11:20:37 <icinga-wm> RECOVERY - PyBal backends health check on lvs2013 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal
2026-02-24 11:21:31 <Emperor> !log depool moss-fe200{1,2} prep for decommissioning T416387
2026-02-24 11:21:35 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2026-02-24 11:21:36 <stashbot> T416387: Q3:rack/setup/install apus-fe200[4-5] - https://phabricator.wikimedia.org/T416387
2026-02-24 11:24:15 <wikibugs> (CR) Jcrespo: [C:+1] apus: remove two codfw frontends for decom [puppet] - https://gerrit.wikimedia.org/r/1243067 (https://phabricator.wikimedia.org/T416387) (owner: MVernon)
2026-02-24 11:24:37 <icinga-wm> PROBLEM - PyBal backends health check on lvs2013 is CRITICAL: PYBAL CRITICAL - CRITICAL - wdqs-main_443: Servers wdqs2008.codfw.wmnet, wdqs2010.codfw.wmnet, wdqs2013.codfw.wmnet, wdqs2015.codfw.wmnet, wdqs2022.codfw.wmnet are marked down but pooled https://wikitech.wikimedia.org/wiki/PyBal
2026-02-24 11:24:56 <wikibugs> (CR) MVernon: [C:+2] apus: remove two codfw frontends for decom [puppet] - https://gerrit.wikimedia.org/r/1243067 (https://phabricator.wikimedia.org/T416387) (owner: MVernon)
2026-02-24 11:26:28 <logmsgbot> fceratto@cumin1003 makevm (PID 606166) is awaiting input
2026-02-24 11:26:57 <icinga-wm> RECOVERY - PyBal backends health check on lvs2014 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal
2026-02-24 11:27:09 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1252', diff saved to https://phabricator.wikimedia.org/P89006 and previous config saved to /var/cache/conftool/dbconfig/20260224-112708-marostegui.json
2026-02-24 11:27:37 <icinga-wm> RECOVERY - PyBal backends health check on lvs2013 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal
2026-02-24 11:28:07 <wikibugs> (PS1) Ayounsi: [WIP] Add depool strategy for rack depool cookbook [puppet] - https://gerrit.wikimedia.org/r/1243077 (https://phabricator.wikimedia.org/T327300)
2026-02-24 11:28:29 <wikibugs> (CR) Muehlenhoff: [C:+2] puppetdb: Drop firewall rule for access to Puppet 5 servers [puppet] - https://gerrit.wikimedia.org/r/1239647 (https://phabricator.wikimedia.org/T365798) (owner: Muehlenhoff)
2026-02-24 11:29:25 <logmsgbot> !log mvernon@cumin2002 START - Cookbook sre.hosts.decommission for hosts moss-fe[2001-2002].codfw.wmnet
2026-02-24 11:32:39 <icinga-wm> PROBLEM - PyBal backends health check on lvs2013 is CRITICAL: PYBAL CRITICAL - CRITICAL - wdqs-main_443: Servers wdqs2011.codfw.wmnet are marked down but pooled https://wikitech.wikimedia.org/wiki/PyBal
2026-02-24 11:33:57 <wikibugs> (PS2) Slyngshede: Inform about gitlab profile updating quirks [software/bitu] - https://gerrit.wikimedia.org/r/1242389 (https://phabricator.wikimedia.org/T416898)
2026-02-24 11:34:06 <wikibugs> (CR) Slyngshede: Inform about gitlab profile updating quirks (1 comment) [software/bitu] - https://gerrit.wikimedia.org/r/1242389 (https://phabricator.wikimedia.org/T416898) (owner: Slyngshede)
2026-02-24 11:36:04 <logmsgbot> !log mvernon@cumin2002 START - Cookbook sre.dns.netbox
2026-02-24 11:36:58 <icinga-wm> PROBLEM - PyBal backends health check on lvs2014 is CRITICAL: PYBAL CRITICAL - CRITICAL - wdqs-main_443: Servers wdqs2007.codfw.wmnet, wdqs2012.codfw.wmnet, wdqs2011.codfw.wmnet, wdqs2022.codfw.wmnet are marked down but pooled https://wikitech.wikimedia.org/wiki/PyBal
2026-02-24 11:38:32 <logmsgbot> !log fceratto@cumin1003 END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
2026-02-24 11:38:36 <logmsgbot> !log fceratto@cumin1003 END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host dborch1003.eqiad.wmnet
2026-02-24 11:39:44 <icinga-wm> RECOVERY - PyBal backends health check on lvs2013 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal
2026-02-24 11:39:58 <icinga-wm> RECOVERY - PyBal backends health check on lvs2014 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal
2026-02-24 11:40:28 <logmsgbot> !log btullis@cumin1003 START - Cookbook sre.hosts.reboot-single for host an-worker1204.eqiad.wmnet
2026-02-24 11:42:17 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1252 (T415786)', diff saved to https://phabricator.wikimedia.org/P89007 and previous config saved to /var/cache/conftool/dbconfig/20260224-114217-marostegui.json
2026-02-24 11:42:21 <stashbot> T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786
2026-02-24 11:42:35 <logmsgbot> !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1260.eqiad.wmnet with reason: Maintenance
2026-02-24 11:42:43 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db1260 (T415786)', diff saved to https://phabricator.wikimedia.org/P89008 and previous config saved to /var/cache/conftool/dbconfig/20260224-114242-marostegui.json
2026-02-24 11:45:09 <logmsgbot> mvernon@cumin2002 decommission (PID 2599856) is awaiting input
2026-02-24 11:47:17 <jinxer-wm> FIRING: [3x] ProbeDown: Service wdqs2007:443 has failed probes (http_wdqs_main_external_search_sparql_endpoint_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
2026-02-24 11:48:21 <logmsgbot> !log btullis@cumin1003 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1204.eqiad.wmnet
2026-02-24 11:49:09 <logmsgbot> mvernon@cumin2002 decommission (PID 2599856) is awaiting input
2026-02-24 11:52:02 <logmsgbot> !log slyngshede@cumin1003 END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp2045.codfw.wmnet with OS trixie
2026-02-24 11:52:17 <jinxer-wm> FIRING: [6x] ProbeDown: Service wdqs2007:443 has failed probes (http_wdqs_main_external_search_sparql_endpoint_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
2026-02-24 11:52:46 <logmsgbot> !log slyngshede@cumin1003 START - Cookbook sre.hosts.reimage for host cp2045.codfw.wmnet with OS trixie
2026-02-24 11:55:32 <wikibugs> (PS5) Tiziano Fogli: ldap_users_sync.py: add non-blocking errors handling [puppet] - https://gerrit.wikimedia.org/r/1243063 (https://phabricator.wikimedia.org/T418118)
2026-02-24 11:55:48 <wikibugs> (PS1) Tiziano Fogli: ldap_users_sync.py: format code [puppet] - https://gerrit.wikimedia.org/r/1243062 (https://phabricator.wikimedia.org/T418118)
2026-02-24 12:01:44 <wikibugs> (CR) Muehlenhoff: [C:+1] "Look good!" [software/bitu] - https://gerrit.wikimedia.org/r/1242389 (https://phabricator.wikimedia.org/T416898) (owner: Slyngshede)
2026-02-24 12:02:49 <logmsgbot> mvernon@cumin2002 decommission (PID 2599856) is awaiting input
2026-02-24 12:03:05 <wikibugs> (PS1) Muehlenhoff: Remove various Hiera files only necessary for Puppet 5 [puppet] - https://gerrit.wikimedia.org/r/1243087 (https://phabricator.wikimedia.org/T365798)
2026-02-24 12:03:16 <wikibugs> (CR) Clément Goubert: [C:+1] sre.k8s.pool-depool-node: Fix type annotation [cookbooks] - https://gerrit.wikimedia.org/r/1243075 (https://phabricator.wikimedia.org/T410537) (owner: JMeybohm)
2026-02-24 12:05:26 <logmsgbot> !log slyngshede@cumin1003 END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp2045.codfw.wmnet with OS trixie
2026-02-24 12:05:45 <logmsgbot> !log slyngshede@cumin1003 START - Cookbook sre.hosts.reimage for host cp2045.codfw.wmnet with OS trixie
2026-02-24 12:05:51 <wikibugs> (CR) Clément Goubert: [C:+1] Switch math sandbox specs to plain wikimedia.org [mediawiki-config] - https://gerrit.wikimedia.org/r/1224253 (https://phabricator.wikimedia.org/T418188) (owner: Aaron Schulz)
2026-02-24 12:06:25 <wikibugs> (PS1) Muehlenhoff: Remove create_ecdsa_cert [puppet] - https://gerrit.wikimedia.org/r/1243090 (https://phabricator.wikimedia.org/T365798)
2026-02-24 12:07:05 <wikibugs> (CR) Muehlenhoff: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1243087 (https://phabricator.wikimedia.org/T365798) (owner: Muehlenhoff)
2026-02-24 12:10:22 <wikibugs> (CR) Filippo Giunchedi: [C:+1] "Neat" [puppet] - https://gerrit.wikimedia.org/r/1243069 (owner: Muehlenhoff)
2026-02-24 12:14:06 <wikibugs> (CR) Muehlenhoff: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1243090 (https://phabricator.wikimedia.org/T365798) (owner: Muehlenhoff)
2026-02-24 12:17:17 <jinxer-wm> FIRING: [6x] ProbeDown: Service wdqs2007:443 has failed probes (http_wdqs_main_external_search_sparql_endpoint_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
2026-02-24 12:23:11 <logmsgbot> !log dpogorzelski@cumin1003 START - Cookbook sre.k8s.pool-depool-cluster pool all services in codfw/ml-staging-codfw: maintenance
2026-02-24 12:23:11 <logmsgbot> !log dpogorzelski@cumin1003 END (PASS) - Cookbook sre.k8s.pool-depool-cluster (exit_code=0) pool all services in codfw/ml-staging-codfw: maintenance
2026-02-24 12:24:48 <wikibugs> (CR) Muehlenhoff: [C:+2] puppetserver: Update two hooks to the variants from the puppetserver module [puppet] - https://gerrit.wikimedia.org/r/1240924 (https://phabricator.wikimedia.org/T365798) (owner: Muehlenhoff)
2026-02-24 12:28:25 <wikibugs> (PS1) Muehlenhoff: Revert "puppetserver: Update two hooks to the variants from the puppetserver module" [puppet] - https://gerrit.wikimedia.org/r/1243100
2026-02-24 12:29:17 <wikibugs> (CR) Muehlenhoff: [C:+2] Revert "puppetserver: Update two hooks to the variants from the puppetserver module" [puppet] - https://gerrit.wikimedia.org/r/1243100 (owner: Muehlenhoff)
2026-02-24 12:29:31 <logmsgbot> !log dpogorzelski@deploy2002 helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'edit-check' for release 'main' .
2026-02-24 12:29:45 <logmsgbot> !log dpogorzelski@deploy2002 helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'logo-detection' for release 'main' .
2026-02-24 12:30:23 <logmsgbot> !log dpogorzelski@deploy2002 helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'article-models' for release 'main' .
2026-02-24 12:30:40 <logmsgbot> !log dpogorzelski@deploy2002 helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'article-descriptions' for release 'main' .
2026-02-24 12:30:55 <logmsgbot> !log dpogorzelski@deploy2002 helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
2026-02-24 12:32:18 <logmsgbot> !log dpogorzelski@deploy2002 helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
2026-02-24 12:32:37 <logmsgbot> !log dpogorzelski@deploy2002 helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'edit-check' for release 'main' .
2026-02-24 12:32:58 <wikibugs> (PS1) JMeybohm: sre.k8s.pool-depool-node: Support racks without L2 adjacency to LVS [cookbooks] - https://gerrit.wikimedia.org/r/1243101 (https://phabricator.wikimedia.org/T418142)
2026-02-24 12:33:39 <logmsgbot> !log dpogorzelski@deploy2002 helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
2026-02-24 12:33:48 <wikibugs> (PS4) Daniel Kinzler: rest gateway: expose headers [deployment-charts] - https://gerrit.wikimedia.org/r/1240388 (https://phabricator.wikimedia.org/T417780)
2026-02-24 12:34:41 <wikibugs> (PS5) Daniel Kinzler: rest gateway: expose headers [deployment-charts] - https://gerrit.wikimedia.org/r/1240388 (https://phabricator.wikimedia.org/T417780)
2026-02-24 12:35:02 <logmsgbot> !log aikochou@deploy2002 helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
2026-02-24 12:35:05 <wikibugs> (CR) Daniel Kinzler: "Done" [deployment-charts] - https://gerrit.wikimedia.org/r/1240388 (https://phabricator.wikimedia.org/T417780) (owner: Daniel Kinzler)
2026-02-24 12:35:11 <logmsgbot> !log dpogorzelski@deploy2002 helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
2026-02-24 12:35:24 <logmsgbot> !log dpogorzelski@deploy2002 helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
2026-02-24 12:36:04 <logmsgbot> !log dpogorzelski@deploy2002 helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revision-models' for release 'main' .
2026-02-24 12:36:07 <logmsgbot> !log aikochou@deploy2002 helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
2026-02-24 12:36:31 <logmsgbot> !log dpogorzelski@deploy2002 helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'readability' for release 'main' .
2026-02-24 12:37:04 <logmsgbot> !log dpogorzelski@deploy2002 helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
2026-02-24 12:37:17 <logmsgbot> !log dpogorzelski@deploy2002 helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
2026-02-24 12:37:38 <logmsgbot> !log dpogorzelski@deploy2002 helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
2026-02-24 12:37:59 <logmsgbot> !log dpogorzelski@deploy2002 helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
2026-02-24 12:38:19 <logmsgbot> !log dpogorzelski@deploy2002 helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
2026-02-24 12:38:29 <logmsgbot> !log dpogorzelski@deploy2002 helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
2026-02-24 12:38:43 <logmsgbot> !log dpogorzelski@deploy2002 helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
2026-02-24 12:40:14 <wikibugs> (PS4) Santiago Faci: test-kitchen kubernetes chart: New config property [deployment-charts] - https://gerrit.wikimedia.org/r/1242438 (https://phabricator.wikimedia.org/T418088)
2026-02-24 12:40:18 <logmsgbot> !log dpogorzelski@deploy2002 helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'llm' for release 'main' .
2026-02-24 12:41:58 <wikibugs> (CR) Btullis: "Thanks. Good question. I've opted to do both at the same time here. The topology change affects the an-master hosts mainly and requires a " [puppet] - https://gerrit.wikimedia.org/r/1242513 (https://phabricator.wikimedia.org/T414948) (owner: Btullis)
2026-02-24 12:42:12 <wikibugs> (CR) Matthieulec: [C:+1] "Thanks for catching that!" [cookbooks] - https://gerrit.wikimedia.org/r/1243075 (https://phabricator.wikimedia.org/T410537) (owner: JMeybohm)
2026-02-24 12:43:38 <wikibugs> (CR) Btullis: [C:+2] Add dbstore1010 to site.pp and preseed.yaml [puppet] - https://gerrit.wikimedia.org/r/1242533 (https://phabricator.wikimedia.org/T417948) (owner: Btullis)
2026-02-24 12:46:20 <wikibugs> (CR) Btullis: Add the new druid-internal servers to site.pp and preseed.yaml (2 comments) [puppet] - https://gerrit.wikimedia.org/r/1242529 (https://phabricator.wikimedia.org/T417430) (owner: Btullis)
2026-02-24 12:46:50 <wikibugs> (CR) Btullis: "recheck" [puppet] - https://gerrit.wikimedia.org/r/1242529 (https://phabricator.wikimedia.org/T417430) (owner: Btullis)
2026-02-24 12:47:42 <logmsgbot> mvernon@cumin2002 decommission (PID 2599856) is awaiting input
2026-02-24 12:48:17 <wikibugs> (CR) Btullis: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1242529 (https://phabricator.wikimedia.org/T417430) (owner: Btullis)
2026-02-24 12:51:04 <wikibugs> (CR) David Caro: "This is breaking puppetdb servers in cloud (tools/toolsbeta), I'll revert and then we can look at it more camly" [puppet] - https://gerrit.wikimedia.org/r/1239647 (https://phabricator.wikimedia.org/T365798) (owner: Muehlenhoff)
2026-02-24 12:51:30 <wikibugs> (PS1) David Caro: Revert "puppetdb: Drop firewall rule for access to Puppet 5 servers" [puppet] - https://gerrit.wikimedia.org/r/1243106
2026-02-24 12:51:56 <wikibugs> (PS1) Muehlenhoff: Reapply "puppetserver: Update two hooks to the variants from the puppetserver module" [puppet] - https://gerrit.wikimedia.org/r/1243107 (https://phabricator.wikimedia.org/T365798)
2026-02-24 12:52:34 <wikibugs> (CR) CI reject: [V:-1] Reapply "puppetserver: Update two hooks to the variants from the puppetserver module" [puppet] - https://gerrit.wikimedia.org/r/1243107 (https://phabricator.wikimedia.org/T365798) (owner: Muehlenhoff)
2026-02-24 12:52:52 <logmsgbot> !log fceratto@dns1004 START - running authdns-update
2026-02-24 12:54:00 <wikibugs> (PS2) David Caro: Revert "puppetdb: Drop firewall rule for access to Puppet 5 servers" [puppet] - https://gerrit.wikimedia.org/r/1243106 (https://phabricator.wikimedia.org/T365798)
2026-02-24 12:54:20 <wikibugs> (CR) David Caro: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1243106 (https://phabricator.wikimedia.org/T365798) (owner: David Caro)
2026-02-24 12:54:36 <wikibugs> (CR) Clément Goubert: [C:+1] sre.k8s.pool-depool-node: Fix type annotation (1 comment) [cookbooks] - https://gerrit.wikimedia.org/r/1243075 (https://phabricator.wikimedia.org/T410537) (owner: JMeybohm)
2026-02-24 12:55:47 <wikibugs> (CR) Muehlenhoff: [C:+1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/1243106 (https://phabricator.wikimedia.org/T365798) (owner: David Caro)
2026-02-24 12:55:51 <logmsgbot> !log slyngshede@cumin1003 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp2045.codfw.wmnet with OS trixie
2026-02-24 12:56:25 <wikibugs> (PS2) Arnaudb: gerrit: limit GerritHAProxyServiceUnavailable scope [alerts] - https://gerrit.wikimedia.org/r/1243102 (https://phabricator.wikimedia.org/T418084)
2026-02-24 12:56:25 <wikibugs> (CR) Arnaudb: "Side effect: the newly added rule was triggering `AlertLintProblem` because we don't expose that metric on all sites (https://w.wiki/HyeU)" [alerts] - https://gerrit.wikimedia.org/r/1243102 (https://phabricator.wikimedia.org/T418084) (owner: Arnaudb)
2026-02-24 12:56:40 <wikibugs> (CR) Kamila Součková: [C:+1] rest-gateway: disable external_services for minikube [deployment-charts] - https://gerrit.wikimedia.org/r/1242428 (https://phabricator.wikimedia.org/T414333) (owner: Daniel Kinzler)
2026-02-24 12:58:56 <wikibugs> (CR) Kamila Součková: [C:+1] rest-gateway: use MINUTE limits in staging [deployment-charts] - https://gerrit.wikimedia.org/r/1239669 (owner: Daniel Kinzler)
2026-02-24 13:00:05 <jouncebot> Deploy window Mobileapps/RESTBase/Wikifeeds (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260224T1300)
2026-02-24 13:00:15 <wikibugs> (PS2) Btullis: Move a second journalnode to a newer host [puppet] - https://gerrit.wikimedia.org/r/1242511 (https://phabricator.wikimedia.org/T414948)
2026-02-24 13:01:40 <wikibugs> (PS1) Dpogorzelski: ml-serve: fix istio/transparentproxy config [deployment-charts] - https://gerrit.wikimedia.org/r/1243112
2026-02-24 13:01:51 <logmsgbot> !log fceratto@dns1004 START - running authdns-update
2026-02-24 13:02:17 <jinxer-wm> FIRING: [6x] ProbeDown: Service wdqs2007:443 has failed probes (http_wdqs_main_external_search_sparql_endpoint_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
2026-02-24 13:02:30 <wikibugs> (PS1) Muehlenhoff: puppetdb: Allow access for cloud puppetservers [puppet] - https://gerrit.wikimedia.org/r/1243113 (https://phabricator.wikimedia.org/T365798)
2026-02-24 13:02:43 <wikibugs> (CR) Arnaudb: [C:+2] gerrit: swap gerrit-replica and gerrit-spare [dns] - https://gerrit.wikimedia.org/r/1242268 (https://phabricator.wikimedia.org/T417247) (owner: Arnaudb)
2026-02-24 13:02:53 <logmsgbot> !log arnaudb@dns1004 START - running authdns-update
2026-02-24 13:04:01 <logmsgbot> !log fceratto@dns1004 START - running authdns-update
2026-02-24 13:04:46 <wikibugs> (CR) CI reject: [V:-1] puppetdb: Allow access for cloud puppetservers [puppet] - https://gerrit.wikimedia.org/r/1243113 (https://phabricator.wikimedia.org/T365798) (owner: Muehlenhoff)
2026-02-24 13:04:46 <wikibugs> (CR) Arnaudb: [C:+2] gerrit: swap gerrit-spare and gerrit-replica [puppet] - https://gerrit.wikimedia.org/r/1242269 (https://phabricator.wikimedia.org/T406334) (owner: Arnaudb)
2026-02-24 13:05:14 <wikibugs> (CR) Arnaudb: [C:+2] gerrit: disable service on gerrit2002 to reimage [puppet] - https://gerrit.wikimedia.org/r/1242272 (https://phabricator.wikimedia.org/T417247) (owner: Arnaudb)
2026-02-24 13:06:25 <jinxer-wm> RESOLVED: SystemdUnitFailed: send_tile_invalidations.service on maps1011:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
2026-02-24 13:06:28 <wikibugs> (PS2) Muehlenhoff: puppetdb: Allow access for cloud puppetservers [puppet] - https://gerrit.wikimedia.org/r/1243113 (https://phabricator.wikimedia.org/T365798)
2026-02-24 13:07:38 <logmsgbot> !log fceratto@dns1004 START - running authdns-update
2026-02-24 13:07:45 <wikibugs> (CR) Kamila Součková: [C:+1] "+1 with the "I'm fine with deploying this" hat, but I do not currently have the brain to check that the test functionality is equivalent. " [deployment-charts] - https://gerrit.wikimedia.org/r/1239972 (owner: Daniel Kinzler)
2026-02-24 13:08:43 <wikibugs> (CR) CI reject: [V:-1] puppetdb: Allow access for cloud puppetservers [puppet] - https://gerrit.wikimedia.org/r/1243113 (https://phabricator.wikimedia.org/T365798) (owner: Muehlenhoff)
2026-02-24 13:09:42 <jinxer-wm> FIRING: [4x] CoreRouterInterfaceDown: Core router interface down - cr1-codfw:et-1/0/2 (Transport: cr1-eqiad:et-1/1/2 (Arelion, IC-374549) {#12267}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown
2026-02-24 13:10:50 <wikibugs> (PS3) Muehlenhoff: puppetdb: Allow access for cloud puppetservers [puppet] - https://gerrit.wikimedia.org/r/1243113 (https://phabricator.wikimedia.org/T365798)
2026-02-24 13:11:16 <logmsgbot> !log fceratto@cumin1003 START - Cookbook sre.dns.netbox
2026-02-24 13:12:17 <jinxer-wm> FIRING: [10x] ProbeDown: Service wdqs2007:443 has failed probes (http_wdqs_main_external_search_sparql_endpoint_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
2026-02-24 13:13:25 <jinxer-wm> FIRING: [2x] SystemdUnitFailed: wmf_auto_restart_prometheus-blazegraph-exporter-wdqs-blazegraph.service on wdqs2008:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
2026-02-24 13:14:00 <logmsgbot> !log brouberol@deploy2002 helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
2026-02-24 13:14:20 <wikibugs> (CR) Muehlenhoff: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1243113 (https://phabricator.wikimedia.org/T365798) (owner: Muehlenhoff)
2026-02-24 13:14:26 <logmsgbot> !log brouberol@deploy2002 helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
2026-02-24 13:14:32 <icinga-wm> PROBLEM - check if authdns-update was run after a change was merged to operations/dns.git on dns7001 is CRITICAL: Local zone files are NOT in sync with operations/dns.git (SHA: local is 092081508dde94683e62f13137da8749ac4dfc7c, dns.git is 3e0cdc75cf6c0cffabb6e1f0fa146fd2ac0f7fa5) https://wikitech.wikimedia.org/wiki/DNS%23authdns_update_run
2026-02-24 13:15:15 <logmsgbot> !log fceratto@cumin1003 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Deploy manual changes from netbox - fceratto@cumin1003"
2026-02-24 13:15:34 <icinga-wm> PROBLEM - check if authdns-update was run after a change was merged to operations/dns.git on dns2004 is CRITICAL: Local zone files are NOT in sync with operations/dns.git (SHA: local is 092081508dde94683e62f13137da8749ac4dfc7c, dns.git is 3e0cdc75cf6c0cffabb6e1f0fa146fd2ac0f7fa5) https://wikitech.wikimedia.org/wiki/DNS%23authdns_update_run
2026-02-24 13:16:25 <wikibugs> (CR) David Caro: [C:+1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/1243113 (https://phabricator.wikimedia.org/T365798) (owner: Muehlenhoff)
2026-02-24 13:16:42 <icinga-wm> PROBLEM - check if authdns-update was run after a change was merged to operations/dns.git on dns2005 is CRITICAL: Local zone files are NOT in sync with operations/dns.git (SHA: local is 092081508dde94683e62f13137da8749ac4dfc7c, dns.git is 3e0cdc75cf6c0cffabb6e1f0fa146fd2ac0f7fa5) https://wikitech.wikimedia.org/wiki/DNS%23authdns_update_run
2026-02-24 13:16:42 <icinga-wm> PROBLEM - check if authdns-update was run after a change was merged to operations/dns.git on dns1006 is CRITICAL: Local zone files are NOT in sync with operations/dns.git (SHA: local is 092081508dde94683e62f13137da8749ac4dfc7c, dns.git is 3e0cdc75cf6c0cffabb6e1f0fa146fd2ac0f7fa5) https://wikitech.wikimedia.org/wiki/DNS%23authdns_update_run
2026-02-24 13:16:42 <icinga-wm> PROBLEM - check if authdns-update was run after a change was merged to operations/dns.git on dns2006 is CRITICAL: Local zone files are NOT in sync with operations/dns.git (SHA: local is 092081508dde94683e62f13137da8749ac4dfc7c, dns.git is 3e0cdc75cf6c0cffabb6e1f0fa146fd2ac0f7fa5) https://wikitech.wikimedia.org/wiki/DNS%23authdns_update_run
2026-02-24 13:16:42 <icinga-wm> PROBLEM - check if authdns-update was run after a change was merged to operations/dns.git on dns1005 is CRITICAL: Local zone files are NOT in sync with operations/dns.git (SHA: local is 092081508dde94683e62f13137da8749ac4dfc7c, dns.git is 3e0cdc75cf6c0cffabb6e1f0fa146fd2ac0f7fa5) https://wikitech.wikimedia.org/wiki/DNS%23authdns_update_run
2026-02-24 13:16:44 <icinga-wm> PROBLEM - check if authdns-update was run after a change was merged to operations/dns.git on dns4003 is CRITICAL: Local zone files are NOT in sync with operations/dns.git (SHA: local is 092081508dde94683e62f13137da8749ac4dfc7c, dns.git is 3e0cdc75cf6c0cffabb6e1f0fa146fd2ac0f7fa5) https://wikitech.wikimedia.org/wiki/DNS%23authdns_update_run
2026-02-24 13:16:44 <icinga-wm> PROBLEM - check if authdns-update was run after a change was merged to operations/dns.git on dns4004 is CRITICAL: Local zone files are NOT in sync with operations/dns.git (SHA: local is 092081508dde94683e62f13137da8749ac4dfc7c, dns.git is 3e0cdc75cf6c0cffabb6e1f0fa146fd2ac0f7fa5) https://wikitech.wikimedia.org/wiki/DNS%23authdns_update_run
2026-02-24 13:16:44 <icinga-wm> PROBLEM - check if authdns-update was run after a change was merged to operations/dns.git on dns3003 is CRITICAL: Local zone files are NOT in sync with operations/dns.git (SHA: local is 092081508dde94683e62f13137da8749ac4dfc7c, dns.git is 3e0cdc75cf6c0cffabb6e1f0fa146fd2ac0f7fa5) https://wikitech.wikimedia.org/wiki/DNS%23authdns_update_run
2026-02-24 13:16:44 <icinga-wm> PROBLEM - check if authdns-update was run after a change was merged to operations/dns.git on dns3004 is CRITICAL: Local zone files are NOT in sync with operations/dns.git (SHA: local is 092081508dde94683e62f13137da8749ac4dfc7c, dns.git is 3e0cdc75cf6c0cffabb6e1f0fa146fd2ac0f7fa5) https://wikitech.wikimedia.org/wiki/DNS%23authdns_update_run
2026-02-24 13:16:44 <icinga-wm> PROBLEM - check if authdns-update was run after a change was merged to operations/dns.git on dns6001 is CRITICAL: Local zone files are NOT in sync with operations/dns.git (SHA: local is 092081508dde94683e62f13137da8749ac4dfc7c, dns.git is 3e0cdc75cf6c0cffabb6e1f0fa146fd2ac0f7fa5) https://wikitech.wikimedia.org/wiki/DNS%23authdns_update_run
2026-02-24 13:16:45 <icinga-wm> PROBLEM - check if authdns-update was run after a change was merged to operations/dns.git on dns6002 is CRITICAL: Local zone files are NOT in sync with operations/dns.git (SHA: local is 092081508dde94683e62f13137da8749ac4dfc7c, dns.git is 3e0cdc75cf6c0cffabb6e1f0fa146fd2ac0f7fa5) https://wikitech.wikimedia.org/wiki/DNS%23authdns_update_run
2026-02-24 13:16:46 <icinga-wm> PROBLEM - check if authdns-update was run after a change was merged to operations/dns.git on dns5003 is CRITICAL: Local zone files are NOT in sync with operations/dns.git (SHA: local is 092081508dde94683e62f13137da8749ac4dfc7c, dns.git is 3e0cdc75cf6c0cffabb6e1f0fa146fd2ac0f7fa5) https://wikitech.wikimedia.org/wiki/DNS%23authdns_update_run
2026-02-24 13:16:46 <icinga-wm> PROBLEM - check if authdns-update was run after a change was merged to operations/dns.git on dns5004 is CRITICAL: Local zone files are NOT in sync with operations/dns.git (SHA: local is 092081508dde94683e62f13137da8749ac4dfc7c, dns.git is 3e0cdc75cf6c0cffabb6e1f0fa146fd2ac0f7fa5) https://wikitech.wikimedia.org/wiki/DNS%23authdns_update_run
2026-02-24 13:16:46 <icinga-wm> PROBLEM - check if authdns-update was run after a change was merged to operations/dns.git on dns7002 is CRITICAL: Local zone files are NOT in sync with operations/dns.git (SHA: local is 092081508dde94683e62f13137da8749ac4dfc7c, dns.git is 3e0cdc75cf6c0cffabb6e1f0fa146fd2ac0f7fa5) https://wikitech.wikimedia.org/wiki/DNS%23authdns_update_run
2026-02-24 13:18:19 <logmsgbot> fceratto@cumin1003 netbox (PID 747116) is awaiting input
2026-02-24 13:20:01 <logmsgbot> !log arnaudb@cumin1003 START - Cookbook sre.dns.wipe-cache gerrit-replica.discovery.wmnet gerrit-spare.discovery.wmnet on all recursors
2026-02-24 13:20:05 <logmsgbot> !log arnaudb@cumin1003 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) gerrit-replica.discovery.wmnet gerrit-spare.discovery.wmnet on all recursors
2026-02-24 13:20:47 <logmsgbot> !log dpogorzelski@deploy2002 helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
2026-02-24 13:21:50 <logmsgbot> !log dpogorzelski@deploy2002 helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
2026-02-24 13:22:17 <jinxer-wm> FIRING: [12x] ProbeDown: Service wdqs2007:443 has failed probes (http_wdqs_main_external_search_sparql_endpoint_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
2026-02-24 13:23:46 <jinxer-wm> FIRING: GerritReplicationUnavailable: Gerrit replication on gerrit.wikimedia.org:443 is lagging for more than 15 minutes. - https://wikitech.wikimedia.org/wiki/Gerrit/Operations#GerritReplicationUnavailable - https://grafana.wikimedia.org/goto/8VXsGHdDR?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DGerritReplicationUnavailable
2026-02-24 13:24:28 <logmsgbot> !log fceratto@cumin1003 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Deploy manual changes from netbox - fceratto@cumin1003"
2026-02-24 13:24:28 <logmsgbot> !log fceratto@cumin1003 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
2026-02-24 13:24:32 <logmsgbot> !log fceratto@cumin1003 START - Cookbook sre.dns.netbox
2026-02-24 13:25:03 <logmsgbot> !log fceratto@dns1004 START - running authdns-update
2026-02-24 13:25:56 <wikibugs> ('PS1) ''Arnaudb: gerrit: fix discovery record [dns] - ''https://gerrit.wikimedia.org/r/1243119 (https://phabricator.wikimedia.org/T417247)'
2026-02-24 13:26:08 <logmsgbot> !log arnaudb@dns1004 START - running authdns-update
2026-02-24 13:26:22 <logmsgbot> !log fceratto@dns1004 END - running authdns-update
2026-02-24 13:26:31 <wikibugs> ('CR) ''Santiago Faci: test-kitchen kubernetes chart: New config property (''1 comment) [deployment-charts] - ''https://gerrit.wikimedia.org/r/1242438 (https://phabricator.wikimedia.org/T418088) (owner: ''Santiago Faci)'
2026-02-24 13:27:04 <logmsgbot> !log fceratto@cumin1003 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
2026-02-24 13:27:32 <logmsgbot> !log arnaudb@dns1004 END - running authdns-update
2026-02-24 13:27:52 <logmsgbot> !log arnaudb@cumin1003 START - Cookbook sre.dns.wipe-cache gerrit-replica.discovery.wmnet gerrit-spare.discovery.wmnet on all recursors
2026-02-24 13:27:55 <logmsgbot> !log arnaudb@cumin1003 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) gerrit-replica.discovery.wmnet gerrit-spare.discovery.wmnet on all recursors
2026-02-24 13:29:32 <icinga-wm> RECOVERY - check if authdns-update was run after a change was merged to operations/dns.git on dns7001 is OK: Local zone files and operations/dns.git are in sync https://wikitech.wikimedia.org/wiki/DNS%23authdns_update_run
2026-02-24 13:29:46 <logmsgbot> !log mvernon@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: moss-fe[2001-2002].codfw.wmnet decommissioned, removing all IPs except the asset tag one - mvernon@cumin2002"
2026-02-24 13:29:52 <logmsgbot> !log mvernon@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: moss-fe[2001-2002].codfw.wmnet decommissioned, removing all IPs except the asset tag one - mvernon@cumin2002"
2026-02-24 13:29:52 <logmsgbot> !log mvernon@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
2026-02-24 13:29:53 <logmsgbot> !log mvernon@cumin2002 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts moss-fe[2001-2002].codfw.wmnet
2026-02-24 13:30:20 <logmsgbot> !log arnaudb@cumin1003 START - Cookbook sre.hosts.reimage for host gerrit2002.wikimedia.org with OS bookworm
2026-02-24 13:30:34 <icinga-wm> RECOVERY - check if authdns-update was run after a change was merged to operations/dns.git on dns2004 is OK: Local zone files and operations/dns.git are in sync https://wikitech.wikimedia.org/wiki/DNS%23authdns_update_run
2026-02-24 13:30:52 <wikibugs> ('CR) ''Muehlenhoff: [C:''+2] puppetdb: Allow access for cloud puppetservers [puppet] - ''https://gerrit.wikimedia.org/r/1243113 (https://phabricator.wikimedia.org/T365798) (owner: ''Muehlenhoff)'
2026-02-24 13:31:42 <icinga-wm> RECOVERY - check if authdns-update was run after a change was merged to operations/dns.git on dns2005 is OK: Local zone files and operations/dns.git are in sync https://wikitech.wikimedia.org/wiki/DNS%23authdns_update_run
2026-02-24 13:31:42 <icinga-wm> RECOVERY - check if authdns-update was run after a change was merged to operations/dns.git on dns1006 is OK: Local zone files and operations/dns.git are in sync https://wikitech.wikimedia.org/wiki/DNS%23authdns_update_run
2026-02-24 13:31:42 <icinga-wm> RECOVERY - check if authdns-update was run after a change was merged to operations/dns.git on dns2006 is OK: Local zone files and operations/dns.git are in sync https://wikitech.wikimedia.org/wiki/DNS%23authdns_update_run
2026-02-24 13:31:42 <icinga-wm> RECOVERY - check if authdns-update was run after a change was merged to operations/dns.git on dns1005 is OK: Local zone files and operations/dns.git are in sync https://wikitech.wikimedia.org/wiki/DNS%23authdns_update_run
2026-02-24 13:31:44 <icinga-wm> RECOVERY - check if authdns-update was run after a change was merged to operations/dns.git on dns4003 is OK: Local zone files and operations/dns.git are in sync https://wikitech.wikimedia.org/wiki/DNS%23authdns_update_run
2026-02-24 13:31:44 <icinga-wm> RECOVERY - check if authdns-update was run after a change was merged to operations/dns.git on dns4004 is OK: Local zone files and operations/dns.git are in sync https://wikitech.wikimedia.org/wiki/DNS%23authdns_update_run
2026-02-24 13:31:44 <icinga-wm> RECOVERY - check if authdns-update was run after a change was merged to operations/dns.git on dns3004 is OK: Local zone files and operations/dns.git are in sync https://wikitech.wikimedia.org/wiki/DNS%23authdns_update_run
2026-02-24 13:31:44 <icinga-wm> RECOVERY - check if authdns-update was run after a change was merged to operations/dns.git on dns3003 is OK: Local zone files and operations/dns.git are in sync https://wikitech.wikimedia.org/wiki/DNS%23authdns_update_run
2026-02-24 13:31:44 <icinga-wm> RECOVERY - check if authdns-update was run after a change was merged to operations/dns.git on dns6001 is OK: Local zone files and operations/dns.git are in sync https://wikitech.wikimedia.org/wiki/DNS%23authdns_update_run
2026-02-24 13:31:45 <icinga-wm> RECOVERY - check if authdns-update was run after a change was merged to operations/dns.git on dns6002 is OK: Local zone files and operations/dns.git are in sync https://wikitech.wikimedia.org/wiki/DNS%23authdns_update_run
2026-02-24 13:31:46 <icinga-wm> RECOVERY - check if authdns-update was run after a change was merged to operations/dns.git on dns5004 is OK: Local zone files and operations/dns.git are in sync https://wikitech.wikimedia.org/wiki/DNS%23authdns_update_run
2026-02-24 13:31:46 <icinga-wm> RECOVERY - check if authdns-update was run after a change was merged to operations/dns.git on dns5003 is OK: Local zone files and operations/dns.git are in sync https://wikitech.wikimedia.org/wiki/DNS%23authdns_update_run
2026-02-24 13:31:46 <icinga-wm> RECOVERY - check if authdns-update was run after a change was merged to operations/dns.git on dns7002 is OK: Local zone files and operations/dns.git are in sync https://wikitech.wikimedia.org/wiki/DNS%23authdns_update_run
2026-02-24 13:34:12 <jinxer-wm> FIRING: HelmReleaseBadStatus: Helm release kserve/kserve on k8s-mlstaging@codfw in state failed - https://wikitech.wikimedia.org/wiki/Kubernetes/Deployments#Rolling_back_in_an_emergency - https://grafana.wikimedia.org/d/UT4GtK3nz?var-site=codfw&var-cluster=k8s-mlstaging&var-namespace=kserve - https://alerts.wikimedia.org/?q=alertname%3DHelmReleaseBadStatus
2026-02-24 13:38:43 <logmsgbot> !log slyngshede@cumin1003 START - Cookbook sre.hosts.reimage for host cp2045.codfw.wmnet with OS trixie
2026-02-24 13:40:22 <wikibugs> ('PS1) ''Filippo Giunchedi: pontoon: always fetch project name from keystone [puppet] - ''https://gerrit.wikimedia.org/r/1243125 (https://phabricator.wikimedia.org/T418236)'
2026-02-24 13:44:53 <logmsgbot> !log dpogorzelski@deploy2002 helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
2026-02-24 13:45:53 <logmsgbot> !log dpogorzelski@deploy2002 helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
2026-02-24 13:49:50 <logmsgbot> slyngshede@cumin1003 reimage (PID 775282) is awaiting input
2026-02-24 13:50:29 <logmsgbot> !log arnaudb@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on gerrit2002.wikimedia.org with reason: host reimage
2026-02-24 13:50:35 <wikibugs> ('PS2) ''Filippo Giunchedi: pontoon: always fetch project name from keystone [puppet] - ''https://gerrit.wikimedia.org/r/1243125 (https://phabricator.wikimedia.org/T418236)'
2026-02-24 13:53:38 <logmsgbot> !log slyngshede@cumin1003 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp2045.codfw.wmnet with OS trixie
2026-02-24 13:53:53 <wikibugs> ('PS2) ''JMeybohm: sre.k8s.pool-depool-node: Support racks without L2 adjacency to LVS [cookbooks] - ''https://gerrit.wikimedia.org/r/1243101 (https://phabricator.wikimedia.org/T418142)'
2026-02-24 13:54:47 <logmsgbot> !log arnaudb@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gerrit2002.wikimedia.org with reason: host reimage
2026-02-24 13:54:52 <wikibugs> ('PS1) ''Michael Große: feat: if Minerva personal menu is enabled, flip discovery site notice [extensions/GrowthExperiments] (wmf/1.46.0-wmf.17) - ''https://gerrit.wikimedia.org/r/1243127 (https://phabricator.wikimedia.org/T416656)'
2026-02-24 13:55:10 <wikibugs> ('PS2) ''Muehlenhoff: Reapply "Update two hooks to the variants from the puppetserver module" [puppet] - ''https://gerrit.wikimedia.org/r/1243107 (https://phabricator.wikimedia.org/T365798)'
2026-02-24 13:56:44 <wikibugs> ('CR) ''ScheduleDeploymentBot: "Scheduled for deployment in the [Tuesday, February 24 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deplo"; [extensions/GrowthExperiments] (wmf/1.46.0-wmf.17) - ''https://gerrit.wikimedia.org/r/1243127 (https://phabricator.wikimedia.org/T416656) (owner: ''Michael Große)'
2026-02-24 13:58:10 <wikibugs> ('PS3) ''JMeybohm: sre.k8s.pool-depool-node: Support racks without L2 adjacency to LVS [cookbooks] - ''https://gerrit.wikimedia.org/r/1243101 (https://phabricator.wikimedia.org/T418142)'
2026-02-24 13:58:21 <wikibugs> ('PS1) ''Muehlenhoff: Remove two spec tests [puppet] - ''https://gerrit.wikimedia.org/r/1243129'
2026-02-24 13:59:17 <wikibugs> 'SRE, ''Infrastructure-Foundations: Integrate Bookworm 12.13 point update - https://phabricator.wikimedia.org/T414205#11645098 (''MoritzMuehlenhoff)'
2026-02-24 14:00:05 <jouncebot> Lucas_WMDE, Urbanecm, and TheresNoTime: May I have your attention please! UTC afternoon backport window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260224T1400)
2026-02-24 14:00:05 <jouncebot> awight and MichaelG_WMF: A patch you scheduled for UTC afternoon backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
2026-02-24 14:00:11 <awight> Hi, I can deploy my config patch.
2026-02-24 14:00:14 <MichaelG_WMF> is here
2026-02-24 14:00:21 <urbanecm> awight: go ahead
2026-02-24 14:00:24 <urbanecm> MichaelG_WMF: and i can deploy for you
2026-02-24 14:00:33 <MichaelG_WMF> urbanecm: Thanks!
2026-02-24 14:00:43 <awight> ack :-)
2026-02-24 14:01:05 <wikibugs> ('CR) ''Clément Goubert: [C:''+1] sre.k8s.pool-depool-node: Support racks without L2 adjacency to LVS [cookbooks] - ''https://gerrit.wikimedia.org/r/1243101 (https://phabricator.wikimedia.org/T418142) (owner: ''JMeybohm)'
2026-02-24 14:01:20 <wikibugs> ('CR) ''CI reject: [V:''-1] feat: if Minerva personal menu is enabled, flip discovery site notice [extensions/GrowthExperiments] (wmf/1.46.0-wmf.17) - ''https://gerrit.wikimedia.org/r/1243127 (https://phabricator.wikimedia.org/T416656) (owner: ''Michael Große)'
2026-02-24 14:01:38 <MichaelG_WMF> I'll have a look at the CI failure
2026-02-24 14:01:51 <wikibugs> ('CR) ''TrainBranchBot: [C:''+2] "Approved by awight@deploy2002 using scap backport" [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1243047 (https://phabricator.wikimedia.org/T418209) (owner: ''Awight)'
2026-02-24 14:02:03 <MichaelG_WMF> git error, unrelated
2026-02-24 14:02:08 <wikibugs> ('CR) ''Clément Goubert: sre.k8s.pool-depool-node: Support racks without L2 adjacency to LVS [cookbooks] - ''https://gerrit.wikimedia.org/r/1243101 (https://phabricator.wikimedia.org/T418142) (owner: ''JMeybohm)'
2026-02-24 14:02:17 <wikibugs> ('CR) ''Michael Große: "recheck" [extensions/GrowthExperiments] (wmf/1.46.0-wmf.17) - ''https://gerrit.wikimedia.org/r/1243127 (https://phabricator.wikimedia.org/T416656) (owner: ''Michael Große)'
2026-02-24 14:02:46 <wikibugs> ('Merged) ''jenkins-bot: Subreferencing pilot wikis, phase 2 [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1243047 (https://phabricator.wikimedia.org/T418209) (owner: ''Awight)'
2026-02-24 14:03:03 <logmsgbot> !log awight@deploy2002 Started scap sync-world: Backport for [[gerrit:1243047|Subreferencing pilot wikis, phase 2 (T418209)]]
2026-02-24 14:03:08 <stashbot> T418209: Deploy subreferencing: pilot wikis phase 2 - https://phabricator.wikimedia.org/T418209
2026-02-24 14:03:32 <wikibugs> ('PS2) ''Btullis: Add the new druid-internal servers to site.pp and preseed.yaml [puppet] - ''https://gerrit.wikimedia.org/r/1242529 (https://phabricator.wikimedia.org/T417430)'
2026-02-24 14:03:51 <wikibugs> ('CR) ''CI reject: [V:''-1] sre.k8s.pool-depool-node: Support racks without L2 adjacency to LVS [cookbooks] - ''https://gerrit.wikimedia.org/r/1243101 (https://phabricator.wikimedia.org/T418142) (owner: ''JMeybohm)'
2026-02-24 14:04:29 <urbanecm> MichaelG_WMF: i have a feeling that might be permanent...
2026-02-24 14:04:40 <urbanecm> ...as i just had it on https://gerrit.wikimedia.org/r/c/mediawiki/extensions/GrowthExperiments/+/1243126 two times in a row
2026-02-24 14:05:01 <logmsgbot> !log awight@deploy2002 awight: Backport for [[gerrit:1243047|Subreferencing pilot wikis, phase 2 (T418209)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
2026-02-24 14:05:05 <urbanecm> let's see
2026-02-24 14:05:12 <wikibugs> ('CR) ''Btullis: [C:''+2] Move a second journalnode to a newer host [puppet] - ''https://gerrit.wikimedia.org/r/1242511 (https://phabricator.wikimedia.org/T414948) (owner: ''Btullis)'
2026-02-24 14:05:23 <wikibugs> ('PS4) ''JMeybohm: sre.k8s.pool-depool-node: Support racks without L2 adjacency to LVS [cookbooks] - ''https://gerrit.wikimedia.org/r/1243101 (https://phabricator.wikimedia.org/T418142)'
2026-02-24 14:05:26 <urbanecm> awight: fyi i'm +2ing the backport to save CI time, will wait on handover before touching prod
2026-02-24 14:05:29 <wikibugs> ('CR) ''Urbanecm: [C:''+2] feat: if Minerva personal menu is enabled, flip discovery site notice [extensions/GrowthExperiments] (wmf/1.46.0-wmf.17) - ''https://gerrit.wikimedia.org/r/1243127 (https://phabricator.wikimedia.org/T416656) (owner: ''Michael Große)'
2026-02-24 14:06:40 <wikibugs> ('PS2) ''Arnaudb: gerrit: prepare replication resume for gerrit2002 [puppet] - ''https://gerrit.wikimedia.org/r/1242275 (https://phabricator.wikimedia.org/T338470)'
2026-02-24 14:06:49 <wikibugs> ('CR) ''Clément Goubert: [C:''+1] sre.k8s.pool-depool-node: Support racks without L2 adjacency to LVS [cookbooks] - ''https://gerrit.wikimedia.org/r/1243101 (https://phabricator.wikimedia.org/T418142) (owner: ''JMeybohm)'
2026-02-24 14:07:00 <wikibugs> ('PS2) ''Arnaudb: gerrit: resume replication on gerrit-spare [puppet] - ''https://gerrit.wikimedia.org/r/1242279 (https://phabricator.wikimedia.org/T417247)'
2026-02-24 14:07:10 <awight> looks good, continuing
2026-02-24 14:07:20 <awight> urbanecm: makes sense, thanks for the note!
2026-02-24 14:07:25 <logmsgbot> !log awight@deploy2002 awight: Continuing with sync
2026-02-24 14:09:33 <wikibugs> ('CR) ''JMeybohm: [V:''+2 C:''+2] sre.k8s.pool-depool-node: Fix type annotation (''1 comment) [cookbooks] - ''https://gerrit.wikimedia.org/r/1243075 (https://phabricator.wikimedia.org/T410537) (owner: ''JMeybohm)'
2026-02-24 14:10:46 <MichaelG_WMF> is right back
2026-02-24 14:11:20 <logmsgbot> !log awight@deploy2002 Finished scap sync-world: Backport for [[gerrit:1243047|Subreferencing pilot wikis, phase 2 (T418209)]] (duration: 08m 16s)
2026-02-24 14:11:24 <stashbot> T418209: Deploy subreferencing: pilot wikis phase 2 - https://phabricator.wikimedia.org/T418209
2026-02-24 14:12:21 <wikibugs> ('PS3) ''Arnaudb: gerrit: prepare replication resume for gerrit2002 [puppet] - ''https://gerrit.wikimedia.org/r/1242275 (https://phabricator.wikimedia.org/T338470)'
2026-02-24 14:12:21 <wikibugs> ('PS3) ''Arnaudb: gerrit: install gerrit and sync-instances [puppet] - ''https://gerrit.wikimedia.org/r/1242279 (https://phabricator.wikimedia.org/T417247)'
2026-02-24 14:12:26 <awight> urbanecm: All yours, thanks for taking on the other deployment
2026-02-24 14:12:41 <urbanecm> thanks!
2026-02-24 14:12:44 <urbanecm> waiting on CI
2026-02-24 14:12:55 <wikibugs> ('CR) ''JMeybohm: [C:''+2] sre.k8s.pool-depool-node: Support racks without L2 adjacency to LVS [cookbooks] - ''https://gerrit.wikimedia.org/r/1243101 (https://phabricator.wikimedia.org/T418142) (owner: ''JMeybohm)'
2026-02-24 14:14:01 <logmsgbot> !log arnaudb@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host gerrit2002.wikimedia.org with OS bookworm
2026-02-24 14:14:26 <wikibugs> ('PS4) ''Arnaudb: gerrit: install gerrit and sync-instances [puppet] - ''https://gerrit.wikimedia.org/r/1242279 (https://phabricator.wikimedia.org/T417247)'
2026-02-24 14:14:48 <wikibugs> ('PS4) ''Arnaudb: gerrit: migrate gerrit2 system user to gerrit [puppet] - ''https://gerrit.wikimedia.org/r/1242275 (https://phabricator.wikimedia.org/T338470)'
2026-02-24 14:15:02 <urbanecm> MichaelG_WMF: we're not on testwikis, so +2ing will be enough
2026-02-24 14:15:13 <wikibugs> ('Merged) ''jenkins-bot: sre.k8s.pool-depool-node: Fix type annotation [cookbooks] - ''https://gerrit.wikimedia.org/r/1243075 (https://phabricator.wikimedia.org/T410537) (owner: ''JMeybohm)'
2026-02-24 14:15:45 <wikibugs> ('CR) ''Arnaudb: "I've added one more step to the relation chain because we were changing the gerrit role too soon the reimage process → that needed to be d" [puppet] - ''https://gerrit.wikimedia.org/r/1242275 (https://phabricator.wikimedia.org/T338470) (owner: ''Arnaudb)'
2026-02-24 14:16:12 <MichaelG_WMF> is back
2026-02-24 14:16:23 <MichaelG_WMF> urbanecm: yes, that was my understanding as well
2026-02-24 14:16:39 <wikibugs> 'ops-eqiad, ''SRE, ''DC-Ops, ''Data-Platform-SRE (2026-02-13 - 2026-03-06): Degraded RAID on an-worker1204 - https://phabricator.wikimedia.org/T414861#11645218 (''BTullis) ''Open''Resolved This is complete now.'
2026-02-24 14:16:45 <urbanecm> in that case, let's wait for CI and be done with it :)
2026-02-24 14:16:47 <wikibugs> ('CR) ''Arnaudb: [C:''+2] gerrit: migrate gerrit2 system user to gerrit [puppet] - ''https://gerrit.wikimedia.org/r/1242275 (https://phabricator.wikimedia.org/T338470) (owner: ''Arnaudb)'
2026-02-24 14:16:58 <urbanecm> (one of simpler problems to handle for today...)
2026-02-24 14:17:41 <logmsgbot> !log arnaudb@cumin1003 START - Cookbook sre.hosts.reimage for host gerrit2002.wikimedia.org with OS bookworm
2026-02-24 14:18:17 <wikibugs> ('Merged) ''jenkins-bot: sre.k8s.pool-depool-node: Support racks without L2 adjacency to LVS [cookbooks] - ''https://gerrit.wikimedia.org/r/1243101 (https://phabricator.wikimedia.org/T418142) (owner: ''JMeybohm)'
2026-02-24 14:18:33 <wikibugs> ('PS1) ''Elukey: ml-services: move revertrisk away from the transparent proxy settings [deployment-charts] - ''https://gerrit.wikimedia.org/r/1243132'
2026-02-24 14:18:55 <wikibugs> ('PS2) ''Dpogorzelski: ml-serve: fix istio/transparentproxy config [deployment-charts] - ''https://gerrit.wikimedia.org/r/1243112'
2026-02-24 14:19:32 <wikibugs> 'ops-eqiad, ''SRE, ''DC-Ops, ''Data-Platform-SRE (2026-02-13 - 2026-03-06): Q3:rack/setup/install dbstore1010 - https://phabricator.wikimedia.org/T417948#11645235 (''BTullis) a:''BTullis''None'
2026-02-24 14:19:44 <wikibugs> ('Merged) ''jenkins-bot: feat: if Minerva personal menu is enabled, flip discovery site notice [extensions/GrowthExperiments] (wmf/1.46.0-wmf.17) - ''https://gerrit.wikimedia.org/r/1243127 (https://phabricator.wikimedia.org/T416656) (owner: ''Michael Große)'
2026-02-24 14:19:59 <wikibugs> ('CR) ''Dpogorzelski: [C:''+1] ml-services: move revertrisk away from the transparent proxy settings [deployment-charts] - ''https://gerrit.wikimedia.org/r/1243132 (owner: ''Elukey)'
2026-02-24 14:20:15 <wikibugs> ('PS5) ''Tiziano Fogli: Thanos/Store: add support for multi-instance setup [puppet] - ''https://gerrit.wikimedia.org/r/1219145 (https://phabricator.wikimedia.org/T412924)'
2026-02-24 14:20:15 <wikibugs> ('PS6) ''Tiziano Fogli: Thanos/Store: add a ruler(s)-dedicated store gateway [puppet] - ''https://gerrit.wikimedia.org/r/1219146 (https://phabricator.wikimedia.org/T412924)'
2026-02-24 14:20:15 <wikibugs> ('PS1) ''Tiziano Fogli: thanos/querier (TMP): filter out non local ruler from query configs [puppet] - ''https://gerrit.wikimedia.org/r/1243133 (https://phabricator.wikimedia.org/T412924)'
2026-02-24 14:21:59 <urbanecm> MichaelG_WMF: so, should be done
2026-02-24 14:22:15 <MichaelG_WMF> Great, thanks!
2026-02-24 14:22:16 <wikibugs> ('CR) ''Jgreen: [C:''+1] Fix hostname for frmx SPF records [dns] - ''https://gerrit.wikimedia.org/r/1242532 (https://phabricator.wikimedia.org/T417958) (owner: ''Dwisehaupt)'
2026-02-24 14:22:52 <wikibugs> ('CR) ''CI reject: [V:''-1] Thanos/Store: add support for multi-instance setup [puppet] - ''https://gerrit.wikimedia.org/r/1219145 (https://phabricator.wikimedia.org/T412924) (owner: ''Tiziano Fogli)'
2026-02-24 14:23:47 <wikibugs> ('CR) ''CI reject: [V:''-1] Thanos/Store: add a ruler(s)-dedicated store gateway [puppet] - ''https://gerrit.wikimedia.org/r/1219146 (https://phabricator.wikimedia.org/T412924) (owner: ''Tiziano Fogli)'
2026-02-24 14:23:50 <wikibugs> ('CR) ''CI reject: [V:''-1] thanos/querier (TMP): filter out non local ruler from query configs [puppet] - ''https://gerrit.wikimedia.org/r/1243133 (https://phabricator.wikimedia.org/T412924) (owner: ''Tiziano Fogli)'
2026-02-24 14:23:52 <wikibugs> ('PS1) ''Muehlenhoff: thumbor-plugins: Stop using pkg_resources [software/thumbor-plugins] - ''https://gerrit.wikimedia.org/r/1243135'
2026-02-24 14:25:52 <wikibugs> ('PS2) ''Tiziano Fogli: thanos/querier (TMP): filter out non local ruler from query configs [puppet] - ''https://gerrit.wikimedia.org/r/1243133 (https://phabricator.wikimedia.org/T412924)'
2026-02-24 14:25:52 <wikibugs> ('PS6) ''Tiziano Fogli: Thanos/Store: add support for multi-instance setup [puppet] - ''https://gerrit.wikimedia.org/r/1219145 (https://phabricator.wikimedia.org/T412924)'
2026-02-24 14:25:52 <wikibugs> ('PS7) ''Tiziano Fogli: Thanos/Store: add a ruler(s)-dedicated store gateway [puppet] - ''https://gerrit.wikimedia.org/r/1219146 (https://phabricator.wikimedia.org/T412924)'
2026-02-24 14:26:55 <wikibugs> ('PS2) ''Elukey: ml-services: move away from the transparent proxy settings [deployment-charts] - ''https://gerrit.wikimedia.org/r/1243132'
2026-02-24 14:27:37 <wikibugs> ('CR) ''Muehlenhoff: "check experimental" [puppet] - ''https://gerrit.wikimedia.org/r/1243107 (https://phabricator.wikimedia.org/T365798) (owner: ''Muehlenhoff)'
2026-02-24 14:28:21 <wikibugs> ('CR) ''CI reject: [V:''-1] thanos/querier (TMP): filter out non local ruler from query configs [puppet] - ''https://gerrit.wikimedia.org/r/1243133 (https://phabricator.wikimedia.org/T412924) (owner: ''Tiziano Fogli)'
2026-02-24 14:29:02 <wikibugs> ('CR) ''CI reject: [V:''-1] thumbor-plugins: Stop using pkg_resources [software/thumbor-plugins] - ''https://gerrit.wikimedia.org/r/1243135 (owner: ''Muehlenhoff)'
2026-02-24 14:29:08 <wikibugs> ('CR) ''CI reject: [V:''-1] Thanos/Store: add support for multi-instance setup [puppet] - ''https://gerrit.wikimedia.org/r/1219145 (https://phabricator.wikimedia.org/T412924) (owner: ''Tiziano Fogli)'
2026-02-24 14:29:09 <wikibugs> ('CR) ''Muehlenhoff: [C:''+2] syslog::remote: Remove buster workarounds [puppet] - ''https://gerrit.wikimedia.org/r/1243069 (owner: ''Muehlenhoff)'
2026-02-24 14:29:29 <wikibugs> ('CR) ''CI reject: [V:''-1] Thanos/Store: add a ruler(s)-dedicated store gateway [puppet] - ''https://gerrit.wikimedia.org/r/1219146 (https://phabricator.wikimedia.org/T412924) (owner: ''Tiziano Fogli)'
2026-02-24 14:29:38 <logmsgbot> !log slyngshede@cumin1003 START - Cookbook sre.hosts.reimage for host cp2045.codfw.wmnet with OS trixie
2026-02-24 14:29:38 <wikibugs> ('CR) ''Dpogorzelski: [C:''+1] ml-services: move away from the transparent proxy settings [deployment-charts] - ''https://gerrit.wikimedia.org/r/1243132 (owner: ''Elukey)'
2026-02-24 14:31:40 <wikibugs> ('Abandoned) ''Elukey: ml-services: move away from the transparent proxy settings [deployment-charts] - ''https://gerrit.wikimedia.org/r/1243132 (owner: ''Elukey)'
2026-02-24 14:33:25 <jinxer-wm> RESOLVED: SystemdUnitFailed: wmf_auto_restart_prometheus-blazegraph-exporter-wdqs-blazegraph.service on wdqs2013:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
2026-02-24 14:34:16 <wikibugs> ('PS1) ''Muehlenhoff: wmflib::service::probe::tcp_module_options: Remove support for Buster [puppet] - ''https://gerrit.wikimedia.org/r/1243137'
2026-02-24 14:34:27 <wikibugs> ('PS1) ''Elukey: ml-services: force Revert Risk to skip the transparent proxy settings [deployment-charts] - ''https://gerrit.wikimedia.org/r/1243138'
2026-02-24 14:34:30 <logmsgbot> !log slyngshede@cumin1003 END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp2045.codfw.wmnet with OS trixie
2026-02-24 14:36:34 <wikibugs> ('CR) ''CI reject: [V:''-1] wmflib::service::probe::tcp_module_options: Remove support for Buster [puppet] - ''https://gerrit.wikimedia.org/r/1243137 (owner: ''Muehlenhoff)'
2026-02-24 14:37:05 <wikibugs> ('PS2) ''Federico Ceratto: site.pp: Setup dborch1003 [puppet] - ''https://gerrit.wikimedia.org/r/1243134 (https://phabricator.wikimedia.org/T317179)'
2026-02-24 14:37:30 <logmsgbot> !log arnaudb@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on gerrit2002.wikimedia.org with reason: host reimage
2026-02-24 14:38:26 <wikibugs> ('CR) ''Muehlenhoff: [C:''+2] mariadb::packages_client: Remove support for buster [puppet] - ''https://gerrit.wikimedia.org/r/1219874 (owner: ''Muehlenhoff)'
2026-02-24 14:43:55 <logmsgbot> !log arnaudb@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gerrit2002.wikimedia.org with reason: host reimage
2026-02-24 14:43:56 <wikibugs> ('CR) ''Dpogorzelski: [C:''+1] ml-services: force Revert Risk to skip the transparent proxy settings [deployment-charts] - ''https://gerrit.wikimedia.org/r/1243138 (owner: ''Elukey)'
2026-02-24 14:46:01 <icinga-wm> PROBLEM - PyBal backends health check on lvs2014 is CRITICAL: PYBAL CRITICAL - CRITICAL - wdqs-main_443: Servers wdqs2010.codfw.wmnet, wdqs2015.codfw.wmnet are marked down but pooled https://wikitech.wikimedia.org/wiki/PyBal
2026-02-24 14:48:49 <icinga-wm> PROBLEM - PyBal backends health check on lvs2013 is CRITICAL: PYBAL CRITICAL - CRITICAL - wdqs-main_443: Servers wdqs2014.codfw.wmnet, wdqs2008.codfw.wmnet, wdqs2010.codfw.wmnet, wdqs2012.codfw.wmnet, wdqs2013.codfw.wmnet, wdqs2015.codfw.wmnet, wdqs2011.codfw.wmnet, wdqs2022.codfw.wmnet are marked down but pooled https://wikitech.wikimedia.org/wiki/PyBal
2026-02-24 14:49:47 <wikibugs> ('CR) ''AikoChou: [C:''+1] ml-services: force Revert Risk to skip the transparent proxy settings [deployment-charts] - ''https://gerrit.wikimedia.org/r/1243138 (owner: ''Elukey)'
2026-02-24 14:50:16 <wikibugs> ('CR) ''Muehlenhoff: [C:''+2] docker: Remove check for memory_cgroup [puppet] - ''https://gerrit.wikimedia.org/r/1223184 (owner: ''Muehlenhoff)'
2026-02-24 14:50:32 <wikibugs> ('CR) ''Dpogorzelski: [C:''+2] ml-services: force Revert Risk to skip the transparent proxy settings [deployment-charts] - ''https://gerrit.wikimedia.org/r/1243138 (owner: ''Elukey)'
2026-02-24 14:51:47 <wikibugs> 'SRE, ''Infrastructure-Foundations, ''Mail: Remove mail alias/fork from dmarc-rua@wikimedia.org to dmarc@donate.wikimedia.org - https://phabricator.wikimedia.org/T417941#11645548 (''Jgreen) >>! In T417941#11636764, @Dzahn wrote: > @Jgreen I removed the dmarc@donate.wikimedia.org line from that alias. > > I...'
2026-02-24 14:52:32 <wikibugs> ('Merged) ''jenkins-bot: ml-services: force Revert Risk to skip the transparent proxy settings [deployment-charts] - ''https://gerrit.wikimedia.org/r/1243138 (owner: ''Elukey)'
2026-02-24 14:53:01 <wikibugs> ('CR) ''Federico Ceratto: "I'm getting an error in the automatically started CI test named "test" due to... missing jpg images it seems." [puppet] - ''https://gerrit.wikimedia.org/r/1243134 (https://phabricator.wikimedia.org/T317179) (owner: ''Federico Ceratto)'
2026-02-24 14:53:15 <wikibugs> ('CR) ''Federico Ceratto: [V:''+2] site.pp: Setup dborch1003 [puppet] - ''https://gerrit.wikimedia.org/r/1243134 (https://phabricator.wikimedia.org/T317179) (owner: ''Federico Ceratto)'
2026-02-24 14:53:25 <wikibugs> ('CR) ''Federico Ceratto: site.pp: Setup dborch1003 [puppet] - ''https://gerrit.wikimedia.org/r/1243134 (https://phabricator.wikimedia.org/T317179) (owner: ''Federico Ceratto)'
2026-02-24 14:53:33 <logmsgbot> !log dpogorzelski@deploy2002 helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
2026-02-24 14:54:03 <wikibugs> ('PS2) ''Fabfur: hiera: test haproxy 3.0 on cp7001 [puppet] - ''https://gerrit.wikimedia.org/r/1242427'
2026-02-24 14:54:07 <wikibugs> ('PS2) ''Muehlenhoff: wmflib::service::probe::tcp_module_options: Remove support for Buster [puppet] - ''https://gerrit.wikimedia.org/r/1243137'
2026-02-24 14:56:17 <wikibugs> ('CR) ''CI reject: [V:''-1] wmflib::service::probe::tcp_module_options: Remove support for Buster [puppet] - ''https://gerrit.wikimedia.org/r/1243137 (owner: ''Muehlenhoff)'
2026-02-24 14:56:23 <wikibugs> ('CR) ''Slyngshede: [C:''+1] "Looks good." [puppet] - ''https://gerrit.wikimedia.org/r/1242427 (owner: ''Fabfur)'
2026-02-24 14:57:24 <wikibugs> 'SRE, ''Infrastructure-Foundations, ''netops, ''Data-Platform-SRE (2026-02-13 - 2026-03-06), ''Essential-Work: Socket leaking on some dse-k8s row C & D hosts - https://phabricator.wikimedia.org/T414460#11645599 (''BTullis) Just a data point. We're still seeing an ever-increasing value for these open soc...'
2026-02-24 14:58:04 <wikibugs> ('CR) ''Btullis: [C:''+2] Prepare to decom the old an-worker hosts [puppet] - ''https://gerrit.wikimedia.org/r/1242513 (https://phabricator.wikimedia.org/T414948) (owner: ''Btullis)'
2026-02-24 14:58:22 <jinxer-wm> FIRING: SLOMetricAbsent: wdqs-main-update-lag codfw - https://slo.wikimedia.org/?search=wdqs-main-update-lag - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent
2026-02-24 14:58:38 <wikibugs> 'SRE, ''DC-Ops, ''ServiceOps new, ''ServiceOps-Upgrades-Hardware: Reimage sretest2009 as a wikikube worker and assess performance - https://phabricator.wikimedia.org/T400871#11645606 (''Clement_Goubert) ''Open''Declined Abandoning as I think these are the hosts we got in the last refresh.'
2026-02-24 14:58:41 <wikibugs> ('CR) ''Btullis: [C:''+2] Add the configuration for the new dse-k8s worker nodes that were an-worker [puppet] - ''https://gerrit.wikimedia.org/r/1242514 (https://phabricator.wikimedia.org/T414948) (owner: ''Btullis)'
2026-02-24 14:59:05 <logmsgbot> !log dpogorzelski@deploy2002 helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
2026-02-24 15:00:05 <jouncebot> Deploy window Test Kitchen UI Deployment Window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260224T1500)
2026-02-24 15:00:10 <wikibugs> ('PS1) ''PipelineBot: citoid: pipeline bot promote [deployment-charts] - ''https://gerrit.wikimedia.org/r/1243140'
2026-02-24 15:01:13 <logmsgbot> !log arnaudb@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host gerrit2002.wikimedia.org with OS bookworm
2026-02-24 15:01:31 <wikibugs> ('CR) ''Arnaudb: [C:''+2] gerrit: install gerrit and sync-instances [puppet] - ''https://gerrit.wikimedia.org/r/1242279 (https://phabricator.wikimedia.org/T417247) (owner: ''Arnaudb)'
2026-02-24 15:02:21 <wikibugs> ('CR) ''AikoChou: [C:''+1] ml-serve: fix istio/transparentproxy config [deployment-charts] - ''https://gerrit.wikimedia.org/r/1243112 (owner: ''Dpogorzelski)'
2026-02-24 15:02:37 <logmsgbot> !log dpogorzelski@deploy2002 helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
2026-02-24 15:02:56 <wikibugs> ('CR) ''Ayounsi: [C:''+1] wikimedia.org: add IPv6 glue records for ns0 and ns2 [dns] - ''https://gerrit.wikimedia.org/r/1242423 (https://phabricator.wikimedia.org/T81605) (owner: ''Ssingh)'
2026-02-24 15:08:11 <logmsgbot> !log arnaudb@cumin1003 START - Cookbook sre.gerrit.sync-instances sync Gerrit data from gerrit2003.wikimedia.org to gerrit2002.wikimedia.org
2026-02-24 15:09:39 <wikibugs> ('CR) ''Dpogorzelski: [C:''+2] ml-serve: fix istio/transparentproxy config [deployment-charts] - ''https://gerrit.wikimedia.org/r/1243112 (owner: ''Dpogorzelski)'
2026-02-24 15:11:14 <logmsgbot> !log arnaudb@cumin1003 END (ERROR) - Cookbook sre.gerrit.sync-instances (exit_code=97) sync Gerrit data from gerrit2003.wikimedia.org to gerrit2002.wikimedia.org
2026-02-24 15:11:58 <wikibugs> ('CR) ''Federico Ceratto: "check experimental" [puppet] - ''https://gerrit.wikimedia.org/r/1243134 (https://phabricator.wikimedia.org/T317179) (owner: ''Federico Ceratto)'
2026-02-24 15:12:09 <logmsgbot> !log arnaudb@cumin1003 START - Cookbook sre.gerrit.sync-instances sync Gerrit data from gerrit2003.wikimedia.org to gerrit2002.wikimedia.org
2026-02-24 15:15:28 <logmsgbot> !log arnaudb@cumin1003 END (FAIL) - Cookbook sre.gerrit.sync-instances (exit_code=99) sync Gerrit data from gerrit2003.wikimedia.org to gerrit2002.wikimedia.org
2026-02-24 15:16:35 <wikibugs> ('PS1) ''Dpogorzelski: ml-services: force articletopic to skip the transparent proxy settings [deployment-charts] - ''https://gerrit.wikimedia.org/r/1243144'
2026-02-24 15:17:14 <logmsgbot> !log arnaudb@cumin1003 START - Cookbook sre.gerrit.sync-instances sync Gerrit data from gerrit2003.wikimedia.org to gerrit2002.wikimedia.org
2026-02-24 15:18:00 <wikibugs> ('CR) ''Elukey: [C:''+1] ml-services: force articletopic to skip the transparent proxy settings [deployment-charts] - ''https://gerrit.wikimedia.org/r/1243144 (owner: ''Dpogorzelski)'
2026-02-24 15:18:45 <wikibugs> ('CR) ''Dpogorzelski: [C:''+2] ml-services: force articletopic to skip the transparent proxy settings [deployment-charts] - ''https://gerrit.wikimedia.org/r/1243144 (owner: ''Dpogorzelski)'
2026-02-24 15:19:37 <logmsgbot> !log dpogorzelski@deploy2002 helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
2026-02-24 15:20:33 <wikibugs> ('PS2) ''Arnaudb: gerrit: resume replication on gerrit-spare [puppet] - ''https://gerrit.wikimedia.org/r/1243131 (https://phabricator.wikimedia.org/T417247)'
2026-02-24 15:22:16 <wikibugs> ('CR) ''Herron: [C:''+1] Remove OS check for nrpe2nodexp [puppet] - ''https://gerrit.wikimedia.org/r/1243068 (owner: ''Muehlenhoff)'
2026-02-24 15:22:47 <wikibugs> ('CR) ''Herron: [C:''+1] mtail: Use the Debian version of mtail universally [puppet] - ''https://gerrit.wikimedia.org/r/1243048 (owner: ''Muehlenhoff)'
2026-02-24 15:23:23 <wikibugs> ('CR) ''Herron: [C:''+1] meta-monitoring: add rewrite rule to redirect home to Wikitech [puppet] - ''https://gerrit.wikimedia.org/r/1241014 (https://phabricator.wikimedia.org/T417900) (owner: ''Tiziano Fogli)'
2026-02-24 15:27:59 <wikibugs> ('PS5) ''Brouberol: Use importlib.metadata instead of pkg_resources, now deprecated/removed. [software/spicerack] - ''https://gerrit.wikimedia.org/r/1240850'
2026-02-24 15:30:05 <jouncebot> Deploy window Test Kitchen Experiment Deployment Window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260224T1530)
2026-02-24 15:30:30 <wikibugs> ('PS3) ''Ayounsi: Nokia: add local-as to k8s BGP sessions [homer/public] - ''https://gerrit.wikimedia.org/r/1242410 (https://phabricator.wikimedia.org/T417817)'
2026-02-24 15:31:58 <wikibugs> ('CR) ''Ayounsi: [C:''+2] Nokia: add local-as to k8s BGP sessions [homer/public] - ''https://gerrit.wikimedia.org/r/1242410 (https://phabricator.wikimedia.org/T417817) (owner: ''Ayounsi)'
2026-02-24 15:33:17 <wikibugs> ('Merged) ''jenkins-bot: Nokia: add local-as to k8s BGP sessions [homer/public] - ''https://gerrit.wikimedia.org/r/1242410 (https://phabricator.wikimedia.org/T417817) (owner: ''Ayounsi)'
2026-02-24 15:35:25 <logmsgbot> !log arnaudb@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on phab2002.codfw.wmnet,phab[1004-1005].eqiad.wmnet with reason: T418256
2026-02-24 15:35:30 <stashbot> T418256: Deploy Phab/Phorge 2026-02-24 - https://phabricator.wikimedia.org/T418256
2026-02-24 15:37:11 <wikibugs> ('CR) ''Brouberol: [C:''+1] Add the new druid-internal servers to site.pp and preseed.yaml [puppet] - ''https://gerrit.wikimedia.org/r/1242529 (https://phabricator.wikimedia.org/T417430) (owner: ''Btullis)'
2026-02-24 15:38:10 <logmsgbot> !log btullis@cumin1003 START - Cookbook sre.hosts.decommission for hosts an-worker[1119-1130,1135-1141].eqiad.wmnet
2026-02-24 15:40:18 <logmsgbot> !log slyngshede@cumin1003 START - Cookbook sre.hosts.reimage for host cp2045.codfw.wmnet with OS trixie
2026-02-24 15:42:23 <logmsgbot> !log jforrester@deploy2002 helmfile [codfw] START helmfile.d/services/wikifunctions: sync
2026-02-24 15:42:38 <logmsgbot> !log dwisehaupt@dns1004 START - running authdns-update
2026-02-24 15:43:16 <logmsgbot> !log jforrester@deploy2002 helmfile [codfw] DONE helmfile.d/services/wikifunctions: sync
2026-02-24 15:44:02 <logmsgbot> !log dwisehaupt@dns1004 END - running authdns-update
2026-02-24 15:44:23 <urbanecm> !log Remove Phabricator MFA for EMcFarland-WMF (T418260)
2026-02-24 15:44:27 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2026-02-24 15:44:28 <stashbot> T418260: Reset MFA for EMcFarland-WMF on Phabricator - https://phabricator.wikimedia.org/T418260
2026-02-24 15:46:01 <logmsgbot> !log jforrester@deploy2002 helmfile [eqiad] START helmfile.d/services/wikifunctions: sync
2026-02-24 15:46:32 <logmsgbot> !log jforrester@deploy2002 helmfile [eqiad] DONE helmfile.d/services/wikifunctions: sync
2026-02-24 15:46:56 <wikibugs> 'SRE, ''SRE-Access-Requests, ''Gerrit-Privilege-Requests, ''Release-Engineering-Team, ''Security-Team: Request membership in deployment (and wmf-deployment group) for Rsilvola - https://phabricator.wikimedia.org/T418004#11646285 (''Dzahn)'
2026-02-24 15:47:08 <wikibugs> 'SRE, ''Data-Engineering, ''Data-Engineering-Icebox, ''Product Safety and Integrity, and 3 others: Include User-Agent Client Hints in WebRequest logs - https://phabricator.wikimedia.org/T337947#11646291 (''Dreamy_Jazz)'
2026-02-24 15:47:30 <wikibugs> 'SRE, ''SRE-Access-Requests, ''Gerrit-Privilege-Requests, ''Release-Engineering-Team, ''Security-Team: Request membership in deployment (and wmf-deployment group) for Rsilvola - https://phabricator.wikimedia.org/T418004#11646296 (''Dzahn)'
2026-02-24 15:48:06 <wikibugs> ('CR) ''Dwisehaupt: [C:''+2] Fix hostname for frmx SPF records [dns] - ''https://gerrit.wikimedia.org/r/1242532 (https://phabricator.wikimedia.org/T417958) (owner: ''Dwisehaupt)'
2026-02-24 15:48:38 <logmsgbot> !log sukhe@dns1004 START - running authdns-update
2026-02-24 15:48:56 <sukhe> !log enable IPv6 glue records for ns[02].wikimedia.org: T81605
2026-02-24 15:48:59 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2026-02-24 15:49:00 <stashbot> T81605: Offer AuthDNS service over IPv6 - https://phabricator.wikimedia.org/T81605
2026-02-24 15:49:21 <logmsgbot> !log arnaudb@cumin1003 END (PASS) - Cookbook sre.gerrit.sync-instances (exit_code=0) sync Gerrit data from gerrit2003.wikimedia.org to gerrit2002.wikimedia.org
2026-02-24 15:50:18 <logmsgbot> !log sukhe@dns1004 END - running authdns-update
2026-02-24 15:51:24 <wikibugs> ('CR) ''JHathaway: [C:''+1] apt: Remove support for Buster [puppet] - ''https://gerrit.wikimedia.org/r/1243035 (owner: ''Muehlenhoff)'
2026-02-24 15:52:18 <wikibugs> ('CR) ''JHathaway: [C:''+1] Reapply "Update two hooks to the variants from the puppetserver module" [puppet] - ''https://gerrit.wikimedia.org/r/1243107 (https://phabricator.wikimedia.org/T365798) (owner: ''Muehlenhoff)'
2026-02-24 15:52:43 <logmsgbot> !log ayounsi@cumin1003 START - Cookbook sre.network.tls for network device asw1-23-ulsfo
2026-02-24 15:52:45 <wikibugs> ('CR) ''JHathaway: [C:''+1] Remove create_ecdsa_cert [puppet] - ''https://gerrit.wikimedia.org/r/1243090 (https://phabricator.wikimedia.org/T365798) (owner: ''Muehlenhoff)'
2026-02-24 15:52:50 <logmsgbot> !log slyngshede@cumin1003 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp2045.codfw.wmnet with OS trixie
2026-02-24 15:53:08 <logmsgbot> !log ayounsi@cumin1003 END (FAIL) - Cookbook sre.network.tls (exit_code=99) for network device asw1-23-ulsfo
2026-02-24 15:53:21 <wikibugs> ('CR) ''JHathaway: [C:''+1] Remove various Hiera files only necessary for Puppet 5 [puppet] - ''https://gerrit.wikimedia.org/r/1243087 (https://phabricator.wikimedia.org/T365798) (owner: ''Muehlenhoff)'
2026-02-24 15:54:03 <wikibugs> 'SRE, ''SRE-Access-Requests, ''Gerrit-Privilege-Requests, ''Release-Engineering-Team, ''Security-Team: Request membership in deployment (and wmf-deployment group) for Rsilvola - https://phabricator.wikimedia.org/T418004#11646397 (''Dzahn)'
2026-02-24 15:54:19 <mutante> jouncebot: nowandnext
2026-02-24 15:54:19 <jouncebot> For the next 0 hour(s) and 5 minute(s): Test Kitchen Experiment Deployment Window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260224T1530)
2026-02-24 15:54:19 <jouncebot> In 0 hour(s) and 5 minute(s): SRE Collaboration Services office hours (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260224T1600)
2026-02-24 15:54:29 <wikibugs> ('PS1) ''Dpogorzelski: ml-services: force revertrisk-multi to skip the transparent proxy settings [deployment-charts] - ''https://gerrit.wikimedia.org/r/1243149'
2026-02-24 15:54:33 <wikibugs> 'SRE, ''SRE-Access-Requests, ''Gerrit-Privilege-Requests, ''Release-Engineering-Team, ''Security-Team: Request membership in deployment (and wmf-deployment group) for Rsilvola - https://phabricator.wikimedia.org/T418004#11646421 (''Dzahn)'
2026-02-24 15:54:49 <wikibugs> 'SRE, ''SRE-Access-Requests, ''Gerrit-Privilege-Requests, ''Release-Engineering-Team, ''Security-Team: Request membership in deployment (and wmf-deployment group) for Rsilvola - https://phabricator.wikimedia.org/T418004#11646432 (''Dzahn) Thanks. Most things here are done. The SSH key needs to be verifi...'
2026-02-24 15:55:04 <wikibugs> 'SRE-SLO, ''Abstract Wikipedia team, ''serviceops, ''ServiceOps new: wikifunctions-backend-combined-v1 SLI error budget has been rapidly dropping over Feb 2026 - https://phabricator.wikimedia.org/T418160#11646436 (''Jdforrester-WMF) Over the past 24 hours it's now dropped from 12% to 0.1% and will likely...'
2026-02-24 15:55:16 <wikibugs> ('CR) ''Ssingh: [C:''+2] wikimedia.org: add IPv6 glue records for ns0 and ns2 [dns] - ''https://gerrit.wikimedia.org/r/1242423 (https://phabricator.wikimedia.org/T81605) (owner: ''Ssingh)'
2026-02-24 15:55:20 <wikibugs> ('PS1) ''Volans: wmcs: infra-tracing-nfs improve requests failures [puppet] - ''https://gerrit.wikimedia.org/r/1243151 (https://phabricator.wikimedia.org/T399313)'
2026-02-24 15:55:38 <wikibugs> ('CR) ''Arnaudb: [C:''+2] gerrit: resume replication on gerrit-spare [puppet] - ''https://gerrit.wikimedia.org/r/1243131 (https://phabricator.wikimedia.org/T417247) (owner: ''Arnaudb)'
2026-02-24 15:55:44 <logmsgbot> !log dzahn@cumin2002 START - Cookbook sre.gerrit.restart-gerrit Restarting Gerrit on gerrit2003
2026-02-24 15:55:50 <wikibugs> ('CR) ''CI reject: [V:''-1] wmcs: infra-tracing-nfs improve requests failures [puppet] - ''https://gerrit.wikimedia.org/r/1243151 (https://phabricator.wikimedia.org/T399313) (owner: ''Volans)'
2026-02-24 15:56:01 <logmsgbot> !log dzahn@cumin2002 END (FAIL) - Cookbook sre.gerrit.restart-gerrit (exit_code=99) Restarting Gerrit on gerrit2003
2026-02-24 15:56:14 <wikibugs> 'SRE, ''SRE-Access-Requests, ''Gerrit-Privilege-Requests, ''Release-Engineering-Team, ''Security-Team: Request membership in deployment (and wmf-deployment group) for Rsilvola - https://phabricator.wikimedia.org/T418004#11646452 (''Rsilvola) Thank you! At the moment, I only expect to do occasional depl...'
2026-02-24 15:56:42 <wikibugs> 'ops-codfw, ''SRE, ''DC-Ops: Degraded RAID on kubestage2004 - https://phabricator.wikimedia.org/T416726#11646457 (''Jhancock.wm) ''Open''Resolved a:''Jhancock.wm @JMeybohm disk has been replaced.'
2026-02-24 15:57:43 <logmsgbot> !log dpogorzelski@deploy2002 helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
2026-02-24 15:58:46 <jinxer-wm> FIRING: [6x] GerritHAProxyBackendUnavailable: Gerrit backend is unavilable for tcp-proxy (HAProxy) gerrit_ssh - https://wikitech.wikimedia.org/wiki/Gerrit/Operations#GerritHAProxyBackendUnavailable - grafana.wikimedia.org/d/459365f6-df37-48d6-8142-82b22c1875e7/gerrit-tcp-proxy?viewPanel=panel-15 - https://alerts.wikimedia.org/?q=alertname%3DGerritHAProxyBackendUnavailable
2026-02-24 15:58:57 <jinxer-wm> FIRING: GerritHAProxyServiceUnavailable: Gerrit tcp-proxy (HAProxy) service gerrit_ssh is DOWN in eqiad - https://wikitech.wikimedia.org/wiki/Gerrit/Operations#GerritHAProxyServiceUnavailable - grafana.wikimedia.org/d/459365f6-df37-48d6-8142-82b22c1875e7/gerrit-tcp-proxy?viewPanel=panel-15 - https://alerts.wikimedia.org/?q=alertname%3DGerritHAProxyServiceUnavailable
2026-02-24 15:59:50 <inflatador> !log bking@local restarting wdqs codfw main to deal with 5xx errors
2026-02-24 15:59:56 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2026-02-24 16:00:05 <jouncebot> jelto, arnoldokoth, mutante, and arnaudb: OwO what's this, a deployment window?? SRE Collaboration Services office hours. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260224T1600). nyaa~
2026-02-24 16:00:07 <mutante> !log gerrit2003 was restarted for maintenance reasons - expecting recovery soon
2026-02-24 16:00:09 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2026-02-24 16:00:18 <logmsgbot> !log dpogorzelski@deploy2002 helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
2026-02-24 16:01:13 <icinga-wm> PROBLEM - PyBal IPVS diff check on lvs2014 is CRITICAL: (CRITICAL: Mismatch between IPVS and PyBal https://wikitech.wikimedia.org/wiki/PyBal
2026-02-24 16:01:27 <wikibugs> 'SRE, ''SRE-swift-storage, ''Ceph, ''Data-Persistence, and 2 others: Onboard the Docker Registry to apus - https://phabricator.wikimedia.org/T394476#11646575 (''elukey) Matthew upgraded apus to the latest Reef patch (thanks!) and I tried today to push some Docker images to the new /test prefix: ` elukey@...'
2026-02-24 16:02:16 <logmsgbot> !log brennen@deploy2002 Started deploy [phabricator/deployment@aad109e]: deploy phab2002 for T418256
2026-02-24 16:02:20 <stashbot> T418256: Deploy Phab/Phorge 2026-02-24 - https://phabricator.wikimedia.org/T418256
2026-02-24 16:03:22 <jinxer-wm> RESOLVED: SLOMetricAbsent: wdqs-main-update-lag codfw - https://slo.wikimedia.org/?search=wdqs-main-update-lag - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent
2026-02-24 16:03:44 <logmsgbot> !log brennen@deploy2002 Finished deploy [phabricator/deployment@aad109e]: deploy phab2002 for T418256 (duration: 01m 28s)
2026-02-24 16:03:46 <jinxer-wm> RESOLVED: [13x] GerritHAProxyBackendUnavailable: Gerrit backend is unavilable for tcp-proxy (HAProxy) gerrit_ssh - https://wikitech.wikimedia.org/wiki/Gerrit/Operations#GerritHAProxyBackendUnavailable - grafana.wikimedia.org/d/459365f6-df37-48d6-8142-82b22c1875e7/gerrit-tcp-proxy?viewPanel=panel-15 - https://alerts.wikimedia.org/?q=alertname%3DGerritHAProxyBackendUnavailable
2026-02-24 16:03:57 <jinxer-wm> RESOLVED: [4x] GerritHAProxyServiceUnavailable: Gerrit tcp-proxy (HAProxy) service gerrit_ssh is DOWN in codfw - https://wikitech.wikimedia.org/wiki/Gerrit/Operations#GerritHAProxyServiceUnavailable - grafana.wikimedia.org/d/459365f6-df37-48d6-8142-82b22c1875e7/gerrit-tcp-proxy?viewPanel=panel-15 - https://alerts.wikimedia.org/?q=alertname%3DGerritHAProxyServiceUnavailable
2026-02-24 16:04:04 <logmsgbot> !log brennen@deploy2002 Started deploy [phabricator/deployment@aad109e]: deploy phab1004 for T418256
2026-02-24 16:04:13 <icinga-wm> PROBLEM - PyBal IPVS diff check on lvs2013 is CRITICAL: (CRITICAL: Mismatch between IPVS and PyBal https://wikitech.wikimedia.org/wiki/PyBal
2026-02-24 16:05:13 <wikibugs> ('PS2) ''Muehlenhoff: ferm: Remove obsolete OS check [puppet] - ''https://gerrit.wikimedia.org/r/1243045'
2026-02-24 16:05:16 <wikibugs> ('PS2) ''Volans: wmcs: infra-tracing-nfs improve requests failures [puppet] - ''https://gerrit.wikimedia.org/r/1243151 (https://phabricator.wikimedia.org/T399313)'
2026-02-24 16:05:32 <wikibugs> ('CR) ''Muehlenhoff: ferm: Remove obsolete OS check (''1 comment) [puppet] - ''https://gerrit.wikimedia.org/r/1243045 (owner: ''Muehlenhoff)'
2026-02-24 16:05:59 <logmsgbot> !log brennen@deploy2002 Finished deploy [phabricator/deployment@aad109e]: deploy phab1004 for T418256 (duration: 01m 55s)
2026-02-24 16:07:41 <wikibugs> ('CR) ''CI reject: [V:''-1] ferm: Remove obsolete OS check [puppet] - ''https://gerrit.wikimedia.org/r/1243045 (owner: ''Muehlenhoff)'
2026-02-24 16:07:43 <logmsgbot> !log brennen@deploy2002 Started deploy [phabricator/deployment@01119c5]: re-deploy phab2002 for T418256 (for real this time)
2026-02-24 16:07:47 <stashbot> T418256: Deploy Phab/Phorge 2026-02-24 - https://phabricator.wikimedia.org/T418256
2026-02-24 16:08:14 <wikibugs> ('CR) ''CI reject: [V:''-1] wmcs: infra-tracing-nfs improve requests failures [puppet] - ''https://gerrit.wikimedia.org/r/1243151 (https://phabricator.wikimedia.org/T399313) (owner: ''Volans)'
2026-02-24 16:08:14 <logmsgbot> !log brennen@deploy2002 Finished deploy [phabricator/deployment@01119c5]: re-deploy phab2002 for T418256 (for real this time) (duration: 00m 31s)
2026-02-24 16:08:22 <jinxer-wm> FIRING: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
2026-02-24 16:08:37 <logmsgbot> !log brennen@deploy2002 Started deploy [phabricator/deployment@01119c5]: re-deploy phab1004 for T418256 (for real this time)
2026-02-24 16:09:22 <wikibugs> ('CR) ''Hashar: gerrit: alert for broken replication (''1 comment) [alerts] - ''https://gerrit.wikimedia.org/r/1242399 (https://phabricator.wikimedia.org/T418084) (owner: ''Arnaudb)'
2026-02-24 16:09:39 <logmsgbot> !log brennen@deploy2002 Finished deploy [phabricator/deployment@01119c5]: re-deploy phab1004 for T418256 (for real this time) (duration: 01m 01s)
2026-02-24 16:10:38 <wikibugs> ('PS1) ''Muehlenhoff: Remove now obsolete spec test [puppet] - ''https://gerrit.wikimedia.org/r/1243166'
2026-02-24 16:11:12 <icinga-wm> RECOVERY - PyBal IPVS diff check on lvs2014 is OK: OK: no difference between hosts in IPVS/PyBal https://wikitech.wikimedia.org/wiki/PyBal
2026-02-24 16:12:28 <wikibugs> ('PS1) ''Arnaudb: gerrit: fix gerrit_proxy_spec [puppet] - ''https://gerrit.wikimedia.org/r/1243168'
2026-02-24 16:13:01 <wikibugs> ('CR) ''Hashar: [C:''+1] "😊" [puppet] - ''https://gerrit.wikimedia.org/r/1243168 (owner: ''Arnaudb)'
2026-02-24 16:14:12 <icinga-wm> RECOVERY - PyBal IPVS diff check on lvs2013 is OK: OK: no difference between hosts in IPVS/PyBal https://wikitech.wikimedia.org/wiki/PyBal
2026-02-24 16:14:26 <wikibugs> ('CR) ''Arnaudb: [C:''+2] gerrit: fix gerrit_proxy_spec [puppet] - ''https://gerrit.wikimedia.org/r/1243168 (owner: ''Arnaudb)'
2026-02-24 16:15:01 <wikibugs> ('CR) ''Tiziano Fogli: [C:''+2] meta-monitoring: add rewrite rule to redirect home to Wikitech [puppet] - ''https://gerrit.wikimedia.org/r/1241014 (https://phabricator.wikimedia.org/T417900) (owner: ''Tiziano Fogli)'
2026-02-24 16:15:09 <wikibugs> ('PS3) ''Volans: wmcs: infra-tracing-nfs improve requests failures [puppet] - ''https://gerrit.wikimedia.org/r/1243151 (https://phabricator.wikimedia.org/T399313)'
2026-02-24 16:15:24 <wikibugs> ('CR) ''Tiziano Fogli: [C:''+2] Remove OS check for nrpe2nodexp [puppet] - ''https://gerrit.wikimedia.org/r/1243068 (owner: ''Muehlenhoff)'
2026-02-24 16:18:26 <wikibugs> ('PS4) ''Dzahn: gerrit: remove code for having multiple daemon users [puppet] - ''https://gerrit.wikimedia.org/r/1242467 (https://phabricator.wikimedia.org/T338470)'
2026-02-24 16:20:08 <wikibugs> ('PS5) ''Dzahn: gerrit: remove code for having multiple daemon users [puppet] - ''https://gerrit.wikimedia.org/r/1242467 (https://phabricator.wikimedia.org/T338470)'
2026-02-24 16:21:35 <wikibugs> ('CR) ''Dzahn: [C:''+1] "following-up with https://gerrit.wikimedia.org/r/c/operations/puppet/+/1242467"; [puppet] - ''https://gerrit.wikimedia.org/r/1243168 (owner: ''Arnaudb)'
2026-02-24 16:22:30 <wikibugs> ('CR) ''Elukey: [C:''+1] ml-services: force revertrisk-multi to skip the transparent proxy settings [deployment-charts] - ''https://gerrit.wikimedia.org/r/1243149 (owner: ''Dpogorzelski)'
2026-02-24 16:25:17 <wikibugs> ('CR) ''Dzahn: [V:''+1 C:''+1] "https://puppet-compiler.wmflabs.org/output/1242467/8132/"; [puppet] - ''https://gerrit.wikimedia.org/r/1242467 (https://phabricator.wikimedia.org/T338470) (owner: ''Dzahn)'
2026-02-24 16:27:21 <wikibugs> ('CR) ''Ladsgroup: [C:''+1] "The resulting ferm config file is a bit different but makes sense (and might be even even faster given no DNS resolve?): https://puppet-co"; [puppet] - ''https://gerrit.wikimedia.org/r/1242430 (owner: ''Muehlenhoff)'
2026-02-24 16:27:29 <wikibugs> 'SRE, ''Infrastructure-Foundations, ''Mail: Remove mail alias/fork from dmarc-rua@wikimedia.org to dmarc@donate.wikimedia.org - https://phabricator.wikimedia.org/T417941#11646872 (''jhathaway) >>! In T417941#11636764, @Dzahn wrote: > @Jgreen I removed the dmarc@donate.wikimedia.org line from that alias. >...'
2026-02-24 16:30:40 <wikibugs> 'SRE, ''Data-Platform-SRE, ''LDAP-Access-Requests: Grant Access to airflow-analytics-ops for akhatun - https://phabricator.wikimedia.org/T418270 (''AKhatun_WMF) ''NEW'
2026-02-24 16:31:26 <logmsgbot> btullis@cumin1003 decommission (PID 897548) is awaiting input
2026-02-24 16:32:34 <wikibugs> ('CR) ''Hashar: [C:''+1] "Excellent! That is a NOOP! :tada:" [puppet] - ''https://gerrit.wikimedia.org/r/1242467 (https://phabricator.wikimedia.org/T338470) (owner: ''Dzahn)'
2026-02-24 16:32:53 <wikibugs> 'SRE, ''SRE-swift-storage, ''Ceph, ''Data-Persistence, and 2 others: Onboard the Docker Registry to apus - https://phabricator.wikimedia.org/T394476#11646967 (''elukey) Tried also with another big image: ` elukey@build2001:~$ sudo docker push docker-registry.discovery.wmnet/test/amd-gpu-tester:latest The...'
2026-02-24 16:33:22 <jinxer-wm> FIRING: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
2026-02-24 16:34:42 <jinxer-wm> RESOLVED: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
2026-02-24 16:35:02 <wikibugs> ('CR) ''Btullis: [C:''+2] Add the new druid-internal servers to site.pp and preseed.yaml [puppet] - ''https://gerrit.wikimedia.org/r/1242529 (https://phabricator.wikimedia.org/T417430) (owner: ''Btullis)'
2026-02-24 16:35:28 <wikibugs> SRE, LDAP-Access-Requests, Data-Platform-SRE (2026-02-13 - 2026-03-06): Grant Access to airflow-analytics-ops for akhatun - https://phabricator.wikimedia.org/T418270#11647007 (brouberol)
2026-02-24 16:35:34 <wikibugs> SRE, LDAP-Access-Requests, Data-Platform-SRE (2026-02-13 - 2026-03-06): Grant Access to airflow-analytics-ops for akhatun - https://phabricator.wikimedia.org/T418270#11647009 (brouberol) Open→In progress
2026-02-24 16:35:45 <wikibugs> SRE, LDAP-Access-Requests, Data-Platform-SRE (2026-02-13 - 2026-03-06): Grant Access to airflow-analytics-ops for akhatun - https://phabricator.wikimedia.org/T418270#11647010 (brouberol) a:brouberol
2026-02-24 16:36:16 <wikibugs> (CR) AikoChou: [C:+1] ml-services: force revertrisk-multi to skip the transparent proxy settings [deployment-charts] - https://gerrit.wikimedia.org/r/1243149 (owner: Dpogorzelski)
2026-02-24 16:36:40 <wikibugs> SRE-swift-storage, Data-Persistence, MediaViewer, Thumbor, and 2 others: FY 25/26 WE 5.4.10 Standard Thumbnail Sizes Only - https://phabricator.wikimedia.org/T414805#11647019 (MatthewVernon) I spent quite a bit of time with codesearch last quarter trying to track down thumbnail size (ab)use, but...
2026-02-24 16:37:22 <wikibugs> SRE, LDAP-Access-Requests, Data-Platform-SRE (2026-02-13 - 2026-03-06): Grant Access to airflow-analytics-ops for akhatun - https://phabricator.wikimedia.org/T418270#11647035 (brouberol) @AKhatun_WMF Your username is now listed under https://ldap.toolforge.org/group/airflow-analytics-ops. Go to https...
2026-02-24 16:39:25 <wikibugs> SRE, LDAP-Access-Requests, Data-Platform-SRE (2026-02-13 - 2026-03-06): Grant Access to airflow-analytics-ops for akhatun - https://phabricator.wikimedia.org/T418270#11647044 (AKhatun_WMF) Yas! I now have admin access! Thanks.
2026-02-24 16:39:47 <wikibugs> SRE, LDAP-Access-Requests, Data-Platform-SRE (2026-02-13 - 2026-03-06): Grant Access to airflow-analytics-ops for akhatun - https://phabricator.wikimedia.org/T418270#11647045 (brouberol) Nice!
2026-02-24 16:39:53 <wikibugs> SRE, LDAP-Access-Requests, Data-Platform-SRE (2026-02-13 - 2026-03-06): Grant Access to airflow-analytics-ops for akhatun - https://phabricator.wikimedia.org/T418270#11647047 (brouberol) In progress→Resolved
2026-02-24 16:42:17 <jinxer-wm> FIRING: [2x] ProbeDown: Service wdqs2014:443 has failed probes (http_wdqs_main_external_search_sparql_endpoint_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#wdqs2014:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
2026-02-24 16:43:22 <jinxer-wm> FIRING: [4x] CoreRouterInterfaceDown: Core router interface down - cr1-codfw:et-1/0/2 (Transport: cr1-eqiad:et-1/1/2 (Arelion, IC-374549) {#12267}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown
2026-02-24 16:43:44 <logmsgbot> !log btullis@cumin1003 START - Cookbook sre.dns.netbox
2026-02-24 16:46:53 <icinga-wm> PROBLEM - Backup freshness on backup1014 is CRITICAL: Stale: 1 (gerrit2002), Fresh: 138 jobs https://wikitech.wikimedia.org/wiki/Bacula%23Monitoring
2026-02-24 16:47:18 <wikibugs> SRE-swift-storage, Data-Persistence, MediaViewer, Thumbor, and 2 others: FY 25/26 WE 5.4.10 Standard Thumbnail Sizes Only - https://phabricator.wikimedia.org/T414805#11647087 (Tacsipacsi) >>! In T414805#11640558, @Ladsgroup wrote: > The point is that in order to be cached, it need to have a miss...
2026-02-24 16:49:25 <logmsgbot> btullis@cumin1003 decommission (PID 897548) is awaiting input
2026-02-24 16:50:18 <wikibugs> (PS1) JHathaway: dmarc: remove unused ruf tags [dns] - https://gerrit.wikimedia.org/r/1243174
2026-02-24 16:50:26 <wikibugs> (CR) Elukey: locking: Add a mechanism for a global Spicerack lock. (2 comments) [software/spicerack] - https://gerrit.wikimedia.org/r/1239368 (https://phabricator.wikimedia.org/T330997) (owner: Blake)
2026-02-24 16:50:37 <wikibugs> (CR) Dpogorzelski: [C:+2] ml-services: force revertrisk-multi to skip the transparent proxy settings [deployment-charts] - https://gerrit.wikimedia.org/r/1243149 (owner: Dpogorzelski)
2026-02-24 16:50:41 <wikibugs> (PS2) JHathaway: dmarc: remove unused ruf tags [dns] - https://gerrit.wikimedia.org/r/1243174 (https://phabricator.wikimedia.org/T417941)
2026-02-24 16:50:54 <wikibugs> (CR) MVernon: [C:+1] "LGTM, thanks." [puppet] - https://gerrit.wikimedia.org/r/1242430 (owner: Muehlenhoff)
2026-02-24 16:51:45 <logmsgbot> !log btullis@cumin1003 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker[1119-1130,1135-1141].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - btullis@cumin1003"
2026-02-24 16:52:02 <logmsgbot> !log btullis@cumin1003 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker[1119-1130,1135-1141].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - btullis@cumin1003"
2026-02-24 16:52:03 <logmsgbot> !log btullis@cumin1003 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
2026-02-24 16:52:04 <logmsgbot> !log btullis@cumin1003 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts an-worker[1119-1130,1135-1141].eqiad.wmnet
2026-02-24 16:52:45 <wikibugs> (Merged) jenkins-bot: ml-services: force revertrisk-multi to skip the transparent proxy settings [deployment-charts] - https://gerrit.wikimedia.org/r/1243149 (owner: Dpogorzelski)
2026-02-24 16:54:09 <wikibugs> (PS1) Elukey: .wmfconfig: remove Buster [software/pywmflib] - https://gerrit.wikimedia.org/r/1243175
2026-02-24 16:55:43 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2248 (T415786)', diff saved to https://phabricator.wikimedia.org/P89010 and previous config saved to /var/cache/conftool/dbconfig/20260224-165542-marostegui.json
2026-02-24 16:55:48 <stashbot> T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786
2026-02-24 16:57:43 <logmsgbot> !log fceratto@deploy2002 helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
2026-02-24 17:00:05 <jouncebot> jhathaway and rzl: #bothumor Q:How do functions break up? A:They stop calling each other. Rise for Puppet request window deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260224T1700).
2026-02-24 17:00:05 <jouncebot> A_smart_kitten: A patch you scheduled for Puppet request window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
2026-02-24 17:00:43 <A_smart_kitten> here :)
2026-02-24 17:00:48 <rzl> A_smart_kitten: o/ looking
2026-02-24 17:02:27 <A_smart_kitten> for my patch, on the subject of testing it, i guess a way to test it post-deployment might be to manually trigger alerts for those components, and see if the task(s) then get filed with the right tags? (if manually triggering alerts like that is possible, that is)
2026-02-24 17:04:23 <rzl> A_smart_kitten: so, in serviceops we're in the middle of redoing our phab workflow (cc matthieulec) -- I want to run the proposal by the team before +2ing, just for social reasons not technical ones :)
2026-02-24 17:05:37 <rzl> sorry for the extra delay, I know it's frustrating especially because I see you got a positive reply on the task already, I just want to make sure we get a chance to discuss
2026-02-24 17:06:59 <wikibugs> SRE, SRE-Access-Requests, Gerrit-Privilege-Requests, Release-Engineering-Team, Security-Team: Request membership in deployment (and wmf-deployment group) for Rsilvola - https://phabricator.wikimedia.org/T418004#11647253 (Dzahn) @Rsilvola Gotcha! We just need to verify it's really you and your...
2026-02-24 17:06:59 <A_smart_kitten> rzl: sounds okay to me, but thanks for acknowledging the situation re the positive reply on the task :) [I probably assumed it represented the okay from serviceops generally]
2026-02-24 17:07:02 <A_smart_kitten> do you want me to reschedule in the future for another puppet request window, or should I leave it to serviceops to deploy as/when?
2026-02-24 17:08:00 <rzl> good question -- you can consider this handed off, if the team is happy with it I'll merge it async and no need for another window
2026-02-24 17:08:30 <A_smart_kitten> rzl: ty, will leave it with you :)
2026-02-24 17:08:52 <rzl> if you don't hear back in, let's say a week, please do ping me directly
2026-02-24 17:09:11 <A_smart_kitten> will do (probably on the task)
2026-02-24 17:09:18 <rzl> sgtm!
2026-02-24 17:10:09 <wikibugs> SRE, SRE-swift-storage, Ceph, Data-Persistence, and 2 others: Onboard the Docker Registry to apus - https://phabricator.wikimedia.org/T394476#11647274 (MatthewVernon) At least so far, no issues with sync getting far behind either.
2026-02-24 17:10:51 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2248', diff saved to https://phabricator.wikimedia.org/P89011 and previous config saved to /var/cache/conftool/dbconfig/20260224-171051-marostegui.json
2026-02-24 17:10:52 <wikibugs> SRE, SRE-Access-Requests: Requesting access to deployment for Eileen McFarland - https://phabricator.wikimedia.org/T418221#11647276 (Dzahn) a:thcipriani
2026-02-24 17:12:23 <wikibugs> (CR) Dillon: [C:+1] Enable revert risk filters for first batch of wikis: < 1000 monthly edits [mediawiki-config] - https://gerrit.wikimedia.org/r/1240672 (https://phabricator.wikimedia.org/T411485) (owner: Kgraessle)
2026-02-24 17:13:30 <wikibugs> SRE, LDAP-Access-Requests: Request to deactivate/disable AndreiJirohOnDevsCentral LDAP dev account - https://phabricator.wikimedia.org/T418068#11647285 (Dzahn) What is the goal to be achieved here?
2026-02-24 17:15:06 <wikibugs> (PS1) BCornwall: haproxy: Conditionally set cpu-map when >1 CPU [puppet] - https://gerrit.wikimedia.org/r/1243180 (https://phabricator.wikimedia.org/T418182)
2026-02-24 17:15:18 <wikibugs> ops-eqiad, SRE, DC-Ops, Data-Platform-SRE (2026-02-13 - 2026-03-06), Patch-For-Review: Q3:rack/setup/install druid-internal100[1-6] - https://phabricator.wikimedia.org/T417430#11647290 (BTullis) a:BTullis→None
2026-02-24 17:16:36 <wikibugs> (CR) BCornwall: [V:+1] "PCC SUCCESS (CORE_DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/8133/co" [puppet] - https://gerrit.wikimedia.org/r/1243180 (https://phabricator.wikimedia.org/T418182) (owner: BCornwall)
2026-02-24 17:17:14 <logmsgbot> !log btullis@cumin1003 START - Cookbook sre.hosts.rename from an-worker1117 to dse-k8s-worker1024
2026-02-24 17:17:35 <logmsgbot> !log btullis@cumin1003 START - Cookbook sre.dns.netbox
2026-02-24 17:20:39 <wikibugs> (CR) Fabfur: [C:+1] haproxy: Conditionally set cpu-map when >1 CPU [puppet] - https://gerrit.wikimedia.org/r/1243180 (https://phabricator.wikimedia.org/T418182) (owner: BCornwall)
2026-02-24 17:21:12 <wikibugs> (CR) Dzahn: [V:+1 C:+2] gerrit: remove code for having multiple daemon users [puppet] - https://gerrit.wikimedia.org/r/1242467 (https://phabricator.wikimedia.org/T338470) (owner: Dzahn)
2026-02-24 17:21:51 <icinga-wm> PROBLEM - Blazegraph Port for wdqs-blazegraph on wdqs2011 is CRITICAL: connect to address 127.0.0.1 and port 9999: Connection refused https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Runbook
2026-02-24 17:22:30 <wikibugs> (CR) BCornwall: [V:+1 C:+2] haproxy: Conditionally set cpu-map when >1 CPU [puppet] - https://gerrit.wikimedia.org/r/1243180 (https://phabricator.wikimedia.org/T418182) (owner: BCornwall)
2026-02-24 17:22:45 <icinga-wm> RECOVERY - Blazegraph Port for wdqs-blazegraph on wdqs2011 is OK: TCP OK - 0.000 second response time on 127.0.0.1 port 9999 https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Runbook
2026-02-24 17:23:18 <logmsgbot> btullis@cumin1003 rename (PID 1001646) is awaiting input
2026-02-24 17:26:00 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2248', diff saved to https://phabricator.wikimedia.org/P89012 and previous config saved to /var/cache/conftool/dbconfig/20260224-172559-marostegui.json
2026-02-24 17:28:32 <wikibugs> (CR) Federico Ceratto: "recheck" [puppet] - https://gerrit.wikimedia.org/r/1243134 (https://phabricator.wikimedia.org/T317179) (owner: Federico Ceratto)
2026-02-24 17:28:55 <wikibugs> (CR) Federico Ceratto: "recheck" [puppet] - https://gerrit.wikimedia.org/r/1243134 (https://phabricator.wikimedia.org/T317179) (owner: Federico Ceratto)
2026-02-24 17:29:22 <logmsgbot> !log fceratto@deploy2002 helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
2026-02-24 17:34:27 <jinxer-wm> FIRING: HelmReleaseBadStatus: Helm release kserve/kserve on k8s-mlstaging@codfw in state failed - https://wikitech.wikimedia.org/wiki/Kubernetes/Deployments#Rolling_back_in_an_emergency - https://grafana.wikimedia.org/d/UT4GtK3nz?var-site=codfw&var-cluster=k8s-mlstaging&var-namespace=kserve - https://alerts.wikimedia.org/?q=alertname%3DHelmReleaseBadStatus
2026-02-24 17:35:11 <icinga-wm> RECOVERY - PyBal backends health check on lvs2013 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal
2026-02-24 17:35:19 <icinga-wm> RECOVERY - PyBal backends health check on lvs2014 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal
2026-02-24 17:35:33 <wikibugs> (PS1) Dzahn: backup: adjust gerrit file set after renaming of gerrit2 [puppet] - https://gerrit.wikimedia.org/r/1243183 (https://phabricator.wikimedia.org/T417247)
2026-02-24 17:36:04 <wikibugs> (PS2) Dzahn: backup: adjust gerrit file set after renaming of gerrit2 [puppet] - https://gerrit.wikimedia.org/r/1243183 (https://phabricator.wikimedia.org/T417247)
2026-02-24 17:36:48 <wikibugs> (PS1) BCornwall: ats: Set secondary nvme drives for new codfw hosts [puppet] - https://gerrit.wikimedia.org/r/1243184 (https://phabricator.wikimedia.org/T401832)
2026-02-24 17:39:00 <wikibugs> (CR) BCornwall: [V:+1] "PCC SUCCESS (NOOP 1 CORE_DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/" [puppet] - https://gerrit.wikimedia.org/r/1243184 (https://phabricator.wikimedia.org/T401832) (owner: BCornwall)
2026-02-24 17:39:00 <logmsgbot> !log btullis@cumin1003 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming an-worker1117 to dse-k8s-worker1024 - btullis@cumin1003"
2026-02-24 17:41:08 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2248 (T415786)', diff saved to https://phabricator.wikimedia.org/P89013 and previous config saved to /var/cache/conftool/dbconfig/20260224-174107-marostegui.json
2026-02-24 17:41:12 <stashbot> T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786
2026-02-24 17:41:25 <wikibugs> (PS1) Dzahn: gerrit: cleanup Hiera and tests after gerrit2 renaming [puppet] - https://gerrit.wikimedia.org/r/1243187 (https://phabricator.wikimedia.org/T338470)
2026-02-24 17:41:26 <wikibugs> (CR) BCornwall: [C:+1] dmarc: remove unused ruf tags [dns] - https://gerrit.wikimedia.org/r/1243174 (https://phabricator.wikimedia.org/T417941) (owner: JHathaway)
2026-02-24 17:41:46 <logmsgbot> !log btullis@cumin1003 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming an-worker1117 to dse-k8s-worker1024 - btullis@cumin1003"
2026-02-24 17:41:46 <logmsgbot> !log btullis@cumin1003 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
2026-02-24 17:41:46 <logmsgbot> !log btullis@cumin1003 START - Cookbook sre.dns.wipe-cache dse-k8s-worker1024 on all recursors
2026-02-24 17:41:50 <logmsgbot> !log btullis@cumin1003 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dse-k8s-worker1024 on all recursors
2026-02-24 17:41:50 <logmsgbot> !log btullis@cumin1003 START - Cookbook sre.network.configure-switch-interfaces for host dse-k8s-worker1024
2026-02-24 17:42:23 <wikibugs> (PS1) Dzahn: admin: rename gerrit system user [puppet] - https://gerrit.wikimedia.org/r/1243188 (https://phabricator.wikimedia.org/T338470)
2026-02-24 17:43:04 <logmsgbot> !log btullis@cumin1003 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dse-k8s-worker1024
2026-02-24 17:43:40 <logmsgbot> !log btullis@cumin1003 END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from an-worker1117 to dse-k8s-worker1024
2026-02-24 17:46:51 <icinga-wm> RECOVERY - Backup freshness on backup1014 is OK: Fresh: 139 jobs https://wikitech.wikimedia.org/wiki/Bacula%23Monitoring
2026-02-24 17:51:00 <wikibugs> (CR) A smart kitten: "[FTR, current status as at T417020#11647567]" [puppet] - https://gerrit.wikimedia.org/r/1238369 (https://phabricator.wikimedia.org/T417020) (owner: A smart kitten)
2026-02-24 17:52:42 <wikibugs> SRE, SRE-Access-Requests: Requesting access to deployment for Eileen McFarland - https://phabricator.wikimedia.org/T418221#11647570 (thcipriani) Reason for access makes sense: approved for `deployment` group. @EMcFarland-WMF to deploy backports you'll also need to request `spiderpig-access` on https://i...
2026-02-24 17:56:42 <wikibugs> SRE, Infrastructure-Foundations, Mail, Patch-For-Review: Remove mail alias/fork from dmarc-rua@wikimedia.org to dmarc@donate.wikimedia.org - https://phabricator.wikimedia.org/T417941#11647590 (Jgreen) Open→Resolved a:Jgreen >>! In T417941#11646872, @jhathaway wrote: >>>! In T41794...
2026-02-24 17:57:17 <jinxer-wm> RESOLVED: [2x] ProbeDown: Service wdqs2014:443 has failed probes (http_wdqs_main_external_search_sparql_endpoint_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#wdqs2014:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
2026-02-24 17:58:11 <wikibugs> SRE, LDAP-Access-Requests: Request to deactivate/disable AndreiJirohOnDevsCentral LDAP dev account - https://phabricator.wikimedia.org/T418068#11647596 (ajhalili2006) >>! In T418068#11647285, @Dzahn wrote: > What is the goal to be achieved here? Since I manually renamed my Wikimedia developer account in...
2026-02-24 18:00:05 <jouncebot> Deploy window MediaWiki infrastructure (UTC late) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260224T1800)
2026-02-24 18:00:27 <wikibugs> (PS5) Urbanecm: [Growth] Enable on all open Wikipedias [mediawiki-config] - https://gerrit.wikimedia.org/r/1239949 (https://phabricator.wikimedia.org/T417023)
2026-02-24 18:00:52 <wikibugs> (CR) Urbanecm: "rebase done by ignoring conflicts and using `composer manage-dblist update` to re-generate dblists-index.php" [mediawiki-config] - https://gerrit.wikimedia.org/r/1239949 (https://phabricator.wikimedia.org/T417023) (owner: Urbanecm)
2026-02-24 18:03:06 <wikibugs> (PS1) Urbanecm: feat(DataProvider): Allow logging of read validation failures [extensions/CommunityConfiguration] (wmf/1.46.0-wmf.16) - https://gerrit.wikimedia.org/r/1243190 (https://phabricator.wikimedia.org/T417893)
2026-02-24 18:07:50 <wikibugs> (CR) Hashar: "I assume the previously backed up `/var/lib/gerrit2` will remain present in the backup system and this will only apply to the future backu" [puppet] - https://gerrit.wikimedia.org/r/1243183 (https://phabricator.wikimedia.org/T417247) (owner: Dzahn)
2026-02-24 18:11:14 <wikibugs> ops-eqiad, SRE, DC-Ops, Data-Platform-SRE (2026-02-13 - 2026-03-06): Unusually high disk errors on the an-worker nodes since upgrading the disks - https://phabricator.wikimedia.org/T415002#11647680 (wiki_willy) Hi @BTullis - sure, that sounds like a good test plan. One thing to keep in mind thou...
2026-02-24 18:17:56 <wikibugs> (PS2) Cwhite: admin: add keys for cwhite [puppet] - https://gerrit.wikimedia.org/r/1242411
2026-02-24 18:18:24 <logmsgbot> !log btullis@cumin1003 START - Cookbook sre.hosts.decommission for hosts an-worker[1118,1131,1133-1134].eqiad.wmnet
2026-02-24 18:28:04 <wikibugs> (CR) Dzahn: "I don't understand why removing some brackets limits it to eqiad and codfw but the intention sounds good." [alerts] - https://gerrit.wikimedia.org/r/1243102 (https://phabricator.wikimedia.org/T418084) (owner: Arnaudb)
2026-02-24 18:29:45 <logmsgbot> !log btullis@cumin1003 START - Cookbook sre.dns.netbox
2026-02-24 18:30:01 <wikibugs> (CR) Dzahn: "Yea, it will affect what will be backed up next time. But also we generally can't expect anything to remain in the backup system for truly" [puppet] - https://gerrit.wikimedia.org/r/1243183 (https://phabricator.wikimedia.org/T417247) (owner: Dzahn)
2026-02-24 18:32:31 <wikibugs> SRE, LDAP-Access-Requests: Request to deactivate/disable AndreiJirohOnDevsCentral LDAP dev account - https://phabricator.wikimedia.org/T418068#11647766 (Dzahn) That seems the same thing as just setting a random password and not logging in anymore?
2026-02-24 18:34:07 <wikibugs> SRE, Infrastructure Security, LDAP-Access-Requests: Request to deactivate/disable AndreiJirohOnDevsCentral LDAP dev account - https://phabricator.wikimedia.org/T418068#11647782 (Dzahn)
2026-02-24 18:34:35 <wikibugs> SRE, Infrastructure Security, Infrastructure-Foundations, LDAP-Access-Requests: Request to deactivate/disable AndreiJirohOnDevsCentral LDAP dev account - https://phabricator.wikimedia.org/T418068#11647788 (A_smart_kitten) >>! In T418068#11647766, @Dzahn wrote: > Not sure if another type of "lock...
2026-02-24 18:34:56 <logmsgbot> !log btullis@cumin1003 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker[1118,1131,1133-1134].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - btullis@cumin1003"
2026-02-24 18:36:26 <wikibugs> SRE, Infrastructure Security, Infrastructure-Foundations, LDAP-Access-Requests: Request to deactivate/disable AndreiJirohOnDevsCentral LDAP dev account - https://phabricator.wikimedia.org/T418068#11647792 (Dzahn) I am not sure if users who are simply not active need to be banned. Leaving that to...
2026-02-24 18:37:09 <logmsgbot> !log btullis@cumin1003 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker[1118,1131,1133-1134].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - btullis@cumin1003"
2026-02-24 18:37:09 <logmsgbot> !log btullis@cumin1003 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
2026-02-24 18:37:11 <logmsgbot> !log btullis@cumin1003 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts an-worker[1118,1131,1133-1134].eqiad.wmnet
2026-02-24 18:40:15 <logmsgbot> !log btullis@cumin1003 START - Cookbook sre.hosts.decommission for hosts dse-k8s-worker1024.eqiad.wmnet
2026-02-24 18:42:28 <icinga-wm> PROBLEM - Confd vcl based reload on cp6014 is CRITICAL: reload-vcl failed to run since 0h, 2 minutes. https://wikitech.wikimedia.org/wiki/Varnish
2026-02-24 18:43:02 <icinga-wm> PROBLEM - Confd vcl based reload on cp2035 is CRITICAL: reload-vcl failed to run since 0h, 2 minutes. https://wikitech.wikimedia.org/wiki/Varnish
2026-02-24 18:43:02 <icinga-wm> PROBLEM - Confd vcl based reload on cp2033 is CRITICAL: reload-vcl failed to run since 0h, 2 minutes. https://wikitech.wikimedia.org/wiki/Varnish
2026-02-24 18:43:06 <sukhe> uh?
2026-02-24 18:43:24 <sukhe> brett: ^ anything to know about?
2026-02-24 18:45:18 <logmsgbot> !log btullis@cumin1003 START - Cookbook sre.dns.netbox
2026-02-24 18:46:20 <wikibugs> SRE, Traffic, Patch-For-Review: Offer AuthDNS service over IPv6 - https://phabricator.wikimedia.org/T81605#11647903 (ssingh) Open→Resolved a:ssingh With the rollout of ns[02] IPv6 glue records today, we have IPv6 support on all ns[0-2].wikimedia.org. There is some more work here: we h...
2026-02-24 18:47:33 <wikibugs> SRE, SRE-Access-Requests, Gerrit-Privilege-Requests, Release-Engineering-Team, Security-Team: Request membership in deployment (and wmf-deployment group) for Rsilvola - https://phabricator.wikimedia.org/T418004#11647908 (Dzahn)
2026-02-24 18:47:37 <wikibugs> (PS5) Ssingh: P:bird::anycast: automatically detect IPv6 support [puppet] - https://gerrit.wikimedia.org/r/1241003 (https://phabricator.wikimedia.org/T81605)
2026-02-24 18:47:42 <wikibugs> SRE, SRE-Access-Requests, Gerrit-Privilege-Requests, Release-Engineering-Team, Security-Team: Request membership in deployment (and wmf-deployment group) for Rsilvola - https://phabricator.wikimedia.org/T418004#11647909 (Dzahn) Thanks! SSH key confirmed out-of-band.
2026-02-24 18:48:17 <wikibugs> (PS1) BCornwall: varnishkafka: Only enable prom exporter for text [puppet] - https://gerrit.wikimedia.org/r/1243195 (https://phabricator.wikimedia.org/T401832)
2026-02-24 18:49:58 <wikibugs> (CR) Ssingh: [V:+1] "[Still looking for a review" [puppet] - https://gerrit.wikimedia.org/r/1241003 (https://phabricator.wikimedia.org/T81605) (owner: Ssingh)
2026-02-24 18:50:38 <wikibugs> (PS1) Dzahn: admin: add rsilvola to deployment group [puppet] - https://gerrit.wikimedia.org/r/1243196 (https://phabricator.wikimedia.org/T418004)
2026-02-24 18:50:46 <logmsgbot> !log btullis@cumin1003 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dse-k8s-worker1024.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - btullis@cumin1003"
2026-02-24 18:52:06 <wikibugs> ops-ulsfo, SRE, DC-Ops, Infrastructure-Foundations, netops: ULSFO:Switch refresh diagram - https://phabricator.wikimedia.org/T408511#11647948 (Papaul) a:Papaul→ayounsi
2026-02-24 18:52:37 <wikibugs> (CR) BCornwall: [V:+1] "PCC SUCCESS (CORE_DIFF 4): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/8137/co" [puppet] - https://gerrit.wikimedia.org/r/1243195 (https://phabricator.wikimedia.org/T401832) (owner: BCornwall)
2026-02-24 18:53:07 <wikibugs> SRE, SRE-Access-Requests, Data-Engineering: Requesting access to Superset for mikez - https://phabricator.wikimedia.org/T418098#11647956 (Dzahn)
2026-02-24 18:53:23 <wikibugs> SRE, SRE-Access-Requests, Data-Engineering: Requesting access to analytics-platform-eng-admins for milimetric - https://phabricator.wikimedia.org/T417906#11647957 (Dzahn)
2026-02-24 18:53:31 <logmsgbot> !log btullis@cumin1003 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dse-k8s-worker1024.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - btullis@cumin1003"
2026-02-24 18:53:31 <logmsgbot> !log btullis@cumin1003 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
2026-02-24 18:53:32 <logmsgbot> !log btullis@cumin1003 END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts dse-k8s-worker1024.eqiad.wmnet
2026-02-24 18:56:27 <logmsgbot> !log fceratto@deploy2002 helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
2026-02-24 19:00:04 <jouncebot> dduvall and dancy: #bothumor When your hammer is PHP, everything starts looking like a thumb. Rise for MediaWiki train - Utc-7 Version. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260224T1900).
2026-02-24 19:05:29 <jinxer-wm> FIRING: SystemdUnitFailed: check_netbox_uncommitted_dns_changes.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
2026-02-24 19:05:41 <logmsgbot> !log btullis@cumin1003 START - Cookbook sre.dns.netbox
2026-02-24 19:05:52 <dduvall> dancy: pretrain failed due to an unclean `/srv/patches` but it seems fine now. i'll roll testwikis and then group0 shortly after
2026-02-24 19:06:38 <wikibugs> ops-codfw, SRE, DC-Ops: Q3:rack/setup/install frqueue2004 - https://phabricator.wikimedia.org/T416251#11648006 (Jgreen) Open→In progress p:Triage→Medium a:Jgreen
2026-02-24 19:07:00 <wikibugs> ops-codfw, SRE, DC-Ops, fundraising-tech-ops: Q3:rack/setup/install frqueue2004 - https://phabricator.wikimedia.org/T416251#11648009 (Jgreen)
2026-02-24 19:07:03 <dancy> dduvall: sounds good
2026-02-24 19:08:02 <icinga-wm> RECOVERY - Confd vcl based reload on cp2035 is OK: reload-vcl successfully ran 0h, 0 minutes ago. https://wikitech.wikimedia.org/wiki/Varnish
2026-02-24 19:08:02 <icinga-wm> RECOVERY - Confd vcl based reload on cp2033 is OK: reload-vcl successfully ran 0h, 0 minutes ago. https://wikitech.wikimedia.org/wiki/Varnish
2026-02-24 19:08:14 <wikibugs> (PS1) TrainBranchBot: testwikis to 1.46.0-wmf.17 [mediawiki-config] - https://gerrit.wikimedia.org/r/1243197 (https://phabricator.wikimedia.org/T413808)
2026-02-24 19:08:16 <wikibugs> (CR) TrainBranchBot: [C:+2] "Initiated by dduvall@deploy2002" [mediawiki-config] - https://gerrit.wikimedia.org/r/1243197 (https://phabricator.wikimedia.org/T413808) (owner: TrainBranchBot)
2026-02-24 19:08:30 <icinga-wm> RECOVERY - Confd vcl based reload on cp6014 is OK: reload-vcl successfully ran 0h, 0 minutes ago. https://wikitech.wikimedia.org/wiki/Varnish
2026-02-24 19:09:52 <logmsgbot> !log btullis@cumin1003 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Updating records after renaming and moving vlan of some an-worker hosts - btullis@cumin1003"
2026-02-24 19:09:56 <logmsgbot> !log btullis@cumin1003 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Updating records after renaming and moving vlan of some an-worker hosts - btullis@cumin1003"
2026-02-24 19:09:57 <logmsgbot> !log btullis@cumin1003 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
2026-02-24 19:10:02 <icinga-wm> PROBLEM - Confd vcl based reload on cp2031 is CRITICAL: reload-vcl failed to run since 0h, 2 minutes. https://wikitech.wikimedia.org/wiki/Varnish
2026-02-24 19:10:10 <wikibugs> (Merged) jenkins-bot: testwikis to 1.46.0-wmf.17 [mediawiki-config] - https://gerrit.wikimedia.org/r/1243197 (https://phabricator.wikimedia.org/T413808) (owner: TrainBranchBot)
2026-02-24 19:10:25 <jinxer-wm> RESOLVED: SystemdUnitFailed: check_netbox_uncommitted_dns_changes.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
2026-02-24 19:11:37 <wikibugs> (PS1) BCornwall: kafka::webrequest: Only use varnishkafka when on [puppet] - https://gerrit.wikimedia.org/r/1243199
2026-02-24 19:11:37 <wikibugs> (PS1) BCornwall: kafka::webrequest: Tighten monitoring guard [puppet] - https://gerrit.wikimedia.org/r/1243200
2026-02-24 19:12:09 <logmsgbot> !log dduvall@deploy2002 Started scap sync-world: testwikis to 1.46.0-wmf.17 refs T413808
2026-02-24 19:12:13 <stashbot> T413808: 1.46.0-wmf.17 deployment blockers - https://phabricator.wikimedia.org/T413808
2026-02-24 19:13:50 <logmsgbot> !log btullis@cumin1003 START - Cookbook sre.network.configure-switch-interfaces for host dse-k8s-worker1024
2026-02-24 19:13:56 <wikibugs> (CR) CI reject: [V:-1] kafka::webrequest: Only use varnishkafka when on [puppet] - https://gerrit.wikimedia.org/r/1243199 (owner: BCornwall)
2026-02-24 19:14:15 <logmsgbot> !log btullis@cumin1003 END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host dse-k8s-worker1024
2026-02-24 19:14:21 <wikibugs> (CR) CI reject: [V:-1] kafka::webrequest: Tighten monitoring guard [puppet] - https://gerrit.wikimedia.org/r/1243200 (owner: BCornwall)
2026-02-24 19:14:38 <logmsgbot> !log btullis@cumin1003 START - Cookbook sre.network.configure-switch-interfaces for host dse-k8s-worker1024
2026-02-24 19:15:50 <wikibugs> (PS2) BCornwall: kafka::webrequest: Tighten monitoring guard [puppet] - https://gerrit.wikimedia.org/r/1243200
2026-02-24 19:15:55 <logmsgbot> !log btullis@cumin1003 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dse-k8s-worker1024
2026-02-24 19:15:59 <logmsgbot> !log btullis@cumin1003 START - Cookbook sre.network.configure-switch-interfaces for host dse-k8s-worker1025
2026-02-24 19:17:27 <logmsgbot> !log btullis@cumin1003 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dse-k8s-worker1025
2026-02-24 19:18:14 <logmsgbot> !log btullis@cumin1003 START - Cookbook sre.network.configure-switch-interfaces for host dse-k8s-worker1026
2026-02-24 19:18:33 <logmsgbot> !log btullis@cumin1003 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dse-k8s-worker1026
2026-02-24 19:18:39 <logmsgbot> !log btullis@cumin1003 START - Cookbook sre.network.configure-switch-interfaces for host dse-k8s-worker1027
2026-02-24 19:18:53 <logmsgbot> !log btullis@cumin1003 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dse-k8s-worker1027
2026-02-24 19:18:57 <logmsgbot> !log btullis@cumin1003 START - Cookbook sre.network.configure-switch-interfaces for host dse-k8s-worker1028
2026-02-24 19:19:11 <logmsgbot> !log btullis@cumin1003 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dse-k8s-worker1028
2026-02-24 19:20:11 <logmsgbot> !log btullis@cumin1003 START - Cookbook sre.hosts.provision for host dse-k8s-worker1024.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
2026-02-24 19:20:56 <wikibugs> ('CR) ''BCornwall: [V:''+1] "PCC SUCCESS (NOOP 4): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/8138/console" [puppet] - ''https://gerrit.wikimedia.org/r/1243200 (owner: ''BCornwall)'
2026-02-24 19:25:08 <logmsgbot> btullis@cumin1003 provision (PID 1103540) is awaiting input
2026-02-24 19:25:37 <wikibugs> ('CR) ''Dzahn: "ACK! I made https://phabricator.wikimedia.org/T418299 just now to track that." [puppet] - ''https://gerrit.wikimedia.org/r/1242483 (https://phabricator.wikimedia.org/T418109) (owner: ''Dzahn)'
2026-02-24 19:25:57 <wikibugs> ('CR) ''Dzahn: [C:''+2] releases: upgrade Java version from 17 to 21 [puppet] - ''https://gerrit.wikimedia.org/r/1242483 (https://phabricator.wikimedia.org/T418109) (owner: ''Dzahn)'
2026-02-24 19:26:04 <wikibugs> ('PS4) ''Dzahn: releases: upgrade Java version from 17 to 21 [puppet] - ''https://gerrit.wikimedia.org/r/1242483 (https://phabricator.wikimedia.org/T418109)'
2026-02-24 19:26:52 <logmsgbot> !log btullis@cumin1003 END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dse-k8s-worker1024.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
2026-02-24 19:27:31 <wikibugs> ('PS2) ''BCornwall: varnishkafka: Only enable prom exporter for text [puppet] - ''https://gerrit.wikimedia.org/r/1243195 (https://phabricator.wikimedia.org/T401832)'
2026-02-24 19:27:56 <wikibugs> ('CR) ''Dzahn: [C:''+2] releases: upgrade Java version from 17 to 21 [puppet] - ''https://gerrit.wikimedia.org/r/1242483 (https://phabricator.wikimedia.org/T418109) (owner: ''Dzahn)'
2026-02-24 19:28:18 <wikibugs> ('PS3) ''BCornwall: varnishkafka: Only enable for text [puppet] - ''https://gerrit.wikimedia.org/r/1243195 (https://phabricator.wikimedia.org/T401832)'
2026-02-24 19:29:41 <wikibugs> ('CR) ''BCornwall: [V:''+1] "PCC SUCCESS (CORE_DIFF 4): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/8139/co" [puppet] - ''https://gerrit.wikimedia.org/r/1243195 (https://phabricator.wikimedia.org/T401832) (owner: ''BCornwall)'
2026-02-24 19:30:41 <wikibugs> ('CR) ''Dzahn: [C:''+2] "and .. it failed." [puppet] - ''https://gerrit.wikimedia.org/r/1242483 (https://phabricator.wikimedia.org/T418109) (owner: ''Dzahn)'
2026-02-24 19:32:18 <wikibugs> ('CR) ''Dzahn: [C:''+2] "E: The list of sources could not be read" [puppet] - ''https://gerrit.wikimedia.org/r/1242483 (https://phabricator.wikimedia.org/T418109) (owner: ''Dzahn)'
2026-02-24 19:33:04 <wikibugs> ('Abandoned) ''BCornwall: kafka::webrequest: Only use varnishkafka when on [puppet] - ''https://gerrit.wikimedia.org/r/1243199 (owner: ''BCornwall)'
2026-02-24 19:33:20 <wikibugs> ('CR) ''Dzahn: [C:''+2] "E: Conflicting values set for option Signed-By regarding source http://apt.wikimedia.org/wikimedia/ bookworm-wikimedia: /etc/apt/keyrings/" [puppet] - ''https://gerrit.wikimedia.org/r/1242483 (https://phabricator.wikimedia.org/T418109) (owner: ''Dzahn)'
2026-02-24 19:33:30 <wikibugs> ('PS1) ''Joal: Extend webrequest and other data retention [puppet] - ''https://gerrit.wikimedia.org/r/1243205 (https://phabricator.wikimedia.org/T418162)'
2026-02-24 19:34:15 <wikibugs> ('PS1) ''Dzahn: Revert "releases: upgrade Java version from 17 to 21" [puppet] - ''https://gerrit.wikimedia.org/r/1243206'
2026-02-24 19:34:29 <wikibugs> ('CR) ''Dzahn: [C:''+2] Revert "releases: upgrade Java version from 17 to 21" [puppet] - ''https://gerrit.wikimedia.org/r/1243206 (owner: ''Dzahn)'
2026-02-24 19:38:45 <wikibugs> ('CR) ''JHathaway: [C:''+2] dmarc: remove unused ruf tags [dns] - ''https://gerrit.wikimedia.org/r/1243174 (https://phabricator.wikimedia.org/T417941) (owner: ''JHathaway)'
2026-02-24 19:39:54 <wikibugs> 'ops-codfw, ''SRE, ''DC-Ops, ''fundraising-tech-ops: Q3:rack/setup/install frqueue2004 - https://phabricator.wikimedia.org/T416251#11648202 (''Jgreen)'
2026-02-24 19:40:02 <logmsgbot> !log jhathaway@dns1004 START - running authdns-update
2026-02-24 19:41:28 <logmsgbot> !log jhathaway@dns1004 END - running authdns-update
2026-02-24 19:43:11 <wikibugs> ('CR) ''JavierMonton: [C:''+1] Extend webrequest and other data retention [puppet] - ''https://gerrit.wikimedia.org/r/1243205 (https://phabricator.wikimedia.org/T418162) (owner: ''Joal)'
2026-02-24 19:56:48 <logmsgbot> !log dduvall@deploy2002 Finished scap sync-world: testwikis to 1.46.0-wmf.17 refs T413808 (duration: 44m 39s)
2026-02-24 19:56:52 <stashbot> T413808: 1.46.0-wmf.17 deployment blockers - https://phabricator.wikimedia.org/T413808
2026-02-24 19:58:57 <wikibugs> ('PS1) ''TrainBranchBot: group0 to 1.46.0-wmf.17 [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1243211 (https://phabricator.wikimedia.org/T413808)'
2026-02-24 19:58:59 <wikibugs> ('CR) ''TrainBranchBot: [C:''+2] "Initiated by dduvall@deploy2002" [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1243211 (https://phabricator.wikimedia.org/T413808) (owner: ''TrainBranchBot)'
2026-02-24 20:00:00 <wikibugs> ('Merged) ''jenkins-bot: group0 to 1.46.0-wmf.17 [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1243211 (https://phabricator.wikimedia.org/T413808) (owner: ''TrainBranchBot)'
2026-02-24 20:08:22 <jinxer-wm> FIRING: [4x] CoreRouterInterfaceDown: Core router interface down - cr1-codfw:et-1/0/2 (Transport: cr1-eqiad:et-1/1/2 (Arelion, IC-374549) {#12267}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown
2026-02-24 20:08:30 <logmsgbot> !log dduvall@deploy2002 rebuilt and synchronized wikiversions files: group0 to 1.46.0-wmf.17 refs T413808
2026-02-24 20:08:34 <stashbot> T413808: 1.46.0-wmf.17 deployment blockers - https://phabricator.wikimedia.org/T413808
2026-02-24 20:19:59 <wikibugs> ('CR) ''Btullis: [C:''+2] Extend webrequest and other data retention [puppet] - ''https://gerrit.wikimedia.org/r/1243205 (https://phabricator.wikimedia.org/T418162) (owner: ''Joal)'
2026-02-24 20:23:21 <icinga-wm> PROBLEM - PyBal backends health check on lvs2013 is CRITICAL: PYBAL CRITICAL - CRITICAL - wdqs-main_443: Servers wdqs2014.codfw.wmnet, wdqs2013.codfw.wmnet, wdqs2015.codfw.wmnet are marked down but pooled https://wikitech.wikimedia.org/wiki/PyBal
2026-02-24 20:23:25 <icinga-wm> PROBLEM - PyBal backends health check on lvs2014 is CRITICAL: PYBAL CRITICAL - CRITICAL - wdqs-main_443: Servers wdqs2021.codfw.wmnet, wdqs2014.codfw.wmnet, wdqs2015.codfw.wmnet are marked down but pooled https://wikitech.wikimedia.org/wiki/PyBal
2026-02-24 20:36:21 <icinga-wm> RECOVERY - PyBal backends health check on lvs2013 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal
2026-02-24 20:39:25 <icinga-wm> RECOVERY - PyBal backends health check on lvs2014 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal
2026-02-24 20:42:18 <wikibugs> ('CR) ''RLazarus: [C:''+1] "LGTM as discussed offline -- this already should've been like this, and I agree it looks like a no-op for non-drain-related cases like tra" [deployment-charts] - ''https://gerrit.wikimedia.org/r/1242518 (https://phabricator.wikimedia.org/T364245) (owner: ''Scott French)'
2026-02-24 20:42:43 <dduvall> done with train
2026-02-24 20:48:22 <jinxer-wm> FIRING: [4x] CoreRouterInterfaceDown: Core router interface down - cr1-codfw:et-1/0/2 (Transport: cr1-eqiad:et-1/1/2 (Arelion, IC-374549) {#12267}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown
2026-02-24 20:50:50 <wikibugs> ('CR) ''RLazarus: [C:''+1] "This is starting to feel like it's straining the boundaries of what's reasonable to do in bash before rewriting it in Python. The changes " [docker-images/production-images] - ''https://gerrit.wikimedia.org/r/1242462 (https://phabricator.wikimedia.org/T364245) (owner: ''Scott French)'
2026-02-24 20:53:48 <wikibugs> ('PS1) ''Aqu: Bump Blunderbuss image [deployment-charts] - ''https://gerrit.wikimedia.org/r/1243221 (https://phabricator.wikimedia.org/T415874)'
2026-02-24 20:55:44 <wikibugs> ('PS2) ''Aqu: Bump Blunderbuss image [deployment-charts] - ''https://gerrit.wikimedia.org/r/1243221 (https://phabricator.wikimedia.org/T415874)'
2026-02-24 20:55:44 <wikibugs> ('CR) ''RLazarus: [C:''+1] "This doesn't need to touch package.json as it's a patch version only, but `sextant update` would update the version numbers in package.loc" [deployment-charts] - ''https://gerrit.wikimedia.org/r/1242521 (https://phabricator.wikimedia.org/T364245) (owner: ''Scott French)'
2026-02-24 21:00:05 <jouncebot> RoanKattouw, Urbanecm, TheresNoTime, kindrobot, and cjming: I, the Bot under the Fountain, call upon thee, The Deployer, to do UTC late backport window deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260224T2100).
2026-02-24 21:00:05 <jouncebot> AaronSchulz: A patch you scheduled for UTC late backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
2026-02-24 21:00:49 <jinxer-wm> FIRING: PuppetDisabled: Puppet disabled on relforge1008:9100 - https://wikitech.wikimedia.org/wiki/Puppet/Runbooks#Puppet_Disabled - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet?var-cluster=relforge&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DPuppetDisabled
2026-02-24 21:05:51 <wikibugs> ('PS4) ''Pppery: Add Comments namespace for shnwikinews [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1226024 (https://phabricator.wikimedia.org/T414403) (owner: ''Shivaansh Singh)'
2026-02-24 21:06:37 <wikibugs> ('CR) ''CI reject: [V:''-1] Add Comments namespace for shnwikinews [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1226024 (https://phabricator.wikimedia.org/T414403) (owner: ''Shivaansh Singh)'
2026-02-24 21:07:25 <wikibugs> ('PS5) ''Pppery: Add Comments namespace for shnwikinews [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1226024 (https://phabricator.wikimedia.org/T414403) (owner: ''Shivaansh Singh)'
2026-02-24 21:09:50 <AaronSchulz> guess it's just me
2026-02-24 21:11:15 <wikibugs> ('PS27) ''CDobbins: prometheus: add pooled host check [puppet] - ''https://gerrit.wikimedia.org/r/1219634 (https://phabricator.wikimedia.org/T406641)'
2026-02-24 21:11:54 <wikibugs> ('CR) ''CI reject: [V:''-1] prometheus: add pooled host check [puppet] - ''https://gerrit.wikimedia.org/r/1219634 (https://phabricator.wikimedia.org/T406641) (owner: ''CDobbins)'
2026-02-24 21:14:26 <wikibugs> ('CR) ''TrainBranchBot: [C:''+2] "Approved by aaron@deploy2002 using scap backport" [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1224253 (https://phabricator.wikimedia.org/T418188) (owner: ''Aaron Schulz)'
2026-02-24 21:14:27 <wikibugs> ('CR) ''TrainBranchBot: [C:''+2] "Approved by aaron@deploy2002 using scap backport" [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1224228 (https://phabricator.wikimedia.org/T418188) (owner: ''Aaron Schulz)'
2026-02-24 21:15:52 <wikibugs> ('Merged) ''jenkins-bot: Switch math sandbox specs to plain wikimedia.org [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1224253 (https://phabricator.wikimedia.org/T418188) (owner: ''Aaron Schulz)'
2026-02-24 21:16:04 <wikibugs> ('Merged) ''jenkins-bot: Copy rest_v1-wikimedia.json to standard-docroot [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1224228 (https://phabricator.wikimedia.org/T418188) (owner: ''Aaron Schulz)'
2026-02-24 21:16:23 <logmsgbot> !log aaron@deploy2002 Started scap sync-world: Backport for [[gerrit:1224253|Switch math sandbox specs to plain wikimedia.org (T418188)]], [[gerrit:1224228|Copy rest_v1-wikimedia.json to standard-docroot (T418188)]]
2026-02-24 21:16:28 <stashbot> T418188: Simplify static Restbase json spec file configuration - https://phabricator.wikimedia.org/T418188
2026-02-24 21:17:48 <Squee-D> Hey folks. We've noticed the mirror of ubuntu looks about 22 days behind. Does that seem correct? I am basing this on "curl -sI "https://mirrors.wikimedia.org/ubuntu/dists/jammy-updates/Release" | grep -i last-modified"
2026-02-24 21:19:02 <wikibugs> ('PS28) ''CDobbins: prometheus: add pooled host check [puppet] - ''https://gerrit.wikimedia.org/r/1219634 (https://phabricator.wikimedia.org/T406641)'
2026-02-24 21:19:03 <logmsgbot> !log aaron@deploy2002 aaron: Backport for [[gerrit:1224253|Switch math sandbox specs to plain wikimedia.org (T418188)]], [[gerrit:1224228|Copy rest_v1-wikimedia.json to standard-docroot (T418188)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
2026-02-24 21:19:45 <logmsgbot> !log aaron@deploy2002 aaron: Continuing with sync
2026-02-24 21:22:26 <wikibugs> 'SRE, ''Fundraising-Backlog, ''Fundraising-Tech-Roadmap, ''MediaWiki-extensions-CentralNotice, ''Traffic: Set expiry time for GeoIP cookies - https://phabricator.wikimedia.org/T122097#11648602 (''AKanji-WMF)'
2026-02-24 21:23:43 <logmsgbot> !log aaron@deploy2002 Finished scap sync-world: Backport for [[gerrit:1224253|Switch math sandbox specs to plain wikimedia.org (T418188)]], [[gerrit:1224228|Copy rest_v1-wikimedia.json to standard-docroot (T418188)]] (duration: 07m 20s)
2026-02-24 21:23:47 <stashbot> T418188: Simplify static Restbase json spec file configuration - https://phabricator.wikimedia.org/T418188
2026-02-24 21:24:02 <AaronSchulz> done
2026-02-24 21:28:00 <wikibugs> ('PS1) ''JHathaway: dmarc: add dmarc records for domains which do not send email [dns] - ''https://gerrit.wikimedia.org/r/1243225'
2026-02-24 21:34:27 <jinxer-wm> FIRING: HelmReleaseBadStatus: Helm release kserve/kserve on k8s-mlstaging@codfw in state failed - https://wikitech.wikimedia.org/wiki/Kubernetes/Deployments#Rolling_back_in_an_emergency - https://grafana.wikimedia.org/d/UT4GtK3nz?var-site=codfw&var-cluster=k8s-mlstaging&var-namespace=kserve - https://alerts.wikimedia.org/?q=alertname%3DHelmReleaseBadStatus
2026-02-24 21:37:05 <wikibugs> ('CR) ''BCornwall: [C:''+1] dmarc: add dmarc records for domains which do not send email [dns] - ''https://gerrit.wikimedia.org/r/1243225 (owner: ''JHathaway)'
2026-02-24 21:40:18 <wikibugs> ('PS29) ''CDobbins: prometheus: add pooled host check [puppet] - ''https://gerrit.wikimedia.org/r/1219634 (https://phabricator.wikimedia.org/T406641)'
2026-02-24 21:41:19 <icinga-wm> RECOVERY - Ubuntu mirror in sync with upstream on mirror1001 is OK: /srv/mirrors/ubuntu is over 1 hours old. https://wikitech.wikimedia.org/wiki/Mirrors
2026-02-24 21:42:05 <icinga-wm> RECOVERY - Confd vcl based reload on cp2031 is OK: reload-vcl successfully ran 0h, 0 minutes ago. https://wikitech.wikimedia.org/wiki/Varnish
2026-02-24 21:42:24 <wikibugs> ('CR) ''CI reject: [V:''-1] prometheus: add pooled host check [puppet] - ''https://gerrit.wikimedia.org/r/1219634 (https://phabricator.wikimedia.org/T406641) (owner: ''CDobbins)'
2026-02-24 21:45:55 <wikibugs> ('PS30) ''CDobbins: prometheus: add pooled host check [puppet] - ''https://gerrit.wikimedia.org/r/1219634 (https://phabricator.wikimedia.org/T406641)'
2026-02-24 21:48:01 <wikibugs> ('CR) ''CI reject: [V:''-1] prometheus: add pooled host check [puppet] - ''https://gerrit.wikimedia.org/r/1219634 (https://phabricator.wikimedia.org/T406641) (owner: ''CDobbins)'
2026-02-24 21:53:24 <wikibugs> ('CR) ''RLazarus: [C:''+1] "Just to add a little excitement: I had a moment of uncertainty whether the `env` map in the container spec is applied to the lifecycle hoo" [deployment-charts] - ''https://gerrit.wikimedia.org/r/1242520 (https://phabricator.wikimedia.org/T364245) (owner: ''Scott French)'
2026-02-24 21:59:46 <wikibugs> ('PS31) ''CDobbins: prometheus: add pooled host check [puppet] - ''https://gerrit.wikimedia.org/r/1219634 (https://phabricator.wikimedia.org/T406641)'
2026-02-24 22:00:04 <jouncebot> Deploy window Web Team deployment window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260224T2200)
2026-02-24 22:05:16 <wikibugs> ('CR) ''CDobbins: [V:''+1] "PCC SUCCESS (NOOP 2 CORE_DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/" [puppet] - ''https://gerrit.wikimedia.org/r/1219634 (https://phabricator.wikimedia.org/T406641) (owner: ''CDobbins)'
2026-02-24 22:06:36 <wikibugs> ('CR) ''RLazarus: [C:''+1] envoy: Allow inboundonly drain and support min wait time (''1 comment) [docker-images/production-images] - ''https://gerrit.wikimedia.org/r/1242462 (https://phabricator.wikimedia.org/T364245) (owner: ''Scott French)'
2026-02-24 22:07:03 <wikibugs> ('CR) ''CDobbins: "Done" [puppet] - ''https://gerrit.wikimedia.org/r/1219634 (https://phabricator.wikimedia.org/T406641) (owner: ''CDobbins)'
2026-02-24 22:14:39 <wikibugs> ('CR) ''RLazarus: [C:''+1] "Helm diffs look good!" [deployment-charts] - ''https://gerrit.wikimedia.org/r/1242522 (https://phabricator.wikimedia.org/T364245) (owner: ''Scott French)'
2026-02-24 22:22:52 <wikibugs> ('CR) ''Cwhite: [C:''+1] "LGTM from my side! Thanks!" [puppet] - ''https://gerrit.wikimedia.org/r/1237215 (https://phabricator.wikimedia.org/T255568) (owner: ''Majavah)'
2026-02-24 22:24:40 <wikibugs> 'SRE, ''Observability-Metrics, ''Prod-Kubernetes, ''ServiceOps new, ''SRE Observability (FY2025/2026-Q3): write some recording rules for queries used in the appserver RED k8s dashboard - https://phabricator.wikimedia.org/T249663#11648811 (''colewhite)'
2026-02-24 22:28:04 <wikibugs> ('PS3) ''Hashar: Revert^2 "Gerrit: Disable auto reloading replication config" [puppet] - ''https://gerrit.wikimedia.org/r/1238043 (https://phabricator.wikimedia.org/T416929)'
2026-02-24 22:29:24 <wikibugs> ('CR) ''Hashar: "I have removed the link to T379714 which is "Upgrade to Gerrit 3.11" it is unrelated. Though MAYBE the replication plugin has a fix for t" [puppet] - ''https://gerrit.wikimedia.org/r/1238043 (https://phabricator.wikimedia.org/T416929) (owner: ''Hashar)'
2026-02-24 22:29:51 <hashar> jouncebot: nowandnext
2026-02-24 22:29:51 <jouncebot> For the next 0 hour(s) and 30 minute(s): Web Team deployment window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260224T2200)
2026-02-24 22:29:51 <jouncebot> In 8 hour(s) and 30 minute(s): MediaWiki infrastructure (UTC early) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260225T0700)
2026-02-24 22:30:01 <hashar> I need to restart Gerrit
2026-02-24 22:31:40 <wikibugs> ('CR) ''BCornwall: "Really close!" [puppet] - ''https://gerrit.wikimedia.org/r/1219634 (https://phabricator.wikimedia.org/T406641) (owner: ''CDobbins)'
2026-02-24 22:35:02 <hashar> !log Restarted Gerrit due to a replication config issue
2026-02-24 22:35:05 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2026-02-24 22:37:20 <brett> !log import ncmonitor 3.1.0~deb13u1 into trixie-wikimedia (T401832)
2026-02-24 22:37:21 <icinga-wm> PROBLEM - PyBal backends health check on lvs2013 is CRITICAL: PYBAL CRITICAL - CRITICAL - wdqs-main_443: Servers wdqs2022.codfw.wmnet are marked down but pooled https://wikitech.wikimedia.org/wiki/PyBal
2026-02-24 22:37:24 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2026-02-24 22:37:25 <stashbot> T401832: Upgrade Traffic hosts to trixie - https://phabricator.wikimedia.org/T401832
2026-02-24 22:37:25 <icinga-wm> PROBLEM - PyBal backends health check on lvs2014 is CRITICAL: PYBAL CRITICAL - CRITICAL - wdqs-main_443: Servers wdqs2022.codfw.wmnet are marked down but pooled https://wikitech.wikimedia.org/wiki/PyBal
2026-02-24 22:40:39 <logmsgbot> !log brett@cumin2002 START - Cookbook sre.hosts.reimage for host ncmonitor1001.eqiad.wmnet with OS trixie
2026-02-24 22:43:57 <wikibugs> ('CR) ''Cwhite: [C:''+1] mtail: Use the Debian version of mtail universally [puppet] - ''https://gerrit.wikimedia.org/r/1243048 (owner: ''Muehlenhoff)'
2026-02-24 22:45:23 <icinga-wm> PROBLEM - PyBal backends health check on lvs1020 is CRITICAL: PYBAL CRITICAL - CRITICAL - wdqs-main_443: Servers wdqs1020.eqiad.wmnet, wdqs1015.eqiad.wmnet, wdqs1013.eqiad.wmnet, wdqs1019.eqiad.wmnet, wdqs1012.eqiad.wmnet are marked down but pooled https://wikitech.wikimedia.org/wiki/PyBal
2026-02-24 22:47:23 <icinga-wm> PROBLEM - PyBal backends health check on lvs1019 is CRITICAL: PYBAL CRITICAL - CRITICAL - wdqs-main_443: Servers wdqs1017.eqiad.wmnet, wdqs1018.eqiad.wmnet, wdqs1015.eqiad.wmnet, wdqs1011.eqiad.wmnet, wdqs1021.eqiad.wmnet, wdqs1016.eqiad.wmnet, wdqs1012.eqiad.wmnet, wdqs1022.eqiad.wmnet are marked down but pooled https://wikitech.wikimedia.org/wiki/PyBal
2026-02-24 22:52:13 <logmsgbot> !log brett@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on ncmonitor1001.eqiad.wmnet with reason: host reimage
2026-02-24 22:58:23 <icinga-wm> RECOVERY - PyBal backends health check on lvs1019 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal
2026-02-24 22:58:23 <icinga-wm> RECOVERY - PyBal backends health check on lvs1020 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal
2026-02-24 22:58:32 <ryankemper> !log [WDQS] `ryankemper@cumin2002:~$ sudo -E cumin 'A:wdqs-main AND P{wdqs1*}' 'systemctl restart wdqs-blazegraph'`
2026-02-24 22:58:34 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2026-02-24 22:59:46 <logmsgbot> !log brett@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ncmonitor1001.eqiad.wmnet with reason: host reimage
2026-02-24 23:04:52 <ryankemper> !log [WDQS] `ryankemper@cumin2002:~$ sudo -E cumin 'A:wdqs-main AND P{wdqs2*} AND NOT P{wdqs2012*}' 'systemctl restart wdqs-blazegraph'` (2012 still seems healthy, rest are all not)
2026-02-24 23:04:52 <wikibugs> ('CR) ''Scott French: "Thanks, Moritz." [puppet] - ''https://gerrit.wikimedia.org/r/1243166 (owner: ''Muehlenhoff)'
2026-02-24 23:04:54 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2026-02-24 23:05:25 <icinga-wm> RECOVERY - PyBal backends health check on lvs2013 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal
2026-02-24 23:05:27 <icinga-wm> RECOVERY - PyBal backends health check on lvs2014 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal
2026-02-24 23:08:25 <wikibugs> ('PS7) ''Aaron Schulz: trafficserver: cleanup redundant lint-related rest gateway routing config [puppet] - ''https://gerrit.wikimedia.org/r/1210631'
2026-02-24 23:08:27 <icinga-wm> PROBLEM - PyBal backends health check on lvs2014 is CRITICAL: PYBAL CRITICAL - CRITICAL - wdqs-main_443: Servers wdqs2014.codfw.wmnet are marked down but pooled https://wikitech.wikimedia.org/wiki/PyBal
2026-02-24 23:09:08 <wikibugs> ('PS2) ''Aaron Schulz: Simplify spec-json-wikimedia route and use meta.wikimedia.org [deployment-charts] - ''https://gerrit.wikimedia.org/r/1242576 (https://phabricator.wikimedia.org/T418188)'
2026-02-24 23:11:27 <icinga-wm> PROBLEM - PyBal backends health check on lvs2013 is CRITICAL: PYBAL CRITICAL - CRITICAL - wdqs-main_443: Servers wdqs2021.codfw.wmnet, wdqs2007.codfw.wmnet, wdqs2008.codfw.wmnet are marked down but pooled https://wikitech.wikimedia.org/wiki/PyBal
2026-02-24 23:19:50 <wikibugs> ('PS1) ''Hashar: gerrit: update gerrit2002 after reimaging [puppet] - ''https://gerrit.wikimedia.org/r/1243257 (https://phabricator.wikimedia.org/T417247)'
2026-02-24 23:24:39 <wikibugs> ('PS2) ''Scott French: envoy: Allow inboundonly drain and support min wait time [docker-images/production-images] - ''https://gerrit.wikimedia.org/r/1242462 (https://phabricator.wikimedia.org/T364245)'
2026-02-24 23:26:22 <wikibugs> ('PS1) ''BCornwall: ncmonitor: Add ncmonitor sysuser [puppet] - ''https://gerrit.wikimedia.org/r/1243258'
2026-02-24 23:28:27 <icinga-wm> RECOVERY - PyBal backends health check on lvs2013 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal
2026-02-24 23:28:27 <icinga-wm> RECOVERY - PyBal backends health check on lvs2014 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal
2026-02-24 23:28:38 <wikibugs> ('PS2) ''BCornwall: ncmonitor: Add ncmonitor sysuser [puppet] - ''https://gerrit.wikimedia.org/r/1243258'
2026-02-24 23:29:39 <wikibugs> ('CR) ''Scott French: "100% agreed. On balance, I'm hopeful that we can get rid of this with the transition to sidecar containers, since this is actually somethi" [docker-images/production-images] - ''https://gerrit.wikimedia.org/r/1242462 (https://phabricator.wikimedia.org/T364245) (owner: ''Scott French)'
2026-02-24 23:29:52 <logmsgbot> !log brett@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ncmonitor1001.eqiad.wmnet with OS trixie
2026-02-24 23:31:25 <wikibugs> ('CR) ''BCornwall: [V:''+1] "PCC SUCCESS (CORE_DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/8141/co" [puppet] - ''https://gerrit.wikimedia.org/r/1243258 (owner: ''BCornwall)'
2026-02-24 23:33:02 <wikibugs> ('CR) ''Bking: [C:''+2] gerrit: update gerrit2002 after reimaging [puppet] - ''https://gerrit.wikimedia.org/r/1243257 (https://phabricator.wikimedia.org/T417247) (owner: ''Hashar)'
2026-02-24 23:35:01 <wikibugs> ('CR) ''Scott French: [V:''+2] "Built and verified against local envoy test setup (again)." [docker-images/production-images] - ''https://gerrit.wikimedia.org/r/1242462 (https://phabricator.wikimedia.org/T364245) (owner: ''Scott French)'
2026-02-24 23:35:23 <wikibugs> ('CR) ''RLazarus: [C:''+1] envoy: Allow inboundonly drain and support min wait time [docker-images/production-images] - ''https://gerrit.wikimedia.org/r/1242462 (https://phabricator.wikimedia.org/T364245) (owner: ''Scott French)'
2026-02-24 23:36:43 <wikibugs> ('CR) ''ArielGlenn: "I think (with a caveat, see the one comment) that this is ok. I am uneasy that we don't have an end to end test for this in staging, which" [deployment-charts] - ''https://gerrit.wikimedia.org/r/1240388 (https://phabricator.wikimedia.org/T417780) (owner: ''Daniel Kinzler)'
2026-02-24 23:38:20 <wikibugs> ('CR) ''Scott French: [V:''+2 C:''+2] envoy: Allow inboundonly drain and support min wait time [docker-images/production-images] - ''https://gerrit.wikimedia.org/r/1242462 (https://phabricator.wikimedia.org/T364245) (owner: ''Scott French)'
2026-02-24 23:41:32 <swfrench-wmf> !log built envoy images (1.35.7-3) - T364245
2026-02-24 23:41:36 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2026-02-24 23:41:36 <stashbot> T364245: Recentchanges and cu_changes tables are occasionally missing revisions on multiple wikis - https://phabricator.wikimedia.org/T364245
2026-02-24 23:42:17 <wikibugs> ('PS3) ''BCornwall: ncmonitor: Add ncmonitor sysuser [puppet] - ''https://gerrit.wikimedia.org/r/1243258'
2026-02-24 23:43:03 <wikibugs> ('CR) ''CI reject: [V:''-1] ncmonitor: Add ncmonitor sysuser [puppet] - ''https://gerrit.wikimedia.org/r/1243258 (owner: ''BCornwall)'
2026-02-24 23:45:33 <wikibugs> ('PS4) ''BCornwall: ncmonitor: Add ncmonitor sysuser [puppet] - ''https://gerrit.wikimedia.org/r/1243258'
2026-02-24 23:47:33 <wikibugs> ('CR) ''BCornwall: [V:''+1] "PCC SUCCESS (CORE_DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/8143/co" [puppet] - ''https://gerrit.wikimedia.org/r/1243258 (owner: ''BCornwall)'

This page is generated from SQL logs, you can also download static txt files from here