[00:00:25] <icinga-wm>	 PROBLEM - Host cirrussearch2089 is DOWN: PING CRITICAL - Packet loss = 100%
[00:01:45] <jinxer-wm>	 FIRING: WidespreadPuppetFailure: Puppet has failed in eqiad - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet?orgId=1&viewPanel=6 - https://alerts.wikimedia.org/?q=alertname%3DWidespreadPuppetFailure
[00:05:14] <logmsgbot>	 !log krinkle@deploy1003 Finished scap sync-world: Backport for [[gerrit:1169737|multiversion: Fix "Class Wikimedia\MWConfig\Exception not found"]] (duration: 21m 59s)
[00:08:30] <wikibugs>	 (PS1) TrainBranchBot: Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1170445
[00:08:30] <wikibugs>	 (CR) TrainBranchBot: [C:+2] Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1170445 (owner: TrainBranchBot)
[00:14:10] <wikibugs>	 (PS13) Krinkle: beta: redirect misc *.beta.wmflabs.org to *.beta.wmcloud.org [puppet] - https://gerrit.wikimedia.org/r/1170188 (https://phabricator.wikimedia.org/T289318)
[00:24:52] <wikibugs>	 (CR) CI reject: [V:-1] Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1170445 (owner: TrainBranchBot)
[00:39:10] <jinxer-wm>	 FIRING: BFDdown: BFD session down between cr1-eqiad and fe80::6687:88ff:fef2:6d48 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr1-eqiad:9804 - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[00:40:49] <wikibugs>	 (PS1) Kevin Bazira: ml-services: enable multiprocessing for kowiki-damaging [deployment-charts] - https://gerrit.wikimedia.org/r/1170447 (https://phabricator.wikimedia.org/T363336)
[00:44:10] <jinxer-wm>	 FIRING: [2x] BFDdown: BFD session down between cr1-codfw and fe80::5e5e:abff:fe3d:8198 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status  - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[00:46:39] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Approved by krinkle@deploy1003 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/1170208 (https://phabricator.wikimedia.org/T289318) (owner: Krinkle)
[00:47:33] <wikibugs>	 (Merged) jenkins-bot: beta: Remove routing for *.beta.wmflabs.org [mediawiki-config] - https://gerrit.wikimedia.org/r/1170208 (https://phabricator.wikimedia.org/T289318) (owner: Krinkle)
[00:47:53] <logmsgbot>	 !log krinkle@deploy1003 Started scap sync-world: Backport for [[gerrit:1170208|beta: Remove routing for *.beta.wmflabs.org (T289318)]]
[00:47:57] <stashbot>	 T289318: Move *.beta.wmflabs.org to *.beta.wmcloud.org - https://phabricator.wikimedia.org/T289318
[00:49:10] <jinxer-wm>	 RESOLVED: [2x] BFDdown: BFD session down between cr1-codfw and fe80::5e5e:abff:fe3d:8198 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status  - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[00:49:48] <logmsgbot>	 !log krinkle@deploy1003 krinkle: Backport for [[gerrit:1170208|beta: Remove routing for *.beta.wmflabs.org (T289318)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[00:55:40] <jinxer-wm>	 FIRING: [2x] BFDdown: BFD session down between cr1-eqiad and fe80::6687:88ff:fef2:6d48 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr1-eqiad:9804 - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[00:59:55] <jinxer-wm>	 RESOLVED: [3x] BFDdown: BFD session down between cr1-codfw and fe80::5e5e:abff:fe3d:8198 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status  - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[01:04:52] <logmsgbot>	 !log krinkle@deploy1003 krinkle: Continuing with sync
[01:05:55] <jinxer-wm>	 FIRING: [3x] BFDdown: BFD session down between cr1-codfw and fe80::5e5e:abff:fe3d:8198 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status  - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[01:07:40] <jinxer-wm>	 FIRING: [4x] BFDdown: BFD session down between cr1-codfw and fe80::5e5e:abff:fe3d:8198 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status  - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[01:10:06] <logmsgbot>	 !log krinkle@deploy1003 Finished scap sync-world: Backport for [[gerrit:1170208|beta: Remove routing for *.beta.wmflabs.org (T289318)]] (duration: 22m 13s)
[01:10:10] <stashbot>	 T289318: Move *.beta.wmflabs.org to *.beta.wmcloud.org - https://phabricator.wikimedia.org/T289318
[01:10:55] <jinxer-wm>	 FIRING: [4x] BFDdown: BFD session down between cr1-codfw and fe80::5e5e:abff:fe3d:8198 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status  - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[01:12:40] <jinxer-wm>	 RESOLVED: [4x] BFDdown: BFD session down between cr1-codfw and fe80::5e5e:abff:fe3d:8198 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status  - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[01:20:55] <jinxer-wm>	 FIRING: [4x] BFDdown: BFD session down between cr1-codfw and fe80::5e5e:abff:fe3d:8198 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status  - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[01:22:40] <jinxer-wm>	 RESOLVED: [4x] BFDdown: BFD session down between cr1-codfw and fe80::5e5e:abff:fe3d:8198 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status  - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[01:41:43] <wikibugs>	 (CR) Kevin Bazira: "thank you so much for the merge Luca." [alerts] - https://gerrit.wikimedia.org/r/1170107 (https://phabricator.wikimedia.org/T399683) (owner: Kevin Bazira)
[01:43:27] <wikibugs>	 (CR) BCornwall: [C:+1] "Thanks for cleaning up!" [puppet] - https://gerrit.wikimedia.org/r/1170096 (https://phabricator.wikimedia.org/T394072) (owner: Muehlenhoff)
[01:44:10] <jinxer-wm>	 FIRING: BFDdown: BFD session down between cr1-codfw and fe80::5e5e:abff:fe3d:8198 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr1-codfw:9804 - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[01:49:10] <jinxer-wm>	 RESOLVED: BFDdown: BFD session down between cr1-codfw and fe80::5e5e:abff:fe3d:8198 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr1-codfw:9804 - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[02:01:40] <jinxer-wm>	 FIRING: [2x] BFDdown: BFD session down between cr1-codfw and fe80::5e5e:abff:fe3d:8198 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr1-codfw:9804 - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[02:06:40] <jinxer-wm>	 RESOLVED: BFDdown: BFD session down between cr1-codfw and fe80::5e5e:abff:fe3d:8198 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr1-codfw:9804 - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[02:18:01] <wikibugs>	 ops-codfw, DC-Ops: Inbound errors on interface cr1-codfw:et-1/0/2 (Transport: cr1-eqiad:et-1/1/2 (Arelion, IC-374549) {#12267}) - https://phabricator.wikimedia.org/T399916 (phaultfinder) NEW
[02:18:40] <jinxer-wm>	 FIRING: [2x] BFDdown: BFD session down between cr1-codfw and fe80::5e5e:abff:fe3d:8198 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr1-codfw:9804 - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[02:23:40] <jinxer-wm>	 RESOLVED: BFDdown: BFD session down between cr1-codfw and fe80::5e5e:abff:fe3d:8198 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr1-codfw:9804 - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[02:33:57] <icinga-wm>	 PROBLEM - OSPF status on cr1-codfw is CRITICAL: OSPFv2: 4/5 UP : OSPFv3: 4/4 UP : 5 v2 P2P interfaces vs. 4 v3 P2P interfaces https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[02:35:55] <jinxer-wm>	 FIRING: [2x] BFDdown: BFD session down between cr1-codfw and fe80::5e5e:abff:fe3d:8198 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr1-codfw:9804 - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[02:35:57] <icinga-wm>	 RECOVERY - OSPF status on cr1-codfw is OK: OSPFv2: 5/5 UP : OSPFv3: 5/5 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[02:39:04] <jinxer-wm>	 FIRING: PuppetConstantChange: Puppet performing a change on every puppet run on wdqs1022:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange
[02:40:55] <jinxer-wm>	 FIRING: [3x] BFDdown: BFD session down between cr1-codfw and fe80::5e5e:abff:fe3d:8198 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status  - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[02:45:55] <jinxer-wm>	 RESOLVED: [3x] BFDdown: BFD session down between cr1-codfw and fe80::5e5e:abff:fe3d:8198 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status  - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[02:47:40] <jinxer-wm>	 FIRING: [4x] BFDdown: BFD session down between cr1-codfw and fe80::5e5e:abff:fe3d:8198 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status  - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[02:51:55] <jinxer-wm>	 FIRING: [5x] BFDdown: BFD session down between cr1-codfw and fe80::5e5e:abff:fe3d:8198 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status  - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[02:54:12] <jinxer-wm>	 FIRING: CertAlmostExpired: Certificate for service data-gateway-staging:30443 is about to expire - https://wikitech.wikimedia.org/wiki/TLS/Runbook#data-gateway-staging:30443 - TODO - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired
[02:54:25] <jinxer-wm>	 FIRING: CoreRouterInterfaceDown: Core router interface down - cr2-esams:xe-0/1/2 (inter.link reserved port) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr2-esams:9804 - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown
[02:56:55] <jinxer-wm>	 FIRING: [6x] BFDdown: BFD session down between cr1-codfw and fe80::5e5e:abff:fe3d:8198 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status  - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[02:57:40] <jinxer-wm>	 RESOLVED: [6x] BFDdown: BFD session down between cr1-codfw and fe80::5e5e:abff:fe3d:8198 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status  - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[03:08:57] <icinga-wm>	 PROBLEM - OSPF status on cr1-codfw is CRITICAL: OSPFv2: 5/5 UP : OSPFv3: 4/5 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[03:09:25] <jinxer-wm>	 FIRING: PuppetCertificateAboutToExpire: Puppet CA certificate thanos-query.discovery.wmnet is about to expire - https://wikitech.wikimedia.org/wiki/Puppet#Renew_agent_certificate - TODO - https://alerts.wikimedia.org/?q=alertname%3DPuppetCertificateAboutToExpire
[03:09:57] <icinga-wm>	 RECOVERY - OSPF status on cr1-codfw is OK: OSPFv2: 5/5 UP : OSPFv3: 5/5 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[03:10:39] <jinxer-wm>	 FIRING: [2x] TransitBGPDown: Transit BGP session down between cr2-drmrs and Orange (193.251.154.145) - https://wikitech.wikimedia.org/wiki/Network_monitoring#BGP_status  - https://alerts.wikimedia.org/?q=alertname%3DTransitBGPDown
[03:34:08] <jinxer-wm>	 FIRING: [2x] CoreRouterInterfaceDown: Core router interface down - cr2-drmrs:xe-0/1/2 (Transit: Orange (LD019029) {#D0072}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down  - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown
[03:50:10] <jinxer-wm>	 FIRING: BFDdown: BFD session down between cr1-eqiad and fe80::6687:88ff:fef2:6d48 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr1-eqiad:9804 - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[03:54:08] <jinxer-wm>	 FIRING: [2x] CoreRouterInterfaceDown: Core router interface down - cr2-drmrs:xe-0/1/2 (Transit: Orange (LD019029) {#D0072}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down  - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown
[03:55:10] <jinxer-wm>	 FIRING: [2x] BFDdown: BFD session down between cr1-eqiad and fe80::6687:88ff:fef2:6d48 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr1-eqiad:9804 - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[04:00:10] <jinxer-wm>	 RESOLVED: [2x] BFDdown: BFD session down between cr1-eqiad and fe80::6687:88ff:fef2:6d48 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr1-eqiad:9804 - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[04:01:45] <jinxer-wm>	 FIRING: WidespreadPuppetFailure: Puppet has failed in eqiad - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet?orgId=1&viewPanel=6 - https://alerts.wikimedia.org/?q=alertname%3DWidespreadPuppetFailure
[04:10:40] <jinxer-wm>	 FIRING: [3x] BFDdown: BFD session down between cr1-eqiad and fe80::6687:88ff:fef2:6d48 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr1-eqiad:9804 - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[04:15:40] <jinxer-wm>	 RESOLVED: BFDdown: BFD session down between cr1-eqiad and fe80::6687:88ff:fef2:6d48 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr1-eqiad:9804 - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[04:19:08] <jinxer-wm>	 FIRING: [2x] CoreRouterInterfaceDown: Core router interface down - cr2-drmrs:xe-0/1/2 (Transit: Orange (LD019029) {#D0072}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down  - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown
[04:26:40] <jinxer-wm>	 FIRING: [3x] BFDdown: BFD session down between cr1-codfw and fe80::5e5e:abff:fe3d:8198 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status  - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[04:31:40] <jinxer-wm>	 RESOLVED: [2x] BFDdown: BFD session down between cr1-codfw and fe80::5e5e:abff:fe3d:8198 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr1-codfw:9804 - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[04:34:08] <jinxer-wm>	 FIRING: [2x] CoreRouterInterfaceDown: Core router interface down - cr2-drmrs:xe-0/1/2 (Transit: Orange (LD019029) {#D0072}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down  - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown
[04:40:39] <jinxer-wm>	 RESOLVED: [2x] TransitBGPDown: Transit BGP session down between cr2-drmrs and Orange (193.251.154.145) - https://wikitech.wikimedia.org/wiki/Network_monitoring#BGP_status  - https://alerts.wikimedia.org/?q=alertname%3DTransitBGPDown
[04:59:10] <jinxer-wm>	 FIRING: BFDdown: BFD session down between cr1-codfw and fe80::5e5e:abff:fe3d:8198 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr1-codfw:9804 - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[05:04:10] <jinxer-wm>	 RESOLVED: BFDdown: BFD session down between cr1-codfw and fe80::5e5e:abff:fe3d:8198 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr1-codfw:9804 - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[05:05:40] <jinxer-wm>	 FIRING: [2x] BFDdown: BFD session down between cr1-codfw and fe80::5e5e:abff:fe3d:8198 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr1-codfw:9804 - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[05:06:42] <jinxer-wm>	 FIRING: JobUnavailable: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[05:10:40] <jinxer-wm>	 RESOLVED: [3x] BFDdown: BFD session down between cr1-codfw and fe80::5e5e:abff:fe3d:8198 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status  - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[05:21:42] <jinxer-wm>	 RESOLVED: JobUnavailable: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[05:31:55] <jinxer-wm>	 FIRING: [3x] BFDdown: BFD session down between cr1-codfw and fe80::5e5e:abff:fe3d:8198 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status  - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[05:34:53] <wikibugs>	 (CR) Stang: zhwiki: Allow local securepoll setup (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/1100228 (https://phabricator.wikimedia.org/T380020) (owner: Stang)
[05:35:05] <wikibugs>	 (PS9) Stang: zhwiki: Allow local securepoll setup [mediawiki-config] - https://gerrit.wikimedia.org/r/1100228 (https://phabricator.wikimedia.org/T380020)
[05:36:40] <jinxer-wm>	 RESOLVED: [4x] BFDdown: BFD session down between cr1-codfw and fe80::5e5e:abff:fe3d:8198 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status  - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[05:46:51] <icinga-wm>	 PROBLEM - OSPF status on cr1-eqiad is CRITICAL: OSPFv2: 6/6 UP : OSPFv3: 5/5 UP : 6 v2 P2P interfaces vs. 5 v3 P2P interfaces https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[05:48:49] <icinga-wm>	 RECOVERY - OSPF status on cr1-eqiad is OK: OSPFv2: 6/6 UP : OSPFv3: 6/6 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[05:51:37] <wikibugs>	 (PS1) PipelineBot: mobileapps: pipeline bot promote [deployment-charts] - https://gerrit.wikimedia.org/r/1170455
[05:51:40] <jinxer-wm>	 FIRING: [7x] BFDdown: BFD session down between cr1-codfw and fe80::5e5e:abff:fe3d:8198 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status  - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[05:52:55] <jinxer-wm>	 FIRING: [7x] BFDdown: BFD session down between cr1-codfw and fe80::5e5e:abff:fe3d:8198 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status  - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[05:53:42] <wikibugs>	 (PS2) Stevemunene: dns: Add dse-k8s codfw SRV records [dns] - https://gerrit.wikimedia.org/r/1170364 (https://phabricator.wikimedia.org/T397293)
[05:54:37] <wikibugs>	 (CR) Stevemunene: dns: Add dse-k8s codfw SRV records (2 comments) [dns] - https://gerrit.wikimedia.org/r/1170364 (https://phabricator.wikimedia.org/T397293) (owner: Stevemunene)
[05:55:36] <wikibugs>	 (PS3) Ryan Kemper: Replace elasticsearch api with python requests [software/spicerack] - https://gerrit.wikimedia.org/r/1167299 (https://phabricator.wikimedia.org/T390860)
[05:55:58] <wikibugs>	 (CR) Ryan Kemper: Replace elasticsearch api with python requests (2 comments) [software/spicerack] - https://gerrit.wikimedia.org/r/1167299 (https://phabricator.wikimedia.org/T390860) (owner: Ryan Kemper)
[05:56:40] <jinxer-wm>	 RESOLVED: [5x] BFDdown: BFD session down between cr1-codfw and fe80::5e5e:abff:fe3d:8198 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status  - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[05:59:19] <wikibugs>	 (PS1) PipelineBot: wikifeeds: pipeline bot promote [deployment-charts] - https://gerrit.wikimedia.org/r/1170457
[06:00:07] <jouncebot>	 Deploy window MediaWiki infrastructure (UTC early) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250718T0600)
[06:02:40] <jinxer-wm>	 FIRING: [6x] BFDdown: BFD session down between cr1-codfw and fe80::5e5e:abff:fe3d:8198 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status  - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[06:05:03] <wikibugs>	 (CR) CI reject: [V:-1] Replace elasticsearch api with python requests [software/spicerack] - https://gerrit.wikimedia.org/r/1167299 (https://phabricator.wikimedia.org/T390860) (owner: Ryan Kemper)
[06:07:40] <jinxer-wm>	 FIRING: [6x] BFDdown: BFD session down between cr1-codfw and fe80::5e5e:abff:fe3d:8198 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status  - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[06:09:36] <logmsgbot>	 !log marostegui@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on db1229.eqiad.wmnet with reason: Maintenance
[06:12:40] <jinxer-wm>	 FIRING: [3x] BFDdown: BFD session down between cr1-codfw and fe80::5e5e:abff:fe3d:8198 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status  - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[06:12:55] <jinxer-wm>	 RESOLVED: [3x] BFDdown: BFD session down between cr1-codfw and fe80::5e5e:abff:fe3d:8198 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status  - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[06:17:40] <jinxer-wm>	 FIRING: [5x] BFDdown: BFD session down between cr1-codfw and fe80::5e5e:abff:fe3d:8198 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status  - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[06:22:05] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.mysql.parsercache
[06:22:24] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.mysql.parsercache (exit_code=0)
[06:22:40] <jinxer-wm>	 RESOLVED: [4x] BFDdown: BFD session down between cr1-codfw and fe80::5e5e:abff:fe3d:8198 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status  - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[06:28:07] <logmsgbot>	 !log marostegui@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1229.eqiad.wmnet with reason: Maintenance
[06:32:40] <jinxer-wm>	 FIRING: [5x] BFDdown: BFD session down between cr1-codfw and fe80::5e5e:abff:fe3d:8198 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status  - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[06:37:40] <jinxer-wm>	 RESOLVED: [4x] BFDdown: BFD session down between cr1-codfw and fe80::5e5e:abff:fe3d:8198 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status  - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[06:39:04] <jinxer-wm>	 FIRING: PuppetConstantChange: Puppet performing a change on every puppet run on wdqs1022:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange
[06:41:35] <wikibugs>	 (CR) Elukey: [C:+2] "Kevin the changes should be propagated via puppet 30/40 mins after the merge, so in our case we should be good." [alerts] - https://gerrit.wikimedia.org/r/1170107 (https://phabricator.wikimedia.org/T399683) (owner: Kevin Bazira)
[06:47:03] <wikibugs>	 (CR) Elukey: [C:+1] "LGTM but please validate with your team as well :)" [deployment-charts] - https://gerrit.wikimedia.org/r/1170447 (https://phabricator.wikimedia.org/T363336) (owner: Kevin Bazira)
[06:48:38] <logmsgbot>	 !log elukey@cumin1003 START - Cookbook sre.hosts.provision for host sretest2009.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
[06:48:40] <jinxer-wm>	 FIRING: [4x] BFDdown: BFD session down between cr1-codfw and fe80::5e5e:abff:fe3d:8198 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status  - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[06:48:55] <jinxer-wm>	 RESOLVED: [4x] BFDdown: BFD session down between cr1-codfw and fe80::5e5e:abff:fe3d:8198 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status  - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[06:51:22] <wikibugs>	 (CR) Elukey: [C:+2] admin_ng: bump memory quota for kartotherian on Wikikube [deployment-charts] - https://gerrit.wikimedia.org/r/1168840 (owner: Elukey)
[06:51:37] <wikibugs>	 (PS2) Elukey: services: move kartotherian codfw to the maps-test postgres cluster [deployment-charts] - https://gerrit.wikimedia.org/r/1165551 (https://phabricator.wikimedia.org/T381565)
[06:52:45] <logmsgbot>	 elukey@cumin1003 provision (PID 2204614) is awaiting input
[06:53:10] <icinga-wm>	 RECOVERY - MariaDB Replica Lag: s2 #page on db1229 is OK: OK slave_sql_lag Replication lag: 0.02 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[06:53:40] <jinxer-wm>	 FIRING: [5x] BFDdown: BFD session down between cr1-codfw and fe80::5e5e:abff:fe3d:8198 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status  - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[06:54:12] <jinxer-wm>	 FIRING: CertAlmostExpired: Certificate for service data-gateway-staging:30443 is about to expire - https://wikitech.wikimedia.org/wiki/TLS/Runbook#data-gateway-staging:30443 - TODO - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired
[06:57:51] <logmsgbot>	 elukey@cumin1003 provision (PID 2204614) is awaiting input
[06:58:40] <jinxer-wm>	 RESOLVED: [4x] BFDdown: BFD session down between cr1-codfw and fe80::5e5e:abff:fe3d:8198 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status  - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[06:58:55] <jinxer-wm>	 FIRING: [5x] BFDdown: BFD session down between cr1-codfw and fe80::5e5e:abff:fe3d:8198 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status  - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[07:00:05] <jouncebot>	 Deploy window No deploys all day! See Deployments/Emergencies if things are broken. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250718T0700)
[07:01:43] <wikibugs>	 ops-codfw, DBA, DC-Ops: Transition codfw data persistence external storage (es) hosts to 10G - https://phabricator.wikimedia.org/T399927 (Marostegui) NEW
[07:02:30] <wikibugs>	 ops-codfw, DBA, DC-Ops: Transition codfw data persistence external storage (es) hosts to 10G - https://phabricator.wikimedia.org/T399927#11015887 (Marostegui) p:Triage→Medium
[07:02:34] <wikibugs>	 (CR) Elukey: [C:+2] services: move kartotherian codfw to the maps-test postgres cluster [deployment-charts] - https://gerrit.wikimedia.org/r/1165551 (https://phabricator.wikimedia.org/T381565) (owner: Elukey)
[07:03:40] <jinxer-wm>	 RESOLVED: [5x] BFDdown: BFD session down between cr1-codfw and fe80::5e5e:abff:fe3d:8198 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status  - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[07:04:05] <logmsgbot>	 !log jelto@cumin1003 START - Cookbook sre.hosts.reboot-single for host gitlab1004.wikimedia.org
[07:06:10] <logmsgbot>	 !log elukey@deploy1003 helmfile [codfw] START helmfile.d/services/kartotherian: sync
[07:09:25] <jinxer-wm>	 FIRING: PuppetCertificateAboutToExpire: Puppet CA certificate thanos-query.discovery.wmnet is about to expire - https://wikitech.wikimedia.org/wiki/Puppet#Renew_agent_certificate - TODO - https://alerts.wikimedia.org/?q=alertname%3DPuppetCertificateAboutToExpire
[07:10:01] <logmsgbot>	 !log jelto@cumin1003 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab1004.wikimedia.org
[07:10:15] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db1229 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P79353 and previous config saved to /var/cache/conftool/dbconfig/20250718-071014-root.json
[07:10:42] <jinxer-wm>	 FIRING: [3x] JobUnavailable: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[07:11:05] <logmsgbot>	 !log marostegui@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1233.eqiad.wmnet with reason: Maintenance
[07:11:13] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depooling db1233 (T399249)', diff saved to https://phabricator.wikimedia.org/P79354 and previous config saved to /var/cache/conftool/dbconfig/20250718-071112-marostegui.json
[07:11:17] <stashbot>	 T399249: Add cl_timestamp_id index to categorylinks table - https://phabricator.wikimedia.org/T399249
[07:13:55] <jinxer-wm>	 FIRING: [2x] BFDdown: BFD session down between cr1-eqiad and fe80::6687:88ff:fef2:6d48 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr1-eqiad:9804 - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[07:14:40] <jinxer-wm>	 FIRING: [3x] BFDdown: BFD session down between cr1-eqiad and fe80::6687:88ff:fef2:6d48 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr1-eqiad:9804 - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[07:15:42] <jinxer-wm>	 RESOLVED: [3x] JobUnavailable: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[07:16:21] <logmsgbot>	 !log elukey@deploy1003 helmfile [codfw] DONE helmfile.d/services/kartotherian: sync
[07:18:55] <jinxer-wm>	 RESOLVED: [2x] BFDdown: BFD session down between cr1-eqiad and fe80::6687:88ff:fef2:6d48 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr1-eqiad:9804 - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[07:23:21] <wikibugs>	 (PS1) Brouberol: dumpwikibasejson: ensure the dump script exists after any error [dumps] - https://gerrit.wikimedia.org/r/1170459 (https://phabricator.wikimedia.org/T399077)
[07:25:13] <logmsgbot>	 !log elukey@cumin1003 END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest2009.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
[07:25:21] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db1229 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P79355 and previous config saved to /var/cache/conftool/dbconfig/20250718-072520-root.json
[07:25:56] <wikibugs>	 (03PS2) 10Brouberol: dumpwikibase: ensure the dump script exists after any error [dumps] - 10https://gerrit.wikimedia.org/r/1170459 (https://phabricator.wikimedia.org/T399077)
[07:27:55] <wikibugs>	 (03PS1) 10Elukey: DNM - test for ML hosts [cookbooks] - 10https://gerrit.wikimedia.org/r/1170462
[07:28:09] <wikibugs>	 (03PS3) 10Brouberol: dumpwikibase: ensure the dump script exists after any error [dumps] - 10https://gerrit.wikimedia.org/r/1170459 (https://phabricator.wikimedia.org/T399077)
[07:29:20] <logmsgbot>	 !log elukey@cumin1003 START - Cookbook sre.hosts.reimage for host ml-serve1012.eqiad.wmnet with OS bookworm
[07:29:34] <logmsgbot>	 !log elukey@cumin1003 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ml-serve1012.eqiad.wmnet with OS bookworm
[07:30:30] <logmsgbot>	 !log elukey@deploy1003 helmfile [codfw] START helmfile.d/admin 'sync'.
[07:31:40] <logmsgbot>	 !log elukey@deploy1003 helmfile [codfw] DONE helmfile.d/admin 'sync'.
[07:33:13] <wikibugs>	 (03PS2) 10Elukey: DNM - test for ML hosts [cookbooks] - 10https://gerrit.wikimedia.org/r/1170462
[07:33:41] <logmsgbot>	 !log elukey@cumin1003 START - Cookbook sre.hosts.reimage for host ml-serve1012.eqiad.wmnet with OS bookworm
[07:34:08] <wikibugs>	 (03CR) 10Btullis: [C:03+1] dumpwikibase: ensure the dump script exists after any error [dumps] - 10https://gerrit.wikimedia.org/r/1170459 (https://phabricator.wikimedia.org/T399077) (owner: 10Brouberol)
[07:34:17] <logmsgbot>	 !log elukey@deploy1003 helmfile [eqiad] START helmfile.d/admin 'sync'.
[07:34:31] <logmsgbot>	 !log elukey@deploy1003 helmfile [eqiad] DONE helmfile.d/admin 'sync'.
[07:34:58] <logmsgbot>	 !log elukey@deploy1003 helmfile [codfw] START helmfile.d/services/kartotherian: sync
[07:35:23] <wikibugs>	 10ops-codfw, 06SRE, 06DBA, 06DC-Ops: Transition codfw data persistence external storage (es) hosts to 10G - https://phabricator.wikimedia.org/T399927#11015926 (10ayounsi) According to https://netbox.wikimedia.org/dcim/devices/?q=es20 es2020 to es2025 are now offline. es2026 to es2034 are almost 5 years old...
[07:37:30] <wikibugs>	 10ops-codfw, 06SRE, 06DBA, 06DC-Ops: Transition codfw data persistence external storage (es) hosts to 10G - https://phabricator.wikimedia.org/T399927#11015929 (10Marostegui) >>! In T399927#11015926, @ayounsi wrote: > According to https://netbox.wikimedia.org/dcim/devices/?q=es20 > es2020 to es2025 are now...
[07:37:41] <wikibugs>	 10ops-codfw, 06SRE, 06DBA, 06DC-Ops: Transition codfw data persistence external storage (es) hosts to 10G - https://phabricator.wikimedia.org/T399927#11015930 (10Marostegui)
[07:39:31] <wikibugs>	 (03CR) 10CI reject: [V:04-1] DNM - test for ML hosts [cookbooks] - 10https://gerrit.wikimedia.org/r/1170462 (owner: 10Elukey)
[07:39:55] <wikibugs>	 (03CR) 10Stevemunene: [C:03+1] "lgtm" [dumps] - 10https://gerrit.wikimedia.org/r/1170459 (https://phabricator.wikimedia.org/T399077) (owner: 10Brouberol)
[07:40:10] <jinxer-wm>	 FIRING: BFDdown: BFD session down between cr1-codfw and fe80::5e5e:abff:fe3d:8198 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr1-codfw:9804 - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[07:40:26] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db1229 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P79356 and previous config saved to /var/cache/conftool/dbconfig/20250718-074026-root.json
[07:45:02] <logmsgbot>	 !log elukey@cumin1003 END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ml-serve1012.eqiad.wmnet with OS bookworm
[07:45:08] <logmsgbot>	 !log elukey@deploy1003 helmfile [codfw] DONE helmfile.d/services/kartotherian: sync
[07:45:10] <jinxer-wm>	 FIRING: [2x] BFDdown: BFD session down between cr1-codfw and fe80::5e5e:abff:fe3d:8198 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status  - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[07:46:43] <wikibugs>	 (03PS1) 10Elukey: DNM - test for ML hosts t Change-Id: I8ff264ae5b395b0147d60015599859769ccfb9bd [cookbooks] - 10https://gerrit.wikimedia.org/r/1170463
[07:48:14] <wikibugs>	 (03PS2) 10Elukey: DNM - test for ML hosts [cookbooks] - 10https://gerrit.wikimedia.org/r/1170463
[07:49:20] <logmsgbot>	 !log elukey@cumin1003 START - Cookbook sre.hosts.reimage for host ml-serve1012.eqiad.wmnet with OS bookworm
[07:49:35] <logmsgbot>	 !log elukey@cumin1003 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ml-serve1012.eqiad.wmnet with OS bookworm
[07:50:10] <jinxer-wm>	 FIRING: [3x] BFDdown: BFD session down between cr1-codfw and fe80::5e5e:abff:fe3d:8198 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status  - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[07:50:36] <wikibugs>	 (03PS3) 10Elukey: DNM - test for ML hosts [cookbooks] - 10https://gerrit.wikimedia.org/r/1170463
[07:51:03] <logmsgbot>	 !log elukey@cumin1003 START - Cookbook sre.hosts.reimage for host ml-serve1012.eqiad.wmnet with OS bookworm
[07:51:53] <icinga-wm>	 PROBLEM - OSPF status on cr1-eqiad is CRITICAL: OSPFv2: 6/6 UP : OSPFv3: 5/5 UP : 6 v2 P2P interfaces vs. 5 v3 P2P interfaces https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[07:52:38] <wikibugs>	 (03PS3) 10Arthur taylor: Enable wbui2025 mobile user interface on Wikidata Beta [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1170304 (https://phabricator.wikimedia.org/T399703)
[07:52:49] <icinga-wm>	 RECOVERY - OSPF status on cr1-eqiad is OK: OSPFv2: 6/6 UP : OSPFv3: 6/6 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[07:55:10] <jinxer-wm>	 FIRING: [5x] BFDdown: BFD session down between cr1-codfw and fe80::5e5e:abff:fe3d:8198 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status  - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[07:55:32] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db1229 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P79357 and previous config saved to /var/cache/conftool/dbconfig/20250718-075532-root.json
[07:56:30] <logmsgbot>	 !log elukey@cumin1003 END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ml-serve1012.eqiad.wmnet with OS bookworm
[07:58:03] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops, 06Machine-Learning-Team: Q4:rack/setup/install ml-serve101[2345] - https://phabricator.wikimedia.org/T393948#11015965 (10elukey) I have realized that the above DHCP response during UEFI wasn't correct (`/srv/tftpboot/bookworm-installer/pxelinux.0`), and I got why - in the Spic...
[07:58:36] <wikibugs>	 (03CR) 10CI reject: [V:04-1] DNM - test for ML hosts [cookbooks] - 10https://gerrit.wikimedia.org/r/1170463 (owner: 10Elukey)
[07:59:44] <wikibugs>	 (03CR) 10Brouberol: [C:03+2] dumpwikibase: ensure the dump script exists after any error [dumps] - 10https://gerrit.wikimedia.org/r/1170459 (https://phabricator.wikimedia.org/T399077) (owner: 10Brouberol)
[08:00:10] <jinxer-wm>	 RESOLVED: [5x] BFDdown: BFD session down between cr1-codfw and fe80::5e5e:abff:fe3d:8198 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status  - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[08:02:00] <jinxer-wm>	 FIRING: WidespreadPuppetFailure: Puppet has failed in eqiad - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet?orgId=1&viewPanel=6 - https://alerts.wikimedia.org/?q=alertname%3DWidespreadPuppetFailure
[08:07:46] <wikibugs>	 (03PS1) 10Vgutierrez: acme_chief: Delete empty directories after pruning expired certs [puppet] - 10https://gerrit.wikimedia.org/r/1170497 (https://phabricator.wikimedia.org/T399419)
[08:09:16] <wikibugs>	 (03PS2) 10Tiziano Fogli: prom/metamonitor: hide DeadManSwitch alerts in Karma [puppet] - 10https://gerrit.wikimedia.org/r/1170360 (https://phabricator.wikimedia.org/T397003)
[08:11:15] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1233 (T399249)', diff saved to https://phabricator.wikimedia.org/P79358 and previous config saved to /var/cache/conftool/dbconfig/20250718-081114-marostegui.json
[08:11:19] <stashbot>	 T399249: Add cl_timestamp_id index to categorylinks table - https://phabricator.wikimedia.org/T399249
[08:16:03] <wikibugs>	 (03CR) 10Tiziano Fogli: "Patch ready for review." [puppet] - 10https://gerrit.wikimedia.org/r/1170360 (https://phabricator.wikimedia.org/T397003) (owner: 10Tiziano Fogli)
[08:17:09] <wikibugs>	 06SRE-OnFire, 10Cloud-VPS, 10cloud-services-team (FY2025/26-Q1), 10Sustainability (Incident Followup): Cloud Ceph misbehaving on Debian Bookworm - https://phabricator.wikimedia.org/T399858#11016022 (10fnegri) 05Open→03In progress a:03fnegri Memory usage on cloudcephosd1006 did reset at 18:00 UTC yest...
[08:20:40] <jinxer-wm>	 FIRING: [2x] BFDdown: BFD session down between cr1-codfw and fe80::5e5e:abff:fe3d:8198 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr1-codfw:9804 - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[08:20:55] <jinxer-wm>	 RESOLVED: BFDdown: BFD session down between cr1-codfw and fe80::5e5e:abff:fe3d:8198 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr1-codfw:9804 - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[08:23:30] <wikibugs>	 (03CR) 10Stevemunene: [C:03+1] "lgtm!" [puppet] - 10https://gerrit.wikimedia.org/r/1170279 (https://phabricator.wikimedia.org/T399778) (owner: 10Brouberol)
[08:24:53] <wikibugs>	 (03CR) 10Brouberol: [C:03+2] site: assign the insetup::data_platform_ferm role to dse-k8s-worker1014 [puppet] - 10https://gerrit.wikimedia.org/r/1170279 (https://phabricator.wikimedia.org/T399778) (owner: 10Brouberol)
[08:26:22] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P79359 and previous config saved to /var/cache/conftool/dbconfig/20250718-082621-marostegui.json
[08:32:35] <logmsgbot>	 !log elukey@cumin1003 START - Cookbook sre.hosts.provision for host sretest2009.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
[08:34:25] <jinxer-wm>	 FIRING: CoreRouterInterfaceDown: Core router interface down - cr2-esams:xe-0/1/2 (inter.link reserved port) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr2-esams:9804 - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown
[08:34:54] <wikibugs>	 (03PS1) 10Gerrit maintenance bot: mariadb: Promote db2205 to s3 master [puppet] - 10https://gerrit.wikimedia.org/r/1170499 (https://phabricator.wikimedia.org/T399930)
[08:35:14] <wikibugs>	 14SRE-Sprint-Week-Sustainability-March2023, 06Data-Persistence-Automations, 06DBA, 13Patch-For-Review, 10Sustainability (Incident Followup): Implement (or refactor) a script to move slaves when the master is not available - https://phabricator.wikimedia.org/T196366#11016063 (10FCeratto-WMF)
[08:35:35] <logmsgbot>	 !log elukey@deploy1003 helmfile [codfw] START helmfile.d/services/kartotherian: sync
[08:36:43] <wikibugs>	 (03CR) 10Kevin Bazira: "Great. Thank you for the clarification." [alerts] - 10https://gerrit.wikimedia.org/r/1170107 (https://phabricator.wikimedia.org/T399683) (owner: 10Kevin Bazira)
[08:37:08] <logmsgbot>	 elukey@cumin1003 provision (PID 2216138) is awaiting input
[08:37:13] <wikibugs>	 (03PS1) 10Marostegui: db1189: Migrate to MariaDB 10.11 [puppet] - 10https://gerrit.wikimedia.org/r/1170500 (https://phabricator.wikimedia.org/T399548)
[08:37:42] <wikibugs>	 (03CR) 10Marostegui: [C:03+2] db1189: Migrate to MariaDB 10.11 [puppet] - 10https://gerrit.wikimedia.org/r/1170500 (https://phabricator.wikimedia.org/T399548) (owner: 10Marostegui)
[08:38:01] <wikibugs>	 (03Abandoned) 10Kevin Bazira: team-ml: use global deploy tag for ORESFetchScoreJobKafkaLag alert [alerts] - 10https://gerrit.wikimedia.org/r/1170109 (https://phabricator.wikimedia.org/T399683) (owner: 10Kevin Bazira)
[08:38:28] <logmsgbot>	 !log marostegui@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1189.eqiad.wmnet with reason: Maintenance
[08:38:32] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depool db1189 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P79360 and previous config saved to /var/cache/conftool/dbconfig/20250718-083831-marostegui.json
[08:41:07] <logmsgbot>	 !log elukey@cumin1003 END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest2009.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
[08:41:10] <jinxer-wm>	 FIRING: BFDdown: BFD session down between cr1-codfw and fe80::5e5e:abff:fe3d:8198 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr1-codfw:9804 - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[08:41:30] <wikibugs>	 (03PS1) 10Marostegui: installserver: Do not format es1047 [puppet] - 10https://gerrit.wikimedia.org/r/1170502
[08:41:30] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P79361 and previous config saved to /var/cache/conftool/dbconfig/20250718-084129-marostegui.json
[08:42:55] <jinxer-wm>	 FIRING: [2x] BFDdown: BFD session down between cr1-codfw and fe80::5e5e:abff:fe3d:8198 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr1-codfw:9804 - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[08:43:09] <wikibugs>	 (03CR) 10Marostegui: "recheck" [puppet] - 10https://gerrit.wikimedia.org/r/1170502 (owner: 10Marostegui)
[08:43:51] <wikibugs>	 (03PS9) 10Elukey: WIP: sre.hosts.provision: add custom settings for Supermicro [cookbooks] - 10https://gerrit.wikimedia.org/r/1170085 (https://phabricator.wikimedia.org/T394357)
[08:43:51] <wikibugs>	 (03PS4) 10Elukey: DNM - test for ML hosts [cookbooks] - 10https://gerrit.wikimedia.org/r/1170463
[08:44:34] <wikibugs>	 (03CR) 10Marostegui: [C:03+2] installserver: Do not format es1047 [puppet] - 10https://gerrit.wikimedia.org/r/1170502 (owner: 10Marostegui)
[08:44:44] <logmsgbot>	 !log elukey@cumin1003 START - Cookbook sre.hosts.provision for host sretest2009.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
[08:45:45] <logmsgbot>	 !log elukey@deploy1003 helmfile [codfw] DONE helmfile.d/services/kartotherian: sync
[08:46:10] <jinxer-wm>	 RESOLVED: BFDdown: BFD session down between cr1-codfw and fe80::5e5e:abff:fe3d:8198 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr1-codfw:9804 - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[08:46:25] <wikibugs>	 (03PS1) 10Marostegui: es1048: Enable notifications [puppet] - 10https://gerrit.wikimedia.org/r/1170503 (https://phabricator.wikimedia.org/T395771)
[08:46:30] <elukey>	 !log elukey@kafkamon2003:~$ sudo systemctl restart burrow-main-codfw.service
[08:46:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:46:38] <wikibugs>	 (03PS4) 10Ayounsi: Ganeti Bird BGP [puppet] - 10https://gerrit.wikimedia.org/r/1169662 (https://phabricator.wikimedia.org/T362392)
[08:47:14] <wikibugs>	 (03CR) 10Ayounsi: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1169662 (https://phabricator.wikimedia.org/T362392) (owner: 10Ayounsi)
[08:47:24] <wikibugs>	 (03CR) 10Marostegui: [C:03+2] es1048: Enable notifications [puppet] - 10https://gerrit.wikimedia.org/r/1170503 (https://phabricator.wikimedia.org/T395771) (owner: 10Marostegui)
[08:48:29] <wikibugs>	 (03CR) 10Jaime Nuche: "Thanks for this Daniel!" [puppet] - 10https://gerrit.wikimedia.org/r/1137818 (https://phabricator.wikimedia.org/T377889) (owner: 10Dzahn)
[08:48:53] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db1189 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P79362 and previous config saved to /var/cache/conftool/dbconfig/20250718-084853-root.json
[08:49:30] <wikibugs>	 (03PS2) 10Arnaudb: miscweb: wikiworkshop use httpd [deployment-charts] - 10https://gerrit.wikimedia.org/r/1170464 (https://phabricator.wikimedia.org/T398303)
[08:49:30] <wikibugs>	 (03CR) 10Arnaudb: "all tags have been checked and are pullable" [deployment-charts] - 10https://gerrit.wikimedia.org/r/1170464 (https://phabricator.wikimedia.org/T398303) (owner: 10Arnaudb)
[08:49:47] <wikibugs>	 06SRE, 06Infrastructure-Foundations, 10netops: BGP: Support receipt of graceful-shutdown community and set local-pref - https://phabricator.wikimedia.org/T399931 (10cmooney) 03NEW p:05Triage→03Low
[08:50:36] <wikibugs>	 (03CR) 10CI reject: [V:04-1] DNM - test for ML hosts [cookbooks] - 10https://gerrit.wikimedia.org/r/1170463 (owner: 10Elukey)
[08:54:01] <wikibugs>	 (03PS1) 10Marostegui: instances.yaml: Add es1048 [puppet] - 10https://gerrit.wikimedia.org/r/1170504 (https://phabricator.wikimedia.org/T395771)
[08:54:53] <wikibugs>	 (03CR) 10Marostegui: [C:03+2] instances.yaml: Add es1048 [puppet] - 10https://gerrit.wikimedia.org/r/1170504 (https://phabricator.wikimedia.org/T395771) (owner: 10Marostegui)
[08:54:53] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops, 06Infrastructure-Foundations, and 2 others: eqsin purged consumers lag - https://phabricator.wikimedia.org/T399221#11016109 (10cmooney) All looks clean overnight with this, I have confirmed to Arelion they can close their ticket and we will re-open if the same thing happens ag...
[08:55:02] <wikibugs>	 06SRE, 06Infrastructure-Foundations, 10netops: BGP: Support receipt of graceful-shutdown community and set local-pref - https://phabricator.wikimedia.org/T399931#11016110 (10ayounsi) Makes sense!
[08:55:24] <logmsgbot>	 !log elukey@cumin1003 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2009.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
[08:56:53] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Add es1048 to es7 depooled T395771', diff saved to https://phabricator.wikimedia.org/P79363 and previous config saved to /var/cache/conftool/dbconfig/20250718-085652-marostegui.json
[08:56:57] <stashbot>	 T395771: Productionize es2047, es2048, es1047, es1048 - https://phabricator.wikimedia.org/T395771
[08:57:04] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1233 (T399249)', diff saved to https://phabricator.wikimedia.org/P79364 and previous config saved to /var/cache/conftool/dbconfig/20250718-085704-marostegui.json
[08:57:08] <stashbot>	 T399249: Add cl_timestamp_id index to categorylinks table - https://phabricator.wikimedia.org/T399249
[08:57:19] <logmsgbot>	 !log elukey@cumin1003 START - Cookbook sre.hosts.provision for host sretest2009.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
[08:57:19] <logmsgbot>	 !log marostegui@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1239.eqiad.wmnet with reason: Maintenance
[08:57:22] <logmsgbot>	 !log elukey@cumin1003 END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest2009.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
[08:57:56] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Pool es1048 with 1% weight on es7 T395771', diff saved to https://phabricator.wikimedia.org/P79365 and previous config saved to /var/cache/conftool/dbconfig/20250718-085755-marostegui.json
[09:03:59] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db1189 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P79366 and previous config saved to /var/cache/conftool/dbconfig/20250718-090358-root.json
[09:04:31] <wikibugs>	 (03PS1) 10Brouberol: deployment_server: group chown airflow-wmde kubeconfig files to airflow-deployers [puppet] - 10https://gerrit.wikimedia.org/r/1170508 (https://phabricator.wikimedia.org/T399066)
[09:04:58] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops: Q4:rack/setup/install sretest2009 - https://phabricator.wikimedia.org/T396365#11016136 (10elukey) @Jhancock.wm I managed to get provision working; the new settings are not yet merged, so if you have other similar hosts ping me first :)  The issue with the passwords/accounts is a...
[09:04:59] <wikibugs>	 (03CR) 10CI reject: [V:04-1] deployment_server: group chown airflow-wmde kubeconfig files to airflow-deployers [puppet] - 10https://gerrit.wikimedia.org/r/1170508 (https://phabricator.wikimedia.org/T399066) (owner: 10Brouberol)
[09:05:50] <wikibugs>	 (03PS2) 10Brouberol: deployment_server: group chown airflow-wmde kubeconfig files to airflow-deployers [puppet] - 10https://gerrit.wikimedia.org/r/1170508 (https://phabricator.wikimedia.org/T399066)
[09:06:18] <wikibugs>	 (03PS3) 10Brouberol: deployment_server: group chown airflow-wmde kubeconfigs to airflow-deployers [puppet] - 10https://gerrit.wikimedia.org/r/1170508 (https://phabricator.wikimedia.org/T399066)
[09:06:18] <wikibugs>	 (03CR) 10CI reject: [V:04-1] deployment_server: group chown airflow-wmde kubeconfigs to airflow-deployers [puppet] - 10https://gerrit.wikimedia.org/r/1170508 (https://phabricator.wikimedia.org/T399066) (owner: 10Brouberol)
[09:08:37] <wikibugs>	 (03CR) 10Btullis: [C:03+1] deployment_server: group chown airflow-wmde kubeconfigs to airflow-deployers [puppet] - 10https://gerrit.wikimedia.org/r/1170508 (https://phabricator.wikimedia.org/T399066) (owner: 10Brouberol)
[09:11:35] <wikibugs>	 (03CR) 10Brouberol: [V:03+1] "PCC SUCCESS (CORE_DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/6303/co" [puppet] - 10https://gerrit.wikimedia.org/r/1170508 (https://phabricator.wikimedia.org/T399066) (owner: 10Brouberol)
[09:15:37] <wikibugs>	 (03CR) 10Arnaudb: "some questions inline. Otherwise lgtm, we'll have to be extra careful when we'll sunset `gerrit2`, this patch is another step in that dire" [puppet] - 10https://gerrit.wikimedia.org/r/1170433 (https://phabricator.wikimedia.org/T387833) (owner: 10Dzahn)
[09:18:41] <wikibugs>	 (03PS3) 10Arnaudb: miscweb: re-use httpd base image on miscweb [deployment-charts] - 10https://gerrit.wikimedia.org/r/1170464 (https://phabricator.wikimedia.org/T398303)
[09:19:05] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db1189 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P79367 and previous config saved to /var/cache/conftool/dbconfig/20250718-091904-root.json
[09:20:46] <wikibugs>	 (03CR) 10Jelto: [C:03+1] "lgtm, thank you!" [deployment-charts] - 10https://gerrit.wikimedia.org/r/1170464 (https://phabricator.wikimedia.org/T398303) (owner: 10Arnaudb)
[09:23:52] <wikibugs>	 (03CR) 10Ayounsi: "Nop, it was a typo, problem solved." [puppet] - 10https://gerrit.wikimedia.org/r/1169662 (https://phabricator.wikimedia.org/T362392) (owner: 10Ayounsi)
[09:24:52] <icinga-wm>	 PROBLEM - OSPF status on cr1-eqiad is CRITICAL: OSPFv2: 6/6 UP : OSPFv3: 5/6 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[09:25:52] <icinga-wm>	 RECOVERY - OSPF status on cr1-eqiad is OK: OSPFv2: 6/6 UP : OSPFv3: 6/6 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[09:26:16] <wikibugs>	 (03CR) 10Arnaudb: [C:03+2] miscweb: re-use httpd base image on miscweb [deployment-charts] - 10https://gerrit.wikimedia.org/r/1170464 (https://phabricator.wikimedia.org/T398303) (owner: 10Arnaudb)
[09:27:10] <jinxer-wm>	 FIRING: BFDdown: BFD session down between cr1-codfw and fe80::5e5e:abff:fe3d:8198 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr1-codfw:9804 - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[09:28:47] <wikibugs>	 (03Merged) 10jenkins-bot: miscweb: re-use httpd base image on miscweb [deployment-charts] - 10https://gerrit.wikimedia.org/r/1170464 (https://phabricator.wikimedia.org/T398303) (owner: 10Arnaudb)
[09:28:57] <wikibugs>	 (03PS1) 10Cathal Mooney: BGP Policy: Set local-pref to zero on receipt of gshut community [homer/public] - 10https://gerrit.wikimedia.org/r/1170509 (https://phabricator.wikimedia.org/T399931)
[09:30:00] <wikibugs>	 (03PS2) 10Cathal Mooney: BGP Policy: Set local-pref to zero on receipt of gshut community [homer/public] - 10https://gerrit.wikimedia.org/r/1170509 (https://phabricator.wikimedia.org/T399931)
[09:30:30] <wikibugs>	 (03PS1) 10Btullis: Add the wikitech dump script [dumps] - 10https://gerrit.wikimedia.org/r/1170510 (https://phabricator.wikimedia.org/T398968)
[09:30:54] <wikibugs>	 (03PS5) 10Ayounsi: Ganeti Bird BGP [puppet] - 10https://gerrit.wikimedia.org/r/1169662 (https://phabricator.wikimedia.org/T362392)
[09:32:10] <jinxer-wm>	 RESOLVED: BFDdown: BFD session down between cr1-codfw and fe80::5e5e:abff:fe3d:8198 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr1-codfw:9804 - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[09:33:26] <wikibugs>	 06SRE, 06Infrastructure-Foundations, 10netops, 13Patch-For-Review: BGP: Support receipt of graceful-shutdown community and set local-pref - https://phabricator.wikimedia.org/T399931#11016186 (10cmooney)
[09:34:11] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db1189 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P79369 and previous config saved to /var/cache/conftool/dbconfig/20250718-093410-root.json
[09:34:28] <wikibugs>	 (03Abandoned) 10Cathal Mooney: Rename YAML var "evpn_bgp" to "switch_ibgp" [homer/public] - 10https://gerrit.wikimedia.org/r/1122208 (https://phabricator.wikimedia.org/T371088) (owner: 10Cathal Mooney)
[09:35:02] <logmsgbot>	 !log arnaudb@deploy1003 helmfile [staging] START helmfile.d/services/miscweb: apply
[09:36:41] <logmsgbot>	 !log arnaudb@deploy1003 helmfile [staging] DONE helmfile.d/services/miscweb: apply
[09:36:55] <Mvolz>	 Does anyone mind if I do a services (citoid) deploy? Weird spike in 503s in pyrra/thanos not due to any code change and I think I found the problem locally.
[09:39:13] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops: Arelion IC-374549 100G Transport outage (cr1-codfw -> cr1-eqiad) July 2025 - https://phabricator.wikimedia.org/T399097#11016195 (10cmooney) No update from Arelion, asked them to advise on the situation.
[09:39:16] <logmsgbot>	 !log arnaudb@deploy1003 helmfile [codfw] START helmfile.d/services/miscweb: apply
[09:39:28] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops: Arelion IC-374549 100G Transport outage (cr1-codfw -> cr1-eqiad) July 2025 - https://phabricator.wikimedia.org/T399097#11016196 (10cmooney) a:03cmooney
[09:41:00] <logmsgbot>	 !log gmodena@deploy1003 helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
[09:41:06] <logmsgbot>	 !log gmodena@deploy1003 helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
[09:41:22] <logmsgbot>	 !log arnaudb@deploy1003 helmfile [codfw] DONE helmfile.d/services/miscweb: apply
[09:43:22] <logmsgbot>	 !log arnaudb@deploy1003 helmfile [eqiad] START helmfile.d/services/miscweb: apply
[09:45:16] <logmsgbot>	 !log arnaudb@deploy1003 helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
[09:46:25] <jinxer-wm>	 FIRING: BFDdown: BFD session down between cr1-eqiad and fe80::6687:88ff:fef2:6d48 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr1-eqiad:9804 - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[09:46:40] <jinxer-wm>	 FIRING: [2x] BFDdown: BFD session down between cr1-codfw and fe80::5e5e:abff:fe3d:8198 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status  - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[09:51:25] <jinxer-wm>	 FIRING: [4x] BFDdown: BFD session down between cr1-codfw and fe80::5e5e:abff:fe3d:8198 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status  - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[09:53:30] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C:03+1] prom/metamonitor: hide DeadManSwitch alerts in Karma [puppet] - 10https://gerrit.wikimedia.org/r/1170360 (https://phabricator.wikimedia.org/T397003) (owner: 10Tiziano Fogli)
[09:56:25] <jinxer-wm>	 RESOLVED: [4x] BFDdown: BFD session down between cr1-codfw and fe80::5e5e:abff:fe3d:8198 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status  - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[09:59:31] <logmsgbot>	 !log marostegui@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1254.eqiad.wmnet with reason: Maintenance
[09:59:38] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depooling db1254 (T399249)', diff saved to https://phabricator.wikimedia.org/P79370 and previous config saved to /var/cache/conftool/dbconfig/20250718-095938-marostegui.json
[09:59:42] <stashbot>	 T399249: Add cl_timestamp_id index to categorylinks table - https://phabricator.wikimedia.org/T399249
[10:00:04] <wikibugs>	 (03PS1) 10Stevemunene: dse-k8s: bootstrap dse-k8s-codefw cluster [puppet] - 10https://gerrit.wikimedia.org/r/1170514 (https://phabricator.wikimedia.org/T397293)
[10:00:04] <wikibugs>	 (03CR) 10Cathal Mooney: [C:03+1] "LGTM nice work!" [puppet] - 10https://gerrit.wikimedia.org/r/1169662 (https://phabricator.wikimedia.org/T362392) (owner: 10Ayounsi)
[10:02:20] <wikibugs>	 (03PS2) 10Stevemunene: dse-k8s: bootstrap dse-k8s-codfw cluster [puppet] - 10https://gerrit.wikimedia.org/r/1170514 (https://phabricator.wikimedia.org/T397293)
[10:14:49] <wikibugs>	 (03PS1) 10Jelto: miscweb: update miscweb images to new version [deployment-charts] - 10https://gerrit.wikimedia.org/r/1170515 (https://phabricator.wikimedia.org/T398303)
[10:14:52] <wikibugs>	 (03PS1) 10Jelto: miscweb: update miscweb design images to new version [deployment-charts] - 10https://gerrit.wikimedia.org/r/1170516 (https://phabricator.wikimedia.org/T398303)
[10:24:07] <wikibugs>	 (03CR) 10Mvolz: [C:03+2] citoid: pipeline bot promote [deployment-charts] - 10https://gerrit.wikimedia.org/r/1170390 (owner: 10PipelineBot)
[10:25:52] <wikibugs>	 (03Merged) 10jenkins-bot: citoid: pipeline bot promote [deployment-charts] - 10https://gerrit.wikimedia.org/r/1170390 (owner: 10PipelineBot)
[10:35:10] <logmsgbot>	 !log mvolz@deploy1003 helmfile [staging] START helmfile.d/services/citoid: apply
[10:35:40] <logmsgbot>	 !log mvolz@deploy1003 helmfile [staging] DONE helmfile.d/services/citoid: apply
[10:35:58] <logmsgbot>	 !log mvolz@deploy1003 helmfile [codfw] START helmfile.d/services/citoid: apply
[10:36:23] <logmsgbot>	 !log mvolz@deploy1003 helmfile [codfw] DONE helmfile.d/services/citoid: apply
[10:37:10] <jinxer-wm>	 FIRING: BFDdown: BFD session down between cr1-eqiad and fe80::6687:88ff:fef2:6d48 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr1-eqiad:9804 - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[10:37:13] <logmsgbot>	 !log mvolz@deploy1003 helmfile [eqiad] START helmfile.d/services/citoid: apply
[10:37:39] <logmsgbot>	 !log mvolz@deploy1003 helmfile [eqiad] DONE helmfile.d/services/citoid: apply
[10:39:04] <jinxer-wm>	 FIRING: PuppetConstantChange: Puppet performing a change on every puppet run on wdqs1022:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange
[10:39:35] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops: Arelion IC-374549 100G Transport outage (cr1-codfw -> cr1-eqiad) July 2025 - https://phabricator.wikimedia.org/T399097#11016357 (10cmooney) ` 2025-07-18 10:24  Apologies for the inconveniences,  Please be informed that investigation is ongoing with our senior engineer and be res...
[10:42:10] <jinxer-wm>	 FIRING: [2x] BFDdown: BFD session down between cr1-eqiad and fe80::6687:88ff:fef2:6d48 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr1-eqiad:9804 - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[10:47:10] <jinxer-wm>	 FIRING: [3x] BFDdown: BFD session down between cr1-eqiad and fe80::6687:88ff:fef2:6d48 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr1-eqiad:9804 - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[10:51:22] <wikibugs>	 (03CR) 10Btullis: [C:03+1] "Looks good to me." [dns] - 10https://gerrit.wikimedia.org/r/1170364 (https://phabricator.wikimedia.org/T397293) (owner: 10Stevemunene)
[10:51:45] <wikibugs>	 (03CR) 10Btullis: [C:03+1] dse-k8s: bootstrap dse-k8s-codfw cluster [puppet] - 10https://gerrit.wikimedia.org/r/1170514 (https://phabricator.wikimedia.org/T397293) (owner: 10Stevemunene)
[10:52:10] <jinxer-wm>	 RESOLVED: [3x] BFDdown: BFD session down between cr1-eqiad and fe80::6687:88ff:fef2:6d48 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr1-eqiad:9804 - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[10:54:12] <jinxer-wm>	 FIRING: CertAlmostExpired: Certificate for service data-gateway-staging:30443 is about to expire - https://wikitech.wikimedia.org/wiki/TLS/Runbook#data-gateway-staging:30443 - TODO - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired
[10:55:53] <wikibugs>	 (03PS3) 10Cathal Mooney: BGP Policy: Set local-pref to zero on receipt of gshut community [homer/public] - 10https://gerrit.wikimedia.org/r/1170509 (https://phabricator.wikimedia.org/T399931)
[10:56:45] <jinxer-wm>	 RESOLVED: WidespreadPuppetFailure: Puppet has failed in eqiad - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet?orgId=1&viewPanel=6 - https://alerts.wikimedia.org/?q=alertname%3DWidespreadPuppetFailure
[11:00:04] <jouncebot>	 Deploy window No deploys all day! See Deployments/Emergencies if things are broken. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250718T0700)
[11:00:05] <jouncebot>	 jelto, arnoldokoth, and mutante: GitLab version upgrades (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250718T1100). Please do the needful.
[11:00:34] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1254 (T399249)', diff saved to https://phabricator.wikimedia.org/P79371 and previous config saved to /var/cache/conftool/dbconfig/20250718-110033-marostegui.json
[11:00:38] <stashbot>	 T399249: Add cl_timestamp_id index to categorylinks table - https://phabricator.wikimedia.org/T399249
[11:03:15] <jinxer-wm>	 FIRING: WidespreadPuppetFailure: Puppet has failed in eqiad - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet?orgId=1&viewPanel=6 - https://alerts.wikimedia.org/?q=alertname%3DWidespreadPuppetFailure
[11:03:30] <wikibugs>	 (03CR) 10Cathal Mooney: "Hey Sukhbir thanks for checking.  Let's hold off for now I need to review it again, I believe this only covers the case when no '--generat" [dns] - 10https://gerrit.wikimedia.org/r/1164124 (https://phabricator.wikimedia.org/T362985) (owner: 10Slyngshede)
[11:07:40] <jinxer-wm>	 FIRING: [2x] BFDdown: BFD session down between cr1-codfw and fe80::5e5e:abff:fe3d:8198 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status  - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[11:07:55] <jinxer-wm>	 RESOLVED: BFDdown: BFD session down between cr1-eqiad and fe80::6687:88ff:fef2:6d48 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr1-eqiad:9804 - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[11:09:25] <jinxer-wm>	 FIRING: PuppetCertificateAboutToExpire: Puppet CA certificate thanos-query.discovery.wmnet is about to expire - https://wikitech.wikimedia.org/wiki/Puppet#Renew_agent_certificate - TODO - https://alerts.wikimedia.org/?q=alertname%3DPuppetCertificateAboutToExpire
[11:10:59] <wikibugs>	 (03CR) 10Stevemunene: [C:03+2] dns: Add dse-k8s codfw SRV records [dns] - 10https://gerrit.wikimedia.org/r/1170364 (https://phabricator.wikimedia.org/T397293) (owner: 10Stevemunene)
[11:13:04] <logmsgbot>	 !log stevemunene@dns1004 START - running authdns-update
[11:14:09] <logmsgbot>	 !log stevemunene@dns1004 END - running authdns-update
[11:15:41] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1254', diff saved to https://phabricator.wikimedia.org/P79372 and previous config saved to /var/cache/conftool/dbconfig/20250718-111541-marostegui.json
[11:23:01] <wikibugs>	 (03CR) 10Stevemunene: [C:03+2] dse-k8s: bootstrap dse-k8s-codfw cluster [puppet] - 10https://gerrit.wikimedia.org/r/1170514 (https://phabricator.wikimedia.org/T397293) (owner: 10Stevemunene)
[11:27:18] <wikibugs>	 (03PS6) 10Cathal Mooney: Capirca: handle script having no 'status' attribute gracefully [software/homer] - 10https://gerrit.wikimedia.org/r/1166373
[11:30:48] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1254', diff saved to https://phabricator.wikimedia.org/P79373 and previous config saved to /var/cache/conftool/dbconfig/20250718-113048-marostegui.json
[11:35:10] <jinxer-wm>	 FIRING: BFDdown: BFD session down between cr1-codfw and fe80::5e5e:abff:fe3d:8198 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr1-codfw:9804 - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[11:38:31] <wikibugs>	 (03CR) 10CI reject: [V:04-1] Capirca: handle script having no 'status' attribute gracefully [software/homer] - 10https://gerrit.wikimedia.org/r/1166373 (owner: 10Cathal Mooney)
[11:39:31] <wikibugs>	 (03CR) 10Ilias Sarantopoulos: [C:03+1] "LGTM!" [deployment-charts] - 10https://gerrit.wikimedia.org/r/1170447 (https://phabricator.wikimedia.org/T363336) (owner: 10Kevin Bazira)
[11:39:35] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es1048 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P79374 and previous config saved to /var/cache/conftool/dbconfig/20250718-113933-root.json
[11:40:10] <jinxer-wm>	 RESOLVED: BFDdown: BFD session down between cr1-codfw and fe80::5e5e:abff:fe3d:8198 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr1-codfw:9804 - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[11:41:10] <jinxer-wm>	 FIRING: BFDdown: BFD session down between cr1-eqiad and fe80::6687:88ff:fef2:6d48 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr1-eqiad:9804 - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[11:42:43] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.mysql.parsercache
[11:42:51] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.mysql.parsercache (exit_code=0)
[11:43:22] <logmsgbot>	 !log marostegui@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on pc2017.codfw.wmnet,pc1017.eqiad.wmnet with reason: Maintenance
[11:43:24] <marostegui>	 !log Restart pc7 T399540
[11:43:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:43:28] <stashbot>	 T399540: Upgrade masters to 10.6.22 and 10.11.13 .2 update - https://phabricator.wikimedia.org/T399540
[11:45:56] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1254 (T399249)', diff saved to https://phabricator.wikimedia.org/P79376 and previous config saved to /var/cache/conftool/dbconfig/20250718-114555-marostegui.json
[11:46:00] <stashbot>	 T399249: Add cl_timestamp_id index to categorylinks table - https://phabricator.wikimedia.org/T399249
[11:46:10] <jinxer-wm>	 FIRING: [4x] BFDdown: BFD session down between cr1-codfw and fe80::5e5e:abff:fe3d:8198 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status  - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[11:46:11] <logmsgbot>	 !log marostegui@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1259.eqiad.wmnet with reason: Maintenance
[11:46:20] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depooling db1259 (T399249)', diff saved to https://phabricator.wikimedia.org/P79377 and previous config saved to /var/cache/conftool/dbconfig/20250718-114618-marostegui.json
[11:48:01] <logmsgbot>	 !log fceratto@deploy1003 helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
[11:49:20] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.mysql.parsercache
[11:49:45] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.mysql.parsercache (exit_code=0)
[11:49:57] <jinxer-wm>	 FIRING: ProbeDown: Service ncredir-https:443 has failed probes (http_ncredir-https_ip4) #page - https://wikitech.wikimedia.org/wiki/Runbook#ncredir-https:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[11:50:25] <jynus>	 here
[11:50:31] <elukey>	 here as well
[11:50:36] <wikibugs>	 (03PS1) 10Cathal Mooney: sre.hosts.decommision: remove virtual interfaces from during decom [cookbooks] - 10https://gerrit.wikimedia.org/r/1170530 (https://phabricator.wikimedia.org/T398412)
[11:50:55] <jynus>	 I acked it, looking if there was maintenance or something
[11:51:10] <jinxer-wm>	 RESOLVED: [4x] BFDdown: BFD session down between cr1-codfw and fe80::5e5e:abff:fe3d:8198 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status  - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[11:51:18] <jynus>	 eqsin is not pooled, right?
[11:51:27] <topranks>	 it was repooled last night 
[11:51:31] <jynus>	 oh
[11:51:59] <jynus>	 it is failing on codfw too
[11:54:20] <elukey>	 I do see the following in the logs
[11:54:23] <elukey>	 2025/07/18 11:53:26 [alert] 3856385#3856385: 768 worker_connections are not enough
[11:54:31] <elukey>	 this on 5001
[11:54:41] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es1048 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P79379 and previous config saved to /var/cache/conftool/dbconfig/20250718-115440-root.json
[11:54:57] <jinxer-wm>	 RESOLVED: ProbeDown: Service ncredir-https:443 has failed probes (http_ncredir-https_ip4) #page - https://wikitech.wikimedia.org/wiki/Runbook#ncredir-https:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[11:55:17] <elukey>	 what
[11:56:26] <jynus>	 it's the different dcs, I think
[11:56:54] <jynus>	 ah, sorry, I didn't see it was a resolution
[11:56:59] <wikibugs>	 (03CR) 10CI reject: [V:04-1] sre.hosts.decommision: remove virtual interfaces from during decom [cookbooks] - 10https://gerrit.wikimedia.org/r/1170530 (https://phabricator.wikimedia.org/T398412) (owner: 10Cathal Mooney)
[12:01:13] <wikibugs>	 (03CR) 10Jelto: [C:03+2] miscweb: update miscweb design images to new version [deployment-charts] - 10https://gerrit.wikimedia.org/r/1170516 (https://phabricator.wikimedia.org/T398303) (owner: 10Jelto)
[12:01:16] <wikibugs>	 (03CR) 10Jelto: [C:03+2] miscweb: update miscweb images to new version [deployment-charts] - 10https://gerrit.wikimedia.org/r/1170515 (https://phabricator.wikimedia.org/T398303) (owner: 10Jelto)
[12:01:27] <wikibugs>	 10ops-codfw, 06DC-Ops: Unresponsive management for cirrussearch2089.mgmt:22 - https://phabricator.wikimedia.org/T399943 (10phaultfinder) 03NEW
[12:01:36] <wikibugs>	 (03PS2) 10Cathal Mooney: sre.hosts.decommision: remove virtual interfaces from during decom [cookbooks] - 10https://gerrit.wikimedia.org/r/1170530 (https://phabricator.wikimedia.org/T398412)
[12:03:12] <wikibugs>	 (03Merged) 10jenkins-bot: miscweb: update miscweb images to new version [deployment-charts] - 10https://gerrit.wikimedia.org/r/1170515 (https://phabricator.wikimedia.org/T398303) (owner: 10Jelto)
[12:03:18] <wikibugs>	 (03Merged) 10jenkins-bot: miscweb: update miscweb design images to new version [deployment-charts] - 10https://gerrit.wikimedia.org/r/1170516 (https://phabricator.wikimedia.org/T398303) (owner: 10Jelto)
[12:03:51] <wikibugs>	 (03CR) 10Arnaudb: [C:03+1] miscweb: update miscweb images to new version [deployment-charts] - 10https://gerrit.wikimedia.org/r/1170515 (https://phabricator.wikimedia.org/T398303) (owner: 10Jelto)
[12:04:04] <wikibugs>	 (03CR) 10Arnaudb: [C:03+1] miscweb: update miscweb design images to new version [deployment-charts] - 10https://gerrit.wikimedia.org/r/1170516 (https://phabricator.wikimedia.org/T398303) (owner: 10Jelto)
[12:05:59] <logmsgbot>	 !log jelto@deploy1003 helmfile [staging] START helmfile.d/services/miscweb: apply
[12:07:12] <logmsgbot>	 !log jelto@deploy1003 helmfile [staging] DONE helmfile.d/services/miscweb: apply
[12:08:10] <jinxer-wm>	 FIRING: BFDdown: BFD session down between cr1-codfw and fe80::5e5e:abff:fe3d:8198 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr1-codfw:9804 - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[12:08:10] <wikibugs>	 (03CR) 10CI reject: [V:04-1] sre.hosts.decommision: remove virtual interfaces from during decom [cookbooks] - 10https://gerrit.wikimedia.org/r/1170530 (https://phabricator.wikimedia.org/T398412) (owner: 10Cathal Mooney)
[12:08:23] <logmsgbot>	 !log jelto@deploy1003 helmfile [codfw] START helmfile.d/services/miscweb: apply
[12:09:47] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es1048 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P79380 and previous config saved to /var/cache/conftool/dbconfig/20250718-120946-root.json
[12:09:57] <logmsgbot>	 !log jelto@deploy1003 helmfile [codfw] DONE helmfile.d/services/miscweb: apply
[12:10:48] <logmsgbot>	 !log jelto@deploy1003 helmfile [eqiad] START helmfile.d/services/miscweb: apply
[12:12:31] <logmsgbot>	 !log jelto@deploy1003 helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
[12:13:10] <jinxer-wm>	 RESOLVED: [2x] BFDdown: BFD session down between cr1-codfw and 208.80.153.220 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr1-codfw:9804 - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[12:13:22] <wikibugs>	 (03CR) 10DDesouza: [V:03+1 C:03+1] miscweb: update miscweb design images to new version [deployment-charts] - 10https://gerrit.wikimedia.org/r/1170516 (https://phabricator.wikimedia.org/T398303) (owner: 10Jelto)
[12:17:02] <wikibugs>	 (03PS1) 10Marostegui: db1198: Migrate to MariaDB 10.11 [puppet] - 10https://gerrit.wikimedia.org/r/1170534 (https://phabricator.wikimedia.org/T399548)
[12:18:11] <wikibugs>	 (03CR) 10Marostegui: [C:03+2] db1198: Migrate to MariaDB 10.11 [puppet] - 10https://gerrit.wikimedia.org/r/1170534 (https://phabricator.wikimedia.org/T399548) (owner: 10Marostegui)
[12:18:58] <logmsgbot>	 !log marostegui@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1198.eqiad.wmnet with reason: Maintenance
[12:19:02] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depool db1198 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P79381 and previous config saved to /var/cache/conftool/dbconfig/20250718-121901-marostegui.json
[12:22:29] <wikibugs>	 (03CR) 10Tiziano Fogli: [C:03+2] prom/metamonitor: hide DeadManSwitch alerts in Karma (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1170360 (https://phabricator.wikimedia.org/T397003) (owner: 10Tiziano Fogli)
[12:23:25] <wikibugs>	 (03PS1) 10Vgutierrez: hiera: Disable paging for ncredir-https [puppet] - 10https://gerrit.wikimedia.org/r/1170536
[12:24:04] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops, 06Infrastructure-Foundations: Eqiad: row C/D switch refresh - https://phabricator.wikimedia.org/T396063#11016669 (10Jclark-ctr) Adjusted Mgmt in Rack D1 , D8 down to place spines in top of rack. Updated netbox and installed Rails
[12:24:53] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es1048 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P79382 and previous config saved to /var/cache/conftool/dbconfig/20250718-122452-root.json
[12:26:59] <wikibugs>	 (03CR) 10Jcrespo: [C:03+1] hiera: Disable paging for ncredir-https [puppet] - 10https://gerrit.wikimedia.org/r/1170536 (owner: 10Vgutierrez)
[12:27:27] <wikibugs>	 (03CR) 10Vgutierrez: [C:03+2] hiera: Disable paging for ncredir-https [puppet] - 10https://gerrit.wikimedia.org/r/1170536 (owner: 10Vgutierrez)
[12:29:14] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db1198 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P79383 and previous config saved to /var/cache/conftool/dbconfig/20250718-122914-root.json
[12:34:25] <jinxer-wm>	 FIRING: CoreRouterInterfaceDown: Core router interface down - cr2-esams:xe-0/1/2 (inter.link reserved port) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr2-esams:9804 - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown
[12:35:11] <wikibugs>	 (03PS1) 10Tiziano Fogli: prom/metamonitor: fix typo on karma erb config file [puppet] - 10https://gerrit.wikimedia.org/r/1170540 (https://phabricator.wikimedia.org/T397003)
[12:35:23] <wikibugs>	 (03CR) 10Tiziano Fogli: [C:03+2] prom/metamonitor: fix typo on karma erb config file [puppet] - 10https://gerrit.wikimedia.org/r/1170540 (https://phabricator.wikimedia.org/T397003) (owner: 10Tiziano Fogli)
[12:35:35] <wikibugs>	 (03CR) 10Tiziano Fogli: [V:03+2 C:03+2] prom/metamonitor: fix typo on karma erb config file [puppet] - 10https://gerrit.wikimedia.org/r/1170540 (https://phabricator.wikimedia.org/T397003) (owner: 10Tiziano Fogli)
[12:35:57] <wikibugs>	 (03PS1) 10Marostegui: installserver: Do not format es1048 [puppet] - 10https://gerrit.wikimedia.org/r/1170541
[12:39:20] <wikibugs>	 (03PS1) 10Cathal Mooney: cephosd: un-set bird bgp neighbors rather than override for each host [puppet] - 10https://gerrit.wikimedia.org/r/1170543
[12:39:50] <wikibugs>	 (03CR) 10Marostegui: [C:03+2] installserver: Do not format es1048 [puppet] - 10https://gerrit.wikimedia.org/r/1170541 (owner: 10Marostegui)
[12:39:58] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es1048 (re)pooling @ 35%: Repooling', diff saved to https://phabricator.wikimedia.org/P79384 and previous config saved to /var/cache/conftool/dbconfig/20250718-123958-root.json
[12:40:25] <wikibugs>	 (03CR) 10Cathal Mooney: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1170543 (owner: 10Cathal Mooney)
[12:41:53] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops: Degraded RAID on backup1007 - https://phabricator.wikimedia.org/T399847#11016724 (10Jclark-ctr) Replaced Failed drive powering up now
[12:42:56] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops: Degraded RAID on backup1007 - https://phabricator.wikimedia.org/T399847#11016725 (10jcrespo) Thank you!
[12:44:10] <wikibugs>	 (03CR) 10Ayounsi: [C:03+1] "nice lgtm!" [homer/public] - 10https://gerrit.wikimedia.org/r/1170509 (https://phabricator.wikimedia.org/T399931) (owner: 10Cathal Mooney)
[12:44:20] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db1198 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P79385 and previous config saved to /var/cache/conftool/dbconfig/20250718-124419-root.json
[12:44:35] <wikibugs>	 (03PS1) 10Jelto: miscweb: update miscweb images to new version [deployment-charts] - 10https://gerrit.wikimedia.org/r/1170544 (https://phabricator.wikimedia.org/T398303)
[12:45:11] <icinga-wm>	 ACKNOWLEDGEMENT - MegaRAID on backup1007 is CRITICAL: CRITICAL: 1 failed LD(s) (Partially Degraded) nagiosadmin RAID handler auto-ack: https://phabricator.wikimedia.org/T399948 https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[12:45:15] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops: Degraded RAID on backup1007 - https://phabricator.wikimedia.org/T399948 (10ops-monitoring-bot) 03NEW
[12:45:42] <wikibugs>	 (03PS1) 10Tiziano Fogli: prom/metamonitor: fix indentation on karma erb config file [puppet] - 10https://gerrit.wikimedia.org/r/1170545 (https://phabricator.wikimedia.org/T397003)
[12:46:30] <wikibugs>	 (03CR) 10Tiziano Fogli: [C:03+2] prom/metamonitor: fix indentation on karma erb config file [puppet] - 10https://gerrit.wikimedia.org/r/1170545 (https://phabricator.wikimedia.org/T397003) (owner: 10Tiziano Fogli)
[12:48:15] <wikibugs>	 (03CR) 10Ayounsi: "overall lgtm, I'd suggest to test it on a sretest hosts on the prod instance with test-cookbook." [cookbooks] - 10https://gerrit.wikimedia.org/r/1170530 (https://phabricator.wikimedia.org/T398412) (owner: 10Cathal Mooney)
[12:49:04] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1259 (T399249)', diff saved to https://phabricator.wikimedia.org/P79386 and previous config saved to /var/cache/conftool/dbconfig/20250718-124901-marostegui.json
[12:49:06] <wikibugs>	 (03CR) 10Cathal Mooney: "OK yep, I guess I can add some virtual ints and stuff to sretest or mess with it without risking too much - good idea!" [cookbooks] - 10https://gerrit.wikimedia.org/r/1170530 (https://phabricator.wikimedia.org/T398412) (owner: 10Cathal Mooney)
[12:49:08] <stashbot>	 T399249: Add cl_timestamp_id index to categorylinks table - https://phabricator.wikimedia.org/T399249
[12:49:38] <wikibugs>	 (03CR) 10Cathal Mooney: "If you've any idea about the CI error I'm all ears.  Too many branches but I don't see an easy way to avoid it here." [cookbooks] - 10https://gerrit.wikimedia.org/r/1170530 (https://phabricator.wikimedia.org/T398412) (owner: 10Cathal Mooney)
[12:51:45] <wikibugs>	 (03CR) 10Jelto: [C:03+2] miscweb: update miscweb images to new version [deployment-charts] - 10https://gerrit.wikimedia.org/r/1170544 (https://phabricator.wikimedia.org/T398303) (owner: 10Jelto)
[12:53:57] <wikibugs>	 (03Merged) 10jenkins-bot: miscweb: update miscweb images to new version [deployment-charts] - 10https://gerrit.wikimedia.org/r/1170544 (https://phabricator.wikimedia.org/T398303) (owner: 10Jelto)
[12:55:04] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es1048 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P79387 and previous config saved to /var/cache/conftool/dbconfig/20250718-125504-root.json
[12:55:08] <logmsgbot>	 !log jelto@deploy1003 helmfile [staging] START helmfile.d/services/miscweb: apply
[12:55:25] <logmsgbot>	 !log jelto@deploy1003 helmfile [staging] DONE helmfile.d/services/miscweb: apply
[12:56:00] <logmsgbot>	 !log jelto@deploy1003 helmfile [codfw] START helmfile.d/services/miscweb: apply
[12:56:01] <wikibugs>	 (03PS2) 10Cathal Mooney: cephosd: un-set bird bgp neighbors rather than override for each host [puppet] - 10https://gerrit.wikimedia.org/r/1170543
[12:56:22] <logmsgbot>	 !log jelto@deploy1003 helmfile [codfw] DONE helmfile.d/services/miscweb: apply
[12:56:52] <logmsgbot>	 !log jelto@deploy1003 helmfile [eqiad] START helmfile.d/services/miscweb: apply
[12:57:05] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops: Degraded RAID on backup1007 - https://phabricator.wikimedia.org/T399847#11016776 (10jcrespo) I'm afraid the new disk has not been detected: {F65180194} (it is not out of order, either)  We are still running in degraded mode (with 1 less disk).
[12:57:12] <logmsgbot>	 !log jelto@deploy1003 helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
[12:57:38] <wikibugs>	 (03PS1) 10C. Scott Ananian: Enable the "Report Visual Bug" feature of Extension:ParserMigration [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1170549 (https://phabricator.wikimedia.org/T365371)
[12:57:57] <wikibugs>	 (03PS1) 10Elukey: pyrra: simplify multi-dc handling for istio SLOs [puppet] - 10https://gerrit.wikimedia.org/r/1170550 (https://phabricator.wikimedia.org/T398534)
[12:58:17] <wikibugs>	 (03CR) 10Cathal Mooney: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1170543 (owner: 10Cathal Mooney)
[12:58:17] <logmsgbot>	 !log jelto@deploy1003 helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply
[12:58:26] <wikibugs>	 (03CR) 10CI reject: [V:04-1] Enable the "Report Visual Bug" feature of Extension:ParserMigration [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1170549 (https://phabricator.wikimedia.org/T365371) (owner: 10C. Scott Ananian)
[12:58:58] <logmsgbot>	 !log jelto@deploy1003 helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply
[12:59:03] <wikibugs>	 (03CR) 10Elukey: [V:03+1] "PCC SUCCESS (CORE_DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/6306/co" [puppet] - 10https://gerrit.wikimedia.org/r/1170550 (https://phabricator.wikimedia.org/T398534) (owner: 10Elukey)
[12:59:06] <logmsgbot>	 !log jelto@deploy1003 helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply
[12:59:26] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db1198 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P79388 and previous config saved to /var/cache/conftool/dbconfig/20250718-125925-root.json
[12:59:40] <logmsgbot>	 !log jelto@deploy1003 helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply
[13:00:43] <wikibugs>	 (03CR) 10Vgutierrez: [C:03+1] pyrra: simplify multi-dc handling for istio SLOs [puppet] - 10https://gerrit.wikimedia.org/r/1170550 (https://phabricator.wikimedia.org/T398534) (owner: 10Elukey)
[13:02:11] <wikibugs>	 (03CR) 10Ayounsi: "`# pylint disable=too-many-branches` :) If volans is ok of course. Otherwise we would need to refactor and split some processing in their " [cookbooks] - 10https://gerrit.wikimedia.org/r/1170530 (https://phabricator.wikimedia.org/T398412) (owner: 10Cathal Mooney)
[13:02:16] <logmsgbot>	 !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
[13:02:18] <wikibugs>	 (03CR) 10Elukey: [V:03+1 C:03+2] pyrra: simplify multi-dc handling for istio SLOs [puppet] - 10https://gerrit.wikimedia.org/r/1170550 (https://phabricator.wikimedia.org/T398534) (owner: 10Elukey)
[13:02:30] <logmsgbot>	 !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
[13:04:10] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1259', diff saved to https://phabricator.wikimedia.org/P79389 and previous config saved to /var/cache/conftool/dbconfig/20250718-130410-marostegui.json
[13:04:18] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops: Degraded RAID on backup1007 - https://phabricator.wikimedia.org/T399847#11016820 (10jcrespo)
[13:04:22] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops: Degraded RAID on backup1007 - https://phabricator.wikimedia.org/T399948#11016822 (10jcrespo) →14Duplicate dup:03T399847
[13:05:07] <logmsgbot>	 !log gmodena@deploy1003 helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
[13:05:21] <logmsgbot>	 !log gmodena@deploy1003 helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
[13:05:56] <logmsgbot>	 !log fceratto@deploy1003 helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
[13:07:08] <logmsgbot>	 !log fceratto@deploy1003 helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
[13:07:50] <wikibugs>	 (03PS1) 10Marostegui: db1212: Migrate to MariaDB 10.11 [puppet] - 10https://gerrit.wikimedia.org/r/1170551 (https://phabricator.wikimedia.org/T399548)
[13:08:22] <wikibugs>	 (03CR) 10Ssingh: [C:03+1] acme_chief: Delete empty directories after pruning expired certs [puppet] - 10https://gerrit.wikimedia.org/r/1170497 (https://phabricator.wikimedia.org/T399419) (owner: 10Vgutierrez)
[13:09:40] <logmsgbot>	 !log fceratto@deploy1003 helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
[13:09:56] <logmsgbot>	 !log fceratto@deploy1003 helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
[13:10:10] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es1048 (re)pooling @ 65%: Repooling', diff saved to https://phabricator.wikimedia.org/P79390 and previous config saved to /var/cache/conftool/dbconfig/20250718-131009-root.json
[13:11:05] <wikibugs>	 06SRE, 10SRE-Access-Requests: Requesting access to analytics-privatedata-users group (LDAP and kerberos), for aprum - https://phabricator.wikimedia.org/T398650#11016853 (10ssingh) @aranyap: It seems like the group membership has been updated. Can you please try again? Thanks!
[13:12:19] <logmsgbot>	 !log fceratto@deploy1003 helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
[13:12:21] <logmsgbot>	 !log gmodena@deploy1003 helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
[13:12:26] <logmsgbot>	 !log gmodena@deploy1003 helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
[13:12:30] <logmsgbot>	 !log fceratto@deploy1003 helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
[13:14:13] <logmsgbot>	 !log fceratto@deploy1003 helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
[13:14:32] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db1198 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P79391 and previous config saved to /var/cache/conftool/dbconfig/20250718-131431-root.json
[13:15:03] <wikibugs>	 (03CR) 10Marostegui: [C:03+2] db1212: Migrate to MariaDB 10.11 [puppet] - 10https://gerrit.wikimedia.org/r/1170551 (https://phabricator.wikimedia.org/T399548) (owner: 10Marostegui)
[13:15:06] <logmsgbot>	 !log marostegui@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on 10 hosts with reason: Maintenance
[13:15:25] <logmsgbot>	 !log fceratto@deploy1003 helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
[13:15:51] <logmsgbot>	 !log marostegui@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1212.eqiad.wmnet with reason: Maintenance
[13:15:55] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depool db1212 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P79392 and previous config saved to /var/cache/conftool/dbconfig/20250718-131554-marostegui.json
[13:16:34] <logmsgbot>	 !log fceratto@deploy1003 helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
[13:17:02] <logmsgbot>	 !log fceratto@deploy1003 helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
[13:17:56] <logmsgbot>	 !log fceratto@deploy1003 helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
[13:18:10] <jinxer-wm>	 FIRING: BFDdown: BFD session down between cr1-codfw and fe80::5e5e:abff:fe3d:8198 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr1-codfw:9804 - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[13:19:18] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1259', diff saved to https://phabricator.wikimedia.org/P79393 and previous config saved to /var/cache/conftool/dbconfig/20250718-131917-marostegui.json
[13:21:32] <wikibugs>	 10SRE-Access-Requests, 06Gerrit-Privilege-Requests, 10LDAP-Access-Requests: Offboard Noarave from WMF systems - https://phabricator.wikimedia.org/T399953 (10karapayneWMDE) 03NEW
[13:22:51] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops: Degraded RAID on backup1007 - https://phabricator.wikimedia.org/T399847#11016884 (10jcrespo) I will put the server back into service so the service is not down during the weekend and figure out a way to resolve this next week.
[13:23:10] <jinxer-wm>	 RESOLVED: BFDdown: BFD session down between cr1-codfw and fe80::5e5e:abff:fe3d:8198 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr1-codfw:9804 - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[13:23:45] <logmsgbot>	 !log jynus@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on backup1007.eqiad.wmnet with reason: failed disk
[13:23:56] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops: Degraded RAID on backup1007 - https://phabricator.wikimedia.org/T399847#11016885 (10ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=89313787-9150-425e-afd0-6f2bea491334) set by jynus@cumin1003 for 3 days, 0:00:00 on 1 host(s) and their services with reason: fail...
[13:24:16] <wikibugs>	 (03CR) 10Btullis: [C:03+2] Add the wikitech dump script [dumps] - 10https://gerrit.wikimedia.org/r/1170510 (https://phabricator.wikimedia.org/T398968) (owner: 10Btullis)
[13:24:40] <wikibugs>	 (03PS1) 10Gerrit maintenance bot: mariadb: Promote db1189 to s3 master [puppet] - 10https://gerrit.wikimedia.org/r/1170554 (https://phabricator.wikimedia.org/T399954)
[13:24:45] <wikibugs>	 (03PS1) 10Gerrit maintenance bot: wmnet: Update s3-master alias [dns] - 10https://gerrit.wikimedia.org/r/1170555 (https://phabricator.wikimedia.org/T399954)
[13:25:16] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es1048 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P79394 and previous config saved to /var/cache/conftool/dbconfig/20250718-132515-root.json
[13:26:39] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db1212 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P79395 and previous config saved to /var/cache/conftool/dbconfig/20250718-132638-root.json
[13:27:46] <wikibugs>	 (03CR) 10Ayounsi: "recheck" [software/homer] - 10https://gerrit.wikimedia.org/r/1166373 (owner: 10Cathal Mooney)
[13:28:52] <wikibugs>	 (03CR) 10DDesouza: [C:03+1] miscweb: update miscweb images to new version [deployment-charts] - 10https://gerrit.wikimedia.org/r/1170544 (https://phabricator.wikimedia.org/T398303) (owner: 10Jelto)
[13:29:36] <wikibugs>	 (03CR) 10Cathal Mooney: [C:03+2] BGP Policy: Set local-pref to zero on receipt of gshut community [homer/public] - 10https://gerrit.wikimedia.org/r/1170509 (https://phabricator.wikimedia.org/T399931) (owner: 10Cathal Mooney)
[13:30:09] <wikibugs>	 (03Merged) 10jenkins-bot: BGP Policy: Set local-pref to zero on receipt of gshut community [homer/public] - 10https://gerrit.wikimedia.org/r/1170509 (https://phabricator.wikimedia.org/T399931) (owner: 10Cathal Mooney)
[13:30:43] <logmsgbot>	 !log elukey@cumin1003 START - Cookbook sre.hosts.reimage for host ml-serve1012.eqiad.wmnet with OS bookworm
[13:34:25] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1259 (T399249)', diff saved to https://phabricator.wikimedia.org/P79396 and previous config saved to /var/cache/conftool/dbconfig/20250718-133424-marostegui.json
[13:34:31] <stashbot>	 T399249: Add cl_timestamp_id index to categorylinks table - https://phabricator.wikimedia.org/T399249
[13:35:26] <logmsgbot>	 !log marostegui@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2149.codfw.wmnet with reason: Maintenance
[13:35:36] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depooling db2149 (T399249)', diff saved to https://phabricator.wikimedia.org/P79397 and previous config saved to /var/cache/conftool/dbconfig/20250718-133533-marostegui.json
[13:35:40] <jinxer-wm>	 FIRING: [4x] BFDdown: BFD session down between cr1-codfw and 208.80.153.220 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status  - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[13:37:01] <logmsgbot>	 !log elukey@cumin1003 END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ml-serve1012.eqiad.wmnet with OS bookworm
[13:37:39] <wikibugs>	 (03PS1) 10Marostegui: db2242: Fix section [puppet] - 10https://gerrit.wikimedia.org/r/1170558
[13:38:10] <wikibugs>	 (03CR) 10CI reject: [V:04-1] Capirca: handle script having no 'status' attribute gracefully [software/homer] - 10https://gerrit.wikimedia.org/r/1166373 (owner: 10Cathal Mooney)
[13:38:12] <logmsgbot>	 !log marostegui@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2242.codfw.wmnet with reason: Maintenance
[13:38:15] <jinxer-wm>	 RESOLVED: WidespreadPuppetFailure: Puppet has failed in eqiad - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet?orgId=1&viewPanel=6 - https://alerts.wikimedia.org/?q=alertname%3DWidespreadPuppetFailure
[13:39:12] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.mysql.upgrade for db2242.codfw.wmnet
[13:39:21] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.mysql.depool db2242 - Upgrading db2242.codfw.wmnet
[13:39:36] <wikibugs>	 (03CR) 10Marostegui: [C:03+2] db2242: Fix section [puppet] - 10https://gerrit.wikimedia.org/r/1170558 (owner: 10Marostegui)
[13:39:50] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db2242 - Upgrading db2242.codfw.wmnet
[13:40:07] <wikibugs>	 (03CR) 10Marostegui: [C:03+2] "It was all good in zarcillo" [puppet] - 10https://gerrit.wikimedia.org/r/1170558 (owner: 10Marostegui)
[13:40:19] <wikibugs>	 (03CR) 10Vgutierrez: [C:03+2] acme_chief: Delete empty directories after pruning expired certs [puppet] - 10https://gerrit.wikimedia.org/r/1170497 (https://phabricator.wikimedia.org/T399419) (owner: 10Vgutierrez)
[13:40:22] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es1048 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P79399 and previous config saved to /var/cache/conftool/dbconfig/20250718-134021-root.json
[13:40:40] <jinxer-wm>	 RESOLVED: [3x] BFDdown: BFD session down between cr1-codfw and 208.80.153.220 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status  - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[13:41:45] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db1212 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P79400 and previous config saved to /var/cache/conftool/dbconfig/20250718-134144-root.json
[13:42:16] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops, 06Machine-Learning-Team: Q4:rack/setup/install ml-serve101[2345] - https://phabricator.wikimedia.org/T393948#11016986 (10ayounsi) From the network side it does indeed try to fetch the URL through TFTP... ` install1004:~$ sudo tcpdump host    10.64.159.5 tcpdump: verbose output...
[13:45:23] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.mysql.pool db2242 gradually with 4 steps - Upgrade of db2242.codfw.wmnet completed
[13:45:40] <jinxer-wm>	 FIRING: [4x] BFDdown: BFD session down between cr1-codfw and 208.80.153.220 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status  - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[13:49:43] <wikibugs>	 (03PS1) 10Elukey: pyrra: fix Istio latency metric config with latency_target_requests_regex [puppet] - 10https://gerrit.wikimedia.org/r/1170564 (https://phabricator.wikimedia.org/T390706)
[13:50:40] <jinxer-wm>	 RESOLVED: [3x] BFDdown: BFD session down between cr1-codfw and fe80::5e5e:abff:fe3d:8198 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status  - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[13:50:59] <wikibugs>	 (03CR) 10Elukey: [V:03+1] "PCC SUCCESS (CORE_DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/6315/co" [puppet] - 10https://gerrit.wikimedia.org/r/1170564 (https://phabricator.wikimedia.org/T390706) (owner: 10Elukey)
[13:54:48] <wikibugs>	 06SRE, 10SRE-Access-Requests: Requesting access to analytics-privatedata-users for resquito - https://phabricator.wikimedia.org/T399899#11017038 (10ssingh) Hi @REsquito-WMF: I am trying to understand if analytics-privatedata-users is really required for this. Can you clarify the reason for your access a bit mo...
[13:55:26] <wikibugs>	 (03CR) 10Vgutierrez: [C:03+1] pyrra: fix Istio latency metric config with latency_target_requests_regex [puppet] - 10https://gerrit.wikimedia.org/r/1170564 (https://phabricator.wikimedia.org/T390706) (owner: 10Elukey)
[13:55:33] <wikibugs>	 (03CR) 10Elukey: [V:03+1 C:03+2] pyrra: fix Istio latency metric config with latency_target_requests_regex [puppet] - 10https://gerrit.wikimedia.org/r/1170564 (https://phabricator.wikimedia.org/T390706) (owner: 10Elukey)
[13:56:50] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db1212 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P79402 and previous config saved to /var/cache/conftool/dbconfig/20250718-135650-root.json
[14:02:29] <Dreamy_Jazz>	 !log Running `foreachwiki AbuseFilter:PopulateAbuseFilterLogIPHex.php --batch-size 1000 --sleep 1` for T397842
[14:02:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:02:33] <stashbot>	 T397842: Populate afl_ip_hex for pre-existing abuse_filter_log rows - https://phabricator.wikimedia.org/T397842
[14:02:58] <wikibugs>	 (03PS9) 10Daimona Eaytoy: Move special wikis outside of the 'wikipedia' group [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1167880 (https://phabricator.wikimedia.org/T183549)
[14:04:48] <wikibugs>	 06SRE, 10SRE-Access-Requests: Requesting access to analytics-privatedata-users for resquito - https://phabricator.wikimedia.org/T399899#11017063 (10ssingh)
[14:05:20] <Dreamy_Jazz>	 !log Stopped the previous command
[14:05:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:05:33] <Dreamy_Jazz>	 !log Running `foreachwiki AbuseFilter:PopulateAbuseFilterLogIPHex.php` for T397842
[14:05:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:05:45] <jinxer-wm>	 FIRING: WidespreadPuppetFailure: Puppet has failed in eqiad - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet?orgId=1&viewPanel=6 - https://alerts.wikimedia.org/?q=alertname%3DWidespreadPuppetFailure
[14:05:57] <logmsgbot>	 !log fceratto@deploy1003 helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
[14:06:36] <wikibugs>	 06SRE, 10SRE-Access-Requests: Requesting access to analytics-privatedata-users for resquito - https://phabricator.wikimedia.org/T399899#11017065 (10REsquito-WMF) HI  I will need access to data lake, hive, and others.  Also Adam Baso just mentioned to me that I am missing wmf group.
[14:06:59] <wikibugs>	 (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, July 21 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1167900 (https://phabricator.wikimedia.org/T183549) (owner: 10Jforrester)
[14:07:09] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops, 10decommission-hardware: decommission sretest2007/sretest2008 - https://phabricator.wikimedia.org/T399447#11017066 (10Jhancock.wm) 05Open→03Resolved a:03Jhancock.wm
[14:07:20] <wikibugs>	 (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, July 21 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1167880 (https://phabricator.wikimedia.org/T183549) (owner: 10Daimona Eaytoy)
[14:07:30] <wikibugs>	 (03PS5) 10Daimona Eaytoy: Add a test to verify that "normal" DBLists contain only SUL wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1167890 (https://phabricator.wikimedia.org/T183549)
[14:08:21] <wikibugs>	 (03CR) 10Jforrester: "Thanks for deploying, I got busy yesterday and ran out of time!" [extensions/FlaggedRevs] (wmf/1.45.0-wmf.10) - 10https://gerrit.wikimedia.org/r/1170318 (https://phabricator.wikimedia.org/T399641) (owner: 10Jforrester)
[14:11:29] <wikibugs>	 (03CR) 10Brouberol: [V:03+1 C:03+2] deployment_server: group chown airflow-wmde kubeconfigs to airflow-deployers [puppet] - 10https://gerrit.wikimedia.org/r/1170508 (https://phabricator.wikimedia.org/T399066) (owner: 10Brouberol)
[14:11:49] <wikibugs>	 (03CR) 10Jforrester: [C:03+1] Move special wikis outside of the 'wikipedia' group [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1167880 (https://phabricator.wikimedia.org/T183549) (owner: 10Daimona Eaytoy)
[14:11:49] <wikibugs>	 (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, July 21 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1167890 (https://phabricator.wikimedia.org/T183549) (owner: 10Daimona Eaytoy)
[14:11:56] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db1212 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P79404 and previous config saved to /var/cache/conftool/dbconfig/20250718-141156-root.json
[14:12:52] <wikibugs>	 06SRE, 10SRE-Access-Requests: Requesting access to analytics-privatedata-users for resquito - https://phabricator.wikimedia.org/T399899#11017086 (10ssingh) >>! In T399899#11017065, @REsquito-WMF wrote: > HI >  > I will need access to data lake, hive, and others. >  > Also Adam Baso just mentioned to me that I m...
[14:12:55] <wikibugs>	 (03PS2) 10Daimona Eaytoy: Clean up some settings for special wikis no longer in wikipedia group [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1168169 (https://phabricator.wikimedia.org/T183549)
[14:13:19] <wikibugs>	 (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, July 21 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1168169 (https://phabricator.wikimedia.org/T183549) (owner: 10Daimona Eaytoy)
[14:13:21] <wikibugs>	 (03PS1) 10Jforrester: Clean up wmgWikibaseSiteGroup list, alpha-sort and de-dupe [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1170565
[14:13:25] <jinxer-wm>	 FIRING: SystemdUnitFailed: check_netbox_uncommitted_dns_changes.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[14:13:58] <wikibugs>	 (03PS10) 10Elukey: WIP: sre.hosts.provision: add custom settings for Supermicro [cookbooks] - 10https://gerrit.wikimedia.org/r/1170085 (https://phabricator.wikimedia.org/T394357)
[14:13:58] <wikibugs>	 (03PS5) 10Elukey: DNM - test for ML hosts [cookbooks] - 10https://gerrit.wikimedia.org/r/1170463
[14:14:23] <wikibugs>	 (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, July 21 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1167941 (owner: 10Daimona Eaytoy)
[14:14:42] <logmsgbot>	 !log elukey@cumin1003 START - Cookbook sre.hosts.provision for host ml-serve1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
[14:14:50] <logmsgbot>	 !log elukey@cumin1003 END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ml-serve1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
[14:15:10] <jinxer-wm>	 FIRING: BFDdown: BFD session down between cr1-codfw and fe80::5e5e:abff:fe3d:8198 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr1-codfw:9804 - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[14:18:25] <jinxer-wm>	 RESOLVED: SystemdUnitFailed: check_netbox_uncommitted_dns_changes.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[14:19:04] <wikibugs>	 (03PS11) 10Elukey: WIP: sre.hosts.provision: add custom settings for Supermicro [cookbooks] - 10https://gerrit.wikimedia.org/r/1170085 (https://phabricator.wikimedia.org/T394357)
[14:19:04] <wikibugs>	 (03PS6) 10Elukey: DNM - test for ML hosts [cookbooks] - 10https://gerrit.wikimedia.org/r/1170463
[14:20:09] <logmsgbot>	 !log elukey@cumin1003 START - Cookbook sre.hosts.provision for host ml-serve1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
[14:20:10] <jinxer-wm>	 FIRING: [2x] BFDdown: BFD session down between cr1-codfw and fe80::5e5e:abff:fe3d:8198 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status  - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[14:21:03] <logmsgbot>	 !log fceratto@deploy1003 helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
[14:21:11] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops, 06Machine-Learning-Team: Q4:rack/setup/install ml-serve101[2345] - https://phabricator.wikimedia.org/T393948#11017119 (10elukey) Due to a bug in my provisioning-changes I was missing these:  ` BIOS: IPv4HTTPSupport is set to Disabled, while we want Enabled BIOS: IPv4PXESupport...
[14:22:34] <wikibugs>	 (03PS1) 10Fabfur: haproxy: this commit deliberately contains a syntax error in haproxy [puppet] - 10https://gerrit.wikimedia.org/r/1170567
[14:23:02] <wikibugs>	 (03PS2) 10Fabfur: haproxy: this commit deliberately contains a syntax error in haproxy [puppet] - 10https://gerrit.wikimedia.org/r/1170567
[14:25:10] <jinxer-wm>	 RESOLVED: [2x] BFDdown: BFD session down between cr1-codfw and fe80::5e5e:abff:fe3d:8198 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status  - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[14:25:27] <logmsgbot>	 !log elukey@cumin1003 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ml-serve1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
[14:25:47] <wikibugs>	 (03CR) 10CI reject: [V:04-1] DNM - test for ML hosts [cookbooks] - 10https://gerrit.wikimedia.org/r/1170463 (owner: 10Elukey)
[14:25:48] <wikibugs>	 (03CR) 10CI reject: [V:04-1] WIP: sre.hosts.provision: add custom settings for Supermicro [cookbooks] - 10https://gerrit.wikimedia.org/r/1170085 (https://phabricator.wikimedia.org/T394357) (owner: 10Elukey)
[14:30:40] <jinxer-wm>	 FIRING: [3x] BFDdown: BFD session down between cr1-codfw and fe80::5e5e:abff:fe3d:8198 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status  - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[14:30:50] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2242 gradually with 4 steps - Upgrade of db2242.codfw.wmnet completed
[14:30:50] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db2242.codfw.wmnet
[14:33:32] <wikibugs>	 (03PS1) 10Scott French: httpd: clean up transitional -bookworm track [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/1170405 (https://phabricator.wikimedia.org/T378128)
[14:33:44] <wikibugs>	 (03PS1) 10Ayounsi: WIP: Bird: VM side - add support for Routed Ganeti [puppet] - 10https://gerrit.wikimedia.org/r/1170570 (https://phabricator.wikimedia.org/T362392)
[14:35:13] <wikibugs>	 (03CR) 10Scott French: [V:03+2] "No longer processed by docker-pkg." [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/1170405 (https://phabricator.wikimedia.org/T378128) (owner: 10Scott French)
[14:35:40] <jinxer-wm>	 RESOLVED: [3x] BFDdown: BFD session down between cr1-codfw and fe80::5e5e:abff:fe3d:8198 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status  - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[14:38:30] <wikibugs>	 (03PS2) 10Ayounsi: WIP: Bird: VM side - add support for Routed Ganeti [puppet] - 10https://gerrit.wikimedia.org/r/1170570 (https://phabricator.wikimedia.org/T362392)
[14:39:04] <jinxer-wm>	 FIRING: PuppetConstantChange: Puppet performing a change on every puppet run on wdqs1022:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange
[14:39:24] <wikibugs>	 (03CR) 10Ayounsi: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1170570 (https://phabricator.wikimedia.org/T362392) (owner: 10Ayounsi)
[14:43:23] <wikibugs>	 06SRE, 10SRE-Access-Requests: Requesting access to analytics-privatedata-users for resquito - https://phabricator.wikimedia.org/T399899#11017204 (10dr0ptp4kt) Thanks @ssingh - I'm wondering, should we create a subheading between https://wikitech.wikimedia.org/wiki/SRE/Production_access#Generating_your_SSH_key...
[14:44:24] <wikibugs>	 (03CR) 10Effie Mouzeli: [C:03+1] httpd: clean up transitional -bookworm track [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/1170405 (https://phabricator.wikimedia.org/T378128) (owner: 10Scott French)
[14:44:52] <wikibugs>	 06SRE, 10SRE-Access-Requests: Requesting access to analytics-privatedata-users group (LDAP and kerberos), for aprum - https://phabricator.wikimedia.org/T398650#11017207 (10dr0ptp4kt) Just for visibility here as these are around the same time: thread over at T399899#11017204 that's related, heads up.
[14:47:14] <wikibugs>	 06SRE, 10SRE-Access-Requests, 06Gerrit-Privilege-Requests, 10LDAP-Access-Requests: Offboard Noarave from WMF systems - https://phabricator.wikimedia.org/T399953#11017210 (10ssingh) a:03joanna_borun
[14:51:39] <wikibugs>	 06SRE-OnFire, 10Cloud-VPS, 10cloud-services-team (FY2025/26-Q1), 10Sustainability (Incident Followup): Cloud Ceph misbehaving on Debian Bookworm - https://phabricator.wikimedia.org/T399858#11017233 (10fnegri) a:05fnegri→03Andrew The growth rate is slowing, but it's not flatlining as I hoped... So the s...
[14:53:59] <wikibugs>	 (03PS1) 10Ssingh: admin: add resquito to analytics-privatedata-users [puppet] - 10https://gerrit.wikimedia.org/r/1170571 (https://phabricator.wikimedia.org/T399899)
[14:54:12] <wikibugs>	 (03PS2) 10Ssingh: admin: add resquito to analytics-privatedata-users [puppet] - 10https://gerrit.wikimedia.org/r/1170571 (https://phabricator.wikimedia.org/T399899)
[14:54:13] <jinxer-wm>	 FIRING: CertAlmostExpired: Certificate for service data-gateway-staging:30443 is about to expire - https://wikitech.wikimedia.org/wiki/TLS/Runbook#data-gateway-staging:30443 - TODO - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired
[14:54:56] <wikibugs>	 (03CR) 10CI reject: [V:04-1] admin: add resquito to analytics-privatedata-users [puppet] - 10https://gerrit.wikimedia.org/r/1170571 (https://phabricator.wikimedia.org/T399899) (owner: 10Ssingh)
[14:55:14] <wikibugs>	 (03PS3) 10Ssingh: admin: add resquito to analytics-privatedata-users [puppet] - 10https://gerrit.wikimedia.org/r/1170571 (https://phabricator.wikimedia.org/T399899)
[15:00:08] <wikibugs>	 06SRE, 10SRE-Access-Requests, 13Patch-For-Review: Requesting access to analytics-privatedata-users for resquito - https://phabricator.wikimedia.org/T399899#11017246 (10dr0ptp4kt) >>! In T399899#11017038, @ssingh wrote: > Hi @REsquito-WMF: I am trying to understand if analytics-privatedata-users is really req...
[15:00:45] <jinxer-wm>	 RESOLVED: WidespreadPuppetFailure: Puppet has failed in eqiad - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet?orgId=1&viewPanel=6 - https://alerts.wikimedia.org/?q=alertname%3DWidespreadPuppetFailure
[15:06:42] <jinxer-wm>	 FIRING: JobUnavailable: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[15:09:01] <wikibugs>	 (03PS1) 10Fabfur: haproxy: script to perform configuration validation [puppet] - 10https://gerrit.wikimedia.org/r/1170572 (https://phabricator.wikimedia.org/T399941)
[15:09:25] <jinxer-wm>	 FIRING: PuppetCertificateAboutToExpire: Puppet CA certificate thanos-query.discovery.wmnet is about to expire - https://wikitech.wikimedia.org/wiki/Puppet#Renew_agent_certificate - TODO - https://alerts.wikimedia.org/?q=alertname%3DPuppetCertificateAboutToExpire
[15:09:26] <wikibugs>	 06SRE, 10SRE-Access-Requests, 13Patch-For-Review: Requesting access to analytics-privatedata-users for resquito - https://phabricator.wikimedia.org/T399899#11017262 (10ssingh) Hi @dr0ptp4kt:  >>! In T399899#11017204, @dr0ptp4kt wrote: > Thanks @ssingh - I'm wondering, should we create a subheading between ht...
[15:09:29] <wikibugs>	 (03CR) 10CI reject: [V:04-1] haproxy: script to perform configuration validation [puppet] - 10https://gerrit.wikimedia.org/r/1170572 (https://phabricator.wikimedia.org/T399941) (owner: 10Fabfur)
[15:14:09] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops, 06Machine-Learning-Team: Q4:rack/setup/install ml-serve101[2345] - https://phabricator.wikimedia.org/T393948#11017270 (10elukey) @Jclark-ctr for some reason ml-serve1012 seems stuck, I am not able to powercycle it from the mgmt console. Would you mind to hard reset it when you...
[15:15:39] <jinxer-wm>	 FIRING: [2x] TransitBGPDown: Transit BGP session down between cr1-magru and Ufinet (187.108.235.25) - https://wikitech.wikimedia.org/wiki/Network_monitoring#BGP_status  - https://alerts.wikimedia.org/?q=alertname%3DTransitBGPDown
[15:17:38] <wikibugs>	 06SRE, 10SRE-Access-Requests, 13Patch-For-Review: Requesting access to analytics-privatedata-users for resquito - https://phabricator.wikimedia.org/T399899#11017283 (10dr0ptp4kt) Thanks @ssingh ! I think it's probably just a matter of updating the pages. I've had my access for a good while now, and I bet the...
[15:18:44] <wikibugs>	 06SRE, 10SRE-Access-Requests, 13Patch-For-Review: Requesting access to analytics-privatedata-users for resquito - https://phabricator.wikimedia.org/T399899#11017284 (10ssingh) >>! In T399899#11017246, @dr0ptp4kt wrote: >>>! In T399899#11017038, @ssingh wrote: >> Hi @REsquito-WMF: I am trying to understand if...
[15:20:53] <wikibugs>	 (03PS2) 10Fabfur: haproxy: script to perform configuration validation [puppet] - 10https://gerrit.wikimedia.org/r/1170572 (https://phabricator.wikimedia.org/T399941)
[15:21:39] <wikibugs>	 10ops-eqiad, 06Data-Platform-SRE, 06DC-Ops: Q1:rack/setup/install an-worker12[09-32].eqiad.wmnet - https://phabricator.wikimedia.org/T399964 (10RobH) 03NEW
[15:21:42] <jinxer-wm>	 RESOLVED: JobUnavailable: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[15:21:58] <wikibugs>	 (03CR) 10Fabfur: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1170572 (https://phabricator.wikimedia.org/T399941) (owner: 10Fabfur)
[15:22:46] <wikibugs>	 10ops-eqiad, 06Data-Platform-SRE, 06DC-Ops: Q1:rack/setup/install an-worker12[09-32].eqiad.wmnet - https://phabricator.wikimedia.org/T399964#11017307 (10RobH)
[15:23:26] <wikibugs>	 10ops-eqiad, 06Data-Platform-SRE, 06DC-Ops: Q1:rack/setup/install an-worker12[09-32].eqiad.wmnet - https://phabricator.wikimedia.org/T399964#11017308 (10RobH) a:03BTullis Please update the site.pp file with the insetup role for your team (detailed on https://wikitech.wikimedia.org/wiki/SRE/Dc-operations) a...
[15:24:09] <wikibugs>	 (03PS1) 10Federico Ceratto: zarcillo: allow egress to gerrit [deployment-charts] - 10https://gerrit.wikimedia.org/r/1170574 (https://phabricator.wikimedia.org/T389663)
[15:24:10] <wikibugs>	 (03CR) 10Federico Ceratto: "A small addition in egress regarding gerrit" [deployment-charts] - 10https://gerrit.wikimedia.org/r/1170574 (https://phabricator.wikimedia.org/T389663) (owner: 10Federico Ceratto)
[15:25:16] <logmsgbot>	 !log hashar@deploy1003 Started deploy [integration/docroot@6384514]: build: Updating mediawiki/mediawiki-phan-config to 0.16.0
[15:25:29] <logmsgbot>	 !log hashar@deploy1003 Finished deploy [integration/docroot@6384514]: build: Updating mediawiki/mediawiki-phan-config to 0.16.0 (duration: 00m 12s)
[15:28:01] <wikibugs>	 (03CR) 10BCornwall: [C:03+1] wmnet: Update s3-master alias [dns] - 10https://gerrit.wikimedia.org/r/1170555 (https://phabricator.wikimedia.org/T399954) (owner: 10Gerrit maintenance bot)
[15:29:21] <wikibugs>	 (03CR) 10Fabfur: "if anyone would like to try the commands I used to test it were:" [puppet] - 10https://gerrit.wikimedia.org/r/1170572 (https://phabricator.wikimedia.org/T399941) (owner: 10Fabfur)
[15:31:05] <wikibugs>	 (03CR) 10Fabfur: [C:04-2] haproxy: this commit deliberately contains a syntax error in haproxy [puppet] - 10https://gerrit.wikimedia.org/r/1170567 (owner: 10Fabfur)
[15:32:30] <wikibugs>	 (03CR) 10BCornwall: [C:03+1] "verified the uid and key are correct" [puppet] - 10https://gerrit.wikimedia.org/r/1170571 (https://phabricator.wikimedia.org/T399899) (owner: 10Ssingh)
[15:35:39] <jinxer-wm>	 RESOLVED: [2x] TransitBGPDown: Transit BGP session down between cr1-magru and Ufinet (187.108.235.25) - https://wikitech.wikimedia.org/wiki/Network_monitoring#BGP_status  - https://alerts.wikimedia.org/?q=alertname%3DTransitBGPDown
[15:36:10] <jinxer-wm>	 FIRING: BFDdown: BFD session down between cr1-eqiad and fe80::6687:88ff:fef2:6d48 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr1-eqiad:9804 - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[15:36:32] <wikibugs>	 SRE, SRE-Access-Requests, Patch-For-Review: Requesting access to analytics-privatedata-users for resquito - https://phabricator.wikimedia.org/T399899#11017327 (ssingh)
[15:39:06] <wikibugs>	 (PS3) Fabfur: haproxy: script to perform configuration validation [puppet] - https://gerrit.wikimedia.org/r/1170572 (https://phabricator.wikimedia.org/T399941)
[15:39:32] <wikibugs>	 SRE, SRE-Access-Requests: Requesting access to analytics-privatedata-users group (LDAP and kerberos), for aprum - https://phabricator.wikimedia.org/T398650#11017329 (ssingh) @aranyap: Also please note that it seems like you are using the same key for WMCS and production:  ` aranyap uses the same SSH key(...
[15:39:37] <wikibugs>	 (CR) Ssingh: [C:+2] admin: add resquito to analytics-privatedata-users [puppet] - https://gerrit.wikimedia.org/r/1170571 (https://phabricator.wikimedia.org/T399899) (owner: Ssingh)
[15:41:10] <jinxer-wm>	 RESOLVED: BFDdown: BFD session down between cr1-eqiad and fe80::6687:88ff:fef2:6d48 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr1-eqiad:9804 - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[15:41:29] <wikibugs>	 SRE, SRE-Access-Requests, Patch-For-Review: Requesting access to analytics-privatedata-users for resquito - https://phabricator.wikimedia.org/T399899#11017333 (ssingh) @REsquito-WMF: Your access request has been merged. Please allow ~30 minutes for it to roll out. I have also added you to the `wmf` n...
[15:42:42] <wikibugs>	 (CR) Vgutierrez: [C:-1] haproxy: script to perform configuration validation (4 comments) [puppet] - https://gerrit.wikimedia.org/r/1170572 (https://phabricator.wikimedia.org/T399941) (owner: Fabfur)
[15:48:45] <wikibugs>	 (PS14) Federico Ceratto: Add parsercache pooling/depooling cookbook [cookbooks] - https://gerrit.wikimedia.org/r/1165546 (https://phabricator.wikimedia.org/T388389)
[15:53:13] <wikibugs>	 (CR) BCornwall: [C:+2] ACMEChiefConfig: Automated MarkMonitor domain sync [puppet] - https://gerrit.wikimedia.org/r/1165187 (owner: Ncmonitor)
[15:55:10] <jinxer-wm>	 FIRING: BFDdown: BFD session down between cr1-eqiad and fe80::6687:88ff:fef2:6d48 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr1-eqiad:9804 - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[15:55:18] <wikibugs>	 (CR) CI reject: [V:-1] Add parsercache pooling/depooling cookbook [cookbooks] - https://gerrit.wikimedia.org/r/1165546 (https://phabricator.wikimedia.org/T388389) (owner: Federico Ceratto)
[15:55:43] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2149 (T399249)', diff saved to https://phabricator.wikimedia.org/P79407 and previous config saved to /var/cache/conftool/dbconfig/20250718-155542-marostegui.json
[15:55:47] <stashbot>	 T399249: Add cl_timestamp_id index to categorylinks table - https://phabricator.wikimedia.org/T399249
[16:00:10] <jinxer-wm>	 RESOLVED: BFDdown: BFD session down between cr1-eqiad and fe80::6687:88ff:fef2:6d48 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr1-eqiad:9804 - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[16:01:04] <wikibugs>	 (CR) Dzahn: [C:+2] zuul::main: install apparmor-utils, needed for docker [puppet] - https://gerrit.wikimedia.org/r/1170444 (https://phabricator.wikimedia.org/T395938) (owner: Dzahn)
[16:01:11] <wikibugs>	 (PS2) Dzahn: zuul::main: install apparmor-utils, needed for docker [puppet] - https://gerrit.wikimedia.org/r/1170444 (https://phabricator.wikimedia.org/T395938)
[16:07:14] <wikibugs>	 (CR) Dzahn: [C:+2] zuul::main: install apparmor-utils, needed for docker [puppet] - https://gerrit.wikimedia.org/r/1170444 (https://phabricator.wikimedia.org/T395938) (owner: Dzahn)
[16:09:58] <wikibugs>	 (PS4) Fabfur: haproxy: script to perform configuration validation [puppet] - https://gerrit.wikimedia.org/r/1170572 (https://phabricator.wikimedia.org/T399941)
[16:10:50] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P79408 and previous config saved to /var/cache/conftool/dbconfig/20250718-161050-marostegui.json
[16:10:55] <wikibugs>	 (CR) Fabfur: haproxy: script to perform configuration validation (4 comments) [puppet] - https://gerrit.wikimedia.org/r/1170572 (https://phabricator.wikimedia.org/T399941) (owner: Fabfur)
[16:16:58] <wikibugs>	 (CR) Scott French: [V:+2] "Thanks for the review!" [docker-images/production-images] - https://gerrit.wikimedia.org/r/1170405 (https://phabricator.wikimedia.org/T378128) (owner: Scott French)
[16:17:28] <wikibugs>	 (CR) Scott French: [V:+2 C:+2] httpd: clean up transitional -bookworm track [docker-images/production-images] - https://gerrit.wikimedia.org/r/1170405 (https://phabricator.wikimedia.org/T378128) (owner: Scott French)
[16:21:56] <wikibugs>	 (PS15) Federico Ceratto: Add parsercache pooling/depooling cookbook [cookbooks] - https://gerrit.wikimedia.org/r/1165546 (https://phabricator.wikimedia.org/T388389)
[16:25:58] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P79409 and previous config saved to /var/cache/conftool/dbconfig/20250718-162557-marostegui.json
[16:29:04] <wikibugs>	 (CR) CI reject: [V:-1] Add parsercache pooling/depooling cookbook [cookbooks] - https://gerrit.wikimedia.org/r/1165546 (https://phabricator.wikimedia.org/T388389) (owner: Federico Ceratto)
[16:33:10] <jinxer-wm>	 FIRING: BFDdown: BFD session down between cr1-eqiad and fe80::6687:88ff:fef2:6d48 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr1-eqiad:9804 - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[16:34:25] <jinxer-wm>	 FIRING: CoreRouterInterfaceDown: Core router interface down - cr2-esams:xe-0/1/2 (inter.link reserved port) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr2-esams:9804 - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown
[16:38:10] <jinxer-wm>	 FIRING: [2x] BFDdown: BFD session down between cr1-codfw and fe80::5e5e:abff:fe3d:8198 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status  - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[16:41:05] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2149 (T399249)', diff saved to https://phabricator.wikimedia.org/P79410 and previous config saved to /var/cache/conftool/dbconfig/20250718-164105-marostegui.json
[16:41:10] <stashbot>	 T399249: Add cl_timestamp_id index to categorylinks table - https://phabricator.wikimedia.org/T399249
[16:41:21] <logmsgbot>	 !log marostegui@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2156.codfw.wmnet with reason: Maintenance
[16:41:28] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depooling db2156 (T399249)', diff saved to https://phabricator.wikimedia.org/P79411 and previous config saved to /var/cache/conftool/dbconfig/20250718-164128-marostegui.json
[16:43:10] <jinxer-wm>	 RESOLVED: [2x] BFDdown: BFD session down between cr1-codfw and fe80::5e5e:abff:fe3d:8198 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status  - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[16:53:27] <wikibugs>	 (PS5) Fabfur: haproxy: script to perform configuration validation [puppet] - https://gerrit.wikimedia.org/r/1170572 (https://phabricator.wikimedia.org/T399941)
[16:54:40] <jinxer-wm>	 FIRING: [3x] BFDdown: BFD session down between cr1-codfw and fe80::5e5e:abff:fe3d:8198 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status  - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[16:55:09] <logmsgbot>	 !log fceratto@cumin1002 START - Cookbook sre.mysql.parsercache
[16:55:10] <logmsgbot>	 !log fceratto@cumin1002 END (PASS) - Cookbook sre.mysql.parsercache (exit_code=0)
[16:55:16] <logmsgbot>	 !log fceratto@cumin1002 START - Cookbook sre.mysql.parsercache
[16:55:16] <logmsgbot>	 !log fceratto@cumin1002 END (PASS) - Cookbook sre.mysql.parsercache (exit_code=0)
[16:55:45] <wikibugs>	 (CR) Fabfur: haproxy: script to perform configuration validation (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1170572 (https://phabricator.wikimedia.org/T399941) (owner: Fabfur)
[16:55:56] <logmsgbot>	 !log fceratto@cumin1002 START - Cookbook sre.mysql.parsercache
[16:55:57] <logmsgbot>	 !log fceratto@cumin1002 END (PASS) - Cookbook sre.mysql.parsercache (exit_code=0)
[16:56:50] <wikibugs>	 (PS1) PipelineBot: wikifeeds: pipeline bot promote [deployment-charts] - https://gerrit.wikimedia.org/r/1170584
[16:58:55] <jinxer-wm>	 RESOLVED: BFDdown: BFD session down between cr1-codfw and fe80::5e5e:abff:fe3d:8198 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr1-codfw:9804 - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[17:11:03] <wikibugs>	 ops-eqiad, SRE, DC-Ops, Machine-Learning-Team: Q4:rack/setup/install ml-serve101[2345] - https://phabricator.wikimedia.org/T393948#11017530 (Jclark-ctr) Power cycled ml-server1012
[17:30:10] <jinxer-wm>	 FIRING: BFDdown: BFD session down between cr1-eqiad and fe80::6687:88ff:fef2:6d48 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr1-eqiad:9804 - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[17:33:30] <wikibugs>	 SRE, Infrastructure-Foundations: Redundant bootloaders for software RAID - https://phabricator.wikimedia.org/T215183#11017567 (Eevans) >>! In T215183#11014363, @Eevans wrote: > Has there been any progress toward goal #2?  I didn't see where anything had been added to the mentioned runbook. >  > For conte...
[17:34:11] <wikibugs>	 (PS16) Federico Ceratto: Add parsercache pooling/depooling cookbook [cookbooks] - https://gerrit.wikimedia.org/r/1165546 (https://phabricator.wikimedia.org/T388389)
[17:35:10] <jinxer-wm>	 RESOLVED: BFDdown: BFD session down between cr1-eqiad and fe80::6687:88ff:fef2:6d48 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr1-eqiad:9804 - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[17:35:40] <wikibugs>	 (PS17) Federico Ceratto: Add parsercache pooling/depooling cookbook [cookbooks] - https://gerrit.wikimedia.org/r/1165546 (https://phabricator.wikimedia.org/T388389)
[17:47:40] <jinxer-wm>	 FIRING: [2x] BFDdown: BFD session down between cr1-codfw and fe80::5e5e:abff:fe3d:8198 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status  - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[17:47:55] <jinxer-wm>	 RESOLVED: BFDdown: BFD session down between cr1-eqiad and fe80::6687:88ff:fef2:6d48 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr1-eqiad:9804 - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[17:58:55] <jinxer-wm>	 FIRING: [2x] BFDdown: BFD session down between cr1-codfw and fe80::5e5e:abff:fe3d:8198 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr1-codfw:9804 - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[18:03:55] <jinxer-wm>	 RESOLVED: [2x] BFDdown: BFD session down between cr1-codfw and fe80::5e5e:abff:fe3d:8198 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr1-codfw:9804 - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[18:39:04] <jinxer-wm>	 FIRING: PuppetConstantChange: Puppet performing a change on every puppet run on wdqs1022:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange
[18:41:33] <wikibugs>	 SRE, SRE-Access-Requests: Requesting access to analytics-privatedata-users group (LDAP and kerberos), for aprum - https://phabricator.wikimedia.org/T398650#11017710 (aranyap) @ssingh I'm now able to access JupyterHub and have deleted the WMCS key. Thank you!
[18:44:10] <jinxer-wm>	 FIRING: BFDdown: BFD session down between cr1-eqiad and fe80::6687:88ff:fef2:6d48 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr1-eqiad:9804 - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[18:47:34] <wikibugs>	 SRE, SRE-Access-Requests: Requesting access to analytics-privatedata-users group (LDAP and kerberos), for aprum - https://phabricator.wikimedia.org/T398650#11017724 (ssingh) Open→Resolved Thanks for resolving the WMC key issue!
[18:49:10] <jinxer-wm>	 RESOLVED: BFDdown: BFD session down between cr1-eqiad and fe80::6687:88ff:fef2:6d48 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr1-eqiad:9804 - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[18:54:13] <jinxer-wm>	 FIRING: CertAlmostExpired: Certificate for service data-gateway-staging:30443 is about to expire - https://wikitech.wikimedia.org/wiki/TLS/Runbook#data-gateway-staging:30443 - TODO - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired
[18:55:40] <jinxer-wm>	 FIRING: [2x] BFDdown: BFD session down between cr1-eqiad and fe80::6687:88ff:fef2:6d48 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr1-eqiad:9804 - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[18:56:33] <wikibugs>	 (CR) Dzahn: [V:+1] gerrit: replace host names in replica config with variables (2 comments) [puppet] - https://gerrit.wikimedia.org/r/1170433 (https://phabricator.wikimedia.org/T387833) (owner: Dzahn)
[18:58:09] <wikibugs>	 (CR) Dzahn: [V:+1] gerrit: replace host names in replica config with variables (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1170433 (https://phabricator.wikimedia.org/T387833) (owner: Dzahn)
[18:59:52] <wikibugs>	 (PS2) Dzahn: aphlict: create system user with systemd:sysuser [puppet] - https://gerrit.wikimedia.org/r/1080823 (https://phabricator.wikimedia.org/T377374)
[18:59:58] <wikibugs>	 (PS3) Dzahn: aphlict: create system user with systemd:sysuser [puppet] - https://gerrit.wikimedia.org/r/1080823 (https://phabricator.wikimedia.org/T377374)
[19:00:30] <wikibugs>	 (CR) Dzahn: "@mmuhlenhoff@wikimedia.org not -1 anymore now, since we are on bookworm. right?" [puppet] - https://gerrit.wikimedia.org/r/1080823 (https://phabricator.wikimedia.org/T377374) (owner: Dzahn)
[19:00:40] <jinxer-wm>	 RESOLVED: [2x] BFDdown: BFD session down between cr1-eqiad and fe80::6687:88ff:fef2:6d48 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr1-eqiad:9804 - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[19:04:17] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2156 (T399249)', diff saved to https://phabricator.wikimedia.org/P79413 and previous config saved to /var/cache/conftool/dbconfig/20250718-190416-marostegui.json
[19:04:22] <stashbot>	 T399249: Add cl_timestamp_id index to categorylinks table - https://phabricator.wikimedia.org/T399249
[19:06:30] <wikibugs>	 SRE, Fundraising-Backlog, fundraising-tech-ops, Infrastructure Security, and 2 others: Re-opening our DMarcian Trial - https://phabricator.wikimedia.org/T394788#11017745 (nisrael) Hi SRE team,  Checking in on this task. Do you have an approximate timeline when we'd be able to configure the mail r...
[19:09:25] <jinxer-wm>	 FIRING: PuppetCertificateAboutToExpire: Puppet CA certificate thanos-query.discovery.wmnet is about to expire - https://wikitech.wikimedia.org/wiki/Puppet#Renew_agent_certificate - TODO - https://alerts.wikimedia.org/?q=alertname%3DPuppetCertificateAboutToExpire
[19:19:24] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P79414 and previous config saved to /var/cache/conftool/dbconfig/20250718-191924-marostegui.json
[19:34:32] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P79415 and previous config saved to /var/cache/conftool/dbconfig/20250718-193431-marostegui.json
[19:39:45] <jinxer-wm>	 FIRING: WidespreadPuppetFailure: Puppet has failed in eqiad - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet?orgId=1&viewPanel=6 - https://alerts.wikimedia.org/?q=alertname%3DWidespreadPuppetFailure
[19:49:39] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2156 (T399249)', diff saved to https://phabricator.wikimedia.org/P79416 and previous config saved to /var/cache/conftool/dbconfig/20250718-194938-marostegui.json
[19:49:43] <stashbot>	 T399249: Add cl_timestamp_id index to categorylinks table - https://phabricator.wikimedia.org/T399249
[19:49:44] <logmsgbot>	 !log marostegui@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2177.codfw.wmnet with reason: Maintenance
[19:49:51] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depooling db2177 (T399249)', diff saved to https://phabricator.wikimedia.org/P79417 and previous config saved to /var/cache/conftool/dbconfig/20250718-194951-marostegui.json
[20:13:37] <logmsgbot>	 !log gmodena@deploy1003 helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
[20:14:03] <logmsgbot>	 !log gmodena@deploy1003 helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
[20:24:10] <jinxer-wm>	 FIRING: BFDdown: BFD session down between cr1-codfw and fe80::5e5e:abff:fe3d:8198 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr1-codfw:9804 - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[20:29:10] <jinxer-wm>	 RESOLVED: BFDdown: BFD session down between cr1-codfw and fe80::5e5e:abff:fe3d:8198 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr1-codfw:9804 - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[20:32:54] <wikibugs>	 SRE, SRE-Access-Requests: Access Request to DMarcDigests - https://phabricator.wikimedia.org/T399976#11017861 (Johannnes89)
[20:34:25] <jinxer-wm>	 FIRING: CoreRouterInterfaceDown: Core router interface down - cr2-esams:xe-0/1/2 (inter.link reserved port) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr2-esams:9804 - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown
[20:38:08] <wikibugs>	 SRE, SRE-Access-Requests, Infrastructure-Foundations, Mail: Access Request to DMarcDigests - https://phabricator.wikimedia.org/T399976#11017873 (Dzahn)
[20:40:48] <jinxer-wm>	 FIRING: PuppetZeroResources: Puppet has failed generate resources on wdqs2025:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[20:41:40] <jinxer-wm>	 FIRING: [2x] BFDdown: BFD session down between cr1-codfw and fe80::5e5e:abff:fe3d:8198 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr1-codfw:9804 - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[20:46:40] <jinxer-wm>	 RESOLVED: BFDdown: BFD session down between cr1-codfw and fe80::5e5e:abff:fe3d:8198 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr1-codfw:9804 - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[21:09:45] <jinxer-wm>	 RESOLVED: WidespreadPuppetFailure: Puppet has failed in eqiad - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet?orgId=1&viewPanel=6 - https://alerts.wikimedia.org/?q=alertname%3DWidespreadPuppetFailure
[21:10:10] <jinxer-wm>	 FIRING: BFDdown: BFD session down between cr1-eqiad and fe80::6687:88ff:fef2:6d48 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr1-eqiad:9804 - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[21:15:10] <jinxer-wm>	 FIRING: [2x] BFDdown: BFD session down between cr1-codfw and fe80::5e5e:abff:fe3d:8198 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status  - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[21:20:10] <jinxer-wm>	 FIRING: [3x] BFDdown: BFD session down between cr1-codfw and fe80::5e5e:abff:fe3d:8198 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status  - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[21:25:10] <jinxer-wm>	 RESOLVED: [3x] BFDdown: BFD session down between cr1-codfw and fe80::5e5e:abff:fe3d:8198 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status  - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[21:25:48] <jinxer-wm>	 RESOLVED: PuppetZeroResources: Puppet has failed generate resources on wdqs2025:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[21:28:24] <logmsgbot>	 !log gmodena@deploy1003 helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
[21:28:28] <logmsgbot>	 !log gmodena@deploy1003 helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
[21:31:45] <jinxer-wm>	 FIRING: WidespreadPuppetFailure: Puppet has failed in eqiad - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet?orgId=1&viewPanel=6 - https://alerts.wikimedia.org/?q=alertname%3DWidespreadPuppetFailure
[21:49:12] <logmsgbot>	 !log gmodena@deploy1003 helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
[21:49:18] <logmsgbot>	 !log gmodena@deploy1003 helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
[21:57:03] <logmsgbot>	 !log gmodena@deploy1003 helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
[21:57:08] <logmsgbot>	 !log gmodena@deploy1003 helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
[21:57:09] <wikibugs>	 SRE, Wikimedia-Mailing-lists: Archive affiliates-l - https://phabricator.wikimedia.org/T399878#11017983 (Ladsgroup) Open→Resolved a:Ladsgroup
[22:11:13] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2177 (T399249)', diff saved to https://phabricator.wikimedia.org/P79418 and previous config saved to /var/cache/conftool/dbconfig/20250718-221112-marostegui.json
[22:11:17] <stashbot>	 T399249: Add cl_timestamp_id index to categorylinks table - https://phabricator.wikimedia.org/T399249
[22:24:10] <jinxer-wm>	 FIRING: BFDdown: BFD session down between cr1-codfw and fe80::5e5e:abff:fe3d:8198 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr1-codfw:9804 - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[22:26:20] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P79419 and previous config saved to /var/cache/conftool/dbconfig/20250718-222620-marostegui.json
[22:29:10] <jinxer-wm>	 RESOLVED: BFDdown: BFD session down between cr1-codfw and fe80::5e5e:abff:fe3d:8198 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr1-codfw:9804 - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[22:39:04] <jinxer-wm>	 FIRING: PuppetConstantChange: Puppet performing a change on every puppet run on wdqs1022:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange
[22:41:28] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P79420 and previous config saved to /var/cache/conftool/dbconfig/20250718-224127-marostegui.json
[22:54:13] <jinxer-wm>	 FIRING: CertAlmostExpired: Certificate for service data-gateway-staging:30443 is about to expire - https://wikitech.wikimedia.org/wiki/TLS/Runbook#data-gateway-staging:30443 - TODO - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired
[22:56:35] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2177 (T399249)', diff saved to https://phabricator.wikimedia.org/P79421 and previous config saved to /var/cache/conftool/dbconfig/20250718-225635-marostegui.json
[22:56:40] <stashbot>	 T399249: Add cl_timestamp_id index to categorylinks table - https://phabricator.wikimedia.org/T399249
[22:56:51] <logmsgbot>	 !log marostegui@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2190.codfw.wmnet with reason: Maintenance
[22:56:58] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depooling db2190 (T399249)', diff saved to https://phabricator.wikimedia.org/P79422 and previous config saved to /var/cache/conftool/dbconfig/20250718-225658-marostegui.json
[23:03:40] <jinxer-wm>	 FIRING: [2x] BFDdown: BFD session down between cr1-codfw and fe80::5e5e:abff:fe3d:8198 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr1-codfw:9804 - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[23:06:18] <wikibugs>	 (PS1) Krinkle: Update interwiki cache [mediawiki-config] - https://gerrit.wikimedia.org/r/1170614
[23:07:07] <wikibugs>	 (CR) CI reject: [V:-1] Update interwiki cache [mediawiki-config] - https://gerrit.wikimedia.org/r/1170614 (owner: Krinkle)
[23:08:01] <wikibugs>	 (CR) Krinkle: "recheck" [mediawiki-config] - https://gerrit.wikimedia.org/r/1138933 (owner: Zabe)
[23:08:40] <jinxer-wm>	 RESOLVED: BFDdown: BFD session down between cr1-codfw and fe80::5e5e:abff:fe3d:8198 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr1-codfw:9804 - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[23:09:26] <jinxer-wm>	 FIRING: PuppetCertificateAboutToExpire: Puppet CA certificate thanos-query.discovery.wmnet is about to expire - https://wikitech.wikimedia.org/wiki/Puppet#Renew_agent_certificate - TODO - https://alerts.wikimedia.org/?q=alertname%3DPuppetCertificateAboutToExpire
[23:38:31] <wikibugs>	 (PS1) TrainBranchBot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1170615
[23:38:31] <wikibugs>	 (CR) TrainBranchBot: [C:+2] Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1170615 (owner: TrainBranchBot)
[23:39:10] <jinxer-wm>	 FIRING: BFDdown: BFD session down between cr1-eqiad and fe80::6687:88ff:fef2:6d48 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr1-eqiad:9804 - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[23:44:10] <jinxer-wm>	 RESOLVED: BFDdown: BFD session down between cr1-eqiad and fe80::6687:88ff:fef2:6d48 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr1-eqiad:9804 - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[23:48:08] <wikibugs>	 (PS2) Krinkle: Update interwiki cache [mediawiki-config] - https://gerrit.wikimedia.org/r/1170614
[23:48:08] <wikibugs>	 (PS1) Krinkle: build: Fix failing `phpcs` in CI on commits updating interwiki.php [mediawiki-config] - https://gerrit.wikimedia.org/r/1170616
[23:50:40] <jinxer-wm>	 FIRING: [3x] BFDdown: BFD session down between cr1-eqiad and fe80::6687:88ff:fef2:6d48 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr1-eqiad:9804 - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[23:51:32] <wikibugs>	 (Merged) jenkins-bot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1170615 (owner: TrainBranchBot)
[23:55:40] <jinxer-wm>	 RESOLVED: [4x] BFDdown: BFD session down between cr1-codfw and fe80::5e5e:abff:fe3d:8198 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status  - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[23:55:55] <jinxer-wm>	 FIRING: [4x] BFDdown: BFD session down between cr1-codfw and fe80::5e5e:abff:fe3d:8198 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status  - https://alerts.wikimedia.org/?q=alertname%3DBFDdown