[00:05:29] <jinxer-wm>	 FIRING: [4x] SystemdUnitFailed: httpbb_kubernetes_mw-wikifunctions_hourly.service on cumin1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[00:09:55] <tzatziki>	 !log removing 1 file for legal compliance
[00:09:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:38:16] <wikibugs>	 (03PS1) 10TrainBranchBot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - 10https://gerrit.wikimedia.org/r/1111360
[00:38:16] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - 10https://gerrit.wikimedia.org/r/1111360 (owner: 10TrainBranchBot)
[00:56:44] <wikibugs>	 (03Merged) 10jenkins-bot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - 10https://gerrit.wikimedia.org/r/1111360 (owner: 10TrainBranchBot)
[01:08:46] <wikibugs>	 (03PS1) 10TrainBranchBot: Branch commit for wmf/next [core] (wmf/next) - 10https://gerrit.wikimedia.org/r/1111361
[01:08:47] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] Branch commit for wmf/next [core] (wmf/next) - 10https://gerrit.wikimedia.org/r/1111361 (owner: 10TrainBranchBot)
[01:35:39] <wikibugs>	 (03PS5) 10Scott French: shellbox-syntaxhighlight: 1 codfw replica on 8.1 [deployment-charts] - 10https://gerrit.wikimedia.org/r/1087581 (https://phabricator.wikimedia.org/T377038)
[01:35:39] <wikibugs>	 (03PS5) 10Scott French: shellbox-syntaxhighlight: all replicas on PHP 8.1 [deployment-charts] - 10https://gerrit.wikimedia.org/r/1087582 (https://phabricator.wikimedia.org/T377038)
[01:44:50] <wikibugs>	 (03Merged) 10jenkins-bot: Branch commit for wmf/next [core] (wmf/next) - 10https://gerrit.wikimedia.org/r/1111361 (owner: 10TrainBranchBot)
[01:45:39] <wikibugs>	 (03CR) 10Bartosz Dziewoński: [C:03+1] Yet more authentication domain overrides [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1111343 (https://phabricator.wikimedia.org/T383729) (owner: 10Gergő Tisza)
[01:47:09] <wikibugs>	 (03CR) 10Bartosz Dziewoński: [C:03+1] "Looks good, assuming that it's intended that you define several of them to the same name `'static'`." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1111344 (https://phabricator.wikimedia.org/T383729) (owner: 10Gergő Tisza)
[01:48:18] <wikibugs>	 (03CR) 10Bartosz Dziewoński: [C:03+1] "I see that there's a note about that in https://gerrit.wikimedia.org/r/c/mediawiki/extensions/CentralAuth/+/1111345." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1111344 (https://phabricator.wikimedia.org/T383729) (owner: 10Gergő Tisza)
[02:01:21] <icinga-wm>	 PROBLEM - Disk space on releases1003 is CRITICAL: DISK CRITICAL - /srv/docker/overlay2/dc93be088aaa4b75dd2c125bf59f25a009e9f771d54e5e2e443062e2577b23a2/merged is not accessible: Permission denied https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=releases1003&var-datasource=eqiad+prometheus/ops
[02:21:21] <icinga-wm>	 RECOVERY - Disk space on releases1003 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=releases1003&var-datasource=eqiad+prometheus/ops
[02:36:07] <icinga-wm>	 PROBLEM - BGP status on cr2-eqiad is CRITICAL: BGP CRITICAL - ASunknown/IPv4: Connect https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[02:36:42] <jinxer-wm>	 FIRING: JobUnavailable: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[03:01:42] <jinxer-wm>	 RESOLVED: JobUnavailable: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[03:24:39] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops: PDU sensor over limit - https://phabricator.wikimedia.org/T383383#10460931 (10phaultfinder)
[03:30:30] <jinxer-wm>	 FIRING: [2x] ProbeDown: Service wdqs1013:443 has failed probes (http_wdqs_external_sparql_endpoint_search_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#wdqs1013:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[04:05:29] <jinxer-wm>	 FIRING: [2x] SystemdUnitFailed: httpbb_kubernetes_mw-wikifunctions_hourly.service on cumin1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[04:05:55] <icinga-wm>	 PROBLEM - mailman archives on lists1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[04:06:45] <icinga-wm>	 RECOVERY - mailman archives on lists1004 is OK: HTTP OK: HTTP/1.1 200 OK - 53367 bytes in 0.069 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[04:19:53] <wikibugs>	 (03PS1) 10KartikMistry: Update cxserver to 2025-01-13-044601-production [deployment-charts] - 10https://gerrit.wikimedia.org/r/1111373 (https://phabricator.wikimedia.org/T382294)
[04:52:52] <wikibugs>	 (03PS1) 10KartikMistry: Update MinT to [deployment-charts] - 10https://gerrit.wikimedia.org/r/1111374 (https://phabricator.wikimedia.org/T347929)
[04:54:27] <wikibugs>	 (03PS2) 10KartikMistry: Update MinT to 2025-01-07-122638-production [deployment-charts] - 10https://gerrit.wikimedia.org/r/1111374 (https://phabricator.wikimedia.org/T347929)
[05:03:46] <kart_>	 Deploying cxserver/MinT.
[05:03:57] <wikibugs>	 06SRE, 06Infrastructure-Foundations, 10Mail: Message sizes exceeding limits after migrating from Exim to Postfix - https://phabricator.wikimedia.org/T383271#10460979 (10Aklapper)
[05:05:35] <wikibugs>	 (03CR) 10KartikMistry: [C:03+2] Update cxserver to 2025-01-13-044601-production [deployment-charts] - 10https://gerrit.wikimedia.org/r/1111373 (https://phabricator.wikimedia.org/T382294) (owner: 10KartikMistry)
[05:06:37] <wikibugs>	 (03Merged) 10jenkins-bot: Update cxserver to 2025-01-13-044601-production [deployment-charts] - 10https://gerrit.wikimedia.org/r/1111373 (https://phabricator.wikimedia.org/T382294) (owner: 10KartikMistry)
[05:14:53] <logmsgbot>	 !log kartik@deploy2002 helmfile [staging] START helmfile.d/services/cxserver: apply
[05:15:19] <logmsgbot>	 !log kartik@deploy2002 helmfile [staging] DONE helmfile.d/services/cxserver: apply
[05:20:04] <logmsgbot>	 !log kartik@deploy2002 helmfile [codfw] START helmfile.d/services/cxserver: apply
[05:20:33] <logmsgbot>	 !log kartik@deploy2002 helmfile [codfw] DONE helmfile.d/services/cxserver: apply
[05:22:07] <logmsgbot>	 !log kartik@deploy2002 helmfile [eqiad] START helmfile.d/services/cxserver: apply
[05:22:41] <logmsgbot>	 !log kartik@deploy2002 helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
[05:23:23] <kart_>	 !log Updated cxserver to 2025-01-13-044601-production (T382294)
[05:23:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:23:27] <stashbot>	 T382294: Use openapi compliant examples in swagger spec - https://phabricator.wikimedia.org/T382294
[05:25:22] <wikibugs>	 (03CR) 10KartikMistry: [C:03+2] Update MinT to 2025-01-07-122638-production [deployment-charts] - 10https://gerrit.wikimedia.org/r/1111374 (https://phabricator.wikimedia.org/T347929) (owner: 10KartikMistry)
[05:26:28] <wikibugs>	 (03Merged) 10jenkins-bot: Update MinT to 2025-01-07-122638-production [deployment-charts] - 10https://gerrit.wikimedia.org/r/1111374 (https://phabricator.wikimedia.org/T347929) (owner: 10KartikMistry)
[05:30:13] <logmsgbot>	 !log kartik@deploy2002 helmfile [staging] START helmfile.d/services/machinetranslation: apply
[05:47:53] <kart_>	 Did anything change recently with peopleweb.discovery.wmnet? It seems MinT can't download models from there.
[05:50:24] <logmsgbot>	 !log kartik@deploy2002 helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
[06:10:25] <wikibugs>	 10SRE-swift-storage, 10CX-deployments, 10LPL Essential, 10MinT: Provide better long-term storage for translation models - https://phabricator.wikimedia.org/T335491#10461028 (10KartikMistry)
[06:50:17] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.k8s.pool-depool-node depool for host mw[2369-2372].codfw.wmnet
[06:52:43] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host mw[2369-2372].codfw.wmnet
[06:54:55] <wikibugs>	 (03CR) 10Jelto: [C:03+2] Rename mw23[69-72] to wikikube-worker222[0-3] [puppet] - 10https://gerrit.wikimedia.org/r/1111271 (https://phabricator.wikimedia.org/T377877) (owner: 10Jelto)
[06:58:46] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.rename from mw2369 to wikikube-worker2220
[06:59:07] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.dns.netbox
[06:59:11] <icinga-wm>	 PROBLEM - BGP status on cr2-codfw is CRITICAL: BGP CRITICAL - AS64602/IPv6: Active - kubernetes-codfw, AS64602/IPv6: Active - kubernetes-codfw, AS64602/IPv6: Active - kubernetes-codfw, AS64602/IPv4: Active - kubernetes-codfw, AS64602/IPv4: Active - kubernetes-codfw, AS64602/IPv4: Active - kubernetes-codfw, AS64602/IPv6: Active - kubernetes-codfw, AS64602/IPv4: Active - kubernetes-codfw https://wikitech.wikimedia.org/wiki/Network_monitorin
[06:59:11] <icinga-wm>	 status
[07:00:05] <jouncebot>	 Deploy window MediaWiki infrastructure (UTC early) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250115T0700)
[07:00:49] <icinga-wm>	 PROBLEM - BGP status on cr1-codfw is CRITICAL: BGP CRITICAL - AS64602/IPv6: Active - kubernetes-codfw, AS64602/IPv4: Active - kubernetes-codfw, AS64602/IPv6: Active - kubernetes-codfw, AS64602/IPv6: Active - kubernetes-codfw, AS64602/IPv6: Active - kubernetes-codfw, AS64602/IPv4: Active - kubernetes-codfw, AS64602/IPv4: Active - kubernetes-codfw, AS64602/IPv4: Active - kubernetes-codfw https://wikitech.wikimedia.org/wiki/Network_monitorin
[07:00:49] <icinga-wm>	 status
[07:04:31] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2369 to wikikube-worker2220 - jelto@cumin1002"
[07:04:51] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2369 to wikikube-worker2220 - jelto@cumin1002"
[07:04:51] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[07:04:51] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2220
[07:05:31] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2220
[07:06:10] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2369 to wikikube-worker2220
[07:08:40] <jinxer-wm>	 FIRING: [3x] KubernetesRsyslogDown: rsyslog on mw2370:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[07:29:52] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.rename from mw2370 to wikikube-worker2221
[07:30:12] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.dns.netbox
[07:30:30] <jinxer-wm>	 FIRING: [2x] ProbeDown: Service wdqs1013:443 has failed probes (http_wdqs_external_sparql_endpoint_search_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#wdqs1013:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[07:36:38] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2370 to wikikube-worker2221 - jelto@cumin1002"
[07:37:26] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2370 to wikikube-worker2221 - jelto@cumin1002"
[07:37:26] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[07:37:26] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2221
[07:37:52] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2221
[07:38:30] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2370 to wikikube-worker2221
[07:39:21] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.rename from mw2371 to wikikube-worker2222
[07:39:42] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.dns.netbox
[07:43:10] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2371 to wikikube-worker2222 - jelto@cumin1002"
[07:45:45] <wikibugs>	 (03CR) 10Michael Große: "Agreed. This can now be deployed whenever." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1105420 (https://phabricator.wikimedia.org/T379522) (owner: 10Michael Große)
[07:49:56] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2371 to wikikube-worker2222 - jelto@cumin1002"
[07:49:56] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[07:49:56] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2222
[07:50:23] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2222
[07:51:02] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2371 to wikikube-worker2222
[07:55:02] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.rename from mw2372 to wikikube-worker2223
[07:55:23] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.dns.netbox
[07:56:25] <wikibugs>	 (03CR) 10Giuseppe Lavagetto: "What's stopping us from merging this?" [puppet] - 10https://gerrit.wikimedia.org/r/1007026 (https://phabricator.wikimedia.org/T357595) (owner: 10RLazarus)
[07:58:48] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2372 to wikikube-worker2223 - jelto@cumin1002"
[07:59:10] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2372 to wikikube-worker2223 - jelto@cumin1002"
[07:59:10] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[07:59:10] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2223
[07:59:22] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2223
[07:59:58] <wikibugs>	 (03PS3) 10Giuseppe Lavagetto: Explicitly disable all local imagescaling on k8s [mediawiki-config] - 10https://gerrit.wikimedia.org/r/987432 (https://phabricator.wikimedia.org/T352515)
[08:00:01] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2372 to wikikube-worker2223
[08:00:04] <jouncebot>	 Amir1, Urbanecm, and awight: UTC morning backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250115T0800). Please do the needful.
[08:00:05] <jouncebot>	 MatmaRex: A patch you scheduled for UTC morning backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[08:00:32] <wikibugs>	 (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Thursday, January 16 UTC morning backport window](https://wikitech.wikimedia.org/wiki/Deployments#deployc" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/987432 (https://phabricator.wikimedia.org/T352515) (owner: 10Giuseppe Lavagetto)
[08:00:35] <MatmaRex>	 hi. any deployers around at this unusual hour?
[08:01:59] <wikibugs>	 (03Abandoned) 10Giuseppe Lavagetto: Start using the ClusterConfig class [mediawiki-config] - 10https://gerrit.wikimedia.org/r/756016 (owner: 10Giuseppe Lavagetto)
[08:03:13] <wikibugs>	 (03Abandoned) 10Giuseppe Lavagetto: Simplify management of the request time limit [mediawiki-config] - 10https://gerrit.wikimedia.org/r/749718 (owner: 10Giuseppe Lavagetto)
[08:03:46] <wikibugs>	 (03Abandoned) 10Giuseppe Lavagetto: Do not use firejail on kubernetes [mediawiki-config] - 10https://gerrit.wikimedia.org/r/920213 (owner: 10Giuseppe Lavagetto)
[08:05:29] <jinxer-wm>	 FIRING: [2x] SystemdUnitFailed: httpbb_kubernetes_mw-wikifunctions_hourly.service on cumin1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[08:08:07] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.dns.wipe-cache wikikube-worker2220.codfw.wmnet wikikube-worker2221.codfw.wmnet wikikube-worker2222.codfw.wmnet wikikube-worker2223.codfw.wmnet on all recursors
[08:08:11] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2220.codfw.wmnet wikikube-worker2221.codfw.wmnet wikikube-worker2222.codfw.wmnet wikikube-worker2223.codfw.wmnet on all recursors
[08:10:00] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.k8s.renumber-node Renumbering for host wikikube-worker2220.codfw.wmnet
[08:10:16] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.reimage for host wikikube-worker2220.codfw.wmnet with OS bullseye
[08:10:26] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.move-vlan for host wikikube-worker2220
[08:10:34] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.dns.netbox
[08:13:59] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2220 - jelto@cumin1002"
[08:14:03] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2220 - jelto@cumin1002"
[08:14:03] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[08:14:03] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.dns.wipe-cache wikikube-worker2220.codfw.wmnet 19.48.192.10.in-addr.arpa 9.1.0.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
[08:14:06] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2220.codfw.wmnet 19.48.192.10.in-addr.arpa 9.1.0.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
[08:14:07] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2220
[08:14:17] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2220
[08:14:18] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker2220
[08:18:27] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.k8s.renumber-node Renumbering for host wikikube-worker2221.codfw.wmnet
[08:18:45] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.reimage for host wikikube-worker2221.codfw.wmnet with OS bullseye
[08:18:55] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.move-vlan for host wikikube-worker2221
[08:19:03] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.dns.netbox
[08:19:22] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es1022 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P72049 and previous config saved to /var/cache/conftool/dbconfig/20250115-081922-root.json
[08:20:53] <wikibugs>	 (03PS1) 10Marostegui: instances.yaml: Add es1043 to dbctl [puppet] - 10https://gerrit.wikimedia.org/r/1111567 (https://phabricator.wikimedia.org/T382569)
[08:21:04] <urbanecm>	 MatmaRex: i'm here!
[08:22:08] <MatmaRex>	 oh, hi!
[08:22:28] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2221 - jelto@cumin1002"
[08:22:33] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2221 - jelto@cumin1002"
[08:22:33] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[08:22:33] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.dns.wipe-cache wikikube-worker2221.codfw.wmnet 20.48.192.10.in-addr.arpa 0.2.0.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
[08:22:36] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2221.codfw.wmnet 20.48.192.10.in-addr.arpa 0.2.0.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
[08:22:36] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2221
[08:22:48] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2221
[08:22:48] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker2221
[08:23:04] <wikibugs>	 (03CR) 10Urbanecm: [C:03+2] Add license messages for new Wikinews licenses [extensions/WikimediaMessages] (wmf/1.44.0-wmf.11) - 10https://gerrit.wikimedia.org/r/1109756 (https://phabricator.wikimedia.org/T383338) (owner: 10Bartosz Dziewoński)
[08:23:27] <wikibugs>	 (03CR) 10Marostegui: [C:03+2] instances.yaml: Add es1043 to dbctl [puppet] - 10https://gerrit.wikimedia.org/r/1111567 (https://phabricator.wikimedia.org/T382569) (owner: 10Marostegui)
[08:23:38] <urbanecm>	 MatmaRex: i guess i need to wait with the license config for the WikimediaMessages backport
[08:24:21] <MatmaRex>	 that'd be ideal
[08:25:09] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.k8s.renumber-node Renumbering for host wikikube-worker2222.codfw.wmnet
[08:25:25] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.reimage for host wikikube-worker2222.codfw.wmnet with OS bullseye
[08:25:35] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.move-vlan for host wikikube-worker2222
[08:25:43] * MichaelG_WMF is here as well
[08:25:55] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Add es1043 to dbctl depooled T382569', diff saved to https://phabricator.wikimedia.org/P72050 and previous config saved to /var/cache/conftool/dbconfig/20250115-082554-marostegui.json
[08:25:58] <stashbot>	 T382569: Productionize es104[1-6] - https://phabricator.wikimedia.org/T382569
[08:26:04] <wikibugs>	 (03PS5) 10Filippo Giunchedi: thanos-rule: manage retention setting [puppet] - 10https://gerrit.wikimedia.org/r/1111241 (https://phabricator.wikimedia.org/T352756) (owner: 10Herron)
[08:26:18] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.dns.netbox
[08:26:35] <wikibugs>	 (03CR) 10Urbanecm: [C:03+2] Enable abusefilter-log-detail for autoconfirmed users on en.wikibooks (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1109736 (https://phabricator.wikimedia.org/T383332) (owner: 10Dreamrimmer)
[08:26:50] <wikibugs>	 (03PS2) 10Michael Große: Growth: Remove temporary config for clearing link recommendations [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1105420 (https://phabricator.wikimedia.org/T379522)
[08:26:52] <wikibugs>	 (03CR) 10Urbanecm: [C:03+2] Growth: Remove temporary config for clearing link recommendations [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1105420 (https://phabricator.wikimedia.org/T379522) (owner: 10Michael Große)
[08:27:18] <wikibugs>	 (03Merged) 10jenkins-bot: Enable abusefilter-log-detail for autoconfirmed users on en.wikibooks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1109736 (https://phabricator.wikimedia.org/T383332) (owner: 10Dreamrimmer)
[08:27:41] <wikibugs>	 (03Merged) 10jenkins-bot: Growth: Remove temporary config for clearing link recommendations [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1105420 (https://phabricator.wikimedia.org/T379522) (owner: 10Michael Große)
[08:28:12] <wikibugs>	 (03CR) 10Filippo Giunchedi: "Please see PS4" [puppet] - 10https://gerrit.wikimedia.org/r/1111241 (https://phabricator.wikimedia.org/T352756) (owner: 10Herron)
[08:28:13] <wikibugs>	 (03PS1) 10Bartosz Dziewoński: htmlform: fix defaults for namespace and relative in titlesmultiselect [core] (wmf/1.44.0-wmf.11) - 10https://gerrit.wikimedia.org/r/1111568 (https://phabricator.wikimedia.org/T383133)
[08:28:18] <wikibugs>	 (03PS1) 10Bartosz Dziewoński: htmlform: fix defaults for namespace and relative in titlesmultiselect [core] (wmf/1.44.0-wmf.12) - 10https://gerrit.wikimedia.org/r/1111569 (https://phabricator.wikimedia.org/T383133)
[08:28:37] <wikibugs>	 (03PS6) 10Filippo Giunchedi: thanos-rule: manage retention setting [puppet] - 10https://gerrit.wikimedia.org/r/1111241 (https://phabricator.wikimedia.org/T352756) (owner: 10Herron)
[08:28:45] <logmsgbot>	 !log urbanecm@deploy2002 Started scap sync-world: Backport for [[gerrit:1109736|Enable abusefilter-log-detail for autoconfirmed users on en.wikibooks (T383332)]], [[gerrit:1105420|Growth: Remove temporary config for clearing link recommendations (T379522)]]
[08:28:50] <stashbot>	 T383332: Enable abusefilter-log-detail for autoconfirmed users on en.wikibooks - https://phabricator.wikimedia.org/T383332
[08:28:51] <stashbot>	 T379522: Switch GETempLinkRecommendationSwitchTagClearHook to true at all wikis - https://phabricator.wikimedia.org/T379522
[08:29:00] <MatmaRex>	 just realized those core changes should probably be backported too. if we have the time
[08:29:28] <urbanecm>	 sure
[08:29:40] <urbanecm>	 MatmaRex: do you mind adding them to the calendar, please?
[08:29:43] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2222 - jelto@cumin1002"
[08:29:47] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2222 - jelto@cumin1002"
[08:29:47] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[08:29:47] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.dns.wipe-cache wikikube-worker2222.codfw.wmnet 21.48.192.10.in-addr.arpa 1.2.0.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
[08:29:50] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2222.codfw.wmnet 21.48.192.10.in-addr.arpa 1.2.0.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
[08:29:50] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2222
[08:30:08] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2222
[08:30:08] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker2222
[08:30:11] <wikibugs>	 (03CR) 10Filippo Giunchedi: [V:03+1] "PCC SUCCESS (CORE_DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/4802/co" [puppet] - 10https://gerrit.wikimedia.org/r/1111241 (https://phabricator.wikimedia.org/T352756) (owner: 10Herron)
[08:30:27] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2220.codfw.wmnet with reason: host reimage
[08:31:01] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.k8s.renumber-node Renumbering for host wikikube-worker2223.codfw.wmnet
[08:31:18] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.reimage for host wikikube-worker2223.codfw.wmnet with OS bullseye
[08:31:29] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.move-vlan for host wikikube-worker2223
[08:31:30] <wikibugs>	 (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Wednesday, January 15 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#depl" [core] (wmf/1.44.0-wmf.11) - 10https://gerrit.wikimedia.org/r/1111568 (https://phabricator.wikimedia.org/T383133) (owner: 10Bartosz Dziewoński)
[08:31:34] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.dns.netbox
[08:31:38] <wikibugs>	 (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Wednesday, January 15 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#depl" [core] (wmf/1.44.0-wmf.12) - 10https://gerrit.wikimedia.org/r/1111569 (https://phabricator.wikimedia.org/T383133) (owner: 10Bartosz Dziewoński)
[08:31:58] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops, 06serviceops: Move kafka-main2010 within the same rack - https://phabricator.wikimedia.org/T381788#10461157 (10JMeybohm) >>! In T381788#10458970, @Jhancock.wm wrote: > We are off on the 20th in the US. but the rest of the week is good for me.   Sorry, I wasn't aware. What abou...
[08:32:13] <urbanecm>	 MatmaRex: i meant for this window, unless you deliberately want to wait with them for later?
[08:32:24] <MatmaRex>	 urbanecm: done (fixed the window)
[08:32:42] <urbanecm>	 ty
[08:32:58] <wikibugs>	 (03CR) 10Urbanecm: [C:03+2] htmlform: fix defaults for namespace and relative in titlesmultiselect [core] (wmf/1.44.0-wmf.11) - 10https://gerrit.wikimedia.org/r/1111568 (https://phabricator.wikimedia.org/T383133) (owner: 10Bartosz Dziewoński)
[08:32:59] <wikibugs>	 (03CR) 10Urbanecm: [C:03+2] htmlform: fix defaults for namespace and relative in titlesmultiselect [core] (wmf/1.44.0-wmf.12) - 10https://gerrit.wikimedia.org/r/1111569 (https://phabricator.wikimedia.org/T383133) (owner: 10Bartosz Dziewoński)
[08:33:09] <wikibugs>	 (03PS1) 10Marostegui: production-m5.sql.erb: Add new grants to ipoid_rw [puppet] - 10https://gerrit.wikimedia.org/r/1111570 (https://phabricator.wikimedia.org/T383753)
[08:33:28] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2220.codfw.wmnet with reason: host reimage
[08:34:14] <wikibugs>	 (03CR) 10Marostegui: "This is a noop as grants need to be added to the DB" [puppet] - 10https://gerrit.wikimedia.org/r/1111570 (https://phabricator.wikimedia.org/T383753) (owner: 10Marostegui)
[08:34:28] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es1022 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P72051 and previous config saved to /var/cache/conftool/dbconfig/20250115-083427-root.json
[08:34:59] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2223 - jelto@cumin1002"
[08:35:03] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2223 - jelto@cumin1002"
[08:35:03] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[08:35:04] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.dns.wipe-cache wikikube-worker2223.codfw.wmnet 22.48.192.10.in-addr.arpa 2.2.0.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
[08:35:07] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2223.codfw.wmnet 22.48.192.10.in-addr.arpa 2.2.0.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
[08:35:07] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2223
[08:35:40] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2223
[08:35:40] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker2223
[08:35:45] <logmsgbot>	 !log urbanecm@deploy2002 dreamrimmer, urbanecm, migr: Backport for [[gerrit:1109736|Enable abusefilter-log-detail for autoconfirmed users on en.wikibooks (T383332)]], [[gerrit:1105420|Growth: Remove temporary config for clearing link recommendations (T379522)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[08:35:50] <stashbot>	 T383332: Enable abusefilter-log-detail for autoconfirmed users on en.wikibooks - https://phabricator.wikimedia.org/T383332
[08:35:50] <stashbot>	 T379522: Switch GETempLinkRecommendationSwitchTagClearHook to true at all wikis - https://phabricator.wikimedia.org/T379522
[08:36:13] <urbanecm>	 MatmaRex: can you test the first one (r1109736)?
[08:37:04] <MatmaRex>	 urbanecm: i can look at Special:UserGroupRights, but not beyond that
[08:37:13] <urbanecm>	 that's fair
[08:37:31] <MatmaRex>	 looks good then
[08:37:33] <logmsgbot>	 !log urbanecm@deploy2002 dreamrimmer, urbanecm, migr: Continuing with sync
[08:38:37] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2221.codfw.wmnet with reason: host reimage
[08:41:31] <wikibugs>	 07Puppet, 06SRE, 06Data-Engineering-Radar: modules/udp2log/manifests/instance/monitoring.pp has unreachable code - https://phabricator.wikimedia.org/T152104#10461173 (10fgiunchedi) 05Open→03Invalid Manifest doesn't contain unreachable code anymore  ` define udp2log::instance::monitoring(     $log_dir...
[08:42:17] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2221.codfw.wmnet with reason: host reimage
[08:44:55] <logmsgbot>	 !log urbanecm@deploy2002 Finished scap sync-world: Backport for [[gerrit:1109736|Enable abusefilter-log-detail for autoconfirmed users on en.wikibooks (T383332)]], [[gerrit:1105420|Growth: Remove temporary config for clearing link recommendations (T379522)]] (duration: 16m 09s)
[08:45:00] <stashbot>	 T383332: Enable abusefilter-log-detail for autoconfirmed users on en.wikibooks - https://phabricator.wikimedia.org/T383332
[08:45:00] <stashbot>	 T379522: Switch GETempLinkRecommendationSwitchTagClearHook to true at all wikis - https://phabricator.wikimedia.org/T379522
[08:45:04] <urbanecm>	 okay, first deployment done
[08:45:06] <urbanecm>	 waiting on CI now
[08:45:22] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2222.codfw.wmnet with reason: host reimage
[08:45:27] <wikibugs>	 (03CR) 10Urbanecm: [C:03+2] Update French wikinews license to CC-BY-SA 4.0 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1106911 (https://phabricator.wikimedia.org/T381946) (owner: 10Dreamrimmer)
[08:46:08] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by urbanecm@deploy2002 using scap backport" [extensions/WikimediaMessages] (wmf/1.44.0-wmf.11) - 10https://gerrit.wikimedia.org/r/1109756 (https://phabricator.wikimedia.org/T383338) (owner: 10Bartosz Dziewoński)
[08:46:08] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by urbanecm@deploy2002 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1106911 (https://phabricator.wikimedia.org/T381946) (owner: 10Dreamrimmer)
[08:46:14] <wikibugs>	 (03Merged) 10jenkins-bot: Update French wikinews license to CC-BY-SA 4.0 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1106911 (https://phabricator.wikimedia.org/T381946) (owner: 10Dreamrimmer)
[08:47:26] <wikibugs>	 (03Merged) 10jenkins-bot: Add license messages for new Wikinews licenses [extensions/WikimediaMessages] (wmf/1.44.0-wmf.11) - 10https://gerrit.wikimedia.org/r/1109756 (https://phabricator.wikimedia.org/T383338) (owner: 10Bartosz Dziewoński)
[08:48:00] <logmsgbot>	 !log urbanecm@deploy2002 Started scap sync-world: Backport for [[gerrit:1109756|Add license messages for new Wikinews licenses (T383338)]], [[gerrit:1106911|Update French wikinews license to CC-BY-SA 4.0 (T381946)]]
[08:48:06] <stashbot>	 T383338: Check/fix/cleanup licenses on Wikinewses january 2025 - https://phabricator.wikimedia.org/T383338
[08:48:06] <stashbot>	 T381946: Update license of dewikinews and frwikinews to CC-BY-SA 4.0 by January 1, 2025 - https://phabricator.wikimedia.org/T381946
[08:49:09] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2222.codfw.wmnet with reason: host reimage
[08:49:16] <jinxer-wm>	 FIRING: MediaWikiLatencyExceeded: p75 latency high: eqiad mw-parsoid (k8s) 1.748s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[08:49:33] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es1022 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P72052 and previous config saved to /var/cache/conftool/dbconfig/20250115-084932-root.json
[08:51:15] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2223.codfw.wmnet with reason: host reimage
[08:52:55] <wikibugs>	 (03Merged) 10jenkins-bot: htmlform: fix defaults for namespace and relative in titlesmultiselect [core] (wmf/1.44.0-wmf.11) - 10https://gerrit.wikimedia.org/r/1111568 (https://phabricator.wikimedia.org/T383133) (owner: 10Bartosz Dziewoński)
[08:53:03] <logmsgbot>	 !log vgutierrez@cumin1002 START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_drmrs and A:cp
[08:53:26] <logmsgbot>	 !log urbanecm@deploy2002 sync-world aborted: Backport for [[gerrit:1109756|Add license messages for new Wikinews licenses (T383338)]], [[gerrit:1106911|Update French wikinews license to CC-BY-SA 4.0 (T381946)]] (duration: 05m 25s)
[08:53:30] <stashbot>	 T383338: Check/fix/cleanup licenses on Wikinewses january 2025 - https://phabricator.wikimedia.org/T383338
[08:53:30] <stashbot>	 T381946: Update license of dewikinews and frwikinews to CC-BY-SA 4.0 by January 1, 2025 - https://phabricator.wikimedia.org/T381946
[08:53:34] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2220.codfw.wmnet with OS bullseye
[08:53:35] <wikibugs>	 (03Merged) 10jenkins-bot: htmlform: fix defaults for namespace and relative in titlesmultiselect [core] (wmf/1.44.0-wmf.12) - 10https://gerrit.wikimedia.org/r/1111569 (https://phabricator.wikimedia.org/T383133) (owner: 10Bartosz Dziewoński)
[08:53:52] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2223.codfw.wmnet with reason: host reimage
[08:54:15] <jinxer-wm>	 RESOLVED: MediaWikiLatencyExceeded: p75 latency high: eqiad mw-parsoid (k8s) 1.324s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[08:54:23] <logmsgbot>	 !log urbanecm@deploy2002 Started scap sync-world: Backport for [[gerrit:1109756|Add license messages for new Wikinews licenses (T383338)]], [[gerrit:1106911|Update French wikinews license to CC-BY-SA 4.0 (T381946)]], [[gerrit:1111568|htmlform: fix defaults for namespace and relative in titlesmultiselect (T383133)]], [[gerrit:1111569|htmlform: fix defaults for namespace and relative in titlesmultiselect (T383133)]]
[08:54:28] <stashbot>	 T383133: Page restrictions menu not being populated correctly - https://phabricator.wikimedia.org/T383133
[08:54:49] <jelto>	 !log homer cr*codfw* commit 'T377877'
[08:54:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:54:52] <stashbot>	 T377877: Migrate wikikube-codfw to containerd - https://phabricator.wikimedia.org/T377877
[08:57:02] <jelto>	 !log homer lsw1-d3-codfw* commit 'T377877'
[08:57:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:59:58] <icinga-wm>	 PROBLEM - BGP status on lsw1-d3-codfw.mgmt is CRITICAL: BGP CRITICAL - AS64602/IPv6: Active - kubernetes-codfw, AS64602/IPv6: Connect - kubernetes-codfw, AS64602/IPv4: Active - kubernetes-codfw, AS64602/IPv4: Active - kubernetes-codfw https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[09:02:00] <icinga-wm>	 RECOVERY - BGP status on cr1-codfw is OK: BGP OK - up: 104, down: 0, shutdown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[09:02:57] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2221.codfw.wmnet with OS bullseye
[09:04:38] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es1022 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P72053 and previous config saved to /var/cache/conftool/dbconfig/20250115-090437-root.json
[09:07:58] <icinga-wm>	 RECOVERY - BGP status on lsw1-d3-codfw.mgmt is OK: BGP OK - up: 26, down: 0, shutdown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[09:10:30] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2222.codfw.wmnet with OS bullseye
[09:10:58] <icinga-wm>	 PROBLEM - BGP status on lsw1-d3-codfw.mgmt is CRITICAL: BGP CRITICAL - AS64602/IPv4: Connect - kubernetes-codfw, AS64602/IPv6: Connect - kubernetes-codfw https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[09:11:00] <wikibugs>	 (03PS1) 10Stevemunene: Kerberos access for Kgraessle [puppet] - 10https://gerrit.wikimedia.org/r/1111575 (https://phabricator.wikimedia.org/T383598)
[09:11:13] <logmsgbot>	 !log jelto@cumin1002 END (FAIL) - Cookbook sre.k8s.renumber-node (exit_code=99) Renumbering for host wikikube-worker2220.codfw.wmnet
[09:12:56] <icinga-wm>	 RECOVERY - BGP status on lsw1-d3-codfw.mgmt is OK: BGP OK - up: 26, down: 0, shutdown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[09:15:09] <wikibugs>	 (03CR) 10Marostegui: [C:03+2] production-m5.sql.erb: Add new grants to ipoid_rw [puppet] - 10https://gerrit.wikimedia.org/r/1111570 (https://phabricator.wikimedia.org/T383753) (owner: 10Marostegui)
[09:15:22] <MatmaRex>	 urbanecm: it's still in progress, right?
[09:15:26] <urbanecm>	 MatmaRex: correct
[09:15:29] <MatmaRex>	 or did i miss it
[09:15:30] <MatmaRex>	 alright
[09:15:37] <urbanecm>	 still in the pre-mwdebug stage
[09:15:44] <urbanecm>	 deploying i18n changes takes time :)
[09:15:50] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2223.codfw.wmnet with OS bullseye
[09:15:55] <logmsgbot>	 !log urbanecm@deploy2002 matmarex, urbanecm, dreamrimmer: Backport for [[gerrit:1109756|Add license messages for new Wikinews licenses (T383338)]], [[gerrit:1106911|Update French wikinews license to CC-BY-SA 4.0 (T381946)]], [[gerrit:1111568|htmlform: fix defaults for namespace and relative in titlesmultiselect (T383133)]], [[gerrit:1111569|htmlform: fix defaults for namespace and relative in titlesmultiselect (T383133)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[09:16:01] <urbanecm>	 here we go!
[09:16:01] <stashbot>	 T383338: Check/fix/cleanup licenses on Wikinewses january 2025 - https://phabricator.wikimedia.org/T383338
[09:16:02] <stashbot>	 T381946: Update license of dewikinews and frwikinews to CC-BY-SA 4.0 by January 1, 2025 - https://phabricator.wikimedia.org/T381946
[09:16:02] <stashbot>	 T383133: Page restrictions menu not being populated correctly - https://phabricator.wikimedia.org/T383133
[09:16:03] <urbanecm>	 MatmaRex: can you test?
[09:16:08] <urbanecm>	 (all remaining patches should be there)
[09:16:20] <wikibugs>	 06SRE, 10SRE-Access-Requests, 13Patch-For-Review: Requesting access to releasers-mediawiki for MSantos - https://phabricator.wikimedia.org/T382616#10461268 (10Bmueller) @Dzahn Approved, thank you!
[09:16:22] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es1043 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P72054 and previous config saved to /var/cache/conftool/dbconfig/20250115-091622-root.json
[09:16:34] <logmsgbot>	 !log vgutierrez@cumin1002 END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_drmrs and A:cp
[09:16:37] <MatmaRex>	 yeah. looking
[09:17:26] <wikibugs>	 (03PS1) 10Marostegui: es1043: Enable notifications [puppet] - 10https://gerrit.wikimedia.org/r/1111576 (https://phabricator.wikimedia.org/T382569)
[09:18:03] <wikibugs>	 (03CR) 10Marostegui: [C:03+2] es1043: Enable notifications [puppet] - 10https://gerrit.wikimedia.org/r/1111576 (https://phabricator.wikimedia.org/T382569) (owner: 10Marostegui)
[09:19:43] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es1022 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P72055 and previous config saved to /var/cache/conftool/dbconfig/20250115-091943-root.json
[09:20:15] <MatmaRex>	 urbanecm: things look good
[09:20:19] <urbanecm>	 great! proceeding
[09:20:20] <logmsgbot>	 !log urbanecm@deploy2002 matmarex, urbanecm, dreamrimmer: Continuing with sync
[09:21:49] <logmsgbot>	 !log vgutierrez@cumin1002 START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_drmrs and A:cp
[09:24:43] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops: PDU sensor over limit - https://phabricator.wikimedia.org/T383383#10461281 (10phaultfinder)
[09:26:31] <wikibugs>	 (03PS1) 10Filippo Giunchedi: kubernetes: enable selecting clusters for deployment_server [puppet] - 10https://gerrit.wikimedia.org/r/1111577 (https://phabricator.wikimedia.org/T383699)
[09:26:34] <wikibugs>	 (03PS1) 10Filippo Giunchedi: ci: select k8s staging for deployment_server [puppet] - 10https://gerrit.wikimedia.org/r/1111578 (https://phabricator.wikimedia.org/T383699)
[09:28:31] <wikibugs>	 (03CR) 10Muehlenhoff: [C:03+1] "Looks good" [software/bitu] - 10https://gerrit.wikimedia.org/r/1110732 (https://phabricator.wikimedia.org/T383201) (owner: 10Slyngshede)
[09:28:53] <wikibugs>	 (03CR) 10Muehlenhoff: [C:03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/1111575 (https://phabricator.wikimedia.org/T383598) (owner: 10Stevemunene)
[09:30:33] <logmsgbot>	 !log jelto@cumin1002 END (FAIL) - Cookbook sre.k8s.renumber-node (exit_code=99) Renumbering for host wikikube-worker2221.codfw.wmnet
[09:31:28] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es1043 (re)pooling @ 2%: Repooling', diff saved to https://phabricator.wikimedia.org/P72056 and previous config saved to /var/cache/conftool/dbconfig/20250115-093127-root.json
[09:32:23] <wikibugs>	 (03CR) 10Stevemunene: [C:03+2] Kerberos access for Kgraessle [puppet] - 10https://gerrit.wikimedia.org/r/1111575 (https://phabricator.wikimedia.org/T383598) (owner: 10Stevemunene)
[09:34:20] <logmsgbot>	 !log urbanecm@deploy2002 Finished scap sync-world: Backport for [[gerrit:1109756|Add license messages for new Wikinews licenses (T383338)]], [[gerrit:1106911|Update French wikinews license to CC-BY-SA 4.0 (T381946)]], [[gerrit:1111568|htmlform: fix defaults for namespace and relative in titlesmultiselect (T383133)]], [[gerrit:1111569|htmlform: fix defaults for namespace and relative in titlesmultiselect (T383133)]] (duration: 39m 57s)
[09:34:26] <urbanecm>	 finally
[09:34:26] <stashbot>	 T383338: Check/fix/cleanup licenses on Wikinewses january 2025 - https://phabricator.wikimedia.org/T383338
[09:34:26] <stashbot>	 T381946: Update license of dewikinews and frwikinews to CC-BY-SA 4.0 by January 1, 2025 - https://phabricator.wikimedia.org/T381946
[09:34:27] <stashbot>	 T383133: Page restrictions menu not being populated correctly - https://phabricator.wikimedia.org/T383133
[09:34:28] <urbanecm>	 39mins
[09:34:31] <urbanecm>	 MatmaRex: anything else? :)
[09:34:45] <MatmaRex>	 thanks urbanecm
[09:36:10] <icinga-wm>	 PROBLEM - BGP status on cr2-eqiad is CRITICAL: BGP CRITICAL - No response from remote host 208.80.154.197 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[09:36:44] <wikibugs>	 (03CR) 10Hashar: [C:03+1] ci: Install memcached for MediaWiki success cache [puppet] - 10https://gerrit.wikimedia.org/r/1111295 (https://phabricator.wikimedia.org/T383243) (owner: 10Dduvall)
[09:37:26] <hashar>	 argh
[09:37:32] <hashar>	 39 minutes is way too long
[09:38:36] <logmsgbot>	 !log jelto@cumin1002 END (FAIL) - Cookbook sre.k8s.renumber-node (exit_code=99) Renumbering for host wikikube-worker2222.codfw.wmnet
[09:42:25] <wikibugs>	 (03PS1) 10Gkyziridis: ml-services: update articletopic outlink image [deployment-charts] - 10https://gerrit.wikimedia.org/r/1111580 (https://phabricator.wikimedia.org/T383312)
[09:43:25] <wikibugs>	 (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Thursday, January 16 UTC morning backport window](https://wikitech.wikimedia.org/wiki/Deployments#deployc" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1109108 (https://phabricator.wikimedia.org/T382947) (owner: 10Giuseppe Lavagetto)
[09:44:55] <logmsgbot>	 !log jelto@cumin1002 END (FAIL) - Cookbook sre.k8s.renumber-node (exit_code=99) Renumbering for host wikikube-worker2223.codfw.wmnet
[09:45:02] <wikibugs>	 (03CR) 10David Caro: "Thanks very much for this!" [alerts] - 10https://gerrit.wikimedia.org/r/1111328 (https://phabricator.wikimedia.org/T328502) (owner: 10Andrea Denisse)
[09:45:46] <wikibugs>	 (03CR) 10David Caro: [C:03+1] "LGTM (once we have the others)" [puppet] - 10https://gerrit.wikimedia.org/r/1111340 (https://phabricator.wikimedia.org/T328502) (owner: 10Andrea Denisse)
[09:46:25] <wikibugs>	 (03CR) 10David Caro: [C:03+1] "/me being unclear" [puppet] - 10https://gerrit.wikimedia.org/r/1111340 (https://phabricator.wikimedia.org/T328502) (owner: 10Andrea Denisse)
[09:46:33] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es1043 (re)pooling @ 3%: Repooling', diff saved to https://phabricator.wikimedia.org/P72057 and previous config saved to /var/cache/conftool/dbconfig/20250115-094632-root.json
[09:46:40] <urbanecm>	 hashar: i'm jealously looking at deployments from a few years ago, when it took less than a minute
[09:46:47] <urbanecm>	 (granted, _not_ when i'm changing i18n)
[09:47:13] <wikibugs>	 (03PS2) 10Stevemunene: Add linkeddata.cultureelerfgoed.nl to SPARQL allowlist [puppet] - 10https://gerrit.wikimedia.org/r/1105882 (https://phabricator.wikimedia.org/T381717)
[09:47:53] <logmsgbot>	 !log vgutierrez@cumin1002 END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_drmrs and A:cp
[09:49:56] <wikibugs>	 (03PS2) 10Filippo Giunchedi: kubernetes: enable selecting clusters for deployment_server [puppet] - 10https://gerrit.wikimedia.org/r/1111577 (https://phabricator.wikimedia.org/T383699)
[09:49:56] <wikibugs>	 (03PS2) 10Filippo Giunchedi: ci: select k8s staging for deployment_server [puppet] - 10https://gerrit.wikimedia.org/r/1111578 (https://phabricator.wikimedia.org/T383699)
[09:50:16] <logmsgbot>	 !log vgutierrez@cumin1002 START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_magru and A:cp
[09:51:35] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker[2220-2223].codfw.wmnet
[09:51:42] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker[2220-2223].codfw.wmnet
[09:53:19] <wikibugs>	 (03PS1) 10Muehlenhoff: Add component/amd-gpu-firmware [puppet] - 10https://gerrit.wikimedia.org/r/1111582 (https://phabricator.wikimedia.org/T383557)
[09:53:21] <wikibugs>	 (03PS1) 10Muehlenhoff: amd_rocm: Switch to installing from component/amd-gpu-firmware [puppet] - 10https://gerrit.wikimedia.org/r/1111583 (https://phabricator.wikimedia.org/T383557)
[09:54:45] <jayme>	 !log disabling puppet on 543 nodes using k8s::package resource - T341984
[09:54:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:54:49] <stashbot>	 T341984: Update Kubernetes clusters to >1.25 - https://phabricator.wikimedia.org/T341984
[09:54:54] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.reimage for host wikikube-worker2220.codfw.wmnet with OS bookworm
[09:55:13] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.move-vlan for host wikikube-worker2220
[09:55:13] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker2220
[09:57:18] <wikibugs>	 (03CR) 10JMeybohm: [C:03+2] k8s::package: Install version specific kubernetes-client package [puppet] - 10https://gerrit.wikimedia.org/r/1109704 (https://phabricator.wikimedia.org/T341984) (owner: 10JMeybohm)
[09:58:57] <icinga-wm>	 PROBLEM - BGP status on lsw1-d3-codfw.mgmt is CRITICAL: BGP CRITICAL - AS64602/IPv4: Connect - kubernetes-codfw, AS64602/IPv6: Connect - kubernetes-codfw https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[09:59:07] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.reimage for host wikikube-worker2221.codfw.wmnet with OS bookworm
[09:59:26] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.move-vlan for host wikikube-worker2221
[09:59:26] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker2221
[09:59:29] <wikibugs>	 (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Wednesday, January 15 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#depl" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1111344 (https://phabricator.wikimedia.org/T383729) (owner: 10Gergő Tisza)
[09:59:57] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.reimage for host wikikube-worker2222.codfw.wmnet with OS bookworm
[10:00:10] <wikibugs>	 (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Wednesday, January 15 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#depl" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1111343 (https://phabricator.wikimedia.org/T383729) (owner: 10Gergő Tisza)
[10:00:16] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.move-vlan for host wikikube-worker2222
[10:00:16] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker2222
[10:01:00] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.reimage for host wikikube-worker2223.codfw.wmnet with OS bookworm
[10:01:19] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.move-vlan for host wikikube-worker2223
[10:01:19] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker2223
[10:01:38] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es1043 (re)pooling @ 4%: Repooling', diff saved to https://phabricator.wikimedia.org/P72058 and previous config saved to /var/cache/conftool/dbconfig/20250115-100138-root.json
[10:02:07] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Promote es1025 to eqiad es5 master dbmaint T382569', diff saved to https://phabricator.wikimedia.org/P72059 and previous config saved to /var/cache/conftool/dbconfig/20250115-100207-marostegui.json
[10:02:11] <stashbot>	 T382569: Productionize es104[1-6] - https://phabricator.wikimedia.org/T382569
[10:02:28] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depool es1024 T382569', diff saved to https://phabricator.wikimedia.org/P72060 and previous config saved to /var/cache/conftool/dbconfig/20250115-100228-marostegui.json
[10:02:44] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on es1024.eqiad.wmnet with reason: cloning
[10:02:58] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on es1024.eqiad.wmnet with reason: cloning
[10:04:20] <wikibugs>	 (03CR) 10Máté Szabó: [C:03+1] urldownloader: scrub outbound privacy-sensitive hdrs [puppet] - 10https://gerrit.wikimedia.org/r/1111265 (https://phabricator.wikimedia.org/T340552) (owner: 10CDanis)
[10:04:47] <wikibugs>	 (03PS1) 10Marostegui: mariadb: Productionize es1045 [puppet] - 10https://gerrit.wikimedia.org/r/1111584 (https://phabricator.wikimedia.org/T382569)
[10:05:08] <wikibugs>	 (03CR) 10CI reject: [V:04-1] mariadb: Productionize es1045 [puppet] - 10https://gerrit.wikimedia.org/r/1111584 (https://phabricator.wikimedia.org/T382569) (owner: 10Marostegui)
[10:05:58] <wikibugs>	 (03PS2) 10Marostegui: mariadb: Productionize es1045 [puppet] - 10https://gerrit.wikimedia.org/r/1111584 (https://phabricator.wikimedia.org/T382569)
[10:06:19] <wikibugs>	 (03CR) 10CI reject: [V:04-1] mariadb: Productionize es1045 [puppet] - 10https://gerrit.wikimedia.org/r/1111584 (https://phabricator.wikimedia.org/T382569) (owner: 10Marostegui)
[10:06:53] <icinga-wm>	 PROBLEM - Router interfaces on cr2-codfw is CRITICAL: CRITICAL: host 208.80.153.193, interfaces up: 112, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[10:08:02] <jayme>	 !log re-enabling puppet on nodes using k8s::package resource - T341984
[10:08:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:08:06] <stashbot>	 T341984: Update Kubernetes clusters to >1.25 - https://phabricator.wikimedia.org/T341984
[10:08:08] <wikibugs>	 (03PS3) 10Marostegui: mariadb: Productionize es1045 [puppet] - 10https://gerrit.wikimedia.org/r/1111584 (https://phabricator.wikimedia.org/T382569)
[10:09:03] <wikibugs>	 (03CR) 10Marostegui: [C:03+2] mariadb: Productionize es1045 [puppet] - 10https://gerrit.wikimedia.org/r/1111584 (https://phabricator.wikimedia.org/T382569) (owner: 10Marostegui)
[10:11:47] <icinga-wm>	 PROBLEM - Host restbase2037 is DOWN: PING CRITICAL - Packet loss = 100%
[10:11:53] <icinga-wm>	 RECOVERY - Router interfaces on cr2-codfw is OK: OK: host 208.80.153.193, interfaces up: 113, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[10:12:08] <wikibugs>	 (03PS9) 10JMeybohm: Update staging-codfw to kubernetes 1.31, calico 3.29 [puppet] - 10https://gerrit.wikimedia.org/r/1110813 (https://phabricator.wikimedia.org/T341984)
[10:12:29] <wikibugs>	 (03PS1) 10Gergő Tisza: Enable SUL3 on test wikis, second attempt [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1111585 (https://phabricator.wikimedia.org/T383729)
[10:12:39] <wikibugs>	 (03CR) 10JMeybohm: Update staging-codfw to kubernetes 1.31, calico 3.29 (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1110813 (https://phabricator.wikimedia.org/T341984) (owner: 10JMeybohm)
[10:13:07] <wikibugs>	 (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Wednesday, January 15 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#depl" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1111585 (https://phabricator.wikimedia.org/T383729) (owner: 10Gergő Tisza)
[10:13:29] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2220.codfw.wmnet with reason: host reimage
[10:13:58] <logmsgbot>	 !log vgutierrez@cumin1002 END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_magru and A:cp
[10:14:22] <jinxer-wm>	 FIRING: [6x] ProbeDown: Service restbase2037-a:7000 has failed probes (tcp_cassandra_a_ssl_ip4)  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[10:14:42] <wikibugs>	 (03PS6) 10Bartosz Dziewoński: Replace favicon.php with static.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1075211 (https://phabricator.wikimedia.org/T374997)
[10:14:47] <icinga-wm>	 RECOVERY - Host restbase2037 is UP: PING OK - Packet loss = 0%, RTA = 30.27 ms
[10:16:05] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2220.codfw.wmnet with reason: host reimage
[10:16:44] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es1043 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P72061 and previous config saved to /var/cache/conftool/dbconfig/20250115-101643-root.json
[10:16:51] <jinxer-wm>	 FIRING: SwaggerProbeHasFailures: Not all openapi/swagger endpoints returned healthy - https://wikitech.wikimedia.org/wiki/Runbook#https://wikifeeds.svc.codfw.wmnet:4101 - https://grafana.wikimedia.org/d/_77ik484k/openapi-swagger-endpoint-state?var-site=codfw - https://alerts.wikimedia.org/?q=alertname%3DSwaggerProbeHasFailures
[10:17:09] <wikibugs>	 (03CR) 10Slyngshede: [C:03+2] Provide additional information about users [software/bitu] - 10https://gerrit.wikimedia.org/r/1110732 (https://phabricator.wikimedia.org/T383201) (owner: 10Slyngshede)
[10:17:35] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2221.codfw.wmnet with reason: host reimage
[10:18:07] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2222.codfw.wmnet with reason: host reimage
[10:19:11] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2223.codfw.wmnet with reason: host reimage
[10:21:24] <wikibugs>	 (03Merged) 10jenkins-bot: Provide additional information about users [software/bitu] - 10https://gerrit.wikimedia.org/r/1110732 (https://phabricator.wikimedia.org/T383201) (owner: 10Slyngshede)
[10:21:45] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2221.codfw.wmnet with reason: host reimage
[10:21:51] <jinxer-wm>	 RESOLVED: SwaggerProbeHasFailures: Not all openapi/swagger endpoints returned healthy - https://wikitech.wikimedia.org/wiki/Runbook#https://wikifeeds.svc.codfw.wmnet:4101 - https://grafana.wikimedia.org/d/_77ik484k/openapi-swagger-endpoint-state?var-site=codfw - https://alerts.wikimedia.org/?q=alertname%3DSwaggerProbeHasFailures
[10:22:08] <icinga-wm>	 PROBLEM - Router interfaces on cr2-codfw is CRITICAL: CRITICAL: No response from remote host 208.80.153.193 for 1.3.6.1.2.1.2.2.1.8 with snmp version 2 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[10:22:17] <jinxer-wm>	 RESOLVED: [6x] ProbeDown: Service restbase2037-a:7000 has failed probes (tcp_cassandra_a_ssl_ip4)  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[10:22:55] <wikibugs>	 (PS5) Klausman: admin/data: Add user for Georgios Kyziridis (ML Team) [puppet] - https://gerrit.wikimedia.org/r/1109414
[10:22:55] <wikibugs>	 (CR) Klausman: [V:+1] "Sorry for the wide approvers list." [puppet] - https://gerrit.wikimedia.org/r/1109414 (owner: Klausman)
[10:22:56] <icinga-wm>	 PROBLEM - Juniper alarms on cr2-codfw is CRITICAL: JNX_ALARMS CRITICAL - No response from remote host 208.80.153.193 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Juniper_alarm
[10:23:46] <icinga-wm>	 RECOVERY - Juniper alarms on cr2-codfw is OK: JNX_ALARMS OK - 0 red alarms, 0 yellow alarms https://wikitech.wikimedia.org/wiki/Network_monitoring%23Juniper_alarm
[10:24:00] <icinga-wm>	 RECOVERY - Router interfaces on cr2-codfw is OK: OK: host 208.80.153.193, interfaces up: 113, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[10:25:00] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2223.codfw.wmnet with reason: host reimage
[10:31:49] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es1043 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P72062 and previous config saved to /var/cache/conftool/dbconfig/20250115-103149-root.json
[10:32:19] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2222.codfw.wmnet with reason: host reimage
[10:32:27] <wikibugs>	 (CR) JMeybohm: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1110813 (https://phabricator.wikimedia.org/T341984) (owner: JMeybohm)
[10:35:10] <wikibugs>	 (PS1) Jelto: sre.k8s.renumber-node: change default os to bookworm [cookbooks] - https://gerrit.wikimedia.org/r/1111588 (https://phabricator.wikimedia.org/T341984)
[10:36:20] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2220.codfw.wmnet with OS bookworm
[10:41:30] <wikibugs>	 (CR) FNegri: [C:+2] Revert "Block PAWS workers nodes from all UDP traffic other than DNS & NTP" [puppet] - https://gerrit.wikimedia.org/r/1105036 (https://phabricator.wikimedia.org/T383261) (owner: FNegri)
[10:42:03] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2221.codfw.wmnet with OS bookworm
[10:44:51] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2223.codfw.wmnet with OS bookworm
[10:46:54] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es1043 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P72063 and previous config saved to /var/cache/conftool/dbconfig/20250115-104654-root.json
[10:52:50] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2222.codfw.wmnet with OS bookworm
[10:53:04] <icinga-wm>	 RECOVERY - BGP status on lsw1-d3-codfw.mgmt is OK: BGP OK - up: 26, down: 0, shutdown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[10:59:48] <wikibugs>	 (CR) Gkyziridis: "Thank you for working on this." [puppet] - https://gerrit.wikimedia.org/r/1109414 (owner: Klausman)
[11:00:05] <jouncebot>	 Deploy window MediaWiki infrastructure (UTC mid-day) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250115T1100)
[11:00:17] <wikibugs>	 (PS1) Marostegui: dbproxy2007.yaml: Replace m3 master [puppet] - https://gerrit.wikimedia.org/r/1111589 (https://phabricator.wikimedia.org/T373579)
[11:02:00] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es1043 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P72064 and previous config saved to /var/cache/conftool/dbconfig/20250115-110159-root.json
[11:03:05] <wikibugs>	 (CR) Marostegui: "root@cumin1002:~# host 10.192.31.6" [puppet] - https://gerrit.wikimedia.org/r/1111589 (https://phabricator.wikimedia.org/T373579) (owner: Marostegui)
[11:03:15] <wikibugs>	 (CR) JMeybohm: [C:+1] shellbox-video: scale down [deployment-charts] - https://gerrit.wikimedia.org/r/1109459 (https://phabricator.wikimedia.org/T383317) (owner: Hnowlan)
[11:04:15] <wikibugs>	 (CR) Marostegui: "# db-mysql db2234 -e "show databases"" [puppet] - https://gerrit.wikimedia.org/r/1111589 (https://phabricator.wikimedia.org/T373579) (owner: Marostegui)
[11:08:38] <wikibugs>	 (CR) Jcrespo: [C:+1] dbproxy2007.yaml: Replace m3 master [puppet] - https://gerrit.wikimedia.org/r/1111589 (https://phabricator.wikimedia.org/T373579) (owner: Marostegui)
[11:09:31] <wikibugs>	 (CR) Marostegui: [C:+2] dbproxy2007.yaml: Replace m3 master [puppet] - https://gerrit.wikimedia.org/r/1111589 (https://phabricator.wikimedia.org/T373579) (owner: Marostegui)
[11:12:51] <wikibugs>	 (PS1) Marostegui: db2234: Enable notifications [puppet] - https://gerrit.wikimedia.org/r/1111590 (https://phabricator.wikimedia.org/T373579)
[11:13:38] <wikibugs>	 (CR) Marostegui: [C:+2] db2234: Enable notifications [puppet] - https://gerrit.wikimedia.org/r/1111590 (https://phabricator.wikimedia.org/T373579) (owner: Marostegui)
[11:14:31] <wikibugs>	 (CR) Gkyziridis: [C:+1] admin/data: Add user for Georgios Kyziridis (ML Team) [puppet] - https://gerrit.wikimedia.org/r/1109414 (owner: Klausman)
[11:15:14] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[2220-2223].codfw.wmnet
[11:15:17] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[2220-2223].codfw.wmnet
[11:16:26] <wikibugs>	 ops-codfw, DC-Ops, Prod-Kubernetes, serviceops, Kubernetes: Relabel codfw kubernetes nodes - https://phabricator.wikimedia.org/T383764 (Jelto) NEW
[11:17:02] <wikibugs>	 (CR) FNegri: "Thanks for this patch! I left a couple comments inline." [alerts] - https://gerrit.wikimedia.org/r/1111338 (https://phabricator.wikimedia.org/T328502) (owner: Andrea Denisse)
[11:17:05] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es1043 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P72066 and previous config saved to /var/cache/conftool/dbconfig/20250115-111704-root.json
[11:18:38] <wikibugs>	 (PS1) Filippo Giunchedi: hieradata: move k8s-mlstaging to new port [puppet] - https://gerrit.wikimedia.org/r/1111593 (https://phabricator.wikimedia.org/T383223)
[11:19:20] <wikibugs>	 (CR) Bartosz Dziewoński: "They are URLs, but they are also paths to files in this repository. `wmfStaticParsePath` is for paths to files in `mediawiki/core`. I thin" [mediawiki-config] - https://gerrit.wikimedia.org/r/1075211 (https://phabricator.wikimedia.org/T374997) (owner: Bartosz Dziewoński)
[11:20:09] <wikibugs>	 (CR) Filippo Giunchedi: [V:+1 C:+2] "Done" [puppet] - https://gerrit.wikimedia.org/r/1108772 (https://phabricator.wikimedia.org/T371087) (owner: Filippo Giunchedi)
[11:21:52] <wikibugs>	 (CR) Tiziano Fogli: [C:+1] hieradata: move k8s-mlstaging to new port [puppet] - https://gerrit.wikimedia.org/r/1111593 (https://phabricator.wikimedia.org/T383223) (owner: Filippo Giunchedi)
[11:23:44] <wikibugs>	 (CR) Filippo Giunchedi: "I'd like to go ahead with this, acme-chief is the sole user ATM, what do you think Valentin ?" [puppet] - https://gerrit.wikimedia.org/r/1074409 (https://phabricator.wikimedia.org/T375271) (owner: Filippo Giunchedi)
[11:24:34] <wikibugs>	 ops-eqiad, SRE, DC-Ops: PDU sensor over limit - https://phabricator.wikimedia.org/T383383#10461706 (phaultfinder)
[11:25:43] <wikibugs>	 (CR) Vgutierrez: [C:+1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/1074409 (https://phabricator.wikimedia.org/T375271) (owner: Filippo Giunchedi)
[11:30:30] <jinxer-wm>	 FIRING: [2x] ProbeDown: Service wdqs1013:443 has failed probes (http_wdqs_external_sparql_endpoint_search_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#wdqs1013:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[11:31:38] <wikibugs>	 (PS7) Bartosz Dziewoński: Replace favicon.php with static.php [mediawiki-config] - https://gerrit.wikimedia.org/r/1075211 (https://phabricator.wikimedia.org/T374997)
[11:32:08] <wikibugs>	 (CR) Filippo Giunchedi: [C:-1] "See inline, also thanks for this !" [alerts] - https://gerrit.wikimedia.org/r/1111328 (https://phabricator.wikimedia.org/T328502) (owner: Andrea Denisse)
[11:32:10] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es1043 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P72067 and previous config saved to /var/cache/conftool/dbconfig/20250115-113210-root.json
[11:34:50] <wikibugs>	 (PS1) Jelto: Rename mw23[59|66|67|68] to wikikube-worker222[4-7] [puppet] - https://gerrit.wikimedia.org/r/1111597 (https://phabricator.wikimedia.org/T377877)
[11:38:05] <wikibugs>	 (CR) Filippo Giunchedi: [C:-1] "See inline, thank you for tackling this!" [alerts] - https://gerrit.wikimedia.org/r/1111338 (https://phabricator.wikimedia.org/T328502) (owner: Andrea Denisse)
[11:38:35] <wikibugs>	 (PS3) Filippo Giunchedi: uwsgi: remove icinga-based monitoring [puppet] - https://gerrit.wikimedia.org/r/1074409 (https://phabricator.wikimedia.org/T375271)
[11:42:22] <wikibugs>	 (PS1) Bartosz Dziewoński: Move Beta Cluster favicons to this repository [mediawiki-config] - https://gerrit.wikimedia.org/r/1111598
[11:46:00] <wikibugs>	 (PS7) Cathal Mooney: Spicerack: find true physical int if server primary IP is on a bridge [software/spicerack] - https://gerrit.wikimedia.org/r/1109037 (https://phabricator.wikimedia.org/T383207)
[11:47:56] <jinxer-wm>	 FIRING: [2x] ProbeDown: Service mirror1001:443 has failed probes (http_mirrors_wikimedia_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#mirror1001:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[11:51:11] <wikibugs>	 (PS1) KartikMistry: Update cxserver to 2025-01-15-103159-production [deployment-charts] - https://gerrit.wikimedia.org/r/1111601 (https://phabricator.wikimedia.org/T377966)
[11:53:04] <kart_>	 Quick deploying of cxserver, shouldn't take much time.
[11:53:42] <wikibugs>	 (CR) KartikMistry: [C:+2] Update cxserver to 2025-01-15-103159-production [deployment-charts] - https://gerrit.wikimedia.org/r/1111601 (https://phabricator.wikimedia.org/T377966) (owner: KartikMistry)
[11:54:45] <wikibugs>	 (CR) Volans: [C:+1] "LGTM, let's make sure to test it after deploy" [puppet] - https://gerrit.wikimedia.org/r/1091755 (https://phabricator.wikimedia.org/T379570) (owner: FNegri)
[11:54:50] <wikibugs>	 (Merged) jenkins-bot: Update cxserver to 2025-01-15-103159-production [deployment-charts] - https://gerrit.wikimedia.org/r/1111601 (https://phabricator.wikimedia.org/T377966) (owner: KartikMistry)
[11:56:00] <logmsgbot>	 !log kartik@deploy2002 helmfile [staging] START helmfile.d/services/cxserver: apply
[11:56:23] <logmsgbot>	 !log kartik@deploy2002 helmfile [staging] DONE helmfile.d/services/cxserver: apply
[11:57:15] <wikibugs>	 (CR) Volans: [C:+1] "Great, LGTM" [software/spicerack] - https://gerrit.wikimedia.org/r/1109037 (https://phabricator.wikimedia.org/T383207) (owner: Cathal Mooney)
[11:57:56] <jinxer-wm>	 RESOLVED: [2x] ProbeDown: Service mirror1001:443 has failed probes (http_mirrors_wikimedia_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#mirror1001:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[11:58:14] <logmsgbot>	 !log kartik@deploy2002 helmfile [codfw] START helmfile.d/services/cxserver: apply
[11:58:43] <logmsgbot>	 !log kartik@deploy2002 helmfile [codfw] DONE helmfile.d/services/cxserver: apply
[11:59:10] <logmsgbot>	 !log kartik@deploy2002 helmfile [eqiad] START helmfile.d/services/cxserver: apply
[11:59:43] <logmsgbot>	 !log kartik@deploy2002 helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
[11:59:49] <wikibugs>	 SRE, Infrastructure-Foundations, netops: WMF RIPE Atlas probe in Eqiad offline - https://phabricator.wikimedia.org/T382518#10461757 (cmooney) >>! In T382518#10455949, @VRiley-WMF wrote: > This has been rebooted >  > @cmooney would you be able to check this when you have a chance?  Thanks for doing th...
[12:00:05] <jouncebot>	 mvolz: I, the Bot under the Fountain, call upon thee, The Deployer, to do Services – Citoid / Zotero deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250115T1200).
[12:02:09] <kart_>	 !log Updated cxserver to 2025-01-15-103159-production (T377966)
[12:02:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:02:13] <stashbot>	 T377966: Make cxserver Logstash logs readable and reliable - https://phabricator.wikimedia.org/T377966
[12:02:30] <wikibugs>	 (PS1) Stevemunene: Add FactGrid to WDQS allowlist [puppet] - https://gerrit.wikimedia.org/r/1111605 (https://phabricator.wikimedia.org/T381649)
[12:02:32] <wikibugs>	 (PS1) Stevemunene: Add api.finto.fi/sparql to Wikidata query service and WCQS whitelist [puppet] - https://gerrit.wikimedia.org/r/1111606 (https://phabricator.wikimedia.org/T378561)
[12:02:33] <wikibugs>	 (PS1) Stevemunene: whitelist kg.kunsten.be on wikidata query service [puppet] - https://gerrit.wikimedia.org/r/1111607 (https://phabricator.wikimedia.org/T380984)
[12:05:29] <jinxer-wm>	 FIRING: [2x] SystemdUnitFailed: httpbb_kubernetes_mw-wikifunctions_hourly.service on cumin1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[12:06:34] <wikibugs>	 (PS8) Cathal Mooney: Spicerack: find true physical int if server primary IP is on a bridge [software/spicerack] - https://gerrit.wikimedia.org/r/1109037 (https://phabricator.wikimedia.org/T383207)
[12:08:03] <wikibugs>	 (CR) Jelto: [C:+1] "lgtm now" [puppet] - https://gerrit.wikimedia.org/r/1110813 (https://phabricator.wikimedia.org/T341984) (owner: JMeybohm)
[12:09:09] <wikibugs>	 (CR) Gergő Tisza: "Hm, you are right. Maybe something to do with how the entry point is accessed via a symlink under `docroot/`? Although in theory `__DIR__`" [mediawiki-config] - https://gerrit.wikimedia.org/r/1075211 (https://phabricator.wikimedia.org/T374997) (owner: Bartosz Dziewoński)
[15:22:02] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2225.codfw.wmnet with OS bookworm
[15:22:42] <wikibugs>	 (PS3) Muehlenhoff: Make maps-test2001 a bookworm maps master node (WIP) [puppet] - https://gerrit.wikimedia.org/r/1111634 (https://phabricator.wikimedia.org/T381565)
[15:23:36] <jelto>	 !log homer 'lsw1-d3-codfw*' commit 'T377877'
[15:23:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:23:40] <stashbot>	 T377877: Migrate wikikube-codfw to containerd - https://phabricator.wikimedia.org/T377877
[15:24:24] <wikibugs>	 (Merged) jenkins-bot: Upstream release v9.1.0 [software/spicerack] (debian) - https://gerrit.wikimedia.org/r/1111644 (owner: Volans)
[15:25:07] <wikibugs>	 (Abandoned) Hnowlan: similar-users: make max queries per account configurable [deployment-charts] - https://gerrit.wikimedia.org/r/808923 (https://phabricator.wikimedia.org/T310646) (owner: Hnowlan)
[15:25:49] <jelto>	 !log homer 'lsw1-c6-codfw*' commit 'T377877'
[15:25:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:26:25] <wikibugs>	 (CR) Btullis: [C:+1] airflow: define pod templates enabling creating Pods from a task [deployment-charts] - https://gerrit.wikimedia.org/r/1111619 (https://phabricator.wikimedia.org/T383430) (owner: Brouberol)
[15:26:57] <jelto>	 !log homer 'cr*codfw*' commit 'T377877'
[15:27:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:27:34] <wikibugs>	 (CR) Muehlenhoff: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1111634 (https://phabricator.wikimedia.org/T381565) (owner: Muehlenhoff)
[15:27:45] <icinga-wm>	 RECOVERY - BGP status on cr1-codfw is OK: BGP OK - up: 96, down: 0, shutdown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[15:28:52] <volans>	 !log uploaded spicerack_9.1.0 to apt.wikimedia.org bullseye-wikimedia
[15:28:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:29:37] <wikibugs>	 (CR) Brouberol: [C:+2] airflow: define pod templates enabling creating Pods from a task [deployment-charts] - https://gerrit.wikimedia.org/r/1111619 (https://phabricator.wikimedia.org/T383430) (owner: Brouberol)
[15:30:05] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.reimage for host wikikube-worker2192.codfw.wmnet with OS bookworm
[15:30:09] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.move-vlan for host wikikube-worker2192
[15:30:09] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker2192
[15:30:33] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[2224-2227].codfw.wmnet
[15:30:36] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[2224-2227].codfw.wmnet
[15:31:12] <wikibugs>	 ops-codfw, SRE, DC-Ops, Prod-Kubernetes, and 2 others: Relabel codfw kubernetes nodes - https://phabricator.wikimedia.org/T383764#10462745 (Jelto)
[15:31:42] <jinxer-wm>	 FIRING: JobUnavailable: Reduced availability for job routinator in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[15:32:59] <logmsgbot>	 !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
[15:33:40] <logmsgbot>	 !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
[15:35:24] <wikibugs>	 ops-codfw, SRE, SRE-swift-storage, DC-Ops: Frequent disk resets on ms-be2075 - https://phabricator.wikimedia.org/T382707#10462767 (Jhancock.wm) this is what they sent me   Steps on how to generate the SOS report:   The 'sos' package provides the sos report command, which is typically installed by...
[15:35:50] <wikibugs>	 (PS1) Jelto: Rename mw235[4-7] to wikikube-worker22[28-31] [puppet] - https://gerrit.wikimedia.org/r/1111646 (https://phabricator.wikimedia.org/T377877)
[15:36:21] <wikibugs>	 (PS1) Muehlenhoff: Remove profile::java from maps hosts [puppet] - https://gerrit.wikimedia.org/r/1111647 (https://phabricator.wikimedia.org/T381565)
[15:36:53] <wikibugs>	 (PS1) Kamila Součková: kubernetes: rename mw142[1-5] -> kubernetes-worker110[2-6] [puppet] - https://gerrit.wikimedia.org/r/1111648 (https://phabricator.wikimedia.org/T377876)
[15:36:54] <moritzm>	 !log installing python-django security updates
[15:36:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:40:13] <wikibugs>	 (CR) Jelto: [C:+1] "lgtm 👍" [puppet] - https://gerrit.wikimedia.org/r/1111648 (https://phabricator.wikimedia.org/T377876) (owner: Kamila Součková)
[15:40:50] <wikibugs>	 (CR) Muehlenhoff: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1111647 (https://phabricator.wikimedia.org/T381565) (owner: Muehlenhoff)
[15:40:53] <logmsgbot>	 !log kamila@cumin1002 START - Cookbook sre.k8s.pool-depool-node depool for host mw[1421-1425].eqiad.wmnet
[15:41:05] <wikibugs>	 (CR) Kamila Součková: [C:+2] kubernetes: rename mw142[1-5] -> kubernetes-worker110[2-6] [puppet] - https://gerrit.wikimedia.org/r/1111648 (https://phabricator.wikimedia.org/T377876) (owner: Kamila Součková)
[15:41:55] <wikibugs>	 (CR) Elukey: [C:+1] "I was confused at first, but I see that profile::java it is not even imported in the maps role:" [puppet] - https://gerrit.wikimedia.org/r/1111647 (https://phabricator.wikimedia.org/T381565) (owner: Muehlenhoff)
[15:43:19] <logmsgbot>	 !log root@cumin1002 START - Cookbook sre.puppet.renew-cert for dbprov1004.eqiad.wmnet: Renew puppet certificate - root@cumin1002
[15:43:41] <logmsgbot>	 !log kamila@cumin1002 END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host mw[1421-1425].eqiad.wmnet
[15:46:07] <logmsgbot>	 !log root@cumin1002 END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for dbprov1004.eqiad.wmnet: Renew puppet certificate - root@cumin1002
[15:46:07] <logmsgbot>	 !log kamila@cumin1002 START - Cookbook sre.hosts.rename from mw1421 to wikikube-worker1102
[15:46:17] <logmsgbot>	 !log root@cumin1002 START - Cookbook sre.puppet.renew-cert for dbprov1005.eqiad.wmnet: Renew puppet certificate - root@cumin1002
[15:46:27] <logmsgbot>	 !log kamila@cumin1002 START - Cookbook sre.dns.netbox
[15:46:34] <wikibugs>	 (PS1) Marostegui: mariadb: Remove db2130 [puppet] - https://gerrit.wikimedia.org/r/1111650 (https://phabricator.wikimedia.org/T383766)
[15:46:42] <jinxer-wm>	 RESOLVED: JobUnavailable: Reduced availability for job routinator in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[15:47:18] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.decommission for hosts db2130.codfw.wmnet
[15:47:29] <wikibugs>	 (CR) Marostegui: [C:+2] mariadb: Remove db2130 [puppet] - https://gerrit.wikimedia.org/r/1111650 (https://phabricator.wikimedia.org/T383766) (owner: Marostegui)
[15:48:53] <logmsgbot>	 !log root@cumin1002 END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for dbprov1005.eqiad.wmnet: Renew puppet certificate - root@cumin1002
[15:49:57] <icinga-wm>	 PROBLEM - BGP status on cr2-eqiad is CRITICAL: BGP CRITICAL - AS64601/IPv6: Active - kubernetes-eqiad, AS64601/IPv4: Active - kubernetes-eqiad, AS64601/IPv6: Active - kubernetes-eqiad, AS64601/IPv6: Active - kubernetes-eqiad, AS64601/IPv4: Active - kubernetes-eqiad, AS64601/IPv6: Active - kubernetes-eqiad, AS64601/IPv4: Active - kubernetes-eqiad, AS64601/IPv4: Active - kubernetes-eqiad, AS64601/IPv4: Active - kubernetes-eqiad, AS64601/IPv
[15:49:57] <icinga-wm>	 e - kubernetes-eqiad https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[15:50:04] <kamila_>	 marostegui: caught your db2130 removal in netbox cookbook, proceeding
[15:50:07] <icinga-wm>	 PROBLEM - BGP status on cr1-eqiad is CRITICAL: BGP CRITICAL - AS64601/IPv4: Active - kubernetes-eqiad, AS64601/IPv6: Active - kubernetes-eqiad, AS64601/IPv6: Active - kubernetes-eqiad, AS64601/IPv4: Active - kubernetes-eqiad, AS64601/IPv4: Active - kubernetes-eqiad, AS64601/IPv4: Active - kubernetes-eqiad, AS64601/IPv6: Active - kubernetes-eqiad, AS64601/IPv6: Active - kubernetes-eqiad, AS64601/IPv4: Active - kubernetes-eqiad, AS64601/IPv
[15:50:07] <icinga-wm>	 e - kubernetes-eqiad https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[15:50:22] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2192.codfw.wmnet with reason: host reimage
[15:51:08] <logmsgbot>	 !log kamila@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1421 to wikikube-worker1102 - kamila@cumin1002"
[15:51:46] <logmsgbot>	 !log kamila@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1421 to wikikube-worker1102 - kamila@cumin1002"
[15:51:46] <logmsgbot>	 !log kamila@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[15:51:46] <logmsgbot>	 !log kamila@cumin1002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1102
[15:51:55] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.dns.netbox
[15:52:25] <logmsgbot>	 !log kamila@cumin1002 START - Cookbook sre.hosts.rename from mw1422 to wikikube-worker1103
[15:53:11] <moritzm>	 !log installing libsoup2.4 security updates
[15:53:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:53:17] <logmsgbot>	 !log kamila@cumin1002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1102
[15:53:56] <logmsgbot>	 !log kamila@cumin1002 END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw1421 to wikikube-worker1102
[15:54:04] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2192.codfw.wmnet with reason: host reimage
[15:54:13] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[15:54:14] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2130.codfw.wmnet
[15:54:19] <logmsgbot>	 !log kamila@cumin1002 START - Cookbook sre.dns.netbox
[15:56:33] <wikibugs>	 ops-codfw, SRE, SRE-swift-storage, DC-Ops: Frequent disk resets on ms-be2075 - https://phabricator.wikimedia.org/T382707#10462908 (MatthewVernon) Hi, yes, those are Red-Hat specific instructions. On Debian & Ubuntu one has to install the sosreport package. Unfortunately, the root filesystem is no...
[15:56:43] <wikibugs>	 (PS10) Tiziano Fogli: thanos-rule: manage retention setting [puppet] - https://gerrit.wikimedia.org/r/1111599 (https://phabricator.wikimedia.org/T352756)
[15:57:11] <logmsgbot>	 !log kamila@cumin1002 START - Cookbook sre.hosts.rename from mw1423 to wikikube-worker1104
[15:58:06] <logmsgbot>	 !log kamila@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1422 to wikikube-worker1103 - kamila@cumin1002"
[15:58:27] <logmsgbot>	 !log kamila@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1422 to wikikube-worker1103 - kamila@cumin1002"
[15:58:27] <logmsgbot>	 !log kamila@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[15:58:27] <logmsgbot>	 !log kamila@cumin1002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1103
[15:58:46] <logmsgbot>	 !log kamila@cumin1002 START - Cookbook sre.dns.netbox
[15:59:04] <wikibugs>	 ops-codfw, DBA, DC-Ops, decommission-hardware: decommission db2130.codfw.wmnet - https://phabricator.wikimedia.org/T383766#10462916 (Marostegui) a:Marostegui→None
[15:59:40] <jinxer-wm>	 FIRING: [2x] KubernetesRsyslogDown: rsyslog on mw1424:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[15:59:42] <logmsgbot>	 !log kamila@cumin1002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1103
[15:59:51] <logmsgbot>	 !log jynus@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on dbprov1006.eqiad.wmnet with reason: os upgrade
[16:00:06] <logmsgbot>	 !log jynus@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbprov1006.eqiad.wmnet with reason: os upgrade
[16:00:21] <logmsgbot>	 !log kamila@cumin1002 END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw1422 to wikikube-worker1103
[16:00:33] <logmsgbot>	 !log jhancock@cumin2002 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd2004-dev.codfw.wmnet with OS bullseye
[16:00:39] <wikibugs>	 ops-codfw, SRE, DC-Ops, cloud-services-team (Hardware): Q2:rack/setup/install cloudcephosd2004-dev - https://phabricator.wikimedia.org/T378825#10462926 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin2002 for host cloudcephosd2004-dev.codfw.wmnet with OS bullsey...
[16:01:08] <wikibugs>	 ops-codfw, DBA, DC-Ops, decommission-hardware: decommission db2130.codfw.wmnet - https://phabricator.wikimedia.org/T383766#10462928 (Marostegui) This is ready for #dc-ops
[16:01:40] <logmsgbot>	 !log kamila@cumin1002 START - Cookbook sre.hosts.rename from mw1424 to wikikube-worker1105
[16:02:17] <wikibugs>	 (CR) Tiziano Fogli: "Thank you for the hints. Have a look at the comments to see if they're clear enough." [puppet] - https://gerrit.wikimedia.org/r/1111599 (https://phabricator.wikimedia.org/T352756) (owner: Tiziano Fogli)
[16:02:29] <logmsgbot>	 !log kamila@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1423 to wikikube-worker1104 - kamila@cumin1002"
[16:02:49] <logmsgbot>	 !log kamila@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1423 to wikikube-worker1104 - kamila@cumin1002"
[16:02:49] <logmsgbot>	 !log kamila@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[16:02:49] <logmsgbot>	 !log kamila@cumin1002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1104
[16:02:51] <logmsgbot>	 !log kamila@cumin1002 START - Cookbook sre.dns.netbox
[16:04:06] <logmsgbot>	 !log kamila@cumin1002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1104
[16:04:45] <logmsgbot>	 !log kamila@cumin1002 END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw1423 to wikikube-worker1104
[16:05:08] <wikibugs>	 (CR) Volans: "recheck" [cookbooks] - https://gerrit.wikimedia.org/r/1111236 (owner: Ssingh)
[16:05:29] <jinxer-wm>	 FIRING: [2x] SystemdUnitFailed: httpbb_kubernetes_mw-wikifunctions_hourly.service on cumin1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[16:06:57] <logmsgbot>	 !log kamila@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1424 to wikikube-worker1105 - kamila@cumin1002"
[16:07:01] <logmsgbot>	 !log kamila@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1424 to wikikube-worker1105 - kamila@cumin1002"
[16:07:01] <logmsgbot>	 !log kamila@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[16:07:02] <logmsgbot>	 !log kamila@cumin1002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1105
[16:08:27] <logmsgbot>	 !log kamila@cumin1002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1105
[16:08:41] <wikibugs>	 (PS1) Marostegui: rebuild_tables.sh: Add start and finish time [software] - https://gerrit.wikimedia.org/r/1111653 (https://phabricator.wikimedia.org/T382842)
[16:08:47] <logmsgbot>	 !log kamila@cumin1002 START - Cookbook sre.hosts.rename from mw1425 to wikikube-worker1106
[16:09:06] <logmsgbot>	 !log kamila@cumin1002 END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw1424 to wikikube-worker1105
[16:09:07] <logmsgbot>	 !log kamila@cumin1002 START - Cookbook sre.dns.netbox
[16:09:53] <wikibugs>	 (CR) Marostegui: [C:+2] rebuild_tables.sh: Add start and finish time [software] - https://gerrit.wikimedia.org/r/1111653 (https://phabricator.wikimedia.org/T382842) (owner: Marostegui)
[16:10:22] <wikibugs>	 (Merged) jenkins-bot: rebuild_tables.sh: Add start and finish time [software] - https://gerrit.wikimedia.org/r/1111653 (https://phabricator.wikimedia.org/T382842) (owner: Marostegui)
[16:11:16] <wikibugs>	 (CR) Muehlenhoff: "Yeah, this is just a leftover Hiera config I noticed when preparing a separate maps_bookworm role" [puppet] - https://gerrit.wikimedia.org/r/1111647 (https://phabricator.wikimedia.org/T381565) (owner: Muehlenhoff)
[16:11:17] <wikibugs>	 (CR) Muehlenhoff: [C:+2] Remove profile::java from maps hosts [puppet] - https://gerrit.wikimedia.org/r/1111647 (https://phabricator.wikimedia.org/T381565) (owner: Muehlenhoff)
[16:11:53] <wikibugs>	 (CR) Kevin Bazira: "thank you for working on this, Georgios. we shall proceed with this patch after we've fixed the issue of CI/CD not building the new predic" [deployment-charts] - https://gerrit.wikimedia.org/r/1111580 (https://phabricator.wikimedia.org/T383312) (owner: Gkyziridis)
[16:13:50] <logmsgbot>	 !log kamila@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1425 to wikikube-worker1106 - kamila@cumin1002"
[16:14:22] <logmsgbot>	 !log kamila@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1425 to wikikube-worker1106 - kamila@cumin1002"
[16:14:22] <logmsgbot>	 !log kamila@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[16:14:23] <logmsgbot>	 !log kamila@cumin1002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1106
[16:14:42] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jelto@cumin1002"
[16:15:10] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jelto@cumin1002"
[16:15:11] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2192.codfw.wmnet with OS bookworm
[16:16:04] <logmsgbot>	 !log volans@cumin2002 START - Cookbook sre.dns.netbox
[16:16:22] <logmsgbot>	 !log volans@cumin2002 END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
[16:16:54] <logmsgbot>	 !log kamila@cumin1002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1106
[16:17:25] <jelto>	 !log homer 'lsw1-d8-codfw*' commit 'T377877'
[16:17:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:17:28] <stashbot>	 T377877: Migrate wikikube-codfw to containerd - https://phabricator.wikimedia.org/T377877
[16:17:33] <logmsgbot>	 !log kamila@cumin1002 END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw1425 to wikikube-worker1106
[16:17:39] <logmsgbot>	 !log kamila@cumin1002 START - Cookbook sre.dns.wipe-cache wikikube-worker1102.eqiad.wmnet wikikube-worker1103.eqiad.wmnet wikikube-worker1104.eqiad.wmnet wikikube-worker1105.eqiad.wmnet wikikube-worker1106.eqiad.wmnet on all recursors
[16:17:42] <logmsgbot>	 !log kamila@cumin1002 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1102.eqiad.wmnet wikikube-worker1103.eqiad.wmnet wikikube-worker1104.eqiad.wmnet wikikube-worker1105.eqiad.wmnet wikikube-worker1106.eqiad.wmnet on all recursors
[16:18:35] <wikibugs>	 (PS1) Volans: tests: accept unowned as a valid owner [cookbooks] - https://gerrit.wikimedia.org/r/1111655
[16:19:02] <wikibugs>	 (CR) Volans: [C:+2] "merging to unblock cookbook patches" [cookbooks] - https://gerrit.wikimedia.org/r/1111655 (owner: Volans)
[16:20:01] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2192.codfw.wmnet
[16:20:03] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2192.codfw.wmnet
[16:20:41] <logmsgbot>	 !log kamila@cumin1002 START - Cookbook sre.hosts.reimage for host wikikube-worker1102.eqiad.wmnet with OS bookworm
[16:20:44] <logmsgbot>	 !log kamila@cumin1002 START - Cookbook sre.hosts.move-vlan for host wikikube-worker1102
[16:20:44] <logmsgbot>	 !log kamila@cumin1002 END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1102
[16:20:51] <logmsgbot>	 !log kamila@cumin1002 START - Cookbook sre.hosts.reimage for host wikikube-worker1103.eqiad.wmnet with OS bookworm
[16:20:54] <logmsgbot>	 !log kamila@cumin1002 START - Cookbook sre.hosts.move-vlan for host wikikube-worker1103
[16:20:54] <logmsgbot>	 !log kamila@cumin1002 END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1103
[16:20:58] <logmsgbot>	 !log kamila@cumin1002 START - Cookbook sre.hosts.reimage for host wikikube-worker1104.eqiad.wmnet with OS bookworm
[16:21:01] <logmsgbot>	 !log kamila@cumin1002 START - Cookbook sre.hosts.move-vlan for host wikikube-worker1104
[16:21:01] <logmsgbot>	 !log kamila@cumin1002 END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1104
[16:21:03] <logmsgbot>	 !log kamila@cumin1002 START - Cookbook sre.hosts.reimage for host wikikube-worker1105.eqiad.wmnet with OS bookworm
[16:21:06] <logmsgbot>	 !log kamila@cumin1002 START - Cookbook sre.hosts.move-vlan for host wikikube-worker1105
[16:21:06] <logmsgbot>	 !log kamila@cumin1002 END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1105
[16:21:08] <wikibugs>	 (PS1) Jcrespo: dbbackups: Migrate db2139 backup generation to db2239 [puppet] - https://gerrit.wikimedia.org/r/1111656 (https://phabricator.wikimedia.org/T373579)
[16:21:10] <logmsgbot>	 !log kamila@cumin1002 START - Cookbook sre.hosts.reimage for host wikikube-worker1106.eqiad.wmnet with OS bookworm
[16:21:13] <logmsgbot>	 !log kamila@cumin1002 START - Cookbook sre.hosts.move-vlan for host wikikube-worker1106
[16:21:14] <logmsgbot>	 !log kamila@cumin1002 END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1106
[16:22:16] <wikibugs>	 ops-codfw, SRE, DC-Ops, Kubernetes: hw troubleshooting: Comm Error: backplane 0 for wikikube-worker2192.codfw.wmnet - https://phabricator.wikimedia.org/T383339#10463043 (Jelto) Thanks @Jhancock.wm for handling this hardware issue. The host is up and a reimage was successful.   I added the hos...
[16:23:24] <wikibugs>	 (PS2) Jcrespo: dbbackups: Migrate db2139 backup generation to db2239 [puppet] - https://gerrit.wikimedia.org/r/1111656 (https://phabricator.wikimedia.org/T373579)
[16:24:20] <wikibugs>	 (CR) Jcrespo: [C:-1] "Do not merge yet, data is not ready (rebuilding tables)." [puppet] - https://gerrit.wikimedia.org/r/1111656 (https://phabricator.wikimedia.org/T373579) (owner: Jcrespo)
[16:24:33] <wikibugs>	 (PS3) Ssingh: sre.dns.admin: update show to use CookbookInitSuccess [cookbooks] - https://gerrit.wikimedia.org/r/1111236
[16:24:36] <wikibugs>	 ops-eqiad, SRE, DC-Ops: PDU sensor over limit - https://phabricator.wikimedia.org/T383383#10463065 (phaultfinder)
[16:27:24] <wikibugs>	 (PS1) Muehlenhoff: Add separate maps master/replica roles for the new Bookworm setup (WIP) [puppet] - https://gerrit.wikimedia.org/r/1111659 (https://phabricator.wikimedia.org/T381565)
[16:27:45] <wikibugs>	 (CR) CI reject: [V:-1] Add separate maps master/replica roles for the new Bookworm setup (WIP) [puppet] - https://gerrit.wikimedia.org/r/1111659 (https://phabricator.wikimedia.org/T381565) (owner: Muehlenhoff)
[16:29:39] <wikibugs>	 (CR) Herron: [C:+1] "Appreciate them thanks!" [puppet] - https://gerrit.wikimedia.org/r/1111599 (https://phabricator.wikimedia.org/T352756) (owner: Tiziano Fogli)
[16:31:08] <wikibugs>	 (PS1) Volans: sre.hosts.downtime: skip START log to SAL [cookbooks] - https://gerrit.wikimedia.org/r/1111660
[16:31:38] <wikibugs>	 (PS2) Muehlenhoff: Add separate maps master/replica roles for the new Bookworm setup (WIP) [puppet] - https://gerrit.wikimedia.org/r/1111659 (https://phabricator.wikimedia.org/T381565)
[16:32:25] <wikibugs>	 (CR) Volans: "Using this cookbook as beta-tester for this new feature as it's almost always a quick one." [cookbooks] - https://gerrit.wikimedia.org/r/1111660 (owner: Volans)
[16:33:51] <wikibugs>	 ops-eqiad, cloud-services-team, DC-Ops: Temperature Inlet Temp issue on clouddumps1001:9290 - https://phabricator.wikimedia.org/T383723#10463105 (Andrew) DC people, is this anything?  This same alert has popped up a few times in the last few days.
[16:33:51] <wikibugs>	 (CR) Volans: [C:+1] "Sukhbir, I'll ping you once the new release is deployed to all cumin hosts and this can be merged." [cookbooks] - https://gerrit.wikimedia.org/r/1111236 (owner: Ssingh)
[16:33:55] <wikibugs>	 (PS3) Muehlenhoff: Add separate maps master/replica roles for the new Bookworm setup [puppet] - https://gerrit.wikimedia.org/r/1111659 (https://phabricator.wikimedia.org/T381565)
[16:34:12] <wikibugs>	 (CR) Ssingh: "No worries and thanks!" [cookbooks] - https://gerrit.wikimedia.org/r/1111236 (owner: Ssingh)
[16:34:50] <wikibugs>	 (CR) Elukey: [C:+1] sre.hosts.downtime: skip START log to SAL [cookbooks] - https://gerrit.wikimedia.org/r/1111660 (owner: Volans)
[16:36:42] <logmsgbot>	 !log kamila@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1104.eqiad.wmnet with reason: host reimage
[16:36:52] <logmsgbot>	 !log kamila@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1105.eqiad.wmnet with reason: host reimage
[16:39:41] <wikibugs>	 (CR) DLynch: "I mean, I imagine I would use it, though I'd need to go educate myself a bit first." [puppet] - https://gerrit.wikimedia.org/r/1110867 (owner: CDanis)
[16:40:34] <logmsgbot>	 !log kamila@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1104.eqiad.wmnet with reason: host reimage
[16:41:43] <wikibugs>	 (PS3) Anzx: Add dso and thq to wmgExtraLanguageNames [mediawiki-config] - https://gerrit.wikimedia.org/r/1111661 (https://phabricator.wikimedia.org/T383785)
[16:43:29] <logmsgbot>	 !log kamila@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1105.eqiad.wmnet with reason: host reimage
[16:44:42] <wikibugs>	 ops-eqiad, DC-Ops, cloud-services-team (Hardware): Temperature Inlet Temp issue on clouddumps1001:9290 - https://phabricator.wikimedia.org/T383723#10463156 (fnegri)
[16:52:33] <logmsgbot>	 !log kamila@cumin1002 END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-worker1102.eqiad.wmnet with OS bookworm
[16:52:56] <logmsgbot>	 !log kamila@cumin1002 START - Cookbook sre.hosts.reimage for host wikikube-worker1102.eqiad.wmnet with OS bookworm
[16:52:59] <logmsgbot>	 !log kamila@cumin1002 START - Cookbook sre.hosts.move-vlan for host wikikube-worker1102
[16:52:59] <logmsgbot>	 !log kamila@cumin1002 END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1102
[16:53:04] <logmsgbot>	 !log kamila@cumin1002 END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-worker1103.eqiad.wmnet with OS bookworm
[16:53:18] <logmsgbot>	 !log kamila@cumin1002 START - Cookbook sre.hosts.reimage for host wikikube-worker1103.eqiad.wmnet with OS bookworm
[16:53:20] <logmsgbot>	 !log kamila@cumin1002 END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-worker1106.eqiad.wmnet with OS bookworm
[16:53:21] <logmsgbot>	 !log kamila@cumin1002 START - Cookbook sre.hosts.move-vlan for host wikikube-worker1103
[16:53:22] <logmsgbot>	 !log kamila@cumin1002 END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1103
[16:53:34] <logmsgbot>	 !log kamila@cumin1002 START - Cookbook sre.hosts.reimage for host wikikube-worker1106.eqiad.wmnet with OS bookworm
[16:53:38] <logmsgbot>	 !log kamila@cumin1002 START - Cookbook sre.hosts.move-vlan for host wikikube-worker1106
[16:53:38] <logmsgbot>	 !log kamila@cumin1002 END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1106
[16:54:31] <wikibugs>	 (CR) Eevans: [C:+2] cassandra: set target_dev to 4.x (no-op) [puppet] - https://gerrit.wikimedia.org/r/1109768 (https://phabricator.wikimedia.org/T380420) (owner: Eevans)
[16:54:35] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db1163 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P72074 and previous config saved to /var/cache/conftool/dbconfig/20250115-165434-root.json
[16:54:48] <wikibugs>	 (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Thursday, January 16 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-" [mediawiki-config] - https://gerrit.wikimedia.org/r/1111350 (https://phabricator.wikimedia.org/T380846) (owner: Chlod Alejandro)
[16:55:28] <wikibugs>	 (PS1) Marostegui: db1163: Enable notifications [puppet] - https://gerrit.wikimedia.org/r/1111664
[16:56:24] <logmsgbot>	 !log andrew@cumin1002 START - Cookbook sre.hosts.reimage for host cloudcephosd1012.eqiad.wmnet with OS bookworm
[16:57:24] <wikibugs>	 (CR) Jsn.sherman: [C:+1] Increase Nuke max age to 90 days (attempt 2) [mediawiki-config] - https://gerrit.wikimedia.org/r/1111350 (https://phabricator.wikimedia.org/T380846) (owner: Chlod Alejandro)
[16:58:33] <logmsgbot>	 !log eevans@cumin1002 START - Cookbook sre.cassandra.roll-restart for nodes matching A:cassandra-dev: Upgrading to Cassandra 4.1.7 — T380420 - eevans@cumin1002
[16:58:37] <stashbot>	 T380420: Upgrade Cassandra clusters to v4.1.7 - https://phabricator.wikimedia.org/T380420
[16:58:58] <wikibugs>	 (CR) Marostegui: [C:+2] db1163: Enable notifications [puppet] - https://gerrit.wikimedia.org/r/1111664 (owner: Marostegui)
[16:59:39] <logmsgbot>	 !log kamila@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1104.eqiad.wmnet with OS bookworm
[17:01:59] <logmsgbot>	 !log kamila@cumin1002 END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-worker1103.eqiad.wmnet with OS bookworm
[17:02:07] <logmsgbot>	 !log kamila@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1105.eqiad.wmnet with OS bookworm
[17:02:17] <logmsgbot>	 !log kamila@cumin1002 END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-worker1106.eqiad.wmnet with OS bookworm
[17:02:36] <logmsgbot>	 !log kamila@cumin1002 END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-worker1102.eqiad.wmnet with OS bookworm
[17:04:48] <logmsgbot>	 !log andrew@cumin1002 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1012.eqiad.wmnet with OS bookworm
[17:05:09] <logmsgbot>	 !log kamila@cumin1002 START - Cookbook sre.hosts.reimage for host wikikube-worker1105.eqiad.wmnet with OS bookworm
[17:05:21] <logmsgbot>	 !log kamila@cumin1002 START - Cookbook sre.hosts.reimage for host wikikube-worker1106.eqiad.wmnet with OS bookworm
[17:05:25] <logmsgbot>	 !log kamila@cumin1002 START - Cookbook sre.hosts.move-vlan for host wikikube-worker1106
[17:05:25] <logmsgbot>	 !log kamila@cumin1002 END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1106
[17:05:35] <logmsgbot>	 !log kamila@cumin1002 START - Cookbook sre.hosts.move-vlan for host wikikube-worker1105
[17:05:35] <logmsgbot>	 !log kamila@cumin1002 END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1105
[17:05:39] <logmsgbot>	 !log kamila@cumin1002 START - Cookbook sre.hosts.reimage for host wikikube-worker1102.eqiad.wmnet with OS bookworm
[17:05:42] <logmsgbot>	 !log kamila@cumin1002 START - Cookbook sre.hosts.move-vlan for host wikikube-worker1102
[17:05:42] <logmsgbot>	 !log kamila@cumin1002 END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1102
[17:05:53] <logmsgbot>	 !log andrew@cumin1002 START - Cookbook sre.hosts.reimage for host cloudcephosd1012.eqiad.wmnet with OS bookworm
[17:06:22] <logmsgbot>	 !log kamila@cumin1002 START - Cookbook sre.hosts.reimage for host wikikube-worker1103.eqiad.wmnet with OS bookworm
[17:06:25] <logmsgbot>	 !log kamila@cumin1002 START - Cookbook sre.hosts.move-vlan for host wikikube-worker1103
[17:06:25] <logmsgbot>	 !log kamila@cumin1002 END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1103
[17:06:48] <wikibugs>	 (CR) Andrea Denisse: [C:+1] "LGTM, thank you!" [cookbooks] - https://gerrit.wikimedia.org/r/1111660 (owner: Volans)
[17:09:40] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db1163 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P72075 and previous config saved to /var/cache/conftool/dbconfig/20250115-170940-root.json
[17:12:13] <wikibugs>	 (CR) Kamila Součková: [C:+1] Rename mw235[4-7] to wikikube-worker22[28-31] [puppet] - https://gerrit.wikimedia.org/r/1111646 (https://phabricator.wikimedia.org/T377877) (owner: Jelto)
[17:18:44] <logmsgbot>	 !log eevans@cumin1002 END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:cassandra-dev: Upgrading to Cassandra 4.1.7 — T380420 - eevans@cumin1002
[17:18:48] <stashbot>	 T380420: Upgrade Cassandra clusters to v4.1.7 - https://phabricator.wikimedia.org/T380420
[17:21:07] <logmsgbot>	 !log kamila@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1106.eqiad.wmnet with reason: host reimage
[17:21:34] <logmsgbot>	 !log kamila@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1102.eqiad.wmnet with reason: host reimage
[17:21:54] <logmsgbot>	 !log root@cumin1002 START - Cookbook sre.puppet.renew-cert for dbprov1006.eqiad.wmnet: Renew puppet certificate - root@cumin1002
[17:21:57] <logmsgbot>	 !log kamila@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1105.eqiad.wmnet with reason: host reimage
[17:22:09] <logmsgbot>	 !log kamila@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1103.eqiad.wmnet with reason: host reimage
[17:23:29] <wikibugs>	 (PS1) Hnowlan: wikikube: reimage 5 former jobrunner/videoscaler hosts to workers [puppet] - https://gerrit.wikimedia.org/r/1111670 (https://phabricator.wikimedia.org/T354791)
[17:24:21] <logmsgbot>	 !log kamila@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1106.eqiad.wmnet with reason: host reimage
[17:24:44] <hnowlan>	 !log running `decommission` for 5 codfw jobrunners
[17:24:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:24:46] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db1163 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P72076 and previous config saved to /var/cache/conftool/dbconfig/20250115-172445-root.json
[17:25:10] <logmsgbot>	 !log root@cumin1002 END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for dbprov1006.eqiad.wmnet: Renew puppet certificate - root@cumin1002
[17:26:15] <logmsgbot>	 !log andrew@cumin1002 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1012.eqiad.wmnet with OS bookworm
[17:26:21] <wikibugs>	 (PS2) Hnowlan: wikikube: reimage 5 former jobrunner/videoscaler hosts to workers [puppet] - https://gerrit.wikimedia.org/r/1111670 (https://phabricator.wikimedia.org/T354791)
[17:26:34] <logmsgbot>	 !log andrew@cumin1002 START - Cookbook sre.hosts.reimage for host cloudcephosd1012.eqiad.wmnet with OS bookworm
[17:26:49] <logmsgbot>	 !log kamila@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1105.eqiad.wmnet with reason: host reimage
[17:30:10] <fabfur>	 Lucas_WMDE sorry I scheduled the deploy for https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/1111166 but was sick today, so I wasn't present for the deployment
[17:30:19] <fabfur>	 I'll reschedule, thanks
[17:30:32] <logmsgbot>	 !log kamila@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1102.eqiad.wmnet with reason: host reimage
[17:30:33] <Lucas_WMDE>	 ok, I hope you’ll get better!
[17:30:38] <Lucas_WMDE>	 jouncebot: now
[17:30:38] <jouncebot>	 No deployments scheduled for the next 0 hour(s) and 29 minute(s)
[17:31:06] <Lucas_WMDE>	 (I’d also be up for deploying it now if that’s okay with everyone else)
[17:33:13] <fabfur>	 if you're ok, we can do it now
[17:33:22] <fabfur>	 should I reschedule? 
[17:34:21] <logmsgbot>	 !log kamila@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1103.eqiad.wmnet with reason: host reimage
[17:36:36] <logmsgbot>	 !log andrew@cumin1002 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1012.eqiad.wmnet with OS bookworm
[17:39:43] <wikibugs>	 ops-eqiad, SRE, DC-Ops: PDU sensor over limit - https://phabricator.wikimedia.org/T383383#10463552 (phaultfinder)
[17:39:51] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db1163 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P72077 and previous config saved to /var/cache/conftool/dbconfig/20250115-173951-root.json
[17:41:31] <Lucas_WMDE>	 sorry, I didn’t look at the channel for a few minutes
[17:41:32] <Lucas_WMDE>	 jouncebot: next
[17:41:33] <jouncebot>	 In 0 hour(s) and 18 minute(s): MediaWiki infrastructure (UTC late) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250115T1800)
[17:42:12] <Lucas_WMDE>	 let’s try to get it in now
[17:42:31] <fabfur>	 ok
[17:42:33] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Approved by lucaswerkmeister-wmde@deploy2002 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/1111166 (https://phabricator.wikimedia.org/T383392) (owner: Fabfur)
[17:42:48] <Lucas_WMDE>	 will there be anything to test on WikimediaDebug for this change?
[17:42:59] <Lucas_WMDE>	 (I don’t remember if these new stream configs are usually testable or not)
[17:43:16] <fabfur>	 mmm don't know, but I can easily test with a curl on a mwdebug
[17:43:18] <logmsgbot>	 !log kamila@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1106.eqiad.wmnet with OS bookworm
[17:44:33] <Lucas_WMDE>	 ok!
[17:44:40] <Lucas_WMDE>	 while we still have the bare-metal mwdebugs ;)
[17:44:49] <fabfur>	 do I need to do something other than testing? 
[17:44:58] <fabfur>	 sorry, I don't usually deploy this kind of change
[17:45:02] <Lucas_WMDE>	 no, I’ll let you know when you can test
[17:45:08] <wikibugs>	 (Merged) jenkins-bot: Added new stream config for haproxy_requestctl [mediawiki-config] - https://gerrit.wikimedia.org/r/1111166 (https://phabricator.wikimedia.org/T383392) (owner: Fabfur)
[17:45:08] <Lucas_WMDE>	 I just wanted to check in advance
[17:45:16] <fabfur>	 ack! 
[17:45:20] <Lucas_WMDE>	 since the timing will probably be pretty tight ^^
[17:45:40] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy2002 Started scap sync-world: Backport for [[gerrit:1111166|Added new stream config for haproxy_requestctl (T383392)]]
[17:45:43] <stashbot>	 T383392: Define a schema for analytics pipeline ingestion - https://phabricator.wikimedia.org/T383392
[17:47:40] <logmsgbot>	 !log kamila@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1105.eqiad.wmnet with OS bookworm
[17:49:09] <logmsgbot>	 !log kamila@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1102.eqiad.wmnet with OS bookworm
[17:51:07] <kamila_>	 !log homer cr*eqiad* commit T377876
[17:51:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:51:10] <stashbot>	 T377876: Migrate wikikube-eqiad to containerd - https://phabricator.wikimedia.org/T377876
[17:52:00] <fabfur>	 Lucas_WMDE I would say it's working!
[17:52:06] <fabfur>	 thanks a lot!
[17:52:12] <Lucas_WMDE>	 it’s not quite ready for testing yet, in theory :P
[17:52:19] <Lucas_WMDE>	 but yeah ok it got synced already
[17:52:23] <Lucas_WMDE>	 scap is just still testing it
[17:52:23] <logmsgbot>	 !log kamila@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1103.eqiad.wmnet with OS bookworm
[17:52:28] <fabfur>	 ack
[17:52:43] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy2002 lucaswerkmeister-wmde, fabfur: Backport for [[gerrit:1111166|Added new stream config for haproxy_requestctl (T383392)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[17:52:45] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy2002 lucaswerkmeister-wmde, fabfur: Continuing with sync
[17:52:46] <stashbot>	 T383392: Define a schema for analytics pipeline ingestion - https://phabricator.wikimedia.org/T383392
[17:53:11] <logmsgbot>	 !log kamila@cumin1002 START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[1102-1106].eqiad.wmnet
[17:53:13] <logmsgbot>	 !log kamila@cumin1002 END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[1102-1106].eqiad.wmnet
[17:53:38] <logmsgbot>	 !log cmooney@cumin1002 START - Cookbook sre.hosts.dhcp for host cloudcephosd1012.eqiad.wmnet
[17:54:15] <wikibugs>	 ops-eqiad, SRE, DC-Ops, Prod-Kubernetes, and 2 others: Relabel eqiad kubernetes nodes - https://phabricator.wikimedia.org/T383620#10463696 (kamila)
[17:54:56] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db1163 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P72079 and previous config saved to /var/cache/conftool/dbconfig/20250115-175456-root.json
[17:55:22] <wikibugs>	 (CR) Volans: [C:+2] sre.hosts.downtime: skip START log to SAL [cookbooks] - https://gerrit.wikimedia.org/r/1111660 (owner: Volans)
[17:56:26] <logmsgbot>	 !log cmooney@cumin1002 END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host cloudcephosd1012.eqiad.wmnet
[17:58:18] <logmsgbot>	 !log andrew@cumin1002 START - Cookbook sre.hosts.reimage for host cloudcephosd1012.eqiad.wmnet with OS bookworm
[17:59:18] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy2002 Finished scap sync-world: Backport for [[gerrit:1111166|Added new stream config for haproxy_requestctl (T383392)]] (duration: 13m 38s)
[17:59:21] <stashbot>	 T383392: Define a schema for analytics pipeline ingestion - https://phabricator.wikimedia.org/T383392
[17:59:37] <fabfur>	 thanks again Lucas_WMDE 
[17:59:50] <wikibugs>	 (CR) Ssingh: [C:+2] sre.dns.admin: update show to use CookbookInitSuccess [cookbooks] - https://gerrit.wikimedia.org/r/1111236 (owner: Ssingh)
[17:59:57] <Lucas_WMDE>	 np :)
[18:00:01] <Lucas_WMDE>	 just in time :D
[18:00:04] <jouncebot>	 Deploy window MediaWiki infrastructure (UTC late) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250115T1800)
[18:00:15] * Lucas_WMDE done deploying
[18:00:23] <fabfur>	 :) 
[18:02:33] <wikibugs>	 (Merged) jenkins-bot: sre.hosts.downtime: skip START log to SAL [cookbooks] - https://gerrit.wikimedia.org/r/1111660 (owner: Volans)
[18:04:28] <wikibugs>	 SRE, SRE-Access-Requests, Patch-For-Review: Requesting access to releasers-mediawiki for MSantos - https://phabricator.wikimedia.org/T382616#10463805 (Dzahn)
[18:05:30] <wikibugs>	 SRE, SRE-tools, Infrastructure-Foundations: exception raised for "sre.dns.admin show" - https://phabricator.wikimedia.org/T378039#10463812 (ssingh) Open→Resolved a:ssingh This has now been fixed, thanks to @Volans!  ` sukhe@cumin1002:~$ sudo cookbook sre.dns.admin show => CURRENT STAT...
[18:05:32] <logmsgbot>	 !log volans@cumin2002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:05:00 on sretest1001.eqiad.wmnet with reason: testing cookbook
[18:06:02] <logmsgbot>	 !log volans@cumin2002 START - Cookbook sre.hosts.downtime for 0:05:00 on sretest1002.eqiad.wmnet with reason: testing cookbook
[18:08:08] <icinga-wm>	 PROBLEM - Check unit status of httpbb_kubernetes_mw-api-int_hourly on cumin1002 is CRITICAL: CRITICAL: Status of the systemd unit httpbb_kubernetes_mw-api-int_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[18:08:25] <wikibugs>	 (CR) Dzahn: Add myself to releasers-mediawiki (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1105952 (https://phabricator.wikimedia.org/T382616) (owner: MSantos)
[18:09:48] <logmsgbot>	 !log volans@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:05:00 on sretest1002.eqiad.wmnet with reason: testing cookbook
[18:10:29] <jinxer-wm>	 FIRING: [3x] SystemdUnitFailed: httpbb_kubernetes_mw-api-int_hourly.service on cumin1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[18:12:12] <icinga-wm>	 PROBLEM - Disk space on ms-be2075 is CRITICAL: DISK CRITICAL - /srv/swift-storage/objects10 is not accessible: Input/output error https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=ms-be2075&var-datasource=codfw+prometheus/ops
[18:12:40] <wikibugs>	 (PS2) Andrea Denisse: wmcs: Migrate iowait stalling alerts to the alerts.git repository [alerts] - https://gerrit.wikimedia.org/r/1111338 (https://phabricator.wikimedia.org/T328502)
[18:13:18] <wikibugs>	 (CR) Andrea Denisse: "Thanks for your review, I sent a new patch." [alerts] - https://gerrit.wikimedia.org/r/1111338 (https://phabricator.wikimedia.org/T328502) (owner: Andrea Denisse)
[18:13:55] <wikibugs>	 (CR) CI reject: [V:-1] wmcs: Migrate iowait stalling alerts to the alerts.git repository [alerts] - https://gerrit.wikimedia.org/r/1111338 (https://phabricator.wikimedia.org/T328502) (owner: Andrea Denisse)
[18:15:03] <wikibugs>	 (PS3) Andrea Denisse: wmcs: Migrate iowait stalling alerts to the alerts.git repository [alerts] - https://gerrit.wikimedia.org/r/1111338 (https://phabricator.wikimedia.org/T328502)
[18:16:20] <logmsgbot>	 !log ladsgroup@cumin1002 START - Cookbook sre.hosts.downtime for 12:00:00 on db1163.eqiad.wmnet with reason: Maintenance
[18:16:23] <logmsgbot>	 !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1163.eqiad.wmnet with reason: Maintenance
[18:16:30] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Depooling db1163 (T371742)', diff saved to https://phabricator.wikimedia.org/P72080 and previous config saved to /var/cache/conftool/dbconfig/20250115-181629-ladsgroup.json
[18:16:33] <stashbot>	 T371742: Change page.page_links_updated to fixed-length timestamp in wmf wikis - https://phabricator.wikimedia.org/T371742
[18:19:08] <wikibugs>	 (03PS1) 10Volans: sre.network.peering: use CookbookInitSuccess [cookbooks] - 10https://gerrit.wikimedia.org/r/1111677
[18:20:06] <wikibugs>	 (03PS1) 10Ssingh: dns.admin: show descriptive text before calling admin_state [cookbooks] - 10https://gerrit.wikimedia.org/r/1111678
[18:21:47] <wikibugs>	 (03PS4) 10Andrea Denisse: wmcs: Migrate iowait stalling alerts to the alerts.git repository [alerts] - 10https://gerrit.wikimedia.org/r/1111338 (https://phabricator.wikimedia.org/T328502)
[18:22:33] <logmsgbot>	 !log andrew@cumin1002 END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudcephosd1012.eqiad.wmnet with OS bookworm
[18:22:50] <logmsgbot>	 !log andrew@cumin1002 START - Cookbook sre.hosts.reimage for host cloudcephosd1012.eqiad.wmnet with OS bookworm
[18:24:05] <wikibugs>	 (03PS2) 10Dzahn: Add myself to releasers-mediawiki [puppet] - 10https://gerrit.wikimedia.org/r/1105952 (https://phabricator.wikimedia.org/T382616) (owner: 10MSantos)
[18:24:42] <wikibugs>	 (03CR) 10Dzahn: [C:03+2] Add myself to releasers-mediawiki [puppet] - 10https://gerrit.wikimedia.org/r/1105952 (https://phabricator.wikimedia.org/T382616) (owner: 10MSantos)
[18:24:57] <wikibugs>	 (03CR) 10Dzahn: [C:03+2] "has approval from manager and group owner, amended, merging" [puppet] - 10https://gerrit.wikimedia.org/r/1105952 (https://phabricator.wikimedia.org/T382616) (owner: 10MSantos)
[18:27:27] <wikibugs>	 (03PS1) 10BPirkle: RevisionStore: No first revision of non-existing page [core] (wmf/1.44.0-wmf.12) - 10https://gerrit.wikimedia.org/r/1111680 (https://phabricator.wikimedia.org/T380677)
[18:27:54] <wikibugs>	 (03PS1) 10CDanis: urldownloader: squid_exporter [puppet] - 10https://gerrit.wikimedia.org/r/1111681
[18:28:19] <wikibugs>	 10SRE-tools, 06Infrastructure-Foundations: Add an ownership field to cookbooks. - https://phabricator.wikimedia.org/T379258#10463938 (10Volans) 05Open→03Resolved This feature is now live and cookbook ownership can be clearly seen when listing cookbooks (`cookbook -l` or `cookbook -lv`) and at the botto...
[18:29:54] <wikibugs>	 10SRE-tools, 06Infrastructure-Foundations, 10Spicerack: Spicerack: allow cookbooks to abort execution from __init__ - https://phabricator.wikimedia.org/T365454#10463940 (10Volans) 05Open→03Resolved This is now live, see the related documentation in https://doc.wikimedia.org/spicerack/master/api/spice...
[18:30:24] <wikibugs>	 06SRE, 10SRE-Access-Requests, 13Patch-For-Review: Requesting access to releasers-mediawiki for MSantos - https://phabricator.wikimedia.org/T382616#10463942 (10Dzahn) 05In progress→03Resolved a:05Bmueller→03Dzahn @MSantos After this now had both needed approvals I took the liberty to slightly amen...
[18:30:32] <wikibugs>	 10SRE-tools, 06Infrastructure-Foundations, 10Spicerack: Spicerack: don't IRC log start/stop of cookbook - https://phabricator.wikimedia.org/T324655#10463945 (10Volans) 05Open→03Resolved This is now live, see the related documentation in https://doc.wikimedia.org/spicerack/master/api/spicerack.cookboo...
[18:32:05] <wikibugs>	 (03CR) 10CDanis: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1111681 (owner: 10CDanis)
[18:32:16] <wikibugs>	 (03CR) 10Majavah: "The external services that MW talks to (hopefully) use HTTPS, so Squid only sees a CONNECT request and then only the encrypted version of " [puppet] - 10https://gerrit.wikimedia.org/r/1111265 (https://phabricator.wikimedia.org/T340552) (owner: 10CDanis)
[18:33:27] <wikibugs>	 06SRE, 06Infrastructure-Foundations: Spicerack fails to find host physical interface for ganeti nodes - https://phabricator.wikimedia.org/T383207#10463971 (10Volans) 05Open→03Resolved a:03Volans This is now live and works as expected: ` >>> spicerack.netbox_server("ganeti2027").access_vlan 'private1-...
[18:34:59] <wikibugs>	 (03CR) 10Volans: [C:03+1] "LGTM if you prefer this format for the output :)" [cookbooks] - 10https://gerrit.wikimedia.org/r/1111678 (owner: 10Ssingh)
[18:35:08] <wikibugs>	 (03CR) 10Dzahn: [C:03+2] ci: Install memcached for MediaWiki success cache [puppet] - 10https://gerrit.wikimedia.org/r/1111295 (https://phabricator.wikimedia.org/T383243) (owner: 10Dduvall)
[18:35:44] <icinga-wm>	 PROBLEM - BGP status on cr2-eqiad is CRITICAL: BGP CRITICAL - No response from remote host 208.80.154.197 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[18:36:04] <wikibugs>	 (03CR) 10Dzahn: [C:03+2] "it seemed pointless to check a puppet run on the relevant VM in cloud since it was already cherry-picked anyways" [puppet] - 10https://gerrit.wikimedia.org/r/1111295 (https://phabricator.wikimedia.org/T383243) (owner: 10Dduvall)
[18:37:40] <wikibugs>	 (03CR) 10Ssingh: [C:03+2] dns.admin: show descriptive text before calling admin_state [cookbooks] - 10https://gerrit.wikimedia.org/r/1111678 (owner: 10Ssingh)
[18:38:08] <wikibugs>	 (03CR) 10Dzahn: [C:04-1] "after double checking the yaml that is created for the blackbox checks I am now voting against this and would say we should keep it as is" [puppet] - 10https://gerrit.wikimedia.org/r/1108112 (https://phabricator.wikimedia.org/T382964) (owner: 10AOkoth)
[18:39:11] <wikibugs>	 (03CR) 10CDanis: "Yes, of course you are right, most of the traffic is indeed CONNECT -- although there is some plaintext as well, especially to archive.org" [puppet] - 10https://gerrit.wikimedia.org/r/1111265 (https://phabricator.wikimedia.org/T340552) (owner: 10CDanis)
[18:40:24] <wikibugs>	 (03CR) 10Dzahn: [C:03+2] gerrit: restore IP addresses in ssh_known_hosts [puppet] - 10https://gerrit.wikimedia.org/r/1111163 (https://phabricator.wikimedia.org/T303828) (owner: 10Hashar)
[18:43:27] <wikibugs>	 (03Merged) 10jenkins-bot: dns.admin: show descriptive text before calling admin_state [cookbooks] - 10https://gerrit.wikimedia.org/r/1111678 (owner: 10Ssingh)
[18:45:05] <wikibugs>	 (03PS2) 10CDanis: urldownloader: squid_exporter [puppet] - 10https://gerrit.wikimedia.org/r/1111681
[18:45:20] <wikibugs>	 (03CR) 10CDanis: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1111681 (owner: 10CDanis)
[18:47:07] <wikibugs>	 (03PS1) 10Ssingh: Revert "dns.admin: show descriptive text before calling admin_state" [cookbooks] - 10https://gerrit.wikimedia.org/r/1111684
[18:49:50] <wikibugs>	 (03Abandoned) 10Ssingh: Revert "dns.admin: show descriptive text before calling admin_state" [cookbooks] - 10https://gerrit.wikimedia.org/r/1111684 (owner: 10Ssingh)
[18:49:55] <wikibugs>	 (03PS1) 10Ssingh: dns.admin: clarify show being called and dump admin_state [cookbooks] - 10https://gerrit.wikimedia.org/r/1111685
[18:50:19] <logmsgbot>	 !log andrew@cumin1002 END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudcephosd1012.eqiad.wmnet with OS bookworm
[18:51:22] <logmsgbot>	 !log andrew@cumin1002 START - Cookbook sre.hosts.reimage for host cloudcephosd1012.eqiad.wmnet with OS bookworm
[18:53:48] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops: Tracking List: Relocating servers to free up 10G switch space in codfw - https://phabricator.wikimedia.org/T383709#10463997 (10Jhancock.wm)
[18:56:57] <wikibugs>	 (03CR) 10Ssingh: [C:03+2] dns.admin: clarify show being called and dump admin_state [cookbooks] - 10https://gerrit.wikimedia.org/r/1111685 (owner: 10Ssingh)
[18:57:38] <wikibugs>	 (03CR) 10Majavah: "Huh, I shouldn't have assumed practically everything is encrypted these days then. That approach sounds fine to me." [puppet] - 10https://gerrit.wikimedia.org/r/1111265 (https://phabricator.wikimedia.org/T340552) (owner: 10CDanis)
[19:00:00] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops, 10Prod-Kubernetes, and 2 others: Relabel codfw kubernetes nodes - https://phabricator.wikimedia.org/T383764#10463999 (10Jhancock.wm) 05Open→03Resolved a:03Jhancock.wm
[19:00:05] <jouncebot>	 brennen and dduvall: #bothumor Q:Why did functions stop calling each other? A:They had arguments. Rise for MediaWiki train - Utc-7 Version . (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250115T1900).
[19:00:09] <brennen>	 o/
[19:01:11] <brennen>	 just noticing a blocker that i somehow missed earlier.
[19:01:36] <brennen>	 doing a backport first here.
[19:02:48] <wikibugs>	 (03Merged) 10jenkins-bot: dns.admin: clarify show being called and dump admin_state [cookbooks] - 10https://gerrit.wikimedia.org/r/1111685 (owner: 10Ssingh)
[19:03:35] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by brennen@deploy2002 using scap backport" [core] (wmf/1.44.0-wmf.12) - 10https://gerrit.wikimedia.org/r/1111680 (https://phabricator.wikimedia.org/T380677) (owner: 10BPirkle)
[19:03:50] <wikibugs>	 10ops-codfw, 06SRE, 10SRE-swift-storage, 06DC-Ops: Frequent disk resets on ms-be2075 - https://phabricator.wikimedia.org/T382707#10464019 (10Jhancock.wm) thanks for the update and clarification. I've updated the ticket with the added info. maybe they'll quit stalling
[19:04:48] <wikibugs>	 (03CR) 10Dzahn: [C:03+2] "gerrit1003.. Ssh::Client/File[/etc/ssh/ssh_known_hosts]/content: content changed  .." [puppet] - 10https://gerrit.wikimedia.org/r/1111163 (https://phabricator.wikimedia.org/T303828) (owner: 10Hashar)
[19:05:50] <icinga-wm>	 PROBLEM - BGP status on cr2-eqiad is CRITICAL: BGP CRITICAL - ASunknown/IPv6: Active https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[19:07:47] <logmsgbot>	 !log sukhe@cumin1002 START - Cookbook sre.dns.admin DNS admin: depool site ulsfo [reason: this is a test, not actual depool, no task ID specified]
[19:07:54] <logmsgbot>	 !log sukhe@cumin1002 END (FAIL) - Cookbook sre.dns.admin (exit_code=99) DNS admin: depool site ulsfo [reason: this is a test, not actual depool, no task ID specified]
[19:08:04] <logmsgbot>	 !log andrew@cumin1002 END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudcephosd1012.eqiad.wmnet with OS bookworm
[19:08:08] <icinga-wm>	 RECOVERY - Check unit status of httpbb_kubernetes_mw-api-int_hourly on cumin1002 is OK: OK: Status of the systemd unit httpbb_kubernetes_mw-api-int_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[19:08:17] <wikibugs>	 (03PS3) 10CDanis: urldownloader: squid_exporter monitoring [puppet] - 10https://gerrit.wikimedia.org/r/1111681
[19:08:17] <wikibugs>	 (03PS2) 10CDanis: urldownloader: scrub outbound privacy-sensitive hdrs [puppet] - 10https://gerrit.wikimedia.org/r/1111265 (https://phabricator.wikimedia.org/T340552)
[19:08:31] <logmsgbot>	 !log andrew@cumin1002 START - Cookbook sre.hosts.reimage for host cloudcephosd1012.eqiad.wmnet with OS bookworm
[19:10:29] <jinxer-wm>	 FIRING: [3x] SystemdUnitFailed: httpbb_kubernetes_mw-api-int_hourly.service on cumin1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[19:14:39] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops: PDU sensor over limit - https://phabricator.wikimedia.org/T383383#10464034 (10phaultfinder)
[19:14:44] <wikibugs>	 (03CR) 10Dzahn: [C:03+1] "ship it?:)" [puppet] - 10https://gerrit.wikimedia.org/r/1077466 (https://phabricator.wikimedia.org/T332220) (owner: 10BCornwall)
[19:15:22] <wikibugs>	 (03PS1) 10PipelineBot: mobileapps: pipeline bot promote [deployment-charts] - 10https://gerrit.wikimedia.org/r/1111688
[19:15:22] <logmsgbot>	 !log andrew@cumin1002 END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcephosd1012.eqiad.wmnet with OS bookworm
[19:15:41] <logmsgbot>	 !log andrew@cumin1002 START - Cookbook sre.hosts.reimage for host cloudcephosd1012.eqiad.wmnet with OS bookworm
[19:20:17] <logmsgbot>	 !log aqu@deploy2002 Started deploy [airflow-dags/analytics@cd03eb7]: Cascading backfill under projectview hourly
[19:21:24] <logmsgbot>	 !log aqu@deploy2002 Finished deploy [airflow-dags/analytics@cd03eb7]: Cascading backfill under projectview hourly (duration: 01m 06s)
[19:22:04] <wikibugs>	 (03Merged) 10jenkins-bot: RevisionStore: No first revision of non-existing page [core] (wmf/1.44.0-wmf.12) - 10https://gerrit.wikimedia.org/r/1111680 (https://phabricator.wikimedia.org/T380677) (owner: 10BPirkle)
[19:22:29] <wikibugs>	 07sre-alert-triage, 06Infrastructure-Foundations: Alert in need of triage: Netbox report network (instance netbox1003) - https://phabricator.wikimedia.org/T383303#10464049 (10cmooney) 05Open→03Resolved I've added dummy interfaces in Netbox on all the fr-tech hosts and connected them to the switch ports...
[19:22:35] <logmsgbot>	 !log brennen@deploy2002 Started scap sync-world: Backport for [[gerrit:1111680|RevisionStore: No first revision of non-existing page (T380677)]]
[19:22:40] <stashbot>	 T380677: Wikimedia\Assert\ParameterAssertionException: Bad value for parameter $page: must represent an existing page - https://phabricator.wikimedia.org/T380677
[19:26:31] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1163 (T371742)', diff saved to https://phabricator.wikimedia.org/P72081 and previous config saved to /var/cache/conftool/dbconfig/20250115-192631-ladsgroup.json
[19:26:35] <stashbot>	 T371742: Change page.page_links_updated to fixed-length timestamp in wmf wikis - https://phabricator.wikimedia.org/T371742
[19:28:50] <wikibugs>	 (03PS1) 10Andrew Bogott: partman: change recipe for cloudcephosd1012 [puppet] - 10https://gerrit.wikimedia.org/r/1111691
[19:28:59] <logmsgbot>	 !log brennen@deploy2002 bpirkle, brennen: Backport for [[gerrit:1111680|RevisionStore: No first revision of non-existing page (T380677)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[19:29:03] <stashbot>	 T380677: Wikimedia\Assert\ParameterAssertionException: Bad value for parameter $page: must represent an existing page - https://phabricator.wikimedia.org/T380677
[19:34:41] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops: PDU sensor over limit - https://phabricator.wikimedia.org/T383383#10464085 (10phaultfinder)
[19:34:46] <logmsgbot>	 !log brennen@deploy2002 bpirkle, brennen: Continuing with sync
[19:35:39] <wikibugs>	 (03PS18) 10CDobbins: alerts: add alert for ferm_mss_cfg Prometheus metric [alerts] - 10https://gerrit.wikimedia.org/r/1110843
[19:37:19] <wikibugs>	 (03CR) 10CI reject: [V:04-1] alerts: add alert for ferm_mss_cfg Prometheus metric [alerts] - 10https://gerrit.wikimedia.org/r/1110843 (owner: 10CDobbins)
[19:39:36] <logmsgbot>	 !log brennen@deploy2002 Finished scap sync-world: Backport for [[gerrit:1111680|RevisionStore: No first revision of non-existing page (T380677)]] (duration: 17m 00s)
[19:39:40] <stashbot>	 T380677: Wikimedia\Assert\ParameterAssertionException: Bad value for parameter $page: must represent an existing page - https://phabricator.wikimedia.org/T380677
[19:41:39] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P72082 and previous config saved to /var/cache/conftool/dbconfig/20250115-194138-ladsgroup.json
[19:43:07] <wikibugs>	 (03CR) 10Ssingh: alerts: add alert for ferm_mss_cfg Prometheus metric (032 comments) [alerts] - 10https://gerrit.wikimedia.org/r/1110843 (owner: 10CDobbins)
[19:43:08] <wikibugs>	 (03PS19) 10CDobbins: alerts: add alert for ferm_mss_cfg Prometheus metric [alerts] - 10https://gerrit.wikimedia.org/r/1110843
[19:43:34] <wikibugs>	 (03CR) 10Andrew Bogott: [C:03+2] partman: change recipe for cloudcephosd1012 [puppet] - 10https://gerrit.wikimedia.org/r/1111691 (owner: 10Andrew Bogott)
[19:44:20] <wikibugs>	 (03CR) 10CI reject: [V:04-1] alerts: add alert for ferm_mss_cfg Prometheus metric [alerts] - 10https://gerrit.wikimedia.org/r/1110843 (owner: 10CDobbins)
[19:44:26] <wikibugs>	 (03PS20) 10CDobbins: alerts: add alert for ferm_mss_cfg Prometheus metric [alerts] - 10https://gerrit.wikimedia.org/r/1110843
[19:45:37] <wikibugs>	 (03CR) 10CI reject: [V:04-1] alerts: add alert for ferm_mss_cfg Prometheus metric [alerts] - 10https://gerrit.wikimedia.org/r/1110843 (owner: 10CDobbins)
[19:46:40] <logmsgbot>	 !log andrew@cumin1002 END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudcephosd1012.eqiad.wmnet with OS bookworm
[19:46:57] <logmsgbot>	 !log andrew@cumin1002 START - Cookbook sre.hosts.reimage for host cloudcephosd1012.eqiad.wmnet with OS bookworm
[19:48:18] <wikibugs>	 (03PS21) 10CDobbins: alerts: add alert for ferm_mss_cfg Prometheus metric [alerts] - 10https://gerrit.wikimedia.org/r/1110843
[19:49:29] <wikibugs>	 (03CR) 10CI reject: [V:04-1] alerts: add alert for ferm_mss_cfg Prometheus metric [alerts] - 10https://gerrit.wikimedia.org/r/1110843 (owner: 10CDobbins)
[19:49:39] <wikibugs>	 (03PS1) 10TrainBranchBot: group1 to 1.44.0-wmf.12 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1111693 (https://phabricator.wikimedia.org/T382363)
[19:49:41] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] group1 to 1.44.0-wmf.12 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1111693 (https://phabricator.wikimedia.org/T382363) (owner: 10TrainBranchBot)
[19:50:26] <wikibugs>	 (03Merged) 10jenkins-bot: group1 to 1.44.0-wmf.12 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1111693 (https://phabricator.wikimedia.org/T382363) (owner: 10TrainBranchBot)
[19:53:21] <wikibugs>	 (03PS1) 10CDanis: varnish: x-analytics: Authorization header summary [puppet] - 10https://gerrit.wikimedia.org/r/1111695
[19:54:51] <logmsgbot>	 !log andrew@cumin1002 END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudcephosd1012.eqiad.wmnet with OS bookworm
[19:55:04] <logmsgbot>	 !log andrew@cumin1002 START - Cookbook sre.hosts.reimage for host cloudcephosd1012.eqiad.wmnet with OS bookworm
[19:56:46] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P72083 and previous config saved to /var/cache/conftool/dbconfig/20250115-195645-ladsgroup.json
[19:57:28] <wikibugs>	 (03PS22) 10CDobbins: alerts: add alert for ferm_mss_cfg Prometheus metric [alerts] - 10https://gerrit.wikimedia.org/r/1110843
[19:58:40] <wikibugs>	 (03CR) 10CI reject: [V:04-1] alerts: add alert for ferm_mss_cfg Prometheus metric [alerts] - 10https://gerrit.wikimedia.org/r/1110843 (owner: 10CDobbins)
[19:59:43] <tzatziki>	 !log Removing 1 file for legal compliance
[19:59:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:01:16] <wikibugs>	 (03PS23) 10CDobbins: alerts: add alert for ferm_mss_cfg Prometheus metric [alerts] - 10https://gerrit.wikimedia.org/r/1110843
[20:01:30] <logmsgbot>	 !log brennen@deploy2002 rebuilt and synchronized wikiversions files: group1 to 1.44.0-wmf.12  refs T382363
[20:01:34] <stashbot>	 T382363: 1.44.0-wmf.12 deployment blockers - https://phabricator.wikimedia.org/T382363
[20:09:03] <wikibugs>	 (03PS5) 10Andrea Denisse: wmcs: Migrate network saturation alerts to the alerts.git repository [alerts] - 10https://gerrit.wikimedia.org/r/1111328 (https://phabricator.wikimedia.org/T328502)
[20:09:23] <wikibugs>	 (03PS24) 10CDobbins: alerts: add alert for ferm_mss_cfg Prometheus metric [alerts] - 10https://gerrit.wikimedia.org/r/1110843
[20:09:27] <wikibugs>	 (03CR) 10Thcipriani: [C:03+1] add kemayo to deployers [puppet] - 10https://gerrit.wikimedia.org/r/1110867 (owner: 10CDanis)
[20:10:18] <wikibugs>	 (03CR) 10CI reject: [V:04-1] wmcs: Migrate network saturation alerts to the alerts.git repository [alerts] - 10https://gerrit.wikimedia.org/r/1111328 (https://phabricator.wikimedia.org/T328502) (owner: 10Andrea Denisse)
[20:10:29] <wikibugs>	 (03CR) 10Thcipriani: [C:03+1] "Happy to pair during one of the backport windows to help get you started! Thanks for volunteering for deploys 😊" [puppet] - 10https://gerrit.wikimedia.org/r/1110867 (owner: 10CDanis)
[20:11:15] <wikibugs>	 (03CR) 10CI reject: [V:04-1] alerts: add alert for ferm_mss_cfg Prometheus metric [alerts] - 10https://gerrit.wikimedia.org/r/1110843 (owner: 10CDobbins)
[20:11:53] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1163 (T371742)', diff saved to https://phabricator.wikimedia.org/P72085 and previous config saved to /var/cache/conftool/dbconfig/20250115-201152-ladsgroup.json
[20:11:56] <stashbot>	 T371742: Change page.page_links_updated to fixed-length timestamp in wmf wikis - https://phabricator.wikimedia.org/T371742
[20:12:53] <wikibugs>	 (03PS6) 10Andrea Denisse: wmcs: Migrate network saturation alerts to the alerts.git repository [alerts] - 10https://gerrit.wikimedia.org/r/1111328 (https://phabricator.wikimedia.org/T328502)
[20:14:06] <wikibugs>	 (03PS25) 10CDobbins: alerts: add alert for ferm_mss_cfg Prometheus metric [alerts] - 10https://gerrit.wikimedia.org/r/1110843
[20:14:45] <wikibugs>	 06SRE-OnFire, 10Sustainability (Incident Followup): create a place (whiteboard) where SRE advertises current site status / things for awareness - https://phabricator.wikimedia.org/T378038#10464234 (10Dzahn) Since T378039 has been resolved we now have this:   ` sukhe@cumin1002:~$ sudo cookbook sre.dns.admin sho...
[20:14:55] <wikibugs>	 (03CR) 10Andrea Denisse: "Thank you very much for your review, I sent a new patch." [alerts] - 10https://gerrit.wikimedia.org/r/1111328 (https://phabricator.wikimedia.org/T328502) (owner: 10Andrea Denisse)
[20:15:17] <wikibugs>	 (03CR) 10CI reject: [V:04-1] alerts: add alert for ferm_mss_cfg Prometheus metric [alerts] - 10https://gerrit.wikimedia.org/r/1110843 (owner: 10CDobbins)
[20:20:38] <tzatziki>	 !log Removing 2 files for legal compliance
[20:20:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:21:49] <wikibugs>	 (03PS26) 10CDobbins: alerts: add alert for ferm_mss_cfg Prometheus metric [alerts] - 10https://gerrit.wikimedia.org/r/1110843
[20:23:02] <wikibugs>	 (03CR) 10CI reject: [V:04-1] alerts: add alert for ferm_mss_cfg Prometheus metric [alerts] - 10https://gerrit.wikimedia.org/r/1110843 (owner: 10CDobbins)
[20:25:02] <wikibugs>	 (03PS27) 10CDobbins: alerts: add alert for ferm_mss_cfg Prometheus metric [alerts] - 10https://gerrit.wikimedia.org/r/1110843
[20:26:13] <wikibugs>	 (03CR) 10CI reject: [V:04-1] alerts: add alert for ferm_mss_cfg Prometheus metric [alerts] - 10https://gerrit.wikimedia.org/r/1110843 (owner: 10CDobbins)
[20:26:20] <wikibugs>	 (03PS2) 10CDanis: varnish: x-analytics: Authorization header summary [puppet] - 10https://gerrit.wikimedia.org/r/1111695
[20:27:37] <wikibugs>	 (03PS28) 10CDobbins: alerts: add alert for ferm_mss_cfg Prometheus metric [alerts] - 10https://gerrit.wikimedia.org/r/1110843
[20:28:13] <logmsgbot>	 !log andrew@cumin1002 END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudcephosd1012.eqiad.wmnet with OS bookworm
[20:28:48] <wikibugs>	 (03CR) 10CI reject: [V:04-1] alerts: add alert for ferm_mss_cfg Prometheus metric [alerts] - 10https://gerrit.wikimedia.org/r/1110843 (owner: 10CDobbins)
[20:31:04] <tzatziki>	 !log Removing 1 file for legal compliance
[20:31:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:32:33] <wikibugs>	 (03PS29) 10CDobbins: alerts: add alert for ferm_mss_cfg Prometheus metric [alerts] - 10https://gerrit.wikimedia.org/r/1110843
[20:33:45] <wikibugs>	 (03CR) 10CI reject: [V:04-1] alerts: add alert for ferm_mss_cfg Prometheus metric [alerts] - 10https://gerrit.wikimedia.org/r/1110843 (owner: 10CDobbins)
[20:35:36] <wikibugs>	 (03PS30) 10CDobbins: alerts: add alert for ferm_mss_cfg Prometheus metric [alerts] - 10https://gerrit.wikimedia.org/r/1110843
[20:36:50] <wikibugs>	 (03CR) 10CI reject: [V:04-1] alerts: add alert for ferm_mss_cfg Prometheus metric [alerts] - 10https://gerrit.wikimedia.org/r/1110843 (owner: 10CDobbins)
[20:37:59] <wikibugs>	 (03PS31) 10CDobbins: alerts: add alert for ferm_mss_cfg Prometheus metric [alerts] - 10https://gerrit.wikimedia.org/r/1110843
[20:39:09] <wikibugs>	 (03CR) 10CI reject: [V:04-1] alerts: add alert for ferm_mss_cfg Prometheus metric [alerts] - 10https://gerrit.wikimedia.org/r/1110843 (owner: 10CDobbins)
[20:42:34] <wikibugs>	 (03PS1) 10CDanis: conftool: stub out extension configuration [puppet] - 10https://gerrit.wikimedia.org/r/1111703
[20:45:38] <wikibugs>	 (03PS32) 10CDobbins: alerts: add alert for ferm_mss_cfg Prometheus metric [alerts] - 10https://gerrit.wikimedia.org/r/1110843
[20:46:50] <wikibugs>	 (03CR) 10CI reject: [V:04-1] alerts: add alert for ferm_mss_cfg Prometheus metric [alerts] - 10https://gerrit.wikimedia.org/r/1110843 (owner: 10CDobbins)
[20:47:46] <wikibugs>	 (03PS33) 10CDobbins: alerts: add alert for ferm_mss_cfg Prometheus metric [alerts] - 10https://gerrit.wikimedia.org/r/1110843
[20:48:57] <wikibugs>	 (03CR) 10CI reject: [V:04-1] alerts: add alert for ferm_mss_cfg Prometheus metric [alerts] - 10https://gerrit.wikimedia.org/r/1110843 (owner: 10CDobbins)
[20:50:40] <wikibugs>	 (03PS34) 10CDobbins: alerts: add alert for ferm_mss_cfg Prometheus metric [alerts] - 10https://gerrit.wikimedia.org/r/1110843
[20:51:51] <wikibugs>	 (03CR) 10CI reject: [V:04-1] alerts: add alert for ferm_mss_cfg Prometheus metric [alerts] - 10https://gerrit.wikimedia.org/r/1110843 (owner: 10CDobbins)
[20:53:16] <wikibugs>	 (03PS35) 10CDobbins: alerts: add alert for ferm_mss_cfg Prometheus metric [alerts] - 10https://gerrit.wikimedia.org/r/1110843
[20:54:28] <wikibugs>	 (03CR) 10CI reject: [V:04-1] alerts: add alert for ferm_mss_cfg Prometheus metric [alerts] - 10https://gerrit.wikimedia.org/r/1110843 (owner: 10CDobbins)
[20:54:38] <tzatziki>	 !log Removing 1 file for legal compliance
[20:54:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:57:22] <wikibugs>	 (03PS36) 10CDobbins: alerts: add alert for ferm_mss_cfg Prometheus metric [alerts] - 10https://gerrit.wikimedia.org/r/1110843
[20:58:33] <wikibugs>	 (03CR) 10CI reject: [V:04-1] alerts: add alert for ferm_mss_cfg Prometheus metric [alerts] - 10https://gerrit.wikimedia.org/r/1110843 (owner: 10CDobbins)
[21:00:05] <jouncebot>	 RoanKattouw, Urbanecm, cjming, TheresNoTime, and kindrobot: #bothumor When your hammer is PHP, everything starts looking like a thumb. Rise for UTC late backport window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250115T2100).
[21:00:05] <jouncebot>	 No Gerrit patches in the queue for this window AFAICS.
[21:00:54] <wikibugs>	 (03PS37) 10CDobbins: alerts: add alert for ferm_mss_cfg Prometheus metric [alerts] - 10https://gerrit.wikimedia.org/r/1110843
[21:02:05] <wikibugs>	 (03CR) 10CI reject: [V:04-1] alerts: add alert for ferm_mss_cfg Prometheus metric [alerts] - 10https://gerrit.wikimedia.org/r/1110843 (owner: 10CDobbins)
[21:03:00] <JJMC89>	 Is there a deployer available to deploy a config patch for a security task? since it is for a security task, the patch has CR+1 in phab but is not yet uploaded to gerrit
[21:03:43] <wikibugs>	 (03PS38) 10CDobbins: alerts: add alert for ferm_mss_cfg Prometheus metric [alerts] - 10https://gerrit.wikimedia.org/r/1110843
[21:04:54] <wikibugs>	 (03CR) 10CI reject: [V:04-1] alerts: add alert for ferm_mss_cfg Prometheus metric [alerts] - 10https://gerrit.wikimedia.org/r/1110843 (owner: 10CDobbins)
[21:05:04] <JJMC89>	 TheresNoTime: maybe you?
[21:06:52] <wikibugs>	 (03PS39) 10CDobbins: alerts: add alert for ferm_mss_cfg Prometheus metric [alerts] - 10https://gerrit.wikimedia.org/r/1110843
[21:07:08] <TheresNoTime>	 JJMC89: can do in about 5 minutes — what task?
[21:07:50] <JJMC89>	 T383747 - I can upload the patch to gerrit now
[21:08:04] <wikibugs>	 (03CR) 10CI reject: [V:04-1] alerts: add alert for ferm_mss_cfg Prometheus metric [alerts] - 10https://gerrit.wikimedia.org/r/1110843 (owner: 10CDobbins)
[21:08:08] <TheresNoTime>	 Please do :)
[21:09:16] * TheresNoTime is ready
[21:09:26] <wikibugs>	 (03PS40) 10CDobbins: alerts: add alert for ferm_mss_cfg Prometheus metric [alerts] - 10https://gerrit.wikimedia.org/r/1110843
[21:10:39] <wikibugs>	 (03CR) 10CI reject: [V:04-1] alerts: add alert for ferm_mss_cfg Prometheus metric [alerts] - 10https://gerrit.wikimedia.org/r/1110843 (owner: 10CDobbins)
[21:10:44] <wikibugs>	 (03PS2) 10JJMC89: do not allow temp users to edit on loginwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1111706 (https://phabricator.wikimedia.org/T383747)
[21:10:58] <JJMC89>	 TheresNoTime: ^
[21:11:10] <TheresNoTime>	 ack
[21:11:30] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by samtar@deploy2002 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1111706 (https://phabricator.wikimedia.org/T383747) (owner: 10JJMC89)
[21:12:36] <wikibugs>	 (03Merged) 10jenkins-bot: do not allow temp users to edit on loginwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1111706 (https://phabricator.wikimedia.org/T383747) (owner: 10JJMC89)
[21:12:53] <wikibugs>	 (03PS41) 10CDobbins: alerts: add alert for ferm_mss_cfg Prometheus metric [alerts] - 10https://gerrit.wikimedia.org/r/1110843
[21:13:07] <logmsgbot>	 !log samtar@deploy2002 Started scap sync-world: Backport for [[gerrit:1111706|do not allow temp users to edit on loginwiki (T383747)]]
[21:14:04] <wikibugs>	 (03CR) 10CI reject: [V:04-1] alerts: add alert for ferm_mss_cfg Prometheus metric [alerts] - 10https://gerrit.wikimedia.org/r/1110843 (owner: 10CDobbins)
[21:14:23] <wikibugs>	 (03PS42) 10CDobbins: alerts: add alert for ferm_mss_cfg Prometheus metric [alerts] - 10https://gerrit.wikimedia.org/r/1110843
[21:14:50] <TheresNoTime>	 JJMC89: will you want to test this?
[21:15:12] <wikibugs>	 10ops-codfw, 06DC-Ops: restbase2037 periodically rebooting(?) - https://phabricator.wikimedia.org/T383820#10464462 (10Eevans) p:05Triage→03Medium
[21:15:33] <JJMC89>	 I don't have the debug extension to test
[21:15:34] <wikibugs>	 (03CR) 10CI reject: [V:04-1] alerts: add alert for ferm_mss_cfg Prometheus metric [alerts] - 10https://gerrit.wikimedia.org/r/1110843 (owner: 10CDobbins)
[21:16:00] <wikibugs>	 (03PS43) 10CDobbins: alerts: add alert for ferm_mss_cfg Prometheus metric [alerts] - 10https://gerrit.wikimedia.org/r/1110843
[21:16:22] <JJMC89>	 https://login.wikimedia.org/wiki/Special:ListGroupRights should reflect - I can check after a full sync
[21:16:23] <ottomata>	 !log roll restarting eventgate-analytics to pick up new stream configuration for haproxy_requestctl 
[21:16:23] <ottomata>	  - T383392
[21:16:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:16:25] <stashbot>	 T383392: Define a schema for analytics pipeline ingestion - https://phabricator.wikimedia.org/T383392
[21:16:41] <wikibugs>	 10ops-codfw, 10Cassandra, 06DC-Ops: restbase2037 periodically rebooting(?) - https://phabricator.wikimedia.org/T383820#10464469 (10Eevans)
[21:16:42] <logmsgbot>	 !log otto@deploy2002 helmfile [staging] START helmfile.d/services/eventgate-analytics: sync
[21:16:56] <logmsgbot>	 !log otto@deploy2002 helmfile [staging] DONE helmfile.d/services/eventgate-analytics: sync
[21:17:03] <wikibugs>	 10ops-codfw, 10Cassandra, 06DC-Ops: restbase2037 is crashy - https://phabricator.wikimedia.org/T383820#10464471 (10Eevans)
[21:17:20] <TheresNoTime>	 ack, will just sync it then
[21:17:39] <tzatziki>	 !log Removing 11 files for legal compliance
[21:17:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:17:48] <logmsgbot>	 !log otto@deploy2002 helmfile [codfw] START helmfile.d/services/eventgate-analytics: sync
[21:17:53] <logmsgbot>	 !log samtar@deploy2002 samtar, jjmc89: Backport for [[gerrit:1111706|do not allow temp users to edit on loginwiki (T383747)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[21:18:00] <logmsgbot>	 !log samtar@deploy2002 samtar, jjmc89: Continuing with sync
[21:18:14] <wikibugs>	 06SRE, 06Traffic: Define a schema for analytics pipeline ingestion - https://phabricator.wikimedia.org/T383392#10464483 (10Ottomata)
[21:18:18] <wikibugs>	 06SRE, 06Traffic: Define a schema for analytics pipeline ingestion - https://phabricator.wikimedia.org/T383392#10464485 (10Ottomata)
[21:18:32] <logmsgbot>	 !log otto@deploy2002 helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: sync
[21:19:22] <logmsgbot>	 !log otto@deploy2002 helmfile [eqiad] START helmfile.d/services/eventgate-analytics: sync
[21:19:32] <logmsgbot>	 !log otto@deploy2002 helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: sync
[21:22:17] <wikibugs>	 (03PS44) 10CDobbins: alerts: add alert for ferm_mss_cfg Prometheus metric [alerts] - 10https://gerrit.wikimedia.org/r/1110843
[21:22:41] <logmsgbot>	 !log samtar@deploy2002 Finished scap sync-world: Backport for [[gerrit:1111706|do not allow temp users to edit on loginwiki (T383747)]] (duration: 09m 34s)
[21:22:50] <TheresNoTime>	 JJMC89: sync'd
[21:23:26] <JJMC89>	 TheresNoTime: looks good - thank you
[21:23:29] <wikibugs>	 (03CR) 10CI reject: [V:04-1] alerts: add alert for ferm_mss_cfg Prometheus metric [alerts] - 10https://gerrit.wikimedia.org/r/1110843 (owner: 10CDobbins)
[21:23:32] <TheresNoTime>	 np!
[21:25:52] <icinga-wm>	 PROBLEM - BGP status on cr2-eqiad is CRITICAL: BGP CRITICAL - ASunknown/IPv6: Active https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[21:26:41] <wikibugs>	 (03PS45) 10CDobbins: alerts: add alert for ferm_mss_cfg Prometheus metric [alerts] - 10https://gerrit.wikimedia.org/r/1110843
[21:27:52] <wikibugs>	 (03CR) 10CI reject: [V:04-1] alerts: add alert for ferm_mss_cfg Prometheus metric [alerts] - 10https://gerrit.wikimedia.org/r/1110843 (owner: 10CDobbins)
[21:28:53] <wikibugs>	 (03PS46) 10CDobbins: alerts: add alert for ferm_mss_cfg Prometheus metric [alerts] - 10https://gerrit.wikimedia.org/r/1110843
[21:31:46] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops: PowerSupplyFailure - https://phabricator.wikimedia.org/T383638#10464550 (10phaultfinder)
[21:34:22] <wikibugs>	 06SRE, 06Infrastructure-Foundations: Console domain and property access request - https://phabricator.wikimedia.org/T381904#10464574 (10Scott_French) Alright, after discussion yesterday with @NBaca-WMF and @LSobanski, I believe the next steps in order to facilitate this involve building a couple of lists of do...
[21:34:40] <jinxer-wm>	 FIRING: KubernetesRsyslogDown: rsyslog on wikikube-worker1093:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues - https://grafana.wikimedia.org/d/OagQjQmnk?var-server=wikikube-worker1093 - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[21:35:03] <wikibugs>	 (03PS47) 10CDobbins: alerts: add alert for ferm_mss_cfg Prometheus metric [alerts] - 10https://gerrit.wikimedia.org/r/1110843
[21:36:16] <wikibugs>	 (03CR) 10CI reject: [V:04-1] alerts: add alert for ferm_mss_cfg Prometheus metric [alerts] - 10https://gerrit.wikimedia.org/r/1110843 (owner: 10CDobbins)
[21:36:57] <wikibugs>	 (03PS48) 10CDobbins: alerts: add alert for ferm_mss_cfg Prometheus metric [alerts] - 10https://gerrit.wikimedia.org/r/1110843
[21:38:10] <wikibugs>	 (03CR) 10CI reject: [V:04-1] alerts: add alert for ferm_mss_cfg Prometheus metric [alerts] - 10https://gerrit.wikimedia.org/r/1110843 (owner: 10CDobbins)
[21:38:40] <wikibugs>	 (03PS49) 10CDobbins: alerts: add alert for ferm_mss_cfg Prometheus metric [alerts] - 10https://gerrit.wikimedia.org/r/1110843
[21:39:40] <jinxer-wm>	 RESOLVED: KubernetesRsyslogDown: rsyslog on wikikube-worker1093:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues - https://grafana.wikimedia.org/d/OagQjQmnk?var-server=wikikube-worker1093 - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[21:39:52] <wikibugs>	 (03CR) 10CI reject: [V:04-1] alerts: add alert for ferm_mss_cfg Prometheus metric [alerts] - 10https://gerrit.wikimedia.org/r/1110843 (owner: 10CDobbins)
[21:40:54] <tzatziki>	 !log Removing 10 files for legal compliance
[21:40:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:46:11] <wikibugs>	 (03PS50) 10CDobbins: alerts: add alert for ferm_mss_cfg Prometheus metric [alerts] - 10https://gerrit.wikimedia.org/r/1110843
[21:47:23] <wikibugs>	 (03CR) 10CI reject: [V:04-1] alerts: add alert for ferm_mss_cfg Prometheus metric [alerts] - 10https://gerrit.wikimedia.org/r/1110843 (owner: 10CDobbins)
[21:53:42] <wikibugs>	 (03PS51) 10CDobbins: alerts: add alert for ferm_mss_cfg Prometheus metric [alerts] - 10https://gerrit.wikimedia.org/r/1110843
[21:54:53] <wikibugs>	 (03CR) 10CI reject: [V:04-1] alerts: add alert for ferm_mss_cfg Prometheus metric [alerts] - 10https://gerrit.wikimedia.org/r/1110843 (owner: 10CDobbins)
[21:55:49] <wikibugs>	 (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Thursday, January 16 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1111260 (https://phabricator.wikimedia.org/T381964) (owner: 10Phuedx)
[21:56:15] <wikibugs>	 (03PS52) 10CDobbins: alerts: add alert for ferm_mss_cfg Prometheus metric [alerts] - 10https://gerrit.wikimedia.org/r/1110843
[21:56:49] <wikibugs>	 (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Thursday, January 16 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1111261 (https://phabricator.wikimedia.org/T381964) (owner: 10Phuedx)
[21:57:29] <wikibugs>	 (03CR) 10Clare Ming: [C:03+1] "scheduled for tomorrow's deployment window" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1111260 (https://phabricator.wikimedia.org/T381964) (owner: 10Phuedx)
[21:57:56] <wikibugs>	 (03CR) 10Clare Ming: [C:03+1] "scheduled for tomorrow's deployment window" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1111261 (https://phabricator.wikimedia.org/T381964) (owner: 10Phuedx)
[21:58:25] <wikibugs>	 (03CR) 10Clare Ming: [C:03+1] "will schedule for tomorrow's deployment window" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1111262 (https://phabricator.wikimedia.org/T381964) (owner: 10Phuedx)
[21:58:39] <wikibugs>	 (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Thursday, January 16 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1111262 (https://phabricator.wikimedia.org/T381964) (owner: 10Phuedx)
[22:00:05] <jouncebot>	 Deploy window Wikifunctions Services UTC Late (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250115T2200)
[22:00:09] <wikibugs>	 (03CR) 10CI reject: [V:04-1] alerts: add alert for ferm_mss_cfg Prometheus metric [alerts] - 10https://gerrit.wikimedia.org/r/1110843 (owner: 10CDobbins)
[22:00:45] <wikibugs>	 (03PS53) 10CDobbins: alerts: add alert for ferm_mss_cfg Prometheus metric [alerts] - 10https://gerrit.wikimedia.org/r/1110843
[22:01:59] <wikibugs>	 (03CR) 10CI reject: [V:04-1] alerts: add alert for ferm_mss_cfg Prometheus metric [alerts] - 10https://gerrit.wikimedia.org/r/1110843 (owner: 10CDobbins)
[22:02:30] <wikibugs>	 (03PS2) 10Jforrester: wikifunctions: Upgrade orchestrator from 2025-01-08-142250 to 2025-01-15-052609 [deployment-charts] - 10https://gerrit.wikimedia.org/r/1111626 (https://phabricator.wikimedia.org/T378785)
[22:02:34] <wikibugs>	 (03CR) 10Jforrester: [C:03+2] wikifunctions: Upgrade orchestrator from 2025-01-08-142250 to 2025-01-15-052609 [deployment-charts] - 10https://gerrit.wikimedia.org/r/1111626 (https://phabricator.wikimedia.org/T378785) (owner: 10Jforrester)
[22:02:46] <wikibugs>	 (03PS54) 10CDobbins: alerts: add alert for ferm_mss_cfg Prometheus metric [alerts] - 10https://gerrit.wikimedia.org/r/1110843
[22:03:51] <wikibugs>	 (03Merged) 10jenkins-bot: wikifunctions: Upgrade orchestrator from 2025-01-08-142250 to 2025-01-15-052609 [deployment-charts] - 10https://gerrit.wikimedia.org/r/1111626 (https://phabricator.wikimedia.org/T378785) (owner: 10Jforrester)
[22:04:33] <wikibugs>	 (03CR) 10CI reject: [V:04-1] alerts: add alert for ferm_mss_cfg Prometheus metric [alerts] - 10https://gerrit.wikimedia.org/r/1110843 (owner: 10CDobbins)
[22:05:21] <wikibugs>	 (03PS55) 10CDobbins: alerts: add alert for ferm_mss_cfg Prometheus metric [alerts] - 10https://gerrit.wikimedia.org/r/1110843
[22:06:35] <wikibugs>	 (03CR) 10CI reject: [V:04-1] alerts: add alert for ferm_mss_cfg Prometheus metric [alerts] - 10https://gerrit.wikimedia.org/r/1110843 (owner: 10CDobbins)
[22:06:51] <wikibugs>	 (03PS56) 10CDobbins: alerts: add alert for ferm_mss_cfg Prometheus metric [alerts] - 10https://gerrit.wikimedia.org/r/1110843
[22:08:07] <wikibugs>	 (03CR) 10CI reject: [V:04-1] alerts: add alert for ferm_mss_cfg Prometheus metric [alerts] - 10https://gerrit.wikimedia.org/r/1110843 (owner: 10CDobbins)
[22:10:37] <logmsgbot>	 !log dmartin@deploy2002 helmfile [staging] START helmfile.d/services/wikifunctions: apply
[22:12:17] <logmsgbot>	 !log dmartin@deploy2002 helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
[22:14:43] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops: PDU sensor over limit - https://phabricator.wikimedia.org/T383383#10464689 (10phaultfinder)
[22:15:58] <logmsgbot>	 !log dmartin@deploy2002 helmfile [codfw] START helmfile.d/services/wikifunctions: apply
[22:17:04] <logmsgbot>	 !log dmartin@deploy2002 helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
[22:17:45] <logmsgbot>	 !log dmartin@deploy2002 helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
[22:18:40] <logmsgbot>	 !log dmartin@deploy2002 helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
[22:24:53] <wikibugs>	 (03PS1) 10Jdlrobson: WebUIClick: Increase sampling rate to 100% for beta cluster [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1111715 (https://phabricator.wikimedia.org/T382080)
[22:25:27] <wikibugs>	 (03CR) 10C. Scott Ananian: [C:03+1] Turn on Parsoid Read Views on test2wiki (032 comments) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1111325 (https://phabricator.wikimedia.org/T378645) (owner: 10Subramanya Sastry)
[22:25:36] <wikibugs>	 (03PS2) 10Jdlrobson: Web UI actions: Increase sampling rate to 100% for beta cluster [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1111715 (https://phabricator.wikimedia.org/T382080)
[22:26:14] <icinga-wm>	 PROBLEM - Check if active EventStreams endpoint is delivering messages. on alert1002 is CRITICAL: CRITICAL: No EventStreams message was consumed from https://stream.wikimedia.org/v2/stream/recentchange within 10 seconds. https://wikitech.wikimedia.org/wiki/Event_Platform/EventStreams/Administration
[22:26:16] <icinga-wm>	 PROBLEM - PyBal backends health check on lvs1020 is CRITICAL: PYBAL CRITICAL - CRITICAL - eventstreams_4892: Servers wikikube-worker1280.eqiad.wmnet, wikikube-worker1291.eqiad.wmnet, wikikube-worker1322.eqiad.wmnet, wikikube-worker1042.eqiad.wmnet, wikikube-worker1079.eqiad.wmnet, wikikube-worker1304.eqiad.wmnet, wikikube-worker1306.eqiad.wmnet, wikikube-worker1101.eqiad.wmnet, mw1442.eqiad.wmnet, wikikube-worker1281.eqiad.wmnet, wikikube
[22:26:16] <icinga-wm>	 036.eqiad.wmnet, wikikube-worker1029.eqiad.wmnet, wikikube-worker1049.eqiad.wmnet, mw1462.eqiad.wmnet, mw1480.eqiad.wmnet, parse1009.eqiad.wmnet, mw1484.eqiad.wmnet, wikikube-worker1016.eqiad.wmnet, wikikube-worker1273.eqiad.wmnet, wikikube-worker1071.eqiad.wmnet, wikikube-worker1279.eqiad.wmnet, wikikube-worker1282.eqiad.wmnet, wikikube-worker1263.eqiad.wmnet, wikikube-worker1307.eqiad.wmnet, mw1488.eqiad.wmnet, wikikube-worker1244.eqiad
[22:26:16] <icinga-wm>	 wikikube-worker1037.eqiad.wmnet, wikikube-worker1058.eqiad.wmnet, wikikube-worker1261.eqiad.wmnet, mw1466.eqiad.wmnet, mw1483.eqiad.wmnet, wikikube-worker1309.eqiad.wmnet, wikikube-work https://wikitech.wikimedia.org/wiki/PyBal
[22:26:50] <icinga-wm>	 PROBLEM - PyBal backends health check on lvs1019 is CRITICAL: PYBAL CRITICAL - CRITICAL - eventstreams_4892: Servers wikikube-worker1051.eqiad.wmnet, wikikube-worker1291.eqiad.wmnet, wikikube-worker1042.eqiad.wmnet, parse1013.eqiad.wmnet, wikikube-worker1268.eqiad.wmnet, wikikube-worker1304.eqiad.wmnet, wikikube-worker1259.eqiad.wmnet, wikikube-worker1101.eqiad.wmnet, mw1442.eqiad.wmnet, wikikube-worker1050.eqiad.wmnet, wikikube-worker103
[22:26:50] <icinga-wm>	 wmnet, wikikube-worker1029.eqiad.wmnet, mw1470.eqiad.wmnet, wikikube-worker1079.eqiad.wmnet, mw1484.eqiad.wmnet, wikikube-worker1260.eqiad.wmnet, wikikube-worker1282.eqiad.wmnet, wikikube-worker1263.eqiad.wmnet, mw1467.eqiad.wmnet, mw1488.eqiad.wmnet, parse1010.eqiad.wmnet, wikikube-worker1287.eqiad.wmnet, wikikube-worker1270.eqiad.wmnet, wikikube-worker1244.eqiad.wmnet, wikikube-worker1278.eqiad.wmnet, wikikube-worker1106.eqiad.wmnet, wi
[22:26:50] <icinga-wm>	 orker1289.eqiad.wmnet, mw1465.eqiad.wmnet, mw1483.eqiad.wmnet, mw1486.eqiad.wmnet, wikikube-worker1309.eqiad.wmnet, wikikube-worker1062.eqiad.wmnet, wikikube-worker1272.eqiad.wmnet, wik https://wikitech.wikimedia.org/wiki/PyBal
[22:27:24] <wikibugs>	 (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Thursday, January 16 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1111715 (https://phabricator.wikimedia.org/T382080) (owner: 10Jdlrobson)
[22:27:48] <wikibugs>	 (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Thursday, January 16 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1108135 (https://phabricator.wikimedia.org/T332728) (owner: 10Kimberly Sarabia)
[22:28:00] <wikibugs>	 (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Thursday, January 16 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1107964 (https://phabricator.wikimedia.org/T376446) (owner: 10Jdlrobson)
[22:29:37] <wikibugs>	 (03CR) 10Subramanya Sastry: "I'll schedule this for backport tomorrow." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1111325 (https://phabricator.wikimedia.org/T378645) (owner: 10Subramanya Sastry)
[22:30:55] <wikibugs>	 (03PS1) 10Eevans: restbase: new hosts (refresh) restbase104[3-5] [puppet] - 10https://gerrit.wikimedia.org/r/1111717 (https://phabricator.wikimedia.org/T383673)
[22:31:02] <wikibugs>	 (03CR) 10Ryan Kemper: [C:03+2] whitelist kg.kunsten.be on wikidata query service [puppet] - 10https://gerrit.wikimedia.org/r/1111607 (https://phabricator.wikimedia.org/T380984) (owner: 10Stevemunene)
[22:31:12] <wikibugs>	 (03CR) 10Ryan Kemper: [C:03+2] Add linkeddata.cultureelerfgoed.nl to SPARQL allowlist [puppet] - 10https://gerrit.wikimedia.org/r/1105882 (https://phabricator.wikimedia.org/T381717) (owner: 10Stevemunene)
[22:31:15] <wikibugs>	 (03CR) 10Ryan Kemper: [C:03+2] Add FactGrid to WDQS allowlist [puppet] - 10https://gerrit.wikimedia.org/r/1111605 (https://phabricator.wikimedia.org/T381649) (owner: 10Stevemunene)
[22:31:17] <wikibugs>	 (03CR) 10Ryan Kemper: [C:03+2] Add api.finto.fi/sparql to Wikidata query service and WCQS whitelist [puppet] - 10https://gerrit.wikimedia.org/r/1111606 (https://phabricator.wikimedia.org/T378561) (owner: 10Stevemunene)
[22:34:29] <wikibugs>	 10ops-eqiad, 06SRE, 06Data-Persistence, 06DC-Ops, and 2 others: Q3:rack/setup/install restbase104[345] - https://phabricator.wikimedia.org/T383673#10464754 (10Eevans) a:05Eevans→03None
[22:35:51] <logmsgbot>	 !log ryankemper@cumin2002 START - Cookbook sre.wdqs.restart
[22:35:52] <logmsgbot>	 !log ryankemper@cumin2002 END (FAIL) - Cookbook sre.wdqs.restart (exit_code=99)
[22:36:04] <logmsgbot>	 !log ryankemper@cumin2002 START - Cookbook sre.wdqs.restart
[22:37:57] <jinxer-wm>	 FIRING: ProbeDown: Service thanos-query:443 has failed probes (http_thanos-query_ip4) #page - https://wikitech.wikimedia.org/wiki/Runbook#thanos-query:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[22:40:42] <jinxer-wm>	 FIRING: JobUnavailable: Reduced availability for job thanos-query in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[22:40:42] <wikibugs>	 (03PS1) 10Ryan Kemper: wdqs: fix kg.kunsten.be URL [puppet] - 10https://gerrit.wikimedia.org/r/1111718 (https://phabricator.wikimedia.org/T380984)
[22:41:17] <logmsgbot>	 !log ryankemper@cumin2002 END (ERROR) - Cookbook sre.wdqs.restart (exit_code=97)
[22:42:10] <wikibugs>	 (03CR) 10Bking: [C:03+1] wdqs: fix kg.kunsten.be URL [puppet] - 10https://gerrit.wikimedia.org/r/1111718 (https://phabricator.wikimedia.org/T380984) (owner: 10Ryan Kemper)
[22:42:13] <wikibugs>	 (03CR) 10Ryan Kemper: [C:03+2] wdqs: fix kg.kunsten.be URL [puppet] - 10https://gerrit.wikimedia.org/r/1111718 (https://phabricator.wikimedia.org/T380984) (owner: 10Ryan Kemper)
[22:43:23] <jinxer-wm>	 FIRING: [3x] SLOMetricAbsent: <no value>   - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent
[22:43:26] <icinga-wm>	 PROBLEM - Check unit status of statograph_post on alert1002 is CRITICAL: CRITICAL: Status of the systemd unit statograph_post https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[22:43:29] <jinxer-wm>	 FIRING: SLOMetricAbsent: <no value>   - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent
[22:43:36] <jinxer-wm>	 FIRING: NELNotReported: NEL metrics not reported - https://wikitech.wikimedia.org/wiki/Network_monitoring#NEL_alerts - https://logstash.wikimedia.org/goto/5c8f4ca1413eda33128e5c5a35da7e28 - https://alerts.wikimedia.org/?q=alertname%3DNELNotReported
[22:43:41] <jinxer-wm>	 FIRING: NELByCountryNotReported: NEL metrics by country not reported - https://wikitech.wikimedia.org/wiki/Network_monitoring#NEL_alerts - https://logstash.wikimedia.org/goto/5c8f4ca1413eda33128e5c5a35da7e28 - https://alerts.wikimedia.org/?q=alertname%3DNELByCountryNotReported
[22:43:44] <jinxer-wm>	 FIRING: [4x] SLOMetricAbsent: <no value>   - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent
[22:43:50] <icinga-wm>	 RECOVERY - PyBal backends health check on lvs1019 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal
[22:44:16] <icinga-wm>	 RECOVERY - PyBal backends health check on lvs1020 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal
[22:45:15] <logmsgbot>	 !log ryankemper@cumin2002 START - Cookbook sre.wdqs.restart
[22:45:42] <jinxer-wm>	 FIRING: [4x] JobUnavailable: Reduced availability for job thanos-query in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[22:47:57] <jinxer-wm>	 RESOLVED: ProbeDown: Service thanos-query:443 has failed probes (http_thanos-query_ip4) #page - https://wikitech.wikimedia.org/wiki/Runbook#thanos-query:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[22:48:23] <jinxer-wm>	 RESOLVED: [7x] SLOMetricAbsent: <no value>   - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent
[22:48:32] <jinxer-wm>	 RESOLVED: [3x] SLOMetricAbsent: <no value>   - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent
[22:48:36] <jinxer-wm>	 RESOLVED: NELNotReported: NEL metrics not reported - https://wikitech.wikimedia.org/wiki/Network_monitoring#NEL_alerts - https://logstash.wikimedia.org/goto/5c8f4ca1413eda33128e5c5a35da7e28 - https://alerts.wikimedia.org/?q=alertname%3DNELNotReported
[22:48:41] <jinxer-wm>	 RESOLVED: NELByCountryNotReported: NEL metrics by country not reported - https://wikitech.wikimedia.org/wiki/Network_monitoring#NEL_alerts - https://logstash.wikimedia.org/goto/5c8f4ca1413eda33128e5c5a35da7e28 - https://alerts.wikimedia.org/?q=alertname%3DNELByCountryNotReported
[22:49:40] <cwhite>	 !log restarted thanos-query-frontend on titan100[12]
[22:49:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:50:42] <jinxer-wm>	 RESOLVED: [4x] JobUnavailable: Reduced availability for job thanos-query in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[22:53:26] <icinga-wm>	 RECOVERY - Check unit status of statograph_post on alert1002 is OK: OK: Status of the systemd unit statograph_post https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[22:56:14] <icinga-wm>	 RECOVERY - Check if active EventStreams endpoint is delivering messages. on alert1002 is OK: OK: An EventStreams message was consumed from https://stream.wikimedia.org/v2/stream/recentchange within 10 seconds. https://wikitech.wikimedia.org/wiki/Event_Platform/EventStreams/Administration
[23:00:04] <jouncebot>	 Deploy window Web Team deployment window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250115T2300)
[23:10:29] <jinxer-wm>	 FIRING: [2x] SystemdUnitFailed: httpbb_kubernetes_mw-wikifunctions_hourly.service on cumin1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[23:27:30] <logmsgbot>	 !log ladsgroup@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1163.eqiad.wmnet with reason: Maintenance
[23:27:37] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Depooling db1163 (T370903)', diff saved to https://phabricator.wikimedia.org/P72087 and previous config saved to /var/cache/conftool/dbconfig/20250115-232737-ladsgroup.json
[23:27:41] <stashbot>	 T370903: Remove cuc_actiontext, cuc_only_for_read_old, and cuc_private from cu_changes on WMF wikis - https://phabricator.wikimedia.org/T370903
[23:36:17] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1163 (T370903)', diff saved to https://phabricator.wikimedia.org/P72088 and previous config saved to /var/cache/conftool/dbconfig/20250115-233617-ladsgroup.json
[23:36:21] <stashbot>	 T370903: Remove cuc_actiontext, cuc_only_for_read_old, and cuc_private from cu_changes on WMF wikis - https://phabricator.wikimedia.org/T370903
[23:38:29] <logmsgbot>	 !log ryankemper@cumin2002 END (PASS) - Cookbook sre.wdqs.restart (exit_code=0)
[23:51:24] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P72089 and previous config saved to /var/cache/conftool/dbconfig/20250115-235123-ladsgroup.json
[23:52:15] <logmsgbot>	 !log ryankemper@cumin2002 START - Cookbook sre.wdqs.restart
[23:55:54] <icinga-wm>	 PROBLEM - BGP status on cr2-eqiad is CRITICAL: BGP CRITICAL - ASunknown/IPv6: Idle https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status