[00:00:47] <wikibugs>	 (CR) Krinkle: [C:+1] MediaWiki: Redirect auth domain root to wikimedia.org portal [puppet] - https://gerrit.wikimedia.org/r/1100532 (https://phabricator.wikimedia.org/T380551) (owner: Bartosz Dziewoński)
[00:08:25] <wikibugs>	 ops-eqiad, SRE, DC-Ops, Discovery-Search, Data-Platform-SRE (2024.11.30 - 2024.12.20): Q2:rack/setup/install cloudelastic101[12] - https://phabricator.wikimedia.org/T378368#10392233 (Jclark-ctr) @elukey  the 10g card is copper rj45 and not in use.   AOC-ATGC-i2TM.   The 10g port is connected...
[00:38:25] <wikibugs>	 (PS1) TrainBranchBot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1101604
[00:38:25] <wikibugs>	 (CR) TrainBranchBot: [C:+2] Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1101604 (owner: TrainBranchBot)
[00:58:12] <wikibugs>	 (Merged) jenkins-bot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1101604 (owner: TrainBranchBot)
[01:08:17] <wikibugs>	 (PS1) TrainBranchBot: Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1101605
[01:08:17] <wikibugs>	 (CR) TrainBranchBot: [C:+2] Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1101605 (owner: TrainBranchBot)
[01:20:32] <wikibugs>	 SRE-OnFire, MW-on-K8s, serviceops, Patch-For-Review, Sustainability (Incident Followup): mwscript-k8s creates too many resources - https://phabricator.wikimedia.org/T376795#10392357 (RLazarus) Yes, naively this would be too many invocations at present.  We could easily add the release name to...
[01:27:25] <wikibugs>	 (Merged) jenkins-bot: Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1101605 (owner: TrainBranchBot)
[01:28:32] <wikibugs>	 (PS1) RLazarus: deployment_server: Add release to mwscript-k8s -ojson output [puppet] - https://gerrit.wikimedia.org/r/1101607 (https://phabricator.wikimedia.org/T376795)
[01:29:38] <wikibugs>	 ops-eqiad, SRE, collaboration-services, DC-Ops, and 3 others: Relabel eqiad kubernetes nodes - https://phabricator.wikimedia.org/T381504#10392384 (VRiley-WMF)
[02:09:29] <jinxer-wm>	 FIRING: [3x] SystemdUnitFailed: mediawiki_job_translationnotifications-mediawikiwiki.service on mwmaint2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[02:24:05] <wikibugs>	 (CR) Scott French: [C:+1] deployment_server: Add release to mwscript-k8s -ojson output [puppet] - https://gerrit.wikimedia.org/r/1101607 (https://phabricator.wikimedia.org/T376795) (owner: RLazarus)
[02:36:42] <jinxer-wm>	 FIRING: JobUnavailable: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[03:00:05] <jouncebot>	 Deploy window Automatic branching of MediaWiki, extensions, skins, and vendor – see Heterogeneous_deployment/Train_deploys (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241210T0300)
[03:01:07] <wikibugs>	 SRE, Data-Engineering, Data-Platform-SRE: Data Platform access streamlining for WMDE staff - https://phabricator.wikimedia.org/T381824#10392465 (Dzahn) One thing to answer here would be how you would know who actually is WMDE staff. There used to be a public page that lists them but then that stopped...
[03:06:03] <icinga-wm>	 PROBLEM - Checks that the local airflow scheduler for airflow @analytics is working properly on an-launcher1002 is CRITICAL: CRITICAL: /usr/bin/env PYTHONPATH=/srv/deployment/airflow-dags/analytics AIRFLOW_HOME=/srv/airflow-analytics /usr/lib/airflow/bin/airflow jobs check --job-type SchedulerJob --hostname an-launcher1002.eqiad.wmnet did not succeed https://wikitech.wikimedia.org/wiki/Analytics/Systems/Airflow
[03:06:42] <jinxer-wm>	 RESOLVED: JobUnavailable: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[03:07:09] <icinga-wm>	 RECOVERY - Checks that the local airflow scheduler for airflow @analytics is working properly on an-launcher1002 is OK: OK: /usr/bin/env PYTHONPATH=/srv/deployment/airflow-dags/analytics AIRFLOW_HOME=/srv/airflow-analytics /usr/lib/airflow/bin/airflow jobs check --job-type SchedulerJob --hostname an-launcher1002.eqiad.wmnet succeeded https://wikitech.wikimedia.org/wiki/Analytics/Systems/Airflow
[04:00:04] <jouncebot>	 Deploy window Automatic deployment of MediaWiki, extensions, skins, and vendor to testwikis only – see Heterogeneous_deployment/Train_deploys (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241210T0400)
[04:16:39] <icinga-wm>	 RECOVERY - Disk space on build2001 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=build2001&var-datasource=codfw+prometheus/ops
[04:40:07] <icinga-wm>	 PROBLEM - Disk space on ml-lab1001 is CRITICAL: DISK CRITICAL - free space: /srv 12109MiB (3% inode=95%): https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=ml-lab1001&var-datasource=eqiad+prometheus/ops
[04:49:29] <wikibugs>	 ops-codfw, DC-Ops: ManagementSSHDown - https://phabricator.wikimedia.org/T381843 (phaultfinder) NEW
[05:00:05] <jouncebot>	 Deploy window Automatic removal of all obsolete MediaWiki versions from the deployment and bare metal servers (except the most-recent obsolete version) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241210T0500)
[05:01:27] <logmsgbot>	 !log mwpresync@deploy2002 Pruned MediaWiki: 1.44.0-wmf.4 (duration: 01m 25s)
[06:04:48] <wikibugs>	 (PS1) Kevin Bazira: ml-services: update article-country deployment in the experimental ns [deployment-charts] - https://gerrit.wikimedia.org/r/1101741 (https://phabricator.wikimedia.org/T371897)
[06:09:29] <jinxer-wm>	 FIRING: [3x] SystemdUnitFailed: mediawiki_job_translationnotifications-mediawikiwiki.service on mwmaint2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[06:15:13] <wikibugs>	 (PS1) Kevin Bazira: ml-services: update article-country deployment in the article-models ns [deployment-charts] - https://gerrit.wikimedia.org/r/1101743 (https://phabricator.wikimedia.org/T371897)
[06:34:05] <icinga-wm>	 PROBLEM - mailman list info on lists1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[06:34:55] <icinga-wm>	 RECOVERY - mailman list info on lists1004 is OK: HTTP OK: HTTP/1.1 200 OK - 8922 bytes in 0.192 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[06:36:11] <wikibugs>	 (PS2) Stevemunene: Enable airflow task pods  access to mx server [deployment-charts] - https://gerrit.wikimedia.org/r/1101527 (https://phabricator.wikimedia.org/T377926)
[06:38:52] <wikibugs>	 (CR) Stevemunene: Enable airflow task pods  access to mx server (1 comment) [deployment-charts] - https://gerrit.wikimedia.org/r/1101527 (https://phabricator.wikimedia.org/T377926) (owner: Stevemunene)
[07:00:05] <jouncebot>	 Deploy window MediaWiki infrastructure (UTC early) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241210T0700)
[07:00:05] <jouncebot>	 marostegui, Amir1, and arnaudb: Time to do the Primary database switchover deploy. Don't look at me like that. You signed up for it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241210T0700).
[07:24:09] <wikibugs>	 (CR) Ilias Sarantopoulos: [C:+2] ml-services: update article-country deployment in the experimental ns [deployment-charts] - https://gerrit.wikimedia.org/r/1101741 (https://phabricator.wikimedia.org/T371897) (owner: Kevin Bazira)
[07:24:44] <wikibugs>	 (CR) Ilias Sarantopoulos: [C:+1] ml-services: update article-country deployment in the experimental ns [deployment-charts] - https://gerrit.wikimedia.org/r/1101741 (https://phabricator.wikimedia.org/T371897) (owner: Kevin Bazira)
[07:24:55] <wikibugs>	 (CR) Ilias Sarantopoulos: [C:+1] ml-services: update article-country deployment in the article-models ns [deployment-charts] - https://gerrit.wikimedia.org/r/1101743 (https://phabricator.wikimedia.org/T371897) (owner: Kevin Bazira)
[07:26:47] <wikibugs>	 (PS1) Jelto: Rename kubernetes[1051-1054] to wikikube-worker[1076-1079] [puppet] - https://gerrit.wikimedia.org/r/1101789 (https://phabricator.wikimedia.org/T377876)
[07:32:40] <kart_>	 Q: Where is the source code of liveness_probe in the deployment-charts configuration? We want to check how it functions.
[07:38:08] <wikibugs>	 (CR) Muehlenhoff: [C:+2] netbox::db: Use new helper function [puppet] - https://gerrit.wikimedia.org/r/1101497 (owner: Muehlenhoff)
[07:40:07] <wikibugs>	 (CR) Muehlenhoff: [C:+2] prometheus/pop: Restrict access to Envoy port [puppet] - https://gerrit.wikimedia.org/r/1100810 (owner: Muehlenhoff)
[07:44:07] <wikibugs>	 (PS1) Muehlenhoff: profile::analytics::postgresql: Use debian_postgresql_version [puppet] - https://gerrit.wikimedia.org/r/1101791
[07:48:30] <wikibugs>	 (CR) Muehlenhoff: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1101791 (owner: Muehlenhoff)
[07:57:57] <wikibugs>	 SRE, Data-Engineering, Data-Platform-SRE: Data Platform access streamlining for WMDE staff - https://phabricator.wikimedia.org/T381824#10392591 (MoritzMuehlenhoff) >>! In T381824#10392465, @Dzahn wrote: > One thing to answer here would be how you would know who actually is WMDE staff. There used to b...
[08:00:04] <jouncebot>	 Amir1, Urbanecm, and awight: OwO what's this, a deployment window?? UTC morning backport window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241210T0800). nyaa~
[08:00:04] <jouncebot>	 gmodena: A patch you scheduled for UTC morning backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[08:00:58] * gmodena waves
[08:07:11] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db1210 (re)pooling @ 10%: Repooling cloning', diff saved to https://phabricator.wikimedia.org/P71647 and previous config saved to /var/cache/conftool/dbconfig/20241210-080710-root.json
[08:08:45] <wikibugs>	 (PS1) Marostegui: db1159: Enable notifications [puppet] - https://gerrit.wikimedia.org/r/1101793 (https://phabricator.wikimedia.org/T381550)
[08:10:23] <wikibugs>	 (CR) Marostegui: [C:+2] db1159: Enable notifications [puppet] - https://gerrit.wikimedia.org/r/1101793 (https://phabricator.wikimedia.org/T381550) (owner: Marostegui)
[08:10:40] <gmodena>	 Amir1 urbanecm  anyone around for backport window deployments?  
[08:10:58] <gmodena>	 I can deploy 1100417 myself, but I'd like an ack from a responsible adult in case :)
[08:13:28] <wikibugs>	 (CR) David Caro: [C:+2] "LGTM" [alerts] - https://gerrit.wikimedia.org/r/1101584 (https://phabricator.wikimedia.org/T381807) (owner: FNegri)
[08:14:47] <wikibugs>	 (PS1) Marostegui: instances: Add db1159 to dbctl [puppet] - https://gerrit.wikimedia.org/r/1101794 (https://phabricator.wikimedia.org/T381550)
[08:15:07] <wikibugs>	 (Merged) jenkins-bot: WMCS: fix expr in TooManyCloud*Down [alerts] - https://gerrit.wikimedia.org/r/1101584 (https://phabricator.wikimedia.org/T381807) (owner: FNegri)
[08:16:17] <wikibugs>	 (CR) Marostegui: [C:+2] instances: Add db1159 to dbctl [puppet] - https://gerrit.wikimedia.org/r/1101794 (https://phabricator.wikimedia.org/T381550) (owner: Marostegui)
[08:16:48] <wikibugs>	 (CR) Brouberol: [C:+1] "Perfect, thanks!" [deployment-charts] - https://gerrit.wikimedia.org/r/1101527 (https://phabricator.wikimedia.org/T377926) (owner: Stevemunene)
[08:18:39] <wikibugs>	 (CR) Brouberol: [C:+1] "Thanks!" [puppet] - https://gerrit.wikimedia.org/r/1101791 (owner: Muehlenhoff)
[08:20:17] <gmodena>	 jouncebot I can do the deploys today!
[08:20:21] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Add db1159 to dbctl depooled T381550', diff saved to https://phabricator.wikimedia.org/P71648 and previous config saved to /var/cache/conftool/dbconfig/20241210-082020-marostegui.json
[08:20:25] <stashbot>	 T381550: Move db1159 to s5 - https://phabricator.wikimedia.org/T381550
[08:21:07] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Approved by gmodena@deploy2002 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/1100417 (https://phabricator.wikimedia.org/T381322) (owner: Gmodena)
[08:21:08] <wikibugs>	 (CR) Muehlenhoff: [C:+2] profile::analytics::postgresql: Use debian_postgresql_version [puppet] - https://gerrit.wikimedia.org/r/1101791 (owner: Muehlenhoff)
[08:21:49] <wikibugs>	 (Merged) jenkins-bot: EventStreamConfig: add content_history streams. [mediawiki-config] - https://gerrit.wikimedia.org/r/1100417 (https://phabricator.wikimedia.org/T381322) (owner: Gmodena)
[08:22:17] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db1210 (re)pooling @ 25%: Repooling cloning', diff saved to https://phabricator.wikimedia.org/P71649 and previous config saved to /var/cache/conftool/dbconfig/20241210-082216-root.json
[08:22:22] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db1159 (re)pooling @ 10%: 5', diff saved to https://phabricator.wikimedia.org/P71650 and previous config saved to /var/cache/conftool/dbconfig/20241210-082221-root.json
[08:22:32] <logmsgbot>	 !log gmodena@deploy2002 Started scap sync-world: Backport for [[gerrit:1100417|EventStreamConfig: add content_history streams. (T381322)]]
[08:22:35] <stashbot>	 T381322: Rename Flink application and streams to match prod conventions - https://phabricator.wikimedia.org/T381322
[08:26:47] <logmsgbot>	 !log gmodena@deploy2002 gmodena: Backport for [[gerrit:1100417|EventStreamConfig: add content_history streams. (T381322)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[08:26:52] <wikibugs>	 (CR) Muehlenhoff: [C:+1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/1101789 (https://phabricator.wikimedia.org/T377876) (owner: Jelto)
[08:29:59] <wikibugs>	 (PS1) Muehlenhoff: Extend access for aarora [puppet] - https://gerrit.wikimedia.org/r/1101797
[08:31:57] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.k8s.pool-depool-node depool for host kubernetes[1051-1054].eqiad.wmnet
[08:34:13] <gmodena>	 jouncebot Amir1  urbanecm tested on mwdebug host. Config changes (two new streams have been added) have been applied as expected. No regression found. I'll continue with sync.
[08:34:18] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host kubernetes[1051-1054].eqiad.wmnet
[08:34:27] <logmsgbot>	 !log gmodena@deploy2002 gmodena: Continuing with sync
[08:34:45] <wikibugs>	 (CR) Jelto: [C:+2] Rename kubernetes[1051-1054] to wikikube-worker[1076-1079] [puppet] - https://gerrit.wikimedia.org/r/1101789 (https://phabricator.wikimedia.org/T377876) (owner: Jelto)
[08:35:55] <wikibugs>	 (CR) Slyngshede: [C:+1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/1101797 (owner: Muehlenhoff)
[08:36:24] <wikibugs>	 ops-eqiad, SRE, DC-Ops, Discovery-Search, Data-Platform-SRE (2024.11.30 - 2024.12.20): Q2:rack/setup/install cloudelastic101[12] - https://phabricator.wikimedia.org/T378368#10392652 (elukey) @Jclark-ctr @bking given that the 10g card will never be used (RJ45, copper, etc.) we can go ahead wi...
[08:36:58] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.rename from kubernetes1051 to wikikube-worker1076
[08:37:17] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.dns.netbox
[08:37:22] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db1210 (re)pooling @ 50%: Repooling cloning', diff saved to https://phabricator.wikimedia.org/P71652 and previous config saved to /var/cache/conftool/dbconfig/20241210-083721-root.json
[08:37:27] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db1159 (re)pooling @ 25%: 5', diff saved to https://phabricator.wikimedia.org/P71653 and previous config saved to /var/cache/conftool/dbconfig/20241210-083726-root.json
[08:39:02] <wikibugs>	 ops-codfw, SRE, Data-Persistence, DC-Ops: es2045 went down: CPU error - https://phabricator.wikimedia.org/T381549#10392654 (Marostegui) Open→Resolved @Jhancock.wm The transfer didn't make the host crash. So I am going to start giving it some production traffic. Will reopen the task if...
[08:39:25] <icinga-wm>	 PROBLEM - BGP status on lsw1-f3-eqiad.mgmt is CRITICAL: BGP CRITICAL - AS64601/IPv4: Active - kubernetes-eqiad, AS64601/IPv6: Connect - kubernetes-eqiad, AS64601/IPv4: Active - kubernetes-eqiad, AS64601/IPv6: Connect - kubernetes-eqiad, AS64601/IPv4: Active - kubernetes-eqiad, AS64601/IPv6: Connect - kubernetes-eqiad https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[08:39:47] <wikibugs>	 (CR) Muehlenhoff: [C:+2] Extend access for aarora [puppet] - https://gerrit.wikimedia.org/r/1101797 (owner: Muehlenhoff)
[08:39:48] <logmsgbot>	 !log gmodena@deploy2002 Finished scap sync-world: Backport for [[gerrit:1100417|EventStreamConfig: add content_history streams. (T381322)]] (duration: 17m 16s)
[08:39:52] <stashbot>	 T381322: Rename Flink application and streams to match prod conventions - https://phabricator.wikimedia.org/T381322
[08:40:16] <wikibugs>	 (PS1) Marostegui: es2045: Enable notifications [puppet] - https://gerrit.wikimedia.org/r/1101799 (https://phabricator.wikimedia.org/T381259)
[08:41:07] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming kubernetes1051 to wikikube-worker1076 - jelto@cumin1002"
[08:41:19] <gmodena>	 !log UTC morning backport deploys done
[08:41:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:41:32] <wikibugs>	 (CR) Marostegui: [C:+2] es2045: Enable notifications [puppet] - https://gerrit.wikimedia.org/r/1101799 (https://phabricator.wikimedia.org/T381259) (owner: Marostegui)
[08:41:42] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming kubernetes1051 to wikikube-worker1076 - jelto@cumin1002"
[08:41:42] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[08:41:43] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1076
[08:42:03] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1076
[08:42:06] <wikibugs>	 (CR) Elukey: [C:+1] maps::postgresql_common: Avoid Ferm-specific syntax [puppet] - https://gerrit.wikimedia.org/r/1101461 (owner: Muehlenhoff)
[08:42:41] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from kubernetes1051 to wikikube-worker1076
[08:42:54] <wikibugs>	 (PS3) Slyngshede: Updated notification handling [software/bitu] - https://gerrit.wikimedia.org/r/1100388 (https://phabricator.wikimedia.org/T381075)
[08:43:01] <wikibugs>	 (CR) Elukey: [C:+2] profile::k8s::deployment_server: add config for Kartotherian [puppet] - https://gerrit.wikimedia.org/r/1101483 (https://phabricator.wikimedia.org/T216826) (owner: Elukey)
[08:43:41] <wikibugs>	 (Abandoned) Elukey: TEST: dump bios changes to be applied [cookbooks] - https://gerrit.wikimedia.org/r/1100996 (owner: Elukey)
[08:43:49] <wikibugs>	 (Abandoned) Elukey: WIP: sre.hosts.provision: skip IPv6 autoconfig disable for Supermicro [cookbooks] - https://gerrit.wikimedia.org/r/1091601 (owner: Elukey)
[08:44:07] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.rename from kubernetes1052 to wikikube-worker1077
[08:44:27] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.dns.netbox
[08:45:05] <wikibugs>	 (PS1) Marostegui: instances: Add es2045 to dbctl [puppet] - https://gerrit.wikimedia.org/r/1101802 (https://phabricator.wikimedia.org/T381259)
[08:46:21] <wikibugs>	 (CR) Marostegui: [C:+2] instances: Add es2045 to dbctl [puppet] - https://gerrit.wikimedia.org/r/1101802 (https://phabricator.wikimedia.org/T381259) (owner: Marostegui)
[08:47:40] <jinxer-wm>	 FIRING: [2x] KubernetesRsyslogDown: rsyslog on kubernetes1053:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[08:47:43] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es2024 (re)pooling @ 1%: Pooling in production', diff saved to https://phabricator.wikimedia.org/P71654 and previous config saved to /var/cache/conftool/dbconfig/20241210-084743-root.json
[08:48:05] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming kubernetes1052 to wikikube-worker1077 - jelto@cumin1002"
[08:48:45] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Add es2045 to dbctl T381259', diff saved to https://phabricator.wikimedia.org/P71655 and previous config saved to /var/cache/conftool/dbconfig/20241210-084844-marostegui.json
[08:48:48] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming kubernetes1052 to wikikube-worker1077 - jelto@cumin1002"
[08:48:48] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[08:48:49] <stashbot>	 T381259: Productionize es204[1-6] - https://phabricator.wikimedia.org/T381259
[08:48:49] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1077
[08:49:00] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1077
[08:49:33] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Change es2024 weight', diff saved to https://phabricator.wikimedia.org/P71656 and previous config saved to /var/cache/conftool/dbconfig/20241210-084932-marostegui.json
[08:49:38] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from kubernetes1052 to wikikube-worker1077
[08:49:58] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.rename from kubernetes1053 to wikikube-worker1078
[08:49:59] <elukey>	 !log manual run of docker-system-prune-all on build2001 to free some space
[08:50:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:50:07] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es2045 (re)pooling @ 1%: Pooling in production', diff saved to https://phabricator.wikimedia.org/P71657 and previous config saved to /var/cache/conftool/dbconfig/20241210-085006-root.json
[08:50:18] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.dns.netbox
[08:52:27] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db1210 (re)pooling @ 75%: Repooling cloning', diff saved to https://phabricator.wikimedia.org/P71658 and previous config saved to /var/cache/conftool/dbconfig/20241210-085227-root.json
[08:52:33] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db1159 (re)pooling @ 50%: 5', diff saved to https://phabricator.wikimedia.org/P71659 and previous config saved to /var/cache/conftool/dbconfig/20241210-085232-root.json
[08:53:47] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming kubernetes1053 to wikikube-worker1078 - jelto@cumin1002"
[08:54:04] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming kubernetes1053 to wikikube-worker1078 - jelto@cumin1002"
[08:54:04] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[08:54:04] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1078
[08:54:10] <wikibugs>	 (CR) DCausse: [C:+1] "thanks!" [puppet] - https://gerrit.wikimedia.org/r/1101607 (https://phabricator.wikimedia.org/T376795) (owner: RLazarus)
[08:54:22] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1078
[08:55:01] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from kubernetes1053 to wikikube-worker1078
[08:55:07] <wikibugs>	 (CR) Stevemunene: [C:+2] Enable airflow task pods  access to mx server [deployment-charts] - https://gerrit.wikimedia.org/r/1101527 (https://phabricator.wikimedia.org/T377926) (owner: Stevemunene)
[08:55:37] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.rename from kubernetes1054 to wikikube-worker1079
[08:55:57] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.dns.netbox
[08:56:26] <wikibugs>	 (Merged) jenkins-bot: Enable airflow task pods  access to mx server [deployment-charts] - https://gerrit.wikimedia.org/r/1101527 (https://phabricator.wikimedia.org/T377926) (owner: Stevemunene)
[08:56:38] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet with reason: Alter table
[08:56:42] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet with reason: Alter table
[08:59:36] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming kubernetes1054 to wikikube-worker1079 - jelto@cumin1002"
[08:59:52] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming kubernetes1054 to wikikube-worker1079 - jelto@cumin1002"
[08:59:53] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[08:59:53] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1079
[09:00:03] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1079
[09:00:42] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from kubernetes1054 to wikikube-worker1079
[09:01:18] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.dns.wipe-cache wikikube-worker1076.eqiad.wmnet wikikube-worker1077.eqiad.wmnet wikikube-worker1078.eqiad.wmnet wikikube-worker1079.eqiad.wmnet on all recursors
[09:01:21] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1076.eqiad.wmnet wikikube-worker1077.eqiad.wmnet wikikube-worker1078.eqiad.wmnet wikikube-worker1079.eqiad.wmnet on all recursors
[09:02:49] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es2024 (re)pooling @ 5%: Pooling in production', diff saved to https://phabricator.wikimedia.org/P71660 and previous config saved to /var/cache/conftool/dbconfig/20241210-090248-root.json
[09:04:12] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.reimage for host wikikube-worker1076.eqiad.wmnet with OS bookworm
[09:04:40] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.reimage for host wikikube-worker1077.eqiad.wmnet with OS bookworm
[09:05:12] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es2045 (re)pooling @ 5%: Pooling in production', diff saved to https://phabricator.wikimedia.org/P71661 and previous config saved to /var/cache/conftool/dbconfig/20241210-090511-root.json
[09:05:17] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.reimage for host wikikube-worker1078.eqiad.wmnet with OS bookworm
[09:05:39] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.reimage for host wikikube-worker1079.eqiad.wmnet with OS bookworm
[09:06:51] <logmsgbot>	 !log stevemunene@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
[09:07:13] <logmsgbot>	 !log stevemunene@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
[09:07:31] <wikibugs>	 (PS4) Slyngshede: Updated notification handling [software/bitu] - https://gerrit.wikimedia.org/r/1100388 (https://phabricator.wikimedia.org/T381075)
[09:07:33] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db1210 (re)pooling @ 100%: Repooling cloning', diff saved to https://phabricator.wikimedia.org/P71662 and previous config saved to /var/cache/conftool/dbconfig/20241210-090732-root.json
[09:07:38] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db1159 (re)pooling @ 75%: 5', diff saved to https://phabricator.wikimedia.org/P71663 and previous config saved to /var/cache/conftool/dbconfig/20241210-090738-root.json
[09:14:35] <wikibugs>	 (PS5) Slyngshede: Updated notification handling [software/bitu] - https://gerrit.wikimedia.org/r/1100388 (https://phabricator.wikimedia.org/T381075)
[09:15:59] <logmsgbot>	 !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
[09:16:10] <logmsgbot>	 !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
[09:17:54] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es2024 (re)pooling @ 10%: Pooling in production', diff saved to https://phabricator.wikimedia.org/P71664 and previous config saved to /var/cache/conftool/dbconfig/20241210-091754-root.json
[09:19:50] <wikibugs>	 (CR) Kevin Bazira: [C:+2] ml-services: update article-country deployment in the article-models ns [deployment-charts] - https://gerrit.wikimedia.org/r/1101743 (https://phabricator.wikimedia.org/T371897) (owner: Kevin Bazira)
[09:20:17] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es2045 (re)pooling @ 10%: Pooling in production', diff saved to https://phabricator.wikimedia.org/P71665 and previous config saved to /var/cache/conftool/dbconfig/20241210-092016-root.json
[09:20:56] <wikibugs>	 (Merged) jenkins-bot: ml-services: update article-country deployment in the article-models ns [deployment-charts] - https://gerrit.wikimedia.org/r/1101743 (https://phabricator.wikimedia.org/T371897) (owner: Kevin Bazira)
[09:22:44] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db1159 (re)pooling @ 100%: 5', diff saved to https://phabricator.wikimedia.org/P71666 and previous config saved to /var/cache/conftool/dbconfig/20241210-092243-root.json
[09:23:30] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1076.eqiad.wmnet with reason: host reimage
[09:24:00] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1077.eqiad.wmnet with reason: host reimage
[09:24:28] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1078.eqiad.wmnet with reason: host reimage
[09:24:44] <wikibugs>	 (PS2) Kevin Bazira: ml-services: update article-country deployment in the experimental ns [deployment-charts] - https://gerrit.wikimedia.org/r/1101741 (https://phabricator.wikimedia.org/T371897)
[09:24:58] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1079.eqiad.wmnet with reason: host reimage
[09:26:43] <wikibugs>	 (CR) Atieno: [C:+1] "Confirming it's the one" [mediawiki-config] - https://gerrit.wikimedia.org/r/1101577 (owner: Arlolra)
[09:27:10] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1076.eqiad.wmnet with reason: host reimage
[09:27:14] <wikibugs>	 (CR) Kevin Bazira: [C:+2] ml-services: update article-country deployment in the experimental ns [deployment-charts] - https://gerrit.wikimedia.org/r/1101741 (https://phabricator.wikimedia.org/T371897) (owner: Kevin Bazira)
[09:29:08] <wikibugs>	 (Merged) jenkins-bot: ml-services: update article-country deployment in the experimental ns [deployment-charts] - https://gerrit.wikimedia.org/r/1101741 (https://phabricator.wikimedia.org/T371897) (owner: Kevin Bazira)
[09:29:30] <wikibugs>	 SRE, Dumps 2.0, Dumps-Generation: Dumps generation cause disruption to the production environment - https://phabricator.wikimedia.org/T368098#10392790 (Marostegui) @xcollazo I like @BTullis's idea. @BTullis do you think you could find some time to explore this idea? I am interested in knowing how the...
[09:30:21] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1077.eqiad.wmnet with reason: host reimage
[09:32:20] <logmsgbot>	 !log kevinbazira@deploy2002 helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
[09:33:00] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es2024 (re)pooling @ 25%: Pooling in production', diff saved to https://phabricator.wikimedia.org/P71667 and previous config saved to /var/cache/conftool/dbconfig/20241210-093259-root.json
[09:34:13] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1079.eqiad.wmnet with reason: host reimage
[09:34:25] <moritzm>	 !log rebalance Ganeti cluster in codfw/c following server refresh T376594
[09:34:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:34:29] <stashbot>	 T376594: Add ganeti2035 to ganeti2044 and decom ganeti2009 to ganeti2018 - https://phabricator.wikimedia.org/T376594
[09:35:22] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es2045 (re)pooling @ 25%: Pooling in production', diff saved to https://phabricator.wikimedia.org/P71668 and previous config saved to /var/cache/conftool/dbconfig/20241210-093522-root.json
[09:36:03] <wikibugs>	 (CR) Brouberol: dse-k8s-services: introduce Blunderbuss config (2 comments) [deployment-charts] - https://gerrit.wikimedia.org/r/1091827 (https://phabricator.wikimedia.org/T371994) (owner: Bking)
[09:36:03] <logmsgbot>	 !log kevinbazira@deploy2002 helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'article-models' for release 'main' .
[09:37:45] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1078.eqiad.wmnet with reason: host reimage
[09:41:28] <logmsgbot>	 !log aqu@deploy2002 Started deploy [airflow-dags/analytics@7428c06]: Backfill webrequest actor metrics 2024 12
[09:41:43] <logmsgbot>	 !log joal@deploy2002 Started deploy [analytics/refinery@0ffc330]: Analytics backfill train [analytics/refinery@0ffc3306]
[09:43:43] <logmsgbot>	 !log kevinbazira@deploy2002 helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'article-models' for release 'main' .
[09:43:47] <logmsgbot>	 !log joal@deploy2002 Finished deploy [analytics/refinery@0ffc330]: Analytics backfill train [analytics/refinery@0ffc3306] (duration: 02m 04s)
[09:44:05] <logmsgbot>	 !log joal@deploy2002 Started deploy [analytics/refinery@0ffc330] (thin): Analytics backfill train - THIN [analytics/refinery@0ffc3306]
[09:44:06] <logmsgbot>	 !log kevinbazira@deploy2002 helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'article-models' for release 'main' .
[09:44:22] <logmsgbot>	 !log kevinbazira@deploy2002 helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'article-models' for release 'main' .
[09:44:36] <logmsgbot>	 !log joal@deploy2002 Finished deploy [analytics/refinery@0ffc330] (thin): Analytics backfill train - THIN [analytics/refinery@0ffc3306] (duration: 00m 31s)
[09:44:50] <logmsgbot>	 !log joal@deploy2002 Started deploy [analytics/refinery@0ffc330] (hadoop-test): Analytics backfill train - TEST [analytics/refinery@0ffc3306]
[09:45:17] <logmsgbot>	 !log joal@deploy2002 Finished deploy [analytics/refinery@0ffc330] (hadoop-test): Analytics backfill train - TEST [analytics/refinery@0ffc3306] (duration: 00m 26s)
[09:46:24] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1076.eqiad.wmnet with OS bookworm
[09:47:03] <wikibugs>	 (Abandoned) Muehlenhoff: Extend access request email template [software/bitu] - https://gerrit.wikimedia.org/r/1100133 (owner: Muehlenhoff)
[09:48:05] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es2024 (re)pooling @ 50%: Pooling in production', diff saved to https://phabricator.wikimedia.org/P71669 and previous config saved to /var/cache/conftool/dbconfig/20241210-094805-root.json
[09:48:51] <logmsgbot>	 !log aqu@deploy2002 Finished deploy [airflow-dags/analytics@7428c06]: Backfill webrequest actor metrics 2024 12 (duration: 07m 22s)
[09:48:53] <logmsgbot>	 !log aqu@deploy2002 Started deploy [airflow-dags/analytics@7428c06]: Backfill webrequest actor metrics 2024 12
[09:49:21] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1077.eqiad.wmnet with OS bookworm
[09:50:27] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es2045 (re)pooling @ 50%: Pooling in production', diff saved to https://phabricator.wikimedia.org/P71670 and previous config saved to /var/cache/conftool/dbconfig/20241210-095027-root.json
[09:53:02] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1079.eqiad.wmnet with OS bookworm
[09:55:06] <wikibugs>	 (CR) Hamish: [C:+1] "I've tested a similar patch on my private server, LGTM." [mediawiki-config] - https://gerrit.wikimedia.org/r/1100228 (https://phabricator.wikimedia.org/T380020) (owner: Stang)
[09:56:30] <icinga-wm>	 RECOVERY - BGP status on lsw1-f3-eqiad.mgmt is OK: BGP OK - up: 22, down: 0, shutdown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[09:56:31] <logmsgbot>	 !log aqu@deploy2002 Finished deploy [airflow-dags/analytics@7428c06]: Backfill webrequest actor metrics 2024 12 (duration: 07m 37s)
[09:56:34] <logmsgbot>	 !log aqu@deploy2002 Started deploy [airflow-dags/analytics@7428c06]: Backfill webrequest actor metrics 2024 12
[09:56:51] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1078.eqiad.wmnet with OS bookworm
[09:57:12] <wikibugs>	 SRE, SRE-Access-Requests: Requesting access to deployment for Ammarpad - https://phabricator.wikimedia.org/T381851 (Ammarpad) NEW
[09:57:41] <jelto>	 !log homer 'lsw1-f3-eqiad*' commit 'T377876' , homer 'lsw1-e3-eqiad*' commit 'T377876'
[09:57:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:57:45] <stashbot>	 T377876: Migrate wikikube-eqiad to containerd - https://phabricator.wikimedia.org/T377876
[09:59:00] <wikibugs>	 (CR) Slyngshede: "I forgot about this one." [software/bitu] - https://gerrit.wikimedia.org/r/1098881 (https://phabricator.wikimedia.org/T380998) (owner: Slyngshede)
[10:00:23] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[1076-1079].eqiad.wmnet
[10:00:25] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[1076-1079].eqiad.wmnet
[10:01:04] <wikibugs>	 ops-eqiad, SRE, collaboration-services, DC-Ops, and 3 others: Relabel eqiad kubernetes nodes - https://phabricator.wikimedia.org/T381504#10392875 (Jelto)
[10:03:11] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es2024 (re)pooling @ 75%: Pooling in production', diff saved to https://phabricator.wikimedia.org/P71672 and previous config saved to /var/cache/conftool/dbconfig/20241210-100310-root.json
[10:04:19] <wikibugs>	 (PS1) Jelto: Rename kubernetes[1055-1058] to wikikube-worker[1080-1083] [puppet] - https://gerrit.wikimedia.org/r/1101818 (https://phabricator.wikimedia.org/T377876)
[10:05:33] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es2045 (re)pooling @ 75%: Pooling in production', diff saved to https://phabricator.wikimedia.org/P71673 and previous config saved to /var/cache/conftool/dbconfig/20241210-100532-root.json
[10:05:47] <wikibugs>	 (CR) Muehlenhoff: [C:+1] "Looks good!" [software/bitu] - https://gerrit.wikimedia.org/r/1100388 (https://phabricator.wikimedia.org/T381075) (owner: Slyngshede)
[10:09:29] <jinxer-wm>	 FIRING: [3x] SystemdUnitFailed: mediawiki_job_translationnotifications-mediawikiwiki.service on mwmaint2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[10:10:16] <wikibugs>	 SRE, SRE-Access-Requests: Requesting access to deployment for Ammarpad - https://phabricator.wikimedia.org/T381851#10392897 (Ammarpad)
[10:12:09] <wikibugs>	 (CR) Slyngshede: Updated notification handling (1 comment) [software/bitu] - https://gerrit.wikimedia.org/r/1100388 (https://phabricator.wikimedia.org/T381075) (owner: Slyngshede)
[10:12:13] <wikibugs>	 (CR) Slyngshede: [C:+2] Updated notification handling [software/bitu] - https://gerrit.wikimedia.org/r/1100388 (https://phabricator.wikimedia.org/T381075) (owner: Slyngshede)
[10:12:32] <wikibugs>	 (CR) Muehlenhoff: [C:+1] "The patch looks good, but I'm wondering about the context, all logins go via CAS, so this is only relevant for non-WMF deployments, right?" [software/bitu] - https://gerrit.wikimedia.org/r/1098881 (https://phabricator.wikimedia.org/T380998) (owner: Slyngshede)
[10:17:26] <logmsgbot>	 !log aqu@deploy2002 Finished deploy [airflow-dags/analytics@7428c06]: Backfill webrequest actor metrics 2024 12 (duration: 20m 51s)
[10:17:43] <wikibugs>	 (Merged) jenkins-bot: Updated notification handling [software/bitu] - https://gerrit.wikimedia.org/r/1100388 (https://phabricator.wikimedia.org/T381075) (owner: Slyngshede)
[10:18:14] <wikibugs>	 (PS1) Slyngshede: P:idm add notification settings to test system [puppet] - https://gerrit.wikimedia.org/r/1101820 (https://phabricator.wikimedia.org/T381075)
[10:18:16] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es2024 (re)pooling @ 100%: Pooling in production', diff saved to https://phabricator.wikimedia.org/P71674 and previous config saved to /var/cache/conftool/dbconfig/20241210-101815-root.json
[10:19:13] <wikibugs>	 (CR) Slyngshede: "This is for people who are not signed in, but go to view the account block/unblock logs, which are public and have a menu." [software/bitu] - https://gerrit.wikimedia.org/r/1098881 (https://phabricator.wikimedia.org/T380998) (owner: Slyngshede)
[10:19:16] <wikibugs>	 (CR) Slyngshede: [C:+2] Only show sign in link for anonymous users [software/bitu] - https://gerrit.wikimedia.org/r/1098881 (https://phabricator.wikimedia.org/T380998) (owner: Slyngshede)
[10:20:38] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es2045 (re)pooling @ 100%: Pooling in production', diff saved to https://phabricator.wikimedia.org/P71675 and previous config saved to /var/cache/conftool/dbconfig/20241210-102038-root.json
[10:21:40] <wikibugs>	 (CR) Muehlenhoff: [C:+1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/1101820 (https://phabricator.wikimedia.org/T381075) (owner: Slyngshede)
[10:22:23] <wikibugs>	 (CR) Slyngshede: [C:+2] P:idm add notification settings to test system [puppet] - https://gerrit.wikimedia.org/r/1101820 (https://phabricator.wikimedia.org/T381075) (owner: Slyngshede)
[10:23:39] <wikibugs>	 (Merged) jenkins-bot: Only show sign in link for anonymous users [software/bitu] - https://gerrit.wikimedia.org/r/1098881 (https://phabricator.wikimedia.org/T380998) (owner: Slyngshede)
[10:29:16] <wikibugs>	 (PS1) Slyngshede: P:idm fix setting configuration error. [puppet] - https://gerrit.wikimedia.org/r/1101823
[10:30:14] <wikibugs>	 (CR) Slyngshede: [C:+2] P:idm fix setting configuration error. [puppet] - https://gerrit.wikimedia.org/r/1101823 (owner: Slyngshede)
[10:38:08] <wikibugs>	 (CR) Mvolz: "Yeah, sorry for the review spam! PipelineBot automatically adds the reviewers from the merged change." [deployment-charts] - https://gerrit.wikimedia.org/r/1099658 (owner: PipelineBot)
[10:50:25] <wikibugs>	 (CR) JMeybohm: [C:+1] Rename kubernetes[1055-1058] to wikikube-worker[1080-1083] [puppet] - https://gerrit.wikimedia.org/r/1101818 (https://phabricator.wikimedia.org/T377876) (owner: Jelto)
[10:53:59] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.k8s.pool-depool-node depool for host kubernetes[1055-1058].eqiad.wmnet
[10:56:20] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host kubernetes[1055-1058].eqiad.wmnet
[11:00:00] <wikibugs>	 (CR) Jelto: [C:+2] Rename kubernetes[1055-1058] to wikikube-worker[1080-1083] [puppet] - https://gerrit.wikimedia.org/r/1101818 (https://phabricator.wikimedia.org/T377876) (owner: Jelto)
[11:00:05] <jouncebot>	 Deploy window MediaWiki infrastructure (UTC mid-day) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241210T1100)
[11:00:54] <wikibugs>	 (PS1) Wangombe: Event Logging: Update streamName and schemaId [extensions/Translate] (wmf/1.44.0-wmf.6) - https://gerrit.wikimedia.org/r/1101830 (https://phabricator.wikimedia.org/T364460)
[11:01:57] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.rename from kubernetes1055 to wikikube-worker1080
[11:02:18] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.dns.netbox
[11:02:56] <wikibugs>	 (PS1) Muehlenhoff: Remove obsolete reference to wikitech password changes [software/bitu] - https://gerrit.wikimedia.org/r/1101831
[11:03:21] <wikibugs>	 (PS4) Máté Szabó: Prep pilot wiki config for IRS [mediawiki-config] - https://gerrit.wikimedia.org/r/1099213 (https://phabricator.wikimedia.org/T374105)
[11:03:28] <wikibugs>	 (CR) Máté Szabó: Prep pilot wiki config for IRS [mediawiki-config] - https://gerrit.wikimedia.org/r/1099213 (https://phabricator.wikimedia.org/T374105) (owner: Máté Szabó)
[11:03:48] <wikibugs>	 (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Tuesday, December 10 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deplo" [extensions/Translate] (wmf/1.44.0-wmf.6) - https://gerrit.wikimedia.org/r/1101830 (https://phabricator.wikimedia.org/T364460) (owner: Wangombe)
[11:04:32] <icinga-wm>	 PROBLEM - BGP status on lsw1-f3-eqiad.mgmt is CRITICAL: BGP CRITICAL - AS64601/IPv6: Connect - kubernetes-eqiad, AS64601/IPv4: Active - kubernetes-eqiad, AS64601/IPv6: Connect - kubernetes-eqiad, AS64601/IPv4: Active - kubernetes-eqiad https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[11:04:33] <wikibugs>	 (PS1) Muehlenhoff: Polish password reset statement a little [software/bitu] - https://gerrit.wikimedia.org/r/1101832
[11:05:49] <claime>	 !log Deploying no-op cfssl-issuer admin_ng change - 1101455
[11:05:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:06:14] <logmsgbot>	 !log cgoubert@deploy2002 helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
[11:07:24] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming kubernetes1055 to wikikube-worker1080 - jelto@cumin1002"
[11:08:15] <logmsgbot>	 !log cgoubert@deploy2002 helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
[11:08:20] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming kubernetes1055 to wikikube-worker1080 - jelto@cumin1002"
[11:08:20] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[11:08:21] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1080
[11:08:25] <logmsgbot>	 !log cgoubert@deploy2002 helmfile [staging-codfw] START helmfile.d/admin 'apply'.
[11:08:31] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1080
[11:09:10] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from kubernetes1055 to wikikube-worker1080
[11:09:13] <logmsgbot>	 !log cgoubert@deploy2002 helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
[11:09:45] <logmsgbot>	 !log cgoubert@deploy2002 helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
[11:09:59] <logmsgbot>	 !log cgoubert@deploy2002 helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
[11:10:04] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.rename from kubernetes1056 to wikikube-worker1081
[11:10:25] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.dns.netbox
[11:10:33] <logmsgbot>	 !log cgoubert@deploy2002 helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
[11:10:52] <logmsgbot>	 !log cgoubert@deploy2002 helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
[11:11:02] <logmsgbot>	 !log cgoubert@deploy2002 helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
[11:11:14] <logmsgbot>	 !log cgoubert@deploy2002 helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
[11:11:51] <logmsgbot>	 !log cgoubert@deploy2002 helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
[11:12:10] <logmsgbot>	 !log cgoubert@deploy2002 helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
[11:12:27] <logmsgbot>	 !log cgoubert@deploy2002 helmfile [eqiad] START helmfile.d/admin 'apply'.
[11:12:40] <jinxer-wm>	 FIRING: [2x] KubernetesRsyslogDown: rsyslog on kubernetes1057:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[11:12:44] <logmsgbot>	 !log cgoubert@deploy2002 helmfile [eqiad] DONE helmfile.d/admin 'apply'.
[11:12:52] <logmsgbot>	 !log cgoubert@deploy2002 helmfile [codfw] START helmfile.d/admin 'apply'.
[11:13:02] <logmsgbot>	 !log cgoubert@deploy2002 helmfile [codfw] DONE helmfile.d/admin 'apply'.
[11:14:18] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming kubernetes1056 to wikikube-worker1081 - jelto@cumin1002"
[11:14:19] <claime>	 !log Done deploying no-op cfssl-issuer admin_ng change - 1101455
[11:14:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:14:34] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming kubernetes1056 to wikikube-worker1081 - jelto@cumin1002"
[11:14:34] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[11:14:34] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1081
[11:14:45] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1081
[11:15:24] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from kubernetes1056 to wikikube-worker1081
[11:15:50] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.rename from kubernetes1057 to wikikube-worker1082
[11:16:10] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.dns.netbox
[11:18:26] <wikibugs>	 (PS1) Physikerwelt: Add new properties for math popups [mediawiki-config] - https://gerrit.wikimedia.org/r/1101834 (https://phabricator.wikimedia.org/T381046)
[11:18:40] <wikibugs>	 (PS2) Physikerwelt: Add new properties for math popups [mediawiki-config] - https://gerrit.wikimedia.org/r/1101834 (https://phabricator.wikimedia.org/T381046)
[11:19:48] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming kubernetes1057 to wikikube-worker1082 - jelto@cumin1002"
[11:19:51] <wikibugs>	 (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Tuesday, December 10 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-" [mediawiki-config] - https://gerrit.wikimedia.org/r/1101834 (https://phabricator.wikimedia.org/T381046) (owner: Physikerwelt)
[11:20:11] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming kubernetes1057 to wikikube-worker1082 - jelto@cumin1002"
[11:20:11] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[11:20:12] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1082
[11:21:29] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1082
[11:22:01] <wikibugs>	 (PS1) Muehlenhoff: puppetdb: Use debian_postgresql_version [puppet] - https://gerrit.wikimedia.org/r/1101835
[11:22:08] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from kubernetes1057 to wikikube-worker1082
[11:23:40] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.rename from kubernetes1058 to wikikube-worker1083
[11:24:01] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.dns.netbox
[11:25:15] <wikibugs>	 (CR) Muehlenhoff: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1101835 (owner: Muehlenhoff)
[11:26:57] <wikibugs>	 (CR) Physikerwelt: "I scheduled this for tonight's deployment window. However, I understand that this can be merged at any time, as this only affects wikipedi" [mediawiki-config] - https://gerrit.wikimedia.org/r/1101834 (https://phabricator.wikimedia.org/T381046) (owner: Physikerwelt)
[11:27:29] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming kubernetes1058 to wikikube-worker1083 - jelto@cumin1002"
[11:27:48] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming kubernetes1058 to wikikube-worker1083 - jelto@cumin1002"
[11:27:48] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[11:27:49] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1083
[11:28:50] <wikibugs>	 (CR) Bartosz Dziewoński: "Do we need the `ENV:RW_PROTO` thing? Can it just be `https`?" [puppet] - https://gerrit.wikimedia.org/r/1101462 (https://phabricator.wikimedia.org/T381625) (owner: Gergő Tisza)
[11:28:56] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1083
[11:29:35] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from kubernetes1058 to wikikube-worker1083
[11:29:48] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.dns.wipe-cache wikikube-worker1080.eqiad.wmnet wikikube-worker1081.eqiad.wmnet wikikube-worker1082.eqiad.wmnet wikikube-worker1083.eqiad.wmnet on all recursors
[11:29:52] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1080.eqiad.wmnet wikikube-worker1081.eqiad.wmnet wikikube-worker1082.eqiad.wmnet wikikube-worker1083.eqiad.wmnet on all recursors
[11:32:41] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.reimage for host wikikube-worker1080.eqiad.wmnet with OS bookworm
[11:33:00] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.reimage for host wikikube-worker1081.eqiad.wmnet with OS bookworm
[11:33:19] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.reimage for host wikikube-worker1082.eqiad.wmnet with OS bookworm
[11:33:40] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.reimage for host wikikube-worker1083.eqiad.wmnet with OS bookworm
[11:35:10] <wikibugs>	 (CR) Slyngshede: [C:+1] "LGTM" [software/bitu] - https://gerrit.wikimedia.org/r/1101831 (owner: Muehlenhoff)
[11:35:12] <wikibugs>	 (CR) Slyngshede: [C:+2] Remove obsolete reference to wikitech password changes [software/bitu] - https://gerrit.wikimedia.org/r/1101831 (owner: Muehlenhoff)
[11:35:46] <wikibugs>	 (CR) Slyngshede: [C:+1] "LGTM" [software/bitu] - https://gerrit.wikimedia.org/r/1101832 (owner: Muehlenhoff)
[11:35:50] <wikibugs>	 (CR) Slyngshede: [C:+2] Polish password reset statement a little [software/bitu] - https://gerrit.wikimedia.org/r/1101832 (owner: Muehlenhoff)
[11:38:12] <wikibugs>	 ops-eqiad, SRE, cloud-services-team, DC-Ops, decommission-hardware: decommission cloudcephmon100[1-3].eqiad.wmnet - https://phabricator.wikimedia.org/T380893#10393186 (VRiley-WMF)
[11:41:08] <wikibugs>	 ops-eqiad, SRE, cloud-services-team, DC-Ops, decommission-hardware: decommission cloudcephmon100[1-3].eqiad.wmnet - https://phabricator.wikimedia.org/T380893#10393191 (VRiley-WMF) Hey @andrew  I was able to do the first unit, however, when I was running the script on the other 2 devices, it s...
[11:42:38] <wikibugs>	 (Merged) jenkins-bot: Remove obsolete reference to wikitech password changes [software/bitu] - https://gerrit.wikimedia.org/r/1101831 (owner: Muehlenhoff)
[11:42:38] <wikibugs>	 (Merged) jenkins-bot: Polish password reset statement a little [software/bitu] - https://gerrit.wikimedia.org/r/1101832 (owner: Muehlenhoff)
[11:45:12] <wikibugs>	 (PS1) PipelineBot: citoid: pipeline bot promote [deployment-charts] - https://gerrit.wikimedia.org/r/1101839
[11:46:43] <wikibugs>	 (PS2) Samtar: IS/IS-l: wgUseCodexSpecialBlock for beta, prod test.wiki [mediawiki-config] - https://gerrit.wikimedia.org/r/1101545 (https://phabricator.wikimedia.org/T377121)
[11:47:06] <TheresNoTime>	 jouncebot: nowandnext
[11:47:06] <jouncebot>	 For the next 0 hour(s) and 12 minute(s): MediaWiki infrastructure (UTC mid-day) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241210T1100)
[11:47:06] <jouncebot>	 In 1 hour(s) and 12 minute(s): Mobileapps/RESTBase/Wikifeeds (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241210T1300)
[11:49:25] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1082.eqiad.wmnet with reason: host reimage
[11:49:44] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1083.eqiad.wmnet with reason: host reimage
[11:50:23] <TheresNoTime>	 I intend to deploy https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/1101545 in a moment - any issues? :)
[11:51:44] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1080.eqiad.wmnet with reason: host reimage
[11:52:27] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Approved by samtar@deploy2002 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/1101545 (https://phabricator.wikimedia.org/T377121) (owner: Samtar)
[11:53:08] <wikibugs>	 (Merged) jenkins-bot: IS/IS-l: wgUseCodexSpecialBlock for beta, prod test.wiki [mediawiki-config] - https://gerrit.wikimedia.org/r/1101545 (https://phabricator.wikimedia.org/T377121) (owner: Samtar)
[11:53:12] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1082.eqiad.wmnet with reason: host reimage
[11:53:29] <logmsgbot>	 !log samtar@deploy2002 Started scap sync-world: Backport for [[gerrit:1101545|IS/IS-l: wgUseCodexSpecialBlock for beta, prod test.wiki (T377121)]]
[11:53:32] <stashbot>	 T377121: Deploy Codex Special:Block / Multiblocks - https://phabricator.wikimedia.org/T377121
[11:53:34] <wikibugs>	 (CR) Abijeet Patro: [C:+1] Event Logging: Update streamName and schemaId [extensions/Translate] (wmf/1.44.0-wmf.6) - https://gerrit.wikimedia.org/r/1101830 (https://phabricator.wikimedia.org/T364460) (owner: Wangombe)
[11:56:33] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1080.eqiad.wmnet with reason: host reimage
[11:56:34] <wikibugs>	 (PS8) Hnowlan: mediawiki: add multi-job support to mercurius [deployment-charts] - https://gerrit.wikimedia.org/r/1099752 (https://phabricator.wikimedia.org/T371701)
[11:58:32] <logmsgbot>	 !log samtar@deploy2002 samtar: Backport for [[gerrit:1101545|IS/IS-l: wgUseCodexSpecialBlock for beta, prod test.wiki (T377121)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[11:58:35] <stashbot>	 T377121: Deploy Codex Special:Block / Multiblocks - https://phabricator.wikimedia.org/T377121
[11:58:37] * TheresNoTime testing.. ^
[12:00:14] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1083.eqiad.wmnet with reason: host reimage
[12:00:26] <icinga-wm>	 RECOVERY - MariaDB Replica SQL: s2 on dbstore1007 is OK: OK slave_sql_state Slave_SQL_Running: Yes https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[12:02:31] <logmsgbot>	 !log samtar@deploy2002 samtar: Continuing with sync
[12:07:35] <logmsgbot>	 !log samtar@deploy2002 Finished scap sync-world: Backport for [[gerrit:1101545|IS/IS-l: wgUseCodexSpecialBlock for beta, prod test.wiki (T377121)]] (duration: 14m 06s)
[12:07:39] <stashbot>	 T377121: Deploy Codex Special:Block / Multiblocks - https://phabricator.wikimedia.org/T377121
[12:07:53] <wikibugs>	 (PS1) Phuedx: Beta Cluster: Enable MetricsPlatform extension on all wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/1101840 (https://phabricator.wikimedia.org/T381849)
[12:10:59] <wikibugs>	 ops-eqiad, SRE, Data-Platform, DC-Ops: Q2:rack/setup/install an-worker11[78-86] - https://phabricator.wikimedia.org/T377878#10393376 (VRiley-WMF) Making a note that we have very little room for 10 gig servers in rows A-D. However, we have more room in E and F. As long as they are not in the same...
[12:11:44] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1082.eqiad.wmnet with OS bookworm
[12:13:56] <wikibugs>	 (03CR) 10Gmodena: dse-k8s-services: rename mw-dumps helmfiles. (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/1100420 (https://phabricator.wikimedia.org/T381322) (owner: 10Gmodena)
[12:15:59] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1080.eqiad.wmnet with OS bookworm
[12:19:09] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1083.eqiad.wmnet with OS bookworm
[12:36:08] <icinga-wm>	 PROBLEM - Host ripe-atlas-eqsin is DOWN: PING CRITICAL - Packet loss = 100%
[12:39:18] <icinga-wm>	 PROBLEM - Host ripe-atlas-eqsin IPv6 is DOWN: PING CRITICAL - Packet loss = 100%
[12:40:54] <wikibugs>	 06SRE, 06Infrastructure-Foundations, 10netops: Manage VRRP priority from Netbox - https://phabricator.wikimedia.org/T381873 (10cmooney) 03NEW p:05Triage→03Low
[12:51:52] <wikibugs>	 (03PS1) 10Gmodena: data-engineering: add alerts for dumps2 flink app. [alerts] - 10https://gerrit.wikimedia.org/r/1101849 (https://phabricator.wikimedia.org/T379362)
[12:52:43] <jinxer-wm>	 FIRING: [2x] IPv4AnchorUnreachable: ipv4 ping to eqsin RIPE Atlas anchor: failures over threshold - https://wikitech.wikimedia.org/wiki/Network_monitoring#Atlas_alerts - https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DIPv4AnchorUnreachable
[12:53:05] <logmsgbot>	 !log jelto@cumin1002 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker1081.eqiad.wmnet with OS bookworm
[12:53:07] <wikibugs>	 (03CR) 10CI reject: [V:04-1] data-engineering: add alerts for dumps2 flink app. [alerts] - 10https://gerrit.wikimedia.org/r/1101849 (https://phabricator.wikimedia.org/T379362) (owner: 10Gmodena)
[12:53:43] <jinxer-wm>	 FIRING: [2x] IPv6AnchorUnreachable: ipv6 ping to eqsin RIPE Atlas anchor: failures over threshold - https://wikitech.wikimedia.org/wiki/Network_monitoring#Atlas_alerts - https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DIPv6AnchorUnreachable
[12:54:12] <wikibugs>	 (03CR) 10Gmodena: data-engineering: add alerts for dumps2 flink app. (031 comment) [alerts] - 10https://gerrit.wikimedia.org/r/1101849 (https://phabricator.wikimedia.org/T379362) (owner: 10Gmodena)
[12:55:40] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.reimage for host wikikube-worker1081.eqiad.wmnet with OS bookworm
[12:57:09] <jinxer-wm>	 FIRING: [2x] ProbeDown: Service ml-staging-ctrl2001:6443 has failed probes (http_ml_staging_codfw_kube_apiserver_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#ml-staging-ctrl2001:6443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[12:58:56] <wikibugs>	 (03PS1) 10Slyngshede: Show expiry date for password reset link [software/bitu] - 10https://gerrit.wikimedia.org/r/1101851
[12:58:59] <wikibugs>	 (03CR) 10Muehlenhoff: [C:03+2] maps::postgresql_common: Avoid Ferm-specific syntax [puppet] - 10https://gerrit.wikimedia.org/r/1101461 (owner: 10Muehlenhoff)
[12:59:14] <jinxer-wm>	 RESOLVED: [2x] ProbeDown: Service ml-staging-ctrl2001:6443 has failed probes (http_ml_staging_codfw_kube_apiserver_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#ml-staging-ctrl2001:6443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[13:00:05] <jouncebot>	 Deploy window Mobileapps/RESTBase/Wikifeeds (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241210T1300)
[13:00:12] <logmsgbot>	 !log klausman@cumin1002 START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on ml-lab1002.eqiad.wmnet with reason: Moving to analytics network
[13:00:12] <wikibugs>	 (03PS2) 10Slyngshede: Show expiry date for password reset link [software/bitu] - 10https://gerrit.wikimedia.org/r/1101851
[13:00:26] <logmsgbot>	 !log klausman@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on ml-lab1002.eqiad.wmnet with reason: Moving to analytics network
[13:01:52] <logmsgbot>	 !log klausman@cumin1002 START - Cookbook sre.hosts.decommission for hosts ml-lab1002.eqiad.wmnet
[13:03:43] <jinxer-wm>	 RESOLVED: [2x] IPv6AnchorUnreachable: ipv6 ping to eqsin RIPE Atlas anchor: failures over threshold - https://wikitech.wikimedia.org/wiki/Network_monitoring#Atlas_alerts - https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DIPv6AnchorUnreachable
[13:07:43] <jinxer-wm>	 RESOLVED: [2x] IPv4AnchorUnreachable: ipv4 ping to eqsin RIPE Atlas anchor: failures over threshold - https://wikitech.wikimedia.org/wiki/Network_monitoring#Atlas_alerts - https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DIPv4AnchorUnreachable
[13:10:21] <wikibugs>	 (03CR) 10Michael Große: [C:03+1] "Currently, `WMF_MAINTENANCE_OFFLINE` is set in `mwscript.py`. I don't understand this part well enough to say whether that is also execute" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1100864 (https://phabricator.wikimedia.org/T380609) (owner: 10Cwhite)
[13:13:55] <wikibugs>	 (03CR) 10Muehlenhoff: [C:03+1] "Looks good" [software/bitu] - 10https://gerrit.wikimedia.org/r/1101851 (owner: 10Slyngshede)
[13:18:31] <wikibugs>	 (03CR) 10Slyngshede: [C:03+2] Show expiry date for password reset link [software/bitu] - 10https://gerrit.wikimedia.org/r/1101851 (owner: 10Slyngshede)
[13:19:43] <jinxer-wm>	 FIRING: [2x] IPv4AnchorUnreachable: ipv4 ping to eqsin RIPE Atlas anchor: failures over threshold - https://wikitech.wikimedia.org/wiki/Network_monitoring#Atlas_alerts - https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DIPv4AnchorUnreachable
[13:19:44] <jinxer-wm>	 FIRING: [2x] IPv6AnchorUnreachable: ipv6 ping to eqsin RIPE Atlas anchor: failures over threshold - https://wikitech.wikimedia.org/wiki/Network_monitoring#Atlas_alerts - https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DIPv6AnchorUnreachable
[13:22:45] <wikibugs>	 (03PS1) 10Muehlenhoff: Clarify access request text [software/bitu] - 10https://gerrit.wikimedia.org/r/1101857
[13:24:53] <wikibugs>	 (03Merged) 10jenkins-bot: Show expiry date for password reset link [software/bitu] - 10https://gerrit.wikimedia.org/r/1101851 (owner: 10Slyngshede)
[13:26:00] <icinga-wm>	 PROBLEM - Host poolcounter2005 is DOWN: PING CRITICAL - Packet loss = 100%
[13:26:08] <icinga-wm>	 PROBLEM - Host irc2003 is DOWN: PING CRITICAL - Packet loss = 100%
[13:26:22] <icinga-wm>	 PROBLEM - Host config-master2001 is DOWN: PING CRITICAL - Packet loss = 100%
[13:26:26] <icinga-wm>	 PROBLEM - Host crm2001 is DOWN: PING CRITICAL - Packet loss = 100%
[13:26:52] <icinga-wm>	 PROBLEM - ganeti-noded running on ganeti2027 is CRITICAL: PROCS CRITICAL: 3 processes with UID = 0 (root), command name ganeti-noded https://wikitech.wikimedia.org/wiki/Ganeti
[13:26:57] <jinxer-wm>	 FIRING: ProbeDown: Service thumbor:8800 has failed probes (http_thumbor_ip4) #page - https://wikitech.wikimedia.org/wiki/Runbook#thumbor:8800 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[13:27:09] <jinxer-wm>	 FIRING: [2x] ProbeDown: Service irc2003:6667 has failed probes (tcp_ircstream_ip4) - https://wikitech.wikimedia.org/wiki/TLS/Runbook#irc2003:6667 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[13:27:24] <godog>	 checking
[13:27:36] <moritzm>	 I'm powercycling ganeti2027
[13:28:35] <godog>	 thx moritzm, thumbor paged and I'm not sure that's related?
[13:29:05] <claime>	 Bet thumbor has failed probes because poolcounter2005 is down
[13:29:07] <moritzm>	 ganeti2027 is one of the ganeti nodes still running Bullseye and these sometimes hit a DRBD bug T348730
[13:29:08] <stashbot>	 T348730: repeated Ganeti VMs deadlocks due to DRBD bug on bullseye - https://phabricator.wikimedia.org/T348730
[13:29:14] <fabfur>	 !incidents
[13:29:15] <sirenbot>	 5533 (ACKED)  ProbeDown sre (10.2.1.24 ip4 thumbor:8800 probes/service http_thumbor_ip4 codfw)
[13:29:15] <sirenbot>	 5530 (RESOLVED)  ATSBackendErrorsHigh cache_upload sre (swift.discovery.wmnet esams)
[13:29:19] <moritzm>	 yeah, it hardcodes poolcounter2005
[13:29:53] <godog>	 hah
[13:29:55] <moritzm>	 ganeti2027 is rebooting and in POST stage, should be back in a minute
[13:30:04] <icinga-wm>	 PROBLEM - Host ganeti2027 is DOWN: PING CRITICAL - Packet loss = 100%
[13:30:42] <jinxer-wm>	 FIRING: [2x] JobUnavailable: Reduced availability for job ircstream in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[13:31:52] <icinga-wm>	 RECOVERY - Host ganeti2027 is UP: PING OK - Packet loss = 0%, RTA = 30.35 ms
[13:31:54] <icinga-wm>	 RECOVERY - ganeti-noded running on ganeti2027 is OK: PROCS OK: 1 process with UID = 0 (root), command name ganeti-noded https://wikitech.wikimedia.org/wiki/Ganeti
[13:32:09] <jinxer-wm>	 FIRING: [3x] ProbeDown: Service ganeti2027:1811 has failed probes (tcp_ganeti_noded_ip4)  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[13:32:44] <wikibugs>	 (03PS1) 10Cathal Mooney: Update JunOS templates to use VRRP priority exposed from Netbox [homer/public] - 10https://gerrit.wikimedia.org/r/1101861 (https://phabricator.wikimedia.org/T381873)
[13:33:58] <wikibugs>	 (03PS2) 10PipelineBot: mobileapps: pipeline bot promote [deployment-charts] - 10https://gerrit.wikimedia.org/r/1101092
[13:34:14] <jinxer-wm>	 FIRING: [3x] ProbeDown: Service ganeti2027:1811 has failed probes (tcp_ganeti_noded_ip4)  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[13:35:26] <icinga-wm>	 RECOVERY - Host config-master2001 is UP: PING OK - Packet loss = 0%, RTA = 30.40 ms
[13:35:40] <icinga-wm>	 RECOVERY - Host irc2003 is UP: PING OK - Packet loss = 0%, RTA = 30.60 ms
[13:35:40] <icinga-wm>	 RECOVERY - Host poolcounter2005 is UP: PING OK - Packet loss = 0%, RTA = 30.34 ms
[13:35:56] <icinga-wm>	 RECOVERY - Host crm2001 is UP: PING OK - Packet loss = 0%, RTA = 30.58 ms
[13:36:26] <wikibugs>	 06SRE, 06Infrastructure-Foundations: repeated Ganeti VMs deadlocks due to DRBD bug on bullseye - https://phabricator.wikimedia.org/T348730#10393651 (10MoritzMuehlenhoff) Happened again on ganeti2027 today.
[13:36:47] <wikibugs>	 (03CR) 10Dbrant: [C:03+2] mobileapps: pipeline bot promote [deployment-charts] - 10https://gerrit.wikimedia.org/r/1101092 (owner: 10PipelineBot)
[13:36:57] <moritzm>	 we're down to like 50% bullseye nodes, the remaining ones will be reimaged in January
[13:36:57] <jinxer-wm>	 RESOLVED: ProbeDown: Service thumbor:8800 has failed probes (http_thumbor_ip4) #page - https://wikitech.wikimedia.org/wiki/Runbook#thumbor:8800 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[13:37:09] <jinxer-wm>	 RESOLVED: [4x] ProbeDown: Service ganeti2027:1811 has failed probes (tcp_ganeti_noded_ip4)   - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[13:37:18] <godog>	 ack thx
[13:38:09] <wikibugs>	 (03Merged) 10jenkins-bot: mobileapps: pipeline bot promote [deployment-charts] - 10https://gerrit.wikimedia.org/r/1101092 (owner: 10PipelineBot)
[13:40:23] <wikibugs>	 (03PS1) 10Muehlenhoff: Configure new maps nodes with nftables [puppet] - 10https://gerrit.wikimedia.org/r/1101864 (https://phabricator.wikimedia.org/T381565)
[13:40:42] <jinxer-wm>	 RESOLVED: [2x] JobUnavailable: Reduced availability for job ircstream in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[13:50:05] <wikibugs>	 (03PS1) 10أنون: [enwikinews] & [plwikinews]: Upgrade license to CC BY 4.0 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1101867 (https://phabricator.wikimedia.org/T381421)
[13:54:14] <wikibugs>	 (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Tuesday, December 10 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deplo" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1101867 (https://phabricator.wikimedia.org/T381421) (owner: 10أنون)
[13:56:14] <wikibugs>	 (03CR) 10أنون: [C:03+1] [enwikinews] & [plwikinews]: Upgrade license to CC BY 4.0 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1101867 (https://phabricator.wikimedia.org/T381421) (owner: 10أنون)
[14:00:05] <jouncebot>	 Lucas_WMDE, Urbanecm, and TheresNoTime: I, the Bot under the Fountain, call upon thee, The Deployer, to do UTC afternoon backport window deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241210T1400).
[14:00:05] <jouncebot>	 Daimona, wangombe_g, and lolekek: A patch you scheduled for UTC afternoon backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[14:00:33] <Daimona>	 Hello there
[14:00:39] <lolekek>	 o/
[14:00:45] <wangombe_g>	 o/
[14:02:11] <Lucas_WMDE>	 I’m slightly busy right now but can probably deploy soon
[14:02:59] <Daimona>	 Mine is a beta change BTW, so fire and forget
[14:03:16] <wikibugs>	 10ops-eqiad, 06DC-Ops, 10Prod-Kubernetes, 06serviceops, 07Kubernetes: Comm Error: backplane 0 when reimaging wikikube-worker1081 - https://phabricator.wikimedia.org/T381878 (10Jelto) 03NEW
[14:03:55] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by lucaswerkmeister-wmde@deploy2002 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1101596 (https://phabricator.wikimedia.org/T380077) (owner: 10Daimona Eaytoy)
[14:03:59] <Lucas_WMDE>	 yeah, let’s start with that one
[14:04:29] <wikibugs>	 10ops-eqiad, 06DC-Ops, 10Prod-Kubernetes, 06serviceops, 07Kubernetes: Comm Error: backplane 0 when reimaging wikikube-worker1081 - https://phabricator.wikimedia.org/T381878#10393729 (10Jelto) The following commands have to be executed when the host is back (just noting it down so I don't forget it):  ` c...
[14:04:38] <wikibugs>	 (03Merged) 10jenkins-bot: beta: Enable $wgCampaignEventsEnableEventWikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1101596 (https://phabricator.wikimedia.org/T380077) (owner: 10Daimona Eaytoy)
[14:05:44] <wikibugs>	 (03PS1) 10Samtar: Revert "IS/IS-l: wgUseCodexSpecialBlock for beta, prod test.wiki" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1101871
[14:06:09] <wikibugs>	 (03CR) 10Slyngshede: [C:03+1] "Looks good" [software/bitu] - 10https://gerrit.wikimedia.org/r/1101857 (owner: 10Muehlenhoff)
[14:06:11] <wikibugs>	 (03CR) 10Slyngshede: [C:03+2] Clarify access request text [software/bitu] - 10https://gerrit.wikimedia.org/r/1101857 (owner: 10Muehlenhoff)
[14:09:08] <wikibugs>	 (03CR) 10Elukey: [C:03+1] "LGTM! I have limited understanding of this plugin but the logic looks good! And it is impressive how fast it runs, we'll surely benefit of" [software/homer/deploy] - 10https://gerrit.wikimedia.org/r/1099225 (https://phabricator.wikimedia.org/T381175) (owner: 10Cathal Mooney)
[14:09:29] <jinxer-wm>	 FIRING: [3x] SystemdUnitFailed: mediawiki_job_translationnotifications-mediawikiwiki.service on mwmaint2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[14:11:14] <wikibugs>	 (03Merged) 10jenkins-bot: Clarify access request text [software/bitu] - 10https://gerrit.wikimedia.org/r/1101857 (owner: 10Muehlenhoff)
[14:14:12] <Daimona>	 Thanks, Lucas!
[14:14:51] <wikibugs>	 (03PS1) 10DDesouza: Reader Survey: Increase coverage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1101875 (https://phabricator.wikimedia.org/T378660)
[14:15:27] <wikibugs>	 (03CR) 10CI reject: [V:04-1] Reader Survey: Increase coverage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1101875 (https://phabricator.wikimedia.org/T378660) (owner: 10DDesouza)
[14:15:54] <logmsgbot>	 !log jelto@cumin1002 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker1081.eqiad.wmnet with OS bookworm
[14:16:48] <wikibugs>	 (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Tuesday, December 10 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1101875 (https://phabricator.wikimedia.org/T378660) (owner: 10DDesouza)
[14:17:53] <jelto>	 !log homer 'lsw1-f3-eqiad*' commit 'T377876' , homer 'cr*eqiad*' commit 'T377876'
[14:17:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:17:58] <stashbot>	 T377876: Migrate wikikube-eqiad to containerd - https://phabricator.wikimedia.org/T377876
[14:18:00] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 4:00:00 on db1169.eqiad.wmnet with reason: Maintenance
[14:18:14] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1169.eqiad.wmnet with reason: Maintenance
[14:18:21] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depooling db1169 (T381532)', diff saved to https://phabricator.wikimedia.org/P71678 and previous config saved to /var/cache/conftool/dbconfig/20241210-141820-marostegui.json
[14:18:24] <stashbot>	 T381532: Fix AntiSpoof database schema drifts in production - https://phabricator.wikimedia.org/T381532
[14:18:45] <wikibugs>	 (03CR) 10Stevemunene: [C:03+1] "lgtm!" [puppet] - 10https://gerrit.wikimedia.org/r/1101588 (https://phabricator.wikimedia.org/T376150) (owner: 10Bking)
[14:19:02] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by lucaswerkmeister-wmde@deploy2002 using scap backport" [extensions/Translate] (wmf/1.44.0-wmf.6) - 10https://gerrit.wikimedia.org/r/1101830 (https://phabricator.wikimedia.org/T364460) (owner: 10Wangombe)
[14:20:58] <wikibugs>	 (03CR) 10Ottomata: dse-k8s-services: rename mw-dumps helmfiles. (032 comments) [deployment-charts] - 10https://gerrit.wikimedia.org/r/1100420 (https://phabricator.wikimedia.org/T381322) (owner: 10Gmodena)
[14:21:05] <wikibugs>	 (03PS2) 10Samtar: Revert "IS/IS-l: wgUseCodexSpecialBlock for beta, prod test.wiki" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1101871
[14:23:06] <wikibugs>	 (03PS1) 10Elukey: Add postgresql maps replica config to k8s' external services [puppet] - 10https://gerrit.wikimedia.org/r/1101876 (https://phabricator.wikimedia.org/T216826)
[14:24:27] <Lucas_WMDE>	 TheresNoTime: zuul says 14mins left for the current backport, if you want to roll out your config change before then…
[14:24:36] <wikibugs>	 10ops-eqiad, 06DC-Ops: PDU sensor over limit - https://phabricator.wikimedia.org/T381881 (10phaultfinder) 03NEW
[14:25:15] <TheresNoTime>	 Lucas_WMDE: likely will, just double-checking I should do it :D
[14:26:07] <wikibugs>	 (03CR) 10Elukey: [V:03+1] "PCC SUCCESS (CORE_DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/4655/co" [puppet] - 10https://gerrit.wikimedia.org/r/1101876 (https://phabricator.wikimedia.org/T216826) (owner: 10Elukey)
[14:26:38] <Lucas_WMDE>	 ok ^^
[14:27:43] <TheresNoTime>	 Lucas_WMDE: going to do it now
[14:27:52] <Lucas_WMDE>	 alright, I’ll Ctrl+C my scap backport
[14:27:57] <Lucas_WMDE>	 (but let gate-and-submit run through)
[14:28:03] <Lucas_WMDE>	 done
[14:28:05] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by samtar@deploy2002 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1101871 (owner: 10Samtar)
[14:28:07] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 4:00:00 on db1169.eqiad.wmnet with reason: Maintenance
[14:28:10] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1169.eqiad.wmnet with reason: Maintenance
[14:28:49] <wikibugs>	 (03Merged) 10jenkins-bot: Revert "IS/IS-l: wgUseCodexSpecialBlock for beta, prod test.wiki" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1101871 (owner: 10Samtar)
[14:29:00] <lolekek>	 I would like to cancel my patch for now
[14:29:06] <logmsgbot>	 !log samtar@deploy2002 Started scap sync-world: Backport for [[gerrit:1101871|Revert "IS/IS-l: wgUseCodexSpecialBlock for beta, prod test.wiki"]]
[14:29:07] <TheresNoTime>	 !log revert 1101545 for T377121
[14:29:07] <lolekek>	 Apologies for the inconvenience
[14:29:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:29:12] <stashbot>	 T377121: Deploy Codex Special:Block / Multiblocks - https://phabricator.wikimedia.org/T377121
[14:29:39] <Lucas_WMDE>	 lolekek: no problem – you can just remove it from the deployment calendar
[14:29:53] <Lucas_WMDE>	 (or I can do it if you prefer)
[14:31:06] <logmsgbot>	 !log klausman@cumin1002 START - Cookbook sre.dns.netbox
[14:31:51] <wikibugs>	 (03CR) 10Bking: [C:03+2] wdqs1025: enable as wdqs-internal-main host [puppet] - 10https://gerrit.wikimedia.org/r/1101588 (https://phabricator.wikimedia.org/T376150) (owner: 10Bking)
[14:32:20] <logmsgbot>	 !log samtar@deploy2002 samtar: Backport for [[gerrit:1101871|Revert "IS/IS-l: wgUseCodexSpecialBlock for beta, prod test.wiki"]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[14:32:34] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[1080,1082-1083].eqiad.wmnet
[14:32:36] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[1080,1082-1083].eqiad.wmnet
[14:32:36] * TheresNoTime testing ^
[14:32:42] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 4:00:00 on db1169.eqiad.wmnet with reason: Maintenance
[14:32:45] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1169.eqiad.wmnet with reason: Maintenance
[14:32:58] <logmsgbot>	 !log samtar@deploy2002 samtar: Continuing with sync
[14:33:25] <wikibugs>	 10ops-eqiad, 06SRE, 06collaboration-services, 06DC-Ops, and 3 others: Relabel eqiad kubernetes nodes - https://phabricator.wikimedia.org/T381504#10393828 (10Jelto)
[14:33:55] <wikibugs>	 (03CR) 10Gergő Tisza: "Everything else is using the env variable. Not sure if that's intentional or just leftover from the time when we were mixed-protocol. Mayb" [puppet] - 10https://gerrit.wikimedia.org/r/1101462 (https://phabricator.wikimedia.org/T381625) (owner: 10Gergő Tisza)
[14:35:26] <wikibugs>	 (03CR) 10Brouberol: Add postgresql maps replica config to k8s' external services (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1101876 (https://phabricator.wikimedia.org/T216826) (owner: 10Elukey)
[14:36:42] <jinxer-wm>	 FIRING: JobUnavailable: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[14:38:35] <wikibugs>	 (03Merged) 10jenkins-bot: Event Logging: Update streamName and schemaId [extensions/Translate] (wmf/1.44.0-wmf.6) - 10https://gerrit.wikimedia.org/r/1101830 (https://phabricator.wikimedia.org/T364460) (owner: 10Wangombe)
[14:39:22] <logmsgbot>	 !log samtar@deploy2002 Finished scap sync-world: Backport for [[gerrit:1101871|Revert "IS/IS-l: wgUseCodexSpecialBlock for beta, prod test.wiki"]] (duration: 10m 15s)
[14:39:37] <TheresNoTime>	 Lucas_WMDE: all yours
[14:39:47] <Lucas_WMDE>	 thanks!
[14:40:00] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy2002 Started scap sync-world: Backport for [[gerrit:1101830|Event Logging: Update streamName and schemaId (T364460)]]
[14:40:03] <stashbot>	 T364460: Implement the instrumentation to track usage of MinT in the Translate extension - https://phabricator.wikimedia.org/T364460
[14:40:22] <wikibugs>	 (03CR) 10Ottomata: dse-k8s-services: rename mw-dumps helmfiles. (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/1100420 (https://phabricator.wikimedia.org/T381322) (owner: 10Gmodena)
[14:42:24] <wikibugs>	 (03CR) 10Elukey: [V:03+1] Add postgresql maps replica config to k8s' external services (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1101876 (https://phabricator.wikimedia.org/T216826) (owner: 10Elukey)
[14:42:32] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 4:00:00 on db1169.eqiad.wmnet with reason: Maintenance
[14:42:35] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1169.eqiad.wmnet with reason: Maintenance
[14:44:30] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy2002 lucaswerkmeister-wmde, wangombe: Backport for [[gerrit:1101830|Event Logging: Update streamName and schemaId (T364460)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[14:44:49] <Lucas_WMDE>	 wangombe_g: please test on mwdebug :)
[14:46:21] <wikibugs>	 (03CR) 10Brouberol: Add postgresql maps replica config to k8s' external services (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1101876 (https://phabricator.wikimedia.org/T216826) (owner: 10Elukey)
[14:46:26] <logmsgbot>	 !log klausman@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ml-lab1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - klausman@cumin1002"
[14:47:37] <logmsgbot>	 !log klausman@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ml-lab1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - klausman@cumin1002"
[14:47:38] <logmsgbot>	 !log klausman@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[14:47:39] <logmsgbot>	 !log klausman@cumin1002 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ml-lab1002.eqiad.wmnet
[14:49:34] <wikibugs>	 (03CR) 10Brouberol: dse-k8s-services: rename mw-dumps helmfiles. (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/1100420 (https://phabricator.wikimedia.org/T381322) (owner: 10Gmodena)
[14:49:58] <Lucas_WMDE>	 wangombe_g: are you still there?
[14:49:59] <wikibugs>	 (03CR) 10JHathaway: [C:03+1] puppetdb: Use debian_postgresql_version [puppet] - 10https://gerrit.wikimedia.org/r/1101835 (owner: 10Muehlenhoff)
[14:51:30] <wikibugs>	 (03PS2) 10Elukey: Add postgresql maps replica config to k8s' external services [puppet] - 10https://gerrit.wikimedia.org/r/1101876 (https://phabricator.wikimedia.org/T216826)
[14:51:59] <Lucas_WMDE>	 lolekek: I’ve removed it now https://wikitech.wikimedia.org/w/index.php?title=Deployments&diff=prev&oldid=2253259
[14:53:52] <wikibugs>	 (03CR) 10Elukey: [V:03+1] "PCC SUCCESS (CORE_DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/4656/co" [puppet] - 10https://gerrit.wikimedia.org/r/1101876 (https://phabricator.wikimedia.org/T216826) (owner: 10Elukey)
[14:53:56] <wangombe_g>	 testing...
[14:54:16] <jinxer-wm>	 FIRING: MediaWikiLatencyExceeded: p75 latency high: codfw mw-parsoid (k8s) 820.3ms - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=codfw%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[14:54:37] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops: PDU sensor over limit - https://phabricator.wikimedia.org/T381881#10393910 (10phaultfinder)
[14:55:55] <wikibugs>	 (03CR) 10Elukey: [V:03+1] Add postgresql maps replica config to k8s' external services (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1101876 (https://phabricator.wikimedia.org/T216826) (owner: 10Elukey)
[14:59:16] <jinxer-wm>	 RESOLVED: MediaWikiLatencyExceeded: p75 latency high: codfw mw-parsoid (k8s) 812.9ms - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=codfw%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[15:00:02] <wangombe_g>	 My patch looks good.
[15:00:06] <wikibugs>	 (03CR) 10JMeybohm: [C:03+1] "Don't forget to add the service/users to `deployment_server.yaml`" [deployment-charts] - 10https://gerrit.wikimedia.org/r/1101487 (https://phabricator.wikimedia.org/T216826) (owner: 10Elukey)
[15:00:15] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy2002 lucaswerkmeister-wmde, wangombe: Continuing with sync
[15:00:21] <Lucas_WMDE>	 alright, thanks!
[15:01:42] <jinxer-wm>	 RESOLVED: JobUnavailable: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[15:02:58] <wikibugs>	 (03CR) 10Kamila Součková: [C:03+1] mediawiki: add multi-job support to mercurius [deployment-charts] - 10https://gerrit.wikimedia.org/r/1099752 (https://phabricator.wikimedia.org/T371701) (owner: 10Hnowlan)
[15:03:00] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db1169 (re)pooling @ 10%: Pooling in production', diff saved to https://phabricator.wikimedia.org/P71681 and previous config saved to /var/cache/conftool/dbconfig/20241210-150300-root.json
[15:05:09] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops: Move kafka-main2010 within the same rack - https://phabricator.wikimedia.org/T381788#10393948 (10Jhancock.wm) p:05Medium→03Low
[15:05:40] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy2002 Finished scap sync-world: Backport for [[gerrit:1101830|Event Logging: Update streamName and schemaId (T364460)]] (duration: 25m 40s)
[15:05:44] <stashbot>	 T364460: Implement the instrumentation to track usage of MinT in the Translate extension - https://phabricator.wikimedia.org/T364460
[15:06:22] <Lucas_WMDE>	 !log UTC afternoon backport+config window done
[15:06:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:08:25] <jinxer-wm>	 FIRING: [2x] SystemdUnitFailed: prometheus-blazegraph-exporter-wdqs-blazegraph.service on wdqs1025:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[15:11:00] <icinga-wm>	 RECOVERY - MariaDB Replica Lag: s2 on dbstore1007 is OK: OK slave_sql_lag Replication lag: 0.00 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[15:11:21] <wikibugs>	 (03CR) 10Elukey: "already done thanks! <3" [deployment-charts] - 10https://gerrit.wikimedia.org/r/1101487 (https://phabricator.wikimedia.org/T216826) (owner: 10Elukey)
[15:11:36] <wikibugs>	 (03PS1) 10Hnowlan: kubernetes: add mw-videoscaler to scap deployments [puppet] - 10https://gerrit.wikimedia.org/r/1101887 (https://phabricator.wikimedia.org/T371700)
[15:11:55] <wikibugs>	 (03PS1) 10Bking: wdqs1025: remove unneeded host hieradata [puppet] - 10https://gerrit.wikimedia.org/r/1101888 (https://phabricator.wikimedia.org/T376150)
[15:12:11] <wikibugs>	 (03PS1) 10Hnowlan: mediawiki: get mercurius label from mediawiki image version [deployment-charts] - 10https://gerrit.wikimedia.org/r/1101889 (https://phabricator.wikimedia.org/T371700)
[15:12:38] <wikibugs>	 (03PS2) 10Hnowlan: kubernetes: add mw-videoscaler to scap deployments [puppet] - 10https://gerrit.wikimedia.org/r/1101887 (https://phabricator.wikimedia.org/T371700)
[15:13:19] <wikibugs>	 (03CR) 10CI reject: [V:04-1] kubernetes: add mw-videoscaler to scap deployments [puppet] - 10https://gerrit.wikimedia.org/r/1101887 (https://phabricator.wikimedia.org/T371700) (owner: 10Hnowlan)
[15:13:53] <wikibugs>	 (03PS4) 10Muehlenhoff: Add ferm macro/nftables set for loadbalancer nodes [puppet] - 10https://gerrit.wikimedia.org/r/1098936
[15:13:56] <wikibugs>	 (03PS3) 10Hnowlan: kubernetes: add mw-videoscaler to scap deployments [puppet] - 10https://gerrit.wikimedia.org/r/1101887 (https://phabricator.wikimedia.org/T371700)
[15:14:26] <wikibugs>	 (03PS3) 10Elukey: Add postgresql maps replica config to k8s' external services [puppet] - 10https://gerrit.wikimedia.org/r/1101876 (https://phabricator.wikimedia.org/T216826)
[15:15:04] <logmsgbot>	 !log bking@cumin2002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on wdqs1025.eqiad.wmnet with reason: T376150
[15:15:07] <stashbot>	 T376150: Prepare hosts to serve wdqs-internal-main & wdqs-internal-scholarly - https://phabricator.wikimedia.org/T376150
[15:15:19] <logmsgbot>	 !log bking@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on wdqs1025.eqiad.wmnet with reason: T376150
[15:16:50] <wikibugs>	 (03CR) 10Elukey: [V:03+1] "PCC SUCCESS (CORE_DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/4657/co" [puppet] - 10https://gerrit.wikimedia.org/r/1101876 (https://phabricator.wikimedia.org/T216826) (owner: 10Elukey)
[15:17:47] <wikibugs>	 (03CR) 10Elukey: [V:03+1] Add postgresql maps replica config to k8s' external services (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1101876 (https://phabricator.wikimedia.org/T216826) (owner: 10Elukey)
[15:18:06] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db1169 (re)pooling @ 25%: Pooling in production', diff saved to https://phabricator.wikimedia.org/P71682 and previous config saved to /var/cache/conftool/dbconfig/20241210-151805-root.json
[15:23:49] <wikibugs>	 (03CR) 10Muehlenhoff: [C:03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/1101876 (https://phabricator.wikimedia.org/T216826) (owner: 10Elukey)
[15:25:26] <wikibugs>	 (03CR) 10JMeybohm: services: add helmfile config for Kartotherian (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/1101488 (https://phabricator.wikimedia.org/T216826) (owner: 10Elukey)
[15:28:09] <wikibugs>	 10ops-eqiad, 06SRE, 06collaboration-services, 06DC-Ops, and 3 others: Relabel eqiad kubernetes nodes - https://phabricator.wikimedia.org/T381504#10394045 (10Jhancock.wm) @Jelto heads up, these are showing up in a netbox report.  >Device is Active in Netbox but is missing from PuppetDB (should be ('decommis...
[15:30:06] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops, 10decommission-hardware: decommission ganeti1009 / ganeti1016 / ganeti1017 / ganeti1018 / ganeti1020 - https://phabricator.wikimedia.org/T381652#10394062 (10Jhancock.wm) @MoritzMuehlenhoff heads up, ganeti1009 is triggering an alert in netbox.  > Device is in PuppetDB but is D...
[15:30:54] <wikibugs>	 (03PS1) 10Slyngshede: Inform users that their permission request have been approved/rejected [software/bitu] - 10https://gerrit.wikimedia.org/r/1101894
[15:32:18] <wikibugs>	 (03CR) 10Muehlenhoff: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1098936 (owner: 10Muehlenhoff)
[15:33:11] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db1169 (re)pooling @ 50%: Pooling in production', diff saved to https://phabricator.wikimedia.org/P71683 and previous config saved to /var/cache/conftool/dbconfig/20241210-153311-root.json
[15:33:13] <wikibugs>	 (03CR) 10Kamila Součková: [C:03+1] kubernetes: add mw-videoscaler to scap deployments [puppet] - 10https://gerrit.wikimedia.org/r/1101887 (https://phabricator.wikimedia.org/T371700) (owner: 10Hnowlan)
[15:33:55] <wikibugs>	 06SRE, 06Infrastructure-Foundations: Integrate Bullseye 11.10 point update - https://phabricator.wikimedia.org/T368288#10394076 (10MoritzMuehlenhoff)
[15:34:04] <wikibugs>	 10ops-codfw, 06SRE, 06cloud-services-team, 06DC-Ops: PowerSupplyFailure Power Supply - Status - issue on cloudbackup2003:9290 - https://phabricator.wikimedia.org/T380479#10394077 (10Andrew) Timing is flexible although I'd like to do a graceful shutdown and check after the fact. Can this wait until next wee...
[15:34:13] <wikibugs>	 06SRE, 06Infrastructure-Foundations: Integrate Bullseye 11.10 point update - https://phabricator.wikimedia.org/T368288#10394079 (10MoritzMuehlenhoff) 05Open→03Resolved a:03MoritzMuehlenhoff All resolved.
[15:35:52] <moritzm>	 !log installing imagemagick security updates
[15:35:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:36:59] <wikibugs>	 (03CR) 10Elukey: [V:03+1 C:03+2] Add postgresql maps replica config to k8s' external services (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1101876 (https://phabricator.wikimedia.org/T216826) (owner: 10Elukey)
[15:37:36] <wikibugs>	 (03CR) 10Hnowlan: [C:03+2] mediawiki: add multi-job support to mercurius (033 comments) [deployment-charts] - 10https://gerrit.wikimedia.org/r/1099752 (https://phabricator.wikimedia.org/T371701) (owner: 10Hnowlan)
[15:39:40] <wikibugs>	 (03Merged) 10jenkins-bot: mediawiki: add multi-job support to mercurius [deployment-charts] - 10https://gerrit.wikimedia.org/r/1099752 (https://phabricator.wikimedia.org/T371701) (owner: 10Hnowlan)
[15:42:33] <wikibugs>	 (03PS1) 10Herron: pyrra: onboard liftwing api ng latency/availability [puppet] - 10https://gerrit.wikimedia.org/r/1101896 (https://phabricator.wikimedia.org/T302995)
[15:44:28] <wikibugs>	 06SRE, 10SRE-swift-storage, 06Commons: Interieur - 's-Gravenhage - 20085391 - RCE.jpg inconsistent, needs new upload - https://phabricator.wikimedia.org/T381891 (10MatthewVernon) 03NEW
[15:44:52] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops: ManagementSSHDown - https://phabricator.wikimedia.org/T381843#10394134 (10Jhancock.wm) 05Open→03Resolved a:03Jhancock.wm reseated mgmt cable. pings
[15:45:04] <logmsgbot>	 !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
[15:45:46] <logmsgbot>	 !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
[15:46:14] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops, 06Infrastructure-Foundations: Q2:rack/setup/install puppetserver2004 - https://phabricator.wikimedia.org/T381274#10394138 (10Jhancock.wm)
[15:47:40] <wikibugs>	 (03PS6) 10Elukey: charts: Add kartotherian [deployment-charts] - 10https://gerrit.wikimedia.org/r/1101452 (https://phabricator.wikimedia.org/T216826)
[15:47:40] <wikibugs>	 (03PS2) 10Elukey: admin_ng: add the kartotherian namespace on Wikikube [deployment-charts] - 10https://gerrit.wikimedia.org/r/1101487 (https://phabricator.wikimedia.org/T216826)
[15:47:40] <wikibugs>	 (03PS2) 10Elukey: services: add helmfile config for Kartotherian [deployment-charts] - 10https://gerrit.wikimedia.org/r/1101488 (https://phabricator.wikimedia.org/T216826)
[15:47:41] <wikibugs>	 (03PS1) 10Elukey: services: use external_services for maps read replicas [deployment-charts] - 10https://gerrit.wikimedia.org/r/1101897
[15:48:13] <moritzm>	 !log installing usb.ids updates from Bullseye point release
[15:48:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:48:17] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db1169 (re)pooling @ 75%: Pooling in production', diff saved to https://phabricator.wikimedia.org/P71684 and previous config saved to /var/cache/conftool/dbconfig/20241210-154816-root.json
[15:48:50] <wikibugs>	 10ops-codfw, 06SRE, 06cloud-services-team, 06DC-Ops: PowerSupplyFailure Power Supply - Status - issue on cloudbackup2003:9290 - https://phabricator.wikimedia.org/T380479#10394157 (10Jhancock.wm) It can wait until you're back from offsite. Enjoy =)
[15:51:13] <wikibugs>	 06SRE, 06Infrastructure-Foundations: Integrate Bullseye 11.11 point update - https://phabricator.wikimedia.org/T373795#10394163 (10MoritzMuehlenhoff)
[15:52:08] <wikibugs>	 10ops-eqiad, 06SRE, 06collaboration-services, 06DC-Ops, and 3 others: Relabel eqiad kubernetes nodes - https://phabricator.wikimedia.org/T381504#10394165 (10Jelto) >>! In T381504#10394045, @Jhancock.wm wrote: > @Jelto heads up, these are showing up in a netbox report.  >>Device is Active in Netbox but is m...
[15:52:33] <logmsgbot>	 !log hnowlan@deploy1003 helmfile [eqiad] START helmfile.d/services/mw-videoscaler: apply
[15:52:39] <logmsgbot>	 !log hnowlan@deploy1003 helmfile [eqiad] DONE helmfile.d/services/mw-videoscaler: apply
[15:53:02] <wikibugs>	 (03PS1) 10Clare Ming: Remove extraneous config for Metrics Platform instruments [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1101898 (https://phabricator.wikimedia.org/T356939)
[15:53:23] <logmsgbot>	 !log hnowlan@deploy1003 helmfile [eqiad] START helmfile.d/services/mw-videoscaler: apply
[15:53:28] <logmsgbot>	 !log hnowlan@deploy1003 helmfile [eqiad] DONE helmfile.d/services/mw-videoscaler: apply
[15:58:39] <taavi>	 jouncebot: nowandnext
[15:58:39] <jouncebot>	 No deployments scheduled for the next 0 hour(s) and 1 minute(s)
[15:58:39] <jouncebot>	 In 0 hour(s) and 1 minute(s): SRE Collaboration Services office hours (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241210T1600)
[15:58:57] <wikibugs>	 (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Tuesday, December 10 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1101840 (https://phabricator.wikimedia.org/T381849) (owner: 10Phuedx)
[15:59:43] <wikibugs>	 (03CR) 10Andrea Denisse: "Adding Scott as reviewer since he's On Clinic Duty this week." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1101577 (owner: 10Arlolra)
[16:00:04] <jouncebot>	 eoghan, jelto, arnoldokoth, and mutante: SRE Collaboration Services office hours (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241210T1600). Please do the needful.
[16:00:33] <wikibugs>	 (03PS2) 10Elukey: services: use external_services for maps read replicas [deployment-charts] - 10https://gerrit.wikimedia.org/r/1101897
[16:00:33] <wikibugs>	 (03PS7) 10Elukey: charts: Add kartotherian [deployment-charts] - 10https://gerrit.wikimedia.org/r/1101452 (https://phabricator.wikimedia.org/T216826)
[16:00:33] <wikibugs>	 (03PS3) 10Elukey: admin_ng: add the kartotherian namespace on Wikikube [deployment-charts] - 10https://gerrit.wikimedia.org/r/1101487 (https://phabricator.wikimedia.org/T216826)
[16:00:34] <wikibugs>	 (03PS3) 10Elukey: services: add helmfile config for Kartotherian [deployment-charts] - 10https://gerrit.wikimedia.org/r/1101488 (https://phabricator.wikimedia.org/T216826)
[16:00:35] <wikibugs>	 (03PS2) 10Herron: pyrra: onboard liftwing api ng latency/availability [puppet] - 10https://gerrit.wikimedia.org/r/1101896 (https://phabricator.wikimedia.org/T302995)
[16:00:36] <wikibugs>	 (03CR) 10Herron: [C:03+2] "self merge for onboarding" [puppet] - 10https://gerrit.wikimedia.org/r/1101896 (https://phabricator.wikimedia.org/T302995) (owner: 10Herron)
[16:03:04] <wikibugs>	 (03CR) 10SBassett: [C:03+1] Add Atieno's public key [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1101577 (owner: 10Arlolra)
[16:03:16] <taavi>	 is anyone planning to deploy anything in the collab window?
[16:03:22] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db1169 (re)pooling @ 100%: Pooling in production', diff saved to https://phabricator.wikimedia.org/P71685 and previous config saved to /var/cache/conftool/dbconfig/20241210-160322-root.json
[16:04:10] <wikibugs>	 (03PS3) 10Elukey: services: use external_services for maps read replicas [deployment-charts] - 10https://gerrit.wikimedia.org/r/1101897
[16:04:10] <wikibugs>	 (03PS8) 10Elukey: charts: Add kartotherian [deployment-charts] - 10https://gerrit.wikimedia.org/r/1101452 (https://phabricator.wikimedia.org/T216826)
[16:04:10] <wikibugs>	 (03PS4) 10Elukey: admin_ng: add the kartotherian namespace on Wikikube [deployment-charts] - 10https://gerrit.wikimedia.org/r/1101487 (https://phabricator.wikimedia.org/T216826)
[16:04:11] <wikibugs>	 (03PS4) 10Elukey: services: add helmfile config for Kartotherian [deployment-charts] - 10https://gerrit.wikimedia.org/r/1101488 (https://phabricator.wikimedia.org/T216826)
[16:06:23] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: ganeti1009.eqiad.wmnet
[16:06:25] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: ganeti1009.eqiad.wmnet
[16:06:36] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops, 10decommission-hardware: decommission ganeti1009 / ganeti1016 / ganeti1017 / ganeti1018 / ganeti1020 - https://phabricator.wikimedia.org/T381652#10394193 (10ops-monitoring-bot) Cookbook cookbooks.sre.debmonitor.remove-hosts run by jmm: for 1 hosts: ganeti1009.eqiad.wmnet
[16:07:02] <moritzm>	 !log manually clean out ganeti1009 from puppetdb, decom cookbook got interrupted T381652
[16:07:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:07:05] <stashbot>	 T381652: decommission ganeti1009 / ganeti1016 / ganeti1017 / ganeti1018 / ganeti1020 - https://phabricator.wikimedia.org/T381652
[16:07:56] <wikibugs>	 (03CR) 10Kamila Součková: [C:03+1] mediawiki: get mercurius label from mediawiki image version [deployment-charts] - 10https://gerrit.wikimedia.org/r/1101889 (https://phabricator.wikimedia.org/T371700) (owner: 10Hnowlan)
[16:08:05] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops, 10decommission-hardware: decommission ganeti1009 / ganeti1016 / ganeti1017 / ganeti1018 / ganeti1020 - https://phabricator.wikimedia.org/T381652#10394216 (10MoritzMuehlenhoff) >>! In T381652#10394062, @Jhancock.wm wrote: > @MoritzMuehlenhoff heads up, ganeti1009 is triggering...
[16:08:31] <logmsgbot>	 !log klausman@cumin1002 START - Cookbook sre.dns.netbox
[16:08:55] <logmsgbot>	 !log elukey@deploy2002 helmfile [eqiad] START helmfile.d/admin 'sync'.
[16:09:21] <logmsgbot>	 !log elukey@deploy2002 helmfile [eqiad] DONE helmfile.d/admin 'sync'.
[16:09:27] <logmsgbot>	 !log dzahn@cumin2002 START - Cookbook sre.hosts.downtime for 0:10:00 on phab1004.eqiad.wmnet with reason: nftables
[16:09:29] <mutante>	 !log phabricator production host needs a maintenance reboot - expect short downtime
[16:09:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:09:40] <logmsgbot>	 !log elukey@deploy2002 helmfile [codfw] START helmfile.d/admin 'sync'.
[16:09:43] <logmsgbot>	 !log dzahn@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on phab1004.eqiad.wmnet with reason: nftables
[16:09:48] <logmsgbot>	 !log elukey@deploy2002 helmfile [codfw] DONE helmfile.d/admin 'sync'.
[16:09:57] <wikibugs>	 (03CR) 10PleaseStand: "I notice that the new key was added at the end, below a few keys that seem to be marked as historic or no longer in use, in that a range o" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1101577 (owner: 10Arlolra)
[16:10:30] <wikibugs>	 (03CR) 10Muehlenhoff: [C:03+2] puppetdb: Use debian_postgresql_version [puppet] - 10https://gerrit.wikimedia.org/r/1101835 (owner: 10Muehlenhoff)
[16:12:11] <wikibugs>	 (03CR) 10Elukey: "Hey folks I am trying to simplify the config, and I noticed that we use the maps masters for read traffic. I created read-replicas-only la" [deployment-charts] - 10https://gerrit.wikimedia.org/r/1101897 (owner: 10Elukey)
[16:12:46] <logmsgbot>	 !log klausman@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update DNS for newly-provisioned ml-lab1002 - klausman@cumin1002"
[16:12:50] <logmsgbot>	 !log klausman@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update DNS for newly-provisioned ml-lab1002 - klausman@cumin1002"
[16:12:50] <logmsgbot>	 !log klausman@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[16:13:07] <moritzm>	 !log installing postgresql-15 security updates
[16:13:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:13:24] <logmsgbot>	 !log hnowlan@deploy1003 helmfile [codfw] START helmfile.d/services/mw-videoscaler: apply
[16:13:36] <logmsgbot>	 !log hnowlan@deploy1003 helmfile [codfw] DONE helmfile.d/services/mw-videoscaler: apply
[16:15:02] <icinga-wm>	 PROBLEM - Host gitlab-runner2004 is DOWN: PING CRITICAL - Packet loss = 100%
[16:15:41] <logmsgbot>	 !log klausman@cumin1002 START - Cookbook sre.network.configure-switch-interfaces for host ml-lab1002
[16:15:59] <logmsgbot>	 !log klausman@cumin1002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ml-lab1002
[16:20:07] <logmsgbot>	 !log klausman@cumin1002 START - Cookbook sre.hosts.provision for host ml-lab1002.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
[16:21:47] <wikibugs>	 (03PS1) 10Hnowlan: videoscaling: disable changeprop webVideoTranscode, enable mercurius [deployment-charts] - 10https://gerrit.wikimedia.org/r/1101899 (https://phabricator.wikimedia.org/T371701)
[16:23:00] <wikibugs>	 (03CR) 10Scott French: [C:03+1] "Confirmed out of band with Atieno that this is in fact their signing public key." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1101577 (owner: 10Arlolra)
[16:24:26] <wikibugs>	 06SRE, 10SRE-swift-storage, 06Commons: Interieur - 's-Gravenhage - 20089866 - RCE.jpg inconsistent, needs new upload - https://phabricator.wikimedia.org/T381893 (10MatthewVernon) 03NEW
[16:25:20] <logmsgbot>	 !log klausman@cumin1002 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ml-lab1002.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
[16:26:01] <logmsgbot>	 !log klausman@cumin1002 START - Cookbook sre.hosts.reimage for host ml-lab1002.eqiad.wmnet with OS bookworm
[16:27:09] <wikibugs>	 (03CR) 10Arlolra: "Maybe?  Do you think the keys will interacted with other than with `gpg --fetch-keys "https://www.mediawiki.org/keys/keys.txt"`?" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1101577 (owner: 10Arlolra)
[16:30:08] <wikibugs>	 (03CR) 10Scott French: [C:03+1] videoscaling: disable changeprop webVideoTranscode, enable mercurius [deployment-charts] - 10https://gerrit.wikimedia.org/r/1101899 (https://phabricator.wikimedia.org/T371701) (owner: 10Hnowlan)
[16:30:23] <wikibugs>	 (03CR) 10Cathal Mooney: [C:03+2] Adjust how we build list of server BGP peerings for CRs [software/homer/deploy] - 10https://gerrit.wikimedia.org/r/1099225 (https://phabricator.wikimedia.org/T381175) (owner: 10Cathal Mooney)
[16:31:56] <wikibugs>	 (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Tuesday, December 10 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1101577 (owner: 10Arlolra)
[16:35:55] <wikibugs>	 (03CR) 10Hnowlan: [C:03+2] videoscaling: disable changeprop webVideoTranscode, enable mercurius [deployment-charts] - 10https://gerrit.wikimedia.org/r/1101899 (https://phabricator.wikimedia.org/T371701) (owner: 10Hnowlan)
[16:35:59] <wikibugs>	 (03PS1) 10Cathal Mooney: Expose VRRP group assignment priority to Homer templates [software/homer/deploy] - 10https://gerrit.wikimedia.org/r/1101903 (https://phabricator.wikimedia.org/T381873)
[16:37:32] <wikibugs>	 (03Merged) 10jenkins-bot: videoscaling: disable changeprop webVideoTranscode, enable mercurius [deployment-charts] - 10https://gerrit.wikimedia.org/r/1101899 (https://phabricator.wikimedia.org/T371701) (owner: 10Hnowlan)
[16:37:57] <wikibugs>	 (03PS3) 10Fabfur: Enable new countries for magru (Cohort 3) [dns] - 10https://gerrit.wikimedia.org/r/1100084 (https://phabricator.wikimedia.org/T371141)
[16:38:37] <wikibugs>	 (03PS4) 10Fabfur: Enable new countries for magru (Cohort 3) [dns] - 10https://gerrit.wikimedia.org/r/1100084 (https://phabricator.wikimedia.org/T371141)
[16:38:41] <logmsgbot>	 !log denisse@deploy2002 Started deploy [librenms/librenms@f049593]: Upgrade LibreNMS to 24.10.0 - T381785
[16:38:43] <wikibugs>	 06SRE, 10SRE-swift-storage, 06Commons: Interieur - 's-Gravenhage - 20085391 - RCE.jpg inconsistent, needs new upload - https://phabricator.wikimedia.org/T381891#10394295 (10MatthewVernon) FTR, `rclone` does at least notice something went wrong: ` Dec  9 03:26:31 ms-be2069 swift-rclone-sync[2652562]: ERROR :...
[16:38:55] <logmsgbot>	 !log denisse@deploy2002 Finished deploy [librenms/librenms@f049593]: Upgrade LibreNMS to 24.10.0 - T381785 (duration: 00m 13s)
[16:39:28] <wikibugs>	 (03PS1) 10Scott French: shellbox: release latest image 2024-12-07-073046 [deployment-charts] - 10https://gerrit.wikimedia.org/r/1101902 (https://phabricator.wikimedia.org/T381830)
[16:40:07] <icinga-wm>	 RECOVERY - Disk space on ml-lab1001 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=ml-lab1001&var-datasource=eqiad+prometheus/ops
[16:41:00] <wikibugs>	 (03CR) 10Hnowlan: [C:03+1] shellbox: release latest image 2024-12-07-073046 [deployment-charts] - 10https://gerrit.wikimedia.org/r/1101902 (https://phabricator.wikimedia.org/T381830) (owner: 10Scott French)
[16:43:27] <wikibugs>	 (03CR) 10Scott French: [C:03+2] shellbox: release latest image 2024-12-07-073046 [deployment-charts] - 10https://gerrit.wikimedia.org/r/1101902 (https://phabricator.wikimedia.org/T381830) (owner: 10Scott French)
[16:44:25] <jinxer-wm>	 FIRING: SystemdUnitFailed: librenms-alerts.service on netmon1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[16:44:47] <wikibugs>	 (03Merged) 10jenkins-bot: shellbox: release latest image 2024-12-07-073046 [deployment-charts] - 10https://gerrit.wikimedia.org/r/1101902 (https://phabricator.wikimedia.org/T381830) (owner: 10Scott French)
[16:46:14] <wikibugs>	 (03PS2) 10Hnowlan: mediawiki: get mercurius label from mediawiki image version [deployment-charts] - 10https://gerrit.wikimedia.org/r/1101889 (https://phabricator.wikimedia.org/T371700)
[16:47:53] <logmsgbot>	 !log swfrench@deploy2002 helmfile [staging] START helmfile.d/services/shellbox: apply
[16:48:30] <logmsgbot>	 !log swfrench@deploy2002 helmfile [staging] DONE helmfile.d/services/shellbox: apply
[16:48:42] <logmsgbot>	 !log swfrench@deploy2002 helmfile [staging] START helmfile.d/services/shellbox-constraints: apply
[16:49:03] <logmsgbot>	 !log swfrench@deploy2002 helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply
[16:49:14] <logmsgbot>	 !log swfrench@deploy2002 helmfile [staging] START helmfile.d/services/shellbox-media: apply
[16:49:25] <jinxer-wm>	 FIRING: [5x] SystemdUnitFailed: librenms-alerts.service on netmon1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[16:49:29] <logmsgbot>	 !log swfrench@deploy2002 helmfile [staging] DONE helmfile.d/services/shellbox-media: apply
[16:49:41] <logmsgbot>	 !log swfrench@deploy2002 helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply
[16:50:16] <logmsgbot>	 !log swfrench@deploy2002 helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
[16:50:27] <logmsgbot>	 !log swfrench@deploy2002 helmfile [staging] START helmfile.d/services/shellbox-timeline: apply
[16:50:49] <logmsgbot>	 !log swfrench@deploy2002 helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply
[16:51:00] <logmsgbot>	 !log swfrench@deploy2002 helmfile [staging] START helmfile.d/services/shellbox-video: apply
[16:51:19] <wikibugs>	 (03PS2) 10Cathal Mooney: Update JunOS templates to use VRRP priority exposed from Netbox [homer/public] - 10https://gerrit.wikimedia.org/r/1101861 (https://phabricator.wikimedia.org/T381873)
[16:51:25] <logmsgbot>	 !log swfrench@deploy2002 helmfile [staging] DONE helmfile.d/services/shellbox-video: apply
[16:54:25] <jinxer-wm>	 RESOLVED: [5x] SystemdUnitFailed: librenms-alerts.service on netmon1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[16:54:35] <icinga-wm>	 PROBLEM - Disk space on archiva1002 is CRITICAL: DISK CRITICAL - free space: /var/lib/archiva 8805 MB (3% inode=80%): https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=archiva1002&var-datasource=eqiad+prometheus/ops
[16:56:18] <logmsgbot>	 !log swfrench@deploy2002 helmfile [eqiad] START helmfile.d/services/shellbox: apply
[16:56:53] <logmsgbot>	 !log swfrench@deploy2002 helmfile [eqiad] DONE helmfile.d/services/shellbox: apply
[16:57:14] <logmsgbot>	 !log swfrench@deploy2002 helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply
[16:57:36] <logmsgbot>	 !log swfrench@deploy2002 helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply
[16:57:57] <logmsgbot>	 !log swfrench@deploy2002 helmfile [eqiad] START helmfile.d/services/shellbox-media: apply
[16:58:16] <logmsgbot>	 !log swfrench@deploy2002 helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply
[16:58:37] <logmsgbot>	 !log swfrench@deploy2002 helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply
[16:58:58] <logmsgbot>	 !log elukey@cumin1002 START - Cookbook sre.hosts.provision for host ml-lab1002.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
[16:59:03] <logmsgbot>	 !log swfrench@deploy2002 helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
[16:59:18] <logmsgbot>	 !log elukey@cumin1002 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ml-lab1002.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
[16:59:25] <logmsgbot>	 !log swfrench@deploy2002 helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply
[16:59:55] <logmsgbot>	 !log swfrench@deploy2002 helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply
[17:00:05] <jouncebot>	 jhathaway and rzl: #bothumor When your hammer is PHP, everything starts looking like a thumb. Rise for Puppet request window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241210T1700).
[17:00:05] <jouncebot>	 No Gerrit patches in the queue for this window AFAICS.
[17:00:16] <logmsgbot>	 !log swfrench@deploy2002 helmfile [eqiad] START helmfile.d/services/shellbox-video: apply
[17:01:00] <logmsgbot>	 !log swfrench@deploy2002 helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply
[17:02:38] <wikibugs>	 (03PS1) 10CDobbins: Remove eqiad from public and private IP spaces [dns] - 10https://gerrit.wikimedia.org/r/1101908 (https://phabricator.wikimedia.org/T380858)
[17:02:59] <wikibugs>	 (03PS1) 10Fabfur: varnish: pass WME HEAD reqs to ATS [puppet] - 10https://gerrit.wikimedia.org/r/1101909 (https://phabricator.wikimedia.org/T381771)
[17:03:41] <wikibugs>	 (03CR) 10CI reject: [V:04-1] varnish: pass WME HEAD reqs to ATS [puppet] - 10https://gerrit.wikimedia.org/r/1101909 (https://phabricator.wikimedia.org/T381771) (owner: 10Fabfur)
[17:03:56] <ottomata>	 !log restarting eventgate-analytics to pick up stream config changes for T381322
[17:04:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:04:02] <stashbot>	 T381322: Rename Flink application and streams to match prod conventions - https://phabricator.wikimedia.org/T381322
[17:04:32] <logmsgbot>	 !log otto@deploy2002 helmfile [staging] START helmfile.d/services/eventgate-analytics: sync
[17:05:11] <logmsgbot>	 !log otto@deploy2002 helmfile [staging] DONE helmfile.d/services/eventgate-analytics: sync
[17:05:45] <logmsgbot>	 !log otto@deploy2002 helmfile [eqiad] START helmfile.d/services/eventgate-analytics: sync
[17:06:39] <logmsgbot>	 !log otto@deploy2002 helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: sync
[17:07:41] <logmsgbot>	 !log otto@deploy2002 helmfile [codfw] START helmfile.d/services/eventgate-analytics: sync
[17:08:28] <logmsgbot>	 !log otto@deploy2002 helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: sync
[17:08:35] <logmsgbot>	 !log swfrench@deploy2002 helmfile [codfw] START helmfile.d/services/shellbox: apply
[17:09:09] <logmsgbot>	 !log swfrench@deploy2002 helmfile [codfw] DONE helmfile.d/services/shellbox: apply
[17:09:30] <logmsgbot>	 !log swfrench@deploy2002 helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply
[17:09:43] <logmsgbot>	 !log swfrench@deploy2002 helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply
[17:10:05] <logmsgbot>	 !log swfrench@deploy2002 helmfile [codfw] START helmfile.d/services/shellbox-media: apply
[17:10:20] <logmsgbot>	 !log swfrench@deploy2002 helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply
[17:10:41] <logmsgbot>	 !log swfrench@deploy2002 helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply
[17:10:57] <logmsgbot>	 !log swfrench@deploy2002 helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
[17:11:18] <logmsgbot>	 !log swfrench@deploy2002 helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply
[17:11:41] <logmsgbot>	 !log swfrench@deploy2002 helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply
[17:12:02] <logmsgbot>	 !log swfrench@deploy2002 helmfile [codfw] START helmfile.d/services/shellbox-video: apply
[17:12:28] <logmsgbot>	 !log klausman@cumin1002 END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ml-lab1002.eqiad.wmnet with OS bookworm
[17:12:58] <logmsgbot>	 !log swfrench@deploy2002 helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply
[17:13:15] <swfrench-wmf>	 !log deployed shellbox 2024-12-07-073046 for T381830
[17:13:17] <logmsgbot>	 !log elukey@cumin1002 START - Cookbook sre.hosts.reimage for host ml-lab1002.eqiad.wmnet with OS bookworm
[17:13:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:13:18] <stashbot>	 T381830: Deploy Shellbox 4.1.1 server - https://phabricator.wikimedia.org/T381830
[17:13:58] <icinga-wm>	 PROBLEM - MariaDB Replica SQL: s6 #page on db2158 is CRITICAL: CRITICAL slave_sql_state Slave_SQL_Running: No, Errno: 1034, Errmsg: Error Index for table recentchanges is corrupt: try to repair it on query. Default database: ruwiki. [Query snipped] https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[17:14:34] <herron>	 !incidents
[17:14:35] <sirenbot>	 5534 (UNACKED)  db2158 (paged)/MariaDB Replica SQL: s6 (paged)
[17:14:35] <sirenbot>	 5533 (RESOLVED)  ProbeDown sre (10.2.1.24 ip4 thumbor:8800 probes/service http_thumbor_ip4 codfw)
[17:15:15] <herron>	 !ack 5534
[17:15:16] <sirenbot>	 5534 (ACKED)  db2158 (paged)/MariaDB Replica SQL: s6 (paged)
[17:15:45] <wikibugs>	 (PS2) Fabfur: varnish: pass WME HEAD reqs to ATS [puppet] - https://gerrit.wikimedia.org/r/1101909 (https://phabricator.wikimedia.org/T381771)
[17:17:44] <herron>	 I'll depool db2158 in a min unless I hear an objection
[17:18:43] <logmsgbot>	 !log hnowlan@deploy1003 helmfile [eqiad] START helmfile.d/services/mw-videoscaler: apply
[17:18:48] <logmsgbot>	 !log hnowlan@deploy1003 helmfile [eqiad] DONE helmfile.d/services/mw-videoscaler: apply
[17:19:43] <jinxer-wm>	 FIRING: [2x] IPv4AnchorUnreachable: ipv4 ping to eqsin RIPE Atlas anchor: failures over threshold - https://wikitech.wikimedia.org/wiki/Network_monitoring#Atlas_alerts - https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DIPv4AnchorUnreachable
[17:19:44] <jinxer-wm>	 FIRING: [2x] IPv6AnchorUnreachable: ipv6 ping to eqsin RIPE Atlas anchor: failures over threshold - https://wikitech.wikimedia.org/wiki/Network_monitoring#Atlas_alerts - https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DIPv6AnchorUnreachable
[17:21:58] <icinga-wm>	 PROBLEM - MariaDB Replica Lag: s6 #page on db2158 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 652.98 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[17:23:27] <herron>	 !incidents
[17:23:28] <sirenbot>	 5534 (ACKED)  db2158 (paged)/MariaDB Replica SQL: s6 (paged)
[17:23:28] <sirenbot>	 5535 (UNACKED)  db2158 (paged)/MariaDB Replica Lag: s6 (paged)
[17:23:28] <sirenbot>	 5533 (RESOLVED)  ProbeDown sre (10.2.1.24 ip4 thumbor:8800 probes/service http_thumbor_ip4 codfw)
[17:23:33] <herron>	 !ack 5534
[17:23:33] <sirenbot>	 5534 (ACKED)  db2158 (paged)/MariaDB Replica SQL: s6 (paged)
[17:24:25] <logmsgbot>	 !log herron@cumin1002 dbctl commit (dc=all): 'depooling db2158 T381901', diff saved to https://phabricator.wikimedia.org/P71687 and previous config saved to /var/cache/conftool/dbconfig/20241210-172424-herron.json
[17:24:29] <stashbot>	 T381901: MariaDB Replica SQL: s6 on db2158 is CRITICAL: CRITICAL slave_sql_state Slave_SQL_Running: No, Errno: 1034, Errmsg: Error Index for table recentchanges is corrupt: try to repair it on query. Default database: ruwiki.  - https://phabricator.wikimedia.org/T381901
[17:25:15] <marostegui>	 I'll fix that
[17:25:38] <herron>	 thanks marostegui
[17:25:48] <logmsgbot>	 !log elukey@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on ml-lab1002.eqiad.wmnet with reason: host reimage
[17:27:58] <icinga-wm>	 RECOVERY - MariaDB Replica SQL: s6 #page on db2158 is OK: OK slave_sql_state Slave_SQL_Running: Yes https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[17:28:49] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 1:00:00 on db2158.codfw.wmnet with reason: maintenance
[17:29:00] <icinga-wm>	 RECOVERY - MariaDB Replica Lag: s6 #page on db2158 is OK: OK slave_sql_lag Replication lag: 0.28 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[17:29:02] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2158.codfw.wmnet with reason: maintenance
[17:30:15] <wikibugs>	 (PS1) Herron: pyrra: onboard liftwing slos [puppet] - https://gerrit.wikimedia.org/r/1101911 (https://phabricator.wikimedia.org/T302995)
[17:30:50] <logmsgbot>	 !log elukey@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-lab1002.eqiad.wmnet with reason: host reimage
[17:33:09] <wikibugs>	 (PS1) Hnowlan: mediawiki: fix mercurius multi-job [deployment-charts] - https://gerrit.wikimedia.org/r/1101912 (https://phabricator.wikimedia.org/T371701)
[17:33:54] <wikibugs>	 (CR) Kamila Součková: [C:+1] mediawiki: fix mercurius multi-job [deployment-charts] - https://gerrit.wikimedia.org/r/1101912 (https://phabricator.wikimedia.org/T371701) (owner: Hnowlan)
[17:35:24] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2158 (re)pooling @ 10%: Pooling in production', diff saved to https://phabricator.wikimedia.org/P71688 and previous config saved to /var/cache/conftool/dbconfig/20241210-173524-root.json
[17:36:26] <wikibugs>	 (CR) Hnowlan: [C:+2] mediawiki: fix mercurius multi-job [deployment-charts] - https://gerrit.wikimedia.org/r/1101912 (https://phabricator.wikimedia.org/T371701) (owner: Hnowlan)
[17:38:21] <icinga-wm>	 RECOVERY - Host gitlab-runner2004 is UP: PING OK - Packet loss = 0%, RTA = 30.38 ms
[17:38:36] <wikibugs>	 (Merged) jenkins-bot: mediawiki: fix mercurius multi-job [deployment-charts] - https://gerrit.wikimedia.org/r/1101912 (https://phabricator.wikimedia.org/T371701) (owner: Hnowlan)
[17:39:59] <wikibugs>	 (PS2) Herron: pyrra: onboard liftwing slos [puppet] - https://gerrit.wikimedia.org/r/1101911 (https://phabricator.wikimedia.org/T302995)
[17:39:59] <wikibugs>	 (CR) Herron: [C:+2] "self merge for onboarding" [puppet] - https://gerrit.wikimedia.org/r/1101911 (https://phabricator.wikimedia.org/T302995) (owner: Herron)
[17:41:56] <logmsgbot>	 !log hnowlan@deploy1003 helmfile [eqiad] START helmfile.d/services/mw-videoscaler: apply
[17:42:01] <logmsgbot>	 !log hnowlan@deploy1003 helmfile [eqiad] DONE helmfile.d/services/mw-videoscaler: apply
[17:42:27] <wikibugs>	 SRE, Data-Engineering, Data-Platform-SRE: Data Platform access streamlining for WMDE staff - https://phabricator.wikimedia.org/T381824#10394660 (odimitrijevic) Yes, I approve streamlining the access to WMDE staff in the same way that we do for WMF staff as proposed in https://phabricator.wikimedia.or...
[17:42:38] <logmsgbot>	 !log hnowlan@deploy1003 helmfile [codfw] START helmfile.d/services/mw-videoscaler: apply
[17:42:50] <logmsgbot>	 !log hnowlan@deploy1003 helmfile [codfw] DONE helmfile.d/services/mw-videoscaler: apply
[17:43:02] <wikibugs>	 ops-eqiad, DC-Ops: hw troubleshooting: Stuck/bugged BMC on ml-lab1002.eqiad.wmnet - https://phabricator.wikimedia.org/T381902 (klausman) NEW
[17:43:27] <wikibugs>	 ops-eqiad, DC-Ops, Machine-Learning-Team: hw troubleshooting: Stuck/bugged BMC on ml-lab1002.eqiad.wmnet - https://phabricator.wikimedia.org/T381902#10394680 (klausman)
[17:47:03] <logmsgbot>	 !log hnowlan@deploy1003 helmfile [codfw] START helmfile.d/services/mw-videoscaler: apply
[17:47:16] <logmsgbot>	 !log hnowlan@deploy1003 helmfile [codfw] DONE helmfile.d/services/mw-videoscaler: apply
[17:48:01] <wikibugs>	 (PS1) Cwhite: Revert^2 "Stats: Move StatsFactory flush into emitBufferedStats" [core] (wmf/1.44.0-wmf.6) - https://gerrit.wikimedia.org/r/1101913
[17:48:40] <wikibugs>	 (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Tuesday, December 10 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-" [mediawiki-config] - https://gerrit.wikimedia.org/r/1100864 (https://phabricator.wikimedia.org/T380609) (owner: Cwhite)
[17:49:29] <wikibugs>	 SRE: Console domain and property access request - https://phabricator.wikimedia.org/T381904 (NBaca-WMF) NEW
[17:49:38] <wikibugs>	 (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Tuesday, December 10 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-" [core] (wmf/1.44.0-wmf.6) - https://gerrit.wikimedia.org/r/1101913 (owner: Cwhite)
[17:50:18] <wikibugs>	 (PS1) PipelineBot: mobileapps: pipeline bot promote [deployment-charts] - https://gerrit.wikimedia.org/r/1101915
[17:50:29] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2158 (re)pooling @ 25%: Pooling in production', diff saved to https://phabricator.wikimedia.org/P71690 and previous config saved to /var/cache/conftool/dbconfig/20241210-175029-root.json
[17:50:33] <wikibugs>	 SRE, Data-Platform-SRE, Infrastructure-Foundations, netops: Add QoS markings to profile Hadoop/HDFS analytics traffic - https://phabricator.wikimedia.org/T381389#10394722 (cmooney) >>! In T381389#10389706, @BTullis wrote: > This change looks fine to me, but would it be OK to wait until the New Ye...
[17:54:41] <logmsgbot>	 !log hnowlan@deploy1003 helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
[17:55:29] <logmsgbot>	 !log hnowlan@deploy1003 helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
[17:56:17] <wikibugs>	 (CR) Dbrant: [C:+2] mobileapps: pipeline bot promote [deployment-charts] - https://gerrit.wikimedia.org/r/1101915 (owner: PipelineBot)
[17:57:23] <wikibugs>	 (Merged) jenkins-bot: mobileapps: pipeline bot promote [deployment-charts] - https://gerrit.wikimedia.org/r/1101915 (owner: PipelineBot)
[18:00:05] <jouncebot>	 Deploy window MediaWiki infrastructure (UTC late) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241210T1800)
[18:00:24] <logmsgbot>	 !log dbrant@deploy2002 helmfile [staging] START helmfile.d/services/mobileapps: apply
[18:00:53] <logmsgbot>	 !log dbrant@deploy2002 helmfile [staging] DONE helmfile.d/services/mobileapps: apply
[18:01:22] <logmsgbot>	 !log dbrant@deploy2002 helmfile [eqiad] START helmfile.d/services/mobileapps: apply
[18:01:35] <logmsgbot>	 !log elukey@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002"
[18:02:07] <logmsgbot>	 !log dbrant@deploy2002 helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
[18:02:20] <logmsgbot>	 !log dbrant@deploy2002 helmfile [codfw] START helmfile.d/services/mobileapps: apply
[18:02:52] <logmsgbot>	 !log dbrant@deploy2002 helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
[18:05:35] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2158 (re)pooling @ 50%: Pooling in production', diff saved to https://phabricator.wikimedia.org/P71691 and previous config saved to /var/cache/conftool/dbconfig/20241210-180534-root.json
[18:09:29] <jinxer-wm>	 FIRING: [3x] SystemdUnitFailed: mediawiki_job_translationnotifications-mediawikiwiki.service on mwmaint2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[18:20:41] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2158 (re)pooling @ 75%: Pooling in production', diff saved to https://phabricator.wikimedia.org/P71692 and previous config saved to /var/cache/conftool/dbconfig/20241210-182040-root.json
[18:22:44] <wikibugs>	 (PS1) Herron: pyrra: disable liftwing-readability-latency slo [puppet] - https://gerrit.wikimedia.org/r/1101916
[18:25:37] <wikibugs>	 (PS6) Gmodena: dse-k8s-services: rename mw-dumps helmfiles. [deployment-charts] - https://gerrit.wikimedia.org/r/1100420 (https://phabricator.wikimedia.org/T381322)
[18:26:51] <wikibugs>	 (CR) Herron: [C:+2] pyrra: disable liftwing-readability-latency slo [puppet] - https://gerrit.wikimedia.org/r/1101916 (owner: Herron)
[18:31:07] <wikibugs>	 SRE, SRE-Access-Requests: Requesting access to deployment for Ammarpad - https://phabricator.wikimedia.org/T381851#10394858 (Scott_French)
[18:31:37] <wikibugs>	 SRE, SRE-Access-Requests: Requesting access to deployment for Ammarpad - https://phabricator.wikimedia.org/T381851#10394860 (Scott_French) Open→Stalled Thanks, @Ammarpad - It would be great if you could please confirm your SSH public key via a second authenticated channel. A common solution for...
[18:33:24] <wikibugs>	 (CR) Brouberol: dse-k8s-services: rename mw-dumps helmfiles. (1 comment) [deployment-charts] - https://gerrit.wikimedia.org/r/1100420 (https://phabricator.wikimedia.org/T381322) (owner: Gmodena)
[18:35:46] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2158 (re)pooling @ 100%: Pooling in production', diff saved to https://phabricator.wikimedia.org/P71693 and previous config saved to /var/cache/conftool/dbconfig/20241210-183545-root.json
[18:39:16] <wikibugs>	 (PS1) Hnowlan: mesh.configuration: dummy commit for 1.11.1 [deployment-charts] - https://gerrit.wikimedia.org/r/1101917
[18:39:16] <wikibugs>	 (PS1) Hnowlan: mesh.configuration: add tcp_keepalive support in 1.11.0 [deployment-charts] - https://gerrit.wikimedia.org/r/1101918 (https://phabricator.wikimedia.org/T371701)
[18:39:18] <wikibugs>	 (PS1) Hnowlan: mediawiki: use mesh.configuration 1.11 [deployment-charts] - https://gerrit.wikimedia.org/r/1101919 (https://phabricator.wikimedia.org/T371701)
[18:40:06] <wikibugs>	 (CR) CI reject: [V:-1] mediawiki: use mesh.configuration 1.11 [deployment-charts] - https://gerrit.wikimedia.org/r/1101919 (https://phabricator.wikimedia.org/T371701) (owner: Hnowlan)
[18:41:54] <wikibugs>	 (PS2) Hnowlan: mesh.configuration: add tcp_keepalive support in 1.11.0 [deployment-charts] - https://gerrit.wikimedia.org/r/1101918 (https://phabricator.wikimedia.org/T371701)
[18:42:24] <wikibugs>	 (PS1) Dzahn: Revert "miscweb: Update Envoy firewall config" [puppet] - https://gerrit.wikimedia.org/r/1101922
[18:42:44] <wikibugs>	 (CR) CI reject: [V:-1] mesh.configuration: add tcp_keepalive support in 1.11.0 [deployment-charts] - https://gerrit.wikimedia.org/r/1101918 (https://phabricator.wikimedia.org/T371701) (owner: Hnowlan)
[18:43:20] <wikibugs>	 (CR) Dzahn: [C:+2] Revert "miscweb: Update Envoy firewall config" [puppet] - https://gerrit.wikimedia.org/r/1101922 (owner: Dzahn)
[18:46:07] <wikibugs>	 (PS3) Gmodena: data-engineering: add alerts for dumps2 flink app. [alerts] - https://gerrit.wikimedia.org/r/1101849 (https://phabricator.wikimedia.org/T379362)
[18:50:07] <icinga-wm>	 PROBLEM - Disk space on ml-lab1001 is CRITICAL: DISK CRITICAL - free space: /srv 9609MiB (2% inode=95%): https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=ml-lab1001&var-datasource=eqiad+prometheus/ops
[18:54:27] <wikibugs>	 (CR) Dzahn: [C:+2] "this caused problems for https://commons-query.wikimedia.org/ and reverting fixed that for now. this will either move to k8s or go away or" [puppet] - https://gerrit.wikimedia.org/r/1092827 (owner: Muehlenhoff)
[18:56:32] <wikibugs>	 (CR) Dzahn: [C:+2] "https://phabricator.wikimedia.org/T381909" [puppet] - https://gerrit.wikimedia.org/r/1101922 (owner: Dzahn)
[18:56:42] <wikibugs>	 (CR) Dzahn: [C:+2] "https://phabricator.wikimedia.org/T381909" [puppet] - https://gerrit.wikimedia.org/r/1092827 (owner: Muehlenhoff)
[18:57:52] <wikibugs>	 (PS2) Hnowlan: mesh.configuration: dummy commit for 1.11.0 [deployment-charts] - https://gerrit.wikimedia.org/r/1101917
[18:58:39] <wikibugs>	 (PS8) Ottomata: mediawiki.org/beacon/event/index.php - use EventBus->send [mediawiki-config] - https://gerrit.wikimedia.org/r/1063222 (https://phabricator.wikimedia.org/T353817)
[19:04:49] <wikibugs>	 (CR) Ottomata: mediawiki.org/beacon/event/index.php - use EventBus->send (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/1063222 (https://phabricator.wikimedia.org/T353817) (owner: Ottomata)
[19:05:02] <wikibugs>	 (CR) Ottomata: mediawiki.org/beacon/event/index.php - use EventBus->send (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/1063222 (https://phabricator.wikimedia.org/T353817) (owner: Ottomata)
[19:09:59] <wikibugs>	 (PS9) Ottomata: mediawiki.org/beacon/event/index.php - use EventBus->send [mediawiki-config] - https://gerrit.wikimedia.org/r/1063222 (https://phabricator.wikimedia.org/T353817)
[19:12:54] <wikibugs>	 (PS1) Aleksandar Mastilovic: Add the GitLab runner firewall rule for Blunderbuss [puppet] - https://gerrit.wikimedia.org/r/1101925
[19:16:18] <wikibugs>	 (CR) Ebernhardson: Add the GitLab runner firewall rule for Blunderbuss (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1101925 (owner: Aleksandar Mastilovic)
[19:21:57] <wikibugs>	 (PS2) DDesouza: Reader Survey: Increase coverage [mediawiki-config] - https://gerrit.wikimedia.org/r/1101875 (https://phabricator.wikimedia.org/T378660)
[19:22:37] <wikibugs>	 (PS2) Hnowlan: mediawiki: use mesh.configuration 1.11 [deployment-charts] - https://gerrit.wikimedia.org/r/1101919 (https://phabricator.wikimedia.org/T371701)
[19:23:38] <wikibugs>	 (PS5) Hnowlan: mesh.configuration: add tcp_keepalive/idle_timeout to 1.11.0 [deployment-charts] - https://gerrit.wikimedia.org/r/1101918 (https://phabricator.wikimedia.org/T371701)
[19:25:39] <wikibugs>	 ops-eqiad, SRE, DC-Ops: PDU sensor over limit - https://phabricator.wikimedia.org/T381881#10395002 (phaultfinder)
[19:28:09] <wikibugs>	 (PS2) Aleksandar Mastilovic: Added some explanatory comments [puppet] - https://gerrit.wikimedia.org/r/1101925
[19:35:35] <wikibugs>	 (CR) Scott French: [C:+1] mesh.configuration: dummy commit for 1.11.0 [deployment-charts] - https://gerrit.wikimedia.org/r/1101917 (owner: Hnowlan)
[19:39:09] <wikibugs>	 (CR) Pppery: "recheck" [mediawiki-config] - https://gerrit.wikimedia.org/r/1101867 (https://phabricator.wikimedia.org/T381421) (owner: أنون)
[19:39:44] <wikibugs>	 (CR) Pppery: "Why was this scheduled for deployment on December 10? The task says it shouldn't be deployed until December 16." [mediawiki-config] - https://gerrit.wikimedia.org/r/1101867 (https://phabricator.wikimedia.org/T381421) (owner: أنون)
[19:40:53] <wikibugs>	 (PS3) Aleksandar Mastilovic: Add Blunderbuss firewall rule to GitLab runner set [puppet] - https://gerrit.wikimedia.org/r/1101925 (https://phabricator.wikimedia.org/T371994)
[19:41:31] <wikibugs>	 (CR) CI reject: [V:-1] Add Blunderbuss firewall rule to GitLab runner set [puppet] - https://gerrit.wikimedia.org/r/1101925 (https://phabricator.wikimedia.org/T371994) (owner: Aleksandar Mastilovic)
[19:43:01] <icinga-wm>	 PROBLEM - Kafka MirrorMaker main-codfw_to_main-eqiad max lag in last 10 minutes on alert1002 is CRITICAL: 1.019e+05 gt 1e+05 https://wikitech.wikimedia.org/wiki/Kafka/Administration https://grafana.wikimedia.org/d/000000521/kafka-mirrormaker?var-datasource=eqiad+prometheus/ops&var-lag_datasource=codfw+prometheus/ops&var-mirror_name=main-codfw_to_main-eqiad
[19:46:25] <wikibugs>	 SRE, SRE-Access-Requests: Requesting access to deployment for Ammarpad - https://phabricator.wikimedia.org/T381851#10395080 (Jdlrobson) Ammarpad has had +2 for some time and has demonstrated a good knowledge of our code and how it interconnects, particularly in the rendering layer. He has been super help...
[19:48:27] <wikibugs>	 (CR) Scott French: [C:+1] mesh.configuration: add tcp_keepalive/idle_timeout to 1.11.0 [deployment-charts] - https://gerrit.wikimedia.org/r/1101918 (https://phabricator.wikimedia.org/T371701) (owner: Hnowlan)
[19:51:56] <wikibugs>	 (PS4) Aleksandar Mastilovic: Add Blunderbuss firewall rule to GitLab runner set [puppet] - https://gerrit.wikimedia.org/r/1101925 (https://phabricator.wikimedia.org/T371994)
[19:58:26] <wikibugs>	 (CR) Scott French: [C:+1] mediawiki: use mesh.configuration 1.11 (2 comments) [deployment-charts] - https://gerrit.wikimedia.org/r/1101919 (https://phabricator.wikimedia.org/T371701) (owner: Hnowlan)
[20:04:39] <logmsgbot>	 !log jhathaway@cumin1002 START - Cookbook sre.hosts.downtime for 3:00:00 on ms-be1088.eqiad.wmnet with reason: T381919
[20:04:43] <stashbot>	 T381919: Supermicro: unable to set boot order after using Redfish to boot once - https://phabricator.wikimedia.org/T381919
[20:04:52] <logmsgbot>	 !log jhathaway@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on ms-be1088.eqiad.wmnet with reason: T381919
[20:05:39] <wikibugs>	 (CR) JHathaway: [C:+1] wdqs1025: remove unneeded host hieradata [puppet] - https://gerrit.wikimedia.org/r/1101888 (https://phabricator.wikimedia.org/T376150) (owner: Bking)
[20:28:08] <logmsgbot>	 !log jhathaway@cumin1002 START - Cookbook sre.hosts.downtime for 4:00:00 on ms-be1088.eqiad.wmnet with reason: T381919
[20:28:10] <logmsgbot>	 !log jhathaway@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on ms-be1088.eqiad.wmnet with reason: T381919
[20:28:12] <stashbot>	 T381919: Supermicro: unable to set boot order after using Redfish to boot once - https://phabricator.wikimedia.org/T381919
[20:32:35] <logmsgbot>	 !log mforns@deploy2002 Started deploy [analytics/refinery@25c1946]: Regular analytics weekly train [analytics/refinery@25c1946c]
[20:35:37] <wikibugs>	 (CR) Ryan Kemper: [C:+2] wdqs1025: remove unneeded host hieradata [puppet] - https://gerrit.wikimedia.org/r/1101888 (https://phabricator.wikimedia.org/T376150) (owner: Bking)
[20:36:35] <wikibugs>	 ops-eqiad, SRE, DC-Ops: PDU sensor over limit - https://phabricator.wikimedia.org/T381881#10395227 (VRiley-WMF) rebalanced power
[20:36:41] <wikibugs>	 ops-eqiad, SRE, DC-Ops: PDU sensor over limit - https://phabricator.wikimedia.org/T381881#10395228 (VRiley-WMF) Open→Resolved
[20:37:15] <logmsgbot>	 !log ryankemper@cumin2002 START - Cookbook sre.wdqs.data-transfer (T376150, xfer wdqs scholarly 2023(public)->2026(internal)) xfer scholarly_articles from wdqs2023.codfw.wmnet -> wdqs2026.codfw.wmnet, repooling source-only afterwards
[20:37:19] <stashbot>	 T376150: Prepare hosts to serve wdqs-internal-main & wdqs-internal-scholarly - https://phabricator.wikimedia.org/T376150
[20:38:12] <logmsgbot>	 !log ryankemper@cumin2002 END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97) (T376150, xfer wdqs scholarly 2023(public)->2026(internal)) xfer scholarly_articles from wdqs2023.codfw.wmnet -> wdqs2026.codfw.wmnet, repooling source-only afterwards
[20:38:22] <wikibugs>	 ops-eqiad, SRE, DC-Ops: PDU sensor over limit - ps1-b4-eqiad.mgmt.eqiad - https://phabricator.wikimedia.org/T381540#10395232 (VRiley-WMF) Open→Resolved a:VRiley-WMF rebalanced power
[20:38:24] <logmsgbot>	 !log ryankemper@cumin2002 START - Cookbook sre.wdqs.data-transfer (T376150, xfer wdqs scholarly 2023(public)->2026(internal)) xfer scholarly_articles from wdqs2023.codfw.wmnet -> wdqs2026.codfw.wmnet w/ force delete existing files, repooling source-only afterwards
[20:45:47] <logmsgbot>	 !log mforns@deploy2002 Finished deploy [analytics/refinery@25c1946]: Regular analytics weekly train [analytics/refinery@25c1946c] (duration: 13m 12s)
[20:45:55] <logmsgbot>	 !log mforns@deploy2002 Started deploy [analytics/refinery@25c1946] (thin): Regular analytics weekly train THIN [analytics/refinery@25c1946c]
[20:46:13] <wikibugs>	 ops-eqiad, SRE, DC-Ops, Prod-Kubernetes, serviceops: wikikube-ctrl1002 and wikikube-ctrl1003: Switch network cable from port 2 to port 1 on the 10G NIC - https://phabricator.wikimedia.org/T379717#10395266 (VRiley-WMF) Can we proceed with swapping these?
[20:46:26] <logmsgbot>	 !log mforns@deploy2002 Finished deploy [analytics/refinery@25c1946] (thin): Regular analytics weekly train THIN [analytics/refinery@25c1946c] (duration: 00m 31s)
[20:46:45] <logmsgbot>	 !log mforns@deploy2002 Started deploy [analytics/refinery@25c1946] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@25c1946c]
[20:47:12] <logmsgbot>	 !log mforns@deploy2002 Finished deploy [analytics/refinery@25c1946] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@25c1946c] (duration: 00m 27s)
[20:54:28] <logmsgbot>	 !log mforns@deploy2002 Started deploy [airflow-dags/analytics@2af4e1a]: Fix for the Commons Impact Metrics job
[20:55:56] <logmsgbot>	 !log mforns@deploy2002 Finished deploy [airflow-dags/analytics@2af4e1a]: Fix for the Commons Impact Metrics job (duration: 01m 38s)
[20:58:55] <wikibugs>	 (PS1) Kgraessle: Enable AutoModerator on bnwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/1101937 (https://phabricator.wikimedia.org/T381000)
[21:00:05] <jouncebot>	 RoanKattouw, Urbanecm, cjming, TheresNoTime, and kindrobot: How many deployers does it take to do UTC late backport window deploy? (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241210T2100).
[21:00:05] <jouncebot>	 bvibber, physikerwelt, danisztls, cjming, arlolra, and cwhite: A patch you scheduled for UTC late backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[21:00:14] <cjming>	 o/
[21:00:15] <danisztls>	 o/
[21:00:17] <cjming>	 i can deploy
[21:00:20] <physikerwelt>	 here
[21:00:39] <cwhite>	 o/
[21:00:55] <cjming>	 physikerwelt: i'll start with yours
[21:01:06] <physikerwelt>	 thank you
[21:01:08] <wikibugs>	 (PS3) Physikerwelt: Add new properties for math popups [mediawiki-config] - https://gerrit.wikimedia.org/r/1101834 (https://phabricator.wikimedia.org/T381046)
[21:01:37] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Approved by cjming@deploy2002 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/1101834 (https://phabricator.wikimedia.org/T381046) (owner: Physikerwelt)
[21:02:03] <bvibber>	 o/
[21:02:19] <wikibugs>	 (Merged) jenkins-bot: Add new properties for math popups [mediawiki-config] - https://gerrit.wikimedia.org/r/1101834 (https://phabricator.wikimedia.org/T381046) (owner: Physikerwelt)
[21:02:26] <cjming>	 hi vibber: i'll do yours next
[21:02:28] <bvibber>	 tx
[21:02:34] <cjming>	 *bvibber
[21:03:03] <wikibugs>	 (PS2) Bvibber: LanguageConverter: Ignore content inside <math> and <svg> elements [core] (wmf/1.44.0-wmf.6) - https://gerrit.wikimedia.org/r/1101600 (https://phabricator.wikimedia.org/T381617)
[21:03:09] <wikibugs>	 SRE: Console domain and property access request - https://phabricator.wikimedia.org/T381904#10395283 (Scott_French) a:Scott_French Thanks for the summary @NBaca-WMF.  > 1. For the specific request, is there a way to get a list of all domains and properties that we own as an organization, so I can be sure...
[21:03:26] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Approved by cjming@deploy2002 using scap backport" [core] (wmf/1.44.0-wmf.6) - https://gerrit.wikimedia.org/r/1101600 (https://phabricator.wikimedia.org/T381617) (owner: Bvibber)
[21:11:29] <physikerwelt>	 cjming: Thank you again. Tested everything on https://en.wikipedia.beta.wmflabs.org/wiki/T381046. Works fine as expected. Have a great day/night/...
[21:11:53] <wikibugs>	 ops-eqiad, SRE, cloud-services-team, DC-Ops, decommission-hardware: decommission cloudcephmon100[1-3].eqiad.wmnet - https://phabricator.wikimedia.org/T380893#10395298 (Jhancock.wm) An exception occurred: KeyError: 'device_name'  Traceback (most recent call last): File "/srv/netbox/customscrip...
[21:12:27] <wikibugs>	 (PS3) DDesouza: Reader Survey: Increase coverage [mediawiki-config] - https://gerrit.wikimedia.org/r/1101875 (https://phabricator.wikimedia.org/T378660)
[21:12:38] <cjming>	 physikerwelt: ur welcome :)
[21:19:43] <jinxer-wm>	 FIRING: [2x] IPv4AnchorUnreachable: ipv4 ping to eqsin RIPE Atlas anchor: failures over threshold - https://wikitech.wikimedia.org/wiki/Network_monitoring#Atlas_alerts - https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DIPv4AnchorUnreachable
[21:19:44] <jinxer-wm>	 FIRING: [2x] IPv6AnchorUnreachable: ipv6 ping to eqsin RIPE Atlas anchor: failures over threshold - https://wikitech.wikimedia.org/wiki/Network_monitoring#Atlas_alerts - https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DIPv6AnchorUnreachable
[21:19:50] <wikibugs>	 (CR) RLazarus: [C:+2] deployment_server: Add release to mwscript-k8s -ojson output [puppet] - https://gerrit.wikimedia.org/r/1101607 (https://phabricator.wikimedia.org/T376795) (owner: RLazarus)
[21:22:52] <logmsgbot>	 !log ryankemper@cumin2002 END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T376150, xfer wdqs scholarly 2023(public)->2026(internal)) xfer scholarly_articles from wdqs2023.codfw.wmnet -> wdqs2026.codfw.wmnet w/ force delete existing files, repooling source-only afterwards
[21:22:56] <stashbot>	 T376150: Prepare hosts to serve wdqs-internal-main & wdqs-internal-scholarly - https://phabricator.wikimedia.org/T376150
[21:23:03] <wikibugs>	 (Merged) jenkins-bot: LanguageConverter: Ignore content inside <math> and <svg> elements [core] (wmf/1.44.0-wmf.6) - https://gerrit.wikimedia.org/r/1101600 (https://phabricator.wikimedia.org/T381617) (owner: Bvibber)
[21:23:21] <logmsgbot>	 !log cjming@deploy2002 Started scap sync-world: Backport for [[gerrit:1101600|LanguageConverter: Ignore content inside <math> and <svg> elements (T381617)]]
[21:24:07] <bvibber>	 whee
[21:24:56] <cjming>	 😁
[21:27:05] <cjming>	 bvibber: on test servers
[21:27:40] <logmsgbot>	 !log cjming@deploy2002 bvibber, cjming: Backport for [[gerrit:1101600|LanguageConverter: Ignore content inside <math> and <svg> elements (T381617)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[21:27:48] <bvibber>	 testing
[21:29:55] <bvibber>	 cjming: looks good!
[21:29:56] <bvibber>	 proceed :D
[21:30:01] <cjming>	 yay!
[21:30:03] <logmsgbot>	 !log cjming@deploy2002 bvibber, cjming: Continuing with sync
[21:30:39] <wikibugs>	 (PS4) DDesouza: Reader Survey: Increase coverage [mediawiki-config] - https://gerrit.wikimedia.org/r/1101875 (https://phabricator.wikimedia.org/T378660)
[21:34:11] <wikibugs>	 SRE, Infrastructure-Foundations, Mail: Log tls cipher information - https://phabricator.wikimedia.org/T381927 (jhathaway) NEW
[21:35:17] <logmsgbot>	 !log cjming@deploy2002 Finished scap sync-world: Backport for [[gerrit:1101600|LanguageConverter: Ignore content inside <math> and <svg> elements (T381617)]] (duration: 11m 55s)
[21:36:10] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Approved by cjming@deploy2002 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/1101875 (https://phabricator.wikimedia.org/T378660) (owner: DDesouza)
[21:36:50] <wikibugs>	 (Merged) jenkins-bot: Reader Survey: Increase coverage [mediawiki-config] - https://gerrit.wikimedia.org/r/1101875 (https://phabricator.wikimedia.org/T378660) (owner: DDesouza)
[21:37:05] <logmsgbot>	 !log cjming@deploy2002 Started scap sync-world: Backport for [[gerrit:1101875|Reader Survey: Increase coverage (T378660)]]
[21:37:09] <stashbot>	 T378660: Quicksurvey deployment for Reader Survey - https://phabricator.wikimedia.org/T378660
[21:41:00] <cjming>	 danisztls: your patch is up on test servers if you'd like to verify
[21:41:23] <danisztls>	 cjming: all looks good thanks
[21:41:43] <logmsgbot>	 !log cjming@deploy2002 cjming, dani: Backport for [[gerrit:1101875|Reader Survey: Increase coverage (T378660)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[21:41:46] <logmsgbot>	 !log cjming@deploy2002 cjming, dani: Continuing with sync
[21:42:36] <wikibugs>	 (PS2) Phuedx: Beta Cluster: Enable MetricsPlatform extension on all wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/1101840 (https://phabricator.wikimedia.org/T381849)
[21:47:07] <logmsgbot>	 !log cjming@deploy2002 Finished scap sync-world: Backport for [[gerrit:1101875|Reader Survey: Increase coverage (T378660)]] (duration: 10m 02s)
[21:47:11] <stashbot>	 T378660: Quicksurvey deployment for Reader Survey - https://phabricator.wikimedia.org/T378660
[21:47:31] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Approved by cjming@deploy2002 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/1101840 (https://phabricator.wikimedia.org/T381849) (owner: Phuedx)
[21:48:14] <wikibugs>	 (Merged) jenkins-bot: Beta Cluster: Enable MetricsPlatform extension on all wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/1101840 (https://phabricator.wikimedia.org/T381849) (owner: Phuedx)
[21:48:31] <logmsgbot>	 !log cjming@deploy2002 Started scap sync-world: Backport for [[gerrit:1101840|Beta Cluster: Enable MetricsPlatform extension on all wikis (T381849 T381853)]]
[21:48:37] <stashbot>	 T381849: Community Updates module impressions lack experiment and variant information - https://phabricator.wikimedia.org/T381849
[21:48:37] <stashbot>	 T381853: MetricsPlatform: Update MetricsPlatformEnable config variable - https://phabricator.wikimedia.org/T381853
[21:52:45] <logmsgbot>	 !log cjming@deploy2002 cjming, phuedx: Backport for [[gerrit:1101840|Beta Cluster: Enable MetricsPlatform extension on all wikis (T381849 T381853)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[21:53:58] <logmsgbot>	 !log cjming@deploy2002 cjming, phuedx: Continuing with sync
[21:55:48] <wikibugs>	 SRE: Console domain and property access request - https://phabricator.wikimedia.org/T381904#10395444 (Scott_French) @NBaca-WMF - When you get a chance could you please confirm the following: * That my (edited) interpretation for question #1 in T381904#10395283 is correct - i.e., you're interested in enumerat...
[21:56:30] <logmsgbot>	 !log jhathaway@cumin1002 START - Cookbook sre.hosts.reimage for host ms-be1088.eqiad.wmnet with OS bookworm
[21:59:22] <logmsgbot>	 !log cjming@deploy2002 Finished scap sync-world: Backport for [[gerrit:1101840|Beta Cluster: Enable MetricsPlatform extension on all wikis (T381849 T381853)]] (duration: 10m 50s)
[21:59:27] <stashbot>	 T381849: Community Updates module impressions lack experiment and variant information - https://phabricator.wikimedia.org/T381849
[21:59:27] <stashbot>	 T381853: MetricsPlatform: Update MetricsPlatformEnable config variable - https://phabricator.wikimedia.org/T381853
[22:00:30] <cjming>	 arlolra: are you around?
[22:00:57] <cjming>	 cwhite: are you around?
[22:01:26] <cwhite>	 o/
[22:02:38] <wikibugs>	 (PS3) Cwhite: Disable stats collection when WMF_MAINTENANCE_OFFLINE is set [mediawiki-config] - https://gerrit.wikimedia.org/r/1100864 (https://phabricator.wikimedia.org/T380609)
[22:02:58] <cjming>	 cwhite: i'll pick up with your config patch
[22:03:06] <cwhite>	 Thank you!
[22:03:22] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Approved by cjming@deploy2002 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/1100864 (https://phabricator.wikimedia.org/T380609) (owner: Cwhite)
[22:04:06] <wikibugs>	 (Merged) jenkins-bot: Disable stats collection when WMF_MAINTENANCE_OFFLINE is set [mediawiki-config] - https://gerrit.wikimedia.org/r/1100864 (https://phabricator.wikimedia.org/T380609) (owner: Cwhite)
[22:04:22] <logmsgbot>	 !log cjming@deploy2002 Started scap sync-world: Backport for [[gerrit:1100864|Disable stats collection when WMF_MAINTENANCE_OFFLINE is set (T380609)]]
[22:04:26] <stashbot>	 T380609: Maintenance scripts do not emit StatsLib metrics - https://phabricator.wikimedia.org/T380609
[22:04:39] <jinxer-wm>	 FIRING: CirrusSearchHighOldGCFrequency: Elasticsearch instance cloudelastic1005-cloudelastic-omega-eqiad is running the old gc excessively - https://wikitech.wikimedia.org/wiki/Search/Elasticsearch_Administration#Stuck_in_old_GC_hell - https://grafana.wikimedia.org/d/000000462/elasticsearch-memory - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchHighOldGCFrequency
[22:07:32] <cjming>	 cwhite: on mwdebug
[22:08:04] <logmsgbot>	 !log cjming@deploy2002 cwhite, cjming: Backport for [[gerrit:1100864|Disable stats collection when WMF_MAINTENANCE_OFFLINE is set (T380609)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[22:09:29] <cjming>	 cwhite: lmk if/when to sync
[22:09:29] <jinxer-wm>	 FIRING: [3x] SystemdUnitFailed: mediawiki_job_translationnotifications-mediawikiwiki.service on mwmaint2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[22:10:01] <cwhite>	 cjming: looks good on mwdebug
[22:10:09] <cjming>	 nice
[22:10:11] <logmsgbot>	 !log cjming@deploy2002 cwhite, cjming: Continuing with sync
[22:11:07] <wikibugs>	 (PS2) Cwhite: Revert^2 "Stats: Move StatsFactory flush into emitBufferedStats" [core] (wmf/1.44.0-wmf.6) - https://gerrit.wikimedia.org/r/1101913
[22:15:47] <logmsgbot>	 !log cjming@deploy2002 Finished scap sync-world: Backport for [[gerrit:1100864|Disable stats collection when WMF_MAINTENANCE_OFFLINE is set (T380609)]] (duration: 11m 24s)
[22:15:51] <stashbot>	 T380609: Maintenance scripts do not emit StatsLib metrics - https://phabricator.wikimedia.org/T380609
[22:19:19] <logmsgbot>	 !log jhathaway@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1088.eqiad.wmnet with reason: host reimage
[22:22:01] <cjming>	 cwhite: i'll do your backport now
[22:22:42] <logmsgbot>	 !log jhathaway@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1088.eqiad.wmnet with reason: host reimage
[22:23:25] <cjming>	 cwhite: unless it can wait?
[22:24:19] <cwhite>	 cjming: It can wait if you need.  There's a chance it will fail in the scap mwscript steps.
[22:27:11] <cjming>	 that would be great - i gotta run
[22:28:47] <cwhite>	 ok :)
[22:31:53] <wikibugs>	 (PS1) Scott French: shellbox-video: allow egress to swift [deployment-charts] - https://gerrit.wikimedia.org/r/1101944 (https://phabricator.wikimedia.org/T292322)
[22:36:08] <cjming>	 !log end of UTC late backport window
[22:36:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:49:58] <logmsgbot>	 !log jhathaway@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be1088.eqiad.wmnet with OS bookworm
[22:54:18] <logmsgbot>	 !log jhathaway@cumin1002 START - Cookbook sre.hosts.downtime for 1:00:00 on ms-be1088.eqiad.wmnet with reason: T381919
[22:54:20] <logmsgbot>	 !log jhathaway@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ms-be1088.eqiad.wmnet with reason: T381919
[22:54:21] <stashbot>	 T381919: Supermicro: unable to set boot order after using Redfish to boot once - https://phabricator.wikimedia.org/T381919
[22:57:53] <wikibugs>	 SRE: Console domain and property access request - https://phabricator.wikimedia.org/T381904#10395576 (NBaca-WMF) Hi @Scott_French  - Thank you for taking a look at this! 1. Yes - this is a good summary. We can currently only see a subset of domains and properties, but I know there are many more out there, bo...
[23:23:53] <wikibugs>	 (CR) Alexandros Kosiaris: [C:+1] shellbox-video: allow egress to swift [deployment-charts] - https://gerrit.wikimedia.org/r/1101944 (https://phabricator.wikimedia.org/T292322) (owner: Scott French)
[23:40:45] <wikibugs>	 (CR) Scott French: [C:+1] mediawiki: get mercurius label from mediawiki image version (2 comments) [deployment-charts] - https://gerrit.wikimedia.org/r/1101889 (https://phabricator.wikimedia.org/T371700) (owner: Hnowlan)