[00:02:07] <jinxer-wm>	 (ProbeDown) firing: Service videoscaler:443 has failed probes (http_videoscaler_ip4) #page - https://wikitech.wikimedia.org/wiki/Runbook#videoscaler:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[00:06:07] <jinxer-wm>	 (ProbeDown) resolved: (2) Service jobrunner:443 has failed probes (http_jobrunner_ip4)  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[00:06:37] <jinxer-wm>	 (ProbeDown) firing: (2) Service jobrunner:443 has failed probes (http_jobrunner_ip4)  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[00:07:07] <jinxer-wm>	 (ProbeDown) resolved: Service videoscaler:443 has failed probes (http_videoscaler_ip4) #page - https://wikitech.wikimedia.org/wiki/Runbook#videoscaler:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[00:11:22] <jinxer-wm>	 (ProbeDown) firing: (2) Service jobrunner:443 has failed probes (http_jobrunner_ip4)  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[00:16:22] <jinxer-wm>	 (ProbeDown) firing: (2) Service jobrunner:443 has failed probes (http_jobrunner_ip4)  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[00:16:37] <jinxer-wm>	 (ProbeDown) firing: (2) Service jobrunner:443 has failed probes (http_jobrunner_ip4)  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[00:17:07] <jinxer-wm>	 (ProbeDown) firing: Service videoscaler:443 has failed probes (http_videoscaler_ip4) #page - https://wikitech.wikimedia.org/wiki/Runbook#videoscaler:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[00:21:22] <jinxer-wm>	 (ProbeDown) firing: (2) Service jobrunner:443 has failed probes (http_jobrunner_ip4)  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[00:22:07] <jinxer-wm>	 (ProbeDown) resolved: Service videoscaler:443 has failed probes (http_videoscaler_ip4) #page - https://wikitech.wikimedia.org/wiki/Runbook#videoscaler:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[00:26:22] <jinxer-wm>	 (ProbeDown) resolved: (2) Service jobrunner:443 has failed probes (http_jobrunner_ip4)  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[00:31:22] <jinxer-wm>	 (ProbeDown) firing: (2) Service jobrunner:443 has failed probes (http_jobrunner_ip4)  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[00:39:23] <wikibugs>	 (PS1) TrainBranchBot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/924127
[00:39:29] <wikibugs>	 (CR) TrainBranchBot: [C: +2] Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/924127 (owner: TrainBranchBot)
[00:41:22] <jinxer-wm>	 (ProbeDown) firing: (2) Service jobrunner:443 has failed probes (http_jobrunner_ip4)  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[00:41:37] <jinxer-wm>	 (ProbeDown) firing: (2) Service jobrunner:443 has failed probes (http_jobrunner_ip4)  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[00:46:22] <jinxer-wm>	 (ProbeDown) resolved: (2) Service jobrunner:443 has failed probes (http_jobrunner_ip4)  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[00:49:07] <jinxer-wm>	 (ProbeDown) firing: Service videoscaler:443 has failed probes (http_videoscaler_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#videoscaler:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[00:54:07] <jinxer-wm>	 (ProbeDown) resolved: Service videoscaler:443 has failed probes (http_videoscaler_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#videoscaler:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[00:55:39] <wikibugs>	 (Merged) jenkins-bot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/924127 (owner: TrainBranchBot)
[00:56:07] <jinxer-wm>	 (ProbeDown) firing: Service videoscaler:443 has failed probes (http_videoscaler_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#videoscaler:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[01:01:07] <jinxer-wm>	 (ProbeDown) resolved: (2) Service jobrunner:443 has failed probes (http_jobrunner_ip4)  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[01:03:34] <wikibugs>	 ops-codfw: Inbound interface errors - https://phabricator.wikimedia.org/T337705 (phaultfinder)
[01:41:07] <jinxer-wm>	 (ProbeDown) firing: (2) Service jobrunner:443 has failed probes (http_jobrunner_ip4)  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[01:42:26] <icinga-wm>	 PROBLEM - Restbase edge eqsin on text-lb.eqsin.wikimedia.org is CRITICAL: /api/rest_v1/page/talk/{title} (Get structured talk page for enwiki Salt article) timed out before a response was received https://wikitech.wikimedia.org/wiki/RESTBase
[01:43:56] <icinga-wm>	 RECOVERY - Restbase edge eqsin on text-lb.eqsin.wikimedia.org is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/RESTBase
[01:46:07] <jinxer-wm>	 (ProbeDown) resolved: (2) Service jobrunner:443 has failed probes (http_jobrunner_ip4)  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[01:49:12] <icinga-wm>	 PROBLEM - mailman list info on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[01:50:06] <icinga-wm>	 PROBLEM - mailman archives on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[01:51:08] <icinga-wm>	 PROBLEM - mailman list info ssl expiry on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[01:52:32] <icinga-wm>	 RECOVERY - mailman list info ssl expiry on lists1001 is OK: OK - Certificate lists.wikimedia.org will expire on Tue 20 Jun 2023 04:41:39 AM GMT +0000. https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[01:53:04] <icinga-wm>	 RECOVERY - mailman archives on lists1001 is OK: HTTP OK: HTTP/1.1 200 OK - 49994 bytes in 0.208 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[01:56:50] <icinga-wm>	 RECOVERY - mailman list info on lists1001 is OK: HTTP OK: HTTP/1.1 200 OK - 8572 bytes in 1.917 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[02:06:32] <jinxer-wm>	 (JobUnavailable) firing: (2) Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[02:07:57] <wikibugs>	 (PS1) TrainBranchBot: Branch commit for wmf/1.41.0-wmf.11 [core] (wmf/1.41.0-wmf.11) - https://gerrit.wikimedia.org/r/924128 (https://phabricator.wikimedia.org/T337525)
[02:08:03] <wikibugs>	 (CR) TrainBranchBot: [C: +2] Branch commit for wmf/1.41.0-wmf.11 [core] (wmf/1.41.0-wmf.11) - https://gerrit.wikimedia.org/r/924128 (https://phabricator.wikimedia.org/T337525) (owner: TrainBranchBot)
[02:23:19] <wikibugs>	 (Merged) jenkins-bot: Branch commit for wmf/1.41.0-wmf.11 [core] (wmf/1.41.0-wmf.11) - https://gerrit.wikimedia.org/r/924128 (https://phabricator.wikimedia.org/T337525) (owner: TrainBranchBot)
[02:26:32] <jinxer-wm>	 (JobUnavailable) firing: (2) Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[02:29:12] <jinxer-wm>	 (SystemdUnitFailed) firing: wmf_auto_restart_prometheus-ipmi-exporter.service Failed on wdqs2021:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[02:31:32] <jinxer-wm>	 (JobUnavailable) resolved: (2) Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[03:01:22] <wikibugs>	 (PS1) TrainBranchBot: testwikis wikis to 1.41.0-wmf.11 [mediawiki-config] - https://gerrit.wikimedia.org/r/924176 (https://phabricator.wikimedia.org/T337525)
[03:01:24] <wikibugs>	 (CR) TrainBranchBot: [C: +2] testwikis wikis to 1.41.0-wmf.11 [mediawiki-config] - https://gerrit.wikimedia.org/r/924176 (https://phabricator.wikimedia.org/T337525) (owner: TrainBranchBot)
[03:02:07] <wikibugs>	 (Merged) jenkins-bot: testwikis wikis to 1.41.0-wmf.11 [mediawiki-config] - https://gerrit.wikimedia.org/r/924176 (https://phabricator.wikimedia.org/T337525) (owner: TrainBranchBot)
[03:02:39] <logmsgbot>	 !log mwpresync@deploy1002 Started scap: testwikis wikis to 1.41.0-wmf.11  refs T337525
[03:02:44] <stashbot>	 T337525: 1.41.0-wmf.11 deployment blockers - https://phabricator.wikimedia.org/T337525
[03:35:50] <icinga-wm>	 PROBLEM - Check systemd state on cumin2002 is CRITICAL: CRITICAL - degraded: The following units failed: httpbb_kubernetes_mw-web_hourly.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[03:41:46] <icinga-wm>	 PROBLEM - Check unit status of httpbb_kubernetes_mw-web_hourly on cumin2002 is CRITICAL: CRITICAL: Status of the systemd unit httpbb_kubernetes_mw-web_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[03:52:33] <logmsgbot>	 !log mwpresync@deploy1002 Finished scap: testwikis wikis to 1.41.0-wmf.11  refs T337525 (duration: 49m 54s)
[03:52:38] <stashbot>	 T337525: 1.41.0-wmf.11 deployment blockers - https://phabricator.wikimedia.org/T337525
[03:53:49] <wikibugs>	 (PS3) KartikMistry: Undeploy Special:Contribute from unsupported skins [mediawiki-config] - https://gerrit.wikimedia.org/r/923527 (https://phabricator.wikimedia.org/T337366)
[03:54:46] <logmsgbot>	 !log mwpresync@deploy1002 Pruned MediaWiki: 1.41.0-wmf.9 (duration: 02m 10s)
[04:10:41] * kart_ updating cxserver..
[04:11:19] <wikibugs>	 (PS3) KartikMistry: Update cxserver to 2023-05-29-112644-production [deployment-charts] - https://gerrit.wikimedia.org/r/923920 (https://phabricator.wikimedia.org/T337657)
[04:14:10] <wikibugs>	 (CR) KartikMistry: [C: +2] Update cxserver to 2023-05-29-112644-production [deployment-charts] - https://gerrit.wikimedia.org/r/923920 (https://phabricator.wikimedia.org/T337657) (owner: KartikMistry)
[04:15:09] <wikibugs>	 (Merged) jenkins-bot: Update cxserver to 2023-05-29-112644-production [deployment-charts] - https://gerrit.wikimedia.org/r/923920 (https://phabricator.wikimedia.org/T337657) (owner: KartikMistry)
[04:20:57] <logmsgbot>	 !log kartik@deploy1002 helmfile [staging] START helmfile.d/services/cxserver: apply
[04:21:24] <icinga-wm>	 PROBLEM - dump of db_inventory in codfw on backupmon1001 is CRITICAL: Last dump for db_inventory at codfw (db2185) taken on 2023-05-30 03:55:35 is 109 KiB, but the previous one was 93 KiB, a change of +16.6 % https://wikitech.wikimedia.org/wiki/MariaDB/Backups%23Rerun_a_failed_backup
[04:21:26] <logmsgbot>	 !log kartik@deploy1002 helmfile [staging] DONE helmfile.d/services/cxserver: apply
[04:24:03] <logmsgbot>	 !log kartik@deploy1002 helmfile [codfw] START helmfile.d/services/cxserver: apply
[04:24:37] <logmsgbot>	 !log kartik@deploy1002 helmfile [codfw] DONE helmfile.d/services/cxserver: apply
[04:26:38] <icinga-wm>	 PROBLEM - dump of db_inventory in eqiad on backupmon1001 is CRITICAL: Last dump for db_inventory at eqiad (db1215) taken on 2023-05-30 04:02:03 is 109 KiB, but the previous one was 93 KiB, a change of +17.1 % https://wikitech.wikimedia.org/wiki/MariaDB/Backups%23Rerun_a_failed_backup
[04:27:32] <logmsgbot>	 !log kartik@deploy1002 helmfile [eqiad] START helmfile.d/services/cxserver: apply
[04:28:08] <logmsgbot>	 !log kartik@deploy1002 helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
[04:28:30] <kart_>	 !log Updated cxserver to 2023-05-29-112644-production (T337657)
[04:28:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:28:34] <stashbot>	 T337657: Shutdown OpusMT service - https://phabricator.wikimedia.org/T337657
[04:31:42] <icinga-wm>	 RECOVERY - Check systemd state on cumin2002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[04:34:28] <icinga-wm>	 RECOVERY - Check unit status of httpbb_kubernetes_mw-web_hourly on cumin2002 is OK: OK: Status of the systemd unit httpbb_kubernetes_mw-web_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[05:01:05] <jinxer-wm>	 (SwiftTooManyMediaUploads) firing: (2) Too many eqiad mediawiki originals uploads - https://wikitech.wikimedia.org/wiki/Swift/How_To#mediawiki_originals_uploads  - https://alerts.wikimedia.org/?q=alertname%3DSwiftTooManyMediaUploads
[05:08:08] <icinga-wm>	 PROBLEM - MariaDB Replica IO: s3 on clouddb1017 is CRITICAL: CRITICAL slave_io_state Slave_IO_Running: No, Errno: 1236, Errmsg: Got fatal error 1236 from master when reading data from binary log: Error: connecting slave requested to start from GTID 171966471-171966471-66240, which is not in the masters binlog https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[05:17:58] <logmsgbot>	 !log ayounsi@cumin1001 START - Cookbook sre.network.peering with action 'configure' for AS: 62597
[05:22:23] <logmsgbot>	 !log ayounsi@cumin1001 END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 62597
[05:24:27] <wikibugs>	 (PS1) Muehlenhoff: Remove access for xihua [puppet] - https://gerrit.wikimedia.org/r/924339
[05:25:11] <wikibugs>	 (CR) CI reject: [V: -1] Remove access for xihua [puppet] - https://gerrit.wikimedia.org/r/924339 (owner: Muehlenhoff)
[05:25:35] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.idm.logout Logging Hxi-ctr out of all services on: 1255 hosts
[05:26:27] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Hxi-ctr out of all services on: 1255 hosts
[05:27:27] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.idm.logout Logging Hxi-ctr out of all services on: 784 hosts
[05:28:10] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Hxi-ctr out of all services on: 784 hosts
[05:29:18] <wikibugs>	 (PS2) Muehlenhoff: Remove access for xihua [puppet] - https://gerrit.wikimedia.org/r/924339
[05:31:05] <jinxer-wm>	 (SwiftTooManyMediaUploads) resolved: (2) Too many eqiad mediawiki originals uploads - https://wikitech.wikimedia.org/wiki/Swift/How_To#mediawiki_originals_uploads  - https://alerts.wikimedia.org/?q=alertname%3DSwiftTooManyMediaUploads
[05:33:03] <wikibugs>	 (CR) Muehlenhoff: [C: +2] Remove access for xihua [puppet] - https://gerrit.wikimedia.org/r/924339 (owner: Muehlenhoff)
[05:36:12] <wikibugs>	 (PS1) Muehlenhoff: Remove access for nray [puppet] - https://gerrit.wikimedia.org/r/924340
[05:39:03] <wikibugs>	 (CR) Muehlenhoff: [C: +2] Remove access for nray [puppet] - https://gerrit.wikimedia.org/r/924340 (owner: Muehlenhoff)
[05:40:18] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.idm.logout Logging Nray out of all services on: 784 hosts
[05:40:40] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Nray out of all services on: 784 hosts
[05:40:50] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.idm.logout Logging Nray out of all services on: 1255 hosts
[05:41:27] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Nray out of all services on: 1255 hosts
[05:42:59] <logmsgbot>	 !log ayounsi@cumin1001 START - Cookbook sre.network.peering with action 'configure' for AS: 62597
[05:43:01] <logmsgbot>	 !log ayounsi@cumin1001 END (ERROR) - Cookbook sre.network.peering (exit_code=97) with action 'configure' for AS: 62597
[05:59:00] <wikibugs>	 (PS1) Marostegui: Revert "db2110: Disable notifications" [puppet] - https://gerrit.wikimedia.org/r/924163
[05:59:12] <wikibugs>	 (CR) CI reject: [V: -1] Revert "db2110: Disable notifications" [puppet] - https://gerrit.wikimedia.org/r/924163 (owner: Marostegui)
[05:59:13] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db2110 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48623 and previous config saved to /var/cache/conftool/dbconfig/20230530-055913-root.json
[05:59:32] <wikibugs>	 (Abandoned) Marostegui: Revert "db2110: Disable notifications" [puppet] - https://gerrit.wikimedia.org/r/924163 (owner: Marostegui)
[05:59:54] <wikibugs>	 (PS3) Muehlenhoff: debmonitor::server: Add bookworm support [puppet] - https://gerrit.wikimedia.org/r/922145
[05:59:57] <wikibugs>	 (PS1) Marostegui: db2110: Enable notifications [puppet] - https://gerrit.wikimedia.org/r/924341
[06:02:57] <wikibugs>	 (CR) Muehlenhoff: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/922145 (owner: Muehlenhoff)
[06:04:55] <wikibugs>	 (CR) Marostegui: [C: +2] db2110: Enable notifications [puppet] - https://gerrit.wikimedia.org/r/924341 (owner: Marostegui)
[06:14:18] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db2110 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48624 and previous config saved to /var/cache/conftool/dbconfig/20230530-061417-root.json
[06:16:29] <wikibugs>	 (PS1) Vgutierrez: service: Disable monitors for wikireplicas [puppet] - https://gerrit.wikimedia.org/r/924342 (https://phabricator.wikimedia.org/T337446)
[06:20:11] <wikibugs>	 (CR) Marostegui: [C: +1] service: Disable monitors for wikireplicas [puppet] - https://gerrit.wikimedia.org/r/924342 (https://phabricator.wikimedia.org/T337446) (owner: Vgutierrez)
[06:20:49] <vgutierrez>	 marostegui: Wmflib::Service::Lvs sets the monitors key as mandatory
[06:21:04] <vgutierrez>	 so it won't work
[06:21:21] <marostegui>	 vgutierrez: Did arturo or dcaro come back to you yesterday?
[06:21:39] <vgutierrez>	 nope
[06:21:59] <marostegui>	 I am almost done with the current transfer, but there will be more coming
[06:29:12] <jinxer-wm>	 (SystemdUnitFailed) firing: wmf_auto_restart_prometheus-ipmi-exporter.service Failed on wdqs2021:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[06:29:23] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db2110 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48625 and previous config saved to /var/cache/conftool/dbconfig/20230530-062922-root.json
[06:33:58] <wikibugs>	 (PS1) Jelto: miscweb: set ipv4 and port for 15 and annual blackbox check [puppet] - https://gerrit.wikimedia.org/r/924345 (https://phabricator.wikimedia.org/T300171)
[06:34:18] <marostegui>	 vgutierrez: Until this is fixed I guess I will do the transfer in a different way so only one of the backends will be done instead of two
[06:38:51] <wikibugs>	 (CR) Jelto: [C: +2] miscweb: set ipv4 and port for 15 and annual blackbox check [puppet] - https://gerrit.wikimedia.org/r/924345 (https://phabricator.wikimedia.org/T300171) (owner: Jelto)
[06:40:54] <vgutierrez>	 marostegui: it's a weird scenario from pybal's point of view. Two servers defined as backend servers for 16 services
[06:41:25] <vgutierrez>	 so 32 monitors attempting to reconnect as fast as possible at the same time
[06:44:27] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db2110 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48628 and previous config saved to /var/cache/conftool/dbconfig/20230530-064427-root.json
[06:45:38] <wikibugs>	 (PS2) Vgutierrez: service: Disable monitors for wikireplicas [puppet] - https://gerrit.wikimedia.org/r/924342 (https://phabricator.wikimedia.org/T337446)
[06:47:55] <marostegui>	  vgutierrez ^ would that allow me to do what I did yesterday? (so stopping all backends) that'd help to recover things faster
[06:48:05] <logmsgbot>	 !log slyngshede@cumin1001 START - Cookbook sre.ganeti.makevm for new host testvm2006.codfw.wmnet
[06:48:06] <logmsgbot>	 !log slyngshede@cumin1001 START - Cookbook sre.dns.netbox
[06:48:15] <wikibugs>	 (CR) Muehlenhoff: [C: +2] debmonitor::server: Add bookworm support [puppet] - https://gerrit.wikimedia.org/r/922145 (owner: Muehlenhoff)
[06:49:44] <vgutierrez>	 marostegui: via puppet isn't feasible.. all our puppetization expects that services get some kind of monitoring/healthchecking
[06:50:06] <logmsgbot>	 !log slyngshede@cumin1001 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
[06:50:44] <vgutierrez>	 considering it isn't harming traffic I'd just ignore/ack the lvs alerts
[06:50:57] <vgutierrez>	 just let us (traffic) know when you start/finish please
[06:51:02] <vgutierrez>	 and sorry for the inconvenience
[06:51:07] <logmsgbot>	 !log slyngshede@cumin1001 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
[06:51:07] <logmsgbot>	 !log slyngshede@cumin1001 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[06:51:07] <logmsgbot>	 !log slyngshede@cumin1001 START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
[06:51:10] <logmsgbot>	 !log slyngshede@cumin1001 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
[06:52:45] <wikibugs>	 (CR) Ladsgroup: [C: +2] tables_to_check: drop revision_comment_temp [software] - https://gerrit.wikimedia.org/r/924122 (https://phabricator.wikimedia.org/T215466) (owner: Zabe)
[06:53:18] <wikibugs>	 (Merged) jenkins-bot: tables_to_check: drop revision_comment_temp [software] - https://gerrit.wikimedia.org/r/924122 (https://phabricator.wikimedia.org/T215466) (owner: Zabe)
[06:57:45] <logmsgbot>	 !log slyngshede@cumin1001 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
[06:58:47] <logmsgbot>	 !log slyngshede@cumin1001 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
[06:58:48] <logmsgbot>	 !log slyngshede@cumin1001 START - Cookbook sre.dns.netbox
[06:59:32] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db2110 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48629 and previous config saved to /var/cache/conftool/dbconfig/20230530-065932-root.json
[07:00:05] <jouncebot>	 Amir1, Urbanecm, and taavi: May I have your attention please! UTC morning backport window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20230530T0700)
[07:00:05] <jouncebot>	 Func and kart_: A patch you scheduled for UTC morning backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[07:00:14] * kart_ is here
[07:00:20] <Func>	 o/
[07:00:52] <Amir1>	 kart_: you can self-serve, right?
[07:01:04] <logmsgbot>	 !log slyngshede@cumin1001 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
[07:01:48] <kart_>	 Amir1: sure
[07:01:54] <kart_>	 Yes
[07:01:55] <Amir1>	 once done, ping me to do Func's patch
[07:02:00] <kart_>	 OK!
[07:02:08] <logmsgbot>	 !log slyngshede@cumin1001 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
[07:02:08] <logmsgbot>	 !log slyngshede@cumin1001 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[07:02:08] <logmsgbot>	 !log slyngshede@cumin1001 START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
[07:02:11] <logmsgbot>	 !log slyngshede@cumin1001 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
[07:02:11] <logmsgbot>	 !log slyngshede@cumin1001 END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=2) for new host testvm2006.codfw.wmnet
[07:02:36] <wikibugs>	 (CR) TrainBranchBot: [C: +2] "Approved by kartik@deploy1002 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/923527 (https://phabricator.wikimedia.org/T337366) (owner: KartikMistry)
[07:03:59] <logmsgbot>	 !log slyngshede@cumin1001 START - Cookbook sre.ganeti.makevm for new host testvm2006.codfw.wmnet
[07:04:00] <logmsgbot>	 !log slyngshede@cumin1001 START - Cookbook sre.dns.netbox
[07:04:01] <wikibugs>	 (Merged) jenkins-bot: Undeploy Special:Contribute from unsupported skins [mediawiki-config] - https://gerrit.wikimedia.org/r/923527 (https://phabricator.wikimedia.org/T337366) (owner: KartikMistry)
[07:04:47] <logmsgbot>	 !log kartik@deploy1002 Started scap: Backport for [[gerrit:923527|Undeploy Special:Contribute from unsupported skins (T337366)]]
[07:04:52] <stashbot>	 T337366: Tabs on the "Contribute" page not showing for some skins - https://phabricator.wikimedia.org/T337366
[07:05:49] <wikibugs>	 (CR) Vgutierrez: [C: +2] SRE: Add a new cookbook that allows to run puppet configuration while restarting Varnish [cookbooks] - https://gerrit.wikimedia.org/r/922844 (https://phabricator.wikimedia.org/T323557) (owner: Fabfur)
[07:06:07] <logmsgbot>	 !log slyngshede@cumin1001 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
[07:06:09] <RhinosF1>	 Not that it matters for any of these deploys, as none are beta-only, but beta scap is broken
[07:06:26] <logmsgbot>	 !log kartik@deploy1002 kartik: Backport for [[gerrit:923527|Undeploy Special:Contribute from unsupported skins (T337366)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
[07:07:10] <logmsgbot>	 !log slyngshede@cumin1001 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
[07:07:10] <logmsgbot>	 !log slyngshede@cumin1001 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[07:07:10] <logmsgbot>	 !log slyngshede@cumin1001 START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
[07:07:14] <logmsgbot>	 !log slyngshede@cumin1001 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
[07:07:16] <logmsgbot>	 !log slyngshede@cumin1001 START - Cookbook sre.dns.netbox
[07:09:14] <logmsgbot>	 !log slyngshede@cumin1001 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
[07:10:19] <logmsgbot>	 !log slyngshede@cumin1001 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
[07:10:19] <logmsgbot>	 !log slyngshede@cumin1001 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[07:10:19] <logmsgbot>	 !log slyngshede@cumin1001 START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
[07:10:22] <logmsgbot>	 !log slyngshede@cumin1001 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
[07:10:29] <logmsgbot>	 !log slyngshede@cumin1001 END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host testvm2006.codfw.wmnet
[07:10:47] <logmsgbot>	 !log slyngshede@cumin1001 START - Cookbook sre.ganeti.makevm for new host testvm2006.codfw.wmnet
[07:10:48] <logmsgbot>	 !log slyngshede@cumin1001 START - Cookbook sre.dns.netbox
[07:11:42] <kart_>	 RhinosF1: noted. Thanks!
[07:12:56] <wikibugs>	 (PS2) KartikMistry: testwiki: Enable Section Translation for 9 Wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/924050 (https://phabricator.wikimedia.org/T337290)
[07:14:37] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db2110 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48630 and previous config saved to /var/cache/conftool/dbconfig/20230530-071436-root.json
[07:15:20] <wikibugs>	 SRE, ops-codfw, DBA: db2110 crashed - https://phabricator.wikimedia.org/T337445 (Marostegui) Open→Resolved The host is repooled. Thanks for your help!
[07:16:00] <wikibugs>	 (CR) Filippo Giunchedi: "FWIW cadvisor can run no problem on VMs, sorry for the breakage though!" [puppet] - https://gerrit.wikimedia.org/r/924106 (https://phabricator.wikimedia.org/T108027) (owner: Andrew Bogott)
[07:16:16] <logmsgbot>	 !log slyngshede@cumin1001 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
[07:16:33] <moritzm>	 !log update bookworm installer to rc4 T330495
[07:16:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:16:37] <stashbot>	 T330495: Prepare our custom installer for Bookworm - https://phabricator.wikimedia.org/T330495
[07:16:37] <logmsgbot>	 !log kartik@deploy1002 Finished scap: Backport for [[gerrit:923527|Undeploy Special:Contribute from unsupported skins (T337366)]] (duration: 11m 49s)
[07:16:41] <stashbot>	 T337366: Tabs on the "Contribute" page not showing for some skins - https://phabricator.wikimedia.org/T337366
[07:17:22] <logmsgbot>	 !log slyngshede@cumin1001 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
[07:17:22] <logmsgbot>	 !log slyngshede@cumin1001 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[07:17:22] <logmsgbot>	 !log slyngshede@cumin1001 START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
[07:17:26] <logmsgbot>	 !log slyngshede@cumin1001 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
[07:18:21] <wikibugs>	 (CR) TrainBranchBot: [C: +2] "Approved by kartik@deploy1002 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/924050 (https://phabricator.wikimedia.org/T337290) (owner: KartikMistry)
[07:19:07] <wikibugs>	 (Merged) jenkins-bot: testwiki: Enable Section Translation for 9 Wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/924050 (https://phabricator.wikimedia.org/T337290) (owner: KartikMistry)
[07:19:36] <logmsgbot>	 !log kartik@deploy1002 Started scap: Backport for [[gerrit:924050|testwiki: Enable Section Translation for 9 Wikipedia (T337290)]]
[07:19:41] <stashbot>	 T337290: Enable MinT, Content and Section Translation for 10 languages previously lacking machine translation - https://phabricator.wikimedia.org/T337290
[07:21:08] <logmsgbot>	 !log kartik@deploy1002 kartik: Backport for [[gerrit:924050|testwiki: Enable Section Translation for 9 Wikipedia (T337290)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
[07:23:01] <tgr_>	 I added a bunch of commits that were originally scheduled for the UTC afternoon window. Please ping me when the original commits are done.
[07:27:23] <logmsgbot>	 !log slyngshede@cumin1001 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
[07:28:29] <logmsgbot>	 !log slyngshede@cumin1001 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
[07:28:30] <logmsgbot>	 !log slyngshede@cumin1001 START - Cookbook sre.dns.netbox
[07:29:15] <logmsgbot>	 !log kartik@deploy1002 Finished scap: Backport for [[gerrit:924050|testwiki: Enable Section Translation for 9 Wikipedia (T337290)]] (duration: 09m 38s)
[07:29:20] <stashbot>	 T337290: Enable MinT, Content and Section Translation for 10 languages previously lacking machine translation - https://phabricator.wikimedia.org/T337290
[07:29:20] <kart_>	 Amir1: I'm done with my 2 config deployments.
[07:29:34] <Amir1>	 cool
[07:29:41] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db2110 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48632 and previous config saved to /var/cache/conftool/dbconfig/20230530-072941-root.json
[07:29:54] <wikibugs>	 (CR) Ladsgroup: [C: +1] "LGTM but someone from data engineering should do the deployment." [puppet] - https://gerrit.wikimedia.org/r/923545 (https://phabricator.wikimedia.org/T275246) (owner: Zabe)
[07:30:29] <wikibugs>	 (CR) Ladsgroup: [C: +2] Revert "Rename wgPageContentLanguage to wgPageViewLanguage" partially [core] (wmf/1.41.0-wmf.10) - https://gerrit.wikimedia.org/r/924086 (https://phabricator.wikimedia.org/T337634) (owner: Func)
[07:30:30] <logmsgbot>	 !log slyngshede@cumin1001 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
[07:30:50] <moritzm>	 !log move LDAP permissions for hghani from cn=nda to cn=wmf T322145
[07:30:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:31:06] <wikibugs>	 (PS1) Gergő Tisza: Section images: Accept more recommendation types [extensions/GrowthExperiments] (wmf/1.41.0-wmf.10) - https://gerrit.wikimedia.org/r/924356
[07:31:33] <wikibugs>	 (CR) TrainBranchBot: [C: +2] "Approved by ladsgroup@deploy1002 using scap backport" [core] (wmf/1.41.0-wmf.10) - https://gerrit.wikimedia.org/r/924086 (https://phabricator.wikimedia.org/T337634) (owner: Func)
[07:31:34] <logmsgbot>	 !log slyngshede@cumin1001 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
[07:31:34] <logmsgbot>	 !log slyngshede@cumin1001 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[07:31:34] <logmsgbot>	 !log slyngshede@cumin1001 START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
[07:31:37] <logmsgbot>	 !log slyngshede@cumin1001 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
[07:31:37] <logmsgbot>	 !log slyngshede@cumin1001 END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=2) for new host testvm2006.codfw.wmnet
[07:32:30] <wikibugs>	 (CR) Gergő Tisza: [C: +2] Section images: Accept more recommendation types [extensions/GrowthExperiments] (wmf/1.41.0-wmf.10) - https://gerrit.wikimedia.org/r/924356 (owner: Gergő Tisza)
[07:38:40] <logmsgbot>	 !log slyngshede@cumin1001 START - Cookbook sre.ganeti.makevm for new host testvm2006.codfw.wmnet
[07:38:42] <logmsgbot>	 !log slyngshede@cumin1001 START - Cookbook sre.dns.netbox
[07:40:35] <logmsgbot>	 !log slyngshede@cumin1001 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
[07:41:43] <logmsgbot>	 !log slyngshede@cumin1001 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
[07:41:43] <logmsgbot>	 !log slyngshede@cumin1001 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[07:41:43] <logmsgbot>	 !log slyngshede@cumin1001 START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
[07:41:46] <logmsgbot>	 !log slyngshede@cumin1001 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
[07:42:08] <logmsgbot>	 !log slyngshede@cumin1001 START - Cookbook sre.dns.netbox
[07:44:03] <logmsgbot>	 !log slyngshede@cumin1001 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
[07:44:46] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db2110 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48633 and previous config saved to /var/cache/conftool/dbconfig/20230530-074445-root.json
[07:45:08] <logmsgbot>	 !log slyngshede@cumin1001 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
[07:45:08] <logmsgbot>	 !log slyngshede@cumin1001 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[07:45:08] <logmsgbot>	 !log slyngshede@cumin1001 START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
[07:45:11] <logmsgbot>	 !log slyngshede@cumin1001 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
[07:45:11] <logmsgbot>	 !log slyngshede@cumin1001 END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host testvm2006.codfw.wmnet
[07:46:25] <wikibugs>	 (Merged) jenkins-bot: Revert "Rename wgPageContentLanguage to wgPageViewLanguage" partially [core] (wmf/1.41.0-wmf.10) - https://gerrit.wikimedia.org/r/924086 (https://phabricator.wikimedia.org/T337634) (owner: Func)
[07:46:49] <logmsgbot>	 !log ladsgroup@deploy1002 Started scap: Backport for [[gerrit:924086|Revert "Rename wgPageContentLanguage to wgPageViewLanguage" partially (T337634)]]
[07:46:53] <stashbot>	 T337634: Sorting broken for anonymous users - TypeError: Cannot read properties of undefined (reading 'type') / TypeError: Language ID should be string or object. / TypeError: undefined is not an object (evaluating 'cachedParsers[sortList[i][0]].type') /  TypeError: locale value must be a string or object - https://phabricator.wikimedia.org/T337634
[07:48:13] <logmsgbot>	 !log ladsgroup@deploy1002 func and ladsgroup: Backport for [[gerrit:924086|Revert "Rename wgPageContentLanguage to wgPageViewLanguage" partially (T337634)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
[07:48:24] <Amir1>	 Func: it's live in mwdebug
[07:48:29] <Func>	 testing
[07:49:21] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.reimage for host puppetdb2003.codfw.wmnet with OS bookworm
[07:49:32] <wikibugs>	 SRE, Infrastructure-Foundations, Patch-For-Review: Setup an initial bookworm host pair with Puppetdb 7 - https://phabricator.wikimedia.org/T321783 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jmm@cumin2002 for host puppetdb2003.codfw.wmnet with OS bookworm
[07:50:28] <Func>	 Amir1: Good to go
[07:50:37] <Amir1>	 awesome
[07:51:35] <wikibugs>	 (Merged) jenkins-bot: Section images: Accept more recommendation types [extensions/GrowthExperiments] (wmf/1.41.0-wmf.10) - https://gerrit.wikimedia.org/r/924356 (owner: Gergő Tisza)
[07:52:21] <wikibugs>	 (CR) Ladsgroup: Switch VisualEditor to not use RESTbase on small and medium wikis (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/923650 (https://phabricator.wikimedia.org/T320529) (owner: Daniel Kinzler)
[07:56:06] <logmsgbot>	 !log ladsgroup@deploy1002 Finished scap: Backport for [[gerrit:924086|Revert "Rename wgPageContentLanguage to wgPageViewLanguage" partially (T337634)]] (duration: 09m 17s)
[07:56:11] <stashbot>	 T337634: Sorting broken for anonymous users - TypeError: Cannot read properties of undefined (reading 'type') / TypeError: Language ID should be string or object. / TypeError: undefined is not an object (evaluating 'cachedParsers[sortList[i][0]].type') /  TypeError: locale value must be a string or object - https://phabricator.wikimedia.org/T337634
[07:56:32] <jinxer-wm>	 (JobUnavailable) firing: Reduced availability for job jmx_puppetdb in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[07:56:34] <jinxer-wm>	 (KubernetesAPILatency) firing: High Kubernetes API latency (POST pods) on k8s@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/000000435?var-site=codfw&var-cluster=k8s - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[07:57:53] <tgr_>	 Amir1: all done?
[07:57:59] <icinga-wm>	 PROBLEM - SSH on wdqs2021 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/SSH/monitoring
[07:58:05] <Amir1>	 yup
[07:58:07] <Amir1>	 sorry
[07:58:34] <tgr_>	 thanks! I'll backport a few more things
[07:59:59] <wikibugs>	 (PS5) D3r1ck01: Switch VisualEditor to not use RESTbase on small and medium wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/923650 (https://phabricator.wikimedia.org/T320529) (owner: Daniel Kinzler)
[08:00:09] <logmsgbot>	 !log tgr@deploy1002 Started scap: Backport for [[gerrit:924356|Section images: Accept more recommendation types]]
[08:00:53] <wikibugs>	 (CR) D3r1ck01: [C: +2] Switch VisualEditor to not use RESTbase on small and medium wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/923650 (https://phabricator.wikimedia.org/T320529) (owner: Daniel Kinzler)
[08:01:33] <logmsgbot>	 !log tgr@deploy1002 tgr: Backport for [[gerrit:924356|Section images: Accept more recommendation types]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
[08:01:34] <jinxer-wm>	 (KubernetesAPILatency) resolved: High Kubernetes API latency (POST pods) on k8s@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/000000435?var-site=codfw&var-cluster=k8s - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[08:01:50] <wikibugs>	 (Merged) jenkins-bot: Switch VisualEditor to not use RESTbase on small and medium wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/923650 (https://phabricator.wikimedia.org/T320529) (owner: Daniel Kinzler)
[08:03:02] <wikibugs>	 (PS1) Ladsgroup: Revert "Switch VisualEditor to not use RESTbase on small and medium wikis" [mediawiki-config] - https://gerrit.wikimedia.org/r/924357
[08:03:10] <wikibugs>	 (CR) Filippo Giunchedi: [C: +2] prometheus: remove 'global' instance references [puppet] - https://gerrit.wikimedia.org/r/921350 (https://phabricator.wikimedia.org/T288196) (owner: Filippo Giunchedi)
[08:03:16] <wikibugs>	 (CR) Gergő Tisza: [C: +2] Improve logging of invalid image recommendation kinds [extensions/GrowthExperiments] (wmf/1.41.0-wmf.10) - https://gerrit.wikimedia.org/r/923643 (owner: Gergő Tisza)
[08:03:20] <wikibugs>	 (CR) Filippo Giunchedi: [C: +2] "PCC https://puppet-compiler.wmflabs.org/output/921350/41393/" [puppet] - https://gerrit.wikimedia.org/r/921350 (https://phabricator.wikimedia.org/T288196) (owner: Filippo Giunchedi)
[08:03:20] <wikibugs>	 (PS3) Filippo Giunchedi: prometheus: remove 'global' instance references [puppet] - https://gerrit.wikimedia.org/r/921350 (https://phabricator.wikimedia.org/T288196)
[08:04:05] <wikibugs>	 (CR) Ladsgroup: [C: +2] "Please don't +2 patches in config like that. Follow https://wikitech.wikimedia.org/wiki/Backport_windows/Deployers" [mediawiki-config] - https://gerrit.wikimedia.org/r/924357 (owner: Ladsgroup)
[08:04:31] <Amir1>	 tgr_: Someone +2'ed a config patch without the intention of deploying. I'm reverting it.
[08:04:53] <wikibugs>	 (Merged) jenkins-bot: Revert "Switch VisualEditor to not use RESTbase on small and medium wikis" [mediawiki-config] - https://gerrit.wikimedia.org/r/924357 (owner: Ladsgroup)
[08:05:30] <tgr_>	 thanks, didn't notice that.
[08:06:10] <xSavitar>	 Amir1, tgr_: It was me, sorry wrong button. I hit rebase then hit +2 mistakenly. Sorry. Thanks Amir1 for the revert.
[08:08:01] <logmsgbot>	 !log tgr@deploy1002 Finished scap: Backport for [[gerrit:924356|Section images: Accept more recommendation types]] (duration: 07m 51s)
[08:08:13] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on puppetdb2003.codfw.wmnet with reason: host reimage
[08:09:58] <wikibugs>	 (PS1) D3r1ck01: Revert "Revert "Switch VisualEditor to not use RESTbase on small and medium wikis"" [mediawiki-config] - https://gerrit.wikimedia.org/r/924358
[08:10:57] <wikibugs>	 (Abandoned) Jelto: service::catalog add miscweb 15 and annual to service catalog [puppet] - https://gerrit.wikimedia.org/r/923263 (https://phabricator.wikimedia.org/T300171) (owner: Jelto)
[08:11:03] <wikibugs>	 (CR) Muehlenhoff: [C: +1] "Looks good" [puppet] - https://gerrit.wikimedia.org/r/923620 (https://phabricator.wikimedia.org/T279683) (owner: Jbond)
[08:11:31] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on puppetdb2003.codfw.wmnet with reason: host reimage
[08:12:48] <logmsgbot>	 !log slyngshede@cumin1001 START - Cookbook sre.ganeti.makevm for new host testvm2006.codfw.wmnet
[08:12:49] <logmsgbot>	 !log slyngshede@cumin1001 START - Cookbook sre.dns.netbox
[08:14:12] <jinxer-wm>	 (SystemdUnitFailed) resolved: wmf_auto_restart_prometheus-ipmi-exporter.service Failed on wdqs2021:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[08:14:45] <logmsgbot>	 !log slyngshede@cumin1001 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
[08:15:26] <wikibugs>	 (Abandoned) Vgutierrez: service: Disable monitors for wikireplicas [puppet] - https://gerrit.wikimedia.org/r/924342 (https://phabricator.wikimedia.org/T337446) (owner: Vgutierrez)
[08:15:49] <logmsgbot>	 !log slyngshede@cumin1001 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
[08:15:49] <logmsgbot>	 !log slyngshede@cumin1001 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[08:15:50] <logmsgbot>	 !log slyngshede@cumin1001 START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
[08:15:53] <logmsgbot>	 !log slyngshede@cumin1001 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
[08:19:06] <wikibugs>	 SRE, API Platform, Anti-Harassment, Cloud-Services, and 19 others: Migrate PipelineLib repos to GitLab - https://phabricator.wikimedia.org/T332953 (kostajh)
[08:19:55] <icinga-wm>	 PROBLEM - Check systemd state on wdqs2021 is CRITICAL: CRITICAL - degraded: The following units failed: systemd-timedated.service,wmf_auto_restart_prometheus-ipmi-exporter.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[08:20:04] <jayme>	 !log disable puppet on P:kubernetes::node (apart from staging-codfw) for https://gerrit.wikimedia.org/r/c/operations/puppet/+/909687
[08:20:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:20:52] <wikibugs>	 (Merged) jenkins-bot: Improve logging of invalid image recommendation kinds [extensions/GrowthExperiments] (wmf/1.41.0-wmf.10) - https://gerrit.wikimedia.org/r/923643 (owner: Gergő Tisza)
[08:21:23] <wikibugs>	 (CR) JMeybohm: [V: +1 C: +2] Make kubernetes::clusters the central place for k8s config [puppet] - https://gerrit.wikimedia.org/r/909687 (https://phabricator.wikimedia.org/T325268) (owner: JMeybohm)
[08:21:32] <jinxer-wm>	 (JobUnavailable) resolved: Reduced availability for job jmx_puppetdb in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[08:21:42] <jinxer-wm>	 (SystemdUnitFailed) firing: (2) systemd-timedated.service Failed on wdqs2021:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[08:23:37] <wikibugs>	 (PS1) Gergő Tisza: Improve handling of missing image recommendation [extensions/GrowthExperiments] (wmf/1.41.0-wmf.10) - https://gerrit.wikimedia.org/r/924361
[08:25:11] <wikibugs>	 (PS6) JMeybohm: Remove profile::kubernetes::deployment_server from role::releases [puppet] - https://gerrit.wikimedia.org/r/912785 (https://phabricator.wikimedia.org/T288629)
[08:25:50] <logmsgbot>	 !log slyngshede@cumin1001 START - Cookbook sre.dns.netbox
[08:26:27] <wikibugs>	 (CR) JMeybohm: [C: +2] Remove profile::kubernetes::deployment_server from role::releases [puppet] - https://gerrit.wikimedia.org/r/912785 (https://phabricator.wikimedia.org/T288629) (owner: JMeybohm)
[08:26:32] <jinxer-wm>	 (JobUnavailable) firing: Reduced availability for job jmx_puppetdb in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[08:27:18] <zabe>	 tgr_: could you ping me when you are done deploying?
[08:27:30] <tgr_>	 will do
[08:27:39] <jayme>	 !log re-enable puppet on P:kubernetes::node for https://gerrit.wikimedia.org/r/c/operations/puppet/+/909687
[08:27:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:27:52] <logmsgbot>	 !log slyngshede@cumin1001 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
[08:28:56] <logmsgbot>	 !log slyngshede@cumin1001 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
[08:28:57] <logmsgbot>	 !log slyngshede@cumin1001 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[08:28:57] <logmsgbot>	 !log slyngshede@cumin1001 START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
[08:28:57] <logmsgbot>	 !log tgr@deploy1002 Started scap: Backport for [[gerrit:923643|Improve logging of invalid image recommendation kinds]]
[08:28:59] <wikibugs>	 (CR) Gergő Tisza: [C: +2] Section images: Do not treat unexpected kinds as production errors [extensions/GrowthExperiments] (wmf/1.41.0-wmf.10) - https://gerrit.wikimedia.org/r/923644 (owner: Gergő Tisza)
[08:29:00] <logmsgbot>	 !log slyngshede@cumin1001 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
[08:29:00] <logmsgbot>	 !log slyngshede@cumin1001 END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host testvm2006.codfw.wmnet
[08:29:35] <urbanecm>	 zabe: do you want to backport T337599, or some other thing?
[08:29:35] <stashbot>	 T337599: Running a "get edits" check on user with no edits gives fatal exception - https://phabricator.wikimedia.org/T337599
[08:30:10] <zabe>	 urbanecm: that and https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/922492
[08:30:23] <urbanecm>	 okay. in that case, no need for me to queue :). thanks!
[08:30:24] <logmsgbot>	 !log tgr@deploy1002 tgr: Backport for [[gerrit:923643|Improve logging of invalid image recommendation kinds]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
[08:30:29] <wikibugs>	 (PS16) JMeybohm: deployment_server: Create k8s configs with pki certs [puppet] - https://gerrit.wikimedia.org/r/904500 (https://phabricator.wikimedia.org/T325268)
[08:30:31] <wikibugs>	 (CR) Jbond: [C: +2] firewall: drop block_abuse_nets parameter [puppet] - https://gerrit.wikimedia.org/r/923620 (https://phabricator.wikimedia.org/T279683) (owner: Jbond)
[08:31:50] <logmsgbot>	 !log slyngshede@cumin1001 START - Cookbook sre.ganeti.makevm for new host testvm2006.codfw.wmnet
[08:31:52] <logmsgbot>	 !log slyngshede@cumin1001 START - Cookbook sre.dns.netbox
[08:32:40] <wikibugs>	 (CR) Jbond: [C: +1] "lgtm" [puppet] - https://gerrit.wikimedia.org/r/924059 (owner: Volans)
[08:33:48] <logmsgbot>	 !log slyngshede@cumin1001 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
[08:34:33] <wikibugs>	 SRE, Observability-Metrics, User-fgiunchedi: Extend router ACLs to block 4194/tcp on LVSes - https://phabricator.wikimedia.org/T337689 (ayounsi) a:fgiunchedi What I pushed is an extra safeguard, but a more viable fix is to have the daemon listen on the host's primary IP (like all the other simila...
[08:34:53] <logmsgbot>	 !log slyngshede@cumin1001 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
[08:34:53] <logmsgbot>	 !log slyngshede@cumin1001 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[08:34:53] <logmsgbot>	 !log slyngshede@cumin1001 START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
[08:34:56] <logmsgbot>	 !log slyngshede@cumin1001 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
[08:35:18] <logmsgbot>	 !log slyngshede@cumin1001 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
[08:36:19] <logmsgbot>	 !log slyngshede@cumin1001 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
[08:36:20] <logmsgbot>	 !log slyngshede@cumin1001 START - Cookbook sre.dns.netbox
[08:38:21] <logmsgbot>	 !log slyngshede@cumin1001 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
[08:39:22] <logmsgbot>	 !log slyngshede@cumin1001 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
[08:39:22] <logmsgbot>	 !log slyngshede@cumin1001 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[08:39:22] <logmsgbot>	 !log slyngshede@cumin1001 START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
[08:39:26] <logmsgbot>	 !log slyngshede@cumin1001 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
[08:39:26] <logmsgbot>	 !log slyngshede@cumin1001 END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=2) for new host testvm2006.codfw.wmnet
[08:39:28] <logmsgbot>	 !log tgr@deploy1002 Finished scap: Backport for [[gerrit:923643|Improve logging of invalid image recommendation kinds]] (duration: 10m 30s)
[08:39:34] <jinxer-wm>	 (KubernetesAPILatency) firing: High Kubernetes API latency (POST pods) on k8s@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/000000435?var-site=codfw&var-cluster=k8s - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[08:40:11] <godog>	 XioNoX: thanks for the update re: T337689 that's indeed a better fix, I'll look into that
[08:40:11] <stashbot>	 T337689: Extend router ACLs to block 4194/tcp on LVSes - https://phabricator.wikimedia.org/T337689
[08:40:33] <XioNoX>	 no pb! let me know if I can help
[08:41:11] <logmsgbot>	 !log slyngshede@cumin1001 START - Cookbook sre.ganeti.makevm for new host testvm2006.codfw.wmnet
[08:41:12] <logmsgbot>	 !log slyngshede@cumin1001 START - Cookbook sre.dns.netbox
[08:41:42] <jinxer-wm>	 (SystemdUnitFailed) resolved: (2) systemd-timedated.service Failed on wdqs2021:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[08:43:05] <logmsgbot>	 !log slyngshede@cumin1001 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
[08:43:14] <wikibugs>	 (PS1) Volans: .gitmessage: add Hosts: line [puppet] - https://gerrit.wikimedia.org/r/924438
[08:43:42] <wikibugs>	 (PS2) Jbond: ganeti. add GanetiRAPI.nodes and GanetiRAPI.groups [software/spicerack] - https://gerrit.wikimedia.org/r/924081
[08:43:44] <wikibugs>	 (CR) Jbond: ganeti. add GanetiRAPI.nodes and GanetiRAPI.groups (3 comments) [software/spicerack] - https://gerrit.wikimedia.org/r/924081 (owner: Jbond)
[08:43:48] <wikibugs>	 (CR) Volans: [C: +2] spicerack: add test-cookbook script [puppet] - https://gerrit.wikimedia.org/r/924059 (owner: Volans)
[08:44:05] <logmsgbot>	 !log slyngshede@cumin1001 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
[08:44:05] <logmsgbot>	 !log slyngshede@cumin1001 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[08:44:05] <logmsgbot>	 !log slyngshede@cumin1001 START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
[08:44:08] <logmsgbot>	 !log slyngshede@cumin1001 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
[08:44:11] <logmsgbot>	 !log slyngshede@cumin1001 START - Cookbook sre.dns.netbox
[08:44:34] <jinxer-wm>	 (KubernetesAPILatency) resolved: High Kubernetes API latency (POST pods) on k8s@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/000000435?var-site=codfw&var-cluster=k8s - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[08:45:12] <wikibugs>	 (CR) Hoo man: [C: +1] install_console: restrict options used [puppet] - https://gerrit.wikimedia.org/r/922559 (https://phabricator.wikimedia.org/T117348) (owner: Jbond)
[08:45:45] <wikibugs>	 (CR) Jbond: "thanks all" [puppet] - https://gerrit.wikimedia.org/r/922559 (https://phabricator.wikimedia.org/T117348) (owner: Jbond)
[08:45:47] <wikibugs>	 (CR) Jbond: [C: +2] install_console: restrict options used [puppet] - https://gerrit.wikimedia.org/r/922559 (https://phabricator.wikimedia.org/T117348) (owner: Jbond)
[08:48:01] <wikibugs>	 (PS1) Slyngshede: WMF signup message, stray " [software/bitu] - https://gerrit.wikimedia.org/r/924439
[08:48:42] <logmsgbot>	 !log slyngshede@cumin1001 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
[08:49:42] <logmsgbot>	 !log slyngshede@cumin1001 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
[08:49:42] <logmsgbot>	 !log slyngshede@cumin1001 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[08:49:42] <logmsgbot>	 !log slyngshede@cumin1001 START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
[08:49:45] <logmsgbot>	 !log slyngshede@cumin1001 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
[08:49:52] <logmsgbot>	 !log slyngshede@cumin1001 END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host testvm2006.codfw.wmnet
[08:50:33] <logmsgbot>	 !log slyngshede@cumin1001 START - Cookbook sre.ganeti.makevm for new host testvm2006.codfw.wmnet
[08:50:34] <wikibugs>	 (CR) Volans: [C: +1] "LGTM, thx" [software/spicerack] - https://gerrit.wikimedia.org/r/924081 (owner: Jbond)
[08:50:35] <logmsgbot>	 !log slyngshede@cumin1001 START - Cookbook sre.dns.netbox
[08:50:36] <wikibugs>	 (CR) Muehlenhoff: ganeti. add GanetiRAPI.nodes and GanetiRAPI.groups (1 comment) [software/spicerack] - https://gerrit.wikimedia.org/r/924081 (owner: Jbond)
[08:51:09] <wikibugs>	 (Merged) jenkins-bot: Section images: Do not treat unexpected kinds as production errors [extensions/GrowthExperiments] (wmf/1.41.0-wmf.10) - https://gerrit.wikimedia.org/r/923644 (owner: Gergő Tisza)
[08:51:45] <wikibugs>	 (CR) Gergő Tisza: [C: +2] Improve handling of missing image recommendation [extensions/GrowthExperiments] (wmf/1.41.0-wmf.10) - https://gerrit.wikimedia.org/r/924361 (owner: Gergő Tisza)
[08:51:49] <logmsgbot>	 !log tgr@deploy1002 Started scap: Backport for [[gerrit:923644|Section images: Do not treat unexpected kinds as production errors]]
[08:52:36] <logmsgbot>	 !log slyngshede@cumin1001 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
[08:53:14] <logmsgbot>	 !log tgr@deploy1002 tgr: Backport for [[gerrit:923644|Section images: Do not treat unexpected kinds as production errors]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
[08:53:42] <logmsgbot>	 !log slyngshede@cumin1001 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
[08:53:42] <logmsgbot>	 !log slyngshede@cumin1001 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[08:53:42] <logmsgbot>	 !log slyngshede@cumin1001 START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
[08:53:45] <logmsgbot>	 !log slyngshede@cumin1001 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
[08:54:10] <logmsgbot>	 !log slyngshede@cumin1001 START - Cookbook sre.dns.netbox
[08:54:49] <wikibugs>	 (PS3) Jbond: ganeti. add GanetiRAPI.nodes and GanetiRAPI.groups [software/spicerack] - https://gerrit.wikimedia.org/r/924081
[08:55:02] <wikibugs>	 (CR) Jbond: ganeti. add GanetiRAPI.nodes and GanetiRAPI.groups (1 comment) [software/spicerack] - https://gerrit.wikimedia.org/r/924081 (owner: Jbond)
[08:55:47] <wikibugs>	 (CR) Muehlenhoff: [C: +1] "Looks good" [software/spicerack] - https://gerrit.wikimedia.org/r/924081 (owner: Jbond)
[08:59:01] <logmsgbot>	 !log slyngshede@cumin1001 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
[09:00:03] <logmsgbot>	 !log slyngshede@cumin1001 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
[09:00:04] <logmsgbot>	 !log slyngshede@cumin1001 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[09:00:04] <logmsgbot>	 !log slyngshede@cumin1001 START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
[09:00:07] <logmsgbot>	 !log slyngshede@cumin1001 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
[09:00:07] <logmsgbot>	 !log slyngshede@cumin1001 END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host testvm2006.codfw.wmnet
[09:01:09] <icinga-wm>	 PROBLEM - Check systemd state on wdqs2021 is CRITICAL: CRITICAL - degraded: The following units failed: systemd-timedated.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[09:01:32] <jinxer-wm>	 (JobUnavailable) resolved: Reduced availability for job jmx_puppetdb in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[09:02:42] <jinxer-wm>	 (SystemdUnitFailed) firing: systemd-timedated.service Failed on wdqs2021:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[09:04:55] <wikibugs>	 (PS4) Jbond: ganeti. add GanetiRAPI.nodes and GanetiRAPI.groups [software/spicerack] - https://gerrit.wikimedia.org/r/924081
[09:05:00] <wikibugs>	 (CR) Jbond: [C: +2] ganeti. add GanetiRAPI.nodes and GanetiRAPI.groups [software/spicerack] - https://gerrit.wikimedia.org/r/924081 (owner: Jbond)
[09:05:32] <jinxer-wm>	 (JobUnavailable) firing: Reduced availability for job jmx_puppetdb in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[09:06:12] <logmsgbot>	 !log tgr@deploy1002 Finished scap: Backport for [[gerrit:923644|Section images: Do not treat unexpected kinds as production errors]] (duration: 14m 22s)
[09:09:20] <wikibugs>	 (Merged) jenkins-bot: ganeti. add GanetiRAPI.nodes and GanetiRAPI.groups [software/spicerack] - https://gerrit.wikimedia.org/r/924081 (owner: Jbond)
[09:11:12] <logmsgbot>	 !log slyngshede@cumin1001 START - Cookbook sre.ganeti.makevm for new host testvm2006.codfw.wmnet
[09:11:18] <logmsgbot>	 !log slyngshede@cumin1001 START - Cookbook sre.dns.netbox
[09:11:55] <wikibugs>	 (PS1) Fabfur: cache::upload: Add hieradata to switch HTTPS redirection from Varnish to HAProxy [puppet] - https://gerrit.wikimedia.org/r/924444 (https://phabricator.wikimedia.org/T323557)
[09:12:42] <jinxer-wm>	 (SystemdUnitFailed) resolved: systemd-timedated.service Failed on wdqs2021:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[09:13:12] <wikibugs>	 (Merged) jenkins-bot: Improve handling of missing image recommendation [extensions/GrowthExperiments] (wmf/1.41.0-wmf.10) - https://gerrit.wikimedia.org/r/924361 (owner: Gergő Tisza)
[09:13:28] <logmsgbot>	 !log slyngshede@cumin1001 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
[09:14:17] <wikibugs>	 (Abandoned) Clément Goubert: testwikidatawiki: Fix missing mobile redir to k8s [puppet] - https://gerrit.wikimedia.org/r/923384 (https://phabricator.wikimedia.org/T337490) (owner: Clément Goubert)
[09:14:35] <logmsgbot>	 !log slyngshede@cumin1001 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
[09:14:35] <logmsgbot>	 !log slyngshede@cumin1001 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[09:14:35] <logmsgbot>	 !log slyngshede@cumin1001 START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
[09:14:38] <logmsgbot>	 !log slyngshede@cumin1001 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
[09:15:25] <logmsgbot>	 !log tgr@deploy1002 Started scap: Backport for [[gerrit:924361|Improve handling of missing image recommendation]]
[09:16:05] <wikibugs>	 (PS8) Jelto: Gitlab: Support OIDC alongside CAS for OmniAuth in Gitlab [puppet] - https://gerrit.wikimedia.org/r/916509 (https://phabricator.wikimedia.org/T320390) (owner: Jbond)
[09:17:16] <logmsgbot>	 !log tgr@deploy1002 tgr: Backport for [[gerrit:924361|Improve handling of missing image recommendation]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
[09:19:15] <wikibugs>	 (CR) Clément Goubert: [C: +1] trafficserver: also match mobile domains in mw-on-k8s (1 comment) [puppet] - https://gerrit.wikimedia.org/r/924080 (owner: Giuseppe Lavagetto)
[09:19:25] <wikibugs>	 (CR) Fabfur: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/924444 (https://phabricator.wikimedia.org/T323557) (owner: Fabfur)
[09:19:50] <arturo>	 !log run aborrero@cumin1001:~ 2s 98 $ sudo cumin "P{R:Profile::Mariadb::Section = 's7'} and P{P:wmcs::db::wikireplicas::mariadb_multiinstance}" "/usr/local/sbin/maintain-meta_p --all-databases --bootstrap"
[09:19:50] <arturo>	  (T337446)
[09:19:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:19:54] <stashbot>	 T337446: Rebuild sanitarium hosts - https://phabricator.wikimedia.org/T337446
[09:20:32] <jinxer-wm>	 (JobUnavailable) resolved: Reduced availability for job jmx_puppetdb in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[09:20:45] <logmsgbot>	 !log jbond@cumin1001 START - Cookbook sre.hosts.reimage for host puppetboard1003.eqiad.wmnet with OS bookworm
[09:21:56] <wikibugs>	 (PS3) Jelto: gitlab: sync all configured providers [puppet] - https://gerrit.wikimedia.org/r/916522 (https://phabricator.wikimedia.org/T320390) (owner: Jbond)
[09:22:14] <logmsgbot>	 !log jbond@cumin2002 START - Cookbook sre.hosts.reimage for host puppetboard2003.codfw.wmnet with OS bookworm
[09:24:03] <wikibugs>	 (PS5) Clément Goubert: mw-on-k8s: Redirect www.mediawiki.org to mw-on-k8s [puppet] - https://gerrit.wikimedia.org/r/923385 (https://phabricator.wikimedia.org/T337490)
[09:24:22] <logmsgbot>	 !log tgr@deploy1002 Finished scap: Backport for [[gerrit:924361|Improve handling of missing image recommendation]] (duration: 08m 57s)
[09:24:34] <jinxer-wm>	 (KubernetesAPILatency) firing: (2) High Kubernetes API latency (DELETE pods) on k8s@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/000000435?var-site=codfw&var-cluster=k8s - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[09:24:40] <wikibugs>	 (PS2) Fabfur: cache::upload: Add hieradata to switch HTTPS redirection from Varnish to HAProxy [puppet] - https://gerrit.wikimedia.org/r/924444 (https://phabricator.wikimedia.org/T323557)
[09:25:24] <tgr_>	 zabe: done
[09:25:33] <wikibugs>	 (CR) Fabfur: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/924444 (https://phabricator.wikimedia.org/T323557) (owner: Fabfur)
[09:25:37] <tgr_>	 sorry for the delay, GrowthExperiments CI isn't very snappy
[09:26:19] <wikibugs>	 (CR) Zabe: [C: +2] Check for null when using ::getCheckUserHelperFieldset [extensions/CheckUser] (wmf/1.41.0-wmf.10) - https://gerrit.wikimedia.org/r/923635 (https://phabricator.wikimedia.org/T337599) (owner: Zabe)
[09:26:33] <wikibugs>	 (PS4) Zabe: Start reading from rev_comment_id in test wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/922492 (https://phabricator.wikimedia.org/T299954)
[09:26:44] <zabe>	 yup
[09:27:46] <logmsgbot>	 !log slyngshede@cumin1001 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
[09:27:49] <wikibugs>	 (CR) Zabe: [C: +2] Start reading from rev_comment_id in test wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/922492 (https://phabricator.wikimedia.org/T299954) (owner: Zabe)
[09:28:39] <wikibugs>	 (Merged) jenkins-bot: Start reading from rev_comment_id in test wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/922492 (https://phabricator.wikimedia.org/T299954) (owner: Zabe)
[09:29:23] <logmsgbot>	 !log zabe@deploy1002 Started scap: Backport for [[gerrit:922492|Start reading from rev_comment_id in test wikis (T299954)]]
[09:29:28] <stashbot>	 T299954: Write code for handing write and read of rev_comment_id - https://phabricator.wikimedia.org/T299954
[09:29:34] <jinxer-wm>	 (KubernetesAPILatency) resolved: (2) High Kubernetes API latency (DELETE pods) on k8s@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/000000435?var-site=codfw&var-cluster=k8s - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[09:29:40] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host puppetdb2003.codfw.wmnet with OS bookworm
[09:29:50] <wikibugs>	 SRE, Infrastructure-Foundations, Patch-For-Review: Setup an initial bookworm host pair with Puppetdb 7 - https://phabricator.wikimedia.org/T321783 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jmm@cumin2002 for host puppetdb2003.codfw.wmnet with OS bookworm completed: - puppetd...
[09:29:55] <wikibugs>	 (PS1) Muehlenhoff: ganeti: Pass memory size in megabytes [software/spicerack] (debian) - https://gerrit.wikimedia.org/r/924445 (https://phabricator.wikimedia.org/T230712)
[09:30:27] <wikibugs>	 (PS1) Jbond: sre.ganeti.makvm: update the default memory to 1.5 [cookbooks] - https://gerrit.wikimedia.org/r/924466
[09:30:39] <wikibugs>	 (PS1) Muehlenhoff: sre.ganeti.makevm: Bump default to 1.5G [cookbooks] - https://gerrit.wikimedia.org/r/924467 (https://phabricator.wikimedia.org/T230712)
[09:30:47] <logmsgbot>	 !log slyngshede@cumin1001 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
[09:30:48] <logmsgbot>	 !log slyngshede@cumin1001 START - Cookbook sre.dns.netbox
[09:30:51] <logmsgbot>	 !log zabe@deploy1002 zabe: Backport for [[gerrit:922492|Start reading from rev_comment_id in test wikis (T299954)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
[09:31:16] <wikibugs>	 (CR) Clément Goubert: "This change is ready for review." [puppet] - https://gerrit.wikimedia.org/r/923591 (owner: Clément Goubert)
[09:31:43] <logmsgbot>	 !log jbond@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on puppetboard1003.eqiad.wmnet with reason: host reimage
[09:32:42] <logmsgbot>	 !log slyngshede@cumin1001 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
[09:33:07] <wikibugs>	 (CR) CI reject: [V: -1] sre.ganeti.makvm: update the default memory to 1.5 [cookbooks] - https://gerrit.wikimedia.org/r/924466 (owner: Jbond)
[09:33:31] <icinga-wm>	 PROBLEM - Check systemd state on wdqs2021 is CRITICAL: CRITICAL - Failed to connect to bus: Resource temporarily unavailable: unexpected https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[09:33:47] <logmsgbot>	 !log slyngshede@cumin1001 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
[09:33:47] <logmsgbot>	 !log slyngshede@cumin1001 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[09:33:47] <logmsgbot>	 !log slyngshede@cumin1001 START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
[09:33:50] <logmsgbot>	 !log slyngshede@cumin1001 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
[09:33:50] <logmsgbot>	 !log slyngshede@cumin1001 END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=2) for new host testvm2006.codfw.wmnet
[09:33:56] <wikibugs>	 (CR) CI reject: [V: -1] ganeti: Pass memory size in megabytes [software/spicerack] (debian) - https://gerrit.wikimedia.org/r/924445 (https://phabricator.wikimedia.org/T230712) (owner: Muehlenhoff)
[09:34:03] <wikibugs>	 (PS2) Jbond: sre.ganeti.makvm: update the default memory to 1.5 [cookbooks] - https://gerrit.wikimedia.org/r/924466
[09:34:51] <logmsgbot>	 !log jbond@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on puppetboard1003.eqiad.wmnet with reason: host reimage
[09:36:25] <wikibugs>	 (CR) CI reject: [V: -1] sre.ganeti.makvm: update the default memory to 1.5 [cookbooks] - https://gerrit.wikimedia.org/r/924466 (owner: Jbond)
[09:37:11] <logmsgbot>	 !log zabe@deploy1002 Finished scap: Backport for [[gerrit:922492|Start reading from rev_comment_id in test wikis (T299954)]] (duration: 07m 48s)
[09:37:16] <stashbot>	 T299954: Write code for handing write and read of rev_comment_id - https://phabricator.wikimedia.org/T299954
[09:37:54] <wikibugs>	 (PS8) Clément Goubert: mw-on-k8s: Redirect closed wikis to mw-on-k8s [puppet] - https://gerrit.wikimedia.org/r/923386 (https://phabricator.wikimedia.org/T337490)
[09:38:51] <wikibugs>	 (PS9) Clément Goubert: mw-on-k8s: Redirect closed wikis to mw-on-k8s [puppet] - https://gerrit.wikimedia.org/r/923386 (https://phabricator.wikimedia.org/T337490)
[09:39:24] <wikibugs>	 (PS2) Muehlenhoff: ganeti: Pass memory size in megabytes [software/spicerack] (debian) - https://gerrit.wikimedia.org/r/924445 (https://phabricator.wikimedia.org/T230712)
[09:40:13] <logmsgbot>	 !log slyngshede@cumin1001 START - Cookbook sre.ganeti.makevm for new host testvm2006.codfw.wmnet
[09:40:14] <logmsgbot>	 !log slyngshede@cumin1001 START - Cookbook sre.dns.netbox
[09:41:22] <wikibugs>	 (Merged) jenkins-bot: Check for null when using ::getCheckUserHelperFieldset [extensions/CheckUser] (wmf/1.41.0-wmf.10) - https://gerrit.wikimedia.org/r/923635 (https://phabricator.wikimedia.org/T337599) (owner: Zabe)
[09:42:10] <logmsgbot>	 !log zabe@deploy1002 Started scap: Backport for [[gerrit:923635|Check for null when using ::getCheckUserHelperFieldset (T337599)]]
[09:42:14] <logmsgbot>	 !log slyngshede@cumin1001 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
[09:42:15] <stashbot>	 T337599: Running a "get edits" check on user with no edits gives fatal exception - https://phabricator.wikimedia.org/T337599
[09:42:20] <wikibugs>	 (CR) Volans: [C: -2] "This is against the debian branch... it should be against master" [software/spicerack] (debian) - https://gerrit.wikimedia.org/r/924445 (https://phabricator.wikimedia.org/T230712) (owner: Muehlenhoff)
[09:42:56] <icinga-wm>	 PROBLEM - Check unit status of httpbb_kubernetes_mw-api-int_hourly on cumin2002 is CRITICAL: CRITICAL: Status of the systemd unit httpbb_kubernetes_mw-api-int_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[09:42:59] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.reimage for host puppetdb1003.eqiad.wmnet with OS bookworm
[09:43:05] <wikibugs>	 SRE, Infrastructure-Foundations: Setup an initial bookworm host pair with Puppetdb 7 - https://phabricator.wikimedia.org/T321783 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jmm@cumin2002 for host puppetdb1003.eqiad.wmnet with OS bookworm
[09:43:15] <wikibugs>	 (PS1) Zabe: Start reading from rev_comment_id in group0 wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/924469 (https://phabricator.wikimedia.org/T299954)
[09:43:36] <logmsgbot>	 !log slyngshede@cumin1001 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
[09:43:36] <logmsgbot>	 !log slyngshede@cumin1001 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[09:43:36] <logmsgbot>	 !log slyngshede@cumin1001 START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
[09:43:39] <logmsgbot>	 !log slyngshede@cumin1001 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
[09:43:43] <logmsgbot>	 !log zabe@deploy1002 zabe: Backport for [[gerrit:923635|Check for null when using ::getCheckUserHelperFieldset (T337599)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
[09:45:28] <wikibugs>	 (PS1) Marostegui: wiki-replicas.sql: Add heartbeat_p [puppet] - https://gerrit.wikimedia.org/r/924471 (https://phabricator.wikimedia.org/T337446)
[09:45:34] <marostegui>	 Amir1:  ^
[09:46:10] <logmsgbot>	 !log jbond@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on puppetboard2003.codfw.wmnet with reason: host reimage
[09:46:19] <wikibugs>	 (CR) Ladsgroup: [C: +1] "Do we need to add meta_p too?" [puppet] - https://gerrit.wikimedia.org/r/924471 (https://phabricator.wikimedia.org/T337446) (owner: Marostegui)
[09:46:23] <wikibugs>	 (CR) Marostegui: [C: +2] wiki-replicas.sql: Add heartbeat_p [puppet] - https://gerrit.wikimedia.org/r/924471 (https://phabricator.wikimedia.org/T337446) (owner: Marostegui)
[09:46:26] <Amir1>	 thanks
[09:46:35] <marostegui>	 Amir1: yep, I will note it just in case we need more
[09:46:51] <Amir1>	 awesome. Thanks
[09:47:28] <wikibugs>	 (PS1) Muehlenhoff: ganeti: Pass memory size in megabytes [software/spicerack] - https://gerrit.wikimedia.org/r/924472 (https://phabricator.wikimedia.org/T230712)
[09:47:40] <wikibugs>	 (PS1) Marostegui: wiki-replicas.sql: Add meta_p GRANT [puppet] - https://gerrit.wikimedia.org/r/924473 (https://phabricator.wikimedia.org/T337446)
[09:48:02] <wikibugs>	 (CR) Marostegui: [C: -2] "Do not merge yet in case we find other grants that are needed" [puppet] - https://gerrit.wikimedia.org/r/924473 (https://phabricator.wikimedia.org/T337446) (owner: Marostegui)
[09:49:05] <wikibugs>	 (CR) Zabe: [C: +2] Start reading from rev_comment_id in group0 wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/924469 (https://phabricator.wikimedia.org/T299954) (owner: Zabe)
[09:49:37] <logmsgbot>	 !log jbond@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on puppetboard2003.codfw.wmnet with reason: host reimage
[09:50:03] <wikibugs>	 (Merged) jenkins-bot: Start reading from rev_comment_id in group0 wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/924469 (https://phabricator.wikimedia.org/T299954) (owner: Zabe)
[09:51:46] <wikibugs>	 (PS3) Jbond: sre.ganeti.makvm: update the default memory to 1.5 [cookbooks] - https://gerrit.wikimedia.org/r/924466
[09:52:03] <logmsgbot>	 !log zabe@deploy1002 Finished scap: Backport for [[gerrit:923635|Check for null when using ::getCheckUserHelperFieldset (T337599)]] (duration: 09m 52s)
[09:52:08] <stashbot>	 T337599: Running a "get edits" check on user with no edits gives fatal exception - https://phabricator.wikimedia.org/T337599
[09:52:34] <jinxer-wm>	 (KubernetesAPILatency) firing: (2) High Kubernetes API latency (DELETE pods) on k8s@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/000000435?var-site=codfw&var-cluster=k8s - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[09:52:36] <logmsgbot>	 !log zabe@deploy1002 Started scap: Backport for [[gerrit:924469|Start reading from rev_comment_id in group0 wikis (T299954)]]
[09:52:40] <stashbot>	 T299954: Write code for handing write and read of rev_comment_id - https://phabricator.wikimedia.org/T299954
[09:52:58] <icinga-wm>	 RECOVERY - Check unit status of httpbb_kubernetes_mw-api-int_hourly on cumin2002 is OK: OK: Status of the systemd unit httpbb_kubernetes_mw-api-int_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[09:54:48] <logmsgbot>	 !log zabe@deploy1002 zabe: Backport for [[gerrit:924469|Start reading from rev_comment_id in group0 wikis (T299954)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
[09:55:17] <wikibugs>	 (CR) Volans: [C: +1] "LGTM, thanks" [software/spicerack] - https://gerrit.wikimedia.org/r/924472 (https://phabricator.wikimedia.org/T230712) (owner: Muehlenhoff)
[09:55:35] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on puppetdb1003.eqiad.wmnet with reason: host reimage
[09:55:42] <jinxer-wm>	 (SystemdUnitFailed) firing: systemd-timedated.service Failed on wdqs2021:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[09:57:34] <jinxer-wm>	 (KubernetesAPILatency) resolved: (2) High Kubernetes API latency (DELETE pods) on k8s@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/000000435?var-site=codfw&var-cluster=k8s - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[09:57:52] <icinga-wm>	 PROBLEM - MariaDB read only s2 on clouddb1018 is CRITICAL: Could not connect to localhost:3312 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Master_comes_back_in_read_only
[09:57:55] <logmsgbot>	 !log slyngshede@cumin1001 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
[09:58:06] <icinga-wm>	 PROBLEM - Check systemd state on clouddb1018 is CRITICAL: CRITICAL - degraded: The following units failed: wmf-pt-kill@s2.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[09:58:48] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on puppetdb1003.eqiad.wmnet with reason: host reimage
[09:58:58] <logmsgbot>	 !log slyngshede@cumin1001 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
[09:59:11] <logmsgbot>	 !log slyngshede@cumin1001 START - Cookbook sre.hosts.reimage for host testvm2006.codfw.wmnet with OS bookworm
[09:59:18] <wikibugs>	 SRE-tools, Infrastructure-Foundations, Spicerack, Patch-For-Review: Merge reimaging cookbooks - https://phabricator.wikimedia.org/T336491 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by slyngshede@cumin1001 for host testvm2006.codfw.wmnet with OS bookworm
[10:00:06] <jouncebot>	 Deploy window MediaWiki infrastructure (UTC mid-day) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20230530T1000)
[10:00:25] <wikibugs>	 (CR) Muehlenhoff: [C: +1] "Looks good. We need to merge/deploy https://gerrit.wikimedia.org/r/c/operations/software/spicerack/+/924472 first" [cookbooks] - https://gerrit.wikimedia.org/r/924466 (owner: Jbond)
[10:00:48] <logmsgbot>	 !log zabe@deploy1002 Finished scap: Backport for [[gerrit:924469|Start reading from rev_comment_id in group0 wikis (T299954)]] (duration: 08m 12s)
[10:00:53] <stashbot>	 T299954: Write code for handing write and read of rev_comment_id - https://phabricator.wikimedia.org/T299954
[10:00:54] * zabe done
[10:01:22] <icinga-wm>	 PROBLEM - mysqld processes on clouddb1018 is CRITICAL: PROCS CRITICAL: 1 process with command name mysqld https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting
[10:01:48] <icinga-wm>	 PROBLEM - MariaDB Replica SQL: s2 on clouddb1018 is CRITICAL: CRITICAL slave_sql_state could not connect https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[10:04:08] <wikibugs>	 (CR) Muehlenhoff: [C: +2] ganeti: Pass memory size in megabytes [software/spicerack] - https://gerrit.wikimedia.org/r/924472 (https://phabricator.wikimedia.org/T230712) (owner: Muehlenhoff)
[10:07:25] <wikibugs>	 (CR) Jbond: ganeti: Pass memory size in megabytes (1 comment) [software/spicerack] - https://gerrit.wikimedia.org/r/924472 (https://phabricator.wikimedia.org/T230712) (owner: Muehlenhoff)
[10:08:30] <wikibugs>	 (CR) Volans: [C: +1] ganeti: Pass memory size in megabytes (1 comment) [software/spicerack] - https://gerrit.wikimedia.org/r/924472 (https://phabricator.wikimedia.org/T230712) (owner: Muehlenhoff)
[10:10:07] <wikibugs>	 (PS4) Majavah: ferm::service: allow passing array of hosts [puppet] - https://gerrit.wikimedia.org/r/919300
[10:10:28] <wikibugs>	 (PS1) Jbond: ganeti: update definition of add to accept float or int [software/spicerack] - https://gerrit.wikimedia.org/r/924477 (https://phabricator.wikimedia.org/T230712)
[10:10:30] <wikibugs>	 (CR) CI reject: [V: -1] ferm::service: allow passing array of hosts [puppet] - https://gerrit.wikimedia.org/r/919300 (owner: Majavah)
[10:10:38] <wikibugs>	 (CR) Jbond: ganeti: Pass memory size in megabytes (1 comment) [software/spicerack] - https://gerrit.wikimedia.org/r/924472 (https://phabricator.wikimedia.org/T230712) (owner: Muehlenhoff)
[10:10:42] <jinxer-wm>	 (SystemdUnitFailed) resolved: systemd-timedated.service Failed on wdqs2021:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[10:11:10] <logmsgbot>	 !log jbond@cumin1001 END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host puppetboard1003.eqiad.wmnet with OS bookworm
[10:11:19] <logmsgbot>	 !log jbond@cumin2002 END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host puppetboard2003.codfw.wmnet with OS bookworm
[10:11:29] <wikibugs>	 (PS5) Majavah: ferm::service: allow passing array of hosts [puppet] - https://gerrit.wikimedia.org/r/919300
[10:12:19] <wikibugs>	 (CR) Majavah: ferm::service: allow passing array of hosts (8 comments) [puppet] - https://gerrit.wikimedia.org/r/919300 (owner: Majavah)
[10:12:31] <wikibugs>	 (CR) Volans: [C: +1] "LGTM, nit for tests inline" [software/spicerack] - https://gerrit.wikimedia.org/r/924477 (https://phabricator.wikimedia.org/T230712) (owner: Jbond)
[10:13:49] <wikibugs>	 (CR) Majavah: [V: +1] "PCC SUCCESS (): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/41394/console" [puppet] - https://gerrit.wikimedia.org/r/919300 (owner: Majavah)
[10:15:51] <wikibugs>	 (PS2) Jbond: ganeti: update definition of add to accept float or int [software/spicerack] - https://gerrit.wikimedia.org/r/924477 (https://phabricator.wikimedia.org/T230712)
[10:15:54] <wikibugs>	 (CR) Jbond: ganeti: update definition of add to accept float or int (1 comment) [software/spicerack] - https://gerrit.wikimedia.org/r/924477 (https://phabricator.wikimedia.org/T230712) (owner: Jbond)
[10:16:49] <wikibugs>	 (CR) Volans: "LGTM, nit inline" [cookbooks] - https://gerrit.wikimedia.org/r/924466 (owner: Jbond)
[10:17:00] <wikibugs>	 (CR) Jgiannelos: "Just a heads up, this requires the container to already exist in swift." [deployment-charts] - https://gerrit.wikimedia.org/r/924112 (https://phabricator.wikimedia.org/T333318) (owner: Effie Mouzeli)
[10:17:28] <wikibugs>	 (CR) Volans: [C: +1] "LGTM, thanks!" [software/spicerack] - https://gerrit.wikimedia.org/r/924477 (https://phabricator.wikimedia.org/T230712) (owner: Jbond)
[10:18:10] <wikibugs>	 (CR) Jgiannelos: [C: +1] "Also we need to depool codfw before applying this change." [deployment-charts] - https://gerrit.wikimedia.org/r/924112 (https://phabricator.wikimedia.org/T333318) (owner: Effie Mouzeli)
[10:18:25] <wikibugs>	 (PS4) Jbond: sre.ganeti.makvm: update the default memory to 1.5 [cookbooks] - https://gerrit.wikimedia.org/r/924466
[10:18:48] <wikibugs>	 (CR) Jbond: sre.ganeti.makvm: update the default memory to 1.5 (1 comment) [cookbooks] - https://gerrit.wikimedia.org/r/924466 (owner: Jbond)
[10:19:12] <wikibugs>	 (PS1) Matthias Mullie: Fix maxJobs default [extensions/ImageSuggestions] (wmf/1.41.0-wmf.10) - https://gerrit.wikimedia.org/r/924454
[10:19:22] <wikibugs>	 (PS1) Matthias Mullie: Fix maxJobs default [extensions/ImageSuggestions] (wmf/1.41.0-wmf.11) - https://gerrit.wikimedia.org/r/924455
[10:21:18] <wikibugs>	 (CR) CI reject: [V: -1] sre.ganeti.makvm: update the default memory to 1.5 [cookbooks] - https://gerrit.wikimedia.org/r/924466 (owner: Jbond)
[10:21:44] <icinga-wm>	 PROBLEM - BGP status on cr3-ulsfo is CRITICAL: BGP CRITICAL - AS1299/IPv4: Active - Telia, AS1299/IPv6: Active - Telia https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[10:25:22] <wikibugs>	 (03CR) 10Kosta Harlan: [C: 03+1] "Thanks for fixing this" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/924079 (owner: 10Gergő Tisza)
[10:28:18] <jinxer-wm>	 (KubernetesAPILatency) firing: High Kubernetes API latency (LIST secrets) on k8s-staging@eqiad - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/000000435?var-site=eqiad&var-cluster=k8s-staging - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[10:29:05] <wikibugs>	 (03PS5) 10Jbond: sre.ganeti.makvm: update the default memory to 1.5 [cookbooks] - 10https://gerrit.wikimedia.org/r/924466
[10:29:24] <wikibugs>	 (03CR) 10Jbond: [C: 03+2] ganeti: update definition of add to accept float or int [software/spicerack] - 10https://gerrit.wikimedia.org/r/924477 (https://phabricator.wikimedia.org/T230712) (owner: 10Jbond)
[10:29:32] <wikibugs>	 (03Abandoned) 10Volans: ganeti: Pass memory size in megabytes [software/spicerack] (debian) - 10https://gerrit.wikimedia.org/r/924445 (https://phabricator.wikimedia.org/T230712) (owner: 10Muehlenhoff)
[10:31:44] <icinga-wm>	 PROBLEM - Check systemd state on wdqs2021 is CRITICAL: CRITICAL - Failed to connect to bus: Resource temporarily unavailable: unexpected https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[10:32:33] <wikibugs>	 (03CR) 10Muehlenhoff: "Looks good, one nit and one thought/proposal inline" [puppet] - 10https://gerrit.wikimedia.org/r/922815 (https://phabricator.wikimedia.org/T279683) (owner: 10Jbond)
[10:33:18] <jinxer-wm>	 (KubernetesAPILatency) resolved: High Kubernetes API latency (LIST secrets) on k8s-staging@eqiad - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/000000435?var-site=eqiad&var-cluster=k8s-staging - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[10:33:19] <wikibugs>	 (03PS3) 10Hnowlan: rest-gateway: add citoid support [deployment-charts] - 10https://gerrit.wikimedia.org/r/920710 (https://phabricator.wikimedia.org/T329049)
[10:33:28] <wikibugs>	 (03CR) 10Hnowlan: rest-gateway: add citoid support (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/920710 (https://phabricator.wikimedia.org/T329049) (owner: 10Hnowlan)
[10:33:52] <wikibugs>	 (03Merged) 10jenkins-bot: ganeti: update definition of add to accept float or int [software/spicerack] - 10https://gerrit.wikimedia.org/r/924477 (https://phabricator.wikimedia.org/T230712) (owner: 10Jbond)
[10:37:58] <icinga-wm>	 RECOVERY - SSH on wdqs2021 is OK: SSH OK - OpenSSH_7.9p1 Debian-10+deb10u2 (protocol 2.0) https://wikitech.wikimedia.org/wiki/SSH/monitoring
[10:40:40] <wikibugs>	 (03PS3) 10Volans: dhcp: reword some exception messages [software/spicerack] - 10https://gerrit.wikimedia.org/r/920225
[10:40:42] <jinxer-wm>	 (SystemdUnitFailed) firing: systemd-timedated.service Failed on wdqs2021:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[10:40:44] <wikibugs>	 (03CR) 10Volans: "replies and questions inline" [software/spicerack] - 10https://gerrit.wikimedia.org/r/920225 (owner: 10Volans)
[10:41:00] <icinga-wm>	 RECOVERY - MariaDB Replica Lag: s1 on clouddb1021 is OK: OK slave_sql_lag not a slave https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[10:41:11] <logmsgbot>	 !log hnowlan@deploy1002 helmfile [staging] START helmfile.d/services/thumbor: apply
[10:41:16] <logmsgbot>	 !log hnowlan@deploy1002 helmfile [staging] DONE helmfile.d/services/thumbor: apply
[10:42:42] <icinga-wm>	 PROBLEM - SSH on wdqs2021 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/SSH/monitoring
[10:42:45] <wikibugs>	 (03PS2) 10Volans: Add Python 3.11 support [software/spicerack] - 10https://gerrit.wikimedia.org/r/922489 (owner: 10Ayounsi)
[10:44:58] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] dhcp: reword some exception messages [software/spicerack] - 10https://gerrit.wikimedia.org/r/920225 (owner: 10Volans)
[10:45:01] <wikibugs>	 (03CR) 10Effie Mouzeli: [C: 03+2] tegola: Switch swift container to tegola-swift-codfw-v003 (032 comments) [deployment-charts] - 10https://gerrit.wikimedia.org/r/924112 (https://phabricator.wikimedia.org/T333318) (owner: 10Effie Mouzeli)
[10:45:05] <wikibugs>	 (03CR) 10Marostegui: [C: 03+2] wiki-replicas.sql: Add meta_p GRANT [puppet] - 10https://gerrit.wikimedia.org/r/924473 (https://phabricator.wikimedia.org/T337446) (owner: 10Marostegui)
[10:45:22] <wikibugs>	 (03CR) 10Effie Mouzeli: tegola: Switch swift container to tegola-swift-codfw-v003 [deployment-charts] - 10https://gerrit.wikimedia.org/r/924112 (https://phabricator.wikimedia.org/T333318) (owner: 10Effie Mouzeli)
[10:45:42] <jinxer-wm>	 (SystemdUnitFailed) resolved: systemd-timedated.service Failed on wdqs2021:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[10:46:57] <wikibugs>	 (03PS4) 10Volans: dhcp: reword some exception messages [software/spicerack] - 10https://gerrit.wikimedia.org/r/920225
[10:50:24] <logmsgbot>	 !log slyngshede@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on testvm2006.codfw.wmnet with reason: host reimage
[10:50:49] <logmsgbot>	 !log hnowlan@deploy1002 helmfile [codfw] START helmfile.d/services/thumbor: apply
[10:53:09] <logmsgbot>	 !log hnowlan@deploy1002 helmfile [codfw] DONE helmfile.d/services/thumbor: apply
[10:53:52] <logmsgbot>	 !log slyngshede@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on testvm2006.codfw.wmnet with reason: host reimage
[10:56:46] <logmsgbot>	 !log gmodena@deploy1002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
[10:56:49] <logmsgbot>	 !log gmodena@deploy1002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
[10:57:12] <logmsgbot>	 !log hnowlan@deploy1002 helmfile [eqiad] START helmfile.d/services/thumbor: apply
[10:57:34] <jinxer-wm>	 (KubernetesAPILatency) firing: High Kubernetes API latency (PATCH pods) on k8s@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/000000435?var-site=codfw&var-cluster=k8s - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[10:57:44] <wikibugs>	 (03PS1) 10Arturo Borrero Gonzalez: lvs: remove wikireplicas S3 definition [puppet] - 10https://gerrit.wikimedia.org/r/924481 (https://phabricator.wikimedia.org/T337721)
[10:58:02] <wikibugs>	 (03PS1) 10Jbond: build_envoy_deb: update to work with bookworm [puppet] - 10https://gerrit.wikimedia.org/r/924482
[11:00:03] <logmsgbot>	 !log hnowlan@deploy1002 helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
[11:02:34] <jinxer-wm>	 (KubernetesAPILatency) resolved: High Kubernetes API latency (PATCH pods) on k8s@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/000000435?var-site=codfw&var-cluster=k8s - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[11:02:48] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+1] "Looks good" [puppet] - 10https://gerrit.wikimedia.org/r/924482 (owner: 10Jbond)
[11:04:11] <wikibugs>	 (03PS1) 10Filippo Giunchedi: cadvisor: listen on main ip address only [puppet] - 10https://gerrit.wikimedia.org/r/924483 (https://phabricator.wikimedia.org/T337689)
[11:04:30] <wikibugs>	 (03CR) 10Ladsgroup: "Adding Brandon as he reviewed If49d66b64c1 so might know if this can cause issues and Valentine who was involved with the pybal's page yes" [puppet] - 10https://gerrit.wikimedia.org/r/924481 (https://phabricator.wikimedia.org/T337721) (owner: 10Arturo Borrero Gonzalez)
[11:04:31] <icinga-wm>	 RECOVERY - Check systemd state on wdqs2021 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[11:05:02] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+1] "Looks good" [cookbooks] - 10https://gerrit.wikimedia.org/r/924466 (owner: 10Jbond)
[11:05:52] <wikibugs>	 (03CR) 10Filippo Giunchedi: [V: 03+1] "PCC SUCCESS (): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/41396/console" [puppet] - 10https://gerrit.wikimedia.org/r/924483 (https://phabricator.wikimedia.org/T337689) (owner: 10Filippo Giunchedi)
[11:07:20] <logmsgbot>	 !log slyngshede@cumin1001 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host testvm2006.codfw.wmnet with OS bookworm
[11:07:20] <logmsgbot>	 !log slyngshede@cumin1001 END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host testvm2006.codfw.wmnet
[11:07:25] <wikibugs>	 10SRE-tools, 10Infrastructure-Foundations, 10Spicerack, 10Patch-For-Review: Merge reimaging cookbooks - https://phabricator.wikimedia.org/T336491 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by slyngshede@cumin1001 for host testvm2006.codfw.wmnet with OS bookworm completed: - testvm2...
[11:08:09] <icinga-wm>	 RECOVERY - SSH on wdqs2021 is OK: SSH OK - OpenSSH_7.9p1 Debian-10+deb10u2 (protocol 2.0) https://wikitech.wikimedia.org/wiki/SSH/monitoring
[11:08:34] <wikibugs>	 (03CR) 10Jbond: [C: 03+1] "lgtm minor nit" [puppet] - 10https://gerrit.wikimedia.org/r/924483 (https://phabricator.wikimedia.org/T337689) (owner: 10Filippo Giunchedi)
[11:08:37] <wikibugs>	 (03PS7) 10Slyngshede: sre.ganeti.makevm call reimage after VM creation [cookbooks] - 10https://gerrit.wikimedia.org/r/920203 (https://phabricator.wikimedia.org/T336491)
[11:08:44] <wikibugs>	 10SRE, 10MW-on-K8s, 10Traffic, 10serviceops, and 2 others: Serve production traffic via Kubernetes - https://phabricator.wikimedia.org/T290536 (10Clement_Goubert)
[11:10:27] <wikibugs>	 10SRE, 10MW-on-K8s, 10Traffic, 10serviceops, and 2 others: Serve production traffic via Kubernetes - https://phabricator.wikimedia.org/T290536 (10Clement_Goubert)
[11:11:07] <wikibugs>	 (03CR) 10Volans: [C: 03+2] Add Python 3.11 support [software/spicerack] - 10https://gerrit.wikimedia.org/r/922489 (owner: 10Ayounsi)
[11:11:11] <wikibugs>	 (03Abandoned) 10Muehlenhoff: sre.ganeti.makevm: Bump default to 1.5G [cookbooks] - 10https://gerrit.wikimedia.org/r/924467 (https://phabricator.wikimedia.org/T230712) (owner: 10Muehlenhoff)
[11:11:22] <wikibugs>	 (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] cloud_private: route the whole cloud public IPv4 space to cloudsw [puppet] - 10https://gerrit.wikimedia.org/r/923324 (https://phabricator.wikimedia.org/T336963) (owner: 10Arturo Borrero Gonzalez)
[11:11:45] <icinga-wm>	 PROBLEM - BGP status on cr4-ulsfo is CRITICAL: BGP CRITICAL - AS6939/IPv4: Idle - HE, AS6939/IPv6: Idle - HE https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[11:12:39] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] sre.ganeti.makevm call reimage after VM creation [cookbooks] - 10https://gerrit.wikimedia.org/r/920203 (https://phabricator.wikimedia.org/T336491) (owner: 10Slyngshede)
[11:12:49] <wikibugs>	 (03CR) 10Hashar: [C: 03+2] "Long week-end is gone unblocking deployment :]" [software/gerrit] (deploy/wmf/stable-3.5) - 10https://gerrit.wikimedia.org/r/923688 (https://phabricator.wikimedia.org/T331651) (owner: 10Hashar)
[11:13:33] <wikibugs>	 (03Merged) 10jenkins-bot: wm-checks-api: add support for DUCT [software/gerrit] (deploy/wmf/stable-3.5) - 10https://gerrit.wikimedia.org/r/923688 (https://phabricator.wikimedia.org/T331651) (owner: 10Hashar)
[11:13:33] <wikibugs>	 (03CR) 10Jelto: [C: 03+2] Gitlab: Support OIDC alongside CAS for OmniAuth in Gitlab [puppet] - 10https://gerrit.wikimedia.org/r/916509 (https://phabricator.wikimedia.org/T320390) (owner: 10Jbond)
[11:13:41] <icinga-wm>	 PROBLEM - Router interfaces on cr4-ulsfo is CRITICAL: CRITICAL: host 198.35.26.193, interfaces up: 70, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[11:14:02] <logmsgbot>	 !log hashar@deploy1002 Started deploy [gerrit/gerrit@6deabc9]: wm-checks-api: add support for DUCT - T331651
[11:14:07] <stashbot>	 T331651: [wm-checks-api] support kindrobot - https://phabricator.wikimedia.org/T331651
[11:14:10] <logmsgbot>	 !log hashar@deploy1002 Finished deploy [gerrit/gerrit@6deabc9]: wm-checks-api: add support for DUCT - T331651 (duration: 00m 08s)
[11:15:13] <wikibugs>	 (03PS8) 10Slyngshede: sre.ganeti.makevm call reimage after VM creation [cookbooks] - 10https://gerrit.wikimedia.org/r/920203 (https://phabricator.wikimedia.org/T336491)
[11:17:31] <wikibugs>	 (03Merged) 10jenkins-bot: Add Python 3.11 support [software/spicerack] - 10https://gerrit.wikimedia.org/r/922489 (owner: 10Ayounsi)
[11:21:24] <logmsgbot>	 !log gmodena@deploy1002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
[11:21:27] <logmsgbot>	 !log gmodena@deploy1002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
[11:34:45] <icinga-wm>	 PROBLEM - puppetboard.wikimedia.org tls expiry on puppetboard1003 is CRITICAL: connect to address 10.64.32.38 and port 443: Connection refused https://wikitech.wikimedia.org/wiki/CAS-SSO/Administration
[11:34:51] <icinga-wm>	 PROBLEM - Check that envoy is running on puppetboard1003 is CRITICAL: CRITICAL - Expecting active but unit envoyproxy.service is inactive https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23Envoy
[11:35:05] <icinga-wm>	 PROBLEM - Check systemd state on puppetboard1003 is CRITICAL: CRITICAL - degraded: The following units failed: uwsgi-puppetboard.service,wmf_auto_restart_uwsgi-puppetboard.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[11:35:11] <icinga-wm>	 PROBLEM - puppetboard.wikimedia.org requires authentication on puppetboard1003 is CRITICAL: connect to address 10.64.32.38 and port 443: Connection refused https://wikitech.wikimedia.org/wiki/CAS-SSO/Administration
[11:35:57] <icinga-wm>	 PROBLEM - uWSGI puppetboard -http via nrpe- on puppetboard1003 is CRITICAL: connect to address localhost and port 8001: Connection refused https://wikitech.wikimedia.org/wiki/Services/Monitoring/puppetboard
[11:41:14] <wikibugs>	 (03CR) 10Slyngshede: "Parameters were configured wrong, revealed in testing." [cookbooks] - 10https://gerrit.wikimedia.org/r/920203 (https://phabricator.wikimedia.org/T336491) (owner: 10Slyngshede)
[11:41:23] <logmsgbot>	 !log gmodena@deploy1002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
[11:41:26] <logmsgbot>	 !log gmodena@deploy1002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
[11:42:54] <wikibugs>	 (03PS1) 10Daimona Eaytoy: prod: Remove $wgCampaignEventsEnableMultipleOrganizers [mediawiki-config] - 10https://gerrit.wikimedia.org/r/924488 (https://phabricator.wikimedia.org/T334088)
[11:45:32] <logmsgbot>	 !log jbond@cumin1001 START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on puppetboard2003.codfw.wmnet,puppetboard1003.eqiad.wmnet with reason: building_systems
[11:45:45] <logmsgbot>	 !log jbond@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on puppetboard2003.codfw.wmnet,puppetboard1003.eqiad.wmnet with reason: building_systems
[11:45:47] <wikibugs>	 (03PS3) 10Daimona Eaytoy: beta: Remove $wgCampaignEventsEnableMultipleOrganizers [mediawiki-config] - 10https://gerrit.wikimedia.org/r/909401 (https://phabricator.wikimedia.org/T334088) (owner: 10Cmelo)
[11:46:01] <logmsgbot>	 !log slyngshede@cumin1001 START - Cookbook sre.hosts.decommission for hosts testvm2006.codfw.wmnet
[11:46:08] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+1] "Looks good" [cookbooks] - 10https://gerrit.wikimedia.org/r/920203 (https://phabricator.wikimedia.org/T336491) (owner: 10Slyngshede)
[11:46:58] <wikibugs>	 (03PS2) 10Daimona Eaytoy: prod: Remove $wgCampaignEventsEnableMultipleOrganizers [mediawiki-config] - 10https://gerrit.wikimedia.org/r/924488 (https://phabricator.wikimedia.org/T334088)
[11:47:36] <logmsgbot>	 !log cmooney@cumin1001 START - Cookbook sre.dns.netbox
[11:50:00] <logmsgbot>	 !log slyngshede@cumin1001 START - Cookbook sre.dns.netbox
[11:50:13] <logmsgbot>	 !log cmooney@cumin1001 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for moved cloudcontrol2005-dev - cmooney@cumin1001"
[11:51:04] <wikibugs>	 (03CR) 10Volans: [C: 03+1] "LGTM" [cookbooks] - 10https://gerrit.wikimedia.org/r/920203 (https://phabricator.wikimedia.org/T336491) (owner: 10Slyngshede)
[11:51:08] <logmsgbot>	 !log cmooney@cumin1001 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for moved cloudcontrol2005-dev - cmooney@cumin1001"
[11:51:08] <logmsgbot>	 !log cmooney@cumin1001 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[11:51:10] <wikibugs>	 10SRE, 10CAS-SSO, 10Infrastructure-Foundations, 10serviceops-collab, and 4 others: migrate gitlab away from the CAS protocol - https://phabricator.wikimedia.org/T320390 (10Jelto) @jbond I tested https://gerrit.wikimedia.org/r/916509 on the GitLab hosts but the change is noop and no new oauth provider is av...
[11:51:13] <logmsgbot>	 !log slyngshede@cumin1001 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[11:51:14] <logmsgbot>	 !log slyngshede@cumin1001 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts testvm2006.codfw.wmnet
[11:51:19] <wikibugs>	 10SRE-tools, 10Infrastructure-Foundations, 10Spicerack, 10Patch-For-Review: Merge reimaging cookbooks - https://phabricator.wikimedia.org/T336491 (10ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by slyngshede@cumin1001 for hosts: `testvm2006.codfw.wmnet` - testvm2006.codfw.wmnet (**PASS**)...
[11:51:44] <wikibugs>	 (03CR) 10Volans: [C: 03+1] "LGTM" [cookbooks] - 10https://gerrit.wikimedia.org/r/924466 (owner: 10Jbond)
[11:57:09] <wikibugs>	 (03CR) 10Ayounsi: dhcp: reword some exception messages (034 comments) [software/spicerack] - 10https://gerrit.wikimedia.org/r/920225 (owner: 10Volans)
[11:59:42] <wikibugs>	 10SRE, 10ops-codfw, 10cloud-services-team (FY2022/2023-Q4): cloudcontrol2005-dev: make it a cloudlb backend - https://phabricator.wikimedia.org/T336564 (10cmooney) >>! In T336564#8884008, @Jhancock.wm wrote: > @cmooney I moved the patch to switch cloudsw1-b1-codfw, port ge-1/0/13, but I can't get the netbox...
[12:08:19] <wikibugs>	 (03CR) 10Ayounsi: Add cookbook to configure router's BGP sessions to k8s hosts (033 comments) [cookbooks] - 10https://gerrit.wikimedia.org/r/903174 (https://phabricator.wikimedia.org/T306649) (owner: 10Ayounsi)
[12:13:27] <wikibugs>	 (03PS1) 10Ayounsi: Spicerack: add some colors [software/spicerack] - 10https://gerrit.wikimedia.org/r/924493
[12:13:29] <wikibugs>	 10SRE, 10Infrastructure-Foundations, 10cloud-services-team, 10netops: Configure cloudsw1-b1-codfw and migrate cloud hosts in codfw B1 to it - https://phabricator.wikimedia.org/T327919 (10cmooney) @papaul when you are back can you advise on the status of these?  They all appear as connected on asw-b1-codfw...
[12:13:35] <wikibugs>	 (03PS2) 10Filippo Giunchedi: cadvisor: listen on main ip address only [puppet] - 10https://gerrit.wikimedia.org/r/924483 (https://phabricator.wikimedia.org/T337689)
[12:13:40] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 03+2] "Thank you!" [puppet] - 10https://gerrit.wikimedia.org/r/924483 (https://phabricator.wikimedia.org/T337689) (owner: 10Filippo Giunchedi)
[12:14:18] <logmsgbot>	 !log aborrero@cumin2002 START - Cookbook sre.hosts.reimage for host cloudcontrol2005-dev.codfw.wmnet with OS bullseye
[12:14:26] <wikibugs>	 10SRE, 10ops-codfw, 10cloud-services-team (FY2022/2023-Q4): cloudcontrol2005-dev: make it a cloudlb backend - https://phabricator.wikimedia.org/T336564 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by aborrero@cumin2002 for host cloudcontrol2005-dev.codfw.wmnet with OS bullseye
[12:15:23] <wikibugs>	 (03CR) 10Filippo Giunchedi: [V: 03+1] "PCC SUCCESS (): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/41398/console" [puppet] - 10https://gerrit.wikimedia.org/r/924483 (https://phabricator.wikimedia.org/T337689) (owner: 10Filippo Giunchedi)
[12:16:37] <wikibugs>	 10SRE, 10ops-codfw, 10cloud-services-team (FY2022/2023-Q4): cloudcontrol2005-dev: make it a cloudlb backend - https://phabricator.wikimedia.org/T336564 (10aborrero) >>! In T336564#8888280, @cmooney wrote: >  > @aborrero you should be good to do the reimage on this now.  I've reserved [[ https://netbox.wikime...
[12:18:04] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] Spicerack: add some colors [software/spicerack] - 10https://gerrit.wikimedia.org/r/924493 (owner: 10Ayounsi)
[12:21:09] <icinga-wm>	 ACKNOWLEDGEMENT - dump of db_inventory in codfw on backupmon1001 is CRITICAL: Last dump for db_inventory at codfw (db2185) taken on 2023-05-30 03:55:35 is 109 KiB, but the previous one was 93 KiB, a change of +16.6 % Jcrespo expected https://wikitech.wikimedia.org/wiki/MariaDB/Backups%23Rerun_a_failed_backup
[12:21:09] <icinga-wm>	 ACKNOWLEDGEMENT - dump of db_inventory in eqiad on backupmon1001 is CRITICAL: Last dump for db_inventory at eqiad (db1215) taken on 2023-05-30 04:02:03 is 109 KiB, but the previous one was 93 KiB, a change of +17.1 % Jcrespo expected https://wikitech.wikimedia.org/wiki/MariaDB/Backups%23Rerun_a_failed_backup
[12:22:33] <wikibugs>	 (03CR) 10Filippo Giunchedi: [V: 03+1 C: 03+2] cadvisor: listen on main ip address only [puppet] - 10https://gerrit.wikimedia.org/r/924483 (https://phabricator.wikimedia.org/T337689) (owner: 10Filippo Giunchedi)
[12:24:43] <wikibugs>	 (03PS1) 10Jbond: install_console: provide a default for $2 [puppet] - 10https://gerrit.wikimedia.org/r/924497
[12:25:41] <wikibugs>	 (03PS2) 10Jbond: install_console: provide a default for $2 [puppet] - 10https://gerrit.wikimedia.org/r/924497 (https://phabricator.wikimedia.org/T117348)
[12:25:42] <jinxer-wm>	 (SystemdUnitFailed) firing: cadvisor.service Failed on elastic1058:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[12:25:44] <wikibugs>	 (03CR) 10Jelto: "one question in-line regarding blackbox checks." [puppet] - 10https://gerrit.wikimedia.org/r/923652 (https://phabricator.wikimedia.org/T300171) (owner: 10Dzahn)
[12:25:52] <wikibugs>	 (03CR) 10Jbond: [V: 03+2 C: 03+2] install_console: provide a default for $2 [puppet] - 10https://gerrit.wikimedia.org/r/924497 (https://phabricator.wikimedia.org/T117348) (owner: 10Jbond)
[12:26:32] <icinga-wm>	 PROBLEM - Check systemd state on db2146 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:26:36] <icinga-wm>	 PROBLEM - Check systemd state on durum3002 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:26:38] <icinga-wm>	 PROBLEM - Check systemd state on cp2027 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:27:08] <icinga-wm>	 PROBLEM - Check systemd state on cp2042 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:27:10] <icinga-wm>	 PROBLEM - Check systemd state on parse1008 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:27:16] <icinga-wm>	 PROBLEM - Check systemd state on cp3056 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:27:18] <icinga-wm>	 PROBLEM - Check systemd state on cp6012 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:27:18] <icinga-wm>	 PROBLEM - Check systemd state on mw2375 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:27:18] <icinga-wm>	 PROBLEM - Check systemd state on mc-gp1003 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:27:26] <icinga-wm>	 PROBLEM - Check systemd state on mw1476 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:27:28] <icinga-wm>	 PROBLEM - Check systemd state on dumpsdata1004 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:27:32] <icinga-wm>	 PROBLEM - Check systemd state on cp6007 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:27:32] <icinga-wm>	 PROBLEM - Check systemd state on cp3051 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:27:32] <icinga-wm>	 PROBLEM - Check systemd state on ncredir6001 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:27:32] <icinga-wm>	 PROBLEM - Check systemd state on cp2030 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:27:36] <icinga-wm>	 PROBLEM - Check systemd state on elastic1058 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:27:42] <icinga-wm>	 PROBLEM - Check systemd state on mw2298 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:27:47] <wikibugs>	 (03PS2) 10Bartosz Dziewoński: Hide 'editnotice-notext' message in VE (and mobile apps) [extensions/VisualEditor] (wmf/1.41.0-wmf.10) - 10https://gerrit.wikimedia.org/r/924159 (https://phabricator.wikimedia.org/T337633)
[12:27:52] <icinga-wm>	 PROBLEM - Check systemd state on mw1460 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:27:52] <icinga-wm>	 PROBLEM - Check systemd state on db1207 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:27:55] <godog>	 oh crap, sorry that's me
[12:27:56] <icinga-wm>	 PROBLEM - Check systemd state on doh6002 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:27:56] <icinga-wm>	 PROBLEM - Check systemd state on parse2016 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:27:57] <wikibugs>	 (03PS2) 10Bartosz Dziewoński: ve.ui.MWGalleryDialog: Fix showing the search panel [extensions/VisualEditor] (wmf/1.41.0-wmf.10) - 10https://gerrit.wikimedia.org/r/924160 (https://phabricator.wikimedia.org/T337638)
[12:28:02] <icinga-wm>	 PROBLEM - Check systemd state on druid1007 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:28:04] <icinga-wm>	 PROBLEM - Check systemd state on cp4047 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:28:06] <volans>	 godog:  flag provided but not defined: -listen
[12:28:12] <icinga-wm>	 PROBLEM - Check systemd state on elastic2073 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:28:12] <icinga-wm>	 PROBLEM - Check systemd state on mw1357 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:28:12] <icinga-wm>	 PROBLEM - Check systemd state on ganeti6001 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:28:18] <godog>	 thank you volans 
[12:28:22] <icinga-wm>	 PROBLEM - Check systemd state on parse2005 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:28:23] <godog>	 I'll revert
[12:28:26] <icinga-wm>	 PROBLEM - Check systemd state on mw1356 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:28:30] <icinga-wm>	 PROBLEM - Check systemd state on lvs1017 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:28:32] <icinga-wm>	 PROBLEM - Check systemd state on sessionstore2003 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:28:35] <wikibugs>	 (03PS1) 10Bartosz Dziewoński: Hide 'editnotice-notext' message in VE (and mobile apps) [extensions/VisualEditor] (wmf/1.41.0-wmf.11) - 10https://gerrit.wikimedia.org/r/924456 (https://phabricator.wikimedia.org/T337633)
[12:28:38] <wikibugs>	 (03PS1) 10Filippo Giunchedi: Revert "cadvisor: listen on main ip address only" [puppet] - 10https://gerrit.wikimedia.org/r/924457
[12:28:40] <icinga-wm>	 PROBLEM - Check systemd state on mw1418 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:28:42] <icinga-wm>	 PROBLEM - Check systemd state on mw1370 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:28:42] <icinga-wm>	 PROBLEM - Check systemd state on mc-wf1001 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:28:48] <wikibugs>	 (PS1) Bartosz Dziewoński: ve.ui.MWGalleryDialog: Fix showing the search panel [extensions/VisualEditor] (wmf/1.41.0-wmf.11) - https://gerrit.wikimedia.org/r/924458 (https://phabricator.wikimedia.org/T337638)
[12:28:50] <wikibugs>	 (CR) Filippo Giunchedi: [V: +2 C: +2] Revert "cadvisor: listen on main ip address only" [puppet] - https://gerrit.wikimedia.org/r/924457 (owner: Filippo Giunchedi)
[12:28:52] <icinga-wm>	 PROBLEM - Check systemd state on mw2421 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:28:56] <icinga-wm>	 PROBLEM - Check systemd state on ganeti2019 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:28:58] <icinga-wm>	 PROBLEM - Check systemd state on install6002 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:28:58] <icinga-wm>	 PROBLEM - Check systemd state on mw2300 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:29:04] <icinga-wm>	 PROBLEM - Check systemd state on mw1466 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:29:13] <bblack>	 s/listen/listen_ip/ I think :)
[12:29:32] <icinga-wm>	 PROBLEM - Check systemd state on mw1393 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:29:34] <icinga-wm>	 PROBLEM - Check systemd state on snapshot1015 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:29:35] <godog>	 yes
[12:29:36] <icinga-wm>	 PROBLEM - Check systemd state on mw2354 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:29:36] <icinga-wm>	 PROBLEM - Check systemd state on parse2013 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:29:38] <icinga-wm>	 PROBLEM - Check systemd state on mw2410 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:29:40] <icinga-wm>	 PROBLEM - Check systemd state on lvs6002 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:29:44] <volans>	 !log disabling puppet where cadvisor is present
[12:29:46] <icinga-wm>	 PROBLEM - Check systemd state on db2113 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:29:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:29:52] <icinga-wm>	 PROBLEM - Check systemd state on logstash1025 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:29:56] <icinga-wm>	 PROBLEM - Check systemd state on ms-be1074 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:29:56] <icinga-wm>	 PROBLEM - Check systemd state on cp3058 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:30:00] <icinga-wm>	 PROBLEM - Check systemd state on mw2355 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:30:00] <icinga-wm>	 PROBLEM - Check systemd state on dse-k8s-worker1006 is CRITICAL: CRITICAL - degraded: The following units failed: kubelet.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:30:00] <icinga-wm>	 PROBLEM - Check systemd state on mw1477 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:30:00] <icinga-wm>	 PROBLEM - Check systemd state on lvs3006 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:30:00] <icinga-wm>	 PROBLEM - Check systemd state on prometheus4002 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:30:00] <icinga-wm>	 PROBLEM - Check systemd state on mw1491 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:30:02] <icinga-wm>	 PROBLEM - Check systemd state on mw1407 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:30:06] <icinga-wm>	 PROBLEM - Check systemd state on an-airflow1003 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:30:07] <godog>	 thank you volans 
[12:30:10] <icinga-wm>	 PROBLEM - Check systemd state on cp3054 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:30:12] <icinga-wm>	 PROBLEM - Check systemd state on ganeti5006 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:30:12] <icinga-wm>	 PROBLEM - Check systemd state on cp4046 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:30:12] <icinga-wm>	 PROBLEM - Check systemd state on ganeti6002 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:30:14] <icinga-wm>	 RECOVERY - BGP status on cr3-ulsfo is OK: BGP OK - up: 93, down: 0, shutdown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[12:30:20] <icinga-wm>	 PROBLEM - Check systemd state on ores1007 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:30:30] <icinga-wm>	 PROBLEM - Check systemd state on db2162 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:30:34] <icinga-wm>	 PROBLEM - Check systemd state on mx2001 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:30:42] <icinga-wm>	 PROBLEM - Check systemd state on cp3063 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:30:42] <jinxer-wm>	 (SystemdUnitFailed) firing: (2) cadvisor.service Failed on elastic1058:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[12:30:48] <icinga-wm>	 PROBLEM - Check systemd state on cp4048 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:30:54] <icinga-wm>	 PROBLEM - Check systemd state on xhgui2001 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:30:56] <icinga-wm>	 PROBLEM - Check systemd state on mw1440 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:31:00] <icinga-wm>	 PROBLEM - Check systemd state on mw2292 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:31:18] <icinga-wm>	 PROBLEM - Check systemd state on gerrit1003 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:31:18] <icinga-wm>	 PROBLEM - Check systemd state on parse1016 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:31:24] <icinga-wm>	 PROBLEM - Check systemd state on cp4039 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:31:30] <icinga-wm>	 PROBLEM - Check systemd state on mw2383 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:31:30] <icinga-wm>	 PROBLEM - Check systemd state on mw2371 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:31:32] <icinga-wm>	 PROBLEM - Check systemd state on cp4052 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:31:36] <icinga-wm>	 PROBLEM - Check systemd state on mw1371 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:31:36] <icinga-wm>	 PROBLEM - Check systemd state on mw1444 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:31:38] <icinga-wm>	 PROBLEM - Check systemd state on mw1479 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:31:38] <icinga-wm>	 PROBLEM - Check systemd state on netflow4002 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:31:40] <icinga-wm>	 PROBLEM - Check systemd state on mw2443 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:31:44] <icinga-wm>	 PROBLEM - Check systemd state on poolcounter2003 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:31:46] <icinga-wm>	 PROBLEM - Check systemd state on mw2450 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:31:48] <icinga-wm>	 PROBLEM - Check systemd state on cp6011 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:31:50] <icinga-wm>	 PROBLEM - Check systemd state on mw2310 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:31:50] <icinga-wm>	 PROBLEM - Check systemd state on mw2367 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:31:56] <icinga-wm>	 PROBLEM - Check systemd state on mw1461 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:31:58] <icinga-wm>	 PROBLEM - Check systemd state on mw1424 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:32:00] <icinga-wm>	 PROBLEM - Check systemd state on mw2264 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:32:04] <icinga-wm>	 PROBLEM - Check systemd state on mw1488 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:32:24] <icinga-wm>	 PROBLEM - Check systemd state on mw2444 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:32:26] <icinga-wm>	 PROBLEM - Check systemd state on deploy2002 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:32:34] <icinga-wm>	 PROBLEM - Check systemd state on mw2260 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:32:34] <icinga-wm>	 PROBLEM - Check systemd state on mw1404 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:32:42] <wikibugs>	 (PS1) Muehlenhoff: Add a cookbook to drain a Ganeti node (WIP) [cookbooks] - https://gerrit.wikimedia.org/r/924498
[12:32:50] <icinga-wm>	 PROBLEM - Check systemd state on cp6004 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:33:06] <icinga-wm>	 PROBLEM - Check systemd state on mw2403 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:33:12] <wikibugs>	 (PS2) Muehlenhoff: Add a cookbook to drain a Ganeti node (WIP) [cookbooks] - https://gerrit.wikimedia.org/r/924498
[12:33:18] <icinga-wm>	 RECOVERY - puppetboard.wikimedia.org requires authentication on puppetboard1003 is OK: HTTP OK: Status line output matched HTTP/1.1 302 - 554 bytes in 1.047 second response time https://wikitech.wikimedia.org/wiki/CAS-SSO/Administration
[12:33:28] <wikibugs>	 (PS1) Filippo Giunchedi: cadvisor: listen on main ip address only [puppet] - https://gerrit.wikimedia.org/r/924500 (https://phabricator.wikimedia.org/T337689)
[12:33:40] <icinga-wm>	 RECOVERY - Check that envoy is running on puppetboard1003 is OK: OK - envoyproxy.service is active https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23Envoy
[12:34:08] <icinga-wm>	 PROBLEM - Check systemd state on mw2400 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:34:50] <icinga-wm>	 PROBLEM - Check systemd state on mw1426 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:34:54] <icinga-wm>	 PROBLEM - Check systemd state on mw2413 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:35:50] <wikibugs>	 (CR) CI reject: [V: -1] Add a cookbook to drain a Ganeti node (WIP) [cookbooks] - https://gerrit.wikimedia.org/r/924498 (owner: Muehlenhoff)
[12:37:08] <wikibugs>	 (CR) Ayounsi: [C: +1] cadvisor: listen on main ip address only [puppet] - https://gerrit.wikimedia.org/r/924500 (https://phabricator.wikimedia.org/T337689) (owner: Filippo Giunchedi)
[12:37:15] <wikibugs>	 (CR) Filippo Giunchedi: [C: +2] cadvisor: listen on main ip address only [puppet] - https://gerrit.wikimedia.org/r/924500 (https://phabricator.wikimedia.org/T337689) (owner: Filippo Giunchedi)
[12:37:44] <icinga-wm>	 PROBLEM - Check systemd state on mw1442 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:38:24] <icinga-wm>	 PROBLEM - Check systemd state on ms-be1068 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:38:26] <icinga-wm>	 PROBLEM - Check systemd state on mw1373 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:38:28] <icinga-wm>	 PROBLEM - Check systemd state on mw2388 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:38:50] <icinga-wm>	 PROBLEM - Check systemd state on install5002 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:38:52] <icinga-wm>	 PROBLEM - Check systemd state on mw1464 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:38:52] <icinga-wm>	 PROBLEM - Check systemd state on mw1457 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:39:12] <logmsgbot>	 !log gmodena@deploy1002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
[12:39:14] <icinga-wm>	 RECOVERY - Check systemd state on lvs1017 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:39:14] <logmsgbot>	 !log gmodena@deploy1002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
[12:39:20] <icinga-wm>	 PROBLEM - Check systemd state on cp4041 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:40:11] <wikibugs>	 (PS3) Muehlenhoff: Add a cookbook to drain a Ganeti node (WIP) [cookbooks] - https://gerrit.wikimedia.org/r/924498
[12:40:30] <icinga-wm>	 PROBLEM - Check systemd state on cp3064 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:42:28] <icinga-wm>	 PROBLEM - Check systemd state on db2158 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:42:34] <icinga-wm>	 PROBLEM - Check systemd state on maps2007 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:42:38] <icinga-wm>	 PROBLEM - Check systemd state on mw2321 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:42:39] <wikibugs>	 (CR) CI reject: [V: -1] Add a cookbook to drain a Ganeti node (WIP) [cookbooks] - https://gerrit.wikimedia.org/r/924498 (owner: Muehlenhoff)
[12:43:20] <icinga-wm>	 PROBLEM - Check systemd state on mw2438 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:43:20] <icinga-wm>	 PROBLEM - Check systemd state on parse2001 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:44:18] <icinga-wm>	 PROBLEM - Check systemd state on cp6014 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:44:28] <icinga-wm>	 PROBLEM - Check systemd state on mw1359 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:44:30] <icinga-wm>	 PROBLEM - Check systemd state on mw2272 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:44:34] <icinga-wm>	 PROBLEM - Check systemd state on parse1009 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:45:48] <icinga-wm>	 RECOVERY - Check systemd state on mw2438 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:45:51] <wikibugs>	 SRE-swift-storage: >=27k objects listed in swift containers but not extant - https://phabricator.wikimedia.org/T327253 (MatthewVernon) Open→Resolved a:MatthewVernon I think we can resolve this now; the remove-ghost-objects cookbook has helped, and recent `rclone` runs have successfully completed.
[12:46:16] <icinga-wm>	 PROBLEM - Check systemd state on ganeti1019 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:46:22] <icinga-wm>	 PROBLEM - Check systemd state on mw2359 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:46:55] <wikibugs>	 SRE-swift-storage: Upgrade Swift ms cluster to Bullseye and revisit mkfs.xfs options - https://phabricator.wikimedia.org/T279637 (MatthewVernon)
[12:46:58] <wikibugs>	 SRE-swift-storage: Swiftrepl doesn't work on bullseye (and swiftrepl.conf is deployed by hand) - https://phabricator.wikimedia.org/T299125 (MatthewVernon) Open→Resolved a:MatthewVernon I think we're now at the point where we can commit to our `rclone`-based replacement.
[12:48:00] <icinga-wm>	 PROBLEM - Check systemd state on cp4044 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:48:01] <logmsgbot>	 !log mvernon@cumin2002 START - Cookbook sre.hosts.reimage for host ms-fe2009.codfw.wmnet with OS bullseye
[12:48:06] <wikibugs>	 SRE-swift-storage: Upgrade Swift ms cluster to Bullseye and revisit mkfs.xfs options - https://phabricator.wikimedia.org/T279637 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by mvernon@cumin2002 for host ms-fe2009.codfw.wmnet with OS bullseye
[12:48:13] <wikibugs>	 (Restored) BBlack: service: Disable monitors for wikireplicas [puppet] - https://gerrit.wikimedia.org/r/924342 (https://phabricator.wikimedia.org/T337446) (owner: Vgutierrez)
[12:48:16] <icinga-wm>	 PROBLEM - Check systemd state on mw2366 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:48:24] <icinga-wm>	 RECOVERY - Check systemd state on an-airflow1003 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:48:36] <icinga-wm>	 PROBLEM - Check systemd state on cp6015 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:48:44] <wikibugs>	 (PS4) Muehlenhoff: Add a cookbook to drain a Ganeti node (WIP) [cookbooks] - https://gerrit.wikimedia.org/r/924498
[12:48:46] <icinga-wm>	 RECOVERY - Check systemd state on cp2030 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:49:01] <wikibugs>	 (PS3) BBlack: service: Disable monitors for wikireplicas [puppet] - https://gerrit.wikimedia.org/r/924342 (https://phabricator.wikimedia.org/T337446) (owner: Vgutierrez)
[12:49:04] <icinga-wm>	 PROBLEM - Check systemd state on db1178 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:49:18] <icinga-wm>	 PROBLEM - Check systemd state on mw2426 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:49:50] <volans>	 godog: still failing? ^^^
[12:50:22] <icinga-wm>	 PROBLEM - Check systemd state on cp5020 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:50:36] <icinga-wm>	 PROBLEM - Check systemd state on mw2363 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:50:48] <godog>	 volans: I think that's the race between the check and the puppet runs (ongoing)
[12:51:02] <icinga-wm>	 PROBLEM - Check systemd state on mw2259 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:51:03] <volans>	 ok
[12:51:27] <wikibugs>	 (CR) CI reject: [V: -1] Add a cookbook to drain a Ganeti node (WIP) [cookbooks] - https://gerrit.wikimedia.org/r/924498 (owner: Muehlenhoff)
[12:51:55] <logmsgbot>	 !log gmodena@deploy1002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
[12:51:58] <logmsgbot>	 !log gmodena@deploy1002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
[12:51:58] <icinga-wm>	 RECOVERY - Check systemd state on mw2426 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:52:22] <icinga-wm>	 PROBLEM - Check systemd state on mw1395 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:52:25] <wikibugs>	 (PS1) Arturo Borrero Gonzalez: cloudcontrol2005-dev: enable puppet role [puppet] - https://gerrit.wikimedia.org/r/924504 (https://phabricator.wikimedia.org/T336564)
[12:52:34] <icinga-wm>	 RECOVERY - Check systemd state on cp2027 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:52:34] <wikibugs>	 (CR) Vgutierrez: [V: +1] "PCC SUCCESS (): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/41405/console" [puppet] - https://gerrit.wikimedia.org/r/924342 (https://phabricator.wikimedia.org/T337446) (owner: Vgutierrez)
[12:52:54] <icinga-wm>	 PROBLEM - Check systemd state on mw2405 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:53:07] <wikibugs>	 (CR) Arturo Borrero Gonzalez: [C: +2] cloudcontrol2005-dev: enable puppet role [puppet] - https://gerrit.wikimedia.org/r/924504 (https://phabricator.wikimedia.org/T336564) (owner: Arturo Borrero Gonzalez)
[12:53:48] <icinga-wm>	 PROBLEM - Check systemd state on mw1414 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:53:48] <icinga-wm>	 RECOVERY - MariaDB read only s2 on clouddb1018 is OK: Version 10.4.22-MariaDB, Uptime 5s, read_only: True, event_scheduler: False, 13.63 QPS, connection latency: 0.004024s, query latency: 0.002028s https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Master_comes_back_in_read_only
[12:53:58] <icinga-wm>	 PROBLEM - Check systemd state on cp1090 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:54:12] <icinga-wm>	 PROBLEM - Check systemd state on lvs6003 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:54:14] <icinga-wm>	 PROBLEM - Check systemd state on mw1364 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:54:16] <icinga-wm>	 RECOVERY - Check systemd state on mw1357 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:54:16] <icinga-wm>	 PROBLEM - Check systemd state on mw2335 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:54:22] <wikibugs>	 (CR) Jgiannelos: [C: +2] wikifeeds: pipeline bot promote [deployment-charts] - https://gerrit.wikimedia.org/r/922543 (owner: PipelineBot)
[12:54:28] <icinga-wm>	 RECOVERY - mysqld processes on clouddb1018 is OK: PROCS OK: 2 processes with command name mysqld https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting
[12:54:32] <MatmaRex>	 jouncebot: next
[12:54:32] <jouncebot>	 In 0 hour(s) and 5 minute(s): UTC afternoon backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20230530T1300)
[12:54:32] <jouncebot>	 In 0 hour(s) and 5 minute(s): Mobileapps/RESTBase/Wikifeeds (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20230530T1300)
[12:54:39] <wikibugs>	 (PS2) Ayounsi: Spicerack: add some colors [software/spicerack] - https://gerrit.wikimedia.org/r/924493
[12:54:46] <icinga-wm>	 RECOVERY - MariaDB Replica SQL: s2 on clouddb1018 is OK: OK slave_sql_state not a slave https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[12:54:48] <MatmaRex>	 is the deployment proceeding as scheduled? or are we having some outage?
[12:54:51] <wikibugs>	 (PS4) DDesouza: Deploy Research Incentive survey on enwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/917863 (https://phabricator.wikimedia.org/T336092)
[12:55:02] <icinga-wm>	 RECOVERY - Check systemd state on elastic2073 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:55:06] <icinga-wm>	 RECOVERY - Check systemd state on ganeti6001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:55:06] <icinga-wm>	 RECOVERY - Check systemd state on mw1460 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:55:10] <icinga-wm>	 RECOVERY - Check systemd state on mw2355 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:55:14] <icinga-wm>	 RECOVERY - Check systemd state on mw2403 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:55:16] <wikibugs>	 (Merged) jenkins-bot: wikifeeds: pipeline bot promote [deployment-charts] - https://gerrit.wikimedia.org/r/922543 (owner: PipelineBot)
[12:55:18] <MatmaRex>	 there's a lot of patches scheduled for this one btw. sorry about that
[12:55:22] <icinga-wm>	 RECOVERY - Check systemd state on mw2375 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:55:24] <icinga-wm>	 RECOVERY - Check systemd state on cp3056 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:55:25] <godog>	 MatmaRex: please go ahead, just monitoring spam
[12:55:26] <icinga-wm>	 RECOVERY - Check systemd state on mw1476 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:55:30] <icinga-wm>	 RECOVERY - Check systemd state on elastic1058 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:55:36] <icinga-wm>	 RECOVERY - Check systemd state on mw1466 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:55:48] <icinga-wm>	 PROBLEM - Check systemd state on cp1089 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:55:50] <icinga-wm>	 RECOVERY - Check systemd state on mw1393 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:55:56] <icinga-wm>	 RECOVERY - Check systemd state on mw1371 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:56:04] <icinga-wm>	 RECOVERY - Check systemd state on mw1356 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:56:12] <icinga-wm>	 RECOVERY - Check systemd state on lvs6002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:56:16] <icinga-wm>	 RECOVERY - Check systemd state on mw2363 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:56:18] <wikibugs>	 (03PS5) 10Muehlenhoff: Add a cookbook to drain a Ganeti node (WIP) [cookbooks] - 10https://gerrit.wikimedia.org/r/924498
[12:56:23] <wikibugs>	 (03CR) 10BBlack: [C: 03+2] service: Disable monitors for wikireplicas [puppet] - 10https://gerrit.wikimedia.org/r/924342 (https://phabricator.wikimedia.org/T337446) (owner: 10Vgutierrez)
[12:56:26] <icinga-wm>	 RECOVERY - Check systemd state on mw1418 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:56:30] <icinga-wm>	 RECOVERY - Check systemd state on mw1370 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:56:30] <icinga-wm>	 RECOVERY - puppetboard.wikimedia.org tls expiry on puppetboard1003 is OK: OK - Certificate puppetboard.discovery.wmnet will expire on Tue 27 Jun 2023 09:36:00 AM GMT +0000. https://wikitech.wikimedia.org/wiki/CAS-SSO/Administration
[12:56:34] <icinga-wm>	 RECOVERY - Check systemd state on mw1373 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:56:36] <icinga-wm>	 RECOVERY - Check systemd state on mw1477 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:56:40] <icinga-wm>	 RECOVERY - Check systemd state on mw1407 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:56:40] <icinga-wm>	 RECOVERY - Check systemd state on cp3058 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:56:40] <icinga-wm>	 RECOVERY - Check systemd state on mw2388 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:56:44] <icinga-wm>	 RECOVERY - Check systemd state on mw1395 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:56:46] <wikibugs>	 (03CR) 10Jbond: [C: 03+2] build_envoy_deb: update to work with bookworm [puppet] - 10https://gerrit.wikimedia.org/r/924482 (owner: 10Jbond)
[12:56:50] <icinga-wm>	 RECOVERY - Check systemd state on mw2421 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:56:56] <icinga-wm>	 RECOVERY - Check systemd state on ganeti2019 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:57:06] <icinga-wm>	 RECOVERY - Check systemd state on install6002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:57:08] <icinga-wm>	 RECOVERY - Check systemd state on mw1442 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:57:08] <icinga-wm>	 RECOVERY - Check systemd state on lvs6003 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:57:12] <icinga-wm>	 RECOVERY - Check systemd state on mw2335 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:57:14] <icinga-wm>	 RECOVERY - Check systemd state on mw2405 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:57:28] <icinga-wm>	 RECOVERY - Check systemd state on mw1479 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:57:32] <icinga-wm>	 RECOVERY - Check systemd state on mw2354 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:57:34] <icinga-wm>	 RECOVERY - Check systemd state on mw2410 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:57:38] <icinga-wm>	 PROBLEM - Check systemd state on mw2376 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:57:40] <icinga-wm>	 RECOVERY - Check systemd state on mw1426 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:57:48] <icinga-wm>	 RECOVERY - Check systemd state on mw2413 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:57:58] <icinga-wm>	 RECOVERY - Check systemd state on logstash1025 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:57:58] <icinga-wm>	 RECOVERY - Check systemd state on mw1440 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:58:04] <icinga-wm>	 PROBLEM - Check systemd state on zookeeper-test1002 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:58:14] <icinga-wm>	 RECOVERY - Check systemd state on mw2400 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:58:20] <icinga-wm>	 RECOVERY - Check systemd state on lvs3006 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:58:36] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] Add a cookbook to drain a Ganeti node (WIP) [cookbooks] - 10https://gerrit.wikimedia.org/r/924498 (owner: 10Muehlenhoff)
[12:58:38] <icinga-wm>	 RECOVERY - Check systemd state on ganeti6002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:58:40] <icinga-wm>	 RECOVERY - Check systemd state on mw1364 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:58:42] <icinga-wm>	 RECOVERY - Check systemd state on ganeti5006 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:58:46] <icinga-wm>	 RECOVERY - Check systemd state on cp4039 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:58:52] <icinga-wm>	 RECOVERY - Check systemd state on mw2383 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:59:16] <icinga-wm>	 PROBLEM - Check systemd state on cp6016 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:59:24] <icinga-wm>	 RECOVERY - Check systemd state on mw2366 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:59:28] <icinga-wm>	 RECOVERY - Check systemd state on mw1414 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:59:29] <bblack>	 !lvs1020: restart pybal to test disabling wikireplicas monitoring
[12:59:30] <icinga-wm>	 PROBLEM - Check systemd state on mw2285 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:59:34] <icinga-wm>	 PROBLEM - Check systemd state on parse1011 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:59:46] <icinga-wm>	 RECOVERY - Check systemd state on gerrit1003 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:59:46] <marostegui>	 bblack: let me know if the test goes well and I can shutdown more instances
[12:59:53] <marostegui>	 bblack: more wikireplicas instances, that is
[12:59:56] <icinga-wm>	 RECOVERY - Check systemd state on mw1464 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:59:56] <icinga-wm>	 RECOVERY - Check systemd state on mw1457 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:00:00] <icinga-wm>	 RECOVERY - Check systemd state on mw2444 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:00:06] <jouncebot>	 RoanKattouw, Lucas_WMDE, Urbanecm, awight, TheresNoTime, and taavi: It is that lovely time of the day again! You are hereby commanded to deploy UTC afternoon backport window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20230530T1300).
[13:00:06] <icinga-wm>	 RECOVERY - Check systemd state on install5002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:00:06] <jouncebot>	 tgr, matthiasmullie, Daimona, HouseOfM, and MatmaRex: A patch you scheduled for UTC afternoon backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[13:00:06] <jouncebot>	 Deploy window Mobileapps/RESTBase/Wikifeeds (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20230530T1300)
[13:00:08] <icinga-wm>	 RECOVERY - Check systemd state on mw2371 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:00:08] <icinga-wm>	 RECOVERY - Check systemd state on ganeti1019 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:00:09] <matthiasmullie>	 o/ my 2 patches don't need mwdebug/testing; they only affect a not currently running maint script that I will execute later.
[13:00:12] <icinga-wm>	 RECOVERY - Check systemd state on mw1404 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:00:25] <Daimona>	 o/
[13:00:28] <icinga-wm>	 RECOVERY - Check systemd state on mw1359 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:00:30] <logmsgbot>	 !log aborrero@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcontrol2005-dev.codfw.wmnet with reason: host reimage
[13:00:42] <jinxer-wm>	 (SystemdUnitFailed) resolved: (2) cadvisor.service Failed on elastic1058:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[13:00:44] <TheresNoTime>	 (Unable to deploy today, meeting D:)
[13:00:55] <tgr_>	 o/ my patch is a noop in production 
[13:01:12] <icinga-wm>	 RECOVERY - Check systemd state on mw2321 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:02:42] <icinga-wm>	 RECOVERY - Check systemd state on cp3063 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:02:49] <wikibugs>	 (03CR) 10Andrew Bogott: "I am yet again waiting 10 minutes for cumin runs that would take seconds with this patch." [software/cumin] - 10https://gerrit.wikimedia.org/r/869332 (https://phabricator.wikimedia.org/T325773) (owner: 10Andrew Bogott)
[13:03:44] <logmsgbot>	 !log aborrero@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcontrol2005-dev.codfw.wmnet with reason: host reimage
[13:04:56] <logmsgbot>	 !log jgiannelos@deploy1002 helmfile [staging] START helmfile.d/services/wikifeeds: apply
[13:06:09] <logmsgbot>	 !log jgiannelos@deploy1002 helmfile [staging] START helmfile.d/services/wikifeeds: apply
[13:06:21] <matthiasmullie>	 Any deployer around?
[13:06:22] <logmsgbot>	 !log mvernon@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on ms-fe2009.codfw.wmnet with reason: host reimage
[13:06:28] <icinga-wm>	 RECOVERY - Check systemd state on mw2359 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:06:34] <logmsgbot>	 !log jgiannelos@deploy1002 helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
[13:07:20] <TheresNoTime>	 (I'll be out of this meeting in ~30mins, but there's a lot to do, so ideally another deployer could pick this up)
[13:07:40] <matthiasmullie>	 In the interest of time, I can go ahead and self-deploy mine
[13:07:41] <wikibugs>	 (03PS1) 10Elukey: varnishkafka: add catch all systemd unit [puppet] - 10https://gerrit.wikimedia.org/r/924506
[13:07:43] <wikibugs>	 (03PS1) 10Elukey: profile::cache::kafka: add support for PKI [puppet] - 10https://gerrit.wikimedia.org/r/924507
[13:08:00] <icinga-wm>	 RECOVERY - Check systemd state on cp4041 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:08:10] <wikibugs>	 10SRE, 10Observability-Metrics, 10Patch-For-Review, 10SRE Observability (FY2022/2023-Q4), 10User-fgiunchedi: Collect per-cgroup cpu/mem and other system level metrics - https://phabricator.wikimedia.org/T108027 (10lmata)
[13:08:19] <wikibugs>	 (03CR) 10TrainBranchBot: [C: 03+2] "Approved by mlitn@deploy1002 using scap backport" [extensions/ImageSuggestions] (wmf/1.41.0-wmf.10) - 10https://gerrit.wikimedia.org/r/924454 (owner: 10Matthias Mullie)
[13:08:21] <wikibugs>	 (03CR) 10TrainBranchBot: [C: 03+2] "Approved by mlitn@deploy1002 using scap backport" [extensions/ImageSuggestions] (wmf/1.41.0-wmf.11) - 10https://gerrit.wikimedia.org/r/924455 (owner: 10Matthias Mullie)
[13:08:27] <matthiasmullie>	 starting mine
[13:08:30] <logmsgbot>	 !log jgiannelos@deploy1002 helmfile [codfw] START helmfile.d/services/wikifeeds: apply
[13:08:30] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] varnishkafka: add catch all systemd unit [puppet] - 10https://gerrit.wikimedia.org/r/924506 (owner: 10Elukey)
[13:08:38] <wikibugs>	 (03PS6) 10Muehlenhoff: Add a cookbook to drain a Ganeti node (WIP) [cookbooks] - 10https://gerrit.wikimedia.org/r/924498
[13:08:53] <matthiasmullie>	 tgr_: can you self-deploy? mine will take some time to pass CI - your config patch should be quick I suppose?
[13:08:53] <bblack>	 !log lvs1018: restart pybal for wikireplicas monitoring removal
[13:08:56] <wikibugs>	 (03CR) 10Volans: Openstack backend: make use of all_tenants nova api flag (031 comment) [software/cumin] - 10https://gerrit.wikimedia.org/r/869332 (https://phabricator.wikimedia.org/T325773) (owner: 10Andrew Bogott)
[13:08:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:09:03] <logmsgbot>	 !log jgiannelos@deploy1002 helmfile [codfw] DONE helmfile.d/services/wikifeeds: apply
[13:09:05] <tgr_>	 will do
[13:09:19] <logmsgbot>	 !log jgiannelos@deploy1002 helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
[13:09:35] <logmsgbot>	 !log mvernon@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-fe2009.codfw.wmnet with reason: host reimage
[13:09:46] <wikibugs>	 (03CR) 10TrainBranchBot: [C: 03+2] "Approved by tgr@deploy1002 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/924079 (owner: 10Gergő Tisza)
[13:09:48] <logmsgbot>	 !log jgiannelos@deploy1002 helmfile [eqiad] DONE helmfile.d/services/wikifeeds: apply
[13:09:57] <bblack>	 marostegui: it seems to be functioning as intended (no checks on wikireplicas)
[13:10:37] <wikibugs>	 (03Merged) 10jenkins-bot: GrowthExperiments: Re-add $wgGERestbaseUrl [mediawiki-config] - 10https://gerrit.wikimedia.org/r/924079 (owner: 10Gergő Tisza)
[13:10:45] <jinxer-wm>	 (NodeTextfileStale) resolved: Stale textfile for cloudcontrol2001-dev:9100 - https://wikitech.wikimedia.org/wiki/Prometheus#Stale_file_for_node-exporter_textfile - https://grafana.wikimedia.org/d/knkl4dCWz/node-exporter-textfile - https://alerts.wikimedia.org/?q=alertname%3DNodeTextfileStale
[13:10:58] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] Add a cookbook to drain a Ganeti node (WIP) [cookbooks] - 10https://gerrit.wikimedia.org/r/924498 (owner: 10Muehlenhoff)
[13:11:06] <logmsgbot>	 !log tgr@deploy1002 Started scap: Backport for [[gerrit:924079|GrowthExperiments: Re-add $wgGERestbaseUrl]]
[13:11:13] <logmsgbot>	 !log herron@cumin1001 START - Cookbook sre.hosts.reimage for host mwlog2002.codfw.wmnet with OS bullseye
[13:11:26] <wikibugs>	 (03PS2) 10Elukey: varnishkafka: add catch all systemd unit [puppet] - 10https://gerrit.wikimedia.org/r/924506
[13:11:28] <wikibugs>	 (03PS2) 10Elukey: profile::cache::kafka: add support for PKI [puppet] - 10https://gerrit.wikimedia.org/r/924507
[13:11:46] <icinga-wm>	 RECOVERY - Check systemd state on mw1444 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:13:05] <logmsgbot>	 !log tgr@deploy1002 tgr: Backport for [[gerrit:924079|GrowthExperiments: Re-add $wgGERestbaseUrl]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
[13:13:08] <wikibugs>	 (03PS3) 10Elukey: profile::cache::kafka: add support for PKI [puppet] - 10https://gerrit.wikimedia.org/r/924507
[13:13:32] <icinga-wm>	 RECOVERY - Check systemd state on mw2376 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:13:35] <wikibugs>	 (03CR) 10Elukey: [V: 03+1] "PCC SUCCESS (): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/41407/console" [puppet] - 10https://gerrit.wikimedia.org/r/924506 (owner: 10Elukey)
[13:14:49] <wikibugs>	 (03PS7) 10Muehlenhoff: Add a cookbook to drain a Ganeti node (WIP) [cookbooks] - 10https://gerrit.wikimedia.org/r/924498
[13:15:39] <wikibugs>	 (03CR) 10Elukey: [V: 03+1] "PCC SUCCESS (): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/41408/console" [puppet] - 10https://gerrit.wikimedia.org/r/924507 (owner: 10Elukey)
[13:17:10] <icinga-wm>	 RECOVERY - Check systemd state on mw2367 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:17:27] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] Add a cookbook to drain a Ganeti node (WIP) [cookbooks] - 10https://gerrit.wikimedia.org/r/924498 (owner: 10Muehlenhoff)
[13:17:52] <wikibugs>	 (03CR) 10Slyngshede: [V: 03+2 C: 03+2] WMF signup message, stray " [software/bitu] - 10https://gerrit.wikimedia.org/r/924439 (owner: 10Slyngshede)
[13:18:12] <wikibugs>	 (03PS4) 10Elukey: profile::cache::kafka: add support for PKI [puppet] - 10https://gerrit.wikimedia.org/r/924507
[13:18:19] <MatmaRex>	 tgr_: matthiasmullie: would either of you be able to deploy my patches as well afterwards?
[13:19:31] <wikibugs>	 (03CR) 10Elukey: [V: 03+1] "PCC SUCCESS (): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/41409/console" [puppet] - 10https://gerrit.wikimedia.org/r/924507 (owner: 10Elukey)
[13:19:32] <matthiasmullie>	 MatmaRex: sorry, I can't today (watching my 1yr old, too much of a distraction to deal with unforeseen circumstances)
[13:19:56] <marostegui>	 bblack: excellent thanks
[13:20:16] <MatmaRex>	 :)
[13:20:32] <logmsgbot>	 !log tgr@deploy1002 Finished scap: Backport for [[gerrit:924079|GrowthExperiments: Re-add $wgGERestbaseUrl]] (duration: 09m 26s)
[13:20:42] <wikibugs>	 (03PS5) 10Elukey: profile::cache::kafka: add support for PKI [puppet] - 10https://gerrit.wikimedia.org/r/924507
[13:20:48] <icinga-wm>	 RECOVERY - Check systemd state on mw1424 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:20:49] <matthiasmullie>	 but sounded like TheresNoTime may be available by the time tgr_ & me are done
[13:20:50] <tgr_>	 matthiasmullie: done
[13:20:56] <matthiasmullie>	 tgr_: rgr thanks
[13:21:07] <tgr_>	 MatmaRex: I can deploy the rest once matthiasmullie is finished
[13:21:41] <MatmaRex>	 thanks
[13:21:46] <tgr_>	 well, whatever gets in before the end of the hour, I have a meeting afterwards
[13:21:51] <wikibugs>	 (03PS1) 10BBlack: wikireplicas: restore pybal monitoring [puppet] - 10https://gerrit.wikimedia.org/r/924508 (https://phabricator.wikimedia.org/T337446)
[13:21:57] <wikibugs>	 (03CR) 10Elukey: [V: 03+1] "PCC SUCCESS (): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/41410/console" [puppet] - 10https://gerrit.wikimedia.org/r/924507 (owner: 10Elukey)
[13:22:36] <icinga-wm>	 RECOVERY - Check systemd state on mw2310 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:23:36] <icinga-wm>	 RECOVERY - MariaDB Replica Lag: s3 on clouddb1013 is OK: OK slave_sql_lag not a slave https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[13:23:42] <icinga-wm>	 RECOVERY - MariaDB Replica IO: s3 on clouddb1013 is OK: OK slave_io_state not a slave https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[13:24:06] <wikibugs>	 (03Merged) 10jenkins-bot: Fix maxJobs default [extensions/ImageSuggestions] (wmf/1.41.0-wmf.10) - 10https://gerrit.wikimedia.org/r/924454 (owner: 10Matthias Mullie)
[13:24:20] <icinga-wm>	 RECOVERY - Check systemd state on clouddb1018 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:24:28] <icinga-wm>	 RECOVERY - Check systemd state on mw2443 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:25:16] <wikibugs>	 (03Merged) 10jenkins-bot: Fix maxJobs default [extensions/ImageSuggestions] (wmf/1.41.0-wmf.11) - 10https://gerrit.wikimedia.org/r/924455 (owner: 10Matthias Mullie)
[13:25:48] <logmsgbot>	 !log mlitn@deploy1002 Started scap: Backport for [[gerrit:924454|Fix maxJobs default]], [[gerrit:924455|Fix maxJobs default]]
[13:26:36] <wikibugs>	 (03PS1) 10Elukey: Move cp4037's varnishkafka instances to PKI [puppet] - 10https://gerrit.wikimedia.org/r/924509
[13:27:16] <logmsgbot>	 !log mlitn@deploy1002 mlitn: Backport for [[gerrit:924454|Fix maxJobs default]], [[gerrit:924455|Fix maxJobs default]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
[13:27:48] <icinga-wm>	 RECOVERY - Check systemd state on cp3064 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:28:01] <wikibugs>	 (03CR) 10Elukey: [V: 03+1] "PCC SUCCESS (): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/41411/console" [puppet] - 10https://gerrit.wikimedia.org/r/924509 (owner: 10Elukey)
[13:28:06] <icinga-wm>	 RECOVERY - Check systemd state on mw2450 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:28:41] <wikibugs>	 (03PS8) 10Muehlenhoff: Add a cookbook to drain a Ganeti node (WIP) [cookbooks] - 10https://gerrit.wikimedia.org/r/924498
[13:29:08] <matthiasmullie>	 tgr_: syncing mine; you might want to start +2 those other patches already?
[13:29:35] <wikibugs>	 (03PS1) 10ArielGlenn: fix up sample nfs share mount command in docs for dumps nfs share testing [puppet] - 10https://gerrit.wikimedia.org/r/924510 (https://phabricator.wikimedia.org/T325232)
[13:29:47] <wikibugs>	 (03CR) 10Gergő Tisza: [C: 03+2] editpage: Change the order of hooks slightly for FlaggedRevs [core] (wmf/1.41.0-wmf.10) - 10https://gerrit.wikimedia.org/r/924158 (https://phabricator.wikimedia.org/T337637) (owner: 10Bartosz Dziewoński)
[13:31:01] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] Add a cookbook to drain a Ganeti node (WIP) [cookbooks] - 10https://gerrit.wikimedia.org/r/924498 (owner: 10Muehlenhoff)
[13:32:22] <icinga-wm>	 PROBLEM - Check systemd state on cumin2002 is CRITICAL: CRITICAL - degraded: The following units failed: httpbb_kubernetes_mw-api-ext_hourly.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:33:27] <logmsgbot>	 !log mlitn@deploy1002 Finished scap: Backport for [[gerrit:924454|Fix maxJobs default]], [[gerrit:924455|Fix maxJobs default]] (duration: 07m 39s)
[13:33:38] <icinga-wm>	 PROBLEM - Check systemd state on puppetmaster1001 is CRITICAL: CRITICAL - degraded: The following units failed: fetch-rings-codfw.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:33:43] <matthiasmullie>	 tgr_: I'm done; the floor is all yours
[13:33:50] <tgr_>	 thx
[13:34:07] <wikibugs>	 (03CR) 10Gergő Tisza: [C: 03+2] beta: Remove $wgCampaignEventsEnableMultipleOrganizers [mediawiki-config] - 10https://gerrit.wikimedia.org/r/909401 (https://phabricator.wikimedia.org/T334088) (owner: 10Cmelo)
[13:34:54] <wikibugs>	 (03Merged) 10jenkins-bot: beta: Remove $wgCampaignEventsEnableMultipleOrganizers [mediawiki-config] - 10https://gerrit.wikimedia.org/r/909401 (https://phabricator.wikimedia.org/T334088) (owner: 10Cmelo)
[13:35:14] <wikibugs>	 (03PS1) 10KartikMistry: Enable Content and Section Translation for 9 Wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/924511 (https://phabricator.wikimedia.org/T337290)
[13:37:26] <wikibugs>	 (03CR) 10Vgutierrez: [C: 03+1] cache::upload: Add hieradata to switch HTTPS redirection from Varnish to HAProxy [puppet] - 10https://gerrit.wikimedia.org/r/924444 (https://phabricator.wikimedia.org/T323557) (owner: 10Fabfur)
[13:38:27] <tgr_>	 Daimona: do you want to test the bet change or should I go on with the prod one?
[13:38:55] <tgr_>	 s/bet/beta/
[13:39:02] <Daimona>	 tgr_: thanks, I think you can go ahead with prod!
[13:39:09] <wikibugs>	 (03PS9) 10Muehlenhoff: Add a cookbook to drain a Ganeti node (WIP) [cookbooks] - 10https://gerrit.wikimedia.org/r/924498
[13:39:28] <wikibugs>	 (03CR) 10TrainBranchBot: [C: 03+2] "Approved by tgr@deploy1002 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/924488 (https://phabricator.wikimedia.org/T334088) (owner: 10Daimona Eaytoy)
[13:40:27] <wikibugs>	 (03Merged) 10jenkins-bot: prod: Remove $wgCampaignEventsEnableMultipleOrganizers [mediawiki-config] - 10https://gerrit.wikimedia.org/r/924488 (https://phabricator.wikimedia.org/T334088) (owner: 10Daimona Eaytoy)
[13:40:32] <icinga-wm>	 PROBLEM - Check unit status of httpbb_kubernetes_mw-api-ext_hourly on cumin2002 is CRITICAL: CRITICAL: Status of the systemd unit httpbb_kubernetes_mw-api-ext_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[13:40:53] <logmsgbot>	 !log tgr@deploy1002 Started scap: Backport for [[gerrit:924488|prod: Remove $wgCampaignEventsEnableMultipleOrganizers (T334088)]]
[13:40:58] <stashbot>	 T334088: Enable the multiple organizers feature in production - https://phabricator.wikimedia.org/T334088
[13:41:04] <godog>	 there will be another shower of recoveries btw
[13:41:07] <godog>	 all harmless
[13:42:19] <wikibugs>	 (03PS1) 10MVernon: hira: disable swiftrepl [puppet] - 10https://gerrit.wikimedia.org/r/924516 (https://phabricator.wikimedia.org/T279637)
[13:42:27] <logmsgbot>	 !log tgr@deploy1002 tgr and daimona: Backport for [[gerrit:924488|prod: Remove $wgCampaignEventsEnableMultipleOrganizers (T334088)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
[13:42:44] <tgr_>	 Daimona: ^^
[13:42:48] <wikibugs>	 (03CR) 10MVernon: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/924516 (https://phabricator.wikimedia.org/T279637) (owner: 10MVernon)
[13:43:12] <wikibugs>	 (03PS1) 10Muehlenhoff: Setup debmonitor2003 as bookworm debmonitor VM [puppet] - 10https://gerrit.wikimedia.org/r/924517 (https://phabricator.wikimedia.org/T241049)
[13:44:01] <Daimona>	 Thanks! HouseOfM: we can now test that the multiple organizers feature is still appearing in production wikis (testwiki, test2wiki, meta, officewiki)
[13:44:12] <icinga-wm>	 RECOVERY - Check systemd state on cp2042 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:44:24] <HouseOfM>	 Sweet, thanks @tgr
[13:44:32] <icinga-wm>	 RECOVERY - Check systemd state on cp6004 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:44:44] <icinga-wm>	 RECOVERY - Check systemd state on cp6007 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:44:49] <Daimona>	 You can choose any mwdebug server from the dropdown
[13:45:10] <icinga-wm>	 RECOVERY - Check systemd state on cp6014 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:45:32] <icinga-wm>	 RECOVERY - Check systemd state on cp6011 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:45:50] <wikibugs>	 (03PS10) 10Muehlenhoff: Add a cookbook to drain a Ganeti node [cookbooks] - 10https://gerrit.wikimedia.org/r/924498 (https://phabricator.wikimedia.org/T203964)
[13:45:56] <wikibugs>	 (03Merged) 10jenkins-bot: editpage: Change the order of hooks slightly for FlaggedRevs [core] (wmf/1.41.0-wmf.10) - 10https://gerrit.wikimedia.org/r/924158 (https://phabricator.wikimedia.org/T337637) (owner: 10Bartosz Dziewoński)
[13:46:00] <icinga-wm>	 RECOVERY - Check systemd state on cp6015 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:46:22] <icinga-wm>	 RECOVERY - Check systemd state on cp6016 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:46:42] <icinga-wm>	 RECOVERY - Check systemd state on cp1090 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:46:49] <wikibugs>	 (CR) Jcrespo: [C: +1] hira: disable swiftrepl [puppet] - https://gerrit.wikimedia.org/r/924516 (https://phabricator.wikimedia.org/T279637) (owner: MVernon)
[13:47:04] <icinga-wm>	 RECOVERY - Check systemd state on parse1011 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:47:24] <Daimona>	 Looking good to me
[13:47:26] <wikibugs>	 (CR) MVernon: [C: +2] hira: disable swiftrepl [puppet] - https://gerrit.wikimedia.org/r/924516 (https://phabricator.wikimedia.org/T279637) (owner: MVernon)
[13:47:28] <icinga-wm>	 RECOVERY - Check systemd state on cp1089 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:48:06] <icinga-wm>	 RECOVERY - Check systemd state on cp3051 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:48:10] <icinga-wm>	 RECOVERY - Check systemd state on cp5020 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:48:28] <icinga-wm>	 RECOVERY - Check systemd state on ores1007 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:49:24] <HouseOfM>	 @Daimona, all good
[13:49:28] <icinga-wm>	 RECOVERY - Check systemd state on cp4044 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:49:32] <icinga-wm>	 RECOVERY - Check systemd state on cp3054 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:49:48] <Daimona>	 Cool, thanks. @tgr_ you can proceed
[13:49:50] <icinga-wm>	 RECOVERY - Check systemd state on db2113 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:50:04] <icinga-wm>	 RECOVERY - Check systemd state on cp4047 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:50:04] <icinga-wm>	 RECOVERY - Check systemd state on db2162 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:50:30] <icinga-wm>	 RECOVERY - Check systemd state on db2146 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:50:36] <icinga-wm>	 RECOVERY - Check systemd state on db1178 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:50:40] <logmsgbot>	 !log mvernon@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-fe2009.codfw.wmnet with OS bullseye
[13:50:46] <icinga-wm>	 RECOVERY - Check systemd state on cp4052 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:50:46] <wikibugs>	 SRE-swift-storage, Patch-For-Review: Upgrade Swift ms cluster to Bullseye and revisit mkfs.xfs options - https://phabricator.wikimedia.org/T279637 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by mvernon@cumin2002 for host ms-fe2009.codfw.wmnet with OS bullseye completed: - ms-fe2009...
[13:50:54] <icinga-wm>	 RECOVERY - Check systemd state on cp4046 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:50:54] <icinga-wm>	 RECOVERY - Check systemd state on db2158 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:51:16] <icinga-wm>	 RECOVERY - Check systemd state on cp4048 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:51:46] <icinga-wm>	 RECOVERY - Check systemd state on deploy2002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:51:54] <icinga-wm>	 RECOVERY - Check systemd state on db1207 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:52:06] <icinga-wm>	 RECOVERY - Check systemd state on doh6002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:52:22] <icinga-wm>	 RECOVERY - Check systemd state on druid1007 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:52:24] <icinga-wm>	 RECOVERY - Check systemd state on dumpsdata1004 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:52:40] <icinga-wm>	 RECOVERY - Check systemd state on durum3002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:53:00] <icinga-wm>	 RECOVERY - Check systemd state on dse-k8s-worker1006 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:53:53] <wikibugs>	 (CR) Hokwelum: [C: +1] "looks good :-)" [puppet] - https://gerrit.wikimedia.org/r/924510 (https://phabricator.wikimedia.org/T325232) (owner: ArielGlenn)
[13:54:22] <icinga-wm>	 RECOVERY - Check systemd state on ncredir6001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:55:14] <wikibugs>	 (PS1) Jelto: gitlab: use production idp for gitlab hosts [puppet] - https://gerrit.wikimedia.org/r/924525 (https://phabricator.wikimedia.org/T320390)
[13:55:16] <icinga-wm>	 RECOVERY - Check systemd state on mw2259 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:55:20] <icinga-wm>	 RECOVERY - Check systemd state on mc-gp1003 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:55:26] <icinga-wm>	 RECOVERY - Check systemd state on ms-be1068 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:55:28] <icinga-wm>	 RECOVERY - Check systemd state on mc-wf1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:55:32] <icinga-wm>	 RECOVERY - Check systemd state on parse2016 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:55:44] <icinga-wm>	 RECOVERY - Check systemd state on parse2005 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:55:48] <icinga-wm>	 RECOVERY - Check systemd state on sessionstore2003 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:55:56] <icinga-wm>	 RECOVERY - Check systemd state on parse1008 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:55:58] <icinga-wm>	 RECOVERY - Check systemd state on maps2007 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:55:58] <logmsgbot>	 !log mvernon@cumin2002 conftool action : set/pooled=yes; selector: service=swift-fe,name=ms-fe2009.codfw.wmnet
[13:56:06] <logmsgbot>	 !log mvernon@cumin2002 conftool action : set/pooled=yes; selector: service=nginx,name=ms-fe2009.codfw.wmnet
[13:56:08] <icinga-wm>	 RECOVERY - Check systemd state on mw1491 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:56:16] <icinga-wm>	 RECOVERY - Check systemd state on mw2298 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:56:32] <icinga-wm>	 RECOVERY - Check systemd state on mw2300 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:56:34] <icinga-wm>	 RECOVERY - Check systemd state on parse2013 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:56:40] <icinga-wm>	 RECOVERY - Check systemd state on cp6012 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:57:06] <logmsgbot>	 !log tgr@deploy1002 Finished scap: Backport for [[gerrit:924488|prod: Remove $wgCampaignEventsEnableMultipleOrganizers (T334088)]] (duration: 16m 13s)
[13:57:10] <icinga-wm>	 RECOVERY - Check systemd state on snapshot1015 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:57:11] <stashbot>	 T334088: Enable the multiple organizers feature in production - https://phabricator.wikimedia.org/T334088
[13:57:22] <icinga-wm>	 RECOVERY - Check systemd state on mw2272 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:57:25] <wikibugs>	 (CR) Jelto: "not sure if that makes sense, but I noticed wmcloud idp is configured as the default for all gitlab hosts in I737a9da73911f1f6f7084d909db2" [puppet] - https://gerrit.wikimedia.org/r/924525 (https://phabricator.wikimedia.org/T320390) (owner: Jelto)
[13:57:29] <tgr_>	 Daimona: deployed
[13:57:44] <icinga-wm>	 RECOVERY - Check systemd state on mw2264 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:57:44] <icinga-wm>	 RECOVERY - Check systemd state on parse2001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:57:52] <icinga-wm>	 RECOVERY - Check systemd state on prometheus4002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:57:54] <icinga-wm>	 RECOVERY - Check systemd state on mw2260 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:58:02] <Daimona>	 Thanks!
[13:58:18] <icinga-wm>	 RECOVERY - Check systemd state on mw2285 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:58:21] <logmsgbot>	 !log tgr@deploy1002 Started scap: Backport for [[gerrit:924158|editpage: Change the order of hooks slightly for FlaggedRevs (T337637)]]
[13:58:26] <stashbot>	 T337637: Duplicated edit notice about pending changes - https://phabricator.wikimedia.org/T337637
[13:58:30] <icinga-wm>	 RECOVERY - Check systemd state on mx2001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:58:36] <icinga-wm>	 RECOVERY - Check systemd state on parse1016 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:58:53] <wikibugs>	 (CR) Jelto: [V: +1] "PCC SUCCESS (): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/41412/console" [puppet] - https://gerrit.wikimedia.org/r/916522 (https://phabricator.wikimedia.org/T320390) (owner: Jbond)
[13:58:56] <icinga-wm>	 RECOVERY - Check systemd state on zookeeper-test1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:59:02] <wikibugs>	 (PS6) Elukey: profile::cache::kafka: add support for PKI [puppet] - https://gerrit.wikimedia.org/r/924507
[13:59:04] <wikibugs>	 (PS2) Elukey: Move cp4037's varnishkafka instances to PKI [puppet] - https://gerrit.wikimedia.org/r/924509
[13:59:10] <icinga-wm>	 RECOVERY - Check systemd state on netflow4002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:59:10] <icinga-wm>	 RECOVERY - Check systemd state on mw2292 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:59:20] <icinga-wm>	 RECOVERY - Check systemd state on xhgui2001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:59:22] <icinga-wm>	 RECOVERY - Check systemd state on poolcounter2003 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:59:57] <logmsgbot>	 !log tgr@deploy1002 tgr and matmarex: Backport for [[gerrit:924158|editpage: Change the order of hooks slightly for FlaggedRevs (T337637)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
[14:00:18] <wikibugs>	 (CR) Elukey: [V: +1] "PCC SUCCESS (): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/41413/console" [puppet] - https://gerrit.wikimedia.org/r/924509 (owner: Elukey)
[14:00:22] <tgr_>	 MatmaRex: ^
[14:00:22] <icinga-wm>	 RECOVERY - Check systemd state on parse1009 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[14:00:24] <icinga-wm>	 RECOVERY - Check systemd state on ms-be1074 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[14:00:26] <icinga-wm>	 RECOVERY - Check systemd state on mw1461 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[14:00:28] <MatmaRex>	 looking
[14:00:40] <icinga-wm>	 RECOVERY - Check systemd state on mw1488 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[14:00:41] <MatmaRex>	 tgr_: looks good
[14:02:21] <MatmaRex>	 i assume you're leaving after this one syncs?
[14:04:37] <tgr_>	 I can continue, I don't need to do anything during the meeting
[14:05:11] <tgr_>	 (I confused it with a different meeting)
[14:05:43] <wikibugs>	 (CR) ArielGlenn: [C: +2] fix up sample nfs share mount command in docs for dupms nfs share testing [puppet] - https://gerrit.wikimedia.org/r/924510 (https://phabricator.wikimedia.org/T325232) (owner: ArielGlenn)
[14:05:50] <icinga-wm>	 RECOVERY - Check systemd state on puppetmaster1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[14:06:14] <MatmaRex>	 oh, i guess we're in the same meeting then, heh
[14:06:32] <jinxer-wm>	 (JobUnavailable) firing: (2) Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[14:06:36] <logmsgbot>	 !log tgr@deploy1002 Finished scap: Backport for [[gerrit:924158|editpage: Change the order of hooks slightly for FlaggedRevs (T337637)]] (duration: 08m 14s)
[14:06:36] <moritzm>	 !log installing libwebp security updates
[14:06:40] <logmsgbot>	 !log bking@deploy1002 helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
[14:06:41] <stashbot>	 T337637: Duplicated edit notice about pending changes - https://phabricator.wikimedia.org/T337637
[14:06:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:06:46] <wikibugs>	 (CR) ArielGlenn: [C: -2] "prevent merge while the parent change gets reviewed and deployed" [puppet] - https://gerrit.wikimedia.org/r/924510 (https://phabricator.wikimedia.org/T325232) (owner: ArielGlenn)
[14:06:47] <MatmaRex>	 thanks
[14:07:13] <wikibugs>	 (CR) TrainBranchBot: [C: +2] "Approved by tgr@deploy1002 using scap backport" [extensions/VisualEditor] (wmf/1.41.0-wmf.10) - https://gerrit.wikimedia.org/r/924159 (https://phabricator.wikimedia.org/T337633) (owner: Bartosz Dziewoński)
[14:08:42] <logmsgbot>	 !log bking@deploy1002 helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
[14:08:44] <icinga-wm>	 RECOVERY - Check systemd state on cumin2002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[14:11:58] <icinga-wm>	 RECOVERY - Check unit status of httpbb_kubernetes_mw-api-ext_hourly on cumin2002 is OK: OK: Status of the systemd unit httpbb_kubernetes_mw-api-ext_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[14:13:57] <logmsgbot>	 !log gmodena@deploy1002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
[14:13:59] <logmsgbot>	 !log gmodena@deploy1002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
[14:16:32] <jinxer-wm>	 (JobUnavailable) resolved: (2) Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[14:16:43] <logmsgbot>	 !log jmm@cumin2002 END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host puppetdb1003.eqiad.wmnet with OS bookworm
[14:16:47] <wikibugs>	 SRE, Infrastructure-Foundations: Setup an initial bookworm host pair with Puppetdb 7 - https://phabricator.wikimedia.org/T321783 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jmm@cumin2002 for host puppetdb1003.eqiad.wmnet with OS bookworm executed with errors: - puppetdb1003 (**FA...
[14:16:54] <logmsgbot>	 !log herron@cumin1001 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host mwlog2002.codfw.wmnet with OS bullseye
[14:17:17] <wikibugs>	 (PS1) Arturo Borrero Gonzalez: wikimediacloud.org: adjust openstack.codfw1dev FQDN [dns] - https://gerrit.wikimedia.org/r/924526 (https://phabricator.wikimedia.org/T336564)
[14:17:19] <wikibugs>	 (CR) Hokwelum: [C: +1] "Checks out!" [puppet] - https://gerrit.wikimedia.org/r/923289 (https://phabricator.wikimedia.org/T325232) (owner: ArielGlenn)
[14:17:32] <wikibugs>	 (PS17) JMeybohm: deployment_server: Create k8s configs with pki certs [puppet] - https://gerrit.wikimedia.org/r/904500 (https://phabricator.wikimedia.org/T325268)
[14:17:34] <wikibugs>	 (PS5) JMeybohm: profile::imagecatalog migrate from user token to client cert [puppet] - https://gerrit.wikimedia.org/r/912842 (https://phabricator.wikimedia.org/T325268)
[14:17:36] <wikibugs>	 (PS10) JMeybohm: prometheus::k8s: Use kubernetes::clusters_defaults [puppet] - https://gerrit.wikimedia.org/r/913114 (https://phabricator.wikimedia.org/T325268)
[14:17:38] <wikibugs>	 (PS10) JMeybohm: prometheus::k8s switch staging-codfw to client cert auth [puppet] - https://gerrit.wikimedia.org/r/913149 (https://phabricator.wikimedia.org/T325268)
[14:18:12] <wikibugs>	 (CR) ArielGlenn: [C: +2] Dumps: move the nfs share test conf to the right location [puppet] - https://gerrit.wikimedia.org/r/923289 (https://phabricator.wikimedia.org/T325232) (owner: ArielGlenn)
[14:19:52] <wikibugs>	 (CR) JMeybohm: [V: +1] "PCC SUCCESS (NOOP 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/41416/console" [puppet] - https://gerrit.wikimedia.org/r/904500 (https://phabricator.wikimedia.org/T325268) (owner: JMeybohm)
[14:21:00] <wikibugs>	 (PS1) Fabfur: run-puppet-restart-varnish: fix _custom_action signature [cookbooks] - https://gerrit.wikimedia.org/r/924527 (https://phabricator.wikimedia.org/T323557)
[14:21:20] <wikibugs>	 (CR) ArielGlenn: [C: +2] fix up sample nfs share mount command in docs for dupms nfs share testing [puppet] - https://gerrit.wikimedia.org/r/924510 (https://phabricator.wikimedia.org/T325232) (owner: ArielGlenn)
[14:22:05] <MatmaRex>	 so CI is taking forever today, i guess? if we want to deploy the rest, i suggest +2-ing them all and deploying them together
[14:24:02] <wikibugs>	 (CR) CI reject: [V: -1] run-puppet-restart-varnish: fix _custom_action signature [cookbooks] - https://gerrit.wikimedia.org/r/924527 (https://phabricator.wikimedia.org/T323557) (owner: Fabfur)
[14:24:16] <wikibugs>	 (CR) CDanis: [C: +1] "LGTM, will merge today" [puppet] - https://gerrit.wikimedia.org/r/921437 (https://phabricator.wikimedia.org/T335637) (owner: Jameel Kaisar)
[14:25:02] <wikibugs>	 (CR) CDanis: [C: +1] "thanks! LGTM, will merge today" [puppet] - https://gerrit.wikimedia.org/r/923448 (https://phabricator.wikimedia.org/T337317) (owner: Jameel Kaisar)
[14:25:39] <wikibugs>	 (CR) Cathal Mooney: [C: +1] "LGTM!  Ignore comment just me thinking out loud." [dns] - https://gerrit.wikimedia.org/r/924526 (https://phabricator.wikimedia.org/T336564) (owner: Arturo Borrero Gonzalez)
[14:26:00] <wikibugs>	 (CR) Arturo Borrero Gonzalez: [C: +2] wikimediacloud.org: adjust openstack.codfw1dev FQDN (1 comment) [dns] - https://gerrit.wikimedia.org/r/924526 (https://phabricator.wikimedia.org/T336564) (owner: Arturo Borrero Gonzalez)
[14:27:27] <wikibugs>	 (Merged) jenkins-bot: Hide 'editnotice-notext' message in VE (and mobile apps) [extensions/VisualEditor] (wmf/1.41.0-wmf.10) - https://gerrit.wikimedia.org/r/924159 (https://phabricator.wikimedia.org/T337633) (owner: Bartosz Dziewoński)
[14:27:57] <logmsgbot>	 !log tgr@deploy1002 Started scap: Backport for [[gerrit:924159|Hide 'editnotice-notext' message in VE (and mobile apps) (T337633)]]
[14:28:02] <stashbot>	 T337633: Empty message 'editnotice-notext' is visible as an edit notice in VisualEditor and mobile apps - https://phabricator.wikimedia.org/T337633
[14:28:17] <wikibugs>	 (CR) Volans: run-puppet-restart-varnish: fix _custom_action signature (1 comment) [cookbooks] - https://gerrit.wikimedia.org/r/924527 (https://phabricator.wikimedia.org/T323557) (owner: Fabfur)
[14:28:23] <wikibugs>	 (CR) JMeybohm: "Please comment out/disable the egress zookeeper stuff for now (with a link in comment to the phab task) to make it clear that we don't use" [deployment-charts] - https://gerrit.wikimedia.org/r/922874 (https://phabricator.wikimedia.org/T333464) (owner: Ottomata)
[14:28:43] <logmsgbot>	 !log herron@cumin1001 START - Cookbook sre.hosts.reimage for host mwlog2002.codfw.wmnet with OS bullseye
[14:29:33] <logmsgbot>	 !log tgr@deploy1002 matmarex and tgr: Backport for [[gerrit:924159|Hide 'editnotice-notext' message in VE (and mobile apps) (T337633)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
[14:29:55] <tgr_>	 MatmaRex: ^
[14:29:56] <wikibugs>	 (PS2) Fabfur: run-puppet-restart-varnish: fix _custom_action signature [cookbooks] - https://gerrit.wikimedia.org/r/924527 (https://phabricator.wikimedia.org/T323557)
[14:29:57] <MatmaRex>	 tgr_: looks good
[14:30:11] <MatmaRex>	 tgr_: so CI is taking forever today, i guess? if we want to deploy the rest, i suggest +2-ing them all and deploying them together
[14:30:26] <tgr_>	 yeah, slow day
[14:30:37] <wikibugs>	 (CR) Fabfur: run-puppet-restart-varnish: fix _custom_action signature (1 comment) [cookbooks] - https://gerrit.wikimedia.org/r/924527 (https://phabricator.wikimedia.org/T323557) (owner: Fabfur)
[14:30:38] <MatmaRex>	 the wmf.11 backports don't really need additional testing
[14:31:30] <wikibugs>	 (PS1) Bking: rdf-streaming-updater: Enable new flink version in CODFW [deployment-charts] - https://gerrit.wikimedia.org/r/924528 (https://phabricator.wikimedia.org/T334244)
[14:32:18] <wikibugs>	 (CR) DCausse: [C: +1] rdf-streaming-updater: Enable new flink version in CODFW [deployment-charts] - https://gerrit.wikimedia.org/r/924528 (https://phabricator.wikimedia.org/T334244) (owner: Bking)
[14:33:00] <wikibugs>	 (CR) Volans: [C: +1] "LGTM" [cookbooks] - https://gerrit.wikimedia.org/r/924527 (https://phabricator.wikimedia.org/T323557) (owner: Fabfur)
[14:33:02] <wikibugs>	 (CR) Vgutierrez: [C: +1] run-puppet-restart-varnish: fix _custom_action signature [cookbooks] - https://gerrit.wikimedia.org/r/924527 (https://phabricator.wikimedia.org/T323557) (owner: Fabfur)
[14:34:08] <wikibugs>	 (CR) Fabfur: [C: +2] "LGTM" [cookbooks] - https://gerrit.wikimedia.org/r/924527 (https://phabricator.wikimedia.org/T323557) (owner: Fabfur)
[14:34:31] <wikibugs>	 (CR) Bking: [C: +2] rdf-streaming-updater: Enable new flink version in CODFW [deployment-charts] - https://gerrit.wikimedia.org/r/924528 (https://phabricator.wikimedia.org/T334244) (owner: Bking)
[14:35:02] <icinga-wm>	 PROBLEM - Check systemd state on cumin2002 is CRITICAL: CRITICAL - degraded: The following units failed: httpbb_kubernetes_mw-web_hourly.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[14:35:14] <wikibugs>	 (Merged) jenkins-bot: rdf-streaming-updater: Enable new flink version in CODFW [deployment-charts] - https://gerrit.wikimedia.org/r/924528 (https://phabricator.wikimedia.org/T334244) (owner: Bking)
[14:35:59] <logmsgbot>	 !log tgr@deploy1002 Finished scap: Backport for [[gerrit:924159|Hide 'editnotice-notext' message in VE (and mobile apps) (T337633)]] (duration: 08m 01s)
[14:36:04] <stashbot>	 T337633: Empty message 'editnotice-notext' is visible as an edit notice in VisualEditor and mobile apps - https://phabricator.wikimedia.org/T337633
[14:36:34] <jinxer-wm>	 (KubernetesAPILatency) firing: High Kubernetes API latency (DELETE pods) on k8s@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/000000435?var-site=codfw&var-cluster=k8s - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[14:36:42] <wikibugs>	 (Merged) jenkins-bot: run-puppet-restart-varnish: fix _custom_action signature [cookbooks] - https://gerrit.wikimedia.org/r/924527 (https://phabricator.wikimedia.org/T323557) (owner: Fabfur)
[14:37:17] <wikibugs>	 (CR) TrainBranchBot: [C: +2] "Approved by tgr@deploy1002 using scap backport" [extensions/VisualEditor] (wmf/1.41.0-wmf.10) - https://gerrit.wikimedia.org/r/924160 (https://phabricator.wikimedia.org/T337638) (owner: Bartosz Dziewoński)
[14:37:23] <wikibugs>	 (CR) TrainBranchBot: [C: +2] "Approved by tgr@deploy1002 using scap backport" [extensions/VisualEditor] (wmf/1.41.0-wmf.11) - https://gerrit.wikimedia.org/r/924456 (https://phabricator.wikimedia.org/T337633) (owner: Bartosz Dziewoński)
[14:37:29] <wikibugs>	 (CR) TrainBranchBot: [C: +2] "Approved by tgr@deploy1002 using scap backport" [extensions/VisualEditor] (wmf/1.41.0-wmf.11) - https://gerrit.wikimedia.org/r/924458 (https://phabricator.wikimedia.org/T337638) (owner: Bartosz Dziewoński)
[14:37:35] <wikibugs>	 (CR) Jbond: [C: -1] "nice ideas but wont work as expected currently" [software/spicerack] - https://gerrit.wikimedia.org/r/924493 (owner: Ayounsi)
[14:38:14] <icinga-wm>	 PROBLEM - Check unit status of httpbb_kubernetes_mw-web_hourly on cumin2002 is CRITICAL: CRITICAL: Status of the systemd unit httpbb_kubernetes_mw-web_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[14:41:34] <jinxer-wm>	 (KubernetesAPILatency) resolved: High Kubernetes API latency (DELETE pods) on k8s@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/000000435?var-site=codfw&var-cluster=k8s - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[14:41:39] <wikibugs>	 SRE, Release-Engineering-Team, Security-Team, Wikimedia-GitHub, and 2 others: Add github.com/wikimedia as an SCM for Semgrep Cloud - https://phabricator.wikimedia.org/T337561 (sbassett) >>! In T337561#8883334, @Dzahn wrote: > Let's keep access requests on tickets and not in ad-hoc chats.  Yep, th...
[14:41:49] <wikibugs>	 (CR) Jbond: "lgtm, minor nit" [puppet] - https://gerrit.wikimedia.org/r/924506 (owner: Elukey)
[14:43:02] <wikibugs>	 (PS8) Ottomata: flink-operator - deploy in wikikube eqiad and codfw [deployment-charts] - https://gerrit.wikimedia.org/r/922874 (https://phabricator.wikimedia.org/T333464)
[14:43:13] <wikibugs>	 (CR) Ottomata: flink-operator - deploy in wikikube eqiad and codfw (1 comment) [deployment-charts] - https://gerrit.wikimedia.org/r/922874 (https://phabricator.wikimedia.org/T333464) (owner: Ottomata)
[14:44:32] <wikibugs>	 (PS1) Arturo Borrero Gonzalez: cloudlb: make it aware of cloudcontrol2005-dev [puppet] - https://gerrit.wikimedia.org/r/924533 (https://phabricator.wikimedia.org/T336564)
[14:45:17] <wikibugs>	 (CR) Arturo Borrero Gonzalez: [C: +2] cloudlb: make it aware of cloudcontrol2005-dev [puppet] - https://gerrit.wikimedia.org/r/924533 (https://phabricator.wikimedia.org/T336564) (owner: Arturo Borrero Gonzalez)
[14:46:33] <logmsgbot>	 !log herron@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on mwlog2002.codfw.wmnet with reason: host reimage
[14:49:45] <logmsgbot>	 !log herron@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mwlog2002.codfw.wmnet with reason: host reimage
[14:50:31] <moritzm>	 !log installing texlive-bin security updates
[14:50:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:50:49] <wikibugs>	 (PS1) Herron: reuse-lvm-root-4dev: add grub-installer/bootdev config [puppet] - https://gerrit.wikimedia.org/r/924535 (https://phabricator.wikimedia.org/T333614)
[14:52:26] <wikibugs>	 (CR) Herron: [C: +2] "self-merging since this was live tested and fixed mwlog2002 reimage (see bug)" [puppet] - https://gerrit.wikimedia.org/r/924535 (https://phabricator.wikimedia.org/T333614) (owner: Herron)
[14:56:16] <wikibugs>	 (PS3) Fabfur: cache::upload: Add hieradata to switch HTTPS redirection from Varnish to HAProxy only on host c2042.codfw.wmnet [puppet] - https://gerrit.wikimedia.org/r/924444 (https://phabricator.wikimedia.org/T323557)
[14:56:32] <wikibugs>	 (PS1) Kimberly Sarabia: Turn on A/B Test Hebrew [mediawiki-config] - https://gerrit.wikimedia.org/r/924536 (https://phabricator.wikimedia.org/T336969)
[14:56:38] <wikibugs>	 (CR) CI reject: [V: -1] cache::upload: Add hieradata to switch HTTPS redirection from Varnish to HAProxy only on host c2042.codfw.wmnet [puppet] - https://gerrit.wikimedia.org/r/924444 (https://phabricator.wikimedia.org/T323557) (owner: Fabfur)
[14:57:44] <wikibugs>	 (PS4) Fabfur: cache::upload: Add hieradata to switch HTTPS redirection from Varnish to HAProxy only on host cp2042 [puppet] - https://gerrit.wikimedia.org/r/924444 (https://phabricator.wikimedia.org/T323557)
[14:58:09] <wikibugs>	 (CR) CI reject: [V: -1] cache::upload: Add hieradata to switch HTTPS redirection from Varnish to HAProxy only on host cp2042 [puppet] - https://gerrit.wikimedia.org/r/924444 (https://phabricator.wikimedia.org/T323557) (owner: Fabfur)
[14:58:38] <wikibugs>	 (PS5) Fabfur: cache::upload: Switch HTTPS redirection from Varnish to HAProxy only on cp2042 [puppet] - https://gerrit.wikimedia.org/r/924444 (https://phabricator.wikimedia.org/T323557)
[14:58:51] <wikibugs>	 SRE, ops-eqiad: Move two GPUs from Hadoop to Lift Wing - https://phabricator.wikimedia.org/T335031 (elukey) @Jclark-ctr sorryyyy didn't see the ping :(  Lemme know if you have time in these days or next week, thanks a lot! The caveat is that we'd need to move 2 GPUs from a dse-k8s-worker node, not from H...
[14:59:31] <wikibugs>	 (Merged) jenkins-bot: ve.ui.MWGalleryDialog: Fix showing the search panel [extensions/VisualEditor] (wmf/1.41.0-wmf.10) - https://gerrit.wikimedia.org/r/924160 (https://phabricator.wikimedia.org/T337638) (owner: Bartosz Dziewoński)
[14:59:33] <wikibugs>	 (Merged) jenkins-bot: Hide 'editnotice-notext' message in VE (and mobile apps) [extensions/VisualEditor] (wmf/1.41.0-wmf.11) - https://gerrit.wikimedia.org/r/924456 (https://phabricator.wikimedia.org/T337633) (owner: Bartosz Dziewoński)
[14:59:37] <wikibugs>	 (Merged) jenkins-bot: ve.ui.MWGalleryDialog: Fix showing the search panel [extensions/VisualEditor] (wmf/1.41.0-wmf.11) - https://gerrit.wikimedia.org/r/924458 (https://phabricator.wikimedia.org/T337638) (owner: Bartosz Dziewoński)
[15:00:08] <logmsgbot>	 !log tgr@deploy1002 Started scap: Backport for [[gerrit:924160|ve.ui.MWGalleryDialog: Fix showing the search panel (T337638)]], [[gerrit:924456|Hide 'editnotice-notext' message in VE (and mobile apps) (T337633)]], [[gerrit:924458|ve.ui.MWGalleryDialog: Fix showing the search panel (T337638)]]
[15:00:20] <stashbot>	 T337638: Gallery creation not functional in VisualEditor - https://phabricator.wikimedia.org/T337638
[15:00:21] <stashbot>	 T337633: Empty message 'editnotice-notext' is visible as an edit notice in VisualEditor and mobile apps - https://phabricator.wikimedia.org/T337633
[15:00:39] <wikibugs>	 (CR) Alexandros Kosiaris: [C: +1] "This chan" [puppet] - https://gerrit.wikimedia.org/r/904500 (https://phabricator.wikimedia.org/T325268) (owner: JMeybohm)
[15:02:04] <logmsgbot>	 !log tgr@deploy1002 tgr and matmarex: Backport for [[gerrit:924160|ve.ui.MWGalleryDialog: Fix showing the search panel (T337638)]], [[gerrit:924456|Hide 'editnotice-notext' message in VE (and mobile apps) (T337633)]], [[gerrit:924458|ve.ui.MWGalleryDialog: Fix showing the search panel (T337638)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
[15:02:18] <tgr_>	 MatmaRex: ^ last one
[15:02:22] <wikibugs>	 SRE, Znuny, serviceops-collab: Puppet template for /etc/clamav/clamd.conf needs to be updated - https://phabricator.wikimedia.org/T330129 (Arnoldokoth) Open→Resolved
[15:02:36] <MatmaRex>	 wmf.10 gallery backport looks good
[15:03:13] <MatmaRex>	 wmf.11 looks good too
[15:03:34] <wikibugs>	 (CR) Jbond: "see inline" [cookbooks] - https://gerrit.wikimedia.org/r/924498 (https://phabricator.wikimedia.org/T203964) (owner: Muehlenhoff)
[15:03:34] <logmsgbot>	 !log bking@deploy1002 helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
[15:03:40] <MatmaRex>	 tgr_: all good
[15:05:10] <logmsgbot>	 !log bking@deploy1002 helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
[15:06:11] <wikibugs>	 (PS3) Elukey: varnishkafka: add catch all systemd unit [puppet] - https://gerrit.wikimedia.org/r/924506
[15:06:13] <wikibugs>	 (PS7) Elukey: profile::cache::kafka: add support for PKI [puppet] - https://gerrit.wikimedia.org/r/924507
[15:06:15] <wikibugs>	 (PS3) Elukey: Move cp4037's varnishkafka instances to PKI [puppet] - https://gerrit.wikimedia.org/r/924509
[15:07:49] <jinxer-wm>	 (WcqsStreamingUpdaterFlinkJobNotRunning) firing: WCQS_Streaming_Updater in codfw (k8s) is not running - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/gCFgfpG7k/flink-session-cluster - https://alerts.wikimedia.org/?q=alertname%3DWcqsStreamingUpdaterFlinkJobNotRunning
[15:08:10] <wikibugs>	 (PS4) Elukey: varnishkafka: add catch all systemd unit [puppet] - https://gerrit.wikimedia.org/r/924506
[15:08:12] <wikibugs>	 (PS8) Elukey: profile::cache::kafka: add support for PKI [puppet] - https://gerrit.wikimedia.org/r/924507
[15:08:14] <wikibugs>	 (PS4) Elukey: Move cp4037's varnishkafka instances to PKI [puppet] - https://gerrit.wikimedia.org/r/924509
[15:08:17] <logmsgbot>	 !log tgr@deploy1002 Finished scap: Backport for [[gerrit:924160|ve.ui.MWGalleryDialog: Fix showing the search panel (T337638)]], [[gerrit:924456|Hide 'editnotice-notext' message in VE (and mobile apps) (T337633)]], [[gerrit:924458|ve.ui.MWGalleryDialog: Fix showing the search panel (T337638)]] (duration: 08m 08s)
[15:08:23] <stashbot>	 T337638: Gallery creation not functional in VisualEditor - https://phabricator.wikimedia.org/T337638
[15:08:23] <stashbot>	 T337633: Empty message 'editnotice-notext' is visible as an edit notice in VisualEditor and mobile apps - https://phabricator.wikimedia.org/T337633
[15:09:07] <wikibugs>	 (PS5) Elukey: varnishkafka: add catch all systemd unit [puppet] - https://gerrit.wikimedia.org/r/924506
[15:09:09] <wikibugs>	 (PS9) Elukey: profile::cache::kafka: add support for PKI [puppet] - https://gerrit.wikimedia.org/r/924507
[15:09:11] <wikibugs>	 (PS5) Elukey: Move cp4037's varnishkafka instances to PKI [puppet] - https://gerrit.wikimedia.org/r/924509
[15:09:34] <tgr_>	 deployed, logs look good.
[15:09:49] <jinxer-wm>	 (WdqsStreamingUpdaterFlinkJobNotRunning) firing: WDQS_Streaming_Updater in codfw (k8s) is not running - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/gCFgfpG7k/flink-session-cluster - https://alerts.wikimedia.org/?q=alertname%3DWdqsStreamingUpdaterFlinkJobNotRunning
[15:09:49] <jinxer-wm>	 (RdfStreamingUpdaterFlinkJobUnstable) firing: (2) WCQS_Streaming_Updater in codfw (k8s) is unstable - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/gCFgfpG7k/flink-session-cluster - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterFlinkJobUnstable
[15:09:59] <wikibugs>	 (CR) Elukey: varnishkafka: add catch all systemd unit (1 comment) [puppet] - https://gerrit.wikimedia.org/r/924506 (owner: Elukey)
[15:10:00] <tgr_>	 !log UTC evening deploys done
[15:10:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:10:37] <wikibugs>	 (CR) Elukey: [V: +1] "PCC SUCCESS (): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/41419/console" [puppet] - https://gerrit.wikimedia.org/r/924506 (owner: Elukey)
[15:13:08] <wikibugs>	 (PS1) Hokwelum: Rename nfs_settings dir to nfs_testing and move nfs test files into nfs test dir [puppet] - https://gerrit.wikimedia.org/r/924542
[15:14:07] <wikibugs>	 (CR) Alexandros Kosiaris: [C: +2] .gitmessage: add Hosts: line [puppet] - https://gerrit.wikimedia.org/r/924438 (owner: Volans)
[15:14:30] <logmsgbot>	 !log aborrero@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - aborrero@cumin2002"
[15:14:49] <jinxer-wm>	 (WdqsStreamingUpdaterFlinkJobNotRunning) resolved: WDQS_Streaming_Updater in codfw (k8s) is not running - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/gCFgfpG7k/flink-session-cluster - https://alerts.wikimedia.org/?q=alertname%3DWdqsStreamingUpdaterFlinkJobNotRunning
[15:14:49] <jinxer-wm>	 (RdfStreamingUpdaterFlinkJobUnstable) firing: (2) WCQS_Streaming_Updater in codfw (k8s) is unstable - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/gCFgfpG7k/flink-session-cluster - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterFlinkJobUnstable
[15:15:35] <logmsgbot>	 !log aborrero@cumin2002 END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - aborrero@cumin2002"
[15:15:36] <logmsgbot>	 !log aborrero@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcontrol2005-dev.codfw.wmnet with OS bullseye
[15:15:43] <wikibugs>	 SRE, ops-codfw, cloud-services-team (FY2022/2023-Q4): cloudcontrol2005-dev: make it a cloudlb backend - https://phabricator.wikimedia.org/T336564 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by aborrero@cumin2002 for host cloudcontrol2005-dev.codfw.wmnet with OS bullseye complete...
[15:16:35] <wikibugs>	 (CR) Jbond: [C: +1] "lgtm" [puppet] - https://gerrit.wikimedia.org/r/924506 (owner: Elukey)
[15:19:49] <jinxer-wm>	 (RdfStreamingUpdaterFlinkJobUnstable) resolved: (2) WCQS_Streaming_Updater in codfw (k8s) is unstable - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/gCFgfpG7k/flink-session-cluster - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterFlinkJobUnstable
[15:20:04] <icinga-wm>	 RECOVERY - Router interfaces on cr4-ulsfo is OK: OK: host 198.35.26.193, interfaces up: 71, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[15:20:51] <wikibugs>	 SRE, Traffic, envoy, serviceops, Patch-For-Review: Upgrade Envoy to supported version - https://phabricator.wikimedia.org/T300324 (JMeybohm)
[15:21:09] <wikibugs>	 SRE, Traffic, envoy, serviceops, Patch-For-Review: Upgrade Envoy to supported version - https://phabricator.wikimedia.org/T300324 (JMeybohm)
[15:22:02] <MatmaRex>	 (thanks tgr_)
[15:23:06] <icinga-wm>	 PROBLEM - BGP status on cr2-eqord is CRITICAL: BGP CRITICAL - AS13030/IPv4: Connect - Init7, AS13030/IPv6: Connect - Init7 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[15:24:19] <jinxer-wm>	 (WcqsStreamingUpdaterFlinkJobNotRunning) resolved: WCQS_Streaming_Updater in codfw (k8s) is not running - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/gCFgfpG7k/flink-session-cluster - https://alerts.wikimedia.org/?q=alertname%3DWcqsStreamingUpdaterFlinkJobNotRunning
[15:24:42] <icinga-wm>	 PROBLEM - Router interfaces on cr4-ulsfo is CRITICAL: CRITICAL: host 198.35.26.193, interfaces up: 70, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[15:28:29] <wikibugs>	 (PS1) Elukey: ml-services: add autoscaling capabilities to revert risk la [deployment-charts] - https://gerrit.wikimedia.org/r/924544
[15:28:31] <wikibugs>	 (PS1) Elukey: services: raise auth-users rate limit for Lift Wing in the API Gateway [deployment-charts] - https://gerrit.wikimedia.org/r/924545
[15:28:42] <wikibugs>	 SRE, Observability-Metrics, User-fgiunchedi: Extend router ACLs to block 4194/tcp on LVSes - https://phabricator.wikimedia.org/T337689 (fgiunchedi) Open→Resolved >>! In T337689#8887689, @ayounsi wrote: > What I pushed is an extra safeguard, but a more viable fix is to have the daemon listen o...
[15:28:44] <wikibugs>	 SRE, Observability-Metrics, Patch-For-Review, SRE Observability (FY2022/2023-Q4), User-fgiunchedi: Collect per-cgroup cpu/mem and other system level metrics - https://phabricator.wikimedia.org/T108027 (fgiunchedi)
[15:29:09] <wikibugs>	 (CR) CI reject: [V: -1] ml-services: add autoscaling capabilities to revert risk la [deployment-charts] - https://gerrit.wikimedia.org/r/924544 (owner: Elukey)
[15:29:19] <wikibugs>	 (CR) CI reject: [V: -1] services: raise auth-users rate limit for Lift Wing in the API Gateway [deployment-charts] - https://gerrit.wikimedia.org/r/924545 (owner: Elukey)
[15:32:12] <icinga-wm>	 RECOVERY - Check systemd state on cumin2002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[15:35:45] <wikibugs>	 (CR) JMeybohm: [C: +1] flink-operator - deploy in wikikube eqiad and codfw [deployment-charts] - https://gerrit.wikimedia.org/r/922874 (https://phabricator.wikimedia.org/T333464) (owner: Ottomata)
[15:35:52] <wikibugs>	 (PS2) Elukey: ml-services: add autoscaling capabilities to revert risk la [deployment-charts] - https://gerrit.wikimedia.org/r/924544
[15:35:54] <wikibugs>	 (PS2) Elukey: services: raise auth-users rate limit for Lift Wing in the API Gateway [deployment-charts] - https://gerrit.wikimedia.org/r/924545
[15:36:48] <wikibugs>	 (CR) Ottomata: [C: +2] flink-operator - deploy in wikikube eqiad and codfw [deployment-charts] - https://gerrit.wikimedia.org/r/922874 (https://phabricator.wikimedia.org/T333464) (owner: Ottomata)
[15:36:54] <wikibugs>	 (CR) Clément Goubert: "This change is ready for review." [puppet] - https://gerrit.wikimedia.org/r/924494 (https://phabricator.wikimedia.org/T337490) (owner: Clément Goubert)
[15:37:32] <wikibugs>	 (PS6) Clément Goubert: mw-on-k8s: Redirect www.mediawiki.org to mw-on-k8s [puppet] - https://gerrit.wikimedia.org/r/923385 (https://phabricator.wikimedia.org/T337490)
[15:37:48] <wikibugs>	 (PS10) Clément Goubert: mw-on-k8s: Redirect closed wikis to mw-on-k8s [puppet] - https://gerrit.wikimedia.org/r/923386 (https://phabricator.wikimedia.org/T337490)
[15:38:19] <wikibugs>	 (CR) Clément Goubert: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/923385 (https://phabricator.wikimedia.org/T337490) (owner: Clément Goubert)
[15:38:29] <wikibugs>	 (CR) Clément Goubert: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/923386 (https://phabricator.wikimedia.org/T337490) (owner: Clément Goubert)
[15:39:08] <wikibugs>	 (Merged) jenkins-bot: flink-operator - deploy in wikikube eqiad and codfw [deployment-charts] - https://gerrit.wikimedia.org/r/922874 (https://phabricator.wikimedia.org/T333464) (owner: Ottomata)
[15:40:06] <wikibugs>	 SRE, serviceops, CommRel-Specialists-Support (Apr-Jun-2023), Datacenter-Switchover, User-notice: CommRel support for April 2023 Datacenter Switchback - https://phabricator.wikimedia.org/T334671 (akosiaris) @Trizek-WMF, should we resolve this?
[15:40:14] <wikibugs>	 (CR) Ilias Sarantopoulos: [C: +1] "👌" [deployment-charts] - https://gerrit.wikimedia.org/r/924545 (owner: Elukey)
[15:40:50] <icinga-wm>	 RECOVERY - Check unit status of httpbb_kubernetes_mw-web_hourly on cumin2002 is OK: OK: Status of the systemd unit httpbb_kubernetes_mw-web_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[15:42:42] <wikibugs>	 (CR) Ilias Sarantopoulos: [C: +1] ml-services: add autoscaling capabilities to revert risk la [deployment-charts] - https://gerrit.wikimedia.org/r/924544 (owner: Elukey)
[15:43:53] <wikibugs>	 (CR) AikoChou: [C: +1] ml-services: add autoscaling capabilities to revert risk la [deployment-charts] - https://gerrit.wikimedia.org/r/924544 (owner: Elukey)
[15:44:05] <wikibugs>	 (CR) AikoChou: [C: +1] services: raise auth-users rate limit for Lift Wing in the API Gateway [deployment-charts] - https://gerrit.wikimedia.org/r/924545 (owner: Elukey)
[15:45:57] <wikibugs>	 (CR) ArielGlenn: [C: +2] Rename nfs_settings dir to nfs_testing and move nfs test files into nfs test dir [puppet] - https://gerrit.wikimedia.org/r/924542 (owner: Hokwelum)
[15:46:09] <wikibugs>	 SRE, serviceops, CommRel-Specialists-Support (Apr-Jun-2023), Datacenter-Switchover, User-notice: CommRel support for April 2023 Datacenter Switchback - https://phabricator.wikimedia.org/T334671 (Trizek-WMF) In progress→Resolved
[15:46:12] <wikibugs>	 SRE, Data-Persistence, serviceops, Datacenter-Switchover, and 2 others: March 2023 Datacenter Switchover - https://phabricator.wikimedia.org/T327920 (Trizek-WMF)
[15:49:33] <logmsgbot>	 !log otto@deploy1002 helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
[15:49:37] <logmsgbot>	 !log otto@deploy1002 helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
[15:51:34] <logmsgbot>	 !log otto@deploy1002 helmfile [codfw] START helmfile.d/admin 'apply'.
[15:51:41] <logmsgbot>	 !log otto@deploy1002 helmfile [codfw] DONE helmfile.d/admin 'apply'.
[15:51:55] <logmsgbot>	 !log herron@cumin1001 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mwlog2002.codfw.wmnet with OS bullseye
[15:52:30] <icinga-wm>	 RECOVERY - Router interfaces on cr4-ulsfo is OK: OK: host 198.35.26.193, interfaces up: 71, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[15:53:14] <logmsgbot>	 !log otto@deploy1002 helmfile [codfw] START helmfile.d/admin 'apply'.
[15:54:04] <logmsgbot>	 !log otto@deploy1002 helmfile [codfw] DONE helmfile.d/admin 'apply'.
[15:54:24] <icinga-wm>	 PROBLEM - BGP status on cr4-ulsfo is CRITICAL: BGP CRITICAL - AS6939/IPv6: Idle - HE, AS6939/IPv4: Idle - HE https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[15:54:31] <logmsgbot>	 !log otto@deploy1002 helmfile [codfw] START helmfile.d/admin 'apply'.
[15:54:39] <logmsgbot>	 !log otto@deploy1002 helmfile [codfw] DONE helmfile.d/admin 'apply'.
[15:55:00] <wikibugs>	 (CR) Klausman: [C: +1] services: raise auth-users rate limit for Lift Wing in the API Gateway [deployment-charts] - https://gerrit.wikimedia.org/r/924545 (owner: Elukey)
[15:55:12] <logmsgbot>	 !log otto@deploy1002 helmfile [codfw] START helmfile.d/admin 'apply'.
[15:55:26] <wikibugs>	 (CR) Klausman: [C: +1] ml-services: add autoscaling capabilities to revert risk la [deployment-charts] - https://gerrit.wikimedia.org/r/924544 (owner: Elukey)
[15:55:34] <icinga-wm>	 PROBLEM - Docker registry HTTPS interface on registry1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Docker
[15:56:00] <logmsgbot>	 !log otto@deploy1002 helmfile [codfw] DONE helmfile.d/admin 'apply'.
[15:56:58] <logmsgbot>	 !log otto@deploy1002 helmfile [eqiad] START helmfile.d/admin 'apply'.
[15:56:58] <icinga-wm>	 RECOVERY - Docker registry HTTPS interface on registry1003 is OK: HTTP OK: HTTP/1.1 200 OK - 3754 bytes in 0.179 second response time https://wikitech.wikimedia.org/wiki/Docker
[15:57:08] <icinga-wm>	 PROBLEM - Router interfaces on cr4-ulsfo is CRITICAL: CRITICAL: host 198.35.26.193, interfaces up: 70, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[15:57:15] <logmsgbot>	 !log otto@deploy1002 helmfile [eqiad] DONE helmfile.d/admin 'apply'.
[15:58:04] <logmsgbot>	 !log otto@deploy1002 helmfile [eqiad] START helmfile.d/admin 'apply'.
[15:58:10] <logmsgbot>	 !log otto@deploy1002 helmfile [eqiad] DONE helmfile.d/admin 'apply'.
[15:58:17] <wikibugs>	 (PS1) JMeybohm: Revert: Ratelimit a hotlink saturation case [puppet] - https://gerrit.wikimedia.org/r/924550
[15:58:39] <wikibugs>	 (CR) CI reject: [V: -1] Revert: Ratelimit a hotlink saturation case [puppet] - https://gerrit.wikimedia.org/r/924550 (owner: JMeybohm)
[15:58:41] <wikibugs>	 (PS2) Urbanecm: [Growth] Enable user impact refresh on 10 more wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/924053 (https://phabricator.wikimedia.org/T336203)
[15:58:46] <urbanecm>	 jouncebot: nowandnext
[15:58:46] <jouncebot>	 No deployments scheduled for the next 0 hour(s) and 1 minute(s)
[15:58:46] <jouncebot>	 In 0 hour(s) and 1 minute(s): Puppet request window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20230530T1600)
[15:59:05] <wikibugs>	 (PS2) JMeybohm: Revert: Ratelimit a hotlink saturation case [puppet] - https://gerrit.wikimedia.org/r/924550
[15:59:27] <wikibugs>	 (CR) CI reject: [V: -1] Revert: Ratelimit a hotlink saturation case [puppet] - https://gerrit.wikimedia.org/r/924550 (owner: JMeybohm)
[16:00:02] <logmsgbot>	 !log otto@deploy1002 helmfile [eqiad] START helmfile.d/admin 'apply'.
[16:00:06] <jouncebot>	 jbond and rzl: May I have your attention please! Puppet request window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20230530T1600)
[16:00:06] <jouncebot>	 No Gerrit patches in the queue for this window AFAICS.
[16:00:52] <logmsgbot>	 !log otto@deploy1002 helmfile [eqiad] DONE helmfile.d/admin 'apply'.
[16:04:29] <rzl>	 urbanecm: nothing planned for the puppet window today, all yours if you need it :)
[16:05:10] <urbanecm>	 thanks!
[16:05:56] <wikibugs>	 (CR) TrainBranchBot: [C: +2] "Approved by urbanecm@deploy1002 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/924053 (https://phabricator.wikimedia.org/T336203) (owner: Urbanecm)
[16:06:58] <wikibugs>	 (Merged) jenkins-bot: [Growth] Enable user impact refresh on 10 more wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/924053 (https://phabricator.wikimedia.org/T336203) (owner: Urbanecm)
[16:07:26] <logmsgbot>	 !log urbanecm@deploy1002 Started scap: Backport for [[gerrit:924053|[Growth] Enable user impact refresh on 10 more wikis (T336203)]]
[16:07:31] <stashbot>	 T336203: Positive reinforcement: Deploy the new Impact module to all Wikipedias - https://phabricator.wikimedia.org/T336203
[16:14:34] <logmsgbot>	 !log urbanecm@deploy1002 Finished scap: Backport for [[gerrit:924053|[Growth] Enable user impact refresh on 10 more wikis (T336203)]] (duration: 07m 08s)
[16:14:39] <stashbot>	 T336203: Positive reinforcement: Deploy the new Impact module to all Wikipedias - https://phabricator.wikimedia.org/T336203
[16:15:28] <urbanecm>	 rzl: would it be possible to start the growthexperiments-userImpactUpdateRecentlyRegistered and growthexperiments-userImpactUpdateRecentlyEdited jobs at mwmaint1002 before the timer kicks in? if it's too problematic, i can run it in a tmux too.
[16:15:34] <jinxer-wm>	 (KubernetesAPILatency) firing: High Kubernetes API latency (POST pods) on k8s@eqiad - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/000000435?var-site=eqiad&var-cluster=k8s - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[16:17:15] <rzl>	 urbanecm: sure thing - ready for it now?
[16:17:26] <urbanecm>	 yes.
[16:19:19] <rzl>	 !log rzl@mwmaint1002:~$ sudo systemctl start mediawiki_job_growthexperiments-userImpactUpdateRecentlyRegistered
[16:19:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:20:00] <wikibugs>	 (PS1) Herron: mwlog: add remove_python2_on_bullseye exemption [puppet] - https://gerrit.wikimedia.org/r/924555 (https://phabricator.wikimedia.org/T333614)
[16:20:14] <rzl>	 !log rzl@mwmaint1002:~$ sudo systemctl start mediawiki_job_growthexperiments-userImpactUpdateRecentlyEdited
[16:20:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:20:34] <jinxer-wm>	 (KubernetesAPILatency) resolved: High Kubernetes API latency (POST pods) on k8s@eqiad - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/000000435?var-site=eqiad&var-cluster=k8s - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[16:21:35] <urbanecm>	 thanks rzl!
[16:21:52] <rzl>	 no worries! RecentlyEdited is still running, up to the Ts now
[16:22:05] <rzl>	 there we go, both done
[16:22:07] <wikibugs>	 (CR) Herron: [V: +1] "PCC SUCCESS (DIFF 1 NOOP 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/41422/console" [puppet] - https://gerrit.wikimedia.org/r/924555 (https://phabricator.wikimedia.org/T333614) (owner: Herron)
[16:23:34] <wikibugs>	 (PS18) Cwhite: prometheus: generate swagger targets from service catalog [puppet] - https://gerrit.wikimedia.org/r/916914 (https://phabricator.wikimedia.org/T320620)
[16:24:29] <wikibugs>	 (CR) Herron: [V: +1 C: +2] "self-merging to complete the upgrade to bullseye.  we should revisit this for a longer term fix" [puppet] - https://gerrit.wikimedia.org/r/924555 (https://phabricator.wikimedia.org/T333614) (owner: Herron)
[16:32:32] <wikibugs>	 (PS1) Herron: udp2log: dont use python symlink [puppet] - https://gerrit.wikimedia.org/r/924557 (https://phabricator.wikimedia.org/T333614)
[16:33:15] <wikibugs>	 (PS2) Herron: udp2log: dont use python symlink [puppet] - https://gerrit.wikimedia.org/r/924557 (https://phabricator.wikimedia.org/T333614)
[16:33:38] <wikibugs>	 (PS1) Hokwelum: create  mount point dir [puppet] - https://gerrit.wikimedia.org/r/924558
[16:34:48] <wikibugs>	 (CR) Herron: [V: +1] "PCC SUCCESS (): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/41423/console" [puppet] - https://gerrit.wikimedia.org/r/924557 (https://phabricator.wikimedia.org/T333614) (owner: Herron)
[16:35:33] <wikibugs>	 (CR) Herron: [V: +1 C: +2] "self-merging to complete host reimage" [puppet] - https://gerrit.wikimedia.org/r/924557 (https://phabricator.wikimedia.org/T333614) (owner: Herron)
[16:36:02] <wikibugs>	 (PS2) Hokwelum: create  mount point dir [puppet] - https://gerrit.wikimedia.org/r/924558 (https://phabricator.wikimedia.org/T325232)
[16:38:10] <wikibugs>	 (PS2) Urbanecm: [Growth] Enable new Impact for 10 additional wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/924060 (https://phabricator.wikimedia.org/T336203)
[16:38:59] <wikibugs>	 (PS19) Cwhite: prometheus: generate swagger targets from service catalog [puppet] - https://gerrit.wikimedia.org/r/916914 (https://phabricator.wikimedia.org/T320620)
[16:41:32] <wikibugs>	 (PS1) PipelineBot: wikifeeds: pipeline bot promote [deployment-charts] - https://gerrit.wikimedia.org/r/924135
[16:46:26] <wikibugs>	 SRE, SRE-Access-Requests: Requesting access to analytics-product-users for KCVelaga (WMF) - https://phabricator.wikimedia.org/T337766 (KCVelaga_WMF)
[16:51:30] <wikibugs>	 (PS1) BCornwall: pybal: Switch codfw LVS to use Maglev scheduler [puppet] - https://gerrit.wikimedia.org/r/924559 (https://phabricator.wikimedia.org/T263797)
[16:51:48] <sukhe>	 jouncebot: now
[16:51:48] <jouncebot>	 For the next 0 hour(s) and 8 minute(s): Puppet request window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20230530T1600)
[16:51:52] <sukhe>	 jouncebot: nowandnext
[16:51:52] <jouncebot>	 For the next 0 hour(s) and 8 minute(s): Puppet request window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20230530T1600)
[16:51:52] <jouncebot>	 In 0 hour(s) and 8 minute(s): MediaWiki infrastucture (UTC late) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20230530T1700)
[16:53:48] <wikibugs>	 (PS1) Ssingh: depool codfw (emergency patch, do not merge) [dns] - https://gerrit.wikimedia.org/r/924561 (https://phabricator.wikimedia.org/T263797)
[17:00:05] <jouncebot>	 Deploy window MediaWiki infrastucture (UTC late) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20230530T1700)
[17:03:58] <wikibugs>	 (CR) Cwhite: "PCC OK: https://puppet-compiler.wmflabs.org/output/916914/41424/" [puppet] - https://gerrit.wikimedia.org/r/916914 (https://phabricator.wikimedia.org/T320620) (owner: Cwhite)
[17:10:10] <wikibugs>	 (PS6) Ilias Sarantopoulos: ORES: add model versions configuration and thresholds [mediawiki-config] - https://gerrit.wikimedia.org/r/922512 (https://phabricator.wikimedia.org/T319170)
[17:12:38] <wikibugs>	 (PS1) Zabe: Start reading from rev_comment_id in group1 wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/924564 (https://phabricator.wikimedia.org/T299954)
[17:17:22] <wikibugs>	 (PS3) Cwhite: team-sre: add openapi/swagger alerts [alerts] - https://gerrit.wikimedia.org/r/918547 (https://phabricator.wikimedia.org/T320620)
[17:19:55] <wikibugs>	 (CR) Cwhite: team-sre: add openapi/swagger alerts (1 comment) [alerts] - https://gerrit.wikimedia.org/r/918547 (https://phabricator.wikimedia.org/T320620) (owner: Cwhite)
[17:24:57] <wikibugs>	 SRE, Release-Engineering-Team, Security-Team, Wikimedia-GitHub, and 3 others: Add github.com/wikimedia as an SCM for Semgrep Cloud - https://phabricator.wikimedia.org/T337561 (sbassett)
[17:26:09] <wikibugs>	 (PS2) Jforrester: wikifeeds: pipeline bot promote [deployment-charts] - https://gerrit.wikimedia.org/r/924135 (https://phabricator.wikimedia.org/T337464) (owner: PipelineBot)
[17:33:33] <wikibugs>	 (PS3) Hokwelum: create mount point dir for dumps test nfs share [puppet] - https://gerrit.wikimedia.org/r/924558 (https://phabricator.wikimedia.org/T325232)
[17:38:36] <wikibugs>	 (PS4) Hokwelum: create mount point dir for dumps test nfs share [puppet] - https://gerrit.wikimedia.org/r/924558 (https://phabricator.wikimedia.org/T325232)
[17:42:20] <wikibugs>	 (CR) ArielGlenn: [C: +2] create mount point dir for dumps test nfs share [puppet] - https://gerrit.wikimedia.org/r/924558 (https://phabricator.wikimedia.org/T325232) (owner: Hokwelum)
[17:45:29] <mutante>	 !log re-enabling puppet on contint2001
[17:45:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:55:06] <wikibugs>	 ops-drmrs, DC-Ops, decommission-hardware, SRE Observability (FY2022/2023-Q4): Decommission prometheus6001 - https://phabricator.wikimedia.org/T335588 (RobH) Open→Resolved So VMs don't need/warrant a hardware decom ticket, resolving.
[17:55:08] <wikibugs>	 SRE, ops-ulsfo, DC-Ops, decommission-hardware, SRE Observability (FY2022/2023-Q4): Decommission prometheus4001 - https://phabricator.wikimedia.org/T335585 (RobH) Open→Resolved So VMs don't need/warrant a hardware decom ticket, resolving.
[17:56:50] <wikibugs>	 (PS3) Urbanecm: [Growth] Enable new Impact for 10 additional wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/924060 (https://phabricator.wikimedia.org/T336203)
[17:58:48] <wikibugs>	 (PS2) Dzahn: releases: Ensure rsync jobs get removed on the non-active machine [puppet] - https://gerrit.wikimedia.org/r/924085 (https://phabricator.wikimedia.org/T334435) (owner: EoghanGaffney)
[17:59:08] <wikibugs>	 (CR) Dzahn: "nitpick: please start commit message with the module name, so like "releases: " in this case." [puppet] - https://gerrit.wikimedia.org/r/924085 (https://phabricator.wikimedia.org/T334435) (owner: EoghanGaffney)
[18:00:05] <jouncebot>	 dduvall and ^demon: gettimeofday() says it's time for MediaWiki train - Utc-7 Version. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20230530T1800)
[18:01:48] <wikibugs>	 (PS1) Fabfur: run-puppet-restart-varnish: Add dry_run support to check function [cookbooks] - https://gerrit.wikimedia.org/r/924590 (https://phabricator.wikimedia.org/T323557)
[18:04:08] <wikibugs>	 (CR) CI reject: [V: -1] run-puppet-restart-varnish: Add dry_run support to check function [cookbooks] - https://gerrit.wikimedia.org/r/924590 (https://phabricator.wikimedia.org/T323557) (owner: Fabfur)
[18:06:41] <wikibugs>	 (CR) Dzahn: [C: -1] "This is flipped around from what it should be (and how it was on doc hosts, which is why this is confusing). What is supposed to happen he" [puppet] - https://gerrit.wikimedia.org/r/924085 (https://phabricator.wikimedia.org/T334435) (owner: EoghanGaffney)
[18:07:44] <wikibugs>	 (PS20) Eevans: cassandra: add support for version 4.1.1 [puppet] - https://gerrit.wikimedia.org/r/913265 (https://phabricator.wikimedia.org/T313814)
[18:08:13] <wikibugs>	 (CR) CI reject: [V: -1] cassandra: add support for version 4.1.1 [puppet] - https://gerrit.wikimedia.org/r/913265 (https://phabricator.wikimedia.org/T313814) (owner: Eevans)
[18:10:09] <wikibugs>	 (CR) Dzahn: [C: +2] "https://puppet-compiler.wmflabs.org/output/921244/41426/" [puppet] - https://gerrit.wikimedia.org/r/921244 (owner: EoghanGaffney)
[18:10:59] <wikibugs>	 (CR) Dzahn: [C: +2] "should have linked to T336168.. oh well." [puppet] - https://gerrit.wikimedia.org/r/921244 (owner: EoghanGaffney)
[18:11:37] <wikibugs>	 (PS2) Fabfur: run-puppet-restart-varnish: Add dry_run support to check function [cookbooks] - https://gerrit.wikimedia.org/r/924590 (https://phabricator.wikimedia.org/T323557)
[18:13:48] <wikibugs>	 (CR) Dzahn: [C: +2] "this was no-op on all hosts except doc1003, there it changed IPs to host names in ferm rules. ferm reloaded just fine. no issues." [puppet] - https://gerrit.wikimedia.org/r/921244 (owner: EoghanGaffney)
[18:14:04] <wikibugs>	 (CR) CI reject: [V: -1] run-puppet-restart-varnish: Add dry_run support to check function [cookbooks] - https://gerrit.wikimedia.org/r/924590 (https://phabricator.wikimedia.org/T323557) (owner: Fabfur)
[18:18:33] <wikibugs>	 (PS1) TrainBranchBot: group0 wikis to 1.41.0-wmf.11 [mediawiki-config] - https://gerrit.wikimedia.org/r/924591 (https://phabricator.wikimedia.org/T337525)
[18:18:35] <wikibugs>	 (CR) TrainBranchBot: [C: +2] group0 wikis to 1.41.0-wmf.11 [mediawiki-config] - https://gerrit.wikimedia.org/r/924591 (https://phabricator.wikimedia.org/T337525) (owner: TrainBranchBot)
[18:19:21] <wikibugs>	 (Merged) jenkins-bot: group0 wikis to 1.41.0-wmf.11 [mediawiki-config] - https://gerrit.wikimedia.org/r/924591 (https://phabricator.wikimedia.org/T337525) (owner: TrainBranchBot)
[18:23:51] <wikibugs>	 (PS3) Fabfur: run-puppet-restart-varnish: Add dry_run support to check function [cookbooks] - https://gerrit.wikimedia.org/r/924590 (https://phabricator.wikimedia.org/T323557)
[18:27:15] <logmsgbot>	 !log dduvall@deploy1002 rebuilt and synchronized wikiversions files: group0 wikis to 1.41.0-wmf.11  refs T337525
[18:27:20] <stashbot>	 T337525: 1.41.0-wmf.11 deployment blockers - https://phabricator.wikimedia.org/T337525
[18:30:12] <wikibugs>	 (CR) JHathaway: puppetmaster: add new function to check for local files (1 comment) [puppet] - https://gerrit.wikimedia.org/r/922877 (https://phabricator.wikimedia.org/T268344) (owner: Jbond)
[18:30:28] <wikibugs>	 (PS1) Hokwelum: README updated with info on how to create the mount point subdir [puppet] - https://gerrit.wikimedia.org/r/924592
[18:31:32] <wikibugs>	 (CR) Slyngshede: [C: +1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/924517 (https://phabricator.wikimedia.org/T241049) (owner: Muehlenhoff)
[18:36:26] <wikibugs>	 (CR) Majavah: puppetmaster: add new function to check for local files (1 comment) [puppet] - https://gerrit.wikimedia.org/r/922877 (https://phabricator.wikimedia.org/T268344) (owner: Jbond)
[18:40:20] <wikibugs>	 (CR) Dzahn: "for some reason I can access the Internet without going through PROXY and it's not obvious in ferm rules why.. it is in iptables -L though" [puppet] - https://gerrit.wikimedia.org/r/902513 (owner: Dzahn)
[18:41:14] <icinga-wm>	 RECOVERY - MariaDB Replica Lag: s3 on clouddb1017 is OK: OK slave_sql_lag not a slave https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[18:41:48] <icinga-wm>	 RECOVERY - MariaDB Replica IO: s3 on clouddb1017 is OK: OK slave_io_state Slave_IO_Running: Yes https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[18:42:09] <wikibugs>	 (PS2) Hokwelum: README updated with info on how to create the mount point subdir [puppet] - https://gerrit.wikimedia.org/r/924592
[18:43:25] <wikibugs>	 (PS3) Hokwelum: README updated with info on how to create the mount point subdir [puppet] - https://gerrit.wikimedia.org/r/924592 (https://phabricator.wikimedia.org/T325232)
[18:43:30] <icinga-wm>	 RECOVERY - MariaDB Replica IO: s1 on clouddb1013 is OK: OK slave_io_state not a slave https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[18:52:18] <wikibugs>	 (PS4) Hokwelum: README updated with info on how to create the mount point subdir [puppet] - https://gerrit.wikimedia.org/r/924592 (https://phabricator.wikimedia.org/T325232)
[18:57:03] <wikibugs>	 (CR) JHathaway: puppetmaster: add new function to check for local files (1 comment) [puppet] - https://gerrit.wikimedia.org/r/922877 (https://phabricator.wikimedia.org/T268344) (owner: Jbond)
[18:57:05] <wikibugs>	 (PS1) BBlack: [WIP] pybal: configure failover i13n IPs [puppet] - https://gerrit.wikimedia.org/r/924593 (https://phabricator.wikimedia.org/T334703)
[18:57:27] <wikibugs>	 (CR) CI reject: [V: -1] [WIP] pybal: configure failover i13n IPs [puppet] - https://gerrit.wikimedia.org/r/924593 (https://phabricator.wikimedia.org/T334703) (owner: BBlack)
[18:57:35] <wikibugs>	 (CR) Jdrewniak: [C: +1] Turn on A/B Test Hebrew [mediawiki-config] - https://gerrit.wikimedia.org/r/924536 (https://phabricator.wikimedia.org/T336969) (owner: Kimberly Sarabia)
[18:59:28] <wikibugs>	 (PS5) Hokwelum: README updated with info on how to create the dumps test nfs mount point subdir [puppet] - https://gerrit.wikimedia.org/r/924592 (https://phabricator.wikimedia.org/T325232)
[19:02:06] <wikibugs>	 (CR) ArielGlenn: [C: +2] README updated with info on how to create the dumps test nfs mount point subdir [puppet] - https://gerrit.wikimedia.org/r/924592 (https://phabricator.wikimedia.org/T325232) (owner: Hokwelum)
[19:02:16] <wikibugs>	 (PS2) BBlack: [WIP] pybal: configure failover i13n IPs [puppet] - https://gerrit.wikimedia.org/r/924593 (https://phabricator.wikimedia.org/T334703)
[19:05:13] <wikibugs>	 (CR) CI reject: [V: -1] [WIP] pybal: configure failover i13n IPs [puppet] - https://gerrit.wikimedia.org/r/924593 (https://phabricator.wikimedia.org/T334703) (owner: BBlack)
[19:06:14] <wikibugs>	 (PS3) BBlack: [WIP] pybal: configure failover i13n IPs [puppet] - https://gerrit.wikimedia.org/r/924593 (https://phabricator.wikimedia.org/T334703)
[19:09:10] <wikibugs>	 (CR) CI reject: [V: -1] [WIP] pybal: configure failover i13n IPs [puppet] - https://gerrit.wikimedia.org/r/924593 (https://phabricator.wikimedia.org/T334703) (owner: BBlack)
[19:10:46] <wikibugs>	 (CR) Brennen Bearnes: gitlab: sync all configured providers (1 comment) [puppet] - https://gerrit.wikimedia.org/r/916522 (https://phabricator.wikimedia.org/T320390) (owner: Jbond)
[19:11:16] <logmsgbot>	 !log bking@deploy1002 Started deploy [wdqs/wdqs@dff41b7]: 0.3.124
[19:11:53] <wikibugs>	 (CR) Brennen Bearnes: [C: +1] "Typos aside, +1 for idea. Seems fine." [puppet] - https://gerrit.wikimedia.org/r/916522 (https://phabricator.wikimedia.org/T320390) (owner: Jbond)
[19:12:50] <inflatador>	 !log [WDQS Deploy] Deploying version 0.3.124
[19:12:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:13:08] <icinga-wm>	 RECOVERY - MariaDB Replica Lag: s1 on clouddb1013 is OK: OK slave_sql_lag Replication lag: 51.30 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[19:14:48] <icinga-wm>	 PROBLEM - PyBal backends health check on lvs2010 is CRITICAL: PYBAL CRITICAL - CRITICAL - wdqs-ssl_443: Servers wdqs2011.codfw.wmnet are marked down but pooled https://wikitech.wikimedia.org/wiki/PyBal
[19:16:22] <icinga-wm>	 PROBLEM - PyBal backends health check on lvs2009 is CRITICAL: PYBAL CRITICAL - CRITICAL - wdqs-heavy-queries_8888: Servers wdqs2007.codfw.wmnet are marked down but pooled: wdqs_80: Servers wdqs2007.codfw.wmnet are marked down but pooled https://wikitech.wikimedia.org/wiki/PyBal
[19:18:22] <gehel>	 inflatador, ryankemper: issue with the WDQS deployment? Need help? ^^
[19:19:20] <inflatador>	 gehel looking into it now
[19:21:52] <wikibugs>	 (PS1) Reedy: Revert "Temporarily disable UCoC link from non tech wikis" [mediawiki-config] - https://gerrit.wikimedia.org/r/924567 (https://phabricator.wikimedia.org/T280886)
[19:21:59] <wikibugs>	 (CR) CI reject: [V: -1] Revert "Temporarily disable UCoC link from non tech wikis" [mediawiki-config] - https://gerrit.wikimedia.org/r/924567 (https://phabricator.wikimedia.org/T280886) (owner: Reedy)
[19:22:17] <wikibugs>	 (CR) Reedy: [C: -2] "Needs rebase..." [mediawiki-config] - https://gerrit.wikimedia.org/r/924567 (https://phabricator.wikimedia.org/T280886) (owner: Reedy)
[19:24:24] <wikibugs>	 (PS2) Reedy: Revert "Temporarily disable UCoC link from non tech wikis" [mediawiki-config] - https://gerrit.wikimedia.org/r/924567 (https://phabricator.wikimedia.org/T280886)
[19:24:28] <logmsgbot>	 !log ryankemper@puppetmaster1001 conftool action : set/weight=0:pooled=inactive; selector: name=wdqs2021.*
[19:24:36] <icinga-wm>	 RECOVERY - Router interfaces on cr4-ulsfo is OK: OK: host 198.35.26.193, interfaces up: 71, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[19:25:48] <icinga-wm>	 PROBLEM - Check systemd state on wdqs2009 is CRITICAL: CRITICAL - starting: Late bootup, before the job queue becomes idle for the first time, or one of the rescue targets are reached. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[19:26:20] <icinga-wm>	 RECOVERY - SSH on wdqs2009 is OK: SSH OK - OpenSSH_7.9p1 Debian-10+deb10u2 (protocol 2.0) https://wikitech.wikimedia.org/wiki/SSH/monitoring
[19:26:25] <wikibugs>	 (CR) Reedy: "https://gerrit.wikimedia.org/r/c/mediawiki/extensions/WikimediaMessages/+/698076 needs to land first too" [mediawiki-config] - https://gerrit.wikimedia.org/r/924567 (https://phabricator.wikimedia.org/T280886) (owner: Reedy)
[19:26:28] <icinga-wm>	 RECOVERY - WDQS SPARQL on wdqs2009 is OK: HTTP OK: HTTP/1.1 200 OK - 689 bytes in 1.207 second response time https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Runbook
[19:27:20] <icinga-wm>	 RECOVERY - Check systemd state on wdqs2009 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[19:27:52] <logmsgbot>	 !log bking@deploy1002 Finished deploy [wdqs/wdqs@dff41b7]: 0.3.124 (duration: 16m 36s)
[19:27:54] <icinga-wm>	 PROBLEM - Docker registry HTTPS interface on registry1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Docker
[19:28:03] <logmsgbot>	 !log bking@deploy1002 Started deploy [wdqs/wdqs@dff41b7]: 0.3.124
[19:28:12] <icinga-wm>	 PROBLEM - BGP status on cr4-ulsfo is CRITICAL: BGP CRITICAL - AS6939/IPv4: Idle - HE, AS6939/IPv6: Idle - HE https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[19:28:58] <logmsgbot>	 !log bking@deploy1002 Finished deploy [wdqs/wdqs@dff41b7]: 0.3.124 (duration: 00m 54s)
[19:30:50] <icinga-wm>	 RECOVERY - Docker registry HTTPS interface on registry1004 is OK: HTTP OK: HTTP/1.1 200 OK - 3754 bytes in 0.220 second response time https://wikitech.wikimedia.org/wiki/Docker
[19:32:31] <logmsgbot>	 !log bking@deploy1002 Started deploy [wdqs/wdqs@dff41b7]: 0.3.124
[19:33:58] <jinxer-wm>	 (RdfStreamingUpdaterHighConsumerUpdateLag) firing: wdqs2009:9101 has fallen behind applying updates from the RDF Streaming Updater - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterHighConsumerUpdateLag
[19:34:09] <wikibugs>	 (PS2) Samtar: Turn on A/B Test Hebrew [mediawiki-config] - https://gerrit.wikimedia.org/r/924536 (https://phabricator.wikimedia.org/T336969) (owner: Kimberly Sarabia)
[19:35:28] <ryankemper>	 Yeah WDQS looks fine now. Fixed a host that was pooled=false instead of inactive, and rebooted wdqs2009 which was ssh unresponsive.
[19:35:37] <wikibugs>	 (PS4) BBlack: [WIP] pybal: configure failover i13n IPs [puppet] - https://gerrit.wikimedia.org/r/924593 (https://phabricator.wikimedia.org/T334703)
[19:35:59] <wikibugs>	 (CR) CI reject: [V: -1] [WIP] pybal: configure failover i13n IPs [puppet] - https://gerrit.wikimedia.org/r/924593 (https://phabricator.wikimedia.org/T334703) (owner: BBlack)
[19:36:28] <wikibugs>	 (CR) Muehlenhoff: Setup debmonitor2003 as bookworm debmonitor VM (1 comment) [puppet] - https://gerrit.wikimedia.org/r/924517 (https://phabricator.wikimedia.org/T241049) (owner: Muehlenhoff)
[19:36:33] <logmsgbot>	 !log bking@deploy1002 Finished deploy [wdqs/wdqs@dff41b7]: 0.3.124 (duration: 04m 02s)
[19:38:07] <wikibugs>	 (PS5) BBlack: [WIP] pybal: configure failover i13n IPs [puppet] - https://gerrit.wikimedia.org/r/924593 (https://phabricator.wikimedia.org/T334703)
[19:41:08] <wikibugs>	 (PS6) BBlack: [WIP] pybal: configure failover i13n IPs [puppet] - https://gerrit.wikimedia.org/r/924593 (https://phabricator.wikimedia.org/T334703)
[19:43:08] <icinga-wm>	 PROBLEM - Cxserver LVS codfw on cxserver.svc.codfw.wmnet is CRITICAL: /v2/suggest/sections/{title}/{from}/{to} (Suggest source sections to translate) timed out before a response was received https://wikitech.wikimedia.org/wiki/CX
[19:44:25] <wikibugs>	 (PS7) BBlack: [WIP] pybal: configure failover i13n IPs [puppet] - https://gerrit.wikimedia.org/r/924593 (https://phabricator.wikimedia.org/T334703)
[19:48:39] <logmsgbot>	 !log xcollazo@deploy1002 Started deploy [airflow-dags/analytics@cd667c2]: Deploy Iceberg version of referrer_daily on analytics Airflow instance. T335305.
[19:48:46] <stashbot>	 T335305: Migrate referrer_daily to Iceberg - https://phabricator.wikimedia.org/T335305
[19:48:49] <logmsgbot>	 !log xcollazo@deploy1002 Finished deploy [airflow-dags/analytics@cd667c2]: Deploy Iceberg version of referrer_daily on analytics Airflow instance. T335305. (duration: 00m 09s)
[19:49:12] <icinga-wm>	 RECOVERY - Cxserver LVS codfw on cxserver.svc.codfw.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/CX
[19:49:54] <wikibugs>	 (PS1) Ladsgroup: Add WANCache to ParserOutputPageProperties::finalize [extensions/CirrusSearch] (wmf/1.41.0-wmf.11) - https://gerrit.wikimedia.org/r/924568 (https://phabricator.wikimedia.org/T336698)
[19:51:04] <wikibugs>	 (PS1) Ladsgroup: Add WANCache to ParserOutputPageProperties::finalize [extensions/CirrusSearch] (wmf/1.41.0-wmf.10) - https://gerrit.wikimedia.org/r/924569 (https://phabricator.wikimedia.org/T336698)
[19:51:28] <wikibugs>	 (PS8) BBlack: pybal: configure failover i13n IPs [puppet] - https://gerrit.wikimedia.org/r/924593 (https://phabricator.wikimedia.org/T334703)
[19:51:38] <Amir1>	 jouncebot: nowandnext
[19:51:38] <jouncebot>	 For the next 0 hour(s) and 8 minute(s): MediaWiki train - Utc-7 Version (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20230530T1800)
[19:51:38] <jouncebot>	 In 0 hour(s) and 8 minute(s): UTC late backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20230530T2000)
[19:53:46] <wikibugs>	 (CR) BBlack: [C: +1] "This looks right to me: https://puppet-compiler.wmflabs.org/output/924593/41437/lvs4010.ulsfo.wmnet/index.html" [puppet] - https://gerrit.wikimedia.org/r/924593 (https://phabricator.wikimedia.org/T334703) (owner: BBlack)
[20:00:07] <jouncebot>	 RoanKattouw, Urbanecm, cjming, TheresNoTime, kindrobot, and taavi: OwO what's this, a deployment window?? UTC late backport window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20230530T2000). nyaa~
[20:00:07] <jouncebot>	 kimberly_sarabia: A patch you scheduled for UTC late backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[20:00:10] * TheresNoTime can deploy!
[20:00:21] <kimberly_sarabia>	 hello. ty
[20:00:32] <wikibugs>	 (CR) TrainBranchBot: [C: +2] "Approved by samtar@deploy1002 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/924536 (https://phabricator.wikimedia.org/T336969) (owner: Kimberly Sarabia)
[20:00:57] <wikibugs>	 (PS1) BBlack: [WIP] safe-service-restart: use failover i13n [puppet] - https://gerrit.wikimedia.org/r/924596 (https://phabricator.wikimedia.org/T334703)
[20:01:26] <wikibugs>	 (Merged) jenkins-bot: Turn on A/B Test Hebrew [mediawiki-config] - https://gerrit.wikimedia.org/r/924536 (https://phabricator.wikimedia.org/T336969) (owner: Kimberly Sarabia)
[20:01:55] <logmsgbot>	 !log samtar@deploy1002 Started scap: Backport for [[gerrit:924536|Turn on A/B Test Hebrew (T336969)]]
[20:02:01] <stashbot>	 T336969: [Zebra AB test] Fix the mixing of global and user IDs for AB Test Enrollment Bucketing - https://phabricator.wikimedia.org/T336969
[20:02:28] <wikibugs>	 (CR) CI reject: [V: -1] [WIP] safe-service-restart: use failover i13n [puppet] - https://gerrit.wikimedia.org/r/924596 (https://phabricator.wikimedia.org/T334703) (owner: BBlack)
[20:03:35] <logmsgbot>	 !log samtar@deploy1002 ksarabia and samtar: Backport for [[gerrit:924536|Turn on A/B Test Hebrew (T336969)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
[20:03:36] <wikibugs>	 (PS2) BBlack: [WIP] safe-service-restart: use failover i13n [puppet] - https://gerrit.wikimedia.org/r/924596 (https://phabricator.wikimedia.org/T334703)
[20:03:38] <TheresNoTime>	 kimberly_sarabia: live on mwdebug, can you test?
[20:03:45] <kimberly_sarabia>	 sure one moment
[20:03:55] <Amir1>	 TheresNoTime: please let me know once you're done
[20:04:04] <TheresNoTime>	 Amir1: will do
[20:05:05] <kimberly_sarabia>	 TheresNoTime: LGTM
[20:05:10] <TheresNoTime>	 syncing
[20:09:08] <wikibugs>	 (PS9) BBlack: pybal: configure failover i13n IPs [puppet] - https://gerrit.wikimedia.org/r/924593 (https://phabricator.wikimedia.org/T334703)
[20:09:10] <wikibugs>	 (PS3) BBlack: safe-service-restart: use failover i13n [puppet] - https://gerrit.wikimedia.org/r/924596 (https://phabricator.wikimedia.org/T334703)
[20:10:42] <logmsgbot>	 !log samtar@deploy1002 Finished scap: Backport for [[gerrit:924536|Turn on A/B Test Hebrew (T336969)]] (duration: 08m 46s)
[20:10:44] <TheresNoTime>	 kimberly_sarabia: live in prod :)
[20:10:47] <stashbot>	 T336969: [Zebra AB test] Fix the mixing of global and user IDs for AB Test Enrollment Bucketing - https://phabricator.wikimedia.org/T336969
[20:10:55] <TheresNoTime>	 Amir1: all yours
[20:11:19] <Amir1>	 awesome
[20:11:19] <kimberly_sarabia>	 TheresNoTime: great. tysm
[20:11:27] <wikibugs>	 (CR) Ladsgroup: [C: +2] Add WANCache to ParserOutputPageProperties::finalize [extensions/CirrusSearch] (wmf/1.41.0-wmf.11) - https://gerrit.wikimedia.org/r/924568 (https://phabricator.wikimedia.org/T336698) (owner: Ladsgroup)
[20:12:11] <inflatador>	 !log bking@wdqs2009 depool wdqs2009 until it catches up with lag
[20:12:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:21:47] <wikibugs>	 (PS1) Jforrester: linker: Check for null parser in Linker::makeThumbLink2 [core] (wmf/1.41.0-wmf.11) - https://gerrit.wikimedia.org/r/924570 (https://phabricator.wikimedia.org/T337794)
[20:23:43] <wikibugs>	 (CR) TrainBranchBot: [C: +2] "Approved by ladsgroup@deploy1002 using scap backport" [extensions/CirrusSearch] (wmf/1.41.0-wmf.11) - https://gerrit.wikimedia.org/r/924568 (https://phabricator.wikimedia.org/T336698) (owner: Ladsgroup)
[20:23:44] <wikibugs>	 SRE-OnFire, Wikidata, Wikidata-Query-Service, wdwb-tech, and 2 others: Review alerting around Wikidata Query Service update pipeline - https://phabricator.wikimedia.org/T336574 (bking)
[20:24:00] <wikibugs>	 SRE-OnFire, Wikidata, Wikidata-Query-Service, Discovery-Search (Current work), and 2 others: Update WDQS Runbook following update lag incident - https://phabricator.wikimedia.org/T336577 (bking)
[20:29:55] <wikibugs>	 (CR) Urbanecm: [C: +1] "can be deployed now" [mediawiki-config] - https://gerrit.wikimedia.org/r/924060 (https://phabricator.wikimedia.org/T336203) (owner: Urbanecm)
[20:30:05] <wikibugs>	 (Merged) jenkins-bot: Add WANCache to ParserOutputPageProperties::finalize [extensions/CirrusSearch] (wmf/1.41.0-wmf.11) - https://gerrit.wikimedia.org/r/924568 (https://phabricator.wikimedia.org/T336698) (owner: Ladsgroup)
[20:30:31] <logmsgbot>	 !log ladsgroup@deploy1002 Started scap: Backport for [[gerrit:924568|Add WANCache to ParserOutputPageProperties::finalize (T336698)]]
[20:30:37] <stashbot>	 T336698: Reduce the load of CirrusSearch update jobs on MW jobrunners - https://phabricator.wikimedia.org/T336698
[20:32:00] <logmsgbot>	 !log ladsgroup@deploy1002 ladsgroup: Backport for [[gerrit:924568|Add WANCache to ParserOutputPageProperties::finalize (T336698)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
[20:33:42] <wikibugs>	 (CR) Eevans: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/913265 (https://phabricator.wikimedia.org/T313814) (owner: Eevans)
[20:34:07] <wikibugs>	 (PS9) Jsn.sherman: beta: log additional click events on Special:Diff [mediawiki-config] - https://gerrit.wikimedia.org/r/896432 (https://phabricator.wikimedia.org/T326214)
[20:34:56] <wikibugs>	 (PS10) Jsn.sherman: beta: log additional click events on Special:Diff|MobileDiff [mediawiki-config] - https://gerrit.wikimedia.org/r/896432 (https://phabricator.wikimedia.org/T326214)
[20:35:36] <wikibugs>	 (Abandoned) Jsn.sherman: Log additional click events on Special:MobileDiff [mediawiki-config] - https://gerrit.wikimedia.org/r/899725 (https://phabricator.wikimedia.org/T326216) (owner: Jsn.sherman)
[20:37:19] <wikibugs>	 (CR) Ladsgroup: [C: +2] Add WANCache to ParserOutputPageProperties::finalize [extensions/CirrusSearch] (wmf/1.41.0-wmf.10) - https://gerrit.wikimedia.org/r/924569 (https://phabricator.wikimedia.org/T336698) (owner: Ladsgroup)
[20:39:59] <logmsgbot>	 !log ladsgroup@deploy1002 Finished scap: Backport for [[gerrit:924568|Add WANCache to ParserOutputPageProperties::finalize (T336698)]] (duration: 09m 27s)
[20:40:04] <stashbot>	 T336698: Reduce the load of CirrusSearch update jobs on MW jobrunners - https://phabricator.wikimedia.org/T336698
[20:40:58] <wikibugs>	 (CR) TrainBranchBot: [C: +2] "Approved by ladsgroup@deploy1002 using scap backport" [extensions/CirrusSearch] (wmf/1.41.0-wmf.10) - https://gerrit.wikimedia.org/r/924569 (https://phabricator.wikimedia.org/T336698) (owner: Ladsgroup)
[20:49:49] <wikibugs>	 (CR) Eevans: [C: +2] cassandra: add support for version 4.1.1 [puppet] - https://gerrit.wikimedia.org/r/913265 (https://phabricator.wikimedia.org/T313814) (owner: Eevans)
[20:51:03] <wikibugs>	 (CR) Eevans: [V: +2 C: +2] cassandra: add support for version 4.1.1 [puppet] - https://gerrit.wikimedia.org/r/913265 (https://phabricator.wikimedia.org/T313814) (owner: Eevans)
[20:56:36] <wikibugs>	 (Merged) jenkins-bot: Add WANCache to ParserOutputPageProperties::finalize [extensions/CirrusSearch] (wmf/1.41.0-wmf.10) - https://gerrit.wikimedia.org/r/924569 (https://phabricator.wikimedia.org/T336698) (owner: Ladsgroup)
[20:57:01] <logmsgbot>	 !log ladsgroup@deploy1002 Started scap: Backport for [[gerrit:924569|Add WANCache to ParserOutputPageProperties::finalize (T336698)]]
[20:57:07] <stashbot>	 T336698: Reduce the load of CirrusSearch update jobs on MW jobrunners - https://phabricator.wikimedia.org/T336698
[20:58:36] <logmsgbot>	 !log ladsgroup@deploy1002 ladsgroup: Backport for [[gerrit:924569|Add WANCache to ParserOutputPageProperties::finalize (T336698)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
[21:09:46] <logmsgbot>	 !log ladsgroup@deploy1002 Finished scap: Backport for [[gerrit:924569|Add WANCache to ParserOutputPageProperties::finalize (T336698)]] (duration: 12m 44s)
[21:15:49] <TheresNoTime>	 jouncebot: nowandnext
[21:15:50] <jouncebot>	 No deployments scheduled for the next 8 hour(s) and 44 minute(s)
[21:15:50] <jouncebot>	 In 8 hour(s) and 44 minute(s): MediaWiki infrastructure (UTC early) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20230531T0600)
[21:27:42] <TheresNoTime>	 Amir1: have you finished deploying? I may backport 924570 for T337794
[21:27:43] <stashbot>	 T337794: Error: Call to a member function getOutput() on null - https://phabricator.wikimedia.org/T337794
[21:28:08] <Amir1>	 TheresNoTime: I am
[21:28:10] <Amir1>	 have fun
[21:28:26] <TheresNoTime>	 :)
[21:29:46] <wikibugs>	 (CR) TrainBranchBot: [C: +2] "Approved by samtar@deploy1002 using scap backport" [core] (wmf/1.41.0-wmf.11) - https://gerrit.wikimedia.org/r/924570 (https://phabricator.wikimedia.org/T337794) (owner: Jforrester)
[21:48:15] <wikibugs>	 (Merged) jenkins-bot: linker: Check for null parser in Linker::makeThumbLink2 [core] (wmf/1.41.0-wmf.11) - https://gerrit.wikimedia.org/r/924570 (https://phabricator.wikimedia.org/T337794) (owner: Jforrester)
[21:48:45] <logmsgbot>	 !log samtar@deploy1002 Started scap: Backport for [[gerrit:924570|linker: Check for null parser in Linker::makeThumbLink2 (T337794)]]
[21:48:51] <stashbot>	 T337794: Error: Call to a member function getOutput() on null - https://phabricator.wikimedia.org/T337794
[21:50:22] <logmsgbot>	 !log samtar@deploy1002 jforrester and samtar: Backport for [[gerrit:924570|linker: Check for null parser in Linker::makeThumbLink2 (T337794)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
[21:50:26] * TheresNoTime testing
[21:50:38] <wikibugs>	 SRE-OnFire, Discovery-Search, Sustainability: WDQS: Document procedure for switching between Kubernetes and Yarn Streaming Updater - https://phabricator.wikimedia.org/T337801 (bking)
[21:51:06] * TheresNoTime syncing
[21:51:49] <wikibugs>	 SRE-OnFire, Discovery-Search, Sustainability: WDQS: Document procedure for switching between Kubernetes and Yarn Streaming Updater - https://phabricator.wikimedia.org/T337801 (bking)
[21:56:22] <icinga-wm>	 PROBLEM - Cxserver LVS codfw on cxserver.svc.codfw.wmnet is CRITICAL: /v2/translate/{from}/{to} (Machine translate an HTML fragment using TestClient, adapt the links to target language wiki.) timed out before a response was received https://wikitech.wikimedia.org/wiki/CX
[21:56:34] <logmsgbot>	 !log samtar@deploy1002 Finished scap: Backport for [[gerrit:924570|linker: Check for null parser in Linker::makeThumbLink2 (T337794)]] (duration: 07m 48s)
[21:56:39] <stashbot>	 T337794: Error: Call to a member function getOutput() on null - https://phabricator.wikimedia.org/T337794
[21:57:48] <icinga-wm>	 RECOVERY - Cxserver LVS codfw on cxserver.svc.codfw.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/CX
[22:48:35] <wikibugs>	 SRE, DNS, Domains, Traffic: Update DNS records for mastodon.wikimedia.org - https://phabricator.wikimedia.org/T337586 (Dzahn) Is there a place where we can read about this project and the general plan around it?
[22:58:58] <jinxer-wm>	 (RdfStreamingUpdaterHighConsumerUpdateLag) firing: (2) wdqs2009:9101 has fallen behind applying updates from the RDF Streaming Updater - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterHighConsumerUpdateLag
[23:07:55] <wikibugs>	 (PS2) Cwhite: hiera: disable security plugin on beta-logs [puppet] - https://gerrit.wikimedia.org/r/912391 (https://phabricator.wikimedia.org/T333732)
[23:12:39] <wikibugs>	 (PS1) Dzahn: planet: restrict firewall source range for port 443 to envoy [puppet] - https://gerrit.wikimedia.org/r/924604
[23:18:33] <wikibugs>	 (CR) Dzahn: [C: -1] "maybe just flip the "bool2str('present', 'absent')" around and call it "$ensure_not_on_active"?  or  !$ensure_on_active ?" [puppet] - https://gerrit.wikimedia.org/r/924085 (https://phabricator.wikimedia.org/T334435) (owner: EoghanGaffney)
[23:27:23] <wikibugs>	 (CR) Dzahn: "https://puppet-compiler.wmflabs.org/output/924604/41440/" [puppet] - https://gerrit.wikimedia.org/r/924604 (owner: Dzahn)
[23:27:37] <wikibugs>	 (PS2) Dzahn: planet: restrict firewall source range for port 443 to envoy [puppet] - https://gerrit.wikimedia.org/r/924604
[23:28:30] <zabe->	 jouncebot: nowandnext
[23:28:30] <jouncebot>	 No deployments scheduled for the next 6 hour(s) and 31 minute(s)
[23:28:30] <jouncebot>	 In 6 hour(s) and 31 minute(s): MediaWiki infrastructure (UTC early) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20230531T0600)
[23:28:38] <wikibugs>	 (PS2) Zabe: Start reading from rev_comment_id in group1 wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/924564 (https://phabricator.wikimedia.org/T299954)
[23:28:40] <wikibugs>	 (CR) Dzahn: "Haven't checked yet but seems like this might affect a bunch of other hosts too." [puppet] - https://gerrit.wikimedia.org/r/924604 (owner: Dzahn)
[23:28:44] <wikibugs>	 (CR) Zabe: [C: +2] Start reading from rev_comment_id in group1 wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/924564 (https://phabricator.wikimedia.org/T299954) (owner: Zabe)
[23:29:31] <wikibugs>	 (Merged) jenkins-bot: Start reading from rev_comment_id in group1 wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/924564 (https://phabricator.wikimedia.org/T299954) (owner: Zabe)
[23:30:01] <logmsgbot>	 !log zabe@deploy1002 Started scap: Backport for [[gerrit:924564|Start reading from rev_comment_id in group1 wikis (T299954)]]
[23:30:10] <stashbot>	 T299954: Write code for handing write and read of rev_comment_id - https://phabricator.wikimedia.org/T299954
[23:31:32] <logmsgbot>	 !log zabe@deploy1002 zabe: Backport for [[gerrit:924564|Start reading from rev_comment_id in group1 wikis (T299954)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
[23:31:38] <wikibugs>	 (PS1) Zabe: Start reading from rev_comment_id everywhere [mediawiki-config] - https://gerrit.wikimedia.org/r/924605 (https://phabricator.wikimedia.org/T299954)
[23:38:02] <logmsgbot>	 !log zabe@deploy1002 Finished scap: Backport for [[gerrit:924564|Start reading from rev_comment_id in group1 wikis (T299954)]] (duration: 08m 00s)
[23:38:07] <stashbot>	 T299954: Write code for handing write and read of rev_comment_id - https://phabricator.wikimedia.org/T299954
[23:49:12] <wikibugs>	 (PS1) Dzahn: gerrit/bacula: adjust Gerrit file paths to be backed up [puppet] - https://gerrit.wikimedia.org/r/924608 (https://phabricator.wikimedia.org/T336427)
[23:50:03] <wikibugs>	 (PS2) Dzahn: gerrit/bacula: adjust Gerrit file paths to be backed up [puppet] - https://gerrit.wikimedia.org/r/924608 (https://phabricator.wikimedia.org/T336427)
[23:54:05] <wikibugs>	 (CR) Dzahn: "/var/lib/gerrit2 which is not currently backed up contains all the .h2 databases like:" [puppet] - https://gerrit.wikimedia.org/r/924608 (https://phabricator.wikimedia.org/T336427) (owner: Dzahn)