[00:18:17] <logmsgbot>	 !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
[00:18:24] <logmsgbot>	 !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
[00:26:33] <icinga-wm>	 RECOVERY - Disk space on centrallog1002 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=centrallog1002&var-datasource=eqiad+prometheus/ops
[00:38:55] <wikibugs>	 (PS1) TrainBranchBot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1009946
[00:38:57] <wikibugs>	 (CR) TrainBranchBot: [C: +2] Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1009946 (owner: TrainBranchBot)
[00:49:55] <jinxer-wm>	 (KubernetesAPINotScrapable) firing: (2) k8s-aux@eqiad is failing to scrape the k8s api - https://phabricator.wikimedia.org/T343529 - TODO - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPINotScrapable
[01:00:45] <wikibugs>	 (Merged) jenkins-bot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1009946 (owner: TrainBranchBot)
[01:01:56] <jinxer-wm>	 (SystemdUnitFailed) resolved: mediawiki_job_generatecaptcha.service on mwmaint2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[01:35:48] <logmsgbot>	 !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
[01:35:54] <logmsgbot>	 !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
[02:15:55] <icinga-wm>	 RECOVERY - Host ripe-atlas-ulsfo is UP: PING WARNING - Packet loss = 77%, RTA = 33.11 ms
[02:22:11] <jinxer-wm>	 (SystemdUnitFailed) firing: generate_os_reports.service on puppetdb2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[02:22:19] <icinga-wm>	 PROBLEM - Host ripe-atlas-ulsfo is DOWN: PING CRITICAL - Packet loss = 100%
[02:36:15] <jinxer-wm>	 (PHPFPMTooBusy) firing: Not enough idle PHP-FPM workers for Mediawiki mw-parsoid at codfw: 49.83% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=codfw%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid&var-container_name=All - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy
[02:37:14] <jinxer-wm>	 (JobUnavailable) firing: (3) Reduced availability for job ldap in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[02:41:15] <jinxer-wm>	 (PHPFPMTooBusy) resolved: Not enough idle PHP-FPM workers for Mediawiki mw-parsoid at codfw: 49.83% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=codfw%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid&var-container_name=All - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy
[02:41:45] <jinxer-wm>	 (SwiftTooManyMediaUploads) firing: (2) Too many eqiad mediawiki originals uploads - https://wikitech.wikimedia.org/wiki/Swift/How_To#mediawiki_originals_uploads  - https://alerts.wikimedia.org/?q=alertname%3DSwiftTooManyMediaUploads
[03:11:45] <jinxer-wm>	 (SwiftTooManyMediaUploads) resolved: (2) Too many eqiad mediawiki originals uploads - https://wikitech.wikimedia.org/wiki/Swift/How_To#mediawiki_originals_uploads  - https://alerts.wikimedia.org/?q=alertname%3DSwiftTooManyMediaUploads
[03:12:14] <jinxer-wm>	 (JobUnavailable) firing: (3) Reduced availability for job ldap in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[03:32:25] <jinxer-wm>	 (SystemdUnitFailed) firing: rsync-aptrepo-apt2001.wikimedia.org.service on apt1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[04:04:06] <wikibugs>	 (PS1) KartikMistry: Update cxserver to 2024-03-11-035839-production [deployment-charts] - https://gerrit.wikimedia.org/r/1009941 (https://phabricator.wikimedia.org/T350773)
[04:07:06] <wikibugs>	 SRE-swift-storage, Commons, MediaWiki-File-management: Uploads fail due to 401 error from swift on wednesdays - https://phabricator.wikimedia.org/T358830#9618833 (tstarling) a:tstarling
[04:31:34] <wikibugs>	 (CR) KartikMistry: [C: +2] Update cxserver to 2024-03-11-035839-production [deployment-charts] - https://gerrit.wikimedia.org/r/1009941 (https://phabricator.wikimedia.org/T350773) (owner: KartikMistry)
[04:32:42] <wikibugs>	 (Merged) jenkins-bot: Update cxserver to 2024-03-11-035839-production [deployment-charts] - https://gerrit.wikimedia.org/r/1009941 (https://phabricator.wikimedia.org/T350773) (owner: KartikMistry)
[04:37:31] <logmsgbot>	 !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
[04:37:38] <logmsgbot>	 !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
[04:38:46] <logmsgbot>	 !log kartik@deploy2002 helmfile [staging] START helmfile.d/services/cxserver: apply
[04:39:20] <logmsgbot>	 !log kartik@deploy2002 helmfile [staging] DONE helmfile.d/services/cxserver: apply
[04:46:31] <logmsgbot>	 !log kartik@deploy2002 helmfile [codfw] START helmfile.d/services/cxserver: apply
[04:47:03] <logmsgbot>	 !log kartik@deploy2002 helmfile [codfw] DONE helmfile.d/services/cxserver: apply
[04:47:46] <logmsgbot>	 !log kartik@deploy2002 helmfile [eqiad] START helmfile.d/services/cxserver: apply
[04:48:22] <logmsgbot>	 !log kartik@deploy2002 helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
[04:49:55] <jinxer-wm>	 (KubernetesAPINotScrapable) firing: (2) k8s-aux@eqiad is failing to scrape the k8s api - https://phabricator.wikimedia.org/T343529 - TODO - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPINotScrapable
[04:52:28] <kart_>	 !log Updated cxserver to 2024-03-11-035839-production (T350773)
[04:52:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:52:34] <stashbot>	 T350773: Remove preq and use node fetch - https://phabricator.wikimedia.org/T350773
[05:25:09] <icinga-wm>	 PROBLEM - OpenSearch unassigned shard check - 9200 on logstash1024 is CRITICAL: CRITICAL - logstash-default-1-7.0.0-1-2023.12.26[0](2024-03-08T03:44:46.007Z), logstash-syslog-1-7.0.0-1-2023.12.13[0](2024-03-08T03:44:46.003Z), logstash-mediawiki-1-7.0.0-1-2023.12.19[0](2024-03-08T03:44:46.007Z), logstash-default-1-7.0.0-1-2023.12.19[0](2024-03-08T03:44:46.006Z), logstash-webrequest-1-7.0.0-1-2023.12.18[0](2024-03-08T03:44:46.002Z), logstas
[05:25:09] <icinga-wm>	 iki-1-7.0.0-1-2023.12.13[0](2024-03-08T03:44:46.003Z), logstash-webrequest-1-7.0.0-1-2023.12.30[0](2024-03-08T03:44:46.004Z), logstash-k8s-1-7.0.0-1-2023.12.29[0](2024-03-08T03:44:46.003Z), logstash-default-1-7.0.0-1-2023.12.30[0](2024-03-08T03:44:46.005Z), logstash-syslog-1-7.0.0-1-2023.12.17[0](2024-03-08T03:44:46.003Z), logstash-deploy-1-7.0.0-1-2023.12.16[0](2024-03-08T03:44:46.007Z), logstash-mediawiki-1-7.0.0-1-2023.12.17[0](2024-03
[05:25:09] <icinga-wm>	 4:46.005Z), logstash-syslog-1-7.0.0-1-2023.12.12[0](2024-03-08T03:44:46.002Z), logstash-mediawiki-1-7.0.0-1-2023.12.21[0](2024-03-08T03:44:46.003Z), logstash-webrequest-1-7.0.0-1-2023.12.17[0](2024-03-08 https://wikitech.wikimedia.org/wiki/Search%23Administration
[05:25:09] <icinga-wm>	 PROBLEM - OpenSearch unassigned shard check - 9200 on logstash1031 is CRITICAL: CRITICAL - logstash-webrequest-1-7.0.0-1-2023.12.13[0](2024-03-08T03:44:46.007Z), logstash-deploy-1-7.0.0-1-2023.12.26[0](2024-03-08T03:44:46.006Z), logstash-syslog-1-7.0.0-1-2023.12.13[0](2024-03-08T03:44:46.003Z), logstash-k8s-1-7.0.0-1-2023.12.25[0](2024-03-08T03:44:46.006Z), logstash-k8s-1-7.0.0-1-2023.12.21[0](2024-03-08T03:44:46.004Z), logstash-default-1
[05:25:09] <icinga-wm>	 -2023.12.29[0](2024-03-08T03:44:46.003Z), logstash-webrequest-1-7.0.0-1-2023.12.18[0](2024-03-08T03:44:46.002Z), logstash-default-1-7.0.0-1-2023.12.27[0](2024-03-08T03:44:46.006Z), logstash-webrequest-1-7.0.0-1-2023.12.21[0](2024-03-08T03:44:46.003Z), logstash-mediawiki-1-7.0.0-1-2023.12.17[0](2024-03-08T03:44:46.005Z), logstash-mediawiki-1-7.0.0-1-2023.12.25[0](2024-03-08T03:44:46.002Z), logstash-mediawiki-1-7.0.0-1-2023.12.15[0](2024-03
[05:25:10] <icinga-wm>	 4:46.003Z), logstash-default-1-7.0.0-1-2023.12.26[0](2024-03-08T03:44:46.007Z), logstash-k8s-1-7.0.0-1-2023.12.20[0](2024-03-08T03:44:46.002Z), logstash-default-1-7.0.0-1-2023.12.18[0](2024-03-08T03:44:4 https://wikitech.wikimedia.org/wiki/Search%23Administration
[05:31:11] <icinga-wm>	 PROBLEM - OpenSearch unassigned shard check - 9200 on logstash1023 is CRITICAL: CRITICAL - logstash-k8s-1-7.0.0-1-2023.12.18[0](2024-03-08T03:44:46.006Z), logstash-mediawiki-1-7.0.0-1-2023.12.21[0](2024-03-08T03:44:46.003Z), logstash-syslog-1-7.0.0-1-2023.12.27[0](2024-03-08T03:44:46.004Z), logstash-webrequest-1-7.0.0-1-2023.12.14[0](2024-03-08T03:44:46.005Z), logstash-mediawiki-1-7.0.0-1-2023.12.14[0](2024-03-08T03:44:46.005Z), logstash-
[05:31:11] <icinga-wm>	 0.0-1-2023.12.20[0](2024-03-08T03:44:46.002Z), logstash-syslog-1-7.0.0-1-2023.12.18[0](2024-03-08T03:44:46.003Z), logstash-k8s-1-7.0.0-1-2023.12.26[0](2024-03-08T03:44:46.006Z), logstash-k8s-1-7.0.0-1-2023.12.22[0](2024-03-08T03:44:46.002Z), logstash-default-1-7.0.0-1-2023.12.23[0](2024-03-08T03:44:46.003Z), logstash-webrequest-1-7.0.0-1-2023.12.13[0](2024-03-08T03:44:46.007Z), logstash-webrequest-1-7.0.0-1-2023.12.24[0](2024-03-08T03:44:
[05:31:11] <icinga-wm>	 , logstash-deploy-1-7.0.0-1-2023.12.13[0](2024-03-08T03:44:46.004Z), logstash-syslog-1-7.0.0-1-2023.12.26[0](2024-03-08T03:44:46.004Z), logstash-default-1-7.0.0-1-2023.12.25[0](2024-03-08T03:44:46.003Z), https://wikitech.wikimedia.org/wiki/Search%23Administration
[05:31:11] <icinga-wm>	 PROBLEM - OpenSearch unassigned shard check - 9200 on logstash1030 is CRITICAL: CRITICAL - logstash-deploy-1-7.0.0-1-2023.12.24[0](2024-03-08T03:44:46.002Z), logstash-deploy-1-7.0.0-1-2023.12.16[0](2024-03-08T03:44:46.007Z), logstash-k8s-1-7.0.0-1-2023.12.25[0](2024-03-08T03:44:46.006Z), logstash-k8s-1-7.0.0-1-2023.12.26[0](2024-03-08T03:44:46.006Z), logstash-webrequest-1-7.0.0-1-2023.12.20[0](2024-03-08T03:44:46.007Z), logstash-k8s-1-7.0
[05:31:11] <icinga-wm>	 4.01.02[0](2024-03-08T03:44:46.007Z), logstash-mediawiki-1-7.0.0-1-2023.12.21[0](2024-03-08T03:44:46.003Z), logstash-default-1-7.0.0-1-2023.12.16[0](2024-03-08T03:44:46.003Z), logstash-k8s-1-7.0.0-1-2023.12.20[0](2024-03-08T03:44:46.002Z), logstash-default-1-7.0.0-1-2023.12.25[0](2024-03-08T03:44:46.003Z), logstash-syslog-1-7.0.0-1-2023.12.31[0](2024-03-08T03:44:46.004Z), logstash-mediawiki-1-7.0.0-1-2024.01.02[0](2024-03-08T03:44:46.004Z
[05:31:12] <icinga-wm>	 ash-webrequest-1-7.0.0-1-2023.12.14[0](2024-03-08T03:44:46.005Z), logstash-default-1-7.0.0-1-2023.12.29[0](2024-03-08T03:44:46.003Z), logstash-webrequest-1-7.0.0-1-2023.12.29[0](2024-03-08T03:44:46.004Z) https://wikitech.wikimedia.org/wiki/Search%23Administration
[05:32:25] <jinxer-wm>	 (SystemdUnitFailed) resolved: rsync-aptrepo-apt2001.wikimedia.org.service on apt1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[05:34:11] <icinga-wm>	 PROBLEM - OpenSearch unassigned shard check - 9200 on logstash1037 is CRITICAL: CRITICAL - logstash-k8s-1-7.0.0-1-2023.12.23[0](2024-03-08T03:44:46.005Z), logstash-mediawiki-1-7.0.0-1-2023.12.25[0](2024-03-08T03:44:46.002Z), logstash-syslog-1-7.0.0-1-2023.12.16[0](2024-03-08T03:44:46.007Z), logstash-webrequest-1-7.0.0-1-2023.12.27[0](2024-03-08T03:44:46.007Z), logstash-webrequest-1-7.0.0-1-2023.12.15[0](2024-03-08T03:44:46.004Z), logstash
[05:34:11] <icinga-wm>	 ki-1-7.0.0-1-2023.12.21[0](2024-03-08T03:44:46.003Z), logstash-default-1-7.0.0-1-2023.12.26[0](2024-03-08T03:44:46.007Z), logstash-deploy-1-7.0.0-1-2023.12.21[0](2024-03-08T03:44:46.004Z), logstash-webrequest-1-7.0.0-1-2023.12.14[0](2024-03-08T03:44:46.005Z), logstash-syslog-1-7.0.0-1-2023.12.17[0](2024-03-08T03:44:46.003Z), logstash-mediawiki-1-7.0.0-1-2023.12.31[0](2024-03-08T03:44:46.004Z), logstash-mediawiki-1-7.0.0-1-2023.12.12[0](20
[05:34:11] <icinga-wm>	 T03:44:46.005Z), logstash-deploy-1-7.0.0-1-2023.12.25[0](2024-03-08T03:44:46.002Z), logstash-webrequest-1-7.0.0-1-2023.12.20[0](2024-03-08T03:44:46.007Z), logstash-mediawiki-1-7.0.0-1-2023.12.24[0](2024- https://wikitech.wikimedia.org/wiki/Search%23Administration
[05:37:09] <icinga-wm>	 PROBLEM - OpenSearch unassigned shard check - 9200 on logstash1036 is CRITICAL: CRITICAL - logstash-k8s-1-7.0.0-1-2023.12.23[0](2024-03-08T03:44:46.005Z), logstash-webrequest-1-7.0.0-1-2023.12.18[0](2024-03-08T03:44:46.002Z), logstash-deploy-1-7.0.0-1-2023.12.26[0](2024-03-08T03:44:46.006Z), logstash-webrequest-1-7.0.0-1-2023.12.21[0](2024-03-08T03:44:46.003Z), logstash-mediawiki-1-7.0.0-1-2023.12.31[0](2024-03-08T03:44:46.004Z), logstash
[05:37:09] <icinga-wm>	 -1-7.0.0-1-2023.12.24[0](2024-03-08T03:44:46.003Z), logstash-default-1-7.0.0-1-2023.12.19[0](2024-03-08T03:44:46.006Z), logstash-webrequest-1-7.0.0-1-2023.12.19[0](2024-03-08T03:44:46.002Z), logstash-webrequest-1-7.0.0-1-2023.12.14[0](2024-03-08T03:44:46.005Z), logstash-mediawiki-1-7.0.0-1-2023.12.14[0](2024-03-08T03:44:46.005Z), logstash-default-1-7.0.0-1-2023.12.13[0](2024-03-08T03:44:46.003Z), logstash-mediawiki-1-7.0.0-1-2023.12.13[0]
[05:37:09] <icinga-wm>	 -08T03:44:46.003Z), logstash-syslog-1-7.0.0-1-2023.12.27[0](2024-03-08T03:44:46.004Z), logstash-default-1-7.0.0-1-2023.12.30[0](2024-03-08T03:44:46.005Z), logstash-k8s-1-7.0.0-1-2023.12.24[0](2024-03-08T https://wikitech.wikimedia.org/wiki/Search%23Administration
[05:37:11] <icinga-wm>	 PROBLEM - OpenSearch unassigned shard check - 9200 on logstash1029 is CRITICAL: CRITICAL - logstash-k8s-1-7.0.0-1-2024.01.02[0](2024-03-08T03:44:46.007Z), logstash-deploy-1-7.0.0-1-2023.12.13[0](2024-03-08T03:44:46.004Z), logstash-deploy-1-7.0.0-1-2023.12.25[0](2024-03-08T03:44:46.002Z), logstash-webrequest-1-7.0.0-1-2023.12.17[0](2024-03-08T03:44:46.002Z), logstash-syslog-1-7.0.0-1-2023.12.26[0](2024-03-08T03:44:46.004Z), logstash-k8s-1-
[05:37:11] <icinga-wm>	 2023.12.13[0](2024-03-08T03:44:46.006Z), logstash-syslog-1-7.0.0-1-2023.12.14[0](2024-03-08T03:44:46.006Z), logstash-mediawiki-1-7.0.0-1-2024.01.01[0](2024-03-08T03:44:46.007Z), logstash-default-1-7.0.0-1-2023.12.18[0](2024-03-08T03:44:46.006Z), logstash-mediawiki-1-7.0.0-1-2023.12.29[0](2024-03-08T03:44:46.004Z), logstash-syslog-1-7.0.0-1-2023.12.12[0](2024-03-08T03:44:46.002Z), logstash-mediawiki-1-7.0.0-1-2023.12.24[0](2024-03-08T03:44
[05:37:11] <icinga-wm>	 ), logstash-k8s-1-7.0.0-1-2023.12.20[0](2024-03-08T03:44:46.002Z), logstash-webrequest-1-7.0.0-1-2023.12.14[0](2024-03-08T03:44:46.005Z), logstash-webrequest-1-7.0.0-1-2023.12.26[0](2024-03-08T03:44:46.0 https://wikitech.wikimedia.org/wiki/Search%23Administration
[05:37:11] <icinga-wm>	 PROBLEM - OpenSearch unassigned shard check - 9200 on logstash1012 is CRITICAL: CRITICAL - logstash-mediawiki-1-7.0.0-1-2023.12.27[0](2024-03-08T03:44:46.006Z), logstash-syslog-1-7.0.0-1-2023.12.13[0](2024-03-08T03:44:46.003Z), logstash-webrequest-1-7.0.0-1-2023.12.24[0](2024-03-08T03:44:46.007Z), logstash-k8s-1-7.0.0-1-2023.12.19[0](2024-03-08T03:44:46.004Z), logstash-syslog-1-7.0.0-1-2023.12.19[0](2024-03-08T03:44:46.006Z), logstash-web
[05:37:11] <icinga-wm>	 1-7.0.0-1-2023.12.20[0](2024-03-08T03:44:46.007Z), logstash-mediawiki-1-7.0.0-1-2023.12.31[0](2024-03-08T03:44:46.004Z), logstash-default-1-7.0.0-1-2023.12.29[0](2024-03-08T03:44:46.003Z), logstash-default-1-7.0.0-1-2023.12.25[0](2024-03-08T03:44:46.003Z), logstash-webrequest-1-7.0.0-1-2023.12.17[0](2024-03-08T03:44:46.002Z), logstash-syslog-1-7.0.0-1-2023.12.20[0](2024-03-08T03:44:46.005Z), logstash-syslog-1-7.0.0-1-2023.12.28[0](2024-03
[05:37:12] <icinga-wm>	 4:46.006Z), logstash-webrequest-1-7.0.0-1-2023.12.30[0](2024-03-08T03:44:46.004Z), logstash-k8s-1-7.0.0-1-2023.12.23[0](2024-03-08T03:44:46.005Z), logstash-default-1-7.0.0-1-2023.12.18[0](2024-03-08T03:4 https://wikitech.wikimedia.org/wiki/Search%23Administration
[05:37:55] <jinxer-wm>	 (SystemdUnitFailed) firing: rsync-aptrepo-apt2001.wikimedia.org.service on apt1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[05:40:11] <icinga-wm>	 PROBLEM - OpenSearch unassigned shard check - 9200 on logstash1035 is CRITICAL: CRITICAL - logstash-deploy-1-7.0.0-1-2023.12.19[0](2024-03-08T03:44:46.003Z), logstash-syslog-1-7.0.0-1-2023.12.25[0](2024-03-08T03:44:46.002Z), logstash-default-1-7.0.0-1-2023.12.19[0](2024-03-08T03:44:46.006Z), logstash-mediawiki-1-7.0.0-1-2023.12.24[0](2024-03-08T03:44:46.003Z), logstash-syslog-1-7.0.0-1-2023.12.20[0](2024-03-08T03:44:46.005Z), logstash-dep
[05:40:11] <icinga-wm>	 0.0-1-2023.12.24[0](2024-03-08T03:44:46.002Z), logstash-syslog-1-7.0.0-1-2023.12.19[0](2024-03-08T03:44:46.006Z), logstash-mediawiki-1-7.0.0-1-2023.12.27[0](2024-03-08T03:44:46.006Z), logstash-default-1-7.0.0-1-2023.12.16[0](2024-03-08T03:44:46.003Z), logstash-webrequest-1-7.0.0-1-2023.12.14[0](2024-03-08T03:44:46.005Z), logstash-mediawiki-1-7.0.0-1-2023.12.25[0](2024-03-08T03:44:46.002Z), logstash-deploy-1-7.0.0-1-2023.12.15[0](2024-03-0
[05:40:11] <icinga-wm>	 46.004Z), logstash-mediawiki-1-7.0.0-1-2023.12.19[0](2024-03-08T03:44:46.007Z), logstash-syslog-1-7.0.0-1-2023.12.28[0](2024-03-08T03:44:46.006Z), logstash-webrequest-1-7.0.0-1-2023.12.18[0](2024-03-08T0 https://wikitech.wikimedia.org/wiki/Search%23Administration
[05:40:11] <icinga-wm>	 PROBLEM - OpenSearch unassigned shard check - 9200 on logstash1028 is CRITICAL: CRITICAL - logstash-deploy-1-7.0.0-1-2023.12.14[0](2024-03-08T03:44:46.002Z), logstash-webrequest-1-7.0.0-1-2023.12.30[0](2024-03-08T03:44:46.004Z), logstash-syslog-1-7.0.0-1-2023.12.13[0](2024-03-08T03:44:46.003Z), logstash-webrequest-1-7.0.0-1-2023.12.14[0](2024-03-08T03:44:46.005Z), logstash-mediawiki-1-7.0.0-1-2023.12.15[0](2024-03-08T03:44:46.003Z), logst
[05:40:11] <icinga-wm>	 1-7.0.0-1-2024.01.02[0](2024-03-08T03:44:46.007Z), logstash-syslog-1-7.0.0-1-2023.12.19[0](2024-03-08T03:44:46.006Z), logstash-k8s-1-7.0.0-1-2023.12.22[0](2024-03-08T03:44:46.002Z), logstash-default-1-7.0.0-1-2023.12.27[0](2024-03-08T03:44:46.006Z), logstash-mediawiki-1-7.0.0-1-2023.12.14[0](2024-03-08T03:44:46.005Z), logstash-mediawiki-1-7.0.0-1-2023.12.12[0](2024-03-08T03:44:46.005Z), logstash-webrequest-1-7.0.0-1-2023.12.29[0](2024-03-
[05:40:12] <icinga-wm>	 :46.004Z), logstash-webrequest-1-7.0.0-1-2023.12.26[0](2024-03-08T03:44:46.007Z), logstash-default-1-7.0.0-1-2023.12.16[0](2024-03-08T03:44:46.003Z), logstash-syslog-1-7.0.0-1-2023.12.14[0](2024-03-08T03 https://wikitech.wikimedia.org/wiki/Search%23Administration
[05:43:09] <icinga-wm>	 PROBLEM - OpenSearch unassigned shard check - 9200 on logstash1010 is CRITICAL: CRITICAL - logstash-deploy-1-7.0.0-1-2023.12.13[0](2024-03-08T03:44:46.004Z), logstash-default-1-7.0.0-1-2023.12.19[0](2024-03-08T03:44:46.006Z), logstash-deploy-1-7.0.0-1-2023.12.27[0](2024-03-08T03:44:46.004Z), logstash-syslog-1-7.0.0-1-2023.12.27[0](2024-03-08T03:44:46.004Z), logstash-mediawiki-1-7.0.0-1-2024.01.02[0](2024-03-08T03:44:46.004Z), logstash-sys
[05:43:09] <icinga-wm>	 0.0-1-2023.12.28[0](2024-03-08T03:44:46.006Z), logstash-default-1-7.0.0-1-2023.12.25[0](2024-03-08T03:44:46.003Z), logstash-webrequest-1-7.0.0-1-2023.12.19[0](2024-03-08T03:44:46.002Z), logstash-syslog-1-7.0.0-1-2023.12.25[0](2024-03-08T03:44:46.002Z), logstash-webrequest-1-7.0.0-1-2023.12.25[0](2024-03-08T03:44:46.005Z), logstash-syslog-1-7.0.0-1-2023.12.19[0](2024-03-08T03:44:46.006Z), logstash-webrequest-1-7.0.0-1-2023.12.27[0](2024-03
[05:43:09] <icinga-wm>	 4:46.007Z), logstash-webrequest-1-7.0.0-1-2023.12.18[0](2024-03-08T03:44:46.002Z), logstash-webrequest-1-7.0.0-1-2023.12.14[0](2024-03-08T03:44:46.005Z), logstash-default-1-7.0.0-1-2023.12.24[0](2024-03- https://wikitech.wikimedia.org/wiki/Search%23Administration
[05:43:11] <icinga-wm>	 PROBLEM - OpenSearch unassigned shard check - 9200 on logstash1034 is CRITICAL: CRITICAL - logstash-mediawiki-1-7.0.0-1-2024.01.02[0](2024-03-08T03:44:46.004Z), logstash-webrequest-1-7.0.0-1-2023.12.27[0](2024-03-08T03:44:46.007Z), logstash-default-1-7.0.0-1-2023.12.23[0](2024-03-08T03:44:46.003Z), logstash-syslog-1-7.0.0-1-2023.12.13[0](2024-03-08T03:44:46.003Z), logstash-mediawiki-1-7.0.0-1-2023.12.27[0](2024-03-08T03:44:46.006Z), logst
[05:43:11] <icinga-wm>	 og-1-7.0.0-1-2023.12.28[0](2024-03-08T03:44:46.006Z), logstash-syslog-1-7.0.0-1-2023.12.31[0](2024-03-08T03:44:46.004Z), logstash-k8s-1-7.0.0-1-2023.12.29[0](2024-03-08T03:44:46.003Z), logstash-default-1-7.0.0-1-2023.12.18[0](2024-03-08T03:44:46.006Z), logstash-syslog-1-7.0.0-1-2023.12.26[0](2024-03-08T03:44:46.004Z), logstash-syslog-1-7.0.0-1-2023.12.20[0](2024-03-08T03:44:46.005Z), logstash-syslog-1-7.0.0-1-2023.12.18[0](2024-03-08T03:4
[05:43:11] <icinga-wm>	 Z), logstash-webrequest-1-7.0.0-1-2023.12.29[0](2024-03-08T03:44:46.004Z), logstash-syslog-1-7.0.0-1-2023.12.27[0](2024-03-08T03:44:46.004Z), logstash-deploy-1-7.0.0-1-2023.12.22[0](2024-03-08T03:44:46.0 https://wikitech.wikimedia.org/wiki/Search%23Administration
[05:43:11] <icinga-wm>	 PROBLEM - OpenSearch unassigned shard check - 9200 on logstash1027 is CRITICAL: CRITICAL - logstash-webrequest-1-7.0.0-1-2023.12.13[0](2024-03-08T03:44:46.007Z), logstash-mediawiki-1-7.0.0-1-2023.12.26[0](2024-03-08T03:44:46.005Z), logstash-syslog-1-7.0.0-1-2023.12.18[0](2024-03-08T03:44:46.003Z), logstash-webrequest-1-7.0.0-1-2023.12.26[0](2024-03-08T03:44:46.007Z), logstash-k8s-1-7.0.0-1-2023.12.29[0](2024-03-08T03:44:46.003Z), logstash
[05:43:11] <icinga-wm>	 1-7.0.0-1-2023.12.15[0](2024-03-08T03:44:46.004Z), logstash-syslog-1-7.0.0-1-2023.12.28[0](2024-03-08T03:44:46.006Z), logstash-syslog-1-7.0.0-1-2023.12.12[0](2024-03-08T03:44:46.002Z), logstash-mediawiki-1-7.0.0-1-2023.12.24[0](2024-03-08T03:44:46.003Z), logstash-mediawiki-1-7.0.0-1-2023.12.14[0](2024-03-08T03:44:46.005Z), logstash-syslog-1-7.0.0-1-2023.12.31[0](2024-03-08T03:44:46.004Z), logstash-mediawiki-1-7.0.0-1-2023.12.25[0](2024-03
[05:43:12] <icinga-wm>	 4:46.002Z), logstash-syslog-1-7.0.0-1-2023.12.20[0](2024-03-08T03:44:46.005Z), logstash-mediawiki-1-7.0.0-1-2023.12.15[0](2024-03-08T03:44:46.003Z), logstash-webrequest-1-7.0.0-1-2023.12.14[0](2024-03-08 https://wikitech.wikimedia.org/wiki/Search%23Administration
[05:46:11] <icinga-wm>	 PROBLEM - OpenSearch unassigned shard check - 9200 on logstash1026 is CRITICAL: CRITICAL - logstash-default-1-7.0.0-1-2023.12.23[0](2024-03-08T03:44:46.003Z), logstash-syslog-1-7.0.0-1-2023.12.28[0](2024-03-08T03:44:46.006Z), logstash-deploy-1-7.0.0-1-2023.12.24[0](2024-03-08T03:44:46.002Z), logstash-k8s-1-7.0.0-1-2023.12.21[0](2024-03-08T03:44:46.004Z), logstash-deploy-1-7.0.0-1-2023.12.13[0](2024-03-08T03:44:46.004Z), logstash-webreques
[05:46:11] <icinga-wm>	 0-1-2023.12.27[0](2024-03-08T03:44:46.007Z), logstash-mediawiki-1-7.0.0-1-2023.12.31[0](2024-03-08T03:44:46.004Z), logstash-syslog-1-7.0.0-1-2023.12.13[0](2024-03-08T03:44:46.003Z), logstash-mediawiki-1-7.0.0-1-2023.12.12[0](2024-03-08T03:44:46.005Z), logstash-syslog-1-7.0.0-1-2023.12.17[0](2024-03-08T03:44:46.003Z), logstash-deploy-1-7.0.0-1-2023.12.25[0](2024-03-08T03:44:46.002Z), logstash-webrequest-1-7.0.0-1-2023.12.25[0](2024-03-08T0
[05:46:11] <icinga-wm>	 005Z), logstash-k8s-1-7.0.0-1-2023.12.23[0](2024-03-08T03:44:46.005Z), logstash-webrequest-1-7.0.0-1-2024.01.02[0](2024-03-08T03:44:46.005Z), logstash-mediawiki-1-7.0.0-1-2023.12.15[0](2024-03-08T03:44:4 https://wikitech.wikimedia.org/wiki/Search%23Administration
[05:46:11] <icinga-wm>	 PROBLEM - OpenSearch unassigned shard check - 9200 on logstash1033 is CRITICAL: CRITICAL - logstash-mediawiki-1-7.0.0-1-2023.12.30[0](2024-03-08T03:44:46.006Z), logstash-deploy-1-7.0.0-1-2023.12.20[0](2024-03-08T03:44:46.002Z), logstash-webrequest-1-7.0.0-1-2023.12.19[0](2024-03-08T03:44:46.002Z), logstash-webrequest-1-7.0.0-1-2023.12.30[0](2024-03-08T03:44:46.004Z), logstash-default-1-7.0.0-1-2023.12.26[0](2024-03-08T03:44:46.007Z), logs
[05:46:11] <icinga-wm>	 -1-7.0.0-1-2023.12.13[0](2024-03-08T03:44:46.006Z), logstash-k8s-1-7.0.0-1-2023.12.21[0](2024-03-08T03:44:46.004Z), logstash-k8s-1-7.0.0-1-2023.12.26[0](2024-03-08T03:44:46.006Z), logstash-deploy-1-7.0.0-1-2023.12.21[0](2024-03-08T03:44:46.004Z), logstash-mediawiki-1-7.0.0-1-2023.12.13[0](2024-03-08T03:44:46.003Z), logstash-mediawiki-1-7.0.0-1-2023.12.21[0](2024-03-08T03:44:46.003Z), logstash-syslog-1-7.0.0-1-2023.12.18[0](2024-03-08T03:4
[05:46:12] <icinga-wm>	 Z), logstash-webrequest-1-7.0.0-1-2023.12.14[0](2024-03-08T03:44:46.005Z), logstash-syslog-1-7.0.0-1-2023.12.31[0](2024-03-08T03:44:46.004Z), logstash-mediawiki-1-7.0.0-1-2023.12.22[0](2024-03-08T03:44:4 https://wikitech.wikimedia.org/wiki/Search%23Administration
[05:49:11] <icinga-wm>	 PROBLEM - OpenSearch unassigned shard check - 9200 on logstash1025 is CRITICAL: CRITICAL - logstash-k8s-1-7.0.0-1-2023.12.18[0](2024-03-08T03:44:46.006Z), logstash-syslog-1-7.0.0-1-2023.12.16[0](2024-03-08T03:44:46.007Z), logstash-webrequest-1-7.0.0-1-2023.12.30[0](2024-03-08T03:44:46.004Z), logstash-webrequest-1-7.0.0-1-2023.12.20[0](2024-03-08T03:44:46.007Z), logstash-deploy-1-7.0.0-1-2023.12.20[0](2024-03-08T03:44:46.002Z), logstash-k8
[05:49:11] <icinga-wm>	 0-1-2023.12.26[0](2024-03-08T03:44:46.006Z), logstash-default-1-7.0.0-1-2023.12.18[0](2024-03-08T03:44:46.006Z), logstash-deploy-1-7.0.0-1-2023.12.22[0](2024-03-08T03:44:46.003Z), logstash-webrequest-1-7.0.0-1-2023.12.27[0](2024-03-08T03:44:46.007Z), logstash-k8s-1-7.0.0-1-2024.01.02[0](2024-03-08T03:44:46.007Z), logstash-default-1-7.0.0-1-2023.12.13[0](2024-03-08T03:44:46.003Z), logstash-deploy-1-7.0.0-1-2023.12.27[0](2024-03-08T03:44:46
[05:49:11] <icinga-wm>	 logstash-mediawiki-1-7.0.0-1-2023.12.19[0](2024-03-08T03:44:46.007Z), logstash-syslog-1-7.0.0-1-2023.12.12[0](2024-03-08T03:44:46.002Z), logstash-webrequest-1-7.0.0-1-2024.01.02[0](2024-03-08T03:44:46.00 https://wikitech.wikimedia.org/wiki/Search%23Administration
[05:49:11] <icinga-wm>	 PROBLEM - OpenSearch unassigned shard check - 9200 on logstash1032 is CRITICAL: CRITICAL - logstash-k8s-1-7.0.0-1-2023.12.22[0](2024-03-08T03:44:46.002Z), logstash-syslog-1-7.0.0-1-2023.12.13[0](2024-03-08T03:44:46.003Z), logstash-mediawiki-1-7.0.0-1-2023.12.21[0](2024-03-08T03:44:46.003Z), logstash-webrequest-1-7.0.0-1-2023.12.13[0](2024-03-08T03:44:46.007Z), logstash-syslog-1-7.0.0-1-2023.12.27[0](2024-03-08T03:44:46.004Z), logstash-web
[05:49:11] <icinga-wm>	 1-7.0.0-1-2023.12.17[0](2024-03-08T03:44:46.002Z), logstash-syslog-1-7.0.0-1-2023.12.19[0](2024-03-08T03:44:46.006Z), logstash-default-1-7.0.0-1-2023.12.29[0](2024-03-08T03:44:46.003Z), logstash-k8s-1-7.0.0-1-2023.12.19[0](2024-03-08T03:44:46.004Z), logstash-default-1-7.0.0-1-2023.12.19[0](2024-03-08T03:44:46.006Z), logstash-default-1-7.0.0-1-2023.12.27[0](2024-03-08T03:44:46.006Z), logstash-deploy-1-7.0.0-1-2023.12.21[0](2024-03-08T03:44
[05:49:12] <icinga-wm>	 ), logstash-default-1-7.0.0-1-2023.12.16[0](2024-03-08T03:44:46.003Z), logstash-mediawiki-1-7.0.0-1-2023.12.12[0](2024-03-08T03:44:46.005Z), logstash-mediawiki-1-7.0.0-1-2023.12.14[0](2024-03-08T03:44:46 https://wikitech.wikimedia.org/wiki/Search%23Administration
[06:09:05] <wikibugs>	 SRE, SRE-Access-Requests: Requesting access to wmf-nda, analytics-private-data, analytics-product for kcvelaga - https://phabricator.wikimedia.org/T358658#9618875 (KCVelaga_WMF) @cmooney all permissions and access for `kcvelaga` are working fine without any trouble, permissions/access for LDAP user `KCVe...
[06:22:11] <jinxer-wm>	 (SystemdUnitFailed) firing: generate_os_reports.service on puppetdb2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[06:35:03] <logmsgbot>	 !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
[06:35:10] <logmsgbot>	 !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
[07:00:05] <jouncebot>	 Deploy window No deploys all day! See Deployments/Emergencies if things are broken. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240310T0800)
[07:00:05] <jouncebot>	 Amir1 and Urbanecm: May I have your attention please! UTC morning backport window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240311T0700)
[07:00:05] <jouncebot>	 mo_abualruz: A patch you scheduled for UTC morning backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[07:04:21] <jinxer-wm>	 (PoolcounterFullQueues) firing: Full queues for poolcounter1004:9106 poolcounter - https://www.mediawiki.org/wiki/PoolCounter#Request_tracing_in_production - https://grafana.wikimedia.org/d/aIcYxuxZk/poolcounter?orgId=1&viewPanel=6&from=now-1h&to=now&var-dc=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DPoolcounterFullQueues
[07:06:23] <wikibugs>	 (CR) Mabualruz: [C: +1] Exclude non-functional pages from night mode [mediawiki-config] - https://gerrit.wikimedia.org/r/1009790 (https://phabricator.wikimedia.org/T359183) (owner: Jdlrobson)
[07:09:21] <jinxer-wm>	 (PoolcounterFullQueues) resolved: Full queues for poolcounter1004:9106 poolcounter - https://www.mediawiki.org/wiki/PoolCounter#Request_tracing_in_production - https://grafana.wikimedia.org/d/aIcYxuxZk/poolcounter?orgId=1&viewPanel=6&from=now-1h&to=now&var-dc=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DPoolcounterFullQueues
[07:12:28] <jinxer-wm>	 (JobUnavailable) firing: (2) Reduced availability for job ldap in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[07:27:26] <wikibugs>	 (CR) Volans: [C: +1] "LGTM" [cookbooks] - https://gerrit.wikimedia.org/r/1009854 (https://phabricator.wikimedia.org/T357547) (owner: Kamila Součková)
[07:29:27] <logmsgbot>	 !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
[07:29:33] <logmsgbot>	 !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
[07:39:46] <kostajh>	 mo_abualruz: are you around?
[07:39:53] <mo_abualruz>	 I am
[07:39:58] <kostajh>	 I can deploy your patch
[07:40:11] <mo_abualruz>	 Thanks that would be lovely
[07:41:47] <wikibugs>	 (PS3) Kosta Harlan: throttle: Allow for overriding temp account creation limits [mediawiki-config] - https://gerrit.wikimedia.org/r/1008112 (https://phabricator.wikimedia.org/T357777)
[07:42:20] <wikibugs>	 (CR) TrainBranchBot: [C: +2] "Approved by kharlan@deploy2002 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/1009790 (https://phabricator.wikimedia.org/T359183) (owner: Jdlrobson)
[07:43:35] <wikibugs>	 (Merged) jenkins-bot: Exclude non-functional pages from night mode [mediawiki-config] - https://gerrit.wikimedia.org/r/1009790 (https://phabricator.wikimedia.org/T359183) (owner: Jdlrobson)
[07:44:28] <logmsgbot>	 !log kharlan@deploy2002 Started scap: Backport for [[gerrit:1009790|Exclude non-functional pages from night mode (T359183)]]
[07:44:32] <stashbot>	 T359183: Exclude non-functional pages from night mode - https://phabricator.wikimedia.org/T359183
[07:44:36] <wikibugs>	 (03PS1) 10Cwhite: logstash: provision and commision logging-hd100[123] nodes [puppet] - 10https://gerrit.wikimedia.org/r/1009947 (https://phabricator.wikimedia.org/T352517)
[07:53:16] <wikibugs>	 (03CR) 10Cwhite: "PCC OK: https://puppet-compiler.wmflabs.org/output/1009947/1628/" [puppet] - 10https://gerrit.wikimedia.org/r/1009947 (https://phabricator.wikimedia.org/T352517) (owner: 10Cwhite)
[07:56:30] <logmsgbot>	 !log kharlan@deploy2002 kharlan and jdlrobson: Backport for [[gerrit:1009790|Exclude non-functional pages from night mode (T359183)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[07:56:32] <kostajh>	 effie / marostegui: should I be concerned about seeing T359787 during a scap backport just now?
[07:56:35] <stashbot>	 T359183: Exclude non-functional pages from night mode - https://phabricator.wikimedia.org/T359183
[07:56:35] <stashbot>	 T359787: ImportError: cannot import name 'where' from 'certifi' (unknown location) - https://phabricator.wikimedia.org/T359787
[07:56:52] <kostajh>	 mo_abualruz: please test your patch on mwdebug
[07:57:18] <mo_abualruz>	 Thanks give me a minute
[07:57:44] <marostegui>	 kostajh: I have no context on what that really is about sorry
[07:58:05] <kostajh>	 ok
[07:58:13] <kostajh>	 it seems like `scap` is able to proceed...
[07:59:25] <kostajh>	 it means that https://gitlab.wikimedia.org/repos/releng/scap/-/blob/master/scap/main.py#L347 didn't run, it seems
[08:00:07] <mo_abualruz>	 Seems it is working
[08:01:21] <wikibugs>	 06SRE, 10SRE-Access-Requests: Requesting access to wmf-nda, analytics-private-data, analytics-product for kcvelaga - https://phabricator.wikimedia.org/T358658#9618970 (10cmooney) Thanks for confirming @KCVelaga_WMF.  I’ll get that done over the next day or so; we have our annual SRE meet up this week but I sho...
[08:01:36] <kostajh>	 Amir1 / urbanecm, are either of you around?
[08:01:50] <urbanecm>	 around
[08:01:58] <urbanecm>	 (but waiting for Madalina to join a meeting)
[08:02:54] <kostajh>	 urbanecm: do you think the error for T359787 should halt deployment?
[08:02:55] <stashbot>	 T359787: ImportError: cannot import name 'where' from 'certifi' (unknown location) - https://phabricator.wikimedia.org/T359787
[08:03:33] <kostajh>	 it seems like not pulling master is a blocker, but I don't know the underlying mechanics well enough to say for sure.
[08:03:40] <urbanecm>	 kostajh: it looks super weird. but i also can't reproduce it anywhere. 
[08:03:51] <kostajh>	 urbanecm: seems ok to proceed with sync, then?
[08:04:18] <urbanecm>	 personally, i'd stop until someone can take a look and verify what is happening
[08:04:26] <kostajh>	 alright
[08:04:35] <kostajh>	 seems safer
[08:04:46] <logmsgbot>	 !log kharlan@deploy2002 Sync cancelled.
[08:04:59] * urbanecm goes fully into the meeting now
[08:05:00] <kostajh>	 mo_abualruz: sorry, but we'll have to pick this up later, after T359787 is resolved.
[08:05:38] <mo_abualruz>	 No worries I will document this in the ticket
[08:06:18] <jinxer-wm>	 (NELHigh) firing: (2) Elevated Network Error Logging events (tcp.timed_out) #page - https://wikitech.wikimedia.org/wiki/Network_monitoring#NEL_alerts - https://logstash.wikimedia.org/goto/5c8f4ca1413eda33128e5c5a35da7e28 - https://alerts.wikimedia.org/?q=alertname%3DNELHigh
[08:16:17] <jinxer-wm>	 (NELHigh) resolved: (2) Elevated Network Error Logging events (tcp.timed_out) #page - https://wikitech.wikimedia.org/wiki/Network_monitoring#NEL_alerts - https://logstash.wikimedia.org/goto/5c8f4ca1413eda33128e5c5a35da7e28 - https://alerts.wikimedia.org/?q=alertname%3DNELHigh
[08:17:49] <effie>	 kostajh: we are at our offsite, but we will look at this shortly
[08:25:41] <godog>	 !log bounce prometheus@aux-k8s - T343529
[08:29:41] <jinxer-wm>	 (KubernetesAPINotScrapable) resolved: (2) k8s-aux@eqiad is failing to scrape the k8s api - https://phabricator.wikimedia.org/T343529 - TODO - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPINotScrapable
[08:29:41] <godog>	 hah stashbot came back
[08:29:46] <godog>	 !log bounce prometheus@aux-k8s - T343529
[08:29:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:29:51] <stashbot>	 T343529: Prometheus doesn't reload or alert on expired client certificates - https://phabricator.wikimedia.org/T343529
[08:44:26] <kostajh>	 effie: it should be ok to leave for RelEng
[08:55:03] <logmsgbot>	 !log jnuche@deploy2002 Installing scap version "4.70.1" for 376 hosts
[08:55:44] <logmsgbot>	 !log jnuche@deploy2002 Installation of scap version "4.70.1" completed for 376 hosts
[08:56:18] <logmsgbot>	 !log jnuche@deploy2002 Installing scap version "4.70.1" for 376 hosts
[08:57:03] <logmsgbot>	 !log jnuche@deploy2002 Installation of scap version "4.70.1" completed for 376 hosts
[09:01:19] <jnuche>	 kostajh, effie: scap should be working normally again -> https://phabricator.wikimedia.org/T359787#9619037
[09:02:29] <wikibugs>	 (03PS1) 10Gerrit maintenance bot: mariadb: Promote db1220 to x1 master [puppet] - 10https://gerrit.wikimedia.org/r/1009949 (https://phabricator.wikimedia.org/T359790)
[09:03:12] <jnuche>	 jouncebot: nowandnext
[09:03:12] <jouncebot>	 No deployments scheduled for the next 0 hour(s) and 56 minute(s)
[09:03:12] <jouncebot>	 In 0 hour(s) and 56 minute(s): MediaWiki infrastructure (UTC mid-day) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240311T1000)
[09:03:39] <jnuche>	 you should be fine to go ahead now with the backports if you still have the time
[09:25:09] <hashar>	 kostajh: my guess is https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/1009790 got merged but hasn't been deployed due to the scap issue
[09:26:00] <kostajh>	 hashar: ah, right. 
[09:26:11] <kostajh>	 mo_abualruz: are you still around? we can continue with the backport
[09:26:34] <mo_abualruz>	 I am
[09:27:19] <hashar>	 we can do it now, that looks straightforward to test
[09:27:24] <kostajh>	 ok
[09:27:30] <kostajh>	 hashar: do you want to do it, or should I?
[09:27:52] <kostajh>	 I also had a patch for the window (https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/1008112) which is a no-op, and can be merged without verification
[09:28:00] <hashar>	 please do :)
[09:28:03] <mo_abualruz>	 I can do it cool let me
[09:28:44] <mo_abualruz>	 oh it was not addressed to me nvm 
[09:35:20] <kostajh>	 mo_abualruz: ok, hang on
[09:35:45] <logmsgbot>	 !log kharlan@deploy2002 Started scap: Backport for [[gerrit:1009790|Exclude non-functional pages from night mode (T359183)]]
[09:35:50] <stashbot>	 T359183: Exclude non-functional pages from night mode - https://phabricator.wikimedia.org/T359183
[09:38:00] <logmsgbot>	 !log kharlan@deploy2002 jdlrobson and kharlan: Backport for [[gerrit:1009790|Exclude non-functional pages from night mode (T359183)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[09:38:10] <jinxer-wm>	 (SystemdUnitFailed) firing: rsync-aptrepo-apt2001.wikimedia.org.service on apt1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[09:38:17] <kostajh>	 mo_abualruz: do you mind verifying again?
[09:39:07] <mo_abualruz>	 Sure
[09:39:39] <mo_abualruz>	 It is working 
[09:40:50] <hashar>	 \o/
[09:40:55] <hashar>	 jnuche: thanks for the scap fix!
[09:41:17] <jnuche>	 🥳
[09:41:42] <logmsgbot>	 !log kharlan@deploy2002 jdlrobson and kharlan: Continuing with sync
[09:48:43] <icinga-wm>	 PROBLEM - Check whether ferm is active by checking the default input chain on mw2351 is CRITICAL: ERROR ferm input drop default policy not set, ferm might not have been started correctly https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm
[09:52:28] <logmsgbot>	 !log kharlan@deploy2002 Finished scap: Backport for [[gerrit:1009790|Exclude non-functional pages from night mode (T359183)]] (duration: 16m 42s)
[09:52:32] <stashbot>	 T359183: Exclude non-functional pages from night mode - https://phabricator.wikimedia.org/T359183
[09:56:23] <wikibugs>	 (03CR) 10Majavah: [C: 03+2] P:puppetserver: git: mark /srv/git as safe [puppet] - 10https://gerrit.wikimedia.org/r/1009805 (owner: 10Majavah)
[09:59:12] <kostajh>	 mo_abualruz: all done
[09:59:25] <mo_abualruz>	 thanks a lot
[09:59:42] <kostajh>	 !log UTC morning deploys done
[09:59:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:59:59] <kostajh>	 (I decided to leave my patch for next week)
[10:00:05] <jouncebot>	 Deploy window MediaWiki infrastructure (UTC mid-day) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240311T1000)
[10:12:53] <wikibugs>	 (03PS1) 10Dzahn: add 'kus' (Kusaal) language to project languages [dns] - 10https://gerrit.wikimedia.org/r/1010161 (https://phabricator.wikimedia.org/T359757)
[10:14:43] <wikibugs>	 (03CR) 10Dzahn: [C: 03+2] add 'kus' (Kusaal) language to project languages [dns] - 10https://gerrit.wikimedia.org/r/1010161 (https://phabricator.wikimedia.org/T359757) (owner: 10Dzahn)
[10:14:47] <wikibugs>	 (03PS2) 10Dzahn: add 'kus' (Kusaal) language to project languages [dns] - 10https://gerrit.wikimedia.org/r/1010161 (https://phabricator.wikimedia.org/T359757)
[10:18:03] <wikibugs>	 (03CR) 10Majavah: [V: 03+1] "Ok. I guess the cache eviction `curl` call is failing? If so, that a separate issue than this one that we should fix separately. The rest " [puppet] - 10https://gerrit.wikimedia.org/r/1007396 (owner: 10Majavah)
[10:18:43] <icinga-wm>	 RECOVERY - Check whether ferm is active by checking the default input chain on mw2351 is OK: OK ferm input default policy is set https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm
[10:22:11] <jinxer-wm>	 (SystemdUnitFailed) firing: generate_os_reports.service on puppetdb2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[10:24:13] <wikibugs>	 (03CR) 10Dzahn: "recheck" [dns] - 10https://gerrit.wikimedia.org/r/1010161 (https://phabricator.wikimedia.org/T359757) (owner: 10Dzahn)
[10:26:06] <mutante>	 !log DNS - added new project language 'kus' - Kusaal is a Gur language spoken primarily in northern eastern Ghana, and Burkina Faso. It is spoken by about 121,000 people. T359757
[10:26:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:26:18] <stashbot>	 T359757: Create Wikipedia Kusaal - https://phabricator.wikimedia.org/T359757
[10:28:03] <wikibugs>	 06SRE, 10ops-eqiad: Degraded RAID on dumpsdata1007 - https://phabricator.wikimedia.org/T359702#9619268 (10Jclark-ctr) a:03Jclark-ctr
[10:32:06] <logmsgbot>	 !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
[10:32:13] <logmsgbot>	 !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
[10:33:19] <wikibugs>	 (03PS1) 10Mvolz: editcheckreferenceurl: don't error when aborting the lookupPromise [extensions/Citoid] (wmf/1.42.0-wmf.21) - 10https://gerrit.wikimedia.org/r/1009740 (https://phabricator.wikimedia.org/T359601)
[10:49:01] <wikibugs>	 06SRE, 10ops-eqiad: Degraded RAID on dumpsdata1007 - https://phabricator.wikimedia.org/T359702#9619344 (10Jclark-ctr) ticket submitted   You have successfully submitted request SR186677718.  @Marostegui  lets catch up about eta for replacement
[10:54:33] <wikibugs>	 06SRE, 10ops-eqiad, 06Data-Engineering: Degraded RAID on dumpsdata1007 - https://phabricator.wikimedia.org/T359702#9619351 (10Marostegui)
[11:02:23] <wikibugs>	 (03CR) 10Dzahn: [C: 03+2] miscweb(wikiworkshop): bump version [deployment-charts] - 10https://gerrit.wikimedia.org/r/1009859 (https://phabricator.wikimedia.org/T349774) (owner: 10DDesouza)
[11:03:29] <wikibugs>	 (03Merged) 10jenkins-bot: miscweb(wikiworkshop): bump version [deployment-charts] - 10https://gerrit.wikimedia.org/r/1009859 (https://phabricator.wikimedia.org/T349774) (owner: 10DDesouza)
[11:12:28] <jinxer-wm>	 (JobUnavailable) firing: (2) Reduced availability for job ldap in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[11:21:15] <wikibugs>	 (03CR) 10Andrew Bogott: "It looks like the cache eviction happens last, so the important bits are likely getting done even though the run returns failure. So maybe" [puppet] - 10https://gerrit.wikimedia.org/r/1007396 (owner: 10Majavah)
[11:30:02] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 03+1] "Confirmed, this works on the second pass." [puppet] - 10https://gerrit.wikimedia.org/r/1007396 (owner: 10Majavah)
[11:37:41] <wikibugs>	 (03PS2) 10Andrew Bogott: git-sync-upstream: on puppet7, deploy code after update [puppet] - 10https://gerrit.wikimedia.org/r/1009798 (https://phabricator.wikimedia.org/T351450)
[11:37:42] <wikibugs>	 (03PS2) 10Andrew Bogott: git-sync-upstream.py: run through black [puppet] - 10https://gerrit.wikimedia.org/r/1009799
[11:37:44] <wikibugs>	 (03PS12) 10Andrew Bogott: wmf_sink: Use puppet7 syntax [puppet] - 10https://gerrit.wikimedia.org/r/1007445 (https://phabricator.wikimedia.org/T351455)
[11:37:50] <wikibugs>	 (03PS13) 10Andrew Bogott: wmcs-puppetcertleaks: Use puppet7 syntax [puppet] - 10https://gerrit.wikimedia.org/r/1007444 (https://phabricator.wikimedia.org/T351455)
[11:37:55] <jinxer-wm>	 (SystemdUnitFailed) resolved: rsync-aptrepo-apt2001.wikimedia.org.service on apt1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[11:37:58] <wikibugs>	 (03PS1) 10Andrew Bogott: P:puppetserver: git: mark repos dirs as safe [puppet] - 10https://gerrit.wikimedia.org/r/1010166
[11:40:17] <wikibugs>	 (03CR) 10Majavah: [V: 03+1 C: 03+2] "If the" [puppet] - 10https://gerrit.wikimedia.org/r/1007396 (owner: 10Majavah)
[11:41:25] <jinxer-wm>	 (SystemdUnitFailed) firing: rsync-aptrepo-apt2001.wikimedia.org.service on apt1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[11:42:56] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] P:puppetserver: git: mark repos dirs as safe [puppet] - 10https://gerrit.wikimedia.org/r/1010166 (owner: 10Andrew Bogott)
[11:49:57] <wikibugs>	 (03PS1) 10Majavah: hieradata: WMCS: try to evict Puppet cache after more operations [puppet] - 10https://gerrit.wikimedia.org/r/1010168 (https://phabricator.wikimedia.org/T351450)
[11:50:57] <wikibugs>	 (03CR) 10Majavah: "This is probably fine, but can you try https://gerrit.wikimedia.org/r/c/operations/puppet/+/1010168 instead first? We manually commit/reba" [puppet] - 10https://gerrit.wikimedia.org/r/1009798 (https://phabricator.wikimedia.org/T351450) (owner: 10Andrew Bogott)
[11:51:28] <wikibugs>	 (03CR) 10Majavah: [C: 04-1] P:puppetserver: git: mark repos dirs as safe (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1010166 (owner: 10Andrew Bogott)
[12:05:21] <wikibugs>	 (03PS2) 10Andrew Bogott: P:puppetserver: git: mark repos dirs as safe [puppet] - 10https://gerrit.wikimedia.org/r/1010166
[12:05:22] <wikibugs>	 (03PS13) 10Andrew Bogott: wmf_sink: Use puppet7 syntax [puppet] - 10https://gerrit.wikimedia.org/r/1007445 (https://phabricator.wikimedia.org/T351455)
[12:05:24] <wikibugs>	 (03PS14) 10Andrew Bogott: wmcs-puppetcertleaks: Use puppet7 syntax [puppet] - 10https://gerrit.wikimedia.org/r/1007444 (https://phabricator.wikimedia.org/T351455)
[12:06:31] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 03+1] hieradata: WMCS: try to evict Puppet cache after more operations [puppet] - 10https://gerrit.wikimedia.org/r/1010168 (https://phabricator.wikimedia.org/T351450) (owner: 10Majavah)
[12:06:31] <wikibugs>	 (03CR) 10Majavah: "I don't think we can merge this before we've upgraded the cloudwide puppetservers?" [puppet] - 10https://gerrit.wikimedia.org/r/1007445 (https://phabricator.wikimedia.org/T351455) (owner: 10Andrew Bogott)
[12:06:53] <wikibugs>	 (03CR) 10Majavah: [V: 03+1 C: 03+2] hieradata: WMCS: try to evict Puppet cache after more operations [puppet] - 10https://gerrit.wikimedia.org/r/1010168 (https://phabricator.wikimedia.org/T351450) (owner: 10Majavah)
[12:07:01] <wikibugs>	 (03CR) 10Andrew Bogott: P:puppetserver: git: mark repos dirs as safe (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1010166 (owner: 10Andrew Bogott)
[12:07:08] <wikibugs>	 (03PS1) 10KartikMistry: Update cxserver to 2024-03-11-120258-production [deployment-charts] - 10https://gerrit.wikimedia.org/r/1010169 (https://phabricator.wikimedia.org/T350773)
[12:07:46] <wikibugs>	 (03PS1) 10Majavah: hieradata: update striker to 2024-03-11-120408-production [puppet] - 10https://gerrit.wikimedia.org/r/1010171
[12:07:57] <wikibugs>	 (03CR) 10Majavah: [C: 03+1] P:puppetserver: git: mark repos dirs as safe [puppet] - 10https://gerrit.wikimedia.org/r/1010166 (owner: 10Andrew Bogott)
[12:09:16] <wikibugs>	 (03CR) 10Majavah: [C: 03+2] hieradata: update striker to 2024-03-11-120408-production [puppet] - 10https://gerrit.wikimedia.org/r/1010171 (owner: 10Majavah)
[12:14:11] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 03+1] "This is fine with me; I'd rather have it removed than installed and broken." [puppet] - 10https://gerrit.wikimedia.org/r/1009350 (owner: 10Majavah)
[12:15:34] <wikibugs>	 (03CR) 10Andrew Bogott: "hm, definitely didn't mean to +1 myself" [puppet] - 10https://gerrit.wikimedia.org/r/1010166 (owner: 10Andrew Bogott)
[12:15:48] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 03+2] P:puppetserver: git: mark repos dirs as safe [puppet] - 10https://gerrit.wikimedia.org/r/1010166 (owner: 10Andrew Bogott)
[12:17:58] <wikibugs>	 (03CR) 10Majavah: [C: 03+2] Undeploy Striker from codfw1dev [puppet] - 10https://gerrit.wikimedia.org/r/1009350 (owner: 10Majavah)
[12:23:04] <wikibugs>	 (03PS1) 10PipelineBot: mobileapps: pipeline bot promote [deployment-charts] - 10https://gerrit.wikimedia.org/r/1009955
[12:37:36] <wikibugs>	 06SRE, 10SRE-Access-Requests: Requesting access to analytics-privatedata-users for GeorgeMikesell - https://phabricator.wikimedia.org/T358922#9619690 (10SBisson) I approve but I am not @GMikesell-WMF's manager. That would probably be @Jrbranaa
[12:39:11] <icinga-wm>	 PROBLEM - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is CRITICAL: CRITICAL - failed 92 probes of 734 (alerts on 90) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[12:44:11] <icinga-wm>	 RECOVERY - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is OK: OK - failed 68 probes of 734 (alerts on 90) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[12:50:40] <logmsgbot>	 !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
[12:50:46] <logmsgbot>	 !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
[12:52:40] <Dreamy_Jazz>	 !log Re-starting MediaModeration scanning script
[12:52:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:56:53] <logmsgbot>	 !log dani@deploy2002 helmfile [staging] START helmfile.d/services/miscweb: apply
[12:57:22] <logmsgbot>	 !log dani@deploy2002 helmfile [staging] DONE helmfile.d/services/miscweb: apply
[12:57:23] <logmsgbot>	 !log dani@deploy2002 helmfile [eqiad] START helmfile.d/services/miscweb: apply
[12:59:09] <logmsgbot>	 !log dani@deploy2002 helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
[12:59:10] <logmsgbot>	 !log dani@deploy2002 helmfile [codfw] START helmfile.d/services/miscweb: apply
[12:59:40] <logmsgbot>	 !log dani@deploy2002 helmfile [codfw] DONE helmfile.d/services/miscweb: apply
[13:00:05] <jouncebot>	 RoanKattouw, Lucas_WMDE, Urbanecm, awight, and TheresNoTime: I, the Bot under the Fountain, call upon thee, The Deployer, to do UTC afternoon backport window deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240311T1300).
[13:00:05] <jouncebot>	 Superpes, Jhs, and mvolz: A patch you scheduled for UTC afternoon backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[13:00:29] <mvolz>	 o/
[13:00:56] <Superpes>	 Hi :)
[13:01:13] <wikibugs>	 (03PS2) 10Cyndywikime: Add account_conversion event streams. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/989216
[13:04:14] <wikibugs>	 (03PS3) 10Cyndywikime: Add account_conversion event streams. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/989216
[13:04:54] <wikibugs>	 (03CR) 10Cyndywikime: Add account_conversion event streams. (032 comments) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/989216 (owner: 10Cyndywikime)
[13:06:13] <icinga-wm>	 PROBLEM - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is CRITICAL: CRITICAL - failed 102 probes of 733 (alerts on 90) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[13:08:08] <mvolz>	 Who is available to help deploy today? I think I can mostly do it on my own but I'd like someone to double check I've set everything up right before I go for it :)
[13:11:13] <icinga-wm>	 RECOVERY - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is OK: OK - failed 79 probes of 733 (alerts on 90) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[13:14:57] <Superpes>	 In the meantime could you please deploy my patch because I’ve to go out in some minutes…
[13:16:42] <mvolz>	 RoanKattouw, Lucas_WMDE, urbanecm, TheresNoTime - any of you around to help Superpes?
[13:18:03] <mvolz>	 Superpes: I wouldn't feel confident doing your patch, I've never done a config patch before, sorry! 
[13:18:11] <icinga-wm>	 PROBLEM - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is CRITICAL: CRITICAL - failed 91 probes of 733 (alerts on 90) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[13:18:24] <TheresNoTime>	 mvolz: give me 5 minutes and I'll be around
[13:19:27] <Superpes>	 Thanks TheresNoTime! In case I’m not around could you please check my patch? 
[13:19:57] <Superpes>	 Just need to go on special:block on itwiki and see if there’s the “block the user talk page” option 
[13:20:06] <TheresNoTime>	 Superpes: ack, okay
[13:20:35] <mvolz>	 tnx!
[13:20:48] <TheresNoTime>	 Superpes: mvolz: I'm going to start with 1009731
[13:21:18] <wikibugs>	 (03CR) 10TrainBranchBot: [C: 03+2] "Approved by samtar@deploy2002 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1009731 (owner: 10Superpes15)
[13:21:56] <jinxer-wm>	 (SystemdUnitFailed) firing: (2) netbox_report_accounting_run.service on netbox1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[13:22:12] <wikibugs>	 (03Merged) 10jenkins-bot: [itwiki] Set 'wgBlockAllowsUTEdit' to true [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1009731 (owner: 10Superpes15)
[13:22:16] <logmsgbot>	 !log arnaudb@cumin1002 START - Cookbook sre.hosts.downtime for 1:00:00 on 16 hosts with reason: Primary switchover x1 T359790
[13:22:20] <stashbot>	 T359790: Switchover x1 master (db1179 -> db1220) - https://phabricator.wikimedia.org/T359790
[13:22:30] <logmsgbot>	 !log samtar@deploy2002 Started scap: Backport for [[gerrit:1009731|[itwiki] Set 'wgBlockAllowsUTEdit' to true]]
[13:22:36] <logmsgbot>	 !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
[13:22:42] <logmsgbot>	 !log arnaudb@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 16 hosts with reason: Primary switchover x1 T359790
[13:22:43] <logmsgbot>	 !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
[13:23:00] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Set db1220 with weight 0 T359790', diff saved to https://phabricator.wikimedia.org/P58701 and previous config saved to /var/cache/conftool/dbconfig/20240311-132259-arnaudb.json
[13:24:35] <logmsgbot>	 !log samtar@deploy2002 superpes and samtar: Backport for [[gerrit:1009731|[itwiki] Set 'wgBlockAllowsUTEdit' to true]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[13:24:40] <TheresNoTime>	 Superpes: still around to test, or shall I?
[13:25:15] * TheresNoTime tests
[13:25:18] <Superpes>	 I’m going out in this moment :( please try the patch if you can
[13:25:35] <Superpes>	 Thanks :3
[13:25:40] <TheresNoTime>	 lgtm
[13:25:43] <logmsgbot>	 !log samtar@deploy2002 superpes and samtar: Continuing with sync
[13:25:52] <wikibugs>	 (03CR) 10Elukey: [V: 03+2 C: 03+2] slo_definitions: remove prometheus label from ml-serve definitions [grafana-grizzly] - 10https://gerrit.wikimedia.org/r/1009551 (owner: 10Elukey)
[13:27:00] <logmsgbot>	 !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
[13:27:07] <logmsgbot>	 !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
[13:29:09] <TheresNoTime>	 Jhs: your patch will be next — are you around? It's also marked WIP
[13:29:24] <TheresNoTime>	 mvolz: will you want to self-deploy? :)
[13:29:52] <Jhs>	 TheresNoTime, i'm here, yeah
[13:30:15] <jinxer-wm>	 (PHPFPMTooBusy) firing: Not enough idle PHP-FPM workers for Mediawiki mw-web at eqiad: 45.51% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy
[13:30:21] <mvolz>	 TheresNoTime: I'd like to give it a go. Would you double check that my patch looks okay, i.e. I've cherry picked it to the right branch etc? (After you're done with Jhs)
[13:31:03] <TheresNoTime>	 ack
[13:31:11] <wikibugs>	 (03PS1) 10Hashar: Merge tag 'v3.7.8' into wmf/stable-3.7 [software/gerrit] (wmf/stable-3.7) - 10https://gerrit.wikimedia.org/r/1010189 (https://phabricator.wikimedia.org/T359819)
[13:31:49] <wikibugs>	 (03PS1) 10Jon Harald Søby: nnwiki: Enable sandbox link [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1010155 (https://phabricator.wikimedia.org/T359788)
[13:32:04] <wikibugs>	 (03CR) 10Arnaudb: [C: 03+2] mariadb: Promote db1220 to x1 master [puppet] - 10https://gerrit.wikimedia.org/r/1009949 (https://phabricator.wikimedia.org/T359790) (owner: 10Gerrit maintenance bot)
[13:32:15] <logmsgbot>	 !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
[13:32:21] <logmsgbot>	 !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
[13:32:36] <wikibugs>	 (03PS2) 10Jon Harald Søby: nnwiki: Enable sandbox link [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1010155 (https://phabricator.wikimedia.org/T359788)
[13:32:38] <arnaudb>	 !log Starting x1 eqiad failover from db1179 to db1220 - T359790
[13:32:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:32:42] <stashbot>	 T359790: Switchover x1 master (db1179 -> db1220) - https://phabricator.wikimedia.org/T359790
[13:33:13] <icinga-wm>	 RECOVERY - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is OK: OK - failed 82 probes of 734 (alerts on 90) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[13:34:06] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Promote db1220 to x1 primary T359790', diff saved to https://phabricator.wikimedia.org/P58702 and previous config saved to /var/cache/conftool/dbconfig/20240311-133405-arnaudb.json
[13:35:15] <jinxer-wm>	 (PHPFPMTooBusy) resolved: Not enough idle PHP-FPM workers for Mediawiki mw-web at eqiad: 48.59% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy
[13:35:43] <logmsgbot>	 !log samtar@deploy2002 Finished scap: Backport for [[gerrit:1009731|[itwiki] Set 'wgBlockAllowsUTEdit' to true]] (duration: 13m 13s)
[13:36:11] * TheresNoTime tests again
[13:36:21] <TheresNoTime>	 all looks good Superpes, deployed
[13:36:26] <Dreamy_Jazz>	 !log Running `foreachwikiindblist group2.dblist extensions/MediaModeration/maintenance/scanFilesInScanTable.php --wiki=commonswiki --use-jobqueue --sleep 30 --verbose 2>&1 | tee ~/scan-files-in-scan-table-group2-sleep-30-no-render-now.txt` on a tmux session
[13:36:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:36:31] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Depool db1179 T359790', diff saved to https://phabricator.wikimedia.org/P58703 and previous config saved to /var/cache/conftool/dbconfig/20240311-133631-arnaudb.json
[13:36:43] <TheresNoTime>	 Jhs: moving to your patch now
[13:36:51] <Jhs>	 👍 
[13:37:15] <jinxer-wm>	 (PHPFPMTooBusy) firing: Not enough idle PHP-FPM workers for Mediawiki mw-web at eqiad: 50% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy
[13:37:34] <wikibugs>	 (03CR) 10TrainBranchBot: [C: 03+2] "Approved by samtar@deploy2002 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1010155 (https://phabricator.wikimedia.org/T359788) (owner: 10Jon Harald Søby)
[13:38:18] <wikibugs>	 (03Merged) 10jenkins-bot: nnwiki: Enable sandbox link [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1010155 (https://phabricator.wikimedia.org/T359788) (owner: 10Jon Harald Søby)
[13:38:33] <logmsgbot>	 !log samtar@deploy2002 Started scap: Backport for [[gerrit:1010155|nnwiki: Enable sandbox link (T359788)]]
[13:38:38] <stashbot>	 T359788: Enable wmgUseSandboxLink on nnwiki - https://phabricator.wikimedia.org/T359788
[13:40:21] <logmsgbot>	 !log arnaudb@cumin1002 START - Cookbook sre.hosts.downtime for 3:00:00 on db1179.eqiad.wmnet with reason: Silence for upgrade
[13:40:24] <logmsgbot>	 !log arnaudb@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on db1179.eqiad.wmnet with reason: Silence for upgrade
[13:40:25] <wikibugs>	 (03CR) 10Hashar: [C: 03+2] Merge tag 'v3.7.8' into wmf/stable-3.7 [software/gerrit] (wmf/stable-3.7) - 10https://gerrit.wikimedia.org/r/1010189 (https://phabricator.wikimedia.org/T359819) (owner: 10Hashar)
[13:40:51] <logmsgbot>	 !log samtar@deploy2002 jhsoby and samtar: Backport for [[gerrit:1010155|nnwiki: Enable sandbox link (T359788)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[13:40:55] <Jhs>	 TheresNoTime, working as it should on mwdebug2002 👍 
[13:40:55] <wikibugs>	 (Abandoned) Elukey: python-webapp: update mesh and base modules [deployment-charts] - https://gerrit.wikimedia.org/r/980904 (owner: Elukey)
[13:41:01] <TheresNoTime>	 ack
[13:41:05] <logmsgbot>	 !log samtar@deploy2002 jhsoby and samtar: Continuing with sync
[13:41:19] <wikibugs>	 (Abandoned) Elukey: profile::cache::kafka::webrequest: change the JSON format [puppet] - https://gerrit.wikimedia.org/r/980912 (https://phabricator.wikimedia.org/T346463) (owner: Elukey)
[13:41:54] <logmsgbot>	 !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
[13:42:00] <logmsgbot>	 !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
[13:42:08] <logmsgbot>	 !log arnaudb@cumin1002 START - Cookbook sre.hosts.reimage for host db1179.eqiad.wmnet with OS bookworm
[13:42:15] <jinxer-wm>	 (PHPFPMTooBusy) resolved: Not enough idle PHP-FPM workers for Mediawiki mw-web at eqiad: 48.26% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy
[13:45:15] <jinxer-wm>	 (PHPFPMTooBusy) firing: Not enough idle PHP-FPM workers for Mediawiki mw-web at eqiad: 49.02% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy
[13:45:31] <TheresNoTime>	 (is there a reason for ^ or..?)
[13:45:54] <wikibugs>	 (Merged) jenkins-bot: Merge tag 'v3.7.8' into wmf/stable-3.7 [software/gerrit] (wmf/stable-3.7) - https://gerrit.wikimedia.org/r/1010189 (https://phabricator.wikimedia.org/T359819) (owner: Hashar)
[13:46:31] <wikibugs>	 (PS1) Hashar: Update Gerrit to v3.7.8 and update plugins [software/gerrit] (deploy/wmf/stable-3.7) - https://gerrit.wikimedia.org/r/1010192 (https://phabricator.wikimedia.org/T359819)
[13:46:56] <jinxer-wm>	 (SystemdUnitFailed) firing: (2) netbox_report_accounting_run.service on netbox1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[13:47:33] <mvolz>	 TheresNoTime: who is that directed at? 
[13:47:45] <TheresNoTime>	 oh, just the channel, sorry
[13:48:05] <TheresNoTime>	 I put a note in -sre regardless :)
[13:48:10] <mvolz>	 ok, I wasn't sure of the context :). 
[13:48:48] <TheresNoTime>	 mvolz: https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Citoid/+/1009740 looks perfect, so I'll ping you when I'm done with this patch and you can deploy. I'll be around if you need me :)
[13:48:56] <mvolz>	 ok great
[13:49:09] <logmsgbot>	 !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
[13:49:16] <logmsgbot>	 !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
[13:50:15] <jinxer-wm>	 (PHPFPMTooBusy) resolved: Not enough idle PHP-FPM workers for Mediawiki mw-web at eqiad: 49.86% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy
[13:50:51] <logmsgbot>	 !log samtar@deploy2002 Finished scap: Backport for [[gerrit:1010155|nnwiki: Enable sandbox link (T359788)]] (duration: 12m 18s)
[13:50:56] <stashbot>	 T359788: Enable wmgUseSandboxLink on nnwiki - https://phabricator.wikimedia.org/T359788
[13:51:11] <TheresNoTime>	 Jhs: deployed :)
[13:51:16] <TheresNoTime>	 mvolz: all yours!
[13:51:25] <Jhs>	 TheresNoTime, thanks! lost my internet connection there for a bit, sorry
[13:51:33] <TheresNoTime>	 np!
[13:52:15] <jinxer-wm>	 (PHPFPMTooBusy) firing: Not enough idle PHP-FPM workers for Mediawiki mw-web at eqiad: 47.5% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy
[13:52:15] <jinxer-wm>	 (MediaWikiLatencyExceeded) firing: p75 latency high: eqiad mw-parsoid (k8s) 1.05s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[13:53:03] <mvolz>	 thanks! about to start 
[13:53:40] <wikibugs>	 (CR) TrainBranchBot: [C: +2] "Approved by mvolz@deploy2002 using scap backport" [extensions/Citoid] (wmf/1.42.0-wmf.21) - https://gerrit.wikimedia.org/r/1009740 (https://phabricator.wikimedia.org/T359601) (owner: Mvolz)
[13:54:01] <logmsgbot>	 !log arnaudb@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on db1179.eqiad.wmnet with reason: host reimage
[13:56:14] <wikibugs>	 (CR) Hashar: [C: +2] Update Gerrit to v3.7.8 and update plugins [software/gerrit] (deploy/wmf/stable-3.7) - https://gerrit.wikimedia.org/r/1010192 (https://phabricator.wikimedia.org/T359819) (owner: Hashar)
[13:56:27] <logmsgbot>	 !log arnaudb@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1179.eqiad.wmnet with reason: host reimage
[13:56:49] <wikibugs>	 (Merged) jenkins-bot: Update Gerrit to v3.7.8 and update plugins [software/gerrit] (deploy/wmf/stable-3.7) - https://gerrit.wikimedia.org/r/1010192 (https://phabricator.wikimedia.org/T359819) (owner: Hashar)
[13:57:16] <jinxer-wm>	 (MediaWikiLatencyExceeded) resolved: p75 latency high: eqiad mw-parsoid (k8s) 1.05s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[14:00:38] <wikibugs>	 (Merged) jenkins-bot: editcheckreferenceurl: don't error when aborting the lookupPromise [extensions/Citoid] (wmf/1.42.0-wmf.21) - https://gerrit.wikimedia.org/r/1009740 (https://phabricator.wikimedia.org/T359601) (owner: Mvolz)
[14:00:54] <logmsgbot>	 !log mvolz@deploy2002 Started scap: Backport for [[gerrit:1009740|editcheckreferenceurl: don't error when aborting the lookupPromise (T359601)]]
[14:01:09] <stashbot>	 T359601: TypeError: Cannot read properties of undefined (reading 'abort') at ve.ui.CitoidInspector.performLookup - https://phabricator.wikimedia.org/T359601
[14:01:58] <logmsgbot>	 !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
[14:02:05] <logmsgbot>	 !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
[14:02:15] <jinxer-wm>	 (PHPFPMTooBusy) resolved: Not enough idle PHP-FPM workers for Mediawiki mw-web at eqiad: 49.62% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy
[14:02:58] <logmsgbot>	 !log mvolz@deploy2002 mvolz: Backport for [[gerrit:1009740|editcheckreferenceurl: don't error when aborting the lookupPromise (T359601)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[14:04:15] <jinxer-wm>	 (PHPFPMTooBusy) firing: Not enough idle PHP-FPM workers for Mediawiki mw-web at eqiad: 47.58% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy
[14:10:16] <mvolz>	 tested, looks like the patch fixed the wedging, so continuing.
[14:10:21] <logmsgbot>	 !log mvolz@deploy2002 mvolz: Continuing with sync
[14:10:30] <TheresNoTime>	 :)
[14:13:51] <hashar>	 when you are done with the backport window, I will upgrade Gerrit
[14:14:01] <icinga-wm>	 PROBLEM - Check whether ferm is active by checking the default input chain on kubernetes1011 is CRITICAL: ERROR ferm input drop default policy not set, ferm might not have been started correctly https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm
[14:15:43] <wikibugs>	 (PS1) Elukey: Remove unecessary regexes from Lift Wing metrics [grafana-grizzly] - https://gerrit.wikimedia.org/r/1010193
[14:17:00] <logmsgbot>	 !log arnaudb@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1179.eqiad.wmnet with OS bookworm
[14:19:46] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'db1179 (re)pooling @ 1%: Post upgrade', diff saved to https://phabricator.wikimedia.org/P58704 and previous config saved to /var/cache/conftool/dbconfig/20240311-141945-arnaudb.json
[14:20:15] <logmsgbot>	 !log mvolz@deploy2002 Finished scap: Backport for [[gerrit:1009740|editcheckreferenceurl: don't error when aborting the lookupPromise (T359601)]] (duration: 19m 20s)
[14:20:19] <stashbot>	 T359601: TypeError: Cannot read properties of undefined (reading 'abort') at ve.ui.CitoidInspector.performLookup - https://phabricator.wikimedia.org/T359601
[14:21:44] <wikibugs>	 (PS1) Elukey: Remove response_code label from totals in Lift Wing Availability SLOs [grafana-grizzly] - https://gerrit.wikimedia.org/r/1010196
[14:23:04] <mvolz>	 I am done! Thanks for the hand-holding, that was my first "solo" mediawiki backport :). 
[14:24:15] <mvolz>	 hashar: you're up!
[14:24:28] <hashar>	 great :)
[14:24:39] <hashar>	 mvolz: and congratulations on the backport deployment!
[14:27:48] <logmsgbot>	 !log hashar@deploy2002 Started deploy [gerrit/gerrit@737c475]: Gerrit to 3.7.8 on gerrit2002
[14:27:51] <logmsgbot>	 !log hashar@deploy2002 Finished deploy [gerrit/gerrit@737c475]: Gerrit to 3.7.8 on gerrit2002 (duration: 00m 03s)
[14:28:57] <hashar>	 I forgot to poke T359819
[14:28:57] <stashbot>	 T359819: Upgrade to Gerrit 3.7.8 - https://phabricator.wikimedia.org/T359819
[14:30:16] <TheresNoTime>	 mvolz: congrats! ^^
[14:31:21] <logmsgbot>	 !log hashar@deploy2002 Started deploy [gerrit/gerrit@2150230]: Gerrit to 3.7.8 on gerrit2002 - T359819
[14:31:28] <logmsgbot>	 !log hashar@deploy2002 Finished deploy [gerrit/gerrit@2150230]: Gerrit to 3.7.8 on gerrit2002 - T359819 (duration: 00m 07s)
[14:31:37] * hashar whistles about forgetting `git rebase` on the deployment server
[14:31:42] <wikibugs>	 (PS2) Elukey: Remove response_code label from totals in Lift Wing Availability SLOs [grafana-grizzly] - https://gerrit.wikimedia.org/r/1010196
[14:33:07] <wikibugs>	 (PS1) Arnaudb: mariadb: toggle notifications for db2205 [puppet] - https://gerrit.wikimedia.org/r/1009956 (https://phabricator.wikimedia.org/T355422)
[14:33:42] * hashar whistles about forgetting `git rebase` on the deployment server
[14:34:20] <wikibugs>	 (PS2) Arnaudb: mariadb: toggle notifications for db2205/6/8 [puppet] - https://gerrit.wikimedia.org/r/1009956 (https://phabricator.wikimedia.org/T355422)
[14:34:33] <hashar>	 arnaudb: I am now upgrading Gerrit :D
[14:34:46] <wikibugs>	 (PS2) Elukey: Remove unecessary regexes from Lift Wing metrics [grafana-grizzly] - https://gerrit.wikimedia.org/r/1010193
[14:34:46] <wikibugs>	 (PS3) Elukey: Remove response_code label from totals in Lift Wing Availability SLOs [grafana-grizzly] - https://gerrit.wikimedia.org/r/1010196
[14:34:52] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'db1179 (re)pooling @ 2%: Post upgrade', diff saved to https://phabricator.wikimedia.org/P58705 and previous config saved to /var/cache/conftool/dbconfig/20240311-143451-arnaudb.json
[14:35:04] <logmsgbot>	 !log hashar@deploy2002 Started deploy [gerrit/gerrit@2150230]: Gerrit to 3.7.8 on gerrit1003 - T359819
[14:35:09] <stashbot>	 T359819: Upgrade to Gerrit 3.7.8 - https://phabricator.wikimedia.org/T359819
[14:35:14] <logmsgbot>	 !log hashar@deploy2002 Finished deploy [gerrit/gerrit@2150230]: Gerrit to 3.7.8 on gerrit1003 - T359819 (duration: 00m 10s)
[14:35:25] <arnaudb>	 🤞
[14:35:34] <James_F>	 Aha, it's planned.
[14:35:54] <hashar>	 here is my monitoring assistant :)
[14:36:01] * James_F was code-reviewing. :-P
[14:36:16] <James_F>	 Aka I was awake.
[14:36:31] <hashar>	 my bad, I should have announced it earlier today before my lunch
[14:36:57] <James_F>	 No worries.
[14:37:14] <jinxer-wm>	 (JobUnavailable) firing: (3) Reduced availability for job ldap in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[14:37:55] <hashar>	 I think it worked
[14:38:08] <James_F>	 It's back up.
[14:38:17] <James_F>	 Whether or not it works, we'll see.
[14:38:31] <jinxer-wm>	 (ProbeDown) firing: (2) Service gerrit1003:443 has failed probes (http_gerrit_tls_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#gerrit1003:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[14:38:42] <jinxer-wm>	 (ProbeDown) firing: (2) Service gerrit1003:29418 has failed probes (tcp_gerrit_ssh_ip4) - https://wikitech.wikimedia.org/wiki/TLS/Runbook#gerrit1003:29418 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[14:42:35] <hashar>	 ^ lies
[14:42:51] <hashar>	 my guess is the probe is lagging
[14:43:31] <jinxer-wm>	 (ProbeDown) resolved: (2) Service gerrit1003:443 has failed probes (http_gerrit_tls_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#gerrit1003:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[14:43:42] <jinxer-wm>	 (ProbeDown) resolved: (2) Service gerrit1003:29418 has failed probes (tcp_gerrit_ssh_ip4) - https://wikitech.wikimedia.org/wiki/TLS/Runbook#gerrit1003:29418 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[14:44:01] <icinga-wm>	 RECOVERY - Check whether ferm is active by checking the default input chain on kubernetes1011 is OK: OK ferm input default policy is set https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm
[14:49:26] <logmsgbot>	 !log arnaudb@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2109.codfw.wmnet with reason: provisionning db2209.codfw.wmnet - T355422
[14:49:31] <stashbot>	 T355422: Productionize db2196-db2220 - https://phabricator.wikimedia.org/T355422
[14:49:40] <logmsgbot>	 !log arnaudb@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2109.codfw.wmnet with reason: provisionning db2209.codfw.wmnet - T355422
[14:49:44] <logmsgbot>	 !log arnaudb@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2209.codfw.wmnet with reason: provisionning db2209.codfw.wmnet - T355422
[14:49:47] <logmsgbot>	 !log arnaudb@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2209.codfw.wmnet with reason: provisionning db2209.codfw.wmnet - T355422
[14:50:11] <wikibugs>	 (PS7) SBassett: Remove X-Webkit-CSP-Report-Only response header from foundationwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/1003108 (https://phabricator.wikimedia.org/T357479) (owner: TheDJ)
[14:50:19] <wikibugs>	 (CR) Jgiannelos: "Indeed node17 introduced a change on how DNS resolution works (verbatim=True by default) [1]. This means that it might be the case ipv6 ge" [deployment-charts] - https://gerrit.wikimedia.org/r/1007959 (https://phabricator.wikimedia.org/T358017) (owner: Sbailey)
[14:50:37] <icinga-wm>	 RECOVERY - Kafka broker TLS certificate validity on kafka-logging1003 is OK: SSL OK - Certificate kafka-logging1003.eqiad.wmnet valid until 2025-03-03 12:57:00 +0000 (expires in 356 days) https://wikitech.wikimedia.org/wiki/Kafka/Administration%23Renew_TLS_certificate
[14:51:02] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Cloning db2109 in db2209 for T355422', diff saved to https://phabricator.wikimedia.org/P58706 and previous config saved to /var/cache/conftool/dbconfig/20240311-145102-arnaudb.json
[14:51:12] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'db1179 (re)pooling @ 4%: Post upgrade', diff saved to https://phabricator.wikimedia.org/P58707 and previous config saved to /var/cache/conftool/dbconfig/20240311-145111-arnaudb.json
[14:51:36] <wikibugs>	 (PS4) Elukey: Remove response_code label from totals in Lift Wing Availability SLOs [grafana-grizzly] - https://gerrit.wikimedia.org/r/1010196
[14:52:04] <logmsgbot>	 !log arnaudb@cumin1002 START - Cookbook sre.mysql.clone Will create a clone of db2109.codfw.wmnet onto db2209.codfw.wmnet
[14:52:54] <wikibugs>	 (PS1) KartikMistry: Enable Content/Section translation on some Wikipedias [mediawiki-config] - https://gerrit.wikimedia.org/r/1010226 (https://phabricator.wikimedia.org/T353510)
[14:54:15] <jinxer-wm>	 (PHPFPMTooBusy) resolved: Not enough idle PHP-FPM workers for Mediawiki mw-web at eqiad: 48.26% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy
[14:54:43] <logmsgbot>	 !log arnaudb@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2110.codfw.wmnet with reason: provisionning db2210.codfw.wmnet - T355422
[14:54:47] <stashbot>	 T355422: Productionize db2196-db2220 - https://phabricator.wikimedia.org/T355422
[14:54:57] <logmsgbot>	 !log arnaudb@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2110.codfw.wmnet with reason: provisionning db2210.codfw.wmnet - T355422
[14:55:00] <logmsgbot>	 !log arnaudb@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2210.codfw.wmnet with reason: provisionning db2210.codfw.wmnet - T355422
[14:55:03] <logmsgbot>	 !log arnaudb@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2210.codfw.wmnet with reason: provisionning db2210.codfw.wmnet - T355422
[14:56:04] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Cloning db2110 in db2210 for T355422', diff saved to https://phabricator.wikimedia.org/P58708 and previous config saved to /var/cache/conftool/dbconfig/20240311-145604-arnaudb.json
[14:57:02] <logmsgbot>	 !log arnaudb@cumin1002 START - Cookbook sre.mysql.clone Will create a clone of db2110.codfw.wmnet onto db2210.codfw.wmnet
[14:57:14] <jinxer-wm>	 (JobUnavailable) firing: (3) Reduced availability for job ldap in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[14:58:31] <wikibugs>	 (PS5) Elukey: Remove response_code label from totals in Lift Wing Availability SLOs [grafana-grizzly] - https://gerrit.wikimedia.org/r/1010196
[14:59:06] <logmsgbot>	 !log arnaudb@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2111.codfw.wmnet with reason: provisionning db2211.codfw.wmnet - T355422
[14:59:20] <logmsgbot>	 !log arnaudb@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2111.codfw.wmnet with reason: provisionning db2211.codfw.wmnet - T355422
[14:59:23] <logmsgbot>	 !log arnaudb@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2211.codfw.wmnet with reason: provisionning db2211.codfw.wmnet - T355422
[14:59:26] <logmsgbot>	 !log arnaudb@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2211.codfw.wmnet with reason: provisionning db2211.codfw.wmnet - T355422
[15:00:13] <icinga-wm>	 PROBLEM - Uncommitted dbctl configuration changes- check dbctl config diff on cumin2002 is CRITICAL: CRITICAL - Uncommitted dbctl configuration changes, check dbctl config diff https://wikitech.wikimedia.org/wiki/Dbctl%23Uncommitted_dbctl_diffs
[15:00:26] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Cloning db2111 in db2211 for T355422', diff saved to https://phabricator.wikimedia.org/P58709 and previous config saved to /var/cache/conftool/dbconfig/20240311-150025-arnaudb.json
[15:00:36] <stashbot>	 T355422: Productionize db2196-db2220 - https://phabricator.wikimedia.org/T355422
[15:01:36] <logmsgbot>	 !log arnaudb@cumin1002 START - Cookbook sre.mysql.clone Will create a clone of db2111.codfw.wmnet onto db2211.codfw.wmnet
[15:03:44] <wikibugs>	 (PS1) Arnaudb: mariadb: toggle notifications for db2209/10/11 [puppet] - https://gerrit.wikimedia.org/r/1010246 (https://phabricator.wikimedia.org/T355422)
[15:05:15] <icinga-wm>	 RECOVERY - Uncommitted dbctl configuration changes- check dbctl config diff on cumin2002 is OK: OK - no diffs https://wikitech.wikimedia.org/wiki/Dbctl%23Uncommitted_dbctl_diffs
[15:06:18] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'db1179 (re)pooling @ 8%: Post upgrade', diff saved to https://phabricator.wikimedia.org/P58710 and previous config saved to /var/cache/conftool/dbconfig/20240311-150617-arnaudb.json
[15:21:23] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'db1179 (re)pooling @ 16%: Post upgrade', diff saved to https://phabricator.wikimedia.org/P58711 and previous config saved to /var/cache/conftool/dbconfig/20240311-152123-arnaudb.json
[15:30:05] <jouncebot>	 jan_drewniak: #bothumor My software never has bugs. It just develops random features. Rise for Wikimedia Portals Update. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240311T1530).
[15:30:59] <jnuche>	 jouncebot: nowandnext
[15:30:59] <jouncebot>	 For the next 0 hour(s) and 29 minute(s): Wikimedia Portals Update (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240311T1530)
[15:30:59] <jouncebot>	 In 1 hour(s) and 29 minute(s): MediaWiki infrastructure (UTC late) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240311T1700)
[15:30:59] <jouncebot>	 In 1 hour(s) and 29 minute(s): Wikidata Query Service weekly deploy (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240311T1700)
[15:33:34] <Daimona>	 !log T357007 Running mwscript CampaignEvents:GenerateInvitationList --wiki=metawiki --listfile=/home/daimona/list.txt
[15:33:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:33:39] <stashbot>	 T357007: Generate Invitation Lists for Event Organizers - https://phabricator.wikimedia.org/T357007
[15:35:57] <logmsgbot>	 !log jnuche@deploy2002 Installing scap version "4.71.0" for 376 hosts
[15:36:29] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'db1179 (re)pooling @ 32%: Post upgrade', diff saved to https://phabricator.wikimedia.org/P58712 and previous config saved to /var/cache/conftool/dbconfig/20240311-153628-arnaudb.json
[15:36:53] <logmsgbot>	 !log jnuche@deploy2002 Installation of scap version "4.71.0" completed for 376 hosts
[15:39:15] <jinxer-wm>	 (PHPFPMTooBusy) firing: Not enough idle PHP-FPM workers for Mediawiki mw-web at eqiad: 49.92% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy
[15:41:25] <jinxer-wm>	 (SystemdUnitFailed) firing: rsync-aptrepo-apt2001.wikimedia.org.service on apt1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[15:42:25] <jinxer-wm>	 (SystemdUnitFailed) firing: httpbb_kubernetes_mw-wikifunctions_hourly.service on cumin1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[15:44:15] <jinxer-wm>	 (PHPFPMTooBusy) resolved: Not enough idle PHP-FPM workers for Mediawiki mw-web at eqiad: 49.92% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy
[15:45:15] <jinxer-wm>	 (PHPFPMTooBusy) firing: Not enough idle PHP-FPM workers for Mediawiki mw-web at eqiad: 49.62% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy
[15:46:05] <icinga-wm>	 PROBLEM - Check unit status of httpbb_kubernetes_mw-wikifunctions_hourly on cumin1002 is CRITICAL: CRITICAL: Status of the systemd unit httpbb_kubernetes_mw-wikifunctions_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[15:49:30] <jinxer-wm>	 (PHPFPMTooBusy) resolved: (2) Not enough idle PHP-FPM workers for Mediawiki mw-parsoid at codfw: 47.95% idle - https://bit.ly/wmf-fpmsat  - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy
[15:49:54] <logmsgbot>	 !log arnaudb@cumin1002 END (PASS) - Cookbook sre.mysql.clone (exit_code=0) Will create a clone of db2111.codfw.wmnet onto db2211.codfw.wmnet
[15:51:34] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'db1179 (re)pooling @ 50%: Post upgrade', diff saved to https://phabricator.wikimedia.org/P58713 and previous config saved to /var/cache/conftool/dbconfig/20240311-155134-arnaudb.json
[16:06:40] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'db1179 (re)pooling @ 75%: Post upgrade', diff saved to https://phabricator.wikimedia.org/P58714 and previous config saved to /var/cache/conftool/dbconfig/20240311-160639-arnaudb.json
[16:08:35] <icinga-wm>	 PROBLEM - BGP status on cr2-codfw is CRITICAL: BGP CRITICAL - AS64605/IPv4: Idle - Anycast, AS64605/IPv6: Active - Anycast https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[16:11:17] <icinga-wm>	 PROBLEM - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is CRITICAL: CRITICAL - failed 100 probes of 734 (alerts on 90) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[16:16:17] <icinga-wm>	 RECOVERY - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is OK: OK - failed 84 probes of 734 (alerts on 90) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[16:21:21] <wikibugs>	 SRE, SRE-Access-Requests: Requesting access to mwmaint for rkhan / Himejijo - https://phabricator.wikimedia.org/T359490#9620985 (Himejijo)
[16:21:46] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'db1179 (re)pooling @ 100%: Post upgrade', diff saved to https://phabricator.wikimedia.org/P58715 and previous config saved to /var/cache/conftool/dbconfig/20240311-162145-arnaudb.json
[16:26:05] <wikibugs>	 (PS6) MdsShakil: Add `suppressredirect` right to pagemover and filemover user groups in azwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/1009729 (https://phabricator.wikimedia.org/T359614)
[16:33:49] <logmsgbot>	 !log arnaudb@cumin1002 END (PASS) - Cookbook sre.mysql.clone (exit_code=0) Will create a clone of db2109.codfw.wmnet onto db2209.codfw.wmnet
[16:38:09] <wikibugs>	 SRE, SRE-Access-Requests: Requesting access to mwmaint for rkhan / Himejijo - https://phabricator.wikimedia.org/T359490#9621054 (Himejijo) Can I just edit this ticket?
[16:42:25] <jinxer-wm>	 (SystemdUnitFailed) resolved: httpbb_kubernetes_mw-wikifunctions_hourly.service on cumin1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[16:46:05] <icinga-wm>	 RECOVERY - Check unit status of httpbb_kubernetes_mw-wikifunctions_hourly on cumin1002 is OK: OK: Status of the systemd unit httpbb_kubernetes_mw-wikifunctions_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[16:54:28] <logmsgbot>	 !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
[16:54:34] <logmsgbot>	 !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
[17:00:05] <jouncebot>	 Deploy window MediaWiki infrastructure (UTC late) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240311T1700)
[17:00:05] <jouncebot>	 ryankemper: Wikidata Query Service weekly deploy (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240311T1700). Please do the needful.
[17:04:22] <elukey>	 hashar: o/ if you have time (even tomorrow) - https://gerrit.wikimedia.org/r/c/integration/config/+/1009218
[17:11:32] <wikibugs>	 (CR) Krinkle: Support cookies in XWikimediaDebug (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/1000307 (https://phabricator.wikimedia.org/T350094) (owner: Gergő Tisza)
[17:18:44] <wikibugs>	 (CR) Krinkle: Support cookies in XWikimediaDebug (2 comments) [mediawiki-config] - https://gerrit.wikimedia.org/r/1000307 (https://phabricator.wikimedia.org/T350094) (owner: Gergő Tisza)
[17:20:49] <wikibugs>	 (PS1) Ilias Sarantopoulos: WIP - httpbb: add ores-legacy tests [puppet] - https://gerrit.wikimedia.org/r/1010245 (https://phabricator.wikimedia.org/T359871)
[17:25:51] <logmsgbot>	 !log arnaudb@cumin1002 END (PASS) - Cookbook sre.mysql.clone (exit_code=0) Will create a clone of db2110.codfw.wmnet onto db2210.codfw.wmnet
[17:29:25] <wikibugs>	 SRE, ops-eqiad, procurement: install (2) 1.92TB SSDs from decom into prometheus100[56] - https://phabricator.wikimedia.org/T359632#9621206 (lmata) thank you for all the help and care @Jclark-ctr and @RobH
[17:35:31] <dancy>	 elukey: I'll process it.
[17:35:55] <wikibugs>	 SRE, SRE-Access-Requests: Requesting access to mwmaint for rkhan / Himejijo - https://phabricator.wikimedia.org/T359490#9621223 (Marostegui)
[17:39:38] <dancy>	 elukey: Done
[17:40:18] <wikibugs>	 SRE, SRE-Access-Requests: Requesting access to mwmaint for rkhan / Himejijo - https://phabricator.wikimedia.org/T359490#9621231 (Marostegui) @thcipriani would you approve this request to mwmaint?
[17:41:25] <jinxer-wm>	 (SystemdUnitFailed) resolved: rsync-aptrepo-apt2001.wikimedia.org.service on apt1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[17:43:05] <wikibugs>	 SRE, SRE-Access-Requests: Requesting access to mwmaint for rkhan / Himejijo - https://phabricator.wikimedia.org/T359490#9621241 (thcipriani) >>! In T359490#9621230, @Marostegui wrote: > @thcipriani would you approve this request to mwmaint?  This is for `restricted`, correct? Approved from me.
[17:46:25] <jinxer-wm>	 (SystemdUnitFailed) firing: rsync-aptrepo-apt2001.wikimedia.org.service on apt1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[17:46:45] <wikibugs>	 SRE, SRE-Access-Requests: Requesting access to mwmaint for rkhan / Himejijo - https://phabricator.wikimedia.org/T359490#9621245 (Marostegui)
[17:47:11] <jinxer-wm>	 (SystemdUnitFailed) firing: generate_os_reports.service on puppetdb2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[17:52:46] <wikibugs>	 (PS1) Jforrester: Be able to disable MobileFrontend and drop the secondary domain [mediawiki-config] - https://gerrit.wikimedia.org/r/1010268 (https://phabricator.wikimedia.org/T349408)
[17:52:50] <wikibugs>	 (PS1) Jforrester: [BETA CLUSTER] Disable MobileFrontend for Wikifunctions [mediawiki-config] - https://gerrit.wikimedia.org/r/1010269 (https://phabricator.wikimedia.org/T358329)
[17:52:58] <wikibugs>	 (PS1) Jforrester: [wikifunctionswiki] Disable MobileFrontend in production [mediawiki-config] - https://gerrit.wikimedia.org/r/1010270 (https://phabricator.wikimedia.org/T349408)
[17:53:18] <wikibugs>	 SRE, SRE-Access-Requests, Data-Platform-SRE (2024.03.04 - 2024.03.24): Requesting access to kubernetes deployment for tjones - https://phabricator.wikimedia.org/T359092#9621295 (thcipriani) >>! In T359092#9599307, @Marostegui wrote: > @thcipriani can you approve this request for the deployment group?...
[18:23:31] <wikibugs>	 SRE, SRE-Access-Requests: Requesting access to mwmaint for rkhan / Himejijo - https://phabricator.wikimedia.org/T359490#9621355 (Himejijo)
[18:26:54] <logmsgbot>	 !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
[18:27:00] <logmsgbot>	 !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
[18:38:53] <icinga-wm>	 PROBLEM - mailman list info on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[18:39:05] <icinga-wm>	 PROBLEM - mailman archives on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[18:40:33] <icinga-wm>	 PROBLEM - mailman list info ssl expiry on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[18:44:07] <logmsgbot>	 !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
[18:44:13] <logmsgbot>	 !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
[18:45:33] <icinga-wm>	 RECOVERY - mailman list info ssl expiry on lists1001 is OK: OK - Certificate lists.wikimedia.org will expire on Mon 15 Apr 2024 02:06:19 AM GMT +0000. https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[18:45:49] <icinga-wm>	 RECOVERY - mailman list info on lists1001 is OK: HTTP OK: HTTP/1.1 200 OK - 8571 bytes in 0.231 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[18:45:59] <icinga-wm>	 RECOVERY - mailman archives on lists1001 is OK: HTTP OK: HTTP/1.1 200 OK - 51594 bytes in 0.078 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[18:48:44] <wikibugs>	 (PS1) Jdlrobson: Disable special pages on a per name basis [mediawiki-config] - https://gerrit.wikimedia.org/r/1010286 (https://phabricator.wikimedia.org/T359183)
[18:50:05] <wikibugs>	 (PS2) Jdlrobson: Disable special pages on a per name basis [mediawiki-config] - https://gerrit.wikimedia.org/r/1010286 (https://phabricator.wikimedia.org/T359183)
[18:51:49] <wikibugs>	 (PS1) Jdlrobson: Interaction to Next Paint (INP) Core Web Vital Improvement [skins/Vector] (wmf/1.42.0-wmf.21) - https://gerrit.wikimedia.org/r/1010215 (https://phabricator.wikimedia.org/T358380)
[18:56:02] <logmsgbot>	 !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
[18:56:08] <logmsgbot>	 !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
[18:57:14] <jinxer-wm>	 (JobUnavailable) firing: (2) Reduced availability for job ldap in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[19:34:23] <wikibugs>	 (CR) Marostegui: [C: +1] mariadb: toggle notifications for db2209/10/11 [puppet] - https://gerrit.wikimedia.org/r/1010246 (https://phabricator.wikimedia.org/T355422) (owner: Arnaudb)
[19:34:47] <wikibugs>	 (CR) Marostegui: [C: +1] mariadb: toggle notifications for db2205/6/8 [puppet] - https://gerrit.wikimedia.org/r/1009956 (https://phabricator.wikimedia.org/T355422) (owner: Arnaudb)
[19:35:06] <wikibugs>	 (CR) Mabualruz: [C: +1] Disable special pages on a per name basis [mediawiki-config] - https://gerrit.wikimedia.org/r/1010286 (https://phabricator.wikimedia.org/T359183) (owner: Jdlrobson)
[19:37:37] <logmsgbot>	 !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
[19:37:44] <logmsgbot>	 !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
[19:58:13] <logmsgbot>	 !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
[19:58:19] <logmsgbot>	 !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
[20:00:04] <jouncebot>	 RoanKattouw, Urbanecm, cjming, TheresNoTime, and kindrobot: May I have your attention please! UTC late backport window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240311T2000)
[20:00:05] <jouncebot>	 Jdlrobson: A patch you scheduled for UTC late backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[20:00:16] <urbanecm>	 oh, the late window is early this time
[20:00:19] <urbanecm>	 i can deploy today :)
[20:00:26] <urbanecm>	 Jdlrobson: around?
[20:00:35] <wikibugs>	 (PS3) Jdlrobson: Disable special pages on a per name basis [mediawiki-config] - https://gerrit.wikimedia.org/r/1010286 (https://phabricator.wikimedia.org/T359183)
[20:00:37] <Jdlrobson>	 urbanecm: yep
[20:00:45] <wikibugs>	 (CR) Urbanecm: [C: +2] Interaction to Next Paint (INP) Core Web Vital Improvement [skins/Vector] (wmf/1.42.0-wmf.21) - https://gerrit.wikimedia.org/r/1010215 (https://phabricator.wikimedia.org/T358380) (owner: Jdlrobson)
[20:00:48] <Jdlrobson>	 YAY CLOCK CHANGES
[20:01:41] <urbanecm>	 fortunately, i use an electronic calendar. it would be a nightmare to keep track of this via a paper one.
[20:02:06] <urbanecm>	 Jdlrobson: should i wait for the backport with the config? or is it ok to deploy the config in the meantime?
[20:03:06] <Jdlrobson>	 neither blocks each other urbanecm 
[20:03:09] <urbanecm>	 okay
[20:03:12] <Jdlrobson>	 you can do them in whatever order makes sense
[20:03:19] <wikibugs>	 (CR) Urbanecm: [C: +2] Disable special pages on a per name basis [mediawiki-config] - https://gerrit.wikimedia.org/r/1010286 (https://phabricator.wikimedia.org/T359183) (owner: Jdlrobson)
[20:03:32] <urbanecm>	 thanks for clarifying. just wanted to double check  :)
[20:03:44] <wikibugs>	 (CR) TrainBranchBot: [C: +2] "Approved by urbanecm@deploy2002 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/1010286 (https://phabricator.wikimedia.org/T359183) (owner: Jdlrobson)
[20:04:05] <wikibugs>	 (Merged) jenkins-bot: Disable special pages on a per name basis [mediawiki-config] - https://gerrit.wikimedia.org/r/1010286 (https://phabricator.wikimedia.org/T359183) (owner: Jdlrobson)
[20:04:23] <logmsgbot>	 !log urbanecm@deploy2002 Started scap: Backport for [[gerrit:1010286|Disable special pages on a per name basis (T359183)]]
[20:04:27] <stashbot>	 T359183: Exclude non-functional pages from night mode - https://phabricator.wikimedia.org/T359183
[20:06:36] <logmsgbot>	 !log urbanecm@deploy2002 jdlrobson and urbanecm: Backport for [[gerrit:1010286|Disable special pages on a per name basis (T359183)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[20:06:50] <urbanecm>	 Jdlrobson: the config's at mwdebug. can you test, please? :)
[20:06:59] <Jdlrobson>	 yep on it
[20:07:34] <Jdlrobson>	 urbanecm: lgtm please sync
[20:07:38] <logmsgbot>	 !log urbanecm@deploy2002 jdlrobson and urbanecm: Continuing with sync
[20:07:41] <urbanecm>	 proceeding, thank you
[20:09:17] <logmsgbot>	 !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
[20:09:23] <logmsgbot>	 !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
[20:14:27] <icinga-wm>	 PROBLEM - Check whether ferm is active by checking the default input chain on mw1367 is CRITICAL: ERROR ferm input drop default policy not set, ferm might not have been started correctly https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm
[20:16:33] <icinga-wm>	 PROBLEM - Disk space on centrallog1002 is CRITICAL: DISK CRITICAL - free space: /srv 52954 MB (3% inode=99%): https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=centrallog1002&var-datasource=eqiad+prometheus/ops
[20:18:06] <logmsgbot>	 !log urbanecm@deploy2002 Finished scap: Backport for [[gerrit:1010286|Disable special pages on a per name basis (T359183)]] (duration: 13m 43s)
[20:18:10] <stashbot>	 T359183: Exclude non-functional pages from night mode - https://phabricator.wikimedia.org/T359183
[20:18:26] <wikibugs>	 (CR) TrainBranchBot: [C: +2] "Approved by urbanecm@deploy2002 using scap backport" [skins/Vector] (wmf/1.42.0-wmf.21) - https://gerrit.wikimedia.org/r/1010215 (https://phabricator.wikimedia.org/T358380) (owner: Jdlrobson)
[20:19:47] <wikibugs>	 (Merged) jenkins-bot: Interaction to Next Paint (INP) Core Web Vital Improvement [skins/Vector] (wmf/1.42.0-wmf.21) - https://gerrit.wikimedia.org/r/1010215 (https://phabricator.wikimedia.org/T358380) (owner: Jdlrobson)
[20:20:01] <logmsgbot>	 !log urbanecm@deploy2002 Started scap: Backport for [[gerrit:1010215|Interaction to Next Paint (INP) Core Web Vital Improvement (T358380)]]
[20:20:08] <stashbot>	 T358380: [3 days] Interaction to Next Paint (INP) Core Web Vital is scored as "Needs Improvement" or "Poor" for Mobile users on Desktop - https://phabricator.wikimedia.org/T358380
[20:22:14] <logmsgbot>	 !log urbanecm@deploy2002 urbanecm and jdlrobson: Backport for [[gerrit:1010215|Interaction to Next Paint (INP) Core Web Vital Improvement (T358380)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[20:22:36] <urbanecm>	 Jdlrobson: can you test the backport as well, please? :)
[20:23:23] <Jdlrobson>	 urbanecm: yep lgtm
[20:23:26] <logmsgbot>	 !log urbanecm@deploy2002 urbanecm and jdlrobson: Continuing with sync
[20:23:30] <urbanecm>	 that was quick, syncing :)
[20:33:42] <Jdlrobson>	 thanks urbanecm :)
[20:33:59] <logmsgbot>	 !log urbanecm@deploy2002 Finished scap: Backport for [[gerrit:1010215|Interaction to Next Paint (INP) Core Web Vital Improvement (T358380)]] (duration: 13m 57s)
[20:34:09] <stashbot>	 T358380: [3 days] Interaction to Next Paint (INP) Core Web Vital is scored as "Needs Improvement" or "Poor" for Mobile users on Desktop - https://phabricator.wikimedia.org/T358380
[20:36:43] <urbanecm>	 and all done :)
[20:36:45] <urbanecm>	 no problem
[20:37:16] <tgr>	 urbanecm: I have a late addition if you are finished
[20:37:36] <urbanecm>	 tgr: no problem. do you want to self-serve, or do you want me to deploy for you?
[20:38:08] <wikibugs>	 (PS3) Gergő Tisza: Move checkuser grant configuration to CheckUser extension [mediawiki-config] - https://gerrit.wikimedia.org/r/1009865 (https://phabricator.wikimedia.org/T359537)
[20:38:17] <tgr>	 thx, I can do it
[20:38:24] <urbanecm>	 ack, feel free to go ahead then :)
[20:40:05] <tgr>	 on second thought it needs to wait one more train cycle
[20:41:05] <icinga-wm>	 PROBLEM - Check unit status of httpbb_kubernetes_mw-parsoid_hourly on cumin1002 is CRITICAL: CRITICAL: Status of the systemd unit httpbb_kubernetes_mw-parsoid_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[20:41:25] <jinxer-wm>	 (SystemdUnitFailed) firing: puppet-agent-timer.service on poolcounter2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[20:44:27] <icinga-wm>	 RECOVERY - Check whether ferm is active by checking the default input chain on mw1367 is OK: OK ferm input default policy is set https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm
[20:46:25] <jinxer-wm>	 (SystemdUnitFailed) firing: (2) httpbb_kubernetes_mw-parsoid_hourly.service on cumin1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[20:48:21] <wikibugs>	 (PS1) Dwisehaupt: Update lp.email cname and validation domain [dns] - https://gerrit.wikimedia.org/r/1010315 (https://phabricator.wikimedia.org/T336000)
[20:49:20] <wikibugs>	 (CR) CI reject: [V: -1] Update lp.email cname and validation domain [dns] - https://gerrit.wikimedia.org/r/1010315 (https://phabricator.wikimedia.org/T336000) (owner: Dwisehaupt)
[20:51:15] <rzl>	 I'm "live testing" one step of the switchdc cookbook -- it'll only touch eqiad (currently the read-only DC) so no production impact
[20:51:28] <logmsgbot>	 !log rzl@cumin2002 START - Cookbook sre.switchdc.mediawiki.01-stop-maintenance
[20:51:44] <logmsgbot>	 !log rzl@cumin2002 END (PASS) - Cookbook sre.switchdc.mediawiki.01-stop-maintenance (exit_code=0)
[20:52:20] <rzl>	 done 👍
[20:55:11] <wikibugs>	 (CR) Dwisehaupt: "recheck" [dns] - https://gerrit.wikimedia.org/r/1010315 (https://phabricator.wikimedia.org/T336000) (owner: Dwisehaupt)
[20:56:05] <wikibugs>	 (CR) CI reject: [V: -1] Update lp.email cname and validation domain [dns] - https://gerrit.wikimedia.org/r/1010315 (https://phabricator.wikimedia.org/T336000) (owner: Dwisehaupt)
[21:00:05] <jouncebot>	 Reedy, sbassett, Maryum, and manfredi: Weekly Security deployment window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240311T2100). Please do the needful.
[21:01:19] <wikibugs>	 (PS2) Dwisehaupt: Update lp.email cname and validation domain [dns] - https://gerrit.wikimedia.org/r/1010315 (https://phabricator.wikimedia.org/T336000)
[21:01:25] <jinxer-wm>	 (SystemdUnitFailed) firing: (2) httpbb_kubernetes_mw-parsoid_hourly.service on cumin1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[21:08:51] <wikibugs>	 (PS3) Dwisehaupt: Update lp.email cname and validation domain [dns] - https://gerrit.wikimedia.org/r/1010315 (https://phabricator.wikimedia.org/T336000)
[21:11:45] <wikibugs>	 (CR) Jgreen: [C: +2] Update lp.email cname and validation domain [dns] - https://gerrit.wikimedia.org/r/1010315 (https://phabricator.wikimedia.org/T336000) (owner: Dwisehaupt)
[21:12:05] <icinga-wm>	 PROBLEM - mailman list info on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[21:12:09] <icinga-wm>	 PROBLEM - mailman archives on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[21:13:57] <icinga-wm>	 RECOVERY - mailman list info on lists1001 is OK: HTTP OK: HTTP/1.1 200 OK - 8571 bytes in 0.328 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[21:13:59] <icinga-wm>	 RECOVERY - mailman archives on lists1001 is OK: HTTP OK: HTTP/1.1 200 OK - 51594 bytes in 0.112 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[21:22:00] <logmsgbot>	 !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
[21:22:07] <logmsgbot>	 !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
[21:41:05] <icinga-wm>	 RECOVERY - Check unit status of httpbb_kubernetes_mw-parsoid_hourly on cumin1002 is OK: OK: Status of the systemd unit httpbb_kubernetes_mw-parsoid_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[21:41:25] <jinxer-wm>	 (SystemdUnitFailed) resolved: httpbb_kubernetes_mw-parsoid_hourly.service on cumin1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[21:47:10] <jinxer-wm>	 (SystemdUnitFailed) firing: rsync-aptrepo-apt2001.wikimedia.org.service on apt1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[21:47:11] <jinxer-wm>	 (SystemdUnitFailed) firing: generate_os_reports.service on puppetdb2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[22:31:33] <wikibugs>	 (CR) Krinkle: [C: +1] "Untested but LGTM. I suggest whoever merges it, perhaps runs it first and/or shortly afterwards to confirm just in case that the dums stil" [puppet] - https://gerrit.wikimedia.org/r/1009784 (https://phabricator.wikimedia.org/T99268) (owner: Ahmon Dancy)
[22:47:29] <logmsgbot>	 !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
[22:47:36] <logmsgbot>	 !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
[22:53:26] <wikibugs>	 (PS3) Gergő Tisza: Support cookies in XWikimediaDebug [mediawiki-config] - https://gerrit.wikimedia.org/r/1000307 (https://phabricator.wikimedia.org/T350094)
[22:53:31] <wikibugs>	 (CR) Gergő Tisza: Support cookies in XWikimediaDebug (3 comments) [mediawiki-config] - https://gerrit.wikimedia.org/r/1000307 (https://phabricator.wikimedia.org/T350094) (owner: Gergő Tisza)
[22:57:29] <jinxer-wm>	 (JobUnavailable) firing: (2) Reduced availability for job ldap in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[23:14:47] <logmsgbot>	 !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
[23:14:54] <logmsgbot>	 !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
[23:17:57] <logmsgbot>	 !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
[23:18:03] <logmsgbot>	 !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
[23:25:50] <logmsgbot>	 !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
[23:25:56] <logmsgbot>	 !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
[23:37:24] <logmsgbot>	 !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
[23:37:30] <logmsgbot>	 !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
[23:46:25] <jinxer-wm>	 (SystemdUnitFailed) resolved: rsync-aptrepo-apt2001.wikimedia.org.service on apt1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[23:50:25] <jinxer-wm>	 (SystemdUnitFailed) firing: rsync-aptrepo-apt2001.wikimedia.org.service on apt1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[23:52:24] <logmsgbot>	 !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
[23:52:30] <logmsgbot>	 !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply