[00:00:55] <icinga-wm>	 RECOVERY - Check systemd state on an-worker1127 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:13:21] <icinga-wm>	 PROBLEM - Check systemd state on an-worker1127 is CRITICAL: CRITICAL - degraded: The following units failed: user-runtime-dir@116.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:17:53] <icinga-wm>	 PROBLEM - Check systemd state on lists1001 is CRITICAL: CRITICAL - degraded: The following units failed: puppet-agent-timer.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:24:26] <wikibugs>	 SRE, Performance-Team, Traffic, serviceops: Progressive Multi-DC roll out - https://phabricator.wikimedia.org/T279664 (tstarling)
[00:27:41] <wikibugs>	 SRE, Performance-Team, Traffic, serviceops: Progressive Multi-DC roll out - https://phabricator.wikimedia.org/T279664 (tstarling)
[00:34:54] <icinga-wm>	 RECOVERY - Check systemd state on lists1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:40:13] <wikibugs>	 SRE, Performance-Team, Traffic, serviceops: Progressive Multi-DC roll out - https://phabricator.wikimedia.org/T279664 (tstarling)
[00:42:07] <icinga-wm>	 RECOVERY - Check systemd state on logstash1026 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:43:09] <icinga-wm>	 RECOVERY - Check systemd state on logstash2026 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:54:19] <wikibugs>	 SRE, Performance-Team, Traffic, serviceops: Progressive Multi-DC roll out - https://phabricator.wikimedia.org/T279664 (tstarling)
[01:14:44] <Tamzin>	 some slow pageloads and 503s
[01:14:45] <perryprog>	 From #-tech:
[01:14:45] <perryprog>	 Dragonfly6-7 I'm trying to create a page on Commons and I've twice gotten the error message:
[01:14:46] <perryprog>	 [21:14:01] Dragonfly6-7 upstream connect error or disconnect/reset before headers. reset reason: overflow
[01:14:46] <perryprog>	 [21:14:21] Dragonfly6-7 thrice now
[01:14:55] <Tamzin>	 ha, beat you by a second
[01:14:59] <icinga-wm>	 RECOVERY - Check systemd state on an-worker1127 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[01:15:00] <perryprog>	 :(. Want to page?
[01:15:27] <Tamzin>	 above my paygrade. hey TheresNoTime you're online
[01:15:34] <TheresNoTime>	 I am
[01:15:35] <jinxer-wm>	 (FrontendUnavailable) firing: HAProxy (cache_text) has reduced HTTP availability #page - TODO - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=13 - https://alerts.wikimedia.org/?q=alertname%3DFrontendUnavailable
[01:15:39] <perryprog>	 jinxer-wm wins
[01:15:55] <jinxer-wm>	 (LogstashKafkaConsumerLag) firing: Too many messages in kafka logging - https://wikitech.wikimedia.org/wiki/Logstash#Kafka_consumer_lag - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-eqiad - https://alerts.wikimedia.org/?q=alertname%3DLogstashKafkaConsumerLag
[01:16:07] <rzl>	 hi
[01:16:11] <TheresNoTime>	 Call me a book, cos I got pages
[01:16:18] <jinxer-wm>	 (ProbeDown) firing: (5) Service text-https:443 has failed probes (http_text-https_ip4) #page - https://wikitech.wikimedia.org/wiki/Network_monitoring#ProbeDown - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[01:16:18] <jinxer-wm>	 (ProbeDown) firing: (22) Service appservers-https:443 has failed probes (http_appservers-https_ip4) - https://wikitech.wikimedia.org/wiki/Network_monitoring#ProbeDown - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[01:16:34] <jinxer-wm>	 (FrontendUnavailable) firing: varnish-text has reduced HTTP availability #page - https://wikitech.wikimedia.org/wiki/Varnish#Diagnosing_Varnish_alerts - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=3 - https://alerts.wikimedia.org/?q=alertname%3DFrontendUnavailable
[01:17:00] <TheresNoTime>	 Monitoring was a bit slow on that one to be honest, I noticed timeouts before jinxer piped up
[01:17:29] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on appserver in eqiad on alert1001 is CRITICAL: cluster=appserver code={200,204} handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[01:19:59] <icinga-wm>	 RECOVERY - High average GET latency for mw requests on appserver in eqiad on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[01:20:35] <jinxer-wm>	 (FrontendUnavailable) resolved: HAProxy (cache_text) has reduced HTTP availability #page - TODO - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=13 - https://alerts.wikimedia.org/?q=alertname%3DFrontendUnavailable
[01:20:55] <jinxer-wm>	 (LogstashKafkaConsumerLag) resolved: Too many messages in kafka logging - https://wikitech.wikimedia.org/wiki/Logstash#Kafka_consumer_lag - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-eqiad - https://alerts.wikimedia.org/?q=alertname%3DLogstashKafkaConsumerLag
[01:21:18] <jinxer-wm>	 (ProbeDown) resolved: (5) Service text-https:443 has failed probes (http_text-https_ip4) #page - https://wikitech.wikimedia.org/wiki/Network_monitoring#ProbeDown - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[01:21:18] <jinxer-wm>	 (ProbeDown) resolved: (22) Service appservers-https:443 has failed probes (http_appservers-https_ip4) - https://wikitech.wikimedia.org/wiki/Network_monitoring#ProbeDown - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[01:21:34] <jinxer-wm>	 (FrontendUnavailable) resolved: varnish-text has reduced HTTP availability #page - https://wikitech.wikimedia.org/wiki/Varnish#Diagnosing_Varnish_alerts - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=3 - https://alerts.wikimedia.org/?q=alertname%3DFrontendUnavailable
[01:21:46] <rzl>	 this should be resolved, still digging a little but speak up if you still see errors or slowness :)
[01:22:27] <icinga-wm>	 PROBLEM - Check systemd state on an-worker1127 is CRITICAL: CRITICAL - degraded: The following units failed: systemd-timedated.service,user-runtime-dir@116.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[01:34:59] <icinga-wm>	 PROBLEM - SSH on db1109.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[01:37:45] <jinxer-wm>	 (JobUnavailable) firing: (2) Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[01:42:45] <jinxer-wm>	 (JobUnavailable) firing: (4) Reduced availability for job nginx in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[01:45:03] <AntiComposite>	 rzl, it's back
[01:45:27] <rzl>	 thanks
[01:46:00] <Kizule>	 Hello, is there any kind of maintenance on MediaWiki.org?
[01:46:14] <Kizule>	 I'm asking because when I want to go there, I get this:
[01:46:15] <Kizule>	 upstream connect error or disconnect/reset before headers. reset reason: overflow
[01:46:18] <jinxer-wm>	 (ProbeDown) firing: (21) Service appservers-https:443 has failed probes (http_appservers-https_ip4) #page - https://wikitech.wikimedia.org/wiki/Network_monitoring#ProbeDown - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[01:46:20] <Vermont>	 yes, it's everything
[01:46:35] <jinxer-wm>	 (FrontendUnavailable) firing: varnish-text has reduced HTTP availability #page - https://wikitech.wikimedia.org/wiki/Varnish#Diagnosing_Varnish_alerts - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=3 - https://alerts.wikimedia.org/?q=alertname%3DFrontendUnavailable
[01:46:35] <jinxer-wm>	 (FrontendUnavailable) firing: HAProxy (cache_text) has reduced HTTP availability #page - TODO - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=13 - https://alerts.wikimedia.org/?q=alertname%3DFrontendUnavailable
[01:46:55] <jinxer-wm>	 (LogstashKafkaConsumerLag) firing: Too many messages in kafka logging - https://wikitech.wikimedia.org/wiki/Logstash#Kafka_consumer_lag - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-eqiad - https://alerts.wikimedia.org/?q=alertname%3DLogstashKafkaConsumerLag
[01:46:55] <Kizule>	 Vermont: You mean on Wikipedia and all other projects too?
[01:47:18] <jinxer-wm>	 (ProbeDown) firing: (22) Service appservers-https:443 has failed probes (http_appservers-https_ip4) - https://wikitech.wikimedia.org/wiki/Network_monitoring#ProbeDown - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[01:48:17] <jinxer-wm>	 (PHPFPMTooBusy) firing: Not enough idle php7.2-fpm.service workers for Mediawiki appserver at eqiad #page - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=54&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad%20prometheus/ops&var-cluster=appserver - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy
[01:48:47] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute for api_appserver on alert1001 is CRITICAL: 129 gt 100 https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=18&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[01:51:03] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute for api_appserver on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=18&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[01:51:18] <jinxer-wm>	 (ProbeDown) resolved: (22) Service appservers-https:443 has failed probes (http_appservers-https_ip4) #page - https://wikitech.wikimedia.org/wiki/Network_monitoring#ProbeDown - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[01:51:35] <jinxer-wm>	 (FrontendUnavailable) resolved: varnish-text has reduced HTTP availability #page - https://wikitech.wikimedia.org/wiki/Varnish#Diagnosing_Varnish_alerts - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=3 - https://alerts.wikimedia.org/?q=alertname%3DFrontendUnavailable
[01:51:35] <jinxer-wm>	 (FrontendUnavailable) resolved: HAProxy (cache_text) has reduced HTTP availability #page - TODO - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=13 - https://alerts.wikimedia.org/?q=alertname%3DFrontendUnavailable
[01:51:42] <rzl>	 sorry for the trouble :) everything should be recovered now, let us know if you're still having problems
[01:51:55] <jinxer-wm>	 (LogstashKafkaConsumerLag) resolved: Too many messages in kafka logging - https://wikitech.wikimedia.org/wiki/Logstash#Kafka_consumer_lag - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-eqiad - https://alerts.wikimedia.org/?q=alertname%3DLogstashKafkaConsumerLag
[01:52:18] <jinxer-wm>	 (ProbeDown) resolved: (22) Service appservers-https:443 has failed probes (http_appservers-https_ip4) - https://wikitech.wikimedia.org/wiki/Network_monitoring#ProbeDown - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[01:52:45] <jinxer-wm>	 (JobUnavailable) resolved: (4) Reduced availability for job nginx in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[01:53:17] <jinxer-wm>	 (PHPFPMTooBusy) resolved: Not enough idle php7.2-fpm.service workers for Mediawiki appserver at eqiad #page - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=54&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad%20prometheus/ops&var-cluster=appserver - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy
[02:36:25] <icinga-wm>	 RECOVERY - SSH on db1109.mgmt is OK: SSH OK - OpenSSH_7.4 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[02:52:21] <icinga-wm>	 PROBLEM - Check systemd state on netbox1002 is CRITICAL: CRITICAL - degraded: The following units failed: netbox_report_accounting_run.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[03:03:37] <icinga-wm>	 PROBLEM - puppet last run on an-worker1127 is CRITICAL: CRITICAL: Puppet last ran 6 hours ago https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[03:05:25] <icinga-wm>	 RECOVERY - SSH on mw1321.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[03:14:47] <icinga-wm>	 RECOVERY - Check systemd state on netbox1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[03:36:17] <icinga-wm>	 PROBLEM - SSH on wtp1040.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[04:05:43] <icinga-wm>	 PROBLEM - SSH on wtp1036.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[04:37:41] <icinga-wm>	 RECOVERY - SSH on wtp1040.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[04:38:26] <wikibugs>	 SRE, SRE-OnFire, Sustainability (Incident Followup): Klaxon redirects to http://klaxon.wikimedia.org (not https) - https://phabricator.wikimedia.org/T308941 (lmata)
[05:00:01] <icinga-wm>	 PROBLEM - SSH on wtp1044.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[05:07:09] <icinga-wm>	 RECOVERY - SSH on wtp1036.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[05:14:33] <icinga-wm>	 PROBLEM - PHD should be supervising processes on phab1001 is CRITICAL: PROCS CRITICAL: 2 processes with UID = 497 (phd) https://wikitech.wikimedia.org/wiki/Phabricator
[05:17:03] <icinga-wm>	 RECOVERY - PHD should be supervising processes on phab1001 is OK: PROCS OK: 7 processes with UID = 497 (phd) https://wikitech.wikimedia.org/wiki/Phabricator
[05:21:16] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on db1171.eqiad.wmnet with reason: Maintenance
[05:21:29] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1171.eqiad.wmnet with reason: Maintenance
[05:21:36] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on db2079.codfw.wmnet with reason: Maintenance
[05:21:49] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2079.codfw.wmnet with reason: Maintenance
[05:21:51] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 4:00:00 on 15 hosts with reason: Maintenance
[05:22:13] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 15 hosts with reason: Maintenance
[05:22:16] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
[05:22:30] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
[05:22:32] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on db1111.eqiad.wmnet with reason: Maintenance
[05:22:46] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1111.eqiad.wmnet with reason: Maintenance
[05:22:51] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depooling db1111 (T313070)', diff saved to https://phabricator.wikimedia.org/P31257 and previous config saved to /var/cache/conftool/dbconfig/20220718-052250-marostegui.json
[05:22:54] <stashbot>	 T313070: Adjust the field type of wb_changes.change_time to fixed binary on wmf wikis - https://phabricator.wikimedia.org/T313070
[05:23:30] <wikibugs>	 (PS1) Marostegui: instances.yaml: Remove db2082 from dbctl [puppet] - https://gerrit.wikimedia.org/r/814400 (https://phabricator.wikimedia.org/T313003)
[05:25:11] <wikibugs>	 (CR) Marostegui: [C: +2] instances.yaml: Remove db2082 from dbctl [puppet] - https://gerrit.wikimedia.org/r/814400 (https://phabricator.wikimedia.org/T313003) (owner: Marostegui)
[05:26:05] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Remove db2082 T313003', diff saved to https://phabricator.wikimedia.org/P31258 and previous config saved to /var/cache/conftool/dbconfig/20220718-052605-marostegui.json
[05:26:09] <stashbot>	 T313003: decommission db2082 - https://phabricator.wikimedia.org/T313003
[05:26:16] <wikibugs>	 SRE, Observability-Metrics, User-fgiunchedi: Programmatic generation of grafana dashboards - https://phabricator.wikimedia.org/T171482 (lmata) Open→Declined this is superseded by grizzly, which is already in production for SLO dashboarding
[05:26:45] <wikibugs>	 (PS1) Marostegui: mariadb: Remove db2082 from puppet [puppet] - https://gerrit.wikimedia.org/r/814510 (https://phabricator.wikimedia.org/T313003)
[05:34:01] <wikibugs>	 SRE, SRE-OnFire, Shellbox, serviceops, Sustainability (Incident Followup): Shellbox resource management - https://phabricator.wikimedia.org/T310557 (Legoktm) https://gerrit.wikimedia.org/r/c/mediawiki/extensions/SyntaxHighlight_GeSHi/+/812911 is related I believe.
[05:36:06] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.decommission for hosts db2082.codfw.wmnet
[05:39:50] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.dns.netbox
[05:43:57] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[05:44:09] <wikibugs>	 SRE-OnFire, DBA, Sustainability (Incident Followup): Investigate mariadb 10.6 performance regression during spikes/high load - https://phabricator.wikimedia.org/T311106 (Marostegui) db1111 had issues a few hours ago and it had performance schema disabled: ` mysql:root@localhost [(none)]> show global...
[05:44:59] <wikibugs>	 (CR) Marostegui: [C: +2] mariadb: Remove db2082 from puppet [puppet] - https://gerrit.wikimedia.org/r/814510 (https://phabricator.wikimedia.org/T313003) (owner: Marostegui)
[05:46:19] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2082.codfw.wmnet
[05:46:52] <wikibugs>	 ops-codfw, decommission-hardware: decommission db2082 - https://phabricator.wikimedia.org/T313003 (Marostegui) a:Papaul
[05:46:58] <wikibugs>	 ops-codfw, decommission-hardware: decommission db2082 - https://phabricator.wikimedia.org/T313003 (Marostegui) @Papaul host ready for you
[05:48:12] <wikibugs>	 (PS1) Marostegui: instances.yaml: Add db2166 to dbctl [puppet] - https://gerrit.wikimedia.org/r/814572 (https://phabricator.wikimedia.org/T311493)
[05:49:11] <wikibugs>	 (CR) Marostegui: [C: +2] instances.yaml: Add db2166 to dbctl [puppet] - https://gerrit.wikimedia.org/r/814572 (https://phabricator.wikimedia.org/T311493) (owner: Marostegui)
[05:50:52] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Add db2166 to dbctl T311493', diff saved to https://phabricator.wikimedia.org/P31259 and previous config saved to /var/cache/conftool/dbconfig/20220718-055051-marostegui.json
[05:50:55] <stashbot>	 T311493: Productionize db2153.codfw.wmnet - db2174.codfw.wmnet - https://phabricator.wikimedia.org/T311493
[05:51:30] <wikibugs>	 (PS1) Marostegui: db2166: Enable notifications [puppet] - https://gerrit.wikimedia.org/r/814573 (https://phabricator.wikimedia.org/T311493)
[05:52:37] <wikibugs>	 (CR) Marostegui: [C: +2] db2166: Enable notifications [puppet] - https://gerrit.wikimedia.org/r/814573 (https://phabricator.wikimedia.org/T311493) (owner: Marostegui)
[06:15:46] <icinga-wm>	 PROBLEM - OSPF status on cr2-eqord is CRITICAL: OSPFv2: 2/3 UP : OSPFv3: 2/2 UP : 3 v2 P2P interfaces vs. 2 v3 P2P interfaces https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[06:17:28] <icinga-wm>	 PROBLEM - OSPF status on cr3-ulsfo is CRITICAL: OSPFv2: 2/3 UP : OSPFv3: 2/2 UP : 3 v2 P2P interfaces vs. 2 v3 P2P interfaces https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[06:21:04] <icinga-wm>	 RECOVERY - Check systemd state on an-worker1127 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[06:23:05] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1111 (T313070)', diff saved to https://phabricator.wikimedia.org/P31260 and previous config saved to /var/cache/conftool/dbconfig/20220718-062304-marostegui.json
[06:23:12] <stashbot>	 T313070: Adjust the field type of wb_changes.change_time to fixed binary on wmf wikis - https://phabricator.wikimedia.org/T313070
[06:24:15] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 10:00:00 on db1139.eqiad.wmnet with reason: Maintenance
[06:24:28] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1139.eqiad.wmnet with reason: Maintenance
[06:26:30] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 10:00:00 on db1135.eqiad.wmnet with reason: Maintenance
[06:26:44] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1135.eqiad.wmnet with reason: Maintenance
[06:26:49] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1135 (T312984)', diff saved to https://phabricator.wikimedia.org/P31261 and previous config saved to /var/cache/conftool/dbconfig/20220718-062648-ladsgroup.json
[06:26:53] <stashbot>	 T312984: Adjust the field type of flaggedpages.fp_pending_since to fixed binary on wmf wikis - https://phabricator.wikimedia.org/T312984
[06:27:06] <icinga-wm>	 PROBLEM - Check systemd state on an-worker1127 is CRITICAL: CRITICAL - degraded: The following units failed: systemd-timedated.service,user-runtime-dir@116.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[06:31:56] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1135 (T312984)', diff saved to https://phabricator.wikimedia.org/P31262 and previous config saved to /var/cache/conftool/dbconfig/20220718-063155-ladsgroup.json
[06:32:01] <stashbot>	 T312984: Adjust the field type of flaggedpages.fp_pending_since to fixed binary on wmf wikis - https://phabricator.wikimedia.org/T312984
[06:38:10] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1111', diff saved to https://phabricator.wikimedia.org/P31263 and previous config saved to /var/cache/conftool/dbconfig/20220718-063809-marostegui.json
[06:47:01] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P31264 and previous config saved to /var/cache/conftool/dbconfig/20220718-064700-ladsgroup.json
[06:49:05] <wikibugs>	 (PS6) Ladsgroup: core.pp: Make sync_binlog and trx_commit configurable [puppet] - https://gerrit.wikimedia.org/r/813917 (owner: Marostegui)
[06:49:34] <wikibugs>	 (CR) Ladsgroup: core.pp: Make sync_binlog and trx_commit configurable (1 comment) [puppet] - https://gerrit.wikimedia.org/r/813917 (owner: Marostegui)
[06:53:15] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1111', diff saved to https://phabricator.wikimedia.org/P31265 and previous config saved to /var/cache/conftool/dbconfig/20220718-065315-marostegui.json
[07:00:05] <jouncebot>	 Amir1 and Urbanecm: It is that lovely time of the day again! You are hereby commanded to deploy UTC morning backport window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220718T0700).
[07:00:05] <jouncebot>	 kart_: A patch you scheduled for UTC morning backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[07:00:24] <Amir1>	 you can self-serve I assume?
[07:00:50] * kart_ is here.
[07:00:57] <kart_>	 Amir1: Self deploy :)
[07:01:58] <wikibugs>	 (CR) KartikMistry: [C: +2] Enable Content and Section translation on WPs with NLLB-200 MT support [mediawiki-config] - https://gerrit.wikimedia.org/r/814015 (https://phabricator.wikimedia.org/T309384) (owner: KartikMistry)
[07:02:06] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P31266 and previous config saved to /var/cache/conftool/dbconfig/20220718-070205-ladsgroup.json
[07:02:50] <wikibugs>	 (Merged) jenkins-bot: Enable Content and Section translation on WPs with NLLB-200 MT support [mediawiki-config] - https://gerrit.wikimedia.org/r/814015 (https://phabricator.wikimedia.org/T309384) (owner: KartikMistry)
[07:03:08] <icinga-wm>	 RECOVERY - OSPF status on cr3-ulsfo is OK: OSPFv2: 2/2 UP : OSPFv3: 2/2 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[07:05:02] <icinga-wm>	 RECOVERY - OSPF status on cr2-eqord is OK: OSPFv2: 3/3 UP : OSPFv3: 3/3 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[07:07:48] <kart_>	 Change looks good. Deploying..
[07:07:51] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[07:08:20] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1111 (T313070)', diff saved to https://phabricator.wikimedia.org/P31267 and previous config saved to /var/cache/conftool/dbconfig/20220718-070820-marostegui.json
[07:08:22] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on db1172.eqiad.wmnet with reason: Maintenance
[07:08:24] <stashbot>	 T313070: Adjust the field type of wb_changes.change_time to fixed binary on wmf wikis - https://phabricator.wikimedia.org/T313070
[07:08:35] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1172.eqiad.wmnet with reason: Maintenance
[07:08:40] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depooling db1172 (T313070)', diff saved to https://phabricator.wikimedia.org/P31268 and previous config saved to /var/cache/conftool/dbconfig/20220718-070840-marostegui.json
[07:08:56] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[07:08:57] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[07:09:46] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1172 (T313070)', diff saved to https://phabricator.wikimedia.org/P31269 and previous config saved to /var/cache/conftool/dbconfig/20220718-070946-marostegui.json
[07:09:56] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[07:10:55] <logmsgbot>	 !log kartik@deploy1002 Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:814015|Enable Content and Section translation on WPs with NLLB-200 MT support (T309384)]] (duration: 02m 53s)
[07:11:00] <stashbot>	 T309384: Enable Content and Section translation on wikipedias with new MT support from Flores - https://phabricator.wikimedia.org/T309384
[07:11:13] <kart_>	 I'm done.
[07:17:11] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1135 (T312984)', diff saved to https://phabricator.wikimedia.org/P31270 and previous config saved to /var/cache/conftool/dbconfig/20220718-071711-ladsgroup.json
[07:17:12] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 10:00:00 on db1132.eqiad.wmnet with reason: Maintenance
[07:17:17] <stashbot>	 T312984: Adjust the field type of flaggedpages.fp_pending_since to fixed binary on wmf wikis - https://phabricator.wikimedia.org/T312984
[07:17:26] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1132.eqiad.wmnet with reason: Maintenance
[07:18:44] <kart_>	 Amir1: Help needed. I can see the config change I deployed on mwdebug1001 but not in production. What could be the possible reason(s)?
[07:19:05] <Amir1>	 kart_: forgot to rebase?
[07:19:10] <kart_>	 Amir1: ah. Cache. No worry.
[07:19:18] <Amir1>	 cool
[07:19:32] <kart_>	 No. Works fine now. We rebase first and only then test on mwdebug, right?
[07:19:42] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 10:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
[07:19:56] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
[07:20:02] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[07:21:02] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.reimage for host ganeti2028.codfw.wmnet with OS bullseye
[07:21:07] <wikibugs>	 SRE, Ganeti, Infrastructure-Foundations: Upgrade ganeti/codfw to Bullseye - https://phabricator.wikimedia.org/T311686 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jmm@cumin2002 for host ganeti2028.codfw.wmnet with OS bullseye
[07:22:01] <wikibugs>	 (PS2) Muehlenhoff: rancid: Assign SPDX headers [puppet] - https://gerrit.wikimedia.org/r/809626 (https://phabricator.wikimedia.org/T308013)
[07:22:06] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 10:00:00 on db2103.codfw.wmnet with reason: Maintenance
[07:22:08] <Amir1>	 kart_: yeah
[07:22:19] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db2103.codfw.wmnet with reason: Maintenance
[07:22:21] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 20:00:00 on 13 hosts with reason: Maintenance
[07:22:42] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on 13 hosts with reason: Maintenance
[07:24:31] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[07:24:32] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[07:24:50] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 10:00:00 on db1140.eqiad.wmnet with reason: Maintenance
[07:24:51] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P31271 and previous config saved to /var/cache/conftool/dbconfig/20220718-072451-marostegui.json
[07:25:03] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1140.eqiad.wmnet with reason: Maintenance
[07:26:02] <wikibugs>	 (CR) Muehlenhoff: [C: +2] rancid: Assign SPDX headers [puppet] - https://gerrit.wikimedia.org/r/809626 (https://phabricator.wikimedia.org/T308013) (owner: Muehlenhoff)
[07:26:52] <wikibugs>	 (PS1) KartikMistry: Enable ContentTranslation out of Beta for ay, ilo, kg, ln, nso, and tn Wikipedias [mediawiki-config] - https://gerrit.wikimedia.org/r/814706 (https://phabricator.wikimedia.org/T309384)
[07:27:00] <kart_>	 Amir1: I've one more followup patch, adding to the calendar.
[07:27:06] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 10:00:00 on db1133.eqiad.wmnet with reason: Maintenance
[07:27:19] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1133.eqiad.wmnet with reason: Maintenance
[07:28:31] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[07:29:17] <wikibugs>	 (CR) Muehlenhoff: [C: +2] "Thanks! Merging." [puppet] - https://gerrit.wikimedia.org/r/813595 (https://phabricator.wikimedia.org/T308013) (owner: Zabe)
[07:29:23] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 10:00:00 on db1118.eqiad.wmnet with reason: Maintenance
[07:29:48] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1118.eqiad.wmnet with reason: Maintenance
[07:29:53] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1118 (T312984)', diff saved to https://phabricator.wikimedia.org/P31272 and previous config saved to /var/cache/conftool/dbconfig/20220718-072953-ladsgroup.json
[07:29:58] <stashbot>	 T312984: Adjust the field type of flaggedpages.fp_pending_since to fixed binary on wmf wikis - https://phabricator.wikimedia.org/T312984
[07:30:21] <wikibugs>	 (CR) Muehlenhoff: [C: +2] "Thanks! Merging" [puppet] - https://gerrit.wikimedia.org/r/813596 (https://phabricator.wikimedia.org/T308013) (owner: Zabe)
[07:30:28] <wikibugs>	 (CR) Filippo Giunchedi: [C: +2] prometheus: blackbox_exporter: remove un-managed module files [puppet] - https://gerrit.wikimedia.org/r/813654 (owner: Majavah)
[07:30:58] <godog>	 moritzm: merged your change too
[07:32:22] <wikibugs>	 (CR) KartikMistry: [C: +2] Enable ContentTranslation out of Beta for ay, ilo, kg, ln, nso, and tn Wikipedias [mediawiki-config] - https://gerrit.wikimedia.org/r/814706 (https://phabricator.wikimedia.org/T309384) (owner: KartikMistry)
[07:32:35] <wikibugs>	 (CR) Muehlenhoff: [C: +2] "Thanks! Merging" [puppet] - https://gerrit.wikimedia.org/r/813597 (https://phabricator.wikimedia.org/T308013) (owner: Zabe)
[07:32:54] <moritzm>	 godog: ack, thx
[07:33:09] <wikibugs>	 (Merged) jenkins-bot: Enable ContentTranslation out of Beta for ay, ilo, kg, ln, nso, and tn Wikipedias [mediawiki-config] - https://gerrit.wikimedia.org/r/814706 (https://phabricator.wikimedia.org/T309384) (owner: KartikMistry)
[07:33:26] <wikibugs>	 (CR) Filippo Giunchedi: [C: +1] "LGTM!" [alerts] - https://gerrit.wikimedia.org/r/813274 (owner: David Caro)
[07:33:59] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1118 (T312984)', diff saved to https://phabricator.wikimedia.org/P31273 and previous config saved to /var/cache/conftool/dbconfig/20220718-073359-ladsgroup.json
[07:34:46] <wikibugs>	 (PS1) David Caro: kiwix: create dest dir before rsyncing if it does not exist [puppet] - https://gerrit.wikimedia.org/r/814707
[07:34:50] <wikibugs>	 (CR) Muehlenhoff: [C: +2] "Thanks! Merging" [puppet] - https://gerrit.wikimedia.org/r/813598 (https://phabricator.wikimedia.org/T308013) (owner: Zabe)
[07:35:04] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2028.codfw.wmnet with reason: host reimage
[07:36:59] <wikibugs>	 (CR) Muehlenhoff: [C: +2] "Thanks! Merging" [puppet] - https://gerrit.wikimedia.org/r/813599 (https://phabricator.wikimedia.org/T308013) (owner: Zabe)
[07:37:25] <icinga-wm>	 PROBLEM - etcd request latencies on kubemaster2002 is CRITICAL: instance=10.192.16.48 operation=create https://wikitech.wikimedia.org/wiki/Etcd/Main_cluster https://grafana.wikimedia.org/d/000000435/kubernetes-api?orgId=1&viewPanel=28
[07:38:23] <wikibugs>	 (CR) Muehlenhoff: [C: +2] "Thanks! Merging" [puppet] - https://gerrit.wikimedia.org/r/813600 (https://phabricator.wikimedia.org/T308013) (owner: Zabe)
[07:38:35] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[07:38:41] <wikibugs>	 (CR) Muehlenhoff: [C: +2] "Thanks! Merging" [puppet] - https://gerrit.wikimedia.org/r/813601 (https://phabricator.wikimedia.org/T308013) (owner: Zabe)
[07:38:46] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2028.codfw.wmnet with reason: host reimage
[07:39:56] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P31274 and previous config saved to /var/cache/conftool/dbconfig/20220718-073956-marostegui.json
[07:40:16] <wikibugs>	 (PS2) David Caro: kiwix: create dest dir before rsyncing if it does not exist [puppet] - https://gerrit.wikimedia.org/r/814707
[07:40:23] <wikibugs>	 (PS3) David Caro: kiwix: create dest dir before rsyncing if it does not exist [puppet] - https://gerrit.wikimedia.org/r/814707
[07:40:24] <logmsgbot>	 !log kartik@deploy1002 Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:814706|Enable ContentTranslation out of Beta for ay, ilo, kg, ln, nso, and tn Wikipedias (T309384)]] (duration: 02m 51s)
[07:40:27] <stashbot>	 T309384: Enable Content and Section translation on wikipedias with new MT support from Flores - https://phabricator.wikimedia.org/T309384
[07:40:45] <wikibugs>	 (CR) Muehlenhoff: [C: +2] "Thanks! Merging" [puppet] - https://gerrit.wikimedia.org/r/813602 (https://phabricator.wikimedia.org/T308013) (owner: Zabe)
[07:41:08] <wikibugs>	 (CR) Filippo Giunchedi: [C: +1] profile: make loki data directory configurable [puppet] - https://gerrit.wikimedia.org/r/813715 (https://phabricator.wikimedia.org/T222826) (owner: Cwhite)
[07:41:10] <icinga-wm>	 RECOVERY - etcd request latencies on kubemaster2002 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Etcd/Main_cluster https://grafana.wikimedia.org/d/000000435/kubernetes-api?orgId=1&viewPanel=28
[07:41:16] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[07:41:17] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[07:41:56] <wikibugs>	 (CR) Filippo Giunchedi: [C: +1] "Very nice!" [puppet] - https://gerrit.wikimedia.org/r/813724 (https://phabricator.wikimedia.org/T222826) (owner: Cwhite)
[07:42:12] <wikibugs>	 (CR) Filippo Giunchedi: [C: +1] logstash: enable pipeline-managed index patterns [puppet] - https://gerrit.wikimedia.org/r/799001 (https://phabricator.wikimedia.org/T305175) (owner: Cwhite)
[07:42:36] <kostajh>	 Am I too late for deploying a config patch?
[07:42:52] <wikibugs>	 (CR) Muehlenhoff: "There are also three manifests in modules/profile/manifests/bird, could you please also include them?" [puppet] - https://gerrit.wikimedia.org/r/813603 (https://phabricator.wikimedia.org/T308013) (owner: Zabe)
[07:43:40] <kostajh>	 Amir1 urbanecm ^
[07:44:07] <Amir1>	 kostajh: nope, I think kart_ is done
[07:44:35] * urbanecm waves
[07:45:15] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[07:45:53] <wikibugs>	 (PS1) Kosta Harlan: Structured task: Disable free text for "other" rejection reason [mediawiki-config] - https://gerrit.wikimedia.org/r/814708 (https://phabricator.wikimedia.org/T304099)
[07:45:56] <kostajh>	 cool, will add to calendar and deploy, then
[07:46:20] <kart_>	 kostajh: yes. Go ahead.
[07:46:56] <wikibugs>	 (CR) Filippo Giunchedi: phabricator: switch to prometheus-only network probes/checks (1 comment) [puppet] - https://gerrit.wikimedia.org/r/812846 (https://phabricator.wikimedia.org/T305847) (owner: Filippo Giunchedi)
[07:47:22] <logmsgbot>	 !log jmm@cumin2002 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti2028.codfw.wmnet with OS bullseye
[07:47:25] <wikibugs>	 SRE, Ganeti, Infrastructure-Foundations: Upgrade ganeti/codfw to Bullseye - https://phabricator.wikimedia.org/T311686 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jmm@cumin2002 for host ganeti2028.codfw.wmnet with OS bullseye executed with errors: - ganeti2028 (**FAIL**)   - D...
[07:47:59] <kostajh>	 urbanecm: if you're around, can you glance at https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/814708 ?
[07:48:18] <urbanecm>	 Sure
[07:48:41] <wikibugs>	 (CR) Urbanecm: [C: +1] "lgtm" [mediawiki-config] - https://gerrit.wikimedia.org/r/814708 (https://phabricator.wikimedia.org/T304099) (owner: Kosta Harlan)
[07:48:43] <wikibugs>	 SRE-swift-storage, User-fgiunchedi: Shorten Thanos retention - https://phabricator.wikimedia.org/T311690 (fgiunchedi) Open→Stalled Space is freed now, and we are at ~73% bytes used overall. I'll stall the task and check back in 45/50 days to assess the situation again and act accordingly
[07:48:49] <urbanecm>	 Patch looks good
[07:49:03] <wikibugs>	 (CR) Kosta Harlan: [C: +2] Structured task: Disable free text for "other" rejection reason [mediawiki-config] - https://gerrit.wikimedia.org/r/814708 (https://phabricator.wikimedia.org/T304099) (owner: Kosta Harlan)
[07:49:04] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1118', diff saved to https://phabricator.wikimedia.org/P31275 and previous config saved to /var/cache/conftool/dbconfig/20220718-074904-ladsgroup.json
[07:49:07] <kostajh>	 urbanecm: thanks!
[07:49:16] <urbanecm>	 Np 
[07:49:59] <wikibugs>	 (Merged) jenkins-bot: Structured task: Disable free text for "other" rejection reason [mediawiki-config] - https://gerrit.wikimedia.org/r/814708 (https://phabricator.wikimedia.org/T304099) (owner: Kosta Harlan)
[07:50:56] <wikibugs>	 (CR) David Caro: wmcs: vps: remove_instance: add support for puppet deactivation (1 comment) [cookbooks] (wmcs) - https://gerrit.wikimedia.org/r/801784 (owner: Majavah)
[07:51:22] <wikibugs>	 (PS1) Ladsgroup: admin: Revoke foks' production access temporarily [puppet] - https://gerrit.wikimedia.org/r/814709
[07:54:23] <logmsgbot>	 !log kharlan@deploy1002 Synchronized wmf-config: Config: [[gerrit:814708|Structured task: Disable free text for "other" rejection reason (T304099)]] (duration: 02m 41s)
[07:54:28] <stashbot>	 T304099: Structured tasks: temporary free text for "other" rejection reason - https://phabricator.wikimedia.org/T304099
[07:54:52] <kostajh>	 ok, I'm done
[07:55:01] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1172 (T313070)', diff saved to https://phabricator.wikimedia.org/P31276 and previous config saved to /var/cache/conftool/dbconfig/20220718-075501-marostegui.json
[07:55:03] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on db1167.eqiad.wmnet with reason: Maintenance
[07:55:06] <stashbot>	 T313070: Adjust the field type of wb_changes.change_time to fixed binary on wmf wikis - https://phabricator.wikimedia.org/T313070
[07:55:17] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1167.eqiad.wmnet with reason: Maintenance
[07:55:18] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 4:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
[07:55:23] <wikibugs>	 (CR) ArielGlenn: "This looks ok to me, though I've not tested it. Two questions: do we really want to log every time we find it still running? Would someone" [puppet] - https://gerrit.wikimedia.org/r/814707 (owner: David Caro)
[07:55:23] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
[07:55:26] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[07:55:28] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depooling db1167 (T313070)', diff saved to https://phabricator.wikimedia.org/P31277 and previous config saved to /var/cache/conftool/dbconfig/20220718-075527-marostegui.json
[07:55:31] <wikibugs>	 (CR) David Caro: wmcs: toolforge: add a cookbook to remove a grid node (1 comment) [cookbooks] (wmcs) - https://gerrit.wikimedia.org/r/801785 (https://phabricator.wikimedia.org/T309525) (owner: Majavah)
[07:56:21] <wikibugs>	 (CR) David Caro: "LGTM, let me know if you want to merge as is and I'll +2, thanks!" [cookbooks] (wmcs) - https://gerrit.wikimedia.org/r/801784 (owner: Majavah)
[07:56:32] <wikibugs>	 (CR) Ladsgroup: [V: +2 C: +2] admin: Revoke foks' production access temporarily [puppet] - https://gerrit.wikimedia.org/r/814709 (owner: Ladsgroup)
[07:56:36] <wikibugs>	 (PS2) Ladsgroup: admin: Revoke foks' production access temporarily [puppet] - https://gerrit.wikimedia.org/r/814709
[07:56:39] <wikibugs>	 (CR) Ladsgroup: [V: +2] admin: Revoke foks' production access temporarily [puppet] - https://gerrit.wikimedia.org/r/814709 (owner: Ladsgroup)
[07:57:54] <foks>	 huh?
[07:58:05] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[07:58:06] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[07:59:04] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[08:00:31] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.reboot-single for host ganeti2028.codfw.wmnet
[08:00:35] <logmsgbot>	 !log jmm@cumin2002 END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host ganeti2028.codfw.wmnet
[08:00:49] <wikibugs>	 (CR) Filippo Giunchedi: "Thank you Mark for tackling this!" [alerts] - https://gerrit.wikimedia.org/r/812883 (https://phabricator.wikimedia.org/T312765) (owner: Mark Bergsma)
[08:04:09] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1118', diff saved to https://phabricator.wikimedia.org/P31278 and previous config saved to /var/cache/conftool/dbconfig/20220718-080409-ladsgroup.json
[08:07:36] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1167 (T313070)', diff saved to https://phabricator.wikimedia.org/P31279 and previous config saved to /var/cache/conftool/dbconfig/20220718-080735-marostegui.json
[08:07:40] <stashbot>	 T313070: Adjust the field type of wb_changes.change_time to fixed binary on wmf wikis - https://phabricator.wikimedia.org/T313070
[08:09:11] <icinga-wm>	 PROBLEM - SSH on wtp1036.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[08:09:17] <wikibugs>	 SRE, LDAP-Access-Requests: Grant Access to wmf for Leila - https://phabricator.wikimedia.org/T313134 (Joe) Open→Resolved p:Triage→Medium a:Joe Hi @leila I'm a bit surprised you're not in the wmf ldap group!  I found two developer accounts that are linked to your email @wikimedia.org.n...
[08:10:02] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.reboot-single for host ganeti2028.codfw.wmnet
[08:10:06] <logmsgbot>	 !log jmm@cumin2002 END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host ganeti2028.codfw.wmnet
[08:11:18] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.reboot-single for host ganeti2028.codfw.wmnet
[08:11:22] <logmsgbot>	 !log jmm@cumin2002 END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host ganeti2028.codfw.wmnet
[08:12:17] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.reboot-single for host ganeti2028.codfw.wmnet
[08:12:21] <logmsgbot>	 !log jmm@cumin2002 END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host ganeti2028.codfw.wmnet
[08:13:06] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.reimage for host ganeti2012.codfw.wmnet with OS bullseye
[08:13:11] <wikibugs>	 SRE, Ganeti, Infrastructure-Foundations: Upgrade ganeti/codfw to Bullseye - https://phabricator.wikimedia.org/T311686 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jmm@cumin2002 for host ganeti2012.codfw.wmnet with OS bullseye
[08:15:02] <wikibugs>	 SRE-swift-storage: Uncaught TimeoutError from inactivedc_request caused swift-proxy to wedge itself - https://phabricator.wikimedia.org/T313102 (tstarling) p:Medium→High Increasing priority to high since it's an accident waiting to happen.
[08:19:15] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1118 (T312984)', diff saved to https://phabricator.wikimedia.org/P31280 and previous config saved to /var/cache/conftool/dbconfig/20220718-081914-ladsgroup.json
[08:19:16] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 10:00:00 on db1119.eqiad.wmnet with reason: Maintenance
[08:19:20] <stashbot>	 T312984: Adjust the field type of flaggedpages.fp_pending_since to fixed binary on wmf wikis - https://phabricator.wikimedia.org/T312984
[08:19:30] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1119.eqiad.wmnet with reason: Maintenance
[08:19:35] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1119 (T312984)', diff saved to https://phabricator.wikimedia.org/P31281 and previous config saved to /var/cache/conftool/dbconfig/20220718-081934-ladsgroup.json
[08:22:33] <icinga-wm>	 PROBLEM - Router interfaces on cr1-codfw is CRITICAL: CRITICAL: host 208.80.153.192, interfaces up: 131, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[08:22:41] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P31282 and previous config saved to /var/cache/conftool/dbconfig/20220718-082241-marostegui.json
[08:23:29] <wikibugs>	 (PS2) Zabe: bird: Add SPDX headers to bird profile [puppet] - https://gerrit.wikimedia.org/r/813603 (https://phabricator.wikimedia.org/T308013)
[08:23:42] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1119 (T312984)', diff saved to https://phabricator.wikimedia.org/P31283 and previous config saved to /var/cache/conftool/dbconfig/20220718-082342-ladsgroup.json
[08:24:59] <wikibugs>	 (CR) Zabe: bird: Add SPDX headers to bird profile (1 comment) [puppet] - https://gerrit.wikimedia.org/r/813603 (https://phabricator.wikimedia.org/T308013) (owner: Zabe)
[08:27:57] <wikibugs>	 (PS3) Majavah: dynamicproxy: urlproxy: add a simple rate limit [puppet] - https://gerrit.wikimedia.org/r/814193 (https://phabricator.wikimedia.org/T313131)
[08:28:31] <wikibugs>	 (CR) Majavah: dynamicproxy: urlproxy: add a simple rate limit (1 comment) [puppet] - https://gerrit.wikimedia.org/r/814193 (https://phabricator.wikimedia.org/T313131) (owner: Majavah)
[08:29:57] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2012.codfw.wmnet with reason: host reimage
[08:30:19] <wikibugs>	 (CR) Majavah: wmcs: vps: remove_instance: add support for puppet deactivation (2 comments) [cookbooks] (wmcs) - https://gerrit.wikimedia.org/r/801784 (owner: Majavah)
[08:33:18] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2012.codfw.wmnet with reason: host reimage
[08:37:46] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P31284 and previous config saved to /var/cache/conftool/dbconfig/20220718-083746-marostegui.json
[08:38:47] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P31285 and previous config saved to /var/cache/conftool/dbconfig/20220718-083847-ladsgroup.json
[08:41:27] <wikibugs>	 (CR) Muehlenhoff: [C: +2] "Thanks! Merging" [puppet] - https://gerrit.wikimedia.org/r/813603 (https://phabricator.wikimedia.org/T308013) (owner: Zabe)
[08:42:58] <logmsgbot>	 !log jmm@cumin2002 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti2012.codfw.wmnet with OS bullseye
[08:43:03] <wikibugs>	 SRE, Ganeti, Infrastructure-Foundations: Upgrade ganeti/codfw to Bullseye - https://phabricator.wikimedia.org/T311686 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jmm@cumin2002 for host ganeti2012.codfw.wmnet with OS bullseye executed with errors: - ganeti2012 (**FAIL**)   - D...
[08:45:38] <wikibugs>	 (CR) JMeybohm: [C: +2] Actually run tests on type: php scaffold [deployment-charts] - https://gerrit.wikimedia.org/r/813843 (owner: JMeybohm)
[08:48:27] <wikibugs>	 (PS1) Elukey: ml-services: update Docker image for editquality goodfaith [deployment-charts] - https://gerrit.wikimedia.org/r/814718 (https://phabricator.wikimedia.org/T301878)
[08:48:43] <wikibugs>	 (CR) David Caro: [C: +2] wmcs: vps: remove_instance: add support for puppet deactivation (1 comment) [cookbooks] (wmcs) - https://gerrit.wikimedia.org/r/801784 (owner: Majavah)
[08:48:51] <wikibugs>	 (CR) David Caro: [C: +2] wmcs: toolforge: add a cookbook to remove a grid node [cookbooks] (wmcs) - https://gerrit.wikimedia.org/r/801785 (https://phabricator.wikimedia.org/T309525) (owner: Majavah)
[08:49:24] <wikibugs>	 (Merged) jenkins-bot: Actually run tests on type: php scaffold [deployment-charts] - https://gerrit.wikimedia.org/r/813843 (owner: JMeybohm)
[08:52:51] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1167 (T313070)', diff saved to https://phabricator.wikimedia.org/P31286 and previous config saved to /var/cache/conftool/dbconfig/20220718-085251-marostegui.json
[08:52:53] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on db1109.eqiad.wmnet with reason: Maintenance
[08:52:57] <stashbot>	 T313070: Adjust the field type of wb_changes.change_time to fixed binary on wmf wikis - https://phabricator.wikimedia.org/T313070
[08:53:07] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1109.eqiad.wmnet with reason: Maintenance
[08:53:12] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depooling db1109 (T313070)', diff saved to https://phabricator.wikimedia.org/P31287 and previous config saved to /var/cache/conftool/dbconfig/20220718-085312-marostegui.json
[08:53:53] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P31288 and previous config saved to /var/cache/conftool/dbconfig/20220718-085352-ladsgroup.json
[08:55:18] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1109 (T313070)', diff saved to https://phabricator.wikimedia.org/P31289 and previous config saved to /var/cache/conftool/dbconfig/20220718-085518-marostegui.json
[08:55:38] <wikibugs>	 (Merged) jenkins-bot: wmcs: vps: remove_instance: add support for puppet deactivation [cookbooks] (wmcs) - https://gerrit.wikimedia.org/r/801784 (owner: Majavah)
[08:55:40] <wikibugs>	 (Merged) jenkins-bot: wmcs: toolforge: add a cookbook to remove a grid node [cookbooks] (wmcs) - https://gerrit.wikimedia.org/r/801785 (https://phabricator.wikimedia.org/T309525) (owner: Majavah)
[08:56:11] <wikibugs>	 (CR) Elukey: [C: +2] ml-services: update Docker image for editquality goodfaith [deployment-charts] - https://gerrit.wikimedia.org/r/814718 (https://phabricator.wikimedia.org/T301878) (owner: Elukey)
[08:56:19] <wikibugs>	 (PS1) Filippo Giunchedi: pontoon: retry apt in provision.sh [puppet] - https://gerrit.wikimedia.org/r/814719
[08:56:21] <wikibugs>	 (PS1) Filippo Giunchedi: pontoon: validate host fqdn during bootstrap [puppet] - https://gerrit.wikimedia.org/r/814720
[08:56:23] <wikibugs>	 (PS1) Filippo Giunchedi: pontoon: support to set/override domain during provisioning [puppet] - https://gerrit.wikimedia.org/r/814721
[08:58:30] <logmsgbot>	 !log elukey@deploy1002 helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
[08:59:59] <wikibugs>	 (CR) David Caro: [C: +2] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/814193 (https://phabricator.wikimedia.org/T313131) (owner: Majavah)
[09:03:07] <icinga-wm>	 RECOVERY - SSH on wtp1044.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[09:05:05] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.reboot-single for host ganeti2028.codfw.wmnet
[09:05:10] <logmsgbot>	 !log jmm@cumin2002 END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host ganeti2028.codfw.wmnet
[09:08:59] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1119 (T312984)', diff saved to https://phabricator.wikimedia.org/P31290 and previous config saved to /var/cache/conftool/dbconfig/20220718-090857-ladsgroup.json
[09:09:00] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 10:00:00 on db1099.eqiad.wmnet with reason: Maintenance
[09:09:07] <stashbot>	 T312984: Adjust the field type of flaggedpages.fp_pending_since to fixed binary on wmf wikis - https://phabricator.wikimedia.org/T312984
[09:09:14] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1099.eqiad.wmnet with reason: Maintenance
[09:09:19] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1099:3311 (T312984)', diff saved to https://phabricator.wikimedia.org/P31291 and previous config saved to /var/cache/conftool/dbconfig/20220718-090919-ladsgroup.json
[09:10:23] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1109', diff saved to https://phabricator.wikimedia.org/P31292 and previous config saved to /var/cache/conftool/dbconfig/20220718-091023-marostegui.json
[09:10:49] <wikibugs>	 (PS1) Majavah: P:toolforge::proxy: raise rate limit + add hiera config [puppet] - https://gerrit.wikimedia.org/r/814722 (https://phabricator.wikimedia.org/T313131)
[09:12:21] <wikibugs>	 (CR) David Caro: [C: +2] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/814722 (https://phabricator.wikimedia.org/T313131) (owner: Majavah)
[09:13:31] * urbanecm staging at mwdebug1001
[09:13:41] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T312984)', diff saved to https://phabricator.wikimedia.org/P31293 and previous config saved to /var/cache/conftool/dbconfig/20220718-091340-ladsgroup.json
[09:14:09] <icinga-wm>	 RECOVERY - Router interfaces on cr1-codfw is OK: OK: host 208.80.153.192, interfaces up: 132, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[09:15:20] * urbanecm done
[09:17:56] <wikibugs>	 (CR) David Caro: kiwix: create dest dir before rsyncing if it does not exist (1 comment) [puppet] - https://gerrit.wikimedia.org/r/814707 (owner: David Caro)
[09:18:51] <wikibugs>	 (PS4) David Caro: kiwix: create dest dir before rsyncing if it does not exist [puppet] - https://gerrit.wikimedia.org/r/814707
[09:19:57] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1111 T311106', diff saved to https://phabricator.wikimedia.org/P31295 and previous config saved to /var/cache/conftool/dbconfig/20220718-091957-root.json
[09:20:01] <stashbot>	 T311106: Investigate mariadb 10.6 performance regression during spikes/high load - https://phabricator.wikimedia.org/T311106
[09:21:14] <wikibugs>	 (CR) ArielGlenn: [C: +1] "Giving my thumbs up (but note I have not tested it)." [puppet] - https://gerrit.wikimedia.org/r/814707 (owner: David Caro)
[09:21:23] <wikibugs>	 (CR) JMeybohm: [C: +1] mtail: fix regexes due to changes in apache configuration [puppet] - https://gerrit.wikimedia.org/r/811934 (https://phabricator.wikimedia.org/T312634) (owner: Giuseppe Lavagetto)
[09:25:29] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1109', diff saved to https://phabricator.wikimedia.org/P31297 and previous config saved to /var/cache/conftool/dbconfig/20220718-092528-marostegui.json
[09:27:45] <wikibugs>	 (CR) Elukey: [C: +1] "The config looks good, maybe let's ask Moritz if the choice of the cumin aliases is ok or not." [puppet] - https://gerrit.wikimedia.org/r/813841 (https://phabricator.wikimedia.org/T310170) (owner: Btullis)
[09:28:41] <wikibugs>	 (CR) Muehlenhoff: [C: +1] "Looks good" [puppet] - https://gerrit.wikimedia.org/r/813841 (https://phabricator.wikimedia.org/T310170) (owner: Btullis)
[09:28:46] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P31298 and previous config saved to /var/cache/conftool/dbconfig/20220718-092845-ladsgroup.json
[09:33:47] <icinga-wm>	 PROBLEM - grafana-next.wikimedia.org on grafana2001 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 1687 bytes in 0.149 second response time https://wikitech.wikimedia.org/wiki/Grafana.wikimedia.org
[09:34:43] <wikibugs>	 Puppet, Infrastructure-Foundations, Patch-For-Review, User-jbond: puppetdb seems to be slow on host reimage - https://phabricator.wikimedia.org/T263578 (MoritzMuehlenhoff) This seems to happen again, today's reimages of ganeti2012 and ganeti2028 failed since the host key change didn't get properl...
[09:35:10] <wikibugs>	 (PS1) Hashar: Json schema from Gerrit Java event classes [software/gerrit/plugins/events-wikimedia] - https://gerrit.wikimedia.org/r/814725 (https://phabricator.wikimedia.org/T304947)
[09:36:04] <wikibugs>	 (PS1) David Caro: rabbit.drain_queue: Don't fail if the queue has no messages [puppet] - https://gerrit.wikimedia.org/r/814726
[09:36:10] <wikibugs>	 (PS1) Majavah: dynamicproxy: urlproxy: enable bursting in rate limits [puppet] - https://gerrit.wikimedia.org/r/814727 (https://phabricator.wikimedia.org/T313131)
[09:36:13] <icinga-wm>	 RECOVERY - grafana-next.wikimedia.org on grafana2001 is OK: HTTP OK: HTTP/1.1 200 OK - 115218 bytes in 0.311 second response time https://wikitech.wikimedia.org/wiki/Grafana.wikimedia.org
[09:37:52] <wikibugs>	 SRE, Infrastructure-Foundations, Traffic-Icebox, netops, User-jbond: fetch_external_clouds_vendors_nets.py fails to update DigitalOcean network ranges - https://phabricator.wikimedia.org/T313206 (Vgutierrez)
[09:38:12] <wikibugs>	 (CR) David Caro: kiwix: create dest dir before rsyncing if it does not exist (1 comment) [puppet] - https://gerrit.wikimedia.org/r/814707 (owner: David Caro)
[09:38:38] <wikibugs>	 SRE, Infrastructure-Foundations, Traffic, netops, User-jbond: fetch_external_clouds_vendors_nets.py fails to update DigitalOcean network ranges - https://phabricator.wikimedia.org/T313206 (Vgutierrez) p:Triage→Medium
[09:39:40] <wikibugs>	 (CR) David Caro: [C: +2] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/814727 (https://phabricator.wikimedia.org/T313131) (owner: Majavah)
[09:40:34] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1109 (T313070)', diff saved to https://phabricator.wikimedia.org/P31299 and previous config saved to /var/cache/conftool/dbconfig/20220718-094033-marostegui.json
[09:40:36] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on db1099.eqiad.wmnet with reason: Maintenance
[09:40:38] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1099.eqiad.wmnet with reason: Maintenance
[09:40:39] <stashbot>	 T313070: Adjust the field type of wb_changes.change_time to fixed binary on wmf wikis - https://phabricator.wikimedia.org/T313070
[09:40:44] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depooling db1099:3318 (T313070)', diff saved to https://phabricator.wikimedia.org/P31300 and previous config saved to /var/cache/conftool/dbconfig/20220718-094043-marostegui.json
[09:41:07] <icinga-wm>	 ACKNOWLEDGEMENT - Check systemd state on puppetmaster1001 is CRITICAL: CRITICAL - degraded: The following units failed: dump_cloud_ip_ranges.service Valentin Gutierrez T313206 https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[09:41:07] <icinga-wm>	 ACKNOWLEDGEMENT - Check systemd state on puppetmaster2001 is CRITICAL: CRITICAL - degraded: The following units failed: dump_cloud_ip_ranges.service Valentin Gutierrez T313206 https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[09:41:50] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1099:3318 (T313070)', diff saved to https://phabricator.wikimedia.org/P31301 and previous config saved to /var/cache/conftool/dbconfig/20220718-094150-marostegui.json
[09:42:13] <wikibugs>	 (CR) Giuseppe Lavagetto: [C: +2] mtail: fix regexes due to changes in apache configuration [puppet] - https://gerrit.wikimedia.org/r/811934 (https://phabricator.wikimedia.org/T312634) (owner: Giuseppe Lavagetto)
[09:42:18] <wikibugs>	 (CR) ArielGlenn: [C: +1] kiwix: create dest dir before rsyncing if it does not exist (1 comment) [puppet] - https://gerrit.wikimedia.org/r/814707 (owner: David Caro)
[09:43:51] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P31302 and previous config saved to /var/cache/conftool/dbconfig/20220718-094351-ladsgroup.json
[09:45:36] <wikibugs>	 (CR) David Caro: kiwix: create dest dir before rsyncing if it does not exist (1 comment) [puppet] - https://gerrit.wikimedia.org/r/814707 (owner: David Caro)
[09:45:47] <wikibugs>	 (CR) Hashar: "recheck" [software/gerrit/plugins/events-wikimedia] - https://gerrit.wikimedia.org/r/814725 (https://phabricator.wikimedia.org/T304947) (owner: Hashar)
[09:46:26] <wikibugs>	 SRE, Infrastructure-Foundations, netbox, netops, Patch-For-Review: Represent sub-interface and bridge device assocations in Netbox - https://phabricator.wikimedia.org/T296832 (Volans) >>! In T296832#8065318, @cmooney wrote: > @volans could you point me at any existing custom_facts and the cod...
[09:52:56] <wikibugs>	 (CR) Volans: [C: +1] "Code looks nicer and simpler and I like the description diff that mentions the actual name. But I'll leave it to Cathal and you to decide " [software/homer/deploy] - https://gerrit.wikimedia.org/r/805898 (https://phabricator.wikimedia.org/T310591) (owner: Ayounsi)
[09:53:17] <icinga-wm>	 PROBLEM - Router interfaces on cr1-codfw is CRITICAL: CRITICAL: host 208.80.153.192, interfaces up: 131, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[09:53:51] <wikibugs>	 (PS1) Ladsgroup: Add change_templatelinks_pk.py [software/schema-changes] - https://gerrit.wikimedia.org/r/814729 (https://phabricator.wikimedia.org/T312863)
[09:56:56] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1099:3318', diff saved to https://phabricator.wikimedia.org/P31303 and previous config saved to /var/cache/conftool/dbconfig/20220718-095656-marostegui.json
[09:57:24] <wikibugs>	 (CR) Marostegui: [C: +1] Add change_templatelinks_pk.py [software/schema-changes] - https://gerrit.wikimedia.org/r/814729 (https://phabricator.wikimedia.org/T312863) (owner: Ladsgroup)
[09:57:40] <wikibugs>	 (CR) Ladsgroup: [C: +2] Add change_templatelinks_pk.py [software/schema-changes] - https://gerrit.wikimedia.org/r/814729 (https://phabricator.wikimedia.org/T312863) (owner: Ladsgroup)
[09:58:57] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T312984)', diff saved to https://phabricator.wikimedia.org/P31304 and previous config saved to /var/cache/conftool/dbconfig/20220718-095856-ladsgroup.json
[09:58:58] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 10:00:00 on db1105.eqiad.wmnet with reason: Maintenance
[09:59:01] <stashbot>	 T312984: Adjust the field type of flaggedpages.fp_pending_since to fixed binary on wmf wikis - https://phabricator.wikimedia.org/T312984
[09:59:12] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1105.eqiad.wmnet with reason: Maintenance
[09:59:17] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1105:3311 (T312984)', diff saved to https://phabricator.wikimedia.org/P31305 and previous config saved to /var/cache/conftool/dbconfig/20220718-095916-ladsgroup.json
[10:00:14] <wikibugs>	 (Merged) jenkins-bot: Add change_templatelinks_pk.py [software/schema-changes] - https://gerrit.wikimedia.org/r/814729 (https://phabricator.wikimedia.org/T312863) (owner: Ladsgroup)
[10:00:19] <icinga-wm>	 RECOVERY - Router interfaces on cr1-codfw is OK: OK: host 208.80.153.192, interfaces up: 132, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[10:03:29] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 (T312984)', diff saved to https://phabricator.wikimedia.org/P31306 and previous config saved to /var/cache/conftool/dbconfig/20220718-100329-ladsgroup.json
[10:06:04] <wikibugs>	 (CR) Ayounsi: [V: +2 C: +2] wmf-netbox: simplify interface description for circuits [software/homer/deploy] - https://gerrit.wikimedia.org/r/805898 (https://phabricator.wikimedia.org/T310591) (owner: Ayounsi)
[10:07:08] <wikibugs>	 (PS1) Majavah: P:prometheus:openstack_exporter: disable slow metrics [puppet] - https://gerrit.wikimedia.org/r/814738
[10:09:06] <wikibugs>	 (CR) Majavah: wmcs: Add novafullstack alerts (1 comment) [alerts] - https://gerrit.wikimedia.org/r/813274 (owner: David Caro)
[10:11:55] <icinga-wm>	 RECOVERY - SSH on wtp1036.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[10:12:01] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1099:3318', diff saved to https://phabricator.wikimedia.org/P31307 and previous config saved to /var/cache/conftool/dbconfig/20220718-101201-marostegui.json
[10:16:49] <icinga-wm>	 PROBLEM - grafana-next.wikimedia.org on grafana2001 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 1687 bytes in 0.148 second response time https://wikitech.wikimedia.org/wiki/Grafana.wikimedia.org
[10:17:20] <vgutierrez>	 uh :)
[10:18:34] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P31308 and previous config saved to /var/cache/conftool/dbconfig/20220718-101834-ladsgroup.json
[10:19:11] <wikibugs>	 (CR) Volans: [C: +1] "LGTM, minor nit inline." [software/netbox-extras] - https://gerrit.wikimedia.org/r/812288 (https://phabricator.wikimedia.org/T296832) (owner: Ayounsi)
[10:20:15] <wikibugs>	 (CR) David Caro: [C: +2] wmcs: Add novafullstack alerts (1 comment) [alerts] - https://gerrit.wikimedia.org/r/813274 (owner: David Caro)
[10:21:47] <icinga-wm>	 RECOVERY - grafana-next.wikimedia.org on grafana2001 is OK: HTTP OK: HTTP/1.1 200 OK - 115218 bytes in 0.237 second response time https://wikitech.wikimedia.org/wiki/Grafana.wikimedia.org
[10:22:54] <wikibugs>	 (Merged) jenkins-bot: wmcs: Add novafullstack alerts [alerts] - https://gerrit.wikimedia.org/r/813274 (owner: David Caro)
[10:23:44] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2123.codfw.wmnet with reason: Maintenance
[10:23:58] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2123.codfw.wmnet with reason: Maintenance
[10:23:59] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on 8 hosts with reason: Maintenance
[10:24:18] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on 8 hosts with reason: Maintenance
[10:25:23] <wikibugs>	 (CR) David Caro: [C: +2] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/814738 (owner: Majavah)
[10:26:09] <Amir1>	 !log dbmaint on s5@codfw (T312863)
[10:26:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:26:13] <stashbot>	 T312863: Schema change to change primary key of templatelinks - https://phabricator.wikimedia.org/T312863
[10:26:24] <Amir1>	 !log dbmaint on s5@eqiad (T312863)
[10:26:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:26:41] <Amir1>	 I forgot cebwiki is on s5, 200GB table is being altered now
[10:27:06] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1099:3318 (T313070)', diff saved to https://phabricator.wikimedia.org/P31310 and previous config saved to /var/cache/conftool/dbconfig/20220718-102706-marostegui.json
[10:27:08] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on db1114.eqiad.wmnet with reason: Maintenance
[10:27:14] <stashbot>	 T313070: Adjust the field type of wb_changes.change_time to fixed binary on wmf wikis - https://phabricator.wikimedia.org/T313070
[10:27:22] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1114.eqiad.wmnet with reason: Maintenance
[10:27:27] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depooling db1114 (T313070)', diff saved to https://phabricator.wikimedia.org/P31311 and previous config saved to /var/cache/conftool/dbconfig/20220718-102726-marostegui.json
[10:28:33] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1114 (T313070)', diff saved to https://phabricator.wikimedia.org/P31312 and previous config saved to /var/cache/conftool/dbconfig/20220718-102832-marostegui.json
[10:29:11] <icinga-wm>	 PROBLEM - grafana-next.wikimedia.org on grafana2001 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 1687 bytes in 0.150 second response time https://wikitech.wikimedia.org/wiki/Grafana.wikimedia.org
[10:29:20] <wikibugs>	 (PS3) Ayounsi: Add parent support for servers interfaces creation [software/netbox-extras] - https://gerrit.wikimedia.org/r/812288 (https://phabricator.wikimedia.org/T296832)
[10:29:34] <wikibugs>	 (CR) Ayounsi: "Thanks!" [software/netbox-extras] - https://gerrit.wikimedia.org/r/812288 (https://phabricator.wikimedia.org/T296832) (owner: Ayounsi)
[10:30:58] <wikibugs>	 (CR) Volans: "replies inline" [cookbooks] - https://gerrit.wikimedia.org/r/763215 (owner: Jbond)
[10:31:41] <icinga-wm>	 RECOVERY - grafana-next.wikimedia.org on grafana2001 is OK: HTTP OK: HTTP/1.1 200 OK - 115218 bytes in 0.241 second response time https://wikitech.wikimedia.org/wiki/Grafana.wikimedia.org
[10:33:39] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P31313 and previous config saved to /var/cache/conftool/dbconfig/20220718-103339-ladsgroup.json
[10:33:42] <wikibugs>	 (PS1) Ayounsi: Remove test_juniper_inventory_descs [software/netbox-extras] - https://gerrit.wikimedia.org/r/814742 (https://phabricator.wikimedia.org/T305126)
[10:34:29] <wikibugs>	 (CR) CI reject: [V: -1] Remove test_juniper_inventory_descs [software/netbox-extras] - https://gerrit.wikimedia.org/r/814742 (https://phabricator.wikimedia.org/T305126) (owner: Ayounsi)
[10:35:15] <wikibugs>	 (CR) Ayounsi: [C: +2] Add parent support for servers interfaces creation [software/netbox-extras] - https://gerrit.wikimedia.org/r/812288 (https://phabricator.wikimedia.org/T296832) (owner: Ayounsi)
[10:36:19] <wikibugs>	 (Merged) jenkins-bot: Add parent support for servers interfaces creation [software/netbox-extras] - https://gerrit.wikimedia.org/r/812288 (https://phabricator.wikimedia.org/T296832) (owner: Ayounsi)
[10:43:26] <wikibugs>	 (PS2) Muehlenhoff: calico: Assign SPDX headers [puppet] - https://gerrit.wikimedia.org/r/809624 (https://phabricator.wikimedia.org/T308013)
[10:43:38] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1114', diff saved to https://phabricator.wikimedia.org/P31314 and previous config saved to /var/cache/conftool/dbconfig/20220718-104337-marostegui.json
[10:46:37] <icinga-wm>	 PROBLEM - grafana-next.wikimedia.org on grafana2001 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 1687 bytes in 0.149 second response time https://wikitech.wikimedia.org/wiki/Grafana.wikimedia.org
[10:46:52] <wikibugs>	 (CR) Muehlenhoff: [C: +2] calico: Assign SPDX headers [puppet] - https://gerrit.wikimedia.org/r/809624 (https://phabricator.wikimedia.org/T308013) (owner: Muehlenhoff)
[10:48:40] <jbond>	 !log disable puppet fleet wide to resync db
[10:48:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:48:44] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 (T312984)', diff saved to https://phabricator.wikimedia.org/P31315 and previous config saved to /var/cache/conftool/dbconfig/20220718-104844-ladsgroup.json
[10:48:46] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 10:00:00 on db1106.eqiad.wmnet with reason: Maintenance
[10:48:49] <stashbot>	 T312984: Adjust the field type of flaggedpages.fp_pending_since to fixed binary on wmf wikis - https://phabricator.wikimedia.org/T312984
[10:49:00] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1106.eqiad.wmnet with reason: Maintenance
[10:49:01] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 20:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
[10:49:07] <icinga-wm>	 RECOVERY - grafana-next.wikimedia.org on grafana2001 is OK: HTTP OK: HTTP/1.1 200 OK - 115218 bytes in 0.239 second response time https://wikitech.wikimedia.org/wiki/Grafana.wikimedia.org
[10:49:17] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
[10:49:22] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1106 (T312984)', diff saved to https://phabricator.wikimedia.org/P31316 and previous config saved to /var/cache/conftool/dbconfig/20220718-104921-ladsgroup.json
[10:51:03] <wikibugs>	 (PS2) Ayounsi: Remove test_juniper_inventory_descs [software/netbox-extras] - https://gerrit.wikimedia.org/r/814742 (https://phabricator.wikimedia.org/T305126)
[10:51:06] <wikibugs>	 (CR) Muehlenhoff: [C: +2] build_envoy_deb.sh: Remove support for stretch [puppet] - https://gerrit.wikimedia.org/r/812294 (owner: Muehlenhoff)
[10:52:55] <wikibugs>	 (CR) Muehlenhoff: [C: +2] ganeti: Add SPDX headers [puppet] - https://gerrit.wikimedia.org/r/809616 (https://phabricator.wikimedia.org/T308013) (owner: Muehlenhoff)
[10:54:12] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1106 (T312984)', diff saved to https://phabricator.wikimedia.org/P31317 and previous config saved to /var/cache/conftool/dbconfig/20220718-105411-ladsgroup.json
[10:54:16] <stashbot>	 T312984: Adjust the field type of flaggedpages.fp_pending_since to fixed binary on wmf wikis - https://phabricator.wikimedia.org/T312984
[10:54:22] <wikibugs>	 (CR) Ayounsi: [C: +2] "Self merging to clear the Netbox report. Feel free to do a post merge review." [software/netbox-extras] - https://gerrit.wikimedia.org/r/814742 (https://phabricator.wikimedia.org/T305126) (owner: Ayounsi)
[10:55:08] <wikibugs>	 (Merged) jenkins-bot: Remove test_juniper_inventory_descs [software/netbox-extras] - https://gerrit.wikimedia.org/r/814742 (https://phabricator.wikimedia.org/T305126) (owner: Ayounsi)
[10:55:17] <icinga-wm>	 RECOVERY - puppet last run on an-worker1127 is OK: OK: Puppet is currently disabled (re-sync postgres), not alerting. Last run 13 hours ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[10:56:06] <wikibugs>	 Puppet, Infrastructure-Foundations, Patch-For-Review, User-jbond: puppetdb seems to be slow on host reimage - https://phabricator.wikimedia.org/T263578 (jbond) >>! In T263578#8083691, @MoritzMuehlenhoff wrote: > This seems to happen again, today's reimages of ganeti2012 and ganeti2028 failed sinc...
[10:56:35] <icinga-wm>	 PROBLEM - grafana-next.wikimedia.org on grafana2001 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 1688 bytes in 0.170 second response time https://wikitech.wikimedia.org/wiki/Grafana.wikimedia.org
[10:57:14] <wikibugs>	 (CR) Volans: [C: +1] "post-merge +1" [software/netbox-extras] - https://gerrit.wikimedia.org/r/814742 (https://phabricator.wikimedia.org/T305126) (owner: Ayounsi)
[10:58:43] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1114', diff saved to https://phabricator.wikimedia.org/P31318 and previous config saved to /var/cache/conftool/dbconfig/20220718-105843-marostegui.json
[11:00:34] <wikibugs>	 Puppet, Infrastructure-Foundations, Patch-For-Review, User-jbond: puppetdb seems to be slow on host reimage - https://phabricator.wikimedia.org/T263578 (Volans) For the record, the last state change of the Icinga alert that is alerting since then is `Last State Change:    2022-07-12 16:11:29`, re...
[11:01:35] <icinga-wm>	 RECOVERY - grafana-next.wikimedia.org on grafana2001 is OK: HTTP OK: HTTP/1.1 200 OK - 115218 bytes in 0.241 second response time https://wikitech.wikimedia.org/wiki/Grafana.wikimedia.org
[11:03:42] <wikibugs>	 (CR) Volans: [C: +1] "Looks ok to me too." [software/homer/deploy] - https://gerrit.wikimedia.org/r/813589 (https://phabricator.wikimedia.org/T304710) (owner: Ayounsi)
[11:08:28] <wikibugs>	 (PS2) Ayounsi: Interface description: handle patch panels properly [software/homer/deploy] - https://gerrit.wikimedia.org/r/813589 (https://phabricator.wikimedia.org/T304710)
[11:09:17] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P31319 and previous config saved to /var/cache/conftool/dbconfig/20220718-110916-ladsgroup.json
[11:10:01] <wikibugs>	 (PS3) Ayounsi: Interface description: handle patch panels properly [software/homer/deploy] - https://gerrit.wikimedia.org/r/813589 (https://phabricator.wikimedia.org/T304710)
[11:10:30] <wikibugs>	 (CR) Ayounsi: [C: +2] "Thanks!" [software/homer/deploy] - https://gerrit.wikimedia.org/r/813589 (https://phabricator.wikimedia.org/T304710) (owner: Ayounsi)
[11:10:47] <wikibugs>	 (CR) Ayounsi: [V: +2 C: +2] Interface description: handle patch panels properly [software/homer/deploy] - https://gerrit.wikimedia.org/r/813589 (https://phabricator.wikimedia.org/T304710) (owner: Ayounsi)
[11:11:47] <icinga-wm>	 PROBLEM - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is CRITICAL: CRITICAL - failed 99 probes of 675 (alerts on 90) - https://atlas.ripe.net/measurements/1790947/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[11:11:59] <icinga-wm>	 PROBLEM - IPv6 ping to ulsfo on ripe-atlas-ulsfo IPv6 is CRITICAL: CRITICAL - failed 127 probes of 675 (alerts on 90) - https://atlas.ripe.net/measurements/1791309/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[11:12:37] <wikibugs>	 (CR) Volans: "Unit tests needs adapting to cover the new code." [software/homer] - https://gerrit.wikimedia.org/r/813604 (https://phabricator.wikimedia.org/T304710) (owner: Ayounsi)
[11:12:49] <icinga-wm>	 PROBLEM - IPv6 ping to codfw on ripe-atlas-codfw IPv6 is CRITICAL: CRITICAL - failed 116 probes of 684 (alerts on 90) - https://atlas.ripe.net/measurements/32390541/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[11:13:13] <wikibugs>	 (PS1) Marostegui: db2085: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/814752 (https://phabricator.wikimedia.org/T311493)
[11:13:50] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1114 (T313070)', diff saved to https://phabricator.wikimedia.org/P31322 and previous config saved to /var/cache/conftool/dbconfig/20220718-111348-marostegui.json
[11:13:51] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on db1101.eqiad.wmnet with reason: Maintenance
[11:13:55] <stashbot>	 T313070: Adjust the field type of wb_changes.change_time to fixed binary on wmf wikis - https://phabricator.wikimedia.org/T313070
[11:14:05] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1101.eqiad.wmnet with reason: Maintenance
[11:14:10] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depooling db1101:3318 (T313070)', diff saved to https://phabricator.wikimedia.org/P31323 and previous config saved to /var/cache/conftool/dbconfig/20220718-111409-marostegui.json
[11:14:48] <wikibugs>	 (CR) Marostegui: [C: +2] db2085: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/814752 (https://phabricator.wikimedia.org/T311493) (owner: Marostegui)
[11:15:16] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1101:3318 (T313070)', diff saved to https://phabricator.wikimedia.org/P31324 and previous config saved to /var/cache/conftool/dbconfig/20220718-111515-marostegui.json
[11:15:19] <icinga-wm>	 PROBLEM - IPv4 ping to codfw on ripe-atlas-codfw is CRITICAL: CRITICAL - failed 36 probes of 768 (alerts on 35) - https://atlas.ripe.net/measurements/32390538/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[11:16:13] <icinga-wm>	 RECOVERY - Postgres Replication Lag on puppetdb2002 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB puppetdb (host:localhost) 0 and 0 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[11:16:27] <icinga-wm>	 PROBLEM - uWSGI puppetboard -http via nrpe- on puppetboard2002 is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 BAD GATEWAY - 275 bytes in 0.053 second response time https://wikitech.wikimedia.org/wiki/Services/Monitoring/puppetboard
[11:18:08] <wikibugs>	 (PS1) Marostegui: install_server: Do not reimage db216[2-7] [puppet] - https://gerrit.wikimedia.org/r/814760 (https://phabricator.wikimedia.org/T311493)
[11:18:19] <icinga-wm>	 RECOVERY - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is OK: OK - failed 89 probes of 675 (alerts on 90) - https://atlas.ripe.net/measurements/1790947/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[11:19:03] <icinga-wm>	 RECOVERY - uWSGI puppetboard -http via nrpe- on puppetboard2002 is OK: HTTP OK: HTTP/1.1 200 OK - 67741 bytes in 3.322 second response time https://wikitech.wikimedia.org/wiki/Services/Monitoring/puppetboard
[11:19:06] <wikibugs>	 (CR) Marostegui: [C: +2] install_server: Do not reimage db216[2-7] [puppet] - https://gerrit.wikimedia.org/r/814760 (https://phabricator.wikimedia.org/T311493) (owner: Marostegui)
[11:19:23] <icinga-wm>	 PROBLEM - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is CRITICAL: CRITICAL - failed 239 probes of 682 (alerts on 90) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[11:21:33] <icinga-wm>	 PROBLEM - grafana-next.wikimedia.org on grafana2001 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 1687 bytes in 0.143 second response time https://wikitech.wikimedia.org/wiki/Grafana.wikimedia.org
[11:21:47] <icinga-wm>	 RECOVERY - IPv4 ping to codfw on ripe-atlas-codfw is OK: OK - failed 15 probes of 768 (alerts on 35) - https://atlas.ripe.net/measurements/32390538/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[11:24:03] <icinga-wm>	 RECOVERY - grafana-next.wikimedia.org on grafana2001 is OK: HTTP OK: HTTP/1.1 200 OK - 115218 bytes in 0.235 second response time https://wikitech.wikimedia.org/wiki/Grafana.wikimedia.org
[11:24:22] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P31325 and previous config saved to /var/cache/conftool/dbconfig/20220718-112422-ladsgroup.json
[11:24:57] <icinga-wm>	 RECOVERY - IPv6 ping to ulsfo on ripe-atlas-ulsfo IPv6 is OK: OK - failed 83 probes of 675 (alerts on 90) - https://atlas.ripe.net/measurements/1791309/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[11:25:06] <jbond>	 !log re-enable puppet post postgresql re-sync
[11:25:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:25:49] <icinga-wm>	 RECOVERY - IPv6 ping to codfw on ripe-atlas-codfw IPv6 is OK: OK - failed 68 probes of 684 (alerts on 90) - https://atlas.ripe.net/measurements/32390541/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[11:30:21] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1101:3318', diff saved to https://phabricator.wikimedia.org/P31326 and previous config saved to /var/cache/conftool/dbconfig/20220718-113020-marostegui.json
[11:32:41] <icinga-wm>	 PROBLEM - puppet last run on an-worker1127 is CRITICAL: CRITICAL: Puppet last ran 14 hours ago https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[11:32:56] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.reimage for host ganeti2028.codfw.wmnet with OS bullseye
[11:33:06] <wikibugs>	 SRE, Ganeti, Infrastructure-Foundations: Upgrade ganeti/codfw to Bullseye - https://phabricator.wikimedia.org/T311686 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jmm@cumin2002 for host ganeti2028.codfw.wmnet with OS bullseye
[11:37:32] <wikibugs>	 (PS1) Marostegui: mariadb: Productionize db2167 [puppet] - https://gerrit.wikimedia.org/r/814765 (https://phabricator.wikimedia.org/T311493)
[11:39:10] <wikibugs>	 Puppet, Infrastructure-Foundations, Patch-For-Review, User-jbond: puppetdb seems to be slow on host reimage - https://phabricator.wikimedia.org/T263578 (jbond) i have re-synced puppetdb, however we need to prevent this from happening again.  It seems we can increase the wal_keep_size but it may b...
[11:39:27] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1106 (T312984)', diff saved to https://phabricator.wikimedia.org/P31327 and previous config saved to /var/cache/conftool/dbconfig/20220718-113927-ladsgroup.json
[11:39:29] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 10:00:00 on db1134.eqiad.wmnet with reason: Maintenance
[11:39:33] <stashbot>	 T312984: Adjust the field type of flaggedpages.fp_pending_since to fixed binary on wmf wikis - https://phabricator.wikimedia.org/T312984
[11:39:42] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1134.eqiad.wmnet with reason: Maintenance
[11:39:48] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1134 (T312984)', diff saved to https://phabricator.wikimedia.org/P31328 and previous config saved to /var/cache/conftool/dbconfig/20220718-113947-ladsgroup.json
[11:39:55] <wikibugs>	 (CR) Marostegui: [C: +2] mariadb: Productionize db2167 [puppet] - https://gerrit.wikimedia.org/r/814765 (https://phabricator.wikimedia.org/T311493) (owner: Marostegui)
[11:44:54] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1134 (T312984)', diff saved to https://phabricator.wikimedia.org/P31329 and previous config saved to /var/cache/conftool/dbconfig/20220718-114454-ladsgroup.json
[11:45:01] <stashbot>	 T312984: Adjust the field type of flaggedpages.fp_pending_since to fixed binary on wmf wikis - https://phabricator.wikimedia.org/T312984
[11:45:26] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1101:3318', diff saved to https://phabricator.wikimedia.org/P31330 and previous config saved to /var/cache/conftool/dbconfig/20220718-114525-marostegui.json
[11:46:13] <icinga-wm>	 PROBLEM - SSH on db1109.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[11:46:29] <icinga-wm>	 PROBLEM - grafana-next.wikimedia.org on grafana2001 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 1687 bytes in 0.150 second response time https://wikitech.wikimedia.org/wiki/Grafana.wikimedia.org
[11:47:28] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2028.codfw.wmnet with reason: host reimage
[11:50:25] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2028.codfw.wmnet with reason: host reimage
[11:54:21] <icinga-wm>	 PROBLEM - Ensure hosts are not performing a change on every puppet run on cumin2002 is CRITICAL: CRITICAL: the following (25) node(s) change every puppet run: aqs2001, aqs2002, aqs2003, aqs2004, aqs2005, aqs2006, aqs2007, aqs2008, aqs2009, aqs2010, aqs2011, aqs2012, cloudservices1003, cloudservices1004, ms-fe1010, ms-fe1011, ms-fe1012, ms-fe2010, ms-fe2011, ms-fe2012, thanos-fe1002, thanos-fe1003, thanos-fe2001, thanos-fe2002, thanos-fe20
[11:54:21] <icinga-wm>	 ://wikitech.wikimedia.org/wiki/Puppet%23check_puppet_run_changes
[11:58:27] <wikibugs>	 10Puppet, 10Infrastructure-Foundations, 10User-jbond: puppetdb postgress: Improve postgress standby server - https://phabricator.wikimedia.org/T313217 (10jbond)
[11:58:34] <wikibugs>	 10Puppet, 10Infrastructure-Foundations, 10User-jbond: puppetdb postgress: Improve postgress standby server - https://phabricator.wikimedia.org/T313217 (10jbond) p:05Triage→03Medium
[12:00:00] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P31331 and previous config saved to /var/cache/conftool/dbconfig/20220718-115959-ladsgroup.json
[12:00:31] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1101:3318 (T313070)', diff saved to https://phabricator.wikimedia.org/P31332 and previous config saved to /var/cache/conftool/dbconfig/20220718-120030-marostegui.json
[12:00:32] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on db1126.eqiad.wmnet with reason: Maintenance
[12:00:36] <stashbot>	 T313070: Adjust the field type of wb_changes.change_time to fixed binary on wmf wikis - https://phabricator.wikimedia.org/T313070
[12:00:46] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1126.eqiad.wmnet with reason: Maintenance
[12:00:51] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depooling db1126 (T313070)', diff saved to https://phabricator.wikimedia.org/P31333 and previous config saved to /var/cache/conftool/dbconfig/20220718-120051-marostegui.json
[12:01:22] <icinga-wm>	 PROBLEM - etcd request latencies on kubemaster2002 is CRITICAL: instance=10.192.16.48 operation=create https://wikitech.wikimedia.org/wiki/Etcd/Main_cluster https://grafana.wikimedia.org/d/000000435/kubernetes-api?orgId=1&viewPanel=28
[12:01:57] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1126 (T313070)', diff saved to https://phabricator.wikimedia.org/P31334 and previous config saved to /var/cache/conftool/dbconfig/20220718-120157-marostegui.json
[12:02:30] <wikibugs>	 (03PS1) 10Sergio Gimeno: Mentorship: enable the Vue version of the dashboard in test [mediawiki-config] - 10https://gerrit.wikimedia.org/r/814789 (https://phabricator.wikimedia.org/T300532)
[12:03:38] <icinga-wm>	 PROBLEM - Check systemd state on thanos-be2002 is CRITICAL: CRITICAL - degraded: The following units failed: swift-drive-audit.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:04:41] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2028.codfw.wmnet with OS bullseye
[12:04:46] <wikibugs>	 10SRE, 10Ganeti, 10Infrastructure-Foundations: Upgrade ganeti/codfw to Bullseye - https://phabricator.wikimedia.org/T311686 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jmm@cumin2002 for host ganeti2028.codfw.wmnet with OS bullseye completed: - ganeti2028 (**PASS**)   - Downtimed on...
[12:04:51] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 03+2] pontoon: retry apt in provision.sh [puppet] - 10https://gerrit.wikimedia.org/r/814719 (owner: 10Filippo Giunchedi)
[12:04:57] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 03+2] pontoon: validate host fqdn during bootstrap [puppet] - 10https://gerrit.wikimedia.org/r/814720 (owner: 10Filippo Giunchedi)
[12:05:00] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 03+2] pontoon: support to set/override domain during provisioning [puppet] - 10https://gerrit.wikimedia.org/r/814721 (owner: 10Filippo Giunchedi)
[12:05:23] <wikibugs>	 (03PS2) 10Filippo Giunchedi: pontoon: validate host fqdn during bootstrap [puppet] - 10https://gerrit.wikimedia.org/r/814720
[12:05:26] <wikibugs>	 (03CR) 10Filippo Giunchedi: [V: 03+2] pontoon: validate host fqdn during bootstrap [puppet] - 10https://gerrit.wikimedia.org/r/814720 (owner: 10Filippo Giunchedi)
[12:05:41] <wikibugs>	 (03PS2) 10Filippo Giunchedi: pontoon: support to set/override domain during provisioning [puppet] - 10https://gerrit.wikimedia.org/r/814721
[12:05:43] <wikibugs>	 (03CR) 10Filippo Giunchedi: [V: 03+2] pontoon: support to set/override domain during provisioning [puppet] - 10https://gerrit.wikimedia.org/r/814721 (owner: 10Filippo Giunchedi)
[12:09:58] <wikibugs>	 (03PS1) 10Filippo Giunchedi: aptrepo: upgrade to grafana 8.5 [puppet] - 10https://gerrit.wikimedia.org/r/814791
[12:10:11] <godog>	 I'm seeking reviewers for an easy one ^
[12:10:26] <icinga-wm>	 RECOVERY - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is OK: OK - failed 55 probes of 682 (alerts on 90) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[12:11:36] <icinga-wm>	 RECOVERY - etcd request latencies on kubemaster2002 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Etcd/Main_cluster https://grafana.wikimedia.org/d/000000435/kubernetes-api?orgId=1&viewPanel=28
[12:12:35] <wikibugs>	 (03PS1) 10Filippo Giunchedi: smokeping: remove sampled hosts, probed by Prometheus [puppet] - 10https://gerrit.wikimedia.org/r/814792 (https://phabricator.wikimedia.org/T169860)
[12:12:56] <wikibugs>	 10SRE-Access-Requests: Grant Access to analytics-privatedata-users for Segun Oworu - https://phabricator.wikimedia.org/T313213 (10Aklapper) @soworu: For future reference, please add project tags so someone could find this task, and please follow the instructions pointing to https://phabricator.wikimedia.org/proj...
[12:13:00] <wikibugs>	 10Puppet, 10Infrastructure-Foundations, 10Patch-For-Review, 10User-jbond: puppetdb seems to be slow on host reimage - https://phabricator.wikimedia.org/T263578 (10MoritzMuehlenhoff) I retried the ganeti2028 reimage and everything works fine again, thanks!
[12:13:15] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.reimage for host ganeti2012.codfw.wmnet with OS bullseye
[12:13:20] <wikibugs>	 10SRE, 10Ganeti, 10Infrastructure-Foundations: Upgrade ganeti/codfw to Bullseye - https://phabricator.wikimedia.org/T311686 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jmm@cumin2002 for host ganeti2012.codfw.wmnet with OS bullseye
[12:14:24] <wikibugs>	 10SRE-Access-Requests: Grant Access to analytics-privatedata-users for Segun Oworu - https://phabricator.wikimedia.org/T313213 (10Aklapper)
[12:14:36] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+1] "Looks good!" [puppet] - 10https://gerrit.wikimedia.org/r/814791 (owner: 10Filippo Giunchedi)
[12:14:54] <icinga-wm>	 RECOVERY - grafana-next.wikimedia.org on grafana2001 is OK: HTTP OK: HTTP/1.1 200 OK - 115233 bytes in 0.249 second response time https://wikitech.wikimedia.org/wiki/Grafana.wikimedia.org
[12:15:05] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P31335 and previous config saved to /var/cache/conftool/dbconfig/20220718-121504-ladsgroup.json
[12:16:12] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 03+2] aptrepo: upgrade to grafana 8.5 [puppet] - 10https://gerrit.wikimedia.org/r/814791 (owner: 10Filippo Giunchedi)
[12:17:02] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1126', diff saved to https://phabricator.wikimedia.org/P31336 and previous config saved to /var/cache/conftool/dbconfig/20220718-121702-marostegui.json
[12:19:39] <wikibugs>	 10SRE, 10Cloud-Services: Update Grafana on cloudmetrics* to 8.x - https://phabricator.wikimedia.org/T313219 (10MoritzMuehlenhoff)
[12:20:10] <wikibugs>	 (03PS1) 10Filippo Giunchedi: aptrepo: actually update to Grafana 8.5 [puppet] - 10https://gerrit.wikimedia.org/r/814794
[12:20:44] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] aptrepo: actually update to Grafana 8.5 [puppet] - 10https://gerrit.wikimedia.org/r/814794 (owner: 10Filippo Giunchedi)
[12:21:44] <wikibugs>	 (03PS2) 10Filippo Giunchedi: aptrepo: actually update to Grafana 8.5 [puppet] - 10https://gerrit.wikimedia.org/r/814794
[12:22:04] <wikibugs>	 10SRE, 10Cloud-Services: Update Grafana on cloudmetrics* to 8.x - https://phabricator.wikimedia.org/T313219 (10MoritzMuehlenhoff) 05Open→03Invalid Never mind, I missed that only cloudmetrics1001/1002 are running 7.x (which are only using role::spare::system, so possibly up for decom).
[12:22:26] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+1] aptrepo: actually update to Grafana 8.5 [puppet] - 10https://gerrit.wikimedia.org/r/814794 (owner: 10Filippo Giunchedi)
[12:23:25] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 03+2] aptrepo: actually update to Grafana 8.5 [puppet] - 10https://gerrit.wikimedia.org/r/814794 (owner: 10Filippo Giunchedi)
[12:26:13] <wikibugs>	 (CR) Jbond: sre.hardware.dell: create new cookbook for updating idrac and bios (2 comments) [cookbooks] - https://gerrit.wikimedia.org/r/763215 (owner: Jbond)
[12:29:28] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2012.codfw.wmnet with reason: host reimage
[12:30:10] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1134 (T312984)', diff saved to https://phabricator.wikimedia.org/P31337 and previous config saved to /var/cache/conftool/dbconfig/20220718-123009-ladsgroup.json
[12:30:11] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 10:00:00 on db1169.eqiad.wmnet with reason: Maintenance
[12:30:13] <stashbot>	 T312984: Adjust the field type of flaggedpages.fp_pending_since to fixed binary on wmf wikis - https://phabricator.wikimedia.org/T312984
[12:30:24] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1169.eqiad.wmnet with reason: Maintenance
[12:30:29] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1169 (T312984)', diff saved to https://phabricator.wikimedia.org/P31338 and previous config saved to /var/cache/conftool/dbconfig/20220718-123029-ladsgroup.json
[12:32:07] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1126', diff saved to https://phabricator.wikimedia.org/P31339 and previous config saved to /var/cache/conftool/dbconfig/20220718-123207-marostegui.json
[12:33:25] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2012.codfw.wmnet with reason: host reimage
[12:34:34] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1169 (T312984)', diff saved to https://phabricator.wikimedia.org/P31340 and previous config saved to /var/cache/conftool/dbconfig/20220718-123433-ladsgroup.json
[12:35:39] <godog>	 !log update grafana to 8.5.9
[12:35:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:37:41] <wikibugs>	 (03PS6) 10Muehlenhoff: Add a cookbook to change the storage type of a Ganeti VM [cookbooks] - 10https://gerrit.wikimedia.org/r/811970 (https://phabricator.wikimedia.org/T312116)
[12:38:05] <wikibugs>	 (03PS1) 10Filippo Giunchedi: aptrepo: upgrade Grafana to 8.5 (#3) [puppet] - 10https://gerrit.wikimedia.org/r/814796
[12:38:58] <wikibugs>	 (03PS2) 10Filippo Giunchedi: aptrepo: upgrade Grafana to 8.5 (#3) [puppet] - 10https://gerrit.wikimedia.org/r/814796
[12:40:43] <wikibugs>	 (03PS1) 10Matthias Mullie: Use getOption to detect user preferences [extensions/ImageSuggestions] (wmf/1.39.0-wmf.19) - 10https://gerrit.wikimedia.org/r/814767 (https://phabricator.wikimedia.org/T313209)
[12:40:52] <wikibugs>	 (03CR) 10Matthias Mullie: [C: 03+1] Use getOption to detect user preferences [extensions/ImageSuggestions] (wmf/1.39.0-wmf.19) - 10https://gerrit.wikimedia.org/r/814767 (https://phabricator.wikimedia.org/T313209) (owner: 10Matthias Mullie)
[12:41:08] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] Add a cookbook to change the storage type of a Ganeti VM [cookbooks] - 10https://gerrit.wikimedia.org/r/811970 (https://phabricator.wikimedia.org/T312116) (owner: 10Muehlenhoff)
[12:41:28] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 03+2] aptrepo: upgrade Grafana to 8.5 (#3) [puppet] - 10https://gerrit.wikimedia.org/r/814796 (owner: 10Filippo Giunchedi)
[12:44:40] <wikibugs>	 (03PS7) 10Muehlenhoff: Add a cookbook to change the storage type of a Ganeti VM [cookbooks] - 10https://gerrit.wikimedia.org/r/811970 (https://phabricator.wikimedia.org/T312116)
[12:46:57] <wikibugs>	 (03CR) 10Ayounsi: [C: 03+1] smokeping: remove sampled hosts, probed by Prometheus [puppet] - 10https://gerrit.wikimedia.org/r/814792 (https://phabricator.wikimedia.org/T169860) (owner: 10Filippo Giunchedi)
[12:47:13] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1126 (T313070)', diff saved to https://phabricator.wikimedia.org/P31341 and previous config saved to /var/cache/conftool/dbconfig/20220718-124712-marostegui.json
[12:47:14] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on db1177.eqiad.wmnet with reason: Maintenance
[12:47:18] <stashbot>	 T313070: Adjust the field type of wb_changes.change_time to fixed binary on wmf wikis - https://phabricator.wikimedia.org/T313070
[12:47:28] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1177.eqiad.wmnet with reason: Maintenance
[12:47:33] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depooling db1177 (T313070)', diff saved to https://phabricator.wikimedia.org/P31342 and previous config saved to /var/cache/conftool/dbconfig/20220718-124732-marostegui.json
[12:47:37] <wikibugs>	 (03PS1) 10David Caro: wmcs.novafullstack: Remove nrpe checks [puppet] - 10https://gerrit.wikimedia.org/r/814798
[12:48:39] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1177 (T313070)', diff saved to https://phabricator.wikimedia.org/P31343 and previous config saved to /var/cache/conftool/dbconfig/20220718-124838-marostegui.json
[12:49:09] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2012.codfw.wmnet with OS bullseye
[12:49:13] <wikibugs>	 10SRE, 10Ganeti, 10Infrastructure-Foundations: Upgrade ganeti/codfw to Bullseye - https://phabricator.wikimedia.org/T311686 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jmm@cumin2002 for host ganeti2012.codfw.wmnet with OS bullseye completed: - ganeti2012 (**PASS**)   - Downtimed on...
[12:49:39] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P31344 and previous config saved to /var/cache/conftool/dbconfig/20220718-124938-ladsgroup.json
[12:51:09] <wikibugs>	 (03CR) 10Kosta Harlan: [C: 03+1] Mentorship: enable the Vue version of the dashboard in test [mediawiki-config] - 10https://gerrit.wikimedia.org/r/814789 (https://phabricator.wikimedia.org/T300532) (owner: 10Sergio Gimeno)
[12:51:57] <wikibugs>	 (CR) Muehlenhoff: Add a cookbook to change the storage type of a Ganeti VM (9 comments) [cookbooks] - https://gerrit.wikimedia.org/r/811970 (https://phabricator.wikimedia.org/T312116) (owner: Muehlenhoff)
[12:56:52] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 03+2] smokeping: remove sampled hosts, probed by Prometheus [puppet] - 10https://gerrit.wikimedia.org/r/814792 (https://phabricator.wikimedia.org/T169860) (owner: 10Filippo Giunchedi)
[12:59:17] <wikibugs>	 (03PS2) 10David Caro: wmcs.novafullstack: Remove nrpe checks [puppet] - 10https://gerrit.wikimedia.org/r/814798
[13:00:04] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.reimage for host ganeti2018.codfw.wmnet with OS bullseye
[13:00:04] <jouncebot>	 RoanKattouw, Lucas_WMDE, Urbanecm, and awight: #bothumor Q:How do functions break up? A:They stop calling each other. Rise for UTC afternoon backport window deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220718T1300).
[13:00:04] <jouncebot>	 Daimona, cormacparle, and matthiasmullie: A patch you scheduled for UTC afternoon backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[13:00:08] <matthiasmullie>	 o/
[13:00:10] <wikibugs>	 10SRE, 10Ganeti, 10Infrastructure-Foundations: Upgrade ganeti/codfw to Bullseye - https://phabricator.wikimedia.org/T311686 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jmm@cumin2002 for host ganeti2018.codfw.wmnet with OS bullseye
[13:00:13] <Daimona>	 o/
[13:00:29] <Lucas_WMDE>	 jouncebot: I thought the functions had arguments?
[13:00:32] <Lucas_WMDE>	 o/
[13:00:56] <Lucas_WMDE>	 alright, I can deploy
[13:01:17] <icinga-wm>	 RECOVERY - Check systemd state on thanos-be2002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:01:29] <Daimona>	 Thanks Lucas
[13:01:38] <Lucas_WMDE>	 uh, not sure if I know how to create db tables on beta though
[13:01:49] <Daimona>	 Neither do I! Let the fun begin
[13:02:01] <Daimona>	 I was going to ask if it's something I could do myself
[13:02:11] <cormacparle>	 It's been so long since I did a deployment I'd forgotten which channel I'm supposed to be in
[13:02:23] <Tks4Fish>	 Lucas_WMDE: I see there's a "Max 6 patches" notice on the window, and we're already over that, but can you deploy 2 more that are actually one (add and use logos for brwikimedia)? if you can't that's fine :) 
[13:02:35] <Lucas_WMDE>	 we’ll see
[13:02:45] <Tks4Fish>	 okay :)
[13:02:52] <wikibugs>	 (03CR) 10Lucas Werkmeister (WMDE): [C: 03+2] Use getOption to detect user preferences [extensions/ImageSuggestions] (wmf/1.39.0-wmf.19) - 10https://gerrit.wikimedia.org/r/814767 (https://phabricator.wikimedia.org/T313209) (owner: 10Matthias Mullie)
[13:03:02] <Lucas_WMDE>	 let’s start by +2ing the backport, since that’ll take a bit in gate-and-submit
[13:03:13] <Lucas_WMDE>	 cormacparle: are you theres
[13:03:14] <Lucas_WMDE>	 *there?
[13:03:23] <cormacparle>	 I am here
[13:03:29] <Lucas_WMDE>	 ok, then let’s start with those config changes
[13:03:44] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P31345 and previous config saved to /var/cache/conftool/dbconfig/20220718-130343-marostegui.json
[13:03:47] <wikibugs>	 (03PS2) 10Lucas Werkmeister (WMDE): Update config for commons custommatch search [mediawiki-config] - 10https://gerrit.wikimedia.org/r/814108 (owner: 10Cparle)
[13:03:53] <cormacparle>	 cool
[13:04:43] <matthiasmullie>	 Lucas_WMDE: that backport doesn't need to go to mwdebug, can be synced right away
[13:04:44] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P31346 and previous config saved to /var/cache/conftool/dbconfig/20220718-130443-ladsgroup.json
[13:04:52] <Lucas_WMDE>	 ok
[13:04:55] <Daimona>	 Anyone here familiar with how to create DB tables on beta? Without breaking the world, that is.
[13:04:57] <wikibugs>	 (03PS1) 10David Caro: wmcs.novafullstack: stop sending stats to statsd [puppet] - 10https://gerrit.wikimedia.org/r/814800
[13:05:25] <Tks4Fish>	 Daimona: think of the sticker you can get!
[13:05:59] <Daimona>	 I ain't doin' that for a sticker, must be a tshirt at least!
[13:06:21] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] wmcs.novafullstack: stop sending stats to statsd [puppet] - 10https://gerrit.wikimedia.org/r/814800 (owner: 10David Caro)
[13:06:40] <wikibugs>	 10SRE, 10SRE-Access-Requests: Grant Access to analytics-privatedata-users for Segun Oworu - https://phabricator.wikimedia.org/T313213 (10Vgutierrez) p:05Triage→03Medium cc @Ottomata || @odimitrijevic for analytics-privatedata-users approval as of data.yaml
[13:06:44] <wikibugs>	 (03PS1) 10Ayounsi: Interface description: handle one more patch panel special case [software/homer/deploy] - 10https://gerrit.wikimedia.org/r/814803 (https://phabricator.wikimedia.org/T304710)
[13:06:45] <wikibugs>	 10SRE, 10SRE-Access-Requests: Grant Access to analytics-privatedata-users for Segun Oworu - https://phabricator.wikimedia.org/T313213 (10Vgutierrez) a:03Vgutierrez
[13:07:01] <taavi>	 Daimona: beta runs update.php once an hour for all wikis so if the tables are wired up they'll get created automatically
[13:07:18] <Daimona>	 They're not, because we need them in wikishared and not in the local wiki DB :)
[13:07:28] <wikibugs>	 (03CR) 10Lucas Werkmeister (WMDE): [C: 03+2] Update config for commons custommatch search [mediawiki-config] - 10https://gerrit.wikimedia.org/r/814108 (owner: 10Cparle)
[13:07:29] <Daimona>	 We don't like easy stuff.
[13:08:18] <taavi>	 ah, then connect to a deployment-mwmaint host, and manually create them on a mysql shell (`sql wikishared --write`)
[13:08:20] <wikibugs>	 (03Merged) 10jenkins-bot: Update config for commons custommatch search [mediawiki-config] - 10https://gerrit.wikimedia.org/r/814108 (owner: 10Cparle)
[13:09:03] <Lucas_WMDE>	 cormacparle: the custommatch change is live on mwdebug1001, can you test it?
[13:09:10] <cormacparle>	 sure
[13:09:10] <Daimona>	 What are the beta mwmaint hosts? I'm not sure if I have access :D
[13:09:38] <Lucas_WMDE>	 of course it’s sql and not mwscript mysql.php $uhhIDontKnowWhichWiki
[13:10:05] <Daimona>	 Lol
[13:10:18] <taavi>	 `sql` is smart and has a special-case for wikishared :-)
[13:10:25] * Lucas_WMDE loves special cases
[13:10:42] <taavi>	 https://openstack-browser.toolforge.org/project/deployment-prep says you have access and the host you want is deployment-mwmaint02.deployment-prep.eqiad1.wikimedia.cloud
[13:10:48] <wikibugs>	 (03Merged) 10jenkins-bot: Use getOption to detect user preferences [extensions/ImageSuggestions] (wmf/1.39.0-wmf.19) - 10https://gerrit.wikimedia.org/r/814767 (https://phabricator.wikimedia.org/T313209) (owner: 10Matthias Mullie)
[13:10:52] <wikibugs>	 (03PS2) 10Ayounsi: Interface description: handle one more patch panel special case [software/homer/deploy] - 10https://gerrit.wikimedia.org/r/814803 (https://phabricator.wikimedia.org/T304710)
[13:11:06] <cormacparle>	 Lucas_WMDE: that custommatch change looks good
[13:11:11] <wikibugs>	 (03PS2) 10Ayounsi: Netbox _get_circuits: add patch panel support [software/homer] - 10https://gerrit.wikimedia.org/r/813604 (https://phabricator.wikimedia.org/T304710)
[13:11:14] <Daimona>	 taavi: thanks, let me try
[13:11:17] <Lucas_WMDE>	 alright, syncing that
[13:11:56] <Lucas_WMDE>	 Daimona: please don’t make changes while I’m supposed to be responsible for the backport+config window…
[13:12:27] <Daimona>	 Don't worry, just trying to see if I can access that host. I may need it for the future.
[13:12:30] <wikibugs>	 (03CR) 10Ayounsi: "Example diff before:" [software/homer/deploy] - 10https://gerrit.wikimedia.org/r/814803 (https://phabricator.wikimedia.org/T304710) (owner: 10Ayounsi)
[13:12:47] <icinga-wm>	 PROBLEM - Check systemd state on netmon1002 is CRITICAL: CRITICAL - degraded: The following units failed: smokeping.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:13:00] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[13:14:02] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[13:14:03] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[13:14:21] <Lucas_WMDE>	 alright, added `sql wikishared --write` to some deployment-prep wikitech docs
[13:14:58] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[13:15:12] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy1002 Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:814108|Update config for commons custommatch search]] (duration: 02m 55s)
[13:15:30] <Lucas_WMDE>	 alright, matthiasmullie’s backport is next
[13:15:46] <matthiasmullie>	 great
[13:16:28] <taavi>	 i simplified the docs a bit too
[13:16:28] <wikibugs>	 (03PS1) 10Hashar: Send events to Wikimedia EventGate [software/gerrit/plugins/events-wikimedia] - 10https://gerrit.wikimedia.org/r/814807
[13:16:28] <Daimona>	 (Confirming that I have access to that, thanks taavi, I wrote it down so I won't have to bother you or someone else next time)
[13:16:46] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] Netbox _get_circuits: add patch panel support [software/homer] - 10https://gerrit.wikimedia.org/r/813604 (https://phabricator.wikimedia.org/T304710) (owner: 10Ayounsi)
[13:17:25] <wikibugs>	 (03PS2) 10Lucas Werkmeister (WMDE): Make weighted_tags search default for commonswiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/814111 (owner: 10Cparle)
[13:18:26] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.reboot-single for host ganeti2028.codfw.wmnet
[13:18:31] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2018.codfw.wmnet with reason: host reimage
[13:18:31] <wikibugs>	 (03CR) 10Hashar: "This is merely non sense coming from https://gerrit.wikimedia.org/r/plugins/gitiles/wikimedia-event-utilities/+/refs/heads/master  "Genera" [software/gerrit/plugins/events-wikimedia] - 10https://gerrit.wikimedia.org/r/814807 (owner: 10Hashar)
[13:18:49] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P31347 and previous config saved to /var/cache/conftool/dbconfig/20220718-131848-marostegui.json
[13:19:30] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy1002 Synchronized php-1.39.0-wmf.19/extensions/ImageSuggestions/maintenance/SendNotificationsForUnillustratedWatchedTitles.php: Backport: [[gerrit:814767|Use getOption to detect user preferences (T313209)]] (duration: 02m 50s)
[13:19:34] <stashbot>	 T313209: SendNotificationsForUnillustratedWatchedTitles does not consider GlobalPreferences - https://phabricator.wikimedia.org/T313209
[13:19:49] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1169 (T312984)', diff saved to https://phabricator.wikimedia.org/P31348 and previous config saved to /var/cache/conftool/dbconfig/20220718-131949-ladsgroup.json
[13:19:50] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 10:00:00 on db1184.eqiad.wmnet with reason: Maintenance
[13:19:52] <stashbot>	 T312984: Adjust the field type of flaggedpages.fp_pending_since to fixed binary on wmf wikis - https://phabricator.wikimedia.org/T312984
[13:20:03] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[13:20:04] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1184.eqiad.wmnet with reason: Maintenance
[13:20:09] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1184 (T312984)', diff saved to https://phabricator.wikimedia.org/P31349 and previous config saved to /var/cache/conftool/dbconfig/20220718-132009-ladsgroup.json
[13:20:45] <wikibugs>	 (03CR) 10Lucas Werkmeister (WMDE): [C: 03+2] "`git show --patience --color-moved=dimmed-zebra` nicely shows that this only moves a block of code around without changing anything inside" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/814111 (owner: 10Cparle)
[13:21:29] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2018.codfw.wmnet with reason: host reimage
[13:21:48] <wikibugs>	 (03Merged) 10jenkins-bot: Make weighted_tags search default for commonswiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/814111 (owner: 10Cparle)
[13:22:15] <cormacparle>	 Lucas_WMDE: yes one of those patches is just moving a block of code from one place to another ... the first search config in the list is the default, so it's changing the default search mechanism for commons
[13:22:24] <Lucas_WMDE>	 yup, makes sense
[13:22:36] <Lucas_WMDE>	 just wanted to confirm, and I always like spreading awareness of the --color-moved option ;)
[13:22:45] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[13:22:46] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[13:22:55] <Lucas_WMDE>	 cormacparle: alright, that change is on mwdebug1001, can you test it?
[13:22:55] <cormacparle>	 heh cool, I wasn't aware of it before!
[13:22:59] <cormacparle>	 sure
[13:24:12] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1184 (T312984)', diff saved to https://phabricator.wikimedia.org/P31350 and previous config saved to /var/cache/conftool/dbconfig/20220718-132411-ladsgroup.json
[13:25:00] <wikibugs>	 10Puppet, 10Infrastructure-Foundations, 10User-jbond: puppetdb postgress: Improve postgress standby server - https://phabricator.wikimedia.org/T313217 (10Volans) Replication slots seems more interesting and tailored on what we need here as far as I can tell from a quick look. Thanks for opening this.
[13:25:14] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[13:25:58] <wikibugs>	 10Puppet, 10Infrastructure-Foundations, 10User-jbond: puppetdb postgress: Improve postgress standby server - https://phabricator.wikimedia.org/T313217 (10jbond)
[13:26:40] <Lucas_WMDE>	 taavi: do you know if it’s still the case that Beta SAL messages should be logged in -releng instead of using !log deployment-prep in -cloud?
[13:27:01] <taavi>	 no clue
[13:27:01] <Lucas_WMDE>	 that’s what I heard a few years ago but the latest entries on https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL don’t directly look related
[13:27:08] <Lucas_WMDE>	 alright
[13:27:09] <cormacparle>	 Lucas_WMDE: the commons config change looks good
[13:27:11] <Lucas_WMDE>	 then I’ll just go with that ^^
[13:27:13] <Lucas_WMDE>	 thanks cormacparle 
[13:27:49] <Lucas_WMDE>	 syncing
[13:28:11] <wikibugs>	 (03CR) 10Volans: "reply inline" [cookbooks] - 10https://gerrit.wikimedia.org/r/763215 (owner: 10Jbond)
[13:28:31] <wikibugs>	 (03PS1) 10Jbond: P:postgress::database: add docs and fix minor lint issues [puppet] - 10https://gerrit.wikimedia.org/r/814809 (https://phabricator.wikimedia.org/T313217)
[13:28:33] <wikibugs>	 (03PS1) 10Jbond: C:postgress::server: add replication slot support [puppet] - 10https://gerrit.wikimedia.org/r/814810 (https://phabricator.wikimedia.org/T313217)
[13:30:17] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[13:30:39] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy1002 Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:814111|Make weighted_tags search default for commonswiki]] (duration: 02m 54s)
[13:30:40] <logmsgbot>	 !log jmm@cumin2002 END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host ganeti2028.codfw.wmnet
[13:31:17] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[13:31:19] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[13:31:21] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.reboot-single for host ganeti2028.codfw.wmnet
[13:31:38] <Lucas_WMDE>	 alright, I’ll take a closer look at the beta changes now
[13:32:07] <wikibugs>	 (03PS2) 10David Caro: wmcs.novafullstack: stop sending stats to statsd [puppet] - 10https://gerrit.wikimedia.org/r/814800
[13:32:11] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[13:33:05] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] C:postgress::server: add replication slot support [puppet] - 10https://gerrit.wikimedia.org/r/814810 (https://phabricator.wikimedia.org/T313217) (owner: 10Jbond)
[13:33:49] <wikibugs>	 (03CR) 10Lucas Werkmeister (WMDE): "Isn’t the addition to CommonSettings-labs.php redundant? It looks like most other extensions are only loaded via CommonSettings.php, as fa" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/813991 (https://phabricator.wikimedia.org/T311752) (owner: 10Daimona Eaytoy)
[13:33:54] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1177 (T313070)', diff saved to https://phabricator.wikimedia.org/P31351 and previous config saved to /var/cache/conftool/dbconfig/20220718-133354-marostegui.json
[13:33:55] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on db1178.eqiad.wmnet with reason: Maintenance
[13:34:00] <stashbot>	 T313070: Adjust the field type of wb_changes.change_time to fixed binary on wmf wikis - https://phabricator.wikimedia.org/T313070
[13:34:09] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1178.eqiad.wmnet with reason: Maintenance
[13:34:14] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depooling db1178 (T313070)', diff saved to https://phabricator.wikimedia.org/P31352 and previous config saved to /var/cache/conftool/dbconfig/20220718-133414-marostegui.json
[13:35:23] <wikibugs>	 (03CR) 10Daimona Eaytoy: Load and configure the CampaignEvents extension where enabled (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/813991 (https://phabricator.wikimedia.org/T311752) (owner: 10Daimona Eaytoy)
[13:35:52] <wikibugs>	 (03PS2) 10Daimona Eaytoy: Load and configure the CampaignEvents extension where enabled [mediawiki-config] - 10https://gerrit.wikimedia.org/r/813991 (https://phabricator.wikimedia.org/T311752)
[13:36:43] <wikibugs>	 (03CR) 10Lucas Werkmeister (WMDE): Load and configure the CampaignEvents extension where enabled (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/813991 (https://phabricator.wikimedia.org/T311752) (owner: 10Daimona Eaytoy)
[13:37:01] <Lucas_WMDE>	 okay, so the command to create the tables would be (on deployment-deploy03):
[13:37:03] <Lucas_WMDE>	 `sql wikishared --write < /srv/mediawiki-staging/php-master/extensions/CampaignEvents/db_patches/mysql/tables-generated.sql`
[13:37:10] <Lucas_WMDE>	 does that look okay Daimona taavi?
[13:38:12] <icinga-wm>	 PROBLEM - Check systemd state on netmon2001 is CRITICAL: CRITICAL - degraded: The following units failed: smokeping.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:38:49] <wikibugs>	 (03CR) 10Daimona Eaytoy: Load and configure the CampaignEvents extension where enabled (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/813991 (https://phabricator.wikimedia.org/T311752) (owner: 10Daimona Eaytoy)
[13:38:51] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2028.codfw.wmnet
[13:39:17] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P31353 and previous config saved to /var/cache/conftool/dbconfig/20220718-133916-ladsgroup.json
[13:39:38] <wikibugs>	 (03CR) 10Lucas Werkmeister (WMDE): Load and configure the CampaignEvents extension where enabled (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/813991 (https://phabricator.wikimedia.org/T311752) (owner: 10Daimona Eaytoy)
[13:40:00] <Lucas_WMDE>	 anyone wanna ack my SQL command above? ^^
[13:40:00] <Daimona>	 Looks OK
[13:40:04] <Lucas_WMDE>	 ok thanks :)
[13:40:18] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2018.codfw.wmnet with OS bullseye
[13:40:23] <wikibugs>	 10SRE, 10Ganeti, 10Infrastructure-Foundations: Upgrade ganeti/codfw to Bullseye - https://phabricator.wikimedia.org/T311686 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jmm@cumin2002 for host ganeti2018.codfw.wmnet with OS bullseye completed: - ganeti2018 (**PASS**)   - Downtimed on...
[13:40:55] <Lucas_WMDE>	 seems to have worked
[13:41:15] <Lucas_WMDE>	 the table appears in DESCRIBE, including in a non-`--write` command (which I hope connects to a replica and indicates that the table creation replicated properly)
[13:41:17] <Daimona>	 Yay!
[13:42:17] <Lucas_WMDE>	 also, php-1.39.0-wmf.19/extensions/CampaignEvents/ exists on deploy1002 (prod), so that looks fine as well
[13:42:25] <Lucas_WMDE>	 I think we can go ahead with the config changes
[13:42:29] <wikibugs>	 (03PS2) 10Lucas Werkmeister (WMDE): Add CampaignEvents to extension-list [mediawiki-config] - 10https://gerrit.wikimedia.org/r/813986 (https://phabricator.wikimedia.org/T311752) (owner: 10Daimona Eaytoy)
[13:42:39] <wikibugs>	 (03CR) 10Lucas Werkmeister (WMDE): [C: 03+2] Add CampaignEvents to extension-list [mediawiki-config] - 10https://gerrit.wikimedia.org/r/813986 (https://phabricator.wikimedia.org/T311752) (owner: 10Daimona Eaytoy)
[13:43:31] <wikibugs>	 (03Merged) 10jenkins-bot: Add CampaignEvents to extension-list [mediawiki-config] - 10https://gerrit.wikimedia.org/r/813986 (https://phabricator.wikimedia.org/T311752) (owner: 10Daimona Eaytoy)
[13:44:28] <Lucas_WMDE>	 not sure if extension-list needs to be scapped in prod but let’s just do it
[13:44:37] <Lucas_WMDE>	 and I think I might also do a sync-world at the end
[13:44:41] <Lucas_WMDE>	 just in case
[13:45:02] <Lucas_WMDE>	 (if anything I say sounds like a bad idea, do let me know ^^)
[13:45:06] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.ganeti.addnode for new host ganeti2028.codfw.wmnet to cluster codfw and group A
[13:45:10] <Daimona>	 Yeah, I was also trying to find out if that's the case
[13:45:20] <wikibugs>	 (03PS2) 10Lucas Werkmeister (WMDE): Add config variable for the CampaignEvents extension [mediawiki-config] - 10https://gerrit.wikimedia.org/r/813989 (https://phabricator.wikimedia.org/T311752) (owner: 10Daimona Eaytoy)
[13:46:07] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti2028.codfw.wmnet to cluster codfw and group A
[13:46:27] <wikibugs>	 (03PS2) 10Eevans: [DRAFT]: Bootstrap new AQS Cassandra nodes (eqiad) [puppet] - 10https://gerrit.wikimedia.org/r/812426 (https://phabricator.wikimedia.org/T307802)
[13:47:26] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[13:47:59] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy1002 Synchronized wmf-config/extension-list: Config: [[gerrit:813986|Add CampaignEvents to extension-list (T311752)]] (duration: 03m 08s)
[13:48:00] <icinga-wm>	 RECOVERY - SSH on db1109.mgmt is OK: SSH OK - OpenSSH_7.4 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[13:48:04] <stashbot>	 T311752: Release V0 of the CampaignEvents extension to the Beta Cluster - https://phabricator.wikimedia.org/T311752
[13:48:09] <wikibugs>	 (03CR) 10Lucas Werkmeister (WMDE): [C: 03+2] Add config variable for the CampaignEvents extension [mediawiki-config] - 10https://gerrit.wikimedia.org/r/813989 (https://phabricator.wikimedia.org/T311752) (owner: 10Daimona Eaytoy)
[13:48:58] <wikibugs>	 (03Merged) 10jenkins-bot: Add config variable for the CampaignEvents extension [mediawiki-config] - 10https://gerrit.wikimedia.org/r/813989 (https://phabricator.wikimedia.org/T311752) (owner: 10Daimona Eaytoy)
[13:50:01] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[13:50:02] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[13:50:06] <wikibugs>	 (03PS2) 10Jbond: P:postgress::database: add docs and fix minor lint issues [puppet] - 10https://gerrit.wikimedia.org/r/814809 (https://phabricator.wikimedia.org/T313217)
[13:50:39] <wikibugs>	 (03PS2) 10Lucas Werkmeister (WMDE): Enable the CampaignEvents extension on beta [mediawiki-config] - 10https://gerrit.wikimedia.org/r/813990 (https://phabricator.wikimedia.org/T311752) (owner: 10Daimona Eaytoy)
[13:51:02] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[13:51:06] <icinga-wm>	 PROBLEM - Check systemd state on mw1383 is CRITICAL: CRITICAL - degraded: The following units failed: php7.2-fpm_check_restart.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:53:17] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy1002 Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:813989|Add config variable for the CampaignEvents extension (T311752)]] (no-op) (duration: 02m 55s)
[13:53:23] <stashbot>	 T311752: Release V0 of the CampaignEvents extension to the Beta Cluster - https://phabricator.wikimedia.org/T311752
[13:54:22] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P31354 and previous config saved to /var/cache/conftool/dbconfig/20220718-135421-ladsgroup.json
[13:54:29] <wikibugs>	 (03CR) 10Lucas Werkmeister (WMDE): [C: 03+2] "Note that the extension won’t *actually* be enabled until change I48805455fc wires up CommonSettings(-labs).php to load and configure the " [mediawiki-config] - 10https://gerrit.wikimedia.org/r/813990 (https://phabricator.wikimedia.org/T311752) (owner: 10Daimona Eaytoy)
[13:55:16] <Lucas_WMDE>	 jouncebot: next
[13:55:16] <jouncebot>	 In 1 hour(s) and 34 minute(s): Wikimedia Portals Update (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220718T1530)
[13:55:23] <wikibugs>	 (03Merged) 10jenkins-bot: Enable the CampaignEvents extension on beta [mediawiki-config] - 10https://gerrit.wikimedia.org/r/813990 (https://phabricator.wikimedia.org/T311752) (owner: 10Daimona Eaytoy)
[13:55:29] <Lucas_WMDE>	 okay, I think we’ll run over a bit but it should be okay
[13:56:08] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[13:57:01] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[13:57:02] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[13:57:14] <wikibugs>	 10SRE, 10Ganeti, 10Infrastructure-Foundations: Upgrade ganeti/codfw to Bullseye - https://phabricator.wikimedia.org/T311686 (10MoritzMuehlenhoff)
[13:57:59] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[13:58:27] <wikibugs>	 (03PS3) 10Lucas Werkmeister (WMDE): Load and configure the CampaignEvents extension where enabled [mediawiki-config] - 10https://gerrit.wikimedia.org/r/813991 (https://phabricator.wikimedia.org/T311752) (owner: 10Daimona Eaytoy)
[13:58:51] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy1002 Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:813990|Enable the CampaignEvents extension on beta (T311752)]] (no-op) (duration: 02m 43s)
[13:58:54] <stashbot>	 T311752: Release V0 of the CampaignEvents extension to the Beta Cluster - https://phabricator.wikimedia.org/T311752
[14:00:07] <wikibugs>	 (03CR) 10Lucas Werkmeister (WMDE): [C: 03+2] Load and configure the CampaignEvents extension where enabled [mediawiki-config] - 10https://gerrit.wikimedia.org/r/813991 (https://phabricator.wikimedia.org/T311752) (owner: 10Daimona Eaytoy)
[14:00:19] <Lucas_WMDE>	 (backport+config window continues, for the record)
[14:01:56] <wikibugs>	 (03Merged) 10jenkins-bot: Load and configure the CampaignEvents extension where enabled [mediawiki-config] - 10https://gerrit.wikimedia.org/r/813991 (https://phabricator.wikimedia.org/T311752) (owner: 10Daimona Eaytoy)
[14:02:26] <Lucas_WMDE>	 Daimona: I’ve pulled the last change to mwdebug1001, can you quickly check that the extension definitely isn’t enabled in production?
[14:02:41] <Lucas_WMDE>	 perhaps load some special page it would provide, or something
[14:02:43] <Daimona>	 \o/ sure
[14:03:03] <Lucas_WMDE>	 I can’t see it in Wikidata’s Special:Version which is already a good sign
[14:03:08] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[14:04:14] <Daimona>	 Yup, can't see it in prod
[14:04:58] <Lucas_WMDE>	 ok thanks
[14:07:18] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[14:07:19] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[14:08:30] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy1002 Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:813991|Load and configure the CampaignEvents extension where enabled (T311752)]] (1/2: should be no-op) (duration: 02m 51s)
[14:08:34] <stashbot>	 T311752: Release V0 of the CampaignEvents extension to the Beta Cluster - https://phabricator.wikimedia.org/T311752
[14:09:27] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1184 (T312984)', diff saved to https://phabricator.wikimedia.org/P31355 and previous config saved to /var/cache/conftool/dbconfig/20220718-140926-ladsgroup.json
[14:09:28] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 10:00:00 on db1128.eqiad.wmnet with reason: Maintenance
[14:09:32] <stashbot>	 T312984: Adjust the field type of flaggedpages.fp_pending_since to fixed binary on wmf wikis - https://phabricator.wikimedia.org/T312984
[14:09:42] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1128.eqiad.wmnet with reason: Maintenance
[14:09:47] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1128 (T312984)', diff saved to https://phabricator.wikimedia.org/P31356 and previous config saved to /var/cache/conftool/dbconfig/20220718-140947-ladsgroup.json
[14:09:54] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on api_appserver in codfw on alert1001 is CRITICAL: cluster=api_appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=codfw+prometheus/ops&var-cluster=api_appserver&var-method=GET
[14:11:13] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[14:11:46] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy1002 Synchronized wmf-config/CommonSettings-labs.php: Config: [[gerrit:813991|Load and configure the CampaignEvents extension where enabled (T311752)]] (2/2: should be prod no-op) (duration: 02m 40s)
[14:12:53] <wikibugs>	 (03PS1) 10PipelineBot: citoid: pipeline bot promote [deployment-charts] - 10https://gerrit.wikimedia.org/r/814820
[14:12:56] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on api_appserver in codfw on alert1001 is CRITICAL: cluster=api_appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=codfw+prometheus/ops&var-cluster=api_appserver&var-method=GET
[14:13:24] <Lucas_WMDE>	 Daimona: it looks like it should be enabled in Beta by now
[14:13:30] <Lucas_WMDE>	 (https://integration.wikimedia.org/ci/job/beta-scap-sync-world/60217/console)
[14:13:34] <Daimona>	 Yup, it's there!
[14:13:37] <Lucas_WMDE>	 yay
[14:13:45] <Lucas_WMDE>	 I’ll do a final sync-world in production just to make sure everything’s clean
[14:13:52] <Daimona>	 Noice!
[14:13:55] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1128 (T312984)', diff saved to https://phabricator.wikimedia.org/P31357 and previous config saved to /var/cache/conftool/dbconfig/20220718-141354-ladsgroup.json
[14:13:55] <Daimona>	 Thanks again.
[14:13:57] <Lucas_WMDE>	 since I’m not sure the extension’s i18n would’ve been built when it was in wmf.19 but not yet in extension-list
[14:13:58] <Lucas_WMDE>	 np
[14:14:32] <icinga-wm>	 RECOVERY - High average GET latency for mw requests on api_appserver in codfw on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=codfw+prometheus/ops&var-cluster=api_appserver&var-method=GET
[14:14:47] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy1002 Started scap: refresh everything after adding CampaignEvents to extension-list (T311752, only enabled in Beta so far), just in case
[14:14:52] <stashbot>	 T311752: Release V0 of the CampaignEvents extension to the Beta Cluster - https://phabricator.wikimedia.org/T311752
[14:16:18] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[14:18:42] <thcipriani>	 Lucas_WMDE: looked like a big backport window today, nice work---thank you <3
[14:18:52] <Lucas_WMDE>	 you’re welcome :)
[14:18:56] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[14:18:57] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[14:19:24] <Lucas_WMDE>	 Tks4Fish: sorry we had no time for your patches today, please schedule them another time
[14:19:39] <Lucas_WMDE>	 ah, I see you already did, for the late window
[14:20:11] <Daimona>	 So is it officially over?
[14:20:35] <Lucas_WMDE>	 not until the sync-world finishes, but I’m not going to do more patches after it
[14:20:42] <Lucas_WMDE>	 (currently at sync-proxies 87% btw)
[14:21:21] <Daimona>	 Yeah, just asking because I didn't want to open the champagne before 100%
[14:21:35] <Lucas_WMDE>	 :D
[14:22:04] <Lucas_WMDE>	 if champagne is for v0 on beta, how are you going to celebrate production deployment? ;)
[14:22:47] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[14:22:50] <Daimona>	 champagne, but with an extra 0 on the price tag
[14:22:59] <Lucas_WMDE>	 fair enough
[14:23:29] <wikibugs>	 10SRE-OnFire, 10SRE Observability (FY2022/2023-Q1): Set up POC Dispatch environment and evaluate its viability - https://phabricator.wikimedia.org/T309033 (10herron)
[14:23:54] <Daimona>	 Well, guess I'll be happy with an iced tea for now.
[14:24:07] <Lucas_WMDE>	 93% sync-apaches
[14:24:13] <Lucas_WMDE>	 it should be almost done
[14:24:46] <wikibugs>	 10SRE-OnFire, 10SRE Observability (FY2022/2023-Q1): Set up POC Dispatch environment and evaluate its viability - https://phabricator.wikimedia.org/T309033 (10herron)
[14:25:09] <wikibugs>	 10SRE-OnFire, 10SRE Observability (FY2022/2023-Q1): Set up POC Dispatch environment and evaluate its viability - https://phabricator.wikimedia.org/T309033 (10herron)
[14:25:15] <Daimona>	 Oh, > 90%, that's when things usually start breaking
[14:25:22] <Lucas_WMDE>	 oh wow it tells me the rsync transfer was a total of 900 gigabytes
[14:25:33] <Lucas_WMDE>	 (average some 2½ gigs per host)
[14:25:38] <Reedy>	 o_0
[14:26:46] <wikibugs>	 10SRE-OnFire, 10SRE Observability (FY2022/2023-Q1): Set up POC Dispatch environment and evaluate its viability - https://phabricator.wikimedia.org/T309033 (10herron) p:05Triage→03Medium
[14:27:09] <Lucas_WMDE>	 I suspect this is the total file size, not the amount of transferred data
[14:27:35] <Lucas_WMDE>	 earlier on mwdebug the scap pull reported 280k files, 10 GB total file size, but just 1 GB transferred
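A quick sanity check of the figures quoted above, as a minimal sketch. The inputs are only the approximate numbers mentioned in the chat (900 GB total, about 2.5 GB per host, and 1 GB transferred out of 10 GB on mwdebug). The function names are hypothetical and are not part of scap or rsync.

```python
# Back-of-the-envelope arithmetic on the numbers quoted in the chat.
# Nothing here is measured; every input is an approximate figure from the log.

def implied_host_count(total_gb: float, per_host_gb: float) -> float:
    """Number of hosts implied by a fleet-wide total and a per-host average."""
    return total_gb / per_host_gb


def transferred_fraction(transferred_gb: float, total_file_size_gb: float) -> float:
    """Share of the total file size that was actually sent over the wire."""
    return transferred_gb / total_file_size_gb


# 900 GB reported in total, averaging roughly 2.5 GB per host.
hosts = implied_host_count(900, 2.5)

# On mwdebug: 10 GB total file size, but only about 1 GB transferred.
mwdebug_fraction = transferred_fraction(1, 10)

print(f"implied hosts: {hosts:.0f}")
print(f"mwdebug transferred fraction: {mwdebug_fraction:.0%}")
```

The mwdebug figures show how far "total file size" and "transferred" can diverge, which is consistent with the suspicion that the 900 GB figure is a total file size. Whether the same ratio applied to the full sync-world is not something the log establishes.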
[14:27:44] <Lucas_WMDE>	 hm
[14:29:00] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1128', diff saved to https://phabricator.wikimedia.org/P31358 and previous config saved to /var/cache/conftool/dbconfig/20220718-142859-ladsgroup.json
[14:29:00] <thcipriani>	 ^ dancy that seems like...a lot
[14:29:27] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy1002 Finished scap: refresh everything after adding CampaignEvents to extension-list (T311752, only enabled in Beta so far), just in case (duration: 14m 40s)
[14:29:30] <Lucas_WMDE>	 I can keep the terminal window open if you want the full outputs
[14:29:31] <stashbot>	 T311752: Release V0 of the CampaignEvents extension to the Beta Cluster - https://phabricator.wikimedia.org/T311752
[14:29:39] <Lucas_WMDE>	 !log UTC afternoon backport+config window done
[14:29:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:29:42] <thcipriani>	 of transfer for a sync-world that was a cleanup, (is that right Lucas_WMDE -- no new l10n expected?)
[14:29:49] <Lucas_WMDE>	 yes
[14:29:57] <dancy>	 hmm?
[14:30:11] <thcipriani>	 I have a suspicion that there's some process that's invalidating l10n when it's unneeded
[14:30:13] <Lucas_WMDE>	 the l10n rebuild finished in less than a minute too
[14:30:50] <thcipriani>	 whether that's in scap or on the deployment server: I'm unsure. But rsync seems to be syncing all l10n on each sync-world (or the last few I've done)
[14:30:57] <Lucas_WMDE>	 hm
[14:31:19] <dancy>	 thcipriani: Even for a back-to-back run?
[14:32:10] <thcipriani>	 dancy: I only did one, and it didn't do it for back-to-back runs last I checked (a couple weeks ago) but when I try it at the beginning of the backport window it seems to happen every time
[14:32:39] <thcipriani>	 which makes me think it could be some automated process on the deployment box somewhere
[14:32:55] <thcipriani>	 the good news: everything works correctly; the bad news: takes a long time
[14:32:57] <dancy>	 There aren't any such processes that I'm aware of.
[14:33:12] <thcipriani>	 yeah, same
[14:33:40] <Lucas_WMDE>	 the new l10n backports are only for REL1_ branches, not wmf. ones, right?
[14:33:46] <Lucas_WMDE>	 (now-ish)
[14:33:50] <Lucas_WMDE>	 (*new-ish)
[14:34:01] <wikibugs>	 (03PS2) 10Jbond: C:postgress::server: add replication slot support [puppet] - 10https://gerrit.wikimedia.org/r/814810 (https://phabricator.wikimedia.org/T313217)
[14:34:05] <Lucas_WMDE>	 otherwise that might be an explanation
[14:34:29] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1178 (T313070)', diff saved to https://phabricator.wikimedia.org/P31359 and previous config saved to /var/cache/conftool/dbconfig/20220718-143428-marostegui.json
[14:34:30] <wikibugs>	 (03CR) 10Volans: [C: 03+1] "LGTM" [software/homer/deploy] - 10https://gerrit.wikimedia.org/r/814803 (https://phabricator.wikimedia.org/T304710) (owner: 10Ayounsi)
[14:34:34] <stashbot>	 T313070: Adjust the field type of wb_changes.change_time to fixed binary on wmf wikis - https://phabricator.wikimedia.org/T313070
[14:36:21] <wikibugs>	 (03PS1) 10Jbond: O:puppetdb: enable postgress slots for replication [puppet] - 10https://gerrit.wikimedia.org/r/814824 (https://phabricator.wikimedia.org/T313217)
[14:41:16] <wikibugs>	 (03CR) 10Jbond: [V: 03+1] "PCC SUCCESS (NOOP 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/36282/console" [puppet] - 10https://gerrit.wikimedia.org/r/814824 (https://phabricator.wikimedia.org/T313217) (owner: 10Jbond)
[14:41:55] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] C:postgress::server: add replication slot support [puppet] - 10https://gerrit.wikimedia.org/r/814810 (https://phabricator.wikimedia.org/T313217) (owner: 10Jbond)
[14:42:22] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.reboot-single for host ganeti2012.codfw.wmnet
[14:44:05] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1128', diff saved to https://phabricator.wikimedia.org/P31360 and previous config saved to /var/cache/conftool/dbconfig/20220718-144404-ladsgroup.json
[14:45:37] <icinga-wm>	 PROBLEM - mailman archives on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[14:46:39] <wikibugs>	 (03CR) 10Andrea Denisse: [C: 03+1] "LGTM!" [puppet] - 10https://gerrit.wikimedia.org/r/799001 (https://phabricator.wikimedia.org/T305175) (owner: 10Cwhite)
[14:47:09] <icinga-wm>	 RECOVERY - mailman archives on lists1001 is OK: HTTP OK: HTTP/1.1 200 OK - 48391 bytes in 0.121 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[14:49:34] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P31361 and previous config saved to /var/cache/conftool/dbconfig/20220718-144934-marostegui.json
[14:50:32] <wikibugs>	 (03PS1) 10Jbond: test reverting storconfig change [software/puppet-compiler] - 10https://gerrit.wikimedia.org/r/814826
[14:51:15] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2012.codfw.wmnet
[14:51:30] <wikibugs>	 (03CR) 10Andrea Denisse: [C: 03+1] "LGTM!" [puppet] - 10https://gerrit.wikimedia.org/r/813724 (https://phabricator.wikimedia.org/T222826) (owner: 10Cwhite)
[14:52:30] <wikibugs>	 (03CR) 10Andrea Denisse: [C: 03+1] profile: make loki data directory configurable [puppet] - 10https://gerrit.wikimedia.org/r/813715 (https://phabricator.wikimedia.org/T222826) (owner: 10Cwhite)
[14:52:34] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] test reverting storconfig change [software/puppet-compiler] - 10https://gerrit.wikimedia.org/r/814826 (owner: 10Jbond)
[14:53:55] <wikibugs>	 (03PS3) 10Ayounsi: Netbox _get_circuits: add patch panel support [software/homer] - 10https://gerrit.wikimedia.org/r/813604 (https://phabricator.wikimedia.org/T304710)
[14:53:57] <wikibugs>	 (03PS1) 10Ayounsi: Add Python 3.10 support [software/homer] - 10https://gerrit.wikimedia.org/r/814827
[14:55:17] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.ganeti.addnode for new host ganeti2012.codfw.wmnet to cluster codfw and group C
[14:56:12] <wikibugs>	 (03CR) 10Ayounsi: "This returns error:" [software/homer] - 10https://gerrit.wikimedia.org/r/814827 (owner: 10Ayounsi)
[14:56:57] <wikibugs>	 (03PS4) 10Ayounsi: Netbox _get_circuits: add patch panel support [software/homer] - 10https://gerrit.wikimedia.org/r/813604 (https://phabricator.wikimedia.org/T304710)
[14:58:21] <wikibugs>	 (03CR) 10Dzahn: "I am just trying to avoid paging the entire SRE team. When switching to new types of monitoring there often is a false positive or some fo" [puppet] - 10https://gerrit.wikimedia.org/r/812846 (https://phabricator.wikimedia.org/T305847) (owner: 10Filippo Giunchedi)
[14:58:24] <wikibugs>	 (03PS5) 10Dbrant: Add sampling to android.breadcrumbs event stream. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/811765 (https://phabricator.wikimedia.org/T310847)
[14:59:07] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti2012.codfw.wmnet to cluster codfw and group C
[14:59:10] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1128 (T312984)', diff saved to https://phabricator.wikimedia.org/P31362 and previous config saved to /var/cache/conftool/dbconfig/20220718-145909-ladsgroup.json
[14:59:11] <wikibugs>	 (03CR) 10Ahmon Dancy: safe-service-restart.py: Avoid uninitialized access to 'status' (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/807624 (https://phabricator.wikimedia.org/T311182) (owner: 10Ahmon Dancy)
[14:59:15] <stashbot>	 T312984: Adjust the field type of flaggedpages.fp_pending_since to fixed binary on wmf wikis - https://phabricator.wikimedia.org/T312984
[14:59:47] <wikibugs>	 (03PS1) 10Jbond: create_puppetconf  no longer takes the directory parameter [puppet] - 10https://gerrit.wikimedia.org/r/814832
[15:00:04] <wikibugs>	 (03CR) 10Jbond: [V: 03+2 C: 03+2] create_puppetconf  no longer takes the directory parameter [puppet] - 10https://gerrit.wikimedia.org/r/814832 (owner: 10Jbond)
[15:00:41] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] Add Python 3.10 support [software/homer] - 10https://gerrit.wikimedia.org/r/814827 (owner: 10Ayounsi)
[15:02:08] <wikibugs>	 (03PS5) 10Ayounsi: Netbox _get_circuits: add patch panel support [software/homer] - 10https://gerrit.wikimedia.org/r/813604 (https://phabricator.wikimedia.org/T304710)
[15:03:05] <wikibugs>	 (03CR) 10Jbond: [C: 03+2] P:postgress::database: add docs and fix minor lint issues [puppet] - 10https://gerrit.wikimedia.org/r/814809 (https://phabricator.wikimedia.org/T313217) (owner: 10Jbond)
[15:04:39] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P31363 and previous config saved to /var/cache/conftool/dbconfig/20220718-150439-marostegui.json
[15:08:17] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] Netbox _get_circuits: add patch panel support [software/homer] - 10https://gerrit.wikimedia.org/r/813604 (https://phabricator.wikimedia.org/T304710) (owner: 10Ayounsi)
[15:08:51] <wikibugs>	 (03PS2) 10Muehlenhoff: tcpircbot: Add SPDX headers [puppet] - 10https://gerrit.wikimedia.org/r/811231 (https://phabricator.wikimedia.org/T308013)
[15:08:58] <wikibugs>	 (03PS3) 10Jbond: C:postgress::server: add replication slot support [puppet] - 10https://gerrit.wikimedia.org/r/814810 (https://phabricator.wikimedia.org/T313217)
[15:09:00] <wikibugs>	 (03PS2) 10Jbond: O:puppetdb: enable postgress slots for replication [puppet] - 10https://gerrit.wikimedia.org/r/814824 (https://phabricator.wikimedia.org/T313217)
[15:09:30] <wikibugs>	 (03PS4) 10Jbond: C:postgress::server: add replication slot support [puppet] - 10https://gerrit.wikimedia.org/r/814810 (https://phabricator.wikimedia.org/T313217)
[15:09:36] <wikibugs>	 (03PS3) 10Jbond: O:puppetdb: enable postgress slots for replication [puppet] - 10https://gerrit.wikimedia.org/r/814824 (https://phabricator.wikimedia.org/T313217)
[15:10:29] <wikibugs>	 10SRE, 10Ganeti, 10Infrastructure-Foundations: Upgrade ganeti/codfw to Bullseye - https://phabricator.wikimedia.org/T311686 (10MoritzMuehlenhoff)
[15:10:56] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] O:puppetdb: enable postgress slots for replication [puppet] - 10https://gerrit.wikimedia.org/r/814824 (https://phabricator.wikimedia.org/T313217) (owner: 10Jbond)
[15:11:22] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+2] tcpircbot: Add SPDX headers [puppet] - 10https://gerrit.wikimedia.org/r/811231 (https://phabricator.wikimedia.org/T308013) (owner: 10Muehlenhoff)
[15:13:26] <wikibugs>	 10SRE-OnFire, 10Discovery-Search, 10Wikidata, 10wdwb-tech, and 4 others: Only generate maxlag from pooled query service servers. - https://phabricator.wikimedia.org/T238751 (10Gehel)
[15:14:03] <wikibugs>	 10SRE-OnFire, 10Wikidata, 10Wikidata-Query-Service, 10wdwb-tech, and 4 others: Only generate maxlag from pooled query service servers. - https://phabricator.wikimedia.org/T238751 (10Gehel)
[15:17:03] <wikibugs>	 (03PS1) 10Bartosz Dziewoński: Ensure custom locales for Moment.js overrides, don't change 'en' [core] (wmf/1.39.0-wmf.19) - 10https://gerrit.wikimedia.org/r/814769 (https://phabricator.wikimedia.org/T313188)
[15:19:44] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1178 (T313070)', diff saved to https://phabricator.wikimedia.org/P31364 and previous config saved to /var/cache/conftool/dbconfig/20220718-151944-marostegui.json
[15:19:46] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on db1116.eqiad.wmnet with reason: Maintenance
[15:19:49] <stashbot>	 T313070: Adjust the field type of wb_changes.change_time to fixed binary on wmf wikis - https://phabricator.wikimedia.org/T313070
[15:19:54] <wikibugs>	 10SRE, 10Infrastructure-Foundations: Integrate Bullseye 11.4 point update - https://phabricator.wikimedia.org/T312637 (10MoritzMuehlenhoff)
[15:19:59] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1116.eqiad.wmnet with reason: Maintenance
[15:20:02] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on db1116.eqiad.wmnet with reason: Maintenance
[15:20:05] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1116.eqiad.wmnet with reason: Maintenance
[15:20:07] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on db1177.eqiad.wmnet with reason: Maintenance
[15:20:21] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1177.eqiad.wmnet with reason: Maintenance
[15:20:27] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depooling db1177 (T313070)', diff saved to https://phabricator.wikimedia.org/P31365 and previous config saved to /var/cache/conftool/dbconfig/20220718-152026-marostegui.json
[15:21:33] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1177 (T313070)', diff saved to https://phabricator.wikimedia.org/P31366 and previous config saved to /var/cache/conftool/dbconfig/20220718-152132-marostegui.json
[15:25:37] <wikibugs>	 (03PS2) 10Ayounsi: Add Python 3.10 support [software/homer] - 10https://gerrit.wikimedia.org/r/814827
[15:25:39] <wikibugs>	 (03PS6) 10Ayounsi: Netbox _get_circuits: add patch panel support [software/homer] - 10https://gerrit.wikimedia.org/r/813604 (https://phabricator.wikimedia.org/T304710)
[15:25:41] <wikibugs>	 (03PS1) 10Ayounsi: Workaround mypy type error on pyyaml [software/homer] - 10https://gerrit.wikimedia.org/r/814839
[15:29:18] <wikibugs>	 10SRE, 10Infrastructure-Foundations: Integrate Bullseye 11.4 point update - https://phabricator.wikimedia.org/T312637 (10MoritzMuehlenhoff)
[15:30:04] <jouncebot>	 jan_drewniak: How many deployers does it take to do Wikimedia Portals Update deploy? (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220718T1530).
[15:32:09] <wikibugs>	 10SRE, 10ops-codfw, 10Elasticsearch, 10Discovery-Search (Current work), 10Patch-For-Review: Degraded RAID on elastic2049 - https://phabricator.wikimedia.org/T311939 (10MPhamWMF)
[15:33:54] <wikibugs>	 10SRE, 10ops-codfw, 10Elasticsearch, 10Discovery-Search (Current work), 10Patch-For-Review: Degraded RAID on elastic2049 - https://phabricator.wikimedia.org/T311939 (10Gehel) a:05Papaul→03bking We have enough over capacity in that cluster, and this server should be scheduled for refresh next year. Le...
[15:35:14] <wikibugs>	 (03CR) 10PleaseStand: Add change_templatelinks_pk.py (031 comment) [software/schema-changes] - 10https://gerrit.wikimedia.org/r/814729 (https://phabricator.wikimedia.org/T312863) (owner: 10Ladsgroup)
[15:36:38] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P31367 and previous config saved to /var/cache/conftool/dbconfig/20220718-153637-marostegui.json
[15:36:53] <wikibugs>	 (03Abandoned) 10David Caro: DONOTMERGE: skeleteon for the replicaconfig service [puppet] - 10https://gerrit.wikimedia.org/r/780853 (owner: 10David Caro)
[15:37:29] <wikibugs>	 (03Abandoned) 10David Caro: novafullstack: allow running on codfw [puppet] - 10https://gerrit.wikimedia.org/r/811318 (owner: 10David Caro)
[15:38:00] <wikibugs>	 (03CR) 10Ayounsi: [V: 03+2 C: 03+2] Interface description: handle one more patch panel special case [software/homer/deploy] - 10https://gerrit.wikimedia.org/r/814803 (https://phabricator.wikimedia.org/T304710) (owner: 10Ayounsi)
[15:40:12] <wikibugs>	 (03CR) 10Ladsgroup: [C: 03+2] Add change_templatelinks_pk.py (031 comment) [software/schema-changes] - 10https://gerrit.wikimedia.org/r/814729 (https://phabricator.wikimedia.org/T312863) (owner: 10Ladsgroup)
[15:41:28] <wikibugs>	 (03PS1) 10Ladsgroup: change_templatelinks_pk: Fix check [software/schema-changes] - 10https://gerrit.wikimedia.org/r/814844
[15:42:21] <wikibugs>	 (03PS1) 10Jdrewniak: Bumping portals to master [mediawiki-config] - 10https://gerrit.wikimedia.org/r/814846 (https://phabricator.wikimedia.org/T128546)
[15:44:18] <wikibugs>	 (03CR) 10Ladsgroup: [C: 03+2] change_templatelinks_pk: Fix check [software/schema-changes] - 10https://gerrit.wikimedia.org/r/814844 (owner: 10Ladsgroup)
[15:44:29] <wikibugs>	 (03CR) 10Jdrewniak: [C: 03+2] Bumping portals to master [mediawiki-config] - 10https://gerrit.wikimedia.org/r/814846 (https://phabricator.wikimedia.org/T128546) (owner: 10Jdrewniak)
[15:45:08] <wikibugs>	 (03Merged) 10jenkins-bot: change_templatelinks_pk: Fix check [software/schema-changes] - 10https://gerrit.wikimedia.org/r/814844 (owner: 10Ladsgroup)
[15:45:28] <wikibugs>	 (03Merged) 10jenkins-bot: Bumping portals to master [mediawiki-config] - 10https://gerrit.wikimedia.org/r/814846 (https://phabricator.wikimedia.org/T128546) (owner: 10Jdrewniak)
[15:46:04] <wikibugs>	 10SRE-swift-storage, 10Maps, 10Product-Infrastructure-Team-Backlog, 10User-fgiunchedi: Followups for Tegola and Swift interactions - https://phabricator.wikimedia.org/T307184 (10fgiunchedi) >>! In T307184#8069141, @fgiunchedi wrote: > Overall the idea of sending additional headers is the right one @Jgianne...
[15:48:35] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[15:49:44] <logmsgbot>	 !log jdrewniak@deploy1002 Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:814846| Bumping portals to master (T128546)]] (duration: 03m 03s)
[15:49:48] <stashbot>	 T128546: [Recurring Task] Update Wikipedia and sister projects portals statistics - https://phabricator.wikimedia.org/T128546
[15:51:16] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[15:51:17] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[15:51:43] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P31368 and previous config saved to /var/cache/conftool/dbconfig/20220718-155143-marostegui.json
[15:52:14] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[15:52:43] <logmsgbot>	 !log jdrewniak@deploy1002 Synchronized portals: Wikimedia Portals Update: [[gerrit:814846| Bumping portals to master (T128546)]] (duration: 02m 59s)
[15:53:20] <wikibugs>	 (03PS1) 10Andrea Denisse: netmon: Add suppport for multiple backup/passive nodes in Puppet [puppet] - 10https://gerrit.wikimedia.org/r/814848 (https://phabricator.wikimedia.org/T309074)
[15:54:20] <wikibugs>	 (03PS24) 10Ayounsi: Decom cookbook: configure switches using cookbook [cookbooks] - 10https://gerrit.wikimedia.org/r/803262
[15:54:22] <wikibugs>	 (03PS2) 10Ayounsi: provision cookbook: configure switches using cookbook [cookbooks] - 10https://gerrit.wikimedia.org/r/811730
[15:54:36] <icinga-wm>	 RECOVERY - Check systemd state on netmon1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[15:54:45] <wikibugs>	 (03CR) 10Ayounsi: "thanks!" [cookbooks] - 10https://gerrit.wikimedia.org/r/803262 (owner: 10Ayounsi)
[15:55:21] <wikibugs>	 (03PS1) 10Filippo Giunchedi: smokeping: fix targets configuration for drmrs [puppet] - 10https://gerrit.wikimedia.org/r/814849 (https://phabricator.wikimedia.org/T169860)
[15:55:47] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] smokeping: fix targets configuration for drmrs [puppet] - 10https://gerrit.wikimedia.org/r/814849 (https://phabricator.wikimedia.org/T169860) (owner: 10Filippo Giunchedi)
[15:56:13] <wikibugs>	 (03PS2) 10Filippo Giunchedi: smokeping: fix targets configuration for drmrs [puppet] - 10https://gerrit.wikimedia.org/r/814849 (https://phabricator.wikimedia.org/T169860)
[15:57:43] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 03+2] smokeping: fix targets configuration for drmrs [puppet] - 10https://gerrit.wikimedia.org/r/814849 (https://phabricator.wikimedia.org/T169860) (owner: 10Filippo Giunchedi)
[16:02:31] <wikibugs>	 (03PS1) 10Ahmon Dancy: Avoid additional errors if connection to poolcounter server fails [software/python-poolcounter] - 10https://gerrit.wikimedia.org/r/814851 (https://phabricator.wikimedia.org/T310835)
[16:05:29] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] Avoid additional errors if connection to poolcounter server fails [software/python-poolcounter] - 10https://gerrit.wikimedia.org/r/814851 (https://phabricator.wikimedia.org/T310835) (owner: 10Ahmon Dancy)
[16:06:48] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1177 (T313070)', diff saved to https://phabricator.wikimedia.org/P31369 and previous config saved to /var/cache/conftool/dbconfig/20220718-160648-marostegui.json
[16:06:50] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on db1126.eqiad.wmnet with reason: Maintenance
[16:06:53] <stashbot>	 T313070: Adjust the field type of wb_changes.change_time to fixed binary on wmf wikis - https://phabricator.wikimedia.org/T313070
[16:07:03] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1126.eqiad.wmnet with reason: Maintenance
[16:07:08] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depooling db1126 (T313070)', diff saved to https://phabricator.wikimedia.org/P31370 and previous config saved to /var/cache/conftool/dbconfig/20220718-160708-marostegui.json
[16:07:21] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[16:08:14] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1126 (T313070)', diff saved to https://phabricator.wikimedia.org/P31371 and previous config saved to /var/cache/conftool/dbconfig/20220718-160813-marostegui.json
[16:09:58] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[16:09:59] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[16:12:22] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[16:17:03] <wikibugs>	 (03PS1) 10Ebernhardson: reindex: Detect index type from live mappings [extensions/CirrusSearch] (wmf/1.39.0-wmf.20) - 10https://gerrit.wikimedia.org/r/814770
[16:17:14] <wikibugs>	 (03PS1) 10Ebernhardson: reindex: Detect index type from live mappings [extensions/CirrusSearch] (wmf/1.39.0-wmf.19) - 10https://gerrit.wikimedia.org/r/814771
[16:23:19] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1126', diff saved to https://phabricator.wikimedia.org/P31372 and previous config saved to /var/cache/conftool/dbconfig/20220718-162319-marostegui.json
[16:24:22] <wikibugs>	 (03PS2) 10Ahmon Dancy: Avoid additional errors if connection to poolcounter server fails [software/python-poolcounter] - 10https://gerrit.wikimedia.org/r/814851 (https://phabricator.wikimedia.org/T310835)
[16:28:01] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] Avoid additional errors if connection to poolcounter server fails [software/python-poolcounter] - 10https://gerrit.wikimedia.org/r/814851 (https://phabricator.wikimedia.org/T310835) (owner: 10Ahmon Dancy)
[16:28:14] <wikibugs>	 10SRE-swift-storage, 10Maps, 10Product-Infrastructure-Team-Backlog, 10User-fgiunchedi: Followups for Tegola and Swift interactions - https://phabricator.wikimedia.org/T307184 (10Jgiannelos) Technically we can do this (although it wasn't very trivial from a quick look at the s3 go sdk). Maybe it's worth revi...
[16:29:21] <wikibugs>	 10SRE, 10Security-Team, 10WMF-Legal, 10SecTeam-Processed, and 2 others: T166179 has attachments that perhaps shouldn't have been made public - https://phabricator.wikimedia.org/T313125 (10sbassett) No need to directly engage #wmf-legal on this.  The issue appears to be resolved by @RobH, so making this tas...
[16:29:27] <wikibugs>	 10SRE, 10Security-Team, 10WMF-Legal, 10SecTeam-Processed, and 2 others: T166179 has attachments that perhaps shouldn't have been made public - https://phabricator.wikimedia.org/T313125 (10sbassett)
[16:29:35] <wikibugs>	 10SRE, 10Security-Team, 10WMF-Legal, 10SecTeam-Processed, and 2 others: T166179 has attachments that perhaps shouldn't have been made public - https://phabricator.wikimedia.org/T313125 (10sbassett) p:05Triage→03Low a:03RobH
[16:30:02] <wikibugs>	 10SRE, 10Security-Team, 10WMF-Legal, 10SecTeam-Processed, and 2 others: T166179 has attachments that perhaps shouldn't have been made public - https://phabricator.wikimedia.org/T313125 (10sbassett) 05Open→03Resolved
[16:31:11] <wikibugs>	 (03CR) 10Ahmon Dancy: "Not sure what to do about the various CI errors.  I don't think they're related to the changes I made." [software/python-poolcounter] - 10https://gerrit.wikimedia.org/r/814851 (https://phabricator.wikimedia.org/T310835) (owner: 10Ahmon Dancy)
[16:37:38] <wikibugs>	 (03PS1) 10Majavah: openstack: wmcs-image-create: adapt for systemd based puppet runs [puppet] - 10https://gerrit.wikimedia.org/r/814857
[16:37:50] <wikibugs>	 (03PS2) 10Majavah: openstack: wmcs-image-create: adapt for systemd based puppet runs [puppet] - 10https://gerrit.wikimedia.org/r/814857
[16:38:24] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1126', diff saved to https://phabricator.wikimedia.org/P31373 and previous config saved to /var/cache/conftool/dbconfig/20220718-163824-marostegui.json
[16:52:20] <wikibugs>	 10SRE, 10DC-Ops, 10Patch-For-Review: Confirm support of PERC 750 raid controller - https://phabricator.wikimedia.org/T297913 (10RobH)
[16:53:30] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1126 (T313070)', diff saved to https://phabricator.wikimedia.org/P31374 and previous config saved to /var/cache/conftool/dbconfig/20220718-165329-marostegui.json
[16:53:31] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on db1101.eqiad.wmnet with reason: Maintenance
[16:53:36] <stashbot>	 T313070: Adjust the field type of wb_changes.change_time to fixed binary on wmf wikis - https://phabricator.wikimedia.org/T313070
[16:53:45] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1101.eqiad.wmnet with reason: Maintenance
[16:53:49] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depooling db1101:3318 (T313070)', diff saved to https://phabricator.wikimedia.org/P31375 and previous config saved to /var/cache/conftool/dbconfig/20220718-165349-marostegui.json
[16:54:02] <icinga-wm>	 PROBLEM - etcd request latencies on kubemaster2002 is CRITICAL: instance=10.192.16.48 operation=create https://wikitech.wikimedia.org/wiki/Etcd/Main_cluster https://grafana.wikimedia.org/d/000000435/kubernetes-api?orgId=1&viewPanel=28
[16:54:55] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1101:3318 (T313070)', diff saved to https://phabricator.wikimedia.org/P31376 and previous config saved to /var/cache/conftool/dbconfig/20220718-165455-marostegui.json
[16:56:18] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 1%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31377 and previous config saved to /var/cache/conftool/dbconfig/20220718-165617-root.json
[16:56:30] <icinga-wm>	 RECOVERY - etcd request latencies on kubemaster2002 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Etcd/Main_cluster https://grafana.wikimedia.org/d/000000435/kubernetes-api?orgId=1&viewPanel=28
[17:00:05] <jouncebot>	 ryankemper: Time to snap out of that daydream and deploy Wikidata Query Service weekly deploy. Get on with it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220718T1700).
[17:11:23] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 2%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31378 and previous config saved to /var/cache/conftool/dbconfig/20220718-171122-root.json
[17:11:45] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 03+2] Keystone: add a no-op userid hash generator [puppet] - 10https://gerrit.wikimedia.org/r/812403 (owner: 10Andrew Bogott)
[17:21:31] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 03+2] Remove cloudstore100[89] IPs from the dmz_cidr [puppet] - 10https://gerrit.wikimedia.org/r/810351 (https://phabricator.wikimedia.org/T311844) (owner: 10Andrew Bogott)
[17:23:10] <wikibugs>	 (03PS2) 10David Caro: wmcs.labstore: add some alerts for labstore [alerts] - 10https://gerrit.wikimedia.org/r/813926
[17:25:44] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] wmcs.labstore: add some alerts for labstore [alerts] - 10https://gerrit.wikimedia.org/r/813926 (owner: 10David Caro)
[17:26:17] <wikibugs>	 (03PS3) 10David Caro: wmcs.labstore: add some alerts for labstore [alerts] - 10https://gerrit.wikimedia.org/r/813926
[17:26:27] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 5%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31379 and previous config saved to /var/cache/conftool/dbconfig/20220718-172626-root.json
[17:28:39] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] wmcs.labstore: add some alerts for labstore [alerts] - 10https://gerrit.wikimedia.org/r/813926 (owner: 10David Caro)
[17:30:20] <wikibugs>	 10SRE-swift-storage, 10User-fgiunchedi: Shorten Thanos retention - https://phabricator.wikimedia.org/T311690 (10Aklapper)
[17:33:11] <wikibugs>	 (03PS3) 10Sohom Datta: Enable edit-in-sequence on Beta Wikisource for testing [mediawiki-config] - 10https://gerrit.wikimedia.org/r/810054 (https://phabricator.wikimedia.org/T308098)
[17:41:31] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 10%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31380 and previous config saved to /var/cache/conftool/dbconfig/20220718-174130-root.json
[17:43:08] <logmsgbot>	 !log ryankemper@cumin1001 START - Cookbook sre.hosts.reimage for host elastic2065.codfw.wmnet with OS bullseye
[17:47:47] <wikibugs>	 (03CR) 10Raymond Ndibe: [C: 03+1] "Not enough context to +2, so I'll just +1" [puppet] - 10https://gerrit.wikimedia.org/r/813275 (owner: 10David Caro)
[17:51:38] <icinga-wm>	 PROBLEM - etcd request latencies on kubemaster2002 is CRITICAL: instance=10.192.16.48 operation=create https://wikitech.wikimedia.org/wiki/Etcd/Main_cluster https://grafana.wikimedia.org/d/000000435/kubernetes-api?orgId=1&viewPanel=28
[17:54:08] <icinga-wm>	 RECOVERY - etcd request latencies on kubemaster2002 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Etcd/Main_cluster https://grafana.wikimedia.org/d/000000435/kubernetes-api?orgId=1&viewPanel=28
[17:56:35] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 25%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31381 and previous config saved to /var/cache/conftool/dbconfig/20220718-175634-root.json
[17:56:38] <wikibugs>	 (03PS1) 10Jdlrobson: Collapse sidebar by default for anonymous users [mediawiki-config] - 10https://gerrit.wikimedia.org/r/814865 (https://phabricator.wikimedia.org/T287609)
[17:57:00] <logmsgbot>	 !log ryankemper@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2065.codfw.wmnet with reason: host reimage
[17:59:19] <wikibugs>	 (03PS29) 10Jbond: beaker: add initial beaker files [puppet] - 10https://gerrit.wikimedia.org/r/809224 (https://phabricator.wikimedia.org/T253635)
[17:59:21] <wikibugs>	 (03PS1) 10Jbond: beaker: add a method to hack fixes specific to beaker [puppet] - 10https://gerrit.wikimedia.org/r/814866
[18:02:10] <wikibugs>	 (03PS1) 10Jdlrobson: Enable language switching button for logged-out users on non-pilot wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/814867 (https://phabricator.wikimedia.org/T312861)
[18:02:12] <wikibugs>	 (03PS1) 10Jdlrobson: Turn off fixed width in main namespace on Wikisource [mediawiki-config] - 10https://gerrit.wikimedia.org/r/814868 (https://phabricator.wikimedia.org/T311607)
[18:02:14] <wikibugs>	 (03PS1) 10Jdlrobson: Deploy the new grid layout [mediawiki-config] - 10https://gerrit.wikimedia.org/r/814869 (https://phabricator.wikimedia.org/T312241)
[18:02:49] <logmsgbot>	 !log ryankemper@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2065.codfw.wmnet with reason: host reimage
[18:04:48] <wikibugs>	 10SRE, 10Gerrit, 10serviceops, 10serviceops-collab, 10Release-Engineering-Team (The Decommission Mission 💀): replacement for gerrit2001 - https://phabricator.wikimedia.org/T243027 (10thcipriani)
[18:05:38] <wikibugs>	 (03CR) 10Herron: logstash: enable pipeline-managed index patterns (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/799001 (https://phabricator.wikimedia.org/T305175) (owner: 10Cwhite)
[18:07:37] <wikibugs>	 (03CR) 10Herron: [C: 03+1] profile: make loki data directory configurable [puppet] - 10https://gerrit.wikimedia.org/r/813715 (https://phabricator.wikimedia.org/T222826) (owner: 10Cwhite)
[18:07:39] <wikibugs>	 (03PS3) 10Dduvall: gitlab_runner: Handle changes to runner config [puppet] - 10https://gerrit.wikimedia.org/r/812402 (https://phabricator.wikimedia.org/T311746)
[18:08:20] <wikibugs>	 (03CR) 10Dduvall: "Thanks for the review, Jelto. I believe I've addressed your comments." [puppet] - 10https://gerrit.wikimedia.org/r/812402 (https://phabricator.wikimedia.org/T311746) (owner: 10Dduvall)
[18:08:43] <wikibugs>	 (03PS1) 10Ebernhardson: Turn off ApiFeatureUsage extension [mediawiki-config] - 10https://gerrit.wikimedia.org/r/814870 (https://phabricator.wikimedia.org/T313248)
[18:08:45] <wikibugs>	 (03PS1) 10Ebernhardson: Remove references to ApiFeatureUsage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/814871 (https://phabricator.wikimedia.org/T313248)
[18:08:56] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 03+2] Keystone: rearrange how service domains are configured. [puppet] - 10https://gerrit.wikimedia.org/r/812406 (owner: 10Andrew Bogott)
[18:09:07] <wikibugs>	 (03PS4) 10Andrew Bogott: Keystone: rearrange how service domains are configured. [puppet] - 10https://gerrit.wikimedia.org/r/812406
[18:09:11] <wikibugs>	 (03CR) 10Herron: [C: 03+1] hiera: deploy and enable loki on grafana hosts [puppet] - 10https://gerrit.wikimedia.org/r/813724 (https://phabricator.wikimedia.org/T222826) (owner: 10Cwhite)
[18:11:38] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 50%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31382 and previous config saved to /var/cache/conftool/dbconfig/20220718-181138-root.json
[18:15:12] <wikibugs>	 (03PS1) 10Ebernhardson: Remove unused wmgUseApiFeatureUsage config var [mediawiki-config] - 10https://gerrit.wikimedia.org/r/814873 (https://phabricator.wikimedia.org/T313248)
[18:16:27] <wikibugs>	 (03CR) 10Raymond Ndibe: "noop question here. Saw the word "Icinga" in at least two of the currently open patches and googled about it. I'd say that I get the idea o" [alerts] - 10https://gerrit.wikimedia.org/r/813926 (owner: 10David Caro)
[18:16:56] <logmsgbot>	 !log ryankemper@cumin1001 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2065.codfw.wmnet with OS bullseye
[18:17:16] <logmsgbot>	 !log ryankemper@cumin1001 START - Cookbook sre.hosts.reimage for host elastic2066.codfw.wmnet with OS bullseye
[18:17:18] <wikibugs>	 (03Abandoned) 10Ebernhardson: Turn off ApiFeatureUsage extension [mediawiki-config] - 10https://gerrit.wikimedia.org/r/814870 (https://phabricator.wikimedia.org/T313248) (owner: 10Ebernhardson)
[18:17:56] <wikibugs>	 (03PS2) 10Ebernhardson: Remove references to ApiFeatureUsage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/814871 (https://phabricator.wikimedia.org/T313248)
[18:19:00] <wikibugs>	 (03PS10) 10Dduvall: docker_registry_ha: Authorize GitLab trusted runners using JWT [puppet] - 10https://gerrit.wikimedia.org/r/793875 (https://phabricator.wikimedia.org/T308501)
[18:19:23] <wikibugs>	 (03CR) 10Dduvall: docker_registry_ha: Authorize GitLab trusted runners using JWT (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/793875 (https://phabricator.wikimedia.org/T308501) (owner: 10Dduvall)
[18:24:07] <wikibugs>	 (03CR) 10Jforrester: "You have to do these kinds of changes as two or three different patches for deploy safety; first disable in IS (nothing to do here), then " [mediawiki-config] - 10https://gerrit.wikimedia.org/r/814871 (https://phabricator.wikimedia.org/T313248) (owner: 10Ebernhardson)
[18:24:18] <wikibugs>	 (03PS3) 10Ebernhardson: Remove references to ApiFeatureUsage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/814871 (https://phabricator.wikimedia.org/T313248)
[18:24:20] <wikibugs>	 (03PS2) 10Ebernhardson: Remove unused wmgUseApiFeatureUsage config var [mediawiki-config] - 10https://gerrit.wikimedia.org/r/814873 (https://phabricator.wikimedia.org/T313248)
[18:26:42] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 75%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31384 and previous config saved to /var/cache/conftool/dbconfig/20220718-182642-root.json
[18:27:26] <wikibugs>	 (03CR) 10Ebernhardson: Remove references to ApiFeatureUsage (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/814871 (https://phabricator.wikimedia.org/T313248) (owner: 10Ebernhardson)
[18:28:08] <wikibugs>	 (03CR) 10Clare Ming: Deploy the new grid layout (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/814869 (https://phabricator.wikimedia.org/T312241) (owner: 10Jdlrobson)
[18:29:46] <wikibugs>	 (03CR) 10Clare Ming: [C: 03+1] Turn off fixed width in main namespace on Wikisource [mediawiki-config] - 10https://gerrit.wikimedia.org/r/814868 (https://phabricator.wikimedia.org/T311607) (owner: 10Jdlrobson)
[18:32:18] <wikibugs>	 (03CR) 10Clare Ming: Enable language switching button for logged-out users on non-pilot wikis (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/814867 (https://phabricator.wikimedia.org/T312861) (owner: 10Jdlrobson)
[18:32:42] <wikibugs>	 (03CR) 10Herron: logstash: duplicate alert logs for loki target (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/806349 (https://phabricator.wikimedia.org/T222826) (owner: 10Cwhite)
[18:35:24] <logmsgbot>	 !log ryankemper@cumin1001 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic2066.codfw.wmnet with OS bullseye
[18:36:07] <logmsgbot>	 !log ryankemper@cumin1001 START - Cookbook sre.hosts.reimage for host elastic2066.codfw.wmnet with OS bullseye
[18:39:53] <wikibugs>	 (03PS1) 10Zabe: cassandra: Add SPDX headers to cassandra profile [puppet] - 10https://gerrit.wikimedia.org/r/814876 (https://phabricator.wikimedia.org/T308013)
[18:39:55] <wikibugs>	 (03PS1) 10Zabe: certspotter: Add SPDX headers to certspotter profile [puppet] - 10https://gerrit.wikimedia.org/r/814877 (https://phabricator.wikimedia.org/T308013)
[18:39:57] <wikibugs>	 (03PS1) 10Zabe: chartmuseum: Add SPDX headers to chartmuseum profile [puppet] - 10https://gerrit.wikimedia.org/r/814878 (https://phabricator.wikimedia.org/T308013)
[18:39:59] <wikibugs>	 (03PS1) 10Zabe: codesearch: Add SPDX headers to codesearch profile [puppet] - 10https://gerrit.wikimedia.org/r/814879 (https://phabricator.wikimedia.org/T308013)
[18:40:01] <wikibugs>	 (03PS1) 10Zabe: conftool: Add SPDX headers to conftool profile [puppet] - 10https://gerrit.wikimedia.org/r/814880 (https://phabricator.wikimedia.org/T308013)
[18:40:03] <wikibugs>	 (03PS1) 10Zabe: cumin: Add SPDX headers to cumin profile [puppet] - 10https://gerrit.wikimedia.org/r/814881 (https://phabricator.wikimedia.org/T308013)
[18:40:07] <wikibugs>	 (03PS1) 10Zabe: dbbackups: Add SPDX headers to dbbackups profile [puppet] - 10https://gerrit.wikimedia.org/r/814882 (https://phabricator.wikimedia.org/T308013)
[18:40:09] <wikibugs>	 (03PS1) 10Zabe: debdeploy: Add SPDX headers to debdeploy profile [puppet] - 10https://gerrit.wikimedia.org/r/814883 (https://phabricator.wikimedia.org/T308013)
[18:40:11] <wikibugs>	 (03PS1) 10Zabe: diffscan: Add SPDX headers to diffscan profile [puppet] - 10https://gerrit.wikimedia.org/r/814884 (https://phabricator.wikimedia.org/T308013)
[18:41:46] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 100%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31385 and previous config saved to /var/cache/conftool/dbconfig/20220718-184146-root.json
[18:42:50] <wikibugs>	 (03PS4) 10Ebernhardson: Remove references to ApiFeatureUsage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/814871 (https://phabricator.wikimedia.org/T313248)
[18:42:52] <wikibugs>	 (03PS3) 10Ebernhardson: Remove i18n and IS references to ApiFeatureUsage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/814873 (https://phabricator.wikimedia.org/T313248)
[18:43:48] <wikibugs>	 (03PS2) 10Zabe: cassandra: Add SPDX headers to cassandra profile [puppet] - 10https://gerrit.wikimedia.org/r/814876 (https://phabricator.wikimedia.org/T308013)
[18:48:39] <wikibugs>	 (03PS2) 10Zabe: conftool: Add SPDX headers to conftool profile [puppet] - 10https://gerrit.wikimedia.org/r/814880 (https://phabricator.wikimedia.org/T308013)
[18:53:52] <wikibugs>	 (03PS2) 10Zabe: cumin: Add SPDX headers to cumin profile [puppet] - 10https://gerrit.wikimedia.org/r/814881 (https://phabricator.wikimedia.org/T308013)
[18:57:55] <wikibugs>	 (03PS2) 10Zabe: dbbackups: Add SPDX headers to dbbackups profile [puppet] - 10https://gerrit.wikimedia.org/r/814882 (https://phabricator.wikimedia.org/T308013)
[19:00:28] <wikibugs>	 (03PS2) 10Zabe: debdeploy: Add SPDX headers to debdeploy profile [puppet] - 10https://gerrit.wikimedia.org/r/814883 (https://phabricator.wikimedia.org/T308013)
[19:02:56] <logmsgbot>	 !log ryankemper@cumin1001 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic2066.codfw.wmnet with OS bullseye
[19:03:12] <wikibugs>	 (03CR) 10Clare Ming: [C: 03+1] Collapse sidebar by default for anonymous users [mediawiki-config] - 10https://gerrit.wikimedia.org/r/814865 (https://phabricator.wikimedia.org/T287609) (owner: 10Jdlrobson)
[19:04:12] <logmsgbot>	 !log ryankemper@cumin1001 START - Cookbook sre.hosts.reimage for host elastic2066.codfw.wmnet with OS bullseye
[19:09:52] <wikibugs>	 (03CR) 10Cwhite: logstash: enable pipeline-managed index patterns (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/799001 (https://phabricator.wikimedia.org/T305175) (owner: 10Cwhite)
[19:11:42] <wikibugs>	 (03PS1) 10Andrew Bogott: magnum.conf: use trustee_domain_admin_domain_name instead of _id [puppet] - 10https://gerrit.wikimedia.org/r/814886
[19:12:52] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 03+2] magnum.conf: use trustee_domain_admin_domain_name instead of _id [puppet] - 10https://gerrit.wikimedia.org/r/814886 (owner: 10Andrew Bogott)
[19:13:45] <wikibugs>	 (03CR) 10Urbanecm: [C: 03+1] Mentorship: enable the Vue version of the dashboard in test [mediawiki-config] - 10https://gerrit.wikimedia.org/r/814789 (https://phabricator.wikimedia.org/T300532) (owner: 10Sergio Gimeno)
[19:19:53] <wikibugs>	 (03CR) 10Cwhite: logstash: duplicate alert logs for loki target (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/806349 (https://phabricator.wikimedia.org/T222826) (owner: 10Cwhite)
[19:20:27] <wikibugs>	 10SRE, 10Gerrit, 10serviceops, 10serviceops-collab, 10Release-Engineering-Team (The Decommission Mission 💀): replacement for gerrit2001 - https://phabricator.wikimedia.org/T243027 (10dancy)
[19:31:05] <wikibugs>	 (03PS2) 10Urbanecm: [beta] GrowthExperiments: Remove variables that are primarily set on-wiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/811663
[19:31:19] <wikibugs>	 (03CR) 10Urbanecm: [C: 03+2] "beta-only, should be no-op" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/811663 (owner: 10Urbanecm)
[19:31:59] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 03+2] Change formatting of a few openstack calls [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/810107 (owner: 10Andrew Bogott)
[19:32:12] <wikibugs>	 (03PS5) 10Andrew Bogott: Change formatting of a few openstack calls [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/810107
[19:32:16] <wikibugs>	 (03Merged) 10jenkins-bot: [beta] GrowthExperiments: Remove variables that are primarily set on-wiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/811663 (owner: 10Urbanecm)
[19:35:07] <wikibugs>	 (03CR) 10Urbanecm: [C: 03+1] "Code is good and happy to deploy this. I have few questions (see the CU patch), but no issues with pinning this variable to SCHEMA_COMPAT_" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/814305 (https://phabricator.wikimedia.org/T233004) (owner: 10Zabe)
[19:38:06] <wikibugs>	 (03CR) 10Urbanecm: [C: 04-1] "Please optimize the SVG files (see https://www.mediawiki.org/wiki/Manual:Assets#SVG_files for details on how). For any SVG resource in the" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/814372 (https://phabricator.wikimedia.org/T313194) (owner: 10Tks4Fish)
[19:40:00] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[19:41:03] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[19:41:04] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[19:42:00] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[19:42:42] <wikibugs>	 (03CR) 10Urbanecm: [C: 04-1] "one additional request: would it be possible to document the SVG file you used? a link to commons on the task would be ideal." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/814372 (https://phabricator.wikimedia.org/T313194) (owner: 10Tks4Fish)
[19:43:46] <wikibugs>	 (03CR) 10Urbanecm: [C: 03+2] "Backport, starting CI slightly ahead B&C to save a bit of time." [core] (wmf/1.39.0-wmf.19) - 10https://gerrit.wikimedia.org/r/814769 (https://phabricator.wikimedia.org/T313188) (owner: 10Bartosz Dziewoński)
[19:45:25] <logmsgbot>	 !log ryankemper@cumin1001 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic2066.codfw.wmnet with OS bullseye
[19:52:11] <wikibugs>	 (03PS1) 10Andrew Bogott: Make cloudcontrol100[67] into live cloudcontrol nodes [puppet] - 10https://gerrit.wikimedia.org/r/814890 (https://phabricator.wikimedia.org/T306853)
[19:54:59] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 03+2] Make cloudcontrol100[67] into live cloudcontrol nodes [puppet] - 10https://gerrit.wikimedia.org/r/814890 (https://phabricator.wikimedia.org/T306853) (owner: 10Andrew Bogott)
[20:00:02] <wikibugs>	 (03CR) 10Cwhite: [C: 03+2] profile: make loki data directory configurable [puppet] - 10https://gerrit.wikimedia.org/r/813715 (https://phabricator.wikimedia.org/T222826) (owner: 10Cwhite)
[20:00:04] <jouncebot>	 RoanKattouw, Urbanecm, and cjming: #bothumor My software never has bugs. It just develops random features. Rise for UTC late backport window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220718T2000).
[20:00:04] <jouncebot>	 zabe, Tks4Fish, sergi0, MatmaRex, ebernhardson, and Jdlrobson: A patch you scheduled for UTC late backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[20:00:11] <urbanecm>	 hi! i can deploy today
[20:00:14] <ebernhardson>	 \o
[20:00:16] <sergi0>	 hello
[20:00:19] <urbanecm>	 we're quite full today
[20:00:32] <MatmaRex>	 hi
[20:01:23] <urbanecm>	 it's likely we won't have time for some of the patches: are there any urgent patches that need to go out today? similarly, are there any non-urgent patches that can be skipped if needed? (thanks Jdlrobson for providing this info in the calendar)
[20:01:33] <zabe>	 hey 
[20:01:58] <wikibugs>	 (03PS2) 10Urbanecm: Mentorship: enable the Vue version of the dashboard in test [mediawiki-config] - 10https://gerrit.wikimedia.org/r/814789 (https://phabricator.wikimedia.org/T300532) (owner: 10Sergio Gimeno)
[20:02:03] <wikibugs>	 (03CR) 10Urbanecm: [C: 03+2] Mentorship: enable the Vue version of the dashboard in test [mediawiki-config] - 10https://gerrit.wikimedia.org/r/814789 (https://phabricator.wikimedia.org/T300532) (owner: 10Sergio Gimeno)
[20:02:14] <Jdlrobson>	 o/ present
[20:02:31] <wikibugs>	 (03PS1) 10Ahmon Dancy: Handle socket.timeout the same way as TimeoutError [software/python-poolcounter] - 10https://gerrit.wikimedia.org/r/814893
[20:02:32] <sergi0>	 urbanecm: we can re-schedule 814789 for tomorrow if that helps
[20:02:36] <Jdlrobson>	 urbanecm: i was worried it would be busy this morning :)
[20:03:00] <wikibugs>	 (03Merged) 10jenkins-bot: Ensure custom locales for Moment.js overrides, don't change 'en' [core] (wmf/1.39.0-wmf.19) - 10https://gerrit.wikimedia.org/r/814769 (https://phabricator.wikimedia.org/T313188) (owner: 10Bartosz Dziewoński)
[20:03:11] <wikibugs>	 (03Merged) 10jenkins-bot: Mentorship: enable the Vue version of the dashboard in test [mediawiki-config] - 10https://gerrit.wikimedia.org/r/814789 (https://phabricator.wikimedia.org/T300532) (owner: 10Sergio Gimeno)
[20:03:52] <ebernhardson>	 urbanecm: mine isn't particularly time critical, but i would like to get moving on the things it blocks.
[20:03:52] <urbanecm>	 sergi0: MatmaRex: your patches are at mwdebug1001, can you check?
[20:04:10] <sergi0>	 checking
[20:04:21] <wikibugs>	 (03PS2) 10Urbanecm: Collapse sidebar by default for anonymous users [mediawiki-config] - 10https://gerrit.wikimedia.org/r/814865 (https://phabricator.wikimedia.org/T287609) (owner: 10Jdlrobson)
[20:04:27] <MatmaRex>	 looking
[20:04:28] <wikibugs>	 (03CR) 10Urbanecm: [C: 03+2] Collapse sidebar by default for anonymous users [mediawiki-config] - 10https://gerrit.wikimedia.org/r/814865 (https://phabricator.wikimedia.org/T287609) (owner: 10Jdlrobson)
[20:05:20] <MatmaRex>	 urbanecm: looks good
[20:05:21] <wikibugs>	 (03Merged) 10jenkins-bot: Collapse sidebar by default for anonymous users [mediawiki-config] - 10https://gerrit.wikimedia.org/r/814865 (https://phabricator.wikimedia.org/T287609) (owner: 10Jdlrobson)
[20:05:23] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] Handle socket.timeout the same way as TimeoutError [software/python-poolcounter] - 10https://gerrit.wikimedia.org/r/814893 (owner: 10Ahmon Dancy)
[20:05:24] <sergi0>	 urbanecm: all good from my end
[20:05:52] <urbanecm>	 zabe: hi, i commented on the CU patch. can you confirm beta should be pinned to old too?
[20:05:55] <urbanecm>	 thanks MatmaRex and sergi0, syncing
[20:06:30] <zabe>	 urbanecm, there is not checkuser on beta
[20:06:37] <zabe>	 s/not/no
[20:06:46] <urbanecm>	 somewhat, i forgot about that.
[20:06:53] <urbanecm>	 all good then :)
[20:07:15] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[20:07:57] <wikibugs>	 (03PS2) 10Urbanecm: Pin cu_log actor migration to old schema [mediawiki-config] - 10https://gerrit.wikimedia.org/r/814305 (https://phabricator.wikimedia.org/T233004) (owner: 10Zabe)
[20:08:07] <wikibugs>	 (03CR) 10Urbanecm: [C: 03+2] Pin cu_log actor migration to old schema [mediawiki-config] - 10https://gerrit.wikimedia.org/r/814305 (https://phabricator.wikimedia.org/T233004) (owner: 10Zabe)
[20:08:16] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[20:08:17] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[20:09:03] <wikibugs>	 (03Merged) 10jenkins-bot: Pin cu_log actor migration to old schema [mediawiki-config] - 10https://gerrit.wikimedia.org/r/814305 (https://phabricator.wikimedia.org/T233004) (owner: 10Zabe)
[20:09:07] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[20:10:35] <logmsgbot>	 !log urbanecm@deploy1002 Synchronized wmf-config/InitialiseSettings.php: 76b7cd6379c25175570eeeb2a305de0fd0bc61e5: Mentorship: enable the Vue version of the dashboard in test (T300532) (duration: 03m 00s)
[20:10:39] <stashbot>	 T300532: Migration of mentee overview to Vue - https://phabricator.wikimedia.org/T300532
[20:11:18] <wikibugs>	 (03PS1) 10BCornwall: Icinga: Remove traffic alerts [puppet] - 10https://gerrit.wikimedia.org/r/814894 (https://phabricator.wikimedia.org/T300723)
[20:11:41] <urbanecm>	 Jdlrobson: your patch is at mwdebug1001, can you check?
[20:11:49] <urbanecm>	 (syncing the wmf.19 backport in the meanwhile)
[20:13:20] <logmsgbot>	 !log urbanecm@deploy1002 Synchronized php-1.39.0-wmf.19/resources/src/moment/moment-locale-overrides.js: c4d8a217b4ce0a9f7aefaacc032136e7eb058d4d: Ensure custom locales for Moment.js overrides, dont change en (T313188) (duration: 02m 44s)
[20:13:24] <stashbot>	 T313188: Most of the reply links don't work on ckbwiki - https://phabricator.wikimedia.org/T313188
[20:13:48] <Jdlrobson>	 urbanecm: on it
[20:14:12] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[20:14:29] <wikibugs>	 (03Abandoned) 10Urbanecm: reindex: Detect index type from live mappings [extensions/CirrusSearch] (wmf/1.39.0-wmf.20) - 10https://gerrit.wikimedia.org/r/814770 (owner: 10Ebernhardson)
[20:14:34] <urbanecm>	 thanks
[20:14:40] <Jdlrobson>	 urbanecm: LGTM! feel free to sync!
[20:14:42] <urbanecm>	 ebernhardson: abandoned the wmf.20 version, as wmf.20 will not be deployed.
[20:14:45] <urbanecm>	 Jdlrobson: syncing, thanks!
[20:15:12] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[20:15:13] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[20:15:55] <wikibugs>	 (03CR) 10Urbanecm: [C: 03+2] "backport" [extensions/CirrusSearch] (wmf/1.39.0-wmf.19) - 10https://gerrit.wikimedia.org/r/814771 (owner: 10Ebernhardson)
[20:16:07] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[20:18:16] <logmsgbot>	 !log urbanecm@deploy1002 Synchronized wmf-config/InitialiseSettings.php: 415c4ef44d9bf1abab6942fbbc552990a8e992c8: Collapse sidebar by default for anonymous users (T287609) (duration: 02m 41s)
[20:18:22] <stashbot>	 T287609: Collapse sidebar by default for logged-out people - https://phabricator.wikimedia.org/T287609
[20:18:23] <urbanecm>	 Jdlrobson: fyi, https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/814867 looks to have an unanswered comment by cj.ming. can you check it please? :)
[20:18:26] <urbanecm>	 (patch deployed)
[20:18:27] <ebernhardson>	 urbanecm: oh, i didn't remember that (but now that you mention it, i remember the email notice). thanks
[20:18:39] <Jdlrobson>	 urbanecm: looking
[20:18:56] <Jdlrobson>	 ah yeh that makes sense. I'll amend it now
[20:19:05] <urbanecm>	 zabe: just syncing yours, as there is nothing to test anyway :)
[20:19:16] <zabe>	 ok
[20:19:32] <urbanecm>	 thanks Jdlrobson :)
[20:20:30] <wikibugs>	 (03PS2) 10Jdlrobson: Enable language switching button for logged-out users on non-pilot wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/814867 (https://phabricator.wikimedia.org/T312861)
[20:21:25] <wikibugs>	 (03PS3) 10Urbanecm: Enable language switching button for logged-out users on non-pilot wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/814867 (https://phabricator.wikimedia.org/T312861) (owner: 10Jdlrobson)
[20:21:29] <wikibugs>	 (03CR) 10Urbanecm: [C: 03+2] Enable language switching button for logged-out users on non-pilot wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/814867 (https://phabricator.wikimedia.org/T312861) (owner: 10Jdlrobson)
[20:21:34] <logmsgbot>	 !log urbanecm@deploy1002 Synchronized wmf-config/InitialiseSettings.php: f99c5331380a8c03f4c447e2f73cb76afca337a2: Pin cu_log actor migration to old schema (T233004) (duration: 02m 41s)
[20:21:38] <stashbot>	 T233004: Update CheckUser for actor and comment table - https://phabricator.wikimedia.org/T233004
[20:22:34] <wikibugs>	 (03CR) 10Urbanecm: [C: 03+2] Enable language switching button for logged-out users on non-pilot wikis (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/814867 (https://phabricator.wikimedia.org/T312861) (owner: 10Jdlrobson)
[20:23:24] <wikibugs>	 (03Merged) 10jenkins-bot: Enable language switching button for logged-out users on non-pilot wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/814867 (https://phabricator.wikimedia.org/T312861) (owner: 10Jdlrobson)
[20:23:50] <urbanecm>	 Jdlrobson: pulled r814867 to mwdebug1001, can you check please?
[20:24:23] <Jdlrobson>	 checking
[20:25:29] <Jdlrobson>	 thanks urbanecm 
[20:25:43] <wikibugs>	 (03CR) 10Urbanecm: "I must be missing something here. This appears to be already the case at wikisources?" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/814868 (https://phabricator.wikimedia.org/T311607) (owner: 10Jdlrobson)
[20:25:55] <icinga-wm>	 PROBLEM - mediawiki originals uploads -hourly- for codfw on alert1001 is CRITICAL: account=mw-media class=originals cluster=swift instance=ms-fe2009 job=statsd_exporter site=codfw https://wikitech.wikimedia.org/wiki/Swift/How_To%23mediawiki_originals_uploads https://grafana.wikimedia.org/d/OPgmB1Eiz/swift?panelId=26&fullscreen&orgId=1&var-DC=codfw
[20:26:15] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[20:26:19] <wikibugs>	 (03PS1) 10Andrew Bogott: acme_chief: allow access for cloudcontrol100[67] [puppet] - 10https://gerrit.wikimedia.org/r/814895 (https://phabricator.wikimedia.org/T306853)
[20:27:18] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[20:27:19] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[20:27:41] <Jdlrobson>	 urbanecm: looking into the wikisource issue.. the max width appears to be enabled on https://en.wikisource.org/wiki/Popular_Science_Monthly/Volume_31/May_1887/Megalithic_Monuments_in_Spain_and_Portugal and we want to remove it.. trying to work out why
[20:28:03] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 03+2] acme_chief: allow access for cloudcontrol100[67] [puppet] - 10https://gerrit.wikimedia.org/r/814895 (https://phabricator.wikimedia.org/T306853) (owner: 10Andrew Bogott)
[20:28:13] <urbanecm>	 Jdlrobson: does that mean r814867 can be synced? or are you still testing? :)
[20:28:17] <icinga-wm>	 PROBLEM - mediawiki originals uploads -hourly- for eqiad on alert1001 is CRITICAL: account=mw-media class=originals cluster=swift instance=ms-fe1009 job=statsd_exporter site=eqiad https://wikitech.wikimedia.org/wiki/Swift/How_To%23mediawiki_originals_uploads https://grafana.wikimedia.org/d/OPgmB1Eiz/swift?panelId=26&fullscreen&orgId=1&var-DC=eqiad
[20:28:19] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[20:28:41] <Jdlrobson>	 814867 can be synced
[20:28:53] <urbanecm>	 syncing
[20:30:16] <wikibugs>	 (03CR) 10Jdlrobson: [C: 04-1] "Fixing. Config name was unintuitive :)" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/814868 (https://phabricator.wikimedia.org/T311607) (owner: 10Jdlrobson)
[20:30:49] <urbanecm>	 looks like you figured the wikisources out, lmk if i can help with that.
[20:31:27] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1028 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:31:37] <wikibugs>	 (03PS2) 10Jdlrobson: Turn off fixed width in main namespace on Wikisource [mediawiki-config] - 10https://gerrit.wikimedia.org/r/814868 (https://phabricator.wikimedia.org/T311607)
[20:31:57] <Jdlrobson>	 ok urbanecm thanks for the eagle eyes on that patch :) saved us a few minutes of "why is this not working?" :)
[20:32:04] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] Turn off fixed width in main namespace on Wikisource [mediawiki-config] - 10https://gerrit.wikimedia.org/r/814868 (https://phabricator.wikimedia.org/T311607) (owner: 10Jdlrobson)
[20:32:13] <logmsgbot>	 !log urbanecm@deploy1002 Synchronized wmf-config/InitialiseSettings.php: 1c258b25e8a47caf9d531f01798d32cd3f9b1605: Enable language switching button for logged-out users on non-pilot wikis (T312861) (duration: 02m 43s)
[20:32:15] <wikibugs>	 (03PS3) 10Jdlrobson: Turn off fixed width in main namespace on Wikisource [mediawiki-config] - 10https://gerrit.wikimedia.org/r/814868 (https://phabricator.wikimedia.org/T311607)
[20:32:18] <stashbot>	 T312861: Enable language switching button for logged-out users on non-pilot wikis - https://phabricator.wikimedia.org/T312861
[20:32:22] <urbanecm>	 Jdlrobson: always happy to help!
[20:33:24] <wikibugs>	 (03CR) 10Urbanecm: [C: 03+2] Turn off fixed width in main namespace on Wikisource [mediawiki-config] - 10https://gerrit.wikimedia.org/r/814868 (https://phabricator.wikimedia.org/T311607) (owner: 10Jdlrobson)
[20:33:29] <urbanecm>	 let's try it out
[20:34:29] <wikibugs>	 (03Merged) 10jenkins-bot: Turn off fixed width in main namespace on Wikisource [mediawiki-config] - 10https://gerrit.wikimedia.org/r/814868 (https://phabricator.wikimedia.org/T311607) (owner: 10Jdlrobson)
[20:34:54] <urbanecm>	 Jdlrobson: pulled to mwdebug1001, can you test?
[20:35:05] <Jdlrobson>	 urbanecm: testing
[20:35:27] <Jdlrobson>	 urbanecm: yep that looks like it worked!
[20:35:49] <wikibugs>	 (03Merged) 10jenkins-bot: reindex: Detect index type from live mappings [extensions/CirrusSearch] (wmf/1.39.0-wmf.19) - 10https://gerrit.wikimedia.org/r/814771 (owner: 10Ebernhardson)
[20:36:08] <Jdlrobson>	 Please sync
[20:36:42] <urbanecm>	 great! syncing
[20:37:19] <urbanecm>	 ebernhardson: pulled your patch to mwdebug1001, if it's testable there
[20:37:25] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1028 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:37:48] <ebernhardson>	 urbanecm: yup, i can reindex testwiki. will take a couple minutes to run
[20:38:04] <urbanecm>	 ebernhardson: does that work with debug servers?
[20:38:14] <ebernhardson>	 urbanecm: yea, it just sends some requests to elasticsearch and then waits
[20:38:19] <Jdlrobson>	 urbanecm: thanks for all the help today!
[20:38:25] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[20:39:11] <urbanecm>	 ebernhardson: i see. I'll leave it up to you: if you think the test's useful, feel free to do it, otherwise, i can sync it directly too.
[20:39:27] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[20:39:28] <ebernhardson>	 urbanecm: it's running now, which means it already got past the point that it used to fail and the patch should work
[20:39:29] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[20:39:58] <urbanecm>	 sounds great! i'll sync it :)
[20:40:24] <logmsgbot>	 !log urbanecm@deploy1002 Synchronized wmf-config/InitialiseSettings.php: 8d1663c93d2ddeb107d5f9b8982a7f4a7b880aba: Turn off fixed width in main namespace on Wikisource (T311607) (duration: 02m 41s)
[20:40:27] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[20:40:27] <stashbot>	 T311607: Turn off fixed width in main namespace on Wikisource - https://phabricator.wikimedia.org/T311607
[20:40:45] <urbanecm>	 Jdlrobson: should be live!
[20:45:13] <logmsgbot>	 !log urbanecm@deploy1002 Synchronized php-1.39.0-wmf.19/extensions/CirrusSearch/: 930ecb76a5a9266d498f40b49ab5ff82c01dbcf5: reindex: Detect index type from live mappings (duration: 02m 55s)
[20:45:23] <urbanecm>	 ebernhardson: and, should be live
[20:45:35] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[20:45:36] <ebernhardson>	 urbanecm: thanks!
[20:45:41] <urbanecm>	 np
[20:45:53] <urbanecm>	 !log UTC late B&C window finished
[20:45:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:46:34] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[20:46:35] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[20:47:23] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[20:49:54] <wikibugs>	 10SRE, 10ops-eqiad, 10DC-Ops, 10Patch-For-Review, 10cloud-services-team (Hardware): Q4:(Need By: TBD) rack/setup/install cloudcontrol100[6-7].wikimedia.org - https://phabricator.wikimedia.org/T306853 (10Andrew)
[20:50:59] <wikibugs>	 10SRE, 10ops-eqiad, 10DC-Ops, 10Patch-For-Review, 10cloud-services-team (Hardware): Q4:(Need By: TBD) rack/setup/install cloudcontrol100[6-7].wikimedia.org - https://phabricator.wikimedia.org/T306853 (10Andrew) These hosts are now in service and seem to be working.
[20:58:05] <ebernhardson>	 !log start reindex of all wikis except commonswiki and wikidatawiki in eqiad and codfw cirrus clusters
[20:58:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:00:05] <jouncebot>	 Reedy, sbassett, Maryum, and manfredi: Dear deployers, time to do the Weekly Security deployment window deploy. Dont look at me like that. You signed up for it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220718T2100).
[21:01:49] <wikibugs>	 (03CR) 10Jdlrobson: Deploy the new grid layout (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/814869 (https://phabricator.wikimedia.org/T312241) (owner: 10Jdlrobson)
[21:28:30] <sbassett>	 Hey all - would like to quickly deploy a sec patch for T309894.  Let me know if I should wait.
[21:31:28] <bd808>	 jouncebot: now
[21:31:28] <jouncebot>	 For the next 1 hour(s) and 28 minute(s): Weekly Security deployment window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220718T2100)
[21:31:54] <bd808>	 sbassett: ^ looks like you already have the conch :)
[21:32:16] <sbassett>	 bd808: yep, I just always like to double-check in case someone is fighting a fire :)
[21:34:00] <rzl>	 no fires yet, have at it :D
[21:36:02] <sbassett>	 !log Deployed security fix for T309894
[21:36:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:42:50] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[21:45:25] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[21:45:26] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[21:46:14] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[22:01:19] <wikibugs>	 (03PS2) 10Jdlrobson: Deploy the new grid layout to group 0 wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/814869 (https://phabricator.wikimedia.org/T312241)
[22:01:21] <wikibugs>	 (03PS1) 10Jdlrobson: Deploy the new grid layout to group 1 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/814906 (https://phabricator.wikimedia.org/T312241)
[22:01:23] <wikibugs>	 (03PS1) 10Jdlrobson: Deploy grid to all wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/814907 (https://phabricator.wikimedia.org/T312241)
[22:01:42] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] Deploy the new grid layout to group 0 wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/814869 (https://phabricator.wikimedia.org/T312241) (owner: 10Jdlrobson)
[22:01:57] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] Deploy the new grid layout to group 1 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/814906 (https://phabricator.wikimedia.org/T312241) (owner: 10Jdlrobson)
[22:02:05] <wikibugs>	 (03PS1) 10Ebernhardson: cirrus: Dont recycle completion suggester indices [mediawiki-config] - 10https://gerrit.wikimedia.org/r/814908
[22:02:07] <wikibugs>	 (03PS1) 10Ebernhardson: Revert "cirrus: Dont recycle completion suggester indices" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/814909
[22:02:15] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] Deploy grid to all wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/814907 (https://phabricator.wikimedia.org/T312241) (owner: 10Jdlrobson)
[22:05:34] <wikibugs>	 (03PS3) 10Jdlrobson: Deploy the new grid layout to group 0 wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/814869 (https://phabricator.wikimedia.org/T312241)
[22:05:54] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] Deploy the new grid layout to group 0 wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/814869 (https://phabricator.wikimedia.org/T312241) (owner: 10Jdlrobson)
[22:06:01] <wikibugs>	 (03PS4) 10Jdlrobson: Deploy the new grid layout to group 0 wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/814869 (https://phabricator.wikimedia.org/T312241)
[22:06:09] <wikibugs>	 (03PS5) 10Jdlrobson: Deploy the new grid layout to group 0 wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/814869 (https://phabricator.wikimedia.org/T312241)
[22:06:21] <wikibugs>	 (03PS2) 10Jdlrobson: Deploy the new grid layout to group 1 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/814906 (https://phabricator.wikimedia.org/T312241)
[22:06:25] <wikibugs>	 (03PS2) 10Jdlrobson: Deploy grid to all wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/814907 (https://phabricator.wikimedia.org/T312241)
[22:06:43] <wikibugs>	 (03PS3) 10Jdlrobson: Deploy grid to all wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/814907 (https://phabricator.wikimedia.org/T312241)
[22:06:58] <wikibugs>	 (03PS4) 10Jdlrobson: Deploy grid to all wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/814907 (https://phabricator.wikimedia.org/T312241)
[22:07:33] <icinga-wm>	 PROBLEM - SSH on wtp1038.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[22:09:52] <wikibugs>	 (03CR) 10Jdlrobson: Deploy the new grid layout to group 0 wikis (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/814869 (https://phabricator.wikimedia.org/T312241) (owner: 10Jdlrobson)
[22:13:27] <icinga-wm>	 PROBLEM - SSH on restbase2012.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[22:26:10] <wikibugs>	 (03PS1) 10Andrew Bogott: Install nova on new cloudvirt hosts [puppet] - 10https://gerrit.wikimedia.org/r/814911 (https://phabricator.wikimedia.org/T305194)
[22:26:17] <icinga-wm>	 PROBLEM - SSH on mw1321.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[22:32:19] <wikibugs>	 (03CR) 10Cwhite: [C: 03+2] hiera: deploy and enable loki on grafana hosts [puppet] - 10https://gerrit.wikimedia.org/r/813724 (https://phabricator.wikimedia.org/T222826) (owner: 10Cwhite)
[22:32:30] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 03+2] Install nova on new cloudvirt hosts [puppet] - 10https://gerrit.wikimedia.org/r/814911 (https://phabricator.wikimedia.org/T305194) (owner: 10Andrew Bogott)
[22:33:19] <cwhite>	 andrewbogott: merged yours as well
[22:33:28] <andrewbogott>	 thank you!
[22:41:49] <icinga-wm>	 RECOVERY - mediawiki originals uploads -hourly- for codfw on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Swift/How_To%23mediawiki_originals_uploads https://grafana.wikimedia.org/d/OPgmB1Eiz/swift?panelId=26&fullscreen&orgId=1&var-DC=codfw
[22:42:23] <icinga-wm>	 RECOVERY - mediawiki originals uploads -hourly- for eqiad on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Swift/How_To%23mediawiki_originals_uploads https://grafana.wikimedia.org/d/OPgmB1Eiz/swift?panelId=26&fullscreen&orgId=1&var-DC=eqiad
[22:56:57] <wikibugs>	 (03PS1) 10Andrew Bogott: openstack::nova::compute::service: don't add 'nova' user to libvirt group [puppet] - 10https://gerrit.wikimedia.org/r/814913 (https://phabricator.wikimedia.org/T309342)
[23:07:01] <logmsgbot>	 !log andrew@cumin1001 START - Cookbook sre.hosts.reboot-single for host cloudvirt1049.eqiad.wmnet
[23:07:45] <icinga-wm>	 RECOVERY - SSH on wtp1038.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[23:13:51] <icinga-wm>	 RECOVERY - SSH on restbase2012.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[23:17:55] <icinga-wm>	 PROBLEM - ensure kvm processes are running on cloudvirt1052 is CRITICAL: PROCS CRITICAL: 0 processes with regex args qemu-system-x86_64 https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[23:18:52] <wikibugs>	 (03PS1) 10Cwhite: logstash: enable loki public output on production [puppet] - 10https://gerrit.wikimedia.org/r/814915 (https://phabricator.wikimedia.org/T222826)
[23:19:25] <logmsgbot>	 !log andrew@cumin1001 END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudvirt1049.eqiad.wmnet
[23:19:55] <icinga-wm>	 PROBLEM - ensure kvm processes are running on cloudvirt1049 is CRITICAL: PROCS CRITICAL: 0 processes with regex args qemu-system-x86_64 https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[23:22:49] <wikibugs>	 (CR) Cwhite: "PCC checks out: https://puppet-compiler.wmflabs.org/pcc-worker1003/36284/" [puppet] - https://gerrit.wikimedia.org/r/814915 (https://phabricator.wikimedia.org/T222826) (owner: Cwhite)
[23:23:00] <icinga-wm>	 PROBLEM - ensure kvm processes are running on cloudvirt1050 is CRITICAL: PROCS CRITICAL: 0 processes with regex args qemu-system-x86_64 https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[23:27:25] <icinga-wm>	 RECOVERY - SSH on mw1321.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[23:28:05] <icinga-wm>	 PROBLEM - ensure kvm processes are running on cloudvirt1053 is CRITICAL: PROCS CRITICAL: 0 processes with regex args qemu-system-x86_64 https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[23:28:26] <wikibugs>	 (PS2) Hashar: Send events to Wikimedia EventGate [software/gerrit/plugins/events-wikimedia] - https://gerrit.wikimedia.org/r/814807
[23:29:20] <wikibugs>	 (CR) Hashar: "And on commenting on a change I get in the error log:" [software/gerrit/plugins/events-wikimedia] - https://gerrit.wikimedia.org/r/814807 (owner: Hashar)
[23:29:41] <icinga-wm>	 PROBLEM - ensure kvm processes are running on cloudvirt1048 is CRITICAL: PROCS CRITICAL: 0 processes with regex args qemu-system-x86_64 https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[23:35:22] <icinga-wm>	 ACKNOWLEDGEMENT - ensure kvm processes are running on cloudvirt1048 is CRITICAL: PROCS CRITICAL: 0 processes with regex args qemu-system-x86_64 Andrew Bogott new hosts, in progress https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[23:35:22] <icinga-wm>	 ACKNOWLEDGEMENT - ensure kvm processes are running on cloudvirt1049 is CRITICAL: PROCS CRITICAL: 0 processes with regex args qemu-system-x86_64 Andrew Bogott new hosts, in progress https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[23:35:23] <icinga-wm>	 ACKNOWLEDGEMENT - ensure kvm processes are running on cloudvirt1050 is CRITICAL: PROCS CRITICAL: 0 processes with regex args qemu-system-x86_64 Andrew Bogott new hosts, in progress https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[23:35:24] <icinga-wm>	 ACKNOWLEDGEMENT - ensure kvm processes are running on cloudvirt1052 is CRITICAL: PROCS CRITICAL: 0 processes with regex args qemu-system-x86_64 Andrew Bogott new hosts, in progress https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[23:35:25] <icinga-wm>	 ACKNOWLEDGEMENT - ensure kvm processes are running on cloudvirt1053 is CRITICAL: PROCS CRITICAL: 0 processes with regex args qemu-system-x86_64 Andrew Bogott new hosts, in progress https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[23:46:41] <logmsgbot>	 !log andrew@cumin1001 START - Cookbook sre.hosts.reboot-single for host cloudvirt1050.eqiad.wmnet
[23:50:23] <icinga-wm>	 PROBLEM - ensure kvm processes are running on cloudvirt1051 is CRITICAL: PROCS CRITICAL: 0 processes with regex args qemu-system-x86_64 https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[23:53:29] <icinga-wm>	 RECOVERY - ensure kvm processes are running on cloudvirt1048 is OK: PROCS OK: 1 process with regex args qemu-system-x86_64 https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[23:56:33] <icinga-wm>	 RECOVERY - ensure kvm processes are running on cloudvirt1049 is OK: PROCS OK: 1 process with regex args qemu-system-x86_64 https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[23:56:53] <icinga-wm>	 PROBLEM - ensure kvm processes are running on cloudvirt1051 is CRITICAL: PROCS CRITICAL: 0 processes with regex args qemu-system-x86_64 https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[23:57:27] <icinga-wm>	 RECOVERY - ensure kvm processes are running on cloudvirt1050 is OK: PROCS OK: 1 process with regex args qemu-system-x86_64 https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[23:58:05] <logmsgbot>	 !log andrew@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudvirt1050.eqiad.wmnet