[00:06:59] <icinga-wm>	 PROBLEM - Check systemd state on grafana1002 is CRITICAL: CRITICAL - degraded: The following units failed: grafana-ldap-users-sync.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:07:05] <topranks>	 !log disabling Lumen AS3356 BGP session over IPv4 on cr3-ulsfo to assess effect on currently broken routing to ulsfo.
[00:07:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:12:04] <wikibugs>	 (CR) Ahmon Dancy: [C: +1] check_mw_versions.py: Fix problem induced by recent scap changes [puppet] - https://gerrit.wikimedia.org/r/767242 (https://phabricator.wikimedia.org/T302832) (owner: Ahmon Dancy)
[00:15:31] <wikibugs>	 (CR) Krinkle: "It is my current understanding and expectation that if I change something in -staging on the deployment host, lock it, and sync this no-wh" [puppet] - https://gerrit.wikimedia.org/r/767242 (https://phabricator.wikimedia.org/T302832) (owner: Ahmon Dancy)
[00:15:40] <topranks>	 !log Re-enabling Lumen AS3356 BGP session over IPv4 on cr3-ulsfo to assess effect on currently broken routing to ulsfo.
[00:15:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:25:37] <jinxer-wm>	 (JobUnavailable) firing: Reduced availability for job jmx_wdqs_updater in eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org
[00:32:53] <jinxer-wm>	 (Traffic bill over quota) firing: Alert for device cr2-eqsin.wikimedia.org - Traffic bill over quota   - https://alerts.wikimedia.org
[00:35:48] <wikibugs>	 (PS1) Ebernhardson: query_service: Include scheme and host in X-redirect-url [puppet] - https://gerrit.wikimedia.org/r/767259
[00:38:12] <wikibugs>	 (PS2) Ebernhardson: query_service: Include scheme and host in X-redirect-url [puppet] - https://gerrit.wikimedia.org/r/767259
[00:39:03] <wikibugs>	 (CR) Ebernhardson: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/767259 (owner: Ebernhardson)
[00:52:53] <jinxer-wm>	 (Traffic bill over quota) resolved: Alert for device cr2-eqsin.wikimedia.org - Traffic bill over quota   - https://alerts.wikimedia.org
[01:31:23] <icinga-wm>	 RECOVERY - Disk space on centrallog1001 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=centrallog1001&var-datasource=eqiad+prometheus/ops
[01:37:32] <jinxer-wm>	 (JobUnavailable) firing: (2) Reduced availability for job jmx_wdqs_updater in eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org
[01:37:39] <icinga-wm>	 PROBLEM - Confd vcl based reload on cp6009 is CRITICAL: reload-vcl failed to run since 4h, 29 minutes. https://wikitech.wikimedia.org/wiki/Varnish
[01:40:41] <jinxer-wm>	 (JobUnavailable) firing: (2) Reduced availability for job jmx_wdqs_updater in eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org
[01:52:59] <icinga-wm>	 PROBLEM - SSH on thumbor2003.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[02:37:49] <icinga-wm>	 PROBLEM - Confd vcl based reload on cp6009 is CRITICAL: reload-vcl failed to run since 5h, 29 minutes. https://wikitech.wikimedia.org/wiki/Varnish
[02:42:10] <jinxer-wm>	 (ProbeHttpFailed) firing: (2) URL did not return HTTP 2xx or 3xx response (or probe/connection failed) - https://wikitech.wikimedia.org/wiki/Prometheus#Watchrat_Non-23xx_HTTP_response - https://grafana.wikimedia.org/d/GYciEga7z/watchrat - https://alerts.wikimedia.org
[02:43:05] <icinga-wm>	 PROBLEM - traffic_server tls process restarted on cp6009 is CRITICAL: 2 ge 2 https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server https://grafana.wikimedia.org/d/6uhkG6OZk/ats-instance-drilldown?orgId=1&var-site=drmrs+prometheus/ops&var-instance=cp6009&var-layer=tls
[02:51:29] <wikibugs>	 SRE, SRE Observability (FY2021/2022-Q3): SLO dashboard refinements - https://phabricator.wikimedia.org/T302842 (lmata)
[02:51:51] <wikibugs>	 SRE, SRE Observability (FY2021/2022-Q3): SLO dashboard refinements - https://phabricator.wikimedia.org/T302842 (lmata) p:Triage→Medium
[02:52:09] <wikibugs>	 SRE, SRE Observability (FY2021/2022-Q3): SLO dashboard refinements - https://phabricator.wikimedia.org/T302842 (lmata) a:herron
[02:54:29] <wikibugs>	 SRE, SRE Observability (FY2021/2022-Q3): SLO dashboard refinements - https://phabricator.wikimedia.org/T302842 (lmata) Hi @RLazarus, Will discuss with @herron and address the feedback with any notes. Thanks!
[03:00:54] <wikibugs>	 (CR) TsepoThoabala: [C: +1] "LGTM" [mediawiki-config] - https://gerrit.wikimedia.org/r/766882 (https://phabricator.wikimedia.org/T296499) (owner: STran)
[03:44:44] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1161.eqiad.wmnet with reason: Maintenance
[03:44:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:44:46] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1161.eqiad.wmnet with reason: Maintenance
[03:44:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:44:47] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
[03:44:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:44:50] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
[03:44:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:44:55] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1161 (T300992)', diff saved to https://phabricator.wikimedia.org/P21632 and previous config saved to /var/cache/conftool/dbconfig/20220302-034454-ladsgroup.json
[03:44:57] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1104.eqiad.wmnet with reason: Maintenance
[03:44:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:44:57] <stashbot>	 T300992: Add linter_template and linter_tag columns to the Linter table - https://phabricator.wikimedia.org/T300992
[03:44:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:44:58] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1104.eqiad.wmnet with reason: Maintenance
[03:44:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:45:03] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1104 (T302185)', diff saved to https://phabricator.wikimedia.org/P21633 and previous config saved to /var/cache/conftool/dbconfig/20220302-034502-ladsgroup.json
[03:45:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:45:05] <stashbot>	 T302185: Upgrade s8 to Bullseye - https://phabricator.wikimedia.org/T302185
[03:47:16] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1161 (T300992)', diff saved to https://phabricator.wikimedia.org/P21634 and previous config saved to /var/cache/conftool/dbconfig/20220302-034715-ladsgroup.json
[03:47:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:48:56] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.reimage for host db1104.eqiad.wmnet with OS bullseye
[03:48:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:57:50] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on db1104.eqiad.wmnet with reason: host reimage
[03:57:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:00:27] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1104.eqiad.wmnet with reason: host reimage
[04:00:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:02:20] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P21635 and previous config saved to /var/cache/conftool/dbconfig/20220302-040220-ladsgroup.json
[04:02:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:15:32] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1104.eqiad.wmnet with OS bullseye
[04:15:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:17:25] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P21636 and previous config saved to /var/cache/conftool/dbconfig/20220302-041725-ladsgroup.json
[04:17:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:20:12] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1104 (T302185)', diff saved to https://phabricator.wikimedia.org/P21637 and previous config saved to /var/cache/conftool/dbconfig/20220302-042012-ladsgroup.json
[04:20:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:20:15] <stashbot>	 T302185: Upgrade s8 to Bullseye - https://phabricator.wikimedia.org/T302185
[04:25:07] <wikibugs>	 SRE, Wikimedia-Mailing-lists, User-Ladsgroup: Accidentally unsubscribed everyone from open-glam mailing list - https://phabricator.wikimedia.org/T302816 (Ladsgroup) Unfortunately, I don't think I can get it back from the binlog because the removal queries are not like list_id = 'open-glam.lists.wikim...
[04:32:30] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1161 (T300992)', diff saved to https://phabricator.wikimedia.org/P21638 and previous config saved to /var/cache/conftool/dbconfig/20220302-043229-ladsgroup.json
[04:32:31] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
[04:32:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:32:33] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
[04:32:33] <stashbot>	 T300992: Add linter_template and linter_tag columns to the Linter table - https://phabricator.wikimedia.org/T300992
[04:32:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:32:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:32:49] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
[04:32:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:32:51] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
[04:32:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:33:07] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance
[04:33:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:33:09] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance
[04:33:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:33:13] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1113:3315 (T300992)', diff saved to https://phabricator.wikimedia.org/P21639 and previous config saved to /var/cache/conftool/dbconfig/20220302-043313-ladsgroup.json
[04:33:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:34:33] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T300992)', diff saved to https://phabricator.wikimedia.org/P21640 and previous config saved to /var/cache/conftool/dbconfig/20220302-043433-ladsgroup.json
[04:34:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:35:17] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1104', diff saved to https://phabricator.wikimedia.org/P21641 and previous config saved to /var/cache/conftool/dbconfig/20220302-043516-ladsgroup.json
[04:35:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:42:28] <icinga-wm>	 RECOVERY - Check systemd state on mirror1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[04:42:48] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase-dev1004 is CRITICAL: /en.wikipedia.org/v1/page/mobile-html-offline-resources/{title} (Get offline resource links to accompany page content HTML for test page) is CRITICAL: Test Get offline resource links to accompany page content HTML for test page returned the unexpected status 503 (expecting: 200) https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[04:45:10] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase-dev1004 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[04:46:32] <icinga-wm>	 PROBLEM - Number of messages locally queued by purged for processing on cp6009 is CRITICAL: cluster=cache_text instance=cp6009 job=purged layer=frontend site=drmrs https://wikitech.wikimedia.org/wiki/Purged%23Alerts https://grafana.wikimedia.org/dashboard/db/purged?var-datasource=drmrs+prometheus/ops&var-instance=cp6009
[04:49:02] <icinga-wm>	 RECOVERY - Number of messages locally queued by purged for processing on cp6009 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Purged%23Alerts https://grafana.wikimedia.org/dashboard/db/purged?var-datasource=drmrs+prometheus/ops&var-instance=cp6009
[04:49:38] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P21642 and previous config saved to /var/cache/conftool/dbconfig/20220302-044938-ladsgroup.json
[04:49:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:50:22] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1104', diff saved to https://phabricator.wikimedia.org/P21643 and previous config saved to /var/cache/conftool/dbconfig/20220302-045021-ladsgroup.json
[04:50:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:03:44] <wikibugs>	 (PS1) Ladsgroup: db1101: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/767279 (https://phabricator.wikimedia.org/T302185)
[05:04:01] <wikibugs>	 (PS2) Ladsgroup: db1101: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/767279 (https://phabricator.wikimedia.org/T302185)
[05:04:43] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P21644 and previous config saved to /var/cache/conftool/dbconfig/20220302-050442-ladsgroup.json
[05:04:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:05:03] <wikibugs>	 (CR) Ladsgroup: [C: +2] db1101: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/767279 (https://phabricator.wikimedia.org/T302185) (owner: Ladsgroup)
[05:05:26] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1104 (T302185)', diff saved to https://phabricator.wikimedia.org/P21645 and previous config saved to /var/cache/conftool/dbconfig/20220302-050526-ladsgroup.json
[05:05:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:05:29] <stashbot>	 T302185: Upgrade s8 to Bullseye - https://phabricator.wikimedia.org/T302185
[05:16:03] <wikibugs>	 SRE, WMF-General-or-Unknown, WMF-Legal, Documentation, and 2 others: Default license for operations/puppet - https://phabricator.wikimedia.org/T67270 (Ladsgroup) I hereby license all my existing contributions to the operations/puppet under the Apache 2.0 license
[05:18:47] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1101.eqiad.wmnet with reason: Maintenance
[05:18:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:18:49] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1101.eqiad.wmnet with reason: Maintenance
[05:18:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:18:53] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1101:3317 (T302185)', diff saved to https://phabricator.wikimedia.org/P21646 and previous config saved to /var/cache/conftool/dbconfig/20220302-051853-ladsgroup.json
[05:18:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:18:56] <stashbot>	 T302185: Upgrade s8 to Bullseye - https://phabricator.wikimedia.org/T302185
[05:19:48] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T300992)', diff saved to https://phabricator.wikimedia.org/P21647 and previous config saved to /var/cache/conftool/dbconfig/20220302-051947-ladsgroup.json
[05:19:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:19:51] <stashbot>	 T300992: Add linter_template and linter_tag columns to the Linter table - https://phabricator.wikimedia.org/T300992
[05:20:33] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1101:3318 (T302185)', diff saved to https://phabricator.wikimedia.org/P21648 and previous config saved to /var/cache/conftool/dbconfig/20220302-052033-ladsgroup.json
[05:20:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:23:07] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.reimage for host db1101.eqiad.wmnet with OS bullseye
[05:23:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:23:48] <icinga-wm>	 PROBLEM - Backup freshness on backup1001 is CRITICAL: Stale: 1 (phab1001), Fresh: 104 jobs https://wikitech.wikimedia.org/wiki/Bacula%23Monitoring
[05:32:00] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on db1101.eqiad.wmnet with reason: host reimage
[05:32:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:34:40] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1101.eqiad.wmnet with reason: host reimage
[05:34:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:40:37] <jinxer-wm>	 (JobUnavailable) firing: Reduced availability for job jmx_wdqs_updater in eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org
[05:46:26] <icinga-wm>	 PROBLEM - Number of messages locally queued by purged for processing on cp6009 is CRITICAL: cluster=cache_text instance=cp6009 job=purged layer=frontend site=drmrs https://wikitech.wikimedia.org/wiki/Purged%23Alerts https://grafana.wikimedia.org/dashboard/db/purged?var-datasource=drmrs+prometheus/ops&var-instance=cp6009
[05:48:24] <icinga-wm>	 RECOVERY - Number of messages locally queued by purged for processing on cp6009 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Purged%23Alerts https://grafana.wikimedia.org/dashboard/db/purged?var-datasource=drmrs+prometheus/ops&var-instance=cp6009
[05:48:45] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1101.eqiad.wmnet with OS bullseye
[05:48:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:54:19] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T302185)', diff saved to https://phabricator.wikimedia.org/P21649 and previous config saved to /var/cache/conftool/dbconfig/20220302-055419-ladsgroup.json
[05:54:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:54:23] <stashbot>	 T302185: Upgrade s8 to Bullseye - https://phabricator.wikimedia.org/T302185
[06:09:24] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P21650 and previous config saved to /var/cache/conftool/dbconfig/20220302-060924-ladsgroup.json
[06:09:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:24:29] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P21651 and previous config saved to /var/cache/conftool/dbconfig/20220302-062428-ladsgroup.json
[06:24:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:39:33] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T302185)', diff saved to https://phabricator.wikimedia.org/P21652 and previous config saved to /var/cache/conftool/dbconfig/20220302-063933-ladsgroup.json
[06:39:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:39:37] <stashbot>	 T302185: Upgrade s8 to Bullseye - https://phabricator.wikimedia.org/T302185
[06:39:50] <icinga-wm>	 PROBLEM - SSH on mw2258.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[06:41:20] <icinga-wm>	 PROBLEM - SSH on kubernetes2001.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[06:42:10] <jinxer-wm>	 (ProbeHttpFailed) firing: (2) URL did not return HTTP 2xx or 3xx response (or probe/connection failed) - https://wikitech.wikimedia.org/wiki/Prometheus#Watchrat_Non-23xx_HTTP_response - https://grafana.wikimedia.org/d/GYciEga7z/watchrat - https://alerts.wikimedia.org
[06:50:56] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1101:3318 (T302185)', diff saved to https://phabricator.wikimedia.org/P21653 and previous config saved to /var/cache/conftool/dbconfig/20220302-065056-ladsgroup.json
[06:50:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:51:00] <stashbot>	 T302185: Upgrade s8 to Bullseye - https://phabricator.wikimedia.org/T302185
[07:00:02] <icinga-wm>	 RECOVERY - SSH on thumbor2003.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[07:01:52] <wikibugs>	 (PS11) Giuseppe Lavagetto: conftool: add request-actions / request-patterns [puppet] - https://gerrit.wikimedia.org/r/763486
[07:01:59] <wikibugs>	 (PS1) Ladsgroup: Revert "db1104: Disable notifications" [puppet] - https://gerrit.wikimedia.org/r/767090
[07:02:26] <wikibugs>	 (PS1) Ladsgroup: Revert "db1114: Disable notifications" [puppet] - https://gerrit.wikimedia.org/r/767091
[07:02:42] <wikibugs>	 (PS1) Ladsgroup: Revert "db1177: Disable notifications" [puppet] - https://gerrit.wikimedia.org/r/767092
[07:02:50] <wikibugs>	 (PS2) Ladsgroup: Revert "db1104: Disable notifications" [puppet] - https://gerrit.wikimedia.org/r/767090
[07:02:55] <wikibugs>	 (CR) Ladsgroup: [V: +2 C: +2] Revert "db1104: Disable notifications" [puppet] - https://gerrit.wikimedia.org/r/767090 (owner: Ladsgroup)
[07:03:07] <wikibugs>	 (PS2) Ladsgroup: Revert "db1114: Disable notifications" [puppet] - https://gerrit.wikimedia.org/r/767091
[07:03:10] <wikibugs>	 (CR) Ladsgroup: [V: +2 C: +2] Revert "db1114: Disable notifications" [puppet] - https://gerrit.wikimedia.org/r/767091 (owner: Ladsgroup)
[07:03:23] <wikibugs>	 (PS2) Ladsgroup: Revert "db1177: Disable notifications" [puppet] - https://gerrit.wikimedia.org/r/767092
[07:03:28] <wikibugs>	 (CR) Ladsgroup: [V: +2 C: +2] Revert "db1177: Disable notifications" [puppet] - https://gerrit.wikimedia.org/r/767092 (owner: Ladsgroup)
[07:06:01] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1101:3318', diff saved to https://phabricator.wikimedia.org/P21654 and previous config saved to /var/cache/conftool/dbconfig/20220302-070601-ladsgroup.json
[07:06:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:09:36] <_joe_>	 !log installing scap 4.4.1 everywhere T302464
[07:09:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:09:40] <stashbot>	 T302464: Deploy Scap version 4.4.1 - https://phabricator.wikimedia.org/T302464
[07:13:14] <wikibugs>	 SRE-swift-storage, MW-on-K8s, Shellbox, serviceops, and 2 others: Support large files in Shellbox - https://phabricator.wikimedia.org/T292322 (Joe) Open→Stalled a:Joe→None De-assigning from myself as I can't do anything more for this task in its current status.  Also reflecting it...
[07:15:34] <wikibugs>	 (PS1) Ladsgroup: Revert "db1101: Disable notifications" [puppet] - https://gerrit.wikimedia.org/r/767093
[07:15:36] <wikibugs>	 SRE, Patch-For-Review, Tracking-Neverending: Tracking and Reducing cron-spam to root@ - https://phabricator.wikimedia.org/T132324 (Joe)
[07:15:38] <wikibugs>	 SRE, discovery-system: confctl SubjectAltNameWarning after python-urllib3 upgrade - https://phabricator.wikimedia.org/T156232 (Joe) Open→Resolved a:Joe
[07:16:23] <wikibugs>	 (PS2) Ladsgroup: Revert "db1101: Disable notifications" [puppet] - https://gerrit.wikimedia.org/r/767093
[07:16:26] <wikibugs>	 SRE, MediaWiki-Configuration, discovery-system: Use EtcdConfig in production to allow automation of a datacenter switch - https://phabricator.wikimedia.org/T182597 (Joe) Open→Resolved a:Joe
[07:16:32] <wikibugs>	 (CR) Ladsgroup: [V: +2 C: +2] Revert "db1101: Disable notifications" [puppet] - https://gerrit.wikimedia.org/r/767093 (owner: Ladsgroup)
[07:16:35] <wikibugs>	 SRE, discovery-system: Replace etcd internal auth mechanism with a frontend proxy - https://phabricator.wikimedia.org/T146355 (Joe) Open→Resolved a:Joe This was implemented years ago.
[07:16:59] <wikibugs>	 SRE, discovery-system: confctl should provide tags information after writing data - https://phabricator.wikimedia.org/T124413 (Joe) Open→Resolved a:Joe This was solved years ago.
[07:18:37] <wikibugs>	 SRE, discovery-system: Create a conftool "agent" that overcomes confd deficiencies - https://phabricator.wikimedia.org/T107285 (Joe) Open→Declined 7 years later no one is working on this and I doubt it will ever be. Declining the task as a consequence.
[07:18:46] <wikibugs>	 (PS1) Ladsgroup: db1167: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/767440 (https://phabricator.wikimedia.org/T302185)
[07:19:03] <wikibugs>	 SRE, Traffic-Icebox, discovery-system, services-tooling: Figure out an etcd deploy strategy that includes multi DC failure scenarios. - https://phabricator.wikimedia.org/T98165 (Joe) Open→Resolved a:Joe This task was left open by mistake; we've had a multi-dc setup for years now.
[07:19:15] <wikibugs>	 (PS2) Ladsgroup: db1167: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/767440 (https://phabricator.wikimedia.org/T302185)
[07:19:19] <wikibugs>	 (CR) Ladsgroup: [V: +2 C: +2] db1167: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/767440 (https://phabricator.wikimedia.org/T302185) (owner: Ladsgroup)
[07:19:47] <wikibugs>	 SRE, Kubernetes, discovery-system: Document what #discovery-system is - https://phabricator.wikimedia.org/T282948 (Joe) Open→Resolved @Aklapper all done. I think we can retire the tag.
[07:21:06] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1101:3318', diff saved to https://phabricator.wikimedia.org/P21655 and previous config saved to /var/cache/conftool/dbconfig/20220302-072105-ladsgroup.json
[07:21:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:23:40] <wikibugs>	 Puppet, Infrastructure-Foundations: Where to Put Community Modules? - https://phabricator.wikimedia.org/T302423 (Joe) I would frankly either keep third-party modules under /modules or move them to /vendor/modules.  While I do love r10k as an idea and I even considered it as an option for puppet for cloud...
[07:29:27] <wikibugs>	 (CR) Giuseppe Lavagetto: [C: +2] "Merging as no new comments appeared here or on the design document in the last week or so, and we need to move forward with this." [puppet] - https://gerrit.wikimedia.org/r/763486 (owner: Giuseppe Lavagetto)
[07:30:16] <wikibugs>	 (CR) Giuseppe Lavagetto: [C: +2] conftool: add request-actions / request-patterns (3 comments) [puppet] - https://gerrit.wikimedia.org/r/763486 (owner: Giuseppe Lavagetto)
[07:35:35] <_joe_>	 !log filling request patterns in etcd
[07:35:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:36:11] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1101:3318 (T302185)', diff saved to https://phabricator.wikimedia.org/P21656 and previous config saved to /var/cache/conftool/dbconfig/20220302-073610-ladsgroup.json
[07:36:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:36:13] <stashbot>	 T302185: Upgrade s8 to Bullseye - https://phabricator.wikimedia.org/T302185
[07:40:46] <icinga-wm>	 RECOVERY - SSH on mw2258.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[07:41:59] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance
[07:42:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:42:01] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance
[07:42:02] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
[07:42:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:42:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:42:06] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
[07:42:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:42:10] <icinga-wm>	 RECOVERY - SSH on kubernetes2001.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[07:42:11] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1167 (T302185)', diff saved to https://phabricator.wikimedia.org/P21657 and previous config saved to /var/cache/conftool/dbconfig/20220302-074210-ladsgroup.json
[07:42:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:42:13] <stashbot>	 T302185: Upgrade s8 to Bullseye - https://phabricator.wikimedia.org/T302185
[07:45:13] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db2123.codfw.wmnet with reason: Maintenance
[07:45:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:45:14] <wikibugs>	 10SRE, 10Kubernetes, 10discovery-system: Document what #discovery-system is - https://phabricator.wikimedia.org/T282948 (10RhinosF1) Should it be archived then?
[07:45:14] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2123.codfw.wmnet with reason: Maintenance
[07:45:15] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 12:00:00 on 8 hosts with reason: Maintenance
[07:45:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:45:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:45:22] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 8 hosts with reason: Maintenance
[07:45:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:45:56] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1096.eqiad.wmnet with reason: Maintenance
[07:45:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:45:57] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1096.eqiad.wmnet with reason: Maintenance
[07:45:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:46:02] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1096:3315 (T300992)', diff saved to https://phabricator.wikimedia.org/P21658 and previous config saved to /var/cache/conftool/dbconfig/20220302-074602-ladsgroup.json
[07:46:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:46:05] <stashbot>	 T300992: Add linter_template and linter_tag columns to the Linter table - https://phabricator.wikimedia.org/T300992
[07:48:23] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 (T300992)', diff saved to https://phabricator.wikimedia.org/P21659 and previous config saved to /var/cache/conftool/dbconfig/20220302-074822-ladsgroup.json
[07:48:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:00:04] <jouncebot>	 Amir1, awight, Urbanecm, and taavi: I, the Bot under the Fountain, call upon thee, The Deployer, to do UTC morning backport window deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220302T0800).
[08:00:04] <jouncebot>	 No Gerrit patches in the queue for this window AFAICS.
[08:00:19] <urbanecm>	 indeed, nothing to do!
[08:01:34] <icinga-wm>	 PROBLEM - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is CRITICAL: /_info (retrieve service info) is CRITICAL: Test retrieve service info returned the unexpected status 503 (expecting: 200) https://wikitech.wikimedia.org/wiki/Citoid
[08:02:20] <Amir1>	 !log killing all entity dumpers of wikidata in snapshot1008 (T300255)
[08:02:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:02:24] <stashbot>	 T300255: Wikidata entity dumper keeps connecting to depooled host for really long time - https://phabricator.wikimedia.org/T300255
[08:03:27] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P21660 and previous config saved to /var/cache/conftool/dbconfig/20220302-080327-ladsgroup.json
[08:03:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:03:35] <wikibugs>	 10SRE-tools, 10Infrastructure-Foundations, 10Observability-Alerting, 10Patch-For-Review, and 2 others: Spicerack: add support for Alertmanager - https://phabricator.wikimedia.org/T293209 (10fgiunchedi)
[08:04:56] <icinga-wm>	 RECOVERY - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Citoid
[08:09:14] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.reimage for host db1167.eqiad.wmnet with OS bullseye
[08:09:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:09:20] <godog>	 !log test thanos 0.24.0 on thanos-fe2001 to check if https://github.com/thanos-io/thanos/issues/4531 is fixed
[08:09:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:15:43] <wikibugs>	 (03CR) 10Filippo Giunchedi: "+ Cole for visibility" [puppet] - 10https://gerrit.wikimedia.org/r/766876 (https://phabricator.wikimedia.org/T276198) (owner: 10Bking)
[08:18:34] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P21661 and previous config saved to /var/cache/conftool/dbconfig/20220302-081832-ladsgroup.json
[08:18:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:19:50] <wikibugs>	 (03CR) 10Muehlenhoff: Require Python 3.7/buster for logout scripts (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/767064 (owner: 10Muehlenhoff)
[08:20:06] <wikibugs>	 10SRE, 10observability, 10serviceops, 10Patch-For-Review: aggregate mismatched wikiversions alert - https://phabricator.wikimedia.org/T302832 (10fgiunchedi) I think a short term easy fix would be to make the check warning (i.e. icinga/alerts.w.o only) instead of critical so it doesn't spam irc, what do you...
[08:20:38] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 03+2] alertmanager: open per-device librenms tasks (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/767179 (https://phabricator.wikimedia.org/T300836) (owner: 10Filippo Giunchedi)
[08:20:39] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on db1167.eqiad.wmnet with reason: host reimage
[08:20:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:24:14] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1167.eqiad.wmnet with reason: host reimage
[08:24:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:24:31] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+2] zuul: gracefully shutdown [puppet] - 10https://gerrit.wikimedia.org/r/732978 (https://phabricator.wikimedia.org/T257040) (owner: 10Hashar)
[08:33:38] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 (T300992)', diff saved to https://phabricator.wikimedia.org/P21662 and previous config saved to /var/cache/conftool/dbconfig/20220302-083338-ladsgroup.json
[08:33:39] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1100.eqiad.wmnet with reason: Maintenance
[08:33:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:33:41] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1100.eqiad.wmnet with reason: Maintenance
[08:33:42] <stashbot>	 T300992: Add linter_template and linter_tag columns to the Linter table - https://phabricator.wikimedia.org/T300992
[08:33:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:33:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:33:46] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1100 (T300992)', diff saved to https://phabricator.wikimedia.org/P21663 and previous config saved to /var/cache/conftool/dbconfig/20220302-083345-ladsgroup.json
[08:33:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:35:13] <icinga-wm>	 PROBLEM - SSH on bast3005 is CRITICAL: Server answer: https://wikitech.wikimedia.org/wiki/SSH/monitoring
[08:36:06] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1100 (T300992)', diff saved to https://phabricator.wikimedia.org/P21664 and previous config saved to /var/cache/conftool/dbconfig/20220302-083606-ladsgroup.json
[08:36:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:36:55] <icinga-wm>	 RECOVERY - SSH on bast3005 is OK: SSH OK - OpenSSH_7.9p1 Debian-10+deb10u2 (protocol 2.0) https://wikitech.wikimedia.org/wiki/SSH/monitoring
[08:38:40] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1167.eqiad.wmnet with OS bullseye
[08:38:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:40:42] <wikibugs>	 (03CR) 10Gehel: [C: 04-1] "See minor comment inline." [puppet] - 10https://gerrit.wikimedia.org/r/766876 (https://phabricator.wikimedia.org/T276198) (owner: 10Bking)
[08:41:11] <wikibugs>	 (03CR) 10Gehel: [C: 04-1] elastic: prevent rundir from deletion (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/766876 (https://phabricator.wikimedia.org/T276198) (owner: 10Bking)
[08:44:30] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+2] Remove ema from router config [homer/public] - 10https://gerrit.wikimedia.org/r/767083 (owner: 10Muehlenhoff)
[08:45:11] <wikibugs>	 (03PS1) 10Hashar: Revert "zuul: gracefully shutdown" [puppet] - 10https://gerrit.wikimedia.org/r/767094
[08:45:14] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1167 (T302185)', diff saved to https://phabricator.wikimedia.org/P21665 and previous config saved to /var/cache/conftool/dbconfig/20220302-084513-ladsgroup.json
[08:45:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:45:17] <stashbot>	 T302185: Upgrade s8 to Bullseye - https://phabricator.wikimedia.org/T302185
[08:46:35] <wikibugs>	 (03PS2) 10Hashar: Revert "zuul: gracefully shutdown" [puppet] - 10https://gerrit.wikimedia.org/r/767094 (https://phabricator.wikimedia.org/T257040)
[08:50:22] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+2] Revert "zuul: gracefully shutdown" [puppet] - 10https://gerrit.wikimedia.org/r/767094 (https://phabricator.wikimedia.org/T257040) (owner: 10Hashar)
[08:51:11] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1100', diff saved to https://phabricator.wikimedia.org/P21666 and previous config saved to /var/cache/conftool/dbconfig/20220302-085111-ladsgroup.json
[08:51:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:58:11] <wikibugs>	 (03CR) 10Volans: [C: 03+1] "Replies inline, no blocker for me. The code does what it advertises :)" [puppet] - 10https://gerrit.wikimedia.org/r/767064 (owner: 10Muehlenhoff)
[08:58:58] <wikibugs>	 (03CR) 10Volans: [C: 03+1] "Maybe let's add a warning message to the cookbook though, so to not forget" [puppet] - 10https://gerrit.wikimedia.org/r/767064 (owner: 10Muehlenhoff)
[08:59:01] <wikibugs>	 (03CR) 10Muehlenhoff: elastic: prevent rundir from deletion (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/766876 (https://phabricator.wikimedia.org/T276198) (owner: 10Bking)
[09:00:18] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P21667 and previous config saved to /var/cache/conftool/dbconfig/20220302-090018-ladsgroup.json
[09:00:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:02:41] <wikibugs>	 (03PS1) 10Elukey: Add kubernetes2018 to wikikube codfw [puppet] - 10https://gerrit.wikimedia.org/r/767465 (https://phabricator.wikimedia.org/T302208)
[09:04:43] <wikibugs>	 (03PS1) 10Ladsgroup: Revert "db1167: Disable notifications" [puppet] - 10https://gerrit.wikimedia.org/r/767095
[09:05:14] <wikibugs>	 (03PS2) 10Ladsgroup: Revert "db1167: Disable notifications" [puppet] - 10https://gerrit.wikimedia.org/r/767095
[09:05:35] <XioNoX>	 !log push Capirca managed labs-in firewall filter to eqiad routers
[09:05:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:05:41] <wikibugs>	 (03CR) 10Ladsgroup: [V: 03+2 C: 03+2] Revert "db1167: Disable notifications" [puppet] - 10https://gerrit.wikimedia.org/r/767095 (owner: 10Ladsgroup)
[09:06:16] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1100', diff saved to https://phabricator.wikimedia.org/P21668 and previous config saved to /var/cache/conftool/dbconfig/20220302-090615-ladsgroup.json
[09:06:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:07:13] <wikibugs>	 (03CR) 10Elukey: [V: 03+1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/34021/console" [puppet] - 10https://gerrit.wikimedia.org/r/767465 (https://phabricator.wikimedia.org/T302208) (owner: 10Elukey)
[09:08:13] <wikibugs>	 (03PS1) 10David Caro: wmcs: add runbook url to the backup_cinder_volumes alert [puppet] - 10https://gerrit.wikimedia.org/r/767467 (https://phabricator.wikimedia.org/T302855)
[09:09:24] <wikibugs>	 (03CR) 10Elukey: Add kubernetes2018 to wikikube codfw [puppet] - 10https://gerrit.wikimedia.org/r/767465 (https://phabricator.wikimedia.org/T302208) (owner: 10Elukey)
[09:09:33] <wikibugs>	 (03CR) 10Jelto: [C: 03+2] gitlab: avoid $realm check, simplify ferm rules [puppet] - 10https://gerrit.wikimedia.org/r/762897 (owner: 10Majavah)
[09:12:52] <wikibugs>	 (03CR) 10Jelto: [C: 03+2] "looks good to me, thanks!" [puppet] - 10https://gerrit.wikimedia.org/r/762897 (owner: 10Majavah)
[09:13:05] <wikibugs>	 (03CR) 10Vgutierrez: [C: 03+2] mtail::atstls: Use native histogram type [puppet] - 10https://gerrit.wikimedia.org/r/767069 (owner: 10Vgutierrez)
[09:13:32] <wikibugs>	 (03PS2) 10Jelto: gitlab: avoid $realm check, simplify ferm rules [puppet] - 10https://gerrit.wikimedia.org/r/762897 (owner: 10Majavah)
[09:13:38] <wikibugs>	 (03PS3) 10Vgutierrez: mtail::atstls: Provide trafficserver_tls_client_healthcheck_ttfb [puppet] - 10https://gerrit.wikimedia.org/r/767185
[09:13:40] <wikibugs>	 (03CR) 10Ayounsi: [C: 03+2] Port labs-in4/6 to Capirca [homer/public] - 10https://gerrit.wikimedia.org/r/701347 (https://phabricator.wikimedia.org/T285461) (owner: 10Ayounsi)
[09:14:32] <wikibugs>	 (03Merged) 10jenkins-bot: Port labs-in4/6 to Capirca [homer/public] - 10https://gerrit.wikimedia.org/r/701347 (https://phabricator.wikimedia.org/T285461) (owner: 10Ayounsi)
[09:15:23] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P21669 and previous config saved to /var/cache/conftool/dbconfig/20220302-091523-ladsgroup.json
[09:15:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:15:56] <wikibugs>	 (03CR) 10Vgutierrez: [C: 03+2] mtail::atstls: Provide trafficserver_tls_client_healthcheck_ttfb [puppet] - 10https://gerrit.wikimedia.org/r/767185 (owner: 10Vgutierrez)
[09:16:03] <mmandere>	 !log rolling restart of varnishkafka-* on cp6*
[09:16:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:16:18] <wikibugs>	 (03PS1) 10Elukey: Add BGP config for kubernetes2018 [homer/public] - 10https://gerrit.wikimedia.org/r/767468 (https://phabricator.wikimedia.org/T302208)
[09:16:55] <wikibugs>	 (03CR) 10Elukey: "Related BGP change for Homer: https://gerrit.wikimedia.org/r/c/operations/homer/public/+/767468" [puppet] - 10https://gerrit.wikimedia.org/r/767465 (https://phabricator.wikimedia.org/T302208) (owner: 10Elukey)
[09:21:21] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1100 (T300992)', diff saved to https://phabricator.wikimedia.org/P21670 and previous config saved to /var/cache/conftool/dbconfig/20220302-092120-ladsgroup.json
[09:21:22] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1110.eqiad.wmnet with reason: Maintenance
[09:21:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:21:24] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1110.eqiad.wmnet with reason: Maintenance
[09:21:24] <stashbot>	 T300992: Add linter_template and linter_tag columns to the Linter table - https://phabricator.wikimedia.org/T300992
[09:21:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:21:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:21:28] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1110 (T300992)', diff saved to https://phabricator.wikimedia.org/P21671 and previous config saved to /var/cache/conftool/dbconfig/20220302-092128-ladsgroup.json
[09:21:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:23:49] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1110 (T300992)', diff saved to https://phabricator.wikimedia.org/P21672 and previous config saved to /var/cache/conftool/dbconfig/20220302-092348-ladsgroup.json
[09:23:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:24:08] <wikibugs>	 (03CR) 10Jelto: [V: 03+1 C: 03+2] "PCC SUCCESS (DIFF 3): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/34022/console" [puppet] - 10https://gerrit.wikimedia.org/r/762897 (owner: 10Majavah)
[09:28:38] <wikibugs>	 10SRE, 10LDAP-Access-Requests: Grant Access to analytics-privatedata-users for Amin Al Hazwani - https://phabricator.wikimedia.org/T302775 (10JMeybohm)
[09:28:59] <wikibugs>	 (03PS1) 10Ayounsi: Add labs-in4/6 to codfw cloud-hosts vlan [homer/public] - 10https://gerrit.wikimedia.org/r/767471
[09:30:28] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1167 (T302185)', diff saved to https://phabricator.wikimedia.org/P21673 and previous config saved to /var/cache/conftool/dbconfig/20220302-093027-ladsgroup.json
[09:30:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:30:31] <stashbot>	 T302185: Upgrade s8 to Bullseye - https://phabricator.wikimedia.org/T302185
[09:30:53] <wikibugs>	 (03CR) 10Arturo Borrero Gonzalez: [C: 03+1] Add labs-in4/6 to codfw cloud-hosts vlan [homer/public] - 10https://gerrit.wikimedia.org/r/767471 (owner: 10Ayounsi)
[09:31:23] <wikibugs>	 (03PS1) 10JMeybohm: admin: Add aminalhazwani to analytics-privatedata-users [puppet] - 10https://gerrit.wikimedia.org/r/767472 (https://phabricator.wikimedia.org/T302775)
[09:35:41] <logmsgbot>	 !log klausman@cumin2002 START - Cookbook sre.ganeti.makevm for new host ml-staging-ctrl2001.codfw.wmnet
[09:35:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:35:43] <logmsgbot>	 !log klausman@cumin2002 START - Cookbook sre.dns.netbox
[09:35:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:37:27] <wikibugs>	 (03CR) 10JMeybohm: [C: 03+1] Add kubernetes2018 to wikikube codfw [puppet] - 10https://gerrit.wikimedia.org/r/767465 (https://phabricator.wikimedia.org/T302208) (owner: 10Elukey)
[09:38:04] <wikibugs>	 (03CR) 10JMeybohm: [C: 03+1] Add BGP config for kubernetes2018 [homer/public] - 10https://gerrit.wikimedia.org/r/767468 (https://phabricator.wikimedia.org/T302208) (owner: 10Elukey)
[09:38:24] <wikibugs>	 (03PS1) 10Jelto: gitlab: update sevice_ip and ferm_drange for wmcs [puppet] - 10https://gerrit.wikimedia.org/r/767473 (https://phabricator.wikimedia.org/T302803)
[09:38:53] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P21674 and previous config saved to /var/cache/conftool/dbconfig/20220302-093853-ladsgroup.json
[09:38:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:39:03] <logmsgbot>	 !log klausman@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[09:39:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:40:31] <wikibugs>	 (03PS1) 10David Caro: wmcs-cinder-backup-manager: increase individual timeout to 30h [puppet] - 10https://gerrit.wikimedia.org/r/767474 (https://phabricator.wikimedia.org/T302855)
[09:41:54] <wikibugs>	 (03CR) 10Jelto: [V: 03+1] "PCC SUCCESS (NOOP 2 DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/34023/console" [puppet] - 10https://gerrit.wikimedia.org/r/767473 (https://phabricator.wikimedia.org/T302803) (owner: 10Jelto)
[09:42:53] <wikibugs>	 (03CR) 10Elukey: [C: 03+2] Add kubernetes2018 to wikikube codfw [puppet] - 10https://gerrit.wikimedia.org/r/767465 (https://phabricator.wikimedia.org/T302208) (owner: 10Elukey)
[09:42:55] <jinxer-wm>	 (JobUnavailable) firing: Reduced availability for job jmx_wdqs_updater in eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org
[09:43:57] <wikibugs>	 (03CR) 10Elukey: [V: 03+1 C: 03+2] "PCC SUCCESS (DIFF 2 NOOP 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/34025/console" [puppet] - 10https://gerrit.wikimedia.org/r/767465 (https://phabricator.wikimedia.org/T302208) (owner: 10Elukey)
[09:44:21] <logmsgbot>	 !log jgiannelos@deploy1002 Started deploy [kartotherian/deploy@fd6bc59] (eqiad): Temporarily increase poolsize for debugging
[09:44:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:46:34] <logmsgbot>	 !log jgiannelos@deploy1002 Finished deploy [kartotherian/deploy@fd6bc59] (eqiad): Temporarily increase poolsize for debugging (duration: 02m 13s)
[09:46:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:46:43] <wikibugs>	 (03CR) 10JMeybohm: [C: 03+2] "Context: https://phabricator.wikimedia.org/T296706" [deployment-charts] - 10https://gerrit.wikimedia.org/r/751439 (owner: 10PipelineBot)
[09:47:11] <logmsgbot>	 !log jgiannelos@deploy1002 Started deploy [kartotherian/deploy@fd6bc59] (codfw): Temporarily increase poolsize for debugging
[09:47:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:48:47] <logmsgbot>	 !log klausman@cumin2002 END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ml-staging-ctrl2001.codfw.wmnet
[09:48:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:49:03] <wikibugs>	 (03CR) 10Jelto: [V: 03+1] "With correct floating IP in place (see https://phabricator.wikimedia.org/T302803) we don't need a dedicated ferm_drange in WMCS (beside mi" [puppet] - 10https://gerrit.wikimedia.org/r/767473 (https://phabricator.wikimedia.org/T302803) (owner: 10Jelto)
[09:49:23] <logmsgbot>	 !log klausman@cumin2002 START - Cookbook sre.ganeti.makevm for new host ml-staging-ctrl2002.codfw.wmnet
[09:49:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:49:24] <logmsgbot>	 !log klausman@cumin2002 START - Cookbook sre.dns.netbox
[09:49:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:50:35] <wikibugs>	 (03Merged) 10jenkins-bot: mathoid: pipeline bot promote [deployment-charts] - 10https://gerrit.wikimedia.org/r/751439 (owner: 10PipelineBot)
[09:51:37] <logmsgbot>	 !log jgiannelos@deploy1002 Finished deploy [kartotherian/deploy@fd6bc59] (codfw): Temporarily increase poolsize for debugging (duration: 04m 26s)
[09:51:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:52:20] <wikibugs>	 (03PS1) 10Ayounsi: Rename labs and cloud filters [homer/public] - 10https://gerrit.wikimedia.org/r/767476
[09:53:13] <icinga-wm>	 PROBLEM - Mobileapps LVS codfw on mobileapps.svc.codfw.wmnet is CRITICAL: /{domain}/v1/page/media-list/{title} (Get media list from test page) timed out before a response was received https://wikitech.wikimedia.org/wiki/Mobileapps_%28service%29
[09:53:58] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P21675 and previous config saved to /var/cache/conftool/dbconfig/20220302-095358-ladsgroup.json
[09:53:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:54:06] <wikibugs>	 10SRE, 10Patch-For-Review: migrate services from cumin2001 to cumin2002 - https://phabricator.wikimedia.org/T276589 (10jcrespo) Backups worked without errors tonight, all migration work done and ready to upgrade the backup hosts next.
[09:55:00] <logmsgbot>	 !log klausman@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[09:55:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:55:31] <icinga-wm>	 RECOVERY - Mobileapps LVS codfw on mobileapps.svc.codfw.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Mobileapps_%28service%29
[09:55:48] <logmsgbot>	 !log jayme@deploy1002 helmfile [staging] START helmfile.d/services/mathoid: apply
[09:55:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:55:55] <wikibugs>	 (03CR) 10Elukey: [C: 03+2] Add BGP config for kubernetes2018 [homer/public] - 10https://gerrit.wikimedia.org/r/767468 (https://phabricator.wikimedia.org/T302208) (owner: 10Elukey)
[09:56:47] <logmsgbot>	 !log jayme@deploy1002 helmfile [staging] DONE helmfile.d/services/mathoid: apply
[09:56:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:57:34] <wikibugs>	 (03CR) 10Arturo Borrero Gonzalez: "LGTM, question inlined." [homer/public] - 10https://gerrit.wikimedia.org/r/767476 (owner: 10Ayounsi)
[09:59:09] <wikibugs>	 (03CR) 10Ayounsi: Rename labs and cloud filters (031 comment) [homer/public] - 10https://gerrit.wikimedia.org/r/767476 (owner: 10Ayounsi)
[09:59:31] <mbsantos>	 there's a weird behavior on Maps geoshape endpoint that is a possible ddos situation
[10:00:18] <mbsantos>	 it's starving PG connections in maps eqiad https://grafana.wikimedia.org/goto/5V7twIY7k?orgId=1
[10:00:19] <wikibugs>	 (03CR) 10Ayounsi: "Checked the filter and it should work with codfw without changes." [homer/public] - 10https://gerrit.wikimedia.org/r/767471 (owner: 10Ayounsi)
[10:00:31] <wikibugs>	 (03CR) 10Ayounsi: [C: 03+2] Add labs-in4/6 to codfw cloud-hosts vlan [homer/public] - 10https://gerrit.wikimedia.org/r/767471 (owner: 10Ayounsi)
[10:00:52] <wikibugs>	 (03PS6) 10Muehlenhoff: sre.ganeti.addnode: Validate bridge config of the switches [cookbooks] - 10https://gerrit.wikimedia.org/r/765309
[10:01:32] <mbsantos>	 from turnilo web requests sample it seems that geoshapes is being highly requested (and one weird Chinese tile) https://w.wiki/4uCj
[10:01:59] <mbsantos>	 it seems that 3rd parties found a way to work around our block
[10:02:23] <mbsantos>	 cc/ _joe_  
[10:02:51] <_joe_>	 mbsantos: 301 traffic :P
[10:03:03] <wikibugs>	 (03CR) 10Jelto: [V: 03+1 C: 03+2] gitlab: update sevice_ip and ferm_drange for wmcs [puppet] - 10https://gerrit.wikimedia.org/r/767473 (https://phabricator.wikimedia.org/T302803) (owner: 10Jelto)
[10:03:09] <_joe_>	 sorry but I have too much on my plate already
[10:03:29] <mbsantos>	 no worries, is there a different channel for traffic?
[10:03:38] <_joe_>	 #wikimedia-traffic
[10:03:51] <_joe_>	 but I would advise opening a restricted task with more information
[10:04:43] <logmsgbot>	 !log klausman@cumin2002 END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ml-staging-ctrl2002.codfw.wmnet
[10:04:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:04:50] <mbsantos>	 thanks
[10:08:11] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+2] sre.ganeti.addnode: Validate bridge config of the switches [cookbooks] - 10https://gerrit.wikimedia.org/r/765309 (owner: 10Muehlenhoff)
[10:09:03] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1110 (T300992)', diff saved to https://phabricator.wikimedia.org/P21676 and previous config saved to /var/cache/conftool/dbconfig/20220302-100903-ladsgroup.json
[10:09:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:09:06] <stashbot>	 T300992: Add linter_template and linter_tag columns to the Linter table - https://phabricator.wikimedia.org/T300992
[10:10:50] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+1] "Looks good, I'll merge in a bit." [puppet] - 10https://gerrit.wikimedia.org/r/766291 (owner: 10Majavah)
[10:11:40] <logmsgbot>	 !log jgiannelos@deploy1002 Started deploy [kartotherian/deploy@d049589] (codfw): Revert "Temporarily increase poolsize for debugging"
[10:11:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:12:39] <wikibugs>	 (03PS1) 10Ladsgroup: Add --dbgroupdefault=dump to every major dump run [dumps] - 10https://gerrit.wikimedia.org/r/767477 (https://phabricator.wikimedia.org/T138208)
[10:13:17] <logmsgbot>	 !log jgiannelos@deploy1002 Finished deploy [kartotherian/deploy@d049589] (codfw): Revert "Temporarily increase poolsize for debugging" (duration: 01m 36s)
[10:13:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:13:27] <logmsgbot>	 !log jgiannelos@deploy1002 Started deploy [kartotherian/deploy@d049589] (eqiad): Revert "Temporarily increase poolsize for debugging"
[10:13:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:14:23] <logmsgbot>	 !log jayme@deploy1002 helmfile [codfw] START helmfile.d/services/mathoid: apply
[10:14:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:15:12] <logmsgbot>	 !log jgiannelos@deploy1002 Finished deploy [kartotherian/deploy@d049589] (eqiad): Revert "Temporarily increase poolsize for debugging" (duration: 01m 45s)
[10:15:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:15:31] <logmsgbot>	 !log jayme@deploy1002 helmfile [codfw] DONE helmfile.d/services/mathoid: apply
[10:15:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:16:38] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/767226 (https://phabricator.wikimedia.org/T301679) (owner: 10JMeybohm)
[10:17:34] <wikibugs>	 (03PS1) 10Klausman: Add entries for ML staging control plane VMs [puppet] - 10https://gerrit.wikimedia.org/r/767478 (https://phabricator.wikimedia.org/T302504)
[10:18:23] <logmsgbot>	 !log jayme@deploy1002 helmfile [eqiad] START helmfile.d/services/mathoid: apply
[10:18:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:18:31] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/767227 (https://phabricator.wikimedia.org/T301659) (owner: 10JMeybohm)
[10:19:43] <wikibugs>	 (03PS2) 10Klausman: Add entries for ML staging control plane VMs [puppet] - 10https://gerrit.wikimedia.org/r/767478 (https://phabricator.wikimedia.org/T302504)
[10:20:08] <logmsgbot>	 !log jayme@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mathoid: apply
[10:20:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:20:50] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/767472 (https://phabricator.wikimedia.org/T302775) (owner: 10JMeybohm)
[10:22:13] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+2] Extract ssh fingerprint publishing to an independent class [puppet] - 10https://gerrit.wikimedia.org/r/766291 (owner: 10Majavah)
[10:29:28] <wikibugs>	 (03CR) 10JMeybohm: "Deployed everywhere" [deployment-charts] - 10https://gerrit.wikimedia.org/r/751439 (owner: 10PipelineBot)
[10:30:00] <wikibugs>	 (03CR) 10JMeybohm: [C: 03+2] admin: add tmlt-tmager to krb & analytics-privatedata-users [puppet] - 10https://gerrit.wikimedia.org/r/767226 (https://phabricator.wikimedia.org/T301679) (owner: 10JMeybohm)
[10:30:04] <wikibugs>	 (03CR) 10JMeybohm: [C: 03+2] admin: add damiendf to krb & analytics-privatedata-users [puppet] - 10https://gerrit.wikimedia.org/r/767227 (https://phabricator.wikimedia.org/T301659) (owner: 10JMeybohm)
[10:30:09] <wikibugs>	 (03CR) 10JMeybohm: [C: 03+2] admin: Add aminalhazwani to analytics-privatedata-users [puppet] - 10https://gerrit.wikimedia.org/r/767472 (https://phabricator.wikimedia.org/T302775) (owner: 10JMeybohm)
[10:30:56] <wikibugs>	 (03PS3) 10Klausman: Add entries for ML staging control plane VMs [puppet] - 10https://gerrit.wikimedia.org/r/767478 (https://phabricator.wikimedia.org/T302504)
[10:31:12] <wikibugs>	 (03PS4) 10Klausman: Add entries for ML staging control plane VMs [puppet] - 10https://gerrit.wikimedia.org/r/767478 (https://phabricator.wikimedia.org/T302504)
[10:31:51] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
[10:31:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:31:53] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
[10:31:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:32:08] <wikibugs>	 (03CR) 10Kormat: [C: 03+1] Add Cumin alias to match core-test role [puppet] - 10https://gerrit.wikimedia.org/r/765562 (owner: 10Muehlenhoff)
[10:32:43] <wikibugs>	 (03CR) 10Kormat: Add Cumin alias to match core-test role (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/765562 (owner: 10Muehlenhoff)
[10:34:02] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
[10:34:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:34:03] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
[10:34:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:34:08] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1146:3314 (T300992)', diff saved to https://phabricator.wikimedia.org/P21677 and previous config saved to /var/cache/conftool/dbconfig/20220302-103407-ladsgroup.json
[10:34:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:34:10] <stashbot>	 T300992: Add linter_template and linter_tag columns to the Linter table - https://phabricator.wikimedia.org/T300992
[10:38:16] <wikibugs>	 10SRE, 10Kubernetes: Document what #discovery-system is - https://phabricator.wikimedia.org/T282948 (10Aklapper) Project archived - https://www.mediawiki.org/wiki/Phabricator/Project_management#Archiving_a_project
[10:38:30] <wikibugs>	 10SRE, 10Project-Admins, 10Kubernetes: Document what #discovery-system is - https://phabricator.wikimedia.org/T282948 (10Aklapper)
[10:38:32] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 (T300992)', diff saved to https://phabricator.wikimedia.org/P21678 and previous config saved to /var/cache/conftool/dbconfig/20220302-103832-ladsgroup.json
[10:38:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:38:41] <wikibugs>	 (03PS1) 10Elukey: Add kubernetes20[19-22] to wikikube codfw [puppet] - 10https://gerrit.wikimedia.org/r/767482 (https://phabricator.wikimedia.org/T302208)
[10:39:19] <wikibugs>	 (03CR) 10Elukey: [C: 03+1] Add entries for ML staging control plane VMs [puppet] - 10https://gerrit.wikimedia.org/r/767478 (https://phabricator.wikimedia.org/T302504) (owner: 10Klausman)
[10:39:41] <wikibugs>	 (03CR) 10Klausman: [C: 03+2] Add entries for ML staging control plane VMs [puppet] - 10https://gerrit.wikimedia.org/r/767478 (https://phabricator.wikimedia.org/T302504) (owner: 10Klausman)
[10:42:55] <jinxer-wm>	 (ProbeHttpFailed) firing: (2) URL did not return HTTP 2xx or 3xx response (or probe/connection failed) - https://wikitech.wikimedia.org/wiki/Prometheus#Watchrat_Non-23xx_HTTP_response - https://grafana.wikimedia.org/d/GYciEga7z/watchrat - https://alerts.wikimedia.org
[10:42:58] <wikibugs>	 10SRE, 10Data-Engineering, 10Data-Engineering-Kanban, 10observability, and 2 others: Upgrade Kafka Risk Evaluation - https://phabricator.wikimedia.org/T302610 (10JMeybohm) p:05Triage→03Medium
[10:43:32] <wikibugs>	 10SRE: Domain Ownership Verification on Various Search Properties - https://phabricator.wikimedia.org/T302617 (10JMeybohm) p:05Triage→03Medium
[10:44:53] <wikibugs>	 (03CR) 10Elukey: "pcc diff https://puppet-compiler.wmflabs.org/pcc-worker1001/34026/" [puppet] - 10https://gerrit.wikimedia.org/r/767482 (https://phabricator.wikimedia.org/T302208) (owner: 10Elukey)
[10:45:00] <wikibugs>	 (03PS2) 10Elukey: Add kubernetes20[19-22] to wikikube codfw [puppet] - 10https://gerrit.wikimedia.org/r/767482 (https://phabricator.wikimedia.org/T302208)
[10:45:23] <wikibugs>	 (03PS1) 10Jelto: gitlab: remove realm check, move listen_addresses to hiera [puppet] - 10https://gerrit.wikimedia.org/r/767484 (https://phabricator.wikimedia.org/T297411)
[10:45:25] <wikibugs>	 (03PS10) 10Filippo Giunchedi: Introduce 'alertmanager' and 'alerting' modules [software/spicerack] - 10https://gerrit.wikimedia.org/r/765480 (https://phabricator.wikimedia.org/T293209)
[10:46:41] <wikibugs>	 (03CR) 10Filippo Giunchedi: Introduce 'alertmanager' and 'alerting' modules (034 comments) [software/spicerack] - 10https://gerrit.wikimedia.org/r/765480 (https://phabricator.wikimedia.org/T293209) (owner: 10Filippo Giunchedi)
[10:48:41] <wikibugs>	 (03PS1) 10Elukey: Add BGP config for kubernetes20[19-22] in wikikube codfw [homer/public] - 10https://gerrit.wikimedia.org/r/767485 (https://phabricator.wikimedia.org/T302208)
[10:49:00] <wikibugs>	 (03PS3) 10Muehlenhoff: Add Cumin alias to match core-test role [puppet] - 10https://gerrit.wikimedia.org/r/765562
[10:49:08] <wikibugs>	 (03CR) 10Elukey: "bgp config in https://gerrit.wikimedia.org/r/c/operations/homer/public/+/767485" [puppet] - 10https://gerrit.wikimedia.org/r/767482 (https://phabricator.wikimedia.org/T302208) (owner: 10Elukey)
[10:53:38] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P21680 and previous config saved to /var/cache/conftool/dbconfig/20220302-105336-ladsgroup.json
[10:53:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:55:53] <wikibugs>	 10SRE, 10Beta-Cluster-Infrastructure, 10Traffic, 10Beta-Cluster-reproducible: Beta cluster down: Error: 502, Next Hop Connection Failed (Feb 2022) - https://phabricator.wikimedia.org/T302699 (10JMeybohm) p:05Triage→03Medium deployment-mediawiki11 has been replaced by deployment-mediawiki12 (although th...
[10:56:18] <moritzm>	 !log restarting apache2 and mailman3-web on lists.wikimedia.org for expat security update
[10:56:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:59:24] <wikibugs>	 (03PS2) 10Jelto: gitlab: remove realm check, move listen_addresses to hiera [puppet] - 10https://gerrit.wikimedia.org/r/767484 (https://phabricator.wikimedia.org/T297411)
[11:01:05] <wikibugs>	 10SRE, 10LDAP-Access-Requests, 10Patch-For-Review: Grant Access to analytics-privatedata-users for Amin Al Hazwani - https://phabricator.wikimedia.org/T302775 (10JMeybohm) 05Open→03Resolved a:03JMeybohm You should be good to go
[11:03:16] <wikibugs>	 (03CR) 10Jelto: [V: 03+1] "PCC SUCCESS (DIFF 3): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/34028/console" [puppet] - 10https://gerrit.wikimedia.org/r/767484 (https://phabricator.wikimedia.org/T297411) (owner: 10Jelto)
[11:04:37] <wikibugs>	 10SRE, 10Wikimedia-Mailing-lists, 10User-Ladsgroup: Accidentally unsubscribed everyone from open-glam mailing list - https://phabricator.wikimedia.org/T302816 (10Ladsgroup) 05Open→03Resolved I re-added everyone from a backup that was made in 2022-03-01 05:53:07 (so anyone subscribing between that time an...
[11:05:50] <moritzm>	 !log installing expat security updates
[11:05:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:06:28] <wikibugs>	 10SRE, 10Beta-Cluster-Infrastructure, 10Traffic, 10Beta-Cluster-reproducible: Beta cluster down: Error: 502, Next Hop Connection Failed (Feb 2022) - https://phabricator.wikimedia.org/T302699 (10Vgutierrez) hmm if that's the case horizon data for deployment-prep-cache needs to be updated as well cause right...
[11:07:08] <wikibugs>	 (03CR) 10Jelto: [V: 03+1] "removing one realm check by moving addresses to hiera, similar to I517f1a51b932b933e4ae42ee5a92db32d433b2fc. Should be noop to production." [puppet] - 10https://gerrit.wikimedia.org/r/767484 (https://phabricator.wikimedia.org/T297411) (owner: 10Jelto)
[11:07:58] <wikibugs>	 10SRE, 10SRE-Access-Requests, 10Patch-For-Review: Requesting access to analytics-privatedata-users for Damiendf - https://phabricator.wikimedia.org/T301659 (10JMeybohm) 05In progress→03Resolved a:03JMeybohm >>! In T301659#7745083, @Damiendf wrote: > Arg sorry, this is the wrong email address. I correct...
[11:08:02] <wikibugs>	 10SRE, 10SRE-Access-Requests, 10Patch-For-Review: Requesting access to analytics-privatedata for Tom Magerlein - https://phabricator.wikimedia.org/T301679 (10JMeybohm) 05In progress→03Resolved a:03JMeybohm Access has been granted and krb5 principal has been created.
[11:08:43] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P21681 and previous config saved to /var/cache/conftool/dbconfig/20220302-110842-ladsgroup.json
[11:08:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:09:17] <wikibugs>	 (03PS2) 10Aqu: Set default Airflow concurrency limits [puppet] - 10https://gerrit.wikimedia.org/r/767220 (https://phabricator.wikimedia.org/T300870)
[11:21:42] <logmsgbot>	 !log mbsantos@deploy1002 Started deploy [kartotherian/deploy@3dc404c] (eqiad): Merge "Update kartotherian-package to f239c6e"
[11:21:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:21:48] <wikibugs>	 10SRE, 10Beta-Cluster-Infrastructure, 10Traffic, 10Beta-Cluster-reproducible: Beta cluster down: Error: 502, Next Hop Connection Failed (Feb 2022) - https://phabricator.wikimedia.org/T302699 (10JMeybohm) I was really just relaying from T300525 but it looks like something is off. deployment-mediawiki11 was...
[11:22:21] <mbsantos>	 !log rollback maps eqiad to a previous working state to mitigate geoshape errors
[11:22:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:23:12] <logmsgbot>	 !log mbsantos@deploy1002 Finished deploy [kartotherian/deploy@3dc404c] (eqiad): Merge "Update kartotherian-package to f239c6e" (duration: 01m 29s)
[11:23:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:23:48] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 (T300992)', diff saved to https://phabricator.wikimedia.org/P21682 and previous config saved to /var/cache/conftool/dbconfig/20220302-112347-ladsgroup.json
[11:23:49] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1145.eqiad.wmnet with reason: Maintenance
[11:23:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:23:50] <stashbot>	 T300992: Add linter_template and linter_tag columns to the Linter table - https://phabricator.wikimedia.org/T300992
[11:23:51] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1145.eqiad.wmnet with reason: Maintenance
[11:23:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:23:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:26:03] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
[11:26:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:26:04] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
[11:26:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:27:37] <icinga-wm>	 RECOVERY - Backup freshness on backup1001 is OK: Fresh: 105 jobs https://wikitech.wikimedia.org/wiki/Bacula%23Monitoring
[11:28:18] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1147.eqiad.wmnet with reason: Maintenance
[11:28:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:28:20] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1147.eqiad.wmnet with reason: Maintenance
[11:28:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:28:25] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1147 (T300992)', diff saved to https://phabricator.wikimedia.org/P21683 and previous config saved to /var/cache/conftool/dbconfig/20220302-112824-ladsgroup.json
[11:28:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:30:06] <wikibugs>	 (03PS1) 10Majavah: policies/cr-labs: Include cloudbackup-dev hosts [homer/public] - 10https://gerrit.wikimedia.org/r/767487
[11:32:41] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1147 (T300992)', diff saved to https://phabricator.wikimedia.org/P21684 and previous config saved to /var/cache/conftool/dbconfig/20220302-113240-ladsgroup.json
[11:32:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:32:44] <stashbot>	 T300992: Add linter_template and linter_tag columns to the Linter table - https://phabricator.wikimedia.org/T300992
[11:37:02] <wikibugs>	 10SRE, 10Beta-Cluster-Infrastructure, 10Traffic, 10Beta-Cluster-reproducible: Beta cluster down: Error: 502, Next Hop Connection Failed (Feb 2022) - https://phabricator.wikimedia.org/T302699 (10Majavah) >>! In T302699#7746972, @JMeybohm wrote: > I was really just relaying from T300525 but it looks like som...
[11:38:10] <wikibugs>	 (03PS1) 10Muehlenhoff: Enable profile::auto_restarts::service for smokeping [puppet] - 10https://gerrit.wikimedia.org/r/767488 (https://phabricator.wikimedia.org/T135991)
[11:43:18] <wikibugs>	 (03CR) 10Muehlenhoff: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/767488 (https://phabricator.wikimedia.org/T135991) (owner: 10Muehlenhoff)
[11:47:45] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P21685 and previous config saved to /var/cache/conftool/dbconfig/20220302-114745-ladsgroup.json
[11:47:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:55:10] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+2] Add Cumin alias to match core-test role [puppet] - 10https://gerrit.wikimedia.org/r/765562 (owner: 10Muehlenhoff)
[11:57:45] <icinga-wm>	 RECOVERY - Confd vcl based reload on cp6015 is OK: reload-vcl successfully ran 0h, 0 minutes ago. https://wikitech.wikimedia.org/wiki/Varnish
[11:57:59] <icinga-wm>	 RECOVERY - Confd vcl based reload on cp6009 is OK: reload-vcl successfully ran 0h, 0 minutes ago. https://wikitech.wikimedia.org/wiki/Varnish
[11:59:01] <icinga-wm>	 RECOVERY - traffic_server backend process restarted on cp6010 is OK: (C)2 ge (W)2 ge 1 https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server https://grafana.wikimedia.org/d/6uhkG6OZk/ats-instance-drilldown?orgId=1&var-site=drmrs+prometheus/ops&var-instance=cp6010&var-layer=backend
[12:02:50] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P21686 and previous config saved to /var/cache/conftool/dbconfig/20220302-120250-ladsgroup.json
[12:02:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:03:28] <wikibugs>	 10SRE, 10LDAP-Access-Requests: Grant Access to analytics-privatedata-users for Amin Al Hazwani - https://phabricator.wikimedia.org/T302775 (10aminalhazwani) Yes, indeed! Thanks @JMeybohm 🙏🏼
[12:04:13] <icinga-wm>	 RECOVERY - Check systemd state on cp6010 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:04:19] <wikibugs>	 (03PS1) 10Giuseppe Lavagetto: utils: add script to sync abuse networks with conftool ipblocks [puppet] - 10https://gerrit.wikimedia.org/r/767489 (https://phabricator.wikimedia.org/T302471)
[12:05:12] <wikibugs>	 (03PS1) 10Vgutierrez: site: Reimage cp4034 as cache::upload_haproxy [puppet] - 10https://gerrit.wikimedia.org/r/767490 (https://phabricator.wikimedia.org/T290005)
[12:07:03] <wikibugs>	 (03CR) 10Vgutierrez: [C: 03+2] site: Reimage cp4034 as cache::upload_haproxy [puppet] - 10https://gerrit.wikimedia.org/r/767490 (https://phabricator.wikimedia.org/T290005) (owner: 10Vgutierrez)
[12:09:13] <logmsgbot>	 !log vgutierrez@cumin1001 START - Cookbook sre.hosts.reimage for host cp4034.ulsfo.wmnet with OS buster
[12:09:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:09:25] <wikibugs>	 10SRE, 10Traffic, 10Patch-For-Review, 10Performance-Team (Radar): Test haproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T290005 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by vgutierrez@cumin1001 for host cp4034.ulsfo.wmnet with OS buster
[12:10:41] <icinga-wm>	 PROBLEM - SSH on analytics1067.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[12:17:55] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1147 (T300992)', diff saved to https://phabricator.wikimedia.org/P21687 and previous config saved to /var/cache/conftool/dbconfig/20220302-121754-ladsgroup.json
[12:17:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:17:58] <stashbot>	 T300992: Add linter_template and linter_tag columns to the Linter table - https://phabricator.wikimedia.org/T300992
[12:18:01] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db2110.codfw.wmnet with reason: Maintenance
[12:18:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:18:03] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2110.codfw.wmnet with reason: Maintenance
[12:18:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:18:04] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 12:00:00 on 12 hosts with reason: Maintenance
[12:18:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:18:13] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 12 hosts with reason: Maintenance
[12:18:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:18:33] <wikibugs>	 (03PS1) 10Jbond: C:geoip::data::maxmind: update systemd timers [puppet] - 10https://gerrit.wikimedia.org/r/767491 (https://phabricator.wikimedia.org/T302864)
[12:19:08] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] C:geoip::data::maxmind: update systemd timers [puppet] - 10https://gerrit.wikimedia.org/r/767491 (https://phabricator.wikimedia.org/T302864) (owner: 10Jbond)
[12:20:41] <wikibugs>	 (03PS2) 10Jbond: C:geoip::data::maxmind: update systemd timers [puppet] - 10https://gerrit.wikimedia.org/r/767491 (https://phabricator.wikimedia.org/T302864)
[12:20:43] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1142.eqiad.wmnet with reason: Maintenance
[12:20:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:20:45] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1142.eqiad.wmnet with reason: Maintenance
[12:20:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:20:50] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1142 (T300992)', diff saved to https://phabricator.wikimedia.org/P21688 and previous config saved to /var/cache/conftool/dbconfig/20220302-122049-ladsgroup.json
[12:20:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:21:15] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] C:geoip::data::maxmind: update systemd timers [puppet] - 10https://gerrit.wikimedia.org/r/767491 (https://phabricator.wikimedia.org/T302864) (owner: 10Jbond)
[12:21:38] <wikibugs>	 (03PS3) 10Jbond: C:geoip::data::maxmind: update systemd timers [puppet] - 10https://gerrit.wikimedia.org/r/767491 (https://phabricator.wikimedia.org/T302864)
[12:21:53] <wikibugs>	 (03PS4) 10Jbond: C:geoip::data::maxmind: update systemd timers [puppet] - 10https://gerrit.wikimedia.org/r/767491 (https://phabricator.wikimedia.org/T302864)
[12:22:37] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] C:geoip::data::maxmind: update systemd timers [puppet] - 10https://gerrit.wikimedia.org/r/767491 (https://phabricator.wikimedia.org/T302864) (owner: 10Jbond)
[12:24:10] <wikibugs>	 10Puppet, 10Infrastructure-Foundations: Where to Put Community Modules? - https://phabricator.wikimedia.org/T302423 (10MoritzMuehlenhoff) Thanks for opening this task, having this discussion in searchable, open medium is very useful!  > Based on the discussion so far my inclination is that we stick with our cur...
[12:24:31] <wikibugs>	 (03PS5) 10Jbond: C:geoip::data::maxmind: update systemd timers [puppet] - 10https://gerrit.wikimedia.org/r/767491 (https://phabricator.wikimedia.org/T302864)
[12:25:05] <wikibugs>	 (03PS1) 10Zabe: Change the mwapi host back to mediawiki11 [puppet] - 10https://gerrit.wikimedia.org/r/767492
[12:25:10] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1142 (T300992)', diff saved to https://phabricator.wikimedia.org/P21689 and previous config saved to /var/cache/conftool/dbconfig/20220302-122510-ladsgroup.json
[12:25:11] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] C:geoip::data::maxmind: update systemd timers [puppet] - 10https://gerrit.wikimedia.org/r/767491 (https://phabricator.wikimedia.org/T302864) (owner: 10Jbond)
[12:25:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:25:14] <stashbot>	 T300992: Add linter_template and linter_tag columns to the Linter table - https://phabricator.wikimedia.org/T300992
[12:26:36] <wikibugs>	 (03PS2) 10Zabe: deployment-prep: change the mwapi host back to mediawiki11 [puppet] - 10https://gerrit.wikimedia.org/r/767492
[12:30:07] <wikibugs>	 (03PS6) 10Jbond: C:geoip::data::maxmind: update systemd timers [puppet] - 10https://gerrit.wikimedia.org/r/767491 (https://phabricator.wikimedia.org/T302864)
[12:30:47] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] C:geoip::data::maxmind: update systemd timers [puppet] - 10https://gerrit.wikimedia.org/r/767491 (https://phabricator.wikimedia.org/T302864) (owner: 10Jbond)
[12:31:52] <wikibugs>	 (03CR) 10Ayounsi: [C: 03+1] Enable profile::auto_restarts::service for smokeping [puppet] - 10https://gerrit.wikimedia.org/r/767488 (https://phabricator.wikimedia.org/T135991) (owner: 10Muehlenhoff)
[12:32:26] <icinga-wm>	 RECOVERY - Check systemd state on deploy1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:33:00] <wikibugs>	 (03CR) 10Ayounsi: [C: 03+1] Add BGP config for kubernetes20[19-22] in wikikube codfw [homer/public] - 10https://gerrit.wikimedia.org/r/767485 (https://phabricator.wikimedia.org/T302208) (owner: 10Elukey)
[12:33:08] <wikibugs>	 (03PS7) 10Jbond: C:geoip::data::maxmind: update systemd timers [puppet] - 10https://gerrit.wikimedia.org/r/767491 (https://phabricator.wikimedia.org/T302864)
[12:34:31] <wikibugs>	 10SRE, 10SRE-Access-Requests, 10Infrastructure-Foundations, 10netbox: Grant cn=nda some sort of read only access to Netbox - https://phabricator.wikimedia.org/T302870 (10Majavah)
[12:35:19] <wikibugs>	 (03CR) 10Jbond: [V: 03+1] "PCC SUCCESS (DIFF 3): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/34037/console" [puppet] - 10https://gerrit.wikimedia.org/r/767491 (https://phabricator.wikimedia.org/T302864) (owner: 10Jbond)
[12:37:38] <icinga-wm>	 PROBLEM - Check systemd state on deploy1002 is CRITICAL: CRITICAL - degraded: The following units failed: deploy_to_mwdebug.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:38:24] <icinga-wm>	 RECOVERY - traffic_server tls process restarted on cp6009 is OK: (C)2 ge (W)2 ge 1 https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server https://grafana.wikimedia.org/d/6uhkG6OZk/ats-instance-drilldown?orgId=1&var-site=drmrs+prometheus/ops&var-instance=cp6009&var-layer=tls
[12:38:47] <wikibugs>	 (03PS1) 10Reedy: Delete incorrect en-gb.json [extensions/MassMessage] (wmf/1.38.0-wmf.24) - 10https://gerrit.wikimedia.org/r/767098 (https://phabricator.wikimedia.org/T302840)
[12:39:30] <Reedy>	 jouncebot: nowandnext
[12:39:31] <jouncebot>	 No deployments scheduled for the next 1 hour(s) and 20 minute(s)
[12:39:31] <jouncebot>	 In 1 hour(s) and 20 minute(s): UTC afternoon backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220302T1400)
[12:39:38] <wikibugs>	 (03CR) 10Reedy: [C: 03+2] Delete incorrect en-gb.json [extensions/MassMessage] (wmf/1.38.0-wmf.24) - 10https://gerrit.wikimedia.org/r/767098 (https://phabricator.wikimedia.org/T302840) (owner: 10Reedy)
[12:40:15] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P21690 and previous config saved to /var/cache/conftool/dbconfig/20220302-124014-ladsgroup.json
[12:40:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:42:06] <icinga-wm>	 RECOVERY - traffic_server tls process restarted on cp6014 is OK: (C)2 ge (W)2 ge 1 https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server https://grafana.wikimedia.org/d/6uhkG6OZk/ats-instance-drilldown?orgId=1&var-site=drmrs+prometheus/ops&var-instance=cp6014&var-layer=tls
[12:43:25] <logmsgbot>	 !log vgutierrez@cumin1001 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp4034.ulsfo.wmnet with OS buster
[12:43:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:43:38] <wikibugs>	 10SRE, 10Traffic, 10Patch-For-Review, 10Performance-Team (Radar): Test haproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T290005 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by vgutierrez@cumin1001 for host cp4034.ulsfo.wmnet with OS buster e...
[12:43:42] <wikibugs>	 (03Merged) 10jenkins-bot: Delete incorrect en-gb.json [extensions/MassMessage] (wmf/1.38.0-wmf.24) - 10https://gerrit.wikimedia.org/r/767098 (https://phabricator.wikimedia.org/T302840) (owner: 10Reedy)
[12:45:50] <logmsgbot>	 !log reedy@deploy1002 Started scap: Fix MassMessage translations T302840
[12:45:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:45:53] <stashbot>	 T302840: Wrong language in en-gb MassMessage interface - https://phabricator.wikimedia.org/T302840
[12:46:44] <icinga-wm>	 RECOVERY - traffic_server tls process restarted on cp6015 is OK: (C)2 ge (W)2 ge 1 https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server https://grafana.wikimedia.org/d/6uhkG6OZk/ats-instance-drilldown?orgId=1&var-site=drmrs+prometheus/ops&var-instance=cp6015&var-layer=tls
[12:47:41] <logmsgbot>	 !log reedy@deploy1002 Finished scap: Fix MassMessage translations T302840 (duration: 01m 50s)
[12:47:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:50:46] <wikibugs>	 (03CR) 10Tchanders: Add IPInfo viewing rights for certain groups (032 comments) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/766882 (https://phabricator.wikimedia.org/T296499) (owner: 10STran)
[12:54:20] <wikibugs>	 (03PS3) 10Zabe: deployment-prep: change the mwapi host back to mediawiki11 [puppet] - 10https://gerrit.wikimedia.org/r/767492 (https://phabricator.wikimedia.org/T302699)
[12:55:20] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P21692 and previous config saved to /var/cache/conftool/dbconfig/20220302-125519-ladsgroup.json
[12:55:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:59:14] <wikibugs>	 10SRE, 10Wikimedia-Mailing-lists, 10User-Ladsgroup: Accidentally unsubscribed everyone from open-glam mailing list - https://phabricator.wikimedia.org/T302816 (10Scann) **THANK YOU SO MUCH**, I can't stress enough how grateful I am for all of you solving this issue in such a timely manner.  Here I'm sending...
[13:00:22] <icinga-wm>	 RECOVERY - traffic_server tls process restarted on cp6016 is OK: (C)2 ge (W)2 ge 1 https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server https://grafana.wikimedia.org/d/6uhkG6OZk/ats-instance-drilldown?orgId=1&var-site=drmrs+prometheus/ops&var-instance=cp6016&var-layer=tls
[13:10:24] <icinga-wm>	 PROBLEM - Disk space on centrallog1001 is CRITICAL: DISK CRITICAL - free space: /srv 34196 MB (3% inode=99%): https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=centrallog1001&var-datasource=eqiad+prometheus/ops
[13:10:24] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1142 (T300992)', diff saved to https://phabricator.wikimedia.org/P21693 and previous config saved to /var/cache/conftool/dbconfig/20220302-131024-ladsgroup.json
[13:10:26] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1141.eqiad.wmnet with reason: Maintenance
[13:10:27] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1141.eqiad.wmnet with reason: Maintenance
[13:10:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:10:29] <stashbot>	 T300992: Add linter_template and linter_tag columns to the Linter table - https://phabricator.wikimedia.org/T300992
[13:10:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:10:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:10:32] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1141 (T300992)', diff saved to https://phabricator.wikimedia.org/P21694 and previous config saved to /var/cache/conftool/dbconfig/20220302-131032-ladsgroup.json
[13:10:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:12:14] <icinga-wm>	 RECOVERY - SSH on analytics1067.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[13:13:36] <wikibugs>	 10SRE, 10Data-Engineering, 10Data-Engineering-Kanban, 10observability, and 2 others: Upgrade Kafka Risk Evaluation - https://phabricator.wikimedia.org/T302610 (10elukey) @EChetty hi! Could you add some details about what you expect to see in this task?
[13:15:51] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1141 (T300992)', diff saved to https://phabricator.wikimedia.org/P21695 and previous config saved to /var/cache/conftool/dbconfig/20220302-131550-ladsgroup.json
[13:15:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:15:54] <stashbot>	 T300992: Add linter_template and linter_tag columns to the Linter table - https://phabricator.wikimedia.org/T300992
[13:17:06] <wikibugs>	 (03CR) 10JMeybohm: [C: 03+1] Add BGP config for kubernetes20[19-22] in wikikube codfw [homer/public] - 10https://gerrit.wikimedia.org/r/767485 (https://phabricator.wikimedia.org/T302208) (owner: 10Elukey)
[13:18:30] <wikibugs>	 (03CR) 10JMeybohm: [C: 03+1] "just a nit" [puppet] - 10https://gerrit.wikimedia.org/r/767482 (https://phabricator.wikimedia.org/T302208) (owner: 10Elukey)
[13:20:24] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 03+1] Enable profile::auto_restarts::service for smokeping [puppet] - 10https://gerrit.wikimedia.org/r/767488 (https://phabricator.wikimedia.org/T135991) (owner: 10Muehlenhoff)
[13:24:16] <wikibugs>	 (03PS3) 10Jbond: P:configmaster: parametrise server names [puppet] - 10https://gerrit.wikimedia.org/r/766585 (owner: 10Majavah)
[13:24:59] <wikibugs>	 (03CR) 10Joal: [C: 03+1] "LGTM except for a typo in commit message :) Thanks @Aqu" [puppet] - 10https://gerrit.wikimedia.org/r/767220 (https://phabricator.wikimedia.org/T300870) (owner: 10Aqu)
[13:25:15] <wikibugs>	 (03PS3) 10Elukey: Add kubernetes20[19-22] to wikikube codfw [puppet] - 10https://gerrit.wikimedia.org/r/767482 (https://phabricator.wikimedia.org/T302208)
[13:25:29] <wikibugs>	 (03CR) 10Elukey: Add kubernetes20[19-22] to wikikube codfw (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/767482 (https://phabricator.wikimedia.org/T302208) (owner: 10Elukey)
[13:27:39] <wikibugs>	 (03CR) 10Jbond: [C: 03+1] Enable profile::auto_restarts::service for puppetdb microservice [puppet] - 10https://gerrit.wikimedia.org/r/767174 (https://phabricator.wikimedia.org/T135991) (owner: 10Muehlenhoff)
[13:27:42] <wikibugs>	 (03CR) 10Jbond: [C: 03+2] P:configmaster: parametrise server names [puppet] - 10https://gerrit.wikimedia.org/r/766585 (owner: 10Majavah)
[13:27:47] <wikibugs>	 (03CR) 10Jbond: [C: 03+2] "lgtm thanks" [puppet] - 10https://gerrit.wikimedia.org/r/766585 (owner: 10Majavah)
[13:30:56] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P21696 and previous config saved to /var/cache/conftool/dbconfig/20220302-133055-ladsgroup.json
[13:30:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:35:17] <wikibugs>	 (03CR) 10Joal: [C: 04-1] "Duplicate field - Asking for a reorder but this is not mandatory - the duplicated field removal is :)" [puppet] - 10https://gerrit.wikimedia.org/r/765485 (https://phabricator.wikimedia.org/T301238) (owner: 10Phuedx)
[13:42:24] <wikibugs>	 (03CR) 10Jbond: [V: 03+1 C: 03+2] C:geoip::data::maxmind: update systemd timers [puppet] - 10https://gerrit.wikimedia.org/r/767491 (https://phabricator.wikimedia.org/T302864) (owner: 10Jbond)
[13:42:55] <jinxer-wm>	 (JobUnavailable) firing: Reduced availability for job jmx_wdqs_updater in eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org
[13:46:00] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P21697 and previous config saved to /var/cache/conftool/dbconfig/20220302-134600-ladsgroup.json
[13:46:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:47:16] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+2] Enable profile::auto_restarts::service for smokeping [puppet] - 10https://gerrit.wikimedia.org/r/767488 (https://phabricator.wikimedia.org/T135991) (owner: 10Muehlenhoff)
[13:49:09] <wikibugs>	 10SRE, 10Wikimedia-Mailing-lists, 10User-Ladsgroup: Accidentally unsubscribed everyone from open-glam mailing list - https://phabricator.wikimedia.org/T302816 (10Ladsgroup) Glad to be of service ^^
[13:50:39] <logmsgbot>	 !log vgutierrez@cumin1001 START - Cookbook sre.hosts.reimage for host cp4034.ulsfo.wmnet with OS buster
[13:50:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:50:51] <wikibugs>	 10SRE, 10Traffic, 10Patch-For-Review, 10Performance-Team (Radar): Test haproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T290005 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by vgutierrez@cumin1001 for host cp4034.ulsfo.wmnet with OS buster
[13:52:35] <wikibugs>	 (03PS1) 10Jbond: geoip: add explicit syslog_identifier [puppet] - 10https://gerrit.wikimedia.org/r/767513
[13:54:05] <wikibugs>	 (03PS9) 10Jbond: varnish/frontend: consume etcd data for dynamic banning of requests. [puppet] - 10https://gerrit.wikimedia.org/r/763557 (owner: 10Giuseppe Lavagetto)
[13:55:08] <wikibugs>	 (03PS1) 10Muehlenhoff: Enable profile::auto_restarts::service for klaxon gunicorn webapp [puppet] - 10https://gerrit.wikimedia.org/r/767516 (https://phabricator.wikimedia.org/T135991)
[13:57:48] <wikibugs>	 (03CR) 10Jbond: [C: 03+2] geoip: add explicit syslog_identifier [puppet] - 10https://gerrit.wikimedia.org/r/767513 (owner: 10Jbond)
[14:00:05] <jouncebot>	 RoanKattouw, Lucas_WMDE, and Urbanecm: How many deployers does it take to do UTC afternoon backport window deploy? (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220302T1400).
[14:00:05] <jouncebot>	 No Gerrit patches in the queue for this window AFAICS.
[14:00:14] <wikibugs>	 (03CR) 10Muehlenhoff: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/767516 (https://phabricator.wikimedia.org/T135991) (owner: 10Muehlenhoff)
[14:00:32] <Lucas_WMDE>	 ok
[14:01:05] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1141 (T300992)', diff saved to https://phabricator.wikimedia.org/P21698 and previous config saved to /var/cache/conftool/dbconfig/20220302-140105-ladsgroup.json
[14:01:06] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1144.eqiad.wmnet with reason: Maintenance
[14:01:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:01:08] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1144.eqiad.wmnet with reason: Maintenance
[14:01:09] <stashbot>	 T300992: Add linter_template and linter_tag columns to the Linter table - https://phabricator.wikimedia.org/T300992
[14:01:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:01:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:01:13] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1144:3314 (T300992)', diff saved to https://phabricator.wikimedia.org/P21699 and previous config saved to /var/cache/conftool/dbconfig/20220302-140112-ladsgroup.json
[14:01:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:04:14] <wikibugs>	 (03CR) 10Gmodena: [C: 03+1] "LGTM." [puppet] - 10https://gerrit.wikimedia.org/r/767220 (https://phabricator.wikimedia.org/T300870) (owner: 10Aqu)
[14:05:03] <wikibugs>	 (03CR) 10Jbond: [C: 04-1] varnish/frontend: consume etcd data for dynamic banning of requests. [puppet] - 10https://gerrit.wikimedia.org/r/763557 (owner: 10Giuseppe Lavagetto)
[14:05:33] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 (T300992)', diff saved to https://phabricator.wikimedia.org/P21700 and previous config saved to /var/cache/conftool/dbconfig/20220302-140532-ladsgroup.json
[14:05:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:11:26] <wikibugs>	 (03PS1) 10Ladsgroup: ext.flaggedRevs.review: Restore tolerance when setting "disabled" prop [extensions/FlaggedRevs] (wmf/1.38.0-wmf.23) - 10https://gerrit.wikimedia.org/r/767099
[14:13:03] <wikibugs>	 10Puppet, 10Infrastructure-Foundations: Where to Put Community Modules? - https://phabricator.wikimedia.org/T302423 (10jbond) moving third party modules to /vendor/modules also makes it a bit easier to exclude these modules from CI which is a nice minor benefit
[14:13:16] <wikibugs>	 (03CR) 10Vgutierrez: varnish/frontend: consume etcd data for dynamic banning of requests. (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/763557 (owner: 10Giuseppe Lavagetto)
[14:13:43] <mmandere>	 !log pool cp6013
[14:13:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:14:36] <Amir1>	 jouncebot: nowandnext
[14:14:36] <jouncebot>	 For the next 0 hour(s) and 45 minute(s): UTC afternoon backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220302T1400)
[14:14:36] <jouncebot>	 In 4 hour(s) and 45 minute(s): Train log triage with CPT (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220302T1900)
[14:14:36] <jouncebot>	 In 4 hour(s) and 45 minute(s): MediaWiki train - Utc-7 Version (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220302T1900)
[14:14:47] <Amir1>	 awesome
[14:14:51] <wikibugs>	 (03CR) 10Ladsgroup: [C: 03+2] ext.flaggedRevs.review: Restore tolerance when setting "disabled" prop [extensions/FlaggedRevs] (wmf/1.38.0-wmf.23) - 10https://gerrit.wikimedia.org/r/767099 (owner: 10Ladsgroup)
[14:18:36] <wikibugs>	 (03CR) 10Jbond: [C: 03+1] "lgtm" [cookbooks] - 10https://gerrit.wikimedia.org/r/767073 (owner: 10Volans)
[14:18:48] <wikibugs>	 (03Merged) 10jenkins-bot: ext.flaggedRevs.review: Restore tolerance when setting "disabled" prop [extensions/FlaggedRevs] (wmf/1.38.0-wmf.23) - 10https://gerrit.wikimedia.org/r/767099 (owner: 10Ladsgroup)
[14:19:02] <wikibugs>	 (03CR) 10Jbond: [C: 03+1] "lgtm" [cookbooks] - 10https://gerrit.wikimedia.org/r/767074 (owner: 10Volans)
[14:20:37] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P21701 and previous config saved to /var/cache/conftool/dbconfig/20220302-142037-ladsgroup.json
[14:20:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:20:43] <wikibugs>	 (03PS1) 10Muehlenhoff: Enable profile::auto_restarts::service for apache/pki discovery [puppet] - 10https://gerrit.wikimedia.org/r/767520 (https://phabricator.wikimedia.org/T135991)
[14:21:42] <logmsgbot>	 !log ladsgroup@deploy1002 Synchronized php-1.38.0-wmf.23/extensions/FlaggedRevs/modules/ext.flaggedRevs.review/review.js: Backport: [[gerrit:767099|ext.flaggedRevs.review: Restore tolerance when setting "disabled" prop]] (duration: 00m 52s)
[14:21:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:23:12] <wikibugs>	 (03CR) 10Jbond: [C: 03+1] "LGTM" [software/spicerack] - 10https://gerrit.wikimedia.org/r/767071 (owner: 10Volans)
[14:24:28] <logmsgbot>	 !log vgutierrez@cumin1001 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp4034.ulsfo.wmnet with OS buster
[14:24:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:24:40] <wikibugs>	 10SRE, 10Traffic, 10Patch-For-Review, 10Performance-Team (Radar): Test haproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T290005 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by vgutierrez@cumin1001 for host cp4034.ulsfo.wmnet with OS buster e...
[14:24:52] <vgutierrez>	 grrr
[14:25:16] <icinga-wm>	 RECOVERY - Check systemd state on durum6001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[14:26:08] <logmsgbot>	 !log vgutierrez@cumin1001 START - Cookbook sre.hosts.reimage for host cp4034.ulsfo.wmnet with OS buster
[14:26:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:26:25] <wikibugs>	 10SRE, 10Traffic, 10Patch-For-Review, 10Performance-Team (Radar): Test haproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T290005 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by vgutierrez@cumin1001 for host cp4034.ulsfo.wmnet with OS buster
[14:27:22] <moritzm>	 !log rebalance VMs in Ganeti row A after adding new servers (and decommissioning old ones)
[14:27:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:33:15] <wikibugs>	 (03CR) 10Volans: [C: 03+2] sre.hosts.provision: retry once on failure [cookbooks] - 10https://gerrit.wikimedia.org/r/767073 (owner: 10Volans)
[14:33:22] <wikibugs>	 (03PS3) 10Volans: sre.hosts.provision: retry once on failure [cookbooks] - 10https://gerrit.wikimedia.org/r/767073
[14:34:55] <logmsgbot>	 !log vgutierrez@cumin1001 END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4034.ulsfo.wmnet with OS buster
[14:34:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:35:07] <wikibugs>	 10SRE, 10Traffic, 10Patch-For-Review, 10Performance-Team (Radar): Test haproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T290005 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by vgutierrez@cumin1001 for host cp4034.ulsfo.wmnet with OS buster e...
[14:35:42] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P21702 and previous config saved to /var/cache/conftool/dbconfig/20220302-143541-ladsgroup.json
[14:35:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:37:13] <wikibugs>	 (03CR) 10Volans: [C: 03+2] redfish: DellSCP, allow creation of new entities [software/spicerack] - 10https://gerrit.wikimedia.org/r/767071 (owner: 10Volans)
[14:37:52] <logmsgbot>	 !log vgutierrez@cumin1001 START - Cookbook sre.hosts.reimage for host cp4034.ulsfo.wmnet with OS buster
[14:37:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:38:01] <logmsgbot>	 !log vgutierrez@cumin1001 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp4034.ulsfo.wmnet with OS buster
[14:38:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:38:05] <wikibugs>	 10SRE, 10Traffic, 10Patch-For-Review, 10Performance-Team (Radar): Test haproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T290005 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by vgutierrez@cumin1001 for host cp4034.ulsfo.wmnet with OS buster
[14:38:15] <wikibugs>	 10SRE, 10Traffic, 10Patch-For-Review, 10Performance-Team (Radar): Test haproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T290005 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by vgutierrez@cumin1001 for host cp4034.ulsfo.wmnet with OS buster e...
[14:41:51] <logmsgbot>	 !log vgutierrez@cumin1001 START - Cookbook sre.hosts.reimage for host cp4034.ulsfo.wmnet with OS buster
[14:41:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:41:59] <logmsgbot>	 !log vgutierrez@cumin1001 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp4034.ulsfo.wmnet with OS buster
[14:42:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:42:03] <wikibugs>	 10SRE, 10Traffic, 10Patch-For-Review, 10Performance-Team (Radar): Test haproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T290005 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by vgutierrez@cumin1001 for host cp4034.ulsfo.wmnet with OS buster
[14:42:10] <wikibugs>	 10SRE, 10Traffic, 10Patch-For-Review, 10Performance-Team (Radar): Test haproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T290005 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by vgutierrez@cumin1001 for host cp4034.ulsfo.wmnet with OS buster e...
[14:42:55] <jinxer-wm>	 (ProbeHttpFailed) firing: (2) URL did not return HTTP 2xx or 3xx response (or probe/connection failed) - https://wikitech.wikimedia.org/wiki/Prometheus#Watchrat_Non-23xx_HTTP_response - https://grafana.wikimedia.org/d/GYciEga7z/watchrat - https://alerts.wikimedia.org
[14:43:12] <wikibugs>	 (03Merged) 10jenkins-bot: redfish: DellSCP, allow creation of new entities [software/spicerack] - 10https://gerrit.wikimedia.org/r/767071 (owner: 10Volans)
[14:44:44] <wikibugs>	 (03PS1) 10Hashar: gerrit: use raw subject for Phabricator comments [puppet] - 10https://gerrit.wikimedia.org/r/767521 (https://phabricator.wikimedia.org/T280197)
[14:47:20] <icinga-wm>	 RECOVERY - Check unit status of prune_old_srv_syslog_directories on centrallog2002 is OK: OK: Status of the systemd unit prune_old_srv_syslog_directories https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[14:48:30] <icinga-wm>	 RECOVERY - Check systemd state on centrallog2002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[14:48:36] <wikibugs>	 (03PS1) 10Vgutierrez: site: Reimage cp5014 as cache::upload_haproxy [puppet] - 10https://gerrit.wikimedia.org/r/767522 (https://phabricator.wikimedia.org/T290005)
[14:50:01] <wikibugs>	 (03CR) 10Vgutierrez: [C: 03+2] site: Reimage cp5014 as cache::upload_haproxy [puppet] - 10https://gerrit.wikimedia.org/r/767522 (https://phabricator.wikimedia.org/T290005) (owner: 10Vgutierrez)
[14:50:47] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 (T300992)', diff saved to https://phabricator.wikimedia.org/P21703 and previous config saved to /var/cache/conftool/dbconfig/20220302-145046-ladsgroup.json
[14:50:48] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1143.eqiad.wmnet with reason: Maintenance
[14:50:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:50:50] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1143.eqiad.wmnet with reason: Maintenance
[14:50:50] <stashbot>	 T300992: Add linter_template and linter_tag columns to the Linter table - https://phabricator.wikimedia.org/T300992
[14:50:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:50:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:50:54] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1143 (T300992)', diff saved to https://phabricator.wikimedia.org/P21704 and previous config saved to /var/cache/conftool/dbconfig/20220302-145054-ladsgroup.json
[14:50:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:51:25] <wikibugs>	 (03PS3) 10Aqu: Set default Airflow concurrency limits [puppet] - 10https://gerrit.wikimedia.org/r/767220 (https://phabricator.wikimedia.org/T300870)
[14:52:11] <logmsgbot>	 !log vgutierrez@cumin1001 START - Cookbook sre.hosts.reimage for host cp5014.eqsin.wmnet with OS buster
[14:52:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:52:27] <wikibugs>	 10SRE, 10Traffic, 10Patch-For-Review, 10Performance-Team (Radar): Test haproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T290005 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by vgutierrez@cumin1001 for host cp5014.eqsin.wmnet with OS buster
[14:54:37] <wikibugs>	 10Puppet, 10Infrastructure-Foundations: Where to Put Community Modules? - https://phabricator.wikimedia.org/T302423 (10akosiaris) >>! In T302423#7744908, @jhathaway wrote: >> On a side note, I see there is a proposal of using /vendor/modules. It seems interesting and I've never tried it, I am wondering what t...
[14:55:10] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1143 (T300992)', diff saved to https://phabricator.wikimedia.org/P21705 and previous config saved to /var/cache/conftool/dbconfig/20220302-145510-ladsgroup.json
[14:55:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:57:38] <wikibugs>	 (03CR) 10Ottomata: [C: 03+2] Set default Airflow concurrency limits [puppet] - 10https://gerrit.wikimedia.org/r/767220 (https://phabricator.wikimedia.org/T300870) (owner: 10Aqu)
[14:58:24] <wikibugs>	 (03PS1) 10Urbanecm: enwiki: Deploy Growth features to 100% of users [mediawiki-config] - 10https://gerrit.wikimedia.org/r/767525 (https://phabricator.wikimedia.org/T302846)
[15:00:32] <wikibugs>	 (03CR) 10Muehlenhoff: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/767520 (https://phabricator.wikimedia.org/T135991) (owner: 10Muehlenhoff)
[15:01:18] <wikibugs>	 (03PS1) 10Ottomata: Ah, this is the wrong file.  My fault!  This is for the search's airflow 1 deployment. [puppet] - 10https://gerrit.wikimedia.org/r/767100
[15:01:26] <wikibugs>	 (03CR) 10Ottomata: [V: 03+2 C: 03+2] Ah, this is the wrong file.  My fault!  This is for the search's airflow 1 deployment. [puppet] - 10https://gerrit.wikimedia.org/r/767100 (owner: 10Ottomata)
[15:06:59] <wikibugs>	 (03PS1) 10Ottomata: Set default Airflow concurrency limits for an- airflow instances [puppet] - 10https://gerrit.wikimedia.org/r/767527 (https://phabricator.wikimedia.org/T300870)
[15:10:15] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P21706 and previous config saved to /var/cache/conftool/dbconfig/20220302-151015-ladsgroup.json
[15:10:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:12:08] <wikibugs>	 (03CR) 10Ottomata: [V: 03+1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/34040/console" [puppet] - 10https://gerrit.wikimedia.org/r/767527 (https://phabricator.wikimedia.org/T300870) (owner: 10Ottomata)
[15:13:10] <wikibugs>	 (03PS1) 10Hnowlan: maps: enable slow query log in eqiad [puppet] - 10https://gerrit.wikimedia.org/r/767529 (https://phabricator.wikimedia.org/T302862)
[15:13:20] <wikibugs>	 (03CR) 10Ottomata: [V: 03+1 C: 03+2] Set default Airflow concurrency limits for an- airflow instances [puppet] - 10https://gerrit.wikimedia.org/r/767527 (https://phabricator.wikimedia.org/T300870) (owner: 10Ottomata)
[15:13:43] <wikibugs>	 (03PS1) 10Ssingh: icinga: add ssingh to cgi.cfg [puppet] - 10https://gerrit.wikimedia.org/r/767530
[15:14:22] <wikibugs>	 (03CR) 10Hnowlan: [V: 03+1] "PCC SUCCESS (NOOP 2 DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/34041/console" [puppet] - 10https://gerrit.wikimedia.org/r/767529 (https://phabricator.wikimedia.org/T302862) (owner: 10Hnowlan)
[15:17:07] <wikibugs>	 (03CR) 10Jgiannelos: [C: 03+1] maps: enable slow query log in eqiad [puppet] - 10https://gerrit.wikimedia.org/r/767529 (https://phabricator.wikimedia.org/T302862) (owner: 10Hnowlan)
[15:18:11] <wikibugs>	 (03CR) 10Hnowlan: [V: 03+1 C: 03+2] maps: enable slow query log in eqiad [puppet] - 10https://gerrit.wikimedia.org/r/767529 (https://phabricator.wikimedia.org/T302862) (owner: 10Hnowlan)
[15:18:30] <phuedx>	 o/ I'm looking to deploy a Beta-Cluster-only change. There are no deployments going on at the moment, right?
[15:18:49] <logmsgbot>	 !log vgutierrez@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on cp5014.eqsin.wmnet with reason: host reimage
[15:18:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:18:52] <phuedx>	 ^ urbanecm, Lucas_WMDE: You're marked as the deployers for the last window
[15:18:58] <taavi>	 I'm also around if needed
[15:19:04] <urbanecm>	 phuedx: yeah, go ahead
[15:19:12] <Lucas_WMDE>	 AFAIK we didn’t do anything during the window, but I saw something from Amir1 IIRC
[15:19:15] <Lucas_WMDE>	 (probably done by now)
[15:19:24] <Amir1>	 yeah, done
[15:19:29] <phuedx>	 Great. Thanks!
[15:19:49] <Lucas_WMDE>	 ok :)
[15:23:13] <wikibugs>	 (03CR) 10Vgutierrez: [C: 03+1] icinga: add ssingh to cgi.cfg [puppet] - 10https://gerrit.wikimedia.org/r/767530 (owner: 10Ssingh)
[15:23:31] <logmsgbot>	 !log vgutierrez@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5014.eqsin.wmnet with reason: host reimage
[15:23:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:24:01] <wikibugs>	 (03CR) 10Phuedx: [C: 03+2] Update Event Stream for IPInfo events [mediawiki-config] - 10https://gerrit.wikimedia.org/r/756635 (https://phabricator.wikimedia.org/T296415) (owner: 10AGueyte)
[15:24:30] <wikibugs>	 (03PS1) 10Jbond: pontoon: add profile::base::pontoon to list of classes [puppet] - 10https://gerrit.wikimedia.org/r/767533
[15:24:42] <wikibugs>	 (03Merged) 10jenkins-bot: Update Event Stream for IPInfo events [mediawiki-config] - 10https://gerrit.wikimedia.org/r/756635 (https://phabricator.wikimedia.org/T296415) (owner: 10AGueyte)
[15:25:01] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] pontoon: add profile::base::pontoon to list of classes [puppet] - 10https://gerrit.wikimedia.org/r/767533 (owner: 10Jbond)
[15:25:21] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P21707 and previous config saved to /var/cache/conftool/dbconfig/20220302-152519-ladsgroup.json
[15:25:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:26:47] <wikibugs>	 (03CR) 10WMDE-Fisch: "This change is ready for review." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/767498 (https://phabricator.wikimedia.org/T280024) (owner: 10WMDE-Fisch)
[15:26:54] <wikibugs>	 (03CR) 10WMDE-Fisch: "This change is ready for review." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/767499 (https://phabricator.wikimedia.org/T280023) (owner: 10WMDE-Fisch)
[15:27:02] <wikibugs>	 (03CR) 10WMDE-Fisch: "This change is ready for review." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/767508 (https://phabricator.wikimedia.org/T286990) (owner: 10WMDE-Fisch)
[15:27:10] <wikibugs>	 (03CR) 10WMDE-Fisch: "This change is ready for review." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/767510 (https://phabricator.wikimedia.org/T286990) (owner: 10WMDE-Fisch)
[15:27:19] <wikibugs>	 (03CR) 10WMDE-Fisch: "This change is ready for review." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/767512 (https://phabricator.wikimedia.org/T286991) (owner: 10WMDE-Fisch)
[15:27:46] <wikibugs>	 (03PS1) 10Muehlenhoff: envoy-hot-restart: Switch shebang to /usr/bin/python3 [puppet] - 10https://gerrit.wikimedia.org/r/767536
[15:28:43] <phuedx>	 The Beta Cluster config update Jenkins job has run
[15:28:50] <phuedx>	 I'll pull the change onto the deployment host
[15:28:58] <urbanecm>	 sounds good :)
[15:32:58] <wikibugs>	 (03CR) 10Jbond: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/767520 (https://phabricator.wikimedia.org/T135991) (owner: 10Muehlenhoff)
[15:35:50] <phuedx>	 Done :)
[15:40:15] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+2] Enable profile::auto_restarts::service for apache/pki discovery [puppet] - 10https://gerrit.wikimedia.org/r/767520 (https://phabricator.wikimedia.org/T135991) (owner: 10Muehlenhoff)
[15:40:26] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1143 (T300992)', diff saved to https://phabricator.wikimedia.org/P21708 and previous config saved to /var/cache/conftool/dbconfig/20220302-154026-ladsgroup.json
[15:40:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:40:28] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1121.eqiad.wmnet with reason: Maintenance
[15:40:30] <wikibugs>	 (03PS5) 10Bking: elastic: prevent rundir from deletion [puppet] - 10https://gerrit.wikimedia.org/r/766876 (https://phabricator.wikimedia.org/T276198)
[15:40:30] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1121.eqiad.wmnet with reason: Maintenance
[15:40:30] <stashbot>	 T300992: Add linter_template and linter_tag columns to the Linter table - https://phabricator.wikimedia.org/T300992
[15:40:31] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
[15:40:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:40:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:40:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:40:35] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
[15:40:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:40:39] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1121 (T300992)', diff saved to https://phabricator.wikimedia.org/P21709 and previous config saved to /var/cache/conftool/dbconfig/20220302-154039-ladsgroup.json
[15:40:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:40:49] <wikibugs>	 (03CR) 10Bking: elastic: prevent rundir from deletion (034 comments) [puppet] - 10https://gerrit.wikimedia.org/r/766876 (https://phabricator.wikimedia.org/T276198) (owner: 10Bking)
[15:41:20] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] elastic: prevent rundir from deletion [puppet] - 10https://gerrit.wikimedia.org/r/766876 (https://phabricator.wikimedia.org/T276198) (owner: 10Bking)
[15:41:54] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
[15:41:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:43:58] <wikibugs>	 (03PS6) 10Bking: elastic: prevent rundir from deletion [puppet] - 10https://gerrit.wikimedia.org/r/766876 (https://phabricator.wikimedia.org/T276198)
[15:45:34] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
[15:45:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:47:06] <vgutierrez>	 !log pool cp5014 running HAProxy as TLS termination layer - T290005 T271421
[15:47:07] <wikibugs>	 (03PS1) 10Jbond: O:idp: correctly escape regex dot in service urls [puppet] - 10https://gerrit.wikimedia.org/r/767540
[15:47:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:47:10] <stashbot>	 T290005: Test haproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T290005
[15:47:10] <stashbot>	 T271421: Test envoyproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T271421
[15:48:08] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1121 (T300992)', diff saved to https://phabricator.wikimedia.org/P21710 and previous config saved to /var/cache/conftool/dbconfig/20220302-154807-ladsgroup.json
[15:48:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:48:10] <stashbot>	 T300992: Add linter_template and linter_tag columns to the Linter table - https://phabricator.wikimedia.org/T300992
[15:49:25] <wikibugs>	 (03CR) 10Jbond: [C: 03+2] O:idp: correctly escape regex dot in service urls [puppet] - 10https://gerrit.wikimedia.org/r/767540 (owner: 10Jbond)
[15:49:32] <logmsgbot>	 !log vgutierrez@cumin1001 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5014.eqsin.wmnet with OS buster
[15:49:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:49:44] <wikibugs>	 10SRE, 10Traffic, 10Patch-For-Review, 10Performance-Team (Radar): Test haproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T290005 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by vgutierrez@cumin1001 for host cp5014.eqsin.wmnet with OS buster c...
[15:52:10] <wikibugs>	 (03PS1) 10Vgutierrez: site: Reimage cp3061 as cache::upload_haproxy [puppet] - 10https://gerrit.wikimedia.org/r/767542 (https://phabricator.wikimedia.org/T290005)
[15:52:49] <wikibugs>	 (03CR) 10Jbond: "this fixed WMF-01-015" [puppet] - 10https://gerrit.wikimedia.org/r/767540 (owner: 10Jbond)
[15:55:14] <wikibugs>	 (03CR) 10Vgutierrez: [C: 03+2] site: Reimage cp3061 as cache::upload_haproxy [puppet] - 10https://gerrit.wikimedia.org/r/767542 (https://phabricator.wikimedia.org/T290005) (owner: 10Vgutierrez)
[15:56:31] <logmsgbot>	 !log vgutierrez@cumin1001 START - Cookbook sre.hosts.reimage for host cp3061.esams.wmnet with OS buster
[15:56:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:56:43] <wikibugs>	 10SRE, 10Traffic, 10Patch-For-Review, 10Performance-Team (Radar): Test haproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T290005 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by vgutierrez@cumin1001 for host cp3061.esams.wmnet with OS buster
[16:03:13] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P21711 and previous config saved to /var/cache/conftool/dbconfig/20220302-160312-ladsgroup.json
[16:03:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:06:39] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase2019 is CRITICAL: /en.wikipedia.org/v1/page/talk/{title} (Get structured talk page for enwiki Salt article) is CRITICAL: Test Get structured talk page for enwiki Salt article returned the unexpected status 503 (expecting: 200) https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[16:08:27] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase2019 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[16:16:31] <wikibugs>	 (03CR) 10Dzahn: [C: 03+2] deployment-prep: change the mwapi host back to mediawiki11 [puppet] - 10https://gerrit.wikimedia.org/r/767492 (https://phabricator.wikimedia.org/T302699) (owner: 10Zabe)
[16:18:17] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P21713 and previous config saved to /var/cache/conftool/dbconfig/20220302-161817-ladsgroup.json
[16:18:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:24:16] <logmsgbot>	 !log vgutierrez@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on cp3061.esams.wmnet with reason: host reimage
[16:24:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:25:47] <icinga-wm>	 PROBLEM - Check systemd state on mwmaint1002 is CRITICAL: CRITICAL - degraded: The following units failed: mediawiki_job_wikidata-updateQueryServiceLag.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[16:27:36] <logmsgbot>	 !log vgutierrez@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3061.esams.wmnet with reason: host reimage
[16:27:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:28:06] <wikibugs>	 (03CR) 10RLazarus: [C: 03+1] add link to status page (031 comment) [software/klaxon] - 10https://gerrit.wikimedia.org/r/766839 (owner: 10CDanis)
[16:30:49] <wikibugs>	 (03CR) 10RLazarus: [C: 03+2] kubernetes: Upgrade default envoy version to 1.15.5 [puppet] - 10https://gerrit.wikimedia.org/r/766840 (https://phabricator.wikimedia.org/T300324) (owner: 10RLazarus)
[16:33:22] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1121 (T300992)', diff saved to https://phabricator.wikimedia.org/P21714 and previous config saved to /var/cache/conftool/dbconfig/20220302-163322-ladsgroup.json
[16:33:23] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1160.eqiad.wmnet with reason: Maintenance
[16:33:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:33:25] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1160.eqiad.wmnet with reason: Maintenance
[16:33:28] <stashbot>	 T300992: Add linter_template and linter_tag columns to the Linter table - https://phabricator.wikimedia.org/T300992
[16:33:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:33:30] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1160 (T300992)', diff saved to https://phabricator.wikimedia.org/P21715 and previous config saved to /var/cache/conftool/dbconfig/20220302-163329-ladsgroup.json
[16:33:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:33:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:36:10] <icinga-wm>	 RECOVERY - Check systemd state on mwmaint1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[16:45:03] <wikibugs>	 (03CR) 10Muehlenhoff: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/766876 (https://phabricator.wikimedia.org/T276198) (owner: 10Bking)
[16:45:50] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1160 (T300992)', diff saved to https://phabricator.wikimedia.org/P21716 and previous config saved to /var/cache/conftool/dbconfig/20220302-164550-ladsgroup.json
[16:45:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:45:54] <stashbot>	 T300992: Add linter_template and linter_tag columns to the Linter table - https://phabricator.wikimedia.org/T300992
[16:50:40] <logmsgbot>	 !log vgutierrez@cumin1001 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3061.esams.wmnet with OS buster
[16:50:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:50:51] <wikibugs>	 10SRE, 10Traffic, 10Patch-For-Review, 10Performance-Team (Radar): Test haproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T290005 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by vgutierrez@cumin1001 for host cp3061.esams.wmnet with OS buster c...
[16:51:35] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+1] "Looks good to me." [puppet] - 10https://gerrit.wikimedia.org/r/766876 (https://phabricator.wikimedia.org/T276198) (owner: 10Bking)
[16:51:42] <vgutierrez>	 !log pool cp3061 running HAProxy as TLS termination layer - T290005 T271421
[16:51:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:51:45] <stashbot>	 T290005: Test haproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T290005
[16:51:46] <stashbot>	 T271421: Test envoyproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T271421
[16:53:47] <wikibugs>	 (03PS1) 10Ladsgroup: auto_schema: Add support for --check in running schema changes [software] - 10https://gerrit.wikimedia.org/r/767554 (https://phabricator.wikimedia.org/T301896)
[16:54:30] <wikibugs>	 10SRE, 10Traffic, 10Patch-For-Review, 10Performance-Team (Radar): Test haproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T290005 (10Vgutierrez)
[17:00:55] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P21717 and previous config saved to /var/cache/conftool/dbconfig/20220302-170055-ladsgroup.json
[17:00:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:01:51] <wikibugs>	 (03PS2) 10Ladsgroup: auto_schema: Add support for --check in running schema changes [software] - 10https://gerrit.wikimedia.org/r/767554 (https://phabricator.wikimedia.org/T301896)
[17:02:45] <icinga-wm>	 PROBLEM - SSH on db2090.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[17:11:51] <icinga-wm>	 PROBLEM - Router interfaces on cr1-codfw is CRITICAL: CRITICAL: host 208.80.153.192, interfaces up: 131, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[17:13:12] <wikibugs>	 (03PS1) 10STran: Revert "Update Event Stream for IPInfo events" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/767101
[17:13:48] <wikibugs>	 (03CR) 10Tchanders: [C: 03+1] Revert "Update Event Stream for IPInfo events" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/767101 (owner: 10STran)
[17:16:00] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P21718 and previous config saved to /var/cache/conftool/dbconfig/20220302-171559-ladsgroup.json
[17:16:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:19:20] <wikibugs>	 (03CR) 10Phuedx: Update Event Stream for IPInfo events (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/756635 (https://phabricator.wikimedia.org/T296415) (owner: 10AGueyte)
[17:21:25] <icinga-wm>	 PROBLEM - Router interfaces on cr1-eqiad is CRITICAL: CRITICAL: host 208.80.154.196, interfaces up: 241, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[17:22:12] <wikibugs>	 (03PS2) 10CDanis: add link to status page [software/klaxon] - 10https://gerrit.wikimedia.org/r/766839
[17:22:49] <wikibugs>	 (03CR) 10CDanis: [C: 03+2] add link to status page [software/klaxon] - 10https://gerrit.wikimedia.org/r/766839 (owner: 10CDanis)
[17:23:52] <wikibugs>	 (03Merged) 10jenkins-bot: add link to status page [software/klaxon] - 10https://gerrit.wikimedia.org/r/766839 (owner: 10CDanis)
[17:27:22] <phuedx>	 https://gerrit.wikimedia.org/r/756635 accidentally overrode the event streams configuration //for the Beta Cluster only//. I merged it and so accept responsibility. The revert is about to be merged and the deployment host updated
[17:30:06] <wikibugs>	 (03PS39) 10Jbond: reposync: add new class to manage syncing repositories [software/spicerack] - 10https://gerrit.wikimedia.org/r/747116 (https://phabricator.wikimedia.org/T229397)
[17:30:09] <wikibugs>	 (03CR) 10Jbond: "done thanks" [software/spicerack] - 10https://gerrit.wikimedia.org/r/747116 (https://phabricator.wikimedia.org/T229397) (owner: 10Jbond)
[17:31:03] <wikibugs>	 (03CR) 10Tchanders: [C: 03+2] Revert "Update Event Stream for IPInfo events" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/767101 (owner: 10STran)
[17:31:04] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1160 (T300992)', diff saved to https://phabricator.wikimedia.org/P21719 and previous config saved to /var/cache/conftool/dbconfig/20220302-173104-ladsgroup.json
[17:31:06] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1149.eqiad.wmnet with reason: Maintenance
[17:31:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:31:07] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1149.eqiad.wmnet with reason: Maintenance
[17:31:08] <stashbot>	 T300992: Add linter_template and linter_tag columns to the Linter table - https://phabricator.wikimedia.org/T300992
[17:31:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:31:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:31:12] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1149 (T300992)', diff saved to https://phabricator.wikimedia.org/P21720 and previous config saved to /var/cache/conftool/dbconfig/20220302-173112-ladsgroup.json
[17:31:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:32:01] <wikibugs>	 (03Merged) 10jenkins-bot: Revert "Update Event Stream for IPInfo events" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/767101 (owner: 10STran)
[17:34:43] <RhinosF1>	 Who's syncing that config patch?
[17:34:54] <RhinosF1>	 phuedx: !
[17:35:03] <RhinosF1>	 Oh it's labs only
[17:35:13] <phuedx>	 RhinosF1: It's labs only so I presumed no sync
[17:35:27] <AntiComposite>	 unfortunately no stickers will be awarded for breaking and fixing beta
[17:35:34] <RhinosF1>	 phuedx: I missed the -labs
[17:35:54] * bd808 can make stickers if folks will fix beta ;)
[17:36:31] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1149 (T300992)', diff saved to https://phabricator.wikimedia.org/P21721 and previous config saved to /var/cache/conftool/dbconfig/20220302-173631-ladsgroup.json
[17:36:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:36:34] <stashbot>	 T300992: Add linter_template and linter_tag columns to the Linter table - https://phabricator.wikimedia.org/T300992
[17:37:05] <taavi>	 bd808: I've fixed it multiple times :-P
[17:38:13] <wikibugs>	 (03CR) 10Jbond: [C: 03+2] sre.puppet.sync-netbox-hiera: Cookbook for syncing netbox puppet data [cookbooks] - 10https://gerrit.wikimedia.org/r/739234 (https://phabricator.wikimedia.org/T229397) (owner: 10Jbond)
[17:38:20] <bd808>	 taavi: indeed! And that has been much appreciated by me. At this point I think you deserve a "real" 'I broke Wikipedia...' sticker 
[17:38:30] <wikibugs>	 (03PS1) 10Tchanders: Define IPInfo event stream on Beta [mediawiki-config] - 10https://gerrit.wikimedia.org/r/767558 (https://phabricator.wikimedia.org/T296415)
[17:38:30] <RhinosF1>	 phuedx: does anything need to be done? Won't /srv/mediawiki-staging be outdated?
[17:38:48] <RhinosF1>	 Oh, you said you're doing it
[17:38:59] <RhinosF1>	 I guess I should go back to cooking
[17:39:14] <wikibugs>	 (03CR) 10Paladox: [C: 03+1] gerrit: use raw subject for Phabricator comments [puppet] - 10https://gerrit.wikimedia.org/r/767521 (https://phabricator.wikimedia.org/T280197) (owner: 10Hashar)
[17:39:17] * taavi gets hopeful for an in-person hackathon one day
[17:40:05] <wikibugs>	 (03CR) 10Volans: [C: 03+1] "Ship it!" [software/spicerack] - 10https://gerrit.wikimedia.org/r/747116 (https://phabricator.wikimedia.org/T229397) (owner: 10Jbond)
[17:40:14] <phuedx>	 RhinosF1: Just to confirm: Tran has updated the deployment host
[17:40:43] <phuedx>	 bd808: About those stickers... ;)
[17:40:48] <RhinosF1>	 phuedx: good
[17:42:20] <wikibugs>	 (03CR) 10Jbond: [C: 03+2] reposync: add new class to manage syncing repositories [software/spicerack] - 10https://gerrit.wikimedia.org/r/747116 (https://phabricator.wikimedia.org/T229397) (owner: 10Jbond)
[17:42:43] <wikibugs>	 (03CR) 10TsepoThoabala: [C: 03+1] Define IPInfo event stream on Beta [mediawiki-config] - 10https://gerrit.wikimedia.org/r/767558 (https://phabricator.wikimedia.org/T296415) (owner: 10Tchanders)
[17:42:55] <jinxer-wm>	 (JobUnavailable) firing: Reduced availability for job jmx_wdqs_updater in eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org
[17:49:19] <wikibugs>	 (03CR) 10STran: [C: 03+2] Define IPInfo event stream on Beta [mediawiki-config] - 10https://gerrit.wikimedia.org/r/767558 (https://phabricator.wikimedia.org/T296415) (owner: 10Tchanders)
[17:49:37] <Tran>	 We reverted a config patch and are now deploying the correct patch to beta https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/767558
[17:49:59] <wikibugs>	 (03Merged) 10jenkins-bot: Define IPInfo event stream on Beta [mediawiki-config] - 10https://gerrit.wikimedia.org/r/767558 (https://phabricator.wikimedia.org/T296415) (owner: 10Tchanders)
[17:51:36] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P21722 and previous config saved to /var/cache/conftool/dbconfig/20220302-175136-ladsgroup.json
[17:51:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:55:01] <icinga-wm>	 PROBLEM - SSH on kubernetes2001.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[18:04:04] <wikibugs>	 (03CR) 10Dzahn: "thank you for merging" [puppet] - 10https://gerrit.wikimedia.org/r/762897 (owner: 10Majavah)
[18:04:25] <icinga-wm>	 RECOVERY - SSH on db2090.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[18:06:19] <wikibugs>	 (03CR) 10Dzahn: [C: 03+2] gerrit: use raw subject for Phabricator comments [puppet] - 10https://gerrit.wikimedia.org/r/767521 (https://phabricator.wikimedia.org/T280197) (owner: 10Hashar)
[18:06:37] <hashar>	 mutante: hopefully that one will not break too many things. Thx!
[18:06:41] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P21723 and previous config saved to /var/cache/conftool/dbconfig/20220302-180640-ladsgroup.json
[18:06:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:06:56] <hashar>	 worst case, some phab comments look slightly off
[18:07:08] <mutante>	 hashar: ACK, let's test the way you described it, by uploading a patch containing double quotes
[18:07:24] <mutante>	 it's been applied ..now.
[18:07:31] <hashar>	 I might have one in test/gerrit-ping
[18:07:41] <hashar>	 but I gotta focus on my current meeting, will test later this evening :]
[18:08:30] <wikibugs>	 (03PS1) 10Dzahn: ""double quotes"" are 'fun' "fun" ''fun'' \fun [puppet] - 10https://gerrit.wikimedia.org/r/767560 (https://phabricator.wikimedia.org/T281552)
[18:09:09] <logmsgbot>	 !log rzl@deploy1002 helmfile [staging] START helmfile.d/services/apertium: apply
[18:09:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:09:23] <logmsgbot>	 !log rzl@deploy1002 helmfile [staging] DONE helmfile.d/services/apertium: apply
[18:09:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:09:24] <logmsgbot>	 !log rzl@deploy1002 helmfile [staging] START helmfile.d/services/blubberoid: apply
[18:09:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:09:42] <logmsgbot>	 !log rzl@deploy1002 helmfile [staging] DONE helmfile.d/services/blubberoid: apply
[18:09:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:09:43] <logmsgbot>	 !log rzl@deploy1002 helmfile [staging] START helmfile.d/services/cxserver: apply
[18:09:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:10:00] <logmsgbot>	 !log rzl@deploy1002 helmfile [staging] DONE helmfile.d/services/cxserver: apply
[18:10:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:10:01] <logmsgbot>	 !log rzl@deploy1002 helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply
[18:10:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:10:27] <logmsgbot>	 !log rzl@deploy1002 helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply
[18:10:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:10:28] <logmsgbot>	 !log rzl@deploy1002 helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
[18:10:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:10:53] <logmsgbot>	 !log rzl@deploy1002 helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply
[18:10:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:10:54] <logmsgbot>	 !log rzl@deploy1002 helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply
[18:10:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:11:18] <logmsgbot>	 !log rzl@deploy1002 helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply
[18:11:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:11:20] <logmsgbot>	 !log rzl@deploy1002 helmfile [staging] START helmfile.d/services/eventgate-main: apply
[18:11:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:11:43] <logmsgbot>	 !log rzl@deploy1002 helmfile [staging] DONE helmfile.d/services/eventgate-main: apply
[18:11:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:11:44] <logmsgbot>	 !log rzl@deploy1002 helmfile [staging] START helmfile.d/services/eventstreams: apply
[18:11:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:12:07] <logmsgbot>	 !log rzl@deploy1002 helmfile [staging] DONE helmfile.d/services/eventstreams: apply
[18:12:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:12:08] <logmsgbot>	 !log rzl@deploy1002 helmfile [staging] START helmfile.d/services/eventstreams-internal: apply
[18:12:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:12:33] <logmsgbot>	 !log rzl@deploy1002 helmfile [staging] DONE helmfile.d/services/eventstreams-internal: apply
[18:12:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:12:34] <logmsgbot>	 !log rzl@deploy1002 helmfile [staging] START helmfile.d/services/linkrecommendation: apply
[18:12:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:12:51] <logmsgbot>	 !log rzl@deploy1002 helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply
[18:12:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:12:52] <logmsgbot>	 !log rzl@deploy1002 helmfile [staging] START helmfile.d/services/mathoid: apply
[18:12:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:13:11] <logmsgbot>	 !log rzl@deploy1002 helmfile [staging] DONE helmfile.d/services/mathoid: apply
[18:13:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:13:12] <logmsgbot>	 !log rzl@deploy1002 helmfile [staging] START helmfile.d/services/proton: apply
[18:13:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:13:31] <logmsgbot>	 !log rzl@deploy1002 helmfile [staging] DONE helmfile.d/services/proton: apply
[18:13:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:13:32] <logmsgbot>	 !log rzl@deploy1002 helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
[18:13:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:14:24] <wikibugs>	 (03CR) 10Dzahn: "looks good here https://phabricator.wikimedia.org/T281552#7748235" [puppet] - 10https://gerrit.wikimedia.org/r/767560 (https://phabricator.wikimedia.org/T281552) (owner: 10Dzahn)
[18:14:32] <logmsgbot>	 !log rzl@deploy1002 helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
[18:14:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:14:33] <logmsgbot>	 !log rzl@deploy1002 helmfile [staging] START helmfile.d/services/shellbox-media: apply
[18:14:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:14:35] <wikibugs>	 (03CR) 10Dzahn: "https://phabricator.wikimedia.org/T281552#7748235" [puppet] - 10https://gerrit.wikimedia.org/r/767521 (https://phabricator.wikimedia.org/T280197) (owner: 10Hashar)
[18:14:48] <wikibugs>	 (03Abandoned) 10Dzahn: ""double quotes"" are 'fun' "fun" ''fun'' \fun [puppet] - 10https://gerrit.wikimedia.org/r/767560 (https://phabricator.wikimedia.org/T281552) (owner: 10Dzahn)
[18:14:53] <logmsgbot>	 !log rzl@deploy1002 helmfile [staging] DONE helmfile.d/services/shellbox-media: apply
[18:14:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:15:15] <hashar>	 mutante: ahh thanks for the test. That looks correct :]
[18:15:48] <mutante>	 hashar: :) yep, thanks for confirming
[18:16:01] <wikibugs>	 10SRE, 10Discovery: Test network optimizations in RELForge - https://phabricator.wikimedia.org/T301683 (10bking) 05Open→03Declined
[18:16:08] <wikibugs>	 10SRE, 10Discovery, 10Infrastructure-Foundations, 10netops: Speed up network connections for Elastic hosts - https://phabricator.wikimedia.org/T301577 (10bking)
[18:16:15] <hashar>	 I am marking the task solved again
[18:16:28] <mutante>	 +1
[18:16:31] <wikibugs>	 10SRE, 10Discovery: Test network optimizations in RELForge - https://phabricator.wikimedia.org/T301683 (10bking) Closing for now, will revisit when we have more concrete goals
[18:21:46] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1149 (T300992)', diff saved to https://phabricator.wikimedia.org/P21724 and previous config saved to /var/cache/conftool/dbconfig/20220302-182145-ladsgroup.json
[18:21:47] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1148.eqiad.wmnet with reason: Maintenance
[18:21:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:21:48] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1148.eqiad.wmnet with reason: Maintenance
[18:21:49] <stashbot>	 T300992: Add linter_template and linter_tag columns to the Linter table - https://phabricator.wikimedia.org/T300992
[18:21:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:21:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:21:53] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1148 (T300992)', diff saved to https://phabricator.wikimedia.org/P21725 and previous config saved to /var/cache/conftool/dbconfig/20220302-182153-ladsgroup.json
[18:21:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:26:11] <logmsgbot>	 !log cmooney@cumin1001 START - Cookbook sre.dns.netbox
[18:26:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:27:58] <wikibugs>	 (03PS1) 10Cathal Mooney: Adding includes for Netbox-generated zone files for eqiad evpn lb [dns] - 10https://gerrit.wikimedia.org/r/767562 (https://phabricator.wikimedia.org/T299758)
[18:28:10] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1148 (T300992)', diff saved to https://phabricator.wikimedia.org/P21726 and previous config saved to /var/cache/conftool/dbconfig/20220302-182809-ladsgroup.json
[18:28:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:28:14] <stashbot>	 T300992: Add linter_template and linter_tag columns to the Linter table - https://phabricator.wikimedia.org/T300992
[18:28:34] <wikibugs>	 (PS2) MewOphaswongse: GLAM event: Update wgGECampaigns and wgGECampaignTopics [mediawiki-config] - https://gerrit.wikimedia.org/r/766869 (https://phabricator.wikimedia.org/T301029)
[18:28:48] <wikibugs>	 (CR) jerkins-bot: [V: -1] Adding includes for Netbox-generated zone files for eqiad evpn lb [dns] - https://gerrit.wikimedia.org/r/767562 (https://phabricator.wikimedia.org/T299758) (owner: Cathal Mooney)
[18:30:30] <logmsgbot>	 !log cmooney@cumin1001 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[18:30:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:32:49] <wikibugs>	 (PS2) Cathal Mooney: Adding includes for Netbox-generated zone files for eqiad evpn lb [dns] - https://gerrit.wikimedia.org/r/767562 (https://phabricator.wikimedia.org/T299758)
[18:42:55] <jinxer-wm>	 (ProbeHttpFailed) firing: (2) URL did not return HTTP 2xx or 3xx response (or probe/connection failed) - https://wikitech.wikimedia.org/wiki/Prometheus#Watchrat_Non-23xx_HTTP_response - https://grafana.wikimedia.org/d/GYciEga7z/watchrat - https://alerts.wikimedia.org
[18:43:15] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P21727 and previous config saved to /var/cache/conftool/dbconfig/20220302-184314-ladsgroup.json
[18:43:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:45:01] <wikibugs>	 SRE, ops-eqiad, DC-Ops, Dumps-Generation: Q3:(Need By: TBD) rack/setup/install dumpsdata100[67] - https://phabricator.wikimedia.org/T299443 (Jclark-ctr)
[18:45:55] <wikibugs>	 SRE, ops-eqiad, DC-Ops, Discovery-Search (Current work): Q3:(Need By: TBD) rack/setup/install elastic1089-1102 - https://phabricator.wikimedia.org/T299609 (Jclark-ctr)
[18:46:42] <wikibugs>	 SRE, ops-eqiad, DC-Ops, serviceops: Q3:(Need By: TBD) rack/setup/install parse100[01-24] - https://phabricator.wikimedia.org/T299573 (Jclark-ctr)
[18:47:57] <wikibugs>	 SRE, ops-eqiad, DC-Ops, Dumps-Generation: Q3:(Need By: TBD) rack/setup/install dumpsdata100[67] - https://phabricator.wikimedia.org/T299443 (Jclark-ctr) a:Jclark-ctr→RobH
[18:48:23] <wikibugs>	 SRE, ops-eqiad, DC-Ops, Discovery-Search (Current work): Q3:(Need By: TBD) rack/setup/install elastic1089-1102 - https://phabricator.wikimedia.org/T299609 (Jclark-ctr) a:Jclark-ctr→Cmjohnson
[18:49:28] <wikibugs>	 SRE, ops-eqiad, DC-Ops, serviceops: Q3:(Need By: TBD) rack/setup/install parse100[01-24] - https://phabricator.wikimedia.org/T299573 (Jclark-ctr) a:Jclark-ctr→Cmjohnson
[18:50:07] <wikibugs>	 SRE, ops-eqiad, DC-Ops, Machine-Learning-Team: Q3:(Need By: TBD) rack/setup/install ml-cache100[1-3] - https://phabricator.wikimedia.org/T299435 (Jclark-ctr) a:Jclark-ctr→Cmjohnson
[18:56:33] <icinga-wm>	 RECOVERY - SSH on kubernetes2001.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[18:58:19] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P21728 and previous config saved to /var/cache/conftool/dbconfig/20220302-185819-ladsgroup.json
[18:58:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:00:04] <jouncebot>	 brennen and dduvall: It is that lovely time of the day again! You are hereby commanded to deploy Train log triage with CPT. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220302T1900).
[19:00:05] <jouncebot>	 brennen and dduvall: #bothumor I � Unicode. All rise for MediaWiki train - Utc-7 Version deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220302T1900).
[19:01:52] <brennen>	 o/
[19:03:54] <brennen>	 (in a meeting, rolling forward shortly)
[19:08:56] <dduvall>	 brennen: o/ howdy
[19:10:20] <brennen>	 !log 1.38.0-wmf.24 train (T300200): no current blockers; proceeding to group1
[19:10:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:10:23] <stashbot>	 T300200: 1.38.0-wmf.24 deployment blockers - https://phabricator.wikimedia.org/T300200
[19:10:42] <wikibugs>	 SRE, SRE-swift-storage, ops-eqiad, DC-Ops: Q3:(Need By: TBD) rack/setup/install ms-be10[68-71] - https://phabricator.wikimedia.org/T299462 (Jclark-ctr)
[19:11:32] <wikibugs>	 (PS1) Brennen Bearnes: group1 wikis to 1.38.0-wmf.24  refs T300200 [mediawiki-config] - https://gerrit.wikimedia.org/r/767569
[19:11:34] <wikibugs>	 (CR) Brennen Bearnes: [C: +2] group1 wikis to 1.38.0-wmf.24  refs T300200 [mediawiki-config] - https://gerrit.wikimedia.org/r/767569 (owner: Brennen Bearnes)
[19:12:18] <wikibugs>	 (Merged) jenkins-bot: group1 wikis to 1.38.0-wmf.24  refs T300200 [mediawiki-config] - https://gerrit.wikimedia.org/r/767569 (owner: Brennen Bearnes)
[19:13:24] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1148 (T300992)', diff saved to https://phabricator.wikimedia.org/P21729 and previous config saved to /var/cache/conftool/dbconfig/20220302-191323-ladsgroup.json
[19:13:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:13:27] <stashbot>	 T300992: Add linter_template and linter_tag columns to the Linter table - https://phabricator.wikimedia.org/T300992
[19:13:45] <logmsgbot>	 !log brennen@deploy1002 rebuilt and synchronized wikiversions files: group1 wikis to 1.38.0-wmf.24  refs T300200
[19:13:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:14:36] <logmsgbot>	 !log brennen@deploy1002 Synchronized php: group1 wikis to 1.38.0-wmf.24  refs T300200 (duration: 00m 50s)
[19:14:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:14:56] <wikibugs>	 SRE, DC-Ops: Confirm support of PERC 750 raid controller - https://phabricator.wikimedia.org/T297913 (Jclark-ctr)
[19:20:05] <wikibugs>	 (PS1) Cathal Mooney: Add site variable for EVPN overlay loopback subnets and CR filter [homer/public] - https://gerrit.wikimedia.org/r/767570 (https://phabricator.wikimedia.org/T299758)
[19:20:39] <wikibugs>	 (CR) jerkins-bot: [V: -1] Add site variable for EVPN overlay loopback subnets and CR filter [homer/public] - https://gerrit.wikimedia.org/r/767570 (https://phabricator.wikimedia.org/T299758) (owner: Cathal Mooney)
[19:22:28] <dduvall>	 brennen: i'm seeing a new error `Class 'ApiFeatureUsageQueryEngineElastica' not found`
[19:22:35] <dduvall>	 i'll file a task
[19:22:43] <wikibugs>	 (PS2) Cathal Mooney: Add site variable for EVPN overlay loopback subnets and CR filter [homer/public] - https://gerrit.wikimedia.org/r/767570 (https://phabricator.wikimedia.org/T299758)
[19:23:18] <wikibugs>	 (CR) jerkins-bot: [V: -1] Add site variable for EVPN overlay loopback subnets and CR filter [homer/public] - https://gerrit.wikimedia.org/r/767570 (https://phabricator.wikimedia.org/T299758) (owner: Cathal Mooney)
[19:24:00] <brennen>	 dduvall: just filed
[19:24:05] <brennen>	 sorry, missed the ping a second ago
[19:24:27] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on deploy2002 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:24:28] <dduvall>	 ah ok
[19:24:32] <brennen>	 T302907 - worth a rollback, you think?  fairly low level but it's ticking upwards.
[19:24:33] <stashbot>	 T302907: Error: Class 'ApiFeatureUsageQueryEngineElastica' not found - https://phabricator.wikimedia.org/T302907
[19:25:05] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw2318 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:25:11] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw1379 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:25:11] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on snapshot1008 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:25:12] <dancy>	 hmm.. that stuff again!
[19:25:21] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw1339 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:25:21] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mwdebug1001 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:25:21] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on wtp1038 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:25:21] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw2295 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:25:21] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw2321 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:25:21] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw2388 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:25:21] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw2366 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:25:22] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw2389 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:25:22] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw2258 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:25:23] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw2261 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:25:31] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on labweb1001 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:25:33] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw1382 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:25:33] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw1380 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:25:36] <dancy>	 Should clear faster this time. :-/
[19:25:39] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on snapshot1012 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:25:39] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw1448 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:25:39] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw1418 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:25:41] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on wtp1026 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:25:47] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw1396 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:25:47] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw1407 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:25:47] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw2259 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:25:48] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw2289 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:25:51] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw1323 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:25:53] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw2358 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:25:55] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw1427 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:25:55] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw1431 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:25:57] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw2411 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:25:57] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on parse2004 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:25:57] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw2323 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:25:58] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw2351 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:25:58] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw2352 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:25:58] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw2273 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:26:03] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on wtp1025 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:26:03] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw2378 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:26:03] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw2402 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:26:05] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw1415 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:26:05] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on parse2009 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:26:11] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw2296 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:26:11] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw2300 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:26:11] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw2262 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:26:13] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw2333 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:26:15] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw2399 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:26:23] <taavi>	 brennen: I'll have a fix for that in a few moments
[19:26:41] <brennen>	 taavi: cool, ty
[19:27:15] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw1385 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:27:15] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw1371 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:27:15] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw1332 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:27:21] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on parse2002 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:27:21] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw2255 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:27:22] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw2264 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:27:25] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw1317 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:27:35] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw1438 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:27:35] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw1454 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:27:35] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw1375 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:27:37] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw2252 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:27:43] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw2326 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:27:43] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw2304 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:27:49] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw1307 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:27:49] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw1369 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:27:49] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on wtp1029 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:27:49] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw2357 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:27:49] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw2383 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:27:55] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw1321 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:27:55] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on wtp1039 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:27:55] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on wtp1033 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:28:03] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw1408 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:28:03] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw1433 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:28:03] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw1393 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:28:03] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw1423 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:28:03] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw1424 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:28:03] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on wtp1036 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:28:04] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw1370 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:28:04] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw1334 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:28:07] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw1333 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:28:07] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw1318 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:28:08] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw1358 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:28:08] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw2327 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:28:08] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw2369 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:28:08] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw2338 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:28:08] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw2391 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:28:08] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw2410 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:28:09] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw2266 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:28:11] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw1443 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:28:11] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw1455 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:28:11] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw1377 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:28:11] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw1376 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:28:11] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw2301 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:28:12] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw2376 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:28:12] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw2269 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:28:15] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw1434 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:28:15] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw1450 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:28:15] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw1456 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:28:15] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw1439 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:28:15] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw1419 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:28:16] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw2294 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:28:16] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw2401 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:28:16] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on parse2007 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:28:17] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw2373 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:28:17] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on parse2010 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:28:18] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw2272 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:28:18] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw2288 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:28:19] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mwdebug2002 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:28:19] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw1446 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:28:20] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw1309 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:28:20] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw2395 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:28:21] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on wtp1041 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:28:21] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on wtp1040 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:28:22] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw2380 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:28:22] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw2406 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:28:23] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw2375 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:28:23] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw2387 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:28:24] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on parse2003 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:28:24] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on wtp1047 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:28:25] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw2291 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:28:25] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw2356 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:28:26] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw2283 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:28:31] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw2270 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:28:38] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on wtp1045 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:28:38] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw1304 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:28:43] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw1366 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:28:43] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw1337 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:28:43] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw2308 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:28:43] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw2271 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:28:43] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw2267 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:28:45] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw2355 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:28:45] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw2372 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:28:45] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw2398 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:28:45] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw2408 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:28:45] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw2409 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:28:47] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw1451 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:28:51] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw1322 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:28:53] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on parse2005 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:28:53] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw1414 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:28:53] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw1406 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:28:53] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw1357 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:28:53] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw2370 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:28:58] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw2400 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:29:01] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw2407 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:29:05] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on wtp1034 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:29:15] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw1417 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:29:15] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw1435 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:29:15] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw1312 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:29:17] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw1345 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:29:19] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw1308 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:29:19] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw1356 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:29:19] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw1342 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:29:23] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw1359 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:29:23] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw1413 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:29:23] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw1421 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:29:23] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw1437 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:29:27] <dancy>	 brennen:  Did the sync-apaches hang?
[19:29:27] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw1425 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:29:28] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw1373 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:29:28] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw1348 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:29:28] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw1441 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:29:29] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw2316 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:29:29] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw2319 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:29:29] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw2386 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:29:29] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw2279 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:29:31] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw1311 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:29:31] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw2354 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:29:31] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw2397 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:29:33] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw1326 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:29:33] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw1316 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:29:35] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw1368 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:29:35] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mwmaint2002 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:29:35] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw1395 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:29:35] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw1411 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:29:35] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw1399 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:29:35] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw1447 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:29:35] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw1403 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:29:36] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw1392 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:29:36] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw1372 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:29:37] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on snapshot1009 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:29:37] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw1453 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:29:38] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw1330 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:29:38] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw2381 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:29:39] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw2396 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:29:39] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw1388 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:29:40] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on wtp1028 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:29:41] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw1338 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:29:41] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on parse2011 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:29:41] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw1422 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:29:42] <hauskatze>	 o_O
[19:29:42] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw1440 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:29:43] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw1367 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:29:43] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on wtp1031 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:29:43] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw1306 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:29:44] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw2313 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:29:44] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw2309 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:29:45] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw2336 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:29:51] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw1432 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:29:51] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw2293 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:29:51] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw2292 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:29:51] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw2325 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:29:51] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw2297 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:29:51] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw2353 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:29:52] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw2310 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:29:52] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw2286 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:29:53] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw2284 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:29:59] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw1347 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:29:59] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on parse2001 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:30:01] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw1428 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:30:01] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on wtp1027 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:30:03] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw2374 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:30:03] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw2251 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:30:09] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw2298 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:30:09] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw2306 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:30:09] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw2299 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:30:11] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw1331 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:30:13] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw2339 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:30:13] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw2360 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:30:13] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw2359 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:30:15] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw1409 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:30:15] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw1436 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:30:15] <icinga-wm>	 PROBLEM - Ensure local MW versions match expected deployment on mw2382 is CRITICAL: CRITICAL: 528 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[19:30:20] <brennen>	 dancy: sync-apaches: 100% (in-flight: 0, ok: 347; fail: 0; left: 0)
[19:30:21] <mutante>	 !log stopped icinga-wm
[19:30:22] <brennen>	 19:14:29 Finished sync-apaches (duration: 00m 08s)
[19:30:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:30:23] <wikibugs>	 (PS3) Cathal Mooney: Add site variable for EVPN overlay loopback subnets and CR filter [homer/public] - https://gerrit.wikimedia.org/r/767570 (https://phabricator.wikimedia.org/T299758)
[19:30:29] <taavi>	 brennen: dduvall: https://gerrit.wikimedia.org/r/c/mediawiki/extensions/ApiFeatureUsage/+/767571/
[19:30:43] <dancy>	 thx mutante
[19:30:48] <mutante>	 it's different this time. not just the 3 test wikis but ALL 528 versions
[19:30:52] <wikibugs>	 (PS1) Majavah: Add a non-namespaced alias for ApiFeatureUsageQueryEngineElastica [extensions/ApiFeatureUsage] (wmf/1.38.0-wmf.24) - https://gerrit.wikimedia.org/r/767103 (https://phabricator.wikimedia.org/T302907)
[19:30:56] <wikibugs>	 (CR) jerkins-bot: [V: -1] Add site variable for EVPN overlay loopback subnets and CR filter [homer/public] - https://gerrit.wikimedia.org/r/767570 (https://phabricator.wikimedia.org/T299758) (owner: Cathal Mooney)
[19:30:58] <mutante>	 so there is a slightly different issue this time
[19:31:08] * dancy looks around
[19:31:35] <dancy>	 yeah, seems like wikiversions.json isn't being updated on targets.
[19:31:44] <RhinosF1>	 mutante: same issue, different set promoted
[19:31:57] <RhinosF1>	 Only 3 wikis changed version yesterday
[19:32:05] <mutante>	 RhinosF1: ACK
[19:32:20] <RhinosF1>	 All of group 1 just did
[19:32:37] <RhinosF1>	 I assume group1.dblist has 528 wikis in
[19:33:21] <logmsgbot>	 !log robh@cumin1001 START - Cookbook sre.dns.netbox
[19:33:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:35:06] <wikibugs>	 (PS4) Cathal Mooney: Add site variable for EVPN overlay loopback subnets and CR filter [homer/public] - https://gerrit.wikimedia.org/r/767570 (https://phabricator.wikimedia.org/T299758)
[19:35:30] <brennen>	 dancy: i think it's getting updated - i get 659 for `grep -c '[.]24' /srv/mediawiki/wikiversions.json` on mw1436, for example
[19:35:36] <wikibugs>	 (CR) jerkins-bot: [V: -1] Add site variable for EVPN overlay loopback subnets and CR filter [homer/public] - https://gerrit.wikimedia.org/r/767570 (https://phabricator.wikimedia.org/T299758) (owner: Cathal Mooney)
[19:36:10] <dancy>	 brennen: Ok.. that's good..    So it looks like deploy1002's /srv/mediawiki/ dir isn't being updated.  Looking into that.
[19:36:21] <brennen>	 right on, thanks
[19:36:23] <logmsgbot>	 !log robh@cumin1001 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[19:36:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:37:09] <wikibugs>	 (PS5) Cathal Mooney: Add site variable for EVPN overlay loopback subnets and CR filter [homer/public] - https://gerrit.wikimedia.org/r/767570 (https://phabricator.wikimedia.org/T299758)
[19:37:39] <wikibugs>	 (CR) jerkins-bot: [V: -1] Add site variable for EVPN overlay loopback subnets and CR filter [homer/public] - https://gerrit.wikimedia.org/r/767570 (https://phabricator.wikimedia.org/T299758) (owner: Cathal Mooney)
[19:38:10] <brennen>	 taavi: thanks for patch.  i'll verify the backport on an mwdebug and then sync.
[19:38:45] <wikibugs>	 (CR) Brennen Bearnes: [C: +2] Add a non-namespaced alias for ApiFeatureUsageQueryEngineElastica [extensions/ApiFeatureUsage] (wmf/1.38.0-wmf.24) - https://gerrit.wikimedia.org/r/767103 (https://phabricator.wikimedia.org/T302907) (owner: Majavah)
[19:40:41] <wikibugs>	 (Merged) jenkins-bot: Add a non-namespaced alias for ApiFeatureUsageQueryEngineElastica [extensions/ApiFeatureUsage] (wmf/1.38.0-wmf.24) - https://gerrit.wikimedia.org/r/767103 (https://phabricator.wikimedia.org/T302907) (owner: Majavah)
[19:43:34] <dancy>	 brennen: Bug located.  Working on packaging it up.
[19:44:54] <wikibugs>	 (PS1) Eigyan: wmf-config: Undeploy the fawiki test survey from production [mediawiki-config] - https://gerrit.wikimedia.org/r/767574 (https://phabricator.wikimedia.org/T300291)
[19:45:12] <logmsgbot>	 !log robh@cumin1001 START - Cookbook sre.hosts.provision for host dumpsdata1006.mgmt.eqiad.wmnet with reboot policy FORCED
[19:45:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:46:27] <logmsgbot>	 !log brennen@deploy1002 Synchronized php-1.38.0-wmf.24/extensions/ApiFeatureUsage: Backport: [[gerrit:767103|Add a non-namespaced alias for ApiFeatureUsageQueryEngineElastica (T302907)]] (duration: 00m 50s)
[19:46:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:46:30] <stashbot>	 T302907: Error: Class 'ApiFeatureUsageQueryEngineElastica' not found - https://phabricator.wikimedia.org/T302907
[19:47:34] <logmsgbot>	 !log robh@cumin1001 END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dumpsdata1006.mgmt.eqiad.wmnet with reboot policy FORCED
[19:47:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:49:12] <mutante>	 dancy: should I manually copy wikiversions.json on deploy1002 maybe?
[19:49:34] <mutante>	 seeing fix now
[19:49:38] <dancy>	 No thank you.  I have a fix to test in a bit.
[19:50:56] <mutante>	 I hit +2 on that one. But you would have to deploy again, right?
[19:50:59] <wikibugs>	 (CR) Eigyan: "recheck" [mediawiki-config] - https://gerrit.wikimedia.org/r/767574 (https://phabricator.wikimedia.org/T300291) (owner: Eigyan)
[19:51:09] <mutante>	 or we can scap pull manually on deploy1002 now
[19:51:48] <dancy>	 Re-running `scap sync-wikiversions` (with the updated scap code) should do the trick.  I asked Brennen to run it when he has a moment.
[19:52:41] <brennen>	 dancy, mutante: running now
[19:52:46] <mutante>	 ok, thanks all!
[19:53:31] <brennen>	 hrm: 19:53:08 ['bin/scap', 'pull', '--no-update-l10n', 'deploy2002.codfw.wmnet', 'deploy1002.eqiad.wmnet', 'deploy1002.eqiad.wmnet'] (ran as mwdeploy@mw1450.eqiad.wmnet) returned [127]: bash: bin/scap: No such file or directory
[19:53:41] <logmsgbot>	 !log brennen@deploy1002 rebuilt and synchronized wikiversions files: (no justification provided)
[19:53:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:53:43] <dancy>	 ah.. prefix the command with SCAP=scap
[19:54:14] <brennen>	 running
[19:56:09] <mutante>	 rescheduling all the icinga alerts for MW versions
[19:56:11] <wikibugs>	 (CR) Mepps: [C: +1] "LGTM" [mediawiki-config] - https://gerrit.wikimedia.org/r/767574 (https://phabricator.wikimedia.org/T300291) (owner: Eigyan)
[19:57:58] <logmsgbot>	 !log brennen@deploy1002 rebuilt and synchronized wikiversions files: (no justification provided)
[19:58:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:58:13] <wikibugs>	 (CR) Ahmon Dancy: [C: +1] check_mw_versions.py: Fix problem induced by recent scap changes (1 comment) [puppet] - https://gerrit.wikimedia.org/r/767242 (https://phabricator.wikimedia.org/T302832) (owner: Ahmon Dancy)
[19:58:22] <brennen>	 finished cleanly that go.
[19:58:23] <wikibugs>	 (CR) Jsn.sherman: [C: -1] "This looks like it removes the survey from labs (beta) but not prod?" [mediawiki-config] - https://gerrit.wikimedia.org/r/767574 (https://phabricator.wikimedia.org/T300291) (owner: Eigyan)
[19:58:45] <dancy>	 Thanks for testing.  I'll package up a scap release.
[19:58:54] <mutante>	 brennen: ACK, many recoveries in Icinga, I keep telling it to speed up
[19:59:32] <mutante>	 down to 329 from 400
[20:00:11] <brennen>	 dduvall: also filed T302918, not sure if user facing impact at the moment.
[20:00:11] <stashbot>	 T302918: Linter: PHP Warning: in_array() expects parameter 2 to be array, null given - https://phabricator.wikimedia.org/T302918
[20:01:05] <mutante>	 ok, fixed most. "only" 37 CRITs (:/) that are all unrelated though. so I will turn the bot back on
[20:03:26] <logmsgbot>	 !log robh@cumin1001 START - Cookbook sre.dns.netbox
[20:03:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:03:58] <dancy>	 https://phabricator.wikimedia.org/T302919 filed to request a new scap release.
[20:04:23] <brennen>	 thanks dancy.  meanwhile i'll use the local checkout if a version change comes up again.
[20:04:33] <dancy>	 👍🏾
[20:07:43] <logmsgbot>	 !log robh@cumin1001 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[20:07:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:08:22] <wikibugs>	 (PS2) Eigyan: wmf-config: Undeploy the fawiki test survey from production [mediawiki-config] - https://gerrit.wikimedia.org/r/767574 (https://phabricator.wikimedia.org/T300291)
[20:08:23] <dancy>	 Taking a break now that the chaos has died down.
[20:11:03] <wikibugs>	 (CR) Eigyan: wmf-config: Undeploy the fawiki test survey from production (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/767574 (https://phabricator.wikimedia.org/T300291) (owner: Eigyan)
[20:11:31] <logmsgbot>	 !log robh@cumin1001 START - Cookbook sre.hosts.provision for host dumpsdata1007.mgmt.eqiad.wmnet with reboot policy FORCED
[20:11:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:12:42] <wikibugs>	 (Abandoned) Eigyan: wmf-config: Undeploy the fawiki test survey from production [mediawiki-config] - https://gerrit.wikimedia.org/r/767574 (https://phabricator.wikimedia.org/T300291) (owner: Eigyan)
[20:12:55] <jinxer-wm>	 (ProbeHttpFailed) firing: (2) URL did not return HTTP 2xx or 3xx response (or probe/connection failed) - https://wikitech.wikimedia.org/wiki/Prometheus#Watchrat_Non-23xx_HTTP_response - https://grafana.wikimedia.org/d/GYciEga7z/watchrat - https://alerts.wikimedia.org
[20:18:36] <wikibugs>	 (PS1) Eigyan: wmf-config: Undeploy the fawiki test survey from production [mediawiki-config] - https://gerrit.wikimedia.org/r/767580 (https://phabricator.wikimedia.org/T300291)
[20:20:41] <logmsgbot>	 !log robh@cumin1001 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dumpsdata1007.mgmt.eqiad.wmnet with reboot policy FORCED
[20:20:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:21:26] <mutante>	 dancy: brennen: new scap built and uploaded, updating ticket and docs ..because new build host
[20:23:14] <wikibugs>	 (PS2) Eigyan: wmf-config: Undeploy the fawiki test survey from production [mediawiki-config] - https://gerrit.wikimedia.org/r/767580 (https://phabricator.wikimedia.org/T300291)
[20:23:52] <mutante>	 not deployed yet, be back after lunch
[20:23:55] <wikibugs>	 (PS6) Cathal Mooney: Add site variable for EVPN overlay loopback subnets and CR filter [homer/public] - https://gerrit.wikimedia.org/r/767570 (https://phabricator.wikimedia.org/T299758)
[20:24:30] <wikibugs>	 (CR) jerkins-bot: [V: -1] Add site variable for EVPN overlay loopback subnets and CR filter [homer/public] - https://gerrit.wikimedia.org/r/767570 (https://phabricator.wikimedia.org/T299758) (owner: Cathal Mooney)
[20:24:40] <wikibugs>	 (CR) Eigyan: "Fixed wrong file update 😊" [mediawiki-config] - https://gerrit.wikimedia.org/r/767580 (https://phabricator.wikimedia.org/T300291) (owner: Eigyan)
[20:26:24] <wikibugs>	 (CR) JHathaway: [C: +1] firmware fact: drop firmware_bios (1 comment) [puppet] - https://gerrit.wikimedia.org/r/765574 (owner: Jbond)
[20:31:13] <wikibugs>	 (PS7) Cathal Mooney: Add site variable for EVPN overlay loopback subnets and CR filter [homer/public] - https://gerrit.wikimedia.org/r/767570 (https://phabricator.wikimedia.org/T299758)
[20:31:43] <wikibugs>	 (CR) jerkins-bot: [V: -1] Add site variable for EVPN overlay loopback subnets and CR filter [homer/public] - https://gerrit.wikimedia.org/r/767570 (https://phabricator.wikimedia.org/T299758) (owner: Cathal Mooney)
[20:33:16] <wikibugs>	 (PS8) Cathal Mooney: Add site variable for EVPN overlay loopback subnets and CR filter [homer/public] - https://gerrit.wikimedia.org/r/767570 (https://phabricator.wikimedia.org/T299758)
[20:33:47] <wikibugs>	 (CR) jerkins-bot: [V: -1] Add site variable for EVPN overlay loopback subnets and CR filter [homer/public] - https://gerrit.wikimedia.org/r/767570 (https://phabricator.wikimedia.org/T299758) (owner: Cathal Mooney)
[20:35:35] <wikibugs>	 (PS9) Cathal Mooney: Add site variable for EVPN overlay loopback subnets and CR filter [homer/public] - https://gerrit.wikimedia.org/r/767570 (https://phabricator.wikimedia.org/T299758)
[20:36:05] <wikibugs>	 (CR) jerkins-bot: [V: -1] Add site variable for EVPN overlay loopback subnets and CR filter [homer/public] - https://gerrit.wikimedia.org/r/767570 (https://phabricator.wikimedia.org/T299758) (owner: Cathal Mooney)
[20:37:18] <wikibugs>	 (PS10) Cathal Mooney: Add site variable for EVPN overlay loopback subnets and CR filter [homer/public] - https://gerrit.wikimedia.org/r/767570 (https://phabricator.wikimedia.org/T299758)
[20:37:48] <wikibugs>	 (CR) jerkins-bot: [V: -1] Add site variable for EVPN overlay loopback subnets and CR filter [homer/public] - https://gerrit.wikimedia.org/r/767570 (https://phabricator.wikimedia.org/T299758) (owner: Cathal Mooney)
[20:38:04] <mutante>	 !log rolling out scap 4.4.2 to A:mw-canary or A:parsoid-canary or A:mw-jobrunner-canary (T302919)
[20:38:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:38:08] <stashbot>	 T302919: Deploy Scap version 4.4.2 - https://phabricator.wikimedia.org/T302919
[20:41:37] <wikibugs>	 (03CR) 10Jsn.sherman: [C: 03+1] "Looks good to me!" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/767580 (https://phabricator.wikimedia.org/T300291) (owner: 10Eigyan)
[20:44:43] <mutante>	 !log tested 'scap pull' still worked on mwdebug1001; rolling out scap 4.4.2 to A:restbase-canary (T302919)
[20:44:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:44:46] <stashbot>	 T302919: Deploy Scap version 4.4.2 - https://phabricator.wikimedia.org/T302919
[20:45:00] <icinga-wm>	 PROBLEM - Check systemd state on cp6010 is CRITICAL: CRITICAL - degraded: The following units failed: wmf_auto_restart_exim4.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[20:47:51] <logmsgbot>	 !log dzahn@deploy1002 Started deploy [restbase/deploy@0848b15] (dev-cluster): (no justification provided)
[20:47:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:48:32] <logmsgbot>	 !log dzahn@deploy1002 Finished deploy [restbase/deploy@0848b15] (dev-cluster): (no justification provided) (duration: 00m 41s)
[20:48:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:48:47] <mutante>	 !log running test-deploy to devcluster (restbase) to test new scap version, successful and then rolled back, as the docs say T302919
[20:48:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:54:54] <dancy>	 Thanks mutante!!
[20:56:33] <brennen>	 ^
[20:57:06] <mutante>	 all is done except the "roll out to all"
[20:57:12] <mutante>	 you have it on canaries
[20:57:29] <dancy>	 The one and only place it is really needed is deploy1002
[20:57:56] <dancy>	 scap clients (all other hosts) are unaffected by the code change
[20:59:11] <eigyan>	 greetings
[20:59:29] <dancy>	 Hello there.
[20:59:44] <wikibugs>	 (03PS1) 10RobH: dumpsdata1007 info [puppet] - 10https://gerrit.wikimedia.org/r/767584 (https://phabricator.wikimedia.org/T299443)
[21:00:05] <jouncebot>	 RoanKattouw and Urbanecm: (Dis)respected human, time to deploy UTC late backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220302T2100). Please do the needful.
[21:00:05] <jouncebot>	 eigyan: A patch you scheduled for UTC late backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[21:00:22] <mutante>	 !log deploy1002 - upgraded scap to 4.4.2-1 T302919
[21:00:23] <eigyan>	 I am here
[21:00:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:00:24] <wikibugs>	 (03CR) 10RobH: [C: 03+2] dumpsdata1007 info [puppet] - 10https://gerrit.wikimedia.org/r/767584 (https://phabricator.wikimedia.org/T299443) (owner: 10RobH)
[21:00:26] <mutante>	 dancy: done!
[21:00:26] <stashbot>	 T302919: Deploy Scap version 4.4.2 - https://phabricator.wikimedia.org/T302919
[21:00:32] <dancy>	 woooord
[21:00:33] <dancy>	 Thanks!
[21:01:24] <mutante>	 yep, good docs were helpful
[21:02:08] <thcipriani>	 <3 mutante 
[21:03:50] <icinga-wm>	 PROBLEM - SSH on dns5001.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[21:04:22] <dancy>	 I'm going to test it.
[21:05:09] <logmsgbot>	 !log robh@cumin1001 START - Cookbook sre.hosts.reimage for host dumpsdata1007.eqiad.wmnet with OS bullseye
[21:05:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:05:16] <wikibugs>	 10SRE, 10ops-eqiad, 10DC-Ops, 10Dumps-Generation, 10Patch-For-Review: Q3:(Need By: TBD) rack/setup/install dumpsdata100[67] - https://phabricator.wikimedia.org/T299443 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by robh@cumin1001 for host dumpsdata1007.eqiad.wmnet with OS bull...
[21:09:45] <RhinosF1>	 eigyan: hi
[21:09:53] <RhinosF1>	 dancy: are you able to help with B&C
[21:10:01] <dancy>	 Sure
[21:10:14] <logmsgbot>	 !log dancy@deploy1002 rebuilt and synchronized wikiversions files: testing scap 4.4.2
[21:10:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:10:33] <dancy>	 What's on the list?
[21:10:37] <RhinosF1>	 dancy: there's just 1 patch from eigyan
[21:10:43] <dancy>	 mutante: Test confirmed.
[21:10:50] <mutante>	 dancy: :) great, thanks
[21:10:58] <RhinosF1>	 https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/767580
[21:11:02] <icinga-wm>	 PROBLEM - Disk space on centrallog1001 is CRITICAL: DISK CRITICAL - free space: /srv 34728 MB (3% inode=99%): https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=centrallog1001&var-datasource=eqiad+prometheus/ops
[21:11:02] <dancy>	 thx
[21:11:23] <wikibugs>	 (03CR) 10RhinosF1: [C: 03+1] wmf-config: Undeploy the fawiki test survey from production [mediawiki-config] - 10https://gerrit.wikimedia.org/r/767580 (https://phabricator.wikimedia.org/T300291) (owner: 10Eigyan)
[21:11:36] <dancy>	 ok, letting 'er rip
[21:11:48] <wikibugs>	 (03CR) 10Ahmon Dancy: [C: 03+2] wmf-config: Undeploy the fawiki test survey from production [mediawiki-config] - 10https://gerrit.wikimedia.org/r/767580 (https://phabricator.wikimedia.org/T300291) (owner: 10Eigyan)
[21:12:31] <wikibugs>	 (03Merged) 10jenkins-bot: wmf-config: Undeploy the fawiki test survey from production [mediawiki-config] - 10https://gerrit.wikimedia.org/r/767580 (https://phabricator.wikimedia.org/T300291) (owner: 10Eigyan)
[21:13:19] <dancy>	 deployed to mwdebug.
[21:13:33] <RhinosF1>	 eigyan: please test ^
[21:13:45] <logmsgbot>	 !log robh@cumin1001 END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dumpsdata1007.eqiad.wmnet with OS bullseye
[21:13:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:13:49] <wikibugs>	 10SRE, 10ops-eqiad, 10DC-Ops, 10Dumps-Generation: Q3:(Need By: TBD) rack/setup/install dumpsdata100[67] - https://phabricator.wikimedia.org/T299443 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by robh@cumin1001 for host dumpsdata1007.eqiad.wmnet with OS bullseye executed with errors:...
[21:13:49] <eigyan>	 Thanks RhinosF1, will do
[21:13:51] <RhinosF1>	 Just need to make sure it no longer shows when forced I guess
[21:14:29] <RhinosF1>	 eigyan: please ping dancy once you've checked
[21:15:04] <eigyan>	 sure thing RhinosF1
[21:17:59] <eigyan>	 dancy mwdebug looks good on my end
[21:18:10] <dancy>	 ok, rolling out.
[21:19:12] <logmsgbot>	 !log dancy@deploy1002 Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:767580|wmf-config: Undeploy the fawiki test survey from production (T300291)]] (duration: 00m 50s)
[21:19:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:19:16] <stashbot>	 T300291: Undeploy the fawiki test survey FROM PRODUCTION - https://phabricator.wikimedia.org/T300291
[21:20:04] <RhinosF1>	 dancy: thanks for helping
[21:20:13] <dancy>	 No problem.  
[21:20:53] <RhinosF1>	 eigyan: it should be live in production now, please let us know if you need anything else / have issues
[21:21:00] <RhinosF1>	 And have a good evening!
[21:21:17] <eigyan>	 thank you so much everyone have a great night
[21:21:58] <RhinosF1>	 :)
[21:35:39] <wikibugs>	 10SRE, 10Security-Team, 10Performance-Team (Radar), 10SecTeam-Processed, 10Security: Security API Storage Needs - https://phabricator.wikimedia.org/T301428 (10sbassett) Hey @Joe - just wondering if you had any thoughts or guidance regarding my previous comment.  If not, I think we'll explore using MySQL...
[21:36:15] <logmsgbot>	 !log robh@cumin1001 START - Cookbook sre.hosts.reimage for host dumpsdata1007.eqiad.wmnet with OS bullseye
[21:36:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:36:22] <wikibugs>	 10SRE, 10ops-eqiad, 10DC-Ops, 10Dumps-Generation: Q3:(Need By: TBD) rack/setup/install dumpsdata100[67] - https://phabricator.wikimedia.org/T299443 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by robh@cumin1001 for host dumpsdata1007.eqiad.wmnet with OS bullseye
[21:38:35] <mutante>	 jouncebot: now
[21:38:35] <jouncebot>	 For the next 0 hour(s) and 21 minute(s): UTC late backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220302T2100)
[21:39:15] <mutante>	 dancy: everything quiet? Then I roll out scap on everything now 
[21:42:55] <jinxer-wm>	 (JobUnavailable) firing: Reduced availability for job jmx_wdqs_updater in eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org
[21:43:20] <dancy>	 mutante: All is well.
[21:44:39] <mutante>	 !log rolling out scap 4.4.2 on 'all' T302919
[21:44:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:44:42] <stashbot>	 T302919: Deploy Scap version 4.4.2 - https://phabricator.wikimedia.org/T302919
[21:46:14] <mutante>	 dancy: looks alright, it finished and should be done globally
[21:46:30] <dancy>	 Thanks again. That was a fast turnaround.
[21:46:35] <mutante>	 ignores this: 
[21:46:36] <mutante>	 The following hosts were unreachable:
[21:46:36] <mutante>	 puppet
[21:46:37] <mutante>	 :)
[21:46:44] <dancy>	 haha
[21:47:39] <mutante>	 cool, I know it took longer sometimes in the past
[21:49:46] <brennen>	 jouncebot now
[21:49:46] <jouncebot>	 For the next 0 hour(s) and 10 minute(s): UTC late backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220302T2100)
[21:50:38] <brennen>	 i'm going to backport https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Linter/+/767582
[21:51:11] <wikibugs>	 (03PS1) 10Brennen Bearnes: Hooks.php: Check for non-array $tags [extensions/Linter] (wmf/1.38.0-wmf.24) - 10https://gerrit.wikimedia.org/r/767104 (https://phabricator.wikimedia.org/T302918)
[21:51:24] <wikibugs>	 (03CR) 10Brennen Bearnes: [C: 03+2] Hooks.php: Check for non-array $tags [extensions/Linter] (wmf/1.38.0-wmf.24) - 10https://gerrit.wikimedia.org/r/767104 (https://phabricator.wikimedia.org/T302918) (owner: 10Brennen Bearnes)
[21:52:47] <wikibugs>	 (03PS1) 10Reedy: Use namespaced ApiFeatureUsageQueryEngineElastica [mediawiki-config] - 10https://gerrit.wikimedia.org/r/767596 (https://phabricator.wikimedia.org/T301044)
[21:53:23] <ryankemper>	 !log T276198 Disabled puppet across all of elastic*, cloudelastic*, and relforge* to test https://gerrit.wikimedia.org/r/c/operations/puppet/+/766876/ on a single elastic host
[21:53:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:53:27] <stashbot>	 T276198: /var/run/elasticsearch deleted by elasticsearch - https://phabricator.wikimedia.org/T276198
[21:53:51] <wikibugs>	 (03Merged) 10jenkins-bot: Hooks.php: Check for non-array $tags [extensions/Linter] (wmf/1.38.0-wmf.24) - 10https://gerrit.wikimedia.org/r/767104 (https://phabricator.wikimedia.org/T302918) (owner: 10Brennen Bearnes)
[21:53:58] <wikibugs>	 (03CR) 10Reedy: [C: 04-2] "Needs to wait till `wmf/1.38.0-wmf.24` is stable and everywhere" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/767596 (https://phabricator.wikimedia.org/T301044) (owner: 10Reedy)
[21:55:17] <wikibugs>	 (03CR) 10Bking: [C: 03+2] elastic: prevent rundir from deletion [puppet] - 10https://gerrit.wikimedia.org/r/766876 (https://phabricator.wikimedia.org/T276198) (owner: 10Bking)
[21:57:01] <wikibugs>	 (03CR) 10Volans: [C: 03+1] "LGTM, one optional nit inline" [dns] - 10https://gerrit.wikimedia.org/r/767562 (https://phabricator.wikimedia.org/T299758) (owner: 10Cathal Mooney)
[21:59:44] <wikibugs>	 (03PS2) 10Reedy: Use namespaced ApiFeatureUsageQueryEngineElastica [mediawiki-config] - 10https://gerrit.wikimedia.org/r/767596 (https://phabricator.wikimedia.org/T302907)
[21:59:44] <logmsgbot>	 !log brennen@deploy1002 Synchronized php-1.38.0-wmf.24/extensions/Linter/includes/Hooks.php: Backport: [[gerrit:767104|Hooks.php: Check for non-array $tags (T302918)]] (duration: 00m 50s)
[21:59:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:59:48] <stashbot>	 T302918: Linter: PHP Warning: in_array() expects parameter 2 to be array, null given - https://phabricator.wikimedia.org/T302918
[22:05:05] <logmsgbot>	 !log robh@cumin1001 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dumpsdata1007.eqiad.wmnet with OS bullseye
[22:05:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:05:11] <wikibugs>	 10SRE, 10ops-eqiad, 10DC-Ops, 10Dumps-Generation: Q3:(Need By: TBD) rack/setup/install dumpsdata100[67] - https://phabricator.wikimedia.org/T299443 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by robh@cumin1001 for host dumpsdata1007.eqiad.wmnet with OS bullseye executed with errors:...
[22:05:32] <icinga-wm>	 RECOVERY - SSH on dns5001.mgmt is OK: SSH OK - OpenSSH_7.4 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[22:12:14] <icinga-wm>	 PROBLEM - Check systemd state on elastic1052 is CRITICAL: CRITICAL - degraded: The following units failed: elasticsearch-disable-readahead.service,elasticsearch_6@production-search-eqiad.service,elasticsearch_6@production-search-psi-eqiad.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[22:12:42] <RhinosF1>	 inflatador: ^
[22:13:03] <RhinosF1>	 ryankemper: ^
[22:13:22] <ryankemper>	 RhinosF1: thanks
[22:13:29] <ryankemper>	 also reminds me I missed a log message
[22:13:34] <RhinosF1>	 np
[22:16:30] <ryankemper>	 !log T276198 Testing https://gerrit.wikimedia.org/r/c/operations/puppet/+/766876/ on `elastic1052`; elasticsearch service fails to start. It's expecting to find `/etc/tmpfiles.d/elasticsearch-production-search-psi-eqiad.conf` but the actual filename is `elasticsearch-production-search-psi-eqiad-conf.conf`. Not sure why that trailing `-conf` is there in the filename. It doesn't look like something `systemd::tmpfile` is doing.
[22:16:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:16:36] <stashbot>	 T276198: /var/run/elasticsearch deleted by elasticsearch - https://phabricator.wikimedia.org/T276198
[22:19:42] <icinga-wm>	 ACKNOWLEDGEMENT - Check systemd state on elastic1052 is CRITICAL: CRITICAL - degraded: The following units failed: elasticsearch-disable-readahead.service,elasticsearch_6@production-search-eqiad.service,elasticsearch_6@production-search-psi-eqiad.service Ryan Kemper https://phabricator.wikimedia.org/T276198 https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[22:20:05] <ryankemper>	 acked the alerts, going to downtime now (should have downtimed it earlier)
[22:21:04] <ryankemper>	 !log T276198 Downtimed `elastic1052` for 2 hours while troubleshooting
[22:21:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:28:10] <wikibugs>	 10SRE, 10Discovery-Search (Current work): /var/run/elasticsearch deleted by elasticsearch - https://phabricator.wikimedia.org/T276198 (10MoritzMuehlenhoff) >>! In T276198#7749141, @Stashbot wrote: > {nav icon=file, name=Mentioned in SAL (#wikimedia-operations), href=https://sal.toolforge.org/log/r360TH8B8Fs0LH...
[22:35:03] <wikibugs>	 10SRE, 10Discovery-Search (Current work): /var/run/elasticsearch deleted by elasticsearch - https://phabricator.wikimedia.org/T276198 (10RKemper) >>! In T276198#7749192, @MoritzMuehlenhoff wrote: >>>! In T276198#7749141, @Stashbot wrote: >> {nav icon=file, name=Mentioned in SAL (#wikimedia-operations), href=ht...
[22:35:32] <wikibugs>	 (03PS1) 10Dzahn: devtools: copy yaml key/values over from gitlab-runner project for test [puppet] - 10https://gerrit.wikimedia.org/r/767599 (https://phabricator.wikimedia.org/T297659)
[22:36:36] <wikibugs>	 (03PS1) 10Ryan Kemper: elastic: fix filename of tmpfile [puppet] - 10https://gerrit.wikimedia.org/r/767600 (https://phabricator.wikimedia.org/T276198)
[22:37:05] <wikibugs>	 (03CR) 10Ryan Kemper: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/767600 (https://phabricator.wikimedia.org/T276198) (owner: 10Ryan Kemper)
[22:38:08] <wikibugs>	 (03PS2) 10Dzahn: devtools: copy yaml key/values over from gitlab-runner project for test [puppet] - 10https://gerrit.wikimedia.org/r/767599 (https://phabricator.wikimedia.org/T297659)
[22:38:48] <wikibugs>	 (03CR) 10Dzahn: [C: 03+2] devtools: copy yaml key/values over from gitlab-runner project for test [puppet] - 10https://gerrit.wikimedia.org/r/767599 (https://phabricator.wikimedia.org/T297659) (owner: 10Dzahn)
[22:41:03] <wikibugs>	 (03PS2) 10Ryan Kemper: elastic: fix filename of tmpfile [puppet] - 10https://gerrit.wikimedia.org/r/767600 (https://phabricator.wikimedia.org/T276198)
[22:41:19] <wikibugs>	 (03PS3) 10Dzahn: devtools: copy yaml key/values over from gitlab-runner project for test [puppet] - 10https://gerrit.wikimedia.org/r/767599 (https://phabricator.wikimedia.org/T297659)
[22:41:36] <logmsgbot>	 !log rzl@deploy1002 helmfile [codfw] START helmfile.d/services/apertium: apply
[22:41:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:42:18] <logmsgbot>	 !log rzl@deploy1002 helmfile [codfw] DONE helmfile.d/services/apertium: apply
[22:42:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:42:19] <logmsgbot>	 !log rzl@deploy1002 helmfile [codfw] START helmfile.d/services/blubberoid: apply
[22:42:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:42:50] <wikibugs>	 (03PS3) 10Ryan Kemper: elastic: fix filename of tmpfile [puppet] - 10https://gerrit.wikimedia.org/r/767600 (https://phabricator.wikimedia.org/T276198)
[22:43:05] <logmsgbot>	 !log rzl@deploy1002 helmfile [codfw] DONE helmfile.d/services/blubberoid: apply
[22:43:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:43:06] <logmsgbot>	 !log rzl@deploy1002 helmfile [codfw] START helmfile.d/services/cxserver: apply
[22:43:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:43:54] <logmsgbot>	 !log rzl@deploy1002 helmfile [codfw] DONE helmfile.d/services/cxserver: apply
[22:43:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:43:55] <logmsgbot>	 !log rzl@deploy1002 helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply
[22:43:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:44:13] <wikibugs>	 (03PS4) 10Dzahn: devtools: copy yaml key/values over from gitlab-runner project for test [puppet] - 10https://gerrit.wikimedia.org/r/767599 (https://phabricator.wikimedia.org/T297659)
[22:44:33] <wikibugs>	 (03CR) 10Dzahn: [V: 03+2 C: 03+2] devtools: copy yaml key/values over from gitlab-runner project for test [puppet] - 10https://gerrit.wikimedia.org/r/767599 (https://phabricator.wikimedia.org/T297659) (owner: 10Dzahn)
[22:45:19] <logmsgbot>	 !log rzl@deploy1002 helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply
[22:45:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:45:21] <logmsgbot>	 !log rzl@deploy1002 helmfile [codfw] START helmfile.d/services/eventgate-analytics: apply
[22:45:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:46:24] <logmsgbot>	 !log rzl@deploy1002 helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: apply
[22:46:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:46:26] <logmsgbot>	 !log rzl@deploy1002 helmfile [codfw] START helmfile.d/services/eventgate-logging-external: apply
[22:46:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:47:00] <wikibugs>	 (03CR) 10Ryan Kemper: [C: 03+2] elastic: fix filename of tmpfile [puppet] - 10https://gerrit.wikimedia.org/r/767600 (https://phabricator.wikimedia.org/T276198) (owner: 10Ryan Kemper)
[22:47:18] <logmsgbot>	 !log rzl@deploy1002 helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: apply
[22:47:20] <logmsgbot>	 !log rzl@deploy1002 helmfile [codfw] START helmfile.d/services/eventgate-main: apply
[22:47:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:47:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:48:43] <logmsgbot>	 !log rzl@deploy1002 helmfile [codfw] DONE helmfile.d/services/eventgate-main: apply
[22:48:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:48:44] <logmsgbot>	 !log rzl@deploy1002 helmfile [codfw] START helmfile.d/services/eventstreams: apply
[22:48:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:49:40] <logmsgbot>	 !log rzl@deploy1002 helmfile [codfw] DONE helmfile.d/services/eventstreams: apply
[22:49:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:49:42] <logmsgbot>	 !log rzl@deploy1002 helmfile [codfw] START helmfile.d/services/eventstreams-internal: apply
[22:49:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:50:30] <logmsgbot>	 !log rzl@deploy1002 helmfile [codfw] DONE helmfile.d/services/eventstreams-internal: apply
[22:50:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:50:31] <logmsgbot>	 !log rzl@deploy1002 helmfile [codfw] START helmfile.d/services/linkrecommendation: apply
[22:50:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:51:48] <logmsgbot>	 !log rzl@deploy1002 helmfile [codfw] DONE helmfile.d/services/linkrecommendation: apply
[22:51:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:51:49] <logmsgbot>	 !log rzl@deploy1002 helmfile [codfw] START helmfile.d/services/mathoid: apply
[22:51:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:52:44] <logmsgbot>	 !log rzl@deploy1002 helmfile [codfw] DONE helmfile.d/services/mathoid: apply
[22:52:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:52:45] <logmsgbot>	 !log rzl@deploy1002 helmfile [codfw] START helmfile.d/services/proton: apply
[22:52:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:54:32] <logmsgbot>	 !log rzl@deploy1002 helmfile [codfw] DONE helmfile.d/services/proton: apply
[22:54:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:54:33] <logmsgbot>	 !log rzl@deploy1002 helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
[22:54:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:55:45] <logmsgbot>	 !log rzl@deploy1002 helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
[22:55:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:55:46] <logmsgbot>	 !log rzl@deploy1002 helmfile [codfw] START helmfile.d/services/shellbox-media: apply
[22:55:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:56:37] <logmsgbot>	 !log rzl@deploy1002 helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply
[22:56:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:05:12] <wikibugs>	 (03PS1) 10RobH: dumpsdata1007 raid testing [puppet] - 10https://gerrit.wikimedia.org/r/767602 (https://phabricator.wikimedia.org/T299443)
[23:05:35] <wikibugs>	 (03CR) 10RobH: [C: 03+2] dumpsdata1007 raid testing [puppet] - 10https://gerrit.wikimedia.org/r/767602 (https://phabricator.wikimedia.org/T299443) (owner: 10RobH)
[23:06:34] <icinga-wm>	 PROBLEM - Check systemd state on snapshot1008 is CRITICAL: CRITICAL - degraded: The following units failed: wikidatardf-truthy-dumps.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[23:06:46] <wikibugs>	 (03CR) 10Krinkle: [C: 03+1] check_mw_versions.py: Fix problem induced by recent scap changes (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/767242 (https://phabricator.wikimedia.org/T302832) (owner: 10Ahmon Dancy)
[23:07:26] <wikibugs>	 (03PS1) 10Ryan Kemper: elastic: disable readahead script needs new fp [puppet] - 10https://gerrit.wikimedia.org/r/767603 (https://phabricator.wikimedia.org/T276198)
[23:08:41] <wikibugs>	 (03CR) 10Ryan Kemper: [C: 03+2] elastic: disable readahead script needs new fp [puppet] - 10https://gerrit.wikimedia.org/r/767603 (https://phabricator.wikimedia.org/T276198) (owner: 10Ryan Kemper)
[23:08:57] <logmsgbot>	 !log robh@cumin1001 START - Cookbook sre.hosts.reimage for host dumpsdata1007.eqiad.wmnet with OS bullseye
[23:08:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:09:05] <wikibugs>	 10SRE, 10ops-eqiad, 10DC-Ops, 10Dumps-Generation, 10Patch-For-Review: Q3:(Need By: TBD) rack/setup/install dumpsdata100[67] - https://phabricator.wikimedia.org/T299443 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by robh@cumin1001 for host dumpsdata1007.eqiad.wmnet with OS bull...
[23:10:56] <icinga-wm>	 RECOVERY - Check systemd state on elastic1052 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[23:15:20] <logmsgbot>	 !log robh@cumin1001 END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dumpsdata1007.eqiad.wmnet with OS bullseye
[23:15:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:15:26] <wikibugs>	 10SRE, 10ops-eqiad, 10DC-Ops, 10Dumps-Generation: Q3:(Need By: TBD) rack/setup/install dumpsdata100[67] - https://phabricator.wikimedia.org/T299443 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by robh@cumin1001 for host dumpsdata1007.eqiad.wmnet with OS bullseye executed with errors:...
[23:17:41] <wikibugs>	 10SRE, 10DC-Ops: datadumps1007 test installs - https://phabricator.wikimedia.org/T302937 (10RobH)
[23:20:55] <wikibugs>	 (03PS1) 10Dzahn: aptrepo: import gitlab-runner package for bullseye [puppet] - 10https://gerrit.wikimedia.org/r/767604 (https://phabricator.wikimedia.org/T297659)
[23:21:08] <ryankemper>	 !log T276198 https://gerrit.wikimedia.org/r/c/operations/puppet/+/767600 and https://gerrit.wikimedia.org/r/c/operations/puppet/+/767603/ fixed all the problems. Re-enabling puppet on elastic*, cloudelastic*, and relforge* shortly
[23:21:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:21:11] <stashbot>	 T276198: /var/run/elasticsearch deleted by elasticsearch - https://phabricator.wikimedia.org/T276198
[23:21:56] <logmsgbot>	 !log robh@cumin1001 START - Cookbook sre.hosts.reimage for host dumpsdata1007.eqiad.wmnet with OS bullseye
[23:21:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:22:00] <wikibugs>	 10SRE, 10DC-Ops: datadumps1007 test installs - https://phabricator.wikimedia.org/T302937 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by robh@cumin1001 for host dumpsdata1007.eqiad.wmnet with OS bullseye
[23:24:00] <wikibugs>	 (03PS2) 10Dzahn: aptrepo: import gitlab-runner package for bullseye [puppet] - 10https://gerrit.wikimedia.org/r/767604 (https://phabricator.wikimedia.org/T297659)
[23:25:15] <ryankemper>	 !log T276198 Re-enabled puppet across fleet: `ryankemper@cumin1001:~$ sudo -E cumin 'R:Elasticsearch::instance' 'enable-puppet "deploy fix from T276198"'`
[23:25:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:32:48] <wikibugs>	 10SRE, 10WMF-General-or-Unknown, 10WMF-Legal, 10Documentation, and 2 others: Default license for operations/puppet - https://phabricator.wikimedia.org/T67270 (10Dzahn) I hereby license all my existing contributions to the operations/puppet under the Apache 2.0 license.  ---  Maybe we can get the patch from...
[23:32:58] <logmsgbot>	 !log robh@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on dumpsdata1007.eqiad.wmnet with reason: host reimage
[23:32:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:37:28] <logmsgbot>	 !log robh@cumin1001 END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on dumpsdata1007.eqiad.wmnet with reason: host reimage
[23:37:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:47:04] <logmsgbot>	 !log robh@cumin1001 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dumpsdata1007.eqiad.wmnet with OS bullseye
[23:47:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:47:08] <wikibugs>	 10SRE, 10DC-Ops: datadumps1007 test installs - https://phabricator.wikimedia.org/T302937 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by robh@cumin1001 for host dumpsdata1007.eqiad.wmnet with OS bullseye completed: - dumpsdata1007 (**WARN**)   - Removed from Puppet and PuppetDB if presen...
[23:49:42] <wikibugs>	 (03CR) 10Dzahn: [C: 03+1] "nice! looks good, removes "labs" etc :)" [puppet] - 10https://gerrit.wikimedia.org/r/767484 (https://phabricator.wikimedia.org/T297411) (owner: 10Jelto)
[23:50:40] <icinga-wm>	 PROBLEM - Check correctness of the icinga configuration on alert1001 is CRITICAL: Icinga configuration contains errors https://wikitech.wikimedia.org/wiki/Icinga
[23:52:30] <mutante>	 robh: icinga config does not like dumpsdata1007 right now ..because of:
[23:52:37] <mutante>	 Error: 'lsw1-f1-eqiad.mgmt.eqiad.wmnet' is not a valid parent for host 'dumpsdata1007'
[23:52:41] <wikibugs>	 10SRE, 10DC-Ops: datadumps1007 test installs - https://phabricator.wikimedia.org/T302937 (10RobH) so this is installed now with hwraid1 single disk setup just to see if it even works within the OS.  When I then launch the OS, it loads, but any megacli commands hang it.
[23:53:09] <robh>	 interesting, perhaps a new issue due to new row?
[23:53:17] <mutante>	 seems like it, yea
[23:53:32] <mutante>	 as if the new "parent interface" needs to be added somewhere
[23:53:42] <wikibugs>	 SRE, DC-Ops: datadumps1007 test installs - https://phabricator.wikimedia.org/T302937 (RobH) >  > 15:52 mutante: > robh: icinga config does not like dumpsdata1007 right now ..because of: Error: 'lsw1-f1-eqiad.mgmt.eqiad.wmnet' is not a valid parent for host 'dumpsdata1007'
[23:53:42] <robh>	 appending to the test task
[23:53:54] <robh>	 thx for the heads up, ill just maint mode it for now
[23:53:59] <mutante>	 ACK
[23:54:19] <mutante>	 so the thing is ..nothing happens as long as Icinga does not get restarted but if it does then it would go down
[23:54:48] <robh>	 oh wait
[23:54:50] <robh>	 i misunderstood
[23:54:53] <robh>	 icinga CONFIG
[23:55:00] <mutante>	 yea, the config check
[23:55:13] <robh>	 mutante: hrmmm, ok, so i guess .... i have no idea who would go about fixing that
[23:55:33] <robh>	 i dont understand the parent thing, like i guess other servers have their switches as parents?
[23:56:27] <mutante>	 yea, so some icinga hosts or services can have "parents" in the sense that children are not supposed to alert if the parent is down
[23:56:42] <mutante>	 like "if the whole switch is down dont flood the channel with all the HOST down messages"
[23:57:09] <mutante>	 somewhere we must have the switches itself in icinga
[23:57:12] <mutante>	 taking a look
[23:57:23] <wikibugs>	 SRE, DC-Ops: Confirm support of PERC 750 raid controller - https://phabricator.wikimedia.org/T297913 (RobH) dumpsdata1007 is online with OS but doesn't seem megacli works for it?  robh@dumpsdata1007:~$ sudo megacli -LDInfo -Lall -aALL  Adapter 0 -- Virtual Drive Info...
[23:58:02] <mutante>	 icinga does not know "lsw1-f1-eqiad.mgmt.eqiad.wmnet" but for some reason the host is already trying to say that is its parent
[23:59:06] <robh>	 that is totally the switch it connects to
[23:59:12] <mutante>	 except it's right there... lsw1-f1-eqiad.mgmt.eqiad.wmnet
[23:59:14] <robh>	 so that makes sense but i didnt realize that icinga didnt know what it was
[23:59:31] <mutante>	 now I thought I found it and we just have to add the new switch
[23:59:41] <mutante>	 in hieradata/common/monitoring.yaml
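Editor's note: the parent relationship mutante describes can be sketched in a few lines of Python. This is a simplified illustration, not Icinga's actual implementation and not the Wikimedia Puppet code. The function names `invalid_parents` and `should_notify` are hypothetical, and the data structures are assumed for the example. The first function mimics the config check that fails when a host names a parent Icinga has no definition for. The second shows the intent of parents: a child's down notification is held back while its parent is also down.

```python
def invalid_parents(parents, known_hosts):
    """Return (host, parent) pairs whose parent is not a defined host.

    `parents` maps host name -> parent name; `known_hosts` is the set of
    host names the monitoring config defines. Both are assumed structures.
    """
    return [(host, parent)
            for host, parent in sorted(parents.items())
            if parent not in known_hosts]


def should_notify(host, down, parents):
    """Notify for a down host only if its parent (if any) is not down too."""
    return host in down and parents.get(host) not in down


# The situation from the log: the host names its switch as parent,
# but the switch itself is not yet defined as a monitored host.
parents = {"dumpsdata1007": "lsw1-f1-eqiad.mgmt.eqiad.wmnet"}
known_hosts = {"dumpsdata1007"}

for host, parent in invalid_parents(parents, known_hosts):
    print(f"Error: '{parent}' is not a valid parent for host '{host}'")
```

Under these assumptions, adding the switch name to `known_hosts` makes `invalid_parents` return an empty list, which corresponds to the fix being discussed: defining the new switch so it can act as a parent.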