[00:00:15] <icinga-wm>	 RECOVERY - Check systemd state on puppetmaster1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:00:51] <icinga-wm>	 RECOVERY - Check systemd state on puppetmaster2001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:03:27] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P26958 and previous config saved to /var/cache/conftool/dbconfig/20220429-000327-ladsgroup.json
[00:03:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:13:20] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1138 (T298565)', diff saved to https://phabricator.wikimedia.org/P26959 and previous config saved to /var/cache/conftool/dbconfig/20220429-001320-ladsgroup.json
[00:13:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:13:27] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[00:13:27] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1138.eqiad.wmnet with reason: Maintenance
[00:13:28] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1138.eqiad.wmnet with reason: Maintenance
[00:13:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:13:34] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1138 (T298565)', diff saved to https://phabricator.wikimedia.org/P26960 and previous config saved to /var/cache/conftool/dbconfig/20220429-001333-ladsgroup.json
[00:13:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:13:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:18:32] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T306560)', diff saved to https://phabricator.wikimedia.org/P26961 and previous config saved to /var/cache/conftool/dbconfig/20220429-001832-ladsgroup.json
[00:18:34] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
[00:18:35] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
[00:18:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:18:39] <stashbot>	 T306560: Fix nullability of img_major_mime and oi_major_mime - https://phabricator.wikimedia.org/T306560
[00:18:40] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1105:3312 (T306560)', diff saved to https://phabricator.wikimedia.org/P26962 and previous config saved to /var/cache/conftool/dbconfig/20220429-001840-ladsgroup.json
[00:18:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:18:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:18:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:25:18] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1138 (T298565)', diff saved to https://phabricator.wikimedia.org/P26963 and previous config saved to /var/cache/conftool/dbconfig/20220429-002518-ladsgroup.json
[00:25:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:25:26] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[00:35:08] <wikibugs>	 SRE-OnFire (FY2021/2022-Q3): Incident: 2022-03-08 phpfpm worker saturation - https://phabricator.wikimedia.org/T307165 (lmata)
[00:36:04] <wikibugs>	 SRE-OnFire (FY2021/2022-Q3): Incident: 2022-03-10_MediaWiki_availability - https://phabricator.wikimedia.org/T307166 (lmata)
[00:36:38] <wikibugs>	 SRE-OnFire (FY2021/2022-Q3): Incident: 2022-03-27 api - https://phabricator.wikimedia.org/T307167 (lmata)
[00:36:41] <icinga-wm>	 PROBLEM - SSH on pki2001.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[00:37:00] <wikibugs>	 SRE-OnFire (FY2021/2022-Q3): Incident: Incidents/2022-03-27 wdqs outage - https://phabricator.wikimedia.org/T307168 (lmata)
[00:37:34] <wikibugs>	 SRE-OnFire (FY2021/2022-Q3): Incident: 2022-03-29 network - https://phabricator.wikimedia.org/T307169 (lmata)
[00:37:46] <logmsgbot>	 !log pt1979@cumin2002 END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host aqs2006.mgmt.codfw.wmnet with reboot policy FORCED
[00:37:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:38:07] <wikibugs>	 SRE-OnFire (FY2021/2022-Q3): Incident: 2022-03-31 api errors - https://phabricator.wikimedia.org/T307170 (lmata)
[00:38:40] <logmsgbot>	 !log pt1979@cumin2002 START - Cookbook sre.hosts.provision for host aqs2006.mgmt.codfw.wmnet with reboot policy FORCED
[00:38:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:40:23] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1138', diff saved to https://phabricator.wikimedia.org/P26964 and previous config saved to /var/cache/conftool/dbconfig/20220429-004023-ladsgroup.json
[00:40:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:41:58] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T306560)', diff saved to https://phabricator.wikimedia.org/P26965 and previous config saved to /var/cache/conftool/dbconfig/20220429-004157-ladsgroup.json
[00:42:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:42:04] <stashbot>	 T306560: Fix nullability of img_major_mime and oi_major_mime - https://phabricator.wikimedia.org/T306560
[00:44:02] <wikibugs>	 SRE-OnFire (FY2021/2022-Q3): Incident: 2022-03-10_MediaWiki_availability - https://phabricator.wikimedia.org/T307166 (lmata)
[00:44:17] <logmsgbot>	 !log pt1979@cumin2002 END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host aqs2006.mgmt.codfw.wmnet with reboot policy FORCED
[00:44:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:44:28] <wikibugs>	 SRE-OnFire (FY2021/2022-Q3), Data-Persistence (Consultation), Platform Engineering, Performance-Team (Radar), Sustainability (Incident Followup): Incident: 2022-03-10 MediaWiki availability affected due to a database query processing slowdown affecting ... - https://phabricator.wikimedia.org/T303499
[00:45:21] <wikibugs>	 SRE-OnFire (FY2021/2022-Q3): Incident: 2022-03-27 wdqs outage - https://phabricator.wikimedia.org/T307168 (lmata)
[00:46:02] <wikibugs>	 SRE-OnFire (FY2021/2022-Q3): Incident: 2022-02-22 eqiad-eqord saturation - https://phabricator.wikimedia.org/T307158 (lmata)
[00:46:39] <logmsgbot>	 !log pt1979@cumin2002 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host aqs2005.mgmt.codfw.wmnet with reboot policy FORCED
[00:46:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:47:01] <logmsgbot>	 !log pt1979@cumin2002 START - Cookbook sre.hosts.provision for host aqs2007.mgmt.codfw.wmnet with reboot policy FORCED
[00:47:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:47:28] <wikibugs>	 SRE-OnFire (FY2021/2022-Q3): incidents occurring in Q3 have been scored with the scorecard - https://phabricator.wikimedia.org/T299977 (lmata) Open→In progress p:Triage→Medium
[00:50:46] <wikibugs>	 SRE-OnFire (FY2021/2022-Q3): Incident: 2022-02-06_wdqs_updater - https://phabricator.wikimedia.org/T307156 (lmata) Open→Resolved a:lmata scorecard and document on wikitech, resolving
[00:51:22] <wikibugs>	 SRE-OnFire (FY2021/2022-Q3): Incident: 2022-02-10 Envoy overflow - https://phabricator.wikimedia.org/T307157 (lmata)
[00:52:50] <wikibugs>	 SRE-OnFire (FY2021/2022-Q3): Incident: 2022-02-10 Envoy overflow - https://phabricator.wikimedia.org/T307157 (lmata) Open→In progress missing metadata
[00:53:11] <logmsgbot>	 !log pt1979@cumin2002 START - Cookbook sre.hosts.provision for host aqs2006.mgmt.codfw.wmnet with reboot policy FORCED
[00:53:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:54:01] <icinga-wm>	 PROBLEM - Check systemd state on an-worker1080 is CRITICAL: CRITICAL - degraded: The following units failed: hadoop-yarn-nodemanager.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:54:02] <wikibugs>	 SRE-OnFire (FY2021/2022-Q3): Incident: 2022-02-22 eqiad-eqord saturation - https://phabricator.wikimedia.org/T307158 (lmata) scorecard done, missing metadata
[00:54:17] <icinga-wm>	 PROBLEM - Hadoop NodeManager on an-worker1080 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts%23Yarn_Nodemanager_process
[00:55:29] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1138', diff saved to https://phabricator.wikimedia.org/P26966 and previous config saved to /var/cache/conftool/dbconfig/20220429-005528-ladsgroup.json
[00:55:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:55:39] <wikibugs>	 SRE-OnFire (FY2021/2022-Q3): Incident: 2022-02-22_vrts - https://phabricator.wikimedia.org/T307159 (lmata) Open→In progress scorecard complete, missing metadata.
[00:56:19] <icinga-wm>	 RECOVERY - Check systemd state on an-worker1080 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:56:35] <icinga-wm>	 RECOVERY - Hadoop NodeManager on an-worker1080 is OK: PROCS OK: 1 process with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts%23Yarn_Nodemanager_process
[00:56:37] <logmsgbot>	 !log pt1979@cumin2002 END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host aqs2006.mgmt.codfw.wmnet with reboot policy FORCED
[00:56:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:56:55] <logmsgbot>	 !log pt1979@cumin2002 START - Cookbook sre.hosts.provision for host aqs2008.mgmt.codfw.wmnet with reboot policy FORCED
[00:56:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:57:03] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P26967 and previous config saved to /var/cache/conftool/dbconfig/20220429-005702-ladsgroup.json
[00:57:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:58:55] <wikibugs>	 SRE-OnFire (FY2021/2022-Q3): Incident: 2022-03-01_ulsfo_network - https://phabricator.wikimedia.org/T307161 (lmata) Open→In progress scorecard complete, missing metadata.
[01:03:21] <icinga-wm>	 RECOVERY - SSH on wtp1036.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[01:04:54] <logmsgbot>	 !log pt1979@cumin2002 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host aqs2007.mgmt.codfw.wmnet with reboot policy FORCED
[01:04:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:06:22] <wikibugs>	 SRE-OnFire (FY2021/2022-Q3): Incident: 2022-03-29 network - https://phabricator.wikimedia.org/T307169 (lmata) scorecard is incomplete
[01:10:34] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1138 (T298565)', diff saved to https://phabricator.wikimedia.org/P26968 and previous config saved to /var/cache/conftool/dbconfig/20220429-011033-ladsgroup.json
[01:10:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:10:40] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1147.eqiad.wmnet with reason: Maintenance
[01:10:41] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[01:10:42] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1147.eqiad.wmnet with reason: Maintenance
[01:10:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:10:47] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1147 (T298565)', diff saved to https://phabricator.wikimedia.org/P26969 and previous config saved to /var/cache/conftool/dbconfig/20220429-011046-ladsgroup.json
[01:10:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:10:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:12:08] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P26970 and previous config saved to /var/cache/conftool/dbconfig/20220429-011207-ladsgroup.json
[01:12:08] <wikibugs>	 SRE-OnFire (FY2021/2022-Q3): incidents occurring in Q3 have been scored with the scorecard - https://phabricator.wikimedia.org/T299977 (lmata)
[01:12:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:12:37] <icinga-wm>	 PROBLEM - Host logstash2028.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[01:21:37] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1147 (T298565)', diff saved to https://phabricator.wikimedia.org/P26971 and previous config saved to /var/cache/conftool/dbconfig/20220429-012137-ladsgroup.json
[01:21:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:21:44] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[01:27:13] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T306560)', diff saved to https://phabricator.wikimedia.org/P26972 and previous config saved to /var/cache/conftool/dbconfig/20220429-012713-ladsgroup.json
[01:27:14] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
[01:27:16] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
[01:27:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:27:20] <stashbot>	 T306560: Fix nullability of img_major_mime and oi_major_mime - https://phabricator.wikimedia.org/T306560
[01:27:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:27:26] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
[01:27:27] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
[01:27:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:27:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:27:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:27:38] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
[01:27:40] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
[01:27:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:27:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:27:52] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
[01:27:54] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
[01:27:55] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 12:00:00 on 8 hosts with reason: Maintenance
[01:27:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:28:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:28:01] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 8 hosts with reason: Maintenance
[01:28:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:28:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:28:21] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
[01:28:23] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
[01:28:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:28:28] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1182 (T306560)', diff saved to https://phabricator.wikimedia.org/P26973 and previous config saved to /var/cache/conftool/dbconfig/20220429-012827-ladsgroup.json
[01:28:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:28:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:31:42] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1182 (T306560)', diff saved to https://phabricator.wikimedia.org/P26974 and previous config saved to /var/cache/conftool/dbconfig/20220429-013141-ladsgroup.json
[01:31:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:35:43] <logmsgbot>	 !log pt1979@cumin2002 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host aqs2008.mgmt.codfw.wmnet with reboot policy FORCED
[01:35:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:36:37] <logmsgbot>	 !log pt1979@cumin2002 START - Cookbook sre.hosts.provision for host aqs2009.mgmt.codfw.wmnet with reboot policy FORCED
[01:36:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:36:42] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P26975 and previous config saved to /var/cache/conftool/dbconfig/20220429-013642-ladsgroup.json
[01:36:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:37:01] <logmsgbot>	 !log pt1979@cumin2002 START - Cookbook sre.hosts.provision for host aqs2010.mgmt.codfw.wmnet with reboot policy FORCED
[01:37:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:39:45] <jinxer-wm>	 (JobUnavailable) firing: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[01:46:12] <wikibugs>	 (PS20) Raymond Ndibe: Create REST api service to manage toolforge replica.my.cnf [puppet] - https://gerrit.wikimedia.org/r/777037 (https://phabricator.wikimedia.org/T304040)
[01:46:47] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P26976 and previous config saved to /var/cache/conftool/dbconfig/20220429-014646-ladsgroup.json
[01:46:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:48:05] <wikibugs>	 (CR) jerkins-bot: [V: -1] Create REST api service to manage toolforge replica.my.cnf [puppet] - https://gerrit.wikimedia.org/r/777037 (https://phabricator.wikimedia.org/T304040) (owner: Raymond Ndibe)
[01:49:45] <jinxer-wm>	 (JobUnavailable) resolved: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[01:51:47] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P26977 and previous config saved to /var/cache/conftool/dbconfig/20220429-015147-ladsgroup.json
[01:51:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:51:58] <icinga-wm>	 RECOVERY - Host logstash2028.mgmt is UP: PING OK - Packet loss = 0%, RTA = 0.07 ms
[01:52:11] <wikibugs>	 (PS21) Raymond Ndibe: Create REST api service to manage toolforge replica.my.cnf [puppet] - https://gerrit.wikimedia.org/r/777037 (https://phabricator.wikimedia.org/T304040)
[01:54:02] <wikibugs>	 (CR) jerkins-bot: [V: -1] Create REST api service to manage toolforge replica.my.cnf [puppet] - https://gerrit.wikimedia.org/r/777037 (https://phabricator.wikimedia.org/T304040) (owner: Raymond Ndibe)
[01:56:18] <icinga-wm>	 RECOVERY - SSH on labweb1002.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[02:01:52] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P26978 and previous config saved to /var/cache/conftool/dbconfig/20220429-020151-ladsgroup.json
[02:01:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:02:20] <wikibugs>	 SRE, ops-eqiad, DC-Ops, Infrastructure-Foundations, and 2 others: Q3:(Need By: TBD) rack/setup/install 2 new labstore hosts - https://phabricator.wikimedia.org/T302981 (Papaul) @Cmjohnson what I am seeing in the partman recipe that the server is using, line 10, is removing any existing LVM ` 10...
[02:06:53] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1147 (T298565)', diff saved to https://phabricator.wikimedia.org/P26979 and previous config saved to /var/cache/conftool/dbconfig/20220429-020652-ladsgroup.json
[02:06:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:06:59] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1147.eqiad.wmnet with reason: Maintenance
[02:06:59] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[02:07:01] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1147.eqiad.wmnet with reason: Maintenance
[02:07:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:07:06] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1147 (T298565)', diff saved to https://phabricator.wikimedia.org/P26980 and previous config saved to /var/cache/conftool/dbconfig/20220429-020705-ladsgroup.json
[02:07:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:07:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:07:48] <logmsgbot>	 !log pt1979@cumin2002 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host aqs2009.mgmt.codfw.wmnet with reboot policy FORCED
[02:07:50] <logmsgbot>	 !log pt1979@cumin2002 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host aqs2010.mgmt.codfw.wmnet with reboot policy FORCED
[02:07:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:07:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:08:44] <logmsgbot>	 !log pt1979@cumin2002 START - Cookbook sre.hosts.provision for host aqs2011.mgmt.codfw.wmnet with reboot policy FORCED
[02:08:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:09:20] <logmsgbot>	 !log pt1979@cumin2002 START - Cookbook sre.hosts.provision for host aqs2012.mgmt.codfw.wmnet with reboot policy FORCED
[02:09:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:16:57] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1182 (T306560)', diff saved to https://phabricator.wikimedia.org/P26981 and previous config saved to /var/cache/conftool/dbconfig/20220429-021657-ladsgroup.json
[02:16:59] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
[02:17:00] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
[02:17:02] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
[02:17:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:17:04] <stashbot>	 T306560: Fix nullability of img_major_mime and oi_major_mime - https://phabricator.wikimedia.org/T306560
[02:17:05] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
[02:17:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:17:10] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1156 (T306560)', diff saved to https://phabricator.wikimedia.org/P26982 and previous config saved to /var/cache/conftool/dbconfig/20220429-021710-ladsgroup.json
[02:17:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:17:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:17:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:17:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:17:35] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1147 (T298565)', diff saved to https://phabricator.wikimedia.org/P26983 and previous config saved to /var/cache/conftool/dbconfig/20220429-021735-ladsgroup.json
[02:17:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:17:42] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[02:19:24] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1156 (T306560)', diff saved to https://phabricator.wikimedia.org/P26984 and previous config saved to /var/cache/conftool/dbconfig/20220429-021924-ladsgroup.json
[02:19:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:32:37] <logmsgbot>	 !log pt1979@cumin2002 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host aqs2011.mgmt.codfw.wmnet with reboot policy FORCED
[02:32:40] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P26985 and previous config saved to /var/cache/conftool/dbconfig/20220429-023240-ladsgroup.json
[02:32:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:32:44] <logmsgbot>	 !log pt1979@cumin2002 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host aqs2012.mgmt.codfw.wmnet with reboot policy FORCED
[02:32:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:32:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:34:10] <wikibugs>	 (PS22) Raymond Ndibe: Create REST api service to manage toolforge replica.my.cnf [puppet] - https://gerrit.wikimedia.org/r/777037 (https://phabricator.wikimedia.org/T304040)
[02:34:29] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P26986 and previous config saved to /var/cache/conftool/dbconfig/20220429-023429-ladsgroup.json
[02:34:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:35:19] <wikibugs>	 SRE, ops-codfw, Cassandra, DC-Ops: Q4:(Need By: TBD) rack/setup/install aqs2001-aqs2012 - https://phabricator.wikimedia.org/T305568 (Papaul)
[02:36:01] <wikibugs>	 (CR) jerkins-bot: [V: -1] Create REST api service to manage toolforge replica.my.cnf [puppet] - https://gerrit.wikimedia.org/r/777037 (https://phabricator.wikimedia.org/T304040) (owner: Raymond Ndibe)
[02:37:55] <jinxer-wm>	 (NodeTextfileStale) firing: (3) Stale textfile for elastic1075:9100 - https://wikitech.wikimedia.org/wiki/Prometheus#Stale_file_for_node-exporter_textfile - https://grafana.wikimedia.org/d/knkl4dCWz/node-exporter-textfile - https://alerts.wikimedia.org/?q=alertname%3DNodeTextfileStale
[02:44:01] <wikibugs>	 (PS23) Raymond Ndibe: Create REST api service to manage toolforge replica.my.cnf [puppet] - https://gerrit.wikimedia.org/r/777037 (https://phabricator.wikimedia.org/T304040)
[02:45:56] <wikibugs>	 (CR) jerkins-bot: [V: -1] Create REST api service to manage toolforge replica.my.cnf [puppet] - https://gerrit.wikimedia.org/r/777037 (https://phabricator.wikimedia.org/T304040) (owner: Raymond Ndibe)
[02:47:45] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P26987 and previous config saved to /var/cache/conftool/dbconfig/20220429-024745-ladsgroup.json
[02:47:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:49:34] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P26988 and previous config saved to /var/cache/conftool/dbconfig/20220429-024934-ladsgroup.json
[02:49:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:51:32] <wikibugs>	 (PS24) Raymond Ndibe: Create REST api service to manage toolforge replica.my.cnf [puppet] - https://gerrit.wikimedia.org/r/777037 (https://phabricator.wikimedia.org/T304040)
[02:52:33] <wikibugs>	 (PS25) Raymond Ndibe: Create REST api service to manage toolforge replica.my.cnf [puppet] - https://gerrit.wikimedia.org/r/777037 (https://phabricator.wikimedia.org/T304040)
[02:53:07] <wikibugs>	 (CR) jerkins-bot: [V: -1] Create REST api service to manage toolforge replica.my.cnf [puppet] - https://gerrit.wikimedia.org/r/777037 (https://phabricator.wikimedia.org/T304040) (owner: Raymond Ndibe)
[02:54:56] <wikibugs>	 SRE-OnFire (FY2021/2022-Q3): incidents occurring in Q3 have been scored with the scorecard - https://phabricator.wikimedia.org/T299977 (lmata)
[02:55:33] <wikibugs>	 (PS26) Raymond Ndibe: Create REST api service to manage toolforge replica.my.cnf [puppet] - https://gerrit.wikimedia.org/r/777037 (https://phabricator.wikimedia.org/T304040)
[02:57:24] <wikibugs>	 (CR) jerkins-bot: [V: -1] Create REST api service to manage toolforge replica.my.cnf [puppet] - https://gerrit.wikimedia.org/r/777037 (https://phabricator.wikimedia.org/T304040) (owner: Raymond Ndibe)
[03:01:55] <jinxer-wm>	 (NodeTextfileStale) firing: Stale textfile for ms-be2067:9100 - https://wikitech.wikimedia.org/wiki/Prometheus#Stale_file_for_node-exporter_textfile - https://grafana.wikimedia.org/d/knkl4dCWz/node-exporter-textfile - https://alerts.wikimedia.org/?q=alertname%3DNodeTextfileStale
[03:02:51] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1147 (T298565)', diff saved to https://phabricator.wikimedia.org/P26989 and previous config saved to /var/cache/conftool/dbconfig/20220429-030250-ladsgroup.json
[03:02:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:02:57] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1147.eqiad.wmnet with reason: Maintenance
[03:02:58] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[03:02:59] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1147.eqiad.wmnet with reason: Maintenance
[03:03:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:03:04] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1147 (T298565)', diff saved to https://phabricator.wikimedia.org/P26990 and previous config saved to /var/cache/conftool/dbconfig/20220429-030303-ladsgroup.json
[03:03:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:03:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:04:39] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1156 (T306560)', diff saved to https://phabricator.wikimedia.org/P26991 and previous config saved to /var/cache/conftool/dbconfig/20220429-030439-ladsgroup.json
[03:04:41] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1129.eqiad.wmnet with reason: Maintenance
[03:04:42] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1129.eqiad.wmnet with reason: Maintenance
[03:04:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:04:46] <stashbot>	 T306560: Fix nullability of img_major_mime and oi_major_mime - https://phabricator.wikimedia.org/T306560
[03:04:47] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1129 (T306560)', diff saved to https://phabricator.wikimedia.org/P26992 and previous config saved to /var/cache/conftool/dbconfig/20220429-030447-ladsgroup.json
[03:04:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:04:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:05:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:09:00] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1129 (T306560)', diff saved to https://phabricator.wikimedia.org/P26993 and previous config saved to /var/cache/conftool/dbconfig/20220429-030900-ladsgroup.json
[03:09:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:13:28] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1147 (T298565)', diff saved to https://phabricator.wikimedia.org/P26994 and previous config saved to /var/cache/conftool/dbconfig/20220429-031328-ladsgroup.json
[03:13:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:13:35] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[03:14:55] <wikibugs>	 (PS27) Raymond Ndibe: Create REST api service to manage toolforge replica.my.cnf [puppet] - https://gerrit.wikimedia.org/r/777037 (https://phabricator.wikimedia.org/T304040)
[03:17:27] <wikibugs>	 (CR) jerkins-bot: [V: -1] Create REST api service to manage toolforge replica.my.cnf [puppet] - https://gerrit.wikimedia.org/r/777037 (https://phabricator.wikimedia.org/T304040) (owner: Raymond Ndibe)
[03:22:45] <jinxer-wm>	 (KubernetesRsyslogDown) firing: (4) rsyslog on ml-staging-ctrl2001:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[03:24:05] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P26995 and previous config saved to /var/cache/conftool/dbconfig/20220429-032405-ladsgroup.json
[03:24:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:28:33] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P26996 and previous config saved to /var/cache/conftool/dbconfig/20220429-032833-ladsgroup.json
[03:28:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:39:10] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P26997 and previous config saved to /var/cache/conftool/dbconfig/20220429-033910-ladsgroup.json
[03:39:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:43:38] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P26998 and previous config saved to /var/cache/conftool/dbconfig/20220429-034338-ladsgroup.json
[03:43:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:50:55] <jinxer-wm>	 (NodeTextfileStale) firing: Stale textfile for cloudcontrol2001-dev:9100 - https://wikitech.wikimedia.org/wiki/Prometheus#Stale_file_for_node-exporter_textfile - https://grafana.wikimedia.org/d/knkl4dCWz/node-exporter-textfile - https://alerts.wikimedia.org/?q=alertname%3DNodeTextfileStale
[03:54:15] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1129 (T306560)', diff saved to https://phabricator.wikimedia.org/P26999 and previous config saved to /var/cache/conftool/dbconfig/20220429-035415-ladsgroup.json
[03:54:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:54:22] <stashbot>	 T306560: Fix nullability of img_major_mime and oi_major_mime - https://phabricator.wikimedia.org/T306560
[03:58:43] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1147 (T298565)', diff saved to https://phabricator.wikimedia.org/P27000 and previous config saved to /var/cache/conftool/dbconfig/20220429-035843-ladsgroup.json
[03:58:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:58:49] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[04:10:03] <icinga-wm>	 PROBLEM - Query Service HTTP Port on wdqs1006 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 298 bytes in 0.002 second response time https://wikitech.wikimedia.org/wiki/Wikidata_query_service
[05:06:49] <icinga-wm>	 PROBLEM - BGP status on cr2-eqsin is CRITICAL: BGP CRITICAL - AS64605/IPv4: Active - Anycast, AS64605/IPv6: Active - Anycast https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[05:08:47] <icinga-wm>	 PROBLEM - BGP status on cr3-eqsin is CRITICAL: BGP CRITICAL - AS64605/IPv6: Active - Anycast https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[05:17:27] <icinga-wm>	 PROBLEM - SSH on wtp1040.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[05:25:01] <wikibugs>	 SRE, DC-Ops: Confirm support of PERC 750 raid controller - https://phabricator.wikimedia.org/T297913 (Marostegui) I am seeing the procurement tasks being processed already, does that mean we have established that this controller will work 100% for us then?
[05:27:48] <wikibugs>	 (PS1) Marostegui: pc2012: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/787580
[05:28:36] <wikibugs>	 (CR) Marostegui: [C: +2] pc2012: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/787580 (owner: Marostegui)
[06:30:19] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1132 T301879', diff saved to https://phabricator.wikimedia.org/P27001 and previous config saved to /var/cache/conftool/dbconfig/20220429-063019-marostegui.json
[06:30:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:30:28] <stashbot>	 T301879: Test MariaDB 10.6 on Bullseye - https://phabricator.wikimedia.org/T301879
[06:31:19] <wikibugs>	 (PS1) Marostegui: db1132: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/787583
[06:31:57] <wikibugs>	 (CR) Marostegui: [C: +2] db1132: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/787583 (owner: Marostegui)
[06:37:55] <jinxer-wm>	 (NodeTextfileStale) firing: (3) Stale textfile for elastic1075:9100 - https://wikitech.wikimedia.org/wiki/Prometheus#Stale_file_for_node-exporter_textfile - https://grafana.wikimedia.org/d/knkl4dCWz/node-exporter-textfile - https://alerts.wikimedia.org/?q=alertname%3DNodeTextfileStale
[06:58:36] <icinga-wm>	 PROBLEM - Varnish traffic drop between 30min ago and now at eqiad on alert1001 is CRITICAL: 42.5 le 60 https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/d/000000180/varnish-http-requests?orgId=1&viewPanel=6
[07:00:04] <jouncebot>	 Deploy window No deploys all day! See Deployments/Emergencies if things are broken. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220429T0700)
[07:00:50] <icinga-wm>	 RECOVERY - Varnish traffic drop between 30min ago and now at eqiad on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/d/000000180/varnish-http-requests?orgId=1&viewPanel=6
[07:01:55] <jinxer-wm>	 (NodeTextfileStale) firing: Stale textfile for ms-be2067:9100 - https://wikitech.wikimedia.org/wiki/Prometheus#Stale_file_for_node-exporter_textfile - https://grafana.wikimedia.org/d/knkl4dCWz/node-exporter-textfile - https://alerts.wikimedia.org/?q=alertname%3DNodeTextfileStale
[07:07:56] <logmsgbot>	 !log kormat@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on 14 hosts with reason: Reimaging db2103 T303171
[07:08:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:08:03] <stashbot>	 T303171: Upgrade s1 to Bullseye - https://phabricator.wikimedia.org/T303171
[07:08:05] <logmsgbot>	 !log kormat@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 14 hosts with reason: Reimaging db2103 T303171
[07:08:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:09:01] <logmsgbot>	 !log kormat@cumin1001 START - Cookbook sre.hosts.downtime for 1:30:00 on db2103.codfw.wmnet with reason: Rebooting for T303171
[07:09:03] <logmsgbot>	 !log kormat@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db2103.codfw.wmnet with reason: Rebooting for T303171
[07:09:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:09:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:17:41] <wikibugs>	 (CR) Giuseppe Lavagetto: [C: +1] "LGTM, see a minor (optional) improvement to the template." [puppet] - https://gerrit.wikimedia.org/r/784798 (https://phabricator.wikimedia.org/T305581) (owner: RLazarus)
[07:18:04] <wikibugs>	 SRE, DC-Ops: Confirm support of PERC 750 raid controller - https://phabricator.wikimedia.org/T297913 (MoritzMuehlenhoff) >>! In T297913#7890055, @Marostegui wrote: > I am seeing the procurement tasks being processed already, does that mean we have established that this controller will work 100% for us th...
[07:22:21] <wikibugs>	 (PS1) Muehlenhoff: Remove expiry date for kcv-wikimf [puppet] - https://gerrit.wikimedia.org/r/787686
[07:22:45] <jinxer-wm>	 (KubernetesRsyslogDown) firing: (4) rsyslog on ml-staging-ctrl2001:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[07:26:07] <logmsgbot>	 !log mvernon@cumin1001 START - Cookbook sre.hosts.reimage for host ms-be2040.codfw.wmnet with OS bullseye
[07:26:11] <wikibugs>	 SRE-swift-storage: Upgrade Swift ms cluster to Bullseye and revisit mkfs.xfs options - https://phabricator.wikimedia.org/T279637 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by mvernon@cumin1001 for host ms-be2040.codfw.wmnet with OS bullseye
[07:26:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:27:08] <wikibugs>	 (CR) Giuseppe Lavagetto: [C: -1] "Add the needed relationship chain." [puppet] - https://gerrit.wikimedia.org/r/784761 (https://phabricator.wikimedia.org/T305581) (owner: RLazarus)
[07:27:17] <logmsgbot>	 !log kormat@cumin1001 START - Cookbook sre.hosts.reimage for host db2103.codfw.wmnet with OS bullseye
[07:27:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:31:11] <wikibugs>	 (CR) Muehlenhoff: [C: +2] Remove expiry date for kcv-wikimf [puppet] - https://gerrit.wikimedia.org/r/787686 (owner: Muehlenhoff)
[07:34:50] <wikibugs>	 (CR) Giuseppe Lavagetto: "The code LGTM; however I'd prefer a different solution, where we use a single header for this." [puppet] - https://gerrit.wikimedia.org/r/784774 (https://phabricator.wikimedia.org/T305581) (owner: RLazarus)
[07:37:29] <wikibugs>	 (CR) Awight: "This change is ready for review." [mediawiki-config] - https://gerrit.wikimedia.org/r/787689 (https://phabricator.wikimedia.org/T307110) (owner: Awight)
[07:40:55] <logmsgbot>	 !log kormat@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on db2103.codfw.wmnet with reason: host reimage
[07:41:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:43:47] <logmsgbot>	 !log kormat@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2103.codfw.wmnet with reason: host reimage
[07:43:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:45:42] <wikibugs>	 SRE, DC-Ops: Confirm support of PERC 750 raid controller - https://phabricator.wikimedia.org/T297913 (Marostegui) Thanks @MoritzMuehlenhoff!
[07:46:30] <wikibugs>	 (CR) Awight: "This change is ready for review." [mediawiki-config] - https://gerrit.wikimedia.org/r/787690 (https://phabricator.wikimedia.org/T306867) (owner: Awight)
[07:49:06] <wikibugs>	 (CR) Thiemo Kreuz (WMDE): [C: +1] Enable the versioned mapdata API [mediawiki-config] - https://gerrit.wikimedia.org/r/787689 (https://phabricator.wikimedia.org/T307110) (owner: Awight)
[07:49:28] <wikibugs>	 (CR) Filippo Giunchedi: "The basic idea (if I understood correctly) looks good to me, thank you! I propose a quick chat next week to hash out ideas/thoughts in a h" [puppet] - https://gerrit.wikimedia.org/r/787067 (owner: Jbond)
[07:49:38] <wikibugs>	 (CR) Thiemo Kreuz (WMDE): [C: +1] Enable CodeMirror colorblind-friendly palette [mediawiki-config] - https://gerrit.wikimedia.org/r/787690 (https://phabricator.wikimedia.org/T306867) (owner: Awight)
[07:50:01] <logmsgbot>	 !log mvernon@cumin1001 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2040.codfw.wmnet with OS bullseye
[07:50:06] <wikibugs>	 SRE-swift-storage: Upgrade Swift ms cluster to Bullseye and revisit mkfs.xfs options - https://phabricator.wikimedia.org/T279637 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by mvernon@cumin1001 for host ms-be2040.codfw.wmnet with OS bullseye executed with errors: - ms-be2040 (**FAIL**)...
[07:50:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:50:55] <jinxer-wm>	 (NodeTextfileStale) firing: Stale textfile for cloudcontrol2001-dev:9100 - https://wikitech.wikimedia.org/wiki/Prometheus#Stale_file_for_node-exporter_textfile - https://grafana.wikimedia.org/d/knkl4dCWz/node-exporter-textfile - https://alerts.wikimedia.org/?q=alertname%3DNodeTextfileStale
[07:51:01] <logmsgbot>	 !log mvernon@cumin1001 START - Cookbook sre.hosts.reimage for host ms-be2040.codfw.wmnet with OS bullseye
[07:51:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:51:06] <wikibugs>	 SRE-swift-storage: Upgrade Swift ms cluster to Bullseye and revisit mkfs.xfs options - https://phabricator.wikimedia.org/T279637 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by mvernon@cumin1001 for host ms-be2040.codfw.wmnet with OS bullseye
[07:53:44] <wikibugs>	 (CR) Filippo Giunchedi: "LGTM, modulo possibly unrelated change" [puppet] - https://gerrit.wikimedia.org/r/787109 (https://phabricator.wikimedia.org/T301017) (owner: Cwhite)
[07:55:08] <logmsgbot>	 !log kormat@cumin1001 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2103.codfw.wmnet with OS bullseye
[07:55:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:00:31] <logmsgbot>	 !log kormat@cumin1001 START - Cookbook sre.hosts.downtime for 1:30:00 on db1163.eqiad.wmnet with reason: Rebooting for T303171
[08:00:33] <logmsgbot>	 !log kormat@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db1163.eqiad.wmnet with reason: Rebooting for T303171
[08:00:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:00:38] <stashbot>	 T303171: Upgrade s1 to Bullseye - https://phabricator.wikimedia.org/T303171
[08:00:38] <logmsgbot>	 !log kormat@cumin1001 dbctl commit (dc=all): 'db1163 depooling: Rebooting for T303171', diff saved to https://phabricator.wikimedia.org/P27003 and previous config saved to /var/cache/conftool/dbconfig/20220429-080038-kormat.json
[08:00:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:00:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:01:23] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.reboot-single for host ganeti-test2002.codfw.wmnet
[08:01:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:01:58] <wikibugs>	 (CR) Volans: "replies inline" [software/spicerack] - https://gerrit.wikimedia.org/r/775904 (owner: Volans)
[08:02:12] <logmsgbot>	 !log kormat@cumin1001 START - Cookbook sre.hosts.reimage for host db1163.eqiad.wmnet with OS bullseye
[08:02:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:03:14] <logmsgbot>	 !log jelto@cumin1001 conftool action : set/pooled=no; selector: name=mw1323.eqiad.wmnet
[08:03:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:10:00] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2002.codfw.wmnet
[08:10:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:13:28] <logmsgbot>	 !log kormat@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on db1163.eqiad.wmnet with reason: host reimage
[08:13:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:14:55] <logmsgbot>	 !log mvernon@cumin1001 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2040.codfw.wmnet with OS bullseye
[08:14:59] <wikibugs>	 SRE-swift-storage: Upgrade Swift ms cluster to Bullseye and revisit mkfs.xfs options - https://phabricator.wikimedia.org/T279637 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by mvernon@cumin1001 for host ms-be2040.codfw.wmnet with OS bullseye executed with errors: - ms-be2040 (**FAIL**)...
[08:15:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:16:25] <logmsgbot>	 !log kormat@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1163.eqiad.wmnet with reason: host reimage
[08:16:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:18:41] <wikibugs>	 (CR) WMDE-Fisch: [C: +1] Enable the versioned mapdata API [mediawiki-config] - https://gerrit.wikimedia.org/r/787689 (https://phabricator.wikimedia.org/T307110) (owner: Awight)
[08:21:43] <jelto>	 !log scap pull on mw1323
[08:21:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:27:05] <logmsgbot>	 !log mvernon@cumin1001 START - Cookbook sre.hosts.reimage for host ms-be2040.codfw.wmnet with OS bullseye
[08:27:10] <wikibugs>	 SRE-swift-storage: Upgrade Swift ms cluster to Bullseye and revisit mkfs.xfs options - https://phabricator.wikimedia.org/T279637 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by mvernon@cumin1001 for host ms-be2040.codfw.wmnet with OS bullseye
[08:27:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:27:53] <logmsgbot>	 !log kormat@cumin1001 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1163.eqiad.wmnet with OS bullseye
[08:27:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:28:50] <logmsgbot>	 !log kormat@cumin1001 dbctl commit (dc=all): 'db1163 (re)pooling @ 25%: Reboot T303171', diff saved to https://phabricator.wikimedia.org/P27004 and previous config saved to /var/cache/conftool/dbconfig/20220429-082850-kormat.json
[08:28:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:28:57] <stashbot>	 T303171: Upgrade s1 to Bullseye - https://phabricator.wikimedia.org/T303171
[08:30:41] <wikibugs>	 (CR) WMDE-Fisch: [C: +1] Enable CodeMirror colorblind-friendly palette [mediawiki-config] - https://gerrit.wikimedia.org/r/787690 (https://phabricator.wikimedia.org/T306867) (owner: Awight)
[08:33:33] <logmsgbot>	 !log jelto@cumin1001 conftool action : set/pooled=yes; selector: name=mw1323.eqiad.wmnet
[08:33:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:33:58] <wikibugs>	 SRE-OnFire (FY2021/2022-Q3): Incident: 2022-02-01_ulsfo_network - https://phabricator.wikimedia.org/T307154 (jcrespo) Done, the incident actually happened on the 2022-01-31 UTC, which confused me.
[08:34:06] <wikibugs>	 SRE-OnFire (FY2021/2022-Q3): Incident: 2022-02-01_ulsfo_network - https://phabricator.wikimedia.org/T307154 (jcrespo) a:jcrespo
[08:34:47] <wikibugs>	 SRE-OnFire (FY2021/2022-Q3): incidents occurring in Q3 have been scored with the scorecard - https://phabricator.wikimedia.org/T299977 (jcrespo)
[08:35:11] <wikibugs>	 SRE-OnFire (FY2021/2022-Q3): Incident: 2022-02-01_ulsfo_network - https://phabricator.wikimedia.org/T307154 (jcrespo) Open→Resolved
[08:36:00] <wikibugs>	 SRE-OnFire (FY2021/2022-Q3): incidents occurring in Q3 have been scored with the scorecard - https://phabricator.wikimedia.org/T299977 (jcrespo)
[08:37:32] <wikibugs>	 SRE-OnFire (FY2021/2022-Q3): incidents occurring in Q3 have been scored with the scorecard - https://phabricator.wikimedia.org/T299977 (jcrespo)
[08:43:54] <logmsgbot>	 !log kormat@cumin1001 dbctl commit (dc=all): 'db1163 (re)pooling @ 50%: Reboot T303171', diff saved to https://phabricator.wikimedia.org/P27005 and previous config saved to /var/cache/conftool/dbconfig/20220429-084354-kormat.json
[08:44:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:44:03] <stashbot>	 T303171: Upgrade s1 to Bullseye - https://phabricator.wikimedia.org/T303171
[08:46:39] <dcausse>	 !log restarting blazegraph on wdqs1006 (deadlocked for 18 hours, T242453)
[08:46:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:46:45] <stashbot>	 T242453: Detect and alert and/or remediate Blazegraph deadlocks - https://phabricator.wikimedia.org/T242453
[08:47:04] <icinga-wm>	 RECOVERY - WDQS SPARQL on wdqs1006 is OK: HTTP OK: HTTP/1.1 200 OK - 690 bytes in 1.117 second response time https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Runbook
[08:47:16] <icinga-wm>	 RECOVERY - Query Service HTTP Port on wdqs1006 is OK: HTTP OK: HTTP/1.1 200 OK - 448 bytes in 0.036 second response time https://wikitech.wikimedia.org/wiki/Wikidata_query_service
[08:58:46] <logmsgbot>	 !log mvernon@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2040.codfw.wmnet with reason: host reimage
[08:58:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:58:58] <logmsgbot>	 !log kormat@cumin1001 dbctl commit (dc=all): 'db1163 (re)pooling @ 75%: Reboot T303171', diff saved to https://phabricator.wikimedia.org/P27006 and previous config saved to /var/cache/conftool/dbconfig/20220429-085858-kormat.json
[08:59:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:59:05] <stashbot>	 T303171: Upgrade s1 to Bullseye - https://phabricator.wikimedia.org/T303171
[09:02:07] <logmsgbot>	 !log mvernon@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2040.codfw.wmnet with reason: host reimage
[09:02:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:03:03] <wikibugs>	 SRE-OnFire (FY2021/2022-Q3): incidents occurring in Q3 have been scored with the scorecard - https://phabricator.wikimedia.org/T299977 (jcrespo) FYI: this template needs update for the new scoring: https://docs.google.com/document/d/1uZgzqURvPGdTw-GaG7gcIcStSR1bXBfsDszQ2M3tuWo/edit
[09:04:56] <jinxer-wm>	 (Storage /var over 50%) firing: Alert for device cloudsw1-e4-eqiad.mgmt.eqiad.wmnet - Storage /var over 50% - https://alerts.wikimedia.org/?q=alertname%3DStorage+%2Fvar+over+50%25
[09:09:34] <wikibugs>	 SRE-swift-storage, Maps, Product-Infrastructure-Team-Backlog: Followups for Tegola and Swift interactions - https://phabricator.wikimedia.org/T307184 (fgiunchedi)
[09:13:16] <wikibugs>	 (CR) Jbond: "thanks both ill mark this as WIP until we have discussed in the meeting" [puppet] - https://gerrit.wikimedia.org/r/787067 (owner: Jbond)
[09:14:02] <logmsgbot>	 !log kormat@cumin1001 dbctl commit (dc=all): 'db1163 (re)pooling @ 100%: Reboot T303171', diff saved to https://phabricator.wikimedia.org/P27007 and previous config saved to /var/cache/conftool/dbconfig/20220429-091401-kormat.json
[09:14:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:14:09] <stashbot>	 T303171: Upgrade s1 to Bullseye - https://phabricator.wikimedia.org/T303171
[09:14:56] <jinxer-wm>	 (Storage /var over 50%) firing: (2) Alert for device cloudsw1-e4-eqiad.mgmt.eqiad.wmnet - Storage /var over 50% - https://alerts.wikimedia.org/?q=alertname%3DStorage+%2Fvar+over+50%25
[09:20:12] <logmsgbot>	 !log dcaro@cumin1001 START - Cookbook sre.hosts.reimage for host cloudnet2006-dev.codfw.wmnet with OS bullseye
[09:20:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:20:20] <wikibugs>	 SRE, ops-codfw, DC-Ops, Patch-For-Review, cloud-services-team (Hardware): Q3:(Need By: TBD) rack/setup/install 7 wmcs hosts - https://phabricator.wikimedia.org/T304881 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by dcaro@cumin1001 for host cloudnet2006-dev.codfw.wmne...
[09:20:28] <icinga-wm>	 RECOVERY - SSH on wtp1040.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[09:24:48] <wikibugs>	 (CR) Jbond: service: add new module to expose service::catalog (2 comments) [software/spicerack] - https://gerrit.wikimedia.org/r/775904 (owner: Volans)
[09:25:35] <logmsgbot>	 !log jelto@cumin1001 START - Cookbook sre.hosts.reboot-cluster
[09:25:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:27:53] <logmsgbot>	 !log mvernon@cumin1001 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2040.codfw.wmnet with OS bullseye
[09:27:56] <wikibugs>	 SRE-swift-storage: Upgrade Swift ms cluster to Bullseye and revisit mkfs.xfs options - https://phabricator.wikimedia.org/T279637 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by mvernon@cumin1001 for host ms-be2040.codfw.wmnet with OS bullseye completed: - ms-be2040 (**PASS**)   - Removed...
[09:27:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:33:03] <icinga-wm>	 PROBLEM - very high load average likely xfs on ms-be2040 is CRITICAL: CRITICAL - load average: 200.20, 104.28, 43.74 https://wikitech.wikimedia.org/wiki/Swift
[09:33:07] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.ganeti.addnode for new host ganeti-test2002.codfw.wmnet to ganeti-test01.svc.codfw.wmnet
[09:33:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:33:56] <wikibugs>	 SRE-swift-storage, Maps, Product-Infrastructure-Team-Backlog, User-fgiunchedi: Followups for Tegola and Swift interactions - https://phabricator.wikimedia.org/T307184 (fgiunchedi)
[09:35:11] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti-test2002.codfw.wmnet to ganeti-test01.svc.codfw.wmnet
[09:35:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:36:06] <logmsgbot>	 !log kormat@cumin1001 START - Cookbook sre.hosts.downtime for 1:30:00 on db1164.eqiad.wmnet with reason: Rebooting for T303171
[09:36:08] <logmsgbot>	 !log kormat@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db1164.eqiad.wmnet with reason: Rebooting for T303171
[09:36:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:36:13] <stashbot>	 T303171: Upgrade s1 to Bullseye - https://phabricator.wikimedia.org/T303171
[09:36:14] <logmsgbot>	 !log kormat@cumin1001 dbctl commit (dc=all): 'db1164 depooling: Rebooting for T303171', diff saved to https://phabricator.wikimedia.org/P27008 and previous config saved to /var/cache/conftool/dbconfig/20220429-093613-kormat.json
[09:36:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:36:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:39:10] <logmsgbot>	 !log dcaro@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on cloudnet2006-dev.codfw.wmnet with reason: host reimage
[09:39:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:39:16] <wikibugs>	 SRE, Beta-Cluster-Infrastructure: deployment-cache-upload05: Several millions of logstash error entries - https://phabricator.wikimedia.org/T243129 (fgiunchedi) Open→Declined Can't find any occurrence now, declining
[09:39:56] <jinxer-wm>	 (Storage /var over 50%) resolved: (2) Device cloudsw1-e4-eqiad.mgmt.eqiad.wmnet recovered from Storage /var over 50%   - https://alerts.wikimedia.org/?q=alertname%3DStorage+%2Fvar+over+50%25
[09:41:36] <wikibugs>	 (CR) Kormat: [C: +1] auto_schema: Wrap starting replication with finally [software] - https://gerrit.wikimedia.org/r/775895 (owner: Ladsgroup)
[09:42:12] <wikibugs>	 (PS4) Kormat: mariadb: Use ROW binlog_format for db_inventory. [puppet] - https://gerrit.wikimedia.org/r/775330 (https://phabricator.wikimedia.org/T301315)
[09:42:12] <logmsgbot>	 !log dcaro@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudnet2006-dev.codfw.wmnet with reason: host reimage
[09:42:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:42:44] <logmsgbot>	 !log kormat@cumin1001 START - Cookbook sre.hosts.reimage for host db1164.eqiad.wmnet with OS bullseye
[09:42:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:43:06] <wikibugs>	 SRE, audits-data-retention: Implement Data Retention Guidelines - https://phabricator.wikimedia.org/T83531 (fgiunchedi)
[09:43:37] <wikibugs>	 (CR) Marostegui: [C: +1] mariadb: Use ROW binlog_format for db_inventory. [puppet] - https://gerrit.wikimedia.org/r/775330 (https://phabricator.wikimedia.org/T301315) (owner: Kormat)
[09:44:57] <wikibugs>	 (CR) Kormat: [C: +2] mariadb: Use ROW binlog_format for db_inventory. [puppet] - https://gerrit.wikimedia.org/r/775330 (https://phabricator.wikimedia.org/T301315) (owner: Kormat)
[09:45:38] <wikibugs>	 (CR) Jbond: [C: -1] "see comment for -1, otherwise lgtm" [puppet] - https://gerrit.wikimedia.org/r/787527 (https://phabricator.wikimedia.org/T307117) (owner: David Caro)
[09:47:44] <wikibugs>	 (PS6) Kormat: mariadb: Reference the actual VRTS passwords in the m2 grants file. [puppet] - https://gerrit.wikimedia.org/r/764744 (https://phabricator.wikimedia.org/T303272)
[09:47:48] <wikibugs>	 (CR) Awight: "This change is ready for review." [mediawiki-config] - https://gerrit.wikimedia.org/r/787699 (https://phabricator.wikimedia.org/T296759) (owner: Awight)
[09:47:59] <wikibugs>	 (CR) Awight: "This change is ready for review." [mediawiki-config] - https://gerrit.wikimedia.org/r/787698 (owner: Awight)
[09:48:52] <wikibugs>	 SRE: mwdeploy does not have the same user ID on all Apaches - https://phabricator.wikimedia.org/T79786 (fgiunchedi) Open→Invalid I'm boldly resolving this old task since AFAICS the infra and deploy methods have changed enough that this hasn't surfaced as a problem anymore.
[09:51:28] <wikibugs>	 (CR) Kormat: [V: +1] "PCC SUCCESS (NOOP 1 DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/35004/console" [puppet] - https://gerrit.wikimedia.org/r/764744 (https://phabricator.wikimedia.org/T303272) (owner: Kormat)
[09:52:41] <wikibugs>	 (CR) Kormat: [V: +1 C: +2] mariadb: Reference the actual VRTS passwords in the m2 grants file. [puppet] - https://gerrit.wikimedia.org/r/764744 (https://phabricator.wikimedia.org/T303272) (owner: Kormat)
[09:54:11] <wikibugs>	 SRE: Make inventory of (private) data backups on all systems - https://phabricator.wikimedia.org/T83522 (fgiunchedi) Open→Invalid Resolving this old task since likely obsolete at this point
[09:54:13] <wikibugs>	 SRE, audits-data-retention: Implement Data Retention Guidelines - https://phabricator.wikimedia.org/T83531 (fgiunchedi)
[09:56:50] <logmsgbot>	 !log dcaro@cumin1001 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudnet2006-dev.codfw.wmnet with OS bullseye
[09:56:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:57:00] <wikibugs>	 SRE, ops-codfw, DC-Ops, Patch-For-Review, cloud-services-team (Hardware): Q3:(Need By: TBD) rack/setup/install 7 wmcs hosts - https://phabricator.wikimedia.org/T304881 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by dcaro@cumin1001 for host cloudnet2006-dev.codfw.wmnet wi...
[09:57:20] <moritzm>	 !log drain ganeti-test2003 T306499
[09:57:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:57:26] <stashbot>	 T306499: Upgrade ganeti-test to Bullseye - https://phabricator.wikimedia.org/T306499
[09:57:32] <logmsgbot>	 !log mvernon@cumin1001 START - Cookbook sre.hosts.reimage for host ms-be1040.eqiad.wmnet with OS bullseye
[09:57:36] <wikibugs>	 SRE-swift-storage: Upgrade Swift ms cluster to Bullseye and revisit mkfs.xfs options - https://phabricator.wikimedia.org/T279637 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by mvernon@cumin1001 for host ms-be1040.eqiad.wmnet with OS bullseye
[09:57:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:57:50] <wikibugs>	 SRE, Patch-For-Review: Provide cross-dc redundancy (active-active or active-passive) to all important misc services - https://phabricator.wikimedia.org/T156937 (fgiunchedi)
[09:57:52] <wikibugs>	 SRE: Setup basic infrastructure services in codfw - https://phabricator.wikimedia.org/T84350 (fgiunchedi) Open→Resolved a:fgiunchedi Resolving as completed since codfw has been up and running for years now
[09:58:43] <Lucas_WMDE>	 ^ \o/
[09:59:09] <godog>	 hehehe, the joys of SRE clinic duty
[10:03:37] <wikibugs>	 SRE: Make ops-l a list for humans again (no cheating) - https://phabricator.wikimedia.org/T117508 (fgiunchedi) Open→Resolved a:fgiunchedi I think nowadays `ops@` is in a pretty good place WRT automated emails, boldly resolving!
[10:05:08] <icinga-wm>	 PROBLEM - Check systemd state on ms-be1064 is CRITICAL: CRITICAL - degraded: The following units failed: swift-drive-audit.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[10:06:13] <wikibugs>	 SRE: Weak digest algorithm (SHA1) used to sign InRelease on apt.wikimedia.org - https://phabricator.wikimedia.org/T132325 (fgiunchedi) Open→Declined I don't think this is relevant anymore (and I personally haven't seen the warning either), therefore I'm declining the task
[10:06:57] <logmsgbot>	 !log kormat@cumin1001 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1164.eqiad.wmnet with OS bullseye
[10:07:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:07:45] <wikibugs>	 SRE, Technical-Debt: Reduce etcd technical debt - https://phabricator.wikimedia.org/T135122 (fgiunchedi)
[10:07:51] <wikibugs>	 SRE: Install a second etcd cluster in codfw - https://phabricator.wikimedia.org/T135125 (fgiunchedi) Open→Resolved a:fgiunchedi We do have conf2* up and running nowadays, resolving
[10:08:49] <logmsgbot>	 !log mvernon@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1040.eqiad.wmnet with reason: host reimage
[10:08:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:11:42] <wikibugs>	 (PS2) Jbond: puppetdb: add query_facts function [puppet] - https://gerrit.wikimedia.org/r/787547
[10:11:48] <logmsgbot>	 !log mvernon@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1040.eqiad.wmnet with reason: host reimage
[10:11:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:11:56] <wikibugs>	 (PS3) Jbond: C:ssh::publish_fingerprints: update to use new_query facts function [puppet] - https://gerrit.wikimedia.org/r/787548
[10:13:07] <wikibugs>	 SRE, Phabricator: Phabricator leaving old files in /tmp - https://phabricator.wikimedia.org/T150396 (fgiunchedi) Open→Invalid Doesn't seem to be an issue anymore, must have got fixed at some point in Phab's release cycle
[10:14:16] <wikibugs>	 (CR) Thiemo Kreuz (WMDE): [C: +1] Clean up unnecessary two-level setting [mediawiki-config] - https://gerrit.wikimedia.org/r/787698 (owner: Awight)
[10:15:55] <wikibugs>	 SRE: Internal PKI for secure communication - Barcelona Ops offsite 2016 - https://phabricator.wikimedia.org/T150822 (fgiunchedi) Open→Resolved a:fgiunchedi We do have a PKI now (e.g. T194031) so I'm going to resolve this task. cc @jbond in case there's material (hah!) here that could be useful as...
[10:19:24] <wikibugs>	 (CR) Alexandros Kosiaris: [C: -2] "I 've been meaning to remove this module finally now that kubernetes hosts (the only users I know of) are no longer using the docker devic" [puppet] - https://gerrit.wikimedia.org/r/787527 (https://phabricator.wikimedia.org/T307117) (owner: David Caro)
[10:19:37] <wikibugs>	 (PS1) Muehlenhoff: Update d-i setting to not prompt for firmware [puppet] - https://gerrit.wikimedia.org/r/787704
[10:20:09] <wikibugs>	 (PS3) Jbond: puppetdb: add query_facts function [puppet] - https://gerrit.wikimedia.org/r/787547
[10:20:42] <wikibugs>	 (PS4) Jbond: C:ssh::publish_fingerprints: update to use new_query facts function [puppet] - https://gerrit.wikimedia.org/r/787548
[10:27:19] <wikibugs>	 ops-eqiad, DBA: db1164 fails to POST/boot/etc - https://phabricator.wikimedia.org/T307198 (Kormat)
[10:30:50] <logmsgbot>	 !log mvernon@cumin1001 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be1040.eqiad.wmnet with OS bullseye
[10:30:54] <wikibugs>	 SRE-swift-storage: Upgrade Swift ms cluster to Bullseye and revisit mkfs.xfs options - https://phabricator.wikimedia.org/T279637 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by mvernon@cumin1001 for host ms-be1040.eqiad.wmnet with OS bullseye completed: - ms-be1040 (**PASS**)   - Downtim...
[10:30:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:32:17] <wikibugs>	 SRE, ops-eqiad, DBA: db1164 fails to POST/boot/etc - https://phabricator.wikimedia.org/T307198 (Kormat) On the web console, i can see it get as far as this before resetting: {F35073130}
[10:37:55] <jinxer-wm>	 (NodeTextfileStale) firing: (3) Stale textfile for elastic1075:9100 - https://wikitech.wikimedia.org/wiki/Prometheus#Stale_file_for_node-exporter_textfile - https://grafana.wikimedia.org/d/knkl4dCWz/node-exporter-textfile - https://alerts.wikimedia.org/?q=alertname%3DNodeTextfileStale
[10:38:43] <wikibugs>	 (PS1) Alexandros Kosiaris: Remove the puppet lvm module [puppet] - https://gerrit.wikimedia.org/r/787708 (https://phabricator.wikimedia.org/T67270)
[10:40:08] <wikibugs>	 (CR) Alexandros Kosiaris: "sudo cumin 'R:class=lvm'" [puppet] - https://gerrit.wikimedia.org/r/787708 (https://phabricator.wikimedia.org/T67270) (owner: Alexandros Kosiaris)
[10:40:45] <wikibugs>	 (CR) Alexandros Kosiaris: [C: -2] "Started the removal in https://gerrit.wikimedia.org/r/c/operations/puppet/+/787708" [puppet] - https://gerrit.wikimedia.org/r/787527 (https://phabricator.wikimedia.org/T307117) (owner: David Caro)
[10:42:23] <wikibugs>	 (CR) Alexandros Kosiaris: "Fleet wide PCC running as id: 35008" [puppet] - https://gerrit.wikimedia.org/r/787708 (https://phabricator.wikimedia.org/T67270) (owner: Alexandros Kosiaris)
[10:43:30] <icinga-wm>	 PROBLEM - Check systemd state on ms-be2040 is CRITICAL: CRITICAL - degraded: The following units failed: swift-container.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[10:43:59] <wikibugs>	 (Abandoned) Jgiannelos: Reduce production image size [software/tegola] (wmf/v0.14.x) - https://gerrit.wikimedia.org/r/702661 (owner: Jgiannelos)
[10:44:17] <wikibugs>	 (PS4) Jgiannelos: Deprecate unused maps event stream [mediawiki-config] - https://gerrit.wikimedia.org/r/747196 (https://phabricator.wikimedia.org/T293366)
[10:44:30] <icinga-wm>	 PROBLEM - very high load average likely xfs on ms-be1040 is CRITICAL: CRITICAL - load average: 285.39, 175.65, 84.65 https://wikitech.wikimedia.org/wiki/Swift
[10:45:15] <wikibugs>	 (CR) Alexandros Kosiaris: [C: -1] "I see profile::swift::storage::labs uses this too. I 'll reach out to WMCS to make sure no VM uses this (nothing in production seems to us" [puppet] - https://gerrit.wikimedia.org/r/787708 (https://phabricator.wikimedia.org/T67270) (owner: Alexandros Kosiaris)
[10:45:22] <wikibugs>	 (CR) Jgiannelos: [C: +1] Deprecate unused maps event stream [mediawiki-config] - https://gerrit.wikimedia.org/r/747196 (https://phabricator.wikimedia.org/T293366) (owner: Jgiannelos)
[10:47:27] <wikibugs>	 (Abandoned) Jgiannelos: Add changeprop rules for DelayeEchoNotificationJob [deployment-charts] - https://gerrit.wikimedia.org/r/636416 (owner: Jgiannelos)
[10:49:40] <wikibugs>	 SRE, WMF-General-or-Unknown, WMF-Legal, Documentation, and 2 others: Default license for operations/puppet - https://phabricator.wikimedia.org/T67270 (akosiaris) https://gerrit.wikimedia.org/r/787708 removes the puppet lvm module which is GPL-2 and incompatible with apache 2.0. So that removes an...
[10:50:50] <icinga-wm>	 PROBLEM - Host ms-be2040 is DOWN: PING CRITICAL - Packet loss = 100%
[10:52:31] <logmsgbot>	 !log jelto@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)
[10:52:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:53:17] <icinga-wm>	 RECOVERY - very high load average likely xfs on ms-be2040 is OK: OK - load average: 13.89, 3.55, 1.20 https://wikitech.wikimedia.org/wiki/Swift
[10:53:19] <icinga-wm>	 RECOVERY - Host ms-be2040 is UP: PING OK - Packet loss = 0%, RTA = 31.55 ms
[10:55:48] <wikibugs>	 (PS1) Ayounsi: Remove QFX5120 hack [software/netbox-extras] - https://gerrit.wikimedia.org/r/787709
[10:56:45] <wikibugs>	 (PS4) Jbond: puppetdb: add query_facts function [puppet] - https://gerrit.wikimedia.org/r/787547
[10:58:04] <wikibugs>	 (PS5) Jbond: C:ssh::publish_fingerprints: update to use new_query facts function [puppet] - https://gerrit.wikimedia.org/r/787548
[11:00:02] <wikibugs>	 (CR) Jbond: [V: +1] "PCC SUCCESS (NOOP 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/35009/console" [puppet] - https://gerrit.wikimedia.org/r/787548 (owner: Jbond)
[11:01:25] <icinga-wm>	 RECOVERY - very high load average likely xfs on ms-be1040 is OK: OK - load average: 13.18, 4.15, 1.47 https://wikitech.wikimedia.org/wiki/Swift
[11:01:27] <icinga-wm>	 PROBLEM - Check systemd state on ms-be1040 is CRITICAL: CRITICAL - degraded: The following units failed: swift-container-reconciler.service,swift-container-sync.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[11:01:55] <jinxer-wm>	 (NodeTextfileStale) firing: Stale textfile for ms-be2067:9100 - https://wikitech.wikimedia.org/wiki/Prometheus#Stale_file_for_node-exporter_textfile - https://grafana.wikimedia.org/d/knkl4dCWz/node-exporter-textfile - https://alerts.wikimedia.org/?q=alertname%3DNodeTextfileStale
[11:02:17] <icinga-wm>	 RECOVERY - Check systemd state on ms-be1064 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[11:04:24] <wikibugs>	 (CR) Cathal Mooney: [C: +1] "LGTM!" [software/netbox-extras] - https://gerrit.wikimedia.org/r/787709 (owner: Ayounsi)
[11:05:21] <wikibugs>	 (CR) Thiemo Kreuz (WMDE): [C: +1] Enable new template dialog sidebar everywhere except enwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/787699 (https://phabricator.wikimedia.org/T296759) (owner: Awight)
[11:08:43] <icinga-wm>	 PROBLEM - Check systemd state on ms-be2040 is CRITICAL: CRITICAL - degraded: The following units failed: swift-container-reconciler.service,swift-container-sync.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[11:13:53] <wikibugs>	 SRE, SRE-swift-storage, Patch-For-Review: 'swift' user/group IDs should be consistent across the fleet - https://phabricator.wikimedia.org/T123918 (MatthewVernon)
[11:14:22] <wikibugs>	 SRE, SRE-swift-storage: reimaging swift backends should set swift UID/GID to match filesystems - https://phabricator.wikimedia.org/T300057 (MatthewVernon) Resolved→Open This doesn't work, because while busybox seems to have `stat` the installer can't find it?!? ` Apr 29 10:05:33 log-output: + mkt...
[11:15:26] <wikibugs>	 (CR) Ayounsi: [C: +2] Remove QFX5120 hack [software/netbox-extras] - https://gerrit.wikimedia.org/r/787709 (owner: Ayounsi)
[11:16:04] <wikibugs>	 (Merged) jenkins-bot: Remove QFX5120 hack [software/netbox-extras] - https://gerrit.wikimedia.org/r/787709 (owner: Ayounsi)
[11:19:14] <wikibugs>	 (CR) Jbond: [V: +1 C: +2] C:ssh::publish_fingerprints: update to use new_query facts function [puppet] - https://gerrit.wikimedia.org/r/787548 (owner: Jbond)
[11:19:19] <wikibugs>	 (CR) Jbond: [C: +2] puppetdb: add query_facts function [puppet] - https://gerrit.wikimedia.org/r/787547 (owner: Jbond)
[11:22:45] <jinxer-wm>	 (KubernetesRsyslogDown) firing: (4) rsyslog on ml-staging-ctrl2001:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[11:24:19] <wikibugs>	 SRE-OnFire (FY2021/2022-Q2): ONFIRE Q2 Undisclosed incidents scoring (NDA) - https://phabricator.wikimedia.org/T307202 (jcrespo)
[11:24:46] <wikibugs>	 SRE-OnFire (FY2021/2022-Q2): ONFIRE Q2 Undisclosed incidents scoring (NDA) - https://phabricator.wikimedia.org/T307202 (jcrespo)
[11:28:51] <wikibugs>	 (PS1) Jbond: puppetdb::query_facts: add return type hint [puppet] - https://gerrit.wikimedia.org/r/787713
[11:29:18] <wikibugs>	 SRE-OnFire (FY2021/2022-Q2): ONFIRE Q2 Undisclosed incidents scoring (NDA) - https://phabricator.wikimedia.org/T307202 (jcrespo)
[11:30:17] <icinga-wm>	 PROBLEM - Check systemd state on ms-be1063 is CRITICAL: CRITICAL - degraded: The following units failed: session-325816.scope https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[11:30:42] <wikibugs>	 (CR) Jbond: [C: +2] puppetdb::query_facts: add return type hint [puppet] - https://gerrit.wikimedia.org/r/787713 (owner: Jbond)
[11:33:11] <icinga-wm>	 PROBLEM - Disk space on ms-be1040 is CRITICAL: DISK CRITICAL - /srv/swift-storage/sdc1 is not accessible: Input/output error https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=ms-be1040&var-datasource=eqiad+prometheus/ops
[11:34:18] <wikibugs>	 SRE-OnFire (FY2021/2022-Q2): Incident: 2021-10-14_Heavy_outbound_traffic - https://phabricator.wikimedia.org/T307149 (jcrespo)
[11:36:25] <wikibugs>	 (PS1) Jbond: Revert "C:ssh::publish_fingerprints: update to use new_query fac..." [puppet] - https://gerrit.wikimedia.org/r/787740
[11:36:28] <wikibugs>	 (PS1) Jbond: Revert "puppetdb: add query_facts function" [puppet] - https://gerrit.wikimedia.org/r/787741
[11:36:43] <wikibugs>	 (CR) Muehlenhoff: [C: +2] Update d-i setting to not prompt for firmware [puppet] - https://gerrit.wikimedia.org/r/787704 (owner: Muehlenhoff)
[11:36:57] <wikibugs>	 (PS2) Jbond: Revert "C:ssh::publish_fingerprints: update to use new_query fac..." [puppet] - https://gerrit.wikimedia.org/r/787740
[11:37:04] <wikibugs>	 (CR) jerkins-bot: [V: -1] Revert "puppetdb: add query_facts function" [puppet] - https://gerrit.wikimedia.org/r/787741 (owner: Jbond)
[11:37:11] <wikibugs>	 (Abandoned) Jbond: Revert "puppetdb: add query_facts function" [puppet] - https://gerrit.wikimedia.org/r/787741 (owner: Jbond)
[11:37:40] <wikibugs>	 (CR) David Caro: Remove the puppet lvm module (1 comment) [puppet] - https://gerrit.wikimedia.org/r/787708 (https://phabricator.wikimedia.org/T67270) (owner: Alexandros Kosiaris)
[11:37:45] <jinxer-wm>	 (Storage /var over 50%) firing: Alert for device asw1-b12-drmrs.mgmt.drmrs.wmnet - Storage /var over 50%   - https://alerts.wikimedia.org/?q=alertname%3DStorage+%2Fvar+over+50%25
[11:38:07] <wikibugs>	 (CR) David Caro: lvm::volume: add createonly flag and use in cinder backups (1 comment) [puppet] - https://gerrit.wikimedia.org/r/787527 (https://phabricator.wikimedia.org/T307117) (owner: David Caro)
[11:39:53] <wikibugs>	 (CR) Jbond: [C: +2] Revert "C:ssh::publish_fingerprints: update to use new_query fac..." [puppet] - https://gerrit.wikimedia.org/r/787740 (owner: Jbond)
[11:40:54] <wikibugs>	 (CR) Ladsgroup: [C: +2] auto_schema: Wrap starting replication with finally [software] - https://gerrit.wikimedia.org/r/775895 (owner: Ladsgroup)
[11:41:26] <wikibugs>	 (Merged) jenkins-bot: auto_schema: Wrap starting replication with finally [software] - https://gerrit.wikimedia.org/r/775895 (owner: Ladsgroup)
[11:42:45] <jinxer-wm>	 (Storage /var over 50%) firing: (2) Device asw1-b12-drmrs.mgmt.drmrs.wmnet recovered from Storage /var over 50%   - https://alerts.wikimedia.org/?q=alertname%3DStorage+%2Fvar+over+50%25
[11:43:50] <wikibugs>	 (CR) David Caro: lvm::volume: add createonly flag and use in cinder backups (1 comment) [puppet] - https://gerrit.wikimedia.org/r/787527 (https://phabricator.wikimedia.org/T307117) (owner: David Caro)
[11:47:45] <jinxer-wm>	 (Storage /var over 50%) resolved: Device asw1-b13-drmrs.mgmt.drmrs.wmnet recovered from Storage /var over 50%   - https://alerts.wikimedia.org/?q=alertname%3DStorage+%2Fvar+over+50%25
[11:49:06] <logmsgbot>	 !log cmjohnson@cumin1001 START - Cookbook sre.hosts.provision for host gitlab1003.mgmt.eqiad.wmnet with reboot policy FORCED
[11:49:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:49:53] <logmsgbot>	 !log cmjohnson@cumin1001 START - Cookbook sre.hosts.provision for host gitlab1004.mgmt.eqiad.wmnet with reboot policy FORCED
[11:49:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:50:44] <logmsgbot>	 !log cmjohnson@cumin1001 START - Cookbook sre.hosts.provision for host gitlab-runner1002.mgmt.eqiad.wmnet with reboot policy FORCED
[11:50:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:50:55] <jinxer-wm>	 (NodeTextfileStale) firing: Stale textfile for cloudcontrol2001-dev:9100 - https://wikitech.wikimedia.org/wiki/Prometheus#Stale_file_for_node-exporter_textfile - https://grafana.wikimedia.org/d/knkl4dCWz/node-exporter-textfile - https://alerts.wikimedia.org/?q=alertname%3DNodeTextfileStale
[11:51:19] <logmsgbot>	 !log dcaro@cumin1001 START - Cookbook sre.hosts.reimage for host cloudnet2005-dev.codfw.wmnet with OS bullseye
[11:51:24] <logmsgbot>	 !log cmjohnson@cumin1001 START - Cookbook sre.hosts.provision for host gitlab-runner1003.mgmt.eqiad.wmnet with reboot policy FORCED
[11:51:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:51:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:51:29] <wikibugs>	 SRE, ops-codfw, DC-Ops, Patch-For-Review, cloud-services-team (Hardware): Q3:(Need By: TBD) rack/setup/install 7 wmcs hosts - https://phabricator.wikimedia.org/T304881 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by dcaro@cumin1001 for host cloudnet2005-dev.codfw.wmne...
[11:56:49] <wikibugs>	 (PS1) Jbond: puppetdb::query_facts: try to optimize [puppet] - https://gerrit.wikimedia.org/r/787716
[11:57:17] <logmsgbot>	 !log cmjohnson@cumin1001 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host gitlab1003.mgmt.eqiad.wmnet with reboot policy FORCED
[11:57:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:57:31] <wikibugs>	 (CR) jerkins-bot: [V: -1] puppetdb::query_facts: try to optimize [puppet] - https://gerrit.wikimedia.org/r/787716 (owner: Jbond)
[11:58:13] <icinga-wm>	 RECOVERY - SSH on pki2001.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[11:59:15] <wikibugs>	 (PS2) Jbond: puppetdb::query_facts: try to optimize [puppet] - https://gerrit.wikimedia.org/r/787716
[12:00:00] <wikibugs>	 (CR) jerkins-bot: [V: -1] puppetdb::query_facts: try to optimize [puppet] - https://gerrit.wikimedia.org/r/787716 (owner: Jbond)
[12:01:43] <wikibugs>	 (PS2) David Caro: Move from deprecated icinga_hosts to alerting_hosts [cookbooks] (wmcs) - https://gerrit.wikimedia.org/r/786960 (https://phabricator.wikimedia.org/T304533)
[12:02:23] <wikibugs>	 (CR) David Caro: Move from deprecated icinga_hosts to alerting_hosts (1 comment) [cookbooks] (wmcs) - https://gerrit.wikimedia.org/r/786960 (https://phabricator.wikimedia.org/T304533) (owner: David Caro)
[12:02:37] <wikibugs>	 (PS3) David Caro: Move from deprecated icinga_hosts to alerting_hosts [cookbooks] (wmcs) - https://gerrit.wikimedia.org/r/786960 (https://phabricator.wikimedia.org/T304533)
[12:03:05] <wikibugs>	 (PS1) MVernon: install_server install coreutils so we have stat available [puppet] - https://gerrit.wikimedia.org/r/787717 (https://phabricator.wikimedia.org/T300057)
[12:03:59] <icinga-wm>	 PROBLEM - SSH on aqs1008.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[12:04:04] <logmsgbot>	 !log cmjohnson@cumin1001 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host gitlab1004.mgmt.eqiad.wmnet with reboot policy FORCED
[12:04:08] <logmsgbot>	 !log cmjohnson@cumin1001 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host gitlab-runner1002.mgmt.eqiad.wmnet with reboot policy FORCED
[12:04:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:04:12] <logmsgbot>	 !log cmjohnson@cumin1001 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host gitlab-runner1003.mgmt.eqiad.wmnet with reboot policy FORCED
[12:04:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:04:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:04:21] <icinga-wm>	 PROBLEM - SSH on wtp1045.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[12:05:31] <wikibugs>	 (PS1) Aqu: Update analytics refine job version in test cluster [puppet] - https://gerrit.wikimedia.org/r/787718
[12:09:08] <wikibugs>	 SRE, ops-eqiad, DBA: db1164 fails to POST/boot/etc - https://phabricator.wikimedia.org/T307198 (Cmjohnson) This is the sign of a failed DIMM, during post it's failing during the checking memory phase.   I attempted to reboot the system to "self-heal" but that failed, The SEL shows. I will request a D...
[12:09:51] <wikibugs>	 SRE, ops-eqiad, DC-Ops, serviceops, GitLab (Infrastructure): Q3:(Need By: TBD) rack/setup/install gitlab100[3|4] and gitlab-runner100[2|3|4] - https://phabricator.wikimedia.org/T301177 (Cmjohnson)
[12:09:59] <logmsgbot>	 !log dcaro@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on cloudnet2005-dev.codfw.wmnet with reason: host reimage
[12:10:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:12:37] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.reboot-single for host restbase-dev2001.codfw.wmnet
[12:12:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:12:53] <logmsgbot>	 !log dcaro@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudnet2005-dev.codfw.wmnet with reason: host reimage
[12:12:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:14:40] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.reimage for host ganeti-test2003.codfw.wmnet with OS bullseye
[12:14:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:14:47] <wikibugs>	 SRE: Upgrade ganeti-test to Bullseye - https://phabricator.wikimedia.org/T306499 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jmm@cumin2002 for host ganeti-test2003.codfw.wmnet with OS bullseye
[12:16:31] <wikibugs>	 SRE: phedenskog uses the same SSH key(s) in WMCS and production - https://phabricator.wikimedia.org/T307079 (Peter) Thanks @fgiunchedi and @Dzahn I'll do that first thing next week.
[12:16:52] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase-dev2001.codfw.wmnet
[12:16:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:19:21] <wikibugs>	 (PS3) Jbond: puppetdb::query_facts: try to optimize [puppet] - https://gerrit.wikimedia.org/r/787716
[12:21:28] <wikibugs>	 (CR) Jbond: [C: +2] puppetdb::query_facts: try to optimize [puppet] - https://gerrit.wikimedia.org/r/787716 (owner: Jbond)
[12:22:49] <wikibugs>	 (CR) Muehlenhoff: [C: +1] "Looks good to me" [puppet] - https://gerrit.wikimedia.org/r/787717 (https://phabricator.wikimedia.org/T300057) (owner: MVernon)
[12:23:53] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.reboot-single for host restbase-dev2002.codfw.wmnet
[12:23:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:25:16] <wikibugs>	 (PS1) Jbond: puppetdb::query_facts: update return type [puppet] - https://gerrit.wikimedia.org/r/787719
[12:25:38] <wikibugs>	 (CR) Jbond: [V: +2 C: +2] puppetdb::query_facts: update return type [puppet] - https://gerrit.wikimedia.org/r/787719 (owner: Jbond)
[12:27:43] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase-dev2002.codfw.wmnet
[12:27:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:27:59] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1147.eqiad.wmnet with reason: Maintenance
[12:28:01] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1147.eqiad.wmnet with reason: Maintenance
[12:28:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:28:06] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1147 (T306560)', diff saved to https://phabricator.wikimedia.org/P27010 and previous config saved to /var/cache/conftool/dbconfig/20220429-122805-ladsgroup.json
[12:28:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:28:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:28:15] <stashbot>	 T306560: Fix nullability of img_major_mime and oi_major_mime - https://phabricator.wikimedia.org/T306560
[12:28:27] <wikibugs>	 (CR) Alexandros Kosiaris: [C: -1] Remove the puppet lvm module (1 comment) [puppet] - https://gerrit.wikimedia.org/r/787708 (https://phabricator.wikimedia.org/T67270) (owner: Alexandros Kosiaris)
[12:29:01] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti-test2003.codfw.wmnet with reason: host reimage
[12:29:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:29:55] <wikibugs>	 (CR) Filippo Giunchedi: [C: +1] "Two non blocking nits, LGTM" [puppet] - https://gerrit.wikimedia.org/r/787717 (https://phabricator.wikimedia.org/T300057) (owner: MVernon)
[12:30:59] <logmsgbot>	 !log dcaro@cumin1001 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudnet2005-dev.codfw.wmnet with OS bullseye
[12:31:02] <wikibugs>	 (CR) Alexandros Kosiaris: [C: -1] "Heh, I was just reminded of this as well. https://tickets.puppetlabs.com/browse/MODULES-5307" [puppet] - https://gerrit.wikimedia.org/r/787708 (https://phabricator.wikimedia.org/T67270) (owner: Alexandros Kosiaris)
[12:31:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:31:09] <wikibugs>	 SRE, ops-codfw, DC-Ops, Patch-For-Review, cloud-services-team (Hardware): Q3:(Need By: TBD) rack/setup/install 7 wmcs hosts - https://phabricator.wikimedia.org/T304881 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by dcaro@cumin1001 for host cloudnet2005-dev.codfw.wmnet wi...
[12:32:20] <wikibugs>	 (CR) Jbond: [C: -1] lvm::volume: add createonly flag and use in cinder backups (1 comment) [puppet] - https://gerrit.wikimedia.org/r/787527 (https://phabricator.wikimedia.org/T307117) (owner: David Caro)
[12:32:30] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti-test2003.codfw.wmnet with reason: host reimage
[12:32:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:33:29] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.reboot-single for host restbase-dev2003.codfw.wmnet
[12:33:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:37:28] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase-dev2003.codfw.wmnet
[12:37:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:37:41] <wikibugs>	 (CR) MVernon: install_server install coreutils so we have stat available (1 comment) [puppet] - https://gerrit.wikimedia.org/r/787717 (https://phabricator.wikimedia.org/T300057) (owner: MVernon)
[12:40:44] <wikibugs>	 SRE: Internal PKI for secure communication - Barcelona Ops offsite 2016 - https://phabricator.wikimedia.org/T150822 (fgiunchedi)
[12:40:56] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1147 (T306560)', diff saved to https://phabricator.wikimedia.org/P27011 and previous config saved to /var/cache/conftool/dbconfig/20220429-124056-ladsgroup.json
[12:40:58] <wikibugs>	 SRE: Puppet CA rollover - https://phabricator.wikimedia.org/T150823 (fgiunchedi) Open→Invalid IIRC we did a "rollover" (extend certificate expiration, keeping the private key) for puppet CA, resolving this old task
[12:41:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:41:03] <stashbot>	 T306560: Fix nullability of img_major_mime and oi_major_mime - https://phabricator.wikimedia.org/T306560
[12:41:06] <wikibugs>	 (PS2) MVernon: install_server install coreutils so we have stat available [puppet] - https://gerrit.wikimedia.org/r/787717 (https://phabricator.wikimedia.org/T300057)
[12:43:33] <wikibugs>	 SRE: Run systematic 2FA availability tests - https://phabricator.wikimedia.org/T151049 (fgiunchedi)
[12:44:27] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti-test2003.codfw.wmnet with OS bullseye
[12:44:31] <wikibugs>	 SRE: Upgrade ganeti-test to Bullseye - https://phabricator.wikimedia.org/T306499 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jmm@cumin2002 for host ganeti-test2003.codfw.wmnet with OS bullseye completed: - ganeti-test2003 (**PASS**)   - Downtimed on Icinga/Alertmanager   - Disabled P...
[12:44:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:46:56] <wikibugs>	 SRE, ops-codfw, DC-Ops, Patch-For-Review, cloud-services-team (Hardware): Q3:(Need By: TBD) rack/setup/install 7 wmcs hosts - https://phabricator.wikimedia.org/T304881 (dcaro) Adding a comment here for visibility and for the future :) the cloudnet hosts should have been prepared with 'insetup...
[12:47:00] <wikibugs>	 (CR) Filippo Giunchedi: [C: +1] install_server install coreutils so we have stat available (1 comment) [puppet] - https://gerrit.wikimedia.org/r/787717 (https://phabricator.wikimedia.org/T300057) (owner: MVernon)
[12:47:17] <wikibugs>	 SRE, ops-codfw, DC-Ops, Patch-For-Review, cloud-services-team (Hardware): Q3:(Need By: TBD) rack/setup/install 7 wmcs hosts - https://phabricator.wikimedia.org/T304881 (dcaro)
[12:47:51] <wikibugs>	 SRE, ops-codfw, DC-Ops, Patch-For-Review, cloud-services-team (Hardware): Q3:(Need By: TBD) rack/setup/install 7 wmcs hosts - https://phabricator.wikimedia.org/T304881 (dcaro)
[12:48:40] <wikibugs>	 (CR) Alexandros Kosiaris: [C: -2] lvm::volume: add createonly flag and use in cinder backups (1 comment) [puppet] - https://gerrit.wikimedia.org/r/787527 (https://phabricator.wikimedia.org/T307117) (owner: David Caro)
[12:49:16] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.reboot-single for host ganeti4004.ulsfo.wmnet
[12:49:17] <wikibugs>	 SRE: Extending Yubico 2FA for production use (meta bug) - https://phabricator.wikimedia.org/T151045 (MoritzMuehlenhoff) Open→Declined This can be closed, the old Yubikey-specific setup has been shutdown, partly replaced by the CAS setup.
[12:49:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:49:22] <wikibugs>	 SRE: logrotate failing with $FILE.1.gz: File exists - https://phabricator.wikimedia.org/T151314 (fgiunchedi) Open→Declined I couldn't find any recent instance of this problem, boldly resolving
[12:49:26] <wikibugs>	 SRE, Patch-For-Review, Tracking-Neverending: Tracking and Reducing cron-spam to root@ - https://phabricator.wikimedia.org/T132324 (fgiunchedi)
[12:49:52] <wikibugs>	 (CR) David Caro: Remove the puppet lvm module (1 comment) [puppet] - https://gerrit.wikimedia.org/r/787708 (https://phabricator.wikimedia.org/T67270) (owner: Alexandros Kosiaris)
[12:51:08] <wikibugs>	 SRE, ops-esams, DC-Ops, Infrastructure-Foundations, and 2 others: decommission atlas-esams - https://phabricator.wikimedia.org/T307026 (ayounsi) I believe the anchors are linked to Faidon's RIPE account. @faidon, could you take care of it?
[12:51:40] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1168.eqiad.wmnet with reason: Maintenance
[12:51:41] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1168.eqiad.wmnet with reason: Maintenance
[12:51:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:51:46] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1168 (T298295)', diff saved to https://phabricator.wikimedia.org/P27012 and previous config saved to /var/cache/conftool/dbconfig/20220429-125146-ladsgroup.json
[12:51:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:51:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:51:56] <stashbot>	 T298295: Fix length of columns page_restrictions.pr_level/pr_type on wmf wikis - https://phabricator.wikimedia.org/T298295
[12:51:56] <wikibugs>	 SRE: Run systematic 2FA availability tests - https://phabricator.wikimedia.org/T151049 (MoritzMuehlenhoff) Open→Declined This can be closed, the old Yubikey-specific setup has been shutdown
[12:51:58] <wikibugs>	 SRE: Extending Yubico 2FA for production use (meta bug) - https://phabricator.wikimedia.org/T151045 (MoritzMuehlenhoff)
[12:52:27] <wikibugs>	 SRE, Documentation: Proper documentation for Yubico 2FA for production use - https://phabricator.wikimedia.org/T151050 (MoritzMuehlenhoff) Open→Declined This can be closed, the old Yubikey-specific setup has been shutdown
[12:52:29] <wikibugs>	 SRE: Extending Yubico 2FA for production use (meta bug) - https://phabricator.wikimedia.org/T151045 (MoritzMuehlenhoff)
[12:52:31] <wikibugs>	 SRE: Fully puppetise yubikey-val - https://phabricator.wikimedia.org/T151046 (MoritzMuehlenhoff) Open→Declined This can be closed, the old Yubikey-specific setup has been shutdown
[12:52:33] <wikibugs>	 SRE: Extending Yubico 2FA for production use (meta bug) - https://phabricator.wikimedia.org/T151045 (MoritzMuehlenhoff)
[12:52:45] <wikibugs>	 SRE: Extending Yubico 2FA for production use (meta bug) - https://phabricator.wikimedia.org/T151045 (MoritzMuehlenhoff)
[12:53:07] <wikibugs>	 SRE: Integrate Yubikey into data.yaml - https://phabricator.wikimedia.org/T151047 (MoritzMuehlenhoff) Open→Declined This can be closed, the old Yubikey-specific setup has been shutdown
[12:53:21] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4004.ulsfo.wmnet
[12:53:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:56:01] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P27013 and previous config saved to /var/cache/conftool/dbconfig/20220429-125601-ladsgroup.json
[12:56:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:57:17] <wikibugs>	 (CR) David Caro: lvm::volume: add createonly flag and use in cinder backups (1 comment) [puppet] - https://gerrit.wikimedia.org/r/787527 (https://phabricator.wikimedia.org/T307117) (owner: David Caro)
[12:58:05] <wikibugs>	 (CR) Muehlenhoff: "On a related note: The current plan is to no longer apply a default license to puppet.git at large, but rather approach this on a per modu" [puppet] - https://gerrit.wikimedia.org/r/787527 (https://phabricator.wikimedia.org/T307117) (owner: David Caro)
[13:00:38] <wikibugs>	 (CR) Alexandros Kosiaris: lvm::volume: add createonly flag and use in cinder backups (1 comment) [puppet] - https://gerrit.wikimedia.org/r/787527 (https://phabricator.wikimedia.org/T307117) (owner: David Caro)
[13:01:35] <wikibugs>	 SRE: stat user crontab on stat hosts for old file removal - https://phabricator.wikimedia.org/T151317 (fgiunchedi) Open→Declined It looks like we got rid of these crons over time, resolving but let's reopen if this comes back
[13:01:39] <wikibugs>	 SRE, Patch-For-Review, Tracking-Neverending: Tracking and Reducing cron-spam to root@ - https://phabricator.wikimedia.org/T132324 (fgiunchedi)
[13:05:56] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1168 (T298295)', diff saved to https://phabricator.wikimedia.org/P27014 and previous config saved to /var/cache/conftool/dbconfig/20220429-130556-ladsgroup.json
[13:06:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:06:03] <stashbot>	 T298295: Fix length of columns page_restrictions.pr_level/pr_type on wmf wikis - https://phabricator.wikimedia.org/T298295
[13:06:50] <wikibugs>	 SRE, serviceops: Provide node14 images for running production node-based services - https://phabricator.wikimedia.org/T306996 (Esanders) > Within days of an LTS reaching EOL major nodejs libraries will be looking to remove support for it from their releases.  Indeed, and many don't even wait for the EOL...
[13:07:48] <wikibugs>	 SRE: Add Prometheus collector for Tor - https://phabricator.wikimedia.org/T188098 (fgiunchedi) Open→Declined We're not running a Tor relay at least since Ib59edbb8e, resolving
[13:09:22] <icinga-wm>	 PROBLEM - Check systemd state on ms-be1033 is CRITICAL: CRITICAL - degraded: The following units failed: session-326228.scope https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:11:07] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P27015 and previous config saved to /var/cache/conftool/dbconfig/20220429-131106-ladsgroup.json
[13:11:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:16:15] <wikibugs>	 SRE, serviceops: Provide node14 images for running production node-based services - https://phabricator.wikimedia.org/T306996 (MoritzMuehlenhoff) >> We can import the nodesource packages into separate repository components, e.g. thirdparty/node14 and thirdparty/node16.  This way applications have the fle...
[13:18:46] <wikibugs>	 (CR) Jbond: [C: +1] "+1 as we are all agreeing and the actual Cr looks fine" [puppet] - https://gerrit.wikimedia.org/r/787527 (https://phabricator.wikimedia.org/T307117) (owner: David Caro)
[13:21:01] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P27016 and previous config saved to /var/cache/conftool/dbconfig/20220429-132101-ladsgroup.json
[13:21:02] <logmsgbot>	 !log mvernon@cumin1001 START - Cookbook sre.hosts.reboot-single for host ms-be2040.codfw.wmnet
[13:21:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:21:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:25:16] <icinga-wm>	 RECOVERY - Check systemd state on ms-be2040 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:25:54] <logmsgbot>	 !log mvernon@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2040.codfw.wmnet
[13:25:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:26:12] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1147 (T306560)', diff saved to https://phabricator.wikimedia.org/P27017 and previous config saved to /var/cache/conftool/dbconfig/20220429-132611-ladsgroup.json
[13:26:13] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1138.eqiad.wmnet with reason: Maintenance
[13:26:15] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1138.eqiad.wmnet with reason: Maintenance
[13:26:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:26:18] <stashbot>	 T306560: Fix nullability of img_major_mime and oi_major_mime - https://phabricator.wikimedia.org/T306560
[13:26:20] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1138 (T306560)', diff saved to https://phabricator.wikimedia.org/P27018 and previous config saved to /var/cache/conftool/dbconfig/20220429-132619-ladsgroup.json
[13:26:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:26:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:26:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:26:55] <logmsgbot>	 !log mvernon@cumin1001 START - Cookbook sre.hosts.reboot-single for host ms-be2040.codfw.wmnet
[13:27:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:31:25] <wikibugs>	 (CR) David Caro: [C: +2] lvm::volume: add createonly flag and use in cinder backups [puppet] - https://gerrit.wikimedia.org/r/787527 (https://phabricator.wikimedia.org/T307117) (owner: David Caro)
[13:31:53] <wikibugs>	 (PS5) David Caro: lvm::volume: add createonly flag and use in cinder backups [puppet] - https://gerrit.wikimedia.org/r/787527 (https://phabricator.wikimedia.org/T307117)
[13:32:44] <logmsgbot>	 !log mvernon@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2040.codfw.wmnet
[13:32:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:33:13] <icinga-wm>	 ACKNOWLEDGEMENT - Router interfaces on cr3-knams is CRITICAL: CRITICAL: host 91.198.174.246, interfaces up: 68, down: 1, dormant: 0, excluded: 0, unused: 0: ayounsi T307121 - The acknowledgement expires at: 2022-05-04 13:32:47. https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[13:35:05] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.reboot-single for host ganeti-test2003.codfw.wmnet
[13:35:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:36:07] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P27019 and previous config saved to /var/cache/conftool/dbconfig/20220429-133606-ladsgroup.json
[13:36:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:36:56] <wikibugs>	 (PS18) Herron: prometheus: enable prometheus web access via proxy with IDP [puppet] - https://gerrit.wikimedia.org/r/764895 (https://phabricator.wikimedia.org/T301944)
[13:37:47] <wikibugs>	 (CR) Herron: "Thanks for the reviews! I think we're in a good place to try a deploy so I'll plan to do that early next week" [puppet] - https://gerrit.wikimedia.org/r/764895 (https://phabricator.wikimedia.org/T301944) (owner: Herron)
[13:41:44] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2003.codfw.wmnet
[13:41:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:42:11] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.ganeti.addnode for new host ganeti-test2003.codfw.wmnet to ganeti-test01.svc.codfw.wmnet
[13:42:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:43:06] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti-test2003.codfw.wmnet to ganeti-test01.svc.codfw.wmnet
[13:43:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:46:46] <wikibugs>	 SRE: Update prometheus-varnish-exporter on debian to 1.4 - https://phabricator.wikimedia.org/T195252 (fgiunchedi) Open→Invalid Debian ships with prometheus-varnish-exporter 1.6 nowadays
[13:46:49] <wikibugs>	 (CR) MVernon: install_server install coreutils so we have stat available (1 comment) [puppet] - https://gerrit.wikimedia.org/r/787717 (https://phabricator.wikimedia.org/T300057) (owner: MVernon)
[13:48:14] <wikibugs>	 SRE: Upgrade Ganeti clusters to 2.15.2-7+deb9u3 - https://phabricator.wikimedia.org/T210289 (fgiunchedi) Open→Invalid Minimum ganeti version on the fleet is `2.16.0-5` nowadays, resolving
[13:49:03] <wikibugs>	 (PS1) JMeybohm: Add -ro and -rw discovery records for k8s-ingress-wikikube [dns] - https://gerrit.wikimedia.org/r/787747 (https://phabricator.wikimedia.org/T305358)
[13:49:10] <wikibugs>	 (CR) Filippo Giunchedi: [C: +1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/787717 (https://phabricator.wikimedia.org/T300057) (owner: MVernon)
[13:49:45] <wikibugs>	 (PS1) JMeybohm: Add -ro and -rw discovery records for k8s-ingress-wikikube [puppet] - https://gerrit.wikimedia.org/r/787748 (https://phabricator.wikimedia.org/T305358)
[13:50:35] <wikibugs>	 (CR) MVernon: [C: +2] install_server install coreutils so we have stat available [puppet] - https://gerrit.wikimedia.org/r/787717 (https://phabricator.wikimedia.org/T300057) (owner: MVernon)
[13:51:13] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1168 (T298295)', diff saved to https://phabricator.wikimedia.org/P27020 and previous config saved to /var/cache/conftool/dbconfig/20220429-135111-ladsgroup.json
[13:51:15] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1180.eqiad.wmnet with reason: Maintenance
[13:51:16] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1180.eqiad.wmnet with reason: Maintenance
[13:51:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:51:21] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1180 (T298295)', diff saved to https://phabricator.wikimedia.org/P27021 and previous config saved to /var/cache/conftool/dbconfig/20220429-135121-ladsgroup.json
[13:51:22] <stashbot>	 T298295: Fix length of columns page_restrictions.pr_level/pr_type on wmf wikis - https://phabricator.wikimedia.org/T298295
[13:51:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:51:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:51:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:52:11] <wikibugs>	 (PS1) Luke Bowmaker: Image Suggestions Feedback [mediawiki-config] - https://gerrit.wikimedia.org/r/787749
[13:52:40] <wikibugs>	 (CR) jerkins-bot: [V: -1] Image Suggestions Feedback [mediawiki-config] - https://gerrit.wikimedia.org/r/787749 (owner: Luke Bowmaker)
[13:53:25] <wikibugs>	 SRE: /dev/log symlink to /run/systemd/journal/dev-log disappeared on kubernetes1001 - https://phabricator.wikimedia.org/T212681 (fgiunchedi) Open→Declined I couldn't find any more occurrences of this, tentatively resolving
[13:53:29] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1180 (T298295)', diff saved to https://phabricator.wikimedia.org/P27022 and previous config saved to /var/cache/conftool/dbconfig/20220429-135329-ladsgroup.json
[13:53:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:55:12] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1138 (T306560)', diff saved to https://phabricator.wikimedia.org/P27023 and previous config saved to /var/cache/conftool/dbconfig/20220429-135511-ladsgroup.json
[13:55:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:55:19] <stashbot>	 T306560: Fix nullability of img_major_mime and oi_major_mime - https://phabricator.wikimedia.org/T306560
[13:55:45] <wikibugs>	 SRE: Create cookbook to add a node to a Ganeti cluster - https://phabricator.wikimedia.org/T274527 (MoritzMuehlenhoff) Open→Resolved a:MoritzMuehlenhoff The cookbook has been written and is in active use for several months, various fine-tuning has been done to it (and will continue to get applied...
[13:55:47] <wikibugs>	 SRE: Cookbooks for Ganeti maintenance tasks - https://phabricator.wikimedia.org/T283319 (MoritzMuehlenhoff)
[13:56:25] <wikibugs>	 (PS2) Luke Bowmaker: Image Suggestions Feedback Stream [mediawiki-config] - https://gerrit.wikimedia.org/r/787749
[13:56:50] <wikibugs>	 SRE, WMF-JobQueue, serviceops, Sustainability (Incident Followup): Videoscalers fail health checks while CPU is maxed - https://phabricator.wikimedia.org/T306860 (akosiaris) > As a starting point: @jhathaway noted that we're running ffmpeg at niceness -19, which is quite assertive; raising that v...
[13:56:50] <logmsgbot>	 !log mvernon@cumin1001 START - Cookbook sre.hosts.reimage for host ms-be2041.codfw.wmnet with OS bullseye
[13:56:53] <wikibugs>	 (CR) jerkins-bot: [V: -1] Image Suggestions Feedback Stream [mediawiki-config] - https://gerrit.wikimedia.org/r/787749 (owner: Luke Bowmaker)
[13:56:54] <wikibugs>	 SRE-swift-storage: Upgrade Swift ms cluster to Bullseye and revisit mkfs.xfs options - https://phabricator.wikimedia.org/T279637 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by mvernon@cumin1001 for host ms-be2041.codfw.wmnet with OS bullseye
[13:56:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:57:19] <wikibugs>	 (CR) Alexandros Kosiaris: [C: +1] Add -ro and -rw discovery records for k8s-ingress-wikikube [dns] - https://gerrit.wikimedia.org/r/787747 (https://phabricator.wikimedia.org/T305358) (owner: JMeybohm)
[13:57:24] <wikibugs>	 SRE, DC-Ops: fix IPMI over LAN on certain HP hosts - https://phabricator.wikimedia.org/T235234 (fgiunchedi) Open→Resolved a:fgiunchedi @Dzahn @Papaul I'm tentatively resolving this old task, most/all hosts mentioned here have been decom'd
[13:57:26] <wikibugs>	 SRE: IPMI Audit 2018-04 - https://phabricator.wikimedia.org/T193155 (fgiunchedi)
[13:57:29] <wikibugs>	 SRE, Documentation: Document how to fix IPMI issues on Wikitech - https://phabricator.wikimedia.org/T191956 (fgiunchedi)
[13:57:33] <wikibugs>	 SRE, observability: Remote IPMI doesn't work for ~2% of the fleet - https://phabricator.wikimedia.org/T150160 (fgiunchedi)
[13:57:49] <wikibugs>	 (PS3) Luke Bowmaker: Image Suggestions Feedback Stream [mediawiki-config] - https://gerrit.wikimedia.org/r/787749
[13:57:56] <wikibugs>	 (CR) Alexandros Kosiaris: [C: +1] Add -ro and -rw discovery records for k8s-ingress-wikikube [puppet] - https://gerrit.wikimedia.org/r/787748 (https://phabricator.wikimedia.org/T305358) (owner: JMeybohm)
[13:58:38] <wikibugs>	 (CR) jerkins-bot: [V: -1] Image Suggestions Feedback Stream [mediawiki-config] - https://gerrit.wikimedia.org/r/787749 (owner: Luke Bowmaker)
[14:00:51] <wikibugs>	 (CR) Alexandros Kosiaris: [C: -1] "fleet wide PCC output indeed identified just cloudbackups breaking. I 'll try to fix the cloud backup stuff in a separate change." [puppet] - https://gerrit.wikimedia.org/r/787708 (https://phabricator.wikimedia.org/T67270) (owner: Alexandros Kosiaris)
[14:01:45] <wikibugs>	 (CR) JMeybohm: [C: +2] Add -ro and -rw discovery records for k8s-ingress-wikikube [puppet] - https://gerrit.wikimedia.org/r/787748 (https://phabricator.wikimedia.org/T305358) (owner: JMeybohm)
[14:02:10] <wikibugs>	 SRE, ops-eqiad, DBA: db1164 fails to POST/boot/etc - https://phabricator.wikimedia.org/T307198 (fgiunchedi) p:Triage→Medium
[14:02:19] <wikibugs>	 (PS4) Luke Bowmaker: Image Suggestions Feedback Stream [mediawiki-config] - https://gerrit.wikimedia.org/r/787749
[14:03:08] <wikibugs>	 (CR) jerkins-bot: [V: -1] Image Suggestions Feedback Stream [mediawiki-config] - https://gerrit.wikimedia.org/r/787749 (owner: Luke Bowmaker)
[14:04:06] <icinga-wm>	 PROBLEM - BGP status on cr2-eqiad is CRITICAL: BGP CRITICAL - AS64605/IPv4: Active - Anycast https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[14:05:44] <icinga-wm>	 RECOVERY - SSH on aqs1008.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[14:06:13] <wikibugs>	 SRE, Security: Disable agent forwarding to important hosts - https://phabricator.wikimedia.org/T198138 (fgiunchedi) Open→Resolved a:fgiunchedi AFAICS we're disabling agent forwarding across the board (production and wmcs), resolving
[14:08:26] <wikibugs>	 SRE, Security: Disable agent forwarding to important hosts - https://phabricator.wikimedia.org/T198138 (MoritzMuehlenhoff) Resolved→Open No, it's still enabled in Cloud VPS.
[14:08:34] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P27024 and previous config saved to /var/cache/conftool/dbconfig/20220429-140834-ladsgroup.json
[14:08:38] <wikibugs>	 (PS1) JMeybohm: Add desired state for k8s-ingress-wikikube -ro and -rw discovery records [puppet] - https://gerrit.wikimedia.org/r/787750 (https://phabricator.wikimedia.org/T305358)
[14:08:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:10:18] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1138', diff saved to https://phabricator.wikimedia.org/P27025 and previous config saved to /var/cache/conftool/dbconfig/20220429-141017-ladsgroup.json
[14:10:22] <wikibugs>	 (CR) JMeybohm: [C: +2] Add -ro and -rw discovery records for k8s-ingress-wikikube [dns] - https://gerrit.wikimedia.org/r/787747 (https://phabricator.wikimedia.org/T305358) (owner: JMeybohm)
[14:10:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:12:18] <wikibugs>	 SRE, Data-Engineering, Data-Engineering-Kanban, Traffic: intake-analytics is responsible for up to a 85% of varnish backend fetch errors - https://phabricator.wikimedia.org/T306181 (BTullis) I am now investigating by capturing network traffic from the eventgate-analytics-external pods and looking...
[14:16:20] <logmsgbot>	 !log mvernon@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2041.codfw.wmnet with reason: host reimage
[14:16:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:16:26] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
[14:16:28] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
[14:16:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:16:33] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1098:3317 (T306560)', diff saved to https://phabricator.wikimedia.org/P27026 and previous config saved to /var/cache/conftool/dbconfig/20220429-141633-ladsgroup.json
[14:16:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:16:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:16:43] <stashbot>	 T306560: Fix nullability of img_major_mime and oi_major_mime - https://phabricator.wikimedia.org/T306560
[14:17:36] <wikibugs>	 SRE, ops-esams, DC-Ops, Infrastructure-Foundations, and 2 others: decommission atlas-esams - https://phabricator.wikimedia.org/T307026 (cmooney) From the experience with the one in codfw I think the process is to delete and then re-add.  If @faidon can remove our existing one we can take care of...
[14:19:02] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 (T306560)', diff saved to https://phabricator.wikimedia.org/P27027 and previous config saved to /var/cache/conftool/dbconfig/20220429-141902-ladsgroup.json
[14:19:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:19:09] <logmsgbot>	 !log mvernon@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2041.codfw.wmnet with reason: host reimage
[14:19:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:19:18] <wikibugs>	 (CR) JMeybohm: [C: +2] Add desired state for k8s-ingress-wikikube -ro and -rw discovery records [puppet] - https://gerrit.wikimedia.org/r/787750 (https://phabricator.wikimedia.org/T305358) (owner: JMeybohm)
[14:21:31] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1161.eqiad.wmnet with reason: Maintenance
[14:21:33] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1161.eqiad.wmnet with reason: Maintenance
[14:21:34] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
[14:21:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:21:38] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
[14:21:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:21:43] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1161 (T298565)', diff saved to https://phabricator.wikimedia.org/P27028 and previous config saved to /var/cache/conftool/dbconfig/20220429-142142-ladsgroup.json
[14:21:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:21:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:21:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:21:56] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[14:23:38] <icinga-wm>	 PROBLEM - Confd template for /var/lib/gdnsd/discovery-k8s-ingress-wikikube-rw.state on authdns1001 is CRITICAL: File not found: /var/lib/gdnsd/discovery-k8s-ingress-wikikube-rw.state https://wikitech.wikimedia.org/wiki/Confd%23Monitoring
[14:23:39] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P27029 and previous config saved to /var/cache/conftool/dbconfig/20220429-142339-ladsgroup.json
[14:23:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:23:54] <icinga-wm>	 PROBLEM - Confd template for /var/lib/gdnsd/discovery-k8s-ingress-wikikube-rw.state on dns2002 is CRITICAL: File not found: /var/lib/gdnsd/discovery-k8s-ingress-wikikube-rw.state https://wikitech.wikimedia.org/wiki/Confd%23Monitoring
[14:23:56] <icinga-wm>	 PROBLEM - Confd template for /var/lib/gdnsd/discovery-k8s-ingress-wikikube-ro.state on dns4002 is CRITICAL: File not found: /var/lib/gdnsd/discovery-k8s-ingress-wikikube-ro.state https://wikitech.wikimedia.org/wiki/Confd%23Monitoring
[14:25:23] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1138', diff saved to https://phabricator.wikimedia.org/P27030 and previous config saved to /var/cache/conftool/dbconfig/20220429-142523-ladsgroup.json
[14:25:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:25:49] <jayme>	 confd template stuff is me
[14:26:26] <icinga-wm>	 PROBLEM - Confd template for /var/lib/gdnsd/discovery-k8s-ingress-wikikube-rw.state on dns4002 is CRITICAL: File not found: /var/lib/gdnsd/discovery-k8s-ingress-wikikube-rw.state https://wikitech.wikimedia.org/wiki/Confd%23Monitoring
[14:28:06] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1161 (T298565)', diff saved to https://phabricator.wikimedia.org/P27031 and previous config saved to /var/cache/conftool/dbconfig/20220429-142806-ladsgroup.json
[14:28:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:28:13] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[14:29:08] <icinga-wm>	 PROBLEM - Confd template for /var/lib/gdnsd/discovery-k8s-ingress-wikikube-ro.state on dns1001 is CRITICAL: File not found: /var/lib/gdnsd/discovery-k8s-ingress-wikikube-ro.state https://wikitech.wikimedia.org/wiki/Confd%23Monitoring
[14:29:10] <icinga-wm>	 PROBLEM - Confd template for /var/lib/gdnsd/discovery-k8s-ingress-wikikube-ro.state on dns2001 is CRITICAL: File not found: /var/lib/gdnsd/discovery-k8s-ingress-wikikube-ro.state https://wikitech.wikimedia.org/wiki/Confd%23Monitoring
[14:29:42] <logmsgbot>	 !log jayme@cumin1001 conftool action : set/pooled=true; selector: dnsdisc=k8s-ingress-wikikube-ro
[14:29:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:30:38] <wikibugs>	 ops-drmrs, ops-esams, Infrastructure-Foundations, netops: drmrs-esams wave provisioning - https://phabricator.wikimedia.org/T307221 (ayounsi)
[14:31:44] <icinga-wm>	 PROBLEM - Confd template for /var/lib/gdnsd/discovery-k8s-ingress-wikikube-rw.state on dns1001 is CRITICAL: File not found: /var/lib/gdnsd/discovery-k8s-ingress-wikikube-rw.state https://wikitech.wikimedia.org/wiki/Confd%23Monitoring
[14:31:46] <icinga-wm>	 PROBLEM - Confd template for /var/lib/gdnsd/discovery-k8s-ingress-wikikube-rw.state on dns2001 is CRITICAL: File not found: /var/lib/gdnsd/discovery-k8s-ingress-wikikube-rw.state https://wikitech.wikimedia.org/wiki/Confd%23Monitoring
[14:31:46] <icinga-wm>	 PROBLEM - Confd template for /var/lib/gdnsd/discovery-k8s-ingress-wikikube-ro.state on dns4001 is CRITICAL: File not found: /var/lib/gdnsd/discovery-k8s-ingress-wikikube-ro.state https://wikitech.wikimedia.org/wiki/Confd%23Monitoring
[14:31:57] <logmsgbot>	 !log jayme@cumin1001 START - Cookbook sre.dns.netbox
[14:32:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:32:27] <wikibugs>	 ops-drmrs, ops-esams, Infrastructure-Foundations, netops: drmrs-esams wave provisioning - https://phabricator.wikimedia.org/T307221 (ayounsi)
[14:32:33] <logmsgbot>	 !log mvernon@cumin1001 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2041.codfw.wmnet with OS bullseye
[14:32:36] <wikibugs>	 SRE-swift-storage: Upgrade Swift ms cluster to Bullseye and revisit mkfs.xfs options - https://phabricator.wikimedia.org/T279637 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by mvernon@cumin1001 for host ms-be2041.codfw.wmnet with OS bullseye completed: - ms-be2041 (**PASS**)   - Downtim...
[14:32:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:34:07] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P27032 and previous config saved to /var/cache/conftool/dbconfig/20220429-143407-ladsgroup.json
[14:34:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:34:32] <icinga-wm>	 PROBLEM - Confd template for /var/lib/gdnsd/discovery-k8s-ingress-wikikube-rw.state on dns4001 is CRITICAL: File not found: /var/lib/gdnsd/discovery-k8s-ingress-wikikube-rw.state https://wikitech.wikimedia.org/wiki/Confd%23Monitoring
[14:34:32] <icinga-wm>	 PROBLEM - Confd template for /var/lib/gdnsd/discovery-k8s-ingress-wikikube-ro.state on dns6001 is CRITICAL: File not found: /var/lib/gdnsd/discovery-k8s-ingress-wikikube-ro.state https://wikitech.wikimedia.org/wiki/Confd%23Monitoring
[14:35:04] <logmsgbot>	 !log jayme@cumin1001 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[14:35:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:36:33] <wikibugs>	 SRE, SRE-swift-storage, Patch-For-Review: 'swift' user/group IDs should be consistent across the fleet - https://phabricator.wikimedia.org/T123918 (MatthewVernon)
[14:36:46] <icinga-wm>	 PROBLEM - Confd template for /var/lib/gdnsd/discovery-k8s-ingress-wikikube-ro.state on authdns2001 is CRITICAL: File not found: /var/lib/gdnsd/discovery-k8s-ingress-wikikube-ro.state https://wikitech.wikimedia.org/wiki/Confd%23Monitoring
[14:37:01] <wikibugs>	 SRE, SRE-swift-storage: reimaging swift backends should set swift UID/GID to match filesystems - https://phabricator.wikimedia.org/T300057 (MatthewVernon) Open→Resolved Revised version (using coreutils' `stat` from  `/target`) worked with ms-be2041.
[14:37:08] <icinga-wm>	 PROBLEM - Confd template for /var/lib/gdnsd/discovery-k8s-ingress-wikikube-rw.state on dns6001 is CRITICAL: File not found: /var/lib/gdnsd/discovery-k8s-ingress-wikikube-rw.state https://wikitech.wikimedia.org/wiki/Confd%23Monitoring
[14:37:55] <jinxer-wm>	 (NodeTextfileStale) firing: (3) Stale textfile for elastic1075:9100 - https://wikitech.wikimedia.org/wiki/Prometheus#Stale_file_for_node-exporter_textfile - https://grafana.wikimedia.org/d/knkl4dCWz/node-exporter-textfile - https://alerts.wikimedia.org/?q=alertname%3DNodeTextfileStale
[14:38:44] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1180 (T298295)', diff saved to https://phabricator.wikimedia.org/P27033 and previous config saved to /var/cache/conftool/dbconfig/20220429-143844-ladsgroup.json
[14:38:46] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1165.eqiad.wmnet with reason: Maintenance
[14:38:48] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1165.eqiad.wmnet with reason: Maintenance
[14:38:49] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
[14:38:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:38:51] <stashbot>	 T298295: Fix length of columns page_restrictions.pr_level/pr_type on wmf wikis - https://phabricator.wikimedia.org/T298295
[14:38:53] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
[14:38:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:38:58] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1165 (T298295)', diff saved to https://phabricator.wikimedia.org/P27034 and previous config saved to /var/cache/conftool/dbconfig/20220429-143857-ladsgroup.json
[14:39:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:39:00] <wikibugs>	 SRE: DNS repo: add Jenkins job to ensure there are no duplicates - https://phabricator.wikimedia.org/T155761 (fgiunchedi) AFAICT nowadays `zone_validator.py` will fail on duplicate records. Ok to resolve this @Volans ?
[14:39:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:39:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:39:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:39:24] <icinga-wm>	 PROBLEM - Confd template for /var/lib/gdnsd/discovery-k8s-ingress-wikikube-rw.state on authdns2001 is CRITICAL: File not found: /var/lib/gdnsd/discovery-k8s-ingress-wikikube-rw.state https://wikitech.wikimedia.org/wiki/Confd%23Monitoring
[14:39:40] <icinga-wm>	 PROBLEM - Confd template for /var/lib/gdnsd/discovery-k8s-ingress-wikikube-ro.state on dns1002 is CRITICAL: File not found: /var/lib/gdnsd/discovery-k8s-ingress-wikikube-ro.state https://wikitech.wikimedia.org/wiki/Confd%23Monitoring
[14:39:42] <icinga-wm>	 PROBLEM - Confd template for /var/lib/gdnsd/discovery-k8s-ingress-wikikube-ro.state on dns6002 is CRITICAL: File not found: /var/lib/gdnsd/discovery-k8s-ingress-wikikube-ro.state https://wikitech.wikimedia.org/wiki/Confd%23Monitoring
[14:40:28] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1138 (T306560)', diff saved to https://phabricator.wikimedia.org/P27035 and previous config saved to /var/cache/conftool/dbconfig/20220429-144028-ladsgroup.json
[14:40:30] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
[14:40:31] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
[14:40:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:40:36] <stashbot>	 T306560: Fix nullability of img_major_mime and oi_major_mime - https://phabricator.wikimedia.org/T306560
[14:40:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:40:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:40:58] <wikibugs>	 (PS1) JMeybohm: Add -ro and -rw discovery records for k8s-ingress-wikikube [puppet] - https://gerrit.wikimedia.org/r/787752 (https://phabricator.wikimedia.org/T305358)
[14:41:05] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1165 (T298295)', diff saved to https://phabricator.wikimedia.org/P27036 and previous config saved to /var/cache/conftool/dbconfig/20220429-144105-ladsgroup.json
[14:41:08] <wikibugs>	 SRE, ops-esams: wipe backup-array1 - https://phabricator.wikimedia.org/T237041 (fgiunchedi) @Papaul @robh I couldn't find any trace of this host in netbox, has it been decom'd and thus we can close the task ?
[14:41:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:41:55] <wikibugs>	 (CR) JMeybohm: [C: +2] Add -ro and -rw discovery records for k8s-ingress-wikikube [puppet] - https://gerrit.wikimedia.org/r/787752 (https://phabricator.wikimedia.org/T305358) (owner: JMeybohm)
[14:42:12] <icinga-wm>	 PROBLEM - Confd template for /var/lib/gdnsd/discovery-k8s-ingress-wikikube-rw.state on dns1002 is CRITICAL: File not found: /var/lib/gdnsd/discovery-k8s-ingress-wikikube-rw.state https://wikitech.wikimedia.org/wiki/Confd%23Monitoring
[14:42:14] <icinga-wm>	 PROBLEM - Confd template for /var/lib/gdnsd/discovery-k8s-ingress-wikikube-ro.state on dns3002 is CRITICAL: File not found: /var/lib/gdnsd/discovery-k8s-ingress-wikikube-ro.state https://wikitech.wikimedia.org/wiki/Confd%23Monitoring
[14:42:16] <icinga-wm>	 PROBLEM - Confd template for /var/lib/gdnsd/discovery-k8s-ingress-wikikube-rw.state on dns6002 is CRITICAL: File not found: /var/lib/gdnsd/discovery-k8s-ingress-wikikube-rw.state https://wikitech.wikimedia.org/wiki/Confd%23Monitoring
[14:42:34] <icinga-wm>	 PROBLEM - BGP status on cr2-eqiad is CRITICAL: BGP CRITICAL - AS64605/IPv4: Active - Anycast, AS64605/IPv4: Active - Anycast https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[14:42:52] <icinga-wm>	 PROBLEM - BGP status on cr1-eqiad is CRITICAL: BGP CRITICAL - AS64605/IPv4: Active - Anycast, AS64605/IPv4: Active - Anycast https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[14:43:00] <icinga-wm>	 RECOVERY - Confd template for /var/lib/gdnsd/discovery-k8s-ingress-wikikube-rw.state on authdns2001 is OK: No errors detected https://wikitech.wikimedia.org/wiki/Confd%23Monitoring
[14:43:06] <icinga-wm>	 RECOVERY - Confd template for /var/lib/gdnsd/discovery-k8s-ingress-wikikube-rw.state on dns4001 is OK: No errors detected https://wikitech.wikimedia.org/wiki/Confd%23Monitoring
[14:43:06] <icinga-wm>	 RECOVERY - Confd template for /var/lib/gdnsd/discovery-k8s-ingress-wikikube-ro.state on dns6001 is OK: No errors detected https://wikitech.wikimedia.org/wiki/Confd%23Monitoring
[14:43:08] <icinga-wm>	 PROBLEM - BGP status on cr2-esams is CRITICAL: BGP CRITICAL - AS64605/IPv4: Active - Anycast https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[14:43:10] <logmsgbot>	 !log jayme@cumin1001 conftool action : set/pooled=true; selector: dnsdisc=k8s-ingress-wikikube-ro
[14:43:10] <icinga-wm>	 RECOVERY - Confd template for /var/lib/gdnsd/discovery-k8s-ingress-wikikube-rw.state on authdns1001 is OK: No errors detected https://wikitech.wikimedia.org/wiki/Confd%23Monitoring
[14:43:11] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P27037 and previous config saved to /var/cache/conftool/dbconfig/20220429-144311-ladsgroup.json
[14:43:12] <icinga-wm>	 RECOVERY - Confd template for /var/lib/gdnsd/discovery-k8s-ingress-wikikube-rw.state on dns2002 is OK: No errors detected https://wikitech.wikimedia.org/wiki/Confd%23Monitoring
[14:43:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:43:16] <icinga-wm>	 RECOVERY - Confd template for /var/lib/gdnsd/discovery-k8s-ingress-wikikube-rw.state on dns4002 is OK: No errors detected https://wikitech.wikimedia.org/wiki/Confd%23Monitoring
[14:43:16] <icinga-wm>	 RECOVERY - Confd template for /var/lib/gdnsd/discovery-k8s-ingress-wikikube-ro.state on dns4002 is OK: No errors detected https://wikitech.wikimedia.org/wiki/Confd%23Monitoring
[14:43:16] <icinga-wm>	 RECOVERY - Confd template for /var/lib/gdnsd/discovery-k8s-ingress-wikikube-ro.state on dns1002 is OK: No errors detected https://wikitech.wikimedia.org/wiki/Confd%23Monitoring
[14:43:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:43:22] <icinga-wm>	 RECOVERY - Confd template for /var/lib/gdnsd/discovery-k8s-ingress-wikikube-ro.state on dns6002 is OK: No errors detected https://wikitech.wikimedia.org/wiki/Confd%23Monitoring
[14:43:22] <icinga-wm>	 RECOVERY - Confd template for /var/lib/gdnsd/discovery-k8s-ingress-wikikube-rw.state on dns1001 is OK: No errors detected https://wikitech.wikimedia.org/wiki/Confd%23Monitoring
[14:43:26] <icinga-wm>	 RECOVERY - Confd template for /var/lib/gdnsd/discovery-k8s-ingress-wikikube-rw.state on dns2001 is OK: No errors detected https://wikitech.wikimedia.org/wiki/Confd%23Monitoring
[14:43:28] <icinga-wm>	 RECOVERY - Confd template for /var/lib/gdnsd/discovery-k8s-ingress-wikikube-ro.state on dns4001 is OK: No errors detected https://wikitech.wikimedia.org/wiki/Confd%23Monitoring
[14:43:34] <icinga-wm>	 RECOVERY - Confd template for /var/lib/gdnsd/discovery-k8s-ingress-wikikube-ro.state on dns1001 is OK: No errors detected https://wikitech.wikimedia.org/wiki/Confd%23Monitoring
[14:43:34] <logmsgbot>	 !log jayme@cumin1001 conftool action : set/pooled=true; selector: dnsdisc=k8s-ingress-wikikube-rw,name=eqiad
[14:43:36] <icinga-wm>	 RECOVERY - Confd template for /var/lib/gdnsd/discovery-k8s-ingress-wikikube-ro.state on dns2001 is OK: No errors detected https://wikitech.wikimedia.org/wiki/Confd%23Monitoring
[14:43:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:43:56] <icinga-wm>	 RECOVERY - Confd template for /var/lib/gdnsd/discovery-k8s-ingress-wikikube-ro.state on authdns2001 is OK: No errors detected https://wikitech.wikimedia.org/wiki/Confd%23Monitoring
[14:44:10] <icinga-wm>	 RECOVERY - Confd template for /var/lib/gdnsd/discovery-k8s-ingress-wikikube-rw.state on dns1002 is OK: No errors detected https://wikitech.wikimedia.org/wiki/Confd%23Monitoring
[14:44:12] <icinga-wm>	 RECOVERY - Confd template for /var/lib/gdnsd/discovery-k8s-ingress-wikikube-ro.state on dns3002 is OK: No errors detected https://wikitech.wikimedia.org/wiki/Confd%23Monitoring
[14:44:12] <icinga-wm>	 RECOVERY - Confd template for /var/lib/gdnsd/discovery-k8s-ingress-wikikube-rw.state on dns6002 is OK: No errors detected https://wikitech.wikimedia.org/wiki/Confd%23Monitoring
[14:44:20] <icinga-wm>	 RECOVERY - Confd template for /var/lib/gdnsd/discovery-k8s-ingress-wikikube-rw.state on dns6001 is OK: No errors detected https://wikitech.wikimedia.org/wiki/Confd%23Monitoring
[14:44:31] <jayme>	 sorry...
[14:44:47] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1145.eqiad.wmnet with reason: Maintenance
[14:44:49] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1145.eqiad.wmnet with reason: Maintenance
[14:44:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:44:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:47:09] <godog>	 no worries jayme, it happens
[14:48:23] <wikibugs>	 SRE, ops-eqiad, Cassandra, DC-Ops, User-Eevans: Relocate hosts: aqs10[3-5] - https://phabricator.wikimedia.org/T307035 (Eevans) @Cmjohnson if we wait until T305570 is complete, we should have the capacity to do it whenever is convenient -and provided we do it one machine at a time- without an...
[14:49:12] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P27038 and previous config saved to /var/cache/conftool/dbconfig/20220429-144912-ladsgroup.json
[14:49:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:49:41] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1166.eqiad.wmnet with reason: Maintenance
[14:49:43] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1166.eqiad.wmnet with reason: Maintenance
[14:49:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:49:48] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1166 (T306560)', diff saved to https://phabricator.wikimedia.org/P27039 and previous config saved to /var/cache/conftool/dbconfig/20220429-144947-ladsgroup.json
[14:49:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:49:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:49:57] <stashbot>	 T306560: Fix nullability of img_major_mime and oi_major_mime - https://phabricator.wikimedia.org/T306560
[14:50:37] <wikibugs>	 SRE: Upgrade ganeti-test to Bullseye - https://phabricator.wikimedia.org/T306499 (MoritzMuehlenhoff)
[14:51:42] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
[14:51:43] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
[14:51:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:51:48] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1146:3314 (T306560)', diff saved to https://phabricator.wikimedia.org/P27040 and previous config saved to /var/cache/conftool/dbconfig/20220429-145148-ladsgroup.json
[14:51:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:51:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:52:34] <logmsgbot>	 !log cmjohnson@cumin1001 START - Cookbook sre.hosts.provision for host gitlab-runner1004.mgmt.eqiad.wmnet with reboot policy FORCED
[14:52:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:53:57] <wikibugs>	 SRE: Upgrade ganeti-test to Bullseye - https://phabricator.wikimedia.org/T306499 (MoritzMuehlenhoff) The following upgrade steps were done in the Ganeti test cluster for the 3.0 update:  We'll be keeping the "kvm:machine_version=pc-i440fx-2.8" KVM machine type which was applied as part of the buster update f...
[14:56:10] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P27041 and previous config saved to /var/cache/conftool/dbconfig/20220429-145610-ladsgroup.json
[14:56:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:57:30] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1166 (T306560)', diff saved to https://phabricator.wikimedia.org/P27042 and previous config saved to /var/cache/conftool/dbconfig/20220429-145730-ladsgroup.json
[14:57:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:57:37] <stashbot>	 T306560: Fix nullability of img_major_mime and oi_major_mime - https://phabricator.wikimedia.org/T306560
[14:58:17] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P27043 and previous config saved to /var/cache/conftool/dbconfig/20220429-145816-ladsgroup.json
[14:58:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:01:55] <jinxer-wm>	 (NodeTextfileStale) firing: Stale textfile for ms-be2067:9100 - https://wikitech.wikimedia.org/wiki/Prometheus#Stale_file_for_node-exporter_textfile - https://grafana.wikimedia.org/d/knkl4dCWz/node-exporter-textfile - https://alerts.wikimedia.org/?q=alertname%3DNodeTextfileStale
[15:02:15] <wikibugs>	 (PS1) JMeybohm: Switch miscweb and datahub-gms to new discovery records [dns] - https://gerrit.wikimedia.org/r/787753 (https://phabricator.wikimedia.org/T305358)
[15:04:17] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 (T306560)', diff saved to https://phabricator.wikimedia.org/P27044 and previous config saved to /var/cache/conftool/dbconfig/20220429-150417-ladsgroup.json
[15:04:21] <wikibugs>	 (PS1) Stang: zhwiki: Update zh-hans version tagline and wordmark files [mediawiki-config] - https://gerrit.wikimedia.org/r/787754 (https://phabricator.wikimedia.org/T276694)
[15:04:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:04:24] <stashbot>	 T306560: Fix nullability of img_major_mime and oi_major_mime - https://phabricator.wikimedia.org/T306560
[15:04:41] <wikibugs>	 SRE, Traffic-Icebox: Create a second text-lb IP address for test purposes - https://phabricator.wikimedia.org/T237492 (fgiunchedi) We have `test-lb` nowadays, ok to resolve @BBlack or there's sth missing ?
[15:04:46] <wikibugs>	 (CR) JMeybohm: [C: +2] Switch miscweb and datahub-gms to new discovery records [dns] - https://gerrit.wikimedia.org/r/787753 (https://phabricator.wikimedia.org/T305358) (owner: JMeybohm)
[15:06:19] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 (T306560)', diff saved to https://phabricator.wikimedia.org/P27045 and previous config saved to /var/cache/conftool/dbconfig/20220429-150619-ladsgroup.json
[15:06:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:07:21] <logmsgbot>	 !log cmjohnson@cumin1001 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host gitlab-runner1004.mgmt.eqiad.wmnet with reboot policy FORCED
[15:07:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:07:35] <logmsgbot>	 !log mvernon@cumin1001 START - Cookbook sre.hosts.reimage for host ms-be2042.codfw.wmnet with OS bullseye
[15:07:38] <wikibugs>	 SRE-swift-storage: Upgrade Swift ms cluster to Bullseye and revisit mkfs.xfs options - https://phabricator.wikimedia.org/T279637 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by mvernon@cumin1001 for host ms-be2042.codfw.wmnet with OS bullseye
[15:07:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:11:15] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P27046 and previous config saved to /var/cache/conftool/dbconfig/20220429-151115-ladsgroup.json
[15:11:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:11:50] <wikibugs>	 (PS1) JMeybohm: Remove k8s-ingress-wikikube.discovery.wmnet [dns] - https://gerrit.wikimedia.org/r/787756 (https://phabricator.wikimedia.org/T305358)
[15:12:09] <wikibugs>	 (PS1) Cwhite: Revert "beta-logs: temporarily undefine cluster jobs_host" [puppet] - https://gerrit.wikimedia.org/r/787773
[15:12:35] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P27047 and previous config saved to /var/cache/conftool/dbconfig/20220429-151235-ladsgroup.json
[15:12:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:12:59] <wikibugs>	 SRE, ops-esams: wipe backup-array1 - https://phabricator.wikimedia.org/T237041 (fgiunchedi) Open→Resolved a:fgiunchedi Resolving as per @Papaul this host is no longer  ` 15:06  <papaul> godog: hey looks like you are doing some tasks clean up thanks for that                  for T237041 i thin...
[15:13:21] <wikibugs>	 (CR) Cwhite: [C: +2] Revert "beta-logs: temporarily undefine cluster jobs_host" [puppet] - https://gerrit.wikimedia.org/r/787773 (owner: Cwhite)
[15:13:22] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1161 (T298565)', diff saved to https://phabricator.wikimedia.org/P27048 and previous config saved to /var/cache/conftool/dbconfig/20220429-151321-ladsgroup.json
[15:13:25] <wikibugs>	 SRE, ops-eqiad, DC-Ops, serviceops, GitLab (Infrastructure): Q3:(Need By: TBD) rack/setup/install gitlab100[3|4] and gitlab-runner100[2|3|4] - https://phabricator.wikimedia.org/T301177 (Cmjohnson) >>! In T301177#7886110, @Dzahn wrote: > confirming that the "gitlab" hosts should use a public I...
[15:13:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:13:28] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[15:13:46] <wikibugs>	 (PS3) Cwhite: opensearch: ensure curator is >=5.8.1 [puppet] - https://gerrit.wikimedia.org/r/787109 (https://phabricator.wikimedia.org/T301017)
[15:14:13] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1161.eqiad.wmnet with reason: Maintenance
[15:14:14] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1161.eqiad.wmnet with reason: Maintenance
[15:14:16] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
[15:14:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:14:19] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
[15:14:21] <wikibugs>	 (CR) Cwhite: opensearch: ensure curator is >=5.8.1 (1 comment) [puppet] - https://gerrit.wikimedia.org/r/787109 (https://phabricator.wikimedia.org/T301017) (owner: Cwhite)
[15:14:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:14:24] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1161 (T298565)', diff saved to https://phabricator.wikimedia.org/P27049 and previous config saved to /var/cache/conftool/dbconfig/20220429-151424-ladsgroup.json
[15:14:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:14:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:14:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:14:58] <wikibugs>	 SRE, ops-eqiad, DC-Ops, cloud-services-team (Hardware): Q4:(Need By: TBD) rack/setup/install cloudcontrol100[6-7].wikimedia.org - https://phabricator.wikimedia.org/T306853 (ayounsi) @Andrew, thanks, I'm still in my quest of reducing our public vlans usage  ;) Could those hosts use private IPs (li...
[15:15:02] <wikibugs>	 (PS2) Stang: zhwiki: Update zh-hans version tagline and wordmark files [mediawiki-config] - https://gerrit.wikimedia.org/r/787754 (https://phabricator.wikimedia.org/T276694)
[15:15:08] <wikibugs>	 (CR) Filippo Giunchedi: [C: +1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/787109 (https://phabricator.wikimedia.org/T301017) (owner: Cwhite)
[15:16:34] <wikibugs>	 (PS1) JMeybohm: Remove k8s-ingress-wikikube.discovery.wmnet [puppet] - https://gerrit.wikimedia.org/r/787757 (https://phabricator.wikimedia.org/T305358)
[15:16:44] <wikibugs>	 (PS5) Luke Bowmaker: Image Suggestions Feedback Stream [mediawiki-config] - https://gerrit.wikimedia.org/r/787749
[15:19:13] <wikibugs>	 SRE, ops-eqiad, DC-Ops, Infrastructure-Foundations, and 2 others: Q1:(Need By: TBD) rack/setup/install cloudswift100[12] - https://phabricator.wikimedia.org/T289882 (ayounsi) a:ayounsi→None
[15:20:16] <wikibugs>	 (PS1) Cmjohnson: adding gitlab and gitlab runner hosts to site.pp [puppet] - https://gerrit.wikimedia.org/r/787758 (https://phabricator.wikimedia.org/T301177)
[15:20:47] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1161 (T298565)', diff saved to https://phabricator.wikimedia.org/P27050 and previous config saved to /var/cache/conftool/dbconfig/20220429-152046-ladsgroup.json
[15:20:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:20:55] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[15:21:09] <wikibugs>	 (CR) Cmjohnson: [C: +2] adding gitlab and gitlab runner hosts to site.pp [puppet] - https://gerrit.wikimedia.org/r/787758 (https://phabricator.wikimedia.org/T301177) (owner: Cmjohnson)
[15:21:22] <wikibugs>	 (CR) Luke Bowmaker: "Hi Andrew," [mediawiki-config] - https://gerrit.wikimedia.org/r/787749 (owner: Luke Bowmaker)
[15:21:24] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P27051 and previous config saved to /var/cache/conftool/dbconfig/20220429-152124-ladsgroup.json
[15:21:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:21:49] <logmsgbot>	 !log mvernon@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2042.codfw.wmnet with reason: host reimage
[15:21:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:22:41] <wikibugs>	 SRE, ops-eqiad, DC-Ops, serviceops, and 2 others: Q3:(Need By: TBD) rack/setup/install gitlab100[3|4] and gitlab-runner100[2|3|4] - https://phabricator.wikimedia.org/T301177 (Cmjohnson)
[15:22:45] <jinxer-wm>	 (KubernetesRsyslogDown) firing: (4) rsyslog on ml-staging-ctrl2001:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[15:22:48] <wikibugs>	 (PS2) JMeybohm: Remove k8s-ingress-wikikube.discovery.wmnet [puppet] - https://gerrit.wikimedia.org/r/787757 (https://phabricator.wikimedia.org/T305358)
[15:22:50] <wikibugs>	 (PS1) JMeybohm: trafficserver: Switch datahub to new k8s-ingress-wikikube discovery [puppet] - https://gerrit.wikimedia.org/r/787759 (https://phabricator.wikimedia.org/T305358)
[15:24:59] <logmsgbot>	 !log cmjohnson@cumin1001 START - Cookbook sre.dns.netbox
[15:25:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:25:07] <logmsgbot>	 !log mvernon@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2042.codfw.wmnet with reason: host reimage
[15:25:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:25:27] <wikibugs>	 (CR) Alexandros Kosiaris: [C: +1] docker: ensure apparmor package is installed if on bullseye [puppet] - https://gerrit.wikimedia.org/r/785226 (owner: Dzahn)
[15:26:20] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1165 (T298295)', diff saved to https://phabricator.wikimedia.org/P27052 and previous config saved to /var/cache/conftool/dbconfig/20220429-152620-ladsgroup.json
[15:26:22] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1131.eqiad.wmnet with reason: Maintenance
[15:26:23] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1131.eqiad.wmnet with reason: Maintenance
[15:26:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:26:27] <stashbot>	 T298295: Fix length of columns page_restrictions.pr_level/pr_type on wmf wikis - https://phabricator.wikimedia.org/T298295
[15:26:28] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1131 (T298295)', diff saved to https://phabricator.wikimedia.org/P27053 and previous config saved to /var/cache/conftool/dbconfig/20220429-152628-ladsgroup.json
[15:26:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:26:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:26:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:27:13] <wikibugs>	 (CR) Alexandros Kosiaris: [C: +1] helmfile.d: add developer-portal [deployment-charts] - https://gerrit.wikimedia.org/r/773995 (https://phabricator.wikimedia.org/T297140) (owner: Majavah)
[15:27:40] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P27054 and previous config saved to /var/cache/conftool/dbconfig/20220429-152740-ladsgroup.json
[15:27:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:28:35] <wikibugs>	 (CR) JMeybohm: [C: +2] trafficserver: Switch datahub to new k8s-ingress-wikikube discovery [puppet] - https://gerrit.wikimedia.org/r/787759 (https://phabricator.wikimedia.org/T305358) (owner: JMeybohm)
[15:28:37] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1131 (T298295)', diff saved to https://phabricator.wikimedia.org/P27055 and previous config saved to /var/cache/conftool/dbconfig/20220429-152836-ladsgroup.json
[15:28:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:28:42] <logmsgbot>	 !log cmjohnson@cumin1001 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[15:28:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:29:06] <jynus>	 !log update NIC firmware for backup1002 T286722 T305446
[15:29:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:29:13] <stashbot>	 T305446: Upgrade backup* hosts to bullseye - https://phabricator.wikimedia.org/T305446
[15:29:13] <stashbot>	 T286722: Broadcom BCM57412 10G NIC and Bullseye installer - https://phabricator.wikimedia.org/T286722
[15:31:03] <wikibugs>	 (CR) Alexandros Kosiaris: "I fear we don't have the cycles to do a more thorough review of this one right now. Given the hackathon timeline we probably want to move " [deployment-charts] - https://gerrit.wikimedia.org/r/773994 (https://phabricator.wikimedia.org/T297140) (owner: Majavah)
[15:31:10] <wikibugs>	 (CR) Cwhite: [C: +2] opensearch: ensure curator is >=5.8.1 [puppet] - https://gerrit.wikimedia.org/r/787109 (https://phabricator.wikimedia.org/T301017) (owner: Cwhite)
[15:31:35] <logmsgbot>	 !log cmjohnson@cumin1001 START - Cookbook sre.dns.netbox
[15:31:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:34:40] <logmsgbot>	 !log cmjohnson@cumin1001 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[15:34:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:35:52] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P27056 and previous config saved to /var/cache/conftool/dbconfig/20220429-153551-ladsgroup.json
[15:35:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:36:29] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P27057 and previous config saved to /var/cache/conftool/dbconfig/20220429-153629-ladsgroup.json
[15:36:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:36:35] <logmsgbot>	 !log jynus@cumin1001 START - Cookbook sre.hosts.reimage for host backup1002.eqiad.wmnet with OS bullseye
[15:36:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:37:21] <logmsgbot>	 !log cmjohnson@cumin1001 START - Cookbook sre.dns.netbox
[15:37:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:37:46] <wikibugs>	 (PS1) Cwhite: Revert "opensearch: ensure curator is >=5.8.1" [puppet] - https://gerrit.wikimedia.org/r/787774
[15:38:37] <wikibugs>	 (CR) Cwhite: [C: +2] Revert "opensearch: ensure curator is >=5.8.1" [puppet] - https://gerrit.wikimedia.org/r/787774 (owner: Cwhite)
[15:39:37] <logmsgbot>	 !log cmjohnson@cumin1001 END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
[15:39:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:41:20] <logmsgbot>	 !log cmjohnson@cumin1001 START - Cookbook sre.dns.netbox
[15:41:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:41:45] <wikibugs>	 SRE, ops-eqiad, DBA: db1164 fails to POST/boot/etc - https://phabricator.wikimedia.org/T307198 (Kormat) I've set the host to 'failed' in netbox: https://netbox.wikimedia.org/dcim/devices/2999/
[15:42:45] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1166 (T306560)', diff saved to https://phabricator.wikimedia.org/P27058 and previous config saved to /var/cache/conftool/dbconfig/20220429-154245-ladsgroup.json
[15:42:47] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1179.eqiad.wmnet with reason: Maintenance
[15:42:48] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1179.eqiad.wmnet with reason: Maintenance
[15:42:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:42:52] <stashbot>	 T306560: Fix nullability of img_major_mime and oi_major_mime - https://phabricator.wikimedia.org/T306560
[15:42:53] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1179 (T306560)', diff saved to https://phabricator.wikimedia.org/P27059 and previous config saved to /var/cache/conftool/dbconfig/20220429-154253-ladsgroup.json
[15:42:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:43:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:43:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:43:42] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1131', diff saved to https://phabricator.wikimedia.org/P27060 and previous config saved to /var/cache/conftool/dbconfig/20220429-154341-ladsgroup.json
[15:43:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:45:15] <logmsgbot>	 !log cmjohnson@cumin1001 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[15:45:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:50:55] <jinxer-wm>	 (NodeTextfileStale) firing: Stale textfile for cloudcontrol2001-dev:9100 - https://wikitech.wikimedia.org/wiki/Prometheus#Stale_file_for_node-exporter_textfile - https://grafana.wikimedia.org/d/knkl4dCWz/node-exporter-textfile - https://alerts.wikimedia.org/?q=alertname%3DNodeTextfileStale
[15:50:57] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P27061 and previous config saved to /var/cache/conftool/dbconfig/20220429-155057-ladsgroup.json
[15:51:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:51:34] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 (T306560)', diff saved to https://phabricator.wikimedia.org/P27062 and previous config saved to /var/cache/conftool/dbconfig/20220429-155134-ladsgroup.json
[15:51:36] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1144.eqiad.wmnet with reason: Maintenance
[15:51:37] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1144.eqiad.wmnet with reason: Maintenance
[15:51:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:51:41] <stashbot>	 T306560: Fix nullability of img_major_mime and oi_major_mime - https://phabricator.wikimedia.org/T306560
[15:51:43] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1144:3314 (T306560)', diff saved to https://phabricator.wikimedia.org/P27063 and previous config saved to /var/cache/conftool/dbconfig/20220429-155142-ladsgroup.json
[15:51:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:51:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:51:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:53:29] <logmsgbot>	 !log cmjohnson@cumin1001 START - Cookbook sre.dns.netbox
[15:53:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:56:43] <logmsgbot>	 !log cmjohnson@cumin1001 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[15:56:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:58:43] <wikibugs>	 (CR) Andrew Bogott: [C: +2] wmcs-novastats-dnsleaks.py: make slightly better at handling codfw1dev (3 comments) [puppet] - https://gerrit.wikimedia.org/r/786307 (owner: Andrew Bogott)
[15:58:47] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1131', diff saved to https://phabricator.wikimedia.org/P27064 and previous config saved to /var/cache/conftool/dbconfig/20220429-155846-ladsgroup.json
[15:58:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:59:21] <logmsgbot>	 !log cmjohnson@cumin1001 START - Cookbook sre.dns.netbox
[15:59:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:59:47] <wikibugs>	 (PS1) Andrew Bogott: wmcs-novastats/wmcs-novastats-dnsleaks.py: minor fix to .svc exclusion [puppet] - https://gerrit.wikimedia.org/r/787765
[16:02:16] <wikibugs>	 (CR) Andrew Bogott: [C: +2] wmcs-novastats/wmcs-novastats-dnsleaks.py: minor fix to .svc exclusion [puppet] - https://gerrit.wikimedia.org/r/787765 (owner: Andrew Bogott)
[16:02:38] <logmsgbot>	 !log cmjohnson@cumin1001 START - Cookbook sre.hosts.reimage for host gitlab-runner1002.eqiad.wmnet with OS bullseye
[16:02:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:02:48] <wikibugs>	 SRE, ops-eqiad, DC-Ops, serviceops, GitLab (Infrastructure): Q3:(Need By: TBD) rack/setup/install gitlab100[3|4] and gitlab-runner100[2|3|4] - https://phabricator.wikimedia.org/T301177 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cmjohnson@cumin1001 for host gitlab...
[16:03:31] <logmsgbot>	 !log cmjohnson@cumin1001 START - Cookbook sre.hosts.reimage for host gitlab-runner1003.eqiad.wmnet with OS bullseye
[16:03:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:03:40] <wikibugs>	 SRE, ops-eqiad, DC-Ops, serviceops, GitLab (Infrastructure): Q3:(Need By: TBD) rack/setup/install gitlab100[3|4] and gitlab-runner100[2|3|4] - https://phabricator.wikimedia.org/T301177 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cmjohnson@cumin1001 for host gitlab...
[16:04:03] <logmsgbot>	 !log cmjohnson@cumin1001 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[16:04:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:05:20] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 (T306560)', diff saved to https://phabricator.wikimedia.org/P27065 and previous config saved to /var/cache/conftool/dbconfig/20220429-160520-ladsgroup.json
[16:05:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:05:26] <stashbot>	 T306560: Fix nullability of img_major_mime and oi_major_mime - https://phabricator.wikimedia.org/T306560
[16:06:02] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1161 (T298565)', diff saved to https://phabricator.wikimedia.org/P27066 and previous config saved to /var/cache/conftool/dbconfig/20220429-160602-ladsgroup.json
[16:06:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:06:09] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[16:06:51] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1161.eqiad.wmnet with reason: Maintenance
[16:06:53] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1161.eqiad.wmnet with reason: Maintenance
[16:06:54] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
[16:06:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:06:58] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
[16:07:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:07:03] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1161 (T298565)', diff saved to https://phabricator.wikimedia.org/P27067 and previous config saved to /var/cache/conftool/dbconfig/20220429-160702-ladsgroup.json
[16:07:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:07:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:07:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:09:03] <wikibugs>	 (PS1) Cmjohnson: add gitlab-runner1004 to site.pp [puppet] - https://gerrit.wikimedia.org/r/787787 (https://phabricator.wikimedia.org/T301177)
[16:09:34] <wikibugs>	 (PS2) Cmjohnson: add gitlab-runner1004 to site.pp [puppet] - https://gerrit.wikimedia.org/r/787787 (https://phabricator.wikimedia.org/T301177)
[16:10:19] <wikibugs>	 (CR) Cmjohnson: [C: +2] add gitlab-runner1004 to site.pp [puppet] - https://gerrit.wikimedia.org/r/787787 (https://phabricator.wikimedia.org/T301177) (owner: Cmjohnson)
[16:12:28] <logmsgbot>	 !log cmjohnson@cumin1001 START - Cookbook sre.hosts.reimage for host gitlab-runner1004.eqiad.wmnet with OS bullseye
[16:12:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:12:38] <wikibugs>	 SRE, ops-eqiad, DC-Ops, serviceops, and 2 others: Q3:(Need By: TBD) rack/setup/install gitlab100[3|4] and gitlab-runner100[2|3|4] - https://phabricator.wikimedia.org/T301177 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cmjohnson@cumin1001 for host gitlab-runner1004.eqi...
[16:13:24] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1161 (T298565)', diff saved to https://phabricator.wikimedia.org/P27068 and previous config saved to /var/cache/conftool/dbconfig/20220429-161323-ladsgroup.json
[16:13:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:13:30] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[16:13:33] <logmsgbot>	 !log cmjohnson@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on gitlab-runner1002.eqiad.wmnet with reason: host reimage
[16:13:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:13:52] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1131 (T298295)', diff saved to https://phabricator.wikimedia.org/P27069 and previous config saved to /var/cache/conftool/dbconfig/20220429-161352-ladsgroup.json
[16:13:53] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1096.eqiad.wmnet with reason: Maintenance
[16:13:55] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1096.eqiad.wmnet with reason: Maintenance
[16:13:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:13:59] <stashbot>	 T298295: Fix length of columns page_restrictions.pr_level/pr_type on wmf wikis - https://phabricator.wikimedia.org/T298295
[16:14:00] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1096:3316 (T298295)', diff saved to https://phabricator.wikimedia.org/P27070 and previous config saved to /var/cache/conftool/dbconfig/20220429-161400-ladsgroup.json
[16:14:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:14:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:14:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:14:39] <logmsgbot>	 !log cmjohnson@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on gitlab-runner1003.eqiad.wmnet with reason: host reimage
[16:14:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:14:47] <logmsgbot>	 !log cmjohnson@cumin1001 START - Cookbook sre.hosts.reimage for host gitlab1003.wikimedia.org with OS bullseye
[16:14:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:14:58] <wikibugs>	 SRE, ops-eqiad, DC-Ops, serviceops, and 2 others: Q3:(Need By: TBD) rack/setup/install gitlab100[3|4] and gitlab-runner100[2|3|4] - https://phabricator.wikimedia.org/T301177 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cmjohnson@cumin1001 for host gitlab1003.wikimedia....
[16:16:10] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1096:3316 (T298295)', diff saved to https://phabricator.wikimedia.org/P27071 and previous config saved to /var/cache/conftool/dbconfig/20220429-161610-ladsgroup.json
[16:16:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:17:10] <logmsgbot>	 !log cmjohnson@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gitlab-runner1002.eqiad.wmnet with reason: host reimage
[16:17:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:18:28] <logmsgbot>	 !log cmjohnson@cumin1001 START - Cookbook sre.hosts.reimage for host gitlab1004.wikimedia.org with OS bullseye
[16:18:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:18:39] <wikibugs>	 SRE, ops-eqiad, DC-Ops, serviceops, and 2 others: Q3:(Need By: TBD) rack/setup/install gitlab100[3|4] and gitlab-runner100[2|3|4] - https://phabricator.wikimedia.org/T301177 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cmjohnson@cumin1001 for host gitlab1004.wikimedia....
[16:19:59] <logmsgbot>	 !log cmjohnson@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gitlab-runner1003.eqiad.wmnet with reason: host reimage
[16:20:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:20:25] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P27072 and previous config saved to /var/cache/conftool/dbconfig/20220429-162025-ladsgroup.json
[16:20:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:23:29] <logmsgbot>	 !log cmjohnson@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on gitlab-runner1004.eqiad.wmnet with reason: host reimage
[16:23:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:25:45] <logmsgbot>	 !log cmjohnson@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on gitlab1003.wikimedia.org with reason: host reimage
[16:25:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:26:56] <logmsgbot>	 !log cmjohnson@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gitlab-runner1004.eqiad.wmnet with reason: host reimage
[16:27:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:28:29] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P27073 and previous config saved to /var/cache/conftool/dbconfig/20220429-162828-ladsgroup.json
[16:28:30] <logmsgbot>	 !log cmjohnson@cumin1001 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host gitlab-runner1002.eqiad.wmnet with OS bullseye
[16:28:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:28:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:28:41] <wikibugs>	 SRE, ops-eqiad, DC-Ops, serviceops, and 2 others: Q3:(Need By: TBD) rack/setup/install gitlab100[3|4] and gitlab-runner100[2|3|4] - https://phabricator.wikimedia.org/T301177 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cmjohnson@cumin1001 for host gitlab-runner1002.eqiad.w...
[16:29:08] <logmsgbot>	 !log jynus@cumin1001 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host backup1002.eqiad.wmnet with OS bullseye
[16:29:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:29:25] <logmsgbot>	 !log cmjohnson@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on gitlab1004.wikimedia.org with reason: host reimage
[16:29:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:29:46] <logmsgbot>	 !log cmjohnson@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gitlab1003.wikimedia.org with reason: host reimage
[16:29:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:29:58] <logmsgbot>	 !log mvernon@cumin1001 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2042.codfw.wmnet with OS bullseye
[16:30:02] <wikibugs>	 SRE-swift-storage: Upgrade Swift ms cluster to Bullseye and revisit mkfs.xfs options - https://phabricator.wikimedia.org/T279637 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by mvernon@cumin1001 for host ms-be2042.codfw.wmnet with OS bullseye completed: - ms-be2042 (**PASS**)   - Downtim...
[16:30:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:30:12] <logmsgbot>	 !log jynus@cumin1001 START - Cookbook sre.hosts.reimage for host backup1002.eqiad.wmnet with OS buster
[16:30:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:31:15] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1096:3316', diff saved to https://phabricator.wikimedia.org/P27074 and previous config saved to /var/cache/conftool/dbconfig/20220429-163115-ladsgroup.json
[16:31:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:31:31] <logmsgbot>	 !log cmjohnson@cumin1001 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host gitlab-runner1003.eqiad.wmnet with OS bullseye
[16:31:35] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1179 (T306560)', diff saved to https://phabricator.wikimedia.org/P27075 and previous config saved to /var/cache/conftool/dbconfig/20220429-163135-ladsgroup.json
[16:31:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:31:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:31:42] <stashbot>	 T306560: Fix nullability of img_major_mime and oi_major_mime - https://phabricator.wikimedia.org/T306560
[16:31:44] <wikibugs>	 SRE, ops-eqiad, DC-Ops, serviceops, GitLab (Infrastructure): Q3:(Need By: TBD) rack/setup/install gitlab100[3|4] and gitlab-runner100[2|3|4] - https://phabricator.wikimedia.org/T301177 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cmjohnson@cumin1001 for host gitlab-run...
[16:32:49] <logmsgbot>	 !log cmjohnson@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gitlab1004.wikimedia.org with reason: host reimage
[16:32:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:34:36] <wikibugs>	 SRE, ops-eqiad, DC-Ops, serviceops, GitLab (Infrastructure): Q3:(Need By: TBD) rack/setup/install gitlab100[3|4] and gitlab-runner100[2|3|4] - https://phabricator.wikimedia.org/T301177 (Dzahn) >>! In T301177#7891791, @Cmjohnson wrote: >>>! In T301177#7886110, @Dzahn wrote: >> confirming that...
[16:35:30] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P27076 and previous config saved to /var/cache/conftool/dbconfig/20220429-163530-ladsgroup.json
[16:35:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:37:38] <logmsgbot>	 !log cmjohnson@cumin1001 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host gitlab-runner1004.eqiad.wmnet with OS bullseye
[16:37:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:37:47] <wikibugs>	 SRE, ops-eqiad, DC-Ops, serviceops, GitLab (Infrastructure): Q3:(Need By: TBD) rack/setup/install gitlab100[3|4] and gitlab-runner100[2|3|4] - https://phabricator.wikimedia.org/T301177 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cmjohnson@cumin1001 for host gitlab-run...
[16:41:41] <logmsgbot>	 !log cmjohnson@cumin1001 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host gitlab1003.wikimedia.org with OS bullseye
[16:41:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:41:50] <wikibugs>	 SRE, ops-eqiad, DC-Ops, serviceops, GitLab (Infrastructure): Q3:(Need By: TBD) rack/setup/install gitlab100[3|4] and gitlab-runner100[2|3|4] - https://phabricator.wikimedia.org/T301177 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cmjohnson@cumin1001 for host gitlab1003...
[16:43:34] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P27077 and previous config saved to /var/cache/conftool/dbconfig/20220429-164333-ladsgroup.json
[16:43:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:43:54] <logmsgbot>	 !log cmjohnson@cumin1001 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host gitlab1004.wikimedia.org with OS bullseye
[16:43:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:44:01] <wikibugs>	 SRE, ops-eqiad, DC-Ops, serviceops, GitLab (Infrastructure): Q3:(Need By: TBD) rack/setup/install gitlab100[3|4] and gitlab-runner100[2|3|4] - https://phabricator.wikimedia.org/T301177 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cmjohnson@cumin1001 for host gitlab1004...
[16:44:58] <wikibugs>	 SRE, ops-eqiad, DC-Ops, serviceops, GitLab (Infrastructure): Q3:(Need By: TBD) rack/setup/install gitlab100[3|4] and gitlab-runner100[2|3|4] - https://phabricator.wikimedia.org/T301177 (Cmjohnson)
[16:46:21] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1096:3316', diff saved to https://phabricator.wikimedia.org/P27078 and previous config saved to /var/cache/conftool/dbconfig/20220429-164620-ladsgroup.json
[16:46:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:46:38] <wikibugs>	 SRE, ops-eqiad, DC-Ops, serviceops, GitLab (Infrastructure): Q3:(Need By: TBD) rack/setup/install gitlab100[3|4] and gitlab-runner100[2|3|4] - https://phabricator.wikimedia.org/T301177 (Cmjohnson) Open→Resolved @Dzahn These have all been installed and resolving the task
[16:46:40] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P27079 and previous config saved to /var/cache/conftool/dbconfig/20220429-164640-ladsgroup.json
[16:46:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:47:40] <wikibugs>	 (PS1) Vivian Rook: adding rook removing mdipietro [labs/private] - https://gerrit.wikimedia.org/r/787789
[16:49:01] <wikibugs>	 (CR) Andrew Bogott: [V: +2 C: +2] adding rook removing mdipietro [labs/private] - https://gerrit.wikimedia.org/r/787789 (owner: Vivian Rook)
[16:50:35] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 (T306560)', diff saved to https://phabricator.wikimedia.org/P27080 and previous config saved to /var/cache/conftool/dbconfig/20220429-165035-ladsgroup.json
[16:50:40] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db2110.codfw.wmnet with reason: Maintenance
[16:50:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:50:42] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2110.codfw.wmnet with reason: Maintenance
[16:50:42] <stashbot>	 T306560: Fix nullability of img_major_mime and oi_major_mime - https://phabricator.wikimedia.org/T306560
[16:50:43] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 12:00:00 on 12 hosts with reason: Maintenance
[16:50:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:50:52] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 12 hosts with reason: Maintenance
[16:50:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:50:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:51:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:53:22] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
[16:53:24] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
[16:53:25] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
[16:53:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:53:28] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
[16:53:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:53:33] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1156 (T276292)', diff saved to https://phabricator.wikimedia.org/P27081 and previous config saved to /var/cache/conftool/dbconfig/20220429-165333-ladsgroup.json
[16:53:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:53:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:53:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:53:48] <stashbot>	 T276292: Schema change for renaming new_name_timestamp to rc_new_name_timestamp in recentchanges - https://phabricator.wikimedia.org/T276292
[16:56:14] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1156 (T276292)', diff saved to https://phabricator.wikimedia.org/P27082 and previous config saved to /var/cache/conftool/dbconfig/20220429-165613-ladsgroup.json
[16:56:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:56:31] <wikibugs>	 (CR) Andrew Bogott: [C: +2] hieradata: swap remaining ldap-labs names to ldap-rw [puppet] - https://gerrit.wikimedia.org/r/786265 (https://phabricator.wikimedia.org/T295150) (owner: Majavah)
[16:58:39] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1161 (T298565)', diff saved to https://phabricator.wikimedia.org/P27083 and previous config saved to /var/cache/conftool/dbconfig/20220429-165839-ladsgroup.json
[16:58:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:58:46] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[16:59:28] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1161.eqiad.wmnet with reason: Maintenance
[16:59:29] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1161.eqiad.wmnet with reason: Maintenance
[16:59:30] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
[16:59:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:59:34] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
[16:59:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:59:39] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1161 (T298565)', diff saved to https://phabricator.wikimedia.org/P27084 and previous config saved to /var/cache/conftool/dbconfig/20220429-165939-ladsgroup.json
[16:59:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:59:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:59:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:01:26] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1096:3316 (T298295)', diff saved to https://phabricator.wikimedia.org/P27085 and previous config saved to /var/cache/conftool/dbconfig/20220429-170125-ladsgroup.json
[17:01:27] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
[17:01:29] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
[17:01:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:01:33] <stashbot>	 T298295: Fix length of columns page_restrictions.pr_level/pr_type on wmf wikis - https://phabricator.wikimedia.org/T298295
[17:01:37] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db2129.codfw.wmnet with reason: Maintenance
[17:01:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:01:38] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2129.codfw.wmnet with reason: Maintenance
[17:01:40] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 12:00:00 on 8 hosts with reason: Maintenance
[17:01:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:01:45] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 8 hosts with reason: Maintenance
[17:01:45] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P27086 and previous config saved to /var/cache/conftool/dbconfig/20220429-170145-ladsgroup.json
[17:01:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:01:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:01:54] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
[17:01:55] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1121.eqiad.wmnet with reason: Maintenance
[17:01:55] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
[17:01:56] <logmsgbot>	 !log jynus@cumin1001 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host backup1002.eqiad.wmnet with OS buster
[17:01:56] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1121.eqiad.wmnet with reason: Maintenance
[17:01:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:01:57] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
[17:02:01] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
[17:02:01] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
[17:02:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:02:02] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
[17:02:06] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1121 (T306560)', diff saved to https://phabricator.wikimedia.org/P27087 and previous config saved to /var/cache/conftool/dbconfig/20220429-170205-ladsgroup.json
[17:02:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:02:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:02:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:02:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:02:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:02:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:02:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:02:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:02:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:02:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:02:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:02:55] <stashbot>	 T306560: Fix nullability of img_major_mime and oi_major_mime - https://phabricator.wikimedia.org/T306560
[17:05:23] <wikibugs>	 (PS1) Andrew Bogott: codfw1dev: standardize ldap servers [puppet] - https://gerrit.wikimedia.org/r/787790
[17:05:59] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1161 (T298565)', diff saved to https://phabricator.wikimedia.org/P27088 and previous config saved to /var/cache/conftool/dbconfig/20220429-170559-ladsgroup.json
[17:06:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:06:06] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[17:07:18] <wikibugs>	 (CR) Andrew Bogott: [C: +2] codfw1dev: standardize ldap servers [puppet] - https://gerrit.wikimedia.org/r/787790 (owner: Andrew Bogott)
[17:08:20] <wikibugs>	 SRE, Data-Engineering, Data-Engineering-Kanban, Traffic: intake-analytics is responsible for up to a 85% of varnish backend fetch errors - https://phabricator.wikimedia.org/T306181 (BTullis) Well, this is a bit confusing.  I've examined packet captures from two pods in eqiad and another in codfw....
[17:11:19] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P27089 and previous config saved to /var/cache/conftool/dbconfig/20220429-171118-ladsgroup.json
[17:11:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:13:19] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1098:3316 (T298295)', diff saved to https://phabricator.wikimedia.org/P27090 and previous config saved to /var/cache/conftool/dbconfig/20220429-171318-ladsgroup.json
[17:13:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:13:25] <stashbot>	 T298295: Fix length of columns page_restrictions.pr_level/pr_type on wmf wikis - https://phabricator.wikimedia.org/T298295
[17:13:40] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1121 (T306560)', diff saved to https://phabricator.wikimedia.org/P27091 and previous config saved to /var/cache/conftool/dbconfig/20220429-171339-ladsgroup.json
[17:13:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:13:47] <stashbot>	 T306560: Fix nullability of img_major_mime and oi_major_mime - https://phabricator.wikimedia.org/T306560
[17:16:51] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1179 (T306560)', diff saved to https://phabricator.wikimedia.org/P27092 and previous config saved to /var/cache/conftool/dbconfig/20220429-171650-ladsgroup.json
[17:16:52] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1157.eqiad.wmnet with reason: Maintenance
[17:16:54] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1157.eqiad.wmnet with reason: Maintenance
[17:16:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:16:59] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1157 (T306560)', diff saved to https://phabricator.wikimedia.org/P27093 and previous config saved to /var/cache/conftool/dbconfig/20220429-171658-ladsgroup.json
[17:17:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:17:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:17:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:21:04] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P27094 and previous config saved to /var/cache/conftool/dbconfig/20220429-172104-ladsgroup.json
[17:21:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:21:25] <wikibugs>	 (PS1) Andrew Bogott: wmfkeystonehooks: modest improvement to exception handling [puppet] - https://gerrit.wikimedia.org/r/787791
[17:23:11] <icinga-wm>	 PROBLEM - Check systemd state on ms-be1054 is CRITICAL: CRITICAL - degraded: The following units failed: session-326117.scope https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[17:23:14] <wikibugs>	 (CR) Andrew Bogott: [C: +2] wmfkeystonehooks: modest improvement to exception handling [puppet] - https://gerrit.wikimedia.org/r/787791 (owner: Andrew Bogott)
[17:25:05] <icinga-wm>	 PROBLEM - SSH on wtp1040.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[17:26:24] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P27095 and previous config saved to /var/cache/conftool/dbconfig/20220429-172623-ladsgroup.json
[17:26:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:28:24] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1098:3316', diff saved to https://phabricator.wikimedia.org/P27096 and previous config saved to /var/cache/conftool/dbconfig/20220429-172823-ladsgroup.json
[17:28:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:28:45] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P27097 and previous config saved to /var/cache/conftool/dbconfig/20220429-172845-ladsgroup.json
[17:28:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:29:58] <wikibugs>	 (PS1) Tchanders: Add QuickSurveys survey for the SimilarEditors feature [mediawiki-config] - https://gerrit.wikimedia.org/r/787793 (https://phabricator.wikimedia.org/T307025)
[17:31:54] <wikibugs>	 (CR) Tchanders: "This can be tested locally by adding the config to LocalSettings.php and pulling down I0b508faf3445c6e7caffc964a5aa67231a01da9b" [mediawiki-config] - https://gerrit.wikimedia.org/r/787793 (https://phabricator.wikimedia.org/T307025) (owner: Tchanders)
[17:35:46] <wikibugs>	 SRE, Data-Engineering, Data-Engineering-Kanban, Traffic: intake-analytics is responsible for up to a 85% of varnish backend fetch errors - https://phabricator.wikimedia.org/T306181 (BTullis) I have a few errors logged by ats-be attempting to connect to `eventgate-analytics-external.discovery.wmne...
[17:36:09] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P27098 and previous config saved to /var/cache/conftool/dbconfig/20220429-173609-ladsgroup.json
[17:36:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:36:48] <Amir1>	 !log killed bnwiki's refresh links recommendation (T299021)
[17:36:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:36:54] <stashbot>	 T299021: Shorten running time of refreshLinkRecommendations.php - https://phabricator.wikimedia.org/T299021
[17:39:18] <jinxer-wm>	 (ProbeDown) firing: Service thumbor:8800 has failed probes (http_thumbor_ip4) - https://wikitech.wikimedia.org/wiki/Network_monitoring#ProbeDown - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=http - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[17:39:25] <wikibugs>	 (PS1) Andrew Bogott: wmfkeystonehooks: raise loglevel for ldap failures [puppet] - https://gerrit.wikimedia.org/r/787794
[17:39:43] <wikibugs>	 (CR) Ottomata: Image Suggestions Feedback Stream (2 comments) [mediawiki-config] - https://gerrit.wikimedia.org/r/787749 (owner: Luke Bowmaker)
[17:40:03] <wikibugs>	 (CR) jerkins-bot: [V: -1] wmfkeystonehooks: raise loglevel for ldap failures [puppet] - https://gerrit.wikimedia.org/r/787794 (owner: Andrew Bogott)
[17:41:22] <wikibugs>	 (PS2) Andrew Bogott: wmfkeystonehooks: raise loglevel for ldap failures [puppet] - https://gerrit.wikimedia.org/r/787794
[17:41:29] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1156 (T276292)', diff saved to https://phabricator.wikimedia.org/P27099 and previous config saved to /var/cache/conftool/dbconfig/20220429-174129-ladsgroup.json
[17:41:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:41:36] <stashbot>	 T276292: Schema change for renaming new_name_timestamp to rc_new_name_timestamp in recentchanges - https://phabricator.wikimedia.org/T276292
[17:41:37] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1157 (T306560)', diff saved to https://phabricator.wikimedia.org/P27100 and previous config saved to /var/cache/conftool/dbconfig/20220429-174136-ladsgroup.json
[17:41:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:41:43] <stashbot>	 T306560: Fix nullability of img_major_mime and oi_major_mime - https://phabricator.wikimedia.org/T306560
[17:43:29] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1098:3316', diff saved to https://phabricator.wikimedia.org/P27101 and previous config saved to /var/cache/conftool/dbconfig/20220429-174328-ladsgroup.json
[17:43:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:43:50] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P27102 and previous config saved to /var/cache/conftool/dbconfig/20220429-174350-ladsgroup.json
[17:43:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:44:18] <jinxer-wm>	 (ProbeDown) resolved: Service thumbor:8800 has failed probes (http_thumbor_ip4) - https://wikitech.wikimedia.org/wiki/Network_monitoring#ProbeDown - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=http - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[17:50:03] <wikibugs>	 (CR) Andrew Bogott: [C: +2] wmfkeystonehooks: raise loglevel for ldap failures [puppet] - https://gerrit.wikimedia.org/r/787794 (owner: Andrew Bogott)
[17:51:14] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1161 (T298565)', diff saved to https://phabricator.wikimedia.org/P27103 and previous config saved to /var/cache/conftool/dbconfig/20220429-175114-ladsgroup.json
[17:51:16] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance
[17:51:17] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance
[17:51:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:51:22] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[17:51:22] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1113:3315 (T298565)', diff saved to https://phabricator.wikimedia.org/P27104 and previous config saved to /var/cache/conftool/dbconfig/20220429-175122-ladsgroup.json
[17:51:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:51:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:51:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:54:12] <wikibugs>	 SRE, Data-Engineering, Data-Engineering-Kanban, Traffic: intake-analytics is responsible for up to a 85% of varnish backend fetch errors - https://phabricator.wikimedia.org/T306181 (Ottomata) > perhaps this is a client browser opening a connection but sending an empty POST body This seems likely,...
[17:56:12] <wikibugs>	 (PS6) Luke Bowmaker: Image Suggestions Feedback Stream [mediawiki-config] - https://gerrit.wikimedia.org/r/787749
[17:56:42] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1157', diff saved to https://phabricator.wikimedia.org/P27105 and previous config saved to /var/cache/conftool/dbconfig/20220429-175642-ladsgroup.json
[17:56:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:57:58] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T298565)', diff saved to https://phabricator.wikimedia.org/P27106 and previous config saved to /var/cache/conftool/dbconfig/20220429-175757-ladsgroup.json
[17:58:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:58:04] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[17:58:34] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1098:3316 (T298295)', diff saved to https://phabricator.wikimedia.org/P27107 and previous config saved to /var/cache/conftool/dbconfig/20220429-175833-ladsgroup.json
[17:58:35] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance
[17:58:37] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance
[17:58:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:58:41] <stashbot>	 T298295: Fix length of columns page_restrictions.pr_level/pr_type on wmf wikis - https://phabricator.wikimedia.org/T298295
[17:58:42] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1113:3316 (T298295)', diff saved to https://phabricator.wikimedia.org/P27108 and previous config saved to /var/cache/conftool/dbconfig/20220429-175841-ladsgroup.json
[17:58:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:58:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:58:55] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1121 (T306560)', diff saved to https://phabricator.wikimedia.org/P27109 and previous config saved to /var/cache/conftool/dbconfig/20220429-175855-ladsgroup.json
[17:58:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:58:57] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1141.eqiad.wmnet with reason: Maintenance
[17:58:58] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1141.eqiad.wmnet with reason: Maintenance
[17:59:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:59:02] <stashbot>	 T306560: Fix nullability of img_major_mime and oi_major_mime - https://phabricator.wikimedia.org/T306560
[17:59:03] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1141 (T306560)', diff saved to https://phabricator.wikimedia.org/P27110 and previous config saved to /var/cache/conftool/dbconfig/20220429-175903-ladsgroup.json
[17:59:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:59:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:59:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:59:52] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1113:3316 (T298295)', diff saved to https://phabricator.wikimedia.org/P27111 and previous config saved to /var/cache/conftool/dbconfig/20220429-175951-ladsgroup.json
[17:59:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:02:07] <icinga-wm>	 PROBLEM - SSH on pki2001.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[18:09:17] <icinga-wm>	 RECOVERY - SSH on wtp1045.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[18:11:30] <logmsgbot>	 !log mvernon@cumin1001 START - Cookbook sre.hosts.reboot-single for host ms-be1040.eqiad.wmnet
[18:11:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:11:45] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1141 (T306560)', diff saved to https://phabricator.wikimedia.org/P27112 and previous config saved to /var/cache/conftool/dbconfig/20220429-181145-ladsgroup.json
[18:11:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:11:52] <stashbot>	 T306560: Fix nullability of img_major_mime and oi_major_mime - https://phabricator.wikimedia.org/T306560
[18:11:53] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1157', diff saved to https://phabricator.wikimedia.org/P27113 and previous config saved to /var/cache/conftool/dbconfig/20220429-181153-ladsgroup.json
[18:11:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:13:03] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P27114 and previous config saved to /var/cache/conftool/dbconfig/20220429-181302-ladsgroup.json
[18:13:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:14:57] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1113:3316', diff saved to https://phabricator.wikimedia.org/P27115 and previous config saved to /var/cache/conftool/dbconfig/20220429-181456-ladsgroup.json
[18:15:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:17:23] <icinga-wm>	 RECOVERY - Disk space on ms-be1040 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=ms-be1040&var-datasource=eqiad+prometheus/ops
[18:21:37] <logmsgbot>	 !log mvernon@cumin1001 END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host ms-be1040.eqiad.wmnet
[18:21:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:21:59] <icinga-wm>	 PROBLEM - Check systemd state on ms-be1040 is CRITICAL: CRITICAL - degraded: The following units failed: srv-swift\x2dstorage-sdc1.mount,srv-swift\x2dstorage-sdf1.mount,srv-swift\x2dstorage-sdh1.mount,srv-swift\x2dstorage-sdk1.mount,srv-swift\x2dstorage-sdm1.mount https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[18:25:55] <icinga-wm>	 ACKNOWLEDGEMENT - Check systemd state on ms-be1040 is CRITICAL: CRITICAL - degraded: The following units failed: srv-swift\x2dstorage-sdc1.mount,srv-swift\x2dstorage-sdf1.mount,srv-swift\x2dstorage-sdh1.mount,srv-swift\x2dstorage-sdk1.mount,srv-swift\x2dstorage-sdm1.mount MVernon filesystems are sad system is attempting repair. - The acknowledgement expires at: 2022-05-03 10:25:05. https://wikitech.wikimedia.org/wiki/Monitoring/check_syst
[18:25:55] <icinga-wm>	 e
[18:26:15] <icinga-wm>	 RECOVERY - SSH on wtp1040.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[18:26:55] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P27116 and previous config saved to /var/cache/conftool/dbconfig/20220429-182653-ladsgroup.json
[18:27:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:27:01] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1157 (T306560)', diff saved to https://phabricator.wikimedia.org/P27117 and previous config saved to /var/cache/conftool/dbconfig/20220429-182700-ladsgroup.json
[18:27:03] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1112.eqiad.wmnet with reason: Maintenance
[18:27:05] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1112.eqiad.wmnet with reason: Maintenance
[18:27:06] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
[18:27:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:27:08] <stashbot>	 T306560: Fix nullability of img_major_mime and oi_major_mime - https://phabricator.wikimedia.org/T306560
[18:27:09] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
[18:27:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:27:15] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1112 (T306560)', diff saved to https://phabricator.wikimedia.org/P27118 and previous config saved to /var/cache/conftool/dbconfig/20220429-182714-ladsgroup.json
[18:27:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:27:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:27:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:27:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:28:08] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P27119 and previous config saved to /var/cache/conftool/dbconfig/20220429-182807-ladsgroup.json
[18:28:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:30:02] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1113:3316', diff saved to https://phabricator.wikimedia.org/P27120 and previous config saved to /var/cache/conftool/dbconfig/20220429-183001-ladsgroup.json
[18:30:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:30:52] <wikibugs>	 (PS5) Bking: Elastic: test puppet logic [puppet] - https://gerrit.wikimedia.org/r/787505 (https://phabricator.wikimedia.org/T299797)
[18:31:53] <wikibugs>	 (CR) jerkins-bot: [V: -1] Elastic: test puppet logic [puppet] - https://gerrit.wikimedia.org/r/787505 (https://phabricator.wikimedia.org/T299797) (owner: Bking)
[18:32:57] <icinga-wm>	 PROBLEM - SSH on analytics1061.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[18:37:33] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1100.eqiad.wmnet with reason: Maintenance
[18:37:35] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1100.eqiad.wmnet with reason: Maintenance
[18:37:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:37:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:37:55] <jinxer-wm>	 (NodeTextfileStale) firing: (3) Stale textfile for elastic1075:9100 - https://wikitech.wikimedia.org/wiki/Prometheus#Stale_file_for_node-exporter_textfile - https://grafana.wikimedia.org/d/knkl4dCWz/node-exporter-textfile - https://alerts.wikimedia.org/?q=alertname%3DNodeTextfileStale
[18:38:24] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
[18:38:26] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
[18:38:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:38:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:39:12] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1123.eqiad.wmnet with reason: Maintenance
[18:39:13] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1123.eqiad.wmnet with reason: Maintenance
[18:39:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:39:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:42:00] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P27121 and previous config saved to /var/cache/conftool/dbconfig/20220429-184200-ladsgroup.json
[18:42:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:43:13] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T298565)', diff saved to https://phabricator.wikimedia.org/P27122 and previous config saved to /var/cache/conftool/dbconfig/20220429-184313-ladsgroup.json
[18:43:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:43:20] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[18:44:01] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1161.eqiad.wmnet with reason: Maintenance
[18:44:02] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1161.eqiad.wmnet with reason: Maintenance
[18:44:03] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
[18:44:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:44:07] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
[18:44:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:44:12] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1161 (T298565)', diff saved to https://phabricator.wikimedia.org/P27123 and previous config saved to /var/cache/conftool/dbconfig/20220429-184411-ladsgroup.json
[18:44:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:44:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:44:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:45:07] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1113:3316 (T298295)', diff saved to https://phabricator.wikimedia.org/P27124 and previous config saved to /var/cache/conftool/dbconfig/20220429-184506-ladsgroup.json
[18:45:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:45:13] <stashbot>	 T298295: Fix length of columns page_restrictions.pr_level/pr_type on wmf wikis - https://phabricator.wikimedia.org/T298295
[18:48:40] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1173.eqiad.wmnet with reason: Maintenance
[18:48:41] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1173.eqiad.wmnet with reason: Maintenance
[18:48:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:48:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:50:34] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1161 (T298565)', diff saved to https://phabricator.wikimedia.org/P27125 and previous config saved to /var/cache/conftool/dbconfig/20220429-185034-ladsgroup.json
[18:50:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:50:41] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[18:50:50] <wikibugs>	 (PS6) Bking: Elastic: test puppet logic [puppet] - https://gerrit.wikimedia.org/r/787505 (https://phabricator.wikimedia.org/T299797)
[18:51:09] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1112 (T306560)', diff saved to https://phabricator.wikimedia.org/P27126 and previous config saved to /var/cache/conftool/dbconfig/20220429-185109-ladsgroup.json
[18:51:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:51:16] <stashbot>	 T306560: Fix nullability of img_major_mime and oi_major_mime - https://phabricator.wikimedia.org/T306560
[18:51:26] <wikibugs>	 (CR) jerkins-bot: [V: -1] Elastic: test puppet logic [puppet] - https://gerrit.wikimedia.org/r/787505 (https://phabricator.wikimedia.org/T299797) (owner: Bking)
[18:56:22] <wikibugs>	 (PS7) Bking: Elastic: test puppet logic [puppet] - https://gerrit.wikimedia.org/r/787505 (https://phabricator.wikimedia.org/T299797)
[18:57:05] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1141 (T306560)', diff saved to https://phabricator.wikimedia.org/P27127 and previous config saved to /var/cache/conftool/dbconfig/20220429-185705-ladsgroup.json
[18:57:07] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
[18:57:08] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
[18:57:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:57:13] <stashbot>	 T306560: Fix nullability of img_major_mime and oi_major_mime - https://phabricator.wikimedia.org/T306560
[18:57:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:57:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:57:36] <wikibugs>	 (CR) jerkins-bot: [V: -1] Elastic: test puppet logic [puppet] - https://gerrit.wikimedia.org/r/787505 (https://phabricator.wikimedia.org/T299797) (owner: Bking)
[19:00:44] <wikibugs>	 (PS8) Bking: Elastic: test puppet logic [puppet] - https://gerrit.wikimedia.org/r/787505 (https://phabricator.wikimedia.org/T299797)
[19:01:44] <wikibugs>	 (CR) jerkins-bot: [V: -1] Elastic: test puppet logic [puppet] - https://gerrit.wikimedia.org/r/787505 (https://phabricator.wikimedia.org/T299797) (owner: Bking)
[19:01:55] <jinxer-wm>	 (NodeTextfileStale) firing: Stale textfile for ms-be2067:9100 - https://wikitech.wikimedia.org/wiki/Prometheus#Stale_file_for_node-exporter_textfile - https://grafana.wikimedia.org/d/knkl4dCWz/node-exporter-textfile - https://alerts.wikimedia.org/?q=alertname%3DNodeTextfileStale
[19:03:15] <icinga-wm>	 RECOVERY - SSH on pki2001.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[19:05:39] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P27128 and previous config saved to /var/cache/conftool/dbconfig/20220429-190539-ladsgroup.json
[19:05:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:06:14] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P27129 and previous config saved to /var/cache/conftool/dbconfig/20220429-190614-ladsgroup.json
[19:06:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:08:12] <wikibugs>	 SRE, DC-Ops: Confirm support of PERC 750 raid controller - https://phabricator.wikimedia.org/T297913 (RobH)
[19:08:21] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1145.eqiad.wmnet with reason: Maintenance
[19:08:22] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1145.eqiad.wmnet with reason: Maintenance
[19:08:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:08:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:17:04] <wikibugs>	 (PS9) Bking: Elastic: test puppet logic [puppet] - https://gerrit.wikimedia.org/r/787505 (https://phabricator.wikimedia.org/T299797)
[19:18:30] <wikibugs>	 (CR) jerkins-bot: [V: -1] Elastic: test puppet logic [puppet] - https://gerrit.wikimedia.org/r/787505 (https://phabricator.wikimedia.org/T299797) (owner: Bking)
[19:19:26] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1142.eqiad.wmnet with reason: Maintenance
[19:19:28] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1142.eqiad.wmnet with reason: Maintenance
[19:19:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:19:33] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1142 (T306560)', diff saved to https://phabricator.wikimedia.org/P27130 and previous config saved to /var/cache/conftool/dbconfig/20220429-191932-ladsgroup.json
[19:19:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:19:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:19:44] <stashbot>	 T306560: Fix nullability of img_major_mime and oi_major_mime - https://phabricator.wikimedia.org/T306560
[19:19:50] <wikibugs>	 (CR) Bking: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/787505 (https://phabricator.wikimedia.org/T299797) (owner: Bking)
[19:20:44] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P27131 and previous config saved to /var/cache/conftool/dbconfig/20220429-192044-ladsgroup.json
[19:20:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:21:19] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P27132 and previous config saved to /var/cache/conftool/dbconfig/20220429-192119-ladsgroup.json
[19:21:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:22:45] <jinxer-wm>	 (KubernetesRsyslogDown) firing: (4) rsyslog on ml-staging-ctrl2001:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[19:25:54] <wikibugs>	 (PS1) Papaul: Add new aqs node to site.pp and to netboot.cfg [puppet] - https://gerrit.wikimedia.org/r/787812 (https://phabricator.wikimedia.org/T305568)
[19:26:35] <wikibugs>	 (PS10) Bking: Elastic: test puppet logic [puppet] - https://gerrit.wikimedia.org/r/787505 (https://phabricator.wikimedia.org/T299797)
[19:27:11] <wikibugs>	 (CR) jerkins-bot: [V: -1] Elastic: test puppet logic [puppet] - https://gerrit.wikimedia.org/r/787505 (https://phabricator.wikimedia.org/T299797) (owner: Bking)
[19:29:35] <wikibugs>	 (CR) Papaul: [C: +2] Add new aqs node to site.pp and to netboot.cfg [puppet] - https://gerrit.wikimedia.org/r/787812 (https://phabricator.wikimedia.org/T305568) (owner: Papaul)
[19:29:47] <wikibugs>	 (PS2) Papaul: Add new aqs node to site.pp and to netboot.cfg [puppet] - https://gerrit.wikimedia.org/r/787812 (https://phabricator.wikimedia.org/T305568)
[19:31:21] <wikibugs>	 (PS11) Bking: Elastic: test puppet logic [puppet] - https://gerrit.wikimedia.org/r/787505 (https://phabricator.wikimedia.org/T299797)
[19:32:03] <wikibugs>	 (CR) jerkins-bot: [V: -1] Elastic: test puppet logic [puppet] - https://gerrit.wikimedia.org/r/787505 (https://phabricator.wikimedia.org/T299797) (owner: Bking)
[19:32:30] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1142 (T306560)', diff saved to https://phabricator.wikimedia.org/P27133 and previous config saved to /var/cache/conftool/dbconfig/20220429-193230-ladsgroup.json
[19:32:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:32:38] <stashbot>	 T306560: Fix nullability of img_major_mime and oi_major_mime - https://phabricator.wikimedia.org/T306560
[19:33:32] <wikibugs>	 (PS12) Bking: Elastic: test puppet logic [puppet] - https://gerrit.wikimedia.org/r/787505 (https://phabricator.wikimedia.org/T299797)
[19:33:34] <wikibugs>	 (PS1) Cwhite: opensearch: enable curator version override [puppet] - https://gerrit.wikimedia.org/r/787816 (https://phabricator.wikimedia.org/T301017)
[19:34:07] <wikibugs>	 (CR) jerkins-bot: [V: -1] Elastic: test puppet logic [puppet] - https://gerrit.wikimedia.org/r/787505 (https://phabricator.wikimedia.org/T299797) (owner: Bking)
[19:34:49] <wikibugs>	 SRE, ops-eqiad, DC-Ops, cloud-services-team (Hardware): Q4:(Need By: TBD) rack/setup/install cloudcontrol100[6-7].wikimedia.org - https://phabricator.wikimedia.org/T306853 (Andrew)  > I'm also interested in knowing how they work, will both servers be a redundant pair? if so how does failover work...
[19:35:48] <wikibugs>	 (CR) Cwhite: [C: +2] "PCC NOOP https://puppet-compiler.wmflabs.org/pcc-worker1003/35011/" [puppet] - https://gerrit.wikimedia.org/r/787816 (https://phabricator.wikimedia.org/T301017) (owner: Cwhite)
[19:35:50] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1161 (T298565)', diff saved to https://phabricator.wikimedia.org/P27134 and previous config saved to /var/cache/conftool/dbconfig/20220429-193549-ladsgroup.json
[19:35:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:35:57] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[19:36:24] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1112 (T306560)', diff saved to https://phabricator.wikimedia.org/P27135 and previous config saved to /var/cache/conftool/dbconfig/20220429-193624-ladsgroup.json
[19:36:26] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
[19:36:27] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
[19:36:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:36:32] <logmsgbot>	 !log pt1979@cumin2002 START - Cookbook sre.hosts.reimage for host aqs2001.codfw.wmnet with OS bullseye
[19:36:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:36:38] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1161.eqiad.wmnet with reason: Maintenance
[19:36:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:36:40] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1161.eqiad.wmnet with reason: Maintenance
[19:36:41] <wikibugs>	 SRE, ops-codfw, Cassandra, DC-Ops, Patch-For-Review: Q4:(Need By: TBD) rack/setup/install aqs2001-aqs2012 - https://phabricator.wikimedia.org/T305568 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin2002 for host aqs2001.codfw.wmnet with OS bullseye
[19:36:41] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
[19:36:45] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
[19:36:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:36:49] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1161 (T298565)', diff saved to https://phabricator.wikimedia.org/P27136 and previous config saved to /var/cache/conftool/dbconfig/20220429-193649-ladsgroup.json
[19:36:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:36:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:37:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:37:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:37:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:37:23] <wikibugs>	 (PS13) Bking: Elastic: test puppet logic [puppet] - https://gerrit.wikimedia.org/r/787505 (https://phabricator.wikimedia.org/T299797)
[19:37:57] <wikibugs>	 (CR) jerkins-bot: [V: -1] Elastic: test puppet logic [puppet] - https://gerrit.wikimedia.org/r/787505 (https://phabricator.wikimedia.org/T299797) (owner: Bking)
[19:38:46] <wikibugs>	 SRE, serviceops: Service Ops SRE support for iOS notifications update - https://phabricator.wikimedia.org/T306397 (Dzahn)
[19:39:15] <wikibugs>	 SRE, serviceops: Service Ops SRE support for iOS notifications update - https://phabricator.wikimedia.org/T306397 (Dzahn)
[19:41:15] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1175.eqiad.wmnet with reason: Maintenance
[19:41:17] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1175.eqiad.wmnet with reason: Maintenance
[19:41:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:41:22] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1175 (T306560)', diff saved to https://phabricator.wikimedia.org/P27137 and previous config saved to /var/cache/conftool/dbconfig/20220429-194122-ladsgroup.json
[19:41:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:41:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:41:32] <stashbot>	 T306560: Fix nullability of img_major_mime and oi_major_mime - https://phabricator.wikimedia.org/T306560
[19:43:09] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1161 (T298565)', diff saved to https://phabricator.wikimedia.org/P27138 and previous config saved to /var/cache/conftool/dbconfig/20220429-194308-ladsgroup.json
[19:43:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:43:16] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[19:44:23] <wikibugs>	 SRE, ops-codfw, DC-Ops, GitLab (Infrastructure): Q3:(Need By: TBD) rack/setup/install gitlab200[2|3] and gitlab-runner200[2|3|4] - https://phabricator.wikimedia.org/T301183 (Dzahn)
[19:44:31] <wikibugs>	 SRE, ops-eqiad, DC-Ops, serviceops, GitLab (Infrastructure): Q3:(Need By: TBD) rack/setup/install gitlab100[3|4] and gitlab-runner100[2|3|4] - https://phabricator.wikimedia.org/T301177 (Dzahn)
[19:44:41] <wikibugs>	 SRE, ops-eqiad, DC-Ops, serviceops, GitLab (Infrastructure): Q3:(Need By: TBD) rack/setup/install gitlab100[3|4] and gitlab-runner100[2|3|4] - https://phabricator.wikimedia.org/T301177 (Dzahn) Thank you @Cmjohnson   We continue this on T307142
[19:47:35] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P27139 and previous config saved to /var/cache/conftool/dbconfig/20220429-194735-ladsgroup.json
[19:47:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:49:27] <wikibugs>	 SRE, ops-codfw, DC-Ops, GitLab (Infrastructure): Q3:(Need By: TBD) rack/setup/install gitlab200[2|3] and gitlab-runner200[2|3|4] - https://phabricator.wikimedia.org/T301183 (Dzahn) Thank you @Papaul. We continue on T307142
[19:50:55] <jinxer-wm>	 (NodeTextfileStale) firing: Stale textfile for cloudcontrol2001-dev:9100 - https://wikitech.wikimedia.org/wiki/Prometheus#Stale_file_for_node-exporter_textfile - https://grafana.wikimedia.org/d/knkl4dCWz/node-exporter-textfile - https://alerts.wikimedia.org/?q=alertname%3DNodeTextfileStale
[19:52:03] <wikibugs>	 (PS1) Dzahn: add gitlab-runner role on new physical server gitlab-runner2002 [puppet] - https://gerrit.wikimedia.org/r/787820 (https://phabricator.wikimedia.org/T307142)
[19:56:22] <wikibugs>	 SRE, serviceops: Service Ops SRE support for iOS notifications update - https://phabricator.wikimedia.org/T306397 (Dmantena)
[19:58:14] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P27140 and previous config saved to /var/cache/conftool/dbconfig/20220429-195813-ladsgroup.json
[19:58:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:02:40] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P27141 and previous config saved to /var/cache/conftool/dbconfig/20220429-200240-ladsgroup.json
[20:02:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:08:03] <logmsgbot>	 !log pt1979@cumin2002 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aqs2001.codfw.wmnet with OS bullseye
[20:08:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:08:11] <wikibugs>	 SRE, ops-codfw, Cassandra, DC-Ops, Patch-For-Review: Q4:(Need By: TBD) rack/setup/install aqs2001-aqs2012 - https://phabricator.wikimedia.org/T305568 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin2002 for host aqs2001.codfw.wmnet with OS bullseye executed wi...
[20:11:18] <logmsgbot>	 !log pt1979@cumin2002 START - Cookbook sre.hosts.reimage for host aqs2001.codfw.wmnet with OS bullseye
[20:11:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:11:35] <wikibugs>	 SRE, ops-codfw, Cassandra, DC-Ops, Patch-For-Review: Q4:(Need By: TBD) rack/setup/install aqs2001-aqs2012 - https://phabricator.wikimedia.org/T305568 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin2002 for host aqs2001.codfw.wmnet with OS bullseye
[20:11:35] <icinga-wm>	 PROBLEM - SSH on aqs1008.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[20:11:57] <wikibugs>	 (PS1) Cwhite: opensearch: set USE_OPENSEARCH curator env variable [puppet] - https://gerrit.wikimedia.org/r/787824 (https://phabricator.wikimedia.org/T301017)
[20:13:19] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P27142 and previous config saved to /var/cache/conftool/dbconfig/20220429-201319-ladsgroup.json
[20:13:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:14:30] <wikibugs>	 (PS14) Bking: Elastic: test puppet logic [puppet] - https://gerrit.wikimedia.org/r/787505 (https://phabricator.wikimedia.org/T299797)
[20:15:28] <logmsgbot>	 !log pt1979@cumin2002 START - Cookbook sre.hosts.reimage for host aqs2002.codfw.wmnet with OS bullseye
[20:15:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:15:35] <wikibugs>	 SRE, ops-codfw, Cassandra, DC-Ops, Patch-For-Review: Q4:(Need By: TBD) rack/setup/install aqs2001-aqs2012 - https://phabricator.wikimedia.org/T305568 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin2002 for host aqs2002.codfw.wmnet with OS bullseye
[20:15:39] <wikibugs>	 (CR) jerkins-bot: [V: -1] Elastic: test puppet logic [puppet] - https://gerrit.wikimedia.org/r/787505 (https://phabricator.wikimedia.org/T299797) (owner: Bking)
[20:16:14] <logmsgbot>	 !log pt1979@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on aqs2001.codfw.wmnet with reason: host reimage
[20:16:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:17:37] <wikibugs>	 (PS1) Cwhite: beta-logs: disable compatibility mode [puppet] - https://gerrit.wikimedia.org/r/787826 (https://phabricator.wikimedia.org/T301017)
[20:17:46] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1142 (T306560)', diff saved to https://phabricator.wikimedia.org/P27143 and previous config saved to /var/cache/conftool/dbconfig/20220429-201745-ladsgroup.json
[20:17:47] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1143.eqiad.wmnet with reason: Maintenance
[20:17:48] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1143.eqiad.wmnet with reason: Maintenance
[20:17:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:17:52] <stashbot>	 T306560: Fix nullability of img_major_mime and oi_major_mime - https://phabricator.wikimedia.org/T306560
[20:17:54] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1143 (T306560)', diff saved to https://phabricator.wikimedia.org/P27144 and previous config saved to /var/cache/conftool/dbconfig/20220429-201753-ladsgroup.json
[20:17:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:18:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:18:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:19:39] <logmsgbot>	 !log pt1979@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs2001.codfw.wmnet with reason: host reimage
[20:19:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:19:55] <wikibugs>	 SRE, ops-drmrs, ops-esams, Infrastructure-Foundations, netops: drmrs-esams wave provisioning - https://phabricator.wikimedia.org/T307221 (wiki_willy) @RobH - here are the LOAs in pdf format below:  {F35074530}  {F35074529}  Thanks, Willy
[20:28:24] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1161 (T298565)', diff saved to https://phabricator.wikimedia.org/P27145 and previous config saved to /var/cache/conftool/dbconfig/20220429-202824-ladsgroup.json
[20:28:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:28:32] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[20:30:01] <wikibugs>	 (PS1) Jforrester: [Beta Cluster] Set special footer licence message for Wikifunctions [mediawiki-config] - https://gerrit.wikimedia.org/r/787828 (https://phabricator.wikimedia.org/T297330)
[20:30:03] <wikibugs>	 (PS1) Jforrester: Set special footer licence message for MediaWiki.org re. Help: pages [mediawiki-config] - https://gerrit.wikimedia.org/r/787829 (https://phabricator.wikimedia.org/T301483)
[20:30:45] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1143 (T306560)', diff saved to https://phabricator.wikimedia.org/P27146 and previous config saved to /var/cache/conftool/dbconfig/20220429-203045-ladsgroup.json
[20:30:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:30:52] <stashbot>	 T306560: Fix nullability of img_major_mime and oi_major_mime - https://phabricator.wikimedia.org/T306560
[20:31:17] <logmsgbot>	 !log pt1979@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs2001.codfw.wmnet with OS bullseye
[20:31:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:31:23] <wikibugs>	 SRE, ops-codfw, Cassandra, DC-Ops: Q4:(Need By: TBD) rack/setup/install aqs2001-aqs2012 - https://phabricator.wikimedia.org/T305568 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin2002 for host aqs2001.codfw.wmnet with OS bullseye completed: - aqs2001 (**PASS**)...
[20:31:39] <wikibugs>	 SRE, serviceops: Q1:(Need By: TBD) rack/setup/install mw241[2-9].codfw.wmnet - https://phabricator.wikimedia.org/T290192 (Dzahn)
[20:34:34] <wikibugs>	 SRE, serviceops: Q1:(Need By: TBD) rack/setup/install mw241[2-9].codfw.wmnet - https://phabricator.wikimedia.org/T290192 (Dzahn) Open→Resolved >>! In T290192#7886070, @Papaul wrote: > @Dzahn i think it is best to create another task for this issue and not reopen the rack/setup task. Thanks  repla...
[20:34:41] <icinga-wm>	 RECOVERY - SSH on analytics1061.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[20:34:42] <wikibugs>	 SRE, serviceops: Q1:(Need By: TBD) rack/setup/install mw241[2-9].codfw.wmnet - https://phabricator.wikimedia.org/T290192 (Dzahn) a:Dzahn→Papaul
[20:35:28] <wikibugs>	 (PS15) Bking: Elastic: test puppet logic [puppet] - https://gerrit.wikimedia.org/r/787505 (https://phabricator.wikimedia.org/T299797)
[20:36:25] <wikibugs>	 (CR) jerkins-bot: [V: -1] Elastic: test puppet logic [puppet] - https://gerrit.wikimedia.org/r/787505 (https://phabricator.wikimedia.org/T299797) (owner: Bking)
[20:41:37] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1175 (T306560)', diff saved to https://phabricator.wikimedia.org/P27147 and previous config saved to /var/cache/conftool/dbconfig/20220429-204136-ladsgroup.json
[20:41:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:41:44] <stashbot>	 T306560: Fix nullability of img_major_mime and oi_major_mime - https://phabricator.wikimedia.org/T306560
[20:44:13] <wikibugs>	 (PS16) Bking: Elastic: test puppet logic [puppet] - https://gerrit.wikimedia.org/r/787505 (https://phabricator.wikimedia.org/T299797)
[20:44:18] <jinxer-wm>	 (ProbeDown) firing: Service thumbor:8800 has failed probes (http_thumbor_ip4) - https://wikitech.wikimedia.org/wiki/Network_monitoring#ProbeDown - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=http - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[20:45:51] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P27148 and previous config saved to /var/cache/conftool/dbconfig/20220429-204550-ladsgroup.json
[20:45:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:46:23] <logmsgbot>	 !log pt1979@cumin2002 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aqs2002.codfw.wmnet with OS bullseye
[20:46:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:46:28] <wikibugs>	 SRE, ops-codfw, Cassandra, DC-Ops: Q4:(Need By: TBD) rack/setup/install aqs2001-aqs2012 - https://phabricator.wikimedia.org/T305568 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin2002 for host aqs2002.codfw.wmnet with OS bullseye executed with errors: - aqs2002 (...
[20:46:32] <wikibugs>	 (CR) jerkins-bot: [V: -1] Elastic: test puppet logic [puppet] - https://gerrit.wikimedia.org/r/787505 (https://phabricator.wikimedia.org/T299797) (owner: Bking)
[20:49:18] <jinxer-wm>	 (ProbeDown) resolved: Service thumbor:8800 has failed probes (http_thumbor_ip4) - https://wikitech.wikimedia.org/wiki/Network_monitoring#ProbeDown - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=http - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[20:52:14] <wikibugs>	 (PS17) Bking: Elastic: test puppet logic [puppet] - https://gerrit.wikimedia.org/r/787505 (https://phabricator.wikimedia.org/T299797)
[20:54:19] <wikibugs>	 (CR) Andrew Bogott: [C: +1] "lgtm! I'd like to be around when you merge though." [puppet] - https://gerrit.wikimedia.org/r/779936 (https://phabricator.wikimedia.org/T305589) (owner: Ssingh)
[20:54:21] <wikibugs>	 (CR) jerkins-bot: [V: -1] Elastic: test puppet logic [puppet] - https://gerrit.wikimedia.org/r/787505 (https://phabricator.wikimedia.org/T299797) (owner: Bking)
[20:56:42] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P27149 and previous config saved to /var/cache/conftool/dbconfig/20220429-205641-ladsgroup.json
[20:56:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:00:56] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P27150 and previous config saved to /var/cache/conftool/dbconfig/20220429-210055-ladsgroup.json
[21:01:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:01:25] <wikibugs>	 (CR) Andrew Bogott: [C: +2] P:openstack::nova::compute::service: replace query_nodes with role_hosts [puppet] - https://gerrit.wikimedia.org/r/787484 (owner: Jbond)
[21:02:26] <wikibugs>	 (CR) Andrew Bogott: [C: +2] P:openstack::haproxy: enable built-in prometheus exporter [puppet] - https://gerrit.wikimedia.org/r/786783 (owner: Majavah)
[21:04:29] <wikibugs>	 (PS1) PipelineBot: mathoid: pipeline bot promote [deployment-charts] - https://gerrit.wikimedia.org/r/787831
[21:09:43] <icinga-wm>	 PROBLEM - BGP status on cr2-eqiad is CRITICAL: BGP CRITICAL - AS64605/IPv6: Active - Anycast https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[21:11:33] <wikibugs>	 (CR) Andrew Bogott: [C: +1] "LGTM but let's not merge on a Friday" [puppet] - https://gerrit.wikimedia.org/r/785110 (https://phabricator.wikimedia.org/T274666) (owner: Majavah)
[21:11:47] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P27151 and previous config saved to /var/cache/conftool/dbconfig/20220429-211146-ladsgroup.json
[21:11:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:12:17] <icinga-wm>	 RECOVERY - SSH on aqs1008.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[21:13:40] <wikibugs>	 (CR) Andrew Bogott: [C: +2] P:openstack::rabbitmq: cleanup [puppet] - https://gerrit.wikimedia.org/r/787003 (owner: Majavah)
[21:14:32] <wikibugs>	 (PS18) Bking: Elastic: test puppet logic [puppet] - https://gerrit.wikimedia.org/r/787505 (https://phabricator.wikimedia.org/T299797)
[21:16:01] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1143 (T306560)', diff saved to https://phabricator.wikimedia.org/P27152 and previous config saved to /var/cache/conftool/dbconfig/20220429-211601-ladsgroup.json
[21:16:03] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1148.eqiad.wmnet with reason: Maintenance
[21:16:04] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1148.eqiad.wmnet with reason: Maintenance
[21:16:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:16:08] <stashbot>	 T306560: Fix nullability of img_major_mime and oi_major_mime - https://phabricator.wikimedia.org/T306560
[21:16:10] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1148 (T306560)', diff saved to https://phabricator.wikimedia.org/P27153 and previous config saved to /var/cache/conftool/dbconfig/20220429-211609-ladsgroup.json
[21:16:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:16:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:16:20] <wikibugs>	 (CR) jerkins-bot: [V: -1] Elastic: test puppet logic [puppet] - https://gerrit.wikimedia.org/r/787505 (https://phabricator.wikimedia.org/T299797) (owner: Bking)
[21:16:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:21:29] <logmsgbot>	 !log pt1979@cumin2002 START - Cookbook sre.hosts.reimage for host aqs2002.codfw.wmnet with OS bullseye
[21:21:33] <wikibugs>	 (PS19) Bking: Elastic: test puppet logic [puppet] - https://gerrit.wikimedia.org/r/787505 (https://phabricator.wikimedia.org/T299797)
[21:21:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:21:36] <wikibugs>	 SRE, ops-codfw, Cassandra, DC-Ops: Q4:(Need By: TBD) rack/setup/install aqs2001-aqs2012 - https://phabricator.wikimedia.org/T305568 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin2002 for host aqs2002.codfw.wmnet with OS bullseye
[21:23:21] <wikibugs>	 (CR) jerkins-bot: [V: -1] Elastic: test puppet logic [puppet] - https://gerrit.wikimedia.org/r/787505 (https://phabricator.wikimedia.org/T299797) (owner: Bking)
[21:26:26] <logmsgbot>	 !log pt1979@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on aqs2002.codfw.wmnet with reason: host reimage
[21:26:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:26:52] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1175 (T306560)', diff saved to https://phabricator.wikimedia.org/P27154 and previous config saved to /var/cache/conftool/dbconfig/20220429-212652-ladsgroup.json
[21:26:56] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db2105.codfw.wmnet with reason: Maintenance
[21:26:57] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2105.codfw.wmnet with reason: Maintenance
[21:26:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:26:58] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 12:00:00 on 6 hosts with reason: Maintenance
[21:26:58] <stashbot>	 T306560: Fix nullability of img_major_mime and oi_major_mime - https://phabricator.wikimedia.org/T306560
[21:27:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:27:03] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 6 hosts with reason: Maintenance
[21:27:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:27:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:27:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:28:08] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1148 (T306560)', diff saved to https://phabricator.wikimedia.org/P27155 and previous config saved to /var/cache/conftool/dbconfig/20220429-212808-ladsgroup.json
[21:28:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:29:53] <logmsgbot>	 !log pt1979@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs2002.codfw.wmnet with reason: host reimage
[21:29:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:34:41] <wikibugs>	 (PS20) Bking: Elastic: test puppet logic [puppet] - https://gerrit.wikimedia.org/r/787505 (https://phabricator.wikimedia.org/T299797)
[21:36:29] <wikibugs>	 (CR) jerkins-bot: [V: -1] Elastic: test puppet logic [puppet] - https://gerrit.wikimedia.org/r/787505 (https://phabricator.wikimedia.org/T299797) (owner: Bking)
[21:41:05] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
[21:41:06] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
[21:41:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:41:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:42:13] <logmsgbot>	 !log pt1979@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs2002.codfw.wmnet with OS bullseye
[21:42:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:42:19] <wikibugs>	 SRE, ops-codfw, Cassandra, DC-Ops: Q4:(Need By: TBD) rack/setup/install aqs2001-aqs2012 - https://phabricator.wikimedia.org/T305568 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin2002 for host aqs2002.codfw.wmnet with OS bullseye completed: - aqs2002 (**PASS**)...
[21:43:13] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P27156 and previous config saved to /var/cache/conftool/dbconfig/20220429-214313-ladsgroup.json
[21:43:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:54:37] <logmsgbot>	 !log pt1979@cumin2002 START - Cookbook sre.hosts.reimage for host aqs2003.codfw.wmnet with OS bullseye
[21:54:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:54:42] <wikibugs>	 SRE, ops-codfw, Cassandra, DC-Ops: Q4:(Need By: TBD) rack/setup/install aqs2001-aqs2012 - https://phabricator.wikimedia.org/T305568 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin2002 for host aqs2003.codfw.wmnet with OS bullseye
[21:58:18] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P27157 and previous config saved to /var/cache/conftool/dbconfig/20220429-215818-ladsgroup.json
[21:58:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:01:44] <wikibugs>	 (CR) Cwhite: [C: +2] logstash: transform rotation frequency values to datestamp format [puppet] - https://gerrit.wikimedia.org/r/777882 (https://phabricator.wikimedia.org/T305175) (owner: Cwhite)
[22:02:07] <wikibugs>	 (PS5) Cwhite: logstash: transform rotation frequency values to datestamp format [puppet] - https://gerrit.wikimedia.org/r/777882 (https://phabricator.wikimedia.org/T305175)
[22:13:24] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1148 (T306560)', diff saved to https://phabricator.wikimedia.org/P27158 and previous config saved to /var/cache/conftool/dbconfig/20220429-221323-ladsgroup.json
[22:13:25] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1149.eqiad.wmnet with reason: Maintenance
[22:13:26] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1149.eqiad.wmnet with reason: Maintenance
[22:13:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:13:31] <stashbot>	 T306560: Fix nullability of img_major_mime and oi_major_mime - https://phabricator.wikimedia.org/T306560
[22:13:31] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1149 (T306560)', diff saved to https://phabricator.wikimedia.org/P27159 and previous config saved to /var/cache/conftool/dbconfig/20220429-221331-ladsgroup.json
[22:13:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:13:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:13:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:15:26] <logmsgbot>	 !log pt1979@cumin2002 START - Cookbook sre.hosts.reimage for host aqs2004.codfw.wmnet with OS bullseye
[22:15:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:15:31] <wikibugs>	 SRE, ops-codfw, Cassandra, DC-Ops: Q4:(Need By: TBD) rack/setup/install aqs2001-aqs2012 - https://phabricator.wikimedia.org/T305568 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin2002 for host aqs2004.codfw.wmnet with OS bullseye
[22:15:38] <logmsgbot>	 !log pt1979@cumin2002 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aqs2004.codfw.wmnet with OS bullseye
[22:15:42] <wikibugs>	 SRE, ops-codfw, Cassandra, DC-Ops: Q4:(Need By: TBD) rack/setup/install aqs2001-aqs2012 - https://phabricator.wikimedia.org/T305568 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin2002 for host aqs2004.codfw.wmnet with OS bullseye executed with errors: - aqs2004 (...
[22:15:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:16:56] <logmsgbot>	 !log pt1979@cumin2002 START - Cookbook sre.hosts.reimage for host aqs2004.codfw.wmnet with OS bullseye
[22:17:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:17:00] <wikibugs>	 SRE, ops-codfw, Cassandra, DC-Ops: Q4:(Need By: TBD) rack/setup/install aqs2001-aqs2012 - https://phabricator.wikimedia.org/T305568 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin2002 for host aqs2004.codfw.wmnet with OS bullseye
[22:25:43] <logmsgbot>	 !log pt1979@cumin2002 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aqs2003.codfw.wmnet with OS bullseye
[22:25:48] <wikibugs>	 SRE, ops-codfw, Cassandra, DC-Ops: Q4:(Need By: TBD) rack/setup/install aqs2001-aqs2012 - https://phabricator.wikimedia.org/T305568 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin2002 for host aqs2003.codfw.wmnet with OS bullseye executed with errors: - aqs2003 (...
[22:25:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:26:21] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1149 (T306560)', diff saved to https://phabricator.wikimedia.org/P27160 and previous config saved to /var/cache/conftool/dbconfig/20220429-222620-ladsgroup.json
[22:26:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:26:27] <stashbot>	 T306560: Fix nullability of img_major_mime and oi_major_mime - https://phabricator.wikimedia.org/T306560
[22:28:48] <logmsgbot>	 !log pt1979@cumin2002 START - Cookbook sre.hosts.reimage for host aqs2003.codfw.wmnet with OS bullseye
[22:28:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:28:54] <wikibugs>	 SRE, ops-codfw, Cassandra, DC-Ops: Q4:(Need By: TBD) rack/setup/install aqs2001-aqs2012 - https://phabricator.wikimedia.org/T305568 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin2002 for host aqs2003.codfw.wmnet with OS bullseye
[22:33:44] <logmsgbot>	 !log pt1979@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on aqs2003.codfw.wmnet with reason: host reimage
[22:33:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:37:12] <logmsgbot>	 !log pt1979@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs2003.codfw.wmnet with reason: host reimage
[22:37:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:37:55] <jinxer-wm>	 (NodeTextfileStale) firing: (3) Stale textfile for elastic1075:9100 - https://wikitech.wikimedia.org/wiki/Prometheus#Stale_file_for_node-exporter_textfile - https://grafana.wikimedia.org/d/knkl4dCWz/node-exporter-textfile - https://alerts.wikimedia.org/?q=alertname%3DNodeTextfileStale
[22:41:26] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P27161 and previous config saved to /var/cache/conftool/dbconfig/20220429-224125-ladsgroup.json
[22:41:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:48:25] <logmsgbot>	 !log pt1979@cumin2002 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aqs2004.codfw.wmnet with OS bullseye
[22:48:29] <wikibugs>	 SRE, ops-codfw, Cassandra, DC-Ops: Q4:(Need By: TBD) rack/setup/install aqs2001-aqs2012 - https://phabricator.wikimedia.org/T305568 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin2002 for host aqs2004.codfw.wmnet with OS bullseye executed with errors: - aqs2004 (...
[22:48:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:49:26] <logmsgbot>	 !log pt1979@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs2003.codfw.wmnet with OS bullseye
[22:49:30] <wikibugs>	 SRE, ops-codfw, Cassandra, DC-Ops: Q4:(Need By: TBD) rack/setup/install aqs2001-aqs2012 - https://phabricator.wikimedia.org/T305568 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin2002 for host aqs2003.codfw.wmnet with OS bullseye completed: - aqs2003 (**PASS**)...
[22:49:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:56:31] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P27162 and previous config saved to /var/cache/conftool/dbconfig/20220429-225631-ladsgroup.json
[22:56:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:01:55] <jinxer-wm>	 (NodeTextfileStale) firing: Stale textfile for ms-be2067:9100 - https://wikitech.wikimedia.org/wiki/Prometheus#Stale_file_for_node-exporter_textfile - https://grafana.wikimedia.org/d/knkl4dCWz/node-exporter-textfile - https://alerts.wikimedia.org/?q=alertname%3DNodeTextfileStale
[23:11:36] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1149 (T306560)', diff saved to https://phabricator.wikimedia.org/P27163 and previous config saved to /var/cache/conftool/dbconfig/20220429-231136-ladsgroup.json
[23:11:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:11:43] <stashbot>	 T306560: Fix nullability of img_major_mime and oi_major_mime - https://phabricator.wikimedia.org/T306560
[23:22:45] <jinxer-wm>	 (KubernetesRsyslogDown) firing: (4) rsyslog on ml-staging-ctrl2001:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[23:38:40] <icinga-wm>	 PROBLEM - MariaDB Replica IO: s2 on db2101 is CRITICAL: CRITICAL slave_io_state Slave_IO_Running: No, Errno: 2026, Errmsg: error reconnecting to master repl@db2104.codfw.wmnet:3306 - retry-time: 60 maximum-retries: 86400 message: SSL connection error00000000:lib(0):func(0):reason(0) https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[23:38:50] <icinga-wm>	 PROBLEM - MariaDB Replica IO: s5 on db2101 is CRITICAL: CRITICAL slave_io_state Slave_IO_Running: No, Errno: 2026, Errmsg: error reconnecting to master repl@db2123.codfw.wmnet:3306 - retry-time: 60 maximum-retries: 86400 message: SSL connection error00000000:lib(0):func(0):reason(0) https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[23:39:04] <icinga-wm>	 PROBLEM - MariaDB Replica IO: x1 on db2101 is CRITICAL: CRITICAL slave_io_state Slave_IO_Running: No, Errno: 2026, Errmsg: error reconnecting to master repl@db2096.codfw.wmnet:3306 - retry-time: 60 maximum-retries: 86400 message: SSL connection error00000000:lib(0):func(0):reason(0) https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[23:40:52] <icinga-wm>	 RECOVERY - MariaDB Replica IO: s2 on db2101 is OK: OK slave_io_state Slave_IO_Running: Yes https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[23:41:02] <icinga-wm>	 RECOVERY - MariaDB Replica IO: s5 on db2101 is OK: OK slave_io_state Slave_IO_Running: Yes https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[23:41:18] <icinga-wm>	 RECOVERY - MariaDB Replica IO: x1 on db2101 is OK: OK slave_io_state Slave_IO_Running: Yes https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[23:42:12] <icinga-wm>	 PROBLEM - Check systemd state on alert1001 is CRITICAL: CRITICAL - degraded: The following units failed: certspotter.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[23:46:29] <wikibugs>	 (CR) Cwhite: [C: +2] beta-logs: disable compatibility mode [puppet] - https://gerrit.wikimedia.org/r/787826 (https://phabricator.wikimedia.org/T301017) (owner: Cwhite)
[23:50:55] <jinxer-wm>	 (NodeTextfileStale) firing: Stale textfile for cloudcontrol2001-dev:9100 - https://wikitech.wikimedia.org/wiki/Prometheus#Stale_file_for_node-exporter_textfile - https://grafana.wikimedia.org/d/knkl4dCWz/node-exporter-textfile - https://alerts.wikimedia.org/?q=alertname%3DNodeTextfileStale
[23:51:13] <wikibugs>	 (CR) BryanDavis: [C: +1] Enable $wgFixDoubleRedirects on officewiki [mediawiki-config] - https://gerrit.wikimedia.org/r/780636 (https://phabricator.wikimedia.org/T305782) (owner: MarcoAurelio)