[00:03:24] <icinga-wm>	 RECOVERY - Check systemd state on ms-be2028 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[01:09:52] <icinga-wm>	 PROBLEM - SSH on db2090.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[01:12:13] <wikibugs>	 (PS2) STran: Add IPInfo viewing rights for certain groups [mediawiki-config] - https://gerrit.wikimedia.org/r/766882 (https://phabricator.wikimedia.org/T296499)
[01:40:30] <jinxer-wm>	 (JobUnavailable) firing: (4) Reduced availability for job gitlab in eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org
[03:13:22] <icinga-wm>	 RECOVERY - SSH on db2090.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[04:55:36] <wikibugs>	 (PS2) Ladsgroup: db1147: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/767797 (https://phabricator.wikimedia.org/T302950) (owner: Gerrit maintenance bot)
[04:55:41] <wikibugs>	 (CR) Ladsgroup: [V: +2 C: +2] db1147: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/767797 (https://phabricator.wikimedia.org/T302950) (owner: Gerrit maintenance bot)
[05:14:48] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
[05:14:50] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
[05:14:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:14:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:15:31] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1147.eqiad.wmnet with reason: Maintenance
[05:15:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:15:33] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1147.eqiad.wmnet with reason: Maintenance
[05:15:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:15:38] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1147 (T302950)', diff saved to https://phabricator.wikimedia.org/P21858 and previous config saved to /var/cache/conftool/dbconfig/20220307-051537-ladsgroup.json
[05:15:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:15:40] <stashbot>	 T302950: Upgrade s4 to bullseye - https://phabricator.wikimedia.org/T302950
[05:18:01] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1169.eqiad.wmnet with reason: Maintenance
[05:18:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:18:03] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1169.eqiad.wmnet with reason: Maintenance
[05:18:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:18:08] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1169 (T300992)', diff saved to https://phabricator.wikimedia.org/P21859 and previous config saved to /var/cache/conftool/dbconfig/20220307-051807-ladsgroup.json
[05:18:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:18:10] <stashbot>	 T300992: Add linter_template and linter_tag columns to the Linter table - https://phabricator.wikimedia.org/T300992
[05:22:10] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.reimage for host db1147.eqiad.wmnet with OS bullseye
[05:22:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:22:58] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1169 (T300992)', diff saved to https://phabricator.wikimedia.org/P21860 and previous config saved to /var/cache/conftool/dbconfig/20220307-052257-ladsgroup.json
[05:23:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:24:57] <wikibugs>	 (PS1) Juan90264: Revert "Change temporary logo for slwiki" [mediawiki-config] - https://gerrit.wikimedia.org/r/768155
[05:27:52] <wikibugs>	 (PS2) Juan90264: Revert "Change temporary logo for slwiki" [mediawiki-config] - https://gerrit.wikimedia.org/r/768155 (https://phabricator.wikimedia.org/T302661)
[05:33:57] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on db1147.eqiad.wmnet with reason: host reimage
[05:33:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:36:31] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1147.eqiad.wmnet with reason: host reimage
[05:36:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:38:02] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P21861 and previous config saved to /var/cache/conftool/dbconfig/20220307-053802-ladsgroup.json
[05:38:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:40:56] <jinxer-wm>	 (JobUnavailable) firing: (3) Reduced availability for job gitlab in eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org
[05:51:14] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1147.eqiad.wmnet with OS bullseye
[05:51:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:53:07] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P21862 and previous config saved to /var/cache/conftool/dbconfig/20220307-055307-ladsgroup.json
[05:53:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:04:02] <wikibugs>	 SRE, ops-codfw, DBA: db2147 SMART error - https://phabricator.wikimedia.org/T302951 (Marostegui) I don't think we need to swap any disks here, they all seem fine to me: ` Media Error Count: 0 Other Error Count: 0 Media Error Count: 0 Other Error Count: 0 Media Error Count: 0 Other Error Count: 0 Medi...
[06:08:12] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1169 (T300992)', diff saved to https://phabricator.wikimedia.org/P21863 and previous config saved to /var/cache/conftool/dbconfig/20220307-060811-ladsgroup.json
[06:08:13] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1164.eqiad.wmnet with reason: Maintenance
[06:08:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:08:15] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1164.eqiad.wmnet with reason: Maintenance
[06:08:15] <stashbot>	 T300992: Add linter_template and linter_tag columns to the Linter table - https://phabricator.wikimedia.org/T300992
[06:08:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:08:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:08:19] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1164 (T300992)', diff saved to https://phabricator.wikimedia.org/P21864 and previous config saved to /var/cache/conftool/dbconfig/20220307-060819-ladsgroup.json
[06:08:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:13:18] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1164 (T300992)', diff saved to https://phabricator.wikimedia.org/P21865 and previous config saved to /var/cache/conftool/dbconfig/20220307-061318-ladsgroup.json
[06:13:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:13:22] <stashbot>	 T300992: Add linter_template and linter_tag columns to the Linter table - https://phabricator.wikimedia.org/T300992
[06:27:13] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1147 (T302950)', diff saved to https://phabricator.wikimedia.org/P21866 and previous config saved to /var/cache/conftool/dbconfig/20220307-062713-ladsgroup.json
[06:27:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:27:17] <stashbot>	 T302950: Upgrade s4 to bullseye - https://phabricator.wikimedia.org/T302950
[06:28:23] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1164', diff saved to https://phabricator.wikimedia.org/P21867 and previous config saved to /var/cache/conftool/dbconfig/20220307-062823-ladsgroup.json
[06:28:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:40:39] <wikibugs>	 (PS1) Ladsgroup: Revert "db1147: Disable notifications" [puppet] - https://gerrit.wikimedia.org/r/768157
[06:40:50] <wikibugs>	 (CR) Ladsgroup: [V: +2 C: +2] Revert "db1147: Disable notifications" [puppet] - https://gerrit.wikimedia.org/r/768157 (owner: Ladsgroup)
[06:42:18] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P21868 and previous config saved to /var/cache/conftool/dbconfig/20220307-064217-ladsgroup.json
[06:42:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:43:21] <wikibugs>	 (PS2) Urbanecm: ThrottleTest: Cast strtotime to bool before comparing [mediawiki-config] - https://gerrit.wikimedia.org/r/767887
[06:43:28] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1164', diff saved to https://phabricator.wikimedia.org/P21869 and previous config saved to /var/cache/conftool/dbconfig/20220307-064327-ladsgroup.json
[06:43:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:44:05] <wikibugs>	 (CR) Urbanecm: [C: +2] "no-op for prod" [mediawiki-config] - https://gerrit.wikimedia.org/r/767887 (owner: Urbanecm)
[06:44:44] <wikibugs>	 (Merged) jenkins-bot: ThrottleTest: Cast strtotime to bool before comparing [mediawiki-config] - https://gerrit.wikimedia.org/r/767887 (owner: Urbanecm)
[06:44:59] <wikibugs>	 (PS3) Urbanecm: throttle: Add rule for Wikigap 2022 in CZ [mediawiki-config] - https://gerrit.wikimedia.org/r/767883 (https://phabricator.wikimedia.org/T303002)
[06:45:13] <wikibugs>	 (CR) Urbanecm: [C: +2] throttle: Add rule for Wikigap 2022 in CZ [mediawiki-config] - https://gerrit.wikimedia.org/r/767883 (https://phabricator.wikimedia.org/T303002) (owner: Urbanecm)
[06:45:24] <wikibugs>	 (PS4) Urbanecm: throttle: Add rule for arwiki Wikigap [mediawiki-config] - https://gerrit.wikimedia.org/r/767885 (https://phabricator.wikimedia.org/T302973)
[06:45:28] <wikibugs>	 (CR) Urbanecm: [C: +2] throttle: Add rule for arwiki Wikigap [mediawiki-config] - https://gerrit.wikimedia.org/r/767885 (https://phabricator.wikimedia.org/T302973) (owner: Urbanecm)
[06:45:53] <wikibugs>	 (Merged) jenkins-bot: throttle: Add rule for Wikigap 2022 in CZ [mediawiki-config] - https://gerrit.wikimedia.org/r/767883 (https://phabricator.wikimedia.org/T303002) (owner: Urbanecm)
[06:46:09] <wikibugs>	 (Merged) jenkins-bot: throttle: Add rule for arwiki Wikigap [mediawiki-config] - https://gerrit.wikimedia.org/r/767885 (https://phabricator.wikimedia.org/T302973) (owner: Urbanecm)
[06:49:51] <logmsgbot>	 !log urbanecm@deploy1002 Synchronized wmf-config/throttle.php: 2e9fdd4: 867bb7b: Add throttle rules (T302973; T303002) (duration: 00m 49s)
[06:49:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:49:56] <stashbot>	 T302973: Temporary lift IP cap for WikiGap edit-a-thon at Khawarizmi College in 7 March 2022 - https://phabricator.wikimedia.org/T302973
[06:49:56] <stashbot>	 T303002: Request a throttle lift for Wikigap 2022 in Czech Republic – March 10, 2022 - https://phabricator.wikimedia.org/T303002
[06:50:47] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[06:50:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:52:00] <urbanecm>	 !log Reset authentication throttle for 217.23.37.10 via resetAuthenticationThrottle.php (T302973)
[06:52:02] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[06:52:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:52:03] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[06:52:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:52:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:53:00] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[06:53:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:57:23] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P21870 and previous config saved to /var/cache/conftool/dbconfig/20220307-065722-ladsgroup.json
[06:57:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:58:33] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1164 (T300992)', diff saved to https://phabricator.wikimedia.org/P21871 and previous config saved to /var/cache/conftool/dbconfig/20220307-065832-ladsgroup.json
[06:58:34] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1119.eqiad.wmnet with reason: Maintenance
[06:58:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:58:35] <stashbot>	 T300992: Add linter_template and linter_tag columns to the Linter table - https://phabricator.wikimedia.org/T300992
[06:58:35] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1119.eqiad.wmnet with reason: Maintenance
[06:58:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:58:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:58:40] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1119 (T300992)', diff saved to https://phabricator.wikimedia.org/P21872 and previous config saved to /var/cache/conftool/dbconfig/20220307-065839-ladsgroup.json
[06:58:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:03:08] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[07:03:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:03:55] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1119 (T300992)', diff saved to https://phabricator.wikimedia.org/P21873 and previous config saved to /var/cache/conftool/dbconfig/20220307-070355-ladsgroup.json
[07:03:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:03:58] <stashbot>	 T300992: Add linter_template and linter_tag columns to the Linter table - https://phabricator.wikimedia.org/T300992
[07:05:37] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1179', diff saved to https://phabricator.wikimedia.org/P21874 and previous config saved to /var/cache/conftool/dbconfig/20220307-070537-marostegui.json
[07:05:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:06:00] <marostegui>	 !log dbmaint on db1179 s3@eqiad T302222
[07:06:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:06:02] <stashbot>	 T302222: Check and fix compressed mismatched tables - https://phabricator.wikimedia.org/T302222
[07:07:53] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[07:07:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:07:54] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[07:07:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:09:54] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1179 (re)pooling @ 25%: After mysql restart', diff saved to https://phabricator.wikimedia.org/P21875 and previous config saved to /var/cache/conftool/dbconfig/20220307-070953-root.json
[07:09:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:11:54] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[07:11:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:12:27] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1147 (T302950)', diff saved to https://phabricator.wikimedia.org/P21876 and previous config saved to /var/cache/conftool/dbconfig/20220307-071227-ladsgroup.json
[07:12:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:12:30] <stashbot>	 T302950: Upgrade s4 to bullseye - https://phabricator.wikimedia.org/T302950
[07:14:36] <elukey>	 !log kill tmux sessions of user 'zpapierski' on wdqs[1004,2002,2003] (puppet broken, offboarded user)
[07:14:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:15:36] <elukey>	 !log `elukey@ml-staging-ctrl2002:~$ sudo systemctl reset-failed ifup@ens13.service`
[07:15:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:19:01] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P21877 and previous config saved to /var/cache/conftool/dbconfig/20220307-071900-ladsgroup.json
[07:19:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:23:43] <wikibugs>	 (PS2) Ladsgroup: db1144: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/768054 (https://phabricator.wikimedia.org/T302950) (owner: Gerrit maintenance bot)
[07:23:47] <wikibugs>	 (CR) Ladsgroup: [V: +2 C: +2] db1144: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/768054 (https://phabricator.wikimedia.org/T302950) (owner: Gerrit maintenance bot)
[07:24:47] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1144.eqiad.wmnet with reason: Maintenance
[07:24:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:24:48] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1144.eqiad.wmnet with reason: Maintenance
[07:24:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:24:53] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1144:3314 (T302950)', diff saved to https://phabricator.wikimedia.org/P21878 and previous config saved to /var/cache/conftool/dbconfig/20220307-072453-ladsgroup.json
[07:24:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:24:56] <stashbot>	 T302950: Upgrade s4 to bullseye - https://phabricator.wikimedia.org/T302950
[07:24:57] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1179 (re)pooling @ 50%: After mysql restart', diff saved to https://phabricator.wikimedia.org/P21879 and previous config saved to /var/cache/conftool/dbconfig/20220307-072457-root.json
[07:24:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:25:24] <wikibugs>	 (PS1) Marostegui: Revert "db2077: Disable notifications" [puppet] - https://gerrit.wikimedia.org/r/768158
[07:26:04] <wikibugs>	 (CR) Marostegui: [C: +2] Revert "db2077: Disable notifications" [puppet] - https://gerrit.wikimedia.org/r/768158 (owner: Marostegui)
[07:26:25] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1144:3315 (T302950)', diff saved to https://phabricator.wikimedia.org/P21880 and previous config saved to /var/cache/conftool/dbconfig/20220307-072624-ladsgroup.json
[07:26:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:28:35] <wikibugs>	 ops-eqiad: analytics10[63,67] mgmt interfaces seem flapping from time to time - https://phabricator.wikimedia.org/T303151 (elukey)
[07:32:41] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.reimage for host db1144.eqiad.wmnet with OS bullseye
[07:32:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:33:43] <wikibugs>	 (PS1) Gerrit maintenance bot: db1143: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/768649 (https://phabricator.wikimedia.org/T302950)
[07:34:05] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P21881 and previous config saved to /var/cache/conftool/dbconfig/20220307-073405-ladsgroup.json
[07:34:06] <wikibugs>	 (PS1) Gerrit maintenance bot: db1142: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/768650 (https://phabricator.wikimedia.org/T302950)
[07:34:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:34:19] <wikibugs>	 (PS1) Gerrit maintenance bot: db1141: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/768651 (https://phabricator.wikimedia.org/T302950)
[07:34:32] <wikibugs>	 (PS1) Gerrit maintenance bot: db1125: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/768652 (https://phabricator.wikimedia.org/T302950)
[07:34:45] <wikibugs>	 (PS1) Gerrit maintenance bot: db1124: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/768653 (https://phabricator.wikimedia.org/T302950)
[07:40:01] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1179 (re)pooling @ 75%: After mysql restart', diff saved to https://phabricator.wikimedia.org/P21882 and previous config saved to /var/cache/conftool/dbconfig/20220307-074001-root.json
[07:40:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:44:28] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on db1144.eqiad.wmnet with reason: host reimage
[07:44:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:44:40] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase2025 is CRITICAL: /en.wikipedia.org/v1/page/mobile-html-offline-resources/{title} (Get offline resource links to accompany page content HTML for test page) is CRITICAL: Test Get offline resource links to accompany page content HTML for test page returned the unexpected status 503 (expecting: 200) https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[07:47:11] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1144.eqiad.wmnet with reason: host reimage
[07:47:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:48:46] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase-dev1006 is CRITICAL: /en.wikipedia.org/v1/page/talk/{title} (Get structured talk page for enwiki Salt article) is CRITICAL: Test Get structured talk page for enwiki Salt article returned the unexpected status 503 (expecting: 200) https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[07:48:59] <wikibugs>	 (CR) Ladsgroup: [C: +1] "LGTM but this is something Ariel needs to approve." [puppet] - https://gerrit.wikimedia.org/r/768032 (https://phabricator.wikimedia.org/T300255) (owner: Hoo man)
[07:49:10] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1119 (T300992)', diff saved to https://phabricator.wikimedia.org/P21883 and previous config saved to /var/cache/conftool/dbconfig/20220307-074909-ladsgroup.json
[07:49:12] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1106.eqiad.wmnet with reason: Maintenance
[07:49:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:49:13] <stashbot>	 T300992: Add linter_template and linter_tag columns to the Linter table - https://phabricator.wikimedia.org/T300992
[07:49:14] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1106.eqiad.wmnet with reason: Maintenance
[07:49:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:49:15] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
[07:49:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:49:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:49:19] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
[07:49:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:49:23] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1106 (T300992)', diff saved to https://phabricator.wikimedia.org/P21884 and previous config saved to /var/cache/conftool/dbconfig/20220307-074923-ladsgroup.json
[07:49:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:50:04] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase-dev1006 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[07:51:20] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1181', diff saved to https://phabricator.wikimedia.org/P21885 and previous config saved to /var/cache/conftool/dbconfig/20220307-075120-marostegui.json
[07:51:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:53:20] <wikibugs>	 (CR) MMandere: prometheus:rules_global: Provide HAProxy availability metrics (2 comments) [puppet] - https://gerrit.wikimedia.org/r/768057 (owner: Vgutierrez)
[07:53:34] <wikibugs>	 (PS1) Marostegui: db1181: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/768654
[07:53:41] <marostegui>	 !log dbmaint on db1181 s7@eqiad T276150
[07:53:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:53:44] <stashbot>	 T276150: Schema change to make rc_id unsigned and rc_timestamp BINARY - https://phabricator.wikimedia.org/T276150
[07:54:18] <wikibugs>	 (CR) Marostegui: [C: +2] db1181: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/768654 (owner: Marostegui)
[07:54:27] <wikibugs>	 (CR) MMandere: [C: +1] prometheus:rules_ops: Provide HAProxy total responses metrics [puppet] - https://gerrit.wikimedia.org/r/768056 (https://phabricator.wikimedia.org/T290005) (owner: Vgutierrez)
[07:54:34] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1106 (T300992)', diff saved to https://phabricator.wikimedia.org/P21886 and previous config saved to /var/cache/conftool/dbconfig/20220307-075433-ladsgroup.json
[07:54:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:54:37] <stashbot>	 T300992: Add linter_template and linter_tag columns to the Linter table - https://phabricator.wikimedia.org/T300992
[07:55:05] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1179 (re)pooling @ 100%: After mysql restart', diff saved to https://phabricator.wikimedia.org/P21887 and previous config saved to /var/cache/conftool/dbconfig/20220307-075504-root.json
[07:55:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:55:24] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1175', diff saved to https://phabricator.wikimedia.org/P21888 and previous config saved to /var/cache/conftool/dbconfig/20220307-075523-marostegui.json
[07:55:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:57:09] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1175 (re)pooling @ 25%: After mysql restart', diff saved to https://phabricator.wikimedia.org/P21889 and previous config saved to /var/cache/conftool/dbconfig/20220307-075708-root.json
[07:57:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:00:04] <jouncebot>	 Amir1, awight, Urbanecm, and taavi: (Dis)respected human, time to deploy UTC morning backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220307T0800). Please do the needful.
[08:00:04] <jouncebot>	 No Gerrit patches in the queue for this window AFAICS.
[08:00:08] <taavi>	 o/
[08:00:26] <taavi>	 looks like nothing to do
[08:01:25] <urbanecm>	 Indeed. 
[08:03:03] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1144.eqiad.wmnet with OS bullseye
[08:03:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:05:18] <wikibugs>	 (PS1) Marostegui: Revert "db1181: Disable notifications" [puppet] - https://gerrit.wikimedia.org/r/768159
[08:05:46] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1181 (re)pooling @ 10%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P21890 and previous config saved to /var/cache/conftool/dbconfig/20220307-080545-root.json
[08:05:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:05:52] <wikibugs>	 (CR) Marostegui: [C: +2] Revert "db1181: Disable notifications" [puppet] - https://gerrit.wikimedia.org/r/768159 (owner: Marostegui)
[08:08:18] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase2025 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[08:08:54] <wikibugs>	 (CR) Muehlenhoff: "No need, certspotter is an edge package and only used by the alert* hosts, you can instead simply upload the 0.10 package to buster-wikime" [puppet] - https://gerrit.wikimedia.org/r/768058 (owner: Ssingh)
[08:09:39] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P21891 and previous config saved to /var/cache/conftool/dbconfig/20220307-080938-ladsgroup.json
[08:09:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:10:38] <icinga-wm>	 PROBLEM - At least one CPU core of an LVS is saturated- packet drops are likely on lvs5002 is CRITICAL: cpu={0,10,12,14,2,4,6,8} https://bit.ly/wmf-lvscpu https://grafana.wikimedia.org/d/000000377/host-overview?var-server=lvs5002&var-datasource=eqsin+prometheus/ops
[08:12:12] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1175 (re)pooling @ 50%: After mysql restart', diff saved to https://phabricator.wikimedia.org/P21892 and previous config saved to /var/cache/conftool/dbconfig/20220307-081212-root.json
[08:12:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:15:18] <icinga-wm>	 RECOVERY - At least one CPU core of an LVS is saturated- packet drops are likely on lvs5002 is OK: All metrics within thresholds. https://bit.ly/wmf-lvscpu https://grafana.wikimedia.org/d/000000377/host-overview?var-server=lvs5002&var-datasource=eqsin+prometheus/ops
[08:17:46] <jinxer-wm>	 (Primary outbound port utilisation over 80%  #page) firing: Alert for device cr2-eqsin.wikimedia.org - Primary outbound port utilisation over 80%  #page   - https://alerts.wikimedia.org
[08:17:54] <XioNoX>	 yo
[08:18:48] <XioNoX>	 _security
[08:19:02] <elukey>	 yep
[08:20:49] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1181 (re)pooling @ 20%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P21893 and previous config saved to /var/cache/conftool/dbconfig/20220307-082049-root.json
[08:20:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:22:46] <jinxer-wm>	 (Primary outbound port utilisation over 80%  #page) resolved: Device cr2-eqsin.wikimedia.org recovered from Primary outbound port utilisation over 80%  #page   - https://alerts.wikimedia.org
[08:24:43] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P21894 and previous config saved to /var/cache/conftool/dbconfig/20220307-082443-ladsgroup.json
[08:24:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:27:16] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1175 (re)pooling @ 75%: After mysql restart', diff saved to https://phabricator.wikimedia.org/P21895 and previous config saved to /var/cache/conftool/dbconfig/20220307-082716-root.json
[08:27:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:35:09] <wikibugs>	 (CR) DCausse: "needs to be rebased on the patch pulling the new image with s3 client drivers" [deployment-charts] - https://gerrit.wikimedia.org/r/766123 (https://phabricator.wikimedia.org/T302494) (owner: ZPapierski)
[08:35:53] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1181 (re)pooling @ 25%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P21896 and previous config saved to /var/cache/conftool/dbconfig/20220307-083553-root.json
[08:35:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:39:48] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1106 (T300992)', diff saved to https://phabricator.wikimedia.org/P21897 and previous config saved to /var/cache/conftool/dbconfig/20220307-083948-ladsgroup.json
[08:39:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:39:51] <stashbot>	 T300992: Add linter_template and linter_tag columns to the Linter table - https://phabricator.wikimedia.org/T300992
[08:39:55] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db2103.codfw.wmnet with reason: Maintenance
[08:39:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:39:56] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2103.codfw.wmnet with reason: Maintenance
[08:39:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:39:57] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 12:00:00 on 14 hosts with reason: Maintenance
[08:39:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:40:08] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 14 hosts with reason: Maintenance
[08:40:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:40:11] <urbanecm>	 jouncebot: nowandnext
[08:40:12] <jouncebot>	 For the next 0 hour(s) and 19 minute(s): UTC morning backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220307T0800)
[08:40:12] <jouncebot>	 In 5 hour(s) and 19 minute(s): UTC afternoon backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220307T1400)
[08:40:33] <urbanecm>	 since nothing's happening in B&C, let me sneak in one patch
[08:40:50] <wikibugs>	 (PS2) Urbanecm: enwiki: Deploy Growth features to 100% of users [mediawiki-config] - https://gerrit.wikimedia.org/r/767525 (https://phabricator.wikimedia.org/T302846)
[08:40:54] <wikibugs>	 (CR) Urbanecm: [C: +2] enwiki: Deploy Growth features to 100% of users [mediawiki-config] - https://gerrit.wikimedia.org/r/767525 (https://phabricator.wikimedia.org/T302846) (owner: Urbanecm)
[08:41:22] <wikibugs>	 (CR) DCausse: [C: -1] "the chart also needs to be updated to declare the new S3 config in the flink config file" [deployment-charts] - https://gerrit.wikimedia.org/r/766123 (https://phabricator.wikimedia.org/T302494) (owner: ZPapierski)
[08:41:35] <wikibugs>	 (Merged) jenkins-bot: enwiki: Deploy Growth features to 100% of users [mediawiki-config] - https://gerrit.wikimedia.org/r/767525 (https://phabricator.wikimedia.org/T302846) (owner: Urbanecm)
[08:42:20] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1175 (re)pooling @ 100%: After mysql restart', diff saved to https://phabricator.wikimedia.org/P21898 and previous config saved to /var/cache/conftool/dbconfig/20220307-084219-root.json
[08:42:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:42:36] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1166', diff saved to https://phabricator.wikimedia.org/P21899 and previous config saved to /var/cache/conftool/dbconfig/20220307-084235-marostegui.json
[08:42:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:42:45] <logmsgbot>	 !log urbanecm@deploy1002 Synchronized wmf-config/InitialiseSettings.php: e3f70f699e37a27872b73f6483f6d27c669bb520: enwiki: Deploy Growth features to 100% of users (T302846) (duration: 00m 50s)
[08:42:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:42:48] <stashbot>	 T302846: Scale: increase share on English Wikipedia to 100% / 10% - https://phabricator.wikimedia.org/T302846
[08:43:08] * urbanecm done
[08:43:21] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
[08:43:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:43:23] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
[08:43:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:44:14] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1166 (re)pooling @ 25%: After mysql restart', diff saved to https://phabricator.wikimedia.org/P21900 and previous config saved to /var/cache/conftool/dbconfig/20220307-084413-root.json
[08:44:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:45:17] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 (T302950)', diff saved to https://phabricator.wikimedia.org/P21901 and previous config saved to /var/cache/conftool/dbconfig/20220307-084516-ladsgroup.json
[08:45:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:45:19] <stashbot>	 T302950: Upgrade s4 to bullseye - https://phabricator.wikimedia.org/T302950
[08:46:35] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1184.eqiad.wmnet with reason: Maintenance
[08:46:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:46:37] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1184.eqiad.wmnet with reason: Maintenance
[08:46:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:46:42] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1184 (T300992)', diff saved to https://phabricator.wikimedia.org/P21902 and previous config saved to /var/cache/conftool/dbconfig/20220307-084641-ladsgroup.json
[08:46:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:46:44] <stashbot>	 T300992: Add linter_template and linter_tag columns to the Linter table - https://phabricator.wikimedia.org/T300992
[08:46:58] <elukey>	 !log `kafka configs --alter --entity-type topics --entity-name udp_localhost-info --add-config retention.bytes=300000000000` on kafka-logging to reduce the size of the biggest topic partitions
[08:46:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:47:20] <wikibugs>	 (PS1) Muehlenhoff: Remove cumin2001 from list of Cumin hosts [puppet] - https://gerrit.wikimedia.org/r/768656
[08:47:22] <wikibugs>	 (PS1) Muehlenhoff: Remove cumin2001 from mysql root clients and related grants [puppet] - https://gerrit.wikimedia.org/r/768657 (https://phabricator.wikimedia.org/T276589)
[08:48:22] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[08:48:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:49:15] <wikibugs>	 (CR) Volans: [C: -1] "I did some testing and found a couple of issues, see inline." [software/spicerack] - https://gerrit.wikimedia.org/r/765480 (https://phabricator.wikimedia.org/T293209) (owner: Filippo Giunchedi)
[08:50:09] <wikibugs>	 (CR) Volans: [C: +1] "LGTM, but I'm not sure about the order of things." [puppet] - https://gerrit.wikimedia.org/r/768656 (owner: Muehlenhoff)
[08:50:26] <wikibugs>	 (CR) Volans: [C: +1] "LGTM, but the grants should also be removed from mysql." [puppet] - https://gerrit.wikimedia.org/r/768657 (https://phabricator.wikimedia.org/T276589) (owner: Muehlenhoff)
[08:50:57] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1181 (re)pooling @ 40%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P21903 and previous config saved to /var/cache/conftool/dbconfig/20220307-085056-root.json
[08:50:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:51:28] <wikibugs>	 (Abandoned) Ladsgroup: Depool esams [dns] - https://gerrit.wikimedia.org/r/767817 (owner: Ladsgroup)
[08:51:39] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1184 (T300992)', diff saved to https://phabricator.wikimedia.org/P21904 and previous config saved to /var/cache/conftool/dbconfig/20220307-085139-ladsgroup.json
[08:51:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:52:31] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[08:52:32] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[08:52:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:52:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:52:54] <wikibugs>	 SRE, SRE-Access-Requests: Request Administrator Access to Google Search Console - https://phabricator.wikimedia.org/T302625 (JMeybohm) >>! In T302625#7754020, @dr0ptp4kt wrote: > To answer the question on the creds, no, they don't need to be shared. But delegated access will need to be established. An SR...
[08:54:01] <wikibugs>	 (PS1) Gerrit maintenance bot: db1121: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/768658 (https://phabricator.wikimedia.org/T302950)
[08:56:26] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[08:56:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:57:55] <wikibugs>	 (PS3) Vgutierrez: prometheus:rules_ops: Provide HAProxy total responses metrics [puppet] - https://gerrit.wikimedia.org/r/768056 (https://phabricator.wikimedia.org/T290005)
[08:57:57] <wikibugs>	 (PS2) Vgutierrez: prometheus:rules_global: Provide HAProxy availability metrics [puppet] - https://gerrit.wikimedia.org/r/768057
[08:58:36] <wikibugs>	 (CR) Vgutierrez: prometheus:rules_global: Provide HAProxy availability metrics (2 comments) [puppet] - https://gerrit.wikimedia.org/r/768057 (owner: Vgutierrez)
[08:59:17] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1166 (re)pooling @ 50%: After mysql restart', diff saved to https://phabricator.wikimedia.org/P21905 and previous config saved to /var/cache/conftool/dbconfig/20220307-085917-root.json
[08:59:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:00:23] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P21906 and previous config saved to /var/cache/conftool/dbconfig/20220307-090021-ladsgroup.json
[09:00:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:00:29] <wikibugs>	 SRE, ops-codfw, DBA: db2147 SMART error - https://phabricator.wikimedia.org/T302951 (Marostegui) Open→Invalid Closing for now, if the RAID finally fails, we can replace the failed disk.
[09:01:52] <dcausse>	 !log restarting blazegraph on wdqs1013 (jvm stuck for 6hours)
[09:01:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:04:07] <icinga-wm>	 PROBLEM - SSH on bast4003 is CRITICAL: Server answer: https://wikitech.wikimedia.org/wiki/SSH/monitoring
[09:06:00] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1181 (re)pooling @ 50%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P21907 and previous config saved to /var/cache/conftool/dbconfig/20220307-090600-root.json
[09:06:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:06:43] <icinga-wm>	 RECOVERY - SSH on bast4003 is OK: SSH OK - OpenSSH_7.9p1 Debian-10+deb10u2 (protocol 2.0) https://wikitech.wikimedia.org/wiki/SSH/monitoring
[09:06:44] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P21908 and previous config saved to /var/cache/conftool/dbconfig/20220307-090644-ladsgroup.json
[09:06:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:08:18] <wikibugs>	 (PS1) Jcrespo: puppet: Print nodes that change on every puppet run, sorted [puppet] - https://gerrit.wikimedia.org/r/768659
[09:10:48] <wikibugs>	 (CR) Jcrespo: "What do you think? Small improvement (and easy to review) that has improves a lot the quality of life/observability." [puppet] - https://gerrit.wikimedia.org/r/768659 (owner: Jcrespo)
[09:12:10] <wikibugs>	 (CR) Jcrespo: "cumin1001" [puppet] - https://gerrit.wikimedia.org/r/768659 (owner: Jcrespo)
[09:14:21] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1166 (re)pooling @ 75%: After mysql restart', diff saved to https://phabricator.wikimedia.org/P21909 and previous config saved to /var/cache/conftool/dbconfig/20220307-091421-root.json
[09:14:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:15:27] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P21910 and previous config saved to /var/cache/conftool/dbconfig/20220307-091527-ladsgroup.json
[09:15:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:16:01] <wikibugs>	 (CR) Hokwelum: [C: +1] "we looked at this together" [puppet] - https://gerrit.wikimedia.org/r/768045 (https://phabricator.wikimedia.org/T302930) (owner: ArielGlenn)
[09:16:32] <wikibugs>	 (CR) Arturo Borrero Gonzalez: [C: +1] "LGTM. Please collect +1 from Filippo before merging." [puppet] - https://gerrit.wikimedia.org/r/768294 (owner: Majavah)
[09:17:58] <wikibugs>	 (CR) Arturo Borrero Gonzalez: [C: +2] openstack: fix remaining http keystone urls [puppet] - https://gerrit.wikimedia.org/r/768293 (https://phabricator.wikimedia.org/T267194) (owner: Majavah)
[09:20:28] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
[09:20:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:20:30] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
[09:20:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:20:35] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depooling db1146:3312 (T300381)', diff saved to https://phabricator.wikimedia.org/P21911 and previous config saved to /var/cache/conftool/dbconfig/20220307-092034-marostegui.json
[09:20:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:20:38] <stashbot>	 T300381: Make page_props.pp_page unsigned on wmf wikis - https://phabricator.wikimedia.org/T300381
[09:21:04] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1181 (re)pooling @ 60%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P21912 and previous config saved to /var/cache/conftool/dbconfig/20220307-092103-root.json
[09:21:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:21:49] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P21913 and previous config saved to /var/cache/conftool/dbconfig/20220307-092148-ladsgroup.json
[09:21:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:22:37] <logmsgbot>	 !log ebysans@deploy1002 Started deploy [airflow-dags/analytics@19520c1]: (no justification provided)
[09:22:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:22:42] <logmsgbot>	 !log ebysans@deploy1002 Finished deploy [airflow-dags/analytics@19520c1]: (no justification provided) (duration: 00m 04s)
[09:22:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:23:29] <wikibugs>	 (CR) Majavah: [C: +1] wikitech_private: write to wmg* constants [puppet] - https://gerrit.wikimedia.org/r/768260 (https://phabricator.wikimedia.org/T45956) (owner: Zabe)
[09:24:42] <wikibugs>	 (PS1) Vgutierrez: site: Reimage cp2036 as cache::upload_haproxy [puppet] - https://gerrit.wikimedia.org/r/768661 (https://phabricator.wikimedia.org/T290005)
[09:26:59] <wikibugs>	 (CR) Vgutierrez: [C: +2] site: Reimage cp2036 as cache::upload_haproxy [puppet] - https://gerrit.wikimedia.org/r/768661 (https://phabricator.wikimedia.org/T290005) (owner: Vgutierrez)
[09:28:19] <logmsgbot>	 !log vgutierrez@cumin1001 START - Cookbook sre.hosts.reimage for host cp2036.codfw.wmnet with OS buster
[09:28:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:28:32] <wikibugs>	 SRE, Traffic, Patch-For-Review, Performance-Team (Radar): Test haproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T290005 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by vgutierrez@cumin1001 for host cp2036.codfw.wmnet with OS buster
[09:29:25] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1166 (re)pooling @ 100%: After mysql restart', diff saved to https://phabricator.wikimedia.org/P21914 and previous config saved to /var/cache/conftool/dbconfig/20220307-092924-root.json
[09:29:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:30:13] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1123', diff saved to https://phabricator.wikimedia.org/P21915 and previous config saved to /var/cache/conftool/dbconfig/20220307-093013-marostegui.json
[09:30:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:30:32] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 (T302950)', diff saved to https://phabricator.wikimedia.org/P21916 and previous config saved to /var/cache/conftool/dbconfig/20220307-093032-ladsgroup.json
[09:30:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:30:35] <stashbot>	 T302950: Upgrade s4 to bullseye - https://phabricator.wikimedia.org/T302950
[09:31:46] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1123 (re)pooling @ 25%: After mysql restart', diff saved to https://phabricator.wikimedia.org/P21917 and previous config saved to /var/cache/conftool/dbconfig/20220307-093146-root.json
[09:31:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:32:20] <wikibugs>	 (CR) Jcrespo: [C: +2] Add DNS verification records for Bing and Yandex Webmaster tools [dns] - https://gerrit.wikimedia.org/r/768037 (https://phabricator.wikimedia.org/T302617) (owner: SCherukuwada)
[09:35:29] <jynus>	 !log updated non-A wikipedia.org DNS records
[09:35:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:36:08] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1181 (re)pooling @ 75%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P21918 and previous config saved to /var/cache/conftool/dbconfig/20220307-093607-root.json
[09:36:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:36:15] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T302950)', diff saved to https://phabricator.wikimedia.org/P21919 and previous config saved to /var/cache/conftool/dbconfig/20220307-093615-ladsgroup.json
[09:36:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:36:18] <jynus>	 !log updated non-A wikipedia.org DNS records T302617
[09:36:18] <stashbot>	 T302950: Upgrade s4 to bullseye - https://phabricator.wikimedia.org/T302950
[09:36:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:36:20] <stashbot>	 T302617: Domain Ownership Verification on Various Search Properties - https://phabricator.wikimedia.org/T302617
[09:36:54] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1184 (T300992)', diff saved to https://phabricator.wikimedia.org/P21920 and previous config saved to /var/cache/conftool/dbconfig/20220307-093653-ladsgroup.json
[09:36:55] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1135.eqiad.wmnet with reason: Maintenance
[09:36:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:36:56] <stashbot>	 T300992: Add linter_template and linter_tag columns to the Linter table - https://phabricator.wikimedia.org/T300992
[09:36:57] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1135.eqiad.wmnet with reason: Maintenance
[09:36:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:36:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:37:01] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1135 (T300992)', diff saved to https://phabricator.wikimedia.org/P21921 and previous config saved to /var/cache/conftool/dbconfig/20220307-093701-ladsgroup.json
[09:37:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:38:42] <wikibugs>	 (CR) Muehlenhoff: [C: +2] Remove cumin2001 from list of Cumin hosts [puppet] - https://gerrit.wikimedia.org/r/768656 (owner: Muehlenhoff)
[09:40:09] <wikibugs>	 SRE: Domain Ownership Verification on Various Search Properties - https://phabricator.wikimedia.org/T302617 (jcrespo) Looking good:  `lines=10,lang=bash root@authdns1001:~$ for i in 0 1 2 ; do dig @ns${i}.wikimedia.org -t txt wikipedia.org ; done  ; <<>> DiG 9.11.5-P4-5.1+deb10u6-Debian <<>> @ns0.wikimedia.o...
[09:40:11] <wikibugs>	 (PS2) Muehlenhoff: Remove cumin2001 from mysql root clients and related grants [puppet] - https://gerrit.wikimedia.org/r/768657 (https://phabricator.wikimedia.org/T276589)
[09:40:56] <jinxer-wm>	 (JobUnavailable) firing: (3) Reduced availability for job gitlab in eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org
[09:41:49] <wikibugs>	 (CR) Filippo Giunchedi: misc: search-grafana-dashboards.js (1 comment) [software] - https://gerrit.wikimedia.org/r/767118 (owner: Filippo Giunchedi)
[09:42:16] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1135 (T300992)', diff saved to https://phabricator.wikimedia.org/P21922 and previous config saved to /var/cache/conftool/dbconfig/20220307-094216-ladsgroup.json
[09:42:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:42:20] <stashbot>	 T300992: Add linter_template and linter_tag columns to the Linter table - https://phabricator.wikimedia.org/T300992
[09:42:39] <wikibugs>	 SRE, Patch-For-Review: migrate services from cumin2001 to cumin2002 - https://phabricator.wikimedia.org/T276589 (Marostegui) >>! In T276589#7755980, @gerritbot wrote: > Change 768657 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff): > %%%[operations/puppet@production] Remove cumin20...
[09:43:21] <wikibugs>	 (CR) Marostegui: "Commented about it here too: https://phabricator.wikimedia.org/T276589#7756067" [puppet] - https://gerrit.wikimedia.org/r/768657 (https://phabricator.wikimedia.org/T276589) (owner: Muehlenhoff)
[09:44:04] <wikibugs>	 (PS1) Majavah: P:tcpircbot: cleanup allowed hosts [puppet] - https://gerrit.wikimedia.org/r/768662
[09:44:32] <wikibugs>	 (PS1) Btullis: Add a record for datahubsearch service [dns] - https://gerrit.wikimedia.org/r/768663 (https://phabricator.wikimedia.org/T301458)
[09:46:32] <logmsgbot>	 !log vgutierrez@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on cp2036.codfw.wmnet with reason: host reimage
[09:46:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:46:50] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1123 (re)pooling @ 50%: After mysql restart', diff saved to https://phabricator.wikimedia.org/P21923 and previous config saved to /var/cache/conftool/dbconfig/20220307-094649-root.json
[09:46:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:46:56] <wikibugs>	 (PS1) SCherukuwada: Add Yandex's TXT verification entry to www. [dns] - https://gerrit.wikimedia.org/r/768664
[09:47:08] <wikibugs>	 SRE, SRE-OnFire (FY2021/2022-Q3), Data-Engineering, Event-Platform, and 2 others: Banner sampling leading to a relatively wide site outage (mostly esams) - https://phabricator.wikimedia.org/T303036 (Ladsgroup)
[09:47:21] <wikibugs>	 (CR) Filippo Giunchedi: [C: +1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/768294 (owner: Majavah)
[09:47:32] <wikibugs>	 (PS2) SCherukuwada: Add Yandex's TXT verification entry to www. [dns] - https://gerrit.wikimedia.org/r/768664
[09:48:00] <wikibugs>	 SRE: Domain Ownership Verification on Various Search Properties - https://phabricator.wikimedia.org/T302617 (SCherukuwada) I confirm that Bing.com verification has worked properly. However, for Yandex it seems they need the TXT entry to be under www.wikipedia.org and not wikipedia.org. Sent out patch https:/...
[09:48:19] <wikibugs>	 (CR) jerkins-bot: [V: -1] Add Yandex's TXT verification entry to www. [dns] - https://gerrit.wikimedia.org/r/768664 (owner: SCherukuwada)
[09:49:04] <logmsgbot>	 !log vgutierrez@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2036.codfw.wmnet with reason: host reimage
[09:49:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:49:34] <wikibugs>	 SRE, SRE-Access-Requests: Requesting access to analytics-privatedata-users for Dale_Zhou - https://phabricator.wikimedia.org/T303031 (JMeybohm) Thanks @MGerlach. Could you please also provide an expiry/end date for this contract/agreement?
[09:49:40] <wikibugs>	 SRE, SRE-Access-Requests: Requesting access to analytics-privatedata-users for ShubhankarP - https://phabricator.wikimedia.org/T303032 (JMeybohm) Thanks @MGerlach. Could you please also provide an expiry/end date for this contract/agreement?
[09:50:08] <wikibugs>	 (CR) jerkins-bot: [V: -1] Add a record for datahubsearch service [dns] - https://gerrit.wikimedia.org/r/768663 (https://phabricator.wikimedia.org/T301458) (owner: Btullis)
[09:51:12] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1181 (re)pooling @ 100%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P21924 and previous config saved to /var/cache/conftool/dbconfig/20220307-095111-root.json
[09:51:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:51:20] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P21925 and previous config saved to /var/cache/conftool/dbconfig/20220307-095120-ladsgroup.json
[09:51:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:51:35] <wikibugs>	 SRE, Data-Catalog, Data-Engineering, serviceops, Service-deployment-requests: New Service Request: DataHub - https://phabricator.wikimedia.org/T303049 (JMeybohm) p:Triage→Medium
[09:52:38] <wikibugs>	 SRE, SRE-Access-Requests, Infrastructure-Foundations, netbox: Grant cn=nda some sort of read only access to Netbox - https://phabricator.wikimedia.org/T302870 (JMeybohm) p:Triage→Medium
[09:52:59] <wikibugs>	 (03CR) 10Jcrespo: "10:48:16 error: Name 'www.wikipedia.org.': CNAME not allowed alongside other data" [dns] - 10https://gerrit.wikimedia.org/r/768664 (owner: 10SCherukuwada)
[09:53:11] <wikibugs>	 SRE, SRE-Access-Requests: Request Administrator Access to Google Search Console - https://phabricator.wikimedia.org/T302625 (JMeybohm) a:JMeybohm→None
[09:53:59] <wikibugs>	 (PS3) SCherukuwada: Add Yandex's TXT verification entry to www. [dns] - https://gerrit.wikimedia.org/r/768664 (https://phabricator.wikimedia.org/T302617)
[09:54:48] <wikibugs>	 (CR) jerkins-bot: [V: -1] Add Yandex's TXT verification entry to www. [dns] - https://gerrit.wikimedia.org/r/768664 (https://phabricator.wikimedia.org/T302617) (owner: SCherukuwada)
[09:55:23] <wikibugs>	 (CR) Volans: Add a record for datahubsearch service (1 comment) [dns] - https://gerrit.wikimedia.org/r/768663 (https://phabricator.wikimedia.org/T301458) (owner: Btullis)
[09:56:00] <wikibugs>	 (CR) Volans: [C: +1] "LGTM!" [puppet] - https://gerrit.wikimedia.org/r/768659 (owner: Jcrespo)
[09:57:06] <wikibugs>	 10SRE, 10SRE-Access-Requests, 10Infrastructure-Foundations, 10netbox: Grant cn=nda some sort of read only access to Netbox - https://phabricator.wikimedia.org/T302870 (10Ladsgroup) >>! In T302870#7750922, @Dzahn wrote: > Are hardware serial numbers more abusable / serious than other things we give NDAed pe...
[09:57:21] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P21926 and previous config saved to /var/cache/conftool/dbconfig/20220307-095720-ladsgroup.json
[09:57:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:57:56] <wikibugs>	 (PS2) Btullis: Add a record for datahubsearch service [dns] - https://gerrit.wikimedia.org/r/768663 (https://phabricator.wikimedia.org/T301458)
[09:58:11] <wikibugs>	 (CR) Btullis: Add a record for datahubsearch service (1 comment) [dns] - https://gerrit.wikimedia.org/r/768663 (https://phabricator.wikimedia.org/T301458) (owner: Btullis)
[09:58:47] <logmsgbot>	 !log ayounsi@cumin1001 START - Cookbook sre.network.prepare-upgrade
[09:58:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:58:53] <wikibugs>	 SRE, DNS, Traffic, Wikimedia Enterprise: 301 redirect setup for wikimediaenterprise - https://phabricator.wikimedia.org/T302756 (Vgutierrez) Open→Stalled we cannot perform that redirect cause we don't handle the DNS for that domain: `$ host -t ns wikimediaenterprise.org wikimediaenterpris...
[10:00:06] <logmsgbot>	 !log ayounsi@cumin1001 START - Cookbook sre.network.prepare-upgrade
[10:00:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:01:54] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1123 (re)pooling @ 75%: After mysql restart', diff saved to https://phabricator.wikimedia.org/P21927 and previous config saved to /var/cache/conftool/dbconfig/20220307-100153-root.json
[10:01:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:02:24] <wikibugs>	 (03CR) 10Jcrespo: "www is a CNAME to dyna.wikimedia.org, as such it is my understanding that it cannot have further data (TXT or other)." [dns] - 10https://gerrit.wikimedia.org/r/768664 (https://phabricator.wikimedia.org/T302617) (owner: 10SCherukuwada)
[10:03:38] <wikibugs>	 (03PS2) 10Jcrespo: puppet: Print nodes that change on every puppet run, sorted [puppet] - 10https://gerrit.wikimedia.org/r/768659
[10:04:06] <wikibugs>	 (03CR) 10SCherukuwada: "Yup. What I'm trying here is actually incompatible with www being a CNAME. https://www.rfc-editor.org/rfc/rfc1034" [dns] - 10https://gerrit.wikimedia.org/r/768664 (https://phabricator.wikimedia.org/T302617) (owner: 10SCherukuwada)
[10:04:27] <vgutierrez>	 !log pool cp2036 with HAProxy as TLS termination layer - T290005
[10:04:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:04:31] <stashbot>	 T290005: Test haproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T290005
[10:05:31] <icinga-wm>	 PROBLEM - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is CRITICAL: /api (Zotero and citoid alive) timed out before a response was received https://wikitech.wikimedia.org/wiki/Citoid
[10:06:25] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P21928 and previous config saved to /var/cache/conftool/dbconfig/20220307-100624-ladsgroup.json
[10:06:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:08:17] <icinga-wm>	 RECOVERY - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Citoid
[10:08:53] <wikibugs>	 (03CR) 10Jcrespo: "bblack: We were sent the following requirement for Yandex search console authentication. Apparently, they require domain validation by add" [dns] - 10https://gerrit.wikimedia.org/r/768664 (https://phabricator.wikimedia.org/T302617) (owner: 10SCherukuwada)
[10:10:22] <logmsgbot>	 !log vgutierrez@cumin1001 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2036.codfw.wmnet with OS buster
[10:10:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:10:33] <wikibugs>	 10SRE, 10Traffic, 10Patch-For-Review, 10Performance-Team (Radar): Test haproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T290005 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by vgutierrez@cumin1001 for host cp2036.codfw.wmnet with OS buster c...
[10:12:26] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P21929 and previous config saved to /var/cache/conftool/dbconfig/20220307-101225-ladsgroup.json
[10:12:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:14:11] <wikibugs>	 (PS1) Vgutierrez: site: Reimage cp1084 as cache::upload_haproxy [puppet] - https://gerrit.wikimedia.org/r/768667 (https://phabricator.wikimedia.org/T290005)
[10:14:36] <wikibugs>	 (PS2) Ladsgroup: db1143: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/768649 (https://phabricator.wikimedia.org/T302950) (owner: Gerrit maintenance bot)
[10:14:42] <wikibugs>	 (CR) Ladsgroup: [V: +2 C: +2] db1143: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/768649 (https://phabricator.wikimedia.org/T302950) (owner: Gerrit maintenance bot)
[10:16:01] <wikibugs>	 (PS1) Btullis: Added config for the datahubsearch LVS service [puppet] - https://gerrit.wikimedia.org/r/768668
[10:16:53] <icinga-wm>	 PROBLEM - Check systemd state on kubernetes1001 is CRITICAL: CRITICAL - degraded: The following units failed: ferm.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[10:16:57] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1123 (re)pooling @ 100%: After mysql restart', diff saved to https://phabricator.wikimedia.org/P21930 and previous config saved to /var/cache/conftool/dbconfig/20220307-101657-root.json
[10:16:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:17:08] <wikibugs>	 (CR) Vgutierrez: [C: +2] site: Reimage cp1084 as cache::upload_haproxy [puppet] - https://gerrit.wikimedia.org/r/768667 (https://phabricator.wikimedia.org/T290005) (owner: Vgutierrez)
[10:17:51] <logmsgbot>	 !log vgutierrez@cumin1001 START - Cookbook sre.hosts.reimage for host cp1084.eqiad.wmnet with OS buster
[10:17:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:18:03] <wikibugs>	 10SRE, 10Traffic, 10Patch-For-Review, 10Performance-Team (Radar): Test haproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T290005 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by vgutierrez@cumin1001 for host cp1084.eqiad.wmnet with OS buster
[10:18:25] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1162', diff saved to https://phabricator.wikimedia.org/P21931 and previous config saved to /var/cache/conftool/dbconfig/20220307-101824-marostegui.json
[10:18:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:19:20] <wikibugs>	 (03CR) 10Jcrespo: "Note the Yandex disclaimer: "If you chose DNS as your verification method, it may take up to 72 hours (three days) to verify your domain"" [dns] - 10https://gerrit.wikimedia.org/r/768664 (https://phabricator.wikimedia.org/T302617) (owner: 10SCherukuwada)
[10:20:54] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T300381)', diff saved to https://phabricator.wikimedia.org/P21932 and previous config saved to /var/cache/conftool/dbconfig/20220307-102054-marostegui.json
[10:20:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:20:57] <stashbot>	 T300381: Make page_props.pp_page unsigned on wmf wikis - https://phabricator.wikimedia.org/T300381
[10:21:30] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T302950)', diff saved to https://phabricator.wikimedia.org/P21933 and previous config saved to /var/cache/conftool/dbconfig/20220307-102129-ladsgroup.json
[10:21:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:21:33] <stashbot>	 T302950: Upgrade s4 to bullseye - https://phabricator.wikimedia.org/T302950
[10:21:52] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1143.eqiad.wmnet with reason: Maintenance
[10:21:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:21:53] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1143.eqiad.wmnet with reason: Maintenance
[10:21:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:21:58] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1143 (T302950)', diff saved to https://phabricator.wikimedia.org/P21934 and previous config saved to /var/cache/conftool/dbconfig/20220307-102158-ladsgroup.json
[10:22:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:22:10] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1146:3312', diff saved to https://phabricator.wikimedia.org/P21935 and previous config saved to /var/cache/conftool/dbconfig/20220307-102209-marostegui.json
[10:22:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:23:08] <wikibugs>	 (CR) SCherukuwada: Add Yandex's TXT verification entry to www. (1 comment) [dns] - https://gerrit.wikimedia.org/r/768664 (https://phabricator.wikimedia.org/T302617) (owner: SCherukuwada)
[10:26:57] <icinga-wm>	 PROBLEM - Check whether ferm is active by checking the default input chain on kubernetes1001 is CRITICAL: ERROR ferm input drop default policy not set, ferm might not have been started correctly https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm
[10:27:29] <wikibugs>	 10SRE, 10Beta-Cluster-Infrastructure, 10Traffic, 10Beta-Cluster-reproducible: Beta cluster down: Error: 502, Next Hop Connection Failed (Feb 2022) - https://phabricator.wikimedia.org/T302699 (10dom_walden) >>! In T302699#7740763, @dom_walden wrote: > ` > AH00288: scoreboard is full, not at MaxRequestWorker...
[10:27:30] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1135 (T300992)', diff saved to https://phabricator.wikimedia.org/P21936 and previous config saved to /var/cache/conftool/dbconfig/20220307-102730-ladsgroup.json
[10:27:32] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1134.eqiad.wmnet with reason: Maintenance
[10:27:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:27:33] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1134.eqiad.wmnet with reason: Maintenance
[10:27:34] <stashbot>	 T300992: Add linter_template and linter_tag columns to the Linter table - https://phabricator.wikimedia.org/T300992
[10:27:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:27:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:27:38] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1134 (T300992)', diff saved to https://phabricator.wikimedia.org/P21937 and previous config saved to /var/cache/conftool/dbconfig/20220307-102737-ladsgroup.json
[10:27:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:28:09] <wikibugs>	 10SRE, 10Patch-For-Review: migrate services from cumin2001 to cumin2002 - https://phabricator.wikimedia.org/T276589 (10Kormat) >>! In T276589#7756067, @Marostegui wrote: >>>! In T276589#7755980, @gerritbot wrote: >> Change 768657 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff): >> %%%[op...
[10:31:09] <wikibugs>	 (03PS1) 10Muehlenhoff: Switch cumin2001 to insetup role [puppet] - 10https://gerrit.wikimedia.org/r/768670 (https://phabricator.wikimedia.org/T276589)
[10:31:35] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.reimage for host db1143.eqiad.wmnet with OS bullseye
[10:31:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:32:18] <wikibugs>	 (03PS1) 10Vgutierrez: site: Reimage cp4036 as cache::text_haproxy [puppet] - 10https://gerrit.wikimedia.org/r/768671 (https://phabricator.wikimedia.org/T290005)
[10:32:55] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1134 (T300992)', diff saved to https://phabricator.wikimedia.org/P21938 and previous config saved to /var/cache/conftool/dbconfig/20220307-103253-ladsgroup.json
[10:32:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:32:58] <stashbot>	 T300992: Add linter_template and linter_tag columns to the Linter table - https://phabricator.wikimedia.org/T300992
[10:33:14] <wikibugs>	 (CR) Kormat: [C: +1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/768657 (https://phabricator.wikimedia.org/T276589) (owner: Muehlenhoff)
[10:33:23] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 25%: After mysql restart', diff saved to https://phabricator.wikimedia.org/P21939 and previous config saved to /var/cache/conftool/dbconfig/20220307-103323-root.json
[10:33:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:33:54] <wikibugs>	 (CR) Vgutierrez: [C: +2] site: Reimage cp4036 as cache::text_haproxy [puppet] - https://gerrit.wikimedia.org/r/768671 (https://phabricator.wikimedia.org/T290005) (owner: Vgutierrez)
[10:34:04] <logmsgbot>	 !log vgutierrez@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on cp1084.eqiad.wmnet with reason: host reimage
[10:34:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:34:08] <wikibugs>	 (CR) Jcrespo: Add Yandex's TXT verification entry to www. (1 comment) [dns] - https://gerrit.wikimedia.org/r/768664 (https://phabricator.wikimedia.org/T302617) (owner: SCherukuwada)
[10:34:17] <icinga-wm>	 RECOVERY - Check systemd state on kubernetes1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[10:34:33] <jayme>	 !log (re)started ferm on kubernetes1001
[10:34:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:35:34] <logmsgbot>	 !log vgutierrez@cumin1001 START - Cookbook sre.hosts.reimage for host cp4036.ulsfo.wmnet with OS buster
[10:35:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:35:47] <wikibugs>	 10SRE, 10Traffic, 10Patch-For-Review, 10Performance-Team (Radar): Test haproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T290005 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by vgutierrez@cumin1001 for host cp4036.ulsfo.wmnet with OS buster
[10:37:31] <logmsgbot>	 !log vgutierrez@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1084.eqiad.wmnet with reason: host reimage
[10:37:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:38:28] <wikibugs>	 (03PS1) 10Muehlenhoff: Switch TGC same site cookie to strict [puppet] - 10https://gerrit.wikimedia.org/r/768673
[10:43:22] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on db1143.eqiad.wmnet with reason: host reimage
[10:43:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:45:38] <icinga-wm>	 PROBLEM - SSH on analytics1067.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[10:46:42] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1143.eqiad.wmnet with reason: host reimage
[10:46:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:46:49] <wikibugs>	 (CR) Vgutierrez: [C: +2] prometheus:rules_ops: Provide HAProxy total responses metrics [puppet] - https://gerrit.wikimedia.org/r/768056 (https://phabricator.wikimedia.org/T290005) (owner: Vgutierrez)
[10:46:56] <wikibugs>	 10SRE: Domain Ownership Verification on Various Search Properties - https://phabricator.wikimedia.org/T302617 (10SCherukuwada) It got a bit trickier.  You can't add a TXT entry for www when www exists as a CNAME. And it might not even help in that case. Here's why:   On Yandex, when you request verification for...
[10:48:00] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P21940 and previous config saved to /var/cache/conftool/dbconfig/20220307-104759-ladsgroup.json
[10:48:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:48:27] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 50%: After mysql restart', diff saved to https://phabricator.wikimedia.org/P21941 and previous config saved to /var/cache/conftool/dbconfig/20220307-104826-root.json
[10:48:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:49:00] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
[10:49:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:49:01] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
[10:49:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:49:07] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depooling db1098:3317 (T300381)', diff saved to https://phabricator.wikimedia.org/P21942 and previous config saved to /var/cache/conftool/dbconfig/20220307-104906-marostegui.json
[10:49:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:49:10] <stashbot>	 T300381: Make page_props.pp_page unsigned on wmf wikis - https://phabricator.wikimedia.org/T300381
[10:49:28] <wikibugs>	 (03PS1) 10Muehlenhoff: Remove LDAP access for coreyfloyd [puppet] - 10https://gerrit.wikimedia.org/r/768677
[10:49:32] <icinga-wm>	 PROBLEM - Host ripe-atlas-esams IPv6 is DOWN: PING CRITICAL - Packet loss = 100%
[10:51:48] <icinga-wm>	 PROBLEM - Host ripe-atlas-esams is DOWN: PING CRITICAL - Packet loss = 100%
[10:52:07] <logmsgbot>	 !log vgutierrez@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on cp4036.ulsfo.wmnet with reason: host reimage
[10:52:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:55:28] <wikibugs>	 (CR) Muehlenhoff: [C: +2] Remove LDAP access for coreyfloyd [puppet] - https://gerrit.wikimedia.org/r/768677 (owner: Muehlenhoff)
[10:55:33] <logmsgbot>	 !log vgutierrez@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4036.ulsfo.wmnet with reason: host reimage
[10:55:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:56:57] <icinga-wm>	 RECOVERY - Check whether ferm is active by checking the default input chain on kubernetes1001 is OK: OK ferm input default policy is set https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm
[10:59:20] <vgutierrez>	 !log pool cp1084 with HAProxy as TLS termination layer - T290005
[10:59:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:59:23] <stashbot>	 T290005: Test haproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T290005
[11:00:46] <logmsgbot>	 !log vgutierrez@cumin1001 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1084.eqiad.wmnet with OS buster
[11:00:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:00:58] <wikibugs>	 10SRE, 10Traffic, 10Patch-For-Review, 10Performance-Team (Radar): Test haproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T290005 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by vgutierrez@cumin1001 for host cp1084.eqiad.wmnet with OS buster c...
[11:02:25] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1143.eqiad.wmnet with OS bullseye
[11:02:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:02:27] <wikibugs>	 (03PS1) 10MSantos: WIP: introduce geoshapes service [deployment-charts] - 10https://gerrit.wikimedia.org/r/768678
[11:03:04] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P21943 and previous config saved to /var/cache/conftool/dbconfig/20220307-110304-ladsgroup.json
[11:03:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:03:19] <wikibugs>	 (CR) Volans: [C: -1] "forgot to add the response message" [software/spicerack] - https://gerrit.wikimedia.org/r/765480 (https://phabricator.wikimedia.org/T293209) (owner: Filippo Giunchedi)
[11:03:31] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 75%: After mysql restart', diff saved to https://phabricator.wikimedia.org/P21944 and previous config saved to /var/cache/conftool/dbconfig/20220307-110330-root.json
[11:03:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:04:25] <wikibugs>	 (03PS1) 10Vgutierrez: site: Reimage cp5016 as cache::text_haproxy [puppet] - 10https://gerrit.wikimedia.org/r/768679 (https://phabricator.wikimedia.org/T290005)
[11:06:00] <wikibugs>	 (03PS1) 10Kosta Harlan: GrowthExperiments: Add image experiment for fa/fr/pt/trwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/768680 (https://phabricator.wikimedia.org/T302828)
[11:08:00] <wikibugs>	 (CR) Vgutierrez: [C: +2] site: Reimage cp5016 as cache::text_haproxy [puppet] - https://gerrit.wikimedia.org/r/768679 (https://phabricator.wikimedia.org/T290005) (owner: Vgutierrez)
[11:08:24] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 (T300381)', diff saved to https://phabricator.wikimedia.org/P21945 and previous config saved to /var/cache/conftool/dbconfig/20220307-110823-marostegui.json
[11:08:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:08:27] <stashbot>	 T300381: Make page_props.pp_page unsigned on wmf wikis - https://phabricator.wikimedia.org/T300381
[11:10:12] <logmsgbot>	 !log vgutierrez@cumin1001 START - Cookbook sre.hosts.reimage for host cp5016.eqsin.wmnet with OS buster
[11:10:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:10:23] <wikibugs>	 10SRE, 10Traffic, 10Patch-For-Review, 10Performance-Team (Radar): Test haproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T290005 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by vgutierrez@cumin1001 for host cp5016.eqsin.wmnet with OS buster
[11:11:19] <wikibugs>	 (PS1) Elukey: WIP - calico,cfssl-issuer,knative-serving: fix dependencies [deployment-charts] - https://gerrit.wikimedia.org/r/768681
[11:12:33] <wikibugs>	 (CR) jerkins-bot: [V: -1] WIP - calico,cfssl-issuer,knative-serving: fix dependencies [deployment-charts] - https://gerrit.wikimedia.org/r/768681 (owner: Elukey)
[11:12:42] <wikibugs>	 10SRE, 10Beta-Cluster-Infrastructure, 10Traffic, 10Beta-Cluster-reproducible: Beta cluster down: Error: 502, Next Hop Connection Failed (Feb 2022) - https://phabricator.wikimedia.org/T302699 (10AlexisJazz) >>! In T302699#7756154, @dom_walden wrote: >>>! In T302699#7740763, @dom_walden wrote: >> ` >> AH0028...
[11:12:50] <vgutierrez>	 !log pool cp4036 with HAProxy as TLS termination layer - T290005
[11:12:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:12:53] <stashbot>	 T290005: Test haproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T290005
[11:14:21] <wikibugs>	 10SRE, 10Beta-Cluster-Infrastructure, 10Traffic, 10Beta-Cluster-reproducible: Beta cluster down: Error: 502, Next Hop Connection Failed (Feb 2022) - https://phabricator.wikimedia.org/T302699 (10Majavah)
[11:17:25] <wikibugs>	 (03PS1) 10Vgutierrez: site: Reimage cp3060 as cache::text_haproxy [puppet] - 10https://gerrit.wikimedia.org/r/768682 (https://phabricator.wikimedia.org/T290005)
[11:17:39] <wikibugs>	 10SRE, 10Beta-Cluster-Infrastructure, 10Traffic, 10Beta-Cluster-reproducible: Beta cluster down: Error: 502, Next Hop Connection Failed (Feb 2022) - https://phabricator.wikimedia.org/T302699 (10AlexisJazz) @Majavah are you sure T303165 is a dupe? That task is about api.php (and nothing else!) **consistentl...
[11:18:00] <logmsgbot>	 !log vgutierrez@cumin1001 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4036.ulsfo.wmnet with OS buster
[11:18:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:18:09] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1134 (T300992)', diff saved to https://phabricator.wikimedia.org/P21946 and previous config saved to /var/cache/conftool/dbconfig/20220307-111809-ladsgroup.json
[11:18:10] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1163.eqiad.wmnet with reason: Maintenance
[11:18:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:18:12] <stashbot>	 T300992: Add linter_template and linter_tag columns to the Linter table - https://phabricator.wikimedia.org/T300992
[11:18:12] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1163.eqiad.wmnet with reason: Maintenance
[11:18:12] <wikibugs>	 10SRE, 10Traffic, 10Patch-For-Review, 10Performance-Team (Radar): Test haproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T290005 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by vgutierrez@cumin1001 for host cp4036.ulsfo.wmnet with OS buster c...
[11:18:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:18:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:18:17] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1163 (T300992)', diff saved to https://phabricator.wikimedia.org/P21947 and previous config saved to /var/cache/conftool/dbconfig/20220307-111816-ladsgroup.json
[11:18:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:18:20] <wikibugs>	 (03PS1) 10Jelto: gitlab_runner: add gitlab-runner to docker group, change folder permissions [puppet] - 10https://gerrit.wikimedia.org/r/768683 (https://phabricator.wikimedia.org/T295481)
[11:18:34] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 100%: After mysql restart', diff saved to https://phabricator.wikimedia.org/P21948 and previous config saved to /var/cache/conftool/dbconfig/20220307-111834-root.json
[11:18:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:19:57] <wikibugs>	 10SRE, 10Beta-Cluster-Infrastructure, 10Traffic, 10Beta-Cluster-reproducible: Beta cluster down: Error: 502, Next Hop Connection Failed (Feb 2022) - https://phabricator.wikimedia.org/T302699 (10Majavah) >>! In T302699#7756393, @AlexisJazz wrote: > @Majavah are you sure T303165 is a dupe? That task is about...
[11:20:00] <wikibugs>	 (CR) Vgutierrez: [C: +2] site: Reimage cp3060 as cache::text_haproxy [puppet] - https://gerrit.wikimedia.org/r/768682 (https://phabricator.wikimedia.org/T290005) (owner: Vgutierrez)
[11:20:42] <logmsgbot>	 !log vgutierrez@cumin1001 START - Cookbook sre.hosts.reimage for host cp3060.esams.wmnet with OS buster
[11:20:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:20:55] <wikibugs>	 10SRE, 10Traffic, 10Patch-For-Review, 10Performance-Team (Radar): Test haproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T290005 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by vgutierrez@cumin1001 for host cp3060.esams.wmnet with OS buster
[11:21:15] <wikibugs>	 (CR) Jelto: [V: +1] "PCC SUCCESS (DIFF 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/34091/console" [puppet] - https://gerrit.wikimedia.org/r/768683 (https://phabricator.wikimedia.org/T295481) (owner: Jelto)
[11:22:08] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1143 (T302950)', diff saved to https://phabricator.wikimedia.org/P21949 and previous config saved to /var/cache/conftool/dbconfig/20220307-112207-ladsgroup.json
[11:22:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:22:11] <stashbot>	 T302950: Upgrade s4 to bullseye - https://phabricator.wikimedia.org/T302950
[11:23:07] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1163 (T300992)', diff saved to https://phabricator.wikimedia.org/P21950 and previous config saved to /var/cache/conftool/dbconfig/20220307-112307-ladsgroup.json
[11:23:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:23:28] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P21951 and previous config saved to /var/cache/conftool/dbconfig/20220307-112328-marostegui.json
[11:23:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:28:01] <wikibugs>	 (03PS2) 10Ladsgroup: db1142: Disable notifications [puppet] - 10https://gerrit.wikimedia.org/r/768650 (https://phabricator.wikimedia.org/T302950) (owner: 10Gerrit maintenance bot)
[11:28:13] <wikibugs>	 (03CR) 10Ladsgroup: [V: 03+2 C: 03+2] db1142: Disable notifications [puppet] - 10https://gerrit.wikimedia.org/r/768650 (https://phabricator.wikimedia.org/T302950) (owner: 10Gerrit maintenance bot)
[11:29:30] <logmsgbot>	 !log ayounsi@cumin1001 END (PASS) - Cookbook sre.network.prepare-upgrade (exit_code=0)
[11:29:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:30:55] <logmsgbot>	 !log ayounsi@cumin1001 END (PASS) - Cookbook sre.network.prepare-upgrade (exit_code=0)
[11:30:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:32:15] <wikibugs>	 (03PS1) 10Btullis: Failover the active hive services to the standby server [dns] - 10https://gerrit.wikimedia.org/r/768686 (https://phabricator.wikimedia.org/T303168)
[11:33:18] <wikibugs>	 (03PS2) 10Jelto: gitlab_runner: add gitlab-runner to docker group, change folder permissions [puppet] - 10https://gerrit.wikimedia.org/r/768683 (https://phabricator.wikimedia.org/T295481)
[11:35:23] <wikibugs>	 10SRE, 10Beta-Cluster-Infrastructure, 10Traffic, 10Beta-Cluster-reproducible: Beta cluster down: Error: 502, Next Hop Connection Failed (Feb 2022) - https://phabricator.wikimedia.org/T302699 (10AlexisJazz) >>! In T302699#7756404, @Majavah wrote: >>>! In T302699#7756393, @AlexisJazz wrote: >> @Majavah are y...
[11:35:35] <wikibugs>	 10SRE, 10Traffic, 10Patch-For-Review, 10Performance-Team (Radar): Test haproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T290005 (10Vgutierrez)
[11:35:39] <wikibugs>	 10SRE, 10DC-Ops: Confirm support of PERC 750 raid controller - https://phabricator.wikimedia.org/T297913 (10LSobanski) dumpsdata1007 is running Bullseye BTW for anyone else watching from the sidelines.
[11:36:23] <wikibugs>	 10SRE, 10Traffic, 10Patch-For-Review, 10Performance-Team (Radar): Test haproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T290005 (10Vgutierrez)
[11:36:42] <logmsgbot>	 !log vgutierrez@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on cp5016.eqsin.wmnet with reason: host reimage
[11:36:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:37:13] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P21952 and previous config saved to /var/cache/conftool/dbconfig/20220307-113712-ladsgroup.json
[11:37:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:37:21] <wikibugs>	 (03CR) 10Btullis: [C: 03+2] Failover the active hive services to the standby server [dns] - 10https://gerrit.wikimedia.org/r/768686 (https://phabricator.wikimedia.org/T303168) (owner: 10Btullis)
[11:38:12] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P21953 and previous config saved to /var/cache/conftool/dbconfig/20220307-113811-ladsgroup.json
[11:38:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:38:33] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P21954 and previous config saved to /var/cache/conftool/dbconfig/20220307-113833-marostegui.json
[11:38:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:40:04] <logmsgbot>	 !log vgutierrez@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5016.eqsin.wmnet with reason: host reimage
[11:40:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:41:19] <wikibugs>	 (03PS1) 10Ladsgroup: dbtools: Add db_maint_mapper_sal.py [software] - 10https://gerrit.wikimedia.org/r/768687
[11:41:54] <wikibugs>	 10SRE, 10Beta-Cluster-Infrastructure, 10Traffic, 10Beta-Cluster-reproducible: Beta cluster down: Error: 502, Next Hop Connection Failed (Feb 2022) - https://phabricator.wikimedia.org/T302699 (10Vgutierrez) in this case a 502 is emitted by ats-backend cause it isn't able to reach its backend server. The 503...
[11:45:42] <XioNoX>	 !log remove MTU1400 on drmrs GTT links
[11:45:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:45:57] <icinga-wm>	 RECOVERY - SSH on analytics1067.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[11:48:15] <wikibugs>	 (03PS3) 10Vgutierrez: prometheus:rules_global: Provide HAProxy availability metrics [puppet] - 10https://gerrit.wikimedia.org/r/768057
[11:48:36] <wikibugs>	 (03CR) 10Vgutierrez: prometheus:rules_global: Provide HAProxy availability metrics (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/768057 (owner: 10Vgutierrez)
[11:49:48] <logmsgbot>	 !log vgutierrez@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on cp3060.esams.wmnet with reason: host reimage
[11:49:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:50:35] <icinga-wm>	 PROBLEM - OSPF status on cr1-eqiad is CRITICAL: OSPFv2: 6/7 UP : OSPFv3: 6/7 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[11:52:17] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P21955 and previous config saved to /var/cache/conftool/dbconfig/20220307-115217-ladsgroup.json
[11:52:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:52:39] <icinga-wm>	 RECOVERY - OSPF status on cr1-eqiad is OK: OSPFv2: 7/7 UP : OSPFv3: 7/7 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[11:53:17] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P21956 and previous config saved to /var/cache/conftool/dbconfig/20220307-115316-ladsgroup.json
[11:53:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:53:38] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 (T300381)', diff saved to https://phabricator.wikimedia.org/P21957 and previous config saved to /var/cache/conftool/dbconfig/20220307-115337-marostegui.json
[11:53:39] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
[11:53:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:53:40] <stashbot>	 T300381: Make page_props.pp_page unsigned on wmf wikis - https://phabricator.wikimedia.org/T300381
[11:53:41] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
[11:53:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:53:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:54:16] <logmsgbot>	 !log vgutierrez@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3060.esams.wmnet with reason: host reimage
[11:54:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:58:13] <wikibugs>	 (03PS3) 10Volans: sre.hosts.provision: always set the BiosBootSeq [cookbooks] - 10https://gerrit.wikimedia.org/r/767074
[11:59:24] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
[11:59:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:59:26] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
[11:59:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:00:25] <wikibugs>	 10Puppet, 10Infrastructure-Foundations, 10netbox: puppet lookup causes spurious puppetdb entries - https://phabricator.wikimedia.org/T303170 (10jbond) p:05Triage→03Low
[12:02:40] <wikibugs>	 (03CR) 10Urbanecm: [C: 03+1] "SGTM" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/768680 (https://phabricator.wikimedia.org/T302828) (owner: 10Kosta Harlan)
[12:03:14] <vgutierrez>	 !log pool cp5016 with HAProxy as TLS termination layer - T290005
[12:03:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:03:18] <stashbot>	 T290005: Test haproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T290005
[12:03:54] <logmsgbot>	 !log vgutierrez@cumin1001 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5016.eqsin.wmnet with OS buster
[12:03:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:04:06] <wikibugs>	 10SRE, 10Traffic, 10Patch-For-Review, 10Performance-Team (Radar): Test haproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T290005 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by vgutierrez@cumin1001 for host cp5016.eqsin.wmnet with OS buster c...
[12:05:26] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
[12:05:28] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
[12:05:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:05:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:05:32] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depooling db1181 (T300381)', diff saved to https://phabricator.wikimedia.org/P21958 and previous config saved to /var/cache/conftool/dbconfig/20220307-120532-marostegui.json
[12:05:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:05:35] <stashbot>	 T300381: Make page_props.pp_page unsigned on wmf wikis - https://phabricator.wikimedia.org/T300381
[12:06:52] <logmsgbot>	 !log hnowlan@deploy1002 helmfile [eqiad] START helmfile.d/services/api-gateway: sync
[12:07:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:07:11] <wikibugs>	 (03CR) 10Volans: [C: 03+2] sre.hosts.provision: always set the BiosBootSeq [cookbooks] - 10https://gerrit.wikimedia.org/r/767074 (owner: 10Volans)
[12:07:13] <logmsgbot>	 !log hnowlan@deploy1002 helmfile [eqiad] DONE helmfile.d/services/api-gateway: sync
[12:07:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:07:22] <wikibugs>	 (03PS1) 10Vgutierrez: site: Reimage cp2037 as cache::text_haproxy [puppet] - 10https://gerrit.wikimedia.org/r/768689 (https://phabricator.wikimedia.org/T290005)
[12:07:22] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1143 (T302950)', diff saved to https://phabricator.wikimedia.org/P21959 and previous config saved to /var/cache/conftool/dbconfig/20220307-120722-ladsgroup.json
[12:07:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:07:25] <stashbot>	 T302950: Upgrade s4 to bullseye - https://phabricator.wikimedia.org/T302950
[12:08:21] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1163 (T300992)', diff saved to https://phabricator.wikimedia.org/P21960 and previous config saved to /var/cache/conftool/dbconfig/20220307-120821-ladsgroup.json
[12:08:22] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
[12:08:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:08:24] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
[12:08:24] <stashbot>	 T300992: Add linter_template and linter_tag columns to the Linter table - https://phabricator.wikimedia.org/T300992
[12:08:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:08:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:10:03] <XioNoX>	 !log reboot cr2-drmrs for software upgrade
[12:10:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:10:10] <wikibugs>	 (03Merged) 10jenkins-bot: sre.hosts.provision: always set the BiosBootSeq [cookbooks] - 10https://gerrit.wikimedia.org/r/767074 (owner: 10Volans)
[12:10:18] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1181 (T300381)', diff saved to https://phabricator.wikimedia.org/P21961 and previous config saved to /var/cache/conftool/dbconfig/20220307-121018-marostegui.json
[12:10:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:10:26] <wikibugs>	 (03CR) 10Urbanecm: [C: 04-1] Add IPInfo viewing rights for certain groups (033 comments) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/766882 (https://phabricator.wikimedia.org/T296499) (owner: 10STran)
[12:11:16] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
[12:11:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:11:18] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
[12:11:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:11:23] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depooling db1166 (T300775)', diff saved to https://phabricator.wikimedia.org/P21962 and previous config saved to /var/cache/conftool/dbconfig/20220307-121122-marostegui.json
[12:11:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:11:25] <stashbot>	 T300775: Add tl_target_id column to templatelinks - https://phabricator.wikimedia.org/T300775
[12:11:35] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1133.eqiad.wmnet with reason: Maintenance
[12:11:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:11:37] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1133.eqiad.wmnet with reason: Maintenance
[12:11:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:12:37] <icinga-wm>	 PROBLEM - BGP status on cr3-eqsin is CRITICAL: BGP CRITICAL - No response from remote host 103.102.166.131 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[12:12:56] <wikibugs>	 10SRE, 10Observability-Metrics, 10Traffic: Port Traffic dashboards to Thanos - https://phabricator.wikimedia.org/T302266 (10MMandere)
[12:13:50] <vgutierrez>	 !log pool cp3060 with HAProxy as TLS termination layer - T290005
[12:13:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:13:53] <stashbot>	 T290005: Test haproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T290005
[12:14:37] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
[12:14:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:14:39] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
[12:14:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:14:44] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1105:3311 (T300992)', diff saved to https://phabricator.wikimedia.org/P21963 and previous config saved to /var/cache/conftool/dbconfig/20220307-121443-ladsgroup.json
[12:14:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:14:46] <stashbot>	 T300992: Add linter_template and linter_tag columns to the Linter table - https://phabricator.wikimedia.org/T300992
[12:15:07] <icinga-wm>	 PROBLEM - BFD status on cr2-eqdfw is CRITICAL: CRIT: Down: 2 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[12:15:07] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on appserver in eqiad on alert1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[12:16:07] <icinga-wm>	 PROBLEM - OSPF status on cr2-eqdfw is CRITICAL: OSPFv2: 5/6 UP : OSPFv3: 5/6 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[12:17:07] <icinga-wm>	 PROBLEM - BFD status on cr1-eqiad is CRITICAL: CRIT: Down: 2 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[12:17:25] <XioNoX>	 that's expected ^ (cr2-drmrs upgrade)
[12:17:33] <icinga-wm>	 RECOVERY - High average GET latency for mw requests on appserver in eqiad on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[12:17:34] <vgutierrez>	 ack
[12:18:15] <logmsgbot>	 !log vgutierrez@cumin1001 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3060.esams.wmnet with OS buster
[12:18:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:18:28] <wikibugs>	 10SRE, 10Traffic, 10Patch-For-Review, 10Performance-Team (Radar): Test haproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T290005 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by vgutierrez@cumin1001 for host cp3060.esams.wmnet with OS buster c...
[12:18:35] <icinga-wm>	 RECOVERY - OSPF status on cr2-eqdfw is OK: OSPFv2: 6/6 UP : OSPFv3: 6/6 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[12:18:55] <wikibugs>	 (03CR) 10Kosta Harlan: GrowthExperiments: Add image experiment for fa/fr/pt/trwiki (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/768680 (https://phabricator.wikimedia.org/T302828) (owner: 10Kosta Harlan)
[12:19:33] <wikibugs>	 (03CR) 10Vgutierrez: [C: 03+2] site: Reimage cp2037 as cache::text_haproxy [puppet] - 10https://gerrit.wikimedia.org/r/768689 (https://phabricator.wikimedia.org/T290005) (owner: 10Vgutierrez)
[12:19:43] <icinga-wm>	 RECOVERY - BFD status on cr1-eqiad is OK: OK: UP: 24 AdminDown: 0 Down: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[12:19:45] <wikibugs>	 (03CR) 10Urbanecm: "Security review passed (T260822), but perf review (T260821) is currently opened. Per https://www.mediawiki.org/wiki/Writing_an_extension_f" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/767216 (https://phabricator.wikimedia.org/T260598) (owner: 10Tchanders)
[12:19:58] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 (T300992)', diff saved to https://phabricator.wikimedia.org/P21964 and previous config saved to /var/cache/conftool/dbconfig/20220307-121958-ladsgroup.json
[12:20:00] <wikibugs>	 (03PS1) 10Btullis: Move some common resources to the opensearch::server profile [puppet] - 10https://gerrit.wikimedia.org/r/768702 (https://phabricator.wikimedia.org/T301382)
[12:20:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:20:02] <stashbot>	 T300992: Add linter_template and linter_tag columns to the Linter table - https://phabricator.wikimedia.org/T300992
[12:20:07] <icinga-wm>	 RECOVERY - BFD status on cr2-eqdfw is OK: OK: UP: 13 AdminDown: 0 Down: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[12:20:15] <logmsgbot>	 !log vgutierrez@cumin1001 START - Cookbook sre.hosts.reimage for host cp2037.codfw.wmnet with OS buster
[12:20:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:20:28] <wikibugs>	 10SRE, 10Traffic, 10Patch-For-Review, 10Performance-Team (Radar): Test haproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T290005 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by vgutierrez@cumin1001 for host cp2037.codfw.wmnet with OS buster
[12:22:07] <wikibugs>	 (03CR) 10Urbanecm: [C: 04-1] Autopromote-once users to the 'ipinfo' group after one edit (032 comments) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/767845 (https://phabricator.wikimedia.org/T296184) (owner: 10Tchanders)
[12:25:23] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P21965 and previous config saved to /var/cache/conftool/dbconfig/20220307-122523-marostegui.json
[12:25:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:25:29] <wikibugs>	 (03CR) 10Urbanecm: [C: 04-2] "Per Sammy. -2'ing to prevent accidential merge. IMO, most important thing is to have a test plan (how and when to evaluate whether this wa" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/767912 (https://phabricator.wikimedia.org/T43479) (owner: 10Samtar)
[12:32:21] <icinga-wm>	 PROBLEM - Host cp1090.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[12:33:52] <wikibugs>	 10SRE, 10Discovery-Search (Current work), 10Patch-For-Review: /var/run/elasticsearch deleted by elasticsearch - https://phabricator.wikimedia.org/T276198 (10Gehel) 05Open→03Resolved
[12:35:03] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P21966 and previous config saved to /var/cache/conftool/dbconfig/20220307-123503-ladsgroup.json
[12:35:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:36:29] <wikibugs>	 (03CR) 10Btullis: [V: 03+1] "PCC SUCCESS (DIFF 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/34093/console" [puppet] - 10https://gerrit.wikimedia.org/r/768702 (https://phabricator.wikimedia.org/T301382) (owner: 10Btullis)
[12:37:49] <XioNoX>	 !log restart cr1-drmrs for software upgrade
[12:37:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:38:15] <icinga-wm>	 RECOVERY - Host cp1090.mgmt is UP: PING OK - Packet loss = 0%, RTA = 1.02 ms
[12:38:42] <logmsgbot>	 !log vgutierrez@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on cp2037.codfw.wmnet with reason: host reimage
[12:38:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:39:05] <wikibugs>	 (03CR) 10Urbanecm: [C: 03+1] "looks good" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/747186 (https://phabricator.wikimedia.org/T203941) (owner: 10Kosta Harlan)
[12:40:28] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P21967 and previous config saved to /var/cache/conftool/dbconfig/20220307-124028-marostegui.json
[12:40:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:41:32] <logmsgbot>	 !log vgutierrez@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2037.codfw.wmnet with reason: host reimage
[12:41:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:42:50] <icinga-wm>	 PROBLEM - OSPF status on cr1-eqiad is CRITICAL: OSPFv2: 6/7 UP : OSPFv3: 6/7 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[12:45:02] <icinga-wm>	 PROBLEM - SSH on aqs1008.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[12:46:26] <icinga-wm>	 RECOVERY - OSPF status on cr1-eqiad is OK: OSPFv2: 7/7 UP : OSPFv3: 7/7 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[12:46:37] <wikibugs>	 (03PS1) 10Ayounsi: drmrs: add ORIGIN for v6 PTR LVS [dns] - 10https://gerrit.wikimedia.org/r/768709
[12:48:09] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1142.eqiad.wmnet with reason: Maintenance
[12:48:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:48:11] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1142.eqiad.wmnet with reason: Maintenance
[12:48:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:48:15] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1142 (T302950)', diff saved to https://phabricator.wikimedia.org/P21968 and previous config saved to /var/cache/conftool/dbconfig/20220307-124815-ladsgroup.json
[12:48:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:48:18] <stashbot>	 T302950: Upgrade s4 to bullseye - https://phabricator.wikimedia.org/T302950
[12:48:22] <wikibugs>	 (03CR) 10Volans: [C: 03+1] "LGTM, thanks for the fix" [dns] - 10https://gerrit.wikimedia.org/r/768709 (owner: 10Ayounsi)
[12:48:49] <wikibugs>	 (03CR) 10Ayounsi: [C: 03+2] drmrs: add ORIGIN for v6 PTR LVS [dns] - 10https://gerrit.wikimedia.org/r/768709 (owner: 10Ayounsi)
[12:49:47] <logmsgbot>	 !log aqu@deploy1002 Started deploy [airflow-dags/analytics@46d88a2]: Migrate wikidata/item_page_link/weekly
[12:49:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:49:55] <logmsgbot>	 !log aqu@deploy1002 Finished deploy [airflow-dags/analytics@46d88a2]: Migrate wikidata/item_page_link/weekly (duration: 00m 07s)
[12:49:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:50:08] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P21969 and previous config saved to /var/cache/conftool/dbconfig/20220307-125007-ladsgroup.json
[12:50:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:50:15] <wikibugs>	 (03CR) 10Jbond: [C: 03+1] "lgtm thx" [puppet] - 10https://gerrit.wikimedia.org/r/768659 (owner: 10Jcrespo)
[12:51:54] <wikibugs>	 (03PS1) 104nn1l2: etwikiquote: Update logo [mediawiki-config] - 10https://gerrit.wikimedia.org/r/768710 (https://phabricator.wikimedia.org/T302683)
[12:53:17] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.reimage for host db1142.eqiad.wmnet with OS bullseye
[12:53:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:55:33] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1181 (T300381)', diff saved to https://phabricator.wikimedia.org/P21970 and previous config saved to /var/cache/conftool/dbconfig/20220307-125532-marostegui.json
[12:55:34] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance
[12:55:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:55:36] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance
[12:55:36] <stashbot>	 T300381: Make page_props.pp_page unsigned on wmf wikis - https://phabricator.wikimedia.org/T300381
[12:55:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:55:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:55:40] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depooling db1174 (T300381)', diff saved to https://phabricator.wikimedia.org/P21971 and previous config saved to /var/cache/conftool/dbconfig/20220307-125540-marostegui.json
[12:55:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:58:58] <wikibugs>	 (03CR) 10Jcrespo: [C: 03+2] puppet: Print nodes that change on every puppet run, sorted [puppet] - 10https://gerrit.wikimedia.org/r/768659 (owner: 10Jcrespo)
[13:00:03] <wikibugs>	 (03PS1) 10Btullis: Failback the hive services to an-coord1001 [dns] - 10https://gerrit.wikimedia.org/r/768712 (https://phabricator.wikimedia.org/T303168)
[13:03:26] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1174 (T300381)', diff saved to https://phabricator.wikimedia.org/P21972 and previous config saved to /var/cache/conftool/dbconfig/20220307-130326-marostegui.json
[13:03:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:03:30] <stashbot>	 T300381: Make page_props.pp_page unsigned on wmf wikis - https://phabricator.wikimedia.org/T300381
[13:05:05] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on db1142.eqiad.wmnet with reason: host reimage
[13:05:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:05:13] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 (T300992)', diff saved to https://phabricator.wikimedia.org/P21973 and previous config saved to /var/cache/conftool/dbconfig/20220307-130512-ladsgroup.json
[13:05:14] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
[13:05:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:05:15] <stashbot>	 T300992: Add linter_template and linter_tag columns to the Linter table - https://phabricator.wikimedia.org/T300992
[13:05:15] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
[13:05:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:05:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:05:20] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1099:3311 (T300992)', diff saved to https://phabricator.wikimedia.org/P21974 and previous config saved to /var/cache/conftool/dbconfig/20220307-130520-ladsgroup.json
[13:05:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:07:52] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1142.eqiad.wmnet with reason: host reimage
[13:07:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:08:24] <wikibugs>	 (CR) Jcrespo: "They are now showing in order:" [puppet] - https://gerrit.wikimedia.org/r/768659 (owner: Jcrespo)
[13:08:26] <wikibugs>	 (CR) Jbond: "LGTM" [puppet] - https://gerrit.wikimedia.org/r/768662 (owner: Majavah)
[13:09:51] <aqu_>	 !log About to deploy analytics/refinery - Migrate wikidata/item_page_link/weekly from Oozie to Airflow
[13:09:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:10:30] <wikibugs>	 (PS3) Jelto: gitlab_runner: add gitlab-runner to docker group, change folder permissions [puppet] - https://gerrit.wikimedia.org/r/768683 (https://phabricator.wikimedia.org/T295481)
[13:11:00] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T300992)', diff saved to https://phabricator.wikimedia.org/P21975 and previous config saved to /var/cache/conftool/dbconfig/20220307-131100-ladsgroup.json
[13:11:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:11:03] <stashbot>	 T300992: Add linter_template and linter_tag columns to the Linter table - https://phabricator.wikimedia.org/T300992
[13:12:03] <logmsgbot>	 !log aqu@deploy1002 Started deploy [analytics/refinery@51d074b]: Migrate wikidata/item_page_link/weekly from Oozie to Airflow [analytics/refinery@51d074b]
[13:12:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:12:16] <wikibugs>	 SRE, Traffic-Icebox, Performance-Team (Radar): Consider collecting more timestamp milestones from ATS-TLS - https://phabricator.wikimedia.org/T265869 (Aklapper) a:ema→None Resetting inactive assignee
[13:12:17] <wikibugs>	 SRE, Traffic-Icebox, SecTeam-Processed: Consider removing X-Wikimedia-Security-Audit VCL support - https://phabricator.wikimedia.org/T229320 (Aklapper) a:ema→None Resetting inactive assignee
[13:12:37] <wikibugs>	 SRE, Pybal, Traffic-Icebox: pybal's "can-depool" logic only takes downServers into account - https://phabricator.wikimedia.org/T184715 (Aklapper) a:ema→None Resetting inactive assignee
[13:12:58] <wikibugs>	 (CR) Jelto: [V: +1] "PCC SUCCESS (DIFF 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/34094/console" [puppet] - https://gerrit.wikimedia.org/r/768683 (https://phabricator.wikimedia.org/T295481) (owner: Jelto)
[13:16:06] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1096 (s5,s6)', diff saved to https://phabricator.wikimedia.org/P21976 and previous config saved to /var/cache/conftool/dbconfig/20220307-131606-marostegui.json
[13:16:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:16:16] <wikibugs>	 (PS1) 4nn1l2: fawiki: Disable creating community books and remove "Create a book" link from sidebar [mediawiki-config] - https://gerrit.wikimedia.org/r/768718 (https://phabricator.wikimedia.org/T303173)
[13:18:31] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P21977 and previous config saved to /var/cache/conftool/dbconfig/20220307-131830-marostegui.json
[13:18:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:18:57] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 10%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P21978 and previous config saved to /var/cache/conftool/dbconfig/20220307-131857-root.json
[13:18:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:21:30] <wikibugs>	 SRE, Traffic-Icebox, Performance-Team (Radar): Consider collecting more timestamp milestones from ATS-TLS - https://phabricator.wikimedia.org/T265869 (Krinkle) Open→Resolved a:Krinkle
[13:21:37] <wikibugs>	 SRE, Traffic, Patch-For-Review, Performance-Team (Radar): 8-10% response start regression (Varnish 5.1.3-1wm15 -> 6.0.6-1wm1) - https://phabricator.wikimedia.org/T264398 (Krinkle)
[13:22:12] <wikibugs>	 SRE, Traffic-Icebox, Performance-Team (Radar): Consider collecting more timestamp milestones from ATS-TLS - https://phabricator.wikimedia.org/T265869 (Krinkle) a:Krinkle→ema
[13:22:28] <wikibugs>	 (CR) Jelto: [V: +1] "@Jbond this change adds gitlab-runner user to the docker group by setting the id and primary group. However the id needs to be numeric and" [puppet] - https://gerrit.wikimedia.org/r/768683 (https://phabricator.wikimedia.org/T295481) (owner: Jelto)
[13:22:34] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1142.eqiad.wmnet with OS bullseye
[13:22:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:25:24] <wikibugs>	 Puppet, SRE, Infrastructure-Foundations, User-jbond: facter3: use structured facts - https://phabricator.wikimedia.org/T222160 (jbond) Declined→Open p:Medium→Low Re-Opening this task as it seems from FACT-2913 and https://github.com/puppetlabs/puppet/pull/8868#issuecomment-1059388...
[13:25:30] <wikibugs>	 Puppet, SRE, Infrastructure-Foundations, Packaging, and 2 others: upgrade facter and puppet across the fleet - https://phabricator.wikimedia.org/T219803 (jbond)
[13:26:05] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P21979 and previous config saved to /var/cache/conftool/dbconfig/20220307-132605-ladsgroup.json
[13:26:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:30:27] <wikibugs>	 (CR) Kormat: Refactor check_mariadb_backups.py and add enough tests for it (1 comment) [software/wmfbackups] - https://gerrit.wikimedia.org/r/767844 (owner: Jcrespo)
[13:33:36] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P21980 and previous config saved to /var/cache/conftool/dbconfig/20220307-133335-marostegui.json
[13:33:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:34:01] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 25%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P21981 and previous config saved to /var/cache/conftool/dbconfig/20220307-133400-root.json
[13:34:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:34:56] <wikibugs>	 (CR) Jcrespo: "Thank you." [software/wmfbackups] - https://gerrit.wikimedia.org/r/767844 (owner: Jcrespo)
[13:35:09] <wikibugs>	 (CR) Bartosz Dziewoński: Enable reply tool by default on enwiki (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/758988 (https://phabricator.wikimedia.org/T296645) (owner: Esanders)
[13:35:16] <wikibugs>	 (PS2) Bartosz Dziewoński: Enable reply tool by default on enwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/758988 (https://phabricator.wikimedia.org/T296645) (owner: Esanders)
[13:36:39] <wikibugs>	 (CR) Kormat: Refactor check_mariadb_backups.py and add enough tests for it (1 comment) [software/wmfbackups] - https://gerrit.wikimedia.org/r/767844 (owner: Jcrespo)
[13:37:07] <logmsgbot>	 !log aqu@deploy1002 Finished deploy [analytics/refinery@51d074b]: Migrate wikidata/item_page_link/weekly from Oozie to Airflow [analytics/refinery@51d074b] (duration: 25m 04s)
[13:37:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:37:28] <icinga-wm>	 PROBLEM - SSH on kubernetes2001.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[13:39:28] <logmsgbot>	 !log aqu@deploy1002 Started deploy [analytics/refinery@51d074b] (thin): Migrate wikidata/item_page_link/weekly from Oozie to Airflow [analytics/refinery@51d074b]
[13:39:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:39:36] <logmsgbot>	 !log aqu@deploy1002 Finished deploy [analytics/refinery@51d074b] (thin): Migrate wikidata/item_page_link/weekly from Oozie to Airflow [analytics/refinery@51d074b] (duration: 00m 08s)
[13:39:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:39:55] <logmsgbot>	 !log aqu@deploy1002 Started deploy [analytics/refinery@51d074b] (hadoop-test): Migrate wikidata/item_page_link/weekly from Oozie to Airflow [analytics/refinery@51d074b]
[13:39:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:40:56] <jinxer-wm>	 (JobUnavailable) firing: (3) Reduced availability for job gitlab in eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org
[13:41:10] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P21982 and previous config saved to /var/cache/conftool/dbconfig/20220307-134109-ladsgroup.json
[13:41:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:41:41] <wikibugs>	 (PS13) Filippo Giunchedi: Introduce 'alertmanager' and 'alerting' modules [software/spicerack] - https://gerrit.wikimedia.org/r/765480 (https://phabricator.wikimedia.org/T293209)
[13:43:38] <wikibugs>	 (CR) Filippo Giunchedi: "Thanks for the review, with the last PS I was able to create a silence as expected (with 'alertmanager.py' in my home on cumin1001)" [software/spicerack] - https://gerrit.wikimedia.org/r/765480 (https://phabricator.wikimedia.org/T293209) (owner: Filippo Giunchedi)
[13:44:05] <wikibugs>	 (CR) Jbond: "see comments" [puppet] - https://gerrit.wikimedia.org/r/768683 (https://phabricator.wikimedia.org/T295481) (owner: Jelto)
[13:46:24] <wikibugs>	 (PS3) Jcrespo: Refactor check_mariadb_backups.py and add enough tests for it [software/wmfbackups] - https://gerrit.wikimedia.org/r/767844
[13:46:26] <icinga-wm>	 RECOVERY - SSH on aqs1008.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[13:47:13] <logmsgbot>	 !log aqu@deploy1002 Finished deploy [analytics/refinery@51d074b] (hadoop-test): Migrate wikidata/item_page_link/weekly from Oozie to Airflow [analytics/refinery@51d074b] (duration: 07m 17s)
[13:47:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:47:15] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1142 (T302950)', diff saved to https://phabricator.wikimedia.org/P21983 and previous config saved to /var/cache/conftool/dbconfig/20220307-134715-ladsgroup.json
[13:47:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:47:18] <stashbot>	 T302950: Upgrade s4 to bullseye - https://phabricator.wikimedia.org/T302950
[13:48:40] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1174 (T300381)', diff saved to https://phabricator.wikimedia.org/P21984 and previous config saved to /var/cache/conftool/dbconfig/20220307-134840-marostegui.json
[13:48:42] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1127.eqiad.wmnet with reason: Maintenance
[13:48:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:48:43] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1127.eqiad.wmnet with reason: Maintenance
[13:48:43] <stashbot>	 T300381: Make page_props.pp_page unsigned on wmf wikis - https://phabricator.wikimedia.org/T300381
[13:48:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:48:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:48:48] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depooling db1127 (T300381)', diff saved to https://phabricator.wikimedia.org/P21985 and previous config saved to /var/cache/conftool/dbconfig/20220307-134848-marostegui.json
[13:48:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:49:05] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 50%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P21986 and previous config saved to /var/cache/conftool/dbconfig/20220307-134904-root.json
[13:49:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:51:44] <wikibugs>	 (PS4) Jelto: gitlab_runner: add gitlab-runner to docker group, change folder permissions [puppet] - https://gerrit.wikimedia.org/r/768683 (https://phabricator.wikimedia.org/T295481)
[13:51:47] <wikibugs>	 (PS4) Jcrespo: Refactor check_mariadb_backups.py and add enough tests for it [software/wmfbackups] - https://gerrit.wikimedia.org/r/767844 (https://phabricator.wikimedia.org/T138562)
[13:52:09] <wikibugs>	 (PS2) Jbond: utils: create blame-stats script [puppet] - https://gerrit.wikimedia.org/r/768114 (https://phabricator.wikimedia.org/T67270)
[13:52:45] <wikibugs>	 (CR) jerkins-bot: [V: -1] utils: create blame-stats script [puppet] - https://gerrit.wikimedia.org/r/768114 (https://phabricator.wikimedia.org/T67270) (owner: Jbond)
[13:56:14] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T300992)', diff saved to https://phabricator.wikimedia.org/P21987 and previous config saved to /var/cache/conftool/dbconfig/20220307-135614-ladsgroup.json
[13:56:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:56:19] <stashbot>	 T300992: Add linter_template and linter_tag columns to the Linter table - https://phabricator.wikimedia.org/T300992
[13:59:16] <wikibugs>	 (CR) Kormat: [C: +2] Remove cumin2001 from mysql root clients and related grants [puppet] - https://gerrit.wikimedia.org/r/768657 (https://phabricator.wikimedia.org/T276589) (owner: Muehlenhoff)
[13:59:51] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.reboot-single for host people2002.codfw.wmnet
[13:59:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:00:05] <jouncebot>	 RoanKattouw, Lucas_WMDE, and Urbanecm: How many deployers does it take to do UTC afternoon backport window deploy? (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220307T1400).
[14:00:05] <jouncebot>	 cscott, Juan_90264, nn1l2, and MatmaRex: A patch you scheduled for UTC afternoon backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[14:00:09] <nn1l2>	 hi
[14:00:09] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.reboot-single for host puppetboard2002.codfw.wmnet
[14:00:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:00:13] <vgutierrez>	 !log pool cp2037 with HAProxy as TLS termination layer - T290005
[14:00:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:00:16] <stashbot>	 T290005: Test haproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T290005
[14:00:24] <MatmaRex>	 hello
[14:00:29] <kormat>	 !log removing cumin2001 grants from all db sections T276589
[14:00:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:00:31] <stashbot>	 T276589: migrate services from cumin2001 to cumin2002 - https://phabricator.wikimedia.org/T276589
[14:00:47] <urbanecm>	 i can deploy today (unless someone else wants to)
[14:01:14] <icinga-wm>	 RECOVERY - Check systemd state on deneb is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[14:01:34] <wikibugs>	 (CR) Urbanecm: [C: +2] etwikiquote: Update logo [mediawiki-config] - https://gerrit.wikimedia.org/r/768710 (https://phabricator.wikimedia.org/T302683) (owner: 4nn1l2)
[14:02:01] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host people2002.codfw.wmnet
[14:02:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:02:15] <wikibugs>	 (Merged) jenkins-bot: etwikiquote: Update logo [mediawiki-config] - https://gerrit.wikimedia.org/r/768710 (https://phabricator.wikimedia.org/T302683) (owner: 4nn1l2)
[14:02:20] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P21988 and previous config saved to /var/cache/conftool/dbconfig/20220307-140219-ladsgroup.json
[14:02:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:02:56] <urbanecm>	 nn1l2: pulled the logo to mwdebug1001, can you check?
[14:02:58] <logmsgbot>	 !log vgutierrez@cumin1001 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2037.codfw.wmnet with OS buster
[14:02:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:03:01] <nn1l2>	 ok
[14:03:14] * urbanecm doesn't see cscott or Juan_90264 here, so I'll skip the patches
[14:03:20] <wikibugs>	 SRE, Traffic, Patch-For-Review, Performance-Team (Radar): Test haproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T290005 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by vgutierrez@cumin1001 for host cp2037.codfw.wmnet with OS buster c...
[14:03:31] <wikibugs>	 (PS1) Jbond: varnish: Rate limit hotlinking [puppet] - https://gerrit.wikimedia.org/r/768723
[14:04:08] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 75%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P21989 and previous config saved to /var/cache/conftool/dbconfig/20220307-140408-root.json
[14:04:09] <nn1l2>	 LGTM
[14:04:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:04:52] <urbanecm>	 nn1l2: syncing
[14:05:08] <wikibugs>	 (CR) Urbanecm: [C: +2] fawiki: Disable creating community books and remove "Create a book" link from sidebar [mediawiki-config] - https://gerrit.wikimedia.org/r/768718 (https://phabricator.wikimedia.org/T303173) (owner: 4nn1l2)
[14:05:16] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetboard2002.codfw.wmnet
[14:05:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:05:46] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.reboot-single for host ldap-replica2005.wikimedia.org
[14:05:47] <wikibugs>	 (PS1) Vgutierrez: site: Reimage cp1085 as cache::text_haproxy [puppet] - https://gerrit.wikimedia.org/r/768724 (https://phabricator.wikimedia.org/T290005)
[14:05:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:05:51] <wikibugs>	 (Merged) jenkins-bot: fawiki: Disable creating community books and remove "Create a book" link from sidebar [mediawiki-config] - https://gerrit.wikimedia.org/r/768718 (https://phabricator.wikimedia.org/T303173) (owner: 4nn1l2)
[14:06:48] <wikibugs>	 (CR) Jelto: "If code changes are needed in systemd::sysuser, I would prefer to implement a additional_groups parameter in a related change. I'll upload" [puppet] - https://gerrit.wikimedia.org/r/768683 (https://phabricator.wikimedia.org/T295481) (owner: Jelto)
[14:07:07] <logmsgbot>	 !log urbanecm@deploy1002 Synchronized static/images/project-logos/: 8619f5933966071cdb39097a3e0d38fdead40b66: etwikiquote: Update logo (T302683; 1/3) (duration: 00m 50s)
[14:07:09] <wikibugs>	 SRE, SRE-swift-storage, ops-eqiad, DC-Ops: Q2:(Need By: TBD) rack/setup/install ms-fe1009-1012 - https://phabricator.wikimedia.org/T294137 (MatthewVernon) Thanks. Yes, the ms-fe* nodes will end up behind LVS; but they're not in service at that point. So from my POV, whenever you (or DC team) are...
[14:07:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:07:10] <stashbot>	 T302683: Requesting logo change for et.wikiquote.org - https://phabricator.wikimedia.org/T302683
[14:07:48] <icinga-wm>	 PROBLEM - Check systemd state on ml-serve2004 is CRITICAL: CRITICAL - degraded: The following units failed: ferm.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[14:07:52] <urbanecm>	 !log Purge https://en.wikipedia.org/static/images/project-logos/etwikiquote.png (T302683)
[14:07:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:07:57] <logmsgbot>	 !log urbanecm@deploy1002 Synchronized wmf-config/logos.php: 8619f5933966071cdb39097a3e0d38fdead40b66: etwikiquote: Update logo (T302683; 2/3) (duration: 00m 49s)
[14:07:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:08:21] <urbanecm>	 MatmaRex: just double checking, it's okay to ignore Ed's -1 on your patch, right?
[14:08:25] <wikibugs>	 (CR) Vgutierrez: [C: +2] site: Reimage cp1085 as cache::text_haproxy [puppet] - https://gerrit.wikimedia.org/r/768724 (https://phabricator.wikimedia.org/T290005) (owner: Vgutierrez)
[14:08:25] <urbanecm>	 sounds to be about scheduling only
[14:08:32] <MatmaRex>	 urbanecm: yeah
[14:08:36] <urbanecm>	 okay
[14:08:46] <logmsgbot>	 !log urbanecm@deploy1002 Synchronized logos/config.yaml: 8619f5933966071cdb39097a3e0d38fdead40b66: etwikiquote: Update logo (T302683; 3/3) (duration: 00m 49s)
[14:08:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:08:54] <MatmaRex>	 urbanecm: i wanted to remove that -1 vote, but i can't
[14:09:03] <urbanecm>	 yeah, only those with +2 and voters can :)
[14:09:05] <logmsgbot>	 !log vgutierrez@cumin1001 START - Cookbook sre.hosts.reimage for host cp1085.eqiad.wmnet with OS buster
[14:09:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:09:23] <urbanecm>	 nn1l2: should be live
[14:09:24] <wikibugs>	 SRE, Traffic, Patch-For-Review, Performance-Team (Radar): Test haproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T290005 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by vgutierrez@cumin1001 for host cp1085.eqiad.wmnet with OS buster
[14:09:28] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-replica2005.wikimedia.org
[14:09:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:09:39] <nn1l2>	 Thanks!
[14:09:40] <urbanecm>	 nn1l2: your second patch is at mwdebug1001, please test
[14:09:46] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[14:09:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:09:53] <wikibugs>	 (PS3) Urbanecm: Enable reply tool by default on enwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/758988 (https://phabricator.wikimedia.org/T296645) (owner: Esanders)
[14:10:12] <wikibugs>	 (CR) Urbanecm: [C: +2] "Ignoring Ed's scheduling -1 per MatmaRex. Deploying." [mediawiki-config] - https://gerrit.wikimedia.org/r/758988 (https://phabricator.wikimedia.org/T296645) (owner: Esanders)
[14:10:38] <wikibugs>	 SRE, Patch-For-Review: migrate services from cumin2001 to cumin2002 - https://phabricator.wikimedia.org/T276589 (Kormat) Granted removed: [] es1 [] es2 [] es3 [x] es4 [x] es5 [x] m1 (root@10.% seems to supersede it?) [x] m2 (root@10.% seems to supersede it?) [] m3 [] m5 [] s1 [] s2 [] s3 [] s4 [] s5 [] s...
[14:10:57] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[14:10:57] <wikibugs>	 (Merged) jenkins-bot: Enable reply tool by default on enwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/758988 (https://phabricator.wikimedia.org/T296645) (owner: Esanders)
[14:10:58] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[14:10:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:11:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:11:50] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[14:11:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:12:45] <wikibugs>	 (PS1) Vgutierrez: site: Reimage cp4030 as cache::text_haproxy [puppet] - https://gerrit.wikimedia.org/r/768729 (https://phabricator.wikimedia.org/T290005)
[14:13:04] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.reboot-single for host ldap-replica2006.wikimedia.org
[14:13:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:13:25] <urbanecm>	 nn1l2: how is the testing going?
[14:13:41] <nn1l2>	 I did not receive ping!
[14:13:49] <nn1l2>	 I will test it now!
[14:14:12] <urbanecm>	 okay :)
[14:14:22] <wikibugs>	 (CR) Vgutierrez: [C: +2] site: Reimage cp4030 as cache::text_haproxy [puppet] - https://gerrit.wikimedia.org/r/768729 (https://phabricator.wikimedia.org/T290005) (owner: Vgutierrez)
[14:14:37] <nn1l2>	 It's okay
[14:14:43] <nn1l2>	 Good to go
[14:14:54] <urbanecm>	 syncing
[14:15:03] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-replica2006.wikimedia.org
[14:15:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:15:20] <logmsgbot>	 !log vgutierrez@cumin1001 START - Cookbook sre.hosts.reimage for host cp4030.ulsfo.wmnet with OS buster
[14:15:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:15:33] <wikibugs>	 SRE, Traffic, Patch-For-Review, Performance-Team (Radar): Test haproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T290005 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by vgutierrez@cumin1001 for host cp4030.ulsfo.wmnet with OS buster
[14:16:06] <logmsgbot>	 !log urbanecm@deploy1002 Synchronized wmf-config/InitialiseSettings.php: 8f20ec9f5bd7f507580d8e8860116e3b1842ac9a: fawiki: Disable creating community books and remove "Create a book" link from sidebar (T303173) (duration: 00m 49s)
[14:16:07] <wikibugs>	 (PS1) Volans: prospector: update config for latest version [software/homer] - https://gerrit.wikimedia.org/r/768731
[14:16:09] <wikibugs>	 (PS1) Volans: homer: expand user paths when reading ssh_config [software/homer] - https://gerrit.wikimedia.org/r/768732
[14:16:11] <urbanecm>	 should be live nn1l2 
[14:16:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:16:14] <stashbot>	 T303173: Disable creating community books and remove "Create a book" link from the sidebar on Farsi Wikipedia - https://phabricator.wikimedia.org/T303173
[14:16:21] <urbanecm>	 MatmaRex: your patch is at mwdebug1001
[14:16:23] <urbanecm>	 can you check?
[14:16:45] <MatmaRex>	 looking
[14:16:57] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[14:16:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:17:25] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P21990 and previous config saved to /var/cache/conftool/dbconfig/20220307-141724-ladsgroup.json
[14:17:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:17:28] <MatmaRex>	 urbanecm: looks fine!
[14:17:30] <nn1l2>	 Thanks again!
[14:17:34] <urbanecm>	 syncing!
[14:17:35] <urbanecm>	 np nn1l2 
[14:18:08] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[14:18:09] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[14:18:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:18:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:18:44] <logmsgbot>	 !log urbanecm@deploy1002 Synchronized wmf-config/InitialiseSettings.php: 64b128459f04514cd0093745d7a83166555449b2: Enable reply tool by default on enwiki (T296645) (duration: 00m 49s)
[14:18:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:18:47] <stashbot>	 T296645: Config change: Deploy Reply Tool as opt-out preference at en.wiki - https://phabricator.wikimedia.org/T296645
[14:19:07] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[14:19:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:19:12] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 100%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P21991 and previous config saved to /var/cache/conftool/dbconfig/20220307-141911-root.json
[14:19:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:19:16] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 10%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P21992 and previous config saved to /var/cache/conftool/dbconfig/20220307-141915-root.json
[14:19:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:20:57] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.reboot-single for host ldap-replica1003.wikimedia.org
[14:20:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:21:03] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.reboot-single for host puppetboard1002.eqiad.wmnet
[14:21:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:22:58] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-replica1003.wikimedia.org
[14:23:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:23:16] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetboard1002.eqiad.wmnet
[14:23:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:23:27] <Juan_90264>	 Sorry for the delay, shall we deploy?
[14:24:05] <wikibugs>	 (03CR) 10Vgutierrez: varnish: Rate limit hotlinking (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/768723 (owner: 10Jbond)
[14:24:48] <Juan_90264>	 Hello?
[14:25:11] <Juan_90264>	 urbanecm ?
[14:25:30] <logmsgbot>	 !log vgutierrez@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on cp1085.eqiad.wmnet with reason: host reimage
[14:25:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:26:08] <icinga-wm>	 PROBLEM - k8s API server requests latencies on kubestagemaster1001 is CRITICAL: instance=10.64.16.203 verb={CREATE,UPDATE} https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/d/000000435/kubernetes-api?orgId=1&viewPanel=27
[14:26:11] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.reboot-single for host ldap-replica1004.wikimedia.org
[14:26:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:26:29] <urbanecm>	 Juan_90264: sure, wait on line
[14:27:02] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.reboot-single for host failoid2002.codfw.wmnet
[14:27:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:27:12] <icinga-wm>	 RECOVERY - k8s API server requests latencies on kubestagemaster1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/d/000000435/kubernetes-api?orgId=1&viewPanel=27
[14:28:13] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-replica1004.wikimedia.org
[14:28:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:28:51] <logmsgbot>	 !log vgutierrez@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1085.eqiad.wmnet with reason: host reimage
[14:28:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:29:02] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host failoid2002.codfw.wmnet
[14:29:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:29:15] <wikibugs>	 (03PS1) 10Jbond: R:systemd::sysuser: add support for id => "-:groupname" [puppet] - 10https://gerrit.wikimedia.org/r/768735
[14:29:22] <wikibugs>	 (03PS3) 10Urbanecm: Revert "Change temporary logo for slwiki" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/768155 (https://phabricator.wikimedia.org/T302661) (owner: 10Juan90264)
[14:29:27] <wikibugs>	 (03CR) 10Urbanecm: [C: 03+2] Revert "Change temporary logo for slwiki" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/768155 (https://phabricator.wikimedia.org/T302661) (owner: 10Juan90264)
[14:29:54] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] R:systemd::sysuser: add support for id => "-:groupname" [puppet] - 10https://gerrit.wikimedia.org/r/768735 (owner: 10Jbond)
[14:29:56] <wikibugs>	 (03CR) 10Jbond: [V: 03+1] "PCC SUCCESS (NOOP 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/34096/console" [puppet] - 10https://gerrit.wikimedia.org/r/768735 (owner: 10Jbond)
[14:30:12] <wikibugs>	 (03Merged) 10jenkins-bot: Revert "Change temporary logo for slwiki" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/768155 (https://phabricator.wikimedia.org/T302661) (owner: 10Juan90264)
[14:30:30] <urbanecm>	 Juan_90264: please test at mwdebug1001
[14:30:32] <Juan_90264>	 Okay merged
[14:30:56] <moritzm>	 !log rebooting etherpad1003 (running etherpad1003) for kernel update
[14:30:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:30:58] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.reboot-single for host etherpad1003.eqiad.wmnet
[14:31:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:31:05] <MatmaRex>	 (thanks for deploying!)
[14:31:10] <logmsgbot>	 !log vgutierrez@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on cp4030.ulsfo.wmnet with reason: host reimage
[14:31:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:31:17] <Juan_90264>	 urbanecm What's there to test on that, I'm just reversing the logo usage
[14:31:25] <urbanecm>	 yeah, test it was reversed ;)
[14:31:31] <icinga-wm>	 PROBLEM - Check whether ferm is active by checking the default input chain on ml-serve2004 is CRITICAL: ERROR ferm input drop default policy not set, ferm might not have been started correctly https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm
[14:31:59] <wikibugs>	 (03PS2) 10Jbond: R:systemd::sysuser: add support for id => "-:groupname" [puppet] - 10https://gerrit.wikimedia.org/r/768735
[14:32:29] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1142 (T302950)', diff saved to https://phabricator.wikimedia.org/P21993 and previous config saved to /var/cache/conftool/dbconfig/20220307-143229-ladsgroup.json
[14:32:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:32:33] <stashbot>	 T302950: Upgrade s4 to bullseye - https://phabricator.wikimedia.org/T302950
[14:33:06] <wikibugs>	 (03PS1) 10Btullis: Add a profile specific to datahubsearch servers [puppet] - 10https://gerrit.wikimedia.org/r/768736 (https://phabricator.wikimedia.org/T301382)
[14:33:17] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host etherpad1003.eqiad.wmnet
[14:33:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:33:24] <wikibugs>	 10SRE, 10DC-Ops: Confirm support of PERC 750 raid controller - https://phabricator.wikimedia.org/T297913 (10MoritzMuehlenhoff) megacli hasn't changed since a long time. I also tried perccli, but it also fails, the issue is rather on the kernel driver side.   But I think I have identified the commits which need...
[14:33:52] <wikibugs>	 (03CR) 10Btullis: [V: 03+1] Move some common resources to the opensearch::server profile (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/768702 (https://phabricator.wikimedia.org/T301382) (owner: 10Btullis)
[14:34:07] <wikibugs>	 (03CR) 10Btullis: [V: 03+1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/34097/console" [puppet] - 10https://gerrit.wikimedia.org/r/768736 (https://phabricator.wikimedia.org/T301382) (owner: 10Btullis)
[14:34:16] <urbanecm>	 Juan_90264: how is it going?
[14:34:19] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 25%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P21994 and previous config saved to /var/cache/conftool/dbconfig/20220307-143419-root.json
[14:34:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:34:25] <Juan_90264>	 Urbanecm: I tested and approved
[14:34:25] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[14:34:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:34:29] <urbanecm>	 syncing
[14:34:32] <logmsgbot>	 !log vgutierrez@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4030.ulsfo.wmnet with reason: host reimage
[14:34:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:35:22] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.reboot-single for host failoid1002.eqiad.wmnet
[14:35:23] <logmsgbot>	 !log ntsako@deploy1002 Started deploy [airflow-dags/analytics@46d88a2]: (no justification provided)
[14:35:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:35:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:35:27] <logmsgbot>	 !log ntsako@deploy1002 Finished deploy [airflow-dags/analytics@46d88a2]: (no justification provided) (duration: 00m 04s)
[14:35:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:35:36] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[14:35:37] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[14:35:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:35:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:36:16] <logmsgbot>	 !log urbanecm@deploy1002 Synchronized wmf-config/logos.php: f50c4746c5fa733929b80b036eef4eee84cf17d1: Revert "Change temporary logo for slwiki" (T302661; 1/2) (duration: 00m 49s)
[14:36:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:36:19] <stashbot>	 T302661: Requesting temporary logo change for sl.wikipedia.org - https://phabricator.wikimedia.org/T302661
[14:36:33] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[14:36:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:37:04] <logmsgbot>	 !log urbanecm@deploy1002 Synchronized static/images/project-logos/: f50c4746c5fa733929b80b036eef4eee84cf17d1: Revert "Change temporary logo for slwiki" (T302661; 2/2) (duration: 00m 48s)
[14:37:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:37:24] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host failoid1002.eqiad.wmnet
[14:37:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:37:32] <icinga-wm>	 RECOVERY - SSH on kubernetes2001.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[14:39:17] <Juan_90264>	 Working, thanks Urbanecm for deploying!
[14:40:39] <urbanecm>	 np
[14:42:02] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.reboot-single for host build2001.codfw.wmnet
[14:42:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:42:11] <wikibugs>	 (03CR) 10Volans: "Thanks for the fixes! I've tested all the functionalities and all looks good." [software/spicerack] - 10https://gerrit.wikimedia.org/r/765480 (https://phabricator.wikimedia.org/T293209) (owner: 10Filippo Giunchedi)
[14:42:24] <icinga-wm>	 PROBLEM - k8s API server requests latencies on kubemaster1001 is CRITICAL: instance=10.64.0.117 verb=LIST https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/d/000000435/kubernetes-api?orgId=1&viewPanel=27
[14:42:56] <wikibugs>	 (03CR) 10Btullis: [V: 03+1] "This may be a bit liberal, opening up port 9200 to all DOMAIN_NETWORKS, but I've put in a parameter so that we can restrict it further lat" [puppet] - 10https://gerrit.wikimedia.org/r/768736 (https://phabricator.wikimedia.org/T301382) (owner: 10Btullis)
[14:43:23] <wikibugs>	 (03CR) 10Btullis: [C: 03+2] Failback the hive services to an-coord1001 [dns] - 10https://gerrit.wikimedia.org/r/768712 (https://phabricator.wikimedia.org/T303168) (owner: 10Btullis)
[14:45:32] <wikibugs>	 (03CR) 10Ayounsi: [C: 03+1] homer: expand user paths when reading ssh_config [software/homer] - 10https://gerrit.wikimedia.org/r/768732 (owner: 10Volans)
[14:45:43] <vgutierrez>	 !log pool cp1085 with HAProxy as TLS termination layer - T290005
[14:45:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:45:49] <wikibugs>	 (03CR) 10Ayounsi: [C: 03+1] prospector: update config for latest version [software/homer] - 10https://gerrit.wikimedia.org/r/768731 (owner: 10Volans)
[14:46:24] <wikibugs>	 (03CR) 10Volans: [C: 03+2] prospector: update config for latest version [software/homer] - 10https://gerrit.wikimedia.org/r/768731 (owner: 10Volans)
[14:46:29] <wikibugs>	 (03CR) 10Volans: [C: 03+2] homer: expand user paths when reading ssh_config [software/homer] - 10https://gerrit.wikimedia.org/r/768732 (owner: 10Volans)
[14:46:43] <wikibugs>	 (03PS1) 10Jbond: C:varnish::common: Add documentation [puppet] - 10https://gerrit.wikimedia.org/r/768739
[14:46:46] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
[14:46:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:46:48] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
[14:46:49] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 12:00:00 on 8 hosts with reason: Maintenance
[14:46:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:46:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:46:56] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 8 hosts with reason: Maintenance
[14:46:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:47:33] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] C:varnish::common: Add documentation [puppet] - 10https://gerrit.wikimedia.org/r/768739 (owner: 10Jbond)
[14:48:30] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1127 (T300381)', diff saved to https://phabricator.wikimedia.org/P21995 and previous config saved to /var/cache/conftool/dbconfig/20220307-144829-marostegui.json
[14:48:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:48:33] <stashbot>	 T300381: Make page_props.pp_page unsigned on wmf wikis - https://phabricator.wikimedia.org/T300381
[14:48:58] <icinga-wm>	 RECOVERY - Check systemd state on build2001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[14:49:00] <icinga-wm>	 RECOVERY - Check systemd state on ml-serve2004 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[14:49:23] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 50%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P21996 and previous config saved to /var/cache/conftool/dbconfig/20220307-144922-root.json
[14:49:23] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host build2001.codfw.wmnet
[14:49:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:49:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:49:33] <wikibugs>	 (03Merged) 10jenkins-bot: prospector: update config for latest version [software/homer] - 10https://gerrit.wikimedia.org/r/768731 (owner: 10Volans)
[14:49:35] <wikibugs>	 (03Merged) 10jenkins-bot: homer: expand user paths when reading ssh_config [software/homer] - 10https://gerrit.wikimedia.org/r/768732 (owner: 10Volans)
[14:50:46] <wikibugs>	 (03CR) 10Joal: [C: 03+1] "Thanks @phuedx - Good for me on the webrequest logging side." [puppet] - 10https://gerrit.wikimedia.org/r/765485 (https://phabricator.wikimedia.org/T301238) (owner: 10Phuedx)
[14:53:07] <wikibugs>	 10SRE, 10Patch-For-Review: migrate services from cumin2001 to cumin2002 - https://phabricator.wikimedia.org/T276589 (10Kormat) >>! In T276589#7756918, @Kormat wrote: > Granted removed:  Alright, that should be all the grants cleaned up.
[14:53:13] <wikibugs>	 (03PS2) 10Elukey: calico,cfssl-issuer,knative-serving: fix dependencies [deployment-charts] - 10https://gerrit.wikimedia.org/r/768681
[14:56:27] <vgutierrez>	 !log depool cp1085
[14:56:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:56:38] <vgutierrez>	 cp1085 is having some issues :/
[14:58:12] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.reboot-single for host theemin.codfw.wmnet
[14:58:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:59:37] <wikibugs>	 (03PS5) 10Jelto: gitlab_runner: add gitlab-runner to docker group, change folder permissions [puppet] - 10https://gerrit.wikimedia.org/r/768683 (https://phabricator.wikimedia.org/T295481)
[15:01:45] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db2088.codfw.wmnet with reason: Maintenance
[15:01:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:01:47] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2088.codfw.wmnet with reason: Maintenance
[15:01:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:01:55] <wikibugs>	 10SRE, 10ops-eqiad, 10DC-Ops, 10Traffic: cp1085 memory errors on DIMM A5 - https://phabricator.wikimedia.org/T303183 (10Vgutierrez)
[15:02:04] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db2107.codfw.wmnet with reason: Maintenance
[15:02:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:02:06] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2107.codfw.wmnet with reason: Maintenance
[15:02:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:02:08] <icinga-wm>	 RECOVERY - Check whether ferm is active by checking the default input chain on ml-serve2004 is OK: OK ferm input default policy is set https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm
[15:02:22] <wikibugs>	 10SRE, 10ops-eqiad, 10DC-Ops, 10Traffic: cp1085 memory errors on DIMM A5 - https://phabricator.wikimedia.org/T303183 (10Vgutierrez) p:05Triage→03Medium
[15:02:22] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db2125.codfw.wmnet with reason: Maintenance
[15:02:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:02:24] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2125.codfw.wmnet with reason: Maintenance
[15:02:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:02:35] <logmsgbot>	 !log vgutierrez@cumin1001 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4030.ulsfo.wmnet with OS buster
[15:02:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:02:38] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host theemin.codfw.wmnet
[15:02:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:02:42] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db2126.codfw.wmnet with reason: Maintenance
[15:02:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:02:43] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2126.codfw.wmnet with reason: Maintenance
[15:02:44] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 12:00:00 on db2095.codfw.wmnet with reason: Maintenance
[15:02:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:02:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:02:46] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2095.codfw.wmnet with reason: Maintenance
[15:02:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:02:47] <wikibugs>	 10SRE, 10Traffic, 10Patch-For-Review, 10Performance-Team (Radar): Test haproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T290005 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by vgutierrez@cumin1001 for host cp4030.ulsfo.wmnet with OS buster c...
[15:03:03] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db2138.codfw.wmnet with reason: Maintenance
[15:03:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:03:05] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2138.codfw.wmnet with reason: Maintenance
[15:03:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:03:22] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db2148.codfw.wmnet with reason: Maintenance
[15:03:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:03:23] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2148.codfw.wmnet with reason: Maintenance
[15:03:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:03:35] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P21997 and previous config saved to /var/cache/conftool/dbconfig/20220307-150334-marostegui.json
[15:03:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:03:36] <vgutierrez>	 !log pool cp4030 with HAProxy as TLS termination layer - T290005
[15:03:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:03:39] <stashbot>	 T290005: Test haproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T290005
[15:03:56] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db2088.codfw.wmnet with reason: Maintenance
[15:03:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:03:58] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2088.codfw.wmnet with reason: Maintenance
[15:03:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:04:26] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 75%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P21998 and previous config saved to /var/cache/conftool/dbconfig/20220307-150426-root.json
[15:04:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:05:52] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db2107.codfw.wmnet with reason: Maintenance
[15:05:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:05:54] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2107.codfw.wmnet with reason: Maintenance
[15:05:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:06:48] <icinga-wm>	 PROBLEM - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is CRITICAL: /api (bad URL) timed out before a response was received https://wikitech.wikimedia.org/wiki/Citoid
[15:08:12] <wikibugs>	 (03CR) 10Elukey: "The diff looks long, I am wondering if it is only a consequence of the new deps being added or if it will translate to some prod changes" [deployment-charts] - 10https://gerrit.wikimedia.org/r/768681 (owner: 10Elukey)
[15:08:28] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 03+2] toolserver_legacy: add a block-all robots.txt [puppet] - 10https://gerrit.wikimedia.org/r/756126 (owner: 10Majavah)
[15:08:31] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db2125.codfw.wmnet with reason: Maintenance
[15:08:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:08:33] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2125.codfw.wmnet with reason: Maintenance
[15:08:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:08:46] <icinga-wm>	 RECOVERY - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Citoid
[15:09:32] <logmsgbot>	 !log ntsako@deploy1002 Started deploy [airflow-dags/analytics_test@7642d65]: (no justification provided)
[15:09:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:09:41] <logmsgbot>	 !log ntsako@deploy1002 Finished deploy [airflow-dags/analytics_test@7642d65]: (no justification provided) (duration: 00m 09s)
[15:09:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:11:13] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db2126.codfw.wmnet with reason: Maintenance
[15:11:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:11:15] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2126.codfw.wmnet with reason: Maintenance
[15:11:16] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 12:00:00 on db2095.codfw.wmnet with reason: Maintenance
[15:11:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:11:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:11:18] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2095.codfw.wmnet with reason: Maintenance
[15:11:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:13:00] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db2138.codfw.wmnet with reason: Maintenance
[15:13:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:13:01] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2138.codfw.wmnet with reason: Maintenance
[15:13:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:15:35] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db2148.codfw.wmnet with reason: Maintenance
[15:15:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:15:36] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2148.codfw.wmnet with reason: Maintenance
[15:15:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:16:50] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 03+2] openstack: haproxy site definition is not a profile [puppet] - 10https://gerrit.wikimedia.org/r/756982 (owner: 10Majavah)
[15:18:39] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P21999 and previous config saved to /var/cache/conftool/dbconfig/20220307-151839-marostegui.json
[15:18:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:19:07] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db2101.codfw.wmnet with reason: Maintenance
[15:19:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:19:09] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2101.codfw.wmnet with reason: Maintenance
[15:19:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:19:30] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 100%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P22000 and previous config saved to /var/cache/conftool/dbconfig/20220307-151929-root.json
[15:19:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:20:17] <logmsgbot>	 !log ntsako@deploy1002 Started deploy [airflow-dags/analytics@7642d65]: (no justification provided)
[15:20:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:20:24] <logmsgbot>	 !log ntsako@deploy1002 Finished deploy [airflow-dags/analytics@7642d65]: (no justification provided) (duration: 00m 07s)
[15:20:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:25:12] <wikibugs>	 (03PS14) 10Filippo Giunchedi: Introduce 'alertmanager' and 'alerting' modules [software/spicerack] - 10https://gerrit.wikimedia.org/r/765480 (https://phabricator.wikimedia.org/T293209)
[15:25:18] <wikibugs>	 (03CR) 10Filippo Giunchedi: Introduce 'alertmanager' and 'alerting' modules (033 comments) [software/spicerack] - 10https://gerrit.wikimedia.org/r/765480 (https://phabricator.wikimedia.org/T293209) (owner: 10Filippo Giunchedi)
[15:29:21] <wikibugs>	 (03CR) 10Filippo Giunchedi: "LGTM, though I'll let Cole vote" [puppet] - 10https://gerrit.wikimedia.org/r/768702 (https://phabricator.wikimedia.org/T301382) (owner: 10Btullis)
[15:29:44] <wikibugs>	 (03CR) 10Ssingh: aptrepo: add a component for certspotter (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/768058 (owner: 10Ssingh)
[15:30:03] <wikibugs>	 (03Abandoned) 10Ssingh: aptrepo: add a component for certspotter [puppet] - 10https://gerrit.wikimedia.org/r/768058 (owner: 10Ssingh)
[15:33:44] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1127 (T300381)', diff saved to https://phabricator.wikimedia.org/P22001 and previous config saved to /var/cache/conftool/dbconfig/20220307-153343-marostegui.json
[15:33:46] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1158.eqiad.wmnet with reason: Maintenance
[15:33:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:33:48] <stashbot>	 T300381: Make page_props.pp_page unsigned on wmf wikis - https://phabricator.wikimedia.org/T300381
[15:33:48] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1158.eqiad.wmnet with reason: Maintenance
[15:33:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:33:49] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
[15:33:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:33:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:33:53] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
[15:33:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:33:57] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depooling db1158 (T300381)', diff saved to https://phabricator.wikimedia.org/P22002 and previous config saved to /var/cache/conftool/dbconfig/20220307-153357-marostegui.json
[15:33:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:36:42] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1158 (T300381)', diff saved to https://phabricator.wikimedia.org/P22003 and previous config saved to /var/cache/conftool/dbconfig/20220307-153641-marostegui.json
[15:36:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:38:40] <logmsgbot>	 !log vgutierrez@cumin1001 END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1085.eqiad.wmnet with OS buster
[15:38:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:38:52] <wikibugs>	 SRE, Traffic, Patch-For-Review, Performance-Team (Radar): Test haproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T290005 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by vgutierrez@cumin1001 for host cp1085.eqiad.wmnet with OS buster e...
[15:39:15] <wikibugs>	 (PS1) Jelto: isystemd::sysuser: create option to add additional groups to user [puppet] - https://gerrit.wikimedia.org/r/768743 (https://phabricator.wikimedia.org/T295481)
[15:39:58] <logmsgbot>	 !log vgutierrez@cumin1001 START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on cp1085.eqiad.wmnet with reason: HW issues see T303183
[15:40:01] <logmsgbot>	 !log vgutierrez@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on cp1085.eqiad.wmnet with reason: HW issues see T303183
[15:40:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:40:03] <stashbot>	 T303183: cp1085 memory errors on DIMM A5 - https://phabricator.wikimedia.org/T303183
[15:40:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:40:08] <wikibugs>	 SRE, ops-eqiad, DC-Ops, Traffic: cp1085 memory errors on DIMM A5 - https://phabricator.wikimedia.org/T303183 (ops-monitoring-bot) Icinga downtime set by vgutierrez@cumin1001 for 30 days, 0:00:00 1 host(s) and their services with reason: HW issues see T303183 ` cp1085.eqiad.wmnet `
[15:40:10] <wikibugs>	 (CR) Andrew Bogott: [C: +2] openstack::haproxy: add more flexibility for frontends [puppet] - https://gerrit.wikimedia.org/r/756983 (owner: Majavah)
[15:40:18] <wikibugs>	 (PS4) Andrew Bogott: openstack::haproxy: add more flexibility for frontends [puppet] - https://gerrit.wikimedia.org/r/756983 (owner: Majavah)
[15:40:24] <wikibugs>	 (CR) Volans: [C: +1] "Ship it!" [software/spicerack] - https://gerrit.wikimedia.org/r/765480 (https://phabricator.wikimedia.org/T293209) (owner: Filippo Giunchedi)
[15:40:33] <wikibugs>	 (PS2) Jbond: C:varnish::common: Add documentation [puppet] - https://gerrit.wikimedia.org/r/768739
[15:40:35] <wikibugs>	 (PS2) Jbond: varnish: Rate limit hotlinking [puppet] - https://gerrit.wikimedia.org/r/768723
[15:40:54] <wikibugs>	 (CR) Filippo Giunchedi: [C: +1] "LGTM overall, see inline" [puppet] - https://gerrit.wikimedia.org/r/768057 (owner: Vgutierrez)
[15:44:18] <icinga-wm>	 RECOVERY - Check systemd state on grafana1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[15:44:22] <wikibugs>	 (PS3) Jbond: C:varnish::common: Add documentation [puppet] - https://gerrit.wikimedia.org/r/768739
[15:45:20] <wikibugs>	 (PS1) Vgutierrez: site: Reimage cp5010 as cache::text_haproxy [puppet] - https://gerrit.wikimedia.org/r/768744 (https://phabricator.wikimedia.org/T290005)
[15:49:05] <wikibugs>	 (CR) Vgutierrez: [C: +2] site: Reimage cp5010 as cache::text_haproxy [puppet] - https://gerrit.wikimedia.org/r/768744 (https://phabricator.wikimedia.org/T290005) (owner: Vgutierrez)
[15:49:56] <logmsgbot>	 !log vgutierrez@cumin1001 START - Cookbook sre.hosts.reimage for host cp5010.eqsin.wmnet with OS buster
[15:49:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:50:11] <wikibugs>	 SRE, Traffic, Patch-For-Review, Performance-Team (Radar): Test haproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T290005 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by vgutierrez@cumin1001 for host cp5010.eqsin.wmnet with OS buster
[15:51:23] <wikibugs>	 (PS1) Majavah: P:wmcs::prometheus: use a single entry for openstack-exporter [puppet] - https://gerrit.wikimedia.org/r/768747
[15:51:35] <wikibugs>	 (CR) Andrew Bogott: [C: +2] hieradata: cloud: Set monitoring_hosts as empty [puppet] - https://gerrit.wikimedia.org/r/757014 (owner: Majavah)
[15:51:40] <wikibugs>	 (PS2) Andrew Bogott: hieradata: cloud: Set monitoring_hosts as empty [puppet] - https://gerrit.wikimedia.org/r/757014 (owner: Majavah)
[15:51:46] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P22004 and previous config saved to /var/cache/conftool/dbconfig/20220307-155146-marostegui.json
[15:51:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:56:28] <wikibugs>	 (PS4) Jbond: C:varnish::common: Add documentation [puppet] - https://gerrit.wikimedia.org/r/768739
[15:56:48] <jayme>	 !log eqiad: kubectl -n istio-system delete po istiod-69d679d8b5-hm64j - T303184
[15:56:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:56:51] <stashbot>	 T303184: High API server request latencies (LIST) for istio API groups - https://phabricator.wikimedia.org/T303184
[15:58:01] <godog>	 jouncebot: next
[15:58:01] <jouncebot>	 In 0 hour(s) and 31 minute(s): Wikimedia Portals Update (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220307T1630)
[15:58:14] <logmsgbot>	 !log filippo@cumin1001 START - Cookbook sre.hosts.reboot-single for host centrallog2002.codfw.wmnet
[15:58:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:59:09] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.reboot-single for host netflow1002.eqiad.wmnet
[15:59:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:01:06] <wikibugs>	 SRE, serviceops: enhance otrs alerting - https://phabricator.wikimedia.org/T303190 (Arnoldokoth)
[16:01:27] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.reboot-single for host rpki1001.eqiad.wmnet
[16:01:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:02:23] <wikibugs>	 SRE, serviceops: investigate otrs database grants - https://phabricator.wikimedia.org/T303191 (Arnoldokoth)
[16:02:34] <wikibugs>	 (CR) Jbond: [C: +2] R:systemd::sysuser: add support for id => "-:groupname" [puppet] - https://gerrit.wikimedia.org/r/768735 (owner: Jbond)
[16:02:54] <icinga-wm>	 RECOVERY - k8s API server requests latencies on kubemaster1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/d/000000435/kubernetes-api?orgId=1&viewPanel=27
[16:03:01] <logmsgbot>	 !log filippo@cumin1001 START - Cookbook sre.hosts.reboot-single for host thanos-fe1001.eqiad.wmnet
[16:03:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:03:28] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rpki1001.eqiad.wmnet
[16:03:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:03:35] <wikibugs>	 (CR) Jbond: gitlab_runner: add gitlab-runner to docker group, change folder permissions (2 comments) [puppet] - https://gerrit.wikimedia.org/r/768683 (https://phabricator.wikimedia.org/T295481) (owner: Jelto)
[16:03:45] <wikibugs>	 (PS6) Jbond: gitlab_runner: add gitlab-runner to docker group, change folder permissions [puppet] - https://gerrit.wikimedia.org/r/768683 (https://phabricator.wikimedia.org/T295481) (owner: Jelto)
[16:04:34] <logmsgbot>	 !log filippo@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host centrallog2002.codfw.wmnet
[16:04:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:04:45] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow1002.eqiad.wmnet
[16:04:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:04:57] <logmsgbot>	 !log filippo@cumin1001 START - Cookbook sre.hosts.reboot-single for host thanos-fe2001.codfw.wmnet
[16:04:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:05:12] <wikibugs>	 (CR) Jbond: [V: +1] "PCC SUCCESS (DIFF 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/34103/console" [puppet] - https://gerrit.wikimedia.org/r/768683 (https://phabricator.wikimedia.org/T295481) (owner: Jelto)
[16:05:18] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.reboot-single for host netflow2002.codfw.wmnet
[16:05:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:05:22] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.reboot-single for host rpki2002.codfw.wmnet
[16:05:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:06:29] <logmsgbot>	 !log filippo@cumin1001 START - Cookbook sre.hosts.reboot-single for host thanos-be2001.codfw.wmnet
[16:06:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:06:51] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P22005 and previous config saved to /var/cache/conftool/dbconfig/20220307-160650-marostegui.json
[16:06:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:07:26] <wikibugs>	 (PS1) Vgutierrez: site: Reimage cp3058 as cache::text_haproxy [puppet] - https://gerrit.wikimedia.org/r/768751 (https://phabricator.wikimedia.org/T290005)
[16:07:43] <wikibugs>	 (CR) Jbond: "did an early pass" [puppet] - https://gerrit.wikimedia.org/r/768743 (https://phabricator.wikimedia.org/T295481) (owner: Jelto)
[16:09:36] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rpki2002.codfw.wmnet
[16:09:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:09:49] <logmsgbot>	 !log filippo@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe1001.eqiad.wmnet
[16:09:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:10:30] <jinxer-wm>	 (JobUnavailable) firing: (4) Reduced availability for job gitlab in eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org
[16:10:48] <wikibugs>	 SRE, ops-eqsin, Traffic: SMART error (CurrentPendingSector) detected on host: cp5004 - https://phabricator.wikimedia.org/T303043 (Vgutierrez) p:Triage→Medium @wiki_willy how should we handle this HW issue on eqsin?
[16:10:55] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow2002.codfw.wmnet
[16:10:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:11:50] <logmsgbot>	 !log filippo@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe2001.codfw.wmnet
[16:11:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:13:55] <wikibugs>	 (CR) Krinkle: [C: +1] misc: search-grafana-dashboards.js (1 comment) [software] - https://gerrit.wikimedia.org/r/767118 (owner: Filippo Giunchedi)
[16:14:06] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.reboot-single for host netflow3002.esams.wmnet
[16:14:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:14:20] <logmsgbot>	 !log filippo@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2001.codfw.wmnet
[16:14:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:14:28] <logmsgbot>	 !log vgutierrez@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on cp5010.eqsin.wmnet with reason: host reimage
[16:14:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:15:30] <jinxer-wm>	 (JobUnavailable) firing: (5) Reduced availability for job gitlab in eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org
[16:15:41] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.reboot-single for host gerrit2002.wikimedia.org
[16:15:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:16:24] <logmsgbot>	 !log hnowlan@deploy1002 helmfile [eqiad] START helmfile.d/services/changeprop: sync
[16:16:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:16:40] <wikibugs>	 (CR) Vgutierrez: [C: +2] site: Reimage cp3058 as cache::text_haproxy [puppet] - https://gerrit.wikimedia.org/r/768751 (https://phabricator.wikimedia.org/T290005) (owner: Vgutierrez)
[16:16:43] <logmsgbot>	 !log filippo@cumin1001 START - Cookbook sre.hosts.reboot-single for host thanos-fe1002.eqiad.wmnet
[16:16:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:16:48] <logmsgbot>	 !log hnowlan@deploy1002 helmfile [eqiad] DONE helmfile.d/services/changeprop: sync
[16:16:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:17:04] <logmsgbot>	 !log filippo@cumin1001 START - Cookbook sre.hosts.reboot-single for host thanos-fe2002.codfw.wmnet
[16:17:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:17:09] <logmsgbot>	 !log vgutierrez@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5010.eqsin.wmnet with reason: host reimage
[16:17:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:18:16] <logmsgbot>	 !log vgutierrez@cumin1001 START - Cookbook sre.hosts.reimage for host cp3058.esams.wmnet with OS buster
[16:18:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:18:29] <wikibugs>	 SRE, Traffic, Patch-For-Review, Performance-Team (Radar): Test haproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T290005 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by vgutierrez@cumin1001 for host cp3058.esams.wmnet with OS buster
[16:18:41] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow3002.esams.wmnet
[16:18:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:19:57] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gerrit2002.wikimedia.org
[16:19:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:20:34] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.reboot-single for host netflow4002.ulsfo.wmnet
[16:20:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:20:41] <wikibugs>	 (PS1) Ayounsi: Set fr-ops to operations [homer/public] - https://gerrit.wikimedia.org/r/768756 (https://phabricator.wikimedia.org/T302992)
[16:21:58] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1158 (T300381)', diff saved to https://phabricator.wikimedia.org/P22006 and previous config saved to /var/cache/conftool/dbconfig/20220307-162157-marostegui.json
[16:22:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:22:02] <stashbot>	 T300381: Make page_props.pp_page unsigned on wmf wikis - https://phabricator.wikimedia.org/T300381
[16:22:02] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db2121.codfw.wmnet with reason: Maintenance
[16:22:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:22:04] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2121.codfw.wmnet with reason: Maintenance
[16:22:05] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 12:00:00 on 10 hosts with reason: Maintenance
[16:22:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:22:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:22:13] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 10 hosts with reason: Maintenance
[16:22:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:22:23] <logmsgbot>	 !log filippo@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe1002.eqiad.wmnet
[16:22:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:22:30] <logmsgbot>	 !log filippo@cumin1001 START - Cookbook sre.hosts.reboot-single for host thanos-fe1003.eqiad.wmnet
[16:22:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:22:40] <wikibugs>	 (PS5) Jbond: C:varnish::common: Add documentation [puppet] - https://gerrit.wikimedia.org/r/768739
[16:22:47] <logmsgbot>	 !log filippo@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe2002.codfw.wmnet
[16:22:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:23:53] <wikibugs>	 (CR) Jbond: [V: +1] "PCC SUCCESS (DIFF 6): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/34104/console" [puppet] - https://gerrit.wikimedia.org/r/768739 (owner: Jbond)
[16:24:15] <logmsgbot>	 !log filippo@cumin1001 START - Cookbook sre.hosts.reboot-single for host thanos-fe2003.codfw.wmnet
[16:24:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:27:03] <wikibugs>	 (PS6) Jbond: C:varnish::common: Add documentation [puppet] - https://gerrit.wikimedia.org/r/768739
[16:27:47] <logmsgbot>	 !log filippo@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe1003.eqiad.wmnet
[16:27:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:28:15] <wikibugs>	 (CR) Jbond: [V: +1] "PCC SUCCESS (DIFF 6): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/34105/console" [puppet] - https://gerrit.wikimedia.org/r/768739 (owner: Jbond)
[16:28:15] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
[16:28:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:28:17] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
[16:28:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:28:22] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depooling db1101:3317 (T300381)', diff saved to https://phabricator.wikimedia.org/P22007 and previous config saved to /var/cache/conftool/dbconfig/20220307-162821-marostegui.json
[16:28:23] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow4002.ulsfo.wmnet
[16:28:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:28:25] <stashbot>	 T300381: Make page_props.pp_page unsigned on wmf wikis - https://phabricator.wikimedia.org/T300381
[16:28:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:29:04] <logmsgbot>	 !log filippo@cumin1001 START - Cookbook sre.hosts.reboot-single for host graphite2003.codfw.wmnet
[16:29:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:29:05] <logmsgbot>	 !log filippo@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe2003.codfw.wmnet
[16:29:06] <logmsgbot>	 !log filippo@cumin1001 END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host graphite2003.codfw.wmnet
[16:29:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:29:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:29:15] <logmsgbot>	 !log filippo@cumin1001 START - Cookbook sre.hosts.reboot-single for host graphite2003.codfw.wmnet
[16:29:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:30:05] <jouncebot>	 jan_drewniak: #bothumor My software never has bugs. It just develops random features. Rise for Wikimedia Portals Update. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220307T1630).
[16:34:41] <logmsgbot>	 !log filippo@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host graphite2003.codfw.wmnet
[16:34:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:34:52] <godog>	 jouncebot: next
[16:34:52] <jouncebot>	 In 1 hour(s) and 25 minute(s): Wikidata Query Service weekly deploy (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220307T1800)
[16:35:49] <wikibugs>	 (PS7) Jbond: C:varnish::common: Add documentation [puppet] - https://gerrit.wikimedia.org/r/768739
[16:36:12] <logmsgbot>	 !log filippo@cumin1001 START - Cookbook sre.hosts.reboot-single for host prometheus1005.eqiad.wmnet
[16:36:13] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T300381)', diff saved to https://phabricator.wikimedia.org/P22008 and previous config saved to /var/cache/conftool/dbconfig/20220307-163612-marostegui.json
[16:36:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:36:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:36:17] <stashbot>	 T300381: Make page_props.pp_page unsigned on wmf wikis - https://phabricator.wikimedia.org/T300381
[16:36:20] <wikibugs>	 (CR) jerkins-bot: [V: -1] C:varnish::common: Add documentation [puppet] - https://gerrit.wikimedia.org/r/768739 (owner: Jbond)
[16:36:53] <logmsgbot>	 !log filippo@cumin1001 START - Cookbook sre.hosts.reboot-single for host graphite1004.eqiad.wmnet
[16:36:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:37:08] <wikibugs>	 (PS8) Jbond: C:varnish::common: Add documentation [puppet] - https://gerrit.wikimedia.org/r/768739
[16:37:46] <wikibugs>	 (CR) jerkins-bot: [V: -1] C:varnish::common: Add documentation [puppet] - https://gerrit.wikimedia.org/r/768739 (owner: Jbond)
[16:38:54] <logmsgbot>	 !log filippo@cumin1001 START - Cookbook sre.hosts.reboot-single for host prometheus2005.codfw.wmnet
[16:38:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:39:37] <icinga-wm>	 PROBLEM - Check unit status of statograph_post on alert1001 is CRITICAL: CRITICAL: Status of the systemd unit statograph_post https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[16:39:57] <wikibugs>	 (PS9) Jbond: C:varnish::common: Add documentation [puppet] - https://gerrit.wikimedia.org/r/768739
[16:41:17] <wikibugs>	 (CR) Jbond: [V: +1] "PCC SUCCESS (DIFF 6): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/34108/console" [puppet] - https://gerrit.wikimedia.org/r/768739 (owner: Jbond)
[16:41:42] <vgutierrez>	 !log pool cp5010 with HAProxy as TLS termination layer - T290005
[16:41:44] <logmsgbot>	 !log vgutierrez@cumin1001 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5010.eqsin.wmnet with OS buster
[16:41:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:41:45] <stashbot>	 T290005: Test haproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T290005
[16:41:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:41:51] <icinga-wm>	 PROBLEM - Maps tiles generation on alert1001 is CRITICAL: CRITICAL: 100.00% of data under the critical threshold [5.0] https://wikitech.wikimedia.org/wiki/Maps/Runbook https://grafana.wikimedia.org/d/000000305/maps-performances?panelId=8&fullscreen&orgId=1
[16:41:58] <wikibugs>	 SRE, Traffic, Patch-For-Review, Performance-Team (Radar): Test haproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T290005 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by vgutierrez@cumin1001 for host cp5010.eqsin.wmnet with OS buster c...
[16:42:09] <godog>	 the statograph_port error is due to 502 from graphite.w.o FYI (cc cdanis)
[16:42:16] <godog>	 statograph_post error even
[16:42:28] <godog>	 JFYI though, it'll recover soon
[16:43:02] <wikibugs>	 (CR) Jbond: [V: +1] "PCC is essentially a noop" [puppet] - https://gerrit.wikimedia.org/r/768739 (owner: Jbond)
[16:43:08] <wikibugs>	 SRE, ops-eqiad: analytics10[63,67] mgmt interfaces seem flapping from time to time - https://phabricator.wikimedia.org/T303151 (Cmjohnson) @elukey analytics1063 and 1067 idrac's are stuck and each server needs to be physically powered off and unplugged for 20-30 secs
[16:43:27] <logmsgbot>	 !log filippo@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host graphite1004.eqiad.wmnet
[16:43:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:43:28] <jinxer-wm>	 (ThanosRuleHighRuleEvaluationFailures) firing: (2) Thanos Rule is failing to evaluate rules. - https://wikitech.wikimedia.org/wiki/Thanos#Alerts - https://grafana.wikimedia.org/d/35da848f5f92b2dc612e0c3a0577b8a1/thanos-rule - https://alerts.wikimedia.org
[16:43:36] <cdanis>	 godog: ah that's fine, ty
[16:44:15] <wikibugs>	 (CR) Cwhite: [C: +1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/768702 (https://phabricator.wikimedia.org/T301382) (owner: Btullis)
[16:44:31] <logmsgbot>	 !log filippo@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus1005.eqiad.wmnet
[16:44:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:44:55] <icinga-wm>	 PROBLEM - Check systemd state on deneb is CRITICAL: CRITICAL - degraded: The following units failed: docker-reporter-releng-images.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[16:44:59] <logmsgbot>	 !log vgutierrez@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on cp3058.esams.wmnet with reason: host reimage
[16:45:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:45:07] <logmsgbot>	 !log filippo@cumin1001 START - Cookbook sre.hosts.reboot-single for host prometheus1006.eqiad.wmnet
[16:45:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:46:09] <wikibugs>	 SRE, ops-eqiad: analytics10[63,67] mgmt interfaces seem flapping from time to time - https://phabricator.wikimedia.org/T303151 (elukey) @BTullis can you coordinate with @Cmjohnson to shutdown these nodes?
[16:46:33] <logmsgbot>	 !log filippo@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus2005.codfw.wmnet
[16:46:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:46:44] <logmsgbot>	 !log filippo@cumin1001 START - Cookbook sre.hosts.reboot-single for host prometheus2006.codfw.wmnet
[16:46:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:48:11] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.reboot-single for host netflow5002.eqsin.wmnet
[16:48:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:48:28] <logmsgbot>	 !log vgutierrez@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3058.esams.wmnet with reason: host reimage
[16:48:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:49:38] <icinga-wm>	 RECOVERY - Check unit status of statograph_post on alert1001 is OK: OK: Status of the systemd unit statograph_post https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[16:49:58] <wikibugs>	 SRE, ops-eqiad, DBA: Degraded RAID on es1029 - https://phabricator.wikimedia.org/T302169 (Cmjohnson) Open→Resolved Disk has been replaced and is rebuilding   cmjohnson@es1029:~$ sudo megacli -PDList -aALL |grep "Firmware state" Firmware state: Online, Spun Up Firmware state: Online, Spun Up F...
[16:50:30] <jinxer-wm>	 (JobUnavailable) firing: (3) Reduced availability for job gitlab in eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org
[16:51:17] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P22009 and previous config saved to /var/cache/conftool/dbconfig/20220307-165117-marostegui.json
[16:51:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:51:58] <logmsgbot>	 !log filippo@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus1006.eqiad.wmnet
[16:51:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:52:43] <wikibugs>	 (CR) Cwhite: Add a profile specific to datahubsearch servers (1 comment) [puppet] - https://gerrit.wikimedia.org/r/768736 (https://phabricator.wikimedia.org/T301382) (owner: Btullis)
[16:52:52] <vgutierrez>	 !log depool cp5004 - T303043
[16:52:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:52:54] <stashbot>	 T303043: SMART error (CurrentPendingSector) detected on host: cp5004 - https://phabricator.wikimedia.org/T303043
[16:54:16] <jinxer-wm>	 (ThanosSidecarPrometheusDown) firing: Thanos Sidecar cannot connect to Prometheus - https://wikitech.wikimedia.org/wiki/Thanos#Alerts - https://grafana.wikimedia.org/d/b19644bfbf0ec1e108027cce268d99f7/thanos-sidecar - https://alerts.wikimedia.org
[16:54:16] <jinxer-wm>	 (ThanosSidecarUnhealthy) firing: Thanos Sidecar is unhealthy. - https://wikitech.wikimedia.org/wiki/Thanos#Alerts - https://grafana.wikimedia.org/d/b19644bfbf0ec1e108027cce268d99f7/thanos-sidecar - https://alerts.wikimedia.org
[16:54:41] <logmsgbot>	 !log andrew@cumin1001 START - Cookbook sre.hosts.reimage for host cloudservices1004.wikimedia.org with OS bullseye
[16:54:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:54:43] <wikibugs>	 SRE, ops-eqiad, DC-Ops, cloud-services-team (Hardware): hw troubleshooting:  move cloudcephmon1003.eqiad.wmnet from rack B2 to rack C8 - https://phabricator.wikimedia.org/T303058 (Cmjohnson) @dcaro @nskaggs Can I do this anytime?
[16:55:30] <jinxer-wm>	 (JobUnavailable) firing: (3) Reduced availability for job gitlab in eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org
[16:55:30] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow5002.eqsin.wmnet
[16:55:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:56:40] <wikibugs>	 SRE, ops-eqiad, DC-Ops, Traffic: cp1085 memory errors on DIMM A5 - https://phabricator.wikimedia.org/T303183 (Cmjohnson) @Vgutierrez @wiki_willy This server is out of warranty.  Expired June 2021
[16:58:11] <logmsgbot>	 !log filippo@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus2006.codfw.wmnet
[16:58:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:58:28] <jinxer-wm>	 (ThanosRuleHighRuleEvaluationFailures) resolved: (2) Thanos Rule is failing to evaluate rules. - https://wikitech.wikimedia.org/wiki/Thanos#Alerts - https://grafana.wikimedia.org/d/35da848f5f92b2dc612e0c3a0577b8a1/thanos-rule - https://alerts.wikimedia.org
[16:58:36] <wikibugs>	 SRE, ops-eqiad, DC-Ops, cloud-services-team (Hardware): hw troubleshooting:  move cloudcephmon1003.eqiad.wmnet from rack B2 to rack C8 - https://phabricator.wikimedia.org/T303058 (dcaro) @Cmjohnson feel free to move it yes, just make sure to ping us when you start/end (@nskaggs is clinic duty thi...
[16:58:46] <wikibugs>	 (CR) Jelto: [V: +1] "great thanks!" [puppet] - https://gerrit.wikimedia.org/r/768743 (https://phabricator.wikimedia.org/T295481) (owner: Jelto)
[16:59:16] <jinxer-wm>	 (ThanosSidecarPrometheusDown) resolved: Thanos Sidecar cannot connect to Prometheus - https://wikitech.wikimedia.org/wiki/Thanos#Alerts - https://grafana.wikimedia.org/d/b19644bfbf0ec1e108027cce268d99f7/thanos-sidecar - https://alerts.wikimedia.org
[16:59:16] <jinxer-wm>	 (ThanosSidecarUnhealthy) resolved: Thanos Sidecar is unhealthy. - https://wikitech.wikimedia.org/wiki/Thanos#Alerts - https://grafana.wikimedia.org/d/b19644bfbf0ec1e108027cce268d99f7/thanos-sidecar - https://alerts.wikimedia.org
[16:59:28] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.reboot-single for host netflow6001.drmrs.wmnet
[16:59:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:02:50] <wikibugs>	 (PS1) Hnowlan: jobqueue: set CPU request [deployment-charts] - https://gerrit.wikimedia.org/r/768760 (https://phabricator.wikimedia.org/T300914)
[17:03:06] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow6001.drmrs.wmnet
[17:03:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:03:24] <wikibugs>	 SRE, ops-eqsin, Traffic: SMART error (CurrentPendingSector) detected on host: cp5004 - https://phabricator.wikimedia.org/T303043 (wiki_willy) a:RobH Hi @Vgutierrez - it's due to be refreshed towards the end of this calendar year (and will be on next FY's budget).  Would you be able to go that lon...
[17:06:22] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P22010 and previous config saved to /var/cache/conftool/dbconfig/20220307-170622-marostegui.json
[17:06:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:06:56] <logmsgbot>	 !log andrew@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on cloudservices1004.wikimedia.org with reason: host reimage
[17:06:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:07:14] <vgutierrez>	 !log pool cp3058 with HAProxy as TLS termination layer - T290005
[17:07:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:07:18] <stashbot>	 T290005: Test haproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T290005
[17:07:46] <icinga-wm>	 PROBLEM - Check systemd state on netflow6001 is CRITICAL: CRITICAL - degraded: The following units failed: ifup@ens13.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[17:07:50] <logmsgbot>	 !log vgutierrez@cumin1001 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3058.esams.wmnet with OS buster
[17:07:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:08:01] <wikibugs>	 SRE, Traffic, Patch-For-Review, Performance-Team (Radar): Test haproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T290005 (Vgutierrez)
[17:08:09] <wikibugs>	 SRE, Traffic, Patch-For-Review, Performance-Team (Radar): Test haproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T290005 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by vgutierrez@cumin1001 for host cp3058.esams.wmnet with OS buster c...
[17:08:41] <wikibugs>	 (PS1) Jbond: P:cache::varnish::frontend: Update lookup keys [puppet] - https://gerrit.wikimedia.org/r/768762
[17:09:36] <logmsgbot>	 !log andrew@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudservices1004.wikimedia.org with reason: host reimage
[17:09:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:09:44] <wikibugs>	 SRE, ops-eqiad, DC-Ops, Traffic: cp1085 memory errors on DIMM A5 - https://phabricator.wikimedia.org/T303183 (Vgutierrez) could we replace the faulty DIMM somehow? missing one server on text@eqiad is far from an ideal scenario
[17:10:14] <wikibugs>	 (CR) jerkins-bot: [V: -1] P:cache::varnish::frontend: Update lookup keys [puppet] - https://gerrit.wikimedia.org/r/768762 (owner: Jbond)
[17:14:26] <wikibugs>	 SRE, ops-eqiad, DC-Ops, Traffic: cp1085 memory errors on DIMM A5 - https://phabricator.wikimedia.org/T303183 (wiki_willy)
[17:15:10] <wikibugs>	 (PS2) Jbond: P:cache::varnish::frontend: Update lookup keys [puppet] - https://gerrit.wikimedia.org/r/768762
[17:15:58] <wikibugs>	 SRE, ops-eqiad, DC-Ops, Traffic: cp1085 memory errors on DIMM A5 - https://phabricator.wikimedia.org/T303183 (wiki_willy) No problem @Vgutierrez.  I just created T303203 with @RobH to procure a replacement DIMM  Thanks, Willy
[17:16:27] <wikibugs>	 (CR) Jbond: [V: +1] "PCC SUCCESS (DIFF 6): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/34111/console" [puppet] - https://gerrit.wikimedia.org/r/768762 (owner: Jbond)
[17:20:48] <logmsgbot>	 !log vgutierrez@cumin1001 START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on cp5004.eqsin.wmnet with reason: HW issues see T303043
[17:20:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:20:51] <logmsgbot>	 !log vgutierrez@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on cp5004.eqsin.wmnet with reason: HW issues see T303043
[17:20:52] <stashbot>	 T303043: SMART error (CurrentPendingSector) detected on host: cp5004 - https://phabricator.wikimedia.org/T303043
[17:20:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:20:58] <wikibugs>	 SRE, ops-eqsin, Traffic: SMART error (CurrentPendingSector) detected on host: cp5004 - https://phabricator.wikimedia.org/T303043 (ops-monitoring-bot) Icinga downtime set by vgutierrez@cumin1001 for 30 days, 0:00:00 1 host(s) and their services with reason: HW issues see T303043 ` cp5004.eqsin.wmnet `
[17:21:27] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T300381)', diff saved to https://phabricator.wikimedia.org/P22011 and previous config saved to /var/cache/conftool/dbconfig/20220307-172126-marostegui.json
[17:21:28] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
[17:21:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:21:30] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
[17:21:30] <stashbot>	 T300381: Make page_props.pp_page unsigned on wmf wikis - https://phabricator.wikimedia.org/T300381
[17:21:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:21:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:21:34] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depooling db1170:3317 (T300381)', diff saved to https://phabricator.wikimedia.org/P22012 and previous config saved to /var/cache/conftool/dbconfig/20220307-172134-marostegui.json
[17:21:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:24:08] <logmsgbot>	 !log jayme@cumin1001 START - Cookbook sre.hosts.reboot-single for host kubernetes1022.eqiad.wmnet
[17:24:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:26:37] <wikibugs>	 (CR) Muehlenhoff: puppet: Print nodes that change on every puppet run, sorted (1 comment) [puppet] - https://gerrit.wikimedia.org/r/768659 (owner: Jcrespo)
[17:27:56] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T300381)', diff saved to https://phabricator.wikimedia.org/P22013 and previous config saved to /var/cache/conftool/dbconfig/20220307-172755-marostegui.json
[17:27:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:27:59] <stashbot>	 T300381: Make page_props.pp_page unsigned on wmf wikis - https://phabricator.wikimedia.org/T300381
[17:29:50] <logmsgbot>	 !log jayme@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubernetes1022.eqiad.wmnet
[17:29:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:32:19] <logmsgbot>	 !log andrew@cumin1001 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudservices1004.wikimedia.org with OS bullseye
[17:32:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:36:16] <logmsgbot>	 !log jayme@cumin1001 START - Cookbook sre.hosts.reboot-single for host kubestage2001.codfw.wmnet
[17:36:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:43:01] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P22014 and previous config saved to /var/cache/conftool/dbconfig/20220307-174300-marostegui.json
[17:43:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:43:04] <wikibugs>	 (PS1) Jbond: P:varnish::common: Add support for passing wikimedia_domains [puppet] - https://gerrit.wikimedia.org/r/768766
[17:44:05] <logmsgbot>	 !log jayme@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubestage2001.codfw.wmnet
[17:44:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:44:10] <wikibugs>	 (PS1) Majavah: P:wmcs::prometheus: update pdns ports [puppet] - https://gerrit.wikimedia.org/r/768767 (https://phabricator.wikimedia.org/T281276)
[17:46:00] <wikibugs>	 (PS3) Jbond: varnish: Rate limit hotlinking [puppet] - https://gerrit.wikimedia.org/r/768723
[17:46:25] <wikibugs>	 (PS3) Tchanders: Enable IPInfo on testwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/767216 (https://phabricator.wikimedia.org/T260598)
[17:46:27] <wikibugs>	 (PS2) Tchanders: Autopromote-once users to the 'ipinfo' group after one edit [mediawiki-config] - https://gerrit.wikimedia.org/r/767845 (https://phabricator.wikimedia.org/T296184)
[17:47:41] <logmsgbot>	 !log jayme@cumin1001 START - Cookbook sre.hosts.reboot-single for host kubestage2002.codfw.wmnet
[17:47:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:47:44] <logmsgbot>	 !log jayme@cumin1001 END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host kubestage2002.codfw.wmnet
[17:47:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:49:11] <logmsgbot>	 !log jayme@cumin1001 START - Cookbook sre.hosts.reboot-single for host kubestage2002.codfw.wmnet
[17:49:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:50:05] <wikibugs>	 (PS4) Jbond: varnish: Rate limit hotlinking [puppet] - https://gerrit.wikimedia.org/r/768723
[17:50:54] <wikibugs>	 (PS2) Jbond: P:varnish::common: Add support for passing wikimedia_domains [puppet] - https://gerrit.wikimedia.org/r/768766
[17:51:28] <wikibugs>	 (PS10) Jbond: C:varnish::common: Add documentation [puppet] - https://gerrit.wikimedia.org/r/768739
[17:51:37] <wikibugs>	 (PS3) Jbond: P:cache::varnish::frontend: Update lookup keys [puppet] - https://gerrit.wikimedia.org/r/768762
[17:51:44] <wikibugs>	 (PS3) Jbond: P:varnish::common: Add support for passing wikimedia_domains [puppet] - https://gerrit.wikimedia.org/r/768766
[17:51:52] <wikibugs>	 (PS5) Jbond: varnish: Rate limit hotlinking [puppet] - https://gerrit.wikimedia.org/r/768723
[17:52:35] <wikibugs>	 (PS4) Jbond: P:varnish::common: Add support for passing wikimedia_domains [puppet] - https://gerrit.wikimedia.org/r/768766
[17:52:55] <wikibugs>	 (PS6) Jbond: varnish: Rate limit hotlinking [puppet] - https://gerrit.wikimedia.org/r/768723
[17:53:56] <wikibugs>	 (CR) Jbond: [V: +1] "PCC SUCCESS (DIFF 6): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/34114/console" [puppet] - https://gerrit.wikimedia.org/r/768766 (owner: Jbond)
[17:55:15] <logmsgbot>	 !log jayme@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubestage2002.codfw.wmnet
[17:55:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:56:11] <wikibugs>	 (PS2) Majavah: P:wmcs::prometheus: update pdns ports [puppet] - https://gerrit.wikimedia.org/r/768767 (https://phabricator.wikimedia.org/T281276)
[17:56:13] <wikibugs>	 (PS1) Majavah: P:prometheus::ops: fix powerdns-auth port [puppet] - https://gerrit.wikimedia.org/r/768770 (https://phabricator.wikimedia.org/T300254)
[17:58:05] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P22015 and previous config saved to /var/cache/conftool/dbconfig/20220307-175805-marostegui.json
[17:58:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:00:04] <jouncebot>	 ryankemper: Time to snap out of that daydream and deploy Wikidata Query Service weekly deploy. Get on with it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220307T1800).
[18:06:06] <wikibugs>	 (PS5) Jcrespo: Refactor check_mariadb_backups.py and add enough tests for it [software/wmfbackups] - https://gerrit.wikimedia.org/r/767844 (https://phabricator.wikimedia.org/T138562)
[18:07:34] <wikibugs>	 (CR) Jcrespo: [C: +2] Refactor check_mariadb_backups.py and add enough tests for it [software/wmfbackups] - https://gerrit.wikimedia.org/r/767844 (https://phabricator.wikimedia.org/T138562) (owner: Jcrespo)
[18:07:44] <wikibugs>	 (CR) Jcrespo: [C: +2] Use yaml safeloader to parse config files [software/wmfbackups] - https://gerrit.wikimedia.org/r/767716 (owner: Jcrespo)
[18:09:13] <wikibugs>	 (Merged) jenkins-bot: Use yaml safeloader to parse config files [software/wmfbackups] - https://gerrit.wikimedia.org/r/767716 (owner: Jcrespo)
[18:09:15] <wikibugs>	 (Merged) jenkins-bot: Refactor check_mariadb_backups.py and add enough tests for it [software/wmfbackups] - https://gerrit.wikimedia.org/r/767844 (https://phabricator.wikimedia.org/T138562) (owner: Jcrespo)
[18:13:10] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T300381)', diff saved to https://phabricator.wikimedia.org/P22016 and previous config saved to /var/cache/conftool/dbconfig/20220307-181310-marostegui.json
[18:13:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:13:14] <stashbot>	 T300381: Make page_props.pp_page unsigned on wmf wikis - https://phabricator.wikimedia.org/T300381
[18:13:22] <wikibugs>	 (Abandoned) Dduvall: Move Redis server definitions to services files [mediawiki-config] - https://gerrit.wikimedia.org/r/726660 (owner: Dduvall)
[18:13:51] <wikibugs>	 (CR) Andrew Bogott: [C: +2] P:prometheus::ops: fix powerdns-auth port [puppet] - https://gerrit.wikimedia.org/r/768770 (https://phabricator.wikimedia.org/T300254) (owner: Majavah)
[18:16:11] <wikibugs>	 (CR) JMeybohm: [C: +1] calico,cfssl-issuer,knative-serving: fix dependencies (1 comment) [deployment-charts] - https://gerrit.wikimedia.org/r/768681 (owner: Elukey)
[18:20:49] <wikibugs>	 (PS1) Clare Ming: Fix language alert regression [skins/Vector] (wmf/1.38.0-wmf.24) - https://gerrit.wikimedia.org/r/768786 (https://phabricator.wikimedia.org/T302018)
[18:31:02] <wikibugs>	 (PS1) Ottomata: Revert "Hive - set hive.warehouse.subdir.inherit.perms = false" [puppet] - https://gerrit.wikimedia.org/r/768787
[18:31:21] <wikibugs>	 (PS2) Ottomata: Revert "Hive - set hive.warehouse.subdir.inherit.perms = false" [puppet] - https://gerrit.wikimedia.org/r/768787 (https://phabricator.wikimedia.org/T291664)
[18:31:29] <wikibugs>	 (CR) Ottomata: [V: +2 C: +2] Revert "Hive - set hive.warehouse.subdir.inherit.perms = false" [puppet] - https://gerrit.wikimedia.org/r/768787 (https://phabricator.wikimedia.org/T291664) (owner: Ottomata)
[18:39:08] <wikibugs>	 (PS1) Dduvall: Revert "Revert "contint: Install docker 20.10 from thirdparty/ci on buster"" [puppet] - https://gerrit.wikimedia.org/r/768774
[18:39:47] <wikibugs>	 (CR) Jdlrobson: [C: +1] Fix language alert regression [skins/Vector] (wmf/1.38.0-wmf.24) - https://gerrit.wikimedia.org/r/768786 (https://phabricator.wikimedia.org/T302018) (owner: Clare Ming)
[18:40:36] <wikibugs>	 (PS2) Dduvall: Revert "Revert "contint: Install docker 20.10 from thirdparty/ci on buster"" [puppet] - https://gerrit.wikimedia.org/r/768774 (https://phabricator.wikimedia.org/T300682)
[18:44:00] <wikibugs>	 (CR) RLazarus: "On including $site: LGTM, thanks!" [grafana-grizzly] - https://gerrit.wikimedia.org/r/768108 (https://phabricator.wikimedia.org/T302842) (owner: Herron)
[18:55:30] <jinxer-wm>	 (JobUnavailable) firing: (4) Reduced availability for job cloud_dev_pdns in codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org
[19:02:34] <icinga-wm>	 PROBLEM - SSH on db2090.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[19:09:08] <icinga-wm>	 PROBLEM - SSH on thumbor2004.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[19:15:56] <wikibugs>	 (PS1) Majavah: prometheus: include number of changes on puppet run metrics [puppet] - https://gerrit.wikimedia.org/r/768776
[19:17:19] <wikibugs>	 (CR) jerkins-bot: [V: -1] prometheus: include number of changes on puppet run metrics [puppet] - https://gerrit.wikimedia.org/r/768776 (owner: Majavah)
[19:17:59] <wikibugs>	 (PS2) Majavah: prometheus: include number of changes on puppet run metrics [puppet] - https://gerrit.wikimedia.org/r/768776
[19:23:19] <wikibugs>	 (PS1) Ebernhardson: Prevent caching of auth redirect [puppet] - https://gerrit.wikimedia.org/r/768777 (https://phabricator.wikimedia.org/T301650)
[19:34:21] <wikibugs>	 (PS2) Ebernhardson: Prevent caching of auth redirect [puppet] - https://gerrit.wikimedia.org/r/768777 (https://phabricator.wikimedia.org/T301650)
[19:34:27] <wikibugs>	 (CR) Ebernhardson: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/768777 (https://phabricator.wikimedia.org/T301650) (owner: Ebernhardson)
[19:38:23] <wikibugs>	 (CR) Ebernhardson: "Tested with the docker-compose environment in mw-oauth-proxy, verified the header is emitted by nginx from the sub-request when a redirect" [puppet] - https://gerrit.wikimedia.org/r/768777 (https://phabricator.wikimedia.org/T301650) (owner: Ebernhardson)
[19:41:30] <icinga-wm>	 PROBLEM - SSH on mw2258.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[19:49:54] <logmsgbot>	 !log andrew@cumin1001 START - Cookbook sre.hosts.reimage for host cloudservices1003.wikimedia.org with OS bullseye
[19:49:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:52:20] <icinga-wm>	 PROBLEM - SSH on dumpsdata1002.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[19:52:22] <icinga-wm>	 PROBLEM - BGP status on cr2-esams is CRITICAL: BGP CRITICAL - AS64605/IPv6: Active - Anycast, AS64605/IPv4: Active - Anycast, AS64605/IPv4: Active - Anycast https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[20:10:40] <icinga-wm>	 RECOVERY - SSH on thumbor2004.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[20:11:28] <icinga-wm>	 PROBLEM - Host checker.tools.wmflabs.org is DOWN: check_ping: Invalid hostname/address - checker.tools.wmflabs.org
[20:12:34] <icinga-wm>	 RECOVERY - Host checker.tools.wmflabs.org is UP: PING OK - Packet loss = 0%, RTA = 0.59 ms
[20:13:43] <logmsgbot>	 !log andrew@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on cloudservices1003.wikimedia.org with reason: host reimage
[20:13:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:16:19] <logmsgbot>	 !log andrew@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudservices1003.wikimedia.org with reason: host reimage
[20:16:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:22:55] <wikibugs>	 SRE, Znuny, serviceops: investigate otrs database grants - https://phabricator.wikimedia.org/T303191 (Peachey88)
[20:23:04] <wikibugs>	 SRE, Znuny, serviceops: enhance otrs alerting - https://phabricator.wikimedia.org/T303190 (Peachey88)
[20:39:48] <logmsgbot>	 !log andrew@cumin1001 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudservices1003.wikimedia.org with OS bullseye
[20:39:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:42:43] <icinga-wm>	 RECOVERY - SSH on mw2258.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[20:51:13] <icinga-wm>	 PROBLEM - Check for VMs leaked by the nova-fullstack test on cloudcontrol1003 is CRITICAL: 10 instances in the admin-monitoring project https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Check_for_VMs_leaked_by_the_nova-fullstack_test
[20:54:01] <icinga-wm>	 RECOVERY - SSH on dumpsdata1002.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[20:56:39] <icinga-wm>	 RECOVERY - Check for VMs leaked by the nova-fullstack test on cloudcontrol1003 is OK: 1 instances in the admin-monitoring project https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Check_for_VMs_leaked_by_the_nova-fullstack_test
[21:00:07] <jouncebot>	 RoanKattouw and Urbanecm: It is that lovely time of the day again! You are hereby commanded to deploy UTC late backport window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220307T2100).
[21:00:07] <jouncebot>	 nray: A patch you scheduled for UTC late backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[21:00:30] <nray>	 o/ here
[21:06:48] <nray>	 is anyone available to deploy for the backport window rn?
[21:11:42] <wikibugs>	 SRE, Znuny, serviceops: enhance Znuny (otrs) alerting - https://phabricator.wikimedia.org/T303190 (Aklapper)
[21:11:58] <wikibugs>	 SRE, Znuny, serviceops: enhance Znuny (otrs) alerting - https://phabricator.wikimedia.org/T303190 (Aklapper) Hi @Arnoldokoth, the lack of a task description makes it hard for others to help or contribute, for a triager/tester to figure out at some point in the future whether this is still a valid tas...
[21:15:51] <urbanecm>	 nray: i am, sorry for being late
[21:15:55] <urbanecm>	 are you still around?
[21:16:50] <nray>	 yes I'm here!
[21:17:04] <urbanecm>	 let's start then
[21:17:13] <nray>	 sweet, thank you!
[21:17:15] <wikibugs>	 (CR) Urbanecm: [C: +2] Fix language alert regression [skins/Vector] (wmf/1.38.0-wmf.24) - https://gerrit.wikimedia.org/r/768786 (https://phabricator.wikimedia.org/T302018) (owner: Clare Ming)
[21:32:37] <wikibugs>	 (Merged) jenkins-bot: Fix language alert regression [skins/Vector] (wmf/1.38.0-wmf.24) - https://gerrit.wikimedia.org/r/768786 (https://phabricator.wikimedia.org/T302018) (owner: Clare Ming)
[21:33:16] <urbanecm>	 nray: should be pulled to mwdebug1001. Can you have a look?
[21:33:25] <nray>	 yes, thank you
[21:33:55] <urbanecm>	 let me know how it looks :)
[21:34:05] <nray>	 will do
[21:35:25] <nray>	 things look good urbanecm , you can proceed!
[21:35:30] <urbanecm>	 syncing!
[21:35:50] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[21:35:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:36:49] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[21:36:50] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[21:36:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:36:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:37:36] <logmsgbot>	 !log urbanecm@deploy1002 Synchronized php-1.38.0-wmf.24/skins/Vector/includes/SkinVector.php: eac551c: Fix language alert regression (T302018) (duration: 00m 50s)
[21:37:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:37:40] <stashbot>	 T302018: [Regression] Language in sidebar should not show on pages without languages - https://phabricator.wikimedia.org/T302018
[21:37:43] <urbanecm>	 nray: and should be live!
[21:37:44] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[21:37:45] <urbanecm>	 anything else?
[21:37:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:37:57] <nray>	 urbanecm: that's all, thanks so much for your help!
[21:38:05] <urbanecm>	 happy to help
[21:38:11] <urbanecm>	 !log UTC late B&C window done
[21:38:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:49:20] <logmsgbot>	 !log andrew@cumin1001 START - Cookbook sre.hosts.reimage for host cloudservices1003.wikimedia.org with OS bullseye
[21:49:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:00:05] <jouncebot>	 Reedy and sbassett: #bothumor When your hammer is PHP, everything starts looking like a thumb. Rise for Weekly Security deployment window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220307T2200).
[22:18:11] <logmsgbot>	 !log andrew@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on cloudservices1003.wikimedia.org with reason: host reimage
[22:18:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:19:42] <wikibugs>	 (PS1) Razzi: elasticsearch: move cluster configuration to puppet [puppet] - https://gerrit.wikimedia.org/r/768816 (https://phabricator.wikimedia.org/T278378)
[22:20:26] <wikibugs>	 (PS2) Razzi: elasticsearch: move cluster configuration to puppet [puppet] - https://gerrit.wikimedia.org/r/768816 (https://phabricator.wikimedia.org/T278378)
[22:20:38] <logmsgbot>	 !log andrew@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudservices1003.wikimedia.org with reason: host reimage
[22:20:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:21:20] <logmsgbot>	 !log andrew@cumin1001 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudservices1003.wikimedia.org with OS bullseye
[22:21:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:21:28] <wikibugs>	 (PS1) Ebernhardson: icinga: Move cirrus check into cirrus_cluster_checks [puppet] - https://gerrit.wikimedia.org/r/768818
[22:21:44] <wikibugs>	 (CR) Razzi: [V: +1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/34115/console" [puppet] - https://gerrit.wikimedia.org/r/768816 (https://phabricator.wikimedia.org/T278378) (owner: Razzi)
[22:23:04] <wikibugs>	 (CR) jerkins-bot: [V: -1] icinga: Move cirrus check into cirrus_cluster_checks [puppet] - https://gerrit.wikimedia.org/r/768818 (owner: Ebernhardson)
[22:23:30] <wikibugs>	 (PS3) Ryan Kemper: elasticsearch: move cluster configuration to puppet [puppet] - https://gerrit.wikimedia.org/r/768816 (https://phabricator.wikimedia.org/T278378) (owner: Razzi)
[22:23:32] <wikibugs>	 (CR) Ebernhardson: "I'm a bit indecisive on what is appropriate here. I don't see any obvious reason this check should be in either file, and I'm left wonderi" [puppet] - https://gerrit.wikimedia.org/r/768818 (owner: Ebernhardson)
[22:25:31] <logmsgbot>	 !log andrew@cumin1001 START - Cookbook sre.hosts.reimage for host cloudservices1003.wikimedia.org with OS bullseye
[22:25:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:26:28] <logmsgbot>	 !log andrew@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on cloudservices1003.wikimedia.org with reason: host reimage
[22:26:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:27:39] <wikibugs>	 (CR) Ryan Kemper: "recheck" [software/spicerack] - https://gerrit.wikimedia.org/r/716532 (https://phabricator.wikimedia.org/T278378) (owner: Ryan Kemper)
[22:28:55] <logmsgbot>	 !log andrew@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudservices1003.wikimedia.org with reason: host reimage
[22:28:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:33:30] <wikibugs>	 (PS10) Razzi: elasticsearch: load config from yaml [software/spicerack] - https://gerrit.wikimedia.org/r/716532 (https://phabricator.wikimedia.org/T278378) (owner: Ryan Kemper)
[22:33:39] <wikibugs>	 (CR) jerkins-bot: [V: -1] elasticsearch: load config from yaml [software/spicerack] - https://gerrit.wikimedia.org/r/716532 (https://phabricator.wikimedia.org/T278378) (owner: Ryan Kemper)
[22:33:46] <wikibugs>	 (PS2) Ebernhardson: icinga: Move cirrus check into cirrus_cluster_checks [puppet] - https://gerrit.wikimedia.org/r/768818
[22:35:15] <wikibugs>	 (CR) Ebernhardson: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/768818 (owner: Ebernhardson)
[22:36:56] <wikibugs>	 (PS11) Razzi: elasticsearch: load config from yaml [software/spicerack] - https://gerrit.wikimedia.org/r/716532 (https://phabricator.wikimedia.org/T278378) (owner: Ryan Kemper)
[22:37:49] <logmsgbot>	 !log andrew@cumin1001 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudservices1003.wikimedia.org with OS bullseye
[22:37:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:42:43] <wikibugs>	 SRE, ops-eqiad: analytics10[63,67] mgmt interfaces seem flapping from time to time - https://phabricator.wikimedia.org/T303151 (wiki_willy) a:Cmjohnson
[22:42:57] <wikibugs>	 (CR) jerkins-bot: [V: -1] elasticsearch: load config from yaml [software/spicerack] - https://gerrit.wikimedia.org/r/716532 (https://phabricator.wikimedia.org/T278378) (owner: Ryan Kemper)
[22:46:22] <wikibugs>	 (PS12) Ryan Kemper: elasticsearch: load config from yaml [software/spicerack] - https://gerrit.wikimedia.org/r/716532 (https://phabricator.wikimedia.org/T278378)
[22:49:18] <wikibugs>	 (PS13) Ryan Kemper: elasticsearch: load config from yaml [software/spicerack] - https://gerrit.wikimedia.org/r/716532 (https://phabricator.wikimedia.org/T278378)
[22:51:21] <wikibugs>	 (PS14) Ryan Kemper: elasticsearch: load config from yaml [software/spicerack] - https://gerrit.wikimedia.org/r/716532 (https://phabricator.wikimedia.org/T278378)
[22:52:19] <wikibugs>	 (PS15) Ryan Kemper: elasticsearch: load config from yaml [software/spicerack] - https://gerrit.wikimedia.org/r/716532 (https://phabricator.wikimedia.org/T278378)
[22:53:47] <wikibugs>	 SRE, Traffic, envoy, serviceops: Refactor envoy HTTP protocol options to new version - https://phabricator.wikimedia.org/T303230 (RLazarus)
[22:55:31] <wikibugs>	 SRE, Traffic, envoy, serviceops, Patch-For-Review: Upgrade Envoy to supported version - https://phabricator.wikimedia.org/T300324 (RLazarus)
[22:55:39] <wikibugs>	 SRE, Traffic, envoy, serviceops: Refactor envoy HTTP protocol options to new version - https://phabricator.wikimedia.org/T303230 (RLazarus) Open→Stalled p:Triage→Low
[22:55:56] <jinxer-wm>	 (JobUnavailable) firing: (3) Reduced availability for job gitlab in eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org
[22:59:31] <wikibugs>	 (CR) jerkins-bot: [V: -1] elasticsearch: load config from yaml [software/spicerack] - https://gerrit.wikimedia.org/r/716532 (https://phabricator.wikimedia.org/T278378) (owner: Ryan Kemper)
[23:02:32] <wikibugs>	 SRE, Traffic, envoy, serviceops: Refactor envoy access_log_path to access loggers - https://phabricator.wikimedia.org/T303231 (RLazarus)
[23:05:05] <wikibugs>	 SRE, Traffic, envoy, serviceops: Refactor envoy access_log_path to access loggers - https://phabricator.wikimedia.org/T303231 (RLazarus) p:Triage→Medium
[23:38:19] <wikibugs>	 (CR) Cwhite: Added config for the datahubsearch LVS service (1 comment) [puppet] - https://gerrit.wikimedia.org/r/768668 (owner: Btullis)
[23:40:31] <logmsgbot>	 !log jhathaway@cumin1001 START - Cookbook sre.hosts.downtime for 0:15:00 on mirror1001.wikimedia.org with reason: reboot
[23:40:32] <logmsgbot>	 !log jhathaway@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on mirror1001.wikimedia.org with reason: reboot
[23:40:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:40:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:44:00] <wikibugs>	 (PS1) Andrew Bogott: Add files and templates for OpenStack Wallaby [puppet] - https://gerrit.wikimedia.org/r/768829 (https://phabricator.wikimedia.org/T281275)
[23:44:02] <wikibugs>	 (PS1) Andrew Bogott: OpenStack: add manifests for openstack wallaby [puppet] - https://gerrit.wikimedia.org/r/768830 (https://phabricator.wikimedia.org/T281275)
[23:44:39] <wikibugs>	 (CR) jerkins-bot: [V: -1] Add files and templates for OpenStack Wallaby [puppet] - https://gerrit.wikimedia.org/r/768829 (https://phabricator.wikimedia.org/T281275) (owner: Andrew Bogott)
[23:45:01] <wikibugs>	 (CR) jerkins-bot: [V: -1] OpenStack: add manifests for openstack wallaby [puppet] - https://gerrit.wikimedia.org/r/768830 (https://phabricator.wikimedia.org/T281275) (owner: Andrew Bogott)
[23:49:45] <logmsgbot>	 !log jhathaway@cumin1001 START - Cookbook sre.hosts.downtime for 0:15:00 on mx1001.wikimedia.org with reason: reboot
[23:49:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:49:46] <logmsgbot>	 !log jhathaway@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on mx1001.wikimedia.org with reason: reboot
[23:49:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:50:01] <logmsgbot>	 !log jhathaway@cumin1001 START - Cookbook sre.hosts.downtime for 0:15:00 on mx2001.wikimedia.org with reason: reboot
[23:50:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:50:03] <logmsgbot>	 !log jhathaway@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on mx2001.wikimedia.org with reason: reboot
[23:50:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:54:46] <wikibugs>	 (PS2) Andrew Bogott: Add files and templates for OpenStack Wallaby [puppet] - https://gerrit.wikimedia.org/r/768829 (https://phabricator.wikimedia.org/T281275)
[23:54:48] <wikibugs>	 (PS2) Andrew Bogott: OpenStack: add manifests for openstack wallaby [puppet] - https://gerrit.wikimedia.org/r/768830 (https://phabricator.wikimedia.org/T281275)
[23:55:59] <wikibugs>	 (CR) jerkins-bot: [V: -1] OpenStack: add manifests for openstack wallaby [puppet] - https://gerrit.wikimedia.org/r/768830 (https://phabricator.wikimedia.org/T281275) (owner: Andrew Bogott)
[23:57:43] <wikibugs>	 (CR) jerkins-bot: [V: -1] Add files and templates for OpenStack Wallaby [puppet] - https://gerrit.wikimedia.org/r/768829 (https://phabricator.wikimedia.org/T281275) (owner: Andrew Bogott)
[23:59:11] <icinga-wm>	 PROBLEM - SSH on dumpsdata1002.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook