[00:00:14] <wikibugs>	 (PS1) Dzahn: kubernetes::deployment_server: add new service image-suggestion [puppet] - https://gerrit.wikimedia.org/r/784791 (https://phabricator.wikimedia.org/T251305)
[00:04:51] <wikibugs>	 (PS2) Dzahn: kubernetes::deployment_server: add new service image-suggestion [puppet] - https://gerrit.wikimedia.org/r/784791 (https://phabricator.wikimedia.org/T304891)
[00:07:01] <icinga-wm>	 PROBLEM - Check systemd state on alert1001 is CRITICAL: CRITICAL - degraded: The following units failed: certspotter.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:07:25] <wikibugs>	 (CR) Dzahn: [C: +2] "https://puppet-compiler.wmflabs.org/pcc-worker1002/34931/deploy1002.eqiad.wmnet/index.html" [puppet] - https://gerrit.wikimedia.org/r/784791 (https://phabricator.wikimedia.org/T304891) (owner: Dzahn)
[00:09:01] <mutante>	 !log alert1001 - sudo systemctl start certspotter (after an alert from Icinga itself that it failed. error was some temp error fetching data from comodo)
[00:09:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:09:15] <icinga-wm>	 RECOVERY - Check systemd state on alert1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:09:21] <mutante>	 ^ fixed
[00:10:55] <wikibugs>	 (CR) Ssingh: P:wikidough: add a check to ensure service has been restarted (1 comment) [puppet] - https://gerrit.wikimedia.org/r/784697 (owner: Ssingh)
[00:12:13] <wikibugs>	 SRE, Generated Data Platform, Image-Suggestions, serviceops, and 3 others: New Service Request Generated Datasets: Image Suggestions Service - https://phabricator.wikimedia.org/T304891 (Dzahn) >>! In T304891#7823942, @JMeybohm wrote: > We still have those in labs/private `hieradata/common/profile...
[00:12:53] <wikibugs>	 SRE, Generated Data Platform, Image-Suggestions, serviceops, and 3 others: New Service Request Generated Datasets: Image Suggestions Service - https://phabricator.wikimedia.org/T304891 (Dzahn) >>! In T304891#7823946, @Joe wrote: > * The deployment will be called image-suggestion and use the image...
[00:21:01] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1175.eqiad.wmnet with reason: Maintenance
[00:21:03] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1175.eqiad.wmnet with reason: Maintenance
[00:21:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:21:08] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1175 (T298565)', diff saved to https://phabricator.wikimedia.org/P25834 and previous config saved to /var/cache/conftool/dbconfig/20220421-002107-ladsgroup.json
[00:21:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:21:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:21:14] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[00:24:28] <wikibugs>	 (PS1) Dzahn: kubernetes: add dummy tokens for image-suggestion service [labs/private] - https://gerrit.wikimedia.org/r/784794 (https://phabricator.wikimedia.org/T304891)
[00:24:49] <icinga-wm>	 PROBLEM - Check systemd state on alert1001 is CRITICAL: CRITICAL - degraded: The following units failed: certspotter.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:29:15] <icinga-wm>	 RECOVERY - Check systemd state on alert1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:29:25] <wikibugs>	 (PS2) Dzahn: kubernetes: add dummy tokens for image-suggestion service [labs/private] - https://gerrit.wikimedia.org/r/784794 (https://phabricator.wikimedia.org/T304891)
[00:30:33] <mutante>	 !log alert1001 - sudo systemctl start certspotter - another time, not on our end but should probably fail more gracefully
[00:30:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:31:04] <wikibugs>	 (CR) Dzahn: [V: +2 C: +2] kubernetes: add dummy tokens for image-suggestion service [labs/private] - https://gerrit.wikimedia.org/r/784794 (https://phabricator.wikimedia.org/T304891) (owner: Dzahn)
[00:37:21] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1175 (T298565)', diff saved to https://phabricator.wikimedia.org/P25835 and previous config saved to /var/cache/conftool/dbconfig/20220421-003720-ladsgroup.json
[00:37:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:37:26] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[00:48:46] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1161 (T298565)', diff saved to https://phabricator.wikimedia.org/P25836 and previous config saved to /var/cache/conftool/dbconfig/20220421-004846-ladsgroup.json
[00:48:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:48:52] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[00:52:26] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P25837 and previous config saved to /var/cache/conftool/dbconfig/20220421-005225-ladsgroup.json
[00:52:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:02:19] <icinga-wm>	 PROBLEM - mailman archives on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[01:02:44] <jinxer-wm>	 (KubernetesRsyslogDown) firing: (4) rsyslog on ml-staging-ctrl2001:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[01:03:15] <icinga-wm>	 PROBLEM - mailman list info on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[01:03:51] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P25838 and previous config saved to /var/cache/conftool/dbconfig/20220421-010351-ladsgroup.json
[01:03:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:04:23] <icinga-wm>	 RECOVERY - mailman archives on lists1001 is OK: HTTP OK: HTTP/1.1 200 OK - 47965 bytes in 0.055 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[01:05:21] <icinga-wm>	 RECOVERY - mailman list info on lists1001 is OK: HTTP OK: HTTP/1.1 200 OK - 8571 bytes in 0.271 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[01:07:31] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P25839 and previous config saved to /var/cache/conftool/dbconfig/20220421-010730-ladsgroup.json
[01:07:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:18:56] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P25840 and previous config saved to /var/cache/conftool/dbconfig/20220421-011856-ladsgroup.json
[01:19:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:22:36] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1175 (T298565)', diff saved to https://phabricator.wikimedia.org/P25841 and previous config saved to /var/cache/conftool/dbconfig/20220421-012235-ladsgroup.json
[01:22:37] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1145.eqiad.wmnet with reason: Maintenance
[01:22:39] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1145.eqiad.wmnet with reason: Maintenance
[01:22:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:22:41] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[01:22:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:22:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:34:02] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1161 (T298565)', diff saved to https://phabricator.wikimedia.org/P25842 and previous config saved to /var/cache/conftool/dbconfig/20220421-013401-ladsgroup.json
[01:34:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:34:07] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[01:34:50] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1110.eqiad.wmnet with reason: Maintenance
[01:34:51] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1110.eqiad.wmnet with reason: Maintenance
[01:34:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:34:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:34:56] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1110 (T298565)', diff saved to https://phabricator.wikimedia.org/P25843 and previous config saved to /var/cache/conftool/dbconfig/20220421-013456-ladsgroup.json
[01:35:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:37:31] <icinga-wm>	 PROBLEM - WDQS SPARQL on wdqs1007 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Runbook
[01:40:45] <jinxer-wm>	 (JobUnavailable) firing: (3) Reduced availability for job gitlab in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[01:41:17] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1110 (T298565)', diff saved to https://phabricator.wikimedia.org/P25844 and previous config saved to /var/cache/conftool/dbconfig/20220421-014116-ladsgroup.json
[01:41:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:41:22] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[01:45:45] <jinxer-wm>	 (JobUnavailable) firing: (3) Reduced availability for job gitlab in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[01:47:55] <jinxer-wm>	 (NodeTextfileStale) firing: Stale textfile for cloudcontrol2001-dev:9100 - https://wikitech.wikimedia.org/wiki/Prometheus#Stale_file_for_node-exporter_textfile - https://grafana.wikimedia.org/d/knkl4dCWz/node-exporter-textfile - https://alerts.wikimedia.org/?q=alertname%3DNodeTextfileStale
[01:56:22] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P25845 and previous config saved to /var/cache/conftool/dbconfig/20220421-015621-ladsgroup.json
[01:56:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:02:51] <wikibugs>	 (PS5) RLazarus: varnish: Rename public_clouds.json to ipblock_cloud.json [puppet] - https://gerrit.wikimedia.org/r/784761 (https://phabricator.wikimedia.org/T305581)
[02:02:53] <wikibugs>	 (PS6) RLazarus: varnish: Allow using netmapper with multiple requestctl ipblock types [puppet] - https://gerrit.wikimedia.org/r/784774 (https://phabricator.wikimedia.org/T305581)
[02:02:55] <wikibugs>	 (PS1) RLazarus: cache: Support multiple requestctl ipblock types in netmapper confd template [puppet] - https://gerrit.wikimedia.org/r/784798 (https://phabricator.wikimedia.org/T305581)
[02:04:07] <wikibugs>	 (CR) jerkins-bot: [V: -1] cache: Support multiple requestctl ipblock types in netmapper confd template [puppet] - https://gerrit.wikimedia.org/r/784798 (https://phabricator.wikimedia.org/T305581) (owner: RLazarus)
[02:07:21] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1179.eqiad.wmnet with reason: Maintenance
[02:07:23] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1179.eqiad.wmnet with reason: Maintenance
[02:07:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:07:28] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1179 (T298565)', diff saved to https://phabricator.wikimedia.org/P25846 and previous config saved to /var/cache/conftool/dbconfig/20220421-020727-ladsgroup.json
[02:07:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:07:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:07:33] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[02:11:27] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P25847 and previous config saved to /var/cache/conftool/dbconfig/20220421-021126-ladsgroup.json
[02:11:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:14:35] <wikibugs>	 (PS2) RLazarus: cache: Support multiple requestctl ipblock types in netmapper confd template [puppet] - https://gerrit.wikimedia.org/r/784798 (https://phabricator.wikimedia.org/T305581)
[02:14:37] <wikibugs>	 (PS7) RLazarus: varnish: Allow using netmapper with multiple requestctl ipblock types [puppet] - https://gerrit.wikimedia.org/r/784774 (https://phabricator.wikimedia.org/T305581)
[02:15:44] <wikibugs>	 (CR) RLazarus: [V: +1] "PCC SUCCESS (DIFF 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/34932/console" [puppet] - https://gerrit.wikimedia.org/r/784798 (https://phabricator.wikimedia.org/T305581) (owner: RLazarus)
[02:19:05] <wikibugs>	 (CR) RLazarus: [V: +1] "PCC SUCCESS (DIFF 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/34933/console" [puppet] - https://gerrit.wikimedia.org/r/784774 (https://phabricator.wikimedia.org/T305581) (owner: RLazarus)
[02:26:32] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1110 (T298565)', diff saved to https://phabricator.wikimedia.org/P25848 and previous config saved to /var/cache/conftool/dbconfig/20220421-022631-ladsgroup.json
[02:26:33] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
[02:26:35] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
[02:26:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:26:38] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[02:26:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:26:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:30:51] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db2123.codfw.wmnet with reason: Maintenance
[02:30:53] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2123.codfw.wmnet with reason: Maintenance
[02:30:54] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 12:00:00 on 8 hosts with reason: Maintenance
[02:30:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:30:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:31:00] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 8 hosts with reason: Maintenance
[02:31:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:31:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:32:46] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
[02:32:48] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
[02:32:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:32:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:32:55] <jinxer-wm>	 (NodeTextfileStale) firing: (3) Stale textfile for elastic1075:9100 - https://wikitech.wikimedia.org/wiki/Prometheus#Stale_file_for_node-exporter_textfile - https://grafana.wikimedia.org/d/knkl4dCWz/node-exporter-textfile - https://alerts.wikimedia.org/?q=alertname%3DNodeTextfileStale
[02:37:05] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1130.eqiad.wmnet with reason: Maintenance
[02:37:06] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1130.eqiad.wmnet with reason: Maintenance
[02:37:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:37:11] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1130 (T298565)', diff saved to https://phabricator.wikimedia.org/P25849 and previous config saved to /var/cache/conftool/dbconfig/20220421-023710-ladsgroup.json
[02:37:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:37:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:37:16] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[02:39:42] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1130 (T298565)', diff saved to https://phabricator.wikimedia.org/P25850 and previous config saved to /var/cache/conftool/dbconfig/20220421-023942-ladsgroup.json
[02:39:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:49:59] <wikibugs>	 (PS1) Andrew Bogott: Make cloudservices2005-dev the new ns1.openstack.codfw1dev.wikimediacloud.org [dns] - https://gerrit.wikimedia.org/r/784800 (https://phabricator.wikimedia.org/T304881)
[02:51:51] <wikibugs>	 (CR) Andrew Bogott: [C: +2] Make cloudservices2005-dev the new ns1.openstack.codfw1dev.wikimediacloud.org [dns] - https://gerrit.wikimedia.org/r/784800 (https://phabricator.wikimedia.org/T304881) (owner: Andrew Bogott)
[02:58:49] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1179 (T298565)', diff saved to https://phabricator.wikimedia.org/P25851 and previous config saved to /var/cache/conftool/dbconfig/20220421-025849-ladsgroup.json
[02:58:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:58:55] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[02:59:25] <icinga-wm>	 RECOVERY - SSH on aqs1008.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[03:01:55] <jinxer-wm>	 (NodeTextfileStale) firing: Stale textfile for ms-be2067:9100 - https://wikitech.wikimedia.org/wiki/Prometheus#Stale_file_for_node-exporter_textfile - https://grafana.wikimedia.org/d/knkl4dCWz/node-exporter-textfile - https://alerts.wikimedia.org/?q=alertname%3DNodeTextfileStale
[03:07:38] <wikibugs>	 (PS8) Andrew Bogott: Make new cloudweb2002-dev node into a cloudweb node [puppet] - https://gerrit.wikimedia.org/r/784738 (https://phabricator.wikimedia.org/T304881)
[03:07:40] <wikibugs>	 (PS1) Andrew Bogott: acme_chief: add ldap certs for cloudservices200[4,5]-dev [puppet] - https://gerrit.wikimedia.org/r/784802
[03:09:02] <wikibugs>	 (CR) Andrew Bogott: [C: +2] acme_chief: add ldap certs for cloudservices200[4,5]-dev [puppet] - https://gerrit.wikimedia.org/r/784802 (owner: Andrew Bogott)
[03:11:03] <wikibugs>	 (PS1) Andrew Bogott: Make cloudservices2004-dev the new ns0.openstack.codfw1dev.wikimediacloud.org [dns] - https://gerrit.wikimedia.org/r/784804 (https://phabricator.wikimedia.org/T304881)
[03:13:54] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P25852 and previous config saved to /var/cache/conftool/dbconfig/20220421-031354-ladsgroup.json
[03:13:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:24:57] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1144.eqiad.wmnet with reason: Maintenance
[03:24:59] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1144.eqiad.wmnet with reason: Maintenance
[03:25:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:25:04] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1144:3315 (T298565)', diff saved to https://phabricator.wikimedia.org/P25853 and previous config saved to /var/cache/conftool/dbconfig/20220421-032503-ladsgroup.json
[03:25:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:25:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:25:10] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[03:25:45] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1158.eqiad.wmnet with reason: Maintenance
[03:25:47] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1158.eqiad.wmnet with reason: Maintenance
[03:25:48] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
[03:25:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:25:51] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
[03:25:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:25:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:25:57] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1158 (T298563)', diff saved to https://phabricator.wikimedia.org/P25854 and previous config saved to /var/cache/conftool/dbconfig/20220421-032556-ladsgroup.json
[03:25:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:26:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:26:04] <stashbot>	 T298563: Fix mismatching field type of column text.old_flags on wmf wikis - https://phabricator.wikimedia.org/T298563
[03:27:54] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P25855 and previous config saved to /var/cache/conftool/dbconfig/20220421-032753-ladsgroup.json
[03:27:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:28:40] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1158.eqiad.wmnet with reason: Maintenance
[03:28:42] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1158.eqiad.wmnet with reason: Maintenance
[03:28:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:28:44] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
[03:28:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:28:47] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
[03:28:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:28:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:28:59] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P25856 and previous config saved to /var/cache/conftool/dbconfig/20220421-032859-ladsgroup.json
[03:29:00] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance
[03:29:01] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance
[03:29:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:29:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:29:06] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1174 (T306560)', diff saved to https://phabricator.wikimedia.org/P25857 and previous config saved to /var/cache/conftool/dbconfig/20220421-032906-ladsgroup.json
[03:29:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:29:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:29:14] <stashbot>	 T306560: Fix nullability of img_major_mime and oi_major_mime - https://phabricator.wikimedia.org/T306560
[03:29:58] <wikibugs>	 (PS3) Ladsgroup: Add fix_img_major_mime_null_T306560.py [software/schema-changes] - https://gerrit.wikimedia.org/r/784762 (https://phabricator.wikimedia.org/T306560)
[03:31:55] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T298565)', diff saved to https://phabricator.wikimedia.org/P25858 and previous config saved to /var/cache/conftool/dbconfig/20220421-033154-ladsgroup.json
[03:31:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:32:00] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[03:39:10] <icinga-wm>	 PROBLEM - BGP status on cr2-eqiad is CRITICAL: BGP CRITICAL - AS64605/IPv6: Active - Anycast https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[03:40:02] <wikibugs>	 (PS9) Andrew Bogott: Make new cloudweb2002-dev node into a cloudweb node [puppet] - https://gerrit.wikimedia.org/r/784738 (https://phabricator.wikimedia.org/T304881)
[03:40:04] <wikibugs>	 (PS1) Andrew Bogott: Add new designate hosts to profile::openstack::codfw1dev::designate_hosts [puppet] - https://gerrit.wikimedia.org/r/784806
[03:40:22] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1174 (T306560)', diff saved to https://phabricator.wikimedia.org/P25859 and previous config saved to /var/cache/conftool/dbconfig/20220421-034021-ladsgroup.json
[03:40:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:40:28] <stashbot>	 T306560: Fix nullability of img_major_mime and oi_major_mime - https://phabricator.wikimedia.org/T306560
[03:40:55] <wikibugs>	 (CR) Andrew Bogott: [C: +2] Add new designate hosts to profile::openstack::codfw1dev::designate_hosts [puppet] - https://gerrit.wikimedia.org/r/784806 (owner: Andrew Bogott)
[03:44:04] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1179 (T298565)', diff saved to https://phabricator.wikimedia.org/P25860 and previous config saved to /var/cache/conftool/dbconfig/20220421-034404-ladsgroup.json
[03:44:07] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db2105.codfw.wmnet with reason: Maintenance
[03:44:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:44:09] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2105.codfw.wmnet with reason: Maintenance
[03:44:10] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[03:44:10] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 12:00:00 on db[2074,2094,2109,2127,2149].codfw.wmnet with reason: Maintenance
[03:44:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:44:14] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db[2074,2094,2109,2127,2149].codfw.wmnet with reason: Maintenance
[03:44:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:44:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:44:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:47:00] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P25861 and previous config saved to /var/cache/conftool/dbconfig/20220421-034659-ladsgroup.json
[03:47:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:55:27] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P25862 and previous config saved to /var/cache/conftool/dbconfig/20220421-035526-ladsgroup.json
[03:55:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:57:08] <wikibugs>	 (CR) Andrew Bogott: [C: +2] Make cloudservices2004-dev the new ns0.openstack.codfw1dev.wikimediacloud.org [dns] - https://gerrit.wikimedia.org/r/784804 (https://phabricator.wikimedia.org/T304881) (owner: Andrew Bogott)
[03:57:28] <wikibugs>	 (CR) Andrew Bogott: [C: +2] Make new cloudweb2002-dev node into a cloudweb node [puppet] - https://gerrit.wikimedia.org/r/784738 (https://phabricator.wikimedia.org/T304881) (owner: Andrew Bogott)
[03:59:13] <jinxer-wm>	 (KubernetesRsyslogDown) firing: (2) rsyslog on kubestagemaster1001:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[04:02:05] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P25863 and previous config saved to /var/cache/conftool/dbconfig/20220421-040204-ladsgroup.json
[04:02:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:05:07] <wikibugs>	 (PS1) Andrew Bogott: misc hiera changes to add cloudweb2002-dev [puppet] - https://gerrit.wikimedia.org/r/784807 (https://phabricator.wikimedia.org/T304881)
[04:08:32] <icinga-wm>	 PROBLEM - Swift https backend on ms-fe1012 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Swift
[04:09:10] <icinga-wm>	 PROBLEM - SSH on ms-fe1012 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/SSH/monitoring
[04:09:32] <icinga-wm>	 PROBLEM - Swift https frontend on ms-fe1012 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Swift
[04:10:32] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P25864 and previous config saved to /var/cache/conftool/dbconfig/20220421-041032-ladsgroup.json
[04:10:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:10:40] <wikibugs>	 (CR) Andrew Bogott: [C: +2] misc hiera changes to add cloudweb2002-dev [puppet] - https://gerrit.wikimedia.org/r/784807 (https://phabricator.wikimedia.org/T304881) (owner: Andrew Bogott)
[04:11:04] <icinga-wm>	 RECOVERY - SSH on ms-fe1012 is OK: SSH OK - OpenSSH_8.4p1 Debian-5 (protocol 2.0) https://wikitech.wikimedia.org/wiki/SSH/monitoring
[04:11:11] <wikibugs>	 (PS2) Andrew Bogott: misc hiera changes to add cloudweb2002-dev [puppet] - https://gerrit.wikimedia.org/r/784807 (https://phabricator.wikimedia.org/T304881)
[04:11:28] <icinga-wm>	 RECOVERY - Swift https frontend on ms-fe1012 is OK: HTTP OK: HTTP/1.1 200 OK - 245 bytes in 0.011 second response time https://wikitech.wikimedia.org/wiki/Swift
[04:12:09] <wikibugs>	 (CR) Andrew Bogott: [C: +2] misc hiera changes to add cloudweb2002-dev [puppet] - https://gerrit.wikimedia.org/r/784807 (https://phabricator.wikimedia.org/T304881) (owner: Andrew Bogott)
[04:12:34] <icinga-wm>	 RECOVERY - Swift https backend on ms-fe1012 is OK: HTTP OK: HTTP/1.1 200 OK - 451 bytes in 0.022 second response time https://wikitech.wikimedia.org/wiki/Swift
[04:17:10] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T298565)', diff saved to https://phabricator.wikimedia.org/P25865 and previous config saved to /var/cache/conftool/dbconfig/20220421-041710-ladsgroup.json
[04:17:11] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1096.eqiad.wmnet with reason: Maintenance
[04:17:13] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1096.eqiad.wmnet with reason: Maintenance
[04:17:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:17:15] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[04:17:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:17:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:21:36] <icinga-wm>	 PROBLEM - Check systemd state on cloudweb2002-dev is CRITICAL: CRITICAL - degraded: The following units failed: mcrouter.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[04:21:36] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance
[04:21:38] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance
[04:21:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:21:43] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1113:3315 (T298565)', diff saved to https://phabricator.wikimedia.org/P25866 and previous config saved to /var/cache/conftool/dbconfig/20220421-042142-ladsgroup.json
[04:21:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:21:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:25:37] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1174 (T306560)', diff saved to https://phabricator.wikimedia.org/P25867 and previous config saved to /var/cache/conftool/dbconfig/20220421-042537-ladsgroup.json
[04:25:39] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1127.eqiad.wmnet with reason: Maintenance
[04:25:40] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1127.eqiad.wmnet with reason: Maintenance
[04:25:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:25:42] <stashbot>	 T306560: Fix nullability of img_major_mime and oi_major_mime - https://phabricator.wikimedia.org/T306560
[04:25:45] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1127 (T306560)', diff saved to https://phabricator.wikimedia.org/P25868 and previous config saved to /var/cache/conftool/dbconfig/20220421-042545-ladsgroup.json
[04:25:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:25:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:25:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:26:56] <icinga-wm>	 RECOVERY - Check systemd state on cloudweb2002-dev is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[04:30:14] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T298565)', diff saved to https://phabricator.wikimedia.org/P25869 and previous config saved to /var/cache/conftool/dbconfig/20220421-043014-ladsgroup.json
[04:30:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:30:20] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[04:37:38] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
[04:37:40] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
[04:37:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:37:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:44:46] <icinga-wm>	 PROBLEM - Memcached on cloudweb2002-dev is CRITICAL: connect to address 208.80.153.41 and port 11000: Connection refused https://wikitech.wikimedia.org/wiki/Memcached
[04:45:20] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P25870 and previous config saved to /var/cache/conftool/dbconfig/20220421-044519-ladsgroup.json
[04:45:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:46:53] <wikibugs>	 (PS1) Andrew Bogott: Prepare cloudservices200[2,3]-dev for decom [puppet] - https://gerrit.wikimedia.org/r/784810 (https://phabricator.wikimedia.org/T304881)
[04:49:50] <wikibugs>	 (CR) Andrew Bogott: [C: +2] Prepare cloudservices200[2,3]-dev for decom [puppet] - https://gerrit.wikimedia.org/r/784810 (https://phabricator.wikimedia.org/T304881) (owner: Andrew Bogott)
[05:00:25] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P25871 and previous config saved to /var/cache/conftool/dbconfig/20220421-050024-ladsgroup.json
[05:00:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:01:08] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on 31 hosts with reason: Primary switchover s8 T303927
[05:01:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:01:18] <stashbot>	 T303927: Switchover s8 master (db1109 -> db1104) - https://phabricator.wikimedia.org/T303927
[05:01:29] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 31 hosts with reason: Primary switchover s8 T303927
[05:01:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:01:54] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Set db1104 with weight 0 T303927', diff saved to https://phabricator.wikimedia.org/P25872 and previous config saved to /var/cache/conftool/dbconfig/20220421-050154-ladsgroup.json
[05:01:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:02:41] <wikibugs>	 (CR) Marostegui: [C: +2] monitor_eventscheduler.pp: Monitor event_scheduler on tests hosts [puppet] - https://gerrit.wikimedia.org/r/784583 (https://phabricator.wikimedia.org/T254738) (owner: Marostegui)
[05:02:44] <jinxer-wm>	 (KubernetesRsyslogDown) firing: (4) rsyslog on ml-staging-ctrl2001:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[05:02:50] <icinga-wm>	 PROBLEM - BGP status on cr2-eqiad is CRITICAL: BGP CRITICAL - AS64605/IPv6: Active - Anycast https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[05:07:18] <icinga-wm>	 PROBLEM - BGP status on cr2-eqiad is CRITICAL: BGP CRITICAL - AS64605/IPv6: Active - Anycast https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[05:09:18] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1127 (T306560)', diff saved to https://phabricator.wikimedia.org/P25873 and previous config saved to /var/cache/conftool/dbconfig/20220421-050918-ladsgroup.json
[05:09:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:09:24] <stashbot>	 T306560: Fix nullability of img_major_mime and oi_major_mime - https://phabricator.wikimedia.org/T306560
[05:09:31] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1132 T301879', diff saved to https://phabricator.wikimedia.org/P25874 and previous config saved to /var/cache/conftool/dbconfig/20220421-050931-marostegui.json
[05:09:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:09:36] <stashbot>	 T301879: Test MariaDB 10.6 on Bullseye - https://phabricator.wikimedia.org/T301879
[05:10:18] <wikibugs>	 (PS1) Marostegui: db1132: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/784811
[05:12:56] <wikibugs>	 (CR) Marostegui: [C: +2] db1132: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/784811 (owner: Marostegui)
[05:15:30] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T298565)', diff saved to https://phabricator.wikimedia.org/P25875 and previous config saved to /var/cache/conftool/dbconfig/20220421-051529-ladsgroup.json
[05:15:32] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1161.eqiad.wmnet with reason: Maintenance
[05:15:33] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1161.eqiad.wmnet with reason: Maintenance
[05:15:35] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
[05:15:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:15:37] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[05:15:38] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
[05:15:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:15:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:15:43] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1161 (T298565)', diff saved to https://phabricator.wikimedia.org/P25876 and previous config saved to /var/cache/conftool/dbconfig/20220421-051543-ladsgroup.json
[05:15:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:15:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:15:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:17:48] <wikibugs>	 (PS3) Ladsgroup: mariadb: Promote db1104 to s8 master [puppet] - https://gerrit.wikimedia.org/r/784681 (https://phabricator.wikimedia.org/T303927)
[05:17:52] <wikibugs>	 (CR) Ladsgroup: [C: +2] mariadb: Promote db1104 to s8 master [puppet] - https://gerrit.wikimedia.org/r/784681 (https://phabricator.wikimedia.org/T303927) (owner: Ladsgroup)
[05:19:18] <wikibugs>	 (CR) Ladsgroup: [V: +2 C: +2] mariadb: Promote db1104 to s8 master [puppet] - https://gerrit.wikimedia.org/r/784681 (https://phabricator.wikimedia.org/T303927) (owner: Ladsgroup)
[05:21:40] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1166.eqiad.wmnet with reason: Maintenance
[05:21:42] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1166.eqiad.wmnet with reason: Maintenance
[05:21:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:21:47] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1166 (T298565)', diff saved to https://phabricator.wikimedia.org/P25877 and previous config saved to /var/cache/conftool/dbconfig/20220421-052146-ladsgroup.json
[05:21:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:21:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:21:52] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[05:22:05] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on appserver in eqiad on alert1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[05:23:43] <icinga-wm>	 PROBLEM - Some MediaWiki servers are running out of idle PHP-FPM workers in api_appserver at eqiad on alert1001 is CRITICAL: 0.371 gt 0.3 https://bit.ly/wmf-fpmsat https://grafana.wikimedia.org/d/fRn9VEPMz/application-servers-use-dashboard-wip?orgId=1
[05:24:23] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P25878 and previous config saved to /var/cache/conftool/dbconfig/20220421-052423-ladsgroup.json
[05:24:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:24:45] <icinga-wm>	 RECOVERY - Some MediaWiki servers are running out of idle PHP-FPM workers in api_appserver at eqiad on alert1001 is OK: All metrics within thresholds. https://bit.ly/wmf-fpmsat https://grafana.wikimedia.org/d/fRn9VEPMz/application-servers-use-dashboard-wip?orgId=1
[05:25:17] <icinga-wm>	 RECOVERY - High average GET latency for mw requests on appserver in eqiad on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[05:26:31] <icinga-wm>	 PROBLEM - Router interfaces on cr1-codfw is CRITICAL: CRITICAL: host 208.80.153.192, interfaces up: 131, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[05:28:59] <icinga-wm>	 PROBLEM - Host logstash2028.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[05:31:49] <icinga-wm>	 PROBLEM - Check systemd state on ms-be2046 is CRITICAL: CRITICAL - degraded: The following units failed: ferm.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[05:32:53] <icinga-wm>	 PROBLEM - Check whether ferm is active by checking the default input chain on ms-be2046 is CRITICAL: ERROR ferm input drop default policy not set, ferm might not have been started correctly https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm
[05:33:47] <wikibugs>	 (PS1) Andrew Bogott: Update dns IP for new cloudservices hosts in codfw1dev [puppet] - https://gerrit.wikimedia.org/r/785067
[05:34:35] <icinga-wm>	 RECOVERY - Host logstash2028.mgmt is UP: PING OK - Packet loss = 0%, RTA = 33.98 ms
[05:37:45] <wikibugs>	 (CR) Andrew Bogott: [C: +2] Update dns IP for new cloudservices hosts in codfw1dev [puppet] - https://gerrit.wikimedia.org/r/785067 (owner: Andrew Bogott)
[05:39:28] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P25879 and previous config saved to /var/cache/conftool/dbconfig/20220421-053928-ladsgroup.json
[05:39:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:40:33] <icinga-wm>	 PROBLEM - Router interfaces on cr1-eqiad is CRITICAL: CRITICAL: host 208.80.154.196, interfaces up: 241, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[05:46:00] <jinxer-wm>	 (JobUnavailable) firing: (2) Reduced availability for job gitlab in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[05:47:55] <jinxer-wm>	 (NodeTextfileStale) firing: Stale textfile for cloudcontrol2001-dev:9100 - https://wikitech.wikimedia.org/wiki/Prometheus#Stale_file_for_node-exporter_textfile - https://grafana.wikimedia.org/d/knkl4dCWz/node-exporter-textfile - https://alerts.wikimedia.org/?q=alertname%3DNodeTextfileStale
[05:54:33] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1127 (T306560)', diff saved to https://phabricator.wikimedia.org/P25880 and previous config saved to /var/cache/conftool/dbconfig/20220421-055433-ladsgroup.json
[05:54:35] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1136.eqiad.wmnet with reason: Maintenance
[05:54:36] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1136.eqiad.wmnet with reason: Maintenance
[05:54:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:54:39] <stashbot>	 T306560: Fix nullability of img_major_mime and oi_major_mime - https://phabricator.wikimedia.org/T306560
[05:54:41] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1136 (T306560)', diff saved to https://phabricator.wikimedia.org/P25881 and previous config saved to /var/cache/conftool/dbconfig/20220421-055441-ladsgroup.json
[05:54:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:54:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:54:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:56:56] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1136 (T306560)', diff saved to https://phabricator.wikimedia.org/P25882 and previous config saved to /var/cache/conftool/dbconfig/20220421-055655-ladsgroup.json
[05:57:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:57:05] <marostegui>	 Amir1: Which part do you want me to do?
[05:57:15] <logmsgbot>	 !log jynus@cumin1001 START - Cookbook sre.hosts.reimage for host backup1006.eqiad.wmnet with OS bullseye
[05:57:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:58:46] <Amir1>	 marostegui: maybe the after bits would be okay
[05:58:52] <marostegui>	 ok
[05:59:20] <marostegui>	 you do the query killers thing
[05:59:26] <marostegui>	 as you'll have the output on the screen :p
[05:59:31] <marostegui>	 anyways, let's go for it
[06:00:05] <jouncebot>	 kormat, marostegui, and Amir1: I seem to be stuck in Groundhog week. Sigh. Time for (yet another) Primary database switchover deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220421T0600).
[06:00:08] <Amir1>	 o/
[06:00:10] <Amir1>	 awesome
[06:00:12] <marostegui>	 let's go
[06:00:14] <Amir1>	 !log Starting s8 eqiad failover from db1109 to db1104 - T303927
[06:00:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:00:20] <stashbot>	 T303927: Switchover s8 master (db1109 -> db1104) - https://phabricator.wikimedia.org/T303927
[06:00:23] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Set s8 eqiad as read-only for maintenance - T303927', diff saved to https://phabricator.wikimedia.org/P25883 and previous config saved to /var/cache/conftool/dbconfig/20220421-060023-ladsgroup.json
[06:00:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:00:39] <marostegui>	 RO confirmed
[06:01:07] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Promote db1104 to s8 primary and set section read-write T303927', diff saved to https://phabricator.wikimedia.org/P25884 and previous config saved to /var/cache/conftool/dbconfig/20220421-060106-ladsgroup.json
[06:01:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:01:17] <marostegui>	 heartbeat cleaned
[06:01:32] <marostegui>	 I can write
[06:01:37] <Amir1>	 done
[06:02:47] <wikibugs>	 (PS2) Marostegui: wmnet: Update s8-master CNAME [dns] - https://gerrit.wikimedia.org/r/784678 (https://phabricator.wikimedia.org/T303927) (owner: Ladsgroup)
[06:03:11] <icinga-wm>	 RECOVERY - Check whether ferm is active by checking the default input chain on ms-be2046 is OK: OK ferm input default policy is set https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm
[06:03:14] <Amir1>	 marostegui: how do you measure the RO time?
[06:03:25] <marostegui>	 Amir1: dbctl times
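A minimal sketch of one way to read the RO time off the "dbctl times", assuming the timestamps embedded in the saved dbconfig filenames for the read-only commit and the read-write commit are the ones meant. The helper name `read_only_seconds` is hypothetical, and the log-line timestamps could be used the same way.

```python
from datetime import datetime

# Format of the timestamp embedded in the saved dbconfig filenames,
# e.g. 20220421-060023-ladsgroup.json -> "20220421-060023"
STAMP_FORMAT = "%Y%m%d-%H%M%S"


def read_only_seconds(ro_stamp: str, rw_stamp: str) -> int:
    """Return the number of seconds between two dbconfig filename stamps."""
    ro_time = datetime.strptime(ro_stamp, STAMP_FORMAT)
    rw_time = datetime.strptime(rw_stamp, STAMP_FORMAT)
    return int((rw_time - ro_time).total_seconds())


# Stamps taken from the two T303927 commits logged above:
# "Set s8 eqiad as read-only" and "Promote db1104 ... set section read-write".
print(read_only_seconds("20220421-060023", "20220421-060106"))
```

This only measures the gap between the two config commits; it says nothing about when clients actually observed the section as read-only.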
[06:03:32] <wikibugs>	 (CR) Marostegui: [C: +2] wmnet: Update s8-master CNAME [dns] - https://gerrit.wikimedia.org/r/784678 (https://phabricator.wikimedia.org/T303927) (owner: Ladsgroup)
[06:03:39] <marostegui>	 Amir1: let's finish the pending things first
[06:04:33] <marostegui>	 Amir1: before updating the task, refresh it, as you are reverting my changes :p
[06:04:45] <Amir1>	 you're too fast 🤬
[06:05:12] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1109 T303927', diff saved to https://phabricator.wikimedia.org/P25885 and previous config saved to /var/cache/conftool/dbconfig/20220421-060512-root.json
[06:05:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:05:39] <Amir1>	 so only checking zarcillo left?
[06:05:42] <marostegui>	 I did that
[06:05:53] <Amir1>	 awesome
[06:06:02] <Amir1>	 I let you have fun with db1109
[06:06:04] <marostegui>	 Amir1: I have depooled the old master for the schema changes
[06:06:05] <wikibugs>	 (PS1) Marostegui: db1109: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/785071 (https://phabricator.wikimedia.org/T303927)
[06:06:10] <marostegui>	 But we need to repool it before the weekend
[06:06:24] <marostegui>	 Amir1: I don't have any schema changes pending for db1109, you do :p
[06:06:29] <Amir1>	 I can repool it once I wake up, does that sound good to you?
[06:06:36] <marostegui>	 sounds good
[06:06:40] <marostegui>	 remember to revert: https://gerrit.wikimedia.org/r/c/operations/puppet/+/785071/
[06:06:48] <wikibugs>	 (CR) Marostegui: [C: +2] db1109: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/785071 (https://phabricator.wikimedia.org/T303927) (owner: Marostegui)
[06:07:02] <Amir1>	 I honestly can't keep track of schema changes I'm doing :P
[06:07:05] <Amir1>	 let me see
[06:07:07] <marostegui>	 you have the RO times?
[06:07:15] <marostegui>	 You can just add them to the task and close it
[06:08:19] <Amir1>	 hmm, only T300381 is pending?
[06:08:20] <stashbot>	 T300381: Make page_props.pp_page unsigned on wmf wikis - https://phabricator.wikimedia.org/T300381
[06:08:42] <logmsgbot>	 !log jynus@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on backup1006.eqiad.wmnet with reason: host reimage
[06:08:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:08:52] <marostegui>	 Amir1: Ah, I thought you had something else
[06:08:56] <marostegui>	 I will get that started now
[06:09:15] <Amir1>	 I don't remember having any
[06:09:49] <Amir1>	 I started this new thing last night but I ran it on master
[06:09:51] <marostegui>	 Just started it
[06:09:59] <marostegui>	 Anyways, I am going to get breakfast
[06:10:03] <Amir1>	 T306560
[06:10:04] <stashbot>	 T306560: Fix nullability of img_major_mime and oi_major_mime - https://phabricator.wikimedia.org/T306560
[06:10:16] <marostegui>	 Close the switchover task whenever you like (add the RO times)
[06:10:25] <Amir1>	 sure
[06:11:37] <logmsgbot>	 !log jynus@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backup1006.eqiad.wmnet with reason: host reimage
[06:11:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:12:01] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1136', diff saved to https://phabricator.wikimedia.org/P25886 and previous config saved to /var/cache/conftool/dbconfig/20220421-061200-ladsgroup.json
[06:12:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:12:31] <Amir1>	 done
[06:13:39] <Amir1>	 I go rest
[06:15:58] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1161 (T298565)', diff saved to https://phabricator.wikimedia.org/P25887 and previous config saved to /var/cache/conftool/dbconfig/20220421-061558-ladsgroup.json
[06:16:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:16:03] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[06:20:32] <icinga-wm>	 RECOVERY - Check systemd state on ms-be2046 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[06:22:01] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1166 (T298565)', diff saved to https://phabricator.wikimedia.org/P25888 and previous config saved to /var/cache/conftool/dbconfig/20220421-062201-ladsgroup.json
[06:22:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:22:07] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[06:25:37] <icinga-wm>	 PROBLEM - BGP status on cr2-eqiad is CRITICAL: BGP CRITICAL - AS64605/IPv6: Active - Anycast https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[06:27:06] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1136', diff saved to https://phabricator.wikimedia.org/P25889 and previous config saved to /var/cache/conftool/dbconfig/20220421-062705-ladsgroup.json
[06:27:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:30:31] <logmsgbot>	 !log jynus@cumin1001 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host backup1006.eqiad.wmnet with OS bullseye
[06:30:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:31:03] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P25890 and previous config saved to /var/cache/conftool/dbconfig/20220421-063103-ladsgroup.json
[06:31:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:32:02] <wikibugs>	 (03PS1) 10Majavah: P:openstack::rabbitmq: fix file permissions [puppet] - 10https://gerrit.wikimedia.org/r/785105 (https://phabricator.wikimedia.org/T297268)
[06:32:55] <jinxer-wm>	 (NodeTextfileStale) firing: (3) Stale textfile for elastic1075:9100 - https://wikitech.wikimedia.org/wiki/Prometheus#Stale_file_for_node-exporter_textfile - https://grafana.wikimedia.org/d/knkl4dCWz/node-exporter-textfile - https://alerts.wikimedia.org/?q=alertname%3DNodeTextfileStale
[06:33:24] <wikibugs>	 (03CR) 10Majavah: [V: 03+1] "PCC SUCCESS (DIFF 1 NOOP 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/34934/console" [puppet] - 10https://gerrit.wikimedia.org/r/785105 (https://phabricator.wikimedia.org/T297268) (owner: 10Majavah)
[06:34:00] <logmsgbot>	 !log jynus@cumin2002 START - Cookbook sre.hosts.reimage for host backup2006.codfw.wmnet with OS bullseye
[06:34:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:37:06] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P25891 and previous config saved to /var/cache/conftool/dbconfig/20220421-063706-ladsgroup.json
[06:37:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:42:11] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1136 (T306560)', diff saved to https://phabricator.wikimedia.org/P25892 and previous config saved to /var/cache/conftool/dbconfig/20220421-064210-ladsgroup.json
[06:42:12] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
[06:42:14] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
[06:42:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:42:16] <stashbot>	 T306560: Fix nullability of img_major_mime and oi_major_mime - https://phabricator.wikimedia.org/T306560
[06:42:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:42:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:42:39] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
[06:42:40] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
[06:42:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:42:45] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1170:3317 (T306560)', diff saved to https://phabricator.wikimedia.org/P25893 and previous config saved to /var/cache/conftool/dbconfig/20220421-064245-ladsgroup.json
[06:42:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:42:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:45:14] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T306560)', diff saved to https://phabricator.wikimedia.org/P25894 and previous config saved to /var/cache/conftool/dbconfig/20220421-064514-ladsgroup.json
[06:45:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:46:08] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P25895 and previous config saved to /var/cache/conftool/dbconfig/20220421-064608-ladsgroup.json
[06:46:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:47:21] <wikibugs>	 (03PS3) 10Elukey: Add four new k8s worker nodes to ml-serve-eqiad [puppet] - 10https://gerrit.wikimedia.org/r/784701 (https://phabricator.wikimedia.org/T306545)
[06:52:11] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P25896 and previous config saved to /var/cache/conftool/dbconfig/20220421-065211-ladsgroup.json
[06:52:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:00:05] <jouncebot>	 Amir1, apergos, and taavi: #bothumor My software never has bugs. It just develops random features. Rise for UTC morning backport and config training. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220421T0700).
[07:00:14] <apergos>	 hello.
[07:00:19] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P25897 and previous config saved to /var/cache/conftool/dbconfig/20220421-070019-ladsgroup.json
[07:00:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:00:29] <apergos>	 no patches in the window. no trainees signed up.
[07:00:53] <apergos>	 if anyone wants to self deploy something, add yourself to https://wikitech.wikimedia.org/wiki/Deployments and get it done, now's the time.
[07:01:13] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1161 (T298565)', diff saved to https://phabricator.wikimedia.org/P25898 and previous config saved to /var/cache/conftool/dbconfig/20220421-070113-ladsgroup.json
[07:01:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:01:18] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[07:01:55] <jinxer-wm>	 (NodeTextfileStale) firing: Stale textfile for ms-be2067:9100 - https://wikitech.wikimedia.org/wiki/Prometheus#Stale_file_for_node-exporter_textfile - https://grafana.wikimedia.org/d/knkl4dCWz/node-exporter-textfile - https://alerts.wikimedia.org/?q=alertname%3DNodeTextfileStale
[07:02:02] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance
[07:02:04] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance
[07:02:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:02:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:02:09] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1113:3315 (T298565)', diff saved to https://phabricator.wikimedia.org/P25899 and previous config saved to /var/cache/conftool/dbconfig/20220421-070208-ladsgroup.json
[07:02:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:02:40] <logmsgbot>	 !log jynus@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on backup2006.codfw.wmnet with reason: host reimage
[07:02:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:04:18] <wikibugs>	 10SRE, 10LDAP-Access-Requests: Grant Access to nda for jmads - https://phabricator.wikimedia.org/T306117 (10Volans) Sorry, I overlooked the request. As your account is with an `@wikimedia.org` email account, I've granted you the `wmf` group in LDAP and revoked the `nda` one, as they can't coexist. But don't...
[07:06:13] <logmsgbot>	 !log jynus@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backup2006.codfw.wmnet with reason: host reimage
[07:06:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:07:16] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1166 (T298565)', diff saved to https://phabricator.wikimedia.org/P25900 and previous config saved to /var/cache/conftool/dbconfig/20220421-070716-ladsgroup.json
[07:07:18] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1112.eqiad.wmnet with reason: Maintenance
[07:07:20] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1112.eqiad.wmnet with reason: Maintenance
[07:07:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:07:21] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
[07:07:21] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[07:07:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:07:24] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
[07:07:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:07:29] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1112 (T298565)', diff saved to https://phabricator.wikimedia.org/P25901 and previous config saved to /var/cache/conftool/dbconfig/20220421-070729-ladsgroup.json
[07:07:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:07:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:07:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:07:44] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T298565)', diff saved to https://phabricator.wikimedia.org/P25902 and previous config saved to /var/cache/conftool/dbconfig/20220421-070744-ladsgroup.json
[07:07:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:15:24] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P25903 and previous config saved to /var/cache/conftool/dbconfig/20220421-071524-ladsgroup.json
[07:15:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:15:52] <wikibugs>	 (03CR) 10Muehlenhoff: admin: update samtar account (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/784704 (https://phabricator.wikimedia.org/T306518) (owner: 10Volans)
[07:22:51] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P25904 and previous config saved to /var/cache/conftool/dbconfig/20220421-072249-ladsgroup.json
[07:22:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:30:29] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T306560)', diff saved to https://phabricator.wikimedia.org/P25905 and previous config saved to /var/cache/conftool/dbconfig/20220421-073029-ladsgroup.json
[07:30:31] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
[07:30:32] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
[07:30:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:30:35] <stashbot>	 T306560: Fix nullability of img_major_mime and oi_major_mime - https://phabricator.wikimedia.org/T306560
[07:30:37] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1101:3317 (T306560)', diff saved to https://phabricator.wikimedia.org/P25906 and previous config saved to /var/cache/conftool/dbconfig/20220421-073037-ladsgroup.json
[07:30:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:30:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:30:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:33:06] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T306560)', diff saved to https://phabricator.wikimedia.org/P25907 and previous config saved to /var/cache/conftool/dbconfig/20220421-073306-ladsgroup.json
[07:33:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:37:56] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P25908 and previous config saved to /var/cache/conftool/dbconfig/20220421-073755-ladsgroup.json
[07:37:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:40:57] <wikibugs>	 (03CR) 10Thiemo Kreuz (WMDE): [C: 03+1] [beta] Enable Kartographer nearby feature on beta cluster [mediawiki-config] - 10https://gerrit.wikimedia.org/r/784702 (https://phabricator.wikimedia.org/T304076) (owner: 10WMDE-Fisch)
[07:44:57] <wikibugs>	 (03PS1) 10Majavah: P:openstack::encapi: add tls for write endpoint [puppet] - 10https://gerrit.wikimedia.org/r/785110 (https://phabricator.wikimedia.org/T274666)
[07:46:45] <wikibugs>	 (03PS1) 10Elukey: profile::ores::web: allow undef passwords for redis [puppet] - 10https://gerrit.wikimedia.org/r/785111
[07:48:11] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P25909 and previous config saved to /var/cache/conftool/dbconfig/20220421-074811-ladsgroup.json
[07:48:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:49:43] <wikibugs>	 (03CR) 10Elukey: [C: 03+2] profile::ores::web: allow undef passwords for redis [puppet] - 10https://gerrit.wikimedia.org/r/785111 (owner: 10Elukey)
[07:53:01] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T298565)', diff saved to https://phabricator.wikimedia.org/P25910 and previous config saved to /var/cache/conftool/dbconfig/20220421-075300-ladsgroup.json
[07:53:02] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1096.eqiad.wmnet with reason: Maintenance
[07:53:04] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1096.eqiad.wmnet with reason: Maintenance
[07:53:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:53:07] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[07:53:09] <wikibugs>	 (03PS2) 10Majavah: P:openstack::encapi: add tls for write endpoint [puppet] - 10https://gerrit.wikimedia.org/r/785110 (https://phabricator.wikimedia.org/T274666)
[07:53:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:53:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:53:24] <icinga-wm>	 RECOVERY - Router interfaces on cr1-codfw is OK: OK: host 208.80.153.192, interfaces up: 132, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[07:53:28] <logmsgbot>	 !log jynus@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host backup2006.codfw.wmnet with OS bullseye
[07:53:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:53:49] <wikibugs>	 (03CR) 10Majavah: [V: 03+1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/34937/console" [puppet] - 10https://gerrit.wikimedia.org/r/785110 (https://phabricator.wikimedia.org/T274666) (owner: 10Majavah)
[07:53:58] <wikibugs>	 (03PS2) 10Volans: admin: update samtar account [puppet] - 10https://gerrit.wikimedia.org/r/784704 (https://phabricator.wikimedia.org/T306518)
[07:54:06] <wikibugs>	 (03CR) 10Volans: "addressed comment" [puppet] - 10https://gerrit.wikimedia.org/r/784704 (https://phabricator.wikimedia.org/T306518) (owner: 10Volans)
[07:55:10] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/784704 (https://phabricator.wikimedia.org/T306518) (owner: 10Volans)
[07:55:16] <icinga-wm>	 RECOVERY - Router interfaces on cr1-eqiad is OK: OK: host 208.80.154.196, interfaces up: 242, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[07:55:38] <wikibugs>	 (03CR) 10Volans: [C: 03+2] admin: update samtar account [puppet] - 10https://gerrit.wikimedia.org/r/784704 (https://phabricator.wikimedia.org/T306518) (owner: 10Volans)
[07:55:47] <wikibugs>	 (03PS3) 10Volans: admin: update samtar account [puppet] - 10https://gerrit.wikimedia.org/r/784704 (https://phabricator.wikimedia.org/T306518)
[07:57:28] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1144.eqiad.wmnet with reason: Maintenance
[07:57:29] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1144.eqiad.wmnet with reason: Maintenance
[07:57:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:57:34] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1144:3315 (T298565)', diff saved to https://phabricator.wikimedia.org/P25911 and previous config saved to /var/cache/conftool/dbconfig/20220421-075734-ladsgroup.json
[07:57:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:57:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:59:13] <jinxer-wm>	 (KubernetesRsyslogDown) firing: (2) rsyslog on kubestagemaster1001:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[08:03:16] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P25912 and previous config saved to /var/cache/conftool/dbconfig/20220421-080316-ladsgroup.json
[08:03:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:03:38] <icinga-wm>	 PROBLEM - SSH on aqs1008.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[08:04:20] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T298565)', diff saved to https://phabricator.wikimedia.org/P25913 and previous config saved to /var/cache/conftool/dbconfig/20220421-080420-ladsgroup.json
[08:04:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:04:25] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[08:07:44] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1112 (T298565)', diff saved to https://phabricator.wikimedia.org/P25914 and previous config saved to /var/cache/conftool/dbconfig/20220421-080744-ladsgroup.json
[08:07:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:10:53] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 03+1] "LGTM!" [puppet] - 10https://gerrit.wikimedia.org/r/784742 (owner: 10Vivian Rook)
[08:11:14] <wikibugs>	 (03CR) 10Svantje Lilienthal: [C: 03+1] [beta] Enable Kartographer nearby feature on beta cluster [mediawiki-config] - 10https://gerrit.wikimedia.org/r/784702 (https://phabricator.wikimedia.org/T304076) (owner: 10WMDE-Fisch)
[08:11:44] <logmsgbot>	 !log jynus@cumin1001 START - Cookbook sre.hosts.reimage for host backup1005.eqiad.wmnet with OS bullseye
[08:11:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:17:44] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 03+1] logstash: populate target index format and add pipeline diagnostics [puppet] - 10https://gerrit.wikimedia.org/r/775375 (https://phabricator.wikimedia.org/T305090) (owner: 10Cwhite)
[08:18:02] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 03+1] logstash: rewrite ecs settings [puppet] - 10https://gerrit.wikimedia.org/r/777887 (https://phabricator.wikimedia.org/T305013) (owner: 10Cwhite)
[08:18:14] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 03+1] logstash: transform rotation frequency values to datestamp format [puppet] - 10https://gerrit.wikimedia.org/r/777882 (https://phabricator.wikimedia.org/T305175) (owner: 10Cwhite)
[08:18:22] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T306560)', diff saved to https://phabricator.wikimedia.org/P25915 and previous config saved to /var/cache/conftool/dbconfig/20220421-081821-ladsgroup.json
[08:18:23] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
[08:18:25] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
[08:18:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:18:28] <stashbot>	 T306560: Fix nullability of img_major_mime and oi_major_mime - https://phabricator.wikimedia.org/T306560
[08:18:29] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 03+1] logstash: set partition on legacy indexes [puppet] - 10https://gerrit.wikimedia.org/r/777880 (https://phabricator.wikimedia.org/T305175) (owner: 10Cwhite)
[08:18:30] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1098:3317 (T306560)', diff saved to https://phabricator.wikimedia.org/P25916 and previous config saved to /var/cache/conftool/dbconfig/20220421-081829-ladsgroup.json
[08:18:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:18:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:18:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:19:24] <wikibugs>	 10SRE, 10LDAP-Access-Requests, 10Patch-For-Review: Grant Access to ldap/wmf for Samtar (TheresNoTime) - https://phabricator.wikimedia.org/T306518 (10KSiebert) @Volans Hey there, I am Sammy's manager and am approving!
[08:19:26] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P25917 and previous config saved to /var/cache/conftool/dbconfig/20220421-081925-ladsgroup.json
[08:19:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:22:49] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P25918 and previous config saved to /var/cache/conftool/dbconfig/20220421-082249-ladsgroup.json
[08:22:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:23:32] <WMDE-Fisch>	 !nowandnext
[08:23:37] <wikibugs>	 (03PS1) 10Muehlenhoff: Apply role::webperf::processors_and_site to webperf1003/2003 [puppet] - 10https://gerrit.wikimedia.org/r/785115 (https://phabricator.wikimedia.org/T305460)
[08:23:39] <wikibugs>	 (03PS1) 10Muehlenhoff: Switch webperf1001/1003 for eventual removal [puppet] - 10https://gerrit.wikimedia.org/r/785116 (https://phabricator.wikimedia.org/T205460)
[08:23:49] <WMDE-Fisch>	 now
[08:24:05] <WMDE-Fisch>	 jouncebot: now
[08:24:05] <jouncebot>	 No deployments scheduled for the next 1 hour(s) and 35 minute(s)
[08:24:23] * WMDE-Fisch merging a beta cluster config change
[08:24:56] <wikibugs>	 (03CR) 10WMDE-Fisch: [C: 03+2] [beta] Enable Kartographer nearby feature on beta cluster [mediawiki-config] - 10https://gerrit.wikimedia.org/r/784702 (https://phabricator.wikimedia.org/T304076) (owner: 10WMDE-Fisch)
[08:25:37] <wikibugs>	 (03Merged) 10jenkins-bot: [beta] Enable Kartographer nearby feature on beta cluster [mediawiki-config] - 10https://gerrit.wikimedia.org/r/784702 (https://phabricator.wikimedia.org/T304076) (owner: 10WMDE-Fisch)
[08:25:48] <logmsgbot>	 !log jynus@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on backup1005.eqiad.wmnet with reason: host reimage
[08:25:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:27:31] <wikibugs>	 (03PS1) 10Muehlenhoff: Extend Ferm rules for new webperf hosts [puppet] - 10https://gerrit.wikimedia.org/r/785117 (https://phabricator.wikimedia.org/T305460)
[08:27:34] <wikibugs>	 (03PS1) 10Muehlenhoff: Remove obsolete webperf hosts [puppet] - 10https://gerrit.wikimedia.org/r/785118 (https://phabricator.wikimedia.org/T305460)
[08:28:07] * WMDE-Fisch done
[08:29:13] <logmsgbot>	 !log jynus@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backup1005.eqiad.wmnet with reason: host reimage
[08:29:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:30:34] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[08:30:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:30:38] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[08:30:39] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[08:30:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:30:43] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[08:30:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:30:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:31:22] <wikibugs>	 (03PS1) 10KartikMistry: Update cxserver to 2022-04-21-081331-production [deployment-charts] - 10https://gerrit.wikimedia.org/r/785120 (https://phabricator.wikimedia.org/T305115)
[08:34:30] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P25919 and previous config saved to /var/cache/conftool/dbconfig/20220421-083430-ladsgroup.json
[08:34:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:36:28] <wikibugs>	 (03PS5) 10Jcrespo: admin: Add placeholder to reserve uid and gid 914 for minio-user [puppet] - 10https://gerrit.wikimedia.org/r/784633 (https://phabricator.wikimedia.org/T305446)
[08:37:12] <wikibugs>	 (03CR) 10Jcrespo: [C: 03+2] admin: Add placeholder to reserve uid and gid 914 for minio-user [puppet] - 10https://gerrit.wikimedia.org/r/784633 (https://phabricator.wikimedia.org/T305446) (owner: 10Jcrespo)
[08:37:54] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P25920 and previous config saved to /var/cache/conftool/dbconfig/20220421-083754-ladsgroup.json
[08:37:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:44:09] <wikibugs>	 (03CR) 10Volans: "I've left some comment/question inline." [puppet] - 10https://gerrit.wikimedia.org/r/784697 (owner: 10Ssingh)
[08:48:49] <logmsgbot>	 !log jynus@cumin1001 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host backup1005.eqiad.wmnet with OS bullseye
[08:48:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:49:36] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T298565)', diff saved to https://phabricator.wikimedia.org/P25921 and previous config saved to /var/cache/conftool/dbconfig/20220421-084935-ladsgroup.json
[08:49:37] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1130.eqiad.wmnet with reason: Maintenance
[08:49:39] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1130.eqiad.wmnet with reason: Maintenance
[08:49:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:49:40] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[08:49:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:49:44] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1130 (T298565)', diff saved to https://phabricator.wikimedia.org/P25922 and previous config saved to /var/cache/conftool/dbconfig/20220421-084943-ladsgroup.json
[08:49:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:49:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:49:59] <wikibugs>	 SRE, LDAP-Access-Requests: Grant Access to ldap/wmf for Samtar (TheresNoTime) - https://phabricator.wikimedia.org/T306518 (Volans) Open→Resolved a:Volans @KSiebert thanks, it's all done. There was some confusion based on which email should the account be associated with. @TheresNoTime I'm res...
[08:52:16] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1130 (T298565)', diff saved to https://phabricator.wikimedia.org/P25923 and previous config saved to /var/cache/conftool/dbconfig/20220421-085214-ladsgroup.json
[08:52:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:52:59] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1112 (T298565)', diff saved to https://phabricator.wikimedia.org/P25924 and previous config saved to /var/cache/conftool/dbconfig/20220421-085259-ladsgroup.json
[08:53:01] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1157.eqiad.wmnet with reason: Maintenance
[08:53:02] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1157.eqiad.wmnet with reason: Maintenance
[08:53:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:53:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:53:07] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1157 (T298565)', diff saved to https://phabricator.wikimedia.org/P25925 and previous config saved to /var/cache/conftool/dbconfig/20220421-085307-ladsgroup.json
[08:53:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:53:12] <logmsgbot>	 !log jynus@cumin2002 START - Cookbook sre.hosts.reimage for host backup2005.codfw.wmnet with OS bullseye
[08:53:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:53:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:55:43] <logmsgbot>	 !log jynus@cumin1001 START - Cookbook sre.hosts.reimage for host backup1004.eqiad.wmnet with OS bullseye
[08:55:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:02:44] <jinxer-wm>	 (KubernetesRsyslogDown) firing: (4) rsyslog on ml-staging-ctrl2001:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[09:04:00] <icinga-wm>	 PROBLEM - Host logstash2028.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[09:06:51] <logmsgbot>	 !log jynus@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on backup2005.codfw.wmnet with reason: host reimage
[09:06:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:07:11] <logmsgbot>	 !log jynus@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on backup1004.eqiad.wmnet with reason: host reimage
[09:07:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:10:10] <logmsgbot>	 !log jynus@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backup2005.codfw.wmnet with reason: host reimage
[09:10:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:12:46] <logmsgbot>	 !log jynus@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backup1004.eqiad.wmnet with reason: host reimage
[09:12:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:14:13] <wikibugs>	 (PS1) Jcrespo: Revert "dumps: Block python requests UA" [puppet] - https://gerrit.wikimedia.org/r/784715
[09:14:35] <wikibugs>	 (CR) jerkins-bot: [V: -1] Revert "dumps: Block python requests UA" [puppet] - https://gerrit.wikimedia.org/r/784715 (owner: Jcrespo)
[09:15:57] <wikibugs>	 (PS2) Jcrespo: Revert "dumps: Block python requests UA" [puppet] - https://gerrit.wikimedia.org/r/784715
[09:18:44] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 (T306560)', diff saved to https://phabricator.wikimedia.org/P25926 and previous config saved to /var/cache/conftool/dbconfig/20220421-091843-ladsgroup.json
[09:18:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:18:49] <stashbot>	 T306560: Fix nullability of img_major_mime and oi_major_mime - https://phabricator.wikimedia.org/T306560
[09:25:05] <wikibugs>	 SRE, Traffic: haproxy tls terminator autobanning - https://phabricator.wikimedia.org/T306580 (Volans) p:Triage→Medium
[09:26:34] <wikibugs>	 SRE, SRE-Access-Requests, Infrastructure-Foundations, serviceops: allow certain users to disable puppet on mwdebug hosts - https://phabricator.wikimedia.org/T305979 (Volans) p:Triage→Medium
[09:28:39] <wikibugs>	 Puppet, SRE, Infrastructure-Foundations: Validate all yaml files in puppet.git - https://phabricator.wikimedia.org/T305676 (Volans) p:Triage→Medium
[09:32:54] <volans>	 Krinkle: when you have a moment could you set the priority of T305794, thanks
[09:32:54] <stashbot>	 T305794: Let X-Analytics response header pass through with WikimediaDebug - https://phabricator.wikimedia.org/T305794
[09:33:49] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P25927 and previous config saved to /var/cache/conftool/dbconfig/20220421-093348-ladsgroup.json
[09:33:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:34:30] <wikibugs>	 SRE, conftool: Provide a meaningful Retry-After value - https://phabricator.wikimedia.org/T305824 (Vgutierrez) p:Triage→Medium
[09:35:36] <logmsgbot>	 !log jynus@cumin1001 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host backup1004.eqiad.wmnet with OS bullseye
[09:35:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:37:17] <wikibugs>	 (PS1) Elukey: ores: support Celery 4 and 5 configurations [puppet] - https://gerrit.wikimedia.org/r/785124 (https://phabricator.wikimedia.org/T303801)
[09:37:31] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
[09:37:33] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
[09:37:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:37:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:41:28] <moritzm>	 !log upgrading the Ganeti test cluster to 3.0 T306499
[09:41:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:41:33] <stashbot>	 T306499: Upgrade ganeti-test to Bullseye - https://phabricator.wikimedia.org/T306499
[09:41:47] <wikibugs>	 (PS2) Elukey: ores: support Celery 4 and 5 configurations [puppet] - https://gerrit.wikimedia.org/r/785124 (https://phabricator.wikimedia.org/T303801)
[09:41:55] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db2123.codfw.wmnet with reason: Maintenance
[09:41:56] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2123.codfw.wmnet with reason: Maintenance
[09:41:58] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 12:00:00 on 8 hosts with reason: Maintenance
[09:41:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:41:59] <logmsgbot>	 !log jynus@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host backup2005.codfw.wmnet with OS bullseye
[09:42:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:42:04] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 8 hosts with reason: Maintenance
[09:42:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:42:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:42:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:43:47] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
[09:43:49] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
[09:43:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:43:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:45:01] <wikibugs>	 (PS3) Elukey: ores: support Celery 4 and 5 configurations [puppet] - https://gerrit.wikimedia.org/r/785124 (https://phabricator.wikimedia.org/T303801)
[09:46:00] <jinxer-wm>	 (JobUnavailable) firing: (2) Reduced availability for job gitlab in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[09:46:52] <wikibugs>	 (PS4) Elukey: ores: support Celery 4 and 5 configurations [puppet] - https://gerrit.wikimedia.org/r/785124 (https://phabricator.wikimedia.org/T303801)
[09:47:55] <jinxer-wm>	 (NodeTextfileStale) firing: Stale textfile for cloudcontrol2001-dev:9100 - https://wikitech.wikimedia.org/wiki/Prometheus#Stale_file_for_node-exporter_textfile - https://grafana.wikimedia.org/d/knkl4dCWz/node-exporter-textfile - https://alerts.wikimedia.org/?q=alertname%3DNodeTextfileStale
[09:48:01] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1110.eqiad.wmnet with reason: Maintenance
[09:48:03] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1110.eqiad.wmnet with reason: Maintenance
[09:48:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:48:08] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1110 (T298565)', diff saved to https://phabricator.wikimedia.org/P25928 and previous config saved to /var/cache/conftool/dbconfig/20220421-094807-ladsgroup.json
[09:48:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:48:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:48:13] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[09:48:54] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P25929 and previous config saved to /var/cache/conftool/dbconfig/20220421-094853-ladsgroup.json
[09:48:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:49:49] <wikibugs>	 (CR) Elukey: [V: +1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/34941/console" [puppet] - https://gerrit.wikimedia.org/r/785124 (https://phabricator.wikimedia.org/T303801) (owner: Elukey)
[09:51:57] <wikibugs>	 (PS2) Vgutierrez: cache::haproxy: Log emergency messages to disk [puppet] - https://gerrit.wikimedia.org/r/784256 (https://phabricator.wikimedia.org/T306236)
[09:52:17] <wikibugs>	 (CR) Elukey: [V: +1 C: +2] ores: support Celery 4 and 5 configurations [puppet] - https://gerrit.wikimedia.org/r/785124 (https://phabricator.wikimedia.org/T303801) (owner: Elukey)
[09:52:28] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.reboot-single for host ping1002.eqiad.wmnet
[09:52:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:53:22] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1157 (T298565)', diff saved to https://phabricator.wikimedia.org/P25930 and previous config saved to /var/cache/conftool/dbconfig/20220421-095322-ladsgroup.json
[09:53:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:53:27] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[09:54:29] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ping1002.eqiad.wmnet
[09:54:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:55:03] <icinga-wm>	 RECOVERY - Host logstash2028.mgmt is UP: PING OK - Packet loss = 0%, RTA = 0.07 ms
[09:55:29] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1110 (T298565)', diff saved to https://phabricator.wikimedia.org/P25931 and previous config saved to /var/cache/conftool/dbconfig/20220421-095529-ladsgroup.json
[09:55:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:57:20] <wikibugs>	 SRE, Analytics, Traffic: intake-analytics is responsible for up to a 85% of varnish backend fetch errors - https://phabricator.wikimedia.org/T306181 (BTullis) Thanks @Vgutierrez for bringing this to our attention. I agree that we should try to find the cause of these errors and eradicate it if at all...
[09:57:59] <wikibugs>	 SRE, Data-Engineering, Data-Engineering-Kanban, Traffic: intake-analytics is responsible for up to a 85% of varnish backend fetch errors - https://phabricator.wikimedia.org/T306181 (BTullis) a:BTullis
[10:00:05] <jouncebot>	 mvolz: Dear deployers, time to do the Services – Citoid / Zotero deploy. Don't look at me like that. You signed up for it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220421T1000).
[10:03:59] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 (T306560)', diff saved to https://phabricator.wikimedia.org/P25932 and previous config saved to /var/cache/conftool/dbconfig/20220421-100359-ladsgroup.json
[10:04:00] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
[10:04:02] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
[10:04:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:04:05] <stashbot>	 T306560: Fix nullability of img_major_mime and oi_major_mime - https://phabricator.wikimedia.org/T306560
[10:04:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:04:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:04:15] <wikibugs>	 (PS1) Muehlenhoff: Fix up host globbing for ping servers [puppet] - https://gerrit.wikimedia.org/r/785125
[10:04:30] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db2121.codfw.wmnet with reason: Maintenance
[10:04:31] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2121.codfw.wmnet with reason: Maintenance
[10:04:33] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 12:00:00 on 10 hosts with reason: Maintenance
[10:04:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:04:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:04:40] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 10 hosts with reason: Maintenance
[10:04:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:04:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:08:27] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1157', diff saved to https://phabricator.wikimedia.org/P25933 and previous config saved to /var/cache/conftool/dbconfig/20220421-100827-ladsgroup.json
[10:08:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:09:28] <wikibugs>	 (PS1) Marostegui: Revert "db1109: Disable notifications" [puppet] - https://gerrit.wikimedia.org/r/784716
[10:10:31] <wikibugs>	 (CR) Marostegui: [C: +2] Revert "db1109: Disable notifications" [puppet] - https://gerrit.wikimedia.org/r/784716 (owner: Marostegui)
[10:10:34] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P25934 and previous config saved to /var/cache/conftool/dbconfig/20220421-101034-ladsgroup.json
[10:10:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:11:28] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1109 (re)pooling @ 1%: After schema change', diff saved to https://phabricator.wikimedia.org/P25935 and previous config saved to /var/cache/conftool/dbconfig/20220421-101127-root.json
[10:11:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:14:46] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.reboot-single for host ping2002.codfw.wmnet
[10:14:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:17:30] <wikibugs>	 (CR) Elukey: [V: +1] "PCC SUCCESS (DIFF 4): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/34942/console" [puppet] - https://gerrit.wikimedia.org/r/784701 (https://phabricator.wikimedia.org/T306545) (owner: Elukey)
[10:17:44] <wikibugs>	 SRE, Search-Console-access-request: Update Documentation and Process for Access to Search Consoles - https://phabricator.wikimedia.org/T303513 (jcrespo) @SCherukuwada I've asked some of the people in charge of operational security at SRE and they advised that the easiest way to handle expiration is to mo...
[10:18:00] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ping2002.codfw.wmnet
[10:18:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:23:32] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1157', diff saved to https://phabricator.wikimedia.org/P25936 and previous config saved to /var/cache/conftool/dbconfig/20220421-102332-ladsgroup.json
[10:23:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:25:39] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P25937 and previous config saved to /var/cache/conftool/dbconfig/20220421-102539-ladsgroup.json
[10:25:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:26:27] <wikibugs>	 SRE: Upgrade ganeti-test to Bullseye - https://phabricator.wikimedia.org/T306499 (MoritzMuehlenhoff)
[10:26:31] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1109 (re)pooling @ 10%: After schema change', diff saved to https://phabricator.wikimedia.org/P25938 and previous config saved to /var/cache/conftool/dbconfig/20220421-102631-root.json
[10:26:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:28:35] <icinga-wm>	 PROBLEM - Check systemd state on alert1001 is CRITICAL: CRITICAL - degraded: The following units failed: certspotter.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[10:30:01] <logmsgbot>	 !log jynus@cumin1001 START - Cookbook sre.hosts.reimage for host backup1002.eqiad.wmnet with OS bullseye
[10:30:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:32:07] <logmsgbot>	 !log jynus@cumin2002 START - Cookbook sre.hosts.reimage for host backup2004.codfw.wmnet with OS bullseye
[10:32:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:32:55] <jinxer-wm>	 (NodeTextfileStale) firing: (3) Stale textfile for elastic1075:9100 - https://wikitech.wikimedia.org/wiki/Prometheus#Stale_file_for_node-exporter_textfile - https://grafana.wikimedia.org/d/knkl4dCWz/node-exporter-textfile - https://alerts.wikimedia.org/?q=alertname%3DNodeTextfileStale
[10:38:38] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1157 (T298565)', diff saved to https://phabricator.wikimedia.org/P25939 and previous config saved to /var/cache/conftool/dbconfig/20220421-103837-ladsgroup.json
[10:38:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:38:43] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[10:40:44] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1110 (T298565)', diff saved to https://phabricator.wikimedia.org/P25940 and previous config saved to /var/cache/conftool/dbconfig/20220421-104044-ladsgroup.json
[10:40:46] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1161.eqiad.wmnet with reason: Maintenance
[10:40:48] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1161.eqiad.wmnet with reason: Maintenance
[10:40:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:40:49] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
[10:40:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:40:53] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
[10:40:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:40:58] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1161 (T298565)', diff saved to https://phabricator.wikimedia.org/P25941 and previous config saved to /var/cache/conftool/dbconfig/20220421-104057-ladsgroup.json
[10:40:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:41:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:41:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:41:35] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1109 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P25942 and previous config saved to /var/cache/conftool/dbconfig/20220421-104135-root.json
[10:41:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:42:00] <wikibugs>	 SRE, Search-Console-access-request: Update Documentation and Process for Access to Search Consoles - https://phabricator.wikimedia.org/T303513 (jcrespo) > I will double check in case there is some pending expiration in the current calendar, These are the ones that should have been acted on (probably you...
[10:46:08] <wikibugs>	 (PS1) Roman Stolar: Remove Thumbor Community Core as Wikimedia Thumbor dependency [software/thumbor-plugins] - https://gerrit.wikimedia.org/r/785127 (https://phabricator.wikimedia.org/T305053)
[10:46:59] <wikibugs>	 (CR) jerkins-bot: [V: -1] Remove Thumbor Community Core as Wikimedia Thumbor dependency [software/thumbor-plugins] - https://gerrit.wikimedia.org/r/785127 (https://phabricator.wikimedia.org/T305053) (owner: Roman Stolar)
[10:48:00] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.reboot-single for host ping3002.esams.wmnet
[10:48:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:50:38] <logmsgbot>	 !log jynus@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on backup2004.codfw.wmnet with reason: host reimage
[10:50:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:52:00] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ping3002.esams.wmnet
[10:52:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:54:47] <logmsgbot>	 !log jynus@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backup2004.codfw.wmnet with reason: host reimage
[10:54:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:55:34] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
[10:55:35] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
[10:55:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:55:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:56:40] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1109 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P25944 and previous config saved to /var/cache/conftool/dbconfig/20220421-105638-root.json
[10:56:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:57:39] <wikibugs>	 (PS2) Roman Stolar: Remove Thumbor Community Core as Wikimedia Thumbor dependency [software/thumbor-plugins] - https://gerrit.wikimedia.org/r/785127 (https://phabricator.wikimedia.org/T305053)
[10:58:00] <icinga-wm>	 RECOVERY - Check systemd state on alert1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[10:58:36] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 1%: After maintenance', diff saved to https://phabricator.wikimedia.org/P25945 and previous config saved to /var/cache/conftool/dbconfig/20220421-105835-root.json
[10:58:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:59:29] <wikibugs>	 (CR) Vgutierrez: "@fgiunchedi this is currently being used on traffic-cache-atstext-buster.traffic.eqiad1.wikimedia.cloud and it seems to be working as expe" [puppet] - https://gerrit.wikimedia.org/r/784256 (https://phabricator.wikimedia.org/T306236) (owner: Vgutierrez)
[11:01:16] <icinga-wm>	 PROBLEM - traffic_server backend process restarted on cp2036 is CRITICAL: 3 ge 2 https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server https://grafana.wikimedia.org/d/6uhkG6OZk/ats-instance-drilldown?orgId=1&var-site=codfw+prometheus/ops&var-instance=cp2036&var-layer=backend
[11:01:55] <jinxer-wm>	 (NodeTextfileStale) firing: Stale textfile for ms-be2067:9100 - https://wikitech.wikimedia.org/wiki/Prometheus#Stale_file_for_node-exporter_textfile - https://grafana.wikimedia.org/d/knkl4dCWz/node-exporter-textfile - https://alerts.wikimedia.org/?q=alertname%3DNodeTextfileStale
[11:04:44] <icinga-wm>	 RECOVERY - SSH on aqs1008.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[11:05:43] <wikibugs>	 (PS1) Majavah: add dummy password for cloudinfra token validator [labs/private] - https://gerrit.wikimedia.org/r/785128 (https://phabricator.wikimedia.org/T274666)
[11:05:45] <logmsgbot>	 !log jynus@cumin1001 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host backup1002.eqiad.wmnet with OS bullseye
[11:05:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:10:44] <wikibugs>	 (PS1) Marostegui: db2088: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/785133 (https://phabricator.wikimedia.org/T306604)
[11:11:44] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1109 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P25946 and previous config saved to /var/cache/conftool/dbconfig/20220421-111144-root.json
[11:11:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:13:05] <wikibugs>	 (PS1) Majavah: P:openstack::encapi: add keystone token verification [puppet] - https://gerrit.wikimedia.org/r/785134 (https://phabricator.wikimedia.org/T274666)
[11:13:35] <marostegui>	 !log dbmaint s2@codfw T306604
[11:13:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:13:40] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 5%: After maintenance', diff saved to https://phabricator.wikimedia.org/P25947 and previous config saved to /var/cache/conftool/dbconfig/20220421-111340-root.json
[11:13:41] <stashbot>	 T306604: db2088 filling up - https://phabricator.wikimedia.org/T306604
[11:13:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:14:12] <logmsgbot>	 !log jynus@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host backup2004.codfw.wmnet with OS bullseye
[11:14:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:15:44] * kart_ updating cxserver..
[11:16:01] <wikibugs>	 (CR) KartikMistry: [C: +2] Update cxserver to 2022-04-21-081331-production [deployment-charts] - https://gerrit.wikimedia.org/r/785120 (https://phabricator.wikimedia.org/T305115) (owner: KartikMistry)
[11:19:24] <wikibugs>	 (CR) Marostegui: [C: +2] db2088: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/785133 (https://phabricator.wikimedia.org/T306604) (owner: Marostegui)
[11:19:38] <icinga-wm>	 PROBLEM - Check systemd state on cp2036 is CRITICAL: CRITICAL - degraded: The following units failed: wmf_auto_restart_nagios-nrpe-server.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[11:21:14] <wikibugs>	 (Merged) jenkins-bot: Update cxserver to 2022-04-21-081331-production [deployment-charts] - https://gerrit.wikimedia.org/r/785120 (https://phabricator.wikimedia.org/T305115) (owner: KartikMistry)
[11:22:44] <logmsgbot>	 !log kartik@deploy1002 helmfile [staging] START helmfile.d/services/cxserver: apply
[11:22:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:23:17] <logmsgbot>	 !log kartik@deploy1002 helmfile [staging] DONE helmfile.d/services/cxserver: apply
[11:23:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:26:48] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1109 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P25948 and previous config saved to /var/cache/conftool/dbconfig/20220421-112648-root.json
[11:26:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:27:02] <logmsgbot>	 !log kartik@deploy1002 helmfile [codfw] START helmfile.d/services/cxserver: apply
[11:27:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:28:02] <logmsgbot>	 !log kartik@deploy1002 helmfile [codfw] DONE helmfile.d/services/cxserver: apply
[11:28:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:28:44] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 10%: After maintenance', diff saved to https://phabricator.wikimedia.org/P25949 and previous config saved to /var/cache/conftool/dbconfig/20220421-112843-root.json
[11:28:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:29:54] <logmsgbot>	 !log kartik@deploy1002 helmfile [eqiad] START helmfile.d/services/cxserver: apply
[11:29:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:30:54] <logmsgbot>	 !log kartik@deploy1002 helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
[11:30:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:34:36] <kart_>	 !log Updated cxserver to 2022-04-21-081331-production (T287655, T304855, T304862, T304866, T305115)
[11:34:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:34:47] <stashbot>	 T287655: Generate template parameter alignments for en > de wikis - https://phabricator.wikimedia.org/T287655
[11:34:47] <stashbot>	 T304862: Enable Content and Section Translation for Basque Wikipedia - https://phabricator.wikimedia.org/T304862
[11:34:47] <stashbot>	 T304855: Enable Content and Section Translation for Czech Wikipedia - https://phabricator.wikimedia.org/T304855
[11:34:47] <stashbot>	 T305115: Generate template parameter alignments for wikis (April-June) - https://phabricator.wikimedia.org/T305115
[11:34:48] <stashbot>	 T304866: Enable Content and Section Translation for Central Kurdish Wikipedia - https://phabricator.wikimedia.org/T304866
[11:35:23] <moritzm>	 !log installing zlib security updates on stretch (buster/bullseye already fixed)
[11:35:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:37:15] <icinga-wm>	 PROBLEM - Varnish frontend child restarted on cp2036 is CRITICAL: 2 ge 2 https://wikitech.wikimedia.org/wiki/Varnish https://grafana.wikimedia.org/d/000000330/varnish-machine-stats?orgId=1&viewPanel=66&var-server=cp2036&var-datasource=codfw+prometheus/ops
[11:39:48] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1145.eqiad.wmnet with reason: Maintenance
[11:39:50] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1145.eqiad.wmnet with reason: Maintenance
[11:39:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:39:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:41:12] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1161 (T298565)', diff saved to https://phabricator.wikimedia.org/P25950 and previous config saved to /var/cache/conftool/dbconfig/20220421-114112-ladsgroup.json
[11:41:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:41:18] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[11:43:48] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 25%: After maintenance', diff saved to https://phabricator.wikimedia.org/P25951 and previous config saved to /var/cache/conftool/dbconfig/20220421-114347-root.json
[11:43:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:49:43] <icinga-wm>	 PROBLEM - SSH on wtp1040.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[11:56:17] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P25952 and previous config saved to /var/cache/conftool/dbconfig/20220421-115617-ladsgroup.json
[11:56:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:58:52] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 50%: After maintenance', diff saved to https://phabricator.wikimedia.org/P25953 and previous config saved to /var/cache/conftool/dbconfig/20220421-115851-root.json
[11:58:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:59:13] <jinxer-wm>	 (KubernetesRsyslogDown) firing: (2) rsyslog on kubestagemaster1001:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[11:59:15] <icinga-wm>	 RECOVERY - Check systemd state on cp2036 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[11:59:53] <wikibugs>	 SRE, Data-Engineering, Data-Engineering-Kanban, Traffic: intake-analytics is responsible for up to a 85% of varnish backend fetch errors - https://phabricator.wikimedia.org/T306181 (BTullis) My suspicion is that these workers need more CPU and/or memory. We recently doubled the number of replica...
[12:01:33] <wikibugs>	 (PS1) Jelto: site: use appserver in codfw C3, cleanup duplicate insetup definition [puppet] - https://gerrit.wikimedia.org/r/785147 (https://phabricator.wikimedia.org/T290192)
[12:06:02] <wikibugs>	 SRE, Traffic, Patch-For-Review: Upgrading Wikidough and durum VMs to bullseye - https://phabricator.wikimedia.org/T305589 (ssingh) p:Triage→Medium
[12:08:44] <wikibugs>	 (CR) Jelto: "mw2412 to mw2419 have role insetup and are not pooled. They should have a proper role assigned because racking and os installation happene" [puppet] - https://gerrit.wikimedia.org/r/785147 (https://phabricator.wikimedia.org/T290192) (owner: Jelto)
[12:10:54] <moritzm>	 !log installing subversion security updates
[12:10:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:11:22] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P25954 and previous config saved to /var/cache/conftool/dbconfig/20220421-121122-ladsgroup.json
[12:11:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:12:21] <wikibugs>	 (CR) Vivian Rook: [C: +2] Add Vivian Rook to icinga [puppet] - https://gerrit.wikimedia.org/r/784742 (owner: Vivian Rook)
[12:13:55] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 75%: After maintenance', diff saved to https://phabricator.wikimedia.org/P25955 and previous config saved to /var/cache/conftool/dbconfig/20220421-121355-root.json
[12:13:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:15:06] <wikibugs>	 (PS2) Jelto: site: use appserver in codfw C3, cleanup duplicate insetup definition [puppet] - https://gerrit.wikimedia.org/r/785147 (https://phabricator.wikimedia.org/T290192)
[12:16:35] <wikibugs>	 (PS1) Muehlenhoff: No longer install subversion on Phabricator hosts [puppet] - https://gerrit.wikimedia.org/r/785149
[12:19:53] <wikibugs>	 (PS1) Btullis: Increase the RAM request and limit for eventgate pods [deployment-charts] - https://gerrit.wikimedia.org/r/785151 (https://phabricator.wikimedia.org/T306181)
[12:20:30] <moritzm>	 !log installing openjpeg2 security updates
[12:20:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:23:07] <RhinosF1>	 Moritzm: svn is used
[12:23:09] <RhinosF1>	 https://phabricator.wikimedia.org/diffusion/query/NCbVBYAxI8aR/#R
[12:23:58] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db2105.codfw.wmnet with reason: Maintenance
[12:24:00] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2105.codfw.wmnet with reason: Maintenance
[12:24:01] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 12:00:00 on 6 hosts with reason: Maintenance
[12:24:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:24:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:24:06] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 6 hosts with reason: Maintenance
[12:24:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:24:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:24:26] <wikibugs>	 (CR) RhinosF1: "uses https://phabricator.wikimedia.org/diffusion/query/NCbVBYAxI8aR/#R" [puppet] - https://gerrit.wikimedia.org/r/785149 (owner: Muehlenhoff)
[12:24:28] <wikibugs>	 (CR) Btullis: "I'm hoping that this RAM increase will reduce the error rate affecting intake-analytics.wikimedia.org as well as reduce the apparent laten" [deployment-charts] - https://gerrit.wikimedia.org/r/785151 (https://phabricator.wikimedia.org/T306181) (owner: Btullis)
[12:24:39] <moritzm>	 RhinosF1: oh, I missed that! I'll abandon, then. thanks
[12:25:01] <moritzm>	 !log installing flac security updates
[12:25:04] <RhinosF1>	 moritzm: np
[12:25:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:26:27] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1161 (T298565)', diff saved to https://phabricator.wikimedia.org/P25956 and previous config saved to /var/cache/conftool/dbconfig/20220421-122627-ladsgroup.json
[12:26:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:26:33] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[12:27:16] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1110.eqiad.wmnet with reason: Maintenance
[12:27:18] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1110.eqiad.wmnet with reason: Maintenance
[12:27:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:27:23] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1110 (T298565)', diff saved to https://phabricator.wikimedia.org/P25957 and previous config saved to /var/cache/conftool/dbconfig/20220421-122722-ladsgroup.json
[12:27:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:27:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:28:59] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 100%: After maintenance', diff saved to https://phabricator.wikimedia.org/P25958 and previous config saved to /var/cache/conftool/dbconfig/20220421-122859-root.json
[12:29:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:30:21] <moritzm>	 !log installing fribidi security updates
[12:30:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:33:47] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1110 (T298565)', diff saved to https://phabricator.wikimedia.org/P25959 and previous config saved to /var/cache/conftool/dbconfig/20220421-123347-ladsgroup.json
[12:33:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:33:53] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[12:34:50] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.reboot-single for host thumbor2003.codfw.wmnet
[12:34:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:44:49] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thumbor2003.codfw.wmnet
[12:44:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:45:11] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.reboot-single for host thumbor2004.codfw.wmnet
[12:45:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:48:52] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P25960 and previous config saved to /var/cache/conftool/dbconfig/20220421-124852-ladsgroup.json
[12:48:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:50:38] <icinga-wm>	 RECOVERY - SSH on wtp1040.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[12:50:55] <wikibugs>	 (CR) Ayounsi: Fix up host globbing for ping servers (1 comment) [puppet] - https://gerrit.wikimedia.org/r/785125 (owner: Muehlenhoff)
[12:55:07] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thumbor2004.codfw.wmnet
[12:55:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:55:30] <wikibugs>	 (CR) Muehlenhoff: Fix up host globbing for ping servers (1 comment) [puppet] - https://gerrit.wikimedia.org/r/785125 (owner: Muehlenhoff)
[12:55:36] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.reboot-single for host thumbor2005.codfw.wmnet
[12:55:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:58:42] <wikibugs>	 (Abandoned) Muehlenhoff: No longer install subversion on Phabricator hosts [puppet] - https://gerrit.wikimedia.org/r/785149 (owner: Muehlenhoff)
[13:00:04] <jouncebot>	 RoanKattouw, Lucas_WMDE, and Urbanecm: How many deployers does it take to do UTC afternoon backport window deploy? (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220421T1300).
[13:00:04] <jouncebot>	 No Gerrit patches in the queue for this window AFAICS.
[13:00:20] <urbanecm>	 let me steal the window
[13:00:33] <wikibugs>	 (PS2) Urbanecm: plwiki: Fix cascading protection configuration [mediawiki-config] - https://gerrit.wikimedia.org/r/784619 (https://phabricator.wikimedia.org/T306300)
[13:00:38] <wikibugs>	 (CR) Urbanecm: [C: +2] plwiki: Fix cascading protection configuration [mediawiki-config] - https://gerrit.wikimedia.org/r/784619 (https://phabricator.wikimedia.org/T306300) (owner: Urbanecm)
[13:01:21] <wikibugs>	 (Merged) jenkins-bot: plwiki: Fix cascading protection configuration [mediawiki-config] - https://gerrit.wikimedia.org/r/784619 (https://phabricator.wikimedia.org/T306300) (owner: Urbanecm)
[13:02:44] <jinxer-wm>	 (KubernetesRsyslogDown) firing: (4) rsyslog on ml-staging-ctrl2001:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[13:02:47] <vgutierrez>	 !log restart ats-be and varnish-fe on cp2036 to clear restarted service alerts
[13:02:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:03:31] <logmsgbot>	 !log urbanecm@deploy1002 Synchronized wmf-config/InitialiseSettings.php: 7d5114e80567663cad7415e985fdb8191ef9d4b6: plwiki: Fix cascading protection configuration (T306300) (duration: 00m 55s)
[13:03:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:03:36] * urbanecm done
[13:03:36] <stashbot>	 T306300: Fix $wgCascadingRestrictionLevels for plwiki - https://phabricator.wikimedia.org/T306300
[13:03:50] <icinga-wm>	 RECOVERY - Varnish frontend child restarted on cp2036 is OK: (C)2 ge (W)2 ge 1 https://wikitech.wikimedia.org/wiki/Varnish https://grafana.wikimedia.org/d/000000330/varnish-machine-stats?orgId=1&viewPanel=66&var-server=cp2036&var-datasource=codfw+prometheus/ops
[13:03:57] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P25961 and previous config saved to /var/cache/conftool/dbconfig/20220421-130357-ladsgroup.json
[13:04:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:04:32] <icinga-wm>	 RECOVERY - traffic_server backend process restarted on cp2036 is OK: (C)2 ge (W)2 ge 1 https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server https://grafana.wikimedia.org/d/6uhkG6OZk/ats-instance-drilldown?orgId=1&var-site=codfw+prometheus/ops&var-instance=cp2036&var-layer=backend
[13:04:51] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thumbor2005.codfw.wmnet
[13:04:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:07:45] <wikibugs>	 (Restored) Dzahn: No longer install subversion on Phabricator hosts [puppet] - https://gerrit.wikimedia.org/r/785149 (owner: Muehlenhoff)
[13:08:47] <wikibugs>	 (CR) Dzahn: "Let me use this as a reminder to find out if we can remove both. I was currently trying to shutdown git repos on Phabricator anyways. So w" [puppet] - https://gerrit.wikimedia.org/r/785149 (owner: Muehlenhoff)
[13:08:49] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[13:08:52] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[13:08:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:08:53] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[13:08:57] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[13:08:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:09:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:09:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:09:52] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.reboot-single for host thumbor2006.codfw.wmnet
[13:09:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:11:30] <wikibugs>	 SRE, ops-eqiad, DC-Ops, Data-Engineering: Q2:(Need By: TBD) rack/setup/install an-worker11[42-48].eqiad.wmnet - https://phabricator.wikimedia.org/T293922 (cmooney) Thanks @Cmjohnson   Note the IP addresses assigned to the servers need to be updated to match those vlans.
[13:14:18] <wikibugs>	 (PS1) Jcrespo: mediabackup: Preconfigure mc client config on worker nodes [puppet] - https://gerrit.wikimedia.org/r/785156 (https://phabricator.wikimedia.org/T305446)
[13:14:35] <wikibugs>	 (PS2) Jcrespo: mediabackup: Preconfigure mc client config on worker nodes [puppet] - https://gerrit.wikimedia.org/r/785156 (https://phabricator.wikimedia.org/T305446)
[13:15:37] <wikibugs>	 (PS3) Jcrespo: mediabackup: Preconfigure mc client config on worker nodes [puppet] - https://gerrit.wikimedia.org/r/785156 (https://phabricator.wikimedia.org/T305446)
[13:17:06] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1179.eqiad.wmnet with reason: Maintenance
[13:17:08] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1179.eqiad.wmnet with reason: Maintenance
[13:17:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:17:13] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1179 (T298565)', diff saved to https://phabricator.wikimedia.org/P25962 and previous config saved to /var/cache/conftool/dbconfig/20220421-131713-ladsgroup.json
[13:17:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:17:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:17:19] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[13:19:02] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1110 (T298565)', diff saved to https://phabricator.wikimedia.org/P25963 and previous config saved to /var/cache/conftool/dbconfig/20220421-131902-ladsgroup.json
[13:19:03] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
[13:19:05] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
[13:19:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:19:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:19:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:19:59] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thumbor2006.codfw.wmnet
[13:20:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:23:19] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db2123.codfw.wmnet with reason: Maintenance
[13:23:21] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2123.codfw.wmnet with reason: Maintenance
[13:23:22] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 12:00:00 on 8 hosts with reason: Maintenance
[13:23:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:23:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:23:28] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 8 hosts with reason: Maintenance
[13:23:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:23:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:25:10] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
[13:25:11] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
[13:25:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:25:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:26:22] <wikibugs>	 SRE, SRE-tools, Infrastructure-Foundations, Spicerack, netops: Spicerack: add network devices support - https://phabricator.wikimedia.org/T306552 (Volans) Thanks for opening the task to discuss details. As the first feedback I've a primary question that is how you envision this new third way...
[13:26:32] <taavi>	 jouncebot: nowandnext
[13:26:32] <jouncebot>	 For the next 0 hour(s) and 33 minute(s): UTC afternoon backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220421T1300)
[13:26:32] <jouncebot>	 In 2 hour(s) and 33 minute(s): Puppet request window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220421T1600)
[13:27:06] <taavi>	 ah, perfect - /me deploying an interwiki cache update
[13:27:16] <wikibugs>	 (PS4) Jcrespo: mediabackup: Preconfigure mc client config on worker nodes [puppet] - https://gerrit.wikimedia.org/r/785156 (https://phabricator.wikimedia.org/T305446)
[13:27:24] <wikibugs>	 SRE, conftool: Annotate X-Analytics header with any matching actions - https://phabricator.wikimedia.org/T305582 (CDanis) p:Triage→High
[13:27:25] <wikibugs>	 (PS5) Jcrespo: mediabackup: Preconfigure mc client config on worker nodes [puppet] - https://gerrit.wikimedia.org/r/785156 (https://phabricator.wikimedia.org/T305446)
[13:27:30] <wikibugs>	 SRE, SRE-OnFire, observability, I18n: Internationalization (i18n) & localization (l10n) of www.wikimediastatus.net - https://phabricator.wikimedia.org/T305896 (CDanis) p:Triage→Medium
[13:28:33] <wikibugs>	 (PS1) Majavah: Update interwiki cache [mediawiki-config] - https://gerrit.wikimedia.org/r/785157
[13:28:46] <wikibugs>	 (CR) Majavah: [C: +2] Update interwiki cache [mediawiki-config] - https://gerrit.wikimedia.org/r/785157 (owner: Majavah)
[13:29:28] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1130.eqiad.wmnet with reason: Maintenance
[13:29:29] <wikibugs>	 (Merged) jenkins-bot: Update interwiki cache [mediawiki-config] - https://gerrit.wikimedia.org/r/785157 (owner: Majavah)
[13:29:30] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1130.eqiad.wmnet with reason: Maintenance
[13:29:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:29:35] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1130 (T298565)', diff saved to https://phabricator.wikimedia.org/P25964 and previous config saved to /var/cache/conftool/dbconfig/20220421-132935-ladsgroup.json
[13:29:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:29:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:29:41] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[13:30:54] <wikibugs>	 (CR) Jcrespo: [C: +2] mediabackup: Preconfigure mc client config on worker nodes [puppet] - https://gerrit.wikimedia.org/r/785156 (https://phabricator.wikimedia.org/T305446) (owner: Jcrespo)
[13:31:39] <logmsgbot>	 !log taavi@deploy1002 Synchronized wmf-config/interwiki.php: Config: [[gerrit:785157|Update interwiki cache]] (duration: 00m 51s)
[13:31:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:32:05] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1130 (T298565)', diff saved to https://phabricator.wikimedia.org/P25965 and previous config saved to /var/cache/conftool/dbconfig/20220421-133204-ladsgroup.json
[13:32:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:32:38] * taavi done
[13:34:14] <wikibugs>	 (PS4) Elukey: Add four new k8s worker nodes to ml-serve-eqiad [puppet] - https://gerrit.wikimedia.org/r/784701 (https://phabricator.wikimedia.org/T306545)
[13:34:17] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[13:34:20] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[13:34:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:34:21] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[13:34:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:34:25] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[13:34:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:34:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:35:31] <wikibugs>	 (CR) Ayounsi: [C: +1] Fix up host globbing for ping servers (1 comment) [puppet] - https://gerrit.wikimedia.org/r/785125 (owner: Muehlenhoff)
[13:36:27] <wikibugs>	 (CR) Elukey: [V: +1] "PCC SUCCESS (DIFF 2 NOOP 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/34945/console" [puppet] - https://gerrit.wikimedia.org/r/784701 (https://phabricator.wikimedia.org/T306545) (owner: Elukey)
[13:38:04] <wikibugs>	 (CR) Elukey: [V: +1 C: +2] Add four new k8s worker nodes to ml-serve-eqiad [puppet] - https://gerrit.wikimedia.org/r/784701 (https://phabricator.wikimedia.org/T306545) (owner: Elukey)
[13:40:08] <wikibugs>	 SRE, SRE-tools, Infrastructure-Foundations, Spicerack, netops: Spicerack: add network devices support - https://phabricator.wikimedia.org/T306552 (ayounsi) Yeah, I'm expecting Netbox to always be the source of truth so a homer run after a spicerack run would be a NOOP.  `junos-eznc` is what I...
[13:45:23] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.reboot-single for host thumbor1001.eqiad.wmnet
[13:45:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:45:52] <wikibugs>	 (PS2) Muehlenhoff: Fix up host globbing for ping servers [puppet] - https://gerrit.wikimedia.org/r/785125
[13:46:00] <jinxer-wm>	 (JobUnavailable) firing: (2) Reduced availability for job gitlab in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[13:47:55] <jinxer-wm>	 (NodeTextfileStale) firing: Stale textfile for cloudcontrol2001-dev:9100 - https://wikitech.wikimedia.org/wiki/Prometheus#Stale_file_for_node-exporter_textfile - https://grafana.wikimedia.org/d/knkl4dCWz/node-exporter-textfile - https://alerts.wikimedia.org/?q=alertname%3DNodeTextfileStale
[13:49:22] <wikibugs>	 (03PS1) 10Jcrespo: mediabackups: Fix formatting and syntax error on mc config template [puppet] - 10https://gerrit.wikimedia.org/r/785161 (https://phabricator.wikimedia.org/T305446)
[13:50:05] <wikibugs>	 (03PS2) 10Jcrespo: mediabackups: Fix formatting and syntax error on mc config template [puppet] - 10https://gerrit.wikimedia.org/r/785161 (https://phabricator.wikimedia.org/T305446)
[13:50:40] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] mediabackups: Fix formatting and syntax error on mc config template [puppet] - 10https://gerrit.wikimedia.org/r/785161 (https://phabricator.wikimedia.org/T305446) (owner: 10Jcrespo)
[13:54:40] <moritzm>	 !log powercycling thumbor1001, stuck on reboot
[13:54:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:55:43] <logmsgbot>	 !log cmjohnson@cumin1001 START - Cookbook sre.hosts.provision for host parse1019.mgmt.eqiad.wmnet with reboot policy FORCED
[13:55:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:55:46] <wikibugs>	 (03PS3) 10Jcrespo: mediabackups: Fix formatting and syntax error on mc config template [puppet] - 10https://gerrit.wikimedia.org/r/785161 (https://phabricator.wikimedia.org/T305446)
[13:55:53] <logmsgbot>	 !log cmjohnson@cumin1001 START - Cookbook sre.hosts.provision for host parse1020.mgmt.eqiad.wmnet with reboot policy FORCED
[13:55:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:56:22] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1179 (T298565)', diff saved to https://phabricator.wikimedia.org/P25966 and previous config saved to /var/cache/conftool/dbconfig/20220421-135621-ladsgroup.json
[13:56:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:56:27] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[13:58:13] <wikibugs>	 (03PS4) 10Jcrespo: mediabackups: Fix formatting and syntax error on mc config template [puppet] - 10https://gerrit.wikimedia.org/r/785161 (https://phabricator.wikimedia.org/T305446)
[13:58:24] <logmsgbot>	 !log kormat@cumin1001 START - Cookbook sre.hosts.downtime for 1:30:00 on db1120.eqiad.wmnet with reason: Rebooting for T303174
[13:58:26] <logmsgbot>	 !log kormat@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db1120.eqiad.wmnet with reason: Rebooting for T303174
[13:58:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:58:31] <logmsgbot>	 !log kormat@cumin1001 dbctl commit (dc=all): 'db1120 depooling: Rebooting for T303174', diff saved to https://phabricator.wikimedia.org/P25967 and previous config saved to /var/cache/conftool/dbconfig/20220421-135831-kormat.json
[13:58:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:58:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:58:53] <wikibugs>	 (03PS1) 10Gergő Tisza: [beta] WelcomeSurveyExperimentalGroups: Use enwiki instead of eswiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/785165 (https://phabricator.wikimedia.org/T303240)
[13:59:15] <wikibugs>	 (03PS2) 10Gergő Tisza: [beta] WelcomeSurveyExperimentalGroups: Use enwiki instead of eswiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/785165 (https://phabricator.wikimedia.org/T303240)
[13:59:39] <wikibugs>	 (03CR) 10Jcrespo: [C: 03+2] mediabackups: Fix formatting and syntax error on mc config template [puppet] - 10https://gerrit.wikimedia.org/r/785161 (https://phabricator.wikimedia.org/T305446) (owner: 10Jcrespo)
[13:59:49] <wikibugs>	 (03CR) 10Andrew Bogott: [V: 03+2 C: 03+2] add dummy password for cloudinfra token validator [labs/private] - 10https://gerrit.wikimedia.org/r/785128 (https://phabricator.wikimedia.org/T274666) (owner: 10Majavah)
[14:02:33] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thumbor1001.eqiad.wmnet
[14:02:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:02:54] <wikibugs>	 (03CR) 10Gergő Tisza: [C: 03+2] [beta] WelcomeSurveyExperimentalGroups: Use enwiki instead of eswiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/785165 (https://phabricator.wikimedia.org/T303240) (owner: 10Gergő Tisza)
[14:03:02] <logmsgbot>	 !log kormat@cumin1001 START - Cookbook sre.hosts.downtime for 1:30:00 on db1152.eqiad.wmnet with reason: Rebooting for T303174
[14:03:04] <logmsgbot>	 !log kormat@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db1152.eqiad.wmnet with reason: Rebooting for T303174
[14:03:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:03:09] <logmsgbot>	 !log kormat@cumin1001 dbctl commit (dc=all): 'db1152 depooling: Rebooting for T303174', diff saved to https://phabricator.wikimedia.org/P25968 and previous config saved to /var/cache/conftool/dbconfig/20220421-140309-kormat.json
[14:03:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:03:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:03:33] <wikibugs>	 (03Merged) 10jenkins-bot: [beta] WelcomeSurveyExperimentalGroups: Use enwiki instead of eswiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/785165 (https://phabricator.wikimedia.org/T303240) (owner: 10Gergő Tisza)
[14:05:30] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.reboot-single for host thumbor1002.eqiad.wmnet
[14:05:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:07:20] <logmsgbot>	 !log kormat@cumin1001 dbctl commit (dc=all): 'db1152 (re)pooling @ 25%: Reboot T303174', diff saved to https://phabricator.wikimedia.org/P25969 and previous config saved to /var/cache/conftool/dbconfig/20220421-140719-kormat.json
[14:07:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:09:40] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[14:09:43] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[14:09:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:09:44] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[14:09:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:09:48] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[14:09:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:09:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:10:51] <logmsgbot>	 !log cmjohnson@cumin1001 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host parse1020.mgmt.eqiad.wmnet with reboot policy FORCED
[14:10:53] <logmsgbot>	 !log cmjohnson@cumin1001 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host parse1019.mgmt.eqiad.wmnet with reboot policy FORCED
[14:10:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:10:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:11:27] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P25971 and previous config saved to /var/cache/conftool/dbconfig/20220421-141126-ladsgroup.json
[14:11:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:12:14] <icinga-wm>	 PROBLEM - Host ml-serve1006 is DOWN: PING CRITICAL - Packet loss = 100%
[14:12:18] <icinga-wm>	 PROBLEM - Host ml-serve1007 is DOWN: PING CRITICAL - Packet loss = 100%
[14:12:27] <elukey>	 this is me, downtime expired, new nodes --^
[14:13:42] <icinga-wm>	 RECOVERY - Host ml-serve1006 is UP: PING OK - Packet loss = 0%, RTA = 0.33 ms
[14:13:58] <icinga-wm>	 PROBLEM - Host ml-serve1008 is DOWN: PING CRITICAL - Packet loss = 100%
[14:15:16] <icinga-wm>	 RECOVERY - Host ml-serve1007 is UP: PING OK - Packet loss = 0%, RTA = 0.32 ms
[14:15:26] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thumbor1002.eqiad.wmnet
[14:15:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:15:38] <icinga-wm>	 RECOVERY - Host ml-serve1008 is UP: PING OK - Packet loss = 0%, RTA = 0.33 ms
[14:16:21] <wikibugs>	 10SRE, 10Infrastructure-Foundations, 10Mail: MX: increasing disk space - https://phabricator.wikimedia.org/T305567 (10jhathaway) p:05Triage→03Medium
[14:16:26] <wikibugs>	 (03PS3) 10Jcrespo: Revert "dumps: Block python requests UA" [puppet] - 10https://gerrit.wikimedia.org/r/784715
[14:16:28] <wikibugs>	 (03PS1) 10Jcrespo: mediabackup: Hide diffs from mc config file [puppet] - 10https://gerrit.wikimedia.org/r/785166 (https://phabricator.wikimedia.org/T305446)
[14:16:37] <wikibugs>	 10SRE, 10Infrastructure-Foundations, 10Mail: MX: increasing disk space - https://phabricator.wikimedia.org/T305567 (10jhathaway) a:03jhathaway
[14:16:52] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.reboot-single for host thumbor1005.eqiad.wmnet
[14:16:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:17:21] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1144.eqiad.wmnet with reason: Maintenance
[14:17:22] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1144.eqiad.wmnet with reason: Maintenance
[14:17:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:17:27] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1144:3315 (T298565)', diff saved to https://phabricator.wikimedia.org/P25972 and previous config saved to /var/cache/conftool/dbconfig/20220421-141727-ladsgroup.json
[14:17:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:17:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:17:33] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[14:18:10] <wikibugs>	 (03CR) 10JHathaway: [C: 03+1] Revert "dumps: Block python requests UA" [puppet] - 10https://gerrit.wikimedia.org/r/784715 (owner: 10Jcrespo)
[14:18:48] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase-dev1004 is CRITICAL: /en.wikipedia.org/v1/feed/announcements (Retrieve announcements) is CRITICAL: Test Retrieve announcements returned the unexpected status 503 (expecting: 200) https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[14:19:26] <wikibugs>	 (03CR) 10Jcrespo: [C: 03+2] mediabackup: Hide diffs from mc config file [puppet] - 10https://gerrit.wikimedia.org/r/785166 (https://phabricator.wikimedia.org/T305446) (owner: 10Jcrespo)
[14:19:32] <wikibugs>	 (03PS2) 10Jcrespo: mediabackup: Hide diffs from mc config file [puppet] - 10https://gerrit.wikimedia.org/r/785166 (https://phabricator.wikimedia.org/T305446)
[14:20:44] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase-dev1004 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[14:22:23] <logmsgbot>	 !log kormat@cumin1001 dbctl commit (dc=all): 'db1152 (re)pooling @ 50%: Reboot T303174', diff saved to https://phabricator.wikimedia.org/P25973 and previous config saved to /var/cache/conftool/dbconfig/20220421-142223-kormat.json
[14:22:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:22:41] <wikibugs>	 (03PS1) 10Huji: Re-enable article editing by anonymous users on fawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/784718
[14:22:52] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Re-enable article editing by anonymous users on fawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/784718 (owner: 10Huji)
[14:24:14] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T298565)', diff saved to https://phabricator.wikimedia.org/P25974 and previous config saved to /var/cache/conftool/dbconfig/20220421-142413-ladsgroup.json
[14:24:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:24:19] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[14:25:46] <logmsgbot>	 !log kormat@cumin1001 START - Cookbook sre.hosts.downtime for 1:30:00 on db1117.eqiad.wmnet with reason: Rebooting for T303174
[14:25:48] <logmsgbot>	 !log kormat@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db1117.eqiad.wmnet with reason: Rebooting for T303174
[14:25:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:25:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:26:16] <wikibugs>	 (03PS2) 10Huji: Re-enable article editing by anonymous users on fawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/784718 (https://phabricator.wikimedia.org/T292781)
[14:26:20] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thumbor1005.eqiad.wmnet
[14:26:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:26:32] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P25975 and previous config saved to /var/cache/conftool/dbconfig/20220421-142631-ladsgroup.json
[14:26:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:26:40] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.reboot-single for host thumbor1006.eqiad.wmnet
[14:26:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:27:55] <icinga-wm>	 ACKNOWLEDGEMENT - haproxy failover on dbproxy1014 is CRITICAL: CRITICAL check_failover servers up 1 down 1: Kormat Rebooting db1117 https://wikitech.wikimedia.org/wiki/HAProxy
[14:27:55] <icinga-wm>	 ACKNOWLEDGEMENT - haproxy failover on dbproxy1015 is CRITICAL: CRITICAL check_failover servers up 1 down 1: Kormat Rebooting db1117 https://wikitech.wikimedia.org/wiki/HAProxy
[14:27:55] <icinga-wm>	 ACKNOWLEDGEMENT - haproxy failover on dbproxy1016 is CRITICAL: CRITICAL check_failover servers up 1 down 1: Kormat Rebooting db1117 https://wikitech.wikimedia.org/wiki/HAProxy
[14:28:19] <wikibugs>	 (03CR) 10Ottomata: "Hmm, in this case, would it not be better to increase these values in the helmfile service values files, rather than the chart defaults?" [deployment-charts] - 10https://gerrit.wikimedia.org/r/785151 (https://phabricator.wikimedia.org/T306181) (owner: 10Btullis)
[14:32:55] <jinxer-wm>	 (NodeTextfileStale) firing: (3) Stale textfile for elastic1075:9100 - https://wikitech.wikimedia.org/wiki/Prometheus#Stale_file_for_node-exporter_textfile - https://grafana.wikimedia.org/d/knkl4dCWz/node-exporter-textfile - https://alerts.wikimedia.org/?q=alertname%3DNodeTextfileStale
[14:33:55] <urbanecm>	 jouncebot: nowandnext
[14:33:55] <jouncebot>	 No deployments scheduled for the next 1 hour(s) and 26 minute(s)
[14:33:56] <jouncebot>	 In 1 hour(s) and 26 minute(s): Puppet request window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220421T1600)
[14:34:10] <Amir1>	 urbanecm: gonna deploy it?
[14:34:13] <Amir1>	 I can deploy it
[14:34:21] <urbanecm>	 Amir1: yeah, the fawiki revert
[14:34:30] <wikibugs>	 (03CR) 10Ladsgroup: [C: 03+2] Re-enable article editing by anonymous users on fawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/784718 (https://phabricator.wikimedia.org/T292781) (owner: 10Huji)
[14:34:36] <Amir1>	 I'll do it, don't worry
[14:34:39] <urbanecm>	 thanks
[14:35:13] <wikibugs>	 (03Merged) 10jenkins-bot: Re-enable article editing by anonymous users on fawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/784718 (https://phabricator.wikimedia.org/T292781) (owner: 10Huji)
[14:35:24] <wikibugs>	 (03CR) 10Btullis: Increase the RAM request and limit for eventgate pods (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/785151 (https://phabricator.wikimedia.org/T306181) (owner: 10Btullis)
[14:36:58] <wikibugs>	 (03CR) 10Dave Pifke: [C: 03+1] Extend Ferm rules for new webperf hosts [puppet] - 10https://gerrit.wikimedia.org/r/785117 (https://phabricator.wikimedia.org/T305460) (owner: 10Muehlenhoff)
[14:36:59] <logmsgbot>	 !log ladsgroup@deploy1002 Synchronized wmf-config: Config: [[gerrit:784718|Re-enable article editing by anonymous users on fawiki (T292781)]] (duration: 00m 51s)
[14:37:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:37:06] <stashbot>	 T292781: Measure impact of requiring login to edit articles on Persian Wikipedia - https://phabricator.wikimedia.org/T292781
[14:37:18] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thumbor1006.eqiad.wmnet
[14:37:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:37:27] <logmsgbot>	 !log kormat@cumin1001 dbctl commit (dc=all): 'db1152 (re)pooling @ 75%: Reboot T303174', diff saved to https://phabricator.wikimedia.org/P25976 and previous config saved to /var/cache/conftool/dbconfig/20220421-143727-kormat.json
[14:37:30] <wikibugs>	 (03PS2) 10Btullis: Increase the RAM request and limit for eventgate pods [deployment-charts] - 10https://gerrit.wikimedia.org/r/785151 (https://phabricator.wikimedia.org/T306181)
[14:37:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:37:32] <wikibugs>	 (03CR) 10Dave Pifke: [C: 03+1] Apply role::webperf::processors_and_site to webperf1003/2003 [puppet] - 10https://gerrit.wikimedia.org/r/785115 (https://phabricator.wikimedia.org/T305460) (owner: 10Muehlenhoff)
[14:37:40] * Amir1 afks
[14:38:12] <wikibugs>	 (03CR) 10Dave Pifke: [C: 03+1] Switch webperf1001/1003 for eventual removal [puppet] - 10https://gerrit.wikimedia.org/r/785116 (https://phabricator.wikimedia.org/T205460) (owner: 10Muehlenhoff)
[14:39:19] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P25977 and previous config saved to /var/cache/conftool/dbconfig/20220421-143918-ladsgroup.json
[14:39:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:40:06] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[14:40:09] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[14:40:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:40:10] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[14:40:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:40:14] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[14:40:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:40:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:41:37] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1179 (T298565)', diff saved to https://phabricator.wikimedia.org/P25978 and previous config saved to /var/cache/conftool/dbconfig/20220421-144137-ladsgroup.json
[14:41:38] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1175.eqiad.wmnet with reason: Maintenance
[14:41:40] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1175.eqiad.wmnet with reason: Maintenance
[14:41:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:41:42] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[14:41:45] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1175 (T298565)', diff saved to https://phabricator.wikimedia.org/P25979 and previous config saved to /var/cache/conftool/dbconfig/20220421-144145-ladsgroup.json
[14:41:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:41:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:41:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:43:03] <wikibugs>	 (03CR) 10Ottomata: [C: 03+1] "You could do eventgate-logging-external as well if you like.  Either way!" [deployment-charts] - 10https://gerrit.wikimedia.org/r/785151 (https://phabricator.wikimedia.org/T306181) (owner: 10Btullis)
[14:48:30] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+2] Fix up host globbing for ping servers [puppet] - 10https://gerrit.wikimedia.org/r/785125 (owner: 10Muehlenhoff)
[14:52:12] <wikibugs>	 10SRE, 10Data-Engineering, 10Data-Engineering-Kanban, 10Traffic, 10Patch-For-Review: intake-analytics is responsible for up to a 85% of varnish backend fetch errors - https://phabricator.wikimedia.org/T306181 (10BTullis) At the moment we are getting between ~30 and ~60 requests receiving 503 responses pe...
[14:52:31] <logmsgbot>	 !log kormat@cumin1001 dbctl commit (dc=all): 'db1152 (re)pooling @ 100%: Reboot T303174', diff saved to https://phabricator.wikimedia.org/P25980 and previous config saved to /var/cache/conftool/dbconfig/20220421-145231-kormat.json
[14:52:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:52:57] <logmsgbot>	 !log kormat@cumin1001 START - Cookbook sre.hosts.downtime for 1:30:00 on db1153.eqiad.wmnet with reason: Rebooting for T303174
[14:52:58] <logmsgbot>	 !log kormat@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db1153.eqiad.wmnet with reason: Rebooting for T303174
[14:53:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:53:04] <logmsgbot>	 !log kormat@cumin1001 dbctl commit (dc=all): 'db1153 depooling: Rebooting for T303174', diff saved to https://phabricator.wikimedia.org/P25981 and previous config saved to /var/cache/conftool/dbconfig/20220421-145303-kormat.json
[14:53:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:53:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:54:24] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P25982 and previous config saved to /var/cache/conftool/dbconfig/20220421-145424-ladsgroup.json
[14:54:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:55:11] <wikibugs>	 (03CR) 10Ssingh: P:wikidough: add a check to ensure service has been restarted (0310 comments) [puppet] - 10https://gerrit.wikimedia.org/r/784697 (owner: 10Ssingh)
[14:56:41] <wikibugs>	 (03PS4) 10Ssingh: P:wikidough: add a check to ensure service has been restarted [puppet] - 10https://gerrit.wikimedia.org/r/784697
[14:57:59] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1175 (T298565)', diff saved to https://phabricator.wikimedia.org/P25983 and previous config saved to /var/cache/conftool/dbconfig/20220421-145758-ladsgroup.json
[14:58:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:58:05] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[14:59:15] <logmsgbot>	 !log kormat@cumin1001 dbctl commit (dc=all): 'db1153 (re)pooling @ 25%: Reboot T303174', diff saved to https://phabricator.wikimedia.org/P25984 and previous config saved to /var/cache/conftool/dbconfig/20220421-145914-kormat.json
[14:59:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:01:55] <jinxer-wm>	 (NodeTextfileStale) firing: Stale textfile for ms-be2067:9100 - https://wikitech.wikimedia.org/wiki/Prometheus#Stale_file_for_node-exporter_textfile - https://grafana.wikimedia.org/d/knkl4dCWz/node-exporter-textfile - https://alerts.wikimedia.org/?q=alertname%3DNodeTextfileStale
[15:08:13] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+2] Extend Ferm rules for new webperf hosts [puppet] - 10https://gerrit.wikimedia.org/r/785117 (https://phabricator.wikimedia.org/T305460) (owner: 10Muehlenhoff)
[15:09:18] <icinga-wm>	 PROBLEM - Check systemd state on snapshot1008 is CRITICAL: CRITICAL - degraded: The following units failed: cirrussearch-dump.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[15:09:29] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T298565)', diff saved to https://phabricator.wikimedia.org/P25985 and previous config saved to /var/cache/conftool/dbconfig/20220421-150929-ladsgroup.json
[15:09:31] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1096.eqiad.wmnet with reason: Maintenance
[15:09:32] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1096.eqiad.wmnet with reason: Maintenance
[15:09:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:09:35] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[15:09:37] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1096:3315 (T298565)', diff saved to https://phabricator.wikimedia.org/P25986 and previous config saved to /var/cache/conftool/dbconfig/20220421-150937-ladsgroup.json
[15:09:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:09:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:09:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:09:54] <logmsgbot>	 !log cmjohnson@cumin1001 START - Cookbook sre.hosts.reimage for host an-worker1143.eqiad.wmnet with OS buster
[15:09:54] <logmsgbot>	 !log cmjohnson@cumin1001 START - Cookbook sre.hosts.reimage for host an-worker1143.eqiad.wmnet with OS buster
[15:09:54] <logmsgbot>	 !log cmjohnson@cumin1001 START - Cookbook sre.hosts.reimage for host an-worker1143.eqiad.wmnet with OS buster
[15:09:54] <logmsgbot>	 !log cmjohnson@cumin1001 START - Cookbook sre.hosts.reimage for host an-worker1143.eqiad.wmnet with OS buster
[15:09:54] <logmsgbot>	 !log cmjohnson@cumin1001 START - Cookbook sre.hosts.reimage for host an-worker1143.eqiad.wmnet with OS buster
[15:09:54] <logmsgbot>	 !log cmjohnson@cumin1001 START - Cookbook sre.hosts.reimage for host an-worker1143.eqiad.wmnet with OS buster
[15:09:55] <logmsgbot>	 !log cmjohnson@cumin1001 START - Cookbook sre.hosts.reimage for host an-worker1143.eqiad.wmnet with OS buster
[15:09:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:10:00] <wikibugs>	 10SRE, 10ops-eqiad, 10DC-Ops, 10Data-Engineering: Q2:(Need By: TBD) rack/setup/install an-worker11[42-48].eqiad.wmnet - https://phabricator.wikimedia.org/T293922 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cmjohnson@cumin1001 for host an-worker1143.eqiad.wmnet with OS buster
[15:10:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:10:04] <logmsgbot>	 !log cmjohnson@cumin1001 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1143.eqiad.wmnet with OS buster
[15:10:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:10:05] <wikibugs>	 10SRE, 10ops-eqiad, 10DC-Ops, 10Data-Engineering: Q2:(Need By: TBD) rack/setup/install an-worker11[42-48].eqiad.wmnet - https://phabricator.wikimedia.org/T293922 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cmjohnson@cumin1001 for host an-worker1143.eqiad.wmnet with OS buster
[15:10:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:10:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:10:11] <wikibugs>	 10SRE, 10ops-eqiad, 10DC-Ops, 10Data-Engineering: Q2:(Need By: TBD) rack/setup/install an-worker11[42-48].eqiad.wmnet - https://phabricator.wikimedia.org/T293922 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cmjohnson@cumin1001 for host an-worker1143.eqiad.wmnet with OS buster
[15:10:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:10:17] <wikibugs>	 10SRE, 10ops-eqiad, 10DC-Ops, 10Data-Engineering: Q2:(Need By: TBD) rack/setup/install an-worker11[42-48].eqiad.wmnet - https://phabricator.wikimedia.org/T293922 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cmjohnson@cumin1001 for host an-worker1143.eqiad.wmnet with OS buster
[15:10:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:10:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:10:23] <wikibugs>	 10SRE, 10ops-eqiad, 10DC-Ops, 10Data-Engineering: Q2:(Need By: TBD) rack/setup/install an-worker11[42-48].eqiad.wmnet - https://phabricator.wikimedia.org/T293922 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cmjohnson@cumin1001 for host an-worker1143.eqiad.wmnet with OS buster
[15:10:25] <logmsgbot>	 !log cmjohnson@cumin1001 END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-worker1143.eqiad.wmnet with OS buster
[15:10:26] <logmsgbot>	 !log cmjohnson@cumin1001 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1143.eqiad.wmnet with OS buster
[15:10:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:10:29] <logmsgbot>	 !log cmjohnson@cumin1001 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1143.eqiad.wmnet with OS buster
[15:10:29] <wikibugs>	 SRE, ops-eqiad, DC-Ops, Data-Engineering: Q2:(Need By: TBD) rack/setup/install an-worker11[42-48].eqiad.wmnet - https://phabricator.wikimedia.org/T293922 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cmjohnson@cumin1001 for host an-worker1143.eqiad.wmnet with OS buster
[15:10:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:10:34] <logmsgbot>	 !log cmjohnson@cumin1001 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1143.eqiad.wmnet with OS buster
[15:10:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:10:36] <wikibugs>	 SRE, ops-eqiad, DC-Ops, Data-Engineering: Q2:(Need By: TBD) rack/setup/install an-worker11[42-48].eqiad.wmnet - https://phabricator.wikimedia.org/T293922 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cmjohnson@cumin1001 for host an-worker1143.eqiad.wmnet with OS buster
[15:10:38] <logmsgbot>	 !log cmjohnson@cumin1001 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1143.eqiad.wmnet with OS buster
[15:10:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:10:41] <logmsgbot>	 !log cmjohnson@cumin1001 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1143.eqiad.wmnet with OS buster
[15:10:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:10:46] <wikibugs>	 SRE, ops-eqiad, DC-Ops, Data-Engineering: Q2:(Need By: TBD) rack/setup/install an-worker11[42-48].eqiad.wmnet - https://phabricator.wikimedia.org/T293922 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cmjohnson@cumin1001 for host an-worker1143.eqiad.wmnet with OS buster exec...
[15:10:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:10:52] <wikibugs>	 SRE, ops-eqiad, DC-Ops, Data-Engineering: Q2:(Need By: TBD) rack/setup/install an-worker11[42-48].eqiad.wmnet - https://phabricator.wikimedia.org/T293922 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cmjohnson@cumin1001 for host an-worker1143.eqiad.wmnet with OS buster exec...
[15:11:02] <wikibugs>	 SRE, ops-eqiad, DC-Ops, Data-Engineering: Q2:(Need By: TBD) rack/setup/install an-worker11[42-48].eqiad.wmnet - https://phabricator.wikimedia.org/T293922 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cmjohnson@cumin1001 for host an-worker1143.eqiad.wmnet with OS buster exec...
[15:11:08] <wikibugs>	 SRE, ops-eqiad, DC-Ops, Data-Engineering: Q2:(Need By: TBD) rack/setup/install an-worker11[42-48].eqiad.wmnet - https://phabricator.wikimedia.org/T293922 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cmjohnson@cumin1001 for host an-worker1143.eqiad.wmnet with OS buster exec...
[15:11:14] <wikibugs>	 SRE, ops-eqiad, DC-Ops, Data-Engineering: Q2:(Need By: TBD) rack/setup/install an-worker11[42-48].eqiad.wmnet - https://phabricator.wikimedia.org/T293922 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cmjohnson@cumin1001 for host an-worker1143.eqiad.wmnet with OS buster exec...
[15:11:20] <wikibugs>	 SRE, ops-eqiad, DC-Ops, Data-Engineering: Q2:(Need By: TBD) rack/setup/install an-worker11[42-48].eqiad.wmnet - https://phabricator.wikimedia.org/T293922 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cmjohnson@cumin1001 for host an-worker1143.eqiad.wmnet with OS buster exec...
[15:11:26] <wikibugs>	 SRE, ops-eqiad, DC-Ops, Data-Engineering: Q2:(Need By: TBD) rack/setup/install an-worker11[42-48].eqiad.wmnet - https://phabricator.wikimedia.org/T293922 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cmjohnson@cumin1001 for host an-worker1143.eqiad.wmnet with OS buster exec...
[15:11:30] <logmsgbot>	 !log cmjohnson@cumin1001 START - Cookbook sre.hosts.reimage for host an-worker1145.eqiad.wmnet with OS buster
[15:11:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:11:36] <wikibugs>	 SRE, ops-eqiad, DC-Ops, Data-Engineering: Q2:(Need By: TBD) rack/setup/install an-worker11[42-48].eqiad.wmnet - https://phabricator.wikimedia.org/T293922 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cmjohnson@cumin1001 for host an-worker1145.eqiad.wmnet with OS buster
[15:11:39] <logmsgbot>	 !log cmjohnson@cumin1001 START - Cookbook sre.hosts.reimage for host an-worker1143.eqiad.wmnet with OS buster
[15:11:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:11:44] <wikibugs>	 SRE, ops-eqiad, DC-Ops, Data-Engineering: Q2:(Need By: TBD) rack/setup/install an-worker11[42-48].eqiad.wmnet - https://phabricator.wikimedia.org/T293922 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cmjohnson@cumin1001 for host an-worker1143.eqiad.wmnet with OS buster
[15:11:49] <logmsgbot>	 !log cmjohnson@cumin1001 START - Cookbook sre.hosts.reimage for host an-worker1144.eqiad.wmnet with OS buster
[15:11:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:11:55] <wikibugs>	 SRE, ops-eqiad, DC-Ops, Data-Engineering: Q2:(Need By: TBD) rack/setup/install an-worker11[42-48].eqiad.wmnet - https://phabricator.wikimedia.org/T293922 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cmjohnson@cumin1001 for host an-worker1144.eqiad.wmnet with OS buster
[15:12:01] <logmsgbot>	 !log cmjohnson@cumin1001 START - Cookbook sre.hosts.reimage for host an-worker1146.eqiad.wmnet with OS buster
[15:12:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:12:06] <wikibugs>	 SRE, ops-eqiad, DC-Ops, Data-Engineering: Q2:(Need By: TBD) rack/setup/install an-worker11[42-48].eqiad.wmnet - https://phabricator.wikimedia.org/T293922 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cmjohnson@cumin1001 for host an-worker1146.eqiad.wmnet with OS buster
[15:12:27] <logmsgbot>	 !log cmjohnson@cumin1001 END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-worker1146.eqiad.wmnet with OS buster
[15:12:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:12:32] <wikibugs>	 SRE, ops-eqiad, DC-Ops, Data-Engineering: Q2:(Need By: TBD) rack/setup/install an-worker11[42-48].eqiad.wmnet - https://phabricator.wikimedia.org/T293922 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cmjohnson@cumin1001 for host an-worker1146.eqiad.wmnet with OS buster exec...
[15:12:37] <logmsgbot>	 !log cmjohnson@cumin1001 END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-worker1145.eqiad.wmnet with OS buster
[15:12:40] <logmsgbot>	 !log cmjohnson@cumin1001 END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-worker1144.eqiad.wmnet with OS buster
[15:12:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:12:41] <wikibugs>	 SRE, ops-eqiad, DC-Ops, Data-Engineering: Q2:(Need By: TBD) rack/setup/install an-worker11[42-48].eqiad.wmnet - https://phabricator.wikimedia.org/T293922 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cmjohnson@cumin1001 for host an-worker1145.eqiad.wmnet with OS buster exec...
[15:12:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:12:45] <wikibugs>	 SRE, ops-eqiad, DC-Ops, Data-Engineering: Q2:(Need By: TBD) rack/setup/install an-worker11[42-48].eqiad.wmnet - https://phabricator.wikimedia.org/T293922 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cmjohnson@cumin1001 for host an-worker1144.eqiad.wmnet with OS buster exec...
[15:13:04] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P25987 and previous config saved to /var/cache/conftool/dbconfig/20220421-151303-ladsgroup.json
[15:13:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:13:27] <logmsgbot>	 !log cmjohnson@cumin1001 START - Cookbook sre.hosts.reimage for host an-worker1144.eqiad.wmnet with OS buster
[15:13:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:13:33] <wikibugs>	 SRE, ops-eqiad, DC-Ops, Data-Engineering: Q2:(Need By: TBD) rack/setup/install an-worker11[42-48].eqiad.wmnet - https://phabricator.wikimedia.org/T293922 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cmjohnson@cumin1001 for host an-worker1144.eqiad.wmnet with OS buster
[15:13:46] <logmsgbot>	 !log cmjohnson@cumin1001 START - Cookbook sre.hosts.reimage for host an-worker1145.eqiad.wmnet with OS buster
[15:13:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:13:52] <wikibugs>	 SRE, ops-eqiad, DC-Ops, Data-Engineering: Q2:(Need By: TBD) rack/setup/install an-worker11[42-48].eqiad.wmnet - https://phabricator.wikimedia.org/T293922 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cmjohnson@cumin1001 for host an-worker1145.eqiad.wmnet with OS buster
[15:14:13] <logmsgbot>	 !log cmjohnson@cumin1001 START - Cookbook sre.hosts.reimage for host an-worker1146.eqiad.wmnet with OS buster
[15:14:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:14:19] <logmsgbot>	 !log kormat@cumin1001 dbctl commit (dc=all): 'db1153 (re)pooling @ 50%: Reboot T303174', diff saved to https://phabricator.wikimedia.org/P25988 and previous config saved to /var/cache/conftool/dbconfig/20220421-151418-kormat.json
[15:14:19] <wikibugs>	 SRE, ops-eqiad, DC-Ops, Data-Engineering: Q2:(Need By: TBD) rack/setup/install an-worker11[42-48].eqiad.wmnet - https://phabricator.wikimedia.org/T293922 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cmjohnson@cumin1001 for host an-worker1146.eqiad.wmnet with OS buster
[15:14:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:16:10] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 (T298565)', diff saved to https://phabricator.wikimedia.org/P25989 and previous config saved to /var/cache/conftool/dbconfig/20220421-151610-ladsgroup.json
[15:16:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:16:16] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[15:20:54] <icinga-wm>	 PROBLEM - Query Service HTTP Port on wdqs1007 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 298 bytes in 0.002 second response time https://wikitech.wikimedia.org/wiki/Wikidata_query_service
[15:28:09] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P25990 and previous config saved to /var/cache/conftool/dbconfig/20220421-152809-ladsgroup.json
[15:28:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:29:22] <logmsgbot>	 !log kormat@cumin1001 dbctl commit (dc=all): 'db1153 (re)pooling @ 75%: Reboot T303174', diff saved to https://phabricator.wikimedia.org/P25991 and previous config saved to /var/cache/conftool/dbconfig/20220421-152922-kormat.json
[15:29:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:30:32] <wikibugs>	 (CR) Btullis: [C: +2] Increase the RAM request and limit for eventgate pods [deployment-charts] - https://gerrit.wikimedia.org/r/785151 (https://phabricator.wikimedia.org/T306181) (owner: Btullis)
[15:31:15] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P25992 and previous config saved to /var/cache/conftool/dbconfig/20220421-153115-ladsgroup.json
[15:31:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:33:51] <logmsgbot>	 !log btullis@deploy1002 helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply
[15:33:53] <logmsgbot>	 !log btullis@deploy1002 helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply
[15:33:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:33:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:35:01] <wikibugs>	 (Merged) jenkins-bot: Increase the RAM request and limit for eventgate pods [deployment-charts] - https://gerrit.wikimedia.org/r/785151 (https://phabricator.wikimedia.org/T306181) (owner: Btullis)
[15:36:03] <logmsgbot>	 !log btullis@deploy1002 helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply
[15:36:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:36:37] <logmsgbot>	 !log btullis@deploy1002 helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply
[15:36:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:37:36] <wikibugs>	 SRE, ops-eqiad: eqiad: Move links to new MPC7E linecard - https://phabricator.wikimedia.org/T304712 (Papaul)
[15:37:42] <logmsgbot>	 !log btullis@deploy1002 helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply
[15:37:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:38:40] <logmsgbot>	 !log btullis@deploy1002 helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply
[15:38:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:39:49] <logmsgbot>	 !log btullis@deploy1002 helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
[15:39:50] <logmsgbot>	 !log cmjohnson@cumin1001 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1143.eqiad.wmnet with OS buster
[15:39:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:39:55] <wikibugs>	 SRE, ops-eqiad, DC-Ops, Data-Engineering: Q2:(Need By: TBD) rack/setup/install an-worker11[42-48].eqiad.wmnet - https://phabricator.wikimedia.org/T293922 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cmjohnson@cumin1001 for host an-worker1143.eqiad.wmnet with OS buster exec...
[15:39:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:40:35] <logmsgbot>	 !log btullis@deploy1002 helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
[15:40:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:41:38] <logmsgbot>	 !log cmjohnson@cumin1001 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1144.eqiad.wmnet with OS buster
[15:41:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:41:43] <wikibugs>	 SRE, ops-eqiad, DC-Ops, Data-Engineering: Q2:(Need By: TBD) rack/setup/install an-worker11[42-48].eqiad.wmnet - https://phabricator.wikimedia.org/T293922 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cmjohnson@cumin1001 for host an-worker1144.eqiad.wmnet with OS buster exec...
[15:41:57] <logmsgbot>	 !log cmjohnson@cumin1001 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1145.eqiad.wmnet with OS buster
[15:42:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:42:02] <wikibugs>	 SRE, ops-eqiad, DC-Ops, Data-Engineering: Q2:(Need By: TBD) rack/setup/install an-worker11[42-48].eqiad.wmnet - https://phabricator.wikimedia.org/T293922 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cmjohnson@cumin1001 for host an-worker1145.eqiad.wmnet with OS buster exec...
[15:42:24] <logmsgbot>	 !log cmjohnson@cumin1001 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1146.eqiad.wmnet with OS buster
[15:42:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:42:30] <wikibugs>	 SRE, ops-eqiad, DC-Ops, Data-Engineering: Q2:(Need By: TBD) rack/setup/install an-worker11[42-48].eqiad.wmnet - https://phabricator.wikimedia.org/T293922 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cmjohnson@cumin1001 for host an-worker1146.eqiad.wmnet with OS buster exec...
[15:42:37] <wikibugs>	 SRE, Data-Engineering, Data-Engineering-Kanban, Traffic, Patch-For-Review: intake-analytics is responsible for up to a 85% of varnish backend fetch errors - https://phabricator.wikimedia.org/T306181 (BTullis) I have increased the amount of RAM available to the eventgate-analytics-external dep...
[15:43:14] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1175 (T298565)', diff saved to https://phabricator.wikimedia.org/P25993 and previous config saved to /var/cache/conftool/dbconfig/20220421-154314-ladsgroup.json
[15:43:15] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
[15:43:17] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
[15:43:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:43:19] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[15:43:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:43:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:44:26] <logmsgbot>	 !log kormat@cumin1001 dbctl commit (dc=all): 'db1153 (re)pooling @ 100%: Reboot T303174', diff saved to https://phabricator.wikimedia.org/P25994 and previous config saved to /var/cache/conftool/dbconfig/20220421-154426-kormat.json
[15:44:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:46:20] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P25995 and previous config saved to /var/cache/conftool/dbconfig/20220421-154620-ladsgroup.json
[15:46:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:48:05] <wikibugs>	 SRE, ops-eqiad: eqiad: Move links to new MPC7E linecard - https://phabricator.wikimedia.org/T304712 (Papaul)
[15:49:34] <wikibugs>	 SRE, ops-eqiad: eqiad: Move links to new MPC7E linecard - https://phabricator.wikimedia.org/T304712 (Papaul)
[15:52:53] <wikibugs>	 SRE, ops-eqiad: eqiad: Move links to new MPC7E linecard - https://phabricator.wikimedia.org/T304712 (Papaul)
[15:53:28] <wikibugs>	 SRE, ops-eqiad: eqiad: Move links to new MPC7E linecard - https://phabricator.wikimedia.org/T304712 (Papaul) >>! In T304712#7857483, @ayounsi wrote: > Thanks! >  > I like your idea of putting the capacity in the table, I added dedicated columns for it. >  > Note that I don't know if there is enough total...
[15:59:13] <jinxer-wm>	 (KubernetesRsyslogDown) firing: (2) rsyslog on kubestagemaster1001:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[16:00:04] <jouncebot>	 jbond and rzl: It is that lovely time of the day again! You are hereby commanded to deploy Puppet request window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220421T1600).
[16:00:04] <jouncebot>	 No Gerrit patches in the queue for this window AFAICS.
[16:01:25] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 (T298565)', diff saved to https://phabricator.wikimedia.org/P25996 and previous config saved to /var/cache/conftool/dbconfig/20220421-160125-ladsgroup.json
[16:01:26] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance
[16:01:28] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance
[16:01:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:01:32] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[16:01:33] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1113:3315 (T298565)', diff saved to https://phabricator.wikimedia.org/P25997 and previous config saved to /var/cache/conftool/dbconfig/20220421-160133-ladsgroup.json
[16:01:34] <wikibugs>	 SRE, ops-eqiad: eqiad: Move links to new MPC7E linecard - https://phabricator.wikimedia.org/T304712 (Papaul)
[16:01:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:01:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:01:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:04:56] <wikibugs>	 SRE, ops-eqiad, DC-Ops, Infrastructure-Foundations, netops: Q2:(Need By: TBD) replace mr1-eqiad - https://phabricator.wikimedia.org/T294474 (Cmjohnson) @arzhel fixed the reboot issue, the external disk attached to the router was causing the reboots.  I updated JUNOS to junos-srxsme-20.2R3-S2....
[16:06:21] <wikibugs>	 SRE, ops-eqiad: eqiad: Move links to new MPC7E linecard - https://phabricator.wikimedia.org/T304712 (Papaul)
[16:08:04] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T298565)', diff saved to https://phabricator.wikimedia.org/P25998 and previous config saved to /var/cache/conftool/dbconfig/20220421-160804-ladsgroup.json
[16:08:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:08:10] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[16:17:05] <logmsgbot>	 !log kormat@cumin1001 START - Cookbook sre.hosts.downtime for 1:30:00 on db1120.eqiad.wmnet with reason: Rebooting for T303174
[16:17:07] <logmsgbot>	 !log kormat@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db1120.eqiad.wmnet with reason: Rebooting for T303174
[16:17:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:17:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:20:39] <logmsgbot>	 !log kormat@cumin1001 dbctl commit (dc=all): 'db1120 (re)pooling @ 25%: Reboot T303174', diff saved to https://phabricator.wikimedia.org/P25999 and previous config saved to /var/cache/conftool/dbconfig/20220421-162039-kormat.json
[16:20:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:23:09] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P26000 and previous config saved to /var/cache/conftool/dbconfig/20220421-162309-ladsgroup.json
[16:23:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:29:51] <wikibugs>	 SRE, SRE-OnFire (FY2021/2022-Q3), Data-Engineering, Event-Platform, and 2 others: Banner sampling leading to a relatively wide site outage (mostly esams) - https://phabricator.wikimedia.org/T303036 (Krinkle)
[16:30:06] <wikibugs>	 SRE, SRE-OnFire (FY2021/2022-Q3), Data-Engineering, Event-Platform, and 2 others: Banner sampling leading to a relatively wide site outage (mostly esams) - https://phabricator.wikimedia.org/T303036 (Krinkle)
[16:30:17] <wikibugs>	 (PS1) Andrew Bogott: Renumber ns-recursor[0,1].openstack.codfw1dev.wikimediacloud.org [dns] - https://gerrit.wikimedia.org/r/785182
[16:30:25] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1166.eqiad.wmnet with reason: Maintenance
[16:30:26] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1166.eqiad.wmnet with reason: Maintenance
[16:30:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:30:31] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1166 (T298565)', diff saved to https://phabricator.wikimedia.org/P26001 and previous config saved to /var/cache/conftool/dbconfig/20220421-163031-ladsgroup.json
[16:30:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:30:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:30:39] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[16:31:23] <wikibugs>	 (CR) Andrew Bogott: "Now that a lot of this is delegated to netbox I'm not sure how to avoid reusing an IP that's already being used by netbox. Please advise!" [dns] - https://gerrit.wikimedia.org/r/785182 (owner: Andrew Bogott)
[16:34:54] <wikibugs>	 SRE, ops-eqiad, DC-Ops, Discovery-Search (Current work): hw troubleshooting: memory error for elastic1097 - https://phabricator.wikimedia.org/T306449 (Cmjohnson) DIMM has been shipped
[16:35:43] <logmsgbot>	 !log kormat@cumin1001 dbctl commit (dc=all): 'db1120 (re)pooling @ 50%: Reboot T303174', diff saved to https://phabricator.wikimedia.org/P26002 and previous config saved to /var/cache/conftool/dbconfig/20220421-163543-kormat.json
[16:35:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:37:07] <wikibugs>	 (CR) Volans: "replies inline, no blockers for me" [puppet] - https://gerrit.wikimedia.org/r/784697 (owner: Ssingh)
[16:38:15] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P26003 and previous config saved to /var/cache/conftool/dbconfig/20220421-163814-ladsgroup.json
[16:38:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:43:24] <XioNoX>	 !log replace mr1-eqiad - T294474
[16:43:24] <wm-bot>	 Sorry, you are not authorized to perform this
[16:43:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:43:29] <stashbot>	 T294474: Q2:(Need By: TBD) replace mr1-eqiad - https://phabricator.wikimedia.org/T294474
[16:43:54] <taavi>	 wm-bot: wut?
[16:45:04] <logmsgbot>	 !log cmjohnson@cumin1001 END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host parse1015.mgmt.eqiad.wmnet with reboot policy FORCED
[16:45:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:50:13] <wikibugs>	 SRE, ops-eqiad, DC-Ops, Infrastructure-Foundations, netops: Q2:(Need By: TBD) replace mr1-eqiad - https://phabricator.wikimedia.org/T294474 (Cmjohnson) a:Jclark-ctr→Cmjohnson
[16:50:47] <logmsgbot>	 !log kormat@cumin1001 dbctl commit (dc=all): 'db1120 (re)pooling @ 75%: Reboot T303174', diff saved to https://phabricator.wikimedia.org/P26004 and previous config saved to /var/cache/conftool/dbconfig/20220421-165047-kormat.json
[16:50:49] <wikibugs>	 SRE, ops-eqiad, DC-Ops, Infrastructure-Foundations, netops: Q2:(Need By: TBD) replace mr1-eqiad - https://phabricator.wikimedia.org/T294474 (Cmjohnson)
[16:50:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:53:20] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T298565)', diff saved to https://phabricator.wikimedia.org/P26005 and previous config saved to /var/cache/conftool/dbconfig/20220421-165319-ladsgroup.json
[16:53:22] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1161.eqiad.wmnet with reason: Maintenance
[16:53:23] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1161.eqiad.wmnet with reason: Maintenance
[16:53:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:53:24] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
[16:53:26] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[16:53:28] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
[16:53:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:53:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:53:33] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1161 (T298565)', diff saved to https://phabricator.wikimedia.org/P26006 and previous config saved to /var/cache/conftool/dbconfig/20220421-165333-ladsgroup.json
[16:53:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:53:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:53:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:53:48] <icinga-wm>	 PROBLEM - Host ms-be1053.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[16:53:54] <icinga-wm>	 PROBLEM - Host db1140.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[16:53:54] <icinga-wm>	 PROBLEM - Host db1139.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[16:54:26] <icinga-wm>	 PROBLEM - SSH on ms-fe1012 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/SSH/monitoring
[16:54:40] <icinga-wm>	 PROBLEM - Host relforge1004.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[16:54:42] <icinga-wm>	 PROBLEM - Host relforge1003.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[16:55:14] <icinga-wm>	 PROBLEM - Swift https backend on ms-fe1012 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Swift
[16:55:18] <icinga-wm>	 PROBLEM - Swift https frontend on ms-fe1012 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Swift
[16:56:08] <icinga-wm>	 PROBLEM - Host ms-be1056.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[16:56:12] <icinga-wm>	 PROBLEM - Host ms-be1052.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[16:56:12] <icinga-wm>	 PROBLEM - Host ms-be1055.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[16:56:12] <icinga-wm>	 PROBLEM - Host ms-be1058.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[16:56:12] <icinga-wm>	 PROBLEM - Host ms-be1051.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[16:56:12] <icinga-wm>	 PROBLEM - Host ms-be1054.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[16:56:13] <icinga-wm>	 PROBLEM - Host ms-be1057.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[16:56:24] <icinga-wm>	 PROBLEM - Host ms-be1059.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[16:56:29] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1180.eqiad.wmnet with reason: Maintenance
[16:56:31] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1180.eqiad.wmnet with reason: Maintenance
[16:56:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:56:36] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1180 (T306560)', diff saved to https://phabricator.wikimedia.org/P26007 and previous config saved to /var/cache/conftool/dbconfig/20220421-165635-ladsgroup.json
[16:56:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:56:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:56:42] <stashbot>	 T306560: Fix nullability of img_major_mime and oi_major_mime - https://phabricator.wikimedia.org/T306560
[16:57:23] <wikibugs>	 10SRE, 10ops-eqiad, 10DC-Ops, 10Infrastructure-Foundations, 10netops: Q2:(Need By: TBD) replace mr1-eqiad - https://phabricator.wikimedia.org/T294474 (10ayounsi) Swap has been done successfully!  Left to do: wipe the old one, rename the console server port of the new one.
[16:58:16] <wikibugs>	 10SRE, 10ops-eqiad, 10DC-Ops, 10Infrastructure-Foundations, 10netops: Q2:(Need By: TBD) replace mr1-eqiad - https://phabricator.wikimedia.org/T294474 (10Cmjohnson) loaded, configuration file verified working moved cables to new mr1-eqiad left scs connection to old mr1 to wipe, still requires scs connecti...
[16:59:46] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1180 (T306560)', diff saved to https://phabricator.wikimedia.org/P26008 and previous config saved to /var/cache/conftool/dbconfig/20220421-165946-ladsgroup.json
[16:59:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:02:44] <jinxer-wm>	 (KubernetesRsyslogDown) firing: (4) rsyslog on ml-staging-ctrl2001:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[17:05:51] <logmsgbot>	 !log kormat@cumin1001 dbctl commit (dc=all): 'db1120 (re)pooling @ 100%: Reboot T303174', diff saved to https://phabricator.wikimedia.org/P26009 and previous config saved to /var/cache/conftool/dbconfig/20220421-170551-kormat.json
[17:05:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:10:00] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1161 (T298565)', diff saved to https://phabricator.wikimedia.org/P26010 and previous config saved to /var/cache/conftool/dbconfig/20220421-170959-ladsgroup.json
[17:10:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:10:06] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[17:14:28] <icinga-wm>	 RECOVERY - SSH on ms-fe1012 is OK: SSH OK - OpenSSH_8.4p1 Debian-5 (protocol 2.0) https://wikitech.wikimedia.org/wiki/SSH/monitoring
[17:14:51] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P26011 and previous config saved to /var/cache/conftool/dbconfig/20220421-171451-ladsgroup.json
[17:14:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:15:16] <icinga-wm>	 RECOVERY - Swift https backend on ms-fe1012 is OK: HTTP OK: HTTP/1.1 200 OK - 451 bytes in 0.020 second response time https://wikitech.wikimedia.org/wiki/Swift
[17:15:20] <icinga-wm>	 RECOVERY - Swift https frontend on ms-fe1012 is OK: HTTP OK: HTTP/1.1 200 OK - 245 bytes in 0.008 second response time https://wikitech.wikimedia.org/wiki/Swift
[17:21:03] <wikibugs>	 10SRE, 10DC-Ops: Confirm support of PERC 750 raid controller - https://phabricator.wikimedia.org/T297913 (10RobH) Summary update from multiple out of band support emails and conversations with our Dell account team:  * confirmed that the missing/spin down doesn't work for them either, and chipset manufacturer...
[17:25:05] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P26012 and previous config saved to /var/cache/conftool/dbconfig/20220421-172504-ladsgroup.json
[17:25:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:27:18] <wikibugs>	 10SRE-Access-Requests: WIP: request sudo access for Jclark-ctr - https://phabricator.wikimedia.org/T306654 (10lmata)
[17:29:56] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P26013 and previous config saved to /var/cache/conftool/dbconfig/20220421-172956-ladsgroup.json
[17:30:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:30:46] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1166 (T298565)', diff saved to https://phabricator.wikimedia.org/P26015 and previous config saved to /var/cache/conftool/dbconfig/20220421-173046-ladsgroup.json
[17:30:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:30:51] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[17:40:10] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P26016 and previous config saved to /var/cache/conftool/dbconfig/20220421-174009-ladsgroup.json
[17:40:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:45:01] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1180 (T306560)', diff saved to https://phabricator.wikimedia.org/P26017 and previous config saved to /var/cache/conftool/dbconfig/20220421-174501-ladsgroup.json
[17:45:03] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1168.eqiad.wmnet with reason: Maintenance
[17:45:04] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1168.eqiad.wmnet with reason: Maintenance
[17:45:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:45:07] <stashbot>	 T306560: Fix nullability of img_major_mime and oi_major_mime - https://phabricator.wikimedia.org/T306560
[17:45:09] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1168 (T306560)', diff saved to https://phabricator.wikimedia.org/P26018 and previous config saved to /var/cache/conftool/dbconfig/20220421-174509-ladsgroup.json
[17:45:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:45:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:45:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:45:51] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P26019 and previous config saved to /var/cache/conftool/dbconfig/20220421-174551-ladsgroup.json
[17:45:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:46:00] <jinxer-wm>	 (JobUnavailable) firing: (2) Reduced availability for job gitlab in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[17:47:55] <jinxer-wm>	 (NodeTextfileStale) firing: Stale textfile for cloudcontrol2001-dev:9100 - https://wikitech.wikimedia.org/wiki/Prometheus#Stale_file_for_node-exporter_textfile - https://grafana.wikimedia.org/d/knkl4dCWz/node-exporter-textfile - https://alerts.wikimedia.org/?q=alertname%3DNodeTextfileStale
[17:51:01] <wikibugs>	 (03CR) 10Ssingh: P:wikidough: add a check to ensure service has been restarted (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/784697 (owner: 10Ssingh)
[17:52:37] <wikibugs>	 (03PS5) 10Ssingh: P:wikidough: add a check to ensure service has been restarted [puppet] - 10https://gerrit.wikimedia.org/r/784697
[17:53:31] <wikibugs>	 (03CR) 10Ssingh: [V: 03+1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/34946/console" [puppet] - 10https://gerrit.wikimedia.org/r/784697 (owner: 10Ssingh)
[17:55:15] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1161 (T298565)', diff saved to https://phabricator.wikimedia.org/P26020 and previous config saved to /var/cache/conftool/dbconfig/20220421-175514-ladsgroup.json
[17:55:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:55:21] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[18:00:04] <jouncebot>	 jeena and brennen: #bothumor Q:How do functions break up? A:They stop calling each other. Rise for MediaWiki train - Utc-7 Version deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220421T1800).
[18:00:13] <brennen>	 o/
[18:00:33] <jeena>	 time to deploy
[18:00:56] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P26021 and previous config saved to /var/cache/conftool/dbconfig/20220421-180056-ladsgroup.json
[18:01:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:01:41] <wikibugs>	 (03PS1) 10Jeena Huneidi: all wikis to 1.39.0-wmf.8  refs T305214 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/785187
[18:01:43] <wikibugs>	 (03CR) 10Jeena Huneidi: [C: 03+2] all wikis to 1.39.0-wmf.8  refs T305214 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/785187 (owner: 10Jeena Huneidi)
[18:02:26] <wikibugs>	 (03Merged) 10jenkins-bot: all wikis to 1.39.0-wmf.8  refs T305214 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/785187 (owner: 10Jeena Huneidi)
[18:03:43] <logmsgbot>	 !log jhuneidi@deploy1002 rebuilt and synchronized wikiversions files: all wikis to 1.39.0-wmf.8  refs T305214
[18:03:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:03:49] <stashbot>	 T305214: 1.39.0-wmf.8 deployment blockers - https://phabricator.wikimedia.org/T305214
[18:04:53] <wikibugs>	 10SRE, 10SRE-Access-Requests: WIP: request sudo access for Jclark-ctr - https://phabricator.wikimedia.org/T306654 (10lmata)
[18:07:09] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[18:07:12] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[18:07:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:07:13] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[18:07:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:07:17] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[18:07:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:07:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:08:19] <wikibugs>	 10SRE, 10SRE-Access-Requests: WIP: request sudo access for Jclark-ctr - https://phabricator.wikimedia.org/T306654 (10lmata) @Volans, please correct or amend this task if I missed anything.  @wiki_willy we will need your approval as @Jclark-ctr 's manager  @MoritzMuehlenhoff: adding you for awareness and feedba...
[18:08:38] <wikibugs>	 (03PS1) 10Ssingh: monitoring_service: specify units for configuration attributes [puppet] - 10https://gerrit.wikimedia.org/r/785188
[18:08:57] <wikibugs>	 10SRE, 10SRE-Access-Requests, 10Infrastructure-Foundations (FY2021/2022-Q4): WIP: request sudo access for Jclark-ctr - https://phabricator.wikimedia.org/T306654 (10lmata)
[18:09:47] <wikibugs>	 (03CR) 10Ssingh: [V: 03+1] "[Do not merge till Monday]. Thanks to BBlack, Daniel Zahn, and Volans for the review!" [puppet] - 10https://gerrit.wikimedia.org/r/784697 (owner: 10Ssingh)
[18:11:12] <wikibugs>	 10SRE, 10SRE-Access-Requests, 10Infrastructure-Foundations (FY2021/2022-Q4): WIP: request sudo access for Jclark-ctr - https://phabricator.wikimedia.org/T306654 (10MoritzMuehlenhoff) >>! In T306654#7872413, @lmata wrote: > @MoritzMuehlenhoff: adding you for awareness and feedback.   Yes, it sounds good to me...
[18:15:35] <wikibugs>	 10SRE, 10Data-Engineering, 10Data-Engineering-Kanban, 10Traffic: intake-analytics is responsible for up to a 85% of varnish backend fetch errors - https://phabricator.wikimedia.org/T306181 (10BTullis) The RAM upgrade has not resulted in any improvement. {F35061891,width=60%}
[18:15:59] <wikibugs>	 (03PS1) 10Dzahn: gitlab: ensure home dir for runner_user exists when running as non-root [puppet] - 10https://gerrit.wikimedia.org/r/785189 (https://phabricator.wikimedia.org/T297659)
[18:16:01] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1166 (T298565)', diff saved to https://phabricator.wikimedia.org/P26022 and previous config saved to /var/cache/conftool/dbconfig/20220421-181601-ladsgroup.json
[18:16:03] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1112.eqiad.wmnet with reason: Maintenance
[18:16:05] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1112.eqiad.wmnet with reason: Maintenance
[18:16:06] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
[18:16:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:16:08] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[18:16:10] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
[18:16:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:16:15] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1112 (T298565)', diff saved to https://phabricator.wikimedia.org/P26023 and previous config saved to /var/cache/conftool/dbconfig/20220421-181614-ladsgroup.json
[18:16:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:16:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:16:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:16:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:16:49] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 04-1] "Need to reserve IPs ahead of time according to https://wikitech.wikimedia.org/wiki/DNS/Netbox#How_to_manually_allocate_a_special_purpose_I" [dns] - 10https://gerrit.wikimedia.org/r/785182 (owner: 10Andrew Bogott)
[18:17:55] <wikibugs>	 (03PS2) 10Dzahn: gitlab: ensure home dir for runner_user exists when running as non-root [puppet] - 10https://gerrit.wikimedia.org/r/785189 (https://phabricator.wikimedia.org/T297659)
[18:32:55] <jinxer-wm>	 (NodeTextfileStale) firing: (3) Stale textfile for elastic1075:9100 - https://wikitech.wikimedia.org/wiki/Prometheus#Stale_file_for_node-exporter_textfile - https://grafana.wikimedia.org/d/knkl4dCWz/node-exporter-textfile - https://alerts.wikimedia.org/?q=alertname%3DNodeTextfileStale
[18:35:42] <wikibugs>	 10SRE, 10Infrastructure-Foundations, 10netops, 10Sustainability (Incident Followup): Add linecard diversity to the router-to-router interconnect in codfw - https://phabricator.wikimedia.org/T248506 (10Krinkle)
[18:35:51] <wikibugs>	 (03CR) 10Dzahn: [C: 03+1] "checked this. the intervals can be in the form "1s" to mean actually 1 second or just "1" when it means 1 minute. yep!" [puppet] - 10https://gerrit.wikimedia.org/r/785188 (owner: 10Ssingh)
[18:37:36] <wikibugs>	 (03CR) 10Dzahn: [C: 03+2] "https://puppet-compiler.wmflabs.org/pcc-worker1001/34947/gitlab-runner2001.codfw.wmnet/index.html" [puppet] - 10https://gerrit.wikimedia.org/r/785189 (https://phabricator.wikimedia.org/T297659) (owner: 10Dzahn)
[18:38:07] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1112 (T298565)', diff saved to https://phabricator.wikimedia.org/P26024 and previous config saved to /var/cache/conftool/dbconfig/20220421-183807-ladsgroup.json
[18:38:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:38:13] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[18:43:48] <wikibugs>	 (03PS1) 10Dzahn: Revert "Revert "Revert "gitlab: temp set gitlab-runner user to root for bootstrapping gitlab-runner2001""" [puppet] - 10https://gerrit.wikimedia.org/r/784723
[18:45:24] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1168 (T306560)', diff saved to https://phabricator.wikimedia.org/P26025 and previous config saved to /var/cache/conftool/dbconfig/20220421-184523-ladsgroup.json
[18:45:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:45:29] <stashbot>	 T306560: Fix nullability of img_major_mime and oi_major_mime - https://phabricator.wikimedia.org/T306560
[18:46:28] <wikibugs>	 10SRE, 10observability, 10Patch-For-Review, 10SRE Observability (FY2021/2022-Q3), and 2 others: Fix unquoted URL parameters in Icgina health checks - https://phabricator.wikimedia.org/T304323 (10Krinkle)
[18:47:59] <wikibugs>	 (03CR) 10Dzahn: [C: 03+2] Revert "Revert "Revert "gitlab: temp set gitlab-runner user to root for bootstrapping gitlab-runner2001""" [puppet] - 10https://gerrit.wikimedia.org/r/784723 (owner: 10Dzahn)
[18:49:24] <wikibugs>	 (03CR) 10Ssingh: monitoring_service: specify units for configuration attributes (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/785188 (owner: 10Ssingh)
[18:49:33] <wikibugs>	 (03CR) 10Ssingh: [C: 03+2] monitoring_service: specify units for configuration attributes [puppet] - 10https://gerrit.wikimedia.org/r/785188 (owner: 10Ssingh)
[18:53:12] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P26026 and previous config saved to /var/cache/conftool/dbconfig/20220421-185312-ladsgroup.json
[18:53:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:59:23] <wikibugs>	 10SRE-tools, 10DBA, 10Infrastructure-Foundations, 10Sustainability (Incident Followup), 10User-Ladsgroup: Create or modify an existing tool that quickly shows the db replication status in case of master failure - https://phabricator.wikimedia.org/T281249 (10Krinkle)
[19:00:29] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P26027 and previous config saved to /var/cache/conftool/dbconfig/20220421-190029-ladsgroup.json
[19:00:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:01:55] <jinxer-wm>	 (NodeTextfileStale) firing: Stale textfile for ms-be2067:9100 - https://wikitech.wikimedia.org/wiki/Prometheus#Stale_file_for_node-exporter_textfile - https://grafana.wikimedia.org/d/knkl4dCWz/node-exporter-textfile - https://alerts.wikimedia.org/?q=alertname%3DNodeTextfileStale
[19:07:36] <wikibugs>	 (03PS1) 10Dzahn: gitlab_runner: use config_path variable when creating config file [puppet] - 10https://gerrit.wikimedia.org/r/785191 (https://phabricator.wikimedia.org/T297659)
[19:07:49] <wikibugs>	 (03PS1) 10Bking: Revert "elastic: increase recovery time" [cookbooks] - 10https://gerrit.wikimedia.org/r/784724
[19:08:07] <wikibugs>	 (03PS2) 10Dzahn: gitlab_runner: use config_path variable when creating config file [puppet] - 10https://gerrit.wikimedia.org/r/785191 (https://phabricator.wikimedia.org/T297659)
[19:08:17] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P26028 and previous config saved to /var/cache/conftool/dbconfig/20220421-190817-ladsgroup.json
[19:08:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:08:21] <ebernhardson>	 !log set index.unassigned.node_left.delayed_timeout to null for all indices on elasticsearch-eqiad-psi (:9200), reverting previous test of 10m back to defaults
[19:08:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:08:35] <wikibugs>	 (03PS2) 10Ryan Kemper: Revert "elastic: increase recovery time" [cookbooks] - 10https://gerrit.wikimedia.org/r/784724 (https://phabricator.wikimedia.org/T305994) (owner: 10Bking)
[19:09:19] <wikibugs>	 (03CR) 10Dzahn: [C: 04-2] "ah, no, this is the template that is used to create the actual config from.. hrmm" [puppet] - 10https://gerrit.wikimedia.org/r/785191 (https://phabricator.wikimedia.org/T297659) (owner: 10Dzahn)
[19:15:34] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P26029 and previous config saved to /var/cache/conftool/dbconfig/20220421-191534-ladsgroup.json
[19:15:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:16:30] <icinga-wm>	 PROBLEM - BGP status on cr2-eqiad is CRITICAL: BGP CRITICAL - AS64605/IPv6: Active - Anycast https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[19:16:46] <icinga-wm>	 PROBLEM - Host logstash2028.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[19:16:54] <wikibugs>	 10SRE-OnFire, 10Data-Persistence (Consultation), 10Platform Engineering, 10Performance-Team (Radar), 10Sustainability (Incident Followup): 2022-03-10 MediaWiki availability affected due to a database query processing slowdown affecting most of the rest of the dat... - https://phabricator.wikimedia.org/T303499
[19:18:37] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
[19:18:38] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
[19:18:39] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
[19:18:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:18:43] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
[19:18:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:18:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:18:48] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1156 (T298565)', diff saved to https://phabricator.wikimedia.org/P26030 and previous config saved to /var/cache/conftool/dbconfig/20220421-191847-ladsgroup.json
[19:18:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:18:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:18:57] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[19:19:31] <wikibugs>	 10SRE, 10Traffic-Icebox, 10Wikimedia-Incident: Memory leak on ats-tls 8.0.6 - https://phabricator.wikimedia.org/T249335 (10Krinkle) a:03Vgutierrez I believe this was resolved since and/or obsoleted by HAProxy, is that right?
[19:23:22] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1112 (T298565)', diff saved to https://phabricator.wikimedia.org/P26031 and previous config saved to /var/cache/conftool/dbconfig/20220421-192322-ladsgroup.json
[19:23:24] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1157.eqiad.wmnet with reason: Maintenance
[19:23:25] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1157.eqiad.wmnet with reason: Maintenance
[19:23:26] <urandom>	 papaul: re: T305568, does that notation you added to the description for aqs200[1-4] (e.g. B6U35 ge-6/0/34) indicate the row? Does this indicate these are in row 'B'?
[19:23:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:23:31] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1157 (T298565)', diff saved to https://phabricator.wikimedia.org/P26032 and previous config saved to /var/cache/conftool/dbconfig/20220421-192330-ladsgroup.json
[19:23:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:23:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:23:35] <stashbot>	 T305568: Q4:(Need By: TBD) rack/setup/install aqs2001-aqs2012 - https://phabricator.wikimedia.org/T305568
[19:23:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:23:56] <cdanis>	 !log depooling & disabling puppet on cp2029 for some manual testing T303534
[19:24:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:30:39] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1168 (T306560)', diff saved to https://phabricator.wikimedia.org/P26033 and previous config saved to /var/cache/conftool/dbconfig/20220421-193039-ladsgroup.json
[19:30:41] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1165.eqiad.wmnet with reason: Maintenance
[19:30:43] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1165.eqiad.wmnet with reason: Maintenance
[19:30:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:30:44] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
[19:30:45] <stashbot>	 T306560: Fix nullability of img_major_mime and oi_major_mime - https://phabricator.wikimedia.org/T306560
[19:30:47] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
[19:30:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:30:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:30:52] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1165 (T306560)', diff saved to https://phabricator.wikimedia.org/P26034 and previous config saved to /var/cache/conftool/dbconfig/20220421-193052-ladsgroup.json
[19:30:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:30:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:31:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:33:02] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1165 (T306560)', diff saved to https://phabricator.wikimedia.org/P26035 and previous config saved to /var/cache/conftool/dbconfig/20220421-193302-ladsgroup.json
[19:33:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:34:09] <wikibugs>	 (PS2) Andrew Bogott: Renumber ns-recursor[0,1].openstack.codfw1dev.wikimediacloud.org [dns] - https://gerrit.wikimedia.org/r/785182
[19:34:25] <wikibugs>	 SRE, observability, Sustainability (Incident Followup), User-MoritzMuehlenhoff: Alert on ECC warnings in SEL - https://phabricator.wikimedia.org/T253810 (Krinkle)
[19:35:02] <wikibugs>	 (CR) Andrew Bogott: [C: +2] Renumber ns-recursor[0,1].openstack.codfw1dev.wikimediacloud.org [dns] - https://gerrit.wikimedia.org/r/785182 (owner: Andrew Bogott)
[19:38:59] <wikibugs>	 SRE, observability, Patch-For-Review, SRE Observability (FY2021/2022-Q3), and 2 others: Fix unquoted URL parameters in Icinga health checks - https://phabricator.wikimedia.org/T304323 (colewhite)
[19:39:44] <wikibugs>	 (PS3) Dzahn: gitlab_runner: ensure the full path to the config location exists [puppet] - https://gerrit.wikimedia.org/r/785191 (https://phabricator.wikimedia.org/T297659)
[19:40:03] <wikibugs>	 (PS4) Dzahn: gitlab_runner: ensure the full path to the config location exists [puppet] - https://gerrit.wikimedia.org/r/785191 (https://phabricator.wikimedia.org/T297659)
[19:41:26] <wikibugs>	 (PS5) Dzahn: gitlab_runner: ensure the full path to the config location exists [puppet] - https://gerrit.wikimedia.org/r/785191 (https://phabricator.wikimedia.org/T297659)
[19:41:57] <wikibugs>	 (CR) Dzahn: [C: +2] gitlab_runner: ensure the full path to the config location exists [puppet] - https://gerrit.wikimedia.org/r/785191 (https://phabricator.wikimedia.org/T297659) (owner: Dzahn)
[19:43:03] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1156 (T298565)', diff saved to https://phabricator.wikimedia.org/P26036 and previous config saved to /var/cache/conftool/dbconfig/20220421-194303-ladsgroup.json
[19:43:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:43:10] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[19:44:13] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1136.eqiad.wmnet with reason: Maintenance
[19:44:15] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1136.eqiad.wmnet with reason: Maintenance
[19:44:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:44:20] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1136 (T298565)', diff saved to https://phabricator.wikimedia.org/P26037 and previous config saved to /var/cache/conftool/dbconfig/20220421-194419-ladsgroup.json
[19:44:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:44:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:48:07] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P26038 and previous config saved to /var/cache/conftool/dbconfig/20220421-194807-ladsgroup.json
[19:48:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:53:56] <icinga-wm>	 RECOVERY - Host logstash2028.mgmt is UP: PING OK - Packet loss = 0%, RTA = 0.08 ms
[19:58:08] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P26039 and previous config saved to /var/cache/conftool/dbconfig/20220421-195808-ladsgroup.json
[19:58:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:59:13] <jinxer-wm>	 (KubernetesRsyslogDown) firing: (2) rsyslog on kubestagemaster1001:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[19:59:50] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1136 (T298565)', diff saved to https://phabricator.wikimedia.org/P26040 and previous config saved to /var/cache/conftool/dbconfig/20220421-195950-ladsgroup.json
[19:59:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:59:56] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[20:00:05] <jouncebot>	 brennen: #bothumor Q:How do functions break up? A:They stop calling each other. Rise for UTC late backport and config training deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220421T2000).
[20:01:54] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1157 (T298565)', diff saved to https://phabricator.wikimedia.org/P26041 and previous config saved to /var/cache/conftool/dbconfig/20220421-200154-ladsgroup.json
[20:01:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:03:13] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P26042 and previous config saved to /var/cache/conftool/dbconfig/20220421-200312-ladsgroup.json
[20:03:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:08:44] <mutante>	 !log [puppetmaster1001:~] $ sudo puppet cert clean gitlab-runner2001.codfw.wmnet
[20:08:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:09:02] <mutante>	 !log [ganeti2021:~] $ sudo gnt-instance shutdown gitlab-runner2001.codfw.wmnet
[20:09:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:10:04] <wikibugs>	 SRE, DC-Ops: Confirm support of PERC 750 raid controller - https://phabricator.wikimedia.org/T297913 (RobH) a:RobH→Cmjohnson Ok, next steps:  I've set the disk to identify flash with: perccli64 /c0/e64/s13 start locate    ` root@dumpsdata1007:~# perccli64 /c0/e64/s13 start locate CLI Version = 00...
[20:13:13] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P26043 and previous config saved to /var/cache/conftool/dbconfig/20220421-201313-ladsgroup.json
[20:13:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:14:11] <mutante>	 !log reimaging gitlab-runner2001.codfw.wmnet one more time to confirm things work from scratch now  T297659
[20:14:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:14:16] <stashbot>	 T297659: upgrade gitlab-runners to bullseye - https://phabricator.wikimedia.org/T297659
[20:14:58] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1136', diff saved to https://phabricator.wikimedia.org/P26044 and previous config saved to /var/cache/conftool/dbconfig/20220421-201455-ladsgroup.json
[20:15:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:16:59] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1157', diff saved to https://phabricator.wikimedia.org/P26045 and previous config saved to /var/cache/conftool/dbconfig/20220421-201659-ladsgroup.json
[20:17:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:18:18] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1165 (T306560)', diff saved to https://phabricator.wikimedia.org/P26046 and previous config saved to /var/cache/conftool/dbconfig/20220421-201817-ladsgroup.json
[20:18:19] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1131.eqiad.wmnet with reason: Maintenance
[20:18:20] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1131.eqiad.wmnet with reason: Maintenance
[20:18:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:18:24] <stashbot>	 T306560: Fix nullability of img_major_mime and oi_major_mime - https://phabricator.wikimedia.org/T306560
[20:18:26] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1131 (T306560)', diff saved to https://phabricator.wikimedia.org/P26047 and previous config saved to /var/cache/conftool/dbconfig/20220421-201825-ladsgroup.json
[20:18:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:18:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:18:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:21:36] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1131 (T306560)', diff saved to https://phabricator.wikimedia.org/P26048 and previous config saved to /var/cache/conftool/dbconfig/20220421-202135-ladsgroup.json
[20:21:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:23:36] <icinga-wm>	 PROBLEM - Uncommitted DNS changes in Netbox on netbox1001 is CRITICAL: Netbox has uncommitted DNS changes https://wikitech.wikimedia.org/wiki/Monitoring/Netbox_DNS_uncommitted_changes
[20:24:24] <wikibugs>	 (PS5) Juan90264: Enable '$wgCopyUploadsDomains' to viwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/784717
[20:28:18] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1156 (T298565)', diff saved to https://phabricator.wikimedia.org/P26049 and previous config saved to /var/cache/conftool/dbconfig/20220421-202818-ladsgroup.json
[20:28:19] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
[20:28:21] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
[20:28:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:28:25] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[20:28:26] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1182 (T298565)', diff saved to https://phabricator.wikimedia.org/P26050 and previous config saved to /var/cache/conftool/dbconfig/20220421-202826-ladsgroup.json
[20:28:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:28:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:28:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:30:03] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1136', diff saved to https://phabricator.wikimedia.org/P26051 and previous config saved to /var/cache/conftool/dbconfig/20220421-203003-ladsgroup.json
[20:30:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:32:04] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1157', diff saved to https://phabricator.wikimedia.org/P26052 and previous config saved to /var/cache/conftool/dbconfig/20220421-203204-ladsgroup.json
[20:32:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:35:53] <Juan_90264>	 Hello brennen
[20:36:17] <Juan_90264>	 Sorry for taking so long, I put two changes in Deployments
[20:36:41] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1131', diff saved to https://phabricator.wikimedia.org/P26053 and previous config saved to /var/cache/conftool/dbconfig/20220421-203640-ladsgroup.json
[20:36:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:36:51] <Juan_90264>	 If it's still available, could you deploy it?
[20:39:10] <logmsgbot>	 !log nokafor@deploy1002 Started deploy [airflow-dags/analytics@bd28d80]: (no justification provided)
[20:39:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:39:37] <logmsgbot>	 !log nokafor@deploy1002 Finished deploy [airflow-dags/analytics@bd28d80]: (no justification provided) (duration: 00m 27s)
[20:39:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:41:51] <logmsgbot>	 !log nokafor@deploy1002 Started deploy [airflow-dags/analytics@bd28d80]: (no justification provided)
[20:41:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:41:58] <logmsgbot>	 !log nokafor@deploy1002 Finished deploy [airflow-dags/analytics@bd28d80]: (no justification provided) (duration: 00m 07s)
[20:42:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:43:17] <Juan_90264>	 thcipriani ?
[20:43:42] <icinga-wm>	 PROBLEM - Check size of conntrack table on gitlab-runner2001 is CRITICAL: CHECK_NRPE: Error - Could not connect to 10.192.32.72: Connection reset by peer https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack
[20:44:20] <mutante>	 ^ me..reimage in progress
[20:44:28] <icinga-wm>	 PROBLEM - Check for large files in client bucket on gitlab-runner2001 is CRITICAL: CHECK_NRPE: Error - Could not connect to 10.192.32.72: Connection reset by peer https://wikitech.wikimedia.org/wiki/Puppet%23check_client_bucket_large_file
[20:44:38] <icinga-wm>	 PROBLEM - puppet last run on gitlab-runner2001 is CRITICAL: CHECK_NRPE: Error - Could not connect to 10.192.32.72: Connection reset by peer https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[20:45:00] <logmsgbot>	 !log dzahn@cumin2002 START - Cookbook sre.hosts.downtime for 1:00:00 on gitlab-runner2001.codfw.wmnet with reason: reimage
[20:45:03] <logmsgbot>	 !log dzahn@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on gitlab-runner2001.codfw.wmnet with reason: reimage
[20:45:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:45:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:45:09] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1136 (T298565)', diff saved to https://phabricator.wikimedia.org/P26054 and previous config saved to /var/cache/conftool/dbconfig/20220421-204508-ladsgroup.json
[20:45:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:45:14] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[20:45:26] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1136.eqiad.wmnet with reason: Maintenance
[20:45:28] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1136.eqiad.wmnet with reason: Maintenance
[20:45:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:45:33] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1136 (T298565)', diff saved to https://phabricator.wikimedia.org/P26055 and previous config saved to /var/cache/conftool/dbconfig/20220421-204532-ladsgroup.json
[20:45:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:45:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:45:46] <wikibugs>	 SRE, ops-codfw, Cassandra, DC-Ops: Q4:(Need By: TBD) rack/setup/install aqs2001-aqs2012 - https://phabricator.wikimedia.org/T305568 (Eevans) @Papaul does this notation you added refer to the row (and rack) location?  For example: aqs2001: !!B6U35 ge-6/0/34!!, does that mean row 'B'?
[20:47:09] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1157 (T298565)', diff saved to https://phabricator.wikimedia.org/P26056 and previous config saved to /var/cache/conftool/dbconfig/20220421-204709-ladsgroup.json
[20:47:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:47:52] <Juan_90264>	 mutante: Can you tell me if the backport will occur?
[20:48:10] <mutante>	 Juan_90264: I don't know the answer, sorry
[20:48:12] <mutante>	 jouncebot: now
[20:48:12] <jouncebot>	 For the next 0 hour(s) and 11 minute(s): UTC late backport and config training (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220421T2000)
[20:48:33] <Juan_90264>	 Okay
[20:49:00] <mutante>	 Juan_90264: oooh.. I think it's because tomorrow is a WMF holiday
[20:49:04] <mutante>	 so nobody will be working
[20:49:13] <mutante>	 that likely means today is "like Friday"
[20:49:17] <mutante>	 and no deploys on Friday
[20:49:26] <mutante>	 thcipriani: right? ^
[20:50:50] <cdanis>	 !log re-enabled puppet and repooled cp2029
[20:50:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:51:46] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1131', diff saved to https://phabricator.wikimedia.org/P26057 and previous config saved to /var/cache/conftool/dbconfig/20220421-205145-ladsgroup.json
[20:51:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:52:19] <Juan_90264>	 mutante: I understand, if thcipriani is online I'll wait to confirm this
[20:52:56] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1182 (T298565)', diff saved to https://phabricator.wikimedia.org/P26058 and previous config saved to /var/cache/conftool/dbconfig/20220421-205256-ladsgroup.json
[20:53:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:53:02] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[20:53:04] <icinga-wm>	 RECOVERY - Check for large files in client bucket on gitlab-runner2001 is OK: OK: client bucket file ok https://wikitech.wikimedia.org/wiki/Puppet%23check_client_bucket_large_file
[20:54:10] <icinga-wm>	 RECOVERY - Check size of conntrack table on gitlab-runner2001 is OK: OK: nf_conntrack is 0 % full https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack
[20:54:55] <thcipriani>	 mutante: Juan_90264 yes, often that is the case; however, I wasn't planning on doing that today. I wasn't around because I thought there weren't any patches in the window and walked away from the computer for a bit :)
[20:55:47] <thcipriani>	 Juan_90264: this patch is failing CI so I won't deploy that one today https://gerrit.wikimedia.org/r/c/mediawiki/extensions/WikimediaMessages/+/784722, also it's not a backport
[20:56:12] <thcipriani>	 Juan_90264: https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/784717 seems safe -- is there a bug it's attached to?
[20:59:07] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1136 (T298565)', diff saved to https://phabricator.wikimedia.org/P26059 and previous config saved to /var/cache/conftool/dbconfig/20220421-205906-ladsgroup.json
[20:59:07] <wikibugs>	 (PS6) Juan90264: Enable '$wgCopyUploadsDomains' to viwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/784717 (https://phabricator.wikimedia.org/T303577)
[20:59:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:59:13] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[20:59:28] <Juan_90264>	 thcipriani: Yes, https://phabricator.wikimedia.org/T303577
[20:59:42] <mutante>	 Juan_90264: I think it's failing CI because the arrows are facing left?  <=   vs  =>  ?
[21:00:36] <wikibugs>	 (PS1) Urbanecm: GlobalUserSelectQueryBuilder: Do not fatal when no users are returned [extensions/CentralAuth] (wmf/1.39.0-wmf.8) - https://gerrit.wikimedia.org/r/785207 (https://phabricator.wikimedia.org/T306535)
[21:01:08] <Juan_90264>	 mutante: I know they look that way, but that's because of the script. The script makes the arrow appear backwards
[21:01:09] <wikibugs>	 (CR) Thcipriani: [C: +2] Enable '$wgCopyUploadsDomains' to viwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/784717 (https://phabricator.wikimedia.org/T303577) (owner: Juan90264)
[21:01:30] <mutante>	 because it's a RTL language? I was wondering that
[21:01:58] <wikibugs>	 (Merged) jenkins-bot: Enable '$wgCopyUploadsDomains' to viwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/784717 (https://phabricator.wikimedia.org/T303577) (owner: Juan90264)
[21:02:34] <Juan_90264>	 mutante: Yes
[21:02:44] <jinxer-wm>	 (KubernetesRsyslogDown) firing: (4) rsyslog on ml-staging-ctrl2001:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[21:02:45] <Juan_90264>	 Perfect change merged!
[21:03:06] <Juan_90264>	 *Perfect merged!
[21:03:09] <thcipriani>	 Juan_90264: live on mwdebug1002, check please :)
[21:03:29] <Juan_90264>	 Okay, I will test
[21:04:08] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1157.eqiad.wmnet with reason: Maintenance
[21:04:10] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1157.eqiad.wmnet with reason: Maintenance
[21:04:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:04:15] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1157 (T298565)', diff saved to https://phabricator.wikimedia.org/P26060 and previous config saved to /var/cache/conftool/dbconfig/20220421-210414-ladsgroup.json
[21:04:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:04:18] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[21:04:21] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[21:04:22] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[21:04:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:04:23] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[21:04:26] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[21:04:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:04:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:04:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:04:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:06:51] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1131 (T306560)', diff saved to https://phabricator.wikimedia.org/P26061 and previous config saved to /var/cache/conftool/dbconfig/20220421-210650-ladsgroup.json
[21:06:52] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance
[21:06:54] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance
[21:06:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:06:56] <stashbot>	 T306560: Fix nullability of img_major_mime and oi_major_mime - https://phabricator.wikimedia.org/T306560
[21:06:59] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1113:3316 (T306560)', diff saved to https://phabricator.wikimedia.org/P26062 and previous config saved to /var/cache/conftool/dbconfig/20220421-210658-ladsgroup.json
[21:06:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:07:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:07:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:08:01] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P26063 and previous config saved to /var/cache/conftool/dbconfig/20220421-210801-ladsgroup.json
[21:08:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:08:38] <wikibugs>	 (PS1) Dzahn: gitlab::runner: ensure config dir is owned by non-privileged user [puppet] - https://gerrit.wikimedia.org/r/785198 (https://phabricator.wikimedia.org/T297659)
[21:09:17] <wikibugs>	 (CR) jerkins-bot: [V: -1] gitlab::runner: ensure config dir is owned by non-privileged user [puppet] - https://gerrit.wikimedia.org/r/785198 (https://phabricator.wikimedia.org/T297659) (owner: Dzahn)
[21:09:39] <Juan_90264>	 thcipriani: Okay, everything seems to be ok, but I was missing activating "$wgCopyUploadsFromSpecialUpload"
[21:10:19] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1113:3316 (T306560)', diff saved to https://phabricator.wikimedia.org/P26064 and previous config saved to /var/cache/conftool/dbconfig/20220421-211018-ladsgroup.json
[21:10:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:11:07] <thcipriani>	 Juan_90264: do you need to make a change to your patch?
[21:13:01] <Juan_90264>	 thcipriani: Yes, because otherwise there is no way to use this privilege
[21:13:24] <wikibugs>	 (PS2) Dzahn: gitlab::runner: ensure config dir is owned by non-privileged user [puppet] - https://gerrit.wikimedia.org/r/785198 (https://phabricator.wikimedia.org/T297659)
[21:13:30] <thcipriani>	 makes sense, thank you
[21:14:12] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1136', diff saved to https://phabricator.wikimedia.org/P26065 and previous config saved to /var/cache/conftool/dbconfig/20220421-211411-ladsgroup.json
[21:14:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:15:04] <wikibugs>	 (PS1) Thcipriani: Revert "Enable '$wgCopyUploadsDomains' to viwiki" [mediawiki-config] - https://gerrit.wikimedia.org/r/785200
[21:15:27] <wikibugs>	 (CR) Thcipriani: [C: +2] Revert "Enable '$wgCopyUploadsDomains' to viwiki" [mediawiki-config] - https://gerrit.wikimedia.org/r/785200 (owner: Thcipriani)
[21:15:57] <wikibugs>	 (CR) Dzahn: [C: +2] "https://puppet-compiler.wmflabs.org/pcc-worker1001/34947/" [puppet] - https://gerrit.wikimedia.org/r/785198 (https://phabricator.wikimedia.org/T297659) (owner: Dzahn)
[21:16:17] <wikibugs>	 (Merged) jenkins-bot: Revert "Enable '$wgCopyUploadsDomains' to viwiki" [mediawiki-config] - https://gerrit.wikimedia.org/r/785200 (owner: Thcipriani)
[21:18:49] <Juan_90264>	 Okay reverted
[21:19:38] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[21:19:41] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[21:19:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:19:43] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[21:19:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:19:46] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[21:19:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:19:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:20:24] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1157 (T298565)', diff saved to https://phabricator.wikimedia.org/P26066 and previous config saved to /var/cache/conftool/dbconfig/20220421-212022-ladsgroup.json
[21:20:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:20:29] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[21:20:44] <Juan_90264>	 thcipriani: I'll have to create another change including what was missing, right? (I ask because I never had to do this)
[21:21:56] <icinga-wm>	 RECOVERY - puppet last run on gitlab-runner2001 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[21:22:23] <thcipriani>	 Juan_90264: yes, sorry, should have said—please make a new patch set for that change. Let's schedule it for a different backport window as we're 20 minutes over on this window already. Your second patch shouldn't be merged in this window---you should get code review from someone who works on that extension.
[21:22:29] <mutante>	 ^ yay, got it working (so that a gitlab-runner runs as non-root on bullseye)
[21:22:44] <thcipriani>	 neat! kudos mutante 
[21:23:06] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P26067 and previous config saved to /var/cache/conftool/dbconfig/20220421-212306-ladsgroup.json
[21:23:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:23:12] <brennen>	 nice mutante 
[21:23:20] <mutante>	 ;) ty, credit to J.elto as well
[21:24:55] <Juan_90264>	 thcipriani: Okay, if I still have time I can create another change to resolve this soon. And then I'll look into the problem with the second change
[21:25:24] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1113:3316', diff saved to https://phabricator.wikimedia.org/P26068 and previous config saved to /var/cache/conftool/dbconfig/20220421-212523-ladsgroup.json
[21:25:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:25:48] <mutante>	 except.. it fails to "unregister" the existing runner .. 
[21:26:03] <thcipriani>	 Juan_90264: sounds good, and we'll have to deploy that change another day :)
[21:26:08] <mutante>	 "status=only http or https scheme supported"  hmmm
[21:29:17] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1136', diff saved to https://phabricator.wikimedia.org/P26069 and previous config saved to /var/cache/conftool/dbconfig/20220421-212916-ladsgroup.json
[21:29:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:30:44] <Juan_90264>	 thcipriani: Perfect, change created: https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/785208
[21:35:29] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1157', diff saved to https://phabricator.wikimedia.org/P26070 and previous config saved to /var/cache/conftool/dbconfig/20220421-213529-ladsgroup.json
[21:35:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:37:37] <wikibugs>	 (CR) Sharvaniharan: "Please review when you get a chance :-)" [mediawiki-config] - https://gerrit.wikimedia.org/r/783874 (https://phabricator.wikimedia.org/T306385) (owner: Sharvaniharan)
[21:38:12] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1182 (T298565)', diff saved to https://phabricator.wikimedia.org/P26071 and previous config saved to /var/cache/conftool/dbconfig/20220421-213811-ladsgroup.json
[21:38:13] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1129.eqiad.wmnet with reason: Maintenance
[21:38:14] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1129.eqiad.wmnet with reason: Maintenance
[21:38:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:38:17] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[21:38:19] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1129 (T298565)', diff saved to https://phabricator.wikimedia.org/P26072 and previous config saved to /var/cache/conftool/dbconfig/20220421-213819-ladsgroup.json
[21:38:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:38:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:38:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:38:45] <wikibugs>	 SRE, DC-Ops: Confirm support of PERC 750 raid controller - https://phabricator.wikimedia.org/T297913 (RobH) perccli64 /c0 show all also shows a physical disk list, we'll want to run to confirm it sees the disk gone when removed.
[21:40:27] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1129 (T298565)', diff saved to https://phabricator.wikimedia.org/P26073 and previous config saved to /var/cache/conftool/dbconfig/20220421-214027-ladsgroup.json
[21:40:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:40:35] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1113:3316', diff saved to https://phabricator.wikimedia.org/P26074 and previous config saved to /var/cache/conftool/dbconfig/20220421-214035-ladsgroup.json
[21:40:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:40:39] <logmsgbot>	 !log dzahn@cumin2002 START - Cookbook sre.hosts.downtime for 1:00:00 on gitlab-runner1001.eqiad.wmnet with reason: reimage
[21:40:42] <logmsgbot>	 !log dzahn@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on gitlab-runner1001.eqiad.wmnet with reason: reimage
[21:40:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:40:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:41:36] <Juan_90264>	 thcipriani: Is it still available? If not, no problem
[21:42:22] <wikibugs>	 (CR) Cwhite: "Manually applied to grafana-next.wm.o for testing.  @phedenskog, does it function as you expect?" [puppet] - https://gerrit.wikimedia.org/r/774380 (https://phabricator.wikimedia.org/T304583) (owner: Phedenskog)
[21:42:35] <mutante>	 !log shutting down and reimaging gitlab-runner1001 T297659
[21:42:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:42:41] <stashbot>	 T297659: upgrade gitlab-runners to bullseye - https://phabricator.wikimedia.org/T297659
[21:44:18] <wikibugs>	 (PS3) Dzahn: site: use appserver in codfw C3, cleanup duplicate insetup definition [puppet] - https://gerrit.wikimedia.org/r/785147 (https://phabricator.wikimedia.org/T290192) (owner: Jelto)
[21:44:22] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1136 (T298565)', diff saved to https://phabricator.wikimedia.org/P26075 and previous config saved to /var/cache/conftool/dbconfig/20220421-214422-ladsgroup.json
[21:44:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:44:28] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[21:44:39] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1136.eqiad.wmnet with reason: Maintenance
[21:44:41] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1136.eqiad.wmnet with reason: Maintenance
[21:44:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:44:46] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1136 (T298565)', diff saved to https://phabricator.wikimedia.org/P26076 and previous config saved to /var/cache/conftool/dbconfig/20220421-214445-ladsgroup.json
[21:44:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:44:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:45:14] <wikibugs>	 (CR) Dzahn: "thank you! the root issue is that we have no workflow that ensures a follow-up task is created after dcops is done procuring" [puppet] - https://gerrit.wikimedia.org/r/785147 (https://phabricator.wikimedia.org/T290192) (owner: Jelto)
[21:46:00] <jinxer-wm>	 (JobUnavailable) firing: (2) Reduced availability for job gitlab in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[21:46:54] <icinga-wm>	 PROBLEM - Check systemd state on gitlab-runner2001 is CRITICAL: CRITICAL - degraded: The following units failed: docker-gc.service,docker-resource-monitor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[21:47:55] <jinxer-wm>	 (NodeTextfileStale) firing: Stale textfile for cloudcontrol2001-dev:9100 - https://wikitech.wikimedia.org/wiki/Prometheus#Stale_file_for_node-exporter_textfile - https://grafana.wikimedia.org/d/knkl4dCWz/node-exporter-textfile - https://alerts.wikimedia.org/?q=alertname%3DNodeTextfileStale
[21:50:34] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1157', diff saved to https://phabricator.wikimedia.org/P26077 and previous config saved to /var/cache/conftool/dbconfig/20220421-215034-ladsgroup.json
[21:50:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:50:45] <jinxer-wm>	 (JobUnavailable) firing: (4) Reduced availability for job gitlab in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[21:53:29] <wikibugs>	 SRE, ops-codfw, Cassandra, DC-Ops: Q4:(Need By: TBD) rack/setup/install aqs2001-aqs2012 - https://phabricator.wikimedia.org/T305568 (Papaul) @Eevans yes B is row B , 6 is the rack number and U35 is the position of the server in the rack (row  B rack 6 position 35)
[21:55:32] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P26078 and previous config saved to /var/cache/conftool/dbconfig/20220421-215532-ladsgroup.json
[21:55:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:55:40] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1113:3316 (T306560)', diff saved to https://phabricator.wikimedia.org/P26079 and previous config saved to /var/cache/conftool/dbconfig/20220421-215540-ladsgroup.json
[21:55:41] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1096.eqiad.wmnet with reason: Maintenance
[21:55:43] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1096.eqiad.wmnet with reason: Maintenance
[21:55:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:55:45] <stashbot>	 T306560: Fix nullability of img_major_mime and oi_major_mime - https://phabricator.wikimedia.org/T306560
[21:55:48] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1096:3316 (T306560)', diff saved to https://phabricator.wikimedia.org/P26080 and previous config saved to /var/cache/conftool/dbconfig/20220421-215547-ladsgroup.json
[21:55:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:55:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:55:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:58:07] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1096:3316 (T306560)', diff saved to https://phabricator.wikimedia.org/P26081 and previous config saved to /var/cache/conftool/dbconfig/20220421-215807-ladsgroup.json
[21:58:09] <wikibugs>	 SRE, Wikimedia-Site-requests, Chinese-Sites: Enable "upload by url" feature at zhwiki - https://phabricator.wikimedia.org/T142991 (Stang) Boldly move forward since viwiki deployed this feature ~
[21:58:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:00:37] <mutante>	 !log gitlab-runner2001 - installing apparmor ('apparmor' is the user utilities package and was NOT installed, libapparmor1 WAS installed), this caused bug https://www.mail-archive.com/debian-bugs-dist@lists.debian.org/msg1808456.html after upgrading gitlab-runner to bullseye because bullseye comes with libapparmor1 by default as opposed to before  T297659
[22:00:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:00:44] <stashbot>	 T297659: upgrade gitlab-runners to bullseye - https://phabricator.wikimedia.org/T297659
[22:02:01] <mutante>	 !log gitlab-runner2001 - systemctl start docker-resource-monitor ; systemctl start docker-gc 
[22:02:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:02:26] <icinga-wm>	 RECOVERY - Check systemd state on gitlab-runner2001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
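The fix logged above hinged on `libapparmor1` being installed while the `apparmor` package was not. A minimal sketch of how that state could be detected on a Debian host, assuming `dpkg-query` is on the PATH; the helper names (`is_fully_installed`, `dpkg_status`, `missing_packages`) are hypothetical and not part of any tool used in this log:

```python
import subprocess


def is_fully_installed(status: str) -> bool:
    """True if a dpkg Status string (e.g. 'install ok installed') ends in 'installed'."""
    fields = status.split()
    return len(fields) == 3 and fields[2] == "installed"


def dpkg_status(package: str) -> str:
    """Return the dpkg Status string for a package, or '' if dpkg does not know it.

    Assumes dpkg-query exists; on a non-Debian host subprocess.run raises
    FileNotFoundError instead.
    """
    result = subprocess.run(
        ["dpkg-query", "-W", "-f=${Status}", package],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout.strip() if result.returncode == 0 else ""


def missing_packages(packages, status_of=dpkg_status):
    """List the packages that are not fully installed, in the order given."""
    return [p for p in packages if not is_fully_installed(status_of(p))]
```

On the affected runner, `missing_packages(["apparmor", "libapparmor1"])` would be expected to name only `apparmor` before the manual install; the `status_of` parameter exists only so the check can be exercised without a real dpkg database.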
[22:05:39] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1157 (T298565)', diff saved to https://phabricator.wikimedia.org/P26082 and previous config saved to /var/cache/conftool/dbconfig/20220421-220539-ladsgroup.json
[22:05:41] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1112.eqiad.wmnet with reason: Maintenance
[22:05:43] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1112.eqiad.wmnet with reason: Maintenance
[22:05:44] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
[22:05:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:05:45] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[22:05:47] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
[22:05:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:05:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:05:53] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1112 (T298565)', diff saved to https://phabricator.wikimedia.org/P26083 and previous config saved to /var/cache/conftool/dbconfig/20220421-220552-ladsgroup.json
[22:05:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:05:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:06:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:10:37] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P26084 and previous config saved to /var/cache/conftool/dbconfig/20220421-221037-ladsgroup.json
[22:10:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:13:12] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1096:3316', diff saved to https://phabricator.wikimedia.org/P26085 and previous config saved to /var/cache/conftool/dbconfig/20220421-221312-ladsgroup.json
[22:13:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:14:30] <wikibugs>	 (PS1) Dzahn: docker: ensure apparmor package is installed if on bullseye [puppet] - https://gerrit.wikimedia.org/r/785226
[22:15:03] <wikibugs>	 (PS2) Dzahn: docker: ensure apparmor package is installed if on bullseye [puppet] - https://gerrit.wikimedia.org/r/785226
[22:15:14] <wikibugs>	 (PS3) Dzahn: docker: ensure apparmor package is installed if on bullseye [puppet] - https://gerrit.wikimedia.org/r/785226
[22:15:20] <wikibugs>	 (PS4) Dzahn: docker: ensure apparmor package is installed if on bullseye [puppet] - https://gerrit.wikimedia.org/r/785226
[22:16:46] <wikibugs>	 (CR) jerkins-bot: [V: -1] docker: ensure apparmor package is installed if on bullseye [puppet] - https://gerrit.wikimedia.org/r/785226 (owner: Dzahn)
[22:17:28] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1136 (T298565)', diff saved to https://phabricator.wikimedia.org/P26086 and previous config saved to /var/cache/conftool/dbconfig/20220421-221728-ladsgroup.json
[22:17:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:17:34] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[22:18:19] <wikibugs>	 (PS5) Dzahn: docker: ensure apparmor package is installed if on bullseye [puppet] - https://gerrit.wikimedia.org/r/785226
[22:21:17] <wikibugs>	 SRE, DC-Ops: Confirm support of PERC 750 raid controller - https://phabricator.wikimedia.org/T297913 (RobH) a:Cmjohnson→RobH Update:  Chris pulled the offline SSD and I confirmed OS saw it go away, then after 5 minutes put it back into place and the system detected it and started an automatic reb...
[22:22:32] <wikibugs>	 (CR) Dzahn: "https://puppet-compiler.wmflabs.org/pcc-worker1002/34948/" [puppet] - https://gerrit.wikimedia.org/r/785226 (owner: Dzahn)
[22:24:38] <wikibugs>	 (PS1) Dzahn: gitlab::runner: if on buster, ensure apparmor package is installed [puppet] - https://gerrit.wikimedia.org/r/785228 (https://phabricator.wikimedia.org/T297659)
[22:25:34] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1112 (T298565)', diff saved to https://phabricator.wikimedia.org/P26087 and previous config saved to /var/cache/conftool/dbconfig/20220421-222534-ladsgroup.json
[22:25:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:25:41] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[22:25:42] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1129 (T298565)', diff saved to https://phabricator.wikimedia.org/P26088 and previous config saved to /var/cache/conftool/dbconfig/20220421-222542-ladsgroup.json
[22:25:44] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
[22:25:45] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
[22:25:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:25:50] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1162 (T298565)', diff saved to https://phabricator.wikimedia.org/P26089 and previous config saved to /var/cache/conftool/dbconfig/20220421-222550-ladsgroup.json
[22:25:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:25:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:26:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:28:17] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1096:3316', diff saved to https://phabricator.wikimedia.org/P26090 and previous config saved to /var/cache/conftool/dbconfig/20220421-222817-ladsgroup.json
[22:28:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:28:34] <wikibugs>	 (CR) Dzahn: [C: +2] "https://puppet-compiler.wmflabs.org/pcc-worker1002/34949/gitlab-runner2001.codfw.wmnet/index.html" [puppet] - https://gerrit.wikimedia.org/r/785228 (https://phabricator.wikimedia.org/T297659) (owner: Dzahn)
[22:30:56] <wikibugs>	 (PS1) Stang: Enable "upload_by_url" feature on zhwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/785229 (https://phabricator.wikimedia.org/T142991)
[22:32:33] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1136', diff saved to https://phabricator.wikimedia.org/P26091 and previous config saved to /var/cache/conftool/dbconfig/20220421-223233-ladsgroup.json
[22:32:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:32:53] <Juan_90264>	 Returned
[22:32:55] <jinxer-wm>	 (NodeTextfileStale) firing: (3) Stale textfile for elastic1075:9100 - https://wikitech.wikimedia.org/wiki/Prometheus#Stale_file_for_node-exporter_textfile - https://grafana.wikimedia.org/d/knkl4dCWz/node-exporter-textfile - https://alerts.wikimedia.org/?q=alertname%3DNodeTextfileStale
[22:33:51] <wikibugs>	 SRE, Wikimedia-Site-requests, Chinese-Sites, Patch-For-Review: Enable "upload by url" feature at zhwiki - https://phabricator.wikimedia.org/T142991 (Stang) Stalled→Open a:Stang
[22:33:58] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1162 (T298565)', diff saved to https://phabricator.wikimedia.org/P26092 and previous config saved to /var/cache/conftool/dbconfig/20220421-223357-ladsgroup.json
[22:34:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:34:03] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[22:34:51] <wikibugs>	 (CR) Dzahn: "I noticed when running puppet for the first time on a new host there are errors because profile::systemd::timesyncd tries to ensure file /" [puppet] - https://gerrit.wikimedia.org/r/730852 (owner: Jbond)
[22:36:06] <Juan_90264>	 thcipriani: I'll leave it for another backport window, thanks for taking the time on the first change! (Which unfortunately was later reverted, to add what was missing)
[22:36:33] <wikibugs>	 (CR) Dzahn: standard::ntp: move standard ntp to its own profile (1 comment) [puppet] - https://gerrit.wikimedia.org/r/730852 (owner: Jbond)
[22:37:39] <Juan_90264>	 Goodbye and good morning, good afternoon or good night!
[22:40:39] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P26093 and previous config saved to /var/cache/conftool/dbconfig/20220421-224039-ladsgroup.json
[22:40:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:41:52] <wikibugs>	 SRE, DC-Ops: Confirm support of PERC 750 raid controller - https://phabricator.wikimedia.org/T297913 (RobH)
[22:42:07] <wikibugs>	 SRE, DC-Ops: Confirm support of PERC 750 raid controller - https://phabricator.wikimedia.org/T297913 (RobH)
[22:43:22] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1096:3316 (T306560)', diff saved to https://phabricator.wikimedia.org/P26094 and previous config saved to /var/cache/conftool/dbconfig/20220421-224322-ladsgroup.json
[22:43:23] <wikibugs>	 SRE, DC-Ops: Confirm support of PERC 750 raid controller - https://phabricator.wikimedia.org/T297913 (RobH) Todo:  * test on other distros we use * get partman to work with this, as our existing recipes expect the flexbays to be the SDA virtual drive and the new controller always puts them at a higher ID...
[22:43:23] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
[22:43:25] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
[22:43:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:43:28] <stashbot>	 T306560: Fix nullability of img_major_mime and oi_major_mime - https://phabricator.wikimedia.org/T306560
[22:43:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:43:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:43:44] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db2129.codfw.wmnet with reason: Maintenance
[22:43:46] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2129.codfw.wmnet with reason: Maintenance
[22:43:47] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 12:00:00 on 8 hosts with reason: Maintenance
[22:43:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:43:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:43:53] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 8 hosts with reason: Maintenance
[22:43:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:43:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:44:13] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
[22:44:14] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
[22:44:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:44:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:44:31] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
[22:44:32] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
[22:44:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:44:37] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1098:3316 (T306560)', diff saved to https://phabricator.wikimedia.org/P26095 and previous config saved to /var/cache/conftool/dbconfig/20220421-224437-ladsgroup.json
[22:44:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:44:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:46:57] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1098:3316 (T306560)', diff saved to https://phabricator.wikimedia.org/P26096 and previous config saved to /var/cache/conftool/dbconfig/20220421-224657-ladsgroup.json
[22:47:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:47:38] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1136', diff saved to https://phabricator.wikimedia.org/P26097 and previous config saved to /var/cache/conftool/dbconfig/20220421-224738-ladsgroup.json
[22:47:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:49:03] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P26098 and previous config saved to /var/cache/conftool/dbconfig/20220421-224902-ladsgroup.json
[22:49:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:52:16] <mutante>	 !log gitlab - deleting runner 'ubuntu..something' that has been offline for 2 months, not sure who made it
[22:52:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:53:48] <wikibugs>	 SRE, ops-codfw, DC-Ops, cloud-services-team (Kanban): Decom cloudservices200[2,3]-dev.wikimedia.org - https://phabricator.wikimedia.org/T306669 (Andrew)
[22:54:03] <wikibugs>	 SRE, ops-codfw, DC-Ops, cloud-services-team (Kanban): Decom cloudservices200[2,3]-dev.wikimedia.org - https://phabricator.wikimedia.org/T306669 (Andrew) a:Papaul→Andrew
[22:55:44] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P26099 and previous config saved to /var/cache/conftool/dbconfig/20220421-225544-ladsgroup.json
[22:55:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:01:55] <jinxer-wm>	 (NodeTextfileStale) firing: Stale textfile for ms-be2067:9100 - https://wikitech.wikimedia.org/wiki/Prometheus#Stale_file_for_node-exporter_textfile - https://grafana.wikimedia.org/d/knkl4dCWz/node-exporter-textfile - https://alerts.wikimedia.org/?q=alertname%3DNodeTextfileStale
[23:02:02] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1098:3316', diff saved to https://phabricator.wikimedia.org/P26100 and previous config saved to /var/cache/conftool/dbconfig/20220421-230202-ladsgroup.json
[23:02:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:02:43] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1136 (T298565)', diff saved to https://phabricator.wikimedia.org/P26101 and previous config saved to /var/cache/conftool/dbconfig/20220421-230243-ladsgroup.json
[23:02:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:02:49] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[23:03:01] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1136.eqiad.wmnet with reason: Maintenance
[23:03:03] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1136.eqiad.wmnet with reason: Maintenance
[23:03:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:03:08] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1136 (T298565)', diff saved to https://phabricator.wikimedia.org/P26102 and previous config saved to /var/cache/conftool/dbconfig/20220421-230307-ladsgroup.json
[23:03:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:03:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:04:08] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P26103 and previous config saved to /var/cache/conftool/dbconfig/20220421-230408-ladsgroup.json
[23:04:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:10:50] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1112 (T298565)', diff saved to https://phabricator.wikimedia.org/P26104 and previous config saved to /var/cache/conftool/dbconfig/20220421-231049-ladsgroup.json
[23:10:51] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
[23:10:52] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
[23:10:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:10:56] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[23:10:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:11:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:17:07] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1098:3316', diff saved to https://phabricator.wikimedia.org/P26105 and previous config saved to /var/cache/conftool/dbconfig/20220421-231707-ladsgroup.json
[23:17:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:19:13] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1162 (T298565)', diff saved to https://phabricator.wikimedia.org/P26106 and previous config saved to /var/cache/conftool/dbconfig/20220421-231913-ladsgroup.json
[23:19:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:19:17] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
[23:19:17] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[23:19:19] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
[23:19:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:19:20] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 12:00:00 on 8 hosts with reason: Maintenance
[23:19:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:19:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:19:26] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 8 hosts with reason: Maintenance
[23:19:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:21:53] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1136 (T298565)', diff saved to https://phabricator.wikimedia.org/P26107 and previous config saved to /var/cache/conftool/dbconfig/20220421-232153-ladsgroup.json
[23:21:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:25:01] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
[23:25:03] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
[23:25:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:25:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:32:12] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1098:3316 (T306560)', diff saved to https://phabricator.wikimedia.org/P26108 and previous config saved to /var/cache/conftool/dbconfig/20220421-233212-ladsgroup.json
[23:32:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:32:17] <stashbot>	 T306560: Fix nullability of img_major_mime and oi_major_mime - https://phabricator.wikimedia.org/T306560
[23:36:58] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1136', diff saved to https://phabricator.wikimedia.org/P26109 and previous config saved to /var/cache/conftool/dbconfig/20220421-233658-ladsgroup.json
[23:37:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:52:04] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1136', diff saved to https://phabricator.wikimedia.org/P26110 and previous config saved to /var/cache/conftool/dbconfig/20220421-235203-ladsgroup.json
[23:52:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:58:07] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1175.eqiad.wmnet with reason: Maintenance
[23:58:09] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1175.eqiad.wmnet with reason: Maintenance
[23:58:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:58:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:58:14] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1175 (T298565)', diff saved to https://phabricator.wikimedia.org/P26111 and previous config saved to /var/cache/conftool/dbconfig/20220421-235814-ladsgroup.json
[23:58:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:58:18] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[23:59:13] <jinxer-wm>	 (KubernetesRsyslogDown) firing: (2) rsyslog on kubestagemaster1001:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown