[00:00:57] RECOVERY - Check systemd state on ml-serve-ctrl1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:03:26] SRE, Maps: Allow Wikimedia Maps usage on bbcrewind.co.uk - https://phabricator.wikimedia.org/T297968 (awight) Awkwardly, I went to bbcrewind.co.uk to get an idea of whether they're running MediaWiki and generally how they plan to host Kartotherian-backed maps, but I'm served a page explaining that it's f...
[00:22:33] PROBLEM - Some MediaWiki servers are running out of idle PHP-FPM workers in api_appserver at eqiad on alert1001 is CRITICAL: 0.629 gt 0.3 https://bit.ly/wmf-fpmsat https://grafana.wikimedia.org/d/fRn9VEPMz/application-servers-use-dashboard-wip?orgId=1
[00:26:54] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23143 and previous config saved to /var/cache/conftool/dbconfig/20220326-002653-ladsgroup.json
[00:26:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:26:59] T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[00:29:17] RECOVERY - Some MediaWiki servers are running out of idle PHP-FPM workers in api_appserver at eqiad on alert1001 is OK: All metrics within thresholds.
https://bit.ly/wmf-fpmsat https://grafana.wikimedia.org/d/fRn9VEPMz/application-servers-use-dashboard-wip?orgId=1
[00:41:59] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P23144 and previous config saved to /var/cache/conftool/dbconfig/20220326-004159-ladsgroup.json
[00:42:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:57:04] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P23145 and previous config saved to /var/cache/conftool/dbconfig/20220326-005704-ladsgroup.json
[00:57:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:09:13] PROBLEM - WDQS SPARQL on wdqs2002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Runbook
[01:12:09] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23146 and previous config saved to /var/cache/conftool/dbconfig/20220326-011209-ladsgroup.json
[01:12:10] !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1129.eqiad.wmnet with reason: Maintenance
[01:12:12] !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1129.eqiad.wmnet with reason: Maintenance
[01:12:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:12:15] T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[01:12:17] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1129 (T298565)', diff saved to
https://phabricator.wikimedia.org/P23147 and previous config saved to /var/cache/conftool/dbconfig/20220326-011216-ladsgroup.json
[01:12:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:12:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:12:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:12:45] (PS1) Sharvaniharan: Config for new android schemas [mediawiki-config] - https://gerrit.wikimedia.org/r/773896
[01:15:50] (PS2) Sharvaniharan: Event stream config for new android schemas [mediawiki-config] - https://gerrit.wikimedia.org/r/773896 (https://phabricator.wikimedia.org/T304336)
[01:16:42] (CR) Sharvaniharan: "Please review when you get a chance you get a chance :)" [mediawiki-config] - https://gerrit.wikimedia.org/r/773896 (https://phabricator.wikimedia.org/T304336) (owner: Sharvaniharan)
[01:16:53] (CR) jerkins-bot: [V: -1] Event stream config for new android schemas [mediawiki-config] - https://gerrit.wikimedia.org/r/773896 (https://phabricator.wikimedia.org/T304336) (owner: Sharvaniharan)
[01:38:45] (JobUnavailable) firing: (2) Reduced availability for job sidekiq in codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[01:43:45] (JobUnavailable) firing: (2) Reduced availability for job sidekiq in codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[02:12:31] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1129 (T298565)', diff saved to https://phabricator.wikimedia.org/P23148 and previous config saved to /var/cache/conftool/dbconfig/20220326-021231-ladsgroup.json
[02:12:37] Logged the message at
https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:12:38] T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[02:27:36] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P23149 and previous config saved to /var/cache/conftool/dbconfig/20220326-022736-ladsgroup.json
[02:27:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:42:42] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P23150 and previous config saved to /var/cache/conftool/dbconfig/20220326-024241-ladsgroup.json
[02:42:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:43:27] SRE, SRE-OnFire (FY2021/2022-Q3), Infrastructure-Foundations, SRE Observability (FY2021/2022-Q3): Implement an accurate and easy to understand status page for all wikis - https://phabricator.wikimedia.org/T202061 (lmata) Thanks @cdanis!
also +1 to the webhook/slack proposal
[02:57:47] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1129 (T298565)', diff saved to https://phabricator.wikimedia.org/P23151 and previous config saved to /var/cache/conftool/dbconfig/20220326-025746-ladsgroup.json
[02:57:48] !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
[02:57:50] !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
[02:57:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:57:52] T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[02:57:55] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1146:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23152 and previous config saved to /var/cache/conftool/dbconfig/20220326-025754-ladsgroup.json
[02:57:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:57:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:58:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:36:21] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23153 and previous config saved to /var/cache/conftool/dbconfig/20220326-033621-ladsgroup.json
[03:36:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:36:27] T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token,
user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[03:51:26] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P23154 and previous config saved to /var/cache/conftool/dbconfig/20220326-035126-ladsgroup.json
[03:51:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:06:31] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P23155 and previous config saved to /var/cache/conftool/dbconfig/20220326-040631-ladsgroup.json
[04:06:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:21:37] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23156 and previous config saved to /var/cache/conftool/dbconfig/20220326-042136-ladsgroup.json
[04:21:40] !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
[04:21:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:21:42] !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
[04:21:42] T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[04:21:43] !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 12:00:00 on 8 hosts with reason: Maintenance
[04:21:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:21:49] !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime
(exit_code=0) for 12:00:00 on 8 hosts with reason: Maintenance
[04:21:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:21:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:21:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:27:34] !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
[04:27:35] !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
[04:27:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:27:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:49:46] !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
[04:49:48] !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
[04:49:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:49:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:11:29] !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
[05:11:31] !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
[05:11:32] !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
[05:11:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:11:35] !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
[05:11:36] Logged the message at
https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:11:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:11:40] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1156 (T298565)', diff saved to https://phabricator.wikimedia.org/P23157 and previous config saved to /var/cache/conftool/dbconfig/20220326-051140-ladsgroup.json
[05:11:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:11:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:11:47] T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[05:36:08] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1156 (T298565)', diff saved to https://phabricator.wikimedia.org/P23158 and previous config saved to /var/cache/conftool/dbconfig/20220326-053607-ladsgroup.json
[05:36:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:36:14] T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[05:44:00] (JobUnavailable) firing: Reduced availability for job trafficserver in eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[05:51:13] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P23159 and previous config saved to
/var/cache/conftool/dbconfig/20220326-055113-ladsgroup.json
[05:51:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:01:40] (NodeTextfileStale) firing: Stale textfile for ms-be2067:9100 - https://wikitech.wikimedia.org/wiki/Prometheus#Stale_file_for_node-exporter_textfile - https://grafana.wikimedia.org/d/knkl4dCWz/node-exporter-textfile - https://alerts.wikimedia.org/?q=alertname%3DNodeTextfileStale
[06:06:18] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P23160 and previous config saved to /var/cache/conftool/dbconfig/20220326-060618-ladsgroup.json
[06:06:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:21:23] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1156 (T298565)', diff saved to https://phabricator.wikimedia.org/P23161 and previous config saved to /var/cache/conftool/dbconfig/20220326-062123-ladsgroup.json
[06:21:25] !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
[06:21:26] !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
[06:21:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:21:29] T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[06:21:31] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1162 (T298565)', diff saved to https://phabricator.wikimedia.org/P23162 and previous config saved to /var/cache/conftool/dbconfig/20220326-062131-ladsgroup.json
[06:21:32] Logged the message at
https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:21:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:21:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:41:39] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1162 (T298565)', diff saved to https://phabricator.wikimedia.org/P23163 and previous config saved to /var/cache/conftool/dbconfig/20220326-064139-ladsgroup.json
[06:41:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:41:46] T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[06:56:44] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P23164 and previous config saved to /var/cache/conftool/dbconfig/20220326-065644-ladsgroup.json
[06:56:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:00:04] Deploy window No deploys all day! See Deployments/Emergencies if things are broken. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220326T0700)
[07:04:00] (PS5) NguoiDungKhongDinhDanh: Fix I7ce58529cdd320a9500dc215291ef1c369cee9d3: Rearranging restriction levels and add editautopatrolprotected for eliminators. [mediawiki-config] - https://gerrit.wikimedia.org/r/773320 (https://phabricator.wikimedia.org/T303579)
[07:07:16] (CR) NguoiDungKhongDinhDanh: Fix I7ce58529cdd320a9500dc215291ef1c369cee9d3: Rearranging restriction levels and add editautopatrolprotected for eliminators.
(1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/773320 (https://phabricator.wikimedia.org/T303579) (owner: NguoiDungKhongDinhDanh)
[07:11:49] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P23165 and previous config saved to /var/cache/conftool/dbconfig/20220326-071149-ladsgroup.json
[07:11:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:26:54] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1162 (T298565)', diff saved to https://phabricator.wikimedia.org/P23166 and previous config saved to /var/cache/conftool/dbconfig/20220326-072654-ladsgroup.json
[07:26:56] !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
[07:26:57] !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
[07:26:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:27:00] T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[07:27:02] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1170:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23167 and previous config saved to /var/cache/conftool/dbconfig/20220326-072702-ladsgroup.json
[07:27:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:27:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:27:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:52:16] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance
db1170:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23168 and previous config saved to /var/cache/conftool/dbconfig/20220326-075215-ladsgroup.json
[07:52:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:52:21] T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[08:07:21] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P23169 and previous config saved to /var/cache/conftool/dbconfig/20220326-080720-ladsgroup.json
[08:07:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:22:26] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P23170 and previous config saved to /var/cache/conftool/dbconfig/20220326-082225-ladsgroup.json
[08:22:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:37:31] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23171 and previous config saved to /var/cache/conftool/dbconfig/20220326-083731-ladsgroup.json
[08:37:32] !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
[08:37:34] !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
[08:37:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:37:36] T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token,
user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[08:37:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:37:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:59:32] !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
[08:59:33] !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
[08:59:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:59:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:59:38] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1182 (T298565)', diff saved to https://phabricator.wikimedia.org/P23172 and previous config saved to /var/cache/conftool/dbconfig/20220326-085938-ladsgroup.json
[08:59:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:59:43] T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[09:23:56] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1182 (T298565)', diff saved to https://phabricator.wikimedia.org/P23173 and previous config saved to /var/cache/conftool/dbconfig/20220326-092355-ladsgroup.json
[09:24:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:24:02] T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched,
user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[09:39:01] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P23174 and previous config saved to /var/cache/conftool/dbconfig/20220326-093900-ladsgroup.json
[09:39:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:44:00] (JobUnavailable) firing: Reduced availability for job trafficserver in eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[09:54:06] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P23175 and previous config saved to /var/cache/conftool/dbconfig/20220326-095405-ladsgroup.json
[09:54:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:01:55] (NodeTextfileStale) firing: Stale textfile for ms-be2067:9100 - https://wikitech.wikimedia.org/wiki/Prometheus#Stale_file_for_node-exporter_textfile - https://grafana.wikimedia.org/d/knkl4dCWz/node-exporter-textfile - https://alerts.wikimedia.org/?q=alertname%3DNodeTextfileStale
[10:09:11] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1182 (T298565)', diff saved to https://phabricator.wikimedia.org/P23176 and previous config saved to /var/cache/conftool/dbconfig/20220326-100911-ladsgroup.json
[10:09:12] !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
[10:09:14] !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
[10:09:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:09:17] T298565: Fix mismatching field type
of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[10:09:19] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1105:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23177 and previous config saved to /var/cache/conftool/dbconfig/20220326-100918-ladsgroup.json
[10:09:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:09:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:09:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:09:38] (PS1) Majavah: hieradata: remove bastion-eqiad1-01,02 [puppet] - https://gerrit.wikimedia.org/r/773921
[10:35:59] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23178 and previous config saved to /var/cache/conftool/dbconfig/20220326-103559-ladsgroup.json
[10:36:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:36:04] T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[10:51:04] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P23179 and previous config saved to /var/cache/conftool/dbconfig/20220326-105104-ladsgroup.json
[10:51:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:02:31] PROBLEM - Some MediaWiki servers are running out of idle PHP-FPM workers in api_appserver at eqiad on alert1001 is CRITICAL:
0.3387 gt 0.3 https://bit.ly/wmf-fpmsat https://grafana.wikimedia.org/d/fRn9VEPMz/application-servers-use-dashboard-wip?orgId=1
[11:06:09] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P23180 and previous config saved to /var/cache/conftool/dbconfig/20220326-110609-ladsgroup.json
[11:06:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:21:14] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23181 and previous config saved to /var/cache/conftool/dbconfig/20220326-112114-ladsgroup.json
[11:21:16] !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1129.eqiad.wmnet with reason: Maintenance
[11:21:17] !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1129.eqiad.wmnet with reason: Maintenance
[11:21:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:21:20] T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[11:21:22] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1129 (T298565)', diff saved to https://phabricator.wikimedia.org/P23182 and previous config saved to /var/cache/conftool/dbconfig/20220326-112122-ladsgroup.json
[11:21:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:21:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:21:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:35:35] RECOVERY - Some MediaWiki servers are running out of idle PHP-FPM workers in api_appserver at eqiad on
alert1001 is OK: (C)0.3 gt (W)0.1 gt 0.01613 https://bit.ly/wmf-fpmsat https://grafana.wikimedia.org/d/fRn9VEPMz/application-servers-use-dashboard-wip?orgId=1
[12:04:14] (PS7) Daimona Eaytoy: Relax CSP rules for taint-check-demo [puppet] - https://gerrit.wikimedia.org/r/680337 (https://phabricator.wikimedia.org/T257301)
[12:05:04] (CR) Daimona Eaytoy: Relax CSP rules for taint-check-demo (1 comment) [puppet] - https://gerrit.wikimedia.org/r/680337 (https://phabricator.wikimedia.org/T257301) (owner: Daimona Eaytoy)
[12:21:38] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1129 (T298565)', diff saved to https://phabricator.wikimedia.org/P23183 and previous config saved to /var/cache/conftool/dbconfig/20220326-122136-ladsgroup.json
[12:21:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:21:46] T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[12:36:43] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P23184 and previous config saved to /var/cache/conftool/dbconfig/20220326-123643-ladsgroup.json
[12:36:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:51:48] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P23185 and previous config saved to /var/cache/conftool/dbconfig/20220326-125148-ladsgroup.json
[12:51:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:00:44] (CR) Andrew Bogott: [C: +2] hieradata: remove bastion-eqiad1-01,02 [puppet] - https://gerrit.wikimedia.org/r/773921 (owner: Majavah)
[13:06:53]
!log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1129 (T298565)', diff saved to https://phabricator.wikimedia.org/P23186 and previous config saved to /var/cache/conftool/dbconfig/20220326-130653-ladsgroup.json [13:06:55] !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance [13:06:56] !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance [13:06:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:06:59] T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565 [13:07:01] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1105:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23187 and previous config saved to /var/cache/conftool/dbconfig/20220326-130701-ladsgroup.json [13:07:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:07:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:07:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:08:23] PROBLEM - SSH on db2090.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook [13:27:57] PROBLEM - SSH on aqs1009.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook [13:33:49] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23188 and previous config saved to 
/var/cache/conftool/dbconfig/20220326-133349-ladsgroup.json [13:33:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:33:55] T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565 [13:44:00] (JobUnavailable) firing: Reduced availability for job trafficserver in eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [13:48:54] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P23189 and previous config saved to /var/cache/conftool/dbconfig/20220326-134854-ladsgroup.json [13:48:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:54:55] PROBLEM - Query Service HTTP Port on wdqs2002 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 298 bytes in 0.002 second response time https://wikitech.wikimedia.org/wiki/Wikidata_query_service [13:59:09] PROBLEM - SSH on thumbor2004.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook [14:01:55] (NodeTextfileStale) firing: Stale textfile for ms-be2067:9100 - https://wikitech.wikimedia.org/wiki/Prometheus#Stale_file_for_node-exporter_textfile - https://grafana.wikimedia.org/d/knkl4dCWz/node-exporter-textfile - https://alerts.wikimedia.org/?q=alertname%3DNodeTextfileStale [14:03:59] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P23190 and previous config saved to 
/var/cache/conftool/dbconfig/20220326-140359-ladsgroup.json [14:04:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:08:06] (03PS1) 10Majavah: kubeadm: label nodes with nfs mounts [puppet] - 10https://gerrit.wikimedia.org/r/773933 (https://phabricator.wikimedia.org/T304708) [14:09:05] (03CR) 10Majavah: [V: 03+1] "PCC SUCCESS (DIFF 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/34571/console" [puppet] - 10https://gerrit.wikimedia.org/r/773933 (https://phabricator.wikimedia.org/T304708) (owner: 10Majavah) [14:19:05] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23191 and previous config saved to /var/cache/conftool/dbconfig/20220326-141904-ladsgroup.json [14:19:06] !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance [14:19:07] !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance [14:19:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:19:11] T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565 [14:19:12] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1182 (T298565)', diff saved to https://phabricator.wikimedia.org/P23192 and previous config saved to /var/cache/conftool/dbconfig/20220326-141912-ladsgroup.json [14:19:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:19:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:19:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log 
[14:28:55] RECOVERY - SSH on aqs1009.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[14:43:21] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1182 (T298565)', diff saved to https://phabricator.wikimedia.org/P23193 and previous config saved to /var/cache/conftool/dbconfig/20220326-144320-ladsgroup.json
[14:43:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:43:26] T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[14:58:26] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P23194 and previous config saved to /var/cache/conftool/dbconfig/20220326-145825-ladsgroup.json
[14:58:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:00:17] RECOVERY - SSH on thumbor2004.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[15:13:31] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P23195 and previous config saved to /var/cache/conftool/dbconfig/20220326-151330-ladsgroup.json
[15:13:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:28:36] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1182 (T298565)', diff saved to https://phabricator.wikimedia.org/P23196 and previous config saved to /var/cache/conftool/dbconfig/20220326-152835-ladsgroup.json
[15:28:37] !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
[15:28:39] !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
[15:28:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:28:42] T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[15:28:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:28:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:30:53] SRE-swift-storage, Commons, MediaWiki-File-management, MediaWiki-extensions-PagedTiffHandler, Thumbor: Specific Thumbnail generation for a broken invalid TIF file not working anymore - https://phabricator.wikimedia.org/T133175 (Stang) Another file [[ https://commons.wikimedia.org/wiki/File:S...
[15:50:19] !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
[15:50:21] !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
[15:50:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:50:26] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1170:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23197 and previous config saved to /var/cache/conftool/dbconfig/20220326-155025-ladsgroup.json
[15:50:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:50:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:50:31] T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[15:54:55] (PS1) Ladsgroup: Enable videojs in the second batch of wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/773938 (https://phabricator.wikimedia.org/T248418)
[16:00:16] !log start of mwscript maintenance/migrateLinksTable.php --wiki enwiki --table templatelinks --sleep 2 on beta cluster (T299424)
[16:00:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:00:21] T299424: Run maintenance script backfilling tl_title_id - https://phabricator.wikimedia.org/T299424
[16:06:39] SRE, Wikimedia-Mailing-lists: Email spam from varying tawk.email addresses - https://phabricator.wikimedia.org/T304390 (Ladsgroup) `.+\.tawk\.email$` is a valid regex :/ IIRC mm2 had something to imply that it's a regex. Can you try `^.+\.tawk\.email$`?
[16:11:22] RECOVERY - SSH on db2090.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[16:15:24] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23198 and previous config saved to /var/cache/conftool/dbconfig/20220326-161523-ladsgroup.json
[16:15:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:15:30] T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[16:30:29] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P23199 and previous config saved to /var/cache/conftool/dbconfig/20220326-163029-ladsgroup.json
[16:30:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:37:34] PROBLEM - SSH on aqs1008.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[16:39:36] (CR) Krinkle: [C: +1] Relax CSP rules for taint-check-demo [puppet] - https://gerrit.wikimedia.org/r/680337 (https://phabricator.wikimedia.org/T257301) (owner: Daimona Eaytoy)
[16:45:34] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P23200 and previous config saved to /var/cache/conftool/dbconfig/20220326-164534-ladsgroup.json
[16:45:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:00:39] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23201 and previous config saved to /var/cache/conftool/dbconfig/20220326-170039-ladsgroup.json
[17:00:40] !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
[17:00:42] !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
[17:00:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:00:46] T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[17:00:47] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1162 (T298565)', diff saved to https://phabricator.wikimedia.org/P23202 and previous config saved to /var/cache/conftool/dbconfig/20220326-170047-ladsgroup.json
[17:00:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:00:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:00:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:06:14] PROBLEM - Check systemd state on ms-be1064 is CRITICAL: CRITICAL - degraded: The following units failed: swift-drive-audit.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[17:07:46] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1162 (T298565)', diff saved to https://phabricator.wikimedia.org/P23203 and previous config saved to /var/cache/conftool/dbconfig/20220326-170745-ladsgroup.json
[17:07:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:07:51] T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[17:22:52] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P23204 and previous config saved to /var/cache/conftool/dbconfig/20220326-172250-ladsgroup.json
[17:22:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:37:58] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P23205 and previous config saved to /var/cache/conftool/dbconfig/20220326-173757-ladsgroup.json
[17:38:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:38:42] RECOVERY - SSH on aqs1008.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[17:44:00] (JobUnavailable) firing: Reduced availability for job trafficserver in eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[17:53:03] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1162 (T298565)', diff saved to https://phabricator.wikimedia.org/P23206 and previous config saved to /var/cache/conftool/dbconfig/20220326-175302-ladsgroup.json
[17:53:05] !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
[17:53:06] !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
[17:53:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:53:07] !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
[17:53:08] T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[17:53:11] !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
[17:53:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:53:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:53:16] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1156 (T298565)', diff saved to https://phabricator.wikimedia.org/P23207 and previous config saved to /var/cache/conftool/dbconfig/20220326-175315-ladsgroup.json
[17:53:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:53:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:53:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:01:46] RECOVERY - Check systemd state on ms-be1064 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[18:01:55] (NodeTextfileStale) firing: Stale textfile for ms-be2067:9100 - https://wikitech.wikimedia.org/wiki/Prometheus#Stale_file_for_node-exporter_textfile - https://grafana.wikimedia.org/d/knkl4dCWz/node-exporter-textfile - https://alerts.wikimedia.org/?q=alertname%3DNodeTextfileStale
[18:17:29] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1156 (T298565)', diff saved to https://phabricator.wikimedia.org/P23208 and previous config saved to /var/cache/conftool/dbconfig/20220326-181729-ladsgroup.json
[18:17:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:17:34] T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[18:32:34] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P23209 and previous config saved to /var/cache/conftool/dbconfig/20220326-183234-ladsgroup.json
[18:32:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:47:39] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P23210 and previous config saved to /var/cache/conftool/dbconfig/20220326-184739-ladsgroup.json
[18:47:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:02:44] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1156 (T298565)', diff saved to https://phabricator.wikimedia.org/P23211 and previous config saved to /var/cache/conftool/dbconfig/20220326-190244-ladsgroup.json
[19:02:46] !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
[19:02:47] !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
[19:02:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:02:50] T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[19:02:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:02:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:15:04] PROBLEM - SSH on db2090.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[19:24:48] !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
[19:24:49] !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
[19:24:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:24:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:46:46] !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
[19:46:47] !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
[19:46:48] !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 12:00:00 on 8 hosts with reason: Maintenance
[19:46:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:46:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:46:54] !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 8 hosts with reason: Maintenance
[19:46:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:46:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:52:39] !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
[19:52:41] !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
[19:52:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:52:46] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1146:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23212 and previous config saved to /var/cache/conftool/dbconfig/20220326-195245-ladsgroup.json
[19:52:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:52:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:52:52] T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[20:18:54] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23213 and previous config saved to /var/cache/conftool/dbconfig/20220326-201854-ladsgroup.json
[20:18:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:19:00] T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[20:27:32] PROBLEM - Some MediaWiki servers are running out of idle PHP-FPM workers in api_appserver at eqiad on alert1001 is CRITICAL: 0.371 gt 0.3 https://bit.ly/wmf-fpmsat https://grafana.wikimedia.org/d/fRn9VEPMz/application-servers-use-dashboard-wip?orgId=1
[20:33:59] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P23214 and previous config saved to /var/cache/conftool/dbconfig/20220326-203359-ladsgroup.json
[20:34:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:45:28] PROBLEM - Some MediaWiki servers are running out of idle PHP-FPM workers in api_appserver at eqiad on alert1001 is CRITICAL: 0.371 gt 0.3 https://bit.ly/wmf-fpmsat https://grafana.wikimedia.org/d/fRn9VEPMz/application-servers-use-dashboard-wip?orgId=1
[20:49:04] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P23216 and previous config saved to /var/cache/conftool/dbconfig/20220326-204904-ladsgroup.json
[20:49:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:49:56] PROBLEM - Some MediaWiki servers are running out of idle PHP-FPM workers in api_appserver at eqiad on alert1001 is CRITICAL: 0.4194 gt 0.3 https://bit.ly/wmf-fpmsat https://grafana.wikimedia.org/d/fRn9VEPMz/application-servers-use-dashboard-wip?orgId=1
[20:51:39] SRE, Wikimedia-Mailing-lists: Mailman3: 550-Support for list subscription via email has been disabled. - https://phabricator.wikimedia.org/T303888 (Ladsgroup) I looked at the default templates (in `modules/mailman3/files/templates`) and couldn't find anything advertising subscription via email. Can you t...
[21:04:10] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23217 and previous config saved to /var/cache/conftool/dbconfig/20220326-210409-ladsgroup.json
[21:04:11] !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1129.eqiad.wmnet with reason: Maintenance
[21:04:12] !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1129.eqiad.wmnet with reason: Maintenance
[21:04:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:04:16] T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[21:04:18] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1129 (T298565)', diff saved to https://phabricator.wikimedia.org/P23218 and previous config saved to /var/cache/conftool/dbconfig/20220326-210417-ladsgroup.json
[21:04:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:04:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:04:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:27:18] RECOVERY - Some MediaWiki servers are running out of idle PHP-FPM workers in api_appserver at eqiad on alert1001 is OK: (C)0.3 gt (W)0.1 gt 0.08065 https://bit.ly/wmf-fpmsat https://grafana.wikimedia.org/d/fRn9VEPMz/application-servers-use-dashboard-wip?orgId=1
[21:44:00] (JobUnavailable) firing: Reduced availability for job trafficserver in eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[21:47:26] PROBLEM - Some MediaWiki servers are running out of idle PHP-FPM workers in api_appserver at eqiad on alert1001 is CRITICAL: 0.3871 gt 0.3 https://bit.ly/wmf-fpmsat https://grafana.wikimedia.org/d/fRn9VEPMz/application-servers-use-dashboard-wip?orgId=1
[22:00:46] PROBLEM - Some MediaWiki servers are running out of idle PHP-FPM workers in api_appserver at eqiad on alert1001 is CRITICAL: 0.3226 gt 0.3 https://bit.ly/wmf-fpmsat https://grafana.wikimedia.org/d/fRn9VEPMz/application-servers-use-dashboard-wip?orgId=1
[22:01:55] (NodeTextfileStale) firing: Stale textfile for ms-be2067:9100 - https://wikitech.wikimedia.org/wiki/Prometheus#Stale_file_for_node-exporter_textfile - https://grafana.wikimedia.org/d/knkl4dCWz/node-exporter-textfile - https://alerts.wikimedia.org/?q=alertname%3DNodeTextfileStale
[22:04:32] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1129 (T298565)', diff saved to https://phabricator.wikimedia.org/P23219 and previous config saved to /var/cache/conftool/dbconfig/20220326-220432-ladsgroup.json
[22:04:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:04:39] T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[22:08:21] SRE, Wikimedia-Mailing-lists: Mailman3: 550-Support for list subscription via email has been disabled. - https://phabricator.wikimedia.org/T303888 (Urbanecm) >>! In T303888#7808476, @Ladsgroup wrote: > I looked at the default templates (in `modules/mailman3/files/templates`) and couldn't find anything ad...
[22:19:37] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P23220 and previous config saved to /var/cache/conftool/dbconfig/20220326-221937-ladsgroup.json
[22:19:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:22:48] (PS2) Krinkle: tests: Remove leftover wmfConfigDir global [mediawiki-config] - https://gerrit.wikimedia.org/r/769757
[22:23:08] PROBLEM - Some MediaWiki servers are running out of idle PHP-FPM workers in api_appserver at eqiad on alert1001 is CRITICAL: 0.371 gt 0.3 https://bit.ly/wmf-fpmsat https://grafana.wikimedia.org/d/fRn9VEPMz/application-servers-use-dashboard-wip?orgId=1
[22:25:22] RECOVERY - Some MediaWiki servers are running out of idle PHP-FPM workers in api_appserver at eqiad on alert1001 is OK: (C)0.3 gt (W)0.1 gt 0.09677 https://bit.ly/wmf-fpmsat https://grafana.wikimedia.org/d/fRn9VEPMz/application-servers-use-dashboard-wip?orgId=1
[22:30:44] (CR) Krinkle: [C: +2] tests: Remove leftover wmfConfigDir global [mediawiki-config] - https://gerrit.wikimedia.org/r/769757 (owner: Krinkle)
[22:31:25] (Merged) jenkins-bot: tests: Remove leftover wmfConfigDir global [mediawiki-config] - https://gerrit.wikimedia.org/r/769757 (owner: Krinkle)
[22:32:08] PROBLEM - Some MediaWiki servers are running out of idle PHP-FPM workers in api_appserver at eqiad on alert1001 is CRITICAL: 0.3871 gt 0.3 https://bit.ly/wmf-fpmsat https://grafana.wikimedia.org/d/fRn9VEPMz/application-servers-use-dashboard-wip?orgId=1
[22:34:42] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P23221 and previous config saved to /var/cache/conftool/dbconfig/20220326-223442-ladsgroup.json
[22:34:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:37:06] !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[22:37:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:38:07] !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[22:38:08] !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[22:38:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:38:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:39:02] !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[22:39:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:49:47] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1129 (T298565)', diff saved to https://phabricator.wikimedia.org/P23222 and previous config saved to /var/cache/conftool/dbconfig/20220326-224947-ladsgroup.json
[22:49:49] !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
[22:49:50] !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
[22:49:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:49:53] T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[22:49:55] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1146:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23223 and previous config saved to /var/cache/conftool/dbconfig/20220326-224955-ladsgroup.json
[22:49:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:49:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:50:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:54:26] RECOVERY - Some MediaWiki servers are running out of idle PHP-FPM workers in api_appserver at eqiad on alert1001 is OK: (C)0.3 gt (W)0.1 gt 0.08065 https://bit.ly/wmf-fpmsat https://grafana.wikimedia.org/d/fRn9VEPMz/application-servers-use-dashboard-wip?orgId=1
[23:12:24] PROBLEM - Some MediaWiki servers are running out of idle PHP-FPM workers in api_appserver at eqiad on alert1001 is CRITICAL: 0.3226 gt 0.3 https://bit.ly/wmf-fpmsat https://grafana.wikimedia.org/d/fRn9VEPMz/application-servers-use-dashboard-wip?orgId=1
[23:15:07] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23224 and previous config saved to /var/cache/conftool/dbconfig/20220326-231507-ladsgroup.json
[23:15:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:15:13] T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[23:27:12] PROBLEM - SSH on ml-serve-ctrl1002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/SSH/monitoring
[23:27:58] (KubernetesCalicoDown) firing: ml-serve-ctrl1002.eqiad.wmnet:9091 is not running calico-node Pod - https://wikitech.wikimedia.org/wiki/Calico#Operations - https://alerts.wikimedia.org/?q=alertname%3DKubernetesCalicoDown
[23:28:06] RECOVERY - Some MediaWiki servers are running out of idle PHP-FPM workers in api_appserver at eqiad on alert1001 is OK: (C)0.3 gt (W)0.1 gt 0.06452 https://bit.ly/wmf-fpmsat https://grafana.wikimedia.org/d/fRn9VEPMz/application-servers-use-dashboard-wip?orgId=1
[23:28:45] (JobUnavailable) firing: (2) Reduced availability for job k8s-api in eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[23:30:13] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P23225 and previous config saved to /var/cache/conftool/dbconfig/20220326-233012-ladsgroup.json
[23:30:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:34:52] PROBLEM - Some MediaWiki servers are running out of idle PHP-FPM workers in api_appserver at eqiad on alert1001 is CRITICAL: 0.4032 gt 0.3 https://bit.ly/wmf-fpmsat https://grafana.wikimedia.org/d/fRn9VEPMz/application-servers-use-dashboard-wip?orgId=1
[23:35:58] (KubernetesRsyslogDown) firing: rsyslog on ml-serve1001:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[23:36:08] RECOVERY - SSH on ml-serve-ctrl1002 is OK: SSH OK - OpenSSH_7.9p1 Debian-10+deb10u2 (protocol 2.0) https://wikitech.wikimedia.org/wiki/SSH/monitoring
[23:39:22] PROBLEM - Some MediaWiki servers are running out of idle PHP-FPM workers in api_appserver at eqiad on alert1001 is CRITICAL: 0.3387 gt 0.3 https://bit.ly/wmf-fpmsat https://grafana.wikimedia.org/d/fRn9VEPMz/application-servers-use-dashboard-wip?orgId=1
[23:40:58] (KubernetesRsyslogDown) resolved: rsyslog on ml-serve1001:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[23:45:14] PROBLEM - SSH on ml-serve-ctrl1002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/SSH/monitoring
[23:45:18] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling
after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P23226 and previous config saved to /var/cache/conftool/dbconfig/20220326-234517-ladsgroup.json [23:45:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:46:08] RECOVERY - Some MediaWiki servers are running out of idle PHP-FPM workers in api_appserver at eqiad on alert1001 is OK: (C)0.3 gt (W)0.1 gt 0.06452 https://bit.ly/wmf-fpmsat https://grafana.wikimedia.org/d/fRn9VEPMz/application-servers-use-dashboard-wip?orgId=1