[00:02:13] <icinga-wm>	 RECOVERY - Check systemd state on ml-staging-ctrl2001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:02:16] <jinxer-wm>	 (BlazegraphJvmQuakeWarnGC) firing: Blazegraph instance wdqs1005:9100 is entering a GC death spiral - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DBlazegraphJvmQuakeWarnGC
[00:02:45] <icinga-wm>	 PROBLEM - Maps tiles generation on alert1001 is CRITICAL: CRITICAL: 100.00% of data under the critical threshold [5.0] https://wikitech.wikimedia.org/wiki/Maps/Runbook https://grafana.wikimedia.org/d/000000305/maps-performances?orgId=1&viewPanel=8
[00:03:03] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1169 (T298565)', diff saved to https://phabricator.wikimedia.org/P24350 and previous config saved to /var/cache/conftool/dbconfig/20220411-000302-ladsgroup.json
[00:03:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:03:08] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[00:04:17] <icinga-wm>	 PROBLEM - Widespread puppet agent failures- no resources reported on alert1001 is CRITICAL: 0.01002 ge 0.01 https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/yOxVDGvWk/puppet
[00:13:01] <icinga-wm>	 RECOVERY - Check systemd state on elastic1054 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:18:08] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P24351 and previous config saved to /var/cache/conftool/dbconfig/20220411-001807-ladsgroup.json
[00:18:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:19:45] <wikibugs>	 (PS1) BryanDavis: dev: Update Vagrantfile to Debian Bullseye [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/778682
[00:19:47] <wikibugs>	 (PS1) BryanDavis: Add perl532-sssd [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/778683 (https://phabricator.wikimedia.org/T214343)
[00:22:01] <wikibugs>	 (CR) BryanDavis: [C: +2] dev: Update Vagrantfile to Debian Bullseye [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/778682 (owner: BryanDavis)
[00:23:07] <wikibugs>	 (Merged) jenkins-bot: dev: Update Vagrantfile to Debian Bullseye [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/778682 (owner: BryanDavis)
[00:33:13] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P24352 and previous config saved to /var/cache/conftool/dbconfig/20220411-003312-ladsgroup.json
[00:33:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:48:18] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1169 (T298565)', diff saved to https://phabricator.wikimedia.org/P24353 and previous config saved to /var/cache/conftool/dbconfig/20220411-004817-ladsgroup.json
[00:48:19] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1164.eqiad.wmnet with reason: Maintenance
[00:48:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:48:21] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1164.eqiad.wmnet with reason: Maintenance
[00:48:22] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[00:48:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:48:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:48:26] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1164 (T298565)', diff saved to https://phabricator.wikimedia.org/P24354 and previous config saved to /var/cache/conftool/dbconfig/20220411-004826-ladsgroup.json
[00:48:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:59:39] <icinga-wm>	 PROBLEM - Varnish traffic drop between 30min ago and now at eqsin on alert1001 is CRITICAL: 49.59 le 60 https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/d/000000180/varnish-http-requests?orgId=1&viewPanel=6
[01:00:03] <icinga-wm>	 PROBLEM - Varnish traffic drop between 30min ago and now at esams on alert1001 is CRITICAL: 48.43 le 60 https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/d/000000180/varnish-http-requests?orgId=1&viewPanel=6
[01:01:55] <icinga-wm>	 RECOVERY - Varnish traffic drop between 30min ago and now at eqsin on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/d/000000180/varnish-http-requests?orgId=1&viewPanel=6
[01:02:19] <icinga-wm>	 RECOVERY - Varnish traffic drop between 30min ago and now at esams on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/d/000000180/varnish-http-requests?orgId=1&viewPanel=6
[01:15:59] <icinga-wm>	 RECOVERY - SSH on aqs1009.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[01:38:45] <jinxer-wm>	 (JobUnavailable) firing: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[01:43:17] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1164 (T298565)', diff saved to https://phabricator.wikimedia.org/P24355 and previous config saved to /var/cache/conftool/dbconfig/20220411-014316-ladsgroup.json
[01:43:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:43:22] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[01:43:45] <jinxer-wm>	 (JobUnavailable) resolved: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[01:58:22] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1164', diff saved to https://phabricator.wikimedia.org/P24356 and previous config saved to /var/cache/conftool/dbconfig/20220411-015822-ladsgroup.json
[01:58:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:11:16] <jinxer-wm>	 (BlazegraphFreeAllocatorsDecreasingRapidly) firing: Blazegraph instance wdqs1004:9193 is burning free allocators at a very high rate - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook#Free_allocators_decrease_rapidly - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DBlazegraphFreeAllocatorsDecreasingRapidly
[02:13:27] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1164', diff saved to https://phabricator.wikimedia.org/P24357 and previous config saved to /var/cache/conftool/dbconfig/20220411-021327-ladsgroup.json
[02:13:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:19:15] <icinga-wm>	 PROBLEM - MariaDB Replica Lag: s3 on db2139 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 1359.40 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[02:28:32] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1164 (T298565)', diff saved to https://phabricator.wikimedia.org/P24358 and previous config saved to /var/cache/conftool/dbconfig/20220411-022832-ladsgroup.json
[02:28:33] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1184.eqiad.wmnet with reason: Maintenance
[02:28:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:28:35] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1184.eqiad.wmnet with reason: Maintenance
[02:28:37] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[02:28:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:28:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:28:40] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1184 (T298565)', diff saved to https://phabricator.wikimedia.org/P24359 and previous config saved to /var/cache/conftool/dbconfig/20220411-022840-ladsgroup.json
[02:28:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:44:27] <icinga-wm>	 PROBLEM - SSH on wtp1048.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[03:11:03] <icinga-wm>	 PROBLEM - MariaDB Replica Lag: s3 on db1145 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 1312.94 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[03:21:33] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1184 (T298565)', diff saved to https://phabricator.wikimedia.org/P24360 and previous config saved to /var/cache/conftool/dbconfig/20220411-032132-ladsgroup.json
[03:21:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:21:38] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[03:36:38] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P24361 and previous config saved to /var/cache/conftool/dbconfig/20220411-033638-ladsgroup.json
[03:36:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:45:35] <icinga-wm>	 RECOVERY - SSH on wtp1048.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[03:51:43] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P24362 and previous config saved to /var/cache/conftool/dbconfig/20220411-035143-ladsgroup.json
[03:51:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:02:16] <jinxer-wm>	 (BlazegraphJvmQuakeWarnGC) firing: Blazegraph instance wdqs1005:9100 is entering a GC death spiral - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DBlazegraphJvmQuakeWarnGC
[04:06:48] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1184 (T298565)', diff saved to https://phabricator.wikimedia.org/P24363 and previous config saved to /var/cache/conftool/dbconfig/20220411-040648-ladsgroup.json
[04:06:50] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1134.eqiad.wmnet with reason: Maintenance
[04:06:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:06:51] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1134.eqiad.wmnet with reason: Maintenance
[04:06:52] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[04:06:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:06:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:06:56] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1134 (T298565)', diff saved to https://phabricator.wikimedia.org/P24364 and previous config saved to /var/cache/conftool/dbconfig/20220411-040656-ladsgroup.json
[04:06:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:16:29] <wikibugs>	 (CR) Santhosh: [C: +1] Add SectionTranslation entry points as campaigns [mediawiki-config] - https://gerrit.wikimedia.org/r/778381 (https://phabricator.wikimedia.org/T298029) (owner: KartikMistry)
[04:16:37] <icinga-wm>	 RECOVERY - MariaDB Replica Lag: s3 on db2139 is OK: OK slave_sql_lag Replication lag: 0.45 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[04:40:59] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1164', diff saved to https://phabricator.wikimedia.org/P24365 and previous config saved to /var/cache/conftool/dbconfig/20220411-044058-root.json
[04:41:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:42:56] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 8:00:00 on db1127.eqiad.wmnet with reason: Maintenance
[04:42:57] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1127.eqiad.wmnet with reason: Maintenance
[04:42:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:42:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:43:02] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depooling db1127 (T297189)', diff saved to https://phabricator.wikimedia.org/P24366 and previous config saved to /var/cache/conftool/dbconfig/20220411-044302-marostegui.json
[04:43:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:43:06] <stashbot>	 T297189: Schema change for dropping ft_title and ft_namesapce - https://phabricator.wikimedia.org/T297189
[04:47:57] <wikibugs>	 (PS1) Marostegui: Revert "db1134: Disable notifications" [puppet] - https://gerrit.wikimedia.org/r/778634
[04:53:24] <wikibugs>	 (CR) Marostegui: "recheck" [puppet] - https://gerrit.wikimedia.org/r/778634 (owner: Marostegui)
[04:55:33] <wikibugs>	 (CR) Marostegui: [C: +2] Revert "db1134: Disable notifications" [puppet] - https://gerrit.wikimedia.org/r/778634 (owner: Marostegui)
[04:58:14] <wikibugs>	 (PS1) Marostegui: mariadb: Promote db1160 to s4 master [puppet] - https://gerrit.wikimedia.org/r/778688 (https://phabricator.wikimedia.org/T304933)
[04:58:33] <wikibugs>	 (CR) Marostegui: [C: -2] "Wait for the failover date" [puppet] - https://gerrit.wikimedia.org/r/778688 (https://phabricator.wikimedia.org/T304933) (owner: Marostegui)
[04:59:36] <wikibugs>	 (PS1) Marostegui: wmnet: Update s4 CNAME [dns] - https://gerrit.wikimedia.org/r/778689 (https://phabricator.wikimedia.org/T304933)
[05:00:01] <wikibugs>	 (CR) Marostegui: [C: -2] "Wait for the failover date" [dns] - https://gerrit.wikimedia.org/r/778689 (https://phabricator.wikimedia.org/T304933) (owner: Marostegui)
[05:00:55] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1134 (T298565)', diff saved to https://phabricator.wikimedia.org/P24367 and previous config saved to /var/cache/conftool/dbconfig/20220411-050055-ladsgroup.json
[05:00:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:00:59] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[05:15:01] <icinga-wm>	 RECOVERY - MariaDB Replica Lag: s3 on db1145 is OK: OK slave_sql_lag Replication lag: 0.37 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[05:16:00] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P24368 and previous config saved to /var/cache/conftool/dbconfig/20220411-051600-ladsgroup.json
[05:16:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:20:59] <icinga-wm>	 PROBLEM - SSH on aqs1009.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[05:24:49] <wikibugs>	 SRE, ops-codfw, DBA: codfw: Dedicate Rack B1 for cloudX-dev servers - https://phabricator.wikimedia.org/T305469 (Marostegui)
[05:26:39] <wikibugs>	 SRE, ops-codfw, DBA: codfw: Dedicate Rack B1 for cloudX-dev servers - https://phabricator.wikimedia.org/T305469 (Marostegui) @Papaul the databases and es2029/es2030 are ready for relocation. Please turn them ON once you are done For what is worth, es2029 and es2030 are scheduled to be done 14th, whic...
[05:31:05] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P24369 and previous config saved to /var/cache/conftool/dbconfig/20220411-053105-ladsgroup.json
[05:31:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:35:07] <icinga-wm>	 PROBLEM - SSH on aqs1008.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[05:43:06] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Slowly repool db1164', diff saved to https://phabricator.wikimedia.org/P24370 and previous config saved to /var/cache/conftool/dbconfig/20220411-054306-root.json
[05:43:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:45:09] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Slowly repool db1164', diff saved to https://phabricator.wikimedia.org/P24371 and previous config saved to /var/cache/conftool/dbconfig/20220411-054508-root.json
[05:45:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:46:10] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1134 (T298565)', diff saved to https://phabricator.wikimedia.org/P24372 and previous config saved to /var/cache/conftool/dbconfig/20220411-054610-ladsgroup.json
[05:46:12] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1135.eqiad.wmnet with reason: Maintenance
[05:46:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:46:13] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1135.eqiad.wmnet with reason: Maintenance
[05:46:14] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[05:46:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:46:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:46:18] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1135 (T298565)', diff saved to https://phabricator.wikimedia.org/P24373 and previous config saved to /var/cache/conftool/dbconfig/20220411-054618-ladsgroup.json
[05:46:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:49:03] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Fully repool db1164', diff saved to https://phabricator.wikimedia.org/P24374 and previous config saved to /var/cache/conftool/dbconfig/20220411-054902-root.json
[05:49:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:50:38] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1127 (T297189)', diff saved to https://phabricator.wikimedia.org/P24375 and previous config saved to /var/cache/conftool/dbconfig/20220411-055037-marostegui.json
[05:50:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:50:41] <stashbot>	 T297189: Schema change for dropping ft_title and ft_namesapce - https://phabricator.wikimedia.org/T297189
[05:56:34] <icinga-wm>	 PROBLEM - SSH on wtp1045.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[06:05:43] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P24376 and previous config saved to /var/cache/conftool/dbconfig/20220411-060542-marostegui.json
[06:05:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:11:16] <jinxer-wm>	 (BlazegraphFreeAllocatorsDecreasingRapidly) firing: Blazegraph instance wdqs1004:9193 is burning free allocators at a very high rate - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook#Free_allocators_decrease_rapidly - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DBlazegraphFreeAllocatorsDecreasingRapidly
[06:16:37] <wikibugs>	 (CR) Nik Gkountas: [C: +1] Add SectionTranslation entry points as campaigns [mediawiki-config] - https://gerrit.wikimedia.org/r/778381 (https://phabricator.wikimedia.org/T298029) (owner: KartikMistry)
[06:20:48] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P24377 and previous config saved to /var/cache/conftool/dbconfig/20220411-062047-marostegui.json
[06:20:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:35:53] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1127 (T297189)', diff saved to https://phabricator.wikimedia.org/P24378 and previous config saved to /var/cache/conftool/dbconfig/20220411-063552-marostegui.json
[06:35:54] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 8:00:00 on db1174.eqiad.wmnet with reason: Maintenance
[06:35:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:35:56] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1174.eqiad.wmnet with reason: Maintenance
[06:35:57] <stashbot>	 T297189: Schema change for dropping ft_title and ft_namesapce - https://phabricator.wikimedia.org/T297189
[06:35:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:36:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:36:01] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depooling db1174 (T297189)', diff saved to https://phabricator.wikimedia.org/P24379 and previous config saved to /var/cache/conftool/dbconfig/20220411-063601-marostegui.json
[06:36:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:40:33] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1135 (T298565)', diff saved to https://phabricator.wikimedia.org/P24380 and previous config saved to /var/cache/conftool/dbconfig/20220411-064033-ladsgroup.json
[06:40:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:40:37] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[06:55:38] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P24381 and previous config saved to /var/cache/conftool/dbconfig/20220411-065538-ladsgroup.json
[06:55:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:57:48] <icinga-wm>	 RECOVERY - SSH on wtp1045.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[07:10:43] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P24382 and previous config saved to /var/cache/conftool/dbconfig/20220411-071043-ladsgroup.json
[07:10:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:25:48] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1135 (T298565)', diff saved to https://phabricator.wikimedia.org/P24383 and previous config saved to /var/cache/conftool/dbconfig/20220411-072548-ladsgroup.json
[07:25:50] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1119.eqiad.wmnet with reason: Maintenance
[07:25:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:25:51] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1119.eqiad.wmnet with reason: Maintenance
[07:25:52] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[07:25:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:25:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:25:56] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1119 (T298565)', diff saved to https://phabricator.wikimedia.org/P24384 and previous config saved to /var/cache/conftool/dbconfig/20220411-072556-ladsgroup.json
[07:25:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:35:21] <dcausse>	 !log restarting blazegraph on wdqs1004 (BlazegraphFreeAllocatorsDecreasingRapidly fired over the week-end)
[07:35:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:36:15] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1174 (T297189)', diff saved to https://phabricator.wikimedia.org/P24385 and previous config saved to /var/cache/conftool/dbconfig/20220411-073615-marostegui.json
[07:36:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:36:19] <stashbot>	 T297189: Schema change for dropping ft_title and ft_namesapce - https://phabricator.wikimedia.org/T297189
[07:45:29] <wikibugs>	 SRE, ops-codfw, DBA: codfw: Dedicate Rack B1 for cloudX-dev servers - https://phabricator.wikimedia.org/T305469 (MoritzMuehlenhoff)
[07:46:01] <jinxer-wm>	 (BlazegraphFreeAllocatorsDecreasingRapidly) resolved: Blazegraph instance wdqs1004:9193 is burning free allocators at a very high rate - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook#Free_allocators_decrease_rapidly - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DBlazegraphFreeAllocatorsDecreasingRapidly
[07:51:20] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P24386 and previous config saved to /var/cache/conftool/dbconfig/20220411-075120-marostegui.json
[07:51:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:52:15] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Slowly repool db1163', diff saved to https://phabricator.wikimedia.org/P24387 and previous config saved to /var/cache/conftool/dbconfig/20220411-075214-root.json
[07:52:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:57:51] <logmsgbot>	 !log aqu@deploy1002 Started deploy [airflow-dags/analytics@63cbb55]: T302876_migrate_mediarequest_to_airflow [airflow-dags/analytics@63cbb55]
[07:57:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:00:47] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Slowly repool db1163', diff saved to https://phabricator.wikimedia.org/P24388 and previous config saved to /var/cache/conftool/dbconfig/20220411-080047-root.json
[08:00:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:02:13] <logmsgbot>	 !log aqu@deploy1002 Finished deploy [airflow-dags/analytics@63cbb55]: T302876_migrate_mediarequest_to_airflow [airflow-dags/analytics@63cbb55] (duration: 04m 21s)
[08:02:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:02:16] <jinxer-wm>	 (BlazegraphJvmQuakeWarnGC) firing: Blazegraph instance wdqs1005:9100 is entering a GC death spiral - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DBlazegraphJvmQuakeWarnGC
[08:03:45] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Fully repool db1163', diff saved to https://phabricator.wikimedia.org/P24389 and previous config saved to /var/cache/conftool/dbconfig/20220411-080344-root.json
[08:03:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:04:02] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1135', diff saved to https://phabricator.wikimedia.org/P24390 and previous config saved to /var/cache/conftool/dbconfig/20220411-080402-root.json
[08:04:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:06:26] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P24391 and previous config saved to /var/cache/conftool/dbconfig/20220411-080625-marostegui.json
[08:06:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:19:00] <icinga-wm>	 PROBLEM - Check systemd state on dumpsdata1002 is CRITICAL: CRITICAL - degraded: The following units failed: wmf_auto_restart_rasdaemon.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[08:21:31] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1174 (T297189)', diff saved to https://phabricator.wikimedia.org/P24392 and previous config saved to /var/cache/conftool/dbconfig/20220411-082130-marostegui.json
[08:21:32] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 8:00:00 on db1171.eqiad.wmnet with reason: Maintenance
[08:21:33] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1171.eqiad.wmnet with reason: Maintenance
[08:21:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:21:36] <stashbot>	 T297189: Schema change for dropping ft_title and ft_namespace - https://phabricator.wikimedia.org/T297189
[08:21:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:21:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:22:42] <kart_>	 Can anyone update Deployments page on Wikitech? I'm not sure how to do it.
[08:22:56] <logmsgbot>	 !log aqu@deploy1002 Started deploy [airflow-dags/analytics_test@a337e34]: T302876_migrate_mediarequest_to_airflow [airflow-dags/analytics_test@a337e34]
[08:22:57] <kart_>	 ie https://wikitech.wikimedia.org/wiki/Deployments lacking this and next week's schedule.
[08:22:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:23:04] <logmsgbot>	 !log aqu@deploy1002 Finished deploy [airflow-dags/analytics_test@a337e34]: T302876_migrate_mediarequest_to_airflow [airflow-dags/analytics_test@a337e34] (duration: 00m 08s)
[08:23:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:23:18] <logmsgbot>	 !log aqu@deploy1002 Started deploy [airflow-dags/analytics@a337e34]: T302876_migrate_mediarequest_to_airflow [airflow-dags/analytics@a337e34]
[08:23:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:23:26] <logmsgbot>	 !log aqu@deploy1002 Finished deploy [airflow-dags/analytics@a337e34]: T302876_migrate_mediarequest_to_airflow [airflow-dags/analytics@a337e34] (duration: 00m 07s)
[08:23:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:24:56] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1119 (T298565)', diff saved to https://phabricator.wikimedia.org/P24393 and previous config saved to /var/cache/conftool/dbconfig/20220411-082456-ladsgroup.json
[08:24:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:25:00] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[08:27:01] <jinxer-wm>	 (BlazegraphJvmQuakeWarnGC) resolved: Blazegraph instance wdqs1005:9100 is entering a GC death spiral - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DBlazegraphJvmQuakeWarnGC
[08:29:18] <wikibugs>	 SRE, Prod-Kubernetes, Traffic, serviceops, Kubernetes: service::catalog entries and dnsdisc for Kubernetes services under Ingress - https://phabricator.wikimedia.org/T305358 (JMeybohm)
[08:34:52] <Lucas_WMDE>	 jouncebot: nowandnext
[08:34:52] <jouncebot>	 No deployments scheduled for the forseeable future!
[08:34:52] <jouncebot>	 No deployments scheduled for the forseeable future!
[08:35:33] <kart_>	 Lucas_WMDE: https://wikitech.wikimedia.org/wiki/Deployments seems not updated :)
[08:35:37] <Lucas_WMDE>	 yes
[08:38:48] <icinga-wm>	 RECOVERY - SSH on aqs1008.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[08:40:01] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P24394 and previous config saved to /var/cache/conftool/dbconfig/20220411-084001-ladsgroup.json
[08:40:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:49:01] <jinxer-wm>	 (BlazegraphJvmQuakeWarnGC) firing: Blazegraph instance wdqs1005:9100 is entering a GC death spiral - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DBlazegraphJvmQuakeWarnGC
[08:55:07] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P24395 and previous config saved to /var/cache/conftool/dbconfig/20220411-085506-ladsgroup.json
[08:55:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:57:57] * Lucas_WMDE experimenting on mwdebug1001
[08:59:33] <wikibugs>	 (PS1) KartikMistry: Update cxserver to 2022-04-11-085026-production [deployment-charts] - https://gerrit.wikimedia.org/r/778988 (https://phabricator.wikimedia.org/T305125)
[08:59:45] <logmsgbot>	 !log jynus@cumin2002 START - Cookbook sre.hosts.reimage for host backup2007.codfw.wmnet with OS bullseye
[08:59:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:01:30] * Lucas_WMDE done
[09:07:57] * kart_ updating cxserver..
[09:10:12] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1119 (T298565)', diff saved to https://phabricator.wikimedia.org/P24396 and previous config saved to /var/cache/conftool/dbconfig/20220411-091011-ladsgroup.json
[09:10:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:10:17] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db2103.codfw.wmnet with reason: Maintenance
[09:10:18] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[09:10:19] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2103.codfw.wmnet with reason: Maintenance
[09:10:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:10:20] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 12:00:00 on 14 hosts with reason: Maintenance
[09:10:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:10:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:10:30] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 14 hosts with reason: Maintenance
[09:10:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:11:03] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Slowly repool db1135', diff saved to https://phabricator.wikimedia.org/P24397 and previous config saved to /var/cache/conftool/dbconfig/20220411-091103-root.json
[09:11:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:11:51] <wikibugs>	 (CR) KartikMistry: [C: +2] Update cxserver to 2022-04-11-085026-production [deployment-charts] - https://gerrit.wikimedia.org/r/778988 (https://phabricator.wikimedia.org/T305125) (owner: KartikMistry)
[09:13:21] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Slowly repool db1135', diff saved to https://phabricator.wikimedia.org/P24398 and previous config saved to /var/cache/conftool/dbconfig/20220411-091319-root.json
[09:13:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:14:01] <jinxer-wm>	 (BlazegraphJvmQuakeWarnGC) resolved: Blazegraph instance wdqs1005:9100 is entering a GC death spiral - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DBlazegraphJvmQuakeWarnGC
[09:14:55] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Fully repool db1135', diff saved to https://phabricator.wikimedia.org/P24399 and previous config saved to /var/cache/conftool/dbconfig/20220411-091455-root.json
[09:14:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:15:12] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1106', diff saved to https://phabricator.wikimedia.org/P24400 and previous config saved to /var/cache/conftool/dbconfig/20220411-091512-root.json
[09:15:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:16:05] <wikibugs>	 (Merged) jenkins-bot: Update cxserver to 2022-04-11-085026-production [deployment-charts] - https://gerrit.wikimedia.org/r/778988 (https://phabricator.wikimedia.org/T305125) (owner: KartikMistry)
[09:17:28] <logmsgbot>	 !log kartik@deploy1002 helmfile [staging] START helmfile.d/services/cxserver: apply
[09:17:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:17:59] <logmsgbot>	 !log kartik@deploy1002 helmfile [staging] DONE helmfile.d/services/cxserver: apply
[09:18:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:19:50] <logmsgbot>	 !log jynus@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on backup2007.codfw.wmnet with reason: host reimage
[09:19:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:22:24] <logmsgbot>	 !log kartik@deploy1002 helmfile [codfw] START helmfile.d/services/cxserver: apply
[09:22:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:23:17] <logmsgbot>	 !log kartik@deploy1002 helmfile [codfw] DONE helmfile.d/services/cxserver: apply
[09:23:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:23:49] <wikibugs>	 (PS1) MMandere: cache::varnish: Merge repeating host data to site data [puppet] - https://gerrit.wikimedia.org/r/778989 (https://phabricator.wikimedia.org/T290005)
[09:24:29] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 8:00:00 on db2121.codfw.wmnet with reason: Maintenance
[09:24:31] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2121.codfw.wmnet with reason: Maintenance
[09:24:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:24:32] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 16:00:00 on 9 hosts with reason: Maintenance
[09:24:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:24:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:24:39] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on 9 hosts with reason: Maintenance
[09:24:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:25:05] <logmsgbot>	 !log jynus@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backup2007.codfw.wmnet with reason: host reimage
[09:25:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:25:50] <logmsgbot>	 !log kartik@deploy1002 helmfile [eqiad] START helmfile.d/services/cxserver: apply
[09:25:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:25:55] <wikibugs>	 (CR) Volans: [C: +2] "PCC confirms noop https://puppet-compiler.wmflabs.org/pcc-worker1001/34763/" [puppet] - https://gerrit.wikimedia.org/r/778331 (owner: Volans)
[09:26:45] <logmsgbot>	 !log kartik@deploy1002 helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
[09:26:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:28:09] <kart_>	 !log Updated cxserver to 2022-04-11-085026-production (T305125)
[09:28:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:28:13] <stashbot>	 T305125: Enable Flores as the default service for Icelandic, Igbo and Zulu - https://phabricator.wikimedia.org/T305125
[09:28:22] <wikibugs>	 (PS6) Volans: spicerack: install service::catalog configuration [puppet] - https://gerrit.wikimedia.org/r/778333
[09:29:42] <wikibugs>	 (CR) MMandere: [V: +1] "PCC SUCCESS (NOOP 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/34765/console" [puppet] - https://gerrit.wikimedia.org/r/778989 (https://phabricator.wikimedia.org/T290005) (owner: MMandere)
[09:30:11] <wikibugs>	 (CR) Volans: [C: +2] "PCC happy https://puppet-compiler.wmflabs.org/pcc-worker1001/34764/cumin1001.eqiad.wmnet/index.html" [puppet] - https://gerrit.wikimedia.org/r/778333 (owner: Volans)
[09:36:54] <wikibugs>	 (CR) Ladsgroup: [C: +1] wmnet: Update s4 CNAME [dns] - https://gerrit.wikimedia.org/r/778689 (https://phabricator.wikimedia.org/T304933) (owner: Marostegui)
[09:37:12] <wikibugs>	 (CR) Ladsgroup: [C: +1] mariadb: Promote db1160 to s4 master [puppet] - https://gerrit.wikimedia.org/r/778688 (https://phabricator.wikimedia.org/T304933) (owner: Marostegui)
[09:38:52] <Amir1>	 jouncebot: nowandnext
[09:38:52] <jouncebot>	 No deployments scheduled for the forseeable future!
[09:38:52] <jouncebot>	 No deployments scheduled for the forseeable future!
[09:39:01] <Amir1>	 interesting
[09:39:34] <zabe>	 https://wikitech.wikimedia.org/wiki/Deployments has not been updated for this week yet
[09:39:35] <logmsgbot>	 !log jynus@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host backup2007.codfw.wmnet with OS bullseye
[09:39:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:39:41] <Lucas_WMDE>	 thcipriani: can we haz new calendar?
[09:39:54] <Amir1>	 it can be because of Easter?
[09:40:22] <Lucas_WMDE>	 https://wikitech.wikimedia.org/wiki/Deployments/Yearly_calendar only has Friday as a no-deploy day, not the whole week
[09:40:44] <Amir1>	 noted
[09:41:04] <wikibugs>	 (CR) Ladsgroup: [C: +2] Older browser do not return a promise from .play() [extensions/TimedMediaHandler] (wmf/1.39.0-wmf.6) - https://gerrit.wikimedia.org/r/778238 (https://phabricator.wikimedia.org/T304705) (owner: TheDJ)
[09:41:47] <wikibugs>	 (PS2) Ladsgroup: Enable videojs on wiktionary wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/778197 (https://phabricator.wikimedia.org/T248418)
[09:41:51] <wikibugs>	 (CR) Ladsgroup: [C: +2] Enable videojs on wiktionary wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/778197 (https://phabricator.wikimedia.org/T248418) (owner: Ladsgroup)
[09:42:36] <wikibugs>	 (Merged) jenkins-bot: Enable videojs on wiktionary wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/778197 (https://phabricator.wikimedia.org/T248418) (owner: Ladsgroup)
[09:44:26] <logmsgbot>	 !log ladsgroup@deploy1002 Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:778197|Enable videojs on wiktionary wikis (T248418)]] (duration: 00m 52s)
[09:44:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:44:30] <stashbot>	 T248418: Roll out videojs as the only video/audio player on all Wikimedia wikis - https://phabricator.wikimedia.org/T248418
[09:46:33] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[09:46:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:46:36] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[09:46:37] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[09:46:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:46:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:46:41] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[09:46:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:57:26] <wikibugs>	 (Merged) jenkins-bot: Older browser do not return a promise from .play() [extensions/TimedMediaHandler] (wmf/1.39.0-wmf.6) - https://gerrit.wikimedia.org/r/778238 (https://phabricator.wikimedia.org/T304705) (owner: TheDJ)
[09:57:38] <wikibugs>	 SRE, ops-codfw, DBA: codfw: Dedicate Rack B1 for cloudX-dev servers - https://phabricator.wikimedia.org/T305469 (Kormat) >>! In T305469#7843940, @Marostegui wrote: > For what is worth, es2029 and es2030 are scheduled to be done 14th, which is a bank holiday for me, so someone else would need to bring...
[09:58:19] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
[09:58:21] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
[09:58:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:58:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:58:26] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1099:3311 (T298565)', diff saved to https://phabricator.wikimedia.org/P24401 and previous config saved to /var/cache/conftool/dbconfig/20220411-095826-ladsgroup.json
[09:58:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:58:29] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[09:58:35] <wikibugs>	 SRE, ops-codfw, DBA: codfw: Dedicate Rack B1 for cloudX-dev servers - https://phabricator.wikimedia.org/T305469 (Marostegui) Thanks @Kormat
[09:58:59] <logmsgbot>	 !log ladsgroup@deploy1002 Synchronized php-1.39.0-wmf.6/extensions/TimedMediaHandler/resources/ext.tmh.player.element.js: Backport: [[gerrit:778238|Older browser do not return a promise from .play() (T304705)]] (duration: 00m 52s)
[09:59:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:59:02] <stashbot>	 T304705: videojs TypeError: Cannot read property 'then' of undefined - https://phabricator.wikimedia.org/T304705
[10:01:21] <logmsgbot>	 !log btullis@deploy1002 helmfile [staging] START helmfile.d/services/datahub: apply on main
[10:01:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:01:47] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[10:01:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:01:50] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[10:01:51] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[10:01:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:01:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:01:55] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[10:01:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:04:16] <icinga-wm>	 PROBLEM - BGP status on cr2-eqiad is CRITICAL: BGP CRITICAL - AS64605/IPv6: Active - Anycast https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[10:06:58] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[10:06:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:07:01] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[10:07:02] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[10:07:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:07:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:07:06] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[10:07:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:09:21] <wikibugs>	 SRE, conftool: Provide a meaningful Retry-After value - https://phabricator.wikimedia.org/T305824 (Vgutierrez)
[10:10:44] <logmsgbot>	 !log btullis@deploy1002 helmfile [staging] START helmfile.d/services/datahub: apply on main
[10:10:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:11:11] <logmsgbot>	 !log btullis@deploy1002 helmfile [staging] DONE helmfile.d/services/datahub: sync on main
[10:11:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:25:38] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase2019 is CRITICAL: /en.wikipedia.org/v1/page/talk/{title} (Get structured talk page for enwiki Salt article) is CRITICAL: Test Get structured talk page for enwiki Salt article returned the unexpected status 503 (expecting: 200) https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[10:27:52] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase2019 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[10:30:18] <wikibugs>	 (CR) MVernon: [C: -1] "Hi," [puppet] - https://gerrit.wikimedia.org/r/773298 (https://phabricator.wikimedia.org/T269108) (owner: Jcrespo)
[10:33:25] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 8:00:00 on db1158.eqiad.wmnet with reason: Maintenance
[10:33:27] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1158.eqiad.wmnet with reason: Maintenance
[10:33:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:33:28] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 16:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
[10:33:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:33:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:33:32] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
[10:33:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:33:37] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depooling db1158 (T297189)', diff saved to https://phabricator.wikimedia.org/P24402 and previous config saved to /var/cache/conftool/dbconfig/20220411-103336-marostegui.json
[10:33:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:33:40] <stashbot>	 T297189: Schema change for dropping ft_title and ft_namespace - https://phabricator.wikimedia.org/T297189
[10:34:54] <wikibugs>	 (CR) Filippo Giunchedi: prometheus: enable prometheus web access via proxy with IDP (2 comments) [puppet] - https://gerrit.wikimedia.org/r/764895 (https://phabricator.wikimedia.org/T301944) (owner: Herron)
[10:37:18] <wikibugs>	 (CR) Filippo Giunchedi: [C: -1] "Generally LGTM, see inline" [puppet] - https://gerrit.wikimedia.org/r/764895 (https://phabricator.wikimedia.org/T301944) (owner: Herron)
[10:37:36] <logmsgbot>	 !log kormat@cumin1001 START - Cookbook sre.hosts.downtime for 1:00:00 on 11 hosts with reason: Rebooting primary T303174
[10:37:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:37:44] <logmsgbot>	 !log kormat@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 11 hosts with reason: Rebooting primary T303174
[10:37:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:38:15] <logmsgbot>	 !log kormat@cumin1001 START - Cookbook sre.hosts.downtime for 1:30:00 on db2121.codfw.wmnet with reason: Rebooting for T303174
[10:38:17] <logmsgbot>	 !log kormat@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db2121.codfw.wmnet with reason: Rebooting for T303174
[10:38:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:38:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:38:50] <wikibugs>	 (CR) Filippo Giunchedi: sre.kafka.reboot-workers: remove systemctl stop calls (1 comment) [cookbooks] - https://gerrit.wikimedia.org/r/778517 (https://phabricator.wikimedia.org/T305652) (owner: Herron)
[10:39:11] <wikibugs>	 (PS3) Jcrespo: swift: Create a new read-only role on mw account for backup taking [puppet] - https://gerrit.wikimedia.org/r/773298 (https://phabricator.wikimedia.org/T269108)
[10:39:41] <wikibugs>	 (CR) Filippo Giunchedi: sre.kafka.reboot-workers: add --skip-mirrormaker option (1 comment) [cookbooks] - https://gerrit.wikimedia.org/r/778325 (https://phabricator.wikimedia.org/T305652) (owner: Herron)
[10:39:49] <wikibugs>	 (PS4) Jcrespo: swift: Create a new read-only role on mw account for backup taking [puppet] - https://gerrit.wikimedia.org/r/773298 (https://phabricator.wikimedia.org/T269108)
[10:39:51] <wikibugs>	 ops-eqiad: Port with no description on access switch - https://phabricator.wikimedia.org/T304849 (phaultfinder)
[10:40:29] <wikibugs>	 (CR) Jcrespo: "Done." [puppet] - https://gerrit.wikimedia.org/r/773298 (https://phabricator.wikimedia.org/T269108) (owner: Jcrespo)
[10:41:37] <logmsgbot>	 !log kormat@cumin1001 START - Cookbook sre.hosts.downtime for 1:30:00 on db2121.codfw.wmnet with reason: Rebooting for T303174
[10:41:39] <logmsgbot>	 !log kormat@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db2121.codfw.wmnet with reason: Rebooting for T303174
[10:41:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:41:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:43:06] <wikibugs>	 (CR) MVernon: [C: +1] "LGTM, thanks." [puppet] - https://gerrit.wikimedia.org/r/773298 (https://phabricator.wikimedia.org/T269108) (owner: Jcrespo)
[10:43:47] <wikibugs>	 (PS2) MMandere: cache::varnish: Merge repeating host data to common data [puppet] - https://gerrit.wikimedia.org/r/778989 (https://phabricator.wikimedia.org/T290005)
[10:44:33] <wikibugs>	 (CR) Filippo Giunchedi: "LGTM overall, see inline" [puppet] - https://gerrit.wikimedia.org/r/778485 (https://phabricator.wikimedia.org/T273673) (owner: Zabe)
[10:48:14] <icinga-wm>	 RECOVERY - SSH on aqs1009.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[10:53:23] <wikibugs>	 (CR) MVernon: "One nit inline, and I agree with Filippo's comments, but otherwise this looks good to me, thank you!" [puppet] - https://gerrit.wikimedia.org/r/778485 (https://phabricator.wikimedia.org/T273673) (owner: Zabe)
[10:55:25] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T298565)', diff saved to https://phabricator.wikimedia.org/P24403 and previous config saved to /var/cache/conftool/dbconfig/20220411-105525-ladsgroup.json
[10:55:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:55:30] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[11:00:39] <wikibugs>	 (PS7) Zabe: swift: migrate stats_account cron to systemd timer job [puppet] - https://gerrit.wikimedia.org/r/778485 (https://phabricator.wikimedia.org/T273673)
[11:01:13] <wikibugs>	 (CR) Zabe: swift: migrate stats_account cron to systemd timer job (3 comments) [puppet] - https://gerrit.wikimedia.org/r/778485 (https://phabricator.wikimedia.org/T273673) (owner: Zabe)
[11:02:28] <icinga-wm>	 PROBLEM - SSH on mw2258.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[11:02:58] <wikibugs>	 (CR) MVernon: "LGTM, thanks." [puppet] - https://gerrit.wikimedia.org/r/778485 (https://phabricator.wikimedia.org/T273673) (owner: Zabe)
[11:04:32] <wikibugs>	 (CR) Klausman: [C: +2] ml-services: add plwiki, ptwiki & rowiki editquality isvcs [deployment-charts] - https://gerrit.wikimedia.org/r/778251 (https://phabricator.wikimedia.org/T301415) (owner: Kevin Bazira)
[11:05:40] <wikibugs>	 (CR) MMandere: [V: +1] "PCC SUCCESS (NOOP 12): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/34766/console" [puppet] - https://gerrit.wikimedia.org/r/778989 (https://phabricator.wikimedia.org/T290005) (owner: MMandere)
[11:06:06] <wikibugs>	 (CR) Vgutierrez: [C: +1] cache::varnish: Merge repeating host data to common data [puppet] - https://gerrit.wikimedia.org/r/778989 (https://phabricator.wikimedia.org/T290005) (owner: MMandere)
[11:08:54] <wikibugs>	 (PS5) Btullis: Configure LDAP authentication for DataHub [deployment-charts] - https://gerrit.wikimedia.org/r/778345 (https://phabricator.wikimedia.org/T301462)
[11:08:57] <wikibugs>	 (CR) MMandere: [V: +1 C: +2] cache::varnish: Merge repeating host data to common data [puppet] - https://gerrit.wikimedia.org/r/778989 (https://phabricator.wikimedia.org/T290005) (owner: MMandere)
[11:10:30] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P24404 and previous config saved to /var/cache/conftool/dbconfig/20220411-111030-ladsgroup.json
[11:10:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:12:06] <icinga-wm>	 PROBLEM - Disk space on ml-staging-ctrl2002 is CRITICAL: DISK CRITICAL - free space: / 1129 MB (5% inode=95%): /tmp 1129 MB (5% inode=95%): /var/tmp 1129 MB (5% inode=95%): https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=ml-staging-ctrl2002&var-datasource=codfw+prometheus/ops
[11:14:44] <wikibugs>	 10SRE, 10SRE-Access-Requests: Requesting access to LDAP group NDA for TomekSikora.Monsoon - https://phabricator.wikimedia.org/T304502 (10jcrespo) @soworu: Apologies for the confusion- the procedure for requesting access to the Google Search Console has recently changed (2 weeks ago), as it is being now oversee...
[11:18:03] <wikibugs>	 10SRE, 10SRE-Access-Requests: Requesting access to google console for TomekSikora.Monsoon - https://phabricator.wikimedia.org/T304502 (10jcrespo)
[11:18:10] <logmsgbot>	 !log btullis@cumin1001 START - Cookbook sre.hadoop.reboot-workers for Hadoop analytics cluster
[11:18:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:18:48] <wikibugs>	 (CR) Btullis: Configure LDAP authentication for DataHub (2 comments) [deployment-charts] - https://gerrit.wikimedia.org/r/778345 (https://phabricator.wikimedia.org/T301462) (owner: Btullis)
[11:22:29] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Slowly repool db1106', diff saved to https://phabricator.wikimedia.org/P24405 and previous config saved to /var/cache/conftool/dbconfig/20220411-112229-root.json
[11:22:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:24:52] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Slowly repool db1106', diff saved to https://phabricator.wikimedia.org/P24406 and previous config saved to /var/cache/conftool/dbconfig/20220411-112452-root.json
[11:24:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:25:36] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P24407 and previous config saved to /var/cache/conftool/dbconfig/20220411-112536-ladsgroup.json
[11:25:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:25:42] <wikibugs>	 10SRE, 10SRE-Access-Requests: Requesting access to google console for TomekSikora.Monsoon - https://phabricator.wikimedia.org/T304502 (10jcrespo)
[11:27:42] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Fully repool db1106', diff saved to https://phabricator.wikimedia.org/P24408 and previous config saved to /var/cache/conftool/dbconfig/20220411-112741-root.json
[11:27:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:28:26] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1119', diff saved to https://phabricator.wikimedia.org/P24409 and previous config saved to /var/cache/conftool/dbconfig/20220411-112825-root.json
[11:28:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:29:32] <wikibugs>	 (03CR) 10JMeybohm: "Apart from having just one LDAP server, this LGTM" [deployment-charts] - 10https://gerrit.wikimedia.org/r/778345 (https://phabricator.wikimedia.org/T301462) (owner: 10Btullis)
[11:32:14] <wikibugs>	 (03CR) 10Awight: "This change is ready for review." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/779014 (owner: 10Awight)
[11:32:26] <wikibugs>	 (03CR) 10Jcrespo: "Any preference in key generation (method/length) on the private server? I use openssl usually." [puppet] - 10https://gerrit.wikimedia.org/r/773298 (https://phabricator.wikimedia.org/T269108) (owner: 10Jcrespo)
[11:32:33] <wikibugs>	 (03CR) 10Arturo Borrero Gonzalez: [C: 03+1] Add perl532-sssd [docker-images/toollabs-images] - 10https://gerrit.wikimedia.org/r/778683 (https://phabricator.wikimedia.org/T214343) (owner: 10BryanDavis)
[11:32:48] <wikibugs>	 (CR) Awight: Remove configuration which is the same as the extension's default (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/779014 (owner: Awight)
[11:34:56] <topranks>	 !log Adjust loopback filter on cr3-ulsfo to align with L3 switch config.  T304553.
[11:34:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:35:00] <stashbot>	 T304553: Unify loopback filters between CR routers and L3 switches - https://phabricator.wikimedia.org/T304553
[11:36:58] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1158 (T297189)', diff saved to https://phabricator.wikimedia.org/P24410 and previous config saved to /var/cache/conftool/dbconfig/20220411-113657-marostegui.json
[11:37:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:37:01] <stashbot>	 T297189: Schema change for dropping ft_title and ft_namespace - https://phabricator.wikimedia.org/T297189
[11:40:41] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T298565)', diff saved to https://phabricator.wikimedia.org/P24411 and previous config saved to /var/cache/conftool/dbconfig/20220411-114041-ladsgroup.json
[11:40:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:40:46] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[11:40:46] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
[11:40:48] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
[11:40:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:40:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:40:53] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1099:3311 (T298565)', diff saved to https://phabricator.wikimedia.org/P24412 and previous config saved to /var/cache/conftool/dbconfig/20220411-114053-ladsgroup.json
[11:40:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:46:48] <topranks>	 !log Adjust loopback filter on asw1-b12-drmrs to align with CR router config.  T304553.
[11:46:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:46:52] <stashbot>	 T304553: Unify loopback filters between CR routers and L3 switches - https://phabricator.wikimedia.org/T304553
[11:48:04] <wikibugs>	 (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] hieradata: make grafana-cloud the preferred hostname [puppet] - 10https://gerrit.wikimedia.org/r/778674 (owner: 10Majavah)
[11:49:27] <wikibugs>	 (03CR) 10Arturo Borrero Gonzalez: [C: 03+1] "Please collect +1 from Vivian." [puppet] - 10https://gerrit.wikimedia.org/r/778673 (https://phabricator.wikimedia.org/T304716) (owner: 10Majavah)
[11:52:02] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P24413 and previous config saved to /var/cache/conftool/dbconfig/20220411-115202-marostegui.json
[11:52:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:54:23] <wikibugs>	 (CR) Btullis: Configure LDAP authentication for DataHub (1 comment) [deployment-charts] - https://gerrit.wikimedia.org/r/778345 (https://phabricator.wikimedia.org/T301462) (owner: Btullis)
[11:56:31] <logmsgbot>	 !log kevinbazira@deploy1002 helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
[11:56:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:02:00] <logmsgbot>	 !log kevinbazira@deploy1002 helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
[12:02:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:02:24] <wikibugs>	 10SRE, 10SRE-Access-Requests: Requesting access to google console for TomekSikora.Monsoon - https://phabricator.wikimedia.org/T304502 (10SCherukuwada)
[12:03:29] <wikibugs>	 (03PS1) 10Zabe: snapshot: migrate adds-changes cron to systemd timer job [puppet] - 10https://gerrit.wikimedia.org/r/779016 (https://phabricator.wikimedia.org/T273673)
[12:03:31] <wikibugs>	 (03PS1) 10Zabe: snapshot: remove absented add-changes cron [puppet] - 10https://gerrit.wikimedia.org/r/779017 (https://phabricator.wikimedia.org/T273673)
[12:04:50] <wikibugs>	 10SRE, 10SRE-Access-Requests: Requesting access to google console for TomekSikora.Monsoon - https://phabricator.wikimedia.org/T304502 (10SCherukuwada)
[12:07:08] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P24414 and previous config saved to /var/cache/conftool/dbconfig/20220411-120707-marostegui.json
[12:07:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:07:43] <wikibugs>	 (03CR) 10Zabe: [V: 03+1] "PCC: https://puppet-compiler.wmflabs.org/pcc-worker1003/34768/" [puppet] - 10https://gerrit.wikimedia.org/r/779016 (https://phabricator.wikimedia.org/T273673) (owner: 10Zabe)
[12:15:44] <icinga-wm>	 PROBLEM - Disk space on ml-staging-ctrl2002 is CRITICAL: DISK CRITICAL - free space: / 1074 MB (5% inode=95%): /tmp 1074 MB (5% inode=95%): /var/tmp 1074 MB (5% inode=95%): https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=ml-staging-ctrl2002&var-datasource=codfw+prometheus/ops
[12:18:17] <wikibugs>	 10SRE, 10SRE-Access-Requests: Requesting access to google console for TomekSikora.Monsoon - https://phabricator.wikimedia.org/T304502 (10SCherukuwada) @jcrespo  I've been involved in this discussion so I know what's going on here. I've updated the ticket to reflect what they need. I can take care of providing...
[12:21:55] <wikibugs>	 10SRE, 10SRE-Access-Requests: Requesting access to google console for TomekSikora.Monsoon - https://phabricator.wikimedia.org/T304502 (10RhinosF1) @KFrancis normally confirms NDAs
[12:22:13] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1158 (T297189)', diff saved to https://phabricator.wikimedia.org/P24415 and previous config saved to /var/cache/conftool/dbconfig/20220411-122212-marostegui.json
[12:22:14] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 8:00:00 on db1098.eqiad.wmnet with reason: Maintenance
[12:22:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:22:16] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1098.eqiad.wmnet with reason: Maintenance
[12:22:16] <stashbot>	 T297189: Schema change for dropping ft_title and ft_namespace - https://phabricator.wikimedia.org/T297189
[12:22:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:22:19] <wikibugs>	 (03CR) 10ArielGlenn: [C: 03+1] "Looks equivalent, thanks for picking this work back up." [puppet] - 10https://gerrit.wikimedia.org/r/779016 (https://phabricator.wikimedia.org/T273673) (owner: 10Zabe)
[12:22:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:22:21] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depooling db1098:3317 (T297189)', diff saved to https://phabricator.wikimedia.org/P24416 and previous config saved to /var/cache/conftool/dbconfig/20220411-122220-marostegui.json
[12:22:23] <wikibugs>	 10SRE, 10SRE-Access-Requests: Requesting access to google console for TomekSikora.Monsoon - https://phabricator.wikimedia.org/T304502 (10jcrespo) >>! In T304502#7844616, @SCherukuwada wrote: > @jcrespo  I've been involved in this discussion so I know what's going on here. I've updated the ticket to reflect wha...
[12:22:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:25:22] <wikibugs>	 (03CR) 10Zabe: "{{ping}} slight reminder that this still needs deployment :)" [puppet] - 10https://gerrit.wikimedia.org/r/751207 (https://phabricator.wikimedia.org/T273673) (owner: 10Zabe)
[12:25:25] <logmsgbot>	 !log kormat@cumin1001 START - Cookbook sre.hosts.downtime for 1:00:00 on 6 hosts with reason: Rebooting x2 codfw primary T303174
[12:25:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:25:29] <logmsgbot>	 !log kormat@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 6 hosts with reason: Rebooting x2 codfw primary T303174
[12:25:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:25:57] <logmsgbot>	 !log kormat@cumin1001 START - Cookbook sre.hosts.downtime for 1:30:00 on db2142.codfw.wmnet with reason: Rebooting for T303174
[12:25:58] <logmsgbot>	 !log kormat@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db2142.codfw.wmnet with reason: Rebooting for T303174
[12:25:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:26:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:31:25] <logmsgbot>	 !log kormat@cumin1001 START - Cookbook sre.hosts.downtime for 1:30:00 on db1151.eqiad.wmnet with reason: Rebooting for T303174
[12:31:27] <logmsgbot>	 !log kormat@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db1151.eqiad.wmnet with reason: Rebooting for T303174
[12:31:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:31:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:31:33] <wikibugs>	 10SRE, 10SRE-Access-Requests: Requesting access to analytics-privatedata-users for drochford - https://phabricator.wikimedia.org/T305634 (10jcrespo) a:03jcrespo
[12:32:26] <icinga-wm>	 PROBLEM - BGP status on cr2-eqiad is CRITICAL: BGP CRITICAL - AS64605/IPv6: Active - Anycast https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[12:34:08] <wikibugs>	 10SRE, 10SRE-Access-Requests: Requesting access to analytics-privatedata-users for drochford (superset access with no server access) - https://phabricator.wikimedia.org/T305634 (10jcrespo)
[12:35:37] <wikibugs>	 (03PS1) 10Zabe: graphite: migrate update_graphite_index cron to systemd timer job [puppet] - 10https://gerrit.wikimedia.org/r/779022 (https://phabricator.wikimedia.org/T273673)
[12:35:39] <wikibugs>	 (03PS1) 10Zabe: graphite: remove absented update_graphite_index cron [puppet] - 10https://gerrit.wikimedia.org/r/779023 (https://phabricator.wikimedia.org/T273673)
[12:36:29] <aqu>	 !log About to deploy analytics/refinery "Migrate mediarequest hourly from Oozie to Airflow"
[12:36:30] <topranks>	 ^^^ this BGP alert was due to BFD session failing towards doh1001.  Restored without intervention about a minute later.
[12:36:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:37:39] <logmsgbot>	 !log aqu@deploy1002 Started deploy [analytics/refinery@f0a1656]: Migrate mediarequest hourly from Oozie to Airflow [analytics/refinery@f0a1656]
[12:37:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:39:07] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T298565)', diff saved to https://phabricator.wikimedia.org/P24417 and previous config saved to /var/cache/conftool/dbconfig/20220411-123906-ladsgroup.json
[12:39:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:39:10] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[12:39:58] <wikibugs>	 (03CR) 10Zabe: [V: 03+1] "PCC: https://puppet-compiler.wmflabs.org/pcc-worker1003/34769/graphite2003.codfw.wmnet/index.html" [puppet] - 10https://gerrit.wikimedia.org/r/779022 (https://phabricator.wikimedia.org/T273673) (owner: 10Zabe)
[12:44:57] <wikibugs>	 10SRE, 10SRE-Access-Requests: Requesting access to analytics-privatedata-users for drochford (superset access with no server access) - https://phabricator.wikimedia.org/T305634 (10jcrespo) Hey, @drochford,  While I check and process your access request, would you mind linking your Wikitech/LDAP account on your...
[12:47:49] <logmsgbot>	 !log aqu@deploy1002 Started deploy [airflow-dags/analytics@cae0024]: T302876_migrate_mediarequest_to_airflow [airflow-dags/analytics@cae0024]
[12:47:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:48:21] <logmsgbot>	 !log aqu@deploy1002 Finished deploy [airflow-dags/analytics@cae0024]: T302876_migrate_mediarequest_to_airflow [airflow-dags/analytics@cae0024] (duration: 00m 32s)
[12:48:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:49:52] <wikibugs>	 10SRE, 10SRE-Access-Requests: Requesting access to analytics-privatedata-users for drochford (superset access with no server access) - https://phabricator.wikimedia.org/T305634 (10jcrespo)
[12:50:11] <wikibugs>	 (03CR) 10Filippo Giunchedi: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/778485 (https://phabricator.wikimedia.org/T273673) (owner: 10Zabe)
[12:54:12] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P24418 and previous config saved to /var/cache/conftool/dbconfig/20220411-125411-ladsgroup.json
[12:54:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:54:32] <wikibugs>	 10SRE, 10observability, 10Patch-For-Review, 10SRE Observability (FY2021/2022-Q3), 10Sustainability (Incident Followup): Most Icinga http checks ignore the URL parameter - https://phabricator.wikimedia.org/T304321 (10fgiunchedi) Apologies for the delay, python implementation looks good to me and I agree p...
[12:54:34] <sukhe>	 topranks: thanks, that's interesting though. I will check!
[12:55:00] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 03+1] "LGTM, though given how critical (hah) this plugin is I'd recommend even basic unit/integration tests" [puppet] - 10https://gerrit.wikimedia.org/r/773272 (https://phabricator.wikimedia.org/T304321) (owner: 10Jbond)
[12:56:09] <wikibugs>	 10SRE, 10observability, 10Patch-For-Review, 10SRE Observability (FY2021/2022-Q4), 10Sustainability (Incident Followup): Most Icinga http checks ignore the URL parameter - https://phabricator.wikimedia.org/T304321 (10fgiunchedi)
[12:57:28] <icinga-wm>	 PROBLEM - Disk space on ml-staging-ctrl2002 is CRITICAL: DISK CRITICAL - free space: / 1096 MB (5% inode=95%): /tmp 1096 MB (5% inode=95%): /var/tmp 1096 MB (5% inode=95%): https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=ml-staging-ctrl2002&var-datasource=codfw+prometheus/ops
[12:57:34] <wikibugs>	 (03PS2) 10Arturo Borrero Gonzalez: hieradata: use ntp servers private ip addresses [puppet] - 10https://gerrit.wikimedia.org/r/777755 (owner: 10Majavah)
[12:57:36] <wikibugs>	 (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] hieradata: use puppet-enc hostname in codfw1dev [puppet] - 10https://gerrit.wikimedia.org/r/778574 (https://phabricator.wikimedia.org/T295247) (owner: 10Majavah)
[12:57:43] <wikibugs>	 10SRE, 10SRE-Access-Requests: Requesting access to analytics-privatedata-users for drochford (superset access with no server access) - https://phabricator.wikimedia.org/T305634 (10jcrespo)
[12:58:02] <logmsgbot>	 !log aqu@deploy1002 Finished deploy [analytics/refinery@f0a1656]: Migrate mediarequest hourly from Oozie to Airflow [analytics/refinery@f0a1656] (duration: 20m 23s)
[12:58:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:58:29] <wikibugs>	 (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] hieradata: use ntp servers private ip addresses [puppet] - 10https://gerrit.wikimedia.org/r/777755 (owner: 10Majavah)
[12:59:10] <icinga-wm>	 PROBLEM - Disk space on ml-staging-ctrl2001 is CRITICAL: DISK CRITICAL - free space: / 1116 MB (5% inode=95%): /tmp 1116 MB (5% inode=95%): /var/tmp 1116 MB (5% inode=95%): https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=ml-staging-ctrl2001&var-datasource=codfw+prometheus/ops
[13:01:06] <wikibugs>	 10SRE, 10Goal, 10MW-1.38-notes (1.38.0-wmf.4; 2021-10-12), 10Patch-For-Review, and 2 others: Fully migrate producers off statsd - https://phabricator.wikimedia.org/T205870 (10lmata)
[13:03:21] <logmsgbot>	 !log aqu@deploy1002 Started deploy [analytics/refinery@f0a1656] (thin): Migrate mediarequest hourly from Oozie to Airflow [analytics/refinery@f0a1656]
[13:03:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:03:29] <logmsgbot>	 !log aqu@deploy1002 Finished deploy [analytics/refinery@f0a1656] (thin): Migrate mediarequest hourly from Oozie to Airflow [analytics/refinery@f0a1656] (duration: 00m 07s)
[13:03:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:04:09] <logmsgbot>	 !log aqu@deploy1002 Started deploy [analytics/refinery@f0a1656] (hadoop-test): Migrate mediarequest hourly from Oozie to Airflow [analytics/refinery@f0a1656]
[13:04:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:04:52] <icinga-wm>	 RECOVERY - SSH on mw2258.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[13:05:05] <wikibugs>	 (03PS1) 10Jcrespo: admin: Add drochford to analytics-privatedata-users for superset [puppet] - 10https://gerrit.wikimedia.org/r/779024 (https://phabricator.wikimedia.org/T305634)
[13:05:30] <wikibugs>	 (CR) Jcrespo: [C: -1] "Blocked on data engineering's ok." [puppet] - https://gerrit.wikimedia.org/r/779024 (https://phabricator.wikimedia.org/T305634) (owner: Jcrespo)
[13:06:06] <wikibugs>	 (03PS1) 10Ottomata: eventlogging - ReadingDepth schema has been deleted, don't attempt to ingest it [puppet] - 10https://gerrit.wikimedia.org/r/779025
[13:07:20] <wikibugs>	 (03CR) 10Ottomata: "'Deleting' the schema caused errors as the Hive ingestion step tried to look up the latest schema for new ReadingDepth events that were st" [puppet] - 10https://gerrit.wikimedia.org/r/779025 (owner: 10Ottomata)
[13:09:01] <wikibugs>	 (03CR) 10Ottomata: [C: 03+2] eventlogging - ReadingDepth schema has been deleted, don't attempt to ingest it [puppet] - 10https://gerrit.wikimedia.org/r/779025 (owner: 10Ottomata)
[13:09:04] <wikibugs>	 10SRE, 10SRE-Access-Requests, 10Patch-For-Review: Requesting access to analytics-privatedata-users for drochford (superset access with no server access) - https://phabricator.wikimedia.org/T305634 (10jcrespo) a:05jcrespo→03Ottomata This is only blocked on Data Engineering, as owners of the service, to ap...
[13:09:13] <wikibugs>	 10SRE, 10SRE-Access-Requests, 10Patch-For-Review: Requesting access to analytics-privatedata-users for drochford (superset access with no server access) - https://phabricator.wikimedia.org/T305634 (10jcrespo) p:05Triage→03High
[13:09:17] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P24419 and previous config saved to /var/cache/conftool/dbconfig/20220411-130916-ladsgroup.json
[13:09:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:10:12] <wikibugs>	 (03PS1) 10Ladsgroup: admin: Fix real name [puppet] - 10https://gerrit.wikimedia.org/r/779026
[13:10:38] <wikibugs>	 10SRE, 10SRE-Access-Requests: Requesting access to google console for TomekSikora.Monsoon - https://phabricator.wikimedia.org/T304502 (10jcrespo) a:05TomekSikora.Monsoon→03None
[13:10:57] <wikibugs>	 (03CR) 10Ssingh: [C: 03+1] "Per discussion." [puppet] - 10https://gerrit.wikimedia.org/r/779026 (owner: 10Ladsgroup)
[13:11:10] <logmsgbot>	 !log aqu@deploy1002 Finished deploy [analytics/refinery@f0a1656] (hadoop-test): Migrate mediarequest hourly from Oozie to Airflow [analytics/refinery@f0a1656] (duration: 07m 00s)
[13:11:12] <wikibugs>	 (03PS2) 10Ladsgroup: admin: Fix real name [puppet] - 10https://gerrit.wikimedia.org/r/779026
[13:11:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:11:16] <wikibugs>	 (03CR) 10Ladsgroup: [V: 03+2 C: 03+2] admin: Fix real name [puppet] - 10https://gerrit.wikimedia.org/r/779026 (owner: 10Ladsgroup)
[13:15:01] <jinxer-wm>	 (BlazegraphJvmQuakeWarnGC) firing: Blazegraph instance wdqs1013:9100 is entering a GC death spiral - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DBlazegraphJvmQuakeWarnGC
[13:15:08] <wikibugs>	 10SRE, 10Traffic, 10SRE Observability (FY2021/2022-Q4), 10User-fgiunchedi: Migrate Traffic Prometheus alerts from Icinga to Alertmanager - https://phabricator.wikimedia.org/T300723 (10lmata)
[13:18:29] <wikibugs>	 (03CR) 10Btullis: [C: 03+2] Configure LDAP authentication for DataHub [deployment-charts] - 10https://gerrit.wikimedia.org/r/778345 (https://phabricator.wikimedia.org/T301462) (owner: 10Btullis)
[13:20:01] <jinxer-wm>	 (BlazegraphJvmQuakeWarnGC) resolved: Blazegraph instance wdqs1013:9100 is entering a GC death spiral - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DBlazegraphJvmQuakeWarnGC
[13:20:12] <wikibugs>	 (03CR) 10MSantos: [C: 03+1] maps: Re-enable OSM sync for on eqiad master [puppet] - 10https://gerrit.wikimedia.org/r/772453 (https://phabricator.wikimedia.org/T304984) (owner: 10Jgiannelos)
[13:20:17] <wikibugs>	 10SRE, 10SRE-Access-Requests: Requesting access to Analytic Cluster for Research Intern (paramita_das) - https://phabricator.wikimedia.org/T305298 (10Ottomata) Hello!  Yes:  https://wikitech.wikimedia.org/wiki/SRE/Production_access#Setting_up_your_access
[13:21:47] <wikibugs>	 10SRE, 10SRE-Access-Requests, 10Patch-For-Review: Requesting access to analytics-privatedata-users for drochford (superset access with no server access) - https://phabricator.wikimedia.org/T305634 (10drochford) >>! In T305634#7844679, @jcrespo wrote: > Hey, @drochford, >  > While I check and process your acc...
[13:22:03] <wikibugs>	 10SRE, 10SRE-Access-Requests, 10Patch-For-Review: Requesting access to analytics-privatedata-users for drochford (superset access with no server access) - https://phabricator.wikimedia.org/T305634 (10Ottomata) Approved
[13:22:33] <wikibugs>	 (CR) JMeybohm: Configure LDAP authentication for DataHub (1 comment) [deployment-charts] - https://gerrit.wikimedia.org/r/778345 (https://phabricator.wikimedia.org/T301462) (owner: Btullis)
[13:22:54] <wikibugs>	 10SRE-swift-storage: Upgrade Swift ms cluster to Bullseye and revisit mkfs.xfs options - https://phabricator.wikimedia.org/T279637 (10MatthewVernon) a:03MatthewVernon I'm not sure I'm going to do anything about xfs options, but I am going to start reimaging hosts to Bullseye, and going to use this task to trac...
[13:22:56] <wikibugs>	 (03Merged) 10jenkins-bot: Configure LDAP authentication for DataHub [deployment-charts] - 10https://gerrit.wikimedia.org/r/778345 (https://phabricator.wikimedia.org/T301462) (owner: 10Btullis)
[13:23:17] <wikibugs>	 10SRE, 10Prod-Kubernetes, 10Traffic, 10serviceops, 10Kubernetes: service::catalog entries and dnsdisc for Kubernetes services under Ingress - https://phabricator.wikimedia.org/T305358 (10JMeybohm)
[13:24:17] <wikibugs>	 10SRE-swift-storage: Upgrade Swift ms cluster to Bullseye and revisit mkfs.xfs options - https://phabricator.wikimedia.org/T279637 (10MatthewVernon)
[13:24:22] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T298565)', diff saved to https://phabricator.wikimedia.org/P24420 and previous config saved to /var/cache/conftool/dbconfig/20220411-132422-ladsgroup.json
[13:24:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:24:26] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[13:24:28] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db2103.codfw.wmnet with reason: Maintenance
[13:24:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:24:29] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2103.codfw.wmnet with reason: Maintenance
[13:24:31] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 12:00:00 on 14 hosts with reason: Maintenance
[13:24:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:24:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:24:40] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 14 hosts with reason: Maintenance
[13:24:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:25:19] <wikibugs>	 10SRE, 10SRE-Access-Requests, 10Patch-For-Review: Requesting access to analytics-privatedata-users for drochford (superset access with no server access) - https://phabricator.wikimedia.org/T305634 (10jcrespo) a:05Ottomata→03jcrespo Thank you a lot, drochford. Anything that helps us process request faster...
[13:25:47] <icinga-wm>	 PROBLEM - MariaDB Replica Lag: x2 #page on db1153 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 3256.88 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[13:25:54] * volans here
[13:25:59] <wikibugs>	 (03CR) 10Jcrespo: admin: Add drochford to analytics-privatedata-users for superset [puppet] - 10https://gerrit.wikimedia.org/r/779024 (https://phabricator.wikimedia.org/T305634) (owner: 10Jcrespo)
[13:26:05] <volans>	 isn't x2 not yet in production?
[13:26:06] <jayme>	 o/
[13:26:08] <Amir1>	 x2 is not used
[13:26:10] <marostegui>	 it is not
[13:26:11] <volans>	 kormat: ^^^
[13:26:12] <Amir1>	 don't worry
[13:26:23] <Amir1>	 I'll resolve it
[13:26:23] <Emperor>	 probably shouldn't be p.age enabled then :)
[13:26:30] <volans>	 was about to say the same
[13:26:33] <volans>	 #FALSE_ALARM
[13:26:43] <Emperor>	 for once I am quicker than volans! \o/
[13:26:54] <Emperor>	 ;)
[13:26:56] <volans>	 lol
[13:26:56] <topranks>	 that's an accomplishment and a half right there :)
[13:26:58] <marostegui>	 the reason why it was enabled is because we were told months ago that it would go to production
[13:27:12] <Amir1>	 soonTM
[13:29:13] <kormat>	 `Slave_IO_State: Waiting to reconnect after a failed master event read`
[13:29:23] <kormat>	 there's something unhappy between db1153 and db1151, which was rebooted earlier.
[13:30:08] <kormat>	 `show slave hosts` on db1151 shows db1153 ~28 times
[13:30:24] <kormat>	 i've no idea what's going on there
[13:30:29] <kormat>	 and i'm out sick
[13:30:33] <kormat>	 marostegui: can i leave this with you?
[13:30:39] <marostegui>	 yes
[13:30:42] <kormat>	 ty <3
[13:32:37] <icinga-wm>	 RECOVERY - MariaDB Replica Lag: x2 #page on db1153 is OK: OK slave_sql_lag Replication lag: 0.00 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[13:33:16] <marostegui>	 I have fixed it, but db2143 has been down for 8h too, why is that? kormat?
[13:33:46] <marostegui>	 Ah, it is the one for the onsite maintenance
[13:33:49] <marostegui>	 Anyways
[13:36:52] <wikibugs>	 10SRE, 10SRE-Access-Requests, 10SRE-OnFire, 10WMF-Legal: Grant Zabe access to the T302047 gdoc incident report - https://phabricator.wikimedia.org/T302163 (10jcrespo) @KFrancis can you help us confirm this (SREs don't have access to the legal ticket system).
[13:38:09] <wikibugs>	 10SRE-swift-storage: Upgrade Swift ms cluster to Bullseye and revisit mkfs.xfs options - https://phabricator.wikimedia.org/T279637 (10MatthewVernon)
[13:39:25] <wikibugs>	 (03PS1) 10Btullis: Add the codfw LDAP server to the DataHub JAAS configuration [deployment-charts] - 10https://gerrit.wikimedia.org/r/779031 (https://phabricator.wikimedia.org/T301454)
[13:40:32] <wikibugs>	 (03CR) 10Btullis: Add the codfw LDAP server to the DataHub JAAS configuration (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/779031 (https://phabricator.wikimedia.org/T301454) (owner: 10Btullis)
[13:42:39] <wikibugs>	 (03CR) 10Andrew Bogott: [V: 03+2 C: 03+2] Update git repo to correspond to the actual running files [wikitech-static] - 10https://gerrit.wikimedia.org/r/775396 (owner: 10Andrew Bogott)
[13:42:51] <wikibugs>	 (03CR) 10Andrew Bogott: [V: 03+2 C: 03+2] import-wikitech.sh: nukeNS.php --ns 8 before import [wikitech-static] - 10https://gerrit.wikimedia.org/r/775397 (owner: 10Andrew Bogott)
[13:44:06] <wikibugs>	 (03CR) 10JMeybohm: [C: 03+1] Add the codfw LDAP server to the DataHub JAAS configuration [deployment-charts] - 10https://gerrit.wikimedia.org/r/779031 (https://phabricator.wikimedia.org/T301454) (owner: 10Btullis)
[13:44:17] <wikibugs>	 (03CR) 10Btullis: [C: 03+2] Add the codfw LDAP server to the DataHub JAAS configuration [deployment-charts] - 10https://gerrit.wikimedia.org/r/779031 (https://phabricator.wikimedia.org/T301454) (owner: 10Btullis)
[13:44:32] <wikibugs>	 (03PS1) 10Zabe: acme_chief: migrate acme-chief-designate-tidyup cron to systemd timer job [puppet] - 10https://gerrit.wikimedia.org/r/779032 (https://phabricator.wikimedia.org/T273673)
[13:44:34] <wikibugs>	 (03PS1) 10Zabe: acme_chief: remove absented acme-chief-designate-tidyup cron [puppet] - 10https://gerrit.wikimedia.org/r/779033 (https://phabricator.wikimedia.org/T273673)
[13:45:13] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] acme_chief: migrate acme-chief-designate-tidyup cron to systemd timer job [puppet] - 10https://gerrit.wikimedia.org/r/779032 (https://phabricator.wikimedia.org/T273673) (owner: 10Zabe)
[13:45:35] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] acme_chief: remove absented acme-chief-designate-tidyup cron [puppet] - 10https://gerrit.wikimedia.org/r/779033 (https://phabricator.wikimedia.org/T273673) (owner: 10Zabe)
[13:46:43] <wikibugs>	 (03CR) 10Hnowlan: [C: 03+2] maps: Re-enable OSM sync for on eqiad master [puppet] - 10https://gerrit.wikimedia.org/r/772453 (https://phabricator.wikimedia.org/T304984) (owner: 10Jgiannelos)
[13:46:46] <wikibugs>	 (03PS2) 10Zabe: acme_chief: migrate acme-chief-designate-tidyup cron to systemd timer job [puppet] - 10https://gerrit.wikimedia.org/r/779032 (https://phabricator.wikimedia.org/T273673)
[13:48:16] <wikibugs>	 (03Merged) 10jenkins-bot: Add the codfw LDAP server to the DataHub JAAS configuration [deployment-charts] - 10https://gerrit.wikimedia.org/r/779031 (https://phabricator.wikimedia.org/T301454) (owner: 10Btullis)
[13:48:48] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 (T297189)', diff saved to https://phabricator.wikimedia.org/P24421 and previous config saved to /var/cache/conftool/dbconfig/20220411-134848-marostegui.json
[13:48:49] <wikibugs>	 (03CR) 10MVernon: [C: 03+1] swift: Create a new read-only role on mw account for backup taking (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/773298 (https://phabricator.wikimedia.org/T269108) (owner: 10Jcrespo)
[13:48:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:48:53] <stashbot>	 T297189: Schema change for dropping ft_title and ft_namespace - https://phabricator.wikimedia.org/T297189
[13:51:50] <icinga-wm>	 PROBLEM - SSH on aqs1009.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[13:52:10] <wikibugs>	 (03PS1) 10Marostegui: x2: Disable notifications for x2 DBs [puppet] - 10https://gerrit.wikimedia.org/r/779034
[13:53:05] <logmsgbot>	 !log mvernon@cumin1001 START - Cookbook sre.hosts.reimage for host ms-fe1012.eqiad.wmnet with OS bullseye
[13:53:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:53:08] <wikibugs>	 10SRE-swift-storage: Upgrade Swift ms cluster to Bullseye and revisit mkfs.xfs options - https://phabricator.wikimedia.org/T279637 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by mvernon@cumin1001 for host ms-fe1012.eqiad.wmnet with OS bullseye
[13:53:43] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Slowly repool db1119', diff saved to https://phabricator.wikimedia.org/P24422 and previous config saved to /var/cache/conftool/dbconfig/20220411-135343-root.json
[13:53:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:54:01] <jinxer-wm>	 (BlazegraphJvmQuakeWarnGC) firing: Blazegraph instance wdqs1007:9100 is entering a GC death spiral - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DBlazegraphJvmQuakeWarnGC
[13:54:03] <wikibugs>	 (03CR) 10Marostegui: [C: 03+2] x2: Disable notifications for x2 DBs [puppet] - 10https://gerrit.wikimedia.org/r/779034 (owner: 10Marostegui)
[13:55:34] <wikibugs>	 (03Abandoned) 10Andrew Bogott: openstack:haproxy add tls for nova metadata service [puppet] - 10https://gerrit.wikimedia.org/r/732398 (https://phabricator.wikimedia.org/T267194) (owner: 10Andrew Bogott)
[13:56:51] <icinga-wm>	 PROBLEM - MariaDB read only x2 #page on db2142 is CRITICAL: CRIT: read_only: True, expected False: OK: Version 10.4.22-MariaDB-log, Uptime 5241s, event_scheduler: True, 16.60 QPS, connection latency: 0.004217s, query latency: 0.000511s https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Master_comes_back_in_read_only
[13:57:06] <volans>	 #FALSE_ALARM ... again
[13:57:34] * volans acked on VO
[13:57:39] <sobanski>	 m.arostegui is disabling paging for it
[13:57:39] <marostegui>	 And I just pushed the code to disable notifications
[13:57:53] <marostegui>	 Anyways, I fixed that too
[13:59:09] <icinga-wm>	 RECOVERY - MariaDB read only x2 #page on db2142 is OK: Version 10.4.22-MariaDB-log, Uptime 5379s, read_only: False, event_scheduler: True, 16.55 QPS, connection latency: 0.004429s, query latency: 0.000496s https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Master_comes_back_in_read_only
[13:59:21] <volans>	 thanks for the fix
[13:59:46] <marostegui>	 Also fixed db1151 which would have paged too
[14:03:00] <wikibugs>	 (03PS1) 10Btullis: Use the LDAP read-only replicas for datahub authentication [deployment-charts] - 10https://gerrit.wikimedia.org/r/779039 (https://phabricator.wikimedia.org/T301462)
[14:03:53] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P24423 and previous config saved to /var/cache/conftool/dbconfig/20220411-140353-marostegui.json
[14:03:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:04:15] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Slowly repool db1119', diff saved to https://phabricator.wikimedia.org/P24424 and previous config saved to /var/cache/conftool/dbconfig/20220411-140415-root.json
[14:04:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:06:25] <wikibugs>	 (03CR) 10Majavah: [C: 03+1] Use the LDAP read-only replicas for datahub authentication [deployment-charts] - 10https://gerrit.wikimedia.org/r/779039 (https://phabricator.wikimedia.org/T301462) (owner: 10Btullis)
[14:07:42] <logmsgbot>	 !log mvernon@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on ms-fe1012.eqiad.wmnet with reason: host reimage
[14:07:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:08:41] <wikibugs>	 (03PS1) 10Zabe: ci: migrate gitcache crons to systemd timer jobs [puppet] - 10https://gerrit.wikimedia.org/r/779040 (https://phabricator.wikimedia.org/T273673)
[14:08:43] <wikibugs>	 (03PS1) 10Zabe: ci: remove absented gitcache crons [puppet] - 10https://gerrit.wikimedia.org/r/779041 (https://phabricator.wikimedia.org/T273673)
[14:09:14] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] ci: migrate gitcache crons to systemd timer jobs [puppet] - 10https://gerrit.wikimedia.org/r/779040 (https://phabricator.wikimedia.org/T273673) (owner: 10Zabe)
[14:09:33] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] ci: remove absented gitcache crons [puppet] - 10https://gerrit.wikimedia.org/r/779041 (https://phabricator.wikimedia.org/T273673) (owner: 10Zabe)
[14:09:42] <wikibugs>	 (03CR) 10Btullis: [C: 03+2] Use the LDAP read-only replicas for datahub authentication [deployment-charts] - 10https://gerrit.wikimedia.org/r/779039 (https://phabricator.wikimedia.org/T301462) (owner: 10Btullis)
[14:09:56] <wikibugs>	 (03PS2) 10Zabe: acme_chief: remove absented acme-chief-designate-tidyup cron [puppet] - 10https://gerrit.wikimedia.org/r/779033 (https://phabricator.wikimedia.org/T273673)
[14:10:35] <logmsgbot>	 !log mvernon@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-fe1012.eqiad.wmnet with reason: host reimage
[14:10:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:11:24] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1119.eqiad.wmnet with reason: Maintenance
[14:11:25] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1119.eqiad.wmnet with reason: Maintenance
[14:11:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:11:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:11:28] <wikibugs>	 (03PS2) 10Zabe: ci: migrate gitcache crons to systemd timer jobs [puppet] - 10https://gerrit.wikimedia.org/r/779040 (https://phabricator.wikimedia.org/T273673)
[14:13:59] <wikibugs>	 (03PS2) 10Zabe: ci: remove absented gitcache crons [puppet] - 10https://gerrit.wikimedia.org/r/779041 (https://phabricator.wikimedia.org/T273673)
[14:14:29] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Fully repool db1119', diff saved to https://phabricator.wikimedia.org/P24425 and previous config saved to /var/cache/conftool/dbconfig/20220411-141428-root.json
[14:14:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:14:36] <wikibugs>	 (03Merged) 10jenkins-bot: Use the LDAP read-only replicas for datahub authentication [deployment-charts] - 10https://gerrit.wikimedia.org/r/779039 (https://phabricator.wikimedia.org/T301462) (owner: 10Btullis)
[14:14:47] <icinga-wm>	 PROBLEM - Host an-worker1099 is DOWN: PING CRITICAL - Packet loss = 100%
[14:15:09] <icinga-wm>	 RECOVERY - Host an-worker1099 is UP: PING OK - Packet loss = 0%, RTA = 0.27 ms
[14:17:09] <logmsgbot>	 !log btullis@deploy1002 helmfile [staging] START helmfile.d/services/datahub: apply on main
[14:17:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:18:13] <logmsgbot>	 !log btullis@deploy1002 helmfile [staging] START helmfile.d/services/datahub: apply on main
[14:18:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:18:58] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P24426 and previous config saved to /var/cache/conftool/dbconfig/20220411-141858-marostegui.json
[14:19:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:20:05] <wikibugs>	 10SRE, 10ops-eqsin, 10DC-Ops, 10Traffic: cp5002 memory errors on DIMM A4 - https://phabricator.wikimedia.org/T305423 (10RobH) p:05Triage→03Medium
[14:21:27] <wikibugs>	 10SRE, 10ops-ulsfo, 10DC-Ops, 10Traffic: ganeti4002 dimm error - https://phabricator.wikimedia.org/T303318 (10RobH) I'll chase this down today, I got the notice of processing but no shipment so I'll need to email Dell and find out what happened with this.
[14:21:40] <wikibugs>	 10SRE, 10ops-ulsfo, 10DC-Ops, 10Traffic: ganeti4002 dimm error - https://phabricator.wikimedia.org/T303318 (10RobH) p:05Medium→03High
[14:22:15] <Guest9647>	 !log powerdown ganeti2019 for relocation 
[14:22:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:22:39] <Lucas_WMDE>	 this might be a stupid question, but how does one actually schedule a dedicated deployment window, if you think you need one? (in this case, for a maintenance script that might need more than an hour)
[14:22:51] <wikibugs>	 (03Abandoned) 10Majavah: wmcs: toolforge: add_grid_webgrid_generic_node: fix description [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/749711 (owner: 10Majavah)
[14:22:53] <Lucas_WMDE>	 can I just add it to the deployment calendar myself? (once the calendar for this week materializes, that is ^^)
[14:23:00] <Lucas_WMDE>	 that part isn’t really clear to me from https://wikitech.wikimedia.org/wiki/Deployments/Inclusion_criteria
[14:23:14] <taavi>	 Lucas_WMDE: yes, just add it to the calendar
[14:23:20] <Lucas_WMDE>	 ok thanks :)
[14:24:49] <icinga-wm>	 PROBLEM - Host ganeti2019 is DOWN: PING CRITICAL - Packet loss = 100%
[14:26:29] <icinga-wm>	 PROBLEM - Host ganeti2019.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[14:27:04] <wikibugs>	 (03PS1) 10Btullis: Remove override for datahub-frontend staging egress [deployment-charts] - 10https://gerrit.wikimedia.org/r/779045 (https://phabricator.wikimedia.org/T301462)
[14:29:25] <icinga-wm>	 RECOVERY - Check systemd state on an-worker1101 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[14:31:55] <icinga-wm>	 RECOVERY - Host ganeti2019.mgmt is UP: PING OK - Packet loss = 0%, RTA = 38.78 ms
[14:33:14] <wikibugs>	 (03CR) 10Btullis: [C: 03+2] Remove override for datahub-frontend staging egress [deployment-charts] - 10https://gerrit.wikimedia.org/r/779045 (https://phabricator.wikimedia.org/T301462) (owner: 10Btullis)
[14:33:23] <wikibugs>	 (03PS1) 10JMeybohm: Add all members of the ops group to the deployment group [puppet] - 10https://gerrit.wikimedia.org/r/779047 (https://phabricator.wikimedia.org/T305729)
[14:33:55] <wikibugs>	 (03CR) 10Andrew Bogott: [V: 03+2 C: 03+2] Update ldap role names [labs/private] - 10https://gerrit.wikimedia.org/r/776188 (https://phabricator.wikimedia.org/T295150) (owner: 10Majavah)
[14:34:03] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 (T297189)', diff saved to https://phabricator.wikimedia.org/P24427 and previous config saved to /var/cache/conftool/dbconfig/20220411-143403-marostegui.json
[14:34:05] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 8:00:00 on db1101.eqiad.wmnet with reason: Maintenance
[14:34:06] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1101.eqiad.wmnet with reason: Maintenance
[14:34:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:34:08] <stashbot>	 T297189: Schema change for dropping ft_title and ft_namespace - https://phabricator.wikimedia.org/T297189
[14:34:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:34:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:34:12] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depooling db1101:3317 (T297189)', diff saved to https://phabricator.wikimedia.org/P24428 and previous config saved to /var/cache/conftool/dbconfig/20220411-143411-marostegui.json
[14:34:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:34:44] <wikibugs>	 (03PS3) 10Majavah: Rename O:ldap::labs to O:ldap::rw [puppet] - 10https://gerrit.wikimedia.org/r/776187 (https://phabricator.wikimedia.org/T295150)
[14:34:56] <logmsgbot>	 !log mvernon@cumin1001 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-fe1012.eqiad.wmnet with OS bullseye
[14:34:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:35:00] <wikibugs>	 10SRE-swift-storage: Upgrade Swift ms cluster to Bullseye and revisit mkfs.xfs options - https://phabricator.wikimedia.org/T279637 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by mvernon@cumin1001 for host ms-fe1012.eqiad.wmnet with OS bullseye completed: - ms-fe1012 (**WARN**)   - Downtim...
[14:35:39] <wikibugs>	 10SRE, 10ops-codfw, 10DBA: codfw: Dedicate Rack B1 for cloudX-dev servers - https://phabricator.wikimedia.org/T305469 (10Papaul)
[14:36:11] <icinga-wm>	 RECOVERY - Host ganeti2019 is UP: PING OK - Packet loss = 0%, RTA = 171.67 ms
[14:36:22] <wikibugs>	 (03CR) 10Majavah: [V: 03+1] "PCC SUCCESS (NOOP 2 DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/34770/console" [puppet] - 10https://gerrit.wikimedia.org/r/776187 (https://phabricator.wikimedia.org/T295150) (owner: 10Majavah)
[14:37:37] <icinga-wm>	 PROBLEM - Host db2076.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[14:37:37] <wikibugs>	 (03CR) 10Majavah: [V: 03+1] "I guess the main thing to be careful with this is to rename any hiera files in the real private git repo." [puppet] - 10https://gerrit.wikimedia.org/r/776187 (https://phabricator.wikimedia.org/T295150) (owner: 10Majavah)
[14:38:03] <wikibugs>	 (03Merged) 10jenkins-bot: Remove override for datahub-frontend staging egress [deployment-charts] - 10https://gerrit.wikimedia.org/r/779045 (https://phabricator.wikimedia.org/T301462) (owner: 10Btullis)
[14:41:37] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 03+2] Rename O:ldap::labs to O:ldap::rw (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/776187 (https://phabricator.wikimedia.org/T295150) (owner: 10Majavah)
[14:43:41] <icinga-wm>	 RECOVERY - Host db2076.mgmt is UP: PING OK - Packet loss = 0%, RTA = 44.96 ms
[14:47:26] <logmsgbot>	 !log btullis@deploy1002 helmfile [staging] START helmfile.d/services/datahub: apply on main
[14:47:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:47:40] <wikibugs>	 (03PS1) 10JMeybohm: Switch default group for Kubernetes credentials files to deployer [puppet] - 10https://gerrit.wikimedia.org/r/779048 (https://phabricator.wikimedia.org/T305729)
[14:48:16] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 03+2] striker: Use ldap-rw hostname for ldap [puppet] - 10https://gerrit.wikimedia.org/r/776189 (https://phabricator.wikimedia.org/T295150) (owner: 10Majavah)
[14:48:40] <wikibugs>	 (03CR) 10Vgutierrez: [C: 03+1] "looking good" [puppet] - 10https://gerrit.wikimedia.org/r/777899 (https://phabricator.wikimedia.org/T305581) (owner: 10RLazarus)
[14:49:19] <wikibugs>	 (03CR) 10JMeybohm: [V: 03+1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/34773/console" [puppet] - 10https://gerrit.wikimedia.org/r/779048 (https://phabricator.wikimedia.org/T305729) (owner: 10JMeybohm)
[14:49:28] <logmsgbot>	 !log btullis@deploy1002 helmfile [staging] DONE helmfile.d/services/datahub: sync on main
[14:49:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:50:01] <wikibugs>	 (03PS1) 10Majavah: hieradata: switch eqiad1 to use the new enc server [puppet] - 10https://gerrit.wikimedia.org/r/779049 (https://phabricator.wikimedia.org/T295247)
[14:50:09] <wikibugs>	 (03PS2) 10Andrew Bogott: dynamicproxy: remove support for x-novaproxy-edit-dns [puppet] - 10https://gerrit.wikimedia.org/r/777316 (https://phabricator.wikimedia.org/T295246) (owner: 10Majavah)
[14:50:15] <icinga-wm>	 PROBLEM - Host db2086.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[14:50:16] <wikibugs>	 10SRE, 10ops-codfw, 10DBA: codfw: Dedicate Rack B1 for cloudX-dev servers - https://phabricator.wikimedia.org/T305469 (10Papaul)
[14:50:39] <wikibugs>	 10SRE, 10ops-codfw, 10DBA: codfw: Dedicate Rack B1 for cloudX-dev servers - https://phabricator.wikimedia.org/T305469 (10Papaul)
[14:50:55] <wikibugs>	 (03PS1) 10MVernon: swift: handle new installs where there are no rings [puppet] - 10https://gerrit.wikimedia.org/r/779050
[14:52:07] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 03+2] dynamicproxy: remove support for x-novaproxy-edit-dns [puppet] - 10https://gerrit.wikimedia.org/r/777316 (https://phabricator.wikimedia.org/T295246) (owner: 10Majavah)
[14:52:19] <wikibugs>	 (03PS1) 10Majavah: P:toolforge: use puppetdb for grid hba data [puppet] - 10https://gerrit.wikimedia.org/r/779051 (https://phabricator.wikimedia.org/T153163)
[14:52:40] <logmsgbot>	 !log mvernon@cumin1001 conftool action : set/pooled=yes; selector: dc=eqiad,cluster=swift,service=nginx
[14:52:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:52:51] <logmsgbot>	 !log mvernon@cumin1001 conftool action : set/pooled=yes; selector: dc=eqiad,cluster=swift,service=swift-fe
[14:52:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:53:12] <wikibugs>	 (03CR) 10Filippo Giunchedi: swift: handle new installs where there are no rings (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/779050 (owner: 10MVernon)
[14:55:18] <icinga-wm>	 RECOVERY - Host db2086.mgmt is UP: PING OK - Packet loss = 0%, RTA = 33.60 ms
[14:55:57] <wikibugs>	 (03CR) 10Majavah: [V: 03+1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/34774/console" [puppet] - 10https://gerrit.wikimedia.org/r/779051 (https://phabricator.wikimedia.org/T153163) (owner: 10Majavah)
[14:56:20] <wikibugs>	 (03CR) 10MVernon: swift: handle new installs where there are no rings (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/779050 (owner: 10MVernon)
[14:57:23] <wikibugs>	 10SRE, 10ops-codfw, 10DBA: codfw: Dedicate Rack B1 for cloudX-dev servers - https://phabricator.wikimedia.org/T305469 (10Papaul)
[14:59:48] <icinga-wm>	 PROBLEM - Host db2107.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[15:01:11] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1135.eqiad.wmnet with reason: Maintenance
[15:01:12] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1135.eqiad.wmnet with reason: Maintenance
[15:01:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:01:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:01:17] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1135 (T298565)', diff saved to https://phabricator.wikimedia.org/P24429 and previous config saved to /var/cache/conftool/dbconfig/20220411-150117-ladsgroup.json
[15:01:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:01:21] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[15:05:18] <icinga-wm>	 RECOVERY - Host db2107.mgmt is UP: PING OK - Packet loss = 0%, RTA = 34.54 ms
[15:05:55] <wikibugs>	 10SRE, 10ops-codfw, 10DBA: codfw: Dedicate Rack B1 for cloudX-dev servers - https://phabricator.wikimedia.org/T305469 (10Papaul)
[15:07:19] <wikibugs>	 (03PS14) 10Herron: prometheus: enable prometheus web access via proxy with IDP [puppet] - 10https://gerrit.wikimedia.org/r/764895 (https://phabricator.wikimedia.org/T301944)
[15:07:42] <icinga-wm>	 PROBLEM - Host db2137.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[15:08:00] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] prometheus: enable prometheus web access via proxy with IDP [puppet] - 10https://gerrit.wikimedia.org/r/764895 (https://phabricator.wikimedia.org/T301944) (owner: 10Herron)
[15:11:52] <wikibugs>	 (03PS15) 10Herron: prometheus: enable prometheus web access via proxy with IDP [puppet] - 10https://gerrit.wikimedia.org/r/764895 (https://phabricator.wikimedia.org/T301944)
[15:14:22] <icinga-wm>	 RECOVERY - Host db2137.mgmt is UP: PING OK - Packet loss = 0%, RTA = 34.15 ms
[15:17:32] <wikibugs>	 10SRE, 10ops-codfw, 10DBA: codfw: Dedicate Rack B1 for cloudX-dev servers - https://phabricator.wikimedia.org/T305469 (10Papaul)
[15:21:12] <wikibugs>	 (03CR) 10Herron: prometheus: enable prometheus web access via proxy with IDP (034 comments) [puppet] - 10https://gerrit.wikimedia.org/r/764895 (https://phabricator.wikimedia.org/T301944) (owner: 10Herron)
[15:24:54] <wikibugs>	 (03CR) 10Ahmon Dancy: Add all members of the ops group to the deployment group (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/779047 (https://phabricator.wikimedia.org/T305729) (owner: 10JMeybohm)
[15:26:10] <wikibugs>	 (03PS1) 10Jdrewniak: Bumping portals to master [mediawiki-config] - 10https://gerrit.wikimedia.org/r/779060 (https://phabricator.wikimedia.org/T128546)
[15:26:17] <wikibugs>	 (03CR) 10Ahmon Dancy: [C: 04-1] Switch default group for Kubernetes credentials files to deployer (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/779048 (https://phabricator.wikimedia.org/T305729) (owner: 10JMeybohm)
[15:26:34] <wikibugs>	 (03CR) 10Jcrespo: "Waiting for a review from someone else for merging." [puppet] - 10https://gerrit.wikimedia.org/r/779024 (https://phabricator.wikimedia.org/T305634) (owner: 10Jcrespo)
[15:27:12] <wikibugs>	 (03CR) 10Ahmon Dancy: "There's a commit message typo but I'm in favor of the change." [puppet] - 10https://gerrit.wikimedia.org/r/779047 (https://phabricator.wikimedia.org/T305729) (owner: 10JMeybohm)
[15:27:32] <icinga-wm>	 PROBLEM - Host db2147.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[15:28:00] <wikibugs>	 (03CR) 10Majavah: [C: 04-1] "'deployment' needs to be added to the special ops groups list in modules/openldap/files/cross-validate-accounts.py" [puppet] - 10https://gerrit.wikimedia.org/r/779047 (https://phabricator.wikimedia.org/T305729) (owner: 10JMeybohm)
[15:30:05] <jouncebot>	 jan_drewniak: #bothumor I � Unicode. All rise for Wikimedia Portals Update deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220411T1530).
[15:30:25] <wikibugs>	 10SRE, 10ops-codfw, 10DBA: codfw: Dedicate Rack B1 for cloudX-dev servers - https://phabricator.wikimedia.org/T305469 (10Papaul)
[15:30:34] <wikibugs>	 (03CR) 10Jdrewniak: [C: 03+2] Bumping portals to master [mediawiki-config] - 10https://gerrit.wikimedia.org/r/779060 (https://phabricator.wikimedia.org/T128546) (owner: 10Jdrewniak)
[15:31:12] <wikibugs>	 (03Merged) 10jenkins-bot: Bumping portals to master [mediawiki-config] - 10https://gerrit.wikimedia.org/r/779060 (https://phabricator.wikimedia.org/T128546) (owner: 10Jdrewniak)
[15:31:31] <wikibugs>	 (03PS4) 10Lucas Werkmeister (WMDE): Use wgRestAPIAdditionalRouteFiles for WB REST API [mediawiki-config] - 10https://gerrit.wikimedia.org/r/774901 (owner: 10Jakob)
[15:33:08] <logmsgbot>	 !log jdrewniak@deploy1002 Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:779060| Bumping portals to master (T128546)]] (duration: 00m 56s)
[15:33:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:33:13] <stashbot>	 T128546: [Recurring Task] Update Wikipedia and sister projects portals statistics - https://phabricator.wikimedia.org/T128546
[15:33:50] <icinga-wm>	 RECOVERY - Host db2147.mgmt is UP: PING OK - Packet loss = 0%, RTA = 33.76 ms
[15:34:02] <logmsgbot>	 !log jdrewniak@deploy1002 Synchronized portals: Wikimedia Portals Update: [[gerrit:779060| Bumping portals to master (T128546)]] (duration: 00m 53s)
[15:34:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:35:17] <wikibugs>	 10SRE, 10SRE-Access-Requests: Requesting access to google console for TomekSikora.Monsoon - https://phabricator.wikimedia.org/T304502 (10soworu) Hi @SCherukuwada. Charlene confirmed that there's an MSA on file. According to her feedback   > "Monsoon signed out standard MSA for consulting work. It includes conf...
[15:35:33] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[15:35:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:35:36] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[15:35:37] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[15:35:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:35:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:35:41] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[15:35:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:36:30] <wikibugs>	 SRE, ops-codfw, DBA: codfw: Dedicate Rack B1 for cloudX-dev servers - https://phabricator.wikimedia.org/T305469 (Marostegui) mysql started on db* hosts
[15:43:54] <wikibugs>	 (PS2) MVernon: swift: handle new installs where there are no rings [puppet] - https://gerrit.wikimedia.org/r/779050
[15:44:12] <wikibugs>	 (CR) Thiemo Kreuz (WMDE): [C: +1] "Confirmed: https://codesearch.wmcloud.org/search/?q=KartographerUsePageLanguage" [mediawiki-config] - https://gerrit.wikimedia.org/r/779014 (owner: Awight)
[15:44:35] <wikibugs>	 (CR) MVernon: swift: handle new installs where there are no rings (1 comment) [puppet] - https://gerrit.wikimedia.org/r/779050 (owner: MVernon)
[15:46:14] <icinga-wm>	 PROBLEM - SSH on aqs1008.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[15:46:38] <icinga-wm>	 PROBLEM - Host es2029.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[15:47:26] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T297189)', diff saved to https://phabricator.wikimedia.org/P24430 and previous config saved to /var/cache/conftool/dbconfig/20220411-154725-marostegui.json
[15:47:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:47:30] <stashbot>	 T297189: Schema change for dropping ft_title and ft_namespace - https://phabricator.wikimedia.org/T297189
[15:49:32] <wikibugs>	 (CR) Filippo Giunchedi: "LGTM overall, see inline" [puppet] - https://gerrit.wikimedia.org/r/764895 (https://phabricator.wikimedia.org/T301944) (owner: Herron)
[15:50:05] <wikibugs>	 (PS1) CDanis: upload VCL: Only apply requestctl rules to external clients [puppet] - https://gerrit.wikimedia.org/r/779064
[15:51:25] <wikibugs>	 (CR) Filippo Giunchedi: [C: +1] "Ship it!" [puppet] - https://gerrit.wikimedia.org/r/779050 (owner: MVernon)
[15:53:00] <icinga-wm>	 RECOVERY - Host es2029.mgmt is UP: PING OK - Packet loss = 0%, RTA = 34.75 ms
[15:53:43] <wikibugs>	 (CR) MVernon: [C: +2] swift: handle new installs where there are no rings [puppet] - https://gerrit.wikimedia.org/r/779050 (owner: MVernon)
[15:54:24] <wikibugs>	 (PS5) Cathal Mooney: Add template to configure IPv6 RAs on CRs and L3 Switches [homer/public] - https://gerrit.wikimedia.org/r/773587 (https://phabricator.wikimedia.org/T299758)
[15:56:20] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1135 (T298565)', diff saved to https://phabricator.wikimedia.org/P24431 and previous config saved to /var/cache/conftool/dbconfig/20220411-155620-ladsgroup.json
[15:56:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:56:24] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[15:58:08] <wikibugs>	 (CR) Cathal Mooney: [C: +2] Add template to configure IPv6 RAs on CRs and L3 Switches (1 comment) [homer/public] - https://gerrit.wikimedia.org/r/773587 (https://phabricator.wikimedia.org/T299758) (owner: Cathal Mooney)
[15:58:43] <wikibugs>	 (Merged) jenkins-bot: Add template to configure IPv6 RAs on CRs and L3 Switches [homer/public] - https://gerrit.wikimedia.org/r/773587 (https://phabricator.wikimedia.org/T299758) (owner: Cathal Mooney)
[16:00:10] <wikibugs>	 SRE, ops-codfw, DBA: codfw: Dedicate Rack B1 for cloudX-dev servers - https://phabricator.wikimedia.org/T305469 (Papaul)
[16:00:54] <wikibugs>	 (CR) Vgutierrez: [C: +1] upload VCL: Only apply requestctl rules to external clients [puppet] - https://gerrit.wikimedia.org/r/779064 (owner: CDanis)
[16:01:49] <wikibugs>	 (CR) CDanis: [C: +1] external_clouds_vendors: Support entity types besides "cloud" [puppet] - https://gerrit.wikimedia.org/r/777899 (https://phabricator.wikimedia.org/T305581) (owner: RLazarus)
[16:02:00] <Lucas_WMDE>	 jouncebot: nowandnext
[16:02:00] <jouncebot>	 No deployments scheduled for the next 0 hour(s) and 57 minute(s)
[16:02:00] <jouncebot>	 In 0 hour(s) and 57 minute(s): Wikidata Query Service weekly deploy (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220411T1700)
[16:02:15] <Lucas_WMDE>	 ok, I’ll deploy a config change that *should* only affect beta
[16:02:24] <Lucas_WMDE>	 (but it’s in a non-labs file so I’ll still test and sync it)
[16:02:27] <wikibugs>	 (CR) Lucas Werkmeister (WMDE): [C: +2] Use wgRestAPIAdditionalRouteFiles for WB REST API [mediawiki-config] - https://gerrit.wikimedia.org/r/774901 (owner: Jakob)
[16:02:31] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P24432 and previous config saved to /var/cache/conftool/dbconfig/20220411-160230-marostegui.json
[16:02:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:02:34] <wikibugs>	 (PS2) Majavah: hieradata: switch eqiad1 to use the new enc server [puppet] - https://gerrit.wikimedia.org/r/779049 (https://phabricator.wikimedia.org/T295247)
[16:03:07] <wikibugs>	 (Merged) jenkins-bot: Use wgRestAPIAdditionalRouteFiles for WB REST API [mediawiki-config] - https://gerrit.wikimedia.org/r/774901 (owner: Jakob)
[16:04:26] <Lucas_WMDE>	 testing on mwdebug1001
[16:04:42] <Lucas_WMDE>	 looks good, syncing
[16:04:55] <wikibugs>	 (PS3) Majavah: hieradata: switch eqiad1 to use the new enc server [puppet] - https://gerrit.wikimedia.org/r/779049 (https://phabricator.wikimedia.org/T295247)
[16:05:42] <icinga-wm>	 PROBLEM - Host es2030.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[16:05:46] <wikibugs>	 (CR) BBlack: "Right idea! But there's already such a clause (~60 lines up where's not so obvious) in the upload case.  It's the equivalent in text-front" [puppet] - https://gerrit.wikimedia.org/r/779064 (owner: CDanis)
[16:05:58] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[16:06:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:06:01] <wikibugs>	 (CR) BBlack: [C: -1] upload VCL: Only apply requestctl rules to external clients [puppet] - https://gerrit.wikimedia.org/r/779064 (owner: CDanis)
[16:06:01] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[16:06:02] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[16:06:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:06:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:06:06] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[16:06:07] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy1002 Synchronized wmf-config/Wikibase.php: Config: [[gerrit:774901|Use wgRestAPIAdditionalRouteFiles for WB REST API]] (duration: 00m 51s)
[16:06:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:06:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:06:11] <Lucas_WMDE>	 ok, I’m done
[16:09:39] <wikibugs>	 SRE, ops-codfw, DBA: codfw: Dedicate Rack B1 for cloudX-dev servers - https://phabricator.wikimedia.org/T305469 (Papaul)
[16:10:02] <wikibugs>	 SRE-Access-Requests: Denial of Service due to repeated hits from a particular IP - https://phabricator.wikimedia.org/T305863 (ERayfield)
[16:11:14] <wikibugs>	 (CR) Vgutierrez: [C: +1] upload VCL: Only apply requestctl rules to external clients (1 comment) [puppet] - https://gerrit.wikimedia.org/r/779064 (owner: CDanis)
[16:11:19] <wikibugs>	 (CR) Vgutierrez: upload VCL: Only apply requestctl rules to external clients [puppet] - https://gerrit.wikimedia.org/r/779064 (owner: CDanis)
[16:11:25] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P24434 and previous config saved to /var/cache/conftool/dbconfig/20220411-161125-ladsgroup.json
[16:11:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:12:00] <icinga-wm>	 RECOVERY - Host es2030.mgmt is UP: PING OK - Packet loss = 0%, RTA = 33.72 ms
[16:16:08] <icinga-wm>	 RECOVERY - Check systemd state on snapshot1008 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[16:17:12] <wikibugs>	 SRE, ops-codfw, DBA: codfw: Dedicate Rack B1 for cloudX-dev servers - https://phabricator.wikimedia.org/T305469 (hnowlan)
[16:17:36] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P24435 and previous config saved to /var/cache/conftool/dbconfig/20220411-161735-marostegui.json
[16:17:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:20:01] <wikibugs>	 (PS1) Vgutierrez: vcl: Fix X-Abuse-Network typo [puppet] - https://gerrit.wikimedia.org/r/779068 (https://phabricator.wikimedia.org/T302471)
[16:20:39] <papaul>	 !log powerdown maps2006 for relocation 
[16:20:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:22:12] <wikibugs>	 (CR) CDanis: [C: +1] vcl: Fix X-Abuse-Network typo [puppet] - https://gerrit.wikimedia.org/r/779068 (https://phabricator.wikimedia.org/T302471) (owner: Vgutierrez)
[16:23:01] <wikibugs>	 SRE, Traffic: Denial of Service due to repeated hits from a particular IP - https://phabricator.wikimedia.org/T305863 (RLazarus) Routing to #traffic to see if this is a VCL rule we're hitting.  @ERayfield Can you provide some example requests, with headers and source IP?  I'm going to preemptively make t...
[16:23:46] <icinga-wm>	 PROBLEM - Host maps2006 is DOWN: PING CRITICAL - Packet loss = 100%
[16:24:42] <icinga-wm>	 PROBLEM - Host maps2006.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[16:25:30] <icinga-wm>	 PROBLEM - Check systemd state on deneb is CRITICAL: CRITICAL - degraded: The following units failed: docker-reporter-releng-images.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[16:26:32] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P24436 and previous config saved to /var/cache/conftool/dbconfig/20220411-162630-ladsgroup.json
[16:26:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:29:00] <wikibugs>	 (PS1) Majavah: hieradata: switch to ldap-rw naming on ldap hosts [puppet] - https://gerrit.wikimedia.org/r/779071 (https://phabricator.wikimedia.org/T295150)
[16:29:06] <papaul>	 RhinosF1: maps2006 should be back up online
[16:29:26] <icinga-wm>	 RECOVERY - Host maps2006 is UP: PING OK - Packet loss = 0%, RTA = 31.61 ms
[16:29:43] <RhinosF1>	 papaul: relayed
[16:29:53] <hnowlan>	 thanks! 
[16:30:06] <wikibugs>	 (CR) Majavah: [V: +1] "PCC SUCCESS (NOOP 4): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/34775/console" [puppet] - https://gerrit.wikimedia.org/r/776187 (https://phabricator.wikimedia.org/T295150) (owner: Majavah)
[16:30:11] <logmsgbot>	 !log aqu@deploy1002 Started deploy [airflow-dags/analytics@cae0024]: T302876_migrate_mediarequest_to_airflow [airflow-dags/analytics@cae0024]
[16:30:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:30:19] <logmsgbot>	 !log aqu@deploy1002 Finished deploy [airflow-dags/analytics@cae0024]: T302876_migrate_mediarequest_to_airflow [airflow-dags/analytics@cae0024] (duration: 00m 08s)
[16:30:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:30:32] <wikibugs>	 SRE, ops-codfw, DBA: codfw: Dedicate Rack B1 for cloudX-dev servers - https://phabricator.wikimedia.org/T305469 (Papaul)
[16:31:02] <icinga-wm>	 RECOVERY - Host maps2006.mgmt is UP: PING OK - Packet loss = 0%, RTA = 33.72 ms
[16:31:38] <wikibugs>	 (CR) Vgutierrez: [C: +2] vcl: Fix X-Abuse-Network typo [puppet] - https://gerrit.wikimedia.org/r/779068 (https://phabricator.wikimedia.org/T302471) (owner: Vgutierrez)
[16:31:44] <wikibugs>	 (CR) Majavah: [V: +1] "PCC SUCCESS (DIFF 4): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/34776/console" [puppet] - https://gerrit.wikimedia.org/r/779071 (https://phabricator.wikimedia.org/T295150) (owner: Majavah)
[16:32:14] <wikibugs>	 SRE, ops-codfw, DBA: codfw: Dedicate Rack B1 for cloudX-dev servers - https://phabricator.wikimedia.org/T305469 (Papaul)
[16:32:41] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T297189)', diff saved to https://phabricator.wikimedia.org/P24437 and previous config saved to /var/cache/conftool/dbconfig/20220411-163240-marostegui.json
[16:32:42] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 8:00:00 on db1170.eqiad.wmnet with reason: Maintenance
[16:32:44] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1170.eqiad.wmnet with reason: Maintenance
[16:32:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:32:45] <stashbot>	 T297189: Schema change for dropping ft_title and ft_namespace - https://phabricator.wikimedia.org/T297189
[16:32:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:32:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:32:49] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depooling db1170:3317 (T297189)', diff saved to https://phabricator.wikimedia.org/P24438 and previous config saved to /var/cache/conftool/dbconfig/20220411-163248-marostegui.json
[16:32:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:34:29] <wikibugs>	 SRE, ops-codfw, DBA: codfw: Dedicate Rack B1 for cloudX-dev servers - https://phabricator.wikimedia.org/T305469 (Marostegui) mysql started on es* hosts
[16:35:11] <wikibugs>	 SRE, ops-codfw, Data-Persistence (Consultation): codfw: Dedicate Rack B1 for cloudX-dev servers - https://phabricator.wikimedia.org/T305469 (Marostegui) Changing the tag as our DBA part here is done. If there's anything else required, I am still subscribed to the task.
[16:36:14] <wikibugs>	 SRE, ops-codfw, Data-Persistence (Consultation): codfw: Dedicate Rack B1 for cloudX-dev servers - https://phabricator.wikimedia.org/T305469 (Papaul) @Marostegui thanks
[16:41:37] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1135 (T298565)', diff saved to https://phabricator.wikimedia.org/P24439 and previous config saved to /var/cache/conftool/dbconfig/20220411-164136-ladsgroup.json
[16:41:38] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1134.eqiad.wmnet with reason: Maintenance
[16:41:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:41:40] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1134.eqiad.wmnet with reason: Maintenance
[16:41:41] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[16:41:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:41:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:41:45] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1134 (T298565)', diff saved to https://phabricator.wikimedia.org/P24440 and previous config saved to /var/cache/conftool/dbconfig/20220411-164144-ladsgroup.json
[16:41:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:42:16] <wikibugs>	 (PS1) BBlack: Exclude WMF cloud IPs from generic cloud limiter [puppet] - https://gerrit.wikimedia.org/r/779074
[16:46:23] <wikibugs>	 (CR) Vgutierrez: [C: +1] Exclude WMF cloud IPs from generic cloud limiter [puppet] - https://gerrit.wikimedia.org/r/779074 (owner: BBlack)
[16:47:28] <icinga-wm>	 RECOVERY - SSH on aqs1008.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[16:54:38] <icinga-wm>	 RECOVERY - SSH on aqs1009.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[16:55:09] <wikibugs>	 (PS1) Btullis: Add a volume for the jaas-ldap configuration for datahub [deployment-charts] - https://gerrit.wikimedia.org/r/779077 (https://phabricator.wikimedia.org/T301454)
[16:55:27] <logmsgbot>	 !log bking@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on wcqs2001.codfw.wmnet with reason: physically moving host
[16:55:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:55:29] <logmsgbot>	 !log bking@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on wcqs2001.codfw.wmnet with reason: physically moving host
[16:55:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:55:34] <wikibugs>	 SRE, ops-codfw, Data-Persistence (Consultation): codfw: Dedicate Rack B1 for cloudX-dev servers - https://phabricator.wikimedia.org/T305469 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=dc2c981d-aef2-4a2b-9d24-2e3ca912b985) set by bking@cumin1001 for 1 day, 0:00:00 on 1 host(s) an...
[16:59:01] <jinxer-wm>	 (BlazegraphJvmQuakeWarnGC) firing: (2) Blazegraph instance wdqs1004:9100 is entering a GC death spiral - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DBlazegraphJvmQuakeWarnGC
[16:59:54] <wikibugs>	 (PS1) Zabe: Start writing to cuc_actor in guwwiki and shnwikivoyage [mediawiki-config] - https://gerrit.wikimedia.org/r/779078 (https://phabricator.wikimedia.org/T233004)
[17:00:05] <jouncebot>	 ryankemper: Your horoscope predicts another unfortunate Wikidata Query Service weekly deploy deploy. May Zuul be (nice) with you. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220411T1700).
[17:01:15] <wikibugs>	 (CR) Btullis: [C: +2] Add a volume for the jaas-ldap configuration for datahub [deployment-charts] - https://gerrit.wikimedia.org/r/779077 (https://phabricator.wikimedia.org/T301454) (owner: Btullis)
[17:03:30] <wikibugs>	 SRE, ops-codfw, Data-Persistence (Consultation): codfw: Dedicate Rack B1 for cloudX-dev servers - https://phabricator.wikimedia.org/T305469 (akosiaris)
[17:05:22] <wikibugs>	 (Merged) jenkins-bot: Add a volume for the jaas-ldap configuration for datahub [deployment-charts] - https://gerrit.wikimedia.org/r/779077 (https://phabricator.wikimedia.org/T301454) (owner: Btullis)
[17:09:04] <logmsgbot>	 !log akosiaris@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on rdb2008.codfw.wmnet with reason: moving to a different rack
[17:09:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:09:06] <logmsgbot>	 !log akosiaris@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on rdb2008.codfw.wmnet with reason: moving to a different rack
[17:09:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:09:11] <wikibugs>	 SRE, ops-codfw, Data-Persistence (Consultation): codfw: Dedicate Rack B1 for cloudX-dev servers - https://phabricator.wikimedia.org/T305469 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=9620461c-f770-40dd-99d6-2b4f895a2549) set by akosiaris@cumin1001 for 2:00:00 on 1 host(s) and t...
[17:09:15] <logmsgbot>	 !log akosiaris@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2002.codfw.wmnet with reason: moving to a different rack
[17:09:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:09:17] <logmsgbot>	 !log akosiaris@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2002.codfw.wmnet with reason: moving to a different rack
[17:09:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:09:22] <wikibugs>	 SRE, ops-codfw, Data-Persistence (Consultation): codfw: Dedicate Rack B1 for cloudX-dev servers - https://phabricator.wikimedia.org/T305469 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=6e1e84ea-fac8-4dde-be55-1bf6ea935f75) set by akosiaris@cumin1001 for 2:00:00 on 1 host(s) and t...
[17:11:55] <logmsgbot>	 !log akosiaris@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on mc2023.codfw.wmnet with reason: moving to a different rack
[17:11:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:11:58] <logmsgbot>	 !log akosiaris@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2023.codfw.wmnet with reason: moving to a different rack
[17:11:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:12:03] <wikibugs>	 SRE, ops-codfw, Data-Persistence (Consultation): codfw: Dedicate Rack B1 for cloudX-dev servers - https://phabricator.wikimedia.org/T305469 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=5293ea70-e1a3-4862-ae77-82e8abf9cdd4) set by akosiaris@cumin1001 for 2:00:00 on 1 host(s) and t...
[17:12:17] <wikibugs>	 SRE, ops-codfw, Data-Persistence (Consultation): codfw: Dedicate Rack B1 for cloudX-dev servers - https://phabricator.wikimedia.org/T305469 (akosiaris)
[17:14:14] <wikibugs>	 SRE, ops-codfw, Data-Persistence (Consultation): codfw: Dedicate Rack B1 for cloudX-dev servers - https://phabricator.wikimedia.org/T305469 (akosiaris) I marked rdb2008, kubestage2002 and mc2023 as YES in the table. rdb2008 is the secondary, not the primary, kubestage2002 is for the staging cluster a...
[17:15:21] <wikibugs>	 (PS16) Herron: prometheus: enable prometheus web access via proxy with IDP [puppet] - https://gerrit.wikimedia.org/r/764895 (https://phabricator.wikimedia.org/T301944)
[17:15:55] <wikibugs>	 (CR) Herron: prometheus: enable prometheus web access via proxy with IDP (3 comments) [puppet] - https://gerrit.wikimedia.org/r/764895 (https://phabricator.wikimedia.org/T301944) (owner: Herron)
[17:17:10] <icinga-wm>	 PROBLEM - Host wcqs2001.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[17:17:10] <wikibugs>	 SRE, SRE-Access-Requests: Requesting access to google console for TomekSikora.Monsoon - https://phabricator.wikimedia.org/T304502 (SCherukuwada) I've given the above-mentioned e-mail address access to the two English Wikipedia domains (en.wikipedia.org and en.m.wikipedia.org).  @Jaime Crespo <jcrespo@wiki...
[17:17:56] <wikibugs>	 SRE, ops-codfw, Data-Persistence (Consultation): codfw: Dedicate Rack B1 for cloudX-dev servers - https://phabricator.wikimedia.org/T305469 (bking)
[17:23:27] <icinga-wm>	 RECOVERY - Host wcqs2001.mgmt is UP: PING OK - Packet loss = 0%, RTA = 38.20 ms
[17:23:36] <wikibugs>	 SRE, ops-eqiad, DC-Ops, Dumps-Generation: Q3:(Need By: TBD) rack/setup/install dumpsdata100[67] - https://phabricator.wikimedia.org/T299443 (RobH) p:High→Unbreak! >>! In T299443#7841687, @cmooney wrote: > FYI I believe PXE is failing for dumpsdata1006 as the DAC cable is plugged into the...
[17:24:08] <papaul>	 !log powerdown kubestage2002 for relocation 
[17:24:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:25:48] <wikibugs>	 SRE, ops-codfw, Data-Persistence (Consultation): codfw: Dedicate Rack B1 for cloudX-dev servers - https://phabricator.wikimedia.org/T305469 (Papaul)
[17:26:57] <wikibugs>	 SRE, ops-codfw, Data-Persistence (Consultation): codfw: Dedicate Rack B1 for cloudX-dev servers - https://phabricator.wikimedia.org/T305469 (Papaul)
[17:27:36] <logmsgbot>	 !log btullis@deploy1002 helmfile [staging] START helmfile.d/services/datahub: apply on main
[17:27:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:28:14] <logmsgbot>	 !log btullis@deploy1002 helmfile [staging] DONE helmfile.d/services/datahub: sync on main
[17:28:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:29:59] <wikibugs>	 SRE, ops-codfw, Data-Persistence (Consultation): codfw: Dedicate Rack B1 for cloudX-dev servers - https://phabricator.wikimedia.org/T305469 (Papaul) I cannot power down kubestage2002 ` W: aborting poweroff due to 30-query-hostname exiting with code 1.
[17:31:41] <papaul>	 !log powerdown rdb2008 for relocation 
[17:31:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:33:02] <wikibugs>	 SRE, ops-codfw, Data-Persistence (Consultation): codfw: Dedicate Rack B1 for cloudX-dev servers - https://phabricator.wikimedia.org/T305469 (Papaul)
[17:34:23] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T297189)', diff saved to https://phabricator.wikimedia.org/P24442 and previous config saved to /var/cache/conftool/dbconfig/20220411-173423-marostegui.json
[17:34:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:34:27] <stashbot>	 T297189: Schema change for dropping ft_title and ft_namespace - https://phabricator.wikimedia.org/T297189
[17:35:13] <wikibugs>	 SRE, ops-eqiad, DC-Ops, Dumps-Generation: Q3:(Need By: TBD) rack/setup/install dumpsdata100[67] - https://phabricator.wikimedia.org/T299443 (RobH) p:Unbreak!→Medium I worked around the issue via idrac and piping output to a text file to make up for the idrac serial screen issue of not get...
[17:37:34] <icinga-wm>	 PROBLEM - Host rdb2008.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[17:37:35] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1134 (T298565)', diff saved to https://phabricator.wikimedia.org/P24443 and previous config saved to /var/cache/conftool/dbconfig/20220411-173735-ladsgroup.json
[17:37:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:37:39] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[17:37:43] <wikibugs>	 Puppet, SRE, Infrastructure-Foundations: Validate all yaml files in puppet.git - https://phabricator.wikimedia.org/T305676 (Dzahn) The Debian package [[[ https://packages.debian.org/bullseye/yamllint | yamllint ]] exists in bullseye nowadays and works.  examples:   ` /puppet/hieradata$ yamllint cloud...
[17:37:58] <icinga-wm>	 RECOVERY - Host rdb2008.mgmt is UP: PING OK - Packet loss = 0%, RTA = 34.65 ms
[17:38:44] <wikibugs>	 (CR) Ebernhardson: [C: +1] "Seems reasonable, verified functionality is also in 6.5." [software/spicerack] - https://gerrit.wikimedia.org/r/776999 (https://phabricator.wikimedia.org/T304570) (owner: Bking)
[17:41:01] <wikibugs>	 SRE, ops-codfw, Data-Persistence (Consultation): codfw: Dedicate Rack B1 for cloudX-dev servers - https://phabricator.wikimedia.org/T305469 (Papaul)
[17:42:18] <wikibugs>	 SRE, ops-codfw, Data-Persistence (Consultation): codfw: Dedicate Rack B1 for cloudX-dev servers - https://phabricator.wikimedia.org/T305469 (Papaul)
[17:43:01] <wikibugs>	 SRE, ops-codfw, Data-Persistence (Consultation): codfw: Dedicate Rack B1 for cloudX-dev servers - https://phabricator.wikimedia.org/T305469 (Papaul)
[17:43:28] <wikibugs>	 (CR) Ebernhardson: [C: +1] elastic: allow waiting for yellow instead of green [cookbooks] - https://gerrit.wikimedia.org/r/778335 (https://phabricator.wikimedia.org/T304570) (owner: Ryan Kemper)
[17:45:09] <wikibugs>	 SRE, ops-codfw, Data-Persistence (Consultation): codfw: Dedicate Rack B1 for cloudX-dev servers - https://phabricator.wikimedia.org/T305469 (Papaul)
[17:47:26] <wikibugs>	 SRE, ops-codfw, Data-Persistence (Consultation): codfw: Dedicate Rack B1 for cloudX-dev servers - https://phabricator.wikimedia.org/T305469 (Papaul) @hnowlan will it be possible to get me restbase2021 offline on April 14th at 9:30am CT?   thanks.
[17:48:24] <icinga-wm>	 PROBLEM - Varnish traffic drop between 30min ago and now at ulsfo on alert1001 is CRITICAL: 45.14 le 60 https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/d/000000180/varnish-http-requests?orgId=1&viewPanel=6
[17:49:28] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P24444 and previous config saved to /var/cache/conftool/dbconfig/20220411-174928-marostegui.json
[17:49:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:49:33] <wikibugs>	 (CR) Volans: [C: +1] "No blocker for me, but I have no context on the ES side of thing." [software/spicerack] - https://gerrit.wikimedia.org/r/776999 (https://phabricator.wikimedia.org/T304570) (owner: Bking)
[17:52:40] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P24445 and previous config saved to /var/cache/conftool/dbconfig/20220411-175240-ladsgroup.json
[17:52:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:03:25] <wikibugs>	 (CR) Krinkle: [C: -1] "This is not intended as a global variable. Same as the other change, it's named after the directory. Feel free to name it $configDir thoug" [mediawiki-config] - https://gerrit.wikimedia.org/r/778667 (https://phabricator.wikimedia.org/T45956) (owner: Zabe)
[18:04:34] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P24446 and previous config saved to /var/cache/conftool/dbconfig/20220411-180433-marostegui.json
[18:04:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:07:45] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P24447 and previous config saved to /var/cache/conftool/dbconfig/20220411-180745-ladsgroup.json
[18:07:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:13:22] <wikibugs>	 (CR) Andrew Bogott: [C: +2] hieradata: switch eqiad1 to use the new enc server [puppet] - https://gerrit.wikimedia.org/r/779049 (https://phabricator.wikimedia.org/T295247) (owner: Majavah)
[18:14:28] <wikibugs>	 SRE, DC-Ops: Confirm support of PERC 750 raid controller - https://phabricator.wikimedia.org/T297913 (RobH) Dell suggested some alternate arguments for the command line utility that didn't work, and then requested we open a case for them to escalate   Service Request 1090168698  Sent case # to our team...
[18:15:37] <wikibugs>	 (PS1) Thiemo Kreuz (WMDE): Temporarily undeprecate EditPage::$textbox2 [core] (wmf/1.39.0-wmf.6) - https://gerrit.wikimedia.org/r/778641 (https://phabricator.wikimedia.org/T305028)
[18:15:56] <icinga-wm>	 RECOVERY - Varnish traffic drop between 30min ago and now at ulsfo on alert1001 is OK: (C)60 le (W)70 le 70.34 https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/d/000000180/varnish-http-requests?orgId=1&viewPanel=6
[18:18:22] <wikibugs>	 SRE, SRE-Access-Requests: Requesting access to google console for TomekSikora.Monsoon - https://phabricator.wikimedia.org/T304502 (KFrancis) Hi all, reconfirming as there is an MSA on file, we are covered.  Thanks!
[18:19:39] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T297189)', diff saved to https://phabricator.wikimedia.org/P24448 and previous config saved to /var/cache/conftool/dbconfig/20220411-181939-marostegui.json
[18:19:41] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 8:00:00 on db1181.eqiad.wmnet with reason: Maintenance
[18:19:42] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1181.eqiad.wmnet with reason: Maintenance
[18:19:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:19:45] <stashbot>	 T297189: Schema change for dropping ft_title and ft_namespace - https://phabricator.wikimedia.org/T297189
[18:19:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:19:48] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depooling db1181 (T297189)', diff saved to https://phabricator.wikimedia.org/P24449 and previous config saved to /var/cache/conftool/dbconfig/20220411-181947-marostegui.json
[18:19:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:19:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:21:20] <wikibugs>	 (PS1) Herron: kafka-mirror: startup after kafka.service, shutdown before kafka.service [puppet] - https://gerrit.wikimedia.org/r/779086 (https://phabricator.wikimedia.org/T305652)
[18:22:20] <wikibugs>	 (PS1) Jdlrobson: Enable sticky header edit button in beta cluster [mediawiki-config] - https://gerrit.wikimedia.org/r/779087 (https://phabricator.wikimedia.org/T304072)
[18:22:50] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1134 (T298565)', diff saved to https://phabricator.wikimedia.org/P24450 and previous config saved to /var/cache/conftool/dbconfig/20220411-182250-ladsgroup.json
[18:22:52] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1184.eqiad.wmnet with reason: Maintenance
[18:22:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:22:53] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1184.eqiad.wmnet with reason: Maintenance
[18:22:54] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[18:22:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:22:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:22:59] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1184 (T298565)', diff saved to https://phabricator.wikimedia.org/P24451 and previous config saved to /var/cache/conftool/dbconfig/20220411-182258-ladsgroup.json
[18:23:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:26:09] <wikibugs>	 (PS2) Zabe: Migrate $wmfConfigDir to $wmgConfigDir [mediawiki-config] - https://gerrit.wikimedia.org/r/778667 (https://phabricator.wikimedia.org/T45956)
[18:26:20] <mutante>	 !log gitlab-runners:  pausing runner-1011 in gitlab UI from accepting new jobs, then deleting instance in Horizon UI to replace it with another bullseye instance T297659
[18:26:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:26:23] <stashbot>	 T297659: upgrade gitlab-runners to bullseye - https://phabricator.wikimedia.org/T297659
[18:26:36] <wikibugs>	 (PS3) Zabe: Migrate $wmfConfigDir to $configDir [mediawiki-config] - https://gerrit.wikimedia.org/r/778667 (https://phabricator.wikimedia.org/T45956)
[18:26:50] <wikibugs>	 (CR) Zabe: Migrate $wmfConfigDir to $configDir (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/778667 (https://phabricator.wikimedia.org/T45956) (owner: Zabe)
[18:27:02] <wikibugs>	 (Abandoned) Herron: sre.kafka.reboot-workers: add --skip-mirrormaker option [cookbooks] - https://gerrit.wikimedia.org/r/778325 (https://phabricator.wikimedia.org/T305652) (owner: Herron)
[18:34:04] <wikibugs>	 (CR) Andrew Bogott: [C: +2] "This is excellent cleanup -- thank you!" [puppet] - https://gerrit.wikimedia.org/r/778551 (owner: Majavah)
[18:34:40] <wikibugs>	 (CR) Andrew Bogott: [C: +2] hieradata: switch to ldap-rw naming on ldap hosts [puppet] - https://gerrit.wikimedia.org/r/779071 (https://phabricator.wikimedia.org/T295150) (owner: Majavah)
[18:35:39] <wikibugs>	 SRE, SRE-Access-Requests: Requesting access to google console for TomekSikora.Monsoon - https://phabricator.wikimedia.org/T304502 (jcrespo) Thank you, waiting for Tomek Sikora to confirm access to resolve.
[18:40:03] <wikibugs>	 (PS1) Bartosz Dziewoński: Enable edit links in Vector sticky header on beta cluster [mediawiki-config] - https://gerrit.wikimedia.org/r/779091 (https://phabricator.wikimedia.org/T305878)
[18:40:20] <wikibugs>	 (PS4) Andrew Bogott: openstack: remove horizon access to puppetmaster [puppet] - https://gerrit.wikimedia.org/r/778551 (owner: Majavah)
[18:42:57] <MatmaRex>	 would anyone like to merge a beta cluster config change for me, or should i schedule it for a backport window? https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/779091
[18:43:55] <taavi>	 jouncebot: nowandnext
[18:43:55] <jouncebot>	 No deployments scheduled for the next 1 hour(s) and 16 minute(s)
[18:43:55] <jouncebot>	 In 1 hour(s) and 16 minute(s): UTC late backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220411T2000)
[18:44:00] <taavi>	 MatmaRex: looking
[18:44:17] <wikibugs>	 (CR) Andrew Bogott: [C: -1] "I see some references to PUPPETMASTER_API and PUPPET_TABLE_MODE in the horizon code, so that needs cleaning up before we can merge this. l" [puppet] - https://gerrit.wikimedia.org/r/778551 (owner: Majavah)
[18:44:48] <wikibugs>	 (CR) Majavah: [C: +2] Enable edit links in Vector sticky header on beta cluster [mediawiki-config] - https://gerrit.wikimedia.org/r/779091 (https://phabricator.wikimedia.org/T305878) (owner: Bartosz Dziewoński)
[18:45:26] <wikibugs>	 (Merged) jenkins-bot: Enable edit links in Vector sticky header on beta cluster [mediawiki-config] - https://gerrit.wikimedia.org/r/779091 (https://phabricator.wikimedia.org/T305878) (owner: Bartosz Dziewoński)
[18:45:48] <wikibugs>	 (CR) Andrew Bogott: [C: -1] "ok, I imagine that's in https://gerrit.wikimedia.org/r/c/openstack/horizon/wmf-puppet-dashboard/+/778616 which I haven't read yet" [puppet] - https://gerrit.wikimedia.org/r/778551 (owner: Majavah)
[18:46:14] <taavi>	 MatmaRex: pulled to deploy1002 but not syncing since it only touches a -labs.php file, it should make its way to beta within the next 30 mins or so
[18:46:30] <MatmaRex>	 thanks taavi!
[18:48:34] <wikibugs>	 (CR) Majavah: openstack: remove horizon access to puppetmaster (1 comment) [puppet] - https://gerrit.wikimedia.org/r/778551 (owner: Majavah)
[18:49:28] <wikibugs>	 SRE, Performance-Team, Traffic: Enable HTTP compression for arclamp trace logs - https://phabricator.wikimedia.org/T305783 (Krinkle) p:Triage→Medium a:dpifke
[18:52:48] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[18:52:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:52:51] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[18:52:52] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[18:52:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:52:56] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[18:53:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:53:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:56:17] <wikibugs>	 SRE, ops-eqiad, DC-Ops, cloud-services-team (Hardware): Q4:(Need By: TBD) rack/setup/install cloudweb100[34] - https://phabricator.wikimedia.org/T305414 (Andrew) FYI, @ayounsi, our mid-term goal is to eliminate the need for this hardware entirely.  - Wikitech needs to move to the mediawiki cluste...
[18:56:50] <wikibugs>	 (Abandoned) Jdlrobson: Enable sticky header edit button in beta cluster [mediawiki-config] - https://gerrit.wikimedia.org/r/779087 (https://phabricator.wikimedia.org/T304072) (owner: Jdlrobson)
[18:59:51] <wikibugs>	 SRE, LDAP-Access-Requests: Grant Access to <wmf group> for <Elena Lappen> - https://phabricator.wikimedia.org/T297652 (Zabe)
[19:00:02] <wikibugs>	 SRE, LDAP-Access-Requests: Grant Access to ldap/wmf for Bernard Wang - https://phabricator.wikimedia.org/T279014 (Zabe)
[19:00:07] <wikibugs>	 SRE, LDAP-Access-Requests: Grant Access to ldap/wmf for Clare Ming - https://phabricator.wikimedia.org/T278265 (Zabe)
[19:00:16] <wikibugs>	 SRE, LDAP-Access-Requests: LDAP access for Till Mletzko - https://phabricator.wikimedia.org/T267744 (Zabe)
[19:00:27] <logmsgbot>	 !log volans@cumin1001 START - Cookbook sre.hosts.downtime for 0:10:00 on sretest[1001-1002].eqiad.wmnet with reason: testing spicerack
[19:00:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:00:29] <logmsgbot>	 !log volans@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on sretest[1001-1002].eqiad.wmnet with reason: testing spicerack
[19:00:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:02:57] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1181 (T297189)', diff saved to https://phabricator.wikimedia.org/P24452 and previous config saved to /var/cache/conftool/dbconfig/20220411-190257-marostegui.json
[19:03:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:03:02] <stashbot>	 T297189: Schema change for dropping ft_title and ft_namespace - https://phabricator.wikimedia.org/T297189
[19:07:04] <icinga-wm>	 PROBLEM - Check systemd state on mirror1001 is CRITICAL: CRITICAL - degraded: The following units failed: update-tails-mirror.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[19:08:44] <logmsgbot>	 !log volans@cumin1001 START - Cookbook sre.hosts.downtime for 0:05:00 on cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: testing spicerack
[19:08:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:08:46] <logmsgbot>	 !log volans@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:05:00 on cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: testing spicerack
[19:08:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:09:20] <icinga-wm>	 PROBLEM - Host rdb2008 is DOWN: PING CRITICAL - Packet loss = 100%
[19:09:43] <mutante>	 !log gitlab - deleting runner-1011, creating new runner runner-1022 using bullseye
[19:09:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:11:56] <icinga-wm>	 PROBLEM - SSH on mw2258.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[19:12:46] <icinga-wm>	 ACKNOWLEDGEMENT - SSH on mw2258.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds daniel_zahn https://phabricator.wikimedia.org/T283582 https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[19:17:16] <mutante>	 !log runner-1022.gitlab-runners - rm -rf /var/lib/puppet/ssl ; run puppet; sign new request on gitlab-runners-puppetmaster-01.gitlab-runners (normal procedure needed when creating fresh instance in project with local puppetmaster) T297659
[19:17:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:17:20] <stashbot>	 T297659: upgrade gitlab-runners to bullseye - https://phabricator.wikimedia.org/T297659
[19:17:38] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1184 (T298565)', diff saved to https://phabricator.wikimedia.org/P24453 and previous config saved to /var/cache/conftool/dbconfig/20220411-191738-ladsgroup.json
[19:17:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:17:41] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[19:17:56] <icinga-wm>	 PROBLEM - SSH on wtp1035.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[19:18:02] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P24454 and previous config saved to /var/cache/conftool/dbconfig/20220411-191802-marostegui.json
[19:18:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:20:38] <icinga-wm>	 PROBLEM - BGP status on cr2-esams is CRITICAL: BGP CRITICAL - AS64605/IPv4: Active - Anycast, AS64605/IPv4: Active - Anycast, AS64605/IPv4: Active - Anycast, AS64605/IPv6: Active - Anycast, AS64605/IPv6: Active - Anycast, AS64605/IPv6: Active - Anycast, AS64605/IPv6: Active - Anycast, AS64605/IPv4: Active - Anycast, AS64605/IPv4: Active - Anycast, AS64605/IPv4: Active - Anycast https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_
[19:26:50] <wikibugs>	 (PS1) Majavah: hieradata: fix value type for devtools [puppet] - https://gerrit.wikimedia.org/r/779095
[19:28:08] <wikibugs>	 (PS1) Ottomata: Add gmodena to analytics-research-admins for airflow access [puppet] - https://gerrit.wikimedia.org/r/779096 (https://phabricator.wikimedia.org/T305880)
[19:29:39] <wikibugs>	 (CR) Ottomata: [C: +2] Add gmodena to analytics-research-admins for airflow access [puppet] - https://gerrit.wikimedia.org/r/779096 (https://phabricator.wikimedia.org/T305880) (owner: Ottomata)
[19:31:39] <wikibugs>	 (PS3) Majavah: P:openldap: remove 'labs' branding [puppet] - https://gerrit.wikimedia.org/r/776191 (https://phabricator.wikimedia.org/T295150)
[19:32:43] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P24456 and previous config saved to /var/cache/conftool/dbconfig/20220411-193243-ladsgroup.json
[19:32:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:32:55] <wikibugs>	 (CR) Majavah: [V: +1] "PCC SUCCESS (DIFF 4): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/34778/console" [puppet] - https://gerrit.wikimedia.org/r/776191 (https://phabricator.wikimedia.org/T295150) (owner: Majavah)
[19:33:07] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P24457 and previous config saved to /var/cache/conftool/dbconfig/20220411-193307-marostegui.json
[19:33:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:42:21] <wikibugs>	 (PS1) Dzahn: gitlab_runner: solve race condition to to make things work on first run [puppet] - https://gerrit.wikimedia.org/r/779099
[19:43:33] <wikibugs>	 (CR) jerkins-bot: [V: -1] gitlab_runner: solve race condition to to make things work on first run [puppet] - https://gerrit.wikimedia.org/r/779099 (owner: Dzahn)
[19:47:48] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P24458 and previous config saved to /var/cache/conftool/dbconfig/20220411-194748-ladsgroup.json
[19:47:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:48:13] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1181 (T297189)', diff saved to https://phabricator.wikimedia.org/P24459 and previous config saved to /var/cache/conftool/dbconfig/20220411-194812-marostegui.json
[19:48:14] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 8:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
[19:48:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:48:16] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
[19:48:16] <stashbot>	 T297189: Schema change for dropping ft_title and ft_namespace - https://phabricator.wikimedia.org/T297189
[19:48:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:48:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:48:42] <wikibugs>	 (PS1) Cathal Mooney: Modify homer automation for IPv6 RAs to allow for custom interfaces [homer/public] - https://gerrit.wikimedia.org/r/779100 (https://phabricator.wikimedia.org/T299758)
[19:49:39] <wikibugs>	 (CR) jerkins-bot: [V: -1] Modify homer automation for IPv6 RAs to allow for custom interfaces [homer/public] - https://gerrit.wikimedia.org/r/779100 (https://phabricator.wikimedia.org/T299758) (owner: Cathal Mooney)
[19:52:08] <wikibugs>	 (PS2) Cathal Mooney: Modify homer automation for IPv6 RAs to allow for custom interfaces [homer/public] - https://gerrit.wikimedia.org/r/779100 (https://phabricator.wikimedia.org/T299758)
[19:52:54] <wikibugs>	 (CR) jerkins-bot: [V: -1] Modify homer automation for IPv6 RAs to allow for custom interfaces [homer/public] - https://gerrit.wikimedia.org/r/779100 (https://phabricator.wikimedia.org/T299758) (owner: Cathal Mooney)
[19:53:40] <wikibugs>	 (PS2) Dzahn: gitlab_runner: solve race condition to to make things work on first run [puppet] - https://gerrit.wikimedia.org/r/779099
[19:55:12] <wikibugs>	 (CR) Bking: elastic: don't wait for green on first node (1 comment) [software/spicerack] - https://gerrit.wikimedia.org/r/776999 (https://phabricator.wikimedia.org/T304570) (owner: Bking)
[19:56:15] <wikibugs>	 (PS3) Cathal Mooney: Modify homer automation for IPv6 RAs to allow for custom interfaces [homer/public] - https://gerrit.wikimedia.org/r/779100 (https://phabricator.wikimedia.org/T299758)
[19:57:37] <wikibugs>	 (CR) Cathal Mooney: [C: +2] Modify homer automation for IPv6 RAs to allow for custom interfaces [homer/public] - https://gerrit.wikimedia.org/r/779100 (https://phabricator.wikimedia.org/T299758) (owner: Cathal Mooney)
[19:58:17] <wikibugs>	 (Merged) jenkins-bot: Modify homer automation for IPv6 RAs to allow for custom interfaces [homer/public] - https://gerrit.wikimedia.org/r/779100 (https://phabricator.wikimedia.org/T299758) (owner: Cathal Mooney)
[20:00:04] <jouncebot>	 RoanKattouw, Urbanecm, and cjming: That opportune time is upon us again. Time for a UTC late backport window deploy. Don't be afraid. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220411T2000).
[20:00:04] <jouncebot>	 zabe: A patch you scheduled for UTC late backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[20:00:28] <urbanecm>	 hey zabe 
[20:00:29] <urbanecm>	 around?
[20:00:51] <zabe>	 o/
[20:00:52] <zabe>	 hey
[20:01:41] <urbanecm>	 zabe: ad https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/779078, may i know why those two wikis (why not testwiki instead, for example)?
[20:02:02] <icinga-wm>	 RECOVERY - Check systemd state on mirror1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[20:02:03] <urbanecm>	 (and i also want to double check https://gerrit.wikimedia.org/r/c/mediawiki/extensions/CheckUser/+/773650 isn't needed for that patch to work)
[20:02:53] <wikibugs>	 (CR) Urbanecm: [C: +2] Migrate $wmfUsingKubernetes to $wmgUsingKubernetes [mediawiki-config] - https://gerrit.wikimedia.org/r/776255 (https://phabricator.wikimedia.org/T45956) (owner: Zabe)
[20:02:54] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1184 (T298565)', diff saved to https://phabricator.wikimedia.org/P24460 and previous config saved to /var/cache/conftool/dbconfig/20220411-200253-ladsgroup.json
[20:02:55] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1164.eqiad.wmnet with reason: Maintenance
[20:02:56] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1164.eqiad.wmnet with reason: Maintenance
[20:02:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:02:59] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[20:03:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:03:01] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1164 (T298565)', diff saved to https://phabricator.wikimedia.org/P24461 and previous config saved to /var/cache/conftool/dbconfig/20220411-200301-ladsgroup.json
[20:03:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:03:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:04:03] <wikibugs>	 (Merged) jenkins-bot: Migrate $wmfUsingKubernetes to $wmgUsingKubernetes [mediawiki-config] - https://gerrit.wikimedia.org/r/776255 (https://phabricator.wikimedia.org/T45956) (owner: Zabe)
[20:04:18] <zabe>	 urbanecm, very pragmatic. Those two are the only ones with the new column. Hope that's fine?
[20:04:30] <urbanecm>	 zabe: oh, i thought we added it to all wikis :)
[20:04:47] <urbanecm>	 sure, that's good enough
[20:05:28] <zabe>	 the dba task is open. The only reason these two have the column, is that they are new (created after the db change got merged).
[20:06:22] <wikibugs>	 (CR) Urbanecm: [C: +1] "verified those two wikis have the new column (while others don't)" [mediawiki-config] - https://gerrit.wikimedia.org/r/779078 (https://phabricator.wikimedia.org/T233004) (owner: Zabe)
[20:06:34] <urbanecm>	 zabe: ah, makes sense. so we're testing early, basically
[20:06:36] <mutante>	 awards IRC barnstar to Zabe for working on a ticket from 2013
[20:06:42] * urbanecm awards a second one
[20:07:08] <urbanecm>	 zabe: `Migrate $wmfUsingKubernetes to $wmgUsingKubernetes` is now at mwdebug1001 if you can take a look?
[20:07:24] <mutante>	 "why are all the variables named after the foundation" 
[20:07:57] <urbanecm>	 zabe: also, if you've some time after the deployments, I can have a look at T305014 too. fine if not, we can do it later.
[20:07:58] <stashbot>	 T305014: Run PopulateCentralId on metawiki - https://phabricator.wikimedia.org/T305014
[20:08:27] <zabe>	 :)
[20:08:36] <zabe>	 urbanecm, that would be cool, I have time
[20:08:46] <urbanecm>	 okay, let's do the deployments and then the script :)
[20:08:48] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[20:08:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:08:51] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[20:08:52] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[20:08:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:08:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:08:56] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[20:08:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:08:59] <urbanecm>	 let me know how the kubernetes patch is doing
[20:09:39] <zabe>	 urbanecm, lgtm
[20:09:44] <urbanecm>	 syncing
[20:11:11] <logmsgbot>	 !log urbanecm@deploy1002 Synchronized wmf-config/: d4ff32f: Migrate $wmfUsingKubernetes to $wmgUsingKubernetes (T45956) (duration: 00m 53s)
[20:11:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:11:15] <stashbot>	 T45956: Rename $wmf* to $wmg* in wmf-config - https://phabricator.wikimedia.org/T45956
[20:11:16] <urbanecm>	 and, it's live
[20:11:21] <wikibugs>	 (PS3) Urbanecm: Stop writing to $wmfUsingKubernetes [mediawiki-config] - https://gerrit.wikimedia.org/r/776256 (https://phabricator.wikimedia.org/T45956) (owner: Zabe)
[20:11:39] <wikibugs>	 (CR) Urbanecm: [C: +2] Stop writing to $wmfUsingKubernetes [mediawiki-config] - https://gerrit.wikimedia.org/r/776256 (https://phabricator.wikimedia.org/T45956) (owner: Zabe)
[20:12:21] <wikibugs>	 (Merged) jenkins-bot: Stop writing to $wmfUsingKubernetes [mediawiki-config] - https://gerrit.wikimedia.org/r/776256 (https://phabricator.wikimedia.org/T45956) (owner: Zabe)
[20:12:31] <wikibugs>	 (PS1) Cathal Mooney: Remove IPv6 RA config on cr2-drmrs fxp0.0 [homer/public] - https://gerrit.wikimedia.org/r/779101 (https://phabricator.wikimedia.org/T299758)
[20:12:53] <urbanecm>	 zabe: pulled to mwdebug1001, but i doubt it's testable
[20:13:31] <wikibugs>	 (CR) Cathal Mooney: [C: +2] Remove IPv6 RA config on cr2-drmrs fxp0.0 [homer/public] - https://gerrit.wikimedia.org/r/779101 (https://phabricator.wikimedia.org/T299758) (owner: Cathal Mooney)
[20:13:59] <zabe>	 urbanecm, yeah, I can confirm that it doesn't make the site explode, and I don't think I can test more than that either
[20:14:09] <urbanecm>	 in that case, syncing :)
[20:14:25] <wikibugs>	 (PS2) Urbanecm: Start writing to cuc_actor in guwwiki and shnwikivoyage [mediawiki-config] - https://gerrit.wikimedia.org/r/779078 (https://phabricator.wikimedia.org/T233004) (owner: Zabe)
[20:14:32] <wikibugs>	 (Merged) jenkins-bot: Remove IPv6 RA config on cr2-drmrs fxp0.0 [homer/public] - https://gerrit.wikimedia.org/r/779101 (https://phabricator.wikimedia.org/T299758) (owner: Cathal Mooney)
[20:15:31] <logmsgbot>	 !log urbanecm@deploy1002 Synchronized wmf-config/CommonSettings.php: 8455fa0: Stop writing to $wmfUsingKubernetes (T45956) (duration: 00m 51s)
[20:15:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:16:14] <wikibugs>	 (CR) Urbanecm: [C: +2] Start writing to cuc_actor in guwwiki and shnwikivoyage [mediawiki-config] - https://gerrit.wikimedia.org/r/779078 (https://phabricator.wikimedia.org/T233004) (owner: Zabe)
[20:17:07] <wikibugs>	 (Merged) jenkins-bot: Start writing to cuc_actor in guwwiki and shnwikivoyage [mediawiki-config] - https://gerrit.wikimedia.org/r/779078 (https://phabricator.wikimedia.org/T233004) (owner: Zabe)
[20:17:16] <wikibugs>	 (PS3) Dzahn: gitlab_runner: solve race condition to to make things work on first run [puppet] - https://gerrit.wikimedia.org/r/779099
[20:17:46] <urbanecm>	 zabe: pulled to mwdebug1001. i guess i'll need to help with this one, right?
[20:17:56] <wikibugs>	 (CR) Dzahn: [C: +2] "https://phabricator.wikimedia.org/P24455" [puppet] - https://gerrit.wikimedia.org/r/779099 (owner: Dzahn)
[20:18:26] <urbanecm>	 zabe: i think that one can be tested by making an edit, checking the table and checking the CU interface, is that right?
[20:19:04] <icinga-wm>	 RECOVERY - SSH on wtp1035.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[20:19:04] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[20:19:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:19:07] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[20:19:08] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[20:19:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:19:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:19:12] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[20:19:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:19:33] <zabe>	 urbanecm, I did https://guw.wikipedia.org/w/index.php?title=Zinzant%E1%BB%8D:Zabe/Test&oldid=19230 . There should be an entry in cu_changes for that I guess. Could you check whether the actor id is correct?
[20:20:10] <urbanecm>	       cuc_user: 3
[20:20:10] <urbanecm>	  cuc_user_text: Zabe
[20:20:10] <urbanecm>	      cuc_actor: 3
[20:20:13] <urbanecm>	 sounds about right
[20:22:17] <zabe>	 urbanecm, same for shnwikivoyage. I guess if that looks good we can sync it, I will keep an eye on logstash, to make sure no fatals occur?
[20:22:25] <urbanecm>	 sounds good to me
[20:22:49] <urbanecm>	       cuc_user: 2
[20:22:49] <urbanecm>	  cuc_user_text: Zabe
[20:22:49] <urbanecm>	      cuc_actor: 2
[20:22:52] <urbanecm>	 this is shnwikivoyage
[20:22:56] <urbanecm>	 also looks correct
[20:22:59] <urbanecm>	 zabe: so, sync?
[20:23:12] <zabe>	 I would say :)
[20:23:25] <urbanecm>	 doing :)
[20:24:14] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[20:24:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:24:17] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[20:24:18] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[20:24:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:24:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:24:22] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[20:24:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:24:40] <logmsgbot>	 !log urbanecm@deploy1002 Synchronized wmf-config/InitialiseSettings.php: 17c8c17: Start writing to cuc_actor in guwwiki and shnwikivoyage (T233004) (duration: 00m 51s)
[20:24:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:24:44] <stashbot>	 T233004: Update CheckUser for actor and comment table - https://phabricator.wikimedia.org/T233004
[20:24:56] <urbanecm>	 zabe: and, it's live
[20:25:03] <urbanecm>	 so i guess it's time for the script :))
[20:25:13] <zabe>	 yes :)
[20:26:15] <urbanecm>	 zabe: do you have a guess about how quick it should be?
[20:27:49] <zabe>	 tbh, not really. The largest id is ~3.000.000 and there are currently ~200.000 entries, so it shouldn't take /that/ long
[20:28:05] <zabe>	 maybe an hour?
[20:28:49] <urbanecm>	 okay
[20:28:52] <urbanecm>	 let's hope :)
[20:29:09] <urbanecm>	 let me run it for ~100 rows first
[20:29:29] <zabe>	 like half of the current entries should already have the new column populated
[20:29:42] <urbanecm>	 that's cool
[20:33:12] <wikibugs>	 (CR) Dzahn: "tested with new instance runner-1023. No more errors on first puppet run, it works right away with a single run now after applying profile" [puppet] - https://gerrit.wikimedia.org/r/779099 (owner: Dzahn)
[20:35:17] <wikibugs>	 (CR) CDanis: [C: +2] Exclude WMF cloud IPs from generic cloud limiter [puppet] - https://gerrit.wikimedia.org/r/779074 (owner: BBlack)
[20:35:45] <urbanecm>	 zabe: I'm trying to get the script to write something, and i'm failing to. i ran `mwscript extensions/GlobalBlocking/maintenance/PopulateCentralId.php --wiki=metawiki --batch-size=100` with a break after the first batch (to be able to verify). it said `Completed migration, updated 1 row(s), migration failed for 0 row(s).`
[20:35:59] <urbanecm>	 but...there are no blocks with gb_id <= 100
[20:36:29] <zabe>	 yeah, there are no blocks with gb_id <= 100, because expired global blocks get purged from the db
[20:36:42] <urbanecm>	 but why does it say it updated 1 row?
[20:38:21] <wikibugs>	 (PS2) Phedenskog: grafana: double-proxy for performance JSON meta data [puppet] - https://gerrit.wikimedia.org/r/778469 (https://phabricator.wikimedia.org/T304583)
[20:38:41] <zabe>	 ehm
[20:38:42] <urbanecm>	 zabe: i tried higher batch sizes too (enough to hit the lowest block ID of 4157), and while it still says it updated 1 row, it does run the update
[20:39:08] <wikibugs>	 (PS3) Phedenskog: grafana: double-proxy for performance JSON meta data [puppet] - https://gerrit.wikimedia.org/r/778469 (https://phabricator.wikimedia.org/T304583)
[20:39:34] <zabe>	 ok, actually I remember the update count to be wrong on beta as well, it always said 70, while there are only like ~10 global blocks in beta
[20:39:42] <urbanecm>	 interesting
[20:39:55] <wikibugs>	 (CR) Phedenskog: grafana: double-proxy for performance JSON meta data (1 comment) [puppet] - https://gerrit.wikimedia.org/r/778469 (https://phabricator.wikimedia.org/T304583) (owner: Phedenskog)
[20:40:26] <urbanecm>	 https://phabricator.wikimedia.org/P24462 is the updates i have
[20:40:35] <urbanecm>	 they look good to me
[20:41:05] <zabe>	 yep, value is correct
[20:42:23] <urbanecm>	 zabe: at least :). I'll run it in full then (unless you have any objections, of course).
[20:42:40] <zabe>	 no objections from me :)
[20:42:56] <urbanecm>	 running
[20:43:34] <urbanecm>	 i'm curious about the update count though (not that it's the most important part, it's really just curiosity)
[20:45:19] <urbanecm>	 !log [urbanecm@mwmaint1002 ~]$ mwscript extensions/GlobalBlocking/maintenance/PopulateCentralId.php --wiki=metawiki # START, T305014, running in a tmux under my account at mwmaint1002
[20:45:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:45:22] <stashbot>	 T305014: Run PopulateCentralId on metawiki - https://phabricator.wikimedia.org/T305014
[20:46:22] <Amir1>	 urbanecm: no no 
[20:46:29] <Amir1>	 the new column is not in production yet
[20:46:33] <Amir1>	 zabe: ^
[20:46:33] <urbanecm>	 Amir1: it is
[20:46:56] <urbanecm>	 Amir1: see https://phabricator.wikimedia.org/P24462
[20:47:10] <urbanecm>	 (script stopped)
[20:47:16] <Amir1>	 I'm talking about cuc_actor
[20:47:44] <zabe>	 Amir1, the wikis are new and got created after the db patch
[20:47:47] <Amir1>	 https://phabricator.wikimedia.org/T303603
[20:47:53] <urbanecm>	 i see it there as well https://www.irccloud.com/pastebin/Ft5v42is/
[20:48:04] <Amir1>	 zabe: aaah
[20:48:11] <Amir1>	 that's smart
[20:48:20] <Amir1>	 okay then
[20:48:22] <urbanecm>	 but if you prefer to have the column unused until it's everywhere, i can revert the patch, no problem
[20:48:24] <zabe>	 ;)
[20:48:33] * urbanecm was confused at first too
[20:48:45] <Amir1>	 urbanecm: nah, it's fine. As long as it doesn't break the wiki
[20:49:01] <urbanecm>	 okay :). we tested that, fortunately.
[20:49:10] <urbanecm>	 ok to restart the PopulateCentralId script too?
[20:49:13] <Amir1>	 ofc
[20:49:28] <urbanecm>	 thanks
[20:49:50] <zabe>	 I now 'abuse' those two wikis as testing environment, since there is no checkuser on beta ¯\_(ツ)_/¯
[20:50:38] <urbanecm>	 PopulateCentralId restarted
[20:58:45] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1164 (T298565)', diff saved to https://phabricator.wikimedia.org/P24463 and previous config saved to /var/cache/conftool/dbconfig/20220411-205844-ladsgroup.json
[20:58:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:58:50] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[20:59:16] <jinxer-wm>	 (BlazegraphJvmQuakeWarnGC) firing: (2) Blazegraph instance wdqs1004:9100 is entering a GC death spiral - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DBlazegraphJvmQuakeWarnGC
[21:00:04] <jouncebot>	 Reedy, sbassett, Maryum, and manfredi: My dear minions, it's time we take the moon! Just kidding. Time for Weekly Security deployment window deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220411T2100).
[21:02:38] <urbanecm>	 !log [urbanecm@mwmaint1002 ~]$ mwscript extensions/GlobalBlocking/maintenance/PopulateCentralId.php --wiki=metawiki # END, T305014
[21:02:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:02:41] <stashbot>	 T305014: Run PopulateCentralId on metawiki - https://phabricator.wikimedia.org/T305014
[21:02:44] <urbanecm>	 zabe: it finished. was quicker than i expected
[21:02:54] <zabe>	 ah
[21:02:55] <zabe>	 nice
[21:03:15] <urbanecm>	 zabe: do you need/want the script's output? or is the new DB content good enough?
[21:03:56] <zabe>	 maybe you could paste the output, but more importantly could you double check that entries with gb_by_central_id = null are left?
[21:04:08] <zabe>	 * that no entries are left
[21:04:22] <urbanecm>	 `select gb_id from globalblocks where gb_by_central_id is null order by gb_id limit 1` returns no rows
[21:04:40] <zabe>	 awesome, thanks for your help :)
[21:04:49] <urbanecm>	 no problem
[21:05:46] <urbanecm>	 zabe: linked output from https://phabricator.wikimedia.org/T305014#7846543 and resolved the task :). lmk if anything more's necessary here
[21:06:59] <wikibugs>	 (CR) Dzahn: [C: +1] "lgtm and all for it. especially like that command lines stay exactly the same. the only thing that keeps me from compiling and merging mys" [puppet] - https://gerrit.wikimedia.org/r/779040 (https://phabricator.wikimedia.org/T273673) (owner: Zabe)
[21:07:55] <zabe>	 yes :)
[21:08:11] <wikibugs>	 (CR) Dzahn: [C: +2] hieradata: fix value type for devtools [puppet] - https://gerrit.wikimedia.org/r/779095 (owner: Majavah)
[21:09:43] <wikibugs>	 (PS1) JHathaway: mx: reject email to legacy mailing list domains [puppet] - https://gerrit.wikimedia.org/r/779128 (https://phabricator.wikimedia.org/T280472)
[21:11:46] <wikibugs>	 (CR) JHathaway: [V: +1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/34783/console" [puppet] - https://gerrit.wikimedia.org/r/779128 (https://phabricator.wikimedia.org/T280472) (owner: JHathaway)
[21:12:18] <wikibugs>	 (CR) Andrew Bogott: [C: +1] hieradata: fix value type for devtools [puppet] - https://gerrit.wikimedia.org/r/779095 (owner: Majavah)
[21:13:50] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1164', diff saved to https://phabricator.wikimedia.org/P24466 and previous config saved to /var/cache/conftool/dbconfig/20220411-211350-ladsgroup.json
[21:13:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:14:20] <icinga-wm>	 RECOVERY - SSH on mw2258.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[21:15:38] <wikibugs>	 (CR) Bking: elastic: don't wait for green on first node (1 comment) [software/spicerack] - https://gerrit.wikimedia.org/r/776999 (https://phabricator.wikimedia.org/T304570) (owner: Bking)
[21:15:54] <wikibugs>	 (CR) JHathaway: [V: +2] "https://puppet-compiler.wmflabs.org/pcc-worker1001/34783/mx2001.wikimedia.org/index.html" [puppet] - https://gerrit.wikimedia.org/r/779128 (https://phabricator.wikimedia.org/T280472) (owner: JHathaway)
[21:17:20] <wikibugs>	 (CR) JHathaway: [V: +2 C: +2] mx: reject email to legacy mailing list domains [puppet] - https://gerrit.wikimedia.org/r/779128 (https://phabricator.wikimedia.org/T280472) (owner: JHathaway)
[21:20:31] <wikibugs>	 (CR) RLazarus: [C: +2] external_clouds_vendors: Support entity types besides "cloud" [puppet] - https://gerrit.wikimedia.org/r/777899 (https://phabricator.wikimedia.org/T305581) (owner: RLazarus)
[21:21:03] <rzl>	 jhathaway: okay to merge yours?
[21:21:09] <jhathaway>	 yup, thanks
[21:21:25] <rzl>	 done
[21:21:30] <jhathaway>	 thanks
[21:28:55] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1164', diff saved to https://phabricator.wikimedia.org/P24467 and previous config saved to /var/cache/conftool/dbconfig/20220411-212855-ladsgroup.json
[21:28:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:33:31] <wikibugs>	 (CR) Dzahn: "compiled, ready to merge this but I would like someone around to confirm everything is working as expected after this major version change" [puppet] - https://gerrit.wikimedia.org/r/768774 (https://phabricator.wikimedia.org/T300682) (owner: Dduvall)
[21:42:29] <wikibugs>	 (CR) Andrew Bogott: [C: +2] openstack: remove horizon access to puppetmaster (1 comment) [puppet] - https://gerrit.wikimedia.org/r/778551 (owner: Majavah)
[21:44:00] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1164 (T298565)', diff saved to https://phabricator.wikimedia.org/P24468 and previous config saved to /var/cache/conftool/dbconfig/20220411-214400-ladsgroup.json
[21:44:02] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1169.eqiad.wmnet with reason: Maintenance
[21:44:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:44:03] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1169.eqiad.wmnet with reason: Maintenance
[21:44:04] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[21:44:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:44:08] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1169 (T298565)', diff saved to https://phabricator.wikimedia.org/P24469 and previous config saved to /var/cache/conftool/dbconfig/20220411-214408-ladsgroup.json
[21:44:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:44:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:52:02] <icinga-wm>	 PROBLEM - Host mw1334 is DOWN: PING CRITICAL - Packet loss = 100%
[21:54:22] <icinga-wm>	 RECOVERY - Host mw1334 is UP: PING OK - Packet loss = 0%, RTA = 0.24 ms
[22:01:54] <wikibugs>	 SRE, ConfirmEdit (CAPTCHA extension), MediaWiki-extensions-CentralAuth, Platform Engineering, and 6 others: Allow Stewards to enable 'emergency CAPTCHAs' for anonymous IP edits - https://phabricator.wikimedia.org/T303433 (Zabe)
[22:04:08] <wikibugs>	 (PS14) Bking: elastic: don't wait for green on first node [software/spicerack] - https://gerrit.wikimedia.org/r/776999 (https://phabricator.wikimedia.org/T304570)
[22:29:03] <wikibugs>	 SRE, SRE-OnFire, observability: Internationalization (i18n) & localization (l10n) of www.wikimediastatus.net - https://phabricator.wikimedia.org/T305896 (CDanis)
[22:35:30] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1169 (T298565)', diff saved to https://phabricator.wikimedia.org/P24470 and previous config saved to /var/cache/conftool/dbconfig/20220411-223530-ladsgroup.json
[22:35:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:35:37] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[22:50:35] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P24471 and previous config saved to /var/cache/conftool/dbconfig/20220411-225035-ladsgroup.json
[22:50:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:05:40] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P24472 and previous config saved to /var/cache/conftool/dbconfig/20220411-230540-ladsgroup.json
[23:05:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:09:01] <wikibugs>	 SRE, ops-codfw: Dell switches testing - https://phabricator.wikimedia.org/T290133 (Papaul) p:Triage→Medium
[23:12:23] <wikibugs>	 SRE, ops-codfw, Discovery: elastic2033 without bootable devices available (repeat of T281621) - https://phabricator.wikimedia.org/T305646 (Papaul) p:Triage→Medium a:Papaul
[23:20:46] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1169 (T298565)', diff saved to https://phabricator.wikimedia.org/P24473 and previous config saved to /var/cache/conftool/dbconfig/20220411-232045-ladsgroup.json
[23:20:47] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1106.eqiad.wmnet with reason: Maintenance
[23:20:49] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1106.eqiad.wmnet with reason: Maintenance
[23:20:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:20:50] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
[23:20:52] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[23:20:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:20:54] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
[23:20:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:20:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:21:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:21:03] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1106 (T298565)', diff saved to https://phabricator.wikimedia.org/P24474 and previous config saved to /var/cache/conftool/dbconfig/20220411-232102-ladsgroup.json
[23:21:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:33:20] <icinga-wm>	 PROBLEM - SSH on labweb1002.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[23:38:34] <wikibugs>	 (CR) Krinkle: [C: +1] "Good to go." [mediawiki-config] - https://gerrit.wikimedia.org/r/778667 (https://phabricator.wikimedia.org/T45956) (owner: Zabe)
[23:44:18] <icinga-wm>	 RECOVERY - Host elastic2033 is UP: PING OK - Packet loss = 0%, RTA = 33.87 ms
[23:47:56] <wikibugs>	 SRE, ops-codfw, Discovery: elastic2033 without bootable devices available (repeat of T281621) - https://phabricator.wikimedia.org/T305646 (Papaul) Open→Resolved Boot was set to UEFI for some reason. I changed it back to Legacy BIOS, system is back online
[23:49:02] <wikibugs>	 SRE, ops-codfw, Cassandra, DC-Ops: Q4:(Need By: TBD) rack/setup/install aqs2001-aqs2012 - https://phabricator.wikimedia.org/T305568 (Eevans)
[23:49:37] <wikibugs>	 SRE, ops-eqiad, Cassandra, DC-Ops: Q4:(Need By: TBD) rack/setup/install aqs1016-aqs1021 - https://phabricator.wikimedia.org/T305570 (Eevans)