[00:02:05] <icinga-wm>	 PROBLEM - Maps tiles generation on alert1001 is CRITICAL: CRITICAL: 100.00% of data under the critical threshold [5.0] https://wikitech.wikimedia.org/wiki/Maps/Runbook https://grafana.wikimedia.org/d/000000305/maps-performances?orgId=1&viewPanel=8
[00:02:58] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P24539 and previous config saved to /var/cache/conftool/dbconfig/20220413-000258-ladsgroup.json
[00:03:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:12:01] <wikibugs>	 SRE, ops-codfw, DC-Ops, cloud-services-team (Hardware): Q3:(Need By: TBD) rack/setup/install 7 wmcs hosts - https://phabricator.wikimedia.org/T304881 (Papaul) @nskaggs @Andrew @aborrero @dcaro the goal for codfw is to consolidate all cloudx-dev nodes in a single rack see (T305469) and the racking...
[00:18:04] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1134 (T298565)', diff saved to https://phabricator.wikimedia.org/P24540 and previous config saved to /var/cache/conftool/dbconfig/20220413-001803-ladsgroup.json
[00:18:05] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1135.eqiad.wmnet with reason: Maintenance
[00:18:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:18:07] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1135.eqiad.wmnet with reason: Maintenance
[00:18:08] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[00:18:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:18:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:18:12] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1135 (T298565)', diff saved to https://phabricator.wikimedia.org/P24541 and previous config saved to /var/cache/conftool/dbconfig/20220413-001811-ladsgroup.json
[00:18:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:21:11] <icinga-wm>	 PROBLEM - SSH on aqs1008.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[00:25:41] <icinga-wm>	 PROBLEM - k8s API server requests latencies on ml-serve-ctrl2001 is CRITICAL: instance=10.192.32.33 verb=POST https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/d/000000435/kubernetes-api?orgId=1&viewPanel=27
[00:27:57] <icinga-wm>	 RECOVERY - k8s API server requests latencies on ml-serve-ctrl2001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/d/000000435/kubernetes-api?orgId=1&viewPanel=27
[00:44:36] <logmsgbot>	 !log ryankemper@cumin1001 START - Cookbook sre.hosts.reimage for host elastic2033.codfw.wmnet with OS stretch
[00:44:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:44:41] <wikibugs>	 SRE, ops-codfw, Discovery: elastic2033 without bootable devices available (repeat of T281621) - https://phabricator.wikimedia.org/T305646 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by ryankemper@cumin1001 for host elastic2033.codfw.wmnet with OS stretch
[00:59:44] <logmsgbot>	 !log ryankemper@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2033.codfw.wmnet with reason: host reimage
[00:59:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:03:11] <logmsgbot>	 !log ryankemper@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2033.codfw.wmnet with reason: host reimage
[01:03:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:12:04] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1135 (T298565)', diff saved to https://phabricator.wikimedia.org/P24542 and previous config saved to /var/cache/conftool/dbconfig/20220413-011204-ladsgroup.json
[01:12:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:12:08] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[01:23:29] <icinga-wm>	 RECOVERY - SSH on aqs1008.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[01:23:40] <logmsgbot>	 !log ryankemper@cumin1001 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2033.codfw.wmnet with OS stretch
[01:23:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:26:23] <wikibugs>	 SRE, ops-codfw, Discovery: elastic2033 without bootable devices available (repeat of T281621) - https://phabricator.wikimedia.org/T305646 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by ryankemper@cumin1001 for host elastic2033.codfw.wmnet with OS stretch completed: - elastic2033...
[01:27:09] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P24544 and previous config saved to /var/cache/conftool/dbconfig/20220413-012709-ladsgroup.json
[01:27:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:38:45] <jinxer-wm>	 (JobUnavailable) firing: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[01:42:14] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P24545 and previous config saved to /var/cache/conftool/dbconfig/20220413-014214-ladsgroup.json
[01:42:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:43:45] <jinxer-wm>	 (JobUnavailable) resolved: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[01:57:19] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1135 (T298565)', diff saved to https://phabricator.wikimedia.org/P24546 and previous config saved to /var/cache/conftool/dbconfig/20220413-015719-ladsgroup.json
[01:57:21] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1119.eqiad.wmnet with reason: Maintenance
[01:57:22] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1119.eqiad.wmnet with reason: Maintenance
[01:57:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:57:24] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[01:57:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:57:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:57:27] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1119 (T298565)', diff saved to https://phabricator.wikimedia.org/P24547 and previous config saved to /var/cache/conftool/dbconfig/20220413-015727-ladsgroup.json
[01:57:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:32:54] <jinxer-wm>	 (NodeTextfileStale) firing: (3) Stale textfile for elastic1075:9100 - https://wikitech.wikimedia.org/wiki/Prometheus#Stale_file_for_node-exporter_textfile - https://grafana.wikimedia.org/d/knkl4dCWz/node-exporter-textfile - https://alerts.wikimedia.org/?q=alertname%3DNodeTextfileStale
[02:36:41] <icinga-wm>	 PROBLEM - SSH on wtp1048.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[02:53:51] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1119 (T298565)', diff saved to https://phabricator.wikimedia.org/P24548 and previous config saved to /var/cache/conftool/dbconfig/20220413-025350-ladsgroup.json
[02:53:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:53:55] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[03:01:31] <icinga-wm>	 PROBLEM - Persistent high iowait on labstore1006 is CRITICAL: 66.52 ge 10 https://wikitech.wikimedia.org/wiki/Portal:Data_Services/Admin/Labstore https://grafana.wikimedia.org/d/000000568/labstore1004-1005-1006-1007
[03:08:56] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P24549 and previous config saved to /var/cache/conftool/dbconfig/20220413-030855-ladsgroup.json
[03:08:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:12:53] <icinga-wm>	 RECOVERY - Persistent high iowait on labstore1006 is OK: (C)10 ge (W)5 ge 1.459 https://wikitech.wikimedia.org/wiki/Portal:Data_Services/Admin/Labstore https://grafana.wikimedia.org/d/000000568/labstore1004-1005-1006-1007
[03:19:01] <jinxer-wm>	 (BlazegraphJvmQuakeWarnGC) firing: Blazegraph instance wdqs1012:9100 is entering a GC death spiral - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DBlazegraphJvmQuakeWarnGC
[03:20:37] <icinga-wm>	 PROBLEM - WDQS SPARQL on wdqs1012 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Runbook
[03:24:01] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P24550 and previous config saved to /var/cache/conftool/dbconfig/20220413-032400-ladsgroup.json
[03:24:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:27:19] <icinga-wm>	 RECOVERY - WDQS SPARQL on wdqs1012 is OK: HTTP OK: HTTP/1.1 200 OK - 688 bytes in 1.059 second response time https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Runbook
[03:34:21] <icinga-wm>	 PROBLEM - WDQS SPARQL on wdqs1012 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Runbook
[03:37:53] <icinga-wm>	 RECOVERY - SSH on wtp1048.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[03:38:43] <icinga-wm>	 RECOVERY - WDQS SPARQL on wdqs1012 is OK: HTTP OK: HTTP/1.1 200 OK - 689 bytes in 1.062 second response time https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Runbook
[03:39:06] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1119 (T298565)', diff saved to https://phabricator.wikimedia.org/P24551 and previous config saved to /var/cache/conftool/dbconfig/20220413-033906-ladsgroup.json
[03:39:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:39:10] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[03:39:12] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db2103.codfw.wmnet with reason: Maintenance
[03:39:13] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2103.codfw.wmnet with reason: Maintenance
[03:39:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:39:15] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 12:00:00 on 14 hosts with reason: Maintenance
[03:39:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:39:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:39:24] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 14 hosts with reason: Maintenance
[03:39:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:10:27] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase-dev1005 is CRITICAL: /en.wikipedia.org/v1/feed/announcements (Retrieve announcements) is CRITICAL: Test Retrieve announcements returned the unexpected status 503 (expecting: 200) https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[04:12:45] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase-dev1005 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[04:27:17] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
[04:27:18] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
[04:27:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:27:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:27:23] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1099:3311 (T298565)', diff saved to https://phabricator.wikimedia.org/P24552 and previous config saved to /var/cache/conftool/dbconfig/20220413-042723-ladsgroup.json
[04:27:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:27:27] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[04:48:05] <wikibugs>	 (PS1) STran: Enable IP Info instrumentation on testwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/779579 (https://phabricator.wikimedia.org/T304438)
[04:48:58] <wikibugs>	 (PS2) STran: Enable IP Info instrumentation on testwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/779579 (https://phabricator.wikimedia.org/T304438)
[04:56:47] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db2085:3311', diff saved to https://phabricator.wikimedia.org/P24553 and previous config saved to /var/cache/conftool/dbconfig/20220413-045646-root.json
[04:56:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:01:18] <wikibugs>	 (PS1) Marostegui: Revert "db1138: Disable notifications" [puppet] - https://gerrit.wikimedia.org/r/779114
[05:02:53] <icinga-wm>	 PROBLEM - Check systemd state on db2137 is CRITICAL: CRITICAL - degraded: The following units failed: mariadb.service,prometheus-mysqld-exporter.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[05:05:11] <wikibugs>	 SRE, Wikimedia-Mailing-lists: hyperkitty didn't import all wikitech-l messages - https://phabricator.wikimedia.org/T281070 (Legoktm) Open→Resolved a:Legoktm Unfortunately the very old archives (pre-2004) are not in a great shape just because of old Mailman bugs or some other unknown reasons....
[05:09:49] <icinga-wm>	 RECOVERY - Check systemd state on db2137 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[05:10:49] <wikibugs>	 (CR) Marostegui: [C: +2] Revert "db1138: Disable notifications" [puppet] - https://gerrit.wikimedia.org/r/779114 (owner: Marostegui)
[05:12:38] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1138 (re)pooling @ 1%: After schema changes', diff saved to https://phabricator.wikimedia.org/P24554 and previous config saved to /var/cache/conftool/dbconfig/20220413-051238-root.json
[05:12:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:23:43] <icinga-wm>	 PROBLEM - Router interfaces on cr2-eqiad is CRITICAL: CRITICAL: host 208.80.154.197, interfaces up: 235, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[05:24:31] <icinga-wm>	 PROBLEM - Router interfaces on cr2-eqord is CRITICAL: CRITICAL: host 208.80.154.198, interfaces up: 44, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[05:25:31] <wikibugs>	 SRE, LDAP-Access-Requests: Grant Access to ldap/wmf for Nathillard - https://phabricator.wikimedia.org/T305978 (jcrespo) @Dzahn I responded before I had the chance to read your comments. I didn't see explicit concerns about me proceeding (just hinting that in some cases they may not be needed).  Given th...
[05:32:48] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T298565)', diff saved to https://phabricator.wikimedia.org/P24556 and previous config saved to /var/cache/conftool/dbconfig/20220413-053248-ladsgroup.json
[05:32:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:32:53] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[05:35:13] <icinga-wm>	 RECOVERY - Router interfaces on cr2-eqiad is OK: OK: host 208.80.154.197, interfaces up: 236, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[05:35:26] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1181 for reboot T306001', diff saved to https://phabricator.wikimedia.org/P24557 and previous config saved to /var/cache/conftool/dbconfig/20220413-053526-root.json
[05:35:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:35:30] <stashbot>	 T306001: Switchover s7 master (db1136 -> db1181) - https://phabricator.wikimedia.org/T306001
[05:35:55] <icinga-wm>	 PROBLEM - SSH on aqs1009.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[05:36:01] <icinga-wm>	 RECOVERY - Router interfaces on cr2-eqord is OK: OK: host 208.80.154.198, interfaces up: 45, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[05:44:22] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1138 (re)pooling @ 1%: After schema changes', diff saved to https://phabricator.wikimedia.org/P24558 and previous config saved to /var/cache/conftool/dbconfig/20220413-054422-root.json
[05:44:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:44:43] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1181 (re)pooling @ 1%: After reboot', diff saved to https://phabricator.wikimedia.org/P24559 and previous config saved to /var/cache/conftool/dbconfig/20220413-054443-root.json
[05:44:43] <wikibugs>	 (PS1) Jcrespo: admin: Add Nat to the list of privileged ldap users [puppet] - https://gerrit.wikimedia.org/r/779749 (https://phabricator.wikimedia.org/T305978)
[05:44:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:46:12] <wikibugs>	 SRE, LDAP-Access-Requests, Patch-For-Review: Grant Access to ldap/wmf for Nathillard - https://phabricator.wikimedia.org/T305978 (jcrespo) p:Triage→High
[05:47:39] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repool db2130 db2088:3311', diff saved to https://phabricator.wikimedia.org/P24560 and previous config saved to /var/cache/conftool/dbconfig/20220413-054739-root.json
[05:47:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:47:53] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P24561 and previous config saved to /var/cache/conftool/dbconfig/20220413-054753-ladsgroup.json
[05:47:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:59:26] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1138 (re)pooling @ 5%: After schema changes', diff saved to https://phabricator.wikimedia.org/P24562 and previous config saved to /var/cache/conftool/dbconfig/20220413-055925-root.json
[05:59:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:59:47] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1181 (re)pooling @ 5%: After reboot', diff saved to https://phabricator.wikimedia.org/P24563 and previous config saved to /var/cache/conftool/dbconfig/20220413-055947-root.json
[05:59:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:01:57] <wikibugs>	 SRE, SRE-Access-Requests, Infrastructure-Foundations, serviceops: allow certain users to disable puppet on mwdebug hosts - https://phabricator.wikimedia.org/T305979 (jcrespo) This seems to me like a reasonable request, although as you point out, the details of how to exactly implement it to make...
[06:02:59] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P24564 and previous config saved to /var/cache/conftool/dbconfig/20220413-060258-ladsgroup.json
[06:03:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:14:01] <jinxer-wm>	 (BlazegraphJvmQuakeWarnGC) firing: (2) Blazegraph instance wdqs1012:9100 is entering a GC death spiral - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DBlazegraphJvmQuakeWarnGC
[06:14:30] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1138 (re)pooling @ 10%: After schema changes', diff saved to https://phabricator.wikimedia.org/P24565 and previous config saved to /var/cache/conftool/dbconfig/20220413-061429-root.json
[06:14:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:14:51] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1181 (re)pooling @ 10%: After reboot', diff saved to https://phabricator.wikimedia.org/P24566 and previous config saved to /var/cache/conftool/dbconfig/20220413-061451-root.json
[06:14:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:18:04] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T298565)', diff saved to https://phabricator.wikimedia.org/P24567 and previous config saved to /var/cache/conftool/dbconfig/20220413-061803-ladsgroup.json
[06:18:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:18:08] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[06:18:09] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
[06:18:11] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
[06:18:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:18:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:18:16] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1099:3311 (T298565)', diff saved to https://phabricator.wikimedia.org/P24568 and previous config saved to /var/cache/conftool/dbconfig/20220413-061815-ladsgroup.json
[06:18:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:20:49] <wikibugs>	 (PS1) Marostegui: db2072: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/779750
[06:21:32] <wikibugs>	 (CR) Marostegui: [C: +2] db2072: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/779750 (owner: Marostegui)
[06:29:34] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1138 (re)pooling @ 25%: After schema changes', diff saved to https://phabricator.wikimedia.org/P24569 and previous config saved to /var/cache/conftool/dbconfig/20220413-062933-root.json
[06:29:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:29:55] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1181 (re)pooling @ 25%: After reboot', diff saved to https://phabricator.wikimedia.org/P24570 and previous config saved to /var/cache/conftool/dbconfig/20220413-062955-root.json
[06:29:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:32:54] <jinxer-wm>	 (NodeTextfileStale) firing: (3) Stale textfile for elastic1075:9100 - https://wikitech.wikimedia.org/wiki/Prometheus#Stale_file_for_node-exporter_textfile - https://grafana.wikimedia.org/d/knkl4dCWz/node-exporter-textfile - https://alerts.wikimedia.org/?q=alertname%3DNodeTextfileStale
[06:34:37] <icinga-wm>	 PROBLEM - SSH on labweb1002.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[06:44:38] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1138 (re)pooling @ 50%: After schema changes', diff saved to https://phabricator.wikimedia.org/P24571 and previous config saved to /var/cache/conftool/dbconfig/20220413-064437-root.json
[06:44:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:44:59] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1181 (re)pooling @ 50%: After reboot', diff saved to https://phabricator.wikimedia.org/P24572 and previous config saved to /var/cache/conftool/dbconfig/20220413-064459-root.json
[06:45:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:59:42] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1138 (re)pooling @ 75%: After schema changes', diff saved to https://phabricator.wikimedia.org/P24573 and previous config saved to /var/cache/conftool/dbconfig/20220413-065941-root.json
[06:59:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:00:03] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1181 (re)pooling @ 75%: After reboot', diff saved to https://phabricator.wikimedia.org/P24574 and previous config saved to /var/cache/conftool/dbconfig/20220413-070002-root.json
[07:00:04] <jouncebot>	 Amir1, awight, Urbanecm, and taavi: #bothumor Q:Why did functions stop calling each other? A:They had arguments. Rise for UTC morning backport window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220413T0700).
[07:00:04] <jouncebot>	 kart_: A patch you scheduled for UTC morning backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[07:00:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:00:23] * kart_ is here.
[07:00:24] <taavi>	 o/
[07:00:28] <taavi>	 kart_: do you want to self deploy?
[07:00:43] <kart_>	 taavi: yeah. will self-deploy..
[07:00:43] <icinga-wm>	 PROBLEM - IPv6 ping to codfw on ripe-atlas-codfw IPv6 is CRITICAL: CRITICAL - failed 120 probes of 677 (alerts on 90) - https://atlas.ripe.net/measurements/32390541/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[07:01:02] <wikibugs>	 (PS2) KartikMistry: Add SectionTranslation entry points as campaigns [mediawiki-config] - https://gerrit.wikimedia.org/r/778381 (https://phabricator.wikimedia.org/T298029)
[07:02:30] <wikibugs>	 (CR) KartikMistry: [C: +2] Add SectionTranslation entry points as campaigns [mediawiki-config] - https://gerrit.wikimedia.org/r/778381 (https://phabricator.wikimedia.org/T298029) (owner: KartikMistry)
[07:02:43] <icinga-wm>	 PROBLEM - IPv4 ping to codfw on ripe-atlas-codfw is CRITICAL: CRITICAL - failed 46 probes of 760 (alerts on 35) - https://atlas.ripe.net/measurements/32390538/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[07:03:16] <wikibugs>	 (Merged) jenkins-bot: Add SectionTranslation entry points as campaigns [mediawiki-config] - https://gerrit.wikimedia.org/r/778381 (https://phabricator.wikimedia.org/T298029) (owner: KartikMistry)
[07:07:17] <kart_>	 Deploying..
[07:08:03] <logmsgbot>	 !log kartik@deploy1002 Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:778381|Add SectionTranslation entry points as campaigns (T298029)]] (duration: 01m 03s)
[07:08:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:08:07] <stashbot>	 T298029: Enable Content Translation beta feature for a user when accessing a Section Translation entry point on mobile - https://phabricator.wikimedia.org/T298029
[07:10:29] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[07:10:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:10:32] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[07:10:33] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[07:10:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:10:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:10:37] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[07:10:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:11:47] <kart_>	 taavi: done.
[07:14:01] <icinga-wm>	 RECOVERY - IPv4 ping to codfw on ripe-atlas-codfw is OK: OK - failed 31 probes of 760 (alerts on 35) - https://atlas.ripe.net/measurements/32390538/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[07:14:45] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1138 (re)pooling @ 100%: After schema changes', diff saved to https://phabricator.wikimedia.org/P24575 and previous config saved to /var/cache/conftool/dbconfig/20220413-071445-root.json
[07:14:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:15:07] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1181 (re)pooling @ 100%: After reboot', diff saved to https://phabricator.wikimedia.org/P24576 and previous config saved to /var/cache/conftool/dbconfig/20220413-071506-root.json
[07:15:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:15:25] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T298565)', diff saved to https://phabricator.wikimedia.org/P24577 and previous config saved to /var/cache/conftool/dbconfig/20220413-071524-ladsgroup.json
[07:15:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:15:28] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[07:17:51] <icinga-wm>	 RECOVERY - IPv6 ping to codfw on ripe-atlas-codfw IPv6 is OK: OK - failed 61 probes of 677 (alerts on 90) - https://atlas.ripe.net/measurements/32390541/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[07:30:30] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P24578 and previous config saved to /var/cache/conftool/dbconfig/20220413-073029-ladsgroup.json
[07:30:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:31:19] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repool db2072 db2085:3311', diff saved to https://phabricator.wikimedia.org/P24579 and previous config saved to /var/cache/conftool/dbconfig/20220413-073119-root.json
[07:31:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:35:37] <icinga-wm>	 RECOVERY - SSH on labweb1002.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[07:45:35] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P24580 and previous config saved to /var/cache/conftool/dbconfig/20220413-074534-ladsgroup.json
[07:45:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:00:04] <jouncebot>	 dancy and jnuche: Dear deployers, time to do the MediaWiki train - Utc-7+Utc-0 Version (secondary timeslot) deploy. Don't look at me like that. You signed up for it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220413T0800).
[08:00:40] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T298565)', diff saved to https://phabricator.wikimedia.org/P24581 and previous config saved to /var/cache/conftool/dbconfig/20220413-080040-ladsgroup.json
[08:00:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:00:45] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db2103.codfw.wmnet with reason: Maintenance
[08:00:46] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[08:00:47] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2103.codfw.wmnet with reason: Maintenance
[08:00:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:00:48] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 12:00:00 on 14 hosts with reason: Maintenance
[08:00:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:00:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:00:58] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 14 hosts with reason: Maintenance
[08:01:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:07:22] <wikibugs>	 SRE, Infrastructure-Foundations, netops: Cannot verify NTP status asw1-b12-drmrs - https://phabricator.wikimedia.org/T305840 (ayounsi) I had a quick look as well, but didn't make any progress.  I tried to bounce NTP with: `lang=diff [edit system] +   processes { +       ntp disable; +   } !    inacti...
[08:32:03] <wikibugs>	 SRE, LDAP-Access-Requests, Patch-For-Review: Grant Access to ldap/wmf for Nathillard - https://phabricator.wikimedia.org/T305978 (dr0ptp4kt) Thanks all. This is all good and well. Thank you for the support and discussion! The access to some of the things around observability and metrics is part of th...
[08:39:31] <icinga-wm>	 RECOVERY - SSH on aqs1009.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[08:41:02] <logmsgbot>	 !log ayounsi@cumin2002 START - Cookbook sre.network.cf
[08:41:02] <logmsgbot>	 !log ayounsi@cumin2002 END (PASS) - Cookbook sre.network.cf (exit_code=0)
[08:41:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:41:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:41:25] <logmsgbot>	 !log ayounsi@cumin2002 START - Cookbook sre.network.cf
[08:41:27] <logmsgbot>	 !log ayounsi@cumin2002 END (PASS) - Cookbook sre.network.cf (exit_code=0)
[08:41:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:41:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:41:56] <jayme>	 !log imported scap 4.6.1 to stretch-/buster-/bullseye-wikimedia - T305949
[08:41:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:41:59] <stashbot>	 T305949: Deploy Scap version 4.6.1 - https://phabricator.wikimedia.org/T305949
[08:44:01] <jinxer-wm>	 (BlazegraphJvmQuakeWarnGC) firing: (2) Blazegraph instance wdqs1012:9100 is entering a GC death spiral - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DBlazegraphJvmQuakeWarnGC
[08:44:07] <logmsgbot>	 !log jayme@deploy1002 Started deploy [restbase/deploy@627f7d7] (dev-cluster): (no justification provided)
[08:44:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:46:48] <logmsgbot>	 !log jayme@deploy1002 Finished deploy [restbase/deploy@627f7d7] (dev-cluster): (no justification provided) (duration: 02m 41s)
[08:46:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:47:43] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1119.eqiad.wmnet with reason: Maintenance
[08:47:45] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1119.eqiad.wmnet with reason: Maintenance
[08:47:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:47:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:47:50] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1119 (T298565)', diff saved to https://phabricator.wikimedia.org/P24582 and previous config saved to /var/cache/conftool/dbconfig/20220413-084749-ladsgroup.json
[08:47:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:47:54] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[08:49:01] <jinxer-wm>	 (BlazegraphJvmQuakeWarnGC) resolved: (2) Blazegraph instance wdqs1012:9100 is entering a GC death spiral - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DBlazegraphJvmQuakeWarnGC
[08:49:54] <wikibugs>	 (CR) Ayounsi: "Had a chat on IRC, that RA for fxp0 seems like a leftover from the factory config or a miss-config when setting up the routers." [homer/public] - https://gerrit.wikimedia.org/r/779100 (https://phabricator.wikimedia.org/T299758) (owner: Cathal Mooney)
[08:50:05] <wikibugs>	 (PS4) JMeybohm: Add all members of the ops group to the deployment group [puppet] - https://gerrit.wikimedia.org/r/779047 (https://phabricator.wikimedia.org/T305729)
[08:50:31] <wikibugs>	 (PS4) JMeybohm: Switch default group for Kubernetes credentials files to deployment [puppet] - https://gerrit.wikimedia.org/r/779048 (https://phabricator.wikimedia.org/T305729)
[08:52:58] <wikibugs>	 (PS1) DCausse: team-search-platform: remove BlazegraphJvmQuakeWarnGC [alerts] - https://gerrit.wikimedia.org/r/779831 (https://phabricator.wikimedia.org/T293862)
[09:12:29] <logmsgbot>	 !log btullis@deploy1002 helmfile [codfw] START helmfile.d/services/datahub: apply on main
[09:12:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:14:00] <logmsgbot>	 !log btullis@deploy1002 helmfile [codfw] DONE helmfile.d/services/datahub: sync on main
[09:14:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:18:29] <wikibugs>	 (CR) Ayounsi: [C: +1] "easy" [software/spicerack] - https://gerrit.wikimedia.org/r/779561 (owner: Volans)
[09:20:09] <wikibugs>	 (CR) Volans: [C: +2] yaml files: fix indentation [software/spicerack] - https://gerrit.wikimedia.org/r/779561 (owner: Volans)
[09:21:28] <logmsgbot>	 !log jnuche@deploy1002 Started deploy [restbase/deploy@627f7d7] (dev-cluster): (no justification provided)
[09:21:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:22:10] <wikibugs>	 (CR) Ayounsi: WIP move core routers definitions to hiera (1 comment) [puppet] - https://gerrit.wikimedia.org/r/777347 (https://phabricator.wikimedia.org/T169860) (owner: Filippo Giunchedi)
[09:24:19] <logmsgbot>	 !log jnuche@deploy1002 Finished deploy [restbase/deploy@627f7d7] (dev-cluster): (no justification provided) (duration: 02m 51s)
[09:24:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:27:16] <wikibugs>	 (Merged) jenkins-bot: yaml files: fix indentation [software/spicerack] - https://gerrit.wikimedia.org/r/779561 (owner: Volans)
[09:33:33] <icinga-wm>	 PROBLEM - SSH on aqs1008.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[09:37:08] <wikibugs>	 (CR) Arturo Borrero Gonzalez: [C: +1] "LGTM. Before merging, got a PCC?" [puppet] - https://gerrit.wikimedia.org/r/779474 (https://phabricator.wikimedia.org/T304716) (owner: Majavah)
[09:42:42] <wikibugs>	 (PS1) Btullis: Ensure that the datahub consumers use TLS where required [deployment-charts] - https://gerrit.wikimedia.org/r/779837 (https://phabricator.wikimedia.org/T301454)
[09:43:20] <logmsgbot>	 !log btullis@deploy1002 helmfile [eqiad] START helmfile.d/services/datahub: apply on main
[09:43:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:43:41] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1119 (T298565)', diff saved to https://phabricator.wikimedia.org/P24585 and previous config saved to /var/cache/conftool/dbconfig/20220413-094341-ladsgroup.json
[09:43:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:43:45] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[09:44:47] <logmsgbot>	 !log btullis@deploy1002 helmfile [eqiad] DONE helmfile.d/services/datahub: sync on main
[09:44:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:45:07] <wikibugs>	 (PS1) Alexandros Kosiaris: Revert "zotero: Disable paging" [puppet] - https://gerrit.wikimedia.org/r/779118 (https://phabricator.wikimedia.org/T291707)
[09:45:29] <wikibugs>	 (PS2) Alexandros Kosiaris: Revert "zotero: Disable paging" [puppet] - https://gerrit.wikimedia.org/r/779118 (https://phabricator.wikimedia.org/T291707)
[09:51:51] <wikibugs>	 (CR) Alexandros Kosiaris: [C: +2] Revert "zotero: Disable paging" [puppet] - https://gerrit.wikimedia.org/r/779118 (https://phabricator.wikimedia.org/T291707) (owner: Alexandros Kosiaris)
[09:58:46] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P24586 and previous config saved to /var/cache/conftool/dbconfig/20220413-095846-ladsgroup.json
[09:58:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:03:15] <icinga-wm>	 RECOVERY - Host analytics1076 is UP: PING OK - Packet loss = 0%, RTA = 1.43 ms
[10:07:40] <wikibugs>	 (CR) Majavah: [V: +1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/34811/console" [puppet] - https://gerrit.wikimedia.org/r/779474 (https://phabricator.wikimedia.org/T304716) (owner: Majavah)
[10:07:49] <icinga-wm>	 PROBLEM - puppet last run on analytics1077 is CRITICAL: CRITICAL: Puppet last ran 1 day ago https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[10:08:16] <wikibugs>	 (CR) Majavah: [V: +1] P:toolforge::prometheus: simplify prometheus config (1 comment) [puppet] - https://gerrit.wikimedia.org/r/779474 (https://phabricator.wikimedia.org/T304716) (owner: Majavah)
[10:09:27] <wikibugs>	 (CR) Btullis: [C: +2] Ensure that the datahub consumers use TLS where required [deployment-charts] - https://gerrit.wikimedia.org/r/779837 (https://phabricator.wikimedia.org/T301454) (owner: Btullis)
[10:12:22] <wikibugs>	 (CR) jerkins-bot: [V: -1] Ensure that the datahub consumers use TLS where required [deployment-charts] - https://gerrit.wikimedia.org/r/779837 (https://phabricator.wikimedia.org/T301454) (owner: Btullis)
[10:13:51] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P24587 and previous config saved to /var/cache/conftool/dbconfig/20220413-101351-ladsgroup.json
[10:13:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:18:54] <wikibugs>	 (PS2) Btullis: Ensure that the datahub consumers use TLS where required [deployment-charts] - https://gerrit.wikimedia.org/r/779837 (https://phabricator.wikimedia.org/T301454)
[10:21:41] <wikibugs>	 (PS1) Btullis: Add an A record for datahub.wikimedia.org [dns] - https://gerrit.wikimedia.org/r/779839 (https://phabricator.wikimedia.org/T303049)
[10:28:56] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1119 (T298565)', diff saved to https://phabricator.wikimedia.org/P24588 and previous config saved to /var/cache/conftool/dbconfig/20220413-102856-ladsgroup.json
[10:28:58] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1135.eqiad.wmnet with reason: Maintenance
[10:28:59] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1135.eqiad.wmnet with reason: Maintenance
[10:29:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:29:02] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[10:29:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:29:04] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1135 (T298565)', diff saved to https://phabricator.wikimedia.org/P24589 and previous config saved to /var/cache/conftool/dbconfig/20220413-102904-ladsgroup.json
[10:29:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:29:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:30:22] <wikibugs>	 (PS1) Btullis: Add a trafficserver backend mapping rule for datahub [puppet] - https://gerrit.wikimedia.org/r/779840 (https://phabricator.wikimedia.org/T303049)
[10:32:54] <jinxer-wm>	 (NodeTextfileStale) firing: (3) Stale textfile for elastic1075:9100 - https://wikitech.wikimedia.org/wiki/Prometheus#Stale_file_for_node-exporter_textfile - https://grafana.wikimedia.org/d/knkl4dCWz/node-exporter-textfile - https://alerts.wikimedia.org/?q=alertname%3DNodeTextfileStale
[10:33:54] <icinga-wm>	 RECOVERY - SSH on aqs1008.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[10:36:02] <icinga-wm>	 RECOVERY - puppet last run on analytics1077 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[10:36:06] <wikibugs>	 (PS2) Btullis: Add an A record for datahub.wikimedia.org [dns] - https://gerrit.wikimedia.org/r/779839 (https://phabricator.wikimedia.org/T303049)
[10:40:25] <logmsgbot>	 !log btullis@deploy1002 helmfile [staging] START helmfile.d/services/datahub: apply on main
[10:40:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:40:45] <logmsgbot>	 !log btullis@deploy1002 helmfile [staging] DONE helmfile.d/services/datahub: sync on main
[10:40:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:41:50] <logmsgbot>	 !log btullis@deploy1002 helmfile [codfw] START helmfile.d/services/datahub: apply on main
[10:41:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:42:18] <logmsgbot>	 !log btullis@deploy1002 helmfile [codfw] DONE helmfile.d/services/datahub: sync on main
[10:42:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:45:31] <wikibugs>	 (PS1) Volans: mediawiki: call siteinfo in HTTPS [software/spicerack] - https://gerrit.wikimedia.org/r/779841
[10:46:02] <logmsgbot>	 !log btullis@deploy1002 helmfile [eqiad] START helmfile.d/services/datahub: apply on main
[10:46:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:46:20] <icinga-wm>	 PROBLEM - SSH on wtp1048.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[10:46:21] <logmsgbot>	 !log btullis@deploy1002 helmfile [eqiad] DONE helmfile.d/services/datahub: sync on main
[10:46:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:02:59] <wikibugs>	 SRE, SRE-Access-Requests, Infrastructure-Foundations, serviceops: allow certain users to disable puppet on mwdebug hosts - https://phabricator.wikimedia.org/T305979 (Volans) > let them run "puppet disable/enable" either directly or with a wrapper around it. (the one used by cumin?).  Nobody shoul...
[11:21:41] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1135 (T298565)', diff saved to https://phabricator.wikimedia.org/P24590 and previous config saved to /var/cache/conftool/dbconfig/20220413-112140-ladsgroup.json
[11:21:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:21:46] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[11:36:46] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P24591 and previous config saved to /var/cache/conftool/dbconfig/20220413-113645-ladsgroup.json
[11:36:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:37:51] <wikibugs>	 (PS1) Cathal Mooney: Remove config/var for defining bespoke interfaces for IPv6 RAs [homer/public] - https://gerrit.wikimedia.org/r/779844 (https://phabricator.wikimedia.org/T299758)
[11:38:04] <logmsgbot>	 !log gmodena@deploy1002 Started deploy [airflow-dags/research@b029f10]: (no justification provided)
[11:38:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:38:11] <logmsgbot>	 !log gmodena@deploy1002 Finished deploy [airflow-dags/research@b029f10]: (no justification provided) (duration: 00m 07s)
[11:38:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:39:40] <wikibugs>	 SRE, SRE-Access-Requests, Data-Engineering: Request to add user gmodena to analytics-research-admins group - https://phabricator.wikimedia.org/T305880 (gmodena) >>! In T305880#7848648, @jcrespo wrote: > @gmodena Did the access work?  Hey @jcrespo, I tried a deployment that failed with: ` airflow-dags...
[11:40:08] <topranks>	 !log Remove IPv6 router-advertisement config for fxp0 management interface on cr1-drmrs.
[11:40:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:42:54] <icinga-wm>	 PROBLEM - SSH on aqs1009.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[11:46:47] <wikibugs>	 SRE, SRE-Access-Requests, Infrastructure-Foundations, serviceops: allow certain users to disable puppet on mwdebug hosts - https://phabricator.wikimedia.org/T305979 (jcrespo)
[11:46:56] <logmsgbot>	 !log btullis@cumin1001 END (PASS) - Cookbook sre.hadoop.reboot-workers (exit_code=0) for Hadoop analytics cluster
[11:46:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:51:51] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P24592 and previous config saved to /var/cache/conftool/dbconfig/20220413-115151-ladsgroup.json
[11:51:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:06:56] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1135 (T298565)', diff saved to https://phabricator.wikimedia.org/P24593 and previous config saved to /var/cache/conftool/dbconfig/20220413-120656-ladsgroup.json
[12:06:57] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1134.eqiad.wmnet with reason: Maintenance
[12:06:59] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1134.eqiad.wmnet with reason: Maintenance
[12:06:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:07:01] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[12:07:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:07:04] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1134 (T298565)', diff saved to https://phabricator.wikimedia.org/P24594 and previous config saved to /var/cache/conftool/dbconfig/20220413-120704-ladsgroup.json
[12:07:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:07:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:15:33] <wikibugs>	 (PS1) Hnowlan: Set production role and add config for restbase2027 [puppet] - https://gerrit.wikimedia.org/r/779846
[12:25:45] <wikibugs>	 (CR) Tchanders: [C: +1] Enable IP Info instrumentation on testwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/779579 (https://phabricator.wikimedia.org/T304438) (owner: STran)
[12:43:08] <wikibugs>	 (CR) Ayounsi: [C: +1] Remove config/var for defining bespoke interfaces for IPv6 RAs [homer/public] - https://gerrit.wikimedia.org/r/779844 (https://phabricator.wikimedia.org/T299758) (owner: Cathal Mooney)
[12:44:06] <icinga-wm>	 RECOVERY - SSH on aqs1009.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[12:48:52] <wikibugs>	 (CR) Ayounsi: [C: +1] mediawiki: call siteinfo in HTTPS [software/spicerack] - https://gerrit.wikimedia.org/r/779841 (owner: Volans)
[12:55:39] <wikibugs>	 (CR) Ottomata: "Hmmm, what do you think about using a more generic name for the public URL, rather than one associated with the tech?" [dns] - https://gerrit.wikimedia.org/r/779839 (https://phabricator.wikimedia.org/T303049) (owner: Btullis)
[12:57:40] <wikibugs>	 SRE, SRE-Access-Requests, Data-Engineering: Request to add user gmodena to analytics-research-admins group - https://phabricator.wikimedia.org/T305880 (Ottomata) Open→Resolved a:Ottomata The access works though!  We'll figure out the deployment issues separately.
[12:57:46] <wikibugs>	 (PS1) Volans: setup.py: add missing types for requests [software/homer] - https://gerrit.wikimedia.org/r/779849
[12:57:50] <wikibugs>	 (PS1) Volans: capirca: catch also requests exceptions [software/homer] - https://gerrit.wikimedia.org/r/779850
[13:00:05] <jouncebot>	 RoanKattouw, Lucas_WMDE, and Urbanecm: Time to snap out of that daydream and deploy UTC afternoon backport window. Get on with it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220413T1300).
[13:00:05] <jouncebot>	 zabe and Tchanders: A patch you scheduled for UTC afternoon backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[13:00:21] <zabe>	 o/
[13:00:50] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1134 (T298565)', diff saved to https://phabricator.wikimedia.org/P24595 and previous config saved to /var/cache/conftool/dbconfig/20220413-130050-ladsgroup.json
[13:00:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:00:56] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[13:04:47] <volans>	 !log installed spicerack v2.4.1 on cumin2002
[13:04:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:10:09] <logmsgbot>	 !log volans@cumin2002 START - Cookbook sre.hosts.downtime for 0:05:00 on sretest[1001-1002].eqiad.wmnet with reason: testing spicerack
[13:10:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:10:13] <logmsgbot>	 !log volans@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:05:00 on sretest[1001-1002].eqiad.wmnet with reason: testing spicerack
[13:10:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:11:15] <wikibugs>	 (PS1) Ladsgroup: Set templatelinks migration schema to write both in s4 [mediawiki-config] - https://gerrit.wikimedia.org/r/779852 (https://phabricator.wikimedia.org/T299421)
[13:12:21] <Amir1>	 jouncebot: nowandnext
[13:12:21] <jouncebot>	 For the next 0 hour(s) and 47 minute(s): UTC afternoon backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220413T1300)
[13:12:21] <jouncebot>	 In 0 hour(s) and 47 minute(s): Maintenance script run (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220413T1400)
[13:12:54] <Lucas_WMDE>	 I’m in a meeting, sorry
[13:12:58] <Lucas_WMDE>	 can’t deploy yet
[13:13:14] <logmsgbot>	 !log otto@deploy1002 Started deploy [airflow-dags/research@b029f10]: (no justification provided)
[13:13:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:13:31] <logmsgbot>	 !log bking@cumin2002 START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster relforge: relforge testing - bking@cumin2002 - T301955
[13:13:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:13:37] <stashbot>	 T301955: Upgrade relforge to elasticsearch 6.8.23 - https://phabricator.wikimedia.org/T301955
[13:13:41] <logmsgbot>	 !log bking@cumin2002 END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster relforge: relforge testing - bking@cumin2002 - T301955
[13:13:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:13:44] <wikibugs>	 (PS3) Reedy: Use namespaced GerritExtDistProvider [mediawiki-config] - https://gerrit.wikimedia.org/r/774963
[13:13:48] <logmsgbot>	 !log otto@deploy1002 Finished deploy [airflow-dags/research@b029f10]: (no justification provided) (duration: 00m 34s)
[13:13:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:14:01] <wikibugs>	 (03CR) 10Reedy: [C: 03+2] "ship it" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/774963 (owner: 10Reedy)
[13:14:40] <Amir1>	 I'm outside so my access is a bit limited
[13:14:46] <Tchanders>	 Lucas_WMDE: Hi! Will you be deploying later this window? (No worries if not - I can reschedule)
[13:14:58] <logmsgbot>	 !log bking@cumin1001 START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster relforge: relforge testing - bking@cumin1001 - T301955
[13:15:00] <wikibugs>	 (03Merged) 10jenkins-bot: Use namespaced GerritExtDistProvider [mediawiki-config] - 10https://gerrit.wikimedia.org/r/774963 (owner: 10Reedy)
[13:15:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:15:05] <logmsgbot>	 !log bking@cumin1001 END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster relforge: relforge testing - bking@cumin1001 - T301955
[13:15:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:15:56] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P24596 and previous config saved to /var/cache/conftool/dbconfig/20220413-131555-ladsgroup.json
[13:15:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:16:24] <Amir1>	 let me see if I can do it
[13:16:31] <logmsgbot>	 !log bking@cumin2002 START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster relforge: relforge testing - bking@cumin2002 - T301955
[13:16:35] <logmsgbot>	 !log reedy@deploy1002 Synchronized wmf-config/CommonSettings.php: Use namespaced GerritExtDistProvider (duration: 00m 55s)
[13:16:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:16:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:16:58] <Reedy>	 Amir1: I'm at home etc
[13:16:59] * Reedy looks
[13:17:23] <Amir1>	 if you can do it, it'd be awesome
[13:17:45] <Amir1>	 and once done https://gerrit.wikimedia.org/r/779852 as well :D
[13:17:56] <wikibugs>	 (03PS3) 10Reedy: Enable IP Info instrumentation on testwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/779579 (https://phabricator.wikimedia.org/T304438) (owner: 10STran)
[13:17:59] <Amir1>	 but it's just a switch flip, it should be fine
[13:17:59] <wikibugs>	 (03CR) 10Reedy: [C: 03+2] Enable IP Info instrumentation on testwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/779579 (https://phabricator.wikimedia.org/T304438) (owner: 10STran)
[13:18:14] <Tchanders>	 Amir1, Reedy: Thanks. I need to attend another training since it's been a while, but they're all outside my hours currently...
[13:18:27] <Reedy>	 Not much has changed... :)
[13:18:49] <Tchanders>	 Maybe just my memory/confidence...
[13:19:02] <wikibugs>	 (03Merged) 10jenkins-bot: Enable IP Info instrumentation on testwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/779579 (https://phabricator.wikimedia.org/T304438) (owner: 10STran)
[13:19:23] <wikibugs>	 (03PS4) 10Reedy: Migrate $wmfUdp2logDest to $wmgUdp2logDest [mediawiki-config] - 10https://gerrit.wikimedia.org/r/776258 (https://phabricator.wikimedia.org/T45956) (owner: 10Zabe)
[13:19:34] <logmsgbot>	 !log bking@cumin2002 END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster relforge: relforge testing - bking@cumin2002 - T301955
[13:19:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:19:37] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[13:19:37] <stashbot>	 T301955: Upgrade relforge to elasticsearch 6.8.23 - https://phabricator.wikimedia.org/T301955
[13:19:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:19:40] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[13:19:41] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[13:19:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:19:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:19:45] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[13:19:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:19:53] <Reedy>	 Tchanders: Do you care about testing it on a mwdebug host? Or shall I just sync it out as it's only for testwiki?
[13:20:16] <Tchanders>	 Reedy: Would you mind if I test? We managed to break beta with a similar patch in the past
[13:20:21] <Reedy>	 heh
[13:20:23] <Reedy>	 yeah, that's fine
[13:20:25] <Reedy>	 moment
[13:21:04] <Reedy>	 Tchanders: it's on mwdebug1002
[13:21:12] <Tchanders>	 Testing...
[13:22:09] <jynus>	 question, if I run into something that may be a recent bug from a train deployment, marking it as wm-production-error is enough to flag it, right?
[13:22:19] <Tchanders>	 Reedy: Looks good - thank you
[13:22:25] <wikibugs>	 10SRE, 10Infrastructure-Foundations, 10netops: Unify loopback filters between CR routers and L3 switches - https://phabricator.wikimedia.org/T304553 (10cmooney) 05Open→03Resolved
[13:22:39] <Reedy>	 jynus: Not usually AFAIK. You can mark it as a blocker of the deployment task
[13:23:07] <jynus>	 ok, that is the part I am unsure about- how to know if it is a blocker or a regular bug?
[13:23:22] <Reedy>	 If you're not sure, file it as a blocker. It'll guarantee it gets triaged
[13:23:26] <jynus>	 ok
[13:23:31] <jynus>	 will do
[13:23:34] <Reedy>	 better safe than sorry etc
[13:24:07] <logmsgbot>	 !log reedy@deploy1002 Synchronized wmf-config/InitialiseSettings.php: T304438 (duration: 01m 03s)
[13:24:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:24:11] <stashbot>	 T304438: Enable IP Info instrumentation in testwiki - https://phabricator.wikimedia.org/T304438
[13:24:16] <wikibugs>	 (03CR) 10Reedy: [C: 03+2] Migrate $wmfUdp2logDest to $wmgUdp2logDest [mediawiki-config] - 10https://gerrit.wikimedia.org/r/776258 (https://phabricator.wikimedia.org/T45956) (owner: 10Zabe)
[13:24:25] <wikibugs>	 (03CR) 10Btullis: Add an A record for datahub.wikimedia.org (031 comment) [dns] - 10https://gerrit.wikimedia.org/r/779839 (https://phabricator.wikimedia.org/T303049) (owner: 10Btullis)
[13:24:37] <jynus>	 ah, I think a team filed a duplicate and is aware already, so that's ok
[13:24:49] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[13:24:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:24:52] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[13:24:53] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[13:24:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:24:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:24:57] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[13:24:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:25:00] <wikibugs>	 (03Merged) 10jenkins-bot: Migrate $wmfUdp2logDest to $wmgUdp2logDest [mediawiki-config] - 10https://gerrit.wikimedia.org/r/776258 (https://phabricator.wikimedia.org/T45956) (owner: 10Zabe)
[13:25:15] <wikibugs>	 (03PS2) 10Reedy: Set templatelinks migration schema to write both in s4 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/779852 (https://phabricator.wikimedia.org/T299421) (owner: 10Ladsgroup)
[13:27:00] <logmsgbot>	 !log reedy@deploy1002 Synchronized wmf-config/: Migrate $wmfUdp2logDest to $wmgUdp2logDest - T45956 (duration: 00m 55s)
[13:27:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:27:06] <stashbot>	 T45956: Rename $wmf* to $wmg* in wmf-config - https://phabricator.wikimedia.org/T45956
[13:27:34] <wikibugs>	 (03CR) 10Ottomata: [C: 03+1] "Okay" [dns] - 10https://gerrit.wikimedia.org/r/779839 (https://phabricator.wikimedia.org/T303049) (owner: 10Btullis)
[13:27:41] <zabe>	 thanks Reedy 
[13:28:09] <wikibugs>	 (03CR) 10Reedy: [C: 03+2] Set templatelinks migration schema to write both in s4 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/779852 (https://phabricator.wikimedia.org/T299421) (owner: 10Ladsgroup)
[13:28:18] <wikibugs>	 (03PS3) 10Zabe: Stop writing to $wmfUdp2logDest [mediawiki-config] - 10https://gerrit.wikimedia.org/r/776259 (https://phabricator.wikimedia.org/T45956)
[13:29:02] <wikibugs>	 (03Merged) 10jenkins-bot: Set templatelinks migration schema to write both in s4 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/779852 (https://phabricator.wikimedia.org/T299421) (owner: 10Ladsgroup)
[13:30:00] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[13:30:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:30:03] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[13:30:04] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[13:30:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:30:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:30:08] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[13:30:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:30:48] <logmsgbot>	 !log reedy@deploy1002 Synchronized wmf-config/InitialiseSettings.php: Set templatelinks migration schema to write both in s4 - T299421 (duration: 00m 55s)
[13:30:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:30:52] <stashbot>	 T299421: Turn on write both in production for templatelinks normalization - https://phabricator.wikimedia.org/T299421
[13:31:01] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P24597 and previous config saved to /var/cache/conftool/dbconfig/20220413-133100-ladsgroup.json
[13:31:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:31:55] <Amir1>	 Thanks Reedy 
[13:33:01] <wikibugs>	 (03PS2) 10Zabe: Write the same value to wmgSwiftConfig as to wmfSwiftConfig [mediawiki-config] - 10https://gerrit.wikimedia.org/r/768259 (https://phabricator.wikimedia.org/T45956)
[13:33:59] <volans>	 !log installed spicerack v2.4.1 on cumin1001
[13:34:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:34:03] <logmsgbot>	 !log milimetric@deploy1002 Started deploy [analytics/refinery@34be9f3] (thin): Regular analytics weekly train THIN [analytics/refinery@34be9f3]
[13:34:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:34:11] <logmsgbot>	 !log milimetric@deploy1002 Finished deploy [analytics/refinery@34be9f3] (thin): Regular analytics weekly train THIN [analytics/refinery@34be9f3] (duration: 00m 07s)
[13:34:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:35:17] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[13:35:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:35:20] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[13:35:22] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[13:35:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:35:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:35:26] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[13:35:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:38:16] <Lucas_WMDE>	 alright, I’m back… anything still needs to be deployed? ^^
[13:38:30] <zabe>	 no
[13:38:37] <Lucas_WMDE>	 alright
[13:38:43] <Lucas_WMDE>	 then I’ll just wait until the next window starts
[13:45:05] <wikibugs>	 (03CR) 10Raymond Ndibe: Create REST api service to manage toolforge replica.my.cnf (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/777037 (https://phabricator.wikimedia.org/T304040) (owner: 10Raymond Ndibe)
[13:46:06] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1134 (T298565)', diff saved to https://phabricator.wikimedia.org/P24598 and previous config saved to /var/cache/conftool/dbconfig/20220413-134605-ladsgroup.json
[13:46:07] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1184.eqiad.wmnet with reason: Maintenance
[13:46:09] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1184.eqiad.wmnet with reason: Maintenance
[13:46:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:46:11] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[13:46:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:46:14] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1184 (T298565)', diff saved to https://phabricator.wikimedia.org/P24599 and previous config saved to /var/cache/conftool/dbconfig/20220413-134613-ladsgroup.json
[13:46:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:46:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:48:32] <wikibugs>	 (03PS1) 10Zabe: Migrate $wmfSwiftConfig to $wmgSwiftConfig [mediawiki-config] - 10https://gerrit.wikimedia.org/r/779856 (https://phabricator.wikimedia.org/T45956)
[13:58:22] <jynus>	 !log restarting bacula hosts
[13:58:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:58:36] <wikibugs>	 (03PS1) 104nn1l2: fawiki: Change logo for 900K milestone [mediawiki-config] - 10https://gerrit.wikimedia.org/r/779858 (https://phabricator.wikimedia.org/T306030)
[13:58:37] <jynus>	 ^backups will be unavailable for some minutes
[13:59:32] <wikibugs>	 (03CR) 10Bking: [C: 03+2] elastic: allow waiting for yellow instead of green [cookbooks] - 10https://gerrit.wikimedia.org/r/778335 (https://phabricator.wikimedia.org/T304570) (owner: 10Ryan Kemper)
[14:00:04] <jouncebot>	 Lucas_WMDE and hoo: #bothumor When your hammer is PHP, everything starts looking like a thumb. Rise for Maintenance script run. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220413T1400).
[14:00:35] <Lucas_WMDE>	 o/
[14:00:45] <Lucas_WMDE>	 alright, let’s go
[14:01:00] <wikibugs>	 (03PS1) 10Zabe: wikitech_private: convert to new array syntax [puppet] - 10https://gerrit.wikimedia.org/r/779860
[14:05:31] <Lucas_WMDE>	 !log lucaswerkmeister-wmde@mwmaint1002:~$ mwscript extensions/Wikibase/client/maintenance/populateUnexpectedUnconnectedPagePageProp.php testwiki
[14:05:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:06:23] <logmsgbot>	 !log otto@deploy1002 Started deploy [airflow-dags/research@b029f10]: (no justification provided)
[14:06:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:06:27] <logmsgbot>	 !log otto@deploy1002 Finished deploy [airflow-dags/research@b029f10]: (no justification provided) (duration: 00m 04s)
[14:06:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:07:15] <Lucas_WMDE>	 !log lucaswerkmeister-wmde@mwmaint1002:~$ foreachwikiindblist wikidataclient-test extensions/Wikibase/client/maintenance/populateUnexpectedUnconnectedPagePageProp.php
[14:07:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:07:51] <Base>	 Was something deployed and reverted to whatever cluster ukwiki is in? https://phabricator.wikimedia.org/T306033
[14:08:45] <jinxer-wm>	 (JobUnavailable) firing: Reduced availability for job bacula in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[14:09:24] <Lucas_WMDE>	 Base: not today as far as I’m aware / can see in the SAL
[14:09:41] <Lucas_WMDE>	 ukwiki is in group2, so it wouldn’t be affected by the train yet
[14:11:38] <Base>	 interesting
[14:12:24] <wikibugs>	 (03PS1) 10Lucas Werkmeister (WMDE): Use "unexpectedUnconnectedPage" page prop on wikidataclient-test [mediawiki-config] - 10https://gerrit.wikimedia.org/r/779861
[14:13:18] <jynus>	 Base: note that doesn't mean you were wrong- there are many things "on the fly" (browser's cache, cdn's cache). Site notice I believe is js heavy, which adds to weirdness
[14:14:14] <jynus>	 ask if someone from the community still sees it wrong now, and if not, you can close the ticket :-)
[14:14:43] <wikibugs>	 (03CR) 10Lucas Werkmeister (WMDE): [C: 03+2] Use "unexpectedUnconnectedPage" page prop on wikidataclient-test [mediawiki-config] - 10https://gerrit.wikimedia.org/r/779861 (owner: 10Lucas Werkmeister (WMDE))
[14:14:54] <Base>	 Sitenotice, unlike Centralnotice isn't that JS heavy I think
[14:15:37] <jynus>	 ah, sorry, I mixed those
[14:15:38] <wikibugs>	 (03Merged) 10jenkins-bot: Use "unexpectedUnconnectedPage" page prop on wikidataclient-test [mediawiki-config] - 10https://gerrit.wikimedia.org/r/779861 (owner: 10Lucas Werkmeister (WMDE))
[14:15:49] <jynus>	 but still- could be a job that took more than usual, etc.
[14:16:18] <zabe>	 sitenotice also can take some time until it is updated on all caching layers, but usually not /that/ long
[14:17:46] <Base>	 Well it is not a new one either, it was placed on March 6
[14:17:58] <Base>	 Having links render as self-link is a weird thing too
[14:18:09] <jynus>	 yeah, that I agree
[14:18:35] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy1002 Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:779861|Use "unexpectedUnconnectedPage" page prop on wikidataclient-test]] (duration: 00m 55s)
[14:18:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:20:58] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[14:21:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:21:01] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[14:21:02] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[14:21:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:21:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:21:06] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[14:21:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:23:23] <logmsgbot>	 !log bking@cumin2002 START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster relforge: relforge testing - bking@cumin2002 - T301955
[14:23:23] <logmsgbot>	 !log bking@cumin2002 END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster relforge: relforge testing - bking@cumin2002 - T301955
[14:23:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:23:26] <stashbot>	 T301955: Upgrade relforge to elasticsearch 6.8.23 - https://phabricator.wikimedia.org/T301955
[14:23:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:23:47] <Lucas_WMDE>	 !log lucaswerkmeister-wmde@mwmaint1002:~$ mwscript extensions/Wikibase/client/maintenance/populateUnexpectedUnconnectedPagePageProp.php commonswiki --last-page-id 10000000
[14:23:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:24:01] <logmsgbot>	 !log bking@cumin2002 START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster relforge: relforge testing - bking@cumin2002 - T301955
[14:24:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:27:06] <logmsgbot>	 !log bking@cumin2002 END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster relforge: relforge testing - bking@cumin2002 - T301955
[14:27:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:28:25] <Lucas_WMDE>	 lucaswerkmeister-wmde@mwmaint1002:~$ mwscript extensions/Wikibase/client/maintenance/populateUnexpectedUnconnectedPagePageProp.php commonswiki --first-page-id 10000001 --last-page-id 20000000
[14:31:03] <Lucas_WMDE>	 oops, forgot the log
[14:31:07] <Lucas_WMDE>	 well, that’s done now
[14:31:09] <Lucas_WMDE>	 !log lucaswerkmeister-wmde@mwmaint1002:~$ time mwscript extensions/Wikibase/client/maintenance/populateUnexpectedUnconnectedPagePageProp.php commonswiki --first-page-id 10000001 --last-page-id 20000000
[14:31:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:31:26] <jynus>	 !log bacula restarts finished
[14:31:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:31:32] <Lucas_WMDE>	 !log lucaswerkmeister-wmde@mwmaint1002:~$ mwscript extensions/Wikibase/client/maintenance/populateUnexpectedUnconnectedPagePageProp.php commonswiki --first-page-id 20000001 --last-page-id 30000000
[14:31:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:32:54] <jinxer-wm>	 (NodeTextfileStale) firing: (3) Stale textfile for elastic1075:9100 - https://wikitech.wikimedia.org/wiki/Prometheus#Stale_file_for_node-exporter_textfile - https://grafana.wikimedia.org/d/knkl4dCWz/node-exporter-textfile - https://alerts.wikimedia.org/?q=alertname%3DNodeTextfileStale
[14:33:45] <jinxer-wm>	 (JobUnavailable) resolved: Reduced availability for job bacula in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[14:36:07] <wikibugs>	 (03PS1) 10Stang: Optimize logo for Wikispecies [mediawiki-config] - 10https://gerrit.wikimedia.org/r/779865 (https://phabricator.wikimedia.org/T306037)
[14:36:15] <Lucas_WMDE>	 !log lucaswerkmeister-wmde@mwmaint1002:~$ mwscript extensions/Wikibase/client/maintenance/populateUnexpectedUnconnectedPagePageProp.php commonswiki --batch-size 500 --first-page-id 30000001 --last-page-id 40000000
[14:36:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:38:53] <wikibugs>	 (03PS1) 10Ottomata: Declare new deployer groups for airflow instances [puppet] - 10https://gerrit.wikimedia.org/r/779887
[14:39:49] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1184 (T298565)', diff saved to https://phabricator.wikimedia.org/P24600 and previous config saved to /var/cache/conftool/dbconfig/20220413-143948-ladsgroup.json
[14:39:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:39:53] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[14:40:22] <wikibugs>	 (03CR) 10Bking: [V: 03+2] wdqs: activate jvmquake at 300:5 [puppet] - 10https://gerrit.wikimedia.org/r/779440 (https://phabricator.wikimedia.org/T293862) (owner: 10DCausse)
[14:40:30] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Declare new deployer groups for airflow instances [puppet] - 10https://gerrit.wikimedia.org/r/779887 (owner: 10Ottomata)
[14:40:33] <wikibugs>	 (03CR) 10Bking: [V: 03+2 C: 03+2] wdqs: activate jvmquake at 300:5 [puppet] - 10https://gerrit.wikimedia.org/r/779440 (https://phabricator.wikimedia.org/T293862) (owner: 10DCausse)
[14:40:50] <wikibugs>	 (03CR) 10JHathaway: mx: use $domain_data rather than $domain for aliases (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/779504 (https://phabricator.wikimedia.org/T305962) (owner: 10JHathaway)
[14:41:06] <Lucas_WMDE>	 !log lucaswerkmeister-wmde@mwmaint1002:~$ mwscript extensions/Wikibase/client/maintenance/populateUnexpectedUnconnectedPagePageProp.php commonswiki --batch-size 500 --first-page-id 40000001 --last-page-id 50000000
[14:41:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:41:44] <wikibugs>	 (03CR) 10Andrew Bogott: Create REST api service to manage toolforge replica.my.cnf (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/777037 (https://phabricator.wikimedia.org/T304040) (owner: 10Raymond Ndibe)
[14:43:32] <wikibugs>	 10SRE-OnFire, 10Wikidata, 10wdwb-tech, 10Discovery-Search (Current work), and 3 others: Only generate maxlag from pooled query service servers. - https://phabricator.wikimedia.org/T238751 (10Addshore) @Joe (Also pinging @akosiaris as I know joe is out right now). It seems like the ideal solution of {T23939...
[14:46:21] <Lucas_WMDE>	 !log lucaswerkmeister-wmde@mwmaint1002:~$ mwscript extensions/Wikibase/client/maintenance/populateUnexpectedUnconnectedPagePageProp.php commonswiki --batch-size 500 --first-page-id 50000001 --last-page-id 60000000
[14:46:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:50:15] <Lucas_WMDE>	 !log lucaswerkmeister-wmde@mwmaint1002:~$ mwscript extensions/Wikibase/client/maintenance/populateUnexpectedUnconnectedPagePageProp.php commonswiki --batch-size 500 --first-page-id 60000001 --last-page-id 70000000
[14:50:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:51:14] <icinga-wm>	 RECOVERY - SSH on wtp1048.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[14:54:17] <Lucas_WMDE>	 !log lucaswerkmeister-wmde@mwmaint1002:~$ mwscript extensions/Wikibase/client/maintenance/populateUnexpectedUnconnectedPagePageProp.php commonswiki --batch-size 500 --first-page-id 70000001 --last-page-id 80000000
[14:54:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:54:54] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P24601 and previous config saved to /var/cache/conftool/dbconfig/20220413-145453-ladsgroup.json
[14:54:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:56:12] <wikibugs>	 (03PS2) 10Jcrespo: admin: Add Nat to the list of privileged ldap users [puppet] - 10https://gerrit.wikimedia.org/r/779749 (https://phabricator.wikimedia.org/T305978)
[14:58:07] <wikibugs>	 (03CR) 10Jcrespo: [C: 03+2] admin: Add Nat to the list of privileged ldap users [puppet] - 10https://gerrit.wikimedia.org/r/779749 (https://phabricator.wikimedia.org/T305978) (owner: 10Jcrespo)
[14:58:30] <wikibugs>	 (03CR) 10Dzahn: [C: 03+1] "checked in LDAP, looks good to me!" [puppet] - 10https://gerrit.wikimedia.org/r/779749 (https://phabricator.wikimedia.org/T305978) (owner: 10Jcrespo)
[14:58:41] <wikibugs>	 (03PS2) 10Ottomata: Declare new deployer groups for airflow instances [puppet] - 10https://gerrit.wikimedia.org/r/779887
[14:58:45] <Lucas_WMDE>	 !log lucaswerkmeister-wmde@mwmaint1002:~$ mwscript extensions/Wikibase/client/maintenance/populateUnexpectedUnconnectedPagePageProp.php commonswiki --batch-size 500 --first-page-id 80000001 --last-page-id 90000000
[14:58:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:59:28] <wikibugs>	 10SRE-OnFire, 10Wikidata, 10wdwb-tech, 10Discovery-Search (Current work), and 3 others: Only generate maxlag from pooled query service servers. - https://phabricator.wikimedia.org/T238751 (10akosiaris) >>! In T238751#7851690, @Addshore wrote: > @Joe (Also pinging @akosiaris as I know joe is out right now)....
[14:59:42] <wikibugs>	 10SRE, 10LDAP-Access-Requests, 10Patch-For-Review: Grant Access to ldap/wmf for Nathillard - https://phabricator.wikimedia.org/T305978 (10Dzahn) >>! In T305978#7850466, @jcrespo wrote: > @Dzahn I responded before I had the chance to read your comments. I didn't see explicit concerns about me proceeding (just...
[15:00:16] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Declare new deployer groups for airflow instances [puppet] - 10https://gerrit.wikimedia.org/r/779887 (owner: 10Ottomata)
[15:00:18] <wikibugs>	 (03CR) 10Volans: "reply inline" [software/spicerack] - 10https://gerrit.wikimedia.org/r/775904 (owner: 10Volans)
[15:03:08] <Lucas_WMDE>	 !log lucaswerkmeister-wmde@mwmaint1002:~$ mwscript extensions/Wikibase/client/maintenance/populateUnexpectedUnconnectedPagePageProp.php commonswiki --batch-size 500 --first-page-id 90000001 --last-page-id 100000000
[15:03:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:03:12] <wikibugs>	 (03PS3) 10Ottomata: Declare new deployer groups for airflow instances [puppet] - 10https://gerrit.wikimedia.org/r/779887
[15:04:05] <wikibugs>	 (03PS4) 10Ottomata: Declare new deployer groups for airflow instances [puppet] - 10https://gerrit.wikimedia.org/r/779887
[15:05:45] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Declare new deployer groups for airflow instances [puppet] - 10https://gerrit.wikimedia.org/r/779887 (owner: 10Ottomata)
[15:06:45] <wikibugs>	 (03PS5) 10Ottomata: Declare new research-deployers group for airflow instances [puppet] - 10https://gerrit.wikimedia.org/r/779887
[15:07:18] <Lucas_WMDE>	 !log lucaswerkmeister-wmde@mwmaint1002:~$ mwscript extensions/Wikibase/client/maintenance/populateUnexpectedUnconnectedPagePageProp.php commonswiki --batch-size 500 --first-page-id 100000001 --last-page-id 110000000
[15:07:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:07:58] <wikibugs>	 (03CR) 10Ottomata: [V: 03+1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/34816/console" [puppet] - 10https://gerrit.wikimedia.org/r/779887 (owner: 10Ottomata)
[15:08:13] <wikibugs>	 (03CR) 10Vivian Rook: [C: 03+2] add chunkeddriver.py.patch to wallaby [puppet] - 10https://gerrit.wikimedia.org/r/777873 (https://phabricator.wikimedia.org/T304694) (owner: 10Vivian Rook)
[15:08:24] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Declare new research-deployers group for airflow instances [puppet] - 10https://gerrit.wikimedia.org/r/779887 (owner: 10Ottomata)
[15:08:35] <wikibugs>	 10SRE, 10LDAP-Access-Requests, 10Patch-For-Review: Grant Access to ldap/wmf for Nathillard - https://phabricator.wikimedia.org/T305978 (10jcrespo) @NHillard-WMF Access deployed- you can test it works for you on gerrit, or any of the other services granted? https://wikitech.wikimedia.org/wiki/SRE/LDAP/Groups#...
[15:09:59] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P24602 and previous config saved to /var/cache/conftool/dbconfig/20220413-150959-ladsgroup.json
[15:10:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:11:08] <Lucas_WMDE>	 !log lucaswerkmeister-wmde@mwmaint1002:~$ mwscript extensions/Wikibase/client/maintenance/populateUnexpectedUnconnectedPagePageProp.php commonswiki --batch-size 500 --first-page-id 110000001
[15:11:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:13:30] <wikibugs>	 (03PS6) 10Ottomata: Declare new research-deployers group for airflow instances [puppet] - 10https://gerrit.wikimedia.org/r/779887
[15:15:38] <wikibugs>	 (03PS7) 10Ottomata: Declare new research-deployers group for airflow instances [puppet] - 10https://gerrit.wikimedia.org/r/779887
[15:17:17] <wikibugs>	 (03CR) 10Ottomata: [V: 03+1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/34819/console" [puppet] - 10https://gerrit.wikimedia.org/r/779887 (owner: 10Ottomata)
[15:18:43] <wikibugs>	 (03PS8) 10Ottomata: Declare new research-deployers group for airflow instances [puppet] - 10https://gerrit.wikimedia.org/r/779887
[15:21:14] <wikibugs>	 (03CR) 10Ottomata: [C: 03+2] Declare new research-deployers group for airflow instances [puppet] - 10https://gerrit.wikimedia.org/r/779887 (owner: 10Ottomata)
[15:23:26] <logmsgbot>	 !log otto@deploy1002 Started deploy [airflow-dags/research@b029f10]: (no justification provided)
[15:23:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:23:36] <logmsgbot>	 !log otto@deploy1002 Finished deploy [airflow-dags/research@b029f10]: (no justification provided) (duration: 00m 10s)
[15:23:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:25:04] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1184 (T298565)', diff saved to https://phabricator.wikimedia.org/P24603 and previous config saved to /var/cache/conftool/dbconfig/20220413-152504-ladsgroup.json
[15:25:05] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
[15:25:07] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
[15:25:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:25:10] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[15:25:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:25:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:28:12] <icinga-wm>	 PROBLEM - Host mw1308 is DOWN: PING CRITICAL - Packet loss = 100%
[15:29:58] <icinga-wm>	 RECOVERY - Host mw1308 is UP: PING OK - Packet loss = 0%, RTA = 0.63 ms
[15:31:33] <wikibugs>	 10SRE, 10ops-eqiad: mw1308 - internal IPMI error - mgmt / DRAC problem - https://phabricator.wikimedia.org/T305741 (10Cmjohnson) 05Open→03Resolved a:03Cmjohnson Fixed
[15:32:22] <logmsgbot>	 !log cmjohnson@cumin1001 START - Cookbook sre.hosts.reimage for host cloudstore1010.wikimedia.org with OS bullseye
[15:32:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:32:29] <wikibugs>	 10SRE, 10ops-eqiad, 10DC-Ops, 10Infrastructure-Foundations, and 2 others: Q3:(Need By: TBD) rack/setup/install 2 new labstore hosts - https://phabricator.wikimedia.org/T302981 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cmjohnson@cumin1001 for host cloudstore1010.wikimedia.or...
[15:37:09] <logmsgbot>	 !log otto@deploy1002 Started deploy [airflow-dags/research@b029f10]: (no justification provided)
[15:37:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:37:13] <logmsgbot>	 !log otto@deploy1002 Finished deploy [airflow-dags/research@b029f10]: (no justification provided) (duration: 00m 03s)
[15:37:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:40:27] <wikibugs>	 10SRE, 10ops-eqiad, 10DC-Ops, 10Infrastructure-Foundations, and 2 others: Q3:(Need By: TBD) rack/setup/install 2 new labstore hosts - https://phabricator.wikimedia.org/T302981 (10Cmjohnson) @volans @Papaul  I get this during the install.  This requires a manual entry  [            (1*installer)  2 shell  3...
[15:41:22] <wikibugs>	 10SRE, 10Data-Catalog, 10Data-Engineering, 10serviceops, and 2 others: New Service Request: DataHub - https://phabricator.wikimedia.org/T303049 (10BTullis) I'm unsure what else I need to do now to make this new service available.  I've successfully deployed the service to staging, eqiad and codfw using `he...
[15:45:19] <logmsgbot>	 !log cmjohnson@cumin1001 END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudstore1010.wikimedia.org with OS bullseye
[15:45:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:45:24] <wikibugs>	 (03PS1) 10Ottomata: Bounce keyholder-proxy when keyholder-auth.d group -> key mapping changes [puppet] - 10https://gerrit.wikimedia.org/r/779897
[15:45:25] <wikibugs>	 10SRE, 10ops-eqiad, 10DC-Ops, 10Infrastructure-Foundations, and 2 others: Q3:(Need By: TBD) rack/setup/install 2 new labstore hosts - https://phabricator.wikimedia.org/T302981 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cmjohnson@cumin1001 for host cloudstore1010.wikimedia.org wi...
[15:45:57] <wikibugs>	 (03PS1) 10Btullis: Update datahub to use version 0.8.32 [deployment-charts] - 10https://gerrit.wikimedia.org/r/779898 (https://phabricator.wikimedia.org/T306019)
[15:46:15] <wikibugs>	 (03PS1) 10Majavah: openstack: make enc-cli authenticate via keystone [puppet] - 10https://gerrit.wikimedia.org/r/779899 (https://phabricator.wikimedia.org/T274666)
[15:47:23] <wikibugs>	 (03CR) 10Ottomata: "I'll try to check that this works in a few weeks when I have to add another deployers group for platform eng." [puppet] - 10https://gerrit.wikimedia.org/r/779897 (owner: 10Ottomata)
[15:47:51] <logmsgbot>	 !log akosiaris@cumin1001 START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on mc2023.codfw.wmnet with reason: moving to a different rack
[15:47:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:47:53] <logmsgbot>	 !log akosiaris@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on mc2023.codfw.wmnet with reason: moving to a different rack
[15:47:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:47:58] <icinga-wm>	 PROBLEM - SSH on aqs1009.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[15:47:59] <wikibugs>	 10SRE, 10ops-codfw, 10Data-Persistence (Consultation): codfw: Dedicate Rack B1 for cloudX-dev servers - https://phabricator.wikimedia.org/T305469 (10ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=575a5fd0-668b-41f6-8ab3-5ff749f54ac7) set by akosiaris@cumin1001 for 2 days, 0:00:00 on 1 host(...
[15:48:01] <logmsgbot>	 !log akosiaris@cumin1001 START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on kubestage2002.codfw.wmnet with reason: moving to a different rack
[15:48:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:48:03] <logmsgbot>	 !log akosiaris@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on kubestage2002.codfw.wmnet with reason: moving to a different rack
[15:48:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:48:09] <wikibugs>	 10SRE, 10ops-codfw, 10Data-Persistence (Consultation): codfw: Dedicate Rack B1 for cloudX-dev servers - https://phabricator.wikimedia.org/T305469 (10ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=60f8ccbd-38ba-4b65-aadf-f44a7fc83c9e) set by akosiaris@cumin1001 for 2 days, 0:00:00 on 1 host(...
[15:49:15] <wikibugs>	 (03PS1) 10MVernon: swift: correct handling of non-ASCII paths in rewrite.py & test suite [puppet] - 10https://gerrit.wikimedia.org/r/779900 (https://phabricator.wikimedia.org/T305942)
[15:49:51] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] swift: correct handling of non-ASCII paths in rewrite.py & test suite [puppet] - 10https://gerrit.wikimedia.org/r/779900 (https://phabricator.wikimedia.org/T305942) (owner: 10MVernon)
[15:49:57] <wikibugs>	 10SRE, 10ops-codfw, 10Data-Persistence (Consultation): codfw: Dedicate Rack B1 for cloudX-dev servers - https://phabricator.wikimedia.org/T305469 (10akosiaris) @Papaul: mc2023 and kubestage2002 have been downtimed again (for 2 days) and I've just powered them off. They should be ready to be moved.
[15:50:39] <wikibugs>	 (03PS1) 10Zabe: webperf: migrate warm_up_coal_cache cron to systemd timer job [puppet] - 10https://gerrit.wikimedia.org/r/779901 (https://phabricator.wikimedia.org/T273673)
[15:50:41] <wikibugs>	 (03PS1) 10Zabe: webperf: remove absented warm_up_coal_cache cron [puppet] - 10https://gerrit.wikimedia.org/r/779902 (https://phabricator.wikimedia.org/T273673)
[15:51:03] <hoo>	 !log Ran extensions/Wikibase/client/maintenance/populateUnexpectedUnconnectedPagePageProp.php for all of enwiki (for 5M pages each).
[15:51:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:51:14] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] webperf: migrate warm_up_coal_cache cron to systemd timer job [puppet] - 10https://gerrit.wikimedia.org/r/779901 (https://phabricator.wikimedia.org/T273673) (owner: 10Zabe)
[15:51:35] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] webperf: remove absented warm_up_coal_cache cron [puppet] - 10https://gerrit.wikimedia.org/r/779902 (https://phabricator.wikimedia.org/T273673) (owner: 10Zabe)
[15:52:11] <wikibugs>	 (03PS2) 10Zabe: webperf: migrate warm_up_coal_cache cron to systemd timer job [puppet] - 10https://gerrit.wikimedia.org/r/779901 (https://phabricator.wikimedia.org/T273673)
[15:52:58] <hoo>	 !log Running extensions/Wikibase/client/maintenance/populateUnexpectedUnconnectedPagePageProp.php for all of wikidatawiki (for 5M pages each).
[15:53:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:54:35] <wikibugs>	 (03PS2) 10MVernon: swift: correct handling of non-ASCII paths in rewrite.py & test suite [puppet] - 10https://gerrit.wikimedia.org/r/779900 (https://phabricator.wikimedia.org/T305942)
[15:54:45] <jinxer-wm>	 (JobUnavailable) firing: Reduced availability for job calico-felix in k8s-staging@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[15:54:58] <jinxer-wm>	 (KubernetesCalicoDown) firing: (2) kubestage2001.codfw.wmnet:9091 is not running calico-node Pod - https://wikitech.wikimedia.org/wiki/Calico#Operations  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesCalicoDown
[15:55:15] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] swift: correct handling of non-ASCII paths in rewrite.py & test suite [puppet] - 10https://gerrit.wikimedia.org/r/779900 (https://phabricator.wikimedia.org/T305942) (owner: 10MVernon)
[15:57:25] <wikibugs>	 (03PS2) 10Zabe: webperf: remove absented warm_up_coal_cache cron [puppet] - 10https://gerrit.wikimedia.org/r/779902 (https://phabricator.wikimedia.org/T273673)
[15:57:26] <wikibugs>	 10SRE, 10ops-eqiad, 10DC-Ops, 10Infrastructure-Foundations, and 2 others: Q3:(Need By: TBD) rack/setup/install 2 new labstore hosts - https://phabricator.wikimedia.org/T302981 (10Cmjohnson) a:05Cmjohnson→03nskaggs These servers fail partman, it appears that the installer is looking for an answer that i...
[15:57:42] <wikibugs>	 (03PS3) 10MVernon: swift: correct handling of non-ASCII paths in rewrite.py & test suite [puppet] - 10https://gerrit.wikimedia.org/r/779900 (https://phabricator.wikimedia.org/T305942)
[15:59:23] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] swift: correct handling of non-ASCII paths in rewrite.py & test suite [puppet] - 10https://gerrit.wikimedia.org/r/779900 (https://phabricator.wikimedia.org/T305942) (owner: 10MVernon)
[15:59:58] <jinxer-wm>	 (KubernetesCalicoDown) firing: (2) kubestage2001.codfw.wmnet:9091 is not running calico-node Pod - https://wikitech.wikimedia.org/wiki/Calico#Operations  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesCalicoDown
[16:01:28] <hoo>	 !log Running extensions/Wikibase/client/maintenance/populateUnexpectedUnconnectedPagePageProp.php for all of frwiki (for 5M pages each).
[16:01:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:01:36] <wikibugs>	 (03PS4) 10MVernon: swift: correct handling of non-ASCII paths in rewrite.py & test suite [puppet] - 10https://gerrit.wikimedia.org/r/779900 (https://phabricator.wikimedia.org/T305942)
[16:02:45] <logmsgbot>	 !log dzahn@cumin2002 conftool action : set/pooled=yes; selector: dc=eqiad,name=mw1308.eqiad.wmnet
[16:02:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:02:52] <wikibugs>	 10SRE, 10ops-eqiad: mw1308 - internal IPMI error - mgmt / DRAC problem - https://phabricator.wikimedia.org/T305741 (10Dzahn) Thank you, Chris.   - server repooled
[16:04:28] <wikibugs>	 (03CR) 10Cathal Mooney: [C: 03+1] "LGTM!  Also good syntax examples and I learnt what typing stubs were, so thanks :)" [software/homer] - 10https://gerrit.wikimedia.org/r/779849 (owner: 10Volans)
[16:04:35] <hoo>	 !log Running extensions/Wikibase/client/maintenance/populateUnexpectedUnconnectedPagePageProp.php for all of jawiki (for 5M pages each).
[16:04:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:04:49] <wikibugs>	 (03PS1) 10Zabe: Revert "Start writing to cuc_actor in guwwiki and shnwikivoyage" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/779906 (https://phabricator.wikimedia.org/T306045)
[16:05:20] <wikibugs>	 (03CR) 10Btullis: "Adding Arzhel for the traffic perspective." [puppet] - 10https://gerrit.wikimedia.org/r/779840 (https://phabricator.wikimedia.org/T303049) (owner: 10Btullis)
[16:06:54] <hoo>	 !log Running extensions/Wikibase/client/maintenance/populateUnexpectedUnconnectedPagePageProp.php for all of ruwiki (for 5M pages each).
[16:06:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:06:56] <wikibugs>	 (03CR) 10RhinosF1: [C: 03+1] Revert "Start writing to cuc_actor in guwwiki and shnwikivoyage" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/779906 (https://phabricator.wikimedia.org/T306045) (owner: 10Zabe)
[16:07:05] <wikibugs>	 (03CR) 10Reedy: [C: 03+2] Revert "Start writing to cuc_actor in guwwiki and shnwikivoyage" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/779906 (https://phabricator.wikimedia.org/T306045) (owner: 10Zabe)
[16:07:41] <wikibugs>	 (03CR) 10Cathal Mooney: [C: 03+1] "LGTM!" [software/homer] - 10https://gerrit.wikimedia.org/r/779850 (owner: 10Volans)
[16:08:17] <wikibugs>	 (03Merged) 10jenkins-bot: Revert "Start writing to cuc_actor in guwwiki and shnwikivoyage" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/779906 (https://phabricator.wikimedia.org/T306045) (owner: 10Zabe)
[16:09:35] <RhinosF1>	 Thanks Reedy
[16:09:37] <wikibugs>	 (03CR) 10Cathal Mooney: [C: 03+2] Remove config/var for defining bespoke interfaces for IPv6 RAs [homer/public] - 10https://gerrit.wikimedia.org/r/779844 (https://phabricator.wikimedia.org/T299758) (owner: 10Cathal Mooney)
[16:11:02] <wikibugs>	 (03CR) 10Btullis: "Adding Arzhel for the traffic perspective." [dns] - 10https://gerrit.wikimedia.org/r/779839 (https://phabricator.wikimedia.org/T303049) (owner: 10Btullis)
[16:11:11] <wikibugs>	 (03Merged) 10jenkins-bot: Remove config/var for defining bespoke interfaces for IPv6 RAs [homer/public] - 10https://gerrit.wikimedia.org/r/779844 (https://phabricator.wikimedia.org/T299758) (owner: 10Cathal Mooney)
[16:12:37] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[16:12:38] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1164.eqiad.wmnet with reason: Maintenance
[16:12:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:12:40] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1164.eqiad.wmnet with reason: Maintenance
[16:12:41] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[16:12:42] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[16:12:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:12:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:12:45] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1164 (T298565)', diff saved to https://phabricator.wikimedia.org/P24604 and previous config saved to /var/cache/conftool/dbconfig/20220413-161245-ladsgroup.json
[16:12:46] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[16:12:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:12:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:12:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:12:52] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[16:12:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:13:18] <logmsgbot>	 !log reedy@deploy1002 Synchronized wmf-config/InitialiseSettings.php: T306045 (duration: 00m 55s)
[16:13:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:13:22] <stashbot>	 T306045: Column 'cuc_actor' cannot be null (localhost) when logging in with incorrect creds - https://phabricator.wikimedia.org/T306045
[16:13:36] <hoo>	 !log Running extensions/Wikibase/client/maintenance/populateUnexpectedUnconnectedPagePageProp.php for all of cebwiki (for 5M pages each).
[16:13:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:18:22] <hoo>	 !log Running extensions/Wikibase/client/maintenance/populateUnexpectedUnconnectedPagePageProp.php for all of viwiki (for 5M pages each).
[16:18:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:19:29] <wikibugs>	 (03CR) 10Eevans: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/779846 (owner: 10Hnowlan)
[16:20:48] <hoo>	 !log Running extensions/Wikibase/client/maintenance/populateUnexpectedUnconnectedPagePageProp.php for all of metawiki (for 5M pages each).
[16:20:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:21:36] <Lucas_WMDE>	 huh, I wouldn’t have thought that metawiki even has 5M pages
[16:22:23] <wikibugs>	 (03CR) 10RLazarus: [C: 03+1] mediawiki: call siteinfo in HTTPS [software/spicerack] - 10https://gerrit.wikimedia.org/r/779841 (owner: 10Volans)
[16:22:47] <wikibugs>	 10SRE, 10SRE-Access-Requests: Requesting access to analytics-privatedata-users for Jim Maddock - https://phabricator.wikimedia.org/T249873 (10jmads) 05Resolved→03Open re-opening this ticket to restore access to analytics-privatedata-users ldap group.
[16:24:17] <hoo>	 Lucas_WMDE: It has over 8M user talk pages :O
[16:26:35] <hoo>	 !log Running extensions/Wikibase/client/maintenance/populateUnexpectedUnconnectedPagePageProp.php for all of ruwikinews (for 5M pages each).
[16:26:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:29:40] <Lucas_WMDE>	 ah :D
[16:39:26] <icinga-wm>	 PROBLEM - Check systemd state on netbox1001 is CRITICAL: CRITICAL - degraded: The following units failed: check_netbox_uncommitted_dns_changes.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[16:39:48] <logmsgbot>	 !log razzi@cumin1001 START - Cookbook sre.hosts.reboot-single for host an-launcher1002.eqiad.wmnet
[16:39:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:39:50] <wikibugs>	 (03CR) 10Dave Pifke: [C: 03+1] "LGTM." [puppet] - 10https://gerrit.wikimedia.org/r/779901 (https://phabricator.wikimedia.org/T273673) (owner: 10Zabe)
[16:40:20] <razzi>	 !log reboot an-launcher1002 for security updates
[16:40:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:41:23] <wikibugs>	 (03CR) 10Volans: [C: 03+2] setup.py: add missing types for requests [software/homer] - 10https://gerrit.wikimedia.org/r/779849 (owner: 10Volans)
[16:41:38] <wikibugs>	 (03CR) 10Volans: [C: 03+2] capirca: catch also requests exceptions [software/homer] - 10https://gerrit.wikimedia.org/r/779850 (owner: 10Volans)
[16:41:42] <icinga-wm>	 RECOVERY - Check systemd state on netbox1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[16:41:47] <wikibugs>	 (03CR) 10Volans: [C: 03+2] mediawiki: call siteinfo in HTTPS [software/spicerack] - 10https://gerrit.wikimedia.org/r/779841 (owner: 10Volans)
[16:41:53] <Base>	 any idea why I might be getting a base@gerrit.wikimedia.org: Permission denied (publickey). when attempting to clone or pull? I do have my ssh key added to gerrit. Might be something on my side, since I recently had a system upgrade, but gitlab.com clone works fine.
[16:42:39] <Reedy>	 correct key loaded into the agent?
[16:42:44] <icinga-wm>	 PROBLEM - Uncommitted DNS changes in Netbox on netbox1001 is CRITICAL: An error occurred checking if Netbox has uncommitted DNS changes https://wikitech.wikimedia.org/wiki/Monitoring/Netbox_DNS_uncommitted_changes
[16:43:03] <Base>	 Reedy: well, I only have one
[16:44:22] <icinga-wm>	 RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[16:45:34] <wikibugs>	 10SRE, 10SRE-Access-Requests, 10Infrastructure-Foundations, 10serviceops, 10Release-Engineering-Team (Radar): Need a service account on deploy servers - https://phabricator.wikimedia.org/T303857 (10thcipriani)
[16:45:41] <wikibugs>	 (03Merged) 10jenkins-bot: setup.py: add missing types for requests [software/homer] - 10https://gerrit.wikimedia.org/r/779849 (owner: 10Volans)
[16:45:43] <wikibugs>	 (03Merged) 10jenkins-bot: capirca: catch also requests exceptions [software/homer] - 10https://gerrit.wikimedia.org/r/779850 (owner: 10Volans)
[16:48:03] <logmsgbot>	 !log razzi@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-launcher1002.eqiad.wmnet
[16:48:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:50:14] <wikibugs>	 (03Merged) 10jenkins-bot: mediawiki: call siteinfo in HTTPS [software/spicerack] - 10https://gerrit.wikimedia.org/r/779841 (owner: 10Volans)
[16:50:28] <hoo>	 !log Running extensions/Wikibase/client/maintenance/populateUnexpectedUnconnectedPagePageProp.php for all Wikidata clients of s2 (with --batch-size 250).
[16:50:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:59:45] <wikibugs>	 (03PS3) 10Hnowlan: changeprop: add sampling configuration, set num_workers [deployment-charts] - 10https://gerrit.wikimedia.org/r/767080 (https://phabricator.wikimedia.org/T300914)
[17:03:54] <hoo>	 !log Running extensions/Wikibase/client/maintenance/populateUnexpectedUnconnectedPagePageProp.php for all remaining Wikidata clients of s3 (with --batch-size 250).
[17:03:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:06:59] <hoo>	 !log Running extensions/Wikibase/client/maintenance/populateUnexpectedUnconnectedPagePageProp.php for all remaining Wikidata clients of s5 (with --batch-size 250).
[17:07:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:08:05] <wikibugs>	 (03CR) 10Dzahn: "thanks, i'll do it soon" [puppet] - 10https://gerrit.wikimedia.org/r/779901 (https://phabricator.wikimedia.org/T273673) (owner: 10Zabe)
[17:09:07] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1164 (T298565)', diff saved to https://phabricator.wikimedia.org/P24605 and previous config saved to /var/cache/conftool/dbconfig/20220413-170907-ladsgroup.json
[17:09:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:09:12] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[17:09:44] <logmsgbot>	 !log cmjohnson@cumin1001 START - Cookbook sre.dns.netbox
[17:09:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:10:07] <wikibugs>	 (03PS1) 10Btullis: Use both dbproxy101[89] servers for both wikireplica services [puppet] - 10https://gerrit.wikimedia.org/r/779915 (https://phabricator.wikimedia.org/T298940)
[17:10:48] <hoo>	 !log Running extensions/Wikibase/client/maintenance/populateUnexpectedUnconnectedPagePageProp.php for all remaining Wikidata clients of s7 (with --batch-size 250).
[17:10:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:12:35] <wikibugs>	 (03PS2) 10Btullis: Use both dbproxy101[89] servers for both wikireplica services [puppet] - 10https://gerrit.wikimedia.org/r/779915 (https://phabricator.wikimedia.org/T298940)
[17:12:41] <wikibugs>	 (03CR) 10Btullis: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/779915 (https://phabricator.wikimedia.org/T298940) (owner: 10Btullis)
[17:12:48] <wikibugs>	 (03CR) 10Volans: [C: 03+1] "The change looks sane, I don't have too much context to foresee all the possible implications, but I can't see anything wrong with it." [puppet] - 10https://gerrit.wikimedia.org/r/779900 (https://phabricator.wikimedia.org/T305942) (owner: 10MVernon)
[17:13:52] <logmsgbot>	 !log cmjohnson@cumin1001 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[17:13:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:15:57] <wikibugs>	 10SRE, 10LDAP-Access-Requests: Grant Access to nda for jmads - https://phabricator.wikimedia.org/T306117 (10jmads)
[17:16:41] <wikibugs>	 (03CR) 10Btullis: [V: 03+1] "PCC SUCCESS (NOOP 2 DIFF 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/34820/console" [puppet] - 10https://gerrit.wikimedia.org/r/779915 (https://phabricator.wikimedia.org/T298940) (owner: 10Btullis)
[17:24:12] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1164', diff saved to https://phabricator.wikimedia.org/P24606 and previous config saved to /var/cache/conftool/dbconfig/20220413-172412-ladsgroup.json
[17:24:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:28:02] <wikibugs>	 10SRE, 10MediaWiki-General, 10Performance-Team, 10serviceops-radar, and 5 others: Move MainStash out of Redis to a simpler multi-dc aware solution - https://phabricator.wikimedia.org/T212129 (10Krinkle)
[17:28:17] <logmsgbot>	 !log cmjohnson@cumin1001 START - Cookbook sre.dns.netbox
[17:28:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:30:54] <wikibugs>	 10SRE, 10ops-eqiad, 10DC-Ops: Q3:(Need By: TBD) rack/setup/install ganeti10[29|3(012)] - https://phabricator.wikimedia.org/T299459 (10Cmjohnson)
[17:35:51] <logmsgbot>	 !log cmjohnson@cumin1001 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[17:35:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:39:17] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1164', diff saved to https://phabricator.wikimedia.org/P24607 and previous config saved to /var/cache/conftool/dbconfig/20220413-173917-ladsgroup.json
[17:39:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:39:39] <logmsgbot>	 !log cmjohnson@cumin1001 START - Cookbook sre.dns.netbox
[17:39:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:40:46] <wikibugs>	 10SRE, 10ops-eqiad, 10decommission-hardware: decommission kubernetes100[1-4] - https://phabricator.wikimedia.org/T303044 (10Cmjohnson)
[17:42:05] <wikibugs>	 (03PS2) 10Raymond Ndibe: Create REST api service to manage toolforge replica.my.cnf [puppet] - 10https://gerrit.wikimedia.org/r/777037 (https://phabricator.wikimedia.org/T304040)
[17:43:34] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Create REST api service to manage toolforge replica.my.cnf [puppet] - 10https://gerrit.wikimedia.org/r/777037 (https://phabricator.wikimedia.org/T304040) (owner: 10Raymond Ndibe)
[17:44:51] <wikibugs>	 10SRE, 10ops-eqiad, 10decommission-hardware: decommission kubernetes100[1-4] - https://phabricator.wikimedia.org/T303044 (10Cmjohnson) 05Open→03Resolved
[17:44:58] <logmsgbot>	 !log cmjohnson@cumin1001 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[17:45:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:45:42] <wikibugs>	 (03CR) 10Raymond Ndibe: Create REST api service to manage toolforge replica.my.cnf (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/777037 (https://phabricator.wikimedia.org/T304040) (owner: 10Raymond Ndibe)
[17:46:45] <wikibugs>	 (03CR) 10Raymond Ndibe: "the test is failing because the test tool doesn't recognize certain type hints. Wondering if we should remove those?" [puppet] - 10https://gerrit.wikimedia.org/r/777037 (https://phabricator.wikimedia.org/T304040) (owner: 10Raymond Ndibe)
[17:48:43] <wikibugs>	 10ops-eqiad: Port with no description on access switch - https://phabricator.wikimedia.org/T304849 (10Cmjohnson) 05Open→03Resolved a:03Cmjohnson this is cloudstore1011, netbox is updated now
[17:50:40] <wikibugs>	 10SRE, 10ops-eqiad, 10DC-Ops, 10cloud-services-team (Hardware): Q2:(Need By: TBD) rack/setup/install cloudcephosd10[25-34] - https://phabricator.wikimedia.org/T294972 (10Cmjohnson) This is blocked until vlans for these switches are ready
[17:51:04] <icinga-wm>	 RECOVERY - Uncommitted DNS changes in Netbox on netbox1001 is OK: Netbox has zero uncommitted DNS changes https://wikitech.wikimedia.org/wiki/Monitoring/Netbox_DNS_uncommitted_changes
[17:51:44] <wikibugs>	 10ops-eqiad, 10DC-Ops, 10serviceops: Q4: (Need By: TBD) rack/setup/install mw14[57-98] - https://phabricator.wikimedia.org/T306121 (10RobH)
[17:51:58] <wikibugs>	 10ops-eqiad, 10DC-Ops, 10serviceops: Q4: (Need By: TBD) rack/setup/install mw14[57-98] - https://phabricator.wikimedia.org/T306121 (10RobH)
[17:52:34] <wikibugs>	 10ops-eqiad, 10DC-Ops, 10serviceops: Q4: (Need By: TBD) rack/setup/install mw14[57-98] - https://phabricator.wikimedia.org/T306121 (10RobH)
[17:54:22] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1164 (T298565)', diff saved to https://phabricator.wikimedia.org/P24608 and previous config saved to /var/cache/conftool/dbconfig/20220413-175422-ladsgroup.json
[17:54:24] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1169.eqiad.wmnet with reason: Maintenance
[17:54:25] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1169.eqiad.wmnet with reason: Maintenance
[17:54:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:54:27] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[17:54:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:54:30] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1169 (T298565)', diff saved to https://phabricator.wikimedia.org/P24609 and previous config saved to /var/cache/conftool/dbconfig/20220413-175430-ladsgroup.json
[17:54:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:54:31] <wikibugs>	 10SRE, 10SRE-Access-Requests: Requesting access to analytics-privatedata-users for Jim Maddock - https://phabricator.wikimedia.org/T249873 (10RhinosF1) a:05fgiunchedi→03None
[17:54:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:54:56] <wikibugs>	 10SRE, 10SRE-Access-Requests: Requesting access to analytics-privatedata-users for Jim Maddock - https://phabricator.wikimedia.org/T249873 (10RhinosF1) Moving back to SRE queue
[17:55:03] <wikibugs>	 10SRE, 10SRE-Access-Requests: Requesting access to analytics-privatedata-users for Jim Maddock - https://phabricator.wikimedia.org/T249873 (10RhinosF1) >>! In T249873#7852201, @jmads wrote: > re-opening this ticket to restore access to analytics-privatedata-users ldap group. Is everything above still the same?...
[17:57:11] <wikibugs>	 10SRE, 10LDAP-Access-Requests: Grant Access to nda for jmads - https://phabricator.wikimedia.org/T306117 (10RhinosF1) See also  T249873
[17:57:37] <wikibugs>	 (03PS1) 10Razzi: wikireplicas: depool clouddb1015-16 [puppet] - 10https://gerrit.wikimedia.org/r/779918 (https://phabricator.wikimedia.org/T299480)
[18:00:04] <jouncebot>	 dancy and jnuche: Time to snap out of that daydream and deploy Train log triage with CPT. Get on with it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220413T1800).
[18:00:04] <jouncebot>	 dancy and jnuche: I, the Bot under the Fountain, call upon thee, The Deployer, to do MediaWiki train - Utc-7+Utc-0 Version deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220413T1800).
[18:01:18] <wikibugs>	 (03CR) 10Razzi: [C: 03+2] wikireplicas: depool clouddb1015-16 [puppet] - 10https://gerrit.wikimedia.org/r/779918 (https://phabricator.wikimedia.org/T299480) (owner: 10Razzi)
[18:03:58] <dancy>	 o/
[18:06:52] <dancy>	 Train log triage will be tomorrow.
[18:06:57] <dancy>	 Rolling forward to group1.
[18:07:20] <wikibugs>	 (03PS1) 10Razzi: wikireplicas: fix depooling yaml [puppet] - 10https://gerrit.wikimedia.org/r/779919 (https://phabricator.wikimedia.org/T299480)
[18:07:31] <wikibugs>	 10SRE, 10SRE-Access-Requests: Requesting access to analytics-privatedata-users for Jim Maddock - https://phabricator.wikimedia.org/T249873 (10jmads) All info is still the same.  Thanks!
[18:09:54] <wikibugs>	 (03CR) 10Razzi: [V: 03+1] "PCC SUCCESS (NOOP 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/34822/console" [puppet] - 10https://gerrit.wikimedia.org/r/779919 (https://phabricator.wikimedia.org/T299480) (owner: 10Razzi)
[18:10:45] <wikibugs>	 (03CR) 10Razzi: [V: 03+1 C: 03+2] wikireplicas: fix depooling yaml [puppet] - 10https://gerrit.wikimedia.org/r/779919 (https://phabricator.wikimedia.org/T299480) (owner: 10Razzi)
[18:15:36] <wikibugs>	 (03PS1) 10Ahmon Dancy: group1 wikis to 1.39.0-wmf.7  refs T305213 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/779921
[18:15:38] <wikibugs>	 (03CR) 10Ahmon Dancy: [C: 03+2] group1 wikis to 1.39.0-wmf.7  refs T305213 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/779921 (owner: 10Ahmon Dancy)
[18:16:36] <wikibugs>	 (03Merged) 10jenkins-bot: group1 wikis to 1.39.0-wmf.7  refs T305213 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/779921 (owner: 10Ahmon Dancy)
[18:17:23] <logmsgbot>	 !log dancy@deploy1002 rebuilt and synchronized wikiversions files: group1 wikis to 1.39.0-wmf.7  refs T305213
[18:17:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:17:27] <stashbot>	 T305213: 1.39.0-wmf.7 deployment blockers - https://phabricator.wikimedia.org/T305213
[18:17:35] <dancy>	 I'm going to re-run that.
[18:19:09] <wikibugs>	 (03CR) 10Jdlrobson: Enable Table of Contents AB test on Beta (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/779551 (https://phabricator.wikimedia.org/T302046) (owner: 10Nray)
[18:19:10] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[18:19:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:19:14] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[18:19:15] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[18:19:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:19:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:19:18] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[18:19:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:20:47] <logmsgbot>	 !log dancy@deploy1002 rebuilt and synchronized wikiversions files: group1 wikis to 1.39.0-wmf.7  refs T305213
[18:20:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:21:28] <wikibugs>	 (03PS1) 10Zabe: Revert "Revert "Start writing to cuc_actor in guwwiki and shnwikivoyage"" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/779922 (https://phabricator.wikimedia.org/T233004)
[18:21:44] <logmsgbot>	 !log dancy@deploy1002 Synchronized php: group1 wikis to 1.39.0-wmf.7  refs T305213 (duration: 00m 56s)
[18:21:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:21:49] <wikibugs>	 (03CR) 10Zabe: [C: 04-1] "Needs https://gerrit.wikimedia.org/r/c/mediawiki/extensions/CheckUser/+/779912/ to safely be deployed on those wikis" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/779922 (https://phabricator.wikimedia.org/T233004) (owner: 10Zabe)
[18:24:26] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[18:24:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:24:29] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[18:24:30] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[18:24:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:24:34] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[18:24:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:24:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:24:47] <wikibugs>	 (03PS1) 10Nray: Correct AB test config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/779923 (https://phabricator.wikimedia.org/T302046)
[18:25:40] <icinga-wm>	 PROBLEM - mailman list info on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[18:26:00] <wikibugs>	 (03CR) 10Nray: Enable Table of Contents AB test on Beta (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/779551 (https://phabricator.wikimedia.org/T302046) (owner: 10Nray)
[18:26:02] <icinga-wm>	 PROBLEM - mailman archives on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[18:27:43] <wikibugs>	 (03CR) 10Raymond Ndibe: Create REST api service to manage toolforge replica.my.cnf (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/777037 (https://phabricator.wikimedia.org/T304040) (owner: 10Raymond Ndibe)
[18:27:46] <icinga-wm>	 RECOVERY - mailman list info on lists1001 is OK: HTTP OK: HTTP/1.1 200 OK - 8571 bytes in 0.325 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[18:28:08] <icinga-wm>	 RECOVERY - mailman archives on lists1001 is OK: HTTP OK: HTTP/1.1 200 OK - 47966 bytes in 0.118 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[18:29:08] <wikibugs>	 10SRE, 10ops-codfw, 10DC-Ops, 10cloud-services-team (Hardware): Q3:(Need By: TBD) rack/setup/install 7 wmcs hosts - https://phabricator.wikimedia.org/T304881 (10nskaggs) @Papaul By default for HA purposes, we include language to spread servers out when needed. However, given these machines are in dev, and...
[18:32:54] <jinxer-wm>	 (NodeTextfileStale) firing: (3) Stale textfile for elastic1075:9100 - https://wikitech.wikimedia.org/wiki/Prometheus#Stale_file_for_node-exporter_textfile - https://grafana.wikimedia.org/d/knkl4dCWz/node-exporter-textfile - https://alerts.wikimedia.org/?q=alertname%3DNodeTextfileStale
[18:33:58] <logmsgbot>	 !log razzi@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb1015.eqiad.wmnet with reason: Upgrade to bullseye
[18:34:01] <logmsgbot>	 !log razzi@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb1015.eqiad.wmnet with reason: Upgrade to bullseye
[18:34:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:34:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:36:40] <logmsgbot>	 !log razzi@cumin1001 START - Cookbook sre.hosts.reimage for host clouddb1015.eqiad.wmnet with OS bullseye
[18:36:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:39:40] <icinga-wm>	 PROBLEM - haproxy failover on dbproxy1018 is CRITICAL: CRITICAL check_failover servers up 14 down 2: https://wikitech.wikimedia.org/wiki/HAProxy
[18:40:33] <RhinosF1>	 razzi: is that expected ^
[18:42:33] <wikibugs>	 (03CR) 10Herron: [C: 03+1] mx: use $domain_data rather than $domain for aliases (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/779504 (https://phabricator.wikimedia.org/T305962) (owner: 10JHathaway)
[18:44:38] <wikibugs>	 10SRE-Access-Requests, 10Data-Engineering, 10Generated Data Platform: Request to grant cparle and mfossati login to an-airflow1003.eqiad.wmne - https://phabricator.wikimedia.org/T306057 (10Ottomata)
[18:46:07] <wikibugs>	 10SRE-Access-Requests, 10Data-Engineering, 10Generated Data Platform: Request to grant cparle and mfossati login to an-airflow1003.eqiad.wmne - https://phabricator.wikimedia.org/T306057 (10Ottomata) Tagging SRE-Access-Requests for help in figuring out how best to fulfill this request.  Cormac and Marco are o...
[18:47:21] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1169 (T298565)', diff saved to https://phabricator.wikimedia.org/P24610 and previous config saved to /var/cache/conftool/dbconfig/20220413-184721-ladsgroup.json
[18:47:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:47:26] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[18:48:06] <logmsgbot>	 !log razzi@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on clouddb1015.eqiad.wmnet with reason: host reimage
[18:48:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:51:01] <logmsgbot>	 !log razzi@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on clouddb1015.eqiad.wmnet with reason: host reimage
[18:51:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:52:02] <wikibugs>	 (03CR) 10JHathaway: [C: 03+2] mx: use $domain_data rather than $domain for aliases [puppet] - 10https://gerrit.wikimedia.org/r/779504 (https://phabricator.wikimedia.org/T305962) (owner: 10JHathaway)
[18:53:20] <icinga-wm>	 RECOVERY - SSH on aqs1009.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[18:55:08] <wikibugs>	 (03PS1) 10Razzi: dbproxy: add clouddb sections to conftool [puppet] - 10https://gerrit.wikimedia.org/r/779926 (https://phabricator.wikimedia.org/T304478)
[18:55:57] <wikibugs>	 (03CR) 10Ebernhardson: [C: 03+2] team-search-platform: remove BlazegraphJvmQuakeWarnGC [alerts] - 10https://gerrit.wikimedia.org/r/779831 (https://phabricator.wikimedia.org/T293862) (owner: 10DCausse)
[18:56:35] <wikibugs>	 (03CR) 10Razzi: [V: 03+1] "PCC SUCCESS (DIFF 2 NOOP 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/34823/console" [puppet] - 10https://gerrit.wikimedia.org/r/779926 (https://phabricator.wikimedia.org/T304478) (owner: 10Razzi)
[18:58:54] <wikibugs>	 10SRE, 10SRE-tools, 10Infrastructure-Foundations, 10tox-wikimedia, and 2 others: Introduce Python code formatters usage - https://phabricator.wikimedia.org/T211750 (10jhathaway) Are we ready to consider running black on our puppet repo?
[19:00:12] <wikibugs>	 (03Merged) 10jenkins-bot: team-search-platform: remove BlazegraphJvmQuakeWarnGC [alerts] - 10https://gerrit.wikimedia.org/r/779831 (https://phabricator.wikimedia.org/T293862) (owner: 10DCausse)
[19:00:29] <wikibugs>	 10SRE, 10Infrastructure-Foundations, 10Mail, 10Patch-For-Review: Exim emitting warnings about tainted filenames - https://phabricator.wikimedia.org/T305962 (10jhathaway) 05Open→03Resolved a:03jhathaway merged!
[19:01:39] <jinxer-wm>	 (NodeTextfileStale) firing: Stale textfile for ms-be2067:9100 - https://wikitech.wikimedia.org/wiki/Prometheus#Stale_file_for_node-exporter_textfile - https://grafana.wikimedia.org/d/knkl4dCWz/node-exporter-textfile - https://alerts.wikimedia.org/?q=alertname%3DNodeTextfileStale
[19:02:27] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P24611 and previous config saved to /var/cache/conftool/dbconfig/20220413-190226-ladsgroup.json
[19:02:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:04:15] <wikibugs>	 (03CR) 10Majavah: [C: 04-1] dbproxy: add clouddb sections to conftool (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/779926 (https://phabricator.wikimedia.org/T304478) (owner: 10Razzi)
[19:04:16] <icinga-wm>	 RECOVERY - haproxy failover on dbproxy1018 is OK: OK check_failover servers up 16 down 0: https://wikitech.wikimedia.org/wiki/HAProxy
[19:06:40] <wikibugs>	 (03PS1) 10Andrew Bogott: OpenStack nova: change log level to 'debug' [puppet] - 10https://gerrit.wikimedia.org/r/779927
[19:08:04] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 03+2] OpenStack nova: change log level to 'debug' [puppet] - 10https://gerrit.wikimedia.org/r/779927 (owner: 10Andrew Bogott)
[19:09:36] <icinga-wm>	 PROBLEM - SSH on wtp1045.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[19:15:13] <logmsgbot>	 !log razzi@cumin1001 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host clouddb1015.eqiad.wmnet with OS bullseye
[19:15:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:17:32] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P24612 and previous config saved to /var/cache/conftool/dbconfig/20220413-191731-ladsgroup.json
[19:17:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:24:05] <wikibugs>	 (03CR) 10Clare Ming: [C: 03+2] Correct AB test config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/779923 (https://phabricator.wikimedia.org/T302046) (owner: 10Nray)
[19:25:35] <wikibugs>	 (03CR) 10Clare Ming: "whoops - sorry - got trigger happy before realizing it was config -- happy to deploy at next window which is in 30 mins" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/779923 (https://phabricator.wikimedia.org/T302046) (owner: 10Nray)
[19:27:57] <wikibugs>	 (03CR) 10Nray: Correct AB test config (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/779923 (https://phabricator.wikimedia.org/T302046) (owner: 10Nray)
[19:32:37] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1169 (T298565)', diff saved to https://phabricator.wikimedia.org/P24613 and previous config saved to /var/cache/conftool/dbconfig/20220413-193236-ladsgroup.json
[19:32:39] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1106.eqiad.wmnet with reason: Maintenance
[19:32:40] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1106.eqiad.wmnet with reason: Maintenance
[19:32:42] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
[19:32:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:32:44] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[19:32:45] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
[19:32:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:32:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:32:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:32:50] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1106 (T298565)', diff saved to https://phabricator.wikimedia.org/P24614 and previous config saved to /var/cache/conftool/dbconfig/20220413-193250-ladsgroup.json
[19:32:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:32:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:37:02] <wikibugs>	 (03CR) 10Jdlrobson: [C: 03+1] Correct AB test config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/779923 (https://phabricator.wikimedia.org/T302046) (owner: 10Nray)
[19:39:28] <wikibugs>	 (03CR) 10Clare Ming: [C: 03+2] Correct AB test config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/779923 (https://phabricator.wikimedia.org/T302046) (owner: 10Nray)
[19:40:38] <wikibugs>	 (03Merged) 10jenkins-bot: Correct AB test config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/779923 (https://phabricator.wikimedia.org/T302046) (owner: 10Nray)
[19:42:50] <wikibugs>	 (03PS7) 10Krinkle: List Kartographer static map exemptions and document+flip default [mediawiki-config] - 10https://gerrit.wikimedia.org/r/773883 (https://phabricator.wikimedia.org/T291736)
[19:45:25] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[19:45:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:45:28] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[19:45:29] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[19:45:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:45:33] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[19:45:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:45:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:48:17] <wikibugs>	 (03PS3) 10Phedenskog: grafana: provision JSON datasource [puppet] - 10https://gerrit.wikimedia.org/r/774380 (https://phabricator.wikimedia.org/T304583)
[19:49:50] <wikibugs>	 10ops-eqiad: Port with no description on access switch - https://phabricator.wikimedia.org/T306129 (10phaultfinder)
[19:55:00] <jinxer-wm>	 (JobUnavailable) firing: Reduced availability for job calico-felix in k8s-staging@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[19:58:28] <wikibugs>	 (03PS1) 10Ssingh: dnsrecursor: refactor module (see detailed commit message) [puppet] - 10https://gerrit.wikimedia.org/r/779936
[20:00:04] <jouncebot>	 RoanKattouw, Urbanecm, and cjming: (Dis)respected human, time to deploy UTC late backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220413T2000). Please do the needful.
[20:00:04] <jouncebot>	 JSherman, koi, and nn1l2: A patch you scheduled for UTC late backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[20:00:09] <nn1l2>	 hi
[20:00:13] <jinxer-wm>	 (KubernetesCalicoDown) firing: kubestage2002.codfw.wmnet:9091 is not running calico-node Pod - https://wikitech.wikimedia.org/wiki/Calico#Operations  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesCalicoDown
[20:00:17] <koi>	 o/
[20:00:42] <urbanecm>	 hey
[20:00:44] <urbanecm>	 i can deploy today
[20:00:59] <cjming>	 ty!
[20:01:16] <JSherman>	 Hello, I'm here!
[20:01:51] <urbanecm>	 hello JSherman and cjming 
[20:02:18] <wikibugs>	 (03PS2) 10Urbanecm: Update enwiki surveys on beta for QA [mediawiki-config] - 10https://gerrit.wikimedia.org/r/779499 (https://phabricator.wikimedia.org/T294363) (owner: 10Jsn.sherman)
[20:02:23] <wikibugs>	 (03CR) 10Urbanecm: [C: 03+2] Update enwiki surveys on beta for QA [mediawiki-config] - 10https://gerrit.wikimedia.org/r/779499 (https://phabricator.wikimedia.org/T294363) (owner: 10Jsn.sherman)
[20:03:03] <urbanecm>	 JSherman: your patch should be auto-deployed within ~30 minutes
[20:03:24] <JSherman>	 Thanks! I'll keep an eye out, urbanecm.
[20:03:24] <wikibugs>	 (03Merged) 10jenkins-bot: Update enwiki surveys on beta for QA [mediawiki-config] - 10https://gerrit.wikimedia.org/r/779499 (https://phabricator.wikimedia.org/T294363) (owner: 10Jsn.sherman)
[20:03:31] <wikibugs>	 (03PS2) 10Urbanecm: Optimize logo for Wikispecies [mediawiki-config] - 10https://gerrit.wikimedia.org/r/779865 (https://phabricator.wikimedia.org/T306037) (owner: 10Stang)
[20:03:46] <wikibugs>	 (03CR) 10Urbanecm: [C: 03+2] Optimize logo for Wikispecies [mediawiki-config] - 10https://gerrit.wikimedia.org/r/779865 (https://phabricator.wikimedia.org/T306037) (owner: 10Stang)
[20:04:07] <urbanecm>	 koi: your patch is up next :). will let you know once it can be tested.
[20:04:22] <koi>	 got it, thanks
[20:04:45] <wikibugs>	 (03Merged) 10jenkins-bot: Optimize logo for Wikispecies [mediawiki-config] - 10https://gerrit.wikimedia.org/r/779865 (https://phabricator.wikimedia.org/T306037) (owner: 10Stang)
[20:05:15] <urbanecm>	 koi: your patch is at mwdebug1001
[20:05:18] <urbanecm>	 can you have a look?
[20:05:23] <koi>	 sure
[20:05:26] <wikibugs>	 (03PS2) 10Urbanecm: fawiki: Change logo for 900K milestone [mediawiki-config] - 10https://gerrit.wikimedia.org/r/779858 (https://phabricator.wikimedia.org/T306030) (owner: 104nn1l2)
[20:05:40] <cjming>	 urbanecm: out of curiosity, is there more to do beyond steps in deployment commands https://deploy-commands.toolforge.org/bacc/779865 for files/images? i.e. purge caches for said files?
[20:05:55] <koi>	 urbanecm, lgtm
[20:06:06] <urbanecm>	 cjming: yes. you need to run `purgeList.php` (accepts list of URIs at stdin)
[20:06:22] <urbanecm>	 note that the canonical domain for /static is en.wikipedia.org
[20:06:47] <cjming>	 cool - gtk
[20:06:58] <urbanecm>	 so you'd run sth like `echo 'https://en.wikipedia.org/static/images/project-logos/cswiki.png' | mwscript purgeList.php` for each static resource that was changed
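The purge step described above can be sketched as a small shell helper. This is only an illustration under stated assumptions: the function name `static_purge_urls` and the second example path are hypothetical, while the `en.wikipedia.org` canonical domain and the `mwscript purgeList.php` pipe are taken from the messages above. The helper only builds the URL list; the actual purge line is left as a comment because it has to run on a deployment host.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of any Wikimedia tooling): turn repo-relative
# static paths into purge URLs on the canonical /static domain mentioned above.
static_purge_urls() {
  local path
  for path in "$@"; do
    # Strip one leading slash so "static/..." and "/static/..." both work.
    printf 'https://en.wikipedia.org/%s\n' "${path#/}"
  done
}

# Intended use on a deployment host (not executed here), feeding the URLs to
# purgeList.php on stdin exactly as in the one-liner quoted in the chat:
#   static_purge_urls static/images/project-logos/cswiki.png | mwscript purgeList.php
```

Listing every changed static file in one call keeps the purge to a single invocation, and running it only after the sync has finished avoids the "purged a bit early" situation that comes up later in this log.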
[20:07:07] <urbanecm>	 koi: thanks, syncing
[20:08:15] <wikibugs>	 (03PS2) 10Ssingh: dnsrecursor: refactor module (see detailed commit message) [puppet] - 10https://gerrit.wikimedia.org/r/779936
[20:08:41] <wikibugs>	 (03CR) 10Urbanecm: [C: 03+2] fawiki: Change logo for 900K milestone [mediawiki-config] - 10https://gerrit.wikimedia.org/r/779858 (https://phabricator.wikimedia.org/T306030) (owner: 104nn1l2)
[20:09:13] <logmsgbot>	 !log urbanecm@deploy1002 Synchronized static/images/project-logos/: 076e6ef: Optimize logo for Wikispecies (T306037; 1/2) (duration: 00m 55s)
[20:09:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:09:18] <stashbot>	 T306037: Optimize logo for Wikispecies - https://phabricator.wikimedia.org/T306037
[20:09:54] <wikibugs>	 (03Merged) 10jenkins-bot: fawiki: Change logo for 900K milestone [mediawiki-config] - 10https://gerrit.wikimedia.org/r/779858 (https://phabricator.wikimedia.org/T306030) (owner: 104nn1l2)
[20:10:07] <logmsgbot>	 !log urbanecm@deploy1002 Synchronized wmf-config/logos.php: 076e6ef: Optimize logo for Wikispecies (T306037; 2/2) (duration: 00m 53s)
[20:10:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:10:40] <urbanecm>	 nn1l2: your patch is at mwdebug1001
[20:10:42] <urbanecm>	 can you have a look?
[20:10:45] <nn1l2>	 ok
[20:10:47] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[20:10:50] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[20:10:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:10:51] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[20:10:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:10:55] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[20:10:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:10:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:11:05] <wikibugs>	 (03PS3) 10Ssingh: dnsrecursor: refactor module (see detailed commit message) [puppet] - 10https://gerrit.wikimedia.org/r/779936
[20:11:11] <nn1l2>	 LGTM
[20:11:18] <urbanecm>	 syncing
[20:12:57] <logmsgbot>	 !log urbanecm@deploy1002 Synchronized static/images/mobile/copyright/wikipedia-fa-900K.svg: dfe0b9c: fawiki: Change logo for 900K milestone (T306030; 1/2) (duration: 00m 56s)
[20:13:00] <wikibugs>	 (03PS2) 10Razzi: dbproxy: add clouddb sections to conftool [puppet] - 10https://gerrit.wikimedia.org/r/779926 (https://phabricator.wikimedia.org/T304478)
[20:13:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:13:02] <stashbot>	 T306030: Change the logo of Farsi Wikipedia for 900K milestone - https://phabricator.wikimedia.org/T306030
[20:13:15] <wikibugs>	 10SRE, 10MediaWiki-REST-API, 10Traffic-Icebox: Route requests to the REST MediaWiki API to the api cluster - https://phabricator.wikimedia.org/T263729 (10BBlack)
[20:13:52] <logmsgbot>	 !log urbanecm@deploy1002 Synchronized wmf-config/InitialiseSettings.php: dfe0b9c: fawiki: Change logo for 900K milestone (T306030; 2/2) (duration: 00m 54s)
[20:13:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:13:56] <wikibugs>	 10SRE, 10Traffic, 10serviceops, 10Platform Team Workboards (Green): MW REST API should be routed to api_appserver MW cluster - https://phabricator.wikimedia.org/T268043 (10BBlack)
[20:14:20] <urbanecm>	 nn1l2: should be all done
[20:14:22] <urbanecm>	 anything else, anyone?
[20:14:42] <nn1l2>	 thanks
[20:15:08] <urbanecm>	 np
[20:15:10] <koi>	 urbanecm, the logo is still the previous version https://species.wikimedia.org/static/images/project-logos/specieswiki-2x.png
[20:15:22] <koi>	 is the syncing completed?
[20:15:27] <urbanecm>	 it should be
[20:15:29] <urbanecm>	 but let me double check
[20:15:57] <urbanecm>	 koi: i purged it again, and now it seems to work
[20:15:58] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[20:15:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:16:01] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[20:16:01] <urbanecm>	 perhaps i purged a bit early
[20:16:02] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[20:16:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:16:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:16:06] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[20:16:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:16:36] <koi>	 hmm, still not working in my place 0 0
[20:17:38] <logmsgbot>	 !log razzi@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb1016.eqiad.wmnet with reason: Upgrade to bullseye
[20:17:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:17:40] <logmsgbot>	 !log razzi@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb1016.eqiad.wmnet with reason: Upgrade to bullseye
[20:17:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:18:28] <urbanecm>	 koi: did you try to purge your client side cache?
[20:18:34] <urbanecm>	 (ctrl+shift+r should do the trick)
[20:18:52] <koi>	 yeah, I even tried another browser
[20:19:09] <urbanecm>	 koi: did you try accessing https://species.wikimedia.org/static/images/project-logos/specieswiki-2x.png directly?
[20:19:35] <koi>	 yes, it is still the old version
[20:20:08] <urbanecm>	 interesting...
[20:20:14] <urbanecm>	 koi: i suggest waiting ~48 hours
[20:20:29] <urbanecm>	 if it's still broken then, please let me know and we can investigate further
[20:21:13] <koi>	 urbanecm, got it, hope the logo will get changed soon
[20:21:18] <urbanecm>	 let's see :)
[20:21:30] <urbanecm>	 it does work on my end, so that indicates it's not a server-side problem
[20:21:36] <wikibugs>	 (CR) Razzi: [V: +1] dbproxy: add clouddb sections to conftool (2 comments) [puppet] - https://gerrit.wikimedia.org/r/779926 (https://phabricator.wikimedia.org/T304478) (owner: Razzi)
[20:23:27] <logmsgbot>	 !log razzi@cumin1001 START - Cookbook sre.hosts.reimage for host clouddb1016.eqiad.wmnet with OS bullseye
[20:23:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:26:47] <wikibugs>	 (CR) Ssingh: "PCC error on dns1001 results from a parameter mismatch:" [puppet] - https://gerrit.wikimedia.org/r/779936 (owner: Ssingh)
[20:26:53] <icinga-wm>	 PROBLEM - haproxy failover on dbproxy1018 is CRITICAL: CRITICAL check_failover servers up 14 down 2: https://wikitech.wikimedia.org/wiki/HAProxy
[20:30:31] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1106 (T298565)', diff saved to https://phabricator.wikimedia.org/P24615 and previous config saved to /var/cache/conftool/dbconfig/20220413-203030-ladsgroup.json
[20:30:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:30:38] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[20:31:23] <wikibugs>	 SRE-OnFire, Wikidata, wdwb-tech, Discovery-Search (Current work), and 3 others: Only generate maxlag from pooled query service servers. - https://phabricator.wikimedia.org/T238751 (Ladsgroup) Do we really need this now that everything is on flink and fancy?
[20:32:46] <wikibugs>	 (PS4) Ssingh: dnsrecursor: refactor module (see detailed commit message) [puppet] - https://gerrit.wikimedia.org/r/779936
[20:34:36] <wikibugs>	 SRE, SRE-Access-Requests, Infrastructure-Foundations, serviceops, Release-Engineering-Team (Radar): Need a service account on deploy servers - https://phabricator.wikimedia.org/T303857 (dancy)
[20:34:47] <logmsgbot>	 !log razzi@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on clouddb1016.eqiad.wmnet with reason: host reimage
[20:34:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:35:32] <wikibugs>	 SRE, SRE-Access-Requests, Infrastructure-Foundations, serviceops, Release-Engineering-Team (Radar): Need a service account on deploy servers for automated train pre-sync operations - https://phabricator.wikimedia.org/T303857 (dancy)
[20:36:17] <logmsgbot>	 !log razzi@cumin1001 END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 2:00:00 on clouddb1016.eqiad.wmnet with reason: host reimage
[20:36:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:41:02] <JSherman>	 urbanecm: It looks like the QuickSurveys extension is not enabled on enwiki beta. I couldn't get any of the configured surveys to load (even those that were there already), so I checked Special:Version and it's not there. I found where it's enabled on some wikis in InitialiseSettings.php with wmgUseQuickSurveys, but I couldn't find that set in
[20:41:03] <JSherman>	 InitialiseSettings-labs.php. I verified that it is working on eswiki, which has $wmgUseQuickSurveys set to true in InitialiseSettings.php. To enable this in enwiki beta (but not prod), would I add wmgUseQuickSurveys to InitialiseSettings-labs.php and just set it true for enwiki?
[20:41:37] <JSherman>	 verified it was working on *eswiki beta*
[20:44:40] <urbanecm>	 JSherman: yes. Just adding it to is-labs should do the trick. 
[20:44:53] <wikibugs>	 (PS1) Andrew Bogott: Revert "OpenStack nova: change log level to 'debug'" [puppet] - https://gerrit.wikimedia.org/r/779939
[20:45:36] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P24616 and previous config saved to /var/cache/conftool/dbconfig/20220413-204535-ladsgroup.json
[20:45:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:46:10] <wikibugs>	 (CR) Andrew Bogott: [C: +2] Revert "OpenStack nova: change log level to 'debug'" [puppet] - https://gerrit.wikimedia.org/r/779939 (owner: Andrew Bogott)
[20:46:43] <wikibugs>	 (PS5) Ssingh: dnsrecursor: refactor module (see detailed commit message) [puppet] - https://gerrit.wikimedia.org/r/779936
[20:49:10] <wikibugs>	 (PS1) Jsn.sherman: Enable QuickSurveys on enwiki beta [mediawiki-config] - https://gerrit.wikimedia.org/r/779940 (https://phabricator.wikimedia.org/T294363)
[20:51:24] <JSherman>	 urbanecm: mmk, I worked up a change for that; I just added wmgUseQuickSurveys to IS-labs with enwiki => as the only setting inside
[20:51:25] <JSherman>	 https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/779940
[20:51:49] <JSherman>	 *enwiki => true*
[20:52:18] <logmsgbot>	 !log razzi@cumin1001 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host clouddb1016.eqiad.wmnet with OS bullseye
[20:52:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:52:21] <wikibugs>	 SRE, Performance-Team, Traffic, serviceops: Potential navtiming_responseStart regression as of 13 Mar 2022 - https://phabricator.wikimedia.org/T303782 (Krinkle) Open→Resolved There seems to be an upward trend that is continuing having possibly added around ~25ms (5% of 500ms) on both the...
[20:53:03] <urbanecm>	 Great :)
[20:53:51] <wikibugs>	 (PS1) Razzi: wikireplicas: depool clouddb1017-1020 and repool 15 and 16 [puppet] - https://gerrit.wikimedia.org/r/779941 (https://phabricator.wikimedia.org/T304478)
[20:55:12] <wikibugs>	 (CR) Razzi: [V: +1] "PCC SUCCESS (DIFF 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/34831/console" [puppet] - https://gerrit.wikimedia.org/r/779941 (https://phabricator.wikimedia.org/T304478) (owner: Razzi)
[20:56:17] <JSherman>	 urbanecm: is it possible to deploy 779940 as well, or do I need to schedule for another day?
[20:56:27] <urbanecm>	 let's do it
[20:56:32] <urbanecm>	 (today)
[20:56:37] <JSherman>	 Ok!
[20:56:45] <wikibugs>	 (CR) Urbanecm: [C: +2] Enable QuickSurveys on enwiki beta [mediawiki-config] - https://gerrit.wikimedia.org/r/779940 (https://phabricator.wikimedia.org/T294363) (owner: Jsn.sherman)
[20:56:47] <urbanecm>	 let's see :)
[20:57:25] <icinga-wm>	 RECOVERY - haproxy failover on dbproxy1018 is OK: OK check_failover servers up 16 down 0: https://wikitech.wikimedia.org/wiki/HAProxy
[20:57:36] <wikibugs>	 (Merged) jenkins-bot: Enable QuickSurveys on enwiki beta [mediawiki-config] - https://gerrit.wikimedia.org/r/779940 (https://phabricator.wikimedia.org/T294363) (owner: Jsn.sherman)
[21:00:13] <wikibugs>	 (CR) Razzi: [V: +1 C: +2] wikireplicas: depool clouddb1017-1020 and repool 15 and 16 [puppet] - https://gerrit.wikimedia.org/r/779941 (https://phabricator.wikimedia.org/T304478) (owner: Razzi)
[21:00:41] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P24617 and previous config saved to /var/cache/conftool/dbconfig/20220413-210041-ladsgroup.json
[21:00:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:01:30] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[21:01:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:01:33] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[21:01:34] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[21:01:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:01:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:01:38] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[21:01:38] <wikibugs>	 (PS2) Razzi: wikireplicas: depool clouddb1017-1020 and repool 15 and 16 [puppet] - https://gerrit.wikimedia.org/r/779941 (https://phabricator.wikimedia.org/T304478)
[21:01:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:02:14] <wikibugs>	 (PS1) Krinkle: static: Remove `/static/current` symlink [mediawiki-config] - https://gerrit.wikimedia.org/r/779944 (https://phabricator.wikimedia.org/T302465)
[21:02:44] <wikibugs>	 (Abandoned) Razzi: wikireplicas: depool clouddb1017-1020 and repool 15 and 16 [puppet] - https://gerrit.wikimedia.org/r/779941 (https://phabricator.wikimedia.org/T304478) (owner: Razzi)
[21:02:56] <urbanecm>	 JSherman: sorry, got distracted. should be auto-deployed soon(ish) to beta
[21:02:57] <urbanecm>	 (as bfore :))
[21:03:15] <wikibugs>	 (CR) Krinkle: "Health checks were the last remaining reference, which has been removed/updated in Puppet with I3cd083bcadfa75da40." [mediawiki-config] - https://gerrit.wikimedia.org/r/779944 (https://phabricator.wikimedia.org/T302465) (owner: Krinkle)
[21:03:16] <urbanecm>	 *before
[21:03:25] <wikibugs>	 (PS1) Razzi: wikireplicas: depool clouddb1017-1020 and repool 15 and 16 [puppet] - https://gerrit.wikimedia.org/r/779945 (https://phabricator.wikimedia.org/T304478)
[21:03:48] <JSherman>	 urbanecm: No worries, thanks for the bonus deploy!
[21:03:53] <urbanecm>	 happy to help!
[21:04:59] <wikibugs>	 (CR) Razzi: [V: +1] "PCC SUCCESS (DIFF 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/34833/console" [puppet] - https://gerrit.wikimedia.org/r/779945 (https://phabricator.wikimedia.org/T304478) (owner: Razzi)
[21:06:27] <JSherman>	 urbanecm: I verified all 8 surveys are now up and running on enwiki beta; thank you 1 000 000!
[21:06:34] <urbanecm>	 great!
[21:08:29] <wikibugs>	 (CR) Razzi: [V: +1 C: +2] wikireplicas: depool clouddb1017-1020 and repool 15 and 16 [puppet] - https://gerrit.wikimedia.org/r/779945 (https://phabricator.wikimedia.org/T304478) (owner: Razzi)
[21:10:00] <wikibugs>	 (PS1) Ladsgroup: MigrateLinksTable: Avoid dynamic loading of list columns to select [core] (wmf/1.39.0-wmf.7) - https://gerrit.wikimedia.org/r/779877 (https://phabricator.wikimedia.org/T299424)
[21:10:07] <Amir1>	 jouncebot: nowandnext
[21:10:07] <jouncebot>	 No deployments scheduled for the next 8 hour(s) and 49 minute(s)
[21:10:07] <jouncebot>	 In 8 hour(s) and 49 minute(s): Primary database switchover (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220414T0600)
[21:10:13] <Amir1>	 nice
[21:10:25] <wikibugs>	 (CR) Ladsgroup: [C: +2] MigrateLinksTable: Avoid dynamic loading of list columns to select [core] (wmf/1.39.0-wmf.7) - https://gerrit.wikimedia.org/r/779877 (https://phabricator.wikimedia.org/T299424) (owner: Ladsgroup)
[21:15:04] <wikibugs>	 (PS1) Ladsgroup: admin: Fix Tran's real name [puppet] - https://gerrit.wikimedia.org/r/779947
[21:15:46] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1106 (T298565)', diff saved to https://phabricator.wikimedia.org/P24618 and previous config saved to /var/cache/conftool/dbconfig/20220413-211546-ladsgroup.json
[21:15:48] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1133.eqiad.wmnet with reason: Maintenance
[21:15:49] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1133.eqiad.wmnet with reason: Maintenance
[21:15:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:15:52] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[21:15:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:15:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:16:17] <wikibugs>	 (CR) Ladsgroup: [V: +2 C: +2] admin: Fix Tran's real name [puppet] - https://gerrit.wikimedia.org/r/779947 (owner: Ladsgroup)
[21:16:45] <logmsgbot>	 !log razzi@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb1017.eqiad.wmnet with reason: Upgrade to bullseye
[21:16:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:16:47] <logmsgbot>	 !log razzi@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb1017.eqiad.wmnet with reason: Upgrade to bullseye
[21:16:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:17:22] <wikibugs>	 SRE, ops-ulsfo, DC-Ops, Infrastructure-Foundations, netops: (Need By: TBD) rack/setup/install new mr1-ulsfo - https://phabricator.wikimedia.org/T294314 (RobH)
[21:18:19] <logmsgbot>	 !log razzi@cumin1001 START - Cookbook sre.hosts.reimage for host clouddb1017.eqiad.wmnet with OS bullseye
[21:18:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:21:29] <wikibugs>	 SRE, ops-ulsfo, DC-Ops, Infrastructure-Foundations, netops: (Need By: TBD) rack/setup/install new mr1-ulsfo - https://phabricator.wikimedia.org/T294314 (RobH) a:RobH→ayounsi Arzhel, When we set this up, I recall you saying you didn't want to move the connections in netbox, and wanted...
[21:22:09] <icinga-wm>	 PROBLEM - haproxy failover on dbproxy1019 is CRITICAL: CRITICAL check_failover servers up 14 down 2: https://wikitech.wikimedia.org/wiki/HAProxy
[21:29:07] <wikibugs>	 (Merged) jenkins-bot: MigrateLinksTable: Avoid dynamic loading of list columns to select [core] (wmf/1.39.0-wmf.7) - https://gerrit.wikimedia.org/r/779877 (https://phabricator.wikimedia.org/T299424) (owner: Ladsgroup)
[21:29:48] <logmsgbot>	 !log razzi@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on clouddb1017.eqiad.wmnet with reason: host reimage
[21:29:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:30:39] <logmsgbot>	 !log ladsgroup@deploy1002 Synchronized php-1.39.0-wmf.7/maintenance/migrateLinksTable.php: Backport: [[gerrit:779877|MigrateLinksTable: Avoid dynamic loading of list columns to select (T299424)]] (duration: 00m 55s)
[21:30:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:30:43] <stashbot>	 T299424: Run maintenance script backfilling tl_title_id - https://phabricator.wikimedia.org/T299424
[21:32:49] <logmsgbot>	 !log razzi@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on clouddb1017.eqiad.wmnet with reason: host reimage
[21:32:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:37:05] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[21:37:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:37:07] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[21:37:09] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[21:37:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:37:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:37:13] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[21:37:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:41:01] <logmsgbot>	 !log razzi@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb1018.eqiad.wmnet with reason: Upgrade to bullseye
[21:41:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:41:03] <logmsgbot>	 !log razzi@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb1018.eqiad.wmnet with reason: Upgrade to bullseye
[21:41:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:42:39] <logmsgbot>	 !log razzi@cumin1001 START - Cookbook sre.hosts.reimage for host clouddb1018.eqiad.wmnet with OS bullseye
[21:42:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:47:21] <logmsgbot>	 !log razzi@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb1019.eqiad.wmnet with reason: Upgrade to bullseye
[21:47:23] <logmsgbot>	 !log razzi@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb1019.eqiad.wmnet with reason: Upgrade to bullseye
[21:47:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:47:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:47:53] <wikibugs>	 SRE, SRE-tools, Infrastructure-Foundations, tox-wikimedia, and 2 others: Introduce Python code formatters usage - https://phabricator.wikimedia.org/T211750 (Volans) >>! In T211750#7853334, @jhathaway wrote: > Are we ready to consider running black on our puppet repo?  I'm not sure, personally I t...
[21:47:59] <logmsgbot>	 !log razzi@cumin1001 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host clouddb1017.eqiad.wmnet with OS bullseye
[21:48:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:48:19] <icinga-wm>	 PROBLEM - Check for VMs leaked by the nova-fullstack test on cloudcontrol1003 is CRITICAL: 10 instances in the admin-monitoring project https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Check_for_VMs_leaked_by_the_nova-fullstack_test
[21:48:54] <logmsgbot>	 !log razzi@cumin1001 START - Cookbook sre.hosts.reimage for host clouddb1019.eqiad.wmnet with OS bullseye
[21:48:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:50:39] <wikibugs>	 (CR) Ahmon Dancy: [C: +1] "LGTM" [mediawiki-config] - https://gerrit.wikimedia.org/r/779944 (https://phabricator.wikimedia.org/T302465) (owner: Krinkle)
[21:51:11] <logmsgbot>	 !log razzi@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb1020.eqiad.wmnet with reason: Upgrade to bullseye
[21:51:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:51:14] <logmsgbot>	 !log razzi@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb1020.eqiad.wmnet with reason: Upgrade to bullseye
[21:51:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:51:17] <wikibugs>	 SRE, ops-codfw, Data-Persistence (Consultation): codfw: Dedicate Rack B1 for cloudX-dev servers - https://phabricator.wikimedia.org/T305469 (Papaul) @akosiaris thanks, will move them tomorrow.
[21:53:55] <logmsgbot>	 !log razzi@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on clouddb1018.eqiad.wmnet with reason: host reimage
[21:53:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:54:12] <logmsgbot>	 !log razzi@cumin1001 START - Cookbook sre.hosts.reimage for host clouddb1020.eqiad.wmnet with OS bullseye
[21:54:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:57:22] <logmsgbot>	 !log razzi@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on clouddb1018.eqiad.wmnet with reason: host reimage
[21:57:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:57:54] <wikibugs>	 SRE-swift-storage, UploadWizard, Unstewarded-production-error, Wikimedia-production-error: "Could not store upload in the stash (UploadStashFileException)" for 2.4 GiB TIF file - https://phabricator.wikimedia.org/T285341 (Krinkle) Open→Resolved a:Krinkle Likely caused by <https://wik...
[21:58:27] <wikibugs>	 SRE, ops-codfw, Data-Persistence (Consultation): codfw: Dedicate Rack B1 for cloudX-dev servers - https://phabricator.wikimedia.org/T305469 (Papaul)
[21:59:30] <wikibugs>	 SRE-swift-storage: Test Commons doesn't show any images - https://phabricator.wikimedia.org/T306139 (Ladsgroup)
[21:59:48] <logmsgbot>	 !log razzi@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on clouddb1019.eqiad.wmnet with reason: host reimage
[21:59:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:01:48] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1132.eqiad.wmnet with reason: Maintenance
[22:01:49] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1132.eqiad.wmnet with reason: Maintenance
[22:01:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:01:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:03:13] <logmsgbot>	 !log razzi@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on clouddb1019.eqiad.wmnet with reason: host reimage
[22:03:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:04:20] <logmsgbot>	 !log razzi@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb1021.eqiad.wmnet with reason: Upgrade to bullseye
[22:04:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:04:22] <logmsgbot>	 !log razzi@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb1021.eqiad.wmnet with reason: Upgrade to bullseye
[22:04:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:05:18] <logmsgbot>	 !log razzi@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on clouddb1020.eqiad.wmnet with reason: host reimage
[22:05:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:06:31] <logmsgbot>	 !log razzi@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb1021.eqiad.wmnet with reason: Upgrade to bullseye
[22:06:33] <logmsgbot>	 !log razzi@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb1021.eqiad.wmnet with reason: Upgrade to bullseye
[22:06:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:06:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:06:42] <icinga-wm>	 RECOVERY - Check for VMs leaked by the nova-fullstack test on cloudcontrol1003 is OK: 1 instances in the admin-monitoring project https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Check_for_VMs_leaked_by_the_nova-fullstack_test
[22:07:00] <logmsgbot>	 !log razzi@cumin1001 START - Cookbook sre.hosts.reimage for host clouddb1021.eqiad.wmnet with OS bullseye
[22:07:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:08:44] <logmsgbot>	 !log razzi@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on clouddb1020.eqiad.wmnet with reason: host reimage
[22:08:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:11:19] <wikibugs>	 SRE, ops-codfw, Data-Persistence (Consultation): codfw: Dedicate Rack B1 for cloudX-dev servers - https://phabricator.wikimedia.org/T305469 (Papaul)
[22:11:57] <icinga-wm>	 RECOVERY - SSH on wtp1045.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[22:12:45] <logmsgbot>	 !log razzi@cumin1001 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host clouddb1018.eqiad.wmnet with OS bullseye
[22:12:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:14:45] <jinxer-wm>	 (JobUnavailable) firing: (2) Reduced availability for job calico-felix in k8s-staging@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[22:15:38] <logmsgbot>	 !log razzi@cumin1001 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host clouddb1019.eqiad.wmnet with OS bullseye
[22:15:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:19:45] <jinxer-wm>	 (JobUnavailable) firing: (2) Reduced availability for job calico-felix in k8s-staging@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[22:23:53] <icinga-wm>	 RECOVERY - haproxy failover on dbproxy1019 is OK: OK check_failover servers up 16 down 0: https://wikitech.wikimedia.org/wiki/HAProxy
[22:23:59] <wikibugs>	 SRE, ops-codfw, Data-Persistence (Consultation): codfw: Dedicate Rack B1 for cloudX-dev servers - https://phabricator.wikimedia.org/T305469 (Papaul) @Andrew @aborrero I have listed 14 servers that we will have to move into rack b1; 4 of those are not in row B and using Public IP. I think will be bette...
[22:24:10] <wikibugs>	 SRE, ops-codfw, Data-Persistence (Consultation): codfw: Dedicate Rack B1 for cloudX-dev servers - https://phabricator.wikimedia.org/T305469 (Papaul)
[22:25:33] <wikibugs>	 SRE, ops-codfw: codfw: Dedicate Rack B1 for cloudX-dev servers - https://phabricator.wikimedia.org/T305469 (Papaul)
[22:30:32] <logmsgbot>	 !log razzi@cumin1001 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host clouddb1020.eqiad.wmnet with OS bullseye
[22:30:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:31:10] <logmsgbot>	 !log razzi@cumin1001 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host clouddb1021.eqiad.wmnet with OS bullseye
[22:31:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:32:54] <jinxer-wm>	 (NodeTextfileStale) firing: (3) Stale textfile for elastic1075:9100 - https://wikitech.wikimedia.org/wiki/Prometheus#Stale_file_for_node-exporter_textfile - https://grafana.wikimedia.org/d/knkl4dCWz/node-exporter-textfile - https://alerts.wikimedia.org/?q=alertname%3DNodeTextfileStale
[22:47:42] <wikibugs>	 (CR) Krinkle: Add "db-mainstash" entry to $wgObjectCaches (2 comments) [mediawiki-config] - https://gerrit.wikimedia.org/r/752807 (https://phabricator.wikimedia.org/T212129) (owner: Aaron Schulz)
[22:56:05] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
[22:56:07] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
[22:56:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:56:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:56:12] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1105:3311 (T298565)', diff saved to https://phabricator.wikimedia.org/P24620 and previous config saved to /var/cache/conftool/dbconfig/20220413-225612-ladsgroup.json
[22:56:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:56:15] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[22:56:58] <wikibugs>	 SRE, Thumbor, serviceops, Patch-For-Review, and 2 others: Run latest Thumbor on Docker with Buster + Python 3 - https://phabricator.wikimedia.org/T267327 (Krinkle)
[23:01:54] <jinxer-wm>	 (NodeTextfileStale) firing: Stale textfile for ms-be2067:9100 - https://wikitech.wikimedia.org/wiki/Prometheus#Stale_file_for_node-exporter_textfile - https://grafana.wikimedia.org/d/knkl4dCWz/node-exporter-textfile - https://alerts.wikimedia.org/?q=alertname%3DNodeTextfileStale
[23:31:41] <icinga-wm>	 PROBLEM - MariaDB Replica IO: s2 on db2101 is CRITICAL: CRITICAL slave_io_state Slave_IO_Running: No, Errno: 2026, Errmsg: error reconnecting to master repl@db2104.codfw.wmnet:3306 - retry-time: 60 maximum-retries: 86400 message: SSL connection error00000000:lib(0):func(0):reason(0) https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[23:31:55] <icinga-wm>	 PROBLEM - MariaDB Replica IO: x1 on db2101 is CRITICAL: CRITICAL slave_io_state Slave_IO_Running: No, Errno: 2026, Errmsg: error reconnecting to master repl@db2096.codfw.wmnet:3306 - retry-time: 60 maximum-retries: 86400 message: SSL connection error00000000:lib(0):func(0):reason(0) https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[23:32:07] <icinga-wm>	 PROBLEM - MariaDB Replica IO: s5 on db2101 is CRITICAL: CRITICAL slave_io_state Slave_IO_Running: No, Errno: 2026, Errmsg: error reconnecting to master repl@db2123.codfw.wmnet:3306 - retry-time: 60 maximum-retries: 86400 message: SSL connection error00000000:lib(0):func(0):reason(0) https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[23:36:09] <icinga-wm>	 RECOVERY - MariaDB Replica IO: s2 on db2101 is OK: OK slave_io_state Slave_IO_Running: Yes https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[23:36:21] <icinga-wm>	 RECOVERY - MariaDB Replica IO: x1 on db2101 is OK: OK slave_io_state Slave_IO_Running: Yes https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[23:36:33] <icinga-wm>	 RECOVERY - MariaDB Replica IO: s5 on db2101 is OK: OK slave_io_state Slave_IO_Running: Yes https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[23:52:35] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 (T298565)', diff saved to https://phabricator.wikimedia.org/P24622 and previous config saved to /var/cache/conftool/dbconfig/20220413-235235-ladsgroup.json
[23:52:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:52:39] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565