[00:02:11] <icinga-wm>	 PROBLEM - SSH on ml-serve-ctrl1002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/SSH/monitoring
[00:03:32] <wikibugs>	 10SRE, 10SRE-Access-Requests: Requesting access to releaser for MarkAHershberger - https://phabricator.wikimedia.org/T302287 (10Dzahn) 05Stalled→03In progress
[00:04:13] <icinga-wm>	 RECOVERY - SSH on ml-serve-ctrl1002 is OK: SSH OK - OpenSSH_7.9p1 Debian-10+deb10u2 (protocol 2.0) https://wikitech.wikimedia.org/wiki/SSH/monitoring
[00:05:29] <icinga-wm>	 PROBLEM - Check systemd state on ml-serve-ctrl1002 is CRITICAL: CRITICAL - degraded: The following units failed: logrotate.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:09:28] <wikibugs>	 (03PS1) 10Dzahn: admin: reactivate account for Mark Hershberger, add to Mediawiki releasers [puppet] - 10https://gerrit.wikimedia.org/r/773660 (https://phabricator.wikimedia.org/T302287)
[00:10:33] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] admin: reactivate account for Mark Hershberger, add to Mediawiki releasers [puppet] - 10https://gerrit.wikimedia.org/r/773660 (https://phabricator.wikimedia.org/T302287) (owner: 10Dzahn)
[00:12:02] <wikibugs>	 (03CR) 10MarkAHershberger: [C: 03+1] admin: reactivate account for Mark Hershberger, add to Mediawiki releasers [puppet] - 10https://gerrit.wikimedia.org/r/773660 (https://phabricator.wikimedia.org/T302287) (owner: 10Dzahn)
[00:12:55] <wikibugs>	 (03PS2) 10Dzahn: admin: reactivate account for Mark Hershberger, add to Mediawiki releasers [puppet] - 10https://gerrit.wikimedia.org/r/773660 (https://phabricator.wikimedia.org/T302287)
[00:13:36] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] admin: reactivate account for Mark Hershberger, add to Mediawiki releasers [puppet] - 10https://gerrit.wikimedia.org/r/773660 (https://phabricator.wikimedia.org/T302287) (owner: 10Dzahn)
[00:14:03] <wikibugs>	 (03CR) 10Dzahn: "sorry, PS1 failed CI because of trailing whitespace. @MarkAHershberger now:)" [puppet] - 10https://gerrit.wikimedia.org/r/773660 (https://phabricator.wikimedia.org/T302287) (owner: 10Dzahn)
[00:15:28] <wikibugs>	 (03PS3) 10Dzahn: admin: reactivate account for Mark Hershberger, add to Mediawiki releasers [puppet] - 10https://gerrit.wikimedia.org/r/773660 (https://phabricator.wikimedia.org/T302287)
[00:16:36] <wikibugs>	 (03CR) 10Dzahn: "@MarkAHershberger I did not like your key because it was missing the prefix like "ssh-ed25519" or "ssh-rsa". I am guessing it is "ssh-ed25" [puppet] - 10https://gerrit.wikimedia.org/r/773660 (https://phabricator.wikimedia.org/T302287) (owner: 10Dzahn)
[00:16:50] <wikibugs>	 (03CR) 10Dzahn: "s/I/it (CI did not like the key)" [puppet] - 10https://gerrit.wikimedia.org/r/773660 (https://phabricator.wikimedia.org/T302287) (owner: 10Dzahn)
[00:30:33] <icinga-wm>	 PROBLEM - SSH on db2090.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[00:34:55] <icinga-wm>	 RECOVERY - SSH on wtp1026.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[00:36:59] <jinxer-wm>	 (KubernetesCalicoDown) firing: ml-serve-ctrl1002.eqiad.wmnet:9091 is not running calico-node Pod - https://wikitech.wikimedia.org/wiki/Calico#Operations  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesCalicoDown
[00:38:53] <wikibugs>	 10SRE, 10ops-codfw, 10DC-Ops, 10RESTBase: Q3:(Need By: TBD) rack/setup/install restbase2027 - https://phabricator.wikimedia.org/T301399 (10Papaul) getting the message below during install ` reuse-parts: Recipe device matching failed │                  │ ERROR: =dev=md0 matches zero devices       │...
[00:38:58] <jinxer-wm>	 (KubernetesRsyslogDown) firing: rsyslog on ml-serve1003:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[00:39:10] <logmsgbot>	 !log pt1979@cumin2002 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase2027.codfw.wmnet with OS buster
[00:39:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:39:15] <wikibugs>	 10SRE, 10ops-codfw, 10DC-Ops, 10RESTBase: Q3:(Need By: TBD) rack/setup/install restbase2027 - https://phabricator.wikimedia.org/T301399 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin2002 for host restbase2027.codfw.wmnet with OS buster executed with errors: - restbase20...
[00:45:07] <wikibugs>	 (03CR) 10Razzi: [C: 03+2] karapace: remove Type=notify [puppet] - 10https://gerrit.wikimedia.org/r/773387 (https://phabricator.wikimedia.org/T301565) (owner: 10Razzi)
[00:46:26] <icinga-wm>	 PROBLEM - SSH on ml-serve-ctrl1002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/SSH/monitoring
[00:59:19] <wikibugs>	 (03PS1) 10Andrew Bogott: Make cloudvirt1047 a virt node [puppet] - 10https://gerrit.wikimedia.org/r/773667 (https://phabricator.wikimedia.org/T293391)
[01:01:18] <wikibugs>	 (03PS2) 10Andrew Bogott: Make cloudvirt1047 a virt node [puppet] - 10https://gerrit.wikimedia.org/r/773667 (https://phabricator.wikimedia.org/T293391)
[01:02:16] <wikibugs>	 10SRE, 10MassMessage, 10WMF-JobQueue, 10Platform Team Workboards (Clinic Duty Team): Same MassMessage is being sent more than once - https://phabricator.wikimedia.org/T93049 (10Xeno_WMF) Hello, I believe I have encountered this bug here: https://meta.wikimedia.org/w/index.php?title=Special:Log&logid=471164...
[01:02:18] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 03+2] Make cloudvirt1047 a virt node [puppet] - 10https://gerrit.wikimedia.org/r/773667 (https://phabricator.wikimedia.org/T293391) (owner: 10Andrew Bogott)
[01:03:58] <jinxer-wm>	 (KubernetesRsyslogDown) firing: (2) rsyslog on ml-serve1001:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[01:08:58] <jinxer-wm>	 (KubernetesRsyslogDown) firing: (2) rsyslog on ml-serve1001:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[01:10:43] <icinga-wm>	 PROBLEM - Check systemd state on ml-serve-ctrl1002 is CRITICAL: CRITICAL - degraded: The following units failed: logrotate.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[01:11:39] <icinga-wm>	 RECOVERY - SSH on ml-serve-ctrl1002 is OK: SSH OK - OpenSSH_7.9p1 Debian-10+deb10u2 (protocol 2.0) https://wikitech.wikimedia.org/wiki/SSH/monitoring
[01:11:58] <jinxer-wm>	 (KubernetesCalicoDown) resolved: ml-serve-ctrl1002.eqiad.wmnet:9091 is not running calico-node Pod - https://wikitech.wikimedia.org/wiki/Calico#Operations  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesCalicoDown
[01:13:58] <jinxer-wm>	 (KubernetesRsyslogDown) resolved: (2) rsyslog on ml-serve1001:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[01:24:10] <wikibugs>	 (03PS1) 10Razzi: sre.wikireplicas.update-views: add more options [cookbooks] - 10https://gerrit.wikimedia.org/r/773670 (https://phabricator.wikimedia.org/T297026)
[01:24:17] <icinga-wm>	 PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: monitor_refine_event.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[01:27:38] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] sre.wikireplicas.update-views: add more options [cookbooks] - 10https://gerrit.wikimedia.org/r/773670 (https://phabricator.wikimedia.org/T297026) (owner: 10Razzi)
[01:30:47] <icinga-wm>	 RECOVERY - SSH on db2090.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[01:35:49] <icinga-wm>	 PROBLEM - SSH on aqs1008.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[01:38:45] <jinxer-wm>	 (JobUnavailable) firing: (2) Reduced availability for job sidekiq in codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[01:38:57] <wikibugs>	 10SRE, 10ops-eqiad, 10DC-Ops, 10cloud-services-team (Hardware): Q2:(Need By: TBD) rack/setup/install cloudvirt1047.eqiad.wmnet - https://phabricator.wikimedia.org/T293391 (10Andrew) I needed to enable virtualization in the bios but now this host is in service and seems fine.  thanks @papaul!
[01:43:45] <jinxer-wm>	 (JobUnavailable) firing: (2) Reduced availability for job sidekiq in codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[01:46:55] <wikibugs>	 (03PS2) 10Razzi: sre.wikireplicas.update-views: add more options [cookbooks] - 10https://gerrit.wikimedia.org/r/773670 (https://phabricator.wikimedia.org/T297026)
[01:50:09] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] sre.wikireplicas.update-views: add more options [cookbooks] - 10https://gerrit.wikimedia.org/r/773670 (https://phabricator.wikimedia.org/T297026) (owner: 10Razzi)
[02:54:48] <wikibugs>	 10SRE, 10ops-eqiad, 10Cloud-VPS, 10DC-Ops, and 2 others: cloudvirt1016.eqiad.wmnet and cloudvirt1017.eqiad.wmnet fail to PXE boot - https://phabricator.wikimedia.org/T303296 (10Andrew) 05Open→03Resolved
[02:54:53] <wikibugs>	 (03CR) 10NguoiDungKhongDinhDanh: "Since you claimed T303579, would you mind review this first patch of mine? Thanks a lot." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/773320 (https://phabricator.wikimedia.org/T303579) (owner: 10NguoiDungKhongDinhDanh)
[02:59:33] <wikibugs>	 (03CR) 10AntiCompositeNumber: "recheck" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/773320 (https://phabricator.wikimedia.org/T303579) (owner: 10NguoiDungKhongDinhDanh)
[03:01:18] <wikibugs>	 (03CR) 10AntiCompositeNumber: "Please write an informative commit message that explains what is being changed and why. https://www.mediawiki.org/wiki/Gerrit/Commit_messa" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/773320 (https://phabricator.wikimedia.org/T303579) (owner: 10NguoiDungKhongDinhDanh)
[03:28:57] <icinga-wm>	 PROBLEM - SSH on ml-serve-ctrl1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/SSH/monitoring
[03:29:58] <jinxer-wm>	 (KubernetesCalicoDown) firing: ml-serve-ctrl1001.eqiad.wmnet:9091 is not running calico-node Pod - https://wikitech.wikimedia.org/wiki/Calico#Operations  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesCalicoDown
[03:33:17] <icinga-wm>	 RECOVERY - SSH on ml-serve-ctrl1001 is OK: SSH OK - OpenSSH_7.9p1 Debian-10+deb10u2 (protocol 2.0) https://wikitech.wikimedia.org/wiki/SSH/monitoring
[03:33:45] <jinxer-wm>	 (JobUnavailable) firing: (2) Reduced availability for job k8s-api in eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[03:34:58] <jinxer-wm>	 (KubernetesCalicoDown) resolved: ml-serve-ctrl1001.eqiad.wmnet:9091 is not running calico-node Pod - https://wikitech.wikimedia.org/wiki/Calico#Operations  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesCalicoDown
[03:38:13] <icinga-wm>	 RECOVERY - SSH on aqs1008.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[03:57:37] <icinga-wm>	 PROBLEM - Host logstash2028.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[04:22:43] <icinga-wm>	 RECOVERY - Host logstash2028.mgmt is UP: PING OK - Packet loss = 0%, RTA = 0.06 ms
[05:30:37] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1134 for testing', diff saved to https://phabricator.wikimedia.org/P23053 and previous config saved to /var/cache/conftool/dbconfig/20220325-053037-marostegui.json
[05:30:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:30:56] <wikibugs>	 (03PS1) 10Marostegui: Revert "db2087: Disable notifications" [puppet] - 10https://gerrit.wikimedia.org/r/773693
[05:36:42] <wikibugs>	 (03CR) 10Marostegui: [C: 03+2] Revert "db2087: Disable notifications" [puppet] - 10https://gerrit.wikimedia.org/r/773693 (owner: 10Marostegui)
[05:38:32] <wikibugs>	 (03PS1) 10Marostegui: db1134: Disable notifications [puppet] - 10https://gerrit.wikimedia.org/r/773676 (https://phabricator.wikimedia.org/T304626)
[05:40:08] <wikibugs>	 (03CR) 10Marostegui: [C: 03+2] db1134: Disable notifications [puppet] - 10https://gerrit.wikimedia.org/r/773676 (https://phabricator.wikimedia.org/T304626) (owner: 10Marostegui)
[05:40:33] <wikibugs>	 (03CR) 10Marostegui: "That sounds good, I will merge this now though so we have it done here as well." [software] - 10https://gerrit.wikimedia.org/r/773440 (https://phabricator.wikimedia.org/T303605) (owner: 10Marostegui)
[05:40:35] <wikibugs>	 (03CR) 10Marostegui: [C: 03+2] switchover-tmpl.sh: Add "Affected wikis" field [software] - 10https://gerrit.wikimedia.org/r/773440 (https://phabricator.wikimedia.org/T303605) (owner: 10Marostegui)
[05:46:59] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 8:00:00 on db1127.eqiad.wmnet with reason: Maintenance
[05:47:01] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1127.eqiad.wmnet with reason: Maintenance
[05:47:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:47:06] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depooling db1127 (T302658)', diff saved to https://phabricator.wikimedia.org/P23054 and previous config saved to /var/cache/conftool/dbconfig/20220325-054705-marostegui.json
[05:47:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:47:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:47:11] <stashbot>	 T302658: globaluser table schema changes (March 2022) - https://phabricator.wikimedia.org/T302658
[05:52:01] <wikibugs>	 (03CR) 10Marostegui: Add fix_user_varbinaries_T298565.py (031 comment) [software/schema-changes] - 10https://gerrit.wikimedia.org/r/773655 (https://phabricator.wikimedia.org/T298565) (owner: 10Ladsgroup)
[05:55:21] <wikibugs>	 (03CR) 10Marostegui: [C: 03+2] filtered_tables.txt: remove gu_enabled and gu_enabled_method columns [puppet] - 10https://gerrit.wikimedia.org/r/773616 (https://phabricator.wikimedia.org/T303266) (owner: 10Zabe)
[06:07:24] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1112 for schema change', diff saved to https://phabricator.wikimedia.org/P23055 and previous config saved to /var/cache/conftool/dbconfig/20220325-060723-marostegui.json
[06:07:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:09:30] <icinga-wm>	 PROBLEM - LVS zotero eqiad port 4969/tcp - Zotero- zotero.svc.eqiad.wmnet IPv4 #page on zotero.svc.eqiad.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/LVS%23Diagnosing_problems
[06:11:32] <icinga-wm>	 RECOVERY - LVS zotero eqiad port 4969/tcp - Zotero- zotero.svc.eqiad.wmnet IPv4 #page on zotero.svc.eqiad.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 196 bytes in 1.012 second response time https://wikitech.wikimedia.org/wiki/LVS%23Diagnosing_problems
[06:18:06] <icinga-wm>	 PROBLEM - LVS zotero eqiad port 4969/tcp - Zotero- zotero.svc.eqiad.wmnet IPv4 #page on zotero.svc.eqiad.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/LVS%23Diagnosing_problems
[06:18:49] <Amir1>	 On phone but here
[06:19:00] <Amir1>	 Will soon get to laptop
[06:25:56] <_joe_>	 taking a look now
[06:28:01] <Amir1>	 Thanks
[06:28:31] <wikibugs>	 (03PS2) 10Ladsgroup: Add fix_user_varbinaries_T298565.py [software/schema-changes] - 10https://gerrit.wikimedia.org/r/773655 (https://phabricator.wikimedia.org/T298565)
[06:28:46] <wikibugs>	 (03CR) 10Ladsgroup: Add fix_user_varbinaries_T298565.py (031 comment) [software/schema-changes] - 10https://gerrit.wikimedia.org/r/773655 (https://phabricator.wikimedia.org/T298565) (owner: 10Ladsgroup)
[06:29:20] <wikibugs>	 (03CR) 10Marostegui: [C: 03+1] Add fix_user_varbinaries_T298565.py [software/schema-changes] - 10https://gerrit.wikimedia.org/r/773655 (https://phabricator.wikimedia.org/T298565) (owner: 10Ladsgroup)
[06:29:46] <wikibugs>	 (03CR) 10Ladsgroup: [C: 03+2] Add fix_user_varbinaries_T298565.py [software/schema-changes] - 10https://gerrit.wikimedia.org/r/773655 (https://phabricator.wikimedia.org/T298565) (owner: 10Ladsgroup)
[06:29:51] <marostegui>	 !log dbmaint s4@eqiad T300775
[06:29:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:29:57] <stashbot>	 T300775: Add tl_target_id column to templatelinks - https://phabricator.wikimedia.org/T300775
[06:30:01] <_joe_>	 Amir1: if you want to take a look too, basically we had a huge cpu spike
[06:30:10] <wikibugs>	 (03Merged) 10jenkins-bot: Add fix_user_varbinaries_T298565.py [software/schema-changes] - 10https://gerrit.wikimedia.org/r/773655 (https://phabricator.wikimedia.org/T298565) (owner: 10Ladsgroup)
[06:31:24] <_joe_>	 !log deleting a couple zotero pods with excessive number of restarts
[06:31:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:32:44] <_joe_>	 not sure why the zotero page hasn't recovered
[06:33:37] <Amir1>	 it might take a while? 
[06:33:56] <_joe_>	 no, it actually looks like it's just not working
[06:37:30] <_joe_>	 Amir1: basically what I did was
[06:37:43] <_joe_>	 look at pod details here https://grafana.wikimedia.org/d/-D2KNUEGk/kubernetes-pod-details?orgId=1&var-datasource=eqiad%20prometheus%2Fk8s&var-namespace=zotero&var-pod=All
[06:37:55] <_joe_>	 as root
[06:37:59] <_joe_>	 kube_env admin eqiad
[06:38:14] <_joe_>	 kubectl -n zotero delete pod <pod-name>
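The manual steps above (inspect the pods, then `kubectl -n zotero delete pod <pod-name>`) can be sketched as a small helper. This is an illustration only, not the tooling used in the channel: it assumes the pod list was obtained with `kubectl -n zotero get pods -o json`, the function names are hypothetical, and the threshold of 5 restarts is an arbitrary assumption, since the log only says "excessive number of restarts".

```python
def pods_over_restart_limit(pod_list, max_restarts=5):
    """Return names of pods whose containers restarted more than max_restarts times.

    pod_list is the parsed JSON from `kubectl get pods -o json`.
    max_restarts=5 is an assumed cutoff, not a value from the log.
    """
    names = []
    for pod in pod_list.get("items", []):
        # containerStatuses can be missing for pods that have not started yet.
        statuses = pod.get("status", {}).get("containerStatuses") or []
        restarts = sum(s.get("restartCount", 0) for s in statuses)
        if restarts > max_restarts:
            names.append(pod["metadata"]["name"])
    return names


def delete_commands(names, namespace="zotero"):
    """Build (but do not run) one `kubectl delete pod` command per pod name."""
    return [["kubectl", "-n", namespace, "delete", "pod", name] for name in names]
```

The helper only builds the commands, so an operator can review them before running anything. To feed it real data, parse the `kubectl` output with `json.load(sys.stdin)` and pass the result to `pods_over_restart_limit`.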
[06:39:53] <icinga-wm>	 PROBLEM - SSH on aqs1008.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[06:41:33] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
[06:41:34] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
[06:41:35] <icinga-wm>	 PROBLEM - Host cp1090.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[06:41:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:41:39] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1146:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23056 and previous config saved to /var/cache/conftool/dbconfig/20220325-064139-ladsgroup.json
[06:41:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:41:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:41:45] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[06:41:52] <icinga-wm>	 RECOVERY - LVS zotero eqiad port 4969/tcp - Zotero- zotero.svc.eqiad.wmnet IPv4 #page on zotero.svc.eqiad.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 196 bytes in 1.016 second response time https://wikitech.wikimedia.org/wiki/LVS%23Diagnosing_problems
[06:42:00] <_joe_>	 took your time icinga-wm 
[06:42:05] <Amir1>	 haha
[06:42:27] <Amir1>	 maybe it should kill them automatically? :D
[06:43:20] <_joe_>	 Amir1: let me ask you this - do you think kubernetes doesn't have such facilities? but zotero is resistant to any decent production practice
[06:43:33] <_joe_>	 (and even worse, newer versions of zotero don't have a server component)
[06:43:45] <Amir1>	 sigh
[06:44:01] <_joe_>	 basically you normally gather if a pod is able to serve traffic using a readiness probe
[06:44:09] <_joe_>	 which we can't have in zotero lol
[06:44:22] <_joe_>	 anyways
[06:44:38] <_joe_>	 ttyl
[06:44:40] <Amir1>	 Thanks for fixing this. I'm in a train with spotty connection 
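For readers unfamiliar with the readiness probe mentioned above: Kubernetes runs a check against each pod periodically and only routes traffic to pods that currently pass. The following is a simplified model of that bookkeeping, not Kubernetes code. The class name is hypothetical; the two thresholds mirror the `failureThreshold` and `successThreshold` probe fields with their documented defaults of 3 and 1. It does not apply to zotero, which per the discussion cannot have such a probe.

```python
from dataclasses import dataclass


@dataclass
class ReadinessTracker:
    """Toy model of how consecutive probe results flip a pod's ready state."""

    failure_threshold: int = 3   # consecutive failures before marking not ready
    success_threshold: int = 1   # consecutive successes before marking ready
    ready: bool = False          # pods start out not ready
    _successes: int = 0
    _failures: int = 0

    def observe(self, probe_ok: bool) -> bool:
        """Record one probe result and return the resulting ready state."""
        if probe_ok:
            self._successes += 1
            self._failures = 0
            if self._successes >= self.success_threshold:
                self.ready = True
        else:
            self._failures += 1
            self._successes = 0
            if self._failures >= self.failure_threshold:
                self.ready = False
        return self.ready
```

With the defaults, a single passing probe marks the pod ready, and it takes three failures in a row to pull it out of rotation again, so one slow response does not remove a pod from service.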
[07:00:04] <jouncebot>	 Deploy window No deploys all day! See Deployments/Emergencies if things are broken. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220325T0700)
[07:10:55] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23057 and previous config saved to /var/cache/conftool/dbconfig/20220325-071054-ladsgroup.json
[07:10:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:11:00] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[07:17:00] <wikibugs>	 10SRE, 10ops-eqiad, 10DC-Ops, 10Machine-Learning-Team: Q3:(Need By: TBD) rack/setup/install ml-cache100[1-3] - https://phabricator.wikimedia.org/T299435 (10elukey) 05Resolved→03Open Hi Chris! I noticed that we have two nodes on the same ROW, would it be possible to move one elsewhere? We are going to h...
[07:18:40] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1127 (T302658)', diff saved to https://phabricator.wikimedia.org/P23058 and previous config saved to /var/cache/conftool/dbconfig/20220325-071840-marostegui.json
[07:18:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:18:46] <stashbot>	 T302658: globaluser table schema changes (March 2022) - https://phabricator.wikimedia.org/T302658
[07:26:00] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P23059 and previous config saved to /var/cache/conftool/dbconfig/20220325-072559-ladsgroup.json
[07:26:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:29:25] <wikibugs>	 (03CR) 10Filippo Giunchedi: [V: 03+2 C: 03+2] "Yes I believe the end goal is dynamic text indeed" [debs/grafana-plugins] - 10https://gerrit.wikimedia.org/r/773456 (https://phabricator.wikimedia.org/T304585) (owner: 10Phedenskog)
[07:30:18] <icinga-wm>	 PROBLEM - LVS zotero eqiad port 4969/tcp - Zotero- zotero.svc.eqiad.wmnet IPv4 #page on zotero.svc.eqiad.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/LVS%23Diagnosing_problems
[07:30:21] <wikibugs>	 (03CR) 10Filippo Giunchedi: "Thanks for the review Cole! Peter, mind submitting a patch for https://phabricator.wikimedia.org/T304587 too so we can bundle (hah!) the p" [debs/grafana-plugins] - 10https://gerrit.wikimedia.org/r/773456 (https://phabricator.wikimedia.org/T304585) (owner: 10Phedenskog)
[07:31:19] <godog>	 hah, reoccurrence of zotero throwing its toys out of the pram
[07:31:23] <godog>	 ?
[07:33:28] <wikibugs>	 (03PS22) 10Elukey: Refactor Calico's CNI plugin config [puppet] - 10https://gerrit.wikimedia.org/r/772909 (https://phabricator.wikimedia.org/T297612)
[07:33:45] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P23060 and previous config saved to /var/cache/conftool/dbconfig/20220325-073345-marostegui.json
[07:33:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:34:00] <jinxer-wm>	 (JobUnavailable) firing: Reduced availability for job trafficserver in eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[07:35:43] <elukey>	 there are a couple of pods with throttled cpu
[07:36:42] <icinga-wm>	 RECOVERY - LVS zotero eqiad port 4969/tcp - Zotero- zotero.svc.eqiad.wmnet IPv4 #page on zotero.svc.eqiad.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 197 bytes in 1.016 second response time https://wikitech.wikimedia.org/wiki/LVS%23Diagnosing_problems
[07:40:57] <icinga-wm>	 RECOVERY - SSH on aqs1008.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[07:41:05] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P23061 and previous config saved to /var/cache/conftool/dbconfig/20220325-074105-ladsgroup.json
[07:41:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:48:50] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P23062 and previous config saved to /var/cache/conftool/dbconfig/20220325-074850-marostegui.json
[07:48:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:49:35] <wikibugs>	 (03PS1) 10Filippo Giunchedi: hieradata: change puppetdb-api probe to check for 200 status code [puppet] - 10https://gerrit.wikimedia.org/r/773740
[07:56:10] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23063 and previous config saved to /var/cache/conftool/dbconfig/20220325-075610-ladsgroup.json
[07:56:14] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
[07:56:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:56:15] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[07:56:15] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
[07:56:17] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 12:00:00 on 8 hosts with reason: Maintenance
[07:56:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:56:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:56:22] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 8 hosts with reason: Maintenance
[07:56:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:56:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:57:05] <icinga-wm>	 RECOVERY - Host ms-be1071 is UP: PING OK - Packet loss = 0%, RTA = 0.28 ms
[07:57:34] <wikibugs>	 (03PS23) 10Elukey: Refactor Calico's CNI plugin config [puppet] - 10https://gerrit.wikimedia.org/r/772909 (https://phabricator.wikimedia.org/T297612)
[08:02:21] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
[08:02:23] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
[08:02:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:02:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:03:55] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1127 (T302658)', diff saved to https://phabricator.wikimedia.org/P23064 and previous config saved to /var/cache/conftool/dbconfig/20220325-080355-marostegui.json
[08:03:57] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 8:00:00 on db1174.eqiad.wmnet with reason: Maintenance
[08:03:58] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1174.eqiad.wmnet with reason: Maintenance
[08:03:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:04:00] <stashbot>	 T302658: globaluser table schema changes (March 2022) - https://phabricator.wikimedia.org/T302658
[08:04:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:04:03] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depooling db1174 (T302658)', diff saved to https://phabricator.wikimedia.org/P23065 and previous config saved to /var/cache/conftool/dbconfig/20220325-080403-marostegui.json
[08:04:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:04:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:04:54] <wikibugs>	 (03CR) 10Giuseppe Lavagetto: [C: 03+1] "LGTM now." [puppet] - 10https://gerrit.wikimedia.org/r/724049 (https://phabricator.wikimedia.org/T205361) (owner: 10Majavah)
[08:05:48] <wikibugs>	 (03PS24) 10Elukey: Refactor Calico's CNI plugin config [puppet] - 10https://gerrit.wikimedia.org/r/772909 (https://phabricator.wikimedia.org/T297612)
[08:05:51] <_joe_>	 taavi: sorry for the delay, I needed to check the apache docs about QSA
[08:11:31] <wikibugs>	 (03CR) 10Elukey: [C: 04-1] "WIP" [puppet] - 10https://gerrit.wikimedia.org/r/772909 (https://phabricator.wikimedia.org/T297612) (owner: 10Elukey)
[08:15:06] <wikibugs>	 (03PS25) 10Elukey: Refactor Calico's CNI plugin config [puppet] - 10https://gerrit.wikimedia.org/r/772909 (https://phabricator.wikimedia.org/T297612)
[08:15:38] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Refactor Calico's CNI plugin config [puppet] - 10https://gerrit.wikimedia.org/r/772909 (https://phabricator.wikimedia.org/T297612) (owner: 10Elukey)
[08:16:20] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+2] Enable Ganeti 3 for ganeti-test* [puppet] - 10https://gerrit.wikimedia.org/r/773564 (owner: 10Muehlenhoff)
[08:24:40] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1129.eqiad.wmnet with reason: Maintenance
[08:24:41] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1129.eqiad.wmnet with reason: Maintenance
[08:24:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:24:46] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1129 (T298565)', diff saved to https://phabricator.wikimedia.org/P23066 and previous config saved to /var/cache/conftool/dbconfig/20220325-082446-ladsgroup.json
[08:24:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:24:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:24:53] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[08:26:39] <wikibugs>	 (03PS26) 10Elukey: Refactor Calico's CNI plugin config [puppet] - 10https://gerrit.wikimedia.org/r/772909 (https://phabricator.wikimedia.org/T297612)
[08:30:24] <wikibugs>	 10SRE, 10serviceops: Clean up old Docker images on deneb - https://phabricator.wikimedia.org/T287222 (10JMeybohm) There now is a prune timer, see T304644
[08:40:30] <wikibugs>	 (03CR) 10Elukey: [C: 04-1] "Had a chat with Joe, the cni define should represent only a config/kubeconfig, and not a list of them. I am going to rework this change to" [puppet] - 10https://gerrit.wikimedia.org/r/772909 (https://phabricator.wikimedia.org/T297612) (owner: 10Elukey)
[08:46:53] <icinga-wm>	 PROBLEM - Ensure hosts are not performing a change on every puppet run on cumin2002 is CRITICAL: CRITICAL: the following (7) node(s) change every puppet run: cloudcontrol1003, cloudcontrol1004, cp1085, deploy1002, deploy2002, ms-be1068, ms-be1071 https://wikitech.wikimedia.org/wiki/Puppet%23check_puppet_run_changes
[08:48:02] <wikibugs>	 (03PS27) 10Elukey: WIP - Refactor Calico's CNI plugin config [puppet] - 10https://gerrit.wikimedia.org/r/772909 (https://phabricator.wikimedia.org/T297612)
[08:48:35] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] WIP - Refactor Calico's CNI plugin config [puppet] - 10https://gerrit.wikimedia.org/r/772909 (https://phabricator.wikimedia.org/T297612) (owner: 10Elukey)
[08:49:32] <wikibugs>	 (03PS28) 10Elukey: WIP - Refactor Calico's CNI plugin config [puppet] - 10https://gerrit.wikimedia.org/r/772909 (https://phabricator.wikimedia.org/T297612)
[08:51:05] <wikibugs>	 (03PS29) 10Elukey: WIP - Refactor Calico's CNI plugin config [puppet] - 10https://gerrit.wikimedia.org/r/772909 (https://phabricator.wikimedia.org/T297612)
[08:51:40] <wikibugs>	 (03CR) 10Jcrespo: [C: 03+2] Add new command line utility to update existing metadata [software/mediabackups] - 10https://gerrit.wikimedia.org/r/773444 (https://phabricator.wikimedia.org/T299764) (owner: 10Jcrespo)
[08:52:15] <wikibugs>	 (03CR) 10JMeybohm: [C: 04-1] kubernetes: clean up extra netboot and host settings (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/773520 (https://phabricator.wikimedia.org/T300744) (owner: 10Elukey)
[08:53:49] <wikibugs>	 (03CR) 10JMeybohm: [C: 03+1] Initial debianization of istio-cni (032 comments) [debs/istio] - 10https://gerrit.wikimedia.org/r/771670 (https://phabricator.wikimedia.org/T297612) (owner: 10Elukey)
[08:53:51] <wikibugs>	 (03PS1) 10Majavah: wmcs: toolforge: k8s: show output of deploy.sh [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/773743
[08:54:46] <wikibugs>	 (03PS4) 10Elukey: kubernetes: clean up extra netboot and host settings [puppet] - 10https://gerrit.wikimedia.org/r/773520 (https://phabricator.wikimedia.org/T300744)
[08:55:09] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1174 (T302658)', diff saved to https://phabricator.wikimedia.org/P23067 and previous config saved to /var/cache/conftool/dbconfig/20220325-085508-marostegui.json
[08:55:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:55:14] <stashbot>	 T302658: globaluser table schema changes (March 2022) - https://phabricator.wikimedia.org/T302658
[08:55:20] <wikibugs>	 (03CR) 10Elukey: kubernetes: clean up extra netboot and host settings (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/773520 (https://phabricator.wikimedia.org/T300744) (owner: 10Elukey)
[08:56:42] <wikibugs>	 (03CR) 10JMeybohm: [C: 03+1] kubernetes: clean up extra netboot and host settings (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/773520 (https://phabricator.wikimedia.org/T300744) (owner: 10Elukey)
[08:58:27] <icinga-wm>	 PROBLEM - Ensure hosts are not performing a change on every puppet run on cumin1001 is CRITICAL: CRITICAL: the following (7) node(s) change every puppet run: cloudcontrol1003, cloudcontrol1004, cp1085, deploy1002, deploy2002, ms-be1068, ms-be1071 https://wikitech.wikimedia.org/wiki/Puppet%23check_puppet_run_changes
[08:58:54] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] wmcs: toolforge: k8s: show output of deploy.sh [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/773743 (owner: 10Majavah)
[09:01:07] <wikibugs>	 (03PS30) 10Elukey: WIP - Refactor Calico's CNI plugin config [puppet] - 10https://gerrit.wikimedia.org/r/772909 (https://phabricator.wikimedia.org/T297612)
[09:02:30] <wikibugs>	 (03CR) 10Elukey: [V: 03+1] "PCC SUCCESS (DIFF 5): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/34559/console" [puppet] - 10https://gerrit.wikimedia.org/r/772909 (https://phabricator.wikimedia.org/T297612) (owner: 10Elukey)
[09:06:39] <wikibugs>	 (03PS5) 10Elukey: kubernetes: clean up extra netboot and host settings [puppet] - 10https://gerrit.wikimedia.org/r/773520 (https://phabricator.wikimedia.org/T300744)
[09:10:14] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P23068 and previous config saved to /var/cache/conftool/dbconfig/20220325-091013-marostegui.json
[09:10:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:10:57] <wikibugs>	 (03CR) 10David Caro: [C: 03+2] "The error seems unrelated (happens in one of the sre cookbooks), will rebase the branch see if it fixes it, but this can go in." [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/773743 (owner: 10Majavah)
[09:19:55] <wikibugs>	 (03PS1) 10David Caro: discovery: remove unneeded protected-access supression [cookbooks] - 10https://gerrit.wikimedia.org/r/773744
[09:20:05] <wikibugs>	 (03CR) 10Elukey: "recheck" [puppet] - 10https://gerrit.wikimedia.org/r/773520 (https://phabricator.wikimedia.org/T300744) (owner: 10Elukey)
[09:23:26] <wikibugs>	 (03CR) 10David Caro: [C: 03+2] "See https://gerrit.wikimedia.org/r/c/operations/cookbooks/+/773744 for the test fix, will rebase on top of master once that is merged" [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/773743 (owner: 10Majavah)
[09:25:01] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1129 (T298565)', diff saved to https://phabricator.wikimedia.org/P23069 and previous config saved to /var/cache/conftool/dbconfig/20220325-092500-ladsgroup.json
[09:25:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:25:07] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[09:25:19] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P23070 and previous config saved to /var/cache/conftool/dbconfig/20220325-092518-marostegui.json
[09:25:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:27:54] <moritzm>	 !log updating libapache2-mod-auth-cas on moscovium/debmonitor1002
[09:27:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:28:38] <wikibugs>	 (03CR) 10JMeybohm: [C: 04-1] Add helm charts and a helmfile configuration for datahub (035 comments) [deployment-charts] - 10https://gerrit.wikimedia.org/r/764375 (https://phabricator.wikimedia.org/T301454) (owner: 10Btullis)
[09:31:41] <wikibugs>	 (03PS1) 10Jelto: gitlab_runner: add option to drop Docker capabilities [puppet] - 10https://gerrit.wikimedia.org/r/773746 (https://phabricator.wikimedia.org/T295481)
[09:32:12] <elukey>	 I am not getting any V: +2 from jenkins, checked https://integration.wikimedia.org/zuul/ but didn't see strange things
[09:32:37] <elukey>	 hashar (if you are around) --^ o/
[09:33:43] <wikibugs>	 (03PS1) 10Filippo Giunchedi: sre: add ProbeDown paging alert for enabled services [alerts] - 10https://gerrit.wikimedia.org/r/773747 (https://phabricator.wikimedia.org/T291946)
[09:40:06] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P23071 and previous config saved to /var/cache/conftool/dbconfig/20220325-094006-ladsgroup.json
[09:40:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:40:24] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1174 (T302658)', diff saved to https://phabricator.wikimedia.org/P23072 and previous config saved to /var/cache/conftool/dbconfig/20220325-094023-marostegui.json
[09:40:25] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 8:00:00 on db1181.eqiad.wmnet with reason: Maintenance
[09:40:27] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1181.eqiad.wmnet with reason: Maintenance
[09:40:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:40:28] <stashbot>	 T302658: globaluser table schema changes (March 2022) - https://phabricator.wikimedia.org/T302658
[09:40:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:40:32] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depooling db1181 (T302658)', diff saved to https://phabricator.wikimedia.org/P23073 and previous config saved to /var/cache/conftool/dbconfig/20220325-094031-marostegui.json
[09:40:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:40:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:43:40] <wikibugs>	 (03CR) 10Elukey: [C: 03+2] kubernetes: clean up extra netboot and host settings [puppet] - 10https://gerrit.wikimedia.org/r/773520 (https://phabricator.wikimedia.org/T300744) (owner: 10Elukey)
[09:44:10] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] wmcs: toolforge: k8s: show output of deploy.sh [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/773743 (owner: 10Majavah)
[09:46:48] <wikibugs>	 (03CR) 10David Caro: [C: 03+2] "Hahahah, of course it would not pass the gating tests xd, not sure what I was thinking" [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/773743 (owner: 10Majavah)
[09:47:19] <wikibugs>	 (03PS2) 10David Caro: wmcs: toolforge: k8s: show output of deploy.sh [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/773743 (owner: 10Majavah)
[09:47:34] <wikibugs>	 (03CR) 10Jelto: [V: 03+1] "PCC SUCCESS (DIFF 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/34560/console" [puppet] - 10https://gerrit.wikimedia.org/r/773746 (https://phabricator.wikimedia.org/T295481) (owner: 10Jelto)
[09:48:22] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 03+2] hieradata: change puppetdb-api probe to check for 200 status code [puppet] - 10https://gerrit.wikimedia.org/r/773740 (owner: 10Filippo Giunchedi)
[09:50:19] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] wmcs: toolforge: k8s: show output of deploy.sh [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/773743 (owner: 10Majavah)
[09:54:54] <elukey>	  /12
[09:54:56] <elukey>	 uff
[09:55:11] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P23074 and previous config saved to /var/cache/conftool/dbconfig/20220325-095511-ladsgroup.json
[09:55:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:04:39] <wikibugs>	 (03PS1) 10Majavah: toolforge: remove ingress-nginx manifests [puppet] - 10https://gerrit.wikimedia.org/r/773750
[10:05:58] <wikibugs>	 (03PS2) 10Elukey: decommission kubernetes[12]00[1-4] [puppet] - 10https://gerrit.wikimedia.org/r/771850 (https://phabricator.wikimedia.org/T303044) (owner: 10Alexandros Kosiaris)
[10:08:55] <wikibugs>	 (03PS1) 10Elukey: kubernetes: apply devicemapper settings to kubernetes[12]00[1-4] [puppet] - 10https://gerrit.wikimedia.org/r/773751 (https://phabricator.wikimedia.org/T300744)
[10:10:16] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1129 (T298565)', diff saved to https://phabricator.wikimedia.org/P23075 and previous config saved to /var/cache/conftool/dbconfig/20220325-101016-ladsgroup.json
[10:10:17] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
[10:10:19] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
[10:10:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:10:21] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[10:10:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:10:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:10:38] <wikibugs>	 (03PS2) 10Elukey: kubernetes: apply devicemapper settings to kubernetes[12]00[1-4] [puppet] - 10https://gerrit.wikimedia.org/r/773751 (https://phabricator.wikimedia.org/T300744)
[10:11:06] <logmsgbot>	 !log btullis@cumin1001 START - Cookbook sre.hosts.reboot-single for host stat1005.eqiad.wmnet
[10:11:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:11:33] <wikibugs>	 (03CR) 10Elukey: [C: 03+2] kubernetes: apply devicemapper settings to kubernetes[12]00[1-4] [puppet] - 10https://gerrit.wikimedia.org/r/773751 (https://phabricator.wikimedia.org/T300744) (owner: 10Elukey)
[10:12:26] <wikibugs>	 (03CR) 10JMeybohm: [C: 03+1] kubernetes: apply devicemapper settings to kubernetes[12]00[1-4] [puppet] - 10https://gerrit.wikimedia.org/r/773751 (https://phabricator.wikimedia.org/T300744) (owner: 10Elukey)
[10:17:02] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1181 (T302658)', diff saved to https://phabricator.wikimedia.org/P23076 and previous config saved to /var/cache/conftool/dbconfig/20220325-101701-marostegui.json
[10:17:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:17:08] <stashbot>	 T302658: globaluser table schema changes (March 2022) - https://phabricator.wikimedia.org/T302658
[10:18:23] <logmsgbot>	 !log btullis@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host stat1005.eqiad.wmnet
[10:18:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:21:39] <wikibugs>	 (03CR) 10Arturo Borrero Gonzalez: "Given there could be some interim confusion, would you please leave links to the gitlab repo everywhere?" [puppet] - 10https://gerrit.wikimedia.org/r/773750 (owner: 10Majavah)
[10:22:44] <logmsgbot>	 !log btullis@cumin1001 START - Cookbook sre.hosts.reboot-single for host stat1008.eqiad.wmnet
[10:22:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:28:38] <wikibugs>	 10SRE, 10Data-Engineering, 10Data-Engineering-Kanban: Create conda .deb and docker image - https://phabricator.wikimedia.org/T304450 (10MoritzMuehlenhoff) >>! In T304450#7797902, @Ottomata wrote: > @MoritzMuehlenhoff advice?  Can I import [[ https://docs.conda.io/projects/conda/en/latest/user-guide/install/r...
[10:28:55] <icinga-wm>	 RECOVERY - Check systemd state on stat1008 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[10:32:07] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P23077 and previous config saved to /var/cache/conftool/dbconfig/20220325-103207-marostegui.json
[10:32:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:32:59] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
[10:33:01] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
[10:33:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:33:02] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
[10:33:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:33:05] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
[10:33:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:33:10] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1156 (T298565)', diff saved to https://phabricator.wikimedia.org/P23078 and previous config saved to /var/cache/conftool/dbconfig/20220325-103310-ladsgroup.json
[10:33:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:33:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:33:16] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[10:33:27] <logmsgbot>	 !log btullis@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host stat1008.eqiad.wmnet
[10:33:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:40:43] <wikibugs>	 (03CR) 10Arturo Borrero Gonzalez: [C: 03+1] "This conflicts with https://gerrit.wikimedia.org/r/c/operations/cookbooks/+/773509" [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/773743 (owner: 10Majavah)
[10:45:53] <wikibugs>	 (03PS39) 10Btullis: Add helm charts and a helmfile configuration for datahub [deployment-charts] - 10https://gerrit.wikimedia.org/r/764375 (https://phabricator.wikimedia.org/T301454)
[10:46:14] <wikibugs>	 (03CR) 10Btullis: Add helm charts and a helmfile configuration for datahub (035 comments) [deployment-charts] - 10https://gerrit.wikimedia.org/r/764375 (https://phabricator.wikimedia.org/T301454) (owner: 10Btullis)
[10:46:52] <wikibugs>	 (03CR) 10Giuseppe Lavagetto: Introduce requestctl (034 comments) [software/conftool] - 10https://gerrit.wikimedia.org/r/772342 (https://phabricator.wikimedia.org/T302471) (owner: 10Giuseppe Lavagetto)
[10:47:12] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P23079 and previous config saved to /var/cache/conftool/dbconfig/20220325-104712-marostegui.json
[10:47:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:50:08] <wikibugs>	 (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] keepalived: use version from bullseye-bpo [puppet] - 10https://gerrit.wikimedia.org/r/773585 (https://phabricator.wikimedia.org/T304598) (owner: 10Arturo Borrero Gonzalez)
[10:50:24] <wikibugs>	 (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] cloudgw: don't install kernel or nft from backports [puppet] - 10https://gerrit.wikimedia.org/r/773586 (https://phabricator.wikimedia.org/T304598) (owner: 10Arturo Borrero Gonzalez)
[10:53:05] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase2020 is CRITICAL: /en.wikipedia.org/v1/feed/announcements (Retrieve announcements) is CRITICAL: Test Retrieve announcements returned the unexpected status 503 (expecting: 200) https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[10:53:07] <wikibugs>	 (03CR) 10Jbond: [C: 03+1] "lgtm" [software/conftool] - 10https://gerrit.wikimedia.org/r/772342 (https://phabricator.wikimedia.org/T302471) (owner: 10Giuseppe Lavagetto)
[10:53:48] <wikibugs>	 (03CR) 10Jbond: "thanks 😊" [puppet] - 10https://gerrit.wikimedia.org/r/773740 (owner: 10Filippo Giunchedi)
[10:55:15] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase2020 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[11:02:17] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1181 (T302658)', diff saved to https://phabricator.wikimedia.org/P23080 and previous config saved to /var/cache/conftool/dbconfig/20220325-110217-marostegui.json
[11:02:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:02:22] <stashbot>	 T302658: globaluser table schema changes (March 2022) - https://phabricator.wikimedia.org/T302658
[11:05:59] <wikibugs>	 (03CR) 10Jbond: [C: 04-2] "this job is also responsible for downloading the following datasets which AFAIK are very much in use" [puppet] - 10https://gerrit.wikimedia.org/r/773648 (https://phabricator.wikimedia.org/T303464) (owner: 10Dzahn)
[11:07:26] <wikibugs>	 (03CR) 10Elukey: [V: 03+2 C: 03+2] Initial debianization of istio-cni [debs/istio] - 10https://gerrit.wikimedia.org/r/771670 (https://phabricator.wikimedia.org/T297612) (owner: 10Elukey)
[11:08:11] <wikibugs>	 (03CR) 10Jbond: [C: 03+1] "lgtm thx" [puppet] - 10https://gerrit.wikimedia.org/r/773660 (https://phabricator.wikimedia.org/T302287) (owner: 10Dzahn)
[11:08:51] <wikibugs>	 (03PS1) 10DCausse: team-search-platform: add jvmquake alerting [alerts] - 10https://gerrit.wikimedia.org/r/773758 (https://phabricator.wikimedia.org/T293862)
[11:12:53] <icinga-wm>	 RECOVERY - Check systemd state on deneb is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[11:21:46] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1156 (T298565)', diff saved to https://phabricator.wikimedia.org/P23081 and previous config saved to /var/cache/conftool/dbconfig/20220325-112145-ladsgroup.json
[11:21:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:21:52] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[11:24:01] <logmsgbot>	 !log btullis@cumin1001 START - Cookbook sre.hadoop.roll-restart-workers restart workers for Hadoop analytics cluster: Roll restart of jvm daemons for openjdk upgrade.
[11:24:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:27:17] <wikibugs>	 (03Abandoned) 10Arturo Borrero Gonzalez: sonofgridengine: grid-configurator: support shorter bastion prefix [puppet] - 10https://gerrit.wikimedia.org/r/749744 (owner: 10Arturo Borrero Gonzalez)
[11:28:03] <wikibugs>	 (03PS7) 10Giuseppe Lavagetto: Introduce requestctl [software/conftool] - 10https://gerrit.wikimedia.org/r/772342 (https://phabricator.wikimedia.org/T302471)
[11:28:05] <wikibugs>	 (03PS1) 10Giuseppe Lavagetto: Add debian packaging for requestctl [software/conftool] - 10https://gerrit.wikimedia.org/r/773760
[11:29:54] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Introduce requestctl [software/conftool] - 10https://gerrit.wikimedia.org/r/772342 (https://phabricator.wikimedia.org/T302471) (owner: 10Giuseppe Lavagetto)
[11:30:03] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Add debian packaging for requestctl [software/conftool] - 10https://gerrit.wikimedia.org/r/773760 (owner: 10Giuseppe Lavagetto)
[11:30:35] <icinga-wm>	 PROBLEM - SSH on thumbor2004.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[11:32:29] <wikibugs>	 10SRE, 10Infrastructure-Foundations: Integrate Bullseye 11.3 point update - https://phabricator.wikimedia.org/T304599 (10jbond) p:05Triage→03Medium
[11:32:33] <wikibugs>	 (03PS21) 10MVernon: swift: deploy swift_ring_manager to one node per cluster [puppet] - 10https://gerrit.wikimedia.org/r/769941 (https://phabricator.wikimedia.org/T265117)
[11:33:07] <wikibugs>	 (03Abandoned) 10Arturo Borrero Gonzalez: UNTESTED: openstack: neutron: refresh API policy to allow port management [puppet] - 10https://gerrit.wikimedia.org/r/606991 (https://phabricator.wikimedia.org/T255670) (owner: 10Arturo Borrero Gonzalez)
[11:33:34] <wikibugs>	 (03Abandoned) 10Arturo Borrero Gonzalez: nftables: introduce nft-check exec [puppet] - 10https://gerrit.wikimedia.org/r/651453 (owner: 10Arturo Borrero Gonzalez)
[11:34:00] <jinxer-wm>	 (JobUnavailable) firing: Reduced availability for job trafficserver in eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[11:35:47] <wikibugs>	 10SRE, 10Data-Engineering-Radar, 10Traffic: Lock-in Varnish and VarnishKafka versions - https://phabricator.wikimedia.org/T304617 (10jbond) p:05Triage→03Medium
[11:35:54] <wikibugs>	 (03PS11) 10MVernon: puppetmaster: rsync swift rings from each cluster's ring manager [puppet] - 10https://gerrit.wikimedia.org/r/769942 (https://phabricator.wikimedia.org/T265117)
[11:36:51] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P23082 and previous config saved to /var/cache/conftool/dbconfig/20220325-113651-ladsgroup.json
[11:36:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:49:53] <wikibugs>	 (03PS22) 10MVernon: swift: deploy swift_ring_manager to one node per cluster [puppet] - 10https://gerrit.wikimedia.org/r/769941 (https://phabricator.wikimedia.org/T265117)
[11:51:56] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P23083 and previous config saved to /var/cache/conftool/dbconfig/20220325-115156-ladsgroup.json
[11:51:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:52:46] <wikibugs>	 (03CR) 10MVernon: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/769942 (https://phabricator.wikimedia.org/T265117) (owner: 10MVernon)
[11:53:46] <wikibugs>	 (03CR) 10MVernon: swift: deploy swift_ring_manager to one node per cluster (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/769941 (https://phabricator.wikimedia.org/T265117) (owner: 10MVernon)
[11:53:55] <wikibugs>	 (03CR) 10MVernon: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/769941 (https://phabricator.wikimedia.org/T265117) (owner: 10MVernon)
[12:07:01] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1156 (T298565)', diff saved to https://phabricator.wikimedia.org/P23084 and previous config saved to /var/cache/conftool/dbconfig/20220325-120701-ladsgroup.json
[12:07:02] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
[12:07:04] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
[12:07:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:07:06] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[12:07:09] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1162 (T298565)', diff saved to https://phabricator.wikimedia.org/P23085 and previous config saved to /var/cache/conftool/dbconfig/20220325-120708-ladsgroup.json
[12:07:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:07:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:07:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:16:18] <wikibugs>	 (03CR) 10Jbond: "So you accidentally ended up with me on your CR and now i have reviewed it :/" [puppet] - 10https://gerrit.wikimedia.org/r/769941 (https://phabricator.wikimedia.org/T265117) (owner: 10MVernon)
[12:16:23] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1162 (T298565)', diff saved to https://phabricator.wikimedia.org/P23086 and previous config saved to /var/cache/conftool/dbconfig/20220325-121623-ladsgroup.json
[12:16:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:16:29] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[12:21:12] <wikibugs>	 10SRE, 10LDAP-Access-Requests: Grant Access to nda/logstash for User:TheDJ - https://phabricator.wikimedia.org/T304120 (10jbond) 05Stalled→03Resolved a:03jbond thanks @KFrancis   @TheDJ Access has been granted; you should be able to access the requested resources now, please let me know if you have any issues
[12:22:47] <wikibugs>	 10SRE, 10Wikimedia-Mailing-lists: Email spam from varying tawk.email addresses - https://phabricator.wikimedia.org/T304390 (10jbond) p:05Triage→03Medium
[12:28:13] <wikibugs>	 (03CR) 10MarkAHershberger: admin: reactivate account for Mark Hershberger, add to Mediawiki releasers (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/773660 (https://phabricator.wikimedia.org/T302287) (owner: 10Dzahn)
[12:28:53] <wikibugs>	 10SRE, 10VPS-project-Codesearch: Add operations/software/purged to Codesearch - https://phabricator.wikimedia.org/T303434 (10jbond)
[12:31:29] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P23088 and previous config saved to /var/cache/conftool/dbconfig/20220325-123128-ladsgroup.json
[12:31:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:41:23] <wikibugs>	 10SRE, 10VPS-project-Codesearch: Add operations/software/purged to Codesearch - https://phabricator.wikimedia.org/T303434 (10jbond) I'm not too familiar with code search so not sure what does and doesn't make sense, but tagging a few project owners @joe pcc @Volans anything extra you can think of e.g. cumin, deb...
[12:43:50] <wikibugs>	 10SRE, 10VPS-project-Codesearch: Add operations/software/purged to Codesearch - https://phabricator.wikimedia.org/T303434 (10jbond) p:05Triage→03Medium
[12:46:34] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P23089 and previous config saved to /var/cache/conftool/dbconfig/20220325-124633-ladsgroup.json
[12:46:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:49:08] <hoo>	 !log Updated operations/dumps/dcat on snapshot10(08|09|11|12|13) from d4886f6 to a1f46e4
[12:49:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:50:34] <wikibugs>	 (03CR) 10Hoo man: "I just deployed this change it should take effect whenever dcat.rdf is re-generated next (probably early next week)." [dumps/dcat] - 10https://gerrit.wikimedia.org/r/773490 (owner: 10Abbe98)
[12:51:23] <wikibugs>	 10SRE, 10DC-Ops: Confirm support of PERC 750 raid controller - https://phabricator.wikimedia.org/T297913 (10MoritzMuehlenhoff) >>! In T297913#7805533, @RobH wrote: > Also unable to determine how to poll for virtual disk IDs, other than dropping into raid bios, which won't work out for production.  I need to ke...
[12:59:04] <wikibugs>	 (03PS2) 10Muehlenhoff: mediabackup::storage: Switch to systemd::sysuser [puppet] - 10https://gerrit.wikimedia.org/r/771560
[12:59:40] <wikibugs>	 10SRE, 10VPS-project-Codesearch, 10Patch-For-Review: Add operations/software/purged to Codesearch - https://phabricator.wikimedia.org/T303434 (10jbond) moritz suggested we should just add all software we maintain so I'll create a CR to do that
[13:00:15] <wikibugs>	 (03PS3) 10BBlack: geodns: remove geo-maps-esams-offline hack [dns] - 10https://gerrit.wikimedia.org/r/771631 (https://phabricator.wikimedia.org/T304089)
[13:00:17] <wikibugs>	 (03PS4) 10BBlack: geodns: add drmrs fallback for esams to whole map [dns] - 10https://gerrit.wikimedia.org/r/771632 (https://phabricator.wikimedia.org/T304089)
[13:01:39] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1162 (T298565)', diff saved to https://phabricator.wikimedia.org/P23090 and previous config saved to /var/cache/conftool/dbconfig/20220325-130138-ladsgroup.json
[13:01:40] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
[13:01:42] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
[13:01:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:01:46] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[13:01:47] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1170:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23091 and previous config saved to /var/cache/conftool/dbconfig/20220325-130146-ladsgroup.json
[13:01:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:01:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:01:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:03:08] <wikibugs>	 10SRE, 10VPS-project-Codesearch, 10Patch-For-Review: Add operations/software/purged to Codesearch - https://phabricator.wikimedia.org/T303434 (10Joe) Other things to add that are not under `operations/software`:  * `operations/docker-images/docker-pkg` * `operations/docker-images/docker-report` * `operations...
[13:08:35] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1134 (re)pooling @ 10%: After schema change', diff saved to https://phabricator.wikimedia.org/P23092 and previous config saved to /var/cache/conftool/dbconfig/20220325-130834-root.json
[13:08:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:16:44] <wikibugs>	 (03PS1) 10Muehlenhoff: klaxone: Switch to systemd::sysuser [puppet] - 10https://gerrit.wikimedia.org/r/773772
[13:17:21] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] klaxone: Switch to systemd::sysuser [puppet] - 10https://gerrit.wikimedia.org/r/773772 (owner: 10Muehlenhoff)
[13:20:57] <wikibugs>	 10SRE, 10Data-Persistence-Backup, 10media-backups, 10Goal, 10Patch-For-Review: Document media recovery use case proposals and decide their priority - https://phabricator.wikimedia.org/T299764 (10jcrespo) Because performing backups takes multiple days, the following issues have been detected:  * Some file...
[13:22:55] <logmsgbot>	 !log btullis@cumin1001 END (PASS) - Cookbook sre.hadoop.roll-restart-workers (exit_code=0) restart workers for Hadoop analytics cluster: Roll restart of jvm daemons for openjdk upgrade.
[13:22:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:23:39] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1134 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P23093 and previous config saved to /var/cache/conftool/dbconfig/20220325-132338-root.json
[13:23:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:27:46] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23094 and previous config saved to /var/cache/conftool/dbconfig/20220325-132746-ladsgroup.json
[13:27:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:27:53] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[13:27:55] <wikibugs>	 (03CR) 10MVernon: swift: deploy swift_ring_manager to one node per cluster (036 comments) [puppet] - 10https://gerrit.wikimedia.org/r/769941 (https://phabricator.wikimedia.org/T265117) (owner: 10MVernon)
[13:28:57] <icinga-wm>	 PROBLEM - Check systemd state on deneb is CRITICAL: CRITICAL - degraded: The following units failed: docker-reporter-releng-images.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:31:55] <icinga-wm>	 RECOVERY - SSH on thumbor2004.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[13:34:36] <wikibugs>	 (03PS2) 10Muehlenhoff: klaxone: Switch to systemd::sysuser [puppet] - 10https://gerrit.wikimedia.org/r/773772
[13:35:14] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] klaxone: Switch to systemd::sysuser [puppet] - 10https://gerrit.wikimedia.org/r/773772 (owner: 10Muehlenhoff)
[13:37:29] <wikibugs>	 (03CR) 10RhinosF1: klaxone: Switch to systemd::sysuser (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/773772 (owner: 10Muehlenhoff)
[13:38:43] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1134 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P23095 and previous config saved to /var/cache/conftool/dbconfig/20220325-133842-root.json
[13:38:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:41:35] <icinga-wm>	 PROBLEM - SSH on db2090.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[13:42:51] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P23096 and previous config saved to /var/cache/conftool/dbconfig/20220325-134251-ladsgroup.json
[13:42:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:46:03] <icinga-wm>	 RECOVERY - Host ms-be1070 is UP: PING OK - Packet loss = 0%, RTA = 0.21 ms
[13:46:38] <wikibugs>	 (03PS3) 10Muehlenhoff: klaxon: Switch to systemd::sysuser [puppet] - 10https://gerrit.wikimedia.org/r/773772
[13:48:02] <wikibugs>	 (03PS1) 10Phedenskog: Add marcusolsson-dynamic-text plugin. [debs/grafana-plugins] - 10https://gerrit.wikimedia.org/r/773778 (https://phabricator.wikimedia.org/T304587)
[13:49:59] <wikibugs>	 (03CR) 10Muehlenhoff: klaxon: Switch to systemd::sysuser (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/773772 (owner: 10Muehlenhoff)
[13:50:08] <wikibugs>	 (03CR) 10Muehlenhoff: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/773772 (owner: 10Muehlenhoff)
[13:53:46] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1134 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P23097 and previous config saved to /var/cache/conftool/dbconfig/20220325-135346-root.json
[13:53:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:56:47] <wikibugs>	 (03CR) 10CDanis: [C: 03+1] klaxon: Switch to systemd::sysuser [puppet] - 10https://gerrit.wikimedia.org/r/773772 (owner: 10Muehlenhoff)
[13:57:56] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P23098 and previous config saved to /var/cache/conftool/dbconfig/20220325-135756-ladsgroup.json
[13:57:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:01:31] <wikibugs>	 10SRE, 10SRE-swift-storage, 10ops-eqiad, 10DC-Ops: Q3:(Need By: TBD) rack/setup/install ms-be10[68-71] - https://phabricator.wikimedia.org/T299462 (10cmooney) Just an update here.  Juniper have been able to confirm that this is a bug in their implementation of ARP on this platform.  TL;DR what happens on a...
[14:02:32] <wikibugs>	 (03PS1) 10Muehlenhoff: certspotter: Switch to systemd::sysuser [puppet] - 10https://gerrit.wikimedia.org/r/773780
[14:03:06] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] certspotter: Switch to systemd::sysuser [puppet] - 10https://gerrit.wikimedia.org/r/773780 (owner: 10Muehlenhoff)
[14:04:08] <wikibugs>	 (03PS2) 10Muehlenhoff: certspotter: Switch to systemd::sysuser [puppet] - 10https://gerrit.wikimedia.org/r/773780
[14:07:36] <wikibugs>	 (03CR) 10Muehlenhoff: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/773780 (owner: 10Muehlenhoff)
[14:08:42] <wikibugs>	 (03PS1) 10Jelto: gitlab: add version check to restore script [puppet] - 10https://gerrit.wikimedia.org/r/773783 (https://phabricator.wikimedia.org/T274463)
[14:08:50] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1134 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P23099 and previous config saved to /var/cache/conftool/dbconfig/20220325-140850-root.json
[14:08:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:09:31] <wikibugs>	 (03CR) 10Hashar: docker: move pruning to new profile docker::prune (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/773641 (https://phabricator.wikimedia.org/T304644) (owner: 10Razzi)
[14:10:56] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 03+1] Add marcusolsson-dynamic-text plugin. [debs/grafana-plugins] - 10https://gerrit.wikimedia.org/r/773778 (https://phabricator.wikimedia.org/T304587) (owner: 10Phedenskog)
[14:10:57] <icinga-wm>	 RECOVERY - Host cp1090.mgmt is UP: PING OK - Packet loss = 0%, RTA = 1.15 ms
[14:11:26] <wikibugs>	 (03CR) 10Jelto: [V: 03+1] "PCC SUCCESS (NOOP 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/34563/console" [puppet] - 10https://gerrit.wikimedia.org/r/773783 (https://phabricator.wikimedia.org/T274463) (owner: 10Jelto)
[14:13:01] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23100 and previous config saved to /var/cache/conftool/dbconfig/20220325-141301-ladsgroup.json
[14:13:03] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
[14:13:04] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
[14:13:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:13:06] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[14:13:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:13:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:16:45] <wikibugs>	 (03PS6) 10Hashar: docker: move pruning to new profile docker::prune [puppet] - 10https://gerrit.wikimedia.org/r/773641 (https://phabricator.wikimedia.org/T304644) (owner: 10Razzi)
[14:19:23] <wikibugs>	 (03CR) 10Hashar: docker: move pruning to new profile docker::prune (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/773641 (https://phabricator.wikimedia.org/T304644) (owner: 10Razzi)
[14:24:45] <wikibugs>	 10SRE, 10SRE-OnFire (FY2021/2022-Q3), 10Infrastructure-Foundations, 10SRE Observability (FY2021/2022-Q3): Implement an accurate and easy to understand status page for all wikis - https://phabricator.wikimedia.org/T202061 (10CDanis) [[ https://status.wikimedia.org | status.wikimedia.org ]] is now up-to-date...
[14:26:48] <wikibugs>	 (03CR) 10David Caro: [C: 03+2] wmcs.backy2: add link to the runbook for backup_vms [puppet] - 10https://gerrit.wikimedia.org/r/772839 (https://phabricator.wikimedia.org/T304408) (owner: 10David Caro)
[14:27:19] <wikibugs>	 (03PS1) 10Hashar: ci: docker system prune on ci::master [puppet] - 10https://gerrit.wikimedia.org/r/773784
[14:27:41] <wikibugs>	 (03CR) 10Hashar: "For the production hosts:" [puppet] - 10https://gerrit.wikimedia.org/r/773641 (https://phabricator.wikimedia.org/T304644) (owner: 10Razzi)
[14:28:22] <wikibugs>	 10Puppet, 10SRE, 10Infrastructure-Foundations, 10Patch-For-Review, 10User-jbond: Hieradata yaml style checking - https://phabricator.wikimedia.org/T236954 (10jhathaway) a:03jhathaway
[14:35:39] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
[14:35:40] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
[14:35:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:35:46] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1182 (T298565)', diff saved to https://phabricator.wikimedia.org/P23101 and previous config saved to /var/cache/conftool/dbconfig/20220325-143545-ladsgroup.json
[14:35:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:35:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:35:53] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[14:39:23] <wikibugs>	 (03CR) 10Hashar: beta::autoupdater: Remove more obsolete stuff after scap prep auto (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/753787 (owner: 10Ahmon Dancy)
[15:01:41] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1182 (T298565)', diff saved to https://phabricator.wikimedia.org/P23107 and previous config saved to /var/cache/conftool/dbconfig/20220325-150141-ladsgroup.json
[15:01:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:01:48] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[15:07:40] <wikibugs>	 (03PS1) 10Elukey: aptrepo: add component for istio 1.9.5 [puppet] - 10https://gerrit.wikimedia.org/r/773791 (https://phabricator.wikimedia.org/T297612)
[15:10:02] <wikibugs>	 (03CR) 10Elukey: [C: 03+2] aptrepo: add component for istio 1.9.5 [puppet] - 10https://gerrit.wikimedia.org/r/773791 (https://phabricator.wikimedia.org/T297612) (owner: 10Elukey)
[15:13:58] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase2013 is CRITICAL: /en.wikipedia.org/v1/page/talk/{title} (Get structured talk page for enwiki Salt article) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[15:14:57] <wikibugs>	 (03PS1) 10Jbond: POC: P:thanos::swift::frontend:  move ring manager config to hiera [puppet] - 10https://gerrit.wikimedia.org/r/773794
[15:14:59] <wikibugs>	 (03PS1) 10Jbond: P:thanos::swift: demo changing wieghts and draining [puppet] - 10https://gerrit.wikimedia.org/r/773795
[15:15:40] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] POC: P:thanos::swift::frontend:  move ring manager config to hiera [puppet] - 10https://gerrit.wikimedia.org/r/773794 (owner: 10Jbond)
[15:15:50] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase2013 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[15:16:47] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P23108 and previous config saved to /var/cache/conftool/dbconfig/20220325-151647-ladsgroup.json
[15:16:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:21:31] <wikibugs>	 (03CR) 10Cwhite: [V: 03+2 C: 03+2] "Build succeeds.  Installed on grafana-next.wm.o for testing." [debs/grafana-plugins] - 10https://gerrit.wikimedia.org/r/773778 (https://phabricator.wikimedia.org/T304587) (owner: 10Phedenskog)
[15:22:18] <wikibugs>	 (03CR) 10Herron: [C: 03+1] "LGTM overall, one question inline" [alerts] - 10https://gerrit.wikimedia.org/r/773747 (https://phabricator.wikimedia.org/T291946) (owner: 10Filippo Giunchedi)
[15:23:39] <wikibugs>	 10SRE, 10LDAP-Access-Requests: Grant Access to nda/logstash for User:TheDJ - https://phabricator.wikimedia.org/T304120 (10TheDJ) access confirmed
[15:26:36] <wikibugs>	 (03CR) 10Ayounsi: "Overall LGTM, some suggestions inline." [homer/public] - 10https://gerrit.wikimedia.org/r/773587 (https://phabricator.wikimedia.org/T299758) (owner: 10Cathal Mooney)
[15:27:38] <wikibugs>	 (03CR) 10David Caro: wmcs: toolforge: k8s: factorize build code into a class (031 comment) [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/773510 (owner: 10Arturo Borrero Gonzalez)
[15:30:25] <wikibugs>	 (03CR) 10Jbond: swift: deploy swift_ring_manager to one node per cluster (036 comments) [puppet] - 10https://gerrit.wikimedia.org/r/769941 (https://phabricator.wikimedia.org/T265117) (owner: 10MVernon)
[15:30:29] <wikibugs>	 (03PS1) 10Filippo Giunchedi: pontoon: use vendor_modules during bootstrap [puppet] - 10https://gerrit.wikimedia.org/r/773799
[15:31:53] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P23109 and previous config saved to /var/cache/conftool/dbconfig/20220325-153152-ladsgroup.json
[15:31:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:34:00] <jinxer-wm>	 (JobUnavailable) firing: Reduced availability for job trafficserver in eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[15:34:01] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 03+2] pontoon: use vendor_modules during bootstrap [puppet] - 10https://gerrit.wikimedia.org/r/773799 (owner: 10Filippo Giunchedi)
[15:36:50] <wikibugs>	 (03PS2) 10Jbond: POC: P:thanos::swift::frontend:  move ring manager config to hiera [puppet] - 10https://gerrit.wikimedia.org/r/773794
[15:37:38] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] POC: P:thanos::swift::frontend:  move ring manager config to hiera [puppet] - 10https://gerrit.wikimedia.org/r/773794 (owner: 10Jbond)
[15:38:40] <wikibugs>	 (03CR) 10Elukey: [C: 03+1] "Had a chat with Janis about adding a note in the docs that the first tlsHostname will be used as CN, LGTM!" [deployment-charts] - 10https://gerrit.wikimedia.org/r/773255 (https://phabricator.wikimedia.org/T290966) (owner: 10JMeybohm)
[15:40:56] <wikibugs>	 (03PS2) 10JMeybohm: Allow multiple tlsHostnames [deployment-charts] - 10https://gerrit.wikimedia.org/r/773255 (https://phabricator.wikimedia.org/T290966)
[15:43:11] <wikibugs>	 (03PS1) 10Btullis: Add an alert for zero messages being generated by varnishkafka instances [alerts] - 10https://gerrit.wikimedia.org/r/773801 (https://phabricator.wikimedia.org/T300246)
[15:45:00] <icinga-wm>	 PROBLEM - Some MediaWiki servers are running out of idle PHP-FPM workers in api_appserver at eqiad on alert1001 is CRITICAL: 0.3548 gt 0.3 https://bit.ly/wmf-fpmsat https://grafana.wikimedia.org/d/fRn9VEPMz/application-servers-use-dashboard-wip?orgId=1
[15:45:05] <wikibugs>	 10ops-codfw: Document codfw breakout patch pannels in Netbox - https://phabricator.wikimedia.org/T304710 (10ayounsi) p:05Triage→03Low
[15:45:34] <wikibugs>	 10SRE, 10ops-codfw: Document codfw breakout patch pannels in Netbox - https://phabricator.wikimedia.org/T304710 (10ayounsi)
[15:46:01] <wikibugs>	 10SRE, 10Infrastructure-Foundations, 10netops, 10Sustainability (Incident Followup): Add linecard diversity to the router-to-router interconnect in codfw - https://phabricator.wikimedia.org/T248506 (10ayounsi)
[15:46:58] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1182 (T298565)', diff saved to https://phabricator.wikimedia.org/P23110 and previous config saved to /var/cache/conftool/dbconfig/20220325-154658-ladsgroup.json
[15:46:59] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
[15:47:01] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
[15:47:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:47:04] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[15:47:06] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1105:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23111 and previous config saved to /var/cache/conftool/dbconfig/20220325-154705-ladsgroup.json
[15:47:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:47:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:47:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:48:26] <wikibugs>	 (03CR) 10Phuedx: "> We should definitely actually run the tests before we merge this as well" [puppet] - 10https://gerrit.wikimedia.org/r/765485 (https://phabricator.wikimedia.org/T301238) (owner: 10Phuedx)
[15:49:51] <wikibugs>	 (03CR) 10Ahmon Dancy: [C: 03+1] P:scap::dsh: Add scap targets as a dsh group [puppet] - 10https://gerrit.wikimedia.org/r/771441 (https://phabricator.wikimedia.org/T303559) (owner: 10Jbond)
[15:50:14] <wikibugs>	 (03CR) 10Ahmon Dancy: [C: 03+1] wmflib: add class_hosts [puppet] - 10https://gerrit.wikimedia.org/r/771437 (https://phabricator.wikimedia.org/T303559) (owner: 10Jbond)
[15:50:50] <wikibugs>	 (03CR) 10JMeybohm: [C: 03+2] Allow multiple tlsHostnames [deployment-charts] - 10https://gerrit.wikimedia.org/r/773255 (https://phabricator.wikimedia.org/T290966) (owner: 10JMeybohm)
[15:51:07] <wikibugs>	 (03CR) 10JMeybohm: [C: 03+2] Add correct tlsHostnames and extra SAN to datahub cert [deployment-charts] - 10https://gerrit.wikimedia.org/r/773256 (https://phabricator.wikimedia.org/T303049) (owner: 10JMeybohm)
[15:51:14] <icinga-wm>	 PROBLEM - Some MediaWiki servers are running out of idle PHP-FPM workers in api_appserver at eqiad on alert1001 is CRITICAL: 0.3387 gt 0.3 https://bit.ly/wmf-fpmsat https://grafana.wikimedia.org/d/fRn9VEPMz/application-servers-use-dashboard-wip?orgId=1
[15:52:42] <wikibugs>	 (03PS2) 10Btullis: Add an alert for zero messages being generated by varnishkafka instances [alerts] - 10https://gerrit.wikimedia.org/r/773801 (https://phabricator.wikimedia.org/T300246)
[15:53:27] <wikibugs>	 10ops-eqiad: eqiad: Move links to new MPC7E linecard - https://phabricator.wikimedia.org/T304712 (10ayounsi) p:05Triage→03Medium
[15:54:10] <wikibugs>	 10ops-eqiad: eqiad: Move links to new MPC7E linecard - https://phabricator.wikimedia.org/T304712 (10ayounsi)
[15:54:50] <wikibugs>	 (03Merged) 10jenkins-bot: Allow multiple tlsHostnames [deployment-charts] - 10https://gerrit.wikimedia.org/r/773255 (https://phabricator.wikimedia.org/T290966) (owner: 10JMeybohm)
[15:55:07] <wikibugs>	 (03CR) 10Ahmon Dancy: "just typo nits. The change LGTM otherwise" [puppet] - 10https://gerrit.wikimedia.org/r/773784 (owner: 10Hashar)
[15:56:56] <wikibugs>	 (03CR) 10Dzahn: geoip::data::maxmind: deactivate timer for downloading of legacy DBs (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/773648 (https://phabricator.wikimedia.org/T303464) (owner: 10Dzahn)
[16:00:08] <icinga-wm>	 PROBLEM - Some MediaWiki servers are running out of idle PHP-FPM workers in api_appserver at eqiad on alert1001 is CRITICAL: 0.4355 gt 0.3 https://bit.ly/wmf-fpmsat https://grafana.wikimedia.org/d/fRn9VEPMz/application-servers-use-dashboard-wip?orgId=1
[16:02:06] <icinga-wm>	 RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[16:04:30] <icinga-wm>	 RECOVERY - Some MediaWiki servers are running out of idle PHP-FPM workers in api_appserver at eqiad on alert1001 is OK: (C)0.3 gt (W)0.1 gt 0.09677 https://bit.ly/wmf-fpmsat https://grafana.wikimedia.org/d/fRn9VEPMz/application-servers-use-dashboard-wip?orgId=1
[16:06:37] <wikibugs>	 10SRE, 10ops-codfw: Document codfw breakout patch panels in Netbox - https://phabricator.wikimedia.org/T304710 (10ayounsi)
[16:11:53] <wikibugs>	 (03PS2) 10Majavah: toolforge: remove ingress-nginx manifests [puppet] - 10https://gerrit.wikimedia.org/r/773750
[16:12:12] <wikibugs>	 (03CR) 10Majavah: toolforge: remove ingress-nginx manifests (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/773750 (owner: 10Majavah)
[16:12:37] <wikibugs>	 (03CR) 10Elukey: [C: 03+2] sre.kafka.roll-restart-brokers: generalize the restart reason [cookbooks] - 10https://gerrit.wikimedia.org/r/773475 (owner: 10Elukey)
[16:12:41] <wikibugs>	 (03PS2) 10Elukey: sre.kafka.roll-restart-brokers: generalize the restart reason [cookbooks] - 10https://gerrit.wikimedia.org/r/773475
[16:12:44] <wikibugs>	 (03CR) 10Elukey: [V: 03+2 C: 03+2] sre.kafka.roll-restart-brokers: generalize the restart reason [cookbooks] - 10https://gerrit.wikimedia.org/r/773475 (owner: 10Elukey)
[16:12:51] <wikibugs>	 (03PS1) 10JMeybohm: Allow to specify additional gatewayHosts without overriding the default [deployment-charts] - 10https://gerrit.wikimedia.org/r/773805 (https://phabricator.wikimedia.org/T290966)
[16:13:10] <icinga-wm>	 PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: monitor_refine_event.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[16:13:21] <wikibugs>	 (03PS2) 10Elukey: Add helmfile config for Istio proxy sidecars [deployment-charts] - 10https://gerrit.wikimedia.org/r/773565 (https://phabricator.wikimedia.org/T297612)
[16:14:35] <wikibugs>	 (03PS3) 10Jbond: POC: P:thanos::swift::frontend:  move ring manager config to hiera [puppet] - 10https://gerrit.wikimedia.org/r/773794
[16:16:19] <wikibugs>	 (03CR) 10Cwhite: [C: 03+1] sre: add ProbeDown paging alert for enabled services [alerts] - 10https://gerrit.wikimedia.org/r/773747 (https://phabricator.wikimedia.org/T291946) (owner: 10Filippo Giunchedi)
[16:16:31] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23112 and previous config saved to /var/cache/conftool/dbconfig/20220325-161631-ladsgroup.json
[16:16:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:16:37] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[16:19:55] <wikibugs>	 (03PS1) 10Vivian Rook: Update codfw1dev cloudservices openstack [puppet] - 10https://gerrit.wikimedia.org/r/773806 (https://phabricator.wikimedia.org/T304702)
[16:20:55] <wikibugs>	 (03CR) 10Arturo Borrero Gonzalez: [C: 03+1] toolforge: remove ingress-nginx manifests [puppet] - 10https://gerrit.wikimedia.org/r/773750 (owner: 10Majavah)
[16:21:57] <wikibugs>	 (03PS2) 10Jbond: P:thanos::swift: demo changing wieghts and draining [puppet] - 10https://gerrit.wikimedia.org/r/773795
[16:23:09] <wikibugs>	 10SRE, 10ops-eqiad, 10DC-Ops, 10Traffic: cp1090.mgmt ssh port not accessible - https://phabricator.wikimedia.org/T304589 (10Cmjohnson) 05Open→03Resolved a:03Cmjohnson re-seated the mgmt cable. no issues logging into mgmt interface  root@cp1090.mgmt.eqiad.wmnet's password: /admin1->
[16:23:45] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 03+1] "I'm all for moving things out of the monolithic puppet gerrit repo whenever possible. Thanks taavi!" [puppet] - 10https://gerrit.wikimedia.org/r/773750 (owner: 10Majavah)
[16:23:50] <wikibugs>	 10SRE, 10Wikimedia-Etherpad, 10serviceops: Etherpads corrupted - https://phabricator.wikimedia.org/T304005 (10Dzahn) a:03Dzahn
[16:24:06] <wikibugs>	 10SRE, 10ops-eqiad, 10serviceops: mc1053 PS redundancy alert - https://phabricator.wikimedia.org/T304477 (10Cmjohnson) 05Open→03Resolved a:03Cmjohnson Fixed
[16:24:56] <wikibugs>	 (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] toolforge: remove ingress-nginx manifests [puppet] - 10https://gerrit.wikimedia.org/r/773750 (owner: 10Majavah)
[16:27:03] <wikibugs>	 (03CR) 10Jcrespo: "Ok to me, I trust your suggestion this is better. :-)" [puppet] - 10https://gerrit.wikimedia.org/r/771560 (owner: 10Muehlenhoff)
[16:29:11] <wikibugs>	 10SRE, 10Infrastructure-Foundations, 10Release-Engineering-Team (🚂🧪 Trainsperiment Week): Need a service account on deploy servers - https://phabricator.wikimedia.org/T303857 (10thcipriani) > Approving manager: @thcipriani  Approved from my side!  Tagging #SRE as well — not sure about the current best task i...
[16:31:36] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P23114 and previous config saved to /var/cache/conftool/dbconfig/20220325-163136-ladsgroup.json
[16:31:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:32:34] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 03+1] Update codfw1dev cloudservices openstack [puppet] - 10https://gerrit.wikimedia.org/r/773806 (https://phabricator.wikimedia.org/T304702) (owner: 10Vivian Rook)
[16:34:26] <logmsgbot>	 !log cmjohnson@cumin1001 START - Cookbook sre.dns.netbox
[16:34:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:34:29] <wikibugs>	 (03PS8) 10Giuseppe Lavagetto: Introduce requestctl [software/conftool] - 10https://gerrit.wikimedia.org/r/772342 (https://phabricator.wikimedia.org/T302471)
[16:34:31] <wikibugs>	 (03PS2) 10Giuseppe Lavagetto: Add debian packaging for requestctl [software/conftool] - 10https://gerrit.wikimedia.org/r/773760
[16:35:18] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Introduce requestctl [software/conftool] - 10https://gerrit.wikimedia.org/r/772342 (https://phabricator.wikimedia.org/T302471) (owner: 10Giuseppe Lavagetto)
[16:35:20] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Add debian packaging for requestctl [software/conftool] - 10https://gerrit.wikimedia.org/r/773760 (owner: 10Giuseppe Lavagetto)
[16:36:09] <wikibugs>	 (03CR) 10Giuseppe Lavagetto: Introduce requestctl (3 comments) [software/conftool] - 10https://gerrit.wikimedia.org/r/772342 (https://phabricator.wikimedia.org/T302471) (owner: 10Giuseppe Lavagetto)
[16:36:41] <wikibugs>	 10SRE, 10SRE-swift-storage, 10ops-eqiad, 10decommission-hardware: Decommission ms-fe100[5-8].eqiad.wmnet - https://phabricator.wikimedia.org/T304064 (10Cmjohnson)
[16:37:12] <wikibugs>	 10SRE, 10SRE-swift-storage, 10ops-eqiad, 10decommission-hardware: Decommission ms-fe100[5-8].eqiad.wmnet - https://phabricator.wikimedia.org/T304064 (10Cmjohnson) 05Open→03Resolved Removed from rack and netbox updated
[16:37:29] <logmsgbot>	 !log cmjohnson@cumin1001 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[16:37:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:39:38] <wikibugs>	 (03CR) 10Jcrespo: mediabackup::storage: Switch to systemd::sysuser (1 comment) [puppet] - 10https://gerrit.wikimedia.org/r/771560 (owner: 10Muehlenhoff)
[16:39:40] <wikibugs>	 (03PS9) 10Giuseppe Lavagetto: Introduce requestctl [software/conftool] - 10https://gerrit.wikimedia.org/r/772342 (https://phabricator.wikimedia.org/T302471)
[16:39:42] <wikibugs>	 (03PS3) 10Giuseppe Lavagetto: Add debian packaging for requestctl [software/conftool] - 10https://gerrit.wikimedia.org/r/773760
[16:40:02] <icinga-wm>	 RECOVERY - IPMI Sensor Status on mc1053 is OK: Sensor Type(s) Temperature, Power_Supply Status: OK https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Power_Supply_Failures
[16:41:56] <icinga-wm>	 RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[16:43:28] <wikibugs>	 10SRE, 10Infrastructure-Foundations, 10serviceops, 10Release-Engineering-Team (🚂🧪 Trainsperiment Week): Need a service account on deploy servers - https://phabricator.wikimedia.org/T303857 (10jcrespo) Normally this kind of tasks would be routed by the person in clinic duty, but I just happened to see him g...
[16:43:59] <wikibugs>	 (03CR) 10Giuseppe Lavagetto: [C: 03+2] Introduce requestctl [software/conftool] - 10https://gerrit.wikimedia.org/r/772342 (https://phabricator.wikimedia.org/T302471) (owner: 10Giuseppe Lavagetto)
[16:44:14] <wikibugs>	 10SRE, 10Wikimedia-Mailing-lists: Mailman3: 550-Support for list subscription via email has been disabled. - https://phabricator.wikimedia.org/T303888 (10Urbanecm) >>! In T303888#7805113, @Ladsgroup wrote: > Yup, this is something we carried over from mailman2 given the history of abuse with mass subscription...
[16:45:21] <wikibugs>	 10SRE, 10SRE-Access-Requests, 10Infrastructure-Foundations, 10serviceops, 10Release-Engineering-Team (🚂🧪 Trainsperiment Week): Need a service account on deploy servers - https://phabricator.wikimedia.org/T303857 (10Dzahn)
[16:45:48] <wikibugs>	 (03Merged) 10jenkins-bot: Introduce requestctl [software/conftool] - 10https://gerrit.wikimedia.org/r/772342 (https://phabricator.wikimedia.org/T302471) (owner: 10Giuseppe Lavagetto)
[16:46:42] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P23115 and previous config saved to /var/cache/conftool/dbconfig/20220325-164641-ladsgroup.json
[16:46:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:46:53] <wikibugs>	 10SRE, 10Maps: Allow Wikimedia Maps usage on bbcrewind.co.uk - https://phabricator.wikimedia.org/T297968 (10JMinor) a:05MSantos→03JMinor Looks like we're set. Just need to close the loop with the BBC folks. Will resolve when confirmed.   Thank you!
[16:48:34] <wikibugs>	 10SRE, 10SRE-Access-Requests, 10Infrastructure-Foundations, 10serviceops, 10Release-Engineering-Team (🚂🧪 Trainsperiment Week): Need a service account on deploy servers - https://phabricator.wikimedia.org/T303857 (10Joe) a:03Joe I'll take care of this, as I assume it's not urgent to be completed before...
[16:49:06] <icinga-wm>	 PROBLEM - Host ml-cache1002.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[16:49:47] <wikibugs>	 (03CR) 10Giuseppe Lavagetto: [C: 03+2] Add debian packaging for requestctl [software/conftool] - 10https://gerrit.wikimedia.org/r/773760 (owner: 10Giuseppe Lavagetto)
[16:50:44] <logmsgbot>	 !log cmjohnson@cumin1001 START - Cookbook sre.dns.netbox
[16:50:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:52:09] <wikibugs>	 (03Merged) 10jenkins-bot: Add debian packaging for requestctl [software/conftool] - 10https://gerrit.wikimedia.org/r/773760 (owner: 10Giuseppe Lavagetto)
[16:53:36] <wikibugs>	 (03CR) 10Dzahn: [C: 03+2] admin: reactivate account for Mark Hershberger, add to Mediawiki releasers [puppet] - 10https://gerrit.wikimedia.org/r/773660 (https://phabricator.wikimedia.org/T302287) (owner: 10Dzahn)
[16:57:09] <wikibugs>	 10SRE, 10SRE-Access-Requests, 10Patch-For-Review: Requesting access to releaser for MarkAHershberger - https://phabricator.wikimedia.org/T302287 (10Dzahn) ` [releases1002:~] $ id mah uid=1232(mah) gid=500(wikidev) groups=500(wikidev),711(releasers-mediawiki)  [releases2002:~] $ id mah uid=1232(mah) gid=500(w...
[16:57:43] <logmsgbot>	 !log cmjohnson@cumin1001 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[16:57:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:58:02] <wikibugs>	 10SRE, 10ops-eqiad, 10DC-Ops, 10Machine-Learning-Team: Q3:(Need By: TBD) rack/setup/install ml-cache100[1-3] - https://phabricator.wikimedia.org/T299435 (10Cmjohnson) @elukey I moved ml-cache1002 to row/rack C4.
[16:58:55] <wikibugs>	 (03CR) 10Phuedx: Request high-entropy Sec-CH-UA* client hints (1 comment) [puppet] - 10https://gerrit.wikimedia.org/r/765485 (https://phabricator.wikimedia.org/T301238) (owner: 10Phuedx)
[17:00:10] <wikibugs>	 (03CR) 10Jcrespo: [C: 03+2] "Bah, not worth the time discussing, let's just deploy as it is." [puppet] - 10https://gerrit.wikimedia.org/r/771560 (owner: 10Muehlenhoff)
[17:00:33] <wikibugs>	 (03CR) 10Jbond: [V: 03+1 C: 03+2] wmflib: add class_hosts [puppet] - 10https://gerrit.wikimedia.org/r/771437 (https://phabricator.wikimedia.org/T303559) (owner: 10Jbond)
[17:01:47] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23116 and previous config saved to /var/cache/conftool/dbconfig/20220325-170146-ladsgroup.json
[17:01:48] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
[17:01:50] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
[17:01:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:01:52] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[17:01:55] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1105:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23117 and previous config saved to /var/cache/conftool/dbconfig/20220325-170154-ladsgroup.json
[17:01:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:01:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:02:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:07:54] <wikibugs>	 (03Abandoned) 10Majavah: toolforge: deploy ingress-nginx via helmfile and provide deploy.sh [puppet] - 10https://gerrit.wikimedia.org/r/773448 (https://phabricator.wikimedia.org/T303931) (owner: 10Majavah)
[17:09:27] <wikibugs>	 (03PS8) 10Phuedx: Request high-entropy Sec-CH-UA* client hints [puppet] - 10https://gerrit.wikimedia.org/r/765485 (https://phabricator.wikimedia.org/T301238)
[17:10:47] <wikibugs>	 (03PS1) 10Andrew Bogott: Move cloudstore1008/1009 to role::spare [puppet] - 10https://gerrit.wikimedia.org/r/773819 (https://phabricator.wikimedia.org/T291405)
[17:14:16] <logmsgbot>	 !log cmjohnson@cumin1001 START - Cookbook sre.hosts.reimage for host ml-cache1002.eqiad.wmnet with OS bullseye
[17:14:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:14:21] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 03+2] Move cloudstore1008/1009 to role::spare [puppet] - 10https://gerrit.wikimedia.org/r/773819 (https://phabricator.wikimedia.org/T291405) (owner: 10Andrew Bogott)
[17:14:22] <wikibugs>	 10SRE, 10ops-eqiad, 10DC-Ops, 10Machine-Learning-Team: Q3:(Need By: TBD) rack/setup/install ml-cache100[1-3] - https://phabricator.wikimedia.org/T299435 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cmjohnson@cumin1001 for host ml-cache1002.eqiad.wmnet with OS bullseye
[17:14:36] <icinga-wm>	 RECOVERY - Host ml-cache1002.mgmt is UP: PING OK - Packet loss = 0%, RTA = 1.12 ms
[17:16:08] <wikibugs>	 (03PS1) 10Majavah: hieradata: generate cfssl certs for cloudmetrics* [puppet] - 10https://gerrit.wikimedia.org/r/773821
[17:16:44] <wikibugs>	 (03PS9) 10Phuedx: Request high-entropy Sec-CH-UA* client hints [puppet] - 10https://gerrit.wikimedia.org/r/765485 (https://phabricator.wikimedia.org/T301238)
[17:17:47] <wikibugs>	 (03CR) 10Majavah: [V: 03+1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/34566/console" [puppet] - 10https://gerrit.wikimedia.org/r/773821 (owner: 10Majavah)
[17:19:34] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 03+2] hieradata: generate cfssl certs for cloudmetrics* [puppet] - 10https://gerrit.wikimedia.org/r/773821 (owner: 10Majavah)
[17:28:08] <wikibugs>	 (03CR) 10BBlack: [C: 03+1] "Looks great now! Testsuite caught a couple of minor syntax issues fixed in PS8 and PS9, all clean on both text and upload runs now." [puppet] - 10https://gerrit.wikimedia.org/r/765485 (https://phabricator.wikimedia.org/T301238) (owner: 10Phuedx)
[17:29:16] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23118 and previous config saved to /var/cache/conftool/dbconfig/20220325-172916-ladsgroup.json
[17:29:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:29:21] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[17:32:33] <wikibugs>	 10SRE, 10SRE-Access-Requests, 10Infrastructure-Foundations, 10serviceops, 10Release-Engineering-Team (🚂🧪 Trainsperiment Week): Need a service account on deploy servers - https://phabricator.wikimedia.org/T303857 (10dancy) Hi @Joe.  This request is not urgent so it can wait until next week.  The plan is f...
[17:32:43] <icinga-wm>	 PROBLEM - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is CRITICAL: CRITICAL - failed 121 probes of 675 (alerts on 65) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[17:35:21] <icinga-wm>	 PROBLEM - SSH on thumbor2004.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[17:38:15] <icinga-wm>	 RECOVERY - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is OK: OK - failed 60 probes of 675 (alerts on 65) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[17:42:42] <logmsgbot>	 !log cmjohnson@cumin1001 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ml-cache1002.eqiad.wmnet with OS bullseye
[17:42:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:42:47] <wikibugs>	 10SRE, 10ops-eqiad, 10DC-Ops, 10Machine-Learning-Team: Q3:(Need By: TBD) rack/setup/install ml-cache100[1-3] - https://phabricator.wikimedia.org/T299435 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cmjohnson@cumin1001 for host ml-cache1002.eqiad.wmnet with OS bullseye executed wit...
[17:44:21] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P23119 and previous config saved to /var/cache/conftool/dbconfig/20220325-174421-ladsgroup.json
[17:44:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:48:16] <wikibugs>	 (03PS3) 10Elukey: Add helmfile config for Istio proxy sidecars [deployment-charts] - 10https://gerrit.wikimedia.org/r/773565 (https://phabricator.wikimedia.org/T297612)
[17:53:01] <icinga-wm>	 PROBLEM - Host logstash2028.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[17:59:26] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P23120 and previous config saved to /var/cache/conftool/dbconfig/20220325-175926-ladsgroup.json
[17:59:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:03:53] <icinga-wm>	 RECOVERY - Host logstash2028.mgmt is UP: PING OK - Packet loss = 0%, RTA = 33.69 ms
[18:14:31] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23121 and previous config saved to /var/cache/conftool/dbconfig/20220325-181431-ladsgroup.json
[18:14:32] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
[18:14:34] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
[18:14:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:14:38] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[18:14:39] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1182 (T298565)', diff saved to https://phabricator.wikimedia.org/P23122 and previous config saved to /var/cache/conftool/dbconfig/20220325-181439-ladsgroup.json
[18:14:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:14:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:14:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:19:49] <icinga-wm>	 PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[18:39:11] <wikibugs>	 10SRE, 10SRE-Access-Requests: Requesting access to releaser for MarkAHershberger - https://phabricator.wikimedia.org/T302287 (10Dzahn)
[18:39:21] <wikibugs>	 10SRE, 10SRE-Access-Requests: Requesting access to releaser for MarkAHershberger - https://phabricator.wikimedia.org/T302287 (10Dzahn) a:03Dzahn
[18:40:37] <wikibugs>	 (03CR) 10Majavah: [V: 03+1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/34570/console" [puppet] - 10https://gerrit.wikimedia.org/r/773806 (https://phabricator.wikimedia.org/T304702) (owner: 10Vivian Rook)
[18:43:24] <wikibugs>	 10SRE, 10Patch-For-Review, 10Service-deployment-requests: New Service Request miscweb - https://phabricator.wikimedia.org/T281538 (10Dzahn)
[18:44:06] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1182 (T298565)', diff saved to https://phabricator.wikimedia.org/P23123 and previous config saved to /var/cache/conftool/dbconfig/20220325-184406-ladsgroup.json
[18:44:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:44:11] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[18:51:53] <icinga-wm>	 RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[18:59:11] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P23124 and previous config saved to /var/cache/conftool/dbconfig/20220325-185911-ladsgroup.json
[18:59:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:01:09] <wikibugs>	 (03PS1) 10Dzahn: dumps: add description for Bugzilla HTML dump file [puppet] - 10https://gerrit.wikimedia.org/r/773832 (https://phabricator.wikimedia.org/T284193)
[19:02:14] <wikibugs>	 (03CR) 10Dzahn: [C: 03+2] "just a few words of description in HTML" [puppet] - 10https://gerrit.wikimedia.org/r/773832 (https://phabricator.wikimedia.org/T284193) (owner: 10Dzahn)
[19:10:10] <mutante>	 !log copying dump from deploy server to dumps server: scp -3 deploy1002.eqiad.wmnet:/srv/miscweb/static-bugzilla.tar.gz labstore1006.wikimedia.org:~ (T284193)
[19:10:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:10:16] <stashbot>	 T284193: put static-bugzilla HTML dump on dumps servers - https://phabricator.wikimedia.org/T284193
[19:10:34] <wikibugs>	 10SRE, 10MassMessage, 10WMF-JobQueue, 10Platform Team Workboards (Clinic Duty Team): Same MassMessage is being sent more than once - https://phabricator.wikimedia.org/T93049 (10Quiddity) @Ottomata Ping in case this fresh example helps. It's unclear from the last engineer comment above (Petr's at T93049#659...
[19:14:18] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P23125 and previous config saved to /var/cache/conftool/dbconfig/20220325-191416-ladsgroup.json
[19:14:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:29:23] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1182 (T298565)', diff saved to https://phabricator.wikimedia.org/P23126 and previous config saved to /var/cache/conftool/dbconfig/20220325-192923-ladsgroup.json
[19:29:24] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
[19:29:26] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
[19:29:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:29:30] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[19:29:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:29:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:34:00] <jinxer-wm>	 (JobUnavailable) firing: Reduced availability for job trafficserver in eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[19:36:51] <icinga-wm>	 RECOVERY - SSH on thumbor2004.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[19:51:31] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
[19:51:33] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
[19:51:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:51:38] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1170:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23127 and previous config saved to /var/cache/conftool/dbconfig/20220325-195137-ladsgroup.json
[19:51:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:51:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:51:44] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[19:56:58] <mutante>	 !Log deploy1002 - removing /srv/miscweb and files inside it, moved to dumps, was only needed temporary, meanwhile inside the container repo for k8s as well (T284193), cleaning up deploy1002
[19:56:58] <stashbot>	 T284193: put static-bugzilla HTML dump on dumps servers - https://phabricator.wikimedia.org/T284193
[19:58:20] <wikibugs>	 10SRE, 10Patch-For-Review, 10Service-deployment-requests: New Service Request miscweb - https://phabricator.wikimedia.org/T281538 (10Dzahn)
[20:05:36] <wikibugs>	 (03CR) 10JMeybohm: [C: 04-1] "I'd argue to set `istio_sidecar_proxy: true` for the ml-clusters in this patch as well to have the new helmfile rendered (and the diff vis" [deployment-charts] - 10https://gerrit.wikimedia.org/r/773565 (https://phabricator.wikimedia.org/T297612) (owner: 10Elukey)
[20:06:24] <wikibugs>	 (03PS1) 10Dzahn: puppetmaster:geoip: stop trying to download GeoIP1 legacy databases [puppet] - 10https://gerrit.wikimedia.org/r/773843 (https://phabricator.wikimedia.org/T303464)
[20:09:12] <dwisehaupt>	 away
[20:10:39] <wikibugs>	 (03PS1) 10Dzahn: geoip::maxmind: remove code for absenting old resources [puppet] - 10https://gerrit.wikimedia.org/r/773844 (https://phabricator.wikimedia.org/T303464)
[20:15:14] <wikibugs>	 (03PS1) 10Dzahn: geoip::maxmind: rename the legacy timer to geoip2 [puppet] - 10https://gerrit.wikimedia.org/r/773845 (https://phabricator.wikimedia.org/T303464)
[20:16:14] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23128 and previous config saved to /var/cache/conftool/dbconfig/20220325-201613-ladsgroup.json
[20:16:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:16:19] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[20:31:19] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P23129 and previous config saved to /var/cache/conftool/dbconfig/20220325-203118-ladsgroup.json
[20:31:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:42:01] <wikibugs>	 10SRE, 10Wikimedia-Etherpad, 10serviceops: Etherpads corrupted - https://phabricator.wikimedia.org/T304005 (10Dzahn) Hello @Zapipedia-WMF   so I think what happened here is, the first case was likely caused by me doing the maintenance because I had the service running from 2 servers at the same time. I knew...
[20:46:24] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P23130 and previous config saved to /var/cache/conftool/dbconfig/20220325-204623-ladsgroup.json
[20:46:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:51:13] <wikibugs>	 10SRE, 10Wikimedia-Mailing-lists: Email spam from varying tawk.email addresses - https://phabricator.wikimedia.org/T304390 (10Quiddity) That resulted in an error message: ` An error occurred: Invalid Parameter "email": Expected a valid email address or regular expression, got .+\.tawk\.email$. ` Looking at the...
[20:59:29] <wikibugs>	 (03PS2) 10JMeybohm: Allow to specify additional gatewayHosts without overriding the default [deployment-charts] - 10https://gerrit.wikimedia.org/r/773805 (https://phabricator.wikimedia.org/T290966)
[21:01:29] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23131 and previous config saved to /var/cache/conftool/dbconfig/20220325-210128-ladsgroup.json
[21:01:30] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
[21:01:31] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
[21:01:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:01:34] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[21:01:36] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1162 (T298565)', diff saved to https://phabricator.wikimedia.org/P23132 and previous config saved to /var/cache/conftool/dbconfig/20220325-210136-ladsgroup.json
[21:01:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:01:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:01:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:01:52] <wikibugs>	 10SRE, 10Wikimedia-Etherpad, 10serviceops: Etherpads corrupted - https://phabricator.wikimedia.org/T304005 (10Dzahn) I documented this at  https://wikitech.wikimedia.org/wiki/Etherpad.wikimedia.org#Restoring_a_pad_to_a_previous_revision
[21:03:01] <wikibugs>	 10SRE, 10Wikimedia-Etherpad, 10serviceops: Etherpads corrupted - https://phabricator.wikimedia.org/T304005 (10Dzahn) 05Open→03Resolved claiming resolved, let me know if you agree
[21:03:44] <wikibugs>	 (03CR) 10JMeybohm: "I think we're almost good to go. I've another small patch to scaffolding and _ingress_helper that I would like to rebase this on, though: " [deployment-charts] - 10https://gerrit.wikimedia.org/r/764375 (https://phabricator.wikimedia.org/T301454) (owner: 10Btullis)
[21:06:22] <wikibugs>	 10SRE, 10Data-Engineering, 10Traffic, 10Trust-and-Safety, and 2 others: Disable GeoIP Legacy Download - https://phabricator.wikimedia.org/T303464 (10Dzahn) a:03Dzahn
[21:08:31] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1162 (T298565)', diff saved to https://phabricator.wikimedia.org/P23133 and previous config saved to /var/cache/conftool/dbconfig/20220325-210831-ladsgroup.json
[21:08:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:08:38] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[21:18:36] <wikibugs>	 (03Abandoned) 10Dzahn: geoip::data::maxmind: deactivate timer for downloading of legacy DBs [puppet] - 10https://gerrit.wikimedia.org/r/773648 (https://phabricator.wikimedia.org/T303464) (owner: 10Dzahn)
[21:21:36] <wikibugs>	 (03CR) 10Dzahn: [C: 03+2] zuul: stop keeping reflog on the mergers [puppet] - 10https://gerrit.wikimedia.org/r/757943 (owner: 10Hashar)
[21:23:04] <hashar>	 mutante: oh thanks, I kind of forgot about those zuul-merger git settings :D
[21:23:33] <mutante>	 hashar: no problem, same here but backlog after vacation. 
[21:23:37] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P23134 and previous config saved to /var/cache/conftool/dbconfig/20220325-212336-ladsgroup.json
[21:23:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:24:01] <mutante>	 hashar: I assume I don't need to go to an actual merger now
[21:24:23] <mutante>	 I did compare git::userconfig to another case, git option makes sense 
[21:24:40] <hashar>	 yeah
[21:24:46] <mutante>	 ok, cool
[21:24:56] <wikibugs>	 (PS3) Hashar: zuul: prune heads and tags on each fetches [puppet] - https://gerrit.wikimedia.org/r/757944 (https://phabricator.wikimedia.org/T220606)
[21:25:19] <hashar>	 mutante: and there is a 2nd one which I have rebased https://gerrit.wikimedia.org/r/c/operations/puppet/+/757944/ :]
[21:25:42] <hashar>	 to prune branches and tags when fetching from Gerrit
[21:25:59] <hashar>	 else deleted tags stay behind on the zuul-merger git repos
[21:26:03] <mutante>	 yea, I saw that.. well..since you are here.. let's do that too
[21:26:12] <mutante>	 the previous one seemed safer
[21:26:12] <hashar>	 and the branches keep accumulating, notably the wmf/* branches for mediawiki repos :]
[21:26:21] <mutante>	 ack
[21:26:41] <hashar>	 I have manually pruned the branches and tags a few weeks ago
[21:26:51] <wikibugs>	 (CR) Dzahn: [C: +2] zuul: prune heads and tags on each fetches [puppet] - https://gerrit.wikimedia.org/r/757944 (https://phabricator.wikimedia.org/T220606) (owner: Hashar)
[21:27:01] <mutante>	 great. here we go
[21:27:05] <hashar>	 I should pay more attention to the patches I send for review
[21:27:16] <wikibugs>	 (CR) Krinkle: Relax CSP rules for taint-check-demo (1 comment) [puppet] - https://gerrit.wikimedia.org/r/680337 (https://phabricator.wikimedia.org/T257301) (owner: Daimona Eaytoy)
[21:27:38] <mutante>	 hashar: merged on puppetmaster. wanna check in cloud?
[21:28:24] <hashar>	 they run on contint2001 and contint1001, I am checking puppet
[21:29:10] <mutante>	 oh, zuul::merger, of course. doing that too
[21:29:21] <hashar>	 $ sudo -H -u zuul git config --list
[21:29:21] <hashar>	 protocol.version=2
[21:29:21] <hashar>	 core.logallrefupdates=false
[21:29:21] <hashar>	 fetch.prune=true
[21:29:21] <hashar>	 fetch.prunetags=true
[21:30:12] <mutante>	 ack:) looks good
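The settings hashar pasted above can be tried out in isolation. What follows is a minimal sketch, not the puppet change itself: it assumes a throwaway repository created with `mktemp` (a hypothetical stand-in; on the mergers the values live in the zuul user's git config) and only shows how the three options are set and read back.

```shell
#!/bin/sh
# Sketch only: the scratch repo below is an assumption for illustration,
# not the layout used on contint1001/contint2001.
set -eu
repo="$(mktemp -d)"
git init --quiet "$repo"

# Remove remote-tracking branches that no longer exist upstream on each fetch.
git -C "$repo" config fetch.prune true
# Also remove local tags that were deleted upstream (honoured by Git 2.17+).
git -C "$repo" config fetch.pruneTags true
# Stop recording reflog entries when refs are updated.
git -C "$repo" config core.logAllRefUpdates false

# Read the values back, one key at a time.
git -C "$repo" config --get fetch.prune
git -C "$repo" config --get fetch.prunetags
git -C "$repo" config --get core.logallrefupdates
```

With pruning enabled, a plain `git fetch` should behave roughly like `git fetch --prune --prune-tags`, which matches the goal stated in the channel of not leaving deleted tags and old wmf/* branches behind.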
[21:31:32] <hashar>	 yeah that will make the git operations slightly faster
[21:31:45] <hashar>	 will close the tasks on monday after I have verified
[21:31:54] <mutante>	 tells performance team, hehe
[21:32:07] <mutante>	 cool, have a good weekend then
[21:33:44] <hashar>	 :]
[21:33:50] <hashar>	 thanks a lot, have a merry week-end
[21:34:14] <mutante>	 you're welcome
[21:36:53] <wikibugs>	 (PS3) Dzahn: aptrepo: import gitlab-runner package for bullseye [puppet] - https://gerrit.wikimedia.org/r/767604 (https://phabricator.wikimedia.org/T297659)
[21:37:51] <wikibugs>	 (CR) Dzahn: "Is this what you meant, Moritz? Not exactly line 214 (because it's sorted alpha) but that must be the file you mean.. only that has "Up" [puppet] - https://gerrit.wikimedia.org/r/767604 (https://phabricator.wikimedia.org/T297659) (owner: Dzahn)
[21:38:42] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P23135 and previous config saved to /var/cache/conftool/dbconfig/20220325-213841-ladsgroup.json
[21:38:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:50:05] <icinga-wm>	 RECOVERY - SSH on db2090.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[21:53:47] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1162 (T298565)', diff saved to https://phabricator.wikimedia.org/P23136 and previous config saved to /var/cache/conftool/dbconfig/20220325-215346-ladsgroup.json
[21:53:49] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
[21:53:50] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
[21:53:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:53:51] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
[21:53:52] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[21:53:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:53:55] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
[21:53:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:54:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:54:03] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1156 (T298565)', diff saved to https://phabricator.wikimedia.org/P23137 and previous config saved to /var/cache/conftool/dbconfig/20220325-215400-ladsgroup.json
[21:54:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:54:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:58:38] <wikibugs>	 (PS1) Krinkle: [WIP] wgKartographerStaticMapframe [mediawiki-config] - https://gerrit.wikimedia.org/r/773883
[22:07:09] <icinga-wm>	 PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[22:13:49] <wikibugs>	 (PS2) Krinkle: [WIP] wgKartographerStaticMapframe [mediawiki-config] - https://gerrit.wikimedia.org/r/773883
[22:16:07] <icinga-wm>	 RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[22:16:07] <icinga-wm>	 PROBLEM - Some MediaWiki servers are running out of idle PHP-FPM workers in api_appserver at eqiad on alert1001 is CRITICAL: 0.4355 gt 0.3 https://bit.ly/wmf-fpmsat https://grafana.wikimedia.org/d/fRn9VEPMz/application-servers-use-dashboard-wip?orgId=1
[22:18:10] <wikibugs>	 (PS3) Krinkle: [WIP] wgKartographerStaticMapframe [mediawiki-config] - https://gerrit.wikimedia.org/r/773883
[22:19:10] <wikibugs>	 (PS4) Krinkle: [WIP] wgKartographerStaticMapframe [mediawiki-config] - https://gerrit.wikimedia.org/r/773883
[22:20:08] <wikibugs>	 (PS5) Krinkle: [WIP] wgKartographerStaticMapframe [mediawiki-config] - https://gerrit.wikimedia.org/r/773883
[22:20:25] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1156 (T298565)', diff saved to https://phabricator.wikimedia.org/P23138 and previous config saved to /var/cache/conftool/dbconfig/20220325-222025-ladsgroup.json
[22:20:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:20:30] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[22:20:35] <icinga-wm>	 RECOVERY - Some MediaWiki servers are running out of idle PHP-FPM workers in api_appserver at eqiad on alert1001 is OK: (C)0.3 gt (W)0.1 gt 0.04839 https://bit.ly/wmf-fpmsat https://grafana.wikimedia.org/d/fRn9VEPMz/application-servers-use-dashboard-wip?orgId=1
[22:35:30] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P23139 and previous config saved to /var/cache/conftool/dbconfig/20220325-223530-ladsgroup.json
[22:35:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:36:17] <icinga-wm>	 PROBLEM - Some MediaWiki servers are running out of idle PHP-FPM workers in api_appserver at eqiad on alert1001 is CRITICAL: 0.3871 gt 0.3 https://bit.ly/wmf-fpmsat https://grafana.wikimedia.org/d/fRn9VEPMz/application-servers-use-dashboard-wip?orgId=1
[22:50:35] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P23140 and previous config saved to /var/cache/conftool/dbconfig/20220325-225035-ladsgroup.json
[22:50:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:03:13] <icinga-wm>	 PROBLEM - Some MediaWiki servers are running out of idle PHP-FPM workers in api_appserver at eqiad on alert1001 is CRITICAL: 0.3226 gt 0.3 https://bit.ly/wmf-fpmsat https://grafana.wikimedia.org/d/fRn9VEPMz/application-servers-use-dashboard-wip?orgId=1
[23:05:40] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1156 (T298565)', diff saved to https://phabricator.wikimedia.org/P23141 and previous config saved to /var/cache/conftool/dbconfig/20220325-230540-ladsgroup.json
[23:05:42] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
[23:05:43] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
[23:05:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:05:46] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[23:05:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:05:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:06:24] <wikibugs>	 (PS6) Krinkle: List Kartographer static map exemptions and document+flip default [mediawiki-config] - https://gerrit.wikimedia.org/r/773883 (https://phabricator.wikimedia.org/T291736)
[23:30:43] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
[23:30:45] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
[23:30:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:30:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:34:00] <jinxer-wm>	 (JobUnavailable) firing: Reduced availability for job trafficserver in eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[23:34:03] <icinga-wm>	 PROBLEM - Some MediaWiki servers are running out of idle PHP-FPM workers in api_appserver at eqiad on alert1001 is CRITICAL: 0.3387 gt 0.3 https://bit.ly/wmf-fpmsat https://grafana.wikimedia.org/d/fRn9VEPMz/application-servers-use-dashboard-wip?orgId=1
[23:36:28] <wikibugs>	 (CR) Awight: [C: +1] "Seems like a good idea, and safe to experiment with.  Is there any monitoring that we can use to measure the improvement?  Or maybe not be" [mediawiki-config] - https://gerrit.wikimedia.org/r/773883 (https://phabricator.wikimedia.org/T291736) (owner: Krinkle)
[23:42:19] <wikibugs>	 (CR) Krinkle: List Kartographer static map exemptions and document+flip default (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/773883 (https://phabricator.wikimedia.org/T291736) (owner: Krinkle)
[23:43:01] <icinga-wm>	 RECOVERY - Some MediaWiki servers are running out of idle PHP-FPM workers in api_appserver at eqiad on alert1001 is OK: (C)0.3 gt (W)0.1 gt 0.08065 https://bit.ly/wmf-fpmsat https://grafana.wikimedia.org/d/fRn9VEPMz/application-servers-use-dashboard-wip?orgId=1
[23:53:02] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
[23:53:03] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
[23:53:04] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 12:00:00 on 8 hosts with reason: Maintenance
[23:53:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:53:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:53:10] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 8 hosts with reason: Maintenance
[23:53:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:53:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:58:49] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
[23:58:50] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
[23:58:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:58:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:58:55] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1146:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23142 and previous config saved to /var/cache/conftool/dbconfig/20220325-235855-ladsgroup.json
[23:59:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:59:00] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565