[00:03:13] <jinxer-wm>	 (KubernetesRsyslogDown) firing: (8) rsyslog on kubernetes1014:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[00:04:54] <icinga-wm>	 RECOVERY - Check systemd state on gitlab1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:05:59] <mutante>	 !log rsyncing /root and /mnt/gitlab-backup of gitlab1001 to /srv/gitlab-backup on gitlab1004 (/srv/gitlab-backup was automounted after creating it and has > 200G free) T274463
[00:06:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:06:08] <stashbot>	 T274463: Backups for GitLab - https://phabricator.wikimedia.org/T274463
[00:09:13] <wikibugs>	 (PS5) Cwhite: opensearch_dashboards: add backup script enable job [puppet] - https://gerrit.wikimedia.org/r/798886 (https://phabricator.wikimedia.org/T237224)
[00:09:40] <icinga-wm>	 PROBLEM - Check systemd state on gitlab1001 is CRITICAL: CRITICAL - degraded: The following units failed: full-backup.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:09:50] <wikibugs>	 (CR) CI reject: [V: -1] opensearch_dashboards: add backup script enable job [puppet] - https://gerrit.wikimedia.org/r/798886 (https://phabricator.wikimedia.org/T237224) (owner: Cwhite)
[00:09:54] <icinga-wm>	 PROBLEM - etcd request latencies on kubestagemaster1001 is CRITICAL: instance=10.64.16.203 operation={get,list,listWithCount,update} https://wikitech.wikimedia.org/wiki/Etcd/Main_cluster https://grafana.wikimedia.org/d/000000435/kubernetes-api?orgId=1&viewPanel=28
[00:10:37] <wikibugs>	 (PS6) Cwhite: opensearch_dashboards: add backup script enable job [puppet] - https://gerrit.wikimedia.org/r/798886 (https://phabricator.wikimedia.org/T237224)
[00:11:16] <wikibugs>	 (CR) CI reject: [V: -1] opensearch_dashboards: add backup script enable job [puppet] - https://gerrit.wikimedia.org/r/798886 (https://phabricator.wikimedia.org/T237224) (owner: Cwhite)
[00:12:09] <wikibugs>	 (CR) Cwhite: opensearch_dashboards: add backup script enable job (2 comments) [puppet] - https://gerrit.wikimedia.org/r/798886 (https://phabricator.wikimedia.org/T237224) (owner: Cwhite)
[00:15:19] <wikibugs>	 (PS7) Cwhite: opensearch_dashboards: add backup script enable job [puppet] - https://gerrit.wikimedia.org/r/798886 (https://phabricator.wikimedia.org/T237224)
[00:19:26] <wikibugs>	 (PS1) Dzahn: backup: switch fileset for gitlab from /mnt to /srv [puppet] - https://gerrit.wikimedia.org/r/800357 (https://phabricator.wikimedia.org/T274463)
[00:22:34] <wikibugs>	 (PS1) Dzahn: gitlab::dump: backup files on gitlab1004 in Bacula [puppet] - https://gerrit.wikimedia.org/r/800358 (https://phabricator.wikimedia.org/T274463)
[00:24:05] <wikibugs>	 (CR) Dzahn: "not all paths in the file set exist on this host, only /srv/gitlab-backups but I would hope Bacula doesn't care and just skips what isn't " [puppet] - https://gerrit.wikimedia.org/r/800358 (https://phabricator.wikimedia.org/T274463) (owner: Dzahn)
[00:26:06] <icinga-wm>	 RECOVERY - etcd request latencies on kubestagemaster1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Etcd/Main_cluster https://grafana.wikimedia.org/d/000000435/kubernetes-api?orgId=1&viewPanel=28
[00:26:26] <icinga-wm>	 RECOVERY - Disk space on gitlab1001 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=gitlab1001&var-datasource=eqiad+prometheus/ops
[00:26:32] <mutante>	 !log gitlab1001 deleted backups from last 3 days after rsync to gitlab1004 - freeing disk space, starting the full-backup service once again, should finish now without running out of disk - T2744463
[00:26:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:26:54] <mutante>	 !log gitlab1001 deleted backups from last 3 days after rsync to gitlab1004 - freeing disk space, starting the full-backup service once again, should finish now without running out of disk - T274463
[00:26:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:27:00] <stashbot>	 T274463: Backups for GitLab - https://phabricator.wikimedia.org/T274463
[00:28:08] <icinga-wm>	 RECOVERY - Check systemd state on gitlab1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:31:03] <wikibugs>	 (PS1) Dzahn: site/gitlab: make gitlab2002 another backup dump location [puppet] - https://gerrit.wikimedia.org/r/800366 (https://phabricator.wikimedia.org/T274463)
[00:31:39] <wikibugs>	 (CR) Dzahn: [C: -1] "not yet (gitlab1001 still tries to dump to /mnt) but soon and we need to not forget this" [puppet] - https://gerrit.wikimedia.org/r/800357 (https://phabricator.wikimedia.org/T274463) (owner: Dzahn)
[00:35:22] <wikibugs>	 (CR) Dzahn: [C: +2] "https://puppet-compiler.wmflabs.org/pcc-worker1003/35582/" [puppet] - https://gerrit.wikimedia.org/r/800366 (https://phabricator.wikimedia.org/T274463) (owner: Dzahn)
[00:40:55] <wikibugs>	 (PS1) Dzahn: gitlab::dump: add gitlab1004 to allowed hosts [puppet] - https://gerrit.wikimedia.org/r/800384 (https://phabricator.wikimedia.org/T274463)
[00:42:13] <jinxer-wm>	 (KubernetesRsyslogDown) firing: (3) rsyslog on kubestage1003:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[00:43:37] <wikibugs>	 (CR) Dzahn: [C: +2] gitlab::dump: add gitlab1004 to allowed hosts [puppet] - https://gerrit.wikimedia.org/r/800384 (https://phabricator.wikimedia.org/T274463) (owner: Dzahn)
[00:45:55] <mutante>	 !log rsyncing /srv/gitlab-backup from gitlab1004 to gitlab2002 | systemctl status full-backup ..in progress on gitlab1001 - T274463
[00:46:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:46:02] <stashbot>	 T274463: Backups for GitLab - https://phabricator.wikimedia.org/T274463
[00:53:01] <icinga-wm>	 RECOVERY - SSH on labweb1002.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[00:54:54] <wikibugs>	 (CR) Ori: "I cherry-picked this on the Beta Cluster puppet master and confirmed that logs from the function-* services made it to logstash." [puppet] - https://gerrit.wikimedia.org/r/800282 (https://phabricator.wikimedia.org/T309319) (owner: Ori)
[01:02:13] <icinga-wm>	 PROBLEM - SSH on wtp1025.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[01:07:09] <icinga-wm>	 PROBLEM - Docker registry HTTPS interface on registry1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Docker
[01:07:18] <jinxer-wm>	 (ProbeDown) firing: Service docker-registry:443 has failed probes (http_docker-registry_ip4) - https://wikitech.wikimedia.org/wiki/Network_monitoring#ProbeDown - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=http - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[01:07:19] <jinxer-wm>	 (ProbeDown) firing: Service docker-registry:443 has failed probes (http_docker-registry_ip4) #page - https://wikitech.wikimedia.org/wiki/Network_monitoring#ProbeDown - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=http - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[01:08:36] <rzl>	 👋 looking
[01:08:37] <icinga-wm>	 RECOVERY - Docker registry HTTPS interface on registry1003 is OK: HTTP OK: HTTP/1.1 200 OK - 3753 bytes in 6.016 second response time https://wikitech.wikimedia.org/wiki/Docker
[01:08:37] <icinga-wm>	 PROBLEM - Gitlab HTTPS healthcheck on gitlab.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/GitLab%23Monitoring
[01:09:09] <icinga-wm>	 PROBLEM - etcd request latencies on kubestagemaster1001 is CRITICAL: instance=10.64.16.203 operation={get,list,listWithCount,update} https://wikitech.wikimedia.org/wiki/Etcd/Main_cluster https://grafana.wikimedia.org/d/000000435/kubernetes-api?orgId=1&viewPanel=28
[01:12:18] <jinxer-wm>	 (ProbeDown) resolved: Service docker-registry:443 has failed probes (http_docker-registry_ip4) #page - https://wikitech.wikimedia.org/wiki/Network_monitoring#ProbeDown - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=http - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[01:12:19] <jinxer-wm>	 (ProbeDown) resolved: Service docker-registry:443 has failed probes (http_docker-registry_ip4) - https://wikitech.wikimedia.org/wiki/Network_monitoring#ProbeDown - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=http - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[01:12:30] <icinga-wm>	 ACKNOWLEDGEMENT - Gitlab HTTPS healthcheck on gitlab.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds daniel_zahn backup in progress https://wikitech.wikimedia.org/wiki/GitLab%23Monitoring
[01:12:30] <icinga-wm>	 ACKNOWLEDGEMENT - Disk space on gitlab1001 is CRITICAL: DISK CRITICAL - free space: /mnt/gitlab-backup 2956 MB (3% inode=99%): daniel_zahn backup in progress https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=gitlab1001&var-datasource=eqiad+prometheus/ops
[01:13:34] <rzl>	 having trouble loading grafana dashboards
[01:13:49] <icinga-wm>	 PROBLEM - PHP7 rendering on mwdebug1002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[01:13:55] <mutante>	 kind of busy handling gitlab alerts
[01:14:10] <rzl>	 ack
[01:14:11] <mutante>	 but on that one. only got the resolved page now
[01:14:49] <rzl>	 grafana1002 is super sluggish over ssh also
[01:15:39] <icinga-wm>	 RECOVERY - Gitlab HTTPS healthcheck on gitlab.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 122712 bytes in 1.748 second response time https://wikitech.wikimedia.org/wiki/GitLab%23Monitoring
[01:15:41] <icinga-wm>	 RECOVERY - PHP7 rendering on mwdebug1002 is OK: HTTP OK: HTTP/1.1 302 Found - 566 bytes in 6.714 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[01:15:50] <mutante>	 ^ ok.. that for now
[01:16:12] <rzl>	 is it possible the gitlab backup caused some network saturation?
[01:16:15] <mutante>	 grafana1002 - i can get on it and load seems to recover
[01:16:21] <mutante>	 yes, it is
[01:16:31] <mutante>	 I am still copying actually
[01:16:48] <mutante>	 but I can stop it 
[01:17:00] <mutante>	 I just wanted to feel better by having a second copy in the other DC
[01:17:13] <mutante>	 because.. long story, but otherwise we only had one copy
[01:17:21] <mutante>	 and backups need to be fixed
[01:19:29] <mutante>	 rzl: I have 37G of 45G I need .. hrmm
[01:19:52] <mutante>	 41 now
[01:20:21] <rzl>	 mutante: understood -- grafana is recovering but still sluggish, 
[01:20:22] <mutante>	 grafana dashboard loading
[01:20:27] <rzl>	 well there was supposed to be a link there
[01:20:34] <rzl>	 https://grafana.wikimedia.org/goto/d6GuDM9nk?orgId=1
[01:21:09] <rzl>	 looking into docker-registry and the other stuff that hiccuped too, not sure if they shared a common row or something
[01:21:19] <rzl>	 I mean, all VMs, but maybe on the same host
[01:21:35] <mutante>	 nothing happened when I copied within the same DC. then I got the idea to also copy cross DC
[01:21:40] <mutante>	 it's rack A1
[01:21:49] <rzl>	 if we do that again, maybe we throttle the transfer :)
[01:21:54] <mutante>	 setting that to active in netbox as well
[01:22:42] <mutante>	 the source is the same ganeti host I guess
[01:23:13] <rzl>	 and ack, re A1 -- I can see the traffic increases in librenms but I'm not network-smart enough to know if that was actually the cause of those healthcheck failures -- seems plausible though
[01:23:18] <mutante>	 absolutely, will use --bandwidth something if I do that again (I hope that's not the case that we have to)
[01:23:33] <rzl>	 I'm going to wander back off if you're all set, then :) thanks
[01:24:07] <mutante>	 sorry about the alert. thanks 
[01:24:20] <mutante>	 it's a mess with the gitlab backups :(
[01:24:39] <mutante>	 and I did that to avoid Murphy's law, not trigger it, heh
[01:25:25] <mutante>	 the good part: at least we _have_ a complete full backup now. that wasn't the case. laters
[01:27:07] <icinga-wm>	 PROBLEM - k8s API server requests latencies on kubestagemaster1001 is CRITICAL: instance=10.64.16.203 verb={LIST,PATCH,UPDATE} https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/d/000000435/kubernetes-api?orgId=1&viewPanel=27
[01:28:08] <mutante>	 rsync finished
[01:30:33] <icinga-wm>	 RECOVERY - Check systemd state on gitlab1003 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[01:32:14] <mutante>	 woah, and that means restore of the backup on the passive host worked as well.. that's good
[01:33:27] <icinga-wm>	 RECOVERY - etcd request latencies on kubestagemaster1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Etcd/Main_cluster https://grafana.wikimedia.org/d/000000435/kubernetes-api?orgId=1&viewPanel=28
[01:33:29] <icinga-wm>	 RECOVERY - k8s API server requests latencies on kubestagemaster1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/d/000000435/kubernetes-api?orgId=1&viewPanel=27
[01:38:45] <jinxer-wm>	 (JobUnavailable) firing: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[01:43:45] <jinxer-wm>	 (JobUnavailable) resolved: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[01:45:19] <icinga-wm>	 PROBLEM - puppet last run on gitlab1003 is CRITICAL: CRITICAL: Puppet last ran 1 day ago https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[01:49:13] <icinga-wm>	 PROBLEM - SSH on wtp1046.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[01:50:29] <icinga-wm>	 PROBLEM - Check systemd state on ms-be1039 is CRITICAL: CRITICAL - degraded: The following units failed: session-341469.scope https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[01:57:25] <icinga-wm>	 RECOVERY - puppet last run on gitlab1003 is OK: OK: Puppet is currently enabled, last run 5 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[02:01:12] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1179 (T298560)', diff saved to https://phabricator.wikimedia.org/P28618 and previous config saved to /var/cache/conftool/dbconfig/20220527-020111-ladsgroup.json
[02:01:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:01:20] <stashbot>	 T298560: Fix mismatching field type of revision.rev_timestamp on wmf wikis - https://phabricator.wikimedia.org/T298560
[02:03:07] <icinga-wm>	 RECOVERY - SSH on wtp1025.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[02:16:17] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P28619 and previous config saved to /var/cache/conftool/dbconfig/20220527-021616-ladsgroup.json
[02:16:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:25:03] <icinga-wm>	 PROBLEM - SSH on aqs1008.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[02:31:22] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P28620 and previous config saved to /var/cache/conftool/dbconfig/20220527-023122-ladsgroup.json
[02:31:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:46:27] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1179 (T298560)', diff saved to https://phabricator.wikimedia.org/P28621 and previous config saved to /var/cache/conftool/dbconfig/20220527-024627-ladsgroup.json
[02:46:31] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2105.codfw.wmnet with reason: Maintenance
[02:46:32] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2105.codfw.wmnet with reason: Maintenance
[02:46:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:46:34] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on 6 hosts with reason: Maintenance
[02:46:34] <stashbot>	 T298560: Fix mismatching field type of revision.rev_timestamp on wmf wikis - https://phabricator.wikimedia.org/T298560
[02:46:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:46:38] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on 6 hosts with reason: Maintenance
[02:46:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:46:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:46:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:50:13] <icinga-wm>	 RECOVERY - SSH on wtp1046.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[02:50:43] <icinga-wm>	 PROBLEM - Router interfaces on cr2-eqsin is CRITICAL: CRITICAL: host 103.102.166.130, interfaces up: 69, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[02:52:45] <icinga-wm>	 RECOVERY - Router interfaces on cr2-eqsin is OK: OK: host 103.102.166.130, interfaces up: 70, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[03:13:55] <icinga-wm>	 RECOVERY - Backup freshness on backup1001 is OK: Fresh: 111 jobs https://wikitech.wikimedia.org/wiki/Bacula%23Monitoring
[03:22:03] <icinga-wm>	 PROBLEM - WDQS SPARQL on wdqs1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Runbook
[03:24:25] <icinga-wm>	 PROBLEM - Check systemd state on doc1001 is CRITICAL: CRITICAL - degraded: The following units failed: rsync-doc-doc1002.eqiad.wmnet.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[03:26:05] <icinga-wm>	 RECOVERY - WDQS SPARQL on wdqs1004 is OK: HTTP OK: HTTP/1.1 200 OK - 689 bytes in 1.069 second response time https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Runbook
[03:50:51] <DannyS712>	 jenkins seems to think that every patch is a merge conflict
[03:50:51] <DannyS712>	 https://gerrit.wikimedia.org/r/c/mediawiki/extensions/CheckUser/+/799430/14
[03:50:51] <DannyS712>	 https://gerrit.wikimedia.org/r/c/mediawiki/extensions/SyntaxHighlight_GeSHi/+/793622
[03:51:54] <DannyS712>	 https://gerrit.wikimedia.org/r/c/mediawiki/extensions/examples/+/799464 literally just creating an empty patch
[03:58:32] <DannyS712>	 ^ reported at T309371
[03:58:33] <stashbot>	 T309371: Gerrit: all patches are being reported as merge conflicts - https://phabricator.wikimedia.org/T309371
[04:03:14] <jinxer-wm>	 (KubernetesRsyslogDown) firing: (8) rsyslog on kubernetes1014:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[04:18:29] <icinga-wm>	 RECOVERY - Check systemd state on doc1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[04:42:14] <jinxer-wm>	 (KubernetesRsyslogDown) firing: (3) rsyslog on kubestage1003:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[04:54:03] <icinga-wm>	 PROBLEM - Router interfaces on cr2-eqsin is CRITICAL: CRITICAL: host 103.102.166.130, interfaces up: 69, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[04:54:41] <icinga-wm>	 PROBLEM - Router interfaces on cr4-ulsfo is CRITICAL: CRITICAL: host 198.35.26.193, interfaces up: 69, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[05:05:09] <icinga-wm>	 RECOVERY - Router interfaces on cr4-ulsfo is OK: OK: host 198.35.26.193, interfaces up: 70, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[05:06:43] <icinga-wm>	 RECOVERY - Router interfaces on cr2-eqsin is OK: OK: host 103.102.166.130, interfaces up: 70, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[05:19:00] <wikibugs>	 SRE, ops-eqiad: db1128 faulty memory - https://phabricator.wikimedia.org/T309291 (Marostegui) Sounds good @wiki_willy - let us know when we'd need to schedule some downtime for the host. Thanks!
[05:35:11] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'es1032 (re)pooling @ 5%: After upgrading mysql', diff saved to https://phabricator.wikimedia.org/P28622 and previous config saved to /var/cache/conftool/dbconfig/20220527-053510-root.json
[05:35:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:37:30] <wikibugs>	 (PS1) Marostegui: control-mariadb-10.6-bullseye: Bump version [software] - https://gerrit.wikimedia.org/r/800596 (https://phabricator.wikimedia.org/T308915)
[05:37:38] <wikibugs>	 (CR) CI reject: [V: -1] control-mariadb-10.6-bullseye: Bump version [software] - https://gerrit.wikimedia.org/r/800596 (https://phabricator.wikimedia.org/T308915) (owner: Marostegui)
[05:39:24] <marostegui>	 uh?
[05:39:56] <wikibugs>	 (CR) Marostegui: "recheck" [software] - https://gerrit.wikimedia.org/r/800596 (https://phabricator.wikimedia.org/T308915) (owner: Marostegui)
[05:40:47] <wikibugs>	 (CR) Marostegui: [C: +2] control-mariadb-10.6-bullseye: Bump version [software] - https://gerrit.wikimedia.org/r/800596 (https://phabricator.wikimedia.org/T308915) (owner: Marostegui)
[05:41:21] <wikibugs>	 (Merged) jenkins-bot: control-mariadb-10.6-bullseye: Bump version [software] - https://gerrit.wikimedia.org/r/800596 (https://phabricator.wikimedia.org/T308915) (owner: Marostegui)
[05:48:40] <wikibugs>	 SRE, Cloud-Services, Datasets-General-or-Unknown, affects-Kiwix-and-openZIM: Mirror more Kiwix downloads directories - https://phabricator.wikimedia.org/T57503 (Kelson) @ArielGlenn Can you please reassign the ticket? I have no clue who - concretely - is WMCS?
[05:50:15] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'es1032 (re)pooling @ 10%: After upgrading mysql', diff saved to https://phabricator.wikimedia.org/P28623 and previous config saved to /var/cache/conftool/dbconfig/20220527-055014-root.json
[05:50:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:05:18] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'es1032 (re)pooling @ 25%: After upgrading mysql', diff saved to https://phabricator.wikimedia.org/P28624 and previous config saved to /var/cache/conftool/dbconfig/20220527-060518-root.json
[06:05:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:07:49] <icinga-wm>	 PROBLEM - SSH on wtp1025.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[06:11:21] <icinga-wm>	 PROBLEM - SSH on wtp1040.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[06:12:35] <icinga-wm>	 PROBLEM - CirrusSearch eqiad 95th percentile latency on graphite1004 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [1000.0] https://wikitech.wikimedia.org/wiki/Search%23Health/Activity_Monitoring https://grafana.wikimedia.org/d/000000455/elasticsearch-percentiles?orgId=1&var-cirrus_group=eqiad&var-cluster=elasticsearch&var-exported_cluster=production-search&var-smoothing=1
[06:17:09] <icinga-wm>	 RECOVERY - CirrusSearch eqiad 95th percentile latency on graphite1004 is OK: OK: Less than 20.00% above the threshold [500.0] https://wikitech.wikimedia.org/wiki/Search%23Health/Activity_Monitoring https://grafana.wikimedia.org/d/000000455/elasticsearch-percentiles?orgId=1&var-cirrus_group=eqiad&var-cluster=elasticsearch&var-exported_cluster=production-search&var-smoothing=1
[06:20:22] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'es1032 (re)pooling @ 50%: After upgrading mysql', diff saved to https://phabricator.wikimedia.org/P28625 and previous config saved to /var/cache/conftool/dbconfig/20220527-062022-root.json
[06:20:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:20:46] <wikibugs>	 SRE, SRE-Access-Requests: Requesting access to contint-admins for taavi - https://phabricator.wikimedia.org/T309375 (Majavah)
[06:20:47] <wikibugs>	 (Abandoned) Elukey: Add Aiko and Kevin to the deployment group [puppet] - https://gerrit.wikimedia.org/r/791036 (https://phabricator.wikimedia.org/T307927) (owner: Elukey)
[06:29:39] <icinga-wm>	 RECOVERY - SSH on aqs1008.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[06:32:31] <wikibugs>	 SRE, SRE-Access-Requests: Requesting access to contint-admins for taavi - https://phabricator.wikimedia.org/T309375 (TheresNoTime) fwiw, +1 - be very useful to have an additional user who could resolve issues like {T309371}
[06:35:26] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'es1032 (re)pooling @ 75%: After upgrading mysql', diff saved to https://phabricator.wikimedia.org/P28626 and previous config saved to /var/cache/conftool/dbconfig/20220527-063525-root.json
[06:35:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:39:01] <jinxer-wm>	 (CirrusSearchHighOldGCFrequency) firing: Elasticsearch instance elastic1076-production-search-eqiad is running the old gc excessively - https://wikitech.wikimedia.org/wiki/Search#Stuck_in_old_GC_hell - https://grafana.wikimedia.org/d/000000462/elasticsearch-memory - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchHighOldGCFrequency
[06:44:01] <jinxer-wm>	 (CirrusSearchHighOldGCFrequency) resolved: Elasticsearch instance elastic1076-production-search-eqiad is running the old gc excessively - https://wikitech.wikimedia.org/wiki/Search#Stuck_in_old_GC_hell - https://grafana.wikimedia.org/d/000000462/elasticsearch-memory - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchHighOldGCFrequency
[06:46:26] <wikibugs>	 (PS4) Slyngshede: Remove cleanup on unused Fairscheduler for Hadoop. [puppet] - https://gerrit.wikimedia.org/r/799257
[06:46:35] <wikibugs>	 (CR) CI reject: [V: -1] Remove cleanup on unused Fairscheduler for Hadoop. [puppet] - https://gerrit.wikimedia.org/r/799257 (owner: Slyngshede)
[06:47:27] <wikibugs>	 (PS5) Slyngshede: Remove cleanup on unused Fairscheduler for Hadoop. [puppet] - https://gerrit.wikimedia.org/r/799257
[06:47:37] <wikibugs>	 (CR) CI reject: [V: -1] Remove cleanup on unused Fairscheduler for Hadoop. [puppet] - https://gerrit.wikimedia.org/r/799257 (owner: Slyngshede)
[06:50:30] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'es1032 (re)pooling @ 100%: After upgrading mysql', diff saved to https://phabricator.wikimedia.org/P28627 and previous config saved to /var/cache/conftool/dbconfig/20220527-065029-root.json
[06:50:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:52:30] <wikibugs>	 (PS1) Slyngshede: Remove cleanup on unused Fairscheduler for Hadoop. [puppet] - https://gerrit.wikimedia.org/r/800612
[06:52:39] <wikibugs>	 (CR) CI reject: [V: -1] Remove cleanup on unused Fairscheduler for Hadoop. [puppet] - https://gerrit.wikimedia.org/r/800612 (owner: Slyngshede)
[06:53:22] <wikibugs>	 (Abandoned) Slyngshede: Remove cleanup on unused Fairscheduler for Hadoop. [puppet] - https://gerrit.wikimedia.org/r/800612 (owner: Slyngshede)
[06:55:52] <TheresNoTime>	 Anyone around who can restart zuul? https://phabricator.wikimedia.org/T308943#7947453 suggests it's the resolution to T309371
[06:55:53] <stashbot>	 T309371: Gerrit: all patches are being reported as merge conflicts - https://phabricator.wikimedia.org/T309371
[07:00:04] <jouncebot>	 Deploy window No deploys all day! See Deployments/Emergencies if things are broken. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220527T0700)
[07:01:53] <wikibugs>	 (PS1) Slyngshede: P:hadoop::master - Remove Hadoop FairScheduler log cleanup. [puppet] - https://gerrit.wikimedia.org/r/800614
[07:02:02] <wikibugs>	 (CR) CI reject: [V: -1] P:hadoop::master - Remove Hadoop FairScheduler log cleanup. [puppet] - https://gerrit.wikimedia.org/r/800614 (owner: Slyngshede)
[07:02:41] <wikibugs>	 (Abandoned) Slyngshede: Remove cleanup on unused Fairscheduler for Hadoop. [puppet] - https://gerrit.wikimedia.org/r/799257 (owner: Slyngshede)
[07:12:16] <icinga-wm>	 RECOVERY - SSH on wtp1040.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[07:17:29] <slyngs>	 TheresNoTime: I'll give it a whack, and see if it behaves better
[07:21:58] <wikibugs>	 (Restored) Slyngshede: Remove cleanup on unused Fairscheduler for Hadoop. [puppet] - https://gerrit.wikimedia.org/r/799257 (owner: Slyngshede)
[07:22:55] <wikibugs>	 (PS2) Slyngshede: P:hadoop::master - Remove Hadoop FairScheduler log cleanup [puppet] - https://gerrit.wikimedia.org/r/800614
[07:23:32] <wikibugs>	 (CR) CI reject: [V: -1] P:hadoop::master - Remove Hadoop FairScheduler log cleanup [puppet] - https://gerrit.wikimedia.org/r/800614 (owner: Slyngshede)
[07:24:56] <wikibugs>	 (PS3) Slyngshede: P:hadoop::master - Remove Hadoop FairScheduler log cleanup [puppet] - https://gerrit.wikimedia.org/r/800614
[07:25:02] <wikibugs>	 SRE, SRE-Access-Requests: Requesting access to contint-admins for taavi - https://phabricator.wikimedia.org/T309375 (Zabe)
[07:28:30] <wikibugs>	 (PS6) Slyngshede: Remove cleanup on unused Fairscheduler for Hadoop. [puppet] - https://gerrit.wikimedia.org/r/799257
[07:29:06] <wikibugs>	 (CR) jenkins-bot: Remove cleanup on unused Fairscheduler for Hadoop. [puppet] - https://gerrit.wikimedia.org/r/799257 (owner: Slyngshede)
[07:30:28] <wikibugs>	 (PS7) Slyngshede: Remove cleanup on unused Fairscheduler for Hadoop. [puppet] - https://gerrit.wikimedia.org/r/799257
[07:30:56] <slyngs>	 TheresNoTime: It seems to be running again.
[07:31:11] <TheresNoTime>	 slyngs: thank you! :D
[07:32:04] <_joe_>	 TheresNoTime: sigh sorry I was in the shower
[07:32:07] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 03+1] aptrepo: add opensearch2 thirdparty component [puppet] - 10https://gerrit.wikimedia.org/r/800294 (https://phabricator.wikimedia.org/T304440) (owner: 10Cwhite)
[07:32:38] <slyngs>	 _joe_: I view it as a learning experience... that and it explained why my own stuff broke :-)
[07:32:47] <_joe_>	 eheh 
[07:32:47] <TheresNoTime>	 :-P
[07:33:23] <slyngs>	 _joe_: Zuul on contint1001 is masked though, and I'm not sure if that's by design
[07:33:37] <_joe_>	 uhm
[07:34:24] <_joe_>	 I assume that's to get it not to restart automatically by package upgrades?
[07:34:36] <_joe_>	 but hasharAway will know better when he's back
[07:35:13] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 03+1] "LGTM!" [puppet] - 10https://gerrit.wikimedia.org/r/798886 (https://phabricator.wikimedia.org/T237224) (owner: 10Cwhite)
[07:37:38] <_joe_>	 slyngs: ah I see, contint2001 is the currently active server
[07:37:48] <_joe_>	 so that's why zuul is masked on 1001
[07:38:12] <TheresNoTime>	 that'll do it! :D
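Note on the exchange above: whether a systemd unit is masked can be read from `systemctl is-enabled`, which prints `masked` for such units. The following is a minimal sketch of that check, not the procedure anyone in the channel ran. The unit name `zuul` in the usage comment is an assumption; the log only says that Zuul is masked on contint1001.

```python
import subprocess


def classify_unit_state(is_enabled_output: str) -> str:
    """Map the text printed by `systemctl is-enabled <unit>` to a short verdict."""
    state = is_enabled_output.strip()
    if state in ("masked", "masked-runtime"):
        return "masked"
    if state in ("enabled", "enabled-runtime"):
        return "enabled"
    # Any other state (disabled, static, ...) is passed through unchanged;
    # empty output (for example an unknown unit) is reported as "unknown".
    return state or "unknown"


def unit_state(unit: str) -> str:
    """Ask systemd for the enablement state of a unit on the local host."""
    # `systemctl is-enabled` exits non-zero for masked or disabled units,
    # so the exit code is deliberately not treated as an error here.
    result = subprocess.run(
        ["systemctl", "is-enabled", unit],
        capture_output=True,
        text=True,
        check=False,
    )
    return classify_unit_state(result.stdout)


# Hypothetical usage on a CI host (unit name is an assumption):
#   unit_state("zuul")
```

A masked unit on the passive host is consistent with _joe_'s explanation: the service is meant to run only on the currently active server.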
[07:38:36] <wikibugs>	 (03PS8) 10Slyngshede: Remove cleanup on unused Fairscheduler for Hadoop. [puppet] - 10https://gerrit.wikimedia.org/r/799257
[07:39:10] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] Remove cleanup on unused Fairscheduler for Hadoop. [puppet] - 10https://gerrit.wikimedia.org/r/799257 (owner: 10Slyngshede)
[07:39:20] <slyngs>	 _joe_: How do you check which server is active?
[07:39:46] <_joe_>	 slyngs: see operations/puppet/hieradata/hosts/contint1001.yaml and the corresponding file for 2001
[07:40:21] <wikibugs>	 (03PS9) 10Slyngshede: Remove cleanup on unused Fairscheduler for Hadoop. [puppet] - 10https://gerrit.wikimedia.org/r/799257
[07:40:40] <_joe_>	 slyngs: for multi-dc stuff that is used to serve user traffic, we have discovery.wmnet records pointing to the nearest active cluster for you
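Note on the discovery.wmnet records mentioned above: for services that have one, looking up the record shows where traffic is currently pointed. The following is a rough sketch of such a lookup using only the Python standard library. The `<service>.discovery.wmnet` naming and the service name in the usage comment are assumptions, and these names probably resolve only from inside the production network. As _joe_ says, this applies to user-facing multi-dc services; for contint the hieradata host files are the place to look.

```python
import socket


def discovery_name(service: str) -> str:
    """Build a discovery hostname; the <service>.discovery.wmnet pattern is assumed."""
    return f"{service}.discovery.wmnet"


def resolve_addresses(hostname: str) -> list[str]:
    """Return the sorted, de-duplicated addresses a hostname resolves to."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        # Name does not resolve from this host (or no resolver is reachable).
        return []
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address string for both IPv4 and IPv6.
    return sorted({info[4][0] for info in infos})


# Hypothetical usage (the service name is a placeholder, not taken from the log):
#   resolve_addresses(discovery_name("some-service"))
```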
[07:45:16] <wikibugs>	 (03CR) 10Slyngshede: [V: 03+1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/35584/console" [puppet] - 10https://gerrit.wikimedia.org/r/799257 (owner: 10Slyngshede)
[07:47:15] <wikibugs>	 (03CR) 10Slyngshede: "Updated patch to remove timer and cleanup script." [puppet] - 10https://gerrit.wikimedia.org/r/799257 (owner: 10Slyngshede)
[07:47:56] <wikibugs>	 (03Abandoned) 10Slyngshede: P:hadoop::master - Remove Hadoop FairScheduler log cleanup [puppet] - 10https://gerrit.wikimedia.org/r/800614 (owner: 10Slyngshede)
[07:59:39] <icinga-wm>	 RECOVERY - MegaRAID on analytics1068 is OK: OK: optimal, 13 logical, 14 physical, WriteBack policy https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[08:03:13] <jinxer-wm>	 (KubernetesRsyslogDown) firing: (8) rsyslog on kubernetes1014:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[08:07:50] <_joe_>	 !log restarted rsyslog on kubernetes1014
[08:07:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:12:59] <jinxer-wm>	 (KubernetesRsyslogDown) firing: (8) rsyslog on kubernetes1014:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[08:27:59] <jinxer-wm>	 (KubernetesRsyslogDown) resolved: (8) rsyslog on kubernetes1014:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[08:31:58] <jinxer-wm>	 (KubernetesRsyslogDown) resolved: (3) rsyslog on kubestage1003:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[08:44:47] <wikibugs>	 (03PS4) 10Filippo Giunchedi: puppetdb: create dbs before grants [puppet] - 10https://gerrit.wikimedia.org/r/800031 (https://phabricator.wikimedia.org/T296550)
[08:44:49] <wikibugs>	 (03PS7) 10Filippo Giunchedi: cfssl: write pretty json [puppet] - 10https://gerrit.wikimedia.org/r/800029
[08:45:54] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] cfssl: write pretty json [puppet] - 10https://gerrit.wikimedia.org/r/800029 (owner: 10Filippo Giunchedi)
[08:46:08] <wikibugs>	 (03CR) 10Filippo Giunchedi: puppetdb: create dbs before grants (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/800031 (https://phabricator.wikimedia.org/T296550) (owner: 10Filippo Giunchedi)
[08:48:07] <wikibugs>	 (03CR) 10Elukey: [V: 03+1] "Thanks for the review!" [puppet] - 10https://gerrit.wikimedia.org/r/793714 (https://phabricator.wikimedia.org/T302232) (owner: 10Elukey)
[08:48:09] <wikibugs>	 (03PS8) 10Filippo Giunchedi: cfssl: write pretty json [puppet] - 10https://gerrit.wikimedia.org/r/800029
[08:48:29] <wikibugs>	 (03PS11) 10Elukey: Add new Cassandra cluster for ML cache/feature-store workloads in eqiad [puppet] - 10https://gerrit.wikimedia.org/r/793714 (https://phabricator.wikimedia.org/T302232)
[08:49:26] <wikibugs>	 (03CR) 10Elukey: [V: 03+1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/35585/console" [puppet] - 10https://gerrit.wikimedia.org/r/793714 (https://phabricator.wikimedia.org/T302232) (owner: 10Elukey)
[08:54:26] <wikibugs>	 (03PS1) 10Jelto: idp: add gitlab-new to idp [puppet] - 10https://gerrit.wikimedia.org/r/800666 (https://phabricator.wikimedia.org/T307142)
[08:58:13] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1175.eqiad.wmnet with reason: Maintenance
[08:58:14] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1175.eqiad.wmnet with reason: Maintenance
[08:58:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:58:19] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1175 (T298560)', diff saved to https://phabricator.wikimedia.org/P28629 and previous config saved to /var/cache/conftool/dbconfig/20220527-085819-ladsgroup.json
[08:58:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:58:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:58:26] <stashbot>	 T298560: Fix mismatching field type of revision.rev_timestamp on wmf wikis - https://phabricator.wikimedia.org/T298560
[09:01:55] <icinga-wm>	 PROBLEM - SSH on labweb1002.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[09:05:09] <icinga-wm>	 PROBLEM - SSH on cp5012.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[09:08:09] <wikibugs>	 (03PS1) 10Kevin Bazira: ml-services: add euwiki & fawiki articlequality isvcs [deployment-charts] - 10https://gerrit.wikimedia.org/r/800670 (https://phabricator.wikimedia.org/T307418)
[09:16:31] <wikibugs>	 (03CR) 10Elukey: [C: 03+1] "Left a note that can be bypassed, thanks!" [puppet] - 10https://gerrit.wikimedia.org/r/799257 (owner: 10Slyngshede)
[09:17:01] <wikibugs>	 (03CR) 10Elukey: [C: 03+2] ml-services: add euwiki & fawiki articlequality isvcs [deployment-charts] - 10https://gerrit.wikimedia.org/r/800670 (https://phabricator.wikimedia.org/T307418) (owner: 10Kevin Bazira)
[09:17:14] <wikibugs>	 (03CR) 10Jelto: [C: 03+2] wikimedia.org: add gitlab-new records + PTR [dns] - 10https://gerrit.wikimedia.org/r/799334 (https://phabricator.wikimedia.org/T307142) (owner: 10Jelto)
[09:17:16] <wikibugs>	 (03CR) 10Jbond: [C: 03+2] "LGTM thanks" [puppet] - 10https://gerrit.wikimedia.org/r/800252 (https://phabricator.wikimedia.org/T308013) (owner: 10Zabe)
[09:17:25] <wikibugs>	 (03PS3) 10Jelto: wikimedia.org: add gitlab-new records + PTR [dns] - 10https://gerrit.wikimedia.org/r/799334 (https://phabricator.wikimedia.org/T307142)
[09:17:48] <wikibugs>	 (03CR) 10AikoChou: [C: 03+1] ml-services: add euwiki & fawiki articlequality isvcs [deployment-charts] - 10https://gerrit.wikimedia.org/r/800670 (https://phabricator.wikimedia.org/T307418) (owner: 10Kevin Bazira)
[09:17:59] <wikibugs>	 10SRE, 10SRE-Access-Requests: Requesting access to PII in Superset for TheresNoTime - https://phabricator.wikimedia.org/T309383 (10TheresNoTime)
[09:18:24] <wikibugs>	 (03CR) 10Jbond: [C: 03+2] profile::auto_restarts: make comment match class name, minor grammar [puppet] - 10https://gerrit.wikimedia.org/r/800235 (owner: 10Dzahn)
[09:19:31] <wikibugs>	 (03CR) 10Jbond: [C: 03+1] parsoid::testing: add an auto_restart service for nginx [puppet] - 10https://gerrit.wikimedia.org/r/800241 (owner: 10Dzahn)
[09:22:27] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1127.eqiad.wmnet with reason: Maintenance
[09:22:28] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1127.eqiad.wmnet with reason: Maintenance
[09:22:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:22:33] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1127 (T298560)', diff saved to https://phabricator.wikimedia.org/P28630 and previous config saved to /var/cache/conftool/dbconfig/20220527-092233-ladsgroup.json
[09:22:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:22:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:22:41] <stashbot>	 T298560: Fix mismatching field type of revision.rev_timestamp on wmf wikis - https://phabricator.wikimedia.org/T298560
[09:23:11] <wikibugs>	 10SRE, 10SRE-Access-Requests: Requesting access to PII in Superset for TheresNoTime - https://phabricator.wikimedia.org/T309383 (10KSiebert) I am Sammy's manager and give my permission.
[09:24:41] <Amir1>	 !log killed hewiki's refresh link suggestions job (T299021)
[09:24:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:24:47] <stashbot>	 T299021: Shorten running time of refreshLinkRecommendations.php - https://phabricator.wikimedia.org/T299021
[09:26:01] <logmsgbot>	 !log kevinbazira@deploy1002 helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
[09:26:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:26:24] <logmsgbot>	 !log kevinbazira@deploy1002 helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
[09:26:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:29:01] <wikibugs>	 (03PS2) 10Jbond: resolvconf: Add SPDX headers [puppet] - 10https://gerrit.wikimedia.org/r/800255 (https://phabricator.wikimedia.org/T308013) (owner: 10Zabe)
[09:36:08] <wikibugs>	 (03CR) 10Jbond: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/800031 (https://phabricator.wikimedia.org/T296550) (owner: 10Filippo Giunchedi)
[09:37:28] <wikibugs>	 10Puppet, 10Infrastructure-Foundations, 10Patch-For-Review, 10User-jbond: puppetdb postgress server: fix dependcey loop - https://phabricator.wikimedia.org/T296550 (10jbond) With filippos latest patch the only outstanding error is   ` May 27 08:10:10 filippo-pdb-01 puppet-agent[17242]: (/Stage[main]/Postgr...
[09:39:03] <wikibugs>	 (03CR) 10Jbond: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/800029 (owner: 10Filippo Giunchedi)
[09:39:48] <wikibugs>	 (03CR) 10Jbond: [C: 03+2] "I fixed the spec test otherwise lgtm thanks" [puppet] - 10https://gerrit.wikimedia.org/r/800255 (https://phabricator.wikimedia.org/T308013) (owner: 10Zabe)
[09:48:01] <jelto>	 !log run authdns-update for gitlab-new https://gerrit.wikimedia.org/r/c/operations/dns/+/799334 - T307142
[09:48:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:48:09] <stashbot>	 T307142: bring new gitlab hardware servers into production - https://phabricator.wikimedia.org/T307142
[09:52:48] <wikibugs>	 (03PS1) 10Giuseppe Lavagetto: mediawiki-httpd: correctly link expires.conf [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/800675 (https://phabricator.wikimedia.org/T309358)
[09:53:02] <wikibugs>	 (03PS1) 10Jbond: redfish: add `files` to the list of data parameters to request [software/spicerack] - 10https://gerrit.wikimedia.org/r/800676
[09:54:21] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C: 03+2] developer-portal: add to service catalog (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/799429 (https://phabricator.wikimedia.org/T297140) (owner: 10BryanDavis)
[09:54:37] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C: 03+2] developer-portal: add to service catalog (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/799429 (https://phabricator.wikimedia.org/T297140) (owner: 10BryanDavis)
[09:57:09] <wikibugs>	 (03CR) 10Alexandros Kosiaris: "thanks!" [puppet] - 10https://gerrit.wikimedia.org/r/798664 (https://phabricator.wikimedia.org/T308308) (owner: 10Alexandros Kosiaris)
[09:59:19] <wikibugs>	 (03CR) 10Filippo Giunchedi: cfssl: write pretty json (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/800029 (owner: 10Filippo Giunchedi)
[10:01:15] <wikibugs>	 (03PS3) 10Alexandros Kosiaris: admin: Add sgimeno to restricted [puppet] - 10https://gerrit.wikimedia.org/r/798667 (https://phabricator.wikimedia.org/T309045)
[10:01:37] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] admin: Add sgimeno to restricted [puppet] - 10https://gerrit.wikimedia.org/r/798667 (https://phabricator.wikimedia.org/T309045) (owner: 10Alexandros Kosiaris)
[10:02:47] <wikibugs>	 (03CR) 10Elukey: [C: 03+1] mediawiki-httpd: correctly link expires.conf [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/800675 (https://phabricator.wikimedia.org/T309358) (owner: 10Giuseppe Lavagetto)
[10:03:05] <icinga-wm>	 RECOVERY - SSH on labweb1002.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[10:04:21] <wikibugs>	 (03PS4) 10Alexandros Kosiaris: admin: Add sgimeno to restricted [puppet] - 10https://gerrit.wikimedia.org/r/798667 (https://phabricator.wikimedia.org/T309045)
[10:04:45] <wikibugs>	 (03CR) 10Alexandros Kosiaris: admin: Add sgimeno to restricted (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/798667 (https://phabricator.wikimedia.org/T309045) (owner: 10Alexandros Kosiaris)
[10:05:17] <wikibugs>	 (03CR) 10Alexandros Kosiaris: "This is ready and awaits manager approval" [puppet] - 10https://gerrit.wikimedia.org/r/798667 (https://phabricator.wikimedia.org/T309045) (owner: 10Alexandros Kosiaris)
[10:05:44] <logmsgbot>	 !log mvernon@cumin2002 START - Cookbook sre.hosts.reimage for host ms-be1068.eqiad.wmnet with OS bullseye
[10:05:49] <wikibugs>	 10SRE-swift-storage: Upgrade Swift ms cluster to Bullseye and revisit mkfs.xfs options - https://phabricator.wikimedia.org/T279637 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by mvernon@cumin2002 for host ms-be1068.eqiad.wmnet with OS bullseye
[10:05:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:06:52] <wikibugs>	 (03CR) 10Giuseppe Lavagetto: [V: 03+2 C: 03+2] mediawiki-httpd: correctly link expires.conf [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/800675 (https://phabricator.wikimedia.org/T309358) (owner: 10Giuseppe Lavagetto)
[10:12:28] <logmsgbot>	 !log mvernon@cumin2002 START - Cookbook sre.hosts.reimage for host ms-be1071.eqiad.wmnet with OS bullseye
[10:12:32] <wikibugs>	 10SRE-swift-storage: Upgrade Swift ms cluster to Bullseye and revisit mkfs.xfs options - https://phabricator.wikimedia.org/T279637 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by mvernon@cumin2002 for host ms-be1071.eqiad.wmnet with OS bullseye
[10:12:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:20:16] <logmsgbot>	 !log mvernon@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1068.eqiad.wmnet with reason: host reimage
[10:20:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:23:18] <icinga-wm>	 PROBLEM - MegaRAID on analytics1068 is CRITICAL: CRITICAL: 13 LD(s) must have write cache policy WriteBack, currently using: WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[10:24:51] <logmsgbot>	 !log mvernon@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1068.eqiad.wmnet with reason: host reimage
[10:24:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:25:15] <wikibugs>	 (03PS1) 10Ladsgroup: docroot: Improve design of noc.wikimedia.org [mediawiki-config] - 10https://gerrit.wikimedia.org/r/800680
[10:33:42] <wikibugs>	 (03PS1) 10Btullis: Configure .gitignore to exclude the vendor subdirectory [puppet] - 10https://gerrit.wikimedia.org/r/800681
[10:38:48] <logmsgbot>	 !log mvernon@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be1068.eqiad.wmnet with OS bullseye
[10:38:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:38:52] <wikibugs>	 10SRE-swift-storage: Upgrade Swift ms cluster to Bullseye and revisit mkfs.xfs options - https://phabricator.wikimedia.org/T279637 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by mvernon@cumin2002 for host ms-be1068.eqiad.wmnet with OS bullseye completed: - ms-be1068 (**PASS**)   - Downtim...
[10:42:53] <wikibugs>	 (03CR) 10Jbond: [C: 03+1] cfssl: write pretty json (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/800029 (owner: 10Filippo Giunchedi)
[10:43:25] <logmsgbot>	 !log mvernon@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1071.eqiad.wmnet with reason: host reimage
[10:43:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:43:41] <wikibugs>	 (03CR) 10Jbond: [C: 03+2] Configure .gitignore to exclude the vendor subdirectory [puppet] - 10https://gerrit.wikimedia.org/r/800681 (owner: 10Btullis)
[10:45:42] <logmsgbot>	 !log mvernon@cumin2002 START - Cookbook sre.hosts.reimage for host ms-be1069.eqiad.wmnet with OS bullseye
[10:45:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:45:47] <wikibugs>	 10SRE-swift-storage: Upgrade Swift ms cluster to Bullseye and revisit mkfs.xfs options - https://phabricator.wikimedia.org/T279637 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by mvernon@cumin2002 for host ms-be1069.eqiad.wmnet with OS bullseye
[10:46:31] <logmsgbot>	 !log mvernon@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1071.eqiad.wmnet with reason: host reimage
[10:46:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:50:17] <icinga-wm>	 PROBLEM - puppet last run on gitlab2001 is CRITICAL: CRITICAL: Puppet last ran 1 day ago https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[10:52:44] <icinga-wm>	 RECOVERY - puppet last run on gitlab2001 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[10:56:02] <icinga-wm>	 PROBLEM - Host ms-be1070 is DOWN: PING CRITICAL - Packet loss = 100%
[10:57:28] <icinga-wm>	 RECOVERY - Host ms-be1070 is UP: PING OK - Packet loss = 0%, RTA = 0.17 ms
[10:58:50] <wikibugs>	 (03PS1) 10Gergő Tisza: Log output of scheduled MediaWiki maintenance scripts [puppet] - 10https://gerrit.wikimedia.org/r/800683 (https://phabricator.wikimedia.org/T285896)
[11:00:05] <logmsgbot>	 !log mvernon@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1069.eqiad.wmnet with reason: host reimage
[11:00:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:01:29] <logmsgbot>	 !log mvernon@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be1071.eqiad.wmnet with OS bullseye
[11:01:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:01:33] <wikibugs>	 10SRE-swift-storage: Upgrade Swift ms cluster to Bullseye and revisit mkfs.xfs options - https://phabricator.wikimedia.org/T279637 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by mvernon@cumin2002 for host ms-be1071.eqiad.wmnet with OS bullseye completed: - ms-be1071 (**PASS**)   - Downtim...
[11:03:13] <logmsgbot>	 !log mvernon@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1069.eqiad.wmnet with reason: host reimage
[11:03:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:06:22] <icinga-wm>	 RECOVERY - SSH on cp5012.mgmt is OK: SSH OK - OpenSSH_7.4 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[11:08:49] <jinxer-wm>	 (RdfStreamingUpdaterFlinkJobUnstable) firing: WCQS_Streaming_Updater in eqiad (k8s) is unstable - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/gCFgfpG7k/flink-session-cluster - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterFlinkJobUnstable
[11:12:42] <icinga-wm>	 RECOVERY - SSH on wtp1025.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[11:12:54] <logmsgbot>	 !log hnowlan@deploy1002 helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: sync
[11:12:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:13:49] <jinxer-wm>	 (RdfStreamingUpdaterFlinkJobUnstable) resolved: WCQS_Streaming_Updater in eqiad (k8s) is unstable - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/gCFgfpG7k/flink-session-cluster - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterFlinkJobUnstable
[11:15:14] <wikibugs>	 (03PS1) 10Jcrespo: mariabackup: Make the vendor detection account for known variations [software/wmfbackups] - 10https://gerrit.wikimedia.org/r/800708 (https://phabricator.wikimedia.org/T309303)
[11:18:03] <wikibugs>	 (03PS1) 10Jelto: wikimedia.org: move gitlab-replica from netbox to dns repo [dns] - 10https://gerrit.wikimedia.org/r/800709 (https://phabricator.wikimedia.org/T307142)
[11:18:40] <logmsgbot>	 !log hnowlan@deploy1002 helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: sync
[11:18:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:18:58] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] wikimedia.org: move gitlab-replica from netbox to dns repo [dns] - 10https://gerrit.wikimedia.org/r/800709 (https://phabricator.wikimedia.org/T307142) (owner: 10Jelto)
[11:20:46] <wikibugs>	 (03PS2) 10Jcrespo: mariabackup: Make the vendor detection account for known variations [software/wmfbackups] - 10https://gerrit.wikimedia.org/r/800708 (https://phabricator.wikimedia.org/T309303)
[11:21:53] <wikibugs>	 (03CR) 10Jelto: "fail expected, see https://wikitech.wikimedia.org/wiki/DNS/Netbox#Atomically_deploy_auto-generated_records_and_a_manual_change" [dns] - 10https://gerrit.wikimedia.org/r/800709 (https://phabricator.wikimedia.org/T307142) (owner: 10Jelto)
[11:32:43] <wikibugs>	 (03PS1) 10Ladsgroup: Add change_user_editcount_to_unsigned_T309311.py [software/schema-changes] - 10https://gerrit.wikimedia.org/r/800710 (https://phabricator.wikimedia.org/T309311)
[11:33:34] <wikibugs>	 (03PS2) 10Jelto: wikimedia.org: move gitlab-replica from netbox to dns repo [dns] - 10https://gerrit.wikimedia.org/r/800709 (https://phabricator.wikimedia.org/T307142)
[11:33:59] <icinga-wm>	 PROBLEM - SSH on aqs1008.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[11:34:28] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] wikimedia.org: move gitlab-replica from netbox to dns repo [dns] - 10https://gerrit.wikimedia.org/r/800709 (https://phabricator.wikimedia.org/T307142) (owner: 10Jelto)
[11:38:44] <logmsgbot>	 !log mvernon@cumin2002 START - Cookbook sre.hosts.reimage for host ms-be1070.eqiad.wmnet with OS bullseye
[11:38:49] <wikibugs>	 10SRE-swift-storage: Upgrade Swift ms cluster to Bullseye and revisit mkfs.xfs options - https://phabricator.wikimedia.org/T279637 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by mvernon@cumin2002 for host ms-be1070.eqiad.wmnet with OS bullseye
[11:38:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:43:41] <logmsgbot>	 !log jnuche@deploy1002 install-world aborted:  (duration: 00m 02s)
[11:43:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:52:02] <logmsgbot>	 !log mvernon@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1070.eqiad.wmnet with reason: host reimage
[11:52:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:55:09] <logmsgbot>	 !log mvernon@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1070.eqiad.wmnet with reason: host reimage
[11:55:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:55:36] <logmsgbot>	 !log mvernon@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be1069.eqiad.wmnet with OS bullseye
[11:55:41] <wikibugs>	 10SRE-swift-storage: Upgrade Swift ms cluster to Bullseye and revisit mkfs.xfs options - https://phabricator.wikimedia.org/T279637 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by mvernon@cumin2002 for host ms-be1069.eqiad.wmnet with OS bullseye completed: - ms-be1069 (**PASS**)   - Downtim...
[11:55:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:57:33] <wikibugs>	 (03CR) 10Filippo Giunchedi: [V: 03+1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/35586/console" [puppet] - 10https://gerrit.wikimedia.org/r/800031 (https://phabricator.wikimedia.org/T296550) (owner: 10Filippo Giunchedi)
[12:09:04] <logmsgbot>	 !log mvernon@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be1070.eqiad.wmnet with OS bullseye
[12:09:07] <wikibugs>	 10SRE-swift-storage: Upgrade Swift ms cluster to Bullseye and revisit mkfs.xfs options - https://phabricator.wikimedia.org/T279637 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by mvernon@cumin2002 for host ms-be1070.eqiad.wmnet with OS bullseye completed: - ms-be1070 (**PASS**)   - Downtim...
[12:09:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:12:23] <wikibugs>	 (03CR) 10Marostegui: [C: 03+1] Add change_user_editcount_to_unsigned_T309311.py [software/schema-changes] - 10https://gerrit.wikimedia.org/r/800710 (https://phabricator.wikimedia.org/T309311) (owner: 10Ladsgroup)
[12:12:39] <wikibugs>	 (03CR) 10Marostegui: [C: 03+1] Add drop_page_restrictions_T60674.py [software/schema-changes] - 10https://gerrit.wikimedia.org/r/800183 (https://phabricator.wikimedia.org/T60674) (owner: 10Ladsgroup)
[12:13:01] <wikibugs>	 (03CR) 10Marostegui: [C: 03+1] Icinga: add page hashtag to paging host alerts [puppet] - 10https://gerrit.wikimedia.org/r/799903 (owner: 10Volans)
[12:14:17] <wikibugs>	 (03CR) 10Cathal Mooney: "recheck" [dns] - 10https://gerrit.wikimedia.org/r/800709 (https://phabricator.wikimedia.org/T307142) (owner: 10Jelto)
[12:14:50] <logmsgbot>	 !log cmooney@cumin1001 START - Cookbook sre.dns.netbox
[12:14:51] <wikibugs>	 (03CR) 10Ladsgroup: [C: 03+2] Add drop_page_restrictions_T60674.py [software/schema-changes] - 10https://gerrit.wikimedia.org/r/800183 (https://phabricator.wikimedia.org/T60674) (owner: 10Ladsgroup)
[12:14:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:14:55] <wikibugs>	 (03CR) 10Ladsgroup: [C: 03+2] Add change_user_editcount_to_unsigned_T309311.py [software/schema-changes] - 10https://gerrit.wikimedia.org/r/800710 (https://phabricator.wikimedia.org/T309311) (owner: 10Ladsgroup)
[12:15:17] <wikibugs>	 (03Merged) 10jenkins-bot: Add drop_page_restrictions_T60674.py [software/schema-changes] - 10https://gerrit.wikimedia.org/r/800183 (https://phabricator.wikimedia.org/T60674) (owner: 10Ladsgroup)
[12:15:21] <wikibugs>	 (03Merged) 10jenkins-bot: Add change_user_editcount_to_unsigned_T309311.py [software/schema-changes] - 10https://gerrit.wikimedia.org/r/800710 (https://phabricator.wikimedia.org/T309311) (owner: 10Ladsgroup)
[12:17:15] <logmsgbot>	 !log cmooney@cumin1001 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[12:17:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:19:19] <wikibugs>	 (03CR) 10Marostegui: [C: 03+1] mariabackup: Make the vendor detection account for known variations [software/wmfbackups] - 10https://gerrit.wikimedia.org/r/800708 (https://phabricator.wikimedia.org/T309303) (owner: 10Jcrespo)
[12:22:05] <wikibugs>	 (03CR) 10Jcrespo: [C: 03+2] mariabackup: Make the vendor detection account for known variations [software/wmfbackups] - 10https://gerrit.wikimedia.org/r/800708 (https://phabricator.wikimedia.org/T309303) (owner: 10Jcrespo)
[12:23:36] <logmsgbot>	 !log cmooney@cumin1001 START - Cookbook sre.dns.netbox
[12:23:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:24:20] <wikibugs>	 (03PS1) 10Urbanecm: throttle: Add new throttle rule + remove expired ones [mediawiki-config] - 10https://gerrit.wikimedia.org/r/800711 (https://phabricator.wikimedia.org/T309395)
[12:25:57] <logmsgbot>	 !log cmooney@cumin1001 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[12:26:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:27:32] <logmsgbot>	 !log cmooney@cumin1001 START - Cookbook sre.dns.netbox
[12:27:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:27:37] <wikibugs>	 SRE, ops-eqiad: db1128 faulty memory - https://phabricator.wikimedia.org/T309291 (Marostegui)
[12:32:39] <wikibugs>	 (Restored) Matthias Mullie: Add ImageSuggestions to extension-list and config var [mediawiki-config] - https://gerrit.wikimedia.org/r/766615 (https://phabricator.wikimedia.org/T302711) (owner: Matthias Mullie)
[12:32:49] <wikibugs>	 (PS2) Matthias Mullie: Add ImageSuggestions to extension-list and config var [mediawiki-config] - https://gerrit.wikimedia.org/r/766615 (https://phabricator.wikimedia.org/T302711)
[12:33:02] <wikibugs>	 (Abandoned) Matthias Mullie: Add ImageSuggestions to extension-list and config var [mediawiki-config] - https://gerrit.wikimedia.org/r/766615 (https://phabricator.wikimedia.org/T302711) (owner: Matthias Mullie)
[12:35:39] <logmsgbot>	 !log jnuche@deploy1002 install-world aborted:  (duration: 01m 32s)
[12:35:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:36:36] <logmsgbot>	 !log cmooney@cumin1001 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[12:36:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:37:35] <wikibugs>	 (CR) Cathal Mooney: [C: +1] "recheck" [dns] - https://gerrit.wikimedia.org/r/800709 (https://phabricator.wikimedia.org/T307142) (owner: Jelto)
[12:40:41] <wikibugs>	 SRE, SRE-OnFire (FY2021/2022-Q4), DBA, GlobalBlocking, Wikimedia-Incident: 2022-05-05 Wikimedia full site outage - https://phabricator.wikimedia.org/T307647 (Marostegui)
[12:40:51] <wikibugs>	 SRE-OnFire, DBA, Blocked-on-schema-change, Sustainability (Incident Followup): Adjust the field type of globalblocks timestamp columns to fixed binary on wmf wikis - https://phabricator.wikimedia.org/T307501 (Marostegui) Open→Stalled Going to stall this until we are on 10.6
[12:41:21] <wikibugs>	 (CR) Jelto: [C: +2] wikimedia.org: move gitlab-replica from netbox to dns repo [dns] - https://gerrit.wikimedia.org/r/800709 (https://phabricator.wikimedia.org/T307142) (owner: Jelto)
[12:43:26] <jelto>	 !log run authdns-update for gitlab-replica https://gerrit.wikimedia.org/r/c/operations/dns/+/800709 - T307142
[12:43:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:43:31] <stashbot>	 T307142: bring new gitlab hardware servers into production - https://phabricator.wikimedia.org/T307142
[12:52:24] <logmsgbot>	 !log jnuche@deploy1002 install-world aborted:  (duration: 03m 22s)
[12:52:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:57:20] <wikibugs>	 (PS1) Jelto: wikimedia.org: move gitlab from netbox to dns repo [dns] - https://gerrit.wikimedia.org/r/800719 (https://phabricator.wikimedia.org/T307142)
[12:58:09] <wikibugs>	 SRE, Infrastructure-Foundations, netops, Patch-For-Review: Finalise design extension of WMCS networks to new cloudsw in Eqiad rows E/F - https://phabricator.wikimedia.org/T304989 (cmooney)
[12:58:38] <wikibugs>	 (CR) CI reject: [V: -1] wikimedia.org: move gitlab from netbox to dns repo [dns] - https://gerrit.wikimedia.org/r/800719 (https://phabricator.wikimedia.org/T307142) (owner: Jelto)
[12:59:05] <wikibugs>	 SRE, Infrastructure-Foundations, netops, Patch-For-Review: Configure cloudsw1-e4-eqiad and cloudsw1-f4-eqiad - https://phabricator.wikimedia.org/T304936 (cmooney) Open→Resolved Work for this is now completed, will update design task once confirmed there are no niggles with reimaging.
[12:59:11] <wikibugs>	 (PS1) Marostegui: site.pp: db1128 current situation [puppet] - https://gerrit.wikimedia.org/r/800720 (https://phabricator.wikimedia.org/T309303)
[13:00:03] <wikibugs>	 (CR) Marostegui: [C: +2] site.pp: db1128 current situation [puppet] - https://gerrit.wikimedia.org/r/800720 (https://phabricator.wikimedia.org/T309303) (owner: Marostegui)
[13:00:46] <wikibugs>	 (CR) Jelto: "fail expected, see https://wikitech.wikimedia.org/wiki/DNS/Netbox#Atomically_deploy_auto-generated_records_and_a_manual_change" [dns] - https://gerrit.wikimedia.org/r/800719 (https://phabricator.wikimedia.org/T307142) (owner: Jelto)
[13:01:40] <icinga-wm>	 RECOVERY - MegaRAID on analytics1068 is OK: OK: optimal, 13 logical, 14 physical, WriteBack policy https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[13:03:42] <wikibugs>	 (CR) Cathal Mooney: "recheck" [dns] - https://gerrit.wikimedia.org/r/800719 (https://phabricator.wikimedia.org/T307142) (owner: Jelto)
[13:03:50] <wikibugs>	 (CR) Cathal Mooney: [C: +1] "LGTM!" [dns] - https://gerrit.wikimedia.org/r/800719 (https://phabricator.wikimedia.org/T307142) (owner: Jelto)
[13:04:04] <logmsgbot>	 !log cmooney@cumin1001 START - Cookbook sre.dns.netbox
[13:04:05] <logmsgbot>	 !log cmooney@cumin1001 END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
[13:04:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:04:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:04:28] <logmsgbot>	 !log cmooney@cumin1001 START - Cookbook sre.dns.netbox
[13:04:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:05:06] <icinga-wm>	 PROBLEM - SSH on labweb1002.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[13:08:44] <logmsgbot>	 !log cmooney@cumin1001 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[13:08:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:08:51] <wikibugs>	 (CR) Cathal Mooney: "recheck" [dns] - https://gerrit.wikimedia.org/r/800719 (https://phabricator.wikimedia.org/T307142) (owner: Jelto)
[13:10:18] <wikibugs>	 (PS1) Elukey: ml-services: bump docker image for articlequality pods [deployment-charts] - https://gerrit.wikimedia.org/r/800723
[13:12:15] <wikibugs>	 (CR) Cathal Mooney: [C: +1] "LGTM!" [dns] - https://gerrit.wikimedia.org/r/800719 (https://phabricator.wikimedia.org/T307142) (owner: Jelto)
[13:13:27] <wikibugs>	 (CR) Jelto: [C: +2] wikimedia.org: move gitlab from netbox to dns repo [dns] - https://gerrit.wikimedia.org/r/800719 (https://phabricator.wikimedia.org/T307142) (owner: Jelto)
[13:13:44] <wikibugs>	 (CR) Kevin Bazira: [C: +1] "LGTM!" [deployment-charts] - https://gerrit.wikimedia.org/r/800723 (owner: Elukey)
[13:14:50] <jelto>	 !log run authdns-update for gitlab.wikimedia.org https://gerrit.wikimedia.org/r/c/operations/dns/+/800719 - T307142
[13:14:53] <wikibugs>	 (CR) Elukey: [C: +2] ml-services: bump docker image for articlequality pods [deployment-charts] - https://gerrit.wikimedia.org/r/800723 (owner: Elukey)
[13:14:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:14:55] <stashbot>	 T307142: bring new gitlab hardware servers into production - https://phabricator.wikimedia.org/T307142
[13:15:18] <wikibugs>	 (CR) AikoChou: [C: +1] ml-services: bump docker image for articlequality pods [deployment-charts] - https://gerrit.wikimedia.org/r/800723 (owner: Elukey)
[13:16:15] <logmsgbot>	 !log jnuche@deploy1002 install-world aborted:  (duration: 00m 25s)
[13:16:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:17:14] <logmsgbot>	 !log elukey@deploy1002 helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
[13:17:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:20:19] <logmsgbot>	 !log elukey@deploy1002 helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
[13:20:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:25:06] <wikibugs>	 (CR) Jelto: [C: -1] "I have some concerns if that helps. The idea is bacula can catch backups on gitlab1004 when it's not possible on gitlab1001 due to disk is" [puppet] - https://gerrit.wikimedia.org/r/800358 (https://phabricator.wikimedia.org/T274463) (owner: Dzahn)
[13:27:55] <wikibugs>	 (PS1) Cathal Mooney: Add includes in reverse DNS zone files for new cloudsw subnets [dns] - https://gerrit.wikimedia.org/r/800727 (https://phabricator.wikimedia.org/T304989)
[13:28:43] <wikibugs>	 (CR) CI reject: [V: -1] Add includes in reverse DNS zone files for new cloudsw subnets [dns] - https://gerrit.wikimedia.org/r/800727 (https://phabricator.wikimedia.org/T304989) (owner: Cathal Mooney)
[13:29:50] <logmsgbot>	 !log cmooney@cumin1001 START - Cookbook sre.dns.netbox
[13:29:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:35:32] <icinga-wm>	 PROBLEM - MegaRAID on analytics1068 is CRITICAL: CRITICAL: 13 LD(s) must have write cache policy WriteBack, currently using: WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[13:35:42] <icinga-wm>	 RECOVERY - SSH on aqs1008.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[13:35:56] <logmsgbot>	 !log cmooney@cumin1001 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[13:36:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:36:10] <wikibugs>	 (PS1) Jelto: gitlab: use gitlab1004 as replia/passive host [puppet] - https://gerrit.wikimedia.org/r/800728 (https://phabricator.wikimedia.org/T307142)
[13:39:36] <wikibugs>	 (PS1) Cathal Mooney: Add new per-rack cloudsw subnets for e4 and f4 to networks data [puppet] - https://gerrit.wikimedia.org/r/800730 (https://phabricator.wikimedia.org/T304989)
[13:39:47] <wikibugs>	 (CR) Cathal Mooney: "recheck" [dns] - https://gerrit.wikimedia.org/r/800727 (https://phabricator.wikimedia.org/T304989) (owner: Cathal Mooney)
[13:46:19] <wikibugs>	 (PS1) Cathal Mooney: Install server changes to support new subnets cloud racks c8 and d5 [puppet] - https://gerrit.wikimedia.org/r/800731 (https://phabricator.wikimedia.org/T304989)
[13:47:15] <wikibugs>	 (CR) Cathal Mooney: [C: +2] Add includes in reverse DNS zone files for new cloudsw subnets [dns] - https://gerrit.wikimedia.org/r/800727 (https://phabricator.wikimedia.org/T304989) (owner: Cathal Mooney)
[13:48:02] <wikibugs>	 (CR) Cathal Mooney: "Self-merging as there is time sensitivity to the order of changes between zone file and Netbox.  Third similar change this week so I am co" [dns] - https://gerrit.wikimedia.org/r/800727 (https://phabricator.wikimedia.org/T304989) (owner: Cathal Mooney)
[13:48:40] <logmsgbot>	 !log hnowlan@deploy1002 helmfile [eqiad] START helmfile.d/services/changeprop: sync
[13:48:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:48:50] <logmsgbot>	 !log hnowlan@deploy1002 helmfile [eqiad] DONE helmfile.d/services/changeprop: sync
[13:48:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:49:43] <wikibugs>	 (PS1) Elukey: ml-services: bump docker image for draftquality [deployment-charts] - https://gerrit.wikimedia.org/r/800732 (https://phabricator.wikimedia.org/T309102)
[13:49:50] <wikibugs>	 (CR) Cathal Mooney: [C: +2] Add includes in reverse DNS zone files for new cloudsw subnets [dns] - https://gerrit.wikimedia.org/r/800727 (https://phabricator.wikimedia.org/T304989) (owner: Cathal Mooney)
[13:51:47] <logmsgbot>	 !log cmooney@cumin1001 START - Cookbook sre.dns.netbox
[13:51:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:55:24] <logmsgbot>	 !log cmooney@cumin1001 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[13:55:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:55:47] <wikibugs>	 (CR) Elukey: [C: +2] ml-services: bump docker image for draftquality [deployment-charts] - https://gerrit.wikimedia.org/r/800732 (https://phabricator.wikimedia.org/T309102) (owner: Elukey)
[13:56:35] <wikibugs>	 (PS1) Andrew Bogott: Horizon: disable creation of new proxies under .wmflabs.org [puppet] - https://gerrit.wikimedia.org/r/800735 (https://phabricator.wikimedia.org/T305391)
[13:56:38] <wikibugs>	 (CR) Kevin Bazira: ml-services: bump docker image for draftquality (1 comment) [deployment-charts] - https://gerrit.wikimedia.org/r/800732 (https://phabricator.wikimedia.org/T309102) (owner: Elukey)
[13:56:52] <logmsgbot>	 !log cmooney@cumin1001 START - Cookbook sre.dns.netbox
[13:56:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:57:44] <logmsgbot>	 !log elukey@deploy1002 helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
[13:57:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:58:06] <icinga-wm>	 RECOVERY - MegaRAID on analytics1068 is OK: OK: optimal, 13 logical, 14 physical, WriteBack policy https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[14:00:08] <logmsgbot>	 !log cmooney@cumin1001 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[14:00:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:02:56] <wikibugs>	 (CR) Herron: [C: +1] opensearch_dashboards: add backup script enable job (3 comments) [puppet] - https://gerrit.wikimedia.org/r/798886 (https://phabricator.wikimedia.org/T237224) (owner: Cwhite)
[14:10:07] <wikibugs>	 (CR) Herron: [C: +1] "👍" [puppet] - https://gerrit.wikimedia.org/r/800294 (https://phabricator.wikimedia.org/T304440) (owner: Cwhite)
[14:22:13] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1130.eqiad.wmnet with reason: Maintenance
[14:22:14] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1130.eqiad.wmnet with reason: Maintenance
[14:22:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:22:19] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1130 (T60674)', diff saved to https://phabricator.wikimedia.org/P28632 and previous config saved to /var/cache/conftool/dbconfig/20220527-142219-ladsgroup.json
[14:22:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:22:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:22:29] <stashbot>	 T60674: Drop page.page_restrictions column from Wikimedia wikis - https://phabricator.wikimedia.org/T60674
[14:26:01] <wikibugs>	 (PS1) Ladsgroup: Remove page_restrictions field from maintain-views [puppet] - https://gerrit.wikimedia.org/r/800739 (https://phabricator.wikimedia.org/T60674)
[14:26:32] <wikibugs>	 (CR) Marostegui: [C: +1] Remove page_restrictions field from maintain-views [puppet] - https://gerrit.wikimedia.org/r/800739 (https://phabricator.wikimedia.org/T60674) (owner: Ladsgroup)
[14:29:22] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1130 (T60674)', diff saved to https://phabricator.wikimedia.org/P28633 and previous config saved to /var/cache/conftool/dbconfig/20220527-142921-ladsgroup.json
[14:29:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:29:28] <stashbot>	 T60674: Drop page.page_restrictions column from Wikimedia wikis - https://phabricator.wikimedia.org/T60674
[14:29:38] <icinga-wm>	 PROBLEM - SSH on druid1006.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[14:29:53] <wikibugs>	 (PS2) Ladsgroup: Remove page_restrictions field from maintain-views [puppet] - https://gerrit.wikimedia.org/r/800739 (https://phabricator.wikimedia.org/T60674)
[14:29:58] <wikibugs>	 (CR) Ladsgroup: [V: +2 C: +2] Remove page_restrictions field from maintain-views [puppet] - https://gerrit.wikimedia.org/r/800739 (https://phabricator.wikimedia.org/T60674) (owner: Ladsgroup)
[14:31:40] <icinga-wm>	 PROBLEM - MegaRAID on analytics1068 is CRITICAL: CRITICAL: 13 LD(s) must have write cache policy WriteBack, currently using: WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[14:42:15] <wikibugs>	 (PS1) Ladsgroup: Depool clouddb10(17|18|19|20) [puppet] - https://gerrit.wikimedia.org/r/800692 (https://phabricator.wikimedia.org/T60674)
[14:42:23] <wikibugs>	 (PS2) Ladsgroup: Depool clouddb10(17|18|19|20) [puppet] - https://gerrit.wikimedia.org/r/800692 (https://phabricator.wikimedia.org/T60674)
[14:42:44] <wikibugs>	 (CR) Ladsgroup: [V: +2 C: +2] Depool clouddb10(17|18|19|20) [puppet] - https://gerrit.wikimedia.org/r/800692 (https://phabricator.wikimedia.org/T60674) (owner: Ladsgroup)
[14:44:27] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1130', diff saved to https://phabricator.wikimedia.org/P28634 and previous config saved to /var/cache/conftool/dbconfig/20220527-144426-ladsgroup.json
[14:44:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:45:16] <icinga-wm>	 PROBLEM - Uncommitted DNS changes in Netbox on netbox1001 is CRITICAL: Netbox has uncommitted DNS changes https://wikitech.wikimedia.org/wiki/Monitoring/Netbox_DNS_uncommitted_changes
[14:51:35] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1175 (T298560)', diff saved to https://phabricator.wikimedia.org/P28635 and previous config saved to /var/cache/conftool/dbconfig/20220527-145135-ladsgroup.json
[14:51:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:51:40] <stashbot>	 T298560: Fix mismatching field type of revision.rev_timestamp on wmf wikis - https://phabricator.wikimedia.org/T298560
[14:53:55] <wikibugs>	 SRE, Data-Engineering, Data-Engineering-Kanban, Traffic, User-zeljkofilipin: intake-analytics is responsible for up to a 85% of varnish backend fetch errors - https://phabricator.wikimedia.org/T306181 (zeljkofilipin)
[14:59:22] <wikibugs>	 SRE, Cloud-Services, Datasets-General-or-Unknown, affects-Kiwix-and-openZIM: Mirror more Kiwix downloads directories - https://phabricator.wikimedia.org/T57503 (Aklapper) Unassigning (if I understand correctly); this is already tagged with #Cloud-Services
[14:59:32] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1130', diff saved to https://phabricator.wikimedia.org/P28636 and previous config saved to /var/cache/conftool/dbconfig/20220527-145931-ladsgroup.json
[14:59:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:04:49] <wikibugs>	 SRE, Cloud-Services, Datasets-General-or-Unknown, affects-Kiwix-and-openZIM: Mirror more Kiwix downloads directories - https://phabricator.wikimedia.org/T57503 (Aklapper) a:ArielGlenn→None
[15:06:40] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P28637 and previous config saved to /var/cache/conftool/dbconfig/20220527-150640-ladsgroup.json
[15:06:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:07:32] <icinga-wm>	 RECOVERY - SSH on labweb1002.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[15:12:18] <wikibugs>	 (PS1) Ladsgroup: dbproxy: Repool the old batch, Depool the new one [puppet] - https://gerrit.wikimedia.org/r/800693 (https://phabricator.wikimedia.org/T60674)
[15:12:27] <wikibugs>	 (PS2) Ladsgroup: dbproxy: Repool the old batch, Depool the new one [puppet] - https://gerrit.wikimedia.org/r/800693 (https://phabricator.wikimedia.org/T60674)
[15:13:01] <wikibugs>	 (CR) Ladsgroup: [V: +2 C: +2] dbproxy: Repool the old batch, Depool the new one [puppet] - https://gerrit.wikimedia.org/r/800693 (https://phabricator.wikimedia.org/T60674) (owner: Ladsgroup)
[15:14:37] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1130 (T60674)', diff saved to https://phabricator.wikimedia.org/P28638 and previous config saved to /var/cache/conftool/dbconfig/20220527-151436-ladsgroup.json
[15:14:38] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1096.eqiad.wmnet with reason: Maintenance
[15:14:40] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1096.eqiad.wmnet with reason: Maintenance
[15:14:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:14:44] <stashbot>	 T60674: Drop page.page_restrictions column from Wikimedia wikis - https://phabricator.wikimedia.org/T60674
[15:14:45] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1096:3315 (T60674)', diff saved to https://phabricator.wikimedia.org/P28639 and previous config saved to /var/cache/conftool/dbconfig/20220527-151444-ladsgroup.json
[15:14:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:14:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:14:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:17:16] <icinga-wm>	 PROBLEM - SSH on pki2001.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[15:21:45] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P28640 and previous config saved to /var/cache/conftool/dbconfig/20220527-152145-ladsgroup.json
[15:21:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:23:56] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 (T60674)', diff saved to https://phabricator.wikimedia.org/P28641 and previous config saved to /var/cache/conftool/dbconfig/20220527-152355-ladsgroup.json
[15:24:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:24:01] <stashbot>	 T60674: Drop page.page_restrictions column from Wikimedia wikis - https://phabricator.wikimedia.org/T60674
[15:26:51] <wikibugs>	 (Abandoned) Zabe: Acquire fresh actor id [extensions/CheckUser] (wmf/1.39.0-wmf.12) - https://gerrit.wikimedia.org/r/798817 (https://phabricator.wikimedia.org/T233004) (owner: Zabe)
[15:30:48] <icinga-wm>	 RECOVERY - SSH on druid1006.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[15:31:18] <wikibugs>	 (PS1) Ladsgroup: dbproxy: Repool clouddb101(3|4|5|6) [puppet] - https://gerrit.wikimedia.org/r/800748 (https://phabricator.wikimedia.org/T60674)
[15:31:45] <wikibugs>	 (CR) Ladsgroup: [V: +2 C: +2] dbproxy: Repool clouddb101(3|4|5|6) [puppet] - https://gerrit.wikimedia.org/r/800748 (https://phabricator.wikimedia.org/T60674) (owner: Ladsgroup)
[15:33:34] <wikibugs>	 (CR) Dzahn: "It was only meant to capture once the files that currently only exist on gitlab1004 but not on another server. It was not about a long-ter" [puppet] - https://gerrit.wikimedia.org/r/800358 (https://phabricator.wikimedia.org/T274463) (owner: Dzahn)
[15:36:51] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1175 (T298560)', diff saved to https://phabricator.wikimedia.org/P28642 and previous config saved to /var/cache/conftool/dbconfig/20220527-153650-ladsgroup.json
[15:36:52] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
[15:36:54] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
[15:36:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:36:57] <stashbot>	 T298560: Fix mismatching field type of revision.rev_timestamp on wmf wikis - https://phabricator.wikimedia.org/T298560
[15:36:59] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1166 (T298560)', diff saved to https://phabricator.wikimedia.org/P28643 and previous config saved to /var/cache/conftool/dbconfig/20220527-153658-ladsgroup.json
[15:37:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:37:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:37:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:39:01] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P28644 and previous config saved to /var/cache/conftool/dbconfig/20220527-153900-ladsgroup.json
[15:39:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:41:07] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
[15:41:09] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
[15:41:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:41:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:46:04] <wikibugs>	 (CR) Andrew Bogott: [C: +2] cloudweb2002-dev is not behind LVS [puppet] - https://gerrit.wikimedia.org/r/795365 (owner: Majavah)
[15:46:24] <wikibugs>	 (PS3) Jbond: WIP: Early start on firmware cookbook [cookbooks] - https://gerrit.wikimedia.org/r/763215
[15:50:36] <wikibugs>	 (CR) CI reject: [V: -1] WIP: Early start on firmware cookbook [cookbooks] - https://gerrit.wikimedia.org/r/763215 (owner: Jbond)
[15:50:43] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1131.eqiad.wmnet with reason: Maintenance
[15:50:45] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1131.eqiad.wmnet with reason: Maintenance
[15:50:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:50:50] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1131 (T309311)', diff saved to https://phabricator.wikimedia.org/P28645 and previous config saved to /var/cache/conftool/dbconfig/20220527-155049-ladsgroup.json
[15:50:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:50:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:50:59] <stashbot>	 T309311: Make user_editcount unsigned in production - https://phabricator.wikimedia.org/T309311
[15:53:11] <wikibugs>	 (PS1) Andrew Bogott: Revert "cloudweb2002-dev is not behind LVS" [puppet] - https://gerrit.wikimedia.org/r/800695
[15:54:06] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P28646 and previous config saved to /var/cache/conftool/dbconfig/20220527-155405-ladsgroup.json
[15:54:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:54:39] <wikibugs>	 (CR) Andrew Bogott: [C: +2] Revert "cloudweb2002-dev is not behind LVS" [puppet] - https://gerrit.wikimedia.org/r/800695 (owner: Andrew Bogott)
[15:55:11] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1131 (T309311)', diff saved to https://phabricator.wikimedia.org/P28647 and previous config saved to /var/cache/conftool/dbconfig/20220527-155510-ladsgroup.json
[15:55:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:00:36] <wikibugs>	 (CR) Giuseppe Lavagetto: [C: +1] Beta Cluster: ship logs from docker services to logstash [puppet] - https://gerrit.wikimedia.org/r/800282 (https://phabricator.wikimedia.org/T309319) (owner: Ori)
[16:02:47] <wikibugs>	 (CR) Andrew Bogott: [C: +2] Horizon: disable creation of new proxies under .wmflabs.org [puppet] - https://gerrit.wikimedia.org/r/800735 (https://phabricator.wikimedia.org/T305391) (owner: Andrew Bogott)
[16:03:43] <wikibugs>	 (PS1) Ahmon Dancy: Turn mw_releases into a list [puppet] - https://gerrit.wikimedia.org/r/800758 (https://phabricator.wikimedia.org/T299648)
[16:04:49] <wikibugs>	 (CR) Ahmon Dancy: [C: +1] Turn mw_releases into a list [puppet] - https://gerrit.wikimedia.org/r/800758 (https://phabricator.wikimedia.org/T299648) (owner: Ahmon Dancy)
[16:09:11] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 (T60674)', diff saved to https://phabricator.wikimedia.org/P28648 and previous config saved to /var/cache/conftool/dbconfig/20220527-160910-ladsgroup.json
[16:09:12] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
[16:09:14] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
[16:09:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:09:17] <stashbot>	 T60674: Drop page.page_restrictions column from Wikimedia wikis - https://phabricator.wikimedia.org/T60674
[16:09:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:09:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:10:16] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1131', diff saved to https://phabricator.wikimedia.org/P28649 and previous config saved to /var/cache/conftool/dbconfig/20220527-161015-ladsgroup.json
[16:10:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:11:40] <wikibugs>	 SRE, ops-eqiad, DC-Ops, cloud-services-team (Hardware): Q4:(Need By: TBD) rack/setup/install cloudcontrol100[6-7].wikimedia.org - https://phabricator.wikimedia.org/T306853 (Andrew) a:Andrew→Jclark-ctr I just noticed that this is still assigned to me! I don't think there any action items l...
[16:16:08] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db2123.codfw.wmnet with reason: Maintenance
[16:16:09] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2123.codfw.wmnet with reason: Maintenance
[16:16:11] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 12:00:00 on 8 hosts with reason: Maintenance
[16:16:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:16:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:16:16] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 8 hosts with reason: Maintenance
[16:16:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:16:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:18:30] <logmsgbot>	 !log andrew@deploy1002 Started deploy [horizon/deploy@9d02cd6]: bug T305391
[16:18:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:18:36] <stashbot>	 T305391: Disable creation of new web proxies under .wmflabs.org - https://phabricator.wikimedia.org/T305391
[16:21:54] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1161.eqiad.wmnet with reason: Maintenance
[16:21:55] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1161.eqiad.wmnet with reason: Maintenance
[16:21:57] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
[16:21:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:22:00] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
[16:22:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:22:05] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1161 (T60674)', diff saved to https://phabricator.wikimedia.org/P28650 and previous config saved to /var/cache/conftool/dbconfig/20220527-162204-ladsgroup.json
[16:22:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:22:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:22:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:22:18] <stashbot>	 T60674: Drop page.page_restrictions column from Wikimedia wikis - https://phabricator.wikimedia.org/T60674
[16:23:42] <wikibugs>	 (03PS4) 10Jbond: WIP: Early start on firmware cookbook [cookbooks] - 10https://gerrit.wikimedia.org/r/763215
[16:24:09] <logmsgbot>	 !log andrew@deploy1002 Finished deploy [horizon/deploy@9d02cd6]: bug T305391 (duration: 05m 39s)
[16:24:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:24:15] <stashbot>	 T305391: Disable creation of new web proxies under .wmflabs.org - https://phabricator.wikimedia.org/T305391
[16:25:21] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1131', diff saved to https://phabricator.wikimedia.org/P28651 and previous config saved to /var/cache/conftool/dbconfig/20220527-162520-ladsgroup.json
[16:25:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:28:18] <wikibugs>	 (03PS1) 10Krinkle: Follow-up I8d62aedb: Fix .rotation mixin [core] (wmf/1.39.0-wmf.13) - 10https://gerrit.wikimedia.org/r/800696
[16:28:35] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] WIP: Early start on firmware cookbook [cookbooks] - 10https://gerrit.wikimedia.org/r/763215 (owner: 10Jbond)
[16:30:26] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1161 (T60674)', diff saved to https://phabricator.wikimedia.org/P28652 and previous config saved to /var/cache/conftool/dbconfig/20220527-163025-ladsgroup.json
[16:30:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:30:32] <stashbot>	 T60674: Drop page.page_restrictions column from Wikimedia wikis - https://phabricator.wikimedia.org/T60674
[16:40:26] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1131 (T309311)', diff saved to https://phabricator.wikimedia.org/P28653 and previous config saved to /var/cache/conftool/dbconfig/20220527-164026-ladsgroup.json
[16:40:28] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance
[16:40:29] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance
[16:40:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:40:34] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1113:3316 (T309311)', diff saved to https://phabricator.wikimedia.org/P28654 and previous config saved to /var/cache/conftool/dbconfig/20220527-164034-ladsgroup.json
[16:40:37] <stashbot>	 T309311: Make user_editcount unsigned in production - https://phabricator.wikimedia.org/T309311
[16:40:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:40:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:40:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:45:31] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P28655 and previous config saved to /var/cache/conftool/dbconfig/20220527-164530-ladsgroup.json
[16:45:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:51:18] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1113:3316 (T309311)', diff saved to https://phabricator.wikimedia.org/P28656 and previous config saved to /var/cache/conftool/dbconfig/20220527-165117-ladsgroup.json
[16:51:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:51:23] <stashbot>	 T309311: Make user_editcount unsigned in production - https://phabricator.wikimedia.org/T309311
[17:00:36] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P28657 and previous config saved to /var/cache/conftool/dbconfig/20220527-170035-ladsgroup.json
[17:00:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:06:23] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1113:3316', diff saved to https://phabricator.wikimedia.org/P28658 and previous config saved to /var/cache/conftool/dbconfig/20220527-170622-ladsgroup.json
[17:06:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:09:04] <wikibugs>	 10SRE, 10Data-Engineering, 10Data-Engineering-Kanban, 10Traffic, 10User-zeljkofilipin: intake-analytics is responsible for up to a 85% of varnish backend fetch errors - https://phabricator.wikimedia.org/T306181 (10phuedx) >>! In T306181#7914450, @akosiaris wrote: > I notice that things that take like 1s...
[17:12:58] <wikibugs>	 (03PS5) 10Stang: Add language fallback support for wmgSiteLogoVariants [mediawiki-config] - 10https://gerrit.wikimedia.org/r/799415 (https://phabricator.wikimedia.org/T305692)
[17:15:38] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1127 (T298560)', diff saved to https://phabricator.wikimedia.org/P28659 and previous config saved to /var/cache/conftool/dbconfig/20220527-171537-ladsgroup.json
[17:15:41] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1161 (T60674)', diff saved to https://phabricator.wikimedia.org/P28660 and previous config saved to /var/cache/conftool/dbconfig/20220527-171541-ladsgroup.json
[17:15:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:15:43] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1144.eqiad.wmnet with reason: Maintenance
[17:15:43] <stashbot>	 T298560: Fix mismatching field type of revision.rev_timestamp on wmf wikis - https://phabricator.wikimedia.org/T298560
[17:15:44] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1144.eqiad.wmnet with reason: Maintenance
[17:15:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:15:49] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1144:3315 (T60674)', diff saved to https://phabricator.wikimedia.org/P28661 and previous config saved to /var/cache/conftool/dbconfig/20220527-171548-ladsgroup.json
[17:15:49] <stashbot>	 T60674: Drop page.page_restrictions column from Wikimedia wikis - https://phabricator.wikimedia.org/T60674
[17:15:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:16:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:16:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:21:28] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1113:3316', diff saved to https://phabricator.wikimedia.org/P28662 and previous config saved to /var/cache/conftool/dbconfig/20220527-172127-ladsgroup.json
[17:21:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:24:45] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T60674)', diff saved to https://phabricator.wikimedia.org/P28663 and previous config saved to /var/cache/conftool/dbconfig/20220527-172444-ladsgroup.json
[17:24:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:24:51] <stashbot>	 T60674: Drop page.page_restrictions column from Wikimedia wikis - https://phabricator.wikimedia.org/T60674
[17:24:59] <wikibugs>	 (03PS1) 10Majavah: dynamicproxy: add zones endpoint [puppet] - 10https://gerrit.wikimedia.org/r/800775
[17:25:14] <wikibugs>	 (03PS1) 10Ladsgroup: db_maint_mapper_sal: Add Category:MariaDB to the report [software] - 10https://gerrit.wikimedia.org/r/800776
[17:26:30] <wikibugs>	 (03PS2) 10Ladsgroup: db_maint_mapper_sal: Add Category:MariaDB to the report [software] - 10https://gerrit.wikimedia.org/r/800776
[17:29:04] <wikibugs>	 (03CR) 10Andrew Bogott: dynamicproxy: add zones endpoint (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/800775 (owner: 10Majavah)
[17:30:43] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P28664 and previous config saved to /var/cache/conftool/dbconfig/20220527-173042-ladsgroup.json
[17:30:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:36:33] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1113:3316 (T309311)', diff saved to https://phabricator.wikimedia.org/P28665 and previous config saved to /var/cache/conftool/dbconfig/20220527-173632-ladsgroup.json
[17:36:35] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
[17:36:36] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
[17:36:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:36:40] <stashbot>	 T309311: Make user_editcount unsigned in production - https://phabricator.wikimedia.org/T309311
[17:36:41] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1098:3316 (T309311)', diff saved to https://phabricator.wikimedia.org/P28666 and previous config saved to /var/cache/conftool/dbconfig/20220527-173641-ladsgroup.json
[17:36:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:36:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:36:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:39:50] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P28667 and previous config saved to /var/cache/conftool/dbconfig/20220527-173949-ladsgroup.json
[17:39:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:40:16] <mutante>	 topranks: I saw changes that move DNS entries out of netbox and into the DNS repo (for gitlab-new etc). I need the exact same thing but for gerrit. Previously tried to add in netbox but it wasn't the correct way. Though have done a ton of DNS changes before netbox was around. What is the correct approach, could I check with you possibly?
[17:41:03] <mutante>	 it's the same special case where there are public IPs, not behind LVS, secondary service IP vs server IP ...
[17:42:23] <topranks>	 mutante: ok yeah, on the wider point of what should be left in Netbox, and what the valid reasons for doing it outside that are I'm not that familiar.
[17:42:26] <mutante>	 ah, sorry, forgot about the timezone for a moment. It's Friday night already. I'll ask the same thing in Phabricator or next week.
[17:42:35] <topranks>	 But seems reasonable given your description
[17:42:39] <topranks>	 no probs, I have a few mins
[17:42:48] <mutante>	 gitlab is following what gerrit did previously
[17:43:26] <topranks>	 Basic thing is to first update Netbox, clear the hostname from the "DNS" field of the IP address in question
[17:43:44] <mutante>	 so.. if I were to add names directly in the DNS repo and do nothing in netbox.. basically just like I would have done it before netbox existed.. and just authdns-update and don't use any cookbook.. then is it ok now?
[17:43:52] <topranks>	 And best to add a description starting "Keep manual DNS" to it instead
[17:43:55] <topranks>	 Like these:
[17:43:56] <mutante>	 or all that but run the cookbook afterwards?
[17:43:56] <topranks>	 Keep manual DNS: 
[17:44:09] <topranks>	 https://netbox.wikimedia.org/search/?q=gitlab-replica.wikimedia.org
[17:44:16] <topranks>	 No you need to remove the DNS hostname from the IP in netbox
[17:44:19] <mutante>	 oh, so they ARE actually still in netbox
[17:44:28] <mutante>	 then moving them out of netbox wasn't what I thought it was
[17:44:38] <topranks>	 Without doing that, and running the sre.dns.netbox cookbook, there will be double entries
[17:44:50] <topranks>	 Which will mean CI will fail for your manual patch of the zonefile.
[17:44:57] <topranks>	 jelto was following wikitech, let me see if I can find it.
[17:45:00] <topranks>	 But basic approach is:
[17:45:07] <topranks>	 1) Upload patch with manual entries
[17:45:25] <topranks>	 2) Remove the entries from Netbox as I described above
[17:45:42] <topranks>	 3) Run CI again on gerrit, should get nice green tick
[17:45:47] <topranks>	 4) Merge
[17:45:48] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P28668 and previous config saved to /var/cache/conftool/dbconfig/20220527-174547-ladsgroup.json
[17:45:52] <topranks>	 5) Run authdns update
[17:45:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:46:10] <mutante>	 but what if they don't exist in netbox at all yet
[17:46:26] <mutante>	 I had deleted my previous entries because they were not correct
[17:47:09] <mutante>	 I had followed docs for "special case" on wikitech. but that was a different type of special cases afaict
[17:47:11] <topranks>	 Ah sorry
[17:47:21] <topranks>	 If they don't exist in Netbox you can just follow the old manual process
[17:47:42] <topranks>	 I was talking about a situation where an existing hostname was moving from netbox to manual
[17:47:57] <topranks>	 If it doesn't exist just edit the zonefile and submit patch via gerrit
[17:48:00] <mutante>	 :)) that's great. That makes it easier 
[17:48:36] <mutante>	 previously I had tried the other approach with the "reserved in netbox" entries but that was not for this type of service IP 
[17:48:50] <topranks>	 cool.  I'm out the door now, but if nobody else looks at it I can review Monday am; put me on as a reviewer.
[17:48:54] <wikibugs>	 10SRE, 10Data-Engineering, 10Traffic-Icebox: Mobile redirects drop provenance parameters - https://phabricator.wikimedia.org/T252227 (10Milimetric) > I'm very intrigued @Milimetric about your comment about reinstrumenting pageviews in a declarative way (that sounds like it could help with some of our work ar...
[17:49:09] <topranks>	 The IP should be present in Netbox - this isn't a DNS thing - but just so it's documented / not used for anything else.
[17:50:00] <mutante>	 eh, ok, I got a bit confused still about it being in netbox or not. let me do that next week with a review
[17:50:08] <mutante>	 enjoy the weekend
[17:51:55] <topranks>	 ok cool same to you :)
[17:52:07] <mutante>	 thanks, cya
[17:54:55] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P28669 and previous config saved to /var/cache/conftool/dbconfig/20220527-175455-ladsgroup.json
[17:54:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:58:20] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1098:3316 (T309311)', diff saved to https://phabricator.wikimedia.org/P28670 and previous config saved to /var/cache/conftool/dbconfig/20220527-175819-ladsgroup.json
[17:58:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:58:26] <stashbot>	 T309311: Make user_editcount unsigned in production - https://phabricator.wikimedia.org/T309311
[17:59:37] <wikibugs>	 10SRE, 10SRE-Access-Requests: Requesting access to contint-admins for taavi - https://phabricator.wikimedia.org/T309375 (10Dzahn)
[18:00:50] <wikibugs>	 10SRE, 10SRE-Access-Requests: Requesting access to contint-admins for taavi - https://phabricator.wikimedia.org/T309375 (10Dzahn) checked off boxes (L3 signed, NDA, has existing shell access, etc).   Will need approval from group approver (Tyler).
[18:00:53] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1127 (T298560)', diff saved to https://phabricator.wikimedia.org/P28671 and previous config saved to /var/cache/conftool/dbconfig/20220527-180052-ladsgroup.json
[18:00:54] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
[18:00:56] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
[18:00:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:01:00] <stashbot>	 T298560: Fix mismatching field type of revision.rev_timestamp on wmf wikis - https://phabricator.wikimedia.org/T298560
[18:01:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:01:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:08:06] <wikibugs>	 (03CR) 10Jcrespo: [C: 03+1] db_maint_mapper_sal: Add Category:MariaDB to the report [software] - 10https://gerrit.wikimedia.org/r/800776 (owner: 10Ladsgroup)
[18:10:00] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T60674)', diff saved to https://phabricator.wikimedia.org/P28672 and previous config saved to /var/cache/conftool/dbconfig/20220527-181000-ladsgroup.json
[18:10:02] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
[18:10:03] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
[18:10:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:10:06] <stashbot>	 T60674: Drop page.page_restrictions column from Wikimedia wikis - https://phabricator.wikimedia.org/T60674
[18:10:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:10:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:10:33] <icinga-wm>	 PROBLEM - SSH on labweb1002.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[18:10:45] <wikibugs>	 (03CR) 10Ladsgroup: [C: 03+2] db_maint_mapper_sal: Add Category:MariaDB to the report [software] - 10https://gerrit.wikimedia.org/r/800776 (owner: 10Ladsgroup)
[18:13:25] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1098:3316', diff saved to https://phabricator.wikimedia.org/P28673 and previous config saved to /var/cache/conftool/dbconfig/20220527-181324-ladsgroup.json
[18:13:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:14:22] <wikibugs>	 10SRE, 10Analytics, 10LDAP-Access-Requests: Grant Access to `wmf` for `Dmantena` - https://phabricator.wikimedia.org/T308294 (10Milimetric) @Tsevener is right, and that's the access that @RhinosF1 pointed to.  @Dmantena: unfortunately, due to how authentication and authorization works more broadly at wmf, th...
[18:14:25] <wikibugs>	 (03Merged) 10jenkins-bot: db_maint_mapper_sal: Add Category:MariaDB to the report [software] - 10https://gerrit.wikimedia.org/r/800776 (owner: 10Ladsgroup)
[18:14:35] <wikibugs>	 10SRE, 10Data-Engineering, 10LDAP-Access-Requests: Grant Access to `wmf` for `Dmantena` - https://phabricator.wikimedia.org/T308294 (10Milimetric)
[18:16:45] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1110.eqiad.wmnet with reason: Maintenance
[18:16:46] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1110.eqiad.wmnet with reason: Maintenance
[18:16:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:16:51] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1110 (T60674)', diff saved to https://phabricator.wikimedia.org/P28674 and previous config saved to /var/cache/conftool/dbconfig/20220527-181650-ladsgroup.json
[18:16:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:16:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:16:57] <stashbot>	 T60674: Drop page.page_restrictions column from Wikimedia wikis - https://phabricator.wikimedia.org/T60674
[18:20:35] <icinga-wm>	 RECOVERY - SSH on pki2001.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[18:25:24] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1110 (T60674)', diff saved to https://phabricator.wikimedia.org/P28675 and previous config saved to /var/cache/conftool/dbconfig/20220527-182523-ladsgroup.json
[18:25:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:25:31] <stashbot>	 T60674: Drop page.page_restrictions column from Wikimedia wikis - https://phabricator.wikimedia.org/T60674
[18:28:30] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1098:3316', diff saved to https://phabricator.wikimedia.org/P28676 and previous config saved to /var/cache/conftool/dbconfig/20220527-182829-ladsgroup.json
[18:28:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:29:11] <wikibugs>	 (03CR) 10Majavah: dynamicproxy: add zones endpoint (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/800775 (owner: 10Majavah)
[18:40:29] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P28677 and previous config saved to /var/cache/conftool/dbconfig/20220527-184028-ladsgroup.json
[18:40:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:43:35] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1098:3316 (T309311)', diff saved to https://phabricator.wikimedia.org/P28678 and previous config saved to /var/cache/conftool/dbconfig/20220527-184334-ladsgroup.json
[18:43:39] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db2129.codfw.wmnet with reason: Maintenance
[18:43:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:43:40] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2129.codfw.wmnet with reason: Maintenance
[18:43:41] <stashbot>	 T309311: Make user_editcount unsigned in production - https://phabricator.wikimedia.org/T309311
[18:43:42] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 12:00:00 on 8 hosts with reason: Maintenance
[18:43:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:43:47] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 8 hosts with reason: Maintenance
[18:43:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:43:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:43:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:49:33] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
[18:49:34] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
[18:49:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:49:39] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1170:3312 (T309311)', diff saved to https://phabricator.wikimedia.org/P28679 and previous config saved to /var/cache/conftool/dbconfig/20220527-184938-ladsgroup.json
[18:49:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:49:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:49:46] <stashbot>	 T309311: Make user_editcount unsigned in production - https://phabricator.wikimedia.org/T309311
[18:51:04] <wikibugs>	 (03CR) 10Cwhite: [C: 03+2] "PCC checks out: https://puppet-compiler.wmflabs.org/pcc-worker1002/35587/" [puppet] - 10https://gerrit.wikimedia.org/r/798886 (https://phabricator.wikimedia.org/T237224) (owner: 10Cwhite)
[18:51:10] <wikibugs>	 (03PS8) 10Cwhite: opensearch_dashboards: add backup script enable job [puppet] - 10https://gerrit.wikimedia.org/r/798886 (https://phabricator.wikimedia.org/T237224)
[18:51:39] <wikibugs>	 10SRE, 10ops-eqiad, 10DC-Ops, 10Data-Engineering: Q4:(Need By: TBD) rack/setup/install an-presto10[06-15].eqiad.wmnet - https://phabricator.wikimedia.org/T306835 (10Jclark-ctr) name rack Unit Port CableID an-presto1006 e1 29 29 20220068 an-presto1007 e1 31 31 20220061 an-presto1008 e2 31 31 20220066 an-pre...
[18:52:20] <wikibugs>	 10SRE, 10ops-eqiad, 10DC-Ops, 10Data-Engineering: Q4:(Need By: TBD) rack/setup/install an-presto10[06-15].eqiad.wmnet - https://phabricator.wikimedia.org/T306835 (10Jclark-ctr)
[18:53:32] <wikibugs>	 10SRE, 10ops-eqiad, 10DC-Ops, 10Data-Engineering: Q4:(Need By: TBD) rack/setup/install an-presto10[06-15].eqiad.wmnet - https://phabricator.wikimedia.org/T306835 (10Jclark-ctr) a:05Jclark-ctr→03Cmjohnson
[18:53:35] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
[18:53:36] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
[18:53:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:53:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:55:34] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P28680 and previous config saved to /var/cache/conftool/dbconfig/20220527-185533-ladsgroup.json
[18:55:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:56:28] <wikibugs>	 (03Abandoned) 10Stang: zhwikiquote: Add logo variants [mediawiki-config] - 10https://gerrit.wikimedia.org/r/792973 (https://phabricator.wikimedia.org/T308620) (owner: 10Stang)
[18:57:40] <wikibugs>	 10SRE, 10Data-Engineering, 10LDAP-Access-Requests: Grant Access to `wmf` for `Dmantena` - https://phabricator.wikimedia.org/T308294 (10Dzahn) thanks @Milimetric.that makes sense. it was just out of habit to still use that tag. gotcha for next time
[19:03:14] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1096.eqiad.wmnet with reason: Maintenance
[19:03:16] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1096.eqiad.wmnet with reason: Maintenance
[19:03:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:03:19] <wikibugs>	 (03PS1) 10Andrew Bogott: cloudweb2002-dev is not behind LVS [puppet] - 10https://gerrit.wikimedia.org/r/800787
[19:03:21] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1096:3316 (T309311)', diff saved to https://phabricator.wikimedia.org/P28682 and previous config saved to /var/cache/conftool/dbconfig/20220527-190320-ladsgroup.json
[19:03:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:03:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:03:29] <stashbot>	 T309311: Make user_editcount unsigned in production - https://phabricator.wikimedia.org/T309311
[19:04:02] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 03+2] cloudweb2002-dev is not behind LVS [puppet] - 10https://gerrit.wikimedia.org/r/800787 (owner: 10Andrew Bogott)
[19:06:25] <icinga-wm>	 RECOVERY - MegaRAID on analytics1068 is OK: OK: optimal, 13 logical, 14 physical, WriteBack policy https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[19:08:40] <wikibugs>	 10SRE, 10SRE-Access-Requests: Requesting access to contint-admins for taavi - https://phabricator.wikimedia.org/T309375 (10Dzahn) a:03thcipriani (if this needs an additional sponsor I can be that)
[19:10:16] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T309311)', diff saved to https://phabricator.wikimedia.org/P28683 and previous config saved to /var/cache/conftool/dbconfig/20220527-191015-ladsgroup.json
[19:10:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:10:23] <stashbot>	 T309311: Make user_editcount unsigned in production - https://phabricator.wikimedia.org/T309311
[19:10:39] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1110 (T60674)', diff saved to https://phabricator.wikimedia.org/P28684 and previous config saved to /var/cache/conftool/dbconfig/20220527-191039-ladsgroup.json
[19:10:41] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance
[19:10:42] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance
[19:10:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:10:45] <stashbot>	 T60674: Drop page.page_restrictions column from Wikimedia wikis - https://phabricator.wikimedia.org/T60674
[19:10:47] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1113:3315 (T60674)', diff saved to https://phabricator.wikimedia.org/P28685 and previous config saved to /var/cache/conftool/dbconfig/20220527-191047-ladsgroup.json
[19:10:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:10:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:10:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:11:43] <icinga-wm>	 RECOVERY - SSH on labweb1002.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[19:15:09] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1096:3316 (T309311)', diff saved to https://phabricator.wikimedia.org/P28686 and previous config saved to /var/cache/conftool/dbconfig/20220527-191508-ladsgroup.json
[19:15:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:17:07] <wikibugs>	 (CR) Dzahn: [C: +2] parsoid::testing: add an auto_restart service for nginx [puppet] - https://gerrit.wikimedia.org/r/800241 (owner: Dzahn)
[19:18:29] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T60674)', diff saved to https://phabricator.wikimedia.org/P28687 and previous config saved to /var/cache/conftool/dbconfig/20220527-191829-ladsgroup.json
[19:18:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:18:36] <stashbot>	 T60674: Drop page.page_restrictions column from Wikimedia wikis - https://phabricator.wikimedia.org/T60674
[19:20:28] <wikibugs>	 SRE, serviceops, Patch-For-Review: Migrate node-based services in production to node14 - https://phabricator.wikimedia.org/T306995 (Jdforrester-WMF)
[19:22:37] <wikibugs>	 (PS45) Raymond Ndibe: Create REST api service to manage toolforge replica.my.cnf [puppet] - https://gerrit.wikimedia.org/r/777037 (https://phabricator.wikimedia.org/T304040)
[19:25:07] <wikibugs>	 (CR) CI reject: [V: -1] Create REST api service to manage toolforge replica.my.cnf [puppet] - https://gerrit.wikimedia.org/r/777037 (https://phabricator.wikimedia.org/T304040) (owner: Raymond Ndibe)
[19:25:21] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P28688 and previous config saved to /var/cache/conftool/dbconfig/20220527-192521-ladsgroup.json
[19:25:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:30:07] <icinga-wm>	 PROBLEM - Uncommitted DNS changes in Netbox on netbox1001 is CRITICAL: Netbox has uncommitted DNS changes https://wikitech.wikimedia.org/wiki/Monitoring/Netbox_DNS_uncommitted_changes
[19:30:09] <wikibugs>	 SRE, serviceops, Patch-For-Review: Migrate node-based services in production to node14 - https://phabricator.wikimedia.org/T306995 (Jdforrester-WMF)
[19:30:14] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1096:3316', diff saved to https://phabricator.wikimedia.org/P28689 and previous config saved to /var/cache/conftool/dbconfig/20220527-193013-ladsgroup.json
[19:30:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:33:34] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P28690 and previous config saved to /var/cache/conftool/dbconfig/20220527-193334-ladsgroup.json
[19:33:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:40:07] <icinga-wm>	 PROBLEM - MegaRAID on analytics1068 is CRITICAL: CRITICAL: 13 LD(s) must have write cache policy WriteBack, currently using: WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[19:40:26] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P28691 and previous config saved to /var/cache/conftool/dbconfig/20220527-194026-ladsgroup.json
[19:40:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:44:33] <wikibugs>	 (PS1) Stang: Add wmgSiteLogoVariants support for Chinese Wikimedia projects [mediawiki-config] - https://gerrit.wikimedia.org/r/800793 (https://phabricator.wikimedia.org/T308620)
[19:44:48] <wikibugs>	 (CR) CI reject: [V: -1] Add wmgSiteLogoVariants support for Chinese Wikimedia projects [mediawiki-config] - https://gerrit.wikimedia.org/r/800793 (https://phabricator.wikimedia.org/T308620) (owner: Stang)
[19:45:19] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1096:3316', diff saved to https://phabricator.wikimedia.org/P28692 and previous config saved to /var/cache/conftool/dbconfig/20220527-194518-ladsgroup.json
[19:45:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:47:34] <wikibugs>	 (CR) Stang: "recheck" [mediawiki-config] - https://gerrit.wikimedia.org/r/800793 (https://phabricator.wikimedia.org/T308620) (owner: Stang)
[19:48:40] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P28693 and previous config saved to /var/cache/conftool/dbconfig/20220527-194839-ladsgroup.json
[19:48:40] <wikibugs>	 (PS6) Stang: Add language fallback support for wmgSiteLogoVariants [mediawiki-config] - https://gerrit.wikimedia.org/r/799415 (https://phabricator.wikimedia.org/T305692)
[19:48:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:49:38] <wikibugs>	 (CR) Stang: "recheck" [mediawiki-config] - https://gerrit.wikimedia.org/r/800793 (https://phabricator.wikimedia.org/T308620) (owner: Stang)
[19:51:19] <wikibugs>	 (CR) Cwhite: [C: +2] aptrepo: add opensearch2 thirdparty component [puppet] - https://gerrit.wikimedia.org/r/800294 (https://phabricator.wikimedia.org/T304440) (owner: Cwhite)
[19:55:32] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T309311)', diff saved to https://phabricator.wikimedia.org/P28694 and previous config saved to /var/cache/conftool/dbconfig/20220527-195531-ladsgroup.json
[19:55:33] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
[19:55:34] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
[19:55:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:55:38] <stashbot>	 T309311: Make user_editcount unsigned in production - https://phabricator.wikimedia.org/T309311
[19:55:39] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1146:3312 (T309311)', diff saved to https://phabricator.wikimedia.org/P28695 and previous config saved to /var/cache/conftool/dbconfig/20220527-195539-ladsgroup.json
[19:55:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:55:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:55:52] <wikibugs>	 (PS1) Andrew Bogott: Rough in manifest and config for OpenStack Heat [puppet] - https://gerrit.wikimedia.org/r/800794 (https://phabricator.wikimedia.org/T309407)
[19:55:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:56:30] <wikibugs>	 (CR) CI reject: [V: -1] Rough in manifest and config for OpenStack Heat [puppet] - https://gerrit.wikimedia.org/r/800794 (https://phabricator.wikimedia.org/T309407) (owner: Andrew Bogott)
[20:00:24] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1096:3316 (T309311)', diff saved to https://phabricator.wikimedia.org/P28696 and previous config saved to /var/cache/conftool/dbconfig/20220527-200023-ladsgroup.json
[20:00:26] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1165.eqiad.wmnet with reason: Maintenance
[20:00:27] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1165.eqiad.wmnet with reason: Maintenance
[20:00:29] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
[20:00:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:00:30] <wikibugs>	 (PS2) Andrew Bogott: Rough in manifest and config for OpenStack Heat [puppet] - https://gerrit.wikimedia.org/r/800794 (https://phabricator.wikimedia.org/T309407)
[20:00:32] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
[20:00:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:00:38] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1165 (T309311)', diff saved to https://phabricator.wikimedia.org/P28697 and previous config saved to /var/cache/conftool/dbconfig/20220527-200037-ladsgroup.json
[20:00:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:00:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:00:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:00:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:00:53] <stashbot>	 T309311: Make user_editcount unsigned in production - https://phabricator.wikimedia.org/T309311
[20:01:07] <wikibugs>	 (CR) CI reject: [V: -1] Rough in manifest and config for OpenStack Heat [puppet] - https://gerrit.wikimedia.org/r/800794 (https://phabricator.wikimedia.org/T309407) (owner: Andrew Bogott)
[20:02:33] <icinga-wm>	 RECOVERY - MegaRAID on analytics1068 is OK: OK: optimal, 13 logical, 14 physical, WriteBack policy https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[20:02:42] <wikibugs>	 (CR) Stang: "Will wait for the dependent patch got merged. Also don't forget addition of $wmgSiteLogoVariantFallback for these four newly added sites." [mediawiki-config] - https://gerrit.wikimedia.org/r/800793 (https://phabricator.wikimedia.org/T308620) (owner: Stang)
[20:03:24] <wikibugs>	 (PS3) Andrew Bogott: Rough in manifest and config for OpenStack Heat [puppet] - https://gerrit.wikimedia.org/r/800794 (https://phabricator.wikimedia.org/T309407)
[20:03:45] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T60674)', diff saved to https://phabricator.wikimedia.org/P28698 and previous config saved to /var/cache/conftool/dbconfig/20220527-200344-ladsgroup.json
[20:03:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:03:51] <stashbot>	 T60674: Drop page.page_restrictions column from Wikimedia wikis - https://phabricator.wikimedia.org/T60674
[20:03:59] <wikibugs>	 (CR) CI reject: [V: -1] Rough in manifest and config for OpenStack Heat [puppet] - https://gerrit.wikimedia.org/r/800794 (https://phabricator.wikimedia.org/T309407) (owner: Andrew Bogott)
[20:04:54] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1165 (T309311)', diff saved to https://phabricator.wikimedia.org/P28699 and previous config saved to /var/cache/conftool/dbconfig/20220527-200453-ladsgroup.json
[20:04:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:06:09] <wikibugs>	 (PS1) Andrew Bogott: Add fake db passwords for OpenStack Heato [labs/private] - https://gerrit.wikimedia.org/r/800796 (https://phabricator.wikimedia.org/T309407)
[20:07:11] <wikibugs>	 (PS2) Andrew Bogott: Add fake db passwords for OpenStack Heat [labs/private] - https://gerrit.wikimedia.org/r/800796 (https://phabricator.wikimedia.org/T309407)
[20:14:21] <wikibugs>	 (CR) Andrew Bogott: [V: +2 C: +2] Add fake db passwords for OpenStack Heat [labs/private] - https://gerrit.wikimedia.org/r/800796 (https://phabricator.wikimedia.org/T309407) (owner: Andrew Bogott)
[20:14:27] <wikibugs>	 (PS4) Andrew Bogott: Rough in manifest and config for OpenStack Heat [puppet] - https://gerrit.wikimedia.org/r/800794 (https://phabricator.wikimedia.org/T309407)
[20:15:03] <wikibugs>	 (CR) CI reject: [V: -1] Rough in manifest and config for OpenStack Heat [puppet] - https://gerrit.wikimedia.org/r/800794 (https://phabricator.wikimedia.org/T309407) (owner: Andrew Bogott)
[20:19:58] <wikibugs>	 (PS5) Andrew Bogott: Rough in manifest and config for OpenStack Heat [puppet] - https://gerrit.wikimedia.org/r/800794 (https://phabricator.wikimedia.org/T309407)
[20:20:00] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P28700 and previous config saved to /var/cache/conftool/dbconfig/20220527-201959-ladsgroup.json
[20:20:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:20:35] <wikibugs>	 (CR) CI reject: [V: -1] Rough in manifest and config for OpenStack Heat [puppet] - https://gerrit.wikimedia.org/r/800794 (https://phabricator.wikimedia.org/T309407) (owner: Andrew Bogott)
[20:21:34] <wikibugs>	 (PS6) Andrew Bogott: Rough in manifest and config for OpenStack Heat [puppet] - https://gerrit.wikimedia.org/r/800794 (https://phabricator.wikimedia.org/T309407)
[20:22:54] <wikibugs>	 (PS7) Andrew Bogott: Rough in manifest and config for OpenStack Heat [puppet] - https://gerrit.wikimedia.org/r/800794 (https://phabricator.wikimedia.org/T309407)
[20:24:48] <wikibugs>	 (PS46) Raymond Ndibe: Create REST api service to manage toolforge replica.my.cnf [puppet] - https://gerrit.wikimedia.org/r/777037 (https://phabricator.wikimedia.org/T304040)
[20:25:02] <wikibugs>	 (PS8) Andrew Bogott: Rough in manifest and config for OpenStack Heat [puppet] - https://gerrit.wikimedia.org/r/800794 (https://phabricator.wikimedia.org/T309407)
[20:27:36] <wikibugs>	 (CR) CI reject: [V: -1] Create REST api service to manage toolforge replica.my.cnf [puppet] - https://gerrit.wikimedia.org/r/777037 (https://phabricator.wikimedia.org/T304040) (owner: Raymond Ndibe)
[20:35:05] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P28701 and previous config saved to /var/cache/conftool/dbconfig/20220527-203504-ladsgroup.json
[20:35:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:36:15] <icinga-wm>	 PROBLEM - MegaRAID on analytics1068 is CRITICAL: CRITICAL: 13 LD(s) must have write cache policy WriteBack, currently using: WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[20:41:53] <wikibugs>	 (PS47) Raymond Ndibe: Create REST api service to manage toolforge replica.my.cnf [puppet] - https://gerrit.wikimedia.org/r/777037 (https://phabricator.wikimedia.org/T304040)
[20:43:41] <icinga-wm>	 PROBLEM - SSH on aqs1008.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[20:43:55] <wikibugs>	 SRE, Beta-Cluster-Infrastructure, Release-Engineering-Team: deployment-deploy03 unresponsive - https://phabricator.wikimedia.org/T309413 (TheresNoTime) p:Triage→High
[20:44:55] <wikibugs>	 (CR) CI reject: [V: -1] Create REST api service to manage toolforge replica.my.cnf [puppet] - https://gerrit.wikimedia.org/r/777037 (https://phabricator.wikimedia.org/T304040) (owner: Raymond Ndibe)
[20:45:05] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T309311)', diff saved to https://phabricator.wikimedia.org/P28702 and previous config saved to /var/cache/conftool/dbconfig/20220527-204505-ladsgroup.json
[20:45:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:45:13] <stashbot>	 T309311: Make user_editcount unsigned in production - https://phabricator.wikimedia.org/T309311
[20:50:10] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1165 (T309311)', diff saved to https://phabricator.wikimedia.org/P28703 and previous config saved to /var/cache/conftool/dbconfig/20220527-205009-ladsgroup.json
[20:50:11] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1180.eqiad.wmnet with reason: Maintenance
[20:50:13] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1180.eqiad.wmnet with reason: Maintenance
[20:50:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:50:16] <stashbot>	 T309311: Make user_editcount unsigned in production - https://phabricator.wikimedia.org/T309311
[20:50:18] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1180 (T309311)', diff saved to https://phabricator.wikimedia.org/P28704 and previous config saved to /var/cache/conftool/dbconfig/20220527-205017-ladsgroup.json
[20:50:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:50:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:50:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:51:58] <wikibugs>	 SRE, Beta-Cluster-Infrastructure, Release-Engineering-Team: deployment-deploy03 unresponsive - https://phabricator.wikimedia.org/T309413 (dancy) I rebooted using the horizon UI.
[20:52:11] <wikibugs>	 SRE, Beta-Cluster-Infrastructure, Release-Engineering-Team: deployment-deploy03 unresponsive - https://phabricator.wikimedia.org/T309413 (TheresNoTime) Open→Resolved a:dancy @dancy rebooted `deployment-deploy03` and it is now accessible
[20:54:34] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1180 (T309311)', diff saved to https://phabricator.wikimedia.org/P28705 and previous config saved to /var/cache/conftool/dbconfig/20220527-205434-ladsgroup.json
[20:54:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:00:10] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P28706 and previous config saved to /var/cache/conftool/dbconfig/20220527-210010-ladsgroup.json
[21:00:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:00:26] <wikibugs>	 (CR) Cwhite: [C: +1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/800282 (https://phabricator.wikimedia.org/T309319) (owner: Ori)
[21:09:39] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P28707 and previous config saved to /var/cache/conftool/dbconfig/20220527-210939-ladsgroup.json
[21:09:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:14:02] <wikibugs>	 (PS48) Raymond Ndibe: Create REST api service to manage toolforge replica.my.cnf [puppet] - https://gerrit.wikimedia.org/r/777037 (https://phabricator.wikimedia.org/T304040)
[21:15:15] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P28708 and previous config saved to /var/cache/conftool/dbconfig/20220527-211515-ladsgroup.json
[21:15:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:16:33] <wikibugs>	 (CR) CI reject: [V: -1] Create REST api service to manage toolforge replica.my.cnf [puppet] - https://gerrit.wikimedia.org/r/777037 (https://phabricator.wikimedia.org/T304040) (owner: Raymond Ndibe)
[21:24:44] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P28709 and previous config saved to /var/cache/conftool/dbconfig/20220527-212444-ladsgroup.json
[21:24:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:27:26] <wikibugs>	 (PS49) Raymond Ndibe: Create REST api service to manage toolforge replica.my.cnf [puppet] - https://gerrit.wikimedia.org/r/777037 (https://phabricator.wikimedia.org/T304040)
[21:30:01] <wikibugs>	 (CR) CI reject: [V: -1] Create REST api service to manage toolforge replica.my.cnf [puppet] - https://gerrit.wikimedia.org/r/777037 (https://phabricator.wikimedia.org/T304040) (owner: Raymond Ndibe)
[21:30:21] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T309311)', diff saved to https://phabricator.wikimedia.org/P28710 and previous config saved to /var/cache/conftool/dbconfig/20220527-213020-ladsgroup.json
[21:30:22] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
[21:30:23] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
[21:30:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:30:26] <stashbot>	 T309311: Make user_editcount unsigned in production - https://phabricator.wikimedia.org/T309311
[21:30:29] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1105:3312 (T309311)', diff saved to https://phabricator.wikimedia.org/P28711 and previous config saved to /var/cache/conftool/dbconfig/20220527-213028-ladsgroup.json
[21:30:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:30:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:30:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:32:30] <wikibugs>	 (PS9) Andrew Bogott: Rough in manifest and config for OpenStack Heat [puppet] - https://gerrit.wikimedia.org/r/800794 (https://phabricator.wikimedia.org/T309407)
[21:32:56] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1166 (T298560)', diff saved to https://phabricator.wikimedia.org/P28712 and previous config saved to /var/cache/conftool/dbconfig/20220527-213255-ladsgroup.json
[21:33:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:33:02] <stashbot>	 T298560: Fix mismatching field type of revision.rev_timestamp on wmf wikis - https://phabricator.wikimedia.org/T298560
[21:33:07] <wikibugs>	 (CR) CI reject: [V: -1] Rough in manifest and config for OpenStack Heat [puppet] - https://gerrit.wikimedia.org/r/800794 (https://phabricator.wikimedia.org/T309407) (owner: Andrew Bogott)
[21:33:49] <icinga-wm>	 PROBLEM - Router interfaces on cr1-eqiad is CRITICAL: CRITICAL: host 208.80.154.196, interfaces up: 241, down: 1, dormant: 0, excluded: 1, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[21:35:35] <icinga-wm>	 PROBLEM - Router interfaces on cr1-codfw is CRITICAL: CRITICAL: host 208.80.153.192, interfaces up: 131, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[21:36:22] <wikibugs>	 (PS10) Andrew Bogott: Rough in manifest and config for OpenStack Heat [puppet] - https://gerrit.wikimedia.org/r/800794 (https://phabricator.wikimedia.org/T309407)
[21:37:51] <icinga-wm>	 RECOVERY - Router interfaces on cr1-codfw is OK: OK: host 208.80.153.192, interfaces up: 132, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[21:38:17] <icinga-wm>	 RECOVERY - Router interfaces on cr1-eqiad is OK: OK: host 208.80.154.196, interfaces up: 242, down: 0, dormant: 0, excluded: 1, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[21:39:50] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1180 (T309311)', diff saved to https://phabricator.wikimedia.org/P28713 and previous config saved to /var/cache/conftool/dbconfig/20220527-213949-ladsgroup.json
[21:39:51] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1168.eqiad.wmnet with reason: Maintenance
[21:39:53] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1168.eqiad.wmnet with reason: Maintenance
[21:39:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:39:56] <stashbot>	 T309311: Make user_editcount unsigned in production - https://phabricator.wikimedia.org/T309311
[21:39:58] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1168 (T309311)', diff saved to https://phabricator.wikimedia.org/P28714 and previous config saved to /var/cache/conftool/dbconfig/20220527-213957-ladsgroup.json
[21:40:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:40:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:40:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:41:40] <wikibugs>	 (PS50) Raymond Ndibe: Create REST api service to manage toolforge replica.my.cnf [puppet] - https://gerrit.wikimedia.org/r/777037 (https://phabricator.wikimedia.org/T304040)
[21:41:50] <wikibugs>	 (PS11) Andrew Bogott: Rough in manifest and config for OpenStack Heat [puppet] - https://gerrit.wikimedia.org/r/800794 (https://phabricator.wikimedia.org/T309407)
[21:42:32] <wikibugs>	 SRE, Beta-Cluster-Infrastructure, Release-Engineering-Team: deployment-deploy03 unresponsive - https://phabricator.wikimedia.org/T309413 (TheresNoTime) Resolved→Open a:dancy→TheresNoTime Issue repeated, looking at it now
[21:43:21] <wikibugs>	 (CR) CI reject: [V: -1] Rough in manifest and config for OpenStack Heat [puppet] - https://gerrit.wikimedia.org/r/800794 (https://phabricator.wikimedia.org/T309407) (owner: Andrew Bogott)
[21:43:49] <wikibugs>	 (PS12) Andrew Bogott: Rough in manifest and config for OpenStack Heat [puppet] - https://gerrit.wikimedia.org/r/800794 (https://phabricator.wikimedia.org/T309407)
[21:44:15] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1168 (T309311)', diff saved to https://phabricator.wikimedia.org/P28715 and previous config saved to /var/cache/conftool/dbconfig/20220527-214414-ladsgroup.json
[21:44:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:44:45] <icinga-wm>	 RECOVERY - SSH on aqs1008.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[21:45:11] <wikibugs>	 (CR) CI reject: [V: -1] Create REST api service to manage toolforge replica.my.cnf [puppet] - https://gerrit.wikimedia.org/r/777037 (https://phabricator.wikimedia.org/T304040) (owner: Raymond Ndibe)
[21:48:01] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P28716 and previous config saved to /var/cache/conftool/dbconfig/20220527-214800-ladsgroup.json
[21:48:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:49:35] <icinga-wm>	 PROBLEM - Varnish traffic drop between 30min ago and now at eqiad on alert1001 is CRITICAL: 46.85 le 60 https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/d/000000180/varnish-http-requests?orgId=1&viewPanel=6
[21:51:11] <icinga-wm>	 PROBLEM - Router interfaces on cr1-codfw is CRITICAL: CRITICAL: host 208.80.153.192, interfaces up: 131, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[21:51:39] <icinga-wm>	 PROBLEM - Router interfaces on cr1-eqiad is CRITICAL: CRITICAL: host 208.80.154.196, interfaces up: 241, down: 1, dormant: 0, excluded: 1, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[21:51:49] <icinga-wm>	 RECOVERY - Varnish traffic drop between 30min ago and now at eqiad on alert1001 is OK: (C)60 le (W)70 le 75.78 https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/d/000000180/varnish-http-requests?orgId=1&viewPanel=6
[21:52:14] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T309311)', diff saved to https://phabricator.wikimedia.org/P28717 and previous config saved to /var/cache/conftool/dbconfig/20220527-215213-ladsgroup.json
[21:52:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:52:19] <stashbot>	 T309311: Make user_editcount unsigned in production - https://phabricator.wikimedia.org/T309311
[21:55:39] <icinga-wm>	 RECOVERY - Router interfaces on cr1-codfw is OK: OK: host 208.80.153.192, interfaces up: 132, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[21:56:07] <icinga-wm>	 RECOVERY - Router interfaces on cr1-eqiad is OK: OK: host 208.80.154.196, interfaces up: 242, down: 0, dormant: 0, excluded: 1, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[21:59:20] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P28718 and previous config saved to /var/cache/conftool/dbconfig/20220527-215919-ladsgroup.json
[21:59:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:03:06] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P28719 and previous config saved to /var/cache/conftool/dbconfig/20220527-220305-ladsgroup.json
[22:03:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:05:45] <wikibugs>	 SRE, Beta-Cluster-Infrastructure, Release-Engineering-Team: deployment-deploy03 crashed twice - https://phabricator.wikimedia.org/T309413 (TheresNoTime) p:High→Triage a:TheresNoTime→None
[22:07:19] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P28720 and previous config saved to /var/cache/conftool/dbconfig/20220527-220718-ladsgroup.json
[22:07:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:14:25] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P28721 and previous config saved to /var/cache/conftool/dbconfig/20220527-221424-ladsgroup.json
[22:14:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:18:11] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1166 (T298560)', diff saved to https://phabricator.wikimedia.org/P28722 and previous config saved to /var/cache/conftool/dbconfig/20220527-221810-ladsgroup.json
[22:18:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:18:18] <stashbot>	 T298560: Fix mismatching field type of revision.rev_timestamp on wmf wikis - https://phabricator.wikimedia.org/T298560
[22:18:45] <icinga-wm>	 PROBLEM - SSH on cp5012.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[22:22:24] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P28723 and previous config saved to /var/cache/conftool/dbconfig/20220527-222223-ladsgroup.json
[22:22:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:29:30] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1168 (T309311)', diff saved to https://phabricator.wikimedia.org/P28724 and previous config saved to /var/cache/conftool/dbconfig/20220527-222929-ladsgroup.json
[22:29:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:29:37] <stashbot>	 T309311: Make user_editcount unsigned in production - https://phabricator.wikimedia.org/T309311
[22:33:57] <icinga-wm>	 PROBLEM - Router interfaces on cr1-codfw is CRITICAL: CRITICAL: host 208.80.153.192, interfaces up: 131, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[22:34:23] <icinga-wm>	 PROBLEM - Router interfaces on cr1-eqiad is CRITICAL: CRITICAL: host 208.80.154.196, interfaces up: 241, down: 1, dormant: 0, excluded: 1, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[22:36:39] <wikibugs>	 SRE, Beta-Cluster-Infrastructure, Release-Engineering-Team: deployment-deploy03 crashed twice - https://phabricator.wikimedia.org/T309413 (TheresNoTime) While running a step of `beta-update-databases-eqiad`, we go OOM and unresponsive:  `   PID  USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM...
[22:37:29] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T309311)', diff saved to https://phabricator.wikimedia.org/P28725 and previous config saved to /var/cache/conftool/dbconfig/20220527-223728-ladsgroup.json
[22:37:30] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
[22:37:32] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
[22:37:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:37:36] <stashbot>	 T309311: Make user_editcount unsigned in production - https://phabricator.wikimedia.org/T309311
[22:37:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:37:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:40:45] <icinga-wm>	 PROBLEM - SSH on wtp1039.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[22:41:20] <wikibugs>	 (PS2) Jforrester: Follow-up I8d62aedb: Fix .rotation mixin [core] (wmf/1.39.0-wmf.13) - https://gerrit.wikimedia.org/r/800696 (owner: Krinkle)
[22:41:39] <wikibugs>	 (CR) Jforrester: "Re-cherry-picked now it's merged so we get the nice git hash in the blame." [core] (wmf/1.39.0-wmf.13) - https://gerrit.wikimedia.org/r/800696 (owner: Krinkle)
[22:55:10] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
[22:55:12] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
[22:55:13] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 12:00:00 on 8 hosts with reason: Maintenance
[22:55:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:55:19] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 8 hosts with reason: Maintenance
[22:55:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:55:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:55:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:58:06] <wikibugs>	 (PS13) Andrew Bogott: Rough in manifest and config for OpenStack Heat [puppet] - https://gerrit.wikimedia.org/r/800794 (https://phabricator.wikimedia.org/T309407)
[22:58:08] <wikibugs>	 SRE, Beta-Cluster-Infrastructure, Release-Engineering-Team: deployment-deploy03 crashed twice - https://phabricator.wikimedia.org/T309413 (TheresNoTime) a:TheresNoTime
[22:59:03] <icinga-wm>	 RECOVERY - Router interfaces on cr1-eqiad is OK: OK: host 208.80.154.196, interfaces up: 242, down: 0, dormant: 0, excluded: 1, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[23:00:34] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
[23:00:36] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
[23:00:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:00:40] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1182 (T309311)', diff saved to https://phabricator.wikimedia.org/P28726 and previous config saved to /var/cache/conftool/dbconfig/20220527-230040-ladsgroup.json
[23:00:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:00:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:00:48] <stashbot>	 T309311: Make user_editcount unsigned in production - https://phabricator.wikimedia.org/T309311
[23:00:49] <icinga-wm>	 RECOVERY - Router interfaces on cr1-codfw is OK: OK: host 208.80.153.192, interfaces up: 132, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[23:02:37] <wikibugs>	 (PS14) Andrew Bogott: Rough in manifest and config for OpenStack Heat [puppet] - https://gerrit.wikimedia.org/r/800794 (https://phabricator.wikimedia.org/T309407)
[23:06:17] <wikibugs>	 (CR) Andrew Bogott: [C: +2] Rough in manifest and config for OpenStack Heat [puppet] - https://gerrit.wikimedia.org/r/800794 (https://phabricator.wikimedia.org/T309407) (owner: Andrew Bogott)
[23:09:49] <wikibugs>	 (PS1) Andrew Bogott: Pass in codfw1dev-specific rabbit pass to heat profile [puppet] - https://gerrit.wikimedia.org/r/800810 (https://phabricator.wikimedia.org/T309407)
[23:13:04] <wikibugs>	 (CR) Andrew Bogott: [C: +2] Pass in codfw1dev-specific rabbit pass to heat profile [puppet] - https://gerrit.wikimedia.org/r/800810 (https://phabricator.wikimedia.org/T309407) (owner: Andrew Bogott)
[23:16:57] <wikibugs>	 (PS1) Andrew Bogott: Add initial (mostly empty) policy.yaml for OpenStack heat [puppet] - https://gerrit.wikimedia.org/r/800811 (https://phabricator.wikimedia.org/T309407)
[23:17:47] <wikibugs>	 (CR) CI reject: [V: -1] Add initial (mostly empty) policy.yaml for OpenStack heat [puppet] - https://gerrit.wikimedia.org/r/800811 (https://phabricator.wikimedia.org/T309407) (owner: Andrew Bogott)
[23:21:49] <wikibugs>	 SRE, Beta-Cluster-Infrastructure, Release-Engineering-Team: deployment-deploy03 crashed twice - https://phabricator.wikimedia.org/T309413 (Zabe) FTR, it seems like beta-update-databases-eqiad was running out of memory while trying to perform the migration added in https://gerrit.wikimedia.org/r/c/med...
[23:26:53] <wikibugs>	 (PS2) Andrew Bogott: Add initial (mostly empty) policy.yaml for OpenStack heat [puppet] - https://gerrit.wikimedia.org/r/800811 (https://phabricator.wikimedia.org/T309407)
[23:29:00] <wikibugs>	 (CR) Andrew Bogott: [C: +2] Add initial (mostly empty) policy.yaml for OpenStack heat [puppet] - https://gerrit.wikimedia.org/r/800811 (https://phabricator.wikimedia.org/T309407) (owner: Andrew Bogott)
[23:36:10] <wikibugs>	 SRE, Beta-Cluster-Infrastructure, Release-Engineering-Team: deployment-deploy03 ran out of memory twice while trying to perform a WikiLambda db migration - https://phabricator.wikimedia.org/T309413 (TheresNoTime)
[23:38:12] <wikibugs>	 SRE, Beta-Cluster-Infrastructure, Release-Engineering-Team: deployment-deploy03 ran out of memory twice while trying to perform a WikiLambda db migration - https://phabricator.wikimedia.org/T309413 (TheresNoTime)
[23:40:05] <wikibugs>	 SRE, Beta-Cluster-Infrastructure, Release-Engineering-Team: deployment-deploy03 ran out of memory twice while trying to perform a WikiLambda db migration - https://phabricator.wikimedia.org/T309413 (TheresNoTime)
[23:41:53] <icinga-wm>	 RECOVERY - SSH on wtp1039.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[23:44:27] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1182 (T309311)', diff saved to https://phabricator.wikimedia.org/P28727 and previous config saved to /var/cache/conftool/dbconfig/20220527-234427-ladsgroup.json
[23:44:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:44:38] <stashbot>	 T309311: Make user_editcount unsigned in production - https://phabricator.wikimedia.org/T309311
[23:55:26] <wikibugs>	 (PS1) Andrew Bogott: Add transport_url to heat.conf [puppet] - https://gerrit.wikimedia.org/r/800823 (https://phabricator.wikimedia.org/T309407)
[23:57:40] <wikibugs>	 (CR) Andrew Bogott: [C: +2] Add transport_url to heat.conf [puppet] - https://gerrit.wikimedia.org/r/800823 (https://phabricator.wikimedia.org/T309407) (owner: Andrew Bogott)
[23:59:33] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P28728 and previous config saved to /var/cache/conftool/dbconfig/20220527-235932-ladsgroup.json
[23:59:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log