[00:38:28] <icinga-wm>	 PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: monitor_refine_eventlogging_legacy.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:42:18] <icinga-wm>	 RECOVERY - Check systemd state on logstash1026 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:54:36] <jinxer-wm>	 (CirrusSearchHighOldGCFrequency) firing: (2) Elasticsearch instance cloudelastic1005-cloudelastic-chi-eqiad is running the old gc excessively - https://wikitech.wikimedia.org/wiki/Search#Stuck_in_old_GC_hell - https://grafana.wikimedia.org/d/000000462/elasticsearch-memory - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchHighOldGCFrequency
[00:55:00] <icinga-wm>	 PROBLEM - MegaRAID on an-worker1082 is CRITICAL: CRITICAL: 13 LD(s) must have write cache policy WriteBack, currently using: WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[01:13:04] <icinga-wm>	 PROBLEM - k8s API server requests latencies on kubemaster1001 is CRITICAL: instance=10.64.0.117 verb=UPDATE https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/d/000000435/kubernetes-api?orgId=1&viewPanel=27
[01:34:12] <icinga-wm>	 PROBLEM - k8s API server requests latencies on kubemaster2002 is CRITICAL: instance=10.192.16.48 verb=LIST https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/d/000000435/kubernetes-api?orgId=1&viewPanel=27
[02:26:38] <icinga-wm>	 RECOVERY - MegaRAID on an-worker1082 is OK: OK: optimal, 13 logical, 14 physical, WriteBack policy https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[02:57:32] <icinga-wm>	 PROBLEM - SSH on wtp1040.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[03:00:50] <icinga-wm>	 PROBLEM - MegaRAID on an-worker1082 is CRITICAL: CRITICAL: 13 LD(s) must have write cache policy WriteBack, currently using: WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[03:10:04] <icinga-wm>	 PROBLEM - k8s API server requests latencies on kubemaster2002 is CRITICAL: instance=10.192.16.48 verb={CREATE,UPDATE} https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/d/000000435/kubernetes-api?orgId=1&viewPanel=27
[03:14:17] <jinxer-wm>	 (CirrusSearchHighOldGCFrequency) firing: (2) Elasticsearch instance cloudelastic1005-cloudelastic-chi-eqiad is running the old gc excessively - https://wikitech.wikimedia.org/wiki/Search#Stuck_in_old_GC_hell - https://grafana.wikimedia.org/d/000000462/elasticsearch-memory - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchHighOldGCFrequency
[03:19:17] <jinxer-wm>	 (CirrusSearchHighOldGCFrequency) firing: (2) Elasticsearch instance cloudelastic1005-cloudelastic-chi-eqiad is running the old gc excessively - https://wikitech.wikimedia.org/wiki/Search#Stuck_in_old_GC_hell - https://grafana.wikimedia.org/d/000000462/elasticsearch-memory - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchHighOldGCFrequency
[03:27:54] <wikibugs>	 (PS5) Ori: Initial Debian packaging [software/varnish/libvmod-querysort] - https://gerrit.wikimedia.org/r/810551 (https://phabricator.wikimedia.org/T138093)
[03:58:54] <icinga-wm>	 RECOVERY - SSH on wtp1040.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[04:19:16] <icinga-wm>	 PROBLEM - k8s API server requests latencies on kubestagemaster1001 is CRITICAL: instance=10.64.16.203 verb={CREATE,UPDATE} https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/d/000000435/kubernetes-api?orgId=1&viewPanel=27
[04:38:10] <icinga-wm>	 PROBLEM - BGP status on cr2-eqsin is CRITICAL: BGP CRITICAL - AS64605/IPv4: Active - Anycast https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[04:55:16] <icinga-wm>	 RECOVERY - MegaRAID on an-worker1082 is OK: OK: optimal, 13 logical, 14 physical, WriteBack policy https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[04:55:58] <icinga-wm>	 PROBLEM - SSH on wtp1044.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[05:29:42] <icinga-wm>	 PROBLEM - MegaRAID on an-worker1082 is CRITICAL: CRITICAL: 13 LD(s) must have write cache policy WriteBack, currently using: WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[05:39:47] <wikibugs>	 (PS1) Marostegui: db2153: Enable notifications [puppet] - https://gerrit.wikimedia.org/r/810706 (https://phabricator.wikimedia.org/T311493)
[05:40:57] <wikibugs>	 (CR) Marostegui: [C: +2] db2153: Enable notifications [puppet] - https://gerrit.wikimedia.org/r/810706 (https://phabricator.wikimedia.org/T311493) (owner: Marostegui)
[05:47:56] <wikibugs>	 (PS1) Marostegui: db2154: Enable notifications [puppet] - https://gerrit.wikimedia.org/r/810707 (https://phabricator.wikimedia.org/T311493)
[05:49:06] <wikibugs>	 (CR) Marostegui: [C: +2] db2154: Enable notifications [puppet] - https://gerrit.wikimedia.org/r/810707 (https://phabricator.wikimedia.org/T311493) (owner: Marostegui)
[05:51:17] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 1:00:00 on 14 hosts with reason: codfw s4 sanitarium master switch
[05:51:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:51:38] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 14 hosts with reason: codfw s4 sanitarium master switch
[05:51:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:57:24] <icinga-wm>	 RECOVERY - SSH on wtp1044.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[06:04:33] <wikibugs>	 (PS1) Marostegui: db2155: Enable notifications [puppet] - https://gerrit.wikimedia.org/r/810711 (https://phabricator.wikimedia.org/T311493)
[06:07:21] <wikibugs>	 (CR) Marostegui: [C: +2] db2155: Enable notifications [puppet] - https://gerrit.wikimedia.org/r/810711 (https://phabricator.wikimedia.org/T311493) (owner: Marostegui)
[06:10:09] <icinga-wm>	 PROBLEM - SSH on mw1321.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[06:10:11] <wikibugs>	 (PS1) Marostegui: mariadb: db2073 no longer sanitarium master [puppet] - https://gerrit.wikimedia.org/r/810713 (https://phabricator.wikimedia.org/T311493)
[06:11:32] <wikibugs>	 (CR) Marostegui: [C: +2] mariadb: db2073 no longer sanitarium master [puppet] - https://gerrit.wikimedia.org/r/810713 (https://phabricator.wikimedia.org/T311493) (owner: Marostegui)
[06:13:33] <icinga-wm>	 PROBLEM - SSH on bast3005 is CRITICAL: Server answer: https://wikitech.wikimedia.org/wiki/SSH/monitoring
[06:18:41] <icinga-wm>	 RECOVERY - SSH on bast3005 is OK: SSH OK - OpenSSH_7.9p1 Debian-10+deb10u2 (protocol 2.0) https://wikitech.wikimedia.org/wiki/SSH/monitoring
[06:20:46] <wikibugs>	 (PS1) Marostegui: mariadb: Decommission db2091 [puppet] - https://gerrit.wikimedia.org/r/810715 (https://phabricator.wikimedia.org/T311803)
[06:24:12] <logmsgbot>	 !log marostegui@cumin2002 START - Cookbook sre.hosts.decommission for hosts db2091.codfw.wmnet
[06:24:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:27:44] <wikibugs>	 (CR) Ayounsi: "Thanks for the cleanup, lgtm overall with 2 comments." [puppet] - https://gerrit.wikimedia.org/r/810323 (owner: Muehlenhoff)
[06:28:23] <logmsgbot>	 !log marostegui@cumin2002 START - Cookbook sre.dns.netbox
[06:28:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:29:34] <wikibugs>	 (CR) Marostegui: [C: +2] mariadb: Decommission db2091 [puppet] - https://gerrit.wikimedia.org/r/810715 (https://phabricator.wikimedia.org/T311803) (owner: Marostegui)
[06:32:21] <logmsgbot>	 !log marostegui@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[06:32:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:34:41] <logmsgbot>	 !log marostegui@cumin2002 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2091.codfw.wmnet
[06:34:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:36:02] <wikibugs>	 ops-codfw, decommission-hardware, Patch-For-Review: decommission db2091 - https://phabricator.wikimedia.org/T311803 (Marostegui) a:Papaul
[06:36:23] <wikibugs>	 ops-codfw, decommission-hardware, Patch-For-Review: decommission db2091 - https://phabricator.wikimedia.org/T311803 (Marostegui) Ready for you @Papaul!
[06:38:52] <wikibugs>	 (CR) Filippo Giunchedi: [C: +1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/809038 (https://phabricator.wikimedia.org/T311445) (owner: BCornwall)
[06:39:37] <logmsgbot>	 !log marostegui@cumin2002 START - Cookbook sre.hosts.decommission for hosts db2092.codfw.wmnet
[06:39:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:39:57] <wikibugs>	 (PS1) Marostegui: mariadb: Decommission db2092 [puppet] - https://gerrit.wikimedia.org/r/810819 (https://phabricator.wikimedia.org/T311802)
[06:41:28] <wikibugs>	 (CR) Giuseppe Lavagetto: mediawiki: add scap restarts script (1 comment) [puppet] - https://gerrit.wikimedia.org/r/810027 (owner: Giuseppe Lavagetto)
[06:43:48] <logmsgbot>	 !log marostegui@cumin2002 START - Cookbook sre.dns.netbox
[06:43:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:47:10] <wikibugs>	 (CR) Filippo Giunchedi: "LGTM, adding Andrea since she's been working on Bullseye support. To avoid the obvious conflict with https://gerrit.wikimedia.org/r/c/oper" [puppet] - https://gerrit.wikimedia.org/r/810323 (owner: Muehlenhoff)
[06:47:19] <wikibugs>	 (CR) Marostegui: [C: +2] mariadb: Decommission db2092 [puppet] - https://gerrit.wikimedia.org/r/810819 (https://phabricator.wikimedia.org/T311802) (owner: Marostegui)
[06:47:44] <logmsgbot>	 !log marostegui@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[06:47:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:47:55] <icinga-wm>	 PROBLEM - Check systemd state on logstash2026 is CRITICAL: CRITICAL - degraded: The following units failed: curator_actions_cluster_wide.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[06:49:17] <jinxer-wm>	 (CirrusSearchHighOldGCFrequency) firing: (2) Elasticsearch instance cloudelastic1005-cloudelastic-chi-eqiad is running the old gc excessively - https://wikitech.wikimedia.org/wiki/Search#Stuck_in_old_GC_hell - https://grafana.wikimedia.org/d/000000462/elasticsearch-memory - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchHighOldGCFrequency
[06:49:41] <logmsgbot>	 !log marostegui@cumin2002 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2092.codfw.wmnet
[06:49:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:49:48] <wikibugs>	 ops-codfw, decommission-hardware: decommission db2092 - https://phabricator.wikimedia.org/T311802 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by marostegui@cumin2002 for hosts: `db2092.codfw.wmnet` - db2092.codfw.wmnet (**PASS**)   - Downtimed host on Icinga/Alertmanager   - Found phys...
[06:49:54] <wikibugs>	 ops-codfw, decommission-hardware: decommission db2092 - https://phabricator.wikimedia.org/T311802 (Marostegui) @Papaul this is all yours!
[06:50:01] <wikibugs>	 ops-codfw, decommission-hardware: decommission db2092 - https://phabricator.wikimedia.org/T311802 (Marostegui) a:Papaul
[06:51:17] <wikibugs>	 (PS5) Giuseppe Lavagetto: mediawiki: add scap restarts script [puppet] - https://gerrit.wikimedia.org/r/810027
[06:51:19] <wikibugs>	 (PS4) Giuseppe Lavagetto: scap: use the new script to restart php-fpm [puppet] - https://gerrit.wikimedia.org/r/810031
[06:51:21] <wikibugs>	 (PS3) Giuseppe Lavagetto: scap: drop unused parameters from the configuration [puppet] - https://gerrit.wikimedia.org/r/810048
[06:53:01] <icinga-wm>	 PROBLEM - SSH on wtp1036.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[06:54:17] <jinxer-wm>	 (CirrusSearchHighOldGCFrequency) firing: (2) Elasticsearch instance cloudelastic1005-cloudelastic-chi-eqiad is running the old gc excessively - https://wikitech.wikimedia.org/wiki/Search#Stuck_in_old_GC_hell - https://grafana.wikimedia.org/d/000000462/elasticsearch-memory - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchHighOldGCFrequency
[06:55:23] <wikibugs>	 (PS1) Muehlenhoff: Remove access for mewoph [puppet] - https://gerrit.wikimedia.org/r/810822
[06:56:39] <wikibugs>	 (CR) Filippo Giunchedi: "LGTM overall, see inline" [puppet] - https://gerrit.wikimedia.org/r/810106 (owner: Andrea Denisse)
[06:57:39] <wikibugs>	 (CR) Muehlenhoff: [C: +2] Remove access for mewoph [puppet] - https://gerrit.wikimedia.org/r/810822 (owner: Muehlenhoff)
[07:00:04] <jouncebot>	 Deploy window No deploys all day! See Deployments/Emergencies if things are broken. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220704T0700)
[07:00:39] <wikibugs>	 (CR) Muehlenhoff: "Sure thing, I'll rebase the patch when https://gerrit.wikimedia.org/r/c/operations/puppet/+/802593 is merged." [puppet] - https://gerrit.wikimedia.org/r/810323 (owner: Muehlenhoff)
[07:02:46] <wikibugs>	 (CR) Muehlenhoff: [C: +2] "Looks good, merging" [puppet] - https://gerrit.wikimedia.org/r/802146 (https://phabricator.wikimedia.org/T286856) (owner: Majavah)
[07:04:24] <wikibugs>	 (CR) Muehlenhoff: "We can skip this, the entire aptrepo config for stretch will be removed at large once all Stretch hosts are gone (2-3 months)." [puppet] - https://gerrit.wikimedia.org/r/810459 (owner: Majavah)
[07:10:10] <wikibugs>	 (CR) Muehlenhoff: [C: +2] snapshot: Remove support for stretch [puppet] - https://gerrit.wikimedia.org/r/810319 (owner: Muehlenhoff)
[07:10:19] <wikibugs>	 (PS6) Giuseppe Lavagetto: mediawiki: add scap restarts script [puppet] - https://gerrit.wikimedia.org/r/810027
[07:10:21] <wikibugs>	 (PS5) Giuseppe Lavagetto: scap: use the new script to restart php-fpm [puppet] - https://gerrit.wikimedia.org/r/810031
[07:10:22] <icinga-wm>	 RECOVERY - SSH on mw1321.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[07:10:23] <wikibugs>	 (PS4) Giuseppe Lavagetto: scap: drop unused parameters from the configuration [puppet] - https://gerrit.wikimedia.org/r/810048
[07:11:57] <wikibugs>	 (PS5) Majavah: kubeadm: drop support for 1.20 [puppet] - https://gerrit.wikimedia.org/r/802143
[07:11:59] <wikibugs>	 (PS5) Majavah: aptrepo: add thirdparty/kubeadm-k8s-1-22 [puppet] - https://gerrit.wikimedia.org/r/802146 (https://phabricator.wikimedia.org/T286856)
[07:12:01] <wikibugs>	 (Abandoned) Majavah: aptrepo: drop kubeadm components from stretch [puppet] - https://gerrit.wikimedia.org/r/810459 (owner: Majavah)
[07:12:16] <wikibugs>	 (PS1) Marostegui: mariadb: Productionize db2157 [puppet] - https://gerrit.wikimedia.org/r/810826 (https://phabricator.wikimedia.org/T311493)
[07:12:27] <wikibugs>	 (PS3) Muehlenhoff: graphite: Remove support for stretch [puppet] - https://gerrit.wikimedia.org/r/810306
[07:14:13] <wikibugs>	 (PS7) Giuseppe Lavagetto: mediawiki: add scap restarts script [puppet] - https://gerrit.wikimedia.org/r/810027
[07:14:15] <wikibugs>	 (PS6) Giuseppe Lavagetto: scap: use the new script to restart php-fpm [puppet] - https://gerrit.wikimedia.org/r/810031
[07:14:17] <wikibugs>	 (PS5) Giuseppe Lavagetto: scap: drop unused parameters from the configuration [puppet] - https://gerrit.wikimedia.org/r/810048
[07:15:12] <wikibugs>	 (CR) Marostegui: [C: +2] mariadb: Productionize db2157 [puppet] - https://gerrit.wikimedia.org/r/810826 (https://phabricator.wikimedia.org/T311493) (owner: Marostegui)
[07:17:07] <wikibugs>	 (PS8) Giuseppe Lavagetto: mediawiki: add scap restarts script [puppet] - https://gerrit.wikimedia.org/r/810027
[07:17:35] <wikibugs>	 (CR) Muehlenhoff: [C: +2] graphite: Remove support for stretch [puppet] - https://gerrit.wikimedia.org/r/810306 (owner: Muehlenhoff)
[07:19:49] <wikibugs>	 (PS9) Giuseppe Lavagetto: mediawiki: add scap restarts script [puppet] - https://gerrit.wikimedia.org/r/810027
[07:20:23] <wikibugs>	 (PS1) Marostegui: mariadb: Productionize db2156 [puppet] - https://gerrit.wikimedia.org/r/810829 (https://phabricator.wikimedia.org/T311493)
[07:20:42] <wikibugs>	 (CR) Muehlenhoff: [C: +2] prometheus::postgres_exporter: Remove support for stretch [puppet] - https://gerrit.wikimedia.org/r/810318 (owner: Muehlenhoff)
[07:21:46] <wikibugs>	 (PS2) Marostegui: mariadb: Productionize db2156 [puppet] - https://gerrit.wikimedia.org/r/810829 (https://phabricator.wikimedia.org/T311493)
[07:22:42] <wikibugs>	 (CR) Marostegui: [C: +2] mariadb: Productionize db2156 [puppet] - https://gerrit.wikimedia.org/r/810829 (https://phabricator.wikimedia.org/T311493) (owner: Marostegui)
[07:25:03] <wikibugs>	 (PS1) Marostegui: site.pp: Remove insetup from db215[6-7] [puppet] - https://gerrit.wikimedia.org/r/810830 (https://phabricator.wikimedia.org/T311493)
[07:28:35] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.reboot-single for host cumin1001.eqiad.wmnet
[07:28:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:30:53] <wikibugs>	 (PS10) Giuseppe Lavagetto: mediawiki: add scap restarts script [puppet] - https://gerrit.wikimedia.org/r/810027
[07:32:01] <wikibugs>	 (CR) Giuseppe Lavagetto: [V: +1] "PCC SUCCESS (DIFF 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/36175/console" [puppet] - https://gerrit.wikimedia.org/r/810027 (owner: Giuseppe Lavagetto)
[07:32:56] <wikibugs>	 SRE, Infrastructure-Foundations, netops: DHCPd: update config to log more info - https://phabricator.wikimedia.org/T309524 (ayounsi)
[07:34:15] <wikibugs>	 (CR) WMDE-Fisch: [C: +1] "Good to go now." [mediawiki-config] - https://gerrit.wikimedia.org/r/808803 (https://phabricator.wikimedia.org/T310684) (owner: Awight)
[07:34:24] <wikibugs>	 (CR) WMDE-Fisch: [C: +1] "Good to go now." [mediawiki-config] - https://gerrit.wikimedia.org/r/804609 (https://phabricator.wikimedia.org/T310684) (owner: Awight)
[07:34:37] <wikibugs>	 (PS2) WMDE-Fisch: Drop dependent feature flags [mediawiki-config] - https://gerrit.wikimedia.org/r/808803 (https://phabricator.wikimedia.org/T310684) (owner: Awight)
[07:35:49] <wikibugs>	 (PS8) WMDE-Fisch: Drop deprecated feature flags [mediawiki-config] - https://gerrit.wikimedia.org/r/804609 (https://phabricator.wikimedia.org/T310684) (owner: Awight)
[07:37:04] <wikibugs>	 (CR) Marostegui: [C: +2] site.pp: Remove insetup from db215[6-7] [puppet] - https://gerrit.wikimedia.org/r/810830 (https://phabricator.wikimedia.org/T311493) (owner: Marostegui)
[07:37:23] <wikibugs>	 SRE, Infrastructure-Foundations: DHCPd: update config to log more info - https://phabricator.wikimedia.org/T309524 (ayounsi)
[07:38:34] <wikibugs>	 (PS1) Muehlenhoff: Remove access for mattcleinman [puppet] - https://gerrit.wikimedia.org/r/810832
[07:38:41] <icinga-wm>	 RECOVERY - MegaRAID on an-worker1082 is OK: OK: optimal, 13 logical, 14 physical, WriteBack policy https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[07:38:56] <wikibugs>	 SRE, Infrastructure-Foundations, Patch-For-Review, cloud-services-team (Kanban): Replace labstore100[67] with clouddumps100[12] - https://phabricator.wikimedia.org/T309346 (ayounsi)
[07:39:35] <logmsgbot>	 !log jmm@cumin2002 END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cumin1001.eqiad.wmnet
[07:39:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:40:21] <icinga-wm>	 PROBLEM - Check systemd state on cumin1001 is CRITICAL: CRITICAL - degraded: The following units failed: httpbb_hourly_appserver.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[07:40:49] <icinga-wm>	 PROBLEM - mailman list info on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[07:43:03] <wikibugs>	 SRE, ops-esams, DC-Ops, Infrastructure-Foundations, and 2 others: decommission atlas-esams - https://phabricator.wikimedia.org/T307026 (ayounsi)
[07:44:17] <wikibugs>	 SRE, ops-esams, DC-Ops, Infrastructure-Foundations, and 2 others: decommission atlas-esams - https://phabricator.wikimedia.org/T307026 (ayounsi) a:ayounsi→RobH Thanks Faidon. I ended up emailing them as the dashboard seems limited. @robh: we can proceed with the decom of that box, followi...
[07:44:25] <wikibugs>	 SRE, ops-esams, DC-Ops, Infrastructure-Foundations, decommission-hardware: decommission atlas-esams - https://phabricator.wikimedia.org/T307026 (ayounsi)
[07:44:54] <wikibugs>	 (CR) Muehlenhoff: [C: +2] Remove access for mattcleinman [puppet] - https://gerrit.wikimedia.org/r/810832 (owner: Muehlenhoff)
[07:45:03] <icinga-wm>	 RECOVERY - Check systemd state on cumin1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[07:45:33] <wikibugs>	 (CR) Muehlenhoff: [C: +2] Raise profile::cumin::monitoring_agentrun::crit [puppet] - https://gerrit.wikimedia.org/r/807497 (owner: Muehlenhoff)
[07:46:56] <wikibugs>	 (CR) Vgutierrez: prometheus: Add custom vm.max_map_count metric (1 comment) [puppet] - https://gerrit.wikimedia.org/r/809038 (https://phabricator.wikimedia.org/T311445) (owner: BCornwall)
[07:47:11] <icinga-wm>	 RECOVERY - mailman list info on lists1001 is OK: HTTP OK: HTTP/1.1 200 OK - 8572 bytes in 3.774 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[07:56:45] <wikibugs>	 (CR) Slyngshede: [V: +1 C: +2] define osm::planet_sync move from cron to systemd timers. [puppet] - https://gerrit.wikimedia.org/r/810304 (https://phabricator.wikimedia.org/T273673) (owner: Slyngshede)
[07:57:21] <wikibugs>	 (CR) Slyngshede: [V: +1] define osm::planet_sync move from cron to systemd timers. [puppet] - https://gerrit.wikimedia.org/r/810304 (https://phabricator.wikimedia.org/T273673) (owner: Slyngshede)
[07:58:24] <wikibugs>	 (CR) Slyngshede: [C: +2] P:aptrepo::wikimedia move private repo to nginx and uninstall apache [puppet] - https://gerrit.wikimedia.org/r/809969 (owner: Slyngshede)
[08:02:34] <wikibugs>	 SRE, Infrastructure-Foundations, netbox, netops: Netbox Juniper report - https://phabricator.wikimedia.org/T306238 (ayounsi) @jbond, I was wondering if the upgrade went as expected. And if there was a timeline for OIDC. No urgency at all, even if it's 1, 6 or more months, just doing some planning :)
[08:04:12] <elukey>	 !log kill leftover processes of user `mewoph` on stat100x to allow puppet runs
[08:04:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:06:36] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.idm.logout Logging MewOphaswongse out of all services on: 1299 hosts
[08:06:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:07:07] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging MewOphaswongse out of all services on: 1299 hosts
[08:07:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:07:21] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.idm.logout Logging MewOphaswongse out of all services on: 634 hosts
[08:07:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:07:37] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging MewOphaswongse out of all services on: 634 hosts
[08:07:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:10:00] <wikibugs>	 SRE, Infrastructure-Foundations, Traffic, netops: drmrs: primary software task - https://phabricator.wikimedia.org/T282788 (ayounsi)
[08:10:26] <icinga-wm>	 PROBLEM - MegaRAID on an-worker1082 is CRITICAL: CRITICAL: 13 LD(s) must have write cache policy WriteBack, currently using: WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[08:11:03] <wikibugs>	 SRE, Infrastructure-Foundations, Traffic, netops, Patch-For-Review: drmrs: initial geodns configuration - https://phabricator.wikimedia.org/T304089 (ayounsi) Open→Resolved a:ayounsi I think everything here is done, and follow up is in {T311472} Feel free to re-open if needed.
[08:16:19] <wikibugs>	 (PS1) Marostegui: instances.yaml: Add db2157 to dbctl [puppet] - https://gerrit.wikimedia.org/r/810838 (https://phabricator.wikimedia.org/T311493)
[08:17:26] <wikibugs>	 (CR) Volans: [C: +2] sre.ganeti.makevm: Remove duplicated space [cookbooks] - https://gerrit.wikimedia.org/r/810338 (owner: David Caro)
[08:17:35] <wikibugs>	 (CR) Volans: [C: +2] "Thanks for the fix" [cookbooks] - https://gerrit.wikimedia.org/r/810338 (owner: David Caro)
[08:18:05] <wikibugs>	 (CR) Marostegui: [C: +2] instances.yaml: Add db2157 to dbctl [puppet] - https://gerrit.wikimedia.org/r/810838 (https://phabricator.wikimedia.org/T311493) (owner: Marostegui)
[08:18:07] <wikibugs>	 SRE, Infrastructure-Foundations, netops: Validate new (anycast) IPv6 /48 announcement being accepted by transits - https://phabricator.wikimedia.org/T301900 (ayounsi) @cmooney I had a quick look at NTT/Telia/Lumen looking glass and the prefix seems to be accepted properly.  Longer term, this could ma...
[08:18:43] <wikibugs>	 SRE, Infrastructure-Foundations, netbox, netops: Improve Netbox import script to avoid port-number collisions in JunOS - https://phabricator.wikimedia.org/T301392 (ayounsi)
[08:21:09] <wikibugs>	 (Merged) jenkins-bot: sre.ganeti.makevm: Remove duplicated space [cookbooks] - https://gerrit.wikimedia.org/r/810338 (owner: David Caro)
[08:22:07] <wikibugs>	 (PS11) Giuseppe Lavagetto: mediawiki: add scap restarts script [puppet] - https://gerrit.wikimedia.org/r/810027
[08:23:36] <wikibugs>	 (CR) Giuseppe Lavagetto: [V: +1] "PCC SUCCESS (DIFF 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/36176/console" [puppet] - https://gerrit.wikimedia.org/r/810027 (owner: Giuseppe Lavagetto)
[08:23:44] <wikibugs>	 (CR) Elukey: [C: +1] uwsgi: Remove support for stretch [puppet] - https://gerrit.wikimedia.org/r/810321 (owner: Muehlenhoff)
[08:24:05] <wikibugs>	 SRE, Icinga, Observability-Alerting: PROBLEM: Icinga on alert2001.wikimedia.org is CRITICAL - https://phabricator.wikimedia.org/T311926 (Volans) The alert has fired 22 times over the weekend. AFAICT all of them around the :34 minute mark, that seems suspiciously close to the start time of the timer f...
[08:24:07] <logmsgbot>	 !log marostegui@cumin2002 dbctl commit (dc=all): 'Add db2157 to s5 T311493', diff saved to https://phabricator.wikimedia.org/P30758 and previous config saved to /var/cache/conftool/dbconfig/20220704-082406-marostegui.json
[08:24:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:24:12] <stashbot>	 T311493: Productionize db2153.codfw.wmnet - db2174.codfw.wmnet - https://phabricator.wikimedia.org/T311493
[08:24:33] <wikibugs>	 SRE, Infrastructure-Foundations, netbox, netops: Represent sub-interface and bridge device assocations in Netbox - https://phabricator.wikimedia.org/T296832 (ayounsi)
[08:26:01] <wikibugs>	 (PS12) Giuseppe Lavagetto: mediawiki: add scap restarts script [puppet] - https://gerrit.wikimedia.org/r/810027
[08:31:34] <wikibugs>	 SRE-tools, Infrastructure-Foundations, netops, serviceops: Support services VIPs with not marked as VIP in Netbox - https://phabricator.wikimedia.org/T295793 (ayounsi) Pinging @jelto for gitlab (not sure if the issue is still present or relevant) and @akosiaris for lists1001 (I can confirm the mi...
[08:32:10] <wikibugs>	 SRE, Icinga, Observability-Alerting: PROBLEM: Icinga on alert2001.wikimedia.org is CRITICAL - https://phabricator.wikimedia.org/T311926 (jcrespo) Thanks, @Volans, to me that would indicate a problem in the way the alert is setup/coordinated, and less of the infrastructure itself (even if latency is a...
[08:32:59] <wikibugs>	 (CR) Giuseppe Lavagetto: [V: +1] "PCC SUCCESS (DIFF 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/36177/console" [puppet] - https://gerrit.wikimedia.org/r/810027 (owner: Giuseppe Lavagetto)
[08:33:46] <wikibugs>	 (CR) Giuseppe Lavagetto: [V: +1 C: +2] mediawiki: add scap restarts script [puppet] - https://gerrit.wikimedia.org/r/810027 (owner: Giuseppe Lavagetto)
[08:34:03] <wikibugs>	 (PS13) Giuseppe Lavagetto: mediawiki: add scap restarts script [puppet] - https://gerrit.wikimedia.org/r/810027
[08:34:32] <icinga-wm>	 PROBLEM - k8s API server requests latencies on kubestagemaster1001 is CRITICAL: instance=10.64.16.203 verb={CREATE,UPDATE} https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/d/000000435/kubernetes-api?orgId=1&viewPanel=27
[08:35:40] <wikibugs>	 (PS1) Elukey: profile::thanos::swift: add a read only account for ml-serve [puppet] - https://gerrit.wikimedia.org/r/810840 (https://phabricator.wikimedia.org/T311628)
[08:38:34] <wikibugs>	 SRE-tools, DC-Ops, Infrastructure-Foundations: sre.hosts.reimage: wait reboot time timeout on aqs nodes - https://phabricator.wikimedia.org/T307260 (Volans) Open→Resolved This should be resolved. Feel free to reopen it in case it's not.
[08:45:10] <wikibugs>	 (CR) Slavina Stefanova: "out of curiosity, why run black and isort via bash scripts instead of using pre-commit?" [cookbooks] (wmcs) - https://gerrit.wikimedia.org/r/810295 (owner: David Caro)
[08:48:05] <wikibugs>	 SRE, Infrastructure-Foundations, netops: Packet Drops on Eqiad ASW -> CR uplinks - https://phabricator.wikimedia.org/T291627 (ayounsi) With {T304712} will give us the possibility to move to 40G uplinks (instead of 4x10G) for some rows: C as of now, and D once {T308331} is done. This could be a good t...
[08:50:13] <icinga-wm>	 ACKNOWLEDGEMENT - MegaRAID on an-worker1082 is CRITICAL: CRITICAL: 13 LD(s) must have write cache policy WriteBack, currently using: WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough Btullis Investigating in T311991 https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[08:52:01] <wikibugs>	 (03CR) 10David Caro: Add mypy, black and isort tests (031 comment) [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/810295 (owner: 10David Caro)
[08:53:19] <wikibugs>	 10SRE, 10SRE-tools, 10DC-Ops, 10Infrastructure-Foundations, 10Patch-For-Review: Allow idrac ftp fetching of firmware updates (either to existing ftp or new solution) - https://phabricator.wikimedia.org/T283771 (10ayounsi)
[08:53:23] <wikibugs>	 (03PS1) 10Elukey: Upgrade kserve images to upstream release 0.8 [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/810841 (https://phabricator.wikimedia.org/T311982)
[08:53:26] <wikibugs>	 10SRE-tools, 10Infrastructure-Foundations, 10Observability-Alerting, 10SRE Observability (FY2022/2023-Q1), 10User-fgiunchedi: Spicerack: add support for Alertmanager - https://phabricator.wikimedia.org/T293209 (10Volans) >>! In T293209#8043558, @fgiunchedi wrote: > I don't know offhand how to best achiev...
[08:53:32] <wikibugs>	 (03PS3) 10Majavah: keyholder::monitoring: drop absented resources [puppet] - 10https://gerrit.wikimedia.org/r/810041
[08:53:41] <wikibugs>	 (03CR) 10Lucas Werkmeister (WMDE): [C: 03+1] Drop dependent feature flags (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/808803 (https://phabricator.wikimedia.org/T310684) (owner: 10Awight)
[08:55:04] <wikibugs>	 (03CR) 10Slavina Stefanova: alerts: add a default duration of 1h (031 comment) [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/810367 (owner: 10David Caro)
[08:56:59] <wikibugs>	 (03CR) 10David Caro: Add mypy, black and isort tests (031 comment) [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/810295 (owner: 10David Caro)
[08:59:51] <wikibugs>	 (03PS7) 10David Caro: cloudnet: add show, reboot_node and roll_reboot_cloudnets [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/810368
[09:00:29] <wikibugs>	 (03PS4) 10Jcrespo: bacula::director: Remove support for stretch [puppet] - 10https://gerrit.wikimedia.org/r/810325 (owner: 10Muehlenhoff)
[09:02:28] <wikibugs>	 (03CR) 10Slavina Stefanova: Add mypy, black and isort tests (031 comment) [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/810295 (owner: 10David Caro)
[09:04:07] <wikibugs>	 (03CR) 10Slavina Stefanova: Add mypy, black and isort tests (031 comment) [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/810295 (owner: 10David Caro)
[09:04:52] <wikibugs>	 (03CR) 10Jcrespo: [C: 03+2] bacula::director: Remove support for stretch [puppet] - 10https://gerrit.wikimedia.org/r/810325 (owner: 10Muehlenhoff)
[09:05:04] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] cloudnet: add show, reboot_node and roll_reboot_cloudnets [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/810368 (owner: 10David Caro)
[09:29:48] <wikibugs>	 (03CR) 10Slavina Stefanova: [C: 03+1] Add mypy, black and isort tests [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/810295 (owner: 10David Caro)
[09:30:26] <wikibugs>	 (03PS1) 10Giuseppe Lavagetto: conftool::scripts: fix restart of multiple services [puppet] - 10https://gerrit.wikimedia.org/r/810850
[09:31:06] <wikibugs>	 (03PS2) 10Muehlenhoff: profile::mariadb::packages_client: Remove support for stretch [puppet] - 10https://gerrit.wikimedia.org/r/810847
[09:34:15] <wikibugs>	 (03Merged) 10jenkins-bot: Use our own alert managing [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/805108 (https://phabricator.wikimedia.org/T309789) (owner: 10David Caro)
[09:34:16] <wikibugs>	 (03Merged) 10jenkins-bot: wmcs: added vm_console runbook [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/805316 (https://phabricator.wikimedia.org/T309930) (owner: 10David Caro)
[09:34:16] <wikibugs>	 (03Merged) 10jenkins-bot: wmcs.ceph: don't use sre upgrade-and-reboot [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/805327 (https://phabricator.wikimedia.org/T309786) (owner: 10David Caro)
[09:34:18] <wikibugs>	 (03Merged) 10jenkins-bot: wmcs: move alerting code to a library [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/805376 (https://phabricator.wikimedia.org/T309786) (owner: 10David Caro)
[09:34:20] <wikibugs>	 (03Merged) 10jenkins-bot: wmcs.ceph.upgrade*: add sal logs [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/805377 (https://phabricator.wikimedia.org/T309786) (owner: 10David Caro)
[09:34:22] <wikibugs>	 (03Merged) 10jenkins-bot: wmcs.ceph: move core code to a library [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/805741 (https://phabricator.wikimedia.org/T309786) (owner: 10David Caro)
[09:34:24] <wikibugs>	 (03Merged) 10jenkins-bot: wmcs.alert/ceph: allow downtiming alerts [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/805742 (owner: 10David Caro)
[09:34:26] <wikibugs>	 (03Merged) 10jenkins-bot: wmcs.openstack: Add runbook to increase the quotas [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/806429 (https://phabricator.wikimedia.org/T297606) (owner: 10David Caro)
[09:34:30] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] profile::mariadb::packages_client: Remove support for stretch [puppet] - 10https://gerrit.wikimedia.org/r/810847 (owner: 10Muehlenhoff)
[09:36:22] <wikibugs>	 (03PS3) 10Muehlenhoff: profile::mariadb::packages_client: Remove support for stretch [puppet] - 10https://gerrit.wikimedia.org/r/810847
[09:42:20] <wikibugs>	 (03CR) 10Muehlenhoff: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/810847 (owner: 10Muehlenhoff)
[09:47:04] <wikibugs>	 (03CR) 10jenkins-bot: profile::mariadb::packages_client: Remove support for stretch [puppet] - 10https://gerrit.wikimedia.org/r/810847 (owner: 10Muehlenhoff)
[09:50:27] <wikibugs>	 (03CR) 10David Caro: alerts: add a default duration of 1h (031 comment) [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/810367 (owner: 10David Caro)
[09:53:46] <wikibugs>	 (03CR) 10LSobanski: [C: 03+1] typos: add "vtrs" [puppet] - 10https://gerrit.wikimedia.org/r/810403 (owner: 10Dzahn)
[09:56:46] <icinga-wm>	 RECOVERY - SSH on wtp1036.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[09:57:29] <wikibugs>	 (03PS3) 10David Caro: wmcs.openstack: move libs to it's own module [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/809543
[09:57:31] <wikibugs>	 (03PS2) 10David Caro: alerts: add a default duration of 1h [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/810367
[09:57:33] <wikibugs>	 (03CR) 10David Caro: alerts: add a default duration of 1h (031 comment) [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/810367 (owner: 10David Caro)
[09:57:35] <wikibugs>	 (03PS2) 10David Caro: wmcs.lib.openstack: move to a directory module [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/810451
[09:57:37] <wikibugs>	 (03PS8) 10David Caro: cloudnet: add show, reboot_node and roll_reboot_cloudnets [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/810368
[09:57:39] <wikibugs>	 (03PS1) 10David Caro: openstack: move known nodes to the openstack lib [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/810854
[09:59:14] <wikibugs>	 (03CR) 10Slavina Stefanova: [C: 03+1] alerts: add a default duration of 1h [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/810367 (owner: 10David Caro)
[10:12:02] <wikibugs>	 (03CR) 10Urbanecm: [C: 03+2] "beta-only, no-op for prod" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/810452 (https://phabricator.wikimedia.org/T310905) (owner: 10Urbanecm)
[10:13:01] <Amir1>	 jouncebot: nowandnext
[10:13:01] <jouncebot>	 For the next 20 hour(s) and 46 minute(s): No deploys all day! See Deployments/Emergencies if things are broken. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220704T0700)
[10:13:01] <jouncebot>	 In 20 hour(s) and 46 minute(s): UTC morning backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220705T0700)
[10:13:23] <Amir1>	 sigh
[10:13:35] <Amir1>	 I'm going to backport the meta rc fix regardless
[10:13:43] <wikibugs>	 (03Merged) 10jenkins-bot: [beta] Temporarily allow everyone to enroll as mentor [mediawiki-config] - 10https://gerrit.wikimedia.org/r/810452 (https://phabricator.wikimedia.org/T310905) (owner: 10Urbanecm)
[10:13:51] <wikibugs>	 (03CR) 10Ladsgroup: [C: 03+2] Revert "Revert "RecentChange: Straight join to actor table when needed"" [core] (wmf/1.39.0-wmf.18) - 10https://gerrit.wikimedia.org/r/810139 (https://phabricator.wikimedia.org/T311360) (owner: 10Zabe)
[10:13:57] <wikibugs>	 (03CR) 10Ladsgroup: [C: 03+2] RecentChange: Make join to comment table also straight [core] (wmf/1.39.0-wmf.18) - 10https://gerrit.wikimedia.org/r/810138 (https://phabricator.wikimedia.org/T311360) (owner: 10Zabe)
[10:14:16] <wikibugs>	 (03CR) 10Urbanecm: [C: 03+2] "beta-only, no-op for prod" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/808268 (https://phabricator.wikimedia.org/T310905) (owner: 10Urbanecm)
[10:14:56] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 03+1] "LGTM, note that you'll have to set permissions/ACL on the containers using the admin account" [puppet] - 10https://gerrit.wikimedia.org/r/810840 (https://phabricator.wikimedia.org/T311628) (owner: 10Elukey)
[10:15:35] <wikibugs>	 (03Merged) 10jenkins-bot: [beta] Growth: Enable structured mentor list at enwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/808268 (https://phabricator.wikimedia.org/T310905) (owner: 10Urbanecm)
[10:15:50] * urbanecm done
[10:15:53] <wikibugs>	 (03PS2) 10Hnowlan: similar-users: make max queries per account configurable [deployment-charts] - 10https://gerrit.wikimedia.org/r/808923 (https://phabricator.wikimedia.org/T310646)
[10:16:20] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 03+2] keyholder::monitoring: drop absented resources [puppet] - 10https://gerrit.wikimedia.org/r/810041 (owner: 10Majavah)
[10:16:22] <wikibugs>	 (03CR) 10Hnowlan: similar-users: make max queries per account configurable (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/808923 (https://phabricator.wikimedia.org/T310646) (owner: 10Hnowlan)
[10:17:05] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[10:17:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:17:18] <_joe_>	 !log upgraded etcdmirror to 0.0.7 on conf2006, now going with the rest of codfw
[10:17:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:18:00] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[10:18:01] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[10:18:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:18:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:18:53] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[10:18:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:21:30] <moritzm>	 !log installing gnupg2 security updates
[10:21:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:21:35] <_joe_>	 !log restarting etcdmirror on conf2005
[10:21:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:23:55] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[10:23:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:24:48] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[10:24:49] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[10:24:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:24:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:25:25] <godog>	 !log silence etcd p a g e 
[10:25:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:25:48] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[10:25:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:25:52] <_joe_>	 !log rollback etcdmirror to 0.0.6 on conf2005
[10:25:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:27:04] <icinga-wm>	 RECOVERY - k8s API server requests latencies on kubemaster1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/d/000000435/kubernetes-api?orgId=1&viewPanel=27
[10:30:53] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[10:30:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:34:18] <wikibugs>	 (03Merged) 10jenkins-bot: Revert "Revert "RecentChange: Straight join to actor table when needed"" [core] (wmf/1.39.0-wmf.18) - 10https://gerrit.wikimedia.org/r/810139 (https://phabricator.wikimedia.org/T311360) (owner: 10Zabe)
[10:34:24] <wikibugs>	 (03Merged) 10jenkins-bot: RecentChange: Make join to comment table also straight [core] (wmf/1.39.0-wmf.18) - 10https://gerrit.wikimedia.org/r/810138 (https://phabricator.wikimedia.org/T311360) (owner: 10Zabe)
[10:35:03] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[10:35:05] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[10:35:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:35:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:39:05] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[10:39:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:39:17] <jinxer-wm>	 (CirrusSearchHighOldGCFrequency) firing: (2) Elasticsearch instance cloudelastic1005-cloudelastic-chi-eqiad is running the old gc excessively - https://wikitech.wikimedia.org/wiki/Search#Stuck_in_old_GC_hell - https://grafana.wikimedia.org/d/000000462/elasticsearch-memory - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchHighOldGCFrequency
[10:41:49] <wikibugs>	 (03CR) 10Jaime Nuche: "Thanks for the fix Daniel!" [puppet] - 10https://gerrit.wikimedia.org/r/808061 (https://phabricator.wikimedia.org/T310740) (owner: 10Dzahn)
[10:44:09] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[10:44:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:44:17] <jinxer-wm>	 (CirrusSearchHighOldGCFrequency) firing: (2) Elasticsearch instance cloudelastic1005-cloudelastic-chi-eqiad is running the old gc excessively - https://wikitech.wikimedia.org/wiki/Search#Stuck_in_old_GC_hell - https://grafana.wikimedia.org/d/000000462/elasticsearch-memory - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchHighOldGCFrequency
[10:48:21] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[10:48:22] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[10:48:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:48:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:51:55] <wikibugs>	 (03PS1) 10Ladsgroup: Add statsd metric collection on db calls [extensions/GlobalBlocking] (wmf/1.39.0-wmf.18) - 10https://gerrit.wikimedia.org/r/810518 (https://phabricator.wikimedia.org/T307648)
[10:52:03] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[10:52:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:52:19] <kostajh>	 Amir1: if you're doing backports, can we deploy a GrowthExperiments patch that fixes a bad breakage?
[10:52:30] <Amir1>	 kostajh: I accept bribe
[10:52:42] <kostajh>	 hehe
[10:52:50] <Amir1>	 (I can do the backport, can you test it?)
[10:52:56] <kostajh>	 it's pretty minor https://gerrit.wikimedia.org/r/c/mediawiki/extensions/GrowthExperiments/+/810509. And it's mitigated in that we have switched off the feature
[10:53:23] <Amir1>	 sure, let's do it
[10:53:23] <kostajh>	 so it could wait another ~20 hours or whatever
[10:53:27] <kostajh>	 alright
[10:53:33] <kostajh>	 I can test it, yes
[10:53:40] <wikibugs>	 (03CR) 10Giuseppe Lavagetto: [C: 03+2] conftool::scripts: fix restart of multiple services [puppet] - 10https://gerrit.wikimedia.org/r/810850 (owner: 10Giuseppe Lavagetto)
[10:54:55] <logmsgbot>	 !log ladsgroup@deploy1002 Synchronized php-1.39.0-wmf.18/includes: Backport: [[gerrit:810139|Revert "Revert "RecentChange: Straight join to actor table when needed"" (T311360)]] (duration: 03m 49s)
[10:54:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:54:58] <stashbot>	 T311360: RecentChanges timing out - https://phabricator.wikimedia.org/T311360
[10:55:57] <wikibugs>	 (03PS1) 10Filippo Giunchedi: rest: fix getLag typo and add test [software/etcd-mirror] - 10https://gerrit.wikimedia.org/r/810864 (https://phabricator.wikimedia.org/T309546)
[10:58:22] <wikibugs>	 (03CR) 10Vlad.shapik: "recheck" [software/thumbor-plugins] - 10https://gerrit.wikimedia.org/r/800170 (https://phabricator.wikimedia.org/T252719) (owner: 10Vlad.shapik)
[10:59:00] <kostajh>	 urbanecm: maybe you are also around to help verify it on cswiki?
[10:59:05] <kostajh>	 Amir1: ready when you are :)
[10:59:15] <urbanecm>	 kostajh: sure thing
[10:59:21] <Amir1>	 I am ready, let me know when it's merged
[10:59:32] <wikibugs>	 (03CR) 10Ladsgroup: [C: 03+2] AddImageArticleTarget: Update to new mediaClass/mediaTag format [extensions/GrowthExperiments] (wmf/1.39.0-wmf.18) - 10https://gerrit.wikimedia.org/r/810509 (https://phabricator.wikimedia.org/T311916) (owner: 10Urbanecm)
[11:01:01] <wikibugs>	 (03CR) 10Giuseppe Lavagetto: [C: 03+2] rest: fix getLag typo and add test [software/etcd-mirror] - 10https://gerrit.wikimedia.org/r/810864 (https://phabricator.wikimedia.org/T309546) (owner: 10Filippo Giunchedi)
[11:02:50] <wikibugs>	 (03Merged) 10jenkins-bot: rest: fix getLag typo and add test [software/etcd-mirror] - 10https://gerrit.wikimedia.org/r/810864 (https://phabricator.wikimedia.org/T309546) (owner: 10Filippo Giunchedi)
[11:06:29] <wikibugs>	 10SRE, 10Infrastructure-Foundations, 10CAS-SSO: Enable OIDC - https://phabricator.wikimedia.org/T311999 (10MoritzMuehlenhoff)
[11:06:45] <wikibugs>	 10SRE, 10Infrastructure-Foundations, 10CAS-SSO: Enable OIDC in CAS - https://phabricator.wikimedia.org/T311999 (10MoritzMuehlenhoff) p:05Triage→03Medium
[11:08:25] <wikibugs>	 (03PS1) 10Muehlenhoff: Enable OIDC in Gradle build [software/cas-overlay-template] - 10https://gerrit.wikimedia.org/r/810867 (https://phabricator.wikimedia.org/T311999)
[11:08:32] <wikibugs>	 10SRE, 10Infrastructure-Foundations, 10netbox, 10netops: Netbox Juniper report - https://phabricator.wikimedia.org/T306238 (10MoritzMuehlenhoff) >>! In T306238#8047863, @ayounsi wrote: > @jbond, I was wondering if the upgrade went as expected. And if there was a timeline for OIDC. > No urgency at all, even...
[11:10:35] <wikibugs>	 (03CR) 10Hashar: "recheck after adding libexiv2-dev to the CI image ( https://gerrit.wikimedia.org/r/c/integration/config/+/810866 )" [software/thumbor-plugins] - 10https://gerrit.wikimedia.org/r/800170 (https://phabricator.wikimedia.org/T252719) (owner: 10Vlad.shapik)
[11:11:24] <icinga-wm>	 PROBLEM - k8s API server requests latencies on kubemaster2002 is CRITICAL: instance=10.192.16.48 verb=POST https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/d/000000435/kubernetes-api?orgId=1&viewPanel=27
[11:20:01] <wikibugs>	 (03PS1) 10Marostegui: instances.yaml: Add db2156 to dbctl [puppet] - 10https://gerrit.wikimedia.org/r/810874 (https://phabricator.wikimedia.org/T311493)
[11:21:06] <wikibugs>	 (03CR) 10Marostegui: [C: 03+2] instances.yaml: Add db2156 to dbctl [puppet] - 10https://gerrit.wikimedia.org/r/810874 (https://phabricator.wikimedia.org/T311493) (owner: 10Marostegui)
[11:22:45] <wikibugs>	 (03Merged) 10jenkins-bot: AddImageArticleTarget: Update to new mediaClass/mediaTag format [extensions/GrowthExperiments] (wmf/1.39.0-wmf.18) - 10https://gerrit.wikimedia.org/r/810509 (https://phabricator.wikimedia.org/T311916) (owner: 10Urbanecm)
[11:22:53] <urbanecm>	 Amir1: it's merged now
[11:23:54] <Amir1>	 will do it soon
[11:24:46] <kostajh>	 urbanecm: for testing, I guess we re-enable the tasktype on e.g. cswiki, then switch to mwdebug and verify an edit? Or do you think verifying an edit on testwiki is enough, then we can re-enable the task type on the wikis?
[11:25:30] <urbanecm>	 kostajh: I'd test at testwiki via mwdebug, if it works, re-enable at cswiki, test there too (still at mwdebug) and if it works, sync (and re-enable everywhere)
[11:26:24] <kostajh>	 ok
[11:27:26] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[11:27:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:31:29] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[11:31:30] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[11:31:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:31:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:32:17] <wikibugs>	 (03CR) 10Hashar: "recheck after adding libboost-python-dev to the CI image https://gerrit.wikimedia.org/r/810871" [software/thumbor-plugins] - 10https://gerrit.wikimedia.org/r/800170 (https://phabricator.wikimedia.org/T252719) (owner: 10Vlad.shapik)
[11:33:54] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[11:33:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:34:53] <wikibugs>	 (03CR) 10Marostegui: [C: 03+1] profile::mariadb::packages_wmf: Remove support for stretch [puppet] - 10https://gerrit.wikimedia.org/r/810846 (owner: 10Muehlenhoff)
[11:35:03] <wikibugs>	 (03CR) 10Marostegui: [C: 03+1] profile::mariadb::packages_client: Remove support for stretch [puppet] - 10https://gerrit.wikimedia.org/r/810847 (owner: 10Muehlenhoff)
[11:36:41] <logmsgbot>	 !log marostegui@cumin2002 dbctl commit (dc=all): 'Add db2156 to s3 T311493', diff saved to https://phabricator.wikimedia.org/P30774 and previous config saved to /var/cache/conftool/dbconfig/20220704-113640-marostegui.json
[11:36:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:36:44] <stashbot>	 T311493: Productionize db2153.codfw.wmnet - db2174.codfw.wmnet - https://phabricator.wikimedia.org/T311493
[11:39:54] <wikibugs>	 (03PS1) 10Matthias Mullie: Retrieve pages-with-suggestion via Elastic scroll directly [extensions/ImageSuggestions] (wmf/1.39.0-wmf.18) - 10https://gerrit.wikimedia.org/r/810889 (https://phabricator.wikimedia.org/T311476)
[11:40:57] <wikibugs>	 (03PS1) 10Marostegui: mariadb: Promote db1173 to s6 master [puppet] - 10https://gerrit.wikimedia.org/r/810878 (https://phabricator.wikimedia.org/T311522)
[11:41:24] <wikibugs>	 (03CR) 10Marostegui: [C: 04-2] "Wait for the failover day" [puppet] - 10https://gerrit.wikimedia.org/r/810878 (https://phabricator.wikimedia.org/T311522) (owner: 10Marostegui)
[11:41:45] <Amir1>	 urbanecm: kostajh live in mwdebug1002
[11:41:54] <urbanecm>	 looking
[11:42:11] <wikibugs>	 (03PS1) 10Marostegui: wmnet: Update s6-master CNAME [dns] - 10https://gerrit.wikimedia.org/r/810881 (https://phabricator.wikimedia.org/T311522)
[11:42:23] <wikibugs>	 (03CR) 10Marostegui: "Wait for the failover day" [dns] - 10https://gerrit.wikimedia.org/r/810881 (https://phabricator.wikimedia.org/T311522) (owner: 10Marostegui)
[11:43:14] <urbanecm>	 https://test.wikipedia.org/w/index.php?title=Brudalen&diff=516168&oldid=485928: looks like an image was added. testing at cswiki now.
[11:43:22] <wikibugs>	 (03CR) 10Ladsgroup: [C: 03+1] mariadb: Promote db1173 to s6 master [puppet] - 10https://gerrit.wikimedia.org/r/810878 (https://phabricator.wikimedia.org/T311522) (owner: 10Marostegui)
[11:43:31] <wikibugs>	 (03CR) 10Ladsgroup: [C: 03+1] wmnet: Update s6-master CNAME [dns] - 10https://gerrit.wikimedia.org/r/810881 (https://phabricator.wikimedia.org/T311522) (owner: 10Marostegui)
[11:43:40] <kostajh>	 urbanecm: ack, testwiki looks good for my edit as well
[11:44:39] <wikibugs>	 (03CR) 10Ladsgroup: [C: 03+2] Add statsd metric collection on db calls [extensions/GlobalBlocking] (wmf/1.39.0-wmf.18) - 10https://gerrit.wikimedia.org/r/810518 (https://phabricator.wikimedia.org/T307648) (owner: 10Ladsgroup)
[11:45:40] <urbanecm>	 cswiki works too: https://cs.wikipedia.org/w/index.php?title=Politick%C3%A1_a_pr%C3%A1vn%C3%AD_komise_%C3%BAst%C5%99edn%C3%ADho_v%C3%BDboru_Komunistick%C3%A9_strany_%C4%8C%C3%ADny&diff=21439374&oldid=19920770
[11:46:02] <kostajh>	 urbanecm: lgtm
[11:46:11] <urbanecm>	 Amir1: let's sync please!
[11:46:18] <Amir1>	 awesome
[11:48:08] <wikibugs>	 (03Merged) 10jenkins-bot: Add statsd metric collection on db calls [extensions/GlobalBlocking] (wmf/1.39.0-wmf.18) - 10https://gerrit.wikimedia.org/r/810518 (https://phabricator.wikimedia.org/T307648) (owner: 10Ladsgroup)
[11:50:00] <wikibugs>	 (03PS1) 10Jcrespo: Add new user for dbbackups database for django dashboard [puppet] - 10https://gerrit.wikimedia.org/r/810885 (https://phabricator.wikimedia.org/T283017)
[11:50:03] <logmsgbot>	 !log ladsgroup@deploy1002 Synchronized php-1.39.0-wmf.18/extensions/GrowthExperiments/modules/ext.growthExperiments.StructuredTask/addimage/AddImageArticleTarget.js: Backport: [[gerrit:810509|AddImageArticleTarget: Update to new mediaClass/mediaTag format (T311916)]] (duration: 03m 33s)
[11:50:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:50:08] <stashbot>	 T311916: "Add an image" structured edits add a blank line instead of an image - https://phabricator.wikimedia.org/T311916
[11:54:11] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[11:54:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:54:53] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[11:54:54] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[11:54:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:54:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:55:22] <logmsgbot>	 !log ladsgroup@deploy1002 Synchronized php-1.39.0-wmf.18/extensions/GlobalBlocking/includes/GlobalBlocking.php: Backport: [[gerrit:810518|Add statsd metric collection on db calls (T307648)]] (duration: 03m 26s)
[11:55:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:55:25] <stashbot>	 T307648: Audit database usage of GlobalBlocking extension - https://phabricator.wikimedia.org/T307648
[11:55:40] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[11:55:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:56:57] <wikibugs>	 (03PS2) 10Jcrespo: Add new user for dbbackups database for django dashboard [puppet] - 10https://gerrit.wikimedia.org/r/810885 (https://phabricator.wikimedia.org/T283017)
[11:58:02] <icinga-wm>	 RECOVERY - k8s API server requests latencies on kubemaster2002 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/d/000000435/kubernetes-api?orgId=1&viewPanel=27
[12:05:02] <wikibugs>	 (03CR) 10Jcrespo: "FYI" [puppet] - 10https://gerrit.wikimedia.org/r/810885 (https://phabricator.wikimedia.org/T283017) (owner: 10Jcrespo)
[12:10:25] <wikibugs>	 (03CR) 10Matěj Suchánek: "This change is ready for review." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/810890 (owner: 10Matěj Suchánek)
[12:12:52] <wikibugs>	 (03CR) 10Urbanecm: [C: 03+1] "lgtm. needs to be scheduled via a backport window." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/810890 (owner: 10Matěj Suchánek)
[12:13:34] <icinga-wm>	 PROBLEM - SSH on wtp1044.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[12:15:54] <wikibugs>	 (03PS1) 10Ladsgroup: Switchover s4 master [puppet] - 10https://gerrit.wikimedia.org/r/810908 (https://phabricator.wikimedia.org/T311611)
[12:17:08] <wikibugs>	 (03PS1) 10Ladsgroup: Switchover s4 master [dns] - 10https://gerrit.wikimedia.org/r/810909 (https://phabricator.wikimedia.org/T311611)
[12:17:36] <moritzm>	 !log installing 4.9.320 on stretch hosts
[12:17:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:19:03] <wikibugs>	 (PS1) Filippo Giunchedi: prometheus: remove distro-based conditionals for blackbox [puppet] - https://gerrit.wikimedia.org/r/810910 (https://phabricator.wikimedia.org/T305847)
[12:20:22] <wikibugs>	 (CR) Marostegui: [C: +1] Switchover s4 master [puppet] - https://gerrit.wikimedia.org/r/810908 (https://phabricator.wikimedia.org/T311611) (owner: Ladsgroup)
[12:20:43] <wikibugs>	 (CR) Marostegui: [C: +1] Switchover s4 master [dns] - https://gerrit.wikimedia.org/r/810909 (https://phabricator.wikimedia.org/T311611) (owner: Ladsgroup)
[12:20:53] <wikibugs>	 (CR) Ladsgroup: [C: -2] "until the day of switchover" [puppet] - https://gerrit.wikimedia.org/r/810908 (https://phabricator.wikimedia.org/T311611) (owner: Ladsgroup)
[12:21:20] <wikibugs>	 (CR) Ladsgroup: [C: -2] "until the day of switchover" [dns] - https://gerrit.wikimedia.org/r/810909 (https://phabricator.wikimedia.org/T311611) (owner: Ladsgroup)
[12:21:56] <wikibugs>	 (CR) Marostegui: "Was the "-" allowed in usernames?" [puppet] - https://gerrit.wikimedia.org/r/810885 (https://phabricator.wikimedia.org/T283017) (owner: Jcrespo)
[12:22:21] <wikibugs>	 (CR) Filippo Giunchedi: "This change will require some coordination between merging and upgrading blackbox exporter on buster hosts. I can take care of production," [puppet] - https://gerrit.wikimedia.org/r/810910 (https://phabricator.wikimedia.org/T305847) (owner: Filippo Giunchedi)
[12:22:29] <wikibugs>	 (CR) Vlad.shapik: "Thank you for the patch. The changes are used in the patch(Icabc39dab7347ac5c6d75f834a06ddfca5c4ca09) which this is partially based on." [software/thumbor-plugins] - https://gerrit.wikimedia.org/r/806333 (owner: Hnowlan)
[12:23:23] <wikibugs>	 (CR) Majavah: prometheus: remove distro-based conditionals for blackbox (1 comment) [puppet] - https://gerrit.wikimedia.org/r/810910 (https://phabricator.wikimedia.org/T305847) (owner: Filippo Giunchedi)
[12:23:44] <wikibugs>	 SRE, Thumbor, Wikimedia-SVG-rendering, Upstream: Incorrect text positioning in SVG rasterization (scale/transform; font-size; kerning) - https://phabricator.wikimedia.org/T36947 (JoKalliauer)
[12:27:31] <_joe_>	 !log updated etcdmirror to 0.0.8 everywhere
[12:27:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:36:34] <wikibugs>	 (CR) Jcrespo: "All characters are technically allowed, it was _ the one that gave us issues in the past (because it was a wildcard) not '-'." [puppet] - https://gerrit.wikimedia.org/r/810885 (https://phabricator.wikimedia.org/T283017) (owner: Jcrespo)
[12:37:50] <wikibugs>	 (PS1) David Caro: wmcs.openstack.cloudgw: add reboot_node and roll_reboot_cloudgws [cookbooks] (wmcs) - https://gerrit.wikimedia.org/r/810914
[12:37:52] <wikibugs>	 (PS1) David Caro: wmcs.openstack: Use the known cloudcontrols instead of asking [cookbooks] (wmcs) - https://gerrit.wikimedia.org/r/810915
[12:38:22] <jynus>	 !log running alter table on dbbackups db T283017
[12:38:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:38:26] <stashbot>	 T283017: Create a dashboard for database backups monitoring/reporting - https://phabricator.wikimedia.org/T283017
[12:40:47] <wikibugs>	 (CR) Matěj Suchánek: Don't call deprecated IContextSource::getStats (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/810890 (owner: Matěj Suchánek)
[12:43:03] <wikibugs>	 (CR) CI reject: [V: -1] wmcs.openstack.cloudgw: add reboot_node and roll_reboot_cloudgws [cookbooks] (wmcs) - https://gerrit.wikimedia.org/r/810914 (owner: David Caro)
[12:43:39] <wikibugs>	 (CR) CI reject: [V: -1] wmcs.openstack: Use the known cloudcontrols instead of asking [cookbooks] (wmcs) - https://gerrit.wikimedia.org/r/810915 (owner: David Caro)
[12:51:15] <godog>	 taavi: thanks for assisting re: blackbox-exporter, would you have time today?
[12:52:48] <wikibugs>	 (CR) Filippo Giunchedi: prometheus: remove distro-based conditionals for blackbox (1 comment) [puppet] - https://gerrit.wikimedia.org/r/810910 (https://phabricator.wikimedia.org/T305847) (owner: Filippo Giunchedi)
[12:59:51] <wikibugs>	 (PS1) Stang: zh(wikiversity|wiktionary): Disable local upload [mediawiki-config] - https://gerrit.wikimedia.org/r/810916 (https://phabricator.wikimedia.org/T312012)
[13:08:32] <icinga-wm>	 PROBLEM - SSH on wtp1040.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[13:10:44] <logmsgbot>	 !log mvernon@cumin1001 START - Cookbook sre.hosts.reboot-single for host ms-be2064.codfw.wmnet
[13:10:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:11:48] <logmsgbot>	 !log mvernon@cumin1001 START - Cookbook sre.hosts.reboot-single for host ms-be1068.eqiad.wmnet
[13:11:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:14:58] <icinga-wm>	 RECOVERY - SSH on wtp1044.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[13:15:06] <wikibugs>	 (PS1) Filippo Giunchedi: sre: add etcd-mirror lag page [alerts] - https://gerrit.wikimedia.org/r/810918 (https://phabricator.wikimedia.org/T309546)
[13:17:54] <wikibugs>	 (PS1) Filippo Giunchedi: etcd: remove paging alert, moved to Prometheus [puppet] - https://gerrit.wikimedia.org/r/810919 (https://phabricator.wikimedia.org/T309546)
[13:21:12] <wikibugs>	 (PS2) David Caro: wmcs.openstack.cloudgw: add reboot_node and roll_reboot_cloudgws [cookbooks] (wmcs) - https://gerrit.wikimedia.org/r/810914
[13:21:14] <wikibugs>	 (PS2) David Caro: wmcs.openstack: Use the known cloudcontrols instead of asking [cookbooks] (wmcs) - https://gerrit.wikimedia.org/r/810915
[13:22:59] <logmsgbot>	 !log mvernon@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2064.codfw.wmnet
[13:23:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:24:48] <logmsgbot>	 !log mvernon@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1068.eqiad.wmnet
[13:24:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:25:15] <logmsgbot>	 !log mvernon@cumin1001 START - Cookbook sre.hosts.reboot-single for host ms-be2065.codfw.wmnet
[13:25:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:25:57] <wikibugs>	 (CR) Jaime Nuche: "We don't need this change after Ahmon added this: https://gerrit.wikimedia.org/r/plugins/gitiles/operations/puppet/+/refs/heads/production" [puppet] - https://gerrit.wikimedia.org/r/807510 (https://phabricator.wikimedia.org/T310740) (owner: Jbond)
[13:27:03] <wikibugs>	 (CR) CI reject: [V: -1] wmcs.openstack.cloudgw: add reboot_node and roll_reboot_cloudgws [cookbooks] (wmcs) - https://gerrit.wikimedia.org/r/810914 (owner: David Caro)
[13:27:13] <logmsgbot>	 !log mvernon@cumin1001 START - Cookbook sre.hosts.reboot-single for host ms-be1069.eqiad.wmnet
[13:27:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:27:30] <wikibugs>	 (CR) CI reject: [V: -1] wmcs.openstack: Use the known cloudcontrols instead of asking [cookbooks] (wmcs) - https://gerrit.wikimedia.org/r/810915 (owner: David Caro)
[13:34:36] <taavi>	 godog: sure, what about right now?
[13:35:09] <godog>	 taavi: SGTM!
[13:35:14] <logmsgbot>	 !log mvernon@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2065.codfw.wmnet
[13:35:42] <godog>	 taavi: see the procedure outlined in my comment, I'm upgrading blackbox-exporter now
[13:37:05] <taavi>	 the plan looks fine, lmk when I should start upgrading the toolforge hosts
[13:38:21] <godog>	 taavi: yeah you can upgrade, I used this fwiw apt -yq -o Dpkg::Options::="--force-confdef" -o Dpkg::Options::="--force-confold" install prometheus-blackbox-exporter
[13:38:31] <godog>	 sigh, ok you get the idea, there's a dpkg prompt involved
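For reference, the apt invocation godog describes passes two dpkg options, `--force-confdef` and `--force-confold`, so that the upgrade keeps the existing configuration files instead of stopping at the conffile prompt. A minimal Python sketch of how that argument list could be assembled follows; the helper name `apt_install_keep_conf` is hypothetical and does not appear in the log, and the code only builds and prints the command without running it.

```python
import shlex


def apt_install_keep_conf(package: str) -> list[str]:
    """Build (but do not run) an apt command that upgrades `package`
    non-interactively while keeping existing config files.

    Hypothetical helper for illustration; not taken from the log.
    """
    return [
        "apt", "-yq",
        # Take the default action for conffiles when there is one...
        "-o", "Dpkg::Options::=--force-confdef",
        # ...and otherwise keep the currently installed version.
        "-o", "Dpkg::Options::=--force-confold",
        "install", package,
    ]


cmd = apt_install_keep_conf("prometheus-blackbox-exporter")
# Print a shell-safe rendering that can be pasted as a single line.
print(shlex.join(cmd))
```

Building the command as a list avoids the shell quoting around the `Dpkg::Options` values that made the one-liner awkward to paste into chat.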
[13:38:40] <logmsgbot>	 !log mvernon@cumin1001 START - Cookbook sre.hosts.reboot-single for host ms-be2066.codfw.wmnet
[13:39:03] <godog>	 !log upgrade prometheus-blackbox-exporter to 0.18.0+ds-3~bpo10+1 on prometheus and metricsinfra Buster hosts - T305847
[13:39:18] <wikibugs>	 (PS1) Zabe: base: remove absented files [puppet] - https://gerrit.wikimedia.org/r/810925
[13:39:52] <godog>	 mmhh stashbot doesn't want to play with us atm
[13:40:15] <godog>	 taavi: I'm going to merge the change
[13:40:21] <wikibugs>	 (CR) Filippo Giunchedi: [C: +2] prometheus: remove distro-based conditionals for blackbox [puppet] - https://gerrit.wikimedia.org/r/810910 (https://phabricator.wikimedia.org/T305847) (owner: Filippo Giunchedi)
[13:40:28] <wikibugs>	 (PS2) Filippo Giunchedi: prometheus: remove distro-based conditionals for blackbox [puppet] - https://gerrit.wikimedia.org/r/810910 (https://phabricator.wikimedia.org/T305847)
[13:40:51] <wikibugs>	 (CR) Filippo Giunchedi: [V: +2] prometheus: remove distro-based conditionals for blackbox [puppet] - https://gerrit.wikimedia.org/r/810910 (https://phabricator.wikimedia.org/T305847) (owner: Filippo Giunchedi)
[13:43:09] <taavi>	 ok, all done on my side
[13:43:45] <jinxer-wm>	 (JobUnavailable) firing: (8) Reduced availability for job blackbox/pingthing_proxied in ops@drmrs - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[13:43:56] <icinga-wm>	 PROBLEM - Check systemd state on prometheus3001 is CRITICAL: CRITICAL - degraded: The following units failed: prometheus-blackbox-exporter.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:44:24] <icinga-wm>	 PROBLEM - Check systemd state on prometheus4001 is CRITICAL: CRITICAL - degraded: The following units failed: prometheus-blackbox-exporter.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:44:24] <icinga-wm>	 PROBLEM - Check systemd state on prometheus6001 is CRITICAL: CRITICAL - degraded: The following units failed: prometheus-blackbox-exporter.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:44:38] <icinga-wm>	 PROBLEM - Check systemd state on prometheus5001 is CRITICAL: CRITICAL - degraded: The following units failed: prometheus-blackbox-exporter.service,wmf_auto_restart_prometheus-blackbox-exporter.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:44:43] <vgutierrez>	 expected? :)
[13:44:45] <jinxer-wm>	 (JobUnavailable) firing: (4) Reduced availability for job blackbox/pingthing in ops@drmrs - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[13:44:53] <taavi>	 yes
[13:44:59] <godog>	 kinda, I'm looking into it
[13:46:03] <wikibugs>	 SRE-OnFire, DBA, Sustainability (Incident Followup): Investigate mariadb 10.6 performance regression during spikes/high load - https://phabricator.wikimedia.org/T311106 (Marostegui) Just talked to Amir about the next test. Tomorrow morning we'll: - Depool db1132 - Enable performance schema - Repool i...
[13:46:32] <godog>	 ah yes, I goofed that one up, fixing
[13:46:53] <wikibugs>	 (PS1) Elukey: profile::thanos::swift: add mlserve_ro account [labs/private] - https://gerrit.wikimedia.org/r/810926 (https://phabricator.wikimedia.org/T311628)
[13:47:22] <wikibugs>	 (CR) Elukey: [V: +2 C: +2] profile::thanos::swift: add mlserve_ro account [labs/private] - https://gerrit.wikimedia.org/r/810926 (https://phabricator.wikimedia.org/T311628) (owner: Elukey)
[13:47:50] <wikibugs>	 (CR) Volans: [C: +1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/809338 (https://phabricator.wikimedia.org/T306654) (owner: Ssingh)
[13:48:45] <jinxer-wm>	 (JobUnavailable) firing: (19) Reduced availability for job blackbox/icmp in ops@drmrs - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[13:49:00] <wikibugs>	 (PS1) Filippo Giunchedi: wmflib: remove distro conditionals from blackbox http module options [puppet] - https://gerrit.wikimedia.org/r/810927 (https://phabricator.wikimedia.org/T309546)
[13:49:17] <jinxer-wm>	 (CirrusSearchHighOldGCFrequency) firing: (2) Elasticsearch instance cloudelastic1005-cloudelastic-chi-eqiad is running the old gc excessively - https://wikitech.wikimedia.org/wiki/Search#Stuck_in_old_GC_hell - https://grafana.wikimedia.org/d/000000462/elasticsearch-memory - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchHighOldGCFrequency
[13:49:29] <wikibugs>	 (PS2) Filippo Giunchedi: wmflib: remove distro conditionals from blackbox http module options [puppet] - https://gerrit.wikimedia.org/r/810927 (https://phabricator.wikimedia.org/T309546)
[13:49:48] <logmsgbot>	 !log mvernon@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2066.codfw.wmnet
[13:49:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:49:53] <godog>	 serves me right for not testing changes in pontoon first
[13:51:10] <wikibugs>	 (CR) Filippo Giunchedi: [V: +2 C: +2] wmflib: remove distro conditionals from blackbox http module options [puppet] - https://gerrit.wikimedia.org/r/810927 (https://phabricator.wikimedia.org/T309546) (owner: Filippo Giunchedi)
[13:51:12] <wikibugs>	 (CR) Ssingh: [C: +2] admin: allow sudo for jclark-ctr for cookbooks [puppet] - https://gerrit.wikimedia.org/r/809338 (https://phabricator.wikimedia.org/T306654) (owner: Ssingh)
[13:51:20] <icinga-wm>	 PROBLEM - Check systemd state on ms-be1069 is CRITICAL: CRITICAL - degraded: The following units failed: ferm.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:51:35] <wikibugs>	 SRE, Wikimedia-Mailing-lists: Create stewards-usergroup private mailing list - https://phabricator.wikimedia.org/T312018 (MarcoAurelio)
[13:51:40] <wikibugs>	 (PS2) Ssingh: admin: allow sudo for jclark-ctr for cookbooks [puppet] - https://gerrit.wikimedia.org/r/809338 (https://phabricator.wikimedia.org/T306654)
[13:52:28] <wikibugs>	 (CR) Ssingh: "rebased, no code change" [puppet] - https://gerrit.wikimedia.org/r/809338 (https://phabricator.wikimedia.org/T306654) (owner: Ssingh)
[13:52:33] <wikibugs>	 (CR) Ssingh: [V: +2 C: +2] admin: allow sudo for jclark-ctr for cookbooks [puppet] - https://gerrit.wikimedia.org/r/809338 (https://phabricator.wikimedia.org/T306654) (owner: Ssingh)
[13:53:50] <wikibugs>	 SRE, SRE-Access-Requests, Infrastructure-Foundations (FY2021/2022-Q4), Patch-For-Review: Request sudo access for Jclark-ctr - https://phabricator.wikimedia.org/T306654 (ssingh)
[13:54:17] <jinxer-wm>	 (CirrusSearchHighOldGCFrequency) firing: (2) Elasticsearch instance cloudelastic1005-cloudelastic-chi-eqiad is running the old gc excessively - https://wikitech.wikimedia.org/wiki/Search#Stuck_in_old_GC_hell - https://grafana.wikimedia.org/d/000000462/elasticsearch-memory - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchHighOldGCFrequency
[13:54:18] <icinga-wm>	 RECOVERY - Check systemd state on prometheus4001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:54:20] <icinga-wm>	 RECOVERY - Check systemd state on prometheus6001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:54:46] <icinga-wm>	 RECOVERY - Check systemd state on ms-be1069 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:54:52] <logmsgbot>	 !log mvernon@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1069.eqiad.wmnet
[13:54:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:54:56] <godog>	 ok we're back
[13:56:22] <icinga-wm>	 RECOVERY - Check systemd state on prometheus3001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:56:53] <wikibugs>	 (PS3) David Caro: wmcs.openstack.cloudgw: add reboot_node and roll_reboot_cloudgws [cookbooks] (wmcs) - https://gerrit.wikimedia.org/r/810914
[13:56:55] <wikibugs>	 (PS3) David Caro: wmcs.openstack: Use the known cloudcontrols instead of asking [cookbooks] (wmcs) - https://gerrit.wikimedia.org/r/810915
[13:57:13] <godog>	 taavi: should be all good, i.e. upgrading blackbox-exporter and then running puppet, confirmed it works in production
[13:58:26] <wikibugs>	 (Abandoned) Filippo Giunchedi: prometheus: adjust check::http params based on distro [puppet] - https://gerrit.wikimedia.org/r/809586 (https://phabricator.wikimedia.org/T305847) (owner: Filippo Giunchedi)
[13:58:28] <taavi>	 great
[13:58:45] <jinxer-wm>	 (JobUnavailable) resolved: (19) Reduced availability for job blackbox/icmp in ops@drmrs - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[13:59:17] <jinxer-wm>	 (CirrusSearchHighOldGCFrequency) firing: (2) Elasticsearch instance cloudelastic1005-cloudelastic-chi-eqiad is running the old gc excessively - https://wikitech.wikimedia.org/wiki/Search#Stuck_in_old_GC_hell - https://grafana.wikimedia.org/d/000000462/elasticsearch-memory - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchHighOldGCFrequency
[13:59:45] <jinxer-wm>	 (JobUnavailable) resolved: (8) Reduced availability for job blackbox/pingthing in ops@drmrs - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[14:01:07] <wikibugs>	 SRE, Wikimedia-Mailing-lists: Create stewards-usergroup private mailing list - https://phabricator.wikimedia.org/T312018 (Ladsgroup) Open→Resolved a:Ladsgroup
[14:01:30] <wikibugs>	 (PS6) Filippo Giunchedi: icinga: check commons.w.o with blackbox exporter [puppet] - https://gerrit.wikimedia.org/r/804274 (https://phabricator.wikimedia.org/T305847)
[14:01:32] <wikibugs>	 (PS5) Filippo Giunchedi: WIP irc check via blackbox [puppet] - https://gerrit.wikimedia.org/r/805815
[14:01:34] <wikibugs>	 (PS1) Filippo Giunchedi: prometheus: disable protocol fallback for blackbox::check::http [puppet] - https://gerrit.wikimedia.org/r/810929 (https://phabricator.wikimedia.org/T305847)
[14:01:38] <wikibugs>	 (PS3) Ladsgroup: Set GlobalBlockingAllowedRanges for testwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/810055 (https://phabricator.wikimedia.org/T307648)
[14:01:49] <wikibugs>	 (CR) Ladsgroup: [C: +2] Set GlobalBlockingAllowedRanges for testwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/810055 (https://phabricator.wikimedia.org/T307648) (owner: Ladsgroup)
[14:03:00] <wikibugs>	 (CR) Giuseppe Lavagetto: scap: use the new script to restart php-fpm (3 comments) [puppet] - https://gerrit.wikimedia.org/r/810031 (owner: Giuseppe Lavagetto)
[14:03:12] <wikibugs>	 (CR) Elukey: "Found the following after PCC:" [puppet] - https://gerrit.wikimedia.org/r/810840 (https://phabricator.wikimedia.org/T311628) (owner: Elukey)
[14:03:15] <wikibugs>	 (Merged) jenkins-bot: Set GlobalBlockingAllowedRanges for testwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/810055 (https://phabricator.wikimedia.org/T307648) (owner: Ladsgroup)
[14:04:45] <wikibugs>	 (CR) Filippo Giunchedi: [C: +2] prometheus: add initial blackbox dns probes for wikipedia [puppet] - https://gerrit.wikimedia.org/r/809535 (https://phabricator.wikimedia.org/T169860) (owner: Filippo Giunchedi)
[14:05:06] <wikibugs>	 (PS2) Elukey: profile::thanos::swift: add a read only account for ml-serve [puppet] - https://gerrit.wikimedia.org/r/810840 (https://phabricator.wikimedia.org/T311628)
[14:05:09] <wikibugs>	 (CR) Filippo Giunchedi: [C: +2] "Thank you Brett for the review! Merging, we can add/iterate on the probes at a later stage too" [puppet] - https://gerrit.wikimedia.org/r/809535 (https://phabricator.wikimedia.org/T169860) (owner: Filippo Giunchedi)
[14:05:10] <logmsgbot>	 !log mvernon@cumin1001 START - Cookbook sre.hosts.reboot-single for host ms-be2067.codfw.wmnet
[14:05:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:05:13] <_joe_>	 jouncebot: next
[14:05:13] <jouncebot>	 In 16 hour(s) and 54 minute(s): UTC morning backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220705T0700)
[14:05:15] <wikibugs>	 (PS2) Filippo Giunchedi: prometheus: add initial blackbox dns probes for wikipedia [puppet] - https://gerrit.wikimedia.org/r/809535 (https://phabricator.wikimedia.org/T169860)
[14:05:20] <_joe_>	 ok there's time :)
[14:05:50] <logmsgbot>	 !log mvernon@cumin1001 START - Cookbook sre.hosts.reboot-single for host ms-be1070.eqiad.wmnet
[14:05:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:06:14] <icinga-wm>	 PROBLEM - puppet last run on ms-be2028 is CRITICAL: CRITICAL: Puppet last ran 6 hours ago https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[14:06:22] <wikibugs>	 (CR) Giuseppe Lavagetto: [C: +2] scap: use the new script to restart php-fpm [puppet] - https://gerrit.wikimedia.org/r/810031 (owner: Giuseppe Lavagetto)
[14:06:30] <wikibugs>	 (CR) Elukey: [V: +1] "PCC SUCCESS (DIFF 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/36179/console" [puppet] - https://gerrit.wikimedia.org/r/810840 (https://phabricator.wikimedia.org/T311628) (owner: Elukey)
[14:06:32] <wikibugs>	 (PS7) Giuseppe Lavagetto: scap: use the new script to restart php-fpm [puppet] - https://gerrit.wikimedia.org/r/810031
[14:07:04] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[14:07:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:08:03] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[14:08:04] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[14:08:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:08:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:09:17] <jinxer-wm>	 (CirrusSearchHighOldGCFrequency) firing: (2) Elasticsearch instance cloudelastic1005-cloudelastic-chi-eqiad is running the old gc excessively - https://wikitech.wikimedia.org/wiki/Search#Stuck_in_old_GC_hell - https://grafana.wikimedia.org/d/000000462/elasticsearch-memory - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchHighOldGCFrequency
[14:09:59] <wikibugs>	 (CR) Filippo Giunchedi: "My bad, account_name is AUTH_mlserve (i.e. same as the rw user) though you need stats_enabled: no for this account" [puppet] - https://gerrit.wikimedia.org/r/810840 (https://phabricator.wikimedia.org/T311628) (owner: Elukey)
[14:10:03] <wikibugs>	 (CR) David Caro: [C: +2] "LGTM Thanks!" [puppet] - https://gerrit.wikimedia.org/r/810925 (owner: Zabe)
[14:10:36] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[14:10:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:10:42] <logmsgbot>	 !log ladsgroup@deploy1002 Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:810055|Set GlobalBlockingAllowedRanges for testwiki (T307648)]] (duration: 03m 39s)
[14:10:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:10:46] <stashbot>	 T307648: Audit database usage of GlobalBlocking extension - https://phabricator.wikimedia.org/T307648
[14:10:56] <godog>	 dcaro: merged your change too
[14:11:20] <dcaro>	 godog: thanks! I was trying to find out the user<->irc name xd
[14:11:29] <wikibugs>	 (PS1) Ladsgroup: Excempt WMCS ranges from globalblocking everywhere [mediawiki-config] - https://gerrit.wikimedia.org/r/810932 (https://phabricator.wikimedia.org/T307648)
[14:11:31] <wikibugs>	 (PS3) Elukey: profile::thanos::swift: add a read only account for ml-serve [puppet] - https://gerrit.wikimedia.org/r/810840 (https://phabricator.wikimedia.org/T311628)
[14:11:47] <godog>	 lolz
[14:12:06] <wikibugs>	 (PS3) Filippo Giunchedi: prometheus: probe DNS for (www).wikipedia.org [puppet] - https://gerrit.wikimedia.org/r/809536 (https://phabricator.wikimedia.org/T169860)
[14:12:32] <wikibugs>	 (CR) Elukey: [V: +1] "PCC SUCCESS (DIFF 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/36180/console" [puppet] - https://gerrit.wikimedia.org/r/810840 (https://phabricator.wikimedia.org/T311628) (owner: Elukey)
[14:13:34] <wikibugs>	 (PS2) Ladsgroup: Exempt WMCS ranges from globalblocking everywhere [mediawiki-config] - https://gerrit.wikimedia.org/r/810932 (https://phabricator.wikimedia.org/T307648)
[14:14:15] <_joe_>	 I'll do a null deploy to verify the new php restart script works
[14:14:25] <wikibugs>	 (CR) Filippo Giunchedi: [C: +1] profile::thanos::swift: add a read only account for ml-serve [puppet] - https://gerrit.wikimedia.org/r/810840 (https://phabricator.wikimedia.org/T311628) (owner: Elukey)
[14:14:41] <logmsgbot>	 !log mvernon@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2067.codfw.wmnet
[14:14:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:15:07] <wikibugs>	 (CR) Elukey: [V: +1 C: +2] profile::thanos::swift: add a read only account for ml-serve [puppet] - https://gerrit.wikimedia.org/r/810840 (https://phabricator.wikimedia.org/T311628) (owner: Elukey)
[14:15:36] <wikibugs>	 (PS4) David Caro: P:wmcs: unify toolsdb profiles [puppet] - https://gerrit.wikimedia.org/r/789611 (owner: Majavah)
[14:15:48] <icinga-wm>	 RECOVERY - MegaRAID on an-worker1082 is OK: OK: optimal, 13 logical, 14 physical, WriteBack policy https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[14:15:55] <wikibugs>	 (CR) Filippo Giunchedi: [C: +2] prometheus: probe DNS for (www).wikipedia.org [puppet] - https://gerrit.wikimedia.org/r/809536 (https://phabricator.wikimedia.org/T169860) (owner: Filippo Giunchedi)
[14:16:26] <elukey>	 Emperor, godog - o/ I am going to roll restart thanos-fe's swift-proxy for https://gerrit.wikimedia.org/r/c/operations/puppet/+/810840/
[14:16:29] <elukey>	 (enable a new account)
[14:16:38] <godog>	 elukey: ack
[14:17:06] <logmsgbot>	 !log mvernon@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1070.eqiad.wmnet
[14:17:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:17:37] <wikibugs>	 (CR) Ladsgroup: [C: +2] Exempt WMCS ranges from globalblocking everywhere [mediawiki-config] - https://gerrit.wikimedia.org/r/810932 (https://phabricator.wikimedia.org/T307648) (owner: Ladsgroup)
[14:17:50] <wikibugs>	 (CR) Filippo Giunchedi: [C: +2] prometheus: disable protocol fallback for blackbox::check::http [puppet] - https://gerrit.wikimedia.org/r/810929 (https://phabricator.wikimedia.org/T305847) (owner: Filippo Giunchedi)
[14:18:07] <logmsgbot>	 !log mvernon@cumin1001 START - Cookbook sre.hosts.reboot-single for host ms-be2068.codfw.wmnet
[14:18:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:18:30] <wikibugs>	 SRE, SRE-OnFire, Shellbox, serviceops, Sustainability (Incident Followup): Shellbox resource management - https://phabricator.wikimedia.org/T310557 (LSobanski)
[14:18:41] <logmsgbot>	 !log mvernon@cumin1001 START - Cookbook sre.hosts.reboot-single for host ms-be1071.eqiad.wmnet
[14:18:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:19:11] <wikibugs>	 (Merged) jenkins-bot: Exempt WMCS ranges from globalblocking everywhere [mediawiki-config] - https://gerrit.wikimedia.org/r/810932 (https://phabricator.wikimedia.org/T307648) (owner: Ladsgroup)
[14:19:43] <elukey>	 !log roll restart of thanos-fe's proxy to pick up a new account - T311628
[14:19:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:19:47] <stashbot>	 T311628: Create Swift account for readonly access to ML models - https://phabricator.wikimedia.org/T311628
[14:20:54] <logmsgbot>	 !log oblivian@deploy1002 Synchronized README: testing new php restart script (duration: 03m 23s)
[14:20:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:25:10] <logmsgbot>	 !log mvernon@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2068.codfw.wmnet
[14:25:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:25:44] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[14:25:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:26:41] <logmsgbot>	 !log mvernon@cumin1001 START - Cookbook sre.hosts.reboot-single for host ms-be2069.codfw.wmnet
[14:26:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:27:16] <logmsgbot>	 !log ladsgroup@deploy1002 Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:810932|Exempt WMCS ranges from globalblocking everywhere (T307648)]] (duration: 03m 26s)
[14:27:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:27:19] <stashbot>	 T307648: Audit database usage of GlobalBlocking extension - https://phabricator.wikimedia.org/T307648
[14:28:22] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[14:28:24] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[14:28:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:28:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:28:41] <wikibugs>	 (PS4) Giuseppe Lavagetto: mediawiki: install php7.4 on the canaries [puppet] - https://gerrit.wikimedia.org/r/808909 (https://phabricator.wikimedia.org/T311386)
[14:29:14] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[14:29:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:32:21] <logmsgbot>	 !log mvernon@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1071.eqiad.wmnet
[14:32:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:33:09] <wikibugs>	 (CR) Giuseppe Lavagetto: [C: +2] mediawiki: install php7.4 on the canaries [puppet] - https://gerrit.wikimedia.org/r/808909 (https://phabricator.wikimedia.org/T311386) (owner: Giuseppe Lavagetto)
[14:35:20] <logmsgbot>	 !log mvernon@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2069.codfw.wmnet
[14:35:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:50:16] <icinga-wm>	 PROBLEM - MegaRAID on an-worker1082 is CRITICAL: CRITICAL: 13 LD(s) must have write cache policy WriteBack, currently using: WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[14:54:00] <wikibugs>	 (CR) Ayounsi: reports.network: improve IPv6 AAAA records checks (4 comments) [software/netbox-extras] - https://gerrit.wikimedia.org/r/807986 (owner: Volans)
[14:54:18] <wikibugs>	 (CR) Ayounsi: [C: +1] "Small suggestion, LGTM otherwise!" [software/netbox-extras] - https://gerrit.wikimedia.org/r/807986 (owner: Volans)
[15:00:27] <wikibugs>	 (CR) Giuseppe Lavagetto: [C: +1] sre: add etcd-mirror lag page [alerts] - https://gerrit.wikimedia.org/r/810918 (https://phabricator.wikimedia.org/T309546) (owner: Filippo Giunchedi)
[15:01:30] <wikibugs>	 (CR) Giuseppe Lavagetto: [C: +1] etcd: remove paging alert, moved to Prometheus [puppet] - https://gerrit.wikimedia.org/r/810919 (https://phabricator.wikimedia.org/T309546) (owner: Filippo Giunchedi)
[15:02:50] <wikibugs>	 (CR) Filippo Giunchedi: [C: +2] etcd: remove paging alert, moved to Prometheus [puppet] - https://gerrit.wikimedia.org/r/810919 (https://phabricator.wikimedia.org/T309546) (owner: Filippo Giunchedi)
[15:02:55] <wikibugs>	 (CR) Filippo Giunchedi: [C: +2] sre: add etcd-mirror lag page [alerts] - https://gerrit.wikimedia.org/r/810918 (https://phabricator.wikimedia.org/T309546) (owner: Filippo Giunchedi)
[15:09:19] <wikibugs>	 SRE-tools, Infrastructure-Foundations, netops, serviceops: Support services VIPs with not marked as VIP in Netbox - https://phabricator.wikimedia.org/T295793 (akosiaris) >>! In T295793#8047985, @ayounsi wrote: > Pinging @jelto for gitlab (not sure if the issue is still present or relevant) and @a...
[15:10:03] <icinga-wm>	 RECOVERY - SSH on wtp1040.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[15:10:06] <wikibugs>	 SRE, Continuous-Integration-Infrastructure, serviceops, serviceops-collab, and 2 others: replace doc1001.eqiad.wmnet with a buster VM and create the codfw equivalent - https://phabricator.wikimedia.org/T247653 (akosiaris) Many thanks for the work on this one @Dzahn!
[15:13:47] <icinga-wm>	 PROBLEM - puppet last run on deneb is CRITICAL: CRITICAL: Puppet last ran 4 days ago https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[15:14:04] <vgutierrez>	 4 days ago?
[15:15:05] <jynus>	 I am more worried about the ns2-v4 Auth DNS check being in soft state
[15:15:39] <vgutierrez>	 I'm not seeing anything related on /alerts
[15:21:17] <moritzm>	 I disabled puppet on deneb for some debugging on Friday and just re-enabled it
[15:21:31] <moritzm>	 all benign :-)
[15:23:05] <vgutierrez>	 jynus: forced a recheck.. looking good again
[15:23:22] <jynus>	 ok, so then it was just a fluke + long time between rechecks
[15:23:25] <vgutierrez>	 yup
[15:23:30] <vgutierrez>	 ns2 looks good
[15:23:47] <wikibugs>	 (PS2) Volans: reports.network: improve IPv6 AAAA records checks [software/netbox-extras] - https://gerrit.wikimedia.org/r/807986
[15:24:13] <wikibugs>	 (CR) Volans: "addressed comment" [software/netbox-extras] - https://gerrit.wikimedia.org/r/807986 (owner: Volans)
[15:28:23] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1:00:00 on db1161.eqiad.wmnet with reason: Maintenance
[15:28:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:28:36] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1161.eqiad.wmnet with reason: Maintenance
[15:28:38] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
[15:28:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:28:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:29:26] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
[15:29:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:29:31] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1161 (T305300)', diff saved to https://phabricator.wikimedia.org/P30781 and previous config saved to /var/cache/conftool/dbconfig/20220704-152931-ladsgroup.json
[15:29:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:29:34] <stashbot>	 T305300: Add lu_attachment_method column to localuser table - https://phabricator.wikimedia.org/T305300
[15:30:47] <wikibugs>	 SRE-swift-storage, Lift-Wing, Machine-Learning-Team (Active Tasks): Create Swift account for readonly access to ML models - https://phabricator.wikimedia.org/T311628 (elukey) The new mlserve:ro account has been added, but if I try to use s3cmd with the new credentials I get an error:  ` elukey@stat10...
[15:30:53] <icinga-wm>	 RECOVERY - puppet last run on deneb is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[15:31:46] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1:00:00 on db1161.eqiad.wmnet with reason: Maintenance
[15:31:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:31:48] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1161.eqiad.wmnet with reason: Maintenance
[15:31:50] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
[15:31:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:31:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:31:54] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
[15:31:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:32:00] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1:00:00 on db1144.eqiad.wmnet with reason: Maintenance
[15:32:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:32:14] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1144.eqiad.wmnet with reason: Maintenance
[15:32:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:32:19] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1144:3315 (T312027)', diff saved to https://phabricator.wikimedia.org/P30782 and previous config saved to /var/cache/conftool/dbconfig/20220704-153218-ladsgroup.json
[15:32:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:32:22] <stashbot>	 T312027: Fix enwikivoyage drifts on pagelinks - https://phabricator.wikimedia.org/T312027
[15:33:06] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'db1161 (re)pooling @ 10%: Maint done', diff saved to https://phabricator.wikimedia.org/P30783 and previous config saved to /var/cache/conftool/dbconfig/20220704-153306-ladsgroup.json
[15:33:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:33:45] <wikibugs>	 (CR) Muehlenhoff: doc: remove support for stretch, add support for bullseye (1 comment) [puppet] - https://gerrit.wikimedia.org/r/810401 (https://phabricator.wikimedia.org/T247653) (owner: Dzahn)
[15:34:29] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T312027)', diff saved to https://phabricator.wikimedia.org/P30784 and previous config saved to /var/cache/conftool/dbconfig/20220704-153428-ladsgroup.json
[15:34:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:35:23] <jynus>	 volans: :34 again, you were onto something...
[15:36:29] <volans>	 :)
[15:37:21] <wikibugs>	 (CR) Volans: [C: +2] reports.network: improve IPv6 AAAA records checks [software/netbox-extras] - https://gerrit.wikimedia.org/r/807986 (owner: Volans)
[15:37:26] <jynus>	 for context for others, I was talking about the insightful suggestion (I wouldn't have noticed it myself) in T311926#8047952
[15:37:26] <stashbot>	 T311926: PROBLEM: Icinga on alert2001.wikimedia.org is CRITICAL - https://phabricator.wikimedia.org/T311926
[15:37:59] <wikibugs>	 (Abandoned) Volans: Netbox location: fix naming [puppet] - https://gerrit.wikimedia.org/r/807997 (owner: Volans)
[15:38:01] <jynus>	 hopefully tomorrow someone can have a look
[15:38:07] <wikibugs>	 (Merged) jenkins-bot: reports.network: improve IPv6 AAAA records checks [software/netbox-extras] - https://gerrit.wikimedia.org/r/807986 (owner: Volans)
[15:48:10] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'db1161 (re)pooling @ 50%: Maint done', diff saved to https://phabricator.wikimedia.org/P30785 and previous config saved to /var/cache/conftool/dbconfig/20220704-154810-ladsgroup.json
[15:48:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:49:34] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P30786 and previous config saved to /var/cache/conftool/dbconfig/20220704-154933-ladsgroup.json
[15:49:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:58:26] <wikibugs>	 (CR) Alexandros Kosiaris: [C: -1] scap: Make scap3 provider packages depend on /usr/bin/scap (1 comment) [puppet] - https://gerrit.wikimedia.org/r/809270 (https://phabricator.wikimedia.org/T310740) (owner: Ahmon Dancy)
[16:03:14] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'db1161 (re)pooling @ 75%: Maint done', diff saved to https://phabricator.wikimedia.org/P30787 and previous config saved to /var/cache/conftool/dbconfig/20220704-160314-ladsgroup.json
[16:03:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:03:45] <icinga-wm>	 PROBLEM - puppet last run on ms-be2030 is CRITICAL: CRITICAL: Puppet last ran 6 hours ago https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[16:04:39] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P30788 and previous config saved to /var/cache/conftool/dbconfig/20220704-160439-ladsgroup.json
[16:04:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:09:51] <wikibugs>	 (PS1) David Caro: novafullstack: black and isort [puppet] - https://gerrit.wikimedia.org/r/810949
[16:09:53] <wikibugs>	 (PS1) David Caro: novafullstack: add types and some names refactor [puppet] - https://gerrit.wikimedia.org/r/810950
[16:10:16] <wikibugs>	 (PS1) Aqu: [WIP] Build spark assembly for Spark3 [puppet] - https://gerrit.wikimedia.org/r/810951 (https://phabricator.wikimedia.org/T310578)
[16:10:58] <wikibugs>	 (CR) CI reject: [V: -1] novafullstack: add types and some names refactor [puppet] - https://gerrit.wikimedia.org/r/810950 (owner: David Caro)
[16:15:11] <icinga-wm>	 RECOVERY - Checks that the local airflow scheduler for airflow @research is working properly on an-airflow1002 is OK: OK: /usr/bin/env AIRFLOW_HOME=/srv/airflow-research /usr/lib/airflow/bin/airflow jobs check --job-type SchedulerJob --hostname an-airflow1002.eqiad.wmnet succeeded https://wikitech.wikimedia.org/wiki/Analytics/Systems/Airflow
[16:18:18] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'db1161 (re)pooling @ 100%: Maint done', diff saved to https://phabricator.wikimedia.org/P30789 and previous config saved to /var/cache/conftool/dbconfig/20220704-161817-ladsgroup.json
[16:18:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:19:44] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T312027)', diff saved to https://phabricator.wikimedia.org/P30790 and previous config saved to /var/cache/conftool/dbconfig/20220704-161944-ladsgroup.json
[16:19:46] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1:00:00 on db1110.eqiad.wmnet with reason: Maintenance
[16:19:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:19:49] <stashbot>	 T312027: Fix enwikivoyage drifts on pagelinks - https://phabricator.wikimedia.org/T312027
[16:19:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:20:10] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1110.eqiad.wmnet with reason: Maintenance
[16:20:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:20:15] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1110 (T312027)', diff saved to https://phabricator.wikimedia.org/P30791 and previous config saved to /var/cache/conftool/dbconfig/20220704-162015-ladsgroup.json
[16:20:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:22:26] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1110 (T312027)', diff saved to https://phabricator.wikimedia.org/P30792 and previous config saved to /var/cache/conftool/dbconfig/20220704-162225-ladsgroup.json
[16:22:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:37:31] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P30793 and previous config saved to /var/cache/conftool/dbconfig/20220704-163730-ladsgroup.json
[16:37:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:47:29] <wikibugs>	 (PS1) Volans: scripts/hiera_export: add ganeti group [software/netbox-extras] - https://gerrit.wikimedia.org/r/810955
[16:48:02] <wikibugs>	 (PS1) Volans: netbox::host: adapt to new Netbox data [puppet] - https://gerrit.wikimedia.org/r/810956
[16:48:29] <wikibugs>	 (CR) Volans: "See also the related I4a857d6c14c227a810233ff1259d5b01635005b0" [software/netbox-extras] - https://gerrit.wikimedia.org/r/810955 (owner: Volans)
[16:48:39] <wikibugs>	 SRE, Wikimedia-Mailing-lists: mailman3: First/Last name should not be mandatory fields - https://phabricator.wikimedia.org/T312020 (ssingh) p:Triage→Medium
[16:51:03] <icinga-wm>	 RECOVERY - MegaRAID on an-worker1082 is OK: OK: optimal, 13 logical, 14 physical, WriteBack policy https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[16:52:36] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P30795 and previous config saved to /var/cache/conftool/dbconfig/20220704-165235-ladsgroup.json
[16:52:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:55:11] <wikibugs>	 (PS1) Zabe: vrts: Add SPDX headers [puppet] - https://gerrit.wikimedia.org/r/810957 (https://phabricator.wikimedia.org/T308013)
[16:55:13] <wikibugs>	 (PS1) Zabe: sudo: Add SPDX headers [puppet] - https://gerrit.wikimedia.org/r/810958 (https://phabricator.wikimedia.org/T308013)
[16:55:15] <wikibugs>	 (PS1) Zabe: vagrant: Add SPDX headers [puppet] - https://gerrit.wikimedia.org/r/810959 (https://phabricator.wikimedia.org/T308013)
[16:55:17] <wikibugs>	 (PS1) Zabe: utils: Add SPDX headers [puppet] - https://gerrit.wikimedia.org/r/810960 (https://phabricator.wikimedia.org/T308013)
[17:01:36] <wikibugs>	 (PS1) Muehlenhoff: bacula::storage: Remove support for stretch [puppet] - https://gerrit.wikimedia.org/r/810961
[17:07:41] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1110 (T312027)', diff saved to https://phabricator.wikimedia.org/P30796 and previous config saved to /var/cache/conftool/dbconfig/20220704-170740-ladsgroup.json
[17:07:42] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1:00:00 on db1113.eqiad.wmnet with reason: Maintenance
[17:07:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:07:46] <stashbot>	 T312027: Fix enwikivoyage drifts on pagelinks - https://phabricator.wikimedia.org/T312027
[17:07:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:07:56] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1113.eqiad.wmnet with reason: Maintenance
[17:07:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:08:01] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1113:3315 (T312027)', diff saved to https://phabricator.wikimedia.org/P30797 and previous config saved to /var/cache/conftool/dbconfig/20220704-170800-ladsgroup.json
[17:08:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:09:11] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T312027)', diff saved to https://phabricator.wikimedia.org/P30798 and previous config saved to /var/cache/conftool/dbconfig/20220704-170910-ladsgroup.json
[17:09:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:24:16] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P30799 and previous config saved to /var/cache/conftool/dbconfig/20220704-172415-ladsgroup.json
[17:24:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:39:21] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P30800 and previous config saved to /var/cache/conftool/dbconfig/20220704-173920-ladsgroup.json
[17:39:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:54:26] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T312027)', diff saved to https://phabricator.wikimedia.org/P30801 and previous config saved to /var/cache/conftool/dbconfig/20220704-175425-ladsgroup.json
[17:54:27] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1:00:00 on db1100.eqiad.wmnet with reason: Maintenance
[17:54:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:54:31] <stashbot>	 T312027: Fix enwikivoyage drifts on pagelinks - https://phabricator.wikimedia.org/T312027
[17:54:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:54:41] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1100.eqiad.wmnet with reason: Maintenance
[17:54:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:54:46] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1100 (T312027)', diff saved to https://phabricator.wikimedia.org/P30802 and previous config saved to /var/cache/conftool/dbconfig/20220704-175446-ladsgroup.json
[17:54:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:56:56] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1100 (T312027)', diff saved to https://phabricator.wikimedia.org/P30803 and previous config saved to /var/cache/conftool/dbconfig/20220704-175655-ladsgroup.json
[17:56:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:06:51] <icinga-wm>	 PROBLEM - SSH on wtp1036.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[18:07:13] <wikibugs>	 (PS1) Raymond Ndibe: wmcs: changes to api service to manage toolforge replica.my.cnf [puppet] - https://gerrit.wikimedia.org/r/810965 (https://phabricator.wikimedia.org/T304040)
[18:09:36] <jinxer-wm>	 (CirrusSearchHighOldGCFrequency) firing: (2) Elasticsearch instance cloudelastic1005-cloudelastic-chi-eqiad is running the old gc excessively - https://wikitech.wikimedia.org/wiki/Search#Stuck_in_old_GC_hell - https://grafana.wikimedia.org/d/000000462/elasticsearch-memory - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchHighOldGCFrequency
[18:12:01] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1100', diff saved to https://phabricator.wikimedia.org/P30804 and previous config saved to /var/cache/conftool/dbconfig/20220704-181200-ladsgroup.json
[18:12:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:15:02] <wikibugs>	 (PS4) Raymond Ndibe: Modify maintain-dbusers.py to call the rest-api service [puppet] - https://gerrit.wikimedia.org/r/809921 (https://phabricator.wikimedia.org/T304040)
[18:20:21] <wikibugs>	 (CR) CI reject: [V: -1] Modify maintain-dbusers.py to call the rest-api service [puppet] - https://gerrit.wikimedia.org/r/809921 (https://phabricator.wikimedia.org/T304040) (owner: Raymond Ndibe)
[18:22:07] <wikibugs>	 (CR) Ori: "This change is ready for review." [software/varnish/libvmod-querysort] - https://gerrit.wikimedia.org/r/810552 (owner: Ori)
[18:27:06] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1100', diff saved to https://phabricator.wikimedia.org/P30805 and previous config saved to /var/cache/conftool/dbconfig/20220704-182706-ladsgroup.json
[18:27:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:42:11] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1100 (T312027)', diff saved to https://phabricator.wikimedia.org/P30806 and previous config saved to /var/cache/conftool/dbconfig/20220704-184211-ladsgroup.json
[18:42:13] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1:00:00 on db1096.eqiad.wmnet with reason: Maintenance
[18:42:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:42:16] <stashbot>	 T312027: Fix enwikivoyage drifts on pagelinks - https://phabricator.wikimedia.org/T312027
[18:42:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:42:26] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1096.eqiad.wmnet with reason: Maintenance
[18:42:31] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1096:3315 (T312027)', diff saved to https://phabricator.wikimedia.org/P30807 and previous config saved to /var/cache/conftool/dbconfig/20220704-184231-ladsgroup.json
[18:42:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:42:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:43:06] <logmsgbot>	 !log andrew@cumin1001 START - Cookbook sre.hosts.reboot-single for host cloudservices1003.wikimedia.org
[18:43:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:43:50] <logmsgbot>	 !log andrew@cumin1001 START - Cookbook sre.hosts.reboot-single for host cloudservices2004-dev.wikimedia.org
[18:43:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:44:41] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 (T312027)', diff saved to https://phabricator.wikimedia.org/P30808 and previous config saved to /var/cache/conftool/dbconfig/20220704-184440-ladsgroup.json
[18:44:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:51:59] <logmsgbot>	 !log andrew@cumin1001 START - Cookbook sre.hosts.reboot-single for host cloudcontrol1003.wikimedia.org
[18:52:01] <logmsgbot>	 !log andrew@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudservices1003.wikimedia.org
[18:52:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:52:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:52:17] <logmsgbot>	 !log andrew@cumin1001 START - Cookbook sre.hosts.reboot-single for host cloudservices1004.wikimedia.org
[18:52:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:53:12] <logmsgbot>	 !log andrew@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudservices2004-dev.wikimedia.org
[18:53:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:53:49] <logmsgbot>	 !log andrew@cumin1001 START - Cookbook sre.hosts.reboot-single for host cloudservices2005-dev.wikimedia.org
[18:53:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:59:31] <logmsgbot>	 !log andrew@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudservices1004.wikimedia.org
[18:59:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:59:46] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P30809 and previous config saved to /var/cache/conftool/dbconfig/20220704-185945-ladsgroup.json
[18:59:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:00:42] <wikibugs>	 (PS5) Raymond Ndibe: Modify maintain-dbusers.py to call the rest-api service [puppet] - https://gerrit.wikimedia.org/r/809921 (https://phabricator.wikimedia.org/T304040)
[19:01:33] <logmsgbot>	 !log andrew@cumin1001 START - Cookbook sre.hosts.reboot-single for host cloudcontrol2001-dev.wikimedia.org
[19:01:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:01:42] <logmsgbot>	 !log andrew@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudservices2005-dev.wikimedia.org
[19:01:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:07:23] <logmsgbot>	 !log andrew@cumin1001 END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudcontrol1003.wikimedia.org
[19:07:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:14:51] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P30810 and previous config saved to /var/cache/conftool/dbconfig/20220704-191450-ladsgroup.json
[19:14:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:15:02] <logmsgbot>	 !log andrew@cumin1001 END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudcontrol2001-dev.wikimedia.org
[19:15:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:15:25] <logmsgbot>	 !log andrew@cumin1001 START - Cookbook sre.hosts.reboot-single for host cloudcontrol1004.wikimedia.org
[19:15:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:17:09] <logmsgbot>	 !log andrew@cumin1001 START - Cookbook sre.hosts.reboot-single for host cloudcontrol2003-dev.wikimedia.org
[19:17:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:22:39] <wikibugs>	 (PS6) Raymond Ndibe: Modify maintain-dbusers.py to call the rest-api service [puppet] - https://gerrit.wikimedia.org/r/809921 (https://phabricator.wikimedia.org/T304040)
[19:26:51] <logmsgbot>	 !log andrew@cumin1001 START - Cookbook sre.hosts.reboot-single for host cloudcontrol1005.wikimedia.org
[19:26:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:26:57] <logmsgbot>	 !log andrew@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcontrol1004.wikimedia.org
[19:26:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:27:25] <logmsgbot>	 !log andrew@cumin1001 START - Cookbook sre.hosts.reboot-single for host cloudcontrol2004-dev.wikimedia.org
[19:27:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:28:15] <logmsgbot>	 !log andrew@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcontrol2003-dev.wikimedia.org
[19:28:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:29:56] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 (T312027)', diff saved to https://phabricator.wikimedia.org/P30811 and previous config saved to /var/cache/conftool/dbconfig/20220704-192955-ladsgroup.json
[19:29:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:30:00] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1:00:00 on db2123.codfw.wmnet with reason: Maintenance
[19:30:00] <stashbot>	 T312027: Fix enwikivoyage drifts on pagelinks - https://phabricator.wikimedia.org/T312027
[19:30:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:30:13] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2123.codfw.wmnet with reason: Maintenance
[19:30:15] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on 8 hosts with reason: Maintenance
[19:30:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:30:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:30:33] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 8 hosts with reason: Maintenance
[19:30:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:30:41] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1:00:00 on db1150.eqiad.wmnet with reason: Maintenance
[19:30:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:30:55] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1150.eqiad.wmnet with reason: Maintenance
[19:30:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:31:01] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
[19:31:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:31:15] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
[19:31:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:38:35] <logmsgbot>	 !log andrew@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcontrol1005.wikimedia.org
[19:38:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:40:47] <logmsgbot>	 !log andrew@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcontrol2004-dev.wikimedia.org
[19:40:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:43:23] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1044 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:43:24] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1023 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:43:24] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1042 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:43:43] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1022 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:43:43] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1029 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:43:45] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1032 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:43:47] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1027 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:44:11] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1040 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:44:13] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1019 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:44:17] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1026 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:44:23] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1041 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:44:24] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1038 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:44:31] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1017 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:44:31] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1021 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:44:32] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1028 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:44:33] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1020 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:44:43] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1043 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:44:43] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1035 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:44:44] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1030 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:44:53] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1024 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:44:54] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1025 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:45:01] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1036 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:45:01] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1045 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:45:17] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1033 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:45:23] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1047 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:45:33] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1046 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:45:34] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1037 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:45:34] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1031 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:45:35] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1034 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:45:41] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1039 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:47:33] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1045 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:47:55] <icinga-wm>	 PROBLEM - SSH on wtp1038.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[19:49:19] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1040 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:49:20] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1019 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:49:25] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1026 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:49:31] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1041 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:49:31] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1038 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:49:40] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1017 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:49:40] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1021 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:49:40] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1028 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:49:43] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1020 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:49:51] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1035 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:49:51] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1030 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:50:01] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1024 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:50:10] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1036 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:50:40] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1037 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:50:40] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1031 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:50:40] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1034 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:50:47] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1039 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:51:07] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1023 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:51:08] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1042 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:51:25] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1029 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:51:29] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1032 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:51:30] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1027 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:52:05] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1041 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:52:15] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1028 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:52:25] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1043 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:52:35] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1025 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:52:43] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1045 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:52:57] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1033 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:53:25] <logmsgbot>	 !log andrew@cumin1001 START - Cookbook sre.hosts.reboot-single for host cloudcontrol1004.wikimedia.org
[19:53:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:53:41] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1042 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:53:43] <icinga-wm>	 PROBLEM - nova-compute proc maximum on cloudvirt1028 is CRITICAL: PROCS CRITICAL: 0 processes with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:53:59] <Amir1>	 andrewbogott: I don't know if you're aware of this flood^
[19:54:00] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1022 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:54:12] <andrewbogott>	 I am, I'm working on it
[19:54:23] <Amir1>	 ok, I'll get out of the way
[19:54:40] <icinga-wm>	 PROBLEM - nova-compute proc maximum on cloudvirt1044 is CRITICAL: PROCS CRITICAL: 0 processes with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:54:44] <icinga-wm>	 ACKNOWLEDGEMENT - nova-compute proc maximum on cloudvirt1023 is CRITICAL: PROCS CRITICAL: 0 processes with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute andrew bogott rabbitmq is losing its mind after some routine reboots https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:54:44] <icinga-wm>	 ACKNOWLEDGEMENT - nova-compute proc maximum on cloudvirt1028 is CRITICAL: PROCS CRITICAL: 0 processes with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute andrew bogott rabbitmq is losing its mind after some routine reboots https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:54:45] <icinga-wm>	 ACKNOWLEDGEMENT - nova-compute proc minimum on cloudvirt1028 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute andrew bogott rabbitmq is losing its mind after some routine reboots https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:54:46] <icinga-wm>	 ACKNOWLEDGEMENT - nova-compute proc maximum on cloudvirt1041 is CRITICAL: PROCS CRITICAL: 0 processes with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute andrew bogott rabbitmq is losing its mind after some routine reboots https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:54:47] <icinga-wm>	 ACKNOWLEDGEMENT - nova-compute proc minimum on cloudvirt1041 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute andrew bogott rabbitmq is losing its mind after some routine reboots https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:54:48] <icinga-wm>	 ACKNOWLEDGEMENT - nova-compute proc minimum on cloudvirt1042 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute andrew bogott rabbitmq is losing its mind after some routine reboots https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:54:49] <icinga-wm>	 ACKNOWLEDGEMENT - nova-compute proc maximum on cloudvirt1044 is CRITICAL: PROCS CRITICAL: 0 processes with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute andrew bogott rabbitmq is losing its mind after some routine reboots https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:54:50] <icinga-wm>	 ACKNOWLEDGEMENT - nova-compute proc minimum on cloudvirt1044 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute andrew bogott rabbitmq is losing its mind after some routine reboots https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:54:51] <icinga-wm>	 ACKNOWLEDGEMENT - nova-compute proc minimum on cloudvirt1045 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute andrew bogott rabbitmq is losing its mind after some routine reboots https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:54:52] <icinga-wm>	 ACKNOWLEDGEMENT - nova-compute proc maximum on cloudvirt1046 is CRITICAL: PROCS CRITICAL: 0 processes with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute andrew bogott rabbitmq is losing its mind after some routine reboots https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:54:53] <icinga-wm>	 ACKNOWLEDGEMENT - nova-compute proc minimum on cloudvirt1046 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute andrew bogott rabbitmq is losing its mind after some routine reboots https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:54:54] <icinga-wm>	 ACKNOWLEDGEMENT - nova-compute proc maximum on cloudvirt1047 is CRITICAL: PROCS CRITICAL: 0 processes with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute andrew bogott rabbitmq is losing its mind after some routine reboots https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:54:55] <icinga-wm>	 ACKNOWLEDGEMENT - nova-compute proc minimum on cloudvirt1047 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute andrew bogott rabbitmq is losing its mind after some routine reboots https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:55:01] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1043 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:55:01] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1035 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:55:02] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1030 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:55:13] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1024 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:55:14] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1025 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:55:14] <icinga-wm>	 PROBLEM - puppet last run on ms-be2031 is CRITICAL: CRITICAL: Puppet last ran 6 hours ago https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[19:55:20] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1036 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:55:35] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1033 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:55:51] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1037 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:55:52] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1034 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:55:52] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1031 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:55:59] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1039 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:56:20] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1023 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:56:20] <icinga-wm>	 RECOVERY - nova-compute proc maximum on cloudvirt1028 is OK: PROCS OK: 1 process with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:56:37] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1022 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:56:38] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1029 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:56:41] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1032 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:56:42] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1027 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:57:07] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1040 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:57:11] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1026 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:57:17] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1038 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:57:25] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1021 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:57:27] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1028 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:59:37] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1019 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:59:55] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1017 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:59:57] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1028 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:59:58] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1020 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:03:30] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1039 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:03:51] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1042 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:05:05] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1035 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:05:39] <icinga-wm>	 PROBLEM - nova-compute proc maximum on cloudvirt1030 is CRITICAL: PROCS CRITICAL: 0 processes with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:05:55] <icinga-wm>	 PROBLEM - nova-compute proc maximum on cloudvirt1043 is CRITICAL: PROCS CRITICAL: 0 processes with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:05:55] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1031 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:06:01] <icinga-wm>	 PROBLEM - nova-compute proc maximum on cloudvirt1036 is CRITICAL: PROCS CRITICAL: 0 processes with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:06:03] <icinga-wm>	 PROBLEM - nova-compute proc maximum on cloudvirt1025 is CRITICAL: PROCS CRITICAL: 0 processes with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:06:20] <icinga-wm>	 PROBLEM - nova-compute proc maximum on cloudvirt1040 is CRITICAL: PROCS CRITICAL: 0 processes with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:06:21] <icinga-wm>	 PROBLEM - nova-compute proc maximum on cloudvirt1021 is CRITICAL: PROCS CRITICAL: 0 processes with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:06:40] <icinga-wm>	 PROBLEM - nova-compute proc maximum on cloudvirt1020 is CRITICAL: PROCS CRITICAL: 0 processes with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:06:41] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1029 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:06:45] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1032 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:06:46] <icinga-wm>	 PROBLEM - nova-compute proc maximum on cloudvirt1027 is CRITICAL: PROCS CRITICAL: 0 processes with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:07:07] <icinga-wm>	 PROBLEM - nova-compute proc maximum on cloudvirt1022 is CRITICAL: PROCS CRITICAL: 0 processes with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:07:09] <icinga-wm>	 PROBLEM - nova-compute proc maximum on cloudvirt1026 is CRITICAL: PROCS CRITICAL: 0 processes with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:07:11] <icinga-wm>	 PROBLEM - nova-compute proc maximum on cloudvirt1037 is CRITICAL: PROCS CRITICAL: 0 processes with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:07:11] <icinga-wm>	 PROBLEM - nova-compute proc maximum on cloudvirt1045 is CRITICAL: PROCS CRITICAL: 0 processes with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:07:15] <icinga-wm>	 PROBLEM - nova-compute proc maximum on cloudvirt1038 is CRITICAL: PROCS CRITICAL: 0 processes with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:07:23] <icinga-wm>	 PROBLEM - nova-compute proc maximum on cloudvirt1033 is CRITICAL: PROCS CRITICAL: 0 processes with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:07:25] <icinga-wm>	 PROBLEM - nova-compute proc maximum on cloudvirt1024 is CRITICAL: PROCS CRITICAL: 0 processes with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:07:30] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1017 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:07:30] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1021 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:07:33] <icinga-wm>	 PROBLEM - nova-compute proc maximum on cloudvirt1034 is CRITICAL: PROCS CRITICAL: 0 processes with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:07:43] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1030 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:08:15] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1033 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:08:15] <icinga-wm>	 RECOVERY - nova-compute proc maximum on cloudvirt1030 is OK: PROCS OK: 1 process with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:08:23] <icinga-wm>	 PROBLEM - Widespread puppet agent failures on alert1001 is CRITICAL: 0.01115 ge 0.01 https://puppetboard.wikimedia.org/nodes?status=failed https://grafana.wikimedia.org/d/yOxVDGvWk/puppet
[20:08:55] <icinga-wm>	 RECOVERY - nova-compute proc maximum on cloudvirt1040 is OK: PROCS OK: 1 process with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:08:57] <icinga-wm>	 RECOVERY - nova-compute proc maximum on cloudvirt1021 is OK: PROCS OK: 1 process with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:09:00] <icinga-wm>	 PROBLEM - nova-compute proc maximum on cloudvirt1028 is CRITICAL: PROCS CRITICAL: 0 processes with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:09:19] <logmsgbot>	 !log andrew@cumin1001 END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudcontrol1004.wikimedia.org
[20:09:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:09:27] <wikibugs>	 (PS7) Raymond Ndibe: Modify maintain-dbusers.py to call the rest-api service [puppet] - https://gerrit.wikimedia.org/r/809921 (https://phabricator.wikimedia.org/T304040)
[20:09:41] <icinga-wm>	 PROBLEM - nova-compute proc maximum on cloudvirt1019 is CRITICAL: PROCS CRITICAL: 0 processes with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:09:45] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1040 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:09:57] <icinga-wm>	 RECOVERY - nova-compute proc maximum on cloudvirt1033 is OK: PROCS OK: 1 process with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:10:00] <icinga-wm>	 RECOVERY - nova-compute proc maximum on cloudvirt1024 is OK: PROCS OK: 1 process with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:10:25] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1024 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:11:03] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1046 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:13:00] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1036 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:13:31] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1046 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:13:35] <icinga-wm>	 PROBLEM - k8s API server requests latencies on kubestagemaster1001 is CRITICAL: instance=10.64.16.203 verb={CREATE,UPDATE} https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/d/000000435/kubernetes-api?orgId=1&viewPanel=27
[20:13:40] <icinga-wm>	 RECOVERY - nova-compute proc maximum on cloudvirt1036 is OK: PROCS OK: 1 process with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:14:17] <icinga-wm>	 RECOVERY - nova-compute proc maximum on cloudvirt1020 is OK: PROCS OK: 1 process with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:14:20] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1032 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:15:05] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1020 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:16:30] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1023 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:16:47] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1022 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:16:50] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1032 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:17:13] <icinga-wm>	 RECOVERY - nova-compute proc maximum on cloudvirt1022 is OK: PROCS OK: 1 process with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:17:17] <icinga-wm>	 RECOVERY - nova-compute proc maximum on cloudvirt1045 is OK: PROCS OK: 1 process with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:17:35] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1021 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:17:35] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1017 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:17:45] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1035 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:17:47] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1030 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:17:57] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1024 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:18:03] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1045 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:18:04] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1036 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:18:21] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1033 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:18:37] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1031 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:18:45] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1039 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:19:05] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1042 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:19:23] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1029 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:19:24] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1032 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:19:51] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1040 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:19:55] <icinga-wm>	 RECOVERY - nova-compute proc maximum on cloudvirt1038 is OK: PROCS OK: 1 process with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:20:03] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1041 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:20:03] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1038 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:20:13] <icinga-wm>	 RECOVERY - nova-compute proc maximum on cloudvirt1034 is OK: PROCS OK: 1 process with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:20:21] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1035 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:20:23] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1030 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:20:40] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1036 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:20:57] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1033 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:21:13] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1037 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:21:14] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1034 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:21:14] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1031 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:21:20] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1039 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:21:40] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1023 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:22:00] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1032 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:22:23] <icinga-wm>	 RECOVERY - nova-compute proc maximum on cloudvirt1019 is OK: PROCS OK: 1 process with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:22:27] <icinga-wm>	 RECOVERY - nova-compute proc maximum on cloudvirt1037 is OK: PROCS OK: 1 process with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:22:27] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1019 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:22:57] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1043 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:23:07] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1025 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:23:47] <icinga-wm>	 RECOVERY - nova-compute proc maximum on cloudvirt1043 is OK: PROCS OK: 1 process with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:23:55] <icinga-wm>	 RECOVERY - nova-compute proc maximum on cloudvirt1025 is OK: PROCS OK: 1 process with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:24:35] <icinga-wm>	 RECOVERY - nova-compute proc maximum on cloudvirt1027 is OK: PROCS OK: 1 process with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:24:35] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1027 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:24:57] <icinga-wm>	 RECOVERY - nova-compute proc maximum on cloudvirt1026 is OK: PROCS OK: 1 process with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:25:05] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1026 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:25:20] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1028 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:26:06] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1047 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:26:45] <icinga-wm>	 RECOVERY - nova-compute proc maximum on cloudvirt1028 is OK: PROCS OK: 1 process with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:26:57] <icinga-wm>	 PROBLEM - puppet last run on ms-be2033 is CRITICAL: CRITICAL: Puppet last ran 6 hours ago https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[20:27:45] <icinga-wm>	 PROBLEM - nova-compute proc maximum on cloudvirt1029 is CRITICAL: PROCS CRITICAL: 0 processes with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:27:46] <icinga-wm>	 PROBLEM - nova-compute proc maximum on cloudvirt1024 is CRITICAL: PROCS CRITICAL: 0 processes with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:27:57] <icinga-wm>	 PROBLEM - nova-compute proc maximum on cloudvirt1017 is CRITICAL: PROCS CRITICAL: 0 processes with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:28:15] <icinga-wm>	 PROBLEM - nova-compute proc maximum on cloudvirt1042 is CRITICAL: PROCS CRITICAL: 0 processes with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:29:11] <icinga-wm>	 PROBLEM - nova-compute proc maximum on cloudvirt1040 is CRITICAL: PROCS CRITICAL: 0 processes with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:29:13] <icinga-wm>	 PROBLEM - nova-compute proc maximum on cloudvirt1021 is CRITICAL: PROCS CRITICAL: 0 processes with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:31:17] <icinga-wm>	 PROBLEM - nova-compute proc maximum on cloudvirt1023 is CRITICAL: PROCS CRITICAL: 0 processes with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:31:41] <icinga-wm>	 RECOVERY - nova-compute proc maximum on cloudvirt1040 is OK: PROCS OK: 1 process with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:31:43] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1044 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:31:44] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1042 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:32:31] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1040 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:32:43] <icinga-wm>	 RECOVERY - nova-compute proc maximum on cloudvirt1044 is OK: PROCS OK: 1 process with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:33:10] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1024 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:33:15] <icinga-wm>	 RECOVERY - nova-compute proc maximum on cloudvirt1042 is OK: PROCS OK: 1 process with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:33:47] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1046 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:33:49] <icinga-wm>	 RECOVERY - nova-compute proc maximum on cloudvirt1023 is OK: PROCS OK: 1 process with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:34:15] <icinga-wm>	 RECOVERY - nova-compute proc maximum on cloudvirt1021 is OK: PROCS OK: 1 process with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:34:16] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1023 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:34:35] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1029 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:35:17] <icinga-wm>	 RECOVERY - nova-compute proc maximum on cloudvirt1029 is OK: PROCS OK: 1 process with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:35:18] <icinga-wm>	 RECOVERY - nova-compute proc maximum on cloudvirt1024 is OK: PROCS OK: 1 process with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:35:23] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1021 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:35:23] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1017 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:35:31] <icinga-wm>	 RECOVERY - nova-compute proc maximum on cloudvirt1017 is OK: PROCS OK: 1 process with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:36:13] <icinga-wm>	 RECOVERY - Widespread puppet agent failures on alert1001 is OK: (C)0.01 ge (W)0.006 ge 0.003393 https://puppetboard.wikimedia.org/nodes?status=failed https://grafana.wikimedia.org/d/yOxVDGvWk/puppet
[20:44:31] <wikibugs>	 (PS1) Zabe: utils: chmod +x setup_rake.sh and vcl_ec2_nets.py [puppet] - https://gerrit.wikimedia.org/r/810973
[20:49:23] <icinga-wm>	 RECOVERY - SSH on wtp1038.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[20:50:15] <wikibugs>	 SRE-tools, Infrastructure-Foundations, Wikimedia-Mailing-lists, netops, serviceops: Support services VIPs with not marked as VIP in Netbox - https://phabricator.wikimedia.org/T295793 (Legoktm) Any IP (mis)configuration most likely predates Amir's and my involvement with mailman, we never touc...
[21:09:17] <jinxer-wm>	 (CirrusSearchHighOldGCFrequency) firing: (2) Elasticsearch instance cloudelastic1005-cloudelastic-chi-eqiad is running the old gc excessively - https://wikitech.wikimedia.org/wiki/Search#Stuck_in_old_GC_hell - https://grafana.wikimedia.org/d/000000462/elasticsearch-memory - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchHighOldGCFrequency
[21:14:17] <jinxer-wm>	 (CirrusSearchHighOldGCFrequency) firing: (2) Elasticsearch instance cloudelastic1005-cloudelastic-chi-eqiad is running the old gc excessively - https://wikitech.wikimedia.org/wiki/Search#Stuck_in_old_GC_hell - https://grafana.wikimedia.org/d/000000462/elasticsearch-memory - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchHighOldGCFrequency
[22:07:35] <icinga-wm>	 PROBLEM - puppet last run on ms-be2036 is CRITICAL: CRITICAL: Puppet last ran 6 hours ago https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[22:12:47] <icinga-wm>	 RECOVERY - SSH on wtp1036.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[22:27:11] <icinga-wm>	 PROBLEM - SSH on wtp1044.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[22:56:06] <wikibugs>	 (PS3) Krinkle: Update call to deprecated IContextSource::getStats [mediawiki-config] - https://gerrit.wikimedia.org/r/810890 (owner: Matěj Suchánek)
[22:56:29] <wikibugs>	 (PS4) Krinkle: static.php: Update call to deprecated IContextSource::getStats [mediawiki-config] - https://gerrit.wikimedia.org/r/810890 (owner: Matěj Suchánek)
[22:56:33] <wikibugs>	 (PS5) Krinkle: static.php: Update call to deprecated IContextSource::getStats [mediawiki-config] - https://gerrit.wikimedia.org/r/810890 (owner: Matěj Suchánek)
[23:28:35] <icinga-wm>	 RECOVERY - SSH on wtp1044.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook