[00:00:05] <icinga-wm>	 RECOVERY - Router interfaces on cr2-eqord is OK: OK: host 208.80.154.198, interfaces up: 45, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[00:05:39] <icinga-wm>	 PROBLEM - Check systemd state on ms-fe2005 is CRITICAL: CRITICAL - degraded: The following units failed: prometheus_puppet_agent_stats.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:06:39] <icinga-wm>	 PROBLEM - Check systemd state on ganeti2009 is CRITICAL: CRITICAL - degraded: The following units failed: prometheus_puppet_agent_stats.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:32:27] <icinga-wm>	 RECOVERY - Check systemd state on ms-fe2005 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:33:27] <icinga-wm>	 RECOVERY - Check systemd state on ganeti2009 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[01:12:20] <wikibugs>	 SRE, CommRel-Specialists-Support (Jul-Sep-2021), Datacenter-Switchover: CommRel support for September 2021 Switchover - https://phabricator.wikimedia.org/T287546 (sgrabarczuk)
[03:12:51] <icinga-wm>	 PROBLEM - Check systemd state on doh5002 is CRITICAL: CRITICAL - degraded: The following units failed: prometheus_puppet_agent_stats.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[03:39:49] <icinga-wm>	 RECOVERY - Check systemd state on doh5002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[04:56:33] <icinga-wm>	 PROBLEM - SSH on analytics1069.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[05:08:21] <icinga-wm>	 PROBLEM - Router interfaces on cr1-codfw is CRITICAL: CRITICAL: host 208.80.153.192, interfaces up: 131, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[05:09:55] <icinga-wm>	 PROBLEM - Router interfaces on cr1-eqiad is CRITICAL: CRITICAL: host 208.80.154.196, interfaces up: 235, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[05:35:55] <icinga-wm>	 PROBLEM - Check systemd state on cumin2001 is CRITICAL: CRITICAL - degraded: The following units failed: database-backups-snapshots.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[05:39:43] <wikibugs>	 (CR) VolkerE: [C: -1] "Some notes inside." [mediawiki-config] - https://gerrit.wikimedia.org/r/704376 (https://phabricator.wikimedia.org/T284877) (owner: Juan90264)
[05:57:21] <icinga-wm>	 RECOVERY - SSH on analytics1069.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[06:01:20] <wikibugs>	 (PS7) Juan90264: Adding and use wordmark in azwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/704376 (https://phabricator.wikimedia.org/T284877)
[06:02:48] <wikibugs>	 (CR) Juan90264: Adding and use wordmark in azwiki (3 comments) [mediawiki-config] - https://gerrit.wikimedia.org/r/704376 (https://phabricator.wikimedia.org/T284877) (owner: Juan90264)
[06:15:52] <wikibugs>	 (CR) Juan90264: "I hope you review this change. I confess I'm already getting tired of these changes, I have several other changes that many reviewers simp" [mediawiki-config] - https://gerrit.wikimedia.org/r/704376 (https://phabricator.wikimedia.org/T284877) (owner: Juan90264)
[07:23:03] <icinga-wm>	 RECOVERY - Router interfaces on cr1-codfw is OK: OK: host 208.80.153.192, interfaces up: 132, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[07:24:37] <icinga-wm>	 RECOVERY - Router interfaces on cr1-eqiad is OK: OK: host 208.80.154.196, interfaces up: 236, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[10:30:25] <icinga-wm>	 PROBLEM - Check systemd state on ping3001 is CRITICAL: CRITICAL - degraded: The following units failed: prometheus_puppet_agent_stats.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[10:33:17] <wikibugs>	 SRE, Traffic: Let's Encrypt issuance chains update - https://phabricator.wikimedia.org/T283164 (Legoktm)
[10:57:21] <icinga-wm>	 RECOVERY - Check systemd state on ping3001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[11:43:15] <icinga-wm>	 PROBLEM - SSH on bast5002 is CRITICAL: Server answer: https://wikitech.wikimedia.org/wiki/SSH/monitoring
[11:45:09] <icinga-wm>	 RECOVERY - SSH on bast5002 is OK: SSH OK - OpenSSH_7.9p1 Debian-10+deb10u2 (protocol 2.0) https://wikitech.wikimedia.org/wiki/SSH/monitoring
[15:07:27] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on appserver in codfw on alert1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=codfw+prometheus/ops&var-cluster=appserver&var-method=GET
[15:09:23] <icinga-wm>	 RECOVERY - High average GET latency for mw requests on appserver in codfw on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=codfw+prometheus/ops&var-cluster=appserver&var-method=GET
[16:09:17] <wikibugs>	 SRE: Remove libvips-tools from mediawiki appservers - https://phabricator.wikimedia.org/T290802 (Reedy)
[16:09:28] <wikibugs>	 SRE: Remove libvips-tools from mediawiki appservers - https://phabricator.wikimedia.org/T290802 (Reedy)
[16:09:54] <wikibugs>	 SRE: Remove libvips-tools from mediawiki appservers - https://phabricator.wikimedia.org/T290802 (Reedy) Open→Stalled Marking stalled until usages inside MW are removed.
[17:10:47] <icinga-wm>	 PROBLEM - MegaRAID on an-worker1096 is CRITICAL: CRITICAL: 1 failed LD(s) (Offline) https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[17:10:58] <icinga-wm>	 ACKNOWLEDGEMENT - MegaRAID on an-worker1096 is CRITICAL: CRITICAL: 1 failed LD(s) (Offline) nagiosadmin RAID handler auto-ack: https://phabricator.wikimedia.org/T290805 https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[17:11:03] <wikibugs>	 SRE, ops-eqiad: Degraded RAID on an-worker1096 - https://phabricator.wikimedia.org/T290805 (ops-monitoring-bot)
[17:17:14] <icinga-wm>	 PROBLEM - Disk space on maps2006 is CRITICAL: DISK CRITICAL - free space: / 2500 MB (3% inode=98%): /tmp 2500 MB (3% inode=98%): /var/tmp 2500 MB (3% inode=98%): https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=maps2006&var-datasource=codfw+prometheus/ops
[18:28:34] <wikibugs>	 (PS1) Urbanecm: Revert "test: Add electcomm and electionadmin groups" [mediawiki-config] - https://gerrit.wikimedia.org/r/720265 (https://phabricator.wikimedia.org/T290808)
[18:28:39] <wikibugs>	 (PS2) Urbanecm: Revert "test: Add electcomm and electionadmin groups" [mediawiki-config] - https://gerrit.wikimedia.org/r/720265 (https://phabricator.wikimedia.org/T290808)
[18:28:45] <wikibugs>	 (CR) Urbanecm: [C: +2] "emergency" [mediawiki-config] - https://gerrit.wikimedia.org/r/720265 (https://phabricator.wikimedia.org/T290808) (owner: Urbanecm)
[18:30:03] <wikibugs>	 (Merged) jenkins-bot: Revert "test: Add electcomm and electionadmin groups" [mediawiki-config] - https://gerrit.wikimedia.org/r/720265 (https://phabricator.wikimedia.org/T290808) (owner: Urbanecm)
[18:31:43] <logmsgbot>	 !log urbanecm@deploy1002 Synchronized wmf-config/InitialiseSettings.php: 908bbf35235ea4129795dfbf4c0e646440152e18: Revert "test: Add electcomm and electionadmin groups" (T290808) (duration: 00m 58s)
[18:31:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:34:59] <urbanecm>	 !log [urbanecm@mwmaint2002 ~]$ mwscript emptyUserGroup.php --wiki=testwiki {electionadmin,electcomm} # T290808
[18:35:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:50:32] <icinga-wm>	 PROBLEM - Juniper alarms on mr1-eqsin is CRITICAL: JNX_ALARMS CRITICAL - No response from remote host 103.102.166.128 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Juniper_alarm
[18:50:34] <icinga-wm>	 PROBLEM - Router interfaces on mr1-eqsin is CRITICAL: CRITICAL: No response from remote host 103.102.166.128 for 1.3.6.1.2.1.2.2.1.8 with snmp version 2 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[18:52:14] <icinga-wm>	 RECOVERY - Juniper alarms on mr1-eqsin is OK: JNX_ALARMS OK - 0 red alarms, 0 yellow alarms https://wikitech.wikimedia.org/wiki/Network_monitoring%23Juniper_alarm
[18:52:18] <icinga-wm>	 RECOVERY - Router interfaces on mr1-eqsin is OK: OK: host 103.102.166.128, interfaces up: 32, down: 0, dormant: 0, excluded: 1, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[18:58:08] <wikibugs>	 (PS1) Urbanecm: testwiki: Fully remove securepoll-related groups [mediawiki-config] - https://gerrit.wikimedia.org/r/720454 (https://phabricator.wikimedia.org/T290808)
[18:58:22] <wikibugs>	 (CR) Urbanecm: [C: +2] "emergency" [mediawiki-config] - https://gerrit.wikimedia.org/r/720454 (https://phabricator.wikimedia.org/T290808) (owner: Urbanecm)
[18:59:16] <wikibugs>	 (Merged) jenkins-bot: testwiki: Fully remove securepoll-related groups [mediawiki-config] - https://gerrit.wikimedia.org/r/720454 (https://phabricator.wikimedia.org/T290808) (owner: Urbanecm)
[19:02:01] <logmsgbot>	 !log urbanecm@deploy1002 Synchronized wmf-config/InitialiseSettings.php: 27814b8eaacb5ba2fee1b6167a36ea14356a1ecf: testwiki: Fully remove securepoll-related groups (T290808) (duration: 00m 57s)
[19:02:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:53:49] <wikibugs>	 (PS1) Urbanecm: Add throttle rule for Czech wiki course [mediawiki-config] - https://gerrit.wikimedia.org/r/720458 (https://phabricator.wikimedia.org/T290809)
[22:39:34] <icinga-wm>	 PROBLEM - Check systemd state on ganeti3003 is CRITICAL: CRITICAL - degraded: The following units failed: prometheus_puppet_agent_stats.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[23:05:04] <wikibugs>	 SRE, ops-eqiad, Analytics: Degraded RAID on an-worker1096 - https://phabricator.wikimedia.org/T290805 (Peachey88)
[23:05:34] <icinga-wm>	 RECOVERY - Check systemd state on ganeti3003 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state