[00:33:10] PROBLEM - SSH on cp5005.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[00:42:10] SRE, Analytics-Radar, Traffic, Wikimedia-General-or-Unknown, Performance-Team (Radar): Requests for /static get an invalid WMF-Last-Access cookie for wikipedia.org on non-Wikipedia requests - https://phabricator.wikimedia.org/T261803 (Krinkle)
[01:33:56] RECOVERY - SSH on cp5005.mgmt is OK: SSH OK - OpenSSH_7.4 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[02:28:17] (PS3) Labdajiwa: Change category name of Babel extension on Javanese Wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/702961 (https://phabricator.wikimedia.org/T286165)
[02:48:08] RECOVERY - SSH on mw1284.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[02:57:46] PROBLEM - SSH on logstash2021.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[03:12:54] PROBLEM - BGP status on cr2-esams is CRITICAL: BGP CRITICAL - AS6939/IPv4: Connect - HE, AS6939/IPv6: Connect - HE https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[03:30:14] RECOVERY - BGP status on cr2-esams is OK: BGP OK - up: 423, down: 0, shutdown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[03:58:34] RECOVERY - SSH on logstash2021.mgmt is OK: SSH OK - OpenSSH_6.6 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[05:37:32] PROBLEM - SSH on wdqs2002.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[05:41:46] (Processor usage over 85%) firing: Processor usage over 85% - https://alerts.wikimedia.org
[05:48:24] SRE, MediaWiki-extensions-Score, Security-Team, Wikimedia-General-or-Unknown, and 4 others: Extension:Score / Lilypond is disabled on all wikis - https://phabricator.wikimedia.org/T257066 (Legoktm) My tentative plan is to re-enable Score on test.wikipedia.org on Monday and leave it available for...
[05:51:46] (Processor usage over 85%) resolved: Processor usage over 85% - https://alerts.wikimedia.org
[07:39:14] RECOVERY - SSH on wdqs2002.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[07:47:35] SRE, Wikimedia-Mailing-lists, User-Ladsgroup: Disable moderation mail notifications for messages sent to archived lists - https://phabricator.wikimedia.org/T286371 (Aklapper) Thanks! :)
[10:01:18] RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[11:59:12] (PS1) R4356thwiki: Disable indexing in NS_USER and NS_USER_TALK on bnwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/703884 (https://phabricator.wikimedia.org/T286152)
[13:10:46] (CR) Zabe: "recheck" [mediawiki-config] - https://gerrit.wikimedia.org/r/703884 (https://phabricator.wikimedia.org/T286152) (owner: R4356thwiki)
[13:16:56] (CR) Zabe: [C: +1] "LGTM" [mediawiki-config] - https://gerrit.wikimedia.org/r/703884 (https://phabricator.wikimedia.org/T286152) (owner: R4356thwiki)
[14:27:38] PROBLEM - ps1-603-eqsin-infeed-load-tower-B-single-phase on ps1-603-eqsin is CRITICAL: SNMP CRITICAL - ps1-603-eqsin-infeed-load-tower-B-single-phase *-1* https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[14:27:40] PROBLEM - ps1-604-eqsin-infeed-load-tower-B-single-phase on ps1-604-eqsin is CRITICAL: SNMP CRITICAL - ps1-604-eqsin-infeed-load-tower-B-single-phase *-1* https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[14:30:08] PROBLEM - Host ripe-atlas-eqsin IPv6 is DOWN: CRITICAL - Destination Unreachable (2001:df2:e500:201:103:102:166:20)
[14:30:16] PROBLEM - Host ripe-atlas-eqsin is DOWN: PING CRITICAL - Packet loss = 100%
[14:42:47] (Juniper alarm active) firing: Juniper alarm active - https://alerts.wikimedia.org
[14:45:14] PROBLEM - IPMI Sensor Status on ganeti5003 is CRITICAL: Sensor Type(s) Temperature, Power_Supply Status: Critical [Status = Critical, PS Redundancy = Critical] https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Power_Supply_Failures
[14:47:04] PROBLEM - IPMI Sensor Status on cp5009 is CRITICAL: Sensor Type(s) Temperature, Power_Supply Status: Critical [PS Redundancy = Critical, Status = Critical] https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Power_Supply_Failures
[14:49:28] PROBLEM - IPMI Sensor Status on cp5001 is CRITICAL: Sensor Type(s) Temperature, Power_Supply Status: Critical [PS Redundancy = Critical, Status = Critical] https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Power_Supply_Failures
[14:52:58] PROBLEM - IPMI Sensor Status on cp5003 is CRITICAL: Sensor Type(s) Temperature, Power_Supply Status: Critical [PS Redundancy = Critical, Status = Critical] https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Power_Supply_Failures
[14:52:58] PROBLEM - IPMI Sensor Status on ganeti5001 is CRITICAL: Sensor Type(s) Temperature, Power_Supply Status: Critical [Status = Critical, PS Redundancy = Critical] https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Power_Supply_Failures
[14:54:02] PROBLEM - IPMI Sensor Status on cp5004 is CRITICAL: Sensor Type(s) Temperature, Power_Supply Status: Critical [PS Redundancy = Critical, Status = Critical] https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Power_Supply_Failures
[14:55:20] PROBLEM - IPMI Sensor Status on cp5010 is CRITICAL: Sensor Type(s) Temperature, Power_Supply Status: Critical [PS Redundancy = Critical, Status = Critical] https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Power_Supply_Failures
[14:55:55] PROBLEM - IPMI Sensor Status on cp5013 is CRITICAL: Sensor Type(s) Temperature, Power_Supply Status: Critical [Status = Critical, PS Redundancy = Critical] https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Power_Supply_Failures
[14:57:44] PROBLEM - IPMI Sensor Status on cp5006 is CRITICAL: Sensor Type(s) Temperature, Power_Supply Status: Critical [Status = Critical] https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Power_Supply_Failures
[14:58:04] PROBLEM - IPMI Sensor Status on ganeti5002 is CRITICAL: Sensor Type(s) Temperature, Power_Supply Status: Critical [Status = Critical] https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Power_Supply_Failures
[15:00:42] PROBLEM - IPMI Sensor Status on dns5002 is CRITICAL: Sensor Type(s) Temperature, Power_Supply Status: Critical [PS Redundancy = Critical, Status = Critical] https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Power_Supply_Failures
[15:02:08] PROBLEM - IPMI Sensor Status on cp5011 is CRITICAL: Sensor Type(s) Temperature, Power_Supply Status: Critical [PS Redundancy = Critical, Status = Critical] https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Power_Supply_Failures
[15:04:02] PROBLEM - IPMI Sensor Status on cp5016 is CRITICAL: Sensor Type(s) Temperature, Power_Supply Status: Critical [Status = Critical, PS Redundancy = Critical] https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Power_Supply_Failures
[15:04:34] PROBLEM - IPMI Sensor Status on cp5005 is CRITICAL: Sensor Type(s) Temperature, Power_Supply Status: Critical [Status = Critical] https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Power_Supply_Failures
[15:05:46] PROBLEM - IPMI Sensor Status on cp5007 is CRITICAL: Sensor Type(s) Temperature, Power_Supply Status: Critical [PS Redundancy = Critical, Status = Critical] https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Power_Supply_Failures
[15:08:06] PROBLEM - IPMI Sensor Status on lvs5002 is CRITICAL: Sensor Type(s) Temperature, Power_Supply Status: Critical [PS Redundancy = Critical, Status = Critical] https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Power_Supply_Failures
[15:09:10] PROBLEM - IPMI Sensor Status on cp5014 is CRITICAL: Sensor Type(s) Temperature, Power_Supply Status: Critical [Status = Critical, PS Redundancy = Critical] https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Power_Supply_Failures
[15:09:38] PROBLEM - IPMI Sensor Status on lvs5003 is CRITICAL: Sensor Type(s) Temperature, Power_Supply Status: Critical [PS Redundancy = Critical, Status = Critical] https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Power_Supply_Failures
[15:12:34] PROBLEM - IPMI Sensor Status on cp5008 is CRITICAL: Sensor Type(s) Temperature, Power_Supply Status: Critical [PS Redundancy = Critical, Status = Critical] https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Power_Supply_Failures
[15:13:28] PROBLEM - IPMI Sensor Status on cp5015 is CRITICAL: Sensor Type(s) Temperature, Power_Supply Status: Critical [Status = Critical, PS Redundancy = Critical] https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Power_Supply_Failures
[15:14:54] PROBLEM - IPMI Sensor Status on dns5001 is CRITICAL: Sensor Type(s) Temperature, Power_Supply Status: Critical [PS Redundancy = Critical, Status = Critical] https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Power_Supply_Failures
[15:15:04] PROBLEM - IPMI Sensor Status on cp5012 is CRITICAL: Sensor Type(s) Temperature, Power_Supply Status: Critical [PS Redundancy = Critical, Status = Critical] https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Power_Supply_Failures
[15:15:04] PROBLEM - IPMI Sensor Status on lvs5001 is CRITICAL: Sensor Type(s) Temperature, Power_Supply Status: Critical [PS Redundancy = Critical, Status = Critical] https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Power_Supply_Failures
[15:15:16] PROBLEM - IPMI Sensor Status on cp5002 is CRITICAL: Sensor Type(s) Temperature, Power_Supply Status: Critical [PS Redundancy = Critical, Status = Critical] https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Power_Supply_Failures
[16:09:10] PROBLEM - SSH on logstash2021.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[17:11:42] (CR) Aftab: "There is a comma missing at the end of NS_USER_TALK => 'noindex,follow' . All of the lines here (https://github.com/wikimedia/operations-m" [mediawiki-config] - https://gerrit.wikimedia.org/r/703884 (https://phabricator.wikimedia.org/T286152) (owner: R4356thwiki)
[17:44:22] (CR) Zoranzoki21: [C: +1] Change category name of Babel extension on Javanese Wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/702961 (https://phabricator.wikimedia.org/T286165) (owner: Labdajiwa)
[17:54:43] (PS1) Ladsgroup: librenms: Migrate crons to systemd timers [puppet] - https://gerrit.wikimedia.org/r/703909 (https://phabricator.wikimedia.org/T273673)
[17:55:42] (CR) Ladsgroup: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/703909 (https://phabricator.wikimedia.org/T273673) (owner: Ladsgroup)
[17:57:56] (CR) Ladsgroup: "PCC https://puppet-compiler.wmflabs.org/compiler1003/850/" [puppet] - https://gerrit.wikimedia.org/r/703909 (https://phabricator.wikimedia.org/T273673) (owner: Ladsgroup)
[18:03:51] SRE, Traffic: LetsEncrypt cert expiration warning for some ncredir names - https://phabricator.wikimedia.org/T286377 (Vgutierrez) p:High→Medium non-canonical-redirect-2 has been successfully renewed on July 3rd with the exception of wikimedia.com and *.wikimedia.com. This is expected and caused b...
[18:12:39] (PS1) Vgutierrez: nc_redirects: Remove wikimedia.com rule [puppet] - https://gerrit.wikimedia.org/r/703910 (https://phabricator.wikimedia.org/T286377)
[18:12:41] (PS1) Vgutierrez: acme-chief: Drop wikimedia.com related SNIs [puppet] - https://gerrit.wikimedia.org/r/703911 (https://phabricator.wikimedia.org/T286377)
[18:13:07] (PS1) Ladsgroup: arclamp: Migrate crons to systemd timer [puppet] - https://gerrit.wikimedia.org/r/703912 (https://phabricator.wikimedia.org/T273673)
[18:14:22] (CR) Ladsgroup: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/703909 (https://phabricator.wikimedia.org/T273673) (owner: Ladsgroup)
[18:14:31] (CR) Ladsgroup: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/703912 (https://phabricator.wikimedia.org/T273673) (owner: Ladsgroup)
[18:15:22] SRE, Traffic, Patch-For-Review: LetsEncrypt cert expiration warning for some ncredir names - https://phabricator.wikimedia.org/T286377 (Vgutierrez) I'll merge https://gerrit.wikimedia.org/r/703910 and https://gerrit.wikimedia.org/r/703911 on Monday to properly clean-up wikimedia.com from ncredir rule...
[18:16:54] (CR) Ladsgroup: "PCC happy https://puppet-compiler.wmflabs.org/compiler1003/852/" [puppet] - https://gerrit.wikimedia.org/r/703912 (https://phabricator.wikimedia.org/T273673) (owner: Ladsgroup)
[18:28:11] (CR) R4356thwiki: "> Patch Set 1:" [mediawiki-config] - https://gerrit.wikimedia.org/r/703884 (https://phabricator.wikimedia.org/T286152) (owner: R4356thwiki)
[18:42:47] (Juniper alarm active) firing: Juniper alarm active - https://alerts.wikimedia.org
[19:11:48] RECOVERY - SSH on logstash2021.mgmt is OK: SSH OK - OpenSSH_6.6 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[19:54:26] RECOVERY - ps1-604-eqsin-infeed-load-tower-B-single-phase on ps1-604-eqsin is OK: SNMP OK - ps1-604-eqsin-infeed-load-tower-B-single-phase 0 https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[19:55:54] RECOVERY - IPMI Sensor Status on cp5009 is OK: Sensor Type(s) Temperature, Power_Supply Status: OK https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Power_Supply_Failures
[19:56:20] RECOVERY - ps1-603-eqsin-infeed-load-tower-B-single-phase on ps1-603-eqsin is OK: SNMP OK - ps1-603-eqsin-infeed-load-tower-B-single-phase 342 https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[19:57:47] (Juniper alarm active) resolved: Juniper alarm active - https://alerts.wikimedia.org
[19:58:14] RECOVERY - IPMI Sensor Status on cp5001 is OK: Sensor Type(s) Temperature, Power_Supply Status: OK https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Power_Supply_Failures
[20:01:52] RECOVERY - IPMI Sensor Status on cp5003 is OK: Sensor Type(s) Temperature, Power_Supply Status: OK https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Power_Supply_Failures
[20:01:52] RECOVERY - IPMI Sensor Status on ganeti5001 is OK: Sensor Type(s) Temperature, Power_Supply Status: OK https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Power_Supply_Failures
[20:02:54] RECOVERY - IPMI Sensor Status on cp5004 is OK: Sensor Type(s) Temperature, Power_Supply Status: OK https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Power_Supply_Failures
[20:04:16] RECOVERY - IPMI Sensor Status on cp5010 is OK: Sensor Type(s) Temperature, Power_Supply Status: OK https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Power_Supply_Failures
[20:04:54] RECOVERY - IPMI Sensor Status on cp5013 is OK: Sensor Type(s) Temperature, Power_Supply Status: OK https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Power_Supply_Failures
[20:06:42] RECOVERY - IPMI Sensor Status on cp5006 is OK: Sensor Type(s) Temperature, Power_Supply Status: OK https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Power_Supply_Failures
[20:07:00] RECOVERY - IPMI Sensor Status on ganeti5002 is OK: Sensor Type(s) Temperature, Power_Supply Status: OK https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Power_Supply_Failures
[20:09:36] RECOVERY - IPMI Sensor Status on dns5002 is OK: Sensor Type(s) Temperature, Power_Supply Status: OK https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Power_Supply_Failures
[20:11:00] RECOVERY - IPMI Sensor Status on cp5011 is OK: Sensor Type(s) Temperature, Power_Supply Status: OK https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Power_Supply_Failures
[20:12:54] RECOVERY - IPMI Sensor Status on cp5016 is OK: Sensor Type(s) Temperature, Power_Supply Status: OK https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Power_Supply_Failures
[20:13:30] RECOVERY - IPMI Sensor Status on cp5005 is OK: Sensor Type(s) Temperature, Power_Supply Status: OK https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Power_Supply_Failures
[20:14:40] RECOVERY - IPMI Sensor Status on cp5007 is OK: Sensor Type(s) Temperature, Power_Supply Status: OK https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Power_Supply_Failures
[20:17:02] RECOVERY - IPMI Sensor Status on lvs5002 is OK: Sensor Type(s) Temperature, Power_Supply Status: OK https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Power_Supply_Failures
[20:18:05] RECOVERY - IPMI Sensor Status on cp5014 is OK: Sensor Type(s) Temperature, Power_Supply Status: OK https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Power_Supply_Failures
[20:18:34] RECOVERY - IPMI Sensor Status on lvs5003 is OK: Sensor Type(s) Temperature, Power_Supply Status: OK https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Power_Supply_Failures
[20:21:32] RECOVERY - IPMI Sensor Status on cp5008 is OK: Sensor Type(s) Temperature, Power_Supply Status: OK https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Power_Supply_Failures
[20:22:30] RECOVERY - IPMI Sensor Status on cp5015 is OK: Sensor Type(s) Temperature, Power_Supply Status: OK https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Power_Supply_Failures
[20:23:58] RECOVERY - IPMI Sensor Status on dns5001 is OK: Sensor Type(s) Temperature, Power_Supply Status: OK https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Power_Supply_Failures
[20:24:10] RECOVERY - IPMI Sensor Status on lvs5001 is OK: Sensor Type(s) Temperature, Power_Supply Status: OK https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Power_Supply_Failures
[20:24:10] RECOVERY - IPMI Sensor Status on cp5012 is OK: Sensor Type(s) Temperature, Power_Supply Status: OK https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Power_Supply_Failures
[20:24:20] RECOVERY - IPMI Sensor Status on cp5002 is OK: Sensor Type(s) Temperature, Power_Supply Status: OK https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Power_Supply_Failures
[20:25:06] RECOVERY - IPMI Sensor Status on ganeti5003 is OK: Sensor Type(s) Temperature, Power_Supply Status: OK https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Power_Supply_Failures
[20:42:22] RECOVERY - Host ripe-atlas-eqsin IPv6 is UP: PING OK - Packet loss = 0%, RTA = 229.91 ms
[20:46:52] PROBLEM - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is CRITICAL: CRITICAL - failed 73 probes of 628 (alerts on 65) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[20:47:25] RECOVERY - Host ripe-atlas-eqsin is UP: PING OK - Packet loss = 0%, RTA = 225.13 ms
[20:52:44] RECOVERY - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is OK: OK - failed 43 probes of 628 (alerts on 65) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas