[00:00:20] [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[00:04:50] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[00:06:39] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 6.11, 4.32, 3.42
[00:07:51] PROBLEM - cloud15 IPMI Sensors on cloud15 is UNKNOWN: ipmi_sdr_cache_open: /root/.freeipmi/sdr-cache/sdr-cache-cloud15.localhost: internal IPMI error-> Execution of /usr/sbin/ipmi-sel failed with return code 1.-> /usr/sbin/ipmi-sel was executed with the following parameters: sudo /usr/sbin/ipmi-sel --output-event-state --interpret-oem-data --entity-sensor-names --sensor-types=all
[00:09:50] [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[00:09:51] PROBLEM - cloud15 IPMI Sensors on cloud15 is CRITICAL: IPMI Status: Critical [442 system event log (SEL) entries present]
[00:11:20] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[00:16:15] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 2.75, 3.69, 3.69
[00:16:20] [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[00:19:20] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[00:20:05] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 6.71, 4.58, 3.97
[00:24:20] [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[00:29:41] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 2.12, 3.58, 3.87
[00:33:31] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 5.27, 4.02, 3.96
[00:33:50] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[00:35:26] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 3.24, 3.41, 3.73
[00:38:50] [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[00:39:16] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 6.86, 4.53, 4.03
[00:43:06] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 2.45, 3.76, 3.87
[00:47:01] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 5.32, 3.69, 3.74
[00:47:45] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[00:49:02] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 2.37, 3.07, 3.51
[00:51:03] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 6.76, 4.44, 3.95
[00:52:45] [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[00:53:01] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 2.47, 3.82, 3.80
[00:57:01] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 5.23, 4.11, 3.92
[00:57:20] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[00:59:01] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 2.58, 3.84, 3.86
[01:02:20] [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[01:03:02] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 4.08, 3.38, 3.64
[01:05:01] RECOVERY - prometheus151 Current Load on prometheus151 is OK: LOAD OK - total load average: 0.62, 2.33, 3.22
[03:00:29] RECOVERY - db181 Backups SQL on db181 is OK: FILE_AGE OK: /var/log/sql-backup.log is 28 seconds old and 0 bytes
[03:01:45] RECOVERY - db161 Backups SQL on db161 is OK: FILE_AGE OK: /var/log/sql-backup.log is 104 seconds old and 0 bytes
[03:07:29] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 23.05, 21.35, 18.09
[03:09:25] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 25.25, 21.20, 17.59
[03:09:29] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 28.95, 24.18, 19.53
[03:09:51] PROBLEM - mw171 Current Load on mw171 is WARNING: LOAD WARNING - total load average: 22.38, 19.25, 15.57
[03:11:39] PROBLEM - ns2 NTP time on ns2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[03:13:19] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 23.26, 23.46, 19.44
[03:13:36] RECOVERY - ns2 NTP time on ns2 is OK: NTP OK: Offset -0.0007163286209 secs
[03:13:52] PROBLEM - mw171 Current Load on mw171 is CRITICAL: LOAD CRITICAL - total load average: 24.41, 21.75, 17.38
[03:15:16] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 29.82, 25.83, 20.77
[03:15:52] PROBLEM - mw171 Current Load on mw171 is WARNING: LOAD WARNING - total load average: 21.42, 21.66, 17.89
[03:17:52] RECOVERY - mw171 Current Load on mw171 is OK: LOAD OK - total load average: 13.86, 19.21, 17.48
[03:19:11] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 19.26, 22.71, 20.70
[03:21:08] RECOVERY - mw182 Current Load on mw182 is OK: LOAD OK - total load average: 16.14, 19.96, 19.91
[03:21:29] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 17.13, 22.82, 22.28
[03:27:29] RECOVERY - mw181 Current Load on mw181 is OK: LOAD OK - total load average: 16.28, 17.76, 20.11
[03:39:29] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 26.76, 22.00, 20.41
[03:40:36] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 21.00, 19.37, 18.51
[03:41:29] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 17.90, 20.55, 20.10
[03:42:34] RECOVERY - mw182 Current Load on mw182 is OK: LOAD OK - total load average: 15.49, 17.95, 18.10
[03:43:29] RECOVERY - mw181 Current Load on mw181 is OK: LOAD OK - total load average: 18.87, 19.93, 19.91
[04:32:50] PROBLEM - ns2 NTP time on ns2 is UNKNOWN: check_ntp_time: Invalid hostname/address - time.cloudflare.comUsage: check_ntp_time -H [-4|-6] [-w ] [-c ] [-v verbose] [-o