[00:01:30] [Grafana] RESOLVED: PHP-FPM Worker Usage High https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[00:01:48] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 28.21, 24.08, 23.87
[00:03:48] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 22.07, 23.94, 23.90
[00:09:48] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 25.48, 23.45, 23.53
[00:15:48] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 19.96, 22.83, 23.41
[00:21:48] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 24.56, 22.37, 22.84
[00:27:48] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 19.81, 22.45, 22.95
[00:29:48] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 24.22, 22.80, 23.00
[00:31:48] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 22.12, 22.86, 23.03
[00:32:37] PROBLEM - mw151 Current Load on mw151 is WARNING: LOAD WARNING - total load average: 23.78, 21.01, 19.32
[00:33:48] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 24.01, 23.03, 23.06
[00:38:24] RECOVERY - mw151 Current Load on mw151 is OK: LOAD OK - total load average: 17.56, 20.28, 19.64
[00:43:48] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 20.13, 23.15, 23.37
[00:49:48] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 27.52, 23.96, 23.52
[00:52:38] PROBLEM - cloud15 IPMI Sensors on cloud15 is UNKNOWN: ipmi_sdr_cache_open: /root/.freeipmi/sdr-cache/sdr-cache-cloud15.localhost: internal IPMI error-> Execution of /usr/sbin/ipmi-sel failed with return code 1.-> /usr/sbin/ipmi-sel was executed with the following parameters: sudo /usr/sbin/ipmi-sel --output-event-state --interpret-oem-data --entity-sensor-names --sensor-types=all
[00:54:39] PROBLEM - cloud15 IPMI Sensors on cloud15 is CRITICAL: IPMI Status: Critical [234 system event log (SEL) entries present]
[00:55:30] [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[01:15:48] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 16.75, 21.23, 23.56
[01:21:48] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 25.54, 22.37, 22.96
[01:21:55] PROBLEM - mw151 Current Load on mw151 is CRITICAL: LOAD CRITICAL - total load average: 29.05, 22.86, 19.69
[01:29:37] PROBLEM - mw151 Current Load on mw151 is WARNING: LOAD WARNING - total load average: 15.90, 21.43, 20.88
[01:31:33] RECOVERY - mw151 Current Load on mw151 is OK: LOAD OK - total load average: 17.25, 19.95, 20.38
[01:35:25] PROBLEM - mw151 Current Load on mw151 is WARNING: LOAD WARNING - total load average: 23.23, 22.11, 21.19
[01:39:17] RECOVERY - mw151 Current Load on mw151 is OK: LOAD OK - total load average: 18.27, 19.62, 20.37
[02:15:30] [Grafana] RESOLVED: PHP-FPM Worker Usage High https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[02:17:48] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 19.34, 20.97, 23.75
[02:21:48] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 26.42, 23.41, 24.06
[02:33:02] PROBLEM - mw152 Current Load on mw152 is WARNING: LOAD WARNING - total load average: 13.94, 17.84, 22.99
[02:41:02] RECOVERY - mw152 Current Load on mw152 is OK: LOAD OK - total load average: 12.27, 15.68, 20.03
[02:47:07] PROBLEM - cloud15 IPMI Sensors on cloud15 is UNKNOWN: ipmi_sdr_cache_open: /root/.freeipmi/sdr-cache/sdr-cache-cloud15.localhost: internal IPMI error-> Execution of /usr/sbin/ipmi-sel failed with return code 1.-> /usr/sbin/ipmi-sel was executed with the following parameters: sudo /usr/sbin/ipmi-sel --output-event-state --interpret-oem-data --entity-sensor-names --sensor-types=all
[02:49:08] PROBLEM - cloud15 IPMI Sensors on cloud15 is CRITICAL: IPMI Status: Critical [235 system event log (SEL) entries present]
[02:51:02] PROBLEM - mw152 Current Load on mw152 is CRITICAL: LOAD CRITICAL - total load average: 28.78, 21.68, 20.09
[03:03:02] PROBLEM - mw152 Current Load on mw152 is WARNING: LOAD WARNING - total load average: 18.47, 23.75, 23.14
[03:19:02] RECOVERY - mw152 Current Load on mw152 is OK: LOAD OK - total load average: 12.67, 17.40, 20.21
[03:45:02] PROBLEM - mw152 Current Load on mw152 is CRITICAL: LOAD CRITICAL - total load average: 29.66, 19.80, 16.36
[03:49:48] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 19.78, 21.62, 23.92
[03:55:02] PROBLEM - mw151 Current Load on mw151 is WARNING: LOAD WARNING - total load average: 20.64, 17.95, 14.38
[03:55:48] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 28.70, 23.63, 23.75
[03:57:02] RECOVERY - mw151 Current Load on mw151 is OK: LOAD OK - total load average: 17.07, 17.22, 14.53
[04:01:48] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 23.32, 23.76, 23.87
[04:11:48] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 24.09, 22.31, 22.70
[04:12:00] [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[04:13:48] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 17.34, 20.18, 21.86
[04:17:00] [Grafana] RESOLVED: PHP-FPM Worker Usage High https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[04:25:48] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 26.18, 22.96, 22.13
[04:27:48] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 23.20, 22.69, 22.11
[04:29:48] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 28.15, 24.46, 22.81
[04:35:55] PROBLEM - mw151 Current Load on mw151 is CRITICAL: LOAD CRITICAL - total load average: 24.33, 22.80, 19.40
[04:39:46] PROBLEM - mw151 Current Load on mw151 is WARNING: LOAD WARNING - total load average: 15.51, 21.26, 19.68
[04:39:48] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 18.85, 23.51, 23.70
[04:41:41] RECOVERY - mw151 Current Load on mw151 is OK: LOAD OK - total load average: 12.87, 18.45, 18.85
[04:54:00] [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[04:57:42] PROBLEM - cloud15 IPMI Sensors on cloud15 is UNKNOWN: ipmi_sdr_cache_open: /root/.freeipmi/sdr-cache/sdr-cache-cloud15.localhost: internal IPMI error-> Execution of /usr/sbin/ipmi-sel failed with return code 1.-> /usr/sbin/ipmi-sel was executed with the following parameters: sudo /usr/sbin/ipmi-sel --output-event-state --interpret-oem-data --entity-sensor-names --sensor-types=all
[04:59:00] [Grafana] RESOLVED: PHP-FPM Worker Usage High https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[04:59:42] PROBLEM - cloud15 IPMI Sensors on cloud15 is CRITICAL: IPMI Status: Critical [237 system event log (SEL) entries present]
[04:59:48] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 27.20, 21.85, 21.43
[05:01:48] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 19.71, 21.04, 21.21
[05:02:20] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[05:03:33] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 3.55, 2.24, 0.97
[05:05:33] RECOVERY - prometheus151 Current Load on prometheus151 is OK: LOAD OK - total load average: 2.97, 2.17, 1.08
[05:07:20] [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[05:07:48] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 27.30, 22.71, 21.64
[05:09:48] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 20.82, 21.99, 21.51
[05:13:48] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 25.21, 22.90, 21.92
[05:15:48] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 23.69, 23.64, 22.34
[05:19:48] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 30.53, 24.88, 22.93
[05:29:42] PROBLEM - db151 Backups SQL on db151 is WARNING: FILE_AGE WARNING: /var/log/sql-backup.log is 864187 seconds old and 40860 bytes
[05:35:48] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 19.91, 23.01, 23.47
[05:47:15] PROBLEM - mw151 Current Load on mw151 is CRITICAL: LOAD CRITICAL - total load average: 29.73, 22.89, 19.37
[05:49:11] PROBLEM - mw151 Current Load on mw151 is WARNING: LOAD WARNING - total load average: 16.75, 20.59, 18.97
[05:51:06] RECOVERY - mw151 Current Load on mw151 is OK: LOAD OK - total load average: 15.51, 18.89, 18.53
[05:53:48] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 27.42, 22.21, 21.72
[05:55:48] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 21.84, 22.40, 21.89
[06:05:48] RECOVERY - mw181 Current Load on mw181 is OK: LOAD OK - total load average: 19.45, 19.58, 20.38
[06:09:48] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 26.19, 22.26, 21.19
[06:17:02] PROBLEM - mw152 Current Load on mw152 is WARNING: LOAD WARNING - total load average: 14.28, 21.12, 23.64
[06:23:48] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 22.01, 23.38, 23.04
[06:25:48] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 24.70, 24.04, 23.32
[06:27:02] PROBLEM - mw152 Current Load on mw152 is CRITICAL: LOAD CRITICAL - total load average: 28.27, 24.84, 24.09
[06:27:48] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 20.45, 22.40, 22.79
[06:29:55] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 20.99, 19.92, 16.92
[06:33:52] RECOVERY - mw182 Current Load on mw182 is OK: LOAD OK - total load average: 18.08, 19.47, 17.44
[06:37:48] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 27.94, 24.86, 23.68
[06:45:48] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 20.02, 23.09, 23.78
[06:48:12] PROBLEM - cloud15 IPMI Sensors on cloud15 is UNKNOWN: ipmi_sel_parse: internal IPMI error-> Execution of /usr/sbin/ipmi-sel failed with return code 1.-> /usr/sbin/ipmi-sel was executed with the following parameters: sudo /usr/sbin/ipmi-sel --output-event-state --interpret-oem-data --entity-sensor-names --sensor-types=all
[06:50:13] PROBLEM - cloud15 IPMI Sensors on cloud15 is CRITICAL: IPMI Status: Critical [238 system event log (SEL) entries present]
[06:51:48] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 26.70, 23.85, 23.71
[06:53:48] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 23.11, 22.91, 23.35
[06:55:48] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 24.71, 23.54, 23.54
[07:01:48] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 17.17, 21.92, 23.02
[07:07:02] PROBLEM - mw152 Current Load on mw152 is WARNING: LOAD WARNING - total load average: 18.46, 21.34, 23.39
[07:09:02] PROBLEM - mw152 Current Load on mw152 is CRITICAL: LOAD CRITICAL - total load average: 24.87, 22.11, 23.37
[07:09:48] RECOVERY - mw181 Current Load on mw181 is OK: LOAD OK - total load average: 17.82, 17.74, 20.40
[07:11:02] PROBLEM - mw152 Current Load on mw152 is WARNING: LOAD WARNING - total load average: 23.85, 23.18, 23.65
[07:13:02] PROBLEM - mw152 Current Load on mw152 is CRITICAL: LOAD CRITICAL - total load average: 27.00, 24.38, 24.02
[07:15:02] PROBLEM - mw152 Current Load on mw152 is WARNING: LOAD WARNING - total load average: 19.13, 23.46, 23.81
[07:18:46] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 24.06, 20.86, 20.60
[07:20:45] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 23.97, 21.12, 20.68
[07:21:02] RECOVERY - mw152 Current Load on mw152 is OK: LOAD OK - total load average: 10.67, 14.53, 19.59
[07:22:45] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 28.25, 23.64, 21.65
[07:31:04] PROBLEM - ns2 NTP time on ns2 is UNKNOWN: check_ntp_time: Invalid hostname/address - time.cloudflare.com Usage: check_ntp_time -H [-4|-6] [-w ] [-c ] [-v verbose] [-o