[00:01:55] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 3.84, 3.87, 3.39
[00:03:56] RECOVERY - prometheus151 Current Load on prometheus151 is OK: LOAD OK - total load average: 2.31, 3.16, 3.19
[00:13:03] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 18.82, 22.39, 23.43
[00:18:32] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 4.57, 3.67, 3.18
[00:18:50] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 19.91, 21.65, 23.60
[00:20:28] RECOVERY - prometheus151 Current Load on prometheus151 is OK: LOAD OK - total load average: 3.03, 3.15, 3.03
[00:25:03] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 25.66, 20.56, 21.19
[00:27:00] [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] !tech FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[00:28:09] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 5.42, 3.81, 3.24
[00:29:03] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 21.53, 22.04, 21.73
[00:30:03] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 2.31, 3.44, 3.19
[00:32:00] [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] !tech RESOLVED: High Job Queue Backlog https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[00:32:50] RECOVERY - mw182 Current Load on mw182 is OK: LOAD OK - total load average: 18.50, 18.29, 20.31
[00:33:55] RECOVERY - prometheus151 Current Load on prometheus151 is OK: LOAD OK - total load average: 2.58, 3.12, 3.14
[00:35:03] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 24.22, 22.30, 21.87
[00:37:03] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 21.42, 21.15, 21.46
[00:40:46] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 5.22, 4.04, 3.44
[00:42:41] RECOVERY - prometheus151 Current Load on prometheus151 is OK: LOAD OK - total load average: 2.99, 3.34, 3.24
[00:45:03] RECOVERY - mw181 Current Load on mw181 is OK: LOAD OK - total load average: 20.20, 19.29, 20.38
[00:46:07] PROBLEM - cloud15 IPMI Sensors on cloud15 is UNKNOWN: ipmi_sdr_cache_open: /root/.freeipmi/sdr-cache/sdr-cache-cloud15.localhost: internal IPMI error-> Execution of /usr/sbin/ipmi-sel failed with return code 1.-> /usr/sbin/ipmi-sel was executed with the following parameters: sudo /usr/sbin/ipmi-sel --output-event-state --interpret-oem-data --entity-sensor-names --sensor-types=all
[00:47:26] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 20.74, 20.00, 20.10
[00:48:07] PROBLEM - cloud15 IPMI Sensors on cloud15 is CRITICAL: IPMI Status: Critical [Inlet Temp = Critical, 389 system event log (SEL) entries present]
[00:49:03] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 18.26, 20.08, 20.58
[00:49:21] RECOVERY - mw182 Current Load on mw182 is OK: LOAD OK - total load average: 19.05, 19.22, 19.78
[00:51:03] RECOVERY - mw181 Current Load on mw181 is OK: LOAD OK - total load average: 17.59, 19.26, 20.23
[00:55:03] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 27.93, 22.72, 21.32
[00:57:00] [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[00:57:03] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 23.83, 23.03, 21.62
[00:58:01] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 6.27, 4.55, 3.63
[00:59:03] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 24.72, 22.45, 21.50
[00:59:56] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 3.02, 3.68, 3.41
[01:00:46] PROBLEM - cp51 Varnish Backends on cp51 is CRITICAL: 1 backends are down. mw152
[01:01:03] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 21.49, 22.82, 21.81
[01:01:55] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 21.88, 21.46, 20.66
[01:01:55] RECOVERY - prometheus151 Current Load on prometheus151 is OK: LOAD OK - total load average: 1.01, 2.83, 3.14
[01:02:00] [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[01:04:43] RECOVERY - cp51 Varnish Backends on cp51 is OK: All 19 backends are healthy
[01:05:03] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 26.13, 24.04, 22.48
[01:05:32] PROBLEM - cp27 Varnish Backends on cp27 is CRITICAL: 1 backends are down. mw152
[01:05:46] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 27.62, 23.95, 21.77
[01:07:37] PROBLEM - mw162 Current Load on mw162 is CRITICAL: LOAD CRITICAL - total load average: 26.91, 21.61, 17.65
[01:07:41] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 23.66, 23.92, 22.03
[01:09:33] PROBLEM - mw162 Current Load on mw162 is WARNING: LOAD WARNING - total load average: 21.27, 21.31, 18.02
[01:09:37] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 26.58, 25.15, 22.72
[01:11:19] RECOVERY - cp27 Varnish Backends on cp27 is OK: All 19 backends are healthy
[01:11:32] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 18.90, 23.00, 22.24
[01:13:28] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 27.23, 24.79, 22.98
[01:15:33] RECOVERY - mw162 Current Load on mw162 is OK: LOAD OK - total load average: 16.76, 20.14, 18.65
[01:17:03] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 21.90, 23.95, 23.56
[01:17:19] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 19.27, 22.88, 22.69
[01:21:03] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 24.06, 22.80, 23.09
[01:21:10] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 24.98, 22.32, 22.36
[01:23:05] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 22.14, 22.74, 22.56
[01:25:03] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 17.84, 22.32, 22.98
[01:30:16] PROBLEM - cloud15 IPMI Sensors on cloud15 is UNKNOWN: ipmi_sdr_cache_open: /root/.freeipmi/sdr-cache/sdr-cache-cloud15.localhost: internal IPMI error-> Execution of /usr/sbin/ipmi-sel failed with return code 1.-> /usr/sbin/ipmi-sel was executed with the following parameters: sudo /usr/sbin/ipmi-sel --output-event-state --interpret-oem-data --entity-sensor-names --sensor-types=all
[01:32:17] PROBLEM - cloud15 IPMI Sensors on cloud15 is CRITICAL: IPMI Status: Critical [Inlet Temp = Critical, 391 system event log (SEL) entries present]
[01:35:03] RECOVERY - mw181 Current Load on mw181 is OK: LOAD OK - total load average: 16.02, 17.85, 20.12
[01:42:50] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 25.34, 21.88, 21.77
[01:44:50] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 23.74, 22.12, 21.85
[01:48:50] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 26.36, 22.73, 22.00
[01:52:50] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 22.38, 23.36, 22.48
[01:57:03] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 22.19, 20.19, 19.65
[02:04:50] RECOVERY - mw182 Current Load on mw182 is OK: LOAD OK - total load average: 14.97, 18.00, 20.39
[02:07:03] RECOVERY - mw181 Current Load on mw181 is OK: LOAD OK - total load average: 15.22, 18.87, 19.70
[02:07:19] PROBLEM - mw151 Current Load on mw151 is WARNING: LOAD WARNING - total load average: 19.51, 22.12, 23.97
[02:16:16] PROBLEM - mw152 MediaWiki Rendering on mw152 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:18:22] RECOVERY - mw152 MediaWiki Rendering on mw152 is OK: HTTP OK: HTTP/1.1 200 OK - 8191 bytes in 6.735 second response time
[02:19:03] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 21.04, 19.90, 19.05
[02:21:19] PROBLEM - mw151 Current Load on mw151 is CRITICAL: LOAD CRITICAL - total load average: 26.02, 21.74, 21.93
[02:23:03] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 25.45, 22.42, 20.24
[02:25:05] PROBLEM - cp51 Varnish Backends on cp51 is CRITICAL: 1 backends are down. mw152
[02:26:48] PROBLEM - cp27 Varnish Backends on cp27 is CRITICAL: 1 backends are down. mw152
[02:27:03] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 19.36, 22.74, 21.05
[02:28:44] RECOVERY - cp27 Varnish Backends on cp27 is OK: All 19 backends are healthy
[02:28:55] RECOVERY - cp51 Varnish Backends on cp51 is OK: All 19 backends are healthy
[02:30:28] PROBLEM - cloud15 IPMI Sensors on cloud15 is UNKNOWN: ipmi_sdr_cache_open: /root/.freeipmi/sdr-cache/sdr-cache-cloud15.localhost: internal IPMI error-> Execution of /usr/sbin/ipmi-sel failed with return code 1.-> /usr/sbin/ipmi-sel was executed with the following parameters: sudo /usr/sbin/ipmi-sel --output-event-state --interpret-oem-data --entity-sensor-names --sensor-types=all
[02:31:03] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 24.01, 21.99, 21.00
[02:31:19] PROBLEM - mw151 Current Load on mw151 is WARNING: LOAD WARNING - total load average: 13.03, 20.19, 22.14
[02:32:00] [Grafana] RESOLVED: PHP-FPM Worker Usage High https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[02:32:28] PROBLEM - cloud15 IPMI Sensors on cloud15 is CRITICAL: IPMI Status: Critical [Inlet Temp = Critical, 393 system event log (SEL) entries present]
[02:33:03] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 17.62, 20.50, 20.59
[02:33:19] PROBLEM - mw151 Current Load on mw151 is CRITICAL: LOAD CRITICAL - total load average: 24.81, 22.32, 22.69
[02:35:03] RECOVERY - mw181 Current Load on mw181 is OK: LOAD OK - total load average: 18.22, 19.86, 20.34
[02:37:19] PROBLEM - mw151 Current Load on mw151 is WARNING: LOAD WARNING - total load average: 22.14, 23.32, 23.14
[02:39:19] PROBLEM - mw151 Current Load on mw151 is CRITICAL: LOAD CRITICAL - total load average: 24.82, 23.91, 23.38
[02:41:19] PROBLEM - mw151 Current Load on mw151 is WARNING: LOAD WARNING - total load average: 19.07, 22.34, 22.89
[02:47:25] [Grafana] !tech FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[02:49:19] PROBLEM - mw151 Current Load on mw151 is CRITICAL: LOAD CRITICAL - total load average: 24.62, 21.34, 22.01
[02:51:19] PROBLEM - mw151 Current Load on mw151 is WARNING: LOAD WARNING - total load average: 21.06, 21.07, 21.83
[03:02:25] [Grafana] !tech FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[03:03:19] PROBLEM - mw151 Current Load on mw151 is CRITICAL: LOAD CRITICAL - total load average: 24.07, 22.04, 21.79
[03:03:55] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 5.35, 2.64, 1.10
[03:05:19] PROBLEM - mw151 Current Load on mw151 is WARNING: LOAD WARNING - total load average: 20.22, 21.97, 21.85
[03:05:55] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 3.49, 2.51, 1.22
[03:07:19] PROBLEM - mw151 Current Load on mw151 is CRITICAL: LOAD CRITICAL - total load average: 24.14, 22.82, 22.17
[03:07:25] [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] !tech FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[03:07:57] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 5.76, 3.62, 1.79
[03:08:24] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call
[03:09:19] PROBLEM - mw151 Current Load on mw151 is WARNING: LOAD WARNING - total load average: 20.59, 21.66, 21.81
[03:10:21] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 0.073 seconds response time. wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206
[03:11:55] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 3.44, 3.99, 2.43
[03:12:25] [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] !tech FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[03:13:55] RECOVERY - prometheus151 Current Load on prometheus151 is OK: LOAD OK - total load average: 2.44, 3.20, 2.32
[03:17:25] [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] !tech RESOLVED: High Job Queue Backlog https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[03:23:19] RECOVERY - mw151 Current Load on mw151 is OK: LOAD OK - total load average: 11.98, 17.48, 19.91
[03:25:56] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 3.21, 3.56, 2.77
[03:27:25] [Grafana] RESOLVED: PHP-FPM Worker Usage High https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[03:27:55] RECOVERY - prometheus151 Current Load on prometheus151 is OK: LOAD OK - total load average: 2.34, 3.09, 2.69
[03:30:42] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[03:35:42] [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[03:35:50] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[03:35:53] PROBLEM - mw151 Current Load on mw151 is CRITICAL: LOAD CRITICAL - total load average: 26.98, 22.97, 21.30
[03:38:36] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 2.57, 3.76, 3.14
[03:39:43] PROBLEM - mw151 Current Load on mw151 is WARNING: LOAD WARNING - total load average: 19.38, 22.13, 21.43
[03:40:32] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 4.13, 3.80, 3.22
[03:44:24] RECOVERY - prometheus151 Current Load on prometheus151 is OK: LOAD OK - total load average: 2.09, 3.34, 3.21
[03:45:26] PROBLEM - mw151 Current Load on mw151 is CRITICAL: LOAD CRITICAL - total load average: 25.29, 23.29, 22.01
[03:45:50] [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[03:47:21] PROBLEM - mw151 Current Load on mw151 is WARNING: LOAD WARNING - total load average: 23.14, 22.90, 21.99
[03:47:50] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[03:50:12] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 5.23, 4.45, 3.67
[03:52:07] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 3.50, 3.81, 3.51
[03:54:01] RECOVERY - prometheus151 Current Load on prometheus151 is OK: LOAD OK - total load average: 1.49, 3.12, 3.30
[03:55:19] PROBLEM - mw151 Current Load on mw151 is CRITICAL: LOAD CRITICAL - total load average: 24.04, 22.88, 22.16
[03:57:50] [Grafana] !tech FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[03:59:19] PROBLEM - mw151 Current Load on mw151 is WARNING: LOAD WARNING - total load average: 17.62, 21.21, 21.72
[04:02:50] [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] !tech FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[04:05:19] PROBLEM - mw151 Current Load on mw151 is CRITICAL: LOAD CRITICAL - total load average: 25.00, 22.34, 21.83
[04:07:19] PROBLEM - mw151 Current Load on mw151 is WARNING: LOAD WARNING - total load average: 22.68, 21.99, 21.75
[04:07:50] [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] !tech FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[04:08:33] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 7.85, 4.39, 3.48
[04:09:16] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call
[04:09:19] PROBLEM - mw151 Current Load on mw151 is CRITICAL: LOAD CRITICAL - total load average: 24.48, 22.35, 21.88
[04:11:13] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 0.089 seconds response time. wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206
[04:11:19] PROBLEM - mw151 Current Load on mw151 is WARNING: LOAD WARNING - total load average: 22.13, 22.86, 22.17
[04:12:23] RECOVERY - prometheus151 Current Load on prometheus151 is OK: LOAD OK - total load average: 0.62, 3.03, 3.22
[04:12:50] [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] !tech RESOLVED: High Job Queue Backlog https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[04:17:19] PROBLEM - mw151 Current Load on mw151 is CRITICAL: LOAD CRITICAL - total load average: 24.78, 22.44, 22.08
[04:19:19] PROBLEM - mw151 Current Load on mw151 is WARNING: LOAD WARNING - total load average: 21.01, 21.99, 21.99
[04:21:19] PROBLEM - mw151 Current Load on mw151 is CRITICAL: LOAD CRITICAL - total load average: 25.85, 22.77, 22.22
[04:25:19] PROBLEM - mw151 Current Load on mw151 is WARNING: LOAD WARNING - total load average: 22.16, 22.94, 22.53
[04:27:19] PROBLEM - mw151 Current Load on mw151 is CRITICAL: LOAD CRITICAL - total load average: 24.13, 23.18, 22.66
[04:29:19] PROBLEM - mw151 Current Load on mw151 is WARNING: LOAD WARNING - total load average: 21.91, 22.42, 22.44
[04:35:19] PROBLEM - mw151 Current Load on mw151 is CRITICAL: LOAD CRITICAL - total load average: 26.37, 23.75, 22.90
[04:37:19] PROBLEM - mw151 Current Load on mw151 is WARNING: LOAD WARNING - total load average: 21.08, 22.60, 22.58
[04:39:19] PROBLEM - mw151 Current Load on mw151 is CRITICAL: LOAD CRITICAL - total load average: 25.53, 23.64, 22.95
[04:41:19] PROBLEM - mw151 Current Load on mw151 is WARNING: LOAD WARNING - total load average: 23.55, 23.94, 23.17
[04:47:50] [Grafana] RESOLVED: PHP-FPM Worker Usage High https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[04:55:19] PROBLEM - mw151 Current Load on mw151 is CRITICAL: LOAD CRITICAL - total load average: 27.57, 22.72, 22.19
[04:57:03] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 29.69, 22.67, 18.40
[04:58:11] PROBLEM - mw171 Current Load on mw171 is CRITICAL: LOAD CRITICAL - total load average: 29.25, 22.35, 16.16
[05:01:19] PROBLEM - mw151 Current Load on mw151 is WARNING: LOAD WARNING - total load average: 19.97, 22.62, 22.63
[05:03:19] PROBLEM - mw151 Current Load on mw151 is CRITICAL: LOAD CRITICAL - total load average: 25.54, 23.98, 23.14
[05:07:19] PROBLEM - mw151 Current Load on mw151 is WARNING: LOAD WARNING - total load average: 20.69, 22.78, 22.93
[05:11:03] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 18.65, 23.78, 22.95
[05:18:11] PROBLEM - mw171 Current Load on mw171 is WARNING: LOAD WARNING - total load average: 13.85, 21.97, 22.87
[05:19:30] [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[05:21:03] RECOVERY - mw181 Current Load on mw181 is OK: LOAD OK - total load average: 13.16, 17.29, 20.31
[05:21:19] RECOVERY - mw151 Current Load on mw151 is OK: LOAD OK - total load average: 16.20, 18.05, 20.04
[05:22:11] RECOVERY - mw171 Current Load on mw171 is OK: LOAD OK - total load average: 10.70, 15.62, 20.03
[05:46:26] PROBLEM - ns2 Puppet on ns2 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. It might be a dependency cycle.
[05:52:52] PROBLEM - ns2 NTP time on ns2 is UNKNOWN: check_ntp_time: Invalid hostname/address - time.cloudflare.com Usage: check_ntp_time -H [-4|-6] [-w ] [-c ] [-v verbose] [-o