[00:01:55] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 3.84, 3.87, 3.39 [00:03:56] RECOVERY - prometheus151 Current Load on prometheus151 is OK: LOAD OK - total load average: 2.31, 3.16, 3.19 [00:13:03] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 18.82, 22.39, 23.43 [00:18:32] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 4.57, 3.67, 3.18 [00:18:50] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 19.91, 21.65, 23.60 [00:20:28] RECOVERY - prometheus151 Current Load on prometheus151 is OK: LOAD OK - total load average: 3.03, 3.15, 3.03 [00:25:03] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 25.66, 20.56, 21.19 [00:27:00] [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1[Grafana] !tech FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1[Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [00:28:09] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 5.42, 3.81, 3.24 [00:29:03] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 21.53, 22.04, 21.73 [00:30:03] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 2.31, 3.44, 3.19 [00:32:00] [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. 
https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1[Grafana] !tech RESOLVED: High Job Queue Backlog https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1[Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [00:32:50] RECOVERY - mw182 Current Load on mw182 is OK: LOAD OK - total load average: 18.50, 18.29, 20.31 [00:33:55] RECOVERY - prometheus151 Current Load on prometheus151 is OK: LOAD OK - total load average: 2.58, 3.12, 3.14 [00:35:03] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 24.22, 22.30, 21.87 [00:37:03] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 21.42, 21.15, 21.46 [00:40:46] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 5.22, 4.04, 3.44 [00:42:41] RECOVERY - prometheus151 Current Load on prometheus151 is OK: LOAD OK - total load average: 2.99, 3.34, 3.24 [00:45:03] RECOVERY - mw181 Current Load on mw181 is OK: LOAD OK - total load average: 20.20, 19.29, 20.38 [00:46:07] PROBLEM - cloud15 IPMI Sensors on cloud15 is UNKNOWN: ipmi_sdr_cache_open: /root/.freeipmi/sdr-cache/sdr-cache-cloud15.localhost: internal IPMI error-> Execution of /usr/sbin/ipmi-sel failed with return code 1.-> /usr/sbin/ipmi-sel was executed with the following parameters: sudo /usr/sbin/ipmi-sel --output-event-state --interpret-oem-data --entity-sensor-names --sensor-types=all [00:47:26] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 20.74, 20.00, 20.10 [00:48:07] PROBLEM - cloud15 IPMI Sensors on cloud15 is CRITICAL: IPMI Status: Critical [Inlet Temp = Critical, 389 system event log (SEL) entries present] [00:49:03] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 18.26, 20.08, 20.58 [00:49:21] RECOVERY - mw182 Current Load on mw182 is OK: LOAD OK - total load average: 19.05, 19.22, 19.78 [00:51:03] RECOVERY - mw181 Current Load on 
mw181 is OK: LOAD OK - total load average: 17.59, 19.26, 20.23 [00:55:03] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 27.93, 22.72, 21.32 [00:57:00] [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1[Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [00:57:03] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 23.83, 23.03, 21.62 [00:58:01] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 6.27, 4.55, 3.63 [00:59:03] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 24.72, 22.45, 21.50 [00:59:56] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 3.02, 3.68, 3.41 [01:00:46] PROBLEM - cp51 Varnish Backends on cp51 is CRITICAL: 1 backends are down. mw152 [01:01:03] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 21.49, 22.82, 21.81 [01:01:55] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 21.88, 21.46, 20.66 [01:01:55] RECOVERY - prometheus151 Current Load on prometheus151 is OK: LOAD OK - total load average: 1.01, 2.83, 3.14 [01:02:00] [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1[Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [01:04:43] RECOVERY - cp51 Varnish Backends on cp51 is OK: All 19 backends are healthy [01:05:03] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 26.13, 24.04, 22.48 [01:05:32] PROBLEM - cp27 Varnish Backends on cp27 is CRITICAL: 1 backends are down. 
mw152 [01:05:46] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 27.62, 23.95, 21.77 [01:07:37] PROBLEM - mw162 Current Load on mw162 is CRITICAL: LOAD CRITICAL - total load average: 26.91, 21.61, 17.65 [01:07:41] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 23.66, 23.92, 22.03 [01:09:33] PROBLEM - mw162 Current Load on mw162 is WARNING: LOAD WARNING - total load average: 21.27, 21.31, 18.02 [01:09:37] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 26.58, 25.15, 22.72 [01:11:19] RECOVERY - cp27 Varnish Backends on cp27 is OK: All 19 backends are healthy [01:11:32] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 18.90, 23.00, 22.24 [01:13:28] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 27.23, 24.79, 22.98 [01:15:33] RECOVERY - mw162 Current Load on mw162 is OK: LOAD OK - total load average: 16.76, 20.14, 18.65 [01:17:03] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 21.90, 23.95, 23.56 [01:17:19] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 19.27, 22.88, 22.69 [01:21:03] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 24.06, 22.80, 23.09 [01:21:10] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 24.98, 22.32, 22.36 [01:23:05] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 22.14, 22.74, 22.56 [01:25:03] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 17.84, 22.32, 22.98 [01:30:16] PROBLEM - cloud15 IPMI Sensors on cloud15 is UNKNOWN: ipmi_sdr_cache_open: /root/.freeipmi/sdr-cache/sdr-cache-cloud15.localhost: internal IPMI error-> Execution of /usr/sbin/ipmi-sel failed with return code 1.-> /usr/sbin/ipmi-sel was executed with the following 
parameters: sudo /usr/sbin/ipmi-sel --output-event-state --interpret-oem-data --entity-sensor-names --sensor-types=all [01:32:17] PROBLEM - cloud15 IPMI Sensors on cloud15 is CRITICAL: IPMI Status: Critical [Inlet Temp = Critical, 391 system event log (SEL) entries present] [01:35:03] RECOVERY - mw181 Current Load on mw181 is OK: LOAD OK - total load average: 16.02, 17.85, 20.12 [01:42:50] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 25.34, 21.88, 21.77 [01:44:50] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 23.74, 22.12, 21.85 [01:48:50] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 26.36, 22.73, 22.00 [01:52:50] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 22.38, 23.36, 22.48 [01:57:03] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 22.19, 20.19, 19.65 [02:04:50] RECOVERY - mw182 Current Load on mw182 is OK: LOAD OK - total load average: 14.97, 18.00, 20.39 [02:07:03] RECOVERY - mw181 Current Load on mw181 is OK: LOAD OK - total load average: 15.22, 18.87, 19.70 [02:07:19] PROBLEM - mw151 Current Load on mw151 is WARNING: LOAD WARNING - total load average: 19.51, 22.12, 23.97 [02:16:16] PROBLEM - mw152 MediaWiki Rendering on mw152 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:18:22] RECOVERY - mw152 MediaWiki Rendering on mw152 is OK: HTTP OK: HTTP/1.1 200 OK - 8191 bytes in 6.735 second response time [02:19:03] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 21.04, 19.90, 19.05 [02:21:19] PROBLEM - mw151 Current Load on mw151 is CRITICAL: LOAD CRITICAL - total load average: 26.02, 21.74, 21.93 [02:23:03] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 25.45, 22.42, 20.24 [02:25:05] PROBLEM - cp51 Varnish Backends on cp51 is CRITICAL: 1 backends are down. 
mw152 [02:26:48] PROBLEM - cp27 Varnish Backends on cp27 is CRITICAL: 1 backends are down. mw152 [02:27:03] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 19.36, 22.74, 21.05 [02:28:44] RECOVERY - cp27 Varnish Backends on cp27 is OK: All 19 backends are healthy [02:28:55] RECOVERY - cp51 Varnish Backends on cp51 is OK: All 19 backends are healthy [02:30:28] PROBLEM - cloud15 IPMI Sensors on cloud15 is UNKNOWN: ipmi_sdr_cache_open: /root/.freeipmi/sdr-cache/sdr-cache-cloud15.localhost: internal IPMI error-> Execution of /usr/sbin/ipmi-sel failed with return code 1.-> /usr/sbin/ipmi-sel was executed with the following parameters: sudo /usr/sbin/ipmi-sel --output-event-state --interpret-oem-data --entity-sensor-names --sensor-types=all [02:31:03] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 24.01, 21.99, 21.00 [02:31:19] PROBLEM - mw151 Current Load on mw151 is WARNING: LOAD WARNING - total load average: 13.03, 20.19, 22.14 [02:32:00] [Grafana] RESOLVED: PHP-FPM Worker Usage High https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [02:32:28] PROBLEM - cloud15 IPMI Sensors on cloud15 is CRITICAL: IPMI Status: Critical [Inlet Temp = Critical, 393 system event log (SEL) entries present] [02:33:03] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 17.62, 20.50, 20.59 [02:33:19] PROBLEM - mw151 Current Load on mw151 is CRITICAL: LOAD CRITICAL - total load average: 24.81, 22.32, 22.69 [02:35:03] RECOVERY - mw181 Current Load on mw181 is OK: LOAD OK - total load average: 18.22, 19.86, 20.34 [02:37:19] PROBLEM - mw151 Current Load on mw151 is WARNING: LOAD WARNING - total load average: 22.14, 23.32, 23.14 [02:39:19] PROBLEM - mw151 Current Load on mw151 is CRITICAL: LOAD CRITICAL - total load average: 24.82, 23.91, 23.38 [02:41:19] PROBLEM - mw151 Current Load on mw151 is WARNING: LOAD WARNING - total load average: 19.07, 22.34, 22.89 [02:47:25] [Grafana] !tech 
FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [02:49:19] PROBLEM - mw151 Current Load on mw151 is CRITICAL: LOAD CRITICAL - total load average: 24.62, 21.34, 22.01 [02:51:19] PROBLEM - mw151 Current Load on mw151 is WARNING: LOAD WARNING - total load average: 21.06, 21.07, 21.83 [03:02:25] [Grafana] !tech FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1[Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [03:03:19] PROBLEM - mw151 Current Load on mw151 is CRITICAL: LOAD CRITICAL - total load average: 24.07, 22.04, 21.79 [03:03:55] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 5.35, 2.64, 1.10 [03:05:19] PROBLEM - mw151 Current Load on mw151 is WARNING: LOAD WARNING - total load average: 20.22, 21.97, 21.85 [03:05:55] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 3.49, 2.51, 1.22 [03:07:19] PROBLEM - mw151 Current Load on mw151 is CRITICAL: LOAD CRITICAL - total load average: 24.14, 22.82, 22.17 [03:07:25] [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. 
https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1[Grafana] !tech FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1[Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [03:07:57] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 5.76, 3.62, 1.79 [03:08:24] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call [03:09:19] PROBLEM - mw151 Current Load on mw151 is WARNING: LOAD WARNING - total load average: 20.59, 21.66, 21.81 [03:10:21] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 0.073 seconds response time. wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206 [03:11:55] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 3.44, 3.99, 2.43 [03:12:25] [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1[Grafana] !tech FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1[Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [03:13:55] RECOVERY - prometheus151 Current Load on prometheus151 is OK: LOAD OK - total load average: 2.44, 3.20, 2.32 [03:17:25] [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. 
https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1[Grafana] !tech RESOLVED: High Job Queue Backlog https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [03:23:19] RECOVERY - mw151 Current Load on mw151 is OK: LOAD OK - total load average: 11.98, 17.48, 19.91 [03:25:56] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 3.21, 3.56, 2.77 [03:27:25] [Grafana] RESOLVED: PHP-FPM Worker Usage High https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1[Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [03:27:55] RECOVERY - prometheus151 Current Load on prometheus151 is OK: LOAD OK - total load average: 2.34, 3.09, 2.69 [03:30:42] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [03:35:42] [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [03:35:50] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [03:35:53] PROBLEM - mw151 Current Load on mw151 is CRITICAL: LOAD CRITICAL - total load average: 26.98, 22.97, 21.30 [03:38:36] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 2.57, 3.76, 3.14 [03:39:43] PROBLEM - mw151 Current Load on mw151 is WARNING: LOAD WARNING - total load average: 19.38, 22.13, 21.43 [03:40:32] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 4.13, 3.80, 3.22 [03:44:24] RECOVERY - prometheus151 Current Load on prometheus151 is OK: LOAD OK - total load average: 2.09, 3.34, 3.21 [03:45:26] PROBLEM - mw151 Current Load on mw151 is CRITICAL: LOAD CRITICAL - total load average: 25.29, 23.29, 22.01 [03:45:50] [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [03:47:21] PROBLEM - mw151 Current Load on mw151 is WARNING: 
LOAD WARNING - total load average: 23.14, 22.90, 21.99 [03:47:50] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [03:50:12] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 5.23, 4.45, 3.67 [03:52:07] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 3.50, 3.81, 3.51 [03:54:01] RECOVERY - prometheus151 Current Load on prometheus151 is OK: LOAD OK - total load average: 1.49, 3.12, 3.30 [03:55:19] PROBLEM - mw151 Current Load on mw151 is CRITICAL: LOAD CRITICAL - total load average: 24.04, 22.88, 22.16 [03:57:50] [Grafana] !tech FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1[Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [03:59:19] PROBLEM - mw151 Current Load on mw151 is WARNING: LOAD WARNING - total load average: 17.62, 21.21, 21.72 [04:02:50] [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1[Grafana] !tech FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1[Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [04:05:19] PROBLEM - mw151 Current Load on mw151 is CRITICAL: LOAD CRITICAL - total load average: 25.00, 22.34, 21.83 [04:07:19] PROBLEM - mw151 Current Load on mw151 is WARNING: LOAD WARNING - total load average: 22.68, 21.99, 21.75 [04:07:50] [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. 
https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1[Grafana] !tech FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1[Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [04:08:33] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 7.85, 4.39, 3.48 [04:09:16] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call [04:09:19] PROBLEM - mw151 Current Load on mw151 is CRITICAL: LOAD CRITICAL - total load average: 24.48, 22.35, 21.88 [04:11:13] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 0.089 seconds response time. wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206 [04:11:19] PROBLEM - mw151 Current Load on mw151 is WARNING: LOAD WARNING - total load average: 22.13, 22.86, 22.17 [04:12:23] RECOVERY - prometheus151 Current Load on prometheus151 is OK: LOAD OK - total load average: 0.62, 3.03, 3.22 [04:12:50] [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. 
https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1[Grafana] !tech RESOLVED: High Job Queue Backlog https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1[Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [04:17:19] PROBLEM - mw151 Current Load on mw151 is CRITICAL: LOAD CRITICAL - total load average: 24.78, 22.44, 22.08 [04:19:19] PROBLEM - mw151 Current Load on mw151 is WARNING: LOAD WARNING - total load average: 21.01, 21.99, 21.99 [04:21:19] PROBLEM - mw151 Current Load on mw151 is CRITICAL: LOAD CRITICAL - total load average: 25.85, 22.77, 22.22 [04:25:19] PROBLEM - mw151 Current Load on mw151 is WARNING: LOAD WARNING - total load average: 22.16, 22.94, 22.53 [04:27:19] PROBLEM - mw151 Current Load on mw151 is CRITICAL: LOAD CRITICAL - total load average: 24.13, 23.18, 22.66 [04:29:19] PROBLEM - mw151 Current Load on mw151 is WARNING: LOAD WARNING - total load average: 21.91, 22.42, 22.44 [04:35:19] PROBLEM - mw151 Current Load on mw151 is CRITICAL: LOAD CRITICAL - total load average: 26.37, 23.75, 22.90 [04:37:19] PROBLEM - mw151 Current Load on mw151 is WARNING: LOAD WARNING - total load average: 21.08, 22.60, 22.58 [04:39:19] PROBLEM - mw151 Current Load on mw151 is CRITICAL: LOAD CRITICAL - total load average: 25.53, 23.64, 22.95 [04:41:19] PROBLEM - mw151 Current Load on mw151 is WARNING: LOAD WARNING - total load average: 23.55, 23.94, 23.17 [04:47:50] [Grafana] RESOLVED: PHP-FPM Worker Usage High https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [04:55:19] PROBLEM - mw151 Current Load on mw151 is CRITICAL: LOAD CRITICAL - total load average: 27.57, 22.72, 22.19 [04:57:03] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 29.69, 22.67, 18.40 [04:58:11] PROBLEM - mw171 Current Load on mw171 is CRITICAL: LOAD CRITICAL - total load average: 29.25, 22.35, 16.16 [05:01:19] PROBLEM - mw151 Current Load on mw151 is WARNING: LOAD WARNING - total load average: 19.97, 22.62, 22.63 
[05:03:19] PROBLEM - mw151 Current Load on mw151 is CRITICAL: LOAD CRITICAL - total load average: 25.54, 23.98, 23.14 [05:07:19] PROBLEM - mw151 Current Load on mw151 is WARNING: LOAD WARNING - total load average: 20.69, 22.78, 22.93 [05:11:03] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 18.65, 23.78, 22.95 [05:18:11] PROBLEM - mw171 Current Load on mw171 is WARNING: LOAD WARNING - total load average: 13.85, 21.97, 22.87 [05:19:30] [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [05:21:03] RECOVERY - mw181 Current Load on mw181 is OK: LOAD OK - total load average: 13.16, 17.29, 20.31 [05:21:19] RECOVERY - mw151 Current Load on mw151 is OK: LOAD OK - total load average: 16.20, 18.05, 20.04 [05:22:11] RECOVERY - mw171 Current Load on mw171 is OK: LOAD OK - total load average: 10.70, 15.62, 20.03 [05:46:26] PROBLEM - ns2 Puppet on ns2 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. It might be a dependency cycle. 
[05:52:52] PROBLEM - ns2 NTP time on ns2 is UNKNOWN: check_ntp_time: Invalid hostname/address - time.cloudflare.com Usage: check_ntp_time -H [-4|-6] [-w ] [-c ] [-v verbose] [-o ] [05:57:00] RECOVERY - ns2 NTP time on ns2 is OK: NTP OK: Offset -0.000215023756 secs [06:14:30] [Grafana] RESOLVED: PHP-FPM Worker Usage High https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [06:16:27] RECOVERY - ns2 Puppet on ns2 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:16:50] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 23.69, 20.32, 18.16 [06:18:50] RECOVERY - mw182 Current Load on mw182 is OK: LOAD OK - total load average: 18.90, 19.92, 18.29 [06:35:21] PROBLEM - mw151 Current Load on mw151 is CRITICAL: LOAD CRITICAL - total load average: 26.04, 21.29, 19.12 [06:37:19] PROBLEM - mw151 Current Load on mw151 is WARNING: LOAD WARNING - total load average: 20.63, 20.49, 19.08 [06:39:19] PROBLEM - mw151 Current Load on mw151 is CRITICAL: LOAD CRITICAL - total load average: 25.91, 22.14, 19.82 [06:41:19] PROBLEM - mw151 Current Load on mw151 is WARNING: LOAD WARNING - total load average: 23.56, 22.52, 20.24 [06:45:00] [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [06:47:11] PROBLEM - ns2 Puppet on ns2 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. It might be a dependency cycle. [06:49:19] RECOVERY - mw151 Current Load on mw151 is OK: LOAD OK - total load average: 16.14, 19.08, 19.53 [06:55:19] PROBLEM - mw151 Current Load on mw151 is WARNING: LOAD WARNING - total load average: 21.74, 20.42, 19.88 [06:59:19] RECOVERY - mw151 Current Load on mw151 is OK: LOAD OK - total load average: 20.04, 20.25, 19.92 [07:02:29] PROBLEM - cloud16 Puppet on cloud16 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. 
Failed resources (up to 3 shown): Service[ulogd2] [07:03:28] PROBLEM - ns2 NTP time on ns2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:05:28] RECOVERY - ns2 NTP time on ns2 is OK: NTP OK: Offset -0.001509666443 secs [07:15:15] RECOVERY - ns2 Puppet on ns2 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures [07:23:03] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 23.18, 19.39, 17.56 [07:23:31] PROBLEM - mw151 Current Load on mw151 is CRITICAL: LOAD CRITICAL - total load average: 28.28, 23.21, 20.39 [07:25:03] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 29.44, 23.11, 19.15 [07:25:56] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 19.86, 20.68, 18.34 [07:27:52] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 28.02, 23.40, 19.60 [07:29:03] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 21.51, 23.93, 20.51 [07:29:47] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 20.86, 23.02, 19.98 [07:30:29] RECOVERY - cloud16 Puppet on cloud16 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures [07:33:03] RECOVERY - mw181 Current Load on mw181 is OK: LOAD OK - total load average: 14.50, 18.82, 19.23 [07:33:19] PROBLEM - mw151 Current Load on mw151 is WARNING: LOAD WARNING - total load average: 19.92, 23.57, 22.64 [07:33:38] RECOVERY - mw182 Current Load on mw182 is OK: LOAD OK - total load average: 15.21, 19.51, 19.27 [07:45:22] PROBLEM - ns2 Puppet on ns2 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. It might be a dependency cycle. 
[07:51:19] PROBLEM - mw151 Current Load on mw151 is CRITICAL: LOAD CRITICAL - total load average: 24.28, 21.83, 21.43 [07:53:19] PROBLEM - mw151 Current Load on mw151 is WARNING: LOAD WARNING - total load average: 18.52, 20.82, 21.14 [07:55:00] [Grafana] RESOLVED: PHP-FPM Worker Usage High https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [07:55:19] PROBLEM - mw151 Current Load on mw151 is CRITICAL: LOAD CRITICAL - total load average: 27.71, 23.09, 21.89 [08:01:19] PROBLEM - mw151 Current Load on mw151 is WARNING: LOAD WARNING - total load average: 21.93, 23.58, 22.72 [08:03:19] PROBLEM - mw151 Current Load on mw151 is CRITICAL: LOAD CRITICAL - total load average: 25.52, 24.46, 23.16 [08:11:19] PROBLEM - mw151 Current Load on mw151 is WARNING: LOAD WARNING - total load average: 18.98, 22.95, 23.53 [08:13:19] PROBLEM - mw151 Current Load on mw151 is CRITICAL: LOAD CRITICAL - total load average: 25.52, 23.68, 23.69 [08:14:23] RECOVERY - ns2 Puppet on ns2 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures [08:15:19] PROBLEM - mw151 Current Load on mw151 is WARNING: LOAD WARNING - total load average: 22.66, 23.09, 23.48 [08:17:19] PROBLEM - mw151 Current Load on mw151 is CRITICAL: LOAD CRITICAL - total load average: 27.25, 24.24, 23.82 [08:17:21] PROBLEM - mw152 MediaWiki Rendering on mw152 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [08:17:56] PROBLEM - cp27 Varnish Backends on cp27 is CRITICAL: 1 backends are down. mw152 [08:21:34] RECOVERY - mw152 MediaWiki Rendering on mw152 is OK: HTTP OK: HTTP/1.1 200 OK - 8191 bytes in 4.834 second response time [08:21:53] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 21.36, 19.77, 18.75 [08:21:56] RECOVERY - cp27 Varnish Backends on cp27 is OK: All 19 backends are healthy [08:22:30] [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. 
https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [08:23:49] RECOVERY - mw182 Current Load on mw182 is OK: LOAD OK - total load average: 13.80, 17.66, 18.12 [08:25:36] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 20.57, 20.02, 18.62 [08:33:22] RECOVERY - mw181 Current Load on mw181 is OK: LOAD OK - total load average: 14.90, 18.80, 18.88 [08:39:14] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 23.20, 22.36, 20.37 [08:43:07] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 28.20, 24.18, 21.44 [08:45:03] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 20.84, 22.95, 21.34 [08:47:15] PROBLEM - ns2 Puppet on ns2 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. It might be a dependency cycle. [08:53:03] RECOVERY - mw181 Current Load on mw181 is OK: LOAD OK - total load average: 14.16, 18.00, 19.81 [08:59:19] PROBLEM - mw151 Current Load on mw151 is WARNING: LOAD WARNING - total load average: 15.72, 19.49, 22.99 [09:03:19] PROBLEM - mw151 Current Load on mw151 is CRITICAL: LOAD CRITICAL - total load average: 24.62, 20.64, 22.52 [09:06:50] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 25.95, 21.18, 18.41 [09:09:03] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 23.88, 21.09, 19.67 [09:11:03] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 28.68, 23.92, 20.89 [09:11:43] PROBLEM - ns2 NTP time on ns2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:13:48] RECOVERY - ns2 NTP time on ns2 is OK: NTP OK: Offset -0.0007724165916 secs [09:14:11] PROBLEM - mw171 Current Load on mw171 is WARNING: LOAD WARNING - total load average: 22.56, 19.66, 16.12 [09:18:11] RECOVERY - mw171 Current Load on mw171 is OK: LOAD OK - total load average: 20.07, 19.59, 16.87 [09:18:50] PROBLEM 
- mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 19.92, 23.47, 21.77 [09:22:22] PROBLEM - mw172 Current Load on mw172 is WARNING: LOAD WARNING - total load average: 21.79, 19.92, 17.69 [09:24:19] RECOVERY - mw172 Current Load on mw172 is OK: LOAD OK - total load average: 16.64, 18.79, 17.57 [09:27:41] PROBLEM - mw152 MediaWiki Rendering on mw152 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:31:56] RECOVERY - mw152 MediaWiki Rendering on mw152 is OK: HTTP OK: HTTP/1.1 200 OK - 8191 bytes in 6.572 second response time [09:34:50] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 24.77, 21.58, 21.31 [09:36:50] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 21.89, 21.82, 21.45 [09:40:50] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 25.26, 22.75, 21.87 [09:42:50] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 20.68, 22.03, 21.73 [09:43:07] PROBLEM - ns2 NTP time on ns2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:43:29] PROBLEM - cp51 Varnish Backends on cp51 is CRITICAL: 1 backends are down. mw152 [09:43:31] PROBLEM - cp26 Varnish Backends on cp26 is CRITICAL: 1 backends are down. 
mw152 [09:44:50] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 25.21, 23.38, 22.26 [09:45:06] RECOVERY - ns2 NTP time on ns2 is OK: NTP OK: Offset -0.0006036758423 secs [09:45:24] RECOVERY - cp51 Varnish Backends on cp51 is OK: All 19 backends are healthy [09:45:31] RECOVERY - cp26 Varnish Backends on cp26 is OK: All 19 backends are healthy [09:46:12] RECOVERY - ns2 Puppet on ns2 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [09:46:50] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 22.37, 22.90, 22.22 [09:47:56] PROBLEM - cp41 Varnish Backends on cp41 is CRITICAL: 1 backends are down. mw161 [09:48:50] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 26.72, 23.78, 22.58 [09:49:52] RECOVERY - cp41 Varnish Backends on cp41 is OK: All 19 backends are healthy [09:50:50] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 21.72, 22.92, 22.43 [09:54:50] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 28.97, 24.93, 23.22 [09:58:58] PROBLEM - mw161 Current Load on mw161 is CRITICAL: LOAD CRITICAL - total load average: 25.69, 20.60, 16.42 [10:00:55] RECOVERY - mw161 Current Load on mw161 is OK: LOAD OK - total load average: 15.09, 18.37, 16.12 [10:06:50] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 16.52, 22.06, 23.45 [10:10:50] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 27.59, 23.84, 23.77 [10:19:44] PROBLEM - mw162 Current Load on mw162 is CRITICAL: LOAD CRITICAL - total load average: 26.77, 21.60, 18.62 [10:21:41] RECOVERY - mw162 Current Load on mw162 is OK: LOAD OK - total load average: 17.78, 20.04, 18.41 [10:24:50] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 22.53, 22.88, 23.89 [10:31:03] PROBLEM - mw181 Current Load on mw181 is 
WARNING: LOAD WARNING - total load average: 16.08, 18.94, 23.51 [10:32:50] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 28.26, 24.16, 23.75 [10:45:13] PROBLEM - ns2 Puppet on ns2 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. It might be a dependency cycle. [10:46:50] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 16.54, 21.40, 23.55 [10:52:50] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 25.51, 22.44, 23.10 [10:54:50] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 23.50, 22.45, 23.02 [10:58:50] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 28.67, 24.59, 23.65 [11:02:30] [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1[Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [11:02:50] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 21.30, 23.86, 23.65 [11:05:34] PROBLEM - cp51 Varnish Backends on cp51 is CRITICAL: 1 backends are down. 
mw152 [11:07:29] RECOVERY - cp51 Varnish Backends on cp51 is OK: All 19 backends are healthy [11:08:50] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 24.51, 22.76, 23.06 [11:14:50] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 17.80, 22.78, 23.34 [11:15:03] RECOVERY - mw181 Current Load on mw181 is OK: LOAD OK - total load average: 17.28, 19.15, 20.36 [11:15:11] PROBLEM - mw172 Current Load on mw172 is WARNING: LOAD WARNING - total load average: 21.15, 19.32, 18.09 [11:15:50] RECOVERY - ns2 Puppet on ns2 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [11:19:07] RECOVERY - mw172 Current Load on mw172 is OK: LOAD OK - total load average: 16.20, 18.57, 18.18 [11:19:49] PROBLEM - cp26 Varnish Backends on cp26 is CRITICAL: 1 backends are down. mw152 [11:21:43] RECOVERY - cp26 Varnish Backends on cp26 is OK: All 19 backends are healthy [11:22:30] [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1[Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [11:24:59] PROBLEM - mw172 Current Load on mw172 is WARNING: LOAD WARNING - total load average: 21.47, 21.12, 19.37 [11:26:57] RECOVERY - mw172 Current Load on mw172 is OK: LOAD OK - total load average: 18.22, 20.00, 19.18 [11:27:30] [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. 
https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1[Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [11:28:50] RECOVERY - mw182 Current Load on mw182 is OK: LOAD OK - total load average: 14.98, 17.39, 20.09 [11:29:55] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 4.29, 4.03, 2.88 [11:33:56] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 3.90, 3.42, 2.86 [11:35:56] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 5.89, 4.40, 3.29 [11:37:42] PROBLEM - mw172 Current Load on mw172 is WARNING: LOAD WARNING - total load average: 22.66, 22.14, 20.21 [11:37:55] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 2.50, 3.62, 3.15 [11:39:56] RECOVERY - prometheus151 Current Load on prometheus151 is OK: LOAD OK - total load average: 1.60, 2.97, 2.97 [11:41:38] RECOVERY - mw172 Current Load on mw172 is OK: LOAD OK - total load average: 15.59, 19.19, 19.49 [11:42:35] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 23.09, 20.77, 19.71 [11:44:32] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 24.58, 21.99, 20.27 [11:45:58] PROBLEM - mw161 Current Load on mw161 is CRITICAL: LOAD CRITICAL - total load average: 25.39, 19.51, 15.59 [11:46:24] PROBLEM - ns2 Puppet on ns2 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. It might be a dependency cycle. 
[11:48:56] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 6.78, 4.41, 3.58 [11:49:58] PROBLEM - mw161 Current Load on mw161 is WARNING: LOAD WARNING - total load average: 20.00, 20.76, 17.08 [11:50:21] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 21.27, 23.49, 21.70 [11:50:51] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 2.56, 3.43, 3.31 [11:51:58] RECOVERY - mw161 Current Load on mw161 is OK: LOAD OK - total load average: 13.90, 18.23, 16.60 [11:52:46] RECOVERY - prometheus151 Current Load on prometheus151 is OK: LOAD OK - total load average: 1.26, 2.81, 3.11 [11:56:11] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 24.38, 22.99, 21.73 [11:57:30] [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1[Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [11:58:33] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 7.48, 4.68, 3.73 [12:00:04] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 19.29, 22.14, 21.77 [12:00:29] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 2.95, 3.77, 3.50 [12:02:25] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 4.68, 4.19, 3.69 [12:02:53] PROBLEM - prometheus151 SSH on prometheus151 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [12:04:47] RECOVERY - prometheus151 SSH on prometheus151 is OK: SSH OK - OpenSSH_9.2p1 Debian-2+deb12u3 (protocol 2.0) [12:06:14] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 2.79, 3.51, 3.54 [12:07:30] [Grafana] FIRING: 
Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1[Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [12:07:49] RECOVERY - mw181 Current Load on mw181 is OK: LOAD OK - total load average: 17.19, 18.34, 19.98 [12:08:09] RECOVERY - prometheus151 Current Load on prometheus151 is OK: LOAD OK - total load average: 2.53, 2.98, 3.33 [12:14:56] RECOVERY - ns2 Puppet on ns2 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [12:16:57] PROBLEM - cp27 Varnish Backends on cp27 is CRITICAL: 1 backends are down. mw152 [12:17:30] [Grafana] RESOLVED: PHP-FPM Worker Usage High https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1[Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [12:18:52] RECOVERY - cp27 Varnish Backends on cp27 is OK: All 19 backends are healthy [12:28:20] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [12:29:40] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 5.67, 4.02, 3.35 [12:31:35] RECOVERY - prometheus151 Current Load on prometheus151 is OK: LOAD OK - total load average: 1.78, 3.26, 3.16 [12:32:09] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 22.61, 21.80, 20.45 [12:33:20] [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [12:34:06] RECOVERY - mw181 Current Load on mw181 is OK: LOAD OK - total load average: 16.22, 19.58, 19.79 [12:41:50] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [12:46:50] [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [12:48:00] [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM 
workers. https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [12:53:00] [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1[Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [12:57:28] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 22.01, 21.31, 19.84 [12:58:00] [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1[Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [12:58:43] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 7.38, 4.39, 3.24 [13:00:50] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 16.94, 20.79, 19.66 [13:02:50] RECOVERY - mw182 Current Load on mw182 is OK: LOAD OK - total load average: 18.59, 19.82, 19.41 [13:03:00] [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1[Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [13:07:11] RECOVERY - mw181 Current Load on mw181 is OK: LOAD OK - total load average: 14.99, 19.33, 19.91 [13:08:23] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 1.34, 3.69, 3.80 [13:12:12] RECOVERY - prometheus151 Current Load on prometheus151 is OK: LOAD OK - total load average: 1.11, 2.64, 3.37 [13:15:45] PROBLEM - ns2 Puppet on ns2 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. It might be a dependency cycle. 
[13:16:03] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 3.30, 3.67, 3.66 [13:19:56] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 4.61, 3.39, 3.50 [13:21:56] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 2.21, 3.22, 3.45 [13:23:55] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 5.94, 4.25, 3.80 [13:28:00] [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1[Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [13:29:55] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 2.43, 3.53, 3.70 [13:31:55] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 5.28, 4.21, 3.93 [13:35:55] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 3.60, 3.94, 3.91 [13:38:00] [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1[Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [13:39:55] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 4.86, 4.18, 3.97 [13:41:55] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 3.96, 3.72, 3.80 [13:43:00] [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. 
https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1[Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [13:44:30] RECOVERY - ns2 Puppet on ns2 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures [13:45:55] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 4.47, 3.81, 3.80 [13:47:55] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 3.71, 3.44, 3.64 [13:48:00] [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1[Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [13:49:19] PROBLEM - mw151 Current Load on mw151 is WARNING: LOAD WARNING - total load average: 16.16, 21.56, 23.77 [13:51:19] PROBLEM - mw151 Current Load on mw151 is CRITICAL: LOAD CRITICAL - total load average: 25.76, 23.01, 24.01 [13:51:55] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 5.00, 3.77, 3.67 [13:53:00] [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1[Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [13:57:55] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 3.25, 3.43, 3.61 [13:58:00] [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. 
https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1[Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [13:59:55] RECOVERY - prometheus151 Current Load on prometheus151 is OK: LOAD OK - total load average: 2.32, 2.82, 3.35 [14:03:00] [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1[Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [14:03:19] PROBLEM - mw151 Current Load on mw151 is WARNING: LOAD WARNING - total load average: 14.65, 20.89, 23.34 [14:03:55] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 3.13, 3.77, 3.67 [14:04:22] PROBLEM - ns2 NTP time on ns2 is UNKNOWN: check_ntp_time: Invalid hostname/address - time.cloudflare.com Usage: check_ntp_time -H [-4|-6] [-w ] [-c ] [-v verbose] [-o ] [14:05:55] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 5.67, 4.12, 3.79 [14:06:21] RECOVERY - ns2 NTP time on ns2 is OK: NTP OK: Offset 0.0003278255463 secs [14:07:55] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 2.92, 3.36, 3.54 [14:08:00] [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. 
https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1[Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [14:09:55] RECOVERY - prometheus151 Current Load on prometheus151 is OK: LOAD OK - total load average: 2.77, 2.89, 3.33 [14:13:19] RECOVERY - mw151 Current Load on mw151 is OK: LOAD OK - total load average: 17.10, 17.14, 20.10 [14:15:55] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 3.28, 3.55, 3.51 [14:18:00] [Grafana] RESOLVED: PHP-FPM Worker Usage High https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1[Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [14:19:19] PROBLEM - mw151 Current Load on mw151 is WARNING: LOAD WARNING - total load average: 19.85, 20.84, 21.01 [14:19:55] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 5.39, 3.88, 3.62 [14:20:20] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [14:21:32] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call [14:23:19] RECOVERY - mw151 Current Load on mw151 is OK: LOAD OK - total load average: 17.05, 19.04, 20.29 [14:23:55] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 3.16, 3.75, 3.67 [14:25:20] [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [14:25:35] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 0.089 seconds response time. 
wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206 [14:27:19] PROBLEM - mw151 Current Load on mw151 is WARNING: LOAD WARNING - total load average: 22.26, 21.62, 21.07 [14:29:55] RECOVERY - prometheus151 Current Load on prometheus151 is OK: LOAD OK - total load average: 2.06, 2.83, 3.30 [14:35:19] RECOVERY - mw151 Current Load on mw151 is OK: LOAD OK - total load average: 15.24, 18.93, 20.10 [14:36:50] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [14:37:55] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 4.46, 3.46, 3.27 [14:39:55] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 3.87, 3.24, 3.19 [14:41:50] [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [14:41:55] RECOVERY - prometheus151 Current Load on prometheus151 is OK: LOAD OK - total load average: 1.78, 2.93, 3.11 [14:45:41] PROBLEM - mw152 Current Load on mw152 is WARNING: LOAD WARNING - total load average: 15.78, 18.50, 22.94 [14:47:41] PROBLEM - mw152 Current Load on mw152 is CRITICAL: LOAD CRITICAL - total load average: 24.56, 20.48, 23.10 [14:52:50] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [14:56:30] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 6.94, 4.18, 3.33 [14:57:50] [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [14:58:24] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 3.06, 3.43, 3.14 [14:59:41] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [15:00:19] RECOVERY - prometheus151 Current Load on 
prometheus151 is OK: LOAD OK - total load average: 2.05, 3.22, 3.11 [15:04:10]