[00:00:50] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[00:01:43] RECOVERY - prometheus151 Current Load on prometheus151 is OK: LOAD OK - total load average: 2.25, 2.95, 3.33
[00:02:38] PROBLEM - mw162 Current Load on mw162 is WARNING: LOAD WARNING - total load average: 20.94, 20.54, 18.02
[00:04:36] RECOVERY - mw162 Current Load on mw162 is OK: LOAD OK - total load average: 17.96, 19.54, 17.94
[00:05:50] [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[00:10:20] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[00:11:16] PROBLEM - prometheus151 SSH on prometheus151 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:11:43] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 8.66, 5.01, 3.87
[00:12:05] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 18.97, 23.51, 23.53
[00:13:11] RECOVERY - prometheus151 SSH on prometheus151 is OK: SSH OK - OpenSSH_9.2p1 Debian-2+deb12u3 (protocol 2.0)
[00:13:43] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 3.13, 3.92, 3.60
[00:15:20] [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[00:21:45] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 4.89, 3.67, 3.56
[00:23:32] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call
[00:25:20] [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[Grafana] !tech FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[00:25:28] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 0.246 seconds response time. wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206
[00:26:05] RECOVERY - mw182 Current Load on mw182 is OK: LOAD OK - total load average: 13.57, 15.40, 19.62
[00:29:43] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 2.70, 3.50, 3.72
[00:29:52] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 16.50, 19.03, 23.87
[00:30:20] [Grafana] RESOLVED: PHP-FPM Worker Usage High https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[Grafana] !tech FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[00:33:43] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 6.42, 4.43, 3.98
[00:39:43] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 1.78, 3.72, 3.92
[00:40:20] [Grafana] !tech RESOLVED: High Job Queue Backlog https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[00:43:52] RECOVERY - mw181 Current Load on mw181 is OK: LOAD OK - total load average: 15.36, 16.88, 19.84
[00:45:43] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 4.99, 3.46, 3.68
[00:46:49] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[00:47:43] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 2.89, 3.64, 3.75
[00:49:32] RECOVERY - www.dovearchives.wiki - LetsEncrypt on sslhost is OK: OK - Certificate 'www.dovearchives.wiki' will expire on Tue 29 Oct 2024 10:49:57 PM GMT +0000.
[00:49:43] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 4.95, 3.95, 3.84
[00:50:47] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 21.37, 20.86, 20.60
[00:51:01] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call
[00:53:43] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 3.53, 3.97, 3.93
[00:54:43] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 26.91, 23.21, 21.53
[00:55:43] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 5.72, 4.60, 4.16
[00:56:36] PROBLEM - prometheus151 SSH on prometheus151 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:58:30] RECOVERY - prometheus151 SSH on prometheus151 is OK: SSH OK - OpenSSH_9.2p1 Debian-2+deb12u3 (protocol 2.0)
[00:59:15] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 0.066 seconds response time. wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206
[01:00:38] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 19.91, 23.14, 22.24
[01:01:49] [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[01:03:43] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 0.34, 2.54, 3.57
[01:05:43] RECOVERY - prometheus151 Current Load on prometheus151 is OK: LOAD OK - total load average: 0.12, 1.72, 3.14
[01:22:17] RECOVERY - mw181 Current Load on mw181 is OK: LOAD OK - total load average: 18.64, 18.86, 20.13
[01:26:12] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 22.40, 22.17, 21.26
[01:36:03] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 27.93, 23.54, 21.99
[01:38:01] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 22.80, 22.83, 21.89
[01:43:55] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 25.50, 22.71, 21.88
[02:16:05] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 28.09, 21.90, 17.84
[02:17:31] PROBLEM - cp26 Varnish Backends on cp26 is CRITICAL: 1 backends are down. mw181
[02:19:22] PROBLEM - mw162 Current Load on mw162 is WARNING: LOAD WARNING - total load average: 21.87, 19.06, 15.30
[02:19:29] RECOVERY - cp26 Varnish Backends on cp26 is OK: All 19 backends are healthy
[02:19:45] PROBLEM - cloud15 IPMI Sensors on cloud15 is UNKNOWN: ipmi_sdr_cache_open: /root/.freeipmi/sdr-cache/sdr-cache-cloud15.localhost: internal IPMI error-> Execution of /usr/sbin/ipmi-sel failed with return code 1.-> /usr/sbin/ipmi-sel was executed with the following parameters: sudo /usr/sbin/ipmi-sel --output-event-state --interpret-oem-data --entity-sensor-names --sensor-types=all
[02:20:06] PROBLEM - mw152 Current Load on mw152 is CRITICAL: LOAD CRITICAL - total load average: 27.44, 20.58, 15.56
[02:21:22] PROBLEM - mw162 Current Load on mw162 is CRITICAL: LOAD CRITICAL - total load average: 24.31, 21.61, 16.73
[02:21:42] PROBLEM - cloud15 IPMI Sensors on cloud15 is CRITICAL: IPMI Status: Critical [442 system event log (SEL) entries present]
[02:23:22] PROBLEM - mw162 Current Load on mw162 is WARNING: LOAD WARNING - total load average: 23.98, 22.44, 17.64
[02:25:22] PROBLEM - mw162 Current Load on mw162 is CRITICAL: LOAD CRITICAL - total load average: 25.21, 22.69, 18.27
[02:26:06] PROBLEM - mw152 Current Load on mw152 is WARNING: LOAD WARNING - total load average: 22.27, 23.35, 18.72
[02:27:22] PROBLEM - mw162 Current Load on mw162 is WARNING: LOAD WARNING - total load average: 18.96, 21.38, 18.34
[02:27:25] [Grafana] !tech FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[02:28:06] RECOVERY - mw152 Current Load on mw152 is OK: LOAD OK - total load average: 15.09, 20.34, 18.18
[02:32:26] PROBLEM - phorge171 issue-tracker.miraheze.org HTTPS on phorge171 is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 328 bytes in 0.013 second response time
[02:32:53] PROBLEM - phorge171 php-fpm on phorge171 is CRITICAL: PROCS CRITICAL: 0 processes with command name 'php-fpm8.2'
[02:32:57] PROBLEM - phorge171 phorge-static.wikitide.net HTTPS on phorge171 is CRITICAL: HTTP CRITICAL - Invalid HTTP response received from host on port 443: HTTP/1.1 502 Bad Gateway
[02:33:22] PROBLEM - mw162 Current Load on mw162 is CRITICAL: LOAD CRITICAL - total load average: 25.88, 22.53, 19.68
[02:34:26] PROBLEM - cp26 Varnish Backends on cp26 is CRITICAL: 1 backends are down. mw181
[02:36:06] PROBLEM - mw152 Current Load on mw152 is CRITICAL: LOAD CRITICAL - total load average: 25.75, 21.66, 19.28
[02:36:23] RECOVERY - cp26 Varnish Backends on cp26 is OK: All 19 backends are healthy
[02:38:17] PROBLEM - mw172 Current Load on mw172 is CRITICAL: LOAD CRITICAL - total load average: 27.30, 21.62, 18.11
[02:38:26] RECOVERY - phorge171 issue-tracker.miraheze.org HTTPS on phorge171 is OK: HTTP OK: HTTP/1.1 200 OK - 19644 bytes in 0.069 second response time
[02:38:32] PROBLEM - mw171 Current Load on mw171 is CRITICAL: LOAD CRITICAL - total load average: 28.63, 22.83, 18.36
[02:38:53] RECOVERY - phorge171 php-fpm on phorge171 is OK: PROCS OK: 9 processes with command name 'php-fpm8.2'
[02:38:57] RECOVERY - phorge171 phorge-static.wikitide.net HTTPS on phorge171 is OK: HTTP OK: Status line output matched "HTTP/1.1 200" - 17718 bytes in 0.036 second response time
[02:39:54] PROBLEM - mw161 Current Load on mw161 is CRITICAL: LOAD CRITICAL - total load average: 27.39, 22.53, 18.19
[02:42:25] [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[Grafana] !tech FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[02:44:06] PROBLEM - cp51 Varnish Backends on cp51 is CRITICAL: 1 backends are down. mw181
[02:44:18] PROBLEM - cp26 Varnish Backends on cp26 is CRITICAL: 2 backends are down. mw181 mw182
[02:44:39] PROBLEM - cp27 Varnish Backends on cp27 is CRITICAL: 1 backends are down. mw181
[02:44:55] PROBLEM - cp41 Varnish Backends on cp41 is CRITICAL: 1 backends are down. mw181
[02:46:03] RECOVERY - cp51 Varnish Backends on cp51 is OK: All 19 backends are healthy
[02:46:16] RECOVERY - cp26 Varnish Backends on cp26 is OK: All 19 backends are healthy
[02:46:35] RECOVERY - cp27 Varnish Backends on cp27 is OK: All 19 backends are healthy
[02:46:54] RECOVERY - cp41 Varnish Backends on cp41 is OK: All 19 backends are healthy
[02:49:43] PROBLEM - mw161 Current Load on mw161 is WARNING: LOAD WARNING - total load average: 16.19, 22.58, 21.02
[02:49:59] PROBLEM - cp51 Varnish Backends on cp51 is CRITICAL: 1 backends are down. mw181
[02:51:41] RECOVERY - mw161 Current Load on mw161 is OK: LOAD OK - total load average: 13.83, 19.60, 20.12
[02:51:56] RECOVERY - cp51 Varnish Backends on cp51 is OK: All 19 backends are healthy
[02:51:58] PROBLEM - mw172 Current Load on mw172 is WARNING: LOAD WARNING - total load average: 13.11, 21.59, 22.03
[02:52:06] PROBLEM - mw152 Current Load on mw152 is WARNING: LOAD WARNING - total load average: 12.99, 20.98, 22.39
[02:52:32] PROBLEM - mw171 Current Load on mw171 is WARNING: LOAD WARNING - total load average: 10.88, 19.62, 21.25
[02:53:22] PROBLEM - mw162 Current Load on mw162 is WARNING: LOAD WARNING - total load average: 13.72, 20.93, 23.50
[02:54:32] RECOVERY - mw171 Current Load on mw171 is OK: LOAD OK - total load average: 10.07, 16.42, 19.88
[02:55:53] RECOVERY - mw172 Current Load on mw172 is OK: LOAD OK - total load average: 10.34, 15.60, 19.50
[02:56:06] RECOVERY - mw152 Current Load on mw152 is OK: LOAD OK - total load average: 13.66, 16.56, 20.24
[02:59:22] RECOVERY - mw162 Current Load on mw162 is OK: LOAD OK - total load average: 10.56, 14.19, 19.60
[03:00:05] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 15.28, 18.47, 23.08
[03:08:05] RECOVERY - mw182 Current Load on mw182 is OK: LOAD OK - total load average: 16.87, 16.08, 19.91
[03:15:31] PROBLEM - cloud15 IPMI Sensors on cloud15 is UNKNOWN: ipmi_sdr_cache_open: /root/.freeipmi/sdr-cache/sdr-cache-cloud15.localhost: internal IPMI error-> Execution of /usr/sbin/ipmi-sel failed with return code 1.-> /usr/sbin/ipmi-sel was executed with the following parameters: sudo /usr/sbin/ipmi-sel --output-event-state --interpret-oem-data --entity-sensor-names --sensor-types=all
[03:17:33] PROBLEM - cloud15 IPMI Sensors on cloud15 is CRITICAL: IPMI Status: Critical [442 system event log (SEL) entries present]
[03:37:25] [Grafana] RESOLVED: PHP-FPM Worker Usage High https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[Grafana] !tech FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[03:38:13] [mw-config] The-Voidwalker pushed 1 commit to master [+0/-0/±1] https://github.com/miraheze/mw-config/compare/cd8c00be1a19...66301932f893
[03:38:14] [mw-config] The-Voidwalker 6630193 - Math install no longer requires sql
[03:39:11] miraheze/mw-config - The-Voidwalker the build passed.
[03:39:33] !log [@mwtask181] starting deploy of {'config': True} to all
[03:39:46] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[03:39:46] !log [@mwtask181] finished deploy of {'config': True} to all - SUCCESS in 13s
[03:40:04] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[03:53:27] !log [@test151] starting deploy of {'config': True} to test151
[03:53:28] !log [@test151] finished deploy of {'config': True} to test151 - SUCCESS in 0s
[03:53:38] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[03:53:46] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[04:07:30] !log [@mwtask171] starting deploy of {'config': True} to all
[04:07:36] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[04:07:42] !log [@mwtask171] finished deploy of {'config': True} to all - SUCCESS in 12s
[04:07:54] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[04:11:31] PROBLEM - cloud15 IPMI Sensors on cloud15 is UNKNOWN: ipmi_sdr_cache_open: /root/.freeipmi/sdr-cache/sdr-cache-cloud15.localhost: internal IPMI error-> Execution of /usr/sbin/ipmi-sel failed with return code 1.-> /usr/sbin/ipmi-sel was executed with the following parameters: sudo /usr/sbin/ipmi-sel --output-event-state --interpret-oem-data --entity-sensor-names --sensor-types=all
[04:12:25] [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[Grafana] !tech FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[04:13:33] PROBLEM - cloud15 IPMI Sensors on cloud15 is CRITICAL: IPMI Status: Critical [442 system event log (SEL) entries present]
[04:22:25] [Grafana] RESOLVED: PHP-FPM Worker Usage High https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[Grafana] !tech RESOLVED: High Job Queue Backlog https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[04:29:24] PROBLEM - ping6 on cp26 is CRITICAL: PING CRITICAL - Packet loss = 28%, RTA = 178.94 ms
[04:31:24] RECOVERY - ping6 on cp26 is OK: PING OK - Packet loss = 0%, RTA = 178.87 ms
[05:01:50] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[05:03:30] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call
[05:03:31] PROBLEM - cloud15 IPMI Sensors on cloud15 is UNKNOWN: ipmi_sdr_cache_open: /root/.freeipmi/sdr-cache/sdr-cache-cloud15.localhost: internal IPMI error-> Execution of /usr/sbin/ipmi-sel failed with return code 1.-> /usr/sbin/ipmi-sel was executed with the following parameters: sudo /usr/sbin/ipmi-sel --output-event-state --interpret-oem-data --entity-sensor-names --sensor-types=all
[05:03:43] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 6.81, 3.42, 1.40
[05:04:19] PROBLEM - prometheus151 SSH on prometheus151 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[05:05:26] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 0.076 seconds response time. wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206
[05:05:30] PROBLEM - ping6 on cp26 is CRITICAL: PING CRITICAL - Packet loss = 16%, RTA = 179.62 ms
[05:05:33] PROBLEM - cloud15 IPMI Sensors on cloud15 is CRITICAL: IPMI Status: Critical [442 system event log (SEL) entries present]
[05:06:13] RECOVERY - prometheus151 SSH on prometheus151 is OK: SSH OK - OpenSSH_9.2p1 Debian-2+deb12u3 (protocol 2.0)
[05:06:50] [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[05:07:30] RECOVERY - ping6 on cp26 is OK: PING OK - Packet loss = 0%, RTA = 179.00 ms
[05:08:20] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[05:09:50] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call
[05:11:39] PROBLEM - ping6 on cp26 is CRITICAL: PING CRITICAL - Packet loss = 16%, RTA = 179.59 ms
[05:13:20] [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[05:13:39] RECOVERY - ping6 on cp26 is OK: PING OK - Packet loss = 0%, RTA = 180.02 ms
[05:13:43] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 1.51, 3.44, 2.70
[05:15:43] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[05:15:58] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 0.079 seconds response time. wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206
[05:17:43] RECOVERY - prometheus151 Current Load on prometheus151 is OK: LOAD OK - total load average: 2.71, 3.37, 2.88
[05:19:52] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 16.86, 20.96, 23.95
[05:20:43] [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[05:21:20] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[05:21:52] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 25.59, 22.42, 24.08
[05:23:52] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 23.89, 22.58, 23.93
[05:25:53] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 26.61, 24.06, 24.31
[05:26:20] [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[05:40:46] PROBLEM - cp41 Puppet on cp41 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/home/reception]
[06:04:53] RECOVERY - cp41 Puppet on cp41 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures
[06:16:05] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 29.23, 22.10, 17.68
[06:18:06] PROBLEM - mw152 Current Load on mw152 is CRITICAL: LOAD CRITICAL - total load average: 28.53, 20.64, 15.26
[06:18:32] PROBLEM - mw171 Current Load on mw171 is CRITICAL: LOAD CRITICAL - total load average: 28.11, 21.64, 16.26
[06:19:22] PROBLEM - mw162 Current Load on mw162 is CRITICAL: LOAD CRITICAL - total load average: 29.20, 21.35, 15.65
[06:21:16] PROBLEM - mw161 Current Load on mw161 is WARNING: LOAD WARNING - total load average: 23.21, 18.91, 14.76
[06:21:38] PROBLEM - mw172 Current Load on mw172 is WARNING: LOAD WARNING - total load average: 23.13, 19.88, 15.21
[06:23:35] RECOVERY - mw172 Current Load on mw172 is OK: LOAD OK - total load average: 17.55, 19.26, 15.55
[06:24:32] PROBLEM - mw171 Current Load on mw171 is WARNING: LOAD WARNING - total load average: 17.70, 22.62, 18.86
[06:27:16] RECOVERY - mw161 Current Load on mw161 is OK: LOAD OK - total load average: 15.51, 18.91, 16.30
[06:28:06] PROBLEM - mw152 Current Load on mw152 is WARNING: LOAD WARNING - total load average: 22.12, 23.88, 19.98
[06:28:32] PROBLEM - mw171 Current Load on mw171 is CRITICAL: LOAD CRITICAL - total load average: 28.36, 24.99, 20.63
[06:30:32] PROBLEM - mw171 Current Load on mw171 is WARNING: LOAD WARNING - total load average: 18.99, 22.69, 20.33
[06:31:22] PROBLEM - mw162 Current Load on mw162 is WARNING: LOAD WARNING - total load average: 19.15, 23.30, 20.66
[06:32:53] PROBLEM - phorge171 php-fpm on phorge171 is CRITICAL: PROCS CRITICAL: 0 processes with command name 'php-fpm8.2'
[06:32:57] PROBLEM - phorge171 phorge-static.wikitide.net HTTPS on phorge171 is CRITICAL: HTTP CRITICAL - Invalid HTTP response received from host on port 443: HTTP/1.1 502 Bad Gateway
[06:34:06] PROBLEM - mw152 Current Load on mw152 is CRITICAL: LOAD CRITICAL - total load average: 25.22, 22.28, 20.37
[06:34:26] PROBLEM - phorge171 issue-tracker.miraheze.org HTTPS on phorge171 is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 328 bytes in 0.013 second response time
[06:34:30] [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[06:34:32] PROBLEM - mw171 Current Load on mw171 is CRITICAL: LOAD CRITICAL - total load average: 25.81, 23.63, 21.17
[06:35:29] PROBLEM - ns2 NTP time on ns2 is UNKNOWN: check_ntp_time: Invalid hostname/address - time.cloudflare.com
Usage: check_ntp_time -H [-4|-6] [-w ] [-c ] [-v verbose] [-o