[00:00:37] PROBLEM - mw161 Current Load on mw161 is CRITICAL: LOAD CRITICAL - total load average: 26.02, 22.95, 20.57
[00:00:53] PROBLEM - mw162 Current Load on mw162 is CRITICAL: LOAD CRITICAL - total load average: 26.46, 23.77, 21.76
[00:03:50] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call
[00:04:13] RECOVERY - cloud15 Puppet on cloud15 is OK: OK: Puppet is currently enabled, last run 27 seconds ago with 0 failures
[00:06:03] PROBLEM - prometheus151 Puppet on prometheus151 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 60 seconds.
[00:06:37] PROBLEM - mw161 Current Load on mw161 is WARNING: LOAD WARNING - total load average: 20.78, 23.81, 22.01
[00:07:48] PROBLEM - mw171 Current Load on mw171 is WARNING: LOAD WARNING - total load average: 17.77, 22.97, 22.97
[00:08:30] [Grafana] FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
           [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
           [Grafana] FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[00:08:44] PROBLEM - mw162 Current Load on mw162 is WARNING: LOAD WARNING - total load average: 19.48, 23.07, 22.71
[00:09:19] PROBLEM - mw152 Current Load on mw152 is WARNING: LOAD WARNING - total load average: 21.48, 23.14, 23.00
[00:11:40] PROBLEM - mw172 Current Load on mw172 is WARNING: LOAD WARNING - total load average: 20.15, 23.20, 23.26
[00:12:01] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 5.802 seconds response time. wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206
[00:13:30] [Grafana] FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
           [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
           [Grafana] RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[00:14:10] RECOVERY - prometheus151 Puppet on prometheus151 is OK: OK: Puppet is currently enabled, last run 29 minutes ago with 0 failures
[00:14:37] RECOVERY - mw161 Current Load on mw161 is OK: LOAD OK - total load average: 17.74, 18.84, 20.24
[00:18:30] [Grafana] FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
           [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
           [Grafana] FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[00:20:32] RECOVERY - mw162 Current Load on mw162 is OK: LOAD OK - total load average: 13.41, 16.64, 19.80
[00:21:48] RECOVERY - mw171 Current Load on mw171 is OK: LOAD OK - total load average: 18.46, 18.75, 20.37
[00:23:30] [Grafana] FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
           [Grafana] RESOLVED: PHP-FPM Worker Usage High https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
           [Grafana] RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[00:25:48] PROBLEM - mw171 Current Load on mw171 is WARNING: LOAD WARNING - total load average: 20.75, 20.31, 20.69
[00:27:40] RECOVERY - mw172 Current Load on mw172 is OK: LOAD OK - total load average: 15.64, 17.84, 19.90
[00:27:48] RECOVERY - mw171 Current Load on mw171 is OK: LOAD OK - total load average: 15.70, 18.90, 20.15
[00:29:02] RECOVERY - mw152 Current Load on mw152 is OK: LOAD OK - total load average: 17.09, 18.42, 20.27
[00:31:48] PROBLEM - mw171 Current Load on mw171 is WARNING: LOAD WARNING - total load average: 22.75, 20.16, 20.30
[00:32:47] PROBLEM - prometheus151 Puppet on prometheus151 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 60 seconds.
[00:33:02] PROBLEM - mw152 Current Load on mw152 is WARNING: LOAD WARNING - total load average: 20.82, 20.09, 20.58
[00:33:08] PROBLEM - prometheus151 SSH on prometheus151 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:33:30] [Grafana] FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
           [Grafana] FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[00:34:49] RECOVERY - prometheus151 Puppet on prometheus151 is OK: OK: Puppet is currently enabled, last run 11 minutes ago with 0 failures
[00:35:01] RECOVERY - prometheus151 SSH on prometheus151 is OK: SSH OK - OpenSSH_9.2p1 Debian-2+deb12u3 (protocol 2.0)
[00:35:02] RECOVERY - mw152 Current Load on mw152 is OK: LOAD OK - total load average: 17.95, 19.35, 20.26
[00:35:48] RECOVERY - mw171 Current Load on mw171 is OK: LOAD OK - total load average: 16.69, 19.01, 19.86
[00:38:30] [Grafana] FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
           [Grafana] RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[00:40:35] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call
[00:41:47] PROBLEM - prometheus151 Puppet on prometheus151 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 60 seconds.
[00:44:32] RECOVERY - prometheus151 Puppet on prometheus151 is OK: OK: Puppet is currently enabled, last run 21 minutes ago with 0 failures
[00:44:34] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 0.106 seconds response time. wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206
[00:49:40] PROBLEM - mw172 Current Load on mw172 is WARNING: LOAD WARNING - total load average: 21.12, 20.30, 19.72
[00:51:40] RECOVERY - mw172 Current Load on mw172 is OK: LOAD OK - total load average: 18.12, 19.45, 19.47
[00:52:12] PROBLEM - mw171 Current Load on mw171 is CRITICAL: LOAD CRITICAL - total load average: 25.96, 20.84, 19.87
[00:54:06] RECOVERY - mw171 Current Load on mw171 is OK: LOAD OK - total load average: 19.44, 20.27, 19.79
[00:55:27] PROBLEM - mw152 Current Load on mw152 is WARNING: LOAD WARNING - total load average: 22.17, 20.28, 19.34
[00:55:40] PROBLEM - mw172 Current Load on mw172 is CRITICAL: LOAD CRITICAL - total load average: 25.64, 22.38, 20.64
[00:57:23] PROBLEM - mw152 Current Load on mw152 is CRITICAL: LOAD CRITICAL - total load average: 24.44, 20.85, 19.60
[00:58:10] PROBLEM - mw162 Current Load on mw162 is WARNING: LOAD WARNING - total load average: 22.90, 21.17, 18.76
[00:58:30] [Grafana] FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
           [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[00:59:28] PROBLEM - mw161 Current Load on mw161 is WARNING: LOAD WARNING - total load average: 20.73, 20.80, 19.12
[00:59:40] PROBLEM - mw172 Current Load on mw172 is WARNING: LOAD WARNING - total load average: 18.93, 22.33, 21.14
[00:59:48] PROBLEM - mw171 Current Load on mw171 is WARNING: LOAD WARNING - total load average: 22.15, 22.30, 20.76
[01:01:40] PROBLEM - mw172 Current Load on mw172 is CRITICAL: LOAD CRITICAL - total load average: 27.09, 24.09, 21.91
[01:01:48] PROBLEM - mw171 Current Load on mw171 is CRITICAL: LOAD CRITICAL - total load average: 27.60, 24.36, 21.71
[01:02:10] PROBLEM - mw162 Current Load on mw162 is CRITICAL: LOAD CRITICAL - total load average: 26.94, 23.18, 20.05
[01:03:00] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call
[01:03:12] PROBLEM - prometheus151 SSH on prometheus151 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[01:03:37] PROBLEM - mw151 Current Load on mw151 is WARNING: LOAD WARNING - total load average: 22.53, 18.57, 15.40
[01:05:07] RECOVERY - prometheus151 SSH on prometheus151 is OK: SSH OK - OpenSSH_9.2p1 Debian-2+deb12u3 (protocol 2.0)
[01:05:13] PROBLEM - mw161 Current Load on mw161 is CRITICAL: LOAD CRITICAL - total load average: 24.05, 22.86, 20.49
[01:05:40] PROBLEM - mw172 Current Load on mw172 is WARNING: LOAD WARNING - total load average: 20.25, 23.63, 22.34
[01:05:48] PROBLEM - mw171 Current Load on mw171 is WARNING: LOAD WARNING - total load average: 20.81, 23.53, 21.98
[01:06:10] PROBLEM - mw162 Current Load on mw162 is WARNING: LOAD WARNING - total load average: 21.01, 21.94, 20.21
[01:07:40] PROBLEM - mw172 Current Load on mw172 is CRITICAL: LOAD CRITICAL - total load average: 28.56, 25.47, 23.16
[01:09:48] PROBLEM - mw171 Current Load on mw171 is CRITICAL: LOAD CRITICAL - total load average: 28.64, 25.77, 23.24
[01:10:58] PROBLEM - mw161 Current Load on mw161 is WARNING: LOAD WARNING - total load average: 23.75, 23.21, 21.36
[01:11:37] RECOVERY - mw151 Current Load on mw151 is OK: LOAD OK - total load average: 18.86, 19.65, 17.42
[01:12:10] PROBLEM - mw162 Current Load on mw162 is CRITICAL: LOAD CRITICAL - total load average: 28.23, 24.61, 21.85
[01:12:53] PROBLEM - mw161 Current Load on mw161 is CRITICAL: LOAD CRITICAL - total load average: 27.44, 24.59, 22.07
[01:13:30] [Grafana] FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
           [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
           [Grafana] FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[01:14:30] PROBLEM - prometheus151 SSH on prometheus151 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[01:17:16] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 0.094 seconds response time. wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206
[01:20:34] RECOVERY - prometheus151 SSH on prometheus151 is OK: SSH OK - OpenSSH_9.2p1 Debian-2+deb12u3 (protocol 2.0)
[01:24:51] PROBLEM - prometheus151 SSH on prometheus151 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[01:25:22] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call
[01:26:37] PROBLEM - mw161 Current Load on mw161 is WARNING: LOAD WARNING - total load average: 16.74, 21.17, 22.47
[01:26:50] RECOVERY - prometheus151 SSH on prometheus151 is OK: SSH OK - OpenSSH_9.2p1 Debian-2+deb12u3 (protocol 2.0)
[01:27:23] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 7.362 seconds response time. wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206
[01:27:40] PROBLEM - mw172 Current Load on mw172 is WARNING: LOAD WARNING - total load average: 21.87, 23.26, 23.96
[01:28:10] PROBLEM - mw162 Current Load on mw162 is WARNING: LOAD WARNING - total load average: 18.47, 22.13, 23.44
[01:28:30] [Grafana] RESOLVED: High Job Queue Backlog https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
           [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
           [Grafana] RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[01:31:02] PROBLEM - mw152 Current Load on mw152 is WARNING: LOAD WARNING - total load average: 20.33, 22.40, 23.98
[01:32:56] PROBLEM - prometheus151 Puppet on prometheus151 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 60 seconds.
[01:33:30] [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
           [Grafana] FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[01:33:40] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call
[01:34:37] RECOVERY - mw161 Current Load on mw161 is OK: LOAD OK - total load average: 16.22, 17.36, 20.18
[01:34:57] RECOVERY - prometheus151 Puppet on prometheus151 is OK: OK: Puppet is currently enabled, last run 8 minutes ago with 0 failures
[01:35:48] PROBLEM - mw171 Current Load on mw171 is WARNING: LOAD WARNING - total load average: 12.82, 19.55, 23.08
[01:38:30] [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
           [Grafana] RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[01:39:41] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 0.065 seconds response time. wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206
[01:41:40] RECOVERY - mw172 Current Load on mw172 is OK: LOAD OK - total load average: 13.74, 16.73, 20.06
[01:45:48] RECOVERY - mw171 Current Load on mw171 is OK: LOAD OK - total load average: 15.19, 17.65, 20.40
[01:46:10] RECOVERY - mw162 Current Load on mw162 is OK: LOAD OK - total load average: 15.03, 16.76, 19.77
[01:47:02] RECOVERY - mw152 Current Load on mw152 is OK: LOAD OK - total load average: 11.46, 15.91, 19.46
[01:51:01] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 2.22, 2.79, 3.90
[01:53:48] PROBLEM - mw171 Current Load on mw171 is CRITICAL: LOAD CRITICAL - total load average: 25.78, 20.00, 19.98
[01:55:02] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 4.72, 3.55, 3.96
[01:55:48] PROBLEM - mw171 Current Load on mw171 is WARNING: LOAD WARNING - total load average: 21.74, 20.44, 20.15
[01:56:53] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call
[01:57:48] RECOVERY - mw171 Current Load on mw171 is OK: LOAD OK - total load average: 20.38, 19.64, 19.85
[01:58:47] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 0.092 seconds response time. wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206
[02:01:01] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 1.19, 3.08, 3.75
[02:01:47] PROBLEM - mw152 Current Load on mw152 is CRITICAL: LOAD CRITICAL - total load average: 26.47, 20.13, 18.99
[02:04:10] PROBLEM - mw162 Current Load on mw162 is CRITICAL: LOAD CRITICAL - total load average: 25.32, 21.88, 19.74
[02:04:36] PROBLEM - mw171 Current Load on mw171 is CRITICAL: LOAD CRITICAL - total load average: 26.87, 22.64, 20.83
[02:05:01] RECOVERY - prometheus151 Current Load on prometheus151 is OK: LOAD OK - total load average: 1.62, 2.51, 3.37
[02:05:39] PROBLEM - mw152 Current Load on mw152 is WARNING: LOAD WARNING - total load average: 23.40, 22.07, 20.06
[02:06:10] PROBLEM - mw162 Current Load on mw162 is WARNING: LOAD WARNING - total load average: 20.12, 21.10, 19.73
[02:08:10] PROBLEM - mw162 Current Load on mw162 is CRITICAL: LOAD CRITICAL - total load average: 26.37, 22.64, 20.42
[02:08:24] PROBLEM - mw171 Current Load on mw171 is WARNING: LOAD WARNING - total load average: 22.88, 23.17, 21.47
[02:10:18] PROBLEM - mw171 Current Load on mw171 is CRITICAL: LOAD CRITICAL - total load average: 25.19, 23.59, 21.80
[02:10:37] PROBLEM - mw161 Current Load on mw161 is CRITICAL: LOAD CRITICAL - total load average: 25.61, 21.41, 18.78
[02:11:20] PROBLEM - mw172 Current Load on mw172 is WARNING: LOAD WARNING - total load average: 23.94, 23.08, 20.88
[02:13:23] PROBLEM - mw152 Current Load on mw152 is CRITICAL: LOAD CRITICAL - total load average: 28.55, 24.76, 21.88
[02:13:30] [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
           [Grafana] FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[02:13:54] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call
[02:13:56] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 7.28, 5.05, 4.04
[02:14:37] PROBLEM - mw161 Current Load on mw161 is WARNING: LOAD WARNING - total load average: 23.46, 22.69, 19.92
[02:15:49] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 1.298 second response time. wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206
[02:15:58] PROBLEM - prometheus151 Puppet on prometheus151 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 60 seconds.
[02:17:15] PROBLEM - mw172 Current Load on mw172 is CRITICAL: LOAD CRITICAL - total load average: 25.33, 24.27, 22.14
[02:17:54] PROBLEM - mw171 Current Load on mw171 is WARNING: LOAD WARNING - total load average: 17.33, 22.82, 22.46
[02:18:19] PROBLEM - prometheus151 SSH on prometheus151 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:18:30] [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
           [Grafana] RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[02:20:12] RECOVERY - prometheus151 SSH on prometheus151 is OK: SSH OK - OpenSSH_9.2p1 Debian-2+deb12u3 (protocol 2.0)
[02:21:11] PROBLEM - mw172 Current Load on mw172 is WARNING: LOAD WARNING - total load average: 23.11, 23.69, 22.36
[02:21:48] PROBLEM - mw171 Current Load on mw171 is CRITICAL: LOAD CRITICAL - total load average: 24.51, 23.08, 22.61
[02:26:10] PROBLEM - mw162 Current Load on mw162 is WARNING: LOAD WARNING - total load average: 23.72, 23.63, 23.10
[02:26:37] RECOVERY - mw161 Current Load on mw161 is OK: LOAD OK - total load average: 16.53, 19.50, 19.93
[02:27:02] RECOVERY - prometheus151 Puppet on prometheus151 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[02:27:02] PROBLEM - mw152 Current Load on mw152 is WARNING: LOAD WARNING - total load average: 23.31, 23.98, 23.29
[02:27:48] PROBLEM - mw171 Current Load on mw171 is WARNING: LOAD WARNING - total load average: 23.58, 23.85, 23.21
[02:30:10] PROBLEM - mw162 Current Load on mw162 is CRITICAL: LOAD CRITICAL - total load average: 27.96, 24.42, 23.40
[02:32:10] PROBLEM - mw162 Current Load on mw162 is WARNING: LOAD WARNING - total load average: 18.52, 21.86, 22.58
[02:33:30] [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
           [Grafana] FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[02:36:57] RECOVERY - mw172 Current Load on mw172 is OK: LOAD OK - total load average: 13.43, 17.56, 20.09
[02:38:10] RECOVERY - mw162 Current Load on mw162 is OK: LOAD OK - total load average: 13.04, 17.23, 20.38
[02:38:30] [Grafana] RESOLVED: PHP-FPM Worker Usage High https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
           [Grafana] RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[02:39:02] RECOVERY - mw152 Current Load on mw152 is OK: LOAD OK - total load average: 14.15, 16.60, 19.99
[02:39:48] RECOVERY - mw171 Current Load on mw171 is OK: LOAD OK - total load average: 15.15, 17.19, 20.27
[02:42:25] [Grafana] FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[02:42:30] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call
[02:44:30] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 0.227 seconds response time. wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206
[02:47:25] [Grafana] RESOLVED: High Job Queue Backlog https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[02:47:32] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 17.28, 20.12, 23.27
[02:53:14] PROBLEM - prometheus151 SSH on prometheus151 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:53:54] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call
[02:55:07] RECOVERY - prometheus151 SSH on prometheus151 is OK: SSH OK - OpenSSH_9.2p1 Debian-2+deb12u3 (protocol 2.0)
[02:55:47] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 0.077 seconds response time. wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206
[02:59:32] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 25.39, 22.50, 22.33
[03:01:58] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call
[03:03:32] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 21.80, 22.61, 22.39
[03:03:52] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 0.137 seconds response time. wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206
[03:04:22] [Grafana] FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[03:07:32] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 27.84, 23.66, 22.73
[03:09:22] [Grafana] RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[03:11:32] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 22.51, 23.91, 23.15
[03:13:01] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 3.21, 3.28, 3.89
[03:16:06] PROBLEM - mw171 Current Load on mw171 is CRITICAL: LOAD CRITICAL - total load average: 25.92, 21.29, 18.87
[03:16:18] PROBLEM - mw172 Current Load on mw172 is CRITICAL: LOAD CRITICAL - total load average: 24.74, 20.87, 18.70
[03:16:37] PROBLEM - mw161 Current Load on mw161 is CRITICAL: LOAD CRITICAL - total load average: 28.08, 21.46, 17.95
[03:16:47] PROBLEM - mw162 Current Load on mw162 is WARNING: LOAD WARNING - total load average: 23.57, 21.49, 18.85
[03:17:02] PROBLEM - mw152 Current Load on mw152 is CRITICAL: LOAD CRITICAL - total load average: 28.69, 23.53, 19.60
[03:18:16] PROBLEM - mw172 Current Load on mw172 is WARNING: LOAD WARNING - total load average: 23.48, 22.04, 19.42
[03:19:32] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 27.67, 24.63, 23.54
[03:20:15] PROBLEM - mw172 Current Load on mw172 is CRITICAL: LOAD CRITICAL - total load average: 26.86, 23.79, 20.36
[03:20:42] PROBLEM - mw162 Current Load on mw162 is CRITICAL: LOAD CRITICAL - total load average: 26.48, 23.76, 20.30
[03:22:13] PROBLEM - mw172 Current Load on mw172 is WARNING: LOAD WARNING - total load average: 21.23, 23.13, 20.56
[03:22:40] PROBLEM - mw162 Current Load on mw162 is WARNING: LOAD WARNING - total load average: 23.85, 23.83, 20.76
[03:23:01] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 5.03, 3.34, 3.50
[03:23:48] PROBLEM - mw171 Current Load on mw171 is WARNING: LOAD WARNING - total load average: 22.44, 23.32, 21.03
[03:24:38] PROBLEM - mw162 Current Load on mw162 is CRITICAL: LOAD CRITICAL - total load average: 27.28, 24.93, 21.51
[03:24:53] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call
[03:25:05] PROBLEM - prometheus151 SSH on prometheus151 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[03:26:47] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 0.224 seconds response time. wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206
[03:26:59] RECOVERY - prometheus151 SSH on prometheus151 is OK: SSH OK - OpenSSH_9.2p1 Debian-2+deb12u3 (protocol 2.0)
[03:28:34] PROBLEM - mw162 Current Load on mw162 is WARNING: LOAD WARNING - total load average: 20.34, 24.00, 22.04
[03:29:48] PROBLEM - mw171 Current Load on mw171 is CRITICAL: LOAD CRITICAL - total load average: 26.30, 24.03, 22.08
[03:30:00] [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[03:30:05] PROBLEM - mw172 Current Load on mw172 is CRITICAL: LOAD CRITICAL - total load average: 26.04, 23.57, 21.69
[03:30:32] PROBLEM - mw162 Current Load on mw162 is CRITICAL: LOAD CRITICAL - total load average: 27.41, 25.33, 22.75
[03:33:01] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 2.67, 3.71, 3.88
[03:35:04] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 5.44, 4.44, 4.13
[03:37:02] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 2.19, 3.62, 3.88
[03:39:02] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 6.00, 4.48, 4.16
[03:40:21] PROBLEM - mw162 Current Load on mw162 is WARNING: LOAD WARNING - total load average: 21.62, 23.84, 23.49
[03:41:01] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 2.20, 3.69, 3.92
[03:41:48] PROBLEM - mw171 Current Load on mw171 is WARNING: LOAD WARNING - total load average: 22.68, 23.90, 23.44
[03:42:19] PROBLEM - mw162 Current Load on mw162 is CRITICAL: LOAD CRITICAL - total load average: 22.26, 24.03, 23.64
[03:43:53] PROBLEM - mw172 Current Load on mw172 is WARNING: LOAD WARNING - total load average: 22.98, 23.65, 23.37
[03:44:37] PROBLEM - mw161 Current Load on mw161 is WARNING: LOAD WARNING - total load average: 20.91, 22.98, 23.44
[03:45:48] PROBLEM - mw171 Current Load on mw171 is CRITICAL: LOAD CRITICAL - total load average: 28.58, 25.08, 23.89
[03:45:51] PROBLEM - mw172 Current Load on mw172 is CRITICAL: LOAD CRITICAL - total load average: 24.98, 23.82, 23.44
[03:46:15] PROBLEM - mw162 Current Load on mw162 is WARNING: LOAD WARNING - total load average: 22.66, 23.92, 23.71
[03:47:49] PROBLEM - mw172 Current Load on mw172 is WARNING: LOAD WARNING - total load average: 20.72, 23.16, 23.29
[03:49:02] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 5.54, 3.70, 3.73
[03:49:02] PROBLEM - mw152 Current Load on mw152 is WARNING: LOAD WARNING - total load average: 18.79, 22.70, 23.74
[03:49:48] PROBLEM - mw171 Current Load on mw171 is WARNING: LOAD WARNING - total load average: 16.72, 22.06, 23.06
[03:52:37] RECOVERY - mw161 Current Load on mw161 is OK: LOAD OK - total load average: 13.20, 15.62, 19.73
[03:53:44] RECOVERY - mw172 Current Load on mw172 is OK: LOAD OK - total load average: 13.98, 16.34, 20.10
[03:54:10] RECOVERY - mw162 Current Load on mw162 is OK: LOAD OK - total load average: 12.40, 16.01, 20.01
[03:55:01] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 3.25, 3.83, 3.90
[03:59:48] PROBLEM - mw171 Current Load on mw171 is CRITICAL: LOAD CRITICAL - total load average: 24.70, 20.48, 21.09
[04:01:02] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 5.56, 3.89, 3.77
[04:01:48] PROBLEM - mw171 Current Load on mw171 is WARNING: LOAD WARNING - total load average: 18.99, 19.18, 20.51
[04:03:01] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 2.24, 3.46, 3.65
[04:03:02] RECOVERY - mw152 Current Load on mw152 is OK: LOAD OK - total load average: 20.22, 18.98, 20.40
[04:03:48] RECOVERY - mw171 Current Load on mw171 is OK: LOAD OK - total load average: 14.85, 17.26, 19.63
[04:05:02] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 5.56, 4.13, 3.86
[04:05:32] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 19.43, 21.88, 23.83
[04:07:02] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 2.81, 3.74, 3.77
[04:09:02] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 5.52, 4.37, 3.99
[04:10:49] PROBLEM - prometheus151 SSH on prometheus151 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:11:32] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 27.53, 23.44, 23.70
[04:13:48] PROBLEM - mw171 Current Load on mw171 is WARNING: LOAD WARNING - total load average: 21.85, 19.81, 19.43
[04:15:48] PROBLEM - mw171 Current Load on mw171 is CRITICAL: LOAD CRITICAL - total load average: 27.98, 22.79, 20.56
[04:16:24] PROBLEM - mw172 Current Load on mw172 is CRITICAL: LOAD CRITICAL - total load average: 26.72, 21.85, 19.84
[04:16:49] PROBLEM - mw162 Current Load on mw162 is CRITICAL: LOAD CRITICAL - total load average: 25.65, 22.40, 20.02
[04:16:54] RECOVERY - prometheus151 SSH on prometheus151 is OK: SSH OK - OpenSSH_9.2p1 Debian-2+deb12u3 (protocol 2.0)
[04:17:02] PROBLEM - mw152 Current Load on mw152 is CRITICAL: LOAD CRITICAL - total load average: 27.17, 22.92, 20.83
[04:17:48] PROBLEM - mw171 Current Load on mw171 is WARNING: LOAD WARNING - total load average: 23.10, 22.69, 20.81
[04:19:02] PROBLEM - mw152 Current Load on mw152 is WARNING: LOAD WARNING - total load average: 22.18, 22.69, 21.02
[04:19:03] PROBLEM - prometheus151 Puppet on prometheus151 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 60 seconds.
[04:19:48] PROBLEM - mw171 Current Load on mw171 is CRITICAL: LOAD CRITICAL - total load average: 25.25, 23.40, 21.28
[04:21:02] PROBLEM - mw152 Current Load on mw152 is CRITICAL: LOAD CRITICAL - total load average: 24.77, 23.24, 21.40
[04:21:37] RECOVERY - prometheus151 Puppet on prometheus151 is OK: OK: Puppet is currently enabled, last run 36 minutes ago with 0 failures
[04:22:42] PROBLEM - mw162 Current Load on mw162 is WARNING: LOAD WARNING - total load average: 22.55, 23.69, 21.40
[04:23:19] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call
[04:23:48] PROBLEM - mw171 Current Load on mw171 is WARNING: LOAD WARNING - total load average: 17.51, 22.69, 21.64
[04:24:04] PROBLEM - mw161 Current Load on mw161 is WARNING: LOAD WARNING - total load average: 22.84, 22.52, 20.41
[04:24:17] PROBLEM - mw172 Current Load on mw172 is WARNING: LOAD WARNING - total load average: 19.21, 22.66, 21.42
[04:25:02] PROBLEM - mw152 Current Load on mw152 is WARNING: LOAD WARNING - total load average: 22.73, 23.68, 22.11
[04:28:35] PROBLEM - prometheus151 Puppet on prometheus151 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 60 seconds.
[04:29:23] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 3.556 seconds response time. wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206
[04:29:48] RECOVERY - mw171 Current Load on mw171 is OK: LOAD OK - total load average: 17.13, 18.64, 20.15
[04:29:49] RECOVERY - mw161 Current Load on mw161 is OK: LOAD OK - total load average: 17.81, 19.36, 19.71
[04:30:00] [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
           [Grafana] RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[04:30:11] RECOVERY - mw172 Current Load on mw172 is OK: LOAD OK - total load average: 11.81, 16.80, 19.38
[04:32:32] RECOVERY - mw162 Current Load on mw162 is OK: LOAD OK - total load average: 13.18, 18.36, 20.04
[04:33:02] RECOVERY - mw152 Current Load on mw152 is OK: LOAD OK - total load average: 12.13, 17.63, 20.02
[04:35:32] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 18.38, 20.43, 23.34
[04:36:38] RECOVERY - prometheus151 Puppet on prometheus151 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[04:40:53] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call
[04:42:47] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 0.129 seconds response time. wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206
[04:43:32] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 24.58, 21.97, 22.78
[04:45:32] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 18.44, 20.68, 22.22
[04:51:11] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call
[04:53:05] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 0.077 seconds response time. wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206
[04:57:32] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 24.52, 21.56, 21.63
[04:59:15] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call
[04:59:32] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 23.84, 21.89, 21.72
[05:01:32] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 26.73, 23.27, 22.22
[05:01:47] PROBLEM - prometheus151 Puppet on prometheus151 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 60 seconds.
[05:04:10] PROBLEM - mw162 Current Load on mw162 is WARNING: LOAD WARNING - total load average: 21.85, 18.57, 16.73
[05:04:13] RECOVERY - prometheus151 Puppet on prometheus151 is OK: OK: Puppet is currently enabled, last run 16 minutes ago with 0 failures
[05:05:00] [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
           [Grafana] FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[05:08:10] PROBLEM - mw162 Current Load on mw162 is CRITICAL: LOAD CRITICAL - total load average: 26.89, 21.33, 18.14
[05:09:28] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 4.057 seconds response time. wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206
[05:10:00] [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
           [Grafana] RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[05:10:10] PROBLEM - mw162 Current Load on mw162 is WARNING: LOAD WARNING - total load average: 21.87, 20.71, 18.27
[05:14:10] RECOVERY - mw162 Current Load on mw162 is OK: LOAD OK - total load average: 16.70, 19.57, 18.47
[05:14:36] PROBLEM - prometheus151 SSH on prometheus151 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[05:16:32] RECOVERY - prometheus151 SSH on prometheus151 is OK: SSH OK - OpenSSH_9.2p1 Debian-2+deb12u3 (protocol 2.0)
[05:43:02] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 3.79, 3.43, 3.99
[05:44:48] PROBLEM - prometheus151 SSH on prometheus151 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[05:45:01] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 7.36, 4.84, 4.44
[05:45:32] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 19.82, 22.64, 23.93
[05:48:53] RECOVERY - prometheus151 SSH on prometheus151 is OK: SSH OK - OpenSSH_9.2p1 Debian-2+deb12u3 (protocol 2.0)
[05:49:02] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 2.72, 3.37, 3.90
[05:51:01] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 4.18, 3.95, 4.06
[05:53:32] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 24.39, 22.00, 22.97
[05:55:01] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 3.15, 3.29, 3.75
[05:55:32] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 21.11, 21.73, 22.76
[05:57:01] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 5.66, 4.19, 4.02
[05:57:32] PROBLEM - mw182 Current Load on
mw182 is CRITICAL: LOAD CRITICAL - total load average: 25.29, 22.46, 22.86 [05:57:58] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call [05:59:52] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 0.077 seconds response time. wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206 [06:01:32] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 23.37, 23.47, 23.22 [06:03:03] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 2.80, 3.64, 3.93 [06:03:32] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 24.73, 24.22, 23.54 [06:05:32] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 21.90, 23.95, 23.56 [06:07:02] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 5.85, 4.27, 4.10 [06:07:32] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 26.24, 24.40, 23.74 [06:09:10] PROBLEM - mw162 Current Load on mw162 is CRITICAL: LOAD CRITICAL - total load average: 26.19, 22.13, 18.63 [06:10:47] PROBLEM - mw172 Current Load on mw172 is WARNING: LOAD WARNING - total load average: 23.69, 20.63, 18.98 [06:11:48] PROBLEM - mw171 Current Load on mw171 is WARNING: LOAD WARNING - total load average: 23.36, 20.57, 18.23 [06:11:56] PROBLEM - mw152 Current Load on mw152 is WARNING: LOAD WARNING - total load average: 21.77, 21.30, 18.92 [06:13:48] PROBLEM - mw171 Current Load on mw171 is CRITICAL: LOAD CRITICAL - total load average: 25.12, 22.37, 19.19 [06:19:00] PROBLEM - mw162 Current Load on mw162 is WARNING: LOAD WARNING - total load average: 23.17, 23.46, 21.44 [06:19:48] PROBLEM - mw171 Current Load on mw171 is WARNING: LOAD WARNING - total load average: 21.57, 22.69, 20.49 [06:20:38] RECOVERY - mw172 Current 
Load on mw172 is OK: LOAD OK - total load average: 12.97, 18.60, 19.19 [06:21:25] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call [06:23:19] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 0.373 seconds response time. wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206 [06:23:31] RECOVERY - mw152 Current Load on mw152 is OK: LOAD OK - total load average: 14.46, 18.49, 19.20 [06:23:48] RECOVERY - mw171 Current Load on mw171 is OK: LOAD OK - total load average: 13.29, 18.43, 19.33 [06:24:53] RECOVERY - mw162 Current Load on mw162 is OK: LOAD OK - total load average: 11.20, 16.64, 19.22 [06:25:32] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 16.54, 21.69, 23.92 [06:29:32] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 28.31, 23.19, 23.80 [06:37:07] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call [06:37:52] PROBLEM - prometheus151 Puppet on prometheus151 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 60 seconds. [06:39:01] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 0.243 seconds response time. wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206 [06:43:08] PROBLEM - prometheus151 SSH on prometheus151 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:43:18] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call [06:45:02] RECOVERY - prometheus151 SSH on prometheus151 is OK: SSH OK - OpenSSH_9.2p1 Debian-2+deb12u3 (protocol 2.0) [06:45:12] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 0.067 seconds response time. 
wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206 [06:45:32] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 18.95, 22.43, 23.81 [06:49:32] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 24.89, 22.96, 23.66 [06:51:32] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 23.89, 23.06, 23.60 [06:51:56] RECOVERY - prometheus151 Puppet on prometheus151 is OK: OK: Puppet is currently enabled, last run 28 minutes ago with 0 failures [06:53:26] PROBLEM - prometheus151 SSH on prometheus151 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:53:32] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 29.11, 25.32, 24.37 [06:55:00] [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1[Grafana] FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [06:55:30] RECOVERY - prometheus151 SSH on prometheus151 is OK: SSH OK - OpenSSH_9.2p1 Debian-2+deb12u3 (protocol 2.0) [06:59:47] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call [07:00:00] [Grafana] RESOLVED: PHP-FPM Worker Usage High https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1[Grafana] RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [07:00:11] PROBLEM - mw162 Current Load on mw162 is WARNING: LOAD WARNING - total load average: 21.25, 21.14, 19.09 [07:01:41] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 0.074 seconds response time. 
wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206 [07:02:10] RECOVERY - mw162 Current Load on mw162 is OK: LOAD OK - total load average: 17.63, 20.06, 18.96 [07:04:11] PROBLEM - mw152 Current Load on mw152 is WARNING: LOAD WARNING - total load average: 21.72, 20.74, 18.85 [07:06:06] RECOVERY - mw152 Current Load on mw152 is OK: LOAD OK - total load average: 17.85, 19.74, 18.71 [07:06:18] PROBLEM - mw171 Current Load on mw171 is CRITICAL: LOAD CRITICAL - total load average: 27.70, 22.02, 19.49 [07:08:10] PROBLEM - mw162 Current Load on mw162 is CRITICAL: LOAD CRITICAL - total load average: 28.67, 23.81, 20.70 [07:08:37] PROBLEM - mw161 Current Load on mw161 is WARNING: LOAD WARNING - total load average: 23.54, 20.43, 17.98 [07:08:58] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call [07:10:02] PROBLEM - mw152 Current Load on mw152 is CRITICAL: LOAD CRITICAL - total load average: 24.38, 23.04, 20.38 [07:10:45] PROBLEM - mw172 Current Load on mw172 is CRITICAL: LOAD CRITICAL - total load average: 29.30, 24.94, 21.07 [07:10:52] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 0.079 seconds response time. 
wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206 [07:12:37] PROBLEM - mw161 Current Load on mw161 is CRITICAL: LOAD CRITICAL - total load average: 27.20, 22.97, 19.51 [07:13:54] PROBLEM - mw171 Current Load on mw171 is WARNING: LOAD WARNING - total load average: 21.13, 23.91, 21.67 [07:14:37] PROBLEM - mw161 Current Load on mw161 is WARNING: LOAD WARNING - total load average: 19.20, 21.20, 19.28 [07:16:10] PROBLEM - mw162 Current Load on mw162 is WARNING: LOAD WARNING - total load average: 18.18, 23.13, 21.86 [07:16:37] RECOVERY - mw161 Current Load on mw161 is OK: LOAD OK - total load average: 16.83, 19.84, 19.03 [07:16:40] PROBLEM - mw172 Current Load on mw172 is WARNING: LOAD WARNING - total load average: 16.23, 22.70, 21.66 [07:17:46] PROBLEM - mw152 Current Load on mw152 is WARNING: LOAD WARNING - total load average: 16.14, 21.60, 21.21 [07:18:48] PROBLEM - prometheus151 SSH on prometheus151 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:19:06] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call [07:19:48] RECOVERY - mw171 Current Load on mw171 is OK: LOAD OK - total load average: 13.16, 17.89, 19.88 [07:20:36] RECOVERY - mw172 Current Load on mw172 is OK: LOAD OK - total load average: 16.96, 19.14, 20.38 [07:20:44] RECOVERY - prometheus151 SSH on prometheus151 is OK: SSH OK - OpenSSH_9.2p1 Debian-2+deb12u3 (protocol 2.0) [07:21:00] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 0.523 seconds response time. wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206 [07:21:37] RECOVERY - mw152 Current Load on mw152 is OK: LOAD OK - total load average: 15.28, 18.33, 19.98 [07:24:10] RECOVERY - mw162 Current Load on mw162 is OK: LOAD OK - total load average: 16.59, 18.69, 20.15 [07:28:30] [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. 
https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [07:29:24] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call [07:31:18] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 0.082 seconds response time. wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206 [07:45:01] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 3.20, 3.31, 3.89 [07:47:02] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 4.12, 3.66, 3.95 [07:49:23] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call [07:51:01] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 2.22, 3.71, 3.96 [07:51:32] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 19.12, 22.37, 23.84 [07:51:48] PROBLEM - mw171 Current Load on mw171 is WARNING: LOAD WARNING - total load average: 20.70, 19.28, 18.33 [07:53:21] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 0.988 seconds response time. wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206 [07:53:30] [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. 
https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1[Grafana] RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [07:53:48] RECOVERY - mw171 Current Load on mw171 is OK: LOAD OK - total load average: 17.75, 18.61, 18.20 [07:55:02] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 5.14, 3.91, 3.95 [07:57:32] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 26.41, 22.68, 23.22 [07:57:40] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call [07:59:12] PROBLEM - prometheus151 Puppet on prometheus151 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 60 seconds. [08:01:13] RECOVERY - prometheus151 Puppet on prometheus151 is OK: OK: Puppet is currently enabled, last run 11 minutes ago with 0 failures [08:01:46] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 8.900 seconds response time. 
wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206 [08:01:48] PROBLEM - mw171 Current Load on mw171 is CRITICAL: LOAD CRITICAL - total load average: 28.81, 22.98, 19.96 [08:02:54] PROBLEM - mw172 Current Load on mw172 is CRITICAL: LOAD CRITICAL - total load average: 27.79, 22.41, 19.08 [08:03:12] PROBLEM - mw152 Current Load on mw152 is CRITICAL: LOAD CRITICAL - total load average: 28.84, 22.44, 19.02 [08:03:47] PROBLEM - mw161 Current Load on mw161 is WARNING: LOAD WARNING - total load average: 23.05, 20.29, 18.18 [08:04:10] PROBLEM - mw162 Current Load on mw162 is CRITICAL: LOAD CRITICAL - total load average: 28.94, 24.20, 19.95 [08:05:37] PROBLEM - mw151 Current Load on mw151 is WARNING: LOAD WARNING - total load average: 23.83, 19.60, 15.69 [08:05:42] PROBLEM - mw161 Current Load on mw161 is CRITICAL: LOAD CRITICAL - total load average: 27.60, 22.46, 19.18 [08:07:37] PROBLEM - mw151 Current Load on mw151 is CRITICAL: LOAD CRITICAL - total load average: 27.66, 22.38, 17.18 [08:09:33] PROBLEM - mw161 Current Load on mw161 is WARNING: LOAD WARNING - total load average: 22.26, 23.80, 20.58 [08:09:49] PROBLEM - prometheus151 SSH on prometheus151 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [08:09:51] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call [08:11:52] RECOVERY - prometheus151 SSH on prometheus151 is OK: SSH OK - OpenSSH_9.2p1 Debian-2+deb12u3 (protocol 2.0) [08:12:10] PROBLEM - mw162 Current Load on mw162 is WARNING: LOAD WARNING - total load average: 22.68, 23.72, 21.54 [08:13:23] PROBLEM - mw161 Current Load on mw161 is CRITICAL: LOAD CRITICAL - total load average: 24.24, 24.60, 21.66 [08:15:18] PROBLEM - mw161 Current Load on mw161 is WARNING: LOAD WARNING - total load average: 19.13, 23.76, 21.77 [08:17:13] PROBLEM - mw161 Current Load on mw161 is CRITICAL: LOAD CRITICAL - total load average: 27.58, 25.28, 22.55 [08:17:37] 
PROBLEM - mw151 Current Load on mw151 is WARNING: LOAD WARNING - total load average: 18.06, 23.38, 20.90 [08:18:10] PROBLEM - mw162 Current Load on mw162 is CRITICAL: LOAD CRITICAL - total load average: 26.58, 24.44, 22.46 [08:18:30] [Grafana] FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1[Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [08:19:59] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 0.069 seconds response time. wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206 [08:21:37] RECOVERY - mw151 Current Load on mw151 is OK: LOAD OK - total load average: 17.96, 20.08, 20.06 [08:23:30] [Grafana] FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1[Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1[Grafana] RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [08:25:02] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 2.63, 3.34, 4.00 [08:27:01] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 3.70, 3.67, 4.04 [08:28:30] [Grafana] RESOLVED: High Job Queue Backlog https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1[Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. 
https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [08:34:10] PROBLEM - mw162 Current Load on mw162 is WARNING: LOAD WARNING - total load average: 19.48, 23.19, 23.76 [08:34:37] PROBLEM - mw161 Current Load on mw161 is WARNING: LOAD WARNING - total load average: 19.34, 22.73, 23.77 [08:36:58] PROBLEM - prometheus151 SSH on prometheus151 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [08:37:48] PROBLEM - mw171 Current Load on mw171 is WARNING: LOAD WARNING - total load average: 22.53, 22.67, 23.98 [08:38:10] PROBLEM - mw162 Current Load on mw162 is CRITICAL: LOAD CRITICAL - total load average: 27.29, 24.24, 23.97 [08:38:30] [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1[Grafana] FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [08:38:37] PROBLEM - mw161 Current Load on mw161 is CRITICAL: LOAD CRITICAL - total load average: 28.34, 24.52, 24.16 [08:39:12] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call [08:40:10] PROBLEM - mw162 Current Load on mw162 is WARNING: LOAD WARNING - total load average: 23.08, 23.52, 23.74 [08:40:37] PROBLEM - mw161 Current Load on mw161 is WARNING: LOAD WARNING - total load average: 18.69, 22.06, 23.30 [08:43:10] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 0.404 seconds response time. wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206 [08:43:23] PROBLEM - cloud18 Puppet on cloud18 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): Service[ulogd2] [08:43:30] [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. 
https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1[Grafana] RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [08:44:10] PROBLEM - mw162 Current Load on mw162 is CRITICAL: LOAD CRITICAL - total load average: 25.52, 23.34, 23.49 [08:46:10] PROBLEM - mw162 Current Load on mw162 is WARNING: LOAD WARNING - total load average: 17.70, 21.64, 22.88 [08:47:14] RECOVERY - prometheus151 SSH on prometheus151 is OK: SSH OK - OpenSSH_9.2p1 Debian-2+deb12u3 (protocol 2.0) [08:47:27] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call [08:48:12] PROBLEM - mw172 Current Load on mw172 is WARNING: LOAD WARNING - total load average: 20.04, 21.81, 23.81 [08:48:30] [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1[Grafana] FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [08:49:25] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 4.327 seconds response time. wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206 [08:50:37] RECOVERY - mw161 Current Load on mw161 is OK: LOAD OK - total load average: 17.89, 17.15, 20.03 [08:51:02] PROBLEM - mw152 Current Load on mw152 is WARNING: LOAD WARNING - total load average: 23.27, 23.01, 23.87 [08:51:48] PROBLEM - mw171 Current Load on mw171 is CRITICAL: LOAD CRITICAL - total load average: 27.62, 23.66, 23.41 [08:52:09] PROBLEM - mw172 Current Load on mw172 is CRITICAL: LOAD CRITICAL - total load average: 26.96, 23.55, 24.00 [08:52:10] PROBLEM - mw162 Current Load on mw162 is CRITICAL: LOAD CRITICAL - total load average: 24.98, 21.95, 22.43 [08:53:02] PROBLEM - mw152 Current Load on mw152 is CRITICAL: LOAD CRITICAL - total load average: 24.94, 23.83, 24.08 [08:53:30] [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. 
https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1[Grafana] RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [08:57:01] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 1.86, 2.69, 3.80 [08:57:02] PROBLEM - mw152 Current Load on mw152 is WARNING: LOAD WARNING - total load average: 20.07, 23.28, 23.92 [08:58:37] PROBLEM - mw161 Current Load on mw161 is CRITICAL: LOAD CRITICAL - total load average: 26.86, 22.20, 21.23 [08:59:02] PROBLEM - mw152 Current Load on mw152 is CRITICAL: LOAD CRITICAL - total load average: 27.62, 24.92, 24.43 [09:01:02] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 4.50, 3.62, 3.92 [09:03:30] [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1[Grafana] FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [09:03:37] PROBLEM - mw151 Current Load on mw151 is CRITICAL: LOAD CRITICAL - total load average: 28.01, 21.72, 18.10 [09:04:48] PROBLEM - prometheus151 SSH on prometheus151 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:06:42] RECOVERY - prometheus151 SSH on prometheus151 is OK: SSH OK - OpenSSH_9.2p1 Debian-2+deb12u3 (protocol 2.0) [09:07:02] PROBLEM - prometheus151 Puppet on prometheus151 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 60 seconds. 
[09:07:37] PROBLEM - mw151 Current Load on mw151 is WARNING: LOAD WARNING - total load average: 21.92, 22.53, 19.31
[09:09:24] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call
[09:10:59] PROBLEM - prometheus151 SSH on prometheus151 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[09:11:23] RECOVERY - cloud18 Puppet on cloud18 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[09:11:37] RECOVERY - mw151 Current Load on mw151 is OK: LOAD OK - total load average: 15.20, 18.97, 18.61
[09:13:30] [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1[Grafana] RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[09:15:01] RECOVERY - prometheus151 SSH on prometheus151 is OK: SSH OK - OpenSSH_9.2p1 Debian-2+deb12u3 (protocol 2.0)
[09:18:30] [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1[Grafana] FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[09:19:15] PROBLEM - db151 Current Load on db151 is CRITICAL: LOAD CRITICAL - total load average: 68.28, 33.65, 13.87
[09:19:18] PROBLEM - db151 PowerDNS Recursor on db151 is CRITICAL: CRITICAL - Plugin timed out while executing system call
[09:19:18] PROBLEM - prometheus151 SSH on prometheus151 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[09:19:38] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 4.604 seconds response time. wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206
[09:20:37] PROBLEM - mw161 Current Load on mw161 is WARNING: LOAD WARNING - total load average: 16.44, 21.03, 23.13
[09:21:41] PROBLEM - mw172 Current Load on mw172 is WARNING: LOAD WARNING - total load average: 17.32, 20.85, 23.90
[09:23:17] PROBLEM - db151 Puppet on db151 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 60 seconds.
[09:23:48] PROBLEM - mw171 Current Load on mw171 is WARNING: LOAD WARNING - total load average: 17.15, 20.17, 23.21
[09:23:55] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call
[09:25:02] PROBLEM - mw152 Current Load on mw152 is WARNING: LOAD WARNING - total load average: 23.70, 22.66, 23.99
[09:25:17] RECOVERY - db151 Puppet on db151 is OK: OK: Puppet is currently enabled, last run 39 minutes ago with 0 failures
[09:25:19] RECOVERY - db151 PowerDNS Recursor on db151 is OK: DNS OK: 0.360 seconds response time. wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206
[09:25:24] RECOVERY - prometheus151 SSH on prometheus151 is OK: SSH OK - OpenSSH_9.2p1 Debian-2+deb12u3 (protocol 2.0)
[09:26:10] PROBLEM - mw162 Current Load on mw162 is WARNING: LOAD WARNING - total load average: 14.71, 19.81, 23.17
[09:27:41] RECOVERY - prometheus151 Puppet on prometheus151 is OK: OK: Puppet is currently enabled, last run 39 minutes ago with 0 failures
[09:27:48] PROBLEM - mw171 Current Load on mw171 is CRITICAL: LOAD CRITICAL - total load average: 26.98, 22.20, 23.21
[09:27:52] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 0.126 seconds response time. wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206
[09:28:30] [Grafana] FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1[Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1[Grafana] FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[09:29:40] PROBLEM - mw172 Current Load on mw172 is CRITICAL: LOAD CRITICAL - total load average: 25.25, 21.70, 22.88
[09:30:37] PROBLEM - mw161 Current Load on mw161 is CRITICAL: LOAD CRITICAL - total load average: 25.63, 22.21, 22.18
[09:31:02] PROBLEM - mw152 Current Load on mw152 is CRITICAL: LOAD CRITICAL - total load average: 28.72, 24.83, 24.29
[09:31:48] PROBLEM - mw171 Current Load on mw171 is WARNING: LOAD WARNING - total load average: 23.70, 23.10, 23.36
[09:34:03] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call
[09:34:37] PROBLEM - mw161 Current Load on mw161 is WARNING: LOAD WARNING - total load average: 19.69, 21.82, 22.11
[09:37:40] PROBLEM - mw172 Current Load on mw172 is WARNING: LOAD WARNING - total load average: 20.75, 22.35, 22.86
[09:37:48] PROBLEM - mw171 Current Load on mw171 is CRITICAL: LOAD CRITICAL - total load average: 27.32, 23.60, 23.25
[09:38:02] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 1.283 second response time. wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206
[09:38:10] PROBLEM - mw162 Current Load on mw162 is CRITICAL: LOAD CRITICAL - total load average: 26.39, 22.43, 22.35
[09:38:30] [Grafana] FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1[Grafana] RESOLVED: PHP-FPM Worker Usage High https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1[Grafana] RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[09:38:37] PROBLEM - mw161 Current Load on mw161 is CRITICAL: LOAD CRITICAL - total load average: 25.88, 22.13, 22.05
[09:39:02] PROBLEM - mw152 Current Load on mw152 is WARNING: LOAD WARNING - total load average: 17.15, 22.86, 23.87
[09:39:04] PROBLEM - prometheus151 Puppet on prometheus151 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 60 seconds.
[09:39:48] PROBLEM - mw171 Current Load on mw171 is WARNING: LOAD WARNING - total load average: 22.00, 22.73, 22.98
[09:40:37] PROBLEM - mw161 Current Load on mw161 is WARNING: LOAD WARNING - total load average: 23.37, 21.84, 21.92
[09:41:02] PROBLEM - mw152 Current Load on mw152 is CRITICAL: LOAD CRITICAL - total load average: 25.09, 23.67, 24.02
[09:41:14] PROBLEM - db151 Current Load on db151 is WARNING: LOAD WARNING - total load average: 0.62, 2.65, 10.65
[09:42:10] PROBLEM - mw162 Current Load on mw162 is WARNING: LOAD WARNING - total load average: 17.20, 21.32, 22.00
[09:43:14] RECOVERY - db151 Current Load on db151 is OK: LOAD OK - total load average: 0.68, 1.99, 9.43
[09:43:40] PROBLEM - mw172 Current Load on mw172 is CRITICAL: LOAD CRITICAL - total load average: 27.41, 24.28, 23.47
[09:43:48] PROBLEM - mw171 Current Load on mw171 is CRITICAL: LOAD CRITICAL - total load average: 24.69, 23.21, 23.08
[09:44:10] PROBLEM - mw162 Current Load on mw162 is CRITICAL: LOAD CRITICAL - total load average: 25.59, 22.75, 22.42
[09:45:02] PROBLEM - mw152 Current Load on mw152 is WARNING: LOAD WARNING - total load average: 18.77, 22.17, 23.43
[09:45:51] PROBLEM - ns2 Puppet on ns2 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. It might be a dependency cycle.
[09:47:40] PROBLEM - mw172 Current Load on mw172 is WARNING: LOAD WARNING - total load average: 21.55, 23.35, 23.30
[09:47:48] PROBLEM - mw171 Current Load on mw171 is WARNING: LOAD WARNING - total load average: 20.48, 23.29, 23.20
[09:48:10] PROBLEM - mw162 Current Load on mw162 is WARNING: LOAD WARNING - total load average: 21.63, 22.15, 22.24
[09:49:15] PROBLEM - db151 Current Load on db151 is CRITICAL: LOAD CRITICAL - total load average: 44.02, 20.96, 14.20
[09:49:18] PROBLEM - db151 PowerDNS Recursor on db151 is CRITICAL: CRITICAL - Plugin timed out while executing system call
[09:51:32] PROBLEM - db151 Puppet on db151 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 60 seconds.
[09:52:37] RECOVERY - mw161 Current Load on mw161 is OK: LOAD OK - total load average: 18.06, 18.56, 20.25
[09:53:30] [Grafana] RESOLVED: High Job Queue Backlog https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[09:55:40] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call
[09:56:00] PROBLEM - prometheus151 SSH on prometheus151 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[09:57:31] RECOVERY - db151 PowerDNS Recursor on db151 is OK: DNS OK: 8.436 seconds response time. wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206
[09:58:17] PROBLEM - prometheus151 conntrack_table_size on prometheus151 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 60 seconds.
[09:58:20] [Grafana] FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[10:01:01] RECOVERY - prometheus151 conntrack_table_size on prometheus151 is OK: OK: nf_conntrack is 0 % full
[10:01:32] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 20.06, 20.68, 23.85
[10:01:41] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 0.102 seconds response time. wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206
[10:01:59] RECOVERY - prometheus151 Puppet on prometheus151 is OK: OK: Puppet is currently enabled, last run 10 minutes ago with 0 failures
[10:02:01] RECOVERY - prometheus151 SSH on prometheus151 is OK: SSH OK - OpenSSH_9.2p1 Debian-2+deb12u3 (protocol 2.0)
[10:03:02] RECOVERY - mw152 Current Load on mw152 is OK: LOAD OK - total load average: 16.51, 17.66, 20.27
[10:03:20] [Grafana] RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[10:03:40] RECOVERY - mw172 Current Load on mw172 is OK: LOAD OK - total load average: 15.03, 17.44, 20.05
[10:05:32] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 24.35, 22.64, 23.91
[10:05:59] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call
[10:06:10] PROBLEM - mw162 Current Load on mw162 is CRITICAL: LOAD CRITICAL - total load average: 24.01, 21.42, 21.01
[10:06:42] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 19.54, 22.42, 23.87
[10:07:02] PROBLEM - mw152 Current Load on mw152 is CRITICAL: LOAD CRITICAL - total load average: 24.61, 21.10, 21.07
[10:07:32] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 20.44, 21.85, 23.49
[10:07:48] PROBLEM - mw171 Current Load on mw171 is CRITICAL: LOAD CRITICAL - total load average: 26.77, 23.04, 22.05
[10:08:37] PROBLEM - mw161 Current Load on mw161 is WARNING: LOAD WARNING - total load average: 23.77, 20.55, 19.53
[10:09:00] [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[10:09:02] PROBLEM - mw152 Current Load on mw152 is WARNING: LOAD WARNING - total load average: 20.61, 20.62, 20.90
[10:09:40] PROBLEM - mw172 Current Load on mw172 is WARNING: LOAD WARNING - total load average: 22.44, 20.27, 20.36
[10:09:57] PROBLEM - db151 PowerDNS Recursor on db151 is CRITICAL: CRITICAL - Plugin timed out while executing system call
[10:10:28] PROBLEM - prometheus151 SSH on prometheus151 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[10:11:32] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 24.26, 23.05, 23.63
[10:11:40] RECOVERY - mw172 Current Load on mw172 is OK: LOAD OK - total load average: 19.96, 20.24, 20.35
[10:12:10] PROBLEM - mw162 Current Load on mw162 is WARNING: LOAD WARNING - total load average: 20.87, 22.13, 21.67
[10:13:32] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 23.45, 22.47, 23.31
[10:13:57] RECOVERY - db151 PowerDNS Recursor on db151 is OK: DNS OK: 2.115 seconds response time. wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206
[10:14:00] [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[10:14:03] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 0.260 seconds response time. wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206
[10:14:10] PROBLEM - mw162 Current Load on mw162 is CRITICAL: LOAD CRITICAL - total load average: 24.15, 22.92, 22.02
[10:14:10] RECOVERY - ns2 Puppet on ns2 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[10:15:48] PROBLEM - mw171 Current Load on mw171 is WARNING: LOAD WARNING - total load average: 17.12, 22.30, 22.39
[10:16:37] PROBLEM - mw161 Current Load on mw161 is CRITICAL: LOAD CRITICAL - total load average: 24.20, 22.16, 20.53
[10:16:42] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 25.51, 24.42, 24.05
[10:17:32] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 26.01, 23.90, 23.65
[10:18:35] RECOVERY - prometheus151 SSH on prometheus151 is OK: SSH OK - OpenSSH_9.2p1 Debian-2+deb12u3 (protocol 2.0)
[10:18:37] PROBLEM - mw161 Current Load on mw161 is WARNING: LOAD WARNING - total load average: 21.40, 21.86, 20.63
[10:19:32] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 20.86, 22.96, 23.36
[10:19:40] PROBLEM - mw172 Current Load on mw172 is WARNING: LOAD WARNING - total load average: 20.15, 21.09, 20.77
[10:21:32] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 24.44, 23.69, 23.59
[10:24:10] PROBLEM - mw162 Current Load on mw162 is WARNING: LOAD WARNING - total load average: 21.68, 23.82, 23.22
[10:24:21] PROBLEM - db151 PowerDNS Recursor on db151 is CRITICAL: CRITICAL - Plugin timed out while executing system call
[10:24:37] RECOVERY - mw161 Current Load on mw161 is OK: LOAD OK - total load average: 18.10, 20.00, 20.29
[10:24:42] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 21.45, 23.62, 23.98
[10:25:32] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 19.65, 22.03, 23.02
[10:25:40] RECOVERY - mw172 Current Load on mw172 is OK: LOAD OK - total load average: 19.40, 20.06, 20.38
[10:26:01] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call
[10:26:41] PROBLEM - prometheus151 SSH on prometheus151 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[10:27:55] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 0.301 seconds response time. wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206
[10:28:34] RECOVERY - prometheus151 SSH on prometheus151 is OK: SSH OK - OpenSSH_9.2p1 Debian-2+deb12u3 (protocol 2.0)
[10:29:02] PROBLEM - mw152 Current Load on mw152 is CRITICAL: LOAD CRITICAL - total load average: 24.33, 21.91, 21.82
[10:29:32] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 25.45, 23.06, 23.14
[10:31:02] PROBLEM - mw152 Current Load on mw152 is WARNING: LOAD WARNING - total load average: 19.94, 21.27, 21.62
[10:33:32] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 17.67, 21.75, 22.72
[10:34:45] PROBLEM - prometheus151 SSH on prometheus151 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[10:35:02] RECOVERY - mw152 Current Load on mw152 is OK: LOAD OK - total load average: 13.00, 17.70, 20.17
[10:35:13] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call
[10:35:48] RECOVERY - mw171 Current Load on mw171 is OK: LOAD OK - total load average: 8.70, 16.42, 19.70
[10:36:44] RECOVERY - prometheus151 SSH on prometheus151 is OK: SSH OK - OpenSSH_9.2p1 Debian-2+deb12u3 (protocol 2.0)
[10:37:06] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 0.164 seconds response time. wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206
[10:37:13] PROBLEM - cp36 Varnish Backends on cp36 is CRITICAL: 1 backends are down. mw181
[10:37:32] RECOVERY - mw182 Current Load on mw182 is OK: LOAD OK - total load average: 13.63, 16.10, 20.12
[10:38:10] RECOVERY - mw162 Current Load on mw162 is OK: LOAD OK - total load average: 18.60, 17.58, 20.04
[10:38:42] RECOVERY - mw181 Current Load on mw181 is OK: LOAD OK - total load average: 17.11, 16.47, 20.01
[10:39:00] [Grafana] FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[10:39:13] RECOVERY - cp36 Varnish Backends on cp36 is OK: All 19 backends are healthy
[10:44:00] [Grafana] FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[10:44:55] RECOVERY - db151 PowerDNS Recursor on db151 is OK: DNS OK: 0.594 seconds response time. wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206
[10:51:54] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call
[10:52:15] PROBLEM - prometheus151 SSH on prometheus151 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[10:52:19] PROBLEM - db151 PowerDNS Recursor on db151 is CRITICAL: CRITICAL - Plugin timed out while executing system call
[10:53:49] PROBLEM - cp27 Varnish Backends on cp27 is CRITICAL: 2 backends are down. mw151 mw161
[10:54:35] PROBLEM - prometheus151 conntrack_table_size on prometheus151 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 60 seconds.
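Several checks in this stretch flap between PROBLEM and RECOVERY (prometheus151 SSH and PowerDNS Recursor, db151 PowerDNS Recursor, the mw* load checks). A hedged sketch for measuring each outage from the log itself: pair the first PROBLEM for a host/service with the next RECOVERY, so WARNING/CRITICAL flaps in between do not restart the window. It assumes one entry per line and that all timestamps fall on the same day; `outage_windows` is an illustrative name, not an existing tool.

```python
import re
from datetime import datetime, timedelta
from typing import Dict, Iterable, List, Tuple

# "[HH:MM:SS] PROBLEM - <host> <service> on <host> is <STATE>: ..."
EVENT_RE = re.compile(
    r"\[(?P<ts>\d{2}:\d{2}:\d{2})\] (?P<kind>PROBLEM|RECOVERY) - "
    r"(?P<host>\S+) (?P<service>.+?) on (?P=host) is "
)

Key = Tuple[str, str]  # (host, service), e.g. ("prometheus151", "SSH")


def outage_windows(lines: Iterable[str]) -> List[Tuple[Key, timedelta]]:
    """Pair the first PROBLEM for each (host, service) with the next RECOVERY.

    Outages still open at the end of the input are not reported.
    Assumes one entry per line and no midnight rollover.
    """
    open_since: Dict[Key, datetime] = {}
    windows: List[Tuple[Key, timedelta]] = []
    for line in lines:
        m = EVENT_RE.match(line)
        if m is None:
            continue  # Grafana, GitHub, !log and other non-Icinga lines
        key = (m["host"], m["service"])
        ts = datetime.strptime(m["ts"], "%H:%M:%S")
        if m["kind"] == "PROBLEM":
            open_since.setdefault(key, ts)  # keep the earliest start
        elif key in open_since:
            windows.append((key, ts - open_since.pop(key)))
    return windows
```

Fed the prometheus151 SSH entries from 10:10:28 and 10:18:35, this reports an 8 minute 7 second window. Fed mw162's Current Load entries, it keeps the 09:38:10 start through the later WARNING/CRITICAL alerts and closes at the 10:38:10 RECOVERY, one hour later within this excerpt.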
[10:54:37] PROBLEM - prometheus151 Puppet on prometheus151 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 60 seconds.
[10:55:13] PROBLEM - cp36 Varnish Backends on cp36 is CRITICAL: 7 backends are down. mw151 mw152 mw161 mw162 mw171 mw172 mw181
[10:55:31] PROBLEM - cp26 Varnish Backends on cp26 is CRITICAL: 7 backends are down. mw151 mw161 mw162 mw171 mw172 mw181 mw182
[10:56:05] PROBLEM - cp41 Varnish Backends on cp41 is CRITICAL: 2 backends are down. mw161 mw182
[10:56:28] PROBLEM - cp51 HTTPS on cp51 is CRITICAL: HTTP CRITICAL - Invalid HTTP response received from host on port 443: HTTP/2 502
[10:56:54] RECOVERY - prometheus151 conntrack_table_size on prometheus151 is OK: OK: nf_conntrack is 0 % full
[10:57:00] RECOVERY - prometheus151 Puppet on prometheus151 is OK: OK: Puppet is currently enabled, last run 8 minutes ago with 0 failures
[10:57:49] RECOVERY - cp27 Varnish Backends on cp27 is OK: All 19 backends are healthy
[10:58:01] RECOVERY - cp41 Varnish Backends on cp41 is OK: All 19 backends are healthy
[10:58:30] RECOVERY - cp51 HTTPS on cp51 is OK: HTTP OK: HTTP/2 404 - Status line output matched "HTTP/2 404" - 3843 bytes in 6.748 second response time
[10:59:00] [Grafana] FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[11:00:24] RECOVERY - prometheus151 SSH on prometheus151 is OK: SSH OK - OpenSSH_9.2p1 Debian-2+deb12u3 (protocol 2.0)
[11:01:14] PROBLEM - cp27 HTTPS on cp27 is CRITICAL: HTTP CRITICAL - Invalid HTTP response received from host on port 443: HTTP/2 502
[11:01:49] PROBLEM - cp27 Varnish Backends on cp27 is CRITICAL: 5 backends are down. mw151 mw162 mw171 mw181 mw182
[11:01:50] PROBLEM - cp37 Varnish Backends on cp37 is CRITICAL: 3 backends are down. mw162 mw171 mw181
[11:01:52] PROBLEM - cp41 Varnish Backends on cp41 is CRITICAL: 2 backends are down. mw162 mw181
[11:02:07] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 3.495 seconds response time. wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206
[11:02:13] PROBLEM - cp51 Varnish Backends on cp51 is CRITICAL: 2 backends are down. mw161 mw181
[11:03:13] RECOVERY - cp36 Varnish Backends on cp36 is OK: All 19 backends are healthy
[11:03:14] RECOVERY - cp27 HTTPS on cp27 is OK: HTTP OK: HTTP/2 404 - Status line output matched "HTTP/2 404" - 3820 bytes in 3.808 second response time
[11:03:32] RECOVERY - cp26 Varnish Backends on cp26 is OK: All 19 backends are healthy
[11:03:50] RECOVERY - cp37 Varnish Backends on cp37 is OK: All 19 backends are healthy
[11:04:00] [Grafana] RESOLVED: High Job Queue Backlog https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[11:05:49] RECOVERY - cp27 Varnish Backends on cp27 is OK: All 19 backends are healthy
[11:06:38] RECOVERY - db151 PowerDNS Recursor on db151 is OK: DNS OK: 2.484 seconds response time. wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206
[11:07:33] PROBLEM - cp26 Varnish Backends on cp26 is CRITICAL: 3 backends are down. mw161 mw181 mw182
[11:07:50] PROBLEM - cp37 Varnish Backends on cp37 is CRITICAL: 4 backends are down. mw151 mw152 mw162 mw182
[11:11:28] RECOVERY - cp41 Varnish Backends on cp41 is OK: All 19 backends are healthy
[11:11:34] RECOVERY - cp26 Varnish Backends on cp26 is OK: All 19 backends are healthy
[11:11:50] RECOVERY - cp37 Varnish Backends on cp37 is OK: All 19 backends are healthy
[11:12:11] [puppet] Universal-Omega pushed 1 commit to master [+0/-0/±1] https://github.com/miraheze/puppet/compare/da3f5925aed2...ef19e249e205
[11:12:13] [puppet] Universal-Omega ef19e24 - Add theresnotime shell
[11:12:44] [puppet] Universal-Omega pushed 1 commit to master [+0/-0/±1] https://github.com/miraheze/puppet/compare/ef19e249e205...122d7a25a6bd
[11:12:45] [puppet] Universal-Omega 122d7a2 - Add theresnotime icinga
[11:13:52] PROBLEM - cp36 Varnish Backends on cp36 is CRITICAL: 4 backends are down. mw151 mw162 mw171 mw181
[11:14:40] PROBLEM - cp27 Varnish Backends on cp27 is CRITICAL: 1 backends are down. mw182
[11:14:52] PROBLEM - db151 PowerDNS Recursor on db151 is CRITICAL: CRITICAL - Plugin timed out while executing system call
[11:15:17] [puppet] redbluegreenhat created branch tsportal-v19 - https://github.com/miraheze/puppet
[11:15:20] [puppet] redbluegreenhat pushed 1 commit to tsportal-v19 [+0/-0/±1] https://github.com/miraheze/puppet/commit/a7b0ea460f56
[11:15:21] [puppet] redbluegreenhat a7b0ea4 - reports: Upgrade TSPortal to v19
[11:15:33] [puppet] redbluegreenhat opened pull request #3901: reports: Upgrade TSPortal to v19 - https://github.com/miraheze/puppet/pull/3901
[11:15:38] [puppet] coderabbitai[bot] commented on pull request #3901: reports: Upgrade TSPortal to v19 - https://github.com/miraheze/puppet/pull/3901#issuecomment-2315031024
[11:15:47] RECOVERY - cp36 Varnish Backends on cp36 is OK: All 19 backends are healthy
[11:16:06] RECOVERY - cp51 Varnish Backends on cp51 is OK: All 19 backends are healthy
[11:16:38] [puppet] redbluegreenhat closed pull request #3901: reports: Upgrade TSPortal to v19 - https://github.com/miraheze/puppet/pull/3901
[11:16:39] [puppet] redbluegreenhat pushed 1 commit to master [+0/-0/±1] https://github.com/miraheze/puppet/compare/122d7a25a6bd...d0cda01a5449
[11:16:41] [puppet] redbluegreenhat d0cda01 - reports: Upgrade TSPortal to v19 (#3901)
[11:16:44] [puppet] redbluegreenhat deleted branch tsportal-v19 - https://github.com/miraheze/puppet
[11:16:45] [puppet] redbluegreenhat deleted branch tsportal-v19
[11:17:52] PROBLEM - mw152 Current Load on mw152 is WARNING: LOAD WARNING - total load average: 20.80, 17.92, 16.22
[11:18:29] RECOVERY - cp27 Varnish Backends on cp27 is OK: All 19 backends are healthy
[11:18:54] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call
[11:19:00] RECOVERY - db151 PowerDNS Recursor on db151 is OK: DNS OK: 5.722 seconds response time. wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206
[11:19:48] RECOVERY - mw152 Current Load on mw152 is OK: LOAD OK - total load average: 20.35, 18.26, 16.51
[11:21:40] PROBLEM - mw172 Current Load on mw172 is WARNING: LOAD WARNING - total load average: 21.24, 19.80, 16.92
[11:23:40] RECOVERY - mw172 Current Load on mw172 is OK: LOAD OK - total load average: 13.21, 17.45, 16.42
[11:24:01] PROBLEM - cp51 Varnish Backends on cp51 is CRITICAL: 4 backends are down. mw152 mw172 mw181 mw182
[11:24:15] PROBLEM - cp27 Varnish Backends on cp27 is CRITICAL: 2 backends are down. mw162 mw181
[11:24:33] PROBLEM - cp37 Varnish Backends on cp37 is CRITICAL: 2 backends are down. mw171 mw182
[11:25:59] RECOVERY - cp51 Varnish Backends on cp51 is OK: All 19 backends are healthy
[11:26:10] RECOVERY - cp27 Varnish Backends on cp27 is OK: All 19 backends are healthy
[11:26:30] RECOVERY - cp37 Varnish Backends on cp37 is OK: All 19 backends are healthy
[11:29:02] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 0.097 seconds response time. wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206
[11:29:31] PROBLEM - db151 PowerDNS Recursor on db151 is CRITICAL: No response from DNS ::1
[11:31:31] RECOVERY - db151 PowerDNS Recursor on db151 is OK: DNS OK: 6.199 seconds response time. wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206
[11:35:48] PROBLEM - db151 PowerDNS Recursor on db151 is CRITICAL: CRITICAL - Plugin timed out while executing system call
[11:36:20] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call
[11:37:10] PROBLEM - prometheus151 SSH on prometheus151 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[11:38:14] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 0.077 seconds response time. wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206
[11:39:00] [Grafana] FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[11:39:04] RECOVERY - prometheus151 SSH on prometheus151 is OK: SSH OK - OpenSSH_9.2p1 Debian-2+deb12u3 (protocol 2.0)
[11:42:32] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call
[11:44:36] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 6.203 seconds response time. wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206
[11:45:14] PROBLEM - cp41 Varnish Backends on cp41 is CRITICAL: 2 backends are down. mw171 mw182
[11:45:39] PROBLEM - cp26 Varnish Backends on cp26 is CRITICAL: 1 backends are down. mw152
[11:46:07] RECOVERY - db151 PowerDNS Recursor on db151 is OK: DNS OK: 8.645 seconds response time. wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206
[11:47:13] RECOVERY - cp41 Varnish Backends on cp41 is OK: All 19 backends are healthy
[11:47:38] RECOVERY - cp26 Varnish Backends on cp26 is OK: All 19 backends are healthy
[11:53:26] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 27.99, 22.00, 19.00
[11:54:37] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 26.26, 21.97, 19.13
[11:55:24] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 21.53, 22.06, 19.41
[11:57:17] !log [alex@mwtask181] sudo -u www-data php /srv/mediawiki/1.42/maintenance/run.php /srv/mediawiki/1.42/maintenance/deleteBatch.php --wiki=allthetropeswiki T12513_deleteBatch.php_listfile -r=Requested at [[phorge:T12513]] (START)
[11:57:18] !log [alex@mwtask181] sudo -u www-data php /srv/mediawiki/1.42/maintenance/run.php /srv/mediawiki/1.42/maintenance/deleteBatch.php --wiki=allthetropeswiki T12513_deleteBatch.php_listfile -r=Requested at [[phorge:T12513]] (END - exit=256)
[11:57:22] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 28.82, 24.60, 20.66
[11:57:25] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[11:57:35] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[11:57:56] !log [alex@mwtask181] sudo -u www-data php /srv/mediawiki/1.42/maintenance/run.php /srv/mediawiki/1.42/maintenance/deleteBatch.php --wiki=allthetropeswiki T12513_deleteBatch.php_listfile -r=[[phorge:T12513]] (START)
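Between 10:53 and 11:47 the Varnish Backends checks on the cache proxies (cp26, cp27, cp36, cp37, cp41, cp51) repeatedly listed subsets of the mw* appservers as down. A small illustrative sketch for tallying how often each backend appears in those reports, assuming the "N backends are down. name name ..." wording seen in this log; `tally_down_backends` is a hypothetical helper, not an existing Miraheze script.

```python
import re
from collections import Counter
from typing import Iterable

# e.g. "[11:24:01] PROBLEM - cp51 Varnish Backends on cp51 is CRITICAL:
#       4 backends are down. mw152 mw172 mw181 mw182"
DOWN_RE = re.compile(
    r"PROBLEM - \S+ Varnish Backends on \S+ is CRITICAL: "
    r"(?P<count>\d+) backends are down\. (?P<names>[\w ]+)"
)


def tally_down_backends(lines: Iterable[str]) -> Counter:
    """Count how many 'backends are down' reports list each backend."""
    per_backend: Counter = Counter()
    for line in lines:
        m = DOWN_RE.search(line)
        if m is None:
            continue  # recoveries ("All 19 backends are healthy") and other lines
        names = m["names"].split()
        # Sanity check: the count in the message should match the names listed.
        if len(names) != int(m["count"]):
            raise ValueError(f"count mismatch in: {line!r}")
        per_backend.update(names)
    return per_backend
```

Applied to the three reports at 10:55:13 (cp36), 10:55:31 (cp26) and 10:56:05 (cp41), mw161 comes out as the most frequently listed backend, appearing in all three.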
[11:57:57] !log [alex@mwtask181] sudo -u www-data php /srv/mediawiki/1.42/maintenance/run.php /srv/mediawiki/1.42/maintenance/deleteBatch.php --wiki=allthetropeswiki T12513_deleteBatch.php_listfile -r=[[phorge:T12513]] (END - exit=256)
[11:58:12] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[11:58:19] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[11:58:22] !log [alex@mwtask181] sudo -u www-data php /srv/mediawiki/1.42/maintenance/run.php /srv/mediawiki/1.42/maintenance/deleteBatch.php --wiki=allthetropeswiki T12513_deleteBatch.php_listfile --r=Requested at [[phorge:T12513]] (START)
[11:58:24] !log [alex@mwtask181] sudo -u www-data php /srv/mediawiki/1.42/maintenance/run.php /srv/mediawiki/1.42/maintenance/deleteBatch.php --wiki=allthetropeswiki T12513_deleteBatch.php_listfile --r=Requested at [[phorge:T12513]] (END - exit=0)
[11:58:35] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 23.81, 23.36, 20.28
[11:58:39] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[11:58:44] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[12:00:34] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 29.51, 25.59, 21.47
[12:06:10] PROBLEM - ns2 NTP time on ns2 is UNKNOWN: check_ntp_time: Invalid hostname/address - time.cloudflare.com Usage: check_ntp_time -H [-4|-6] [-w ] [-c ] [-v verbose] [-o