[00:00:16] PROBLEM - cp33 Current Load on cp33 is CRITICAL: CRITICAL - load average: 6.03, 4.04, 3.52
[00:00:32] [Grafana] !sre FIRING: The mediawiki job queue has more than 2500 unclaimed jobs https://grafana.miraheze.org/d/GtxbP1Xnk?orgId=1
[00:01:07] PROBLEM - cp23 Current Load on cp23 is CRITICAL: CRITICAL - load average: 6.78, 7.05, 4.22
[00:02:22] PROBLEM - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is CRITICAL: CRITICAL - NGINX Error Rate is 70%
[00:02:26] RECOVERY - cp32 Current Load on cp32 is OK: OK - load average: 3.02, 3.08, 2.75
[00:03:40] RECOVERY - cp22 HTTP 4xx/5xx ERROR Rate on cp22 is OK: OK - NGINX Error Rate is 35%
[00:04:16] PROBLEM - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is WARNING: WARNING - NGINX Error Rate is 41%
[00:06:14] PROBLEM - cp33 Current Load on cp33 is WARNING: WARNING - load average: 2.97, 3.88, 3.75
[00:06:26] PROBLEM - cp32 Current Load on cp32 is CRITICAL: CRITICAL - load average: 8.56, 5.34, 3.67
[00:07:40] PROBLEM - cp22 HTTP 4xx/5xx ERROR Rate on cp22 is WARNING: WARNING - NGINX Error Rate is 45%
[00:08:15] PROBLEM - cp33 Current Load on cp33 is CRITICAL: CRITICAL - load average: 5.16, 4.27, 3.89
[00:08:18] PROBLEM - es141 Current Load on es141 is WARNING: WARNING - load average: 3.53, 3.01, 2.25
[00:09:40] RECOVERY - cp22 HTTP 4xx/5xx ERROR Rate on cp22 is OK: OK - NGINX Error Rate is 39%
[00:10:19] PROBLEM - es141 Current Load on es141 is CRITICAL: CRITICAL - load average: 4.59, 3.35, 2.45
[00:12:18] PROBLEM - db142 Current Load on db142 is CRITICAL: CRITICAL - load average: 15.75, 9.99, 5.58
[00:12:19] PROBLEM - es141 Current Load on es141 is WARNING: WARNING - load average: 4.00, 3.50, 2.62
[00:12:31] PROBLEM - mw121 Current Load on mw121 is CRITICAL: CRITICAL - load average: 14.62, 12.30, 9.40
[00:12:58] PROBLEM - cp23 Current Load on cp23 is WARNING: WARNING - load average: 3.36, 3.52, 3.94
[00:13:41] PROBLEM - cp22 HTTP 4xx/5xx ERROR Rate on cp22 is WARNING: WARNING - NGINX Error Rate is 44%
[00:13:59] RECOVERY - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is OK: OK - NGINX Error Rate is 31%
[00:14:30] PROBLEM - mw121 Current Load on mw121 is WARNING: WARNING - load average: 10.83, 11.84, 9.59
[00:14:56] PROBLEM - cp23 Current Load on cp23 is CRITICAL: CRITICAL - load average: 14.15, 9.05, 6.01
[00:17:43] PROBLEM - cp23 APT on cp23 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 60 seconds.
[00:18:07] PROBLEM - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is CRITICAL: CRITICAL - NGINX Error Rate is 73%
[00:19:31] PROBLEM - cp32 HTTP 4xx/5xx ERROR Rate on cp32 is WARNING: WARNING - NGINX Error Rate is 56%
[00:19:40] PROBLEM - cp22 HTTP 4xx/5xx ERROR Rate on cp22 is CRITICAL: CRITICAL - NGINX Error Rate is 72%
[00:19:52] RECOVERY - cp23 APT on cp23 is OK: APT OK: 1 packages available for upgrade (0 critical updates).
[00:20:02] PROBLEM - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is WARNING: WARNING - NGINX Error Rate is 57%
[00:20:17] PROBLEM - db142 Current Load on db142 is WARNING: WARNING - load average: 5.22, 7.46, 6.35
[00:20:26] RECOVERY - mw121 Current Load on mw121 is OK: OK - load average: 9.16, 10.14, 9.50
[00:21:40] PROBLEM - cp22 HTTP 4xx/5xx ERROR Rate on cp22 is WARNING: WARNING - NGINX Error Rate is 40%
[00:21:50] PROBLEM - en.religiononfire.mar.in.ua - reverse DNS on sslhost is WARNING: NoNameservers: All nameservers failed to answer the query en.religiononfire.mar.in.ua. IN CNAME: Server 2606:4700:4700::1111 UDP port 53 answered SERVFAIL
[00:22:17] PROBLEM - db142 Current Load on db142 is CRITICAL: CRITICAL - load average: 8.24, 7.99, 6.69
[00:22:19] PROBLEM - es141 Current Load on es141 is CRITICAL: CRITICAL - load average: 4.51, 3.79, 3.14
[00:23:31] PROBLEM - cp32 HTTP 4xx/5xx ERROR Rate on cp32 is CRITICAL: CRITICAL - NGINX Error Rate is 65%
[00:23:54] RECOVERY - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is OK: OK - NGINX Error Rate is 36%
[00:24:17] PROBLEM - db142 Current Load on db142 is WARNING: WARNING - load average: 5.12, 7.15, 6.56
[00:24:22] PROBLEM - es141 Current Load on es141 is WARNING: WARNING - load average: 3.93, 3.75, 3.20
[00:26:18] PROBLEM - db142 Current Load on db142 is CRITICAL: CRITICAL - load average: 10.16, 7.82, 6.84
[00:28:20] PROBLEM - mw121 Current Load on mw121 is CRITICAL: CRITICAL - load average: 13.88, 11.78, 10.38
[00:29:31] PROBLEM - cp32 HTTP 4xx/5xx ERROR Rate on cp32 is WARNING: WARNING - NGINX Error Rate is 52%
[00:29:35] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is WARNING: WARNING - NGINX Error Rate is 57%
[00:31:32] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is CRITICAL: CRITICAL - NGINX Error Rate is 81%
[00:31:39] RECOVERY - cp22 HTTP 4xx/5xx ERROR Rate on cp22 is OK: OK - NGINX Error Rate is 39%
[00:32:17] PROBLEM - mw121 Current Load on mw121 is WARNING: WARNING - load average: 10.08, 11.67, 10.75
[00:32:31] PROBLEM - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is WARNING: WARNING - NGINX Error Rate is 47%
[00:33:28] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is WARNING: WARNING - NGINX Error Rate is 44%
[00:34:17] PROBLEM - db142 Current Load on db142 is WARNING: WARNING - load average: 5.26, 7.16, 7.08
[00:35:23] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is CRITICAL: CRITICAL - NGINX Error Rate is 74%
[00:35:31] PROBLEM - mw122 Current Load on mw122 is CRITICAL: CRITICAL - load average: 12.56, 11.52, 9.75
[00:36:42] PROBLEM - cp23 Current Load on cp23 is WARNING: WARNING - load average: 0.70, 1.83, 3.66
[00:37:31] PROBLEM - mw122 Current Load on mw122 is WARNING: WARNING - load average: 11.06, 11.94, 10.15
[00:37:31] PROBLEM - cp32 HTTP 4xx/5xx ERROR Rate on cp32 is CRITICAL: CRITICAL - NGINX Error Rate is 67%
[00:37:40] PROBLEM - cp22 HTTP 4xx/5xx ERROR Rate on cp22 is CRITICAL: CRITICAL - NGINX Error Rate is 62%
[00:38:13] RECOVERY - mw121 Current Load on mw121 is OK: OK - load average: 10.17, 9.52, 10.11
[00:38:14] RECOVERY - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is OK: OK - NGINX Error Rate is 26%
[00:40:17] PROBLEM - db142 Current Load on db142 is CRITICAL: CRITICAL - load average: 8.26, 7.50, 7.22
[00:42:07] PROBLEM - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is WARNING: WARNING - NGINX Error Rate is 43%
[00:42:09] PROBLEM - mw121 Current Load on mw121 is WARNING: WARNING - load average: 10.64, 10.33, 10.32
[00:42:39] RECOVERY - cp23 Current Load on cp23 is OK: OK - load average: 0.93, 2.09, 3.31
[00:43:31] RECOVERY - mw122 Current Load on mw122 is OK: OK - load average: 7.56, 9.38, 9.60
[00:44:17] PROBLEM - db142 Current Load on db142 is WARNING: WARNING - load average: 6.88, 7.82, 7.43
[00:45:07] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is WARNING: WARNING - NGINX Error Rate is 42%
[00:45:40] RECOVERY - cp22 HTTP 4xx/5xx ERROR Rate on cp22 is OK: OK - NGINX Error Rate is 37%
[00:45:55] RECOVERY - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is OK: OK - NGINX Error Rate is 34%
[00:47:07] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is CRITICAL: CRITICAL - NGINX Error Rate is 76%
[00:48:05] RECOVERY - mw121 Current Load on mw121 is OK: OK - load average: 9.77, 9.92, 10.15
[00:48:17] RECOVERY - db142 Current Load on db142 is OK: OK - load average: 4.79, 5.53, 6.55
[00:48:19] PROBLEM - es141 Current Load on es141 is CRITICAL: CRITICAL - load average: 4.06, 3.54, 3.48
[00:49:40] PROBLEM - cp22 HTTP 4xx/5xx ERROR Rate on cp22 is WARNING: WARNING - NGINX Error Rate is 40%
[00:50:21] PROBLEM - es141 Current Load on es141 is WARNING: WARNING - load average: 3.66, 3.56, 3.50
[00:50:40] RECOVERY - en.religiononfire.mar.in.ua - reverse DNS on sslhost is OK: SSL OK - en.religiononfire.mar.in.ua reverse DNS resolves to cp23.miraheze.org - CNAME OK
[00:51:43] PROBLEM - cp32 Varnish Backends on cp32 is CRITICAL: 3 backends are down. mw121 mw131 mw132
[00:54:21] RECOVERY - es141 Current Load on es141 is OK: OK - load average: 2.89, 3.12, 3.33
[00:55:28] PROBLEM - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is CRITICAL: CRITICAL - NGINX Error Rate is 74%
[00:55:40] PROBLEM - cp22 HTTP 4xx/5xx ERROR Rate on cp22 is CRITICAL: CRITICAL - NGINX Error Rate is 69%
[00:55:43] RECOVERY - cp32 Varnish Backends on cp32 is OK: All 14 backends are healthy
[00:56:16] PROBLEM - cp33 NTP time on cp33 is WARNING: NTP WARNING: Offset 0.2506902516 secs
[00:57:14] PROBLEM - db142 Current Load on db142 is WARNING: WARNING - load average: 7.53, 6.43, 6.42
[00:57:40] RECOVERY - cp22 HTTP 4xx/5xx ERROR Rate on cp22 is OK: OK - NGINX Error Rate is 31%
[00:58:16] RECOVERY - cp33 NTP time on cp33 is OK: NTP OK: Offset -0.03221806884 secs
[00:58:18] PROBLEM - es141 Current Load on es141 is CRITICAL: CRITICAL - load average: 4.13, 3.93, 3.63
[00:58:24] PROBLEM - cp23 Current Load on cp23 is CRITICAL: CRITICAL - load average: 4.82, 3.32, 3.02
[00:59:16] RECOVERY - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is OK: OK - NGINX Error Rate is 34%
[01:00:22] RECOVERY - cp23 Current Load on cp23 is OK: OK - load average: 2.50, 3.36, 3.11
[01:01:13] PROBLEM - db142 Current Load on db142 is CRITICAL: CRITICAL - load average: 10.41, 8.27, 7.12
[01:03:08] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is WARNING: WARNING - NGINX Error Rate is 47%
[01:03:12] RECOVERY - db142 Current Load on db142 is OK: OK - load average: 5.00, 6.73, 6.69
[01:03:40] PROBLEM - cp22 HTTP 4xx/5xx ERROR Rate on cp22 is WARNING: WARNING - NGINX Error Rate is 40%
[01:05:07] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is CRITICAL: CRITICAL - NGINX Error Rate is 75%
[01:05:31] PROBLEM - cp32 HTTP 4xx/5xx ERROR Rate on cp32 is WARNING: WARNING - NGINX Error Rate is 54%
[01:05:40] RECOVERY - cp22 HTTP 4xx/5xx ERROR Rate on cp22 is OK: OK - NGINX Error Rate is 31%
[01:07:07] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is WARNING: WARNING - NGINX Error Rate is 50%
[01:07:31] PROBLEM - cp32 HTTP 4xx/5xx ERROR Rate on cp32 is CRITICAL: CRITICAL - NGINX Error Rate is 60%
[01:08:49] PROBLEM - mw121 Current Load on mw121 is CRITICAL: CRITICAL - load average: 16.03, 11.86, 10.05
[01:09:08] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is CRITICAL: CRITICAL - NGINX Error Rate is 74%
[01:10:48] PROBLEM - mw121 Current Load on mw121 is WARNING: WARNING - load average: 11.54, 11.91, 10.32
[01:11:31] PROBLEM - cp32 HTTP 4xx/5xx ERROR Rate on cp32 is WARNING: WARNING - NGINX Error Rate is 44%
[01:12:21] PROBLEM - es141 Current Load on es141 is WARNING: WARNING - load average: 3.59, 3.77, 3.95
[01:13:40] PROBLEM - cp22 HTTP 4xx/5xx ERROR Rate on cp22 is WARNING: WARNING - NGINX Error Rate is 59%
[01:15:31] PROBLEM - cp32 HTTP 4xx/5xx ERROR Rate on cp32 is CRITICAL: CRITICAL - NGINX Error Rate is 70%
[01:15:39] RECOVERY - cp22 HTTP 4xx/5xx ERROR Rate on cp22 is OK: OK - NGINX Error Rate is 28%
[01:16:18] PROBLEM - es141 Current Load on es141 is CRITICAL: CRITICAL - load average: 4.20, 3.62, 3.82
[01:16:44] RECOVERY - mw121 Current Load on mw121 is OK: OK - load average: 8.85, 10.00, 9.96
[01:18:19] PROBLEM - es141 Current Load on es141 is WARNING: WARNING - load average: 3.98, 3.74, 3.84
[01:19:31] PROBLEM - cp32 HTTP 4xx/5xx ERROR Rate on cp32 is WARNING: WARNING - NGINX Error Rate is 52%
[01:20:17] PROBLEM - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is WARNING: WARNING - NGINX Error Rate is 56%
[01:21:31] PROBLEM - cp32 HTTP 4xx/5xx ERROR Rate on cp32 is CRITICAL: CRITICAL - NGINX Error Rate is 63%
[01:22:46] PROBLEM - uk.religiononfire.mar.in.ua - reverse DNS on sslhost is CRITICAL: rDNS CRITICAL - uk.religiononfire.mar.in.ua All nameservers failed to answer the query.
[01:23:31] PROBLEM - cp32 HTTP 4xx/5xx ERROR Rate on cp32 is WARNING: WARNING - NGINX Error Rate is 55%
[01:24:06] RECOVERY - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is OK: OK - NGINX Error Rate is 35%
[01:25:07] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is WARNING: WARNING - NGINX Error Rate is 52%
[01:25:31] PROBLEM - cp32 HTTP 4xx/5xx ERROR Rate on cp32 is CRITICAL: CRITICAL - NGINX Error Rate is 65%
[01:25:40] PROBLEM - cp22 HTTP 4xx/5xx ERROR Rate on cp22 is CRITICAL: CRITICAL - NGINX Error Rate is 63%
[01:27:08] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is CRITICAL: CRITICAL - NGINX Error Rate is 72%
[01:27:40] PROBLEM - cp22 HTTP 4xx/5xx ERROR Rate on cp22 is WARNING: WARNING - NGINX Error Rate is 54%
[01:27:55] PROBLEM - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is CRITICAL: CRITICAL - NGINX Error Rate is 69%
[01:29:50] RECOVERY - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is OK: OK - NGINX Error Rate is 34%
[01:32:16] PROBLEM - cp33 NTP time on cp33 is WARNING: NTP WARNING: Offset 0.1399203837 secs
[01:32:19] PROBLEM - es141 Current Load on es141 is CRITICAL: CRITICAL - load average: 4.58, 3.77, 3.68
[01:33:40] RECOVERY - cp22 HTTP 4xx/5xx ERROR Rate on cp22 is OK: OK - NGINX Error Rate is 39%
[01:35:44] PROBLEM - cp32 Varnish Backends on cp32 is CRITICAL: 5 backends are down. mw121 mw131 mw132 mw141 mw142
[01:37:31] PROBLEM - mw122 Current Load on mw122 is CRITICAL: CRITICAL - load average: 14.13, 10.41, 8.56
[01:37:40] PROBLEM - cp22 HTTP 4xx/5xx ERROR Rate on cp22 is CRITICAL: CRITICAL - NGINX Error Rate is 68%
[01:38:32] PROBLEM - mw122 MediaWiki Rendering on mw122 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[01:39:07] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is WARNING: WARNING - NGINX Error Rate is 55%
[01:39:24] PROBLEM - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is WARNING: WARNING - NGINX Error Rate is 46%
[01:39:40] PROBLEM - cp22 HTTP 4xx/5xx ERROR Rate on cp22 is WARNING: WARNING - NGINX Error Rate is 40%
[01:40:22] PROBLEM - es141 Current Load on es141 is WARNING: WARNING - load average: 2.52, 3.47, 3.66
[01:40:28] RECOVERY - mw122 MediaWiki Rendering on mw122 is OK: HTTP OK: HTTP/1.1 200 OK - 8191 bytes in 0.311 second response time
[01:41:18] PROBLEM - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is CRITICAL: CRITICAL - NGINX Error Rate is 62%
[01:41:31] RECOVERY - mw122 Current Load on mw122 is OK: OK - load average: 5.30, 9.08, 8.61
[01:41:39] RECOVERY - cp22 HTTP 4xx/5xx ERROR Rate on cp22 is OK: OK - NGINX Error Rate is 36%
[01:41:43] RECOVERY - cp32 Varnish Backends on cp32 is OK: All 14 backends are healthy
[01:43:08] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is CRITICAL: CRITICAL - NGINX Error Rate is 75%
[01:43:31] PROBLEM - cp32 HTTP 4xx/5xx ERROR Rate on cp32 is WARNING: WARNING - NGINX Error Rate is 57%
[01:44:16] RECOVERY - cp33 NTP time on cp33 is OK: NTP OK: Offset 0.06392276287 secs
[01:45:07] PROBLEM - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is WARNING: WARNING - NGINX Error Rate is 41%
[01:45:31] PROBLEM - cp32 HTTP 4xx/5xx ERROR Rate on cp32 is CRITICAL: CRITICAL - NGINX Error Rate is 76%
[01:47:07] RECOVERY - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is OK: OK - NGINX Error Rate is 35%
[01:48:18] RECOVERY - es141 Current Load on es141 is OK: OK - load average: 2.24, 2.78, 3.29
[01:49:10] PROBLEM - cp33 Varnish Backends on cp33 is CRITICAL: 7 backends are down. mw121 mw122 mw131 mw132 mw141 mw142 mediawiki
[01:50:18] PROBLEM - graylog121 Current Load on graylog121 is WARNING: WARNING - load average: 2.25, 3.27, 3.89
[01:51:07] PROBLEM - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is WARNING: WARNING - NGINX Error Rate is 54%
[01:51:09] RECOVERY - cp33 Varnish Backends on cp33 is OK: All 14 backends are healthy
[01:52:21] RECOVERY - uk.religiononfire.mar.in.ua - reverse DNS on sslhost is OK: SSL OK - uk.religiononfire.mar.in.ua reverse DNS resolves to cp22.miraheze.org - CNAME OK
[01:53:07] PROBLEM - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is CRITICAL: CRITICAL - NGINX Error Rate is 60%
[01:53:07] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is WARNING: WARNING - NGINX Error Rate is 47%
[01:53:26] PROBLEM - cp22 HTTP 4xx/5xx ERROR Rate on cp22 is WARNING: WARNING - NGINX Error Rate is 55%
[01:56:18] PROBLEM - graylog121 Current Load on graylog121 is CRITICAL: CRITICAL - load average: 5.48, 4.01, 3.98
[01:56:18] PROBLEM - es141 Current Load on es141 is CRITICAL: CRITICAL - load average: 4.66, 3.78, 3.51
[01:57:07] PROBLEM - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is WARNING: WARNING - NGINX Error Rate is 44%
[01:57:21] RECOVERY - cp22 HTTP 4xx/5xx ERROR Rate on cp22 is OK: OK - NGINX Error Rate is 37%
[01:58:18] PROBLEM - es141 Current Load on es141 is WARNING: WARNING - load average: 2.99, 3.50, 3.45
[01:59:09] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is CRITICAL: CRITICAL - NGINX Error Rate is 99%
[02:00:16] PROBLEM - cp33 NTP time on cp33 is WARNING: NTP WARNING: Offset -0.1191035509 secs
[02:00:19] PROBLEM - es141 Current Load on es141 is CRITICAL: CRITICAL - load average: 4.06, 3.56, 3.47
[02:01:06] RECOVERY - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is OK: OK - NGINX Error Rate is 27%
[02:01:13] PROBLEM - cp22 HTTP 4xx/5xx ERROR Rate on cp22 is WARNING: WARNING - NGINX Error Rate is 51%
[02:01:31] PROBLEM - cp32 HTTP 4xx/5xx ERROR Rate on cp32 is WARNING: WARNING - NGINX Error Rate is 56%
[02:02:16] RECOVERY - cp33 NTP time on cp33 is OK: NTP OK: Offset -0.08380943537 secs
[02:03:11] PROBLEM - cp22 HTTP 4xx/5xx ERROR Rate on cp22 is CRITICAL: CRITICAL - NGINX Error Rate is 61%
[02:03:31] PROBLEM - cp32 HTTP 4xx/5xx ERROR Rate on cp32 is CRITICAL: CRITICAL - NGINX Error Rate is 84%
[02:06:18] PROBLEM - es141 Current Load on es141 is WARNING: WARNING - load average: 3.27, 3.89, 3.71
[02:07:05] RECOVERY - cp22 HTTP 4xx/5xx ERROR Rate on cp22 is OK: OK - NGINX Error Rate is 38%
[02:11:13] PROBLEM - es141 PowerDNS Recursor on es141 is CRITICAL: CRITICAL - Plugin timed out while executing system call
[02:12:20] PROBLEM - es141 Current Load on es141 is CRITICAL: CRITICAL - load average: 4.24, 4.08, 3.81
[02:12:43] PROBLEM - cp32 Varnish Backends on cp32 is CRITICAL: 4 backends are down. mw122 mw131 mw132 mw141
[02:12:56] PROBLEM - cp22 HTTP 4xx/5xx ERROR Rate on cp22 is WARNING: WARNING - NGINX Error Rate is 58%
[02:13:56] PROBLEM - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is WARNING: WARNING - NGINX Error Rate is 59%
[02:15:03] PROBLEM - mw121 Current Load on mw121 is CRITICAL: CRITICAL - load average: 12.96, 11.14, 8.97
[02:15:15] RECOVERY - es141 PowerDNS Recursor on es141 is OK: DNS OK: 1.028 second response time. miraheze.org returns 109.228.51.216,217.174.247.33,2a00:da00:1800:326::1,2a00:da00:1800:328::1
[02:15:50] RECOVERY - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is OK: OK - NGINX Error Rate is 35%
[02:16:19] PROBLEM - es141 Current Load on es141 is WARNING: WARNING - load average: 1.96, 3.41, 3.64
[02:16:33] RECOVERY - cp32 Varnish Backends on cp32 is OK: All 14 backends are healthy
[02:17:02] PROBLEM - mw121 Current Load on mw121 is WARNING: WARNING - load average: 10.23, 11.12, 9.24
[02:17:31] PROBLEM - cp32 HTTP 4xx/5xx ERROR Rate on cp32 is WARNING: WARNING - NGINX Error Rate is 59%
[02:19:01] RECOVERY - mw121 Current Load on mw121 is OK: OK - load average: 8.86, 10.07, 9.07
[02:19:07] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is WARNING: WARNING - NGINX Error Rate is 53%
[02:19:31] PROBLEM - cp32 HTTP 4xx/5xx ERROR Rate on cp32 is CRITICAL: CRITICAL - NGINX Error Rate is 67%
[02:19:39] PROBLEM - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is WARNING: WARNING - NGINX Error Rate is 46%
[02:20:16] PROBLEM - cp33 NTP time on cp33 is WARNING: NTP WARNING: Offset 0.374317795 secs
[02:20:19] RECOVERY - es141 Current Load on es141 is OK: OK - load average: 2.13, 2.68, 3.28
[02:20:45] RECOVERY - cp22 HTTP 4xx/5xx ERROR Rate on cp22 is OK: OK - NGINX Error Rate is 34%
[02:21:34] RECOVERY - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is OK: OK - NGINX Error Rate is 37%
[02:23:07] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is CRITICAL: CRITICAL - NGINX Error Rate is 61%
[02:23:57] PROBLEM - cp23 NTP time on cp23 is WARNING: NTP WARNING: Offset 0.1121527851 secs
[02:24:40] PROBLEM - cp22 HTTP 4xx/5xx ERROR Rate on cp22 is WARNING: WARNING - NGINX Error Rate is 56%
[02:25:23] PROBLEM - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is CRITICAL: CRITICAL - NGINX Error Rate is 60%
[02:25:57] RECOVERY - cp23 NTP time on cp23 is OK: NTP OK: Offset 0.09654131532 secs
[02:26:37] RECOVERY - cp22 HTTP 4xx/5xx ERROR Rate on cp22 is OK: OK - NGINX Error Rate is 38%
[02:27:17] PROBLEM - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is WARNING: WARNING - NGINX Error Rate is 57%
[02:29:12] RECOVERY - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is OK: OK - NGINX Error Rate is 31%
[02:30:29] PROBLEM - cp22 HTTP 4xx/5xx ERROR Rate on cp22 is CRITICAL: CRITICAL - NGINX Error Rate is 60%
[02:34:23] PROBLEM - cp22 HTTP 4xx/5xx ERROR Rate on cp22 is WARNING: WARNING - NGINX Error Rate is 59%
[02:35:57] PROBLEM - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is WARNING: WARNING - NGINX Error Rate is 43%
[02:36:15] RECOVERY - cp33 NTP time on cp33 is OK: NTP OK: Offset -0.02847996354 secs
[02:37:31] PROBLEM - cp32 HTTP 4xx/5xx ERROR Rate on cp32 is WARNING: WARNING - NGINX Error Rate is 40%
[02:39:07] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is WARNING: WARNING - NGINX Error Rate is 59%
[02:39:31] PROBLEM - cp32 HTTP 4xx/5xx ERROR Rate on cp32 is CRITICAL: CRITICAL - NGINX Error Rate is 63%
[02:40:15] RECOVERY - cp22 HTTP 4xx/5xx ERROR Rate on cp22 is OK: OK - NGINX Error Rate is 34%
[02:41:07] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is CRITICAL: CRITICAL - NGINX Error Rate is 79%
[02:41:40] RECOVERY - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is OK: OK - NGINX Error Rate is 37%
[02:42:43] PROBLEM - mw121 Current Load on mw121 is CRITICAL: CRITICAL - load average: 18.29, 12.73, 10.14
[02:43:09] PROBLEM - cp33 Varnish Backends on cp33 is CRITICAL: 1 backends are down. mw132
[02:43:31] PROBLEM - cp32 HTTP 4xx/5xx ERROR Rate on cp32 is WARNING: WARNING - NGINX Error Rate is 48%
[02:44:42] PROBLEM - mw121 Current Load on mw121 is WARNING: WARNING - load average: 10.28, 11.55, 10.03
[02:45:07] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is WARNING: WARNING - NGINX Error Rate is 59%
[02:45:09] RECOVERY - cp33 Varnish Backends on cp33 is OK: All 14 backends are healthy
[02:45:30] PROBLEM - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is WARNING: WARNING - NGINX Error Rate is 46%
[02:45:31] PROBLEM - cp32 HTTP 4xx/5xx ERROR Rate on cp32 is CRITICAL: CRITICAL - NGINX Error Rate is 74%
[02:48:14] PROBLEM - cp33 Current Load on cp33 is WARNING: WARNING - load average: 1.40, 2.42, 3.78
[02:48:40] RECOVERY - mw121 Current Load on mw121 is OK: OK - load average: 6.52, 9.33, 9.50
[02:49:07] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is CRITICAL: CRITICAL - NGINX Error Rate is 63%
[02:49:18] RECOVERY - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is OK: OK - NGINX Error Rate is 33%
[02:49:31] PROBLEM - cp32 HTTP 4xx/5xx ERROR Rate on cp32 is WARNING: WARNING - NGINX Error Rate is 57%
[02:49:57] PROBLEM - cp22 HTTP 4xx/5xx ERROR Rate on cp22 is CRITICAL: CRITICAL - NGINX Error Rate is 62%
[02:51:07] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is WARNING: WARNING - NGINX Error Rate is 51%
[02:51:31] PROBLEM - cp32 HTTP 4xx/5xx ERROR Rate on cp32 is CRITICAL: CRITICAL - NGINX Error Rate is 68%
[02:51:50] PROBLEM - en.religiononfire.mar.in.ua - reverse DNS on sslhost is CRITICAL: rDNS CRITICAL - en.religiononfire.mar.in.ua All nameservers failed to answer the query.
[02:51:54] RECOVERY - cp22 HTTP 4xx/5xx ERROR Rate on cp22 is OK: OK - NGINX Error Rate is 33%
[02:52:14] PROBLEM - cp33 Current Load on cp33 is CRITICAL: CRITICAL - load average: 8.86, 5.63, 4.70
[02:53:08] PROBLEM - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is WARNING: WARNING - NGINX Error Rate is 49%
[02:55:07] PROBLEM - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is CRITICAL: CRITICAL - NGINX Error Rate is 74%
[02:55:07] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is CRITICAL: CRITICAL - NGINX Error Rate is 73%
[02:55:46] PROBLEM - cp22 HTTP 4xx/5xx ERROR Rate on cp22 is WARNING: WARNING - NGINX Error Rate is 48%
[02:57:31] PROBLEM - cp32 HTTP 4xx/5xx ERROR Rate on cp32 is WARNING: WARNING - NGINX Error Rate is 56%
[02:57:43] RECOVERY - cp22 HTTP 4xx/5xx ERROR Rate on cp22 is OK: OK - NGINX Error Rate is 28%
[02:59:07] RECOVERY - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is OK: OK - NGINX Error Rate is 33%
[03:00:38] RECOVERY - db101 Backups SQL on db101 is OK: FILE_AGE OK: /var/log/sql-backup.log is 35 seconds old and 275 bytes
[03:01:31] PROBLEM - cp32 HTTP 4xx/5xx ERROR Rate on cp32 is CRITICAL: CRITICAL - NGINX Error Rate is 67%
[03:01:39] PROBLEM - cp22 HTTP 4xx/5xx ERROR Rate on cp22 is WARNING: WARNING - NGINX Error Rate is 47%
[03:03:06] PROBLEM - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is WARNING: WARNING - NGINX Error Rate is 56%
[03:03:07] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is WARNING: WARNING - NGINX Error Rate is 58%
[03:05:07] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is CRITICAL: CRITICAL - NGINX Error Rate is 82%
[03:06:32] PROBLEM - cp32 HTTPS on cp32 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Backend fetch failed - 8191 bytes in 0.429 second response time
[03:07:07] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is WARNING: WARNING - NGINX Error Rate is 56%
[03:09:07] PROBLEM - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is CRITICAL: CRITICAL - NGINX Error Rate is 72%
[03:09:43] PROBLEM - cp32 Varnish Backends on cp32 is CRITICAL: 7 backends are down. mw121 mw122 mw131 mw132 mw141 mw142 mediawiki
[03:09:54] Jeez icinga is freaking out
[03:11:40] PROBLEM - cp22 HTTP 4xx/5xx ERROR Rate on cp22 is CRITICAL: CRITICAL - NGINX Error Rate is 77%
[03:13:06] RECOVERY - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is OK: OK - NGINX Error Rate is 37%
[03:13:07] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is CRITICAL: CRITICAL - NGINX Error Rate is 63%
[03:13:39] PROBLEM - cp22 HTTP 4xx/5xx ERROR Rate on cp22 is WARNING: WARNING - NGINX Error Rate is 52%
[03:14:26] PROBLEM - cp32 Current Load on cp32 is WARNING: WARNING - load average: 0.66, 1.44, 3.91
[03:15:40] PROBLEM - cp22 HTTP 4xx/5xx ERROR Rate on cp22 is CRITICAL: CRITICAL - NGINX Error Rate is 60%
[03:17:07] PROBLEM - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is CRITICAL: CRITICAL - NGINX Error Rate is 62%
[03:17:09] PROBLEM - cp32 Disk Space on cp32 is WARNING: DISK WARNING - free space: / 8123 MB (10% inode=98%);
[03:17:39] PROBLEM - cp22 HTTP 4xx/5xx ERROR Rate on cp22 is WARNING: WARNING - NGINX Error Rate is 48%
[03:18:14] PROBLEM - cp33 Current Load on cp33 is WARNING: WARNING - load average: 1.44, 2.68, 3.89
[03:18:26] RECOVERY - cp32 Current Load on cp32 is OK: OK - load average: 0.14, 0.75, 3.06
[03:19:06] PROBLEM - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is WARNING: WARNING - NGINX Error Rate is 57%
[03:19:07] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is WARNING: WARNING - NGINX Error Rate is 55%
[03:19:39] PROBLEM - cp22 HTTP 4xx/5xx ERROR Rate on cp22 is CRITICAL: CRITICAL - NGINX Error Rate is 82%
[03:20:18] PROBLEM - graylog121 Current Load on graylog121 is WARNING: WARNING - load average: 1.98, 3.17, 3.89
[03:20:40] PROBLEM - en.religiononfire.mar.in.ua - reverse DNS on sslhost is WARNING: NoNameservers: All nameservers failed to answer the query mar.in.ua. IN NS: Server 2606:4700:4700::1111 UDP port 53 answered SERVFAIL
[03:21:07] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is CRITICAL: CRITICAL - NGINX Error Rate is 65%
[03:21:39] PROBLEM - cp22 HTTP 4xx/5xx ERROR Rate on cp22 is WARNING: WARNING - NGINX Error Rate is 57%
[03:22:14] RECOVERY - cp33 Current Load on cp33 is OK: OK - load average: 1.72, 1.97, 3.30
[03:23:07] PROBLEM - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is CRITICAL: CRITICAL - NGINX Error Rate is 63%
[03:23:07] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is WARNING: WARNING - NGINX Error Rate is 52%
[03:23:39] PROBLEM - cp22 HTTP 4xx/5xx ERROR Rate on cp22 is CRITICAL: CRITICAL - NGINX Error Rate is 60%
[03:25:07] PROBLEM - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is WARNING: WARNING - NGINX Error Rate is 49%
[03:25:07] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is CRITICAL: CRITICAL - NGINX Error Rate is 82%
[03:25:39] PROBLEM - cp22 HTTP 4xx/5xx ERROR Rate on cp22 is WARNING: WARNING - NGINX Error Rate is 48%
[03:27:07] PROBLEM - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is CRITICAL: CRITICAL - NGINX Error Rate is 67%
[03:27:39] PROBLEM - cp22 HTTP 4xx/5xx ERROR Rate on cp22 is CRITICAL: CRITICAL - NGINX Error Rate is 85%
[03:27:43] RECOVERY - cp32 Varnish Backends on cp32 is OK: All 14 backends are healthy
[03:29:06] PROBLEM - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is WARNING: WARNING - NGINX Error Rate is 48%
[03:31:07] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is WARNING: WARNING - NGINX Error Rate is 49%
[03:31:32] PROBLEM - knowledgebase.clientmanager.co.za - reverse DNS on sslhost is WARNING: Timeout: The DNS operation timed out after 5.407189846038818 seconds
[03:33:07] RECOVERY - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is OK: OK - NGINX Error Rate is 36%
[03:33:07] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is CRITICAL: CRITICAL - NGINX Error Rate is 78%
[03:33:10] [mw-config] Reno-Rex opened pull request #5191: Modifying $wgAvailableRights and $wgRestrictionLevels - https://github.com/miraheze/mw-config/pull/5191
[03:33:39] PROBLEM - cp22 HTTP 4xx/5xx ERROR Rate on cp22 is WARNING: WARNING - NGINX Error Rate is 58%
[03:33:43] PROBLEM - cp32 Varnish Backends on cp32 is CRITICAL: 7 backends are down. mw121 mw122 mw131 mw132 mw141 mw142 mediawiki
[03:34:07] [mw-config] Reno-Rex edited pull request #5191: Modifying $wgAvailableRights and $wgRestrictionLevels - https://github.com/miraheze/mw-config/pull/5191
[03:34:13] miraheze/mw-config - Reno-Rex the build passed.
[03:35:40] PROBLEM - cp22 HTTP 4xx/5xx ERROR Rate on cp22 is CRITICAL: CRITICAL - NGINX Error Rate is 65%
[03:36:18] PROBLEM - graylog121 Current Load on graylog121 is CRITICAL: CRITICAL - load average: 4.36, 3.68, 3.64
[03:37:07] PROBLEM - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is WARNING: WARNING - NGINX Error Rate is 49%
[03:38:18] PROBLEM - graylog121 Current Load on graylog121 is WARNING: WARNING - load average: 2.84, 3.41, 3.55
[03:39:07] RECOVERY - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is OK: OK - NGINX Error Rate is 39%
[03:41:52] [mw-config] Reno-Rex edited pull request #5191: Modifying $wgAvailableRights and $wgRestrictionLevels - https://github.com/miraheze/mw-config/pull/5191
[03:42:07] PROBLEM - cp23 APT on cp23 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 60 seconds.
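Most of the noise above is a handful of checks flapping between OK, WARNING, and CRITICAL. Because every bot line follows the same "[HH:MM:SS] PROBLEM/RECOVERY - <service> is <state>: <detail>" shape, the flappiest checks can be counted mechanically; a minimal sketch (the log filename is hypothetical):

```python
import re
from collections import Counter

# Matches the icinga bot format seen above, e.g.
# "[00:02:22] PROBLEM - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is CRITICAL: ..."
LINE = re.compile(
    r"^\[(?P<time>\d{2}:\d{2}:\d{2})\] "
    r"(?P<event>PROBLEM|RECOVERY) - (?P<service>.+?) is "
    r"(?P<state>OK|WARNING|CRITICAL):"
)

def flap_counts(lines):
    """Count state-change notifications per service to spot the noisiest checks."""
    counts = Counter()
    for line in lines:
        m = LINE.match(line)
        if m:  # Grafana alerts, GitHub bot lines and chatter simply don't match
            counts[m.group("service")] += 1
    return counts

if __name__ == "__main__":
    with open("icinga.log") as f:  # hypothetical path to this transcript
        for service, n in flap_counts(f).most_common(10):
            print(f"{n:4d}  {service}")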
[03:43:07] PROBLEM - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is CRITICAL: CRITICAL - NGINX Error Rate is 75%
[03:43:08] PROBLEM - cp32 Disk Space on cp32 is CRITICAL: connect to address 2607:f1c0:1800:8100::1 port 5666: Connection refusedconnect to host 2607:f1c0:1800:8100::1 port 5666: Connection refused
[03:44:17] RECOVERY - cp23 APT on cp23 is OK: APT OK: 1 packages available for upgrade (0 critical updates).
[03:44:18] PROBLEM - graylog121 Current Load on graylog121 is CRITICAL: CRITICAL - load average: 4.14, 3.57, 3.55
[03:44:34] PROBLEM - cp32 APT on cp32 is CRITICAL: connect to address 2607:f1c0:1800:8100::1 port 5666: Connection refusedconnect to host 2607:f1c0:1800:8100::1 port 5666: Connection refused
[03:44:41] PROBLEM - cp32 conntrack_table_size on cp32 is CRITICAL: connect to address 2607:f1c0:1800:8100::1 port 5666: Connection refusedconnect to host 2607:f1c0:1800:8100::1 port 5666: Connection refused
[03:44:43] PROBLEM - cp32 Stunnel for mw121 on cp32 is CRITICAL: connect to address 2607:f1c0:1800:8100::1 port 5666: Connection refusedconnect to host 2607:f1c0:1800:8100::1 port 5666: Connection refused
[03:44:54] PROBLEM - cp32 Stunnel for mw132 on cp32 is CRITICAL: connect to address 2607:f1c0:1800:8100::1 port 5666: Connection refusedconnect to host 2607:f1c0:1800:8100::1 port 5666: Connection refused
[03:45:07] PROBLEM - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is WARNING: WARNING - NGINX Error Rate is 46%
[03:45:18] PROBLEM - cp32 SSH on cp32 is CRITICAL: connect to address 2607:f1c0:1800:8100::1 and port 22: Connection refused
[03:45:19] PROBLEM - cp32 Stunnel for phab121 on cp32 is CRITICAL: connect to address 2607:f1c0:1800:8100::1 port 5666: Connection refusedconnect to host 2607:f1c0:1800:8100::1 port 5666: Connection refused
[03:45:27] PROBLEM - cp32 Stunnel for test131 on cp32 is CRITICAL: connect to address 2607:f1c0:1800:8100::1 port 5666: Connection refusedconnect to host 2607:f1c0:1800:8100::1 port 5666: Connection refused
[03:45:39] PROBLEM - cp32 PowerDNS Recursor on cp32 is CRITICAL: connect to address 2607:f1c0:1800:8100::1 port 5666: Connection refusedconnect to host 2607:f1c0:1800:8100::1 port 5666: Connection refused
[03:45:42] PROBLEM - ns2 GDNSD Datacenters on ns2 is CRITICAL: CRITICAL - 2 datacenters are down: 108.175.15.182/cpweb, 2607:f1c0:1800:8100::1/cpweb
[03:45:42] PROBLEM - cp32 Stunnel for mwtask141 on cp32 is CRITICAL: connect to address 2607:f1c0:1800:8100::1 port 5666: Connection refusedconnect to host 2607:f1c0:1800:8100::1 port 5666: Connection refused
[03:45:45] PROBLEM - cp32 Stunnel for reports121 on cp32 is CRITICAL: connect to address 2607:f1c0:1800:8100::1 port 5666: Connection refusedconnect to host 2607:f1c0:1800:8100::1 port 5666: Connection refused
[03:45:49] PROBLEM - cp32 Stunnel for mw122 on cp32 is CRITICAL: connect to address 2607:f1c0:1800:8100::1 port 5666: Connection refusedconnect to host 2607:f1c0:1800:8100::1 port 5666: Connection refused
[03:45:52] PROBLEM - cp32 Puppet on cp32 is CRITICAL: connect to address 2607:f1c0:1800:8100::1 port 5666: Connection refusedconnect to host 2607:f1c0:1800:8100::1 port 5666: Connection refused
[03:45:58] PROBLEM - cp32 Stunnel for matomo131 on cp32 is CRITICAL: connect to address 2607:f1c0:1800:8100::1 port 5666: Connection refusedconnect to host 2607:f1c0:1800:8100::1 port 5666: Connection refused
[03:46:02] PROBLEM - cp32 Stunnel for mw131 on cp32 is CRITICAL: connect to address 2607:f1c0:1800:8100::1 port 5666: Connection refusedconnect to host 2607:f1c0:1800:8100::1 port 5666: Connection refused
[03:46:14] PROBLEM - cp32 NTP time on cp32 is CRITICAL: connect to address 2607:f1c0:1800:8100::1 port 5666: Connection refusedconnect to host 2607:f1c0:1800:8100::1 port 5666: Connection refused
[03:46:14] PROBLEM - cp33 Current Load on cp33 is CRITICAL: CRITICAL - load average: 5.13, 3.78, 2.81
[03:46:15] PROBLEM - cp32 ferm_active on cp32 is CRITICAL: connect to address 2607:f1c0:1800:8100::1 port 5666: Connection refusedconnect to host 2607:f1c0:1800:8100::1 port 5666: Connection refused
[03:46:16] PROBLEM - cp32 Stunnel for mail121 on cp32 is CRITICAL: connect to address 2607:f1c0:1800:8100::1 port 5666: Connection refusedconnect to host 2607:f1c0:1800:8100::1 port 5666: Connection refused
[03:46:18] PROBLEM - cp32 Stunnel for puppet141 on cp32 is CRITICAL: connect to address 2607:f1c0:1800:8100::1 port 5666: Connection refusedconnect to host 2607:f1c0:1800:8100::1 port 5666: Connection refused
[03:46:25] PROBLEM - ns1 GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 2 datacenters are down: 108.175.15.182/cpweb, 2607:f1c0:1800:8100::1/cpweb
[03:46:25] PROBLEM - cp32 Current Load on cp32 is CRITICAL: connect to address 2607:f1c0:1800:8100::1 port 5666: Connection refusedconnect to host 2607:f1c0:1800:8100::1 port 5666: Connection refused
[03:46:29] PROBLEM - cp32 Stunnel for mw141 on cp32 is CRITICAL: connect to address 2607:f1c0:1800:8100::1 port 5666: Connection refusedconnect to host 2607:f1c0:1800:8100::1 port 5666: Connection refused
[03:46:31] PROBLEM - cp32 Stunnel for mon141 on cp32 is CRITICAL: connect to address 2607:f1c0:1800:8100::1 port 5666: Connection refusedconnect to host 2607:f1c0:1800:8100::1 port 5666: Connection refused
[03:46:31] PROBLEM - cp32 Stunnel for mw142 on cp32 is CRITICAL: connect to address 2607:f1c0:1800:8100::1 port 5666: Connection refusedconnect to host 2607:f1c0:1800:8100::1 port 5666: Connection refused
[03:47:07] PROBLEM - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is CRITICAL: CRITICAL - NGINX Error Rate is 71%
[03:47:10] PROBLEM - cp33 Varnish Backends on cp33 is CRITICAL: 5 backends are down. mw121 mw122 mw131 mw132 mw141
[03:47:39] PROBLEM - cp22 HTTP 4xx/5xx ERROR Rate on cp22 is WARNING: WARNING - NGINX Error Rate is 54%
[03:48:14] PROBLEM - cp33 Current Load on cp33 is WARNING: WARNING - load average: 3.84, 4.00, 3.03
[03:48:18] PROBLEM - graylog121 Current Load on graylog121 is WARNING: WARNING - load average: 3.06, 3.44, 3.51
[03:49:10] RECOVERY - cp33 Varnish Backends on cp33 is OK: All 14 backends are healthy
[03:49:40] PROBLEM - cp22 HTTP 4xx/5xx ERROR Rate on cp22 is CRITICAL: CRITICAL - NGINX Error Rate is 60%
[03:49:50] RECOVERY - en.religiononfire.mar.in.ua - reverse DNS on sslhost is OK: SSL OK - en.religiononfire.mar.in.ua reverse DNS resolves to cp22.miraheze.org - CNAME OK
[03:50:15] PROBLEM - cp33 Current Load on cp33 is CRITICAL: CRITICAL - load average: 38.43, 12.96, 6.17
[03:50:18] PROBLEM - graylog121 Current Load on graylog121 is CRITICAL: CRITICAL - load average: 4.24, 3.60, 3.54
[03:51:08] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is WARNING: WARNING - NGINX Error Rate is 55%
[03:52:18] PROBLEM - graylog121 Current Load on graylog121 is WARNING: WARNING - load average: 3.01, 3.40, 3.48
[03:53:07] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is CRITICAL: CRITICAL - NGINX Error Rate is 90%
[03:53:10] RECOVERY - cp32 Stunnel for mw132 on cp32 is OK: TCP OK - 0.001 second response time on localhost port 8107
[03:53:11] PROBLEM - cp33 Varnish Backends on cp33 is CRITICAL: 7 backends are down. mw121 mw122 mw131 mw132 mw141 mw142 mediawiki
[03:53:11] RECOVERY - cp32 conntrack_table_size on cp32 is OK: OK: nf_conntrack is 0 % full
[03:53:15] RECOVERY - cp32 Stunnel for mw121 on cp32 is OK: TCP OK - 0.000 second response time on localhost port 8104
[03:53:18] RECOVERY - cp32 SSH on cp32 is OK: SSH OK - OpenSSH_8.4p1 Debian-5+deb11u1 (protocol 2.0)
[03:53:19] RECOVERY - cp32 Stunnel for phab121 on cp32 is OK: TCP OK - 0.001 second response time on localhost port 8202
[03:53:28] RECOVERY - cp32 Stunnel for test131 on cp32 is OK: TCP OK - 0.000 second response time on localhost port 8180
[03:53:40] RECOVERY - cp32 PowerDNS Recursor on cp32 is OK: DNS OK: 0.077 seconds response time. miraheze.org returns 108.175.15.182,2607:f1c0:1800:26f::1,2607:f1c0:1800:8100::1,74.208.203.152
[03:53:42] RECOVERY - ns2 GDNSD Datacenters on ns2 is OK: OK - all datacenters are online
[03:53:43] RECOVERY - cp32 Stunnel for mwtask141 on cp32 is OK: TCP OK - 0.000 second response time on localhost port 8150
[03:53:43] RECOVERY - cp32 Varnish Backends on cp32 is OK: All 14 backends are healthy
[03:53:46] RECOVERY - cp32 Stunnel for reports121 on cp32 is OK: TCP OK - 0.000 second response time on localhost port 8205
[03:53:47] RECOVERY - cp32 Stunnel for mw122 on cp32 is OK: TCP OK - 0.000 second response time on localhost port 8105
[03:53:53] RECOVERY - cp32 Puppet on cp32 is OK: OK: Puppet is currently enabled, last run 16 minutes ago with 0 failures
[03:53:57] RECOVERY - cp32 Stunnel for matomo131 on cp32 is OK: TCP OK - 0.000 second response time on localhost port 8203
[03:54:02] RECOVERY - cp32 Stunnel for mw131 on cp32 is OK: TCP OK - 0.000 second response time on localhost port 8106
[03:54:15] RECOVERY - cp32 NTP time on cp32 is OK: NTP OK: Offset -0.002507358789 secs
[03:54:15] RECOVERY - cp32 ferm_active on cp32 is OK: OK ferm input default policy is set
[03:54:16] RECOVERY - cp32 Stunnel for mail121 on cp32 is OK: TCP OK - 0.000 second response time on localhost port 8200
[03:54:18] RECOVERY - graylog121 Current Load on graylog121 is OK: OK - load average: 2.41, 2.95, 3.30
[03:54:19] RECOVERY - cp32 Stunnel for puppet141 on cp32 is OK: TCP OK - 0.000 second response time on localhost port 8204
[03:54:25] RECOVERY - ns1 GDNSD Datacenters on ns1 is OK: OK - all datacenters are online
[03:54:26] RECOVERY - cp32 Current Load on cp32 is OK: OK - load average: 0.52, 0.19, 0.07
[03:54:28] RECOVERY - cp32 HTTPS on cp32 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 3713 bytes in 0.530 second response time
[03:54:29] RECOVERY - cp32 APT on cp32 is OK: APT OK: 1 packages available for upgrade (0 critical updates).
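The cp32 cascade above is one failure, not two dozen: every per-service check on the host runs through the same NRPE agent, so when nothing answers on port 5666 they all go CRITICAL with the identical "Connection refused" detail and then recover together once the agent is reachable again. A sketch of probing that shared dependency first, assuming only that 5666 is the NRPE port seen in the log:

```python
import socket

# Every cp32 check failed with "port 5666: Connection refused": all of them
# depend on the host's NRPE agent. A single reachability probe against that
# port distinguishes "agent/host down" from genuine per-service failures.
def nrpe_reachable(host: str, port: int = 5666, timeout: float = 10.0) -> bool:
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # connection refused, timeout, network unreachable, ...
        return False

if __name__ == "__main__":
    # Hypothetical target; the log's probes went to cp32's IPv6 address.
    print(nrpe_reachable("2607:f1c0:1800:8100::1"))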
[03:54:30] RECOVERY - cp32 Stunnel for mw141 on cp32 is OK: TCP OK - 0.000 second response time on localhost port 8108
[03:54:31] RECOVERY - cp32 Stunnel for mon141 on cp32 is OK: TCP OK - 0.000 second response time on localhost port 8201
[03:54:32] RECOVERY - cp32 Stunnel for mw142 on cp32 is OK: TCP OK - 0.000 second response time on localhost port 8109
[03:55:09] PROBLEM - cp32 Disk Space on cp32 is WARNING: DISK WARNING - free space: / 7304 MB (9% inode=98%);
[03:55:31] PROBLEM - cp32 HTTP 4xx/5xx ERROR Rate on cp32 is WARNING: WARNING - NGINX Error Rate is 57%
[03:56:07] PROBLEM - cp33 Disk Space on cp33 is WARNING: DISK WARNING - free space: / 8129 MB (10% inode=98%);
[03:57:10] RECOVERY - cp33 Varnish Backends on cp33 is OK: All 14 backends are healthy
[03:57:31] PROBLEM - cp32 HTTP 4xx/5xx ERROR Rate on cp32 is CRITICAL: CRITICAL - NGINX Error Rate is 81%
[03:57:40] PROBLEM - cp22 HTTP 4xx/5xx ERROR Rate on cp22 is WARNING: WARNING - NGINX Error Rate is 55%
[03:58:18] PROBLEM - graylog121 Current Load on graylog121 is CRITICAL: CRITICAL - load average: 4.31, 3.61, 3.46
[03:59:07] PROBLEM - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is WARNING: WARNING - NGINX Error Rate is 54%
[03:59:40] PROBLEM - cp22 HTTP 4xx/5xx ERROR Rate on cp22 is CRITICAL: CRITICAL - NGINX Error Rate is 69%
[04:00:32] [Grafana] !sre FIRING: The mediawiki job queue has more than 2500 unclaimed jobs https://grafana.miraheze.org/d/GtxbP1Xnk?orgId=1
[04:03:07] PROBLEM - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is CRITICAL: CRITICAL - NGINX Error Rate is 78%
[04:03:40] PROBLEM - cp22 HTTP 4xx/5xx ERROR Rate on cp22 is WARNING: WARNING - NGINX Error Rate is 55%
[04:05:31] PROBLEM - cp32 HTTP 4xx/5xx ERROR Rate on cp32 is WARNING: WARNING - NGINX Error Rate is 52%
[04:05:40] PROBLEM - cp22 HTTP 4xx/5xx ERROR Rate on cp22 is CRITICAL: CRITICAL - NGINX Error Rate is 62%
[04:07:40] PROBLEM - cp22 HTTP 4xx/5xx ERROR Rate on cp22 is WARNING: WARNING - NGINX Error Rate is 52%
[04:09:31] PROBLEM - cp32 HTTP 4xx/5xx ERROR Rate on cp32 is CRITICAL: CRITICAL - NGINX Error Rate is 64%
[04:09:40] PROBLEM - cp22 HTTP 4xx/5xx ERROR Rate on cp22 is CRITICAL: CRITICAL - NGINX Error Rate is 68%
[04:11:07] PROBLEM - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is WARNING: WARNING - NGINX Error Rate is 58%
[04:11:40] PROBLEM - cp22 HTTP 4xx/5xx ERROR Rate on cp22 is WARNING: WARNING - NGINX Error Rate is 59%
[04:13:07] PROBLEM - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is CRITICAL: CRITICAL - NGINX Error Rate is 67%
[04:15:31] PROBLEM - cp32 HTTP 4xx/5xx ERROR Rate on cp32 is WARNING: WARNING - NGINX Error Rate is 49%
[04:17:31] PROBLEM - cp32 HTTP 4xx/5xx ERROR Rate on cp32 is CRITICAL: CRITICAL - NGINX Error Rate is 65%
[04:19:03] PROBLEM - wiki.nj.cn.eu.org - reverse DNS on sslhost is WARNING: NoNameservers: All nameservers failed to answer the query cn.eu.org. IN NS: Server 2606:4700:4700::1111 UDP port 53 answered SERVFAIL
[04:19:07] PROBLEM - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is WARNING: WARNING - NGINX Error Rate is 58%
[04:19:07] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is WARNING: WARNING - NGINX Error Rate is 56%
[04:19:40] PROBLEM - cp22 HTTP 4xx/5xx ERROR Rate on cp22 is CRITICAL: CRITICAL - NGINX Error Rate is 63%
[04:20:53] PROBLEM - cp32 Current Load on cp32 is WARNING: WARNING - load average: 4.00, 2.96, 2.10
[04:21:07] PROBLEM - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is CRITICAL: CRITICAL - NGINX Error Rate is 75%
[04:21:07] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is CRITICAL: CRITICAL - NGINX Error Rate is 76%
[04:21:31] PROBLEM - cp32 HTTP 4xx/5xx ERROR Rate on cp32 is WARNING: WARNING - NGINX Error Rate is 59%
[04:21:40] PROBLEM - cp22 HTTP 4xx/5xx ERROR Rate on cp22 is WARNING: WARNING - NGINX Error Rate is 58%
[04:22:48] PROBLEM - cp32 Current Load on cp32 is CRITICAL: CRITICAL - load average: 4.74, 3.10, 2.22
[04:23:07] RECOVERY - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is OK: OK - NGINX Error Rate is 39%
[04:23:31] PROBLEM - cp32 HTTP 4xx/5xx ERROR Rate on cp32 is CRITICAL: CRITICAL - NGINX Error Rate is 76%
[04:23:40] PROBLEM - cp22 HTTP 4xx/5xx ERROR Rate on cp22 is CRITICAL: CRITICAL - NGINX Error Rate is 67%
[04:24:44] PROBLEM - cp32 Current Load on cp32 is WARNING: WARNING - load average: 3.77, 3.52, 2.49
[04:25:40] PROBLEM - cp22 HTTP 4xx/5xx ERROR Rate on cp22 is WARNING: WARNING - NGINX Error Rate is 55%
[04:25:43] PROBLEM - cp32 Varnish Backends on cp32 is CRITICAL: 7 backends are down. mw121 mw122 mw131 mw132 mw141 mw142 mediawiki
[04:26:23] Hmm I can load meta but am not getting any styles loaded
[04:26:40] RECOVERY - cp32 Current Load on cp32 is OK: OK - load average: 1.27, 2.67, 2.29
[04:27:07] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is CRITICAL: CRITICAL - NGINX Error Rate is 82%
[04:29:06] PROBLEM - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is WARNING: WARNING - NGINX Error Rate is 46%
[04:29:40] RECOVERY - cp22 HTTP 4xx/5xx ERROR Rate on cp22 is OK: OK - NGINX Error Rate is 35%
[04:29:43] RECOVERY - cp32 Varnish Backends on cp32 is OK: All 14 backends are healthy
[04:31:07] PROBLEM - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is CRITICAL: CRITICAL - NGINX Error Rate is 63%
[04:32:14] PROBLEM - cp33 Current Load on cp33 is WARNING: WARNING - load average: 2.82, 2.87, 3.93
[04:33:31] PROBLEM - cp32 HTTP 4xx/5xx ERROR Rate on cp32 is WARNING: WARNING - NGINX Error Rate is 45%
[04:33:40] PROBLEM - cp22 HTTP 4xx/5xx ERROR Rate on cp22 is WARNING: WARNING - NGINX Error Rate is 55%
[04:35:06] RECOVERY - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is OK: OK - NGINX Error Rate is 38%
[04:35:07] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is WARNING: WARNING - NGINX Error Rate is 59%
[04:35:31] PROBLEM - cp32 HTTP 4xx/5xx ERROR Rate on cp32 is CRITICAL: CRITICAL - NGINX Error Rate is 67%
[04:36:14] RECOVERY - cp33 Current Load on cp33 is OK: OK - load average: 0.72, 1.92, 3.31
[04:39:07] PROBLEM - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is CRITICAL: CRITICAL - NGINX Error Rate is 63%
[04:39:07] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is CRITICAL: CRITICAL - NGINX Error Rate is 65%
[04:39:31] PROBLEM - cp32 HTTP 4xx/5xx ERROR Rate on cp32 is WARNING: WARNING - NGINX Error Rate is 42%
[04:41:07] PROBLEM - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is WARNING: WARNING - NGINX Error Rate is 46%
[04:41:07] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is WARNING: WARNING - NGINX Error Rate is 53%
[04:41:31] PROBLEM - cp32 HTTP 4xx/5xx ERROR Rate on cp32 is CRITICAL: CRITICAL - NGINX Error Rate is 73%
[04:41:40] PROBLEM - cp22 HTTP 4xx/5xx ERROR Rate on cp22 is CRITICAL: CRITICAL - NGINX Error Rate is 62%
[04:41:58] PROBLEM - cp23 NTP time on cp23 is WARNING: NTP WARNING: Offset -0.1003983319 secs
[04:43:31] PROBLEM - cp32 HTTP 4xx/5xx ERROR Rate on cp32 is WARNING: WARNING - NGINX Error Rate is 57%
[04:43:57] RECOVERY - cp23 NTP time on cp23 is OK: NTP OK: Offset -0.08292248845 secs
[04:44:16] PROBLEM - cp33 NTP time on cp33 is WARNING: NTP WARNING: Offset 0.1668264866 secs
[04:45:07] PROBLEM - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is CRITICAL: CRITICAL - NGINX Error Rate is 72%
[04:45:07] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is CRITICAL: CRITICAL - NGINX Error Rate is 74%
[04:45:31] PROBLEM - cp32 HTTP 4xx/5xx ERROR Rate on cp32 is CRITICAL: CRITICAL - NGINX Error Rate is 85%
[04:47:07] PROBLEM - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is WARNING: WARNING - NGINX Error Rate is 49%
[04:47:40] PROBLEM - cp22 HTTP 4xx/5xx ERROR Rate on cp22 is WARNING: WARNING - NGINX Error Rate is 56%
[04:48:07] PROBLEM - wiki.nj.cn.eu.org - reverse DNS on sslhost is CRITICAL: rDNS CRITICAL - wiki.nj.cn.eu.org All nameservers failed to answer the query.
[04:49:07] PROBLEM - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is CRITICAL: CRITICAL - NGINX Error Rate is 66%
[04:49:40] PROBLEM - cp22 HTTP 4xx/5xx ERROR Rate on cp22 is CRITICAL: CRITICAL - NGINX Error Rate is 64%
[04:51:07] PROBLEM - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is WARNING: WARNING - NGINX Error Rate is 46%
[04:51:07] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is WARNING: WARNING - NGINX Error Rate is 56%
[04:52:14] PROBLEM - cp33 Current Load on cp33 is WARNING: WARNING - load average: 2.57, 3.75, 3.47
[04:53:07] PROBLEM - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is CRITICAL: CRITICAL - NGINX Error Rate is 77%
[04:53:07] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is CRITICAL: CRITICAL - NGINX Error Rate is 89%
[04:53:31] PROBLEM - cp32 HTTP 4xx/5xx ERROR Rate on cp32 is WARNING: WARNING - NGINX Error Rate is 47%
[04:53:40] PROBLEM - cp22 HTTP 4xx/5xx ERROR Rate on cp22 is WARNING: WARNING - NGINX Error Rate is 53%
[04:54:14] RECOVERY - cp33 Current Load on cp33 is OK: OK - load average: 1.46, 3.02, 3.24
[04:55:07] PROBLEM - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is WARNING: WARNING - NGINX Error Rate is 46%
[04:55:31] PROBLEM - cp32 HTTP 4xx/5xx ERROR Rate on cp32 is CRITICAL: CRITICAL - NGINX Error Rate is 66%
[04:55:40] PROBLEM - cp22 HTTP 4xx/5xx ERROR Rate on cp22 is CRITICAL: CRITICAL - NGINX Error Rate is 68%
[04:57:07] PROBLEM - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is CRITICAL: CRITICAL - NGINX Error Rate is 60%
[04:57:07] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is WARNING: WARNING - NGINX Error Rate is 57%
[04:57:40] PROBLEM - cp22 HTTP 4xx/5xx ERROR Rate on cp22 is WARNING: WARNING - NGINX Error Rate is 57%
[04:59:07] PROBLEM - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is WARNING: WARNING - NGINX Error Rate is 40%
[04:59:24] RECOVERY - knowledgebase.clientmanager.co.za - reverse DNS on sslhost is OK: SSL OK - knowledgebase.clientmanager.co.za reverse DNS resolves to cp23.miraheze.org - CNAME OK
[04:59:40] PROBLEM - cp22 HTTP 4xx/5xx ERROR Rate on cp22 is CRITICAL: CRITICAL - NGINX Error Rate is 85%
[05:01:07] PROBLEM - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is CRITICAL: CRITICAL - NGINX Error Rate is 68%
[05:01:31] PROBLEM - cp32 HTTP 4xx/5xx ERROR Rate on cp32 is WARNING: WARNING - NGINX Error Rate is 59%
[05:01:40] PROBLEM - cp22 HTTP 4xx/5xx ERROR Rate on cp22 is WARNING: WARNING - NGINX Error Rate is 52%
[05:02:14] PROBLEM - cp33 Current Load on cp33 is CRITICAL: CRITICAL - load average: 14.62, 7.22, 4.62
[05:03:07] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is CRITICAL: CRITICAL - NGINX Error Rate is 68%
[05:05:07] PROBLEM - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is WARNING: WARNING - NGINX Error Rate is 46%
[05:05:31] PROBLEM - cp32 HTTP 4xx/5xx ERROR Rate on cp32 is CRITICAL: CRITICAL - NGINX Error Rate is 68%
[05:06:15] RECOVERY - cp33 NTP time on cp33 is OK: NTP OK: Offset 0.06084731221 secs
[05:07:07] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is WARNING: WARNING - NGINX Error Rate is 59%
[05:07:39] PROBLEM - cp22 HTTP 4xx/5xx ERROR Rate on cp22 is CRITICAL: CRITICAL - NGINX Error Rate is 61%
[05:09:07] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is CRITICAL: CRITICAL - NGINX Error Rate is 73%
[05:09:31] PROBLEM - cp32 HTTP 4xx/5xx ERROR Rate on cp32 is WARNING: WARNING - NGINX Error Rate is 56%
[05:11:07] PROBLEM - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is CRITICAL: CRITICAL - NGINX Error Rate is 80%
[05:11:40] PROBLEM - cp22 HTTP 4xx/5xx ERROR Rate on cp22 is WARNING: WARNING - NGINX Error Rate is 42%
[05:13:07] PROBLEM - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is WARNING: WARNING - NGINX Error Rate is 46%
[05:13:31] PROBLEM - cp32 HTTP 4xx/5xx ERROR Rate on cp32 is CRITICAL: CRITICAL - NGINX Error Rate is 74%
[05:13:40] PROBLEM - cp22 HTTP 4xx/5xx ERROR Rate on cp22 is CRITICAL: CRITICAL - NGINX Error Rate is 61%
[05:15:07] PROBLEM - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is CRITICAL: CRITICAL - NGINX Error Rate is 64%
[05:16:14] PROBLEM - cp33 Current Load on cp33 is WARNING: WARNING - load average: 1.46, 3.02, 3.81
[05:17:07] PROBLEM - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is WARNING: WARNING - NGINX Error Rate is 48%
[05:17:07] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is WARNING: WARNING - NGINX Error Rate is 47%
[05:17:11] PROBLEM - wiki.nj.cn.eu.org - reverse DNS on sslhost is WARNING: NoNameservers: All nameservers failed to answer the query cn.eu.org. IN NS: Server 2606:4700:4700::1111 UDP port 53 answered SERVFAIL
[05:19:07] PROBLEM - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is CRITICAL: CRITICAL - NGINX Error Rate is 65%
[05:19:07] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is CRITICAL: CRITICAL - NGINX Error Rate is 65%
[05:19:31] PROBLEM - cp32 HTTP 4xx/5xx ERROR Rate on cp32 is WARNING: WARNING - NGINX Error Rate is 53%
[05:21:40] PROBLEM - cp22 HTTP 4xx/5xx ERROR Rate on cp22 is WARNING: WARNING - NGINX Error Rate is 55%
[05:23:07] PROBLEM - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is WARNING: WARNING - NGINX Error Rate is 55%
[05:24:14] RECOVERY - cp33 Current Load on cp33 is OK: OK - load average: 1.32, 2.40, 3.28
[05:25:07] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is WARNING: WARNING - NGINX Error Rate is 47%
[05:25:31] PROBLEM - cp32 HTTP 4xx/5xx ERROR Rate on cp32 is CRITICAL: CRITICAL - NGINX Error Rate is 81%
[05:25:40] PROBLEM - cp22 HTTP 4xx/5xx ERROR Rate on cp22 is CRITICAL: CRITICAL - NGINX Error Rate is 87%
[05:27:07] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is CRITICAL: CRITICAL - NGINX Error Rate is 67%
[05:27:31] PROBLEM - cp32 HTTP 4xx/5xx ERROR Rate on cp32 is WARNING: WARNING - NGINX Error Rate is 52%
[05:29:07] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is WARNING: WARNING - NGINX Error Rate is 57%
[05:29:31] PROBLEM - cp32 HTTP 4xx/5xx ERROR Rate on cp32 is CRITICAL: CRITICAL - NGINX Error Rate is 61%
[05:31:07] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is CRITICAL: CRITICAL - NGINX Error Rate is 64%
[05:31:31] PROBLEM - cp32 HTTP 4xx/5xx ERROR Rate on cp32 is WARNING: WARNING - NGINX Error Rate is 59%
[05:31:40] PROBLEM - cp22 HTTP 4xx/5xx ERROR Rate on cp22 is WARNING: WARNING - NGINX Error Rate is 51%
[05:33:40] PROBLEM - cp22 HTTP 4xx/5xx ERROR Rate on cp22 is CRITICAL: CRITICAL - NGINX Error Rate is 62%
[05:35:31] PROBLEM - cp32 HTTP 4xx/5xx ERROR Rate on cp32 is CRITICAL: CRITICAL - NGINX Error Rate is 65%
[05:37:07] PROBLEM - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is CRITICAL: CRITICAL - NGINX Error Rate is 60%
[05:39:07] PROBLEM - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is WARNING: WARNING - NGINX Error Rate is 56%
[05:41:08] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is WARNING: WARNING - NGINX Error Rate is 52%
[05:42:07] [mw-config] Reception123 closed pull request #5191: Modifying $wgAvailableRights and $wgRestrictionLevels - https://github.com/miraheze/mw-config/pull/5191
[05:42:10] [miraheze/mw-config] Reception123 pushed 1 commit to master [+0/-0/±1] https://github.com/miraheze/mw-config/compare/e846fcec3e89...335bb3b63da9
[05:42:13] [miraheze/mw-config] Reno-Rex 335bb3b - Modifying $wgAvailableRights and $wgRestrictionLevels (#5191)
[05:42:19] !log [reception@mwtask141] starting deploy of {'pull': 'config', 'config': True} to all
[05:42:23] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[05:42:30] !log [reception@mwtask141] finished deploy of {'pull': 'config', 'config': True} to all - SUCCESS in 10s
[05:42:33] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[05:43:05] miraheze/mw-config - Reception123 the build passed.
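The recurring "reverse DNS on sslhost" alerts are DNS checks on custom domains: a healthy result resolves the domain's CNAME to one of the cache proxies (cp22.miraheze.org or cp23.miraheze.org in the recoveries above), while a resolver SERVFAIL surfaces as the NoNameservers error seen in the log. A minimal sketch of that lookup using dnspython; the real plugin's internals are not visible in this log, so treat the details as assumptions:

```python
import dns.resolver  # dnspython

def cname_target(hostname: str) -> str | None:
    """Return the CNAME target for a custom domain, or None on DNS failure."""
    try:
        answer = dns.resolver.resolve(hostname, "CNAME")
        return str(answer[0].target).rstrip(".")
    except (dns.resolver.NoNameservers, dns.resolver.NXDOMAIN):
        # NoNameservers matches the log's "All nameservers failed to answer"
        return None

target = cname_target("en.religiononfire.mar.in.ua")
# Healthy result per the log: a Miraheze cache proxy such as cp23.miraheze.org
print("CNAME OK" if target and target.endswith(".miraheze.org") else "rDNS CRITICAL")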
[05:43:07] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is CRITICAL: CRITICAL - NGINX Error Rate is 60%
[05:43:31] PROBLEM - cp32 HTTP 4xx/5xx ERROR Rate on cp32 is WARNING: WARNING - NGINX Error Rate is 54%
[05:45:40] RECOVERY - cp22 HTTP 4xx/5xx ERROR Rate on cp22 is OK: OK - NGINX Error Rate is 28%
[05:46:04] PROBLEM - cp23 Current Load on cp23 is CRITICAL: CRITICAL - load average: 32.78, 10.13, 4.57
[05:49:06] PROBLEM - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is CRITICAL: CRITICAL - NGINX Error Rate is 77%
[05:49:07] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is WARNING: WARNING - NGINX Error Rate is 55%
[05:49:40] PROBLEM - cp22 HTTP 4xx/5xx ERROR Rate on cp22 is WARNING: WARNING - NGINX Error Rate is 56%
[05:51:07] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is CRITICAL: CRITICAL - NGINX Error Rate is 85%
[05:51:31] PROBLEM - cp32 HTTP 4xx/5xx ERROR Rate on cp32 is CRITICAL: CRITICAL - NGINX Error Rate is 62%
[05:51:58] PROBLEM - cp23 Current Load on cp23 is WARNING: WARNING - load average: 0.60, 3.46, 3.28
[05:53:56] RECOVERY - cp23 Current Load on cp23 is OK: OK - load average: 0.58, 2.57, 2.97
[05:55:40] PROBLEM - cp22 HTTP 4xx/5xx ERROR Rate on cp22 is CRITICAL: CRITICAL - NGINX Error Rate is 70%
[05:57:09] !log [@test131] starting deploy of {'config': True} to all
[05:57:10] !log [@test131] finished deploy of {'config': True} to all - SUCCESS in 0s
[05:57:13] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[05:57:17] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[05:57:31] PROBLEM - cp32 HTTP 4xx/5xx ERROR Rate on cp32 is WARNING: WARNING - NGINX Error Rate is 57%
[05:57:39] PROBLEM - cp22 HTTP 4xx/5xx ERROR Rate on cp22 is WARNING: WARNING - NGINX Error Rate is 42%
[05:59:31] PROBLEM - cp32 HTTP 4xx/5xx ERROR Rate on cp32 is CRITICAL: CRITICAL - NGINX Error Rate is 74%
[05:59:39] PROBLEM - cp22 HTTP 4xx/5xx ERROR Rate on cp22 is CRITICAL: CRITICAL - NGINX Error Rate is 60%
[06:01:31] PROBLEM - cp32 HTTP 4xx/5xx ERROR Rate on cp32 is WARNING: WARNING - NGINX Error Rate is 52%
[06:03:31] PROBLEM - cp32 HTTP 4xx/5xx ERROR Rate on cp32 is CRITICAL: CRITICAL - NGINX Error Rate is 69%
[06:03:39] PROBLEM - cp22 HTTP 4xx/5xx ERROR Rate on cp22 is WARNING: WARNING - NGINX Error Rate is 52%
[06:05:39] PROBLEM - cp22 HTTP 4xx/5xx ERROR Rate on cp22 is CRITICAL: CRITICAL - NGINX Error Rate is 60%
[06:07:31] RECOVERY - cp32 HTTP 4xx/5xx ERROR Rate on cp32 is OK: OK - NGINX Error Rate is 39%
[06:07:39] PROBLEM - cp22 HTTP 4xx/5xx ERROR Rate on cp22 is WARNING: WARNING - NGINX Error Rate is 55%
[06:08:26] PROBLEM - cp32 Current Load on cp32 is CRITICAL: CRITICAL - load average: 99.74, 61.08, 24.61
[06:11:09] PROBLEM - cp32 Disk Space on cp32 is CRITICAL: DISK CRITICAL - free space: / 4482 MB (5% inode=98%);
[06:11:31] PROBLEM - cp32 HTTP 4xx/5xx ERROR Rate on cp32 is WARNING: WARNING - NGINX Error Rate is 59%
[06:13:07] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is WARNING: WARNING - NGINX Error Rate is 55%
[06:13:31] PROBLEM - cp32 HTTP 4xx/5xx ERROR Rate on cp32 is CRITICAL: CRITICAL - NGINX Error Rate is 88%
[06:13:40] PROBLEM - cp22 HTTP 4xx/5xx ERROR Rate on cp22 is CRITICAL: CRITICAL - NGINX Error Rate is 80%
[06:15:31] PROBLEM - cp32 HTTP 4xx/5xx ERROR Rate on cp32 is WARNING: WARNING - NGINX Error Rate is 56%
[06:17:07] PROBLEM - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is WARNING: WARNING - NGINX Error Rate is 57%
[06:17:07] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is CRITICAL: CRITICAL - NGINX Error Rate is 63%
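The "Current Load" checks report the kernel's 1-, 5-, and 15-minute load averages, which is why cp32's spike shows up first in the leading number (99.74) while the 15-minute figure lags at 24.61. A check_load-style sketch with illustrative thresholds (Miraheze's actual warn/crit values are not visible in this log):

```python
import os

# Each "Current Load" alert carries the three numbers from
# "load average: 99.74, 61.08, 24.61" (1-, 5-, 15-minute averages).
# Thresholds below are illustrative only, not the real plugin config.
WARN = (3.5, 3.5, 3.5)
CRIT = (4.0, 4.0, 4.0)

def check_load() -> str:
    load = os.getloadavg()  # (1 min, 5 min, 15 min)
    detail = "load average: " + ", ".join(f"{v:.2f}" for v in load)
    if any(v >= c for v, c in zip(load, CRIT)):
        return f"CRITICAL - {detail}"
    if any(v >= w for v, w in zip(load, WARN)):
        return f"WARNING - {detail}"
    return f"OK - {detail}"

print(check_load())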
[06:17:31] PROBLEM - cp32 HTTP 4xx/5xx ERROR Rate on cp32 is CRITICAL: CRITICAL - NGINX Error Rate is 64% [06:19:06] PROBLEM - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is CRITICAL: CRITICAL - NGINX Error Rate is 66% [06:21:31] PROBLEM - cp32 HTTP 4xx/5xx ERROR Rate on cp32 is WARNING: WARNING - NGINX Error Rate is 52% [06:21:40] PROBLEM - cp22 HTTP 4xx/5xx ERROR Rate on cp22 is WARNING: WARNING - NGINX Error Rate is 48% [06:23:40] PROBLEM - cp22 HTTP 4xx/5xx ERROR Rate on cp22 is CRITICAL: CRITICAL - NGINX Error Rate is 82% [06:25:07] PROBLEM - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is WARNING: WARNING - NGINX Error Rate is 59% [06:25:07] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is WARNING: WARNING - NGINX Error Rate is 46% [06:27:06] PROBLEM - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is CRITICAL: CRITICAL - NGINX Error Rate is 73% [06:27:07] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is CRITICAL: CRITICAL - NGINX Error Rate is 76% [06:28:52] PROBLEM - cp22 Current Load on cp22 is CRITICAL: CRITICAL - load average: 4.49, 9.24, 5.29 [06:29:07] PROBLEM - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is WARNING: WARNING - NGINX Error Rate is 58% [06:29:07] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is WARNING: WARNING - NGINX Error Rate is 46% [06:29:31] PROBLEM - cp32 HTTP 4xx/5xx ERROR Rate on cp32 is CRITICAL: CRITICAL - NGINX Error Rate is 72% [06:31:07] PROBLEM - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is CRITICAL: CRITICAL - NGINX Error Rate is 63% [06:31:07] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is CRITICAL: CRITICAL - NGINX Error Rate is 69% [06:31:31] PROBLEM - cp32 HTTP 4xx/5xx ERROR Rate on cp32 is WARNING: WARNING - NGINX Error Rate is 49% [06:33:07] PROBLEM - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is WARNING: WARNING - NGINX Error Rate is 51% [06:35:07] PROBLEM - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is CRITICAL: CRITICAL - NGINX Error Rate is 75% [06:35:31] PROBLEM - cp32 HTTP 4xx/5xx ERROR Rate on cp32 is CRITICAL: CRITICAL - NGINX Error Rate is 73% [06:40:52] PROBLEM - cp22 Current Load on cp22 is WARNING: WARNING - load average: 0.79, 2.63, 3.83 [06:41:31] PROBLEM - cp32 HTTP 4xx/5xx ERROR Rate on cp32 is WARNING: WARNING - NGINX Error Rate is 58% [06:43:07] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is WARNING: WARNING - NGINX Error Rate is 47% [06:43:31] PROBLEM - cp32 HTTP 4xx/5xx ERROR Rate on cp32 is CRITICAL: CRITICAL - NGINX Error Rate is 60% [06:44:26] PROBLEM - cp32 Current Load on cp32 is WARNING: WARNING - load average: 0.38, 0.90, 3.71 [06:44:52] RECOVERY - cp22 Current Load on cp22 is OK: OK - load average: 0.20, 1.36, 3.04 [06:45:07] PROBLEM - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is WARNING: WARNING - NGINX Error Rate is 59% [06:45:31] RECOVERY - cp32 HTTP 4xx/5xx ERROR Rate on cp32 is OK: OK - NGINX Error Rate is 38% [06:46:26] RECOVERY - cp32 Current Load on cp32 is OK: OK - load average: 0.83, 0.86, 3.34 [06:47:02] PROBLEM - wiki.nj.cn.eu.org - reverse DNS on sslhost is CRITICAL: rDNS CRITICAL - wiki.nj.cn.eu.org All nameservers failed to answer the query. 
[06:47:07] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is CRITICAL: CRITICAL - NGINX Error Rate is 74% [06:49:07] PROBLEM - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is CRITICAL: CRITICAL - NGINX Error Rate is 76% [06:49:31] PROBLEM - cp32 HTTP 4xx/5xx ERROR Rate on cp32 is CRITICAL: CRITICAL - NGINX Error Rate is 62% [06:49:39] PROBLEM - cp22 HTTP 4xx/5xx ERROR Rate on cp22 is WARNING: WARNING - NGINX Error Rate is 58% [06:51:31] PROBLEM - cp32 HTTP 4xx/5xx ERROR Rate on cp32 is WARNING: WARNING - NGINX Error Rate is 53% [06:53:31] PROBLEM - cp32 HTTP 4xx/5xx ERROR Rate on cp32 is CRITICAL: CRITICAL - NGINX Error Rate is 62% [06:57:39] PROBLEM - cp22 HTTP 4xx/5xx ERROR Rate on cp22 is CRITICAL: CRITICAL - NGINX Error Rate is 80% [06:59:07] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is WARNING: WARNING - NGINX Error Rate is 52% [06:59:31] PROBLEM - cp32 HTTP 4xx/5xx ERROR Rate on cp32 is WARNING: WARNING - NGINX Error Rate is 51% [07:01:31] PROBLEM - cp32 HTTP 4xx/5xx ERROR Rate on cp32 is CRITICAL: CRITICAL - NGINX Error Rate is 76% [07:11:06] PROBLEM - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is WARNING: WARNING - NGINX Error Rate is 49% [07:13:07] PROBLEM - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is CRITICAL: CRITICAL - NGINX Error Rate is 64% [07:13:07] RECOVERY - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is OK: OK - NGINX Error Rate is 37% [07:13:31] PROBLEM - cp32 HTTP 4xx/5xx ERROR Rate on cp32 is WARNING: WARNING - NGINX Error Rate is 45% [07:15:31] RECOVERY - cp32 HTTP 4xx/5xx ERROR Rate on cp32 is OK: OK - NGINX Error Rate is 31% [07:17:07] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is CRITICAL: CRITICAL - NGINX Error Rate is 63% [07:17:40] PROBLEM - cp22 HTTP 4xx/5xx ERROR Rate on cp22 is WARNING: WARNING - NGINX Error Rate is 59% [07:18:07] PROBLEM - cp33 Disk Space on cp33 is CRITICAL: DISK CRITICAL - free space: / 4488 MB (5% inode=98%); [07:19:31] PROBLEM - cp32 HTTP 4xx/5xx ERROR Rate on cp32 is CRITICAL: CRITICAL - NGINX Error Rate is 70% [07:21:31] PROBLEM - cp32 HTTP 4xx/5xx ERROR Rate on cp32 is WARNING: WARNING - NGINX Error Rate is 58% [07:21:39] PROBLEM - cp22 HTTP 4xx/5xx ERROR Rate on cp22 is CRITICAL: CRITICAL - NGINX Error Rate is 68% [07:23:31] PROBLEM - cp32 HTTP 4xx/5xx ERROR Rate on cp32 is CRITICAL: CRITICAL - NGINX Error Rate is 67% [07:23:39] PROBLEM - cp22 HTTP 4xx/5xx ERROR Rate on cp22 is WARNING: WARNING - NGINX Error Rate is 58% [07:24:37] PROBLEM - uk.religiononfire.mar.in.ua - reverse DNS on sslhost is WARNING: NoNameservers: All nameservers failed to answer the query mar.in.ua. 
IN NS: Server 2606:4700:4700::1111 UDP port 53 answered SERVFAIL [07:25:07] PROBLEM - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is WARNING: WARNING - NGINX Error Rate is 55% [07:25:07] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is WARNING: WARNING - NGINX Error Rate is 52% [07:25:31] PROBLEM - cp32 HTTP 4xx/5xx ERROR Rate on cp32 is WARNING: WARNING - NGINX Error Rate is 51% [07:25:40] PROBLEM - cp22 HTTP 4xx/5xx ERROR Rate on cp22 is CRITICAL: CRITICAL - NGINX Error Rate is 64% [07:27:07] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is CRITICAL: CRITICAL - NGINX Error Rate is 62% [07:27:31] PROBLEM - cp32 HTTP 4xx/5xx ERROR Rate on cp32 is CRITICAL: CRITICAL - NGINX Error Rate is 70% [07:29:07] PROBLEM - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is CRITICAL: CRITICAL - NGINX Error Rate is 65% [07:31:07] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is WARNING: WARNING - NGINX Error Rate is 45% [07:33:07] PROBLEM - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is WARNING: WARNING - NGINX Error Rate is 46% [07:35:07] PROBLEM - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is CRITICAL: CRITICAL - NGINX Error Rate is 69% [07:37:07] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is CRITICAL: CRITICAL - NGINX Error Rate is 67% [07:39:07] PROBLEM - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is WARNING: WARNING - NGINX Error Rate is 59% [07:39:07] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is WARNING: WARNING - NGINX Error Rate is 55% [07:39:31] PROBLEM - cp32 HTTP 4xx/5xx ERROR Rate on cp32 is WARNING: WARNING - NGINX Error Rate is 42% [07:41:07] PROBLEM - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is CRITICAL: CRITICAL - NGINX Error Rate is 71% [07:43:07] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is CRITICAL: CRITICAL - NGINX Error Rate is 73% [07:43:31] PROBLEM - cp32 HTTP 4xx/5xx ERROR Rate on cp32 is CRITICAL: CRITICAL - NGINX Error Rate is 63% [07:45:07] PROBLEM - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is WARNING: WARNING - NGINX Error Rate is 54% [07:45:07] RECOVERY - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is OK: OK - NGINX Error Rate is 37% [07:47:04] PROBLEM - wiki.nj.cn.eu.org - reverse DNS on sslhost is WARNING: NoNameservers: All nameservers failed to answer the query cn.eu.org. 
IN NS: Server 2606:4700:4700::1111 UDP port 53 answered SERVFAIL [07:47:07] PROBLEM - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is CRITICAL: CRITICAL - NGINX Error Rate is 63% [07:47:31] PROBLEM - cp32 HTTP 4xx/5xx ERROR Rate on cp32 is WARNING: WARNING - NGINX Error Rate is 51% [07:49:07] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is CRITICAL: CRITICAL - NGINX Error Rate is 65% [07:53:07] PROBLEM - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is WARNING: WARNING - NGINX Error Rate is 59% [07:55:07] RECOVERY - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is OK: OK - NGINX Error Rate is 39% [07:55:07] RECOVERY - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is OK: OK - NGINX Error Rate is 39% [07:55:31] PROBLEM - cp32 HTTP 4xx/5xx ERROR Rate on cp32 is CRITICAL: CRITICAL - NGINX Error Rate is 67% [07:57:31] PROBLEM - cp32 HTTP 4xx/5xx ERROR Rate on cp32 is WARNING: WARNING - NGINX Error Rate is 54% [07:57:39] PROBLEM - cp22 HTTP 4xx/5xx ERROR Rate on cp22 is WARNING: WARNING - NGINX Error Rate is 57% [07:59:07] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is WARNING: WARNING - NGINX Error Rate is 59% [07:59:40] PROBLEM - cp22 HTTP 4xx/5xx ERROR Rate on cp22 is CRITICAL: CRITICAL - NGINX Error Rate is 82% [08:00:32] [Grafana] !sre FIRING: The mediawiki job queue has more than 2500 unclaimed jobs https://grafana.miraheze.org/d/GtxbP1Xnk?orgId=1 [08:01:07] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is CRITICAL: CRITICAL - NGINX Error Rate is 64% [08:01:31] PROBLEM - cp32 HTTP 4xx/5xx ERROR Rate on cp32 is CRITICAL: CRITICAL - NGINX Error Rate is 62% [08:03:07] PROBLEM - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is CRITICAL: CRITICAL - NGINX Error Rate is 67% [08:03:07] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is WARNING: WARNING - NGINX Error Rate is 46% [08:05:23] PROBLEM - cp23 Current Load on cp23 is CRITICAL: CRITICAL - load average: 3.43, 4.88, 3.35 [08:05:31] PROBLEM - cp32 HTTP 4xx/5xx ERROR Rate on cp32 is WARNING: WARNING - NGINX Error Rate is 50% [08:07:07] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is CRITICAL: CRITICAL - NGINX Error Rate is 65% [08:07:22] PROBLEM - cp23 Current Load on cp23 is WARNING: WARNING - load average: 1.05, 3.60, 3.08 [08:09:20] RECOVERY - cp23 Current Load on cp23 is OK: OK - load average: 2.90, 3.27, 3.00 [08:11:31] RECOVERY - cp32 HTTP 4xx/5xx ERROR Rate on cp32 is OK: OK - NGINX Error Rate is 35% [08:13:04] PROBLEM - cloud10 Puppet on cloud10 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): Service[ulogd2] [08:13:07] PROBLEM - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is WARNING: WARNING - NGINX Error Rate is 41% [08:13:07] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is WARNING: WARNING - NGINX Error Rate is 45% [08:15:07] PROBLEM - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is CRITICAL: CRITICAL - NGINX Error Rate is 74% [08:15:39] RECOVERY - cp22 HTTP 4xx/5xx ERROR Rate on cp22 is OK: OK - NGINX Error Rate is 18% [08:17:02] RECOVERY - wiki.nj.cn.eu.org - reverse DNS on sslhost is OK: SSL OK - wiki.nj.cn.eu.org reverse DNS resolves to cp23.miraheze.org - CNAME OK [08:17:07] PROBLEM - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is WARNING: WARNING - NGINX Error Rate is 55% [08:17:07] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is CRITICAL: CRITICAL - NGINX Error Rate is 67% [08:17:31] PROBLEM - cp32 HTTP 4xx/5xx ERROR Rate on cp32 is WARNING: WARNING - NGINX Error Rate is 52% [08:18:29] I guess the errors are kinda to be expected as images are cleared from the cache. 
If every skin shows at least the site logo and there are pages like the AVID main page that show a lot of images, then I guess this is because of that [08:19:07] PROBLEM - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is CRITICAL: CRITICAL - NGINX Error Rate is 69% [08:19:07] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is WARNING: WARNING - NGINX Error Rate is 49% [08:19:31] PROBLEM - cp32 HTTP 4xx/5xx ERROR Rate on cp32 is CRITICAL: CRITICAL - NGINX Error Rate is 93% [08:20:59] but also, on another note, bluepageswiki again has 10000+ refreshLinks jobs [08:21:07] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is CRITICAL: CRITICAL - NGINX Error Rate is 67% [08:23:07] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is WARNING: WARNING - NGINX Error Rate is 50% [08:23:46] RECOVERY - uk.religiononfire.mar.in.ua - reverse DNS on sslhost is OK: SSL OK - uk.religiononfire.mar.in.ua reverse DNS resolves to cp22.miraheze.org - CNAME OK [08:24:02] the rise in refreshLinks jobs seems to coincide with htmlCacheUpdate jobs this time: https://grafana.miraheze.org/d/GtxbP1Xnk/mediawiki?orgId=1&from=now-24h&to=now&var-node=jobchron121&var-job=htmlCacheUpdate&var-job=LocalGlobalUserPageCacheUpdateJob [08:24:29] PROBLEM - cp22 HTTP 4xx/5xx ERROR Rate on cp22 is WARNING: WARNING - NGINX Error Rate is 51% [08:25:07] PROBLEM - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is WARNING: WARNING - NGINX Error Rate is 52% [08:25:07] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is CRITICAL: CRITICAL - NGINX Error Rate is 73% [08:25:31] RECOVERY - cp32 HTTP 4xx/5xx ERROR Rate on cp32 is OK: OK - NGINX Error Rate is 39% [08:26:26] PROBLEM - cp22 HTTP 4xx/5xx ERROR Rate on cp22 is CRITICAL: CRITICAL - NGINX Error Rate is 87% [08:27:07] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is WARNING: WARNING - NGINX Error Rate is 56% [08:29:07] PROBLEM - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is CRITICAL: CRITICAL - NGINX Error Rate is 61% [08:29:31] PROBLEM - cp32 HTTP 4xx/5xx ERROR Rate on cp32 is WARNING: WARNING - NGINX Error Rate is 46% [08:31:06] PROBLEM - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is WARNING: WARNING - NGINX Error Rate is 48% [08:33:06] PROBLEM - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is CRITICAL: CRITICAL - NGINX Error Rate is 66% [08:35:06] PROBLEM - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is WARNING: WARNING - NGINX Error Rate is 44% [08:37:07] PROBLEM - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is CRITICAL: CRITICAL - NGINX Error Rate is 65% [08:38:06] PROBLEM - cp22 Current Load on cp22 is WARNING: WARNING - load average: 3.82, 3.57, 2.61 [08:39:07] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is CRITICAL: CRITICAL - NGINX Error Rate is 75% [08:39:31] PROBLEM - cp32 HTTP 4xx/5xx ERROR Rate on cp32 is CRITICAL: CRITICAL - NGINX Error Rate is 82% [08:40:04] RECOVERY - cp22 Current Load on cp22 is OK: OK - load average: 0.73, 2.48, 2.32 [08:41:04] RECOVERY - cloud10 Puppet on cloud10 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [08:41:07] PROBLEM - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is WARNING: WARNING - NGINX Error Rate is 42% [08:43:07] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is WARNING: WARNING - NGINX Error Rate is 59% [08:45:07] PROBLEM - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is CRITICAL: CRITICAL - NGINX Error Rate is 67% [08:45:08] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is CRITICAL: CRITICAL - NGINX Error Rate is 97% [08:45:46] PROBLEM - cp23 Current Load on cp23 is CRITICAL: CRITICAL - load average: 3.88, 4.64, 3.15 [08:47:07] PROBLEM - cp23 HTTP 4xx/5xx ERROR Rate on 
cp23 is WARNING: WARNING - NGINX Error Rate is 50% [08:47:07] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is WARNING: WARNING - NGINX Error Rate is 40% [08:47:31] PROBLEM - cp32 HTTP 4xx/5xx ERROR Rate on cp32 is WARNING: WARNING - NGINX Error Rate is 51% [08:47:46] RECOVERY - cp23 Current Load on cp23 is OK: OK - load average: 1.37, 3.36, 2.86 [08:49:07] PROBLEM - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is CRITICAL: CRITICAL - NGINX Error Rate is 60% [08:49:07] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is CRITICAL: CRITICAL - NGINX Error Rate is 65% [08:49:31] RECOVERY - cp32 HTTP 4xx/5xx ERROR Rate on cp32 is OK: OK - NGINX Error Rate is 23% [08:51:46] PROBLEM - cp23 Current Load on cp23 is WARNING: WARNING - load average: 1.64, 3.44, 3.09 [08:53:46] RECOVERY - cp23 Current Load on cp23 is OK: OK - load average: 0.84, 2.53, 2.79 [08:55:07] PROBLEM - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is WARNING: WARNING - NGINX Error Rate is 54% [08:55:07] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is WARNING: WARNING - NGINX Error Rate is 41% [08:59:07] PROBLEM - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is CRITICAL: CRITICAL - NGINX Error Rate is 89% [09:00:20] PROBLEM - cp32 HTTP 4xx/5xx ERROR Rate on cp32 is WARNING: WARNING - NGINX Error Rate is 53% [09:01:07] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is CRITICAL: CRITICAL - NGINX Error Rate is 83% [09:02:14] PROBLEM - cp32 HTTP 4xx/5xx ERROR Rate on cp32 is CRITICAL: CRITICAL - NGINX Error Rate is 75% [09:04:08] RECOVERY - cp32 HTTP 4xx/5xx ERROR Rate on cp32 is OK: OK - NGINX Error Rate is 39% [09:07:58] PROBLEM - cp32 HTTP 4xx/5xx ERROR Rate on cp32 is CRITICAL: CRITICAL - NGINX Error Rate is 70% [09:09:08] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is WARNING: WARNING - NGINX Error Rate is 48% [09:11:47] PROBLEM - cp32 HTTP 4xx/5xx ERROR Rate on cp32 is WARNING: WARNING - NGINX Error Rate is 58% [09:13:07] PROBLEM - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is WARNING: WARNING - NGINX Error Rate is 50% [09:13:07] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is CRITICAL: CRITICAL - NGINX Error Rate is 69% [09:13:41] PROBLEM - cp32 HTTP 4xx/5xx ERROR Rate on cp32 is CRITICAL: CRITICAL - NGINX Error Rate is 90% [09:15:07] PROBLEM - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is CRITICAL: CRITICAL - NGINX Error Rate is 74% [09:19:31] PROBLEM - cp32 HTTP 4xx/5xx ERROR Rate on cp32 is WARNING: WARNING - NGINX Error Rate is 59% [09:21:07] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is WARNING: WARNING - NGINX Error Rate is 52% [09:23:31] PROBLEM - cp32 HTTP 4xx/5xx ERROR Rate on cp32 is CRITICAL: CRITICAL - NGINX Error Rate is 82% [09:26:52] PROBLEM - cp22 Current Load on cp22 is CRITICAL: CRITICAL - load average: 84.90, 176.04, 86.00 [09:29:07] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is CRITICAL: CRITICAL - NGINX Error Rate is 74% [09:33:07] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is WARNING: WARNING - NGINX Error Rate is 54% [09:33:55] PROBLEM - cp22 HTTP 4xx/5xx ERROR Rate on cp22 is WARNING: WARNING - NGINX Error Rate is 56% [09:35:52] PROBLEM - cp22 HTTP 4xx/5xx ERROR Rate on cp22 is CRITICAL: CRITICAL - NGINX Error Rate is 80% [09:37:07] PROBLEM - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is WARNING: WARNING - NGINX Error Rate is 53% [09:37:07] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is CRITICAL: CRITICAL - NGINX Error Rate is 70% [09:37:31] PROBLEM - cp32 HTTP 4xx/5xx ERROR Rate on cp32 is WARNING: WARNING - NGINX Error Rate is 47% [09:39:07] PROBLEM - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is CRITICAL: CRITICAL - NGINX 
Error Rate is 70% [09:39:07] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is WARNING: WARNING - NGINX Error Rate is 47% [09:41:07] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is CRITICAL: CRITICAL - NGINX Error Rate is 62% [09:41:31] PROBLEM - cp32 HTTP 4xx/5xx ERROR Rate on cp32 is CRITICAL: CRITICAL - NGINX Error Rate is 72% [09:43:07] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is WARNING: WARNING - NGINX Error Rate is 53% [09:44:27] PROBLEM - cp23 Current Load on cp23 is WARNING: WARNING - load average: 3.39, 3.60, 2.84 [09:46:25] PROBLEM - cp23 Current Load on cp23 is CRITICAL: CRITICAL - load average: 4.22, 3.63, 2.93 [09:47:31] PROBLEM - cp32 HTTP 4xx/5xx ERROR Rate on cp32 is WARNING: WARNING - NGINX Error Rate is 49% [09:48:24] RECOVERY - cp23 Current Load on cp23 is OK: OK - load average: 2.58, 3.12, 2.82 [09:49:07] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is CRITICAL: CRITICAL - NGINX Error Rate is 66% [09:57:31] PROBLEM - cp32 HTTP 4xx/5xx ERROR Rate on cp32 is CRITICAL: CRITICAL - NGINX Error Rate is 64% [09:58:14] PROBLEM - cp23 Current Load on cp23 is WARNING: WARNING - load average: 2.81, 3.41, 3.09 [09:59:31] PROBLEM - cp32 HTTP 4xx/5xx ERROR Rate on cp32 is WARNING: WARNING - NGINX Error Rate is 54% [10:00:12] RECOVERY - cp23 Current Load on cp23 is OK: OK - load average: 1.82, 2.84, 2.92 [10:01:31] PROBLEM - cp32 HTTP 4xx/5xx ERROR Rate on cp32 is CRITICAL: CRITICAL - NGINX Error Rate is 80% [10:03:07] PROBLEM - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is WARNING: WARNING - NGINX Error Rate is 53% [10:03:07] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is WARNING: WARNING - NGINX Error Rate is 59% [10:05:31] PROBLEM - cp32 HTTP 4xx/5xx ERROR Rate on cp32 is WARNING: WARNING - NGINX Error Rate is 52% [10:07:07] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is CRITICAL: CRITICAL - NGINX Error Rate is 61% [10:08:49] PROBLEM - cp22 Disk Space on cp22 is WARNING: DISK WARNING - free space: / 8185 MB (10% inode=98%); [10:09:06] PROBLEM - cp23 Disk Space on cp23 is WARNING: DISK WARNING - free space: / 8165 MB (10% inode=98%); [10:09:07] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is WARNING: WARNING - NGINX Error Rate is 49% [10:11:07] PROBLEM - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is CRITICAL: CRITICAL - NGINX Error Rate is 76% [10:11:07] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is CRITICAL: CRITICAL - NGINX Error Rate is 64% [10:13:07] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is WARNING: WARNING - NGINX Error Rate is 59% [10:17:06] PROBLEM - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is WARNING: WARNING - NGINX Error Rate is 58% [10:17:07] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is CRITICAL: CRITICAL - NGINX Error Rate is 75% [10:17:31] PROBLEM - cp32 HTTP 4xx/5xx ERROR Rate on cp32 is CRITICAL: CRITICAL - NGINX Error Rate is 64% [10:19:07] PROBLEM - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is CRITICAL: CRITICAL - NGINX Error Rate is 79% [10:19:07] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is WARNING: WARNING - NGINX Error Rate is 53% [10:19:39] PROBLEM - cp22 HTTP 4xx/5xx ERROR Rate on cp22 is WARNING: WARNING - NGINX Error Rate is 55% [10:21:07] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is CRITICAL: CRITICAL - NGINX Error Rate is 74% [10:21:31] PROBLEM - cp32 HTTP 4xx/5xx ERROR Rate on cp32 is WARNING: WARNING - NGINX Error Rate is 56% [10:23:40] PROBLEM - cp22 HTTP 4xx/5xx ERROR Rate on cp22 is CRITICAL: CRITICAL - NGINX Error Rate is 73% [10:27:31] PROBLEM - cp32 HTTP 4xx/5xx ERROR Rate on cp32 is CRITICAL: CRITICAL - NGINX Error Rate 
is 79% [10:28:52] PROBLEM - cp22 Current Load on cp22 is WARNING: WARNING - load average: 3.98, 2.25, 3.92 [10:29:07] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is WARNING: WARNING - NGINX Error Rate is 42% [10:31:07] RECOVERY - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is OK: OK - NGINX Error Rate is 35% [10:32:52] PROBLEM - cp22 Current Load on cp22 is CRITICAL: CRITICAL - load average: 5.37, 3.69, 4.09 [10:35:07] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is WARNING: WARNING - NGINX Error Rate is 51% [10:35:31] PROBLEM - cp32 HTTP 4xx/5xx ERROR Rate on cp32 is WARNING: WARNING - NGINX Error Rate is 52% [10:36:52] PROBLEM - cp22 Current Load on cp22 is WARNING: WARNING - load average: 1.45, 2.80, 3.68 [10:37:07] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is CRITICAL: CRITICAL - NGINX Error Rate is 67% [10:38:52] RECOVERY - cp22 Current Load on cp22 is OK: OK - load average: 1.30, 2.17, 3.34 [10:39:31] RECOVERY - cp32 HTTP 4xx/5xx ERROR Rate on cp32 is OK: OK - NGINX Error Rate is 34% [10:39:39] PROBLEM - cp22 HTTP 4xx/5xx ERROR Rate on cp22 is WARNING: WARNING - NGINX Error Rate is 52% [10:41:40] PROBLEM - cp22 HTTP 4xx/5xx ERROR Rate on cp22 is CRITICAL: CRITICAL - NGINX Error Rate is 66% [10:43:06] PROBLEM - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is WARNING: WARNING - NGINX Error Rate is 50% [10:43:07] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is WARNING: WARNING - NGINX Error Rate is 57% [10:43:31] PROBLEM - cp32 HTTP 4xx/5xx ERROR Rate on cp32 is WARNING: WARNING - NGINX Error Rate is 51% [10:45:07] PROBLEM - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is CRITICAL: CRITICAL - NGINX Error Rate is 72% [10:47:07] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is CRITICAL: CRITICAL - NGINX Error Rate is 86% [10:47:40] PROBLEM - cp22 HTTP 4xx/5xx ERROR Rate on cp22 is WARNING: WARNING - NGINX Error Rate is 55% [10:49:07] PROBLEM - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is WARNING: WARNING - NGINX Error Rate is 54% [10:49:31] PROBLEM - cp32 HTTP 4xx/5xx ERROR Rate on cp32 is CRITICAL: CRITICAL - NGINX Error Rate is 78% [10:49:40] PROBLEM - cp22 HTTP 4xx/5xx ERROR Rate on cp22 is CRITICAL: CRITICAL - NGINX Error Rate is 79% [10:51:07] PROBLEM - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is CRITICAL: CRITICAL - NGINX Error Rate is 80% [10:53:07] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is WARNING: WARNING - NGINX Error Rate is 41% [10:53:46] PROBLEM - cp23 Current Load on cp23 is WARNING: WARNING - load average: 2.27, 3.61, 2.71 [10:55:07] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is CRITICAL: CRITICAL - NGINX Error Rate is 70% [10:55:31] PROBLEM - cp32 HTTP 4xx/5xx ERROR Rate on cp32 is WARNING: WARNING - NGINX Error Rate is 59% [10:55:40] PROBLEM - cp22 HTTP 4xx/5xx ERROR Rate on cp22 is WARNING: WARNING - NGINX Error Rate is 52% [10:55:46] RECOVERY - cp23 Current Load on cp23 is OK: OK - load average: 1.09, 2.67, 2.47 [10:57:07] RECOVERY - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is OK: OK - NGINX Error Rate is 32% [10:57:40] PROBLEM - cp22 HTTP 4xx/5xx ERROR Rate on cp22 is CRITICAL: CRITICAL - NGINX Error Rate is 67% [10:59:40] PROBLEM - cp22 HTTP 4xx/5xx ERROR Rate on cp22 is WARNING: WARNING - NGINX Error Rate is 59% [11:01:31] RECOVERY - cp32 HTTP 4xx/5xx ERROR Rate on cp32 is OK: OK - NGINX Error Rate is 35% [11:01:40] PROBLEM - cp22 HTTP 4xx/5xx ERROR Rate on cp22 is CRITICAL: CRITICAL - NGINX Error Rate is 75% [11:03:58] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is WARNING: WARNING - NGINX Error Rate is 50% [11:05:31] PROBLEM - cp32 HTTP 4xx/5xx ERROR Rate on cp32 is 
WARNING: WARNING - NGINX Error Rate is 44% [11:05:32] [Grafana] !sre RESOLVED: High Job Queue Backlog https://grafana.miraheze.org/d/GtxbP1Xnk?orgId=1 [11:05:53] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is CRITICAL: CRITICAL - NGINX Error Rate is 71% [11:07:48] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is WARNING: WARNING - NGINX Error Rate is 55% [11:09:31] PROBLEM - cp32 HTTP 4xx/5xx ERROR Rate on cp32 is CRITICAL: CRITICAL - NGINX Error Rate is 80% [11:09:57] PROBLEM - cp23 NTP time on cp23 is WARNING: NTP WARNING: Offset 0.2415628433 secs [11:11:38] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is CRITICAL: CRITICAL - NGINX Error Rate is 67% [11:13:31] PROBLEM - cp32 HTTP 4xx/5xx ERROR Rate on cp32 is WARNING: WARNING - NGINX Error Rate is 54% [11:13:33] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is WARNING: WARNING - NGINX Error Rate is 42% [11:13:39] PROBLEM - cp22 HTTP 4xx/5xx ERROR Rate on cp22 is WARNING: WARNING - NGINX Error Rate is 59% [11:15:31] PROBLEM - cp32 HTTP 4xx/5xx ERROR Rate on cp32 is CRITICAL: CRITICAL - NGINX Error Rate is 71% [11:15:39] PROBLEM - cp22 HTTP 4xx/5xx ERROR Rate on cp22 is CRITICAL: CRITICAL - NGINX Error Rate is 73% [11:17:24] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is CRITICAL: CRITICAL - NGINX Error Rate is 88% [11:19:18] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is WARNING: WARNING - NGINX Error Rate is 48% [11:21:14] RECOVERY - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is OK: OK - NGINX Error Rate is 28% [11:21:31] PROBLEM - cp32 HTTP 4xx/5xx ERROR Rate on cp32 is WARNING: WARNING - NGINX Error Rate is 49% [11:21:50] PROBLEM - en.religiononfire.mar.in.ua - reverse DNS on sslhost is WARNING: NoNameservers: All nameservers failed to answer the query en.religiononfire.mar.in.ua. IN CNAME: Server 2606:4700:4700::1111 UDP port 53 answered SERVFAIL [11:25:07] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is CRITICAL: CRITICAL - NGINX Error Rate is 75% [11:25:31] PROBLEM - cp32 HTTP 4xx/5xx ERROR Rate on cp32 is CRITICAL: CRITICAL - NGINX Error Rate is 64% [11:25:40] PROBLEM - cp22 HTTP 4xx/5xx ERROR Rate on cp22 is WARNING: WARNING - NGINX Error Rate is 58% [11:27:32] PROBLEM - cp22 Current Load on cp22 is CRITICAL: CRITICAL - load average: 4.22, 3.74, 2.63 [11:27:40] PROBLEM - cp22 HTTP 4xx/5xx ERROR Rate on cp22 is CRITICAL: CRITICAL - NGINX Error Rate is 68% [11:29:30] RECOVERY - cp22 Current Load on cp22 is OK: OK - load average: 2.73, 3.23, 2.56 [11:30:36] PROBLEM - cp23 Current Load on cp23 is CRITICAL: CRITICAL - load average: 6.16, 4.27, 3.14 [11:31:57] RECOVERY - cp23 NTP time on cp23 is OK: NTP OK: Offset -0.01240333915 secs [11:35:19] PROBLEM - gs.sidem.wiki - reverse DNS on sslhost is WARNING: Timeout: The DNS operation timed out after 5.407392978668213 seconds [11:35:31] PROBLEM - cp32 HTTP 4xx/5xx ERROR Rate on cp32 is WARNING: WARNING - NGINX Error Rate is 52% [11:36:31] PROBLEM - cp23 Current Load on cp23 is WARNING: WARNING - load average: 1.72, 3.50, 3.26 [11:38:29] RECOVERY - cp23 Current Load on cp23 is OK: OK - load average: 1.59, 2.81, 3.03 [11:39:31] PROBLEM - cp32 HTTP 4xx/5xx ERROR Rate on cp32 is CRITICAL: CRITICAL - NGINX Error Rate is 72% [11:39:40] PROBLEM - cp22 HTTP 4xx/5xx ERROR Rate on cp22 is WARNING: WARNING - NGINX Error Rate is 47% [11:41:31] PROBLEM - cp32 HTTP 4xx/5xx ERROR Rate on cp32 is WARNING: WARNING - NGINX Error Rate is 44% [11:42:34] PROBLEM - db112 Disk Space on db112 is CRITICAL: DISK CRITICAL - free space: / 7963 MB (5% inode=99%); [11:43:31] PROBLEM - cp32 HTTP 4xx/5xx ERROR 
Rate on cp32 is CRITICAL: CRITICAL - NGINX Error Rate is 63% [11:43:39] PROBLEM - cp22 HTTP 4xx/5xx ERROR Rate on cp22 is CRITICAL: CRITICAL - NGINX Error Rate is 76% [11:44:20] [mw-config] ugochimobi opened pull request #5192: T10722: configure wgImportSources for brolandiawiki - https://github.com/miraheze/mw-config/pull/5192 [11:45:17] miraheze/mw-config - ugochimobi the build passed. [11:45:31] PROBLEM - cp32 HTTP 4xx/5xx ERROR Rate on cp32 is WARNING: WARNING - NGINX Error Rate is 55% [11:45:40] RECOVERY - cp22 HTTP 4xx/5xx ERROR Rate on cp22 is OK: OK - NGINX Error Rate is 38% [11:46:34] PROBLEM - db112 Disk Space on db112 is WARNING: DISK WARNING - free space: / 8972 MB (6% inode=99%); [11:47:31] PROBLEM - cp32 HTTP 4xx/5xx ERROR Rate on cp32 is CRITICAL: CRITICAL - NGINX Error Rate is 66% [11:49:02] PROBLEM - wiki.nj.cn.eu.org - reverse DNS on sslhost is WARNING: NoNameservers: All nameservers failed to answer the query cn.eu.org. IN NS: Server 2606:4700:4700::1111 UDP port 53 answered SERVFAIL [11:49:40] PROBLEM - cp22 HTTP 4xx/5xx ERROR Rate on cp22 is WARNING: WARNING - NGINX Error Rate is 58% [11:50:40] RECOVERY - en.religiononfire.mar.in.ua - reverse DNS on sslhost is OK: SSL OK - en.religiononfire.mar.in.ua reverse DNS resolves to cp22.miraheze.org - CNAME OK [11:51:39] PROBLEM - cp22 HTTP 4xx/5xx ERROR Rate on cp22 is CRITICAL: CRITICAL - NGINX Error Rate is 67% [11:57:07] PROBLEM - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is WARNING: WARNING - NGINX Error Rate is 45% [11:59:07] PROBLEM - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is CRITICAL: CRITICAL - NGINX Error Rate is 96% [11:59:07] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is WARNING: WARNING - NGINX Error Rate is 46% [12:00:25] PROBLEM - cp23 Varnish Backends on cp23 is CRITICAL: 7 backends are down. 
mw121 mw122 mw131 mw132 mw141 mw142 mediawiki [12:01:07] RECOVERY - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is OK: OK - NGINX Error Rate is 39% [12:01:07] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is CRITICAL: CRITICAL - NGINX Error Rate is 67% [12:01:40] PROBLEM - cp22 HTTP 4xx/5xx ERROR Rate on cp22 is WARNING: WARNING - NGINX Error Rate is 57% [12:02:25] RECOVERY - cp23 Varnish Backends on cp23 is OK: All 14 backends are healthy [12:05:07] PROBLEM - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is CRITICAL: CRITICAL - NGINX Error Rate is 65% [12:05:07] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is WARNING: WARNING - NGINX Error Rate is 43% [12:05:40] PROBLEM - cp22 HTTP 4xx/5xx ERROR Rate on cp22 is CRITICAL: CRITICAL - NGINX Error Rate is 92% [12:07:07] PROBLEM - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is WARNING: WARNING - NGINX Error Rate is 57% [12:07:07] RECOVERY - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is OK: OK - NGINX Error Rate is 38% [12:08:14] PROBLEM - cp33 Current Load on cp33 is CRITICAL: CRITICAL - load average: 7.55, 4.57, 2.77 [12:09:07] PROBLEM - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is CRITICAL: CRITICAL - NGINX Error Rate is 65% [12:11:07] PROBLEM - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is WARNING: WARNING - NGINX Error Rate is 56% [12:11:07] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is WARNING: WARNING - NGINX Error Rate is 57% [12:12:14] PROBLEM - cp33 Current Load on cp33 is WARNING: WARNING - load average: 2.10, 3.58, 2.81 [12:12:19] PROBLEM - es141 Current Load on es141 is WARNING: WARNING - load average: 3.81, 3.06, 2.32 [12:13:07] PROBLEM - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is CRITICAL: CRITICAL - NGINX Error Rate is 91% [12:13:08] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is CRITICAL: CRITICAL - NGINX Error Rate is 81% [12:14:14] RECOVERY - cp33 Current Load on cp33 is OK: OK - load average: 2.10, 3.23, 2.78 [12:14:18] RECOVERY - es141 Current Load on es141 is OK: OK - load average: 2.59, 2.95, 2.37 [12:15:07] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is WARNING: WARNING - NGINX Error Rate is 58% [12:17:07] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is CRITICAL: CRITICAL - NGINX Error Rate is 64% [12:18:05] PROBLEM - wiki.nj.cn.eu.org - reverse DNS on sslhost is CRITICAL: rDNS CRITICAL - wiki.nj.cn.eu.org All nameservers failed to answer the query. 
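
For context on PR #5192 above, $wgImportSources controls which interwiki prefixes Special:Import may pull pages from. A minimal sketch, assuming a single source wiki since the diff itself is not quoted in the log:

    <?php
    // Hypothetical example; the actual source list for brolandiawiki may differ.
    $wgImportSources = [ 'meta' ];
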
[12:19:07] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is WARNING: WARNING - NGINX Error Rate is 48% [12:21:06] PROBLEM - cp23 Disk Space on cp23 is CRITICAL: DISK CRITICAL - free space: / 4470 MB (5% inode=98%); [12:21:07] PROBLEM - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is WARNING: WARNING - NGINX Error Rate is 58% [12:21:07] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is CRITICAL: CRITICAL - NGINX Error Rate is 74% [12:23:07] PROBLEM - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is CRITICAL: CRITICAL - NGINX Error Rate is 81% [12:25:07] RECOVERY - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is OK: OK - NGINX Error Rate is 35% [12:27:40] PROBLEM - cp22 HTTP 4xx/5xx ERROR Rate on cp22 is WARNING: WARNING - NGINX Error Rate is 50% [12:28:49] PROBLEM - cp22 Disk Space on cp22 is CRITICAL: DISK CRITICAL - free space: / 4442 MB (5% inode=98%); [12:29:07] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is WARNING: WARNING - NGINX Error Rate is 47% [12:29:39] PROBLEM - cp22 HTTP 4xx/5xx ERROR Rate on cp22 is CRITICAL: CRITICAL - NGINX Error Rate is 86% [12:33:07] RECOVERY - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is OK: OK - NGINX Error Rate is 39% [12:35:02] PROBLEM - cp22 Current Load on cp22 is CRITICAL: CRITICAL - load average: 4.33, 3.89, 3.02 [12:36:25] PROBLEM - ns1 GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 2 datacenters are down: 108.175.15.182/cpweb, 2607:f1c0:1800:8100::1/cpweb [12:36:28] PROBLEM - cp32 HTTPS on cp32 is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 328 bytes in 0.462 second response time [12:37:00] PROBLEM - cp22 Current Load on cp22 is WARNING: WARNING - load average: 2.35, 3.45, 2.97 [12:37:07] PROBLEM - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is CRITICAL: CRITICAL - NGINX Error Rate is 79% [12:37:42] PROBLEM - ns2 GDNSD Datacenters on ns2 is CRITICAL: CRITICAL - 2 datacenters are down: 108.175.15.182/cpweb, 2607:f1c0:1800:8100::1/cpweb [12:37:44] PROBLEM - cp32 Varnish Backends on cp32 is WARNING: No backends detected. If this is an error, see readme.txt [12:38:57] RECOVERY - cp22 Current Load on cp22 is OK: OK - load average: 2.18, 3.14, 2.91 [12:39:57] PROBLEM - cp32 Puppet on cp32 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Service[varnish] [12:41:07] PROBLEM - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is CRITICAL: CRITICAL - NGINX Error Rate is 70% [12:41:28] PROBLEM - cp33 Current Load on cp33 is CRITICAL: CRITICAL - load average: 3.88, 4.58, 3.52 [12:43:07] PROBLEM - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is WARNING: WARNING - NGINX Error Rate is 43% [12:43:09] PROBLEM - cp33 Varnish Backends on cp33 is CRITICAL: 1 backends are down. mw141 [12:45:07] PROBLEM - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is CRITICAL: CRITICAL - NGINX Error Rate is 89% [12:46:52] PROBLEM - cp22 Current Load on cp22 is WARNING: WARNING - load average: 3.14, 3.51, 3.10 [12:47:07] RECOVERY - wiki.nj.cn.eu.org - reverse DNS on sslhost is OK: SSL OK - wiki.nj.cn.eu.org reverse DNS resolves to cp22.miraheze.org - CNAME OK [12:47:09] RECOVERY - cp33 Varnish Backends on cp33 is OK: All 14 backends are healthy [12:48:52] RECOVERY - cp22 Current Load on cp22 is OK: OK - load average: 2.42, 3.13, 3.01 [12:49:07] PROBLEM - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is WARNING: WARNING - NGINX Error Rate is 47% [12:51:07] PROBLEM - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is CRITICAL: CRITICAL - NGINX Error Rate is 65% [12:53:13] PROBLEM - cp33 Varnish Backends on cp33 is CRITICAL: 1 backends are down. 
mw141 [12:54:52] PROBLEM - cp22 Current Load on cp22 is CRITICAL: CRITICAL - load average: 7.18, 4.16, 3.39 [12:55:10] PROBLEM - cp33 Varnish Backends on cp33 is WARNING: No backends detected. If this is an error, see readme.txt [12:55:17] PROBLEM - cp23 Current Load on cp23 is WARNING: WARNING - load average: 3.45, 2.83, 2.44 [12:56:19] PROBLEM - cp33 HTTPS on cp33 is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 328 bytes in 0.423 second response time [12:56:45] PROBLEM - cp22 Varnish Backends on cp22 is CRITICAL: 4 backends are down. mw121 mw122 mw131 mw142 [12:56:52] PROBLEM - cp22 Current Load on cp22 is WARNING: WARNING - load average: 2.76, 3.59, 3.28 [12:56:57] PROBLEM - cp33 Current Load on cp33 is WARNING: WARNING - load average: 0.45, 2.87, 3.82 [12:57:16] PROBLEM - cp23 Current Load on cp23 is CRITICAL: CRITICAL - load average: 9.12, 5.70, 3.55 [12:58:28] PROBLEM - cp33 Puppet on cp33 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Service[varnish] [12:58:45] RECOVERY - cp22 Varnish Backends on cp22 is OK: All 14 backends are healthy [12:58:52] PROBLEM - cp22 Current Load on cp22 is CRITICAL: CRITICAL - load average: 31.78, 17.30, 8.45 [12:59:42] PROBLEM - cp22 HTTP 4xx/5xx ERROR Rate on cp22 is WARNING: WARNING - NGINX Error Rate is 56% [13:00:00] [miraheze/mw-config] Reception123 pushed 1 commit to master [+0/-0/±1] https://github.com/miraheze/mw-config/compare/335bb3b63da9...7698e02b9b07 [13:00:01] [miraheze/mw-config] Reception123 7698e02 - disable file uploads [13:00:13] PROBLEM - Host swiftproxy131 is DOWN: CRITICAL - Destination Unreachable (2a10:6740::6:315) [13:00:49] RECOVERY - cp33 Current Load on cp33 is OK: OK - load average: 0.31, 1.52, 3.06 [13:01:08] miraheze/mw-config - Reception123 the build has errored. [13:01:40] PROBLEM - cp22 HTTP 4xx/5xx ERROR Rate on cp22 is CRITICAL: CRITICAL - NGINX Error Rate is 70% [13:03:09] RECOVERY - gs.sidem.wiki - reverse DNS on sslhost is OK: SSL OK - gs.sidem.wiki reverse DNS resolves to cp22.miraheze.org - CNAME OK [13:03:24] !log [reception@mwtask141] starting deploy of {'pull': 'config', 'config': True} to all [13:03:28] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [13:03:33] !log [reception@mwtask141] finished deploy of {'pull': 'config', 'config': True} to all - SUCCESS in 8s [13:03:37] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [13:10:29] [miraheze/mw-config] Reception123 pushed 1 commit to master [+0/-0/±1] https://github.com/miraheze/mw-config/compare/7698e02b9b07...da2c70aa5781 [13:10:31] [miraheze/mw-config] Reception123 da2c70a - restrict file uploads due to swift/cloud11 issues [13:10:31] !log [reception@mwtask141] starting deploy of {'pull': 'config', 'config': True} to all [13:10:35] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [13:10:40] !log [reception@mwtask141] finished deploy of {'pull': 'config', 'config': True} to all - SUCCESS in 8s [13:10:44] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [13:11:35] miraheze/mw-config - Reception123 the build has errored. 
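
The "disable file uploads" commit above is not quoted in the log; the simplest form such a change could take in mw-config is the global switch below. A sketch only, since the actual commit may have been scoped per wiki or per permission:

    <?php
    // Sketch: a global kill switch while the Swift/cloud11 storage backend is unhealthy.
    $wgEnableUploads = false;     // turns off Special:Upload everywhere
    $wgAllowCopyUploads = false;  // also blocks upload-by-URL, just in case
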
[13:16:07] [miraheze/mw-config] Reception123 pushed 2 commits to master [+0/-0/±2] https://github.com/miraheze/mw-config/compare/da2c70aa5781...f47b88b4659d [13:16:08] [miraheze/mw-config] Reception123 2835798 - Revert "restrict file uploads due to swift/cloud11 issues" [13:16:10] [miraheze/mw-config] Reception123 f47b88b - Revert "disable file uploads" [13:17:14] miraheze/mw-config - Reception123 the build passed. [13:17:40] PROBLEM - cp22 HTTP 4xx/5xx ERROR Rate on cp22 is WARNING: WARNING - NGINX Error Rate is 59% [13:18:52] PROBLEM - cp22 Current Load on cp22 is WARNING: WARNING - load average: 0.87, 1.92, 3.93 [13:19:39] PROBLEM - cp22 HTTP 4xx/5xx ERROR Rate on cp22 is CRITICAL: CRITICAL - NGINX Error Rate is 66% [13:22:53] PROBLEM - cp23 Current Load on cp23 is WARNING: WARNING - load average: 1.16, 1.90, 3.66 [13:24:52] RECOVERY - cp22 Current Load on cp22 is OK: OK - load average: 2.09, 2.15, 3.39 [13:28:47] RECOVERY - cp23 Current Load on cp23 is OK: OK - load average: 1.14, 1.91, 3.17 [13:31:38] PROBLEM - test131 MediaWiki Rendering on test131 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 700 bytes in 0.677 second response time [13:39:48] !log [@mwtask141] starting deploy of {'config': True} to all [13:40:01] !log [@mwtask141] finished deploy of {'config': True} to all - SUCCESS in 13s [13:47:33] [miraheze/mw-config] Reception123 pushed 1 commit to master [+0/-0/±1] https://github.com/miraheze/mw-config/compare/f47b88b4659d...eedcead5953b [13:47:35] [miraheze/mw-config] Reception123 eedcead - make file sitenotice display everywhere [13:47:47] PROBLEM - wiki.yuanpi.eu.org - LetsEncrypt on sslhost is WARNING: WARNING - Certificate 'wiki.yuanpi.eu.org' expires in 15 day(s) (Fri 28 Apr 2023 13:20:24 GMT +0000). [13:48:14] [miraheze/mw-config] Reception123 pushed 1 commit to master [+0/-0/±1] https://github.com/miraheze/mw-config/compare/eedcead5953b...b16a81f37261 [13:48:16] !log [reception@mwtask141] starting deploy of {'pull': 'config', 'config': True} to all [13:48:17] [miraheze/mw-config] Reception123 b16a81f - slight wording modification for sitenotice [13:48:24] !log [reception@mwtask141] finished deploy of {'pull': 'config', 'config': True} to all - SUCCESS in 7s [13:48:30] miraheze/mw-config - Reception123 the build passed. [13:48:51] MacFan4000: hmm any idea why that doesn't work? [13:49:13] miraheze/mw-config - Reception123 the build passed. 
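
The sitenotice commits above boil down to setting a single variable; a sketch with placeholder wording, not the actual message:

    <?php
    // $wgSiteNotice is shown on every page view unless a wiki overrides it
    // with a non-empty local MediaWiki:Sitenotice page.
    $wgSiteNotice = 'File uploads are temporarily restricted while we investigate a storage issue.';
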
[13:49:31] ah nevermind it does now [13:49:33] [miraheze/ssl] MirahezeSSLBot pushed 1 commit to master [+0/-0/±1] https://github.com/miraheze/ssl/compare/5fe2077258b1...fa4a9306433e [13:49:34] [miraheze/ssl] MirahezeSSLBot fa4a930 - Bot: Update SSL cert for wiki.yuanpi.eu.org [13:52:05] RECOVERY - cp33 HTTPS on cp33 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 3713 bytes in 0.534 second response time [13:52:33] [puppet] redbluegreenhat opened pull request #3186: Return Varnish's default 200 response on all requests to static - https://github.com/miraheze/puppet/pull/3186 [13:53:09] RECOVERY - cp33 Varnish Backends on cp33 is OK: All 14 backends are healthy [13:54:37] [puppet] paladox closed pull request #3186: Return Varnish's default 200 response on all requests to static - https://github.com/miraheze/puppet/pull/3186 [13:54:39] [miraheze/puppet] paladox pushed 1 commit to master [+0/-0/±1] https://github.com/miraheze/puppet/compare/e0d4bbbcd78a...fecdaf3c39c8 [13:54:41] [miraheze/puppet] redbluegreenhat fecdaf3 - Return Varnish's default 200 response on all requests to static (#3186) [13:55:13] PROBLEM - zhacg.wiki - LetsEncrypt on sslhost is WARNING: WARNING - Certificate 'zhacg.wiki' expires in 15 day(s) (Fri 28 Apr 2023 13:32:33 GMT +0000). [13:56:13] [miraheze/ssl] MirahezeSSLBot pushed 1 commit to master [+0/-0/±1] https://github.com/miraheze/ssl/compare/fa4a9306433e...95c3122acf7c [13:56:16] [miraheze/ssl] MirahezeSSLBot 95c3122 - Bot: Update SSL cert for zhacg.wiki [13:56:27] RECOVERY - cp33 Puppet on cp33 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:57:55] !log [@test131] starting deploy of {'config': True} to all [13:57:56] !log [@test131] finished deploy of {'config': True} to all - SUCCESS in 0s [13:59:36] RECOVERY - test131 MediaWiki Rendering on test131 is OK: HTTP OK: HTTP/1.1 200 OK - 8191 bytes in 0.821 second response time [14:00:08] PROBLEM - cp23 Puppet on cp23 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[load-new-vcl-file] [14:01:26] PROBLEM - cp22 Puppet on cp22 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[load-new-vcl-file] [14:02:07] RECOVERY - cp33 Disk Space on cp33 is OK: DISK OK - free space: / 11608 MB (15% inode=98%); [14:02:28] RECOVERY - cp32 HTTPS on cp32 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 3713 bytes in 0.535 second response time [14:02:30] PROBLEM - cp33 Puppet on cp33 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. 
Failed resources (up to 3 shown): Exec[load-new-vcl-file] [14:03:07] RECOVERY - cp33 HTTP 4xx/5xx ERROR Rate on cp33 is OK: OK - NGINX Error Rate is 1% [14:03:09] RECOVERY - cp32 Disk Space on cp32 is OK: DISK OK - free space: / 11545 MB (15% inode=98%); [14:03:31] RECOVERY - cp32 HTTP 4xx/5xx ERROR Rate on cp32 is OK: OK - NGINX Error Rate is 1% [14:03:42] RECOVERY - ns2 GDNSD Datacenters on ns2 is OK: OK - all datacenters are online [14:03:43] RECOVERY - cp32 Varnish Backends on cp32 is OK: All 14 backends are healthy [14:03:43] well would you look at that [14:03:55] 1% error rate [14:04:01] Seems like puppet is failing on that file though [14:04:12] at least one proxy managed to update [14:04:25] RECOVERY - ns1 GDNSD Datacenters on ns1 is OK: OK - all datacenters are online [14:04:36] cp ran out of space [14:04:50] RECOVERY - cp22 Disk Space on cp22 is OK: DISK OK - free space: / 12262 MB (16% inode=98%); [14:05:06] RECOVERY - cp23 Disk Space on cp23 is OK: DISK OK - free space: / 12206 MB (16% inode=98%); [14:06:07] RECOVERY - cp23 Puppet on cp23 is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures [14:06:27] RECOVERY - cp33 Puppet on cp33 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures [14:07:07] RECOVERY - cp23 HTTP 4xx/5xx ERROR Rate on cp23 is OK: OK - NGINX Error Rate is 2% [14:07:26] RECOVERY - cp22 Puppet on cp22 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [14:07:40] RECOVERY - cp22 HTTP 4xx/5xx ERROR Rate on cp22 is OK: OK - NGINX Error Rate is 0% [14:07:55] RECOVERY - cp32 Puppet on cp32 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [14:08:18] PROBLEM - graylog121 Current Load on graylog121 is WARNING: WARNING - load average: 1.77, 2.86, 3.92 [14:08:20] not surprising given all the errors [14:08:27] thanks Orange_Star for the temporary fix! [14:08:31] no problem [14:12:18] RECOVERY - graylog121 Current Load on graylog121 is OK: OK - load average: 1.20, 2.02, 3.33 [14:17:03] RECOVERY - wiki.yuanpi.eu.org - LetsEncrypt on sslhost is OK: OK - Certificate 'wiki.yuanpi.eu.org' will expire on Tue 11 Jul 2023 12:49:24 GMT +0000. [14:24:26] RECOVERY - zhacg.wiki - LetsEncrypt on sslhost is OK: OK - Certificate 'zhacg.wiki' will expire on Tue 11 Jul 2023 12:56:06 GMT +0000. [14:31:03] PROBLEM - test131 Puppet on test131 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): Service[nginx] [14:59:01] RECOVERY - test131 Puppet on test131 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:10:03] PROBLEM - www.sidem.wiki - reverse DNS on sslhost is WARNING: NoNameservers: All nameservers failed to answer the query sidem.wiki. 
IN NS: Server 2606:4700:4700::1111 UDP port 53 answered SERVFAIL [15:26:34] PROBLEM - db112 Disk Space on db112 is CRITICAL: DISK CRITICAL - free space: / 7969 MB (5% inode=99%); [15:39:02] RECOVERY - www.sidem.wiki - reverse DNS on sslhost is OK: SSL OK - www.sidem.wiki reverse DNS resolves to cp23.miraheze.org - CNAME OK [15:48:34] PROBLEM - db112 Disk Space on db112 is WARNING: DISK WARNING - free space: / 9914 MB (7% inode=99%); [15:56:59] [miraheze/mw-config] MacFan4000 pushed 1 commit to master [+0/-0/±1] https://github.com/miraheze/mw-config/compare/b16a81f37261...891b2a0d928a [15:57:00] [miraheze/mw-config] MacFan4000 891b2a0 - disable importdump requests and file uploads [15:57:02] !log [@test131] starting deploy of {'config': True} to all [15:57:03] !log [@test131] finished deploy of {'config': True} to all - SUCCESS in 0s [15:57:06] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [15:57:09] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [15:57:30] Reception123: ^ found a solution - revoke the upload permission [15:57:38] (tested on beta) [15:58:02] miraheze/mw-config - MacFan4000 the build passed. [15:59:36] Will that mess things up when it's re-enabled? [15:59:48] If not seems fine to do. I don't have access now though [15:59:54] !log [@mwtask141] starting deploy of {'config': True} to all [15:59:58] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [16:00:01] !log [@mwtask141] finished deploy of {'config': True} to all - SUCCESS in 6s [16:00:05] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [16:00:12] no, $wgRevokePermissions was never in ManageWiki which is why it works [16:11:56] also for SRE and stewards it will still look like it works as global upload would override a local revocation [16:12:45] but I can confirm with my non-privileged test account that uploads for everybody else are effectively disabled [16:24:17] PROBLEM - db142 Current Load on db142 is WARNING: WARNING - load average: 7.77, 5.42, 2.70 [16:26:18] RECOVERY - db142 Current Load on db142 is OK: OK - load average: 2.92, 4.80, 2.83 [18:04:17] PROBLEM - cp22 NTP time on cp22 is WARNING: NTP WARNING: Offset 0.3187492788 secs [18:06:34] PROBLEM - db112 Disk Space on db112 is CRITICAL: DISK CRITICAL - free space: / 7962 MB (5% inode=99%); [18:12:26] PROBLEM - db112 Current Load on db112 is CRITICAL: CRITICAL - load average: 11.65, 10.09, 5.69 [18:13:12] !log [salt-user@mwtask141] sudo -u www-data php /srv/mediawiki/w/extensions/SemanticMediaWiki/maintenance/setupStore.php --wiki=tsbwiki (END - exit=65280) [18:13:14] !log [salt-user@mw141] sudo -u www-data php /srv/mediawiki/w/extensions/SemanticMediaWiki/maintenance/setupStore.php --wiki=tsbwiki (END - exit=0) [18:13:15] !log [salt-user@mw142] sudo -u www-data php /srv/mediawiki/w/extensions/SemanticMediaWiki/maintenance/setupStore.php --wiki=tsbwiki (END - exit=0) [18:13:16] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [18:13:16] !log [salt-user@mw131] sudo -u www-data php /srv/mediawiki/w/extensions/SemanticMediaWiki/maintenance/setupStore.php --wiki=tsbwiki (END - exit=0) [18:13:17] !log [salt-user@mw121] sudo -u www-data php /srv/mediawiki/w/extensions/SemanticMediaWiki/maintenance/setupStore.php --wiki=tsbwiki (END - exit=0) [18:13:18] !log [salt-user@mw132] sudo -u www-data php /srv/mediawiki/w/extensions/SemanticMediaWiki/maintenance/setupStore.php --wiki=tsbwiki (END - exit=0) [18:13:19] !log 
[salt-user@mw122] sudo -u www-data php /srv/mediawiki/w/extensions/SemanticMediaWiki/maintenance/setupStore.php --wiki=tsbwiki (END - exit=0) [18:13:20] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [18:13:24] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [18:13:29] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [18:13:32] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [18:13:37] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [18:13:41] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [18:13:58] !log [salt-user@mw131] sudo -u www-data php /srv/mediawiki/w/extensions/SemanticMediaWiki/maintenance/setupStore.php --wiki=sagan4alphawiki (END - exit=65280) [18:13:59] !log [salt-user@mwtask141] sudo -u www-data php /srv/mediawiki/w/extensions/SemanticMediaWiki/maintenance/setupStore.php --wiki=sagan4alphawiki (END - exit=65280) [18:14:00] !log [salt-user@mw141] sudo -u www-data php /srv/mediawiki/w/extensions/SemanticMediaWiki/maintenance/setupStore.php --wiki=sagan4alphawiki (END - exit=65280) [18:14:02] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [18:14:06] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [18:14:07] !log [salt-user@mw121] sudo -u www-data php /srv/mediawiki/w/extensions/SemanticMediaWiki/maintenance/setupStore.php --wiki=sagan4alphawiki (END - exit=256) [18:14:10] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [18:14:14] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [18:14:26] !log [salt-user@mw142] sudo -u www-data php /srv/mediawiki/w/extensions/SemanticMediaWiki/maintenance/setupStore.php --wiki=sagan4alphawiki (END - exit=256) [18:14:31] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [18:15:03] !log [salt-user@mw132] sudo -u www-data php /srv/mediawiki/w/extensions/SemanticMediaWiki/maintenance/setupStore.php --wiki=sagan4alphawiki (END - exit=256) [18:15:07] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [18:15:14] !log [salt-user@mw122] sudo -u www-data php /srv/mediawiki/w/extensions/SemanticMediaWiki/maintenance/setupStore.php --wiki=sagan4alphawiki (END - exit=0) [18:15:17] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [18:15:23] !log [salt-user@mwtask141] sudo -u www-data php /srv/mediawiki/w/extensions/SemanticMediaWiki/maintenance/setupStore.php --wiki=sagan4betawiki (END - exit=65280) [18:15:27] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [18:15:39] !log [salt-user@mw142] sudo -u www-data php /srv/mediawiki/w/extensions/SemanticMediaWiki/maintenance/setupStore.php --wiki=sagan4betawiki (END - exit=0) [18:15:41] !log [salt-user@mw132] sudo -u www-data php /srv/mediawiki/w/extensions/SemanticMediaWiki/maintenance/setupStore.php --wiki=sagan4betawiki (END - exit=0) [18:15:43] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [18:15:43] !log [salt-user@mw141] sudo -u www-data php /srv/mediawiki/w/extensions/SemanticMediaWiki/maintenance/setupStore.php --wiki=sagan4betawiki (END - exit=0) [18:15:44] !log [salt-user@mw131] sudo -u www-data php /srv/mediawiki/w/extensions/SemanticMediaWiki/maintenance/setupStore.php --wiki=sagan4betawiki (END - exit=0) [18:15:45] !log [salt-user@mw122] sudo -u www-data php 
/srv/mediawiki/w/extensions/SemanticMediaWiki/maintenance/setupStore.php --wiki=sagan4betawiki (END - exit=0) [18:15:46] !log [salt-user@mw121] sudo -u www-data php /srv/mediawiki/w/extensions/SemanticMediaWiki/maintenance/setupStore.php --wiki=sagan4betawiki (END - exit=0) [18:15:46] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [18:15:50] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [18:15:55] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [18:15:58] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [18:16:03] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [18:16:26] PROBLEM - db112 Current Load on db112 is WARNING: WARNING - load average: 3.77, 7.52, 5.70 [18:17:54] !log [salt-user@mwtask141] sudo -u www-data php /srv/mediawiki/w/extensions/SemanticMediaWiki/maintenance/setupStore.php --wiki=twistedwonderlandwiki (END - exit=65280) [18:19:08] PROBLEM - wiki.nj.cn.eu.org - reverse DNS on sslhost is CRITICAL: rDNS CRITICAL - wiki.nj.cn.eu.org All nameservers failed to answer the query. [18:19:21] PROBLEM - mw141 MediaWiki Rendering on mw141 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:20:06] PROBLEM - mw121 MediaWiki Rendering on mw121 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:20:09] PROBLEM - mw122 MediaWiki Rendering on mw122 is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 5903 bytes in 0.013 second response time [18:20:26] RECOVERY - db112 Current Load on db112 is OK: OK - load average: 3.57, 5.74, 5.41 [18:20:47] !log [salt-user@mw131] sudo -u www-data php /srv/mediawiki/w/extensions/SemanticMediaWiki/maintenance/setupStore.php --wiki=twistedwonderlandwiki (END - exit=0) [18:20:54] PROBLEM - mw132 MediaWiki Rendering on mw132 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:20:58] PROBLEM - mw131 MediaWiki Rendering on mw131 is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 5903 bytes in 0.022 second response time [18:20:59] PROBLEM - mw142 MediaWiki Rendering on mw142 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:21:08] !log [salt-user@mw121] sudo -u www-data php /srv/mediawiki/w/extensions/SemanticMediaWiki/maintenance/setupStore.php --wiki=twistedwonderlandwiki (END - exit=0) [18:21:09] !log [salt-user@mw122] sudo -u www-data php /srv/mediawiki/w/extensions/SemanticMediaWiki/maintenance/setupStore.php --wiki=twistedwonderlandwiki (END - exit=0) [18:21:10] !log [salt-user@mw142] sudo -u www-data php /srv/mediawiki/w/extensions/SemanticMediaWiki/maintenance/setupStore.php --wiki=twistedwonderlandwiki (END - exit=0) [18:21:11] !log [salt-user@mw141] sudo -u www-data php /srv/mediawiki/w/extensions/SemanticMediaWiki/maintenance/setupStore.php --wiki=twistedwonderlandwiki (END - exit=0) [18:21:12] !log [salt-user@mw132] sudo -u www-data php /srv/mediawiki/w/extensions/SemanticMediaWiki/maintenance/setupStore.php --wiki=twistedwonderlandwiki (END - exit=0) [18:21:48] !log [salt-user@mw121] sudo -u www-data php /srv/mediawiki/w/extensions/SemanticMediaWiki/maintenance/setupStore.php --wiki=projectsekaiwiki (END - exit=0) [18:21:50] PROBLEM - en.religiononfire.mar.in.ua - reverse DNS on sslhost is CRITICAL: rDNS CRITICAL - en.religiononfire.mar.in.ua All nameservers failed to answer the query. 
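
The (END - exit=NNNNN) values in the setupStore.php entries above look like raw wait(2) statuses rather than plain exit codes; under that assumption, the real exit code sits in the high byte, the same extraction pcntl_wexitstatus() performs:

    <?php
    // Assumption: salt reports the packed wait status, not the exit code itself.
    function decodeExitStatus(int $status): int {
        return ($status >> 8) & 0xFF;
    }

    echo decodeExitStatus(65280), "\n"; // 255 -> setupStore.php aborted outright
    echo decodeExitStatus(256), "\n";   // 1   -> ordinary failure
    echo decodeExitStatus(0), "\n";     // 0   -> success
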
[18:21:52] !log [salt-user@mw141] sudo -u www-data php /srv/mediawiki/w/extensions/SemanticMediaWiki/maintenance/setupStore.php --wiki=projectsekaiwiki (END - exit=0)
[18:22:03] !log [salt-user@mw132] sudo -u www-data php /srv/mediawiki/w/extensions/SemanticMediaWiki/maintenance/setupStore.php --wiki=projectsekaiwiki (END - exit=0)
[18:22:15] !log [salt-user@mw122] sudo -u www-data php /srv/mediawiki/w/extensions/SemanticMediaWiki/maintenance/setupStore.php --wiki=projectsekaiwiki (END - exit=0)
[18:22:23] !log [salt-user@mw131] sudo -u www-data php /srv/mediawiki/w/extensions/SemanticMediaWiki/maintenance/setupStore.php --wiki=projectsekaiwiki (END - exit=0)
[18:22:25] PROBLEM - cp23 Varnish Backends on cp23 is CRITICAL: 5 backends are down. mw121 mw131 mw132 mw141 mw142
[18:22:31] !log [salt-user@mwtask141] sudo -u www-data php /srv/mediawiki/w/extensions/SemanticMediaWiki/maintenance/setupStore.php --wiki=projectsekaiwiki (END - exit=0)
[18:22:45] PROBLEM - cp22 Varnish Backends on cp22 is CRITICAL: 7 backends are down. mw121 mw122 mw131 mw132 mw141 mw142 mediawiki
[18:22:54] RECOVERY - mw132 MediaWiki Rendering on mw132 is OK: HTTP OK: HTTP/1.1 200 OK - 8191 bytes in 0.576 second response time
[18:22:56] RECOVERY - mw131 MediaWiki Rendering on mw131 is OK: HTTP OK: HTTP/1.1 200 OK - 8191 bytes in 0.508 second response time
[18:22:57] RECOVERY - mw142 MediaWiki Rendering on mw142 is OK: HTTP OK: HTTP/1.1 200 OK - 8191 bytes in 1.979 second response time
[18:23:09] PROBLEM - cp33 Varnish Backends on cp33 is CRITICAL: 7 backends are down. mw121 mw122 mw131 mw132 mw141 mw142 mediawiki
[18:23:11] RECOVERY - mw141 MediaWiki Rendering on mw141 is OK: HTTP OK: HTTP/1.1 200 OK - 8191 bytes in 0.760 second response time
[18:23:14] !log [salt-user@mw142] sudo -u www-data php /srv/mediawiki/w/extensions/SemanticMediaWiki/maintenance/setupStore.php --wiki=projectsekaiwiki (END - exit=0)
[18:23:43] PROBLEM - cp32 Varnish Backends on cp32 is CRITICAL: 3 backends are down. mw122 mw131 mw132
[18:24:53] !log [salt-user@mw122] sudo -u www-data php /srv/mediawiki/w/extensions/SemanticMediaWiki/maintenance/setupStore.php --wiki=projectsekaiwiki (END - exit=0)
[18:24:56] !log [salt-user@mwtask141] sudo -u www-data php /srv/mediawiki/w/extensions/SemanticMediaWiki/maintenance/setupStore.php --wiki=projectsekaiwiki (END - exit=0)
[18:24:57] !log [salt-user@mw142] sudo -u www-data php /srv/mediawiki/w/extensions/SemanticMediaWiki/maintenance/setupStore.php --wiki=projectsekaiwiki (END - exit=0)
[18:24:58] !log [salt-user@mw121] sudo -u www-data php /srv/mediawiki/w/extensions/SemanticMediaWiki/maintenance/setupStore.php --wiki=projectsekaiwiki (END - exit=0)
[18:24:59] !log [salt-user@mw132] sudo -u www-data php /srv/mediawiki/w/extensions/SemanticMediaWiki/maintenance/setupStore.php --wiki=projectsekaiwiki (END - exit=0)
[18:25:00] !log [salt-user@mw131] sudo -u www-data php /srv/mediawiki/w/extensions/SemanticMediaWiki/maintenance/setupStore.php --wiki=projectsekaiwiki (END - exit=0)
[18:25:01] !log [salt-user@mw141] sudo -u www-data php /srv/mediawiki/w/extensions/SemanticMediaWiki/maintenance/setupStore.php --wiki=projectsekaiwiki (END - exit=0)
[18:26:13] RECOVERY - mw121 MediaWiki Rendering on mw121 is OK: HTTP OK: HTTP/1.1 200 OK - 8191 bytes in 0.309 second response time
[18:26:16] RECOVERY - mw122 MediaWiki Rendering on mw122 is OK: HTTP OK: HTTP/1.1 200 OK - 8191 bytes in 0.349 second response time
[18:26:18] RECOVERY - cp22 NTP time on cp22 is OK: NTP OK: Offset -0.02493900061 secs
[18:27:43] RECOVERY - cp32 Varnish Backends on cp32 is OK: All 14 backends are healthy
[18:28:25] RECOVERY - cp23 Varnish Backends on cp23 is OK: All 14 backends are healthy
[18:28:45] RECOVERY - cp22 Varnish Backends on cp22 is OK: All 14 backends are healthy
[18:29:09] RECOVERY - cp33 Varnish Backends on cp33 is OK: All 14 backends are healthy
[18:29:20] hmmm, probably SMW shenanigans? idk what happened here
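For context on the cpNN "Varnish Backends" alerts that flapped above: each cache proxy probes its 14 configured backends and reports how many fail their health checks. A rough sketch of that kind of check, assuming a plugin built on varnishadm backend.list (the actual Icinga plugin and the exact column layout are assumptions; backend.list output varies across Varnish versions):

```python
import subprocess

# Sketch of a "Varnish Backends" style check: list backend health via
# varnishadm and count anything not reporting healthy. The parsing is
# approximate and assumed, not the real plugin's logic.
out = subprocess.run(
    ["varnishadm", "backend.list"],  # needs access to varnishd's secret
    capture_output=True, text=True, check=True,
).stdout

rows = [line.split() for line in out.splitlines()[1:] if line.strip()]
down = [row[0] for row in rows if "healthy" not in (c.lower() for c in row)]

if down:
    print(f"CRITICAL: {len(down)} backends are down. {' '.join(down)}")
else:
    print(f"OK: All {len(rows)} backends are healthy")
```

The 18:2x flaps line up with the MediaWiki Rendering 502s on the mw hosts, so the proxies were reporting the app servers sick rather than failing themselves.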
[18:29:44] !log [salt-user@mw142] sudo -u www-data php /srv/mediawiki/w/extensions/SemanticMediaWiki/maintenance/setupStore.php --wiki=sagan4alphawiki (END - exit=256)
[18:29:45] !log [salt-user@mwtask141] sudo -u www-data php /srv/mediawiki/w/extensions/SemanticMediaWiki/maintenance/setupStore.php --wiki=sagan4alphawiki (END - exit=256)
[18:29:46] !log [salt-user@mw141] sudo -u www-data php /srv/mediawiki/w/extensions/SemanticMediaWiki/maintenance/setupStore.php --wiki=sagan4alphawiki (END - exit=256)
[18:29:47] !log [salt-user@mw131] sudo -u www-data php /srv/mediawiki/w/extensions/SemanticMediaWiki/maintenance/setupStore.php --wiki=sagan4alphawiki (END - exit=256)
[18:29:48] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[18:29:48] !log [salt-user@mw121] sudo -u www-data php /srv/mediawiki/w/extensions/SemanticMediaWiki/maintenance/setupStore.php --wiki=sagan4alphawiki (END - exit=256)
[18:29:49] !log [salt-user@mw122] sudo -u www-data php /srv/mediawiki/w/extensions/SemanticMediaWiki/maintenance/setupStore.php --wiki=sagan4alphawiki (END - exit=256)
[18:29:52] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[18:29:56] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[18:29:59] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[18:30:05] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[18:30:09] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[18:32:24] !log [salt-user@mw132] sudo -u www-data php /srv/mediawiki/w/extensions/SemanticMediaWiki/maintenance/setupStore.php --wiki=sagan4alphawiki (END - exit=0)
[18:32:28] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[18:33:10] PROBLEM - db112 Current Load on db112 is CRITICAL: CRITICAL - load average: 7.10, 8.72, 6.92
[18:36:48] PROBLEM - mw131 MediaWiki Rendering on mw131 is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 5903 bytes in 0.015 second response time
[18:36:59] PROBLEM - mw142 MediaWiki Rendering on mw142 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[18:37:01] PROBLEM - mw141 MediaWiki Rendering on mw141 is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 5903 bytes in 0.006 second response time
[18:37:08] I'm just going to say, everything was working until the SMW install was started, I don't know about you guys
[18:37:10] PROBLEM - mw132 MediaWiki Rendering on mw132 is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 5903 bytes in 0.008 second response time
[18:38:02] PROBLEM - mw121 MediaWiki Rendering on mw121 is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 5903 bytes in 0.012 second response time
[18:38:12] PROBLEM - mw122 MediaWiki Rendering on mw122 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[18:38:25] PROBLEM - cp23 Varnish Backends on cp23 is CRITICAL: 5 backends are down. mw121 mw122 mw132 mw141 mw142
[18:38:39] Orange_Star: that is true...
[18:38:45] PROBLEM - cp22 Varnish Backends on cp22 is CRITICAL: 7 backends are down. mw121 mw122 mw131 mw132 mw141 mw142 mediawiki
[18:39:06] maybe we should kill SMW here /s but also kinda not sarcasm
[18:39:09] PROBLEM - cp33 Varnish Backends on cp33 is CRITICAL: 7 backends are down. mw121 mw122 mw131 mw132 mw141 mw142 mediawiki
[18:39:10] RECOVERY - mw132 MediaWiki Rendering on mw132 is OK: HTTP OK: HTTP/1.1 200 OK - 8191 bytes in 0.440 second response time
[18:39:43] PROBLEM - cp32 Varnish Backends on cp32 is CRITICAL: 7 backends are down. mw121 mw122 mw131 mw132 mw141 mw142 mediawiki
[18:39:58] RECOVERY - mw121 MediaWiki Rendering on mw121 is OK: HTTP OK: HTTP/1.1 200 OK - 8191 bytes in 0.395 second response time
[18:40:08] RECOVERY - mw122 MediaWiki Rendering on mw122 is OK: HTTP OK: HTTP/1.1 200 OK - 8191 bytes in 0.362 second response time
[18:40:43] RECOVERY - mw131 MediaWiki Rendering on mw131 is OK: HTTP OK: HTTP/1.1 200 OK - 8191 bytes in 0.391 second response time
[18:41:01] RECOVERY - mw141 MediaWiki Rendering on mw141 is OK: HTTP OK: HTTP/1.1 200 OK - 8191 bytes in 0.435 second response time
[18:41:02] RECOVERY - mw142 MediaWiki Rendering on mw142 is OK: HTTP OK: HTTP/1.1 200 OK - 8191 bytes in 0.261 second response time
[18:43:09] RECOVERY - cp33 Varnish Backends on cp33 is OK: All 14 backends are healthy
[18:43:43] RECOVERY - cp32 Varnish Backends on cp32 is OK: All 14 backends are healthy
[18:44:25] RECOVERY - cp23 Varnish Backends on cp23 is OK: All 14 backends are healthy
[18:44:45] RECOVERY - cp22 Varnish Backends on cp22 is OK: All 14 backends are healthy
[18:50:40] RECOVERY - en.religiononfire.mar.in.ua - reverse DNS on sslhost is OK: SSL OK - en.religiononfire.mar.in.ua reverse DNS resolves to cp23.miraheze.org - CNAME OK
[18:53:56] [WikiDiscover] Reception123 closed pull request #92: Add a config whether or not to list private wikis - https://github.com/miraheze/WikiDiscover/pull/92
[18:53:57] [miraheze/WikiDiscover] Reception123 pushed 3 commits to master [+0/-0/±4] https://github.com/miraheze/WikiDiscover/compare/acf37a220436...8c58bcda5781
[18:53:58] [miraheze/WikiDiscover] Universal-Omega 512849c - Add a config whether or not to list private wikis
[18:53:59] [miraheze/WikiDiscover] Universal-Omega b57f051 - Add config
[18:54:02] [miraheze/WikiDiscover] Reception123 8c58bcd - Merge pull request #92 from Universal-Omega/patch-3
[18:54:34] PROBLEM - db112 Disk Space on db112 is WARNING: DISK WARNING - free space: / 9192 MB (6% inode=99%);
[18:56:26] PROBLEM - db112 Current Load on db112 is WARNING: WARNING - load average: 1.20, 5.68, 7.94
[18:58:10] miraheze/WikiDiscover - Reception123 the build passed.
[19:00:26] RECOVERY - db112 Current Load on db112 is OK: OK - load average: 1.77, 3.76, 6.64
[19:17:13] RECOVERY - wiki.nj.cn.eu.org - reverse DNS on sslhost is OK: SSL OK - wiki.nj.cn.eu.org reverse DNS resolves to cp22.miraheze.org - CNAME OK
[19:28:50] [miraheze/mw-config] Reception123 pushed 1 commit to master [+0/-0/±1] https://github.com/miraheze/mw-config/compare/891b2a0d928a...c54c8d11a412
[19:28:52] [miraheze/mw-config] Reception123 c54c8d1 - mention uploads in sitenotice
[19:29:08] !log [reception@mwtask141] starting deploy of {'pull': 'config', 'config': True} to all
[19:29:13] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[19:29:16] !log [reception@mwtask141] finished deploy of {'pull': 'config', 'config': True} to all - SUCCESS in 8s
[19:29:21] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[19:29:58] miraheze/mw-config - Reception123 the build passed.
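For context on the recurring "reverse DNS on sslhost" flaps (en.religiononfire.mar.in.ua above and again below): these read as upstream resolver failures rather than misconfiguration, since the NoNameservers and Timeout wording in the alerts matches dnspython's exception messages. A hedged sketch of what such a check plausibly does (the real plugin's logic is an assumption):

```python
import dns.exception
import dns.resolver  # dnspython; the exception text matches the alerts above

def check_sslhost_dns(hostname: str) -> str:
    # Assumed logic: resolve the custom domain's CNAME and report which
    # cache proxy it points at; map resolver failures to alert text.
    try:
        answer = dns.resolver.resolve(hostname, "CNAME")
        target = answer[0].target.to_text().rstrip(".")
        return f"OK - {hostname} reverse DNS resolves to {target} - CNAME OK"
    except dns.resolver.NoNameservers as exc:
        return f"WARNING - {exc}"  # "All nameservers failed to answer..."
    except dns.exception.Timeout as exc:
        return f"WARNING - {exc}"  # "The DNS operation timed out after..."

print(check_sslhost_dns("en.religiononfire.mar.in.ua"))
```

All of these domains eventually recover on their own later in the log, which also points at transient resolver trouble rather than anything on the cp hosts.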
[19:56:57] !log [@test131] starting deploy of {'config': True} to all
[19:56:58] !log [@test131] finished deploy of {'config': True} to all - SUCCESS in 0s
[19:57:01] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[19:57:05] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[21:03:57] PROBLEM - cp23 NTP time on cp23 is WARNING: NTP WARNING: Offset 0.1066554487 secs
[21:05:16] PROBLEM - gs.sidem.wiki - reverse DNS on sslhost is CRITICAL: rDNS CRITICAL - gs.sidem.wiki All nameservers failed to answer the query.
[21:10:11] PROBLEM - www.sidem.wiki - reverse DNS on sslhost is WARNING: Timeout: The DNS operation timed out after 5.403818368911743 seconds
[21:17:57] RECOVERY - cp23 NTP time on cp23 is OK: NTP OK: Offset 0.09980547428 secs
[21:22:40] PROBLEM - en.religiononfire.mar.in.ua - reverse DNS on sslhost is WARNING: NoNameservers: All nameservers failed to answer the query en.religiononfire.mar.in.ua. IN CNAME: Server 2606:4700:4700::1111 UDP port 53 answered SERVFAIL
[21:28:19] [mw-config] Naleksuh opened pull request #5193: disable wiki creations - https://github.com/miraheze/mw-config/pull/5193
[21:29:21] miraheze/mw-config - Naleksuh the build passed.
[21:33:56] RECOVERY - gs.sidem.wiki - reverse DNS on sslhost is OK: SSL OK - gs.sidem.wiki reverse DNS resolves to cp22.miraheze.org - CNAME OK
[21:39:10] RECOVERY - www.sidem.wiki - reverse DNS on sslhost is OK: SSL OK - www.sidem.wiki reverse DNS resolves to cp22.miraheze.org - CNAME OK
[21:51:30] PROBLEM - en.religiononfire.mar.in.ua - reverse DNS on sslhost is CRITICAL: rDNS CRITICAL - en.religiononfire.mar.in.ua All nameservers failed to answer the query.
[22:00:15] PROBLEM - cp32 NTP time on cp32 is WARNING: NTP WARNING: Offset 0.1303185821 secs
[22:07:11] PROBLEM - wiki.andreijiroh.uk.eu.org - reverse DNS on sslhost is CRITICAL: rDNS CRITICAL - wiki.andreijiroh.uk.eu.org All nameservers failed to answer the query.
[22:08:15] RECOVERY - cp32 NTP time on cp32 is OK: NTP OK: Offset 0.0877994597 secs
[22:11:25] PROBLEM - cp23 Varnish Backends on cp23 is CRITICAL: 7 backends are down. mw121 mw122 mw131 mw132 mw141 mw142 mediawiki
[22:15:15] RECOVERY - cp23 Varnish Backends on cp23 is OK: All 14 backends are healthy
[22:15:46] [mw-config] MacFan4000 commented on pull request #5193: disable wiki creations - https://github.com/miraheze/mw-config/pull/5193#issuecomment-1506036237
[22:15:49] [mw-config] MacFan4000 closed pull request #5193: disable wiki creations - https://github.com/miraheze/mw-config/pull/5193
[22:20:20] PROBLEM - en.religiononfire.mar.in.ua - reverse DNS on sslhost is WARNING: NoNameservers: All nameservers failed to answer the query en.religiononfire.mar.in.ua. IN CNAME: Server 2606:4700:4700::1111 UDP port 53 answered SERVFAIL
[22:22:57] [mw-config] Naleksuh commented on pull request #5193: disable wiki creations - https://github.com/miraheze/mw-config/pull/5193#issuecomment-1506043267
[22:23:42] [mw-config] Naleksuh deleted a comment on pull request #5193: disable wiki creations - https://github.com/miraheze/mw-config/pull/5193#issuecomment-1506043267
[22:24:16] [mw-config] Naleksuh commented on pull request #5193: disable wiki creations - https://github.com/miraheze/mw-config/pull/5193#issuecomment-1506044189
[22:26:34] PROBLEM - db112 Disk Space on db112 is CRITICAL: DISK CRITICAL - free space: / 7967 MB (5% inode=99%);
[22:26:53] PROBLEM - cp23 Varnish Backends on cp23 is CRITICAL: 1 backends are down. mw142
[22:28:48] RECOVERY - cp23 Varnish Backends on cp23 is OK: All 14 backends are healthy
[22:36:24] RECOVERY - wiki.andreijiroh.uk.eu.org - reverse DNS on sslhost is OK: SSL OK - wiki.andreijiroh.uk.eu.org reverse DNS resolves to cp22.miraheze.org - CNAME OK
[22:49:51] RECOVERY - en.religiononfire.mar.in.ua - reverse DNS on sslhost is OK: SSL OK - en.religiononfire.mar.in.ua reverse DNS resolves to cp23.miraheze.org - CNAME OK
[23:35:40] PROBLEM - uk.religiononfire.mar.in.ua - reverse DNS on sslhost is CRITICAL: rDNS CRITICAL - uk.religiononfire.mar.in.ua All nameservers failed to answer the query.
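A last note on the cp23/cp32 "NTP time" checks above: they warn once the clock offset crosses roughly 0.1 s (the WARNING fired at 0.1066 s and cleared at 0.0998 s). A minimal sketch of that comparison, with the warning threshold inferred from the log, the critical threshold assumed, and ntplib standing in for whatever client the real check uses:

```python
import ntplib

WARN = 0.1  # seconds; inferred from the WARNING firing at ~0.107 s
CRIT = 0.5  # assumed; no NTP CRITICAL appears in this log

# Query one NTP server and grade the local clock offset, Icinga-style.
offset = ntplib.NTPClient().request("pool.ntp.org", version=3).offset
if abs(offset) >= CRIT:
    print(f"NTP CRITICAL: Offset {offset} secs")
elif abs(offset) >= WARN:
    print(f"NTP WARNING: Offset {offset} secs")
else:
    print(f"NTP OK: Offset {offset} secs")
```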