[00:00:32] alerting : [FIRING:1] (PHP-FPM Worker Usage High mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki [00:02:37] PROBLEM - cp31 Stunnel HTTP for mw101 on cp31 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:03:02] PROBLEM - mw101 MediaWiki Rendering on mw101 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:03:06] PROBLEM - test101 Current Load on test101 is CRITICAL: CRITICAL - load average: 2.83, 1.93, 1.42 [00:03:25] RECOVERY - gluster121 Current Load on gluster121 is OK: OK - load average: 2.04, 3.14, 3.13 [00:03:44] PROBLEM - db101 Current Load on db101 is CRITICAL: CRITICAL - load average: 11.22, 8.22, 6.77 [00:03:48] PROBLEM - gluster111 Current Load on gluster111 is CRITICAL: CRITICAL - load average: 4.13, 3.69, 3.58 [00:03:50] PROBLEM - mw101 Puppet on mw101 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. It might be a dependency cycle. [00:04:37] RECOVERY - cp31 Stunnel HTTP for mw101 on cp31 is OK: HTTP OK: HTTP/1.1 200 OK - 14556 bytes in 0.968 second response time [00:05:00] RECOVERY - mw101 MediaWiki Rendering on mw101 is OK: HTTP OK: HTTP/1.1 200 OK - 22334 bytes in 2.901 second response time [00:05:48] PROBLEM - gluster111 Current Load on gluster111 is WARNING: WARNING - load average: 3.27, 3.39, 3.47 [00:07:48] PROBLEM - gluster111 Current Load on gluster111 is CRITICAL: CRITICAL - load average: 4.03, 3.30, 3.41 [00:09:16] PROBLEM - cp21 Stunnel HTTP for mw111 on cp21 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:09:24] PROBLEM - cp30 Stunnel HTTP for mw111 on cp30 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:09:25] PROBLEM - mw112 MediaWiki Rendering on mw112 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:09:31] PROBLEM - cp21 Stunnel HTTP for mw112 on cp21 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:09:46] RECOVERY - gluster101 Current Load on gluster101 is OK: OK - load average: 2.32, 2.79, 3.35 [00:09:50] PROBLEM - cp31 Stunnel HTTP for mw111 on cp31 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:09:51] PROBLEM - cp31 Stunnel HTTP for mw112 on cp31 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:09:52] PROBLEM - cp20 Stunnel HTTP for mw111 on cp20 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:10:02] PROBLEM - mw111 MediaWiki Rendering on mw111 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:10:08] PROBLEM - cp20 Stunnel HTTP for mw112 on cp20 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:10:14] PROBLEM - cp31 Stunnel HTTP for mw121 on cp31 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:10:34] PROBLEM - cp30 Stunnel HTTP for mw112 on cp30 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:10:45] PROBLEM - mw121 MediaWiki Rendering on mw121 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:11:32] PROBLEM - db101 Current Load on db101 is WARNING: WARNING - load average: 4.47, 7.17, 7.01 [00:11:36] PROBLEM - cp20 Stunnel HTTP for mw121 on cp20 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:11:44] PROBLEM - cp30 Stunnel HTTP for mw121 on cp30 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:11:48] RECOVERY - gluster111 Current Load on gluster111 is OK: OK - load average: 1.30, 2.44, 3.05 [00:11:57] PROBLEM - cp21 Stunnel HTTP for mw121 on cp21 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:11:58] PROBLEM - cp21 Stunnel HTTP for mw102 on cp21 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:12:08] 
PROBLEM - cp30 Stunnel HTTP for mw101 on cp30 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:12:26] PROBLEM - cp21 Stunnel HTTP for mw101 on cp21 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:12:33] PROBLEM - cp20 Stunnel HTTP for mw122 on cp20 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:12:38] PROBLEM - cp30 Stunnel HTTP for mw102 on cp30 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:12:41] PROBLEM - mw102 MediaWiki Rendering on mw102 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:12:45] PROBLEM - cp20 Stunnel HTTP for mw101 on cp20 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:12:53] PROBLEM - cp21 Stunnel HTTP for mw122 on cp21 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:13:04] PROBLEM - cp31 Stunnel HTTP for mw102 on cp31 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:13:06] PROBLEM - test101 Current Load on test101 is WARNING: WARNING - load average: 1.78, 1.99, 1.74 [00:13:10] PROBLEM - cp20 Stunnel HTTP for mw102 on cp20 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:13:11] PROBLEM - cp31 Stunnel HTTP for mw101 on cp31 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:13:20] PROBLEM - mw101 MediaWiki Rendering on mw101 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:13:32] PROBLEM - cp30 Stunnel HTTP for mw122 on cp30 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:13:36] PROBLEM - mw122 MediaWiki Rendering on mw122 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:13:39] PROBLEM - cp31 Stunnel HTTP for mw122 on cp31 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:13:59] RECOVERY - cp21 Stunnel HTTP for mw102 on cp21 is OK: HTTP OK: HTTP/1.1 200 OK - 14562 bytes in 5.615 second response time [00:14:26] RECOVERY - cp31 Stunnel HTTP for mw121 on cp31 is OK: HTTP OK: HTTP/1.1 200 OK - 14556 bytes in 9.059 second response time [00:14:45] RECOVERY - mw102 MediaWiki Rendering on mw102 is OK: HTTP OK: HTTP/1.1 200 OK - 22334 bytes in 9.922 second response time [00:14:46] RECOVERY - cp30 Stunnel HTTP for mw102 on cp30 is OK: HTTP OK: HTTP/1.1 200 OK - 14556 bytes in 7.819 second response time [00:15:06] PROBLEM - test101 Current Load on test101 is CRITICAL: CRITICAL - load average: 2.06, 2.03, 1.79 [00:15:07] RECOVERY - cp31 Stunnel HTTP for mw102 on cp31 is OK: HTTP OK: HTTP/1.1 200 OK - 14562 bytes in 7.323 second response time [00:15:18] RECOVERY - cp20 Stunnel HTTP for mw102 on cp20 is OK: HTTP OK: HTTP/1.1 200 OK - 14548 bytes in 8.322 second response time [00:15:32] RECOVERY - db101 Current Load on db101 is OK: OK - load average: 4.32, 6.14, 6.67 [00:17:08] !log DELETE FROM incidents WHERE i_id='50'; duplicate report created by mistake [00:17:54] RECOVERY - cp30 Stunnel HTTP for mw111 on cp30 is OK: HTTP OK: HTTP/1.1 200 OK - 14562 bytes in 8.685 second response time [00:18:04] RECOVERY - cp31 Stunnel HTTP for mw111 on cp31 is OK: HTTP OK: HTTP/1.1 200 OK - 14548 bytes in 7.599 second response time [00:18:07] RECOVERY - cp20 Stunnel HTTP for mw111 on cp20 is OK: HTTP OK: HTTP/1.1 200 OK - 14556 bytes in 6.043 second response time [00:18:18] RECOVERY - mw111 MediaWiki Rendering on mw111 is OK: HTTP OK: HTTP/1.1 200 OK - 22335 bytes in 4.513 second response time [00:18:21] RECOVERY - cp30 Stunnel HTTP for mw101 on cp30 is OK: HTTP OK: HTTP/1.1 200 OK - 14548 bytes in 6.202 second response time [00:18:21] PROBLEM - cp21 Stunnel HTTP for mw102 on cp21 is CRITICAL: CRITICAL - Socket timeout after 10 seconds 
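
The !log entry above and the two that follow shortly below remove the duplicate incident report 50 one table at a time (incidents, incidents_log, incidents_reviewer). A minimal sketch of doing the same cleanup as a single transaction, assuming a MariaDB-style database reachable with pymysql; the host, credentials, and schema name are placeholders, and the table and column names are taken from the logged statements.

    # Sketch only: delete the duplicate incident report atomically instead of one
    # statement at a time. Connection details are placeholders; the table and
    # column names come from the DELETEs recorded in the server admin log.
    import pymysql

    INCIDENT_ID = "50"  # the duplicate report mentioned in the !log entries

    conn = pymysql.connect(host="db.example", user="ops", password="...", database="...")
    try:
        with conn.cursor() as cur:
            # The same three DELETEs that were run by hand, in one unit of work.
            cur.execute("DELETE FROM incidents WHERE i_id=%s", (INCIDENT_ID,))
            cur.execute("DELETE FROM incidents_log WHERE log_incident=%s", (INCIDENT_ID,))
            cur.execute("DELETE FROM incidents_reviewer WHERE r_incident=%s", (INCIDENT_ID,))
        conn.commit()  # all three land together, or none do
    except Exception:
        conn.rollback()
        raise
    finally:
        conn.close()
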
[00:18:38] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [00:18:40] RECOVERY - cp21 Stunnel HTTP for mw101 on cp21 is OK: HTTP OK: HTTP/1.1 200 OK - 14548 bytes in 2.984 second response time [00:18:52] PROBLEM - cp31 Stunnel HTTP for mw121 on cp31 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:19:01] PROBLEM - wiki.simorgh.me - reverse DNS on sslhost is WARNING: Traceback (most recent call last): File "/usr/lib/nagios/plugins/check_reverse_dns.py", line 155, in main() File "/usr/lib/nagios/plugins/check_reverse_dns.py", line 136, in main records = check_records(args.hostname) File "/usr/lib/nagios/plugins/check_reverse_dns.py", line 86, in check_records cname = str(dns_resolver.resolve(hostname, 'CNAME')[0]) File "/usr/lib/py [00:19:01] st-packages/dns/resolver.py", line 1040, in resolve (nameserver, port, tcp, backoff) = resolution.next_nameserver() File "/usr/lib/python3/dist-packages/dns/resolver.py", line 598, in next_nameserver raise NoNameservers(request=self.request, errors=self.errors)dns.resolver.NoNameservers: All nameservers failed to answer the query wiki.simorgh.me. IN CNAME: Server 2606:4700:4700::1111 UDP port 53 answered SERVFAIL [00:19:03] RECOVERY - cp20 Stunnel HTTP for mw101 on cp20 is OK: HTTP OK: HTTP/1.1 200 OK - 14562 bytes in 0.095 second response time [00:19:07] !log DELETE FROM incidents_log WHERE log_incident='50'; duplicate report created by mistake [00:19:15] PROBLEM - cp30 Stunnel HTTP for mw102 on cp30 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:19:27] RECOVERY - mw101 MediaWiki Rendering on mw101 is OK: HTTP OK: HTTP/1.1 200 OK - 22334 bytes in 2.407 second response time [00:19:29] PROBLEM - cp31 Stunnel HTTP for mw102 on cp31 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:19:30] RECOVERY - cp31 Stunnel HTTP for mw101 on cp31 is OK: HTTP OK: HTTP/1.1 200 OK - 14562 bytes in 1.670 second response time [00:19:33] RECOVERY - cp21 Stunnel HTTP for mw111 on cp21 is OK: HTTP OK: HTTP/1.1 200 OK - 14562 bytes in 0.484 second response time [00:20:29] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [00:20:44] !log DELETE FROM incidents_reviewer WHERE r_incident='50'; duplicate report created by mistake [00:21:10] PROBLEM - cp31 Current Load on cp31 is CRITICAL: CRITICAL - load average: 2.01, 1.72, 1.13 [00:22:32] RECOVERY - cp21 Stunnel HTTP for mw102 on cp21 is OK: HTTP OK: HTTP/1.1 200 OK - 14556 bytes in 9.760 second response time [00:22:40] PROBLEM - mw111 MediaWiki Rendering on mw111 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:22:44] PROBLEM - cp30 Stunnel HTTP for mw101 on cp30 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:22:54] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [00:23:01] PROBLEM - cp21 Stunnel HTTP for mw101 on cp21 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:23:08] RECOVERY - cp31 Current Load on cp31 is OK: OK - load average: 0.71, 1.32, 1.05 [00:23:31] RECOVERY - cp30 Stunnel HTTP for mw102 on cp30 is OK: HTTP OK: HTTP/1.1 200 OK - 14556 bytes in 5.969 second response time [00:23:31] PROBLEM - cp20 Stunnel HTTP for mw101 on cp20 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:23:34] RECOVERY - cp31 Stunnel HTTP for mw102 on cp31 is OK: HTTP OK: HTTP/1.1 200 OK - 14562 bytes in 4.977 second response time [00:23:37] PROBLEM - mw101 MediaWiki Rendering on mw101 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:23:45] PROBLEM - cp21 Stunnel HTTP for mw111 
on cp21 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:23:57] PROBLEM - cp31 Stunnel HTTP for mw101 on cp31 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:24:07] PROBLEM - cp30 Stunnel HTTP for mw111 on cp30 is CRITICAL: HTTP CRITICAL - No data received from host [00:24:20] PROBLEM - cp31 Stunnel HTTP for mw111 on cp31 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:24:27] PROBLEM - cp20 Stunnel HTTP for mw111 on cp20 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:24:29] PROBLEM - cp30 Current Load on cp30 is CRITICAL: CRITICAL - load average: 1.74, 2.67, 1.79 [00:25:33] PROBLEM - gluster101 Current Load on gluster101 is CRITICAL: CRITICAL - load average: 4.72, 3.70, 3.29 [00:25:42] RECOVERY - cp21 Stunnel HTTP for mw111 on cp21 is OK: HTTP OK: HTTP/1.1 200 OK - 14556 bytes in 0.019 second response time [00:25:44] PROBLEM - dnd.bellinrattin.it - reverse DNS on sslhost is WARNING: Traceback (most recent call last): File "/usr/lib/nagios/plugins/check_reverse_dns.py", line 155, in main() File "/usr/lib/nagios/plugins/check_reverse_dns.py", line 136, in main records = check_records(args.hostname) File "/usr/lib/nagios/plugins/check_reverse_dns.py", line 73, in check_records nameserversans = dns_resolver.resolve(root_domain, 'NS') File "/usr/l [00:25:44] n3/dist-packages/dns/resolver.py", line 1040, in resolve (nameserver, port, tcp, backoff) = resolution.next_nameserver() File "/usr/lib/python3/dist-packages/dns/resolver.py", line 598, in next_nameserver raise NoNameservers(request=self.request, errors=self.errors)dns.resolver.NoNameservers: All nameservers failed to answer the query bellinrattin.it. IN NS: Server 2606:4700:4700::1111 UDP port 53 answered SERVFAIL [00:25:55] ugh, this is the same sort of problem we were having last night, don't think there is anything I can do about it [00:26:05] RECOVERY - cp30 Stunnel HTTP for mw111 on cp30 is OK: HTTP OK: HTTP/1.1 200 OK - 14562 bytes in 0.475 second response time [00:26:15] RECOVERY - cp31 Stunnel HTTP for mw111 on cp31 is OK: HTTP OK: HTTP/1.1 200 OK - 14562 bytes in 0.346 second response time [00:26:21] RECOVERY - cp20 Stunnel HTTP for mw111 on cp20 is OK: HTTP OK: HTTP/1.1 200 OK - 14562 bytes in 0.019 second response time [00:26:28] PROBLEM - cp30 Current Load on cp30 is WARNING: WARNING - load average: 0.64, 1.98, 1.65 [00:26:41] RECOVERY - mw111 MediaWiki Rendering on mw111 is OK: HTTP OK: HTTP/1.1 200 OK - 22335 bytes in 1.133 second response time [00:27:31] PROBLEM - gluster101 Current Load on gluster101 is WARNING: WARNING - load average: 3.05, 3.51, 3.27 [00:27:44] PROBLEM - cp30 Varnish Backends on cp30 is CRITICAL: 2 backends are down. 
mw102 mw111 [00:28:27] RECOVERY - cp30 Current Load on cp30 is OK: OK - load average: 0.99, 1.63, 1.56 [00:29:39] RECOVERY - cp30 Varnish Backends on cp30 is OK: All 12 backends are healthy [00:29:42] RECOVERY - mw101 MediaWiki Rendering on mw101 is OK: HTTP OK: HTTP/1.1 200 OK - 22334 bytes in 1.066 second response time [00:29:48] RECOVERY - cp20 Stunnel HTTP for mw101 on cp20 is OK: HTTP OK: HTTP/1.1 200 OK - 14562 bytes in 0.045 second response time [00:30:05] RECOVERY - cp31 Stunnel HTTP for mw101 on cp31 is OK: HTTP OK: HTTP/1.1 200 OK - 14562 bytes in 0.318 second response time [00:30:51] RECOVERY - cp30 Stunnel HTTP for mw101 on cp30 is OK: HTTP OK: HTTP/1.1 200 OK - 14562 bytes in 0.313 second response time [00:31:10] PROBLEM - mw102 MediaWiki Rendering on mw102 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:31:12] PROBLEM - cp20 Varnish Backends on cp20 is CRITICAL: 2 backends are down. mw101 mw112 [00:31:18] PROBLEM - cp20 Stunnel HTTP for mw102 on cp20 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:31:20] RECOVERY - cp21 Stunnel HTTP for mw101 on cp21 is OK: HTTP OK: HTTP/1.1 200 OK - 14556 bytes in 1.560 second response time [00:31:36] PROBLEM - cp21 Varnish Backends on cp21 is CRITICAL: 2 backends are down. mw101 mw121 [00:31:50] RECOVERY - mw101 Puppet on mw101 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures [00:32:09] RECOVERY - cp21 Stunnel HTTP for mw112 on cp21 is OK: HTTP OK: HTTP/1.1 200 OK - 14562 bytes in 0.018 second response time [00:32:25] RECOVERY - cp30 Stunnel HTTP for mw121 on cp30 is OK: HTTP OK: HTTP/1.1 200 OK - 14548 bytes in 6.967 second response time [00:32:31] RECOVERY - cp20 Stunnel HTTP for mw121 on cp20 is OK: HTTP OK: HTTP/1.1 200 OK - 14548 bytes in 7.081 second response time [00:32:45] RECOVERY - cp31 Stunnel HTTP for mw112 on cp31 is OK: HTTP OK: HTTP/1.1 200 OK - 14562 bytes in 8.920 second response time [00:33:02] !log restart php7.4-fpm and nginx on mw* [00:33:25] RECOVERY - cp21 Stunnel HTTP for mw121 on cp21 is OK: HTTP OK: HTTP/1.1 200 OK - 14556 bytes in 4.245 second response time [00:33:25] RECOVERY - mw121 MediaWiki Rendering on mw121 is OK: HTTP OK: HTTP/1.1 200 OK - 22334 bytes in 4.701 second response time [00:33:26] RECOVERY - gluster101 Current Load on gluster101 is OK: OK - load average: 2.75, 3.22, 3.22 [00:33:28] RECOVERY - cp21 Stunnel HTTP for mw122 on cp21 is OK: HTTP OK: HTTP/1.1 200 OK - 14562 bytes in 0.021 second response time [00:33:31] RECOVERY - cp20 Stunnel HTTP for mw122 on cp20 is OK: HTTP OK: HTTP/1.1 200 OK - 14556 bytes in 0.015 second response time [00:34:07] RECOVERY - cp30 Stunnel HTTP for mw122 on cp30 is OK: HTTP OK: HTTP/1.1 200 OK - 14562 bytes in 3.876 second response time [00:34:22] RECOVERY - mw122 MediaWiki Rendering on mw122 is OK: HTTP OK: HTTP/1.1 200 OK - 22334 bytes in 6.864 second response time [00:34:29] PROBLEM - cp31 Stunnel HTTP for mw111 on cp31 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:34:34] RECOVERY - cp31 Stunnel HTTP for mw122 on cp31 is OK: HTTP OK: HTTP/1.1 200 OK - 14556 bytes in 8.079 second response time [00:34:39] PROBLEM - cp20 Stunnel HTTP for mw111 on cp20 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:34:53] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [00:35:06] PROBLEM - cp31 Varnish Backends on cp31 is CRITICAL: 1 backends are down. 
mw102 [00:35:08] RECOVERY - mw102 MediaWiki Rendering on mw102 is OK: HTTP OK: HTTP/1.1 200 OK - 22334 bytes in 1.065 second response time [00:35:20] ._. [00:35:27] RECOVERY - cp20 Stunnel HTTP for mw102 on cp20 is OK: HTTP OK: HTTP/1.1 200 OK - 14562 bytes in 0.016 second response time [00:35:59] PROBLEM - mw101 MediaWiki Rendering on mw101 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:36:05] PROBLEM - cp20 Stunnel HTTP for mw101 on cp20 is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 328 bytes in 0.011 second response time [00:36:21] PROBLEM - cp31 Stunnel HTTP for mw101 on cp31 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:36:26] RECOVERY - cp31 Stunnel HTTP for mw111 on cp31 is OK: HTTP OK: HTTP/1.1 200 OK - 14556 bytes in 2.575 second response time [00:36:28] PROBLEM - cp21 Stunnel HTTP for mw112 on cp21 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:36:36] RECOVERY - cp20 Stunnel HTTP for mw111 on cp20 is OK: HTTP OK: HTTP/1.1 200 OK - 14562 bytes in 3.161 second response time [00:37:01] I suspect this will resolve itself eventually, but the logs don't show anything helpful, meaning I can't determine why this is happening [00:37:05] PROBLEM - test101 Current Load on test101 is WARNING: WARNING - load average: 1.86, 1.99, 1.99 [00:37:06] PROBLEM - cp31 Stunnel HTTP for mw112 on cp31 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:37:07] PROBLEM - cp30 Stunnel HTTP for mw101 on cp30 is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 328 bytes in 1.518 second response time [00:37:28] RECOVERY - cp30 Stunnel HTTP for mw112 on cp30 is OK: HTTP OK: HTTP/1.1 200 OK - 14556 bytes in 9.652 second response time [00:37:30] RECOVERY - cp20 Stunnel HTTP for mw112 on cp20 is OK: HTTP OK: HTTP/1.1 200 OK - 14562 bytes in 9.848 second response time [00:37:32] PROBLEM - db101 Current Load on db101 is CRITICAL: CRITICAL - load average: 9.36, 7.41, 6.61 [00:37:47] PROBLEM - cp21 Stunnel HTTP for mw101 on cp21 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:37:48] PROBLEM - cp21 Stunnel HTTP for mw122 on cp21 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:37:58] PROBLEM - cp20 Stunnel HTTP for mw122 on cp20 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:38:02] RECOVERY - mw101 MediaWiki Rendering on mw101 is OK: HTTP OK: HTTP/1.1 200 OK - 22335 bytes in 9.101 second response time [00:38:09] RECOVERY - cp20 Stunnel HTTP for mw101 on cp20 is OK: HTTP OK: HTTP/1.1 200 OK - 14548 bytes in 5.103 second response time [00:38:19] PROBLEM - cp30 Stunnel HTTP for mw122 on cp30 is CRITICAL: HTTP CRITICAL - No data received from host [00:38:21] RECOVERY - cp31 Stunnel HTTP for mw101 on cp31 is OK: HTTP OK: HTTP/1.1 200 OK - 14556 bytes in 0.987 second response time [00:38:30] RECOVERY - cp21 Stunnel HTTP for mw112 on cp21 is OK: HTTP OK: HTTP/1.1 200 OK - 14556 bytes in 7.686 second response time [00:38:37] !sre we are having the same issue that we had last night [00:38:46] PROBLEM - mw122 MediaWiki Rendering on mw122 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:39:01] PROBLEM - cp31 Stunnel HTTP for mw122 on cp31 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:39:05] RECOVERY - cp30 Stunnel HTTP for mw101 on cp30 is OK: HTTP OK: HTTP/1.1 200 OK - 14562 bytes in 2.082 second response time [00:39:05] PROBLEM - test101 Current Load on test101 is CRITICAL: CRITICAL - load average: 2.15, 2.03, 2.01 [00:39:06] RECOVERY - cp31 Stunnel HTTP for mw112 on cp31 is OK: HTTP OK: HTTP/1.1 200 OK - 
14562 bytes in 5.399 second response time [00:39:10] RECOVERY - mw112 MediaWiki Rendering on mw112 is OK: HTTP OK: HTTP/1.1 200 OK - 22334 bytes in 3.672 second response time [00:39:32] RECOVERY - db101 Current Load on db101 is OK: OK - load average: 4.99, 6.46, 6.36 [00:39:45] RECOVERY - cp21 Stunnel HTTP for mw101 on cp21 is OK: HTTP OK: HTTP/1.1 200 OK - 14562 bytes in 0.376 second response time [00:40:25] PROBLEM - cp21 Stunnel HTTP for mw102 on cp21 is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 328 bytes in 0.014 second response time [00:40:47] PROBLEM - cp31 Stunnel HTTP for mw102 on cp31 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:40:50] PROBLEM - gluster121 Current Load on gluster121 is WARNING: WARNING - load average: 3.66, 3.21, 2.90 [00:40:51] PROBLEM - mw121 MediaWiki Rendering on mw121 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:41:01] PROBLEM - cp30 Stunnel HTTP for mw121 on cp30 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:41:06] RECOVERY - cp31 Varnish Backends on cp31 is OK: All 12 backends are healthy [00:41:08] PROBLEM - cp21 Stunnel HTTP for mw121 on cp21 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:41:08] PROBLEM - cp20 Stunnel HTTP for mw121 on cp20 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:41:16] PROBLEM - gluster101 Current Load on gluster101 is CRITICAL: CRITICAL - load average: 4.13, 3.39, 3.21 [00:41:20] PROBLEM - mw102 MediaWiki Rendering on mw102 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:41:25] PROBLEM - cp30 Stunnel HTTP for mw102 on cp30 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:41:46] PROBLEM - cp20 Stunnel HTTP for mw102 on cp20 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:41:49] PROBLEM - cp30 Stunnel HTTP for mw111 on cp30 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:41:50] PROBLEM - gluster111 Current Load on gluster111 is CRITICAL: CRITICAL - load average: 4.03, 3.46, 3.07 [00:42:13] PROBLEM - mw111 MediaWiki Rendering on mw111 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:42:17] PROBLEM - cp21 Stunnel HTTP for mw111 on cp21 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:42:43] PROBLEM - cp31 Stunnel HTTP for mw111 on cp31 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:42:50] RECOVERY - gluster121 Current Load on gluster121 is OK: OK - load average: 3.17, 3.06, 2.87 [00:42:51] PROBLEM - cp20 Stunnel HTTP for mw111 on cp20 is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 328 bytes in 0.010 second response time [00:43:14] RECOVERY - gluster101 Current Load on gluster101 is OK: OK - load average: 3.12, 3.25, 3.18 [00:43:18] PROBLEM - cp30 Stunnel HTTP for mw101 on cp30 is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 328 bytes in 0.561 second response time [00:43:19] RECOVERY - mw102 MediaWiki Rendering on mw102 is OK: HTTP OK: HTTP/1.1 200 OK - 22334 bytes in 5.695 second response time [00:43:25] RECOVERY - cp30 Stunnel HTTP for mw102 on cp30 is OK: HTTP OK: HTTP/1.1 200 OK - 14562 bytes in 0.334 second response time [00:43:47] RECOVERY - cp20 Stunnel HTTP for mw102 on cp20 is OK: HTTP OK: HTTP/1.1 200 OK - 14562 bytes in 1.137 second response time [00:43:48] RECOVERY - gluster111 Current Load on gluster111 is OK: OK - load average: 2.61, 3.05, 2.96 [00:43:58] PROBLEM - cp21 Stunnel HTTP for mw101 on cp21 is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 328 bytes in 0.010 second response time [00:44:10] RECOVERY - cp31 Stunnel HTTP for mw121 on 
cp31 is OK: HTTP OK: HTTP/1.1 200 OK - 14562 bytes in 5.041 second response time [00:44:19] PROBLEM - mw101 MediaWiki Rendering on mw101 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:44:26] RECOVERY - cp21 Stunnel HTTP for mw102 on cp21 is OK: HTTP OK: HTTP/1.1 200 OK - 14556 bytes in 0.019 second response time [00:44:36] PROBLEM - cp20 Stunnel HTTP for mw101 on cp20 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:44:38] PROBLEM - cp31 Stunnel HTTP for mw101 on cp31 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:44:46] PROBLEM - cp21 Stunnel HTTP for mw112 on cp21 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:44:48] RECOVERY - cp31 Stunnel HTTP for mw102 on cp31 is OK: HTTP OK: HTTP/1.1 200 OK - 14562 bytes in 0.327 second response time [00:45:22] PROBLEM - cp31 Stunnel HTTP for mw112 on cp31 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:45:38] PROBLEM - mw112 MediaWiki Rendering on mw112 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:45:51] PROBLEM - cp30 Stunnel HTTP for mw112 on cp30 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:45:53] PROBLEM - cp30 Varnish Backends on cp30 is CRITICAL: 1 backends are down. mw102 [00:46:02] PROBLEM - cp20 Stunnel HTTP for mw112 on cp20 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:47:48] RECOVERY - cp30 Varnish Backends on cp30 is OK: All 12 backends are healthy [00:48:26] PROBLEM - cp31 Stunnel HTTP for mw121 on cp31 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:48:48] RECOVERY - cp20 Stunnel HTTP for mw101 on cp20 is OK: HTTP OK: HTTP/1.1 200 OK - 14562 bytes in 3.445 second response time [00:48:48] PROBLEM - cp21 Stunnel HTTP for mw102 on cp21 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:48:49] RECOVERY - cp31 Stunnel HTTP for mw101 on cp31 is OK: HTTP OK: HTTP/1.1 200 OK - 14556 bytes in 2.955 second response time [00:48:59] RECOVERY - wiki.simorgh.me - reverse DNS on sslhost is OK: SSL OK - wiki.simorgh.me reverse DNS resolves to cp21.miraheze.org - CNAME OK [00:49:17] RECOVERY - mw122 MediaWiki Rendering on mw122 is OK: HTTP OK: HTTP/1.1 200 OK - 22334 bytes in 9.261 second response time [00:49:31] RECOVERY - cp31 Stunnel HTTP for mw122 on cp31 is OK: HTTP OK: HTTP/1.1 200 OK - 14556 bytes in 6.479 second response time [00:49:34] PROBLEM - mw102 MediaWiki Rendering on mw102 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:50:02] RECOVERY - cp21 Stunnel HTTP for mw122 on cp21 is OK: HTTP OK: HTTP/1.1 200 OK - 14548 bytes in 6.702 second response time [00:50:17] RECOVERY - cp21 Stunnel HTTP for mw101 on cp21 is OK: HTTP OK: HTTP/1.1 200 OK - 14548 bytes in 8.609 second response time [00:50:30] RECOVERY - mw101 MediaWiki Rendering on mw101 is OK: HTTP OK: HTTP/1.1 200 OK - 22334 bytes in 7.404 second response time [00:50:32] RECOVERY - cp20 Stunnel HTTP for mw122 on cp20 is OK: HTTP OK: HTTP/1.1 200 OK - 14556 bytes in 0.971 second response time [00:50:40] RECOVERY - cp30 Stunnel HTTP for mw122 on cp30 is OK: HTTP OK: HTTP/1.1 200 OK - 14556 bytes in 0.423 second response time [00:50:49] PROBLEM - cp31 Current Load on cp31 is CRITICAL: CRITICAL - load average: 2.22, 2.36, 1.49 [00:51:12] RECOVERY - cp20 Varnish Backends on cp20 is OK: All 12 backends are healthy [00:51:12] PROBLEM - cp31 Stunnel HTTP for mw102 on cp31 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:51:14] PROBLEM - cp30 Stunnel HTTP for mw102 on cp30 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:51:18] PROBLEM - 
cp20 Stunnel HTTP for mw102 on cp20 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:51:36] RECOVERY - cp21 Varnish Backends on cp21 is OK: All 12 backends are healthy [00:51:48] PROBLEM - gluster111 Current Load on gluster111 is CRITICAL: CRITICAL - load average: 5.94, 3.78, 3.18 [00:52:33] RECOVERY - cp31 Stunnel HTTP for mw121 on cp31 is OK: HTTP OK: HTTP/1.1 200 OK - 14562 bytes in 3.665 second response time [00:52:47] PROBLEM - cp31 Current Load on cp31 is WARNING: WARNING - load average: 0.58, 1.72, 1.36 [00:53:12] RECOVERY - mw121 MediaWiki Rendering on mw121 is OK: HTTP OK: HTTP/1.1 200 OK - 22334 bytes in 0.664 second response time [00:53:19] RECOVERY - cp30 Stunnel HTTP for mw121 on cp30 is OK: HTTP OK: HTTP/1.1 200 OK - 14562 bytes in 0.313 second response time [00:53:32] PROBLEM - cp30 Varnish Backends on cp30 is CRITICAL: 1 backends are down. mw121 [00:53:43] RECOVERY - cp21 Stunnel HTTP for mw121 on cp21 is OK: HTTP OK: HTTP/1.1 200 OK - 14562 bytes in 1.169 second response time [00:53:47] RECOVERY - cp20 Stunnel HTTP for mw121 on cp20 is OK: HTTP OK: HTTP/1.1 200 OK - 14562 bytes in 1.342 second response time [00:54:38] PROBLEM - cp21 Stunnel HTTP for mw101 on cp21 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:54:46] RECOVERY - cp31 Current Load on cp31 is OK: OK - load average: 0.51, 1.34, 1.26 [00:54:50] PROBLEM - gluster121 Current Load on gluster121 is WARNING: WARNING - load average: 3.56, 3.31, 2.85 [00:54:51] PROBLEM - mw101 MediaWiki Rendering on mw101 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:55:01] PROBLEM - gluster101 Current Load on gluster101 is CRITICAL: CRITICAL - load average: 4.75, 3.92, 3.36 [00:55:12] PROBLEM - cp20 Stunnel HTTP for mw101 on cp20 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:55:12] PROBLEM - cp20 Varnish Backends on cp20 is CRITICAL: 1 backends are down. mw112 [00:55:13] PROBLEM - cp31 Stunnel HTTP for mw101 on cp31 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:55:27] RECOVERY - cp30 Varnish Backends on cp30 is OK: All 12 backends are healthy [00:55:44] PROBLEM - dnd.bellinrattin.it - reverse DNS on sslhost is CRITICAL: rDNS CRITICAL - dnd.bellinrattin.it All nameservers failed to answer the query. [00:56:10] PROBLEM - db101 Current Load on db101 is WARNING: WARNING - load average: 7.49, 6.90, 6.44 [00:56:25] PROBLEM - cp21 Stunnel HTTP for mw122 on cp21 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:56:50] RECOVERY - gluster121 Current Load on gluster121 is OK: OK - load average: 3.39, 3.36, 2.92 [00:56:59] PROBLEM - gluster101 Current Load on gluster101 is WARNING: WARNING - load average: 3.75, 3.98, 3.45 [00:57:03] PROBLEM - cp20 Stunnel HTTP for mw122 on cp20 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:57:04] PROBLEM - cp30 Stunnel HTTP for mw122 on cp30 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:57:10] RECOVERY - cp31 Stunnel HTTP for mw111 on cp31 is OK: HTTP OK: HTTP/1.1 200 OK - 14562 bytes in 7.134 second response time [00:57:11] RECOVERY - cp20 Stunnel HTTP for mw111 on cp20 is OK: HTTP OK: HTTP/1.1 200 OK - 14548 bytes in 5.693 second response time [00:57:28] RECOVERY - cp31 Stunnel HTTP for mw102 on cp31 is OK: HTTP OK: HTTP/1.1 200 OK - 14556 bytes in 9.735 second response time [00:57:36] PROBLEM - cp21 Varnish Backends on cp21 is CRITICAL: 1 backends are down. 
mw111 [00:57:42] RECOVERY - cp30 Stunnel HTTP for mw102 on cp30 is OK: HTTP OK: HTTP/1.1 200 OK - 14556 bytes in 8.821 second response time [00:57:44] RECOVERY - cp20 Stunnel HTTP for mw102 on cp20 is OK: HTTP OK: HTTP/1.1 200 OK - 14562 bytes in 6.833 second response time [00:57:45] PROBLEM - mw122 MediaWiki Rendering on mw122 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:57:47] RECOVERY - mw102 MediaWiki Rendering on mw102 is OK: HTTP OK: HTTP/1.1 200 OK - 22334 bytes in 7.410 second response time [00:57:48] PROBLEM - gluster111 Current Load on gluster111 is WARNING: WARNING - load average: 3.47, 3.92, 3.47 [00:58:03] PROBLEM - cp31 Stunnel HTTP for mw122 on cp31 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:58:07] RECOVERY - db101 Current Load on db101 is OK: OK - load average: 5.83, 6.64, 6.41 [00:58:23] RECOVERY - cp30 Stunnel HTTP for mw111 on cp30 is OK: HTTP OK: HTTP/1.1 200 OK - 14548 bytes in 0.470 second response time [00:58:26] RECOVERY - mw111 MediaWiki Rendering on mw111 is OK: HTTP OK: HTTP/1.1 200 OK - 22334 bytes in 0.642 second response time [00:58:45] RECOVERY - cp21 Stunnel HTTP for mw111 on cp21 is OK: HTTP OK: HTTP/1.1 200 OK - 14562 bytes in 0.362 second response time [00:59:03] PROBLEM - cp31 Stunnel HTTP for mw121 on cp31 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:59:10] RECOVERY - cp21 Stunnel HTTP for mw102 on cp21 is OK: HTTP OK: HTTP/1.1 200 OK - 14548 bytes in 5.284 second response time [00:59:12] RECOVERY - cp20 Varnish Backends on cp20 is OK: All 12 backends are healthy [00:59:37] PROBLEM - mw121 MediaWiki Rendering on mw121 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:59:47] PROBLEM - cp30 Stunnel HTTP for mw121 on cp30 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:59:48] PROBLEM - gluster111 Current Load on gluster111 is CRITICAL: CRITICAL - load average: 4.86, 4.16, 3.60 [01:00:11] PROBLEM - cp20 Stunnel HTTP for mw121 on cp20 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:00:13] PROBLEM - cp21 Stunnel HTTP for mw121 on cp21 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:00:50] PROBLEM - gluster121 Current Load on gluster121 is WARNING: WARNING - load average: 3.45, 3.84, 3.24 [01:01:36] RECOVERY - cp21 Varnish Backends on cp21 is OK: All 12 backends are healthy [01:01:48] PROBLEM - gluster111 Current Load on gluster111 is WARNING: WARNING - load average: 3.11, 3.74, 3.52 [01:02:50] RECOVERY - gluster121 Current Load on gluster121 is OK: OK - load average: 2.26, 3.36, 3.14 [01:02:54] RECOVERY - gluster101 Current Load on gluster101 is OK: OK - load average: 2.48, 3.37, 3.39 [01:03:39] PROBLEM - cp31 Stunnel HTTP for mw102 on cp31 is CRITICAL: HTTP CRITICAL - No data received from host [01:03:48] RECOVERY - gluster111 Current Load on gluster111 is OK: OK - load average: 2.37, 3.24, 3.36 [01:04:04] PROBLEM - mw102 MediaWiki Rendering on mw102 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:04:17] PROBLEM - cp30 Stunnel HTTP for mw102 on cp30 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:04:18] PROBLEM - cp20 Stunnel HTTP for mw102 on cp20 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:04:46] PROBLEM - mw111 MediaWiki Rendering on mw111 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:04:47] PROBLEM - cp30 Stunnel HTTP for mw111 on cp30 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:05:33] PROBLEM - cp21 Stunnel HTTP for mw102 on cp21 is CRITICAL: CRITICAL - Socket timeout after 10 seconds 
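
The "Stunnel HTTP" and "MediaWiki Rendering" checks that dominate this log report either an HTTP 200 with a byte count and response time or "Socket timeout after 10 seconds". A rough sketch of reproducing that kind of probe by hand follows; it is not the actual plugin invocation, and the URL is a placeholder since the real probe target and port are not shown in the log.

    # Sketch: measure what the flapping HTTP checks measure - status code, body
    # size and elapsed time - with the same 10 second timeout the alerts mention.
    # The URL is a stand-in; the real checks hit the mw backends via stunnel.
    import time
    import urllib.request

    URL = "http://mw101.example/"  # placeholder target

    start = time.monotonic()
    try:
        with urllib.request.urlopen(URL, timeout=10) as resp:
            body = resp.read()
            print(f"HTTP OK: {resp.status} - {len(body)} bytes "
                  f"in {time.monotonic() - start:.3f} second response time")
    except Exception as exc:  # timeouts surface as socket.timeout / URLError here
        print(f"CRITICAL - {exc} after {time.monotonic() - start:.1f} seconds")
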
[01:06:07] RECOVERY - cp31 Stunnel HTTP for mw112 on cp31 is OK: HTTP OK: HTTP/1.1 200 OK - 14562 bytes in 2.238 second response time [01:06:35] RECOVERY - cp30 Stunnel HTTP for mw112 on cp30 is OK: HTTP OK: HTTP/1.1 200 OK - 14556 bytes in 0.682 second response time [01:06:39] RECOVERY - mw112 MediaWiki Rendering on mw112 is OK: HTTP OK: HTTP/1.1 200 OK - 22335 bytes in 1.256 second response time [01:07:11] RECOVERY - cp21 Stunnel HTTP for mw112 on cp21 is OK: HTTP OK: HTTP/1.1 200 OK - 14562 bytes in 0.017 second response time [01:07:18] RECOVERY - cp20 Stunnel HTTP for mw112 on cp20 is OK: HTTP OK: HTTP/1.1 200 OK - 14556 bytes in 0.249 second response time [01:08:36] RECOVERY - cp31 Stunnel HTTP for mw122 on cp31 is OK: HTTP OK: HTTP/1.1 200 OK - 14556 bytes in 9.969 second response time [01:09:29] PROBLEM - cp21 Stunnel HTTP for mw111 on cp21 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:09:30] PROBLEM - cp31 Stunnel HTTP for mw111 on cp31 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:09:31] RECOVERY - cp30 Stunnel HTTP for mw122 on cp30 is OK: HTTP OK: HTTP/1.1 200 OK - 14556 bytes in 4.210 second response time [01:09:35] PROBLEM - cp20 Stunnel HTTP for mw111 on cp20 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:09:47] RECOVERY - cp20 Stunnel HTTP for mw122 on cp20 is OK: HTTP OK: HTTP/1.1 200 OK - 14556 bytes in 0.093 second response time [01:10:08] PROBLEM - cp31 Stunnel HTTP for mw112 on cp31 is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 328 bytes in 0.239 second response time [01:10:13] RECOVERY - mw122 MediaWiki Rendering on mw122 is OK: HTTP OK: HTTP/1.1 200 OK - 22334 bytes in 2.596 second response time [01:10:46] PROBLEM - gluster101 Current Load on gluster101 is CRITICAL: CRITICAL - load average: 4.28, 3.59, 3.42 [01:10:52] [discord] getting 15-30 sec response delays on non-cached content [01:10:52] PROBLEM - mw112 MediaWiki Rendering on mw112 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:10:56] PROBLEM - cp30 Stunnel HTTP for mw112 on cp30 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:10:59] RECOVERY - cp21 Stunnel HTTP for mw122 on cp21 is OK: HTTP OK: HTTP/1.1 200 OK - 14548 bytes in 7.331 second response time [01:11:31] PROBLEM - cp21 Stunnel HTTP for mw112 on cp21 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:11:41] [discord] it is quite unfortunate if nothing is on the logs and apache-status also doesn't show anything [01:11:44] PROBLEM - cp20 Stunnel HTTP for mw112 on cp20 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:12:13] [discord] it is quite unfortunate if nothing is on the logs and apache-status also doesn't show anything worthwhile (edited) [01:12:44] PROBLEM - gluster101 Current Load on gluster101 is WARNING: WARNING - load average: 3.87, 3.74, 3.50 [01:12:52] MacFan4000: are we still having issues again? [01:12:52] [discord] getting 15-30 sec response delays on non-cached content, cached is instant (edited) [01:13:33] I even tried rebooting the mw servers yesterday and the issues just kept coming back. 
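
The alert that opens this log, "PHP-FPM Worker Usage High", and the slow non-cached responses reported above both point at the FPM worker pools that were later restarted. If the pools expose the standard status page (pm.status_path), saturation can be read off directly; the endpoint below is an assumption, not a URL taken from this log.

    # Sketch: read a php-fpm pool's built-in status page (only works if
    # pm.status_path is enabled and routed) and compute worker usage, the metric
    # behind the "PHP-FPM Worker Usage High" alert. The URL is hypothetical.
    import json
    import urllib.request

    STATUS_URL = "http://mw101.example/fpm-status?json"  # assumed endpoint

    with urllib.request.urlopen(STATUS_URL, timeout=10) as resp:
        status = json.load(resp)

    active = status["active processes"]   # field names as emitted by php-fpm
    total = status["total processes"]
    print(f"worker usage: {active}/{total} ({100 * active / total:.0f}%), "
          f"listen queue: {status['listen queue']}")
    if status["max children reached"] > 0:
        print("pool has hit pm.max_children since its last restart")
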
[01:13:46] RECOVERY - cp20 Stunnel HTTP for mw112 on cp20 is OK: HTTP OK: HTTP/1.1 200 OK - 14556 bytes in 3.051 second response time [01:13:52] PROBLEM - cp30 Stunnel HTTP for mw122 on cp30 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:14:08] RECOVERY - cp31 Stunnel HTTP for mw112 on cp31 is OK: HTTP OK: HTTP/1.1 200 OK - 14562 bytes in 0.312 second response time [01:14:15] PROBLEM - cp20 Stunnel HTTP for mw122 on cp20 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:14:33] This is getting ridiculous. [01:14:36] PROBLEM - mw122 MediaWiki Rendering on mw122 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:14:42] PROBLEM - gluster101 Current Load on gluster101 is CRITICAL: CRITICAL - load average: 6.02, 4.37, 3.74 [01:14:54] How and why is this happening...? [01:14:58] RECOVERY - cp30 Stunnel HTTP for mw112 on cp30 is OK: HTTP OK: HTTP/1.1 200 OK - 14548 bytes in 1.595 second response time [01:15:00] RECOVERY - mw112 MediaWiki Rendering on mw112 is OK: HTTP OK: HTTP/1.1 200 OK - 22334 bytes in 2.331 second response time [01:15:02] PROBLEM - cp31 Stunnel HTTP for mw122 on cp31 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:15:09] PROBLEM - cp21 Stunnel HTTP for mw122 on cp21 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:15:13] RECOVERY - cp21 Stunnel HTTP for mw121 on cp21 is OK: HTTP OK: HTTP/1.1 200 OK - 14556 bytes in 7.856 second response time [01:15:30] RECOVERY - cp21 Stunnel HTTP for mw112 on cp21 is OK: HTTP OK: HTTP/1.1 200 OK - 14562 bytes in 0.016 second response time [01:15:32] RECOVERY - cp31 Stunnel HTTP for mw121 on cp31 is OK: HTTP OK: HTTP/1.1 200 OK - 14562 bytes in 5.094 second response time [01:15:48] PROBLEM - gluster111 Current Load on gluster111 is WARNING: WARNING - load average: 3.81, 3.51, 3.33 [01:16:03] RECOVERY - cp30 Stunnel HTTP for mw121 on cp30 is OK: HTTP OK: HTTP/1.1 200 OK - 14562 bytes in 0.727 second response time [01:16:08] RECOVERY - mw121 MediaWiki Rendering on mw121 is OK: HTTP OK: HTTP/1.1 200 OK - 22334 bytes in 0.748 second response time [01:16:45] RECOVERY - cp20 Stunnel HTTP for mw121 on cp20 is OK: HTTP OK: HTTP/1.1 200 OK - 14562 bytes in 0.014 second response time [01:17:01] PROBLEM - cp30 Varnish Backends on cp30 is CRITICAL: 1 backends are down. mw112 [01:17:12] PROBLEM - cp20 Varnish Backends on cp20 is CRITICAL: 1 backends are down. mw102 [01:17:45] MacFan4000: WHY is this happening again??? [01:18:00] Same time as yesterday also. 
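
The "Varnish Backends" checks above flip between "N backends are down" and "All 12 backends are healthy". On a cp host the same view should be available from varnishadm; this is a sketch assuming shell access and permission to read the varnish admin secret, and the exact column layout of backend.list varies between Varnish versions.

    # Sketch: ask the local varnish instance which backends it currently
    # considers sick - the data behind the "cpNN Varnish Backends" alerts.
    # Needs privileges for varnishadm; output format differs across versions.
    import subprocess

    out = subprocess.run(["varnishadm", "backend.list"],
                         capture_output=True, text=True, check=True).stdout
    print(out)

    sick = [line.split()[0] for line in out.splitlines() if "sick" in line.lower()]
    print("backends marked sick:", ", ".join(sick) or "none")
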
[01:18:41] RECOVERY - cp31 Stunnel HTTP for mw101 on cp31 is OK: HTTP OK: HTTP/1.1 200 OK - 14562 bytes in 8.844 second response time [01:18:42] RECOVERY - cp30 Stunnel HTTP for mw101 on cp30 is OK: HTTP OK: HTTP/1.1 200 OK - 14556 bytes in 8.707 second response time [01:19:01] RECOVERY - cp30 Varnish Backends on cp30 is OK: All 12 backends are healthy [01:19:10] [discord] it was getting kinda slow an hour ago, but now is [01:19:11] [discord] https://cdn.discordapp.com/attachments/808001911868489748/962159711002697769/unknown.png [01:19:18] RECOVERY - cp20 Stunnel HTTP for mw102 on cp20 is OK: HTTP OK: HTTP/1.1 200 OK - 14556 bytes in 3.792 second response time [01:19:20] RECOVERY - mw101 MediaWiki Rendering on mw101 is OK: HTTP OK: HTTP/1.1 200 OK - 22335 bytes in 2.208 second response time [01:19:20] RECOVERY - cp30 Stunnel HTTP for mw102 on cp30 is OK: HTTP OK: HTTP/1.1 200 OK - 14548 bytes in 4.366 second response time [01:19:32] RECOVERY - cp21 Stunnel HTTP for mw101 on cp21 is OK: HTTP OK: HTTP/1.1 200 OK - 14556 bytes in 2.323 second response time [01:19:36] PROBLEM - cp21 Varnish Backends on cp21 is CRITICAL: 1 backends are down. mw101 [01:19:48] RECOVERY - gluster111 Current Load on gluster111 is OK: OK - load average: 3.18, 3.34, 3.30 [01:19:48] PROBLEM - cp31 Stunnel HTTP for mw121 on cp31 is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 328 bytes in 0.235 second response time [01:19:50] RECOVERY - cp21 Stunnel HTTP for mw102 on cp21 is OK: HTTP OK: HTTP/1.1 200 OK - 14562 bytes in 6.121 second response time [01:20:06] RECOVERY - cp31 Stunnel HTTP for mw102 on cp31 is OK: HTTP OK: HTTP/1.1 200 OK - 14562 bytes in 4.194 second response time [01:20:22] PROBLEM - cp30 Stunnel HTTP for mw121 on cp30 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:20:24] RECOVERY - cp20 Stunnel HTTP for mw101 on cp20 is OK: HTTP OK: HTTP/1.1 200 OK - 14556 bytes in 2.193 second response time [01:20:29] PROBLEM - mw121 MediaWiki Rendering on mw121 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:20:34] RECOVERY - mw102 MediaWiki Rendering on mw102 is OK: HTTP OK: HTTP/1.1 200 OK - 22334 bytes in 5.644 second response time [01:20:44] [discord] hmm, reloaded and is totally inconsistent [01:20:45] [discord] https://cdn.discordapp.com/attachments/808001911868489748/962160106026434570/unknown.png [01:20:49] PROBLEM - cp20 Stunnel HTTP for mw121 on cp20 is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 328 bytes in 0.010 second response time [01:20:57] [discord] but still with 25 sec spikes [01:21:11] !log reboot mw* [01:21:12] RECOVERY - cp20 Varnish Backends on cp20 is OK: All 12 backends are healthy [01:21:36] RECOVERY - cp21 Varnish Backends on cp21 is OK: All 12 backends are healthy [01:21:43] PROBLEM - cp21 Stunnel HTTP for mw112 on cp21 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:21:49] PROBLEM - cp21 Stunnel HTTP for mw121 on cp21 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:22:10] PROBLEM - cp20 Stunnel HTTP for mw112 on cp20 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:23:40] RECOVERY - cp21 Stunnel HTTP for mw112 on cp21 is OK: HTTP OK: HTTP/1.1 200 OK - 14562 bytes in 2.679 second response time [01:23:47] PROBLEM - cp20 Stunnel HTTP for mw102 on cp20 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:23:49] PROBLEM - cp30 Stunnel HTTP for mw102 on cp30 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:24:09] RECOVERY - cp20 Stunnel HTTP for mw112 on cp20 is OK: HTTP OK: HTTP/1.1 200 OK - 14562 
bytes in 0.017 second response time [01:24:12] PROBLEM - cp21 Stunnel HTTP for mw102 on cp21 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:24:24] And logbot quit... [01:24:28] PROBLEM - cp31 Stunnel HTTP for mw102 on cp31 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:24:33] PROBLEM - gluster101 Current Load on gluster101 is WARNING: WARNING - load average: 3.73, 4.00, 3.87 [01:24:48] Ugh.... [01:24:51] PROBLEM - mw102 MediaWiki Rendering on mw102 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:24:58] There's nothing more I can do right now... [01:25:05] PROBLEM - test101 Current Load on test101 is WARNING: WARNING - load average: 0.83, 1.53, 1.87 [01:25:27] [discord] yeah, the times are just a little bit better... retrieving a simple css file... [01:25:43] RECOVERY - dnd.bellinrattin.it - reverse DNS on sslhost is OK: SSL OK - dnd.bellinrattin.it reverse DNS resolves to cp21.miraheze.org - CNAME OK [01:25:54] HOW??? What is pointing to test101??? test101 is currently down for MediaWiki so how is it having high loads?? [01:26:01] !log reboot mw* [01:26:31] PROBLEM - gluster101 Current Load on gluster101 is CRITICAL: CRITICAL - load average: 4.62, 4.19, 3.96 [01:26:53] I have to go now again also.... [01:27:14] [discord] i am sorry for having to disturb you and making you lose sanity Cosmic [01:27:48] PROBLEM - gluster111 Current Load on gluster111 is CRITICAL: CRITICAL - load average: 4.47, 3.76, 3.41 [01:28:08] PROBLEM - db101 Current Load on db101 is WARNING: WARNING - load average: 7.42, 6.67, 6.22 [01:28:29] PROBLEM - gluster101 Current Load on gluster101 is WARNING: WARNING - load average: 2.64, 3.61, 3.77 [01:29:05] RECOVERY - test101 Current Load on test101 is OK: OK - load average: 1.08, 1.24, 1.67 [01:29:22] @Kozd No problem at all, I don't understand what is happening here... I tried rebooting the servers, restarting fpm, everything I can think of, and can not figure this out. I'm going to recommend to the rest of SRE that if these issues continue again tomorrow, we focus our entire efforts for a resolution as this is getting ridiculous now. [01:29:48] RECOVERY - gluster111 Current Load on gluster111 is OK: OK - load average: 2.48, 3.32, 3.29 [01:30:04] RECOVERY - db101 Current Load on db101 is OK: OK - load average: 4.94, 6.18, 6.10 [01:30:08] Very sorry for the issues. [01:30:08] [discord] yeah i am not sure if installing a real profiler is worth it at this time, even if it would show long times waiting on db or something [01:30:22] PROBLEM - cp31 Stunnel HTTP for mw101 on cp31 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:30:47] PROBLEM - mw101 MediaWiki Rendering on mw101 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:31:23] [discord] no problem, at least the cached responses are working great... hahaha. 
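
The check_reverse_dns.py tracebacks earlier (wiki.simorgh.me, dnd.bellinrattin.it) and the recovery just above show the sslhost checks failing whenever the upstream resolver answers SERVFAIL. The failing lookup can be reproduced with the same dnspython calls that appear in the traceback; this is a sketch using the resolver address named there, not a replacement for the plugin.

    # Sketch: repeat the query check_reverse_dns.py was failing on, using the
    # dnspython API seen in the traceback and the resolver it named
    # (Cloudflare's 2606:4700:4700::1111). SERVFAIL everywhere raises NoNameservers.
    import dns.resolver

    resolver = dns.resolver.Resolver(configure=False)
    resolver.nameservers = ["2606:4700:4700::1111"]

    try:
        answer = resolver.resolve("wiki.simorgh.me", "CNAME")
        print("CNAME:", str(answer[0]))
    except dns.resolver.NoNameservers as exc:
        # The state the WARNING/CRITICAL rDNS alerts in this log were reporting.
        print("all nameservers failed (SERVFAIL upstream):", exc)
    except dns.resolver.NXDOMAIN:
        print("no such name")
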
[01:32:02] PROBLEM - cp21 Stunnel HTTP for mw101 on cp21 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:32:05] PROBLEM - cp20 Stunnel HTTP for mw101 on cp20 is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 328 bytes in 0.010 second response time [01:32:09] RECOVERY - cp30 Stunnel HTTP for mw102 on cp30 is OK: HTTP OK: HTTP/1.1 200 OK - 14562 bytes in 0.321 second response time [01:32:14] PROBLEM - cp30 Stunnel HTTP for mw101 on cp30 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:32:15] RECOVERY - cp20 Stunnel HTTP for mw102 on cp20 is OK: HTTP OK: HTTP/1.1 200 OK - 14556 bytes in 0.124 second response time [01:32:23] RECOVERY - cp21 Stunnel HTTP for mw102 on cp21 is OK: HTTP OK: HTTP/1.1 200 OK - 14562 bytes in 0.017 second response time [01:32:26] RECOVERY - gluster101 Current Load on gluster101 is OK: OK - load average: 2.25, 2.81, 3.40 [01:32:47] RECOVERY - cp31 Stunnel HTTP for mw102 on cp31 is OK: HTTP OK: HTTP/1.1 200 OK - 14556 bytes in 4.574 second response time [01:32:59] RECOVERY - mw102 MediaWiki Rendering on mw102 is OK: HTTP OK: HTTP/1.1 200 OK - 22334 bytes in 2.951 second response time [01:33:36] PROBLEM - cp20 Stunnel HTTP for mw112 on cp20 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:33:41] PROBLEM - cp31 Stunnel HTTP for mw112 on cp31 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:33:49] PROBLEM - cp30 Stunnel HTTP for mw112 on cp30 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:33:57] PROBLEM - mw112 MediaWiki Rendering on mw112 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:34:10] PROBLEM - cp21 Stunnel HTTP for mw112 on cp21 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:37:36] PROBLEM - cp21 Varnish Backends on cp21 is CRITICAL: 1 backends are down. mw111 [01:38:29] Yeah no idea why this is happening [01:39:07] PROBLEM - cp31 Stunnel HTTP for mw102 on cp31 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:39:14] PROBLEM - mw102 MediaWiki Rendering on mw102 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:39:44] PROBLEM - cp30 Stunnel HTTP for mw102 on cp30 is CRITICAL: HTTP CRITICAL - No data received from host [01:40:35] RECOVERY - cp30 Stunnel HTTP for mw122 on cp30 is OK: HTTP OK: HTTP/1.1 200 OK - 14556 bytes in 3.991 second response time [01:40:50] PROBLEM - cp20 Stunnel HTTP for mw102 on cp20 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:40:51] PROBLEM - cp21 Stunnel HTTP for mw102 on cp21 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:41:18] RECOVERY - mw122 MediaWiki Rendering on mw122 is OK: HTTP OK: HTTP/1.1 200 OK - 22334 bytes in 0.578 second response time [01:41:36] RECOVERY - cp21 Varnish Backends on cp21 is OK: All 12 backends are healthy [01:41:57] RECOVERY - cp21 Stunnel HTTP for mw122 on cp21 is OK: HTTP OK: HTTP/1.1 200 OK - 14562 bytes in 0.227 second response time [01:41:59] RECOVERY - cp20 Stunnel HTTP for mw122 on cp20 is OK: HTTP OK: HTTP/1.1 200 OK - 14562 bytes in 0.016 second response time [01:42:20] RECOVERY - cp31 Stunnel HTTP for mw122 on cp31 is OK: HTTP OK: HTTP/1.1 200 OK - 14562 bytes in 0.329 second response time [01:43:06] PROBLEM - cp31 Varnish Backends on cp31 is CRITICAL: 1 backends are down. mw112 [01:43:12] PROBLEM - cp20 Varnish Backends on cp20 is CRITICAL: 1 backends are down. mw102 [01:43:16] mw111 has been down consistently for 40 minutes. 
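
Statements like "mw111 has been down consistently for 40 minutes" can be checked against this log itself by pairing each PROBLEM with the next RECOVERY for the same check. A small sketch that does this over a saved copy of the log; the filename is hypothetical, and timestamps are assumed to be same-day HH:MM:SS as they are here.

    # Sketch: total up downtime per check by pairing PROBLEM entries with the
    # next RECOVERY for the same check name, parsed from a saved log file.
    import re
    from datetime import datetime, timedelta

    ENTRY = re.compile(r"\[(\d\d:\d\d:\d\d)\] (PROBLEM|RECOVERY) - (.+?) is (?:CRITICAL|WARNING|OK)")

    with open("irc-log.txt") as fh:  # hypothetical filename
        text = fh.read()

    down_since = {}  # check name -> time it first went down
    downtime = {}    # check name -> accumulated downtime

    for ts, state, check in ENTRY.findall(text):
        t = datetime.strptime(ts, "%H:%M:%S")
        if state == "PROBLEM":
            down_since.setdefault(check, t)
        elif check in down_since:  # RECOVERY for a check we saw go down
            downtime[check] = downtime.get(check, timedelta()) + (t - down_since.pop(check))

    for check, total in sorted(downtime.items(), key=lambda kv: kv[1], reverse=True)[:10]:
        print(total, check)
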
[01:43:49] RECOVERY - cp31 Stunnel HTTP for mw112 on cp31 is OK: HTTP OK: HTTP/1.1 200 OK - 14556 bytes in 2.163 second response time [01:43:51] RECOVERY - cp30 Stunnel HTTP for mw102 on cp30 is OK: HTTP OK: HTTP/1.1 200 OK - 14556 bytes in 7.157 second response time [01:43:59] RECOVERY - cp20 Stunnel HTTP for mw112 on cp20 is OK: HTTP OK: HTTP/1.1 200 OK - 14556 bytes in 0.682 second response time [01:44:06] RECOVERY - cp30 Stunnel HTTP for mw112 on cp30 is OK: HTTP OK: HTTP/1.1 200 OK - 14562 bytes in 0.516 second response time [01:44:17] RECOVERY - mw112 MediaWiki Rendering on mw112 is OK: HTTP OK: HTTP/1.1 200 OK - 22334 bytes in 0.827 second response time [01:44:21] RECOVERY - cp21 Stunnel HTTP for mw112 on cp21 is OK: HTTP OK: HTTP/1.1 200 OK - 14562 bytes in 0.017 second response time [01:45:06] RECOVERY - cp31 Varnish Backends on cp31 is OK: All 12 backends are healthy [01:45:12] RECOVERY - cp20 Varnish Backends on cp20 is OK: All 12 backends are healthy [01:46:17] PROBLEM - cp21 Stunnel HTTP for mw122 on cp21 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:46:26] PROBLEM - cp20 Stunnel HTTP for mw122 on cp20 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:46:42] PROBLEM - cp30 Stunnel HTTP for mw122 on cp30 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:46:47] PROBLEM - cp31 Stunnel HTTP for mw122 on cp31 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:47:09] .op [01:47:10] Attempting to OP... [01:47:21] RECOVERY - mw101 MediaWiki Rendering on mw101 is OK: HTTP OK: HTTP/1.1 200 OK - 22334 bytes in 5.617 second response time [01:47:27] RECOVERY - cp31 Stunnel HTTP for mw101 on cp31 is OK: HTTP OK: HTTP/1.1 200 OK - 14562 bytes in 2.554 second response time [01:47:36] PROBLEM - cp21 Varnish Backends on cp21 is CRITICAL: 2 backends are down. mw101 mw122 [01:47:47] PROBLEM - mw122 MediaWiki Rendering on mw122 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:47:51] PROBLEM - cp30 Varnish Backends on cp30 is CRITICAL: 1 backends are down. mw101 [01:48:08] PROBLEM - hypotheticalhurricanes.com - reverse DNS on sslhost is WARNING: SSL WARNING - rDNS OK but records conflict. 
{'NS': ['ns2.hostknox.com.', 'ns1.hostknox.com.'], 'CNAME': None} [01:48:21] PROBLEM - cp30 Stunnel HTTP for mw102 on cp30 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:48:27] RECOVERY - cp21 Stunnel HTTP for mw101 on cp21 is OK: HTTP OK: HTTP/1.1 200 OK - 14562 bytes in 0.019 second response time [01:48:31] RECOVERY - cp30 Stunnel HTTP for mw101 on cp30 is OK: HTTP OK: HTTP/1.1 200 OK - 14562 bytes in 0.609 second response time [01:48:54] RECOVERY - cp20 Stunnel HTTP for mw101 on cp20 is OK: HTTP OK: HTTP/1.1 200 OK - 14562 bytes in 5.166 second response time [01:49:45] RECOVERY - cp30 Varnish Backends on cp30 is OK: All 12 backends are healthy [01:49:51] RECOVERY - mw122 MediaWiki Rendering on mw122 is OK: HTTP OK: HTTP/1.1 200 OK - 22334 bytes in 8.332 second response time [01:50:02] PROBLEM - cp31 Stunnel HTTP for mw112 on cp31 is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 328 bytes in 0.257 second response time [01:50:16] RECOVERY - cp21 Stunnel HTTP for mw122 on cp21 is OK: HTTP OK: HTTP/1.1 200 OK - 14562 bytes in 0.020 second response time [01:50:27] PROBLEM - cp30 Stunnel HTTP for mw112 on cp30 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:50:30] PROBLEM - cp20 Stunnel HTTP for mw112 on cp20 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:50:36] PROBLEM - cp21 Stunnel HTTP for mw112 on cp21 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:50:39] RECOVERY - cp20 Stunnel HTTP for mw122 on cp20 is OK: HTTP OK: HTTP/1.1 200 OK - 14556 bytes in 5.017 second response time [01:50:41] PROBLEM - mw112 MediaWiki Rendering on mw112 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:50:51] RECOVERY - cp30 Stunnel HTTP for mw122 on cp30 is OK: HTTP OK: HTTP/1.1 200 OK - 14562 bytes in 8.503 second response time [01:50:57] RECOVERY - cp31 Stunnel HTTP for mw122 on cp31 is OK: HTTP OK: HTTP/1.1 200 OK - 14562 bytes in 6.620 second response time [01:51:12] PROBLEM - cp20 Varnish Backends on cp20 is CRITICAL: 1 backends are down. 
mw111 [01:51:17] PROBLEM - cp30 Current Load on cp30 is CRITICAL: CRITICAL - load average: 2.02, 1.36, 0.98 [01:51:36] RECOVERY - cp21 Varnish Backends on cp21 is OK: All 12 backends are healthy [01:51:41] PROBLEM - mw101 MediaWiki Rendering on mw101 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:51:54] PROBLEM - cp31 Stunnel HTTP for mw101 on cp31 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:52:01] RECOVERY - cp31 Stunnel HTTP for mw112 on cp31 is OK: HTTP OK: HTTP/1.1 200 OK - 14556 bytes in 3.318 second response time [01:52:27] RECOVERY - cp30 Stunnel HTTP for mw112 on cp30 is OK: HTTP OK: HTTP/1.1 200 OK - 14548 bytes in 4.866 second response time [01:52:35] RECOVERY - cp20 Stunnel HTTP for mw112 on cp20 is OK: HTTP OK: HTTP/1.1 200 OK - 14548 bytes in 6.180 second response time [01:52:37] RECOVERY - cp21 Stunnel HTTP for mw112 on cp21 is OK: HTTP OK: HTTP/1.1 200 OK - 14556 bytes in 6.779 second response time [01:52:46] RECOVERY - mw112 MediaWiki Rendering on mw112 is OK: HTTP OK: HTTP/1.1 200 OK - 22334 bytes in 7.399 second response time [01:52:47] PROBLEM - cp21 Stunnel HTTP for mw101 on cp21 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:52:54] PROBLEM - cp30 Stunnel HTTP for mw101 on cp30 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:53:12] RECOVERY - cp20 Varnish Backends on cp20 is OK: All 12 backends are healthy [01:53:17] RECOVERY - cp30 Current Load on cp30 is OK: OK - load average: 1.05, 1.28, 1.00 [01:53:21] PROBLEM - cp20 Stunnel HTTP for mw101 on cp20 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:53:34] RECOVERY - cp20 Stunnel HTTP for mw102 on cp20 is OK: HTTP OK: HTTP/1.1 200 OK - 14562 bytes in 6.742 second response time [01:53:37] RECOVERY - mw102 MediaWiki Rendering on mw102 is OK: HTTP OK: HTTP/1.1 200 OK - 22335 bytes in 7.739 second response time [01:53:41] RECOVERY - cp31 Stunnel HTTP for mw102 on cp31 is OK: HTTP OK: HTTP/1.1 200 OK - 14562 bytes in 6.939 second response time [01:54:20] PROBLEM - db101 Current Load on db101 is CRITICAL: CRITICAL - load average: 8.34, 7.27, 6.38 [01:54:35] PROBLEM - cp21 Stunnel HTTP for mw122 on cp21 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:54:47] RECOVERY - cp30 Stunnel HTTP for mw102 on cp30 is OK: HTTP OK: HTTP/1.1 200 OK - 14562 bytes in 7.489 second response time [01:55:06] PROBLEM - cp20 Stunnel HTTP for mw122 on cp20 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:55:13] PROBLEM - cp30 Stunnel HTTP for mw122 on cp30 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:55:13] RECOVERY - cp21 Stunnel HTTP for mw102 on cp21 is OK: HTTP OK: HTTP/1.1 200 OK - 14562 bytes in 4.349 second response time [01:55:26] PROBLEM - cp31 Stunnel HTTP for mw122 on cp31 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:56:16] PROBLEM - db101 Current Load on db101 is WARNING: WARNING - load average: 6.51, 7.03, 6.40 [01:56:19] PROBLEM - mw122 MediaWiki Rendering on mw122 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:58:12] RECOVERY - db101 Current Load on db101 is OK: OK - load average: 4.53, 6.16, 6.16 [01:59:22] RECOVERY - cp20 Stunnel HTTP for mw122 on cp20 is OK: HTTP OK: HTTP/1.1 200 OK - 14548 bytes in 8.637 second response time [01:59:22] RECOVERY - cp30 Stunnel HTTP for mw122 on cp30 is OK: HTTP OK: HTTP/1.1 200 OK - 14548 bytes in 9.198 second response time [01:59:24] PROBLEM - cp31 Stunnel HTTP for mw112 on cp31 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:59:38] RECOVERY - cp31 Stunnel 
HTTP for mw122 on cp31 is OK: HTTP OK: HTTP/1.1 200 OK - 14548 bytes in 8.182 second response time [02:00:01] PROBLEM - cp30 Stunnel HTTP for mw112 on cp30 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:00:04] PROBLEM - cp21 Stunnel HTTP for mw112 on cp21 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:00:15] PROBLEM - cp20 Stunnel HTTP for mw112 on cp20 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:00:16] PROBLEM - mw112 MediaWiki Rendering on mw112 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:00:44] RECOVERY - cp31 Stunnel HTTP for mw111 on cp31 is OK: HTTP OK: HTTP/1.1 200 OK - 14562 bytes in 6.914 second response time [02:00:48] RECOVERY - cp20 Stunnel HTTP for mw121 on cp20 is OK: HTTP OK: HTTP/1.1 200 OK - 14562 bytes in 7.599 second response time [02:00:55] RECOVERY - mw111 MediaWiki Rendering on mw111 is OK: HTTP OK: HTTP/1.1 200 OK - 22334 bytes in 4.289 second response time [02:01:08] RECOVERY - cp20 Stunnel HTTP for mw111 on cp20 is OK: HTTP OK: HTTP/1.1 200 OK - 14556 bytes in 1.455 second response time [02:01:17] PROBLEM - cp30 Stunnel HTTP for mw102 on cp30 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:01:21] RECOVERY - cp31 Stunnel HTTP for mw112 on cp31 is OK: HTTP OK: HTTP/1.1 200 OK - 14562 bytes in 2.041 second response time [02:01:33] PROBLEM - cp21 Stunnel HTTP for mw102 on cp21 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:01:36] RECOVERY - cp30 Stunnel HTTP for mw121 on cp30 is OK: HTTP OK: HTTP/1.1 200 OK - 14562 bytes in 4.825 second response time [02:01:49] PROBLEM - mw102 MediaWiki Rendering on mw102 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:01:50] RECOVERY - cp31 Stunnel HTTP for mw121 on cp31 is OK: HTTP OK: HTTP/1.1 200 OK - 14548 bytes in 4.873 second response time [02:01:56] RECOVERY - cp30 Stunnel HTTP for mw112 on cp30 is OK: HTTP OK: HTTP/1.1 200 OK - 14562 bytes in 0.333 second response time [02:01:59] RECOVERY - cp21 Stunnel HTTP for mw112 on cp21 is OK: HTTP OK: HTTP/1.1 200 OK - 14556 bytes in 0.222 second response time [02:02:00] PROBLEM - cp31 Stunnel HTTP for mw102 on cp31 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:02:03] RECOVERY - cp30 Stunnel HTTP for mw111 on cp30 is OK: HTTP OK: HTTP/1.1 200 OK - 14562 bytes in 9.477 second response time [02:02:05] RECOVERY - cp21 Stunnel HTTP for mw111 on cp21 is OK: HTTP OK: HTTP/1.1 200 OK - 14548 bytes in 9.034 second response time [02:02:07] PROBLEM - cp20 Stunnel HTTP for mw102 on cp20 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:02:07] RECOVERY - mw121 MediaWiki Rendering on mw121 is OK: HTTP OK: HTTP/1.1 200 OK - 22334 bytes in 8.443 second response time [02:02:17] RECOVERY - cp20 Stunnel HTTP for mw112 on cp20 is OK: HTTP OK: HTTP/1.1 200 OK - 14548 bytes in 2.750 second response time [02:02:17] RECOVERY - mw112 MediaWiki Rendering on mw112 is OK: HTTP OK: HTTP/1.1 200 OK - 22334 bytes in 3.285 second response time [02:02:36] RECOVERY - cp21 Stunnel HTTP for mw121 on cp21 is OK: HTTP OK: HTTP/1.1 200 OK - 14556 bytes in 6.662 second response time [02:03:24] RECOVERY - cp30 Stunnel HTTP for mw102 on cp30 is OK: HTTP OK: HTTP/1.1 200 OK - 14548 bytes in 6.794 second response time [02:03:35] RECOVERY - cp21 Stunnel HTTP for mw102 on cp21 is OK: HTTP OK: HTTP/1.1 200 OK - 14562 bytes in 3.018 second response time [02:03:44] PROBLEM - cp30 Stunnel HTTP for mw122 on cp30 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:03:46] RECOVERY - mw102 MediaWiki Rendering on 
mw102 is OK: HTTP OK: HTTP/1.1 200 OK - 22334 bytes in 3.378 second response time [02:03:49] PROBLEM - cp20 Stunnel HTTP for mw122 on cp20 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:03:55] PROBLEM - cp31 Stunnel HTTP for mw122 on cp31 is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 328 bytes in 0.239 second response time [02:03:59] RECOVERY - cp31 Stunnel HTTP for mw102 on cp31 is OK: HTTP OK: HTTP/1.1 200 OK - 14562 bytes in 3.116 second response time [02:04:09] RECOVERY - cp20 Stunnel HTTP for mw102 on cp20 is OK: HTTP OK: HTTP/1.1 200 OK - 14556 bytes in 3.015 second response time [02:05:17] PROBLEM - cp30 Current Load on cp30 is CRITICAL: CRITICAL - load average: 2.74, 1.76, 1.28 [02:06:27] PROBLEM - cp30 Stunnel HTTP for mw111 on cp30 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:07:03] PROBLEM - cp21 Stunnel HTTP for mw121 on cp21 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:07:04] PROBLEM - cp31 Stunnel HTTP for mw111 on cp31 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:07:16] PROBLEM - cp20 Stunnel HTTP for mw121 on cp20 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:07:17] RECOVERY - cp30 Current Load on cp30 is OK: OK - load average: 1.12, 1.43, 1.21 [02:07:29] PROBLEM - cp20 Stunnel HTTP for mw111 on cp20 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:07:48] PROBLEM - gluster111 Current Load on gluster111 is WARNING: WARNING - load average: 3.50, 2.83, 2.62 [02:07:50] PROBLEM - cp30 Stunnel HTTP for mw121 on cp30 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:07:53] PROBLEM - cp30 Stunnel HTTP for mw102 on cp30 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:07:58] PROBLEM - cp21 Stunnel HTTP for mw102 on cp21 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:08:03] PROBLEM - mw102 MediaWiki Rendering on mw102 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:08:14] PROBLEM - cp31 Stunnel HTTP for mw121 on cp31 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:08:21] PROBLEM - cp31 Stunnel HTTP for mw102 on cp31 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:08:26] PROBLEM - mw121 MediaWiki Rendering on mw121 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:08:38] PROBLEM - cp20 Stunnel HTTP for mw102 on cp20 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:09:32] RECOVERY - cp20 Stunnel HTTP for mw111 on cp20 is OK: HTTP OK: HTTP/1.1 200 OK - 14562 bytes in 7.410 second response time [02:09:48] RECOVERY - gluster111 Current Load on gluster111 is OK: OK - load average: 2.13, 2.44, 2.49 [02:09:54] RECOVERY - cp21 Stunnel HTTP for mw102 on cp21 is OK: HTTP OK: HTTP/1.1 200 OK - 14562 bytes in 0.504 second response time [02:09:54] RECOVERY - cp30 Stunnel HTTP for mw102 on cp30 is OK: HTTP OK: HTTP/1.1 200 OK - 14556 bytes in 1.157 second response time [02:09:55] RECOVERY - cp30 Stunnel HTTP for mw121 on cp30 is OK: HTTP OK: HTTP/1.1 200 OK - 14548 bytes in 9.943 second response time [02:09:58] RECOVERY - mw102 MediaWiki Rendering on mw102 is OK: HTTP OK: HTTP/1.1 200 OK - 22334 bytes in 0.775 second response time [02:10:12] RECOVERY - mw101 MediaWiki Rendering on mw101 is OK: HTTP OK: HTTP/1.1 200 OK - 22334 bytes in 7.884 second response time [02:10:15] RECOVERY - cp31 Stunnel HTTP for mw121 on cp31 is OK: HTTP OK: HTTP/1.1 200 OK - 14562 bytes in 4.320 second response time [02:10:18] RECOVERY - cp31 Stunnel HTTP for mw102 on cp31 is OK: HTTP OK: HTTP/1.1 200 OK - 14562 bytes in 2.101 second response 
time [02:10:26] RECOVERY - mw121 MediaWiki Rendering on mw121 is OK: HTTP OK: HTTP/1.1 200 OK - 22334 bytes in 4.875 second response time [02:10:26] RECOVERY - cp20 Stunnel HTTP for mw101 on cp20 is OK: HTTP OK: HTTP/1.1 200 OK - 14562 bytes in 1.662 second response time [02:10:33] RECOVERY - cp30 Stunnel HTTP for mw111 on cp30 is OK: HTTP OK: HTTP/1.1 200 OK - 14556 bytes in 0.337 second response time [02:10:38] RECOVERY - cp20 Stunnel HTTP for mw102 on cp20 is OK: HTTP OK: HTTP/1.1 200 OK - 14562 bytes in 0.847 second response time [02:10:52] PROBLEM - cp20 Stunnel HTTP for mw112 on cp20 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:11:03] RECOVERY - cp31 Stunnel HTTP for mw111 on cp31 is OK: HTTP OK: HTTP/1.1 200 OK - 14556 bytes in 0.350 second response time [02:11:06] RECOVERY - cp31 Stunnel HTTP for mw101 on cp31 is OK: HTTP OK: HTTP/1.1 200 OK - 14562 bytes in 0.312 second response time [02:11:13] RECOVERY - cp21 Stunnel HTTP for mw121 on cp21 is OK: HTTP OK: HTTP/1.1 200 OK - 14562 bytes in 1.418 second response time [02:11:22] RECOVERY - cp20 Stunnel HTTP for mw121 on cp20 is OK: HTTP OK: HTTP/1.1 200 OK - 14562 bytes in 0.322 second response time [02:11:24] PROBLEM - cp21 Stunnel HTTP for mw112 on cp21 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:11:24] PROBLEM - cp30 Stunnel HTTP for mw112 on cp30 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:11:42] RECOVERY - cp30 Stunnel HTTP for mw101 on cp30 is OK: HTTP OK: HTTP/1.1 200 OK - 14562 bytes in 5.008 second response time [02:11:43] RECOVERY - cp21 Stunnel HTTP for mw101 on cp21 is OK: HTTP OK: HTTP/1.1 200 OK - 14562 bytes in 4.421 second response time [02:11:56] PROBLEM - mw112 MediaWiki Rendering on mw112 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:12:02] RECOVERY - cp30 Stunnel HTTP for mw122 on cp30 is OK: HTTP OK: HTTP/1.1 200 OK - 14556 bytes in 6.969 second response time [02:12:15] RECOVERY - cp31 Stunnel HTTP for mw122 on cp31 is OK: HTTP OK: HTTP/1.1 200 OK - 14562 bytes in 3.440 second response time [02:12:18] RECOVERY - cp20 Stunnel HTTP for mw122 on cp20 is OK: HTTP OK: HTTP/1.1 200 OK - 14556 bytes in 3.695 second response time [02:12:54] RECOVERY - cp20 Stunnel HTTP for mw112 on cp20 is OK: HTTP OK: HTTP/1.1 200 OK - 14562 bytes in 3.577 second response time [02:12:58] RECOVERY - mw122 MediaWiki Rendering on mw122 is OK: HTTP OK: HTTP/1.1 200 OK - 22334 bytes in 1.364 second response time [02:13:06] RECOVERY - cp21 Stunnel HTTP for mw122 on cp21 is OK: HTTP OK: HTTP/1.1 200 OK - 14562 bytes in 0.040 second response time [02:14:04] RECOVERY - mw112 MediaWiki Rendering on mw112 is OK: HTTP OK: HTTP/1.1 200 OK - 22334 bytes in 9.776 second response time [02:14:40] PROBLEM - gluster101 Current Load on gluster101 is CRITICAL: CRITICAL - load average: 5.78, 4.17, 3.22 [02:14:44] PROBLEM - db101 Current Load on db101 is WARNING: WARNING - load average: 7.57, 7.16, 6.72 [02:15:32] RECOVERY - cp21 Stunnel HTTP for mw112 on cp21 is OK: HTTP OK: HTTP/1.1 200 OK - 14562 bytes in 9.265 second response time [02:15:34] RECOVERY - cp30 Stunnel HTTP for mw112 on cp30 is OK: HTTP OK: HTTP/1.1 200 OK - 14548 bytes in 9.542 second response time [02:16:50] PROBLEM - gluster121 Current Load on gluster121 is WARNING: WARNING - load average: 3.45, 3.28, 2.78 [02:21:48] PROBLEM - gluster111 Current Load on gluster111 is CRITICAL: CRITICAL - load average: 4.40, 4.01, 3.32 [02:22:50] PROBLEM - gluster121 Current Load on gluster121 is CRITICAL: CRITICAL - load average: 4.38, 3.81, 3.17 
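The flapping "Stunnel HTTP for mwNNN on cpNN" and "MediaWiki Rendering" entries above are HTTP health probes against the MediaWiki backends, reported CRITICAL when no response arrives within 10 seconds and OK with the response size and time otherwise. The actual Icinga check command is not part of this log; the following is only a minimal sketch of an equivalent probe in Python, with a hypothetical backend URL and the same 10-second timeout, using standard Nagios exit codes.

#!/usr/bin/env python3
# Sketch only: an HTTP probe in the spirit of the "Stunnel HTTP" /
# "MediaWiki Rendering" checks above. The real check command is not in
# this log; the URL below is a hypothetical placeholder.
import sys
import time
import urllib.request

OK, CRITICAL = 0, 2                     # standard Nagios/Icinga exit codes
URL = "http://127.0.0.1:8080/wiki/Main_Page"   # hypothetical backend endpoint
TIMEOUT = 10                            # seconds, as in "Socket timeout after 10 seconds"

start = time.monotonic()
try:
    with urllib.request.urlopen(URL, timeout=TIMEOUT) as resp:
        body = resp.read()
except Exception as exc:                # timeout, refused connection, 5xx, ...
    print(f"CRITICAL - {exc}")
    sys.exit(CRITICAL)
elapsed = time.monotonic() - start
print(f"HTTP OK: HTTP/1.1 {resp.status} OK - {len(body)} bytes in {elapsed:.3f} second response time")
sys.exit(OK)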
[02:26:20] PROBLEM - db101 Current Load on db101 is CRITICAL: CRITICAL - load average: 9.48, 7.85, 7.21 [02:29:48] PROBLEM - gluster111 Current Load on gluster111 is WARNING: WARNING - load average: 3.42, 3.87, 3.59 [02:30:50] PROBLEM - gluster121 Current Load on gluster121 is WARNING: WARNING - load average: 3.26, 3.99, 3.62 [02:32:24] PROBLEM - gluster101 Current Load on gluster101 is WARNING: WARNING - load average: 3.66, 4.00, 4.00 [02:32:50] PROBLEM - gluster121 Current Load on gluster121 is CRITICAL: CRITICAL - load average: 4.32, 3.94, 3.63 [02:33:08] PROBLEM - cp21 Stunnel HTTP for mw101 on cp21 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:33:19] PROBLEM - mw101 MediaWiki Rendering on mw101 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:34:50] PROBLEM - gluster121 Current Load on gluster121 is WARNING: WARNING - load average: 2.67, 3.53, 3.52 [02:35:12] RECOVERY - cp21 Stunnel HTTP for mw101 on cp21 is OK: HTTP OK: HTTP/1.1 200 OK - 14548 bytes in 6.854 second response time [02:35:20] RECOVERY - mw101 MediaWiki Rendering on mw101 is OK: HTTP OK: HTTP/1.1 200 OK - 22334 bytes in 6.821 second response time [02:35:48] PROBLEM - gluster111 Current Load on gluster111 is CRITICAL: CRITICAL - load average: 6.12, 4.08, 3.67 [02:36:20] PROBLEM - gluster101 Current Load on gluster101 is CRITICAL: CRITICAL - load average: 6.29, 4.85, 4.30 [02:36:50] PROBLEM - gluster121 Current Load on gluster121 is CRITICAL: CRITICAL - load average: 4.04, 3.82, 3.63 [02:36:51] PROBLEM - mw101 Current Load on mw101 is WARNING: WARNING - load average: 10.96, 10.02, 8.71 [02:38:11] [dns] MacFan4000 opened pull request #264: Remove 2 domains - https://github.com/miraheze/dns/pull/264 [02:38:50] RECOVERY - mw101 Current Load on mw101 is OK: OK - load average: 8.00, 9.30, 8.62 [02:39:47] PROBLEM - cp31 Current Load on cp31 is CRITICAL: CRITICAL - load average: 1.37, 2.56, 1.62 [02:41:47] PROBLEM - cp31 Current Load on cp31 is WARNING: WARNING - load average: 0.40, 1.80, 1.45 [02:43:47] RECOVERY - cp31 Current Load on cp31 is OK: OK - load average: 1.62, 1.70, 1.45 [02:47:48] PROBLEM - gluster111 Current Load on gluster111 is WARNING: WARNING - load average: 2.62, 3.79, 3.92 [02:48:50] PROBLEM - gluster121 Current Load on gluster121 is WARNING: WARNING - load average: 2.89, 3.67, 3.81 [02:50:30] PROBLEM - hypotheticalhurricanes.com - LetsEncrypt on sslhost is CRITICAL: connect to address hypotheticalhurricanes.com and port 443: Network is unreachableHTTP CRITICAL - Unable to open TCP socket [02:50:50] PROBLEM - gluster121 Current Load on gluster121 is CRITICAL: CRITICAL - load average: 4.45, 4.14, 3.97 [02:53:48] PROBLEM - gluster111 Current Load on gluster111 is CRITICAL: CRITICAL - load average: 4.44, 3.67, 3.82 [02:54:50] PROBLEM - gluster121 Current Load on gluster121 is WARNING: WARNING - load average: 3.77, 3.99, 3.97 [02:55:48] PROBLEM - gluster111 Current Load on gluster111 is WARNING: WARNING - load average: 2.99, 3.39, 3.70 [02:57:48] PROBLEM - gluster111 Current Load on gluster111 is CRITICAL: CRITICAL - load average: 4.20, 3.72, 3.78 [03:00:50] PROBLEM - gluster121 Current Load on gluster121 is CRITICAL: CRITICAL - load average: 5.01, 4.05, 3.96 [03:01:48] PROBLEM - gluster111 Current Load on gluster111 is WARNING: WARNING - load average: 3.93, 3.61, 3.70 [03:02:50] PROBLEM - gluster121 Current Load on gluster121 is WARNING: WARNING - load average: 2.46, 3.53, 3.79 [03:03:54] PROBLEM - gluster101 Current Load on gluster101 is WARNING: WARNING - load average: 2.19,
3.25, 3.92 [03:05:48] PROBLEM - gluster111 Current Load on gluster111 is CRITICAL: CRITICAL - load average: 4.05, 3.26, 3.49 [03:07:48] PROBLEM - gluster111 Current Load on gluster111 is WARNING: WARNING - load average: 3.30, 3.13, 3.41 [03:08:50] RECOVERY - gluster121 Current Load on gluster121 is OK: OK - load average: 2.43, 2.68, 3.30 [03:09:32] PROBLEM - db101 Current Load on db101 is WARNING: WARNING - load average: 7.20, 7.60, 7.99 [03:09:48] RECOVERY - gluster111 Current Load on gluster111 is OK: OK - load average: 3.13, 3.04, 3.33 [03:13:46] RECOVERY - gluster101 Current Load on gluster101 is OK: OK - load average: 1.99, 2.60, 3.35 [03:21:32] PROBLEM - db101 Current Load on db101 is CRITICAL: CRITICAL - load average: 9.23, 7.46, 7.38 [03:27:32] PROBLEM - db101 Current Load on db101 is WARNING: WARNING - load average: 7.32, 7.45, 7.42 [03:33:32] PROBLEM - db101 Current Load on db101 is CRITICAL: CRITICAL - load average: 8.44, 7.78, 7.55 [03:39:32] PROBLEM - db101 Current Load on db101 is WARNING: WARNING - load average: 6.16, 7.46, 7.56 [03:39:46] PROBLEM - gluster101 Current Load on gluster101 is WARNING: WARNING - load average: 3.41, 3.36, 3.08 [03:40:34] PROBLEM - gluster111 Current Load on gluster111 is WARNING: WARNING - load average: 3.62, 3.04, 2.83 [03:41:32] PROBLEM - db101 Current Load on db101 is CRITICAL: CRITICAL - load average: 8.15, 7.86, 7.70 [03:42:28] PROBLEM - gluster111 Current Load on gluster111 is CRITICAL: CRITICAL - load average: 4.52, 3.42, 2.99 [03:43:32] PROBLEM - db101 Current Load on db101 is WARNING: WARNING - load average: 7.51, 7.67, 7.65 [03:43:46] PROBLEM - gluster101 Current Load on gluster101 is CRITICAL: CRITICAL - load average: 4.02, 3.57, 3.21 [03:44:22] RECOVERY - gluster111 Current Load on gluster111 is OK: OK - load average: 2.28, 3.00, 2.89 [03:45:46] RECOVERY - gluster101 Current Load on gluster101 is OK: OK - load average: 2.57, 3.22, 3.12 [03:47:32] PROBLEM - db101 Current Load on db101 is CRITICAL: CRITICAL - load average: 9.21, 8.14, 7.82 [03:55:32] PROBLEM - db101 Current Load on db101 is WARNING: WARNING - load average: 6.91, 7.82, 7.87 [03:59:48] PROBLEM - gluster111 Current Load on gluster111 is WARNING: WARNING - load average: 3.21, 3.44, 3.15 [04:00:32] alerting : [FIRING:1] (PHP-FPM Worker Usage High mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki [04:01:48] RECOVERY - gluster111 Current Load on gluster111 is OK: OK - load average: 2.29, 3.12, 3.07 [04:03:32] PROBLEM - db101 Current Load on db101 is CRITICAL: CRITICAL - load average: 8.19, 7.42, 7.56 [04:05:32] PROBLEM - db101 Current Load on db101 is WARNING: WARNING - load average: 6.93, 7.34, 7.53 [04:07:48] PROBLEM - gluster111 Current Load on gluster111 is CRITICAL: CRITICAL - load average: 4.35, 3.63, 3.25 [04:07:55] PROBLEM - mw101 MediaWiki Rendering on mw101 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:08:01] PROBLEM - cp30 Stunnel HTTP for mw101 on cp30 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:08:19] PROBLEM - cp21 Stunnel HTTP for mw101 on cp21 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:08:33] PROBLEM - cp20 Stunnel HTTP for mw101 on cp20 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:08:36] PROBLEM - cp31 Stunnel HTTP for mw101 on cp31 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:10:00] RECOVERY - mw101 MediaWiki Rendering on mw101 is OK: HTTP OK: HTTP/1.1 200 OK - 22334 bytes in 9.628 second response time [04:10:03] RECOVERY - cp30 Stunnel HTTP for mw101 on cp30 is OK: HTTP 
OK: HTTP/1.1 200 OK - 14562 bytes in 7.381 second response time [04:10:24] RECOVERY - cp21 Stunnel HTTP for mw101 on cp21 is OK: HTTP OK: HTTP/1.1 200 OK - 14556 bytes in 8.134 second response time [04:11:48] RECOVERY - gluster111 Current Load on gluster111 is OK: OK - load average: 2.83, 3.36, 3.24 [04:12:44] RECOVERY - cp20 Stunnel HTTP for mw101 on cp20 is OK: HTTP OK: HTTP/1.1 200 OK - 14548 bytes in 2.579 second response time [04:12:46] RECOVERY - cp31 Stunnel HTTP for mw101 on cp31 is OK: HTTP OK: HTTP/1.1 200 OK - 14562 bytes in 1.950 second response time [04:16:16] PROBLEM - gluster101 Current Load on gluster101 is WARNING: WARNING - load average: 3.53, 3.39, 3.18 [04:17:31] PROBLEM - gluster101 Puppet on gluster101 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. It might be a dependency cycle. [04:17:32] PROBLEM - db101 Current Load on db101 is CRITICAL: CRITICAL - load average: 9.21, 7.64, 7.40 [04:18:14] RECOVERY - gluster101 Current Load on gluster101 is OK: OK - load average: 1.74, 2.72, 2.96 [04:19:32] PROBLEM - db101 Current Load on db101 is WARNING: WARNING - load average: 6.88, 7.19, 7.26 [04:21:48] PROBLEM - gluster111 Current Load on gluster111 is WARNING: WARNING - load average: 3.55, 3.44, 3.29 [04:23:48] RECOVERY - gluster111 Current Load on gluster111 is OK: OK - load average: 2.63, 3.20, 3.22 [04:25:05] PROBLEM - gluster101 Current Load on gluster101 is WARNING: WARNING - load average: 3.39, 3.67, 3.37 [04:27:03] RECOVERY - gluster101 Current Load on gluster101 is OK: OK - load average: 1.73, 3.01, 3.16 [04:33:54] PROBLEM - gluster101 Current Load on gluster101 is CRITICAL: CRITICAL - load average: 4.38, 3.77, 3.43 [04:35:52] RECOVERY - gluster101 Current Load on gluster101 is OK: OK - load average: 2.39, 3.30, 3.30 [04:43:07] PROBLEM - cloud12 Puppet on cloud12 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. 
Failed resources (up to 3 shown): Service[ulogd2] [04:43:31] PROBLEM - cp20 Current Load on cp20 is WARNING: WARNING - load average: 0.64, 1.86, 1.13 [04:43:32] PROBLEM - db101 Current Load on db101 is CRITICAL: CRITICAL - load average: 8.01, 7.71, 7.60 [04:45:31] RECOVERY - cp20 Current Load on cp20 is OK: OK - load average: 0.31, 1.33, 1.02 [04:45:32] PROBLEM - db101 Current Load on db101 is WARNING: WARNING - load average: 6.68, 7.44, 7.52 [04:45:32] RECOVERY - gluster101 Puppet on gluster101 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [04:51:32] PROBLEM - db101 Current Load on db101 is CRITICAL: CRITICAL - load average: 9.28, 7.93, 7.66 [04:52:35] PROBLEM - gluster101 Current Load on gluster101 is WARNING: WARNING - load average: 3.66, 3.57, 3.41 [04:53:32] PROBLEM - db101 Current Load on db101 is WARNING: WARNING - load average: 7.34, 7.69, 7.60 [04:54:33] RECOVERY - gluster101 Current Load on gluster101 is OK: OK - load average: 2.29, 3.20, 3.30 [04:55:32] PROBLEM - db101 Current Load on db101 is CRITICAL: CRITICAL - load average: 8.75, 7.90, 7.68 [04:57:32] PROBLEM - db101 Current Load on db101 is WARNING: WARNING - load average: 7.24, 7.64, 7.61 [05:00:58] PROBLEM - cp31 Current Load on cp31 is WARNING: WARNING - load average: 1.97, 1.70, 1.13 [05:02:57] PROBLEM - cp31 Current Load on cp31 is CRITICAL: CRITICAL - load average: 2.69, 2.19, 1.38 [05:03:32] PROBLEM - db101 Current Load on db101 is CRITICAL: CRITICAL - load average: 8.17, 8.08, 7.80 [05:04:55] RECOVERY - cp31 Current Load on cp31 is OK: OK - load average: 0.64, 1.60, 1.26 [05:11:07] RECOVERY - cloud12 Puppet on cloud12 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [05:11:32] PROBLEM - db101 Current Load on db101 is WARNING: WARNING - load average: 7.31, 7.95, 7.92 [05:13:32] PROBLEM - db101 Current Load on db101 is CRITICAL: CRITICAL - load average: 9.46, 8.27, 8.02 [05:14:47] PROBLEM - cp31 Current Load on cp31 is WARNING: WARNING - load average: 1.26, 1.74, 1.44 [05:15:57] PROBLEM - test101 Disk Space on test101 is CRITICAL: DISK CRITICAL - free space: / 1057 MB (5% inode=56%); [05:16:09] PROBLEM - gluster101 Current Load on gluster101 is WARNING: WARNING - load average: 3.53, 3.41, 3.16 [05:16:46] RECOVERY - cp31 Current Load on cp31 is OK: OK - load average: 0.78, 1.36, 1.33 [05:18:07] RECOVERY - gluster101 Current Load on gluster101 is OK: OK - load average: 2.93, 3.11, 3.07 [05:20:32] ok : [RESOLVED] (PHP-FPM Worker Usage High mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki [05:24:58] PROBLEM - gluster101 Current Load on gluster101 is CRITICAL: CRITICAL - load average: 4.01, 3.78, 3.38 [05:26:56] PROBLEM - gluster101 Current Load on gluster101 is WARNING: WARNING - load average: 3.56, 3.80, 3.44 [05:33:41] PROBLEM - mw101 Current Load on mw101 is WARNING: WARNING - load average: 11.36, 9.92, 8.85 [05:34:48] RECOVERY - gluster101 Current Load on gluster101 is OK: OK - load average: 1.98, 2.92, 3.21 [05:35:40] RECOVERY - mw101 Current Load on mw101 is OK: OK - load average: 9.92, 9.78, 8.93 [05:36:31] alerting : [FIRING:1] (PHP-FPM Worker Usage High mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki [05:47:32] PROBLEM - db101 Current Load on db101 is WARNING: WARNING - load average: 6.48, 7.19, 7.93 [05:49:32] PROBLEM - db101 Current Load on db101 is CRITICAL: CRITICAL - load average: 9.76, 8.31, 8.25 [06:01:32] PROBLEM - db101 Current Load on db101 is WARNING: WARNING - load average: 7.75, 7.47, 7.97 [06:05:32] PROBLEM - 
db101 Current Load on db101 is CRITICAL: CRITICAL - load average: 8.93, 7.86, 7.96 [06:08:04] PROBLEM - test101 Current Load on test101 is WARNING: WARNING - load average: 1.95, 1.58, 1.26 [06:10:03] PROBLEM - test101 Current Load on test101 is CRITICAL: CRITICAL - load average: 2.12, 1.77, 1.37 [06:11:32] PROBLEM - db101 Current Load on db101 is WARNING: WARNING - load average: 7.76, 7.88, 7.98 [06:25:56] PROBLEM - gluster101 Current Load on gluster101 is WARNING: WARNING - load average: 3.11, 3.43, 3.08 [06:25:57] [ssl] Universal-Omega opened pull request #502: T9054: Remove hypotheticalhurricanes.com - https://github.com/miraheze/ssl/pull/502 [06:26:24] [ssl] Universal-Omega synchronize pull request #502: T9054: Remove hypotheticalhurricanes.com - https://github.com/miraheze/ssl/pull/502 [06:27:50] [dns] Universal-Omega opened pull request #265: T9054: Remove hypotheticalhurricanes.com zone - https://github.com/miraheze/dns/pull/265 [06:27:54] PROBLEM - gluster101 Current Load on gluster101 is CRITICAL: CRITICAL - load average: 4.13, 3.81, 3.27 [06:29:52] RECOVERY - gluster101 Current Load on gluster101 is OK: OK - load average: 2.17, 3.30, 3.16 [06:39:32] PROBLEM - db101 Current Load on db101 is CRITICAL: CRITICAL - load average: 8.29, 7.49, 7.27 [06:41:32] PROBLEM - db101 Current Load on db101 is WARNING: WARNING - load average: 7.68, 7.59, 7.33 [06:43:32] PROBLEM - db101 Current Load on db101 is CRITICAL: CRITICAL - load average: 8.74, 7.93, 7.48 [06:47:32] PROBLEM - db101 Current Load on db101 is WARNING: WARNING - load average: 7.45, 7.81, 7.57 [06:51:32] PROBLEM - db101 Current Load on db101 is CRITICAL: CRITICAL - load average: 8.98, 8.39, 7.85 [06:57:32] PROBLEM - cp20 Current Load on cp20 is WARNING: WARNING - load average: 1.78, 1.18, 0.77 [06:59:31] RECOVERY - cp20 Current Load on cp20 is OK: OK - load average: 1.06, 1.12, 0.80 [07:13:07] PROBLEM - cloud12 Puppet on cloud12 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. 
Failed resources (up to 3 shown): Service[ulogd2] [07:13:56] PROBLEM - db111 Current Load on db111 is CRITICAL: CRITICAL - load average: 17.61, 11.21, 6.75 [07:17:54] PROBLEM - mw101 Current Load on mw101 is WARNING: WARNING - load average: 10.84, 9.96, 9.15 [07:19:53] RECOVERY - mw101 Current Load on mw101 is OK: OK - load average: 8.36, 9.28, 9.00 [07:19:56] PROBLEM - db111 Current Load on db111 is WARNING: WARNING - load average: 4.79, 7.15, 6.39 [07:21:56] RECOVERY - db111 Current Load on db111 is OK: OK - load average: 3.73, 5.92, 6.03 [07:22:58] PROBLEM - gluster101 Current Load on gluster101 is WARNING: WARNING - load average: 3.73, 3.00, 2.58 [07:24:56] RECOVERY - gluster101 Current Load on gluster101 is OK: OK - load average: 3.03, 3.04, 2.65 [07:25:56] PROBLEM - db111 Current Load on db111 is WARNING: WARNING - load average: 6.37, 7.42, 6.71 [07:27:56] PROBLEM - db111 Current Load on db111 is CRITICAL: CRITICAL - load average: 11.72, 9.24, 7.45 [07:31:56] PROBLEM - db111 Current Load on db111 is WARNING: WARNING - load average: 3.67, 6.89, 6.95 [07:33:56] RECOVERY - db111 Current Load on db111 is OK: OK - load average: 4.56, 6.06, 6.63 [07:37:35] PROBLEM - test101 Current Load on test101 is WARNING: WARNING - load average: 1.33, 1.67, 1.98 [07:39:07] RECOVERY - cloud12 Puppet on cloud12 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures [07:39:56] PROBLEM - db111 Current Load on db111 is CRITICAL: CRITICAL - load average: 10.27, 8.72, 7.47 [07:43:33] RECOVERY - test101 Current Load on test101 is OK: OK - load average: 1.11, 1.24, 1.67 [07:43:38] [CreateWiki] lens0021 commented on pull request #315: Inject CreateWikiHookRunner - https://github.com/miraheze/CreateWiki/pull/315#issuecomment-1093781165 [07:45:56] PROBLEM - db111 Current Load on db111 is WARNING: WARNING - load average: 3.17, 6.31, 7.01 [07:46:31] ok : [RESOLVED] (PHP-FPM Worker Usage High mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki [07:47:56] PROBLEM - db111 Current Load on db111 is CRITICAL: CRITICAL - load average: 38.61, 18.25, 11.19 [07:55:32] PROBLEM - db101 Current Load on db101 is WARNING: WARNING - load average: 6.08, 7.10, 7.92 [08:01:56] PROBLEM - db111 Current Load on db111 is WARNING: WARNING - load average: 2.11, 5.25, 7.58 [08:03:33] alerting : [FIRING:1] (PHP-FPM Worker Usage High mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki [08:04:14] PROBLEM - gluster101 Current Load on gluster101 is WARNING: WARNING - load average: 3.96, 3.56, 3.03 [08:05:56] PROBLEM - db111 Current Load on db111 is CRITICAL: CRITICAL - load average: 21.44, 9.61, 8.46 [08:06:12] RECOVERY - gluster101 Current Load on gluster101 is OK: OK - load average: 2.43, 3.04, 2.90 [08:07:56] PROBLEM - db111 Current Load on db111 is WARNING: WARNING - load average: 5.60, 7.43, 7.79 [08:11:32] PROBLEM - db101 Current Load on db101 is CRITICAL: CRITICAL - load average: 9.27, 7.88, 7.74 [08:13:32] PROBLEM - db101 Current Load on db101 is WARNING: WARNING - load average: 7.07, 7.78, 7.74 [08:13:56] PROBLEM - db111 Current Load on db111 is CRITICAL: CRITICAL - load average: 11.11, 6.39, 6.98 [08:15:56] PROBLEM - db111 Current Load on db111 is WARNING: WARNING - load average: 5.92, 6.67, 7.07 [08:17:56] RECOVERY - db111 Current Load on db111 is OK: OK - load average: 4.51, 5.82, 6.70 [08:18:35] ok : [RESOLVED] (PHP-FPM Worker Usage High mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki [08:30:32] alerting : [FIRING:1] (PHP-FPM Worker Usage High mediawiki) 
https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki [08:31:32] RECOVERY - db101 Current Load on db101 is OK: OK - load average: 5.21, 6.05, 6.70 [08:32:31] PROBLEM - db111 Current Load on db111 is WARNING: WARNING - load average: 7.33, 6.29, 6.19 [08:34:27] RECOVERY - db111 Current Load on db111 is OK: OK - load average: 5.41, 5.79, 6.00 [08:35:32] PROBLEM - db101 Current Load on db101 is CRITICAL: CRITICAL - load average: 8.36, 7.29, 7.06 [08:37:32] PROBLEM - db101 Current Load on db101 is WARNING: WARNING - load average: 7.70, 7.29, 7.07 [08:39:32] PROBLEM - db101 Current Load on db101 is CRITICAL: CRITICAL - load average: 9.43, 7.85, 7.29 [08:45:05] PROBLEM - db111 Current Load on db111 is CRITICAL: CRITICAL - load average: 7.63, 9.46, 7.64 [08:47:32] PROBLEM - db101 Current Load on db101 is WARNING: WARNING - load average: 6.40, 7.75, 7.57 [08:49:32] PROBLEM - db101 Current Load on db101 is CRITICAL: CRITICAL - load average: 8.81, 8.04, 7.69 [08:50:32] ok : [RESOLVED] (PHP-FPM Worker Usage High mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki [08:50:51] PROBLEM - db111 Current Load on db111 is WARNING: WARNING - load average: 2.75, 6.22, 6.85 [08:52:47] PROBLEM - db111 Current Load on db111 is CRITICAL: CRITICAL - load average: 16.04, 9.34, 7.85 [08:53:32] PROBLEM - db101 Current Load on db101 is WARNING: WARNING - load average: 7.74, 7.88, 7.72 [08:56:38] PROBLEM - db111 Current Load on db111 is WARNING: WARNING - load average: 5.31, 6.90, 7.19 [09:02:25] RECOVERY - db111 Current Load on db111 is OK: OK - load average: 5.89, 6.18, 6.78 [09:08:02] alerting : [FIRING:1] (PHP-FPM Worker Usage High mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki [09:13:32] PROBLEM - db101 Current Load on db101 is CRITICAL: CRITICAL - load average: 8.29, 7.47, 7.41 [09:15:32] PROBLEM - db101 Current Load on db101 is WARNING: WARNING - load average: 5.93, 6.99, 7.24 [09:18:02] ok : [RESOLVED] (PHP-FPM Worker Usage High mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki [09:23:04] alerting : [FIRING:1] (PHP-FPM Worker Usage High mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki [09:29:32] RECOVERY - db101 Current Load on db101 is OK: OK - load average: 4.70, 5.88, 6.67 [09:38:04] ok : [RESOLVED] (PHP-FPM Worker Usage High mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki [09:40:08] PROBLEM - db111 Current Load on db111 is WARNING: WARNING - load average: 4.80, 7.10, 6.68 [09:42:03] RECOVERY - db111 Current Load on db111 is OK: OK - load average: 3.34, 5.82, 6.26 [09:43:32] PROBLEM - db101 Current Load on db101 is WARNING: WARNING - load average: 7.47, 6.82, 6.59 [09:45:32] alerting : [FIRING:1] (PHP-FPM Worker Usage High mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki [09:49:20] [miraheze/puppet] JohnFLewis pushed 1 commit to master [+0/-0/±1] https://github.com/miraheze/puppet/compare/554521acee88...885eb3d63a5e [09:49:21] [miraheze/puppet] JohnFLewis 885eb3d - absent db backups, high IO usage [09:50:32] ok : [RESOLVED] (PHP-FPM Worker Usage High mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki [09:55:32] RECOVERY - db101 Current Load on db101 is OK: OK - load average: 5.18, 6.50, 6.67 [09:56:01] alerting : [FIRING:1] (PHP-FPM Worker Usage High mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki [10:05:56] PROBLEM - db111 Current Load on db111 is CRITICAL: CRITICAL - load average: 8.87, 11.94, 8.42 [10:16:01] ok : [RESOLVED] (PHP-FPM Worker Usage High mediawiki) 
https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki [10:20:52] PROBLEM - db101 Current Load on db101 is WARNING: WARNING - load average: 7.54, 7.29, 6.71 [10:26:40] PROBLEM - db101 Current Load on db101 is CRITICAL: CRITICAL - load average: 9.45, 7.76, 7.03 [10:37:56] PROBLEM - db111 Current Load on db111 is WARNING: WARNING - load average: 3.71, 5.04, 7.73 [10:41:56] RECOVERY - db111 Current Load on db111 is OK: OK - load average: 2.86, 4.01, 6.71 [10:46:04] .in 5mins . [10:46:04] RhinosF1: Okay, will remind at 2022-04-09 - 11:51:04BST [10:51:05] RhinosF1: . [10:53:45] PROBLEM - db101 Current Load on db101 is WARNING: WARNING - load average: 6.62, 7.62, 7.93 [10:57:37] PROBLEM - db101 Current Load on db101 is CRITICAL: CRITICAL - load average: 8.22, 8.10, 8.06 [10:59:33] PROBLEM - db101 Current Load on db101 is WARNING: WARNING - load average: 6.78, 7.59, 7.87 [11:02:01] alerting : [FIRING:1] (PHP-FPM Worker Usage High mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki [11:03:32] PROBLEM - db101 Current Load on db101 is CRITICAL: CRITICAL - load average: 9.35, 8.19, 8.03 [11:05:32] PROBLEM - db101 Current Load on db101 is WARNING: WARNING - load average: 7.70, 7.99, 7.97 [11:11:32] PROBLEM - db101 Current Load on db101 is CRITICAL: CRITICAL - load average: 9.00, 7.78, 7.81 [11:12:01] ok : [RESOLVED] (PHP-FPM Worker Usage High mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki [11:13:32] PROBLEM - db101 Current Load on db101 is WARNING: WARNING - load average: 7.53, 7.45, 7.67 [11:23:17] PROBLEM - cp30 Current Load on cp30 is WARNING: WARNING - load average: 1.58, 1.77, 1.05 [11:25:17] RECOVERY - cp30 Current Load on cp30 is OK: OK - load average: 0.90, 1.45, 1.02 [11:27:31] alerting : [FIRING:1] (PHP-FPM Worker Usage High mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki [11:34:28] PROBLEM - gluster111 Current Load on gluster111 is WARNING: WARNING - load average: 3.51, 3.20, 2.89 [11:35:29] PROBLEM - gluster121 Current Load on gluster121 is WARNING: WARNING - load average: 3.50, 3.23, 2.78 [11:35:32] RECOVERY - db101 Current Load on db101 is OK: OK - load average: 5.98, 6.27, 6.78 [11:37:25] RECOVERY - gluster121 Current Load on gluster121 is OK: OK - load average: 2.13, 2.87, 2.70 [11:38:16] PROBLEM - gluster111 Current Load on gluster111 is CRITICAL: CRITICAL - load average: 4.08, 3.70, 3.16 [11:39:32] PROBLEM - db101 Current Load on db101 is WARNING: WARNING - load average: 7.36, 6.65, 6.80 [11:40:09] RECOVERY - gluster111 Current Load on gluster111 is OK: OK - load average: 2.46, 3.27, 3.07 [11:41:32] RECOVERY - db101 Current Load on db101 is OK: OK - load average: 6.45, 6.56, 6.75 [11:45:32] PROBLEM - db101 Current Load on db101 is WARNING: WARNING - load average: 7.07, 6.98, 6.88 [11:49:32] RECOVERY - db101 Current Load on db101 is OK: OK - load average: 5.93, 6.61, 6.78 [11:57:32] PROBLEM - db101 Current Load on db101 is CRITICAL: CRITICAL - load average: 8.36, 7.36, 6.96 [11:59:32] PROBLEM - db101 Current Load on db101 is WARNING: WARNING - load average: 6.87, 7.10, 6.91 [11:59:50] PROBLEM - gluster101 Current Load on gluster101 is CRITICAL: CRITICAL - load average: 4.41, 3.33, 2.97 [12:01:48] RECOVERY - gluster101 Current Load on gluster101 is OK: OK - load average: 2.85, 3.17, 2.97 [12:07:31] ok : [RESOLVED] (PHP-FPM Worker Usage High mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki [12:13:21] PROBLEM - dnd.bellinrattin.it - reverse DNS on sslhost is CRITICAL: rDNS CRITICAL - dnd.bellinrattin.it All nameservers failed to answer the 
query. [12:13:32] PROBLEM - db101 Current Load on db101 is CRITICAL: CRITICAL - load average: 9.37, 8.30, 7.47 [12:18:31] alerting : [FIRING:1] (PHP-FPM Worker Usage High mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki [12:23:32] PROBLEM - db101 Current Load on db101 is WARNING: WARNING - load average: 5.83, 7.67, 7.76 [12:23:37] PROBLEM - gluster101 Current Load on gluster101 is CRITICAL: CRITICAL - load average: 4.13, 3.42, 3.11 [12:25:01] PROBLEM - cp21 Current Load on cp21 is CRITICAL: CRITICAL - load average: 14.13, 8.62, 4.16 [12:25:35] PROBLEM - gluster101 Current Load on gluster101 is WARNING: WARNING - load average: 3.94, 3.69, 3.26 [12:27:34] PROBLEM - gluster101 Current Load on gluster101 is CRITICAL: CRITICAL - load average: 5.67, 4.43, 3.59 [12:28:31] ok : [RESOLVED] (PHP-FPM Worker Usage High mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki [12:29:48] PROBLEM - gluster111 Current Load on gluster111 is CRITICAL: CRITICAL - load average: 4.04, 3.67, 3.19 [12:30:11] PROBLEM - cp20 Current Load on cp20 is CRITICAL: CRITICAL - load average: 13.02, 7.59, 3.65 [12:31:32] PROBLEM - db101 Current Load on db101 is CRITICAL: CRITICAL - load average: 9.03, 8.02, 7.84 [12:31:48] RECOVERY - gluster111 Current Load on gluster111 is OK: OK - load average: 2.33, 3.29, 3.12 [12:33:28] PROBLEM - gluster101 Current Load on gluster101 is WARNING: WARNING - load average: 3.57, 3.93, 3.66 [12:38:01] alerting : [FIRING:1] (PHP-FPM Worker Usage High mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki [12:39:22] RECOVERY - gluster101 Current Load on gluster101 is OK: OK - load average: 2.29, 3.05, 3.37 [12:43:01] PROBLEM - cp21 Current Load on cp21 is WARNING: WARNING - load average: 0.66, 1.06, 1.99 [12:43:20] PROBLEM - dnd.bellinrattin.it - reverse DNS on sslhost is WARNING: Traceback (most recent call last): File "/usr/lib/nagios/plugins/check_reverse_dns.py", line 155, in main() File "/usr/lib/nagios/plugins/check_reverse_dns.py", line 136, in main records = check_records(args.hostname) File "/usr/lib/nagios/plugins/check_reverse_dns.py", line 73, in check_records nameserversans = dns_resolver.resolve(root_domain, 'NS') File "/usr/l [12:43:20] n3/dist-packages/dns/resolver.py", line 1040, in resolve (nameserver, port, tcp, backoff) = resolution.next_nameserver() File "/usr/lib/python3/dist-packages/dns/resolver.py", line 598, in next_nameserver raise NoNameservers(request=self.request, errors=self.errors)dns.resolver.NoNameservers: All nameservers failed to answer the query bellinrattin.it. 
IN NS: Server 2606:4700:4700::1111 UDP port 53 answered SERVFAIL [12:45:34] PROBLEM - cp20 Current Load on cp20 is WARNING: WARNING - load average: 0.62, 1.10, 1.95 [12:47:01] RECOVERY - cp21 Current Load on cp21 is OK: OK - load average: 0.63, 0.79, 1.65 [12:49:11] PROBLEM - gluster101 Current Load on gluster101 is WARNING: WARNING - load average: 3.42, 3.36, 3.34 [12:49:31] RECOVERY - cp20 Current Load on cp20 is OK: OK - load average: 1.00, 0.91, 1.67 [12:49:48] PROBLEM - gluster111 Current Load on gluster111 is WARNING: WARNING - load average: 3.61, 3.41, 3.07 [12:51:09] RECOVERY - gluster101 Current Load on gluster101 is OK: OK - load average: 2.64, 3.18, 3.28 [12:51:48] RECOVERY - gluster111 Current Load on gluster111 is OK: OK - load average: 2.13, 3.07, 2.99 [12:53:32] PROBLEM - db101 Current Load on db101 is WARNING: WARNING - load average: 6.26, 7.48, 7.85 [12:58:01] ok : [RESOLVED] (PHP-FPM Worker Usage High mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki [12:58:50] PROBLEM - gluster121 Current Load on gluster121 is WARNING: WARNING - load average: 3.61, 3.41, 3.00 [12:59:48] PROBLEM - gluster111 Current Load on gluster111 is CRITICAL: CRITICAL - load average: 5.23, 3.99, 3.33 [12:59:58] PROBLEM - gluster101 Current Load on gluster101 is WARNING: WARNING - load average: 3.70, 3.51, 3.38 [13:00:19] PROBLEM - mw101 Current Load on mw101 is WARNING: WARNING - load average: 10.26, 9.59, 9.03 [13:01:48] PROBLEM - gluster111 Current Load on gluster111 is WARNING: WARNING - load average: 3.52, 3.79, 3.34 [13:02:18] RECOVERY - mw101 Current Load on mw101 is OK: OK - load average: 10.19, 9.71, 9.14 [13:03:31] alerting : [FIRING:1] (PHP-FPM Worker Usage High mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki [13:03:48] PROBLEM - gluster111 Current Load on gluster111 is CRITICAL: CRITICAL - load average: 4.46, 3.99, 3.46 [13:03:54] PROBLEM - gluster101 Current Load on gluster101 is CRITICAL: CRITICAL - load average: 5.54, 4.45, 3.77 [13:05:48] PROBLEM - gluster111 Current Load on gluster111 is WARNING: WARNING - load average: 3.22, 3.65, 3.40 [13:07:48] RECOVERY - gluster111 Current Load on gluster111 is OK: OK - load average: 2.83, 3.36, 3.32 [13:08:31] ok : [RESOLVED] (PHP-FPM Worker Usage High mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki [13:09:49] PROBLEM - gluster101 Current Load on gluster101 is WARNING: WARNING - load average: 2.55, 3.64, 3.69 [13:11:48] PROBLEM - gluster111 Current Load on gluster111 is WARNING: WARNING - load average: 3.13, 3.51, 3.41 [13:12:15] PROBLEM - docse.tk - LetsEncrypt on sslhost is WARNING: WARNING - Certificate 'docse.tk' expires in 15 day(s) (Mon 25 Apr 2022 12:59:53 GMT +0000). [13:12:50] RECOVERY - gluster121 Current Load on gluster121 is OK: OK - load average: 2.28, 3.22, 3.28 [13:13:19] PROBLEM - dnd.bellinrattin.it - reverse DNS on sslhost is CRITICAL: rDNS CRITICAL - dnd.bellinrattin.it All nameservers failed to answer the query. 
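The Python traceback above is the sslhost reverse-DNS check (check_reverse_dns.py) failing inside dnspython while it resolves the NS records of the target's root domain; when every upstream server answers SERVFAIL, dnspython raises NoNameservers and the check reports CRITICAL/WARNING. A minimal reproduction of just that failing lookup, assuming dnspython 2.x (the rest of the plugin's logic is not reproduced here):

# Minimal reproduction of the failing step in the traceback above,
# assuming dnspython 2.x. A SERVFAIL from every configured resolver
# surfaces as dns.resolver.NoNameservers.
import dns.resolver

root_domain = "bellinrattin.it"
try:
    nameservers = dns.resolver.resolve(root_domain, "NS")
    print([str(rr) for rr in nameservers])
except dns.resolver.NoNameservers as exc:
    # e.g. "All nameservers failed to answer the query bellinrattin.it. IN NS: ..."
    print(f"rDNS CRITICAL - {exc}")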
[13:13:46] PROBLEM - gluster101 Current Load on gluster101 is CRITICAL: CRITICAL - load average: 4.04, 3.76, 3.72 [13:15:46] PROBLEM - gluster101 Current Load on gluster101 is WARNING: WARNING - load average: 2.92, 3.43, 3.61 [13:15:48] RECOVERY - gluster111 Current Load on gluster111 is OK: OK - load average: 2.99, 3.40, 3.40 [13:17:46] PROBLEM - gluster101 Current Load on gluster101 is CRITICAL: CRITICAL - load average: 5.69, 4.00, 3.77 [13:19:12] [miraheze/ssl] MirahezeSSLBot pushed 1 commit to master [+0/-0/±1] https://github.com/miraheze/ssl/compare/a6fa8f4a1ba4...48656f1705c7 [13:19:13] [miraheze/ssl] MirahezeSSLBot 48656f1 - Bot: Update SSL cert for docse.tk [13:19:32] PROBLEM - db101 Current Load on db101 is CRITICAL: CRITICAL - load average: 8.21, 7.37, 7.26 [13:19:46] PROBLEM - gluster101 Current Load on gluster101 is WARNING: WARNING - load average: 2.54, 3.42, 3.59 [13:19:48] PROBLEM - gluster111 Current Load on gluster111 is WARNING: WARNING - load average: 2.73, 3.54, 3.48 [13:21:32] PROBLEM - db101 Current Load on db101 is WARNING: WARNING - load average: 7.11, 7.21, 7.21 [13:21:48] RECOVERY - gluster111 Current Load on gluster111 is OK: OK - load average: 2.30, 3.20, 3.37 [13:23:01] alerting : [FIRING:1] (PHP-FPM Worker Usage High mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki [13:24:18] PROBLEM - id.altilunium.xyz - LetsEncrypt on sslhost is WARNING: WARNING - Certificate 'id.altilunium.xyz' expires in 15 day(s) (Mon 25 Apr 2022 13:00:35 GMT +0000). [13:25:46] PROBLEM - gluster101 Current Load on gluster101 is CRITICAL: CRITICAL - load average: 4.79, 3.56, 3.54 [13:27:23] [ssl] MacFan4000 closed pull request #502: T9054: Remove hypotheticalhurricanes.com - https://github.com/miraheze/ssl/pull/502 [13:27:24] [miraheze/ssl] MacFan4000 pushed 1 commit to master [+0/-1/±1] https://github.com/miraheze/ssl/compare/48656f1705c7...27d3b534ec1c [13:27:26] [miraheze/ssl] Universal-Omega 27d3b53 - T9054: Remove hypotheticalhurricanes.com (#502) [13:27:46] PROBLEM - gluster101 Current Load on gluster101 is WARNING: WARNING - load average: 2.82, 3.43, 3.51 [13:31:04] PROBLEM - mw101 Current Load on mw101 is CRITICAL: CRITICAL - load average: 12.12, 10.59, 9.87 [13:31:46] PROBLEM - gluster101 Current Load on gluster101 is CRITICAL: CRITICAL - load average: 4.80, 3.84, 3.64 [13:33:03] PROBLEM - mw101 Current Load on mw101 is WARNING: WARNING - load average: 10.59, 10.50, 9.92 [13:33:46] PROBLEM - gluster101 Current Load on gluster101 is WARNING: WARNING - load average: 3.75, 3.74, 3.62 [13:34:22] PROBLEM - gluster111 Current Load on gluster111 is CRITICAL: CRITICAL - load average: 4.82, 3.87, 3.48 [13:35:02] RECOVERY - mw101 Current Load on mw101 is OK: OK - load average: 8.69, 9.78, 9.73 [13:36:16] RECOVERY - gluster111 Current Load on gluster111 is OK: OK - load average: 2.13, 3.23, 3.29 [13:36:47] PROBLEM - notes.mridulpm.in - LetsEncrypt on sslhost is WARNING: WARNING - Certificate 'notes.mridulpm.in' expires in 15 day(s) (Mon 25 Apr 2022 13:23:29 GMT +0000). 
[13:37:32] RECOVERY - db101 Current Load on db101 is OK: OK - load average: 5.72, 6.12, 6.65 [13:38:04] [miraheze/ssl] MirahezeSSLBot pushed 1 commit to master [+0/-0/±1] https://github.com/miraheze/ssl/compare/27d3b534ec1c...81c6105097de [13:38:05] [miraheze/ssl] MirahezeSSLBot 81c6105 - Bot: Update SSL cert for notes.mridulpm.in [13:43:46] PROBLEM - gluster101 Current Load on gluster101 is CRITICAL: CRITICAL - load average: 4.62, 3.71, 3.58 [13:47:48] PROBLEM - gluster111 Current Load on gluster111 is CRITICAL: CRITICAL - load average: 5.22, 3.95, 3.48 [13:49:32] PROBLEM - db101 Current Load on db101 is WARNING: WARNING - load average: 6.78, 7.20, 7.00 [13:51:32] PROBLEM - db101 Current Load on db101 is CRITICAL: CRITICAL - load average: 9.04, 8.03, 7.33 [13:52:54] PROBLEM - gluster121 Current Load on gluster121 is CRITICAL: CRITICAL - load average: 4.62, 3.81, 3.43 [13:53:32] PROBLEM - db101 Current Load on db101 is WARNING: WARNING - load average: 6.28, 7.43, 7.20 [13:53:48] PROBLEM - gluster111 Current Load on gluster111 is WARNING: WARNING - load average: 3.61, 3.75, 3.53 [13:55:30] PROBLEM - wiki.alathramc.com - LetsEncrypt on sslhost is WARNING: WARNING - Certificate 'wiki.alathramc.com' expires in 15 day(s) (Mon 25 Apr 2022 13:31:38 GMT +0000). [13:55:32] PROBLEM - cp20 Current Load on cp20 is CRITICAL: CRITICAL - load average: 2.49, 1.54, 1.08 [13:55:48] PROBLEM - gluster111 Current Load on gluster111 is CRITICAL: CRITICAL - load average: 4.09, 3.81, 3.58 [13:56:50] RECOVERY - gluster121 Current Load on gluster121 is OK: OK - load average: 2.43, 3.37, 3.35 [13:56:52] [miraheze/ssl] MirahezeSSLBot pushed 1 commit to master [+0/-0/±1] https://github.com/miraheze/ssl/compare/81c6105097de...8f08b0066669 [13:56:54] [miraheze/ssl] MirahezeSSLBot 8f08b00 - Bot: Update SSL cert for wiki.alathramc.com [13:57:32] RECOVERY - cp20 Current Load on cp20 is OK: OK - load average: 1.23, 1.33, 1.05 [13:57:46] PROBLEM - gluster101 Current Load on gluster101 is WARNING: WARNING - load average: 3.07, 3.69, 3.88 [14:00:52] PROBLEM - gluster111 Current Load on gluster111 is WARNING: WARNING - load average: 3.58, 3.64, 3.57 [14:00:53] PROBLEM - db101 Current Load on db101 is CRITICAL: CRITICAL - load average: 9.11, 8.15, 7.58 [14:01:11] PROBLEM - gluster101 Current Load on gluster101 is CRITICAL: CRITICAL - load average: 4.54, 3.99, 3.94 [14:02:21] PROBLEM - vrcdev.wiki - LetsEncrypt on sslhost is WARNING: WARNING - Certificate 'vrcdev.wiki' expires in 15 day(s) (Mon 25 Apr 2022 13:39:05 GMT +0000). [14:02:39] PROBLEM - gluster121 Current Load on gluster121 is WARNING: WARNING - load average: 3.08, 3.39, 3.42 [14:05:03] PROBLEM - gluster101 Current Load on gluster101 is WARNING: WARNING - load average: 2.92, 3.83, 3.91 [14:05:30] [miraheze/ssl] MirahezeSSLBot pushed 1 commit to master [+0/-0/±1] https://github.com/miraheze/ssl/compare/8f08b0066669...5b07e720aa0c [14:05:31] [miraheze/ssl] MirahezeSSLBot 5b07e72 - Bot: Update SSL cert for vrcdev.wiki [14:05:35] RECOVERY - notes.mridulpm.in - LetsEncrypt on sslhost is OK: OK - Certificate 'notes.mridulpm.in' will expire on Fri 08 Jul 2022 12:37:58 GMT +0000. 
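The LetsEncrypt entries here follow one pattern: the sslhost check warns once a custom-domain certificate is within 15 days of expiry, MirahezeSSLBot then commits a renewed certificate to the ssl repository, and the check recovers with the new expiry date. The plugin itself is not shown in this log; below is only a rough sketch of the underlying expiry arithmetic, assuming a direct TLS handshake on port 443 and the 15-day threshold seen in the alerts above.

# Sketch of the "expires in N day(s)" calculation, not the actual sslhost plugin.
import socket
import ssl
from datetime import datetime, timezone

def days_until_expiry(hostname: str, port: int = 443) -> int:
    ctx = ssl.create_default_context()
    with socket.create_connection((hostname, port), timeout=10) as sock:
        with ctx.wrap_socket(sock, server_hostname=hostname) as tls:
            not_after = tls.getpeercert()["notAfter"]    # e.g. 'Jul  8 12:37:58 2022 GMT'
    expires_at = datetime.fromtimestamp(ssl.cert_time_to_seconds(not_after), tz=timezone.utc)
    return (expires_at - datetime.now(timezone.utc)).days

remaining = days_until_expiry("docse.tk")
state = "WARNING" if remaining <= 15 else "OK"           # 15-day warning threshold, as above
print(f"{state} - Certificate expires in {remaining} day(s)")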
[14:06:42] PROBLEM - gluster111 Current Load on gluster111 is CRITICAL: CRITICAL - load average: 4.46, 3.79, 3.64 [14:06:59] PROBLEM - gluster101 Current Load on gluster101 is CRITICAL: CRITICAL - load average: 4.41, 4.22, 4.05 [14:08:39] PROBLEM - gluster111 Current Load on gluster111 is WARNING: WARNING - load average: 3.00, 3.62, 3.60 [14:08:39] PROBLEM - db101 Current Load on db101 is WARNING: WARNING - load average: 6.33, 7.88, 7.82 [14:11:43] RECOVERY - docse.tk - LetsEncrypt on sslhost is OK: OK - Certificate 'docse.tk' will expire on Fri 08 Jul 2022 12:19:04 GMT +0000. [14:12:27] RECOVERY - gluster121 Current Load on gluster121 is OK: OK - load average: 2.35, 3.08, 3.31 [14:12:32] RECOVERY - gluster111 Current Load on gluster111 is OK: OK - load average: 2.46, 3.07, 3.38 [14:12:48] PROBLEM - gluster101 Current Load on gluster101 is WARNING: WARNING - load average: 3.49, 3.88, 3.97 [14:13:01] ok : [RESOLVED] (PHP-FPM Worker Usage High mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki [14:13:18] RECOVERY - dnd.bellinrattin.it - reverse DNS on sslhost is OK: SSL OK - dnd.bellinrattin.it reverse DNS resolves to cp20.miraheze.org - CNAME OK [14:14:29] PROBLEM - db101 Current Load on db101 is CRITICAL: CRITICAL - load average: 8.02, 7.76, 7.79 [14:14:44] PROBLEM - gluster101 Current Load on gluster101 is CRITICAL: CRITICAL - load average: 4.69, 4.47, 4.19 [14:16:25] PROBLEM - db101 Current Load on db101 is WARNING: WARNING - load average: 6.45, 7.43, 7.68 [14:17:52] PROBLEM - mw102 Current Load on mw102 is WARNING: WARNING - load average: 11.25, 9.79, 8.97 [14:21:31] alerting : [FIRING:1] (PHP-FPM Worker Usage High mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki [14:27:04] PROBLEM - gluster121 Current Load on gluster121 is WARNING: WARNING - load average: 3.98, 3.48, 3.36 [14:27:28] PROBLEM - mw102 Current Load on mw102 is CRITICAL: CRITICAL - load average: 12.54, 11.54, 10.22 [14:28:57] PROBLEM - gluster111 Current Load on gluster111 is WARNING: WARNING - load average: 3.25, 3.54, 3.44 [14:29:04] PROBLEM - gluster121 Current Load on gluster121 is CRITICAL: CRITICAL - load average: 4.37, 3.73, 3.46 [14:30:01] PROBLEM - db101 Current Load on db101 is CRITICAL: CRITICAL - load average: 8.10, 7.16, 7.30 [14:31:04] PROBLEM - gluster121 Current Load on gluster121 is WARNING: WARNING - load average: 2.80, 3.57, 3.45 [14:31:09] RECOVERY - vrcdev.wiki - LetsEncrypt on sslhost is OK: OK - Certificate 'vrcdev.wiki' will expire on Fri 08 Jul 2022 13:05:23 GMT +0000. 
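The recurring "alerting : [FIRING:1] (PHP-FPM Worker Usage High mediawiki)" lines come from a Grafana alert on how close the MediaWiki app servers are to exhausting their PHP-FPM worker pools. The actual Grafana/Prometheus rule is not visible in this log; purely as an illustration, the same ratio can be read from PHP-FPM's standard status page, assuming the status path is exposed over HTTP at a hypothetical URL and using a made-up 90% threshold.

# Illustration only: worker-usage ratio from PHP-FPM's standard status page
# (pm.status_path, JSON format). URL and threshold are hypothetical; the real
# alert in this log is a Grafana rule that is not shown here.
import json
import urllib.request

STATUS_URL = "http://127.0.0.1/fpm-status?json"   # hypothetical status endpoint
THRESHOLD = 0.90                                  # hypothetical alert threshold

with urllib.request.urlopen(STATUS_URL, timeout=5) as resp:
    status = json.load(resp)

usage = status["active processes"] / status["total processes"]
print(f"php-fpm worker usage: {usage:.0%}")
if usage >= THRESHOLD:
    print("alerting: PHP-FPM Worker Usage High")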
[14:31:58] PROBLEM - db101 Current Load on db101 is WARNING: WARNING - load average: 7.40, 7.43, 7.40 [14:32:57] PROBLEM - gluster111 Current Load on gluster111 is CRITICAL: CRITICAL - load average: 4.56, 4.13, 3.69 [14:33:04] PROBLEM - gluster121 Current Load on gluster121 is CRITICAL: CRITICAL - load average: 4.83, 4.00, 3.61 [14:33:14] PROBLEM - mw102 Current Load on mw102 is WARNING: WARNING - load average: 10.88, 11.47, 10.66 [14:34:57] PROBLEM - gluster111 Current Load on gluster111 is WARNING: WARNING - load average: 3.47, 3.82, 3.62 [14:36:57] PROBLEM - gluster111 Current Load on gluster111 is CRITICAL: CRITICAL - load average: 5.64, 4.61, 3.94 [14:39:04] PROBLEM - gluster121 Current Load on gluster121 is WARNING: WARNING - load average: 2.56, 3.64, 3.62 [14:41:51] PROBLEM - db101 Current Load on db101 is CRITICAL: CRITICAL - load average: 9.20, 7.97, 7.54 [14:42:57] PROBLEM - gluster111 Current Load on gluster111 is WARNING: WARNING - load average: 3.31, 3.82, 3.83 [14:43:04] RECOVERY - gluster121 Current Load on gluster121 is OK: OK - load average: 2.57, 3.02, 3.38 [14:43:51] PROBLEM - db101 Current Load on db101 is WARNING: WARNING - load average: 7.57, 7.63, 7.46 [14:48:38] RECOVERY - mw102 Current Load on mw102 is OK: OK - load average: 7.16, 8.93, 10.03 [14:48:41] PROBLEM - mw101 Current Load on mw101 is WARNING: WARNING - load average: 10.33, 9.90, 9.40 [14:49:51] PROBLEM - db101 Current Load on db101 is CRITICAL: CRITICAL - load average: 8.93, 7.73, 7.48 [14:51:51] PROBLEM - db101 Current Load on db101 is WARNING: WARNING - load average: 6.99, 7.48, 7.42 [14:52:38] RECOVERY - mw101 Current Load on mw101 is OK: OK - load average: 9.55, 9.94, 9.54 [14:54:47] RECOVERY - wiki.alathramc.com - LetsEncrypt on sslhost is OK: OK - Certificate 'wiki.alathramc.com' will expire on Fri 08 Jul 2022 12:56:47 GMT +0000. 
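Most of the volume in this log is the "Current Load" check flapping between WARNING, CRITICAL and OK as the 1-, 5- and 15-minute load averages cross per-host thresholds (db101 hovers around 7-9, the gluster hosts around 3-4, the cache proxies around 1-2). A sketch of that check_load-style comparison follows; the thresholds are hypothetical, since each host class here clearly uses its own.

# check_load-style sketch; the three numbers in every "load average: x, y, z"
# entry above are the 1-, 5- and 15-minute averages. Thresholds are hypothetical.
import os
import sys

WARN = (3.0, 3.0, 3.0)
CRIT = (4.0, 4.0, 4.0)

load = os.getloadavg()                              # (1 min, 5 min, 15 min)
if any(l >= c for l, c in zip(load, CRIT)):
    state, code = "CRITICAL", 2
elif any(l >= w for l, w in zip(load, WARN)):
    state, code = "WARNING", 1
else:
    state, code = "OK", 0

print(f"{state} - load average: {load[0]:.2f}, {load[1]:.2f}, {load[2]:.2f}")
sys.exit(code)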
[14:59:32] PROBLEM - gluster101 Current Load on gluster101 is WARNING: WARNING - load average: 2.21, 3.18, 3.77 [15:00:30] PROBLEM - mw101 Current Load on mw101 is CRITICAL: CRITICAL - load average: 12.32, 10.76, 9.98 [15:01:10] PROBLEM - cp30 Stunnel HTTP for mw101 on cp30 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:01:29] PROBLEM - cp20 Stunnel HTTP for mw101 on cp20 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:01:35] PROBLEM - mw101 MediaWiki Rendering on mw101 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:02:28] PROBLEM - mw101 Current Load on mw101 is WARNING: WARNING - load average: 11.10, 10.92, 10.14 [15:02:57] PROBLEM - gluster111 Current Load on gluster111 is CRITICAL: CRITICAL - load average: 4.60, 3.49, 3.57 [15:03:10] RECOVERY - cp30 Stunnel HTTP for mw101 on cp30 is OK: HTTP OK: HTTP/1.1 200 OK - 14562 bytes in 2.907 second response time [15:03:24] RECOVERY - cp20 Stunnel HTTP for mw101 on cp20 is OK: HTTP OK: HTTP/1.1 200 OK - 14562 bytes in 1.080 second response time [15:03:31] RECOVERY - mw101 MediaWiki Rendering on mw101 is OK: HTTP OK: HTTP/1.1 200 OK - 22334 bytes in 1.327 second response time [15:03:32] PROBLEM - gluster101 Current Load on gluster101 is CRITICAL: CRITICAL - load average: 4.19, 3.63, 3.79 [15:06:57] PROBLEM - gluster111 Current Load on gluster111 is WARNING: WARNING - load average: 3.20, 3.33, 3.48 [15:08:57] RECOVERY - gluster111 Current Load on gluster111 is OK: OK - load average: 2.23, 2.82, 3.26 [15:09:32] PROBLEM - gluster101 Current Load on gluster101 is WARNING: WARNING - load average: 2.96, 3.53, 3.75 [15:09:51] RECOVERY - db101 Current Load on db101 is OK: OK - load average: 5.59, 6.33, 6.80 [15:10:22] RECOVERY - mw101 Current Load on mw101 is OK: OK - load average: 7.70, 9.91, 10.09 [15:13:51] PROBLEM - db101 Current Load on db101 is WARNING: WARNING - load average: 7.47, 7.19, 7.07 [15:15:24] PROBLEM - cp21 Current Load on cp21 is WARNING: WARNING - load average: 0.68, 1.83, 1.48 [15:16:31] ok : [RESOLVED] (PHP-FPM Worker Usage High mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki [15:17:23] RECOVERY - cp21 Current Load on cp21 is OK: OK - load average: 0.43, 1.34, 1.34 [15:17:51] PROBLEM - db101 Current Load on db101 is CRITICAL: CRITICAL - load average: 8.93, 7.82, 7.33 [15:19:51] PROBLEM - db101 Current Load on db101 is WARNING: WARNING - load average: 5.86, 7.27, 7.20 [15:21:32] PROBLEM - gluster101 Current Load on gluster101 is CRITICAL: CRITICAL - load average: 4.17, 3.76, 3.73 [15:23:32] PROBLEM - gluster101 Current Load on gluster101 is WARNING: WARNING - load average: 3.15, 3.53, 3.65 [15:25:32] PROBLEM - gluster101 Current Load on gluster101 is CRITICAL: CRITICAL - load average: 5.13, 4.17, 3.86 [15:27:26] PROBLEM - gluster111 Current Load on gluster111 is CRITICAL: CRITICAL - load average: 4.05, 3.78, 3.34 [15:29:03] alerting : [FIRING:1] (PHP-FPM Worker Usage High mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki [15:31:06] PROBLEM - gluster121 Current Load on gluster121 is WARNING: WARNING - load average: 3.07, 3.52, 3.16 [15:33:04] RECOVERY - gluster121 Current Load on gluster121 is OK: OK - load average: 2.35, 3.30, 3.14 [15:33:16] RECOVERY - gluster111 Current Load on gluster111 is OK: OK - load average: 2.02, 3.27, 3.31 [15:33:32] PROBLEM - gluster101 Current Load on gluster101 is WARNING: WARNING - load average: 2.86, 3.64, 3.78 [15:35:51] PROBLEM - db101 Current Load on db101 is CRITICAL: CRITICAL - load average: 8.11, 7.86, 7.50 [15:39:32] PROBLEM - 
gluster101 Current Load on gluster101 is CRITICAL: CRITICAL - load average: 4.11, 3.50, 3.62 [15:39:52] PROBLEM - mw101 Current Load on mw101 is WARNING: WARNING - load average: 11.27, 10.55, 10.08 [15:41:32] PROBLEM - gluster101 Current Load on gluster101 is WARNING: WARNING - load average: 2.70, 3.22, 3.50 [15:41:50] RECOVERY - mw101 Current Load on mw101 is OK: OK - load average: 8.69, 10.15, 10.01 [15:43:32] PROBLEM - gluster101 Current Load on gluster101 is CRITICAL: CRITICAL - load average: 5.58, 3.77, 3.65 [15:49:32] PROBLEM - gluster101 Current Load on gluster101 is WARNING: WARNING - load average: 2.99, 3.65, 3.68 [15:50:57] PROBLEM - gluster111 Current Load on gluster111 is WARNING: WARNING - load average: 3.60, 3.58, 3.30 [15:52:57] RECOVERY - gluster111 Current Load on gluster111 is OK: OK - load average: 2.77, 3.29, 3.23 [15:57:32] RECOVERY - gluster101 Current Load on gluster101 is OK: OK - load average: 2.67, 2.93, 3.31 [16:00:32] PROBLEM - mw101 Current Load on mw101 is WARNING: WARNING - load average: 11.39, 10.86, 10.31
[16:01:09] [miraheze/puppet] paladox pushed 1 commit to paladox-patch-8 [+0/-0/±1] https://github.com/miraheze/puppet/commit/a678ce781e9b [16:01:11] [miraheze/puppet] paladox a678ce7 - Switch cp20/21/30/31 to rsyslog [16:01:12] [puppet] paladox created branch paladox-patch-8 - https://github.com/miraheze/puppet [16:01:14] [puppet] paladox opened pull request #2478: Switch cp20/21/30/31 to rsyslog - https://github.com/miraheze/puppet/pull/2478 [16:01:32] PROBLEM - gluster101 Current Load on gluster101 is CRITICAL: CRITICAL - load average: 6.13, 4.31, 3.76 [16:01:34] [miraheze/puppet] paladox pushed 1 commit to paladox-patch-8 [+0/-0/±1] https://github.com/miraheze/puppet/compare/a678ce781e9b...8582ef17471a [16:01:36] [miraheze/puppet] paladox 8582ef1 - Update cp21.yaml [16:01:37] [puppet] paladox synchronize pull request #2478: Switch cp20/21/30/31 to rsyslog - https://github.com/miraheze/puppet/pull/2478 [16:01:42] [miraheze/puppet] paladox pushed 1 commit to paladox-patch-8 [+0/-0/±1] https://github.com/miraheze/puppet/compare/8582ef17471a...1c4d0f4f684e [16:01:44] [miraheze/puppet] paladox 1c4d0f4 - Update cp30.yaml [16:01:45] [puppet] paladox synchronize pull request #2478: Switch cp20/21/30/31 to rsyslog - https://github.com/miraheze/puppet/pull/2478 [16:01:50] [miraheze/puppet] paladox pushed 1 commit to paladox-patch-8 [+0/-0/±1] https://github.com/miraheze/puppet/compare/1c4d0f4f684e...d295b68f5139 [16:01:51] [miraheze/puppet] paladox d295b68 - Update cp31.yaml [16:01:53] [puppet] paladox synchronize pull request #2478: Switch cp20/21/30/31 to rsyslog - https://github.com/miraheze/puppet/pull/2478 [16:01:58] [puppet] paladox closed pull request #2478: Switch cp20/21/30/31 to rsyslog - https://github.com/miraheze/puppet/pull/2478 [16:02:00] [miraheze/puppet] paladox pushed 1 commit to master [+0/-0/±4] https://github.com/miraheze/puppet/compare/885eb3d63a5e...2cd5705c4374 [16:02:01] [miraheze/puppet] paladox 2cd5705 - Switch cp20/21/30/31 to rsyslog (#2478) [16:02:03] [puppet] paladox deleted branch paladox-patch-8 - https://github.com/miraheze/puppet [16:02:04] [miraheze/puppet] paladox deleted branch paladox-patch-8
[16:03:36] [miraheze/puppet] paladox pushed 1 commit to paladox-patch-8 [+0/-0/±1] https://github.com/miraheze/puppet/commit/6c837dc6d617 [16:03:38] [miraheze/puppet] paladox 6c837dc - Switch mw* to rsyslog [16:03:39] [puppet] paladox created branch paladox-patch-8 - https://github.com/miraheze/puppet [16:03:41] [puppet] paladox opened pull request #2479: Switch mw* to rsyslog - https://github.com/miraheze/puppet/pull/2479 [16:03:54] [miraheze/puppet] paladox pushed 1 commit to paladox-patch-8 [+0/-0/±1] https://github.com/miraheze/puppet/compare/6c837dc6d617...97cfb00233e8 [16:03:56] [miraheze/puppet] paladox 97cfb00 - Update mw102.yaml [16:03:57] [puppet] paladox synchronize pull request #2479: Switch mw* to rsyslog - https://github.com/miraheze/puppet/pull/2479 [16:04:03] [miraheze/puppet] paladox pushed 1 commit to paladox-patch-8 [+0/-0/±1] https://github.com/miraheze/puppet/compare/97cfb00233e8...a9a9a456ef6b [16:04:05] [miraheze/puppet] paladox a9a9a45 - Update mw111.yaml [16:04:06] [puppet] paladox synchronize pull request #2479: Switch mw* to rsyslog - https://github.com/miraheze/puppet/pull/2479 [16:04:14] [miraheze/puppet] paladox pushed 1 commit to paladox-patch-8 [+0/-0/±1] https://github.com/miraheze/puppet/compare/a9a9a456ef6b...9acdf0b4720f [16:04:16] [miraheze/puppet] paladox 9acdf0b - Update mw112.yaml [16:04:17] [puppet] paladox synchronize pull request #2479: Switch mw* to rsyslog - https://github.com/miraheze/puppet/pull/2479 [16:04:24] [miraheze/puppet] paladox pushed 1 commit to paladox-patch-8 [+0/-0/±1] https://github.com/miraheze/puppet/compare/9acdf0b4720f...cbdbe7b5d1d5 [16:04:25] [miraheze/puppet] paladox cbdbe7b - Update mw121.yaml [16:04:27] [puppet] paladox synchronize pull request #2479: Switch mw* to rsyslog - https://github.com/miraheze/puppet/pull/2479 [16:04:29] RECOVERY - mw101 Current Load on mw101 is OK: OK - load average: 7.07, 9.53, 9.96 [16:04:36] [miraheze/puppet] paladox pushed 1 commit to paladox-patch-8 [+0/-0/±1] https://github.com/miraheze/puppet/compare/cbdbe7b5d1d5...f0f9b1f39f31 [16:04:37] [miraheze/puppet] paladox f0f9b1f - Update mw122.yaml [16:04:39] [puppet] paladox synchronize pull request #2479: Switch mw* to rsyslog - https://github.com/miraheze/puppet/pull/2479 [16:04:46] [miraheze/puppet] paladox pushed 1 commit to paladox-patch-8 [+0/-0/±1] https://github.com/miraheze/puppet/compare/f0f9b1f39f31...fb9d7685fd43 [16:04:48] [miraheze/puppet] paladox fb9d768 - Update mwtask111.yaml [16:04:49] [puppet] paladox synchronize pull request #2479: Switch mw* to rsyslog - https://github.com/miraheze/puppet/pull/2479 [16:05:32] PROBLEM - gluster101 Current Load on gluster101 is WARNING: WARNING - load average: 3.45, 3.93, 3.74 [16:06:15] [puppet] paladox closed pull request #2479: Switch mw* to rsyslog - https://github.com/miraheze/puppet/pull/2479 [16:06:16] [puppet] paladox deleted branch paladox-patch-8 - https://github.com/miraheze/puppet [16:06:18] [miraheze/puppet] paladox deleted branch paladox-patch-8 [16:06:19] [miraheze/puppet] paladox pushed 1 commit to master [+0/-0/±7] https://github.com/miraheze/puppet/compare/2cd5705c4374...f31696754849 [16:06:21] [miraheze/puppet] paladox f316967 - Switch mw* to rsyslog (#2479) [16:08:58] [miraheze/mw-config] paladox pushed 1 commit to master [+0/-0/±1] https://github.com/miraheze/mw-config/compare/e6712ae2a615...170e2b370f66 [16:08:59] [miraheze/mw-config] paladox 170e2b3 - Set wmgSyslogHandler to rsyslog for all [16:09:24] !log [@mwtask111] starting
deploy of {'config': True} to all [16:09:26] !log [paladox@mwtask111] starting deploy of {'pull': 'config', 'config': True} to all [16:09:31] PROBLEM - mw112 Current Load on mw112 is WARNING: WARNING - load average: 11.26, 10.12, 9.12 [16:09:33] PROBLEM - gluster101 Current Load on gluster101 is CRITICAL: CRITICAL - load average: 4.15, 3.78, 3.69 [16:09:37] !log [paladox@mwtask111] finished deploy of {'pull': 'config', 'config': True} to all - SUCCESS in 11s [16:09:38] !log [@mwtask111] finished deploy of {'config': True} to all - SUCCESS in 13s [16:09:42] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [16:09:53] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [16:10:00] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [16:10:05] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [16:10:09] miraheze/mw-config - paladox the build passed. [16:11:27] RECOVERY - mw112 Current Load on mw112 is OK: OK - load average: 9.02, 9.53, 9.01 [16:11:30] PROBLEM - gluster111 Current Load on gluster111 is CRITICAL: CRITICAL - load average: 4.69, 3.49, 3.22 [16:12:11] PROBLEM - cloud10 Puppet on cloud10 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): Service[ulogd2] [16:13:27] PROBLEM - gluster111 Current Load on gluster111 is WARNING: WARNING - load average: 3.63, 3.56, 3.28 [16:13:45] PROBLEM - gluster121 Current Load on gluster121 is WARNING: WARNING - load average: 3.75, 3.52, 3.22 [16:17:20] PROBLEM - gluster111 Current Load on gluster111 is CRITICAL: CRITICAL - load average: 4.01, 3.90, 3.48 [16:19:03] ok : [RESOLVED] (PHP-FPM Worker Usage High mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki [16:19:17] PROBLEM - gluster111 Current Load on gluster111 is WARNING: WARNING - load average: 3.95, 3.81, 3.49 [16:19:38] PROBLEM - gluster121 Current Load on gluster121 is CRITICAL: CRITICAL - load average: 4.85, 4.03, 3.51 [16:21:36] RECOVERY - gluster121 Current Load on gluster121 is OK: OK - load average: 2.17, 3.28, 3.30 [16:23:09] PROBLEM - mw101 Current Load on mw101 is WARNING: WARNING - load average: 9.97, 10.22, 10.10 [16:24:31] alerting : [FIRING:1] (PHP-FPM Worker Usage High mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki [16:25:06] PROBLEM - gluster111 Current Load on gluster111 is CRITICAL: CRITICAL - load average: 5.41, 4.11, 3.66 [16:25:07] RECOVERY - mw101 Current Load on mw101 is OK: OK - load average: 8.57, 9.82, 9.99 [16:27:09] !log puppet111: upgrade puppet-agent puppetdb puppetdb-termini puppetserver [16:27:18] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [16:27:27] PROBLEM - gluster121 Current Load on gluster121 is CRITICAL: CRITICAL - load average: 4.43, 3.79, 3.50 [16:29:00] PROBLEM - gluster111 Current Load on gluster111 is WARNING: WARNING - load average: 2.77, 3.37, 3.45 [16:29:32] PROBLEM - gluster101 Current Load on gluster101 is WARNING: WARNING - load average: 3.02, 3.70, 3.95 [16:30:21] paladox: feel free to reset config on test101 if you need to use beta for anything that your doing. If you do you may need to reset databases though by removing /srv/mediawiki/cache/*.json to fix the 404 error. 
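For context on the cache-reset advice above: the 404s clear once the stale cached wiki lists under /srv/mediawiki/cache are removed and MediaWiki regenerates them on the next request. A minimal sketch of that cleanup, assuming only the *.json files in that directory need to go (the path comes from the message above; everything else here is illustrative):

```python
#!/usr/bin/env python3
# Minimal sketch of the cache reset described above (assumption: the 404s
# are caused by stale *.json cache files under /srv/mediawiki/cache and it
# is safe to delete them so they are regenerated on demand).
import glob
import os

CACHE_DIR = "/srv/mediawiki/cache"  # path taken from the log message above

def clear_json_cache(cache_dir: str = CACHE_DIR) -> int:
    """Remove every *.json file in cache_dir; return how many were deleted."""
    removed = 0
    for path in glob.glob(os.path.join(cache_dir, "*.json")):
        os.remove(path)
        removed += 1
    return removed

if __name__ == "__main__":
    print(f"removed {clear_json_cache()} cached .json files")
```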
[16:30:57] RECOVERY - gluster111 Current Load on gluster111 is OK: OK - load average: 2.77, 3.21, 3.39 [16:31:01] PROBLEM - mw101 Current Load on mw101 is WARNING: WARNING - load average: 11.53, 10.37, 10.09 [16:31:22] RECOVERY - gluster121 Current Load on gluster121 is OK: OK - load average: 2.66, 3.34, 3.39 [16:34:58] RECOVERY - mw101 Current Load on mw101 is OK: OK - load average: 7.74, 9.40, 9.80 [16:35:32] PROBLEM - gluster101 Current Load on gluster101 is CRITICAL: CRITICAL - load average: 4.81, 3.85, 3.85 [16:35:53] PROBLEM - mw122 Puppet on mw122 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. It might be a dependency cycle. [16:36:02] PROBLEM - bast101 Puppet on bast101 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. It might be a dependency cycle. [16:36:40] ok [16:36:57] PROBLEM - gluster111 Current Load on gluster111 is WARNING: WARNING - load average: 3.00, 3.47, 3.43 [16:37:32] PROBLEM - gluster101 Current Load on gluster101 is WARNING: WARNING - load average: 3.63, 3.80, 3.83 [16:38:04] PROBLEM - mw111 Puppet on mw111 is CRITICAL: CRITICAL: Puppet has 743 failures. Last run 2 minutes ago with 743 failures. Failed resources (up to 3 shown): File[/etc/ssl/localcerts/wiki.meeusen.net.crt],File[/etc/ssl/private/wiki.meeusen.net.key],File[/etc/ssl/localcerts/speleo.wiki.crt],File[/etc/ssl/private/speleo.wiki.key] [16:39:32] PROBLEM - gluster101 Current Load on gluster101 is CRITICAL: CRITICAL - load average: 4.48, 4.00, 3.89 [16:40:11] RECOVERY - cloud10 Puppet on cloud10 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures [16:40:57] RECOVERY - gluster111 Current Load on gluster111 is OK: OK - load average: 2.90, 3.24, 3.34 [16:41:08] PROBLEM - mw112 Current Load on mw112 is WARNING: WARNING - load average: 11.22, 9.69, 9.03 [16:41:32] PROBLEM - gluster101 Current Load on gluster101 is WARNING: WARNING - load average: 3.07, 3.76, 3.82 [16:43:06] PROBLEM - gluster121 Current Load on gluster121 is WARNING: WARNING - load average: 3.21, 3.41, 3.29 [16:43:08] RECOVERY - mw112 Current Load on mw112 is OK: OK - load average: 8.81, 9.37, 9.00 [16:43:32] PROBLEM - gluster101 Current Load on gluster101 is CRITICAL: CRITICAL - load average: 4.07, 4.01, 3.91 [16:44:48] PROBLEM - mw101 Current Load on mw101 is WARNING: WARNING - load average: 10.32, 10.20, 9.95 [16:45:04] RECOVERY - gluster121 Current Load on gluster121 is OK: OK - load average: 3.07, 3.18, 3.21 [16:45:32] PROBLEM - gluster101 Current Load on gluster101 is WARNING: WARNING - load average: 3.53, 3.93, 3.89 [16:46:26] PROBLEM - dnd.bellinrattin.it - reverse DNS on sslhost is CRITICAL: rDNS CRITICAL - dnd.bellinrattin.it All nameservers failed to answer the query. 
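The "reverse DNS on sslhost" alerts above come from a plugin that resolves the custom domain and verifies it points at a Miraheze cache proxy; when the domain's nameservers answer SERVFAIL, dnspython raises NoNameservers, which is exactly what the tracebacks further down show. The sketch below is not the real check_reverse_dns.py: the expected CNAME suffix, the exact messages, and the error handling are assumptions based on the alert text.

```python
#!/usr/bin/env python3
# Simplified Nagios-style CNAME check in the spirit of the "reverse DNS on
# sslhost" alerts above. Not the real plugin; names and messages are assumed.
import sys
import dns.resolver

OK, WARNING, CRITICAL = 0, 1, 2
EXPECTED_SUFFIX = ".miraheze.org."  # assumption: healthy domains CNAME to a cp* host

def check_cname(hostname: str) -> int:
    try:
        answer = dns.resolver.resolve(hostname, "CNAME")
        target = str(answer[0].target)
    except dns.resolver.NoNameservers:
        # Same failure mode as the tracebacks in this log: every nameserver
        # that was tried answered SERVFAIL for the query.
        print(f"rDNS CRITICAL - {hostname} All nameservers failed to answer the query.")
        return CRITICAL
    except (dns.resolver.NXDOMAIN, dns.resolver.NoAnswer):
        print(f"rDNS WARNING - {hostname} has no CNAME record")
        return WARNING
    if target.endswith(EXPECTED_SUFFIX):
        print(f"SSL OK - {hostname} reverse DNS resolves to {target.rstrip('.')} - CNAME OK")
        return OK
    print(f"rDNS CRITICAL - {hostname} points at {target}, not a Miraheze cache proxy")
    return CRITICAL

if __name__ == "__main__":
    sys.exit(check_cname(sys.argv[1] if len(sys.argv) > 1 else "dnd.bellinrattin.it"))
```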
[16:46:46] RECOVERY - mw101 Current Load on mw101 is OK: OK - load average: 9.87, 9.82, 9.83 [16:50:41] PROBLEM - mw101 Current Load on mw101 is WARNING: WARNING - load average: 10.50, 10.54, 10.14 [16:51:08] PROBLEM - mw112 Current Load on mw112 is WARNING: WARNING - load average: 10.48, 9.90, 9.33 [16:51:32] PROBLEM - gluster101 Current Load on gluster101 is CRITICAL: CRITICAL - load average: 5.84, 4.51, 4.08 [16:51:51] PROBLEM - db101 Current Load on db101 is WARNING: WARNING - load average: 7.40, 7.61, 7.94 [16:53:08] RECOVERY - mw112 Current Load on mw112 is OK: OK - load average: 9.84, 9.93, 9.42 [16:53:51] PROBLEM - db101 Current Load on db101 is CRITICAL: CRITICAL - load average: 8.44, 7.73, 7.93 [16:55:04] PROBLEM - gluster121 Current Load on gluster121 is CRITICAL: CRITICAL - load average: 4.39, 3.74, 3.40 [16:55:30] PROBLEM - gluster111 Current Load on gluster111 is CRITICAL: CRITICAL - load average: 5.86, 4.19, 3.60 [16:55:51] PROBLEM - db101 Current Load on db101 is WARNING: WARNING - load average: 7.70, 7.53, 7.82 [16:57:26] PROBLEM - gluster111 Current Load on gluster111 is WARNING: WARNING - load average: 3.72, 3.97, 3.59 [16:58:35] RECOVERY - mw101 Current Load on mw101 is OK: OK - load average: 9.44, 9.77, 10.02 [16:59:04] PROBLEM - gluster121 Current Load on gluster121 is WARNING: WARNING - load average: 3.12, 3.48, 3.37 [16:59:23] PROBLEM - gluster111 Current Load on gluster111 is CRITICAL: CRITICAL - load average: 4.18, 3.87, 3.58 [17:01:04] RECOVERY - gluster121 Current Load on gluster121 is OK: OK - load average: 3.36, 3.27, 3.29 [17:01:20] PROBLEM - gluster111 Current Load on gluster111 is WARNING: WARNING - load average: 2.82, 3.39, 3.43 [17:01:51] PROBLEM - db101 Current Load on db101 is CRITICAL: CRITICAL - load average: 8.05, 7.77, 7.83 [17:02:30] PROBLEM - mw101 Current Load on mw101 is WARNING: WARNING - load average: 9.85, 10.25, 10.18 [17:04:02] RECOVERY - bast101 Puppet on bast101 is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures [17:05:32] PROBLEM - gluster101 Current Load on gluster101 is WARNING: WARNING - load average: 3.05, 3.74, 3.99 [17:05:51] PROBLEM - db101 Current Load on db101 is WARNING: WARNING - load average: 7.41, 7.58, 7.74 [17:05:53] RECOVERY - mw122 Puppet on mw122 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [17:06:26] RECOVERY - mw101 Current Load on mw101 is OK: OK - load average: 9.19, 9.88, 10.06 [17:06:39] RECOVERY - mw111 Puppet on mw111 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [17:07:32] PROBLEM - gluster101 Current Load on gluster101 is CRITICAL: CRITICAL - load average: 4.98, 4.02, 4.05 [17:09:51] PROBLEM - db101 Current Load on db101 is CRITICAL: CRITICAL - load average: 8.02, 7.49, 7.64 [17:11:32] PROBLEM - gluster101 Current Load on gluster101 is WARNING: WARNING - load average: 1.89, 3.27, 3.77 [17:11:51] PROBLEM - db101 Current Load on db101 is WARNING: WARNING - load average: 7.50, 7.37, 7.57 [17:12:38] PROBLEM - mw102 Current Load on mw102 is WARNING: WARNING - load average: 10.50, 10.38, 9.53 [17:12:59] RECOVERY - gluster111 Current Load on gluster111 is OK: OK - load average: 2.61, 3.11, 3.36 [17:13:51] PROBLEM - db101 Current Load on db101 is CRITICAL: CRITICAL - load average: 9.69, 8.08, 7.80 [17:14:38] RECOVERY - mw102 Current Load on mw102 is OK: OK - load average: 7.56, 9.26, 9.22 [17:16:06] PROBLEM - dnd.bellinrattin.it - reverse DNS on sslhost is WARNING: Traceback (most recent call last): File 
"/usr/lib/nagios/plugins/check_reverse_dns.py", line 155, in main() File "/usr/lib/nagios/plugins/check_reverse_dns.py", line 136, in main records = check_records(args.hostname) File "/usr/lib/nagios/plugins/check_reverse_dns.py", line 86, in check_records cname = str(dns_resolver.resolve(hostname, 'CNAME')[0]) File "/usr/li [17:16:06] 3/dist-packages/dns/resolver.py", line 1040, in resolve (nameserver, port, tcp, backoff) = resolution.next_nameserver() File "/usr/lib/python3/dist-packages/dns/resolver.py", line 598, in next_nameserver raise NoNameservers(request=self.request, errors=self.errors)dns.resolver.NoNameservers: All nameservers failed to answer the query dnd.bellinrattin.it. IN CNAME: Server 2606:4700:4700::1111 UDP port 53 answered SERVFAIL [17:17:32] RECOVERY - gluster101 Current Load on gluster101 is OK: OK - load average: 2.22, 2.75, 3.39 [17:21:12] PROBLEM - mw101 Current Load on mw101 is CRITICAL: CRITICAL - load average: 12.54, 10.97, 10.27 [17:23:10] PROBLEM - mw101 Current Load on mw101 is WARNING: WARNING - load average: 9.94, 10.59, 10.22 [17:24:31] ok : [RESOLVED] (PHP-FPM Worker Usage High mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki [17:31:03] PROBLEM - mw101 Current Load on mw101 is CRITICAL: CRITICAL - load average: 12.67, 11.05, 10.47 [17:32:01] alerting : [FIRING:1] (PHP-FPM Worker Usage High mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki [17:33:02] PROBLEM - mw101 Current Load on mw101 is WARNING: WARNING - load average: 9.91, 10.75, 10.44 [17:35:00] RECOVERY - mw101 Current Load on mw101 is OK: OK - load average: 8.06, 9.61, 10.05 [17:37:01] ok : [RESOLVED] (PHP-FPM Worker Usage High mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki [17:40:53] PROBLEM - mw101 Current Load on mw101 is WARNING: WARNING - load average: 10.56, 10.27, 10.20 [17:41:06] PROBLEM - mw102 Current Load on mw102 is WARNING: WARNING - load average: 11.20, 9.92, 9.33 [17:42:31] alerting : [FIRING:1] (PHP-FPM Worker Usage High mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki [17:42:52] RECOVERY - mw101 Current Load on mw101 is OK: OK - load average: 9.09, 9.95, 10.10 [17:43:02] RECOVERY - mw102 Current Load on mw102 is OK: OK - load average: 8.28, 9.24, 9.15 [17:45:46] RECOVERY - dnd.bellinrattin.it - reverse DNS on sslhost is OK: SSL OK - dnd.bellinrattin.it reverse DNS resolves to cp20.miraheze.org - CNAME OK [17:46:06] PROBLEM - gluster101 Current Load on gluster101 is WARNING: WARNING - load average: 3.90, 3.83, 3.36 [17:48:02] PROBLEM - gluster101 Current Load on gluster101 is CRITICAL: CRITICAL - load average: 7.10, 4.72, 3.72 [17:50:23] PROBLEM - gluster111 Current Load on gluster111 is WARNING: WARNING - load average: 3.18, 3.53, 3.21 [17:53:51] PROBLEM - db101 Current Load on db101 is WARNING: WARNING - load average: 5.37, 6.80, 7.80 [17:54:16] RECOVERY - gluster111 Current Load on gluster111 is OK: OK - load average: 2.75, 3.31, 3.22 [17:55:46] PROBLEM - gluster101 Current Load on gluster101 is WARNING: WARNING - load average: 2.73, 3.56, 3.62 [17:59:38] RECOVERY - gluster101 Current Load on gluster101 is OK: OK - load average: 2.80, 3.00, 3.36 [18:02:32] PROBLEM - mw101 Current Load on mw101 is WARNING: WARNING - load average: 11.47, 10.93, 10.24 [18:04:30] RECOVERY - mw101 Current Load on mw101 is OK: OK - load average: 6.14, 9.27, 9.73 [18:06:17] PROBLEM - cp31 Stunnel HTTP for mw101 on cp31 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:07:04] PROBLEM - mw101 MediaWiki Rendering on mw101 is CRITICAL: 
CRITICAL - Socket timeout after 10 seconds [18:08:18] RECOVERY - cp31 Stunnel HTTP for mw101 on cp31 is OK: HTTP OK: HTTP/1.1 200 OK - 14556 bytes in 1.056 second response time [18:09:01] RECOVERY - mw101 MediaWiki Rendering on mw101 is OK: HTTP OK: HTTP/1.1 200 OK - 22334 bytes in 2.264 second response time [18:09:04] PROBLEM - test101 Current Load on test101 is CRITICAL: CRITICAL - load average: 2.17, 1.67, 1.28 [18:09:51] PROBLEM - db101 Current Load on db101 is CRITICAL: CRITICAL - load average: 11.44, 8.79, 7.89 [18:13:51] PROBLEM - db101 Current Load on db101 is WARNING: WARNING - load average: 7.44, 8.00, 7.76 [18:14:57] PROBLEM - gluster111 Current Load on gluster111 is CRITICAL: CRITICAL - load average: 4.12, 3.43, 3.08 [18:15:13] PROBLEM - gluster101 Current Load on gluster101 is CRITICAL: CRITICAL - load average: 5.26, 4.25, 3.67 [18:15:38] PROBLEM - gluster121 Current Load on gluster121 is WARNING: WARNING - load average: 3.63, 3.54, 3.15 [18:16:57] PROBLEM - gluster111 Current Load on gluster111 is WARNING: WARNING - load average: 3.27, 3.51, 3.17 [18:17:22] PROBLEM - dnd.bellinrattin.it - reverse DNS on sslhost is WARNING: Traceback (most recent call last): File "/usr/lib/nagios/plugins/check_reverse_dns.py", line 155, in main() File "/usr/lib/nagios/plugins/check_reverse_dns.py", line 136, in main records = check_records(args.hostname) File "/usr/lib/nagios/plugins/check_reverse_dns.py", line 73, in check_records nameserversans = dns_resolver.resolve(root_domain, 'NS') File "/usr/l [18:17:22] n3/dist-packages/dns/resolver.py", line 1040, in resolve (nameserver, port, tcp, backoff) = resolution.next_nameserver() File "/usr/lib/python3/dist-packages/dns/resolver.py", line 598, in next_nameserver raise NoNameservers(request=self.request, errors=self.errors)dns.resolver.NoNameservers: All nameservers failed to answer the query bellinrattin.it. 
IN NS: Server 2606:4700:4700::1111 UDP port 53 answered SERVFAIL [18:17:36] RECOVERY - gluster121 Current Load on gluster121 is OK: OK - load average: 2.76, 3.28, 3.10 [18:18:57] RECOVERY - gluster111 Current Load on gluster111 is OK: OK - load average: 2.83, 3.36, 3.16 [18:24:54] PROBLEM - gluster101 Current Load on gluster101 is WARNING: WARNING - load average: 2.59, 3.61, 3.66 [18:26:50] PROBLEM - gluster101 Current Load on gluster101 is CRITICAL: CRITICAL - load average: 7.11, 4.48, 3.95 [18:28:57] PROBLEM - gluster111 Current Load on gluster111 is CRITICAL: CRITICAL - load average: 4.71, 4.05, 3.52 [18:29:20] PROBLEM - gluster121 Current Load on gluster121 is WARNING: WARNING - load average: 3.43, 3.23, 3.08 [18:31:40] PROBLEM - cp31 Current Load on cp31 is CRITICAL: CRITICAL - load average: 2.65, 2.22, 1.33 [18:31:51] RECOVERY - db101 Current Load on db101 is OK: OK - load average: 6.14, 5.92, 6.68 [18:32:57] PROBLEM - gluster111 Current Load on gluster111 is WARNING: WARNING - load average: 3.44, 3.78, 3.55 [18:33:40] PROBLEM - cp31 Current Load on cp31 is WARNING: WARNING - load average: 0.81, 1.73, 1.26 [18:34:34] PROBLEM - gluster101 Current Load on gluster101 is WARNING: WARNING - load average: 2.97, 3.96, 3.97 [18:35:13] RECOVERY - gluster121 Current Load on gluster121 is OK: OK - load average: 2.48, 3.11, 3.12 [18:35:40] RECOVERY - cp31 Current Load on cp31 is OK: OK - load average: 0.71, 1.37, 1.18 [18:36:30] PROBLEM - gluster101 Current Load on gluster101 is CRITICAL: CRITICAL - load average: 4.69, 4.48, 4.17 [18:36:57] PROBLEM - gluster111 Current Load on gluster111 is CRITICAL: CRITICAL - load average: 4.38, 3.90, 3.63 [18:38:57] PROBLEM - gluster111 Current Load on gluster111 is WARNING: WARNING - load average: 3.23, 3.73, 3.60 [18:40:23] PROBLEM - gluster101 Current Load on gluster101 is WARNING: WARNING - load average: 3.73, 3.93, 3.99 [18:40:57] PROBLEM - gluster111 Current Load on gluster111 is CRITICAL: CRITICAL - load average: 7.34, 4.76, 3.97 [18:42:19] PROBLEM - gluster101 Current Load on gluster101 is CRITICAL: CRITICAL - load average: 5.24, 4.66, 4.26 [18:43:04] PROBLEM - gluster121 Current Load on gluster121 is WARNING: WARNING - load average: 2.91, 3.62, 3.39 [18:45:04] PROBLEM - gluster121 Current Load on gluster121 is CRITICAL: CRITICAL - load average: 4.76, 3.92, 3.51 [18:46:57] PROBLEM - gluster111 Current Load on gluster111 is WARNING: WARNING - load average: 3.29, 3.85, 3.81 [18:47:04] PROBLEM - gluster121 Current Load on gluster121 is WARNING: WARNING - load average: 2.56, 3.46, 3.40 [18:49:04] RECOVERY - gluster121 Current Load on gluster121 is OK: OK - load average: 2.09, 3.08, 3.27 [18:49:08] PROBLEM - mw112 Current Load on mw112 is WARNING: WARNING - load average: 10.22, 9.66, 9.21 [18:50:57] PROBLEM - gluster111 Current Load on gluster111 is CRITICAL: CRITICAL - load average: 4.77, 3.82, 3.77 [18:54:57] PROBLEM - gluster111 Current Load on gluster111 is WARNING: WARNING - load average: 3.31, 3.36, 3.60 [18:55:08] RECOVERY - mw112 Current Load on mw112 is OK: OK - load average: 8.99, 9.63, 9.40 [18:55:52] PROBLEM - gluster101 Current Load on gluster101 is WARNING: WARNING - load average: 2.91, 3.44, 3.87 [18:57:31] ok : [RESOLVED] (PHP-FPM Worker Usage High mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki [18:58:57] PROBLEM - gluster111 Current Load on gluster111 is CRITICAL: CRITICAL - load average: 4.41, 3.81, 3.71 [18:59:08] PROBLEM - mw112 Current Load on mw112 is WARNING: WARNING - load average: 10.52, 10.27, 9.71 [19:00:57] 
PROBLEM - gluster111 Current Load on gluster111 is WARNING: WARNING - load average: 2.23, 3.36, 3.57 [19:01:08] RECOVERY - mw112 Current Load on mw112 is OK: OK - load average: 7.53, 9.30, 9.43 [19:02:57] PROBLEM - gluster111 Current Load on gluster111 is CRITICAL: CRITICAL - load average: 4.16, 3.74, 3.68 [19:03:36] PROBLEM - gluster101 Current Load on gluster101 is CRITICAL: CRITICAL - load average: 4.67, 3.99, 3.92 [19:04:57] PROBLEM - gluster111 Current Load on gluster111 is WARNING: WARNING - load average: 3.72, 3.81, 3.71 [19:05:01] alerting : [FIRING:1] (PHP-FPM Worker Usage High mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki [19:07:32] PROBLEM - gluster101 Current Load on gluster101 is WARNING: WARNING - load average: 2.40, 3.41, 3.73 [19:08:57] PROBLEM - gluster111 Current Load on gluster111 is CRITICAL: CRITICAL - load average: 4.69, 3.90, 3.75 [19:10:57] PROBLEM - gluster111 Current Load on gluster111 is WARNING: WARNING - load average: 3.51, 3.74, 3.71 [19:11:32] PROBLEM - gluster101 Current Load on gluster101 is CRITICAL: CRITICAL - load average: 4.45, 3.89, 3.85 [19:13:32] PROBLEM - gluster101 Current Load on gluster101 is WARNING: WARNING - load average: 3.21, 3.78, 3.82 [19:14:57] RECOVERY - gluster111 Current Load on gluster111 is OK: OK - load average: 1.90, 2.86, 3.36 [19:15:01] ok : [RESOLVED] (PHP-FPM Worker Usage High mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki [19:16:42] PROBLEM - dnd.bellinrattin.it - reverse DNS on sslhost is CRITICAL: rDNS CRITICAL - dnd.bellinrattin.it All nameservers failed to answer the query. [19:24:36] PROBLEM - db101 Current Load on db101 is WARNING: WARNING - load average: 7.87, 6.90, 6.09 [19:26:32] PROBLEM - db101 Current Load on db101 is CRITICAL: CRITICAL - load average: 8.01, 7.36, 6.37 [19:27:32] RECOVERY - gluster101 Current Load on gluster101 is OK: OK - load average: 2.60, 2.97, 3.35 [19:28:29] PROBLEM - db101 Current Load on db101 is WARNING: WARNING - load average: 6.06, 6.97, 6.34 [19:30:26] RECOVERY - db101 Current Load on db101 is OK: OK - load average: 5.25, 6.27, 6.15 [19:33:04] PROBLEM - test101 Current Load on test101 is WARNING: WARNING - load average: 1.26, 1.60, 2.00 [19:33:31] alerting : [FIRING:1] (PHP-FPM Worker Usage High mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki [19:35:32] PROBLEM - gluster101 Current Load on gluster101 is WARNING: WARNING - load average: 3.64, 3.50, 3.48 [19:37:32] PROBLEM - gluster101 Current Load on gluster101 is CRITICAL: CRITICAL - load average: 4.03, 3.49, 3.46 [19:38:31] ok : [RESOLVED] (PHP-FPM Worker Usage High mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki [19:39:04] RECOVERY - test101 Current Load on test101 is OK: OK - load average: 0.95, 1.16, 1.67 [19:39:32] RECOVERY - gluster101 Current Load on gluster101 is OK: OK - load average: 2.17, 2.90, 3.25 [19:39:40] PROBLEM - cp31 Current Load on cp31 is WARNING: WARNING - load average: 1.32, 1.73, 1.29 [19:41:40] RECOVERY - cp31 Current Load on cp31 is OK: OK - load average: 0.71, 1.36, 1.21 [19:46:21] RECOVERY - dnd.bellinrattin.it - reverse DNS on sslhost is OK: SSL OK - dnd.bellinrattin.it reverse DNS resolves to cp21.miraheze.org - CNAME OK [19:47:08] PROBLEM - cp20 Current Load on cp20 is CRITICAL: CRITICAL - load average: 7.22, 6.64, 3.22 [19:52:07] PROBLEM - cp30 Current Load on cp30 is WARNING: WARNING - load average: 1.73, 1.54, 1.16 [19:54:07] RECOVERY - cp30 Current Load on cp30 is OK: OK - load average: 0.92, 1.32, 1.13 [19:54:31] alerting : [FIRING:1] (PHP-FPM 
Worker Usage High mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki [19:56:57] PROBLEM - gluster111 Current Load on gluster111 is CRITICAL: CRITICAL - load average: 6.92, 5.04, 3.76 [19:57:06] PROBLEM - gluster101 Current Load on gluster101 is WARNING: WARNING - load average: 3.73, 3.81, 3.45 [19:57:57] PROBLEM - gluster121 Current Load on gluster121 is WARNING: WARNING - load average: 2.99, 3.54, 3.20 [19:59:54] PROBLEM - gluster121 Current Load on gluster121 is CRITICAL: CRITICAL - load average: 4.48, 3.63, 3.26 [20:00:58] PROBLEM - gluster101 Current Load on gluster101 is CRITICAL: CRITICAL - load average: 6.85, 4.57, 3.78 [20:03:08] PROBLEM - cp20 Current Load on cp20 is WARNING: WARNING - load average: 0.58, 1.37, 2.00 [20:07:08] RECOVERY - cp20 Current Load on cp20 is OK: OK - load average: 0.55, 0.98, 1.69 [20:11:41] PROBLEM - gluster121 Current Load on gluster121 is WARNING: WARNING - load average: 2.85, 3.74, 3.69 [20:12:57] PROBLEM - gluster111 Current Load on gluster111 is WARNING: WARNING - load average: 2.83, 3.60, 3.90 [20:13:38] PROBLEM - gluster121 Current Load on gluster121 is CRITICAL: CRITICAL - load average: 4.04, 3.76, 3.69 [20:14:57] PROBLEM - gluster111 Current Load on gluster111 is CRITICAL: CRITICAL - load average: 5.78, 4.69, 4.27 [20:19:31] PROBLEM - gluster121 Current Load on gluster121 is WARNING: WARNING - load average: 2.79, 3.82, 3.82 [20:27:22] PROBLEM - gluster121 Current Load on gluster121 is CRITICAL: CRITICAL - load average: 4.79, 3.67, 3.63 [20:28:57] PROBLEM - gluster111 Current Load on gluster111 is WARNING: WARNING - load average: 3.67, 3.65, 4.00 [20:31:18] PROBLEM - gluster121 Current Load on gluster121 is WARNING: WARNING - load average: 2.75, 3.35, 3.53 [20:33:15] RECOVERY - gluster121 Current Load on gluster121 is OK: OK - load average: 1.75, 2.75, 3.29 [20:38:57] PROBLEM - gluster111 Current Load on gluster111 is CRITICAL: CRITICAL - load average: 4.33, 3.35, 3.52 [20:41:04] PROBLEM - gluster121 Current Load on gluster121 is CRITICAL: CRITICAL - load average: 4.29, 3.72, 3.45 [20:41:37] PROBLEM - gluster101 Current Load on gluster101 is WARNING: WARNING - load average: 2.69, 3.62, 4.00 [20:42:57] PROBLEM - gluster111 Current Load on gluster111 is WARNING: WARNING - load average: 3.07, 3.38, 3.51 [20:43:04] RECOVERY - gluster121 Current Load on gluster121 is OK: OK - load average: 2.16, 3.17, 3.28 [20:46:57] PROBLEM - gluster111 Current Load on gluster111 is CRITICAL: CRITICAL - load average: 4.38, 3.69, 3.59 [20:48:57] PROBLEM - gluster111 Current Load on gluster111 is WARNING: WARNING - load average: 3.08, 3.35, 3.47 [20:49:31] ok : [RESOLVED] (PHP-FPM Worker Usage High mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki [20:50:57] RECOVERY - gluster111 Current Load on gluster111 is OK: OK - load average: 2.88, 3.07, 3.34 [20:54:40] alerting : [FIRING:1] (!sre High Job Queue Backlog yes mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki [20:59:37] ok : [RESOLVED] (!sre High Job Queue Backlog yes mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki [21:01:32] RECOVERY - gluster101 Current Load on gluster101 is OK: OK - load average: 2.51, 2.70, 3.34 [21:01:51] PROBLEM - db101 Current Load on db101 is WARNING: WARNING - load average: 6.85, 6.29, 5.46 [21:02:01] alerting : [FIRING:1] (PHP-FPM Worker Usage High mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki [21:03:51] RECOVERY - db101 Current Load on db101 is OK: OK - load average: 5.08, 6.14, 5.53 [21:05:32] PROBLEM - gluster101 Current 
Load on gluster101 is WARNING: WARNING - load average: 2.98, 3.32, 3.49 [21:09:36] PROBLEM - franchise.franchising.org.ua - reverse DNS on sslhost is WARNING: Traceback (most recent call last): File "/usr/lib/nagios/plugins/check_reverse_dns.py", line 155, in main() File "/usr/lib/nagios/plugins/check_reverse_dns.py", line 136, in main records = check_records(args.hostname) File "/usr/lib/nagios/plugins/check_reverse_dns.py", line 73, in check_records nameserversans = dns_resolver.resolve(root_domain, 'NS') Fil [21:09:36] lib/python3/dist-packages/dns/resolver.py", line 1040, in resolve (nameserver, port, tcp, backoff) = resolution.next_nameserver() File "/usr/lib/python3/dist-packages/dns/resolver.py", line 598, in next_nameserver raise NoNameservers(request=self.request, errors=self.errors)dns.resolver.NoNameservers: All nameservers failed to answer the query franchising.org.ua. IN NS: Server 2606:4700:4700::1111 UDP port 53 answered SERVFAIL [21:11:32] RECOVERY - gluster101 Current Load on gluster101 is OK: OK - load average: 2.63, 3.13, 3.36 [21:15:32] PROBLEM - gluster101 Current Load on gluster101 is CRITICAL: CRITICAL - load average: 6.00, 4.34, 3.78 [21:17:02] PROBLEM - gluster111 Current Load on gluster111 is WARNING: WARNING - load average: 3.61, 3.59, 3.37 [21:22:01] ok : [RESOLVED] (PHP-FPM Worker Usage High mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki [21:22:57] RECOVERY - gluster111 Current Load on gluster111 is OK: OK - load average: 3.12, 3.32, 3.34 [21:23:32] PROBLEM - gluster101 Current Load on gluster101 is WARNING: WARNING - load average: 2.53, 3.53, 3.66 [21:27:32] PROBLEM - gluster101 Current Load on gluster101 is CRITICAL: CRITICAL - load average: 4.12, 3.59, 3.65 [21:28:57] PROBLEM - gluster111 Current Load on gluster111 is WARNING: WARNING - load average: 3.45, 3.38, 3.32 [21:30:57] PROBLEM - gluster111 Current Load on gluster111 is CRITICAL: CRITICAL - load average: 5.58, 4.14, 3.60 [21:31:34] PROBLEM - gluster121 Current Load on gluster121 is CRITICAL: CRITICAL - load average: 4.02, 3.52, 3.20 [21:33:31] PROBLEM - gluster121 Current Load on gluster121 is WARNING: WARNING - load average: 3.75, 3.62, 3.28 [21:34:57] PROBLEM - gluster111 Current Load on gluster111 is WARNING: WARNING - load average: 3.36, 3.80, 3.60 [21:35:29] RECOVERY - gluster121 Current Load on gluster121 is OK: OK - load average: 2.65, 3.27, 3.19 [21:35:32] PROBLEM - gluster101 Current Load on gluster101 is WARNING: WARNING - load average: 3.08, 3.81, 3.79 [21:36:31] alerting : [FIRING:1] (PHP-FPM Worker Usage High mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki [21:38:28] RECOVERY - franchise.franchising.org.ua - reverse DNS on sslhost is OK: SSL OK - franchise.franchising.org.ua reverse DNS resolves to cp21.miraheze.org - CNAME OK [21:38:57] RECOVERY - gluster111 Current Load on gluster111 is OK: OK - load average: 2.86, 3.21, 3.40 [21:39:32] PROBLEM - gluster101 Current Load on gluster101 is CRITICAL: CRITICAL - load average: 4.05, 3.82, 3.78 [21:41:32] PROBLEM - gluster101 Current Load on gluster101 is WARNING: WARNING - load average: 3.00, 3.71, 3.76 [21:45:13] PROBLEM - gluster121 Current Load on gluster121 is WARNING: WARNING - load average: 3.31, 3.41, 3.31 [21:47:10] RECOVERY - gluster121 Current Load on gluster121 is OK: OK - load average: 2.07, 2.85, 3.12 [21:47:55] PROBLEM - dnd.bellinrattin.it - reverse DNS on sslhost is WARNING: Traceback (most recent call last): File "/usr/lib/nagios/plugins/check_reverse_dns.py", line 155, in main() File 
"/usr/lib/nagios/plugins/check_reverse_dns.py", line 136, in main records = check_records(args.hostname) File "/usr/lib/nagios/plugins/check_reverse_dns.py", line 73, in check_records nameserversans = dns_resolver.resolve(root_domain, 'NS') File "/usr/l [21:47:55] n3/dist-packages/dns/resolver.py", line 1040, in resolve (nameserver, port, tcp, backoff) = resolution.next_nameserver() File "/usr/lib/python3/dist-packages/dns/resolver.py", line 598, in next_nameserver raise NoNameservers(request=self.request, errors=self.errors)dns.resolver.NoNameservers: All nameservers failed to answer the query bellinrattin.it. IN NS: Server 2606:4700:4700::1111 UDP port 53 answered SERVFAIL [21:51:31] ok : [RESOLVED] (PHP-FPM Worker Usage High mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki [21:51:36] PROBLEM - gluster111 Current Load on gluster111 is CRITICAL: CRITICAL - load average: 4.36, 3.39, 3.29 [21:53:32] RECOVERY - gluster101 Current Load on gluster101 is OK: OK - load average: 2.86, 2.99, 3.37 [21:53:33] RECOVERY - gluster111 Current Load on gluster111 is OK: OK - load average: 2.82, 3.17, 3.22 [21:59:19] PROBLEM - gluster111 Current Load on gluster111 is WARNING: WARNING - load average: 3.95, 3.61, 3.38 [22:00:31] alerting : [FIRING:1] (PHP-FPM Worker Usage High mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki [22:01:16] RECOVERY - gluster111 Current Load on gluster111 is OK: OK - load average: 2.62, 3.35, 3.32 [22:01:32] PROBLEM - gluster101 Current Load on gluster101 is CRITICAL: CRITICAL - load average: 4.26, 3.83, 3.61 [22:03:32] PROBLEM - gluster101 Current Load on gluster101 is WARNING: WARNING - load average: 3.70, 3.80, 3.63 [22:05:31] ok : [RESOLVED] (PHP-FPM Worker Usage High mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki [22:07:32] PROBLEM - gluster101 Current Load on gluster101 is CRITICAL: CRITICAL - load average: 4.11, 3.86, 3.66 [22:11:01] alerting : [FIRING:1] (PHP-FPM Worker Usage High mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki [22:11:32] PROBLEM - gluster101 Current Load on gluster101 is WARNING: WARNING - load average: 3.67, 3.75, 3.67 [22:11:57] PROBLEM - gluster111 Current Load on gluster111 is CRITICAL: CRITICAL - load average: 4.38, 3.64, 3.43 [22:13:53] PROBLEM - gluster111 Current Load on gluster111 is WARNING: WARNING - load average: 3.64, 3.60, 3.44 [22:15:32] PROBLEM - gluster101 Current Load on gluster101 is CRITICAL: CRITICAL - load average: 5.32, 4.05, 3.77 [22:17:35] PROBLEM - dnd.bellinrattin.it - reverse DNS on sslhost is CRITICAL: rDNS CRITICAL - dnd.bellinrattin.it All nameservers failed to answer the query. 
[22:17:46] PROBLEM - gluster111 Current Load on gluster111 is CRITICAL: CRITICAL - load average: 4.66, 4.11, 3.67 [22:19:32] PROBLEM - gluster101 Current Load on gluster101 is WARNING: WARNING - load average: 3.27, 3.97, 3.84 [22:19:43] PROBLEM - gluster111 Current Load on gluster111 is WARNING: WARNING - load average: 2.69, 3.55, 3.51 [22:21:01] ok : [RESOLVED] (PHP-FPM Worker Usage High mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki [22:23:36] RECOVERY - gluster111 Current Load on gluster111 is OK: OK - load average: 2.87, 3.25, 3.39 [22:25:32] RECOVERY - gluster101 Current Load on gluster101 is OK: OK - load average: 1.68, 2.82, 3.37 [22:28:06] alerting : [FIRING:1] (!sre High Job Queue Backlog yes mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki
[22:34:20] [miraheze/landing] Universal-Omega pushed 1 commit to master [+0/-0/±1] https://github.com/miraheze/landing/compare/27da9cff965a...d4aed874c240 [22:34:22] [miraheze/landing] Universal-Omega d4aed87 - Update continuousIntegration.yml [22:35:12] [miraheze/ErrorPages] Universal-Omega pushed 1 commit to master [+0/-0/±1] https://github.com/miraheze/ErrorPages/compare/162b6c39c392...c7634bfb5e2b [22:35:13] [miraheze/ErrorPages] Universal-Omega c7634bf - Update continuousIntegration.yml [22:35:14] miraheze/landing - Universal-Omega the build passed. [22:36:03] miraheze/ErrorPages - Universal-Omega the build passed.
[22:39:04] PROBLEM - gluster121 Current Load on gluster121 is CRITICAL: CRITICAL - load average: 4.09, 3.23, 2.98 [22:41:04] RECOVERY - gluster121 Current Load on gluster121 is OK: OK - load average: 1.99, 2.77, 2.85 [22:43:06] alerting : [FIRING:1] (mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki [22:47:15] RECOVERY - dnd.bellinrattin.it - reverse DNS on sslhost is OK: SSL OK - dnd.bellinrattin.it reverse DNS resolves to cp20.miraheze.org - CNAME OK [22:52:09] PROBLEM - gluster101 Current Load on gluster101 is WARNING: WARNING - load average: 3.70, 3.46, 3.23 [22:53:06] ok : [RESOLVED] (!sre High Job Queue Backlog yes mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki [22:54:05] PROBLEM - gluster101 Current Load on gluster101 is CRITICAL: CRITICAL - load average: 4.26, 3.53, 3.27 [22:57:04] PROBLEM - gluster121 Current Load on gluster121 is WARNING: WARNING - load average: 3.82, 3.29, 2.91 [22:58:29] PROBLEM - gluster111 Current Load on gluster111 is WARNING: WARNING - load average: 3.83, 3.88, 3.32 [23:00:01] !log [@mwtask111] starting deploy of {'l10nupdate': True} to all [23:00:02] !log [@test101] starting deploy of {'l10nupdate': True} to all [23:00:04] !log [@test101] DEPLOY ABORTED: Non-Zero Exit Code in prep, see output.
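The aborted test101 deploy above shows the deploy tooling's fail-fast behaviour: if a preparation command exits non-zero, nothing is pushed to the app servers. A rough illustration of that control flow under a simple wrapper; this is not Miraheze's actual deploy script, and only the payload shape and the log wording are taken from the log.

```python
#!/usr/bin/env python3
# Illustration of abort-on-non-zero-exit during the prep phase of a deploy.
# Not the real tooling: the prep commands here are made up; the payload
# shape {'l10nupdate': True} is copied from the log entries above.
import subprocess
import sys

def log(msg: str) -> None:
    print(f"!log {msg}")

def deploy(payload: dict, prep_commands: list[list[str]]) -> None:
    log(f"starting deploy of {payload} to all")
    for cmd in prep_commands:
        result = subprocess.run(cmd)
        if result.returncode != 0:
            # Mirrors the "DEPLOY ABORTED" message seen on test101 above.
            log("DEPLOY ABORTED: Non-Zero Exit Code in prep, see output.")
            sys.exit(result.returncode)
    log(f"finished deploy of {payload} to all - SUCCESS")

if __name__ == "__main__":
    # Hypothetical prep step that fails, reproducing the aborted run.
    deploy({'l10nupdate': True}, [["false"]])
```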
[23:00:14] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [23:00:18] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [23:00:25] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [23:01:04] PROBLEM - gluster121 Current Load on gluster121 is CRITICAL: CRITICAL - load average: 5.09, 3.73, 3.14 [23:02:01] alerting : [FIRING:1] (PHP-FPM Worker Usage High mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki [23:03:04] RECOVERY - gluster121 Current Load on gluster121 is OK: OK - load average: 2.42, 3.25, 3.04 [23:04:19] RECOVERY - gluster111 Current Load on gluster111 is OK: OK - load average: 2.45, 3.25, 3.25 [23:06:12] !log [@mwtask111] finished deploy of {'l10nupdate': True} to all - SUCCESS in 370s [23:06:22] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [23:07:38] PROBLEM - gluster101 Current Load on gluster101 is WARNING: WARNING - load average: 2.81, 3.89, 3.82 [23:17:01] ok : [RESOLVED] (PHP-FPM Worker Usage High mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki [23:17:32] PROBLEM - gluster101 Current Load on gluster101 is CRITICAL: CRITICAL - load average: 4.26, 3.41, 3.51 [23:20:57] PROBLEM - gluster111 Current Load on gluster111 is WARNING: WARNING - load average: 3.22, 3.57, 3.33 [23:21:32] PROBLEM - gluster101 Current Load on gluster101 is WARNING: WARNING - load average: 2.98, 3.53, 3.56 [23:22:57] RECOVERY - gluster111 Current Load on gluster111 is OK: OK - load average: 3.00, 3.39, 3.29 [23:25:32] RECOVERY - gluster101 Current Load on gluster101 is OK: OK - load average: 2.67, 3.11, 3.38 [23:26:01] alerting : [FIRING:1] (PHP-FPM Worker Usage High mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki [23:36:01] ok : [RESOLVED] (PHP-FPM Worker Usage High mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki [23:38:51] PROBLEM - cp30 Current Load on cp30 is WARNING: WARNING - load average: 1.63, 1.71, 1.35 [23:40:46] RECOVERY - cp30 Current Load on cp30 is OK: OK - load average: 0.82, 1.41, 1.27 [23:44:27] PROBLEM - gluster101 Current Load on gluster101 is CRITICAL: CRITICAL - load average: 4.85, 3.57, 3.21 [23:48:19] PROBLEM - gluster101 Current Load on gluster101 is WARNING: WARNING - load average: 3.12, 3.58, 3.31 [23:49:27] PROBLEM - dnd.bellinrattin.it - reverse DNS on sslhost is WARNING: Traceback (most recent call last): File "/usr/lib/nagios/plugins/check_reverse_dns.py", line 155, in main() File "/usr/lib/nagios/plugins/check_reverse_dns.py", line 136, in main records = check_records(args.hostname) File "/usr/lib/nagios/plugins/check_reverse_dns.py", line 86, in check_records cname = str(dns_resolver.resolve(hostname, 'CNAME')[0]) File "/usr/li [23:49:27] 3/dist-packages/dns/resolver.py", line 1040, in resolve (nameserver, port, tcp, backoff) = resolution.next_nameserver() File "/usr/lib/python3/dist-packages/dns/resolver.py", line 598, in next_nameserver raise NoNameservers(request=self.request, errors=self.errors)dns.resolver.NoNameservers: All nameservers failed to answer the query dnd.bellinrattin.it. 
IN CNAME: Server 2606:4700:4700::1111 UDP port 53 answered SERVFAIL [23:50:15] RECOVERY - gluster101 Current Load on gluster101 is OK: OK - load average: 2.25, 3.24, 3.23 [23:51:31] alerting : [FIRING:1] (PHP-FPM Worker Usage High mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki [23:56:05] PROBLEM - gluster101 Current Load on gluster101 is CRITICAL: CRITICAL - load average: 4.70, 4.26, 3.66 [23:56:31] ok : [RESOLVED] (PHP-FPM Worker Usage High mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki [23:57:04] PROBLEM - gluster121 Current Load on gluster121 is WARNING: WARNING - load average: 3.66, 3.95, 3.43 [23:59:04] PROBLEM - gluster121 Current Load on gluster121 is CRITICAL: CRITICAL - load average: 4.95, 4.39, 3.65
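Most of the noise in this log is "Current Load" checks flapping as the 1-, 5- and 15-minute load averages cross per-host thresholds. A simplified illustration of that classification follows; the real check_load compares each average against its own warning/critical pair, and the 3.5/4.0 values here are examples, not the production thresholds.

```python
#!/usr/bin/env python3
# Simplified sketch of a load-average check like the "Current Load" alerts
# in this log. Assumptions: a single warning/critical pair applied to the
# worst of the three averages (the real check uses per-average thresholds),
# and 3.5/4.0 are illustrative values only.
import os

def classify(load1: float, load5: float, load15: float,
             warn: float = 3.5, crit: float = 4.0) -> str:
    worst = max(load1, load5, load15)
    if worst >= crit:
        return "CRITICAL"
    if worst >= warn:
        return "WARNING"
    return "OK"

if __name__ == "__main__":
    load1, load5, load15 = os.getloadavg()
    state = classify(load1, load5, load15)
    print(f"{state} - load average: {load1:.2f}, {load5:.2f}, {load15:.2f}")
```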