[00:03:30] [puppet] Universal-Omega opened pull request #2047: prometheus-es-exporter: lower QueryIntervalSecs to 60 - https://git.io/JKEmH [00:04:04] [puppet] Universal-Omega synchronize pull request #2047: prometheus-es-exporter: lower QueryIntervalSecs to 60 - https://git.io/JKEmH [00:04:43] PROBLEM - cloud4 Current Load on cloud4 is CRITICAL: CRITICAL - load average: 26.58, 22.29, 18.54 [00:05:06] [puppet] Universal-Omega edited pull request #2047: prometheus-es-exporter: lower QueryIntervalSecs to 60 - https://git.io/JKEmH [00:05:27] [puppet] paladox closed pull request #2047: prometheus-es-exporter: lower QueryIntervalSecs to 60 - https://git.io/JKEmH [00:05:28] [miraheze/puppet] paladox pushed 1 commit to master [+0/-0/±2] https://git.io/JKEmF [00:05:30] [miraheze/puppet] Universal-Omega b541d72 - prometheus-es-exporter: lower QueryIntervalSecs to 60 (#2047) [00:05:55] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 5.18, 5.40, 4.62 [00:08:53] PROBLEM - gluster3 Current Load on gluster3 is CRITICAL: CRITICAL - load average: 7.76, 6.59, 5.23 [00:10:13] paladox: do you have any ideas on https://phabricator.miraheze.org/T6979#164939? If not that's OK, I'll figure something out. [00:10:15] [url] ⚓ T6979 Collect Statistics for API Requests (Including Module Type) | phabricator.miraheze.org [00:10:48] nope :/ [00:17:47] [miraheze/mw-config] Universal-Omega pushed 1 commit to master [+0/-0/±1] https://git.io/JKEY9 [00:17:48] [miraheze/mw-config] Universal-Omega 6ed1e69 - Raise DPL requirement to 100,000 pages [00:19:06] miraheze/mw-config - Universal-Omega the build passed. 
[00:20:35] [puppet] Universal-Omega opened pull request #2048: prometheus-es-exporter: query last 2 minutes rather than 15 - https://git.io/JKEYF [00:21:06] [puppet] Universal-Omega synchronize pull request #2048: prometheus-es-exporter: query last 2 minutes rather than 15 - https://git.io/JKEYF [00:21:54] [puppet] paladox closed pull request #2048: prometheus-es-exporter: query last 2 minutes rather than 15 - https://git.io/JKEYF [00:21:55] [miraheze/puppet] paladox pushed 1 commit to master [+0/-0/±2] https://git.io/JKEYp [00:21:57] [miraheze/puppet] Universal-Omega 23de8f8 - prometheus-es-exporter: query last 2 minutes rather than 15 (#2048) [00:23:43] Thanks again! [00:33:24] !log [@test3] starting deploy of {'config': True} to skip [00:33:25] !log [@test3] finished deploy of {'config': True} to skip - SUCCESS in 0s [00:33:29] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [00:33:35] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [00:34:30] !log [@mw11] starting deploy of {'config': True} to all [00:34:36] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [00:34:46] !log [@mw11] finished deploy of {'config': True} to all - SUCCESS in 15s [00:34:58] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [00:37:19] There, prometheus-es-exporter stats are more accurate now. 
[00:42:03] PROBLEM - db11 Disk Space on db11 is WARNING: DISK WARNING - free space: / 48803 MB (10% inode=97%); [00:47:03] PROBLEM - cloud4 Current Load on cloud4 is WARNING: WARNING - load average: 12.64, 16.87, 22.99 [00:56:33] RECOVERY - db11 Disk Space on db11 is OK: DISK OK - free space: / 63955 MB (14% inode=97%); [00:57:29] !log install iptables-persistent on cp12 [00:57:34] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [01:01:46] RECOVERY - cloud4 Current Load on cloud4 is OK: OK - load average: 17.53, 18.78, 20.32 [01:02:28] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 3.33, 4.70, 5.87 [01:13:54] PROBLEM - gluster3 Current Load on gluster3 is CRITICAL: CRITICAL - load average: 6.59, 5.64, 5.54 [01:16:49] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 5.04, 5.17, 5.35 [01:23:35] RECOVERY - gluster3 Current Load on gluster3 is OK: OK - load average: 2.63, 3.79, 4.71 [01:24:14] PROBLEM - cp12 Stunnel Http for mw13 on cp12 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [01:24:41] PROBLEM - cp12 Stunnel Http for mw11 on cp12 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [01:25:08] PROBLEM - cp12 HTTPS on cp12 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:25:25] PROBLEM - cp12 Puppet on cp12 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [01:25:30] PROBLEM - cp12 Stunnel Http for mw9 on cp12 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [01:25:39] PROBLEM - ns2 GDNSD Datacenters on ns2 is CRITICAL: CRITICAL - 2 datacenters are down: 51.222.25.132/cpweb, 2607:5300:205:200::1c30/cpweb [01:25:40] PROBLEM - cp12 Stunnel Http for mw10 on cp12 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. 
[01:26:06] paladox: ^ [01:26:14] PROBLEM - cp12 Stunnel Http for mon2 on cp12 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [01:26:17] PROBLEM - cp12 HTTP 4xx/5xx ERROR Rate on cp12 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [01:26:24] hmm [01:26:27] looking [01:26:33] PROBLEM - cp12 Stunnel Http for mw12 on cp12 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [01:26:34] PROBLEM - cp12 PowerDNS Recursor on cp12 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [01:26:45] PROBLEM - ping4 on cp12 is CRITICAL: PING CRITICAL - Packet loss = 50%, RTA = 84.07 ms [01:27:22] PROBLEM - ns1 GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 2 datacenters are down: 51.222.25.132/cpweb, 2607:5300:205:200::1c30/cpweb [01:27:49] RECOVERY - cp12 Stunnel Http for mw13 on cp12 is OK: HTTP OK: HTTP/1.1 200 OK - 15861 bytes in 0.332 second response time [01:28:07] RECOVERY - cp12 Stunnel Http for mw11 on cp12 is OK: HTTP OK: HTTP/1.1 200 OK - 15861 bytes in 0.627 second response time [01:28:30] RECOVERY - cp12 HTTPS on cp12 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 3027 bytes in 0.344 second response time [01:28:41] RECOVERY - cp12 Stunnel Http for mw9 on cp12 is OK: HTTP OK: HTTP/1.1 200 OK - 15860 bytes in 0.307 second response time [01:28:50] RECOVERY - ns2 GDNSD Datacenters on ns2 is OK: OK - all datacenters are online [01:28:52] RECOVERY - cp12 Stunnel Http for mw10 on cp12 is OK: HTTP OK: HTTP/1.1 200 OK - 15861 bytes in 0.335 second response time [01:29:15] RECOVERY - cp12 Stunnel Http for mon2 on cp12 is OK: HTTP OK: HTTP/1.1 200 OK - 33934 bytes in 0.369 second response time [01:29:18] RECOVERY - cp12 HTTP 4xx/5xx ERROR Rate on cp12 is OK: OK - NGINX Error Rate is 5% [01:29:28] RECOVERY - cp12 Stunnel Http for mw12 on cp12 is OK: HTTP OK: HTTP/1.1 200 OK - 15861 bytes in 0.312 second response time [01:29:30] RECOVERY - cp12 PowerDNS Recursor on cp12 is OK: DNS OK: 
0.108 seconds response time. miraheze.org returns 167.114.2.161,2607:5300:201:3100::1d3,2607:5300:205:200::1c30,51.222.25.132 [01:29:48] RECOVERY - ping4 on cp12 is OK: PING OK - Packet loss = 0%, RTA = 83.22 ms [01:30:11] RECOVERY - ns1 GDNSD Datacenters on ns1 is OK: OK - all datacenters are online [01:31:24] RECOVERY - cp12 Puppet on cp12 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [01:36:19] [puppet] Universal-Omega opened pull request #2049: prometheus-es-exporter: reduce QueryTimeoutSecs to 10 - https://git.io/JKEWs [01:37:51] [puppet] paladox closed pull request #2049: prometheus-es-exporter: reduce QueryTimeoutSecs to 10 - https://git.io/JKEWs [01:37:53] [miraheze/puppet] paladox pushed 1 commit to master [+0/-0/±1] https://git.io/JKEWV [01:37:54] [miraheze/puppet] Universal-Omega 8232a81 - prometheus-es-exporter: reduce QueryTimeoutSecs to 10 (#2049) [01:46:34] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 5.88, 5.75, 5.20 [01:48:51] !log cp12: iptables -t raw -A PREROUTING -p tcp --dport 80 -j NOTRACK [01:48:55] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [01:49:08] !log cp12: iptables -t raw -A PREROUTING -p tcp --dport 443 -j NOTRACK [01:49:17] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [01:49:28] PROBLEM - gluster3 Current Load on gluster3 is CRITICAL: CRITICAL - load average: 6.29, 5.69, 5.25 [01:51:22] !log cp12: iptables -t raw -A OUTPUT -p tcp --dport 80 -j NOTRACK [01:51:24] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [01:51:34] !log cp12: iptables -t raw -A OUTPUT -p tcp --dport 443 -j NOTRACK [01:51:36] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [01:52:24] RECOVERY - gluster3 Current Load on gluster3 is OK: OK - load average: 2.70, 4.31, 4.79 [01:55:52] PROBLEM - ns2 GDNSD Datacenters on ns2 is CRITICAL: CRITICAL - 2 
datacenters are down: 51.222.25.132/cpweb, 2607:5300:205:200::1c30/cpweb [01:55:54] PROBLEM - cp12 Stunnel Http for mw13 on cp12 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [01:56:35] hmm [01:56:35] PROBLEM - cp12 Stunnel Http for mw9 on cp12 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [01:56:57] PROBLEM - cp12 HTTP 4xx/5xx ERROR Rate on cp12 is CRITICAL: CRITICAL - NGINX Error Rate is 80% [01:57:08] PROBLEM - cp12 Stunnel Http for mw10 on cp12 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [01:57:23] PROBLEM - cp12 Stunnel Http for mon2 on cp12 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [01:57:41] PROBLEM - ns1 GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 2 datacenters are down: 51.222.25.132/cpweb, 2607:5300:205:200::1c30/cpweb [01:57:42] PROBLEM - cp12 Stunnel Http for mw12 on cp12 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [01:58:33] PROBLEM - cp12 Stunnel Http for mw8 on cp12 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [01:58:35] PROBLEM - cp12 Varnish Backends on cp12 is CRITICAL: 6 backends are down. mw8 mw9 mw10 mw11 mw12 mw13 [01:58:57] PROBLEM - cp12 Stunnel Http for mw11 on cp12 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. 
[01:59:02] RECOVERY - cp12 Stunnel Http for mw13 on cp12 is OK: HTTP OK: HTTP/1.1 200 OK - 15861 bytes in 0.335 second response time [01:59:37] RECOVERY - cp12 Stunnel Http for mw9 on cp12 is OK: HTTP OK: HTTP/1.1 200 OK - 15874 bytes in 0.324 second response time [02:00:01] RECOVERY - cp12 HTTP 4xx/5xx ERROR Rate on cp12 is OK: OK - NGINX Error Rate is 3% [02:00:16] RECOVERY - cp12 Stunnel Http for mw10 on cp12 is OK: HTTP OK: HTTP/1.1 200 OK - 15861 bytes in 0.318 second response time [02:00:26] RECOVERY - cp12 Stunnel Http for mon2 on cp12 is OK: HTTP OK: HTTP/1.1 200 OK - 33934 bytes in 0.331 second response time [02:00:38] RECOVERY - cp12 Stunnel Http for mw12 on cp12 is OK: HTTP OK: HTTP/1.1 200 OK - 15861 bytes in 0.432 second response time [02:00:39] RECOVERY - ns1 GDNSD Datacenters on ns1 is OK: OK - all datacenters are online [02:01:31] RECOVERY - cp12 Varnish Backends on cp12 is OK: All 9 backends are healthy [02:06:53] PROBLEM - cp12 Stunnel Http for mw13 on cp12 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [02:07:29] PROBLEM - cp12 HTTP 4xx/5xx ERROR Rate on cp12 is CRITICAL: CRITICAL - NGINX Error Rate is 83% [02:07:31] PROBLEM - cp12 Stunnel Http for mw9 on cp12 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [02:08:09] PROBLEM - cp12 Stunnel Http for mw10 on cp12 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [02:08:15] PROBLEM - ns1 GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 2 datacenters are down: 51.222.25.132/cpweb, 2607:5300:205:200::1c30/cpweb [02:08:30] PROBLEM - cp12 Stunnel Http for mon2 on cp12 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [02:08:44] PROBLEM - cp12 Stunnel Http for mw12 on cp12 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [02:09:16] PROBLEM - cp12 Varnish Backends on cp12 is CRITICAL: 6 backends are down. 
mw8 mw9 mw10 mw11 mw12 mw13 [02:13:42] !log rebooted cp12 [02:13:48] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [02:14:37] RECOVERY - cp12 Stunnel Http for mw10 on cp12 is OK: HTTP OK: HTTP/1.1 200 OK - 15861 bytes in 0.647 second response time [02:14:43] RECOVERY - cp12 Stunnel Http for mw8 on cp12 is OK: HTTP OK: HTTP/1.1 200 OK - 15868 bytes in 0.305 second response time [02:14:47] RECOVERY - cp12 Stunnel Http for mon2 on cp12 is OK: HTTP OK: HTTP/1.1 200 OK - 33972 bytes in 0.361 second response time [02:14:54] RECOVERY - cp12 Stunnel Http for mw12 on cp12 is OK: HTTP OK: HTTP/1.1 200 OK - 15861 bytes in 0.318 second response time [02:14:55] RECOVERY - cp12 Stunnel Http for mw11 on cp12 is OK: HTTP OK: HTTP/1.1 200 OK - 15861 bytes in 0.342 second response time [02:15:24] RECOVERY - cp12 Varnish Backends on cp12 is OK: All 9 backends are healthy [02:16:28] RECOVERY - cp12 Stunnel Http for mw13 on cp12 is OK: HTTP OK: HTTP/1.1 200 OK - 15861 bytes in 0.328 second response time [02:16:57] RECOVERY - cp12 HTTP 4xx/5xx ERROR Rate on cp12 is OK: OK - NGINX Error Rate is 3% [02:17:00] RECOVERY - cp12 Stunnel Http for mw9 on cp12 is OK: HTTP OK: HTTP/1.1 200 OK - 15860 bytes in 0.336 second response time [02:17:26] RECOVERY - ns1 GDNSD Datacenters on ns1 is OK: OK - all datacenters are online [02:17:29] RECOVERY - ns2 GDNSD Datacenters on ns2 is OK: OK - all datacenters are online [02:21:15] PROBLEM - gluster3 Current Load on gluster3 is CRITICAL: CRITICAL - load average: 8.75, 8.13, 6.16 [02:21:55] !log cp12: reverted previous change and only applied 'iptables -t raw -A PREROUTING -p tcp --dport 80 -j NOTRACK ' and 'iptables -t raw -A PREROUTING -p tcp --dport 443 -j NOTRACK' [02:21:58] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [02:47:28] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 3.90, 4.83, 5.72 [03:00:11] RECOVERY - gluster3 Current Load on gluster3 is OK: OK - 
load average: 3.92, 4.37, 5.01 [03:01:51] PROBLEM - cp12 Stunnel Http for mon2 on cp12 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [03:01:53] PROBLEM - incubator.nocyclo.tk - LetsEncrypt on sslhost is CRITICAL: connect to address incubator.nocyclo.tk and port 443: Connection refusedHTTP CRITICAL - Unable to open TCP socket [03:01:53] PROBLEM - cp12 Stunnel Http for mw11 on cp12 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [03:05:07] RECOVERY - cp12 Stunnel Http for mon2 on cp12 is OK: HTTP OK: HTTP/1.1 200 OK - 33934 bytes in 0.335 second response time [03:06:09] RECOVERY - cp12 Stunnel Http for mw11 on cp12 is OK: HTTP OK: HTTP/1.1 200 OK - 15861 bytes in 0.314 second response time [03:06:47] !log cp12: iptables -t raw -A PREROUTING -p tcp --dport 80 -j NOTRACK [03:06:54] !log cp12: iptables -t raw -A PREROUTING -p tcp --dport 443 -j NOTRACK [03:07:01] !log cp12: iptables -t raw -A OUTPUT -p tcp --sport 80 -j NOTRACK [03:07:04] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [03:07:09] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [03:07:14] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [03:07:15] !log cp12: iptables -t raw -A OUTPUT -p tcp --sport 443 -j NOTRACK [03:07:18] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [03:07:35] PROBLEM - gluster3 Current Load on gluster3 is CRITICAL: CRITICAL - load average: 8.25, 6.28, 5.54 [03:09:46] RECOVERY - incubator.nocyclo.tk - LetsEncrypt on sslhost is OK: OK - Certificate 'incubator.nocyclo.tk' will expire on Wed 17 Nov 2021 04:06:46 GMT +0000. 
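The rule set logged for cp12 at 03:06-03:07 differs from the 01:48-01:51 attempt in one detail: the OUTPUT rules match `--sport` rather than `--dport`, so they exempt packets leaving from ports 80/443 (replies from the web server) instead of connections going out to those ports on other hosts. A minimal dry-run sketch of that final rule set, assuming a POSIX shell; the function name `notrack_rules` is hypothetical, and the script only prints the commands rather than applying them:

```shell
#!/bin/sh
# Dry run: print the four NOTRACK rules from the 03:06-03:07 log entries.
# notrack_rules is a hypothetical helper name, not something from the log.
notrack_rules() {
    for port in 80 443; do
        # Inbound packets addressed to the web ports skip connection tracking.
        echo "iptables -t raw -A PREROUTING -p tcp --dport $port -j NOTRACK"
    done
    for port in 80 443; do
        # Packets leaving from the web ports skip it too (source-port match).
        echo "iptables -t raw -A OUTPUT -p tcp --sport $port -j NOTRACK"
    done
}

notrack_rules
```

Printing first makes it easy to review the rules before running them as root. Whether the earlier `--dport` OUTPUT rules contributed to the cp12 alerts is not stated in the log, so this sketch makes no claim about cause.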
[03:25:51] PROBLEM - mon2 Current Load on mon2 is WARNING: WARNING - load average: 3.11, 3.75, 3.41 [03:28:44] RECOVERY - mon2 Current Load on mon2 is OK: OK - load average: 2.57, 3.22, 3.26 [03:38:09] PROBLEM - cloud4 Current Load on cloud4 is WARNING: WARNING - load average: 18.66, 20.92, 19.39 [03:41:03] RECOVERY - cloud4 Current Load on cloud4 is OK: OK - load average: 13.56, 17.86, 18.49 [03:48:49] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 4.89, 5.17, 5.77 [03:54:00] PROBLEM - mon2 Current Load on mon2 is WARNING: WARNING - load average: 3.82, 3.57, 3.25 [03:54:41] PROBLEM - gluster3 Current Load on gluster3 is CRITICAL: CRITICAL - load average: 6.06, 5.38, 5.63 [03:56:54] RECOVERY - mon2 Current Load on mon2 is OK: OK - load average: 3.11, 3.40, 3.24 [04:06:18] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 5.18, 5.56, 5.79 [04:20:45] PROBLEM - gluster3 Current Load on gluster3 is CRITICAL: CRITICAL - load average: 6.10, 5.89, 5.62 [04:23:41] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 5.67, 5.83, 5.64 [04:24:54] [miraheze/MirahezeMagic] Universal-Omega pushed 1 commit to Universal-Omega-patch-2 [+0/-0/±1] https://git.io/JKzIh [04:24:56] [miraheze/MirahezeMagic] Universal-Omega b78bb86 - en.json: fix capitalisation and use sre-mediawiki email [04:24:57] [MirahezeMagic] Universal-Omega created branch Universal-Omega-patch-2 - https://git.io/fQRGX [04:24:59] [MirahezeMagic] Universal-Omega opened pull request #295: en.json: fix capitalisation and use sre-mediawiki email - https://git.io/JKzLe [04:25:54] miraheze/MirahezeMagic - Universal-Omega the build passed. 
[04:26:10] [miraheze/MirahezeMagic] Universal-Omega pushed 1 commit to Universal-Omega-patch-2 [+0/-0/±1] https://git.io/JKzLX [04:26:11] [miraheze/MirahezeMagic] Universal-Omega e3423b8 - Update en.json [04:26:13] [MirahezeMagic] Universal-Omega synchronize pull request #295: en.json: fix capitalisation and use sre-mediawiki email - https://git.io/JKzLe [04:26:35] PROBLEM - gluster3 Current Load on gluster3 is CRITICAL: CRITICAL - load average: 6.34, 6.42, 5.93 [04:27:08] miraheze/MirahezeMagic - Universal-Omega the build passed. [04:29:24] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 3.64, 5.38, 5.63 [04:30:11] [miraheze/mw-config] Universal-Omega pushed 1 commit to Universal-Omega-patch-2 [+0/-0/±1] https://git.io/JKzq1 [04:30:13] [miraheze/mw-config] Universal-Omega bbc9f63 - Update extension-list [04:30:14] [mw-config] Universal-Omega created branch Universal-Omega-patch-2 - https://git.io/vbvb3 [04:30:22] [mw-config] Universal-Omega opened pull request #4157: Remove AdvancedSearch - https://git.io/JKzq5 [04:31:03] [miraheze/mw-config] Universal-Omega pushed 1 commit to Universal-Omega-patch-2 [+0/-0/±1] https://git.io/JKzmL [04:31:04] [miraheze/mw-config] Universal-Omega 678f423 - Update ManageWikiExtensions.php [04:31:06] [mw-config] Universal-Omega synchronize pull request #4157: Remove AdvancedSearch - https://git.io/JKzq5 [04:31:23] miraheze/mw-config - Universal-Omega the build passed. [04:32:11] miraheze/mw-config - Universal-Omega the build passed. 
[04:32:56] [mw-config] Universal-Omega edited pull request #4157: T7740: remove AdvancedSearch - https://git.io/JKzq5 [04:35:21] PROBLEM - gluster3 Current Load on gluster3 is CRITICAL: CRITICAL - load average: 7.02, 5.91, 5.70 [04:38:28] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 5.46, 5.81, 5.71 [04:38:31] !log [reception@mw11] starting deploy of {'config': True} to all [04:38:37] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [04:38:57] !log [reception@mw11] finished deploy of {'config': True} to all - SUCCESS in 27s [04:39:00] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [04:41:32] PROBLEM - mw11 Current Load on mw11 is CRITICAL: CRITICAL - load average: 6.25, 8.11, 6.36 [04:42:08] PROBLEM - cloud5 Current Load on cloud5 is WARNING: WARNING - load average: 20.10, 21.21, 18.92 [04:44:20] RECOVERY - mw11 Current Load on mw11 is OK: OK - load average: 4.49, 6.61, 6.08 [04:44:58] RECOVERY - cloud5 Current Load on cloud5 is OK: OK - load average: 18.36, 19.52, 18.62 [04:47:04] RECOVERY - gluster3 Current Load on gluster3 is OK: OK - load average: 3.68, 4.28, 4.98 [04:51:19] !log [reception@mw11] starting deploy of {'config': True, 'world': True, 'l10n': True, 'gitinfo': True} to all [04:51:33] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [05:04:17] [puppet] Universal-Omega opened pull request #2050: Remove test4 - https://git.io/JKz0a [05:05:07] PROBLEM - mw11 Current Load on mw11 is WARNING: WARNING - load average: 7.11, 6.88, 5.87 [05:08:03] RECOVERY - mw11 Current Load on mw11 is OK: OK - load average: 5.32, 6.54, 5.96 [05:08:30] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 4.06, 5.21, 5.00 [05:11:20] RECOVERY - gluster3 Current Load on gluster3 is OK: OK - load average: 4.55, 5.04, 4.97 [05:11:51] While accessing my contributions page on FP [05:11:51] 
https://cdn.discordapp.com/attachments/808001911868489748/899162772095524945/IMG_20211017_104100.jpg [05:14:04] PROBLEM - db11 Disk Space on db11 is WARNING: DISK WARNING - free space: / 48516 MB (10% inode=97%); [05:17:07] PROBLEM - cloud4 Current Load on cloud4 is CRITICAL: CRITICAL - load average: 24.73, 21.64, 17.96 [05:17:57] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 4.65, 5.93, 5.51 [05:19:28] !log [reception@mw11] finished deploy of {'config': True, 'world': True, 'l10n': True, 'gitinfo': True} to all - SUCCESS in 1688s [05:19:32] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [05:20:06] PROBLEM - cloud5 Current Load on cloud5 is CRITICAL: CRITICAL - load average: 30.00, 27.71, 22.17 [05:20:34] RECOVERY - cloud4 Current Load on cloud4 is OK: OK - load average: 13.61, 18.61, 17.57 [05:21:24] PROBLEM - gluster3 Current Load on gluster3 is CRITICAL: CRITICAL - load average: 6.68, 6.24, 5.72 [05:25:57] PROBLEM - db13 Disk Space on db13 is WARNING: DISK WARNING - free space: / 48431 MB (10% inode=98%); [05:26:00] PROBLEM - cloud5 Current Load on cloud5 is WARNING: WARNING - load average: 15.96, 21.05, 21.00 [05:27:10] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 5.17, 5.79, 5.72 [05:28:47] RECOVERY - cloud5 Current Load on cloud5 is OK: OK - load average: 17.22, 18.94, 20.15 [05:30:03] PROBLEM - gluster3 Current Load on gluster3 is CRITICAL: CRITICAL - load average: 7.17, 6.33, 5.94 [05:33:00] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 4.65, 5.77, 5.81 [05:54:18] PROBLEM - mon2 Current Load on mon2 is WARNING: WARNING - load average: 3.43, 3.42, 3.23 [05:57:07] PROBLEM - mon2 Current Load on mon2 is CRITICAL: CRITICAL - load average: 4.18, 3.54, 3.29 [05:58:51] RECOVERY - gluster3 Current Load on gluster3 is OK: OK - load average: 3.65, 4.39, 5.02 [05:59:54] RECOVERY - mon2 Current Load on mon2 is OK: OK - load average: 2.83, 
3.34, 3.26 [06:06:42] PROBLEM - db11 Current Load on db11 is CRITICAL: CRITICAL - load average: 8.31, 6.37, 3.99 [06:11:36] RECOVERY - db13 Disk Space on db13 is OK: DISK OK - free space: / 55584 MB (12% inode=98%); [06:12:26] PROBLEM - db11 Current Load on db11 is WARNING: WARNING - load average: 7.97, 7.43, 5.21 [06:15:10] PROBLEM - db11 Current Load on db11 is CRITICAL: CRITICAL - load average: 11.45, 8.71, 6.02 [06:17:55] PROBLEM - db11 Current Load on db11 is WARNING: WARNING - load average: 7.10, 7.81, 6.10 [06:20:45] PROBLEM - db11 Current Load on db11 is CRITICAL: CRITICAL - load average: 8.34, 8.33, 6.61 [06:23:31] PROBLEM - db11 Current Load on db11 is WARNING: WARNING - load average: 5.54, 7.32, 6.52 [06:26:21] PROBLEM - db13 Disk Space on db13 is WARNING: DISK WARNING - free space: / 47941 MB (10% inode=98%); [06:31:52] RECOVERY - db11 Current Load on db11 is OK: OK - load average: 5.77, 6.48, 6.47 [06:45:59] PROBLEM - db11 Current Load on db11 is WARNING: WARNING - load average: 5.81, 6.89, 6.73 [06:48:43] PROBLEM - db11 Current Load on db11 is CRITICAL: CRITICAL - load average: 9.10, 7.74, 7.07 [06:51:27] PROBLEM - db11 Current Load on db11 is WARNING: WARNING - load average: 5.85, 6.90, 6.87 [06:54:15] RECOVERY - db11 Current Load on db11 is OK: OK - load average: 6.08, 6.61, 6.76 [06:57:58] PROBLEM - gluster3 Current Load on gluster3 is CRITICAL: CRITICAL - load average: 6.88, 5.85, 4.97 [07:00:41] PROBLEM - db11 Current Load on db11 is CRITICAL: CRITICAL - load average: 8.21, 7.01, 6.83 [07:00:46] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 4.71, 5.15, 4.84 [07:03:28] RECOVERY - db11 Current Load on db11 is OK: OK - load average: 5.89, 6.62, 6.72 [07:03:36] PROBLEM - gluster3 Current Load on gluster3 is CRITICAL: CRITICAL - load average: 6.78, 5.75, 5.12 [07:06:28] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 5.01, 5.73, 5.25 [07:12:09] PROBLEM - gluster3 Current Load on 
gluster3 is CRITICAL: CRITICAL - load average: 7.74, 6.09, 5.45 [07:12:50] PROBLEM - db11 Current Load on db11 is WARNING: WARNING - load average: 7.26, 6.98, 6.82 [07:15:38] RECOVERY - db11 Current Load on db11 is OK: OK - load average: 5.97, 6.43, 6.64 [07:17:43] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 4.37, 5.37, 5.42 [07:22:10] PROBLEM - db11 Current Load on db11 is CRITICAL: CRITICAL - load average: 9.61, 8.20, 7.37 [07:25:01] PROBLEM - db11 Current Load on db11 is WARNING: WARNING - load average: 7.74, 7.91, 7.40 [07:26:10] RECOVERY - gluster3 Current Load on gluster3 is OK: OK - load average: 4.74, 4.63, 5.08 [07:33:14] RECOVERY - db11 Current Load on db11 is OK: OK - load average: 4.91, 5.99, 6.68 [07:41:27] dmehus: do you actually know what Citoid is? [08:06:37] PROBLEM - db11 Current Load on db11 is WARNING: WARNING - load average: 7.00, 7.02, 6.55 [08:09:27] RECOVERY - db11 Current Load on db11 is OK: OK - load average: 5.89, 6.62, 6.49 [08:10:50] alerting : [FIRING:1] (MediaWiki Exception Rate mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki?orgId=1 [08:10:51] [url] Grafana | grafana.miraheze.org [08:11:16] me ^ [08:11:22] Yey! [08:14:32] Spookreeeno: can we have the bot ignore links from icinga-miraheze? [08:14:40] If that's possible. [08:14:54] CosmicAlpha: I can ignore grafana [08:15:04] That would work also. [08:15:47] .urlexclude grafana.miraheze.org [08:15:48] Spookreeeno: This URL is now excluded from auto title. [08:15:50] alerting : [FIRING:1] (mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki?orgId=1 [08:16:00] Oh nice! [08:16:03] Works [08:17:12] Spookreeeno: running one more alert test then I'll finalise the alert and go to sleep. Will do panel tomorrow. 
[08:17:19] CosmicAlpha: nice [08:17:51] RECOVERY - db13 Disk Space on db13 is OK: DISK OK - free space: / 49706 MB (11% inode=98%); [08:18:36] PROBLEM - db11 Current Load on db11 is WARNING: WARNING - load average: 7.89, 7.03, 6.71 [08:21:18] Spookreeeno: what I want to do is hook icinga itself into these exception rate increases, as more information can be included in Icinga alerts, such as current average, etc... but that's more difficult and can wait for now. [08:24:16] PROBLEM - db13 Disk Space on db13 is WARNING: DISK WARNING - free space: / 47370 MB (10% inode=98%); [08:24:51] Hmm...why won't it alert again? [08:29:31] CosmicAlpha: it says pending [08:29:39] RECOVERY - db11 Current Load on db11 is OK: OK - load average: 4.65, 6.23, 6.63 [08:30:50] alerting : [FIRING:1] (mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki?orgId=1 [08:30:51] Spookreeeno: yeah I just reset it, to try again. [08:31:04] There we go, but title didn't work properly. [08:31:05] Both went off [08:31:18] CosmicAlpha: it went off for one resolved & one alerting same time [08:33:17] Spookreeeno: I'll enable !sre pings now. Feel free to change alert and remove it, if it messes up while I'm asleep. [08:34:03] Oh whoops I just pinged it. My bad. [08:35:50] alerting : [FIRING:2] (mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki?orgId=1 [08:37:55] CosmicAlpha: alert name has no ! in front [08:38:24] Oh whoops [08:40:50] alerting : [FIRING:1] (mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki?orgId=1 [08:42:04] Looks sensible now [08:42:14] Spookreeeno: it's finalised now. Should alert above 0.9 [08:42:20] Cool! [08:42:59] I'm off to sleep now, hoping it doesn't spam sre pings throughout the night... [08:43:38] I'm around [08:45:47] Spookreeeno: Yeah that's good. If it does just remove the ping, and I'll look tomorrow if you want. Also, I do the panel for it tomorrow also. 
[08:45:50] ok : [RESOLVED] (sre MediaWiki Exception Rate mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki?orgId=1 [08:46:03] Yep [08:53:04] PROBLEM - db11 Current Load on db11 is WARNING: WARNING - load average: 6.93, 6.78, 6.69 [08:56:12] [miraheze/mw-config] Universal-Omega pushed 1 commit to master [+0/-0/±1] https://git.io/JKaIq [08:56:13] [miraheze/mw-config] Universal-Omega 4bd0857 - Sitenotice: correct date format [08:56:15] PROBLEM - steamdecklinux.wiki - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Certificate 'steamdecklinux.wiki' expires in 7 day(s) (Mon 25 Oct 2021 08:49:47 GMT +0000). [08:57:11] miraheze/mw-config - Universal-Omega the build passed. [09:01:28] RECOVERY - db11 Current Load on db11 is OK: OK - load average: 5.90, 6.47, 6.68 [09:02:50] !log [@test3] starting deploy of {'config': True} to skip [09:02:51] !log [@test3] finished deploy of {'config': True} to skip - SUCCESS in 0s [09:02:54] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [09:02:57] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [09:05:07] !log [@mw11] starting deploy of {'config': True} to all [09:05:11] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [09:05:40] !log [@mw11] finished deploy of {'config': True} to all - SUCCESS in 32s [09:05:48] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [09:30:30] I see this on many project pages [09:32:02] Could you cite some? 
[09:32:26] https://en.famepedia.org/w/index.php?title=FAMEPedia:Criteria_for_speedy_deletion [09:32:28] [url] FAMEPedia:Criteria for speedy deletion - FAMEPedia | en.famepedia.org [09:32:57] https://en.famepedia.org/wiki/Special:Contributions/Magogre [09:32:59] [url] Internal error - FAMEPedia | en.famepedia.org [09:36:23] yeah, Special:Contributions is borked [09:37:06] I guess we better get a task [09:37:56] I'll create one [09:38:57] https://phabricator.miraheze.org/T8184 [09:38:58] [url] ⚓ T8184 Special:Contributions on famepedia is borked | phabricator.miraheze.org [09:45:06] PROBLEM - db12 Disk Space on db12 is WARNING: DISK WARNING - free space: / 48934 MB (10% inode=98%); [09:49:17] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 4.73, 5.72, 4.78 [09:51:23] majavah: any idea on https://phabricator.miraheze.org/T8184#164981 [09:51:24] [url] ⚓ T8184 Special:Contributions on famepedia is borked | phabricator.miraheze.org [09:52:03] RECOVERY - gluster3 Current Load on gluster3 is OK: OK - load average: 2.70, 4.43, 4.43 [09:53:01] https://github.com/wikimedia/mediawiki/blob/REL1_37/includes/historyblob/HistoryBlobStub.php#L129 fails [09:53:07] [url] mediawiki/HistoryBlobStub.php at REL1_37 · wikimedia/mediawiki · GitHub | github.com [10:38:02] RECOVERY - db12 Disk Space on db12 is OK: DISK OK - free space: / 49071 MB (11% inode=98%); [10:44:26] PROBLEM - db12 Disk Space on db12 is WARNING: DISK WARNING - free space: / 48471 MB (10% inode=98%); [10:56:27] $obj should be of type ConcatenatedGzipHistoryBlob, rather than HistoryBlobStub [11:00:03] Is this the only wiki the exception appears on? 
[11:16:26] JohnLewis: no idea [11:16:45] It's not, I looked via graylog [11:18:35] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 5.56, 5.63, 5.06 [11:21:26] RECOVERY - gluster3 Current Load on gluster3 is OK: OK - load average: 3.82, 4.99, 4.92 [11:22:27] https://phabricator.wikimedia.org/T39882 is the only upstream report of this from 1.19 [11:22:28] [url] ⚓ T39882 Fatal error, undefined method "HistoryBlobStub::uncompress()" when running update.php | phabricator.wikimedia.org [11:49:30] PROBLEM - gluster3 Current Load on gluster3 is CRITICAL: CRITICAL - load average: 6.64, 6.50, 5.43 [11:52:21] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 3.96, 5.21, 5.10 [11:57:55] RECOVERY - gluster3 Current Load on gluster3 is OK: OK - load average: 4.61, 4.90, 5.00 [12:04:18] There's a fatal error on FAMEPedia [d02b90c73d7b2fd5b101859b] 2021-10-17 12:03:16: Fatal exception of type "Error" [12:04:42] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 5.76, 5.66, 5.32 [12:05:00] @Joseph: we know about contributions [12:05:08] JohnLewis: what can we do then [12:10:28] RECOVERY - gluster3 Current Load on gluster3 is OK: OK - load average: 3.40, 4.35, 4.85 [12:13:06] Spookreeeno: I've been looking into it for an hour and I have no idea currently [12:13:22] Ack [12:14:33] It seems to be the same as the task linked above, but that never received a resolution [12:16:56] We might be able to help debug [12:17:23] Probably needs a new one 9 years later [12:19:33] Probably the best way forward, as I'm really struggling to find the problem [12:22:26] PROBLEM - db12 Current Load on db12 is WARNING: WARNING - load average: 7.06, 6.69, 5.30 [12:25:18] RECOVERY - db12 Current Load on db12 is OK: OK - load average: 5.86, 6.16, 5.32 [12:39:07] JohnLewis: https://phabricator.wikimedia.org/T293574 [12:39:08] [url] ⚓ T293574 Call to undefined method HistoryBlobStub::uncompress() | phabricator.wikimedia.org 
[12:40:46] and now it's just the waiting game... [12:43:09] PROBLEM - db12 Disk Space on db12 is CRITICAL: DISK CRITICAL - free space: / 23224 MB (5% inode=98%); [12:48:03] RECOVERY - db11 Disk Space on db11 is OK: DISK OK - free space: / 63394 MB (14% inode=97%); [12:48:44] PROBLEM - db12 Disk Space on db12 is WARNING: DISK WARNING - free space: / 44401 MB (9% inode=98%); [12:50:15] RECOVERY - db13 Disk Space on db13 is OK: DISK OK - free space: / 82838 MB (18% inode=98%); [12:51:30] RECOVERY - db12 Disk Space on db12 is OK: DISK OK - free space: / 69134 MB (15% inode=98%); [13:12:45] PROBLEM - gluster3 Current Load on gluster3 is CRITICAL: CRITICAL - load average: 6.41, 5.66, 5.03 [13:15:34] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 5.81, 5.50, 5.06 [13:27:17] RECOVERY - gluster3 Current Load on gluster3 is OK: OK - load average: 3.80, 4.61, 4.95 [13:36:59] PROBLEM - cp12 HTTP 4xx/5xx ERROR Rate on cp12 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [13:37:00] PROBLEM - cp12 Stunnel Http for mw10 on cp12 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [13:37:11] Here [13:37:27] PROBLEM - cp12 Puppet on cp12 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. 
[13:37:35] PROBLEM - cp12 HTTPS on cp12 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:38:29] paladox, JohnLewis: ^ [13:38:31] oh [13:38:34] looking [13:38:41] PROBLEM - cp12 Stunnel Http for mw12 on cp12 is CRITICAL: connect to address 51.222.25.132 port 5666: Connection refused connect to host 51.222.25.132 port 5666: Connection refused [13:38:45] !log reboot cp12 [13:38:51] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [13:40:31] RECOVERY - cp12 HTTP 4xx/5xx ERROR Rate on cp12 is OK: OK - NGINX Error Rate is 7% [13:40:32] RECOVERY - cp12 Stunnel Http for mw10 on cp12 is OK: HTTP OK: HTTP/1.1 200 OK - 15861 bytes in 0.326 second response time [13:40:55] RECOVERY - cp12 Puppet on cp12 is OK: OK: Puppet is currently enabled, last run 17 minutes ago with 0 failures [13:40:59] RECOVERY - cp12 HTTPS on cp12 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 3010 bytes in 0.518 second response time [13:41:06] was it my firewall change that somehow broke things even though I reverted? 
[13:41:19] Ok [13:41:56] RECOVERY - cp12 Stunnel Http for mw12 on cp12 is OK: HTTP OK: HTTP/1.1 200 OK - 15861 bytes in 4.150 second response time [13:42:59] !log cp12: iptables -t raw -A OUTPUT -p tcp --sport 443 -j NOTRACK [13:43:03] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [13:43:13] !log cp12: iptables -t raw -A OUTPUT -p tcp --sport 80 -j NOTRACK [13:43:17] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [13:43:21] !log cp12: iptables -t raw -A PREROUTING -p tcp --dport 443 -j NOTRACK [13:43:25] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [13:43:34] !log cp12: iptables -t raw -A PREROUTING -p tcp --dport 80 -j NOTRACK [13:43:38] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [13:43:49] !log mw10: iptables -t raw -A OUTPUT -p tcp --sport 443 -j NOTRACK [13:43:55] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [13:43:56] !log mw10: iptables -t raw -A OUTPUT -p tcp --sport 80 -j NOTRACK [13:44:00] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [13:44:07] !log mw10: iptables -t raw -A PREROUTING -p tcp --dport 443 -j NOTRACK [13:44:10] !log mw10: iptables -t raw -A PREROUTING -p tcp --dport 80 -j NOTRACK [13:44:14] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [13:44:20] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [13:47:14] PROBLEM - gluster3 Current Load on gluster3 is CRITICAL: CRITICAL - load average: 5.34, 6.30, 5.57 [13:50:09] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 5.18, 5.92, 5.56 [13:52:57] PROBLEM - gluster3 Current Load on gluster3 is CRITICAL: CRITICAL - load average: 5.73, 6.14, 5.72 [13:53:44] !log cp15: iptables -t raw -A PREROUTING -p tcp --dport 80 -j NOTRACK [13:53:48]