[00:03:30] [puppet] Universal-Omega opened pull request #2047: prometheus-es-exporter: lower QueryIntervalSecs to 60 - https://git.io/JKEmH
[00:04:04] [puppet] Universal-Omega synchronize pull request #2047: prometheus-es-exporter: lower QueryIntervalSecs to 60 - https://git.io/JKEmH
[00:04:43] PROBLEM - cloud4 Current Load on cloud4 is CRITICAL: CRITICAL - load average: 26.58, 22.29, 18.54
[00:05:06] [puppet] Universal-Omega edited pull request #2047: prometheus-es-exporter: lower QueryIntervalSecs to 60 - https://git.io/JKEmH
[00:05:27] [puppet] paladox closed pull request #2047: prometheus-es-exporter: lower QueryIntervalSecs to 60 - https://git.io/JKEmH
[00:05:28] [miraheze/puppet] paladox pushed 1 commit to master [+0/-0/±2] https://git.io/JKEmF
[00:05:30] [miraheze/puppet] Universal-Omega b541d72 - prometheus-es-exporter: lower QueryIntervalSecs to 60 (#2047)
[00:05:55] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 5.18, 5.40, 4.62
[00:08:53] PROBLEM - gluster3 Current Load on gluster3 is CRITICAL: CRITICAL - load average: 7.76, 6.59, 5.23
[00:10:13] paladox: do you have any ideas on https://phabricator.miraheze.org/T6979#164939? If not that's OK, I'll figure something out.
[00:10:15] [url] ⚓ T6979 Collect Statistics for API Requests (Including Module Type) | phabricator.miraheze.org
[00:10:48] nope :/
[00:17:47] [miraheze/mw-config] Universal-Omega pushed 1 commit to master [+0/-0/±1] https://git.io/JKEY9
[00:17:48] [miraheze/mw-config] Universal-Omega 6ed1e69 - Raise DPL requirement to 100,000 pages
[00:19:06] miraheze/mw-config - Universal-Omega the build passed.
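The "Current Load" alerts report three numbers: the 1-, 5- and 15-minute load averages. Below is a minimal Python sketch of how a check_load-style check might turn such a triple into OK, WARNING or CRITICAL. It assumes each average is compared against its own threshold with a strict "greater than". The threshold values are hypothetical and are not the ones configured for gluster3 or cloud4. Because Icinga also uses retries and soft/hard states, alert timestamps will not map one-to-one onto single samples.

```python
from typing import Sequence

# Hypothetical thresholds for the 1-, 5- and 15-minute averages; the real
# values configured for these hosts do not appear in this log.
WARN = (5.0, 5.0, 5.0)
CRIT = (6.0, 6.0, 6.0)


def classify_load(load: Sequence[float],
                  warn: Sequence[float] = WARN,
                  crit: Sequence[float] = CRIT) -> str:
    """Map a (1m, 5m, 15m) load-average triple to a check state.

    Each average is compared with its own threshold. Any value above its
    critical threshold makes the result CRITICAL. Otherwise, any value above
    its warning threshold makes it WARNING.
    """
    if len(load) != 3 or len(warn) != 3 or len(crit) != 3:
        raise ValueError("expected three load averages and three thresholds each")
    if any(value > limit for value, limit in zip(load, crit)):
        return "CRITICAL"
    if any(value > limit for value, limit in zip(load, warn)):
        return "WARNING"
    return "OK"
```

For example, `classify_load((5.18, 5.40, 4.62))` returns `"WARNING"` under these assumed thresholds, because the 1-minute value is above 5.0 but nothing is above 6.0.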
[00:20:35] [puppet] Universal-Omega opened pull request #2048: prometheus-es-exporter: query last 2 minutes rather than 15 - https://git.io/JKEYF
[00:21:06] [puppet] Universal-Omega synchronize pull request #2048: prometheus-es-exporter: query last 2 minutes rather than 15 - https://git.io/JKEYF
[00:21:54] [puppet] paladox closed pull request #2048: prometheus-es-exporter: query last 2 minutes rather than 15 - https://git.io/JKEYF
[00:21:55] [miraheze/puppet] paladox pushed 1 commit to master [+0/-0/±2] https://git.io/JKEYp
[00:21:57] [miraheze/puppet] Universal-Omega 23de8f8 - prometheus-es-exporter: query last 2 minutes rather than 15 (#2048)
[00:23:43] Thanks again!
[00:33:24] !log [@test3] starting deploy of {'config': True} to skip
[00:33:25] !log [@test3] finished deploy of {'config': True} to skip - SUCCESS in 0s
[00:33:29] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[00:33:35] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[00:34:30] !log [@mw11] starting deploy of {'config': True} to all
[00:34:36] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[00:34:46] !log [@mw11] finished deploy of {'config': True} to all - SUCCESS in 15s
[00:34:58] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[00:37:19] There, prometheus-es-exporter stats are more accurate now.
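PRs #2047 and #2048 change how the exporter polls: it now runs its query every 60 seconds and counts only documents from the last 2 minutes. The hedged Python sketch below renders one such query section as INI text. `QueryIntervalSecs` comes from the log. The `query_` section prefix, `QueryIndices`, `QueryJson`, the `@timestamp` field and the helper name `build_query_section` are assumptions for illustration, not Miraheze's actual puppet configuration.

```python
import configparser
import io
import json


def build_query_section(name: str, interval_secs: int, window: str) -> str:
    """Render a hypothetical prometheus-es-exporter query section as INI text."""
    query = {
        "size": 0,
        # Count only documents newer than `window`, e.g. "2m" -> "now-2m".
        # The @timestamp field name is an assumption about the indexed logs.
        "query": {"range": {"@timestamp": {"gte": f"now-{window}"}}},
    }
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep CamelCase option names as written
    parser[f"query_{name}"] = {
        "QueryIntervalSecs": str(interval_secs),
        "QueryIndices": "_all",
        "QueryJson": json.dumps(query),
    }
    buf = io.StringIO()
    parser.write(buf)
    return buf.getvalue()
```

With a 60-second interval and a 2-minute window, consecutive samples overlap by about one minute, so most documents are counted in roughly two consecutive samples.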
[00:42:03] PROBLEM - db11 Disk Space on db11 is WARNING: DISK WARNING - free space: / 48803 MB (10% inode=97%);
[00:47:03] PROBLEM - cloud4 Current Load on cloud4 is WARNING: WARNING - load average: 12.64, 16.87, 22.99
[00:56:33] RECOVERY - db11 Disk Space on db11 is OK: DISK OK - free space: / 63955 MB (14% inode=97%);
[00:57:29] !log install iptables-persistent on cp12
[00:57:34] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[01:01:46] RECOVERY - cloud4 Current Load on cloud4 is OK: OK - load average: 17.53, 18.78, 20.32
[01:02:28] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 3.33, 4.70, 5.87
[01:13:54] PROBLEM - gluster3 Current Load on gluster3 is CRITICAL: CRITICAL - load average: 6.59, 5.64, 5.54
[01:16:49] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 5.04, 5.17, 5.35
[01:23:35] RECOVERY - gluster3 Current Load on gluster3 is OK: OK - load average: 2.63, 3.79, 4.71
[01:24:14] PROBLEM - cp12 Stunnel Http for mw13 on cp12 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[01:24:41] PROBLEM - cp12 Stunnel Http for mw11 on cp12 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[01:25:08] PROBLEM - cp12 HTTPS on cp12 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[01:25:25] PROBLEM - cp12 Puppet on cp12 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[01:25:30] PROBLEM - cp12 Stunnel Http for mw9 on cp12 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[01:25:39] PROBLEM - ns2 GDNSD Datacenters on ns2 is CRITICAL: CRITICAL - 2 datacenters are down: 51.222.25.132/cpweb, 2607:5300:205:200::1c30/cpweb
[01:25:40] PROBLEM - cp12 Stunnel Http for mw10 on cp12 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[01:26:06] paladox: ^
[01:26:14] PROBLEM - cp12 Stunnel Http for mon2 on cp12 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[01:26:17] PROBLEM - cp12 HTTP 4xx/5xx ERROR Rate on cp12 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[01:26:24] hmm
[01:26:27] looking
[01:26:33] PROBLEM - cp12 Stunnel Http for mw12 on cp12 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[01:26:34] PROBLEM - cp12 PowerDNS Recursor on cp12 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[01:26:45] PROBLEM - ping4 on cp12 is CRITICAL: PING CRITICAL - Packet loss = 50%, RTA = 84.07 ms
[01:27:22] PROBLEM - ns1 GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 2 datacenters are down: 51.222.25.132/cpweb, 2607:5300:205:200::1c30/cpweb
[01:27:49] RECOVERY - cp12 Stunnel Http for mw13 on cp12 is OK: HTTP OK: HTTP/1.1 200 OK - 15861 bytes in 0.332 second response time
[01:28:07] RECOVERY - cp12 Stunnel Http for mw11 on cp12 is OK: HTTP OK: HTTP/1.1 200 OK - 15861 bytes in 0.627 second response time
[01:28:30] RECOVERY - cp12 HTTPS on cp12 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 3027 bytes in 0.344 second response time
[01:28:41] RECOVERY - cp12 Stunnel Http for mw9 on cp12 is OK: HTTP OK: HTTP/1.1 200 OK - 15860 bytes in 0.307 second response time
[01:28:50] RECOVERY - ns2 GDNSD Datacenters on ns2 is OK: OK - all datacenters are online
[01:28:52] RECOVERY - cp12 Stunnel Http for mw10 on cp12 is OK: HTTP OK: HTTP/1.1 200 OK - 15861 bytes in 0.335 second response time
[01:29:15] RECOVERY - cp12 Stunnel Http for mon2 on cp12 is OK: HTTP OK: HTTP/1.1 200 OK - 33934 bytes in 0.369 second response time
[01:29:18] RECOVERY - cp12 HTTP 4xx/5xx ERROR Rate on cp12 is OK: OK - NGINX Error Rate is 5%
[01:29:28] RECOVERY - cp12 Stunnel Http for mw12 on cp12 is OK: HTTP OK: HTTP/1.1 200 OK - 15861 bytes in 0.312 second response time
[01:29:30] RECOVERY - cp12 PowerDNS Recursor on cp12 is OK: DNS OK: 0.108 seconds response time. miraheze.org returns 167.114.2.161,2607:5300:201:3100::1d3,2607:5300:205:200::1c30,51.222.25.132
[01:29:48] RECOVERY - ping4 on cp12 is OK: PING OK - Packet loss = 0%, RTA = 83.22 ms
[01:30:11] RECOVERY - ns1 GDNSD Datacenters on ns1 is OK: OK - all datacenters are online
[01:31:24] RECOVERY - cp12 Puppet on cp12 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[01:36:19] [puppet] Universal-Omega opened pull request #2049: prometheus-es-exporter: reduce QueryTimeoutSecs to 10 - https://git.io/JKEWs
[01:37:51] [puppet] paladox closed pull request #2049: prometheus-es-exporter: reduce QueryTimeoutSecs to 10 - https://git.io/JKEWs
[01:37:53] [miraheze/puppet] paladox pushed 1 commit to master [+0/-0/±1] https://git.io/JKEWV
[01:37:54] [miraheze/puppet] Universal-Omega 8232a81 - prometheus-es-exporter: reduce QueryTimeoutSecs to 10 (#2049)
[01:46:34] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 5.88, 5.75, 5.20
[01:48:51] !log cp12: iptables -t raw -A PREROUTING -p tcp --dport 80 -j NOTRACK
[01:48:55] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[01:49:08] !log cp12: iptables -t raw -A PREROUTING -p tcp --dport 443 -j NOTRACK
[01:49:17] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[01:49:28] PROBLEM - gluster3 Current Load on gluster3 is CRITICAL: CRITICAL - load average: 6.29, 5.69, 5.25
[01:51:22] !log cp12: iptables -t raw -A OUTPUT -p tcp --dport 80 -j NOTRACK
[01:51:24] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[01:51:34] !log cp12: iptables -t raw -A OUTPUT -p tcp --dport 443 -j NOTRACK
[01:51:36] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[01:52:24] RECOVERY - gluster3 Current Load on gluster3 is OK: OK - load average: 2.70, 4.31, 4.79
[01:55:52] PROBLEM - ns2 GDNSD Datacenters on ns2 is CRITICAL: CRITICAL - 2 datacenters are down: 51.222.25.132/cpweb, 2607:5300:205:200::1c30/cpweb
[01:55:54] PROBLEM - cp12 Stunnel Http for mw13 on cp12 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[01:56:35] hmm
[01:56:35] PROBLEM - cp12 Stunnel Http for mw9 on cp12 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[01:56:57] PROBLEM - cp12 HTTP 4xx/5xx ERROR Rate on cp12 is CRITICAL: CRITICAL - NGINX Error Rate is 80%
[01:57:08] PROBLEM - cp12 Stunnel Http for mw10 on cp12 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[01:57:23] PROBLEM - cp12 Stunnel Http for mon2 on cp12 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[01:57:41] PROBLEM - ns1 GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 2 datacenters are down: 51.222.25.132/cpweb, 2607:5300:205:200::1c30/cpweb
[01:57:42] PROBLEM - cp12 Stunnel Http for mw12 on cp12 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[01:58:33] PROBLEM - cp12 Stunnel Http for mw8 on cp12 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[01:58:35] PROBLEM - cp12 Varnish Backends on cp12 is CRITICAL: 6 backends are down. mw8 mw9 mw10 mw11 mw12 mw13
[01:58:57] PROBLEM - cp12 Stunnel Http for mw11 on cp12 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[01:59:02] RECOVERY - cp12 Stunnel Http for mw13 on cp12 is OK: HTTP OK: HTTP/1.1 200 OK - 15861 bytes in 0.335 second response time
[01:59:37] RECOVERY - cp12 Stunnel Http for mw9 on cp12 is OK: HTTP OK: HTTP/1.1 200 OK - 15874 bytes in 0.324 second response time
[02:00:01] RECOVERY - cp12 HTTP 4xx/5xx ERROR Rate on cp12 is OK: OK - NGINX Error Rate is 3%
[02:00:16] RECOVERY - cp12 Stunnel Http for mw10 on cp12 is OK: HTTP OK: HTTP/1.1 200 OK - 15861 bytes in 0.318 second response time
[02:00:26] RECOVERY - cp12 Stunnel Http for mon2 on cp12 is OK: HTTP OK: HTTP/1.1 200 OK - 33934 bytes in 0.331 second response time
[02:00:38] RECOVERY - cp12 Stunnel Http for mw12 on cp12 is OK: HTTP OK: HTTP/1.1 200 OK - 15861 bytes in 0.432 second response time
[02:00:39] RECOVERY - ns1 GDNSD Datacenters on ns1 is OK: OK - all datacenters are online
[02:01:31] RECOVERY - cp12 Varnish Backends on cp12 is OK: All 9 backends are healthy
[02:06:53] PROBLEM - cp12 Stunnel Http for mw13 on cp12 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[02:07:29] PROBLEM - cp12 HTTP 4xx/5xx ERROR Rate on cp12 is CRITICAL: CRITICAL - NGINX Error Rate is 83%
[02:07:31] PROBLEM - cp12 Stunnel Http for mw9 on cp12 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[02:08:09] PROBLEM - cp12 Stunnel Http for mw10 on cp12 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[02:08:15] PROBLEM - ns1 GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 2 datacenters are down: 51.222.25.132/cpweb, 2607:5300:205:200::1c30/cpweb
[02:08:30] PROBLEM - cp12 Stunnel Http for mon2 on cp12 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[02:08:44] PROBLEM - cp12 Stunnel Http for mw12 on cp12 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[02:09:16] PROBLEM - cp12 Varnish Backends on cp12 is CRITICAL: 6 backends are down. mw8 mw9 mw10 mw11 mw12 mw13
[02:13:42] !log rebooted cp12
[02:13:48] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[02:14:37] RECOVERY - cp12 Stunnel Http for mw10 on cp12 is OK: HTTP OK: HTTP/1.1 200 OK - 15861 bytes in 0.647 second response time
[02:14:43] RECOVERY - cp12 Stunnel Http for mw8 on cp12 is OK: HTTP OK: HTTP/1.1 200 OK - 15868 bytes in 0.305 second response time
[02:14:47] RECOVERY - cp12 Stunnel Http for mon2 on cp12 is OK: HTTP OK: HTTP/1.1 200 OK - 33972 bytes in 0.361 second response time
[02:14:54] RECOVERY - cp12 Stunnel Http for mw12 on cp12 is OK: HTTP OK: HTTP/1.1 200 OK - 15861 bytes in 0.318 second response time
[02:14:55] RECOVERY - cp12 Stunnel Http for mw11 on cp12 is OK: HTTP OK: HTTP/1.1 200 OK - 15861 bytes in 0.342 second response time
[02:15:24] RECOVERY - cp12 Varnish Backends on cp12 is OK: All 9 backends are healthy
[02:16:28] RECOVERY - cp12 Stunnel Http for mw13 on cp12 is OK: HTTP OK: HTTP/1.1 200 OK - 15861 bytes in 0.328 second response time
[02:16:57] RECOVERY - cp12 HTTP 4xx/5xx ERROR Rate on cp12 is OK: OK - NGINX Error Rate is 3%
[02:17:00] RECOVERY - cp12 Stunnel Http for mw9 on cp12 is OK: HTTP OK: HTTP/1.1 200 OK - 15860 bytes in 0.336 second response time
[02:17:26] RECOVERY - ns1 GDNSD Datacenters on ns1 is OK: OK - all datacenters are online
[02:17:29] RECOVERY - ns2 GDNSD Datacenters on ns2 is OK: OK - all datacenters are online
[02:21:15] PROBLEM - gluster3 Current Load on gluster3 is CRITICAL: CRITICAL - load average: 8.75, 8.13, 6.16
[02:21:55] !log cp12: reverted previous change and only applied 'iptables -t raw -A PREROUTING -p tcp --dport 80 -j NOTRACK' and 'iptables -t raw -A PREROUTING -p tcp --dport 443 -j NOTRACK'
[02:21:58] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[02:47:28] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 3.90, 4.83, 5.72
[03:00:11] RECOVERY - gluster3 Current Load on gluster3 is OK: OK - load average: 3.92, 4.37, 5.01
[03:01:51] PROBLEM - cp12 Stunnel Http for mon2 on cp12 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[03:01:53] PROBLEM - incubator.nocyclo.tk - LetsEncrypt on sslhost is CRITICAL: connect to address incubator.nocyclo.tk and port 443: Connection refused HTTP CRITICAL - Unable to open TCP socket
[03:01:53] PROBLEM - cp12 Stunnel Http for mw11 on cp12 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[03:05:07] RECOVERY - cp12 Stunnel Http for mon2 on cp12 is OK: HTTP OK: HTTP/1.1 200 OK - 33934 bytes in 0.335 second response time
[03:06:09] RECOVERY - cp12 Stunnel Http for mw11 on cp12 is OK: HTTP OK: HTTP/1.1 200 OK - 15861 bytes in 0.314 second response time
[03:06:47] !log cp12: iptables -t raw -A PREROUTING -p tcp --dport 80 -j NOTRACK
[03:06:54] !log cp12: iptables -t raw -A PREROUTING -p tcp --dport 443 -j NOTRACK
[03:07:01] !log cp12: iptables -t raw -A OUTPUT -p tcp --sport 80 -j NOTRACK
[03:07:04] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[03:07:09] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[03:07:14] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[03:07:15] !log cp12: iptables -t raw -A OUTPUT -p tcp --sport 443 -j NOTRACK
[03:07:18] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[03:07:35] PROBLEM - gluster3 Current Load on gluster3 is CRITICAL: CRITICAL - load average: 8.25, 6.28, 5.54
[03:09:46] RECOVERY - incubator.nocyclo.tk - LetsEncrypt on sslhost is OK: OK - Certificate 'incubator.nocyclo.tk' will expire on Wed 17 Nov 2021 04:06:46 GMT +0000.
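The final rule set logged between 03:06 and 03:07 can be written as a single iptables-restore fragment for the raw table. The minimal Python sketch below renders that fragment. It assumes only the raw table is being managed. The function name `notrack_rules` is hypothetical, and a real `/etc/iptables/rules.v4` saved through iptables-persistent (installed on cp12 at 00:57) would also contain the other tables.

```python
from typing import Sequence


def notrack_rules(ports: Sequence[int] = (80, 443)) -> str:
    """Build an iptables-restore fragment with raw-table NOTRACK rules.

    Inbound packets to the web ports are matched in PREROUTING by destination
    port. The server's replies leave through OUTPUT with the web port as their
    source port, so OUTPUT matches on --sport.
    """
    lines = ["*raw", ":PREROUTING ACCEPT [0:0]", ":OUTPUT ACCEPT [0:0]"]
    for port in ports:
        lines.append(f"-A PREROUTING -p tcp --dport {port} -j NOTRACK")
    for port in ports:
        lines.append(f"-A OUTPUT -p tcp --sport {port} -j NOTRACK")
    lines.append("COMMIT")
    return "\n".join(lines) + "\n"
```

The earlier 01:51 rules matched OUTPUT on `--dport 80/443`. That selects connections cp12 itself opens to remote web ports, not its replies to clients. Untracked packets also appear in conntrack state `UNTRACKED`, so a filter-table rule that accepts only `ESTABLISHED,RELATED` traffic will not match them.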
[03:25:51] PROBLEM - mon2 Current Load on mon2 is WARNING: WARNING - load average: 3.11, 3.75, 3.41
[03:28:44] RECOVERY - mon2 Current Load on mon2 is OK: OK - load average: 2.57, 3.22, 3.26
[03:38:09] PROBLEM - cloud4 Current Load on cloud4 is WARNING: WARNING - load average: 18.66, 20.92, 19.39
[03:41:03] RECOVERY - cloud4 Current Load on cloud4 is OK: OK - load average: 13.56, 17.86, 18.49
[03:48:49] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 4.89, 5.17, 5.77
[03:54:00] PROBLEM - mon2 Current Load on mon2 is WARNING: WARNING - load average: 3.82, 3.57, 3.25
[03:54:41] PROBLEM - gluster3 Current Load on gluster3 is CRITICAL: CRITICAL - load average: 6.06, 5.38, 5.63
[03:56:54] RECOVERY - mon2 Current Load on mon2 is OK: OK - load average: 3.11, 3.40, 3.24
[04:06:18] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 5.18, 5.56, 5.79
[04:20:45] PROBLEM - gluster3 Current Load on gluster3 is CRITICAL: CRITICAL - load average: 6.10, 5.89, 5.62
[04:23:41] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 5.67, 5.83, 5.64
[04:24:54] [miraheze/MirahezeMagic] Universal-Omega pushed 1 commit to Universal-Omega-patch-2 [+0/-0/±1] https://git.io/JKzIh
[04:24:56] [miraheze/MirahezeMagic] Universal-Omega b78bb86 - en.json: fix capitalisation and use sre-mediawiki email
[04:24:57] [MirahezeMagic] Universal-Omega created branch Universal-Omega-patch-2 - https://git.io/fQRGX
[04:24:59] [MirahezeMagic] Universal-Omega opened pull request #295: en.json: fix capitalisation and use sre-mediawiki email - https://git.io/JKzLe
[04:25:54] miraheze/MirahezeMagic - Universal-Omega the build passed.
[04:26:10] [miraheze/MirahezeMagic] Universal-Omega pushed 1 commit to Universal-Omega-patch-2 [+0/-0/±1] https://git.io/JKzLX
[04:26:11] [miraheze/MirahezeMagic] Universal-Omega e3423b8 - Update en.json
[04:26:13] [MirahezeMagic] Universal-Omega synchronize pull request #295: en.json: fix capitalisation and use sre-mediawiki email - https://git.io/JKzLe
[04:26:35] PROBLEM - gluster3 Current Load on gluster3 is CRITICAL: CRITICAL - load average: 6.34, 6.42, 5.93
[04:27:08] miraheze/MirahezeMagic - Universal-Omega the build passed.
[04:29:24] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 3.64, 5.38, 5.63
[04:30:11] [miraheze/mw-config] Universal-Omega pushed 1 commit to Universal-Omega-patch-2 [+0/-0/±1] https://git.io/JKzq1
[04:30:13] [miraheze/mw-config] Universal-Omega bbc9f63 - Update extension-list
[04:30:14] [mw-config] Universal-Omega created branch Universal-Omega-patch-2 - https://git.io/vbvb3
[04:30:22] [mw-config] Universal-Omega opened pull request #4157: Remove AdvancedSearch - https://git.io/JKzq5
[04:31:03] [miraheze/mw-config] Universal-Omega pushed 1 commit to Universal-Omega-patch-2 [+0/-0/±1] https://git.io/JKzmL
[04:31:04] [miraheze/mw-config] Universal-Omega 678f423 - Update ManageWikiExtensions.php
[04:31:06] [mw-config] Universal-Omega synchronize pull request #4157: Remove AdvancedSearch - https://git.io/JKzq5
[04:31:23] miraheze/mw-config - Universal-Omega the build passed.
[04:32:11] miraheze/mw-config - Universal-Omega the build passed.
[04:32:56] [mw-config] Universal-Omega edited pull request #4157: T7740: remove AdvancedSearch - https://git.io/JKzq5
[04:35:21] PROBLEM - gluster3 Current Load on gluster3 is CRITICAL: CRITICAL - load average: 7.02, 5.91, 5.70
[04:38:28] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 5.46, 5.81, 5.71
[04:38:31] !log [reception@mw11] starting deploy of {'config': True} to all
[04:38:37] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[04:38:57] !log [reception@mw11] finished deploy of {'config': True} to all - SUCCESS in 27s
[04:39:00] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[04:41:32] PROBLEM - mw11 Current Load on mw11 is CRITICAL: CRITICAL - load average: 6.25, 8.11, 6.36
[04:42:08] PROBLEM - cloud5 Current Load on cloud5 is WARNING: WARNING - load average: 20.10, 21.21, 18.92
[04:44:20] RECOVERY - mw11 Current Load on mw11 is OK: OK - load average: 4.49, 6.61, 6.08
[04:44:58] RECOVERY - cloud5 Current Load on cloud5 is OK: OK - load average: 18.36, 19.52, 18.62
[04:47:04] RECOVERY - gluster3 Current Load on gluster3 is OK: OK - load average: 3.68, 4.28, 4.98
[04:51:19] !log [reception@mw11] starting deploy of {'config': True, 'world': True, 'l10n': True, 'gitinfo': True} to all
[04:51:33] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[05:04:17] [puppet] Universal-Omega opened pull request #2050: Remove test4 - https://git.io/JKz0a
[05:05:07] PROBLEM - mw11 Current Load on mw11 is WARNING: WARNING - load average: 7.11, 6.88, 5.87
[05:08:03] RECOVERY - mw11 Current Load on mw11 is OK: OK - load average: 5.32, 6.54, 5.96
[05:08:30] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 4.06, 5.21, 5.00
[05:11:20] RECOVERY - gluster3 Current Load on gluster3 is OK: OK - load average: 4.55, 5.04, 4.97
[05:11:51] While accessing my contributions page on FP
[05:11:51] https://cdn.discordapp.com/attachments/808001911868489748/899162772095524945/IMG_20211017_104100.jpg
[05:14:04] PROBLEM - db11 Disk Space on db11 is WARNING: DISK WARNING - free space: / 48516 MB (10% inode=97%);
[05:17:07] PROBLEM - cloud4 Current Load on cloud4 is CRITICAL: CRITICAL - load average: 24.73, 21.64, 17.96
[05:17:57] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 4.65, 5.93, 5.51
[05:19:28] !log [reception@mw11] finished deploy of {'config': True, 'world': True, 'l10n': True, 'gitinfo': True} to all - SUCCESS in 1688s
[05:19:32] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[05:20:06] PROBLEM - cloud5 Current Load on cloud5 is CRITICAL: CRITICAL - load average: 30.00, 27.71, 22.17
[05:20:34] RECOVERY - cloud4 Current Load on cloud4 is OK: OK - load average: 13.61, 18.61, 17.57
[05:21:24] PROBLEM - gluster3 Current Load on gluster3 is CRITICAL: CRITICAL - load average: 6.68, 6.24, 5.72
[05:25:57] PROBLEM - db13 Disk Space on db13 is WARNING: DISK WARNING - free space: / 48431 MB (10% inode=98%);
[05:26:00] PROBLEM - cloud5 Current Load on cloud5 is WARNING: WARNING - load average: 15.96, 21.05, 21.00
[05:27:10] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 5.17, 5.79, 5.72
[05:28:47] RECOVERY - cloud5 Current Load on cloud5 is OK: OK - load average: 17.22, 18.94, 20.15
[05:30:03] PROBLEM - gluster3 Current Load on gluster3 is CRITICAL: CRITICAL - load average: 7.17, 6.33, 5.94
[05:33:00] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 4.65, 5.77, 5.81
[05:54:18] PROBLEM - mon2 Current Load on mon2 is WARNING: WARNING - load average: 3.43, 3.42, 3.23
[05:57:07] PROBLEM - mon2 Current Load on mon2 is CRITICAL: CRITICAL - load average: 4.18, 3.54, 3.29
[05:58:51] RECOVERY - gluster3 Current Load on gluster3 is OK: OK - load average: 3.65, 4.39, 5.02
[05:59:54] RECOVERY - mon2 Current Load on mon2 is OK: OK - load average: 2.83, 3.34, 3.26
[06:06:42] PROBLEM - db11 Current Load on db11 is CRITICAL: CRITICAL - load average: 8.31, 6.37, 3.99
[06:11:36] RECOVERY - db13 Disk Space on db13 is OK: DISK OK - free space: / 55584 MB (12% inode=98%);
[06:12:26] PROBLEM - db11 Current Load on db11 is WARNING: WARNING - load average: 7.97, 7.43, 5.21
[06:15:10] PROBLEM - db11 Current Load on db11 is CRITICAL: CRITICAL - load average: 11.45, 8.71, 6.02
[06:17:55] PROBLEM - db11 Current Load on db11 is WARNING: WARNING - load average: 7.10, 7.81, 6.10
[06:20:45] PROBLEM - db11 Current Load on db11 is CRITICAL: CRITICAL - load average: 8.34, 8.33, 6.61
[06:23:31] PROBLEM - db11 Current Load on db11 is WARNING: WARNING - load average: 5.54, 7.32, 6.52
[06:26:21] PROBLEM - db13 Disk Space on db13 is WARNING: DISK WARNING - free space: / 47941 MB (10% inode=98%);
[06:31:52] RECOVERY - db11 Current Load on db11 is OK: OK - load average: 5.77, 6.48, 6.47
[06:45:59] PROBLEM - db11 Current Load on db11 is WARNING: WARNING - load average: 5.81, 6.89, 6.73
[06:48:43] PROBLEM - db11 Current Load on db11 is CRITICAL: CRITICAL - load average: 9.10, 7.74, 7.07
[06:51:27] PROBLEM - db11 Current Load on db11 is WARNING: WARNING - load average: 5.85, 6.90, 6.87
[06:54:15] RECOVERY - db11 Current Load on db11 is OK: OK - load average: 6.08, 6.61, 6.76
[06:57:58] PROBLEM - gluster3 Current Load on gluster3 is CRITICAL: CRITICAL - load average: 6.88, 5.85, 4.97
[07:00:41] PROBLEM - db11 Current Load on db11 is CRITICAL: CRITICAL - load average: 8.21, 7.01, 6.83
[07:00:46] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 4.71, 5.15, 4.84
[07:03:28] RECOVERY - db11 Current Load on db11 is OK: OK - load average: 5.89, 6.62, 6.72
[07:03:36] PROBLEM - gluster3 Current Load on gluster3 is CRITICAL: CRITICAL - load average: 6.78, 5.75, 5.12
[07:06:28] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 5.01, 5.73, 5.25
[07:12:09] PROBLEM - gluster3 Current Load on gluster3 is CRITICAL: CRITICAL - load average: 7.74, 6.09, 5.45
[07:12:50] PROBLEM - db11 Current Load on db11 is WARNING: WARNING - load average: 7.26, 6.98, 6.82
[07:15:38] RECOVERY - db11 Current Load on db11 is OK: OK - load average: 5.97, 6.43, 6.64
[07:17:43] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 4.37, 5.37, 5.42
[07:22:10] PROBLEM - db11 Current Load on db11 is CRITICAL: CRITICAL - load average: 9.61, 8.20, 7.37
[07:25:01] PROBLEM - db11 Current Load on db11 is WARNING: WARNING - load average: 7.74, 7.91, 7.40
[07:26:10] RECOVERY - gluster3 Current Load on gluster3 is OK: OK - load average: 4.74, 4.63, 5.08
[07:33:14] RECOVERY - db11 Current Load on db11 is OK: OK - load average: 4.91, 5.99, 6.68
[07:41:27] dmehus: do you actually know what Citoid is?
[08:06:37] PROBLEM - db11 Current Load on db11 is WARNING: WARNING - load average: 7.00, 7.02, 6.55
[08:09:27] RECOVERY - db11 Current Load on db11 is OK: OK - load average: 5.89, 6.62, 6.49
[08:10:50] alerting : [FIRING:1] (MediaWiki Exception Rate mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki?orgId=1
[08:10:51] [url] Grafana | grafana.miraheze.org
[08:11:16] me ^
[08:11:22] Yey!
[08:14:32] Spookreeeno: can we have the bot ignore links from icinga-miraheze?
[08:14:40] If that's possible.
[08:14:54] CosmicAlpha: I can ignore grafana
[08:15:04] That would work also.
[08:15:47] .urlexclude grafana.miraheze.org
[08:15:48] Spookreeeno: This URL is now excluded from auto title.
[08:15:50] alerting : [FIRING:1] (mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki?orgId=1
[08:16:00] Oh nice!
[08:16:03] Works
[08:17:12] Spookreeeno: running one more alert test then I'll finalise the alert and go to sleep. Will do panel tomorrow.
[08:17:19] CosmicAlpha: nice
[08:17:51] RECOVERY - db13 Disk Space on db13 is OK: DISK OK - free space: / 49706 MB (11% inode=98%);
[08:18:36] PROBLEM - db11 Current Load on db11 is WARNING: WARNING - load average: 7.89, 7.03, 6.71
[08:21:18] Spookreeeno: what I want to do is hook Icinga itself into these exception rate increases, as more information can be included in Icinga alerts, such as the current average, but that's more difficult and can wait for now.
[08:24:16] PROBLEM - db13 Disk Space on db13 is WARNING: DISK WARNING - free space: / 47370 MB (10% inode=98%);
[08:24:51] Hmm... why won't it alert again?
[08:29:31] CosmicAlpha: it says pending
[08:29:39] RECOVERY - db11 Current Load on db11 is OK: OK - load average: 4.65, 6.23, 6.63
[08:30:50] alerting : [FIRING:1] (mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki?orgId=1
[08:30:51] Spookreeeno: yeah I just reset it, to try again.
[08:31:04] There we go, but the title didn't work properly.
[08:31:05] Both went off
[08:31:18] CosmicAlpha: it went off for one resolved & one alerting at the same time
[08:33:17] Spookreeeno: I'll enable !sre pings now. Feel free to change the alert and remove it if it messes up while I'm asleep.
[08:34:03] Oh whoops I just pinged it. My bad.
[08:35:50] alerting : [FIRING:2] (mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki?orgId=1
[08:37:55] CosmicAlpha: alert name has no ! in front
[08:38:24] Oh whoops
[08:40:50] alerting : [FIRING:1] (mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki?orgId=1
[08:42:04] Looks sensible now
[08:42:14] Spookreeeno: it's finalised now. Should alert above 0.9
[08:42:20] Cool!
[08:42:59] I'm off to sleep now, hoping it doesn't spam sre pings throughout the night...
[08:43:38] I'm around
[08:45:47] Spookreeeno: Yeah that's good. If it does, just remove the ping, and I'll look tomorrow if you want. Also, I'll do the panel for it tomorrow.
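The exception-rate rule discussed above fires when the rate is above 0.9, and it sits in a "pending" state before it alerts. Below is a toy Python model of that lifecycle. It assumes that a fixed number of consecutive breaching evaluations stands in for Grafana's pending period. It is not Grafana's implementation, and the class name and the two-evaluation pending length are hypothetical. The returned strings `"alerting"` and `"ok"` mirror the prefixes of the bot messages.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ThresholdAlert:
    """Toy alert rule: fire once a value stays above `threshold` for `pending_evals` checks."""

    threshold: float = 0.9   # "Should alert above 0.9" from the log
    pending_evals: int = 2   # hypothetical pending period, in evaluation cycles
    state: str = "Normal"
    _breaches: int = field(default=0, init=False, repr=False)

    def evaluate(self, value: float) -> Optional[str]:
        """Update the state; return "alerting" or "ok" only when a notification is due."""
        if value > self.threshold:
            self._breaches += 1
            if self._breaches >= self.pending_evals:
                if self.state != "Alerting":
                    self.state = "Alerting"
                    return "alerting"
                return None  # still firing, no repeat notification
            self.state = "Pending"
            return None
        # Back under the threshold: reset, and announce resolution only if it had fired.
        was_alerting = self.state == "Alerting"
        self.state = "Normal"
        self._breaches = 0
        return "ok" if was_alerting else None
```

A single breaching sample therefore only moves the rule to `Pending`. This matches the "it says pending" observation, although in Grafana the pending period is configured as a duration rather than as a count of evaluations.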
[08:45:50] ok : [RESOLVED] (sre MediaWiki Exception Rate mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki?orgId=1
[08:46:03] Yep
[08:53:04] PROBLEM - db11 Current Load on db11 is WARNING: WARNING - load average: 6.93, 6.78, 6.69
[08:56:12] [miraheze/mw-config] Universal-Omega pushed 1 commit to master [+0/-0/±1] https://git.io/JKaIq
[08:56:13] [miraheze/mw-config] Universal-Omega 4bd0857 - Sitenotice: correct date format
[08:56:15] PROBLEM - steamdecklinux.wiki - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Certificate 'steamdecklinux.wiki' expires in 7 day(s) (Mon 25 Oct 2021 08:49:47 GMT +0000).
[08:57:11] miraheze/mw-config - Universal-Omega the build passed.
[09:01:28] RECOVERY - db11 Current Load on db11 is OK: OK - load average: 5.90, 6.47, 6.68
[09:02:50] !log [@test3] starting deploy of {'config': True} to skip
[09:02:51] !log [@test3] finished deploy of {'config': True} to skip - SUCCESS in 0s
[09:02:54] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[09:02:57] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[09:05:07] !log [@mw11] starting deploy of {'config': True} to all
[09:05:11] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[09:05:40] !log [@mw11] finished deploy of {'config': True} to all - SUCCESS in 32s
[09:05:48] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[09:30:30] I see this on many project pages
[09:32:02] Could you cite some?
[09:32:26] https://en.famepedia.org/w/index.php?title=FAMEPedia:Criteria_for_speedy_deletion
[09:32:28] [url] FAMEPedia:Criteria for speedy deletion - FAMEPedia | en.famepedia.org
[09:32:57] https://en.famepedia.org/wiki/Special:Contributions/Magogre
[09:32:59] [url] Internal error - FAMEPedia | en.famepedia.org
[09:36:23] yeah, Special:Contributions is borked
[09:37:06] I guess we better get a task
[09:37:56] I'll create one
[09:38:57] https://phabricator.miraheze.org/T8184
[09:38:58] [url] ⚓ T8184 Special:Contributions on famepedia is borked | phabricator.miraheze.org
[09:45:06] PROBLEM - db12 Disk Space on db12 is WARNING: DISK WARNING - free space: / 48934 MB (10% inode=98%);
[09:49:17] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 4.73, 5.72, 4.78
[09:51:23] majavah: any idea on https://phabricator.miraheze.org/T8184#164981
[09:51:24] [url] ⚓ T8184 Special:Contributions on famepedia is borked | phabricator.miraheze.org
[09:52:03] RECOVERY - gluster3 Current Load on gluster3 is OK: OK - load average: 2.70, 4.43, 4.43
[09:53:01] https://github.com/wikimedia/mediawiki/blob/REL1_37/includes/historyblob/HistoryBlobStub.php#L129 fails
[09:53:07] [url] mediawiki/HistoryBlobStub.php at REL1_37 · wikimedia/mediawiki · GitHub | github.com
[10:38:02] RECOVERY - db12 Disk Space on db12 is OK: DISK OK - free space: / 49071 MB (11% inode=98%);
[10:44:26] PROBLEM - db12 Disk Space on db12 is WARNING: DISK WARNING - free space: / 48471 MB (10% inode=98%);
[10:56:27] $obj should be of type ConcatenatedGzipHistoryBlob, rather than HistoryBlobStub
[11:00:03] Is this the only wiki the exception appears on?
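The diagnosis above is that the object loaded for the stub should be a ConcatenatedGzipHistoryBlob but turns out to be a HistoryBlobStub. According to the upstream task title, the subsequent `uncompress()` call then fails. Below is a simplified Python model of that stub-to-blob indirection, with an explicit type check that turns the failure into a descriptive error. The classes are hypothetical stand-ins, not MediaWiki's PHP classes. Storage is reduced to an in-memory dict keyed by a text row id. The model only illustrates the failure mode and does not repair the underlying data.

```python
import gzip
import json
from typing import Dict


class ConcatenatedBlob:
    """Stand-in for a compressed blob holding several revisions in one text row."""

    def __init__(self, items: Dict[str, str]):
        self._packed = gzip.compress(json.dumps(items).encode())
        self._items: Dict[str, str] = {}

    def uncompress(self) -> None:
        if not self._items:
            self._items = json.loads(gzip.decompress(self._packed))

    def get_item(self, key: str) -> str:
        self.uncompress()
        return self._items[key]


class BlobStub:
    """Points at one revision stored inside the blob kept in another text row."""

    def __init__(self, text_row_id: int, key: str):
        self.text_row_id = text_row_id
        self.key = key

    def get_text(self, text_table: Dict[int, object]) -> str:
        target = text_table[self.text_row_id]
        # The failure discussed in the log: the referenced row holds another stub,
        # which has no uncompress(). Check the type and fail with a clear message.
        if not isinstance(target, ConcatenatedBlob):
            raise TypeError(
                f"text row {self.text_row_id} holds {type(target).__name__}, "
                "expected ConcatenatedBlob"
            )
        return target.get_item(self.key)
```

In this model, a stub that points at row 1 (a real blob) resolves normally. A stub that points at a row containing another stub raises `TypeError` instead of failing on a missing method.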
[11:16:26] JohnLewis: no idea
[11:16:45] It's not, I looked via graylog
[11:18:35] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 5.56, 5.63, 5.06
[11:21:26] RECOVERY - gluster3 Current Load on gluster3 is OK: OK - load average: 3.82, 4.99, 4.92
[11:22:27] https://phabricator.wikimedia.org/T39882 is the only upstream report of this from 1.19
[11:22:28] [url] ⚓ T39882 Fatal error, undefined method "HistoryBlobStub::uncompress()" when running update.php | phabricator.wikimedia.org
[11:49:30] PROBLEM - gluster3 Current Load on gluster3 is CRITICAL: CRITICAL - load average: 6.64, 6.50, 5.43
[11:52:21] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 3.96, 5.21, 5.10
[11:57:55] RECOVERY - gluster3 Current Load on gluster3 is OK: OK - load average: 4.61, 4.90, 5.00
[12:04:18] There's a fatal error on FAMEPedia [d02b90c73d7b2fd5b101859b] 2021-10-17 12:03:16: Fatal exception of type "Error"
[12:04:42] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 5.76, 5.66, 5.32
[12:05:00] @Joseph: we know about contributions
[12:05:08] JohnLewis: what can we do then
[12:10:28] RECOVERY - gluster3 Current Load on gluster3 is OK: OK - load average: 3.40, 4.35, 4.85
[12:13:06] Spookreeeno: I've been looking into it for an hour and I have no idea currently
[12:13:22] Ack
[12:14:33] It seems to be the same as the task linked above, but that never received a resolution
[12:16:56] We might be able to help debug
[12:17:23] Probably needs a new task 9 years later
[12:19:33] Probably the best way forward, as I'm really struggling to find the problem
[12:22:26] PROBLEM - db12 Current Load on db12 is WARNING: WARNING - load average: 7.06, 6.69, 5.30
[12:25:18] RECOVERY - db12 Current Load on db12 is OK: OK - load average: 5.86, 6.16, 5.32
[12:39:07] JohnLewis: https://phabricator.wikimedia.org/T293574
[12:39:08] [url] ⚓ T293574 Call to undefined method HistoryBlobStub::uncompress() | phabricator.wikimedia.org
[12:40:46] and now it's just the waiting game...
[12:43:09] PROBLEM - db12 Disk Space on db12 is CRITICAL: DISK CRITICAL - free space: / 23224 MB (5% inode=98%);
[12:48:03] RECOVERY - db11 Disk Space on db11 is OK: DISK OK - free space: / 63394 MB (14% inode=97%);
[12:48:44] PROBLEM - db12 Disk Space on db12 is WARNING: DISK WARNING - free space: / 44401 MB (9% inode=98%);
[12:50:15] RECOVERY - db13 Disk Space on db13 is OK: DISK OK - free space: / 82838 MB (18% inode=98%);
[12:51:30] RECOVERY - db12 Disk Space on db12 is OK: DISK OK - free space: / 69134 MB (15% inode=98%);
[13:12:45] PROBLEM - gluster3 Current Load on gluster3 is CRITICAL: CRITICAL - load average: 6.41, 5.66, 5.03
[13:15:34] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 5.81, 5.50, 5.06
[13:27:17] RECOVERY - gluster3 Current Load on gluster3 is OK: OK - load average: 3.80, 4.61, 4.95
[13:36:59] PROBLEM - cp12 HTTP 4xx/5xx ERROR Rate on cp12 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[13:37:00] PROBLEM - cp12 Stunnel Http for mw10 on cp12 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[13:37:11] Here
[13:37:27] PROBLEM - cp12 Puppet on cp12 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[13:37:35] PROBLEM - cp12 HTTPS on cp12 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[13:38:29] paladox, JohnLewis: ^
[13:38:31] oh
[13:38:34] looking
[13:38:41] PROBLEM - cp12 Stunnel Http for mw12 on cp12 is CRITICAL: connect to address 51.222.25.132 port 5666: Connection refusedconnect to host 51.222.25.132 port 5666: Connection refused
[13:38:45] !log reboot cp12
[13:38:51] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[13:40:31] RECOVERY - cp12 HTTP 4xx/5xx ERROR Rate on cp12 is OK: OK - NGINX Error Rate is 7%
[13:40:32] RECOVERY - cp12 Stunnel Http for mw10 on cp12 is OK: HTTP OK: HTTP/1.1 200 OK - 15861 bytes in 0.326 second response time
[13:40:55] RECOVERY - cp12 Puppet on cp12 is OK: OK: Puppet is currently enabled, last run 17 minutes ago with 0 failures
[13:40:59] RECOVERY - cp12 HTTPS on cp12 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 3010 bytes in 0.518 second response time
[13:41:06] was it my firewall change that somehow broke things, even though i reverted?
[13:41:19] Ok
[13:41:56] RECOVERY - cp12 Stunnel Http for mw12 on cp12 is OK: HTTP OK: HTTP/1.1 200 OK - 15861 bytes in 4.150 second response time
[13:42:59] !log cp12: iptables -t raw -A OUTPUT -p tcp --sport 443 -j NOTRACK
[13:43:03] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[13:43:13] !log cp12: iptables -t raw -A OUTPUT -p tcp --sport 80 -j NOTRACK
[13:43:17] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[13:43:21] !log cp12: iptables -t raw -A PREROUTING -p tcp --dport 443 -j NOTRACK
[13:43:25] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[13:43:34] !log cp12: iptables -t raw -A PREROUTING -p tcp --dport 80 -j NOTRACK
[13:43:38] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[13:43:49] !log mw10: iptables -t raw -A OUTPUT -p tcp --sport 443 -j NOTRACK
[13:43:55] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[13:43:56] !log mw10: iptables -t raw -A OUTPUT -p tcp --sport 80 -j NOTRACK
[13:44:00] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[13:44:07] !log mw10: iptables -t raw -A PREROUTING -p tcp --dport 443 -j NOTRACK
[13:44:10] !log mw10: iptables -t raw -A PREROUTING -p tcp --dport 80 -j NOTRACK
[13:44:14] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[13:44:20] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[13:47:14] PROBLEM - gluster3 Current Load on gluster3 is CRITICAL: CRITICAL - load average: 5.34, 6.30, 5.57
[13:50:09] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 5.18, 5.92, 5.56
[13:52:57] PROBLEM - gluster3 Current Load on gluster3 is CRITICAL: CRITICAL - load average: 5.73, 6.14, 5.72
[13:53:44] !log cp15: iptables -t raw -A PREROUTING -p tcp --dport 80 -j NOTRACK
[13:53:48] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[13:53:52] !log cp15: iptables -t raw -A PREROUTING -p tcp --dport 443 -j NOTRACK
[13:53:58] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[13:54:00] !log cp15: iptables -t raw -A OUTPUT -p tcp --sport 80 -j NOTRACK
[13:54:03] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[13:54:12] !log cp15: iptables -t raw -A OUTPUT -p tcp --sport 443 -j NOTRACK
[13:54:16] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[13:58:46] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 5.72, 5.68, 5.67
[14:04:38] PROBLEM - gluster3 Current Load on gluster3 is CRITICAL: CRITICAL - load average: 8.67, 6.82, 6.05
[14:09:47] PROBLEM - cloud4 Current Load on cloud4 is WARNING: WARNING - load average: 20.24, 20.86, 18.64
[14:15:33] RECOVERY - cloud4 Current Load on cloud4 is OK: OK - load average: 15.88, 19.55, 18.93
[14:21:12] paladox: can https://github.com/miraheze/puppet/pull/2027/files be merged?
[14:21:13] [url] Add landing and ErrorPages to staging by Universal-Omega · Pull Request #2027 · miraheze/puppet · GitHub | github.com
[14:21:22] if able, please ping before merge
[14:21:35] [puppet] paladox closed pull request #2027: Add landing and ErrorPages to staging - https://git.io/JKtV6
[14:21:37] [miraheze/puppet] paladox pushed 1 commit to master [+0/-0/±2] https://git.io/JKrjN
[14:21:38] [miraheze/puppet] Universal-Omega c93ee78 - Add landing and ErrorPages to staging (#2027)
[14:21:51] Spookreeeno: ohh
[14:21:53] merged
[14:22:43] paladox: it's ok
[14:24:33] !log disable puppet on mw11 to deploy c93ee78 - Add landing and ErrorPages to staging (#2027)
[14:24:38] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[14:25:30] Spookreeeno: as an fyi, Proton has some usage from earlier in the week
[14:25:58] JohnLewis: what wiki?
[14:26:05] and how much
[14:27:04] !log applied same iptables rules on cp13 and cp14
[14:27:10] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[14:27:20] PROBLEM - mw11 Puppet on mw11 is WARNING: WARNING: Puppet is currently disabled, message: rf1 - deploying c93ee78, last run 21 minutes ago with 0 failures
[14:27:39] nenawiki, nonciclopediawiki
[14:27:54] JohnLewis: do you know how many?
[14:28:17] because you can still print without pontoon
[14:28:21] proton
[14:29:27] 7 on this one day I'm looking at
[14:30:39] hmm
[14:31:01] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 4.92, 5.32, 5.95
[14:31:26] no one so far has cared about collection
[14:31:33] !log [@test3] starting deploy of {'landing': True} to skip
[14:31:34] !log [@test3] finished deploy of {'landing': True} to skip - SUCCESS in 0s
[14:31:35] !log [@test3] starting deploy of {'errorpages': True} to skip
[14:31:36] !log [@test3] finished deploy of {'errorpages': True} to skip - SUCCESS in 0s
[14:31:37] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[14:31:40] 14 usages on climatechange wiki
[14:31:41] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[14:31:44] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[14:31:48] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[14:34:01] PROBLEM - gluster3 Current Load on gluster3 is CRITICAL: CRITICAL - load average: 6.49, 5.68, 5.97
[14:35:30] Spookreeeno: https://meta.miraheze.org/w/index.php?title=Community_noticeboard&diff=207003&oldid=206987&diffmode=source also for you
[14:35:33] [url] Difference between revisions of "Community noticeboard" - Miraheze Meta | meta.miraheze.org
[14:36:42] JohnLewis: thanks
[14:37:02] 11 usages a day is still far too low
[14:37:05] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 5.51, 5.78, 5.97
[14:37:52] !log [rhinos@test3] starting deploy of {'landing': True, 'errorpages': True} to skip
[14:37:53] !log [rhinos@test3] finished deploy of {'landing': True, 'errorpages': True} to skip - SUCCESS in 0s
[14:37:55] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[14:37:58] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[14:39:27] RECOVERY - mw11 Puppet on mw11 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[14:40:07] PROBLEM - gluster3 Current Load on gluster3 is CRITICAL: CRITICAL - load average: 6.49, 6.06, 6.03
[14:40:20] !log [@mw11] starting deploy of {'landing': True} to all
[14:40:25] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[14:41:07] !log [@mw11] finished deploy of {'landing': True} to all - SUCCESS in 46s
[14:41:08] !log [@mw11] starting deploy of {'errorpages': True} to all
[14:41:12] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[14:41:15] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[14:41:18] !log [@mw11] finished deploy of {'errorpages': True} to all - SUCCESS in 9s
[14:41:22] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[14:41:37] !log [rhinos@mw11] enabled puppet
[14:41:42] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[14:46:10] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 5.63, 5.86, 5.91
[14:50:08] !log rhinos@test3:~$ sudo -u www-data git -C /srv/mediawiki-staging/w/extensions/EmbedVideo apply /home/rhinos/83af7367924e7b043815cbc76c3885e6039810d8.patch
[14:50:13] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[14:50:49] !log [rhinos@test3] starting deploy of {'files': '/srv/mediawiki/w/extensions/EmbedVideo/includes/Media/FFProbe/FFprobe.php'} to skip
[14:50:50] !log [rhinos@test3] finished deploy of {'files': '/srv/mediawiki/w/extensions/EmbedVideo/includes/Media/FFProbe/FFprobe.php'} to skip - FAIL: [768] in 0s
[14:50:53] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[14:50:56] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[14:51:14] !log [rhinos@test3] starting deploy of {'files': 'w/extensions/EmbedVideo/includes/Media/FFProbe/FFprobe.php'} to skip
[14:51:15] !log [rhinos@test3] finished deploy of {'files': 'w/extensions/EmbedVideo/includes/Media/FFProbe/FFprobe.php'} to skip - FAIL: [5888] in 0s
[14:51:23] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[14:51:27] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[14:52:47] !log [rhinos@test3] starting deploy of {'world': True} to skip
[14:52:51] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[14:54:53] PROBLEM - gluster3 Current Load on gluster3 is CRITICAL: CRITICAL - load average: 6.27, 5.58, 5.67
[14:57:09] !log [rhinos@test3] finished deploy of {'world': True} to skip - SUCCESS in 261s
[14:57:15] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[14:57:52] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 5.49, 5.50, 5.62
[14:58:15] !log test3: Upgrading wikimedia/zest-css (2.0.1 => 2.0.2): Extracting archive
[14:58:19] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[14:59:56] PROBLEM - mon2 Current Load on mon2 is WARNING: WARNING - load average: 3.34, 3.56, 3.14
[15:00:16] !log (all mw11)
[15:00:18] !log Upgrading guzzlehttp/promises (1.4.1 => 1.5.0)
[15:00:26] !log Upgrading guzzlehttp/psr7 (1.8.2 => 1.8.3)
[15:00:39] !log Upgrading phpdocumentor/type-resolver (1.5.0 => 1.5.1)
[15:00:44] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[15:00:48] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[15:00:49] PROBLEM - gluster3 Current Load on gluster3 is CRITICAL: CRITICAL - load average: 9.08, 6.34, 5.87
[15:00:52] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[15:00:57] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[15:01:15] !log [rhinos@mw11] starting deploy of {'world': True} to all
[15:01:19] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[15:03:57] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 4.73, 5.46, 5.60
[15:05:10] PROBLEM - cloud5 Current Load on cloud5 is WARNING: WARNING - load average: 23.54, 20.63, 18.11
[15:06:21] RECOVERY - mon2 Current Load on mon2 is OK: OK - load average: 2.20, 3.04, 3.09
[15:07:10] PROBLEM - gluster3 Current Load on gluster3 is CRITICAL: CRITICAL - load average: 6.15, 6.31, 5.95
[15:08:20] RECOVERY - cloud5 Current Load on cloud5 is OK: OK - load average: 17.89, 19.78, 18.31
[15:08:24] PROBLEM - mw11 Current Load on mw11 is WARNING: WARNING - load average: 5.96, 6.94, 6.05
[15:08:46] PROBLEM - cloud4 Current Load on cloud4 is WARNING: WARNING - load average: 20.51, 21.14, 18.72
[15:10:21] !log [rhinos@mw11] finished deploy of {'world': True} to all - SUCCESS in 545s
[15:10:42] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[15:11:04] SRE, ```Error 503 Backend fetch failed, forwarded for , 127.0.0.1
[15:11:04] (Varnish XID 831881475) via cp15.miraheze.org at Sun, 17 Oct 2021 15:10:33 GMT.```
[15:11:12] on `loginwiki`
[15:11:58] PROBLEM - cloud4 Current Load on cloud4 is CRITICAL: CRITICAL - load average: 24.88, 22.71, 19.74
[15:12:14] fine here dmehus
[15:12:28] can you pm redacted
[15:12:39] Spookreeeno, sure
[15:12:52] In case you haven't read it, Doug, thanks for handling that issue I was having with someone
[15:13:43] PROBLEM - mon2 Current Load on mon2 is WARNING: WARNING - load average: 3.44, 3.72, 3.40
[15:14:37] RECOVERY - mw11 Current Load on mw11 is OK: OK - load average: 4.03, 5.42, 5.72
[15:15:11] PROBLEM - cloud4 Current Load on cloud4 is WARNING: WARNING - load average: 21.74, 22.45, 20.20
[15:15:45] np
[15:18:41] That user in question has been driving me up the wall recently with stupid and childish usernames, and you could tell in some of my block summaries I gave to them that I was annoyed as hell.
[15:21:26] PROBLEM - cloud4 Current Load on cloud4 is CRITICAL: CRITICAL - load average: 25.16, 25.88, 22.34
[15:26:24] RECOVERY - mon2 Current Load on mon2 is OK: OK - load average: 1.65, 3.08, 3.33
[15:30:43] PROBLEM - cloud4 Current Load on cloud4 is WARNING: WARNING - load average: 23.02, 23.23, 22.65
[15:33:48] [miraheze/puppet] paladox pushed 1 commit to master [+0/-0/±1] https://git.io/JKo6f
[15:33:48] PROBLEM - mon2 Current Load on mon2 is WARNING: WARNING - load average: 3.72, 3.93, 3.73
[15:33:49] [miraheze/puppet] Universal-Omega 1fa9d4d - Remove test4 (#2050)
[15:33:51] [puppet] paladox closed pull request #2050: Remove test4 - https://git.io/JKz0a
[15:36:59] PROBLEM - cloud4 Current Load on cloud4 is CRITICAL: CRITICAL - load average: 27.51, 25.60, 23.70
[15:38:06] PROBLEM - mw9 Current Load on mw9 is WARNING: WARNING - load average: 4.94, 6.93, 5.62
[15:38:17] Spookreeeno, can you handle [[phab:T8182]] for me please?
[15:38:30] One ping is enough
[15:38:46] I only pinged you once?
[15:39:54] RECOVERY - mon2 Current Load on mon2 is OK: OK - load average: 2.52, 2.94, 3.35
[15:41:05] RECOVERY - mw9 Current Load on mw9 is OK: OK - load average: 3.40, 5.61, 5.36
[15:41:35] -cvt
[15:41:41] All my nicks ping me dmehus
[15:42:07] The !_sre ping is in use now too for actual stuff
[15:42:51] PROBLEM - cloud4 Current Load on cloud4 is WARNING: WARNING - load average: 15.37, 20.60, 22.31
[15:49:11] ah
[15:54:48] RECOVERY - cloud4 Current Load on cloud4 is OK: OK - load average: 15.16, 16.99, 19.53
[15:58:26] 503 backend fetch again trying to MultiLock
[15:59:00] oh wait, just lock, not multilock
[16:01:46] Hmm
[16:10:04] SRE, did we make any changes to the permissions of api.php?
[16:10:20] ``` at process (load.php?lang=en&modules=jquery&skin=vector&version=qtt5t:50)
[16:10:20] /w/api.php:1 Failed to load resource: the server responded with a status of 503 ()
[16:10:20] load.php?lang=en&modules=jquery&skin=vector&version=qtt5t:52 jQuery.Deferred exception: Cannot read properties of undefined (reading 'error') TypeError: Cannot read properties of undefined (reading 'error')```
[16:10:20] Why is it throwing a 503 forbidden error there?
[16:11:25] PROBLEM - mw11 Current Load on mw11 is WARNING: WARNING - load average: 5.92, 7.64, 6.41
[16:12:14] 503 ain't forbidden
[16:12:19] That's 403
[16:12:27] oh, right
[16:13:05] There's a TypeError in whatever you're trying to load
[16:14:08] I don't think Voidwalker and CosmicAlpha have made any recent changes to their MassGlobalBlock.js and BackendInformation.js scripts, though
[16:14:21] RECOVERY - mw11 Current Load on mw11 is OK: OK - load average: 5.36, 6.69, 6.26
[16:14:47] I'll try again. Maybe it's overloaded and couldn't properly deliver the requested JS information to me
[16:15:22] ^ seems to have been the case, it seems to be working now
[16:16:35] PROBLEM - cloud4 Current Load on cloud4 is WARNING: WARNING - load average: 23.57, 22.54, 20.37
[16:19:42] PROBLEM - cloud4 Current Load on cloud4 is CRITICAL: CRITICAL - load average: 27.24, 24.09, 21.31
[16:21:20] PROBLEM - mw9 Current Load on mw9 is CRITICAL: CRITICAL - load average: 8.53, 7.05, 5.73
[16:22:49] PROBLEM - cloud4 Current Load on cloud4 is WARNING: WARNING - load average: 21.99, 23.35, 21.54
[16:24:25] RECOVERY - mw9 Current Load on mw9 is OK: OK - load average: 3.96, 5.72, 5.46
[16:24:43] PROBLEM - mw11 Current Load on mw11 is CRITICAL: CRITICAL - load average: 9.04, 7.57, 6.75
[16:24:56] PROBLEM - mon2 Current Load on mon2 is CRITICAL: CRITICAL - load average: 4.50, 3.59, 3.22
[16:27:42] RECOVERY - mw11 Current Load on mw11 is OK: OK - load average: 4.21, 6.10, 6.33
[16:27:58] RECOVERY - mon2 Current Load on mon2 is OK: OK - load average: 2.87, 3.13, 3.10
[16:28:56] PROBLEM - cloud4 Current Load on cloud4 is CRITICAL: CRITICAL - load average: 25.87, 23.04, 21.69
[16:30:42] I have seen one 503 error when using an API, but after a reload and resubmission it returned fine
[16:31:42] during the day. maybe just a coincidence.
[16:31:47] PROBLEM - cloud4 Current Load on cloud4 is WARNING: WARNING - load average: 16.93, 20.97, 21.20
[16:34:48] PROBLEM - cloud4 Current Load on cloud4 is CRITICAL: CRITICAL - load average: 25.18, 21.51, 21.24
[16:37:52] PROBLEM - cloud4 Current Load on cloud4 is WARNING: WARNING - load average: 23.42, 22.31, 21.59
[16:39:49] PROBLEM - mw12 Current Load on mw12 is WARNING: WARNING - load average: 4.93, 7.15, 6.41
[16:42:51] RECOVERY - mw12 Current Load on mw12 is OK: OK - load average: 4.29, 5.59, 5.91
[16:43:51] PROBLEM - cloud4 Current Load on cloud4 is CRITICAL: CRITICAL - load average: 27.30, 21.62, 21.29
[16:44:17] [puppet] Universal-Omega opened pull request #2051: Add `--no-log` to automatic deploys of landing and ErrorPages - https://git.io/JKKlA
[16:44:52] [puppet] Universal-Omega synchronize pull request #2051: Add `--no-log` to automatic deploys of landing and ErrorPages - https://git.io/JKKlA
[16:46:48] PROBLEM - cloud4 Current Load on cloud4 is WARNING: WARNING - load average: 18.65, 20.69, 21.01
[16:49:46] RECOVERY - cloud4 Current Load on cloud4 is OK: OK - load average: 18.48, 19.30, 20.37
[17:10:02] PROBLEM - mon2 Current Load on mon2 is WARNING: WARNING - load average: 3.99, 3.60, 3.37
[17:12:54] RECOVERY - mon2 Current Load on mon2 is OK: OK - load average: 2.33, 3.03, 3.19
[17:16:47] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 4.37, 5.42, 5.89
[17:18:49] [puppet] paladox closed pull request #2051: Add `--no-log` to automatic deploys of landing and ErrorPages - https://git.io/JKKlA
[17:18:50] [miraheze/puppet] paladox pushed 1 commit to master [+0/-0/±1] https://git.io/JKKH8
[17:18:52] [miraheze/puppet] Universal-Omega 128ec30 - Add `--no-log` to automatic deploys of landing and ErrorPages (#2051)
[17:37:38] PROBLEM - gluster3 Current Load on gluster3 is CRITICAL: CRITICAL - load average: 11.63, 7.83, 6.54
[17:37:54] PROBLEM - cloud4 Current Load on cloud4 is CRITICAL: CRITICAL - load average: 34.50, 26.18, 21.02
[17:40:10] PROBLEM - mw12 Current Load on mw12 is CRITICAL: CRITICAL - load average: 9.63, 8.68, 6.40
[17:41:02] PROBLEM - cloud4 Current Load on cloud4 is WARNING: WARNING - load average: 19.88, 23.48, 20.91
[17:43:08] RECOVERY - mw12 Current Load on mw12 is OK: OK - load average: 5.69, 6.78, 6.04
[17:46:58] RECOVERY - cloud4 Current Load on cloud4 is OK: OK - load average: 16.81, 19.17, 19.75
[17:58:46] CosmicAlpha: I paged
[17:58:51] Did you get it?
[17:59:01] Yep
[17:59:33] CosmicAlpha: yey!
[18:00:35] CosmicAlpha: can you test from grafana side?
[18:06:30] PROBLEM - mw11 Current Load on mw11 is CRITICAL: CRITICAL - load average: 10.20, 7.77, 5.94
[18:06:35] CosmicAlpha: triggered from grafana
[18:09:34] RECOVERY - mw11 Current Load on mw11 is OK: OK - load average: 6.71, 6.77, 5.86
[18:10:12] .in 90seconds page
[18:10:13] Spookreeeno: Okay, will remind at 2021-10-17 - 19:11:42BST
[18:11:43] Spookreeeno: page
[18:12:43] CosmicAlpha: why won't it alert
[18:13:09] Grafana is being stupid
[18:13:56] Spookreeeno: not above 0.01
[18:14:25] Currently 0.0
[18:14:38] CosmicAlpha: oh
[18:14:46] Can you set it to 0 to run
[18:14:56] Now it is above but has to stay like that for 2 minutes to alert.
[18:16:07] CosmicAlpha: can you set it to 0.0 while we test?
[18:16:08] Spookreeeno: I'd just set it to alert below 10 for testing. But I can do that now if you want?
[18:16:15] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 4.70, 5.62, 5.99
[18:17:35] alerting : [FIRING:1] (!sre MediaWiki Exception Rate Y mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki?orgId=1
[18:18:27] CosmicAlpha: that didn't work?
[18:18:45] Didn't seem to.
[18:19:08] PROBLEM - gluster3 Current Load on gluster3 is CRITICAL: CRITICAL - load average: 8.45, 6.25, 6.12
[18:20:34] CosmicAlpha: can you check I pasted the email in right
[18:25:05] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 5.91, 5.81, 6.00
[18:27:35] ok : [RESOLVED] (!sre MediaWiki Exception Rate Y mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki?orgId=1
[18:27:53] PROBLEM - gluster3 Current Load on gluster3 is CRITICAL: CRITICAL - load average: 5.34, 5.90, 6.03
[18:33:48] alerting : [FIRING:1] (!sre MediaWiki Exception Rate Y mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki?orgId=1
[18:38:48] ok : [RESOLVED] (!sre MediaWiki Exception Rate Y mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki?orgId=1
[18:39:25] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 3.26, 4.83, 5.66
[18:41:35] alerting : [FIRING:1] (!sre MediaWiki Exception Rate Y mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki?orgId=1
[18:43:21] Ugh, fixing those so they stop spamming.
[18:46:35] ok : [RESOLVED] (!sre MediaWiki Exception Rate Y mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki?orgId=1
[18:47:53] PROBLEM - gluster3 Current Load on gluster3 is CRITICAL: CRITICAL - load average: 6.87, 5.50, 5.58
[18:50:43] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 5.53, 5.66, 5.63
[18:52:17] CosmicAlpha: paged
[18:56:21] PROBLEM - gluster3 Current Load on gluster3 is CRITICAL: CRITICAL - load average: 6.23, 5.64, 5.56
[18:57:35] alerting : [FIRING:1] (sre MediaWiki Exception Rate Y mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki?orgId=1
[19:01:31] ok : [RESOLVED] (sre MediaWiki Exception Rate Y mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki?orgId=1
[19:02:32] PROBLEM - cloud4 Current Load on cloud4 is WARNING: WARNING - load average: 20.64, 19.44, 17.84
[19:05:39] PROBLEM - cloud4 Current Load on cloud4 is CRITICAL: CRITICAL - load average: 31.27, 24.48, 20.10
[19:07:00] alerting : [FIRING:1] (PHP-FPM Worker Usage High mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki?orgId=1
[19:07:05] PROBLEM - mw8 Current Load on mw8 is CRITICAL: CRITICAL - load average: 8.72, 8.60, 6.32
[19:08:02] PROBLEM - mw9 Current Load on mw9 is WARNING: WARNING - load average: 7.85, 7.35, 5.72
[19:10:01] PROBLEM - mw12 Current Load on mw12 is CRITICAL: CRITICAL - load average: 6.35, 8.26, 6.56
[19:10:07] PROBLEM - mw8 Current Load on mw8 is WARNING: WARNING - load average: 5.56, 7.17, 6.17
[19:11:03] RECOVERY - mw9 Current Load on mw9 is OK: OK - load average: 5.59, 6.01, 5.46
[19:11:38] PROBLEM - cloud4 Current Load on cloud4 is WARNING: WARNING - load average: 19.98, 23.44, 21.35
[19:12:00] ok : [RESOLVED] (PHP-FPM Worker Usage High mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki?orgId=1
[19:13:04] PROBLEM - mw12 Current Load on mw12 is WARNING: WARNING - load average: 5.36, 6.91, 6.34
[19:16:00] RECOVERY - mw12 Current Load on mw12 is OK: OK - load average: 5.48, 6.24, 6.16
[19:16:04] RECOVERY - mw8 Current Load on mw8 is OK: OK - load average: 4.88, 6.20, 6.11
[19:17:36] RECOVERY - cloud4 Current Load on cloud4 is OK: OK - load average: 17.80, 19.63, 20.32
[19:27:25] PROBLEM - cloud4 Current Load on cloud4 is WARNING: WARNING - load average: 20.33, 20.98, 20.78
[19:28:02] [miraheze/mw-config] Universal-Omega pushed 1 commit to Universal-Omega-patch-4 [+0/-0/±1] https://git.io/JKiG6
[19:28:04] [miraheze/mw-config] Universal-Omega f0980e7 - Enable `slow-parsoid` logging
[19:28:05] [mw-config] Universal-Omega created branch Universal-Omega-patch-4 - https://git.io/vbvb3
[19:28:15] [mw-config] Universal-Omega opened pull request #4158: Enable `slow-parsoid` logging - https://git.io/JKiG5
[19:28:21] [mw-config] Universal-Omega edited pull request #4158: Enable `slow-parsoid` logging - https://git.io/JKiG5
[19:29:20] miraheze/mw-config - Universal-Omega the build passed.
[19:29:45] [mw-config] Universal-Omega edited pull request #4158: Enable `slow-parsoid` logging - https://git.io/JKiG5
[19:37:39] [mw-config] Universal-Omega closed pull request #4158: Enable `slow-parsoid` logging - https://git.io/JKiG5
[19:37:40] [miraheze/mw-config] Universal-Omega pushed 1 commit to master [+0/-0/±1] https://git.io/JKili
[19:37:42] [miraheze/mw-config] Universal-Omega 5e8943a - Enable `slow-parsoid` logging (#4158)
[19:37:43] [mw-config] Universal-Omega deleted branch Universal-Omega-patch-4 - https://git.io/vbvb3
[19:37:45] [miraheze/mw-config] Universal-Omega deleted branch Universal-Omega-patch-4
[19:38:46] miraheze/mw-config - Universal-Omega the build passed.
[19:40:04] [miraheze/mw-config] Universal-Omega pushed 1 commit to Universal-Omega-patch-4 [+0/-0/±1] https://git.io/JKi4Z
[19:40:06] [miraheze/mw-config] Universal-Omega 682c2b9 - enable/modify database logging channels
[19:40:07] [mw-config] Universal-Omega created branch Universal-Omega-patch-4 - https://git.io/vbvb3
[19:40:09] [mw-config] Universal-Omega opened pull request #4159: enable/modify database logging channels - https://git.io/JKi4C
[19:40:57] PROBLEM - mw12 Current Load on mw12 is WARNING: WARNING - load average: 5.34, 7.91, 6.98
[19:41:09] miraheze/mw-config - Universal-Omega the build passed.
[19:43:55] RECOVERY - mw12 Current Load on mw12 is OK: OK - load average: 3.88, 6.08, 6.44
[19:49:12] [miraheze/mw-config] Universal-Omega pushed 1 commit to Universal-Omega-patch-4 [+0/-0/±1] https://git.io/JKiud
[19:49:13] [miraheze/mw-config] Universal-Omega b19733f - Update LocalSettings.php
[19:49:15] [mw-config] Universal-Omega synchronize pull request #4159: enable/modify database logging channels - https://git.io/JKi4C
[19:50:11] miraheze/mw-config - Universal-Omega the build passed.
[19:51:08] RECOVERY - cloud4 Current Load on cloud4 is OK: OK - load average: 19.87, 19.22, 20.19
[20:03:41] !log [@test3] starting deploy of {'config': True} to skip
[20:03:42] !log [@test3] finished deploy of {'config': True} to skip - SUCCESS in 0s
[20:04:41] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[20:05:04] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[20:06:09] SRE, persistent 503 errors on `metawiki`
[20:06:09] ```Error 503 Backend fetch failed, forwarded for , 127.0.0.1
[20:06:09] (Varnish XID 120356881) via cp12.miraheze.org at Sun, 17 Oct 2021 20:05:32 GMT.```
[20:06:30] alerting : [FIRING:1] (PHP-FPM Worker Usage High mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki?orgId=1
[20:07:30] !log [@mw11] starting deploy of {'config': True} to all
[20:07:31] * dmehus thinks it could be related to that ^
[20:08:05] !log [@mw11] DEPLOY ABORTED: Canary check failed for mw12
[20:08:35] dmehus: looking
[20:09:01] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[20:09:06] CosmicAlpha ty :)
[20:10:25] PROBLEM - mw13 Current Load on mw13 is CRITICAL: CRITICAL - load average: 10.59, 8.93, 6.63
[20:10:42] PROBLEM - mw12 Current Load on mw12 is CRITICAL: CRITICAL - load average: 12.01, 12.27, 8.50
[20:10:44] PROBLEM - cloud4 Current Load on cloud4 is CRITICAL: CRITICAL - load average: 26.33, 25.63, 22.97
[20:11:34] PROBLEM - mw10 Current Load on mw10 is CRITICAL: CRITICAL - load average: 8.83, 8.42, 6.27
[20:11:46] PROBLEM - test3 PowerDNS Recursor on test3 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[20:11:47] PROBLEM - mw8 Current Load on mw8 is CRITICAL: CRITICAL - load average: 8.14, 9.03, 7.08
[20:12:05] PROBLEM - mw9 Current Load on mw9 is CRITICAL: CRITICAL - load average: 8.76, 9.71, 7.55
[20:12:11] PROBLEM - test3 MediaWiki Rendering on test3 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[20:12:13] PROBLEM - mw11 Current Load on mw11 is CRITICAL: CRITICAL - load average: 7.09, 9.09, 7.31
[20:12:32] PROBLEM - cloud5 Current Load on cloud5 is CRITICAL: CRITICAL - load average: 27.44, 22.83, 19.24
[20:12:51] PROBLEM - test3 Current Load on test3 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[20:12:59] PROBLEM - test3 NTP time on test3 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[20:13:00] PROBLEM - test3 php-fpm on test3 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[20:13:23] PROBLEM - test3 Check Gluster Clients on test3 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[20:13:27] PROBLEM - test3 HTTPS on test3 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[20:13:51] PROBLEM - test3 Puppet on test3 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[20:14:06] PROBLEM - test3 APT on test3 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[20:14:15] dmehus: is it working for you? It is for me except test3
[20:14:51] test3 is me
[20:14:53] PROBLEM - test3 Disk Space on test3 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[20:14:55] messing with firewall
[20:15:23] RECOVERY - test3 PowerDNS Recursor on test3 is OK: DNS OK: 1.743 second response time. miraheze.org returns 2001:41d0:800:170b::5,2001:41d0:801:2000::58af,51.38.69.175,54.38.211.199
[20:15:31] PROBLEM - mw11 Puppet on mw11 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 7 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[MediaWiki Config Sync]
[20:15:44] RECOVERY - test3 MediaWiki Rendering on test3 is OK: HTTP OK: HTTP/1.1 200 OK - 21644 bytes in 0.296 second response time
[20:15:59] PROBLEM - cloud5 Current Load on cloud5 is WARNING: WARNING - load average: 21.56, 21.50, 19.40
[20:16:15] RECOVERY - test3 Current Load on test3 is OK: OK - load average: 0.01, 0.04, 0.03
[20:16:19] RECOVERY - test3 NTP time on test3 is OK: NTP OK: Offset -0.003371149302 secs
[20:16:19] RECOVERY - test3 php-fpm on test3 is OK: PROCS OK: 19 processes with command name 'php-fpm7.4'
[20:16:28] CosmicAlpha, it works now, just more of a recurring 503 problem
[20:16:30] ok : [RESOLVED] (PHP-FPM Worker Usage High mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki?orgId=1
[20:16:36] RECOVERY - test3 Check Gluster Clients on test3 is OK: PROCS OK: 1 process with args '/usr/sbin/glusterfs'
[20:16:37] RECOVERY - test3 HTTPS on test3 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 545 bytes in 0.006 second response time
[20:16:50] see DMs, if you want to check graylog related to my web requests
[20:17:01] dmehus: looks like worker usage was high
[20:17:01] RECOVERY - test3 Puppet on test3 is OK: OK: Puppet is currently enabled, last run 13 minutes ago with 0 failures
[20:17:10] Spookreeeno, yeah
[20:17:14] RECOVERY - test3 APT on test3 is OK: APT OK: 34 packages available for upgrade (0 critical updates).
[20:18:01] RECOVERY - test3 Disk Space on test3 is OK: DISK OK - free space: / 5875 MB (32% inode=66%);
[20:18:14] PROBLEM - mw10 Current Load on mw10 is WARNING: WARNING - load average: 6.59, 7.62, 6.67
[20:19:02] RECOVERY - cloud5 Current Load on cloud5 is OK: OK - load average: 17.59, 19.81, 19.13
[20:20:20] PROBLEM - mw13 Current Load on mw13 is WARNING: WARNING - load average: 5.97, 7.46, 7.16
[20:20:40] PROBLEM - cloud4 Current Load on cloud4 is WARNING: WARNING - load average: 20.27, 22.93, 23.09
[20:21:09] !log apply iptables rules to mw[891[0123]]
[20:21:19] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[20:21:23] PROBLEM - mw10 Current Load on mw10 is CRITICAL: CRITICAL - load average: 8.86, 7.84, 6.90
[20:24:20] PROBLEM - mw10 Current Load on mw10 is WARNING: WARNING - load average: 7.85, 7.75, 7.04
[20:24:40] PROBLEM - mw11 Current Load on mw11 is WARNING: WARNING - load average: 5.85, 6.89, 7.29
[20:26:48] PROBLEM - mw12 Current Load on mw12 is WARNING: WARNING - load average: 5.93, 6.85, 7.66
[20:27:21] PROBLEM - mw8 Current Load on mw8 is WARNING: WARNING - load average: 5.62, 7.34, 7.66
[20:27:55] PROBLEM - mw9 Current Load on mw9 is WARNING: WARNING - load average: 4.79, 7.33, 7.97
[20:30:33] RECOVERY - mw11 Current Load on mw11 is OK: OK - load average: 4.31, 5.78, 6.75
[20:32:08] RECOVERY - mw13 Current Load on mw13 is OK: OK - load average: 5.47, 6.12, 6.68
[20:33:15] RECOVERY - mw8 Current Load on mw8 is OK: OK - load average: 4.57, 5.63, 6.78
[20:33:22] RECOVERY - mw10 Current Load on mw10 is OK: OK - load average: 3.89, 5.73, 6.46
[20:33:23] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 4.73, 5.41, 5.93
[20:35:43] [puppet] Universal-Omega opened pull request #2052: prometheus-es_exporter: add mediawiki_params_prop aggs - https://git.io/JKij2
[20:35:45] RECOVERY - mw12 Current Load on mw12 is OK: OK - load average: 6.53, 5.91, 6.73
[20:36:59] RECOVERY - mw11 Puppet on mw11 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[20:38:33] RECOVERY - cloud4 Current Load on cloud4 is OK: OK - load average: 15.68, 18.66, 20.16
[20:39:46] RECOVERY - mw9 Current Load on mw9 is OK: OK - load average: 4.98, 5.54, 6.66
[20:40:21] [puppet] Universal-Omega synchronize pull request #2052: prometheus-es_exporter: add mediawiki_params_prop aggs - https://git.io/JKij2
[20:41:55] [puppet] Universal-Omega synchronize pull request #2052: prometheus-es_exporter: add mediawiki_params_prop aggs - https://git.io/JKij2
[20:42:14] PROBLEM - gluster3 Current Load on gluster3 is CRITICAL: CRITICAL - load average: 7.89, 6.22, 6.00
[20:45:05] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 4.32, 5.12, 5.60
[20:50:59] PROBLEM - gluster3 Current Load on gluster3 is CRITICAL: CRITICAL - load average: 6.75, 6.07, 5.83
[20:53:01] [puppet] Universal-Omega synchronize pull request #2052: prometheus-es_exporter: add mediawiki_params_prop aggs - https://git.io/JKij2
[20:53:13] [puppet] Universal-Omega edited pull request #2052: prometheus-es_exporter: add mediawiki_params_action aggs - https://git.io/JKij2
[20:56:42] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 4.85, 5.88, 5.84
[20:59:35] PROBLEM - gluster3 Current Load on gluster3 is CRITICAL: CRITICAL - load average: 7.79, 6.57, 6.10
[21:11:18] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 4.47, 5.42, 5.87
[21:15:35] PROBLEM - www.bluepageswiki.org - LetsEncrypt on sslhost is WARNING: WARNING - Certificate 'www.bluepageswiki.org' expires in 15 day(s) (Tue 02 Nov 2021 21:08:34 GMT +0000).
[21:17:04] PROBLEM - gluster3 Current Load on gluster3 is CRITICAL: CRITICAL - load average: 6.08, 5.33, 5.65
[21:18:53] PROBLEM - mw9 Current Load on mw9 is CRITICAL: CRITICAL - load average: 11.36, 7.97, 6.22
[21:20:05] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 5.58, 5.48, 5.65
[21:22:35] PROBLEM - www.zenbuddhism.info - LetsEncrypt on sslhost is WARNING: WARNING - Certificate 'www.zenbuddhism.info' expires in 15 day(s) (Tue 02 Nov 2021 21:16:34 GMT +0000).
[21:23:16] PROBLEM - gluster3 Current Load on gluster3 is CRITICAL: CRITICAL - load average: 6.50, 5.95, 5.80
[21:23:41] PROBLEM - mw11 Current Load on mw11 is CRITICAL: CRITICAL - load average: 9.53, 8.00, 6.36
[21:24:40] PROBLEM - cloud4 Current Load on cloud4 is WARNING: WARNING - load average: 23.44, 23.38, 21.24
[21:26:26] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 5.43, 5.89, 5.81
[21:26:50] PROBLEM - mw11 Current Load on mw11 is WARNING: WARNING - load average: 6.58, 7.52, 6.49
[21:27:18] PROBLEM - mw8 Current Load on mw8 is WARNING: WARNING - load average: 5.69, 6.98, 6.16
[21:28:17] RECOVERY - mw9 Current Load on mw9 is OK: OK - load average: 3.57, 6.44, 6.72
[21:29:19] PROBLEM - gluster3 Current Load on gluster3 is CRITICAL: CRITICAL - load average: 6.67, 6.15, 5.92
[21:29:36] RECOVERY - mw11 Current Load on mw11 is OK: OK - load average: 3.72, 5.90, 6.03
[21:30:08] RECOVERY - mw8 Current Load on mw8 is OK: OK - load average: 4.97, 6.03, 5.92
[21:31:50] PROBLEM - www.wikimicrofinanza.it - LetsEncrypt on sslhost is WARNING: WARNING - Certificate 'www.wikimicrofinanza.it' expires in 15 day(s) (Tue 02 Nov 2021 21:20:46 GMT +0000).
[21:32:07] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 5.00, 5.98, 5.92
[21:36:31] RECOVERY - cloud4 Current Load on cloud4 is OK: OK - load average: 17.23, 18.44, 19.82
[21:36:53] PROBLEM - www.mcpk.wiki - LetsEncrypt on sslhost is WARNING: WARNING - Certificate 'www.mcpk.wiki' expires in 15 day(s) (Tue 02 Nov 2021 21:27:47 GMT +0000).
[21:39:15] PROBLEM - mw11 Current Load on mw11 is WARNING: WARNING - load average: 7.12, 7.69, 6.76
[21:41:00] PROBLEM - gluster3 Current Load on gluster3 is CRITICAL: CRITICAL - load average: 7.26, 5.58, 5.63
[21:41:15] PROBLEM - www.lab612.at - LetsEncrypt on sslhost is WARNING: WARNING - Certificate 'www.lab612.at' expires in 15 day(s) (Tue 02 Nov 2021 21:29:23 GMT +0000).
[21:42:13] RECOVERY - mw11 Current Load on mw11 is OK: OK - load average: 4.26, 6.04, 6.27
[21:42:59] paladox: mind deploying #2052, if you have time?
[21:43:20] PROBLEM - www.erikapedia.com - LetsEncrypt on sslhost is WARNING: WARNING - Certificate 'www.erikapedia.com' expires in 15 day(s) (Tue 02 Nov 2021 21:36:23 GMT +0000).
[21:43:48] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 4.53, 5.09, 5.43
[21:44:03] I’m mobile, you’ll need to get it so the change to a file reloads the service
[21:44:12] Otherwise if I deploy now it won’t restart the service
[21:44:25] PROBLEM - iceria.org - LetsEncrypt on sslhost is WARNING: WARNING - Certificate 'www.iceria.org' expires in 15 day(s) (Tue 02 Nov 2021 21:35:18 GMT +0000).
[21:44:45] PROBLEM - www.iceria.org - LetsEncrypt on sslhost is WARNING: WARNING - Certificate 'www.iceria.org' expires in 15 day(s) (Tue 02 Nov 2021 21:35:18 GMT +0000).
[21:45:24] PROBLEM - rothwell-leeds.co.uk - LetsEncrypt on sslhost is WARNING: WARNING - Certificate 'www.rothwell-leeds.co.uk' expires in 15 day(s) (Tue 02 Nov 2021 21:41:30 GMT +0000).
[21:45:30] PROBLEM - erikapedia.com - LetsEncrypt on sslhost is WARNING: WARNING - Certificate 'www.erikapedia.com' expires in 15 day(s) (Tue 02 Nov 2021 21:36:23 GMT +0000).
[21:46:39] SRE, ```(Cannot access the database: Cannot access the database: Unknown error (db11.miraheze.org))```
[21:46:44] on `testwiki`
[21:46:53] dmehus: can look
[21:46:58] But 2/12 is not many
[21:47:26] Spookreeeno, ok thanks :0
[21:47:26] Testwiki works for me
[21:47:35] yeah
[21:47:51] we just seem to be seeing more of this when in the summer we didn't
[21:47:58] and our traffic has dropped
[21:48:27] I see exceptions alert is firing
[21:48:37] It'll page in a second
[21:49:01] PROBLEM - www.rothwell-leeds.co.uk - LetsEncrypt on sslhost is WARNING: WARNING - Certificate 'www.rothwell-leeds.co.uk' expires in 15 day(s) (Tue 02 Nov 2021 21:41:30 GMT +0000).
[21:49:05] CosmicAlpha: why does https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki?orgId=1&from=now-1h&to=now&viewPanel=187 not show time
[21:49:22] PROBLEM - gluster3 Current Load on gluster3 is CRITICAL: CRITICAL - load average: 6.92, 5.52, 5.47
[21:49:58] dmehus: not sure on cause but looks like an outage
[21:50:53] The Public Test Wiki's working fine for me.
[21:50:53] [tell] darkmatterman450: 2021-10-17 - 20:13:41UTC tell darkmatterman450 DM me on IRC when you get a chance, thanks :)
[21:51:11] Spookreeeno: fixed
[21:51:14] Uhh, okay then, dmehus.
[21:52:18] darkmatterman450: it's back now
[21:52:29] dmehus: looks like a spike in db usage
[21:52:34] Ah, I see.
[21:52:37] paladox, JohnLewis: around
[21:52:42] PROBLEM - portalsofphereon.com - LetsEncrypt on sslhost is WARNING: WARNING - Certificate 'www.portalsofphereon.com' expires in 15 day(s) (Tue 02 Nov 2021 21:45:55 GMT +0000).
[21:53:14] I’m only mobile
[21:53:39] paladox: db11 hit max connections
[21:53:45] PROBLEM - www.johanloopmans.nl - LetsEncrypt on sslhost is WARNING: WARNING - Certificate 'www.johanloopmans.nl' expires in 15 day(s) (Tue 02 Nov 2021 21:48:08 GMT +0000).
[21:53:48] We had about 1000 exception
[21:54:05] Oh I’m not sure what the exceptions were
[21:54:39] paladox: probably db connection failures
[21:54:42] PROBLEM - christipedia.nl - LetsEncrypt on sslhost is WARNING: WARNING - Certificate 'www.christipedia.nl' expires in 15 day(s) (Tue 02 Nov 2021 21:50:51 GMT +0000).
[21:54:56] 22:46:39 SRE, ```(Cannot access the database: Cannot access the database: Unknown error (db11.miraheze.org))```
[21:55:07] Grafana shows a spike on all dbs
[21:55:07] PROBLEM - johanloopmans.nl - LetsEncrypt on sslhost is WARNING: WARNING - Certificate 'www.johanloopmans.nl' expires in 15 day(s) (Tue 02 Nov 2021 21:48:08 GMT +0000).
[21:55:07] PROBLEM - www.portalsofphereon.com - LetsEncrypt on sslhost is WARNING: WARNING - Certificate 'www.portalsofphereon.com' expires in 15 day(s) (Tue 02 Nov 2021 21:45:55 GMT +0000).
[21:55:14] But 11 was far worse
[21:55:27] Spookreeeno, ah
[21:56:14] how do we determine what users connect to what db server, or do the mw and cp servers determine the backend database connection?
[21:56:35] is there a better way to allocate frontend servers' connections to backend database servers?
[21:56:52] It's based on wiki you're accessing
[21:56:58] ah, right
[21:57:05] We don't have any replaction
[21:57:12] ah
[21:57:13] Replication*
[21:57:32] wonder if it would make sense to move Meta to a lesser used db server
[21:57:40] Spookreeno: https://grafana.miraheze.org/explore?left=%5B%22now-15m%22,%22now%22,%22Prometheus%22,%7B%22exemplar%22:true,%22expr%22:%22%5Cnlog_mediawiki_mediawiki_channels_doc_count%7Binstance%3D%5C%22graylog2.miraheze.org:9206%5C%22,%20job%3D%5C%22elasticsearch%5C%22,%20mediawiki_channels%3D%5C%22DBConnection%5C%22%7D%22,%22interval%22:%22%22,%22refId%22:%22A%22,%22datasource%22:%22Prometheus%22%7D%5D&orgId=1 1614 DBConnection
[21:57:41] exceptions at the time it seems. Although did not last long enough to alert it seems.
[21:57:46] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 5.04, 5.63, 5.66
[21:57:50] or put testwiki and metawiki on separate db servers?
[21:57:59] Spookreeeno, ah thanks :)
[21:58:14] CosmicAlpha: yeah that's were I got the 1000 from for paladox
[21:58:58] dmehus: they are
[21:59:15] meta is c2, test is c4
[22:00:17] mhglobal has an impact though too
[22:00:55] PROBLEM - wiki.autocountsoft.com - LetsEncrypt on sslhost is WARNING: WARNING - Certificate 'wiki.autocountsoft.com' expires in 15 day(s) (Tue 02 Nov 2021 21:52:05 GMT +0000).
[22:01:28] PROBLEM - www.christipedia.nl - LetsEncrypt on sslhost is WARNING: WARNING - Certificate 'www.christipedia.nl' expires in 15 day(s) (Tue 02 Nov 2021 21:50:51 GMT +0000).
[22:01:29] PROBLEM - wiki.gesamtschule-nordkirchen.de - LetsEncrypt on sslhost is WARNING: WARNING - Certificate 'wiki.gesamtschule-nordkirchen.de' expires in 15 day(s) (Tue 02 Nov 2021 21:54:47 GMT +0000).
[22:05:12] wow, `15655ms (PHP7 via metawiki@mw13 / cp15` very high load times trying to look up a user in CentralAuth
[22:05:32] Spookreeeno, ack okay
[22:06:35] PROBLEM - gluster3 Current Load on gluster3 is CRITICAL: CRITICAL - load average: 8.36, 6.74, 5.98
[22:07:00] alerting : [FIRING:1] (PHP-FPM Worker Usage High mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki?orgId=1
[22:07:26] PROBLEM - baharna.org - LetsEncrypt on sslhost is WARNING: WARNING - Certificate 'www.baharna.org' expires in 15 day(s) (Tue 02 Nov 2021 21:58:47 GMT +0000).
[22:08:23] PROBLEM - cloud4 Current Load on cloud4 is WARNING: WARNING - load average: 23.06, 22.41, 19.68
[22:08:35] PROBLEM - www.baharna.org - LetsEncrypt on sslhost is WARNING: WARNING - Certificate 'www.baharna.org' expires in 15 day(s) (Tue 02 Nov 2021 21:58:47 GMT +0000).
[22:11:14] RECOVERY - cloud4 Current Load on cloud4 is OK: OK - load average: 16.51, 19.60, 19.03
[22:12:00] ok : [RESOLVED] (PHP-FPM Worker Usage High mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki?orgId=1
[22:12:18] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 4.29, 5.76, 5.83
[22:16:30] CosmicAlpha: i changed exceptions to 90s
[22:16:39] trying to fix the config
[22:17:41] Spookreeeno: Not sure what you mean?
[22:17:59] PROBLEM - wiki.hrznstudio.com - LetsEncrypt on sslhost is WARNING: WARNING - Certificate 'wiki.hrznstudio.com' expires in 15 day(s) (Tue 02 Nov 2021 22:11:41 GMT +0000).
[22:17:59] CosmicAlpha: so it alerts after it's been over for 90s
[22:18:12] because 2m is a long time
[22:18:18] oh. ok sounds good then.
[22:19:44] * Spookreeeno is trying to make it page properly
[22:21:35] alerting : [FIRING:1] (!sre MediaWiki Exception Rate Y mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki?orgId=1
[22:21:49] why no page
[22:22:51] ok : [RESOLVED] (mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki?orgId=1
[22:23:34] Spookreeeno: I have an idea.
[22:23:49] CosmicAlpha: what?
[22:26:18] PROBLEM - wiki.vinesh.eu.org - LetsEncrypt on sslhost is WARNING: WARNING - Certificate 'wiki.vinesh.eu.org' expires in 15 day(s) (Tue 02 Nov 2021 22:19:41 GMT +0000).
[22:29:21] alerting : [FIRING:1] (mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki?orgId=1
[22:32:13] Spookreeeno: guess not. I thought maybe the uppercase Y in label w as not working, but guess not.
[22:33:06] PROBLEM - mw8 Current Load on mw8 is WARNING: WARNING - load average: 7.31, 6.83, 5.71
[22:33:15] PROBLEM - wiki.meeusen.net - LetsEncrypt on sslhost is WARNING: WARNING - Certificate 'wiki.meeusen.net' expires in 15 day(s) (Tue 02 Nov 2021 22:26:13 GMT +0000).
[22:33:38] uuuh.
[22:36:13] PROBLEM - mw8 Current Load on mw8 is CRITICAL: CRITICAL - load average: 9.99, 8.79, 6.72
[22:38:28] PROBLEM - gluster3 Current Load on gluster3 is CRITICAL: CRITICAL - load average: 6.43, 5.24, 5.25
[22:38:31] PROBLEM - cloud4 Current Load on cloud4 is WARNING: WARNING - load average: 21.09, 21.16, 19.49
[22:39:07] PROBLEM - mw8 Current Load on mw8 is WARNING: WARNING - load average: 4.33, 6.89, 6.34
[22:39:21] [miraheze/mw-config] RhinosF1 pushed 1 commit to master [+0/-0/±1] https://git.io/JKXL9
[22:39:21] ok : [RESOLVED] (sre MediaWiki Exception Rate yes mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki?orgId=1
[22:39:22] [miraheze/mw-config] RhinosF1 dffe554 - Update README.md
[22:39:29] I hope you're still aware that no edit has taken place on famepedia since today, it keeps returning fatal error.
[22:40:00] PROBLEM - wiki.worldsofweary.com - LetsEncrypt on sslhost is WARNING: WARNING - Certificate 'wiki.worldsofweary.com' expires in 15 day(s) (Tue 02 Nov 2021 22:28:44 GMT +0000).
[22:40:03] for all edits?
[22:40:33] miraheze/mw-config - RhinosF1 the build passed.
[22:40:53] Yep.
[22:41:03] No one has been able to edit.
[22:41:17] RECOVERY - gluster3 Current Load on gluster3 is OK: OK - load average: 4.61, 4.74, 5.04
[22:41:20] RECOVERY - cloud4 Current Load on cloud4 is OK: OK - load average: 13.65, 18.18, 18.65
[22:41:52] RECOVERY - mw8 Current Load on mw8 is OK: OK - load average: 3.39, 5.42, 5.87
[22:42:36] In fact, as I speak, everything looks broken when the Edit button is been clicked.
[22:47:51] Ugochimobi: Our understanding was that only Special:Contributions was broken, not editing too
[22:47:59] so I'll update the task
[22:48:02] https://phabricator.wikimedia.org/T293574
[22:48:03] [url] ⚓ T293574 Call to undefined method HistoryBlobStub::uncompress() | phabricator.wikimedia.org
[22:48:06] any other thing that's broken?
[22:49:16] Haven't find any yet.
[22:55:05] alerting : [FIRING:1] (!sre MediaWiki Exception Rate yes mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki?orgId=1
[22:55:26] dmehus: ^
[22:55:31] That answers your question
[22:55:54] Spookreeeno, yeah
[22:56:11] Looks like same issue
[22:56:40] paladox: db connections have ran out again
[22:57:47] !log [universalomega@mwtask1] sudo -u www-data /usr/local/bin/foreachwikiindblist /home/universalomega/advancedsearch.json /srv/mediawiki/w/extensions/ManageWiki/maintenance/toggleExtension.php --disable (END - exit=2)
[22:57:53] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[22:58:30] CosmicAlpha: can we hold please
[22:58:40] Databases are having issues
[22:58:59] Spookreeeno, oh will cancel then
[22:59:03] !log [universalomega@mwtask1] sudo -u www-data /usr/local/bin/foreachwikiindblist /home/universalomega/advancedsearch.json /srv/mediawiki/w/extensions/ManageWiki/maintenance/toggleExtension.php advancedsearch --disable (END - exit=2)
[22:59:06] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[22:59:19] CosmicAlpha: no idea what the cause is
[22:59:31] You'll have to wake someone with db knowledge up
[23:00:05] ok : [RESOLVED] (mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki?orgId=1
[23:00:14] [miraheze/services] MirahezeSSLBot pushed 1 commit to master [+0/-0/±1] https://git.io/JKXCb
[23:00:15] [miraheze/services] MirahezeSSLBot 0279756 - BOT: Updating services config for wikis
[23:03:13] PROBLEM - mw10 Current Load on mw10 is CRITICAL: CRITICAL - load average: 12.19, 7.30, 5.87
[23:04:40] !log [@test3] starting deploy of {'config': True} to skip
[23:04:45] !log [@test3] finished deploy of {'config': True} to skip - SUCCESS in 4s
[23:04:46] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[23:04:54] PROBLEM - cloud4 Current Load on cloud4 is CRITICAL: CRITICAL - load average: 32.39, 24.43, 20.71
[23:05:05] PROBLEM - gluster3 Current Load on gluster3 is CRITICAL: CRITICAL - load average: 6.25, 5.53, 5.30
[23:05:22] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[23:06:00] alerting : [FIRING:1] (PHP-FPM Worker Usage High mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki?orgId=1
[23:06:31] !log [@mw11] starting deploy of {'config': True} to all
[23:06:45] PROBLEM - mw12 Current Load on mw12 is CRITICAL: CRITICAL - load average: 21.08, 14.69, 9.05
[23:06:47] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[23:07:22] !log [@mw11] finished deploy of {'config': True} to all - SUCCESS in 50s
[23:07:32] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[23:08:05] PROBLEM - mw9 Current Load on mw9 is CRITICAL: CRITICAL - load average: 15.25, 12.73, 8.38
[23:09:13] PROBLEM - cloud5 Current Load on cloud5 is CRITICAL: CRITICAL - load average: 17.96, 24.25, 20.68
[23:10:22] PROBLEM - mw10 Current Load on mw10 is WARNING: WARNING - load average: 5.61, 7.89, 7.12
[23:10:33] PROBLEM - mw8 Current Load on mw8 is CRITICAL: CRITICAL - load average: 5.60, 9.83, 7.83
[23:10:36] PROBLEM - mw11 Current Load on mw11 is WARNING: WARNING - load average: 6.15, 7.92, 6.97
[23:11:00] ok : [RESOLVED] (PHP-FPM Worker Usage High mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki?orgId=1
[23:11:47] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 5.12, 5.83, 5.53
[23:12:12] PROBLEM - cloud5 Current Load on cloud5 is WARNING: WARNING - load average: 18.12, 21.56, 20.27
[23:13:19] PROBLEM - mw10 Current Load on mw10 is CRITICAL: CRITICAL - load average: 8.52, 7.44, 7.03
[23:13:27] PROBLEM - mw11 Current Load on mw11 is CRITICAL: CRITICAL - load average: 12.28, 9.04, 7.49
[23:14:00] PROBLEM - mw9 Current Load on mw9 is WARNING: WARNING - load average: 5.64, 7.34, 7.27
[23:14:36] PROBLEM - cloud4 Current Load on cloud4 is WARNING: WARNING - load average: 20.49, 22.46, 22.18
[23:15:58] PROBLEM - mw12 Current Load on mw12 is WARNING: WARNING - load average: 4.78, 7.37, 7.90
[23:16:15] RECOVERY - mw10 Current Load on mw10 is OK: OK - load average: 5.53, 6.56, 6.76
[23:16:21] PROBLEM - mw8 Current Load on mw8 is WARNING: WARNING - load average: 5.94, 6.93, 7.13
[23:17:48] PROBLEM - gluster3 Current Load on gluster3 is CRITICAL: CRITICAL - load average: 6.38, 5.82, 5.60
[23:18:18] RECOVERY - cloud5 Current Load on cloud5 is OK: OK - load average: 15.45, 19.04, 19.72
[23:19:04] PROBLEM - mw12 Current Load on mw12 is CRITICAL: CRITICAL - load average: 11.49, 9.62, 8.66
[23:19:19] PROBLEM - mw11 Current Load on mw11 is WARNING: WARNING - load average: 6.70, 7.81, 7.49
[23:20:04] PROBLEM - mw9 Current Load on mw9 is CRITICAL: CRITICAL - load average: 8.09, 7.48, 7.28
[23:20:47] PROBLEM - cloud4 Current Load on cloud4 is CRITICAL: CRITICAL - load average: 26.74, 24.03, 22.73
[23:22:10] PROBLEM - mw11 Current Load on mw11 is CRITICAL: CRITICAL - load average: 9.24, 7.94, 7.56
[23:22:18] PROBLEM - mw8 Current Load on mw8 is CRITICAL: CRITICAL - load average: 8.34, 6.89, 7.00
[23:23:58] PROBLEM - cloud4 Current Load on cloud4 is WARNING: WARNING - load average: 23.04, 22.96, 22.50
[23:24:30] alerting : [FIRING:1] (PHP-FPM Worker Usage High mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki?orgId=1
[23:25:31] PROBLEM - mw8 Current Load on mw8 is WARNING: WARNING - load average: 6.05, 6.49, 6.82
[23:26:32] PROBLEM - mw10 Current Load on mw10 is WARNING: WARNING - load average: 7.14, 7.64, 7.16
[23:26:59] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 5.22, 5.94, 5.93
[23:28:12] PROBLEM - sdiy.info - LetsEncrypt on sslhost is WARNING: WARNING - Certificate 'sdiy.info' expires in 15 day(s) (Tue 02 Nov 2021 23:21:09 GMT +0000).
[23:28:26] PROBLEM - mw13 Current Load on mw13 is WARNING: WARNING - load average: 7.46, 7.10, 6.61
[23:29:16] PROBLEM - mw9 Current Load on mw9 is WARNING: WARNING - load average: 6.08, 7.82, 7.91
[23:29:26] RECOVERY - mw10 Current Load on mw10 is OK: OK - load average: 4.39, 6.27, 6.72
[23:29:30] ok : [RESOLVED] (PHP-FPM Worker Usage High mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki?orgId=1
[23:29:49] PROBLEM - gluster3 Current Load on gluster3 is CRITICAL: CRITICAL - load average: 6.14, 5.65, 5.80
[23:30:58] PROBLEM - www.sdiy.info - LetsEncrypt on sslhost is WARNING: WARNING - Certificate 'sdiy.info' expires in 15 day(s) (Tue 02 Nov 2021 23:21:09 GMT +0000).
[23:31:05] PROBLEM - mw11 Current Load on mw11 is WARNING: WARNING - load average: 6.82, 7.25, 7.50
[23:31:20] RECOVERY - mw8 Current Load on mw8 is OK: OK - load average: 5.26, 6.25, 6.70
[23:32:39] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 5.28, 5.36, 5.66
[23:35:12] PROBLEM - mw9 Current Load on mw9 is CRITICAL: CRITICAL - load average: 10.86, 8.70, 8.09
[23:35:30] alerting : [FIRING:1] (PHP-FPM Worker Usage High mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki?orgId=1
[23:37:05] PROBLEM - mw11 Current Load on mw11 is CRITICAL: CRITICAL - load average: 13.60, 10.57, 8.67
[23:37:24] PROBLEM - mw13 Current Load on mw13 is CRITICAL: CRITICAL - load average: 10.65, 9.34, 7.71
[23:38:15] PROBLEM - mw9 Current Load on mw9 is WARNING: WARNING - load average: 6.05, 7.20, 7.59
[23:38:27] PROBLEM - mw8 Current Load on mw8 is WARNING: WARNING - load average: 7.70, 7.38, 7.07
[23:38:45] PROBLEM - gluster3 Current Load on gluster3 is CRITICAL: CRITICAL - load average: 6.77, 5.86, 5.72
[23:40:30] ok : [RESOLVED] (PHP-FPM Worker Usage High mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki?orgId=1
[23:40:33] PROBLEM - cloud5 Current Load on cloud5 is CRITICAL: CRITICAL - load average: 25.81, 23.57, 21.32
[23:41:24] PROBLEM - mw9 Current Load on mw9 is CRITICAL: CRITICAL - load average: 9.60, 9.56, 8.52
[23:41:34] PROBLEM - mw8 Current Load on mw8 is CRITICAL: CRITICAL - load average: 8.78, 7.67, 7.21
[23:41:55] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 5.29, 5.74, 5.71
[23:43:28] PROBLEM - mw12 Current Load on mw12 is WARNING: WARNING - load average: 4.51, 6.82, 7.74
[23:43:35] PROBLEM - cloud5 Current Load on cloud5 is WARNING: WARNING - load average: 17.71, 21.25, 20.84
[23:44:22] PROBLEM - mw9 Current Load on mw9 is WARNING: WARNING - load average: 5.12, 7.53, 7.92
[23:44:31] PROBLEM - mw8 Current Load on mw8 is WARNING: WARNING - load average: 5.20, 6.59, 6.88
[23:46:00] PROBLEM - mw11 Current Load on mw11 is WARNING: WARNING - load average: 6.90, 7.00, 7.78
[23:46:19] PROBLEM - mw13 Current Load on mw13 is WARNING: WARNING - load average: 6.41, 7.15, 7.37
[23:46:27] RECOVERY - cloud5 Current Load on cloud5 is OK: OK - load average: 18.02, 19.84, 20.35
[23:47:19] RECOVERY - mw8 Current Load on mw8 is OK: OK - load average: 6.18, 6.01, 6.58
[23:49:47] !log [universalomega@mwtask1] sudo -u www-data /usr/local/bin/foreachwikiindblist /home/universalomega/advancedsearch.json /srv/mediawiki/w/extensions/ManageWiki/maintenance/toggleExtension.php advancedsearch --disable (END - exit=0)
[23:49:52] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[23:50:10] [miraheze/services] MirahezeSSLBot pushed 1 commit to master [+0/-0/±1] https://git.io/JKXHh
[23:50:12] [miraheze/services] MirahezeSSLBot aa869d8 - BOT: Updating services config for wikis
[23:50:13] [mw-config] Universal-Omega closed pull request #4157: T7740: remove AdvancedSearch - https://git.io/JKzq5
[23:50:15] [mw-config] Universal-Omega deleted branch Universal-Omega-patch-2 - https://git.io/vbvb3
[23:50:16] [miraheze/mw-config] Universal-Omega pushed 1 commit to master [+0/-0/±2] https://git.io/JKXHj
[23:50:18] [miraheze/mw-config] Universal-Omega 502d460 - T7740: remove AdvancedSearch (#4157)
[23:50:19] [miraheze/mw-config] Universal-Omega deleted branch Universal-Omega-patch-2
[23:51:12] miraheze/mw-config - Universal-Omega the build passed.
[23:52:13] RECOVERY - mw12 Current Load on mw12 is OK: OK - load average: 4.30, 5.30, 6.55
[23:53:50] RECOVERY - cloud4 Current Load on cloud4 is OK: OK - load average: 16.13, 18.59, 20.08
[23:54:38] RECOVERY - mw11 Current Load on mw11 is OK: OK - load average: 5.75, 5.66, 6.72
[23:55:07] !log [universalomega@mw11] starting deploy of {'config': True} to all
[23:55:12] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[23:55:25] !log [universalomega@mw11] finished deploy of {'config': True} to all - SUCCESS in 17s
[23:55:29] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[23:55:39] !log [universalomega@test3] starting deploy of {'config': True} to skip
[23:55:40] !log [universalomega@test3] finished deploy of {'config': True} to skip - SUCCESS in 0s
[23:55:55] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[23:55:59] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[23:56:05] RECOVERY - mw9 Current Load on mw9 is OK: OK - load average: 5.30, 5.62, 6.62
[23:59:20] PROBLEM - gluster3 Current Load on gluster3 is CRITICAL: CRITICAL - load average: 7.10, 6.14, 5.87