[00:00:13] RECOVERY - mw12 Current Load on mw12 is OK: OK - load average: 5.26, 6.70, 6.72 [00:00:17] RECOVERY - mon2 Current Load on mon2 is OK: OK - load average: 2.77, 3.26, 3.29 [00:01:12] paladox: nope, I have no ideas. It really should be as far as I can see from what you've posted. [00:01:20] ok [00:02:42] the only thing i see is: [00:02:44] https://www.irccloud.com/pastebin/krlXSht0/ [00:02:44] PROBLEM - ns1 GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 8 datacenters are down: 51.195.220.68/cpweb, 198.244.148.90/cpweb, 2001:41d0:801:2000::4c25/cpweb, 2001:41d0:801:2000::1b80/cpweb, 149.56.140.43/cpweb, 149.56.141.75/cpweb, 2607:5300:201:3100::929a/cpweb, 2607:5300:201:3100::5ebc/cpweb [00:03:02] PROBLEM - cloud4 Current Load on cloud4 is CRITICAL: CRITICAL - load average: 29.00, 24.90, 21.87 [00:03:22] Have any of the betaheze servers been added to icinga alerts yet or no? [00:03:45] dmehus: betaheze is test3. [00:04:14] CosmicAlpha, yeah, but betaheze is on the new infrastructure isn't it, or no? [00:04:44] RECOVERY - ns1 GDNSD Datacenters on ns1 is OK: OK - all datacenters are online [00:04:50] dmehus: no [00:05:02] PROBLEM - cloud4 Current Load on cloud4 is WARNING: WARNING - load average: 21.64, 23.64, 21.79 [00:05:02] we don't have working anything there [00:05:07] PROBLEM - cp31 Stunnel Http for mw12 on cp31 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [00:05:19] RhinosF1, ah, okay. Thanks. I was just wondering whether the icinga alerts included IPs for any of the new VMs, but it sounds like no [00:05:38] when it says 8 datacenters are down, that's all production infrastructure, basically [00:05:52] dmehus: yes [00:05:58] RhinosF1, ack, thanks :) [00:06:04] there's no stunnel for scsvg yet [00:06:09] ah [00:06:23] i also did say that beta would only move slightly before prod [00:06:52] RhinosF1, oh okay, yeah I do recall you mentioning that now. Thanks :) [00:07:04] RECOVERY - cp31 Stunnel Http for mw12 on cp31 is OK: HTTP OK: HTTP/1.1 200 OK - 14557 bytes in 0.331 second response time [00:07:18] paladox: composer won't work with git either [00:07:34] it's supposed to use the proxy [00:07:44] composer doesn't use git [00:07:52] i think [00:08:01] paladox: it is Failed to clone https://github.com/wmde/Serialization.git via https, ssh protocols, aborting. [00:08:02] [url] GitHub - wmde/Serialization: Small library defining a Serializer and a Deserializer interface. | github.com [00:08:05] that's composer [00:08:14] https://stackoverflow.com/questions/17307600/php-composer-behind-http-proxy [00:08:15] [url] php 5.3 - PHP Composer behind http proxy - Stack Overflow | stackoverflow.com [00:08:18] [02miraheze/mw-config] 07Universal-Omega pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/JSI3G [00:08:19] [02miraheze/mw-config] 07Universal-Omega 03dd1398a - test111 -> test101 [00:08:19] did you set http_proxy? [00:09:02] PROBLEM - cloud4 Current Load on cloud4 is CRITICAL: CRITICAL - load average: 26.19, 24.90, 22.69 [00:09:20] miraheze/mw-config - Universal-Omega the build passed. 
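For context on the exchange above: the clone failure is Composer fetching a VCS dependency, and both Composer and the git it shells out to honour the standard proxy environment variables. A minimal sketch, assuming the bast101 proxy URL that comes up later in this log:

```sh
# One-shot form: the proxy applies only to this command.
# https_proxy matters too, since the failing clone URL is https.
http_proxy=http://bast101.miraheze.org:8080 \
https_proxy=http://bast101.miraheze.org:8080 \
composer install
```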
[00:10:01] paladox: trying [00:10:24] why it doesn't read /etc/gitconfig i don't know [00:11:44] PROBLEM - ns1 GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 8 datacenters are down: 51.195.220.68/cpweb, 198.244.148.90/cpweb, 2001:41d0:801:2000::4c25/cpweb, 2001:41d0:801:2000::1b80/cpweb, 149.56.140.43/cpweb, 149.56.141.75/cpweb, 2607:5300:201:3100::929a/cpweb, 2607:5300:201:3100::5ebc/cpweb [00:11:47] !log [@mw11] starting deploy of {'config': True} to ovlon [00:12:17] oh hold on, works for me. [00:12:25] PROBLEM - mw9 MediaWiki Rendering on mw9 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:12:28] PROBLEM - cp31 Stunnel Http for mw9 on cp31 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [00:12:28] though yeh it's not reading the file :/ [00:13:06] paladox: it's not working still [00:13:09] !log [@mw11] finished deploy of {'config': True} to ovlon - SUCCESS in 82s [00:13:20] https://www.irccloud.com/pastebin/PoY11yH5/ [00:13:27] [02miraheze/mw-config] 07Universal-Omega pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/JSIsa [00:13:29] [02miraheze/mw-config] 07Universal-Omega 0345206e1 - +mwtask111 for $wgRequestTimeLimit [00:13:51] PROBLEM - mw11 MediaWiki Rendering on mw11 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:13:51] PROBLEM - cp20 Stunnel Http for mw11 on cp20 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [00:13:57] PROBLEM - cp20 Stunnel Http for mw10 on cp20 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [00:14:07] PROBLEM - cp30 Stunnel Http for mw11 on cp30 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [00:14:07] PROBLEM - ns2 GDNSD Datacenters on ns2 is CRITICAL: CRITICAL - 8 datacenters are down: 51.195.220.68/cpweb, 198.244.148.90/cpweb, 2001:41d0:801:2000::4c25/cpweb, 2001:41d0:801:2000::1b80/cpweb, 149.56.140.43/cpweb, 149.56.141.75/cpweb, 2607:5300:201:3100::929a/cpweb, 2607:5300:201:3100::5ebc/cpweb [00:14:10] PROBLEM - cp21 Stunnel Http for mw11 on cp21 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [00:14:16] paladox: also setting HTTP_PROXY breaks localhost calls later on [00:14:18] https://www.irccloud.com/pastebin/zXCsgy1R/ [00:14:19] PROBLEM - cp31 Stunnel Http for mw10 on cp31 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [00:14:30] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [00:14:31] RECOVERY - mw9 MediaWiki Rendering on mw9 is OK: HTTP OK: HTTP/1.1 200 OK - 20524 bytes in 8.358 second response time [00:14:31] miraheze/mw-config - Universal-Omega the build passed.
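Two points raised above, sketched for reference: a proxy can also be set system-wide for git in /etc/gitconfig, and the "setting HTTP_PROXY breaks localhost calls" problem is normally avoided by exempting loopback addresses via no_proxy. The values below are assumptions for illustration, not a record of what was actually configured:

```sh
# System-wide git proxy (written to /etc/gitconfig); tools that shell out to
# git as another user may still need the environment variable instead.
git config --system http.proxy http://bast101.miraheze.org:8080

# Keep local calls off the proxy; most proxy-aware tools honour no_proxy.
export no_proxy=localhost,127.0.0.1,::1
```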
[00:14:31] RECOVERY - cp31 Stunnel Http for mw9 on cp31 is OK: HTTP OK: HTTP/1.1 200 OK - 14564 bytes in 5.481 second response time [00:14:32] i'm off to sleep [00:14:33] PROBLEM - mw10 MediaWiki Rendering on mw10 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:14:37] RhinosF1: worked for me when i manually did it [00:14:53] paladox: where [00:14:58] mwtask101 [00:15:18] paladox: what exactly did you run [00:15:21] http_proxy=http://bast101.miraheze.org:8080 composer install [00:15:30] in /srv/mediawiki-staging/w/ as www-data [00:15:42] [02miraheze/MirahezeDebug] 07Universal-Omega pushed 031 commit to 03Universal-Omega-patch-1 [+0/-0/±1] 13https://git.io/JSIGc [00:15:44] [02miraheze/MirahezeDebug] 07Universal-Omega 0383c4c9d - Update popup.html [00:15:44] RECOVERY - ns1 GDNSD Datacenters on ns1 is OK: OK - all datacenters are online [00:15:45] [02MirahezeDebug] 07Universal-Omega synchronize pull request 03#6: Switch servers - 13https://git.io/JyDrR [00:15:53] RECOVERY - cp20 Stunnel Http for mw11 on cp20 is OK: HTTP OK: HTTP/1.1 200 OK - 14557 bytes in 7.340 second response time [00:15:54] RECOVERY - mw11 MediaWiki Rendering on mw11 is OK: HTTP OK: HTTP/1.1 200 OK - 20526 bytes in 8.486 second response time [00:15:58] [02miraheze/MirahezeDebug] 07Universal-Omega pushed 031 commit to 03Universal-Omega-patch-1 [+0/-0/±1] 13https://git.io/JSIG8 [00:15:59] RECOVERY - cp20 Stunnel Http for mw10 on cp20 is OK: HTTP OK: HTTP/1.1 200 OK - 14557 bytes in 2.618 second response time [00:16:00] [02miraheze/MirahezeDebug] 07Universal-Omega 038118785 - Update background.js [00:16:01] [02MirahezeDebug] 07Universal-Omega synchronize pull request 03#6: Switch servers - 13https://git.io/JyDrR [00:16:11] RECOVERY - cp30 Stunnel Http for mw11 on cp30 is OK: HTTP OK: HTTP/1.1 200 OK - 14557 bytes in 6.779 second response time [00:16:16] RECOVERY - cp21 Stunnel Http for mw11 on cp21 is OK: HTTP OK: HTTP/1.1 200 OK - 14557 bytes in 5.677 second response time [00:16:21] RECOVERY - cp31 Stunnel Http for mw10 on cp31 is OK: HTTP OK: HTTP/1.1 200 OK - 14565 bytes in 5.317 second response time [00:16:35] RECOVERY - mw10 MediaWiki Rendering on mw10 is OK: HTTP OK: HTTP/1.1 200 OK - 20526 bytes in 3.311 second response time [00:17:00] [02MirahezeDebug] 07Universal-Omega edited pull request 03#6: Switch servers to SCSVG - 13https://git.io/JyDrR [00:17:56] PROBLEM - mon2 Current Load on mon2 is CRITICAL: CRITICAL - load average: 4.30, 3.78, 3.49 [00:18:05] RECOVERY - ns2 GDNSD Datacenters on ns2 is OK: OK - all datacenters are online [00:18:42] paladox: you might be a genius [00:18:50] hmm? [00:20:01] paladox: i wondered if sudo wasn't preserving environment variables [00:20:26] RhinosF1: s/might be/are [00:20:26] dmehus thinks RhinosF1 meant to say: paladox: you are a genius [00:20:29] it probably wasn't [00:20:53] you should add support for proxy within the script [00:21:06] since you don't want to have HTTP_PROXY set permanently [00:21:20] PROBLEM - cp30 Stunnel Http for mw9 on cp30 is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 344 bytes in 0.230 second response time [00:21:39] PROBLEM - cp21 Stunnel Http for mw8 on cp21 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
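The "sudo wasn't preserving environment variables" hypothesis fits sudo's default env_reset behaviour: http_proxy is not on the default env_keep list, so it gets dropped when switching to www-data. Two hedged ways around it, both subject to the local sudoers policy:

```sh
# Set the variable after the user switch, so it lands in the child environment
# (effectively what the working manual invocation above did):
sudo -u www-data http_proxy=http://bast101.miraheze.org:8080 composer install

# Or explicitly ask sudo to carry the variables through:
sudo --preserve-env=http_proxy,https_proxy -u www-data composer install
```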
[00:21:45] PROBLEM - ns1 GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 3 datacenters are down: 198.244.148.90/cpweb, 2001:41d0:801:2000::4c25/cpweb, 2607:5300:201:3100::5ebc/cpweb [00:22:03] PROBLEM - ns2 GDNSD Datacenters on ns2 is CRITICAL: CRITICAL - 6 datacenters are down: 51.195.220.68/cpweb, 198.244.148.90/cpweb, 2001:41d0:801:2000::4c25/cpweb, 2001:41d0:801:2000::1b80/cpweb, 149.56.141.75/cpweb, 2607:5300:201:3100::5ebc/cpweb [00:22:41] PROBLEM - cp20 Stunnel Http for mw9 on cp20 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [00:22:45] PROBLEM - cp21 Stunnel Http for mw9 on cp21 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [00:22:46] PROBLEM - cp20 Stunnel Http for mw8 on cp20 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [00:22:48] PROBLEM - mw9 MediaWiki Rendering on mw9 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:22:50] PROBLEM - cp31 Stunnel Http for mw9 on cp31 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [00:23:04] PROBLEM - mw8 MediaWiki Rendering on mw8 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:23:22] PROBLEM - cp30 Stunnel Http for mw8 on cp30 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [00:23:25] PROBLEM - cp31 Stunnel Http for mw8 on cp31 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [00:24:03] paladox: that wasn't npm's issue [00:24:10] but it seems to fix composer [00:24:13] ok [00:25:12] !log [@test3] starting deploy of {'config': True} to skip [00:25:12] !log [@test3] finished deploy of {'config': True} to skip - SUCCESS in 0s [00:26:19] [02puppet] 07Universal-Omega reviewed pull request 03#2184 commit - 13https://git.io/JSIcE [00:26:20] [02puppet] 07Universal-Omega reviewed pull request 03#2184 commit - 13https://git.io/JSIcu [00:26:22] [02puppet] 07Universal-Omega reviewed pull request 03#2184 commit - 13https://git.io/JSIcz [00:26:29] https://github.com/wikimedia/puppet/blame/ad3f48845502c3969ee23ea382d1ebb6acc0bba9/modules/tcpircbot/files/tcpircbot.py#L99 hmm [00:26:31] [url] puppet/modules/tcpircbot/files/tcpircbot.py at ad3f48845502c3969ee23ea382d1ebb6acc0bba9 · wikimedia/puppet · GitHub | github.com [00:26:41] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [00:27:09] RECOVERY - mw8 MediaWiki Rendering on mw8 is OK: HTTP OK: HTTP/1.1 200 OK - 20514 bytes in 7.241 second response time [00:27:47] [02puppet] 07Universal-Omega reviewed pull request 03#2183 commit - 13https://git.io/JSIcA [00:27:48] [02puppet] 07Universal-Omega reviewed pull request 03#2183 commit - 13https://git.io/JSIcx [00:27:49] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [00:27:50] PROBLEM - cp20 Stunnel Http for mw12 on cp20 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [00:27:55] PROBLEM - cp31 Stunnel Http for mw12 on cp31 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. 
[00:29:43] [02puppet] 07Universal-Omega reviewed pull request 03#2184 commit - 13https://git.io/JSICu [00:29:43] RECOVERY - cp30 Stunnel Http for mw8 on cp30 is OK: HTTP OK: HTTP/1.1 200 OK - 14546 bytes in 7.736 second response time [00:29:44] [02puppet] 07Universal-Omega reviewed pull request 03#2184 commit - 13https://git.io/JSICz [00:29:45] RECOVERY - cp20 Stunnel Http for mw12 on cp20 is OK: HTTP OK: HTTP/1.1 200 OK - 14565 bytes in 0.022 second response time [00:29:46] [02puppet] 07Universal-Omega reviewed pull request 03#2184 commit - 13https://git.io/JSICg [00:29:47] RECOVERY - cp30 Stunnel Http for mw9 on cp30 is OK: HTTP OK: HTTP/1.1 200 OK - 14564 bytes in 0.303 second response time [00:29:49] RECOVERY - cp31 Stunnel Http for mw8 on cp31 is OK: HTTP OK: HTTP/1.1 200 OK - 14546 bytes in 5.305 second response time [00:29:52] RECOVERY - cp31 Stunnel Http for mw12 on cp31 is OK: HTTP OK: HTTP/1.1 200 OK - 14565 bytes in 0.347 second response time [00:29:54] RECOVERY - cp21 Stunnel Http for mw8 on cp21 is OK: HTTP OK: HTTP/1.1 200 OK - 14546 bytes in 3.588 second response time [00:29:56] PROBLEM - mon2 Current Load on mon2 is WARNING: WARNING - load average: 3.78, 3.84, 3.81 [00:30:07] CosmicAlpha: Sun, Jan 2, 00:29 [00:30:15] https://phabricator.miraheze.org/T8483#172948 [00:30:16] [url] ⚓ T8483 Migrate MediaWiki over to new infrastructure | phabricator.miraheze.org [00:30:53] RECOVERY - cp20 Stunnel Http for mw9 on cp20 is OK: HTTP OK: HTTP/1.1 200 OK - 14556 bytes in 5.491 second response time [00:30:54] RECOVERY - cp20 Stunnel Http for mw8 on cp20 is OK: HTTP OK: HTTP/1.1 200 OK - 14546 bytes in 0.721 second response time [00:30:58] RECOVERY - cp21 Stunnel Http for mw9 on cp21 is OK: HTTP OK: HTTP/1.1 200 OK - 14564 bytes in 0.084 second response time [00:31:06] RECOVERY - mw9 MediaWiki Rendering on mw9 is OK: HTTP OK: HTTP/1.1 200 OK - 20524 bytes in 0.387 second response time [00:31:11] RECOVERY - cp31 Stunnel Http for mw9 on cp31 is OK: HTTP OK: HTTP/1.1 200 OK - 14556 bytes in 1.017 second response time [00:31:35] RhinosF1: seen. Thanks! [00:31:45] RECOVERY - ns1 GDNSD Datacenters on ns1 is OK: OK - all datacenters are online [00:31:59] RECOVERY - ns2 GDNSD Datacenters on ns2 is OK: OK - all datacenters are online [00:32:49] CosmicAlpha: npm is more of an issue, i can easily fix local host & composer [00:32:57] npm has sent me round in circles [00:33:56] PROBLEM - mon2 Current Load on mon2 is CRITICAL: CRITICAL - load average: 4.68, 4.09, 3.91 [00:35:56] PROBLEM - mon2 Current Load on mon2 is WARNING: WARNING - load average: 3.01, 3.77, 3.82 [00:37:24] [02puppet] 07RhinosF1 opened pull request 03#2232: mwdeploy: support proxy - 13https://git.io/JSI8U [00:37:45] CosmicAlpha, paladox: off to sleep but ^ **should** work [00:37:48] RhinosF1: maybe we just drop npm from femiwiki and use a femiwiki-deploy repo, which clones into /srv/mediawiki-staging/w/skins/Femiwiki/node_modules? I didn't want to make node_modules a git repo before so didn't do that alongside mathoid and 3d2png but we could I guess. So not sure how much of an issue that'd have on submodule either. 
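A sketch of the femiwiki-deploy idea floated just above; the repository URL is an assumption and nothing here was deployed. Because node_modules is git-ignored in the Femiwiki skin, a nested clone should not conflict with the submodule, and on the Puppet side the clone would be kept at "present" rather than "latest" since deploys are rare:

```sh
# Hypothetical deploy repo holding the prebuilt node_modules tree.
git clone https://github.com/miraheze/femiwiki-deploy.git \
    /srv/mediawiki-staging/w/skins/Femiwiki/node_modules
```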
[00:37:56] PROBLEM - ns2 GDNSD Datacenters on ns2 is CRITICAL: CRITICAL - 1 datacenter is down: 2607:5300:201:3100::929a/cpweb [00:38:09] CosmicAlpha: i don't have other options [00:38:37] [02puppet] 07paladox closed pull request 03#2232: mwdeploy: support proxy - 13https://git.io/JSI8U [00:38:38] [02miraheze/puppet] 07paladox pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/JSI8u [00:38:40] [02miraheze/puppet] 07RhinosF1 033d47b1f - mwdeploy: support proxy (#2232) [00:38:43] * RhinosF1 sleeps [00:41:21] !log [@mw11] starting deploy of {'config': True} to ovlon [00:41:30] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [00:41:44] !log [@mw11] finished deploy of {'config': True} to ovlon - SUCCESS in 22s [00:41:56] PROBLEM - mon2 Current Load on mon2 is CRITICAL: CRITICAL - load average: 5.41, 4.43, 4.04 [00:42:05] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [00:42:35] CosmicAlpha: it shouldn't be an issue to put an npm repo inside there [00:43:07] (Apart from git clone taking time, might be worth setting it to present rather than latest in puppet as deploys would be rare) [00:43:53] RECOVERY - ns2 GDNSD Datacenters on ns2 is OK: OK - all datacenters are online [00:50:12] RhinosF1: that wouldn't cause submodule to conflict? [00:51:23] CosmicAlpha: node_modules is in .gitignore [00:51:33] So if we just add it to puppet to clone there [00:51:43] I'm pretty sure git will be stupid enough to handle [00:51:56] PROBLEM - mon2 Current Load on mon2 is WARNING: WARNING - load average: 3.23, 3.72, 3.92 [00:53:49] RhinosF1: Oh OK. I'll try to set that up later then, if we can't figure anything else out as really isn't all that ideal. Maybe some script to pull the tarball for it or something, as that'd be bundled with the dependencies also. Not sure though. [00:53:56] PROBLEM - mon2 Current Load on mon2 is CRITICAL: CRITICAL - load average: 4.67, 3.97, 3.98 [00:54:45] CosmicAlpha: feel free to fiddle around and try stuff [00:54:52] I'm logged off now [00:55:01] Will do in a bit, maybe tomorrow. [00:55:56] PROBLEM - mon2 Current Load on mon2 is WARNING: WARNING - load average: 2.59, 3.53, 3.83 [00:57:02] PROBLEM - cloud4 Current Load on cloud4 is WARNING: WARNING - load average: 19.68, 21.79, 23.79 [00:57:56] PROBLEM - mon2 Current Load on mon2 is CRITICAL: CRITICAL - load average: 4.53, 3.95, 3.94 [00:59:56] PROBLEM - mon2 Current Load on mon2 is WARNING: WARNING - load average: 3.32, 3.80, 3.90 [01:01:56] PROBLEM - mon2 Current Load on mon2 is CRITICAL: CRITICAL - load average: 5.08, 4.28, 4.06 [01:05:02] PROBLEM - cloud4 Current Load on cloud4 is CRITICAL: CRITICAL - load average: 24.87, 21.05, 22.41 [01:07:02] PROBLEM - cloud4 Current Load on cloud4 is WARNING: WARNING - load average: 16.68, 19.39, 21.65 [01:07:56] PROBLEM - mon2 Current Load on mon2 is WARNING: WARNING - load average: 3.17, 3.90, 3.99 [01:09:41] OH [01:09:43] IT WORKS [01:09:53] CosmicAlpha: found the fix! 
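For illustration of the "proxy support in the script" approach merged above (PR #2232): this is not the actual patch, only a minimal Python sketch of setting the proxy for the child process alone, so nothing depends on sudo preserving the caller's environment. The proxy URL and staging path are taken from earlier in the discussion; the function name is hypothetical.

```python
import os
import subprocess

# Assumed values for illustration; a real deploy script would take these from config.
PROXY = 'http://bast101.miraheze.org:8080'
STAGING = '/srv/mediawiki-staging/w'


def composer_install() -> None:
    """Run composer with the proxy visible only to this one child process."""
    env = os.environ.copy()
    env['http_proxy'] = PROXY
    env['https_proxy'] = PROXY
    # Keep later localhost calls off the proxy (see the earlier breakage).
    env['no_proxy'] = 'localhost,127.0.0.1,::1'
    subprocess.run(['composer', 'install'], cwd=STAGING, env=env, check=True)
```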
[01:09:56] PROBLEM - mon2 Current Load on mon2 is CRITICAL: CRITICAL - load average: 4.58, 4.27, 4.12 [01:11:02] PROBLEM - cloud4 Current Load on cloud4 is CRITICAL: CRITICAL - load average: 24.32, 21.29, 21.87 [01:11:57] [02miraheze/puppet] 07paladox pushed 031 commit to 03paladox-patch-1 [+0/-0/±1] 13https://git.io/JSIwb [01:11:59] [02miraheze/puppet] 07paladox 03400f84d - ircecho: Support ipv6 [01:12:00] [02puppet] 07paladox created branch 03paladox-patch-1 - 13https://git.io/vbiAS [01:12:02] [02puppet] 07paladox opened pull request 03#2233: ircecho: Support ipv6 - 13https://git.io/JSIwx [01:13:02] PROBLEM - cloud4 Current Load on cloud4 is WARNING: WARNING - load average: 22.23, 21.62, 21.93 [01:13:48] PROBLEM - cp20 Stunnel Http for mw9 on cp20 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [01:13:52] [02miraheze/puppet] 07paladox pushed 031 commit to 03paladox-patch-1 [+0/-0/±1] 13https://git.io/JSIrD [01:13:54] [02miraheze/puppet] 07paladox 037d57a42 - Update default.erb [01:13:55] [02puppet] 07paladox synchronize pull request 03#2233: ircecho: Support ipv6 - 13https://git.io/JSIwx [01:13:56] PROBLEM - mon2 Current Load on mon2 is WARNING: WARNING - load average: 2.56, 3.69, 3.94 [01:14:02] PROBLEM - mw9 MediaWiki Rendering on mw9 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:14:28] PROBLEM - cp31 Stunnel Http for mw9 on cp31 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [01:14:29] PROBLEM - cp21 Stunnel Http for mw9 on cp21 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [01:14:52] PROBLEM - cp30 Stunnel Http for mw9 on cp30 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [01:15:49] RECOVERY - cp20 Stunnel Http for mw9 on cp20 is OK: HTTP OK: HTTP/1.1 200 OK - 14556 bytes in 6.818 second response time [01:16:34] RECOVERY - cp21 Stunnel Http for mw9 on cp21 is OK: HTTP OK: HTTP/1.1 200 OK - 14556 bytes in 9.548 second response time [01:16:56] RECOVERY - cp30 Stunnel Http for mw9 on cp30 is OK: HTTP OK: HTTP/1.1 200 OK - 14556 bytes in 3.985 second response time [01:17:26] paladox: oh, nice! [01:17:50] PROBLEM - mw12 Current Load on mw12 is WARNING: WARNING - load average: 6.87, 6.27, 5.75 [01:18:09] RECOVERY - mw9 MediaWiki Rendering on mw9 is OK: HTTP OK: HTTP/1.1 200 OK - 20524 bytes in 2.681 second response time [01:18:33] RECOVERY - cp31 Stunnel Http for mw9 on cp31 is OK: HTTP OK: HTTP/1.1 200 OK - 14556 bytes in 0.342 second response time [01:19:02] PROBLEM - cloud4 Current Load on cloud4 is CRITICAL: CRITICAL - load average: 26.48, 22.66, 22.16 [01:19:44] PROBLEM - ns1 GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 5 datacenters are down: 51.195.220.68/cpweb, 198.244.148.90/cpweb, 2001:41d0:801:2000::4c25/cpweb, 2001:41d0:801:2000::1b80/cpweb, 2607:5300:201:3100::5ebc/cpweb [01:20:59] PROBLEM - cp30 Stunnel Http for mw12 on cp30 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [01:21:02] PROBLEM - cloud4 Current Load on cloud4 is WARNING: WARNING - load average: 21.50, 22.27, 22.09 [01:21:36] PROBLEM - ns2 GDNSD Datacenters on ns2 is CRITICAL: CRITICAL - 6 datacenters are down: 198.244.148.90/cpweb, 2001:41d0:801:2000::1b80/cpweb, 149.56.140.43/cpweb, 149.56.141.75/cpweb, 2607:5300:201:3100::929a/cpweb, 2607:5300:201:3100::5ebc/cpweb [01:21:38] PROBLEM - cp21 Stunnel Http for mw12 on cp21 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. 
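On the ircecho IPv6 work above, a hedged sketch (not the actual ircecho change) of the usual pattern: resolve with getaddrinfo and try each returned address in order, which picks up AAAA records when the host and resolver prefer IPv6.

```python
import socket


def irc_connect(host: str, port: int = 6667) -> socket.socket:
    """Open a TCP connection, trying IPv6 and IPv4 results in resolver order.

    Plaintext for brevity; the real bots wrap the socket in TLS.
    """
    last_err: OSError = OSError(f'no usable address for {host}')
    for af, socktype, proto, _, addr in socket.getaddrinfo(
            host, port, socket.AF_UNSPEC, socket.SOCK_STREAM):
        sock = None
        try:
            sock = socket.socket(af, socktype, proto)
            sock.connect(addr)
            return sock
        except OSError as err:
            last_err = err
            if sock is not None:
                sock.close()
    raise last_err
```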
[01:21:49] RECOVERY - mw12 Current Load on mw12 is OK: OK - load average: 6.21, 6.29, 5.89 [01:22:01] [02miraheze/puppet] 07paladox pushed 031 commit to 03paladox-patch-1 [+0/-0/±1] 13https://git.io/JSIiJ [01:22:03] [02miraheze/puppet] 07paladox 03e8aa289 - Update ircecho.py [01:22:04] [02puppet] 07paladox synchronize pull request 03#2233: ircecho: Support ipv6 - 13https://git.io/JSIwx [01:23:01] RECOVERY - cp30 Stunnel Http for mw12 on cp30 is OK: HTTP OK: HTTP/1.1 200 OK - 14557 bytes in 2.769 second response time [01:23:06] [02miraheze/puppet] 07paladox pushed 031 commit to 03paladox-patch-1 [+0/-0/±1] 13https://git.io/JSIiz [01:23:07] [02miraheze/puppet] 07paladox 03c511e8c - Update default.erb [01:23:09] [02puppet] 07paladox synchronize pull request 03#2233: ircecho: Support ipv6 - 13https://git.io/JSIwx [01:23:35] RECOVERY - ns2 GDNSD Datacenters on ns2 is OK: OK - all datacenters are online [01:23:37] RECOVERY - cp21 Stunnel Http for mw12 on cp21 is OK: HTTP OK: HTTP/1.1 200 OK - 14557 bytes in 0.449 second response time [01:23:44] RECOVERY - ns1 GDNSD Datacenters on ns1 is OK: OK - all datacenters are online [01:23:56] PROBLEM - mon2 Current Load on mon2 is CRITICAL: CRITICAL - load average: 4.17, 3.73, 3.81 [01:25:37] WOOO [01:25:58] [02miraheze/puppet] 07paladox pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/JSIPX [01:25:59] [02miraheze/puppet] 07paladox 0380d797c - ircecho: Use ipv6 by default [01:26:08] [02puppet] 07paladox closed pull request 03#2233: ircecho: Support ipv6 - 13https://git.io/JSIwx [01:26:46] PROBLEM - mwtask111 Puppet on mwtask111 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 1 minute ago with 1 failures. Failed resources (up to 3 shown): Exec[femiwiki_npm] [01:26:56] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 4.65, 4.98, 5.94 [01:26:59] PROBLEM - mw8 Current Load on mw8 is WARNING: WARNING - load average: 7.73, 6.98, 6.28 [01:27:46] PROBLEM - mw12 Current Load on mw12 is WARNING: WARNING - load average: 7.32, 6.63, 6.12 [01:28:59] RECOVERY - mw8 Current Load on mw8 is OK: OK - load average: 5.20, 6.39, 6.16 [01:29:41] [02dns] 07Universal-Omega reviewed pull request 03#233 commit - 13https://git.io/JSI13 [01:29:46] RECOVERY - mw12 Current Load on mw12 is OK: OK - load average: 6.31, 6.55, 6.16 [01:32:56] PROBLEM - gluster3 Current Load on gluster3 is CRITICAL: CRITICAL - load average: 6.84, 5.58, 5.85 [01:32:58] PROBLEM - mw10 Current Load on mw10 is WARNING: WARNING - load average: 7.15, 6.41, 5.84 [01:34:54] RECOVERY - mw10 Current Load on mw10 is OK: OK - load average: 5.85, 6.07, 5.78 [01:34:56] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 3.43, 4.76, 5.52 [01:35:02] RECOVERY - cloud4 Current Load on cloud4 is OK: OK - load average: 17.29, 18.42, 20.09 [01:35:56] PROBLEM - mon2 Current Load on mon2 is WARNING: WARNING - load average: 3.77, 4.00, 3.99 [01:37:56] PROBLEM - mon2 Current Load on mon2 is CRITICAL: CRITICAL - load average: 3.68, 4.02, 4.00 [01:38:56] PROBLEM - gluster3 Current Load on gluster3 is CRITICAL: CRITICAL - load average: 7.27, 5.70, 5.66 [01:41:39] alerting : [FIRING:1] (PHP-FPM Worker Usage High mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki [01:41:56] PROBLEM - mon2 Current Load on mon2 is WARNING: WARNING - load average: 2.91, 3.71, 3.91 [01:42:56] [02miraheze/mw-config] 07paladox pushed 031 commit to 03paladox-patch-1 [+0/-0/±1] 13https://git.io/JSI9V [01:42:56] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - 
load average: 5.86, 5.68, 5.65 [01:42:58] [02miraheze/mw-config] 07paladox 0384fa7a5 - Use mon2 ipv6 address for rc feed [01:42:59] [02mw-config] 07paladox created branch 03paladox-patch-1 - 13https://git.io/vbvb3 [01:43:01] [02mw-config] 07paladox opened pull request 03#4327: Use mon2 ipv6 address for rc feed - 13https://git.io/JSI9r [01:43:02] PROBLEM - cloud4 Current Load on cloud4 is WARNING: WARNING - load average: 22.04, 20.92, 20.46 [01:43:08] CosmicAlpha: you can make the interface just default to :: [01:44:01] miraheze/mw-config - paladox the build passed. [01:44:03] paladox: yeah. Do we need to even keep the template variable or can it just be directly in ircrcbot? [01:44:16] it can just be directly in it [01:44:38] [02puppet] 07Universal-Omega synchronize pull request 03#2231: ircrcbot: support binding to IPV6 - 13https://git.io/JSkbY [01:44:58] [02puppet] 07Universal-Omega synchronize pull request 03#2231: ircrcbot: support binding to IPV6 - 13https://git.io/JSkbY [01:45:23] [02puppet] 07Universal-Omega synchronize pull request 03#2231: ircrcbot: support binding to IPV6 - 13https://git.io/JSkbY [01:45:35] [02puppet] 07Universal-Omega edited pull request 03#2231: ircrcbot: bind to IPV6 - 13https://git.io/JSkbY [01:45:43] Done paladox [01:45:58] [02puppet] 07paladox closed pull request 03#2231: ircrcbot: bind to IPV6 - 13https://git.io/JSkbY [01:45:59] [02miraheze/puppet] 07paladox pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/JSIH7 [01:46:01] [02miraheze/puppet] 07Universal-Omega 03a8478c4 - ircrcbot: bind to IPV6 (#2231) [01:46:08] CosmicAlpha: here's my mw change https://git.io/JSI9r [01:47:02] RECOVERY - cloud4 Current Load on cloud4 is OK: OK - load average: 15.80, 19.27, 19.97 [01:47:28] https://github.com/twonds/twisted/blob/d6e270a465d371c3bed01bf369af497b77eb9f1e/twisted/internet/ssl.py#L153 :ultraseeth: [01:47:28] [url] twisted/ssl.py at d6e270a465d371c3bed01bf369af497b77eb9f1e · twonds/twisted · GitHub | github.com [01:47:35] paladox: Thanks, should it be merged? [01:47:41] CosmicAlpha: yeh [01:48:20] [02mw-config] 07Universal-Omega closed pull request 03#4327: Use mon2 ipv6 address for rc feed - 13https://git.io/JSI9r [01:48:21] [02miraheze/mw-config] 07Universal-Omega pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/JSIQS [01:48:23] [02miraheze/mw-config] 07paladox 03bde0895 - Use mon2 ipv6 address for rc feed (#4327) [01:48:24] [02mw-config] 07Universal-Omega deleted branch 03paladox-patch-1 - 13https://git.io/vbvb3 [01:48:26] [02miraheze/mw-config] 07Universal-Omega deleted branch 03paladox-patch-1 [01:48:35] thanks! [01:48:45] guess i'm going to have to hardcode the ipv6 address for libera [01:48:56] PROBLEM - gluster3 Current Load on gluster3 is CRITICAL: CRITICAL - load average: 6.70, 5.55, 5.53 [01:49:09] no problem paladox [01:49:26] miraheze/mw-config - Universal-Omega the build passed. 
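The "make the interface just default to ::" suggestion above amounts to binding the RC-feed listener to the IPv6 unspecified address. A minimal sketch with an illustrative port (not necessarily the real feed port); with IPV6_V6ONLY off, which is the usual Linux default, IPv4 senders still reach the same socket via mapped addresses.

```python
import socket

sock = socket.socket(socket.AF_INET6, socket.SOCK_DGRAM)
# Accept IPv4-mapped traffic on the same socket as well.
sock.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_V6ONLY, 0)
sock.bind(('::', 5070))  # illustrative port

while True:
    data, sender = sock.recvfrom(65535)
    print(sender, data.decode('utf-8', errors='replace'))
```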
[01:50:58] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 5.71, 5.42, 5.47 [01:51:44] PROBLEM - ns1 GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 2 datacenters are down: 198.244.148.90/cpweb, 149.56.141.75/cpweb [01:51:46] [02miraheze/puppet] 07paladox pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/JSI5I [01:51:48] [02miraheze/puppet] 07paladox 03fcdf794 - ircrcbot use Libera ipv6 address directly [01:51:56] PROBLEM - mon2 Current Load on mon2 is CRITICAL: CRITICAL - load average: 4.99, 3.95, 3.84 [01:51:57] PROBLEM - mw8 Current Load on mw8 is WARNING: WARNING - load average: 7.12, 6.55, 6.11 [01:53:43] RECOVERY - ns1 GDNSD Datacenters on ns1 is OK: OK - all datacenters are online [01:53:57] RECOVERY - mw8 Current Load on mw8 is OK: OK - load average: 5.48, 6.13, 6.00 [01:54:21] [02miraheze/puppet] 07paladox pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/JSIdL [01:54:23] [02miraheze/puppet] 07paladox 03da930a2 - irclogserverbot use Libera ipv6 address directly [01:54:46] !log [@test3] starting deploy of {'config': True} to skip [01:54:47] !log [@test3] finished deploy of {'config': True} to skip - SUCCESS in 0s [01:55:04] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [01:55:19] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [01:56:58] paladox: does: https://github.com/miraheze/puppet/blob/d6dffdd399e730c0d803792021c89b9b7e43661d/modules/irc/templates/logbot/config.py#L47 also need changed? [01:56:59] [url] puppet/config.py at d6dffdd399e730c0d803792021c89b9b7e43661d · miraheze/puppet · GitHub | github.com [01:57:16] i don't think so, as it's joined above [01:57:18] !log test [01:57:36] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [01:57:56] PROBLEM - mon2 Current Load on mon2 is WARNING: WARNING - load average: 3.12, 3.71, 3.80 [01:57:57] paladox: LS bot did, not logbot [01:58:05] oh [02:00:56] RECOVERY - gluster3 Current Load on gluster3 is OK: OK - load average: 4.12, 4.75, 5.09 [02:01:56] PROBLEM - mon2 Current Load on mon2 is CRITICAL: CRITICAL - load average: 4.09, 3.95, 3.88 [02:02:02] [02miraheze/puppet] 07paladox pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/JSINL [02:02:04] [02miraheze/puppet] 07paladox 0399796ef - ircecho: Fix ipv6 support [02:03:27] \o/ [02:04:32] [02puppet] 07Universal-Omega opened pull request 03#2234: logbot: use Libera ipv6 address directly - 13https://git.io/JSIAJ [02:04:47] paladox: ^ if that's necessary, if not feel free to close. [02:05:04] that didn't fix it [02:05:13] https://www.irccloud.com/pastebin/f7yn38sx/ [02:05:24] somewhere needs to have it's listener changed to :: [02:05:45] found it! [02:06:32] Yep, was just about to look for that also. [02:06:58] works but i get nick inuser [02:07:00] *inuse [02:07:44] WOOOO [02:07:52] paladox: awesome! [02:07:58] That's all of them working [02:09:02] [02miraheze/puppet] 07paladox pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/JSIxx [02:09:04] [02miraheze/puppet] 07paladox 03cd1ea41 - logbot: Add support for ipv6 [02:10:04] PROBLEM - mon111 IRC Log Bot on mon111 is CRITICAL: PROCS CRITICAL: 0 processes with args 'adminlogbot.py' [02:10:24] will we still have separate nicks for MirahezeLogbot and MirahezeLSBot? [02:10:36] Yes. 
[02:10:37] I do like that because each bot performs a different function [02:10:41] okay, that's good :) [02:10:56] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 5.82, 5.05, 4.84 [02:10:58] We have 2 different versions of each bot right now dmehus, new and old infra. [02:11:11] CosmicAlpha, yeah, that part is fine heh [02:11:32] !log [@mw11] starting deploy of {'config': True} to ovlon [02:11:54] dmehus: We finally got them working on the new infra. Took hours to figure out how to make them work with IPV6. But glad they work now! [02:11:58] RECOVERY - mon111 IRC Log Bot on mon111 is OK: PROCS OK: 1 process with args 'adminlogbot.py' [02:12:04] We'll have to recloak the MirahezeLogbots on the new infra, CosmicAlpha [02:12:06] !log [@mw11] finished deploy of {'config': True} to ovlon - SUCCESS in 33s [02:12:16] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [02:12:20] CosmicAlpha, yeah, that's good :) [02:12:30] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [02:12:36] dmehus: It will be the old nick again, once it is one Infra. Nick change is temporary. [02:12:40] [url] Tech:Server admin log - Miraheze Meta | meta.miraheze.org [02:12:45] we're not keeping MirahezeLogbots nick [02:12:46] We probably should not run to logbots.... [02:12:46] CosmicAlpha: ah, okay [02:12:51] it's just temporary [02:12:51] 2 [02:12:53] paladox, ack SGTM [02:12:56] PROBLEM - gluster3 Current Load on gluster3 is CRITICAL: CRITICAL - load average: 6.07, 5.36, 4.98 [02:12:59] and it'll change in an hour time anyways lol [02:13:02] PROBLEM - cloud4 Current Load on cloud4 is WARNING: WARNING - load average: 21.91, 19.47, 18.46 [02:13:02] well 30 mins [02:13:07] ah, cool [02:13:10] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [02:13:26] CosmicAlpha: yeh don't want double logs [02:13:27] !log [02:13:30] !log test [02:13:34] lol yes [02:13:38] that'll be super annoying [02:13:43] Yeah one should be shut down. [02:13:44] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [02:13:49] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [02:13:50] [url] Tech:Server admin log - Miraheze Meta | meta.miraheze.org [02:13:54] lol [02:14:09] well it logged it twice [02:14:14] yeah [02:14:17] PROBLEM - ns2 GDNSD Datacenters on ns2 is CRITICAL: CRITICAL - 1 datacenter is down: 149.56.140.43/cpweb [02:14:22] should we manually remove the double entry? [02:14:27] yup [02:14:29] PROBLEM - mw12 Current Load on mw12 is WARNING: WARNING - load average: 7.28, 6.44, 5.84 [02:14:31] okay [02:14:43] PROBLEM - ns1 GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 2 datacenters are down: 198.244.148.90/cpweb, 2607:5300:201:3100::929a/cpweb [02:14:45] dmehus: both the test entries can be removed. [02:14:56] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 4.69, 5.38, 5.06 [02:15:02] RECOVERY - cloud4 Current Load on cloud4 is OK: OK - load average: 18.76, 19.09, 18.45 [02:15:35] CosmicAlpha: ack, yeah paladox did [02:16:17] RECOVERY - ns2 GDNSD Datacenters on ns2 is OK: OK - all datacenters are online [02:16:28] RECOVERY - mw12 Current Load on mw12 is OK: OK - load average: 6.21, 6.29, 5.86 [02:16:40] Now we just need to figure out this npm issue... there's got to be a way to get that working... 
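On the outstanding npm issue: npm does not read git or Composer proxy settings, but it does honour the standard environment variables and has its own proxy configuration. A hedged sketch, again assuming the bast101 proxy applies here:

```sh
# Per-invocation, via the standard environment variables:
http_proxy=http://bast101.miraheze.org:8080 \
https_proxy=http://bast101.miraheze.org:8080 \
npm install

# Or persist it in the npm configuration for the deploying user:
npm config set proxy http://bast101.miraheze.org:8080
npm config set https-proxy http://bast101.miraheze.org:8080
```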
[02:16:44] RECOVERY - ns1 GDNSD Datacenters on ns1 is OK: OK - all datacenters are online [02:16:56] RECOVERY - gluster3 Current Load on gluster3 is OK: OK - load average: 3.99, 4.80, 4.88 [02:17:02] what's npm for again? [02:17:09] [02miraheze/puppet] 07paladox pushed 031 commit to 03master [+1/-0/±0] 13https://git.io/JSLeq [02:17:10] [02miraheze/puppet] 07paladox 0377bd2df - Create graylog121.yaml [02:17:11] and did we use that on the production infra? [02:17:21] We only use it for femiwiki and yes we do. [02:17:25] s/production/current [02:17:25] dmehus meant to say: and did we use that on the current infra? [02:17:31] CosmicAlpha, ah [02:18:22] `femiwiki` doesn't exist [02:18:25] what's that? [02:18:43] dmehus: we originally used it for mathoid and 3d2png before I completed https://phabricator.miraheze.org/T8032 also but it is different for femiwiki (a skin) since its a submodule. [02:18:44] [url] ⚓ T8032 mediawiki: Create a deploy npm repo for mathoid and 3d2png | phabricator.miraheze.org [02:18:50] oh [02:18:52] it's a skin [02:19:03] RECOVERY - graylog121 Current Load on graylog121 is OK: OK - load average: 0.40, 0.12, 0.04 [02:19:08] RECOVERY - graylog121 PowerDNS Recursor on graylog121 is OK: DNS OK: 0.286 seconds response time. miraheze.org returns 198.244.148.90,2001:41d0:801:2000::1b80,2001:41d0:801:2000::4c25,51.195.220.68 [02:19:15] RECOVERY - graylog121 ferm_active on graylog121 is OK: OK ferm input default policy is set [02:19:15] RECOVERY - graylog121 Puppet on graylog121 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures [02:19:16] RECOVERY - graylog121 Disk Space on graylog121 is OK: DISK OK - free space: / 6545 MB (74% inode=86%); [02:19:40] RECOVERY - graylog121 APT on graylog121 is OK: APT OK: 0 packages available for upgrade (0 critical updates). [02:20:32] RECOVERY - graylog121 conntrack_table_size on graylog121 is OK: OK: nf_conntrack is 0 % full [02:20:56] RECOVERY - graylog121 NTP time on graylog121 is OK: NTP OK: Offset -0.0006191134453 secs [02:21:00] PROBLEM - mw9 Current Load on mw9 is WARNING: WARNING - load average: 7.44, 6.34, 5.77 [02:22:08] CosmicAlpha: wondering are you deplying the config change [02:22:17] PROBLEM - ns2 GDNSD Datacenters on ns2 is CRITICAL: CRITICAL - 7 datacenters are down: 198.244.148.90/cpweb, 2001:41d0:801:2000::4c25/cpweb, 2001:41d0:801:2000::1b80/cpweb, 149.56.140.43/cpweb, 149.56.141.75/cpweb, 2607:5300:201:3100::929a/cpweb, 2607:5300:201:3100::5ebc/cpweb [02:22:55] paladox: should've already been deployed (at least on ovlon) [02:22:57] RECOVERY - mw9 Current Load on mw9 is OK: OK - load average: 5.33, 5.93, 5.69 [02:23:08] hmm -feed not working [02:24:17] RECOVERY - ns2 GDNSD Datacenters on ns2 is OK: OK - all datacenters are online [02:25:35] Hmm... [02:26:21] https://www.irccloud.com/pastebin/pPCjAiY0/ [02:26:31] PROBLEM - cp30 Current Load on cp30 is CRITICAL: CRITICAL - load average: 2.31, 1.83, 1.21 [02:27:32] https://www.irccloud.com/pastebin/ZqFQrk3z/ [02:28:30] PROBLEM - cp30 Current Load on cp30 is WARNING: WARNING - load average: 1.72, 1.64, 1.21 [02:28:56] PROBLEM - gluster3 Current Load on gluster3 is CRITICAL: CRITICAL - load average: 6.12, 5.71, 5.22 [02:29:44] PROBLEM - mw8 Current Load on mw8 is WARNING: WARNING - load average: 6.99, 6.73, 6.13 [02:30:09] paladox: shouldn't the config ipv6 address be enclosed in brackets when there is a port like that. Could be wrong though. 
[02:30:23] maybe some places do that [02:30:30] RECOVERY - cp30 Current Load on cp30 is OK: OK - load average: 1.07, 1.56, 1.24 [02:30:56] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 5.85, 5.52, 5.20 [02:31:02] PROBLEM - cloud4 Current Load on cloud4 is WARNING: WARNING - load average: 20.79, 20.87, 19.80 [02:31:43] RECOVERY - mw8 Current Load on mw8 is OK: OK - load average: 6.72, 6.50, 6.11 [02:32:56] CosmicAlpha: anyway we can quickly test that? [02:32:56] PROBLEM - gluster3 Current Load on gluster3 is CRITICAL: CRITICAL - load average: 6.77, 6.01, 5.42 [02:33:02] RECOVERY - cloud4 Current Load on cloud4 is OK: OK - load average: 18.00, 20.05, 19.65 [02:35:34] paladox: the bot would need restarted I think, but I can deploy it and revert if doesn't work? [02:35:56] sure [02:36:25] hmm [02:36:28] localhost works [02:36:37] [02miraheze/mw-config] 07Universal-Omega pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/JSLIP [02:36:39] [02miraheze/mw-config] 07Universal-Omega 03a7ede68 - Enclosw mon2 ipv6 address in brackets [02:37:05] Whoops typo in commit message. Oh well. [02:37:12] Deploying [02:37:47] !log [universalomega@mw11] starting deploy of {'pull': 'config', 'config': True} to all [02:37:51] miraheze/mw-config - Universal-Omega the build passed. [02:37:53] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [02:37:56] !log [universalomega@mw11] finished deploy of {'pull': 'config', 'config': True} to all - SUCCESS in 9s [02:38:01] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [02:38:42] it works [02:39:23] Great! [02:42:16] [02miraheze/puppet] 07paladox pushed 031 commit to 03paladox-patch-4 [+1/-0/±0] 13https://git.io/JSLtH [02:42:17] [02miraheze/puppet] 07paladox 03a08c83d - Install mail121 [02:42:19] [02puppet] 07paladox created branch 03paladox-patch-4 - 13https://git.io/vbiAS [02:42:20] [02puppet] 07paladox opened pull request 03#2235: Install mail121 - 13https://git.io/JSLt7 [02:42:22] So all the bots work now paladox? 
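For reference, the bracket fix deployed above reflects standard URI syntax: a literal IPv6 address in a host:port URI must be wrapped in square brackets, otherwise its colons are misread as the port separator. A hedged sketch of what such an mw-config RC-feed entry can look like; the address and port are placeholders, not the real mon2 values:

```php
<?php
$wgRCFeeds['irc'] = [
    'formatter' => 'IRCColourfulRCFeedFormatter',
    // Brackets keep the IPv6 colons separate from the :port suffix.
    'uri' => 'udp://[2001:db8::1]:5070',
    'add_interwiki_prefix' => false,
    'omit_bots' => true,
];
```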
[02:42:43] PROBLEM - ns1 GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 8 datacenters are down: 51.195.220.68/cpweb, 198.244.148.90/cpweb, 2001:41d0:801:2000::4c25/cpweb, 2001:41d0:801:2000::1b80/cpweb, 149.56.140.43/cpweb, 149.56.141.75/cpweb, 2607:5300:201:3100::929a/cpweb, 2607:5300:201:3100::5ebc/cpweb [02:42:46] [02miraheze/puppet] 07paladox pushed 031 commit to 03paladox-patch-4 [+0/-0/±1] 13https://git.io/JSLqe [02:42:47] [02miraheze/puppet] 07paladox 03e239716 - Update site.pp [02:42:49] [02puppet] 07paladox synchronize pull request 03#2235: Install mail121 - 13https://git.io/JSLt7 [02:42:55] [02puppet] 07paladox closed pull request 03#2235: Install mail121 - 13https://git.io/JSLt7 [02:42:57] [02puppet] 07paladox deleted branch 03paladox-patch-4 - 13https://git.io/vbiAS [02:42:58] [02miraheze/puppet] 07paladox pushed 031 commit to 03master [+1/-0/±1] 13https://git.io/JSLqI [02:43:00] [02miraheze/puppet] 07paladox 0390d1624 - Install mail121 (#2235) [02:43:01] [02miraheze/puppet] 07paladox deleted branch 03paladox-patch-4 [02:43:02] PROBLEM - cloud4 Current Load on cloud4 is WARNING: WARNING - load average: 23.36, 21.82, 20.48 [02:44:39] PROBLEM - db12 Disk Space on db12 is CRITICAL: DISK CRITICAL - free space: / 26698 MB (5% inode=98%); [02:44:45] PROBLEM - mw9 MediaWiki Rendering on mw9 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:44:55] PROBLEM - cp31 Stunnel Http for mw9 on cp31 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [02:44:59] PROBLEM - cp30 Stunnel Http for mw12 on cp30 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [02:45:02] PROBLEM - cp21 Stunnel Http for mw12 on cp21 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [02:45:12] PROBLEM - mw12 MediaWiki Rendering on mw12 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:45:18] PROBLEM - cp30 Stunnel Http for mw9 on cp30 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [02:45:49] PROBLEM - cp20 Stunnel Http for mw9 on cp20 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [02:46:09] PROBLEM - ns2 GDNSD Datacenters on ns2 is CRITICAL: CRITICAL - 2 datacenters are down: 198.244.148.90/cpweb, 2001:41d0:801:2000::1b80/cpweb [02:46:32] PROBLEM - cp21 Stunnel Http for mw9 on cp21 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. 
[02:47:02] RECOVERY - cloud4 Current Load on cloud4 is OK: OK - load average: 18.00, 20.04, 20.08 [02:47:10] RECOVERY - cp21 Stunnel Http for mw12 on cp21 is OK: HTTP OK: HTTP/1.1 200 OK - 14565 bytes in 8.256 second response time [02:47:16] RECOVERY - mw12 MediaWiki Rendering on mw12 is OK: HTTP OK: HTTP/1.1 200 OK - 20526 bytes in 7.237 second response time [02:47:56] PROBLEM - mon2 Current Load on mon2 is WARNING: WARNING - load average: 2.94, 3.23, 3.93 [02:48:13] [02puppet] 07Universal-Omega closed pull request 03#2234: logbot: use Libera ipv6 address directly - 13https://git.io/JSIAJ [02:48:17] PROBLEM - cp31 Current Load on cp31 is CRITICAL: CRITICAL - load average: 2.29, 1.59, 1.25 [02:48:57] RECOVERY - mw9 MediaWiki Rendering on mw9 is OK: HTTP OK: HTTP/1.1 200 OK - 20524 bytes in 7.729 second response time [02:49:06] RECOVERY - cp31 Stunnel Http for mw9 on cp31 is OK: HTTP OK: HTTP/1.1 200 OK - 14556 bytes in 5.495 second response time [02:49:09] RECOVERY - cp30 Stunnel Http for mw12 on cp30 is OK: HTTP OK: HTTP/1.1 200 OK - 14557 bytes in 2.484 second response time [02:49:36] RECOVERY - cp30 Stunnel Http for mw9 on cp30 is OK: HTTP OK: HTTP/1.1 200 OK - 14556 bytes in 8.685 second response time [02:49:37] PROBLEM - mw8 Current Load on mw8 is CRITICAL: CRITICAL - load average: 8.58, 7.05, 6.44 [02:49:56] RECOVERY - cp20 Stunnel Http for mw9 on cp20 is OK: HTTP OK: HTTP/1.1 200 OK - 14564 bytes in 8.714 second response time [02:49:56] PROBLEM - mon2 Current Load on mon2 is CRITICAL: CRITICAL - load average: 4.29, 3.66, 4.00 [02:50:14] RECOVERY - Host mail121 is UP: PING OK - Packet loss = 0%, RTA = 2.02 ms [02:50:14] PROBLEM - mail121 conntrack_table_size on mail121 is CRITICAL: connect to address 31.24.105.139 port 5666: Network is unreachableconnect to host 31.24.105.139 port 5666: Network is unreachable [02:50:19] PROBLEM - mail121 Disk Space on mail121 is CRITICAL: connect to address 31.24.105.139 port 5666: Network is unreachableconnect to host 31.24.105.139 port 5666: Network is unreachable [02:50:24] PROBLEM - mail121 APT on mail121 is CRITICAL: connect to address 31.24.105.139 port 5666: Network is unreachableconnect to host 31.24.105.139 port 5666: Network is unreachable [02:50:34] PROBLEM - mail121 Current Load on mail121 is CRITICAL: connect to address 31.24.105.139 port 5666: Network is unreachableconnect to host 31.24.105.139 port 5666: Network is unreachable [02:50:39] PROBLEM - mail121 ferm_active on mail121 is CRITICAL: connect to address 31.24.105.139 port 5666: Network is unreachableconnect to host 31.24.105.139 port 5666: Network is unreachable [02:50:43] RECOVERY - ns1 GDNSD Datacenters on ns1 is OK: OK - all datacenters are online [02:50:44] PROBLEM - mail121 NTP time on mail121 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. 
[02:50:45] PROBLEM - mail121 PowerDNS Recursor on mail121 is CRITICAL: connect to address 31.24.105.139 port 5666: Network is unreachableconnect to host 31.24.105.139 port 5666: Network is unreachable [02:50:49] PROBLEM - mail121 HTTPS on mail121 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:50:49] PROBLEM - mail121 Puppet on mail121 is CRITICAL: connect to address 31.24.105.139 port 5666: Network is unreachableconnect to host 31.24.105.139 port 5666: Network is unreachable [02:51:02] PROBLEM - cloud4 Current Load on cloud4 is WARNING: WARNING - load average: 23.18, 21.73, 20.75 [02:51:14] [02miraheze/puppet] 07paladox pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/JSLO3 [02:51:15] [02miraheze/puppet] 07paladox 03ad293fa - Update mail121.yaml [02:51:36] PROBLEM - mw8 Current Load on mw8 is WARNING: WARNING - load average: 7.15, 7.13, 6.55 [02:51:51] PROBLEM - mail121 IMAP on mail121 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:52:14] RECOVERY - cp31 Current Load on cp31 is OK: OK - load average: 1.29, 1.58, 1.33 [02:52:15] PROBLEM - mail121 webmail.miraheze.org HTTPS on mail121 is CRITICAL: connect to address 2a10:6740::6:307 and port 443: Connection refusedHTTP CRITICAL - Unable to open TCP socket [02:52:22] RECOVERY - mail121 Disk Space on mail121 is OK: DISK OK - free space: / 6027 MB (68% inode=84%); [02:52:31] RECOVERY - mail121 APT on mail121 is OK: APT OK: 1 packages available for upgrade (0 critical updates). [02:52:32] RECOVERY - mail121 conntrack_table_size on mail121 is OK: OK: nf_conntrack is 0 % full [02:52:34] RECOVERY - mail121 NTP time on mail121 is OK: NTP OK: Offset -0.002284616232 secs [02:52:38] RECOVERY - mail121 Current Load on mail121 is OK: OK - load average: 1.19, 1.09, 0.59 [02:52:39] RECOVERY - cp21 Stunnel Http for mw9 on cp21 is OK: HTTP OK: HTTP/1.1 200 OK - 14556 bytes in 0.021 second response time [02:52:44] RECOVERY - mail121 ferm_active on mail121 is OK: OK ferm input default policy is set [02:52:47] RECOVERY - mail121 PowerDNS Recursor on mail121 is OK: DNS OK: 0.347 seconds response time. miraheze.org returns 198.244.148.90,2001:41d0:801:2000::1b80,2001:41d0:801:2000::4c25 [02:53:23] PROBLEM - mw9 Current Load on mw9 is WARNING: WARNING - load average: 7.69, 6.73, 6.11 [02:53:47] RECOVERY - mail121 IMAP on mail121 is OK: IMAP OK - 0.033 second response time on 2a10:6740::6:307 port 143 [* OK [CAPABILITY IMAP4rev1 SASL-IR LOGIN-REFERRALS ID ENABLE IDLE LITERAL+ STARTTLS LOGINDISABLED] Dovecot (Debian) ready.] 
[02:54:05] RECOVERY - ns2 GDNSD Datacenters on ns2 is OK: OK - all datacenters are online [02:54:06] !log [@test3] starting deploy of {'config': True} to skip [02:54:07] !log [@test3] finished deploy of {'config': True} to skip - SUCCESS in 1s [02:54:43] PROBLEM - mw13 Current Load on mw13 is WARNING: WARNING - load average: 7.07, 6.26, 5.73 [02:54:46] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [02:54:52] RECOVERY - mail121 Puppet on mail121 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures [02:55:08] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [02:55:19] RECOVERY - mw9 Current Load on mw9 is OK: OK - load average: 6.28, 6.42, 6.06 [02:56:39] RECOVERY - mw13 Current Load on mw13 is OK: OK - load average: 6.43, 6.26, 5.79 [02:56:39] RECOVERY - mail121 HTTPS on mail121 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 427 bytes in 0.010 second response time [02:57:02] RECOVERY - cloud4 Current Load on cloud4 is OK: OK - load average: 19.69, 20.15, 20.37 [02:57:25] [02puppet] 07Universal-Omega opened pull request 03#2236: mirahezerenewssl: support IPV6 - 13https://git.io/JSLsR [02:57:34] RECOVERY - mw8 Current Load on mw8 is OK: OK - load average: 6.03, 6.68, 6.57 [02:58:14] paladox: ^ (I'm not sure if that'll work on current infra or not though) [02:58:31] no it won't [02:58:42] we can make the existing infra support it tho [02:58:48] you just change the ip to use ipv6 [03:01:04] paladox: https://github.com/miraheze/puppet/blob/f3201b1e6d55fbfcdceb6719be04aba50913eeea/modules/monitoring/files/scripts/ssl-renew.sh#L24? [03:01:05] [url] puppet/ssl-renew.sh at f3201b1e6d55fbfcdceb6719be04aba50913eeea · miraheze/puppet · GitHub | github.com [03:01:20] Or somewhere else? [03:01:32] PROBLEM - mw8 Current Load on mw8 is WARNING: WARNING - load average: 6.74, 6.97, 6.74 [03:03:31] RECOVERY - mw8 Current Load on mw8 is OK: OK - load average: 5.95, 6.58, 6.62 [03:04:39] PROBLEM - db12 Disk Space on db12 is WARNING: DISK WARNING - free space: / 27563 MB (6% inode=98%); [03:04:56] CosmicAlpha: that's the only place yeh [03:05:05] possible the monitor for the script aswell (not sure) [03:05:19] [02puppet] 07Universal-Omega synchronize pull request 03#2236: mirahezerenewssl: support IPV6 - 13https://git.io/JSLsR [03:06:06] paladox: ^ done [03:06:43] https://github.com/miraheze/puppet/blob/3eb0abd1b88858ff4153faae0bea10443b0f34b2/modules/letsencrypt/manifests/web.pp#L31 already uses ipaddress6 [03:06:43] [url] puppet/web.pp at 3eb0abd1b88858ff4153faae0bea10443b0f34b2 · miraheze/puppet · GitHub | github.com [03:10:33] [02puppet] 07paladox closed pull request 03#2236: mirahezerenewssl: support IPV6 - 13https://git.io/JSLsR [03:10:35] [02miraheze/puppet] 07paladox pushed 031 commit to 03master [+0/-0/±2] 13https://git.io/JSLCC [03:10:36] [02miraheze/puppet] 07Universal-Omega 03468c299 - mirahezerenewssl: support IPV6 (#2236) [03:10:54] Thanks! [03:11:44] i fixed puppet on mwtask111 [03:11:55] just copied over Femiwiki from mwtask1 [03:12:31] RECOVERY - mwtask111 Puppet on mwtask111 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [03:12:37] well mw11 [03:12:48] paladox: Yeah probably good idea then, think you can do that on test101 also? [03:13:27] done [03:13:58] Thanks! 
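A quick, hedged way to confirm the renewal listener is actually reachable over IPv6 once the script change above is in; the host and port are taken from the monitoring checks in this log, and the netcat flags assume the OpenBSD variant shipped on Debian:

```sh
# Does the host publish an AAAA record?
getent ahostsv6 mwtask111.miraheze.org

# Can we open the MirahezeRenewSsl TCP port over IPv6?
nc -6 -zv mwtask111.miraheze.org 5000
```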
[03:14:34] also synced over the ssl certs [03:14:46] those will need to be done again when we migrate [03:14:51] unless there were no changes [03:15:18] RECOVERY - test101 Puppet on test101 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [03:15:37] RECOVERY - mwtask111 MirahezeRenewSsl on mwtask111 is OK: TCP OK - 0.001 second response time on 2a10:6740::6:208 port 5000 [03:15:45] Thanks paladox! [03:16:08] PROBLEM - mw12 Current Load on mw12 is WARNING: WARNING - load average: 7.55, 7.06, 6.33 [03:16:42] PROBLEM - ns1 GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 6 datacenters are down: 198.244.148.90/cpweb, 2001:41d0:801:2000::4c25/cpweb, 2001:41d0:801:2000::1b80/cpweb, 149.56.140.43/cpweb, 149.56.141.75/cpweb, 2607:5300:201:3100::5ebc/cpweb [03:17:55] PROBLEM - ns2 GDNSD Datacenters on ns2 is CRITICAL: CRITICAL - 4 datacenters are down: 51.195.220.68/cpweb, 198.244.148.90/cpweb, 149.56.140.43/cpweb, 2607:5300:201:3100::5ebc/cpweb [03:18:07] RECOVERY - mw12 Current Load on mw12 is OK: OK - load average: 6.06, 6.66, 6.27 [03:18:42] RECOVERY - mwtask1 MirahezeRenewSsl on mwtask1 is OK: TCP OK - 0.000 second response time on 2001:41d0:800:1bbd::15 port 5000 [03:18:56] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 5.10, 5.27, 5.85 [03:20:41] PROBLEM - mw9 Current Load on mw9 is WARNING: WARNING - load average: 7.50, 6.85, 6.34 [03:20:42] RECOVERY - ns1 GDNSD Datacenters on ns1 is OK: OK - all datacenters are online [03:23:07] [02puppet] 07Universal-Omega edited pull request 03#2230: mirahezerenewssl: use force=True for logging - 13https://git.io/JSks3 [03:23:52] RECOVERY - ns2 GDNSD Datacenters on ns2 is OK: OK - all datacenters are online [03:23:54] paladox: ^ I think that can probably be done as well and should work on both infrastructures since mwtask1 is also on bullseye/python3.9. [03:24:21] [02puppet] 07paladox closed pull request 03#2230: mirahezerenewssl: use force=True for logging - 13https://git.io/JSks3 [03:24:22] [02miraheze/puppet] 07paladox pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/JSLBm [03:24:24] [02miraheze/puppet] 07Universal-Omega 03507ec7e - mirahezerenewssl: use force=True for logging (#2230) [03:24:31] Thanks! 
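The one-off copies described above (the Femiwiki tree and the SSL certs) are the kind of thing rsync handles well, and re-running it just before the migration only transfers whatever changed in the meantime. A hedged sketch; the hostnames and certificate path are illustrative rather than a record of the actual commands:

```sh
# Pull the prebuilt Femiwiki node_modules from the old deploy host.
rsync -av mw11.miraheze.org:/srv/mediawiki-staging/w/skins/Femiwiki/node_modules/ \
      /srv/mediawiki-staging/w/skins/Femiwiki/node_modules/

# Sync the certificates again at migration time if they changed.
rsync -av mw11.miraheze.org:/etc/ssl/localcerts/ /etc/ssl/localcerts/
```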
[03:25:24] PROBLEM - mw8 Current Load on mw8 is WARNING: WARNING - load average: 7.63, 7.16, 6.82 [03:26:30] RECOVERY - mw9 Current Load on mw9 is OK: OK - load average: 5.97, 6.64, 6.44 [03:27:23] PROBLEM - mw8 Current Load on mw8 is CRITICAL: CRITICAL - load average: 9.02, 7.54, 6.98 [03:27:50] PROBLEM - ns2 GDNSD Datacenters on ns2 is CRITICAL: CRITICAL - 6 datacenters are down: 51.195.220.68/cpweb, 198.244.148.90/cpweb, 2001:41d0:801:2000::1b80/cpweb, 149.56.140.43/cpweb, 149.56.141.75/cpweb, 2607:5300:201:3100::5ebc/cpweb [03:28:56] PROBLEM - gluster3 Current Load on gluster3 is CRITICAL: CRITICAL - load average: 6.85, 5.66, 5.66 [03:29:22] PROBLEM - mw8 Current Load on mw8 is WARNING: WARNING - load average: 6.50, 7.09, 6.88 [03:29:49] RECOVERY - ns2 GDNSD Datacenters on ns2 is OK: OK - all datacenters are online [03:30:56] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 4.08, 5.10, 5.46 [03:31:40] ok : [RESOLVED] (PHP-FPM Worker Usage High mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki [03:32:56] PROBLEM - gluster3 Current Load on gluster3 is CRITICAL: CRITICAL - load average: 6.39, 5.65, 5.62 [03:34:56] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 4.33, 5.10, 5.42 [03:37:19] RECOVERY - mw8 Current Load on mw8 is OK: OK - load average: 4.91, 6.20, 6.62 [03:38:39] alerting : [FIRING:1] (PHP-FPM Worker Usage High mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki [03:42:56] PROBLEM - gluster3 Current Load on gluster3 is CRITICAL: CRITICAL - load average: 7.65, 6.12, 5.71 [03:45:04] [02puppet] 07Universal-Omega opened pull request 03#2237: Fix typo in `mediawiki::default_sync` for mwtask111 - 13https://git.io/JSL2a [03:45:11] paladox: ^ [03:46:05] [02puppet] 07paladox closed pull request 03#2237: Fix typo in `mediawiki::default_sync` for mwtask111 - 13https://git.io/JSL2a [03:46:07] [02miraheze/puppet] 07paladox pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/JSL2b [03:46:08] [02miraheze/puppet] 07Universal-Omega 034070d42 - Fix typo in `mediawiki::default_sync` for mwtask111 (#2237) [03:46:27] Thanks again! [03:46:56] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 4.23, 5.84, 5.78 [03:47:46] deployed [03:48:00] Thanks! [03:49:56] [02miraheze/mw-config] 07Universal-Omega pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/JSLVs [03:49:58] [02miraheze/mw-config] 07Universal-Omega 03b853dea - Test [03:50:56] PROBLEM - gluster3 Current Load on gluster3 is CRITICAL: CRITICAL - load average: 8.98, 6.82, 6.11 [03:50:58] miraheze/mw-config - Universal-Omega the build passed. [03:51:26] PROBLEM - cloud4 Current Load on cloud4 is CRITICAL: CRITICAL - load average: 25.53, 21.93, 20.47 [03:51:42] PROBLEM - ns1 GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 8 datacenters are down: 51.195.220.68/cpweb, 198.244.148.90/cpweb, 2001:41d0:801:2000::4c25/cpweb, 2001:41d0:801:2000::1b80/cpweb, 149.56.140.43/cpweb, 149.56.141.75/cpweb, 2607:5300:201:3100::929a/cpweb, 2607:5300:201:3100::5ebc/cpweb [03:52:20] PROBLEM - mw12 MediaWiki Rendering on mw12 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:52:23] PROBLEM - cp31 Stunnel Http for mw12 on cp31 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [03:52:26] PROBLEM - cp30 Stunnel Http for mw12 on cp30 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. 
[03:52:37] PROBLEM - ns2 GDNSD Datacenters on ns2 is CRITICAL: CRITICAL - 5 datacenters are down: 51.195.220.68/cpweb, 198.244.148.90/cpweb, 2001:41d0:801:2000::1b80/cpweb, 149.56.140.43/cpweb, 2607:5300:201:3100::929a/cpweb [03:53:21] PROBLEM - cloud4 Current Load on cloud4 is WARNING: WARNING - load average: 22.44, 21.74, 20.57 [03:54:05] PROBLEM - mwtask111 Puppet on mwtask111 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[MediaWiki Config Sync] [03:54:19] Hmm... [03:54:20] RECOVERY - mw12 MediaWiki Rendering on mw12 is OK: HTTP OK: HTTP/1.1 200 OK - 20526 bytes in 3.856 second response time [03:54:21] !log [@test3] starting deploy of {'config': True} to skip [03:54:22] !log [@test3] finished deploy of {'config': True} to skip - SUCCESS in 0s [03:54:23] RECOVERY - cp31 Stunnel Http for mw12 on cp31 is OK: HTTP OK: HTTP/1.1 200 OK - 14557 bytes in 2.699 second response time [03:54:29] RECOVERY - cp30 Stunnel Http for mw12 on cp30 is OK: HTTP OK: HTTP/1.1 200 OK - 14557 bytes in 3.482 second response time [03:54:34] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [03:54:36] RECOVERY - ns2 GDNSD Datacenters on ns2 is OK: OK - all datacenters are online [03:54:40] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [03:55:17] PROBLEM - cloud4 Current Load on cloud4 is CRITICAL: CRITICAL - load average: 24.47, 21.90, 20.72 [03:55:42] RECOVERY - ns1 GDNSD Datacenters on ns1 is OK: OK - all datacenters are online [03:56:00] Yeah, not all the bots will be logged in as you cannot have more than 5 simultaneous logins to a single account [03:56:56] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 4.31, 5.76, 5.98 [03:57:12] PROBLEM - cloud4 Current Load on cloud4 is WARNING: WARNING - load average: 23.31, 21.92, 20.84 [03:59:08] RECOVERY - cloud4 Current Load on cloud4 is OK: OK - load average: 17.52, 20.09, 20.29 [04:00:03] RECOVERY - mwtask111 Puppet on mwtask111 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [04:01:07] I wonder why it's not logging syncs for SCSVG. It should be syncing. [04:02:20] [02miraheze/mw-config] 07Universal-Omega pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/JSLKF [04:02:22] [02miraheze/mw-config] 07Universal-Omega 03787d282 - Revert "Test" [04:03:22] miraheze/mw-config - Universal-Omega the build passed. [04:03:56] PROBLEM - mon2 Current Load on mon2 is WARNING: WARNING - load average: 3.91, 3.73, 3.92 [04:04:46] PROBLEM - wiki.landev.vn - LetsEncrypt on sslhost is WARNING: WARNING - Certificate 'wiki.landev.vn' expires in 15 day(s) (Tue 18 Jan 2022 03:58:21 GMT +0000). [04:05:27] PROBLEM - sbs.wiki - LetsEncrypt on sslhost is WARNING: WARNING - Certificate 'sbs.wiki' expires in 15 day(s) (Tue 18 Jan 2022 04:01:54 GMT +0000). [04:07:13] PROBLEM - correypedia.org - LetsEncrypt on sslhost is WARNING: WARNING - Certificate 'correypedia.org' expires in 15 day(s) (Tue 18 Jan 2022 03:59:17 GMT +0000). [04:08:56] PROBLEM - gluster3 Current Load on gluster3 is CRITICAL: CRITICAL - load average: 6.26, 5.13, 5.38 [04:09:44] CosmicAlpha: Notice: /Stage[main]/Mediawiki/Exec[MediaWiki Config Sync]/returns: mon111.miraheze.org.miraheze.org [198.244.148.90] 5071 (?) : Network is unreachable [04:09:56] PROBLEM - mon2 Current Load on mon2 is CRITICAL: CRITICAL - load average: 4.25, 3.89, 3.91 [04:09:58] somehow it's not using the ipv6 address???
[04:10:03] the ipv4 is cache proxies [04:10:04] PROBLEM - wiki.digitaldesignhq.com - LetsEncrypt on sslhost is WARNING: WARNING - Certificate 'wiki.digitaldesignhq.com' expires in 15 day(s) (Tue 18 Jan 2022 04:02:54 GMT +0000). [04:10:06] oh [04:10:07] hold on [04:10:10] why is it double [04:10:15] [02miraheze/ssl] 07MirahezeSSLBot pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/JSLPK [04:10:17] [02miraheze/ssl] 07MirahezeSSLBot 032f46c46 - Bot: Update SSL cert for wiki.landev.vn [04:10:57] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 5.38, 5.12, 5.34 [04:10:58] PROBLEM - biblestrength.net - LetsEncrypt on sslhost is WARNING: WARNING - Certificate 'biblestrength.net' expires in 15 day(s) (Tue 18 Jan 2022 04:07:10 GMT +0000). [04:12:30] PROBLEM - wiki.apico.buzz - LetsEncrypt on sslhost is WARNING: WARNING - Certificate 'wiki.apico.buzz' expires in 15 day(s) (Tue 18 Jan 2022 04:05:09 GMT +0000). [04:12:39] [02miraheze/puppet] 07paladox pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/JSLXg [04:12:40] [02miraheze/puppet] 07paladox 03322e1a6 - hardcode mon111 ipv6 address [04:12:45] PROBLEM - biblestrength.net - LetsEncrypt on sslhost is WARNING: WARNING - Certificate 'biblestrength.net' expires in 15 day(s) (Tue 18 Jan 2022 04:07:10 GMT +0000). [04:12:56] RECOVERY - gluster3 Current Load on gluster3 is OK: OK - load average: 3.86, 4.55, 5.10 [04:13:06] PROBLEM - pj-masks-info.cf - LetsEncrypt on sslhost is WARNING: WARNING - Certificate 'pj-masks-info.cf' expires in 15 day(s) (Tue 18 Jan 2022 04:04:11 GMT +0000). [04:14:06] PROBLEM - wiki.jumpbound.com - LetsEncrypt on sslhost is WARNING: WARNING - Certificate 'wiki.jumpbound.com' expires in 15 day(s) (Tue 18 Jan 2022 04:06:18 GMT +0000). [04:14:35] PROBLEM - steamdecklinux.wiki - LetsEncrypt on sslhost is WARNING: WARNING - Certificate 'steamdecklinux.wiki' expires in 15 day(s) (Tue 18 Jan 2022 04:08:10 GMT +0000). [04:15:31] [02miraheze/ssl] 07MirahezeSSLBot pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/JSL10 [04:15:33] [02miraheze/ssl] 07MirahezeSSLBot 0362db1d7 - Bot: Update SSL cert for correypedia.org [04:15:48] !log [paladox@puppet3] [04:15:56] PROBLEM - mon2 Current Load on mon2 is WARNING: WARNING - load average: 3.50, 3.93, 3.94 [04:16:01] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [04:16:40] [02miraheze/ssl] 07MirahezeSSLBot pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/JSL1d [04:16:41] [02miraheze/ssl] 07MirahezeSSLBot 03cea50dc - Bot: Update SSL cert for wiki.jumpbound.com [04:17:57] PROBLEM - mon2 Current Load on mon2 is CRITICAL: CRITICAL - load average: 4.17, 4.05, 3.99 [04:18:05] PROBLEM - mwtask111 Puppet on mwtask111 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[MediaWiki Config Sync] [04:18:08] paladox: how come I can't connect to ssh like the other servers for the new infra, trying to connect to universalomega@mwtask111.miraheze.org gives mwtask111.miraheze.org: host does not exist. Also something weird, on PC in order to access icinga-new, grafana-new, test101.miraheze.org, etc... I have to be connected to the same proxy as for graylog, however on my mobile device I don't have to be, as I don't even have it. Am I [04:18:08] doing something wrong here?
[04:18:31] what error do you get [04:18:43] note that you have to have an ipv6 address to connect to them [04:18:57] you can proxy via bastion [04:20:02] [02miraheze/ssl] 07MirahezeSSLBot pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/JSLDt [04:20:04] [02miraheze/ssl] 07MirahezeSSLBot 034e792d5 - Bot: Update SSL cert for pj-masks-info.cf [04:23:56] PROBLEM - mon2 Current Load on mon2 is WARNING: WARNING - load average: 3.11, 3.91, 3.98 [04:24:04] RECOVERY - mwtask111 Puppet on mwtask111 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [04:24:23] PROBLEM - ns2 GDNSD Datacenters on ns2 is CRITICAL: CRITICAL - 8 datacenters are down: 51.195.220.68/cpweb, 198.244.148.90/cpweb, 2001:41d0:801:2000::4c25/cpweb, 2001:41d0:801:2000::1b80/cpweb, 149.56.140.43/cpweb, 149.56.141.75/cpweb, 2607:5300:201:3100::929a/cpweb, 2607:5300:201:3100::5ebc/cpweb [04:24:24] !log [@test3] starting deploy of {'config': True} to skip [04:24:25] !log [@test3] finished deploy of {'config': True} to skip - SUCCESS in 1s [04:24:39] PROBLEM - mw9 MediaWiki Rendering on mw9 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:24:41] PROBLEM - ns1 GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 7 datacenters are down: 51.195.220.68/cpweb, 198.244.148.90/cpweb, 2001:41d0:801:2000::4c25/cpweb, 2001:41d0:801:2000::1b80/cpweb, 149.56.140.43/cpweb, 149.56.141.75/cpweb, 2607:5300:201:3100::5ebc/cpweb [04:24:42] PROBLEM - cp30 Stunnel Http for mw9 on cp30 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [04:25:08] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [04:25:23] RECOVERY - wiki.landev.vn - LetsEncrypt on sslhost is OK: OK - Certificate 'wiki.landev.vn' will expire on Sat 02 Apr 2022 03:10:10 GMT +0000. [04:25:38] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [04:25:47] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 5.46, 4.99, 4.84 [04:25:52] [02miraheze/ssl] 07MirahezeSSLBot pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/JSLSm [04:25:53] [02miraheze/ssl] 07MirahezeSSLBot 03b841f74 - Bot: Update SSL cert for steamdecklinux.wiki [04:26:22] RECOVERY - ns2 GDNSD Datacenters on ns2 is OK: OK - all datacenters are online [04:26:38] RECOVERY - mw9 MediaWiki Rendering on mw9 is OK: HTTP OK: HTTP/1.1 200 OK - 20524 bytes in 1.376 second response time [04:26:40] RECOVERY - ns1 GDNSD Datacenters on ns1 is OK: OK - all datacenters are online [04:26:42] RECOVERY - cp30 Stunnel Http for mw9 on cp30 is OK: HTTP OK: HTTP/1.1 200 OK - 14564 bytes in 0.356 second response time [04:26:56] [02miraheze/ssl] 07MirahezeSSLBot pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/JSLS2 [04:26:57] [02miraheze/ssl] 07MirahezeSSLBot 038c96b8a - Bot: Update SSL cert for wiki.digitaldesignhq.com [04:27:38] RECOVERY - correypedia.org - LetsEncrypt on sslhost is OK: OK - Certificate 'correypedia.org' will expire on Sat 02 Apr 2022 03:15:26 GMT +0000. [04:27:43] RECOVERY - gluster3 Current Load on gluster3 is OK: OK - load average: 4.77, 4.81, 4.78 [04:28:01] RECOVERY - wiki.jumpbound.com - LetsEncrypt on sslhost is OK: OK - Certificate 'wiki.jumpbound.com' will expire on Sat 02 Apr 2022 03:16:34 GMT +0000. 
[04:29:11] !log [@mwtask111] [04:29:17] ohhh [04:29:24] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [04:30:48] [02miraheze/puppet] 07paladox pushed 031 commit to 03paladox-patch-4 [+0/-0/±1] 13https://git.io/JSL9d [04:30:49] [02miraheze/puppet] 07paladox 032825aee - Install netcat-openbsd globally [04:30:51] [02puppet] 07paladox created branch 03paladox-patch-4 - 13https://git.io/vbiAS [04:30:52] [02puppet] 07paladox opened pull request 03#2238: Install netcat-openbsd globally - 13https://git.io/JSL9b [04:31:10] !log [@mwtask111] [04:31:15] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [04:31:21] !log [@mwtask111] [04:31:25] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [04:31:31] !log [@mwtask111] [04:31:35] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [04:31:48] !log test [04:31:53] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [04:31:56] RECOVERY - mon2 Current Load on mon2 is OK: OK - load average: 2.12, 2.65, 3.36 [04:31:57] oh damn sorry for spam [04:32:31] [02miraheze/ssl] 07MirahezeSSLBot pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/JSLHE [04:32:33] [02miraheze/ssl] 07MirahezeSSLBot 03aa13068 - Bot: Update SSL cert for biblestrength.net [04:32:38] [02puppet] 07paladox closed pull request 03#2238: Install netcat-openbsd globally - 13https://git.io/JSL9b [04:32:40] [02miraheze/puppet] 07paladox pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/JSLHz [04:32:41] [02miraheze/puppet] 07paladox 03a24805e - Install netcat-openbsd globally (#2238) [04:32:43] [02puppet] 07paladox deleted branch 03paladox-patch-4 - 13https://git.io/vbiAS [04:32:44] [02miraheze/puppet] 07paladox deleted branch 03paladox-patch-4 [04:33:39] [02miraheze/puppet] 07paladox pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/JSLHd [04:33:40] [02miraheze/puppet] 07paladox 039cf7d03 - Revert "hardcode mon111 ipv6 address" [04:35:12] !log [paladox@mon2] test [04:35:15] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [04:35:51] !log [paladox@mon111] test [04:36:13] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [04:36:39] PROBLEM - db12 Disk Space on db12 is CRITICAL: DISK CRITICAL - free space: / 26708 MB (5% inode=98%); [04:36:49] CosmicAlpha: should log now [04:38:31] thanks paladox, I'm still trying to figure out how to get logged into mwtask111 now. I get connection timed out now, after setting up an ssh proxy the same way we do the graylog proxy, but it does not seem to work the same way... [04:38:53] CosmicAlpha: do you have an ipv6 address [04:39:05] you cannot access it if you only have an ipv4 address [04:40:03] [02miraheze/ssl] 07MirahezeSSLBot pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/JSL5U [04:40:04] [02miraheze/ssl] 07MirahezeSSLBot 035117b14 - Bot: Update SSL cert for sbs.wiki [04:41:16] paladox: I thought proxying via bast101 would work, no? But I can't figure that out. But I guess my Wi-Fi does not have IPv6, which I actually did not know until just now... [04:41:31] proxying through bast101 would [04:41:36] you have an account, right? [04:41:59] bast101 has an ipv4 address [04:42:31] you can do ssh -vvv [04:43:39] [02miraheze/ssl] 07MirahezeSSLBot pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/JSLdC [04:43:41] [02miraheze/ssl] 07MirahezeSSLBot 03d332a37 - Bot: Update SSL cert for wiki.apico.buzz [04:47:03] paladox: account for what?
[04:47:12] on bast101 [04:48:08] i don't think that it was set up to allow everyone to log in (e.g. those who have full root) thinking about it now [04:48:48] don't think so. [04:51:46] PROBLEM - cloud3 Puppet on cloud3 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[netcat-openbsd] [04:52:50] PROBLEM - mon2 Current Load on mon2 is WARNING: WARNING - load average: 3.54, 3.51, 3.38 [04:54:21] well --- I guess I just need to get IPv6 working for my Wi-Fi, or get something with IPv6 support ... I think I can enable support. Might try tomorrow. [04:55:31] RECOVERY - steamdecklinux.wiki - LetsEncrypt on sslhost is OK: OK - Certificate 'steamdecklinux.wiki' will expire on Sat 02 Apr 2022 03:25:46 GMT +0000. [04:56:43] RECOVERY - mon2 Current Load on mon2 is OK: OK - load average: 2.49, 3.26, 3.34 [04:57:57] RECOVERY - biblestrength.net - LetsEncrypt on sslhost is OK: OK - Certificate 'biblestrength.net' will expire on Sat 02 Apr 2022 03:32:26 GMT +0000. [04:58:08] RECOVERY - wiki.digitaldesignhq.com - LetsEncrypt on sslhost is OK: OK - Certificate 'wiki.digitaldesignhq.com' will expire on Sat 02 Apr 2022 03:26:50 GMT +0000. [04:59:32] PROBLEM - mw12 Current Load on mw12 is WARNING: WARNING - load average: 7.08, 6.92, 6.31 [04:59:53] RECOVERY - sbs.wiki - LetsEncrypt on sslhost is OK: OK - Certificate 'sbs.wiki' will expire on Sat 02 Apr 2022 03:39:57 GMT +0000. [04:59:55] RECOVERY - biblestrength.net - LetsEncrypt on sslhost is OK: OK - Certificate 'biblestrength.net' will expire on Sat 02 Apr 2022 03:32:26 GMT +0000. [05:00:18] RECOVERY - pj-masks-info.cf - LetsEncrypt on sslhost is OK: OK - Certificate 'pj-masks-info.cf' will expire on Sat 02 Apr 2022 03:19:56 GMT +0000. [05:00:23] RECOVERY - wiki.apico.buzz - LetsEncrypt on sslhost is OK: OK - Certificate 'wiki.apico.buzz' will expire on Sat 02 Apr 2022 03:43:34 GMT +0000.
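A minimal sketch of the proxy-via-bastion route discussed above, assuming an account actually exists on bast101 and the usual SSH keys are loaded (hostnames and the username are the ones mentioned in the chat; nothing here is confirmed against the real bastion setup):

    # Jump through the bastion (reachable over IPv4) to the IPv6-only mwtask111:
    ssh -J universalomega@bast101.miraheze.org universalomega@mwtask111.miraheze.org

    # Or make it permanent in ~/.ssh/config so a plain "ssh mwtask111.miraheze.org" works:
    #   Host mwtask111.miraheze.org
    #       ProxyJump universalomega@bast101.miraheze.org

    # Add -vvv, as suggested above, to see exactly where the connection stalls:
    ssh -vvv -J universalomega@bast101.miraheze.org universalomega@mwtask111.miraheze.org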
[05:01:51] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 5.80, 4.98, 4.33 [05:03:31] RECOVERY - mw12 Current Load on mw12 is OK: OK - load average: 4.59, 5.92, 6.05 [05:03:47] RECOVERY - gluster3 Current Load on gluster3 is OK: OK - load average: 3.53, 4.45, 4.21 [05:09:23] PROBLEM - mon2 Current Load on mon2 is WARNING: WARNING - load average: 3.71, 3.39, 3.33 [05:11:19] RECOVERY - mon2 Current Load on mon2 is OK: OK - load average: 2.94, 3.39, 3.35 [05:13:30] PROBLEM - gluster3 Current Load on gluster3 is CRITICAL: CRITICAL - load average: 6.40, 4.99, 4.45 [05:15:26] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 5.67, 5.43, 4.69 [05:18:08] PROBLEM - mon2 Current Load on mon2 is CRITICAL: CRITICAL - load average: 4.04, 3.59, 3.44 [05:19:18] PROBLEM - gluster3 Current Load on gluster3 is CRITICAL: CRITICAL - load average: 6.46, 5.91, 5.05 [05:19:46] RECOVERY - cloud3 Puppet on cloud3 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [05:20:05] RECOVERY - mon2 Current Load on mon2 is OK: OK - load average: 2.22, 3.08, 3.27 [05:21:14] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 3.93, 5.15, 4.87 [05:25:07] RECOVERY - gluster3 Current Load on gluster3 is OK: OK - load average: 2.91, 4.55, 4.75 [05:42:29] PROBLEM - mon2 Current Load on mon2 is WARNING: WARNING - load average: 3.84, 3.55, 3.36 [05:44:25] RECOVERY - mon2 Current Load on mon2 is OK: OK - load average: 2.83, 3.15, 3.23 [05:55:19] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 5.24, 4.72, 4.05 [05:57:15] PROBLEM - gluster3 Current Load on gluster3 is CRITICAL: CRITICAL - load average: 6.92, 5.56, 4.44 [05:58:39] ok : [RESOLVED] (PHP-FPM Worker Usage High mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki [06:03:04] RECOVERY - gluster3 Current Load on gluster3 is OK: OK - load average: 3.05, 4.58, 4.43 [06:04:40] PROBLEM - db11 Current Load on db11 is CRITICAL: CRITICAL - load average: 8.43, 6.19, 4.20 [06:05:24] PROBLEM - db13 Current Load on db13 is WARNING: WARNING - load average: 7.50, 5.51, 3.39 [06:05:39] alerting : [FIRING:1] (PHP-FPM Worker Usage High mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki [06:07:24] RECOVERY - db13 Current Load on db13 is OK: OK - load average: 6.46, 5.81, 3.77 [06:12:40] PROBLEM - db11 Current Load on db11 is WARNING: WARNING - load average: 6.58, 7.57, 5.85 [06:13:06] PROBLEM - cp21 Stunnel Http for mw10 on cp21 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. 
[06:13:40] PROBLEM - ns1 GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 1 datacenter is down: 149.56.141.75/cpweb [06:14:40] PROBLEM - db11 Current Load on db11 is CRITICAL: CRITICAL - load average: 8.75, 7.99, 6.21 [06:15:05] RECOVERY - cp21 Stunnel Http for mw10 on cp21 is OK: HTTP OK: HTTP/1.1 200 OK - 14565 bytes in 1.680 second response time [06:15:40] RECOVERY - ns1 GDNSD Datacenters on ns1 is OK: OK - all datacenters are online [06:16:39] PROBLEM - db12 Disk Space on db12 is WARNING: DISK WARNING - free space: / 27120 MB (6% inode=98%); [06:18:39] PROBLEM - db12 Disk Space on db12 is CRITICAL: DISK CRITICAL - free space: / 26448 MB (5% inode=98%); [06:20:39] PROBLEM - db12 Disk Space on db12 is WARNING: DISK WARNING - free space: / 31194 MB (7% inode=98%); [06:28:40] PROBLEM - db11 Current Load on db11 is WARNING: WARNING - load average: 6.76, 7.74, 7.32 [06:30:39] PROBLEM - db12 Disk Space on db12 is CRITICAL: DISK CRITICAL - free space: / 26376 MB (5% inode=98%); [06:30:39] ok : [RESOLVED] (PHP-FPM Worker Usage High mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki [06:36:39] alerting : [FIRING:1] (PHP-FPM Worker Usage High mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki [06:42:40] RECOVERY - db11 Current Load on db11 is OK: OK - load average: 5.29, 6.21, 6.68 [06:48:41] PROBLEM - db11 Current Load on db11 is CRITICAL: CRITICAL - load average: 8.81, 7.44, 7.08 [06:54:40] PROBLEM - db11 Current Load on db11 is WARNING: WARNING - load average: 6.97, 7.77, 7.39 [06:56:40] PROBLEM - db11 Current Load on db11 is CRITICAL: CRITICAL - load average: 8.41, 7.95, 7.49 [06:59:41] [02ssl] 07Universal-Omega opened pull request 03#461: Redirect test3 to betaheze - 13https://git.io/JStW4 [06:59:51] [02ssl] 07Universal-Omega edited pull request 03#461: Redirect test3 to betaheze - 13https://git.io/JStW4 [07:06:39] PROBLEM - db12 Disk Space on db12 is WARNING: DISK WARNING - free space: / 26727 MB (6% inode=98%); [07:08:39] PROBLEM - db12 Disk Space on db12 is CRITICAL: DISK CRITICAL - free space: / 26487 MB (5% inode=98%); [07:08:40] PROBLEM - db11 Current Load on db11 is WARNING: WARNING - load average: 6.11, 7.34, 7.65 [07:14:40] PROBLEM - db11 Current Load on db11 is CRITICAL: CRITICAL - load average: 9.81, 8.10, 7.84 [07:18:40] PROBLEM - db11 Current Load on db11 is WARNING: WARNING - load average: 7.57, 7.76, 7.77 [07:23:23] [02ssl] 07Universal-Omega opened pull request 03#462: Remove explicit meta.orain.org redirect - 13https://git.io/JStWP [07:24:40] PROBLEM - db11 Current Load on db11 is CRITICAL: CRITICAL - load average: 8.04, 7.31, 7.49 [07:25:29] PROBLEM - db13 Current Load on db13 is CRITICAL: CRITICAL - load average: 8.59, 11.49, 8.03 [07:26:40] PROBLEM - db11 Current Load on db11 is WARNING: WARNING - load average: 7.50, 7.30, 7.46 [07:29:00] [02miraheze/mw-config] 07Universal-Omega pushed 031 commit to 03Universal-Omega-patch-2 [+0/-0/±1] 13https://git.io/JStW9 [07:29:01] [02miraheze/mw-config] 07Universal-Omega 0359d6be3 - Split `$wgCreateWikiCannedResponses` into sections [07:29:03] [02mw-config] 07Universal-Omega created branch 03Universal-Omega-patch-2 - 13https://git.io/vbvb3 [07:29:04] [02mw-config] 07Universal-Omega opened pull request 03#4328: Split `$wgCreateWikiCannedResponses` into sections - 13https://git.io/JStWH [07:29:24] RECOVERY - db13 Current Load on db13 is OK: OK - load average: 2.59, 6.34, 6.67 [07:29:59] miraheze/mw-config - Universal-Omega the build passed. 
[07:36:40] RECOVERY - db11 Current Load on db11 is OK: OK - load average: 6.24, 6.18, 6.78 [07:41:39] ok : [RESOLVED] (PHP-FPM Worker Usage High mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki [07:46:39] alerting : [FIRING:1] (PHP-FPM Worker Usage High mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki [07:59:34] [02puppet] 07Universal-Omega opened pull request 03#2239: monitoring: add elasticsearch and bastion host groups - 13https://git.io/JStl3 [08:01:39] ok : [RESOLVED] (PHP-FPM Worker Usage High mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki [08:16:39] alerting : [FIRING:1] (PHP-FPM Worker Usage High mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki [08:26:17] PROBLEM - bacula2 conntrack_table_size on bacula2 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [08:26:19] PROBLEM - bacula2 PowerDNS Recursor on bacula2 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [08:26:39] PROBLEM - bacula2 Current Load on bacula2 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [08:26:39] PROBLEM - db12 Disk Space on db12 is WARNING: DISK WARNING - free space: / 30862 MB (6% inode=98%); [08:27:04] PROBLEM - bacula2 Disk Space on bacula2 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [08:27:09] PROBLEM - bacula2 ferm_active on bacula2 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [08:27:24] PROBLEM - ping6 on bacula2 is CRITICAL: PING CRITICAL - Packet loss = 100% [08:27:25] PROBLEM - Host bacula2 is DOWN: PING CRITICAL - Packet loss = 100% [08:31:39] ok : [RESOLVED] (PHP-FPM Worker Usage High mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki [08:31:57] paladox: bacula2 is down [08:52:39] alerting : [FIRING:1] (PHP-FPM Worker Usage High mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki [09:00:39] PROBLEM - db12 Disk Space on db12 is CRITICAL: DISK CRITICAL - free space: / 26698 MB (5% inode=98%); [09:02:39] ok : [RESOLVED] (PHP-FPM Worker Usage High mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki [09:04:39] PROBLEM - db12 Disk Space on db12 is WARNING: DISK WARNING - free space: / 28079 MB (6% inode=98%); [09:10:39] alerting : [FIRING:1] (PHP-FPM Worker Usage High mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki [09:16:39] PROBLEM - db12 Disk Space on db12 is CRITICAL: DISK CRITICAL - free space: / 26510 MB (5% inode=98%); [09:17:02] You'll just have to find room for a week or so [09:22:39] PROBLEM - db12 Disk Space on db12 is WARNING: DISK WARNING - free space: / 27053 MB (6% inode=98%); [09:24:39] PROBLEM - db12 Disk Space on db12 is CRITICAL: DISK CRITICAL - free space: / 26243 MB (5% inode=98%); [09:25:01] PROBLEM - cp30 Current Load on cp30 is WARNING: WARNING - load average: 1.51, 1.72, 1.06 [09:27:00] RECOVERY - cp30 Current Load on cp30 is OK: OK - load average: 0.51, 1.27, 0.97 [09:35:39] ok : [RESOLVED] (PHP-FPM Worker Usage High mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki [09:49:39] alerting : [FIRING:1] (PHP-FPM Worker Usage High mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki [10:04:39] ok : [RESOLVED] (PHP-FPM Worker Usage High mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki [10:13:39] alerting : [FIRING:1] (PHP-FPM Worker Usage High mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki [10:18:39] ok : [RESOLVED] (PHP-FPM Worker Usage High mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki [10:21:39] 
alerting : [FIRING:1] (PHP-FPM Worker Usage High mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki [10:27:02] PROBLEM - cloud4 Current Load on cloud4 is WARNING: WARNING - load average: 21.43, 19.23, 16.30 [10:28:40] PROBLEM - db11 Current Load on db11 is WARNING: WARNING - load average: 7.84, 7.02, 6.17 [10:29:02] RECOVERY - cloud4 Current Load on cloud4 is OK: OK - load average: 15.63, 18.15, 16.29 [10:30:40] RECOVERY - db11 Current Load on db11 is OK: OK - load average: 5.92, 6.62, 6.13 [10:31:39] ok : [RESOLVED] (PHP-FPM Worker Usage High mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki [10:51:39] alerting : [FIRING:1] (PHP-FPM Worker Usage High mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki [10:56:39] ok : [RESOLVED] (PHP-FPM Worker Usage High mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki [10:59:34] PROBLEM - db11 Current Load on db11 is CRITICAL: CRITICAL - load average: 8.54, 7.45, 6.71 [10:59:39] alerting : [FIRING:1] (PHP-FPM Worker Usage High mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki [11:01:32] PROBLEM - db11 Current Load on db11 is WARNING: WARNING - load average: 6.19, 7.00, 6.65 [11:03:30] RECOVERY - db11 Current Load on db11 is OK: OK - load average: 5.22, 6.43, 6.48 [11:04:39] ok : [RESOLVED] (PHP-FPM Worker Usage High mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki [11:17:39] alerting : [FIRING:1] (PHP-FPM Worker Usage High mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki [11:32:39] ok : [RESOLVED] (PHP-FPM Worker Usage High mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki [11:38:39] PROBLEM - db12 Disk Space on db12 is WARNING: DISK WARNING - free space: / 27538 MB (6% inode=98%); [11:38:39] alerting : [FIRING:1] (PHP-FPM Worker Usage High mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki [11:48:39] ok : [RESOLVED] (PHP-FPM Worker Usage High mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki [11:52:39] alerting : [FIRING:1] (PHP-FPM Worker Usage High mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki [11:57:39] ok : [RESOLVED] (PHP-FPM Worker Usage High mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki [12:01:39] alerting : [FIRING:1] (PHP-FPM Worker Usage High mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki [12:06:39] ok : [RESOLVED] (PHP-FPM Worker Usage High mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki [12:11:39] alerting : [FIRING:1] (PHP-FPM Worker Usage High mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki [12:26:23] PROBLEM - cloud5 Current Load on cloud5 is WARNING: WARNING - load average: 20.69, 19.16, 16.47 [12:28:21] PROBLEM - mon2 Current Load on mon2 is WARNING: WARNING - load average: 3.15, 3.57, 3.10 [12:28:22] RECOVERY - cloud5 Current Load on cloud5 is OK: OK - load average: 16.29, 18.12, 16.43 [12:30:17] RECOVERY - mon2 Current Load on mon2 is OK: OK - load average: 2.26, 3.26, 3.05 [12:40:58] PROBLEM - mon2 Current Load on mon2 is WARNING: WARNING - load average: 3.83, 3.50, 3.17 [12:42:54] RECOVERY - mon2 Current Load on mon2 is OK: OK - load average: 2.00, 2.95, 3.01 [12:47:52] PROBLEM - mw8 Current Load on mw8 is WARNING: WARNING - load average: 7.45, 6.01, 4.94 [12:49:22] PROBLEM - ns2 GDNSD Datacenters on ns2 is CRITICAL: CRITICAL - 8 datacenters are down: 51.195.220.68/cpweb, 198.244.148.90/cpweb, 2001:41d0:801:2000::4c25/cpweb, 2001:41d0:801:2000::1b80/cpweb, 149.56.140.43/cpweb, 149.56.141.75/cpweb, 2607:5300:201:3100::929a/cpweb, 
2607:5300:201:3100::5ebc/cpweb [12:49:51] RECOVERY - mw8 Current Load on mw8 is OK: OK - load average: 6.28, 5.95, 5.04 [12:50:20] PROBLEM - cp20 Stunnel Http for mw10 on cp20 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [12:50:23] PROBLEM - cp31 HTTP 4xx/5xx ERROR Rate on cp31 is CRITICAL: CRITICAL - NGINX Error Rate is 100% [12:50:24] PROBLEM - mw9 MediaWiki Rendering on mw9 is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 344 bytes in 0.363 second response time [12:50:32] PROBLEM - cp31 Stunnel Http for mw8 on cp31 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [12:50:34] PROBLEM - cp31 Stunnel Http for mw10 on cp31 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [12:50:40] PROBLEM - ns1 GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 8 datacenters are down: 51.195.220.68/cpweb, 198.244.148.90/cpweb, 2001:41d0:801:2000::4c25/cpweb, 2001:41d0:801:2000::1b80/cpweb, 149.56.140.43/cpweb, 149.56.141.75/cpweb, 2607:5300:201:3100::929a/cpweb, 2607:5300:201:3100::5ebc/cpweb [12:50:41] PROBLEM - cp30 Varnish Backends on cp30 is CRITICAL: 4 backends are down. mw8 mw10 mw11 mw13 [12:50:42] PROBLEM - cp21 Stunnel Http for mw8 on cp21 is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 328 bytes in 0.008 second response time [12:50:46] PROBLEM - cp20 Varnish Backends on cp20 is CRITICAL: 7 backends are down. mw8 mw9 mw10 mw11 mw12 mw13 mediawiki [12:50:54] PROBLEM - cp21 Varnish Backends on cp21 is CRITICAL: 4 backends are down. mw8 mw9 mw10 mw11 [12:50:56] PROBLEM - cp30 Stunnel Http for mw12 on cp30 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [12:51:04] PROBLEM - cp31 Varnish Backends on cp31 is CRITICAL: 3 backends are down. mw8 mw10 mw11 [12:51:06] PROBLEM - cp21 Stunnel Http for mw13 on cp21 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [12:51:10] PROBLEM - cp20 Stunnel Http for mw13 on cp20 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [12:51:18] PROBLEM - mw13 MediaWiki Rendering on mw13 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [12:51:26] PROBLEM - cp20 Stunnel Http for mw11 on cp20 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [12:51:29] PROBLEM - cp30 Stunnel Http for mw8 on cp30 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [12:51:30] PROBLEM - cp31 Stunnel Http for mw11 on cp31 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [12:51:34] PROBLEM - cp30 Stunnel Http for mw13 on cp30 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [12:51:35] PROBLEM - cp20 Stunnel Http for mw9 on cp20 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [12:51:50] PROBLEM - cp31 HTTPS on cp31 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Backend fetch failed - 5681 bytes in 5.081 second response time [12:52:22] PROBLEM - cp31 HTTP 4xx/5xx ERROR Rate on cp31 is WARNING: WARNING - NGINX Error Rate is 48% [12:52:28] RECOVERY - mw9 MediaWiki Rendering on mw9 is OK: HTTP OK: HTTP/1.1 200 OK - 20524 bytes in 6.963 second response time [12:53:01] RECOVERY - cp30 Stunnel Http for mw12 on cp30 is OK: HTTP OK: HTTP/1.1 200 OK - 14557 bytes in 6.092 second response time [12:53:26] RECOVERY - Host bacula2 is UP: PING OK - Packet loss = 0%, RTA = 75.56 ms [12:53:27] RECOVERY - bacula2 PowerDNS Recursor on bacula2 is OK: DNS OK: 0.340 seconds response time. 
miraheze.org returns 149.56.140.43,149.56.141.75,2607:5300:201:3100::5ebc,2607:5300:201:3100::929a [12:53:27] PROBLEM - bacula2 Bacula Databases db11 on bacula2 is CRITICAL: connect to address 2604:180:f3::382 port 5666: Connection timed outconnect to host 2604:180:f3::382 port 5666: Connection timed out [12:53:27] RECOVERY - bacula2 Current Load on bacula2 is OK: OK - load average: 0.48, 0.12, 0.04 RECOVERY - bacula2 Disk Space on bacula2 is OK: DISK OK - free space: / 2074308 MB (82% inode=99%); [12:53:37] RECOVERY - cp20 Stunnel Http for mw9 on cp20 is OK: HTTP OK: HTTP/1.1 200 OK - 14556 bytes in 7.611 second response time [12:53:37] RECOVERY - cp31 Stunnel Http for mw11 on cp31 is OK: HTTP OK: HTTP/1.1 200 OK - 14565 bytes in 7.878 second response time [12:53:40] RECOVERY - bacula2 ferm_active on bacula2 is OK: OK ferm input default policy is set [12:53:51] RECOVERY - cp31 HTTPS on cp31 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 3125 bytes in 1.128 second response time [12:54:22] RECOVERY - cp31 HTTP 4xx/5xx ERROR Rate on cp31 is OK: OK - NGINX Error Rate is 8% [12:54:22] RECOVERY - cp20 Stunnel Http for mw10 on cp20 is OK: HTTP OK: HTTP/1.1 200 OK - 14557 bytes in 0.013 second response time [12:54:32] RECOVERY - bacula2 conntrack_table_size on bacula2 is OK: OK: nf_conntrack is 0 % full [12:54:37] RECOVERY - cp31 Stunnel Http for mw10 on cp31 is OK: HTTP OK: HTTP/1.1 200 OK - 14557 bytes in 0.704 second response time [12:54:41] RECOVERY - cp30 Varnish Backends on cp30 is OK: All 9 backends are healthy [12:54:41] RECOVERY - ping6 on bacula2 is OK: PING OK - Packet loss = 0%, RTA = 84.63 ms [12:54:46] RECOVERY - cp20 Varnish Backends on cp20 is OK: All 9 backends are healthy [12:54:54] RECOVERY - bacula2 Bacula Databases db11 on bacula2 is OK: OK: Full, 83645 files, 16.78GB, 2021-12-26 00:14:00 (1.1 weeks ago) [12:54:54] RECOVERY - cp21 Varnish Backends on cp21 is OK: All 9 backends are healthy [12:55:04] RECOVERY - cp31 Varnish Backends on cp31 is OK: All 9 backends are healthy [12:55:15] RECOVERY - cp21 Stunnel Http for mw13 on cp21 is OK: HTTP OK: HTTP/1.1 200 OK - 14557 bytes in 2.883 second response time [12:55:16] RECOVERY - cp20 Stunnel Http for mw13 on cp20 is OK: HTTP OK: HTTP/1.1 200 OK - 14557 bytes in 2.695 second response time [12:55:19] RECOVERY - mw13 MediaWiki Rendering on mw13 is OK: HTTP OK: HTTP/1.1 200 OK - 20526 bytes in 3.434 second response time [12:55:24] RECOVERY - cp20 Stunnel Http for mw11 on cp20 is OK: HTTP OK: HTTP/1.1 200 OK - 14557 bytes in 0.006 second response time [12:55:30] PROBLEM - gluster3 Current Load on gluster3 is CRITICAL: CRITICAL - load average: 7.23, 5.09, 4.19 [12:55:40] RECOVERY - cp30 Stunnel Http for mw13 on cp30 is OK: HTTP OK: HTTP/1.1 200 OK - 14557 bytes in 0.335 second response time [12:55:44] RECOVERY - cp30 Stunnel Http for mw8 on cp30 is OK: HTTP OK: HTTP/1.1 200 OK - 14546 bytes in 9.710 second response time [12:56:01] PROBLEM - mw8 MediaWiki Rendering on mw8 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [12:56:05] PROBLEM - cp20 Stunnel Http for mw8 on cp20 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. 
[12:56:40] RECOVERY - ns1 GDNSD Datacenters on ns1 is OK: OK - all datacenters are online [12:57:02] PROBLEM - cloud4 Current Load on cloud4 is WARNING: WARNING - load average: 21.70, 19.36, 17.19 [12:59:18] RECOVERY - ns2 GDNSD Datacenters on ns2 is OK: OK - all datacenters are online [12:59:57] CosmicAlpha: thanks [13:00:02] RECOVERY - mw8 MediaWiki Rendering on mw8 is OK: HTTP OK: HTTP/1.1 200 OK - 20514 bytes in 3.433 second response time [13:00:04] RECOVERY - cp20 Stunnel Http for mw8 on cp20 is OK: HTTP OK: HTTP/1.1 200 OK - 14546 bytes in 1.078 second response time [13:01:02] PROBLEM - cloud4 Current Load on cloud4 is CRITICAL: CRITICAL - load average: 25.73, 21.52, 18.49 [13:01:04] RECOVERY - cp21 Stunnel Http for mw8 on cp21 is OK: HTTP OK: HTTP/1.1 200 OK - 14554 bytes in 5.489 second response time [13:01:11] RECOVERY - cp31 Stunnel Http for mw8 on cp31 is OK: HTTP OK: HTTP/1.1 200 OK - 14546 bytes in 2.174 second response time [13:03:02] RECOVERY - cloud4 Current Load on cloud4 is OK: OK - load average: 18.02, 19.99, 18.29 [13:06:39] ok : [RESOLVED] (PHP-FPM Worker Usage High mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki [13:10:39] alerting : [FIRING:1] (PHP-FPM Worker Usage High mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki [13:14:42] PROBLEM - mw8 Current Load on mw8 is WARNING: WARNING - load average: 6.84, 6.20, 5.58 [13:16:41] RECOVERY - mw8 Current Load on mw8 is OK: OK - load average: 4.07, 5.52, 5.41 [13:20:36] PROBLEM - cloud4 Current Load on cloud4 is WARNING: WARNING - load average: 20.37, 20.98, 19.75 [13:22:17] PROBLEM - ns2 GDNSD Datacenters on ns2 is CRITICAL: CRITICAL - 2 datacenters are down: 51.195.220.68/cpweb, 2607:5300:201:3100::5ebc/cpweb [13:22:39] PROBLEM - ns1 GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 1 datacenter is down: 198.244.148.90/cpweb [13:24:17] RECOVERY - ns2 GDNSD Datacenters on ns2 is OK: OK - all datacenters are online [13:24:39] RECOVERY - ns1 GDNSD Datacenters on ns1 is OK: OK - all datacenters are online [13:28:28] PROBLEM - mon2 Current Load on mon2 is WARNING: WARNING - load average: 3.47, 3.35, 2.90 [13:28:47] [02mw-config] 07lens0021 opened pull request 03#4329: Add `$wgDarkModeTogglePosition` to ManageWikiSettings - 13https://git.io/JStuZ [13:30:13] RECOVERY - cloud4 Current Load on cloud4 is OK: OK - load average: 15.38, 19.78, 20.16 [13:30:24] RECOVERY - mon2 Current Load on mon2 is OK: OK - load average: 1.69, 2.81, 2.76 [13:31:16] [02mw-config] 07lens0021 edited pull request 03#4329: Add `$wgDarkModeTogglePosition` to ManageWikiSettings - 13https://git.io/JStuZ [13:40:39] ok : [RESOLVED] (PHP-FPM Worker Usage High mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki [13:44:56] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 3.85, 4.99, 5.77 [13:45:39] alerting : [FIRING:1] (PHP-FPM Worker Usage High mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki [13:48:50] PROBLEM - mon2 Current Load on mon2 is CRITICAL: CRITICAL - load average: 4.06, 3.53, 3.12 [13:50:46] PROBLEM - mon2 Current Load on mon2 is WARNING: WARNING - load average: 3.36, 3.42, 3.12 [13:50:56] PROBLEM - gluster3 Current Load on gluster3 is CRITICAL: CRITICAL - load average: 6.12, 5.28, 5.54 [13:51:27] PROBLEM - cloud4 Current Load on cloud4 is WARNING: WARNING - load average: 21.30, 20.26, 19.45 [13:52:42] RECOVERY - mon2 Current Load on mon2 is OK: OK - load average: 2.50, 3.27, 3.11 [13:53:23] RECOVERY - cloud4 Current Load on cloud4 is OK: OK - load average: 17.29, 19.12, 
19.13 [13:53:56] miraheze/mw-config - RhinosF1 the build passed. [13:54:56] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 4.67, 5.29, 5.53 [14:00:28] PROBLEM - mon2 Current Load on mon2 is WARNING: WARNING - load average: 3.54, 3.45, 3.27 [14:00:39] ok : [RESOLVED] (PHP-FPM Worker Usage High mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki [14:02:24] RECOVERY - mon2 Current Load on mon2 is OK: OK - load average: 2.40, 3.01, 3.12 [14:02:56] RECOVERY - gluster3 Current Load on gluster3 is OK: OK - load average: 3.79, 4.30, 4.95 [14:05:39] alerting : [FIRING:1] (PHP-FPM Worker Usage High mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki [14:10:39] ok : [RESOLVED] (PHP-FPM Worker Usage High mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki [14:10:56] PROBLEM - gluster3 Current Load on gluster3 is CRITICAL: CRITICAL - load average: 6.30, 5.43, 5.22 [14:13:37] PROBLEM - cloud4 Current Load on cloud4 is WARNING: WARNING - load average: 21.61, 20.27, 18.97 [14:14:58] PROBLEM - ns2 GDNSD Datacenters on ns2 is CRITICAL: CRITICAL - 1 datacenter is down: 51.195.220.68/cpweb [14:15:33] RECOVERY - cloud4 Current Load on cloud4 is OK: OK - load average: 19.96, 20.00, 19.01 [14:15:39] alerting : [FIRING:1] (PHP-FPM Worker Usage High mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki [14:16:57] RECOVERY - ns2 GDNSD Datacenters on ns2 is OK: OK - all datacenters are online [14:20:53] PROBLEM - mon2 Current Load on mon2 is CRITICAL: CRITICAL - load average: 4.18, 3.86, 3.43 [14:26:41] PROBLEM - mon2 Current Load on mon2 is WARNING: WARNING - load average: 3.23, 3.72, 3.55 [14:26:56] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 5.33, 5.89, 5.94 [14:27:05] [02puppet] 07RhinosF1 synchronize pull request 03#2184: varnish: add new mw servers - 13https://git.io/JySOU [14:29:27] [02puppet] 07RhinosF1 synchronize pull request 03#2183: setup new mw* with stunnel & prometheus - 13https://git.io/JySYc [14:30:19] [02puppet] 07RhinosF1 synchronize pull request 03#2183: setup new mw* with stunnel & prometheus - 13https://git.io/JySYc [14:30:34] RECOVERY - mon2 Current Load on mon2 is OK: OK - load average: 2.26, 3.12, 3.35 [14:30:37] paladox: they should be ok now ^ [14:30:44] [02miraheze/puppet] 07JohnFLewis pushed 032 commits to 03master [+2/-0/±13] 13https://git.io/JStwY [14:30:45] [02miraheze/puppet] 07JohnFLewis 038a7c94e - users: add bastion control group [14:30:47] [02miraheze/puppet] 07JohnFLewis 034f365af - Merge branch 'master' of github.com:/miraheze/puppet [14:30:47] ok [14:30:57] looking [14:31:23] [02puppet] 07paladox closed pull request 03#2183: setup new mw* with stunnel & prometheus - 13https://git.io/JySYc [14:31:25] [02miraheze/puppet] 07paladox pushed 031 commit to 03master [+0/-0/±4] 13https://git.io/JStwE [14:31:25] JohnLewis: space between gid: and 2005 [14:31:26] [02miraheze/puppet] 07RhinosF1 03b67a07c - setup new mw* with stunnel & prometheus (#2183) [14:32:56] PROBLEM - gluster3 Current Load on gluster3 is CRITICAL: CRITICAL - load average: 6.26, 5.22, 5.53 [14:33:36] [02miraheze/puppet] 07JohnFLewis pushed 033 commits to 03master [+0/-0/±7] 13https://git.io/JStrc [14:33:38] [02miraheze/puppet] 07JohnFLewis 0341e3611 - spacing [14:33:39] [02miraheze/puppet] 07JohnFLewis 03d53ab9a - add SMART monitoring for cloud1* [14:33:41] [02miraheze/puppet] 07JohnFLewis 034d43178 - Merge branch 'master' of github.com:/miraheze/puppet [14:34:56] PROBLEM - gluster3 Current Load on gluster3 is 
WARNING: WARNING - load average: 5.89, 5.47, 5.59 [14:37:49] paladox: what about https://git.io/JySOU [14:38:04] That sets up the probes and X-Debug but no backend [14:38:56] RECOVERY - gluster3 Current Load on gluster3 is OK: OK - load average: 2.97, 4.15, 5.02 [14:46:18] i'm not sure that'll work because the backends aren't being used thus i think varnish may fail [14:46:40] paladox: they should [14:47:06] [02puppet] 07paladox closed pull request 03#2184: varnish: add new mw servers - 13https://git.io/JySOU [14:47:08] [02miraheze/puppet] 07paladox pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/JSti7 [14:47:09] [02miraheze/puppet] 07RhinosF1 0328f019d - varnish: add new mw servers (#2184) [14:47:47] [02puppet] 07JohnFLewis closed pull request 03#2239: monitoring: add elasticsearch and bastion host groups - 13https://git.io/JStl3 [14:47:48] [02miraheze/puppet] 07JohnFLewis pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/JStPk [14:47:50] [02miraheze/puppet] 07Universal-Omega 0349d5757 - monitoring: add elasticsearch and bastion host groups (#2239) [14:48:58] paladox: https://icinga.miraheze.org/monitoring/service/show?host=cp20&service=cp20%20Puppet [14:48:59] [url] Icinga Web 2 Login | icinga.miraheze.org [14:49:14] Something not listening on v6 [14:49:23] yeh it failed [14:49:26] reverting [14:49:28] Wait that became an error now [14:49:34] Can you paste error too [14:49:41] https://www.irccloud.com/pastebin/hbVzQOk0/ [14:50:10] paladox: hang on [14:50:14] PROBLEM - cp20 Puppet on cp20 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 1 minute ago with 1 failures. Failed resources (up to 3 shown): Exec[load-new-vcl-file] [14:50:16] [02miraheze/puppet] 07paladox pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/JStPx [14:50:17] [02miraheze/puppet] 07paladox 03dd574fd - Fix [14:50:46] Yey you fixed [14:50:55] I was just blind and killed a ; [14:51:14] PROBLEM - cloud4 Current Load on cloud4 is WARNING: WARNING - load average: 22.58, 20.42, 19.24 [14:51:27] PROBLEM - mw12 Current Load on mw12 is CRITICAL: CRITICAL - load average: 8.19, 6.58, 5.61 [14:51:48] works [14:52:13] RECOVERY - cp20 Puppet on cp20 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures [14:52:19] PROBLEM - cp20 Stunnel Http for mw122 on cp20 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [14:52:19] PROBLEM - cp20 Stunnel Http for mw121 on cp20 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [14:52:27] PROBLEM - cp31 Stunnel Http for mw102 on cp31 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [14:52:27] PROBLEM - cp30 Stunnel Http for mw122 on cp30 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [14:52:29] PROBLEM - cp30 Stunnel Http for mw112 on cp30 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [14:52:33] PROBLEM - cp30 Stunnel Http for mw111 on cp30 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [14:52:33] PROBLEM - cp30 Stunnel Http for mw121 on cp30 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [14:52:38] PROBLEM - cp31 Stunnel Http for mw111 on cp31 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [14:52:49] PROBLEM - cp20 Stunnel Http for mw111 on cp20 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [14:52:51] PROBLEM - cp30 Stunnel Http for mw101 on cp30 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. 
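The Exec[load-new-vcl-file] failure above came down to a deleted semicolon in the VCL. A hedged sketch of catching that by hand on a cache proxy before reloading (the file path and VCL names are assumptions; varnishd -C only compiles the file and does not touch the running daemon):

    # Compile the candidate VCL; a syntax error such as a missing ';' fails here
    # with a line/column message instead of breaking the running cache.
    varnishd -C -f /etc/varnish/default.vcl > /dev/null

    # If it compiles cleanly, load and activate it through the running varnishd:
    varnishadm vcl.load candidate /etc/varnish/default.vcl
    varnishadm vcl.use candidate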
[14:52:58] PROBLEM - cp31 Stunnel Http for mw122 on cp31 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [14:52:59] PROBLEM - cp31 Stunnel Http for mw101 on cp31 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [14:52:59] PROBLEM - cp31 Stunnel Http for mw121 on cp31 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [14:53:05] PROBLEM - cp20 Stunnel Http for mw112 on cp20 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [14:53:06] PROBLEM - cp20 Stunnel Http for mw102 on cp20 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [14:53:08] PROBLEM - cp31 Puppet on cp31 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[load-new-vcl-file] [14:53:11] PROBLEM - cp20 Stunnel Http for mw101 on cp20 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [14:53:13] RECOVERY - cloud4 Current Load on cloud4 is OK: OK - load average: 19.84, 20.23, 19.32 [14:53:14] PROBLEM - cp30 Stunnel Http for mw102 on cp30 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [14:53:17] PROBLEM - cp31 Stunnel Http for mw112 on cp31 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [14:53:23] RECOVERY - mw12 Current Load on mw12 is OK: OK - load average: 6.60, 6.54, 5.71 [14:53:27] Yey [14:54:21] [02miraheze/puppet] 07paladox pushed 031 commit to 03paladox-patch-4 [+0/-0/±1] 13https://git.io/JSt10 [14:54:23] [02miraheze/puppet] 07paladox 03b826704 - varnish: Add icinga/grafana/phabricator [14:54:24] [02puppet] 07paladox created branch 03paladox-patch-4 - 13https://git.io/vbiAS [14:54:26] [02puppet] 07paladox opened pull request 03#2240: varnish: Add icinga/grafana/phabricator - 13https://git.io/JSt1z [14:54:45] PROBLEM - cp20 Varnish Backends on cp20 is CRITICAL: 6 backends are down. mw101 mw102 mw111 mw112 mw121 mw122 [14:54:49] paladox: stunnel failures will be expected until cp* move over to new puppet [14:54:53] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 5.27, 5.26, 5.01 [14:54:53] PROBLEM - cp30 Puppet on cp30 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. 
Failed resources (up to 3 shown): Exec[load-new-vcl-file] [14:54:57] ok [14:55:11] [02miraheze/puppet] 07paladox pushed 031 commit to 03paladox-patch-4 [+0/-0/±1] 13https://git.io/JSt1H [14:55:13] [02miraheze/puppet] 07paladox 0390c81d8 - Update default.vcl [14:55:15] [02puppet] 07paladox synchronize pull request 03#2240: varnish: Add icinga/grafana/phabricator - 13https://git.io/JSt1z [14:56:26] [02miraheze/puppet] 07paladox pushed 031 commit to 03paladox-patch-4 [+0/-0/±1] 13https://git.io/JStML [14:56:27] [02miraheze/puppet] 07paladox 03d82b4ac - Update stunnel.conf [14:56:29] [02puppet] 07paladox synchronize pull request 03#2240: varnish: Add icinga/grafana/phabricator - 13https://git.io/JSt1z [14:56:50] RECOVERY - gluster3 Current Load on gluster3 is OK: OK - load average: 4.94, 5.03, 4.95 [14:56:59] [02miraheze/puppet] 07paladox pushed 031 commit to 03paladox-patch-4 [+0/-0/±1] 13https://git.io/JStM4 [14:57:00] [02miraheze/puppet] 07paladox 03429300c - Update stunnel4.pp [14:57:00] The backends check should work when JohnLewis adds databases [14:57:02] [02puppet] 07paladox synchronize pull request 03#2240: varnish: Add icinga/grafana/phabricator - 13https://git.io/JSt1z [14:57:09] dmehus: now it includes the new backends [14:57:40] RhinosF1, oh, okay, you mean the icinga alerts now monitor the new infrastructure? [14:57:56] dmehus: as of a few minutes ago ye [14:58:01] RhinosF1, ack [14:58:15] [02miraheze/puppet] 07paladox pushed 031 commit to 03paladox-patch-4 [+0/-0/±1] 13https://git.io/JStM5 [14:58:17] [02miraheze/puppet] 07paladox 0344bb901 - Update nrpe.cfg.erb [14:58:18] [02puppet] 07paladox synchronize pull request 03#2240: varnish: Add icinga/grafana/phabricator - 13https://git.io/JSt1z [14:58:45] PROBLEM - cp21 Varnish Backends on cp21 is CRITICAL: 6 backends are down. 
mw101 mw102 mw111 mw112 mw121 mw122 [15:00:54] [02miraheze/puppet] 07paladox pushed 031 commit to 03paladox-patch-4 [+0/-0/±1] 13https://git.io/JStDQ [15:00:55] [02miraheze/puppet] 07paladox 037c12411 - Update stunnel.conf [15:00:57] [02puppet] 07paladox synchronize pull request 03#2240: varnish: Add icinga/grafana/phabricator - 13https://git.io/JSt1z [15:01:40] [02puppet] 07RhinosF1 opened pull request 03#2241: [DO NOT MERGE] MediaWiki: Switchover to new infra - 13https://git.io/JStDp [15:02:10] PROBLEM - cloud4 Current Load on cloud4 is WARNING: WARNING - load average: 23.01, 21.12, 19.93 [15:03:58] [02puppet] 07RhinosF1 opened pull request 03#2242: jobrunner: switch on new DC - 13https://git.io/JSty7 [15:04:34] PROBLEM - gluster3 Current Load on gluster3 is CRITICAL: CRITICAL - load average: 7.39, 5.91, 5.33 [15:05:14] [02puppet] 07RhinosF1 synchronize pull request 03#2242: jobrunner: switch on new DC - 13https://git.io/JSty7 [15:05:16] PROBLEM - mon2 Current Load on mon2 is WARNING: WARNING - load average: 3.97, 3.73, 3.29 [15:06:01] [02puppet] 07RhinosF1 synchronize pull request 03#2242: jobrunner: switch on new DC - 13https://git.io/JSty7 [15:06:09] RECOVERY - cloud4 Current Load on cloud4 is OK: OK - load average: 18.49, 20.23, 19.88 [15:06:18] [02puppet] 07RhinosF1 synchronize pull request 03#2242: jobrunner: switch on new DC - 13https://git.io/JSty7 [15:06:31] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 5.12, 5.69, 5.33 [15:06:32] [02puppet] 07RhinosF1 synchronize pull request 03#2242: jobrunner: switch on new DC - 13https://git.io/JSty7 [15:06:48] [02puppet] 07RhinosF1 synchronize pull request 03#2242: jobrunner: switch on new DC - 13https://git.io/JSty7 [15:07:06] [02puppet] 07RhinosF1 synchronize pull request 03#2242: jobrunner: switch on new DC - 13https://git.io/JSty7 [15:07:10] RECOVERY - mon2 Current Load on mon2 is OK: OK - load average: 2.42, 3.34, 3.20 [15:09:41] [02puppet] 07paladox closed pull request 03#2240: varnish: Add icinga/grafana/phabricator - 13https://git.io/JSt1z [15:09:42] [02miraheze/puppet] 07paladox pushed 031 commit to 03master [+0/-0/±4] 13https://git.io/JStHI [15:09:44] [02miraheze/puppet] 07paladox 03a5e9099 - varnish: Add icinga/grafana/phabricator (#2240) [15:09:45] [02puppet] 07paladox deleted branch 03paladox-patch-4 - 13https://git.io/vbiAS [15:09:47] [02miraheze/puppet] 07paladox deleted branch 03paladox-patch-4 [15:09:55] paladox: grab #2242 too pls [15:10:23] PROBLEM - cp20 Stunnel Http for mw102 on cp20 is UNKNOWN: CHECK_NRPE: Receive header underflow - only 0 bytes received (4 expected). [15:10:39] [02puppet] 07paladox closed pull request 03#2242: jobrunner: switch on new DC - 13https://git.io/JSty7 [15:10:40] [02miraheze/puppet] 07paladox pushed 031 commit to 03master [+0/-0/±7] 13https://git.io/JStHa [15:10:42] [02miraheze/puppet] 07RhinosF1 0314a0599 - jobrunner: switch on new DC (#2242) [15:11:47] [02miraheze/dns] 07paladox pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/JStHj [15:11:49] [02miraheze/dns] 07paladox 0306bbe22 - Make Icinga-new and grafana-new use cache proxies [15:12:09] You're a star paladox [15:12:33] PROBLEM - cp20 Stunnel Http for mw102 on cp20 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [15:13:47] PROBLEM - cp31 Stunnel Http for mw101 on cp31 is UNKNOWN: CHECK_NRPE: Receive header underflow - only 0 bytes received (4 expected). 
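The "Varnish Backends ... backends are down" alerts above come from the health probes on each cache proxy, so the new mw101/mw102/mw111/mw112/mw121/mw122 backends will stay flagged until their stunnel endpoints answer. A quick way to see the same probe state locally on a cp host, assuming a stock varnish install:

    # List every configured backend and whether its probe currently reports it
    # healthy or sick.
    varnishadm backend.list

    # Watch probe transitions live as the new endpoints come up.
    varnishlog -g raw -i Backend_health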
[15:14:27] RECOVERY - gluster3 Current Load on gluster3 is OK: OK - load average: 3.64, 4.51, 4.99 [15:15:52] PROBLEM - cp31 Stunnel Http for mw101 on cp31 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [15:16:32] RECOVERY - cp30 Puppet on cp30 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:16:55] RECOVERY - cp31 Puppet on cp31 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:17:19] PROBLEM - cp31 Varnish Backends on cp31 is CRITICAL: 6 backends are down. mw101 mw102 mw111 mw112 mw121 mw122 [15:17:44] PROBLEM - cp30 Varnish Backends on cp30 is CRITICAL: 6 backends are down. mw101 mw102 mw111 mw112 mw121 mw122 [15:20:27] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 5.66, 5.02, 5.02 [15:21:57] PROBLEM - cloud4 Current Load on cloud4 is WARNING: WARNING - load average: 21.42, 20.72, 19.75 [15:21:58] PROBLEM - mon2 Current Load on mon2 is CRITICAL: CRITICAL - load average: 4.27, 3.43, 3.18 [15:22:02] PROBLEM - cp20 Stunnel Http for phab121 on cp20 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 5450 bytes in 2.037 second response time [15:22:07] PROBLEM - cp21 Stunnel Http for phab121 on cp21 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 5450 bytes in 2.035 second response time [15:22:10] PROBLEM - cp21 Stunnel Http for mw102 on cp21 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [15:22:12] PROBLEM - cp21 Stunnel Http for mw121 on cp21 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [15:22:17] PROBLEM - cp30 Stunnel Http for phab121 on cp30 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 5450 bytes in 2.255 second response time [15:22:27] RECOVERY - gluster3 Current Load on gluster3 is OK: OK - load average: 3.98, 4.69, 4.90 [15:22:29] PROBLEM - cp21 Stunnel Http for mw111 on cp21 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [15:22:35] PROBLEM - cp21 Stunnel Http for mw112 on cp21 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [15:22:41] PROBLEM - cp31 Stunnel Http for phab121 on cp31 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 5450 bytes in 2.374 second response time [15:22:55] PROBLEM - cp21 Stunnel Http for mw122 on cp21 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [15:22:57] PROBLEM - cp21 Stunnel Http for mw101 on cp21 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [15:23:54] RECOVERY - cloud4 Current Load on cloud4 is OK: OK - load average: 16.87, 19.41, 19.40 [15:23:56] RECOVERY - mon2 Current Load on mon2 is OK: OK - load average: 3.15, 3.24, 3.14 [15:26:29] PROBLEM - mw8 Current Load on mw8 is WARNING: WARNING - load average: 7.00, 6.55, 6.09 [15:28:27] RECOVERY - mw8 Current Load on mw8 is OK: OK - load average: 5.61, 6.15, 5.99 [15:31:41] PROBLEM - mon2 Current Load on mon2 is WARNING: WARNING - load average: 3.51, 3.57, 3.32 [15:33:39] PROBLEM - mon2 Current Load on mon2 is CRITICAL: CRITICAL - load average: 4.57, 3.87, 3.46 [15:34:39] SRE, I'm getting persistent 502s and 503s on `metawiki` and `annoyingorangewiki` for the past 30 minutes, trying to make 1 edit to 1 page, but cannot proceed. Any idea when this clear, or as to possible cause? 
[15:35:36] PROBLEM - mon2 Current Load on mon2 is WARNING: WARNING - load average: 2.69, 3.45, 3.35 [15:35:53] s/when this/when this will [15:35:53] dmehus meant to say: SRE, I'm getting persistent 502s and 503s on `metawiki` and `annoyingorangewiki` for the past 30 minutes, trying to make 1 edit to 1 page, but cannot proceed. Any idea when this will clear, or as to possible cause? [15:36:34] dmehus Now I know why you asked. Because you were attempting to edit there, right? [15:37:24] darkmatterman450, on one of the wikis, yes, but I still get persistent 502s and 503s trying to load things like Special:Log on the other, yeah [15:39:30] RECOVERY - mon2 Current Load on mon2 is OK: OK - load average: 1.99, 3.14, 3.28 [15:40:23] dmehus: meta loads fine for me [15:42:19] The old infra seems to have said if you don't want me then screw you since we started setting up all the new stuff [15:42:23] RhinosF1, it loads fine for me, but can't edit for the past 30 minutes, due to the 502s and 503s. Will try again and see if resolved [15:42:36] Ah right [15:42:38] I see [15:43:22] PROBLEM - mon2 Current Load on mon2 is CRITICAL: CRITICAL - load average: 4.60, 3.88, 3.52 [15:43:37] Edits run fine for me [15:44:37] RhinosF1, hrm, I wonder if it could be the frontend/backend I'm using then? [15:44:49] can I try proxying to Meta via another cp? [15:44:56] You seem to worse affected than anyone else [15:45:05] You could try proxy through test [15:45:17] Or force a different backend [15:45:18] Remind me how to do that with MirahezeDebug [15:45:19] PROBLEM - mon2 Current Load on mon2 is WARNING: WARNING - load average: 3.57, 3.67, 3.48 [15:45:29] or the latter, whichever you recommend [15:45:34] If you want to do test then just look for the test3 backend [15:46:20] well it will route me to the cp that's closest to me, can I not bypass that and use a different cp? [15:46:52] `There seems to be a problem with your login session; this action has been canceled as a precaution against session hijacking. Please resubmit the form.` on betaheze [15:47:00] I'll try clearing my betaheze cookies [15:47:02] That's your cookies [15:47:11] as I assume betaheze isn't using loginwiki for central login? [15:47:16] RECOVERY - mon2 Current Load on mon2 is OK: OK - load average: 2.21, 3.18, 3.32 [15:47:20] No [15:47:24] ok [15:47:31] it uses betawiki [15:47:35] ah [15:47:38] ok [15:47:51] We've reset that many stuff setting up beta that if you've only used it once then it will be broken [15:49:51] did you reset passwords? [15:50:41] Yes [15:50:45] will try resetting password then [15:50:56] Please don't use the same password as production [15:51:01] ok [15:51:10] As much as it should be as secure [15:51:15] It's not the best [15:51:27] As it will be used for testing [15:51:48] ok [15:53:10] PROBLEM - mon2 Current Load on mon2 is CRITICAL: CRITICAL - load average: 4.69, 3.70, 3.47 [15:53:29] If the user tables on betaheze are separate, how can I proxy to Meta then? 
[15:55:09] PROBLEM - mon2 Current Load on mon2 is WARNING: WARNING - load average: 3.99, 3.82, 3.55 [15:56:31] PROBLEM - gluster3 Current Load on gluster3 is CRITICAL: CRITICAL - load average: 6.38, 5.73, 5.05 [15:58:14] PROBLEM - mw9 Current Load on mw9 is WARNING: WARNING - load average: 7.43, 6.74, 6.09 [15:58:26] RhinosF1: not sure if you saw my question above [15:58:45] dmehus: because it still has access [15:58:55] It's flipped based on which wiki you're accessing [15:59:05] RhinosF1, oh, ok [15:59:18] so remind me again how to proxy to meta using betaheze [15:59:26] PROBLEM - cp31 Stunnel Http for mw9 on cp31 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [15:59:28] there's a special page I need to use, correct? [15:59:51] Install the Miraheze debug chrome extension [15:59:57] oh [15:59:59] PROBLEM - mw9 MediaWiki Rendering on mw9 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:59:59] Click test3 [16:00:01] it's a Chrome extension [16:00:13] thought it was a MediaWiki extension lol [16:00:22] who built that? [16:01:18] I can't find it in the Chrome Web Store [16:01:24] RECOVERY - cp31 Stunnel Http for mw9 on cp31 is OK: HTTP OK: HTTP/1.1 200 OK - 14556 bytes in 0.332 second response time [16:01:33] [02puppet] 07RhinosF1 opened pull request 03#2243: SVSVG-mediawiki: up workers as we have more memory - 13https://git.io/JSqTi [16:01:54] RECOVERY - mw9 MediaWiki Rendering on mw9 is OK: HTTP OK: HTTP/1.1 200 OK - 20524 bytes in 0.466 second response time [16:02:14] RECOVERY - mw9 Current Load on mw9 is OK: OK - load average: 4.81, 6.41, 6.16 [16:02:22] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 3.41, 5.22, 5.16 [16:02:51] [02puppet] 07RhinosF1 synchronize pull request 03#2243: SVSVG-mediawiki: up workers as we have more memory - 13https://git.io/JSqTi [16:03:03] dmehus: https://github.com/miraheze/MirahezeDebug [16:03:04] [url] GitHub - miraheze/MirahezeDebug | github.com [16:03:09] PROBLEM - mon2 Current Load on mon2 is CRITICAL: CRITICAL - load average: 4.24, 3.76, 3.59 [16:03:10] [02puppet] 07RhinosF1 synchronize pull request 03#2243: SVSVG-mediawiki: up workers as we have more memory - 13https://git.io/JSqTi [16:03:28] [02puppet] 07RhinosF1 synchronize pull request 03#2243: SVSVG-mediawiki: up workers as we have more memory - 13https://git.io/JSqTi [16:03:38] MacFan4000, thanks. seems like too much work to try and manually install and the install docs are scant, how can I switch the cache proxy I'm using? 
[16:03:43] I'd rather try that [16:03:59] https://www.mattcutts.com/blog/how-to-install-a-chrome-extension-from-github/ [16:03:59] [url] How to install a Chrome extension from GitHub | www.mattcutts.com [16:04:04] [02puppet] 07RhinosF1 synchronize pull request 03#2243: SVSVG-mediawiki: up workers as we have more memory - 13https://git.io/JSqTi [16:04:09] Only with the extension can that be done [16:04:21] [02puppet] 07RhinosF1 synchronize pull request 03#2243: SVSVG-mediawiki: up workers as we have more memory - 13https://git.io/JSqTi [16:04:22] oh [16:04:53] Switching cache proxy is harder [16:04:58] As you'd have to trick dns [16:05:11] paladox: can you merge 2243 [16:05:30] [02puppet] 07paladox closed pull request 03#2243: SVSVG-mediawiki: up workers as we have more memory - 13https://git.io/JSqTi [16:05:31] [02miraheze/puppet] 07paladox pushed 031 commit to 03master [+0/-0/±6] 13https://git.io/JSqY9 [16:05:33] [02miraheze/puppet] 07RhinosF1 03e2d5191 - SVSVG-mediawiki: up workers as we have more memory (#2243) [16:05:40] Thank you! [16:05:58] Commit message was wrong [16:06:00] But meh [16:06:05] I type too fast [16:08:15] RECOVERY - gluster3 Current Load on gluster3 is OK: OK - load average: 4.70, 4.85, 5.01 [16:09:35] RhinosF1, yeah, using a VPN located near the cache proxy one is trying to use, I assume? Assuming so, that's definitely more involved [16:10:12] dmehus: you can also override your nameservers on most computers [16:10:16] But that's also a way [16:11:10] RhinosF1, oh, yeah, but that wouldn't change my IP, but if you're saying you can just override it locally with a Windows Host file change, yeah, that's definitely easier [16:11:27] It should work [16:12:09] * dmehus will try editing the page in question again, hoping the problem as subsided [16:13:09] PROBLEM - mon2 Current Load on mon2 is WARNING: WARNING - load average: 2.65, 3.80, 3.83 [16:14:15] nope [16:14:37] I seem to be hitting mw8 and cp30 on `testwiki`, which, oddly, loads fine. It's just `metawiki` [16:15:28] RhinosF1, can you try proxying to `metawiki` via cp30 and/or mw8? [16:16:30] I load fine on mw11 / cp21 [16:16:38] I'm mobile at moment [16:17:09] PROBLEM - mon2 Current Load on mon2 is CRITICAL: CRITICAL - load average: 4.08, 4.04, 3.93 [16:17:51] hrm [16:18:31] [02miraheze/puppet] 07paladox pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/JSqST [16:18:33] [02miraheze/puppet] 07paladox 03878f912 - Fix typo [16:19:09] PROBLEM - mon2 Current Load on mon2 is WARNING: WARNING - load average: 2.89, 3.58, 3.77 [16:19:16] RECOVERY - cloud10 SMART on cloud10 is OK: OK: [cciss,0] - Device is clean --- [cciss,1] - Device is clean --- [cciss,2] - Device is clean --- [cciss,3] - Device is clean --- [cciss,4] - Device is clean --- [cciss,5] - Device is clean --- [cciss,6] - Device is clean [16:19:42] PROBLEM - cp30 Stunnel Http for mw102 on cp30 is UNKNOWN: CHECK_NRPE: Receive header underflow - only 0 bytes received (4 expected). 
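A lighter-weight way to test a specific cache proxy than editing the Windows hosts file or switching nameservers, for anyone following along: pin the hostname to the proxy's address for a single request. A minimal PHP sketch using curl's CURLOPT_RESOLVE, assuming 198.244.148.90 (one of the cpweb addresses in the GDNSD alerts earlier in this log) is the proxy to test — which cp host that address actually belongs to is an assumption here, not something stated in the channel:

```php
<?php
// Sketch only: send one request to metawiki through a chosen cache proxy by
// overriding DNS for this request alone. TLS still validates against the
// real hostname; only the address lookup is replaced.
$proxyIp = '198.244.148.90';   // taken from the GDNSD alert output above; adjust as needed
$host    = 'meta.miraheze.org';

$ch = curl_init( "https://$host/wiki/Special:Version" );
curl_setopt_array( $ch, [
    // Resolve $host:443 to $proxyIp instead of whatever GeoDNS would hand out.
    CURLOPT_RESOLVE        => [ "$host:443:$proxyIp" ],
    CURLOPT_NOBODY         => true,   // status line + headers are enough for a quick check
    CURLOPT_HEADER         => true,
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_TIMEOUT        => 10,
] );
$response = curl_exec( $ch );
echo $response === false ? 'curl error: ' . curl_error( $ch ) . "\n" : $response;
curl_close( $ch );
```

This is the same trick as the hosts-file edit discussed above, just scoped to a single request, so normal browsing keeps going through GeoDNS.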
[16:19:44] RECOVERY - cloud11 SMART on cloud11 is OK: OK: [cciss,0] - Device is clean --- [cciss,1] - Device is clean --- [cciss,2] - Device is clean --- [cciss,3] - Device is clean --- [cciss,4] - Device is clean --- [cciss,5] - Device is clean --- [cciss,6] - Device is clean [16:19:59] RECOVERY - cloud12 SMART on cloud12 is OK: OK: [cciss,0] - Device is clean --- [cciss,1] - Device is clean --- [cciss,2] - Device is clean --- [cciss,3] - Device is clean --- [cciss,4] - Device is clean --- [cciss,5] - Device is clean --- [cciss,6] - Device is clean [16:20:16] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 5.97, 5.01, 4.89 [16:20:48] PROBLEM - ns1 GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 3 datacenters are down: 198.244.148.90/cpweb, 2001:41d0:801:2000::1b80/cpweb, 2607:5300:201:3100::5ebc/cpweb [16:20:51] PROBLEM - cloud4 Current Load on cloud4 is WARNING: WARNING - load average: 23.12, 20.72, 19.47 [16:21:27] PROBLEM - cp30 Stunnel Http for mw9 on cp30 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [16:21:36] PROBLEM - cp21 Stunnel Http for mw9 on cp21 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [16:21:38] PROBLEM - cp31 Stunnel Http for mw9 on cp31 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [16:21:49] PROBLEM - cp30 Stunnel Http for mw102 on cp30 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [16:21:58] PROBLEM - mw9 MediaWiki Rendering on mw9 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:22:16] RECOVERY - gluster3 Current Load on gluster3 is OK: OK - load average: 4.95, 5.03, 4.92 [16:22:22] PROBLEM - ns2 GDNSD Datacenters on ns2 is CRITICAL: CRITICAL - 3 datacenters are down: 51.195.220.68/cpweb, 149.56.140.43/cpweb, 2607:5300:201:3100::5ebc/cpweb [16:22:51] RECOVERY - cloud4 Current Load on cloud4 is OK: OK - load average: 19.93, 20.23, 19.44 [16:23:31] RECOVERY - cp30 Stunnel Http for mw9 on cp30 is OK: HTTP OK: HTTP/1.1 200 OK - 14564 bytes in 3.890 second response time [16:23:32] RECOVERY - cp21 Stunnel Http for mw9 on cp21 is OK: HTTP OK: HTTP/1.1 200 OK - 14556 bytes in 2.534 second response time [16:23:36] RECOVERY - cp31 Stunnel Http for mw9 on cp31 is OK: HTTP OK: HTTP/1.1 200 OK - 14556 bytes in 0.854 second response time [16:23:53] RECOVERY - mw9 MediaWiki Rendering on mw9 is OK: HTTP OK: HTTP/1.1 200 OK - 20524 bytes in 0.553 second response time [16:24:22] RECOVERY - ns2 GDNSD Datacenters on ns2 is OK: OK - all datacenters are online [16:24:48] RECOVERY - ns1 GDNSD Datacenters on ns1 is OK: OK - all datacenters are online [16:26:16] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 5.94, 5.55, 5.15 [16:28:15] PROBLEM - gluster3 Current Load on gluster3 is CRITICAL: CRITICAL - load average: 6.56, 5.75, 5.26 [16:30:16] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 5.91, 5.77, 5.33 [16:31:09] RECOVERY - mon2 Current Load on mon2 is OK: OK - load average: 3.35, 3.04, 3.35 [16:33:32] [02mw-config] 07Universal-Omega commented on pull request 03#4328: Split `$wgCreateWikiCannedResponses` into sections - 13https://git.io/JSmgd [16:34:15] PROBLEM - gluster3 Current Load on gluster3 is CRITICAL: CRITICAL - load average: 8.93, 6.48, 5.65 [16:35:52] [02miraheze/jobrunner-service] 07paladox pushed 031 commit to 03paladox-patch-1 [+0/-0/±1] 13https://git.io/JSmKz [16:35:53] [02miraheze/jobrunner-service] 07paladox 0308c8fcc - Add support for ipv6 [16:35:55] 
[02jobrunner-service] 07paladox created branch 03paladox-patch-1 - 13https://git.io/JYA8S [16:35:56] [02jobrunner-service] 07paladox opened pull request 03#2: Add support for ipv6 - 13https://git.io/JSmK6 [16:36:28] paladox: that won't fix it [16:36:39] Port is still defined: [16:36:55] Both need to be separated [16:37:19] Oh, i guess a new section for the port? [16:37:39] IPv6 standard is [
]:port [16:37:56] so config should read like that [16:38:17] ohhhhhh [16:38:33] JohnLewis: how would i get it to do that? [16:39:33] https://www.irccloud.com/pastebin/9y9YFIFR/ [16:39:41] PROBLEM - mw8 Current Load on mw8 is WARNING: WARNING - load average: 6.79, 7.05, 6.50 [16:40:40] the code still splits the first : [16:41:09] PROBLEM - mon2 Current Load on mon2 is WARNING: WARNING - load average: 3.05, 3.52, 3.47 [16:41:49] PROBLEM - mw111 JobRunner Service on mw111 is CRITICAL: PROCS CRITICAL: 0 processes with args 'redisJobRunnerService' [16:42:25] JohnLewis: it needs to be last [16:42:36] Also why did jobchron never alert [16:42:54] no, it doesn't need to be last [16:43:04] PROBLEM - ns1 GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 6 datacenters are down: 198.244.148.90/cpweb, 2001:41d0:801:2000::4c25/cpweb, 149.56.140.43/cpweb, 149.56.141.75/cpweb, 2607:5300:201:3100::929a/cpweb, 2607:5300:201:3100::5ebc/cpweb [16:43:08] It needs to correct recognise the IPv6, and probably use preg_split [16:43:10] RECOVERY - mon2 Current Load on mon2 is OK: OK - load average: 2.36, 3.17, 3.35 [16:43:13] PROBLEM - cp21 Stunnel Http for mw12 on cp21 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [16:43:14] PROBLEM - ns2 GDNSD Datacenters on ns2 is CRITICAL: CRITICAL - 5 datacenters are down: 198.244.148.90/cpweb, 2001:41d0:801:2000::4c25/cpweb, 149.56.140.43/cpweb, 149.56.141.75/cpweb, 2607:5300:201:3100::929a/cpweb [16:43:41] RECOVERY - mw8 Current Load on mw8 is OK: OK - load average: 5.69, 6.43, 6.39 [16:43:48] PROBLEM - mw12 MediaWiki Rendering on mw12 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:43:51] PROBLEM - cp20 Stunnel Http for mw12 on cp20 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [16:43:54] PROBLEM - cp31 Stunnel Http for mw9 on cp31 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [16:43:58] PROBLEM - cp30 Stunnel Http for mw8 on cp30 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [16:43:58] PROBLEM - mw9 MediaWiki Rendering on mw9 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:44:02] PROBLEM - cp31 Stunnel Http for mw12 on cp31 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [16:44:08] PROBLEM - cp30 Stunnel Http for mw12 on cp30 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [16:44:15] PROBLEM - cp30 Stunnel Http for mw9 on cp30 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [16:44:23] https://stackoverflow.com/questions/19275221/regex-to-determine-if-ipv6-or-ipv4-and-if-port-is-given from a google search [16:44:23] [url] php - Regex to determine, if IPv6 or IPv4 and if port is given - Stack Overflow | stackoverflow.com [16:45:02] PROBLEM - cp20 Stunnel Http for mw9 on cp20 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [16:45:24] PROBLEM - cloud4 Current Load on cloud4 is WARNING: WARNING - load average: 21.20, 20.98, 19.95 [16:45:36] PROBLEM - cp21 Stunnel Http for mw9 on cp21 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. 
[16:45:56] RECOVERY - cp30 Stunnel Http for mw8 on cp30 is OK: HTTP OK: HTTP/1.1 200 OK - 14546 bytes in 0.998 second response time [16:45:56] RECOVERY - cp20 Stunnel Http for mw12 on cp20 is OK: HTTP OK: HTTP/1.1 200 OK - 14565 bytes in 9.040 second response time [16:46:04] RECOVERY - cp31 Stunnel Http for mw12 on cp31 is OK: HTTP OK: HTTP/1.1 200 OK - 14565 bytes in 4.576 second response time [16:46:11] RECOVERY - cp30 Stunnel Http for mw12 on cp30 is OK: HTTP OK: HTTP/1.1 200 OK - 14565 bytes in 3.564 second response time [16:46:46] https://stackoverflow.com/questions/19275221/regex-to-determine-if-ipv6-or-ipv4-and-if-port-is-given [16:46:47] [url] php - Regex to determine, if IPv6 or IPv4 and if port is given - Stack Overflow | stackoverflow.com [16:46:53] RECOVERY - ns1 GDNSD Datacenters on ns1 is OK: OK - all datacenters are online [16:46:54] oh i got the same as you :P [16:47:03] RECOVERY - cp20 Stunnel Http for mw9 on cp20 is OK: HTTP OK: HTTP/1.1 200 OK - 14556 bytes in 3.561 second response time [16:47:13] RECOVERY - ns2 GDNSD Datacenters on ns2 is OK: OK - all datacenters are online [16:47:16] RECOVERY - cp21 Stunnel Http for mw12 on cp21 is OK: HTTP OK: HTTP/1.1 200 OK - 14557 bytes in 2.716 second response time [16:47:30] RECOVERY - cp21 Stunnel Http for mw9 on cp21 is OK: HTTP OK: HTTP/1.1 200 OK - 14556 bytes in 0.888 second response time [16:47:47] RECOVERY - mw12 MediaWiki Rendering on mw12 is OK: HTTP OK: HTTP/1.1 200 OK - 20526 bytes in 1.688 second response time [16:47:59] RECOVERY - mw9 MediaWiki Rendering on mw9 is OK: HTTP OK: HTTP/1.1 200 OK - 20524 bytes in 1.233 second response time [16:48:01] RECOVERY - cp31 Stunnel Http for mw9 on cp31 is OK: HTTP OK: HTTP/1.1 200 OK - 14556 bytes in 1.180 second response time [16:48:26] RECOVERY - cp30 Stunnel Http for mw9 on cp30 is OK: HTTP OK: HTTP/1.1 200 OK - 14564 bytes in 1.591 second response time [16:50:41] @raidarr: if a specific page won't load then it means it's taking too long to parse [16:51:00] Someone must have put a stupid template on [16:51:03] https://www.irccloud.com/pastebin/6YqqSdFK/ [16:51:12] PROBLEM - ns2 GDNSD Datacenters on ns2 is CRITICAL: CRITICAL - 7 datacenters are down: 51.195.220.68/cpweb, 198.244.148.90/cpweb, 2001:41d0:801:2000::4c25/cpweb, 2001:41d0:801:2000::1b80/cpweb, 149.56.140.43/cpweb, 149.56.141.75/cpweb, 2607:5300:201:3100::929a/cpweb [16:52:04] PROBLEM - mw12 MediaWiki Rendering on mw12 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:52:06] Wonderful, presumably not the only page with that problem then [16:52:15] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 4.37, 5.68, 5.95 [16:52:18] PROBLEM - cp20 Stunnel Http for mw12 on cp20 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [16:52:25] PROBLEM - cp31 Stunnel Http for mw12 on cp31 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [16:52:43] PROBLEM - cp30 Stunnel Http for mw12 on cp30 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [16:52:48] PROBLEM - ns1 GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 3 datacenters are down: 51.195.220.68/cpweb, 2001:41d0:801:2000::4c25/cpweb, 2607:5300:201:3100::5ebc/cpweb [16:53:15] I can't see anything super obvious [16:53:22] But that's definitely slow [16:53:38] paladox: you want preg_split though [16:53:41] PROBLEM - cp21 Stunnel Http for mw12 on cp21 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. 
[16:53:47] because it's a list = explode [16:53:49] oh [16:54:16] PROBLEM - gluster3 Current Load on gluster3 is CRITICAL: CRITICAL - load average: 7.27, 6.46, 6.21 [16:54:20] RECOVERY - cp20 Stunnel Http for mw12 on cp20 is OK: HTTP OK: HTTP/1.1 200 OK - 14557 bytes in 5.549 second response time [16:54:30] I can try and profile it later @raidarr [16:54:30] RECOVERY - cp31 Stunnel Http for mw12 on cp31 is OK: HTTP OK: HTTP/1.1 200 OK - 14557 bytes in 8.034 second response time [16:54:38] JohnLewis: [16:54:39] https://www.irccloud.com/pastebin/7fZDiNEI/ [16:54:49] RECOVERY - cp30 Stunnel Http for mw12 on cp30 is OK: HTTP OK: HTTP/1.1 200 OK - 14557 bytes in 6.877 second response time [16:55:11] RECOVERY - ns2 GDNSD Datacenters on ns2 is OK: OK - all datacenters are online [16:55:41] RECOVERY - cp21 Stunnel Http for mw12 on cp21 is OK: HTTP OK: HTTP/1.1 200 OK - 14557 bytes in 4.512 second response time [16:56:08] RECOVERY - mw12 MediaWiki Rendering on mw12 is OK: HTTP OK: HTTP/1.1 200 OK - 20526 bytes in 5.192 second response time [16:56:48] RECOVERY - ns1 GDNSD Datacenters on ns1 is OK: OK - all datacenters are online [16:57:09] RECOVERY - cloud4 Current Load on cloud4 is OK: OK - load average: 18.97, 19.87, 20.01 [16:57:10] PROBLEM - mon2 Current Load on mon2 is WARNING: WARNING - load average: 3.66, 3.60, 3.36 [16:58:18] [02mw-config] 07Universal-Omega commented on pull request 03#4329: Add `$wgDarkModeTogglePosition` to ManageWikiSettings - 13https://git.io/JSYVG [16:58:24] [02mw-config] 07Universal-Omega closed pull request 03#4329: Add `$wgDarkModeTogglePosition` to ManageWikiSettings - 13https://git.io/JStuZ [16:58:26] [02miraheze/mw-config] 07Universal-Omega pushed 031 commit to 03master [+0/-0/±2] 13https://git.io/JSYVc [16:58:27] [02miraheze/mw-config] 07lens0021 031f4943a - Add `$wgDarkModeTogglePosition` to ManageWikiSettings (#4329) [16:58:31] you'll just have to add another line for the list on matches [16:59:11] PROBLEM - mon2 Current Load on mon2 is CRITICAL: CRITICAL - load average: 4.83, 4.00, 3.53 [16:59:31] miraheze/mw-config - Universal-Omega the build passed. [17:00:17] PROBLEM - mw8 MediaWiki Rendering on mw8 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:00:28] PROBLEM - cp30 Stunnel Http for mw8 on cp30 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [17:00:31] PROBLEM - cp20 Stunnel Http for mw8 on cp20 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [17:00:49] JohnLewis: i found https://github.com/wikimedia/ip-utils/blob/master/src/IPUtils.php#L337 [17:00:49] [url] ip-utils/IPUtils.php at master · wikimedia/ip-utils · GitHub | github.com [17:01:09] PROBLEM - mon2 Current Load on mon2 is WARNING: WARNING - load average: 3.62, 3.87, 3.54 [17:01:13] PROBLEM - cp31 Stunnel Http for mw8 on cp31 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [17:01:31] that's also an option [17:02:06] [02miraheze/CreateWiki] 07Universal-Omega pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/JSYwg [17:02:07] [02miraheze/CreateWiki] 07Universal-Omega 03ee70f38 - Use key rather than 'name' when disabling extensions [17:02:20] PROBLEM - cp21 Stunnel Http for mw8 on cp21 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. 
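To make the exchange above concrete: per the discussion, the jobrunner config carries its redis servers as a list of host:port strings, so splitting on the first ':' falls apart once the host is an IPv6 address, while the bracketed [ipv6]:port form JohnLewis describes can be picked apart safely. A rough sketch of that parsing, using preg_match rather than preg_split but with the same idea — illustrative only, not the code that went into the jobrunner-service pull requests (IPUtils::splitHostAndPort() in the wikimedia/ip-utils file linked above handles the same cases more thoroughly):

```php
<?php
// Illustrative sketch of the host:port parsing being discussed, not the
// actual RedisJobService.php change. 6379 is just redis' default port,
// used as a stand-in value.
function splitServer( string $server ): array {
    // Bracketed IPv6, e.g. "[2001:db8::1]:6379" -> [ "2001:db8::1", 6379 ]
    if ( preg_match( '/^\[([0-9A-Fa-f:.]+)\](?::(\d+))?$/', $server, $m ) ) {
        return [ $m[1], isset( $m[2] ) ? (int)$m[2] : null ];
    }
    // Hostname or IPv4 address with at most one ":port" on the end.
    $parts = explode( ':', $server, 2 );
    return [ $parts[0], isset( $parts[1] ) ? (int)$parts[1] : null ];
}

// What goes wrong without the brackets: explode() cuts at the first colon,
// so the "host" becomes "2001" and the remainder gets treated as the port.
var_dump( explode( ':', '2001:db8::1:6379', 2 ) );

// With the [ipv6]:port convention both parts come out intact.
var_dump( splitServer( '[2001:db8::1]:6379' ) );   // "2001:db8::1", 6379
var_dump( splitServer( 'localhost:6379' ) );       // "localhost", 6379
```

Something like `preg_split( '/:(?=\d+$)/', $server )`, splitting only on a colon followed by a trailing port number, would presumably give the same result and is closer to what was suggested in the channel.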
[17:02:57] PROBLEM - cloud4 Current Load on cloud4 is WARNING: WARNING - load average: 22.42, 21.16, 20.51 [17:03:09] RECOVERY - mon2 Current Load on mon2 is OK: OK - load average: 2.91, 3.37, 3.39 [17:03:18] RECOVERY - cp31 Stunnel Http for mw8 on cp31 is OK: HTTP OK: HTTP/1.1 200 OK - 14546 bytes in 8.352 second response time [17:03:37] PROBLEM - ns1 GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 5 datacenters are down: 198.244.148.90/cpweb, 2001:41d0:801:2000::4c25/cpweb, 149.56.140.43/cpweb, 2607:5300:201:3100::929a/cpweb, 2607:5300:201:3100::5ebc/cpweb [17:04:05] PROBLEM - cp21 Stunnel Http for mw12 on cp21 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [17:04:15] RECOVERY - cp21 Stunnel Http for mw8 on cp21 is OK: HTTP OK: HTTP/1.1 200 OK - 14546 bytes in 0.014 second response time [17:04:16] RECOVERY - mw8 MediaWiki Rendering on mw8 is OK: HTTP OK: HTTP/1.1 200 OK - 20514 bytes in 0.275 second response time [17:04:29] RECOVERY - cp20 Stunnel Http for mw8 on cp20 is OK: HTTP OK: HTTP/1.1 200 OK - 14546 bytes in 0.094 second response time [17:04:34] RECOVERY - cp30 Stunnel Http for mw8 on cp30 is OK: HTTP OK: HTTP/1.1 200 OK - 14546 bytes in 2.482 second response time [17:04:55] PROBLEM - cp31 Stunnel Http for mw12 on cp31 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [17:05:23] [02CreateWiki] 07Universal-Omega edited pull request 03#272: Use selectorother type for canned responses to support custom - 13https://git.io/JSv2X [17:05:32] RECOVERY - ns1 GDNSD Datacenters on ns1 is OK: OK - all datacenters are online [17:05:40] RECOVERY - mw111 JobRunner Service on mw111 is OK: PROCS OK: 1 process with args 'redisJobRunnerService' [17:06:09] RECOVERY - cp21 Stunnel Http for mw12 on cp21 is OK: HTTP OK: HTTP/1.1 200 OK - 14557 bytes in 9.598 second response time [17:07:42] miraheze/CreateWiki - Universal-Omega the build passed. [17:08:40] [02CreateWiki] 07Universal-Omega closed pull request 03#272: Use selectorother type for canned responses to support custom - 13https://git.io/JSv2X [17:08:41] [02miraheze/CreateWiki] 07Universal-Omega pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/JSYPw [17:08:43] [02miraheze/CreateWiki] 07Universal-Omega 03b36257c - Use selectorother type for canned responses to support custom (#272) [17:08:44] [02miraheze/CreateWiki] 07Universal-Omega deleted branch 03Universal-Omega-patch-1 [17:08:46] [02CreateWiki] 07Universal-Omega deleted branch 03Universal-Omega-patch-1 - 13https://git.io/vpJTL [17:09:01] RECOVERY - cp31 Stunnel Http for mw12 on cp31 is OK: HTTP OK: HTTP/1.1 200 OK - 14557 bytes in 3.095 second response time [17:11:31] !log [@mw11] starting deploy of {'config': True} to ovlon [17:12:07] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [17:12:55] PROBLEM - mw12 Current Load on mw12 is WARNING: WARNING - load average: 6.61, 6.90, 6.43 [17:13:11] PROBLEM - mon2 Current Load on mon2 is CRITICAL: CRITICAL - load average: 4.93, 4.19, 3.69 [17:13:26] [02miraheze/jobrunner-service] 07paladox pushed 031 commit to 03paladox-patch-2 [+0/-0/±1] 13https://git.io/JSYy0 [17:13:28] [02miraheze/jobrunner-service] 07paladox 036b6f7eb - Add support for ipv6 [17:13:29] [02jobrunner-service] 07paladox created branch 03paladox-patch-2 - 13https://git.io/JYA8S [17:13:34] [02jobrunner-service] 07paladox opened pull request 03#3: Add support for ipv6 - 13https://git.io/JSYy2 [17:13:54] JohnLewis: ^ lgty? [17:14:30] miraheze/CreateWiki - Universal-Omega the build passed. 
[17:14:51] RECOVERY - cloud4 Current Load on cloud4 is OK: OK - load average: 18.12, 19.75, 20.34 [17:15:09] PROBLEM - mon2 Current Load on mon2 is WARNING: WARNING - load average: 3.64, 3.80, 3.60 [17:15:43] paladox: what is "self::RE_IPV6_ADD"? [17:15:59] [02miraheze/jobrunner-service] 07paladox pushed 031 commit to 03paladox-patch-2 [+0/-0/±1] 13https://git.io/JSYSa [17:16:00] i forgot to add it, now added. JohnLewis [17:16:00] [02miraheze/jobrunner-service] 07paladox 0342d9fb8 - Update RedisJobService.php [17:16:02] [02jobrunner-service] 07paladox synchronize pull request 03#3: Add support for ipv6 - 13https://git.io/JSYy2 [17:16:27] [02jobrunner-service] 07paladox edited pull request 03#3: Add support for ipv6 - 13https://git.io/JSYy2 [17:16:49] RECOVERY - mw12 Current Load on mw12 is OK: OK - load average: 6.72, 6.65, 6.45 [17:17:10] PROBLEM - mon2 Current Load on mon2 is CRITICAL: CRITICAL - load average: 4.09, 3.94, 3.68 [17:17:18] heh I like the "lgty" abbreviation, paladox. Reverse of "lgtm" [17:17:30] heh [17:17:49] !log [@test101] starting deploy of {'config': True} to skip [17:17:50] !log [@test101] DEPLOY ABORTED: Canary check failed for localhost [17:18:03] paladox: bar the fact the last const would be a syntax error, looks good [17:18:15] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 4.57, 5.17, 5.91 [17:18:25] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [17:18:48] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [17:19:01] [02miraheze/jobrunner-service] 07paladox pushed 031 commit to 03paladox-patch-2 [+0/-0/±1] 13https://git.io/JSY9N [17:19:02] [02miraheze/jobrunner-service] 07paladox 032de7e22 - Update RedisJobService.php [17:19:04] Fixed, thanks! [17:19:04] [02jobrunner-service] 07paladox synchronize pull request 03#3: Add support for ipv6 - 13https://git.io/JSYy2 [17:19:09] merging [17:19:09] PROBLEM - mon2 Current Load on mon2 is WARNING: WARNING - load average: 3.36, 3.78, 3.65 [17:19:17] PROBLEM - mw11 Puppet on mw11 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[MediaWiki Config Sync] [17:19:50] [02jobrunner-service] 07paladox closed pull request 03#3: Add support for ipv6 - 13https://git.io/JSYy2 [17:19:51] [02miraheze/jobrunner-service] 07paladox pushed 034 commits to 03paladox-patch-1 [+0/-0/±4] 13https://git.io/JSYHI [17:19:53] [02miraheze/jobrunner-service] 07paladox 038d325d4 - Merge pull request #3 from miraheze/paladox-patch-2 [17:19:54] [02jobrunner-service] 07paladox synchronize pull request 03#2: Add support for ipv6 - 13https://git.io/JSmK6 [17:20:15] PROBLEM - gluster3 Current Load on gluster3 is CRITICAL: CRITICAL - load average: 6.57, 5.87, 6.08 [17:20:20] [02miraheze/puppet] 07paladox pushed 031 commit to 03paladox-patch-4 [+0/-0/±1] 13https://git.io/JSYHC [17:20:21] [02miraheze/puppet] 07paladox 03c529432 - Fix ipv6 address for new mw cluster (redis) [17:20:23] [02puppet] 07paladox created branch 03paladox-patch-4 - 13https://git.io/vbiAS [17:20:24] [02puppet] 07paladox opened pull request 03#2244: Fix ipv6 address for new mw cluster (redis) - 13https://git.io/JSYH8 [17:20:39] PROBLEM - test101 Puppet on test101 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. 
Failed resources (up to 3 shown): Exec[MediaWiki Config Sync] [17:20:43] [02miraheze/puppet] 07paladox pushed 031 commit to 03paladox-patch-4 [+0/-0/±1] 13https://git.io/JSYH2 [17:20:44] [02miraheze/puppet] 07paladox 03b300d87 - Update mw102.yaml [17:20:46] [02puppet] 07paladox synchronize pull request 03#2244: Fix ipv6 address for new mw cluster (redis) - 13https://git.io/JSYH8 [17:20:53] PROBLEM - cloud4 Current Load on cloud4 is WARNING: WARNING - load average: 23.13, 22.17, 21.11 [17:20:58] [02miraheze/puppet] 07paladox pushed 031 commit to 03paladox-patch-4 [+0/-0/±1] 13https://git.io/JSYHo [17:20:59] [02miraheze/puppet] 07paladox 03fbb068e - Update mw111.yaml [17:21:01] [02puppet] 07paladox synchronize pull request 03#2244: Fix ipv6 address for new mw cluster (redis) - 13https://git.io/JSYH8 [17:21:11] [02miraheze/puppet] 07paladox pushed 031 commit to 03paladox-patch-4 [+0/-0/±1] 13https://git.io/JSYHM [17:21:13] [02miraheze/puppet] 07paladox 0302be319 - Update mw112.yaml [17:21:14] [02puppet] 07paladox synchronize pull request 03#2244: Fix ipv6 address for new mw cluster (redis) - 13https://git.io/JSYH8 [17:21:23] [02miraheze/puppet] 07paladox pushed 031 commit to 03paladox-patch-4 [+0/-0/±1] 13https://git.io/JSYHH [17:21:24] [02miraheze/puppet] 07paladox 0334cbeb5 - Update mw121.yaml [17:21:26] [02puppet] 07paladox synchronize pull request 03#2244: Fix ipv6 address for new mw cluster (redis) - 13https://git.io/JSYH8 [17:21:37] [02miraheze/puppet] 07paladox pushed 031 commit to 03paladox-patch-4 [+0/-0/±1] 13https://git.io/JSYHN [17:21:38] [02miraheze/puppet] 07paladox 031b9535f - Update mw122.yaml [17:21:40] [02puppet] 07paladox synchronize pull request 03#2244: Fix ipv6 address for new mw cluster (redis) - 13https://git.io/JSYH8 [17:21:48] [02miraheze/puppet] 07paladox pushed 031 commit to 03paladox-patch-4 [+0/-0/±1] 13https://git.io/JSYHh [17:21:50] [02miraheze/puppet] 07paladox 0300e8250 - Update mwtask111.yaml [17:21:51] [02puppet] 07paladox synchronize pull request 03#2244: Fix ipv6 address for new mw cluster (redis) - 13https://git.io/JSYH8 [17:21:59] !log [@mwtask111] starting deploy of {'config': True} to scsvg [17:22:00] !log [@mwtask111] DEPLOY ABORTED: Canary check failed for localhost [17:22:01] PROBLEM - ns2 GDNSD Datacenters on ns2 is CRITICAL: CRITICAL - 3 datacenters are down: 51.195.220.68/cpweb, 149.56.140.43/cpweb, 149.56.141.75/cpweb [17:22:07] [02puppet] 07paladox closed pull request 03#2244: Fix ipv6 address for new mw cluster (redis) - 13https://git.io/JSYH8 [17:22:08] [02miraheze/puppet] 07paladox pushed 031 commit to 03master [+0/-0/±7] 13https://git.io/JSYQT [17:22:10] [02miraheze/puppet] 07paladox 039d3f342 - Fix ipv6 address for new mw cluster (redis) (#2244) [17:22:10] Ooh, it does something [17:22:11] [02puppet] 07paladox deleted branch 03paladox-patch-4 - 13https://git.io/vbiAS [17:22:13] [02miraheze/puppet] 07paladox deleted branch 03paladox-patch-4 [17:22:26] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [17:22:46] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [17:23:09] PROBLEM - mon2 Current Load on mon2 is CRITICAL: CRITICAL - load average: 4.19, 4.03, 3.79 [17:24:00] RECOVERY - ns2 GDNSD Datacenters on ns2 is OK: OK - all datacenters are online [17:24:09] [02jobrunner-service] 07paladox deleted branch 03paladox-patch-2 - 13https://git.io/JYA8S [17:24:11] [02miraheze/jobrunner-service] 07paladox deleted branch 03paladox-patch-2 [17:24:26] [02jobrunner-service] 07paladox closed pull 
request 03#2: Add support for ipv6 - 13https://git.io/JSmK6 [17:24:28] [02miraheze/jobrunner-service] 07paladox pushed 036 commits to 03master [+0/-0/±6] 13https://git.io/JSYQh [17:24:29] !log [@test3] starting deploy of {'config': True} to skip [17:24:29] [02miraheze/jobrunner-service] 07paladox 0358a49a3 - Merge pull request #2 from miraheze/paladox-patch-1 [17:24:30] !log [@test3] finished deploy of {'config': True} to skip - SUCCESS in 0s [17:24:31] [02jobrunner-service] 07paladox deleted branch 03paladox-patch-1 - 13https://git.io/JYA8S [17:24:32] [02miraheze/jobrunner-service] 07paladox deleted branch 03paladox-patch-1 [17:24:34] CosmicAlpha: can you please check prod sync [17:24:48] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [17:24:54] mw11 puppet failed [17:24:56] it works :D [17:25:00] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [17:25:07] Ty paladox [17:25:28] PROBLEM - mwtask111 Puppet on mwtask111 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[MediaWiki Config Sync] [17:26:02] mirahezebots_: we know [17:26:14] You cant expect it to pass with no database [17:27:25] paladox: jobchron needs new ip too [17:27:26] RECOVERY - mwtask111 Puppet on mwtask111 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [17:27:28] I think [17:27:42] what do you mean? [17:27:57] paladox: in heira [17:28:13] ok, i still not sure what you mean? [17:28:15] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 4.52, 5.69, 6.00 [17:28:23] paladox: https://github.com/miraheze/puppet/blob/master/hieradata/hosts/jobchron121.yaml#L8 [17:28:23] [url] puppet/jobchron121.yaml at master · miraheze/puppet · GitHub | github.com [17:28:27] oh i see [17:28:34] Although no idea why it worked there [17:28:37] [02miraheze/puppet] 07paladox pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/JSY5O [17:28:38] [02miraheze/puppet] 07paladox 03523098f - Fix [17:30:15] PROBLEM - gluster3 Current Load on gluster3 is CRITICAL: CRITICAL - load average: 5.49, 5.86, 6.04 [17:30:37] RECOVERY - test101 Puppet on test101 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures [17:31:10] PROBLEM - mon2 Current Load on mon2 is WARNING: WARNING - load average: 3.49, 3.87, 3.89 [17:32:15] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 4.00, 5.24, 5.79 [17:32:50] PROBLEM - db11 Disk Space on db11 is CRITICAL: DISK CRITICAL - free space: / 24933 MB (5% inode=97%); [17:32:51] RECOVERY - cloud4 Current Load on cloud4 is OK: OK - load average: 17.54, 19.32, 20.38 [17:33:09] PROBLEM - mon2 Current Load on mon2 is CRITICAL: CRITICAL - load average: 4.59, 3.95, 3.91 [17:41:10] PROBLEM - mon2 Current Load on mon2 is WARNING: WARNING - load average: 3.81, 3.82, 3.89 [17:43:15] RECOVERY - mw11 Puppet on mw11 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [17:43:58] PROBLEM - mw12 Current Load on mw12 is WARNING: WARNING - load average: 7.89, 7.18, 6.68 [17:44:06] PROBLEM - mw8 Current Load on mw8 is WARNING: WARNING - load average: 7.46, 6.81, 6.44 [17:44:15] RECOVERY - gluster3 Current Load on gluster3 is OK: OK - load average: 3.27, 4.06, 4.93 [17:44:20] PROBLEM - cp21 Stunnel Http for mw12 on cp21 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. 
[17:44:31] PROBLEM - cp31 Stunnel Http for mw12 on cp31 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [17:44:31] PROBLEM - cp20 Stunnel Http for mw12 on cp20 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [17:45:55] RECOVERY - mw12 Current Load on mw12 is OK: OK - load average: 6.08, 6.76, 6.59 [17:46:05] RECOVERY - mw8 Current Load on mw8 is OK: OK - load average: 5.18, 6.14, 6.24 [17:46:20] RECOVERY - cp21 Stunnel Http for mw12 on cp21 is OK: HTTP OK: HTTP/1.1 200 OK - 14565 bytes in 4.795 second response time [17:46:34] RECOVERY - cp20 Stunnel Http for mw12 on cp20 is OK: HTTP OK: HTTP/1.1 200 OK - 14557 bytes in 5.956 second response time [17:46:34] RECOVERY - cp31 Stunnel Http for mw12 on cp31 is OK: HTTP OK: HTTP/1.1 200 OK - 14557 bytes in 6.557 second response time [17:47:50] PROBLEM - ns2 GDNSD Datacenters on ns2 is CRITICAL: CRITICAL - 1 datacenter is down: 2607:5300:201:3100::5ebc/cpweb [17:48:48] PROBLEM - ns1 GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 3 datacenters are down: 2001:41d0:801:2000::4c25/cpweb, 149.56.140.43/cpweb, 2607:5300:201:3100::929a/cpweb [17:50:40] PROBLEM - cp21 Stunnel Http for mw12 on cp21 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [17:50:59] PROBLEM - cp20 Stunnel Http for mw12 on cp20 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [17:50:59] PROBLEM - cp31 Stunnel Http for mw12 on cp31 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [17:51:04] PROBLEM - mw9 Current Load on mw9 is WARNING: WARNING - load average: 7.00, 6.33, 5.67 [17:51:06] PROBLEM - mw12 MediaWiki Rendering on mw12 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:51:09] PROBLEM - mon2 Current Load on mon2 is CRITICAL: CRITICAL - load average: 4.15, 3.67, 3.62 [17:51:36] PROBLEM - cloud4 Current Load on cloud4 is WARNING: WARNING - load average: 20.57, 19.89, 19.35 [17:51:42] PROBLEM - cp31 Stunnel Http for mw9 on cp31 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [17:51:43] PROBLEM - cp30 Stunnel Http for mw12 on cp30 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [17:52:24] PROBLEM - mw9 MediaWiki Rendering on mw9 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:52:34] PROBLEM - cp30 Stunnel Http for mw9 on cp30 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [17:53:00] RECOVERY - mw9 Current Load on mw9 is OK: OK - load average: 5.63, 5.94, 5.60 [17:53:10] PROBLEM - mon2 Current Load on mon2 is WARNING: WARNING - load average: 2.74, 3.42, 3.55 [17:53:43] RECOVERY - cp31 Stunnel Http for mw9 on cp31 is OK: HTTP OK: HTTP/1.1 200 OK - 14556 bytes in 3.311 second response time [17:54:16] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 5.50, 5.32, 5.16 [17:54:22] RECOVERY - mw9 MediaWiki Rendering on mw9 is OK: HTTP OK: HTTP/1.1 200 OK - 20524 bytes in 2.201 second response time [17:54:43] RECOVERY - cp30 Stunnel Http for mw9 on cp30 is OK: HTTP OK: HTTP/1.1 200 OK - 14556 bytes in 9.486 second response time [17:55:31] RECOVERY - cloud4 Current Load on cloud4 is OK: OK - load average: 19.06, 20.13, 19.61 [17:55:57] PROBLEM - cp30 Stunnel Http for mw8 on cp30 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [17:56:13] PROBLEM - cp31 Stunnel Http for mw8 on cp31 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. 
[17:57:09] RECOVERY - mon2 Current Load on mon2 is OK: OK - load average: 2.79, 3.01, 3.36 [17:57:16] RECOVERY - mw12 MediaWiki Rendering on mw12 is OK: HTTP OK: HTTP/1.1 200 OK - 20526 bytes in 8.717 second response time [17:57:17] RECOVERY - cp20 Stunnel Http for mw12 on cp20 is OK: HTTP OK: HTTP/1.1 200 OK - 14557 bytes in 8.372 second response time [17:57:17] RECOVERY - cp31 Stunnel Http for mw12 on cp31 is OK: HTTP OK: HTTP/1.1 200 OK - 14557 bytes in 8.488 second response time [17:58:06] RECOVERY - cp30 Stunnel Http for mw12 on cp30 is OK: HTTP OK: HTTP/1.1 200 OK - 14557 bytes in 4.803 second response time [17:58:16] RECOVERY - cp31 Stunnel Http for mw8 on cp31 is OK: HTTP OK: HTTP/1.1 200 OK - 14546 bytes in 6.793 second response time [17:58:58] RECOVERY - cp21 Stunnel Http for mw12 on cp21 is OK: HTTP OK: HTTP/1.1 200 OK - 14557 bytes in 8.729 second response time [18:00:02] RECOVERY - cp30 Stunnel Http for mw8 on cp30 is OK: HTTP OK: HTTP/1.1 200 OK - 14546 bytes in 1.428 second response time [18:00:15] PROBLEM - gluster3 Current Load on gluster3 is CRITICAL: CRITICAL - load average: 6.11, 5.45, 5.25 [18:00:48] RECOVERY - ns1 GDNSD Datacenters on ns1 is OK: OK - all datacenters are online [18:00:50] PROBLEM - db11 Disk Space on db11 is WARNING: DISK WARNING - free space: / 27680 MB (6% inode=97%); [18:02:15] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 4.96, 5.42, 5.27 [18:03:45] RECOVERY - ns2 GDNSD Datacenters on ns2 is OK: OK - all datacenters are online [18:04:15] RECOVERY - gluster3 Current Load on gluster3 is OK: OK - load average: 3.77, 4.90, 5.10 [18:08:51] PROBLEM - mw10 MediaWiki Rendering on mw10 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:09:02] PROBLEM - cp31 Stunnel Http for mw10 on cp31 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [18:09:11] PROBLEM - cp20 Stunnel Http for mw10 on cp20 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [18:09:42] PROBLEM - ns2 GDNSD Datacenters on ns2 is CRITICAL: CRITICAL - 2 datacenters are down: 2001:41d0:801:2000::4c25/cpweb, 2607:5300:201:3100::929a/cpweb [18:09:47] PROBLEM - cp30 Stunnel Http for mw10 on cp30 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [18:09:55] PROBLEM - cp21 Stunnel Http for mw10 on cp21 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. 
[18:10:48] PROBLEM - ns1 GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 1 datacenter is down: 2001:41d0:801:2000::4c25/cpweb [18:11:11] PROBLEM - mw13 MediaWiki Rendering on mw13 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:15:39] alerting : [FIRING:1] (PHP-FPM Worker Usage High mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki [18:15:40] RECOVERY - ns2 GDNSD Datacenters on ns2 is OK: OK - all datacenters are online [18:17:09] RECOVERY - mw10 MediaWiki Rendering on mw10 is OK: HTTP OK: HTTP/1.1 200 OK - 20526 bytes in 10.000 second response time [18:17:19] RECOVERY - cp31 Stunnel Http for mw10 on cp31 is OK: HTTP OK: HTTP/1.1 200 OK - 14557 bytes in 2.011 second response time [18:17:33] RECOVERY - mw13 MediaWiki Rendering on mw13 is OK: HTTP OK: HTTP/1.1 200 OK - 20526 bytes in 9.666 second response time [18:17:37] RECOVERY - cp20 Stunnel Http for mw10 on cp20 is OK: HTTP OK: HTTP/1.1 200 OK - 14565 bytes in 0.018 second response time [18:18:07] RECOVERY - cp21 Stunnel Http for mw10 on cp21 is OK: HTTP OK: HTTP/1.1 200 OK - 14571 bytes in 3.652 second response time [18:18:14] RECOVERY - cp30 Stunnel Http for mw10 on cp30 is OK: HTTP OK: HTTP/1.1 200 OK - 14557 bytes in 2.603 second response time [18:19:12] PROBLEM - mw13 Current Load on mw13 is WARNING: WARNING - load average: 7.13, 6.51, 5.74 [18:19:38] PROBLEM - ns2 GDNSD Datacenters on ns2 is CRITICAL: CRITICAL - 8 datacenters are down: 51.195.220.68/cpweb, 198.244.148.90/cpweb, 2001:41d0:801:2000::4c25/cpweb, 2001:41d0:801:2000::1b80/cpweb, 149.56.140.43/cpweb, 149.56.141.75/cpweb, 2607:5300:201:3100::929a/cpweb, 2607:5300:201:3100::5ebc/cpweb [18:21:12] RECOVERY - mw13 Current Load on mw13 is OK: OK - load average: 6.62, 6.42, 5.80 [18:21:29] PROBLEM - mw10 MediaWiki Rendering on mw10 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:21:55] PROBLEM - mw13 MediaWiki Rendering on mw13 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:22:07] PROBLEM - cp20 Stunnel Http for mw10 on cp20 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [18:22:15] PROBLEM - cp20 Stunnel Http for mw13 on cp20 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [18:22:26] PROBLEM - cp21 Stunnel Http for mw10 on cp21 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [18:22:31] PROBLEM - cp31 Stunnel Http for mw13 on cp31 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [18:22:39] PROBLEM - cp30 Stunnel Http for mw10 on cp30 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [18:22:40] PROBLEM - cp30 Stunnel Http for mw13 on cp30 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [18:23:00] PROBLEM - mw11 MediaWiki Rendering on mw11 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:23:02] PROBLEM - cp21 Stunnel Http for mw13 on cp21 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [18:23:13] PROBLEM - cp20 Stunnel Http for mw11 on cp20 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [18:23:24] PROBLEM - cp30 Stunnel Http for mw11 on cp30 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [18:23:34] PROBLEM - cp31 Stunnel Http for mw11 on cp31 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [18:23:45] PROBLEM - cp21 Stunnel Http for mw11 on cp21 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. 
[18:23:50] PROBLEM - cp31 Stunnel Http for mw10 on cp31 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [18:24:09] !log [@test3] starting deploy of {'config': True} to skip [18:24:10] !log [@test3] finished deploy of {'config': True} to skip - SUCCESS in 1s [18:24:35] RECOVERY - cp31 Stunnel Http for mw13 on cp31 is OK: HTTP OK: HTTP/1.1 200 OK - 14557 bytes in 8.527 second response time [18:24:39] RECOVERY - cp30 Stunnel Http for mw13 on cp30 is OK: HTTP OK: HTTP/1.1 200 OK - 14557 bytes in 4.192 second response time [18:25:00] RECOVERY - cp21 Stunnel Http for mw13 on cp21 is OK: HTTP OK: HTTP/1.1 200 OK - 14557 bytes in 1.277 second response time [18:25:30] RECOVERY - cp30 Stunnel Http for mw11 on cp30 is OK: HTTP OK: HTTP/1.1 200 OK - 14565 bytes in 6.992 second response time [18:25:36] RECOVERY - cp31 Stunnel Http for mw11 on cp31 is OK: HTTP OK: HTTP/1.1 200 OK - 14565 bytes in 5.612 second response time [18:25:41] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [18:25:51] RECOVERY - cp21 Stunnel Http for mw11 on cp21 is OK: HTTP OK: HTTP/1.1 200 OK - 14565 bytes in 5.916 second response time [18:26:00] RECOVERY - mw13 MediaWiki Rendering on mw13 is OK: HTTP OK: HTTP/1.1 200 OK - 20526 bytes in 0.192 second response time [18:26:22] RECOVERY - cp20 Stunnel Http for mw13 on cp20 is OK: HTTP OK: HTTP/1.1 200 OK - 14565 bytes in 2.407 second response time [18:26:48] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [18:27:04] RECOVERY - mw11 MediaWiki Rendering on mw11 is OK: HTTP OK: HTTP/1.1 200 OK - 20526 bytes in 1.543 second response time [18:27:12] RECOVERY - cp20 Stunnel Http for mw11 on cp20 is OK: HTTP OK: HTTP/1.1 200 OK - 14557 bytes in 0.617 second response time [18:27:49] PROBLEM - cp21 Stunnel Http for mw12 on cp21 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [18:27:54] PROBLEM - cp20 Stunnel Http for mw12 on cp20 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [18:29:01] RECOVERY - cp30 Stunnel Http for mw10 on cp30 is OK: HTTP OK: HTTP/1.1 200 OK - 14565 bytes in 5.720 second response time [18:29:37] RECOVERY - mw10 MediaWiki Rendering on mw10 is OK: HTTP OK: HTTP/1.1 200 OK - 20526 bytes in 0.225 second response time [18:30:02] RECOVERY - cp31 Stunnel Http for mw10 on cp31 is OK: HTTP OK: HTTP/1.1 200 OK - 14557 bytes in 2.160 second response time [18:30:41] RECOVERY - cp21 Stunnel Http for mw10 on cp21 is OK: HTTP OK: HTTP/1.1 200 OK - 14571 bytes in 8.074 second response time [18:30:42] RECOVERY - cp20 Stunnel Http for mw10 on cp20 is OK: HTTP OK: HTTP/1.1 200 OK - 14565 bytes in 7.723 second response time [18:30:48] RECOVERY - ns1 GDNSD Datacenters on ns1 is OK: OK - all datacenters are online [18:31:50] RECOVERY - cp21 Stunnel Http for mw12 on cp21 is OK: HTTP OK: HTTP/1.1 200 OK - 14565 bytes in 1.960 second response time [18:31:56] RECOVERY - cp20 Stunnel Http for mw12 on cp20 is OK: HTTP OK: HTTP/1.1 200 OK - 14565 bytes in 0.015 second response time [18:33:26] PROBLEM - cp30 Stunnel Http for mw10 on cp30 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. 
[18:33:33] RECOVERY - ns2 GDNSD Datacenters on ns2 is OK: OK - all datacenters are online [18:33:56] PROBLEM - mw10 MediaWiki Rendering on mw10 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:34:48] PROBLEM - ns1 GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 2 datacenters are down: 149.56.140.43/cpweb, 149.56.141.75/cpweb [18:35:25] RECOVERY - cp30 Stunnel Http for mw10 on cp30 is OK: HTTP OK: HTTP/1.1 200 OK - 14565 bytes in 0.323 second response time [18:35:51] RECOVERY - mw10 MediaWiki Rendering on mw10 is OK: HTTP OK: HTTP/1.1 200 OK - 20526 bytes in 0.258 second response time [18:36:05] [02mw-config] 07Universal-Omega closed pull request 03#4328: Split `$wgCreateWikiCannedResponses` into sections - 13https://git.io/JStWH [18:36:06] [02mw-config] 07Universal-Omega deleted branch 03Universal-Omega-patch-2 - 13https://git.io/vbvb3 [18:36:08] [02miraheze/mw-config] 07Universal-Omega deleted branch 03Universal-Omega-patch-2 [18:36:09] [02miraheze/mw-config] 07Universal-Omega pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/JSOYx [18:36:11] [02miraheze/mw-config] 07Universal-Omega 0312bdbca - Split `$wgCreateWikiCannedResponses` into sections (#4328) [18:36:34] !log [universalomega@test3] starting deploy of {'config': True} to skip [18:36:35] !log [universalomega@test3] finished deploy of {'config': True} to skip - SUCCESS in 0s [18:36:48] RECOVERY - ns1 GDNSD Datacenters on ns1 is OK: OK - all datacenters are online [18:36:54] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [18:37:13] miraheze/mw-config - Universal-Omega the build passed. [18:37:33] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [18:38:50] !log [universalomega@test3] starting deploy of {'folders': 'w/extensions/CreateWiki'} to skip [18:38:51] !log [universalomega@test3] finished deploy of {'folders': 'w/extensions/CreateWiki'} to skip - SUCCESS in 0s [18:39:27] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [18:39:55] !log [universalomega@test3] starting deploy of {'pull': 'config', 'config': True} to skip [18:39:56] !log [universalomega@test3] finished deploy of {'pull': 'config', 'config': True} to skip - SUCCESS in 1s [18:40:00] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [18:40:26] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [18:40:48] PROBLEM - ns1 GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 3 datacenters are down: 198.244.148.90/cpweb, 2001:41d0:801:2000::4c25/cpweb, 2607:5300:201:3100::5ebc/cpweb [18:41:02] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [18:41:10] !log [@mw11] starting deploy of {'config': True} to ovlon [18:41:26] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [18:42:48] RECOVERY - ns1 GDNSD Datacenters on ns1 is OK: OK - all datacenters are online [18:45:09] PROBLEM - mon2 Current Load on mon2 is WARNING: WARNING - load average: 3.47, 3.08, 2.86 [18:45:28] PROBLEM - ns2 GDNSD Datacenters on ns2 is CRITICAL: CRITICAL - 1 datacenter is down: 2001:41d0:801:2000::4c25/cpweb [18:46:48] PROBLEM - ns1 GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 5 datacenters are down: 51.195.220.68/cpweb, 198.244.148.90/cpweb, 2001:41d0:801:2000::1b80/cpweb, 2607:5300:201:3100::929a/cpweb, 2607:5300:201:3100::5ebc/cpweb [18:47:10] RECOVERY - mon2 Current Load on mon2 is OK: OK - load average: 2.92, 2.95, 2.84 [18:47:45] !log [@test101] starting deploy of {'config': True} to skip [18:47:46] 
!log [@test101] DEPLOY ABORTED: Canary check failed for localhost [18:48:08] Yes we know [18:48:28] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [18:48:50] PROBLEM - test3 Puppet on test3 is WARNING: WARNING: Puppet is currently disabled, message: Universal Omega, last run 24 minutes ago with 0 failures [18:48:58] PROBLEM - cp31 Stunnel Http for mw12 on cp31 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [18:49:16] PROBLEM - mw11 Puppet on mw11 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[MediaWiki Config Sync] [18:49:35] CosmicAlpha: will you please check ^ [18:49:38] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [18:49:44] How come mw11 is failing again? I fixed last time... looking [18:50:20] PROBLEM - cp21 Stunnel Http for mw12 on cp21 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [18:50:26] PROBLEM - mw12 MediaWiki Rendering on mw12 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:50:35] PROBLEM - cp30 Stunnel Http for mw12 on cp30 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [18:50:39] PROBLEM - test101 Puppet on test101 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[MediaWiki Config Sync] [18:50:44] PROBLEM - cp20 Stunnel Http for mw12 on cp20 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [18:51:15] RECOVERY - mw11 Puppet on mw11 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures [18:51:27] !log [@mwtask111] starting deploy of {'config': True} to scsvg [18:51:28] !log [universalomega@mw11] starting deploy of {'pull': 'config', 'config': True} to all [18:51:28] !log [@mwtask111] DEPLOY ABORTED: Canary check failed for localhost [18:51:33] CosmicAlpha: what was it failing with [18:52:24] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [18:52:29] RhinosF1: no idea. Ran puppet and worked... trying deploy manually now. [18:52:35] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [18:52:45] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [18:52:48] RECOVERY - ns1 GDNSD Datacenters on ns1 is OK: OK - all datacenters are online [18:52:53] CosmicAlpha: puppet doesn't do syncs twice [18:52:56] It's stupid [18:54:00] I think deploy-mediawiki is stuck. It is not doing anything... [18:54:20] CosmicAlpha: use htop [18:54:25] And search for it [18:54:33] It'll show you any subprocess [18:55:27] PROBLEM - mwtask111 Puppet on mwtask111 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. 
Failed resources (up to 3 shown): Exec[MediaWiki Config Sync] [18:56:26] RECOVERY - cp21 Stunnel Http for mw12 on cp21 is OK: HTTP OK: HTTP/1.1 200 OK - 14565 bytes in 2.543 second response time [18:56:29] RECOVERY - mw12 MediaWiki Rendering on mw12 is OK: HTTP OK: HTTP/1.1 200 OK - 20526 bytes in 1.757 second response time [18:56:55] RECOVERY - cp20 Stunnel Http for mw12 on cp20 is OK: HTTP OK: HTTP/1.1 200 OK - 14557 bytes in 1.461 second response time [18:56:57] RECOVERY - cp30 Stunnel Http for mw12 on cp30 is OK: HTTP OK: HTTP/1.1 200 OK - 14557 bytes in 1.873 second response time [18:57:15] RECOVERY - cp31 Stunnel Http for mw12 on cp31 is OK: HTTP OK: HTTP/1.1 200 OK - 14565 bytes in 0.387 second response time [18:57:24] RECOVERY - ns2 GDNSD Datacenters on ns2 is OK: OK - all datacenters are online [18:58:09] RhinosF1 https://usercontent.irccloud-cdn.com/file/HgU1dTSL/image.png [18:58:38] paladox: ^ [18:58:43] That's logsalmsg [18:58:49] hmm? [18:59:03] paladox: mw11 can't connect to mon2 [18:59:09] logsalmsg is stuck [18:59:09] oh [18:59:27] i guess must be related to me upgrading it last night to support ipv6 [18:59:28] Or not consistently at least [18:59:48] yeah it works sometimes. Not others. [19:00:01] !log [paladox@mon2] test [19:00:07] I assume eventually it'll time out [19:00:15] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [19:02:01] test [19:02:30] test [19:02:45] test [19:03:04] test [19:03:10] PROBLEM - mon2 Current Load on mon2 is WARNING: WARNING - load average: 3.32, 3.44, 3.09 [19:03:33] test [19:05:10] RECOVERY - mon2 Current Load on mon2 is OK: OK - load average: 2.92, 3.38, 3.12 [19:07:24] test [19:07:27] test [19:07:30] test [19:07:33] test [19:07:40] oh i may have a solution [19:07:42] test [19:07:45] test [19:07:48] -w0 [19:08:23] [02miraheze/puppet] 07paladox pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/JSOBz [19:08:25] [02miraheze/puppet] 07paladox 037134b25 - logsalmsg set -w0 [19:09:08] What does that do [19:10:38] got it from https://serverfault.com/questions/512722/how-to-automatically-close-netcat-connection-after-data-is-sent and thought i try [19:10:38] [url] linux - How to automatically close netcat connection after data is sent? - Server Fault | serverfault.com [19:10:49] !log [paladox@mon2] test [19:10:53] !log [paladox@mon2] test [19:11:11] PROBLEM - mon2 Current Load on mon2 is WARNING: WARNING - load average: 3.49, 3.46, 3.23 [19:11:25] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [19:11:35] PROBLEM - db101 Current Load on db101 is WARNING: WARNING - load average: 4.15, 5.79, 3.55 [19:11:42] PROBLEM - cloud4 Current Load on cloud4 is WARNING: WARNING - load average: 22.85, 20.97, 19.07 [19:11:49] !log [paladox@mwtask1] test [19:11:55] RhinosF1: CosmicAlpha works now [19:12:16] CosmicAlpha: ^ [19:12:17] PROBLEM - gluster3 Current Load on gluster3 is CRITICAL: CRITICAL - load average: 7.26, 5.91, 4.85 [19:12:21] PROBLEM - mw12 MediaWiki Rendering on mw12 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:12:21] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [19:12:25] PROBLEM - cp21 Stunnel Http for mw12 on cp21 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [19:12:42] PROBLEM - cp20 Stunnel Http for mw9 on cp20 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. 
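For context on the -w0 change above: per the discussion, logsalmsg relays each !log line from the MediaWiki hosts to mon2 through netcat, and the connection sometimes never closed, which is why deploy-mediawiki sat there doing nothing. The intended behaviour is simply "write the line, then hang up". A PHP sketch of that pattern, purely as an illustration (the real logsalmsg is a netcat-based script, and the port below is a placeholder, not the real relay port):

```php
<?php
// Illustration of "send one line and close immediately" — the behaviour the
// -w0 netcat flag is meant to produce. Not the real logsalmsg script; the
// port number is a placeholder.
$line      = '!log [paladox@mon2] test';
$relayHost = 'mon2.miraheze.org';
$relayPort = 12345;   // placeholder, not the real port

$sock = @fsockopen( $relayHost, $relayPort, $errno, $errstr, 5 /* connect timeout, seconds */ );
if ( $sock === false ) {
    fwrite( STDERR, "connect failed: $errstr ($errno)\n" );
    exit( 1 );
}
stream_set_timeout( $sock, 1 );   // never block for long on the socket
fwrite( $sock, $line . "\n" );
fclose( $sock );                  // close as soon as the line is written,
                                  // instead of waiting for the far end
```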
[19:12:48] PROBLEM - ns1 GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 4 datacenters are down: 2001:41d0:801:2000::4c25/cpweb, 2001:41d0:801:2000::1b80/cpweb, 149.56.140.43/cpweb, 2607:5300:201:3100::929a/cpweb [19:12:57] PROBLEM - cp31 Stunnel Http for mw9 on cp31 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [19:12:57] PROBLEM - mw9 MediaWiki Rendering on mw9 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:12:59] @raidarr: I have a theory on why performance has been rubbish with that page [19:13:09] RECOVERY - mon2 Current Load on mon2 is OK: OK - load average: 2.60, 3.39, 3.25 [19:13:19] PROBLEM - ns2 GDNSD Datacenters on ns2 is CRITICAL: CRITICAL - 3 datacenters are down: 198.244.148.90/cpweb, 149.56.141.75/cpweb, 2607:5300:201:3100::5ebc/cpweb [19:13:26] PROBLEM - cp30 Stunnel Http for mw12 on cp30 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [19:13:34] RECOVERY - db101 Current Load on db101 is OK: OK - load average: 1.86, 4.41, 3.30 [19:13:36] PROBLEM - cp30 Stunnel Http for mw9 on cp30 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [19:13:37] PROBLEM - cp21 Stunnel Http for mw9 on cp21 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [19:13:39] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [19:15:26] RECOVERY - mwtask111 Puppet on mwtask111 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures [19:15:32] RECOVERY - cp30 Stunnel Http for mw12 on cp30 is OK: HTTP OK: HTTP/1.1 200 OK - 14557 bytes in 5.774 second response time [19:15:36] RECOVERY - cloud4 Current Load on cloud4 is OK: OK - load average: 19.75, 20.26, 19.21 [19:15:40] RECOVERY - cp21 Stunnel Http for mw9 on cp21 is OK: HTTP OK: HTTP/1.1 200 OK - 14556 bytes in 8.585 second response time [19:15:44] RECOVERY - cp30 Stunnel Http for mw9 on cp30 is OK: HTTP OK: HTTP/1.1 200 OK - 14556 bytes in 8.587 second response time [19:16:11] [02miraheze/puppet] 07paladox pushed 031 commit to 03paladox-patch-4 [+0/-0/±1] 13https://git.io/JSOEC [19:16:13] [02miraheze/puppet] 07paladox 037d00c18 - mediawiki: Switch gluster volume to gluster4 [19:16:14] [02puppet] 07paladox created branch 03paladox-patch-4 - 13https://git.io/vbiAS [19:16:16] [02puppet] 07paladox opened pull request 03#2245: mediawiki: Switch gluster volume to gluster4 - 13https://git.io/JSOEl [19:16:16] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 5.37, 5.42, 4.89 [19:16:23] RECOVERY - mw12 MediaWiki Rendering on mw12 is OK: HTTP OK: HTTP/1.1 200 OK - 20526 bytes in 4.497 second response time [19:16:26] RECOVERY - cp21 Stunnel Http for mw12 on cp21 is OK: HTTP OK: HTTP/1.1 200 OK - 14565 bytes in 2.208 second response time [19:16:30] [02miraheze/puppet] 07paladox pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/JSOEg [19:16:32] [02miraheze/puppet] 07paladox 0396c53ea - mediawiki: Switch gluster volume to gluster4 (#2245) [19:16:33] [02puppet] 07paladox closed pull request 03#2245: mediawiki: Switch gluster volume to gluster4 - 13https://git.io/JSOEl [19:16:35] [02puppet] 07paladox deleted branch 03paladox-patch-4 - 13https://git.io/vbiAS [19:16:36] PROBLEM - mw12 Current Load on mw12 is WARNING: WARNING - load average: 7.61, 6.77, 6.01 [19:16:36] [02miraheze/puppet] 07paladox deleted branch 03paladox-patch-4 [19:16:52] RECOVERY - cp20 Stunnel Http for mw9 on cp20 is OK: HTTP OK: HTTP/1.1 200 OK - 14556 bytes in 4.995 second response time [19:17:03] RECOVERY 
- mw9 MediaWiki Rendering on mw9 is OK: HTTP OK: HTTP/1.1 200 OK - 20524 bytes in 4.871 second response time [19:17:06] RECOVERY - cp31 Stunnel Http for mw9 on cp31 is OK: HTTP OK: HTTP/1.1 200 OK - 14556 bytes in 2.947 second response time [19:17:07] @raidarr: performance should improve soon and the page load again [19:17:18] RECOVERY - ns2 GDNSD Datacenters on ns2 is OK: OK - all datacenters are online [19:18:32] RECOVERY - mw12 Current Load on mw12 is OK: OK - load average: 6.39, 6.72, 6.09 [19:18:37] RECOVERY - test101 Puppet on test101 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures [19:18:49] RECOVERY - ns1 GDNSD Datacenters on ns1 is OK: OK - all datacenters are online [19:19:10] PROBLEM - mon2 Current Load on mon2 is CRITICAL: CRITICAL - load average: 4.97, 3.72, 3.39 [19:21:10] PROBLEM - mon2 Current Load on mon2 is WARNING: WARNING - load average: 3.32, 3.57, 3.37 [19:22:15] PROBLEM - gluster3 Current Load on gluster3 is CRITICAL: CRITICAL - load average: 7.56, 5.79, 5.15 [19:22:48] PROBLEM - ns1 GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 7 datacenters are down: 51.195.220.68/cpweb, 198.244.148.90/cpweb, 2001:41d0:801:2000::4c25/cpweb, 2001:41d0:801:2000::1b80/cpweb, 149.56.140.43/cpweb, 149.56.141.75/cpweb, 2607:5300:201:3100::929a/cpweb [19:23:34] PROBLEM - mw10 Puppet on mw10 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Mount[/mnt/mediawiki-static] [19:24:16] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 4.71, 5.68, 5.22 [19:25:10] RECOVERY - mon2 Current Load on mon2 is OK: OK - load average: 1.99, 2.96, 3.19 [19:25:19] PROBLEM - cp20 Stunnel Http for mw12 on cp20 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [19:25:31] PROBLEM - mw8 MediaWiki Rendering on mw8 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:25:41] PROBLEM - cp20 Stunnel Http for mw8 on cp20 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [19:25:42] paladox: puppet fail ^ [19:25:44] PROBLEM - cp21 Stunnel Http for mw8 on cp21 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [19:25:48] PROBLEM - mw12 MediaWiki Rendering on mw12 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:25:48] PROBLEM - cp31 Stunnel Http for mw8 on cp31 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [19:25:49] i know [19:25:50] PROBLEM - cp21 Stunnel Http for mw12 on cp21 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [19:26:14] PROBLEM - cp30 Stunnel Http for mw8 on cp30 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [19:26:15] PROBLEM - gluster3 Current Load on gluster3 is CRITICAL: CRITICAL - load average: 6.59, 5.84, 5.32 [19:26:41] PROBLEM - mw9 Puppet on mw9 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. 
Failed resources (up to 3 shown): Mount[/mnt/mediawiki-static] [19:27:14] PROBLEM - ns2 GDNSD Datacenters on ns2 is CRITICAL: CRITICAL - 3 datacenters are down: 51.195.220.68/cpweb, 2001:41d0:801:2000::4c25/cpweb, 149.56.140.43/cpweb [19:28:15] RECOVERY - gluster3 Current Load on gluster3 is OK: OK - load average: 3.21, 4.84, 5.02 [19:29:13] RECOVERY - ns2 GDNSD Datacenters on ns2 is OK: OK - all datacenters are online [19:29:22] RECOVERY - cp20 Stunnel Http for mw12 on cp20 is OK: HTTP OK: HTTP/1.1 200 OK - 14557 bytes in 0.012 second response time [19:29:31] RECOVERY - mw8 MediaWiki Rendering on mw8 is OK: HTTP OK: HTTP/1.1 200 OK - 20514 bytes in 1.031 second response time [19:29:39] RECOVERY - cp20 Stunnel Http for mw8 on cp20 is OK: HTTP OK: HTTP/1.1 200 OK - 14546 bytes in 0.019 second response time [19:29:46] RECOVERY - cp21 Stunnel Http for mw8 on cp21 is OK: HTTP OK: HTTP/1.1 200 OK - 14560 bytes in 1.085 second response time [19:29:48] RECOVERY - mw12 MediaWiki Rendering on mw12 is OK: HTTP OK: HTTP/1.1 200 OK - 20526 bytes in 2.954 second response time [19:29:51] RECOVERY - cp31 Stunnel Http for mw8 on cp31 is OK: HTTP OK: HTTP/1.1 200 OK - 14546 bytes in 1.258 second response time [19:29:56] RECOVERY - cp21 Stunnel Http for mw12 on cp21 is OK: HTTP OK: HTTP/1.1 200 OK - 14557 bytes in 1.063 second response time [19:30:06] PROBLEM - mw12 Puppet on mw12 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Mount[/mnt/mediawiki-static] [19:30:21] RECOVERY - cp30 Stunnel Http for mw8 on cp30 is OK: HTTP OK: HTTP/1.1 200 OK - 14554 bytes in 3.623 second response time [19:31:24] Glad to hear when it does [19:32:48] RECOVERY - ns1 GDNSD Datacenters on ns1 is OK: OK - all datacenters are online [19:33:30] @raidarr: we gonna move gluster around a bit to so it's not under so much load while data gets copied to new DC [19:37:55] PROBLEM - mon2 Current Load on mon2 is WARNING: WARNING - load average: 3.82, 3.67, 3.43 [19:39:00] I'll pretty much be transitioning to other stuff later today anyway, I realize I've been on Miraheze pretty much all day >.> though I can't imagine where SRE is at given the ongoing DC business. [19:41:31] PROBLEM - ns1 GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 5 datacenters are down: 198.244.148.90/cpweb, 2001:41d0:801:2000::4c25/cpweb, 149.56.140.43/cpweb, 149.56.141.75/cpweb, 2607:5300:201:3100::5ebc/cpweb [19:42:05] RECOVERY - mw12 Puppet on mw12 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures [19:42:51] PROBLEM - cloud4 Current Load on cloud4 is WARNING: WARNING - load average: 21.74, 20.23, 19.38 [19:43:17] PROBLEM - mw11 Puppet on mw11 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Mount[/mnt/mediawiki-static] [19:43:34] PROBLEM - mwtask1 Puppet on mwtask1 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): Mount[/mnt/mediawiki-static] [19:43:47] RECOVERY - mon2 Current Load on mon2 is OK: OK - load average: 2.86, 3.36, 3.39 [19:44:05] PROBLEM - mw13 Puppet on mw13 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. 
Failed resources (up to 3 shown): Mount[/mnt/mediawiki-static] [19:44:07] PROBLEM - ns2 GDNSD Datacenters on ns2 is CRITICAL: CRITICAL - 3 datacenters are down: 2001:41d0:801:2000::4c25/cpweb, 2001:41d0:801:2000::1b80/cpweb, 2607:5300:201:3100::5ebc/cpweb [19:44:51] PROBLEM - cloud4 Current Load on cloud4 is CRITICAL: CRITICAL - load average: 24.59, 21.89, 20.09 [19:45:15] RECOVERY - mw11 Puppet on mw11 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [19:45:48] PROBLEM - mw12 MediaWiki Rendering on mw12 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:46:05] RECOVERY - mw13 Puppet on mw13 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [19:46:20] PROBLEM - cp21 Stunnel Http for mw12 on cp21 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [19:46:51] RECOVERY - cloud4 Current Load on cloud4 is OK: OK - load average: 18.25, 20.28, 19.71 [19:47:14] RECOVERY - ns1 GDNSD Datacenters on ns1 is OK: OK - all datacenters are online [19:47:42] RECOVERY - mw12 MediaWiki Rendering on mw12 is OK: HTTP OK: HTTP/1.1 200 OK - 20526 bytes in 0.497 second response time [19:48:06] RECOVERY - ns2 GDNSD Datacenters on ns2 is OK: OK - all datacenters are online [19:48:16] RECOVERY - cp21 Stunnel Http for mw12 on cp21 is OK: HTTP OK: HTTP/1.1 200 OK - 14557 bytes in 1.622 second response time [19:50:40] RECOVERY - mw9 Puppet on mw9 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures [19:51:34] RECOVERY - mw10 Puppet on mw10 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures [19:51:41] PROBLEM - cp20 Stunnel Http for mw8 on cp20 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [19:51:57] PROBLEM - cp21 Stunnel Http for mw8 on cp21 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [19:52:04] PROBLEM - ns2 GDNSD Datacenters on ns2 is CRITICAL: CRITICAL - 5 datacenters are down: 51.195.220.68/cpweb, 198.244.148.90/cpweb, 2001:41d0:801:2000::1b80/cpweb, 149.56.141.75/cpweb, 2607:5300:201:3100::929a/cpweb [19:52:12] PROBLEM - cp31 Stunnel Http for mw8 on cp31 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [19:52:54] PROBLEM - cp30 Stunnel Http for mw8 on cp30 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. 
[19:52:57] PROBLEM - ns1 GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 8 datacenters are down: 51.195.220.68/cpweb, 198.244.148.90/cpweb, 2001:41d0:801:2000::4c25/cpweb, 2001:41d0:801:2000::1b80/cpweb, 149.56.140.43/cpweb, 149.56.141.75/cpweb, 2607:5300:201:3100::929a/cpweb, 2607:5300:201:3100::5ebc/cpweb [19:53:40] RECOVERY - cp20 Stunnel Http for mw8 on cp20 is OK: HTTP OK: HTTP/1.1 200 OK - 14546 bytes in 5.274 second response time [19:53:57] RECOVERY - cp21 Stunnel Http for mw8 on cp21 is OK: HTTP OK: HTTP/1.1 200 OK - 14546 bytes in 4.449 second response time [19:53:59] PROBLEM - mw12 MediaWiki Rendering on mw12 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:54:05] !log [universalomega@mw11] starting deploy of {'pull': 'config', 'config': True} to all [19:54:12] RECOVERY - cp31 Stunnel Http for mw8 on cp31 is OK: HTTP OK: HTTP/1.1 200 OK - 14546 bytes in 3.847 second response time [19:54:47] PROBLEM - cp31 Current Load on cp31 is WARNING: WARNING - load average: 0.74, 1.98, 1.34 [19:55:00] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [19:56:48] RECOVERY - ns1 GDNSD Datacenters on ns1 is OK: OK - all datacenters are online [19:56:58] RECOVERY - cp30 Stunnel Http for mw8 on cp30 is OK: HTTP OK: HTTP/1.1 200 OK - 14554 bytes in 0.357 second response time [19:57:47] PROBLEM - cp21 Stunnel Http for mw12 on cp21 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [19:57:53] PROBLEM - cp30 Stunnel Http for mw12 on cp30 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [19:58:06] RECOVERY - mw12 MediaWiki Rendering on mw12 is OK: HTTP OK: HTTP/1.1 200 OK - 20526 bytes in 9.468 second response time [19:58:43] RECOVERY - cp31 Current Load on cp31 is OK: OK - load average: 1.07, 1.43, 1.26 [19:58:59] PROBLEM - db101 Current Load on db101 is CRITICAL: CRITICAL - load average: 19.96, 8.37, 5.03 [19:59:42] RECOVERY - cp21 Stunnel Http for mw12 on cp21 is OK: HTTP OK: HTTP/1.1 200 OK - 14557 bytes in 0.748 second response time [19:59:54] RECOVERY - cp30 Stunnel Http for mw12 on cp30 is OK: HTTP OK: HTTP/1.1 200 OK - 14557 bytes in 1.134 second response time [20:00:01] RECOVERY - ns2 GDNSD Datacenters on ns2 is OK: OK - all datacenters are online [20:00:16] PROBLEM - mw12 Current Load on mw12 is WARNING: WARNING - load average: 6.84, 6.50, 6.05 [20:02:13] RECOVERY - mw12 Current Load on mw12 is OK: OK - load average: 6.59, 6.37, 6.05 [20:02:56] RECOVERY - db101 Current Load on db101 is OK: OK - load average: 1.89, 4.62, 4.22 [20:09:57] PROBLEM - ns2 GDNSD Datacenters on ns2 is CRITICAL: CRITICAL - 2 datacenters are down: 149.56.140.43/cpweb, 2607:5300:201:3100::929a/cpweb [20:11:41] RECOVERY - mwtask1 Puppet on mwtask1 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures [20:12:04] PROBLEM - mw8 Current Load on mw8 is WARNING: WARNING - load average: 7.28, 6.62, 6.10 [20:12:25] PROBLEM - gluster3 Current Load on gluster3 is CRITICAL: CRITICAL - load average: 6.11, 4.73, 4.54 [20:13:09] PROBLEM - ns1 GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 6 datacenters are down: 51.195.220.68/cpweb, 198.244.148.90/cpweb, 149.56.140.43/cpweb, 149.56.141.75/cpweb, 2607:5300:201:3100::929a/cpweb, 2607:5300:201:3100::5ebc/cpweb [20:13:43] PROBLEM - cp30 Stunnel Http for mw13 on cp30 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [20:13:48] PROBLEM - cp31 Stunnel Http for mw13 on cp31 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. 
[20:13:49] PROBLEM - cp21 Stunnel Http for mw13 on cp21 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [20:13:55] PROBLEM - cp21 Stunnel Http for mw10 on cp21 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [20:13:58] PROBLEM - cp30 Stunnel Http for mw10 on cp30 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [20:14:03] RECOVERY - mw8 Current Load on mw8 is OK: OK - load average: 6.42, 6.60, 6.16 [20:14:05] PROBLEM - cp20 Stunnel Http for mw10 on cp20 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [20:14:08] PROBLEM - cp31 Stunnel Http for mw10 on cp31 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [20:14:15] PROBLEM - mw13 MediaWiki Rendering on mw13 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:14:18] PROBLEM - cp20 Stunnel Http for mw13 on cp20 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [20:14:22] RECOVERY - gluster3 Current Load on gluster3 is OK: OK - load average: 4.14, 4.50, 4.48 [20:14:52] PROBLEM - mw10 MediaWiki Rendering on mw10 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:15:23] PROBLEM - cp30 Stunnel Http for mw11 on cp30 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [20:15:25] PROBLEM - cp21 Stunnel Http for mw11 on cp21 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [20:15:55] RECOVERY - cp21 Stunnel Http for mw10 on cp21 is OK: HTTP OK: HTTP/1.1 200 OK - 14571 bytes in 5.040 second response time [20:16:01] RECOVERY - cp30 Stunnel Http for mw10 on cp30 is OK: HTTP OK: HTTP/1.1 200 OK - 14557 bytes in 3.881 second response time [20:16:07] RECOVERY - cp31 Stunnel Http for mw10 on cp31 is OK: HTTP OK: HTTP/1.1 200 OK - 14565 bytes in 2.851 second response time [20:16:08] RECOVERY - cp20 Stunnel Http for mw10 on cp20 is OK: HTTP OK: HTTP/1.1 200 OK - 14565 bytes in 2.953 second response time [20:16:48] PROBLEM - mw11 MediaWiki Rendering on mw11 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:16:48] RECOVERY - mw10 MediaWiki Rendering on mw10 is OK: HTTP OK: HTTP/1.1 200 OK - 20526 bytes in 2.229 second response time [20:16:49] PROBLEM - cp31 Stunnel Http for mw11 on cp31 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [20:17:17] PROBLEM - cp20 Stunnel Http for mw11 on cp20 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [20:18:25] RECOVERY - mw13 MediaWiki Rendering on mw13 is OK: HTTP OK: HTTP/1.1 200 OK - 20526 bytes in 5.926 second response time [20:18:27] RECOVERY - cp20 Stunnel Http for mw13 on cp20 is OK: HTTP OK: HTTP/1.1 200 OK - 14557 bytes in 4.532 second response time [20:18:51] RECOVERY - cp31 Stunnel Http for mw11 on cp31 is OK: HTTP OK: HTTP/1.1 200 OK - 14565 bytes in 5.826 second response time [20:18:51] RECOVERY - mw11 MediaWiki Rendering on mw11 is OK: HTTP OK: HTTP/1.1 200 OK - 20526 bytes in 6.420 second response time [20:19:48] RECOVERY - cp30 Stunnel Http for mw13 on cp30 is OK: HTTP OK: HTTP/1.1 200 OK - 14565 bytes in 0.355 second response time [20:19:56] RECOVERY - cp31 Stunnel Http for mw13 on cp31 is OK: HTTP OK: HTTP/1.1 200 OK - 14557 bytes in 0.367 second response time [20:20:00] RECOVERY - cp21 Stunnel Http for mw13 on cp21 is OK: HTTP OK: HTTP/1.1 200 OK - 14565 bytes in 0.038 second response time [20:20:25] PROBLEM - cp31 Stunnel Http for mw12 on cp31 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. 
[20:20:31] PROBLEM - cp20 Stunnel Http for mw12 on cp20 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [20:20:35] PROBLEM - mw13 Current Load on mw13 is WARNING: WARNING - load average: 7.08, 6.77, 6.11 [20:20:35] PROBLEM - cp30 Stunnel Http for mw12 on cp30 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [20:21:37] PROBLEM - cp21 Stunnel Http for mw12 on cp21 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [20:21:50] PROBLEM - mw12 MediaWiki Rendering on mw12 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:22:30] RECOVERY - mw13 Current Load on mw13 is OK: OK - load average: 5.56, 6.57, 6.14 [20:23:04] PROBLEM - mw10 MediaWiki Rendering on mw10 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:23:17] RECOVERY - cp20 Stunnel Http for mw11 on cp20 is OK: HTTP OK: HTTP/1.1 200 OK - 14571 bytes in 0.546 second response time [20:23:34] RECOVERY - cp21 Stunnel Http for mw12 on cp21 is OK: HTTP OK: HTTP/1.1 200 OK - 14565 bytes in 2.499 second response time [20:23:44] RECOVERY - mw12 MediaWiki Rendering on mw12 is OK: HTTP OK: HTTP/1.1 200 OK - 20526 bytes in 0.371 second response time [20:23:48] RECOVERY - cp30 Stunnel Http for mw11 on cp30 is OK: HTTP OK: HTTP/1.1 200 OK - 14565 bytes in 1.650 second response time [20:23:53] RECOVERY - cp21 Stunnel Http for mw11 on cp21 is OK: HTTP OK: HTTP/1.1 200 OK - 14557 bytes in 1.343 second response time [20:24:32] RECOVERY - cp31 Stunnel Http for mw12 on cp31 is OK: HTTP OK: HTTP/1.1 200 OK - 14565 bytes in 4.199 second response time [20:24:36] RECOVERY - cp20 Stunnel Http for mw12 on cp20 is OK: HTTP OK: HTTP/1.1 200 OK - 14557 bytes in 0.872 second response time [20:24:44] PROBLEM - mw9 MediaWiki Rendering on mw9 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:24:46] RECOVERY - cp30 Stunnel Http for mw12 on cp30 is OK: HTTP OK: HTTP/1.1 200 OK - 14565 bytes in 1.648 second response time [20:24:54] PROBLEM - mw8 Current Load on mw8 is CRITICAL: CRITICAL - load average: 8.01, 6.98, 6.44 [20:25:01] RECOVERY - mw10 MediaWiki Rendering on mw10 is OK: HTTP OK: HTTP/1.1 200 OK - 20526 bytes in 2.535 second response time [20:25:26] PROBLEM - cp30 Stunnel Http for mw9 on cp30 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [20:25:36] PROBLEM - cp21 Stunnel Http for mw9 on cp21 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [20:25:52] RECOVERY - ns2 GDNSD Datacenters on ns2 is OK: OK - all datacenters are online [20:26:19] !log [universalomega@test3] starting deploy of {'folders': 'w/extensions/CreateWiki'} to skip [20:26:20] !log [universalomega@test3] finished deploy of {'folders': 'w/extensions/CreateWiki'} to skip - SUCCESS in 0s [20:26:30] PROBLEM - cp20 Stunnel Http for mw9 on cp20 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. 
[20:27:52] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [20:28:32] RECOVERY - cp20 Stunnel Http for mw9 on cp20 is OK: HTTP OK: HTTP/1.1 200 OK - 14556 bytes in 4.429 second response time [20:28:46] RECOVERY - mw9 MediaWiki Rendering on mw9 is OK: HTTP OK: HTTP/1.1 200 OK - 20524 bytes in 1.266 second response time [20:28:48] RECOVERY - ns1 GDNSD Datacenters on ns1 is OK: OK - all datacenters are online [20:28:49] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [20:28:52] PROBLEM - mw8 Current Load on mw8 is WARNING: WARNING - load average: 6.52, 7.11, 6.65 [20:29:14] PROBLEM - mw8 MediaWiki Rendering on mw8 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:29:20] PROBLEM - cp31 Stunnel Http for mw8 on cp31 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [20:29:23] PROBLEM - cp30 Stunnel Http for mw8 on cp30 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [20:29:30] PROBLEM - mon2 Current Load on mon2 is WARNING: WARNING - load average: 3.73, 3.31, 3.16 [20:29:31] CosmicAlpha: great job! [20:29:39] RECOVERY - cp21 Stunnel Http for mw9 on cp21 is OK: HTTP OK: HTTP/1.1 200 OK - 14556 bytes in 5.270 second response time [20:29:41] Just tested CW out, it works pretty good [20:29:41] RECOVERY - cp30 Stunnel Http for mw9 on cp30 is OK: HTTP OK: HTTP/1.1 200 OK - 14556 bytes in 5.178 second response time [20:29:50] PROBLEM - ns2 GDNSD Datacenters on ns2 is CRITICAL: CRITICAL - 3 datacenters are down: 51.195.220.68/cpweb, 198.244.148.90/cpweb, 149.56.140.43/cpweb [20:29:52] PROBLEM - cp20 Stunnel Http for mw8 on cp20 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [20:30:02] PROBLEM - cp21 Stunnel Http for mw8 on cp21 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [20:30:29] ssh-agent: while proxying to test3 or just normal? [20:30:50] RECOVERY - mw8 Current Load on mw8 is OK: OK - load average: 6.03, 6.71, 6.56 [20:30:59] CosmicAlpha: proxying via test3 [20:31:18] RECOVERY - mw8 MediaWiki Rendering on mw8 is OK: HTTP OK: HTTP/1.1 200 OK - 20514 bytes in 9.301 second response time [20:31:23] RECOVERY - cp31 Stunnel Http for mw8 on cp31 is OK: HTTP OK: HTTP/1.1 200 OK - 14546 bytes in 7.190 second response time [20:31:26] RECOVERY - cp30 Stunnel Http for mw8 on cp30 is OK: HTTP OK: HTTP/1.1 200 OK - 14546 bytes in 6.490 second response time [20:31:29] RECOVERY - mon2 Current Load on mon2 is OK: OK - load average: 2.09, 3.02, 3.09 [20:31:46] ssh-agent: don't you normally do that? [20:31:51] RECOVERY - cp20 Stunnel Http for mw8 on cp20 is OK: HTTP OK: HTTP/1.1 200 OK - 14546 bytes in 5.006 second response time [20:32:05] RECOVERY - cp21 Stunnel Http for mw8 on cp21 is OK: HTTP OK: HTTP/1.1 200 OK - 14546 bytes in 6.743 second response time [20:32:25] ssh-agent: ah cool, should the default stay as "other", or moved to something else, also should the "other" label stay the same or be renamed to "Custom" or something? Just trying to make it fully right before I update CW. 
[20:33:49] RECOVERY - ns2 GDNSD Datacenters on ns2 is OK: OK - all datacenters are online [20:33:49] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 5.22, 5.21, 4.92 [20:35:46] PROBLEM - gluster3 Current Load on gluster3 is CRITICAL: CRITICAL - load average: 6.13, 5.48, 5.04 [20:36:48] PROBLEM - ns1 GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 1 datacenter is down: 2001:41d0:801:2000::1b80/cpweb [20:37:19] PROBLEM - cp30 Stunnel Http for mw12 on cp30 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [20:37:28] PROBLEM - db101 Current Load on db101 is CRITICAL: CRITICAL - load average: 7.66, 8.72, 5.82 [20:37:44] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 5.15, 5.43, 5.08 [20:38:49] RECOVERY - ns1 GDNSD Datacenters on ns1 is OK: OK - all datacenters are online [20:39:26] RECOVERY - cp30 Stunnel Http for mw12 on cp30 is OK: HTTP OK: HTTP/1.1 200 OK - 14557 bytes in 7.097 second response time [20:39:42] PROBLEM - gluster3 Current Load on gluster3 is CRITICAL: CRITICAL - load average: 6.49, 5.68, 5.20 [20:40:42] RhinosF1: proxy via test3? Not normally, no. I just proxied to it right now to see how the new CreateWiki "Other" field looked/functioned [20:40:50] Ah [20:41:25] RECOVERY - db101 Current Load on db101 is OK: OK - load average: 1.80, 4.92, 4.91 [20:41:39] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 5.97, 5.70, 5.26 [20:42:32] CosmicAlpha: I think "Perfect request" should remain the default. As for the "Other" field, it's current name is fine [20:42:46] PROBLEM - ns2 GDNSD Datacenters on ns2 is CRITICAL: CRITICAL - 2 datacenters are down: 51.195.220.68/cpweb, 2001:41d0:801:2000::4c25/cpweb [20:42:48] PROBLEM - ns1 GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 1 datacenter is down: 2607:5300:201:3100::929a/cpweb [20:43:36] PROBLEM - gluster3 Current Load on gluster3 is CRITICAL: CRITICAL - load average: 7.27, 5.81, 5.33 [20:45:33] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 4.93, 5.15, 5.13 [20:46:46] RECOVERY - ns2 GDNSD Datacenters on ns2 is OK: OK - all datacenters are online [20:46:48] RECOVERY - ns1 GDNSD Datacenters on ns1 is OK: OK - all datacenters are online [20:47:30] PROBLEM - gluster3 Current Load on gluster3 is CRITICAL: CRITICAL - load average: 8.21, 6.30, 5.55 [20:50:32] PROBLEM - cp30 Stunnel Http for mw9 on cp30 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [20:50:45] PROBLEM - ns2 GDNSD Datacenters on ns2 is CRITICAL: CRITICAL - 5 datacenters are down: 51.195.220.68/cpweb, 198.244.148.90/cpweb, 2001:41d0:801:2000::1b80/cpweb, 149.56.140.43/cpweb, 2607:5300:201:3100::5ebc/cpweb [20:50:48] PROBLEM - ns1 GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 6 datacenters are down: 51.195.220.68/cpweb, 2001:41d0:801:2000::4c25/cpweb, 2001:41d0:801:2000::1b80/cpweb, 149.56.140.43/cpweb, 2607:5300:201:3100::929a/cpweb, 2607:5300:201:3100::5ebc/cpweb [20:51:13] PROBLEM - cp31 Stunnel Http for mw12 on cp31 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. 
[20:52:37] RECOVERY - cp30 Stunnel Http for mw9 on cp30 is OK: HTTP OK: HTTP/1.1 200 OK - 14556 bytes in 3.819 second response time [20:52:44] RECOVERY - ns2 GDNSD Datacenters on ns2 is OK: OK - all datacenters are online [20:53:09] CosmicAlpha: Wiki request section headers LGTM :) [20:53:12] Good job :) [20:53:16] RECOVERY - cp31 Stunnel Http for mw12 on cp31 is OK: HTTP OK: HTTP/1.1 200 OK - 14557 bytes in 6.519 second response time [20:56:05] PROBLEM - mw9 MediaWiki Rendering on mw9 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:56:48] RECOVERY - ns1 GDNSD Datacenters on ns1 is OK: OK - all datacenters are online [20:58:05] RECOVERY - mw9 MediaWiki Rendering on mw9 is OK: HTTP OK: HTTP/1.1 200 OK - 20524 bytes in 4.982 second response time [21:01:10] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 5.41, 5.81, 5.86 [21:09:56] [02miraheze/CreateWiki] 07Universal-Omega pushed 031 commit to 03Universal-Omega-patch-1 [+0/-0/±1] 13https://git.io/JSsNh [21:09:58] [02miraheze/CreateWiki] 07Universal-Omega 0385e80e1 - set default for reason to the first array key in `$wgCreateWikiCannedResponses` [21:09:59] [02CreateWiki] 07Universal-Omega created branch 03Universal-Omega-patch-1 - 13https://git.io/vpJTL [21:10:01] [02CreateWiki] 07Universal-Omega opened pull request 03#273: set default for reason to the first array key in `$wgCreateWikiCanned… - 13https://git.io/JSsNj [21:10:16] [02CreateWiki] 07Universal-Omega edited pull request 03#273: set default for reason to the first array key in `$wgCreateWikiCannedResponses` - 13https://git.io/JSsNj [21:10:20] [02CreateWiki] 07Universal-Omega edited pull request 03#273: set default for reason to the first array key in `$wgCreateWikiCannedResponses` - 13https://git.io/JSsNj [21:13:14] PROBLEM - ns1 GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 6 datacenters are down: 51.195.220.68/cpweb, 2001:41d0:801:2000::4c25/cpweb, 149.56.140.43/cpweb, 149.56.141.75/cpweb, 2607:5300:201:3100::929a/cpweb, 2607:5300:201:3100::5ebc/cpweb [21:13:35] PROBLEM - ns2 GDNSD Datacenters on ns2 is CRITICAL: CRITICAL - 4 datacenters are down: 198.244.148.90/cpweb, 2001:41d0:801:2000::1b80/cpweb, 149.56.141.75/cpweb, 2607:5300:201:3100::5ebc/cpweb [21:13:39] PROBLEM - cloud4 Current Load on cloud4 is WARNING: WARNING - load average: 20.48, 20.34, 19.79 [21:14:51] RECOVERY - gluster3 Current Load on gluster3 is OK: OK - load average: 3.27, 4.15, 4.87 [21:15:09] RECOVERY - ns1 GDNSD Datacenters on ns1 is OK: OK - all datacenters are online [21:15:35] RECOVERY - ns2 GDNSD Datacenters on ns2 is OK: OK - all datacenters are online [21:15:36] RECOVERY - cloud4 Current Load on cloud4 is OK: OK - load average: 20.24, 20.22, 19.80 [21:15:36] miraheze/CreateWiki - Universal-Omega the build passed. [21:19:54] PROBLEM - db101 Current Load on db101 is CRITICAL: CRITICAL - load average: 3.06, 6.13, 4.36 [21:21:16] PROBLEM - mw12 MediaWiki Rendering on mw12 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:21:25] PROBLEM - cp21 Stunnel Http for mw12 on cp21 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [21:21:41] PROBLEM - cp30 Stunnel Http for mw12 on cp30 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. 
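PR #273 above changes the wiki request form so the reason dropdown pre-selects the first key of `$wgCreateWikiCannedResponses` instead of falling through to the free-text "Other" option, matching the earlier suggestion that "Perfect request" stay the default. The keys and texts below are stand-ins, not Miraheze's real configuration; the point is only the "first array key" behaviour:

    # Stand-in illustration of PR #273's behaviour: the reason field
    # defaults to the first key of the canned-responses array rather than
    # the free-text "Other" choice. Keys and texts here are examples only,
    # not Miraheze's actual $wgCreateWikiCannedResponses values.
    canned_responses = {
        "Perfect request": "This request looks complete, creating the wiki.",
        "Needs more detail": "Please expand on the purpose of the wiki.",
        "Other": "",  # free-text fallback shown as the "Other" option
    }

    default_reason = next(iter(canned_responses))  # first key: "Perfect request"
    print(default_reason)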
[21:21:49] PROBLEM - ns1 GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 1 datacenter is down: 2001:41d0:801:2000::1b80/cpweb [21:21:53] RECOVERY - db101 Current Load on db101 is OK: OK - load average: 2.04, 4.70, 4.04 [21:22:02] PROBLEM - cp20 Stunnel Http for mw12 on cp20 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [21:22:04] [02miraheze/CreateWiki] 07Universal-Omega pushed 031 commit to 03Universal-Omega-patch-1 [+0/-0/±1] 13https://git.io/JSGef [21:22:05] PROBLEM - cp31 Stunnel Http for mw12 on cp31 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [21:22:05] [02miraheze/CreateWiki] 07Universal-Omega 0390b500e - Update ext.createwiki.oouiform.ooui.less [21:22:07] [02CreateWiki] 07Universal-Omega synchronize pull request 03#273: set default for reason to the first array key in `$wgCreateWikiCannedResponses` - 13https://git.io/JSsNj [21:22:36] [02CreateWiki] 07Universal-Omega edited pull request 03#273: set default for reason to the first array key in `$wgCreateWikiCannedResponses` - 13https://git.io/JSsNj [21:23:07] PROBLEM - db111 Current Load on db111 is WARNING: WARNING - load average: 3.89, 5.85, 4.40 [21:23:32] PROBLEM - ns2 GDNSD Datacenters on ns2 is CRITICAL: CRITICAL - 1 datacenter is down: 2607:5300:201:3100::929a/cpweb [21:23:43] RECOVERY - ns1 GDNSD Datacenters on ns1 is OK: OK - all datacenters are online [21:24:04] RECOVERY - cp20 Stunnel Http for mw12 on cp20 is OK: HTTP OK: HTTP/1.1 200 OK - 14565 bytes in 5.452 second response time [21:24:08] RECOVERY - cp31 Stunnel Http for mw12 on cp31 is OK: HTTP OK: HTTP/1.1 200 OK - 14557 bytes in 6.503 second response time [21:24:41] miraheze/CreateWiki - Universal-Omega the build has errored. [21:25:15] RECOVERY - mw12 MediaWiki Rendering on mw12 is OK: HTTP OK: HTTP/1.1 200 OK - 20526 bytes in 1.823 second response time [21:25:24] RECOVERY - cp21 Stunnel Http for mw12 on cp21 is OK: HTTP OK: HTTP/1.1 200 OK - 14565 bytes in 0.246 second response time [21:25:31] RECOVERY - ns2 GDNSD Datacenters on ns2 is OK: OK - all datacenters are online [21:25:44] [02miraheze/CreateWiki] 07Universal-Omega pushed 031 commit to 03Universal-Omega-patch-1 [+0/-0/±1] 13https://git.io/JSGvE [21:25:46] [02miraheze/CreateWiki] 07Universal-Omega 03508aafd - Update ext.createwiki.oouiform.ooui.less [21:25:47] [02CreateWiki] 07Universal-Omega synchronize pull request 03#273: set default for reason to the first array key in `$wgCreateWikiCannedResponses` - 13https://git.io/JSsNj [21:25:52] RECOVERY - cp30 Stunnel Http for mw12 on cp30 is OK: HTTP OK: HTTP/1.1 200 OK - 14565 bytes in 1.258 second response time [21:26:08] !log [universalomega@test3] starting deploy of {'folders': 'w/extensions/CreateWiki'} to skip [21:26:09] !log [universalomega@test3] finished deploy of {'folders': 'w/extensions/CreateWiki'} to skip - SUCCESS in 0s [21:27:02] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [21:27:07] RECOVERY - db111 Current Load on db111 is OK: OK - load average: 2.06, 4.09, 4.03 [21:27:26] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [21:31:09] miraheze/CreateWiki - Universal-Omega the build passed. 
[21:31:17] PROBLEM - gluster3 Current Load on gluster3 is CRITICAL: CRITICAL - load average: 6.36, 5.93, 5.27 [21:32:31] PROBLEM - mw12 Current Load on mw12 is WARNING: WARNING - load average: 7.50, 6.97, 6.51 [21:34:23] [02miraheze/CreateWiki] 07Universal-Omega pushed 031 commit to 03Universal-Omega-patch-1 [+0/-0/±1] 13https://git.io/JSGTt [21:34:25] [02miraheze/CreateWiki] 07Universal-Omega 033bc98f6 - Update ext.createwiki.oouiform.ooui.less [21:34:26] [02CreateWiki] 07Universal-Omega synchronize pull request 03#273: set default for reason to the first array key in `$wgCreateWikiCannedResponses` - 13https://git.io/JSsNj [21:34:29] RECOVERY - mw12 Current Load on mw12 is OK: OK - load average: 5.77, 6.48, 6.39 [21:34:36] !log [universalomega@test3] starting deploy of {'folders': 'w/extensions/CreateWiki'} to skip [21:34:37] !log [universalomega@test3] finished deploy of {'folders': 'w/extensions/CreateWiki'} to skip - SUCCESS in 1s [21:34:57] PROBLEM - cloud4 Current Load on cloud4 is WARNING: WARNING - load average: 20.83, 21.34, 20.51 [21:35:01] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [21:35:11] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 5.33, 5.72, 5.31 [21:35:21] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [21:36:53] RECOVERY - cloud4 Current Load on cloud4 is OK: OK - load average: 17.33, 19.99, 20.12 [21:38:59] PROBLEM - ns1 GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 3 datacenters are down: 51.195.220.68/cpweb, 149.56.141.75/cpweb, 2607:5300:201:3100::929a/cpweb [21:39:05] RECOVERY - gluster3 Current Load on gluster3 is OK: OK - load average: 3.22, 4.75, 5.02 [21:39:24] PROBLEM - ns2 GDNSD Datacenters on ns2 is CRITICAL: CRITICAL - 2 datacenters are down: 149.56.140.43/cpweb, 149.56.141.75/cpweb [21:39:56] [02CreateWiki] 07Universal-Omega edited pull request 03#273: Improve `selectorother` for `$wgCreateWikiCannedResponses` - 13https://git.io/JSsNj [21:40:00] [02CreateWiki] 07Universal-Omega edited pull request 03#273: Improve `selectorother` for `$wgCreateWikiCannedResponses` - 13https://git.io/JSsNj [21:40:00] miraheze/CreateWiki - Universal-Omega the build passed. [21:40:28] [02miraheze/CreateWiki] 07Universal-Omega pushed 031 commit to 03master [+0/-0/±2] 13https://git.io/JSGIM [21:40:30] [02miraheze/CreateWiki] 07Universal-Omega 03a486bde - Improve `selectorother` for `$wgCreateWikiCannedResponses` (#273) [21:40:31] [02CreateWiki] 07Universal-Omega closed pull request 03#273: Improve `selectorother` for `$wgCreateWikiCannedResponses` - 13https://git.io/JSsNj [21:40:33] [02CreateWiki] 07Universal-Omega deleted branch 03Universal-Omega-patch-1 - 13https://git.io/vpJTL [21:40:34] [02miraheze/CreateWiki] 07Universal-Omega deleted branch 03Universal-Omega-patch-1 [21:43:23] RECOVERY - ns2 GDNSD Datacenters on ns2 is OK: OK - all datacenters are online [21:44:12] [02miraheze/mediawiki] 07Universal-Omega pushed 031 commit to 03REL1_37 [+0/-0/±1] 13https://git.io/JSGtl [21:44:13] [02miraheze/mediawiki] 07Universal-Omega 03ef065af - Update CreateWiki [21:44:53] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 4.75, 5.21, 5.17 [21:46:31] miraheze/CreateWiki - Universal-Omega the build passed. 
[21:46:48] RECOVERY - ns1 GDNSD Datacenters on ns1 is OK: OK - all datacenters are online [21:46:50] RECOVERY - gluster3 Current Load on gluster3 is OK: OK - load average: 3.71, 4.75, 5.01 [21:47:34] !log [universalomega@test3] starting deploy of {'world': True} to skip [21:47:40] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [21:48:42] [02miraheze/mw-config] 07RhinosF1 pushed 031 commit to 03RhinosF1-patch-2 [+0/-0/±1] 13https://git.io/JSGmq [21:48:44] [02miraheze/mw-config] 07RhinosF1 035e273cd - Database: put SCSVG in RO mode [21:48:46] [02mw-config] 07RhinosF1 created branch 03RhinosF1-patch-2 - 13https://git.io/vbvb3 [21:48:47] [02mw-config] 07RhinosF1 opened pull request 03#4330: Database: put SCSVG in RO mode - 13https://git.io/JSGm3 [21:48:54] JohnLewis: ^ [21:49:06] [02miraheze/mw-config] 07github-actions[bot] pushed 031 commit to 03RhinosF1-patch-2 [+0/-0/±1] 13https://git.io/JSGmC [21:49:07] CosmicAlpha: can you deploy so I can collapse in bed soon [21:49:08] [02miraheze/mw-config] 07github-actions 039540378 - CI: lint code to MediaWiki standards [21:49:09] [02mw-config] 07github-actions[bot] synchronize pull request 03#4330: Database: put SCSVG in RO mode - 13https://git.io/JSGm3 [21:49:43] miraheze/mw-config - RhinosF1 the build passed. [21:50:50] PROBLEM - ns1 GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 7 datacenters are down: 51.195.220.68/cpweb, 198.244.148.90/cpweb, 2001:41d0:801:2000::4c25/cpweb, 149.56.140.43/cpweb, 149.56.141.75/cpweb, 2607:5300:201:3100::929a/cpweb, 2607:5300:201:3100::5ebc/cpweb [21:51:29] [02miraheze/mw-config] 07Universal-Omega pushed 031 commit to 03RhinosF1-patch-2 [+0/-0/±1] 13https://git.io/JSGYn [21:51:30] [02miraheze/mw-config] 07Universal-Omega 033d3d8fc - Update Database.php [21:51:32] [02mw-config] 07Universal-Omega synchronize pull request 03#4330: Database: put SCSVG in RO mode - 13https://git.io/JSGm3 [21:51:37] PROBLEM - cloud4 Current Load on cloud4 is WARNING: WARNING - load average: 22.91, 21.71, 20.62 [21:51:43] PROBLEM - cp30 Stunnel Http for mw13 on cp30 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [21:51:54] [02miraheze/mw-config] 07Universal-Omega pushed 031 commit to 03RhinosF1-patch-2 [+0/-0/±1] 13https://git.io/JSGYu [21:51:56] [02miraheze/mw-config] 07Universal-Omega 03b03289c - Update Database.php [21:51:57] [02mw-config] 07Universal-Omega synchronize pull request 03#4330: Database: put SCSVG in RO mode - 13https://git.io/JSGm3 [21:51:59] PROBLEM - cp30 Stunnel Http for mw12 on cp30 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [21:52:13] PROBLEM - cp20 Stunnel Http for mw8 on cp20 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [21:52:18] [02miraheze/mw-config] 07github-actions[bot] pushed 031 commit to 03RhinosF1-patch-2 [+0/-0/±1] 13https://git.io/JSGYo [21:52:20] [02miraheze/mw-config] 07github-actions 03377a544 - CI: lint code to MediaWiki standards [21:52:21] [02mw-config] 07github-actions[bot] synchronize pull request 03#4330: Database: put SCSVG in RO mode - 13https://git.io/JSGm3 [21:52:24] PROBLEM - cp31 Stunnel Http for mw12 on cp31 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [21:52:31] PROBLEM - cp20 Stunnel Http for mw12 on cp20 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [21:52:33] miraheze/mw-config - Universal-Omega the build passed. 
[21:52:41] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 5.11, 5.58, 5.30 [21:52:55] miraheze/mw-config - Universal-Omega the build passed. [21:53:01] PROBLEM - mw12 MediaWiki Rendering on mw12 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:53:16] PROBLEM - cp21 Stunnel Http for mw12 on cp21 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [21:53:19] PROBLEM - ns2 GDNSD Datacenters on ns2 is CRITICAL: CRITICAL - 4 datacenters are down: 198.244.148.90/cpweb, 2001:41d0:801:2000::1b80/cpweb, 149.56.141.75/cpweb, 2607:5300:201:3100::929a/cpweb [21:53:38] RECOVERY - cp30 Stunnel Http for mw13 on cp30 is OK: HTTP OK: HTTP/1.1 200 OK - 14565 bytes in 0.308 second response time [21:53:54] PROBLEM - mw8 Current Load on mw8 is WARNING: WARNING - load average: 7.04, 6.43, 6.06 [21:54:09] RECOVERY - cp20 Stunnel Http for mw8 on cp20 is OK: HTTP OK: HTTP/1.1 200 OK - 14554 bytes in 1.969 second response time [21:55:18] RECOVERY - ns2 GDNSD Datacenters on ns2 is OK: OK - all datacenters are online [21:56:34] RECOVERY - cp31 Stunnel Http for mw12 on cp31 is OK: HTTP OK: HTTP/1.1 200 OK - 14565 bytes in 6.580 second response time [21:56:35] RECOVERY - gluster3 Current Load on gluster3 is OK: OK - load average: 3.25, 4.42, 4.89 [21:56:39] RECOVERY - cp20 Stunnel Http for mw12 on cp20 is OK: HTTP OK: HTTP/1.1 200 OK - 14557 bytes in 5.452 second response time [21:56:48] RECOVERY - ns1 GDNSD Datacenters on ns1 is OK: OK - all datacenters are online [21:57:02] RECOVERY - mw12 MediaWiki Rendering on mw12 is OK: HTTP OK: HTTP/1.1 200 OK - 20526 bytes in 3.540 second response time [21:57:17] RECOVERY - cp21 Stunnel Http for mw12 on cp21 is OK: HTTP OK: HTTP/1.1 200 OK - 14557 bytes in 1.724 second response time [21:57:28] RECOVERY - cloud4 Current Load on cloud4 is OK: OK - load average: 18.55, 19.81, 20.12 [21:57:51] RECOVERY - mw8 Current Load on mw8 is OK: OK - load average: 5.44, 6.24, 6.09 [21:58:18] RECOVERY - cp30 Stunnel Http for mw12 on cp30 is OK: HTTP OK: HTTP/1.1 200 OK - 14557 bytes in 0.321 second response time [21:59:21] [02miraheze/mw-config] 07Universal-Omega pushed 031 commit to 03RhinosF1-patch-2 [+0/-0/±1] 13https://git.io/JSGsB [21:59:23] [02miraheze/mw-config] 07Universal-Omega 03870ae33 - Update Database.php [21:59:24] [02mw-config] 07Universal-Omega synchronize pull request 03#4330: Database: put SCSVG in RO mode - 13https://git.io/JSGm3 [22:00:22] miraheze/mw-config - Universal-Omega the build passed. [22:02:07] !log [universalomega@test3] starting deploy of {'config': True} to skip [22:02:08] !log [universalomega@test3] finished deploy of {'config': True} to skip - SUCCESS in 0s [22:02:30] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [22:02:44] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [22:03:59] [02mw-config] 07Universal-Omega closed pull request 03#4330: Database: put SCSVG in RO mode - 13https://git.io/JSGm3 [22:04:01] [02miraheze/mw-config] 07Universal-Omega deleted branch 03RhinosF1-patch-2 [22:04:02] [02mw-config] 07Universal-Omega deleted branch 03RhinosF1-patch-2 - 13https://git.io/vbvb3 [22:04:03] [02miraheze/mw-config] 07Universal-Omega pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/JSGZn [22:04:05] [02miraheze/mw-config] 07RhinosF1 039c3c9ef - Database: put SCSVG in RO mode (#4330) [22:05:09] miraheze/mw-config - Universal-Omega the build passed. 
[22:05:15] PROBLEM - mw12 Current Load on mw12 is WARNING: WARNING - load average: 7.16, 6.35, 6.13 [22:05:42] !log [universalomega@test101] starting deploy of {'pull': 'config', 'config': True, 'force': True} to skip [22:05:43] !log [universalomega@test101] finished deploy of {'pull': 'config', 'config': True, 'force': True} to skip - SUCCESS in 0s [22:05:49] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [22:05:55] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [22:06:21] !log [universalomega@mwtask111] starting deploy of {'pull': 'config', 'config': True, 'force': True} to scsvg [22:06:27] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [22:07:15] RECOVERY - mw12 Current Load on mw12 is OK: OK - load average: 5.43, 5.92, 5.99 [22:07:27] !log [universalomega@mwtask111] finished deploy of {'pull': 'config', 'config': True, 'force': True} to scsvg - SUCCESS in 66s [22:07:47] !log [universalomega@mw11] starting deploy of {'pull': 'config', 'config': True} to all [22:08:02] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [22:08:27] !log [universalomega@mw11] finished deploy of {'pull': 'config', 'config': True} to all - SUCCESS in 40s [22:08:57] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [22:09:00] !log [universalomega@test3] starting deploy of {'pull': 'config', 'config': True} to skip [22:09:01] !log [universalomega@test3] finished deploy of {'pull': 'config', 'config': True} to skip - SUCCESS in 0s [22:09:29] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [22:09:42] PROBLEM - mw8 Current Load on mw8 is CRITICAL: CRITICAL - load average: 8.20, 6.56, 6.06 [22:09:59] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [22:10:39] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [22:10:48] PROBLEM - ns1 GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 4 datacenters are down: 51.195.220.68/cpweb, 149.56.141.75/cpweb, 2607:5300:201:3100::929a/cpweb, 2607:5300:201:3100::5ebc/cpweb [22:11:12] PROBLEM - ns2 GDNSD Datacenters on ns2 is CRITICAL: CRITICAL - 1 datacenter is down: 149.56.141.75/cpweb [22:11:41] PROBLEM - mw8 Current Load on mw8 is WARNING: WARNING - load average: 6.91, 6.77, 6.21 [22:12:37] [02puppet] 07Universal-Omega opened pull request 03#2246: deploy-mediawiki: fix --world on current infra/without --use-proxy - 13https://git.io/JSGCb [22:12:48] RECOVERY - ns1 GDNSD Datacenters on ns1 is OK: OK - all datacenters are online [22:12:49] paladox: ^ [22:13:03] [02puppet] 07paladox closed pull request 03#2246: deploy-mediawiki: fix --world on current infra/without --use-proxy - 13https://git.io/JSGCb [22:13:05] [02miraheze/puppet] 07paladox pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/JSGWJ [22:13:06] [02miraheze/puppet] 07Universal-Omega 0348cfa62 - deploy-mediawiki: fix --world on current infra/without --use-proxy (#2246) [22:13:09] thanks! 
[22:13:09] PROBLEM - mon2 Current Load on mon2 is WARNING: WARNING - load average: 3.59, 3.53, 3.21 [22:13:12] RECOVERY - ns2 GDNSD Datacenters on ns2 is OK: OK - all datacenters are online [22:13:41] RECOVERY - mw8 Current Load on mw8 is OK: OK - load average: 5.62, 6.18, 6.04 [22:15:37] deployed [22:15:39] https://blog.miraheze.org/post/17/introducing_scsvg/ [22:15:39] alerting : [FIRING:1] (PHP-FPM Worker Usage High mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki [22:15:40] [url] ✩ Introducing SCSVG | blog.miraheze.org [22:16:02] thanks paladox! [22:16:38] JohnLewis: looking [22:17:09] PROBLEM - mon2 Current Load on mon2 is CRITICAL: CRITICAL - load average: 4.49, 4.00, 3.48 [22:17:48] are those things with the red parts all SSD hard drives in our server? [22:18:19] !log [universalomega@test3] starting deploy of {'world': True} to skip [22:18:50] RECOVERY - test3 Puppet on test3 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures [22:18:52] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [22:19:10] PROBLEM - mon2 Current Load on mon2 is WARNING: WARNING - load average: 3.07, 3.87, 3.51 [22:19:21] dmehus: 4 on the bottom are HDDs, 4 on the top (only 3 are filled) are SSDs [22:20:43] PROBLEM - cp31 Stunnel Http for mw12 on cp31 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [22:20:45] PROBLEM - cp20 Stunnel Http for mw12 on cp20 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [22:20:49] PROBLEM - ns1 GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 1 datacenter is down: 149.56.141.75/cpweb [22:20:58] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 5.62, 4.91, 4.51 [22:21:11] PROBLEM - mon2 Current Load on mon2 is CRITICAL: CRITICAL - load average: 4.92, 4.02, 3.59 [22:21:23] !log [universalomega@test3] finished deploy of {'world': True} to skip - SUCCESS in 185s [22:21:29] \ [22:21:32] er [22:21:48] PROBLEM - mw12 MediaWiki Rendering on mw12 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [22:22:12] !log [universalomega@mw11] starting deploy of {'world': True} to all [22:22:17] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [22:22:20] PROBLEM - cp30 Stunnel Http for mw12 on cp30 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [22:22:22] PROBLEM - cp21 Stunnel Http for mw12 on cp21 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. 
[22:22:48] RECOVERY - ns1 GDNSD Datacenters on ns1 is OK: OK - all datacenters are online [22:22:55] RECOVERY - gluster3 Current Load on gluster3 is OK: OK - load average: 4.41, 4.89, 4.57 [22:23:09] PROBLEM - mon2 Current Load on mon2 is WARNING: WARNING - load average: 2.84, 3.58, 3.48 [22:23:10] PROBLEM - ns2 GDNSD Datacenters on ns2 is CRITICAL: CRITICAL - 2 datacenters are down: 51.195.220.68/cpweb, 2607:5300:201:3100::929a/cpweb [22:23:17] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [22:24:26] RECOVERY - cp21 Stunnel Http for mw12 on cp21 is OK: HTTP OK: HTTP/1.1 200 OK - 14557 bytes in 9.168 second response time [22:24:29] RECOVERY - cp30 Stunnel Http for mw12 on cp30 is OK: HTTP OK: HTTP/1.1 200 OK - 14557 bytes in 8.762 second response time [22:24:51] PROBLEM - cloud4 Current Load on cloud4 is WARNING: WARNING - load average: 18.91, 20.69, 19.85 [22:25:11] RECOVERY - ns2 GDNSD Datacenters on ns2 is OK: OK - all datacenters are online [22:25:46] PROBLEM - mw12 Current Load on mw12 is WARNING: WARNING - load average: 7.14, 7.03, 6.57 [22:26:16] PROBLEM - mw9 Current Load on mw9 is CRITICAL: CRITICAL - load average: 8.21, 6.77, 6.21 [22:27:43] PROBLEM - mw12 Current Load on mw12 is CRITICAL: CRITICAL - load average: 8.18, 7.70, 6.89 [22:27:57] RECOVERY - mw12 MediaWiki Rendering on mw12 is OK: HTTP OK: HTTP/1.1 200 OK - 20526 bytes in 8.112 second response time [22:27:58] PROBLEM - mw9 MediaWiki Rendering on mw9 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [22:28:14] RECOVERY - mw9 Current Load on mw9 is OK: OK - load average: 5.69, 6.26, 6.10 [22:28:15] PROBLEM - cp21 Stunnel Http for mw9 on cp21 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [22:28:22] PROBLEM - cp20 Stunnel Http for mw9 on cp20 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [22:28:49] PROBLEM - ns1 GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 3 datacenters are down: 2001:41d0:801:2000::4c25/cpweb, 2001:41d0:801:2000::1b80/cpweb, 149.56.141.75/cpweb [22:28:51] RECOVERY - cloud4 Current Load on cloud4 is OK: OK - load average: 17.37, 20.27, 19.95 [22:28:59] RECOVERY - cp31 Stunnel Http for mw12 on cp31 is OK: HTTP OK: HTTP/1.1 200 OK - 14557 bytes in 0.676 second response time [22:29:01] RECOVERY - cp20 Stunnel Http for mw12 on cp20 is OK: HTTP OK: HTTP/1.1 200 OK - 14557 bytes in 0.556 second response time [22:29:39] PROBLEM - mw12 Current Load on mw12 is WARNING: WARNING - load average: 5.48, 6.99, 6.73 [22:29:54] RECOVERY - mw9 MediaWiki Rendering on mw9 is OK: HTTP OK: HTTP/1.1 200 OK - 20524 bytes in 0.528 second response time [22:30:16] RECOVERY - cp21 Stunnel Http for mw9 on cp21 is OK: HTTP OK: HTTP/1.1 200 OK - 14564 bytes in 7.124 second response time [22:30:26] RECOVERY - cp20 Stunnel Http for mw9 on cp20 is OK: HTTP OK: HTTP/1.1 200 OK - 14556 bytes in 7.366 second response time [22:30:36] JohnLewis, ah, cool :) [22:30:48] RECOVERY - ns1 GDNSD Datacenters on ns1 is OK: OK - all datacenters are online [22:31:21] I'm guessing Gluster and DB servers will use the SSDs? 
[22:31:35] Gluster will use HDDs, DB will use SSDs [22:31:36] RECOVERY - mw12 Current Load on mw12 is OK: OK - load average: 4.94, 6.52, 6.61 [22:31:41] ah [22:31:42] !log [universalomega@mw11] finished deploy of {'world': True} to all - SUCCESS in 570s [22:31:50] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [22:31:53] We're not in a position currently to be able to afford sufficiently sized SSDs for Gluster [22:31:56] * dmehus should've gone with his first guess [22:32:03] ah, true, makes sense [22:32:49] But given we will own the hardware, we will always be in a position to evaluate the situation - and should things change, we will definitely consider using SSDs for everything if its something we evaluate we can invest in [22:33:09] RECOVERY - mon2 Current Load on mon2 is OK: OK - load average: 2.13, 3.07, 3.31 [22:34:01] yeah, that makes sense, we have a way more optionality now, and it's not too far to have you or Owen visit the data centre to install them, or would we have the new hard drives delivered directly to the DC and have our DC provider install them? [22:34:07] PROBLEM - db121 Current Load on db121 is CRITICAL: CRITICAL - load average: 17.57, 9.13, 4.76 [22:34:31] It depends [22:34:38] We can ask them to remote hands stuff [22:34:43] ah [22:34:43] It costs though [22:34:45] yeah [22:34:51] I figured there'd be a cost for that [22:35:02] since we're just paying them for the space + network traffic [22:35:18] You get a bit free each month [22:35:21] ah [22:35:22] But it's not long [22:35:53] yeah, and that's probably intended for things like power outages or whatever [22:36:09] Probably [22:36:14] But I hope we don't get any [22:36:21] although they probably have generators that provide for a decent amount of backup power [22:36:23] yeah [22:36:24] There should be a fair bit of redundancy [22:36:48] At least it's not up here [22:36:54] As northern powergrid are useless [22:37:07] ah [22:37:19] I've actually not googled where in the UK the DC is located [22:37:28] is it north of the midlands? 
[22:37:35] They have 100% uptime and they've been operating quite a few years [22:37:41] ah [22:37:42] It's north of London :P [22:37:46] oh heh [22:37:48] ServerChoice.com [22:37:49] 01438 532300 [22:37:49] https://goo.gl/maps/BQ8xBS576VFGNacQ8 [22:37:50] [url] Google Maps | goo.gl [22:37:55] it's about a 2 hour [22:37:56] thanks :) [22:39:03] * RhinosF1 still holds a grudge against his power supplier for last month [22:41:21] the DC is an hour from me [22:41:22] PROBLEM - gluster3 Current Load on gluster3 is CRITICAL: CRITICAL - load average: 8.10, 5.84, 5.00 [22:41:29] cool paladox [22:41:37] For disk installation, it would depend on urgency tbh whether we consider remote hands or not, if its more involved than just a plug in, we'd probably wait rather than use remote hands [22:42:07] down the M1 [22:42:11] JohnLewis, yeah, that's what I was thinking, like if it was part of our annual/semi-annual upgrade cycle, we could just do it [22:42:27] We could actually probably send Paladox to do it, since he seems to be the closest :P [22:43:17] Any physical work would be tracked on https://phabricator.miraheze.org/tag/cloud_infrastructure/ - there's 2 tasks open currently too [22:43:18] [url] Cloud Infrastructure · Workboard | phabricator.miraheze.org [22:43:38] JohnLewis, ah, cool, thanks, I should follow that workboard [22:44:07] PROBLEM - db121 Current Load on db121 is WARNING: WARNING - load average: 1.70, 5.21, 4.99 [22:44:30] where'd we borrow the power cables from? [22:44:37] ServiceChoice, or a volunteer? [22:44:37] the data centre [22:44:41] serverchoice [22:44:42] ah, cool [22:44:46] thanks [22:44:48] they are very good [22:44:53] yeah [22:44:59] customer relations, much much better [22:45:03] better then ovh by miles [22:45:05] definitely [22:45:10] PROBLEM - cp20 Stunnel Http for mw12 on cp20 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [22:45:11] PROBLEM - cp31 Stunnel Http for mw12 on cp31 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. 
[22:45:29] OVH is...yeah, affordable, but not great in terms of customer service excellence [22:45:59] we have higher uptime with the new data centre anyways [22:46:05] 100% uptime [22:46:06] cool [22:46:07] RECOVERY - db121 Current Load on db121 is OK: OK - load average: 1.41, 3.92, 4.54 [22:46:09] OVH abuse team were awful [22:46:11] wow [22:46:42] I don't even think we got an automated receipt for some of the abuse the bots were getting [22:46:46] RhinosF1, not surprising [22:46:48] PROBLEM - ns1 GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 8 datacenters are down: 51.195.220.68/cpweb, 198.244.148.90/cpweb, 2001:41d0:801:2000::4c25/cpweb, 2001:41d0:801:2000::1b80/cpweb, 149.56.140.43/cpweb, 149.56.141.75/cpweb, 2607:5300:201:3100::929a/cpweb, 2607:5300:201:3100::5ebc/cpweb [22:46:52] Even when I got Reception to submit [22:47:04] PROBLEM - ns2 GDNSD Datacenters on ns2 is CRITICAL: CRITICAL - 6 datacenters are down: 51.195.220.68/cpweb, 2001:41d0:801:2000::4c25/cpweb, 2001:41d0:801:2000::1b80/cpweb, 149.56.140.43/cpweb, 149.56.141.75/cpweb, 2607:5300:201:3100::929a/cpweb [22:47:14] RECOVERY - cp20 Stunnel Http for mw12 on cp20 is OK: HTTP OK: HTTP/1.1 200 OK - 14557 bytes in 6.971 second response time [22:47:16] RECOVERY - cp31 Stunnel Http for mw12 on cp31 is OK: HTTP OK: HTTP/1.1 200 OK - 14557 bytes in 7.040 second response time [22:47:17] Best abuse team has to go to Microsoft's CERT team [22:47:27] ah [22:47:44] They respond very quickly [22:47:54] And also you can tell they act [22:47:58] PROBLEM - mw9 MediaWiki Rendering on mw9 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [22:47:58] Because it stops [22:48:03] PROBLEM - cp21 Stunnel Http for mw9 on cp21 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [22:48:05] heh [22:48:41] PROBLEM - cp20 Stunnel Http for mw9 on cp20 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [22:48:51] PROBLEM - cp31 Stunnel Http for mw9 on cp31 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [22:48:59] dmehus: it's the one part of microsoft I like [22:49:34] PROBLEM - cp30 Stunnel Http for mw9 on cp30 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. 
[22:50:40] RECOVERY - cp20 Stunnel Http for mw9 on cp20 is OK: HTTP OK: HTTP/1.1 200 OK - 14556 bytes in 1.603 second response time [22:50:49] RECOVERY - cp31 Stunnel Http for mw9 on cp31 is OK: HTTP OK: HTTP/1.1 200 OK - 14564 bytes in 0.477 second response time [22:52:07] PROBLEM - db121 Current Load on db121 is WARNING: WARNING - load average: 3.73, 5.93, 5.25 [22:52:50] PROBLEM - db11 Disk Space on db11 is CRITICAL: DISK CRITICAL - free space: / 23904 MB (5% inode=97%); [22:53:08] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 5.35, 5.95, 5.68 [22:53:41] [02miraheze/mw-config] 07Universal-Omega pushed 031 commit to 03Universal-Omega-patch-2 [+0/-0/±1] 13https://git.io/JSGo2 [22:53:43] [02miraheze/mw-config] 07Universal-Omega 037974651 - Enable Purge and WikiSEO by in ManageWiki default [22:53:44] [02mw-config] 07Universal-Omega created branch 03Universal-Omega-patch-2 - 13https://git.io/vbvb3 [22:53:54] [02mw-config] 07Universal-Omega opened pull request 03#4331: Enable Purge and WikiSEO by in ManageWiki default - 13https://git.io/JSGoK [22:54:07] RECOVERY - db121 Current Load on db121 is OK: OK - load average: 1.77, 4.48, 4.80 [22:54:29] PROBLEM - db12 Disk Space on db12 is CRITICAL: DISK CRITICAL - free space: / 17079 MB (3% inode=97%); [22:54:53] miraheze/mw-config - Universal-Omega the build passed. [22:55:02] PROBLEM - cp20 Stunnel Http for mw9 on cp20 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [22:55:12] PROBLEM - cp31 Stunnel Http for mw9 on cp31 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [22:55:24] [02mw-config] 07Universal-Omega edited pull request 03#4331: Enable Purge and WikiSEO in ManageWiki by default - 13https://git.io/JSGoK [22:55:28] [02miraheze/mw-config] 07Universal-Omega pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/JSGKC [22:55:30] [02miraheze/mw-config] 07Universal-Omega 03fcf1f9f - Enable Purge and WikiSEO in ManageWiki by default (#4331) [22:55:31] [02mw-config] 07Universal-Omega closed pull request 03#4331: Enable Purge and WikiSEO in ManageWiki by default - 13https://git.io/JSGoK [22:55:33] [02mw-config] 07Universal-Omega deleted branch 03Universal-Omega-patch-2 - 13https://git.io/vbvb3 [22:55:34] [02miraheze/mw-config] 07Universal-Omega deleted branch 03Universal-Omega-patch-2 [22:56:04] RECOVERY - cp30 Stunnel Http for mw9 on cp30 is OK: HTTP OK: HTTP/1.1 200 OK - 14556 bytes in 9.431 second response time [22:56:18] RECOVERY - cp21 Stunnel Http for mw9 on cp21 is OK: HTTP OK: HTTP/1.1 200 OK - 14556 bytes in 9.597 second response time [22:56:27] PROBLEM - cloud4 Current Load on cloud4 is WARNING: WARNING - load average: 20.83, 20.65, 20.06 [22:56:35] miraheze/mw-config - Universal-Omega the build passed. 
[22:56:58] PROBLEM - db13 Disk Space on db13 is WARNING: DISK WARNING - free space: / 44503 MB (9% inode=97%); [22:57:00] RECOVERY - ns2 GDNSD Datacenters on ns2 is OK: OK - all datacenters are online [22:57:02] PROBLEM - gluster3 Current Load on gluster3 is CRITICAL: CRITICAL - load average: 7.46, 6.53, 5.96 [22:57:15] PROBLEM - mw12 Current Load on mw12 is WARNING: WARNING - load average: 6.88, 6.18, 6.00 [22:58:24] RECOVERY - cloud4 Current Load on cloud4 is OK: OK - load average: 18.15, 19.83, 19.83 [22:58:29] PROBLEM - db12 Disk Space on db12 is WARNING: DISK WARNING - free space: / 47241 MB (10% inode=98%); [22:58:57] RhinosF1, I prefer Microsoft to Google these days [22:58:59] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 4.67, 5.95, 5.83 [22:59:11] dmehus: I prefer Apple & Debian [22:59:15] RECOVERY - mw12 Current Load on mw12 is OK: OK - load average: 5.92, 6.20, 6.04 [22:59:30] oh [23:00:02] !log [@mw11] starting deploy of {'l10nupdate': True} to ovlon [23:00:02] !log [@mwtask111] starting deploy of {'l10nupdate': True} to scsvg [23:00:03] !log [@test101] starting deploy of {'l10nupdate': True} to skip [23:00:04] !log [@test3] starting deploy of {'l10nupdate': True} to skip [23:00:04] !log [@test101] DEPLOY ABORTED: Canary check failed for localhost [23:00:06] * dmehus is not an Apple fan; for search, I'd go with StartPage, and for an OS, Manjaro :P [23:00:07] !log [@mwtask111] DEPLOY ABORTED: Canary check failed for localhost [23:00:28] Debian++ [23:00:34] PROBLEM - cp30 Stunnel Http for mw9 on cp30 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [23:00:36] PROBLEM - cp21 Stunnel Http for mw9 on cp21 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [23:00:41] MirahezeLSBot_: you can't work with no DB [23:00:43] Well done [23:00:44] ssh-agent: there's a Debian++ fork of Debian? [23:00:47] Just be happy [23:00:56] PROBLEM - gluster3 Current Load on gluster3 is CRITICAL: CRITICAL - load average: 6.23, 6.20, 5.94 [23:00:58] dmehus: no some bots on irc allow you to vote for stuff [23:00:58] PROBLEM - ns2 GDNSD Datacenters on ns2 is CRITICAL: CRITICAL - 4 datacenters are down: 51.195.220.68/cpweb, 149.56.140.43/cpweb, 2607:5300:201:3100::929a/cpweb, 2607:5300:201:3100::5ebc/cpweb [23:01:02] Like give it points [23:01:07] oh [23:01:07] By doing that [23:01:10] gotcha [23:01:10] yeah [23:01:13] None of ours [23:01:18] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [23:01:22] ah [23:01:40] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [23:02:20] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [23:02:30] RECOVERY - mw9 MediaWiki Rendering on mw9 is OK: HTTP OK: HTTP/1.1 200 OK - 20524 bytes in 5.521 second response time [23:02:36] RECOVERY - cp21 Stunnel Http for mw9 on cp21 is OK: HTTP OK: HTTP/1.1 200 OK - 14556 bytes in 6.158 second response time [23:02:40] RECOVERY - cp30 Stunnel Http for mw9 on cp30 is OK: HTTP OK: HTTP/1.1 200 OK - 14556 bytes in 6.076 second response time [23:02:51] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [23:02:58] RhinosF1, or CosmicAlpha, when's the next puppet run? 
[23:03:21] RECOVERY - cp20 Stunnel Http for mw9 on cp20 is OK: HTTP OK: HTTP/1.1 200 OK - 14556 bytes in 0.029 second response time [23:03:32] RECOVERY - cp31 Stunnel Http for mw9 on cp31 is OK: HTTP OK: HTTP/1.1 200 OK - 14556 bytes in 0.339 second response time [23:03:55] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [23:04:48] RECOVERY - ns1 GDNSD Datacenters on ns1 is OK: OK - all datacenters are online [23:04:51] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [23:04:57] RECOVERY - ns2 GDNSD Datacenters on ns2 is OK: OK - all datacenters are online [23:05:36] PROBLEM - cp21 Stunnel Http for mw13 on cp21 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [23:05:43] PROBLEM - cp30 Stunnel Http for mw13 on cp30 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [23:05:55] PROBLEM - cloud5 Current Load on cloud5 is CRITICAL: CRITICAL - load average: 26.72, 21.85, 19.28 [23:06:07] PROBLEM - cp20 Stunnel Http for mw13 on cp20 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [23:06:25] PROBLEM - mw13 MediaWiki Rendering on mw13 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:06:48] PROBLEM - cp31 Stunnel Http for mw13 on cp31 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [23:07:26] PROBLEM - test3 APT on test3 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [23:08:44] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 5.77, 5.79, 5.88 [23:08:48] PROBLEM - ns1 GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 2 datacenters are down: 198.244.148.90/cpweb, 149.56.140.43/cpweb [23:08:55] PROBLEM - ns2 GDNSD Datacenters on ns2 is CRITICAL: CRITICAL - 4 datacenters are down: 51.195.220.68/cpweb, 2001:41d0:801:2000::1b80/cpweb, 2607:5300:201:3100::929a/cpweb, 2607:5300:201:3100::5ebc/cpweb [23:09:06] PROBLEM - cp20 Stunnel Http for mw11 on cp20 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [23:09:07] PROBLEM - db111 Current Load on db111 is CRITICAL: CRITICAL - load average: 4.40, 7.15, 5.05 [23:09:17] PROBLEM - cp31 Stunnel Http for mw11 on cp31 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [23:09:27] RECOVERY - test3 APT on test3 is OK: APT OK: 19 packages available for upgrade (0 critical updates). 
[23:09:45] RECOVERY - cp21 Stunnel Http for mw13 on cp21 is OK: HTTP OK: HTTP/1.1 200 OK - 14557 bytes in 5.265 second response time [23:09:47] RECOVERY - cp30 Stunnel Http for mw13 on cp30 is OK: HTTP OK: HTTP/1.1 200 OK - 14557 bytes in 5.529 second response time [23:09:55] PROBLEM - cloud5 Current Load on cloud5 is WARNING: WARNING - load average: 18.06, 22.16, 20.18 [23:10:12] RECOVERY - cp20 Stunnel Http for mw13 on cp20 is OK: HTTP OK: HTTP/1.1 200 OK - 14557 bytes in 1.162 second response time [23:10:30] RECOVERY - mw13 MediaWiki Rendering on mw13 is OK: HTTP OK: HTTP/1.1 200 OK - 20526 bytes in 0.217 second response time [23:10:48] RECOVERY - ns1 GDNSD Datacenters on ns1 is OK: OK - all datacenters are online [23:10:50] RECOVERY - cp31 Stunnel Http for mw13 on cp31 is OK: HTTP OK: HTTP/1.1 200 OK - 14557 bytes in 0.362 second response time [23:10:55] RECOVERY - ns2 GDNSD Datacenters on ns2 is OK: OK - all datacenters are online [23:11:01] RECOVERY - cp20 Stunnel Http for mw11 on cp20 is OK: HTTP OK: HTTP/1.1 200 OK - 14557 bytes in 0.725 second response time [23:11:07] PROBLEM - db111 Current Load on db111 is WARNING: WARNING - load average: 1.83, 5.26, 4.61 [23:11:13] RECOVERY - cp31 Stunnel Http for mw11 on cp31 is OK: HTTP OK: HTTP/1.1 200 OK - 14571 bytes in 0.338 second response time [23:11:36] !log [@mw11] starting deploy of {'config': True} to ovlon [23:11:55] RECOVERY - cloud5 Current Load on cloud5 is OK: OK - load average: 16.02, 19.80, 19.55 [23:12:01] !log [@mw11] finished deploy of {'config': True} to ovlon - SUCCESS in 24s [23:12:08] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [23:12:38] PROBLEM - gluster3 Current Load on gluster3 is CRITICAL: CRITICAL - load average: 6.85, 6.00, 5.91 [23:12:46] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [23:12:52] PROBLEM - cp30 Stunnel Http for mw12 on cp30 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [23:13:03] PROBLEM - mw8 Current Load on mw8 is WARNING: WARNING - load average: 7.14, 6.51, 6.18 [23:13:07] RECOVERY - db111 Current Load on db111 is OK: OK - load average: 2.43, 4.28, 4.32 [23:13:33] PROBLEM - cp21 Stunnel Http for mw12 on cp21 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [23:13:51] PROBLEM - mw12 MediaWiki Rendering on mw12 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:13:52] PROBLEM - cp20 Stunnel Http for mw12 on cp20 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [23:13:58] PROBLEM - cp31 Stunnel Http for mw12 on cp31 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. 
[23:15:01] RECOVERY - mw8 Current Load on mw8 is OK: OK - load average: 6.39, 6.54, 6.24 [23:15:18] PROBLEM - mon2 Current Load on mon2 is WARNING: WARNING - load average: 3.41, 3.18, 3.14 [23:17:15] RECOVERY - mon2 Current Load on mon2 is OK: OK - load average: 3.34, 3.34, 3.21 [23:17:21] !log [@test101] starting deploy of {'config': True} to skip [23:17:22] !log [@test101] DEPLOY ABORTED: Canary check failed for localhost [23:18:10] RECOVERY - cp31 Stunnel Http for mw12 on cp31 is OK: HTTP OK: HTTP/1.1 200 OK - 14557 bytes in 9.651 second response time [23:18:15] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [23:18:36] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [23:19:13] PROBLEM - mw8 MediaWiki Rendering on mw8 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:19:16] PROBLEM - cp21 Stunnel Http for mw8 on cp21 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [23:19:25] PROBLEM - cp31 Stunnel Http for mw8 on cp31 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [23:19:37] PROBLEM - ns1 GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 3 datacenters are down: 51.195.220.68/cpweb, 2001:41d0:801:2000::4c25/cpweb, 2607:5300:201:3100::929a/cpweb [23:19:41] PROBLEM - cp20 Stunnel Http for mw8 on cp20 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [23:19:51] PROBLEM - ns2 GDNSD Datacenters on ns2 is CRITICAL: CRITICAL - 6 datacenters are down: 198.244.148.90/cpweb, 2001:41d0:801:2000::1b80/cpweb, 149.56.140.43/cpweb, 149.56.141.75/cpweb, 2607:5300:201:3100::929a/cpweb, 2607:5300:201:3100::5ebc/cpweb [23:19:51] PROBLEM - cloud4 Current Load on cloud4 is WARNING: WARNING - load average: 22.35, 20.42, 19.56 [23:20:39] PROBLEM - test101 Puppet on test101 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[MediaWiki Config Sync] [23:21:21] RECOVERY - cp21 Stunnel Http for mw8 on cp21 is OK: HTTP OK: HTTP/1.1 200 OK - 14546 bytes in 9.361 second response time [23:21:34] dmehus: they are every 30 minutes [23:21:46] It's also logged when things deploy [23:22:00] Production will come from mw11 [23:22:06] !log [@mwtask111] starting deploy of {'config': True} to scsvg [23:22:07] !log [@mwtask111] DEPLOY ABORTED: Canary check failed for localhost [23:22:35] PROBLEM - cp31 Stunnel Http for mw12 on cp31 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [23:22:49] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [23:22:53] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [23:23:14] RECOVERY - mw8 MediaWiki Rendering on mw8 is OK: HTTP OK: HTTP/1.1 200 OK - 20514 bytes in 2.595 second response time [23:23:25] PROBLEM - db101 Current Load on db101 is WARNING: WARNING - load average: 2.74, 5.17, 3.73 [23:23:29] RECOVERY - cp31 Stunnel Http for mw8 on cp31 is OK: HTTP OK: HTTP/1.1 200 OK - 14546 bytes in 0.820 second response time [23:23:40] RECOVERY - cp20 Stunnel Http for mw8 on cp20 is OK: HTTP OK: HTTP/1.1 200 OK - 14554 bytes in 0.406 second response time [23:25:28] PROBLEM - mwtask111 Puppet on mwtask111 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. 
Failed resources (up to 3 shown): Exec[MediaWiki Config Sync] [23:25:49] RECOVERY - ns2 GDNSD Datacenters on ns2 is OK: OK - all datacenters are online [23:26:02] RECOVERY - cp21 Stunnel Http for mw12 on cp21 is OK: HTTP OK: HTTP/1.1 200 OK - 14557 bytes in 6.982 second response time [23:26:12] RECOVERY - mw12 MediaWiki Rendering on mw12 is OK: HTTP OK: HTTP/1.1 200 OK - 20526 bytes in 7.972 second response time [23:26:17] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 3.52, 5.08, 5.74 [23:26:26] RECOVERY - cp20 Stunnel Http for mw12 on cp20 is OK: HTTP OK: HTTP/1.1 200 OK - 14557 bytes in 4.096 second response time [23:26:40] RECOVERY - cp31 Stunnel Http for mw12 on cp31 is OK: HTTP OK: HTTP/1.1 200 OK - 14557 bytes in 2.687 second response time [23:27:07] PROBLEM - db111 Current Load on db111 is CRITICAL: CRITICAL - load average: 9.83, 9.15, 6.37 [23:27:14] RECOVERY - ns1 GDNSD Datacenters on ns1 is OK: OK - all datacenters are online [23:27:22] PROBLEM - db101 Current Load on db101 is CRITICAL: CRITICAL - load average: 6.99, 5.55, 4.18 [23:27:36] PROBLEM - test3 Current Load on test3 is CRITICAL: CRITICAL - load average: 4.97, 2.66, 1.59 [23:27:50] RECOVERY - cp30 Stunnel Http for mw12 on cp30 is OK: HTTP OK: HTTP/1.1 200 OK - 14565 bytes in 0.308 second response time [23:27:55] PROBLEM - cloud5 Current Load on cloud5 is CRITICAL: CRITICAL - load average: 32.00, 23.16, 20.02 [23:28:07] PROBLEM - test3 APT on test3 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [23:29:07] !log [@test3] starting deploy of {'config': True} to skip [23:29:08] !log [@test3] finished deploy of {'config': True} to skip - SUCCESS in 2s [23:29:09] PROBLEM - mon2 Current Load on mon2 is CRITICAL: CRITICAL - load average: 4.67, 3.65, 3.27 [23:29:21] RECOVERY - db101 Current Load on db101 is OK: OK - load average: 2.95, 4.49, 3.95 [23:29:27] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [23:29:37] RECOVERY - cloud4 Current Load on cloud4 is OK: OK - load average: 18.90, 20.38, 20.14 [23:29:55] PROBLEM - cloud5 Current Load on cloud5 is WARNING: WARNING - load average: 23.72, 23.22, 20.45 [23:30:04] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [23:30:08] RECOVERY - test3 APT on test3 is OK: APT OK: 19 packages available for upgrade (0 critical updates). 
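The repeated "DEPLOY ABORTED: Canary check failed for localhost" entries above mean the deploy tooling refused to sync because a local canary request did not succeed (the new scsvg hosts have no working database yet). A minimal sketch of that kind of check, assuming it boils down to a local HTTP render test; the URL and Host header are illustrative, not the actual Miraheze deploy script:

    # Sketch only: require a successful MediaWiki render from localhost before syncing.
    status=$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 \
        -H 'Host: meta.miraheze.org' http://localhost/wiki/Main_Page)
    if [ "$status" != "200" ]; then
        echo "DEPLOY ABORTED: Canary check failed for localhost" >&2
        exit 1
    fi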
[23:31:07] PROBLEM - db111 Current Load on db111 is WARNING: WARNING - load average: 2.39, 5.34, 5.42 [23:31:09] RECOVERY - mon2 Current Load on mon2 is OK: OK - load average: 2.65, 3.20, 3.15 [23:31:35] RECOVERY - test3 Current Load on test3 is OK: OK - load average: 1.83, 3.04, 2.08 [23:31:55] PROBLEM - cloud5 Current Load on cloud5 is CRITICAL: CRITICAL - load average: 25.61, 24.48, 21.26 [23:32:44] PROBLEM - gluster4 Current Load on gluster4 is WARNING: WARNING - load average: 5.26, 4.91, 3.80 [23:33:07] RECOVERY - db111 Current Load on db111 is OK: OK - load average: 1.64, 4.08, 4.94 [23:33:17] PROBLEM - db101 Current Load on db101 is WARNING: WARNING - load average: 3.13, 5.23, 4.46 [23:34:15] PROBLEM - gluster3 Current Load on gluster3 is CRITICAL: CRITICAL - load average: 9.38, 6.26, 5.86 [23:34:35] RhinosF1, oh, let me check again if CosmicAlpha's ManageWiki change has been deployed [23:34:43] RECOVERY - gluster4 Current Load on gluster4 is OK: OK - load average: 4.68, 4.84, 3.92 [23:35:55] PROBLEM - cloud5 Current Load on cloud5 is WARNING: WARNING - load average: 22.03, 23.24, 21.51 [23:35:59] RhinosF1, or CosmicAlpha, do changes to ManageWiki default extensions only affect *new* wikis, or should they affect *existing* wikis, too? [23:36:09] !log [@test3] finished deploy of {'l10nupdate': True} to skip - SUCCESS in 2166s [23:36:12] I'm okay with either, but if the latter, I'm not seeing that [23:36:15] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 5.10, 5.58, 5.66 [23:36:27] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [23:36:37] !log [@mw11] finished deploy of {'l10nupdate': True} to ovlon - SUCCESS in 2194s [23:36:49] dmehus: this one will affect all wikis once I enable them on existing, which I am about to do as agent said that is what should be done. But can still be disabled in ManageWiki if wished. [23:36:54] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [23:37:14] CosmicAlpha, oh, right, you have to run foreachindblist or whatever on loginwiki, right? [23:37:15] RECOVERY - db101 Current Load on db101 is OK: OK - load average: 2.33, 4.33, 4.34 [23:38:15] PROBLEM - gluster3 Current Load on gluster3 is CRITICAL: CRITICAL - load average: 6.91, 6.03, 5.80 [23:39:07] PROBLEM - db111 Current Load on db111 is CRITICAL: CRITICAL - load average: 21.12, 11.40, 7.44 [23:40:29] dmehus `mwscript all`, which runs foreachwikiindblist yeah. 
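The "mwscript ... all" / foreachwikiindblist pattern mentioned above amounts to running one MediaWiki maintenance script once per wiki listed in a dblist. A minimal sketch of that loop, with placeholder paths and a hypothetical script name rather than the exact Miraheze wrapper:

    # Hypothetical shape only: run a maintenance script for every wiki in a dblist.
    while read -r wiki; do
        php /srv/mediawiki/w/maintenance/someScript.php --wiki="$wiki"
    done < /srv/mediawiki/dblists/all.dblist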
[23:40:43] PROBLEM - ns2 GDNSD Datacenters on ns2 is CRITICAL: CRITICAL - 4 datacenters are down: 51.195.220.68/cpweb, 149.56.140.43/cpweb, 149.56.141.75/cpweb, 2607:5300:201:3100::929a/cpweb [23:41:11] PROBLEM - db101 Current Load on db101 is CRITICAL: CRITICAL - load average: 6.01, 5.53, 4.80 [23:42:15] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 5.73, 5.71, 5.72 [23:43:07] PROBLEM - cloud4 Current Load on cloud4 is CRITICAL: CRITICAL - load average: 27.00, 23.36, 21.42 [23:43:10] RECOVERY - db101 Current Load on db101 is OK: OK - load average: 2.50, 4.33, 4.44 [23:43:27] PROBLEM - ns1 GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 4 datacenters are down: 2001:41d0:801:2000::1b80/cpweb, 149.56.141.75/cpweb, 2607:5300:201:3100::929a/cpweb, 2607:5300:201:3100::5ebc/cpweb [23:44:07] PROBLEM - db121 Current Load on db121 is CRITICAL: CRITICAL - load average: 4.28, 6.90, 4.97 [23:44:13] PROBLEM - mw9 MediaWiki Rendering on mw9 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:44:14] PROBLEM - mw9 Current Load on mw9 is CRITICAL: CRITICAL - load average: 8.07, 6.93, 6.01 [23:44:15] PROBLEM - gluster3 Current Load on gluster3 is CRITICAL: CRITICAL - load average: 5.93, 6.03, 5.85 [23:45:04] PROBLEM - cloud4 Current Load on cloud4 is WARNING: WARNING - load average: 20.88, 22.69, 21.44 [23:45:07] PROBLEM - db111 Current Load on db111 is WARNING: WARNING - load average: 2.06, 4.66, 5.60 [23:45:22] RECOVERY - ns1 GDNSD Datacenters on ns1 is OK: OK - all datacenters are online [23:45:22] PROBLEM - mw12 Current Load on mw12 is WARNING: WARNING - load average: 7.01, 6.60, 6.13 [23:46:07] RECOVERY - db121 Current Load on db121 is OK: OK - load average: 1.66, 5.07, 4.54 [23:46:09] RECOVERY - mw9 MediaWiki Rendering on mw9 is OK: HTTP OK: HTTP/1.1 200 OK - 20524 bytes in 0.185 second response time [23:46:13] CosmicAlpha, ah, thanks, yeah that's the one :) [23:46:14] PROBLEM - mw9 Current Load on mw9 is WARNING: WARNING - load average: 7.39, 6.80, 6.06 [23:46:15] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 4.29, 5.35, 5.62 [23:46:41] RECOVERY - ns2 GDNSD Datacenters on ns2 is OK: OK - all datacenters are online [23:47:19] RECOVERY - mw12 Current Load on mw12 is OK: OK - load average: 5.74, 6.16, 6.01 [23:47:55] RECOVERY - cloud5 Current Load on cloud5 is OK: OK - load average: 15.71, 18.28, 20.13 [23:48:14] RECOVERY - mw9 Current Load on mw9 is OK: OK - load average: 4.32, 5.80, 5.78 [23:48:37] RECOVERY - test101 Puppet on test101 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [23:48:58] RECOVERY - cloud4 Current Load on cloud4 is OK: OK - load average: 17.56, 19.00, 20.14 [23:49:07] RECOVERY - db111 Current Load on db111 is OK: OK - load average: 1.70, 3.03, 4.71 [23:50:39] ok : [RESOLVED] (PHP-FPM Worker Usage High mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki [23:51:05] PROBLEM - ns1 GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 2 datacenters are down: 198.244.148.90/cpweb, 2001:41d0:801:2000::1b80/cpweb [23:51:24] !log set io threads to 32 on gluster3-4 [23:51:26] RECOVERY - mwtask111 Puppet on mwtask111 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures [23:52:00] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [23:53:18] !log set io threads to 32 on gluster101 [23:53:59] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [23:54:39] alerting : [FIRING:1] (PHP-FPM Worker Usage High 
mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki [23:54:54] RECOVERY - ns1 GDNSD Datacenters on ns1 is OK: OK - all datacenters are online [23:58:56] PROBLEM - mon2 Current Load on mon2 is WARNING: WARNING - load average: 2.97, 3.42, 3.30
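The earlier "set io threads to 32 on gluster3-4" and "gluster101" log entries refer to the size of GlusterFS's IO thread pool, which is tuned per volume. A sketch of the corresponding commands, assuming the standard GlusterFS option; "myvol" is a placeholder, not the actual Miraheze volume name:

    # Raise the GlusterFS IO thread pool for a volume (default is lower, e.g. 16).
    gluster volume set myvol performance.io-thread-count 32
    gluster volume get myvol performance.io-thread-count   # confirm the new value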