[00:02:09] PROBLEM - cloud4 Current Load on cloud4 is CRITICAL: CRITICAL - load average: 33.22, 22.17, 17.75 [00:04:09] PROBLEM - graylog2 Puppet on graylog2 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. It might be a dependency cycle. [00:04:17] PROBLEM - gluster3 Current Load on gluster3 is CRITICAL: CRITICAL - load average: 7.12, 5.45, 4.70 [00:06:08] PROBLEM - cloud4 Current Load on cloud4 is WARNING: WARNING - load average: 23.43, 23.28, 19.19 [00:06:17] RECOVERY - gluster3 Current Load on gluster3 is OK: OK - load average: 4.49, 5.03, 4.64 [00:08:09] PROBLEM - cloud4 Current Load on cloud4 is CRITICAL: CRITICAL - load average: 27.05, 25.33, 20.48 [00:10:16] PROBLEM - gluster3 Current Load on gluster3 is CRITICAL: CRITICAL - load average: 7.90, 6.54, 5.37 [00:11:25] PROBLEM - mon2 Current Load on mon2 is WARNING: WARNING - load average: 3.46, 3.19, 2.70 [00:13:20] RECOVERY - mon2 Current Load on mon2 is OK: OK - load average: 2.21, 2.85, 2.64 [00:32:07] RECOVERY - graylog2 Puppet on graylog2 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:46:07] PROBLEM - cloud4 Current Load on cloud4 is WARNING: WARNING - load average: 14.65, 18.67, 23.24 [00:56:08] RECOVERY - cloud4 Current Load on cloud4 is OK: OK - load average: 14.23, 16.63, 20.37 [00:58:33] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 3.02, 4.62, 5.97 [01:04:37] PROBLEM - gluster3 Current Load on gluster3 is CRITICAL: CRITICAL - load average: 8.02, 5.26, 5.70 [01:06:36] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 5.23, 5.22, 5.64 [01:16:37] RECOVERY - gluster3 Current Load on gluster3 is OK: OK - load average: 3.01, 4.33, 5.00 [01:27:49] PROBLEM - gluster3 Current Load on gluster3 is CRITICAL: CRITICAL - load average: 9.53, 6.49, 5.45 [01:29:50] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 4.98, 5.63, 5.25 [01:31:51] PROBLEM - gluster3 Current Load on gluster3 is CRITICAL: CRITICAL - load average: 6.72, 6.32, 5.57 [01:35:14] PROBLEM - studentwiki.ddns.net - reverse DNS on sslhost is WARNING: rDNS WARNING - reverse DNS entry for studentwiki.ddns.net could not be found [01:39:56] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 3.95, 5.32, 5.50 [01:45:54] RECOVERY - gluster3 Current Load on gluster3 is OK: OK - load average: 2.51, 3.86, 4.87 [01:53:19] PROBLEM - mon2 Current Load on mon2 is WARNING: WARNING - load average: 3.51, 3.35, 3.14 [01:53:59] PROBLEM - gluster3 Current Load on gluster3 is CRITICAL: CRITICAL - load average: 6.20, 4.96, 4.90 [01:57:10] RECOVERY - mon2 Current Load on mon2 is OK: OK - load average: 2.61, 3.06, 3.08 [01:58:00] RECOVERY - gluster3 Current Load on gluster3 is OK: OK - load average: 4.12, 4.80, 4.88 [02:36:16] PROBLEM - gluster3 Current Load on gluster3 is CRITICAL: CRITICAL - load average: 7.50, 6.44, 5.25 [02:36:20] PROBLEM - cloud4 Current Load on cloud4 is CRITICAL: CRITICAL - load average: 24.08, 22.26, 18.42 [02:36:20] PROBLEM - mw9 Current Load on mw9 is WARNING: WARNING - load average: 7.05, 5.69, 4.21 [02:38:19] PROBLEM - cloud4 Current Load on cloud4 is WARNING: WARNING - load average: 20.72, 21.54, 18.62 [02:38:19] RECOVERY - mw9 Current Load on mw9 is OK: OK - load average: 4.03, 4.92, 4.09 [02:40:14] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 4.96, 5.62, 5.18 [02:40:17] RECOVERY - cloud4 Current Load on cloud4 is OK: OK - load average: 18.82, 
20.32, 18.50 [02:44:13] RECOVERY - gluster3 Current Load on gluster3 is OK: OK - load average: 3.85, 4.65, 4.89 [03:16:20] SRE, 503 backend fetch failed on `onlyonewiki` [03:16:21] ```Error 503 Backend fetch failed, forwarded for , 127.0.0.1 [03:16:21] (Varnish XID 434474458) via cp15.miraheze.org at Wed, 13 Oct 2021 03:15:26 GMT.``` [03:37:05] PROBLEM - mw10 Current Load on mw10 is CRITICAL: CRITICAL - load average: 14.32, 8.39, 5.20 [03:39:02] RECOVERY - mw10 Current Load on mw10 is OK: OK - load average: 4.46, 6.75, 4.98 [03:40:15] A lot of users are reporting issues when accessing pages [03:40:46] moviepediawiki, onibuswiki and closinglogosgroupwiki [04:20:11] PROBLEM - lcn.zfc.id.lv - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Certificate 'lcn.zfc.id.lv' expires in 7 day(s) (Thu 21 Oct 2021 04:19:31 GMT +0000). [04:22:48] !sre At least 4 wikis have some pages where they receive internal errors while trying to edit [04:25:09] PROBLEM - wiki.landev.vn - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Certificate 'wiki.landev.vn' expires in 7 day(s) (Thu 21 Oct 2021 04:18:15 GMT +0000). [04:27:56] having the same issue at closinglogosgroup [04:32:05] hb1290: We apologize for the inconvenience. We hope to have this fixed soon, thank you for your patience. [05:04:34] PROBLEM - ping6 on dbbackup1 is WARNING: PING WARNING - Packet loss = 0%, RTA = 165.74 ms [05:05:40] Agent: I raised the task to UBN. I'm currently mobile though so no Graylog access to see the exact error. [05:06:07] https://phabricator.miraheze.org/T8160 [05:06:08] [url] ⚓ T8160 Fatal exception of type "TypeError" | phabricator.miraheze.org [05:06:16] Oh alright, thanks. That's unfortunate but I'm sure someone else will come soon to deal with this [05:06:37] RECOVERY - ping6 on dbbackup1 is OK: PING OK - Packet loss = 0%, RTA = 92.77 ms [05:09:58] RhinosF1: When possible do you mind trying to look at ^? I'll be unavailable for the rest of the night. [05:11:31] Reception123: mind looking into https://phabricator.miraheze.org/T8160 since you're here now? I'll be unavailable till tomorrow. It's been reported on around 5 wikis already. [05:11:32] [url] ⚓ T8160 Fatal exception of type "TypeError" | phabricator.miraheze.org [05:12:28] CosmicAlpha: yes I just saw that and was just logging into Graylog [05:12:35] Thanks! [05:12:41] Argument 1 passed to Wikimedia\IPSet::__construct() must be of the type array, string given, called in /srv/mediawiki/w/extensions/StopForumSpam/includes/DenyListManager.php on line 70 [05:12:50] https://www.irccloud.com/pastebin/pE9yn3Xi/ [05:14:00] ^ CosmicAlpha [05:15:23] I don't see any recent updates for StopForumSpam either [05:18:40] Reception123: yeah I just looked. I don't see anything either, to that or vendor (that would cause this) [05:18:52] yeah, I also checked vendor and that was 12 days ago [05:18:59] so I'm really unsure what could cause this if nothing was updated [05:19:19] test3 on 1.37 is also unaffected [05:20:21] Yeah no idea. [05:20:46] No changes in config that I saw either. [05:21:49] CosmicAlpha: hmm, I just enabled it on testwiki and there's no errors [05:22:43] What schema changes did RhinosF1 do today? It might be database related.
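For context on the fatal above: Wikimedia\IPSet::__construct() is declared to take an array of CIDR strings, so DenyListManager fatals the moment whatever it loaded comes back as anything else. A minimal standalone sketch of that failure mode (illustration only, not Miraheze or StopForumSpam code):

```php
<?php
// Standalone illustration of the TypeError reported above.
// Wikimedia\IPSet::__construct( array $ipSet ) requires an array of CIDR strings.
use Wikimedia\IPSet;

$good = new IPSet( [ '127.0.0.0/8', '192.0.2.0/24' ] ); // fine: array given

$loaded = '127.0.0.0/8';     // a value of the wrong shape, e.g. from a stale or
                             // incompatible cache entry
$bad = new IPSet( $loaded ); // TypeError: must be of the type array, string given
```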
[05:23:03] let's check SAL [05:23:33] https://www.irccloud.com/pastebin/zZLNygj8/ [05:24:10] though that should just be for test3 [05:24:24] only flaggedrevs stuff should've been global [05:24:55] Reception123: https://github.com/wikimedia/mediawiki-extensions-StopForumSpam/blob/REL1_36/includes/DenyListManager.php#L66-L67 which in turn loads some database stuff, which is used in L70 (doUpdate) so it might be those schema changes as that's the only change I can think of. But I don't know how really. [05:24:56] [url] mediawiki-extensions-StopForumSpam/DenyListManager.php at REL1_36 · wikimedia/mediawiki-extensions-StopForumSpam · GitHub | github.com [05:25:30] CosmicAlpha: yeah, though they only did happen on test3 according to SAL [05:26:40] PROBLEM - gluster3 Current Load on gluster3 is CRITICAL: CRITICAL - load average: 6.30, 5.76, 4.50 [05:26:53] Reception123: That's not the impression I got earlier though maybe I misunderstood. Try running https://github.com/wikimedia/mediawiki-extensions-StopForumSpam/blob/REL1_36/maintenance/updateDenyList.php on a single broken wiki, as I'm not really sure what that will do. It would be helpful if it were reproducible on a test wiki. [05:26:53] [url] mediawiki-extensions-StopForumSpam/updateDenyList.php at REL1_36 · wikimedia/mediawiki-extensions-StopForumSpam · GitHub | github.com [05:27:14] CosmicAlpha: ok, let's see. But yeah, too bad we can't reproduce on test3 [05:28:41] RECOVERY - gluster3 Current Load on gluster3 is OK: OK - load average: 3.24, 4.97, 4.39 [05:28:43] !log [reception@mwtask1] sudo -u www-data php /srv/mediawiki/w/extensions/StopForumSpam/maintenance/updateDenyList.php --wiki=moviiepediawiki (END - exit=2) [05:28:47] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [05:28:52] !log [reception@mwtask1] sudo -u www-data php /srv/mediawiki/w/extensions/StopForumSpam/maintenance/updateDenyList.php --wiki=moviepediawiki (END - exit=65280) [05:28:53] if I had to guess, SFS 1.36 and 1.37 caching formats don't support each other (and SFS does not set cache version to detect it) [05:28:54] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [05:29:22] CosmicAlpha: running that just gives me the same error [05:29:48] majavah: oh, but 1.37 is only active on test3 so how could it affect production wikis? [05:31:13] https://github.com/wikimedia/mediawiki-extensions-StopForumSpam/blob/REL1_36/includes/DenyListManager.php#L62 [05:31:14] [url] mediawiki-extensions-StopForumSpam/DenyListManager.php at REL1_36 · wikimedia/mediawiki-extensions-StopForumSpam · GitHub | github.com [05:31:28] yeah I figured it was likely schema-related [05:31:32] Yeah that was what I was just getting at. (What majavah said, though not exactly there yet) but it does seem different, so it makes sense. I do know for sure it's related to the 1.37 update on test3 though. [05:32:23] hmm - so what can we do to fix that then? [05:33:00] check RhinosF1's bash history and see what keystrokes he issued to change the db tables, and revert them? [05:33:19] well that's already in SAL [05:33:25] No idea tbh. dmehus: it's not DB related. I was wrong about that. [05:33:26] ah, true [05:33:33] CosmicAlpha, oh [05:34:08] majavah: do you have any idea about how we could fix it? Even as a temporary measure [05:34:13] It is 1.37 related though test3 is overriding the global cache set from 1.36 I think, and 1.37 and 1.36 cache is incompatible. [05:34:46] though how would it override it if there's two separate branches? 
[05:36:13] Reception123: manually purge the cache key via shell.php, and ensure SFS is not loaded in 1.37 [05:36:41] we might be able to fix it in the extension code too, but that'll take time [05:37:09] I see, how would I purge the cache key with shell.php, I've never done that before [05:37:19] I'll disable SFS on test3 now [05:37:30] Reception123: Because it's not branch specific. I'm not too knowledgeable when it comes to object caching, but I do believe it's because the object cache is set from test3, not production, because that's what it last expired on; it's not set from a production wiki. But yeah you'll have to manually purge the cache key in eval.php, we don't use shell.php as we don't have one of the dev dependencies installed to use that. [05:38:51] CosmicAlpha: I see, I'm looking at docs but can't find the command needed to purge the cache key [05:39:25] There are no docs for it. Let me see if I can find out what it needs to be. I can't think of it right now either. [05:40:33] $wan = \MediaWiki\MediaWikiServices::getInstance()->getMainWANObjectCache(); $wan->delete( $wan->makeGlobalKey( 'sfs-denylist-set ' ) ); iirc, although I'm on mobile so can't confirm [05:40:34] Reception123: `MediaWiki\StopForumSpam\DenyListUpdate::purgeDenyListIPs()` possibly. [05:41:07] majavah: ^ shouldn't just running that function do it? [05:41:10] no space inside the quotes though, oops [05:41:48] maybe, but that requires the class loads fine, which is not a guarantee if the cache is broken [05:43:07] would I have to run that on a production wiki then, or would test3 be fine? [05:43:42] If not then yeah `$wanCache = MediaWikiServices::getInstance()->getMainWANObjectCache(); return $wanCache->delete( $wanCache->makeGlobalKey( 'sfs-denylist-set' ) );` should work. Which is just that function without the SFS dependency. [05:43:59] Reception123: Any server should work I think (including test3) [05:44:21] ^that command without return. [05:44:27] ok, let me try [05:45:13] CosmicAlpha: should it be \MediaWiki\MediaWikiServices? [05:45:15] And also \MediaWiki\MediaWikiServices [05:45:17] Yeah. [05:45:20] yeah, I thought so too [05:45:23] My bad. [05:45:55] CosmicAlpha: majavah that fixed it, thanks to both! :) [05:46:02] !log > $wanCache = \MediaWiki\MediaWikiServices::getInstance()->getMainWANObjectCache();$wanCache->delete($wanCache->makeGlobalKey( 'sfs-denylist-set' )); on test3 [05:46:05] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [05:46:20] No problem. Glad it's fixed! [05:48:40] I'm awake [05:49:34] Reception123: I closed the task. We can't do any more on our side unless upstream does. It's not a huge deal though, it will work when we make production 1.37. [05:50:00] Morning RhinosF1 [05:50:12] RhinosF1: TL;DR StopForumSpam on test3 messed with the object cache and made wikis with SFS on prod go down partially [05:50:36] Okay [05:50:58] dmehus: for the record: yes there are 2 production schema changes running but they add stuff [05:52:27] We need to raise that [05:52:41] It should use a cache version [05:52:50] It's an upgrade blocker [05:54:24] It'll work when production is 1.37 (almost certain of that). But it can be raised upstream I guess.
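For reference, the purge that resolved this (as run above via eval.php), together with a sketch of the cache-version idea suggested as the upstream fix. The 'v2' key component below is purely illustrative; the extension does not currently version this key:

```php
<?php
// The one-liner run above in eval.php: drop the shared deny-list key so the
// next request repopulates it from a wiki running the production branch.
$wanCache = \MediaWiki\MediaWikiServices::getInstance()->getMainWANObjectCache();
$wanCache->delete( $wanCache->makeGlobalKey( 'sfs-denylist-set' ) );

// Sketch of the suggested upstream fix: bake a format version into the key so
// 1.36 and 1.37 never read each other's entries ('v2' is hypothetical here).
$key = $wanCache->makeGlobalKey( 'sfs-denylist-set', 'v2' );
```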
[05:58:21] Wont work during the update though [06:17:18] RhinosF1, ack, okay, yeah no problem [06:17:30] Reception123: ack [06:17:38] I'll take your word for it :P [06:17:58] dmehus: also no way they'd hit moviepedia wiki [06:18:10] They are taking forever [06:18:17] * RhinosF1 curses the image table [06:18:48] RhinosF1, are you talking about the SFS issue, or something else? [06:18:52] * dmehus is a bit confused here [06:19:02] dmehus: the database changes [06:19:33] yeah, but what's moviepediawiki got to do with it? [06:26:11] dmehus: that was one with the error [06:26:18] Reception123: can you run select * from performance_schema.metadata_locks where db = '1134819wiki'; [06:26:36] the schema change is stuck [06:26:54] dmehus: we've performed the change on 1 wiki in 8. hours [06:28:19] RhinosF1: which DB is that part of? [06:29:01] RhinosF1, oh yeah, makes sense Coco would have SNS enabled [06:29:11] his wiki had a lot of spambots [06:29:22] s/SNS/SMS [06:29:22] dmehus meant to say: RhinosF1, oh yeah, makes sense Coco would have SMS enabled [06:29:27] err [06:29:31] that's B.RC.3 :P - before ReCaptcha3! [06:29:35] s/SNS/SFS [06:29:35] dmehus meant to say: RhinosF1, oh yeah, makes sense Coco would have SFS enabled [06:29:49] R123, yeah [06:31:39] Reception123: c3 [06:32:05] db12 [06:33:11] is that supposed to be a table in 1134819wiki? because I don't see one [06:34:08] Reception123: no it's a metadata table [06:34:18] it's from mysql [06:34:37] ok, night [06:36:24] oh I see [06:36:26] dmehus: good night :) [06:36:50] RhinosF1: hmm ERROR 1146 (42S02): Table 'performance_schema.metadata_locks' doesn't exist [06:37:20] paladox: can you help? my schema change is stuck [06:38:14] yeah there's all kinds of tables in performance_schema but metadata_locks isn't one of them [06:38:23] Reception123: like what [06:38:39] https://www.irccloud.com/pastebin/obN3nc29/ [06:39:03] no idea [06:40:39] Reception123: https://phabricator.miraheze.org/T8163 [06:40:40] [url] ⚓ T8163 Unstick schema change | phabricator.miraheze.org [06:41:52] ack [06:55:52] Reception123: can you deploy https://github.com/miraheze/SpriteSheet/pull/8 [06:55:52] [url] Replace usages of `$wgUser` by Universal-Omega · Pull Request #8 · miraheze/SpriteSheet · GitHub | github.com [06:55:55] to 1.37 [06:57:01] Oh, I meant to do that today, but forgot. [06:57:36] I've got to go in 5 mins so can't as that won't be enough time :( (I'd also have to reclone MW) [06:58:07] [02SpriteSheet] 07Universal-Omega edited pull request 03#8: Replace usages of `$wgUser` - 13https://git.io/JKL64 [06:58:22] [02SpriteSheet] 07Universal-Omega edited pull request 03#8: Replace usages of `$wgUser` - 13https://git.io/JKL64 [07:07:25] PROBLEM - cloud4 Current Load on cloud4 is WARNING: WARNING - load average: 23.24, 20.35, 17.26 [07:07:27] PROBLEM - cloud5 Current Load on cloud5 is WARNING: WARNING - load average: 20.50, 17.29, 14.63 [07:09:23] RECOVERY - cloud4 Current Load on cloud4 is OK: OK - load average: 16.25, 18.86, 17.11 [07:09:27] RECOVERY - cloud5 Current Load on cloud5 is OK: OK - load average: 14.40, 16.33, 14.62 [07:21:36] PROBLEM - wiki.elgeis.com - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:29:42] !sre All requests are timing out. [07:31:17] To what wiki [07:31:30] Any, Phab included. [07:31:46] Ye [07:31:51] Includes icinga [07:31:56] Hang by [07:32:00] We're dead [07:32:05] Reception123: ^ [07:33:10] .op [07:33:10] Attempting to OP... 
[07:34:30] This outage makes no sense [07:34:38] But I got to get to college [07:34:39] RhinosF1: mobile only for the entire day I'm afraid [07:34:53] Reception123: can you get Owen to message John [07:35:02] I might be able to look in around 3 hours [07:35:19] I'm blaming OVH [07:35:25] RhinosF1: yeah I could DM him [07:35:30] Looks to be complete network loss [07:35:39] Well at least I know that it's not just my site. Whew. [07:35:49] What's down, exactly? [07:38:24] Where did everyone go? [07:38:47] Everything is down. Matomo, Mail, MediaWiki, NS*. [07:38:50] Is this related to the supposed "downtime?" [07:38:57] RhinosF1: back up for me [07:39:11] Able to access phab and Meta with no issues [07:39:17] I cant. [07:39:22] Meta still gives 503 [07:39:28] PROBLEM - thesimswiki.com - reverse DNS on sslhost is CRITICAL: rDNS CRITICAL - thesimswiki.com All nameservers failed to answer the query. [07:39:36] My sites are all good [07:39:42] And so does Icinga, Grafana, Webmail, Phabricator. [07:39:50] Apparently, it seems like a whole DNS server might have gone down if my windows troubleshooter is anything to go by. Several other sites not on miraheze I frequent are down as well. Seems like big names like google and youtube are fine though. [07:40:04] PROBLEM - housing.wiki - reverse DNS on sslhost is CRITICAL: rDNS CRITICAL - housing.wiki All nameservers failed to answer the query. [07:40:14] Might be an issue with our provider OVH [07:40:28] PROBLEM - wiki.elgeis.com - LetsEncrypt on sslhost is WARNING: WARNING - Certificate 'wiki.elgeis.com' expires in 14 day(s) (Thu 28 Oct 2021 03:49:16 GMT +0000). [07:40:29] my wiki works [07:40:44] Ah, my wiki is back up. Probably rolling outage that's just passing through. [07:40:54] Wonder if there was a weird storm surge somewhere. [07:41:12] PROBLEM - pj-masks-info.cf - LetsEncrypt on sslhost is WARNING: WARNING - Certificate 'pj-masks-info.cf' expires in 10 day(s) (Sat 23 Oct 2021 08:19:02 GMT +0000). [07:42:03] Reception123: I've never seen an outage like this before. We went completely and entirely down across all (at least web-accessible) services. Usually one or two services is all. Not everything. [07:42:33] PROBLEM - cp12 Stunnel Http for mon2 on cp12 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [07:42:36] PROBLEM - cp15 Stunnel Http for mw10 on cp15 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [07:42:42] PROBLEM - nocyclo.tk - reverse DNS on sslhost is CRITICAL: rDNS CRITICAL - nocyclo.tk All nameservers failed to answer the query. [07:42:44] PROBLEM - ping4 on cp15 is CRITICAL: PING CRITICAL - Packet loss = 100% [07:42:45] It's not DNS [07:42:50] PROBLEM - ns1 GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 4 datacenters are down: 51.222.25.132/cpweb, 167.114.2.161/cpweb, 2607:5300:205:200::1c30/cpweb, 2607:5300:201:3100::1d3/cpweb [07:42:51] PROBLEM - cp12 Varnish Backends on cp12 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [07:43:02] PROBLEM - cp12 Stunnel Http for mw11 on cp12 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [07:43:02] PROBLEM - Host cp15 is DOWN: PING CRITICAL - Packet loss = 100% [07:43:05] CosmicAlpha: it's unlikely us [07:43:09] PROBLEM - cp12 Stunnel Http for mw8 on cp12 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [07:43:17] PROBLEM - ping4 on cp12 is CRITICAL: PING CRITICAL - Packet loss = 100% [07:43:18] I'm assuming a provider incident [07:43:21] This is scary, needless to say. 
[07:43:26] PROBLEM - cp12 Disk Space on cp12 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [07:43:36] PROBLEM - cp12 Stunnel Http for mw13 on cp12 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [07:43:41] PROBLEM - cp12 NTP time on cp12 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [07:43:46] PROBLEM - cp12 Stunnel Http for mw12 on cp12 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [07:43:53] PROBLEM - incubator.nocyclo.tk - reverse DNS on sslhost is CRITICAL: rDNS CRITICAL - incubator.nocyclo.tk All nameservers failed to answer the query. [07:43:53] PROBLEM - cp12 Puppet on cp12 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [07:43:53] PROBLEM - ns2 GDNSD Datacenters on ns2 is CRITICAL: CRITICAL - 4 datacenters are down: 51.222.25.132/cpweb, 167.114.2.161/cpweb, 2607:5300:205:200::1c30/cpweb, 2607:5300:201:3100::1d3/cpweb [07:44:00] PROBLEM - cp12 SSH on cp12 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:44:05] DEFCON 1 [07:44:05] PROBLEM - cp12 HTTP 4xx/5xx ERROR Rate on cp12 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [07:44:10] PROBLEM - cp12 Stunnel Http for mw10 on cp12 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [07:44:16] PROBLEM - test3 Puppet on test3 is CRITICAL: CRITICAL: Puppet has 3 failures. Last run 8 minutes ago with 3 failures. Failed resources (up to 3 shown): Exec[git_pull_MediaWiki config],Exec[git_pull_landing],Exec[git_pull_ErrorPages] [07:44:19] PROBLEM - cp12 Stunnel Http for mw9 on cp12 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [07:44:24] PROBLEM - mw11 Puppet on mw11 is CRITICAL: CRITICAL: Puppet has 4 failures. Last run 7 minutes ago with 4 failures. Failed resources (up to 3 shown): Exec[git_pull_JobRunner],Exec[git_pull_MediaWiki config],Exec[git_pull_landing],Exec[git_pull_ErrorPages] [07:44:25] PROBLEM - cp12 HTTPS on cp12 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:44:41] PROBLEM - phab2 Puppet on phab2 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 11 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[git_pull_phabricator-extensions] [07:44:50] PROBLEM - cp12 PowerDNS Recursor on cp12 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [07:44:50] PROBLEM - mw8 Puppet on mw8 is CRITICAL: CRITICAL: Puppet has 3 failures. Last run 9 minutes ago with 3 failures. Failed resources (up to 3 shown): Exec[git_pull_JobRunner],Exec[git_pull_landing],Exec[git_pull_ErrorPages] [07:44:57] PROBLEM - Host cp12 is DOWN: PING CRITICAL - Packet loss = 100% [07:45:02] PROBLEM - mw13 Puppet on mw13 is CRITICAL: CRITICAL: Puppet has 3 failures. Last run 7 minutes ago with 3 failures. 
Failed resources (up to 3 shown): Exec[git_pull_JobRunner],Exec[git_pull_landing],Exec[git_pull_ErrorPages] [07:45:23] PROBLEM - en.nocyclo.tk - reverse DNS on sslhost is WARNING: Traceback (most recent call last): File "/usr/lib/nagios/plugins/check_reverse_dns.py", line 148, in main() File "/usr/lib/nagios/plugins/check_reverse_dns.py", line 116, in main rdns_hostname = get_reverse_dnshostname(args.hostname) File "/usr/lib/nagios/plugins/check_reverse_dns.py", line 103, in get_reverse_dnshostname rev_host = str(resolver.query(ptr_record, "PTR")[0] [07:45:23] rip('.') File "/usr/lib/python3/dist-packages/dns/resolver.py", line 1102, in query lifetime) File "/usr/lib/python3/dist-packages/dns/resolver.py", line 992, in query timeout = self._compute_timeout(start, lifetime) File "/usr/lib/python3/dist-packages/dns/resolver.py", line 799, in _compute_timeout raise Timeout(timeout=duration)dns.exception.Timeout: The DNS operation timed out after 30.002227783203125 seconds [07:45:25] PROBLEM - es.nocyclo.tk - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:45:28] PROBLEM - dreamsit.com.br - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:45:29] PROBLEM - es.nocyclo.tk - reverse DNS on sslhost is CRITICAL: rDNS CRITICAL - es.nocyclo.tk All nameservers failed to answer the query. [07:45:31] PROBLEM - meta.nocyclo.tk - reverse DNS on sslhost is CRITICAL: rDNS CRITICAL - meta.nocyclo.tk All nameservers failed to answer the query. [07:45:37] PROBLEM - mw9 Puppet on mw9 is CRITICAL: CRITICAL: Puppet has 3 failures. Last run 10 minutes ago with 3 failures. Failed resources (up to 3 shown): Exec[git_pull_JobRunner],Exec[git_pull_landing],Exec[git_pull_ErrorPages] [07:45:38] PROBLEM - incubator.nocyclo.tk - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:45:41] PROBLEM - housing.wiki - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:45:45] PROBLEM - en.nocyclo.tk - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:46:08] PROBLEM - mw10 Puppet on mw10 is CRITICAL: CRITICAL: Puppet has 3 failures. Last run 10 minutes ago with 3 failures. Failed resources (up to 3 shown): Exec[git_pull_JobRunner],Exec[git_pull_landing],Exec[git_pull_ErrorPages] [07:46:09] Looks like the network [07:46:33] seems like OVH is having issues [07:46:35] PROBLEM - mw12 Puppet on mw12 is CRITICAL: CRITICAL: Puppet has 3 failures. Last run 9 minutes ago with 3 failures. Failed resources (up to 3 shown): Exec[git_pull_JobRunner],Exec[git_pull_landing],Exec[git_pull_ErrorPages] [07:46:49] majavah: yep [07:48:07] majavah: any incident page? 
[07:48:15] Normal one is down [07:48:27] nothing I'm aware of [07:48:40] Fun [07:48:50] Reception123: can you get on the phone [07:49:46] searching "OVH" on twitter is what I was using to make that conclusion [07:49:58] Ye [07:50:44] Very impactful for a no-op [07:52:13] PROBLEM - incubator.nocyclo.tk - reverse DNS on sslhost is WARNING: Traceback (most recent call last): File "/usr/lib/nagios/plugins/check_reverse_dns.py", line 148, in main() File "/usr/lib/nagios/plugins/check_reverse_dns.py", line 116, in main rdns_hostname = get_reverse_dnshostname(args.hostname) File "/usr/lib/nagios/plugins/check_reverse_dns.py", line 103, in get_reverse_dnshostname rev_host = str(resolver.query(ptr_record, "P [07:52:13] 0]).rstrip('.') File "/usr/lib/python3/dist-packages/dns/resolver.py", line 1102, in query lifetime) File "/usr/lib/python3/dist-packages/dns/resolver.py", line 900, in query timeout = self._compute_timeout(start, lifetime) File "/usr/lib/python3/dist-packages/dns/resolver.py", line 799, in _compute_timeout raise Timeout(timeout=duration)dns.exception.Timeout: The DNS operation timed out after 30.002792596817017 seconds [07:53:29] PROBLEM - es.nocyclo.tk - reverse DNS on sslhost is WARNING: Traceback (most recent call last): File "/usr/lib/nagios/plugins/check_reverse_dns.py", line 148, in main() File "/usr/lib/nagios/plugins/check_reverse_dns.py", line 116, in main rdns_hostname = get_reverse_dnshostname(args.hostname) File "/usr/lib/nagios/plugins/check_reverse_dns.py", line 103, in get_reverse_dnshostname rev_host = str(resolver.query(ptr_record, "PTR")[0] [07:53:29] rip('.') File "/usr/lib/python3/dist-packages/dns/resolver.py", line 1102, in query lifetime) File "/usr/lib/python3/dist-packages/dns/resolver.py", line 992, in query timeout = self._compute_timeout(start, lifetime) File "/usr/lib/python3/dist-packages/dns/resolver.py", line 799, in _compute_timeout raise Timeout(timeout=duration)dns.exception.Timeout: The DNS operation timed out after 30.00310468673706 seconds [07:57:06] PROBLEM - thesimswiki.com - reverse DNS on sslhost is WARNING: Traceback (most recent call last): File "/usr/lib/nagios/plugins/check_reverse_dns.py", line 148, in main() File "/usr/lib/nagios/plugins/check_reverse_dns.py", line 116, in main rdns_hostname = get_reverse_dnshostname(args.hostname) File "/usr/lib/nagios/plugins/check_reverse_dns.py", line 103, in get_reverse_dnshostname rev_host = str(resolver.query(ptr_record, "PTR")[ [07:57:06] strip('.') File "/usr/lib/python3/dist-packages/dns/resolver.py", line 1102, in query lifetime) File "/usr/lib/python3/dist-packages/dns/resolver.py", line 900, in query timeout = self._compute_timeout(start, lifetime) File "/usr/lib/python3/dist-packages/dns/resolver.py", line 799, in _compute_timeout raise Timeout(timeout=duration)dns.exception.Timeout: The DNS operation timed out after 30.00238847732544 seconds [07:59:06] PROBLEM - dreamsit.com.br - reverse DNS on sslhost is CRITICAL: rDNS CRITICAL - dreamsit.com.br All nameservers failed to answer the query. [08:00:11] PROBLEM - incubator.nocyclo.tk - reverse DNS on sslhost is CRITICAL: rDNS CRITICAL - incubator.nocyclo.tk All nameservers failed to answer the query. [08:01:29] PROBLEM - es.nocyclo.tk - reverse DNS on sslhost is CRITICAL: rDNS CRITICAL - es.nocyclo.tk All nameservers failed to answer the query. [08:01:35] PROBLEM - en.nocyclo.tk - reverse DNS on sslhost is CRITICAL: rDNS CRITICAL - en.nocyclo.tk All nameservers failed to answer the query. 
[08:03:39] RECOVERY - test3 Puppet on test3 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures [08:03:41] cp12/15 are still down [08:04:04] !log downtimed sslhost for 2 hours to stop flapping alerts [08:04:12] RECOVERY - phab2 Puppet on phab2 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [08:04:28] RECOVERY - mw8 Puppet on mw8 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [08:05:36] RECOVERY - mw9 Puppet on mw9 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [08:05:54] RECOVERY - mw10 Puppet on mw10 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [08:05:58] We're up but our Canadian front ends are still offline [08:06:08] Please be patient especially if you're in America [08:06:16] RECOVERY - mw12 Puppet on mw12 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [08:07:14] RECOVERY - mw11 Puppet on mw11 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [08:07:50] RECOVERY - mw13 Puppet on mw13 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [08:15:23] * RhinosF1 has to go to a singal blackspot so back later [08:21:05] RECOVERY - Host cp15 is UP: PING OK - Packet loss = 16%, RTA = 85.07 ms [08:21:14] RECOVERY - cp15 Stunnel Http for mw10 on cp15 is OK: HTTP OK: HTTP/1.1 200 OK - 15752 bytes in 0.326 second response time [08:21:26] RECOVERY - ping4 on cp15 is OK: PING OK - Packet loss = 0%, RTA = 79.39 ms [08:22:47] RECOVERY - Host cp12 is UP: PING OK - Packet loss = 0%, RTA = 82.46 ms [08:22:49] RECOVERY - cp12 HTTP 4xx/5xx ERROR Rate on cp12 is OK: OK - NGINX Error Rate is 3% [08:22:49] RECOVERY - cp12 HTTPS on cp12 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 2987 bytes in 0.311 second response time [08:22:49] RECOVERY - cp12 Puppet on cp12 is OK: OK: Puppet is currently enabled, last run 28 minutes ago with 0 failures [08:22:49] RECOVERY - cp12 SSH on cp12 is OK: SSH OK - OpenSSH_7.9p1 Debian-10+deb10u2 (protocol 2.0) [08:22:49] RECOVERY - cp12 Stunnel Http for mw10 on cp12 is OK: HTTP OK: HTTP/1.1 200 OK - 15744 bytes in 0.332 second response time [08:22:54] RECOVERY - cp12 Stunnel Http for mw9 on cp12 is OK: HTTP OK: HTTP/1.1 200 OK - 15757 bytes in 0.312 second response time [08:23:08] RECOVERY - ns1 GDNSD Datacenters on ns1 is OK: OK - all datacenters are online [08:23:09] RECOVERY - cp12 PowerDNS Recursor on cp12 is OK: DNS OK: 0.039 seconds response time. 
miraheze.org returns 167.114.2.161,2607:5300:201:3100::1d3 [08:23:17] RECOVERY - ns2 GDNSD Datacenters on ns2 is OK: OK - all datacenters are online [08:23:20] RECOVERY - cp12 Varnish Backends on cp12 is OK: All 9 backends are healthy [08:23:34] RECOVERY - cp12 Stunnel Http for mw8 on cp12 is OK: HTTP OK: HTTP/1.1 200 OK - 15743 bytes in 0.323 second response time [08:23:34] RECOVERY - cp12 Stunnel Http for mw11 on cp12 is OK: HTTP OK: HTTP/1.1 200 OK - 15744 bytes in 0.322 second response time [08:23:34] RECOVERY - cp12 Stunnel Http for mon2 on cp12 is OK: HTTP OK: HTTP/1.1 200 OK - 34599 bytes in 0.343 second response time [08:23:39] RECOVERY - cp12 Disk Space on cp12 is OK: DISK OK - free space: / 12819 MB (33% inode=96%); [08:23:44] RECOVERY - ping4 on cp12 is OK: PING OK - Packet loss = 0%, RTA = 82.30 ms [08:23:44] RECOVERY - cp12 NTP time on cp12 is OK: NTP OK: Offset -0.005567342043 secs [08:23:44] RECOVERY - cp12 Stunnel Http for mw12 on cp12 is OK: HTTP OK: HTTP/1.1 200 OK - 15744 bytes in 0.305 second response time [08:23:49] RECOVERY - cp12 Stunnel Http for mw13 on cp12 is OK: HTTP OK: HTTP/1.1 200 OK - 15744 bytes in 0.329 second response time [08:28:22] That's everything [08:53:46] JohnLewis: we seem back alive [08:54:22] Yeah, seems like schedules maintenance gone wrong [08:54:57] JohnLewis: yep [08:55:11] I haven't checked icinga but cp12/15 came back and that's all actually broke [08:55:20] I downtimed sslhost to stop it flapping [08:58:47] From before I had to do some work [08:58:48] From my last check [09:07:44] PROBLEM - gluster3 APT on gluster3 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [09:11:51] RECOVERY - gluster3 APT on gluster3 is OK: APT OK: 24 packages available for upgrade (0 critical updates). [09:11:52] Everything looks good [09:12:05] We'll get a lot of alerts when the sslhost downtime expires [09:13:40] Network equipment failure in the US [09:20:05] what happened? [09:21:06] 10:13:40 Network equipment failure in the US [09:21:56] https://www.bleepingcomputer.com/news/technology/ovh-hosting-provider-goes-down-during-planned-maintenance/ [09:21:57] [url] OVH hosting provider goes down during planned maintenance | www.bleepingcomputer.com [09:36:05] PROBLEM - mw11 APT on mw11 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [09:38:03] RECOVERY - mw11 APT on mw11 is OK: APT OK: 17 packages available for upgrade (0 critical updates). [09:56:42] PROBLEM - mw10 Current Load on mw10 is CRITICAL: CRITICAL - load average: 11.01, 8.43, 5.18 [09:58:40] PROBLEM - mw10 Current Load on mw10 is WARNING: WARNING - load average: 4.22, 6.81, 4.97 [10:00:38] RECOVERY - mw10 Current Load on mw10 is OK: OK - load average: 3.03, 5.42, 4.68 [10:02:00] PROBLEM - dreamsit.com.br - reverse DNS on sslhost is WARNING: SSL WARNING - rDNS OK but records conflict. {'NS': ['ns82.domaincontrol.com.', 'ns81.domaincontrol.com.'], 'CNAME': None} [10:02:00] RECOVERY - incubator.nocyclo.tk - LetsEncrypt on sslhost is OK: OK - Certificate 'incubator.nocyclo.tk' will expire on Wed 17 Nov 2021 04:06:46 GMT +0000. [10:02:00] RECOVERY - es.nocyclo.tk - LetsEncrypt on sslhost is OK: OK - Certificate 'es.nocyclo.tk' will expire on Mon 10 Jan 2022 05:03:48 GMT +0000. [10:02:00] RECOVERY - incubator.nocyclo.tk - reverse DNS on sslhost is OK: SSL OK - incubator.nocyclo.tk reverse DNS resolves to cp12.miraheze.org - CNAME FLAT [10:02:00] PROBLEM - thesimswiki.com - reverse DNS on sslhost is WARNING: SSL WARNING - rDNS OK but records conflict. 
{'NS': ['ns2.vultr.com.', 'ns1.vultr.com.'], 'CNAME': None} [10:02:01] PROBLEM - housing.wiki - reverse DNS on sslhost is WARNING: SSL WARNING - rDNS OK but records conflict. {'NS': ['ns2.dreamhost.com.', 'ns3.dreamhost.com.', 'ns1.dreamhost.com.'], 'CNAME': None} [10:02:01] RECOVERY - nocyclo.tk - reverse DNS on sslhost is OK: SSL OK - nocyclo.tk reverse DNS resolves to cp12.miraheze.org - CNAME FLAT [10:02:02] RECOVERY - en.nocyclo.tk - reverse DNS on sslhost is OK: SSL OK - en.nocyclo.tk reverse DNS resolves to cp12.miraheze.org - CNAME FLAT [10:02:02] RECOVERY - meta.nocyclo.tk - reverse DNS on sslhost is OK: SSL OK - meta.nocyclo.tk reverse DNS resolves to cp12.miraheze.org - CNAME FLAT [10:02:03] RECOVERY - en.nocyclo.tk - LetsEncrypt on sslhost is OK: OK - Certificate 'en.nocyclo.tk' will expire on Mon 10 Jan 2022 06:03:44 GMT +0000. [10:02:03] RECOVERY - housing.wiki - LetsEncrypt on sslhost is OK: OK - Certificate 'housing.wiki' will expire on Mon 10 Jan 2022 05:59:53 GMT +0000. [10:02:04] RECOVERY - dreamsit.com.br - LetsEncrypt on sslhost is OK: OK - Certificate 'dreamsit.com.br' will expire on Sat 06 Nov 2021 13:36:51 GMT +0000. [10:02:30] PROBLEM - wiki.minecraftathome.com - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Certificate 'wiki.minecraftathome.com' expires in 6 day(s) (Wed 20 Oct 2021 08:57:08 GMT +0000). [10:02:30] PROBLEM - lcn.zfc.id.lv - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Certificate 'lcn.zfc.id.lv' expires in 7 day(s) (Thu 21 Oct 2021 04:19:31 GMT +0000). [10:02:30] PROBLEM - wiki.cyberfurs.org - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Certificate 'wiki.cyberfurs.org' expires in 6 day(s) (Wed 20 Oct 2021 06:31:20 GMT +0000). [10:02:30] PROBLEM - biblestrength.net - LetsEncrypt on sslhost is WARNING: WARNING - Certificate 'biblestrength.net' expires in 10 day(s) (Sun 24 Oct 2021 04:16:04 GMT +0000). [10:02:30] PROBLEM - private.revi.wiki - Sectigo on sslhost is WARNING: WARNING - Certificate 'private.revi.wiki' expires in 24 day(s) (Sat 06 Nov 2021 23:59:59 GMT +0000). [10:02:30] PROBLEM - wiki.mikrodev.com - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Certificate 'wiki.mikrodev.com' expires in 6 day(s) (Wed 20 Oct 2021 05:16:53 GMT +0000). [10:02:30] PROBLEM - spiral.wiki - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Certificate 'spiral.wiki' expires in 6 day(s) (Wed 20 Oct 2021 01:40:39 GMT +0000). [10:03:43] PROBLEM - mw13 Current Load on mw13 is WARNING: WARNING - load average: 7.61, 5.39, 4.16 [10:04:42] RhinosF1: resolved your problem [10:05:06] PROBLEM - mon2 Current Load on mon2 is CRITICAL: CRITICAL - load average: 4.31, 5.29, 3.28 [10:05:16] PROBLEM - mw10 Current Load on mw10 is CRITICAL: CRITICAL - load average: 15.49, 9.37, 6.41 [10:05:39] PROBLEM - mw13 Current Load on mw13 is CRITICAL: CRITICAL - load average: 8.03, 5.80, 4.43 [10:05:54] PROBLEM - mw8 Current Load on mw8 is WARNING: WARNING - load average: 6.17, 6.84, 5.14 [10:06:23] JohnLewis: did it survive the network issues then? 
[10:06:32] Yeah [10:06:44] It's running across every wiki in screen via foreachwiki [10:07:21] I'll have a look when I get home how far it's moved [10:07:43] RECOVERY - mw13 Current Load on mw13 is OK: OK - load average: 4.96, 5.37, 4.45 [10:07:53] RECOVERY - mw8 Current Load on mw8 is OK: OK - load average: 3.88, 5.84, 4.98 [10:08:14] PROBLEM - mw11 Current Load on mw11 is CRITICAL: CRITICAL - load average: 6.71, 9.04, 6.43 [10:09:24] RECOVERY - mon2 Current Load on mon2 is OK: OK - load average: 1.53, 3.26, 2.92 [10:12:41] PROBLEM - mw9 Current Load on mw9 is CRITICAL: CRITICAL - load average: 10.86, 10.23, 7.03 [10:12:48] PROBLEM - mw13 Current Load on mw13 is CRITICAL: CRITICAL - load average: 10.21, 9.06, 6.28 [10:13:44] PROBLEM - mw12 Current Load on mw12 is CRITICAL: CRITICAL - load average: 6.54, 9.78, 7.57 [10:15:02] Well at least that's resolved [10:15:05] PROBLEM - mw9 Current Load on mw9 is WARNING: WARNING - load average: 4.19, 7.62, 6.50 [10:15:14] PROBLEM - mw13 Current Load on mw13 is WARNING: WARNING - load average: 6.13, 7.48, 6.06 [10:15:39] Reception123: it seems quiet [10:15:55] PROBLEM - mw12 Current Load on mw12 is WARNING: WARNING - load average: 3.41, 7.53, 7.04 [10:16:58] RECOVERY - mw9 Current Load on mw9 is OK: OK - load average: 2.57, 6.08, 6.08 [10:17:11] RECOVERY - mw13 Current Load on mw13 is OK: OK - load average: 4.98, 6.47, 5.85 [10:17:23] PROBLEM - mw11 Current Load on mw11 is WARNING: WARNING - load average: 4.29, 7.65, 7.31 [10:17:56] RECOVERY - mw12 Current Load on mw12 is OK: OK - load average: 2.93, 5.94, 6.51 [10:18:28] PROBLEM - mw10 Current Load on mw10 is WARNING: WARNING - load average: 2.94, 6.19, 6.97 [10:20:31] RECOVERY - mw10 Current Load on mw10 is OK: OK - load average: 2.92, 5.09, 6.47 [10:21:19] RECOVERY - mw11 Current Load on mw11 is OK: OK - load average: 3.36, 5.33, 6.43 [10:27:37] !log [rhinos@mw11] sudo -u www-data /usr/local/bin/foreachwikiindblist /srv/mediawiki/cache/databases.json /srv/mediawiki/w/maintenance/sql.php /srv/mediawiki/config/137updates.sql (END - exit=0) [12:13:15] PROBLEM - mw8 APT on mw8 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [12:15:17] RECOVERY - mw8 APT on mw8 is OK: APT OK: 17 packages available for upgrade (0 critical updates). 
[12:19:46] That's not bad [12:19:59] Pre update Schema changes done [12:24:38] RhinosF1: https://github.com/wikimedia/mediawiki-extensions-RandomImage/commit/af0263de4757b17431fd2440fe02bf7c54580716 [12:24:39] [url] Revision::newFromTitle() is deprecated · wikimedia/mediawiki-extensions-RandomImage@af0263d · GitHub | github.com [12:24:49] You can easily migrate the mass* extension [12:26:08] paladox: yeah I have no issues to it being migrated, just noting it needs doing [12:26:17] And if it needs doing, add it to the list [12:26:32] Re that commit [12:36:55] PROBLEM - mw10 Current Load on mw10 is WARNING: WARNING - load average: 7.38, 5.99, 4.37 [12:38:12] * RhinosF1 disagrees that 1 hour is a quick resolution time for a total network failure [12:38:52] RECOVERY - mw10 Current Load on mw10 is OK: OK - load average: 4.34, 5.31, 4.32 [12:44:19] [02miraheze/puppet] 07JohnFLewis pushed 031 commit to 03master [+0/-4/±5] 13https://git.io/JKYhr [12:44:21] [02miraheze/puppet] 07JohnFLewis 036d6dcbc - dbbackup: remove all code [12:48:05] !log unmount all mount points to dbbackup1 [12:49:50] !log unmount all mount points to dbbackup1 [12:49:54] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [12:55:46] !log deleted dbbackup1 [12:55:50] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [12:56:15] [02miraheze/dns] 07JohnFLewis pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/JKOfe [12:56:17] [02miraheze/dns] 07JohnFLewis 03f5c1121 - remove dbbackup1 [13:12:28] PROBLEM - mw10 APT on mw10 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [13:12:59] PROBLEM - cp15 Stunnel Http for mw10 on cp15 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [13:13:11] PROBLEM - mw10 Current Load on mw10 is CRITICAL: CRITICAL - load average: 14.15, 9.97, 6.25 [13:14:27] RECOVERY - mw10 APT on mw10 is OK: APT OK: 17 packages available for upgrade (0 critical updates). [13:15:01] RECOVERY - cp15 Stunnel Http for mw10 on cp15 is OK: HTTP OK: HTTP/1.1 200 OK - 15752 bytes in 0.346 second response time [13:17:08] RECOVERY - mw10 Current Load on mw10 is OK: OK - load average: 3.17, 6.43, 5.65 [13:20:27] [02miraheze/puppet] 07JohnFLewis pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/JKOLL [13:20:29] [02miraheze/puppet] 07JohnFLewis 03139bf73 - mariadb: automate daily local disk space backups [13:37:57] PROBLEM - db12 Puppet on db12 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. It might be a dependency cycle. [13:51:26] PROBLEM - db11 Puppet on db11 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. It might be a dependency cycle. [13:53:26] PROBLEM - db13 Puppet on db13 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. It might be a dependency cycle. 
[14:04:58] PROBLEM - mw12 Current Load on mw12 is CRITICAL: CRITICAL - load average: 8.99, 7.27, 5.17 [14:06:54] PROBLEM - mw12 Current Load on mw12 is WARNING: WARNING - load average: 6.30, 7.00, 5.33 [14:07:31] PROBLEM - mw11 Current Load on mw11 is CRITICAL: CRITICAL - load average: 8.16, 7.34, 5.24 [14:08:52] RECOVERY - mw12 Current Load on mw12 is OK: OK - load average: 4.14, 5.83, 5.10 [14:09:27] RECOVERY - mw11 Current Load on mw11 is OK: OK - load average: 3.76, 5.97, 4.98 [14:11:33] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 5.87, 5.23, 3.88 [14:13:31] RECOVERY - gluster3 Current Load on gluster3 is OK: OK - load average: 4.25, 4.79, 3.88 [14:38:05] PROBLEM - mon2 Current Load on mon2 is CRITICAL: CRITICAL - load average: 4.23, 3.36, 2.87 [14:41:57] RECOVERY - mon2 Current Load on mon2 is OK: OK - load average: 2.19, 2.88, 2.79 [14:52:54] JohnLewis: puppet failing on the db servers [15:06:39] [02miraheze/puppet] 07JohnFLewis pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/JKOKp [15:06:40] [02miraheze/puppet] 07JohnFLewis 03a56004e - syntax fix [15:11:16] RECOVERY - db12 Puppet on db12 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:12:20] !log running db backups on db* [15:12:24] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [15:13:18] RECOVERY - db13 Puppet on db13 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:13:21] RECOVERY - db11 Puppet on db11 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:14:14] !log rhinos@test3:~$ sudo -u www-data php /srv/mediawiki/w/maintenance/sql.php --wiki=test3wiki /srv/mediawiki/w/extensions/CommentStreams/sql/mysql/cs_replies.sql [15:14:19] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [15:15:16] [02miraheze/mw-config] 07RhinosF1 pushed 031 commit to 03master [+1/-0/±0] 13https://git.io/JKOP8 [15:15:18] [02miraheze/mw-config] 07RhinosF1 035867ac8 - Create commentstreams_temp.sql [15:15:20] PROBLEM - db11 Current Load on db11 is CRITICAL: CRITICAL - load average: 16.58, 7.91, 4.05 [15:15:33] PROBLEM - db13 Current Load on db13 is CRITICAL: CRITICAL - load average: 11.05, 6.38, 3.20 [15:16:05] PROBLEM - db12 Current Load on db12 is CRITICAL: CRITICAL - load average: 8.31, 5.42, 2.99 [15:16:06] [02miraheze/mw-config] 07RhinosF1 pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/JKOPg [15:16:07] [02miraheze/mw-config] 07RhinosF1 0343e8657 - Update ManageWikiExtensions.php [15:16:39] miraheze/mw-config - RhinosF1 the build passed. [15:17:22] miraheze/mw-config - RhinosF1 the build passed. 
[15:17:32] !log [rhinos@mw11] starting deploy of {'files': 'config/commentstreams_temp.sql'} to skip [15:17:33] !log [rhinos@mw11] finished deploy of {'files': 'config/commentstreams_temp.sql'} to skip - SUCCESS in 0s [15:17:37] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [15:17:43] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [15:18:05] RECOVERY - db12 Current Load on db12 is OK: OK - load average: 6.76, 5.75, 3.41 [15:18:08] !log [rhinos@mw11] starting deploy of {'files': 'config/commentstreams_temp.sql'} to all [15:18:12] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [15:18:18] !log [rhinos@mw11] finished deploy of {'files': 'config/commentstreams_temp.sql'} to all - SUCCESS in 10s [15:18:22] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [15:18:34] !log [rhinos@mw11] starting deploy of {'files': 'config/ManageWikiExtensions.php'} to all [15:18:39] !log [rhinos@mw11] finished deploy of {'files': 'config/ManageWikiExtensions.php'} to all - SUCCESS in 4s [15:18:39] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [15:18:45] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [15:19:45] !log create new commentstreams tables for 1.37 on all wikis [15:19:51] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [15:21:40] !log [rhinos@mw11] sudo -u www-data /usr/local/bin/foreachwikiindblist /home/rhinos/commentstreams.json /srv/mediawiki/w/maintenance/sql.php /srv/mediawiki/config/commentstreams_temp.sql (END - exit=0) [15:21:48] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [15:22:07] PROBLEM - db12 Current Load on db12 is CRITICAL: CRITICAL - load average: 8.61, 7.19, 4.54 [15:22:29] PROBLEM - mw10 Current Load on mw10 is WARNING: WARNING - load average: 7.67, 7.25, 5.27 [15:24:29] RECOVERY - mw10 Current Load on mw10 is OK: OK - load average: 5.32, 6.60, 5.28 [15:28:04] PROBLEM - db12 Current Load on db12 is WARNING: WARNING - load average: 6.01, 7.20, 5.45 [15:29:04] PROBLEM - cloud3 Current Load on cloud3 is WARNING: WARNING - load average: 13.81, 13.14, 10.27 [15:30:05] RECOVERY - db12 Current Load on db12 is OK: OK - load average: 5.70, 6.75, 5.50 [15:33:12] !log [@test3] starting deploy of {'config': True} to skip [15:33:13] !log [@test3] finished deploy of {'config': True} to skip - SUCCESS in 0s [15:33:18] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [15:33:24] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [15:35:37] PROBLEM - db13 Current Load on db13 is WARNING: WARNING - load average: 6.12, 7.44, 7.13 [15:35:37] RhinosF1: https://gerrit.wikimedia.org/r/c/mediawiki/extensions/MassEditRegex/+/730566 [15:38:05] PROBLEM - db12 Current Load on db12 is WARNING: WARNING - load average: 7.13, 7.19, 6.22 [15:38:40] seen [15:39:00] RECOVERY - cloud3 Current Load on cloud3 is OK: OK - load average: 11.79, 13.03, 11.74 [15:39:13] RhinosF1: could you get someone to review/merge it pls :) [15:40:35] paladox: currently filing tasks for all blockers [15:40:41] ok, thanks [15:42:04] PROBLEM - db12 Current Load on db12 is CRITICAL: CRITICAL - load average: 8.32, 7.64, 6.62 [15:46:03] PROBLEM - db12 Current Load on db12 is WARNING: WARNING - load average: 7.62, 7.76, 6.92 [15:47:32] majavah: could you be amazing with +2? 
[15:48:06] I'll look after dinner if you add myself as a reviewer in gerrit [15:49:36] PROBLEM - db13 Current Load on db13 is CRITICAL: CRITICAL - load average: 8.37, 7.66, 7.21 [15:50:23] phan failing with [15:50:25] https://www.irccloud.com/pastebin/cDT5F9tB/ [15:50:33] https://integration.wikimedia.org/ci/job/mwext-php72-phan-docker/142572/console [15:50:33] [url] mwext-php72-phan-docker #142572 Console [Jenkins] | integration.wikimedia.org [15:51:44] PROBLEM - mon2 Current Load on mon2 is WARNING: WARNING - load average: 3.17, 3.51, 3.15 [15:53:34] PROBLEM - db13 Current Load on db13 is WARNING: WARNING - load average: 7.40, 7.91, 7.44 [15:53:40] RECOVERY - mon2 Current Load on mon2 is OK: OK - load average: 3.06, 3.25, 3.08 [15:57:12] ok, fixed [15:57:32] PROBLEM - db13 Current Load on db13 is CRITICAL: CRITICAL - load average: 8.29, 7.90, 7.51 [15:58:04] RECOVERY - db12 Current Load on db12 is OK: OK - load average: 4.90, 6.31, 6.67 [16:02:19] majavah: also did https://gerrit.wikimedia.org/r/c/mediawiki/extensions/DisqusTag/+/699509 [16:04:33] paladox: I guess should block [16:04:49] no that shouldn't be a blocker [16:04:57] we don't have that change deployed [16:05:09] but that should have been done a while ago [16:05:35] PROBLEM - db13 Current Load on db13 is WARNING: WARNING - load average: 4.63, 6.90, 7.32 [16:06:15] paladox: so that doesn't impact 1.37 differently to 1.36? [16:06:29] yup as far as i'm aware [16:06:42] Ok [16:06:51] I'll do backports anyway [16:06:58] So it'll need watching [16:07:32] RECOVERY - db13 Current Load on db13 is OK: OK - load average: 1.89, 5.10, 6.61 [16:08:46] For the record if you do see any, any new warning or error in 1.37 should block deployment [16:09:12] New warnings might not be carried forward but we still should try to report them [16:09:24] Errors/Exceptions are more important [16:13:22] PROBLEM - db11 Current Load on db11 is WARNING: WARNING - load average: 2.76, 4.36, 7.93 [16:16:52] https://www.irccloud.com/pastebin/lm8coy4V/ [16:16:53] hmm [16:17:20] RECOVERY - db11 Current Load on db11 is OK: OK - load average: 2.91, 3.55, 6.78 [16:19:23] paladox: what doing [16:19:35] There's no req id [16:19:41] i was trying to generate some sql using a maintenance script [16:19:54] `sudo -u www-data php maintenance/generateSchemaSql.php --wiki test3wiki` [16:20:04] https://gerrit.wikimedia.org/r/c/mediawiki/core/+/598104 [16:20:25] paladox: try from prod [16:20:39] If it's broken in 1.37 then block it [16:20:51] i don't need to i just randomly guessed how the sql would look like [16:20:58] basically just the same as below column [16:21:14] PROBLEM - db12 Disk Space on db12 is WARNING: DISK WARNING - free space: / 46804 MB (10% inode=98%); [16:21:29] paladox: it's a broken core maint script [16:21:37] think it's because that scripts requires a composer package which is in the dev column of mediawiki composer file [16:21:39] It definately needs reporting [16:21:43] we only install non-dev files [16:21:44] Oh ok [16:21:51] https://www.irccloud.com/pastebin/6t6bRNMP/ [16:21:54] Yeah we don't do dev [16:22:09] Will be [16:22:49] If you need it, just do dev on test3 and clean up after [16:24:02] ok, i don't need to do that now :) [16:33:05] majavah: addressed [16:34:00] paladox: you bumped it to 26, not 36 [16:34:46] Oh, thanks [16:34:48] fixed now majavah [16:37:04] PROBLEM - mw10 Current Load on mw10 is WARNING: WARNING - load average: 6.89, 5.86, 4.73 [16:38:03] paladox: did you test the MassEditRegex change locally? 
I'm getting "Edit failed: Main Page does not exist" and I'm not sure if it's just me not knowing how to use it [16:38:16] Nope [16:38:47] ""In (at least) MediaWiki 1.31 and later, when the regex you provide is invalid, it will falsely indicate that all of the pages you selected for replacement are not found."" [16:38:49] ah [16:39:04] RECOVERY - mw10 Current Load on mw10 is OK: OK - load average: 4.18, 5.17, 4.62 [16:41:09] merged, I'm getting deprecation warnings about "Deprecated: Use of WikiPage::doEditContent was deprecated in MediaWiki 1.32" but I guess they can be dealt with separately [16:41:15] thanks! [16:41:17] oh [16:41:25] and also could you review/merge https://gerrit.wikimedia.org/r/c/mediawiki/extensions/DisqusTag/+/699509 pls majavah [16:42:09] yeah, I'll deal with the MassEditRegex REL1_37 backport first [16:42:20] majavah: could you review https://gerrit.wikimedia.org/r/c/mediawiki/extensions/MassEditRegex/+/730569 please so i can backport the fix to 37 [16:42:26] (as there's merge conflicts) [16:44:31] PROBLEM - mw12 Current Load on mw12 is CRITICAL: CRITICAL - load average: 6.35, 8.01, 5.84 [16:45:08] PROBLEM - mw11 Current Load on mw11 is WARNING: WARNING - load average: 5.21, 7.55, 5.78 [16:45:27] PROBLEM - mw9 Current Load on mw9 is WARNING: WARNING - load average: 4.29, 7.75, 5.92 [16:45:33] PROBLEM - mw10 Current Load on mw10 is WARNING: WARNING - load average: 3.74, 6.96, 5.86 [16:46:29] RECOVERY - mw12 Current Load on mw12 is OK: OK - load average: 3.31, 6.29, 5.47 [16:47:07] RECOVERY - mw11 Current Load on mw11 is OK: OK - load average: 4.29, 6.39, 5.55 [16:47:25] RECOVERY - mw9 Current Load on mw9 is OK: OK - load average: 2.80, 6.21, 5.58 [16:47:32] RECOVERY - mw10 Current Load on mw10 is OK: OK - load average: 3.37, 5.67, 5.51 [16:48:20] majavah: backport https://gerrit.wikimedia.org/r/c/mediawiki/extensions/MassEditRegex/+/730571 :) [16:52:55] majavah: addressed [16:54:31] majavah: backport https://gerrit.wikimedia.org/r/c/mediawiki/extensions/DisqusTag/+/730572 :) [16:55:01] I like waiting for the commit to master to merge first, just in case [16:55:25] [02miraheze/mediawiki] 07paladox pushed 031 commit to 03REL1_37 [+0/-0/±1] 13https://git.io/JK3ZF [16:55:27] [02miraheze/mediawiki] 07paladox 035461822 - Update MassEditRegex [16:55:36] oh ok [16:56:10] !log [paladox@test3] starting deploy of {'world': True, 'gitinfo': True} to skip [16:56:12] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [16:56:13] RhinosF1: ^ [16:56:44] mostly because in case there is a CI flap or something, it ensures it everything backported to a release branch actually makes to master and does not regress in the next release [16:58:15] paladox: moved to former blockers [17:00:06] !log [paladox@test3] finished deploy of {'world': True, 'gitinfo': True} to skip - SUCCESS in 236s [17:00:10] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [17:01:54] PROBLEM - db11 Current Load on db11 is CRITICAL: CRITICAL - load average: 11.66, 6.45, 4.07 [17:03:02] turns out we no longer have DisqusTag installed [17:03:09] which i didn't realise we uninstalled [17:03:33] No we got rid of it with CSP changes [17:03:38] PROBLEM - db13 Current Load on db13 is WARNING: WARNING - load average: 7.90, 5.90, 3.53 [17:03:45] now the Mass* extension fails with [17:03:45] Explicit transaction still active. A caller may have caught an error. 
Open transactions: PortableInfobox\Helpers\PagePropsProxy::set [17:03:56] https://www.irccloud.com/pastebin/KuBqQaZY/ [17:04:59] That's CosmicAlpha [17:05:12] I've already filed a bug for 1 issue [17:05:30] CosmicAlpha: ^ [17:05:34] PROBLEM - db13 Current Load on db13 is CRITICAL: CRITICAL - load average: 10.25, 7.17, 4.26 [17:05:45] https://graylog.miraheze.org/messages/graylog_332/20059661-2c47-11ec-a12b-0200001a24a4 [17:05:51] i have no idea how to fix [17:07:45] paladox: that's only on 1.37 right, not 1.36? [17:07:59] i'm not sure if it affects 1.36 [17:08:41] paladox: OK, I mean you didn't notice it to? If not then it's not an absolute priority though I'm looking now, and I do have an idea. [17:08:58] I haven't used MassEditRegex at least if i have not for quite a while [17:19:28] PROBLEM - cloud3 Current Load on cloud3 is WARNING: WARNING - load average: 13.63, 13.40, 11.69 [17:20:21] PROBLEM - db12 Current Load on db12 is WARNING: WARNING - load average: 7.86, 7.59, 6.24 [17:21:25] RECOVERY - cloud3 Current Load on cloud3 is OK: OK - load average: 12.81, 12.92, 11.69 [17:21:34] PROBLEM - db13 Current Load on db13 is WARNING: WARNING - load average: 7.37, 7.93, 6.97 [17:23:25] [02miraheze/puppet] 07JohnFLewis pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/JK3RF [17:23:27] [02miraheze/puppet] 07JohnFLewis 03fb79bea - dbbackups: drop to 1 thread + offset by 6 hours [17:26:17] RECOVERY - db12 Current Load on db12 is OK: OK - load average: 2.38, 5.64, 5.92 [17:29:34] PROBLEM - db13 Current Load on db13 is CRITICAL: CRITICAL - load average: 8.02, 7.48, 7.12 [17:31:32] PROBLEM - db13 Current Load on db13 is WARNING: WARNING - load average: 5.07, 6.75, 6.90 [17:33:33] RECOVERY - db13 Current Load on db13 is OK: OK - load average: 4.77, 6.17, 6.68 [17:48:00] PROBLEM - db11 Current Load on db11 is WARNING: WARNING - load average: 5.75, 7.06, 7.95 [17:55:56] PROBLEM - db11 Current Load on db11 is CRITICAL: CRITICAL - load average: 8.25, 7.85, 7.91 [17:57:54] JohnLewis: db load warnings are new [17:57:56] PROBLEM - db11 Current Load on db11 is WARNING: WARNING - load average: 7.29, 7.66, 7.84 [17:59:55] PROBLEM - db11 Current Load on db11 is CRITICAL: CRITICAL - load average: 9.35, 8.12, 7.97 [18:04:31] RhinosF1, paladox: https://github.com/Universal-Omega/PortableInfobox/pull/34 might resolve both the issues. Untested, but I'm hoping it does. 
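For context on the "Explicit transaction still active" fatal quoted above: MediaWiki's rdbms layer raises it when an atomic section opened with startAtomic() is never closed, which is what happens if an exception escapes before endAtomic() runs. A minimal sketch of the usual guard, assuming a cancelable section; this is not PortableInfobox's actual code and the write inside the section is a placeholder:

```php
<?php
// Hedged sketch of keeping an atomic section balanced when the body can throw.
// Not PortableInfobox's actual code; the page_props write is a placeholder.
use MediaWiki\MediaWikiServices;
use Wikimedia\Rdbms\IDatabase;

$dbw = MediaWikiServices::getInstance()
	->getDBLoadBalancer()
	->getConnection( DB_PRIMARY );

$dbw->startAtomic( __METHOD__, IDatabase::ATOMIC_CANCELABLE );
try {
	// ... the actual work, e.g. an upsert into page_props ...
	$dbw->endAtomic( __METHOD__ );
} catch ( Throwable $e ) {
	// Roll back only this section so the connection is not left with an
	// open transaction ("Explicit transaction still active ...").
	$dbw->cancelAtomic( __METHOD__ );
	throw $e;
}
```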
[18:04:31] [url] Fix `DBUnexpectedError` exception by Universal-Omega · Pull Request #34 · Universal-Omega/PortableInfobox · GitHub | github.com [18:05:36] CosmicAlpha: thank you [18:06:45] No problem [18:08:23] PROBLEM - mw10 Current Load on mw10 is CRITICAL: CRITICAL - load average: 8.51, 8.90, 5.86 [18:10:24] PROBLEM - mw10 Current Load on mw10 is WARNING: WARNING - load average: 5.69, 7.57, 5.73 [18:12:22] RECOVERY - mw10 Current Load on mw10 is OK: OK - load average: 4.36, 6.79, 5.70 [18:22:22] PROBLEM - mw10 Current Load on mw10 is WARNING: WARNING - load average: 7.79, 6.85, 5.88 [18:24:24] PROBLEM - mw10 Current Load on mw10 is CRITICAL: CRITICAL - load average: 8.38, 7.17, 6.10 [18:26:23] RECOVERY - mw10 Current Load on mw10 is OK: OK - load average: 4.92, 6.19, 5.86 [18:31:56] [02miraheze/WikiDiscover] 07Universal-Omega pushed 031 commit to 03Universal-Omega-patch-1 [+0/-0/±1] 13https://git.io/JK3PG [18:31:58] [02miraheze/WikiDiscover] 07Universal-Omega 0359cc194 - API: Replace deprecated constants [18:31:59] [02WikiDiscover] 07Universal-Omega created branch 03Universal-Omega-patch-1 - 13https://git.io/vhUAp [18:32:01] [02WikiDiscover] 07Universal-Omega opened pull request 03#71: API: Replace deprecated constants - 13https://git.io/JK3Pn [18:33:43] [02miraheze/WikiDiscover] 07Universal-Omega pushed 031 commit to 03Universal-Omega-patch-1 [+0/-0/±1] 13https://git.io/JK3PV [18:33:45] [02miraheze/WikiDiscover] 07Universal-Omega 03e49a2d4 - Update dependencies [18:33:46] [02WikiDiscover] 07Universal-Omega synchronize pull request 03#71: API: Replace deprecated constants - 13https://git.io/JK3Pn [18:33:53] miraheze/WikiDiscover - Universal-Omega the build has errored. [18:36:28] miraheze/WikiDiscover - Universal-Omega the build has errored. [19:00:18] [02miraheze/WikiDiscover] 07Universal-Omega pushed 031 commit to 03Universal-Omega-patch-1 [+0/-0/±1] 13https://git.io/JK3MQ [19:00:20] [02miraheze/WikiDiscover] 07Universal-Omega 039b05402 - Update ApiWikiDiscover.php [19:00:21] [02WikiDiscover] 07Universal-Omega synchronize pull request 03#71: API: Replace deprecated constants - 13https://git.io/JK3Pn [19:06:35] miraheze/WikiDiscover - Universal-Omega the build passed. 
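The "API: Replace deprecated constants" pull requests above (WikiDiscover here, ManageWiki later) most likely refer to the ApiBase::PARAM_* constants that MediaWiki 1.35 deprecated in favour of ParamValidator::PARAM_*. A hedged before/after sketch of that migration, not the actual WikiDiscover diff:

```php
<?php
// Hedged sketch of the typical migration, not the actual WikiDiscover change:
// the ApiBase::PARAM_* constants were deprecated in MediaWiki 1.35 in favour
// of the equivalents on Wikimedia\ParamValidator\ParamValidator.
use Wikimedia\ParamValidator\ParamValidator;

class ApiExample extends ApiBase {
	public function execute() {
		// ... module body omitted ...
	}

	protected function getAllowedParams() {
		return [
			'limit' => [
				// Before: ApiBase::PARAM_TYPE / ApiBase::PARAM_DFLT
				ParamValidator::PARAM_TYPE => 'integer',
				ParamValidator::PARAM_DEFAULT => 10,
			],
			'state' => [
				ParamValidator::PARAM_TYPE => [ 'open', 'closed' ],
				ParamValidator::PARAM_ISMULTI => true,
			],
		];
	}
}
```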
[19:18:05] PROBLEM - db11 Current Load on db11 is WARNING: WARNING - load average: 7.10, 7.43, 7.96 [19:36:03] PROBLEM - mw10 Current Load on mw10 is CRITICAL: CRITICAL - load average: 12.71, 6.92, 4.79 [19:37:42] PROBLEM - mw12 Current Load on mw12 is CRITICAL: CRITICAL - load average: 8.54, 6.47, 4.64 [19:38:01] RECOVERY - mw10 Current Load on mw10 is OK: OK - load average: 4.93, 6.19, 4.80 [19:39:38] RECOVERY - mw12 Current Load on mw12 is OK: OK - load average: 5.00, 5.69, 4.56 [19:42:05] RECOVERY - db11 Current Load on db11 is OK: OK - load average: 4.55, 5.91, 6.62 [19:48:11] [02WikiDiscover] 07Universal-Omega closed pull request 03#71: API: Replace deprecated constants - 13https://git.io/JK3Pn [19:48:12] [02miraheze/WikiDiscover] 07Universal-Omega pushed 031 commit to 03master [+0/-0/±2] 13https://git.io/JK37U [19:48:14] [02miraheze/WikiDiscover] 07Universal-Omega 03749ead5 - API: Replace deprecated constants (#71) [19:48:15] [02miraheze/WikiDiscover] 07Universal-Omega deleted branch 03Universal-Omega-patch-1 [19:48:17] [02WikiDiscover] 07Universal-Omega deleted branch 03Universal-Omega-patch-1 - 13https://git.io/vhUAp [19:51:50] PROBLEM - mon2 Current Load on mon2 is CRITICAL: CRITICAL - load average: 4.44, 3.62, 3.08 [19:53:47] PROBLEM - mon2 Current Load on mon2 is WARNING: WARNING - load average: 3.42, 3.41, 3.06 [19:54:44] miraheze/WikiDiscover - Universal-Omega the build passed. [19:55:42] RECOVERY - mon2 Current Load on mon2 is OK: OK - load average: 2.01, 2.98, 2.95 [20:02:22] [02miraheze/mediawiki] 07paladox pushed 031 commit to 03REL1_37 [+0/-0/±1] 13https://git.io/JK35C [20:02:23] [02miraheze/mediawiki] 07paladox 03b663623 - Update PortableInfobox [20:02:51] [02miraheze/mediawiki] 07paladox pushed 031 commit to 03REL1_37 [+0/-0/±1] 13https://git.io/JK35l [20:02:52] [02miraheze/mediawiki] 07paladox 0331f816d - Update WikiDiscover [20:03:28] !log [paladox@test3] starting deploy of {'world': True, 'gitinfo': True} to skip [20:03:32] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [20:03:56] Thanks paladox [20:04:01] yw [20:06:24] !log [paladox@test3] finished deploy of {'world': True, 'gitinfo': True} to skip - SUCCESS in 175s [20:06:27] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [20:07:02] paladox: I hope that the PortableInfobox patch fixes your issue with MassEditRegex, though I'm not certain. I do think it might, since it fixes the fatal exception that is returned, preventing `endAtomic()` from being ran. Though I can't guarantee it. If not I don't know what would. [20:07:09] CosmicAlpha: can you test & move to the done list? [20:07:09] it works! [20:07:22] Just try add a file to a page [20:08:41] Oh great! Glad that's fixed then. Yeah with 1.37 it changed error handling to actual error on that issue. Though technically this has probably been broken in PortableInfobox forever, though wouldn't have fataled or interfered with MassEditRegex. 
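One loose end from the MassEditRegex review earlier: the "Use of WikiPage::doEditContent was deprecated in MediaWiki 1.32" warnings. The usual replacement is the PageUpdater API; below is a minimal sketch under that assumption, not the extension's actual code, with $wikiPage, $user and $newText assumed to come from the surrounding edit loop:

```php
<?php
// Hedged sketch of migrating off WikiPage::doEditContent to PageUpdater.
// Not MassEditRegex's actual code; $wikiPage, $user and $newText are assumed
// to be provided by the surrounding edit loop.
use MediaWiki\Revision\SlotRecord;

$content = ContentHandler::makeContent( $newText, $wikiPage->getTitle() );

$updater = $wikiPage->newPageUpdater( $user );
$updater->setContent( SlotRecord::MAIN, $content );
$updater->saveRevision(
	CommentStoreComment::newUnsavedComment( 'Regex search and replace' ),
	EDIT_UPDATE
);

if ( !$updater->wasSuccessful() ) {
	// Surface the failure instead of silently continuing with the next page.
	wfDebugLog( 'MassEditRegex', $updater->getStatus()->getWikiText() );
}
```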
[20:09:40] [02miraheze/mediawiki] 07paladox pushed 031 commit to 03REL1_37 [+0/-0/±1] 13https://git.io/JK35H [20:09:42] [02miraheze/mediawiki] 07paladox 038aaee2c - Update SnapProjectEmbed [20:13:14] [02miraheze/mediawiki] 07paladox pushed 031 commit to 03REL1_37 [+0/-0/±1] 13https://git.io/JK35x [20:13:15] [02miraheze/mediawiki] 07paladox 03645cbaa - Update SnapProjectEmbed [20:14:49] Marked as fixed [20:15:01] CosmicAlpha: can we get SpriteSheet deployed [20:17:29] [02SpriteSheet] 07Universal-Omega closed pull request 03#8: Replace usages of `$wgUser` - 13https://git.io/JKL64 [20:17:30] [02miraheze/SpriteSheet] 07Universal-Omega pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/JK3dG [20:17:32] [02miraheze/SpriteSheet] 07Universal-Omega 03d62639c - Replace usages of `$wgUser` (#8) [20:17:33] [02miraheze/SpriteSheet] 07Universal-Omega deleted branch 03Universal-Omega-patch-1 [20:17:35] [02SpriteSheet] 07Universal-Omega deleted branch 03Universal-Omega-patch-1 - 13https://git.io/JTPvX [20:18:22] [02miraheze/mediawiki] 07paladox pushed 031 commit to 03REL1_37 [+0/-0/±1] 13https://git.io/JK3dl [20:18:23] [02miraheze/mediawiki] 07paladox 037f52203 - Update SpriteSheet [20:18:29] miraheze/SpriteSheet - Universal-Omega the build passed. [20:18:40] Oh thanks again paladox. [20:18:48] !log [paladox@test3] starting deploy of {'world': True, 'gitinfo': True} to skip [20:18:52] yw [20:18:52] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [20:19:12] RhinosF1: ^ [20:19:17] CosmicAlpha: testing [20:19:21] paladox: tyvm [20:21:34] !log [paladox@test3] finished deploy of {'world': True, 'gitinfo': True} to skip - SUCCESS in 166s [20:21:37] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [20:22:09] CosmicAlpha: https://test3.miraheze.org/wiki/File:GlobalNewFiles_test8.png works [20:22:10] [url] File:GlobalNewFiles test8.png - Test3 | test3.miraheze.org [20:24:06] :) [20:24:11] 2 blockers fixed [20:24:29] 3 to us [20:24:30] [02miraheze/ManageWiki] 07Universal-Omega pushed 031 commit to 03Universal-Omega-patch-1 [+0/-0/±1] 13https://git.io/JK3db [20:24:31] [02miraheze/ManageWiki] 07Universal-Omega 03fc2651e - API: Replace deprecated constants [20:24:33] [02ManageWiki] 07Universal-Omega created branch 03Universal-Omega-patch-1 - 13https://git.io/vpSns [20:24:34] [02ManageWiki] 07Universal-Omega opened pull request 03#304: API: Replace deprecated constants - 13https://git.io/JK3dN [20:24:39] 6 so far left [20:25:31] Great! [20:25:55] https://graylog.miraheze.org/messages/graylog_332/e0cc5f30-2c62-11ec-a12b-0200001a24a4 [20:25:56] hmm [20:26:17] oh i guess that was before the deploy finished [20:30:42] miraheze/ManageWiki - Universal-Omega the build passed. 
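The SpriteSheet pull request above, "Replace usages of `$wgUser`", is the standard cleanup for the global that MediaWiki 1.35 deprecated. A hedged sketch of the pattern, not the actual SpriteSheet diff:

```php
<?php
// Hedged sketch of dropping the deprecated $wgUser global; not the actual
// SpriteSheet change.

// Before:
// global $wgUser;
// $performer = $wgUser;

// After, when the code has no context object of its own:
$performer = RequestContext::getMain()->getUser();

// Inside anything that extends ContextSource (special pages, API modules,
// handlers with an injected context), prefer the context's user instead:
// $performer = $this->getUser();
```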
[20:32:18] [02ManageWiki] 07Universal-Omega closed pull request 03#304: API: Replace deprecated constants - 13https://git.io/JK3dN [20:32:20] [02miraheze/ManageWiki] 07Universal-Omega pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/JK3Fl [20:32:21] [02miraheze/ManageWiki] 07Universal-Omega 0356026ad - API: Replace deprecated constants (#304) [20:32:23] [02ManageWiki] 07Universal-Omega deleted branch 03Universal-Omega-patch-1 - 13https://git.io/vpSns [20:32:24] [02miraheze/ManageWiki] 07Universal-Omega deleted branch 03Universal-Omega-patch-1 [20:35:59] PROBLEM - mw12 Current Load on mw12 is CRITICAL: CRITICAL - load average: 8.02, 6.88, 5.01 [20:36:55] PROBLEM - mw10 Current Load on mw10 is CRITICAL: CRITICAL - load average: 14.85, 8.58, 5.91 [20:37:03] [02miraheze/mediawiki] 07paladox pushed 031 commit to 03REL1_37 [+0/-0/±1] 13https://git.io/JK3F6 [20:37:04] PROBLEM - mw11 Current Load on mw11 is CRITICAL: CRITICAL - load average: 14.80, 8.14, 5.63 [20:37:05] [02miraheze/mediawiki] 07paladox 03032c42b - Update ManageWiki [20:38:03] RECOVERY - mw12 Current Load on mw12 is OK: OK - load average: 3.70, 5.58, 4.77 [20:39:02] PROBLEM - mw11 Current Load on mw11 is WARNING: WARNING - load average: 6.87, 7.98, 5.91 [20:39:06] miraheze/ManageWiki - Universal-Omega the build passed. [20:40:56] PROBLEM - mw10 Current Load on mw10 is WARNING: WARNING - load average: 4.34, 7.07, 6.00 [20:40:59] RECOVERY - mw11 Current Load on mw11 is OK: OK - load average: 4.61, 6.71, 5.70 [20:42:56] RECOVERY - mw10 Current Load on mw10 is OK: OK - load average: 3.54, 5.89, 5.70 [20:57:42] PROBLEM - mon2 Current Load on mon2 is WARNING: WARNING - load average: 3.00, 3.46, 3.27 [20:59:38] RECOVERY - mon2 Current Load on mon2 is OK: OK - load average: 2.91, 3.35, 3.25 [21:03:28] PROBLEM - mon2 Current Load on mon2 is WARNING: WARNING - load average: 3.46, 3.36, 3.26 [21:04:45] PROBLEM - mw8 Current Load on mw8 is CRITICAL: CRITICAL - load average: 8.47, 6.34, 4.67 [21:05:23] RECOVERY - mon2 Current Load on mon2 is OK: OK - load average: 2.03, 2.88, 3.10 [21:05:45] PROBLEM - mw12 Current Load on mw12 is CRITICAL: CRITICAL - load average: 14.91, 8.56, 5.89 [21:06:12] PROBLEM - cloud4 Current Load on cloud4 is WARNING: WARNING - load average: 23.77, 19.73, 16.03 [21:06:50] RECOVERY - mw8 Current Load on mw8 is OK: OK - load average: 5.37, 5.90, 4.72 [21:06:59] PROBLEM - mw10 Current Load on mw10 is CRITICAL: CRITICAL - load average: 11.36, 8.61, 6.32 [21:07:46] PROBLEM - mw12 APT on mw12 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [21:09:42] RECOVERY - mw12 APT on mw12 is OK: APT OK: 17 packages available for upgrade (0 critical updates). 
[21:12:16] RECOVERY - cloud4 Current Load on cloud4 is OK: OK - load average: 11.28, 17.22, 16.43 [21:13:41] PROBLEM - mw12 Current Load on mw12 is WARNING: WARNING - load average: 4.02, 7.20, 6.97 [21:14:53] PROBLEM - mw10 Current Load on mw10 is WARNING: WARNING - load average: 5.71, 7.89, 7.16 [21:15:39] RECOVERY - mw12 Current Load on mw12 is OK: OK - load average: 5.13, 6.60, 6.77 [21:18:54] RECOVERY - mw10 Current Load on mw10 is OK: OK - load average: 4.34, 6.02, 6.56 [21:28:32] [02miraheze/mw-config] 07RhinosF1 pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/JK3xq [21:28:34] [02miraheze/mw-config] 07RhinosF1 032268e34 - LS: remove PasswordCannotMatchUsername, superseded by PasswordCannotBeSubstringInUsername [21:29:14] !log [rhinos@test3] starting deploy of {'config': True} to skip [21:29:15] !log [rhinos@test3] finished deploy of {'config': True} to skip - SUCCESS in 0s [21:29:19] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [21:29:22] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [21:29:44] miraheze/mw-config - RhinosF1 the build passed. [21:36:50] PROBLEM - mw11 Current Load on mw11 is CRITICAL: CRITICAL - load average: 13.56, 8.84, 6.31 [21:36:51] !log [@mw11] starting deploy of {'config': True} to all [21:36:53] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [21:37:03] !log [@mw11] finished deploy of {'config': True} to all - SUCCESS in 12s [21:37:06] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [21:38:48] PROBLEM - mw11 Current Load on mw11 is WARNING: WARNING - load average: 5.06, 6.99, 5.92 [21:40:44] RECOVERY - mw11 Current Load on mw11 is OK: OK - load average: 3.29, 5.73, 5.59 [23:04:51] PROBLEM - mw10 Current Load on mw10 is CRITICAL: CRITICAL - load average: 12.25, 7.81, 5.14 [23:04:57] PROBLEM - mw11 Current Load on mw11 is CRITICAL: CRITICAL - load average: 12.89, 7.53, 4.90 [23:05:47] PROBLEM - mw8 Current Load on mw8 is CRITICAL: CRITICAL - load average: 12.27, 7.99, 4.88 [23:05:49] PROBLEM - mw9 Current Load on mw9 is CRITICAL: CRITICAL - load average: 13.31, 8.32, 5.13 [23:05:55] PROBLEM - mw11 APT on mw11 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [23:06:46] PROBLEM - cloud4 Current Load on cloud4 is CRITICAL: CRITICAL - load average: 28.72, 21.31, 15.53 [23:07:21] PROBLEM - mw12 Current Load on mw12 is CRITICAL: CRITICAL - load average: 12.70, 10.29, 6.40 [23:07:38] PROBLEM - cloud5 Current Load on cloud5 is CRITICAL: CRITICAL - load average: 22.69, 24.96, 18.24 [23:07:47] RECOVERY - mw8 Current Load on mw8 is OK: OK - load average: 4.87, 6.71, 4.80 [23:07:59] RECOVERY - mw11 APT on mw11 is OK: APT OK: 17 packages available for upgrade (0 critical updates). 
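On the mw-config change above ("LS: remove PasswordCannotMatchUsername, superseded by PasswordCannotBeSubstringInUsername"): MediaWiki expresses these as $wgPasswordPolicy checks, so the LocalSettings-level change presumably amounts to something like the sketch below; the policy group shown is illustrative, not Miraheze's real configuration:

```php
<?php
// Hedged sketch of what the LocalSettings-level change presumably amounts to;
// not Miraheze's actual configuration.

// Old line, dropped because the check is superseded upstream:
// $wgPasswordPolicy['policies']['default']['PasswordCannotMatchUsername'] = true;

// Newer check (MediaWiki 1.35+) that also rejects passwords which merely
// contain the username as a substring:
$wgPasswordPolicy['policies']['default']['PasswordCannotBeSubstringInUsername'] = true;
```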
[23:08:45] PROBLEM - cloud4 Current Load on cloud4 is WARNING: WARNING - load average: 22.97, 20.98, 16.08 [23:09:39] PROBLEM - cloud5 Current Load on cloud5 is WARNING: WARNING - load average: 18.60, 22.14, 17.99 [23:09:53] PROBLEM - mw9 Current Load on mw9 is WARNING: WARNING - load average: 3.94, 7.04, 5.49 [23:10:43] RECOVERY - cloud4 Current Load on cloud4 is OK: OK - load average: 12.20, 17.80, 15.52 [23:10:56] RECOVERY - mw11 Current Load on mw11 is OK: OK - load average: 2.89, 6.44, 5.54 [23:11:37] RECOVERY - cloud5 Current Load on cloud5 is OK: OK - load average: 14.32, 19.48, 17.53 [23:11:51] RECOVERY - mw9 Current Load on mw9 is OK: OK - load average: 3.46, 5.86, 5.25 [23:12:54] PROBLEM - mw10 Current Load on mw10 is WARNING: WARNING - load average: 4.79, 6.89, 6.09 [23:13:11] RECOVERY - mw12 Current Load on mw12 is OK: OK - load average: 3.65, 6.75, 6.20 [23:14:52] RECOVERY - mw10 Current Load on mw10 is OK: OK - load average: 3.27, 5.55, 5.69 [23:35:39] PROBLEM - mw12 APT on mw12 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [23:36:44] PROBLEM - mw11 Current Load on mw11 is CRITICAL: CRITICAL - load average: 13.72, 8.10, 5.92 [23:36:46] PROBLEM - mw12 Current Load on mw12 is CRITICAL: CRITICAL - load average: 8.23, 6.57, 5.36 [23:36:57] PROBLEM - mw10 Current Load on mw10 is CRITICAL: CRITICAL - load average: 10.59, 8.11, 6.14 [23:37:34] RECOVERY - mw12 APT on mw12 is OK: APT OK: 17 packages available for upgrade (0 critical updates). [23:38:40] PROBLEM - mw11 Current Load on mw11 is WARNING: WARNING - load average: 5.53, 6.97, 5.78 [23:38:43] RECOVERY - mw12 Current Load on mw12 is OK: OK - load average: 5.00, 5.78, 5.20 [23:38:57] RECOVERY - mw10 Current Load on mw10 is OK: OK - load average: 4.79, 6.67, 5.84 [23:40:35] RECOVERY - mw11 Current Load on mw11 is OK: OK - load average: 3.96, 5.96, 5.55