[10:09:41] @paladox / @abaddriverlol any idea what caused that outage?
[10:14:27] https://issue-tracker.miraheze.org/T14732
[11:41:03] not sure, but we should really consider setting up DB slowlogs (with logrotate so it doesn't take up too much disk space) so we have a general idea what's causing issues
[11:44:47] db/appserver load at the time looks fine though
[11:45:11] [1/2] mw171 was the first one that went down, and load didn't increase at all
[11:45:11] [2/2] https://cdn.discordapp.com/attachments/1006789349498699827/1455164743726927914/image.png?ex=6953bb46&is=695269c6&hm=fa37b195b6908603b54decae7d8b6426dae986d4a343e45759382fbc153e929e&
[11:46:18] [1/2] Process utilization went up to 100% shortly before
[11:46:18] [2/2] https://cdn.discordapp.com/attachments/1006789349498699827/1455165027475521648/image.png?ex=6953bb8a&is=69526a0a&hm=6e7655db12ffa26a4d33f5de4982b119c421886c2e10ace865aa4317c765d1b8&
[11:48:09] journalctl on mw171 has no entries for that time :|
[11:54:39] https://cdn.discordapp.com/attachments/1006789349498699827/1455167126632665199/image.png?ex=6953bd7e&is=69526bfe&hm=da4a6d6df47e74ab765f5d9dc65d7e6697e7c0cf7e14e760ed364c41a4a7402c&
[11:58:26] [1/2] beta also went down for some reason...?
[11:58:26] [2/2] https://cdn.discordapp.com/attachments/1006789349498699827/1455168078076837988/image.png?ex=6953be61&is=69526ce1&hm=e81eb812a1bd07e7a9771b4775eb51ec9e2ae34189d01de71b3f4de87cd418e2&
[12:07:54] I feel like this was a network issue or something like that, I didn't find any servers with increased load; mem* also seemed fine, there were just fewer queries; the DBs were fine; icinga also was fine since the request duration really went up
[12:08:27] Also there is data missing from that time period in grafana, but the relevant servers didn't crash/restart
[12:10:41] [1/6] On an unrelated note https://issue-tracker.miraheze.org/T14463 is getting out of control with 50+ wikis having a bad custom domain setup. I think RequestCustomDomain can be modified so that it periodically scans for conditions that make the custom domain valid:
[12:10:41] [2/6] 1. DNS works as expected.
[12:10:41] [3/6] 2. The wiki still exists (is not deleted).
[12:10:42] [4/6] 3. Anything else required for the custom domain to continue functioning.
[12:10:42] [5/6] If any of the requirements are not met, the wiki will be added to a database table. Admins can view the content of this table on page `Special:RemoveCustomDomains`, where they can either one-click remove a custom domain (and redirect?) or one-click flag a wiki as a false positive.
[12:10:42] [6/6] Depending on the false positive rate the process can be fully automated at some point.
[12:15:13] That is planned
[12:15:20] For @pskyechology
[12:15:43] I can probably look in the new year
[12:16:51] I was tempted to give it a try but have no means to test it. Glad that others are taking care of this.
[12:35:35] I'll try and purge broken ones in the new year
[12:36:11] If you could use https://github.com/miraheze/ssl/blob/main/wikidiscover_output.yaml to figure out which wiki it matches, that would make life easy
[12:36:22] As I will need a list of wikis to remove custom domains from
[12:36:27] Otherwise I'll build a script
[12:37:29] I think automating custom domain removal is one of @pskyechology's internship tasks
[12:39:27] Blaming networking is quite a good idea
[12:39:34] If we have no better answer
[12:39:45] It seemed to affect everything
[12:39:52] So it's either something at the cloud layer
[12:39:56] Or networking could be
[12:40:05] yes
[12:40:06] I'd expect networking to cause more alerts though
[12:40:15] Like a loss of pdns @abaddriverlol
[12:40:15] I don't see any other possible reason
[12:40:23] Nothing was logged in journalctl
[12:40:28] ig that's an option as well
[12:41:57] I didn't see any pdns alerts though
[12:42:10] The only thing I saw was every db seemed to have slow logs
[12:42:31] Slow logs shouldn't be caused by network
[12:43:09] Networking is a good thing to blame cause it's none of the usual suspects
[12:43:22] But I would have expected a networking issue to manifest differently
[12:44:23] [1/2] db172 had 2 open connections at one point
[12:44:23] [2/2] https://cdn.discordapp.com/attachments/1006789349498699827/1455179643396292892/image.png?ex=6953c926&is=695277a6&hm=ee4ba64b6a7e93a0a1034862d95b2925a3c2ac69c44c221402001e0426cdeb43&
[12:44:30] and test151 went down as well
[12:45:00] But I don't see any other reason why test151 would be unreachable for icinga if its load was fine
[13:21:17] [1/53] ```
[13:21:18] [2/53] 1455wikiwiki
[13:21:18] [3/53] agent1wiki
[13:21:18] [4/53] airwiki
[13:21:19] [5/53] armoredpatrolremasteredwiki
[13:21:19] [6/53] beidipediawiki
[13:21:19] [7/53] bunnypranavwiki
[13:21:19] [8/53] cdfwiki
[13:21:20] [9/53] copper9archiveswiki
[13:21:20] [10/53] cs2kowiki
[13:21:20] [11/53] darkwaterswiki
[13:21:21] [12/53] ddvhwikiwiki
[13:21:21] [13/53] disappearingmomentwiki
[13:21:22] [14/53] dwarvesrpwiki
[13:21:22] [15/53] dysonsphereprogramwiki
[13:21:23] [16/53] earthsupremacywiki
[13:21:23] [17/53] eggnoswiki
[13:21:24] [18/53] estillhistorywiki
[13:21:24] [19/53] forestwiki
[13:21:25] [20/53] frankfrankultimatewiki
[13:21:25] [21/53] globaleducationwiki
[13:21:26] [22/53] hacklabsrwiki
[13:21:26] [23/53] hymnrowiki
[13:21:27] [24/53] ipcamswiki
[13:21:27] [25/53] justarandomamericanwiki
[13:21:28] [26/53] kwikiwiki
[13:21:28] [27/53] lifesprogresswiki
[13:21:29] [28/53] lostcompasswiki
[13:21:29] [29/53] mtaewiki
[13:21:30] [30/53] nazarewiki
[13:21:30] [31/53] nefuwikiwiki
[13:21:31] [32/53] oasismpwiki
[13:21:31] [33/53] pokrewwiki
[13:21:32] [34/53] prisonerlifewiki
[13:21:32] [35/53] queenscourtgameswiki
[13:21:33] [36/53] reactorwiki
[13:21:33] [37/53] sanarsivwiki
[13:21:34] [38/53] sanarxivwiki
[13:21:34] [39/53] sicksciencewiki
[13:21:35] [40/53] simpleelectronicswikiwiki
[13:21:35] [41/53] simplfiiedcodingwiki
[13:21:36] [42/53] starworldwiki
[13:21:36] [43/53] stepanovisvwiki
[13:21:37] [44/53] stepanovwiki
[13:21:37] [45/53] stretwikiwiki
[13:21:38] [46/53] takagisanwiki
[13:21:38] [47/53] truevanillawiki
[13:21:39] [48/53] umerezwiki
[13:21:39] [49/53] unixlabwiki
[13:21:40] [50/53] vitriolwiki
[13:21:40] [51/53] wamiwiki
[13:21:41] [52/53] wikisaucewiki
[13:21:41] [53/53] ```
[13:22:01] Pop it on the task
[13:27:13] Done
[17:47:07] one of those I know we deleted anyway
[18:02:54] If you see a custom domain when deleting a wiki, do ping us
[18:02:59] It's nice for us to know that
[18:07:06] It does in theory flag in a report
[18:07:12] But that requires us to read it
[18:07:27] Or us to automate acting on it
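
Editor's note: a minimal sketch of the periodic check proposed at 12:10:41 (DNS works as expected, the wiki still exists), written in Python only for illustration. Every name here (`CustomDomain`, `failed_checks`, `scan`, the failure labels, and the idea of comparing resolved addresses against a set of expected IPs) is an assumption and not taken from RequestCustomDomain. What "DNS works as expected" should mean in practice is left open in the discussion above.

```python
import socket
from dataclasses import dataclass
from typing import Callable, Iterable

# Hypothetical type: a function that maps a domain to the set of IPs it resolves to.
Resolver = Callable[[str], set[str]]


def system_resolver(domain: str) -> set[str]:
    """Resolve a domain with the system resolver; return an empty set on failure."""
    try:
        infos = socket.getaddrinfo(domain, 443, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        return set()
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return {info[4][0] for info in infos}


@dataclass(frozen=True)
class CustomDomain:
    wiki: str    # database name, e.g. "examplewiki" (hypothetical)
    domain: str  # custom domain configured for that wiki


def failed_checks(
    entry: CustomDomain,
    existing_wikis: set[str],
    expected_ips: set[str],
    resolve: Resolver = system_resolver,
) -> list[str]:
    """Return the list of failed conditions for one custom domain (empty if valid)."""
    failures = []
    # Condition 2 from the proposal: the wiki still exists (is not deleted).
    if entry.wiki not in existing_wikis:
        failures.append("wiki-deleted")
    # Condition 1 from the proposal, under the assumption that "works as expected"
    # means "resolves to at least one of our expected addresses".
    addresses = resolve(entry.domain)
    if not addresses:
        failures.append("dns-unresolved")
    elif not addresses & expected_ips:
        failures.append("dns-points-elsewhere")
    return failures


def scan(
    entries: Iterable[CustomDomain],
    existing_wikis: set[str],
    expected_ips: set[str],
    resolve: Resolver = system_resolver,
) -> dict[str, list[str]]:
    """Collect every wiki with at least one failed condition, for admin review."""
    flagged = {}
    for entry in entries:
        failures = failed_checks(entry, existing_wikis, expected_ips, resolve)
        if failures:
            flagged[entry.wiki] = failures
    return flagged
```

The output of `scan` corresponds to the review table from the proposal: it only flags wikis and removes nothing, so false positives can still be dismissed by hand before any removal is automated. The resolver is passed in as a parameter so the check can be exercised without real DNS lookups.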