[08:43:35] @paladox https://github.com/miraheze/mw-config/pull/5356
[08:50:47] @paladox also https://github.com/miraheze/mw-config/pull/5245/files
[09:19:57] Also https://github.com/miraheze/mw-config/pull/5357
[15:54:10] @paladox please add #announcements for the 3rd in case of user VE errors
[16:02:41] @paladox preWarm jobs are taking up a lot
[16:03:15] do note that job runs on every edit
[16:03:20] on every wiki with VE
[16:03:25] i know
[16:03:26] eventually every wiki
[16:03:46] @paladox we need to be able to run it on every wiki after every edit
[16:03:49] like asap
[16:07:34] @paladox how is this going to work on all wikis?
[16:07:40] by whenever 1.42 comes out
[16:08:03] given Parsoid will be a beta feature then
[16:08:20] by 1.41, we need it as a developer tool on all wikis for reads
[16:09:02] we can't even warm the cache
[16:09:32] well then we can't have it, simple as that, with warmCache
[16:09:40] i'm gonna switch warm cache off
[16:10:46] done
[16:14:14] we can't deploy 1.41 globally without warmCache enabled
[16:14:26] we can't have warmCache
[16:14:58] It isn't optional
[16:14:58] works on test131 so...
[16:15:16] There is zero chance we are going to be able to handle performance as parsoid rolls out
[16:16:06] we saw how slow things got with SCSVG on a cold cache
[16:16:23] well we either have a big jobqueue and deal with it, or we can't and can't upgrade.
[16:16:41] parsercache has its own group so its own runner
[16:17:13] we'll need more runners than 1
[16:18:21] we don't have the resources to do that. mwtask only has 4 cores and it's already quite congested. We have a lot of runners with different groups already. I can increase to 2 but we can't have many.
[16:22:28] @paladox we likely need 4 based on my estimations. We can try with 2 and see if it keeps climbing.
[16:22:50] ok
[16:22:55] but can we not increase cores/memory or add another mwtask?
[16:23:09] 4 cores to handle parsing for 7k wikis is not a lot
[16:33:03] @paladox ye this is a disaster
[16:33:28] Can we borrow some resources from Cloud12? It's only running at 13% per Grafana
[16:33:40] it needs to process about an extra 350 jobs a minute
[16:33:50] we need more capacity for intensive jobs
[16:33:56] What's going on with VE?..
[16:34:28] @pixldev we've deployed a change to how it accesses the parser
[16:34:32] has something broken?
[16:34:57] Ah, unforeseen bug?
[16:35:00] 🐛
[16:35:07] @pixldev no, it is planned
[16:35:14] Ah
[16:35:17] it is preparation for 1.41
[16:35:30] Ah
[16:35:43] VE no longer uses the REST interface because parsoid is becoming part of core
[16:35:52] I would have thought 1.41 was still a while out
[16:35:58] no, fairly soon
[16:36:08] Parsoid is a new way of parsing content
[16:36:25] With 1.41, you'll have a new tool on your wikis to see how content will be parsed in future.
[16:36:38] Interesting
[16:36:45] With 1.42, it will be a beta option and potentially on by default on some wikis
[16:36:59] and from 1.43 onwards, slowly rolling out
[16:37:17] currently we are struggling to warm the cache
[16:37:35] @paladox can we leave the jobs a bit and see how many are from reads and how many are from edits
[16:37:49] the backlog might go insane but every read and every edit is generating jobs at the moment
[16:37:54] it'll be every edit soon
[16:37:58] as the cache will be warm
[16:38:00] Ah thanks. I may not understand everything fully but appreciate it regardless
[16:38:45] @paladox I say we wait a few hours and see how bad it gets.
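A minimal sketch of the arithmetic behind "about an extra 350 jobs a minute" needing "likely 4" runners, as estimated above. The per-job parse time is an assumed, illustrative value, not a measured figure.

```python
import math

# Rough capacity estimate behind the "we likely need 4" runner figure above.
# The ~350 jobs/minute rate comes from the discussion; the average Parsoid
# parse time per job is an assumption for illustration only.
incoming_jobs_per_minute = 350
avg_seconds_per_job = 0.6                 # assumed, not measured

jobs_per_runner_per_minute = 60 / avg_seconds_per_job            # ~100 jobs/min
runners_needed = math.ceil(incoming_jobs_per_minute / jobs_per_runner_per_minute)

print(f"one runner clears ~{jobs_per_runner_per_minute:.0f} jobs/min")
print(f"runners needed to keep up: {runners_needed}")            # 4 under these assumptions
```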
[16:38:51] ok
[16:38:55] or at least leave it on until you sleep tonight
[16:39:10] but I do think we'll need more capacity eventually @paladox
[16:39:26] is there any scope for more cores / memory / another task instance
[16:40:16] We've oversubscribed the cores already. Whilst we have the memory, we don't have the disk space
[16:40:19] @paladox unless I say otherwise, please leave the job running until you sleep tonight, but please make sure the last thing you do before you don't have access for the night is disabling it.
[16:40:35] disabling?
[16:40:57] yes, do not leave preWarm on without a sysadmin with access available within 1-2 hours.
[16:41:09] until we're happy with performance
[16:41:18] I want to see if it is just from it being cold
[16:41:25] or genuinely this terrible
[16:41:29] ok
[16:41:50] @paladox can we get more disks?
[16:42:01] do we have the space to buy more? how much would they cost?
[16:42:57] i don't know right now. Priority is fixing the cloud14 disk as it's slow. @owenrb ordered the disks but i don't know what's happening with them rn.
[16:43:05] @orduin
[16:47:06] @paladox can you manually run jobs on bluepages wiki
[16:47:21] of course the wiki with 414k pages is the largest contributor to the backlog
[16:48:09] > Fatal error: Allowed memory size of 157286400 bytes exhausted (tried to allocate 20480 bytes) in /srv/mediawiki/w/vendor/wikimedia/parsoid/src/Wt2Html/Grammar.php on line 7000
[16:48:10] hmm
[16:49:24] @paladox if that's on bluepages, just run it with infinite memory
[16:51:37] _is pretty sure bluepageswiki is the issue_
[16:52:39] _is also fairly confident it won't be anywhere near as bad as it looks when the cache isn't completely cold_
[16:55:38] Disks were ordered but no plans to go down - especially now with the plans for Miraheze Limited to cease with Miraheze. Last I heard plans were to look at moving to cloud infrastructure but that was back in June
[16:58:01] I thought the outcome was that Cloud was too expensive?
[17:03:47] @paladox I am seeing it settle a bit
[17:03:56] so that makes me think it's not too bad
[17:04:07] but we may need to think about memory limits
[17:04:22] let's keep monitoring as it tackles the initial backlog
[17:09:07] Unfortunately, MediaWiki is being a bit shitty and changing parser, which means every single page on every single wiki has to be reparsed.
[17:11:02] @paladox how big is the parser cache table on db131
[17:11:16] that's not where it's being saved
[17:11:28] it's being saved i think per db?
[17:11:32] it's using db-replicated
[17:13:10] @paladox the data is still useful
[17:13:43] [1/2] > root@db131:/srv/mariadb# du -sh parsercache
[17:13:43] [2/2] > 22G parsercache
[17:13:45] it's 22g???
[17:14:05] wow
[17:14:09] that's big
[17:14:14] we've got 31G left on db131
[17:14:28] we might want to decrease retention
[17:15:14] as the parsoid parser cache will be a similar size
[17:15:50] jobs aren't uncontrollable
[17:16:23] rainworldwiki is the next possible problem @paladox
[17:21:07] @paladox i'm pretty sure this will work
[17:21:21] if we get control of the wikis that are huge
[17:22:11] i will tell you if we can leave it in place by 9pm
[17:22:50] but manual runs of the wikis on https://grafana.miraheze.org/d/GtxbP1Xnk/mediawiki?orgId=1&from=now-1h&to=now&viewPanel=59 would be good @paladox
[17:22:57] i am
[17:23:00] only on a few
[17:23:11] ok
[17:39:52] Any tips on what I should start to learn to be a MediaWiki Engineer?
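A sketch of the kind of manual run discussed above for a single huge wiki, lifting the PHP memory limit so the prewarm jobs don't die with fatals like the Grammar.php one quoted. The `--wiki` selector, the maintenance-script path and the job count are assumptions about the local setup, not a confirmed command.

```python
import subprocess

# Hypothetical wrapper for manually draining parsoidCachePrewarm jobs on one
# large wiki with the PHP memory limit lifted ("run it with infinite memory").
# The --wiki selector and paths are assumptions about this farm's layout.
def drain_prewarm(dbname: str, max_jobs: int = 5000) -> int:
    cmd = [
        "php", "-d", "memory_limit=-1",          # no PHP memory cap for huge pages
        "/srv/mediawiki/w/maintenance/runJobs.php",
        "--wiki", dbname,
        "--type", "parsoidCachePrewarm",
        "--maxjobs", str(max_jobs),
    ]
    return subprocess.run(cmd).returncode

if __name__ == "__main__":
    drain_prewarm("bluepageswiki")
```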
[17:41:11] @songngu.xyz how to fight the job queue
[17:41:33] @paladox newusopediawiki is a problem
[17:42:45] @songngu.xyz the job queue is genuinely an awful pile of
[17:43:28] like for the jobrunner?
[17:44:30] yes
[17:44:38] it is horrid software
[17:44:47] that @paladox is currently fighting to control
[17:44:51] and it's semi working
[17:45:03] ok...
[17:45:52] well I know the basic things and stuff, just want to go a little bit further down the rabbit hole like that
[17:47:16] i get "Redis server error: socket error on read socket" quite frequently and i don't know how to fix it
[17:47:32] it means redis is starting to hate you
[17:47:38] and you need to run them a lot faster
[17:48:00] get the script running in a loop if you can
[17:48:06] so it doesn't get killed
[17:48:17] <:skull_c:1137720188607926322>
[17:48:38] at least I will only learn to be a MWE-
[17:49:27] @paladox pokeclickerwiki and newusopediawiki are the biggest
[17:49:32] tackle them first
[17:50:39] i don't see high memory on jobchron, which is good
[17:52:25] @paladox whatever you just did, keep doing it
[17:52:37] levels are going down
[17:52:38] https://github.com/miraheze/puppet/compare/9c8db35accd9...a146170ee0e2
[17:53:13] @paladox that may have worked
[17:53:19] i'm nipping to the shop
[17:53:23] hopefully it lasts
[17:54:03] i'm off to dinner
[17:54:38] Ok
[17:55:03] If it carries on trending down, we can leave it overnight
[17:55:24] As long as it's mostly down or stable, it's fine
[18:08:20] it is much flatter
[18:08:26] i will check again at 7
[18:13:08] @Stewards @originalauthority says avid can be deleted as they moved
[18:14:05] I would recommend it; there's no point in warming the cache for that wiki particularly since it's moved to WikiTide; there have been no recent edits.
[18:14:47] @originalauthority I mean it should have community consent
[18:15:25] We have to warm the cache for all wikis
[18:15:32] We can't close them for that sole reason
[18:18:12] Either community consent or actual dormancy enforcement
[18:34:51] We're hosting them on WikiForge, in fact.
[18:35:38] same difference
[18:38:41] Only mentioning it because they generated enough traffic to require their own personal paid server. Deletion could be a substantial saving for MH. 🙂
[18:40:39] @notaracham we shouldn't charge some wikis
[18:40:47] Certain ones have half a million pages
[18:41:03] looks like they're down anyway btw
[18:41:17] 90% automated
[18:41:43] Up on my side. 🤷
[18:42:01] https://www.avid.wiki/Special:RecentChanges works for you?
[18:42:13] Yep
[18:42:35] interesting
[18:42:48] Oh for sure, that one that uses MH as a backend for their game is a fascinating use case
[18:43:03] And for me
[18:43:55] works now
[18:44:08] and down again
[18:44:16] that's fine
[18:44:47] same here
[18:47:59] @paladox i think we should turn it off for bluepageswiki unless you can control it
[18:48:30] Pretty much happening for a lot of big wikis
[18:48:48] Don't think switching it off for bluepages will fix it
[18:48:53] the other big wikis are nowhere near as big @paladox
[18:49:13] I mean
[18:50:47] @paladox yes but in terms of pages
[18:50:55] bluepages + att are the biggest
[18:51:01] i don't think we can handle warm-up
[18:51:01] the number of pages is more important
[18:51:09] we'll have to disable
[18:51:16] @paladox We can, just not all at once
[18:51:20] it keeps getting killed for bluepages
[18:51:24] can we try turning bluepages + att
[18:51:26] off
[18:51:29] ok
[18:53:17] @paladox are you doing a patch for those wikis or me?
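The "get the script running in a loop so it doesn't get killed" advice above could look roughly like the sketch below. The command layout and `--wiki` selector are assumptions about this farm; the bound of about 20 attempts echoes the figure suggested later in the log.

```python
import subprocess
import time

# Sketch of the "run it in a loop so it doesn't get killed" advice: keep
# re-launching runJobs.php when it exits non-zero (e.g. after a
# "Redis server error: socket error on read" failure), up to a small bound.
# The command layout and --wiki selector are assumptions about this farm.
CMD = [
    "php", "/srv/mediawiki/w/maintenance/runJobs.php",
    "--wiki", "newusopediawiki",
    "--type", "parsoidCachePrewarm",
]

for attempt in range(1, 21):                 # ~20 attempts, per the later suggestion
    result = subprocess.run(CMD)
    if result.returncode == 0:
        print(f"finished cleanly on attempt {attempt}")
        break
    print(f"attempt {attempt} died (exit {result.returncode}), retrying in 5s")
    time.sleep(5)
```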
[18:53:57] me
[18:54:07] ok
[18:54:16] We're still a long way off from formally considering cloud infra. If you could get the disks down to the DC (even if you have to mail them, and we get remote hands to install them), that would be great.
[18:54:20] let me know once you're done @paladox
[18:55:32] @paladox that patch looks like a no-op, $disableWarmup isn't used
[18:57:01] done
[18:57:59] let's see then
[18:58:06] huge numbers are expected
[18:58:11] we just need them controllable
[18:58:30] @paladox can you do your best to run the existing jobs for ATT/bluepages
[18:58:38] yes
[19:03:21] it's looking fairly stable at the moment
[19:10:37] @paladox it's dropping
[19:10:46] slowly but it is
[19:11:32] I'll try and get them sent down tomorrow then
[19:11:54] 👍 Thanks!
[19:12:26] @paladox could take a day to clear this backlog
[19:12:30] it is its own queue
[19:12:42] but as long as it continues going down, we are fine to leave it
[19:12:56] I say apart from clearing bluepages + att, we leave it
[19:12:59] it should work
[19:14:16] unless @orduin has any thoughts
[19:15:00] Not really, I've only been loosely keeping an eye on this for about an hour
[19:15:27] @paladox is ATT still clearing?
[19:15:50] mwtask is slammed
[19:15:57] we've overloaded it
[19:16:12] @paladox that's fine
[19:16:15] it'll settle
[19:16:26] afaics, it is having no user-facing impact
[19:16:33] task is built to be slammed @paladox
[19:17:04] Just not going to do any renames then, just to be safe :)
[19:18:44] @orduin it should be fine
[19:18:49] the main queues look ok
[19:19:27] @paladox it's been trending down for like 15 minutes \o/
[19:19:45] I am happy, unless anything changes, to leave it overnight
[19:23:26] ok
[19:31:52] idk about you guys, but logouts seem to be constant today
[19:35:07] Have you been getting slow load times or timeouts?
[19:36:02] I had one logout today.
[19:36:20] Pages seem to be loading ok for me but recent changes is a bit slow to update.
[19:42:15] loading is fine
[19:44:12] I'm pretty sure the logouts are to do with Varnish or something else, nothing to do with today's events -- since users have been reporting them for the last week or two.
[19:46:34] it happens in periods
[19:46:45] RC being slow to show changes might be our fault
[19:47:04] 3 weeks it's alright and then it happens 3 days in a row
[19:52:58] @paladox tuscriaturaswiki might need a helping hand
[19:53:10] @jph2 which wiki is slow for RC
[19:53:13] i can't at the moment
[19:53:21] @orduin
[19:53:22] mwtask is too loaded
[19:53:42] @paladox are you still running bluepages wiki?
[19:53:48] yeh
[19:54:21] @paladox you can kill that, it's no longer huge
[19:55:00] ATT is now 5th
[19:55:07] bluepages is not in the top 10 anymore
[20:00:00] @paladox have you done that
[20:00:46] yes
[20:00:51] bluepages is empty
[20:01:23] @paladox is task still overloaded?
[20:01:49] yes, load is 8+ and the CPU is full
[20:02:20] okay
[20:02:43] @paladox that is not good
[20:03:07] jobs are stable but not dropping
[20:03:11] which is fine
[20:03:18] but load needs to go down
[20:04:35] And we're swapping
[20:05:04] @paladox can we do anything to reduce load
[20:07:16] ok this is just not sustainable
[20:07:28] we're going to have to think about just disabling.
[20:07:30] @paladox you should be fine to drop the runners to 3
[20:07:37] the rate is stable
[20:07:37] that won't fix it
[20:07:58] @paladox what is causing the issue?
[20:08:03] memory/procs?
[20:08:30] the fact there's like 20+ jobs and the fact that it just keeps building up and up.
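The call being made here (and restated just below) is that a big backlog is acceptable as long as it is stable or decreasing. A minimal sketch of that decision rule over sampled queue sizes; the sample figures are invented for illustration and would really come from the Grafana/Prometheus job-queue panel.

```python
# Sketch of the "stable or decreasing is fine" rule used to decide whether the
# prewarm backlog can be left overnight. The samples are made up; in practice
# they would come from the job-queue dashboard.
def backlog_ok(samples: list[int], tolerance: float = 0.05) -> bool:
    """True if the queue size is flat (within tolerance) or trending down."""
    if len(samples) < 2:
        return True
    growth = (samples[-1] - samples[0]) / max(samples[0], 1)
    return growth <= tolerance

recent = [62_000, 61_500, 60_800, 60_900, 59_700]   # e.g. one sample per 15 min
print("leave it overnight" if backlog_ok(recent) else "intervene / disable")
```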
[20:09:02] @paladox it is not building, it's holding stable
[20:09:20] a high number of jobs is not a risk as long as it's stable or decreasing
[20:09:41] which has been true for the last 70 minutes
[20:10:12] at this rate i'm disabling, it'll be for all wikis.
[20:10:51] @paladox there is no negative impact to users and afaics, it is not a growing backlog
[20:10:59] there is no reason to disable
[20:11:10] reparsing pages will generate a lot of jobs
[20:11:10] takes note
[20:11:11] ok
[20:11:48] @paladox if jobs are running, even if it's very slow, please leave it.
[20:11:54] ok
[20:11:57] I would rather see load drop
[20:12:22] but disabling it everywhere will only make it more urgent down the line
[20:12:33] and make it so we do have a user impact
[20:12:56] we have millions of pages, it's going to take a while to finish
[20:13:09] Is the old MW parser just not working anymore? I'm assuming there's some reason for having to reparse all pages ASAP. Or that's my understanding at least
[20:13:29] @pixldev it isn't ASAP, we have about 6 months
[20:13:35] the old parser is going eventually
[20:13:53] but I'd like it to be done sooner so we can enable the test tools with 1.41
[20:14:01] Ah. The current discussion gave me the wrong impression, my apologies
[20:14:07] rather than see lots of issues with 1.42
[20:14:36] it should improve performance for visual editor / discussion tools / flow though
[20:14:50] because they use parsoid, so its cache being warm is good
[20:16:02] from what i see the issues rn are from the server load of all the jobs reparsing, if it's not urgent couldn't they simply be spaced out more over time?
[20:16:28] May I inquire as to what the cache being warm refers to?
[20:17:35] mwtask having high load isn't a worry and it's not as simple as being done later. It is being done as pages are read
[20:17:48] cache being warm means it is being hit often
[20:17:48] oh. oh.
[20:17:56] a cold cache is one with nothing in it
[20:18:30] oh, so in a warm cache there's something actually cached
[20:18:47] basically for every page that's been read since we enabled cache warming, it has generated an entry in the cache
[20:19:01] eventually, that will only happen on edit + every 10 days
[20:20:08] warming it up reparses each page so it doesn't have to parse when it loads? Is that correct? (probably not, knowing my understanding)
[20:20:20] yes
[20:20:37] once it has been cached once, we will hold it in the cache for between 10 and 11 days
[20:21:31] Ah
[20:21:57] i learn so much lurking here
[20:23:45] that's fine
[20:23:56] i mean advising you is better than staring at graphs
[20:25:04] yea
[20:25:45] yeesh 141 gonna need a vacation after this..
[20:25:59] mwtask is built for being destroyed
[20:26:07] it doesn't impact much for users
[20:26:18] (it will make some background stuff less performant)
[20:26:29] 141? we still have 141?
[20:26:39] mwtask141
[20:26:48] 141 is a cursed number
[20:26:54] @theoneandonlylegroom mwtask141 is the active maintenance server
[20:27:08] Is it?
[20:27:51] Well, looking at the 163% CPU system load, it may very well be
[20:27:56] bad things happened in November 2022 w/ db141 ...
[20:28:16] oh?
[20:28:20] true horror stories past Halloween lol
[20:28:30] trauma
[20:28:39] it was 330% at one point today
[20:28:46] 💀
[20:28:52] it looks like someone has started doing stuff on xedwiki
[20:29:22] Huh?
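A toy model of the cache-warming behaviour explained above: a page view whose Parsoid output is not cached queues a prewarm parse, and a stored entry is then held for roughly 10 to 11 days. This is purely an illustration of the idea, not the actual MediaWiki code.

```python
import random
import time

# Toy model of the warm/cold cache behaviour: a view of an uncached page
# enqueues a prewarm job, and a stored entry is kept for ~10-11 days.
cache: dict[str, tuple[str, float]] = {}     # title -> (html, expiry timestamp)
prewarm_queue: list[str] = []

def ttl_seconds() -> float:
    return random.uniform(10, 11) * 86400    # the 10-11 day retention window

def parsoid_parse(title: str) -> str:        # stand-in for the expensive parse
    return f"<p>{title} (Parsoid)</p>"

def view_page(title: str) -> str:
    entry = cache.get(title)
    if entry and entry[1] > time.time():     # warm: cached and not expired
        return entry[0]
    prewarm_queue.append(title)              # cold: enqueue a prewarm job
    return f"<p>{title} (old parser)</p>"    # the reader still gets a page

def run_prewarm_job(title: str) -> None:
    cache[title] = (parsoid_parse(title), time.time() + ttl_seconds())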
[20:29:22] which is causing that wiki to be at the top of the list
[20:29:26] but load is going down
[20:29:40] I am watching https://grafana.miraheze.org/d/GtxbP1Xnk/mediawiki?orgId=1&from=now-1h&to=now @pixldev
[20:29:54] that shows you which wiki+job combination is the highest
[20:30:03] yeah it went from 200-something to 160
[20:30:24] ah thanks
[20:31:09] was looking at https://grafana.miraheze.org/d/W9MIkA7iz/miraheze-cluster?orgId=1&var-job=node&var-node=mwtask141.miraheze.org&var-port=9100&from=1699097065008&to=1699140265008
[20:31:35] https://grafana.miraheze.org/d/GtxbP1Xnk/mediawiki?orgId=1&from=now-1h&to=now&viewPanel=48 is all the jobs currently waiting
[20:31:39] it is fairly stable
[20:31:46] which makes me happy
[20:32:09] yay
[20:32:25] grafana is way more complex than i thought lmao
[20:32:46] seems to be a theme when hosting a site as large as Miraheze
[20:33:08] it is complex ye
[20:33:23] there's also a lot of patience involved with big jobs like this
[20:33:46] as long as there's no negative impact and it's running, it's just got to be left alone
[20:33:53] we could help it a bit along
[20:33:54] i might look into setting up grafana on a vps just to get a feel
[20:34:08] he's fine
[20:34:25] it'll stay high for a while
[20:34:38] it should really be a trend alert
[20:35:02] @paladox if there's the capacity, help xedwiki along please
[20:35:15] if not, we will wait
[20:35:39] Hopefully SRE keeps popcorn on hand
[20:35:53] i have a sweet bowl
[20:37:10] Nom nom <:nomChocoStrawberry:938647184973365318>
[20:48:03] [1/3] sorry. was afk
[20:48:04] [2/3] vanatas.miraheze.org
[20:48:04] [3/3] takes a few seconds
[20:48:55] also been meaning to ask for a while, what do the log messages in SRE (https://discord.com/channels/407504499280707585/808001911868489748/1170463831521235014) saying stuff like "deploy config true to all" actually mean? Is that from puppet?
[20:49:22] @pixldev I wrote that script!
[20:49:31] config: true means it is deploying config
[20:49:51] it's a json copy of the parameters passed to the deploy tool
[20:50:01] config is deployed by puppet automatically
[20:50:12] which is the deploys when [@] shows
[20:50:23] instead of [@]
[20:51:04] world & l10n means that the actual MediaWiki code and the localisation cache were updated
[20:51:39] although I have zero idea why --l10n was passed for that deploy by @paladox
[20:53:02] @paladox: add xedwiki to the no list or do something with it please
[20:54:14] ok
[20:58:55] ah, so when someone manually pushes a change outside of puppet
[20:59:18] i kinda get it ty 👍
[20:59:48] yes you can run it manually if you like
[21:00:33] @orduin will you be around over the evening?
[21:00:49] What's the script written in?
[21:01:15] @pixldev Python
[21:01:23] Best language
[21:01:25] Somewhat, I'm not keeping that close an eye on things right now though
[21:01:35] i def don't say that cause it's the only one i'm fluent in
[21:02:16] would it be possible, if any wiki gets over 40kB on https://grafana.miraheze.org/d/GtxbP1Xnk/mediawiki?orgId=1&from=now-1h&to=now, to manually run jobs or blacklist it from generating more
[21:02:38] or seems to be 150% higher than the one below it
[21:09:29] @paladox I am not seeing xedwiki go down, I think you'll have to manually run jobs on it
[21:09:46] i have
[21:10:02] @paladox it is not moving quickly then
[21:10:10] has the script crashed?
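The rule of thumb proposed above (flag any wiki whose prewarm backlog exceeds roughly 40 kB on the Grafana panel, or that sits roughly 150% of the next one down) could be expressed like this. The backlog figures below are invented for illustration and would really come from the dashboard's data source; reading "150% higher" as a 1.5x ratio is an interpretation.

```python
# Sketch of the proposed rule of thumb: flag a wiki for a manual run (or a
# temporary blacklist) if its prewarm backlog exceeds ~40 kB on the panel, or
# if it is ~1.5x the next-largest wiki. All figures here are illustrative.
def wikis_needing_attention(backlogs: dict[str, float],
                            absolute_kb: float = 40.0,
                            ratio: float = 1.5) -> list[str]:
    ranked = sorted(backlogs.items(), key=lambda kv: kv[1], reverse=True)
    flagged = []
    for i, (wiki, size) in enumerate(ranked):
        next_size = ranked[i + 1][1] if i + 1 < len(ranked) else 0.0
        if size > absolute_kb or (next_size and size > ratio * next_size):
            flagged.append(wiki)
    return flagged

sample = {"xedwiki": 55.0, "pokeclickerwiki": 30.0, "attwiki": 29.0}  # kB, made up
print(wikis_needing_attention(sample))   # ['xedwiki']
```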
[21:10:18] Not a whole lot i can do about that
[21:10:23] ok
[21:10:40] yeh it crashed
[21:10:42] with redis
[21:10:56] @paladox it might be worth putting it in a loop
[21:11:44] for like 20 attempts or something
[21:14:07] @paladox okay xedwiki is no longer in the top 10
[21:14:13] ok
[21:14:25] now the top 5 are all about 30kB
[21:19:03] @paladox i would suggest a single foreachwiki runJobs.php --type=parsoidCachePrewarm
[21:19:10] it might be enough
[21:19:19] redis will kill it a few times on some wikis
[21:19:31] but it should blast some of the backlogs
[21:20:01] i have runJobs running already
[21:20:10] @paladox for what wiki(s)?
[21:20:15] all
[21:20:35] @paladox how many open? just the one?
[21:20:43] 2
[21:20:52] ok
[21:21:01] let's see how it does overnight then
[21:21:11] @orduin can monitor if they crash once you sleep
[21:21:31] i'm gonna head off to watch some tv
[21:21:37] ok
[21:22:05] i'm going to get a shower and then sleep
[21:26:09] I strongly suggest when the disks arrive that @orduin makes another mwtask
[22:01:00] ManageWiki is winding me up
[22:12:00] Might have to turn off platproject2wiki, it's hanging out at around 60k even with an extra runJobs
[22:21:52] Do it then
[22:22:01] I am off to bed soon
[22:22:40] Seems to have dropped down
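The `foreachwiki runJobs.php --type=parsoidCachePrewarm` suggestion above is essentially one sweep of the job type over every wiki. A rough Python equivalent is sketched below; the wiki-list path is an assumption (the real foreachwiki wrapper reads the farm's own database list), and failures are skipped because, as noted, Redis will kill it on a few wikis.

```python
import json
import subprocess

# Rough equivalent of "foreachwiki runJobs.php --type=parsoidCachePrewarm":
# run the prewarm job type once for every wiki, in sequence, continuing past
# failures. The wiki-list path below is an assumed location, not confirmed.
WIKI_LIST = "/srv/mediawiki/cache/databases.json"   # assumed location

with open(WIKI_LIST) as f:
    wikis = list(json.load(f))                      # dbnames, however the list is keyed

for dbname in wikis:
    result = subprocess.run([
        "php", "/srv/mediawiki/w/maintenance/runJobs.php",
        "--wiki", dbname,
        "--type", "parsoidCachePrewarm",
    ])
    if result.returncode != 0:
        print(f"{dbname}: runJobs exited {result.returncode}, continuing")
```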