[15:05:16] Vulpix: looks like increasing the number of rootjobs leads to increased CPU / latency and also more memory usage (although the latter isn't as much of a problem as the former). Looking at some of the other classes, the db one for instance, it doesn't set a rootjob key in the db. So I think that's how this bug came about.
[15:08:31] paladox: the 28000+ rootjob entries I have in redis don't take more than 10MB of memory
[15:09:35] But yes, having the job queue in redis involves very complex logic that isn't present when using the database for jobs (the default)
[15:10:22] including the requirement of a dedicated jobrunner service to manage the queue itself
[15:11:34] whether the increased complexity solves any problem or achieves something useful, I can't tell you, though :D
[15:14:43] I wouldn't bother unless I had multiple separate jobrunner containers/VMs that would be causing lock contention if a db was used for the job queue
[15:17:38] I'm not worried about lock contention on the database, but disk I/O. My database is on a spinning disk, and anything I can do to reduce disk writes would be an improvement
[17:13:13] In theory, how would you shard MediaWiki databases on large wikis?
[17:15:01] https://www.mediawiki.org/wiki/Manual:Shared_database
[17:16:03] that's shared tables, not sharded tables
[17:17:44] Eytirth: https://www.mediawiki.org/wiki/Manual:External_Storage
[18:06:55] Eytirth: in theory or practice, there is currently no way to shard tables for a single wiki. External Storage can spread out the load of revision content horizontally, but other things have no partitioning function in MediaWiki's core.
[18:08:19] It might not be horribly difficult to add support for sharding by namespace in some places. I don't personally know of any wikis where that would actually help, though.
[18:09:32] A common shard discriminator is "customer" in multi-tenant systems, but that is also a mostly useless distinction in the wikis I interact with
[18:10:43] What we do support is read vs write partitioning. That can be used to add a number of read copies to spread load over more cores and RAM, but all copies are still full copies.
[20:25:27] How do Wikipedia's databases work?
[20:29:05] Vulpix: don't know about this: https://gerrit.wikimedia.org/r/c/mediawiki/core/+/972717
[20:34:27] paladox: have you looked at whether there's any other place where root job cache keys get deleted? I guess it's fine to delete them, but it would be good to see if deletion should take place somewhere else and isn't being done because of some logic error, or because nobody thought about deleting them (which I doubt could have been missed)
[20:34:53] No, looking around I can only see a get/set for the rootjob.
[20:35:06] I don't see anywhere where it tries to delete it
[20:35:49] According to the description at the top of the file, a root job is supposed to be removed when the job is completed.
[20:36:01] the description of ack() is what covers the case when a job is completed
[21:13:50] Ok. It would be interesting to see if such a deletion existed back in the day when redis was used by WMF and was later *deleted* :) it seems like a very obvious omission to go unnoticed for so long
[21:35:51] Vulpix: seems to work now with that patch.
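
For reference on the Redis job queue discussion above (15:09-15:14), this is a minimal sketch of what switching from the default database-backed queue (JobQueueDB) to JobQueueRedis looks like in LocalSettings.php; the server address and TTL value are placeholder assumptions:

// LocalSettings.php -- route all job types through Redis instead of the
// default database-backed JobQueueDB
$wgJobTypeConf['default'] = [
    'class'       => 'JobQueueRedis',
    'redisServer' => '127.0.0.1:6379', // placeholder: local Redis instance
    'redisConfig' => [],
    'claimTTL'    => 3600,             // seconds before an unacknowledged job is retried
    'daemonized'  => true,             // JobQueueRedis only supports daemonized mode
];
// With a daemonized queue, jobs are no longer executed at the end of web
// requests; a separate jobrunner service has to consume the queue.
$wgJobRunRate = 0;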
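
And a sketch of the External Storage setup linked at 17:17, which moves revision text blobs onto a separate database cluster, the one axis of horizontal partitioning that core does support; hostnames and credentials here are placeholders:

// LocalSettings.php -- write new revision text to an external blob cluster
$wgExternalStores = [ 'DB' ];
$wgExternalServers = [
    'cluster1' => [
        [
            'host'     => 'blobs1.example.org', // placeholder host
            'dbname'   => 'blobs1',
            'user'     => 'wikiuser',
            'password' => 'secret',
            'type'     => 'mysql',
            'load'     => 1,
        ],
    ],
];
// New revisions are written to cluster1; existing revisions stay readable
// wherever they were originally stored.
$wgDefaultExternalStore = [ 'DB://cluster1' ];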
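
Likewise, for the read vs write partitioning mentioned at 18:10, a sketch of $wgDBservers with one primary and two read replicas (hosts and credentials again placeholders):

// LocalSettings.php -- the first entry is the primary (all writes go there);
// the 'load' values weight read traffic across the replicas
$wgDBservers = [
    [
        'host'     => 'db-primary.example.org',
        'dbname'   => 'wikidb',
        'user'     => 'wikiuser',
        'password' => 'secret',
        'type'     => 'mysql',
        'flags'    => DBO_DEFAULT,
        'load'     => 0, // load 0: the primary serves reads only as a fallback
    ],
    [
        'host'     => 'db-replica1.example.org',
        'dbname'   => 'wikidb',
        'user'     => 'wikiuser',
        'password' => 'secret',
        'type'     => 'mysql',
        'flags'    => DBO_DEFAULT,
        'load'     => 50,
    ],
    [
        'host'     => 'db-replica2.example.org',
        'dbname'   => 'wikidb',
        'user'     => 'wikiuser',
        'password' => 'secret',
        'type'     => 'mysql',
        'flags'    => DBO_DEFAULT,
        'load'     => 50,
    ],
];
// Each replica is still a full copy of the data; this spreads read load
// but does not shard anything.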