[07:36:54] godog: there's a lot of root cron noise from thanos-fe1001 swift-account-stats failing to auth correctly.
[07:55:35] Emperor: ack, I'll take a look
[10:45:59] And another "installer trashes an hdd, now you must wait 8 hours for it to backfill"
[11:08:33] https://phabricator.wikimedia.org/P27770 awwww
[11:31:03] PROBLEM - Check unit status of swift_ring_manager on ms-fe1009 is CRITICAL: CRITICAL: Status of the systemd unit swift_ring_manager https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[12:16:47] RECOVERY - Check unit status of swift_ring_manager on ms-fe1009 is OK: OK: Status of the systemd unit swift_ring_manager https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[12:37:22] marostegui: hmm?
[12:37:53] just scanning 97M rows, isn't that lovely!
[13:23:43] marostegui: holy moly, what is the query?
[14:24:19] Amir1: qq, do you know: is there a page content size limit in external storage or anywhere?
[14:24:39] not in ES, but inside mediawiki I think it's 2MB
[14:26:02] ottomata: let me grab the exact value, but ES itself should not have any limit; mw tries (emphasis on tries) to impose some limit
[14:30:28] aha, it's $wgMaxArticleSize
[14:32:39] yup, 2MB
[14:33:39] but it's also used for image metadata that has values way over that, last time I checked the biggest one was 17MB
[14:37:53] oh, 2MB is great
[14:38:04] image metadata 17MB?
[14:38:08] obviously other limits come first, but I think technically no more than 4GB can be stored in a longblob
[14:38:09] what is that metadata?
[14:38:24] jynus: 4MB is also great, i'm asking and comparing to our current kafka message size limits
[14:38:27] which are 4MB
[14:38:45] but, iiuc there are pages with content that don't fit in that 4MB limit (maybe this is not true)
[14:38:50] answering the "no limit inside ES"
[14:38:53] (or maybe that is referring to html-rendered content)
[14:38:58] jynus: right, okay
[14:39:06] the MW limit is probably more useful for my purposes anyway
[14:39:19] for parsed content, I would look at memcache/parsercache
[14:39:40] Amir1: is wgMaxArticleSize the same for all wikis? what about wikidata?
[14:40:24] ottomata: That's 4GB :D
[14:40:44] we migrated image metadata for pdf and djvu files to ES
[14:40:50] https://phabricator.wikimedia.org/T275268
[14:40:59] It actually unblocked sqooping of the image table
[14:41:21] T285783
[14:41:22] T285783: Sqoop image metadata - https://phabricator.wikimedia.org/T285783
[14:42:04] let me check the value for all wikis
[14:43:11] parsercache's max storage size is 16MB, if parsed data is wanted
[14:46:05] I can't find any different values in other wikis
[14:46:34] don't see a dedicated max size in wikibase settings either
[14:50:09] jynus: thank you
[14:50:27] Amir1: what was 4GB?!
[14:52:48] the longblob limit jynus was saying
[15:24:39] ah, parser cache you mean?
[15:32:09] parsercache is 16MB it seems. the MySQL limit of longblob (which is what ES uses) is 4GB
[15:32:23] ottomata: but in reality nothing can get this big
[15:34:29] ah okay.
[15:35:02] so the ES storage limit is 4GB, but all other limits are software-enforced
[15:35:28] milimetric was telling me that $wgMaxArticleSize is not always the correct answer; there are cases where this is not true? not sure if I understood that.
[15:35:51] ottomata: are you looking at regular page wikitext, or some other cases?
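A minimal sketch pulling together the size limits quoted so far ($wgMaxArticleSize's ~2MB of wikitext, the 4MB Kafka message cap, the 16MB parser cache row, MySQL LONGBLOB's ~4GB ceiling). The constants simply mirror the values from the conversation, and the helper name `fits_in_kafka_message` plus the envelope-overhead figure are illustrative assumptions, not existing tooling:

```python
#!/usr/bin/env python3
"""Sketch: compare page content sizes against the limits discussed above."""

# Limits as quoted in the conversation, in bytes.
WG_MAX_ARTICLE_SIZE = 2 * 1024 * 1024        # MediaWiki wikitext limit (~2MB)
KAFKA_MAX_MESSAGE_BYTES = 4 * 1024 * 1024    # current Kafka message cap (4MB)
PARSER_CACHE_MAX_BYTES = 16 * 1024 * 1024    # hard DB limit for parsed HTML rows
LONGBLOB_MAX_BYTES = 2**32 - 1               # MySQL LONGBLOB ceiling (~4GB)

# Hypothetical allowance for event envelope/metadata around the raw content.
ENVELOPE_OVERHEAD_BYTES = 64 * 1024


def fits_in_kafka_message(content: bytes) -> bool:
    """Return True if content plus envelope fits in one Kafka message."""
    return len(content) + ENVELOPE_OVERHEAD_BYTES <= KAFKA_MAX_MESSAGE_BYTES


if __name__ == "__main__":
    sample_sizes = {
        "typical article": 50 * 1024,
        "wgMaxArticleSize wikitext": WG_MAX_ARTICLE_SIZE,
        "hypothetical oversize page": 5 * 1024 * 1024,
    }
    for name, size in sample_sizes.items():
        ok = fits_in_kafka_message(b"x" * size)
        print(f"{name}: {size} bytes -> fits in one message: {ok}")
```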
[15:36:09] currently am considering page content for any wiki
[15:36:27] not-rendered, just wikitext or whatever
[15:36:29] as ES is a more or less generic "content" store, like for image metadata
[15:36:48] right, so not necessarily wikitext, but whatever the content is for a page
[15:37:21] yeah, that is the problem - with multi-content, that may be blurred
[15:37:31] and, it sounds like the ES connection is actually irrelevant; we won't be getting this content directly from ES, we'd just get it from mw. we just need to know the max content size mw will ever give us
[15:37:32] even if not in use right now
[15:38:12] the 2MB of wikitext is probably the best answer you will get right now
[15:38:32] but it could change in the future or have edge cases
[15:39:23] as a page could "contain" multiple types of data
[15:39:26] so the mentioned 16MB limit is not for 'page content' (commons?)
[15:39:37] oh, MCR?
[15:40:03] the 16MB is the hardcoded limit for parsed html (it may be smaller, but that is a hard limit on the db)
[15:40:53] oh, parsed html, okay
[15:41:00] I was going to say that data engineering may have more stats, but then I realized...
[15:41:18] oh right, i was wondering about the 17MB
[15:41:19] > but it's also used for image metadata that has values way over that, last time I checked the biggest one was 17MB
[15:41:22] amir mentioned
[15:41:36] heheh yes, joseph was going to look that up :)
[15:41:58] the 17MB is probably the limit of metadata stored on ES - so derived data from files
[15:42:19] but not wiki page content, technically
[15:42:32] ahhh okay, that sounds okay then.
[15:42:47] maybe it will be relevant one day, but probably not for this 'content stream' we are experimenting with
[15:42:54] thank you.
[15:43:14] sorry to not give a definitive answer, more like "here be dragons" ;-)
[15:43:37] the dragons were expected :)
[15:43:41] as in the past we found out "what, we store this kind of content?" too late 0:-)
[15:44:27] when i started at wmf 10 years ago, i remember going through something that a lot of newbies go through: "Wait, I can write a better dumps generator, lemme just....OhHhhhh uhh nevermind see ya later"
[15:49:10] mmmm, you may not like this
[15:49:33] but I got this result: wikidata: COVID-19 pandemic in Colombia (Q87483673) [4,349,121 bytes]
[15:50:32] ah well, that should fit the 4MB limit, so seems ok
[15:58:28] hehe, i think david also mentioned something about a large covid article
[15:58:46] we can increase the kafka limit if needed (currently 4MB), we just need to be aware of how and why we do it
[15:59:19] so if it is for editing messages
[15:59:29] that is probably not going to change soon
[16:00:08] but I cannot forsee how MCR will change what content is
[16:00:14] *foresee
[16:00:20] aye
[16:00:24] yeah, we should investigate that
[16:47:19] !bash ottomata > when i started at wmf 10 years ago, i remember going through something that a lot of newbies go through: "Wait, I can write a better dumps generator, lemme just....OhHhhhh uhh nevermind see ya later"
[16:47:19] Amir1: Stored quip at https://bash.toolforge.org/quip/tZm3qYABa_6PSCT91onq
[16:49:27] ottomata: FWIW, I went through the same phase as well :D
[16:59:12] it's more exciting when a new CTO goes through that phase of assuming that everyone who has been working on the projects for the last 20 years is dumber than they are. ;)
[17:04:56] hah
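A hedged sketch of how one might spot-check a page like the Q87483673 item above against the 4MB Kafka cap, assuming the public MediaWiki Action API and the `requests` library; the constant and the helper name `revision_size` are illustrative, not part of any existing tooling:

```python
#!/usr/bin/env python3
"""Sketch: check a page's latest revision size against the 4MB message cap."""
import requests

KAFKA_MAX_MESSAGE_BYTES = 4 * 1024 * 1024  # current limit quoted above


def revision_size(api_url: str, title: str) -> int:
    """Return the byte size of the latest revision of `title`."""
    params = {
        "action": "query",
        "prop": "revisions",
        "rvprop": "size",
        "titles": title,
        "format": "json",
        "formatversion": "2",
    }
    resp = requests.get(api_url, params=params, timeout=30)
    resp.raise_for_status()
    page = resp.json()["query"]["pages"][0]
    return page["revisions"][0]["size"]


if __name__ == "__main__":
    size = revision_size("https://www.wikidata.org/w/api.php", "Q87483673")
    print(f"Q87483673: {size} bytes; "
          f"fits in one Kafka message: {size <= KAFKA_MAX_MESSAGE_BYTES}")
```

If content did start exceeding the cap, the knobs to revisit would be the usual Kafka size settings on the broker, topic, producer, and consumer sides, which is why the conversation notes that any increase needs to be deliberate.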