[03:34:51] TimStarling: I'm looking into our MimeAnalyzer, in context of T291750. My understanding is that it is entirely intended for our built-in detection to override that of php-finfo given known bugs there in the past. It seems for example that php's effective mime handling (deferred to OS or not?) of zip/office files was worse than ours (possibly still is worse) and so it made sense for us to override that. What I'm less clear on is whether this still makes sense in 2021, and whether we have a way for site admins to (temporarily) override this. The code comments suggest that "guessCallback" (hook onMimeMagicGuessFromContent) is where an extension can handle the case where "core is wrong about a type (false positive)".
[03:34:51] T291750: Docx files created using LibreOffice are incorrectly detected as zip files - https://phabricator.wikimedia.org/T291750
[03:35:05] But.. it seems in practice this hook is not reached; most of our code paths return early
[03:35:37] so afaik one could not use onMimeMagicGuessFromContent to, for example, swap $mime of application/zip to bool(false) in order to let it fall back to mime_content_type in the next step.
[03:35:53] or even call mime_content_type directly etc.
[03:36:15] for ZIP it's not really a matter of finding out what type a file is, since a file can have multiple valid types
[03:36:40] I also could not find an earlier hook or config var through which one could e.g. force certain file extensions or guessed mimes to false and/or otherwise fully handle them through a hook by calling mime_content_type directly in a local settings hook.
[03:38:02] I think our handling of zip files is likely to be better than any built in thing -- most file type detection methods are trying to identify the type of a file, not scan it for security issues
[03:39:25] that phab link for me says "Error: 502, Next Hop Connection Failed at 2021-10-06 03:38:36 GMT", phab seems to be down
[03:40:04] up for me. might be specific to your POP?
[03:40:28] most likely specific to eqsin
[03:40:42] loaded now (but slow)
[03:43:13] I might write a high-level description for the MimeAnalyzer class capturing some of this, e.g. explaining its primary and secondary objectives and such.
[03:43:16] detectZipType() is not my code, so I can't really speak for it
[03:43:50] this should use ZipDirectoryReader, which is my code
[03:46:20] assuming Content_Types.xml is supposed to be a file in the archive, ZipDirectoryReader is definitely the right way to find it, it shouldn't just be running regexes over unparsed data
[03:47:17] I thought this shared code with MSCompoundFileReader but it doesn't.
[03:49:47] ok, so microsoft's docx and "open office doc" are actually comparable. I didn't realize they used the open format in later office versions.
[03:50:48] I knew both used docx file extension but had never thought about them at the same time.
[04:06:07] ah, not exactly, despite containing the words "open" and "office" in "Open Office XML" this is not the "Open Office" format but the "open" "Office XML" format.
[04:06:30] open/libreoffice uses .odt
[04:14:19] Pchelolo: *waves* after this, what's next? https://gerrit.wikimedia.org/r/c/mediawiki/core/+/725441/ It looks like the thumb size, that can be post cache like redner?
[04:14:22] *render
[04:15:21] Amir1: thumb size had a lot of comments not to remove it T284920
[04:15:22] T284920: Remove "thumb size" preference - https://phabricator.wikimedia.org/T284920
[04:15:48] and before we expand that parameter list in ParserOutput::getText we probably need a better plan
[04:15:49] Pchelolo: yeah but we can pull off another action=render with it
[04:16:21] I really want to have a 'fake' parser option
[04:16:55] like, you have a parser option that's not applied during parse, but 'wrapped' into a returned ParserOutput and applied when 'getText' is called
[04:17:26] but that's very theoretical now.
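The 'fake' parser option idea from 04:16:55 could look roughly like the sketch below. It is written in Python only for illustration and is not MediaWiki code: the `ParserOutput` class, `parse()`, the `thumb_size` option name, the `%THUMB_WIDTH%` placeholder, and the default of 220 are all hypothetical. The point it tries to show is that the parse result stays identical for every user (so it can be cached once), while the option is applied only when the text is requested.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Tuple

# A transform takes the cached text and an option value and returns new text.
Transform = Callable[[str, Any], str]

THUMB_PLACEHOLDER = "%THUMB_WIDTH%"  # hypothetical marker left in cached HTML


@dataclass
class ParserOutput:
    """Hypothetical stand-in: cached text plus options deferred to output time."""
    raw_text: str
    # option name -> (default value, transform applied in get_text())
    deferred: Dict[str, Tuple[Any, Transform]] = field(default_factory=dict)

    def get_text(self, **options: Any) -> str:
        text = self.raw_text
        for name, (default, transform) in self.deferred.items():
            # Use the caller's value if given, otherwise the stored default.
            text = transform(text, options.get(name, default))
        return text


def parse(image_name: str) -> ParserOutput:
    """Produce output that does not depend on any per-user preference."""
    html = f'<img src="{image_name}" width="{THUMB_PLACEHOLDER}">'
    return ParserOutput(
        raw_text=html,
        deferred={
            "thumb_size": (
                220,
                lambda text, width: text.replace(THUMB_PLACEHOLDER, str(width)),
            )
        },
    )


cached = parse("Example.jpg")           # one cache entry for all users
print(cached.get_text())                # default thumb width applied late
print(cached.get_text(thumb_size=300))  # per-user value applied late
```

Under these assumptions the cache key never needs to include the preference, which is the property the discussion is after; whether a placeholder rewrite is cheap enough at output time is a separate question.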
[04:18:30] yeah
[04:18:41] let me know if you need help on moving that forward
[04:18:46] do we have numbers on how common parser-related preferences are?
[04:18:56] my guess is the most common is user language.
[04:19:33] as someone browsing with en-gb on most wikis, I generally always get a parser cache miss
[04:21:17] are we worried about storage, attack/security or perf?
[04:21:36] hmm, now thinking, this only matters for multilingual wikis like meta, commons, wikidata. We probably should be able to safely delete user from parsercache
[04:21:42] *lang
[04:22:02] lang is not default in key
[04:22:18] but the most popular templates all incorporate int-lang hacks somewhere in them
[04:22:23] with removals of properties we just removed 2 that needed a ton of code and tied things together and were almost never used. That was mostly for core cleanup
[04:22:29] Krinkle: very good point, my plan is to do an analysis once the action=render stuff is properly cleaned up
[04:22:45] ack, for things we can remove entirely, that's fine.
[04:23:11] but for things we'd re-implement in some other way, it might not be worth it right now depending on what we're trying to improve.
[04:23:22] Pchelolo: is there a ticket for the overarching work?
[04:23:49] not really. T54807 I guess
[04:23:50] T54807: Identify and remove legacy preferences from MediaWiki core (tracking) - https://phabricator.wikimedia.org/T54807
[04:24:11] I saw it but I wonder how that ties into performance :D
[04:24:31] https://www.mediawiki.org/wiki/User:SKim_(WMF)/Performance_Dependent_User_Preferences
[04:24:52] again, we removed stubthreshold and numbered headings mostly for code cleanup, not performance of parser cache size
[04:25:04] I'm only reading this page now fwiw
[04:25:47] seems potentially disconnected, but anyway, if the work is done now I guess there's no point discussing it now :)
[04:25:54] hehe
[04:26:35] The thing is that at least https://gerrit.wikimedia.org/r/c/mediawiki/core/+/725441/ doesn't remove much, I hope I didn't miss something
[04:26:51] or it's that after deleting the three preferences we can remove a lot
[04:27:20] stub threshold was my idea, headings was all Daniel
[04:30:20] Pchelolo: speaking of ideas, do you want to take a look at the jobqueue patch 🥺
[04:32:14] Amir1: ok. I'll remove my -1, but it's a bit too late to hit the +2 button. I guess I can do it tomorrow morning of fresh hea
[04:32:16] head
[04:32:27] sure!
[04:34:11] Thanks
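Note on the 03:46:20 remark about finding the content-types file through the ZIP directory rather than with regexes over unparsed data: a minimal Python sketch of the difference follows. It is not the MimeAnalyzer or ZipDirectoryReader code; `guess_zip_subtype`, `make_zip`, and the returned labels are hypothetical. It assumes the OOXML convention that a package carries a member named `[Content_Types].xml`, and uses Python's `zipfile` to read the archive's member list.

```python
import io
import zipfile
from typing import Dict

CONTENT_TYPES_MEMBER = "[Content_Types].xml"


def guess_zip_subtype(data: bytes) -> str:
    """Classify by the archive's member names, not by scanning raw bytes."""
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            names = set(archive.namelist())
    except zipfile.BadZipFile:
        return "not-a-zip"
    if CONTENT_TYPES_MEMBER in names:
        return "opc-package"  # hypothetical label for an OOXML-style package
    return "plain-zip"


def make_zip(members: Dict[str, bytes]) -> bytes:
    """Build an in-memory, uncompressed ZIP from name -> content pairs."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_STORED) as archive:
        for name, content in members.items():
            archive.writestr(name, content)
    return buffer.getvalue()


# A plain ZIP whose *content* merely mentions the marker file name.
decoy = make_zip({"notes.txt": b"see [Content_Types].xml"})
# A package that really has the member in its directory.
package = make_zip({CONTENT_TYPES_MEMBER: b"<Types/>", "word/document.xml": b"<doc/>"})

print(CONTENT_TYPES_MEMBER.encode() in decoy)  # True: a byte scan would match
print(guess_zip_subtype(decoy))                # plain-zip
print(guess_zip_subtype(package))              # opc-package
```

The decoy case is the reason a directory-based lookup is preferable: a raw byte or regex scan matches the stored text inside `notes.txt`, while the member list does not contain the file.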