[09:05:22] !log admin [codfw1dev] root@cloudcontrol2004-dev:~# wmcs-makedomain --project testlabs --domain testlabs.codfw1dev.wmcloud.org --orig-project cloudinfra-codfw1dev (T391325)
[09:05:29] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[09:05:29] T391325: openstack: improve networktests for newer network setup - https://phabricator.wikimedia.org/T391325
[10:26:32] !log wikisource rebooted wsexport-prod02 as an attempted quick fix for T391563
[10:26:35] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Wikisource/SAL
[10:26:36] T391563: Error 500 when attempting to download any document - https://phabricator.wikimedia.org/T391563
[10:28:42] * TheresNoTime prefers when 'turning it off and on again' Just Works(tm) :((
[11:06:46] !log wikisource cleared `/ws-export/var/file-cache` for T391563
[11:06:49] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Wikisource/SAL
[11:06:49] T391563: Error 500 when attempting to download any document - https://phabricator.wikimedia.org/T391563
[12:53:37] !log lucaswerkmeister@tools-bastion-13 tools.ranker deployed 0f074e3dd6 (l10n updates: es)
[12:53:39] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.ranker/SAL
[15:39:34] Hi, so, my question is, is there any CDN thing or any way to load Wikimedia Commons content faster? It takes about 2-3 seconds or more to load a thumbnail version (maximum dimension 512px), which, in my opinion, is not great performance. And hosting the images myself is not an option as of now.
[15:40:16] main thing is to make sure you're using a pre-generated thumbnail size
[15:58:46] @nokibsarkar: This might be a different context, but https://www.mediawiki.org/wiki/Extension:QuickInstantCommons has some benefits.
[16:00:35] So, I have to install it? I was hoping for an API or something. For more context, I am using golang.
[16:04:17] @nokibsarkar: sounds like a different context. QuickInstantCommons is for a local MediaWiki install's access to Commons
[16:05:21] I am wondering, is there any shortcut way to generate a thumbnail without the restrictions put on the API?
[16:16:12] @nokibsarkar: maybe you can share more about what you are doing now and folks can help think about alternatives? Links to source code if you have them might be helpful.
[16:18:21] So, the thing is, I have a jury tool which loads images for the jury to evaluate. I present each image from a pre-saved thumbnail URL in my database that I fetched from Wikimedia Commons during the import. And the thumbnails are not being very responsive here.
[16:19:44] Well, if you're requesting sizes that aren't already generated, and therefore cached...
[16:20:03] the size is the main helper to look for there I think. if there is already a generated thumb to serve from disk (swift) then it should be faster than making a new thumb via Thumbor.
[16:23:13] https://noc.wikimedia.org/wiki.php#wgThumbLimits are the sizes we pre-generate.
[16:25:03] or even from frontend caches too
[16:25:11] depends if you're loading well used photos
[16:25:47] a jury review tool is probably not hitting a lot of hot cached images
[16:42:46] ```
[16:42:46] wgThumbLimits: [ 120, 150, 180, 200, 220, 250, 300, 400 ]```
[16:43:01] Do these represent the height or width?
[16:45:23] @bd808
[16:46:19] @nokibsarkar: https://www.mediawiki.org/wiki/Manual:$wgThumbLimits is not explicit, but if I'm remembering correctly it is the max of the width or height depending on the aspect ratio.
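The advice above boils down to requesting one of the pre-generated widths from wgThumbLimits rather than an arbitrary size like 512px. As a minimal sketch in Go (the language the jury tool uses), assuming only the sizes listed above, a hypothetical helper could snap a desired width to the closest pre-generated one:

```go
package main

import "fmt"

// Pre-generated thumbnail sizes from wgThumbLimits (per noc.wikimedia.org);
// requesting one of these is more likely to hit an already-rendered thumbnail.
var thumbLimits = []int{120, 150, 180, 200, 220, 250, 300, 400}

// nearestThumbLimit is a hypothetical helper (not from the discussion) that
// picks the pre-generated size closest to the size the tool actually wants.
func nearestThumbLimit(want int) int {
	best := thumbLimits[0]
	for _, w := range thumbLimits {
		if abs(w-want) < abs(best-want) {
			best = w
		}
	}
	return best
}

func abs(n int) int {
	if n < 0 {
		return -n
	}
	return n
}

func main() {
	fmt.Println(nearestThumbLimit(512)) // prints 400
}
```

For the 512px request mentioned earlier this would pick 400, which has a better chance of already existing in swift or the CDN cache than a one-off 512px rendering.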
[16:48:09] So, essentially, max dimension? Right? Also, can I put these numbers into the MediaWiki API imageinfo height and width and get a URL that would hit a hot cache?
[16:48:09] Maybe it's width though? It would be the equivalent of `[[File:example.jpg|thumb|50px]]` and I think that the `50px` there is width
[16:50:04] The numbers are the numbers in URLs like https://upload.wikimedia.org/wikipedia/mediawiki/thumb/a/a9/Example.jpg/250px-Example.jpg, which again I am now thinking are likely widths.
[16:51:34] any large gallery is likely to feel slow because images that are not served from the CDN edge cache ("hot" images) are going to be rate limited even if they are being served from swift storage.
[16:52:23] Popular articles end up with images in the CDN edge cache (ram + disk) which makes them serve up faster
[16:53:07] Your gallery tool is competing with all the bots trying to download commons to feed to an LLM :/
[17:05:59] ``````
[17:14:02] Use the most common thumbnail size (to increase the chance of it already being there) and do background loading (re @nokibsarkar: Hi, so, my question is, is there any CDN thing or any way to load wikimedia commons content faster? It takes about 2-3 seconds o...)
[17:14:51] ```
[17:14:52] x-cache: cp5032 hit, cp5032 hit/2
[17:14:54] x-cache-status: hit-front
[17:14:55] ```
[17:14:57] Does this header mean anything when I try to fetch an image?
[17:15:15] I also recall running a robot to force the generation of the thumbnails
[17:15:51] @nokibsarkar: yes. the "cp5032 hit" and "x-cache-status: hit-front" parts mean that request was served from the CDN edge cache
[17:16:39] meaning that one was as fast as things are likely to get
[19:17:08] [3edee91d-8b44-409f-8d5c-b2b8f22e7580] Caught exception of type Wikimedia\RequestTimeout\RequestTimeoutException
[19:17:20] what does it mean?
[19:37:47] @nokibsarkar: "The maximum execution time of 60 seconds was exceeded" is the plain text message for that stacktrace in our logs. The stacktrace shows that the parser was in the middle of handling magic words in some wikitext when the timeout hit.
[19:39:11] I am simply using gcmlimit=max and imageinfo (without the expensive thumb part). My limit is 5000
[19:41:03] Your request wants data that is extracted from templates in the pages. Asking for 5000 pages to be parsed likely took too long. Calling the same query now returns almost immediately for me, but that would probably be because the parser cache is now warm for those titles.
[19:45:21] If it was a timeout, how could that query be cached? (I am assuming failure on timeout)
[19:46:16] the query is not cached, but the expensive task of parsing the wikitext to extract the iiprop=extmetadata contents could be
[19:46:56] My query is
[19:46:57] params := url.Values{
[19:46:58]     "action": {"query"}, "format": {"json"},
[19:47:00]     "prop": {"imageinfo"}, "generator": {"categorymembers"},
[19:47:01]     "gcmtitle": {category}, "gcmtype": {"file"},
[19:47:02] 5000 at a time is likely to hit timeouts. paging over the cursor in smaller chunks would be advisable
[19:47:03]     "iiprop": {"timestamp|user|url|size|mediatype|dimensions|extmetadata|canonicaltitle"}, "gcmlimit": {"max"},
[19:47:04]     // "iiurlwidth": {"640"}, // "iiurlheight": {"480"},
[19:47:06]     "iiextmetadatafilter": {"License|ImageDescription|Credit|Artist|LicenseShortName|UsageTerms|AttributionRequired|Copyrighted"},
[19:47:06] }
[19:47:55] so, the bottleneck is `iiprop=extmetadata`? is it on the fly?
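The x-cache and x-cache-status headers discussed above are an easy way to see whether a stored thumbnail URL is being served from the CDN edge cache. A minimal Go sketch under the assumption that a HEAD request returns the same cache headers as a GET, using the Example.jpg thumbnail URL from the conversation (any upload.wikimedia.org thumb URL would behave the same way):

```go
package main

import (
	"fmt"
	"net/http"
)

func main() {
	// Thumbnail URL taken from the discussion above; substitute any stored
	// thumb URL from the jury tool's database.
	url := "https://upload.wikimedia.org/wikipedia/mediawiki/thumb/a/a9/Example.jpg/250px-Example.jpg"

	resp, err := http.Head(url)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	// "hit-front" means the response came straight from the CDN edge cache,
	// which is about as fast as a Commons request is likely to get.
	fmt.Println("x-cache:        ", resp.Header.Get("X-Cache"))
	fmt.Println("x-cache-status: ", resp.Header.Get("X-Cache-Status"))
}
```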
[19:49:32] Yes, all of the extmetadata would be scraped from the wikitext of the File: page when you ask for it
[19:50:05] there are some layers of cache that would hold on to the answers for a bit, but not too long actually
[19:50:12] Let me try without the extmetadata
[19:50:29] less than 5000 per iterator chucnk will help too
[19:50:33] *chunk
[19:51:47] Now it seems faster
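Putting the advice from the discussion together (page in chunks smaller than 5000 and drop iiprop=extmetadata, which forces the File: page wikitext to be parsed), a hedged Go sketch of what the paged query might look like. pageCategory is a hypothetical helper name and the 500-item chunk size is an assumption, not a figure from the conversation; the continuation handling follows the standard MediaWiki API continue protocol:

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/url"
)

// pageCategory walks a Commons category in smaller chunks, skipping the
// expensive extmetadata property and following API continuation tokens.
func pageCategory(category string) error {
	api := "https://commons.wikimedia.org/w/api.php"
	params := url.Values{
		"action":    {"query"},
		"format":    {"json"},
		"prop":      {"imageinfo"},
		"generator": {"categorymembers"},
		"gcmtitle":  {category},
		"gcmtype":   {"file"},
		// Smaller chunks are less likely to hit the 60 s request timeout.
		"gcmlimit": {"500"},
		// extmetadata dropped: it requires parsing the File: page wikitext.
		"iiprop": {"timestamp|user|url|size|mediatype|canonicaltitle"},
	}

	for {
		resp, err := http.Get(api + "?" + params.Encode())
		if err != nil {
			return err
		}
		var result struct {
			Continue map[string]string `json:"continue"`
			Query    struct {
				Pages map[string]json.RawMessage `json:"pages"`
			} `json:"query"`
		}
		err = json.NewDecoder(resp.Body).Decode(&result)
		resp.Body.Close()
		if err != nil {
			return err
		}
		fmt.Printf("got %d pages in this chunk\n", len(result.Query.Pages))

		if len(result.Continue) == 0 {
			return nil // no more results
		}
		// Feed every continuation token back into the next request.
		for k, v := range result.Continue {
			params.Set(k, v)
		}
	}
}

func main() {
	if err := pageCategory("Category:Example images"); err != nil {
		panic(err)
	}
}
```

The loop simply copies every key from the API's continue object into the next request, which is the generic way to resume a generator-based query; extmetadata could still be fetched later for just the images a jury actually opens.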