[00:53:16] [[Tech]]; 2A02:C7F:5082:D300:EDB6:7046:A570:1983; [none]; https://meta.wikimedia.org/w/index.php?diff=22147785&oldid=22138111&rcid=20113951
[00:54:09] [[Tech]]; Defender; Reverted changes by [[Special:Contributions/2A02:C7F:5082:D300:EDB6:7046:A570:1983|2A02:C7F:5082:D300:EDB6:7046:A570:1983]] ([[User talk:2A02:C7F:5082:D300:EDB6:7046:A570:1983|talk]]) to last version by 1234qwer1234qwer4; https://meta.wikimedia.org/w/index.php?diff=22147786&oldid=22147785&rcid=20113952
[00:57:08] [[Tech]]; Mutante; /* Directory Structure */; https://meta.wikimedia.org/w/index.php?diff=22147790&oldid=22147786&rcid=20113970
[03:29:39] Hi, there.
[03:29:40] I am trying to upload a 100+ MiB pdf to Commons with mwclient, chunk by chunk. Everything works well until the final chunk, where the request always times out or returns a 504 error. My script: https://pb.nichi.co/opinion-announce-injury ; the file: https://we.tl/t-fJI9BW7C6U ; error: https://pb.nichi.co/acid-fix-action , https://pb.nichi.co/hurry-cry-exile .
[03:34:56] In the past, I've found increasing the chunk size makes chunked upload more reliable
[03:38:51] I looked through the source code of mwclient and found its default chunk size is 1 MiB. So I tried increasing it to 16 MiB / 32 MiB / 64 MiB. But no luck.
[03:39:45] The script works well for the first few chunks, but it always fails at the final chunk.
[03:42:33] I did try the script in different network environments (my desktop PC with IPv4 egress in Hong Kong and IPv6 egress in mainland China; my VPS in LAX). It just fails consistently.
[03:46:18] last chunk is when MW recombines everything, so if it's failing at that point, i would guess (but can't totally rule it out) that it's not your network's fault
[03:46:27] chunked upload is really sketchy
[03:46:45] if you absolutely can't get it to work, you can ask for a server-side upload at phabricator
[04:05:53] Hmmmmmm
[04:08:30] Thanks for the suggestion. But I have a huge number of files to upload, and I have not organized them properly yet, so I have to manually monitor the uploader as it runs and tune those files (filenames / metadata / generated wikitext).
[04:11:05] I found some discussions about file size limits and chunked upload around MediaWiki / Commons dating back to ~2010. It's quite a surprise to me that chunked uploads are still not fully stable today.
[04:11:06] How big is the text layer of the pdf? I think i remember an issue from the distant past where chunked uploads weren't playing well with files that had really huge text layers (like many megabytes)
[04:11:32] I don't think they've changed much since 2010
[04:11:43] they've never really worked very well
[04:15:12] I just checked a few of those pdfs. There seems to be no text at all in them. They are scanned copies of ancient Chinese books without OCR.
[04:18:20] there's still basically no one working on that sort of thing right now
[04:21:07] It sounds strange and somewhat unreasonable. 🤦
[04:21:21] * bawolff is thankful he's no longer working on that sort of thing. It was frustrating
[04:22:11] I think some work has been done on it since that point so it might not be as bad now, but back in the day I somewhat had the opinion we should throw it out and start again.
[04:23:23] Then how do people actually upload big files to Wikipedia / Commons nowadays? My file is just around 130 MiB, which is not that big. I think there are a lot of bots on Commons that upload files of this size every day.
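A minimal sketch of the chunked-upload flow being discussed, written against the MediaWiki action API with requests rather than mwclient. The endpoint, chunk size, helper names, and the already-authenticated session are illustrative assumptions, not taken from the linked script; the point is that the final publish request (step 2) is where MediaWiki reassembles the chunks, which is the step that was returning 504s above.

# Sketch of the chunked-upload flow discussed above, using requests directly
# instead of mwclient. Assumes `session` is already logged in (e.g. via a bot
# password) -- authentication is not shown.
import os
import requests

API = "https://commons.wikimedia.org/w/api.php"   # target wiki API endpoint
CHUNK_SIZE = 16 * 1024 * 1024                     # 16 MiB, one of the sizes tried above


def csrf_token(session: requests.Session) -> str:
    r = session.get(API, params={
        "action": "query", "meta": "tokens", "type": "csrf", "format": "json",
    })
    return r.json()["query"]["tokens"]["csrftoken"]


def chunked_upload(session: requests.Session, path: str, filename: str,
                   text: str, comment: str) -> dict:
    token = csrf_token(session)
    size = os.path.getsize(path)
    offset = 0
    filekey = None

    # 1) Stash the file chunk by chunk. Each request returns a filekey that
    #    identifies the partial upload on the server.
    with open(path, "rb") as f:
        while offset < size:
            chunk = f.read(CHUNK_SIZE)
            data = {
                "action": "upload", "format": "json", "stash": 1,
                "filename": filename, "filesize": size, "offset": offset,
                "token": token,
            }
            if filekey:
                data["filekey"] = filekey
            r = session.post(API, data=data,
                             files={"chunk": (filename, chunk, "application/octet-stream")})
            result = r.json()["upload"]
            filekey = result["filekey"]
            offset += len(chunk)

    # 2) Publish the stashed file. This is the request during which MediaWiki
    #    recombines all chunks, i.e. the step that was timing out with a 504.
    r = session.post(API, data={
        "action": "upload", "format": "json",
        "filename": filename, "filekey": filekey,
        "comment": comment, "text": text,
        "ignorewarnings": 1, "token": token,
    })
    return r.json()

Where the wiki has asynchronous uploads enabled, adding async=1 to that final request and then polling with action=upload&checkstatus=1&filekey=... is one documented way to avoid holding the connection open while reassembly runs; whether that actually helps on Commons depends on the server-side configuration.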
[04:24:55] a lot of it is done by upload-by-url, which has its own instabilities
[04:28:50] I also tried that, but it requires domains to be allowlisted.
[04:30:02] Is it possible to ask for my own domain to be added to the allowlist so that I can transfer files from my VPS?
[04:32:31] probably, i don't think we have any specific requirements. File a phab task and see
[04:37:58] https://phabricator.wikimedia.org/source/mediawiki-config/browse/master/wmf-config/InitialiseSettings.php$15672 is the specific configuration
[17:04:07] "maybe the page creation log..." <- Perfect, thanks!
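For comparison, a rough sketch of the upload-by-url route mentioned at 04:24:55, reusing API, session and csrf_token() from the sketch above. The allowlist being discussed is presumably the copy-upload domain list in the InitialiseSettings.php link, and the account also needs the upload_by_url right; both are assumptions about the configuration, not something confirmed in this log.

# Upload-by-url: the server fetches the file itself from source_url, so the
# source domain must be on the wiki's copy-upload allowlist.
def upload_by_url(session, source_url: str, filename: str,
                  text: str, comment: str) -> dict:
    token = csrf_token(session)
    r = session.post(API, data={
        "action": "upload", "format": "json",
        "filename": filename, "url": source_url,
        "comment": comment, "text": text,
        "ignorewarnings": 1, "token": token,
    })
    return r.json()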