[11:18:05] [telegram] oh, maybe I was doing that. Either way it works now
[11:29:49] [telegram] https://wiki.openstreetmap.org/wiki/File:%C3%96lbohrplattform.jpg Is there some way to list all such files? https://wiki.openstreetmap.org/w/index.php?title=Special:ListDuplicatedFiles&limit=500&offset=0 is limited to local ones: https://tools-static.wmflabs.org/bridgebot/29cbfaa0/file_10177.jpg
[11:42:59] [telegram] Hello everyone
[11:45:59] [telegram] I guess you can use https://wiki.openstreetmap.org/wiki/Special:ApiSandbox#action=query&format=json&prop=duplicatefiles&generator=allimages&dflimit=max&gaiprop=&gailimit=500 and follow continuation until you’ve gone through all files (the API includes non-local duplicates by default) (re @matkoniecz: )
[11:46:10] [telegram] it’ll take a while but I’m not sure there’s a better way
[11:46:24] [telegram] (link uses gailimit=500, with gailimit=max – 5000 – I already got a timeout)
[13:20:17] [telegram] https://wiki.openstreetmap.org/w/api.php?action=query&format=json&prop=duplicatefiles&generator=allimages&dflimit=50&gaiprop=&gailimit=50 shows for example https://wiki.openstreetmap.org/wiki/File:002.png which is not duplicated on Commons (re @lucaswerkmeister: I guess you can use generator=allimages + prop=duplicatefiles and follow continuation until you’ve gone through all files (the A...)
[13:29:50] [telegram] Even accessing https://wiki.openstreetmap.org/wiki/Special:FileDuplicateSearch/%C3%96lbohrplattform.jpg via the API would be nice
[13:32:45] [telegram] https://www.mediawiki.org/w/api.php?action=help&modules=query%2Bquerypage seems to indicate that it is impossible
[13:33:32] [telegram] I think you're getting all images, and including the duplicatefiles prop if there is one, but the ones without it aren't excluded. So you need to filter out only the ones with that prop (I could be wrong though; on mobile now, so it's hard to check properly) (re @matkoniecz: https://wiki.openstreetmap.org/w/api.php?action=query&format=json&prop=duplicatefiles&generator=allimages&dflimit=50&gaiprop=&ga...)
[13:36:41] [telegram] yes, you'll have to check if the files actually have duplicates in the result or not
[13:38:52] [telegram] A bit silly to iterate through every single image instead of getting just a list of duplicates, but better than doing it manually.
[13:39:02] [telegram] It's https://wiki.openstreetmap.org/w/index.php?title=Special:ListDuplicatedFiles&limit=100&offset=0 right? (re @matkoniecz: A bit silly to iterate through every single image instead of getting just a list of duplicates, but better than doing it manually.)
[13:39:47] [telegram] "Only local files are considered." for a start (re @MaartenDammers: It's https://wiki.openstreetmap.org/w/index.php?title=Special:ListDuplicatedFiles&limit=100&offset=0 right?)
[13:40:23] [telegram] the query with the prop found for example https://wiki.openstreetmap.org/wiki/File:Military-loading-class.jpg
[13:41:04] [telegram] 50,000 files isn't too bad for a one-time run. You can just retrieve the hash and do a lookup on Commons
[13:41:41] [telegram] Pretty sure we wrote some code to do that when we moved files from Wikipedia to Commons
[13:43:42] [telegram] Let's see, I found https://www.mediawiki.org/wiki/Manual:Pywikibot/nowcommons.py . But that's for after finding the dupes, to replace usage and delete the local ones
[13:44:57] [telegram] The tricky part is that people can upload new ones. But a one-time run and handling what was found would at least allow me to avoid asking people where the answer is trivial. (re @MaartenDammers: 50,000 files isn't too bad for a one-time run. You can just retrieve the hash and do a lookup on Commons)
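A minimal sketch of the hash-lookup idea floated at 13:41:04, using the standard MediaWiki allimages module (the exact endpoints are spelled out a bit further down in the conversation): pull the SHA-1 of every local file, then ask Commons whether a file with that hash exists. The User-Agent contact string and the one-second pause are illustrative assumptions, not tested code.

    import time
    import requests

    OSM_API = "https://wiki.openstreetmap.org/w/api.php"
    COMMONS_API = "https://commons.wikimedia.org/w/api.php"
    # Placeholder User-Agent; put real contact info here.
    HEADERS = {"User-Agent": "osm-wiki-duplicate-check/0.1 (contact: you@example.org)"}

    def local_files_with_hashes():
        """Yield (name, sha1) for every file on the OSM wiki, following continuation."""
        params = {"action": "query", "format": "json",
                  "list": "allimages", "aiprop": "sha1", "ailimit": "500"}
        while True:
            data = requests.get(OSM_API, params=params, headers=HEADERS).json()
            for image in data["query"]["allimages"]:
                yield image["name"], image["sha1"]
            if "continue" not in data:
                break
            params.update(data["continue"])

    def exists_on_commons(sha1):
        """Return True if Commons already has a file with this hash."""
        params = {"action": "query", "format": "json",
                  "list": "allimages", "aisha1": sha1}
        data = requests.get(COMMONS_API, params=params, headers=HEADERS).json()
        return bool(data["query"]["allimages"])

    for name, sha1 in local_files_with_hashes():
        if exists_on_commons(sha1):
            print(name, "has a Commons duplicate")
        time.sleep(1)  # don't hammer either API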
[13:45:50] [telegram] I used https://github.com/multichill/toollabs/blob/master/bot/tag_nowcommons.py back in the day, but I had database access, so retrieving the hash of all files was a cheap operation (re @matkoniecz: The tricky part is that people can upload new ones. But one time run and handling what was found would at least allow me to avoi...)
[13:46:59] [telegram] And we still had cross-database joins for Commons. Good times.....
[13:47:46] [telegram] Sadly, I have no direct database access. Well, I will just make 1000 calls or something (re @MaartenDammers: I used https://github.com/multichill/toollabs/blob/master/bot/tag_nowcommons.py back in the day, but I had database access so re...)
[13:52:52] [telegram] Getting all images with the hash is a relatively cheap operation AFAIK. https://wiki.openstreetmap.org/w/api.php?action=help&recursivesubmodules=1#query+allimages . You need the sha1 aiprop, so something like https://wiki.openstreetmap.org/w/api.php?action=query&list=allimages&aifrom=B&aiprop=sha1 .
[13:52:52] [telegram] On Commons you can look up the file like https://commons.wikimedia.org/w/api.php?action=query&list=allimages&aisha1=e568559a00eea95232624a466586aaddc28f0a9c
[13:53:22] [telegram] Don't go too fast and please set a proper user-agent
[14:43:11] [telegram] in the end the following seems to work fine:
[14:43:12] [telegram] continue_index = ""
[14:43:13] [telegram] while True:
[14:43:15] [telegram]     url = "https://wiki.openstreetmap.org/w/api.php?action=query&format=json&prop=duplicatefiles&generator=allimages&dflimit=50&gaiprop=&gailimit=50&gaicontinue=" + continue_index
[14:43:16] [telegram]     response = requests.post(url)
[14:43:18] [telegram]     data = response.json()
[14:43:19] [telegram]     pages = data["query"]["pages"]
[14:43:21] [telegram]     keys = pages.keys()
[14:43:22] [telegram]     for key in keys:
[14:43:24] [telegram]         if "duplicatefiles" in pages[key]:
[14:43:25] [telegram]             page_title = pages[key]["title"]
[14:43:27] [telegram]             # PROCESSSSSSSSSSSSS
[14:43:28] [telegram]     if "continue" in data:
[14:43:30] [telegram]         continue_index = data["continue"]["gaicontinue"]
[14:43:31] [telegram]     else:
[14:43:33] [telegram]         break  # crawled through all
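For anyone reusing that loop, a lightly tidied sketch follows. It adds the missing import, sets a User-Agent as requested at 13:53:22 (the contact string is a placeholder), lets requests build and URL-encode the query string, and carries the whole continuation block rather than only gaicontinue. The processing step is left as a stub.

    import requests

    API = "https://wiki.openstreetmap.org/w/api.php"
    # Placeholder User-Agent; put real contact info here.
    HEADERS = {"User-Agent": "osm-wiki-duplicate-check/0.1 (contact: you@example.org)"}

    params = {
        "action": "query", "format": "json",
        "generator": "allimages", "gailimit": "50",
        "prop": "duplicatefiles", "dflimit": "50",
    }
    while True:
        data = requests.get(API, params=params, headers=HEADERS).json()
        for page in data["query"]["pages"].values():
            # Pages without the duplicatefiles prop have no known duplicates.
            if "duplicatefiles" in page:
                page_title = page["title"]
                # process page_title here
        if "continue" not in data:
            break  # crawled through all images
        params.update(data["continue"])  # carries gaicontinue (and dfcontinue, if any)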