[03:35:16] hello, I would like to ask a question regarding: Wikimedia Enterprise HTML Dumps
[03:36:09] I want to extract images out of the HTML dump, and check their license
[03:36:23] The dump contains the following information: https://dumps.wikimedia.org/other/enterprise_html/
[03:37:10] I am extracting the images from the article_body html, where the specification is given by: https://www.mediawiki.org/wiki/Specs/HTML/2.4.0
[03:38:07] Here, they discuss how the images can be obtained but not how to obtain the license type of the image: https://www.mediawiki.org/wiki/Specs/HTML/2.4.0#Images
[03:38:31] Please can someone tell me the right way to obtain the license type information for images?
[04:15:03] developerPerson: you're going to have to make a separate API request, let me see
[04:15:27] developerPerson: https://www.mediawiki.org/wiki/Extension:CommonsMetadata#Usage
[04:16:08] Thank you for responding, I appreciate it
[04:16:11] but if I did this for every image in the dump, I would be spamming the API, right?
[04:21:01] yeah, but if you follow https://www.mediawiki.org/wiki/API:Etiquette you'll be fine
[04:21:31] you should be able to batch requests and get multiple images at once
[04:21:43] off the top of my head I can't think of a faster/more efficient way of doing it
[04:21:57] (maybe someone else in here will speak up)
[04:24:31] For a normal dump, copyright information is contained in the image description page: https://dumps.wikimedia.org/legal.html
[04:25:58] I am experimenting with the data, and will likely need to recompute the dataset over and over again, so it would be preferable if I did not have to send API requests
[04:28:38] oh yeah, you could just parse it out of the dump
[04:28:51] you'd need to look at the Wikipedia and Commons though, since images can come from both
[04:32:40] But is there a dump for commons?
[04:33:06] The reason why I was using the enterprise dump in the first place is because I wanted the html dump for Wikinews
[04:33:12] and the normal dump did not provide it
[04:33:41] I don't see an enterprise dump
[04:33:46] You can ask for it on Phabricator
[04:34:24] ah, someone already has: https://phabricator.wikimedia.org/T300907
[04:35:42] I see, so it has been requested but not completed?
[04:37:21] yep
[04:38:58] Cool that was really informative, thank you as always
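
The batched API approach suggested in the log could look roughly like the Python sketch below. It is only an illustration under several assumptions that are not stated in the conversation: that each `<img>` in `article_body` carries a `resource="./File:..."` attribute as described in the linked HTML spec, that the Commons API endpoint is the right one to query (locally uploaded files would need the wiki's own `api.php`), and that `LicenseShortName` is the `extmetadata` field of interest. All function names here are hypothetical helpers, not part of any Wikimedia library.

```python
import json
import urllib.parse
import urllib.request
from html.parser import HTMLParser

# Assumption: query Commons; swap in the local wiki's api.php for local files.
API_URL = "https://commons.wikimedia.org/w/api.php"
BATCH_SIZE = 50  # the API accepts up to 50 titles per request for normal users


class ImageTitleParser(HTMLParser):
    """Collects file page titles from <img resource="./File:..."> tags."""

    def __init__(self):
        super().__init__()
        self.titles = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        resource = dict(attrs).get("resource") or ""
        if resource.startswith("./"):
            # Decode percent-escapes and use spaces, as in page titles.
            title = urllib.parse.unquote(resource[2:]).replace("_", " ")
            if title not in self.titles:
                self.titles.append(title)


def extract_image_titles(article_html):
    """Return the unique file titles referenced in one article_body string."""
    parser = ImageTitleParser()
    parser.feed(article_html)
    return parser.titles


def build_query_url(titles):
    """Build one imageinfo/extmetadata request URL for a batch of titles."""
    params = {
        "action": "query",
        "format": "json",
        "prop": "imageinfo",
        "iiprop": "extmetadata",
        "titles": "|".join(titles),
    }
    return API_URL + "?" + urllib.parse.urlencode(params)


def parse_licenses(response):
    """Map page title -> LicenseShortName value (None if absent)."""
    licenses = {}
    for page in response.get("query", {}).get("pages", {}).values():
        info = page.get("imageinfo")
        if not info:
            continue  # missing file, or hosted on a different repository
        meta = info[0].get("extmetadata", {})
        licenses[page["title"]] = meta.get("LicenseShortName", {}).get("value")
    return licenses


def fetch_licenses(titles, user_agent):
    """Query the API serially, one batch of titles at a time."""
    result = {}
    for start in range(0, len(titles), BATCH_SIZE):
        url = build_query_url(titles[start:start + BATCH_SIZE])
        request = urllib.request.Request(url, headers={"User-Agent": user_agent})
        with urllib.request.urlopen(request) as resp:
            result.update(parse_licenses(json.load(resp)))
    return result
```

In line with API:Etiquette, the sketch sends requests one after another and expects a descriptive `user_agent` string from the caller. Since the dataset is going to be recomputed repeatedly, caching the returned title-to-license mapping on disk would avoid asking the API for the same files twice.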