[11:48:46] hello. when I download a page export from Special:Export, the XML contains two sha1 fields per revision, one in an sha1 element and one in an sha1 attribute, with apparently the same value. how is this value encoded? it seems to contain letters and digits only, but it's neither hexadecimal nor either of the base32 alphabets that I tried.
[11:49:31] I'm asking because I'm getting an sha1 checksum for revisions of a page where the content is hidden, and I'm trying to see if that's leaking hidden information
[11:50:58] I looked in https://www.mediawiki.org/wiki/Help:Export and https://www.mediawiki.org/wiki/Manual:Parameters_to_Special:Export but they don't seem to answer my question
[11:51:52] hi. why is content hidden?
[11:54:16] a wiki admin hid some revisions deliberately, and (as I don't have privileges) I can't retrieve the contents of those revisions with either the normal html API or api.php?action=query&prop=revisions , but both the normal html API and api.php show the date and user for those revisions, and the latter shows the revision ids
[11:54:45] so they are like semi-deleted revisions, which I hadn't even known until today existed on mediawiki
[11:55:23] here's a specific api query showing a hidden and a non-hidden revision https://esolangs.org/w/api.php?action=query&prop=revisions&revids=149894|150530&rvslots=*&rvprop=ids|flags|timestamp|user|userid|sha1|contentmodel|tags|roles|content
[11:56:23] ok. i know what you mean. i deleted page content before like that. i never knew they had a hash.
[11:56:44] I don't know if they're really hashes for the content, I'm still trying to figure out the format
[11:56:48] can you ask an admin to email the deleted revision to you.
[11:57:01] I could, but I can look at non-hidden revisions first
[11:57:23] as i think the software makes it hidden properly. this hiding in 80% of cases is because of leaked personal info.
[11:57:42] so they are not quite meant to be available to users who are not sysops.
[11:58:00] how do you even view that hash again?
[11:58:07] with Special:Export
[11:59:25] in this case it's not for leaked personal info, but because a bot is trying to query a diff between every pair of revisions of a page, so like a quadratic number of diffs, and this is a page with a particularly long history, and the wiki admins are trying to figure out some workaround for the large amount of bot traffic
[11:59:55] I'll check what characters the encoded checksums can contain to see if I can figure out the format
[12:07:02] i got it. i will look a bit later as it's nearly midnight here now
[12:07:18] thank you
[12:07:30] i think you can hide bot revision
[12:07:42] they're just not shown in rc
[12:08:10] what does "in rc" mean?
[12:08:25] wait, sorry, i misread
[12:09:04] why is bot querying diff for all page revisions?
[12:09:57] we don't know, it's not a bot we control, it's suspected to be some kind of generic spider
[12:10:02] it sounds partially like system engineering. is it cpu load? bandwidth? what is being maxed out?
[12:10:23] you can just block bot traffic in .htaccess, theoretically
[12:10:24] probably downloads them because there's a form with option buttons on the history page
[12:10:48] I don't remember what resource it's consuming too much of
[12:11:02] it would be interesting to know
[12:11:36] you could setup wwbserver to disallow bots to query page history. they can have current page and that's it
[12:11:42] webserver
[12:12:46] gry: if you want to know more, you can ask the admins on either #esolangs on this irc network, or on https://esolangs.org/wiki/User_talk:Ais523
[12:19:06] ok
[12:34:35] ok, so the hidden revisions don't seem to have the value in the sha1 element, but they have it in the sha1 attribute
[12:55:55] the encoded checksums in the export have an inventory of 36 characters: all ascii digits and lowercase letters
[14:05:06] so it turns out that the checksums in the export are encoded as big-endian base 36 numbers, with "a" to "z" meaning the digits 10 to 35, and this matches the big-endian hex checksums in api.php?action=query&prop=revisions&rvprop=ids|sha1
[14:15:04] very creative.
[14:15:06] hmm
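
A minimal sketch of the conversion described in the 14:05:06 message, assuming the export value is simply the SHA-1 digest written as a base-36 number and that api.php returns the same digest as 40 hex digits. The function names and the padding width of 31 characters (the fewest base-36 digits that can hold 160 bits) are illustrative assumptions, not something stated in the log.

```python
import hashlib

# Digit alphabet as described in the log: "0"-"9", then "a"-"z" for 10-35.
BASE36_DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"


def base36_to_hex(value: str, width: int = 40) -> str:
    """Turn an export-style base-36 checksum into a zero-padded hex string."""
    # int() with base 36 reads the most significant digit first (big-endian).
    return format(int(value, 36), "x").zfill(width)


def hex_to_base36(hex_digest: str, width: int = 31) -> str:
    """Turn a hex digest into a base-36 string, left-padded with zeros.

    The default width of 31 is an assumption made for this sketch.
    """
    number = int(hex_digest, 16)
    out = ""
    while number:
        number, remainder = divmod(number, 36)
        out = BASE36_DIGITS[remainder] + out
    return out.rjust(width, "0")


if __name__ == "__main__":
    # Hypothetical input; not a revision from the wiki discussed above.
    digest = hashlib.sha1(b"example revision text").hexdigest()
    encoded = hex_to_base36(digest)
    print(encoded)
    print(base36_to_hex(encoded) == digest)
```

To compare a value from Special:Export with the one from api.php, it should be enough to check whether `base36_to_hex(export_value) == api_value`, provided the assumptions above hold.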