[19:52:49] @isaacj one thing I forgot to ask about... Wikidata! Is your diff-tagging project going to work for Wikidata items at some point as well? We have even less well-structured data about what human-meaningful changes occur in a given diff, and as I understand it, diff parsing is pretty much the only option to get anything better.
[19:54:56] ragesoss: i had not thought much about wikidata because i thought the edit summaries gave a fairly good account of what each edit did that could be relatively easily parsed -- e.g., `Added [fr] description: ...`. what did you have in mind?
[19:55:57] there are big gaps, in particular for multi-content revisions. you can't accurately tell how many references were added by parsing edit summaries, among other things.
[19:56:26] eg, https://www.wikidata.org/w/index.php?title=Q111269579&diff=1596238100&oldid=1596236983
[19:56:54] the edit summary says 'created claim', but that's just a fraction of what happened in that rev.
[19:59:30] I've chatted with Wikidata folks about this briefly, and from what I understand, comparing the JSON representations of subsequent revisions would be the only decent way to get good data.
[20:00:17] (The Dashboard currently uses edit summaries as the basis for the dedicated Wikidata stats we provide, eg, https://dashboard.wikiedu.org/courses/Wiki_Education/Wikidata_Institute-January_2022_(Spring_2022)
[20:00:45] but this approach definitely misses a fair amount)
[20:01:48] it'll be a fun library for someone to write at some point.
[20:01:50] :D
[20:34:58] @ragesoss oh thanks for these details -- i haven't worked much with wikidata edit history so didn't realize how much nested content could sit within a change like that. because it's JSON, that should greatly simplify doing the description... (maybe i'll end up eating those words) really just a matter of deciding the vocabulary and what to look for. i don't have plans myself but maybe a good hackathon project or if you end up having a student who wants to work on it, i'd be happy to help out. fyi ^^ to @dsaez too who i know has worked more with the edit summaries on wikidata
[20:36:11] yeah, could make a great internship project for GSoC or Outreachy.
[20:36:30] of course, if I mentor it, it'll be a Ruby library. :)
[20:48:44] * bd808 sends ragesoss a book on Python ;)
[21:07:45] ooof yeah, then i'd be helping out thinking-wise but no use when it comes to code review. bd808 captures where i'm at :)
[21:35:56] I mean, I know Python well enough to write a library like that. But then I'd have to live with myself for putting more Python into the world when it could have been Ruby.
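
A minimal Python sketch of the idea raised at 19:59:30: compare the JSON of two revisions of the same item and count the changes an edit summary can hide, such as references that arrive together with a new claim. Everything named here is hypothetical (`summarize_entity_diff`, the helper functions, the output keys); it is not the Dashboard's method or an existing library. It assumes both revisions are already loaded as dicts in the standard Wikibase entity JSON layout, where `labels` and `descriptions` are keyed by language code, `claims` is keyed by property ID, each statement carries an `id`, and each entry of a statement's optional `references` list carries a `hash`.

```python
from collections import Counter


def _mapping(entity, field):
    # Assumption: an empty map may be serialized as [] rather than {},
    # so any falsy value is treated as an empty dict.
    return entity.get(field) or {}


def _statements(entity):
    """Map statement GUID -> statement for every claim in an entity dict."""
    return {
        statement["id"]: statement
        for group in _mapping(entity, "claims").values()
        for statement in group
    }


def _reference_hashes(statement):
    """Set of reference hashes attached to one statement (may be empty)."""
    return {ref["hash"] for ref in statement.get("references", [])}


def summarize_entity_diff(old, new):
    """Count human-meaningful changes between two revisions of one item."""
    counts = Counter()

    # Terms: compare per language code.
    for field in ("labels", "descriptions"):
        before, after = _mapping(old, field), _mapping(new, field)
        for lang in before.keys() | after.keys():
            if lang not in before:
                counts[f"{field}_added"] += 1
            elif lang not in after:
                counts[f"{field}_removed"] += 1
            elif before[lang]["value"] != after[lang]["value"]:
                counts[f"{field}_changed"] += 1

    # Statements: match by GUID so reordering does not register as a change.
    old_st, new_st = _statements(old), _statements(new)
    counts["claims_added"] += len(new_st.keys() - old_st.keys())
    counts["claims_removed"] += len(old_st.keys() - new_st.keys())

    # References: compare hash sets per statement. References on a brand-new
    # claim count as added, which is what a bare 'created claim' summary hides.
    for guid, statement in new_st.items():
        prior = _reference_hashes(old_st.get(guid, {}))
        counts["references_added"] += len(_reference_hashes(statement) - prior)
    for guid, statement in old_st.items():
        later = _reference_hashes(new_st.get(guid, {}))
        counts["references_removed"] += len(_reference_hashes(statement) - later)

    # Drop zero counts so the summary lists only what actually happened.
    return {key: value for key, value in counts.items() if value}


# Made-up revisions for illustration; the GUID and hashes are placeholders.
old_rev = {
    "labels": {"en": {"language": "en", "value": "Example item"}},
    "descriptions": [],
    "claims": {},
}
new_rev = {
    "labels": {"en": {"language": "en", "value": "Example item"}},
    "descriptions": {"fr": {"language": "fr", "value": "exemple"}},
    "claims": {
        "P31": [
            {
                "id": "Q1$placeholder-guid",
                "references": [{"hash": "aaa"}, {"hash": "bbb"}],
            }
        ]
    },
}

summary = summarize_entity_diff(old_rev, new_rev)
print(summary)
```

Matching statements by GUID and references by hash keeps the comparison cheap, at the cost of ignoring edits to a claim's value, rank, or qualifiers; a fuller vocabulary would need to diff `mainsnak` and `qualifiers` as well.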