[02:23:45] so did you know one of your en.wikipedia admins is a russian agent abusing admin buttons?
[11:41:03] https://mastodon.technology/@magnusmanske/109069636136476536 "What I don't get is why they can't provide a simple compatibility view that thousands of volunteers can point their tools to, instead of having to rewrite every query..."
[11:45:26] I mean, it seems like it would be reasonable to have some sort of "stable" version of the toolserver replicas where best effort is made to have views that give a stable schema
[14:31:56] We have dumps and HTTP APIs for "(close to) no maintenance" use cases.
[14:32:50] I'm not sure we can have a schema that is as powerful as what raw database use cases need that can still remain stable. We do make logical changes over time that require thinking about the data differently eventually, or where it'd be significantly better performing for complex tools that use it if they use the same schema as prod.
[14:33:06] We also do have a built-in migration period of usually several months with core doing write-both.
[14:34:35] I'm sure there's plenty of cases where it doesn't make sense, but given we're already doing complicated things with views, it doesn't seem like it's that much more effort
[14:34:55] * bawolff says that, but also is not volunteering ;)
[15:42:32] I think a templatelinks_compat view would probably be accepted as a patch; I imagine WMCS would expect a timeline to go along with that, though; and it's not clear to me that there are tools other than Magnus' still unmigrated. I don't think we can solve the "few people maintain many of the tools" by having WMCS advertising/maintaining DB views indefinitely. And a larger 'new stable' schema seems too costly for such a relatively small issue.
It'd be cheaper presumably to help migrate the handful of affected tools instead, at which point we're back to documenting migration patterns and setting a deadline, which is effectively what we did already with the prod compat period.
[15:42:49] maybe it should have taken longer, I don't know what the timeline was or how well communicated it was.
[15:43:22] One other thing that was proposed during a chat in Berlin recently was that we could offer a PHP library that makes querying things a bit easier.
[15:43:32] E.g. a "stable" interface in the form of PHP accessor classes
[15:43:50] toolforge does tend to be a place where people make one off tools. Actually maintained tools like Magnus's stuff sometimes seems to be a bit in the minority
[15:44:21] to make a tool one-off and virtually stable for long periods of time, you'd need to base it on MW API requests only or dump files.
[15:44:44] A bunch of my tools work like that now, I've reduced a lot my reliance on replica DBs directly.
[15:45:10] I've also moved a number of them to be gadgets instead and use the Special:Blankpage/xxx pattern.
[15:45:53] but this is somewhat incompatible with our direction to tighten timeouts and plug DOS vectors :D
[15:46:29] so yeah, a PHP library might be a good middle ground to reduce distributed effort of re-inventing queries and joins etc.
[15:52:40] the dumps and the MediaWiki API work well, I used them a few times, but in addition to those there's also Special:Export which seems decently stable
[15:54:56] the problem is that the dumps can be hard to use on large wikis like commons if you need ones that are huge
[15:57:23] but often you can use some of the non-huge ones
[16:01:00] eg.
the complete dump of commons pages with history is over 50 gigabytes, but often you only need the latest versions
[16:01:42] and it gets worse because these days a lot of the information is just transcribed from wikidata, so you may need a wikidata dump too, and those are HUGE
[16:04:05] and I still don't understand how the djvu pages work on wikisources
[16:05:22] the Page pages
[16:05:30] pages in the Page namespace
[16:09:10] O_o don't use special:export programmatically please
[16:09:50] «I don't think we can solve the "few people maintain many of the tools" by having WMCS advertising/maintaining DB views indefinitely» <-- I guess the question would be whether the migration is a one-off cost or not
[16:10:44] Some queries of old just became impossible, afaik
[16:11:42] Nemo_bis: If you mean pre 1.5 schema old, yeah that's pretty hard, but i don't think we've ever had a schema modification as extensive as the 1.5 one was
[16:13:17] I mean queries which join a lot of data from different tables and wikis
[16:16:15] lol, oh, not the literal "old" table
[16:16:49] I think the having all dbs in one server thing was planned to go away at some point (if it hasn't already)
[16:17:15] Which i mostly don't care about, except for centralauth and globalimagelinks
[16:18:15] it has
[19:33:17] except for analytics/research db where afaik we still do all wikis on one
[19:38:01] The wiki replica databases are a horrible hack that should never have happened IMO. They are a workaround against providing a more complete and useful API or a designed OLAP schema for answering analytic questions.
[19:39:04] There have been high level discussions about starting work on an OLAP schema, but so far they have not reached the level of having any official funding/staffing
[19:44:30] "Should never have happened" but the alternative was to reject Sun's donation in 2005 ;)
[19:56:09] that $10K of hardware or whatever it was has a much more expensive legacy
[21:38:17] I think that T215858 is the most recent discussion ticket about an OLAP replacement for the replicas.
[21:38:18] T215858: Plan a replacement for wiki replicas that is better suited to typical OLAP use cases than the MediaWiki OLTP schema - https://phabricator.wikimedia.org/T215858
[21:42:00] is there any (public) documentation about that donation from Sun? It sounds like there were strings attached?
[21:42:24] (entirely from a personal curiosity standpoint, nothing more)
[21:47:42] Depends what sort of docs you want ;)
[21:47:44] https://www.mediawiki.org/wiki/Toolserver:Servers
[21:47:51] https://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2012-10-01/Technology_report
[21:48:44] not all of the un hardware was from external grants as things like https://meta.wikimedia.org/wiki/Grants:PEG/WM_DE/Improve_toolserver_reliability show
[21:48:52] *Sun hardware
[21:49:54] * bd808 sees that grant is in the signpost article too
[21:54:31] Oh, right!
[21:54:56] I think I got the wrong end of the stick entirely there, assuming that the donation was production servers and had some schema design strings attached
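
Editor's note on the templatelinks_compat idea raised at 15:42:32: the following is only a rough sketch of what such a compatibility view might look like, not anything WMCS has published. It assumes the change under discussion is the templatelinks normalization, where the target title moves into a separate linktarget table; the column names (tl_target_id, lt_id, lt_namespace, lt_title) are the editor's recollection of that schema and should be checked against the MediaWiki manual. SQLite stands in for the MariaDB replicas so the example runs anywhere, and the helper pages_using_template is hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for a wiki replica (assumption:
# the real replicas are MariaDB, but the view definition is plain SQL).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Assumed normalized schema: titles live in linktarget,
    -- templatelinks only stores a numeric reference to them.
    CREATE TABLE linktarget (
        lt_id INTEGER PRIMARY KEY,
        lt_namespace INTEGER NOT NULL,
        lt_title TEXT NOT NULL
    );
    CREATE TABLE templatelinks (
        tl_from INTEGER NOT NULL,
        tl_target_id INTEGER NOT NULL,
        tl_from_namespace INTEGER NOT NULL
    );
    -- Hypothetical compatibility view exposing the old column names
    -- (tl_namespace, tl_title) so that old queries keep working.
    CREATE VIEW templatelinks_compat AS
    SELECT tl.tl_from,
           tl.tl_from_namespace,
           lt.lt_namespace AS tl_namespace,
           lt.lt_title AS tl_title
    FROM templatelinks AS tl
    JOIN linktarget AS lt ON lt.lt_id = tl.tl_target_id;
    """
)

# Made-up sample rows, for illustration only.
conn.executemany(
    "INSERT INTO linktarget VALUES (?, ?, ?)",
    [(1, 10, "Infobox"), (2, 10, "Citation_needed")],
)
conn.executemany(
    "INSERT INTO templatelinks VALUES (?, ?, ?)",
    [(100, 1, 0), (100, 2, 0), (200, 1, 0)],
)


def pages_using_template(db, title, namespace=10):
    """Old-style query, written against the old column names via the view."""
    rows = db.execute(
        "SELECT tl_from FROM templatelinks_compat "
        "WHERE tl_namespace = ? AND tl_title = ? ORDER BY tl_from",
        (namespace, title),
    )
    return [row[0] for row in rows]


print(pages_using_template(conn, "Infobox"))
```

A tool written against the old columns would only need to swap the table name for the view name. The trade-off discussed in the log still applies: someone has to maintain the view, and it hides the join cost from the tool author.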