[03:14:47] anyone here know about the sites table?
[03:16:05] why is there a separate table on every wiki? why is it not shared?
[03:20:15] why are there 1053 rows in enwiki.sites when there are only 1044 wikis in all.dblist?
[03:28:47] what is an equivalent identifier? what is it equivalent to?
[03:33:40] "Beware: The script sometimes fails with a duplicate key conflict. In that case, go to the wiki's master database and empty the sites and site_identifiers tables, then run the script again."
[03:33:46] such a great system
[03:46:32] so it purports to store interwiki prefixes for sites, but it avoids using any code or data from the interwiki system
[03:47:33] consistency between the two systems being ensured by a combination of never changing anything and crossing fingers
[10:31:48] TimStarling: it's got something to do with Wikibase, I believe. It was added during early Wikidata development.
[10:32:34] I don't recall what they ran into, but I'm guessing something about interwikis and/or wgConf wasn't enough
[14:25:38] Yeah, it's Wikidata/Wikibase related
[15:38:55] Krinkle: we have been debugging a CentralAuth issue with urbanecm where the CentralAuthUser object holds outdated data immediately after account creation (more specifically, on the GET request that follows after the signup form submit results in a redirect).
[15:40:13] We are trying to figure out whether it's replag or a WAN cache issue. Do you know what consistency guarantees we have for the WAN cache? If the POST request deletes a key and then responds with a redirect, is it guaranteed that the subsequent GET request won't see the old, invalidated value?
[15:41:01] (In this case the two requests went to the same DC and we still saw old data; but in general, that cannot be assumed.)
[16:22:01] (the relevant task is T380500)
[16:22:02] T380500: CentralAuthUser returning outdated data after user creation - https://phabricator.wikimedia.org/T380500
[16:29:05] tgr|away: the WAN cache is expected to be in line with a replica database, meaning it is subject to lag that is generally imperceptible but can in edge cases be noticed. For example, purges/deletes/touches are broadcast across DCs and may take e.g. 100ms, or, in the case of a down memc server, a gutter pool may be active for ~5 seconds. Any lag beyond 5 seconds results in the db replica being downed/depooled. We don't have a depool trigger in memc (because lag is unknowable there), but generally speaking we're structured such that things cannot be stale for more than 5 seconds (e.g. gutters clear data older than 5 seconds, and tombstones last the same amount as well).
[16:30:15] there is in practice really no such thing as "immediate" at scale, insofar as requests can overlap: request 1 may set something in memc while request 2, which started before request 1 finished, keeps running for several seconds afterwards and continues to act on the old information.
[16:30:32] In relation to this task:
[16:31:14] * POST requests, and any request for login.wikimedia.org (incl. GET), are always routed to the primary DC. https://gerrit.wikimedia.org/g/operations/puppet/+/741f356e6294467c88c183bb01c3339e0abb27e3/modules/profile/files/trafficserver/multi-dc.lua#145
[16:31:41] This means that any delay in WANCache broadcasting purges to the secondary DC should not be a factor here.
[16:31:53] You can think of any CentralAuth request as being effectively a POST request.
[16:32:08] If the second request is not routed to the primary DC, however, then cross-DC delay is of course a possibility.
[16:33:07] * ChronologyProtector should make it so that you always see your own writes. We set a cookie after a POST request that writes DB data, which 1) pins you to the same DC for the next few seconds, and 2) stores some data in local memcached that informs Rdbms to pick, or wait for, a db replica that has caught up to your own previous writes.
[16:33:59] * If the write and the read are on different domains, then we rely on the redirect carrying a cpPosIndex query parameter instead of a cookie. This is enforced by MediaWikiEntryPoint.php based on OutputPage redirects.
[16:36:02] If I understand correctly, T380500 is about a same-domain scenario (i.e. you locally created the account and ended that interaction on the local domain; even if sso/loginwiki is involved, it ends on the local domain), with the next request not seeing it. That should in theory not be possible as far as multi-DC and Rdbms are concerned, short of fringe events where writes got lost or replication lag was unusually high.
[16:36:02] T380500: CentralAuthUser returning outdated data after user creation - https://phabricator.wikimedia.org/T380500
[16:36:50] So yeah, it might be that the WAN cache is used incorrectly.
[16:37:18] It can't "lag" in a local-DC scenario, since each key is only stored on one memc host, and that's where you'll have deleted the data.
[16:37:57] but it might be that the purge is not done correctly, or that some other request has re-populated the data after it was deleted without holdoff (which is on by default).
[16:39:23] onTransactionPreCommitOrIdle runs before the commit, not after. Why do you think that might be too late?
[16:40:15] In any event, I would do it without such a callback. Deletes include a 5 second hold-off by default, and since it is a cache, it is presumably safe to delete even in the hypothetical case where the request has its db changes rolled back due to some kind of fatal exception.
[16:40:29] no need to have it be conditional on precommit (i.e. on the writes being successful).
[16:41:07] holdoff in this context means that we don't actually delete, but set an "empty" value that rejects any writes for a few seconds (a forced cache miss) so that no other requests store potentially lagged data there.
[16:41:21] not something you have to think about normally, but FYI :)
[16:57:38] I've summarised the above on-task.
[17:10:04] thx!
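
To make the purge/read pattern being debugged above concrete, here is a minimal sketch using MediaWiki's WANObjectCache. It assumes the main WAN cache from MediaWikiServices; the key name and the loadUserRowSomehow() loader are hypothetical stand-ins, not CentralAuth's actual code.

<?php
// Minimal sketch of the read/invalidate pattern discussed above, assuming the
// data is cached via MediaWiki's WANObjectCache. The key name and
// loadUserRowSomehow() are hypothetical; makeGlobalKey(), getWithSetCallback()
// and delete() are real WANObjectCache methods.
use MediaWiki\MediaWikiServices;

$cache = MediaWikiServices::getInstance()->getMainWANObjectCache();
$key = $cache->makeGlobalKey( 'centralauth-user', $username ); // hypothetical key

// Read path (e.g. the GET after the signup redirect): populate from a replica.
$data = $cache->getWithSetCallback(
	$key,
	$cache::TTL_DAY,
	static function ( $oldValue, &$ttl, array &$setOpts ) use ( $username ) {
		// Passing replica lag info via $setOpts lets WANObjectCache decide
		// whether a freshly loaded value is safe to store at all.
		return loadUserRowSomehow( $username, $setOpts ); // hypothetical loader
	}
);

// Write path (the POST that creates the account): after the primary DB write,
// purge the key. In the same DC this takes effect immediately; cross-DC, the
// purge is broadcast and may take ~100ms (or ~5s with a gutter pool active).
$cache->delete( $key );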
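
And a rough sketch of the two purge timings discussed at 16:39-16:40, assuming $dbw is the primary Rdbms connection and $cache/$key are as in the previous sketch. onTransactionPreCommitOrIdle() and delete() are the real methods under discussion, but this is not the actual CentralAuth code.

<?php
// Variant questioned at 16:39: defer the purge via a callback that runs just
// before the transaction commits (and only if it is going to commit).
$dbw->onTransactionPreCommitOrIdle(
	static function () use ( $cache, $key ) {
		$cache->delete( $key );
	},
	__METHOD__
);

// Variant recommended at 16:40: purge unconditionally, with the default
// hold-off. Deleting a cache entry is safe even if the DB writes are later
// rolled back, and the hold-off (a forced cache miss for a few seconds)
// stops concurrent requests from re-populating stale, replica-lagged data.
$cache->delete( $key );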