[16:18:33] tgr|away: could we merge https://gerrit.wikimedia.org/r/c/mediawiki/extensions/CentralAuth/+/968382 (the monitoring patch) today for the train this week? or do you have different plans?
[18:56:46] perhaps someone knows, does anything different happen with load balancing / db selection when using shell.php w/FauxRequest vs a web request? I'm looking into an issue where WikiPage::isRedirect has the previous revision's value during the PageSaveComplete hook. It's looking like a timing problem in primary/replica, but curiously the error does not occur when using shell.php
[19:57:18] ebernhardson: in terms of selection logic, no, I don't think so. Not between e.g. an index.php web request and a plain invocation of a maintenance script. But many code paths hit inside api.php do pass a query group of "api" to getConnection(), and so deal with a different (overlapping) set of replicas. And some maintenance scripts, such as dumps on snapshot hosts, specify --dbgroupdefault for a similar effect that overrides all connection attempts in that process.
[19:57:55] ebernhardson: more likely, I think, is a difference between primary and replica (e.g. POST requests tend to involve primary dbs), or a difference between having a session snapshot vs not.
[19:58:50] so during a POST request you may be able to see uncommitted data, or if you have a very long-running maintenance script you may be seeing fairly stale replica data if you've had the same mysql replica connection open for a while (repeatable-read sessions).
[20:15:10] Another possibility may be that there is LinkCache at play for one of them. We should be clearing that before deferred updates run (I believe this hook is deferred, right?), but it seems probable that a bug might cause isRedirect() to return a stale value if it was populated before the write happened early on in the request.
[20:20:39] Krinkle: thanks!
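The query-group selection Krinkle describes can be sketched roughly like this (a minimal illustration of MediaWiki's Rdbms load-balancer API; the surrounding wiring is assumed, not taken from the conversation):

```php
<?php
// Sketch: how a query group influences replica selection.
use MediaWiki\MediaWikiServices;

$lb = MediaWikiServices::getInstance()->getDBLoadBalancer();

// Default: any general-purpose replica for this wiki.
$dbr = $lb->getConnection( DB_REPLICA );

// With the "api" query group: prefer replicas configured for that group,
// which is why api.php traffic can land on a different (overlapping)
// set of hosts than index.php or a plain maintenance script.
$dbrApi = $lb->getConnection( DB_REPLICA, [ 'api' ] );
```

A maintenance script run with --dbgroupdefault=dump has a similar effect, except the group is applied to every connection attempt in the process rather than per call.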
I think the stale value is two-part: one is that WikiPage::getRedirectTarget always queries DB_REPLICA, so in the process that just ran WikiPage::updateRedirectOn to insert into the primary, that's probably going to get the stale value
[20:21:37] in my testing I can get the correct value by tweaking a mwdebug host to always use DB_PRIMARY in getRedirectTarget, along with forcing the wiki page to load from primary. Currently evaluating how that should work
[20:24:12] ebernhardson: hm.. well, generally if something deals with modified data it should either not use general access layers, or ensure the layers in question support a "for-write" or "latest" flag to be passed down to skip caches and replicas, or if the component doesn't/can't support that, to not react until later (i.e. job queue).
[20:24:26] WikiPage and Title both support a flag to indicate where you want to load data from
[20:24:40] it does look suspicious that this method bypasses that flag
[20:25:09] yea, i'm thinking maybe getRedirectTarget needs to take into account mDataLoadedFrom
[20:25:17] * - "fromdb" or WikiPage::READ_NORMAL to get from a replica DB.
[20:25:17] * - "fromdbmaster" or WikiPage::READ_LATEST to get from the primary DB.
[20:25:17] * - "forupdate" or WikiPage::READ_LOCKING to get from the primary DB using SELECT FOR UPDATE
[20:25:21] yeah, precisely
[20:25:51] Can you confirm that your caller is otherwise dealing with a fromdbmaster or forupdate flag?
[20:26:34] Krinkle: it is not, that's the other half of the hacked-up fix: having the hook handler call $page->loadPageData( 'fromdbmaster' )
[20:26:48] which also seems not ideal, particularly since this should be the WikiPage object that was used to perform the edit
[20:27:40] I'm not sure we support changing it after the fact; loadPageData may no-op if called directly. Usually you'd get a new instance for your purpose.
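The "hacked-up fix" half of the discussion above would look roughly like this as a PageSaveComplete handler. This is only an illustrative sketch of the workaround being debated (the class name is hypothetical, and the conversation itself notes loadPageData may no-op when re-called), not a recommended final fix:

```php
<?php
// Hypothetical handler class; sketch of the workaround under discussion.
class HypotheticalHooks {
	public static function onPageSaveComplete(
		$wikiPage, $user, $summary, $flags, $revisionRecord, $editResult
	) {
		// Force a re-read of page state from the primary DB (equivalent to
		// the 'fromdbmaster' string flag) so isRedirect() reflects the edit
		// that just happened rather than a lagged replica.
		$wikiPage->loadPageData( WikiPage::READ_LATEST );
		$isRedirect = $wikiPage->isRedirect();
		// ... emit the event (e.g. via EventBus) using $isRedirect ...
	}
}
```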
[20:27:54] maybe updateRedirectOn, or updateRevisionOn, should set mRedirectTitle instead of letting it round-trip the database in this case, although it looks like Tim did that intentionally and it would have to recreate some validation that exists elsewhere
[20:28:32] but yeah, if it's passed in for post-edit hooks, it's fair for that to perhaps be such an object already, or to (wrongly) deduce that it doesn't matter if instance state is updated already, which yeah, it isn't for this piece of information
[20:31:01] any suggestions for how to replicate this kind of thing locally? It's quite tedious to debug from the mwdebug hosts. Something that makes it easy to have a lagged local replica?
[20:31:33] wgDebugToolbar distinguishes between replica and master queries afaik, even if they are internally the same host locally.
[20:31:41] same for wgDebugLogFile
[20:32:14] the information doesn't actually have to be lagged, right? We can expect that if, during an edit and this hook invocation, the data was queried at all, and queried from not-master, it's wrong.
[20:32:56] curiously I don't get the wrong value in my local dev or in shell.php in prod, only from real requests (via curl or whatever)
[20:33:39] specifically i'm looking at events that EventBus emits after edits
[20:33:52] when you say shell.php in prod, you mean producing an edit via shell.php and then calling $wikiPage from a hook?
[20:34:09] I imagine the time between pressing return on a copy-pasted line is enough for replication to happen.
[20:34:12] Krinkle: i mean new'ing up a FauxRequest, passing it to ApiMain and running execute
[20:34:33] i had to use an extended FauxRequest though, because it always returns 127.0.0.1, which AbuseFilter doesn't like
[20:35:35] like this, pasted into `mwscript shell.php --wiki testwiki` https://phabricator.wikimedia.org/P53067
[20:37:11] hm..
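The FauxRequest-into-ApiMain technique ebernhardson describes looks roughly like this when pasted into shell.php. This is a reconstruction for illustration only (the actual paste is at P53067; the page title, redirect text, and token wiring here are assumed, and as noted above a subclassed FauxRequest may be needed to report a non-127.0.0.1 IP for AbuseFilter):

```php
<?php
// Sketch: drive an edit through the action API from inside shell.php.
$req = new FauxRequest(
	[
		'action' => 'edit',
		'title' => 'Some test page',       // hypothetical page
		'text' => '#REDIRECT [[Target]]',  // turn the page into a redirect
		'token' => $user->getEditToken(),  // assumes an authenticated $user
		'format' => 'json',
	],
	true // treat as a POST so write paths are exercised
);
$api = new ApiMain( $req, /* enableWrite */ true );
$api->execute();
```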
and you're seeing it consistently fetch the right data there, and consistently fail if you make a similar API request via curl to an unmodified mwdebug?
[20:38:00] yup, watching the `codfw.mediawiki.page_change.v1` stream in kafka to see the result, it includes an is_redirect field
[20:38:59] perhaps try eval.php with -d2 to get more verbose logging and confirm that it is indeed using separate connections and queries.
[20:39:17] maybe the call isn't taking place the way we think
[20:41:42] you can then compare that with WikimediaDebug "verbose logging: on" (or via curl, XWD with "log"), and walk through the rdbms channel for your reqId in Logstash compared to the ones you got from eval.php
[20:42:13] oh wow that does emit a lot more info :) I'll have to parse over this
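The two comparison paths Krinkle suggests might look something like the following. The host name, wiki, and the exact X-Wikimedia-Debug header attributes are assumptions here; check the WikimediaDebug documentation on wikitech before relying on them:

```shell
# 1) Verbose rdbms logging from a maintenance shell:
mwscript eval.php --wiki testwiki -d 2

# 2) A similar API request via curl, routed to an mwdebug host with
#    verbose logging enabled; then search the rdbms channel in Logstash
#    for the request's reqId and compare against the eval.php output.
curl -s \
  -H 'X-Wikimedia-Debug: backend=mwdebug1002.eqiad.wmnet; log' \
  'https://test.wikipedia.org/w/api.php' \
  --data 'action=edit&title=Sandbox&text=%23REDIRECT%20%5B%5BTarget%5D%5D&format=json'
```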