[09:53:24] Reedy, seems like https://gerrit.wikimedia.org/r/c/mediawiki/core/+/1200005 is fine without Phan complaining. [10:04:54] ooh [10:05:20] xSavitar: Should the line from AutoLoader be removed because it's no longer needed? [10:10:10] Yes, that can go away for sure. Thanks [10:10:26] I suspect in reality it doesn't make much difference... But.. :) [10:10:40] If we can do a handful of smaller ones like this though, that would be nice [10:43:32] That's a stack of 6 more... [10:44:25] of course it fails a few in :D [11:01:34] phan passes on all of them [11:01:37] I am suprised [11:02:19] heh [11:06:43] ReviseTone.cy.ts can go in the bin [11:09:27] zabe, someone probably hinted phan of our conversation yesterday :D [11:12:21] Have fun re +2 everything until it merged lokl [11:12:41] least the failures seems to be unrelated stuff... [11:13:46] yes [11:14:43] marking a test as failed due to a random error during the PostBuildScript feels very useful [11:17:36] selenium can be sometimes flaky so if that's the only failure then we can retry gate-and-submit [11:26:12] 2 merged at least [12:09:08] That was relatively painless [12:34:55] A few more up... [13:06:32] I'm on a treasure hunt, need help. Varnish uses the X-Analytics response header here, so it must be set by MW somewhere: https://gerrit.wikimedia.org/r/plugins/gitiles/operations/puppet/+/refs/heads/production/modules/varnish/templates/analytics.inc.vcl.erb#233 [13:07:06] but I can't find anywhere that we set page_id in that: https://codesearch.wmcloud.org/search/?q=%22X-Analytics%22&files=&excludeFiles=&repos=&i=fosho [13:15:28] milimetric, maybe https://gerrit.wikimedia.org/g/mediawiki/extensions/WikimediaEvents/+/59f724559d86779507b62221cd6e11b39466b690/includes/WikimediaEventsHooks.php#149 [13:17:07] thanks xSavitar, that's it! So, how come codesearch didn't find that line in the doc block above that function? And how did you find it? [13:17:57] I think because your search was \"X-Analytics\" instead of X-Analytics [13:18:28] Also, I found it looking at https://wikitech.wikimedia.org/wiki/X-Analytics docs and then saw `page_id` -> WikimediaEvents as the origin [13:18:46] Then I new somewhere in that extension, a hook is probably doing something with the headers [13:19:13] Then I found the onXAnalyticsSetHeader hook [13:19:27] ok, thanks, makes sense [13:19:47] *knew [13:20:04] ok, so now the mystery is how can line 152 there set the wrong page id. We're seeing in some cases lots of mismatched titles and page ids [13:20:18] so presumably `$title->getArticleID();` is wrong [13:21:24] or another possibility is that hook fires with bad data [13:23:54] Maybe, is there a task that I can read to understand your treasure hunt? [13:25:46] no, we just found some weird data, I'll make a task and link here in a minute [13:30:36] milimetric, I was just reading https://gerrit.wikimedia.org/g/mediawiki/extensions/XAnalytics/+/d417192bc6943ea01c91f80b55781d68be27429a/includes/XAnalytics.php#29 and it looks like injecting headers is not 100% reliable? [13:31:58] for page views, that's not a problem but for other uses-cases, it's problematic because PHP output buffer is flushed in multiple places, maybe worth confirming if that's still true today [13:32:51] ok, I'll return to that after I draft this, could be what's going on. Somehow an old header is getting stuck and getting sent with other responses? [13:38:44] https://phabricator.wikimedia.org/T408798 [13:39:13] Ack, thanks! [14:04:32] (if it helps anyone reading up, the set of requests that resulted in these incorrect page_id values was done by a user agent that was flagged as automated, I wrote the details in the task as comments) [16:12:09] xSavitar: zabe Current stack of 5 are probably ready to go now (fixed the phan issue on the composer patch) [16:13:32] nice [16:40:27] tgr|away: The editsitejs/editinterface grants in https://meta.wikimedia.org/wiki/Special:OAuthManageConsumers/e51b6e0aa6bc2a5c60a315102e88d2ee may be unsure of accepting. I'll leave it in the queue and differ to your reasoning. :) [16:41:06] s/may be/make me/ [17:05:53] milimetric: you mention om task they're diffs, that means they have a uri_query field not shown here, right? [17:06:22] The oldid parameter has precedence over thd title parameter [17:07:20] w/index.php?title=Banana&oldid=1 will show rev 1, idem for diff [17:07:52] The title is most likely what it was at the time but could also be fake [17:08:32] If they're diffs, maybe they shouldn't count as pageviews? [17:09:02] Or maybe it should emit title from MW instead of the url [17:09:27] ie add to x-analytics [17:10:56] I don't see a bug here (yet) but the data from different sources may be combined in a confusing/deceptive way [17:45:08] Krinkle - woa, ok, TIL a few things, thank you, will be back with more data [18:30:08] someone wants to quickly +2 https://gerrit.wikimedia.org/r/c/mediawiki/extensions/FlaggedRevs/+/1200133 ? [19:15:02] Krinkle: ok, I have returned with more data. The theory I checked was: "in pageview_hourly, when a single page_id has many distinct page_titles, there are two explanations: diffs to other pages and client-side redirects that we see as 200s in webrequest" [19:15:40] I spot checked lots of data and that claim held. Most of the "diff" pageviews were done by automata, so it's easy to filter them out [19:16:46] The only way to 100% verify the theory is to join to the page table by title, exclude redirect pages, and see if there's any remaining cases. But I'm satisfied for now, it looks like there's no bug [19:18:23] well, maybe there's a bug, so here's a question for you: shouldn't MW only allow folks with elevated privileges to do these weird cross-page diffs? It seems like the use cases for this would mostly coincide with folks who can move pages and have other rights. Restricting it like that would protect us from garbage requests like we saw here, no? [19:18:57] (I'm following up with research to maybe change the pageview definition to exclude these diffs as well) [19:40:04] milimetric: they are not cross page diffs. We try make all query strings with a title for clarity. But on diffs and old revs, the title parameter is usually unused, it serves as fallback if the rev is hidden/deleted. [19:41:19] Banana, view history, diff or old: title=Banana&oldid=1. Then rename banana, the old url still works but will show current title. The param is not used, if the rev is valid. [20:04:08] It's a bit like slugs in a blog like example.org/123-my-headline/ where the text in the slug isn't strictly used, eg may be cut off, or malformed, or have renamed since, and therefor could be faked as 123-totally-not-a-rickroll [20:05:08] Perhaps if we can spare an extra 255 bytes, we can have MW emit page_title alongside page_id in the X header and prefer that when present. [20:05:26] But also diffs shouldn't be counted as pageviews maybe? [20:21:38] Note that oldids are not diffs without diff parameter, then they are a pageview for a previous revision which is a bit more common and presents the same ambiguity. So we may want to fix it from the MW side regardless. [22:08:15] Krinkle - the diffs I saw were completely different pages, not old names. Here, example of a pageview from the pageview table from that hour, 2025-09-30T23: [22:08:22] https://en.wikipedia.org/w/index.php?diff=1286192464&oldid=1124654206&title=The_Graduate [22:08:56] I see, so the diff rev and the old rev belong to different pages. [22:09:10] same as https://en.wikipedia.org/w/index.php?diff=1286192464&oldid=1124654206 [22:09:14] (without title) [22:09:40] I assumed you meant that diff/old were page A and the title page B [22:09:41] yep [22:10:17] yeah what I'm suggesting is a bug here is allowing that without being part of some special permission group [22:10:27] Right so that's definitely a cross-page comparison. It's not common but has a few use cases. For example, I might fork a gadget to my user space with an edit, and then you can compare the two. Special:ComparePages allows you to do this via the GUI as well. [22:10:31] because it's an easy vector for DDoSing or just plain being a nuisance [22:10:51] right, I can see the use cases, just maybe good to limit it [22:10:55] through history merge and split such urls can also retroactively become that way even if they don't start that way [22:11:04] With diffonly=1 there isn't even a page content below the diff [22:11:11] so +1 for not treating them as pageviews. [22:11:47] although if we emit page_title from X-Analytics, we'll at least consistently attribute them to the (current) title of the right-hand side and (unless diffonly=1) thus the title of the content preview below the diff [22:12:04] k, that's one good outcome, so then I'll close this. Do you want me to open up a feature request for limiting cross-page diffs to certain user_groups? [22:12:17] idem for non-diff cases like https://en.wikipedia.org/w/index.php?title=The_Graduate&oldid=1124654206 which is actually [[Rhacophorus reinwardtii]] [22:12:32] I don't think we want to restrict that. [22:13:01] k, then I'll close the task as invalid. Thanks for the many TILs :) [22:13:19] https://en.wikipedia.org/w/index.php?title=Special:ComparePages [22:14:50] It comes up at times after someone manually moves a page by copying and pasting, Tools like this can help untangle the mess. Granted, not likely used by anons by proxy of being less experienced, but I don't think we want to restrict that per-se either. Anyway, interesting to think about if it adds up with other downsides. [22:16:32] We do at least robots=noindex such diffs. And that GUI is probably not discovrable by anons. But yeah manual URL thinkinering and people hitting it on purpose will still come through. [22:17:02] but those are hard to distinguish from cases where the revs did belong to the same page at some point, or when e.g. someone authorized made the link and then shares it with you [22:34:21] closed with all the context from here: https://phabricator.wikimedia.org/T408798#11329946 [22:37:55] cool frog <3 [23:34:32] milimetric: if you edit that phab comment and make the fence "```lang=irc" it will get syntax highlighting. That might make it a bit more readable.