[10:43:27] errand+lunch
[14:13:57] \o
[14:24:56] o/
[15:31:21] got most of the ab test metrics from the old report worked out, although not sure if they are correct :P Maybe will have to figure out how to write some tests in the notebook
[15:31:37] last bit to do is the interleaved analysis, but i have some old code and it shouldn't be that hard
[15:44:42] nice!
[15:59:19] also some definitions are squishy...i probably have to rework the return rate metrics. I can't decide if "Among users with at least a click in their search, the proportion of searches that return to the same search page" is the same as "number of sessions with at least 1 click and 2 serps for the same query"
[15:59:39] in part, because the old number was 20% and the new one is 6%
[16:00:21] and then "Among users with at least a click in their search session, the proportion of sessions that return to search for different things" is the same as "percentage of sessions with at least 1 click and 2 unique serps"
[16:00:38] that's 30% in mine, and 20% in the old data. Definitions are hard :P
[16:02:03] reading from these sentences you can certainly interpret different things, I guess what matters if what we're computing now and make sure we understand what it measures
[16:02:16] s/if/is/
[16:02:58] inflatador: waking up sick so gonna head back to sleep. Can you let dpe sre know in triage that we're punting graph split rollout to jan? (Logged out of slack on my phone)
[16:03:23] yea, i suppose at least in the ab context the important bit is the difference between tests, as long as they are measured the same and the metric is sensible
[16:03:27] the shift in numbers probably means that we were computing something slightly different?
[16:03:53] my assumption is the different % is a different calculation, yea. user behaviour could have changed but i'm not sure how much
[16:04:09] unless something odd happened to the UI and/or how we collect metrics I can't believe that user behavior changed that much
[16:04:25] not sure how best to keep historical data for this kind of thing...but it would be nice to be able to get a graph of ab test metrics over time to compare against
[16:04:40] we could calculate it daily over the 90 days available, maybe can do something
[16:05:32] yes that'd be ideal for sure
[16:05:57] ryankemper Y can do
[17:12:59] ryankemper, inflatador: a quick patch to ease the analysis of sparql queries coming from multiple subgraphs (if/when you have a moment): https://gerrit.wikimedia.org/r/c/operations/puppet/+/1084193
[17:42:18] something odd with the internal graph split hosts...
[17:42:45] curl -XPOST --data-urlencode "query=SELECT * {wd:Q675 wdt:P31 ?o } LIMIT 10" http://localhost:6041/sparql (6041 is wdqs-internal-main I think) is OK to return an entry
[17:43:17] but should not return an entry from port 6042
[17:45:35] was curious and went poking around for sessions that have a special:search request followed by an external referrer pageview. Actually still has a bunch of false positives (pv not matching search intent), but some interesting excerpts: https://phabricator.wikimedia.org/P71689
[17:49:02] for the autodesk one I think we're good?
[17:49:45] i guess i didn't try them, indeed trying that one the results look fine
[17:50:01] but they still chose to come in to the same page from google, curious
[17:50:02] for the bordella one we're not too bad either
[17:50:30] yea that actually looks good
[17:50:32] the noble gas one clearly we're not giving what they found from google
[17:50:55] i added two more at the end, not sure when you loaded the page
[17:51:04] looking
[17:51:15] teva looks fine
[17:51:52] the acker bilk one we indeed don't correct. So curiously, users still come to the page from google even if our results are decent and they (presumably) see them
[17:52:39] Usher Belk -> Acker_Bilk is a hard one, I guess they typed what they've heard
[17:56:07] yea google and bing don't manage to correct the first, only the second mr asher bilk -> acker bilk
[17:56:53] i'm mildly surprised it wasn't more prevalent. Could poke through more, but in most sessions the return pageviews are unrelated to the search they ran on our side
[18:33:47] re wdqs-internal-(main|scholarly) oddities, done some testing at T376150#10394864
[18:33:47] T376150: Prepare hosts to serve wdqs-internal-main & wdqs-internal-scholarly - https://phabricator.wikimedia.org/T376150
[19:40:56] ACK, will look into the internal hosts' data problems as time permits. Sounds like we need some sanity checks for the data
[20:04:16] taking my son to the doc, back in ~2h
[20:33:32] Back around now, will start doing some data xfers