[10:33:15] errand
[11:10:16] sigh... the hits array in cirrus backend logs is not populated for completion searches, it expects a "classic" search response, will need to fetch them from the elasticsearch_requests array I guess
[11:45:52] lunch
[13:17:09] o/
[13:20:47] \o
[13:21:47] .o/
[13:32:28] o/
[13:36:21] i glanced at the case sensitive undelete thing again yesterday, i'm relatively certain the query itself is case insensitive, i think we are missing the data. Ran ForceSearchIndex.php --archives for codfw yesterday, running in eqiad now. but only on enwiki (even with --queue it takes awhile)
[13:37:06] sadly doesn't seem to have changed results in codfw :S
[13:37:25] but i think the extra results are coming from the sql LIKE query...probably needs more investigation
[14:01:22] hm.. something's probably wrong in how I extract the data, dwim would be responsible for 21% of the clicks on ruwiki autocompletes which'd be a lot higher than what I'd have expected
[14:04:28] hmm, yea that's surprisingly high
[14:05:35] yes likely something I'm missing in the backend logs...
[14:17:13] 21% would be very impressive!
[14:22:12] trying to disprove this number but failing so far...
[14:23:59] getting interesting searches with only second try results, for instance: https://ru.wikipedia.org/w/api.php?action=opensearch&search=%D0%B0l
[14:25:59] ah silly me, forgot to join on page_id, 21% is the number of click events I can find in backend logs
[14:26:37] which turns out to be quite bad if I can't join backend logs and click events
[14:43:56] joining searchsatisfaction search_token with backend logs search_id which I believe both come from Util::getRequestSetToken()
[15:06:37] yea search_id should be the same
[15:06:49] ah but search_token only applies to fulltext, it's being set via onSpecialSearchResults
[15:07:01] it's also in api requests, as X-Search-Id header
[15:07:10] i think searchsatisfaction.js still manages to capture that
[15:07:16] ok
[15:09:17] getting many search satisfaction events without a searchToken on some wikis, click/visit events with action=autocomplete have 9493 events without a search_token and only 882 with one on hewiki
[15:11:58] :S
[15:12:36] i suppose click/visit is expected, iirc we only log the search_id for serp events
[15:13:19] not sure how to best handle...maybe left join on search_token, then window over session_id to bring in data to each row?
[15:14:34] I see some logic on submit-form that seems to keep track of the token via lastSearchId but perhaps not working as I expect
[15:15:03] hmm
[15:15:51] or perhaps I join on session id + search query
[15:16:31] yea it does look like it's supposed to log that with clicks :S
[15:17:13] j
[15:18:26] even on searchResultPage actions I see a big loss, surprisingly very high on some wikis, only ruwiki has more events with a search_token
[15:24:16] that's really odd :(
[15:37:39] dcausse: maybe we should look into that "аl" (Cyrillic а, Latin l) example.. it's searching "fl", but it could search "ад" instead...
[15:39:27] Trey314159: yes this one sounded surprising
[15:42:18] not sure what I can use instead of search_token to join backend logs and search satisfaction...
[15:45:06] getting enough data with search_token only on ruwiki, for which I see 1% of clicks due to dwim
[15:46:11] this is one day of logs, expanding the range but not super optimistic having enough on "small" wikis
[15:46:28] yea doesn't sound great :S I'm not really sure what else we have though
[15:47:20] 1% wrong-keyboard queries was the estimate on ruwiki, so that sounds reasonable
[15:49:49] it's only autocomplete affected by search_token issue, at least fulltext gets it all the time
[15:51:15] i suppose it works off mw.track events which iirc are emitted from multiple implementations, maybe one isn't sending it.
[15:52:36] ok will gather more data but caveat will be that it's only a small subset of the actual clicks, if the search_token bug is kind of "random" this should still give us a rough idea
[16:08:55] looks like mjolnir completed a run, haven't looked at outputs yet
[16:49:16] Was curious so pulled up some graphs of the final ndcg@10 since 2018, it clearly shows falling off a cliff when the training data declines, but not sure what to make of it overall: https://people.wikimedia.org/~ebernhardson/mjolnir_training_history_20260317.html
[17:16:03] this is because of the query clicks logs missing?
[17:51:42] errand
[18:18:58] yes when the size of the training data declined the ndcg@10 went down until it stopped working because some wikis ran out of data entirely
[18:39:39] Trey314159: I shared with you some early numbers, commented on the cells about things I find very odd, let me know if you spot other oddities
[18:41:59] dinner
[18:59:22] dcausse: thanks! looks interesting!
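
A rough sketch of the fallback floated at 15:13:19 (left join on search_token, then window over session_id so rows without a token pick one up). This is plain Python, not the actual analysis pipeline: the field names `session_id`, `ts`, `search_token` and `search_id` are simplified assumptions rather than the real SearchSatisfaction or backend log schemas, and both helper functions are hypothetical.

```python
def fill_search_token(events):
    """Carry the most recent non-empty search_token forward within a session.

    `events` is a list of dicts with (assumed) keys session_id, ts and
    search_token, where search_token may be None on click/visit events.
    Returns new dicts ordered by (session_id, ts).
    """
    last_token = {}  # session_id -> last token seen in that session
    filled = []
    for ev in sorted(events, key=lambda e: (e["session_id"], e["ts"])):
        token = ev.get("search_token") or last_token.get(ev["session_id"])
        if token:
            last_token[ev["session_id"]] = token
        filled.append({**ev, "search_token": token})
    return filled


def left_join_backend(events, backend_logs):
    """Left join events to backend log rows on search_token == search_id.

    Events with no matching backend row are kept, with backend=None.
    """
    by_id = {row["search_id"]: row for row in backend_logs}
    return [{**ev, "backend": by_id.get(ev["search_token"])} for ev in events]
```

Carrying the token forward attributes a click to the latest token seen in its session, so it can mislabel clicks when a later search in the same session logged no token at all; sessions that never logged a token stay unmatched.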