[10:49:27] lunch
[13:25:46] \o
[13:27:36] o/
[13:27:43] i'll still look into it..but this kind of rhetoric really makes me not want to engage: some parties at English Wikipedia seem to have made the decision to exclude "United States" as a search result.
[13:30:36] search endpoint is oddly missing the fuzzy queries: https://en.wikipedia.org/w/api.php?action=opensearch&search=un&cirrusDumpQuery
[13:36:07] ebernhardson: Huh, how can they decide to exclude them? Do we filter search results in such a way?
[13:36:49] pfischer: the user is implying we specifically decided to not return united states in search results, presumably implying we are putting politics ahead of maintaining wiki appropriately
[13:37:20] at least, that's my take. maybe not as generous as it could be :P
[13:40:55] pfischer: oh as for how, we don't really have a mechanism in cirrus that would do that. comp suggest scores are just math, the page title only comes in at the end when discounting results based on the query analysis chain and fuzziness
[13:42:33] i guess that still doesn't consider the title, but i mean the earlier math doesn't even have the title
[13:42:34] * inflatador is working out why we have a lag alert for WDQS when there's only a single host lagging and it's ~11m behind
[13:44:16] ryankemper looks like wdqs1022 is still depooled? Is that from the data reload?
[13:46:20] oh well, depool-restart BG-repool did the trick. It might be worth auto-remediation for those
[13:59:08] oh wow i'm forgetful...the reason the query looks funny to me is because we have `min_query_len` portions of the profile that control which queries get issued, and for a query with length of 2 (like `us`) most of them are disabled
[14:06:26] hello hello! I have a patch that I would like someone from the search team to take a quick look at. we're trying to make stemming work for simple item/property search and came up with this: https://gerrit.wikimedia.org/r/c/mediawiki/extensions/WikibaseCirrusSearch/+/1164448
[14:06:38] David said this is the place to cry for help while he and Guillaume are out :)
[14:55:24] Looks like our WDQS alerts aren't set up to handle getting the https://github.com/blazegraph/database/blob/master/bigdata-core/bigdata-sails/src/java/com/bigdata/rdf/sail/webapp/QueryServlet.java#L655 error response from BG
[14:56:40] I assume that's accompanied by a 5xx, let me check logstash
[15:07:37] * ryankemper fights his daily battle with okta
[15:08:43] based on the nginx logs on wdqs1013, it looks like we're returning a 403 in response to our typical monitoring queries? Hmm
[15:13:11] * ebernhardson also apparently has to re-log okta ~8am on tuesdays...
[15:14:00] maybe we just need to use the same health check as pybal, it seems to be getting a 503 when it's supposed to
[15:15:35] yeah, that looks like the way to go: https://gerrit.wikimedia.org/r/plugins/gitiles/operations/puppet/+/refs/heads/production/modules/query_service/templates/nginx.erb#181
[15:31:41] hmm, it turns out the problem with united states in the completion suggester is that united states has lost its popularity_score field
[15:32:06] i don't know how that happens :S We upsert so it shouldn't really be possible
[15:33:00] wow
[15:34:04] Database ninjas. That's the only logical explanation.
[15:34:53] one possibility is if the page was deleted/moved, and then restored/moved back. Those would result in a delete and a re-index, which loses things like popularity_score, but i'm not finding evidence of that
[15:35:49] i wonder how many other things might have lost that field though...should be curious :P
[15:41:16] incoming links is also empty...gotta be a delete/reindex somehow
[15:41:26] * ebernhardson scours kafka logs...
[15:41:37] s/logs/events/
[16:03:45] no related delete in the SUP update_pipeline.update.v1 events :S
[16:54:07] I merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/1165562 which touched off some alerts, looks like it doesn't like the regex. I've acked the alerts and will fix once I get back ~40m
[17:37:34] * ebernhardson notices the `popularity_score_weekly` dag runs daily...so much for naming :P
[17:38:09] oh no i'm just a failure at reading, it does run weekly :P
[17:47:51] it's a rolling weekly popularity score that updates daily ;P ?
[17:48:12] also back, the wdqs alerts are gone and I'm not seeing any evidence of probes failing on the prom servers
[17:48:18] no, it's just that we tagged the dag with 'daily', but the actual schedule is still weekly
[17:48:21] it's just the wrong tag :)
[17:50:35] lies, damned lies, comments, and tags ;P
[18:09:26] incoming_links turns out to be more annoying to validate...for popularity_score i can check the files we ship to swift, we ship popularity every time. But for incoming_links we check whether the count in the dump is the same as the count we compute from the dump, and if it's the same we skip the update.
[18:09:57] so maybe the reason there is no incoming_links count update for `United States` is that the incoming_links count didn't change over the week...but it takes a bit to know for sure :s
[18:10:41] * ebernhardson basically has to re-run the same weekly job, but in a notebook and inspect intermediates...
[18:14:20] separately i wonder while looking at this if we are missing incoming links via redirects, but for another day :P
[18:36:28] on the other hand, saneitizer looks the happiest it's been in a while. only ~10k docs fixed in last 2 weeks vs 30k mid may and 500k in april (due to wikisource content namespace issue)
[19:39:03] back
[19:40:43] i just don't know how this page lost the popularity_score and incoming_links fields...The only two explanations i have are that the page was deleted and recreated, or we did a direct 'index' instead of going through super_detect_noop, but I can't find anything that reasonably suggests either actually happened
[19:44:51] the closest thing i have to a hint is that the _version (incremented each time elasticsearch updates a doc in same index) is 63 in eqiad and 78 in codfw, which seems low (France is 279, UK is 475)
[19:46:11] a properly busy page like Main_Page has a _version of 6421, that's probably the count since oct 2024 when the index was created
[20:30:37] and the answer is...https://en.wikipedia.org/w/index.php?title=United_States&oldid=1296923565
[20:31:02] the page was turned into a redirect, deleted from the search index, then the redirect was removed moments later but we had already decided to delete it (and then re-create)
[21:05:07] ryankemper we're in pairing if you wanna join
[21:05:28] inflatador: 2 mins
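
Note on the 20:31:02 explanation: the sequence described there (upserts from separate jobs, then a delete, then a re-create) can be illustrated with a toy model. This is only a sketch under loud assumptions: a plain dict stands in for the search index, the helper names `upsert`, `delete` and `should_ship_incoming_links` are hypothetical, and the field values are made up. It is not CirrusSearch or SUP code.

```python
# Toy model only: a dict stands in for the search index.
index = {}

def upsert(doc_id, fields):
    """Merge fields into an existing doc, or create it; untouched fields survive."""
    index.setdefault(doc_id, {}).update(fields)

def delete(doc_id):
    """Drop the whole doc, including fields that other jobs wrote earlier."""
    index.pop(doc_id, None)

def should_ship_incoming_links(dumped_count, computed_count):
    """Hypothetical version of the skip rule from 18:09:26: ship only on change."""
    return dumped_count != computed_count

page = "United States"
upsert(page, {"title": page, "text": "..."})   # regular page update
upsert(page, {"popularity_score": 0.001})      # weekly popularity job (made-up value)
upsert(page, {"incoming_links": 1000})         # weekly links job (made-up value)

delete(page)                                   # page briefly became a redirect
upsert(page, {"title": page, "text": "..."})   # redirect reverted, doc re-created

# The re-created doc only has what the page update itself carries.
missing = [f for f in ("popularity_score", "incoming_links") if f not in index[page]]
print(missing)

# An unchanged count is not shipped under the skip rule sketched above.
print(should_ship_incoming_links(1000, 1000))
```

In this toy, upserts alone never drop a field, which matches the 15:32:06 remark; the fields only go missing once a delete sits between the upserts.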