[07:56:19] o/ ebernhardson: I created an MCP for wikimedia code search
[07:56:50] https://gitlab.wikimedia.org/pfischer/codesearch-mcp - you can use that to check usage for config props too
[08:36:18] dcausse: hi!
[08:36:27] good news, reindexing has finished
[08:36:43] atsukoito: hey!
[08:36:46] nice!
[08:36:58] not so good news: some documents are missing
[08:37:10] hm... a lot?
[08:37:42] second is the k8s https://www.irccloud.com/pastebin/TT5UbVSJ/
[08:38:41] oh these are deleted docs, you need to run an endpoint:port/ttmserver/_count to get a precise count
[08:38:41] i'm pulling the logs now
[08:40:57] dcausse: 3rd and last number are still the same in the `_count`, it feels like one database was missing or something
[08:41:23] (also, last time i did indexing, the sizes were matching)
[08:42:40] you're right... thought that the _cat/indices would return live docs + deleted docs, my bad
[08:43:02] yes hopefully it's a failed db when running with mwscript-k8s on the dblist?
[08:45:50] weird that eqiad & codfw (dse-k8s) both have roughly the same counts
[08:46:39] I mean if the cause is a random failure from a maint script on a random db...
[08:48:20] or maybe i used the wrong dblist
[08:54:56] disk usage is low; if disk usage reaches 90%, indices can automatically start blocking writes, but that's not the case here
[08:56:12] trying to run an agg to get the count per wiki
[09:01:43] i checked the logs, it seems like all the databases were indexed
[09:02:46] could it be we removed a database from the list?
[09:04:31] i think we can still proceed with the switchover, and then fix the missing documents
[09:04:42] (thanks for checking the agg)
[09:11:20] here is the dump from the logs https://phabricator.wikimedia.org/P94360
[09:12:18] hmm... actually dse-k8s has more wikis...
the agg makes no sense so far
[09:13:47] atsukoito: in case you spot something obvious: outputs are in deployment.eqiad.wmnet:~dcausse/[dse|prod]-eqiad.json
[09:14:22] checking..
[09:16:02] dcausse: i accidentally destroyed the prod-eqiad.json file, could you please re-generate it?
[09:16:11] (i copied dse to prod)
[09:16:11] sure
[09:17:19] atsukoito: done
[09:17:32] thanks!
[09:18:31] counts are generally lower in dse-k8s but i can't spot a single "big" gap
[09:19:08] `collabwiki` is missing, but i was removing it from the index last time, too
[09:21:59] metawiki & mediawiki are the big ones, ~400k and ~200k diff between the two clusters
[09:22:35] weird...
[09:24:03] could be stale data in the prod indices?
[09:24:18] but no, since we had roughly the same counts last time...
[09:25:19] i checked the last attempt and it was kind of the same
[09:25:20] https://phabricator.wikimedia.org/T425377#12005913
[09:25:36] so we are probably good to go (good thing i added it to the task)
[09:27:32] oh, good to know
[09:27:42] so possibly some stale data in prod
[09:39:04] going to upgrade the SUP to flink2
[09:39:09] uploaded numbers for posterity https://phabricator.wikimedia.org/T425377#12044283
[09:39:25] gonna make a switchover diff and put it up for today's backport
[09:39:35] atsukoito: sounds good
[10:26:26] dcausse: could you please look at the difference on this https://meta.wikimedia.org/w/index.php?title=Special%3ASearchTranslations&query=hello+world between prod and the `k8s-mw-experimental-eqiad` mwdebug setting
[10:26:43] looking
[10:27:22] seems alright, just new translations from wikimania added
[10:27:47] same with https://www.wikifunctions.org/w/index.php?title=Special%3ATranslate&showMessage=Wikifunctions%3AStatus_updates%2F2026-05-15%2F1&group=page-Wikifunctions%3AStatus+updates%2F2026-05-15&language=de&filter=&optional=1&action=translate
[10:27:52] seems alright
[10:28:00] so we can finally switch over
[10:28:47] atsukoito: yes I think so...
since the docs & wikis are different, this page is likely to return different sets for most queries
[10:29:24] cool! i wanted to double-check the config change first
[10:29:58] * atsukoito reverted mw-experimental
[10:31:32] https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/1305062
[10:32:26] +1
[12:54:09] \o
[12:54:54] o/
[12:56:07] deployed flink2 everywhere, looks to be running fine
[12:56:14] nice!
[12:59:51] trying to understand recent flakiness in cindy: ForceSearchIndex might not run an explicit refresh, so creating the completion suggester index right after it might miss docs?
[13:00:42] at least on https://gerrit.wikimedia.org/r/c/mediawiki/extensions/CirrusSearch/+/1304901 it seems all completion suggester related
[13:02:04] perhaps throwing in a quick call to refresh if ForceSearchIndex runs unline would not hurt
[13:02:11] s/unline/inline
[13:02:36] hmm, seems plausible i suppose
[13:08:51] ebernhardson: hii!
[13:08:58] atsukoito: mornin :)
[13:09:26] dcausse: i'm at the end of the deployment queue, will ping you if something goes wrong
[13:10:02] atsukoito: sure, I'll be around
[13:29:54] next up is "Converting a redirect back into an ordinary page makes it findable with page type primary", but this one is pretty new, looking
[13:31:17] hmm, perhaps waitForOperation is insufficient there? We should essentially be waiting for an update to apply there
[13:41:07] very possible, esp. if the operation is susceptible to index refresh times
[13:42:15] hm.. seems to default to 1s for cindy so perhaps not?
[13:48:05] hmm, yea 1s should be more than long enough :S
[14:00:28] dcausse: i started merging, will ping if something breaks
[14:00:43] atsukoito: sounds good
[14:02:53] hm, wondering if there's something in the output that tells me exactly which line breaks...
there are many "I wait for" steps in the same scenario
[14:05:26] dcausse: all is working fine (i tested it on experimental anyways), proceeding with the merge
[14:05:37] dcausse: congrats, it is done
[14:05:38] ah, possibly a ranking issue? searching for "RedirDocGamma1782220748645" returns RedirDocTarget first...
[14:06:23] atsukoito: awesome, thanks for all the work on this!! if nobody screams I'll drop the indices from our cirrus cluster and unblock the opensearch2 migration
[14:07:26] ah but RedirDocTarget should not match RedirDocGamma1782220748645, both RedirDocTarget & RedirDocGamma1782220748645 are returned...
[14:07:36] hmm
[14:07:49] could be a refresh problem tho
[14:08:07] because we wait using the doc API if I'm not mistaken
[14:08:39] so possibly the redirect is promoted but the old target is still in an old state
[14:10:28] still trying to think through this one; one possibility is the wait-for API doesn't really understand redirect-first mode. I believe it will still wait on the target page but not the redirect itself
[14:10:44] but not sure that's relevant here, having a hard time thinking through it exactly
[14:12:50] yes, it does not wait for anything regarding RedirDocTarget to make sure its redirect array is empty
[14:13:51] oh right, it's also the other direction: clearing from the redirect array
[14:13:58] we could relax the assertion from "is the first" to "is in" but that does not test that RedirectDocTarget is not found when searching for a removed redirect
[14:16:04] hm..
but even "is the first" is possibly not strong enough, RedirDocTarget could still be second
[14:16:42] yea we might want "only hit" (or just check the totalhits)
[14:17:34] or we add another doc check helper like "RedirDocGamma1782220748645" is not in the redirect array of RedirectTarget, possibly not using the search API at all
[14:18:11] hmm yea that makes sense and is a useful testable property
[14:18:26] ack, will try
[15:16:24] I did a pass on https://gerrit.wikimedia.org/r/c/mediawiki/extensions/CirrusSearch/+/1304902 and found a few issues, but it's probable that I missed some others
[15:16:53] by "issues" I mean not really issues but some inconsistencies
[15:22:29] doh, yea looks like i missed a couple things
[15:29:50] heh, because CirrusSearchServers isn't in extension.json and i built the sed commands from that
[15:30:39] yes I feel like extension.json still has a couple issues :/
[15:31:30] in a super quick review claude found 44 CirrusSearch* strings that aren't in CirrusConfigNames
[15:31:37] will have to review and ponder
[15:33:02] yes, some are annoying because we can declare a config override directly in a profile setting, like 'CirrusSearchOverrideWindow' in provideRescoreProfilesWithWindowSize
[15:34:03] it was mainly useful in relforge to quickly tune this value without copying the profiles...
[15:34:22] but that makes the cirrus config even harder to comprehend :(
[15:34:34] hmm, seems reasonable to define them as constants but leave them out of extension.json. At least then we would have a central document of all of them
[15:34:36] ?
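The 15:31:30 check (CirrusSearch* strings that aren't in CirrusConfigNames) can be approximated with a crude scan. This is a minimal sketch, not the review that was actually run: it assumes config names only show up as quoted string literals (so `$wgCirrusSearch…` globals are ignored), and it takes the declared names as a plain set because extracting them from CirrusConfigNames is out of scope here. `find_config_literals`, `undeclared_config_names`, `load_php_sources` and the sample file name are hypothetical.

```python
import re
from pathlib import Path

# Quoted literals such as 'CirrusSearchServers' or "CirrusSearchOverrideWindow".
# A namespace string like 'CirrusSearch\Query' does not match, because the
# character after "CirrusSearch" must be a letter, digit or underscore and the
# literal must close right after the name.
CONFIG_LITERAL = re.compile(r"""['"](CirrusSearch[A-Za-z0-9_]+)['"]""")


def find_config_literals(source: str) -> set[str]:
    """Return every quoted CirrusSearch* literal found in a chunk of source text."""
    return set(CONFIG_LITERAL.findall(source))


def undeclared_config_names(sources: dict[str, str], declared: set[str]) -> dict[str, set[str]]:
    """Map each file name to the CirrusSearch* literals it uses that are not declared.

    `sources` maps file name -> file contents; `declared` is assumed to be the
    set of known config names (hypothetically taken from CirrusConfigNames).
    Files with nothing undeclared are left out of the result.
    """
    report: dict[str, set[str]] = {}
    for name, text in sources.items():
        missing = find_config_literals(text) - declared
        if missing:
            report[name] = missing
    return report


def load_php_sources(root: Path) -> dict[str, str]:
    """Read every *.php file under `root` (a hypothetical checkout path)."""
    return {
        str(path): path.read_text(encoding="utf-8", errors="replace")
        for path in root.rglob("*.php")
    }
```

The output is only a candidate list for human review: profile-level overrides such as 'CirrusSearchOverrideWindow' will show up alongside genuine omissions, which matches the "review and ponder" step above.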
[15:36:36] sure; CirrusSearchOverrideWindow, though, is used in test fixtures, the actual profile has CirrusSearchPhraseRescoreWindowSize, CirrusSearchPhraseRescoreBoost and CirrusSearchFunctionRescoreWindowSize
[15:54:54] claude is often finding small things i just never notice when looking at code: Pre-existing typo (untouched, out of scope): RescoreFunctionChains.config.php:114,164 have 'uri_param_override' => 'cirrusIncLinkssW' (double-s)
[15:55:26] i suppose i just read the whole word as a symbol and don't look at the letters
[15:55:30] sigh...
[15:55:35] same
[16:27:19] errand
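The 'cirrusIncLinkssW' typo from 15:54:54 is hard to see by eye but easy to catch mechanically. A minimal sketch using Python's standard `difflib.get_close_matches`: a value that is not in a list of known names but is very close to one gets reported as a likely typo. The known-name list is an assumption (the log only implies that `cirrusIncLinksW` is the intended spelling), `suspicious_overrides` is a hypothetical helper, and the 0.9 cutoff is an arbitrary starting point.

```python
import difflib


def suspicious_overrides(values: list[str], known: list[str], cutoff: float = 0.9) -> dict[str, str]:
    """Return {unknown_value: closest_known_name} for values that look like typos.

    Values already in `known` are accepted. An unknown value whose best
    difflib similarity ratio to some known name is >= `cutoff` is reported;
    unknown values with no close match are ignored (they may be new names).
    """
    known_set = set(known)
    report: dict[str, str] = {}
    for value in values:
        if value in known_set:
            continue
        matches = difflib.get_close_matches(value, known, n=1, cutoff=cutoff)
        if matches:
            report[value] = matches[0]
    return report
```

For the case above, 'cirrusIncLinkssW' against 'cirrusIncLinksW' scores about 0.97, well over the cutoff. A one-letter slip in a short name lowers the ratio more, so the threshold may need tuning for shorter parameter names.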