[11:00:17] lunch
[13:10:04] \o
[13:14:54] hmm, guess something broke with drop_old_data, will look into it
[13:17:39] o/
[13:20:01] ebernhardson1: had a look and some partitions were out of range, I cleaned them manually and it went back to green
[13:20:25] dcausse: oh! i'm surprised out of range causes errors, its whole point is to drop out of range?
[13:20:41] no clue why that happened tho but the default allowed-interval did prevent some
[13:22:38] oh there's a big gap between scheduled__2026-03-09T00 & scheduled__2026-03-18T00
[13:23:03] looks like the airflow scheduler kind of skipped this dag for 9 days :/
[13:23:07] dcausse: yea i paused that dag so it wouldn't drop data while backfilling and forgot to unpause for a week :(
[13:23:13] ah ok
[13:23:27] makes total sense then :)
[13:24:39] the allowed-interval is nice to avoid accidental mistakes on backfills, I guess it did its job
[13:29:52] yea i suppose
[14:25:23] wondering if a search keyword to vary the sort options could help as a stop-gap for the fact that you can't manually pick those from Special:Search without AdvancedSearch
[14:26:25] indeed choosing sorting from Special:Search is tedious, i just type them in. I guess maybe? Keywords that update meta-configuration seem a bit awkward, but we also already have them
[14:27:58] true... not a fan of these keywords that mess with the global context from the ui, you always have to ponder which one wins (ui context vs syntax)...
[14:30:51] do we even have a defined precedence? I suppose i would think, from least to most, perhaps global config -> url args -> keyword?
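(annotation) The precedence floated at 14:30:51 (global config -> url args -> keyword, least to most) can be sketched as a layered lookup. This is only an illustration of that suggestion, not how CirrusSearch resolves settings; `resolve`, the layer dicts, and the setting names and values are all hypothetical.

```python
from typing import Mapping, Optional, Sequence


def resolve(setting: str, layers: Sequence[Mapping[str, str]]) -> Optional[str]:
    """Return the value from the highest-precedence layer that defines `setting`.

    `layers` is ordered from least to most specific, so later layers win.
    """
    value = None
    for layer in layers:
        if setting in layer:
            value = layer[setting]
    return value


# Hypothetical layers, ordered least -> most specific as suggested in the chat.
global_config = {"sort": "relevance", "namespace": "Main"}
url_args = {"namespace": "MediaWiki"}   # e.g. picked in the Special:Search UI
keywords = {"namespace": "Help"}        # e.g. a "help:" prefix typed in the query

layers = [global_config, url_args, keywords]

print(resolve("namespace", layers))  # keyword layer wins over the UI selection
print(resolve("sort", layers))       # only the global config sets this
```

With this ordering, the keyword layer overrides the UI selection whenever both set the same thing, and anything set only in the global config falls through unchanged.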
[14:32:32] the namespace prefix would be one of them I guess: https://en.wikipedia.org/w/index.php?search=help%3Atest&title=Special%3ASearch&profile=advanced&fulltext=1&ns8=1
[14:33:01] if you select MediaWiki in the UI but search "help:test" you end up searching the Help ns
[14:33:12] yea i suppose that's a decent example
[14:33:37] prefix is roughly similar, it bypasses the ns selector
[14:38:44] school run
[14:53:57] ebernhardson: quick fix when you have a minute https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/2118
[15:00:44] +2
[15:02:12] thanks!
[15:05:15] are we doing search stand up? I was in a maintenance and just joined
[15:05:38] inflatador: still daylight confusion time, the triage/planning meeting is on eu time (so, in 1 hr)
[15:06:42] ebernhardson ah, good catch. I was looking on the Meet page instead of the calendar
[15:24:24] I think we need to switch semantic search to codfw? (T414484)
[15:24:24] T414484: Upgrade DSE clusters to kubernetes 1.31 - https://phabricator.wikimedia.org/T414484
[15:25:04] hmm, we should be able to but i haven't configured the codfw cluster in prod, will put that together
[17:07:46] now it seems to be eqiad failing for the opensearch semantic search cluster
[17:09:16] inflatador: ^
[17:11:26] a bit random tho, some requests still pass
[17:21:22] i wonder if this kind of thing (if cleaned up more) would help the insource wish: https://people.wikimedia.org/~ebernhardson/insource-search.html
[17:26:22] could be? at least nice to get your regex started
[17:49:29] hmm, wonder what i did differently.. in opensearch mandatory plugins most are analysis-icu, opensearch-extra, etc. But then esperanto and serbian from the other plugins have their names (with spaces)
[17:49:44] it's the separate repo, so something different in pom's i imagine
[17:52:27] :/
[18:27:59] dcausse inflatador :eyes
[18:28:21] thanks!
[18:29:32] cc ebernhardson, didn't mean to ping myself. Have y'all done anything yet re: semantic-search failures? I'll start looking now
[18:32:03] not really, just seeing CERTIFICATE_VERIFY_FAILED on curl requests for both sites
[18:34:06] My best guess is that we'll need `k8s-ingress-dse-aa.discovery.wmnet` certificates on ingress, ref https://gerrit.wikimedia.org/r/c/operations/dns/+/1250063/5/templates/wmnet#1035
[18:34:35] seems like a 503 by "server: istio-envoy"
[18:34:45] yeah, that must be it, iPoid is not affected because it doesn't have istio in front
[18:35:27] err....envoy that is
[18:39:49] dinner
[18:52:44] ah, nope, not related to the active/active stuff at all
[18:52:55] it's because we hid the pod-level certificates behind envoy and they expired
[18:53:21] ref T419289
[18:53:21] T419289: Ensure OpenSearch on k8s clusters can safely use envoy TLS termination - https://phabricator.wikimedia.org/T419289
[18:53:33] Fixing, will update shortly
[19:03:12] I didn't know that we had enabled the tls termination on the ingressgateway for this cluster yet.
[19:31:56] btullis: it was actually done for a secondary reason, i mean yes we need tls for prod services, but opensearch also validates the x-search-id header and the one mediawiki sends doesn't match their expectation (and there is no rfc, so their expectation is arbitrary)
[19:50:20] and envoy has a place to strip that
[20:34:18] semantic search should be back for everyone now
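(annotation) The root cause at 18:52:55 was pod-level certificates expiring unnoticed behind envoy. The log does not say how the expiry was found or how it will be monitored; as a generic sketch, days remaining can be computed from a certificate's notAfter string with Python's standard `ssl` module. `days_until_expiry` and the sample dates are hypothetical.

```python
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after: str, now: datetime) -> float:
    """Days between `now` and a certificate's notAfter time.

    `not_after` uses the OpenSSL text form, e.g. "Jan  5 09:34:43 2018 GMT".
    A negative result means the certificate has already expired.
    """
    expires_epoch = ssl.cert_time_to_seconds(not_after)
    return (expires_epoch - now.timestamp()) / 86400


# Hypothetical values for illustration only.
not_after = "Jan  5 09:34:43 2018 GMT"
one_day_before = datetime(2018, 1, 4, 9, 34, 43, tzinfo=timezone.utc)
one_day_after = datetime(2018, 1, 6, 9, 34, 43, tzinfo=timezone.utc)

print(days_until_expiry(not_after, one_day_before))
print(days_until_expiry(not_after, one_day_after))
```

A check like this would have to read the pod-level certificate directly, since a client that only talks to envoy sees envoy's certificate and not the one behind it.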