[11:29:48] Hi folks - what's the difference between the first 2 options here https://www.mediawiki.org/wiki/Special:Preferences#mw-prefsection-searchoptions ?
[11:29:59] Default versus subphrase matching?
[12:53:22] o/
[13:17:51] @cormacparcle there's more info here: https://www.mediawiki.org/wiki/Extension:CirrusSearch/CompletionSuggester
[13:19:44] so the difference between default and subphrase: subphrase resolves redirects, default doesn't. subphrase also matches subphrases in titles (e.g. if your query is part of a title, it will match your query to that title).
[13:20:36] if you type in 'during the russian invasion of Ukraine', subphrase search will also find things like 'Timeline of the...' 'Economic impact of the....' 'List of deaths during the...' for that query
[13:32:49] \o
[13:34:01] note that subphrase matching is only available on some wikis, particularly wikisource, mw.org (and maybe wikitech?)
[13:34:27] 👍
[13:34:34] how come it's not available everywhere?
[13:34:50] it's very expensive, we essentially have to index all the possible subphrases as separate things
[13:36:13] maybe we could re-evaluate, we made that decision when we had less memory available. But the data structures that support autocomplete are strictly in-memory, and our earlier estimate was that we didn't have enough memory to turn that on everywhere
[13:36:29] kk cool
[14:02:27] * ebernhardson wonders if mw.hook in javascript is as tedious as php side hooks
[14:41:10] Trey314159: have we ever thought about using stemmed analysis for autocomplete? re: https://meta.wikimedia.org/wiki/Community_Wishlist/Wishes/Better_add-new-wikilink_searches
[14:41:31] the short of it is, our `text` analysis chain would find the result they want, but we only use plain and plain+stop
[15:27:43] ebernhardson: it's an interesting question. You could get a lot of non-exact matches competing with exact matches. You might also have false positives where a partial word looks like an inflected word and gets "stemmed".
(English examples are hard, but if you type "nighting", working toward "nightingale", you'd get matches for "night" and "nights", like "nightclub" and "nightstar".)
[15:27:50] It would also allow "typos" of more than 2 letters in some cases. In English, almost any final "s" could be ignored. For Chinese it could get weird, since adding or removing a character at the end of the query could change how things are parsed earlier in the query, making apparent prefix matches work or not work.
[15:27:51] For languages (or analysis chains) that aren't all suffix-based, the results could be more unexpected. Polish removes some prefixes, so searching for "antimat"(ter) would match "mat"(hematics). I definitely wouldn't use it for the Go box completion suggester without a lot of testing. For adding wikilinks it might be more reasonable since it wouldn't be matching as you type.
[15:29:26] adding wikilinks (via VE) does actually search as you type, using generator=prefixsearch
[15:29:49] alrighty then..
[15:30:11] I have an errand I have to run; I'll be back in 45 minutes or less, I think.
[15:30:22] kk
[16:04:39] quick break, back in ~20
[16:15:58] back
[16:45:05] sorry, been back
[17:12:15] ebernhardson I just restarted the CODFW cirrus hosts. Based on the curl you put in the task it looks like we're good, but LMK if not
[17:16:59] inflatador: looks good! thanks
[17:44:18] np, sorry I missed it yesterday
[17:44:24] Lunch, back in ~40
[18:40:27] back
[18:49:46] ryankemper ebernhardson I have to miss pairing today, dentist appt
[18:50:03] ack
[18:50:12] kk
[19:20:26] looks like the OpenSearch k8s operator has built-in support for rolling operations.
Should be fun to test out https://github.com/opensearch-project/opensearch-k8s-operator/blob/main/docs/designs/upgrade.md
[19:28:26] curious, never seen a container build fail with "unknown: blob upload invalid": https://gitlab.wikimedia.org/repos/search-platform/cirrussearch-opensearch-image/-/jobs/558531
[19:28:39] * ebernhardson hopes "turn it off and back on" is the appropriate solution :P
[19:31:29] nope :S
[19:39:03] poking around in logstash, found the error. No clue what to do with it though... err.detail="blob invalid length" err.message="blob upload invalid"
[19:55:42] I wanna say that has something to do with docker-registry garbage collection
[19:56:51] and yeah, probably just try again
[19:57:05] tried a few times, sadly hasn't helped :(
[21:27:51] ebernhardson: just emerged from JAR hell and apparently the discolytics script does what it’s supposed to (write to a kafka-test topic). I’ll remove the `.limit(10)` safety leash and then that part should be good to go. For airflow-dags I have to release a new version of event-utilities-spark first.
[21:41:58] pfischer: sounds fun :) awesome that it's working now
[21:45:45] ebernhardson: yeah, it’s satisfying but very much annoying in hindsight… all that seems fragile and overly complex
[21:47:09] * pfischer will dream of single-language-containerized applications
[21:49:45] lol, i suppose it can be
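
Note on the 13:34:50 point that subphrase matching is expensive because every subphrase is indexed separately: a rough Python sketch of why the index grows. It assumes a "subphrase" is a word-aligned suffix of a title, which the log does not confirm, and the names `subphrases`, `build_entries` and `complete` are made up for illustration. This is a toy model, not CirrusSearch code.

```python
# Toy model only: assumes subphrases are word-aligned suffixes of a title.
# None of these names come from CirrusSearch.

TITLES = [
    "Timeline of the Russian invasion of Ukraine",
    "Economic impact of the Russian invasion of Ukraine",
    "Russian invasion of Ukraine",
]


def subphrases(title: str) -> list[str]:
    """Return every word-aligned suffix of a title, longest first."""
    words = title.split()
    return [" ".join(words[i:]) for i in range(len(words))]


def build_entries(titles: list[str], subphrase: bool = False) -> list[tuple[str, str]]:
    """Build (lowercased key, title) pairs; one key per title unless subphrase is on."""
    entries = []
    for title in titles:
        keys = subphrases(title) if subphrase else [title]
        entries.extend((key.lower(), title) for key in keys)
    return entries


def complete(entries: list[tuple[str, str]], query: str) -> list[str]:
    """Return titles with at least one key starting with the query, in index order."""
    q = query.lower()
    matches = []
    for key, title in entries:
        if key.startswith(q) and title not in matches:
            matches.append(title)
    return matches


if __name__ == "__main__":
    default_entries = build_entries(TITLES)
    subphrase_entries = build_entries(TITLES, subphrase=True)
    print(len(default_entries), len(subphrase_entries))  # 3 19
    print(complete(default_entries, "russian invasion"))
    print(complete(subphrase_entries, "russian invasion"))
```

With three titles the default index holds 3 keys and the subphrase index holds 19, one per word in each title. Only the subphrase index returns the 'Timeline of the...' and 'Economic impact of the...' titles for the query "russian invasion", which is the behaviour described at 13:20:36.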