[08:40:17] errand
[10:15:12] lunch
[12:36:36] \o
[12:39:21] o/
[13:00:42] .o/
[13:01:23] o\
[13:21:09] o/
[13:28:30] not sure it's worth fully digging through yet, but i've been trying a series of skills from matt pocock on building development plans with claude, this is what i've come up with so far on first-class redirects: https://phabricator.wikimedia.org/P93401
[13:28:33] might be curious to peruse
[13:30:09] mostly it involves making claude ask you a bunch of clarifying questions (dozens) and summarizing the responses
[13:37:52] nice, haven't read through it all yet but what I saw so far seems reasonable
[13:54:22] huh, i got an email `You are invited to join Wikimedia in Miro`. Followed link, can see a board but it says i'm logged out. If i tell it to login as ebernhardson@wikimedia.org okta says i'm not assigned to miro
[14:08:19] hm... looking into the updater to see how we handle redirects whose targets change from one page to another but not seeing anything explicit about that
[14:09:09] context is T421237 where namespace_id is enforcing a non-negative ns but some events are failing validation because you can have a redirect to a special page
[14:09:10] T421237: `mediawiki.page_change.v1`: two schema validation errors causing events to be silently dropped by EventGate - https://phabricator.wikimedia.org/T421237
[14:10:48] checked the code and we do not ignore negative ns explicitly and thus the behavior of the updater might be unexpected if we start seeing special pages in the redirect targets
[14:17:11] hm... mediawiki seems unable to tell what was the previous state of the redirect target when emitting page-change
[14:17:57] hmm
[14:18:56] i suspect we should allow the events with -1, but filter in the updater.
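(editor's sketch) The "allow -1 in the schema, filter in the updater" idea from 14:18:56 could look roughly like the following. This is only a minimal illustration in Python, not the streaming updater's actual code: the function name `indexable_redirect_target` is hypothetical, and the event layout (`page` -> `redirect_page_link` -> `namespace_id`) is an assumption about the page-change event shape.

```python
from typing import Any, Mapping, Optional


def indexable_redirect_target(event: Mapping[str, Any]) -> Optional[Mapping[str, Any]]:
    """Return the redirect target of a page-change event if it can be indexed.

    Assumed (hypothetical) layout: event["page"]["redirect_page_link"] holds
    the target, with an integer "namespace_id". Targets in a negative
    namespace (for example Special pages, ns -1) are dropped because the
    target page itself is never indexed.
    """
    page = event.get("page") or {}
    target = page.get("redirect_page_link")
    if not target:
        # Not a redirect, or the target is unknown.
        return None
    ns = target.get("namespace_id")
    if ns is None or ns < 0:
        # Event is valid, but the updater ignores this redirect target.
        return None
    return target
```

With this shape the event still passes validation upstream, and the decision to skip special-page targets lives in one place in the consumer.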
[14:19:25] although, i could see arguments for indexing the new first-class redirects with -1...will have to ponder
[14:20:49] indeed I see no reason not to have such redirects indexed, but in our current model we should definitely ignore them because the target page can't be indexed
[14:26:22] ebernhardson: regarding https://phabricator.wikimedia.org/P93401: did you use claude *code* (with access to the sources, if so which ones?) or was it just the chat with the md files for context?
[14:26:40] pfischer: it also had access to CirrusSearch and streaming-updater sources
[14:26:56] was part of how it came up with questions, but also following up on things i told it
[14:27:28] it was using the /grill-me, /grill-with-docs, and /to-prd skills at https://github.com/mattpocock/skills
[14:27:54] ebernhardson: thanks!
[14:43:17] BTW, I spoke with Martin from Research about query-routing experiments for the next semantic search phase. Research is interested and would be happy to support. I also mentioned dcausse's idea in T427519 (using LLM-generated labels to compare lexical vs. semantic results and potentially train a routing model), which resonated well.
[14:43:18] T427519: Compare lexical and semantic search results with LLM ranker/judge - https://phabricator.wikimedia.org/T427519
[14:43:39] How would you feel about Search taking the lead on this line of work, with Research collaborating and supporting? It strikes me as a fairly search-centric problem and potentially a good opportunity for us to drive a visible piece of the semantic search roadmap.
[14:44:09] +1
[14:45:11] yea, sounds good to me
[14:48:43] sounds good
[14:49:31] Thanks!
[14:55:57] * ebernhardson wonders if subclassed SearchContexts, like FullTextSearchContext, would make things less messy...or more.
but seems tedious to even start working through
[14:56:32] the general problem is we have to pass around if the keyword is used, that probably goes in SearchContext somewhere, but that is only relevant in full-text and not everywhere
[14:57:27] currently was going to key off the syntaxUsed array, but not fully traced if the order of operations works for everything
[15:07:58] ebernhardson: you mean a keyword like withredirect?
[15:08:17] dcausse: yea, because the keyword is a state change that affects other things (intitle shouldn't hit redirect.title)
[15:08:30] I initially planned to drop SearchContext but not sure that's realistic
[15:08:39] naively, `withredirects:true intitle:foo` could result in different queries if the order was flipped
[15:08:46] and i don't want that :)
[15:08:52] for global state like that I thought that SearchQuery would be a good fit
[15:09:09] but it might not be available everywhere
[15:09:09] yea i remember, it's a bit of a god object that holds everything that needs to be used somewhere far away
[15:09:53] if SearchQuery has the info I have no issue copying that bit to SearchContext waiting for a better place to have it
[15:10:05] seems reasonable, i'll see if that's possible
[15:10:41] SearchQuery should deal with 'namespace:' prefixes so hopefully it should be doable
[15:10:58] although I'm not totally sure that the namespace prefix thing is a keyword
[15:11:37] yea, PrefixFeature is a SimpleKeywordFeature, it adjusts the SearchContext
[15:11:39] but for simplicity I think it's preferable if 'withredirect:' is a global keyword (not usable anywhere else)
[15:12:04] you mean it has to be the first one basically? i suppose that would make it a little simpler
[15:12:39] because I don't want to think about the possible mess once we support parentheses, e.g. (intitle:foo AND withredirects:true) OR intitle:bar
[15:13:04] lol, yea fair.
I mean i could imagine some way to handle that, but it's complexity that's totally unnecessary
[15:13:14] would need per-level contexts or some such
[15:13:32] yes...
[15:15:22] it could even be a keyword without values, the true in 'withredirects:true' is kind of already implied, I think queries like 'withredirects:intitle:foo insource:/bar/' would be explicit enough
[15:16:37] there could be a 'isredirect:(true|false)' keyword usable in this mode but I'm not sure that's necessary
[15:16:39] hmm, yea i suppose that's reasonable. I was pondering that there might be more than a true/false in the future. withredirects:true vs withredirects:orphan or some such, but hadn't really pinned down if anything was necessary
[15:18:54] but requiring it strictly as a prefix to the query keeps the whole thing simpler
[15:22:43] yes, at least for me, I think it makes the reasoning a bit simpler to think of this as two different search modes
[15:33:20] ebernhardson: if you have a couple min (https://gitlab.wikimedia.org/repos/search-platform/cirrus-streaming-updater/-/merge_requests/202), should be relatively straightforward
[15:35:02] sure, looking
[15:38:29] yea super simple, looks great
[15:39:23] looks like the project fell out of sonarqube, kinda annoying
[15:43:26] yes, it's been a while, no clue how to force that, I guess you need to be admin there?
[15:53:42] yea i've never done it, i think geh.el always did
[16:02:30] i wonder if there is some way to make gitlab auto-build it once a month just to keep sonarqube happy
[16:04:48] yea, there is a build -> pipeline schedules. Going to set it up to run at least once mid month, might need to adjust schedule later.
[16:05:39] thanks!
[16:32:09] heading out, have a nice week-end
[16:59:09] huh, just noticed claude has a `/team-onboarding` that probably does what i asked it to do, but with prompts that have likely been iterated on.
[17:00:51] .o/
[23:52:50] heading out. I still have to ponder what happens with the all field.
It's a bit awkward that withredirects:foo will match the all field in a page where the only place foo appears is redirect.title, even though we won't be querying redirect.title
[23:53:23] but maybe we just allow it and expect it to have a very low score since all the should clauses won't match
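(editor's sketch) The prefix-only `withredirects:` idea discussed between 15:11 and 15:22 could be illustrated as below. This is a minimal Python sketch under stated assumptions, not CirrusSearch code: `split_mode_prefix` and `ParsedQuery` are hypothetical names, and the valueless `withredirects:` form is only one of the variants floated in the discussion. The point it shows is that the mode is decided once, before any other keyword is parsed, so keyword order cannot change the resulting query.

```python
from typing import NamedTuple

# Hypothetical spelling of the mode prefix; the final keyword was not settled.
PREFIX = "withredirects:"


class ParsedQuery(NamedTuple):
    with_redirects: bool  # global search mode, fixed before keyword parsing
    remainder: str        # the rest of the query, parsed as usual afterwards


def split_mode_prefix(query: str) -> ParsedQuery:
    """Detect the mode keyword only when it is the very first token."""
    stripped = query.lstrip()
    if stripped.startswith(PREFIX):
        return ParsedQuery(True, stripped[len(PREFIX):].lstrip())
    # Anywhere else the text is left alone, so it never flips the mode
    # halfway through the query or inside a parenthesised group.
    return ParsedQuery(False, stripped)
```

For example, `split_mode_prefix("withredirects:intitle:foo insource:/bar/")` switches the mode on and leaves `intitle:foo insource:/bar/` for the regular parser, while `intitle:foo withredirects:true` stays in the default mode.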