[09:02:40] errand
[13:03:52] o/
[13:32:26] \o
[13:33:12] o/
[14:26:39] reindexing .ltrstore into .ltrstore_ltr-temp-2025-05-26 for codfw (9243), just wanted to see how much time it takes
[14:27:10] definitely not immediate :/
[14:35:30] :S was hoping it would be quick
[14:35:34] maybe we prune some old models?
[14:35:40] (not this second, but in general)
[14:42:10] ebernhardson: we're in https://meet.google.com/aco-odmr-cpw?authuser=0 if you want to/can join
[14:42:30] .ltr is 2.8GB so definitely not small
[15:04:09] inflatador: the .ltr index should be ready, you can switch traffic back to multi-DC (or codfw only if you want me to upgrade indices in eqiad)
[15:05:57] back, timing is hard as that's when i do a school run. School is out for summer starting next wed though, so should have more flexibility
[15:07:54] np!
[15:08:23] checked https://search.svc.codfw.wmnet:9243/_ltr/_model/enwiki-20241122-20180215-query_explorer and it's working OK
[15:08:33] dcausse OK, I have repooled CODFW. I can depool EQIAD now if you are ready to update the indices
[15:08:53] small annoyance is that the index is marked as "hidden": true, not clear how that'll interfere with some of the various scripts we have
[15:09:31] inflatador: thanks, checking that everything works ok, will ping you to depool eqiad
[15:17:53] copying data to .ltrstore_ltr-temp-2025-05-26 in eqiad
[15:25:14] hm.. reindex failed with "ReleasableBytesStreamOutput cannot hold more than 2GB of data"... probably need to tune some batch size or the like
[15:40:25] will see what comes out: since y'all probably also got the claude emails, i'm asking it to review my history and try to assemble some info about where it's worked, where it might have struggled, etc.
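One knob that plausibly relates to the 2GB failure at 15:25 is how much data a single reindex batch pulls at once. A minimal sketch, assuming the standard `_reindex` request body where `source.size` sets the scroll batch size (default 1000 documents). The eqiad URL is an assumed analogue of the codfw endpoint above, and a batch size of 50 is an arbitrary guess, not a value from the log. The request is only built here, not sent.

```python
import json
import urllib.request

# Assumption: eqiad analogue of the codfw endpoint mentioned in the log.
BASE_URL = "https://search.svc.eqiad.wmnet:9243"


def build_reindex_body(source_index: str, dest_index: str, batch_size: int) -> dict:
    """Reindex body with a smaller scroll batch (source.size defaults to 1000)."""
    return {
        "source": {"index": source_index, "size": batch_size},
        "dest": {"index": dest_index},
    }


def make_reindex_request(base_url: str, body: dict) -> urllib.request.Request:
    # wait_for_completion=false makes the cluster return a task id instead of
    # holding the HTTP connection open for the whole copy.
    url = f"{base_url}/_reindex?wait_for_completion=false"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


body = build_reindex_body(".ltrstore", ".ltrstore_ltr-temp-2025-05-26", batch_size=50)
req = make_reindex_request(BASE_URL, body)
# Not sent here; urllib.request.urlopen(req) would submit it.
```

If a single stored model document is itself close to the 2GB limit, shrinking the batch will not help and the oversized document would need to be handled separately.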
[15:40:42] guessing we might want to talk about it a bit at wed-meeting
[15:42:09] just tried it on barrybot.py to make it loop over a list of images, still reviewing what it wrote
[15:44:34] inflatador: I should be ready, feel free to depool eqiad whenever you want
[15:46:22] https://code.claude.com/docs/en/common-workflows is a decent starter on usage, in particular the bits about using plan mode and iterating on the plan before letting it go to work if the task has any complexity
[15:46:32] dcausse OK, I just depooled eqiad
[15:46:42] thanks!
[15:54:20] ok doing the cleanup, search thread pool almost flat near 0 in eqiad
[15:55:43] ebernhardson: yeah, some pointers on claude would be useful. Did you use Kosta's wmf-claude or set up your own?
[15:56:24] Trey314159: using claude code directly, not sandboxed. But i run it in the mode where it constantly asks me about tool invocations, mostly only auto-allowing editing files within the current base dir (where i can easily see changes via git diff)
[15:56:59] ryankemper ^^ Just FYI, eqiad is depooled ATM
[15:57:00] the problem with sandboxing is i need random java versions, it needs to jump into mediawiki containers to run unit tests, etc.
[15:57:24] used wmf-claude but I need a better way to show diffs
[16:00:14] using wmf-claude I had to sign up with plain claude before running the sandboxed wrapper, because obviously it could not interact with my browser for the signup
[16:06:39] inflatador: all done from my side, feel free to repool eqiad
[16:07:27] old index list is down to only non-cirrus indices (ttmserver & toolhub)
[16:07:34] nice!
[16:07:51] dcausse ACK, just repooled eqiad
[16:08:28] perhaps unsurprisingly, claude's summary of usage is "Most successful sessions are characterized by clear problem definition, iterative refinement, and comprehensive testing strategies."
[16:11:39] :)
[16:13:29] inflatador: thanks!
[16:13:58] this is what it came up with, seems generally accurate: https://phabricator.wikimedia.org/P93048
[16:16:19] it's very impressed with itself for helping with the trigram algorithm, but i would probably score it a bit lower than it scored itself
[16:19:18] the places it really shone with trigram were the property-based testing and writing the regex generator; the algorithmic port from golang was helpful but i had to go in and rework a variety of parts (and some i should have reviewed more carefully)
[16:19:44] "It wasn't wrong in a dumb way; it was wrong in ways that required domain knowledge to catch", wondering what domain knowledge it required here
[16:20:42] I thought that this task was relatively self-contained
[16:20:59] FYI I'm going to punt on Relforge for now. Going to reimage it back to its traditional baremetal config, but running OpenSearch 2/Trixie
[16:21:26] inflatador: sounds good
[16:21:52] sec lemme find that MAX_SET code, i remember it wrote the test and when i looked at it it was missing the main purpose of the MAX_SET handling. One could argue i should have better defined what i wanted beyond "write test cases that validate MAX_SET works as expected"
[16:26:50] so the part it missed is that simplifySet() has functionality to reduce set size by shortening strings, but the only test it wrote was one that resulted in the expression `true`
[16:27:57] basically assertEquals(True.instance(), extractBigrams("[a-u]z"))
[16:28:52] when the test we wanted was more like `assertEquals(..., extractTrigrams("([a-u]a)bc"))`, which demonstrates it can move prefix/suffix between AST constructs
[16:29:39] or similarly with ([a-j]a|[a-k]b)cd
[17:32:29] Do our Elastic exporters capture anything related to cluster state? I'm guessing not, but I was wondering if we could see banned nodes
[17:36:59] inflatador: hmm, no i don't think so
[17:37:10] it's either from _stats or {index}/_stats iirc
[18:42:28] ebernhardson ACK, I'm thinking of a way to monitor for banned hosts. Extremely low priority, but we might be able to do it indirectly with shard counts or something
[18:44:46] Come to think of it, we might already have a monitor like that
[18:47:37] cirrussearch reboots are done. It looks like we have a few hosts (apifeatureusage and search-loader) left to do. Should probably get those off Bullseye too, will make a ticket
[19:29:17] hmm, of the sessions with > 1 identity, they have a mean of 13 identities with 4.2 IPs, 13.2 x-forwarded-for, and 3.5 user-agents :S
[19:30:13] It seems like x-ff is probably the main driver, but this suggests to me that our method of using those to build a "unique" token and then select A/B test buckets is falling apart in the modern environment
[19:30:24] which further pushes the idea that we should integrate test-kitchen
[19:31:05] for the top 20 sessions by number of identities, they all have one x-ff per identity
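The MAX_SET cases discussed between 16:21 and 16:29 can be illustrated with a toy model: when a set of candidate strings grows past a limit, shorten the strings (keeping the suffix for suffix sets, the prefix for prefix sets) and deduplicate until the set fits, then turn each surviving string into an AND of its trigrams. This is a hypothetical Python sketch loosely modeled on the simplifySet() idea, not the actual implementation. `MAX_SET = 20`, `simplify_set`, and `trigram_query` are assumptions, and the regexes are expanded by hand rather than parsed.

```python
from string import ascii_lowercase
from typing import Optional

MAX_SET = 20  # assumption: the real limit is not stated in the log


def _shorten(s: str, n: int, keep_suffix: bool) -> str:
    # Keep at most n characters from the chosen end of the string.
    if len(s) <= n:
        return s
    return s[len(s) - n:] if keep_suffix else s[:n]


def simplify_set(strings: list[str], max_set: int, keep_suffix: bool) -> list[str]:
    """Shorten strings until at most max_set distinct ones remain.

    Shortening only loosens the constraint, so the result still matches
    everything the original set matched.
    """
    current = sorted(set(strings))
    n = max((len(s) for s in current), default=0)
    while len(current) > max_set and n > 0:
        n -= 1
        current = sorted({_shorten(s, n, keep_suffix) for s in current})
    return current


def trigram_query(strings: list[str]) -> Optional[list[frozenset[str]]]:
    """OR over strings of (AND of each string's trigrams).

    Returns None, i.e. `true` (match everything), if any alternative is
    shorter than three characters, since it contributes no trigram.
    """
    if any(len(s) < 3 for s in strings):
        return None
    return [frozenset(s[i:i + 3] for i in range(len(s) - 2)) for s in strings]


# "[a-u]z": 21 two-character strings, so there are no trigrams at all.
case1 = [c + "z" for c in ascii_lowercase[:21]]
# "([a-u]a)bc": 21 strings "aabc".."uabc" sharing the suffix "abc".
case2 = [c + "abc" for c in ascii_lowercase[:21]]
# "([a-j]a|[a-k]b)cd": 10 + 11 = 21 strings ending in "acd" or "bcd".
case3 = [c + "acd" for c in ascii_lowercase[:10]] + [c + "bcd" for c in ascii_lowercase[:11]]

q1 = trigram_query(simplify_set(case1, MAX_SET, keep_suffix=True))
q2 = trigram_query(simplify_set(case2, MAX_SET, keep_suffix=True))
q3 = trigram_query(simplify_set(case3, MAX_SET, keep_suffix=True))
```

In this model `q1` is `None` with or without `simplify_set`, which is why a test built only on `[a-u]z` never exercises the shortening path. `q2` and `q3` keep useful trigrams only because shortening collapses 21 alternatives onto their shared suffixes (`abc`, and `acd`/`bcd`).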
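To see why per-request identity signals can break bucketing, here is a hypothetical model in which a token hashed from (IP, X-Forwarded-For, User-Agent) picks the A/B bucket. The log does not describe the real token construction; `identity_token`, `bucket_for`, the bucket names, and the example addresses are illustrative assumptions. A session whose XFF differs per identity, as in the top-20 sessions at 19:31, gets a new token per identity and can land in more than one bucket.

```python
import hashlib

BUCKETS = ["control", "test"]  # hypothetical bucket names


def identity_token(ip: str, xff: str, user_agent: str) -> str:
    # Hypothetical "unique" token: a hash of the three request signals.
    raw = "|".join((ip, xff, user_agent))
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


def bucket_for(token: str, buckets: list[str]) -> str:
    # Deterministic: the same token always maps to the same bucket.
    return buckets[int(token, 16) % len(buckets)]


# One logical session: fixed IP and UA, but a different X-Forwarded-For
# value per identity (13 identities, mirroring the mean in the log).
ip = "198.51.100.7"  # documentation-range address
ua = "ExampleBrowser/1.0"
xffs = [f"203.0.113.{i}" for i in range(1, 14)]

tokens = {identity_token(ip, xff, ua) for xff in xffs}
buckets_seen = {bucket_for(t, BUCKETS) for t in tokens}
print(len(tokens), sorted(buckets_seen))
```

With 13 independent tokens and two buckets, the chance that all of them land in the same bucket under a uniform hash is 2 × (1/2)^13, about 1 in 4096, so in this model such a session is almost always split across both arms.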