[07:08:33] o/
[07:13:26] o/
[09:20:51] errand + lunch
[09:46:07] lunch
[13:21:42] o/
[13:53:49] Trey314159: I have a bit too much going on today, can we skip our 1:1 this week?
[13:54:20] gehel: sure
[13:54:46] thanks!
[14:14:21] \o
[14:14:25] o/
[14:14:29] .o/
[14:32:33] o/
[15:05:01] ryankemper we're in standup if you wanna join
[15:18:17] something's off with the unified highlighter... I can't make the "matched_fields" bit to work the way I expect, i.e. when searching something with a stop word it should rely on the plain field to obtain the offsets but I'm only getting matches from the text field :/
[15:22:05] sigh... was looking at the elastic doc which says the unified highlighter does support matched_fields but opensearch says: "Valid only for the fvh highlighter" :(
[15:32:32] :S
[15:36:56] elastic refactored a bunch of things between https://github.com/elastic/elasticsearch/tree/v7.10.2/server/src/main/java/org/elasticsearch/search/fetch/subphase/highlight & https://github.com/elastic/elasticsearch/blob/main/server/src/main/java/org/elasticsearch/search/fetch/subphase/highlight/ opensearch stayed almost the same :(
[15:38:12] ;( Is that true for OpenSearch 2 as well?
[15:38:41] yes...
[15:39:45] damn. Does that mean we'll have to build our own highlighters?
[15:40:01] we already do :P The hope was to get rid of it
[15:40:09] ^ :)
[15:51:23] workout/errands, back in ~90
[15:51:32] seeing a 3.0.0-beta1 in opensearch github repo
[15:51:47] just in time for us to always be behind :)
[15:52:12] :)
[16:10:26] dcausse: do you think it matters if we do two passes of rewriting the regex? I have some code mostly finished to add char classes (like \w+) support, but currently i have it mixed into the replaceAnchors() loop. Thinking it might be simpler and more obvious as two separate passes
[16:10:48] but it kinda does the same thing...
[16:11:26] ebernhardson: I think 2 passes on a string would be barely noticeable in this context?
[16:12:20] i'm also not sure yet what to do with something like `[abc\D]`, right now it turns into [abc^0-9] which is wrong, but then again what would [abc\D] even mean?
[16:12:45] yea i suppose in the context of the expense about to be incurred, and knowing these regexes have to be reasonably short, should be unnoticeable
[16:14:32] \D is hard indeed if you transform to the negated \d class... :/
[16:17:47] i was trying to think of an equivalent construction, maybe ([abc]|[^0-9])? But something about that seems off
[16:22:25] pc crashed while building opensearch with gradle :/
[16:22:30] lol
[16:24:55] well it's been quite frequent these days that my pc freezes on heavy loads, hope it's not about to die...
[16:26:11] Hmm, hard to diagnose :( I would randomly guess power supply, but have no clue how to test other than throwing the parts bin at it
[16:26:22] could be bad memory
[16:33:40] yes... I changed the power supply couple years ago, could be mem I could let it run a check tonight
[16:35:20] i guess memtest86 could run overnight to check memory, something like prime95 could probably tax the cpu while doing minimal memory pressure to tax the other side
[16:35:30] (not at same time)
[16:39:21] back
[16:48:30] oops, looks like I have to run errand after all...back in ~1h
[17:35:39] * ebernhardson is realizing handling negations, and negated negations, is rather non obvious when expanding character classes :S
[17:35:46] that or i don't understand regexp as well as i hope :P
[17:48:43] oh my...i just looked up what utf8 digit character classes are...there are a lot of number variants :P
[18:36:23] i guess we go with the simplified answer...don't support negated shorthands inside a character class. so [\d\w] is fine, but [\D\w] is both weird and hard to convert
[18:55:51] well, that took quite a bit longer than I though
[18:55:51] t
[18:56:40] I'm not feeling great...going to take the rest of the day off. See ya tomorrow!
[19:13:04] ebernhardson: yeah, digits aren't always as simple as "౧, ٢, ꤃"... lol
[19:16:20] i think i'm going to ignore those and use the mdn definition of "0-9". It's a bit unsatisfying, but it's better than \d doing nothing
[19:16:35] similarly for \w being "A-Za-z0-9_"
[19:17:39] * ebernhardson hasn't decided what to do with "[A-\d]"...currently it's "[A-0-9]"...but if it was wrong to start, it can come out wrong too? :P
[19:23:34] Apparently what cirrus does today with that is `An error has occurred while searching: Regular expression syntax error at: unknown: unknown`
[19:31:39] Those are plenty useful definitions of \d and \w. [A-\d] being an error is fine, too, IMHO. Bare basic character classes are still going to be a big improvement. The 80/20 solution is good!
[19:49:38] hopefully :) But tracking down all the edge cases....for example "alpha" matches the regex "\W", because "alpha" is really "\uE000alpha\uE001" :S
[19:50:02] but maybe that doesn't matter
[20:05:42] oh my...actually the problem is there is a .toLowerCase() call before expansion...
[20:06:17] maybe easy answer is no \D at all, they can just use [^\d]
[20:07:30] well, no, because if they send \D they still get \d...i guess i have to reorder
[20:42:58] well, i guess good thing i went with char class expansion, because it turns out at a bare minimum anything like [^abc] needs to turn into [^\uE000\uE001abc] with the anchor handling
[20:45:30] hmm, there are so many edge cases :S What about "abc" being matched by "bc." because of the trailing anchor? :S
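
(note) A rough sketch of the character class expansion discussed above, written in Python for illustration only. It is not the actual CirrusSearch code, which is not shown in this log. The function name `expand_char_classes`, the constant names, and the choice to raise an error for a negated shorthand inside a class are all assumptions. Expanding a bare \D to a negated class that also excludes the anchor markers is an extrapolation from the [^abc] case mentioned at 20:42:58, not something stated in the chat. It also does not handle a literal `]` placed first inside a class.

```python
# Anchor marker characters as mentioned in the chat; the names are hypothetical.
START_ANCHOR = "\uE000"
END_ANCHOR = "\uE001"

# Simplified ASCII-only definitions ("0-9" and "A-Za-z0-9_") from the chat.
SHORTHANDS = {"d": "0-9", "w": "A-Za-z0-9_"}


def expand_char_classes(pattern: str) -> str:
    r"""Expand \d and \w and keep negated classes from matching the anchors."""
    out = []
    i = 0
    in_class = False
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            nxt = pattern[i + 1]
            if nxt in SHORTHANDS:
                body = SHORTHANDS[nxt]
                # Inside [...] only the range text is needed.
                out.append(body if in_class else "[" + body + "]")
            elif nxt.lower() in SHORTHANDS:
                if in_class:
                    # Assumed policy: [\D\w] and friends are rejected.
                    raise ValueError("negated shorthand inside a character class")
                out.append("[^" + START_ANCHOR + END_ANCHOR + SHORTHANDS[nxt.lower()] + "]")
            else:
                # Any other escape is copied through untouched.
                out.append(ch + nxt)
            i += 2
            continue
        if ch == "[" and not in_class:
            in_class = True
            if pattern.startswith("[^", i):
                # A negated class must not match the anchor markers.
                out.append("[^" + START_ANCHOR + END_ANCHOR)
                i += 2
                continue
        elif ch == "]" and in_class:
            in_class = False
        out.append(ch)
        i += 1
    return "".join(out)
```

With this sketch, `[\d\w]` becomes `[0-9A-Za-z0-9_]` and `[^abc]` gains the two anchor characters right after the `^`. Lowercasing the pattern before calling it turns `\D` into `\d`, which is the ordering problem noticed at 20:05:42, so any case folding would have to happen after expansion or skip escape sequences.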