[10:16:28] lunch
[14:01:50] \o
[14:26:45] o/
[15:05:56] dcausse would you like me to take T406656 and do reloads on the categories graph, or would you prefer to keep troubleshooting?
[15:05:56] T406656: Reimage failed after prompt...is prompt needed? - https://phabricator.wikimedia.org/T406656
[15:07:05] surprising to me... take the last 100k events from update_pipeline.update.v1 and `grep -v TAGS_UPDATE`: how many should be left? I have 2
[15:07:06] inflatador: o/ I'm done troubleshooting; if you could fix the puppet discrepancies and do a reload that would be great :)
[15:08:10] ebernhardson: this is totally unexpected? did you scan the codfw topic?
[15:08:41] the eqiad topic should be almost empty
[15:08:48] dcausse: oh!! eqiad has 2, codfw has 99,390.
[15:08:51] hmm
[15:09:22] oldest event in eqiad is 2025-10-17T00:14
[15:09:36] sorry, newest.
[15:09:39] well, both
[15:10:07] hmm, i don't know if this is related to the existing weighted_tags problem... but maybe it's another problem :P I guess some batch update shipped through eqiad?
[15:10:09] the mw primary db is in codfw, so all writes should happen from mw@codfw
[15:10:39] yes, I think we forcibly push weighted_tags for image_rec to kafka-main@eqiad
[15:11:03] should happen weekly on fridays
[15:11:03] ahh, yea that would make sense
[15:11:14] and the 17th was a friday
[15:11:19] should be it
[15:13:13] hm.. the truncate filter we use to fix extremely long keyword fields can't be used in a normalizer apparently... an option would be to use ignore_above, but that might throw away the whole value rather than keeping part of it...
[15:13:39] hmm, it ignores the whole thing?
[15:13:56] yea... "If a string’s length exceeds the specified threshold, the value is stored with the document but is not indexed."
[15:14:25] well, I could check what the impact would be
[15:14:50] i guess a pattern_replace char filter? But stuffing regex everywhere seems meh
[15:15:26] actually I need to understand what they use to determine whether a filter qualifies for use in a normalizer or not, I'm not super clear on this
[15:16:19] initially I thought they forced you to use char_filters, but it looks like you can use a token_filter as you said, just not all of them apparently :/
[15:19:00] random claim: a normalizer accepts only filters that are instances of either NormalizingTokenFilterFactory or NormalizingCharFilterFactory
[15:21:35] sigh...
[15:22:43] as far as i can tell... not having an equivalent truncate is perhaps an oversight... if we don't like pattern_replace, i suppose a custom normalizing char filter could work, but i dunno if it's necessary
[15:23:35] ah cool, IcuFoldingTokenFilterFactory is a NormalizingTokenFilterFactory; I was mainly afraid of that one
[15:24:34] I think I can live with a pattern_replace, but I could add a quick custom truncate or upstream a change
[15:28:36] actually IIRC we have a fix from Trey for the regex highlighter to deploy, might be a good time to add that custom truncate?
[15:31:00] yea, there is already a .deb update waiting in the wings, we can add to it
[15:31:15] i don't think it's made it to apt.wikimedia.org yet, just sitting in the gitlab CI
[15:32:24] ok, pushing a patch hopefully won't take long
[15:35:14] ah, and the "length" filter we use to remove empty tokens will have to be supported as well
[15:35:59] dcausse ACK, have taken the ticket
[15:36:07] thanks!
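
A minimal sketch of the event count described at [15:07:05], assuming kcat (kafkacat) against a kafka-main broker; the broker address and the datacenter-prefixed topic name are placeholders, since the log only gives the unprefixed name update_pipeline.update.v1:

```
# Tail the last 100k messages from the topic and count those that are
# not tag updates. Broker and topic names are placeholders; note that
# with a multi-partition topic, -o -100000 applies per partition.
kcat -C -q -e \
  -b kafka-main-eqiad.example.org:9092 \
  -t eqiad.update_pipeline.update.v1 \
  -o -100000 \
| grep -vc TAGS_UPDATE
```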
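
The ignore_above behavior quoted at [15:13:56] looks like this in a mapping; the index and field names are hypothetical and 256 is just an example threshold:

```
# Hypothetical index/field; ignore_above is a standard keyword mapping parameter.
# A value longer than 256 characters is kept in _source but not indexed at all,
# i.e. the whole value is dropped from the index rather than truncated.
curl -X PUT 'http://localhost:9200/example_index' \
  -H 'Content-Type: application/json' -d '
{
  "mappings": {
    "properties": {
      "example_tag": { "type": "keyword", "ignore_above": 256 }
    }
  }
}'
```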
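
And a sketch of the pattern_replace idea from [15:14:50]: pattern_replace is among the char filters Elasticsearch accepts in a custom normalizer, so a regex that keeps only the first N characters behaves like a truncate. The index name, normalizer name, and 512-character limit are assumptions:

```
# Emulate a truncate filter with a pattern_replace char filter in a normalizer.
# Strings of 512 chars or fewer don't match the pattern and pass through
# unchanged; (?s) lets "." match newlines inside the value.
curl -X PUT 'http://localhost:9200/example_index' \
  -H 'Content-Type: application/json' -d '
{
  "settings": {
    "analysis": {
      "char_filter": {
        "truncate_512": {
          "type": "pattern_replace",
          "pattern": "(?s)^(.{512}).*$",
          "replacement": "$1"
        }
      },
      "normalizer": {
        "truncated_keyword": {
          "type": "custom",
          "char_filter": ["truncate_512"]
        }
      }
    }
  }
}'
```

The result can be spot-checked with the _analyze API:

```
curl -X GET 'http://localhost:9200/example_index/_analyze' \
  -H 'Content-Type: application/json' \
  -d '{"normalizer": "truncated_keyword", "text": "some very long tag value"}'
```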