[13:16:08] \o
[14:01:31] I posted in phabricator https://phabricator.wikimedia.org/T413319#11860597 but am interested in working at the Milan hackathon on this task, adding a date range filter to MediaSearch on Commons. It requires adding a new field to the Cirrus mapping with the TimeValue, and I wonder about the feasibility of this. I have patches as a proof of concept
[14:02:22] like is adding a field to the mapping difficult? or something we don't want to do? it would be for commons only, based on defining it in WikibaseMediaInfo
[14:02:26] aude: at a general level it should be pretty feasible, getting new fields on commonswiki is a little tedious (it's the slowest to reindex) but not a problem. I'm not sure on the right level of generic-ness for the field
[14:04:02] we already have some date fields, so cirrus should have the appropriate support to add them, if not it shouldn't be too complicated
[14:04:18] all the other Wikidata-based fields use keyword fields and compact the values into a string (e.g. 'p31=Q5')
[14:05:05] I see "timestamp" : "2026-04-25T23:41:17Z",
[14:05:05] "create_timestamp" : "2026-04-25T22:52:06Z",
[14:05:32] hmm, actually the time fields we define currently are defined with static values in MappingConfigBuilder, so you would have to define a new "field" type through the field factory iiuc
[14:05:53] i am thinking type=date and format=date_optional_time
[14:06:46] tricky part is we might only have a date and not a precise timestamp, but I think that still works. but maybe not if we know just the year
[14:07:07] aude: should be fine, we use date_optional_time for the revision timestamps i believe
[14:07:13] great
[14:07:52] for now, I would look at dates with precision of a day or more, but maybe there is a way to expand it to support years and make them be January 1 (out of scope for now)
[14:10:55] looks like we do have the bit for datetimes generically, via SearchIndexField::INDEX_TYPE_DATETIME
[14:11:06] that does date_optional_time
[14:14:43] I did see that "dateOptionalTime" (used in MappingConfigBuilder) is deprecated and we should use "date_optional_time"
[14:18:51] yea, sadly opensearch doesn't do the deprecation warnings that elasticsearch did, so we have to poke around and find all those things one at a time
[14:19:05] in elasticsearch you could run the test suite and review the logs
[14:25:08] thanks for the reminder, submitted a patch to snake_case that one
[14:44:44] Trey314159, ebernhardson: It’s just the three of us today and I’d like to say a few words on the DPE Staff Meeting. Would you mind starting the Triage ~15’ late? Or do you want to attend the whole Staff Meeting?
[14:45:07] i'm open to either, no preference
[14:51:58] either is fine
[14:58:48] There doesn’t appear to be anything urgent in the backlog, so let’s skip triage
[14:59:04] cool
[14:59:10] kk
[17:37:20] getting `java.lang.IllegalArgumentException: Unknown tokenizer type [nori_tokenizer] for [nori_tok]` on the new cloudelastic1010, I think it has to do with puppet lookups/merge
[17:37:42] we might have to change it for cloudelastic in general like we did for our first trixie host (1012) https://gerrit.wikimedia.org/r/plugins/gitiles/operations/puppet/+/refs/heads/production/hieradata/hosts/cloudelastic1012.yaml
[17:42:31] inflatador: that's weird! if the nori plugin isn't available, it shouldn't try to use it. If plugins are intentionally unavailable, reindexing should generate a config that doesn't look for things that aren't there.
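For the date-range field discussed above (14:05–14:11), a minimal sketch of what the resulting mapping and filter could look like, assuming direct access to the OpenSearch REST API via Python's requests library; the host, index name, and the `statement_dates` field name are placeholders, not the actual CirrusSearch configuration (which would go through the field factory / SearchIndexField::INDEX_TYPE_DATETIME instead). `date_optional_time` accepts both full timestamps and day-precision values, which matches the "date but not a precise timestamp" case mentioned above.

```python
import requests

OPENSEARCH = "http://localhost:9200"  # placeholder host
INDEX = "commonswiki_file"            # placeholder index name
FIELD = "statement_dates"             # hypothetical field name

# Add a date field to the index mapping. date_optional_time accepts
# both "2020-06-01T12:00:00Z" and day-precision values like "2020-06-01".
requests.put(
    f"{OPENSEARCH}/{INDEX}/_mapping",
    json={
        "properties": {
            FIELD: {
                "type": "date",
                "format": "date_optional_time",
            }
        }
    },
).raise_for_status()

# A date-range filter over the new field, e.g. "files whose date falls in 2020".
resp = requests.post(
    f"{OPENSEARCH}/{INDEX}/_search",
    json={
        "query": {
            "bool": {
                "filter": [
                    {"range": {FIELD: {"gte": "2020-01-01", "lte": "2020-12-31"}}}
                ]
            }
        }
    },
)
print(resp.json()["hits"]["total"])
```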
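To narrow down the `Unknown tokenizer type [nori_tokenizer]` error above, one quick check is to compare the plugins each node actually reports against the tokenizer types referenced in an index's analysis settings. A minimal sketch, again assuming direct REST access; the host and index names are placeholders.

```python
import requests

OPENSEARCH = "http://localhost:9200"  # placeholder host
INDEX = "kowiki_content"              # placeholder index

# Which plugins does each node report? analysis-nori should show up here
# on every node expected to host Korean-analyzed shards.
plugins = requests.get(
    f"{OPENSEARCH}/_cat/plugins", params={"format": "json"}
).json()
for row in plugins:
    print(row["name"], row["component"], row["version"])

# Which tokenizer types does the index's analysis config reference?
settings = requests.get(f"{OPENSEARCH}/{INDEX}/_settings").json()
index_settings = settings[list(settings)[0]]["settings"]["index"]
for name, tok in index_settings.get("analysis", {}).get("tokenizer", {}).items():
    print(name, "->", tok.get("type"))
# Any tokenizer type here (e.g. nori_tokenizer) provided by a plugin that is
# missing from the _cat/plugins output will trigger errors like the one above
# when a shard of that index lands on the node without the plugin.
```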
[17:44:50] hmm, i see analysis-nori in the plugins list
[17:45:09] maybe the new version has changed the name? looking
[17:47:05] according to the 2.19 opensearch branch, it's still called nori_tokenizer. how weird
[17:48:26] inflatador: oh! actually i misspoke, i only see the 1.3.20 nori plugin, the 2.19.5 hosts don't have it.
[17:50:51] curiously, i do see the nori plugin on-disk on cloudelastic1010
[18:09:35] i got nothing :P It looks like it should work, nothing in startup logs suggests it failed, but it also never mentions analysis-nori. weird.
[18:30:08] ebernhardson actually that's quite helpful. I had to do some hackery to hide the plugins we don't want on OpenSearch 2 hosts, ref https://phabricator.wikimedia.org/T423327#11832002 . I should probably update the code to allowlist all plugins from our own pkg
[18:45:39] turns out, labtestwiki was added to deleted.dblist in late 2024, but we still have indexes. Cleaning them up manually (this turned up when verifying some mappings; it hadn't been updated).
[18:46:18] nice
[19:03:39] Still getting a warning `[cloudelastic1010-cloudelastic-chi-eqiad] [jawiki_content_1764816632][0] no index mapper found for field: [_type] returning default postings format` ...which plugin am I missing this time? ;(
[19:06:01] hmm, no that sounds like us not finishing a deprecation. looking
[19:06:50] a long, long time ago you could have multiple mappings for a single index, differentiated by _type. They got rid of that but it took a while, we have done single-type for a long time but maybe missed something
[19:13:47] Looks like shards are landing on that host now, so probably not urgent. I created some pastes of plugins on 1 vs 2 at https://phabricator.wikimedia.org/P91690 and https://phabricator.wikimedia.org/P91691 if it helps
[19:14:02] it might actually be the queries, rather than the indexes. Not quite sure yet
[19:17:59] I see it on `kowiki_content` shards too if that helps at all
[19:37:55] still not sure... captured some requests flowing into cloudelastic to see if anything jumps out, but i'm only seeing saneitizer in my sample
[19:38:50] Also, I'm not sure how I missed it, but it looks like OpenSearch has a terraform provider: https://registry.terraform.io/providers/opensearch-project/opensearch/latest/docs/resources/cluster_settings . Maybe we could use it to manage some of those cluster dynamic settings
[19:56:07] from what i can tell this error shouldn't block anything. I wish i could find a reproduction though...
[20:07:45] OK, so setting `node.concurrent.recoveries: 40` might have been a little aggressive ;) https://grafana.wikimedia.org/goto/bfkdv0apepjb4f?orgId=1
[20:07:57] I see some suggestions that it may just be noise we have to live with during the transition, and it will go away once we reindex
[20:08:58] That's fine, most of the problems were related to me screwing up the list of allowed plugins in hiera
[20:10:01] it did remind me though, i should probably double check the streaming updater. I'm almost certain it sends "_type": "_doc" in the updates, but maybe we can drop those since that's been the default (and only) value for a while
[21:02:26] something odd is going on in cloudelastic chi, maybe an instance is flapping
[21:06:24] looks like it's complaining about primary shards being too old
[21:08:03] might be related to my aggressive recovery settings from earlier
[21:08:37] tuning it back down to 20
[21:13:18] yeah, that fixed it. not 100% sure why
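For the recovery-concurrency tuning above: assuming `node.concurrent.recoveries` refers to the dynamic `cluster.routing.allocation.node_concurrent_recoveries` setting, a minimal sketch of inspecting and lowering it through the cluster settings API; the host is a placeholder. The Terraform resource linked at 19:38 manages this same settings document declaratively.

```python
import requests

OPENSEARCH = "http://localhost:9200"  # placeholder host

# Show the currently applied transient/persistent values (and the default).
current = requests.get(
    f"{OPENSEARCH}/_cluster/settings",
    params={"include_defaults": "true", "flat_settings": "true"},
).json()
for scope in ("transient", "persistent", "defaults"):
    value = current.get(scope, {}).get(
        "cluster.routing.allocation.node_concurrent_recoveries"
    )
    if value is not None:
        print(scope, value)

# Lower the per-node cap on concurrent incoming/outgoing recoveries.
requests.put(
    f"{OPENSEARCH}/_cluster/settings",
    json={
        "transient": {
            "cluster.routing.allocation.node_concurrent_recoveries": 20
        }
    },
).raise_for_status()
```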
[22:17:32] I set recoveries down to 10, but 12 are actively migrating and that's definitely too many for cloudelastic
[22:17:48] the API is taking several seconds to respond
[22:19:28] yup, our hardware def can't handle that many recoveries at once https://grafana.wikimedia.org/goto/cfke6sslb0phcf?orgId=1
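A minimal sketch for checking how many recoveries are actually in flight per node, as with the "12 are actively migrating" above, using `_cat/recovery?active_only=true`; the host is a placeholder.

```python
import requests

OPENSEARCH = "http://localhost:9200"  # placeholder host

# List only recoveries that are still running, grouped by target node,
# to see how many shards each host is pulling at once.
active = requests.get(
    f"{OPENSEARCH}/_cat/recovery",
    params={"active_only": "true", "format": "json"},
).json()

per_node = {}
for rec in active:
    per_node.setdefault(rec["target_node"], []).append(
        f'{rec["index"]}[{rec["shard"]}] {rec["stage"]} {rec["bytes_percent"]}'
    )

for node, shards in sorted(per_node.items()):
    print(f"{node}: {len(shards)} active")
    for line in shards:
        print("  ", line)
```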