[08:59:09] o/
[09:22:07] o/
[09:47:34] errand+lunch
[12:51:12] \o
[12:55:29] o/
[13:13:21] o/
[13:19:19] o/
[13:48:20] hmm, we might have to finally fork Elastica to change some strings
[13:51:30] oh never mind, i'm blind. it's fine :P
[13:53:46] phew :)
[13:53:51] * inflatador always thinks of https://www.youtube.com/watch?v=ilKcXIFi-Rc when I hear 'Elastica'
[17:56:04] i did not know there was a band named elastica
[18:06:57] I don't know why, but I remember Henry Rollins showing the video on MTV. Just one of those random things taking up space in my head ;P
[18:07:33] In other news, we're getting closer to OpenSearch 2: `Caused by: java.lang.IllegalStateException: index [.ltrstore/vCo9DZu5Qt-3QbtmBy1d7Q] version not supported: 6.5.4 minimum compatible index version is: 7.0.0`
[18:08:18] that means we haven't reindexed that index, which makes sense since it's a plugin index and not an index maintained by cirrus. I'm not actually sure how that ltr index is supposed to be migrated, would have to look into it.
[18:10:17] ah, I was thinking it might be related to a plugin name mismatch, I think it's called `ltr` in 1.x and `opensearch-ltr` in 2.x
[18:12:43] I wonder, if we added our custom ltr plugin to the opensearch 2 plugins packages, would it make a difference? Would it cause jar hell if we ran `opensearch-ltr` and `ltr` on the same cluster at the same time?
[18:13:17] no it's a common thing on opensearch, basically when upgrading major versions all indexes need to have been re-created in the last major version. They basically limit how old of an index format they are willing to open.
[18:16:00] cool, no rush. Headed to lunch but will take a look once I get back
[18:25:18] in my quick review, it looks annoying :P Maybe david will have better ideas. It looks like we could create a new named store, reload the data (maybe via the reindex api, needs testing), then repoint queries at the new feature store and get rid of the old one
[18:25:31] each named store gets its own index iiuc
[19:13:03] i suppose on the upside, cloudelastic doesn't use .ltr. In that cluster we can simply delete the index, but it will require a plan in prod.
[19:18:39] Maybe we failover to a single DC and delete/regenerate the index in the offline DC?
[19:20:32] hmm, probably possible. Might still be 80% of the same work though
[19:20:45] probably worth it to not have a custom named store hanging around as tech debt though
[19:22:31] If I delete the index in cloudelastic, will it cause problems? Like is there some sync process between prod and CE that might get jammed up or something?
[19:24:59] I guess it all comes from mjolnir?
[19:25:27] In cloudelastic it should do nothing, it's only accessed when issuing a specific kind of query. Lemme double check what's in the index though
[19:26:21] sure, no rush
[19:27:33] yea it has 5 docs, all related to MediaSearch. It looks like some experimentation was done in 2021 but can easily be killed today
[19:32:29] cool, I deleted it on psi, using it as my guinea pig. Let's see what the next error is ;)
[19:34:01] `java.lang.IllegalStateException: index [mw_cirrus_metastore_1659365741/ugKwuXOpRjiti8dY67m9OA] version not supported: 6.8.23 minimum compatibl`
[19:35:05] hmm, well that one we have a maint script for iirc, lemme double check. Also please take notes, we will need to do this in prod too :)
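(A rough sketch of the 18:25 idea above, a new named feature store plus a reindex, for reference only; nothing like this was actually run during this session. The store name `migrated`, the host/port, and the assumption that `PUT _ltr/<name>` backs the store with a `.ltrstore_<name>` index are taken from how the ltr plugin documents named stores, not from this log, and would need testing.)

```bash
#!/usr/bin/env bash
# Hypothetical sketch only: migrate the default LTR feature store into a new named store.
# "migrated" and the host are placeholders; endpoints are assumptions, untested here.
set -euo pipefail
ES=localhost:9200

# Create a new named feature store (expected to be backed by its own .ltrstore_migrated index).
curl -XPUT "$ES/_ltr/migrated"

# Copy the old store's docs (features, feature sets, models) into the new index,
# so they get rewritten in the current on-disk format.
curl -XPOST "$ES/_reindex?pretty" -H 'Content-Type: application/json' -d '{
  "source": { "index": ".ltrstore" },
  "dest":   { "index": ".ltrstore_migrated" }
}'

# After queries are repointed at the "migrated" store, the old-format index can go:
# curl -XDELETE "$ES/.ltrstore"
```

Whether a plain reindex preserves everything the plugin expects is exactly the "needs testing" caveat from the log.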
[19:36:04] Good catch, I will add to T422860
[19:36:04] T422860: Migrate Cloudelastic to OpenSearch 2.x - https://phabricator.wikimedia.org/T422860
[19:38:35] hmm, well we have a function for it, MetaStoreIndex::upgradeIndexVersion, but afaict we only run that function if the METASTORE_VERSION in the code doesn't match the index itself. I suppose we could bump it just because, or we could add some sort of --force option.
[19:38:46] I'm surprised we can't run that function directly...it seems like something we must have done before
[19:41:19] yea the --upgrade option refuses, because the index is "up to date" according to the code definitions.
[19:47:28] Do we need to keep the state of the index if we're not currently running a maintenance operation?
[19:49:29] hmm, yes i think the metastore has to stay. It can technically be recreated, but it carries data about the state of all wikis
[19:49:55] if we are trying to get things moving, i can run the upgrade function from a shell. I also put together a quick patch to metastore to add a --force-upgrade option
[19:50:26] i suppose what we need before migrating prod is a quick script that checks the creation version of all indexes, to make sure we address this before we start migrating
[19:50:55] Sure, if you want to try. It's not urgent, I was also thinking of resurrecting the snapshots and seeing what happens if I try snapping and restoring to a newer version of OS
[19:51:31] hmm, that is a curious option. I'm not sure if snapshot/restore counts as recreation. I suspect that places the exact same files on disk regardless
[19:51:44] and the change is usually in the format of on-disk data
[19:53:15] ah, so it would probably just refuse to restore if I tried that
[19:58:10] omega and chi done, just need to find a psi wiki
[19:58:26] https://stackoverflow.com/questions/76115943/version-of-opensearch-that-created-the-index
[19:59:12] ok should be recreated on all three cloudelastic clusters
[20:01:52] cool, looks like that stackoverflow article works, if you add `?human` to the query you can get stuff like `"version":{"created_string":"1.3.20","created":"135249827"}`
[20:02:54] `Caused by: java.lang.IllegalStateException: index [.tasks/4K_JReOFSQCyLxA6-lT6QA] version not supported: 6.5.4 minimum compatible index version is: 7.0.0` <- new problem child
[20:03:21] sigh, it's like every system index :P
[20:04:42] wth...looking up a github issue link and it's just spinning. They've really gone downhill since the MS acquisition :(
[20:07:36] Looks like .tasks might be the last one (based on a ChatGPT one-liner?): `curl -s localhost:9600/_all/_settings | jq -r 'to_entries[] | "\(.key) \(.value.settings.index.version.created)"' | grep -v 135249827`
[20:07:36] `.tasks 6050499`
[20:08:10] that's for CE though, haven't checked prod yet
[20:08:40] that's good at least, i would (naively) be wary of simply deleting that index...it's what opensearch uses to track a subset of in-progress things. There are probably no active tasks, so it's probably ok, but hard to be completely certain
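(A minimal pre-flight check for the .tasks deletion being weighed above; this is not from the log. It reuses the psi port from the one-liner and only confirms the cluster is quiet and shows how many task records would be lost before the index is dropped and left to be recreated.)

```bash
#!/usr/bin/env bash
# Sketch of a sanity check before deleting .tasks; adjust the port per cluster.
set -euo pipefail
ES=localhost:9600

# Live tasks currently running on the cluster (the _tasks API is in-memory state,
# but "nothing long-running in flight" is the main precondition we care about).
echo -n "running tasks: "
curl -s "$ES/_tasks" | jq '[.nodes[].tasks | length] | add // 0'

# Completed-task records that would be lost (should only be progress/reporting data).
echo -n ".tasks docs: "
curl -s "$ES/.tasks/_count" | jq '.count'

# If both look fine:
# curl -XDELETE "$ES/.tasks"
```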
[20:09:41] The sad thing is https://github.com/opensearch-project/OpenSearch/issues/18717 is basically full of people with this issue, but i don't see any responses that are clearly about fixing the root cause, they are all bandaids to get an upgrade moving
[20:11:38] The closest is https://docs.opensearch.org/latest/migrate-or-upgrade/migration-assistant/ but it seems to be more about moving between clusters, which is a non-issue anyways (if you move between clusters without a snapshot, you are recreating)
[20:14:40] This seems to claim that .tasks is mostly about progress bars / reporting, and that there is no major problem if you delete the .tasks api and let it get recreated
[20:14:51] s/.tasks api/.tasks index/
[20:17:13] Got it. I'm just reading down the GH issue. Reminds me of the problems we're having in https://github.com/opensearch-project/opensearch-build/pull/6124, although they have agreed in principle to merge it
[20:17:41] There was a different PR open for almost a year also with a ton of people hitting the issue
[20:21:30] Deleting .tasks did the trick, cloudelastic1012 is now part of the psi cluster!
[20:21:35] nice!
[20:52:21] OK, we are running OpenSearch 2 on all cloudelastic clusters. Just unbanned `cloudelastic1012` and we should be getting shards soon
[21:09:56] looks like the exporters are unhappy, checking on that now
[21:13:39] probably has something to do with the newer Python on Trixie
[21:29:49] yeah, looks like we need a couple of lines to make `_socket.getaddrinfo` happy
[22:10:22] out of time for today, but it looks like most of the old indices on prod are `titlesuggest` or `archive` indices: https://phabricator.wikimedia.org/P91237
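(A sketch of the "quick script that checks the creation version of all indexes" mentioned at 19:50, built from the one-liner and the `?human` trick above rather than taken from the log. The host/port is a placeholder per cluster, and filtering on a `6.` prefix assumes that only indices still in a 6.x on-disk format block the OpenSearch 2 upgrade.)

```bash
#!/usr/bin/env bash
# Sketch: list indices whose on-disk format predates 7.0, one cluster at a time.
set -euo pipefail
ES=localhost:9200

curl -s "$ES/_all/_settings?human" |
  jq -r 'to_entries[]
         | {index: .key, ver: .value.settings.index.version.created_string}
         | select((.ver // "") | startswith("6."))
         | "\(.index)\t\(.ver)"' |
  sort
```

Run per prod cluster, this should reproduce the kind of list pasted in P91237 (`titlesuggest`/`archive` indices to recreate before the migration).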