[10:32:31] I just realised that I forgot to send the invite for the Search Platform office hours. I'll reschedule for next week instead
[13:14:15] Hi all. I have a couple of questions. The simpler one is this: I saw the recent Phabricator task update regarding OpenSearch viability. Is the plan to require OpenSearch or will Elasticsearch remain an option? We're using Amazon's OpenSearch with the Elasticsearch engine (just upgraded from 6.8 to 7.10.2 with our MW 1.39 upgrade).
[15:49:58] The other question is in regards to taking index snapshots. In AWS, I have a script that creates index snapshots in an S3 bucket (which is the recommended method), and I've tested a restore in dev, the idea being that if something goes wrong, like with an Elasticsearch software update, I could restore the indexes from the bucket. However, if
[15:49:59] something happened with a live upgrade, since the software update is just a blue/green deploy, would rolling back by restoring the index from the bucket really be an option since there are constant index changes being made? If so, would I need to run the typical post-MW upgrade Cirrus reindexing procedure?
[15:50:15] Hi justinl I'm bking (who just moved that task). Elasticsearch won't be supported forever, but it will be at least a year, probably closer to 2 before we pull it
[15:50:44] even then, we won't actively break it, just stop testing with it. I imagine it will continue to work, although we can't guarantee that
[15:51:20] Wow, that's a big change, good to know this far in advance, so I'll need to plan for the migration. Is it possible to migrate from Elasticsearch to OpenSearch, or would new indexes have to be created?
[15:52:53] FWIW I have really no knowledge of managing Elasticsearch since I've never set it up myself, I just rely on the managed service and Amazon support.
[15:53:38] justinl re: blue/green deploy, I guess it would depend on the failure scenario.
Like if the newly-updated node refuses to join the cluster, maybe that's a rollback/recreate node. Data corruption, loss of cluster quorum, etc might require different approaches
[15:55:20] Since this is an Amazon-managed service, it does the blue/green deploy on its own and would revert to the old cluster if there were a problem with the software or nodes. My question was more about if something were wrong with the indexes after the update, even though the updated cluster would otherwise be working fine.
[15:55:49] justinl re: Elastic to opensearch migration, my team (search platform) has not tried this personally, but opensearch.org has some docs: https://opensearch.org/docs/latest/upgrade-to/upgrade-to/
[15:56:41] If the indices were corrupted then yes, you would have to restore from a snapshot
[15:57:03] Thanks, I'll look at them. Seems like this might possibly be quite the project. I'm currently working on getting our Amazon Aurora MySQL databases upgraded from 5.7 to 8, which is another big task. Managing so many components is very challenging.
[15:57:39] Thanks. After such a restore to a production cluster, would I need to run the Cirrus reindexing procedure?
[15:58:13] Could the wikis remain live while doing the restore or should I take them offline (redirect users to the maintenance page)?
[15:59:00] Yes, you would want to reindex to make sure you didn't miss anything that was changed at the SQL level during the maintenance
[15:59:20] FWIW I've done live software updates and even the 6.8 to 7.10 upgrade while the wikis were live with no issues, so I'm not expecting problems, but I'm just trying to make sure I've got a procedure defined, just in case.
[15:59:21] The wikis could remain live but search results might not work
[16:00:29] Ok, that's good. I communicate with our main editors (I don't manage any content, that's all up to the users of the wikis), so if there were such an issue, I'd notify them and they'd update the sitenotice pages for each wiki.
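For reference, the snapshot-and-restore step discussed above could be sketched roughly as follows. This is only an illustration, not the script mentioned in the chat: it builds the request paths and bodies for the snapshot REST API that Elasticsearch 7.x and OpenSearch share, without sending anything. The repository name, index names, and helper functions are hypothetical, and authentication or request signing for the managed service is left out. The follow-up Cirrus reindex is not shown.

```python
import json
from datetime import datetime, timezone

# Hypothetical names for illustration only; none of these come from the chat.
REPO = "s3-snapshots"
INDICES = ["examplewiki_content_first", "examplewiki_general_first"]


def snapshot_name(prefix: str, when: datetime) -> str:
    """Build a timestamped snapshot name (snapshot names must be lowercase)."""
    return f"{prefix}-{when.strftime('%Y%m%d-%H%M%S')}".lower()


def create_snapshot_request(repo: str, name: str, indices: list[str]) -> dict:
    """Describe the request that snapshots the given indices into a repository."""
    return {
        "method": "PUT",
        "path": f"/_snapshot/{repo}/{name}",
        "body": {"indices": ",".join(indices), "include_global_state": False},
    }


def restore_snapshot_request(
    repo: str, name: str, indices: list[str], suffix: str = ""
) -> dict:
    """Describe the request that restores indices from a snapshot.

    A restore cannot overwrite an open index, so an existing index with the
    same name has to be closed or deleted first. Passing a suffix restores
    under a new name instead, e.g. to inspect the data before switching over.
    """
    body = {"indices": ",".join(indices), "include_global_state": False}
    if suffix:
        body["rename_pattern"] = "(.+)"
        body["rename_replacement"] = f"$1{suffix}"
    return {
        "method": "POST",
        "path": f"/_snapshot/{repo}/{name}/_restore",
        "body": body,
    }


if __name__ == "__main__":
    name = snapshot_name("pre-upgrade", datetime.now(timezone.utc))
    for request in (
        create_snapshot_request(REPO, name, INDICES),
        restore_snapshot_request(REPO, name, INDICES, suffix="_restored"),
    ):
        print(request["method"], request["path"])
        print(json.dumps(request["body"], indent=2))
```

As noted in the chat, anything edited on the wikis after the snapshot was taken would be missing from the restored indices, so a reindex would still be needed afterwards.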
[16:02:31] Good luck! I'm going AFK for a while, but will be back in ~40m or so. Others might be able to help too, so feel free to add any follow-ups
[16:02:41] Thanks! I appreciate the help!
[16:56:09] back
[17:30:20] > My question was more about if something were wrong with the indexes after the update, even though the updated cluster would otherwise be working fine.
[17:30:48] I would hope the indices wouldn't meaningfully change after the migration, although I don't know for sure
[17:32:39] As an aside, at least as of 4 years ago when I was using AWS Elasticsearch at a previous job, the behind-the-scenes blue/green deployment was the source of lots of consternation to us. We'd frequently hit some sort of race condition where the new cluster would never come up properly but Amazon's wrapper software didn't seem to notice automatically, so we'd have to open a support ticket every time it got stuck
[17:46:43] lunch, back in ~40
[17:47:25] T356302 is in needs review, but no attached patch
[17:47:25] T356302: setup production Cirrus Streaming Updater alerts - https://phabricator.wikimedia.org/T356302
[18:31:33] spark is being weird today...yesterday spark.sql.mapKeyDedupPolicy=last_win was fine. Today it throws unless i capitalize to LAST_WIN
[18:32:21] oh wait, no that still fails too :S
[18:56:34] ryankemper The indexes shouldn't be affected by a software update or version upgrade but I am just being careful with planning and having rollback or other mitigation procedures defined in advance, where possible. I was paranoid enough about the 6.8 to 7.10 Elasticsearch upgrade. The blue/green deploy changes the IP addresses of the nodes, so I also
[18:56:34] have a special Nginx proxy DNS resolver configuration to handle that automatically. Found out the need for that the hard way.
[18:56:55] sorry, been back