[08:56:22] gehel: I'll be 10 mins late today
[08:56:30] ack
[08:56:46] Really struggling with the internet
[10:01:34] lunch
[10:54:24] Lunch
[12:36:33] dcausse: I've merged https://gerrit.wikimedia.org/r/c/wikimedia-event-utilities/+/817215. I wonder if it makes sense to send a PR to json-schema-validator to add the exclusion on their side.
[12:36:54] gehel: sure, will take a look
[12:42:41] they switched to com.sun.mail in a recent version apparently
[12:43:03] I should have bumped the version instead of excluding, perhaps
[12:46:11] greetings
[12:46:18] o/
[12:46:23] gehel dcausse any objection to reimaging the rest of CODFW (15 hosts)? That would deploy the new plugin package on these 15 hosts instead of having to restart the cluster twice. I don't like doing 2 things at once, but it would save about 11 hours of watching cluster state
[12:49:12] no objection at all!
[12:49:34] dcausse: Oh yeah, upgrading sounds like a good option!
[12:49:43] * gehel merged too fast
[12:50:00] excellent, will get started soon
[12:50:27] my bad, I should have looked at more recent versions
[13:08:38] dcausse: there are a few minor open questions on https://gerrit.wikimedia.org/r/c/wikimedia-event-utilities/+/817304, but otherwise it looks mostly ready to merge
[13:09:21] gehel: cool, I'll address them
[13:09:55] it depends on https://gerrit.wikimedia.org/r/c/wikimedia/discovery/discovery-parent-pom/+/817302 tho
[13:10:10] I should have marked that in the commit message
[13:15:27] I'll merge that one and release to central
[13:17:04] merged the parent pom, release in progress
[13:19:54] thanks!
[13:20:58] after talking it over with Ryan, we decided not to wait for more masters and just finish the reimaging
[13:22:16] +1!
[13:32:25] a bit puzzled by spotbugs: it's OK to do this https://gerrit.wikimedia.org/r/c/wikimedia-event-utilities/+/817304/2/eventutilities/src/main/java/org/wikimedia/eventutilities/core/event/EventStreamFactory.java#158 but with https://gerrit.wikimedia.org/r/c/wikimedia-event-utilities/+/817304/2/eventutilities/src/main/java/org/wikimedia/eventutilities/core/util/ResourceLoader.java#229 you have to suppress OCP_OVERLY_CONCRETE_PARAMETER
[13:42:12] weird :/
[13:49:24] relocating, back in ~45. First CODFW just finished, should be back in time for the 2nd
[14:15:14] back
[14:25:12] I did not realize that EI_EXPOSE_REP and EI_EXPOSE_REP2 will complain about any mutable field. I think it makes sense to disable them.
[14:25:51] For collections where there are easy alternatives, it makes sense to force immutability, but for more specific classes, I don't think it does.
[14:26:24] see https://gerrit.wikimedia.org/r/c/search/extra/+/818466 / https://integration.wikimedia.org/ci/job/search-extra-maven-java8-docker/276/console
[14:31:38] dcausse: let me know if you disagree with https://gerrit.wikimedia.org/r/c/wikimedia/discovery/discovery-maven-tool-configs/+/818475
[14:32:28] gehel: looking
[14:34:42] oh, I thought they somehow listed some well-known classes (but was surprised to see ObjectMapper or ObjectNode), but if they inspect the object for mutability that's going to be a lot
[14:35:41] I did not check the implementation, but if they fail on Lucene Terms, that's a problem!
[14:36:54] yes
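For readers following along, here is a minimal Java sketch (a made-up class, not code from event-utilities) of what these detectors complain about and how `@SuppressFBWarnings` silences them. It assumes the `spotbugs-annotations` dependency is on the classpath; OCP_OVERLY_CONCRETE_PARAMETER comes from the fb-contrib plugin rather than core SpotBugs:

```java
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.node.ObjectNode;
import edu.umd.cs.findbugs.annotations.SuppressFBWarnings;

public class SpotBugsExamples {

    // EI_EXPOSE_REP2 fires in the constructor below: it stores a reference to
    // an externally mutable object (SpotBugs 4.x treats ObjectMapper as mutable).
    private final ObjectMapper mapper;

    public SpotBugsExamples(ObjectMapper mapper) {
        this.mapper = mapper;
    }

    // EI_EXPOSE_REP fires here: the getter exposes internal mutable state.
    // A List could be defensively copied with List.copyOf(), but there is no
    // cheap defensive copy of an ObjectMapper, hence the idea of disabling the rule.
    @SuppressFBWarnings(value = "EI_EXPOSE_REP", justification = "mapper is shared by design")
    public ObjectMapper mapper() {
        return mapper;
    }

    // OCP_OVERLY_CONCRETE_PARAMETER fires when a parameter is declared with a
    // more concrete type than the method actually needs: this body only uses
    // behavior available on JsonNode, so declaring ObjectNode triggers it.
    @SuppressFBWarnings(value = "OCP_OVERLY_CONCRETE_PARAMETER",
            justification = "callers always hold an ObjectNode")
    public String render(ObjectNode node) {
        return node.toString();
    }
}
```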
[14:49:21] Working from my favorite coffee shop! https://ewr1.vultrobjects.com/fun/whatsbrewing.jpeg
[14:54:53] \o
[14:55:55] o/
[15:04:49] why am I not surprised that Brian's favorite coffee shop is full of arcade machines... ?
[15:13:54] True enough. I'm not sure if I posted this pic already, but here's me at Token in Dublin! https://ewr1.vultrobjects.com/fun/token.jpeg
[15:23:47] pretty serious!
[15:34:12] going offline, have a nice weekend
[15:37:53] enjoy the weekend!
[16:36:35] ryankemper are you around yet? I'm out in about 30m, just wanted to make sure you were able to watch the cluster reimage
[16:36:43] I've got a tmux up on cumin1001
[16:38:53] \o
[16:39:01] inflatador: yup, can watch
[16:39:29] ryankemper ACK, thanks. It's been smooth so far
[16:40:53] inflatador: I'll see if I can't fix that! /s
[16:43:41] inflatador: wait, I'm a bit confused. don't we want a restart of the eqiad elastic cluster? I see a reimage of codfw going on
[16:44:01] well, we want a restart of both clusters technically, I guess, but yeah
[16:50:06] ryankemper yeah, see scrollback, we want both but I figured it would be easier to accomplish thru reimage, should save us a lot of work
[16:50:53] inflatador: reimaging for codfw makes perfect sense to me. what about eqiad tho? should I kick off a rolling restart once the codfw reimage-followed-by-restart is done?
[16:51:25] (reimage-followed-by-restart because we want to reimage with the same start datetime as before and then follow up with a codfw restart with a start datetime from after the plugin deploy)
[16:53:36] yeah, I guess we don't want to reimage eqiad quite yet, rolling restart is OK there. And good catch on CODFW, we will indeed want to rolling-restart the nodes that weren't reimaged today
[16:57:55] ryankemper oops, one more thing: be sure to install the updated plugin on each host BEFORE the rolling restart, I haven't done that yet
[16:59:58] heading out, if you need anything text/call. Have a great weekend!
[17:02:51] thanks, you too
[18:04:41] umm, hmm. turns out upgrading the metastore doesn't reindex old data into the new metastore :S I'm sure it did at some point...
[18:05:19] yea the code certainly has it.. hmm
[18:07:13] looks like we don't check the response though, probably what happens here
[18:20:04] oh... the code is intentionally excluding these other docs, but I don't think it actually wants to
[18:21:00] so it does reindex, and it reindexes everything it was told to (everything but the metastore internal docs). But those metastore internal docs are the per-wiki namespaces and the version info for the created indices (mw version, git hashes for mediawiki and cirrus, etc.)
[18:21:09] fun :P
[18:21:55] I suppose we'll recreate the namespace ones, we have some scripts to do that part; will have to ponder if it's meaningful that it dropped the versioning info for the created docs. Suspect we don't actually use those for anything beyond debugging, but not 100%
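CirrusSearch's metastore code itself is PHP, so the following is only a language-neutral sketch of the two issues spotted above — a source-query exclusion that drops more than intended, and a reindex response that never gets checked — written against the Elasticsearch 7.x high-level Java client. The index names and the `type` field are invented for illustration:

```java
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.index.reindex.BulkByScrollResponse;
import org.elasticsearch.index.reindex.ReindexRequest;

public final class MetastoreReindex {

    /** Copies the old metastore into the new one and fails loudly on errors. */
    static long reindexMetastore(RestHighLevelClient client) throws Exception {
        ReindexRequest request = new ReindexRequest()
                .setSourceIndices("mw_cirrus_metastore_old")  // hypothetical index names
                .setDestIndex("mw_cirrus_metastore_new")
                // The exclusion discussed above: filtering out "internal" docs also
                // drops the per-wiki namespace and version docs ("type" is made up here).
                .setSourceQuery(QueryBuilders.boolQuery()
                        .mustNot(QueryBuilders.termQuery("type", "internal")));

        BulkByScrollResponse response = client.reindex(request, RequestOptions.DEFAULT);

        // The fix for "we don't check the response": inspect failures explicitly
        // instead of assuming every document made it across.
        if (!response.getBulkFailures().isEmpty() || !response.getSearchFailures().isEmpty()) {
            throw new IllegalStateException("metastore reindex failed: "
                    + response.getBulkFailures() + " " + response.getSearchFailures());
        }
        return response.getCreated() + response.getUpdated();
    }
}
```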
[18:36:02] Looks like the new plugin was only built for `6.8.23` and not `7.10` (https://apt.wikimedia.org/wikimedia/pool/component/elastic68/w/wmf-elasticsearch-search-plugins/ vs https://apt.wikimedia.org/wikimedia/pool/component/elastic710/w/wmf-elasticsearch-search-plugins/). So none of the newly reimaged hosts have the new plugin
[18:36:14] Stepping out for 20 mins and then will fix that
[18:36:34] ryankemper: doh, sorry, that's partly me, I built and released the plugin but not the debian package
[18:36:47] forgot that we would have some hosts already on 7.10
[18:37:50] ebernhardson: actually I just didn't sleep enough. in my head I conflated the reimage with the 7.10 upgrade
[18:37:53] disregard :D
[18:38:26] Still a bit confused that they all look like `elastic2054-production-search-codfw extra 6.8.23-wmf1` though. I'd think that a reimage would come up with the latest plugin version since it's starting fresh
[18:38:41] hmm, it should be -wmf2 for the new extra plugin
[18:38:41] but I might just need to run a rolling upgrade operation https://github.com/wikimedia/operations-cookbooks/blob/1e02d9458844079b8a056cd3c9df235938fd5192/cookbooks/sre/elasticsearch/rolling-operation.py#L237-L241
[18:38:45] yeah exactly
[18:39:48] oh, it's gotta be a stretch vs bullseye thing
[18:43:58] oh, that might make sense, we probably need to build a debian package for each version? The package itself should be exactly the same, but apt/dpkg probably cares
[18:44:14] well, the contents of the package at least; probably the metadata varies a bit
[19:03:03] Yeah, I seem to remember copying it being okay, but I don't remember if we've actually done that before, so I'll just build with bullseye
[19:36:48] Hmm
[19:37:18] I tried changing the dist to `bullseye` like so:
[19:37:20] https://www.irccloud.com/pastebin/A96A0xPo/
[19:37:34] But that ultimately results in this error:
[19:37:36] https://www.irccloud.com/pastebin/JjvbULwq/
[19:38:04] Not sure if I should just make a `6.8.23-5` or if it is correct to just copy the package over manually instead
[19:42:50] hmm
[19:48:28] ryankemper: i'm not sure, and not seeing any obvious hints in the debian packaging info. It's probably fine to release a -5 for bullseye
[19:52:07] ebernhardson: how's https://gerrit.wikimedia.org/r/c/operations/software/elasticsearch/plugins/+/818507 look?
[19:56:40] ryankemper: seems reasonable
[20:09:51] Built + uploaded like so: https://phabricator.wikimedia.org/T314078#8116396
[20:10:31] Will kick off the reimage of the next host, verify it's got the new `search-extra => ~2`, and then proceed with the rest of the reimages
[20:11:21] Then circle back and do a plugin upgrade followed by a rolling restart on the remaining codfw hosts, using a start-datetime of `2022-07-29T20:11:00`
[20:11:25] Then could proceed to eqiad
[20:11:54] ebernhardson: We're okay having the `search-loader` daemons offline until ~Monday, right? I'd expect codfw to finish today (or get close), but unlikely to have eqiad completely finished as well
[20:12:18] IIRC from our discussions the impact of the daemons being offline for a few more days is fairly minimal, but just wanted to check
[20:13:15] Reimaging `elastic2029` now, so that's the host whose `search-extra` plugin version I'll check after
[21:00:46] ryankemper: yes, it should be fine for them to stay offline
[21:00:54] it will backlog a bit, but it will be ok
[21:02:18] `elastic2029-production-search-codfw extra 6.8.23-wmf2` woot!
[21:04:26] finally :)
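That `name component version` row is the shape of `_cat/plugins` output, which is what makes this check easy to script. A hedged sketch of automating the post-reimage verification with the Elasticsearch low-level REST client — the hostname, port, and expected version string are assumptions, not confirmed by the log:

```java
import org.apache.http.HttpHost;
import org.apache.http.util.EntityUtils;
import org.elasticsearch.client.Request;
import org.elasticsearch.client.Response;
import org.elasticsearch.client.RestClient;

public final class PluginVersionCheck {
    public static void main(String[] args) throws Exception {
        // Hypothetical entry point; any node reports plugins for the whole cluster.
        try (RestClient client = RestClient.builder(
                new HttpHost("elastic2029.codfw.wmnet", 9200, "http")).build()) {
            // _cat/plugins returns one "name component version" row per plugin per node.
            Response response = client.performRequest(
                    new Request("GET", "/_cat/plugins?h=name,component,version"));
            String body = EntityUtils.toString(response.getEntity());
            for (String row : body.split("\n")) {
                // Flag any node still running an old build of the extra plugin.
                if (row.contains(" extra ") && !row.contains("6.8.23-wmf2")) {
                    System.out.println("stale plugin: " + row);
                }
            }
        }
    }
}
```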