[11:46:13] lunch
[14:14:57] o/
[15:15:01] \o
[15:18:46] o/
[15:19:46] inflatador, ryankemper: Given that we already have a few cloudelastic nodes running on Puppet 7 successfully, okay to migrate the role at large now?
[15:33:30] moritzm: Y, feel free to migrate the elastic role
[15:34:09] specifically cloudelastic initially. ok, I'll go ahead in about 5 mins
[15:38:00] moritzm: Cool, feel free to move all of elastic at your convenience, just give us a heads-up
[15:39:45] sounds good, the cookbook is running for cloudelastic ATM, I think I'll do the cirrus elastic ones tomorrow or Wednesday
[15:52:10] cloudelastic is now fully on Puppet 7
[15:54:43] excellent!
[16:01:36] dr0ptp4kt, inflatador: search triage in https://meet.google.com/eki-rafx-cxi
[16:41:21] I double-checked and there are currently two elastic::cirrus nodes running on Puppet 7: 2086 and 2087. Are these sufficiently representative before moving the rest of the role, or do we need to move an additional canary/canaries in eqiad, or on a node which handles a distinct function in the cluster?
[16:41:35] happy to just move forward with the current ones, just wanted to check
[16:46:25] moritzm: Y, I believe those 2 are good enough as canaries... we can move fwd, just give a heads-up when starting
[16:46:36] ok!
[17:18:56] saneitizer fix rate is curious :P it had a delta (eqiad vs cloudelastic) of +1k on the 1st and +7k on the 2nd, but then the 3rd is -30 and the 4th is +80
[17:22:52] ???
[17:31:07] seems like it stabilized somehow indeed... https://grafana.wikimedia.org/d/JLK3I_siz/elasticsearch-indexing?orgId=1&viewPanel=35
[17:32:17] (visible when selecting only the *.fixed data series)
[17:39:47] inflatador: basically the saneitizer found a bunch of things to fix in cloudelastic but not eqiad, suggesting the SUP is missing updates that cirrus wrote to eqiad. It was 100 the day after deploy, then 7k, suggesting yes, there is a problem
[17:39:58] and then Saturday and Sunday gave metrics that say the opposite :P
[17:40:22] ebernhardson: are we using the new SUP for all wikis on cloudelastic?
[17:40:46] inflatador: only a few wikis, but they add up to ~25% of the write rate
[18:52:21] lunch, back in ~40
[19:58:11] sorry... been back, but have to go to a medical appointment now. Back in ~90
[20:00:34] * ebernhardson figures we need to store all the request-ids post-merge, not just the one that happens to win
[20:59:49] * ebernhardson tries to remember how to get an UpdateEvent from a JSON blob for test cases
[21:00:37] or really I guess I need a Row, that can go into UpdateEventConverters
[21:19:09] * ebernhardson took way too long to remember that all of that exists in UpdateEventConvertersTest already
[21:28:01] back
[21:48:55] hmm... a delete reports its event time as the time the revision was created
[22:11:28] ebernhardson: in the Elasticsearch Indexing Grafana dashboard, what does the ".old" mean?
[22:14:24] inflatador: that's the background reindexing process. Pages marked `old` skip the normal checking and are reindexed regardless of how correct they are
[22:15:05] the rate is always high, sadly that's what it takes to visit every page once every ~16 weeks
[22:38:17] cloudelastic is in red again, we're doing a rolling restart, checking on it now
[22:42:40] primaries recovered... we're back to yellow
[22:48:50] still some ALLOCATION_FAILED, but they seem to be clearing... going AFK for a bit
[23:40:37] Rolling operation is finished, but I'm going to wait until tomorrow to finish the next private IP migration
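
Editor's note on the 20:00:34 item (storing all request-ids post-merge, not just the one that happens to win): a minimal sketch of the idea in Python, assuming events carry a `request_ids` list and a `rev_id` used to pick the winning payload. The field names and merge rule are assumptions for illustration only; this is not the actual Search Update Pipeline code.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class UpdateEvent:
    """Illustrative update event; field names are assumptions, not the SUP schema."""
    page_id: int
    rev_id: int
    request_ids: List[str] = field(default_factory=list)


def merge(a: UpdateEvent, b: UpdateEvent) -> UpdateEvent:
    """Merge two events for the same page: the newer revision wins the payload,
    but request-ids from both sides are retained for tracing."""
    winner, loser = (a, b) if a.rev_id >= b.rev_id else (b, a)
    return UpdateEvent(
        page_id=winner.page_id,
        rev_id=winner.rev_id,
        # Keep every request-id, not only the winner's, so an update that was
        # folded into another can still be traced back to its originating request.
        request_ids=winner.request_ids + loser.request_ids,
    )


if __name__ == "__main__":
    e1 = UpdateEvent(page_id=42, rev_id=100, request_ids=["req-a"])
    e2 = UpdateEvent(page_id=42, rev_id=101, request_ids=["req-b"])
    print(merge(e1, e2).request_ids)  # ['req-b', 'req-a']
```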
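
Editor's note on the 22:15:05 remark that the `.old` reindex rate is always high: a back-of-the-envelope check of what a ~16-week full pass implies. The page count below is an assumed example value, not a figure from the log; the required rate scales linearly with it.

```python
# Rough sustained rate needed to revisit every page once per cycle.
pages = 50_000_000                    # assumed page count, for illustration only
cycle_seconds = 16 * 7 * 24 * 3600    # ~16 weeks

rate_per_second = pages / cycle_seconds
print(f"~{rate_per_second:.1f} pages/s sustained")  # ~5.2 pages/s per 50M pages
```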
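
Editor's note on the red/yellow recovery and the lingering ALLOCATION_FAILED shards (22:38–22:48): a quick status check using Elasticsearch's standard `_cluster/health` and `_cat/shards` APIs. The host, port, and plain-HTTP access are assumptions; adjust for however the cluster is actually exposed.

```python
import json
import urllib.request

# Assumed endpoint; production clusters may require TLS and a different port.
BASE = "http://localhost:9200"


def get(path: str) -> str:
    with urllib.request.urlopen(BASE + path, timeout=10) as resp:
        return resp.read().decode()


# Overall status: green / yellow (replicas unassigned) / red (primaries unassigned).
health = json.loads(get("/_cluster/health"))
print(health["status"], "unassigned shards:", health["unassigned_shards"])

# Unassigned shards with the reason Elasticsearch recorded (e.g. ALLOCATION_FAILED).
print(get("/_cat/shards?h=index,shard,prirep,state,unassigned.reason&s=state"))
```

If shards stay in ALLOCATION_FAILED after the allocator has given up retrying, Elasticsearch's `POST /_cluster/reroute?retry_failed=true` is the standard way to ask it to attempt allocation again.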