[07:12:19] inflatador: reimage on `cloudelastic1003` still failed despite the firmware update showing `Complete`
[12:47:29] ryankemper ACK, will take a look
[12:47:33] also greetings
[14:08:46] Guessing the NIC updates failed, I tried a different set of drivers instead of Broadcom. The sparse info under Hardware > NIC Slot 2 on the iDRAC suggests the external NIC is a Marvell QLogic
[14:28:04] Given how many people are out today, and given that we have the staff meeting at the same time, I suggest we cancel today's retro. The only downside, I think, is that ryankemper may have gotten up early for it unnecessarily. ;-)
[14:50:09] ^ agree
[14:55:41] Trey314159 Yeah, probably a good idea
[14:56:25] Alrighty, that's as close to a quorum as we're likely to get, so the retro is canceled.
[15:00:01] sounds good :)
[15:02:55] just saw this! yeah, let's skip today
[15:09:39] Mike! Haven't seen you in a while! Welcome back!
[16:01:18] OK, the 64-bit Windows package for the iDRAC firmware update is the one that works, apparently
[16:15:41] looks like the firmware update stuff might be a red herring. If I skip the first prompt of interactive mode, the installer seems to go back to non-interactive and finish correctly
[16:56:46] ^^ yep, ryan-kemper and i just confirmed this
[19:06:39] * ebernhardson wishes spark was smarter about joins ... it's rediculous you have to set a global shuffle count instead of hinting that "this join is 5tb, use more partitions"
[19:07:00] it's also ridiculous how bad my spelling is :P
[19:38:41] :P
[19:39:06] cloudelastic's taking forever to get back to green status (from yellow)
[19:43:22] think it's just a matter of the shard recoveries cap being very low relative to the # of hosts we have
[20:01:21] thanks Trey314159! so much to get caught up on..
[20:12:39] ryankemper: oh! it's probably because i turned off all the optimizations i previously made to recovery on cloudelastic. Sorry, i should have been more clear
[20:12:59] ryankemper: the thing is elastic uses the same recovery settings for snapshot restore and for moving things around in the cluster, and i was trying to make it gentler on thanos
[20:14:02] ryankemper: i tried to include the command i used and the log output (which reported the old value) in this comment: https://phabricator.wikimedia.org/T309648#8054147
[20:15:49] and it looks like i forgot to mention in there that i also removed the indices.recovery.max_bytes_per_sec value, which i had previously set to 756mb, which means it now has the default of 40mb
[20:17:20] i suppose for reference, as far as elasticsearch is concerned recovering from a snapshot repository and recovering from another node are the exact same operation, the only difference is how they request the files
[20:23:21] ebernhardson: thanks, that makes sense
[20:23:37] ebernhardson: are you still doing snapshot restore stuff? ie should i wait to set the settings back or can I flip those whenever
[20:23:57] (lunch, back in 30)
[20:24:21] ryankemper: not today, go ahead and set them back. This is perhaps more proof that we need a better way to manage these settings (but i want to avoid putting them all in puppet, because puppet :P)
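
Context for the 19:06:39 grumble: the partition count Spark uses for shuffle joins comes from the session-wide `spark.sql.shuffle.partitions` setting rather than anything attached to a specific join, so handling one very large join typically means bumping the global value around it. A minimal sketch under that assumption; the table paths, join key, and the 4096 figure are purely illustrative:

```python
# Sketch of the workaround being complained about: the shuffle partition count
# is a session-wide knob, not a per-join hint. Paths, key, and counts are
# hypothetical examples, not the actual job.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("big-join-sketch").getOrCreate()

left = spark.read.parquet("/path/to/left")    # hypothetical inputs
right = spark.read.parquet("/path/to/right")

# Raise the global shuffle partition count before the large join executes ...
spark.conf.set("spark.sql.shuffle.partitions", 4096)
joined = left.join(right, on="page_id")
joined.write.parquet("/path/to/output")       # action: the join runs here

# ... then drop it back for the rest of the job's smaller shuffles.
spark.conf.set("spark.sql.shuffle.partitions", 200)
```

Newer Spark releases can soften this somewhat with adaptive query execution (`spark.sql.adaptive.enabled`), which adjusts shuffle partitioning at runtime, but there is still no per-join "this is 5tb" hint.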
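
Context for the 20:12-20:24 exchange: Elasticsearch applies the same dynamic recovery settings to snapshot restores and to node-to-node shard recoveries, so loosening them for a restore also changes how aggressively shards move around the cluster. The exact command used is in the linked Phabricator comment; as a rough sketch, putting the settings back might look like the following, assuming a reachable cluster endpoint (the host and the concurrency value shown are illustrative, only the 756mb figure comes from the chat):

```python
# Sketch: restore the recovery throttle that was reset to the 40mb default.
# The endpoint and node_concurrent_recoveries value are hypothetical.
import requests

CLUSTER = "http://localhost:9200"  # illustrative endpoint

resp = requests.put(
    f"{CLUSTER}/_cluster/settings",
    json={
        "transient": {
            # Applies to both snapshot restore and inter-node shard recovery
            "indices.recovery.max_bytes_per_sec": "756mb",
            # Per-node cap on concurrent shard recoveries (default is low)
            "cluster.routing.allocation.node_concurrent_recoveries": 4,
        }
    },
)
resp.raise_for_status()
print(resp.json())
```

Using `transient` here mirrors the "flip it back whenever" nature of the discussion; `persistent` would survive a full cluster restart instead.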