[08:59:35] volans: any chance we could get a spicerack release for https://gerrit.wikimedia.org/r/c/operations/software/spicerack/+/776999 ?
[09:02:52] errand, back in 20'
[09:04:44] gehel: sure, no prob, will be done by end of UTC morning :)
[09:05:21] volans: thanks!
[09:55:42] lunch
[10:34:23] lunch 2
[11:02:17] gehel: sorry, I'm fixing something in spicerack that I'd like to get deployed too, hoping to bundle them together today
[11:03:21] No emergency on our side!
[11:04:37] thx
[13:01:40] Greetings and salutations
[13:19:12] Hello Ryan
[13:19:21] Brian
[13:23:07] ?
[13:46:10] o/
[13:52:32] Errand
[15:08:13] \o
[15:20:48] o/
[16:01:15] workout, back in ~30
[16:03:57] Dinner
[16:39:06] dinner
[16:55:23] sorry, been back
[17:09:29] ebernhardson: the SRE pairing session later today will be focused on the ES 6.8 upgrade. You probably don't need to be there if you have other pressing issues
[17:18:45] gehel: ok, I don't know that I have anything in particular to add. I might suggest that if you're going to be restarting servers, you also tune up cluster.indices.recovery.max_bytes_per_sec, which is still set to 80mb from when we were on 1G. I arbitrarily set it to 756mb on cloudelastic a week or two ago
[17:19:07] * ebernhardson wonders why 756 instead of 768
[17:21:02] actually, eqiad is still set to 40mb.
[17:26:04] (also worth mentioning: cloudelastic has many more disks and probably more IO capacity than normal servers, so 768 may be excessive elsewhere)
[17:34:00] interesting, will keep that in mind. I can tell you that shuffling shards around "seems" slow based on last week's reboots
[17:40:20] lunch, back in ~30
[18:30:16] Couple mins late to pairing
[18:32:54] meh, no great option for BC CLI options in maintenance scripts; I guess a quick reimplementation of the verification is fine
[19:14:56] for no great reason, wondering if the forbiddenapis checks we use in Java could be reimplemented on the PHP side now that types and code analysis are more common. Maybe a phan extension or something. Not even sure which APIs I want to ban, but it seems to have caught numerous problems on the Java side that someone would have to notice in CR
[19:15:03] anyways, lunch time :)
[19:35:56] back
[19:38:05] ryankemper, inflatador, gehel: the new spicerack release (2.4.1) with the change for elasticsearch is almost ready. If I deploy it to cumin2002, would you be able to test it? Or would it be better tomorrow?
[19:38:36] volans: we're in the process of doing a cluster restart. Won't be able to test until tomorrow
[19:39:09] ack, I'll hold the deploy until tomorrow then, no prob
[20:09:14] so I couldn't help but partially watch the restart :P Randomly guessing, perhaps the sleep(20) after elasticsearch comes up, before we re-enable replication, is too short. Maybe something like: wait for green with replication disabled up to some timeout (2m? 5m?), after which we enable replication and continue waiting for green. It's certainly shuffling more shards than I would expect based on previous single-node restarts.
[20:10:08] otherwise, I don't know why we never used it, but we can tell elastic on a per-index basis how long to wait for a shard to re-appear before creating a new instance; could set it on the largest indices so they have a better chance to recover from disk instead of over the network https://www.elastic.co/guide/en/elasticsearch/reference/6.5/delayed-allocation.html
[20:11:23] might not be as useful though, would have to read more carefully :)
[20:20:26] Nice
[20:21:40] ebernhardson: we have a tmux session up on elastic2050 under the g-ehel user if you wanna shoulder-surf some more ;)
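(Editor's note) The recovery throttle discussed at 17:18 is a dynamic cluster setting; in Elasticsearch 6.x the key is `indices.recovery.max_bytes_per_sec` and it can be changed on a live cluster via the cluster settings API. A minimal sketch of bumping it follows; the `localhost:9200` endpoint and the helper names are illustrative, not the team's actual tooling, and the network call is defined but not executed here.

```python
import json
from urllib.request import Request, urlopen

# Hypothetical cluster endpoint; the real clusters live behind other hostnames.
ES_URL = "http://localhost:9200"


def recovery_throttle_payload(rate="768mb"):
    """Transient cluster-settings body raising the shard-recovery rate cap.

    In ES 6.x the dynamic setting is indices.recovery.max_bytes_per_sec;
    its default (40mb) matches what eqiad was reportedly still running.
    """
    return {"transient": {"indices.recovery.max_bytes_per_sec": rate}}


def apply_cluster_settings(payload, url=ES_URL):
    """PUT the payload to the cluster settings API (not called here)."""
    req = Request(url + "/_cluster/settings",
                  data=json.dumps(payload).encode("utf-8"),
                  headers={"Content-Type": "application/json"},
                  method="PUT")
    return urlopen(req)


if __name__ == "__main__":
    # Show the body that would be sent; apply_cluster_settings() would ship it.
    print(json.dumps(recovery_throttle_payload()))
```

A transient setting is lost on a full-cluster restart, which can be desirable for a temporary bump during maintenance; use `"persistent"` instead to keep it.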
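(Editor's note) The per-index "wait before recreating a shard elsewhere" knob mentioned at 20:10 is `index.unassigned.node_left.delayed_timeout`, per the linked 6.5 docs. A sketch of setting it on one index follows; as above, the endpoint and helper names are hypothetical and no request is actually sent.

```python
import json
from urllib.request import Request, urlopen

ES_URL = "http://localhost:9200"  # hypothetical endpoint


def delayed_allocation_payload(timeout="5m"):
    """Index-settings body for index.unassigned.node_left.delayed_timeout.

    Controls how long the master waits after a node leaves before allocating
    replacement copies of that node's shards elsewhere (ES 6.x default: 1m).
    A longer delay on the largest indices gives a rebooted node a chance to
    rejoin and recover its shards from local disk instead of over the network.
    """
    return {"settings": {"index.unassigned.node_left.delayed_timeout": timeout}}


def apply_index_settings(index, payload, url=ES_URL):
    """PUT the payload to /<index>/_settings (not called here)."""
    req = Request("%s/%s/_settings" % (url, index),
                  data=json.dumps(payload).encode("utf-8"),
                  headers={"Content-Type": "application/json"},
                  method="PUT")
    return urlopen(req)
```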
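(Editor's note) The restart-flow idea from 20:09 (replace the fixed `sleep(20)` with "wait for green with replication still disabled, up to a timeout, then re-enable replication and keep waiting") can be sketched as plain control flow. Everything here is an assumption: `check_green` and `enable_replication` are stand-ins for whatever the cookbook actually calls, injected so the logic is testable without a cluster.

```python
import time


def wait_green_then_enable_replication(check_green, enable_replication,
                                       timeout_s=300, poll_s=5,
                                       clock=time.monotonic, sleep=time.sleep):
    """Wait for green with replication disabled, bounded by timeout_s.

    Whether the cluster goes green in time or the timeout expires,
    replication is re-enabled, and we then wait for green unbounded
    (matching the 2m/5m timeout idea floated in the chat).
    """
    deadline = clock() + timeout_s
    while not check_green():
        if clock() >= deadline:
            break  # give up on the replication-disabled wait
        sleep(poll_s)
    enable_replication()
    while not check_green():
        sleep(poll_s)
```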