[10:43:57] <volans>	 mutante, klausman, effie, hnowlan: gentle reminder to not use cumin1001 for cumin/cookbooks/homer/etc. but cumin1002/cumin2002 instead (see moritzm 's email from before the break for context and the MOTD)
[10:45:33] <effie>	 I am still logged in on 1001, aren't I?
[10:45:43] <effie>	 I am running from 1002 for sure
[10:46:20] <effie>	 lol volans you had me there for a bit 
[10:47:55] <volans>	 effie: I check execution logs, not who's logged in where :D
[10:50:21] <klausman>	 yeah, muscle memory drags me to 1001 sometimes :)
[11:01:13] <jnuche>	 hi there, SRE's infra window is starting now but I still have train steps to run, if you need me to stop please let me know
[11:02:01] <btullis>	 Heads up, I'm about to deploy a change to Presto, to switch it to use the PKI certificates. https://gerrit.wikimedia.org/r/c/operations/puppet/+/709713 - Presto services may be bumpy for a bit.
[11:06:51] <_joe_>	 jnuche: sorry I saw you just now
[11:06:54] <_joe_>	 please go ahead
[11:07:20] <jnuche>	 _joe_: thx
[11:18:10] <btullis>	 Presto changes are all deployed. Services should be back to normal now.
[13:31:51] <godog>	 jelto: nicely done re: gerrit 991000 
[13:34:21] <jelto>	 godog: thanks! I'm not a fan of migrating EOL services but this made the migration of the remaining services easier. And removing the misweb design-style-guide service from wikikube again is also quite easy.
[13:35:29] <godog>	 jelto: oh I was talking about the change number, +1 on the content/purpose though too
[13:35:56] <jelto>	 ah ok :D 
[18:29:09] <inflatador>	 legoktm re:email list creation | sorry, I just followed the docs on wikitech. It's not working, so I'll delete the list and try again w the Google groups approach as recommended by j-oe
[18:31:29] <mutante>	 volans: I copied my .bash_history from cumin1001 to cumin1002 to incentivize myself never to use the old one again, ACK
[19:54:50] <kamila_>	 volans: I have a present for you: the reimage cookbook needs locking or something for downtiming the host, see the run from https://phabricator.wikimedia.org/T351074#9463635
[19:55:21] <kamila_>	 I was reimaging ~5 hosts at the same time
[20:05:43] <volans>	 kamila_: if you're referring to the line:
[20:05:51] <volans>	 _Unable to downtime the new host on Icinga/Alertmanager, the sre.hosts.downtime cookbook returned 99_
[20:06:03] <kamila_>	 volans: yep
[20:06:20] <volans>	 that's most likely because the puppet run on the alert host failed within the timeout passed to run-puppet-agent
[20:06:25] <volans>	 because of the concurrent runs
[20:06:27] <kamila_>	 oh
[20:06:29] <kamila_>	 I see
[20:06:57] <volans>	 on the alert hosts puppet is super slow, more than a minute IIRC
[20:09:40] <volans>	 in that case the reimage is just calling the downtime cookbook, which hasn't yet opted in to using the locking arguments
[20:09:51] <kamila_>	 right
[20:11:34] <volans>	 we could add the lock only when --force-puppet is set for example
[20:11:39] <volans>	 and make it exclusive per-host
[20:12:11] <volans>	 in all the other cases there is probably no need to limit the concurrency
[20:14:12] <volans>	 if you want feel free to open a task (to not forget) and/or send a patch :)
[20:16:00] <kamila_>	 right, yes, that'd probably work
[20:20:52] <kamila_>	 filed: https://phabricator.wikimedia.org/T355187
[20:21:29] <kamila_>	 I'm not sure I understand the problem well enough to send a patch, but it might be a valuable exercise :D 
[20:23:20] <volans>	 thanks for the task kamila_, if you want tomorrow we can pair to do it if you're interested
[20:23:31] <volans>	 (or any other day)
[20:24:11] <kamila_>	 sure, why not, thanks :)
[20:24:38] <volans>	 thank you for noticing and reporting it ;)
[20:26:34] <kamila_>	 I'm reimaging way too many things and feel awkward about the alert spam, so it's quite selfish :D