[06:53:25] creating a ganeti VM, cookbook step runs DNS update but the diff I get includes an unexpected change:
[06:53:29] -asw-a-eqiad 1H IN A 10.65.0.17
[06:54:54] I don't see a matching change in DNS repo so it should be from someone editing mgmt in netbox?
[06:55:54] mutante: https://netbox.wikimedia.org/extras/changelog/65107/
[06:56:06] and the icinga alert was alerting tonight
[06:57:15] volans: thanks! hmm.. it seemed unusual to delete only mgmt
[06:57:22] I think asw-a8-eqiad
[06:57:25] were offlined
[06:57:26] but looking at it .. Host asw-a-eqiad.eqiad.wmnet not found: 3(NXDOMAIN)
[06:57:32] so the main IP is already gone
[06:57:34] should be 2
[06:57:50] the actual one is asw2-a-eqiad.mgmt.eqiad.wmnet
[06:57:52] I guess then it can't hurt much and I can accept the diff
[06:57:58] but if XioNoX or topranks could confirm
[06:58:01] would be better
[06:58:05] yea
[06:58:06] *agree but...
[06:58:41] yeah the very old switch stack got decom
[06:59:06] aha:) So ok to go ahead and delete mgmt, right?
[06:59:28] yep
[06:59:36] thanks, doing!
[06:59:57] https://phabricator.wikimedia.org/T218734
[07:02:42] I would leave a comment to remember running the dns cookbook but I think the actual issue is jclark doesn't have the needed access
[13:20:57] legoktm, rzl: FYI I've released and deployed the latest Spicerack today, AFAICT the downtime/remove_downtime for services works as expected. Only nit is that it needs to match the whole string, because icinga-status uses fullmatch(), we should probably add that to the docstring.
[13:21:45] p.s. you got lucky for the live-test that I was releasing today anyway, I think that a bit more coordination could have helped here ;)
[14:02:37] jbond: why is pcc lying to me? :(
[14:04:13] e.g. https://puppet-compiler.wmflabs.org/compiler1002/31038/db2112.codfw.wmnet/index.html - the diff is lying about the current prod config
[14:04:37] looking
[14:05:13] if you compare with icinga, you'll see that the check already has #p.age appended to it
[14:13:23] I don't see the "#page" at https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=db1163&service=MariaDB+read+only+s1
[14:13:33] db1163
[14:13:44] kormat: i think it is because it relies on mediawiki::state which reads a file directly from the puppet master /etc/conftool-state/mediawiki.yaml
[14:14:24] on the compilers we have "primary_dc: eqiad
[14:14:25] "
[14:14:47] mutante: sure?
[14:15:15] jbond: oh :/
[14:15:22] jbond: that's.. problematic
[14:16:02] kormat: yea, no string "page" on the host overview for that one? https://icinga.wikimedia.org/cgi-bin/icinga/status.cgi?host=db1163
[14:16:13] mutante: sorry, let me rephrase
[14:16:18] mutante: yes, that's expected.
[14:16:24] the related CR is actually going to change that
[14:16:31] but that's been the standard up until now
[14:17:00] then the compiler says that it's going to change
[14:17:12] I just looked at the first host
[14:18:10] kormat: https://phabricator.wikimedia.org/T290665 ill see if there is a quick fix later today/tomorrow
[14:18:10] if it's just about codfw and not eqiad.. then ACK
[14:20:36] jbond: wonderful, thank you!
[14:27:29] kormat: no probs
[14:45:17] kormat: intermittent "(Can Not Connect to MySQL)." on Phabricator
[14:46:02] mutante: thanks, looking.
[15:10:43] volans: nod -- full-match is the expected behavior but you're right, it's only documented on the icinga-status side, will send a fix
[15:10:56] thx!
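To illustrate the fullmatch() nit volans mentions above: Python's re.fullmatch() only succeeds if the pattern consumes the entire string, so a partial service name will not match an Icinga service description. A minimal standard-library sketch of that behaviour (illustration only, not the actual spicerack/icinga-status code):

    import re

    service = "MariaDB read only s1"
    # fullmatch() requires the whole string to match...
    print(re.fullmatch("MariaDB read only s1", service) is not None)  # True
    print(re.fullmatch("MariaDB read only", service) is not None)     # False
    # ...whereas match() would accept a prefix, which is what a caller might expect.
    print(re.match("MariaDB read only", service) is not None)         # True

Hence the note above: when removing a downtime by service name, the full Icinga service description has to be passed.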
[15:54:55] volans: thanks, let me add that to our /Coordination subpage
[16:01:26] use the following command to view the live-test dc switch on cumin1001: tmux attach -t dc-switch-live-test
[16:01:51] can't find session dc-switch-live-test
[16:01:56] sudo?
[16:02:02] as root
[16:02:05] ok :)
[16:02:22] jelto: did you want the rest of us to use -r as well so that our tmuxes are read-only?
[16:02:48] I think most of you can use -r
[16:04:57] for non-roots, we have a google meet with screensharing, I'll put the link in _security
[16:05:14] (but all the discussion will happen here, on IRC)
[16:05:30] no live-commentary? :-P
[16:05:30] the services switch doesn't have a --live-test mode
[16:05:39] we can --dry-run it though
[16:06:00] like so?
[16:06:01] (which would go before the cookbook name iirc)
[16:06:08] before the cookbook name
[16:06:11] is a cookbook option
[16:06:16] not the specific cookbook one
[16:06:21] *it's a general cookbook option
[16:06:22] 👍
[16:06:33] cookbook -h
[16:06:41] and cookbook sre.switchdc.services -h
[16:06:46] are always your friends ;)
[16:07:00] is there a reason services doesn't have a --live-test? should we add one?
[16:07:01] I tried that and I just got "#--- Switch Datacenter for Services args=['-h'] ---#"
[16:07:11] jelto: lgtm
[16:07:19] rzl: ah, because services is a directory
[16:07:23] right
[16:07:33] dry_run=True, all good in that sense
[16:07:53] so 00-reduce-ttl-and-sleep would be the first cookbook we would like to execute, right?
[16:07:56] that's a good question if we should support -h for directories, but offtopic
[16:08:01] legoktm: I think --live-test wouldn't actually do anything different from just running the full cookbook, volans can check me
[16:08:14] assuming you did it in the same direction, e.g. eqiad->codfw right now
[16:08:17] indeed, if you run it with DC inverted
[16:08:23] would be the same AFAIR
[16:08:36] as it should be a noop
[16:08:39] mediawiki live-test has to do things like skip setting RO, but for services we don't have any steps like that
[16:08:42] unlike the mediawiki one
[16:08:46] right
[16:08:52] jelto: yep
[16:09:43] okay then I will execute the first cookbook now 00 for services
[16:09:50] 🚀
[16:10:47] I think we need to exclude "mwdebug" from this
[16:10:55] we should add elevator music to that cookbook
[16:13:48] where is the "label swift did not match regex..." coming from?
[16:15:29] conftool apparently
[16:16:06] should I wait because of the swift error or can this be investigated later?
[16:16:15] no, you can keep going
[16:16:30] legoktm: in dry-run mode verbose is also activated
[16:16:33] that's the expected output since swift isn't one of the services we're switching
[16:16:34] and conftool is pretty verbose
[16:16:45] ack
[16:16:50] * volans not sure at which line you're looking at though
[16:16:50] okay then I will keep going with 01-switch-dc
[16:17:03] volans: is all of that output logged to a single file somewhere? I see it isn't in -extended.log
[16:17:12] yes it should
[16:17:21] for the directory though
[16:17:30] oh no yep there it is
[16:17:45] /var/log/spicerack/sre/switchdc/services-extended.log
[16:18:46] in the real thing, here's where we'll stop and make sure everything is still healthy
[16:18:57] before running 02-restore-ttl I mean :)
[16:19:18] makes sense :)
[16:19:37] woot
[16:19:50] so, if we want to do a services live test, we can rerun the same thing without --dry-run
[16:19:57] I don't know that we need to do that, though
[16:20:17] there's not much complexity there and I don't think we've touched it since last time, right?
[16:20:48] the only issue I noticed is that mwdebug needs to be added to MEDIAWIKI_SERVICES so it gets excluded (and also added in the MW cookbooks)
[16:20:49] so now I keep going with sre.switchdc.mediawiki cookbooks
[16:21:05] legoktm: yeah, smart - want me to write up a task, or will you just remember?
[16:21:52] I wrote it down in my notepad
[16:21:55] 👍
[16:22:16] jelto: I'll defer to others but I'd be inclined to run the MW cookbooks with --dry-run first, and then --live-test after
[16:22:30] +1 on --dry-run first
[16:22:57] like that? cookbook --dry-run sre.switchdc.mediawiki eqiad codfw --live-test
[16:23:16] I think only --dry-run, I'm not sure if we've ever tested --dry-run and also --live-test?
[16:23:26] I guess there's no reason it wouldn't work
[16:23:45] volans: ^? now I'm curious
[16:24:05] (but the more useful test is probably just one or the other)
[16:24:14] args lgtm
[16:25:30] dry_run is embedded in all the spicerack modules and in some cookbooks that do something specific
[16:25:30] ok I will start 00-disable-puppet now, ping me to stop :)
[16:25:42] will surely "work" using both options for some definition of work
[16:25:51] depends how the cookbook or libraries use the results
[16:25:56] of things that don't change
[16:26:06] nod
[16:26:08] jelto: go ahead
[16:28:50] heh, the real advantage of slower warmup scripts was we didn't have to sit around waiting for this in dry-run mode :P
[16:29:34] this time.sleep is the last line of the cookbook btw, so if we were really impatient we could just ^C out of it
[16:29:38] I don't feel strongly though
[16:29:47] jelto: I think you can ctrl+c...yeah, what rzl said
[16:30:08] everything else looks good, modulo legoktm's point about mwdebug
[16:30:39] the ERROR is just from the ctrl-c, we're all good
[16:31:10] jelto: wait
[16:31:20] this would be a scary question since codfw is the wrong DC, but we're in dry-run
[16:31:24] oh
[16:31:27] forgot about dry run
[16:31:33] so it won't actually warm anything up
[16:31:33] jelto: continue :)
[16:31:54] yeah, I thought we were in --live-test inversion mode and was expecting eqiad
[16:32:00] (when we live-test instead, it'll actually do the warmup, but it'll swap DCs for this step so that we warm up the passive DC instead of the active one)
[16:32:06] > Warmup completed in 0:00:00.000162
[16:32:15] efficiency!
[16:32:19] lol
[16:32:30] and I see six runs as expected
[16:32:54] we should maybe clarify that `The script will re-run until execution time converges.` log line, but not a blocker
[16:33:53] those cumin failures are expected in dry-run, we didn't actually kill any processes so they're still running
[16:34:19] yep
[16:34:45] ok the next cookbooks should be executed rather quickly, right? because 02-set-readonly has user impact
[16:34:55] during the real thing, yes
[16:35:43] instead of stopping for approval at each step, you'll stop before 02- and then continue all the way through 07- without asking -- instead just keep an eye on IRC, and stop if we're all yelling stop :)
[16:35:58] plus probably stop if the output is full of errors or whatever
[16:36:12] alright
[16:36:52] (but there's no user impact expected today, during either --dry-run or --live-test, just to be clear)
[16:37:19] sorry hit ctrl+c :(
[16:37:32] no worries, you can restart from where you were
[16:37:54] all you lose is the nice [PASS] [ERROR] states, so nbd
[16:38:04] output lgtm so far but I haven't been checking individual DB hostnames or anything
[16:38:29] legoktm: oh, did you happen to verify we did the right thing wrt x2?
[16:38:56] I guess that was before the last switch so it's already tested, but
[16:39:12] * legoktm looks
[16:39:42] jelto: hang on a sec :)
[16:39:48] ack
[16:41:19] it should've just excluded x2 entirely but I do see it being queried in the logs
[16:42:21] hrm
[16:42:34] no, that's just for the heartbeat part
[16:42:41] ahh, I was about to ask
[16:43:39] lgtm, it did not try to set it read only
[16:43:53] 👍
[16:44:08] so continue with 06-set-db-readwrite?
[16:44:19] +1 from me
[16:44:35] yep, fire away
[16:46:40] cool, and in the real thing this is where we'll stop again for a while
[16:47:08] to make sure everything is basically in good shape, and switch back quickly if we need to (unlikely)
[16:47:58] go ahead when ready though
[16:50:02] woot
[16:50:40] the only note I have is rzl's comment about improving the message about the warmup script
[16:50:52] no reason we have to do that before Tuesday IMO
[16:51:14] indeed
[16:51:49] https://wikitech.wikimedia.org/wiki/Switch_Datacenter#Phase_8_-_post_read-only says "The parsercache hosts and x2 will need to manually be updated in tendril"
[16:51:57] but I see
[16:51:59] 2021-09-09 16:49:53,540 DRY-RUN jelto 9449 [DEBUG remote.py:651 in _execute] Executing commands ['mysql --skip-ssl --skip-column-names --batch -e "UPDATE shards SET master_id = (SELECT id FROM servers WHERE host = \'db2142.codfw.wmnet\') WHERE name = \'x2\'" tendril'] on 1 hosts: db1115.eqiad.wmnet
[16:53:32] I think my patch related to x2 may have accidentally fixed this? cc: kormat, marostegui ^
[16:54:32] ready for live test?
[16:55:04] don't forget this will make some noise in the SAL, worth !logging something ahead of it
[16:55:27] blah blah live test blah blah no real user impact expected but we're monitoring blah blah etc
[16:55:38] so DC_FROM eqiad -> DC_TO codfw is correct now?
[16:57:52] eqiad -> codfw is correct yep
[16:57:55] args lgtm
[16:58:00] +1
[16:59:50] the good news is the service names are in the cookbook, so the mwdebug fix doesn't require a spicerack release afaict
[17:04:03] yep :)
[17:05:18] cache warmup in eqiad is correct -- we might see some appserver latency alerts in eqiad, they're okay
[17:05:35] +1
[17:06:48] you can see the spike at https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?orgId=1&var-datasource=eqiad%20prometheus%2Fops&var-cluster=appserver&var-method=GET&var-code=200&from=now-1h&to=now
[17:08:09] first warmup took 30 seconds, all the rest took ~15
[17:09:21] I continue with 02-set-readonly ok?
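As an aside on the warmup behaviour observed above (six runs; the first took ~30 seconds, the rest ~15): a minimal sketch of what "re-run until execution time converges" could mean in practice, assuming a simple stop-when-no-longer-improving loop. Function names and the 10% threshold are illustrative assumptions, not the real warmup script:

    import time

    def run_until_converged(warmup, max_runs=10, improvement=0.10):
        """Re-run the `warmup` callable until a run is no longer
        meaningfully faster than the previous one (here: <10% faster)."""
        previous = None
        for run in range(1, max_runs + 1):
            start = time.monotonic()
            warmup()
            elapsed = time.monotonic() - start
            print(f"warmup run {run}: {elapsed:.1f}s")
            if previous is not None and elapsed > previous * (1 - improvement):
                break  # converged: caches are warm, runs have stabilized
            previous = elapsed

The actual cookbook's convergence criterion may differ; the point is only why the first run is slow (cold caches) and the later ones settle to a stable time.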
[17:10:50] I still see php processes on mwmaint2002, do we skip killing them in live-test mode?
[17:10:53] checking
[17:11:06] yeah we do skip it, okay
[17:11:13] yeah, since it's active
[17:11:46] however all the systemd jobs have correctly disappeared from mwmaint1002
[17:12:14] 👍
[17:12:22] jelto: lgtm to continue
[17:12:49] jelto: go ahead and practice doing these steps one-after-another without pausing, if you want
[17:13:09] and also practice watching in here in case we yell stop :)
[17:13:16] ack
[17:14:22] nice
[17:14:26] woot
[17:15:46] for today I can continue right? Next week we would look a little bit more if everything seems healthy?
[17:16:26] yeah, here we would test edits, look at dashboards, etc
[17:16:44] we don't want to wait *forever* with maintenance disabled, but we would pause and check things out
[17:16:53] there was an icinga alert for scs-c1-eqiad.mgmt.eqiad.wmnet rebooting just now
[17:17:03] yeah I saw that, it has to be coincidence though right?
[17:17:48] topranks, XioNoX: around? ^^
[17:17:55] I guess we could rerun and see if it reboots again :P hard to imagine though
[17:18:30] yeah, I think it has to be a bad coincidence
[17:18:58] should I rerun some cookbook or continue?
[17:19:00] the only major traffic we sent was the warmup script, and that finished minutes before the alert fired
[17:19:03] what did happen in parallel?
[17:19:08] but yeah
[17:19:19] we're running the DC switchover live test (eqiad -> codfw)
[17:19:59] but it runs the warmup process against eqiad to avoid sending a bunch of requests to codfw, impacting real traffic
[17:20:15] yeah coincidence
[17:20:37] ok, thanks for looking :)
[17:20:51] jelto: I think you can continue now
[17:20:54] +1
[17:21:17] legoktm: rzl https://phabricator.wikimedia.org/T238036#7342571
[17:21:49] oh, perfect
[17:23:36] legoktm: have we ever talked about running 08-start-maintenance first, and moving all the other 08- cookbooks to 09-?
[17:23:49] since there's more and more stuff in phase 8 now
[17:24:15] cc volans if you're still nearby, and happen to have context on it ^
[17:24:19] no, but I think that would be reasonable
[17:24:33] I think the jobrunner step should probably be 08 still though
[17:24:40] mm that's true
[17:24:50] and it's quick anyway
[17:25:10] I think 08 was the catchall for all the cleanup stuff after we're back in RW and safe and sound
[17:25:11] so 08 is "get everything MediaWiki running again" and 09 is updating other things and resetting TTLs
[17:25:19] no problem to add additional steps
[17:25:28] the TTL should be the last probably
[17:25:33] yeah agree
[17:25:48] if you have some priority post-RW steps
[17:25:53] and I think in particular, maintenance scripts are the last thing people will actually be *waiting* for
[17:25:55] leave them in 08 and move the others in 09
[17:26:27] not nearly as much as they're waiting for read-write, but it's still a little time-sensitive especially until the WDQS dispatcher rewrite
[17:27:15] yeah, we won't be at 100% edit rate if maintenance doesn't update the WDQS lag status, which puts a hold on all Wikidata bots
[17:27:47] so I guess the only question is, do we want to do this before Tuesday or not
[17:27:59] we should probably start holding off on last-minute changes, but the rename is pretty low-risk IMO
[17:28:40] I think as long as someone dry runs it before Tuesday then it's fine
[17:29:09] +1
[17:29:15] cool
[17:29:27] imagine! phase nine. what a time to be alive
[17:30:00] nice job jelto :)
[17:30:05] everything else lgtm
[17:30:12] and yeah +1, smoothly operated
[17:30:18] we'll have a 5 phase lead on the MCU once again
[17:31:58] thanks for the support! I leave the tmux session open for a bit in case somebody wants to check the backlog/output?
[17:32:17] I think no need, it's all in /var/log/spicerack/sre/switchdc/mediawiki-extended.log
[17:32:38] btw you should ask people joining to keep the terminal large enough
[17:32:45] okay then I will close the session
[17:32:46] as tmux will resize to the smallest one
[17:32:57] for your readability
[17:33:30] he tried using https://wikitech.wikimedia.org/wiki/Collaborative_tmux_sessions#Create_sessions_with_fixed_size but it didn't seem to work
[17:33:35] i tried some fancy tmux setting which forces the screen size but it was not working. So yes I will put a disclaimer next time to use bigger screens
[17:33:48] lol, great for trying
[17:33:52] that only works as of bullseye I think :(
[17:34:04] at least if it's the same setting that I was looking at last time
[17:34:26] cumin2002 is on bullseye
[17:35:02] really!
[17:35:05] hmmmm.
[17:35:12] since a while!
[17:35:47] but do we need to do another dry-run / live test on cumin2002 then first to be sure everything is working as expected?
[17:38:16] heh, I think we should do the actual switchover from cumin1001, and try bullseye next time
[17:38:34] why?
[17:39:00] do you think it's not that big of a change? or have people been running cookbooks from 2002 regularly?
[17:40:09] cumin2002 is a regular cumin host used by people and has been on bullseye since before the summer. The /var/log/spicerack directory was created on July 7th
[17:41:11] ok, I take that back then
[17:42:36] I can run the --dry-runs on cumin2002 if you renamed/added stage 9 before Tuesday if that helps. But I would also be fine just to stick with cumin1001. The screen size issue is not too bad
[17:43:11] you say that now ;) it turns out to be a lot more of an issue when a couple dozen people are connected, somebody always has a 10x24 terminal or something
[17:43:25] just because they don't realize it's important, easy mistake to make
[17:44:24] we'll just have to nag people again, like we did last time :p
[17:44:43] but I think doing the dry-runs to verify the stage 9 stuff on cumin2002 sounds good
[17:44:45] haha yep
[17:45:01] okay :)
[17:46:11] I can send a CR for the renames but I'll be offline this afternoon and tomorrow, so I won't be able to help test unless we do that on Monday
[17:46:34] (also happy to let someone else do that rename too, no reason it has to be me)
[17:48:21] I'm also not sure if https://gerrit.wikimedia.org/r/c/operations/cookbooks/+/718936 will land before Tuesday or not, but no big deal either way
[17:57:35] mailed https://gerrit.wikimedia.org/r/c/operations/cookbooks/+/720075/ but feel free to merge it without me, I'll be offline for the week in about 1h
[17:58:26] I'll take a look shortly
[17:58:57] and j.elto and I just tested, the tmux on cumin2002 has the feature to force the screen-size to the person running the commands
[18:00:14] I can do the dry-runs on Friday again, also on cumin2002 if that's ok for you.
[18:00:51] but I might ask around if someone more experienced wants to look over my shoulder :)
[18:01:15] sounds good to me
[18:01:43] perfect
[18:02:27] I'm out for today o/
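One sketch of why the proposed 08 -> 09 rename discussed above is low-risk but still changes behaviour: the switchdc steps encode their intended order in the numeric prefix (00-, 01-, ... 08-), so renaming the post-read-write cleanup steps from 08-* to 09-* is enough to push them after 08-start-maintenance, assuming the steps are run in ascending prefix order as they are named and documented. A toy illustration only; apart from 08-start-maintenance, 02-set-readonly and 00-disable-puppet, the step names below are hypothetical placeholders, not the real cookbook filenames:

    # Hypothetical step list; sorting by name shows the resulting order.
    steps = [
        "09-restore-ttl",        # hypothetical: TTL restore moved to the end
        "08-start-maintenance",  # stays in phase 8 so maintenance (and WDQS lag updates) resume ASAP
        "09-update-misc",        # hypothetical renamed cleanup step
        "02-set-readonly",
        "00-disable-puppet",
    ]
    for step in sorted(steps):
        print(step)
    # 00-disable-puppet, 02-set-readonly, 08-start-maintenance,
    # 09-restore-ttl, 09-update-misc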