[10:23:46] Emperor: 11:30 today works for me
[10:26:52] hnowlan: when I said "I have meetings 11:30-12:00 and 16:00-16:30 UTC, pick a time :)" I meant "a time when I am not in meetings" :)
[10:26:59] sorry, that was a bit snarky.
[10:27:02] oh, d'oh.
[10:27:26] 11? shouldn't have to last long.
[10:27:36] hnowlan: cool, it's a plan :)
[10:28:33] effie: XioNoX, I'm on purpose not disabling puppet on alert hosts before putting new mw appservers in production, in agreement with godog, to check if this step of the production process is still needed
[10:30:59] cool cool
[10:59:38] hnowlan: here and ready to stare at swift dashboards :)
[11:02:38] Emperor: cool! pooling in a second
[11:04:04] going to pool at weight of 4 as opposed to 10 to ramp up slowly
[11:04:24] seems good
[11:10:08] alright, turning the weight down further
[11:10:22] capacity still not enough
[11:10:51] 5xx rate is way lower at least
[11:12:34] I didn't see a spike in 5xx on the ATS graphs at least...?
[11:13:42] there's a little bump here https://grafana.wikimedia.org/d/1T_4O08Wk/ats-backends-origin-servers-overview?orgId=1&var-datasource=eqiad%20prometheus%2Fops&var-cluster=upload&var-origin=swift.discovery.wmnet&viewPanel=12&from=now-30m&to=now
[11:14:30] and I was seeing NOSRV in the haproxy logs
[11:14:38] but still, not as chronic as other times
[11:19:43] Cool; that bump is smaller than the variation we see generally.
[11:43:37] Emperor: I think that's enough for now, I've depooled. thanks!
[11:54:46] NP
[16:51:08] does anyone know why cookbook sre.hosts.reboot-single expects Puppet to be enabled and fails otherwise and if there is a way to skip that?
[16:51:12] can't seem to find it in the docs
[16:51:31] the usecase here is that we can't disable Puppet before rebooting an LVS host as we need to make sure pybal is stopped
[16:52:34] sukhe: have you seen --enable-puppet ?
[16:52:47] volans: yeah but I think that enables Puppet?
[16:52:51] after the reboot
[16:52:53] if it is already disabled
[16:52:55] ohhh
[16:53:04] or not
[16:53:04] wait
[16:53:06] Enable Puppet with a specific reason. (default: None)
[16:53:08] cc jbond
[16:53:13] this doesn't seem to be that I think
[16:54:03] no you're right it re-enables it right before depool and then reboot
[16:54:53] I'll defer to jbond that added that option recently
[16:54:57] I should do a git log on it to see why this changed but I don't recall this being the case, so I was mostly curious
[16:55:03] volans: thanks not urgent
[16:55:23] the other option is to improve sre.loadbalance.restart-pybal to support also the reboot action
[16:57:28] there are many things that should be a cookbook. we will get to them slowly I guess!
[16:57:33] many processes I meant
[16:58:30] automate all the things :)
[17:22:14] sukhe: to be picky, you can still use it, with --enable-puppet, just checking when the puppet timer is supposed to run and just run the cookbook a few minutes after the last run so that it doesn't risk the race condition that puppet will run before the reboot
[17:23:46] ah ok, so that was the intended purpose
[17:26:17] I think the intended purpose was: I disable puppet with reason 'foo', then do stuff, then want to reboot and have puppet re-enabled on reboot ensuring that it runs @boot time
[17:26:33] we could argue that it could mask the systemd timer in a non-permanent way
[17:26:59] so can surely be improved, but I'll leave to j..bond to comment more precisely
[17:27:11] * jbond missed the pings reading scroll back now
[17:27:12] no problem, thanks for responding. we can discuss it tomorrow!
[17:27:22] jbond: no, get off the internet :P
[17:27:44] https://www.youtube.com/watch?v=YHf28w7LB8c
[17:34:35] hi all sorry dropped off just after sukhe said "jbond: no, get off the internet :P"
[17:34:43] suspicious ;)
[17:34:57] let me repaste what i said
[17:35:01] so my use case for this was so that i could disable puppet fleet wide and then reboot a host (specifically the puppet infrastructure)
[17:35:04] the reboot cookbook used to hang for ages waiting for a puppet run after it came back online
[17:35:09] so the idea of the new flag was that it would enable puppet before doing the reboot so that when it came back online the cookbook could detect the new puppet run and finish early
[17:35:17] however now i think about it i think i have changed the behaviour and made it so the cookbook fails if puppet is disabled instead of the old behaviour which although slow still works
[17:35:25] sukhe: does that sound about right from what you have seen?
[17:37:21] jbond: that does sound right, yep!
[17:37:37] I think at least from the point of view of some hosts (and maybe others)
[17:37:43] some -> LVS
[17:37:59] we need to keep Puppet disabled as otherwise it might restart pybal
[17:38:13] ack
[17:38:41] I think what we can do is have an optional flag to skip the Puppet disabled check perhaps
[17:38:56] or disable the run puppet?
[17:39:00] the timer
[17:39:01] looking at the code it does actually continue the reboot, however it now skips the self.puppet.wait_since and self.icinga_hosts.wait_for_optimal calls if puppet is disabled
[17:39:18] s/does/should
[17:39:36] jbond: and yeah so it fails execution
[17:40:13] let me check the logs i think im missing something
[17:40:14] volans: sure, I am not sure what the best way is (or why the change was done but now I understand)
[17:40:25] which cumin host did you use?
[17:40:29] 2002
[17:40:46] * jbond looking
[17:40:49] to be clear, that's why I didn't tag you or volans
[17:40:55] this can definitely wait till tomorrow or even next week
[17:41:02] sure sure :)
[17:41:03] :)
[17:41:45] and what host?
[17:42:10] I tried the cookbook with lvs4010
[17:56:35] sukhe: seems something wrong with my connection so i probably will call it a day now
[17:56:41] <3
[17:56:49] probably the song I sent :P
[17:56:52] enjoy!
[17:57:08] lol :)
[17:57:18] will do thanks
[19:04:42] Isn’t there supposed to be a wikimedia email for WMF group?
[19:04:42] See https://ldap.toolforge.org/user/Sharvaniharan
[19:05:19] User is in -operations