[12:35:44] btullis: regarding https://phabricator.wikimedia.org/T192551#8714226 I would suggest to start a new task and referencing that one rather than expanding that already closed one
[12:36:16] And I just saw that Jaime suggested the same thing heh
[13:01:43] marostegui: yeah, I wasn't really going to reopen it. Just adding a p.s. really. I stumbled across the ticket whilst looking at standard_packages.
[13:42:15] who should I be asking to review https://gerrit.wikimedia.org/r/c/operations/puppet/+/896318?
[13:45:00] taavi: if you can grab a +1 from dan.cy or jnuc.he then ill be happy to merge
[14:53:37] taavi: same, I had +1 it earlier, happy to merge whenever someone from the folks involved with scap has +1d it
[14:53:59] cool, thank you both. I'll get someone from releng to review it
[16:28:18] I am doing some maintenance in db1145 for work related to db1150 crash- I have silenced alerts but heads up in case something complains in the next minutes (it is me) CC rzl, claime, jhathaway
[16:28:29] ack
[16:28:42] heads up as in, you can ignore it for the next hour or so
[16:34:15] 👍
[16:34:41] for a ganeti vm with a public ip are there specific steps which need to be taken, before running makevm?
[16:34:57] like reserving the ip in netbox?
[16:35:27] looking at the ganeti docs in wikitech, but not seeing anything obvious
[16:35:42] re: db1145 e.g. there is sometimes monitoring jobs showing failures if the pool of servers is too small
[16:36:01] jhathaway: it's done by makevm itself, just pass the correct parameters for public network
[16:36:24] and then there is the sre.ganeti.reimage cookbook to install the OS
[16:37:55] ah wow TIL that public ips are also reserved by makevm
[16:37:59] I thought it was still manual
[16:38:01] nice :)
[16:38:14] elukey: wasn't that the case?
[16:38:49] volans: for some reason I though that only internal IPs were auto assigned, and stuff in subnets like https://netbox.wikimedia.org/ipam/prefixes/79/ip-addresses/ was assigned manually
[16:39:04] ip_v4_data = prefix_v4.available_ips.create({})
[16:39:20] volans: looks like it blew up in generate_dns_snippets, https://phabricator.wikimedia.org/P45906
[16:39:22] yep yep very neat, happy about it :)
[16:39:23] I think it has been always the case of auto-assigning as it has no logic to grab an already assigned one
[16:40:57] jhathaway: that error (KeyError: 20463) happened in the past and was supposed to be related to netbox's cache management
[16:41:02] it was a while since that happened
[16:41:15] I wonder if the redis issue from yesterday might have anything to do with it
[16:41:24] ah, interesting
[16:42:47] jhathaway: could you just retry it to see if it's a transient thing?
[16:42:55] yup!
[17:11:12] nothing major to report from EU oncall except db1150 crash because of ram issues
[17:13:29] cool
[17:17:59] mutante: when you have a second, could you check this please? https://phabricator.wikimedia.org/T323262#8710550 I think you might have the historical context
[18:15:26] volans: it does appear to be transient, second try succeeded
[19:42:46] do we have any helper scripts for doing inplace dist-upgrades?
[19:43:11] I only saw this, https://wikitech.wikimedia.org/wiki/Distribution_upgrades, which seems out of date
[19:43:16] i don't think they are advised jhathaway
[19:43:35] for sure, but sometimes they are helpful :)
[19:43:55] in this case I need to test some software on debian bullseye
[19:45:33] we have nothing documented (but would be good to have :-), but the steps are essentially:
[19:45:37] - stop puppet
[19:45:51] - edit sources.list to move to the new OS sources
[19:45:58] - apt-get dist-upgrade
[19:46:09] nod, thanks for the confirmation moritzm
[19:46:15] - rm -rf /opt/puppetlabs/facter/cache/cached_facts
[19:46:25] I'll add some docs, and maybe a helper script
[19:46:28] - re-enable puppet and run it once
[19:46:52] - apt-get dist-upgrade one more time (since the puppet run might have added some components)
[19:46:59] - run puppet a second time
[19:47:04] - eventually reboot into the new kernel
[19:47:23] not 100% sure if the path to the cached facts changed
[19:47:30] i'll verify thanks
[19:47:39] but having a script eventually would be nice indeed
[19:47:48] or a cookbook
[19:48:05] jhathaway: thx for closing the loop
[19:48:06] we don't do it very often, but still makes sense to eliminate human error :-)
[19:48:19] yeah
[19:48:25] as for dist-upgrade I'd avoid them as much as possible
[19:48:41] for sure, but we are sprinting ;)
[19:49:53] so? :D
[19:51:54] jokes aside, I agree, but trying to be pragmatic for some of these bullseye upgrades
[19:54:25] ok...
[19:58:02] we have a script in fr-tech that we use but not sure how much overlap it would have to prod.
[19:59:17] dwisehaupt: if you have a link I would to take a gander, to at least compare with my script, if I write one
[19:59:28] *would like
[20:01:52] it's in our internal puppet repo, so here's a paste with it: https://phabricator.wikimedia.org/P45908
[20:03:48] it shouldn't break too many things and could probably use improvement, but we have used it reliably for a while.
[20:04:24] dwisehaupt: thanks!
[21:46:56] volans: actually I am not sure myself about that ticket, I wasn't really involved in it, it was more the other people in the comment history
[21:47:41] volans: well, involved a little bit I guess. I will ask via an email
[21:48:23] ack, thx. If all the sustainability/onfire stuff is done I'd rather remove the related tags and leave the last bits of the task to its owners basically
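
Note on the in-place dist-upgrade steps listed between 19:45 and 19:47: a minimal sketch of the helper script that was discussed, written as a dry-run planner that only builds and prints the ordered commands and executes nothing. Several details are assumptions and not from the log: the names `Step`, `dist_upgrade_plan` and `render` are hypothetical, "stop puppet" is rendered as `puppet agent --disable`, editing sources.list is rendered as a plain `sed` rename of the release codename (which may not be sufficient for every repository line), `apt-get update` is added before each upgrade pass, and `buster` as the starting release is only an example. The cached facts path is the one quoted in the log, where it was flagged as not verified.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    description: str
    command: str


def dist_upgrade_plan(old_release: str, new_release: str) -> list[Step]:
    """Return the ordered upgrade steps from the log; nothing is executed."""
    if old_release == new_release:
        raise ValueError("old and new release must differ")
    reason = f"dist-upgrade {old_release} -> {new_release}"
    return [
        # Assumption: "stop puppet" means disabling the agent.
        Step("stop puppet", f"puppet agent --disable '{reason}'"),
        # Assumption: a plain codename rename; real sources may need more.
        Step("point sources.list at the new release",
             f"sed -i 's/{old_release}/{new_release}/g' /etc/apt/sources.list"),
        # Assumption: apt-get update is run before each upgrade pass.
        Step("first upgrade pass", "apt-get update && apt-get dist-upgrade"),
        # Path as quoted in the log, where it was flagged as unverified.
        Step("drop cached facts",
             "rm -rf /opt/puppetlabs/facter/cache/cached_facts"),
        Step("re-enable puppet and run it once",
             "puppet agent --enable && puppet agent --test"),
        Step("second upgrade pass, puppet may have added components",
             "apt-get update && apt-get dist-upgrade"),
        Step("run puppet a second time", "puppet agent --test"),
        Step("reboot into the new kernel", "reboot"),
    ]


def render(plan: list[Step]) -> str:
    """Format the plan as numbered comments followed by their commands."""
    lines = []
    for number, step in enumerate(plan, start=1):
        lines.append(f"# {number}. {step.description}")
        lines.append(step.command)
    return "\n".join(lines)


if __name__ == "__main__":
    print(render(dist_upgrade_plan("buster", "bullseye")))
```

Keeping the plan as data rather than running it directly leaves room to review the commands per host, or to move the same sequence into a cookbook later, as suggested at 19:47:48.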