[07:16:07] Puppet, Infrastructure-Foundations, Wikidata, wdwb-tech, User-Ladsgroup: Migrate wikibase-dispatch-changes crons to systemd timers - https://phabricator.wikimedia.org/T288175 (Ladsgroup) >>! In T288175#7289164, @Ladsgroup wrote: > Maybe after a while we should delete the old logs, I can put a...
[08:51:33] netops, DC-Ops, Infrastructure-Foundations, SRE, ops-ulsfo: ulsfo: (2) mx80s to become temp cr[34]-drmrs - https://phabricator.wikimedia.org/T295819 (ayounsi) a:RobH `mgmt` ports to the `mgmt` switch please :) Once we have this and console, we can check and upgrade them.
[09:15:16] interesting, NTT operates SGIX
[10:07:00] netops, Infrastructure-Foundations, SRE, ops-codfw: Can't commit on asw-b-codfw - https://phabricator.wikimedia.org/T295118 (ayounsi) This will cause a hard downtime for 6 servers (rack [[ https://netbox.wikimedia.org/dcim/racks/57/ | B7 ]]), for up to 1h, but most likely less: (1) thanos-be2002...
[10:22:49] netops, Infrastructure-Foundations, SRE, ops-codfw: Can't commit on asw-b-codfw - https://phabricator.wikimedia.org/T295118 (LSobanski) Adding @MatthewVernon for the Swift hosts.
[10:25:23] netops, Infrastructure-Foundations, SRE, SRE-swift-storage, ops-codfw: Can't commit on asw-b-codfw - https://phabricator.wikimedia.org/T295118 (LSobanski)
[10:48:11] SRE-tools, netops, Analytics, Data-Engineering, and 3 others: an-worker hosts: Netbox - PuppetDB interfaces discrepancies - https://phabricator.wikimedia.org/T295763 (BTullis) If we look at another host that is not in the list, but was purchased and installed at the same time as an-worker110[45]...
[10:53:05] SRE-tools, netops, Infrastructure-Foundations, SRE: Netbox - PuppetDB audit 2021-11 - https://phabricator.wikimedia.org/T295762 (BTullis)
[10:53:30] SRE-tools, netops, Analytics, Data-Engineering, and 3 others: an-worker hosts: Netbox - PuppetDB interfaces discrepancies - https://phabricator.wikimedia.org/T295763 (BTullis) Open→Resolved Committed. The results are here: https://netbox.wikimedia.org/extras/scripts/results/1924060/ Resul...
[10:53:54] netops, Infrastructure-Foundations, SRE, SRE-swift-storage, ops-codfw: Can't commit on asw-b-codfw - https://phabricator.wikimedia.org/T295118 (MatthewVernon) I don't think so, no - the frontends will not route requests to down servers (at least in theory!); we'll be more vulnerable to failur...
[11:06:05] SRE-tools, netops, Analytics, Data-Engineering, and 3 others: an-worker hosts: Netbox - PuppetDB interfaces discrepancies - https://phabricator.wikimedia.org/T295763 (Volans) Thanks a lot!
[11:36:56] jbond: are you planning to merge https://gerrit.wikimedia.org/r/c/operations/puppet/+/736499? you +1'd it already, and I don't think it is waiting for any other reviews
[11:38:56] netops, Infrastructure-Foundations, SRE, SRE-swift-storage, ops-codfw: Can't commit on asw-b-codfw - https://phabricator.wikimedia.org/T295118 (BTullis) I don't believe that we need to do any prep or depooling work for furud.codfw.wmnet We can downtime it in Icinga, but I think that's the lim...
[11:41:54] majavah: sorry i forget you don't have +2, merged now
[11:42:38] heh :D thanks
[11:43:19] is ops access even possible for us volunteers?
[12:38:37] majavah: re ops access i have no idea
[13:44:08] netops, Infrastructure-Foundations, SRE, SRE-swift-storage, ops-codfw: Can't commit on asw-b-codfw - https://phabricator.wikimedia.org/T295118 (Gehel) The elasticsearch cluster should be able to cope with losing 2 nodes with no issues. Thanks for flagging this, and please ping @RKemper and m...
[15:20:46] netops, Infrastructure-Foundations, SRE, SRE-swift-storage, ops-codfw: Can't commit on asw-b-codfw - https://phabricator.wikimedia.org/T295118 (lmata) @ayounsi after a chat with the team we think we should be fine, we will monitor and be available should something happen.
[15:41:19] netops, DC-Ops, Infrastructure-Foundations, SRE, ops-ulsfo: ulsfo: (2) mx80s to become temp cr[34]-drmrs - https://phabricator.wikimedia.org/T295819 (RobH)
[15:43:05] netops, DC-Ops, Infrastructure-Foundations, SRE, ops-ulsfo: ulsfo: (2) mx80s to become temp cr[34]-drmrs - https://phabricator.wikimedia.org/T295819 (ayounsi) If you can take pictures of the front panels that could be useful to instruct remote hands when they get to drmrs too.
[16:34:40] netops, Infrastructure-Foundations, SRE: Rebuild Routinator (rpki) VMs with larger disk - https://phabricator.wikimedia.org/T292503 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by cmooney@cumin2002 for hosts: `rpki2001.codfw.wmnet` - rpki2001.codfw.wmnet (**FAIL**) - **Host steps...
[16:53:16] netops, DC-Ops, Infrastructure-Foundations, SRE, ops-ulsfo: ulsfo: (2) mx80s to become temp cr[34]-drmrs - https://phabricator.wikimedia.org/T295819 (RobH)
[17:32:03] moritzm: are you about?
[17:32:30] I'm re-trying the rpki VM rebuild in codfw again (had a few problems yesterday - hit that same issue with console a few times)
[17:33:01] Out of interest in that circumstance - when ganeti console didn't work - no DHCP was hitting the install server in the DC from it.
[17:33:30] So seems like problem may be more than just the virtual tty not working
[17:34:01] I'm being cautious today so if you could tell me if this looks good I'd appreciate it:
[17:34:02] https://gerrit.wikimedia.org/r/c/operations/puppet/+/739580
[17:35:21] Current status is VM created successfully with cookbook and next step is to merge that and power it on
[17:57:01] topranks: looking
[17:57:26] topranks: is rpki2001 gone?
[17:57:52] yeah removed via cookbook, and I manually verified references are gone in Netbox / DNS too.
[17:58:12] it crashed yesterday (ran out of disk), otherwise would have left it in place until rpki2002 was ready.
[17:58:55] +1ed
[17:59:26] ok thanks, I'm doing the same steps that failed yesterday but maybe luck is on my side today :)
[17:59:29] we'll see!
[17:59:46] lol ok
[18:13:11] And magically the console is working now :)
[18:14:05] yay
[18:17:45] the installer is unattended right? seems to have paused on the menu screen asking for partition layout... I'm fairly sure I didn't accidentally hit a key or anything (previous menus just zipped by)
[18:21:15] yes it should, unless there are new questions that the preseed is not answering ;)
[18:23:04] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Use next-hop-self for iBGP sessions - https://phabricator.wikimedia.org/T295672 (cmooney) Change went well in ulsfo earlier. De-pooled the site in DNS first and then proceeded with steps as outlined above. All went as expected. Did tak...
[18:24:07] It's still on that screen... should I hit enter to proceed with the default "Guided - use entire disk" ?
[18:24:26] it's definitely not a new screen in the installer same old one that was in previous debian too
[18:25:01] does the globbing in puppet match the new hostname?
[18:25:06] maybe there is no partman recipe matching
[18:25:43] nope
[18:25:47] topranks: see modules/install_server/files/autoinstall/netboot.cfg
[18:25:47] I think it should yeah. Moritz ended up doing this part for rpki1001 (didn't change name on that rebuild), but they all start with "rpki" so I expect it's fine.
[18:25:50] let me check
[18:25:50] not matching
[18:26:09] rpki[12]001)
[18:26:19] ah.... my bad.
[18:26:47] I should have checked, I get what I deserve for assuming.
[18:26:58] :D
[18:27:41] From the wiki I kind of assumed they were all *, didn't know we were being as specific as we are.
[18:27:55] we are
[18:28:18] I can probably just shut the VM, correct that and merge, then start it again and it'll boot into the installer?
[18:28:35] if you make it boot into pxe yes
[18:28:51] (boot_order=network)
[18:29:09] I haven't changed that to disk yet so it should still be set. But I'll verify.
[18:29:13] Cool - thanks for the help!
[18:30:06] anytime
[18:46:42] that did the trick btw, system is installing :)
[18:48:17] great
[19:20:34] best to convert the globbing to rpki*, given that those will always be VMs in the future as well
[19:21:11] moritzm: yeah that was my thinking, but then I figured we must be being so exact for a reason.
[19:21:28] I'll submit a new CR now in a moment, agree it'd be more flexible in future.
[19:22:16] Would the same logic apply for manifests/site.pp ?? Or perhaps we want to be more specific there?
[19:22:45] netops, Infrastructure-Foundations, SRE, SRE-swift-storage, ops-codfw: Can't commit on asw-b-codfw - https://phabricator.wikimedia.org/T295118 (ayounsi) For the record, there is also a link to lvs2007, after chatting with @bblack on irc, the usual `disable puppet then stop pybal` is to do bef...
[19:24:03] for site.pp being specific is useful since it controls whether a given host gets managed by puppet or not and we want those to be precise
[19:24:41] for the partman globbing we've fixed most cases over time to use globbing, but there's a few left still
[19:27:02] Ok I can see the logic in that.
[19:27:16] I'll change the partman stuff now shortly
[19:27:17] thanks
[19:30:48] netops, DC-Ops, Infrastructure-Foundations, SRE, ops-ulsfo: ulsfo: (2) mx80s to become temp cr[34]-drmrs - https://phabricator.wikimedia.org/T295819 (ayounsi)
[19:42:05] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Rebuild Routinator (rpki) VMs with larger disk - https://phabricator.wikimedia.org/T292503 (cmooney) Open→Resolved Ok both VMs have been rebuilt with 20GB disk and updated to version 0.10.2. rpki1001 remains with the same name, r...
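The installer stalled on the partitioning menu because no partman recipe matched the new hostname. The netboot.cfg entry quoted in the log, `rpki[12]001)`, looks like a shell `case` pattern. It covers rpki1001 and rpki2001 but not the replacement rpki2002, while the `rpki*` glob suggested later covers all of them. The sketch below illustrates that difference using Python's `fnmatch.fnmatchcase`, which implements the same `*`, `?` and `[seq]` wildcards. It assumes bare short hostnames and does not reproduce how netboot.cfg is actually evaluated; `matching_hosts` is a hypothetical helper.

```python
from fnmatch import fnmatchcase
from typing import List

# Hostnames from the log: rpki1001 kept its name, rpki2002 replaced rpki2001.
HOSTS = ["rpki1001", "rpki2001", "rpki2002"]


def matching_hosts(pattern: str, hosts: List[str]) -> List[str]:
    """Hypothetical helper: return the hosts a shell-style glob matches.

    fnmatchcase() handles *, ?, [seq] and [!seq] case-sensitively, like a
    shell `case` pattern, but not the `a|b` alternation `case` also allows.
    """
    return [h for h in hosts if fnmatchcase(h, pattern)]


if __name__ == "__main__":
    # The specific pattern quoted in the log: rpki2002 falls through.
    print(matching_hosts("rpki[12]001", HOSTS))
    # The broader glob suggested afterwards: every rpki VM matches.
    print(matching_hosts("rpki*", HOSTS))
```

The trade-off discussed in the log still applies. A broad glob suits partman recipes for hosts that will always be VMs. Node definitions in manifests/site.pp are kept precise because they decide whether a host is managed by Puppet at all.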