[09:03:27] Short Description: Peering Request: From AS14907 WIKIMEDIA
[09:03:27] Requested for: Faidon Liambotis
[09:03:32] Opened: 2020-09-10 09:43:34 PDT
[09:03:40] 2022-07-15 14:10:48 PDT
[09:03:58] (name) would like to move forward and establish these peerings with you if you're still interested
[09:04:29] just 22 months hahaha
[09:04:44] hahaha
[09:04:46] who is that?
[09:05:35] it was sent to peering@, last Friday
[09:05:47] trying not to out them in a public channel :)
[09:06:18] I see it now, thx
[09:06:40] paravoid: got lost in the emails about the IX port being down
[09:06:59] I'll follow up with them
[09:07:44] that's the "slow open policy" :D
[09:09:26] XioNoX: you mean now, or in 22 months?
[09:09:51] the slowest game of ping-pong ever
[09:38:30] volans: the decom cookbook is neat!
[09:38:46] thx :)
[09:56:45] moritzm: looks like the decom is stuck with:
[09:56:45] Shutting down VM netboxdb2001.codfw.wmnet in cluster codfw
[09:56:45] ----- OUTPUT of 'gnt-instance shu...2001.codfw.wmnet' -----
[09:56:45] Waiting for job 1757100 for netboxdb2001.codfw.wmnet ...
[09:58:34] nevermind, it's started again
[09:58:42] continued, I mean
[10:03:47] yeah, I'm shuffling some VMs currently, and when a live migration is in progress this blocks until the ongoing migration completes
[10:04:16] alright!
[10:04:23] now it's waiting for:
[10:04:26] Issuing Ganeti remove command, it can take up to 15 minutes...
[10:04:26] Removing VM netboxdb2001.codfw.wmnet in cluster codfw. This may take a few minutes.
[10:05:56] it should resume soon, there's currently another migration in progress which should complete in ~10m
[10:06:47] no rush as long as the script doesn't time out
[10:48:41] moritzm: still syncing? I'm watching the decom for ar.zhel, who had to step out
[10:52:45] yeah, there are four other VMs getting moved to a new DRBD node before the removal happens, and one of them is puppetdb2002, and the source node is still on 1G, so this will take some more time
[10:53:19] shuffling VMs over a 1G link feels like transferring them via floppy disks :-)
[10:53:57] ahahahah
[10:54:04] ok no prob, just checking it was all still expected
[10:56:45] it went through
[10:58:08] ah yes, indeed
[13:37:10] does someone know how to use the sretest hosts to test the decom/provision/re-image cookbooks?
[13:37:33] can I just run the cookbooks, or are there special things to know?
[13:47:27] XioNoX: just announce it in -sre and ping anyone who is logged on, then have at it
[13:47:42] thx!
[13:47:47] np
[14:30:37] We're missing the steps for this workflow: https://wikitech.wikimedia.org/wiki/Server_Lifecycle#Decommissioned_-%3E_Staged
[14:32:22] the provision network attribute can't work as the interfaces already exist. Running the offline script seems a bit too much
[14:32:26] better call volans
[14:32:41] as it assumes we're unracking it
[14:33:57] yeah, thinking out loud here. Maybe it's not an action done often enough to be worth it. Or maybe we should have a script to only do the IP assignment
[14:40:20] sorry, in a meeting, will be back to you shortly
[14:40:44] no rush, thx!
[14:55:56] XioNoX: ah, so you decomm'ed it
[14:55:58] and it lost the IPs
[14:57:15] can you run the provision script in netbox for it?
[14:57:40] this is basically something I will need to address for the make-reimage-work-for-VMs project
[14:57:50] a way to keep the IPs on decom
[14:59:00] I don't recall if the script has some check that prevents it from running in this situation
[14:59:46] volans: the netbox provision script doesn't work as the host still has its interfaces
[15:00:16] so I guess we could run the offline script and then re-do the whole provision
[15:00:38] no
[15:00:40] but maybe we should have a shortcut
[15:00:42] it would change the mgmt one too
[15:00:55] ah right
[15:01:46] so the "quick" way
[15:01:58] which can't happen as it's configured and would need to have its ipmi wiped?
[15:02:03] is to delete the eno1/2 or ens2f0/1 ifaces, keeping note of the switch and port
[15:02:28] yes, it would need a manual change on the ipmi, and you will lose the capability to connect remotely to it
[15:02:59] yeah, switch port and switch port config (eg. vlan)
[15:03:09] See also https://wikitech.wikimedia.org/wiki/Server_Lifecycle#Rename_while_reimaging
[15:03:14] that does something similar
[15:03:17] decom + reimage
[15:03:35] alright
[15:45:57] re-image is now stuck on "No root file system is defined."
[15:46:02] (in d-i)
[15:48:39] but the hosts are in partman..
[16:02:27] here is the real error, does someone know what that means? https://www.irccloud.com/pastebin/vDCiO0IO/
[16:02:57] lvm was not set up
[16:02:59] I think
[16:03:18] and/or mdadm
[16:03:50] is there a cookbook for that?
[16:04:24] that's what it's supposed to apply: https://github.com/wikimedia/puppet/blob/production/modules/install_server/files/autoinstall/partman/raid1-2dev.cfg
[16:24:36] were the disks wiped during the decom?
[16:24:50] not sure if that might have any effect
[16:34:13] yeah they got wiped
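The log ends without a resolution, but a common cause of partman's "No root file system is defined." on reused hardware is leftover mdadm/LVM signatures that a partial wipe missed (the real failure reason is usually recorded in /var/log/partman inside d-i). A hedged sketch of that cleanup, not the actual Wikimedia cookbook: the device names in DISKS are assumptions, and with DRY_RUN=1 (the default) the commands are only printed, never executed.

```shell
#!/bin/sh
# Sketch only: clear stale mdadm/LVM signatures that can make a
# partman RAID recipe fail. DISKS is a hypothetical example list;
# DRY_RUN=1 prints each command instead of running it.
DRY_RUN=${DRY_RUN:-1}
DISKS="/dev/sda /dev/sdb"
RUN_COUNT=0

run() {
  RUN_COUNT=$((RUN_COUNT + 1))
  if [ "$DRY_RUN" = "1" ]; then
    echo "would run: $*"
  else
    "$@"
  fi
}

for d in $DISKS; do
  run mdadm --zero-superblock "${d}1"  # old RAID superblocks on partition 1
  run wipefs -a "$d"                   # remaining filesystem/LVM signatures
done
echo "planned $RUN_COUNT commands"
```

Run from the d-i shell (Ctrl-Alt-F2) with DRY_RUN=0 only after confirming the device names; in this log the disks were reportedly already wiped during decom, so this may not have been the cause here.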