[08:09:26] morning
[10:13:01] g'day
[10:13:34] [oops, thought this was a different channel, but g'day all the same]
[10:20:04] TheresNoTime: why would we not normally get a g'day?
[10:25:24] this channel is serious business only /joke
[12:34:44] taavi: hey, just checking what the plan is for the cloudrabbit servers?
[12:36:11] we're still planning to move them into cloud-private yeah, I guess that's possible now that cloudvirt-wdqs was moved?
[12:43:00] taavi: RELATED: I'm gonna try a firmware upgrade on the NIC in cloudvirt-wdqs1002 see if we can get some life out of it
[12:43:23] what was the re-image command you ran for it? if it's ok I'll retry that after and monitor what's going wrong
[12:59:08] topranks: hey! yeah, cloudrabbits are in theory ready to be moved now. I was talking with Andrew and Francesco about them yesterday, I think the agreement was that we want to wait until dhinus is done with the current openstack upgrade cycle since openstack is very sensitive to any rabbitmq operations and we don't want to have two openstack-disrupting things going on at once
[12:59:56] no probs yeah, I was working on an updated patch for the cloud-in filter which got me thinking about them
[13:00:04] so was just checking the status
[13:00:32] so very much in the roadmap still, but unfortunately can't be done just quite yet
[13:00:36] I think the cloud-in filter isn't relevant to them though, comms is from cloud servers on their 10.x IPs to cloudrabbit WMF public IP right now
[13:01:04] which will remain allowed (on labs-in filter), but best to move them anyway
[13:01:12] yep all good thanks for the update!
[13:01:16] for cloudvirt-wdqs1002, the command I was trying to use was 'sudo cookbook sre.hosts.reimage --os bookworm -t T346948 --new cloudvirt-wdqs1002'. it didn't boot to the debian installer. cloudvirt-wdqs1001 just needed to have the new NIC added to the boot order, but 1002 didn't see any links on the new NICs. I was meaning to ask dc-ops about that, a firmware upgrade seems like a good approach
[13:01:21] T346948: Move cloudvirt-wdqs hosts - https://phabricator.wikimedia.org/T346948
[13:02:11] taavi: thanks for that command, I'm upgrading the idrac now as a requirement to doing the nic firmware, hopefully it's as simple as that
[13:02:18] btw if you get a chance to review: https://gerrit.wikimedia.org/r/c/operations/homer/public/+/970767
[13:02:26] we certainly want to test in codfw first
[13:02:39] looking
[13:02:46] no hurry
[13:14:06] seems the firmware was up to date (can only see version after idrac upgrade)
[13:14:21] I'll try the reimage anyway, but I suspect it may be faulty NIC or DAC cable
[14:17:14] taavi: I think that is probably a physical issue, I asked Valerie to have a look on the task
[14:17:26] thanks
[15:35:21] topranks: you'll probably have to run the provision cookbook to configure the boot order to use the new nic on cloudvirt-wdqs1002
[15:40:13] could well be right :)
[15:40:15] https://usercontent.irccloud-cdn.com/file/HGvPNoI5/image.png
[15:41:36] bd808: andrewbogott: balloons: can I have the chanserv flags needed to update the topic here?
[15:42:41] I don't know how to do that offhand but certainly don't object
[15:45:43] taavi, give me a moment and I'll update chanserv
[15:47:31] taavi, you should have plenty of permissions now :-)
[15:47:48] yep, thanks
[16:00:58] taavi: seems the card in cloudvirt-wdqs1002 wasn't fully seated on the motherboard, it's installing the OS now
[16:01:35] John also noticed cloudvirt-wdqs1001 has the network card in the wrong port (port 2 not 1) which might cause us problems down the road
[16:01:55] is it possible to shut it down and move the cable? I'll change the references in /etc/network/interfaces to match
[16:02:28] topranks: yeah, that's fine, just downtime the host first so it doesn't page
[16:02:43] taavi: sure, will do
[16:43:48] cloudvirt-wdqs1001 is back online now, John also fixed a bad CMOS battery issue
[16:44:24] taavi: looks ok I think, one element I'm not sure is correct after the connection move from ens2f1np1 to ens2f0np0 is the vlan interface for the instances
[16:44:29] that's there, and connected ok
[16:44:54] https://www.irccloud.com/pastebin/B4R8kXdS/
[16:45:17] but the device is not a member of the bridge on the system
[16:45:21] https://www.irccloud.com/pastebin/lLlPkyAT/
[16:45:41] perhaps openstack changes that if there are VMs on the box, currently no tap interfaces so I assume there are not
[16:46:05] did you try rebooting the host already?
[16:46:45] it was just booted cold after the port move / battery fix
[16:47:06] yes, but after fixing /e/n/i I mean?
[16:47:12] yeah
[16:47:24] aha
[16:47:30] I'll try re-creating the VM on there
[16:47:35] I did that before power down, and no signs of an issue
[16:47:43] taavi: actually can you wait one sec?
[16:47:52] sure
[16:47:58] BIOS settings reset when old battery was removed
[16:48:16] John went in and re-set them up - but I think he failed to enable virtualization extensions
[16:48:25] file /dev/kvm doesn't exist
[16:49:02] ah, that'd do
[16:49:04] I'll reboot it again and enable that before you try to make the VM
[16:50:41] in the meantime, I'll fix the puppet config for cloudvirt-wdqs1002 post-move
[16:59:42] taavi: ok cloudvirt-wdqs1001 is back online and /dev/kvm shows
[16:59:58] strangely there is no bridge device on it now after reboot, but openstack creates that
[17:02:45] I told nova to reboot the canary instance running on it
[17:03:32] and I can ssh in, so success!
[17:04:21] topranks: how is cloudvirt-wdqs1002 going?
[17:04:52] let me check
[17:05:22] taavi: seems ok, it's into the first puppet run stage of the reimage cookbook
[17:06:50] good news on 1001 too :)
[17:21:22] taavi: cloudvirt-wdqs1002 reimage finished ok :)
[17:22:05] topranks: thanks, my canary VM there is having some problems so I'm having a look
[17:22:17] system just rebooted - was that you?
[17:22:20] yep
[17:22:33] ah ok
[17:38:47] topranks: got a VM to schedule correctly on 1002, so I think we're all done there
[17:39:23] Great!
[18:12:56] * bd808 lunch
[23:31:44] * bd808 off
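
Note on the two host checks discussed between 16:44 and 16:59 (whether /dev/kvm exists, and whether the VLAN interface is a member of a bridge): a minimal Python sketch of both, not the tooling anyone in the channel actually used. The function names and the example interface name are hypothetical. It assumes a Linux host, where sysfs exposes a `master` symlink under /sys/class/net/<iface>/ for an interface enslaved to a bridge.

```python
import os
from pathlib import Path
from typing import Optional


def kvm_available(dev_root: str = "/dev") -> bool:
    """Return True if the kvm device node exists under dev_root.

    A missing /dev/kvm usually means virtualization extensions are
    disabled in the BIOS or the kvm kernel module is not loaded.
    """
    return (Path(dev_root) / "kvm").exists()


def bridge_master(iface: str, sys_net: str = "/sys/class/net") -> Optional[str]:
    """Return the name of the device iface is enslaved to, or None.

    sysfs exposes a 'master' symlink for enslaved interfaces; the
    symlink target's basename is the master (e.g. bridge) name.
    """
    master = Path(sys_net) / iface / "master"
    if not master.is_symlink():
        return None
    return os.path.basename(os.readlink(master))


if __name__ == "__main__":
    # "vlan-example" is a placeholder, not an interface name from the log.
    print("kvm available:", kvm_available())
    print("bridge master:", bridge_master("vlan-example"))
```

As the 16:59:58 message points out, a missing bridge right after a reboot is not necessarily a fault, since openstack creates the bridge itself, so a None result from the second check needs that context.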