[00:28:30] Rook: re today's list, there is no global account for "Séc".
[00:29:40] Oh, I suspect that was a two-word account and I only copied one of the words. Skip that one, if they come back we can block it then
[00:31:09] ok - locked the others - will investigate IPs later
[00:31:22] Thanks!
[09:00:21] thanks dcaro \o :)
[13:39:10] puppet is failing on cloudcontrol2006-dev
[13:39:49] Rabbitmq::User[designate]/Exec[rabbit_designate_create]/returns) Error
[13:41:21] systemd unit drain_rabbitmq_notification_error.service is failing
[13:41:42] I don't think I know offhand what that is about
[13:41:58] me neither... I'm having a look
[13:44:04] I'll try a random "systemctl restart rabbitmq-server"
[13:45:16] that fixed the other unit
[13:45:20] let's try puppet
[13:45:25] what was the error in the unit log?
[13:45:47] the error was "Exception: Received 500 Internal Server Error for path /api/queues/%2F/notifications.error"
[13:47:44] puppet is also happy
[13:49:18] we should rename rabbitmq to monkeymq
[13:50:59] :D
[13:51:57] arturo: how does cloud vps public ingress work on ipv6? just send traffic to the IP of a VM, or is there something similar to the floating IPs on v4 that limits direct access?
[13:52:57] taavi: direct access, controlled by security groups
[13:53:13] i see
[13:53:19] I will add explicit info about that here: https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/IPv6/initial_deploy
[13:53:50] and I assume additional VIPs for keepalived use can be allocated like they can on v4?
[13:53:51] arturo: gentle reminder to archive the etherpad and update the channel topic when you have 5 mins :)
[13:54:15] taavi: yeah, that trick is done via neutron ports, so it's a similar mechanism
[13:54:23] dhinus: ack, thanks for the reminder
[13:57:01] taavi: https://wikitech.wikimedia.org/w/index.php?title=Portal%3ACloud_VPS%2FAdmin%2FIPv6%2Finitial_deploy&diff=2244678&oldid=2244667
[14:03:50] * arturo nursery run & food break
[14:27:46] hmm, in Horizon I see https://labtesthorizon.wikimedia.org/project/instances/46d63ccf-dca3-4466-b865-99a4a8a6a17c/ (in proxy-codfw1dev) has been allocated '172.16.129.215', but DNS does not agree: https://phabricator.wikimedia.org/P71057
[15:48:50] * andrewbogott waves at taavi
[15:49:05] dcaro, the eternal life of tools-sgebastion-10.tools.eqiad1.wikimedia.cloud is related to the ongoing anomiebot work, right?
[16:09:15] andrewbogott: FYI david is usually not online on Fridays
[16:09:38] I think I knew that, he might catch it in the backscroll :)
[16:10:06] ok
[16:11:01] useful reminder though!
[16:11:08] I tried deleting and re-creating a codfw1dev VM and it still did not get a DNS name :(
[16:13:41] andrewbogott: I'm pretty sure the answer is yes to the eternal life of the buster bastion
[16:14:40] It serves a noble purpose
[16:27:42] taavi: don't rule out designate problems
[16:27:48] I have been seeing them all month
[16:28:01] restarting rabbitmq + all of designate usually solves the problem
[16:28:27] let me do that
[16:31:29] taavi: try again
[16:31:30] ?
[16:31:48] one moment
[16:32:57] hah, it works
[16:33:54] arturo: thank you!
[16:34:45] 🎉
[16:36:20] argh, the bookworm image in codfw1dev is old enough that the cumin ssh rules have the old bastion IP, so if the first puppet run fails you need to fix that manually to use the puppet cert refresh cookbook :/
[16:38:51] taavi: want me to update that image?
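For the record, the remedy applied twice above (the 13:44 rabbitmq fix and the 16:28 rabbitmq + designate fix) boils down to a short sequence like the following. This is a sketch reconstructed from the conversation rather than a documented runbook; in particular, the designate unit glob is an assumption about how those services are named:

    # inspect the failing unit; per the log it was getting
    # "500 Internal Server Error" for /api/queues/%2F/notifications.error
    # from the rabbitmq management API
    sudo journalctl -u drain_rabbitmq_notification_error.service
    # the "random" restart that cleared both the unit and the puppet failure
    sudo systemctl restart rabbitmq-server
    # for the DNS-lagging case, restart designate as well
    # (assumes the designate services are loaded systemd units matching this glob)
    sudo systemctl restart 'designate-*'
    # then confirm puppet applies Rabbitmq::User[designate] cleanly
    sudo puppet agent --test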
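Expanding on the 13:52 ingress answer: with no floating-IP layer in front of instances on v6, exposing a service is just a matter of an IPv6 ingress rule on the instance's security group. A minimal sketch with the standard openstack client, where the group name and port are placeholders:

    # allow inbound HTTPS over IPv6 from anywhere for members of this group
    openstack security group rule create \
        --ingress --ethertype IPv6 \
        --protocol tcp --dst-port 443 \
        --remote-ip ::/0 \
        my-security-group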
[16:41:46] good to annotate that somewhere, we will have similar problems in eqiad1 eventually
[16:42:28] andrewbogott: if it's easy, sure :P
[16:42:52] If it's not easy then something is broken. I'll give it a go
[16:50:05] arturo: just realized we can't share keepalived VIPs between legacy and vxlan networks :/
[16:50:46] that makes sense, as they are on different L2 domains
[16:50:55] what would be the use case? migration?
[16:51:05] yes
[16:51:35] what service do you think would be most impacted by that?
[16:52:15] toolforge redis is the only place where i think it can actually cause some problems
[16:52:24] otherwise it should be fine, even if it adds some extra manual work
[17:02:04] ok
[17:03:58] andrewbogott: I just noticed https://gerrit.wikimedia.org/g/operations/puppet/+/2d223927974f8789958b1edcfd163cd3b6503a07/hieradata/cloud/codfw1dev.yaml#54 needs updating before the new image would be helpful
[17:05:41] https://gerrit.wikimedia.org/r/c/operations/puppet/+/1091772
[17:06:50] isn't the IP for -03 wrong as well?
[17:07:18] I'm seeing 172.16.129.190
[17:07:32] not sure
[17:07:37] we could probably just drop it entirely
[17:26:08] andrewbogott: I can't find at the moment how to tell horizon which network to pick in the new VM panel
[17:26:49] I probably have that UI disabled and forced to a default right now.
[17:27:10] Should be easy to re-enable but will take a bit to redeploy. Are you talking about just codfw1dev for now?
[17:27:49] found it
[17:27:49] https://gitlab.wikimedia.org/repos/cloud/cloud-vps/horizon/deploy/-/blob/2024.1/runtime_local_settings.py?ref_type=heads#L470
[17:28:15] well, we will need to enable that UI soon as part of the migration to vxlan/ipv6
[17:28:22] I can create a phab ticket
[17:28:24] sure, do you mind making me a ticket?
[17:28:26] thx :)
[17:28:35] If it's blocking you from testing I can work on that today
[17:28:51] no, I'm about to log off
[17:31:15] ticket is T380081
[17:31:15] T380081: horizon: enable the UI to select networks on VM creation panel - https://phabricator.wikimedia.org/T380081
[17:31:43] thx!
[17:41:04] * arturo offline
[19:19:52] $ curl --connect-to ::[2a02:ec80:a100:1::e2] https://wmcs-proxy-test.taavivaananen.fi
[19:19:52] hello from taavi-backend.testlabs.codfw1dev.wikimedia.cloud via the web proxy!
[19:19:55] :-)
[19:31:23] There's a new bookworm base image in codfw1dev now -- it doesn't seem to work with nova-fullstack but I'm not convinced that it's any worse than what was there before
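A note on the 19:19 proxy test: curl's --connect-to option dials the given address instead of whatever the URL's hostname resolves to, while still using that hostname for TLS SNI and certificate validation, so it exercises the proxy's IPv6 path end to end without needing a DNS change. The empty HOST1/PORT1 fields before the address mean the override applies to every request:

    # connect directly to the proxy's v6 address, but still request (and
    # validate the certificate for) wmcs-proxy-test.taavivaananen.fi
    curl --connect-to '::[2a02:ec80:a100:1::e2]' https://wmcs-proxy-test.taavivaananen.fi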