[11:41:25] hardware routers and ASICs are not really optimal for NAT, which is the cloudgw's main role
[11:41:45] most CG-NAT implementations are done on specialised appliances but they are mostly cpu-bound x86 devices under the hood
[12:48:09] dhinus: are the kernel errors something you are playing with? or a real alert? (or both? xd)
[12:49:19] kinda both :D
[12:49:43] it's a new server that was rebooted or booted for the first time
[12:49:58] and logged some errors on boot
[12:50:26] okok, I was going to ask this morning but got in a meeting and then forgot xd
[12:51:03] the alerts will disappear in 24h, but they are also a good opportunity for testing T382961
[12:51:04] T382961: Kernel error metrics have overlapping definitions - https://phabricator.wikimedia.org/T382961
[12:51:22] there's a patch there but I want to refine it further
[12:56:04] 👍
[12:56:22] left a message in one, but let me know if you want reviews
[13:39:09] hey folks, is ar.turo back around or still out?
[13:51:14] topranks: I'm still out for a few weeks, I'm sorry
[13:55:38] no probs at all arturo nothing urgent, we might skip the netbof chat today in that case
[13:55:45] hope you're doing ok <3
[14:20:29] * andrewbogott is here and ready whenever
[14:20:38] * dcaro around, let me join the meeting
[14:21:10] oops I'm oktad
[14:24:25] topranks: we're ready for you :)
[15:09:09] andrewbogott: I'll leave a note in the qos task with the couple of commands to enable the qos config on an osd and cluster-wide
[15:09:46] ok -- you mean there are other things besides the ferm change?
[15:13:21] yep, we have to enable a couple of configs on the ceph side, so ceph flags packets for heartbeats between osds, and from osds to mons
[15:14:52] ok
[15:18:35] ok guys I'm happy with the new patch
[15:18:36] https://gerrit.wikimedia.org/r/c/operations/puppet/+/1109434
[15:18:45] if you want to have a look and we can jump back on to merge?
[15:19:43] be there in 2
[16:26:01] dcaro: I restarted all the osds in cloudcephosd1040. Think I should leave it there for now or restart more things?
[16:26:59] we can try an osd in a different rack, just to make sure
[16:27:14] cloudcephosd1038 for example
[16:29:36] ok, I'll do that one
[16:32:44] thanks!
[16:39:29] andrewbogott: looks good :)
[16:40:10] yep, so far so good
[16:51:46] folks update from me
[16:51:56] checking both the switches and the graphing things look good
[16:52:03] everything working as expected :)
[16:54:07] great! So we'll leave it as is until Monday, then restart the rest of the osds?
[16:59:15] topranks: hey, is there anything I can do to move T379283 and/or T379282 forward? afaics those are still sort of blocked on the ip allocations
[16:59:16] T379283: IPv6 support in cloud-private - https://phabricator.wikimedia.org/T379283
[16:59:16] T379282: IPv6 for cloud-realm services - https://phabricator.wikimedia.org/T379282
[17:02:37] taavi: I'll try to take some time tomorrow to do the allocations
[17:03:01] it's not tricky but the DNS changes for the IPv6 reverses are cumbersome :(
[17:03:08] so probably easiest I take care of that
[17:03:25] how do we want to assign the end-host IP addresses? or have we thought of that?
[17:03:55] what we can do for the existing hosts is copy the last octet of their IPv4 address and use the same for IPv6
[17:04:26] that should work
[17:04:27] so for instance if a host is using 172.20.1.27 we can make its v6 address 2a02:ec80:a000:0201::27
[17:04:57] *or* we can copy what we do elsewhere in prod and use the whole thing, make it 2a02:ec80:a000:0201:172:20:1:27
[17:05:25] I'm fairly agnostic either way. the shorter address in the first way probably appeals to me more
[17:06:07] let's go with the short ones then?
[17:06:23] ok cool
[17:06:30] I'll take a look in the morning
[17:06:34] thank you!!
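(editor's note) The two addressing schemes discussed above can be sketched in a few lines of Python. This is only an illustration, not the tooling actually used for the allocations: the helper names `short_v6` and `long_v6` are hypothetical, and treating `2a02:ec80:a000:201::` as a /64 is an assumption based on the example address. In both schemes the decimal digits of each IPv4 octet are reused verbatim as hex digits. The long form fills all eight groups, so its canonical rendering contains no `::`.

```python
import ipaddress

# Prefix taken from the example in the discussion; the /64 length is an assumption.
PREFIX = ipaddress.IPv6Network("2a02:ec80:a000:201::/64")


def _octets(v4: str) -> list[str]:
    """Validate the IPv4 address and return its four octets as decimal strings."""
    return str(ipaddress.IPv4Address(v4)).split(".")


def short_v6(v4: str, prefix: ipaddress.IPv6Network = PREFIX) -> ipaddress.IPv6Address:
    """Short scheme: reuse only the last IPv4 octet, e.g. .27 -> ::27."""
    last = _octets(v4)[-1]
    # The decimal digits are read as hex so the address *looks* the same.
    return prefix[int(last, 16)]


def long_v6(v4: str, prefix: ipaddress.IPv6Network = PREFIX) -> ipaddress.IPv6Address:
    """Long scheme: embed all four octets, one per 16-bit group."""
    host = 0
    for octet in _octets(v4):
        host = (host << 16) | int(octet, 16)
    return prefix[host]


if __name__ == "__main__":
    addr = short_v6("172.20.1.27")
    print(addr)                      # short form
    print(long_v6("172.20.1.27"))    # long form
    print(addr.reverse_pointer)      # name needed for the ip6.arpa PTR record
```

The `reverse_pointer` attribute shows why the reverse DNS side is described as cumbersome: each address expands to a 32-nibble name under `ip6.arpa`, whichever scheme is chosen.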
[17:06:54] I will hold off on adding IPv6 AAAA records in Netbox for now probably
[17:07:06] let's get the networking working and test comms are ok, then we can add them
[18:36:00] topranks, here is one of the hosts waiting for cloud-private networking setup: T378825
[18:36:01] T378825: Q2:rack/setup/install cloudcephosd2004-dev - https://phabricator.wikimedia.org/T378825
[18:51:33] I made a merge request for T383357, but since I gave the +1 there somebody else should probably at least approve the MR.
[18:51:34] T383357: New flavor for integration project for larger worker testing - https://phabricator.wikimedia.org/T383357
[18:58:38] LGTM, though I think you have to run the plan on the patch or something using a cookbook to test it so it can show the actual changes it's going to make
[18:58:43] * dcaro looking for the docs
[18:59:53] thanks dcaro. the tofu stuff in that repo isn't something I have used yet.
[19:00:12] found it https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/OpenTofu#Usage
[19:01:09] I hope you can run that cookbook, otherwise we should fix it asap
[19:02:43] running now. it outputs a lot of stuff :)
[19:04:14] oh it reports on the MR. neat!
[19:05:39] yes it does :), let me take a look, if the output is simple I might be able to approve (if it's not, you might have to wait for someone that knows better...)
[19:06:40] The tofu repo I created for deployment-prep runs the plan and apply via gitlab ci -- https://gitlab.wikimedia.org/cloudvps-repos/deployment-prep/tofu-provisioning/-/jobs/407460 -- with the apply requiring manual approval to fire.
[19:07:13] would be nice to set that up there too yep
[19:07:45] there might be a task around
[19:07:56] btw. I +1d the patch, looks good :)
[19:10:18] thanks! I will try out the wikitech deployment instructions after I have my lunch. I think this change should only need to run in eqiad1-r?
[19:11:36] it will create the flavor also in codfw
[19:11:52] (for testing/consistency mostly)
[19:12:07] ah, and just not assign it to any projects, sure
[19:12:28] I just noticed that if you create a project-specific flavor and the project doesn't exist in eqiad the whole thing fails, but if the project doesn't exist in codfw1dev it seems to still work?
[19:12:48] (this news brought to you courtesy of 'wikitexexp' which was supposed to be 'wikitextexp')
[19:13:50] andrewbogott: aren't the projects specified per deployment?
[19:14:25] yes, which is why I was surprised. Maybe it skipped the flavor in codfw1dev entirely, but it /said/ it was making it...
[19:14:27] * andrewbogott checks
[19:15:14] the plan said it would create the flavor in codfw, but only assign the project for eqiad
[19:15:49] so what happens if we create a private flavor without any project assigned? I would've thought that impossible
[19:17:13] I guess it's fine, it's just a flavor that no one can use. Seems harmless
[19:17:21] iirc it's possible, it's just not a very useful flavor
[19:18:47] I'm curious to see what happens :), but I have to log off, I'll read the scroll on monday
[19:18:50] cya!
[19:18:52] * dcaro off
[20:58:43] wow. my horizon login just did ~30 redirects between idp and openstack.eqiad1.wikimediacloud.org before finally completing.
[21:00:30] andrewbogott: it looks like you merged and applied that tofu patch, is that correct?
[21:00:36] yep!
[21:00:40] that's fine, right?
[21:01:06] yeah, totally. thanks
[21:01:20] regarding the redirects: I see that on labtest pretty often but haven't on 'real' horizon. Do you have a theory about why it happens? I would certainly like it to not, and it's likely due to some weird keystone thing I misconfigured.
[21:03:36] I'm not sure. The one time I got tired of waiting to see if it would complete I purged my session cookies for both horizon and idp and it worked right away after. I think we'd need logs from the keystone side of things to try and figure out why it redirects back to idp instead of accepting the inbound OIDC data.
[21:04:35] yep, also my experience that it works on a cold login...
[21:04:43] I will at least open a task for it if you haven't already
[21:05:13] This time I was in a state where my session had been authed to horizon before I left for lunch and when I got back my first page change sent me into the auth loop.
[21:32:28] * andrewbogott enters T383370