[07:08:55] morning
[08:35:15] o/
[11:05:33] hey folks the maintain-kubeusers refactor should be ready for review https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-kubeusers/-/merge_requests/23
[11:06:11] given it is a large change, I wouldn't mind a more intense review. I can even explain the code in a videochat
[11:07:11] after some reviews, I am interested in deploying in toolsbeta to give it a more realistic environment beyond lima-kilo
[13:17:39] Hello! I am back from traveling but out again on Monday for the US holiday. Is there anything pressing I can/should look at before I dive into my many phab emails?
[13:18:47] andrewbogott: hey! T365096 seems relevant to what you were doing before you left so I've been saving it for you
[13:18:47] T365096: Adding new members to Cloud VPS project fails - https://phabricator.wikimedia.org/T365096
[13:19:12] thanks, I'll look
[13:20:36] yeah, very likely broken by the upgrade
[14:14:34] Can a cloud vps project be renamed? Or do we just make a new one to remove - characters?
[14:16:43] Making a new one and deleting the old is the best plan if possible.
[14:17:06] Some things (e.g. dns domains, cinder volumes) can be tediously transferred from one project to another. VMs typically can't.
[14:17:37] I am pretty sure that trying to rename an existing project is a fool's errand but I haven't really tried it
[14:23:36] Sounds good
[14:24:11] Can I get a +1 on T365822?
[14:24:11] T365822: TfInfraTest project - https://phabricator.wikimedia.org/T365822
[14:30:47] done
[15:08:13] slyngs: did you get your labtest ldap account sorted?
[15:59:57] Is there something happening with web proxies? I'm getting "Can't use domain foo.wmcloud.org"; it doesn't seem to be using the project subdomain when I try to create a web proxy
[16:34:46] Nothing broken that I know of but it could be nonetheless broken! Does it fail the same no matter what domain name you try?
[16:35:51] there's usually a better error message in the invisible-unicorn logs on the active proxy api host
[16:36:01] in this case it thinks something else is already using that name: Rejecting can_use_hostname (cloudinfra foo.wmcloud.org.), found existing records: foo.wmcloud.org.
[16:37:40] that is what I suspected
[16:42:03] Interesting. When a proxy is deleted is its name not recollected? Trying with tf-infra-test and I'm getting a different error, though can't use it in the new project space while it is deleted from the old project
[16:42:57] Hm, it /should/ be reclaimed.
[16:43:16] So you just deleted foo.wmcloud.org from the old project?
[16:44:15] tf-infra-test.wmcloud.org
[16:45:05] hey folks, i'm currently in the process of decommissioning blubberoid and i found a reference to it in https://gerrit.wikimedia.org/g/cloud/instance-puppet/+/5d315de6374f62679018a12ee06a6f0948d0058c/traffic/traffic-dnsbox.traffic.eqiad1.wikimedia.cloud.yaml
[16:45:25] is that something i need to be worried about when following https://wikitech.wikimedia.org/wiki/LVS#Remove_a_load_balanced_service ?
[16:45:34] or can i clean it up later?
[16:46:07] that's something internal to the traffic team
[16:46:32] removing the service might or might not break things on their VMs depending on if that hiera code is still consumed by a living VM
[16:47:16] dduvall: I'm assuming that blubberoid is not involved in my 'docker build --target production -f ./.pipeline/blubber.yaml' workflow?
[16:49:54] ah, ok. i will go back to traffic with that then
[16:50:35] andrewbogott: nope. the blubber buildkit frontend does not rely on blubberoid.
it supersedes it
[16:50:50] great, then I have no opinions
[16:50:52] soon blubber will not even produce a dockerfile
[16:52:26] probably that will confuse me all over again
[16:52:41] :D
[16:53:35] your user experience will not be affected if the native-llb port goes according to plan, and as we know plans never fail
[17:05:25] topranks: I'm reimaging a server (T364984) and the deb installer is asking me for a netmask and a gateway. Does that suggest anything specific to you?
[17:05:25] T364984: cloudvirt1041: can't boot after reimage - https://phabricator.wikimedia.org/T364984
[17:06:10] andrewbogott: yep probably means dhcp failed inside the Debian installer
[17:06:18] let me have a look
[17:06:25] thank you!
[17:06:39] this can sometimes be due to incompatible firmware version on the Nic
[17:06:39] partman also fails but as far as I know those two configs have nothing to do with each other
[17:06:53] (Except that I guess if the network is busted it probably can't download partman config)
[17:06:58] indeed yeah they are separate
[17:07:07] yep
[17:07:15] I have the console open here but can disconnect if you'd like it
[17:07:20] also the fact it made it to the installer means the first DHCP (done by bios) was ok
[17:28:06] I'd never rule out a hw issue, but the fact we get DHCP up and some network makes me think that's less likely here
[17:28:16] certainly cabling looks ok
[17:28:40] If partman is waiting for confirmation but got the config then that's somewhat expected.
But if it got no config at all, that's new to me
[17:29:01] andrewbogott: I'm not 100% sure what's expected
[17:29:18] the system has two ~500GB disks, but the suggested raid config doesn't look quite right to me
[17:29:28] very much outside my area of knowledge so I could be wrong though
[17:29:43] I don't get why it prompts either, if it's being pushed to it
[17:29:46] It should just do a mirror raid of the two drives
[17:30:47] yeah exactly
[17:31:50] ok actually what it shows probably looks ok then
[17:32:05] it sees a raid 1 device with 479GB capacity, which makes sense if they are mirroring
[17:32:09] not sure why it's prompting
[17:32:43] I've quit there now, being honest I can't work it out
[17:33:01] you could maybe try proceeding by hitting enter, although no guarantees it'll leave a working system
[17:33:22] I will restart the cookbook and see what I get
[17:33:42] I have lost many hours trying to get it to not ask for that 'press enter to confirm' step and have mostly stopped trying to fix it.
[17:33:58] the NIC is a BCM5720, which is less usual for us, on firmware 21.x which is normally good, but not 21.85 which is the "best", at least for BCM57412 card
[17:34:09] so perhaps a firmware change to 21.85 is worth a shot
[17:34:21] if the same thing happens again I'd say maybe asking in dc-ops channel, they may have seen it before
[17:34:35] ok!
[17:34:39] thank you for looking
[17:34:44] network-wise things look ok anyway
[17:35:07] np! sorry I couldn't be more help
[17:35:26] you may have been help! But won't know for 10 minutes or so :)
[17:39:47] it said 'network autoconfiguration succeeded' but is now back to asking me for the netmask
[17:39:51] so I'll see about a firmware change
[17:42:14] * andrewbogott -> lunch + brain reset