[09:05:03] morning!
[09:12:05] o/
[09:18:58] o/
[09:26:47] hello! I'm working today, travelling to kubecon tomorrow
[09:39:23] same here!
[09:43:52] same :)
[09:44:56] oooh, have fun at kubecon!
[10:09:28] new toolforge bastion alert: tools-bastion-12.tools.eqiad1.wikimedia.cloud
[10:09:36] please give that a test and report back if you have any issues
[10:12:08] where is my beloved -sge- keyword
[10:12:15] I miss something
[10:33:06] there was an interesting thread in the "#talk-to-qte" slack channel last friday: creating a new cloudvps project had the side effect of overriding a DNS entry (pixel.wmcloud.org) that was pointing to a different project
[10:33:54] I'm not sure where that DNS was configured, and if it's something that could happen again when we create new cloud vps projects
[10:34:19] that's an interesting bug.. can you file a task?
[10:34:29] sure
[10:38:32] looking for a review for https://gerrit.wikimedia.org/r/c/cloud/wmcs-cookbooks/+/1010520
[10:42:58] taavi: LGTM
[10:43:24] please review https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/223
[10:50:25] I'm about to upgrade toolforge k8s control plane to 1.24
[10:56:15] taavi: T360294
[10:56:15] T360294: [cloud-vps] creating a new project can override existing DNS entries - https://phabricator.wikimedia.org/T360294
[10:56:34] arturo: approved
[10:56:38] thanks dhinus
[10:58:01] thanks
[11:05:20] arturo: I manually approved your message to cloud-announce
[11:05:30] dhinus: thanks
[11:05:38] let me find why it was not auto-approved, maybe you need to be re-added to the list?
[11:07:36] I added you to the "owners" for that list
[11:08:25] thanks!
[11:25:21] taavi: the tools control plane is now in 1.24. Would you like to double check stuff before I upgrade the workers?
[11:26:12] I can poke around a few things
[11:27:53] on a quick glance everything seems fine.
I'd say go ahead, thanks
[11:28:01] ok, thanks
[11:28:07] everything looks good on this side too
[11:29:53] proceeding now
[11:35:07] what is tools-filesystemtest-1? is it still needed? if so, it needs an OS upgrade
[12:08:48] I have no idea
[12:24:20] quick review for https://gerrit.wikimedia.org/r/c/operations/puppet/+/1012368?
[12:27:13] with regards to `tools-filesystemtest-1` it was last used by brooke in 2021, I think we can get rid of it
[12:27:39] ok, will do
[13:10:22] * taavi paged for harbor
[13:11:54] Mar 18 12:58:39 tools-harbor-1 puppet-agent[2637805]: (/Stage[main]/Profile::Toolforge::Harbor/File[/etc/docker/daemon.json]/content) -}
[13:11:54] Mar 18 12:58:39 tools-harbor-1 puppet-agent[2637805]: (/Stage[main]/Profile::Toolforge::Harbor/File[/etc/docker/daemon.json]/content) \ No newline at end of file
[13:11:54] Mar 18 12:58:39 tools-harbor-1 puppet-agent[2637805]: (/Stage[main]/Profile::Toolforge::Harbor/File[/etc/docker/daemon.json]/content) +}
[13:12:17] sigh
[13:14:57] it's back
[13:16:27] dunno why docker did not auto-start the containers after the docker.service restart
[13:27:10] taavi: I'm upgrading ingress nodes now
[13:27:14] FYI
[13:27:22] ack
[13:33:06] completed
[13:33:28] \o/
[13:34:25] so finally we get to the elephant in the room: T279110 and T335131
[13:34:25] T279110: Replace PodSecurityPolicy in Toolforge Kubernetes - https://phabricator.wikimedia.org/T279110
[13:34:26] T335131: Toolforge: replace admission controllers with an existing policy admin project - https://phabricator.wikimedia.org/T335131
[14:20:07] arturo: the decision wasn't to use one of those for the PSP and then reconsider? (instead of replacing admission controllers right away)
[14:20:16] I'm about to switch the central cloud-vps puppetmaster to a new server. Ideally it won't be very noticeable (and won't affect toolforge in any case)
[14:20:47] every time I try this script it fails for a different reason... is it me or is it it?
https://www.irccloud.com/pastebin/OzCCHceB/
[14:21:03] blancadesal: you can use the packages built in cy
[14:21:05] *ci
[14:21:11] blancadesal: https://gitlab.wikimedia.org/cloud/toolforge/toolforge-envvars-cli 404s
[14:21:49] oh this was renamed
[14:21:54] the url is https://gitlab.wikimedia.org/repos/cloud/toolforge/envvars-cli
[14:22:30] (with `/repos` and `s/toolforge-//`)
[14:22:49] dcaro: what's the way to use the ci-built packages? is it easier?
[14:23:54] ssh `tools-services-05...; wget ; for deploy in toolsbeta; do for repo in buster bullseye bookworm; do aptly repo add $repo-$deploy ./toolforge-builds-cli_0.0.13_all.deb && aptly publish --skip-signing update $repo-$deploy; done; done`
[14:24:10] then repeat with `tools` instead of `toolsbeta` after testing
[14:24:20] you have to manually upgrade on the bastions though
[14:24:40] and "we should automate that stuff"™
[14:24:48] xd
[14:25:41] there's a cookbook to do the toolsbeta->tools copy
[14:26:49] should be easy to automate fetching the package from the CI of the release commit
[14:27:04] (last words of every rabbit hole)
[14:27:04] I want to do that in CI eventually
[14:27:33] semi-related: quick review for https://gitlab.wikimedia.org/repos/cloud/toolforge/tools-webservice/-/merge_requests/32? bumping the version + build tools to bookworm only
[14:32:28] tried the script again... it actually works if you supply it with the right input xd
[14:35:24] * arturo food
[14:44:06] taavi: could I get a quick review of https://gerrit.wikimedia.org/r/c/operations/puppet/+/1012389? (supporting autosign on puppet7)
[14:44:12] looking
[14:45:35] thx!
[14:45:45] -1'd, sorry
[14:46:20] I don't think I've used 'variant', do I need to know anything other than "it can be either a bool or a path"?
[14:46:51] that's pretty much it.
https://www.puppet.com/docs/puppet/7/lang_data_abstract.html#variant-data-type
[14:47:39] well that seems better
[14:50:11] Why does `role::wmcs::nfs::standalone` cause `Notice: /Stage[main]/Profile::Wmcs::Nfs::Standalone/Service[nfs-server]/ensure: ensure changed 'running' to 'stopped'`?
[14:50:59] most likely due to missing `profile::wmcs::nfs::standalone::cinder_attached: true` from hiera
[14:52:59] When I set that to true, it fails on a hostname lookup. Is my sense that I can't have this active on two different servers in one project at the same time accurate?
[14:53:35] ok taavi, updated
[14:53:57] Rook: I would expect that to work, presuming they're mounting two different cinder volumes
[14:54:26] Unless there's a hostname hardcoded in there someplace :(
[14:54:49] `$host_prefix = regsubst($::hostname, '-[^-]*$', '')`
[14:55:07] Yeah that appears to be the place that it is coded in
[14:55:11] so you need different prefixes for the host names, and different service ips
[14:56:05] Sorry what is a different prefix for a hostname? I parse that as "hostname" from hostname.domain
[14:56:07] andrewbogott: looks good, just running a PCC before +1'ing
[14:56:18] so the typical NFS server hostname is something
[14:56:18] ok!
[14:56:24] like project-nfs-2
[14:56:34] 'project-nfs' is the prefix part it's extracting
[14:57:14] So I can have multiple quarry sections in /etc/nfs-mounts.yaml ?
[14:57:28] andrewbogott: +1
[15:05:47] Rook: I'm not fully sure I understand what you're trying to do.. but yes, as far as I can tell `profile::wmcs::nfs::standalone::volumes` controls which sections of modules/cloudnfs/data/projects.yaml to host on that server, project name is not used there. now support in the client profile is a different question
[15:07:21] Thanks, I don't think I understand this well enough to use it. I think I'll solve the issue I'm having a different way
[15:15:17] I got an email from zulip about GSOC on wikimedia, is that something we are using now?
[15:16:06] sounds familiar at least
[15:16:41] yep https://www.mediawiki.org/wiki/Outreach_programs/Zulip
[15:49:16] dcaro: yes, it's used for both outreachy and gsoc
[15:52:08] just logged in (had to reset my pass), how/when was I added to it?
[15:57:25] taavi: I was confused -- I don't think dcaro is going to kubecon so we can/should keep doing our checkins with him this week.
[15:57:53] I'm not no, I'm grounded until july at least
[15:58:35] There are worse towns to be stuck in
[15:58:44] Agree :)
[16:11:34] :-P
[17:07:17] * arturo offline
[17:25:35] I thought I had made progress on one of my long running wikibugs mysteries, but I eventually figured out that I just watched a Kubernetes cluster upgrade. :rofl: -- https://gitlab.wikimedia.org/toolforge-repos/wikibugs2/-/merge_requests/15#note_74379
[17:49:45] andrewbogott: I assume you're looking at the 'Error: Could not send report: Server hostname 'puppetmaster.cloudinfra.wmflabs.org' did not match server certificate; expected one of cloudinfra-cloudvps-puppetserver-1.cloudinfra.eqiad1.wikimedia.cloud, DNS:puppet, DNS:cloudinfra-cloudvps-puppetserver-1.cloudinfra.eqiad1.wikimedia.cloud' errors
[17:49:45] already?
[17:50:11] yep, that's https://gerrit.wikimedia.org/r/c/operations/puppet/+/1012414
[17:50:50] is `profile::puppet::agent::dns_alt_names` not suitable here?
[17:51:15] My by-hand tests suggest that it doesn't do anything
[17:51:21] even though it seems like it ought to be the right solution
[17:51:40] well, also, that goes in the 'agent' section which seems wrong?
[17:51:44] I guess I didn't test that, let me try...
[17:51:46] you need to re-generate the certs after changing that setting
[17:52:49] and I guess they're agent certs so not copied over from /var/lib/puppet/server on the old puppetmaster?
[17:53:16] do you prefer that to https://gerrit.wikimedia.org/r/c/operations/puppet/+/1012414 ?
[17:53:45] I'd prefer using the already-existing method, yes :-)
[17:54:23] OK.
Seems like a 70% chance that regenerating these certs will just break everything forever but I'll give it a try.
[18:04:33] ok, that seems to have fixed puppet on the clients but broken puppet on the server itself...
[18:15:12] * dcaro off
[18:27:15] taavi, would you expect me to need to include 'puppet' in the alt names? That standalone name was working previously but seems to not be now. (I've tried both including it and not including it with the same results)
[18:27:25] Also I need lunch! Maybe this will be clearer on a full stomach
[18:27:49] I would expect that we need both that and the full puppet.cloudinfra.wmflabs.org (wmcloud.org?) name
[18:29:44] yep, I definitely tried that
[18:29:48] but will try again after I eat
[19:10:21] * bd808 late lunch because meetings
[23:27:07] * bd808 off