[00:24:28] * bd808 off
[03:05:20] * andrewbogott is down a puppetdb rabbit hole and is now going to struggle to get up in time for SRE things tomorrow
[09:08:25] morning
[09:08:28] o/
[09:09:37] morn
[09:18:05] morning
[09:51:46] andrewbogott: seems like it was a locale issue (https://phabricator.wikimedia.org/P58736) that made the toolsbeta puppetdb fail to set up. I manually added en_US.UTF-8 via `dpkg-reconfigure locales` and then it worked
[09:57:29] Rook: when you are awake, we've got this question to answer in the context of the catalyst project:
[09:57:31] Is Magnum ready for non-privileged users to deploy and maintain kubernetes clusters?
[09:58:25] who is we?
[09:59:19] well, a question for the WMCS team from the catalyst folks
[09:59:27] I guess!
[10:00:34] where was this question presented? is there some place I can follow catalyst development that I'm not in?
[10:00:56] ^same
[10:02:36] there is a Catalyst meeting happening today, with folks from the DevExp group, and not all the WMCS folks are invited to the meeting
[10:02:58] I got the question from the meeting document, which is the only agenda item BTW
[10:03:45] if you are interested in attending the meeting, I would be happy to request an invite for you
[10:04:05] (or, to invite you myself, I think I can edit the event)
[10:06:55] it would be nice for those things to be open, as in not being hidden (not as in everyone should attend). I'm interested in the outcome, and I'm ok with you representing the team in it
[10:08:00] yes, I think it's concerning that not even everyone in our team was aware of these discussions, let alone people interested in following what's going on with catalyst outside the team
[10:08:44] 'is magnum an option in this case?' seems like something that could have easily been asked on the IRC channel or the mailing list first
[10:08:57] I agree
[10:18:51] I joined the slack channel to try to increase the information flow
[10:20:28] that is not a publicly accessible venue in the slightest
[10:20:48] agree
[10:20:59] but I can at least forward the info to a public venue
[10:22:56] daily updates from the team members might help with that too (e.g. "* Catalyst meeting today about X, ...")
[10:24:26] that is not something you should be forced to do
[10:26:07] like, as far as I can see the last public activity about catalyst was some phabricator updates about a month ago
[10:26:11] from my perspective, catalyst is "just another cloud vps project". I don't see that we have to interface with them beyond whatever they need in terms of ad-hoc "consultancy" on how to do X on cloud vps.
[10:27:11] ok, in that case why are they not using our normal public support channels like any other cloud vps project?
[10:30:52] because they can, probably? catalyst is an FY23-24 annual plan hypothesis, and the participating teams are all members of DevEx, so it's not unreasonable to want to collaborate more closely in that sense. But that still doesn't make it more than a cloud vps project, in terms of use of our infra.
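(A rough sketch of the locale fix from [09:51:46] above, assuming a Debian host where en_US.UTF-8 was never generated. Only `dpkg-reconfigure locales` is from the conversation; the non-interactive commands and the verification step are assumptions about how one might script the same change.)

    # Interactive route mentioned in the chat: pick en_US.UTF-8 and set it as the default.
    sudo dpkg-reconfigure locales
    # Non-interactive equivalent (assumption, not from the chat):
    sudo sed -i 's/^# *en_US.UTF-8 UTF-8/en_US.UTF-8 UTF-8/' /etc/locale.gen
    sudo locale-gen
    sudo update-locale LANG=en_US.UTF-8
    # Log out and back in, then confirm the new default before retrying the puppetdb setup.
    locale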
[10:38:57] yes, but unless the annual plan states that work must be done in private, I don't think it's unreasonable for us to push for that work (or at least the part we're advising on) to be less secretive
[10:51:27] I don't think there's any intention to be secretive, just a lack of mindfulness about making it open. That's why I think informal information flow might help surface the information when that happens (not saying it shouldn't be done at the source, just trying to make the system more resilient to human failure)
[10:51:56] I can do a weekly catalyst update in our team sync going forward if you think that would be helpful
[10:52:58] we stopped doing syncs during our team meeting, and moved to optional daily syncs instead
[10:52:59] until now, there hasn't been any need. the project has been on hold for several months
[10:54:16] I would appreciate a note in the daily channel whenever there's anything to share though (no need for it to be periodic)
[10:54:24] sure
[10:55:09] (or wherever really, as long as it's asynchronous and public, so I and others can read whenever, and you can write whenever)
[11:02:17] I don't expect there to be a lot of updates though. The current plan is to evolve Patchdemo https://patchdemo.wmflabs.org/ into Catalyst, and use some flavor of k8s as the backend via helm deployments. Patchdemo is currently a PHP monolith, and it lacks some of the features that were wanted for Catalyst, such as being able to deploy services (in the mediawiki sense).
[11:03:15] thanks :)
[11:27:54] arturo: yes. So long as the user is comfortable with tofu or command-line deployment. Horizon still doesn't give a reasonable credential
[11:28:05] Rook: ACK
[11:42:13] can I get a +1 on T359923?
[11:42:13] T359923: Request increased quota for eranbot Toolforge tool - https://phabricator.wikimedia.org/T359923
[11:43:28] (ty!)
[11:45:48] taavi: done
[11:52:44] TheresNoTime: `Update quota for tool eranbot from version '2' to version '2-T359923'`
[11:52:45] T359923: Request increased quota for eranbot Toolforge tool - https://phabricator.wikimedia.org/T359923
[11:54:08] thanks, will try to keep it moving forwards :)
[12:36:39] just fyi, grid engine webservices no longer work in toolsbeta as I'm testing patches to drop grid support from the front proxy in there
[12:37:50] taavi: thank you for sorting out the toolsbeta/toolsdb thing -- I guess we'll see if it repeats in tools.
[12:43:23] hmm. the grid engine master has stopped producing accounting log files, which breaks the history view in https://sge-jobs.toolforge.org/ and https://grid-deprecation.toolforge.org/. I'm tempted to not fix that, unless anyone feels strongly about fixing it
[12:43:36] are any of you running spicerack/cookbooks on your laptop on debian testing trixie?
[12:43:41] me
[12:44:03] I'm having a hard time installing afresh, with pip dependencies and such
[12:44:25] do you remember having to do any special magic to get it working?
[12:45:05] iirc no, at least the last time I installed a venv for that
[12:45:14] would you like me to try again to see if it works now?
[12:45:26] yeah, but don't destroy your current venv, just in case
[12:45:32] use a new one
[12:45:41] thanks!
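(A minimal sketch of the kind of fresh-venv test being asked for at [12:45:26]–[12:45:32], assuming a local checkout of the spicerack repo; the paths are made up, and installing `wheel` first is the workaround that comes up later at [12:56:11].)

    # New throwaway venv so the existing one stays untouched.
    python3 -m venv ~/tmp/spicerack-test-venv
    source ~/tmp/spicerack-test-venv/bin/activate
    # Installing wheel first helps pip build sdists such as PyYAML (see T345337 below).
    pip install --upgrade pip wheel
    # Editable install from a local checkout; the path is an assumption.
    pip install -e ~/src/operations/software/spicerack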
[12:46:03] ok
[12:48:12] arturo: seems to work as expected: https://phabricator.wikimedia.org/P58766
[12:48:50] taavi: try with the spicerack repo
[12:50:02] also works in a fresh venv, https://phabricator.wikimedia.org/P58767
[12:51:01] it uses a cached pyyaml (Using cached PyYAML-5.4.1-cp311-cp311-linux_x86_64.whl)
[12:51:17] building that wheel is one of the things that's failing for me
[12:51:34] building?
[12:52:57] cteam: once I start building the new tools puppetserver we'll need to freeze the state of the old one, which means no new hacks in git repos, no new VMs, and (probably) no deleting VMs until the switchover. Will that freeze mess with anyone's plans?
[12:53:00] well, I thought that was the name
[12:53:03] see also https://gerrit.wikimedia.org/r/c/operations/software/spicerack/+/1010507
[12:53:24] andrewbogott: works for me
[12:53:52] arturo: can you post the full error?
[12:54:12] * andrewbogott looks at taavi who is probably hoping to delete grid VMs on Thursday
[12:54:28] yes, but I think we can get the migration done beforehand :-)
[12:54:45] I won't need new VMs before that so it's fine I think
[12:56:11] arturo: also, try installing the `wheel` package first
[12:56:16] cool, thx
[12:56:29] taavi: ack
[13:01:58] arturo, taavi: is this the same problem as T345337?
[13:01:59] T345337: spicerack: tox fails to install PyYAML using python 3.11 on bookworm - https://phabricator.wikimedia.org/T345337
[13:02:41] * dhinus only here briefly and from my phone, I didn't look at the details
[13:12:44] ^I have that issue too
[13:13:09] my solution was to manually install the patched spicerack in the venv, then install everything else
[13:13:16] (the one using opensearch)
[13:35:41] dhinus, dcaro: I think that's exactly my problem, thanks!
[13:36:58] \o/ shared problems are the best kind of problems! xd
[13:38:41] * arturo very distracted by kitchen contractors
[13:54:34] andrewbogott: puppet-git-sync-upstream.service is in 'failed' state in toolsbeta-puppetserver-1, known?
[13:55:35] not known. I can look after this session
[13:55:43] thanks
[13:57:15] hm, "git-sync-upstream --base-dir /srv/git/" works fine
[13:58:19] Mar 12 13:49:27 toolsbeta-puppetserver-1 git-sync-upstream[167062]: stderr: 'error: insufficient permission for adding an object to repository database .git/objects
[13:58:22] this feels familiar
[14:08:25] the dns leaks seem to be a persistent issue, I ran the script twice yesterday, and I'm running it again. is that something we should automate or fix?
[14:09:07] * dcaro does not remember if it's still an open issue
[14:15:15] * arturo food
[14:28:50] dcaro: my current theory is that the leaks are a result of VMs created during a historic, broken period. VMs created today certainly don't leak anything, at least not the fullstack ones.
[14:29:00] It might still be worth automating cleanup though... I'm not sure.
[14:32:48] taavi: I did chown/chgrp -R gitpuppet in /srv/git and I think that fixed things. I suspect that same issue is present on the other servers I've built though
[14:34:52] andrewbogott: ack, thanks!
[14:42:21] I'm getting `WARNING nothing to export.` from /usr/local/bin/nfs-exportd. I'm assuming I've misconfigured it. I'm trying to get k8s-test-nfs in quarry to populate the /etc/exports.d/quarry.exports file; any pointers on what I could be doing wrong?
[14:46:43] I think that script relies on a static file from puppet, let me look...
[14:47:53] do you have /etc/nfs-mounts.yaml?
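(A rough sketch of the puppet-git-sync-upstream fix described at [14:32:48] above. The gitpuppet owner/group and the /srv/git path are from the conversation; re-running the unit afterwards is an assumption about how to clear the 'failed' state.)

    # Give the gitpuppet user/group ownership of the repos again so the sync can write to .git/objects.
    sudo chown -R gitpuppet:gitpuppet /srv/git
    # Re-run the sync and check that the unit leaves the 'failed' state (assumption).
    sudo systemctl start puppet-git-sync-upstream.service
    sudo systemctl status puppet-git-sync-upstream.service
    sudo journalctl -u puppet-git-sync-upstream.service -n 20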
[14:48:52] oh, you must, or it would've produced a different error
[14:49:12] Yes. Ah, as you surmised
[14:51:44] I don't 100% remember how this works, but I'm seeing this:
[14:51:46] project: quarry-nfs.svc.quarry.eqiad1.wikimedia.cloud:/srv/quarry/project
[14:51:49] in the config
[14:52:00] that has a hostname in it, I bet the script checks the current hostname against that
[14:52:04] but I'm double-checking
[14:52:34] if fqdn_is_us(host):
[14:52:34] mountpoints.append(path)
[14:52:47] Oh, do I have to go add that somewhere in puppet?
[14:52:59] So, yes, you'll need to hack your config for testing and then add that in puppet eventually
[14:53:19] Ok, thanks!
[14:53:51] ultimately the puppet change will go in modules/cloudnfs/data/projects.yaml
[14:57:02] so I wonder if we need that check at all anymore, with per-project NFS servers and all that
[14:58:57] yeah, that config could be split into per-host configs probably
[15:45:19] is anyone else interested in moderating the toolforge meeting in a bit? if not, I'm happy to take care of it too
[16:18:55] XioNoX: shall we cancel our network sync tomorrow? I'm guessing that you and topranks will be busy with offsite things
[16:20:23] andrewbogott: yep
[16:21:00] great, done
[16:21:31] thanks!
[16:33:42] taavi: looks like puppetdb-2 is working now. Thank you!
[16:54:18] * dcaro off
[17:00:26] * arturo off
[18:20:29] * bd808 lunch
[20:51:38] taavi: I'm building one more puppetdb server; can you tell me what you ran to fix the encoding?
[20:52:24] `sudo dpkg-reconfigure locales`, select en_US.UTF-8 and make that the default, then log out and back in
[20:55:05] did you need to reinstall postgres after that? Or something else?
[20:55:43] I did an `apt purge` for the server package, not sure if that was actually necessary or not
[20:55:52] ok
[20:56:58] trying, we'll see how it goes
[20:57:06] (meanwhile I'm adding that to the docs)
[21:13:31] taavi, what did you do to get 'create user' to work? I'm still hitting all those errors.
[21:13:37] (possibly because I didn't set the locale properly)
[21:13:40] (but I tried)
[21:16:08] andrewbogott: in general that means /etc/postgresql/15/main/pg_hba.conf was somehow not generated correctly
[21:16:16] (brb, rebooting my bouncer)
[21:17:58] * andrewbogott forces it to regenerate
[21:43:37] * andrewbogott starts over
[23:30:22] * bd808 off
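(A hedged sketch of the recovery sequence taavi describes from [20:52:24] onwards: fix the locale, purge and reinstall the postgres server package so the cluster and pg_hba.conf are regenerated, then retry. The exact package name and the verification steps are assumptions; the version 15 path comes from [21:16:08].)

    # 1. Fix the locale (see the earlier sketch), then log out and back in.
    sudo dpkg-reconfigure locales
    # 2. Purge the server package, as taavi did; the package name is an assumption
    #    based on the /etc/postgresql/15/main path quoted at [21:16:08].
    sudo apt purge postgresql-15
    # 3. If the badly-initialised cluster survived the purge, drop it too so it gets
    #    re-initialised with the new locale (assumption):
    sudo pg_dropcluster --stop 15 main
    # 4. Reinstall (directly, or by letting puppet do it on its next run) so the main
    #    cluster and /etc/postgresql/15/main/pg_hba.conf are regenerated.
    sudo apt install postgresql-15
    # 5. Sanity-check before retrying the CREATE USER step.
    ls -l /etc/postgresql/15/main/pg_hba.conf
    sudo -u postgres psql -c 'SHOW lc_collate;'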