[08:49:33] morning
[08:50:03] o/
[09:02:55] o/
[09:04:59] FYI I plan to upgrade toolsbeta k8s today T359638
[09:05:00] T359638: toolsbeta: upgrade kubernetes to 1.24 - https://phabricator.wikimedia.org/T359638
[09:15:56] taavi: is this table expected to not be anywhere? https://wikitech.wikimedia.org/w/index.php?title=Portal:Toolforge/Admin/Kubernetes/Components&oldid=2120949#Third-party_components
[09:19:40] arturo: david moved that to toolforge-deploy.git for some reason, https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/119
[09:21:10] ok
[09:21:56] The reason was to be able to automate it in the future, and to keep the source of truth of what's installed in the same repo (so if you update one, you don't forget to update the other)
[09:22:06] ok!
[09:22:42] The alternative would be to move what's in toolforge-deploy to that wiki page, and parse that wiki when deploying (also doable, but feels more tedious)
[09:24:14] is anyone interested in giving T359676 a go? I wrote the process/docs for updating that image so it would be useful to have someone else test that what I wrote is understandable
[09:24:15] T359676: New upstream release for Pywikibot - https://phabricator.wikimedia.org/T359676
[09:24:56] I am somewhat interested, but also have plenty of tasks mid-flight already
[09:25:40] I can give it a go
[09:25:57] thanks!
[09:26:43] in that sense, should users be warned in any way? (scripts might break right after pushing the image)
[09:27:08] no
[09:27:47] as this image is only for running the scripts that come with pywikibot itself, and the CLI of those is very stable (as compared to the pywikibot API, which is not as stable, especially on major releases)
[09:30:33] ack, /me was still reading the breaking changes to scan for CLI changes
[10:08:44] taavi: is there any test I can do to try out the new image? (do we have a tool with pywikibot set up already?)
[10:09:12] I've used https://toolsadmin.wikimedia.org/tools/id/wikitech-double-redirect-bot for testing
[10:10:41] nice, let me try it out too
[10:11:40] hmm... 2024-03-11T10:11:25+00:00 [fix-double-redirects-wmzv6] pywikibot.exceptions.NoUsernameError: Failed OAuth authentication for mediawiki:mediawiki: The authorization headers in your request are not valid: Invalid consumer
[10:12:33] oh yeah. that's an issue with the tool setup, not pywikibot itself :/
[10:13:28] basically, wikitech has some pages using #REDIRECT syntax for redirecting to mediawiki.org, and not {{soft redirect}} like they should, and as wikitech does not use the same authentication system as the rest of the wikis it gets confused about that
[10:15:07] is there any script I can run instead on mediawiki? it does not need to do edits, just listing something or similar might be enough (though probably failing to authenticate might be enough of a smoke test already)
[10:15:39] s/on mediawiki/on wikitech/ I guess?
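(The read-only smoke test being asked about here can also be written directly against the pywikibot API rather than through one of the bundled scripts; taavi's actual suggestion, listpages.py, follows just below. The snippet is only a minimal sketch: it targets the mediawiki:mediawiki site seen in the error above, assumes a working pywikibot install plus the tool's usual user-config.py, and makes no edits.)

```python
#!/usr/bin/env python3
# Hedged illustration only: a read-only pywikibot check that lists a few
# pages from www.mediawiki.org. It assumes the tool's existing user-config.py
# (or PYWIKIBOT_NO_USER_CONFIG=1 for a purely anonymous run); nothing here
# is part of the image docs being tested.
import pywikibot

site = pywikibot.Site("mediawiki", "mediawiki")  # www.mediawiki.org
print(f"connected to {site}")

# List a handful of main-namespace pages without touching anything.
for page in site.allpages(namespace=0, total=5):
    print(page.title())
```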
[10:18:15] something like https://www.mediawiki.org/wiki/Manual:Pywikibot/listpages.py yes
[10:21:17] I'd say mediawiki, as it can't login into wikitech (if I understood your previous comment)
[10:22:22] no
[10:22:26] it can log in to wikitech
[10:22:32] but it can not log into mediawiki or any other SUL wiki
[10:22:49] oh, okok, so I got it the other way around xd
[10:25:06] that worked :)
[10:25:07] thanks
[10:27:42] taavi: done, it worked :), I did a few updates to the docs but they were good already (just extended some parts)
[10:28:19] perfect, thanks
[10:44:13] I think this is my first time updating kube-state-metrics via toolforge-deploy, please review https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/219
[10:49:25] You might want to upgrade the environments one by one, we should not have something in the deploy repo that is not actually deployed right away
[10:49:28] (ex. tools)
[10:49:34] unless you plan to upgrade tools right away
[10:49:38] (then it's ok)
[11:25:34] ok!
[11:45:40] dhinus: what do you know about clouddb-wikireplicas-query-1.clouddb-services? It's running on Buster which makes me wonder if it's defunct (which would make that whole project defunct)
[12:03:29] andrewbogott: I remember there was still someone using it, or wanting to re-check if they were using it
[12:03:54] Adam was the last person to use that I believe
[12:04:13] cool, I will give him a few hours to wake up
[12:04:15] thx
[12:12:43] andrewbogott: unless you're strongly opposed to https://gerrit.wikimedia.org/r/c/operations/puppet/+/1009350 I'd like to merge that shortly
[12:14:25] seems fine, I'm certainly not planning to fix it.
[12:17:50] yeah. I think we'll end up deploying a striker instance for toolsbeta at some point but that's in eqiad1
[12:18:37] puppet-merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/1010166/ too
[12:22:49] thanks
[13:00:12] * arturo distracted by kitchen contractors
[13:02:48] Rook: quarry-puppet-master-02 has been shut down for almost a year, does that mean we aren't using local secrets there anymore? (I can also make a ticket for framawiki who actually did the shutdown)
[13:04:10] Yeah the current (non-k8s) setup of quarry just copies the secrets right into the box, so as far as I know it isn't using any puppet secrets
[13:04:30] great, is it ok if I delete that shutdown VM?
[13:07:49] Seems fine to me
[13:09:18] taavi: the docker registry FQDN is still docker-registry.tools.wmflabs.org right?
[13:09:31] currently yes
[13:09:34] ok
[13:11:09] I just created T359816, please let me know if you think this should not be automated
[13:11:09] T359816: toolforge: automate docker image caching workflow - https://phabricator.wikimedia.org/T359816
[13:14:41] taavi: when uploading to the docker registry from docker-imagebuilder-01, I get:
[13:14:44] https://www.irccloud.com/pastebin/v6yGqp1O/
[13:14:47] rings any bell ?
[13:15:09] try using tools-imagebuilder-2
[13:18:21] ok
[13:19:15] worked this time! thanks
[13:21:20] please approve https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/219
[13:25:44] added a comment LGTM depending on the answer xd
[13:27:13] andrewbogott: puppet is failing on paws-puppetmaster-1, are you working on it or should I take a look?
[13:27:30] (it's more like not running, the update of the puppet repo is failing it seems)
[13:27:50] I can look. It shouldn't have any clients anymore though
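(Going back to T359816 from earlier in this stretch: the manual workflow that task wants to automate is essentially "pull the upstream image, retag it for docker-registry.tools.wmflabs.org, push". Below is a rough sketch of those steps with made-up image names and tags; it makes no claim about how the eventual automation will actually be implemented.)

```python
#!/usr/bin/env python3
# Hedged sketch of the manual image-caching steps behind T359816: pull an
# upstream image, retag it for the Toolforge registry, push it there.
# The image name and tag are examples only; the registry FQDN is the one
# confirmed above in the chat.
import subprocess

UPSTREAM = "registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.10.1"
MIRROR = "docker-registry.tools.wmflabs.org/kube-state-metrics:v2.10.1"

def run(*cmd: str) -> None:
    # Echo the command before running it so the log shows each step.
    print("$", " ".join(cmd))
    subprocess.run(cmd, check=True)

run("docker", "pull", UPSTREAM)
run("docker", "tag", UPSTREAM, MIRROR)
run("docker", "push", MIRROR)
```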
[13:28:07] dcaro: thanks, good catch, just sent an update
[13:28:30] andrewbogott: I think there was still something last time I checked
[13:28:55] do you mean paws-puppetmaster-2 or paws-puppetserver-1?
[13:29:11] oh wait, puppetserver-1 yes
[13:29:20] so not the master, sorry xd
[13:30:45] it looks right to me, can you tell me what you're seeing?
[13:31:09] (for context, 'master' is old puppet5 stuff, 'puppetserver' are the new servers I'm making today)
[13:31:18] I got an email, oh, got another saying resolved
[13:32:05] https://www.irccloud.com/pastebin/zb4ZJnRV/
[13:32:23] ^that is the sync cron, but puppet is working again 👍
[13:32:42] ok, probably it just emailed mid-setup.
[13:33:36] arturo: I'm getting errors locally with the metrics upgrade though
[13:33:47] E0311 13:33:40.036409 1512134 memcache.go:106] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
[13:33:57] might be just slow to start I guess (pulling images)
[13:33:59] dcaro: ACK, I'm investigating
[13:34:18] also, I think the common/ override system is not granular enough
[13:34:20] oh, yes, it's up again
[13:35:26] looks ok now to me, just took a bit to start
[13:35:46] about the common/ override, the patch updates image.version
[13:35:59] but how does helmfile know which chart to apply these values to?
[13:36:08] I think we need to somehow namespace them
[13:37:29] the overrides are per-release, the common/kube-state-metrics is only applied for the kube-state-metrics
[13:37:58] but yes, there's no namespacing, so if two common/* would have the same key value, it would override both
[13:38:31] which is the case for image.repository and image.version
[13:39:07] we could add another potential override file, like `values/{{ .Environment.Name }}.kube-state-metrics.yaml*`, that would allow to override only the kube-state-metrics
[13:39:14] release
[13:41:50] (same for each release)
[13:42:29] then a global environment.yaml to store the kubeVersion and friends?
[13:43:24] yep
[13:43:31] ok!
[13:43:37] good idea
[13:44:38] something like https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/220 (untested and raw, just for the sake of explanation)
[13:45:01] yea
[13:45:43] I don't remember; do we need to list the new value files in the environments: section?
[13:46:37] only the generic ones, that have stuff we use inside the chart itself
[13:47:02] `chartVersion`
[13:47:55] ok
[13:50:23] dcaro: I think we can merge your patch as is
[13:50:33] did you test it? (I have not xd)
[13:50:41] feel free to take it
[13:50:48] I have not tested it
[13:52:00] will test it later!
[13:52:04] * arturo food break
[13:52:13] thanks!
[14:17:21] taavi: before I start writing it... there's no puppet code for puppetdb w/puppet 7 on wmcs yet, is there?
[14:18:38] andrewbogott: no.. but I think we should be able to just re-use whatever wikiprod uses
[14:18:49] yep, hopefully!
[14:32:55] seems like I can just point the new server to the old puppetdb host. Best I can tell that's what prod is doing.
[14:36:19] \o/ toolforge jobs board archived
[14:36:22] 🎉
[16:30:49] andrewbogott: for toolforge-specific trove databases, what do we usually do? We create a full cloudVPS project? or create it inside toolforge + set some auth? (ex. T359785)
[16:30:50] T359785: Request increased quota for pm20-* Toolforge tool - https://phabricator.wikimedia.org/T359785
[16:31:59] dcaro: I think we've been creating a special cloud-vps project with 0 instance quota
[16:32:01] it's interesting because the user only needs very little data, so a really small postgres db would be enough, but toolsdb is mariadb xd
[16:32:50] and postgres in trove is pretty terrible. I don't really know what to tell people who want it.
[16:33:11] They might be better off with a VM and administering their own postgres if they're up to it.
[16:33:17] hmm, I wonder if just installing postgres in a buildservice image, and running it from there might be good enough (we don't yet have services for continuous jobs, but there can be a workaround)
[16:33:38] i don't want that in NFS
[16:33:47] If that's a thing you can do it would be interesting to try it. but not it...
[16:33:53] yeah, not if it's stored on NFS :)
[16:34:39] taavi: nnono, it's just temp storage, they want it to process the data and do a dump in another format, not to store it
[16:36:03] hmm, there's a point there on having to set it up on every restart kind of though
[16:36:31] (user auth and such)
[16:42:35] dcaro: I just tested & updated this patch: https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/220
[16:42:52] (helmfile will complain if there is no file)
[16:43:37] LGTM, I think there was a way to make helmfile not complain, but I'm ok with the files there, makes it a bit clearer that you can override stuff
[16:43:55] ok
[16:46:29] deployed, will merge
[16:46:36] (noop)
[16:50:01] \o/
[16:50:03] thanks!
[16:52:36] taavi: just in case you've seen this before...
[16:52:36] Server Error: Failed to execute '/pdb/cmd/v1?checksum=185619aca16bf8714d7175e8ce4203439fe047eb&version=5&certname=toolsbeta-sgecron-02.toolsbeta.eqiad1.wikimedia.cloud&command=replace_facts&producer-timestamp=2024-03-11T16:48:40.827Z' on at least 1 of the following 'server_urls': https://toolsbeta-puppetdb-02.toolsbeta.eqiad.wmflabs
[16:52:59] That doesn't show up in the server log at all but I'm guessing that's someone not being able to talk to puppetdb?
[16:53:05] thanks to /pdb/
[16:55:04] does the client talk directly to the db? I thought it was all moderated by the puppetserver
[16:56:00] dcaro: I also just tested this, should now be ready to merge: https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/219
[16:56:48] thanks
[16:56:51] andrewbogott: no, it's the server talking to puppetdb
[16:57:14] arturo: oh, I think I merged it? I wanted to approve :facepalm:
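(Circling back to the earlier idea of a throwaway postgres inside a buildservice image: since the data is only temporary, the "set it up on every restart" part is basically an initdb into a scratch directory. Below is a minimal sketch of that pattern; every path, port and option is invented for illustration, this is not an agreed Toolforge workflow, and the real thing would still need to sort out user auth and where the scratch space lives.)

```python
#!/usr/bin/env python3
# Hedged sketch of the "ephemeral postgres on every restart" idea: create a
# throwaway cluster in a temp directory, start it on a local socket, do the
# processing, throw it all away. Not an endorsed Toolforge pattern.
import subprocess
import tempfile

def run(*cmd: str) -> None:
    subprocess.run(cmd, check=True)

with tempfile.TemporaryDirectory() as datadir:
    # Fresh cluster owned by the current user, trusting local connections.
    run("initdb", "-D", datadir, "-A", "trust")
    # Start it on a non-default port, keeping the unix socket in the data dir.
    run("pg_ctl", "-D", datadir, "-l", f"{datadir}/pg.log",
        "-o", f"-p 5433 -k {datadir}", "start")
    try:
        # ... load the source data, run the conversion, dump the result ...
        run("psql", "-h", datadir, "-p", "5433", "-d", "postgres",
            "-c", "SELECT version();")
    finally:
        run("pg_ctl", "-D", datadir, "stop")
```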
[16:57:34] iirc, puppet 7 server can't talk to a puppet 5 puppetdb, but a puppet 5 server can talk to a puppet 7 puppetdb
[16:57:36] dcaro: np, should be fine
[16:57:44] arturo: it's deployed already, awesome
[16:58:11] * dcaro is thinking on finishing up the day xd
[16:58:19] taavi: ok, I'm sure I'll encounter that problem but first I need to figure out the firewall
[16:58:28] hm
[16:59:42] ah, it has ferm
[17:00:22] andrewbogott: I think this https://github.com/wikimedia/operations-puppet/blob/d7c8beca73d61c95fe852e5265ddf05431613eb9/modules/profile/manifests/puppetdb.pp#L131C22-L131C41 needs changing to `wmflib::class::hosts` with the puppetserver profile
[17:00:38] or alternatively change https://gerrit.wikimedia.org/r/plugins/gitiles/cloud/instance-puppet/+/refs/heads/master/toolsbeta/toolsbeta-puppetdb.yaml#8 to add the new puppetserver too
[17:00:55] * arturo offline
[17:00:59] yeah, looking at that instance puppet thing now
[17:12:27] dcaro: with the sub projects for buildservice and job service now closed, are folks supposed to triage things by adding that subject tagging that you have been doing?
[17:13:10] I liked having the actual phab tags, but I can see how y'all might be trying to get to a unified backlog view for #reasons
[17:14:14] bd808: to some extent yes, though it would be added by us anyhow, the idea is to create a triaging workflow that is enough for us to be able to go through all the tasks eventually. Yep, the tags would have been awesome if you could add multiple at the same time, and still see all the tasks in the main view
[17:14:55] we are still pending though on defining that process, joanna is at the SRE summit, and Francesco is on PTO
[17:15:10] (they were the main pushers for it with me)
[17:16:32] let's say that we are trying to see what works, and what works is a moving target xd
[17:18:53] y'all are not the only ones shooting at that target either. ;)
[17:41:04] * dcaro off
[17:41:06] cya tomorrow
[18:19:54] * bd808 lunch
[19:46:43] The "collaboration of the open" theme for Wikimania 2024 feels like a good fit with WMCS technical and social support of the community. Who has an idea for a talk to propose? https://diff.wikimedia.org/2024/03/11/apply-to-speak-at-wikimania-2024/
[19:47:59] If anyone wants to dust off the core ideas of my past talk at https://wikitech.wikimedia.org/wiki/User:BryanDavis/Developing_community_norms_for_critical_bots_and_tools and breathe new life into it, I would be glad to try and help with advice.
[20:13:37] On our nfs systems there is a magically updated exports file in /etc/exports.d/.exports
[20:13:38] I see that role::wmcs::nfs::standalone creates the /etc/exports.d/ directory, though I don't see where the file comes from. Did I miss it in that role or is it from somewhere else?
[20:19:54] I think it's called nfsexportd? Let me look...
[20:20:01] Rook: is that what modules/cloudnfs/files/nfs-exportd.py makes?
[20:20:23] cloudnfs::fileserver::exports seems to provision that service
[20:20:38] I was close! nfs-exportd.
[20:21:33] I did my usual `git grep -l exports.d` search in ops/puppet.git to look for the thing I vaguely expected to be there
[20:22:03] I did 'grep -ir export * | grep nfs' and then looked for things that were familiar
[20:46:41] Oh I see it makes nfs-exportd.service which runs every ten minutes. Looks like it was dead on my box. Thank yinz!
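(For anyone landing here later: the pattern at the end of the log is a periodic service, nfs-exportd.service, that regenerates /etc/exports.d/.exports and tells the kernel NFS server to re-read it. The snippet below only illustrates that general shape; it is not the real modules/cloudnfs/files/nfs-exportd.py, which derives the export list from OpenStack project data, and the volumes, client ranges and options are all made up.)

```python
#!/usr/bin/env python3
# Illustration only of an exports-generator pattern like nfs-exportd:
# render export lines, write them out, ask exportfs to re-read the table.
# NOT the real modules/cloudnfs/files/nfs-exportd.py; all values are invented.
import pathlib
import subprocess

EXPORTS_FILE = pathlib.Path("/etc/exports.d/.exports")

# In the real service the volume/client data comes from OpenStack projects;
# hard-coded here purely for the sketch.
VOLUMES = {
    "/srv/example/project": ["172.16.0.0/21"],
    "/srv/example/home": ["172.16.0.0/21"],
}
OPTIONS = "rw,sync,no_subtree_check"

def render() -> str:
    lines = []
    for path, clients in sorted(VOLUMES.items()):
        entries = " ".join(f"{client}({OPTIONS})" for client in clients)
        lines.append(f"{path} {entries}")
    return "\n".join(lines) + "\n"

def main() -> None:
    new_content = render()
    # Only rewrite and re-export when something actually changed.
    if not EXPORTS_FILE.exists() or EXPORTS_FILE.read_text() != new_content:
        EXPORTS_FILE.write_text(new_content)
        subprocess.run(["exportfs", "-ra"], check=True)

if __name__ == "__main__":
    main()
```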