[07:48:48] morning
[07:49:34] hmm, it seems I don't have the right permissions to change the topic on wikimedia-cloud. I wanted to remove the k8s upgrade message
[07:52:29] done
[07:52:31] morning
[07:52:35] IRC permissions are a mess
[07:55:40] thanks!
[07:56:36] arturo: do we need to remove the downtime or did you do that already?
[07:57:51] I can't find it browsing the list of silences. I believe I created it for 4h only
[08:00:44] ok
[08:01:56] morning
[08:08:10] I have ops, but chanserv tells me I can't give ops
[08:08:55] well well chanserv
[08:13:37] blancadesal: you don't have a wikimedia cloak either, iirc it was used in some places, though I think it's not for this channel (https://meta.wikimedia.org/wiki/IRC/Cloaks)
[08:16:28] is having a wikimedia cloak linked to channel permissions?
[08:17:49] it was at some point at least, though it was never clear to me if it stopped being linked, or if it was more of a "Ahhh, I see you have the cloak, I'll give you permissions manually" kind of thing
[08:19:27] as far as I know, there is a bot that only accepts commands from folks using a wikimedia cloak, in particular the bot that understands the `!status` command to change the `Status:` string in the -cloud channel topic
[08:19:46] other than that, the cloak doesn't really work for anything
[08:20:44] blancadesal: you don't have nick protection enabled it seems
[08:21:09] https://wikitech.wikimedia.org/wiki/Libera_Chat
[08:21:33] that "might" be one reason why it does not let me give you access
[08:23:15] dcaro: I've enabled it now
[08:29:24] blancadesal: it did not help xd
[08:29:28] I see it enabled though
[08:31:44] I'm able to change the topic on this channel... what's the difference?
[08:32:52] `blancadesal has flags +Aefiorstv in #wikimedia-cloud-admin because they are logged in as blancadesal.`
[08:33:06] but when I try to give you the same flags on -cloud it says I have no rights :/
[08:33:40] I think I need to be set as 'founder' of the channel or something to give rights
[08:35:29] thanks for trying :)
[08:35:43] I'm missing the 'f' flag for myself on -cloud, I think that's what prevents me from giving access to you (I can change/give access on this channel)
[08:37:58] ` +f - Enables modification of channel access lists.` that one xd
[08:39:25] bryan and andrew have the `+f` flag
[08:40:25] arturo: you have it too
[08:41:00] I'd be happy to copy and execute the commands you think I should run
[08:41:19] `Founder : Majavah, bd808, andrewbogott, balloons`
[08:41:26] for -cloud
[08:41:28] so xd
[08:42:38] oh, arturo then you can `/cs flag #wikimedia-cloud blancadesal +Aefiorstv` (you might want to open the chat first with chanserv, /query chanserv)
[08:43:36] https://usercontent.irccloud-cdn.com/file/gMbMpUe9/image.png
[08:44:04] once you have a chat open with chanserv, you don't need the `/cs` prefix
[08:44:22] https://usercontent.irccloud-cdn.com/file/RQ1odv13/image.png
[08:45:00] 🤦‍♂️ it's `flags` plural
[08:45:22] https://usercontent.irccloud-cdn.com/file/to7PjA31/image.png
[08:45:27] \o/
[08:45:30] can you do the same for me?
[08:45:40] (so I get f, and give access to others)
[08:45:55] https://usercontent.irccloud-cdn.com/file/MJQmU8RR/image.png
[08:45:56] blancadesal: can you verify you can op yourself?
[08:45:59] \o/
[08:46:46] it worked for me (I can give access now on -cloud)
[08:46:56] arturo: thanks!
[08:47:07] you are welcome
[08:48:27] yes it works!
[08:48:56] thanks arturo and dcaro!
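(Editor's aside, not part of the original log.) The flag check that tripped things up above can be sketched locally. This is only a rough illustration, assuming ChanServ-style flag strings like the `+Aefiorstv` quoted in the log; the function names are hypothetical and nothing here talks to ChanServ.

```python
def parse_flags(flags: str) -> set[str]:
    """Turn a ChanServ-style flag string such as '+Aefiorstv' into a set of letters."""
    return set(flags.lstrip("+"))


def can_edit_access_list(flags: str) -> bool:
    # Per the help text quoted in the log, +f "Enables modification of
    # channel access lists", so its absence explains the "no rights" error.
    return "f" in parse_flags(flags)


def missing_flags(current: str, wanted: str) -> str:
    """Return the wanted flags that are absent from current, sorted."""
    return "".join(sorted(parse_flags(wanted) - parse_flags(current)))
```

For example, `missing_flags("+Aeiorstv", "+Aefiorstv")` reports the single flag that someone with `+f` would still need to grant.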
[08:59:46] if anyone runs functional tests in toolsbeta/tools, you can use https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/426 and run as your user
[09:05:14] and if you use this one, you also get parallel run prevention https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/427
[09:05:25] (reviews welcome)
[09:34:44] blancadesal: do you know where the helm/helmfile/helm-diff packages should be coming from for toolforge control nodes? (T370252)
[09:34:45] T370252: [infra,k8s] helm packages are not available on new k8s repos - https://phabricator.wikimedia.org/T370252
[09:35:37] dcaro: I don't :/
[09:37:59] okok, I was trying to install helm in the bastions so we can run the get versions script there, and found that it's not available anymore :/
[10:17:52] blancadesal: so we already have the 'final' api paths?
[10:18:37] at least for jobs
[10:32:48] dcaro: jobs is 'final' except if we change quota --> quotas
[10:37:45] awesome :)
[10:39:05] dcaro: I'll send an email to cloud-announce after lunch. Wdyt about maybe extending the deprecation deadline to Monday or so?
[10:41:15] sure, we should change all the clients too
[10:43:59] 👍
[11:17:49] hmm... for some reason helm on tools has the builds-builder version 110, but it won't update it to 113 as there are no changes (the updates there are only for toolsbeta/lima-kilo), so we have a non-latest version of the chart for a bit
[11:37:41] * dcaro lunch
[11:39:14] there are a few easy (I hope) patches: a couple adding ingress to lima-kilo (https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/396 and dependent one) and one adding the version listing of components for tools/toolsbeta (https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/429), if anyone has some time to review (not critical, but should be easy), going for lunch
[11:39:18] * dcaro lunch
[12:49:58] https://www.irccloud.com/pastebin/nuMhV45X/
[12:50:13] any idea why I'm suddenly getting permission denied?
[12:53:23] not really, that should be using the cumin powers, not your user
[12:53:30] so that should fail for all, let me check
[12:53:39] blancadesal: try using cloudcumin servers instead?
[12:54:35] i.e., `ssh cloudcumin1001.eqiad.wmnet`
[12:55:14] yep, not sure how toolsbeta-cumin is nowadays, cloudcumin kind of superseded it
[12:57:59] how do I target just the toolsbeta bastion?
[12:59:02] cumin "O{project:toolsbeta name:toolsbeta-bastion*}" ...
[12:59:06] I think that should work
[12:59:19] I may be wrong, but I don't think you can use puppetdb information from cloudcumin servers, meaning you cannot use `R:Package = ` etc
[12:59:49] you can use openstack provider though (`O{...}`)
[13:00:33] I was following the instructions here: https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Packaging#Uploading_a_package toolsbeta-cumin worked the last time I deployed a package :/
[13:01:08] nowadays I wget it from the built ones in CI
[13:01:42] then `for deploy in toolsbeta; do for repo in buster bullseye bookworm; do aptly repo add $repo-$deploy /tmp/toolforge-jobs-framework-cli_16.0.14_all.deb && aptly publish --skip-signing update $repo-$deploy; done; done` and then the same for tools
[13:01:56] from tools-services-05
[13:02:16] I did that, but then the packages need apt installing on each bastion too, right?
[13:02:46] oh yep, that should work from cloudcumin yep, we should update the docs
[13:04:28] blancadesal: just updated it, can you test those commands?
[13:05:03] thanks, testing now
[13:05:38] the name match was wrong xd
[13:06:52] and it was wrong again, just tested myself, now it's the good one `O{project:tools name:.*-bastion.*}`
[13:08:14] this last one worked :)
[13:09:51] the second command too 👍
[13:14:02] I should probably read the cumin docs xd
[13:16:16] the syntax reminds me of perl for some reason
[13:20:35] hopefully good memories xd
[13:22:05] https://en.wikipedia.org/wiki/Obfuscated_Perl_Contest
[13:23:02] xd
[13:23:21] that brings me to https://en.wikipedia.org/wiki/Brainfuck
[13:23:40] and that to https://github.com/samshadwell/TrumpScript ....
[13:23:43] the "Wiki Loves Sport 2024" banner is quite appropriate xd
[13:24:11] of course it was written by a Physics student... ehem.
[13:24:26] (brainfuck)
[13:24:31] dhinus: what do you think? https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/13
[13:24:35] arturo: you can use puppetdb from cloudcumin, but only for metal hosts. being able to use tools-puppet was briefly discussed in T362629
[13:24:36] T362629: Allow interacting with Toolforge PuppetDB from wmcs-cookbooks - https://phabricator.wikimedia.org/T362629
[13:24:59] arturo: will have a look at the tofu MR in a minute
[13:25:46] on the cumin/puppet thing, I was wondering if we could have a toolforge-cumin host that is linked to tools-puppetdb instead of the prod-puppetdb
[13:26:00] but I'm not sure I understand exactly how it all works :D
[13:26:56] it would definitely be nice to query cloud-vps hosts based on a package or other properties, as you can do for metal hosts
[13:27:12] dhinus: I guess that's exactly how `toolsbeta-cumin-1` and friends were set up
[13:27:58] * dhinus checks
[13:29:46] wow, you're right, and it works!
[13:32:21] you can do things like "sudo cumin 'P{R:Class = Nginx}'" or "sudo cumin 'P{R:File = /etc/cumin}'"
[13:32:58] compared to cumin/cloudcumin, you have to add P{} because the default backend on tools-cumin-1 is set to O{}
[13:34:08] * arturo food
[13:52:18] andrewbogott: dhinus - unfortunately, I have a calendar clash with our upcoming sync space meeting, so I can't make it. Sorry about that.
[13:52:39] btullis: are there others from your team coming or should we cancel?
[13:53:12] I'll skip it too
[13:53:26] btullis: what about same time next week?
[13:56:50] I wouldn't think that there will be anyone else from our team. Someone has scheduled a DPE Sync meeting for the same slot. Maybe we need to find a different time?
[13:59:21] arturo: your mention of tools-cumin opened up a new world to me :D now my question is: why do we use puppetdb only for tools/toolsbeta and not for all cloud-vps projects?
[13:59:49] btullis: would the same time on monday or thursday work?
[14:00:06] well, or tuesday
[14:00:18] dhinus: I think we only have puppetdb deployed on those, would need to deploy it for everyone
[14:01:14] (and cumin nodes)
[14:01:25] there was a cloud-cumin vm iirc
[14:01:45] yes there was but I don't think it was pointing to any puppetdb
[14:01:59] I wonder if a single puppetdb for all cloud-vps would work? instead of one per project?
[14:02:03] dhinus: puppetdb doesn't support multi-tenancy. So we can't have a central puppetdb or it will leak info between projects.
[14:02:08] ha
[14:02:12] The only way to use it properly would be to have one per project which would be a big pain
[14:02:16] Caught ConnectionError exception: HTTPSConnectionPool(host='localhost', port=443): Max retries exceeded with url: /pdb/query/v4/resources (Caused by NewConnectionError(': Failed to establish a new connection: [Errno 111] Connection refused'))
[14:02:37] ^ from cloud-cumin (probably not there anymore)
[14:02:48] s/anymore/just not there
[14:02:55] andrewbogott: what would be an example of a "leak"?
[14:03:00] secrets
[14:03:02] facts
[14:03:58] what kind of secrets are stored in puppetdb?
[14:05:32] hmm ok I guess we deploy some secrets using puppet, and those end up in puppetdb as well
[14:05:48] I'll have to double check, but I think that the hiera values end up as parameters, and our secrets are just locally deployed hiera values
[14:06:38] maybe we should revisit the whole idea of hiding secrets in private git commits :P
[14:06:47] but that's a big task
[14:07:41] facts might be an issue too
[14:07:59] it includes ssh keys and such
[14:08:12] IMO anything that is a direct data conduit from one project to another breaks the whole idea of tenants
[14:08:25] so even if we hammer down individual cases it would still be a disaster waiting to happen
[14:11:26] I see your point. I was thinking that if we remove private git commits, then puppetdb would only include things that are already public, but probably that's not true
[14:12:12] Yeah, the whole point of puppetdb is to gather info from clients and distribute it to other clients
[14:12:26] my use case here would be to be able to do things like "cumin 'P{O:wmcs::some::role}'"
[14:13:15] https://openstack-browser.toolforge.org/puppetclass/ is annoying because it won't list derived roles I think?
[14:13:55] I just discovered that I can do "cumin 'P{O:wmcs::some::role}'" in tools using tools-cumin-1 and that's great
[14:14:02] it would be nice to have that cloud-wide
[14:14:58] it would!
[14:15:44] correction: roles are probably ok to search in openstack-browser, profiles are more of an issue (because we often have profile a requiring b requiring c)
[14:15:59] while I don't think we have roles requiring other roles
[14:16:40] tl;dr :old-man-yells-at-puppet: :D
[14:18:08] yep, roles should not include each other afaik, only profiles, while profiles are a bit more lax
[14:19:30] blancadesal: btw. I manually changed zoomviewer, we'll see if that helped or it's curling somewhere else too
[14:20:19] dcaro: nice. it might also be that it's not the only tool using that endpoint
[14:20:41] yep
[14:21:08] I summarized the cumin/puppetdb discussion in T179816
[14:21:09] T179816: Cumin: create external backend for WMCS Puppet API - https://phabricator.wikimedia.org/T179816
[14:21:15] though it points that way (at least the one that does it the most)
[14:21:17] https://usercontent.irccloud-cdn.com/file/mdrWC6Qn/image.png
[14:22:36] hopefully it's the only one
[14:23:10] dhinus: I'm looking for victims, do you see anything on this list that I can upgrade today w/out waiting for the notice period? https://phabricator.wikimedia.org/T369723
[14:23:23] dcaro, same question mostly because tools-harbordb is on that list
[14:23:57] I wonder if some of those are still in use...
[14:24:16] I'm sure many of them are not but that's a much harder question to answer :)
[14:24:16] I think quarry-db-02 is superseded by quarry-k8s
[14:24:24] quarry-dev-kube | ACTIVE | quarry |
[14:24:34] ^ that one looks 'dev'elopment?
[14:24:35] I was suddenly so very very confused because I opened k9s on what I thought was lima-kilo and was greeted by a bunch of loki-promtail and loki-grafana pods xd
[14:24:45] dcaro: yep that one is a good candidate as well
[14:24:55] OK, I'll try those two for starters...
[14:25:15] andrewbogott: toolsbeta-db went well rightV
[14:25:16] ?
[14:25:26] the 7th time, yes :)
[14:25:29] xd
[14:25:34] all the other times the rebuild was just a reboot/noop
[14:26:15] as long as harbor db is down, builds can't start and images can't get pulled (so new jobs/webservice might fail to start), it's kinda critical
[14:26:25] but if it's a short downtime nobody should notice
[14:26:31] ok, so that might need its own notice period
[14:27:17] no need for lots of advance, but I would send an email with "we are going to do this, nothing should happen, if you see anything weird, please ping us"
[14:27:19] kind of thing
[14:27:26] * andrewbogott nods
[14:29:45] blancadesal: looks promising
[14:29:48] https://usercontent.irccloud-cdn.com/file/lVDQfnqJ/image.png
[14:36:18] very promising!
[14:57:32] dcaro: I'll be afk for a while – will come back and finish up the last bits of the jobs-api/cli stuff in an hour or so
[14:58:33] blancadesal: no rush, lots of work got deployed today! kudos!
[14:58:52] :D
[14:59:48] https://usercontent.irccloud-cdn.com/file/0HbkNA2B/Screenshot%202024-07-17%20at%2016.59.29.png
[15:31:01] arturo: I'm looking at the tofu MR but I have to refresh my memories on some tofu/tf patterns :)
[15:32:36] dhinus: ok :-) I'm still iterating on it. I found you can "code" the imports, and I'm currently exploring all that
[15:34:29] I bet some features didn't even exist the last time I used terraform :D
[15:56:34] dcaro: do we already have examples of cookbooks that run rbd commands? (It looks to me like we don't)
[15:57:43] andrewbogott: I think we do not, only osd and similar things
[15:57:51] ok
[17:25:28] * dcaro off
[17:25:30] cya tomorrow
[17:38:20] I somehow broke the make-instance-vol script almost exactly one year ago when passing it through shellcheck
[17:38:41] and surprisingly nobody got affected by the issue until I had to create a new instance today ;)
[17:39:00] the issue is some bad quoting :/ I have fixed it with https://gerrit.wikimedia.org/r/c/operations/puppet/+/1054916 if one can puppet-merge it
[17:39:19] (I have already applied the puppet patch on the WMCS integration puppet server, so there is no rush)
[17:54:05] merged!
[21:17:21] uuid project names look pretty gross in DNS: db.933ad3ff1e264aada56e6bc3ed9e08f3.eqiad1.wikimedia.cloud
[21:18:55] andrewbogott: would it be reasonable for me to file a feature request to use the project name in DNS rather than project id?
[21:19:17] https://openstack-browser.toolforge.org/project/933ad3ff1e264aada56e6bc3ed9e08f3 is that particular project
[22:31:11] bd808: it ought to be creating two records, one with the name and one with the uuid.
[22:31:16] Is that broken?
[22:33:15] andrewbogott: I'm not sure. All I see on openstack-browser is the long random number version. I haven't tried a name lookup other than that.
[22:33:38] * andrewbogott checks
[22:33:43] * bd808 doesn't remember how openstack-browser decides on the name to show
[22:33:59] I'm sure it shows the id, but it could be changed to show the name :)
[22:35:27] ok, so I can ssh to db.webperformancetest.eqiad1.wikimedia.cloud
[22:35:39] so that's working, sounds like all we need to change is openstack-browser
[22:35:43] `{% set fqdn = server.name ~ '.' ~ data.id ~ '.eqiad1.wikimedia.cloud' %}`, so yeah we can replace `data.id` with `data.name` if the readable project name is also in DNS
[22:36:09] * bd808 opens a bug report about openstack-browser
[22:36:18] That's probably all it takes
[22:36:45] T366679 was way ahead of me here.
[22:36:46] T366679: openstack-browser support for projects where id != name - https://phabricator.wikimedia.org/T366679
[22:42:59] ugh. the use of project.id is endemic to the tool.
[22:47:00] andrewbogott: I think the answer to this must be yes if you are making redundant DNS records, but to be sure: are project names guaranteed to be unique now that they are decoupled from project ids?
[23:04:47] bd808: that's a good question! They are probably not guaranteed.
[23:04:49] * andrewbogott tries it
[23:07:01] keystone says: "Conflict occurred attempting to store project - it is not permitted to have two projects with either the same name or same id in the same domain: name is duplicatename, project id dc28a098a38d437d925c449106139ef1. (HTTP 409) (Request-ID: req-d5b474f6-1b73-4fe1-a68e-cce8f6d76d50)"
[23:07:05] bd808: so we're good!
[23:08:45] andrewbogott: w00t! I think that means I can make URLs pretty in https://openstack-browser.toolforge.org/ again :)
[23:09:59] bd808: I also have the feature set in keystone to require project names to be valid in urls although I don't remember exactly what that means.
[23:10:27] https://docs.openstack.org/keystone/pike/admin/identity-url-safe-naming.html
[23:11:28] I would assume it means that a project name has to fit these character exclusions -- https://datatracker.ietf.org/doc/html/rfc3986#section-2.2
[23:12:08] "keystone supports the optional ability to ensure both projects and domains are named without including any of the reserved characters specified in section 2.2 of rfc3986."
[23:12:33] likely. I vaguely remember some ambiguity about _ and - in domain names (although clearly - works since we use it all the time)
[23:16:36] the simple rule is that, per RFC, hostnames cannot include `_`, but DNS names can.
[23:17:13] This is why Bonjour uses records like _tcp.foo.bar
[23:18:00] This is also why the S3 gateway stuff is so confusing
[23:18:21] https://en.wikipedia.org/wiki/Hostname#Syntax
[23:18:44] https://en.wikipedia.org/wiki/Domain_Name_System#Domain_name_syntax,_internationalization
[23:20:07] I don't think I've ever been tempted to name a host with _s but I don't think I knew specifically not to do that
[23:20:16] "The characters allowed in labels are a subset of the ASCII character set, consisting of characters a through z, A through Z, digits 0 through 9, and hyphen. This rule is known as the LDH rule (letters, digits, hyphen)." -- "label" here is a dot separated DNS component
[23:21:42] hang on, does that mean that domain names can or can't have underscores? I feel like you've said both things
[23:22:01] this all gets really confusing when you find out that most DNS clients and servers will let you name things WTFever you want.
[23:22:23] ok, I guess the article says _ is ok for dns but /not/ if it refers to a host
[23:22:50] which seems funny since a dns 'hostname' is barely a thing anymore
[23:23:02] Or, I mean, we call things hostnames but they're always vhosts and proxies and such
[23:23:08] * andrewbogott may still not be getting it
[23:24:40] There is probably some way this is subtly wrong, but I think "hostname" in this sense can be thought of as the left most dotted component of a FQDN when the FQDN is referring to an A or AAAA record.
[23:25:26] so "bd808-rocks.bd808.com" is good; "bd808_sucks.bd808.com" is not
[23:25:34] Yeah, I'm confused that the wiki article refers to the hostname as containing the dots...
[23:25:39] It sounds like openstack will allow project names to be created with `_`, and foo.project_name.wikimedia.cloud will work, but be forever technically incorrect
[23:26:48] there are at least 4 different RFCs at play here, so some things depend on when in the course of history you froze the rules in your brain, code, or document
[23:27:22] kubernetes will not let you use `_` in a namespace. this is the important bit for Toolforge
[23:27:54] as long as we don't rename toolforge tool_forge everything should be fine
[23:28:21] there were a number of tools that had to be renamed in the long ago
[23:29:17] magnus had at least a couple with `_` in the name. Which honestly made sense in relation to how mediawiki handles spaces in titles itself.
[23:31:05] the label definition from RFC1035 is the one that Kubernetes uses for namespace names. "The labels must follow the rules for ARPANET host names. They must start with a letter, end with a letter or digit, and have as interior characters only letters, digits, and hyphen."
[23:32:30] RFC1123 extended that to allow labels to start with a digit.
[23:33:08] It's funny that they exclude ending with a -
[23:33:16] makes the regex messier :)
[23:35:42] rfc2181 clarified that when not talking about internet hosts DNS labels and full names can do pretty much whatever is desired as long as each label is 1-63 octets long and the full name is <=255 octets.
[23:36:23] * bd808 checks to see what rfc5892 did to make things messier
[23:37:51] * andrewbogott notices that petscan has been rebuilt with bookworm!
[23:40:13] ok, rfc5892 is Internationalized Domain Names code point madness. run far away :)
[23:40:41] I hope to never need to understand how internationalized urls work
[23:41:06] it's mostly a big list of unicode code points that someone decided would be confusing in a label (without punycode encoding)
[23:44:07] * bd808 wanders off for the evening
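(Editor's aside, not part of the original log.) The label rules quoted above can be sketched as a small check. This is a rough illustration only, assuming the RFC 1123 reading from the discussion (letters, digits and hyphens, with a letter or digit at each end) plus the 63 and 255 octet limits mentioned for RFC 2181. The function names are hypothetical, and this is not claimed to be Kubernetes' or keystone's exact validation.

```python
import re

# One label: letters, digits, hyphens; must start and end with a letter or
# digit (the RFC 1123 relaxation of RFC 1035, which required a leading letter).
LABEL_RE = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?")


def is_ldh_label(label: str) -> bool:
    """Check a single dot-separated component against the LDH rule."""
    return len(label) <= 63 and LABEL_RE.fullmatch(label) is not None


def is_hostname_safe(fqdn: str) -> bool:
    """True if the full name is at most 255 characters and every label passes."""
    return len(fqdn) <= 255 and all(is_ldh_label(part) for part in fqdn.split("."))
```

Under these assumptions `bd808-rocks.bd808.com` passes and `bd808_sucks.bd808.com` does not, which matches the rule of thumb from the conversation. The uuid-style name `db.933ad3ff1e264aada56e6bc3ed9e08f3.eqiad1.wikimedia.cloud` also passes, since labels may start with a digit.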