[07:48:48] morning [07:49:34] hmm, it seems I don't have the right permissions to change the topic on wikimedia-cloud. I wanted to remove the k8s upgrade message [07:52:29] done [07:52:31] morning [07:52:35] IRC permissions are a mess [07:55:40] thanks! [07:56:36] arturo: do we need to remove the downtime or did you do that already? [07:57:51] I can't find it browsing the list of silences. I believe I created it for 4h only [08:00:44] ok [08:01:56] morning [08:08:10] I have ops, but chanserv tells me I can't give ops [08:08:55] well well chanserv [08:13:37] blancadesal: you don't have a wikimedia cloak either, iirc it was used in some places, though I think it's not for this channel (https://meta.wikimedia.org/wiki/IRC/Cloaks) [08:16:28] is having a wikimedia cloak linked to channel permissions? [08:17:49] it was at some point at least, though it was never clear to me if it stopped being linked, or if it was more of a "Ahhh, I see you have the cloak, I'll give you permissions manually" kind of thing [08:19:27] as far as I know, there is a bot that only accepts commands from folks using a wikimedia cloak, in particular the bot that understands the `!status` command to change the `Status:` string in the -cloud channel topic [08:19:46] other than that, the cloak doesn't really work for anything [08:20:44] blancadesal: you don't have nick protection enabled it seems [08:21:09] https://wikitech.wikimedia.org/wiki/Libera_Chat [08:21:33] that "might" be one reason why it does not let me give you access [08:23:15] dcaro: I've enabled it now [08:29:24] blancadesal: it did not help xd [08:29:28] I see it enabled though [08:31:44] I'm able to change the topic on this channel... what's the difference? 
[08:32:52] `blancadesal has flags +Aefiorstv in #wikimedia-cloud-admin because they are logged in as blancadesal.` [08:33:06] but when I try to give you the same flags on -cloud it says I have no rights :/ [08:33:40] I think I need to be set as 'founder' of the channel or something to give rights [08:35:29] thanks for trying :) [08:35:43] I'm missing the 'f' flag for myself on -cloud, I think that's what prevents me from giving access to you (I can change/give access on this channel) [08:37:58] ` +f - Enables modification of channel access lists.` that one xd [08:39:25] bryan and andrew have the -f flag [08:40:25] arturo: you have it too [08:41:00] I'd be happy to copy and execute the commands you think I should run [08:41:19] `Founder : Majavah, bd808, andrewbogott, balloons` [08:41:26] for -cloud [08:41:28] so xd [08:42:38] oh, arturo then you can `/cs flag #wikimedia-cloud blancadesal +Aefiorstv` (you might want to open the chat first with chanserv, /query chanserv) [08:43:36] https://usercontent.irccloud-cdn.com/file/gMbMpUe9/image.png [08:44:04] once you have a chat open with chanserv, you don't need the `/cs` prefix [08:44:22] https://usercontent.irccloud-cdn.com/file/RQ1odv13/image.png [08:45:00] 🤦‍♂️ it's `flags` plural [08:45:22] https://usercontent.irccloud-cdn.com/file/to7PjA31/image.png [08:45:27] \o/ [08:45:30] can you do the same for me? [08:45:40] (so I get f, and give access to others) [08:45:55] https://usercontent.irccloud-cdn.com/file/MJQmU8RR/image.png [08:45:56] blancadesal: can you verify you can op yourself? [08:45:59] \o/ [08:46:46] it worked for me (I can give access now on -cloud) [08:46:56] arturo: thanks!@ [08:47:07] you are welcome [08:48:27] yes it works! [08:48:56] thanks arturo and dcaro! 
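The flag exchange above comes down to one check: whoever runs the `flags` command needs `f` in their own flag string for that channel. A minimal sketch of that check follows, assuming flags are case-sensitive single letters after a leading `+`; the helper names are hypothetical, and only the meaning of `+f` is taken from the log.

```python
def has_flag(flags: str, flag: str) -> bool:
    """Return True if `flag` is present in a ChanServ-style string like '+Aefiorstv'."""
    # Comparison is case-sensitive, so 'f' and 'F' are treated as different flags.
    return flag in flags.lstrip("+")


def can_edit_access_list(flags: str) -> bool:
    # Per the log: "+f - Enables modification of channel access lists."
    return has_flag(flags, "f")


def flags_command(channel: str, nick: str, flags: str) -> str:
    # Note the plural `flags`, which was the typo that tripped up the first attempt.
    return f"/cs flags {channel} {nick} {flags}"


if __name__ == "__main__":
    print(can_edit_access_list("+Aefiorstv"))
    print(flags_command("#wikimedia-cloud", "blancadesal", "+Aefiorstv"))
```

The command string is only built here, not sent anywhere; it still has to be pasted into an IRC client by someone who holds `f` on the channel.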
[08:59:46] if anyone runs functional tests in toolsbeta/tools, you can use https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/426 and run as your user
[09:05:14] and if you use this one, you also get parallel run prevention https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/427
[09:05:25] (reviews welcome)
[09:34:44] blancadesal: do you know where the helm/helmfile/helm-diff packages should be coming from for toolforge control nodes? (T370252)
[09:34:45] T370252: [infra,k8s] helm packages are not available on new k8s repos - https://phabricator.wikimedia.org/T370252
[09:35:37] dcaro: I don't :/
[09:37:59] okok, I was trying to install helm on the bastions so we can run the get versions script there, and found that it's not available anymore :/
[10:17:52] blancadesal: so we already have the 'final' api paths?
[10:18:37] at least for jobs
[10:32:48] dcaro: jobs is 'final' except if we change quota --> quotas
[10:37:45] awesome :)
[10:39:05] dcaro: I'll send an email to cloud-announce after lunch. Wdyt about maybe extending the deprecation deadline to Monday or so?
hmm... for some reason helm on tools has the builds-builder version 110, but it won't update it to 113 as there are no changes (the updates there are only for toolsbeta/lima-kilo), so we have a non-latest version of the chart for a bit
there's a few easy (I hope) patches a couple adding ingress to lima-kilo (https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/396 and dependent one) and adding the version listing of components for tools/toolsbeta (https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/429) if anyone has some time to review (not critical, but should be easy), going for lunch [12:53:23] not really, that should be using the cumin powers, not your user [12:53:30] so that should fail for all, let me check [12:53:39] blancadesal: try using cloudcumin servers instead? [12:54:35] i.e, `ssh cloudcumin1001.eqiad.wmnet` [12:55:14] yep, not sure how toolsbeta-cumin is nowadays, cloudcumin kind of superseded it [12:57:59] how do I target just the toolsbeta bastion? [12:59:02] cumin "O{project:toolsbeta name:toolsbeta-bastion*}" ... [12:59:06] I think that should work [12:59:19] I may be wrong, but I don't think you can use puppetdb information from cloudcumin servers, meaning you cannot use `R:Package = ` etc [12:59:49] you can use openstack provider though (`O{...}`) [13:00:33] I was following the instructions here: https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Packaging#Uploading_a_package toolsbeta-cumin worked the last time I deployed a package :/ [13:01:08] nowadays I wget it from the built ones in CI [13:01:42] then `for deploy in toolsbeta; do for repo in buster bullseye bookworm; do aptly repo add $repo-$deploy /tmp/toolforge-jobs-framework-cli_16.0.14_all.deb && aptly publish --skip-signing update $repo-$deploy; done; done` and then the same for tools [13:01:56] from tools-services-05 [13:02:16] I did that, but then the packages need apt installing on each bastion too, right? oh yep, that should work from cloudcumin yep, we should update the docs
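The nested aptly loop quoted above can be sketched as a dry run that only prints the commands, which makes it easy to review what would run for both `toolsbeta` and `tools` before pasting anything on tools-services-05. The function name and its parameters are hypothetical; the aptly subcommands, the `--skip-signing` flag, the repo naming (`$repo-$deploy`) and the package path are copied from the log.

```python
def aptly_commands(deb_path, deploys=("toolsbeta", "tools"),
                   distros=("buster", "bullseye", "bookworm")):
    """Build, without executing, the aptly commands for each deploy/distro pair."""
    commands = []
    for deploy in deploys:
        for distro in distros:
            repo = f"{distro}-{deploy}"  # e.g. bookworm-toolsbeta
            commands.append(f"aptly repo add {repo} {deb_path}")
            commands.append(f"aptly publish --skip-signing update {repo}")
    return commands


if __name__ == "__main__":
    deb = "/tmp/toolforge-jobs-framework-cli_16.0.14_all.deb"
    for command in aptly_commands(deb):
        print(command)
```

Listing `toolsbeta` before `tools` keeps the order used in the log, where the package goes to toolsbeta first and the same steps are repeated for tools afterwards.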
just updated it, can you test those commands?
thanks, testing now
the name match was wrong xd
and it was wrong again; I just tested it myself, and now it's the right one: `O{project:tools name:.*-bastion.*}`
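The query that finally worked uses `.*`, which suggests the `name:` filter is treated as a regular expression and not as a shell glob. The sketch below shows how the two readings differ, using plain Python and hypothetical instance names; it does not call cumin, and whether the real backend anchors its match is an assumption left open via the `anchored` parameter.

```python
import fnmatch
import re

# Hypothetical instance names, for illustration only.
HOSTS = ["tools-bastion-12", "tools-bastion-13", "tools-k8s-control-7"]


def glob_select(pattern, names):
    """Select names the way a shell glob would."""
    return [name for name in names if fnmatch.fnmatchcase(name, pattern)]


def regex_select(pattern, names, anchored=False):
    """Select names by regex, either anchored (fullmatch) or unanchored (search)."""
    matcher = re.fullmatch if anchored else re.search
    return [name for name in names if matcher(pattern, name)]


if __name__ == "__main__":
    # Glob reading: '*' means "any run of characters".
    print(glob_select("*-bastion*", HOSTS))
    # Regex reading: a trailing '*' only repeats the preceding 'n',
    # so an anchored match on 'tools-bastion*' selects nothing.
    print(regex_select("tools-bastion*", HOSTS, anchored=True))
    # '.*-bastion.*' selects the bastions whether or not the match is anchored.
    print(regex_select(".*-bastion.*", HOSTS, anchored=True))
    print(regex_select(".*-bastion.*", HOSTS))
```

A pattern that starts with a bare `*`, such as `*-bastion*`, is not even a valid regular expression in Python, so glob habits fail in more than one way here.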
this last one worked :)
the second command too 👍
I should probably read the cumin docs xd
the syntax reminds me of perl for some reason
hopefully good memories xd
https://en.wikipedia.org/wiki/Obfuscated_Perl_Contest
xd
that brings me to https://en.wikipedia.org/wiki/Brainfuck
and that to https://github.com/samshadwell/TrumpScript ....
the "Wiki Loves Sport 2024" banner is quite appropriate xd
of course it was written by a Physics student... ehem.
[13:24:26] (brainfuck)
[13:24:31] dhinus: what do you think? https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/13
[13:24:35] arturo: you can use puppetdb from cloudcumin, but only for metal hosts. being able to use tools-puppet was briefly discussed in T362629
[13:24:36] T362629: Allow interacting with Toolforge PuppetDB from wmcs-cookbooks - https://phabricator.wikimedia.org/T362629
[13:24:59] arturo: will have a look at the tofu MR in a minute
[13:25:46] on the cumin/puppet thing, I was wondering if we could have a toolforge-cumin host that is linked to tools-puppetdb instead of the prod-puppetdb
[13:26:00] but I'm not sure I understand exactly how it all works :D
[13:26:56] it would definitely be nice to query cloud-vps hosts based on a package or other properties, as you can do for metal hosts
[13:27:12] dhinus: I guess that's exactly how `toolsbeta-cumin-1` and friends were set up
[13:27:58] * dhinus checks
[13:29:46] wow, you're right, and it works! you can do things like "sudo cumin 'P{R:Class = Nginx}'" or "sudo cumin 'P{R:File = /etc/cumin}'"
compared to cumin/cloudcumin, you have to add P{} because the default backend on tools-cumin-1 is set to O{}
andrewbogott: dhinus - unfortunately, I have a calendar clash with our upcoming sync space meeting, so I can't make it. Sorry about that.
btullis: are there others from your team coming or should we cancel?
I'll skip it too
btullis: what about same time next week?
I wouldn't think that there will be anyone else from our team. Someone has scheduled a DPE Sync meeting for the same slot. Maybe we need to find a different time?
arturo: your mention of tools-cumin opened up a new world to me :D now my question is: why do we use puppetdb only for tools/toolsbeta and not for all cloud-vps projects? [14:00:06] well, or tuesday [14:00:18] dhinus: I think we only have puppetdb deployed on those, would need to deploy it for everyone [14:01:14] (and cumin nodes) [14:01:25] there was a cloud-cumin vm iirc [14:01:45] yes there was but I don't think it was pointing to any puppetdb [14:01:59] I wonder if a single puppetdb for all cloud-vps would work? instead of one per project? [14:02:03] dhinus: puppetdb doesn't support multi-tenancy. So we can't have a central puppetdb or it will leak info between projects. [14:02:08] ha [14:02:12] The only way to use it properly would be to have one per project which would be a big pain [14:02:16] Caught ConnectionError exception: HTTPSConnectionPool(host='localhost', port=443): Max retries exceeded with url: /pdb/query/v4/resources (Caused by NewConnectionError(': Failed to establish a new connection: [Errno 111] Connection refused')) [14:02:37] ^ from cloud-cumin (probably not there anymore) [14:02:48] s/anymore/just not there [14:02:55] andrewbogott: what would be an example of a "leak"? [14:03:00] secrets [14:03:02] facts [14:03:58] what kind of secrets are stored in puppetdb? 
[14:05:32] hmm ok I guess we deploy some secrets using puppet, and those end up in puppetdb as well [14:05:48] I'll have to double check, but I think that the hiera values end up as parameters, and our secrets are just locally deployed hiera values [14:06:38] maybe we should revisit the whole idea of hiding secrets in private git commits :P [14:06:47] but that's a big task [14:07:41] facts might be an issue too [14:07:59] it includes ssh keys and such [14:08:12] IMO anything that is a direct data conduit from one project to another breaks the whole idea of tenants [14:08:25] so even if we hammer down individual cases it would still be a disaster waiting to happen [14:11:26] I see your point. I was thinking that if we remove private git commits, then puppetdb would only include things that are already public, but probably that's not true [14:12:12] Yeah, the whole point of puppetdb is to gather info from clients and distribute it to other clients [14:12:26] my use case here would be to be able to do things like "cumin 'P{O:wmcs::some::role}'" [14:13:15] https://openstack-browser.toolforge.org/puppetclass/ is annoying because it won't list derived roles I think? [14:13:55] I just discovered that I can do "cumin 'P{O:wmcs::some::role}'" in tools using tools-cumin-1 and that's great [14:14:02] it would be nice to have that cloud-wide [14:14:58] it would! [14:15:44] correction: roles are probably ok to search in openstack-browser, profiles are more of an issue (because we often have profile a requiring b requiring c) [14:15:59] while I don't think we have roles requiring other roles [14:16:40] tl;dr :old-man-yells-at-puppet: :D [14:18:08] yep, roles should not include each other afaik, only profiles, while profiles are a bit more lax [14:19:30] blancadesal: btw. I manually changed zoomviewer, we'll see if that helped or it's curling somewhere else too [14:20:19] dcaro: nice. 
it might also be that it's not the only tool using that endpoint
[14:20:41] yep
[14:21:08] I summarized the cumin/puppetdb discussion in T179816
[14:21:09] T179816: Cumin: create external backend for WMCS Puppet API - https://phabricator.wikimedia.org/T179816
[14:21:15] though it points that way (at least the one that does it the most)
[14:21:17] https://usercontent.irccloud-cdn.com/file/mdrWC6Qn/image.png
[14:22:36] hopefully it's the only one
[14:23:10] dhinus: I'm looking for victims, do you see anything on this list that I can upgrade today w/out waiting for the notice period? https://phabricator.wikimedia.org/T369723
[14:23:23] dcaro, same question mostly because tools-harbordb is on that list
[14:23:57] I wonder if some of those are still in use...
[14:24:16] I'm sure many of them are not but that's a much harder question to answer :)
[14:24:16] I think quarry-db-02 is superseded by quarry-k8s
[14:24:24] quarry-dev-kube | ACTIVE | quarry |
[14:24:34] ^ that one looks 'dev'elopment?
[14:24:35] I was suddenly so very very confused because I opened k9s on what I thought was lima-kilo and was greeted by a bunch of loki-promtail and loki-grafana pods xd
[14:24:45] dcaro: yep that one is a good candidate as well
[14:24:55] OK, I'll try those two for starters...
[14:25:15] andrewbogott: toolsbeta-db went well right
[14:25:16] ?
the 7th time, yes :)
xd
all the other times the rebuild was just a reboot/noop
as long as harbor db is down, builds can't start and images can't get pulled (so new jobs/webservice might fail to start), it's kinda critical
but if it's a short downtime nobody should notice
ok, so that might need its own notice period
no need for much advance notice, but I would send an email with "we are going to do this, nothing should happen, if you see anything weird, please ping us"
kind of thing
blancadesal: looks promising
https://usercontent.irccloud-cdn.com/file/lVDQfnqJ/image.png
very promising!
dcaro: I'll be afk for a while – will come back and finish up the last bits of the jobs-api/cli stuff in an hour or so
blancadesal: no rush, lots of work got deployed today! kudos! :D
https://usercontent.irccloud-cdn.com/file/0HbkNA2B/Screenshot%202024-07-17%20at%2016.59.29.png
arturo: I'm looking at the tofu MR but I have to refresh my memories on some tofu/tf patterns :)
dhinus: ok :-) I'm still iterating on it. I found you can "code" the imports, and I'm currently exploring all that
I bet some features didn't even exist the last time I used terraform :D
dcaro: do we already have examples of cookbooks that run rbd commands? (It looks to me like we don't)
andrewbogott: I think we do not, only osd and similar things
ok
cya tomorrow
I somehow broke the make-instance-vol script almost exactly one year ago when passing it through shellcheck
and surprisingly nobody got affected by the issue until I had to create a new instance today ;)
the issue is some bad quoting :/ I have fixed it with https://gerrit.wikimedia.org/r/c/operations/puppet/+/1054916 if someone can puppet-merge it
(I have already applied the puppet patch on the WMCS integration puppet server, so there is no rush)
merged!
uuid project names look pretty gross in DNS: db.933ad3ff1e264aada56e6bc3ed9e08f3.eqiad1.wikimedia.cloud
andrewbogott: would it be reasonable for me to file a feature request to use the project name in DNS rather than project id? [21:19:17] https://openstack-browser.toolforge.org/project/933ad3ff1e264aada56e6bc3ed9e08f3 is that particular project [22:31:11] bd808: it ought to be creating two records, one with the name and one with the uuid. [22:31:16] Is that broken? [22:33:15] andrewbogott: I'm not sure. All I see on openstack-browser is the long random number version. I haven't tried a name lookup other than that. [22:33:38] * andrewbogott checks [22:33:43] * bd808 doesn't remember how openstack-browser decides on the name to show [22:33:59] I'm sure it shows the id, but it could be changed to show the name :) [22:35:27] ok, so I can ssh to ssh db.webperformancetest.eqiad1.wikimedia.cloud [22:35:39] so that's working, sounds like all we need to change is openstack-browser [22:35:43] `{% set fqdn = server.name ~ '.' ~ data.id ~ '.eqiad1.wikimedia.cloud' %}`, so yeah we can replace `data.id` with `data.name` if the readable project name is also in DNS [22:36:09] * bd808 opens a bug report about openstack-browser [22:36:18] That's probably all it takes [22:36:45] T366679 was way ahead of me here. [22:36:46] T366679: openstack-browser support for projects where id != name - https://phabricator.wikimedia.org/T366679 [22:42:59] ugh. the use of project.id is endemic to the tool. [22:47:00] andrewbogott: I think the answer to this must be yes if you are making redundant DNS records, but to be sure: are project names guaranteed to be unique now that they are decoupled from project ids? [23:04:47] bd808: that's a good question! The are probably not guaranteed. [23:04:49] * andrewbogott tries it [23:07:01] keystone says: "Conflict occurred attempting to store project - it is not permitted to have two projects with either the same name or same id in the same domain: name is duplicatename, project id dc28a098a38d437d925c449106139ef1. 
(HTTP 409) (Request-ID: req-d5b474f6-1b73-4fe1-a68e-cce8f6d76d50)" [23:07:05] bd808: so we're good! [23:08:45] andrewbogott: w00t! I think that means I can make URLs pretty in https://openstack-browser.toolforge.org/ again :) [23:09:59] bd808: I also have the feature set in keystone to require project names to be valid in urls although I don't remember exactly what that means. [23:10:27] https://docs.openstack.org/keystone/pike/admin/identity-url-safe-naming.html [23:11:28] I would assume it means that a project name has to fit these character exclusions -- https://datatracker.ietf.org/doc/html/rfc3986#section-2.2 [23:12:08] "keystone supports the optional ability to ensure both projects and domains are named without including any of the reserved characters specified in section 2.2 of rfc3986." [23:12:33] likely. I vaguely remember some ambiguity about _ and - in domain names (although clearly - works since we use it all the time) [23:16:36] per RFC, hostnames cannot include `_`, but DNS names can is the simple rule. [23:17:13] This is why Bonjour uses records like _tcp.foo.bar [23:18:00] This is also why the S3 gateway stuff is so confusing [23:18:21] https://en.wikipedia.org/wiki/Hostname#Syntax [23:18:44] https://en.wikipedia.org/wiki/Domain_Name_System#Domain_name_syntax,_internationalization [23:20:07] I don't think i've ever been tempted to name a host with _s but I don't think I knew specifically not to do that [23:20:16] "The characters allowed in labels are a subset of the ASCII character set, consisting of characters a through z, A through Z, digits 0 through 9, and hyphen. This rule is known as the LDH rule (letters, digits, hyphen)." -- "label" here is a dot separated DNS component [23:21:42] hang on, does that mean that domain names can or can't have underscores? I feel like you've said both things [23:22:01] this all gets really confusing when you find out that most DNS clients and servers will let you name things WTFever you want. 
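Since keystone rejects duplicate project names within a domain, the openstack-browser change discussed earlier amounts to swapping `data.id` for `data.name` when building the FQDN. A minimal Python sketch of that string construction follows, mirroring the Jinja template quoted in the log. The function and the dict layout are hypothetical, and pairing the id `933ad3ff1e264aada56e6bc3ed9e08f3` with the name `webperformancetest` is an assumption based on how the two appear together in the conversation.

```python
def instance_fqdn(server_name, project, use_name=True, deployment="eqiad1"):
    """Build <server>.<project>.<deployment>.wikimedia.cloud from name or id."""
    project_label = project["name"] if use_name else project["id"]
    return f"{server_name}.{project_label}.{deployment}.wikimedia.cloud"


if __name__ == "__main__":
    # Assumed to be the same project, as the log implies.
    project = {"id": "933ad3ff1e264aada56e6bc3ed9e08f3", "name": "webperformancetest"}
    print(instance_fqdn("db", project))
    print(instance_fqdn("db", project, use_name=False))
```

Both forms resolve according to the log, so the choice only affects which one the tool displays.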
[23:22:23] ok, I guess the article says _ is ok for dns but /not/ if it refers to a host [23:22:50] which seems funny since a dns 'hostname' is barely a thing anymore [23:23:02] Or, I mean, we call things hostnames but they're always vhosts and proxies and such [23:23:08] * andrewbogott may still not be getting it [23:24:40] There is probably some way this is subtly wrong, but I think "hostname" in this sense can be thought of as the left most dotted component of a FQDN when the FQDN is referring to an A or AAAA record. [23:25:26] so "bd808-rocks.bd808.com" is good; "bd808_sucks.bd808.com" is not [23:25:34] Yeah, I'm confused that the wiki article refers to the hostname as containing the dots... [23:25:39] It sounds like openstack will allow project names to be created with_ , and foo.project_name.wikimedia.cloud will work, but be forever technically incorrect [23:26:48] there are at least 4 different RFCs at play here, so some things depend on when in the courseof history you froze the rules in your brain, code, or document [23:27:22] kubernetes will not let you use `_` in a namespace. this is the important bit for Toolforge [23:27:54] as long as we don't rename toolforge tool_forge everything should be fine [23:28:21] there were a number of tools that had to be renamed in the long ago [23:29:17] magnus had at least a couple with `_` in the name. Which honestly made sense in relation to how mediawiki handles spaces in titles itself. [23:31:05] the label definition from RFC1035 is the one that Kubernetes uses for namespace names. "The labels must follow the rules for ARPANET host names. They must start with a letter, end with a letter or digit, and have as interior characters only letters, digits, and hyphen." [23:32:30] RFC1123 extended that to allow labels to start with a digit. 
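The label rules quoted above translate fairly directly into regular expressions. The sketch below is one way to write them, assuming the 63-octet label limit from RFC 1035 and ASCII-only input; the function names are hypothetical, and this is not the check that Kubernetes or keystone actually runs.

```python
import re

# RFC 1035: start with a letter, end with a letter or digit,
# interior characters are letters, digits, or hyphen; at most 63 characters.
RFC1035_LABEL = re.compile(r"[A-Za-z]([A-Za-z0-9-]{0,61}[A-Za-z0-9])?")

# RFC 1123 relaxes the first character so it may also be a digit.
RFC1123_LABEL = re.compile(r"[A-Za-z0-9]([A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_rfc1035_label(label: str) -> bool:
    return RFC1035_LABEL.fullmatch(label) is not None


def is_rfc1123_label(label: str) -> bool:
    return RFC1123_LABEL.fullmatch(label) is not None


if __name__ == "__main__":
    for candidate in ["bd808-rocks", "bd808_sucks", "9lives", "trailing-"]:
        print(candidate, is_rfc1035_label(candidate), is_rfc1123_label(candidate))
```

The optional group is what keeps a trailing hyphen out while still allowing single-character labels.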
[23:33:08] It's funny that they exclude ending with a -
[23:33:16] makes the regex messier :)
[23:35:42] rfc2181 clarified that when not talking about internet hosts DNS labels and full names can do pretty much whatever is desired as long as each label is 1-63 octets long and the full name is <=255 octets.
[23:36:23] * bd808 checks to see what rfc5892 did to make things messier
[23:37:51] * andrewbogott notices that petscan has been rebuilt with bookworm!
ok, rfc5892 is Internationalized Domain Names code point madness. run far away :)
I hope to never need to understand how internationalized urls work
it's mostly a big list of unicode code points that someone decided would be confusing in a label (without punycode encoding)
andrewbogott wanders off for the evening