[10:40:31] FYI, I'm disabling puppet for a bit on hosts using profile::tlsproxy::envoy for a firewall change rollout
[10:43:01] all hosts?
[10:43:14] ah, I got it now
[10:43:49] my mind added an extra comma on my first read
[11:13:02] just 207 :-) But Puppet is re-enabled on these now
[13:48:15] o/ I just had https://phabricator.wikimedia.org/T376438 (download to PDF makes error rate go brrrrrr) plop into my field of awareness. normally n.emo-yiannis would have a look at that, probably, but he's unexpectedly out for a bit longer - can someone ping us (content-transform team) if we need to do anything about it? yiannis seems to have a patch there to be deployed, but i don't know much/anything about it
[13:51:57] ihurbain: in theory that patch is deploy-able by anyone with deployment group access, just following the instructions at https://wikitech.wikimedia.org/wiki/Kubernetes/Deployments in the very first section -- you can actually skip the first ~3.5 steps even :)
[13:52:47] cdanis: should i actually do that?
[13:53:01] it seems like an excellent "first k8s deploy" tbh :)
[13:53:10] THEN LET'S DO IIIIIIT
[13:53:30] I think you should be able to +2 to kick things off
[13:54:10] ah, no, i'm afraid i can't do that dave
[13:54:13] hmm
[13:54:25] (mmmh, we did see that last time and i didn't fix it)
[13:54:48] I don't understand who exactly has +2 on deployment-charts repo, it's not mwdeploy
[13:56:03] https://gerrit.wikimedia.org/r/admin/repos/operations/deployment-charts,access seems to say wmf-deployment and a bunch of others (mediawiki-services-mobileapps is *maybe* the tightest one?)
[13:56:38] I did verify you have k8s deploy perms for this service, though, so I'm going to +2 and then you can take it from there
[13:56:47] brilliant, thank you :)
[13:56:59] thank you!
[14:00:00] ihurbain: oh, confusingly, the service is actually called `proton` in production
[14:00:05] haha :D
[14:00:16] good call, that's where i was getting stuck.... just now.
:D
[14:00:18] idk why, ✨history or something
[14:02:30] hysterical raisins
[14:07:12] (ftr, a decent way to figure that out here is just `git grep chromium-render` under helmfile.d/services)
[14:07:30] ack
[14:07:39] okay, NORMALLY i did the thing.
[14:07:52] looks good to me :D
[14:07:56] yay!
[14:09:13] i guess now i need to ask for rights on deployment-charts
[14:10:01] are there any serviceops or other sre around who know the intent of the current ACL on gerrit deployment-charts ?
[14:12:18] ihurbain: https://grafana.wikimedia.org/d/U4TuF-lMk/proton?orgId=1&from=now-30m&to=now&viewPanel=56 things are heading in the right direction
[14:22:24] thanks for handling that rollout! fwiw the redeploy will drop the error rate as it'll kill the swamped pods so we might need to wait to see if the fix has addressed the issue (but it really seems like it will)
[14:26:03] hnowlan: hopefully not another thumbhammer use case
[14:28:51] wunderbar :)
[14:29:21] hnowlan: do you know anything about my question about deployment-charts repo ACL above?
[14:30:47] this is the second time in a week I've had to +2 for someone who had rights to deploy to k8s anyway
[14:30:49] hey on-callers, I am deploying https://gerrit.wikimedia.org/r/c/operations/puppet/+/1091597 to limit /v2/_catalog in the docker registry to internal ips only. Nothing horrible is expected, ping me in case something looks weird
[14:37:39] cdanis: I think the base expected group is the independent Gerrit wmf-deployment group (which includes ldap/ops) although there's a lot of mess in the other groups.
[14:40:17] yeah, there is
[14:40:24] wmf-deployment looks mediawiki-specific?
[14:40:54] yeah it's traditionally the branch for being able to merge changes to mediawiki
[14:41:32] which isn't a great 1:1 mapping, but it kinda makes sense in terms of comparative impact when it comes to deploying stuff in deployment-charts
[14:41:43] but not ideal
[14:41:59] huh, why wasn't bvibber in it then 😅
[14:43:46] O_o she should be
[14:43:50] hnowlan: the thing is, the ability to deploy from deployment-charts is much broader than the +2 permission is ... which seems wrong, and I think creates some bad incentives for service owners
[14:45:23] yeah that's true, it'd be worth revisiting the assumptions
[14:45:56] which were made around when we had a services team afaict and now our expectations of developers are much broader in terms of deployment agility
[14:46:02] I'll make a task for this
[14:46:30] thanks <3
[15:10:38] If there is an alert or bad disk ticket for backup2012, that's me
[15:10:46] We were doing some hw testing
[17:42:20] question time: to have a specific version of a package installed on a specific class (profile) of hosts, do we already have something like a hiera configuration that if present is applied on every host, or is it better to define it in the base class for that profile?
[19:28:42] very much depends on the context, but have a look at the use of profile::airflow::airflow_version in profile::airflow and class airflow, it might be what you are looking for
[19:29:48] if the differentiation is about two generations of packages (like exampled 2.x vs exampled 3.x), one option is also to install them from different repository components, then you don't need to fiddle with specific package versions
[19:30:28] i.e. no need to pin specifically to e.g. 2.3-2, since this might change to 2.3-2+deb12u1 if there's an update in Debian
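
(annotation, not part of the log) The last point can be illustrated with a small Python sketch. It is only a toy model: it compares version strings textually and does not implement Debian's real version ordering or anything Puppet or APT actually run. The function names are hypothetical; the version strings 2.3-2 and 2.3-2+deb12u1 come from the message above, and 3.0-1 is made up for contrast. It shows why an exact pin stops matching once a Debian update appends a suffix, while a looser glob-style pattern would keep matching.

```python
from fnmatch import fnmatchcase


def exact_pin_matches(pin: str, candidate: str) -> bool:
    """Hypothetical helper: an exact pin only accepts the identical version string."""
    return pin == candidate


def glob_pin_matches(pattern: str, candidate: str) -> bool:
    """Hypothetical helper: a glob-style pin accepts any version matching the pattern."""
    return fnmatchcase(candidate, pattern)


if __name__ == "__main__":
    # 2.3-2 and 2.3-2+deb12u1 are taken from the chat; 3.0-1 is an invented example.
    for version in ["2.3-2", "2.3-2+deb12u1", "3.0-1"]:
        print(
            version,
            "exact:", exact_pin_matches("2.3-2", version),
            "glob:", glob_pin_matches("2.3-2*", version),
        )
```

Splitting the two generations into separate repository components, as suggested in the log, avoids the question entirely, because no version string has to be written down at all.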