[14:12:21] o/ q about deployer groups in data.yaml
[14:13:01] i need to add analytics-platform-eng-admins to profile::admin::groups on deployment servers
[14:13:19] so that gmodena and others will be allowed to access the deploy_airflow ssh key there
[14:13:31] i see that there are several *-deployers groups
[14:13:51] which are usually just includes of other groups
[14:13:59] e.g. gerrit-deployers is all of gerrit_root_admins
[14:14:04] analytics-deployers is analytics admins
[14:14:31] but there are also some -admins groups listed directly in profile::admin::groups too
[14:14:35] so
[14:14:36] should I:
[14:15:37] A. just add analytics-platform-eng-admins to profile::admin::groups
[14:15:37] or
[14:15:38] B. create a new platform-eng-deployers group, include all of analytics-platform-eng-admins in that, and then add platform-eng-deployers to profile::admin::groups
[14:17:17] _joe_ or mutante maybe have opinions (based on git blame?)
[14:17:45] i'm going to opt for just using -admins, will make a patch
[14:18:36] oh and also research admins
[14:19:06] (actually right now i need analytics-research-admins, but same q)
[14:19:27] ottomata: the airflow groups have some privs (like unrestricted `journalctl` access) that might not be wanted on deploy* hosts, so a separate group would be better in that regard
[14:19:48] ahhhHH!!!
[14:19:49] that makes sense
[14:19:49] since atm you can't have different privs on different hosts for the same group
[14:19:50] thank you
[14:20:59] hmmm, since i'm using the same deploy ssh key for all of these, maybe i can just use the same deployer group for deploy access?
[14:21:07] the server access will still be controlled by the admin group
[14:21:20] yes
[14:22:04] although if you use a shared deployment key, then anyone in the shared group can use the key to log in to any of the servers as the deployment user, even if they couldn't normally access it
[14:22:16] hmmmm
[14:22:47] i mean i don't expect that to be a problem, but also i just realized the existing deployment group we have is also allowed to access another key for a different deployment
[14:22:48] sooo
[14:22:52] i'll just make a new deployer group
[14:46:20] hmmm might have a problem
[14:46:24] https://puppet-compiler.wmflabs.org/pcc-worker1003/34812/deploy1002.eqiad.wmnet/change.deploy1002.eqiad.wmnet.err
[14:46:44] a user is now included in multiple groups that are in profile::admin::groups on this node
[14:47:03] is it not possible to have the same user in multiple profile::admin::groups?
[14:48:20] it should be possible, for example I'm in both wmcs-roots and deployment, which both have access to labweb hosts
[14:53:48] yeah it does look like it should be.
[14:53:48] hm
[15:07:43] ottomata: maybe it breaks because you have the same person twice in the same group?
[15:08:08] i was wondering that, i don't think i do, but because the catalog doesn't compile i'm having trouble debugging
[15:08:33] but, looking at the unique_users puppet function
[15:08:39] even if that were true, I think it would not cause this
[15:08:44] users.flatten(2).uniq
[15:09:33] okay, i will revisit this later
[15:09:40] the group i need right now is the research-deployers one
[15:09:43] which doesn't have this conflict
[15:11:35] won't you need i/f meeting approval anyways?
[15:17:24] i/f ?
[15:17:41] i don't think so, this was already supposed to be done, just hadn't done it fully.
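For context, roughly what the new-deployer-group shape discussed above could look like in ops/puppet; the gid, member list, file paths, and exact data.yaml keys are illustrative guesses rather than the actual patch:

```yaml
# modules/admin/data/data.yaml -- schema approximated from existing *-deployers groups
groups:
  research-deployers:
    gid: 799                        # placeholder; use the next free gid
    description: deploy_airflow deployment access
    members: [gmodena]              # mirror the relevant *-admins membership
    # intentionally no extra privileges here, so the admins-group privs
    # (e.g. unrestricted journalctl) are not granted on deploy* hosts

# hieradata for the deployment server role (exact file is a guess)
profile::admin::groups:
  - deployment                      # existing entries elided
  - research-deployers
```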
[15:23:32] does anyone have a javascript hack or something for grafana's tiny, disappearing scrollbars? I tried the workaround listed here ( https://github.com/grafana/grafana/issues/17725#issuecomment-754751137 ) but it didn't seem to work on Mac Firefox 99
[15:33:55] taavi: i must be missing a crucial step
[15:34:23] gmodena@deploy1002:~$ groups
[15:34:23] wikidev deployment research-deployers
[15:34:48] sudo cat /etc/keyholder-auth.d/deploy_airflow.yml
[15:34:48] ...
[15:34:48] research-deployers: [deploy_airflow]
[15:35:41] SSH_AUTH_SOCK=/run/keyholder/proxy.sock ssh -v -l analytics-research an-airflow1002.eqiad.wmnet
[15:35:47] (analytics-research is the system user)
[15:36:06] debug1: Offering public key: /etc/keyholder.d/deploy_airflow ED25519 SHA256:uGq4ly8yj7SQ48qm8NVgdRS1nr5sablcn7t4G3SiQoA agent
[15:36:06] debug1: Server accepts key: /etc/keyholder.d/deploy_airflow ED25519 SHA256:uGq4ly8yj7SQ48qm8NVgdRS1nr5sablcn7t4G3SiQoA agent
[15:36:06] sign_and_send_pubkey: signing failed: agent refused operation
[15:36:33] ottomata: try to restart keyholder-proxy.service
[15:36:35] do I need to do something for a new group to be able to access the ssh agent?
[15:36:37] oh
[15:36:56] AHHH that did it!
[15:36:56] not sure if that will fix it, but in my experience it helps in most cases
[15:37:20] thank you!
[15:37:23] probably the puppetization should notify keyholder-proxy.service to automate that
[15:41:39] volans: do you think that has to be done when the users in the group change, or just when the mapping in keyholder-auth.d changes?
[15:41:53] not sure
[15:42:21] well, i'll probably have to do this again in a few weeks, i'll make a patch to do the latter and try it out
[15:44:05] i'd expect it to be the latter, i doubt keyholder proxy maintains a list of users
[15:44:14] probably just checks that the user is in the group
[18:51:55] have we discussed in the past formatting all our python files in our puppet repo with black?
[18:53:50] yeah, broadly it's https://phabricator.wikimedia.org/T211750
[18:55:26] afaik the last decision we made was to try it in a couple of smaller repos before adding it to the puppet repo -- I don't think we ever decided whether or not to go on from there
[18:56:08] rzl: thanks! My phabricator search-fu failed me
[18:56:27] j.bond did mail out a "format everything in ops/puppet" patch but it was about as tricky to review and merge as you'd expect
[18:57:38] yeah, reformatting is a pain as far as git history is concerned, but hopefully if black has matured enough it is a one-time pain
[18:59:04] I tried black on https://github.com/wikimedia/operations-software-knead-wikidough and found it to be pretty helpful. I didn't end up merging the changes though, not for any particular reason
[18:59:33] the fact that it produces a valid AST (as it should) is of course another big plus
[19:00:52] I think doing it for Puppet is going to be a bigger ask so maybe we can try on a smaller project (happy to be the guinea pig)
[19:01:27] we did with at least spicerack and maybe others? more and more guinea pigs seems like the right approach though
[19:02:06] oh, interesting. I didn't know about doing it with spicerack
[19:02:43] I stumbled upon the project from HN one day and found it interesting and thought maybe we should use this, but didn't think more :P
[19:03:16] one day we might even get formatting for puppet code, 🙏 https://github.com/puppetlabs/puppet-editor-services/issues/319
[19:04:48] lol @ that issue. short and succinct
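A minimal sketch of one way to trial black on one of the smaller repos, assuming a pre-commit setup (the pinned rev is arbitrary, and this is not necessarily how spicerack or the other guinea-pig repos actually wire it in):

```yaml
# .pre-commit-config.yaml -- hypothetical example, not taken from any WMF repo
repos:
  - repo: https://github.com/psf/black
    rev: 22.3.0            # pin to whatever version CI uses
    hooks:
      - id: black
        language_version: python3
```

Running `black --check .` in CI gives the same signal without rewriting any files until everyone agrees to adopt the formatting.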
[19:18:19] I'm noticing that nearly half the Memc errors seen by MW are from kube-mwdebug, https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts / https://logstash.wikimedia.org/app/dashboards#/view/memcached
[19:20:22] I'm not seeing an obvious dash for kube-mwdebug traffic, but the old one I had from a while back suggests there isn't some benchmark or other traffic spike on it, so a bit weird. https://grafana-rw.wikimedia.org/d/8eIKRvInk/krinkle-k8s-mwdebug
[21:48:32] jhathaway: I've replied directly in the task
[21:48:46] sukhe: yep, see https://doc.wikimedia.org/spicerack/master/development.html#code-style for more details on how it's integrated
[21:53:35] volans: thanks for the detailed reply
[21:59:56] anytime :)
[22:31:51] Hi all, I'm in the process of reimaging clouddb1021; unlike similar hosts it is an HP device. In the serial console it has prompted me with an unexpected message: `Some of your hardware needs non-free firmware files to operate. The firmware can be loaded from removable media, such as a USB stick or floppy.` It mentions `The missing firmware files are: bnx2x/bnx2x-e2-7.13.21.0.fw bnx2x/bnx2x-e2-7.13.21.0.fw`
[22:31:51] Is anybody familiar with this?
[22:31:59] Full paste is at https://phabricator.wikimedia.org/P24619
[22:36:38] Since this is a new Debian version (the reimage is meant to go from 10->11) I wonder if there's something in our debian 10 apt repository that's not in debian 11
[22:43:54] razzi: those versions aren't available in firmware-bnx2x, https://packages.debian.org/bullseye/firmware-bnx2x
[22:44:31] this bug is worrisome as well, https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1006500
[22:52:07] jhathaway: that's good context... unfortunately I'm not sure what I can do with the server
[22:52:07] Fortunately this host is only used at the start of the month to query data for the last month, so we have 2 weeks to figure this out
[22:52:20] what is the kernel version?
[22:54:45] I'm not sure how to get that without the host being up, jhathaway
[22:54:45] I'm sure it's in some puppet database
[22:56:34] Hm... though since it's mid-reimage it isn't showing on https://puppetboard.wikimedia.org/nodes
[23:02:51] From the debian installer menu I got the ash shell and ran this:
[23:02:51] ~ # uname -a
[23:02:51] Linux (none) 5.10.0-13-amd64 #1 SMP Debian 5.10.106-1 (2022-03-17) x86_64 GNU/Linux
[23:03:07] That's the new linux version that has this issue
[23:05:01] hmm, yeah that doesn't make sense, since the bug was introduced in 5.17, according to my splunking
[23:05:12] so 5.10 should be fine, unless it was backported...