[09:29:36] jbond: dcaro: about puppet compiler FAIL_FAST ( https://gerrit.wikimedia.org/r/c/integration/config/+/743365 ) I don't quite know how pcc is released or updated, maybe it is just about updating the Gemfile version in puppet.git then updating the job
[09:29:59] anyway, I am around to update the CI jobs whenever needed
[09:38:38] hashar: yep, we have the release pending, will try to tackle it today
[09:38:57] thanks for the review :)
[09:44:37] _joe_: I'm moving forward with Traffic envoy tests, so I'll be merging the pending CRs (already reviewed) https://gerrit.wikimedia.org/r/q/topic:%22T271421%22+(status:open)
[09:45:26] <_joe_> vgutierrez: sure, want me to take another look?
[09:46:04] just pinging as a heads up if you see those envoy CRs moving around :)
[09:47:09] <_joe_> ack thanks
[10:07:30] hashar: also the changes can be deployed independently of each other. if jenkins starts sending the new environment variable early it will simply be ignored
[10:08:00] and welcome back :)
[10:08:04] jbond: thx!!
[10:08:08] the change states "This depends on pcc being released though."
[10:08:21] then I guess it is just for the new feature to be actually effective
[10:08:26] I am going to deploy the jobs
[10:08:36] ahh ok, perhaps too strict phrasing. the new feature can't be used until pcc is released
[10:08:45] either way not a problem, will do that in the next hour
[10:08:54] yes thx
[10:09:31] done!
[10:09:50] I take stuff too literally sometimes :D
[10:13:35] great thanks
[10:29:48] say for a file in puppet.git I want its gitiles display link, how do you do it?
other than macheting my way through gerrit -> gitiles
[10:31:14] godog: just append your relative path to https://gerrit.wikimedia.org/r/plugins/gitiles/operations/puppet/+/refs/heads/production
[10:31:45] "just", but ok yeah that works
[10:31:48] thanks volans
[10:31:55] I usually type gitiles in my browser and pick the last repo visited
[10:32:00] and replace accordingly
[10:40:14] jbond: I'm going to start deploying pcc on the compiler hosts, my idea is to move the current dir and let puppet re-clone+install; another option is manually fetching the latest master and manually doing the setup, any preference?
[10:41:06] hashar: is there any way for me to add/remove jenkins workers from the pool? (might help deploying the new pcc, as I can take out a compiler host, upgrade, bring it back in, make sure it works and continue)
[10:41:26] otherwise some runs might fail :P
[10:41:54] dcaro: yeah we can unpool the agents via the web ui
[10:43:08] <_joe_> you need admin rights on jenkins, not sure if that was restored for global roots or not
[10:43:12] dcaro: ack i was just preparing the CR for the release
[10:43:44] dcaro: one can add themselves to the LDAP group `ciadmin` as documented on https://wikitech.wikimedia.org/wiki/Jenkins#Administration that brings you full admin rights on https://integration.wikimedia.org/ci/
[10:43:46] I think I don't have rights :/ (don't see any option to unpool)
[10:44:09] <_joe_> hashar: so we can self-add?
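The URL recipe volans gives above can be sketched in a few lines of Python. This is just a convenience wrapper around the quoted base URL, not an existing tool, and the path passed at the bottom is an illustrative example:

```python
# Build a gitiles display link for a file in operations/puppet by appending
# the repo-relative path to the refs/heads/production base URL quoted above.
GITILES_BASE = (
    "https://gerrit.wikimedia.org/r/plugins/gitiles/"
    "operations/puppet/+/refs/heads/production"
)

def gitiles_link(relative_path: str) -> str:
    """Return the gitiles URL for a path relative to the repo root."""
    # Strip a leading slash so callers can pass either form of the path.
    return f"{GITILES_BASE}/{relative_path.lstrip('/')}"

print(gitiles_link("hieradata/role/common/acme_chief.yaml"))
```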
[10:44:22] <_joe_> I thought we needed some sort of approval
[10:44:32] https://integration.wikimedia.org/ci/computer/ has the list of agents and one can be disabled via eg https://integration.wikimedia.org/ci/computer/compiler1001.puppet-diffs.eqiad.wmflabs/ and clicking `[Mark this node temporarily offline]`
[10:44:38] <_joe_> I mean I know I can technically add myself
[10:44:47] I have no idea really
[10:45:04] I think the idea was to have only knowledgeable people be able to admin jenkins
[10:45:09] cause it is too easy to break it entirely
[10:45:29] with SRE being able to add themselves to the `ciadmin` group when they actually need it
[10:45:37] like unpooling an agent
[10:46:17] I am personally happy to have people do stuff directly as long as the CI Jenkins doesn't end up broken (which is unsurprisingly easy to do)
[10:46:19] <_joe_> dcaro: ^^ if you don't know how to add yourself to an ldap group
[10:46:51] all the authorization limitations to the CI systems are really legacy
[10:47:01] I don't think we ever revisited them besides the addition of that ciadmin group
[10:47:25] <_joe_> dcaro: https://wikitech.wikimedia.org/wiki/SRE/Clinic_Duty/Access_requests#LDAP_access has all the info you need
[10:48:15] * dcaro looking
[10:48:44] jbond: if you want to bless yourself with some Jenkins privileges, it is all about adding your account to the `ciadmins` LDAP group
[10:49:01] <_joe_> the tldr is going to mwmaint1002 and `modify-ldap-group ciadmin`
[10:49:34] hashar: ack thanks
[10:55:19] created T297364
[10:55:20] T297364: Grant Access to ciadmins for dcaro - https://phabricator.wikimedia.org/T297364
[10:59:55] so, that means I'm allowed to add myself? (instead of following the approval process?)
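Besides the web UI button hashar points at, Jenkins also exposes the same action over HTTP via the standard `toggleOffline` endpoint on a computer page. A hedged sketch that only builds the request (sending it requires the admin rights discussed above; the user name, token, and offline message are placeholders):

```python
import base64
import urllib.parse
import urllib.request

JENKINS = "https://integration.wikimedia.org/ci"

def toggle_offline_request(agent: str, message: str,
                           user: str, token: str) -> urllib.request.Request:
    """Build (but do not send) the POST that flips an agent's
    'temporarily offline' flag, mirroring the UI button."""
    url = f"{JENKINS}/computer/{urllib.parse.quote(agent)}/toggleOffline"
    data = urllib.parse.urlencode({"offlineMessage": message}).encode()
    req = urllib.request.Request(url, data=data, method="POST")
    # Jenkins accepts HTTP basic auth with a per-user API token.
    cred = base64.b64encode(f"{user}:{token}".encode()).decode()
    req.add_header("Authorization", f"Basic {cred}")
    return req

req = toggle_offline_request(
    "compiler1001.puppet-diffs.eqiad.wmflabs", "upgrading pcc", "me", "secret"
)
# urllib.request.urlopen(req) would perform the actual toggle.
```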
[11:07:20] <_joe_> dcaro: as long as you remove yourself later, that's what hashar stated :)
[11:07:33] <_joe_> probably you need that on the task as well though
[11:11:41] oh a task, that is great for historical record :)
[11:13:14] dcaro: please indeed add yourself
[11:13:20] ack
[11:13:26] there is not much formal process really
[11:13:36] I guess some kind of trust that you are not going to explode everything
[11:13:54] and a task as you filed is always good when later we wonder why someone is in that specific group
[11:14:26] I've worked with jenkins closely for many years, I can't ensure that will not happen xd
[11:14:32] amazing
[11:14:45] you know your motto: "the last person that touches it owns it"
[11:14:47] congratulations
[11:14:52] hahahaha
[11:14:56] ;D
[11:18:41] in any case if there's something I can help around with, feel free to ping me :)
[11:45:30] hashar: my last place called this "Sysadmin jenga"; I even brought :jenga: along to Slack :)
[11:51:22] godog: just fyi, I had to manually rerun puppet to make the 'assemble blackbox' exec work, not sure what the problem was, but rerunning worked (T297372)
[11:51:22] T297372: Cloudmetrics hosts failing to run puppet - https://phabricator.wikimedia.org/T297372
[11:51:49] there might be some resource ordering issue
[12:51:00] dcaro_lunch: yeah I fixed a bullseye/buster problem in config, perhaps that was it
[13:02:56] 👍
[14:20:53] jbond: which cfssl intermediary should I use if I want to add tls support for the rabbitmq servers our openstack uses?
[14:23:40] majavah: tl;dr if you just need a tls endpoint then use discovery.
if doing mutual auth then create a new one: https://wikitech.wikimedia.org/wiki/PKI/Policy#When_to_create_a_new_intermediate_CA
[14:24:47] ok, thanks, discovery it is then
[15:31:22] <_joe_> majavah: wait
[15:31:38] <_joe_> an openstack service should use a separate pki
[15:31:57] <_joe_> not the one used for internal production services
[15:41:19] _joe_: if there is no client auth going on then there is no reason imo to use a separate CA. the fact that the default intermediate is called discovery is unfortunate, but I think it's fine to have one trusted intermediate for all TLS endpoints
[15:41:40] (not doing mutual auth)
[15:42:02] <_joe_> jbond: I would actually like to have separation of production/cloud domains
[15:45:51] _joe_: in which case we get to the point of where the demarcation between cloud and production is. currently cloud VMs can't use the production pki system (there is a cloud one for dev). however these machines are on the production network and use the production puppet, so from that PoV it becomes a bit trickier
[15:46:31] that said it probably is a good idea to spin up a new CA for cloud bare metal services so we don't make the issue worse
[15:46:36] ok sold :)
[15:46:44] <_joe_> yeah that was my point
[15:46:46] <_joe_> :)
[15:47:42] majavah: ^^^ feel free to go ahead with the discovery CA for now, but i will create a new CA for cloud services shortly and have you switch to it
[17:02:19] hello folks, I am going to reimage kafka-main2003 to buster
[17:02:33] if you see any trouble lemme know
[22:19:08] anyone around to +1 this change, https://gerrit.wikimedia.org/r/745612
[22:19:17] should be harmless
[22:22:52] jhathaway: I did, and nitpicked about the ticket link because I was curious if it's codfw or upgrade :)
[22:22:56] also, cool either way
[22:23:17] sodium not being a spof anymore is great
[22:24:00] mutante: thanks for jumping on and reviewing, I'll address both comments!
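The rule of thumb jbond and _joe_ converge on above can be written down as a tiny decision helper. This is only a paraphrase of the conversation, not actual tooling, and the return strings are labels for the intermediates mentioned in the log:

```python
def pick_intermediate(mutual_auth: bool, cloud_bare_metal: bool = False) -> str:
    """Pick a cfssl intermediate per the rule of thumb discussed above."""
    if mutual_auth:
        # Client-certificate (mutual) auth warrants its own intermediate CA,
        # per the PKI policy page linked above.
        return "new dedicated intermediate"
    if cloud_bare_metal:
        # Agreed outcome of the discussion: cloud bare-metal services get
        # their own CA to keep production and cloud domains separated.
        return "cloud services CA (to be created)"
    # Plain TLS endpoint, no client auth: the default intermediate is fine.
    return "discovery"
```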
[22:24:09] are they both going to pull from upstream or is there some kind of syncing between them planned ?
[22:24:17] cool :)
[22:25:23] The current plan is just to swap one for the other, so we won't be adding redundancy, but we will be removing the older sodium box
[22:26:11] jhathaway: got it, just upgrade for now. yes, also cool!
[22:26:22] jhathaway: you are going to need another thing after this:
[22:27:18] check in puppet repo in hieradata/role/common/acme_chief.yaml . that is where we map authorized hosts to domain names / certificates they are allowed to request from Letsencrypt
[22:27:33] you will find sodium.wikimedia.org in there as a host that is authorized for mirrors.wikimedia.org
[22:27:37] ok
[22:27:41] you'll want to add the new host too
[22:27:44] thanks
[22:27:47] np
[22:34:06] also yeah, the part that you picked a chemical element name means it couldn't be codfw. codfw is star names. copernicium threw me off for a second with the astronomy relation
[22:34:31] copernicum / copernicium is like aluminum / aluminium
[22:35:16] alternatively you can say you want to get rid of the last misc names like that and start a new tradition of mirror1001
[22:35:36] ^ this one would be my favorite please
[22:35:50] computers that are named after what they do are the best computers
[22:36:01] definitely would be my preference, but I am still finding my way around the china shop
[22:37:10] jhathaway: go to https://wikitech.wikimedia.org/wiki/SRE/Infrastructure_naming_conventions#Servers , click edit and just create it :)
[22:37:26] when you add it there it becomes official, heh
[22:38:09] very true, unless a spam bot rejects my edit ;)
[22:38:09] a new 'mirrors' line in that table seems just fine to me
[22:38:35] it usually does that when google inserts a google.com into the URL.
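The acme_chief.yaml step mutante describes (map a certificate to the hosts allowed to request it, then add the replacement host) can be sketched as a small in-memory model. The dict layout is illustrative, not the file's real schema, and `mirror1001.wikimedia.org` is the hypothetical new name from the naming discussion:

```python
# Illustrative certificate -> authorized-hosts mapping, in the spirit of
# hieradata/role/common/acme_chief.yaml (schema assumed, not the real one).
AUTHORIZED_HOSTS: dict[str, set[str]] = {
    "mirrors.wikimedia.org": {"sodium.wikimedia.org"},
}

def authorize(cert: str, host: str) -> None:
    """Allow a host to request a given certificate from Let's Encrypt."""
    AUTHORIZED_HOSTS.setdefault(cert, set()).add(host)

def is_authorized(cert: str, host: str) -> bool:
    return host in AUTHORIZED_HOSTS.get(cert, set())

# The step described above: authorize the replacement host as well,
# while the old sodium entry stays until the box is decommissioned.
authorize("mirrors.wikimedia.org", "mirror1001.wikimedia.org")
```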
that's what happened to me the other day, heh
[22:38:43] from copy/paste from docs.google
[22:39:59] unfortunately you have to recreate the VM one more time for that
[22:40:29] well that would be a worthwhile exercise for a newbie like me anyways
[22:40:33] you can use the "decom" cookbook on the existing one, then create a fresh one
[22:40:39] I was thinking that too
[22:41:04] I'll chat with Moritz tomorrow, and probably rename to mirror or perhaps debmirror
[22:41:35] also a brand new thing that Moritz did:
[22:41:42] we have owner annotations for puppet roles
[22:41:45] like https://gerrit.wikimedia.org/r/c/operations/puppet/+/738426/3/hieradata/role/common/dragonfly/supernode.yaml
[22:42:30] so when you add a new "cluster name" that normally goes with a role which goes with a team now
[22:42:44] sounds good, *nod*
[22:43:18] sounds nice
[22:45:47] https://wikitech.wikimedia.org/wiki/Server_Lifecycle#Remove_from_production
[22:46:03] see the "Run the sre.hosts.decommission cookbook, " part there
[22:46:18] that's what you could try for the old VM if you decide to not use that name
[22:46:23] laters
[22:48:35] perfect