[08:35:03] XioNoX, topranks: FYI; we're currently changing the signers for pwstore and I've added Riccardo and myself to the @netops group in pwstore (so that we're able to get the passphrase for homer after cumin reboots and to be able to re-encrypt the secrets if the users for @netops change)
[08:35:33] great, thanks!
[08:36:08] this however requires one of you (as a one-time change) to re-encrypt the following files (so that our new keys are added):
[08:36:10] ARIN CLOUDINFRA_README gtt homer-key-passphrase management-scs network-root network-ztproot snmp-community Wikimedia-ARIN-RPKI.pem
[08:36:23] so
[08:36:27] pws rc $filename
[08:36:51] some of those might also be related to the @dcops group, though
[08:40:59] alright, on it
[08:43:29] moritzm: even after updating the keyring I get "Warning: No key found for keyid 59E8F3AF321F239B"
[08:49:30] ah, sorry. you need to run "pws update-keyring" as well
[08:49:48] this regenerates the key db after the change to the .users file
[08:54:31] running it a 3rd time solved that one too :)
[08:54:56] I don't have access to CLOUDINFRA_README
[08:56:07] moritzm: not an issue but I have to run it twice to not have the error
[08:56:23] https://www.irccloud.com/pastebin/3CMaW70z/
[08:57:23] not sure what CLOUDINFRA_README actually contains, it was added by Andrew in 2019, he might simply have forgotten to set the correct access: header line, I'll ping him, it can probably just be removed
[08:57:31] snmp-community also I don't have access
[08:57:38] not like they're very secret :)
[08:59:02] moritzm: alright, done, let me know how it is
[09:01:02] thanks! I think the issue with gtt was that it was previously signed with an older key of Faidon, which got replaced by a newer key (but that got resolved with the re-encrypt)
[09:01:12] thanks!
I can confirm I can access them all
[09:03:49] Seems I missed the party
[09:04:30] Fwiw I could never open the snmp community one, I’ve been grabbing it from the plaintext in the router config if I ever needed it ;)
[09:07:42] snmp-community is most certainly owned by the @dcops group
[09:08:16] it currently only has Papaul in there which isn't ideal either and if you both need it, we should just move it to be owned by netops as well?
[09:08:57] not strictly needed but better to keep things clean
[09:10:40] could you please raise this in the next dcops-IF syncup meeting with Papaul? I can help with making the actual ACL change once confirmed
[09:11:11] moritzm: confirmed what?
[09:15:24] to just move the snmp-community secret to @netops (which Papaul is also part of as far as pwstore is concerned)
[09:16:42] excuse my ignorance but can both groups have access? Dc-ops probably need it for some things? PDU setup possibly?
[09:17:27] I can confirm too I can access those re-encrypted files
[09:17:30] sry - you said only papaul is in there - so yeah moving to netops makes sense
[09:18:49] yeah, I guess that makes it all simpler
[09:38:56] moritzm: I confirm :)
[09:39:03] no need for a sync up meeting
[09:40:50] ok :-)
[09:42:07] I'll send a mail to Papaul, you, Cathal and Riccardo with the steps (Papaul needs to re-sign it given he's currently the only one with access)
[09:42:27] thx!
[09:48:27] done, sent a mail
[09:55:56] thx
[11:19:20] hi all, i just wanted to say i received the gifts from you all. Have not managed to play the game yet but definitely will this xmas, looks right up our street. and the book looks great, can't wait to start working our way through. thank you very much <3
[11:20:27] <3333
[11:21:39] <3 enjoy!
[11:22:11] jbond: glad to hear, have a nice break (as you didn't get one :-P ) and enjoy!
[11:22:53] lol yes last day tomorrow then a nice break :)
[11:27:23] enjoy and have a nice xmas break :-)
[11:30:26] thanks you too :)
[11:31:09] moritzm: we have a small problem with cumin1002... it doesn't have homer's ssh key because not yet homer-configured
[11:31:24] that means that all cookbooks using ssh to connect to network devices aren't working
[11:31:53] that means at least the provision/decommission/configure-switch-interfaces ones
[11:32:11] then let's simply move forward with the move of the repo from cumin1001 to cumin1002? that would fix it
[11:32:27] yeah I think it's the easiest path, I can probably do it right now
[11:32:34] cc topranks as you wanted to be involved
[11:32:50] yeah, probably the cleanest and simplest way to fix
[11:33:50] either that or temporarily "disable" cumin1002, but it was already communicated
[11:33:51] sry yeah I was supposed to take a look at that
[11:34:00] let's take a look now
[11:34:35] XioNoX: what are your thoughts?
[11:34:58] yep, let's roll forward, and test it well before the break :)
[11:35:42] breaking for the break :D
[11:36:06] :)
[11:36:07] volans: what's the process exactly?
[11:36:13] ok I'm disabling puppet, scp'ing the repo, changing the remote
[11:36:18] I see there is "remote_peer" config in .git
[11:36:20] can someone prepare the puppet patch to re-enable homer?
[11:36:58] right so just scp from cumin1001 to cumin1002 and then change the remote on cumin2002?
[11:37:21] !log disabled puppet on cumin hosts for homer's private repo move
[11:37:21] volans: Not expecting to hear !log here
[11:37:29] yeah I know stashbot was a private logging :D
[11:37:43] hahaha
[11:41:34] volans: I'm not 100% sure what the puppet patch needs to re-enable homer?
[11:41:53] check profile::homer::disable
[11:41:58] ah
[11:44:52] copied repo, adjusted .git/config, adjusted permissions (I had to scp as rsync is not installed), updating /srv/homer/private/README to use it as a test commit
[11:46:15] https://gerrit.wikimedia.org/r/c/operations/cookbooks/+/984642/
[11:46:43] sorry that's the wrong link
[11:46:46] +1ed
[11:46:54] https://gerrit.wikimedia.org/r/c/operations/puppet/+/984820
[11:46:57] ok
[11:48:08] feel free to merge, puppet is disabled
[11:48:35] ok will do
[11:48:48] should we add "profile::homer::disable: true" for cumin1001 now as well?
[11:50:14] probably let's leave it for later
[11:50:56] ok
[11:51:37] ok tested commits on both cumin1002 and cumin2002, they show up in the other peer
[11:51:49] topranks: let me know once that patch is puppet-merged
[11:52:17] I'm documenting what I did on the README in the private repo (and used the update to commit)
[11:53:17] *used that update as test commit
[11:53:49] volans: merged now
[11:54:06] great, running on cumin2002 first
[11:54:10] ok
[11:54:20] it should be a noop
[11:54:32] wait a sec
[11:54:37] did we change cumin2002's hiera?
[11:55:01] topranks: hieradata/hosts/cumin2002.yaml needs to be updated too
[11:55:16] profile::homer::private_git_peer: cumin1001.eqiad.wmnet
[11:55:23] sorry I missed that
[11:56:19] no my bad should have thought of it
[11:56:23] https://gerrit.wikimedia.org/r/c/operations/puppet/+/984821
[11:56:40] +1ed thx
[11:56:51] cheers
[11:57:09] merging now
[11:57:13] * volans waiting for jenkins
[11:57:18] thx
[11:58:44] volans: ok merged now if you want to try again
[11:58:48] ack thx
[11:59:14] topranks, XioNoX: you both used cumin1001 for cookbooks, please stop and start using cumin1002/cumin2002
[11:59:28] will try :)
[12:00:24] Homer working on cumin1002 now :)
[12:00:26] I removed it from .ssh/known_hosts.d/wmf-prod to get a nudge in case I forget
[12:01:03] topranks: I didn't yet run puppet nor armed keyholder...
[12:01:50] keyholder was already armed when I checked
[12:01:59] there was no key
[12:02:10] are you sure you weren't on 1001?
[12:02:37] still not there
[12:02:40] yeah 100%
[12:02:46] no key in /etc/keyholder.d
[12:02:55] it's being added as we speak
[12:03:06] sorry..... I was on 2002
[12:04:55] running homer 'cr2-eqiad.wikimedia.org' diff
[12:07:48] still waiting, running another one against a switch with verbose
[12:08:04] we're doing WAYYYYY too many api calls to netbox
[12:08:19] https://phabricator.wikimedia.org/T271864 :)
[12:08:33] I know, I opened it :D
[12:08:50] INFO:homer:Homer run completed successfully on 1 devices: ['asw2-a-eqiad.mgmt.eqiad.wmnet']
[12:08:53] yay
[12:08:56] waiting for the router
[12:09:05] the more we get rid of row wide VCs, the less api calls we're going to do
[12:09:34] 356 API calls is not that bad as we have to query all servers connected to those switches
[12:09:55] we should probably look at a "regular" switch, and a "regular" router to have a better idea
[12:10:44] but could probably be done with GQL in a better way
[12:11:00] yep
[12:11:11] but then it's quite a change
[12:11:38] sorry I'm stupid, it was on cumin2002 that it was successful, I was trying both old and new
[12:12:18] GQL definitely speeds things up :) https://gerrit.wikimedia.org/r/c/operations/software/homer/+/928795/
[12:12:42] https://phabricator.wikimedia.org/T341968 is relevant too
[12:13:03] ahem...
[12:13:09] if I don't deploy homer it will be hard to run it
[12:13:47] :facepalm:
[12:16:26] volans: hahaha
[12:16:49] I'm very glad you did that, I was feeling extremely stupid myself after doing so
[12:17:53] I was trying to work out what was wrong but that makes sense
[12:18:09] "/srv/deployment/homer/venv/bin/homer" wasn't there, the homer command was just hanging on me
[12:18:27] now running homer for real
[12:18:27] :D
[12:19:10] it works \o/
[12:19:29] INFO:homer:Homer run completed successfully on 1 devices: ['asw2-a-eqiad.mgmt.eqiad.wmnet']
[12:19:32] for real now
[12:19:36] re-trying my cookbook patch
[12:19:53] \o/
[12:20:50] XioNoX: you have a terminal open on asw2-a1-eqiad
[12:21:46] closed, not sure it's a blocker though
[12:22:01] it's scary :D
[12:23:06] how can we make sure people don't try to use cookbooks on cumin1001 anymore?
[12:23:22] at least the network related ones
[12:23:30] https://phabricator.wikimedia.org/P54506
[12:25:05] bigger MOTD like deploy1002? :D I'm checking SAL/cumin logs every couple of days
[12:25:34] volans: can we prevent the unpriv cookbooks from running there?
[12:26:32] yes but that prevents only some part of dcops, not sure how that helps the bigger problem
[12:28:26] they're the team that uses the network related cookbooks the most
[12:30:45] or simply send a followup to the mail I sent some days ago about cumin1002, telling folks that homer is now also moved and that means cookbooks really won't work?
[12:30:54] (if they use homer under the hood)
[13:25:57] the cookbooks don't use homer under the hood
[13:26:35] they do rely on the Homer ssh key being in keyholder though, so if we de-arm that it would have a similar result (cookbooks won't succeed)
[13:26:56] ack, ok
[15:45:55] FYI I just forced a run of check-homer-diff.service on cumin1002 to check if all works fine instead of discovering it tomorrow morning
[16:50:16] XioNoX, topranks: the check-homer-diff.service run was ok but I didn't get any email
[16:50:34] logs are not very helpful, at least journalctl -u check-homer-diff.service
[16:50:58] I'm about to log off, could you maybe have a look tomorrow if any one of you is working?
[16:54:05] I'm off tomorrow too but I'll have a little look now, see if I can work it out
[16:54:16] If we're not speaking have a great break and happy new year!
[16:59:26] seems if I execute the "mail" command similar to the diff script the mail is accepted by mx1001
[17:00:07] https://phabricator.wikimedia.org/P54508
[17:02:02] volans: seems I got my test mail, but I also got a rancid-diff mail at 16:14
[17:02:12] which originated on cumin1002, so seems like it's working?
[17:04:11] doh
[17:04:34] how I missed that
[17:04:35] lol
[17:04:42] I marked it as read
[17:04:49] without noticing it was the one I was looking for
[17:04:55] topranks: thanks for fact-checking me
[17:04:58] it all seems to work fine
[17:05:31] ah no worries!
[17:05:38] better than it was left in a broken state for sure :)
[17:21:59] (SystemdUnitFailed) firing: update-tails-mirror.service Failed on mirror1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[18:37:01] (SystemdUnitFailed) resolved: update-tails-mirror.service Failed on mirror1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed