[12:22:51] hi there, I have somehow been removed from RelEng's LDAP group: https://ldap.toolforge.org/group/releng I can see some other team members are missing too
[12:22:59] mutante,eoghan: maybe this is something you're familiar/can help with?
[12:23:42] Hm, not sure why you might have been removed. But I can add you back. Do you know who else is missing?
[12:24:58] yep, Brennen, Jeena and Andre (all from RelEng)
[12:27:36] <_joe_> eoghan: let's try to understand how that happened
[12:27:40] Yep
[12:27:47] Not adding back yet
[12:28:01] <_joe_> it might be something sinister or a simple fat-fingering
[12:28:45] I have a suspicion that it might be something to do with something I did. I didn't add or remove any users from any LDAP group.
[12:29:10] The group says it was last modified on 2023-12-01 125904Z.
[12:29:33] But not sure if that counts user changes
[12:29:53] jnuche: Do you know when you noticed the permissions missing? Could they have been gone since December 1st?
[12:30:12] The work I did was on this ticket: https://phabricator.wikimedia.org/T355352 - I was working on finding out why the archiva-deployers group was no longer allowing users to upload artifacts to archiva.
[12:31:03] grepping the slapd audit log on serpens suggests dancy would have been removed at 20231201125904Z from mwmaint2002
[12:31:32] eoghan: I just noticed it but I was on vacation + life event leave shortly after our offsite in early Dec, so it's possible they've been gone since that date
[12:31:52] jnuche: Got it, ta.
[12:31:58] Oh thanks taavi - so it's probably unrelated to what I did then.
[12:32:28] the timing matches with T352334
[12:32:29] T352334: Grant Access to wmf, releng, ciadmin for sandeeps - https://phabricator.wikimedia.org/T352334
[12:32:52] Huh.
[12:35:54] huh, I might be wrong - the same log on seaborgium (the other ldap primary) shows it just adding sandeeps as expected
[12:38:15] Are the groups the same on both hosts?
i.e., would it be possible that replication is somehow incorrect?
[12:44:06] they're the same on both, and contain dancy on both - did you just add them back?
[12:44:20] I didn't make any changes yet
[12:45:31] sigh, I've confused two people, haven't I?
[12:46:19] Well, I'm confused. Does that make three? :D
[12:52:28] yeah I was looking for the logs for a wrong person
[12:54:38] oh, sry about that, dancy isn't missing, it's these four users:
[12:54:39] https://ldap.toolforge.org/user/brennen
[12:54:39] https://ldap.toolforge.org/user/jhuneidi
[12:54:39] https://ldap.toolforge.org/user/jnuche
[12:54:39] https://ldap.toolforge.org/user/aklapper
[12:55:27] that just makes things more confusing, the logs shows jnuche being added to cn=wmf, cn=ciadmin (those two are ok), and cn=archiva-deployers (???) at a time that matches T301149
[12:55:28] T301149: Grant Access to wmf, releng, ciadmin for jnuche - https://phabricator.wikimedia.org/T301149
[12:58:07] But no removal, right?
[13:01:20] I am not seeing any traces except https://phabricator.wikimedia.org/T301149#7690777 that jnuche was ever in that group
[13:04:08] How about brennen?
[13:06:35] nope
[13:07:03] I noticed because I was getting my admin access here through the `releng` group: https://gitlab.wikimedia.org/repos/releng/jenkins-deploy/-/blob/master/conf/releasing/casc/jenkins.yaml?ref_type=heads#L62
[13:07:10] let's restore the ldap backup from before whatever was done in https://phabricator.wikimedia.org/T355352 and see what the group looked like in there?
[13:07:14] and I had to add me explicitly for it to work again: https://gitlab.wikimedia.org/repos/releng/jenkins-deploy/-/blame/master/conf/releasing/casc/jenkins.yaml?ref_type=heads#L67
[13:09:16] is https://wikitech.wikimedia.org/wiki/Bacula#Restore_(aka_Panic_mode) something I can just do myself or should I ask someone who knows more about backups to do it for me?
[13:25:33] I think you should be able to DIY, but jynus is the backup expert
[13:26:04] Yeah, that is written so it can be done by anyone
[13:26:14] but let me know if you need help
[13:27:05] in general, for backup patches I only ask notification to help afterwards, there is no need for permission to do things
[13:28:38] is it re: archiva1002.wikimedia.org-Monthly-1st-Thu-productionEqiad-var-lib-archiva ?
[13:29:46] taavi: hopefully this helps: https://phabricator.wikimedia.org/P55738
[13:30:32] or if you have a concrete request, I can recover the files for you
[13:32:09] standing by for now
[13:33:47] jynus: thanks, I was looking for the openldap backup from seaborgium or serpens but I managed to do it myself
[13:33:59] perfect! :-D
[13:34:21] that's good news, it means the procedure worked!
[13:34:55] it worked great, I was looking for a backup before a certain date and it had a function to do exactly that
[13:35:48] I am so happy about this- 99% of my work is expecting the worst and normally it doesn't happen, so it makes me happy when it gets used
[13:38:41] I just restored another backup that's a bit older and that also worked flawlessly
[13:40:02] unfortunately that also does not solve the mystery. the version of cn=releng before T352334 (last modified 20230619070929Z) also does not contain jnuche
[13:40:03] T352334: Grant Access to wmf, releng, ciadmin for sandeeps - https://phabricator.wikimedia.org/T352334
[13:40:54] there is 3 months of backups, maybe it can be bisected
[13:41:05] or the issue could be somewhere else?
[13:41:37] oh, I see 202306
[13:41:52] unfortunately last June is more than three months away :/
[13:43:04] _joe_: eoghan: I'm starting to run out of ideas. the phab comment in https://phabricator.wikimedia.org/T301149#7690777 is the only record I'm finding at all of jnuche ever being in cn=releng.
[13:43:39] I don't get it, if I was already missing in June, I shouldn't have been able to get admin access to our releases Jenkins instance at least since then
[13:43:43] and that was working until recently
[13:43:46] weird
[13:46:03] <_joe_> uhm
[13:46:26] <_joe_> and I would suggest we don't restore from backups tbh
[13:46:52] <_joe_> moritzm: shouldn't the script that checks accounts find out such discrepancies?
[13:47:06] <_joe_> people being in the releng group in data.yaml but not in ldap?
[13:47:23] _joe_: he was only comparing/checking, to my knowledge (aka restoring to the side)
[13:47:33] <_joe_> yeah I got that
[13:47:53] But I agree at this point to involve infra security
[13:48:05] maybe there is something missing but better be sure
[13:49:03] I'm not sure how the script could notice someone *not* being in a group, unless you start keeping a copy of all ldap group memberships in the puppet repo somewhere
[13:49:16] the script only checks whether all users within _any_ NDA-relevant script is tracked in data.yaml, we don't have a mapping between groups in LDAP and data.yaml specifically
[14:32:45] Would there be any benefit making data.yaml the source of truth for group ownership? At least it would be a little easier to see when things changed. Could keep it in private puppet, or require a script to be run manually on a locked down machine or something.
[14:32:50] -ownership
[14:33:03] membership. That's the one
[14:47:10] once we've switched LDAP group membership management towards Bitu such errors would be a thing of the past: no error-prone LDAP commands for such permission changes and we'll see within the IDM who made a change in retrospect (instead of just the shared LDAP DN in the ldap auth log)
[14:51:23] moritzm: Ah, excellent.
[15:29:33] quick site.pp change if anyone has time to review https://gerrit.wikimedia.org/r/c/operations/puppet/+/992547
[15:29:53] the check experimental failure is expected as the hostname will change
[15:32:34] eyes...
[15:33:42] jhathaway looks like sukhe already got it. Thanks to you both!
[15:33:49] indeed
[15:36:05] I tried to look myself but thought jesse beat me :)
[15:54:29] good news for those who missed my first CR ;P https://gerrit.wikimedia.org/r/c/operations/puppet/+/993150
[15:55:03] Forgot to remove the host from the LB pool ;(
[16:03:31] ^^ nm, that's reviewed too
[19:11:46] godog (if still around), did you or anyone try out the debian-12.0-nopuppet image in the 'monitoring' project? I'm hoping to get some feedback on https://phabricator.wikimedia.org/T326818 before I write a doc page
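
Editor's note on the 13:49 to 14:32 exchange: the idea of keeping an expected copy of group memberships and comparing it against LDAP could be sketched roughly as below. This is only an illustration under assumptions, not the existing account-check script: the function name `diff_group_membership`, the plain-dict input shape, and the sample values are all hypothetical, and loading the expected side from data.yaml or the actual side from LDAP is deliberately left out.

```python
from typing import Dict, List, Set


def diff_group_membership(
    expected: Dict[str, Set[str]],
    actual: Dict[str, Set[str]],
) -> Dict[str, Dict[str, List[str]]]:
    """Report, per group, users that are expected but absent and vice versa.

    Both arguments map a group name to a set of usernames. How they are
    loaded (a YAML file, an LDAP search) is outside this sketch.
    """
    report: Dict[str, Dict[str, List[str]]] = {}
    for group in sorted(set(expected) | set(actual)):
        want = expected.get(group, set())
        have = actual.get(group, set())
        missing = sorted(want - have)      # tracked as members, not in LDAP
        unexpected = sorted(have - want)   # in LDAP, not tracked
        if missing or unexpected:
            report[group] = {"missing": missing, "unexpected": unexpected}
    return report


# Illustrative values only; they are not taken from any real directory dump.
expected_groups = {
    "releng": {"brennen", "jhuneidi", "jnuche", "aklapper", "dancy", "sandeeps"},
}
actual_groups = {
    "releng": {"dancy", "sandeeps"},
}

for group, delta in diff_group_membership(expected_groups, actual_groups).items():
    print(group, "missing:", delta["missing"], "unexpected:", delta["unexpected"])
```

A check like this only helps if the expected side is maintained somewhere, which is the point raised at 13:49:03.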