[00:24:45] (SystemdUnitFailed) firing: (7) debian-weekly-rebuild.service Failed on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[04:24:45] (SystemdUnitFailed) firing: (7) debian-weekly-rebuild.service Failed on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[05:59:34] slyngs: would it be possible to have access back to netbox-next? I need to use it for some tests
[06:00:02] or any idea when it will be back? (I can wait a bit)
[06:40:12] Sure, I can just flip it back, just a sec
[06:51:15] Okay, a bit more, I forgot to point it to production IDP
[06:54:40] XioNoX: Okay, it's working again
[07:05:13] thx
[07:06:59] slyngs: I did the main thing I needed and can be good for a while if you plan on working on it today
[07:07:36] Thanks, I just need to read some documentation first. I'll give a heads up if/when I switch back
[07:57:32] I'll switch netbox-next back to OIDC, apparently I'm not smart enough to figure out the issue without trying it
[08:10:51] XioNoX: fyi i fixed a few of the last failing tests in https://gerrit.wikimedia.org/r/c/operations/software/homer/+/928795/7..8/ but also see the comments, at least one of them i'm not sure if the test was wrong or if the code was wrong
[08:11:34] jbond: thanks, I finished tackling them with PS10
[08:11:59] jbond: yeah I wanted to discuss your comments with you as I'm not sure what to do
[08:12:55] we may need riccardo but this is the comment i'm most worried about https://gerrit.wikimedia.org/r/c/operations/software/homer/+/928795/comments/4f857b40_109e79a4
[08:15:16] Jbond ....
I MADE IT WORK
[08:15:24] :D indeed
[08:15:28] jbond: looking at it again, that's fine, the netbox_object is not defined during the inventory call anymore, but after the targets have been defined, that's what I try to explain with https://gerrit.wikimedia.org/r/c/operations/software/homer/+/928795/comments/cb3701c7_11523f52
[08:15:45] Coffee, and then Puppet patch
[08:16:27] XioNoX: ack i haven't taken a look at the code since that comment, i'll try to do a new pass with fresh eyes today
[08:24:45] (SystemdUnitFailed) firing: (7) debian-weekly-rebuild.service Failed on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[10:51:26] not read yet but looks interesting (using the word loosely) https://sigops.org/s/conferences/hotos/2023/papers/jia.pdf (cc moritzm )
[11:01:46] thanks, will have a closer look. we've disabled unprivileged bpf since 2016, so fortunately the verifier isn't used/needed either
[11:02:13] indeed :)
[11:31:22] jbond: https://gerrit.wikimedia.org/r/c/operations/puppet/+/932389 <- The only way I've been able to get it consistently working is by adding a memberOf attribute to OIDC. From something like 6 million tries it seems like you're correct. It first checked either memberOf and then group, we can't really change group, because we need it for OIDC clients, so the solution seems to be to add memberOf as an attribute.
[11:32:02] I don't want to merge it today, as it moves things around for existing OIDC clients
[11:33:12] sorry, missing part of the sentence. It checks either memberOf or group, when that succeeds it also checks the other.
[11:33:39] So now I believe it just checks memberOf twice
[11:36:13] slyngs: thanks for digging into this, it feels like weird behaviour and we could be missing something.
but i think that the trade-off is fine
[11:41:09] netbox, Infrastructure-Foundations: Netbox: PuppetDB import script error with VMs - https://phabricator.wikimedia.org/T340190 (ayounsi) p:Triage→High
[11:42:10] Yeah, there's no real downside, other than adding an additional OIDC claim, which is just identical to group, so it's not like we're leaking any information that wasn't already available.
[11:47:33] agreed
[11:47:59] Just noticed a spelling error in my config, cas.authn.oidc.core.user-defined-scopes.member=memberOf
[11:48:15] that was meant to say memberOf, so I don't think it does anything
[11:48:34] I'll try removing it
[11:48:49] ack
[11:51:18] Protip: When messing around with CAS, always restart fully, and always wait for it to do whatever housekeeping it is that it does, otherwise it will mix in cached data with your new config
[11:53:10] slyngs: normally the service definitions auto loading works well but for the cas.properties +1
[11:56:56] If you mess with both on/off, I'd recommend just restarting. I had a few attempts where I got logged in after changing group to memberOf in the service definition, but then after a restart it failed. Not sure what the exact rule is, I think it's only an issue if you poke at a lot of things in rapid succession
[11:57:23] ack good to know
[12:09:45] (SystemdUnitFailed) firing: (8) debian-weekly-rebuild.service Failed on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[12:12:22] Just a last complaint about CAS from me today: Some of the OIDC stuff seems like it only works by chance. The claims and scopes don't completely make sense. cas.authn.oidc.discovery.claims: List of supported claims.... SURE, but why does that have ANY bearing on the internal attributes in CAS. If you don't add memberOf, then you can't use it in requiredAttributes ...
WHY?
[12:37:27] CFSSL-PKI, Infrastructure-Foundations: Investigate SCEP proxy options - https://phabricator.wikimedia.org/T340193 (jbond) p:Triage→Medium
[12:50:32] CFSSL-PKI, Infrastructure-Foundations: Investigate SCEP proxy options - https://phabricator.wikimedia.org/T340193 (ayounsi) FYI https://www.juniper.net/documentation/us/en/software/junos/vpn-ipsec/topics/topic-map/security-configuring-ca-and-local-certificates.html#id-understanding-cmpv2-and-scep-certifi...
[13:09:45] (SystemdUnitFailed) firing: (8) debian-weekly-rebuild.service Failed on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[15:54:15] CAS-SSO, Infrastructure-Foundations, SRE: Kryo memcached transcoder broken in CAS 6.3/6.4 - https://phabricator.wikimedia.org/T273867 (Pppery)
[17:09:45] (SystemdUnitFailed) firing: (7) debian-weekly-rebuild.service Failed on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[19:32:00] netops, Infrastructure-Foundations, SRE, fundraising-tech-ops, WMF-NDA: reconfigure 1:1 NAT for new eqiad frmon host - https://phabricator.wikimedia.org/T340252 (Dwisehaupt)
[21:09:45] (SystemdUnitFailed) firing: (7) debian-weekly-rebuild.service Failed on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed