[00:04:45] (SystemdUnitFailed) firing: (6) debmonitor-maintenance-gc.service Failed on debmonitor2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[04:04:45] (SystemdUnitFailed) firing: (6) debmonitor-maintenance-gc.service Failed on debmonitor2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[08:04:45] (SystemdUnitFailed) firing: (6) debmonitor-maintenance-gc.service Failed on debmonitor2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[08:58:54] moritzm: I guess debmonitor2003 is still in "experimental" setup, should it be downtimed/have notifications disabled to avoid the above spam?
[09:05:26] fyi, I'm on vacation starting tomorrow, back on July 12th, let me know if there is anything specific I should look at today
[09:05:50] moritzm: yeah, I'll extend the downtime, it has expired over the weekend
[09:05:58] volans: yeah, I'll extend the downtime, it has expired over the weekend
[09:06:21] ack, thx
[09:06:31] XioNoX: world peace
[09:06:44] :)
[10:55:34] debmonitor is showing out-of-date information for me regarding presto-server. Is there something I should do to kick the hosts or something? https://debmonitor.wikimedia.org/packages/presto-server They're all running 0.281-1
[10:59:04] btullis: how did you upgrade them?
[10:59:30] when did you upgrade them
[10:59:30] ?
[10:59:55] With cumin. `apt install presto-server` - This morning, a couple of hours ago.
[11:00:53] I was going to use debdeploy, but I got an error from `generate-depbdeploy-spec` that I didn't understand, so I switched to cumin for ease.
[11:02:58] In case you're interested, this is the error from `generate-debdeploy-spec`
[11:03:04] https://www.irccloud.com/pastebin/J7maeZ99/
[11:03:50] did the package change name?
[11:05:23] anyway that's for morit.z ;)
[11:05:47] as for debmonitor btullis do you have the output of any of the upgrades? to see the lines related to debmonitor
[11:05:58] No, no changes of name.
[11:07:23] I think I probably quit out of cumin half-way through, thinking that I needed to change the command to `apt install -y presto-server` but it turns out that I didn't. So I lost the output.
[11:07:54] :(
[11:08:35] so I tested running the timer that daily runs a full import into debmonitor and an-presto1002 is correctly shown as upgraded
[11:09:00] jbond: FYI I'm disabling puppet on the production IDP and testing my OIDC fix on test, before rolling out to prod.
[11:09:08] so within 24h they should all be in sync. As on the why this happened without any output is hard to guess
[11:09:30] let me try one thing first
[11:09:42] volans: OK, cool. Thanks. Make sense. I didn't know about that timer.
[11:10:21] it's described here: https://wikitech.wikimedia.org/wiki/DebMonitor#DebMonitor_client
[11:10:37] s/cron/timer/
[11:10:44] slyngs: ack
[11:11:50] btullis: also, if you run a cumin command without batching (-b/--batch) ctrl+c is most likely not doing what you want ;)
[11:13:10] I just tested to upgrade a package on sretest1001 (bullseye like your host) and it got updated correctly:
[11:13:13] INFO:debmonitor:Got 1 updates from dpkg hook version 3
[11:13:15] INFO:debmonitor:Successfully sent the dpkg_hook update to the DebMonitor server
[11:14:19] volans: Got it! Thanks. I'll put this one down to experience.
[11:14:43] The upgrade worked anyway, which is the main thing :-)
[11:18:36] if the package restarts the server it probably restarted all of them at the same time though...
[11:21:26] jbond: Cool, Netbox-Next now runs on OIDC and IDP didn't break :-)
[11:22:21] nice!
[11:23:33] I am kinda pleased with myself :-)
[11:28:57] nice work!
[11:32:19] slyngs: awesome :)
[12:03:58] volans: https://github.com/netbox-community/netbox/issues/13002 <- I think this conflicts with their commercial version, which I believe have a similar feature, but let's see
[13:47:43] netbox, Infrastructure-Foundations: Markdown bug in Netbox-next - https://phabricator.wikimedia.org/T340444 (ayounsi)
[14:18:53] CAS-SSO, Infrastructure-Foundations: idp.wikimedia.org has text overlap problems at intermediate screen widths - https://phabricator.wikimedia.org/T297525 (joanna_borun) Open→Resolved a:joanna_borun
[14:33:55] CAS-SSO, Infrastructure-Foundations, GitLab (Auth & Access), Release-Engineering-Team (Radar): Attempting to login to gitlab.wikimedia.org sometimes results in CAS 500 Internal Server Error - https://phabricator.wikimedia.org/T291964 (jbond) Open→Resolved a:jbond Closing due to no res...
[14:34:34] slyngs: so the error on netbox's puppet is this one:
[14:34:35] Error 500 on SERVER: Server Error: Function lookup() did not find a value for the name 'profile::netbox::oidc_secret'
[14:34:41] was not actually disabled, was just failing
[14:34:46] (and still is)
[14:34:59] Okay, just a sec, we can just fix that
[14:35:16] thx, no hurry can be after the meeting
[14:35:50] CAS-SSO, Infrastructure-Foundations: Investigate making cas capable of handling case insensitive usernames - https://phabricator.wikimedia.org/T279974 (jbond)
[14:35:59] CAS-SSO, Infrastructure-Foundations: Investigate making cas usernames case sensitive - https://phabricator.wikimedia.org/T256656 (jbond)
[14:40:25] volans: Should work on next run
[14:42:02] rying
[14:42:04] *trying
[14:42:52] CAS-SSO, Infrastructure-Foundations, GitLab (Auth & Access), Release-Engineering-Team (Priority Backlog 📥), User-brennen: GitLab sessions expire frequently - https://phabricator.wikimedia.org/T330359 (jbond) @thcipriani I think that's sounds about right to me, gitlab is the only service using...
[14:43:08] puppet runs fine
[14:43:12] thx for the fix
[14:43:57] And Netbox still runs :-)
[14:46:56] yeah that too
[14:49:20] CAS-SSO, Infrastructure-Foundations, SRE: WebAuthn FIDO2 support in CAS - https://phabricator.wikimedia.org/T277841 (jbond) now targeted for cas 7.0
[14:49:34] netops, Cloud-VPS, Infrastructure-Foundations, SRE, and 2 others: Move cloud vps ns-recursor IPs to host/row-independent addressing - https://phabricator.wikimedia.org/T307357 (aborrero)
[15:04:16] netops, Infrastructure-Foundations: Implement better filter on BGP_Customer_out - https://phabricator.wikimedia.org/T340448 (ayounsi) p:Triage→Low
[15:18:08] CAS-SSO, Infrastructure-Foundations, User-jbond: Deprecation of U2F API in Chrome / Enable web auth in CAS - https://phabricator.wikimedia.org/T296629 (MoritzMuehlenhoff)
[15:18:20] CAS-SSO, Infrastructure-Foundations, SRE: WebAuthn FIDO2 support in CAS - https://phabricator.wikimedia.org/T277841 (MoritzMuehlenhoff)
[15:19:55] CAS-SSO, Infrastructure-Foundations, SRE, User-jbond: Validate user lockout - https://phabricator.wikimedia.org/T233946 (MoritzMuehlenhoff) Open→Resolved a:MoritzMuehlenhoff This has been implemented a while ago the sre.idm.logout cookbook. I runs various logout scripts (e.g. one whic...
[15:20:01] CAS-SSO, Infrastructure-Foundations, SRE, Security-Team, User-jbond: Further steps for CAS/web SSO - https://phabricator.wikimedia.org/T233921 (MoritzMuehlenhoff)
[15:26:26] CAS-SSO, Infrastructure-Foundations, SRE, User-jbond: Document IDP MFA policy and processes - https://phabricator.wikimedia.org/T284725 (MoritzMuehlenhoff)
[15:34:28] netops, Cloud-VPS, Infrastructure-Foundations, SRE, and 2 others: Move cloud vps ns-recursor IPs to host/row-independent addressing - https://phabricator.wikimedia.org/T307357 (aborrero)
[15:54:57] * jbond
[15:58:58] moritzm: do you want to review my patch to add user ca support to ssh, https://gerrit.wikimedia.org/r/c/operations/puppet/+/931694, I didn't want to submit the patch, if you had it in your review queue
[16:03:19] I haven't had a look, I can do so tomorrow, but otherwise also feel free to go ahead and merge already
[16:06:10] sounds good, thanks
[16:50:35] SRE-tools, Infrastructure-Foundations, SRE, Spicerack: Add support for knams as PoP in tooling and automation - https://phabricator.wikimedia.org/T340465 (Volans) p:Triage→Medium
[17:09:45] (SystemdUnitFailed) firing: httpbb_kubernetes_mw-api-ext_hourly.service Failed on cumin1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[17:19:45] (SystemdUnitFailed) resolved: httpbb_kubernetes_mw-api-ext_hourly.service Failed on cumin1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[19:47:09] netops, Cloud-VPS, Infrastructure-Foundations, SRE, and 2 others: Move cloud vps ns-recursor IPs to host/row-independent addressing - https://phabricator.wikimedia.org/T307357 (Andrew)