[02:23:26] (SystemdUnitFailed) firing: debian-weekly-rebuild.service Failed on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[06:24:24] (SystemdUnitFailed) firing: debian-weekly-rebuild.service Failed on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[07:16:10] CAS-SSO, Infrastructure-Foundations, SRE, collaboration-services, and 3 others: migrate gitlab away from the CAS protocol - https://phabricator.wikimedia.org/T320390 (Jelto) I switched GitLab oidc login back to production idp (https://gerrit.wikimedia.org/r/c/operations/puppet/+/939345). I get th...
[10:25:43] SRE-tools, Infrastructure-Foundations, Spicerack: Unrelated DNS diffs shown if decommission and makevm cookbooks run at the same time - https://phabricator.wikimedia.org/T342130 (Volans) Indeed what John said. For context the integrity of the data is currently ensured by git fast-forward only. If tw...
[10:28:25] (SystemdUnitFailed) firing: debian-weekly-rebuild.service Failed on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[12:08:48] CAS-SSO, Infrastructure-Foundations, SRE, collaboration-services, and 3 others: migrate gitlab away from the CAS protocol - https://phabricator.wikimedia.org/T320390 (SLyngshede-WMF) @Jelto I think you have the wrong client id. Should be: "gitlab_replica_oidc". CAS will check the serviceId / URL...
[12:13:22] CAS-SSO, Infrastructure-Foundations, SRE, collaboration-services, and 3 others: migrate gitlab away from the CAS protocol - https://phabricator.wikimedia.org/T320390 (Jelto) We are using `gitlab_replica_oidc` on the replicas (`"identifier" => "gitlab_replica_oidc"` in `/etc/gitlab/gitlab.rb`). T...
[12:18:07] CAS-SSO, Infrastructure-Foundations, SRE, collaboration-services, and 3 others: migrate gitlab away from the CAS protocol - https://phabricator.wikimedia.org/T320390 (SLyngshede-WMF) When I attempt a login I get the following: ` 2023-07-24 12:14...
[12:29:38] CAS-SSO, Infrastructure-Foundations, SRE, collaboration-services, and 3 others: migrate gitlab away from the CAS protocol - https://phabricator.wikimedia.org/T320390 (Jelto) Yes we had the same error before, I reported it in T320390#8930839. After my vacations the error was gone, so I assumed som...
[13:39:38] SRE-tools, Infrastructure-Foundations: sre.hosts.reimage: fails to get uptime in debian installer - https://phabricator.wikimedia.org/T342345 (Volans) The actual command that detects a Debian Installer is actually `grep -q "BOOT_IMAGE=debian-installer" /proc/cmdline`. Has this changed in bookworm? I've d...
[14:28:26] (SystemdUnitFailed) firing: debian-weekly-rebuild.service Failed on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[14:35:28] netops, Infrastructure-Foundations, SRE: mr1 port utilization alerts shouldn't mention hash page in their IRC logs - https://phabricator.wikimedia.org/T281055 (cmooney) a:cmooney
[14:36:20] netops, Infrastructure-Foundations, SRE, observability, good first task: Add Icinga check for SRX cluster status - https://phabricator.wikimedia.org/T271298 (joanna_borun)
[14:39:05] netops, Infrastructure-Foundations, SRE, observability: Add Icinga check for SRX cluster status - https://phabricator.wikimedia.org/T271298 (joanna_borun)
[15:27:17] it's too bad Gitea wasn't around when we made the gitlab decision, it would have been an interesting comparison, especially since at present they are avoiding an open-core model like gitlab
[15:38:04] volans: do you remember why firmware 22.X fails with PXE? at some point we will have to upgrade past 22
[15:38:37] no, sorry, papaul knows more ;)
[15:38:51] I hope they'll release a newer one that works at some point, or bookworm works maybe?
[15:39:21] XioNoX: maybe it works for bookworm, but certainly fails for bullseye
[15:39:30] sukhe: nah it's at the pxe level
[15:39:42] (I think?)
[15:39:44] oh really? for us the failures were within d-i
[15:40:54] that must be it then, long time since I looked into it
[16:37:21] The interface doesn't come "up" at the PXE-boot stage if I remember correctly
[16:37:40] server console appears to be doing the PXE-stage DHCP, but if you look on the switch the port remains down
[18:29:24] (SystemdUnitFailed) firing: debian-weekly-rebuild.service Failed on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[22:33:26] (SystemdUnitFailed) firing: debian-weekly-rebuild.service Failed on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed