[00:33:54] (SystemdUnitFailed) firing: (3) debmonitor-maintenance-gc.service Failed on debmonitor2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[04:33:54] (SystemdUnitFailed) firing: (3) debmonitor-maintenance-gc.service Failed on debmonitor2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[04:39:49] CAS-SSO, Data-Platform-SRE, Infrastructure-Foundations: Switch DataHub authentication to OIDC - https://phabricator.wikimedia.org/T305874 (Stevemunene) Expanding/adding the `AUTH_OIDC_SCOPE` doesn't seem to have had much impact on the SSO process, we are still getting the same error. ` 2023-08-28 14...
[06:08:54] (SystemdUnitFailed) firing: (4) debmonitor-maintenance-gc.service Failed on debmonitor2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[06:46:11] CAS-SSO, Data-Platform-SRE, Infrastructure-Foundations: Switch DataHub authentication to OIDC - https://phabricator.wikimedia.org/T305874 (SLyngshede-WMF) One issue we ran into with Gitlab also involved Gitlab not being able to locate OIDC attributes. This was as a result of how CAS returns the attri...
[07:07:26] CAS-SSO, Data-Platform-SRE, Infrastructure-Foundations: Switch DataHub authentication to OIDC - https://phabricator.wikimedia.org/T305874 (Stevemunene) >>! In T305874#9125628, @SLyngshede-WMF wrote: > One issue we ran into with Gitlab also involved Gitlab not being able to locate OIDC attributes. Thi...
[07:24:40] moritzm, jbond: are the above debmonitor-maintenance-gc.service Failed on debmonitor2003 alerts something I should look at or part of the new puppet setup WIP?
[07:26:08] WIP, it expired, I'll silence it again
[07:26:58] done
[07:29:32] ok thx, lmk if you need a hand with that if needed
[07:36:38] there's currently no blockers, because we haven't worked on it for a while, if that changes I'll let you know :-)
[07:40:31] sgtm :D
[07:45:47] CAS-SSO, Data-Platform-SRE, Infrastructure-Foundations, Patch-For-Review: Switch DataHub authentication to OIDC - https://phabricator.wikimedia.org/T305874 (Stevemunene) >> You can try switching the format to "FLAT" as with Gitlab, that might help datahub locate the attributes >> >> Example fro...
[07:58:03] CAS-SSO, Data-Platform-SRE, Infrastructure-Foundations, Patch-For-Review: Switch DataHub authentication to OIDC - https://phabricator.wikimedia.org/T305874 (Stevemunene)
[08:00:57] I'm not sure if it was already discussed internally but I came across this VM request that probably needs our attention: https://phabricator.wikimedia.org/T344164
[08:08:50] thanks for the pointer, I'll follow up there later
[08:11:33] thanks!
[09:37:29] netops, Infrastructure-Foundations: xe-3/2/1: down -> Transport: cr1-esams:xe-0/0/7 (Lumen, BDFS2448 80ms 10Gbps wave) {#2013} - https://phabricator.wikimedia.org/T345138 (Clement_Goubert)
[09:39:25] netops, Infrastructure-Foundations: xe-3/2/1: down -> Transport: cr1-esams:xe-0/0/7 (Lumen, BDFS2448 80ms 10Gbps wave) {#2013} - https://phabricator.wikimedia.org/T345138 (ops-monitoring-bot) ===== Automated diagnostic for Netbox interface ID cr1-esams:xe-0/0/7 --- **Interface cr1-esams:xe-0/0/7** - adm...
[09:49:54] SRE-tools, Infrastructure-Foundations, SRE, Patch-For-Review, Puppet (Puppet 7.0): Cumin: update config to use new puppet7 infrastructure - https://phabricator.wikimedia.org/T341497 (Volans)
[10:55:00] (SystemdUnitFailed) firing: puppet-agent-timer.service Failed on debmonitor1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[11:18:55] (SystemdUnitFailed) resolved: puppet-agent-timer.service Failed on debmonitor1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[12:21:17] Puppet, netbox, Infrastructure-Foundations, SRE, and 3 others: Netbox: use the netbox to also sync networks and network devices - https://phabricator.wikimedia.org/T329272 (fgiunchedi) >>! In T329272#9122382, @ayounsi wrote: > Looking at the `parents` field. > So far we've been defining them man...
[12:24:00] netops, Infrastructure-Foundations, SRE: xe-3/2/1: down -> Transport: cr1-esams:xe-0/0/7 (Lumen, BDFS2448 80ms 10Gbps wave) {#2013} - https://phabricator.wikimedia.org/T345138 (ayounsi) > Currently your circuit is being affected by a higher level outage. I will continue to provide updates as I recei...
[12:40:11] netops, Cloud-VPS, Infrastructure-Foundations, SRE, and 2 others: Move cloud vps ns-recursor IPs to host/row-independent addressing - https://phabricator.wikimedia.org/T307357 (aborrero)
[13:01:52] SRE-tools, Ganeti, Infrastructure-Foundations, Patch-For-Review, User-MoritzMuehlenhoff: cookbook sre.ganeti.makevm fails when no group is set - https://phabricator.wikimedia.org/T344813 (Volans) a:Volans
[13:45:34] SRE-tools, Ganeti, Infrastructure-Foundations, Spicerack, and 2 others: cookbook sre.ganeti.makevm calls wrong netbox_ganeti_codfw_sync.service - https://phabricator.wikimedia.org/T344812 (Volans) Open→In progress a:Volans
[13:45:42] SRE-tools, Ganeti, Infrastructure-Foundations, Patch-For-Review, User-MoritzMuehlenhoff: cookbook sre.ganeti.makevm fails when no group is set - https://phabricator.wikimedia.org/T344813 (Volans) Open→In progress
[13:45:48] SRE-tools, Ganeti, Infrastructure-Foundations, Patch-For-Review, User-MoritzMuehlenhoff: cookbook sre.ganeti.makevm fails when no group is set - https://phabricator.wikimedia.org/T344813 (Volans) p:Triage→Medium
[13:46:06] SRE-tools, Ganeti, Infrastructure-Foundations, Spicerack, and 2 others: cookbook sre.ganeti.makevm calls wrong netbox_ganeti_codfw_sync.service - https://phabricator.wikimedia.org/T344812 (Volans) p:Triage→Medium
[14:14:21] (ProbeDown) firing: (2) Service mirror1001:443 has failed probes (http_mirrors_wikimedia_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#mirror1001:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[14:19:21] (ProbeDown) resolved: (2) Service mirror1001:443 has failed probes (http_mirrors_wikimedia_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#mirror1001:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[14:40:43] SRE-tools, Ganeti, Infrastructure-Foundations, User-MoritzMuehlenhoff: cookbook sre.ganeti.makevm fails when no group is set - https://phabricator.wikimedia.org/T344813 (Volans) In progress→Resolved
[14:40:52] SRE-tools, Ganeti, Infrastructure-Foundations, Spicerack, User-MoritzMuehlenhoff: cookbook sre.ganeti.makevm calls wrong netbox_ganeti_codfw_sync.service - https://phabricator.wikimedia.org/T344812 (Volans) In progress→Resolved