[02:28:40] (SystemdUnitFailed) firing: netbox_ganeti_esams_sync.service Failed on netbox1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[06:28:40] (SystemdUnitFailed) firing: netbox_ganeti_esams_sync.service Failed on netbox1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[06:41:42] moritzm, jbond, ^ what's the clean way to delete a systemd timer manually? The current puppet code doesn't take care of it (only for the active/passive node - https://github.com/wikimedia/operations-puppet/blob/57400a16a06dbe47ff4e869d0cf1baeae36afc68/modules/profile/manifests/netbox.pp#L371 )
[06:48:31] having a look
[06:54:29] I've removed it manually (systemctl stop on the broken unit, rm on the .timer and .service units), systemctl daemon-reload and final systemctl daemon-reload
[06:54:31] (SystemdUnitFailed) resolved: netbox_ganeti_esams_sync.service Failed on netbox1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[06:55:27] this is really an edge case, it's the first time we've ever removed a Ganeti cluster from production after all :-)
[06:57:06] yeah, doesn't need to be done through puppet
[06:57:11] thanks
[08:14:48] netops, Infrastructure-Foundations, SRE: Default allowed SSH parameters on upgraded Juniper mgmt routers prevent some connections - https://phabricator.wikimedia.org/T320272 (ayounsi) For the record, another possible workaround: ` mr1-esams> start shell % ssh root@10.80.128.6 -m hmac-...
[08:17:31] netops, Infrastructure-Foundations, SRE: Implement better filter on BGP_Customer_out - https://phabricator.wikimedia.org/T340448 (ayounsi) Open→Resolved All done.
[11:26:45] netbox, Infrastructure-Foundations, Patch-For-Review: Upgrade Netbox to 3.6.x - https://phabricator.wikimedia.org/T336275 (MoritzMuehlenhoff) Updating also fixes CVE-2023-37625: https://github.com/netbox-community/netbox/issues/12205 https://github.com/benjaminpsinclair/Netbox-CVE-2023-37625
[11:35:41] netops, Infrastructure-Foundations, SRE: Add non-EVPN L3 Switch routing policy definitions to Homer - https://phabricator.wikimedia.org/T344601 (cmooney) p:Triage→Low
[11:35:52] netops, Infrastructure-Foundations, SRE: Add non-EVPN L3 Switch routing policy definitions to Homer - https://phabricator.wikimedia.org/T344601 (cmooney)
[11:35:59] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Consolidate Automation Templates for DC Switches - https://phabricator.wikimedia.org/T312635 (cmooney)
[20:03:40] (SystemdUnitFailed) firing: (2) ferm.service Failed on aux-k8s-ctrl1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
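
For reference, the manual timer cleanup described at 06:54:29 (stop the unit, remove the .timer and .service files, daemon-reload) could be sketched roughly as below. This is a hedged sketch, not the exact commands that were run: the `UNIT_DIR` default, the `DRY_RUN` switch, and the `run` and `remove_timer` helper names are assumptions for illustration, and the `disable` and `reset-failed` steps are additions that the log does not mention.

```shell
#!/bin/sh
# Sketch: manually remove a leftover systemd timer and its service unit.
# ASSUMPTION: the log does not say where the unit files lived; adjust UNIT_DIR.
UNIT_DIR="${UNIT_DIR:-/etc/systemd/system}"
# ASSUMPTION: dry-run by default, so the commands are only printed.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# remove_timer NAME: NAME is the unit name without the .timer/.service suffix.
remove_timer() {
    unit="$1"
    # Stop the timer and the (possibly failed) service it triggers.
    run systemctl stop "${unit}.timer" "${unit}.service"
    # Not mentioned in the log: drop the enablement symlink for the timer.
    run systemctl disable "${unit}.timer"
    # Remove both unit files.
    run rm -f "${UNIT_DIR}/${unit}.timer" "${UNIT_DIR}/${unit}.service"
    # Make systemd re-read its unit files.
    run systemctl daemon-reload
    # Not mentioned in the log: clear any remaining "failed" state.
    # This can fail if the unit is no longer loaded, hence the || true.
    run systemctl reset-failed "${unit}.service" || true
}
```

Calling `remove_timer netbox_ganeti_esams_sync` with the default `DRY_RUN=1` only prints the five commands; setting `DRY_RUN=0` (as root, on the affected host) would execute them.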