[10:11:51] netops, Infrastructure-Foundations, ops-codfw: Upgrade new codfw switches to Juniper recommended - https://phabricator.wikimedia.org/T341670 (ayounsi)
[10:24:19] netops, Infrastructure-Foundations, SRE, SRE-tools: Setup zero touch provisioning (ZTP) for network devices - https://phabricator.wikimedia.org/T336485 (ayounsi) Could potentially help {T341670}
[13:07:10] netops, Infrastructure-Foundations, SRE, ops-codfw: Upgrade new codfw switches to Juniper recommended - https://phabricator.wikimedia.org/T341670 (Papaul) @ayounsi we can still factory reset them and do ZTP again.
[13:28:48] andrewbogott: I will update the documentation but you can find the list in modules/profile/manifests/lvs/configuration.pp
[13:28:56] we will update the above to make sure it reflects that, thanks
[13:29:27] it would be nice to get back the cumin aliases that were rolled back....
[13:30:01] sukhe: great! In the meantime... I'm a bit worried that I still have a change in there that's pending a restart, can you confirm if that's true?
[13:30:50] andrewbogott: checking
[13:30:57] ty!
[13:32:40] andrewbogott: looks good, the alert was expected during that temporary state
[13:32:57] great, thank you for checking my work!
[13:34:02] volans: in theory yes, but those aliases were tied in to the commits that were rolled back. I will check again this week to see if we can just put in the aliases
[13:35:39] they would, among other things, make it easy to automate those restarts with a cookbook that knows what to do, in which order, and what to check to ensure it's done without outages or alerts/pages
[13:36:35] (alias or similar easy way to query for the various lvs hosts based on role and group)
[13:37:08] yes that's fair. the reverts were important at that time because we couldn't provision the hosts. but we should revisit those and I will
[13:53:42] (SystemdUnitFailed) firing: user@0.service Failed on cp1078:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[13:58:42] (SystemdUnitFailed) resolved: user@0.service Failed on cp1078:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed