[05:48:26] (SystemdUnitFailed) firing: netbox_report_accounting_run.service Failed on netbox1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[06:18:26] (SystemdUnitFailed) resolved: netbox_report_accounting_run.service Failed on netbox1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[09:30:43] CAS-SSO, Infrastructure-Foundations, SRE, collaboration-services, and 4 others: migrate gitlab away from the CAS protocol - https://phabricator.wikimedia.org/T320390 (Jelto) The fix of using a `FLAT` profile with GitLab oidc was deployed to all idp servers (including wmcs/cloud). Thanks for @SLyn...
[10:27:19] SRE-tools, DC-Ops, Infrastructure-Foundations: sre.hosts.reimage: fails to get uptime in debian installer - https://phabricator.wikimedia.org/T342345 (Fabfur)
[10:27:31] SRE-tools, DC-Ops, Infrastructure-Foundations: sre.hosts.reimage: fails to get uptime in debian installer - https://phabricator.wikimedia.org/T342345 (Fabfur) Included dc-ops
[13:30:43] SRE-tools, DC-Ops, Infrastructure-Foundations: sre.hosts.reimage: fails to get uptime in debian installer - https://phabricator.wikimedia.org/T342345 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jbond@cumin1001 for host sretest1002.eqiad.wmnet with OS bookworm
[14:03:57] SRE-tools, DC-Ops, Infrastructure-Foundations: sre.hosts.reimage: fails to get uptime in debian installer - https://phabricator.wikimedia.org/T342345 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jbond@cumin1001 for host sretest1002.eqiad.wmnet with OS bookworm executed with er...
[14:08:44] SRE-tools, DC-Ops, Infrastructure-Foundations: sre.hosts.reimage: fails to get uptime in debian installer - https://phabricator.wikimedia.org/T342345 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jbond@cumin1001 for host sretest1002.eqiad.wmnet with OS bookworm
[14:14:22] Puppet, Beta-Cluster-Infrastructure: Puppet failure on Beta Cluster role::beta::docker_services boxes - https://phabricator.wikimedia.org/T342038 (Jdforrester-WMF) CCing you on this task now I've found it. :-)
[14:40:07] SRE-tools, DC-Ops, Infrastructure-Foundations: sre.hosts.reimage: fails to get uptime in debian installer - https://phabricator.wikimedia.org/T342345 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jbond@cumin1001 for host sretest1002.eqiad.wmnet with OS bookworm completed: - sre...
[14:46:06] SRE-tools, DC-Ops, Infrastructure-Foundations: sre.hosts.reimage: fails to get uptime in debian installer - https://phabricator.wikimedia.org/T342345 (jbond) > I tried to reproduce it with bookworm on sretest1002 but I got an unrelated error in d-i because of the recent point release 12.1. I've updat...
[14:47:40] SRE-tools, DC-Ops, Infrastructure-Foundations: sre.hosts.reimage: fails to get uptime in debian installer - https://phabricator.wikimedia.org/T342345 (jbond)
[15:58:27] (SystemdUnitFailed) firing: httpbb_hourly_appserver.service Failed on cumin1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[16:18:51] Puppet, Beta-Cluster-Infrastructure, Patch-For-Review: Puppet failure on Beta Cluster role::beta::docker_services boxes - https://phabricator.wikimedia.org/T342038 (Andrew) The attached patch will fix some but not all failures. Because the port wasn't actually used here the different uses of this pro...
[16:25:10] CAS-SSO, Infrastructure-Foundations, SRE, collaboration-services, and 4 others: migrate gitlab away from the CAS protocol - https://phabricator.wikimedia.org/T320390 (dancy) In https://gitlab.wikimedia.org/repos/releng/gitlab-settings/-/blob/main/group-management/helpers.py#L223 the ldap group sy...
[16:30:46] Puppet, Beta-Cluster-Infrastructure: Puppet failure on Beta Cluster role::beta::docker_services boxes - https://phabricator.wikimedia.org/T342038 (Andrew) Open→Resolved a:Andrew https://gerrit.wikimedia.org/r/c/operations/puppet/+/942690 resolved puppet compilation on the 3 hosts I spot-tested.
[16:58:27] (SystemdUnitFailed) resolved: httpbb_hourly_appserver.service Failed on cumin1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[17:41:10] SRE-tools, netops, Infrastructure-Foundations, SRE, Patch-For-Review: Setup zero touch provisioning (ZTP) for network devices - https://phabricator.wikimedia.org/T336485 (cmooney) I've done some work on this to allow for serving the JunOS image as part of the process. In the initial commits...