[00:31:46] FIRING: [2x] SystemdUnitFailed: wmf_auto_restart_prometheus-openldap-exporter.service on seaborgium:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[02:51:46] FIRING: [3x] SystemdUnitFailed: netbox_report_accounting_run.service on netbox1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[03:16:46] FIRING: [3x] SystemdUnitFailed: netbox_report_accounting_run.service on netbox1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[07:16:46] FIRING: [2x] SystemdUnitFailed: wmf_auto_restart_prometheus-openldap-exporter.service on seaborgium:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[09:43:46] netops, Infrastructure-Foundations, SRE: magru network setup - https://phabricator.wikimedia.org/T362421#9811967 (cmooney) >>! In T362421#9808627, @ayounsi wrote: > The Telxius community doesn't seem to be of any effect so far, I'll wait for their reply, maybe they changed or need to be enabled on th...
[11:16:46] FIRING: [2x] SystemdUnitFailed: wmf_auto_restart_prometheus-openldap-exporter.service on seaborgium:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[12:46:39] hello folks
[12:48:30] netops, Infrastructure-Foundations, ops-codfw, SRE, Patch-For-Review: Problem re-imaging hosts on row-wide vlan on EVPN switches - https://phabricator.wikimedia.org/T365204#9812357 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cmooney@cumin1002 for host sretest2002....
[13:58:34] netops, DC-Ops, Infrastructure-Foundations, ops-codfw, and 2 others: Problem re-imaging hosts on row-wide vlan on EVPN switches - https://phabricator.wikimedia.org/T365204#9812556 (cmooney) So some interesting findings when testing today. I was able to reproduce the issue with sretest2002, and t...
[14:02:09] netops, DC-Ops, Infrastructure-Foundations, ops-codfw, and 2 others: Problem re-imaging hosts on row-wide vlan on EVPN switches - https://phabricator.wikimedia.org/T365204#9812566 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cmooney@cumin1002 for host sretest2002.wikimedia...
[14:33:12] CFSSL-PKI, Infrastructure-Foundations: Establish a process to periodically upgrade the CFSSL infrastructure - https://phabricator.wikimedia.org/T365361#9812632 (CDanis)
[14:33:28] CFSSL-PKI, Infrastructure-Foundations: Alert and automate the renewal of CFSSL intermediate CAs - https://phabricator.wikimedia.org/T365362#9812633 (CDanis)
[15:16:46] FIRING: [2x] SystemdUnitFailed: wmf_auto_restart_prometheus-openldap-exporter.service on seaborgium:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[15:34:21] SRE-tools, Infrastructure-Foundations, Spicerack, Patch-For-Review, Puppet (Puppet 7.0): Spicerack puppetserver.destroy() raises an exception when certificate does not exist - https://phabricator.wikimedia.org/T360293#9812818 (Volans) Open→Resolved This is now live.
[15:36:27] the PKI project in cloud should be on bullseye with puppet fixed
[15:36:30] in theory :D
[16:09:53] SRE-tools, Spicerack: Spicerack: expand Supermicro support in the Redfish module - https://phabricator.wikimedia.org/T365372 (Volans) NEW p:Triage→Medium
[16:10:05] SRE-tools, Infrastructure-Foundations, Spicerack: [spicerack] python-kafka does not support python 3.12, there's a fix but there has not been any releases since 2020 - https://phabricator.wikimedia.org/T354410#9813004 (Volans)
[16:10:27] netbox, Cumin, Infrastructure-Foundations, SRE, Patch-For-Review: Cumin: add backend for Netbox - https://phabricator.wikimedia.org/T205900#9813005 (Volans)
[19:16:46] FIRING: [2x] SystemdUnitFailed: wmf_auto_restart_prometheus-openldap-exporter.service on seaborgium:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[19:46:15] Mail, Infrastructure-Foundations, SRE: Postfix outbound rollout sequence, mx-out - https://phabricator.wikimedia.org/T365395 (jhathaway) NEW
[20:34:01] Mail, Infrastructure-Foundations, SRE: Postfix outbound rollout sequence, mx-out - https://phabricator.wikimedia.org/T365395#9814198 (jhathaway) p:Triage→Medium
[21:45:03] Mail, Infrastructure-Foundations, SRE, Patch-For-Review: Postfix outbound rollout sequence, mx-out - https://phabricator.wikimedia.org/T365395#9814406 (jhathaway)
[23:16:46] FIRING: [2x] SystemdUnitFailed: wmf_auto_restart_prometheus-openldap-exporter.service on seaborgium:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[23:17:10] netops, DC-Ops, Infrastructure-Foundations, ops-codfw, and 2 others: Problem re-imaging hosts on row-wide vlan on EVPN switches - https://phabricator.wikimedia.org/T365204#9814633 (Jhancock.wm) @Papaul still getting an error on provisioning of the new server. 100.0% (1/1) success ratio (>= 100....
[23:39:16] netops, DC-Ops, Infrastructure-Foundations, ops-codfw, and 2 others: Problem re-imaging hosts on row-wide vlan on EVPN switches - https://phabricator.wikimedia.org/T365204#9814670 (Papaul) @Jhancock.wm it looks like we have another sretest2002 setup in b7 the switch has that configuration already...