[00:06:25] FIRING: SystemdUnitFailed: sync-puppet-volatile.service on puppetserver1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[00:16:25] RESOLVED: SystemdUnitFailed: sync-puppet-volatile.service on puppetserver1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[08:18:06] netops, Infrastructure-Foundations: mr1-eqiad: move from OSPF to BGP - https://phabricator.wikimedia.org/T421238#11762957 (ayounsi) Overall that LGTM, you need to add BGP to `security_zones -> production -> services: ['ssh', 'ping', 'traceroute', 'snmp', 'ospf', 'ospf3', 'bgp']`
[08:31:29] netops, DC-Ops, Infrastructure-Foundations: Standardize management routers interfaces - https://phabricator.wikimedia.org/T421674 (ayounsi) NEW p:Triage→Low
[12:14:16] SRE-tools, Infrastructure-Foundations, ServiceOps new, SRE, and 2 others: Support locking cookbooks run except for switchover related cookbooks - https://phabricator.wikimedia.org/T330997#11764012 (MLechvien-WMF) @Blake did you use that in recent switchover? We didn't account for capacity in Q4 s...
[12:18:05] SRE-tools, Infrastructure-Foundations, ServiceOps new, SRE, and 2 others: Support locking cookbooks run except for switchover related cookbooks - https://phabricator.wikimedia.org/T330997#11764022 (Blake) @MLechvien-WMF This was not completed in time for the switchover. I'm in the middle of a sig...
[12:23:02] Mail, collaboration-services, Infrastructure-Foundations, SRE, and 3 others: Replace Spamassassin with Rspam for VRTS on Postfix - https://phabricator.wikimedia.org/T402260#11764036 (ABran-WMF) The new training flow keeps the existing VRTS export unchanged: `vrts.TicketExport2Mbox.pl` still produ...
[12:44:00] SRE-tools, Infrastructure-Foundations, serviceops-radar: Add --min-uptime to cookbooks - https://phabricator.wikimedia.org/T419967#11764147 (Ajuanca) What's the simplest cookbook I can run to check the changes? I have tried with `sre.maps.roll-restart-reboot` but I get missing `/etc/cumin/config.yaml`
[14:55:15] SRE-tools, Infrastructure-Foundations, ServiceOps new, SRE, and 2 others: Support locking cookbooks run except for switchover related cookbooks - https://phabricator.wikimedia.org/T330997#11765780 (Blake) Moving this to the backlog for now.
[14:55:45] netops, Infrastructure-Foundations: mr1-eqiad: move from OSPF to BGP - https://phabricator.wikimedia.org/T421238#11765782 (cmooney) p:Triage→Medium
[15:36:28] FIRING: KeyholderUnarmed: 1 unarmed Keyholder key(s) on netmon2002:9100 - https://wikitech.wikimedia.org/wiki/Keyholder - TODO - https://alerts.wikimedia.org/?q=alertname%3DKeyholderUnarmed
[15:46:28] FIRING: [2x] KeyholderUnarmed: 1 unarmed Keyholder key(s) on netmon1003:9100 - https://wikitech.wikimedia.org/wiki/Keyholder - TODO - https://alerts.wikimedia.org/?q=alertname%3DKeyholderUnarmed
[16:01:28] RESOLVED: [2x] KeyholderUnarmed: 1 unarmed Keyholder key(s) on netmon1003:9100 - https://wikitech.wikimedia.org/wiki/Keyholder - TODO - https://alerts.wikimedia.org/?q=alertname%3DKeyholderUnarmed
[16:40:56] elukey: Sorry, I had missed your reply. I'm talking about icinga - the context manager does indeed schedule a second downtime, but both the manual downtime and the contextual one are removed by the reboot-single cookbook
[17:14:46] netops, fundraising-tech-ops, Infrastructure-Foundations: pfw-eqiad NAT for frmx1002.wikimedia.org - https://phabricator.wikimedia.org/T421750 (Jgreen) NEW