[07:09:19] netops, Infrastructure-Foundations, Observability-Logging: ~5k/logs/sec from netdev - https://phabricator.wikimedia.org/T412143#11700757 (ayounsi) > Resolved-In > junos:23.4R1 junos:23.4R2 junos:24.1R1
[08:21:16] FIRING: ProbeDown: Service idp1005:443 has failed probes (http_idp_wikimedia_org_ip6) - https://wikitech.wikimedia.org/wiki/CAS-SSO#Alerting - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[08:26:15] RESOLVED: ProbeDown: Service idp1005:443 has failed probes (http_idp_wikimedia_org_ip6) - https://wikitech.wikimedia.org/wiki/CAS-SSO#Alerting - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[08:51:45] Packaging, Thumbor, Wikimedia-SVG-rendering: Update librsvg to version ≥ 2.54 - https://phabricator.wikimedia.org/T381674#11701151 (TheDJ)
[09:33:25] FIRING: SystemdUnitFailed: gitlab-package-puller.service on apt-staging2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[09:38:25] RESOLVED: SystemdUnitFailed: gitlab-package-puller.service on apt-staging2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[09:39:16] netops, Infrastructure-Foundations, SRE: Eqiad: lsw1-d2-eqiad BGP maintenance - https://phabricator.wikimedia.org/T419647#11701269 (ayounsi)
[10:05:41] netops, Infrastructure-Foundations, SRE: Eqiad: lsw1-d2-eqiad BGP maintenance - https://phabricator.wikimedia.org/T419647#11701359 (tappof)
[10:54:04] netops, DC-Ops, Infrastructure-Foundations, ops-ulsfo, SRE: ULSFO:Switch refresh diagram - https://phabricator.wikimedia.org/T408511#11701471 (ayounsi) I think the factory reset helped. I then temporarily copied the TLS config from asw1-22, and ran the TLS cookbook and we're all good. So now...
[14:15:29] netops, Infrastructure-Foundations, SRE: Eqiad: lsw1-d2-eqiad BGP maintenance - https://phabricator.wikimedia.org/T419647#11702362 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=cf71bad1-aeb4-4596-b577-d88e4e171aab) set by ayounsi@cumin1003 for 0:30:00 on 24 host(s) and their servi...
[14:21:59] netops, Infrastructure-Foundations, SRE: Eqiad: lsw1-d2-eqiad BGP maintenance - https://phabricator.wikimedia.org/T419647#11702397 (ayounsi) BGP bounce done by running those 2 commands "at the same time": ` tools network-instance default protocols bgp neighbor 10.64.128.17 reset-peer tools network-in...
[14:27:01] netops, Infrastructure-Foundations, SRE: Eqiad: lsw1-d2-eqiad BGP maintenance - https://phabricator.wikimedia.org/T419647#11702422 (ayounsi) Open→Resolved All servers have been repooled.
[14:39:11] netops, Infrastructure-Foundations, sre-alert-triage: Alert in need of triage: PeeringBGPDown (instance cr3-eqsin:9804) - https://phabricator.wikimedia.org/T419854 (tappof) NEW
[14:39:37] netops, Infrastructure-Foundations, sre-alert-triage: Alert in need of triage: PeeringBGPDown (instance cr3-eqsin:9804) - https://phabricator.wikimedia.org/T419855 (tappof) NEW
[14:39:53] netops, Infrastructure-Foundations, sre-alert-triage: Alert in need of triage: PeeringBGPDown (instance cr1-esams:9804) - https://phabricator.wikimedia.org/T419856 (tappof) NEW
[14:40:23] netops, Infrastructure-Foundations, sre-alert-triage: Alert in need of triage: PeeringBGPDown (instance cr1-esams:9804) - https://phabricator.wikimedia.org/T419857 (tappof) NEW
[14:40:36] netops, Infrastructure-Foundations, sre-alert-triage: Alert in need of triage: PeeringBGPDown (instance cr3-eqsin:9804) - https://phabricator.wikimedia.org/T419858 (tappof) NEW
[14:40:54] netops, Infrastructure-Foundations, sre-alert-triage: Alert in need of triage: PeeringBGPDown (instance cr3-eqsin:9804) - https://phabricator.wikimedia.org/T419859 (tappof) NEW
[14:45:25] FIRING: SystemdUnitFailed: check_netbox_uncommitted_dns_changes.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[14:50:25] RESOLVED: SystemdUnitFailed: check_netbox_uncommitted_dns_changes.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[18:03:25] FIRING: SystemdUnitFailed: check_netbox_uncommitted_dns_changes.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[18:08:25] RESOLVED: SystemdUnitFailed: check_netbox_uncommitted_dns_changes.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[23:07:46] netops, Infrastructure-Foundations, SRE: InboundInterfaceErrors alerts firing for Nokia switches on v25.10.1 - https://phabricator.wikimedia.org/T412733#11704748 (Papaul) Last update from Nokia today ` The following was added as a limitation under release notes: Management Release:25.10.2 Section:...