[04:34:26] FIRING: SystemdUnitFailed: generate_vrts_aliases.service on mx2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[05:29:26] RESOLVED: SystemdUnitFailed: generate_vrts_aliases.service on mx2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[06:32:04] inflatador: it was alerting as it was in status "active" but not in Puppet. I set it to "failed" in Netbox to clear the alert. I also couldn't find a relevant task when I looked for why it was alerting.
[09:22:01] CAS-SSO, Infrastructure-Foundations, SRE: Update CAS to 7.0 - https://phabricator.wikimedia.org/T367487#10080378 (SLyngshede-WMF) Open→Resolved
[09:24:10] CAS-SSO, Infrastructure-Foundations: Decommission CAS 6 hosts - https://phabricator.wikimedia.org/T372997 (SLyngshede-WMF) NEW
[09:24:13] CAS-SSO, Infrastructure-Foundations: Decommission CAS 6 hosts - https://phabricator.wikimedia.org/T372997#10080392 (SLyngshede-WMF) p:Triage→Low
[10:55:38] netops, Infrastructure-Foundations, SRE: Packet loss reflected in NELs for traffic to Reliance Jio Infocomm Ltd over BBIX Singapore - https://phabricator.wikimedia.org/T373015 (cmooney) NEW p:Triage→Low
[11:23:34] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Packet loss reflected in NELs for traffic to Reliance Jio Infocomm Ltd over BBIX Singapore - https://phabricator.wikimedia.org/T373015#10080912 (cmooney) Path is now avoided again and packet loss is no longer observable: ` cmooney@bast500...
[12:20:23] netops, Infrastructure-Foundations, serviceops, Traffic: weighted maglev viability for low-traffic services - https://phabricator.wikimedia.org/T368545#10081016 (Vgutierrez) Open→Resolved
[12:54:37] netbox, Infrastructure-Foundations, Patch-For-Review: pynetbox incompatibility with Netbox >= 4.0.6 - https://phabricator.wikimedia.org/T371890#10081172 (ayounsi) p:High→Medium I build a pynetbox 7.4.0 using the new pipeline : https://gitlab.wikimedia.org/repos/sre/pynetbox https://gitlab.wik...
[12:56:34] netbox, Infrastructure-Foundations, Patch-For-Review: New hosts with "Netbox status: unknown" - https://phabricator.wikimedia.org/T371653#10081175 (ayounsi) Open→Resolved Updated pynetbox package has been pushed to cumin hosts to unblock the situation.
[13:31:34] netops, DC-Ops, Infrastructure-Foundations, ops-codfw, and 3 others: lvs2014: move uplink to lsw1-d2-codfw and connect to per-rack vlan - https://phabricator.wikimedia.org/T370897#10081293 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=f69725b2-8f49-49fe-8766-ce7bb9ffa253) set...
[13:33:55] netops, DC-Ops, Infrastructure-Foundations, ops-codfw, and 3 others: lvs2014: move uplink to lsw1-d2-codfw and connect to per-rack vlan - https://phabricator.wikimedia.org/T370897#10081302 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=b60d0e72-74ce-4dee-9bed-2acca82f8655) set...
[13:55:31] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Core router error logs: "sshd: Did not receive identification string" from prometheus hosts - https://phabricator.wikimedia.org/T368513#10081413 (ayounsi) Confirmed that cr1-eqiad stopped generating those logs for 10.64.0.82 (prometheus10...
[14:03:18] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Core router error logs: "sshd: Did not receive identification string" from prometheus hosts - https://phabricator.wikimedia.org/T368513#10081484 (ayounsi) Open→Resolved a:ayounsi All done !
[14:32:50] netops, Infrastructure-Foundations, SRE, Patch-For-Review: PuppetDB import failing for lvs2014 - https://phabricator.wikimedia.org/T372931#10081634 (cmooney) Open→Resolved a:cmooney Working ok following update to the puppetdb import script. The issue was not actually caused by Ne...
[14:47:27] netops, DC-Ops, Infrastructure-Foundations, ops-codfw, and 3 others: lvs2014: move uplink to lsw1-d2-codfw and connect to per-rack vlan - https://phabricator.wikimedia.org/T370897#10081718 (cmooney) a:cmooney→None Work completed, no issues to report (although I had to downgrade the NIC fi...
[14:55:28] netops, Infrastructure-Foundations, serviceops, SRE, Patch-For-Review: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10081765 (Clement_Goubert)
[15:35:34] netops, Infrastructure-Foundations, serviceops, SRE, Patch-For-Review: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10081944 (Clement_Goubert)
[20:49:26] FIRING: SystemdUnitFailed: generate_vrts_aliases.service on mx2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[21:44:26] RESOLVED: SystemdUnitFailed: generate_vrts_aliases.service on mx2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed