[02:04:26] RESOLVED: SystemdUnitFailed: generate_os_reports.service on puppetdb2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[06:09:26] FIRING: SystemdUnitFailed: generate_vrts_aliases.service on mx-in2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[06:48:58] netbox, Infrastructure-Foundations, Patch-For-Review: pynetbox incompatibility with Netbox >= 4.0.6 - https://phabricator.wikimedia.org/T371890#10076351 (ops-monitoring-bot) Deployed netbox to netbox-dev2003.codfw.wmnet with reason: Update Netbox-next wheels - ayounsi@cumin1002 - T371890
[07:05:44] RESOLVED: SystemdUnitFailed: generate_vrts_aliases.service on mx-in2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[07:18:31] netbox, Infrastructure-Foundations, Patch-For-Review: pynetbox incompatibility with Netbox >= 4.0.6 - https://phabricator.wikimedia.org/T371890#10076365 (ops-monitoring-bot) Deployed netbox to netbox2003.codfw.wmnet,netbox1003.eqiad.wmnet with reason: Update Netbox wheels - ayounsi@cumin1002 - T371890
[10:00:28] netops, Infrastructure-Foundations, serviceops, Traffic: weighted maglev viability for low-traffic services - https://phabricator.wikimedia.org/T368545#10076853 (Vgutierrez) A quick test using IPVS maglev implementation with mh-port flag enabled (to include the source port as part of the load bal...
[11:22:37] netops, Infrastructure-Foundations, serviceops, SRE: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878 (Clement_Goubert) NEW
[11:25:27] netops, Infrastructure-Foundations, serviceops, SRE: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10077064 (Clement_Goubert) p:Triage→High
[11:41:52] netops, Infrastructure-Foundations, serviceops, SRE: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10077137 (Clement_Goubert) From what I can gather the automation is there with the `--move-vlan` option to the reimage cookbook, I th...
[12:20:11] netops, Infrastructure-Foundations, serviceops, SRE: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10077212 (ayounsi) > I need to check that the physical cabling changes are ok before we start Physical cabling is on the new switches...
[14:31:24] netops, DC-Ops, Infrastructure-Foundations, ops-eqiad: cr1-eqiad: disk failure - https://phabricator.wikimedia.org/T372781#10077810 (VRiley-WMF) @ayounsi I've checked the device and there doesn't seem to be any failure notifications (Physically anyway). Would it be possible to open up a RMA or Su...
[14:43:25] netops, DC-Ops, Infrastructure-Foundations, ops-eqiad: cr1-eqiad: disk failure - https://phabricator.wikimedia.org/T372781#10077862 (cmooney) >>! In T372781#10077810, @VRiley-WMF wrote: > @ayounsi I've checked the device and there doesn't seem to be any failure notifications (Physically anyway)....
[14:44:49] netops, DC-Ops, Infrastructure-Foundations, ops-eqiad: cr1-eqiad: disk failure - https://phabricator.wikimedia.org/T372781#10077866 (VRiley-WMF) Sounds like a plan. Thank you! I will be at the ready.
[15:02:33] netops, DC-Ops, Infrastructure-Foundations, ops-codfw, and 2 others: lvs2012: Move existing row C & D vlans to primary uplink and add new ones - https://phabricator.wikimedia.org/T370862#10077948 (cmooney) Open→Resolved >>! In T370862#10035781, @Papaul wrote: > @cmooney links removed....
[15:14:39] netops, Infrastructure-Foundations, serviceops, SRE, Patch-For-Review: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10078004 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.rename started by cgoubert@cumin1002 from mw2291 to...
[15:15:17] netops, Infrastructure-Foundations, serviceops, SRE, Patch-For-Review: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10078010 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cgoubert@cumin1002 for host w...
[15:17:14] netops, Infrastructure-Foundations, serviceops, SRE, Patch-For-Review: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10078014 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cgoubert@cumin1002 for host wikik...
[15:28:38] netops, Infrastructure-Foundations, serviceops, SRE, Patch-For-Review: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10078072 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cgoubert@cumin1002 for host w...
[15:48:05] Let me know if you have ideas of VMs in codfw that can be moved (or have new nodes) to the routed ganeti setup. Ideally nothing critical, it can be test VMs or redundant services for example. https://phabricator.wikimedia.org/T372909
[15:50:11] XioNoX: one of the idp-tests perhaps
[15:51:03] good idea!
I'll chat with slyngs :)
[15:51:24] next, move one of the docker registries ;)
[15:52:21] When is moritz back? Maybe we can migrate it all by then :)
[15:56:35] I heard in 2 weeks
[15:57:27] netops, Infrastructure-Foundations, Patch-For-Review: Netbox ProvisionServer script fails vlan verification - https://phabricator.wikimedia.org/T372654#10078233 (cmooney) The above patch will prevent this causing an issue when we follow the normal workflow - selecting a vlan 'type' (public/privat...
[16:00:23] netops, Infrastructure-Foundations, Patch-For-Review: Netbox ProvisionServer script fails vlan verification - https://phabricator.wikimedia.org/T372654#10078237 (cmooney) Open→Resolved a:cmooney
[16:16:06] netops, Infrastructure-Foundations, serviceops, SRE, Patch-For-Review: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10078336 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cgoubert@cumin1002 for host wikik...
[16:25:27] netops, DC-Ops, Infrastructure-Foundations, ops-codfw, and 3 others: lvs2014: move uplink to lsw1-d2-codfw and connect to per-rack vlan - https://phabricator.wikimedia.org/T370897#10078401 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=24f68f00-c864-474e-a3e6-c044aab86afa) set...
[16:25:39] netops, DC-Ops, Infrastructure-Foundations, ops-codfw, and 3 others: lvs2014: move uplink to lsw1-d2-codfw and connect to per-rack vlan - https://phabricator.wikimedia.org/T370897#10078403 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=612388e5-b8df-408f-81be-6f237cee6e7c) set...
[16:54:16] XioNoX I understand you reached out about wdqs2024 being in FAILED in netbox...I'm about to fix using https://wikitech.wikimedia.org/wiki/Server_Lifecycle#Active_-%3E_Failed and reimage.
LMK if you have additional context
[16:56:22] netops, DC-Ops, Infrastructure-Foundations, ops-codfw, and 3 others: lvs2014: move uplink to lsw1-d2-codfw and connect to per-rack vlan - https://phabricator.wikimedia.org/T370897#10078582 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cmooney@cumin1002 for host lvs2014....
[18:04:02] netops, DC-Ops, Infrastructure-Foundations, ops-codfw, and 3 others: lvs2014: move uplink to lsw1-d2-codfw and connect to per-rack vlan - https://phabricator.wikimedia.org/T370897#10078855 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cmooney@cumin1002 for host lvs2014.codf...
[18:42:38] netops, Infrastructure-Foundations, SRE: PuppetDB import failing for lvs2014 - https://phabricator.wikimedia.org/T372931 (cmooney) NEW
[18:46:04] netops, Infrastructure-Foundations, SRE: PuppetDB import failing for lvs2014 - https://phabricator.wikimedia.org/T372931#10078999 (ssingh) As another data point, we most certainly have not reimaged any LVS host //after// the Netbox migration was finished. So yeah, it might be related to that.
[21:24:26] FIRING: SystemdUnitFailed: generate_vrts_aliases.service on mx-in2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[22:19:26] RESOLVED: SystemdUnitFailed: generate_vrts_aliases.service on mx-in2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed