[00:43:42] (SystemdUnitFailed) resolved: httpbb_kubernetes_mw-web_hourly.service Failed on cumin1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[08:45:06] SRE-tools, netops, Infrastructure-Foundations, SRE: Add network devices fingerprints to known_hosts - https://phabricator.wikimedia.org/T327643 (jbond) > Netbox would be better. +1 this would also allow use to have them in the netbox-hiera pipeline which in turn makes it easier to add them all to...
[09:44:37] SRE-tools, netops, Infrastructure-Foundations, SRE: DHCP traffic to install server is missing - https://phabricator.wikimedia.org/T337345 (cmooney) >>! In T337345#8878207, @Jclark-ctr wrote: > @ayounsi the provisioning script is still failing in row e/f. dbproxy1026 dbproxy1027 I tested there...
[14:21:42] (SystemdUnitFailed) firing: (3) httpbb_kubernetes_mw-api-ext_hourly.service Failed on cumin2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[14:31:42] (SystemdUnitFailed) firing: (3) httpbb_kubernetes_mw-api-ext_hourly.service Failed on cumin2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[14:36:42] (SystemdUnitFailed) resolved: (3) httpbb_kubernetes_mw-api-ext_hourly.service Failed on cumin2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[15:05:00] netops, Infrastructure-Foundations, SRE: Migrate row E/F network aggregation to dedicated Spine switches - https://phabricator.wikimedia.org/T322937 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=03f7b2ab-bdea-4c56-ac41-3ec30004db4a) set by cmooney@cumin1001 for 0:30:00 on 2 host(s...
[15:21:08] netops, Infrastructure-Foundations, SRE: Migrate row E/F network aggregation to dedicated Spine switches - https://phabricator.wikimedia.org/T322937 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=cf76e0ba-8648-48a0-beed-fe7b60f79656) set by cmooney@cumin1001 for 0:30:00 on 2 host(s...
[15:34:03] netops, Infrastructure-Foundations, SRE: Migrate row E/F network aggregation to dedicated Spine switches - https://phabricator.wikimedia.org/T322937 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=8f44dd48-0cac-4bfd-907a-512dfa686d40) set by cmooney@cumin1001 for 0:30:00 on 2 host(s...
[15:57:16] netops, Infrastructure-Foundations, SRE: Migrate row E/F network aggregation to dedicated Spine switches - https://phabricator.wikimedia.org/T322937 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=c43be552-7ced-4f58-99c1-a10b5984bf3a) set by cmooney@cumin1001 for 0:30:00 on 2 host(s...
[16:11:36] netops, Infrastructure-Foundations, SRE: Migrate row E/F network aggregation to dedicated Spine switches - https://phabricator.wikimedia.org/T322937 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=37545969-c51e-450d-9ef0-5fadfd151520) set by cmooney@cumin1001 for 0:30:00 on 3 host(s...
[16:48:51] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Create Quality of Service design for WMF internal networks - https://phabricator.wikimedia.org/T316358 (cmooney)
[16:49:44] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Expose sub-rated circuit speeds to Homer templates - https://phabricator.wikimedia.org/T328313 (cmooney) Open→Resolved Merged and shapers set on codfw to eqsin link.
[17:06:16] netops, DC-Ops, Infrastructure-Foundations, SRE, ops-eqiad: Q2:(Need By: TBD) Rows E/F network racking task - https://phabricator.wikimedia.org/T292095 (cmooney) We migrated a bunch of network <-> network links today without issue (crossed them out in above table). Didn't touch the LVS's aft...
[17:41:40] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Migrate row E/F network aggregation to dedicated Spine switches - https://phabricator.wikimedia.org/T322937 (cmooney) Step 2 - Move CR Uplinks has now been completed. We are also 50% of the way through steps 3 and 4. Will continue with...
[18:17:01] netbox, Infrastructure-Foundations, Patch-For-Review: Netbox: use Custom Model Validation - https://phabricator.wikimedia.org/T310590 (cmooney) ^^ this patch is to deal with an issue I hit modifying cable (changing status from planned to connected). ` >>> cable = nb.dcim.cables.get(label="2023020702"...
[20:43:42] (SystemdUnitFailed) firing: httpbb_kubernetes_mw-api-int_hourly.service Failed on cumin1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[20:53:42] (SystemdUnitFailed) firing: (2) httpbb_kubernetes_mw-api-int_hourly.service Failed on cumin1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[21:18:42] (SystemdUnitFailed) firing: (2) httpbb_kubernetes_mw-api-int_hourly.service Failed on cumin1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[21:43:42] (SystemdUnitFailed) resolved: httpbb_kubernetes_mw-api-int_hourly.service Failed on cumin1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[22:15:13] (DiskSpace) firing: Disk space krb1001:9100:/ 5.969% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=krb1001 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace