[00:02:13] (DiskSpace) resolved: Disk space puppetmaster1001:9100:/ 5.059% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=puppetmaster1001 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace [02:03:39] (SystemdUnitFailed) resolved: docker-reporter-k8s-images.service Failed on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [04:13:40] (SystemdUnitFailed) firing: docker-reporter-k8s-images.service Failed on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [06:03:40] (SystemdUnitFailed) firing: (2) docker-reporter-k8s-images.service Failed on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [07:03:40] (SystemdUnitFailed) firing: (2) docker-reporter-k8s-images.service Failed on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [08:03:40] (SystemdUnitFailed) firing: (2) docker-reporter-k8s-images.service Failed on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [09:00:09] 10netops, 10Infrastructure-Foundations, 10SRE: Put Dell SONiC switches in production - https://phabricator.wikimedia.org/T335028 (10ayounsi) [09:00:17] 10netops, 10Infrastructure-Foundations, 10SRE: Bring Juniper switches in eqiad racks E5-7 and F5-7 online and ready for servers - https://phabricator.wikimedia.org/T334230 (10ayounsi) [09:00:25] 10netops, 10DC-Ops, 10Infrastructure-Foundations, 10SRE, 10ops-eqiad: Cabling for Eqiad racks E5-8 and F5-8 - https://phabricator.wikimedia.org/T334231 (10ayounsi) 05Open→03Resolved Thanks! I'll open a different task if needed to debug any remaining issue, but looks like it's a Dell bug so far. [09:03:40] (SystemdUnitFailed) firing: (2) docker-reporter-k8s-images.service Failed on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [09:18:40] (SystemdUnitFailed) firing: (3) docker-reporter-k8s-images.service Failed on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [09:34:05] 10netops, 10Infrastructure-Foundations, 10SRE: Bring Juniper switches in eqiad racks E5-7 and F5-7 online and ready for servers - https://phabricator.wikimedia.org/T334230 (10cmooney) 05Open→03Resolved All devices online and ready for servers in their racks. [09:38:29] 10netops, 10Infrastructure-Foundations, 10SRE: CRs ECMP traffic to LVS VIPs despite higher MED on backup route - https://phabricator.wikimedia.org/T348446 (10cmooney) 05Open→03Resolved a:03cmooney Closing this task. As mentioned internal traffic may not always go to the activve LVS until we complete T... [09:48:40] (SystemdUnitFailed) firing: (2) docker-reporter-k8s-images.service Failed on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [10:20:33] 10netops, 10Cloud-VPS, 10Infrastructure-Foundations, 10SRE, and 2 others: Change cloud-instance-transport vlan subnets from /30 to /29 - https://phabricator.wikimedia.org/T348140 (10cmooney) Commands for change later on: ` wmcs-openstack port unset ca4cb8c7-bfb8-440b-8e41-74bb8e834717 --fixed-ip subnet=clo... [10:31:24] 10netops, 10Infrastructure-Foundations, 10SRE: Automate L3 Switch to Core Router BGP peerings (and remove OSPF on drmrs switches) - https://phabricator.wikimedia.org/T349125 (10cmooney) 05Open→03Resolved [10:31:32] 10netops, 10Infrastructure-Foundations, 10SRE, 10Patch-For-Review: Consolidate Automation Templates for DC Switches - https://phabricator.wikimedia.org/T312635 (10cmooney) [10:32:10] 10netops, 10Infrastructure-Foundations, 10SRE: Add non-EVPN L3 Switch routing policy definitions to Homer - https://phabricator.wikimedia.org/T344601 (10cmooney) 05Open→03Resolved [10:32:17] 10netops, 10Infrastructure-Foundations, 10SRE, 10Patch-For-Review: Consolidate Automation Templates for DC Switches - https://phabricator.wikimedia.org/T312635 (10cmooney) [10:37:02] 10netops, 10Infrastructure-Foundations, 10SRE, 10Patch-For-Review: Consolidate Automation Templates for DC Switches - https://phabricator.wikimedia.org/T312635 (10cmooney) [10:47:20] 10netops, 10Infrastructure-Foundations, 10SRE: Announce internal/core routes from CRs to L3 switches - https://phabricator.wikimedia.org/T344547 (10cmooney) >>! In T344547#9108360, @ayounsi wrote: > ` > set policy-options policy-statement Switch_out term from_bgp from as-path core_and_local_LVS > set policy-... [13:33:04] 10netops, 10Cloud-VPS, 10Infrastructure-Foundations, 10SRE, 10User-dcaro: Change cloud-instance-transport vlan subnets from /30 to /29 - https://phabricator.wikimedia.org/T348140 (10dcaro) [13:33:56] 10netops, 10Cloud-VPS, 10Infrastructure-Foundations, 10SRE, 10User-dcaro: Change cloud-instance-transport vlan subnets from /30 to /29 - https://phabricator.wikimedia.org/T348140 (10dcaro) 05In progress→03Resolved This went as expected, and all the changes have been applied :) Thanks a lot @cmooney ! [13:48:40] (SystemdUnitFailed) firing: docker-reporter-k8s-images.service Failed on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [14:58:21] (ProbeDown) firing: Service mirror1001:443 has failed probes (http_mirrors_wikimedia_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#mirror1001:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [15:03:21] (ProbeDown) firing: (2) Service mirror1001:443 has failed probes (http_mirrors_wikimedia_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#mirror1001:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [15:18:21] (ProbeDown) resolved: (2) Service mirror1001:443 has failed probes (http_mirrors_wikimedia_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#mirror1001:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [15:25:21] (ProbeDown) firing: (2) Service mirror1001:443 has failed probes (http_mirrors_wikimedia_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#mirror1001:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [16:05:21] (ProbeDown) resolved: (2) Service mirror1001:443 has failed probes (http_mirrors_wikimedia_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#mirror1001:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [17:48:40] (SystemdUnitFailed) firing: docker-reporter-k8s-images.service Failed on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [18:48:25] 10netops, 10Data-Engineering, 10Infrastructure-Foundations, 10SRE, and 2 others: [Maintenance] Netflow/pmacct: use forwardingStatus - https://phabricator.wikimedia.org/T331707 (10Ahoelzl) [18:52:24] 10netops, 10Data-Engineering, 10Infrastructure-Foundations, 10SRE, and 2 others: [Maintenance] Netflow/pmacct: use forwardingStatus - https://phabricator.wikimedia.org/T331707 (10Ahoelzl) 05Open→03Resolved [18:52:39] 10netops, 10Data-Engineering, 10Infrastructure-Foundations, 10SRE, and 2 others: [Maintenance] Netflow/pmacct: use forwardingStatus - https://phabricator.wikimedia.org/T331707 (10Ahoelzl) [19:33:26] 10netops, 10Infrastructure-Foundations, 10SRE: Put Dell SONiC switches in production - https://phabricator.wikimedia.org/T335028 (10VRiley-WMF) [20:43:37] Hey, is there any task regarding the .deb CI packaging in gitlab that I can follow? Is this the right team to be asking? :) [21:48:40] (SystemdUnitFailed) firing: docker-reporter-k8s-images.service Failed on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [22:33:13] (DiskSpace) firing: Disk space puppetmaster1001:9100:/ 5.937% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=puppetmaster1001 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace