[00:16:35] (PurgedHighEventLag) resolved: (2) High event process lag with purged on cp2031:2112 - https://wikitech.wikimedia.org/wiki/Purged#Alerts - https://grafana.wikimedia.org/d/RvscY1CZk/purged?var-datasource=codfw%20prometheus/ops&var-instance=cp2031 - https://alerts.wikimedia.org/?q=alertname%3DPurgedHighEventLag
[00:16:35] (PurgedHighBacklogQueue) resolved: (2) Large backlog queue for purged on cp2031:2112 - https://wikitech.wikimedia.org/wiki/Purged#Alerts - https://grafana.wikimedia.org/d/RvscY1CZk/purged?var-datasource=codfw%20prometheus/ops&var-instance=cp2031 - https://alerts.wikimedia.org/?q=alertname%3DPurgedHighBacklogQueue
[00:50:45] (VarnishPrometheusExporterDown) firing: Varnish Exporter on instance cp2031:9331 is unreachable - TODO - TODO - https://alerts.wikimedia.org/?q=alertname%3DVarnishPrometheusExporterDown
[01:23:44] netops, Infrastructure-Foundations, SRE, ops-codfw: codfw: Relocate servers to make space for new switches in rowA and rowB - https://phabricator.wikimedia.org/T326564 (Papaul)
[01:25:55] netops, Infrastructure-Foundations, SRE, ops-codfw: codfw: Relocate servers to make space for new switches in rowA and rowB - https://phabricator.wikimedia.org/T326564 (Papaul)
[01:32:58] netops, Infrastructure-Foundations, SRE, ops-codfw: codfw: Relocate servers to make space for new switches in rowA and rowB - https://phabricator.wikimedia.org/T326564 (Papaul)
[07:39:48] netops, Infrastructure-Foundations, SRE, conftool, and 2 others: Scap deploy failed to depool codfw servers - https://phabricator.wikimedia.org/T327041 (Joe) Open→Resolved This is now fully resolved.
[08:51:50] netops, Infrastructure-Foundations, SRE, fundraising-tech-ops: Upgrade fasw to Junos 21 - https://phabricator.wikimedia.org/T316542 (ayounsi) Some notes from {T316532} Make sure console access works. Before the upgrade, remove this configuration stanza, otherwise the `request system software add...
[09:38:44] netops, Infrastructure-Foundations, SRE, Sustainability (Incident Followup): Cr1-eqiad comms problem when moving to 40G row D handoff - https://phabricator.wikimedia.org/T320566 (ayounsi) Seeing what happened with codfw row B, it's safe to assume that only a reboot of the faulty switch member wil...
[10:35:34] netops, Infrastructure-Foundations: eqiad/codfw virtual-chassis upgrades - https://phabricator.wikimedia.org/T327248 (ayounsi)
[10:53:38] (LVSHighRX) firing: Excessive RX traffic on lvs3005:9100 (ens3f0np0) #page - https://bit.ly/wmf-lvsrx - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=lvs3005 - https://alerts.wikimedia.org/?q=alertname%3DLVSHighRX
[10:58:38] (LVSHighRX) resolved: Excessive RX traffic on lvs3005:9100 (ens3f0np0) #page - https://bit.ly/wmf-lvsrx - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=lvs3005 - https://alerts.wikimedia.org/?q=alertname%3DLVSHighRX
[18:41:38] netops, Infrastructure-Foundations, SRE, cloud-services-team: neutron: cloudnet nodes use VRRP over VXLAN to instrument HA and they require to be on the same subnet - https://phabricator.wikimedia.org/T319539 (fnegri)
[18:46:38] netops, Cloud-VPS, Infrastructure-Foundations, SRE, cloud-services-team: Move cloud vps ns-recursor IPs to host/row-independent addressing - https://phabricator.wikimedia.org/T307357 (fnegri)
[18:51:35] netops, Infrastructure-Foundations, SRE, cloud-services-team: Join ARIN waiting list to request additional IPv4 resources. - https://phabricator.wikimedia.org/T288342 (fnegri)
[18:59:56] netops, Cloud-VPS, Infrastructure-Foundations, cloud-services-team, Epic: CloudVPS: network architecture - https://phabricator.wikimedia.org/T209460 (fnegri)
[19:28:58] netops, Infrastructure-Foundations, SRE, cloud-services-team: ceph: test and decide 1 network interface setup - https://phabricator.wikimedia.org/T325531 (fnegri)
[19:29:56] netops, Infrastructure-Foundations, SRE, cloud-services-team, Patch-For-Review: Move WMCS servers to 1 NIC - https://phabricator.wikimedia.org/T319184 (fnegri)
[19:33:11] netops, DNS, Infrastructure-Foundations, SRE, and 2 others: Cloud: define relationship between wikimediacloud.org domain, CIDR prefixes and netbox automation - https://phabricator.wikimedia.org/T266331 (fnegri)
[19:34:17] netops, Infrastructure-Foundations, SRE, cloud-services-team: CloudVPS: IPv6 early PoC - https://phabricator.wikimedia.org/T245495 (fnegri)
[19:42:18] netops, Cloud-VPS, Infrastructure-Foundations, SRE, cloud-services-team: Upgrade cloudsw1-c8-eqiad and cloudsw1-d5-eqiad to Junos 20+ - https://phabricator.wikimedia.org/T316544 (fnegri)
[19:44:46] netops, Infrastructure-Foundations, SRE, IPv6, and 2 others: Fix IPv6 autoconf issues once and for all, across the fleet. - https://phabricator.wikimedia.org/T102099 (BBlack) Bump - these issues continue to affect us sometimes. There seem to be some cases where Juniper can mis-route an RA to an...
[19:44:56] netops, Data-Services, Infrastructure-Foundations, Wikidata, and 5 others: Do not rate limit dumps from internal network - https://phabricator.wikimedia.org/T222349 (fnegri)
[19:46:52] Acme-chief, cloud-services-team, IPv6, Patch-For-Review: tools-acme-chief-01 is attempting to validate DNS challenge against cloud authdns IPv6 addresses - https://phabricator.wikimedia.org/T245937 (fnegri)
[19:48:00] netops, Infrastructure-Foundations, SRE, cloud-services-team: CloudVPS: enable BGP in the neutron transport network - https://phabricator.wikimedia.org/T245606 (fnegri)
[19:58:39] netops, Infrastructure-Foundations, SRE, IPv6, and 2 others: Fix IPv6 autoconf issues once and for all, across the fleet. - https://phabricator.wikimedia.org/T102099 (BBlack) I fixed all these cases noted above for now. Note that in the lvs1017 case, this could've potentially caused a public ser...