[09:25:44] netops, Infrastructure-Foundations, SRE: Link failure between mr1-eqiad and asw2-a8-eqiad Aug 13th 2021 - https://phabricator.wikimedia.org/T288834 (ayounsi) Looks like the whole of FPC8 failed, I opened JTAC case 2021-0816-0128. Good thing you saved the logs as they now fully rolled over.
[09:35:18] netops, Infrastructure-Foundations, SRE: Link failure between mr1-eqiad and asw2-a8-eqiad Aug 13th 2021 - https://phabricator.wikimedia.org/T288834 (cmooney) Thanks @ayounsi yeah I was looking there that seems to be the case. Logs from a host connected also seem to confirm the entire switch died: `...
[09:41:01] netops, Infrastructure-Foundations, SRE: Switch failure: asw2-a8-eqiad Aug 13th 2021 - https://phabricator.wikimedia.org/T288834 (cmooney)
[09:42:33] netops, Infrastructure-Foundations, SRE: Switch failure: asw2-a8-eqiad Aug 13th 2021 - https://phabricator.wikimedia.org/T288834 (cmooney)
[09:50:49] netops, Infrastructure-Foundations, SRE, Datacenter-Switchover, User-fgiunchedi: Record traffic flows in and out of eqiad during switchover - https://phabricator.wikimedia.org/T286038 (ayounsi) @fgiunchedi anything left to do for netops or is it ok to close the task?
[10:00:56] netops, Infrastructure-Foundations, SRE: Traffic Engineering for Anycast Ranges - https://phabricator.wikimedia.org/T288843 (ayounsi) Some thoughts: * We need to find the good balance between config complexity and low latency for users, otherwise it's going to be a cat and mouse game, fixing special...
[10:19:39] netops, Infrastructure-Foundations, SRE: Traffic Engineering for Anycast Ranges - https://phabricator.wikimedia.org/T288843 (cmooney) Agreed we need to balance complexity and usefulness. A few points: - I think it's too complex to consider doing this for peers at IXPs. - I would only anticipate...
[12:59:24] netops, Infrastructure-Foundations, SRE: Traffic Engineering for Anycast Ranges - https://phabricator.wikimedia.org/T288843 (ssingh) (Thanks Cathal for filing this task!) >>! In T288843#7284411, @ayounsi wrote: > Some thoughts: > [...] > @ssingh what's the timeline for Wikidough? So we know how to p...
[16:45:06] netops, Infrastructure-Foundations, SRE, ops-eqiad: Switch failure: asw2-a8-eqiad Aug 13th 2021 - https://phabricator.wikimedia.org/T288834 (ayounsi) p:Triage→High a:Cmjohnson JTAC pointed out that the switch failure matches with a VCP (to the backup spine) going down: ` ayounsi@asw2...
[17:13:10] netops, Infrastructure-Foundations, SRE, ops-eqiad: Switch failure: asw2-a8-eqiad Aug 13th 2021 - https://phabricator.wikimedia.org/T288834 (ayounsi) p:High→Low a:Cmjohnson→ayounsi Keeping it a bit for monitoring, will close if no more interfaces errors.
[17:37:42] netops, Infrastructure-Foundations, SRE, cloud-services-team (Kanban): Join ARIN waiting list to request additional IPv4 resources. - https://phabricator.wikimedia.org/T288342 (Andrew) @aborrero is on holiday for a bit -- I'd like to hear from him as well but here's my current thought: With our...