[09:36:26] netops, SRE, Traffic: Wikimedias eqsin datacenter has network connectivity issues (?) - https://phabricator.wikimedia.org/T284986 (jcrespo)
[09:36:58] netops, SRE, Traffic, Wikimedia-Incident: Wikimedias eqsin datacenter has network connectivity issues (?) - https://phabricator.wikimedia.org/T284986 (Majavah)
[09:38:40] netops, SRE, Traffic, Wikimedia-Incident: Wikimedias eqsin datacenter has network connectivity issues (?) - https://phabricator.wikimedia.org/T284986 (Peachey88)
[09:54:18] netops, SRE, Traffic, Wikimedia-Incident: Wikimedia's eqsin datacenter (Asia Pacific) had network connectivity issues - https://phabricator.wikimedia.org/T284986 (jcrespo)
[09:55:13] netops, SRE, Traffic, Wikimedia-Incident: Wikimedia's eqsin datacenter (Asia Pacific) had network connectivity issues - https://phabricator.wikimedia.org/T284986 (jbond) Once telia issues have been resolved we need to repool ESQIN. @ayounsi can you confirm when we are good to repool
[11:05:22] netops, SRE: Cleanup confed BGP peerings and policies - https://phabricator.wikimedia.org/T167841 (ayounsi)
[11:05:30] netops, SRE, Sustainability (Incident Followup): ospf link-protection - https://phabricator.wikimedia.org/T167306 (ayounsi) Open→Resolved Closed! After 4 years and 1 week.
[17:31:08] netops, Continuous-Integration-Infrastructure, DC-Ops, serviceops: Flapping codfw management alarm ( contint2001.mgmt/SSH is CRITICAL ) - https://phabricator.wikimedia.org/T283582 (Dzahn) I can confirm since a while these have been happening. The pattern is always: - only mgmt - only codfw - ran...
[18:06:04] netops, SRE, Traffic, Wikimedia-Incident: Wikimedia's eqsin datacenter (Asia Pacific) had network connectivity issues - https://phabricator.wikimedia.org/T284986 (cmooney) I made a typo in the commit msg so this didn't link: https://gerrit.wikimedia.org/r/c/operations/dns/+/699957
[18:42:34] netops, SRE, Traffic, Wikimedia-Incident: Wikimedia's eqsin datacenter (Asia Pacific) had network connectivity issues - https://phabricator.wikimedia.org/T284986 (cmooney) Ok @volans was kind enough to explain how I could just revert the original change instead: https://gerrit.wikimedia.org/r/c/...
[19:12:46] netops, SRE, Traffic, Wikimedia-Incident: Wikimedia's eqsin datacenter (Asia Pacific) had network connectivity issues - https://phabricator.wikimedia.org/T284986 (cmooney) CR merged and DNS updated. All looks good, dns servers are returning the eqsin IPs again and traffic is back to normal level...