[03:07:44] netops, Continuous-Integration-Infrastructure, DC-Ops, ops-codfw: DRAC firmware upgrades codfw (was: Flapping codfw management alarm ( contint2001.mgmt/SSH is CRITICAL )) - https://phabricator.wikimedia.org/T283582 (Papaul)
[03:34:34] netops, Continuous-Integration-Infrastructure, DC-Ops, ops-codfw: DRAC firmware upgrades codfw (was: Flapping codfw management alarm ( contint2001.mgmt/SSH is CRITICAL )) - https://phabricator.wikimedia.org/T283582 (Papaul)
[06:54:15] netops, Infrastructure-Foundations, SRE: ripe-atlas-codfw is down - https://phabricator.wikimedia.org/T267714 (elukey) Hello folks! Not sure if already scheduled but it seems that the current icinga checks for the codfw ripe atlas are getting a 410 gone, do we need to update the `ripeatlas_measuremen...
[06:57:43] jbond: thank you for the review/merge of systemd timers yesterday. They work all fine with some `splay` https://phabricator.wikimedia.org/T292729#7443288 ;)
[08:37:13] netbox, Infrastructure-Foundations: Agree how to document intra-DC patch panels in Netbox - https://phabricator.wikimedia.org/T293221 (Volans) >>! In T293221#7441958, @cmooney wrote: > That said if doing it the "proper" Netbox way messes up automation or reports then maybe that is a reason to hold off....
[12:18:35] so I've been noticing on procurement requests
[12:18:46] the forms say "OS Distro: Buster (default unless otherwise specified)"
[12:19:01] is that known to you all? I fear this may be just inertia and not what the intention really is
[12:19:10] moritzm: ^ perhaps :)
[12:21:01] paravoid: although used only for VMs right now and to be decommissioned, the default in the DHCP setting set in puppet still points pxelinux.pathprefix to buster by default
[12:21:17] so my guess is that it wasn't yet officially moved to bullseye
[12:21:30] (the default distro)
[12:22:12] but it's a good point, I suppose these are generated from a Phab template, will talk to Rob to get it updated
[12:22:49] if only to reduce confusion during racking
[12:23:12] netops, Infrastructure-Foundations, SRE, Patch-For-Review: ripe-atlas-codfw is down - https://phabricator.wikimedia.org/T267714 (ayounsi) >>! In T267714#7443286, @elukey wrote: > Hello folks! Not sure if already scheduled but it seems that the current icinga checks for the codfw ripe atlas are ge...
[12:26:09] netops, Infrastructure-Foundations, SRE, Patch-For-Review: ripe-atlas-codfw is down - https://phabricator.wikimedia.org/T267714 (cmooney) In progress→Resolved Cool, thanks @ayounsi. Good insight into how those alerts are configured. I'll know for the next time to update them too :)
[14:01:57] netops, Infrastructure-Foundations, SRE: Eqiad Expansion - LVS Connectivity Options - https://phabricator.wikimedia.org/T292630 (cmooney) IRC update from Brandon. Traffic are checking if option 2B is viable with management. > Brandon Black > topranks: question_mark is going to talk with f...
[15:09:18] XioNoX: topranks: did you see this "IPv6 Connectivity issue Telia-Level3" mail to noc@
[15:09:28] seems like the no-export stuff might have had some unexpected fallout?
[15:10:18] cdanis: interesting indeed
[15:11:26] might be related to https://phabricator.wikimedia.org/T288843
[15:11:33] but shouldn't impact v6
[15:13:46] hmm yeah and not for existing prefixes
[15:13:48] interesting
[15:16:30] just confirmed that v6 in general and that specific prefix is not impacted by the Anycast tuning
[15:38:52] netops, DC-Ops, Infrastructure-Foundations, SRE, and 2 others: Q1:(Need By: TBD) rack/setup/install cloudswift100[12] - https://phabricator.wikimedia.org/T289882 (aborrero) a:aborrero→ayounsi
[15:38:53] emailed Telia's NOC
[15:39:26] I'm wondering if we shouldn't drop the BGP session to Telia
[15:50:04] catching up...
[15:50:18] You mean bouncing the session to Telia to see if something changes?
[15:51:05] clear out probably is worth a shot alright. I'm not sure dropping the session completely would achieve much.
[15:52:15] topranks: it would force Level3 to learn the prefix from somewhere else
[15:52:39] (another of our transits)
[15:52:58] and thus restoring connectivity, but it would also make troubleshooting more difficult
[15:53:23] fair point yeah. We could also maybe just withdraw that one prefix out to Telia for a similar result.
[15:54:58] yep, a bit more tricky as it means adapting filters
[15:55:06] indeed.
[15:55:23] as there is no signs of larger issue I suggest we leave it as it for now
[15:55:38] I'd maybe try a clear outbound on session to Telia first, to see if they add the no-export again upon receipt of a new UPDATE for it. (checking in Level 3 looking glass for it afterwards).
[15:55:40] hopefully telia will get to the bottom of it quickly
[15:56:07] topranks: the no-export is not in their routers
[15:56:47] it's in the Level 3 output though
[15:57:07] so whoever is setting it, Telia outbound or Level 3 inbound, they may not set it again if they re-learn the route.
[15:57:08] there is only 1 unknown bgp community 1299:1000
[15:57:13] Chances are they *will*
[15:58:42] I bounced the session, let's see
[15:59:04] ah sorry, yeah looking at Level 3 looking glass I don't see the "no export", it's in the email from the end user, supposedly from Level 3 looking glass so I took that at face value.
[15:59:07] https://lg.twelve99.net/?type=bgp&router=mei-b3&address=2620:0:862::/48
[15:59:46] I didn't double check what the user said though
[16:00:55] nevermind, it's there https://lookingglass.centurylink.com/ " Communities: 3356:2 3356:22 3356:86 3356:502 3356:601 3356:666 3356:901 3356:2090 3356:11281 no-export"
[16:01:02] Yeah
[16:01:12] Only in Marseille though.
[16:01:17] Not in Paris or Dublin.
[16:01:46] did they screw something up when configuring our Marseille port?
[16:04:34] Ah yeah good thinking, might explain it, some typo on a policy or something
[16:05:30] now we know the scope is very small
[16:05:53] one pop for single homed providers
[16:05:59] and v6 only
[16:06:30] seems like it yeah
[16:13:28] XioNoX: topranks: also this shouldn't be user-affecting I think -- happy eyeballs should kick in
[16:13:44] yep
[16:15:05] yeah good old v4 internet to the rescue :)
[16:15:26] NELs don't show any more than the usual background rate of `tcp.address_unreachable` from France or from elsewhere
[16:30:30] netops, Continuous-Integration-Infrastructure, DC-Ops, ops-codfw: DRAC firmware upgrades codfw (was: Flapping codfw management alarm ( contint2001.mgmt/SSH is CRITICAL )) - https://phabricator.wikimedia.org/T283582 (Papaul)
[16:47:03] Puppet, Infrastructure-Foundations, SRE: package_builder puppet tests failing - https://phabricator.wikimedia.org/T293912 (Legoktm)
[17:00:13] netops, Continuous-Integration-Infrastructure, DC-Ops, ops-codfw: DRAC firmware upgrades codfw (was: Flapping codfw management alarm ( contint2001.mgmt/SSH is CRITICAL )) - https://phabricator.wikimedia.org/T283582 (Papaul)
[17:36:48] Puppet, Infrastructure-Foundations, SRE: package_builder puppet tests failing - https://phabricator.wikimedia.org/T293912 (Dzahn) a:Dzahn I'll take a look
[19:57:33] SRE-tools, Observability-Logging, Spicerack: Create a cookbook for managing Logstash cluster restarts - https://phabricator.wikimedia.org/T293929 (colewhite)
[21:09:40] netops, Continuous-Integration-Infrastructure, DC-Ops, ops-codfw: DRAC firmware upgrades codfw (was: Flapping codfw management alarm ( contint2001.mgmt/SSH is CRITICAL )) - https://phabricator.wikimedia.org/T283582 (Dzahn) db2078.mgmt mw2253.mgmt
[22:02:18] netops, Continuous-Integration-Infrastructure, DC-Ops, ops-codfw: DRAC firmware upgrades codfw (was: Flapping codfw management alarm ( contint2001.mgmt/SSH is CRITICAL )) - https://phabricator.wikimedia.org/T283582 (Dzahn)
[23:11:34] netops, Infrastructure-Foundations, SRE: Eqiad Expansion - LVS Connectivity Options - https://phabricator.wikimedia.org/T292630 (RobH)