[08:41:02] Traffic, netops, DBA, Infrastructure-Foundations, and 2 others: Switch buffer re-partition - Eqiad Row A - https://phabricator.wikimedia.org/T286032 (MoritzMuehlenhoff)
[08:42:03] Traffic, netops, DBA, Infrastructure-Foundations, and 2 others: Switch buffer re-partition - Eqiad Row A - https://phabricator.wikimedia.org/T286032 (MoritzMuehlenhoff)
[09:06:08] Traffic, netops, DBA, Infrastructure-Foundations, and 2 others: Switch buffer re-partition - Eqiad Row A - https://phabricator.wikimedia.org/T286032 (jcrespo)
[10:42:36] XioNoX, topranks: is the netbox interfaces list an accurate source of truth to determine which switches in codfw have 10G ports available?
[10:44:41] according to that, we don't have SFP+ ports available on row A besides on switches A2 and A7, that could explain the current state of LVS there :)
[10:45:20] and same happens to row D, rows B and C have better availability
[10:48:21] I was about to say "yes" to your question.
[10:49:18] I'm not familiar with that hardware.. could we add some SFP+ interfaces to the existing switches?
[10:49:24] But I believe Netbox is being a little bit inaccurate here. There are empty SFP+ ports on asw-a4-codfw.
[10:49:44] All switches are fixed-format, we cannot change the number/types of ports on them.
[10:49:49] ack
[10:49:54] https://netbox.wikimedia.org/dcim/interfaces/?q=&site=codfw&type=10gbase-x-sfpp&enabled=False&mgmt_only=&mac_address=
[10:49:57] I'm using that query
[10:51:07] I'll need to discuss with XioNoX how we deal with this.
[10:51:20] To an extent there is a quirk in how JunOS represents the interfaces.
[10:51:30] meanwhile... should I ssh the particular switches?
[10:51:33] my understanding was that the switches LVSes are on had somehow better connectivity to other switches to that row, but without netbox access or particular network design knowledge I might be totally wrong here
[10:51:43] Switch 4 is a 48xSFP+ device.
However some of those slots have 1000Base-T copper SFPs in them.
[10:51:52] Which causes JunOS to name them "ge-4/0/x".
[10:52:05] majavah: yeah.. LVS require 10G SFP+ ports
[10:52:07] Those on the same switch with 10G SFP+ modules it's naming "xe-4/0/x"
[10:52:43] We appear to have the unused ports on this device, in Netbox, named using the "ge-" variant.
[10:53:13] The type there is 100% incorrect - so we will change that to SFP+, but there is a bit of work to be done I think.
[10:53:39] To your question I'm not sure how to answer - other than right now Netbox is obscuring some facts.
[10:53:59] Is there a particular thing you are looking at right now? New LVS? Or row you want to know about capacity?
[10:54:16] I'm working on T286881
[10:54:16] T286881: Audit eqiad & codfw LVS network links - https://phabricator.wikimedia.org/T286881
[10:54:34] so right now codfw LVS are using switches 2 and 7 on each row
[10:54:45] considering that we have 4 LVS that's a little bit problematic in terms of HA
[10:54:56] I'd like to add at least another switch to the equation per row
[10:55:03] Ok. FYI switch 2 and 7 are our "spines".
[10:55:26] i.e. they are the ones which are connected directly to the CRs, which is where the LVS BGP peer to and where they send/receive traffic.
[10:55:31] So there is a certain logic to that.
[10:55:57] right
[10:56:06] In Row A there are further 10G ports on asw-a4-codfw, all the free ports showing on that are incorrectly labelled SFP/1G now. We will correct that.
[10:56:17] I can verify the situation for other rows if you want?
[10:56:55] but that's the same situation on eqiad (2 and 7 being the spines) and we're successfully using switch 4 on each row as well
[10:57:43] yes switch 4 is redundantly connected to switches 2 and 7, so there shouldn't really be any reason not to use it.
[10:57:48] topranks: so..
from my netbox query I can see some availability on xe-4 in rows B and C, but row D seems to be in the same state as row A
[10:57:53] I was just trying to flesh out the picture a bit
[10:57:59] so if you could check row D for me it would be great
[10:58:26] topranks: right, I'm new as well in terms of choosing ports for LVS, so all the information is welcome :)
[10:59:11] probably b.black arachnid sense is twitching right now
[10:59:43] asw-d4-codfw is also an SFP+ based device. So similar setup to row A (devices 2, 4 and 7 are 10G).
[11:03:35] In Netbox it looks like all the ports on asw-d4-codfw are taken though. But it's a 48 port SFP+ device, so should be more free there too.
[11:03:46] I think we need XioNoX
[11:03:57] * vgutierrez turns on the X signal
[11:04:34] "is it a bird, is it a plane....."
[11:06:58] Each codfw row has 3x10G ToR: 2, 4, 7
[11:07:11] and I'm sure they all have free ports
[11:07:26] so I just need to mimic eqiad lvs setup on codfw
[11:07:30] so you can leave it to dcops to pick a specific port
[11:07:34] and benefit from switch 4 as well
[11:07:53] cool, I'll do a proposal based on that, thanks guys <3
[11:08:01] In terms of Netbox is our policy not to show free ports on them?
[11:08:23] Like asw-a4-codfw should have ports xe-4/0/[21-48] right?
[11:08:41] ^ sry asw-d4-codfw
[11:09:01] asw-a4-codfw does have free ports showing, but they incorrectly have port set to SFP not SFP+, and naming convention uses "ge-"
[11:12:22] well, we don't know if they are xe or something else until there is an optic in them
[11:12:39] so they keep their last state until something else is connected
[11:12:49] then there is a report that kicks in
[11:13:31] Ok. But I guess the ultimate answer is "it's up to DC-Ops" to find free ports.
[11:14:37] There is no way to know from netbox directly, unless you know the hardware models and can infer it from what's missing, or what's showing as 1G but could actually be 10G.
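A minimal sketch of the inference described in the last message, assuming a fixed-format switch with 48 SFP+ cages: any cage without a cable is a candidate 10G port, whether the inventory currently lists it under "ge-" or "xe-", or not at all. The function name, the sample interface list, and the 0-47 port numbering are hypothetical illustrations, not data from Netbox or the switches.

```python
import re

# Assumption taken from the discussion: a fixed-format switch with 48 SFP+ cages.
TOTAL_PORTS = 48

# JunOS-style names: "ge-" (1G) or "xe-" (10G), then member/PIC/port.
# The same physical cage can appear under either prefix depending on the optic.
IFACE_RE = re.compile(r"^(ge|xe)-(\d+)/(\d+)/(\d+)$")


def free_cages(connected, member, total_ports=TOTAL_PORTS):
    """Return port numbers on `member` (PIC 0) that have no cable attached.

    `connected` is an iterable of interface names known to be cabled.
    The prefix is ignored on purpose: a cage holding a copper SFP is still
    in use, and an empty cage is free whatever name it last carried.
    """
    used = set()
    for name in connected:
        m = IFACE_RE.match(name)
        if m and int(m.group(2)) == member and m.group(3) == "0":
            used.add(int(m.group(4)))
    # Hypothetical numbering: ports 0..total_ports-1 on PIC 0.
    return sorted(set(range(total_ports)) - used)


# Hypothetical sample: three cabled ports on member 4, one on member 2.
connected = ["xe-4/0/0", "xe-4/0/1", "ge-4/0/2", "xe-2/0/45"]
free = free_cages(connected, member=4)
print(len(free))
```

The count is simply the number of cages minus the number of cabled ports on that member; which specific port to use would still be for DC-Ops to pick on site.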
[11:15:38] vgutierrez: For this exercise there are free 10G ports on asw-a4-codfw and asw-d4-codfw, so you should be able to add LVS there in similar topology to what you mentioned we have in eqiad.
[11:16:13] yep exactly
[11:16:28] 48 - # of connected cables
[11:18:15] cool thanks.
[11:19:58] maybe we could pre-fill dummy ports, like 0/0/0 without prefix, but I'd guess it will bring its own problems
[11:20:18] Yeah there are no perfect answers here.
[11:20:47] One thing Cisco did on the Nexus line was to stop using "FastEthernet", "GigabitEthernet", "TenGigabitEthernet" etc.
[11:21:04] Ports are just "EthernetX/Y", regardless of configured speed or module present.
[11:25:14] interesting
[13:31:45] Traffic, DC-Ops, SRE, Sustainability (Incident Followup): Audit eqiad & codfw LVS network links - https://phabricator.wikimedia.org/T286881 (Vgutierrez) |Host |Row |Host iface |switch iface| |lvs2007|**A**|ens2f0np0|xe-2/0/45| |lvs2008|A|ens2f1np1|xe-7/0/45| |lvs2009|A|ens2f1np1|xe-2/0/43| |lvs20...
[14:31:11] Traffic: DNS Discovery for active/passive failover within a data centre - https://phabricator.wikimedia.org/T287584 (BTullis)
[15:19:26] Traffic, SRE: DNS Discovery for active/passive failover within a data centre - https://phabricator.wikimedia.org/T287584 (Legoktm) p:Triage→Medium
[17:17:08] Traffic, Maps, Product-Infrastructure-Team-Backlog, SRE, and 2 others: Support maps serving for affiliate sites via an allow list - https://phabricator.wikimedia.org/T261694 (Legoktm) I merged @AntiCompositeNumber's patch and tiles now work on https://barriere.wikimedia.it/ - sorry about the dela...
[17:31:28] Traffic, Maps, Product-Infrastructure-Team-Backlog, SRE: Limit maps serving to Wikimedia hosted sites only - https://phabricator.wikimedia.org/T261424 (Legoktm)
[18:02:08] Traffic, cloud-services-team (Kanban): Puppet broken on diffscan.traffic.eqiad1.wikimedia.cloud - https://phabricator.wikimedia.org/T287612 (Andrew)
[19:34:39] Traffic, netops, DBA, Infrastructure-Foundations, and 2 others: Switch buffer re-partition - Eqiad Row A - https://phabricator.wikimedia.org/T286032 (Legoktm)