[03:17:23] SRE-tools, Infrastructure-Foundations: Netbox accounting report: exclude removed hosts - https://phabricator.wikimedia.org/T320955 (wiki_willy) Hi @Volans - thanks for reaching out with the suggestion. We definitely could start doing that going forward. In this additional tab on the Accounting Spreadsh...
[07:34:42] netops, Infrastructure-Foundations: BFD flapping between cr1-eqiad and cr2-drmrs - https://phabricator.wikimedia.org/T321034 (ayounsi) p:Triage→High
[07:36:29] netops, Infrastructure-Foundations, SRE, cloud-services-team (Kanban): Move WMCS servers to 1 NIC - https://phabricator.wikimedia.org/T319184 (dcaro) > You also have to bear in mind, with some tasks like Ceph initial syncing, that a well tuned/performant system will use whatever bandwidth is...
[08:45:41] netops, Infrastructure-Foundations, SRE: BFD flapping between cr1-eqiad and cr2-drmrs - https://phabricator.wikimedia.org/T321034 (cmooney) I think I may have solved this, although through nothing logical, similar to the earlier BGP bounce restoring the IPv6. I disabled OSPF for the interface and re...
[09:07:54] netops, Infrastructure-Foundations, SRE: BFD flapping between cr1-eqiad and cr2-drmrs - https://phabricator.wikimedia.org/T321034 (ayounsi) Open→Resolved a:cmooney Awesome, thanks! I cleared the Icinga downtimes now that it's all back to normal.
[10:17:59] Mail, Data Engineering Planning, Data-Engineering-Operations, SRE: Add xcollazo@wikimedia.org to the analytics-alerts mailing list - https://phabricator.wikimedia.org/T315486 (BTullis) Thanks @Ladsgroup - noted. So it seems that we have five topic or team based `-alerts` lists on mailman already:...
[11:16:57] netops, Infrastructure-Foundations, SRE, cloud-services-team (Kanban): Move WMCS servers to 1 NIC - https://phabricator.wikimedia.org/T319184 (cmooney) > I'll give the test a go somewhere just to see what the throughput bottleneck looks like in grafana 👍 Cool. If you're starting with iperf I c...
[11:22:33] SRE-tools, Infrastructure-Foundations: Netbox accounting report: exclude removed hosts - https://phabricator.wikimedia.org/T320955 (Volans) @wiki_willy for the format whatever is easier for you based on your workflow, here a couple of alternative options that come to mind, but feel free to propose somet...
[14:19:16] netbox, Infrastructure-Foundations, Patch-For-Review: Reduce the count of Netbox devices with incorrect status - https://phabricator.wikimedia.org/T320696 (Volans) >>! In T320696#8313511, @Volans wrote: >> As a first step I suggest that we identify on https://wikitech.wikimedia.org/wiki/File:Server_L...
[15:07:15] netops, Cloud Services Proposals, Infrastructure-Foundations, SRE: Separate WMCS control and management plane traffic - https://phabricator.wikimedia.org/T314847 (cmooney) I had a good chat with @aborrero today on some ideas on how to progress towards this goal. Some notes / additional thoughts...
[15:13:13] Mail, Data Engineering Planning, Data-Engineering-Operations, SRE: Add xcollazo@wikimedia.org to the analytics-alerts mailing list - https://phabricator.wikimedia.org/T315486 (Ladsgroup) >>! In T315486#8324480, @BTullis wrote: > Thanks @Ladsgroup - noted. So it seems that we have five topic or te...
[15:13:47] Mail, Data Engineering Planning, Data-Engineering-Operations, SRE: Add xcollazo@wikimedia.org to the analytics-alerts mailing list - https://phabricator.wikimedia.org/T315486 (Ladsgroup) Another thing. Mailman2 had many many issues but mailman3 (the current infra) is much easier to use and handle.
[15:14:03] netops, Cloud Services Proposals, Infrastructure-Foundations, SRE: Separate WMCS control and management plane traffic - https://phabricator.wikimedia.org/T314847 (aborrero) >>! In T314847#8325727, @cmooney wrote: > I had a good chat with @aborrero today on some ideas on how to progress towards th...
[16:37:08] netops, Cloud Services Proposals, Infrastructure-Foundations, SRE: Separate WMCS control and management plane traffic - https://phabricator.wikimedia.org/T314847 (taavi) > Probably makes sense to choose a /16 from 172.16.0.0/12 for the supernet, and allocate per-rack /24s from this. Please keep i...
[18:10:48] netops, Cloud Services Proposals, Infrastructure-Foundations, SRE: Separate WMCS control and management plane traffic - https://phabricator.wikimedia.org/T314847 (cmooney) >> /32 Service IPs should be from the cloud realm public /24 (185.15.56.0/24) if the service needs to be reachable from inter...
[18:28:13] netops, Infrastructure-Foundations, SRE: Set consistent MTUs - https://phabricator.wikimedia.org/T315838 (cmooney) FWIW I didn't get to the bottom of the MTU difference. But I was able to confirm that it is a real issue, i.e. there is a 4-byte "blackhole" where the switches will transmit packets wit...
[19:21:20] SRE-tools, Infrastructure-Foundations: Netbox accounting report: exclude removed hosts - https://phabricator.wikimedia.org/T320955 (wiki_willy) Got it, thanks @Volans! I'll sync up with my team to get their thoughts and feedback on Thursday, and get back to you afterwards.
[20:02:45] netops, Infrastructure-Foundations, SRE, Sustainability (Incident Followup): Cr1-eqiad comms problem when moving to 40G row D handoff - https://phabricator.wikimedia.org/T320566 (cmooney) Myself and @ayounsi were able to narrow down the issue a bit more during testing yesterday. It seems the iss...