[00:06:21] Traffic, Data-Engineering-Radar: Lock-in Varnish and VarnishKafka versions - https://phabricator.wikimedia.org/T304617#9691260 (BCornwall) In progress→Resolved Thanks, @ssingh for the patch!
[06:35:19] netops, Infrastructure-Foundations: eqiad-drmrs transport down (April 2024) - https://phabricator.wikimedia.org/T361825#9691663 (ops-monitoring-bot) ===== Automated diagnostic for Netbox circuit ID 108 --- **Interface cr1-drmrs:xe-0/1/2** - admin-status: up - oper-status: up - interface-flapped: 2024-...
[06:36:09] netops, Infrastructure-Foundations: eqiad-drmrs transport down (April 2024) - https://phabricator.wikimedia.org/T361825#9691664 (ayounsi) Open→Resolved a:ayounsi > RFO: The unavailability of the link was due to problems with optical modules and cards at the Marseille and Paris, Fra...
[06:53:53] sukhe XioNoX: sorry, I somehow missed your answers :/ I'm going to have a look at turnilo, and if I come up empty, I'll DM you and we'll see what we come up with, thanks!
[06:55:56] brouberol: sure, don't hesitate
[06:56:05] thanks!
[08:03:47] netops, Infrastructure-Foundations, ops-codfw, SRE: codfw: use old asw switches from row A and B as msw switches in row C and D - https://phabricator.wikimedia.org/T361871#9691816 (ayounsi) We first need to discuss if we want to start using managed switches for management switches (except the agg...
[09:21:44] Traffic: Connection failed for a few minutes - https://phabricator.wikimedia.org/T360982#9692031 (Aklapper) Open→Resolved No reply from @Ztimes3; boldly resolving
[09:36:18] Traffic, DC-Ops, ops-codfw, ops-eqiad, SRE-swift-storage: Reimage cookbook on new eqiad hosts stuck at PXE booting - https://phabricator.wikimedia.org/T350179#9692080 (cmooney) >>! In T350179#9690121, @ssingh wrote: > Any other opinions/thoughts on how we can try and fix this and where? I am...
[09:45:37] Traffic, DC-Ops, ops-codfw, ops-eqiad, SRE-swift-storage: Reimage cookbook on new eqiad hosts stuck at PXE booting - https://phabricator.wikimedia.org/T350179#9692099 (cmooney) >>! In T350179#9586432, @ayounsi wrote: > Last maybe we could explore relying less on PXE, for example is it possibl...
[10:53:46] Traffic, DC-Ops, ops-codfw, ops-eqiad, SRE-swift-storage: Reimage cookbook on new eqiad hosts stuck at PXE booting - https://phabricator.wikimedia.org/T350179#9692338 (MoritzMuehlenhoff) We could also consider to pass this over to Dell support?
[11:11:31] netops, Infrastructure-Foundations, ops-codfw, SRE: codfw: use old asw switches from row A and B as msw switches in row C and D - https://phabricator.wikimedia.org/T361871#9692392 (cmooney) >>! In T361871#9691816, @ayounsi wrote: > We first need to discuss if we want to start using managed switch...
[12:10:38] netops, Ganeti, Infrastructure-Foundations, SRE, Patch-For-Review: Investigate Ganeti in routed mode - https://phabricator.wikimedia.org/T300152#9692520 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by ayounsi@cumin1002 for hosts: `testvm2006.codfw.wmnet` - testvm2...
[13:23:18] netops, Infrastructure-Foundations, ops-codfw, SRE: codfw: use old asw switches from row A and B as msw switches in row C and D - https://phabricator.wikimedia.org/T361871#9692687 (Papaul) @ayounsi @cmooney thanks for all the inputs. What I am asking is to use the Juniper old switches as dummies...
[14:20:31] Traffic, DC-Ops, ops-codfw, ops-eqiad, SRE-swift-storage: Reimage cookbook on new eqiad hosts stuck at PXE booting - https://phabricator.wikimedia.org/T350179#9692885 (ssingh) >>! In T350179#9692080, @cmooney wrote: >>>! In T350179#9690121, @ssingh wrote: >> Any other opinions/thoughts on how...
[14:29:26] Traffic, DC-Ops, ops-codfw, ops-eqiad, SRE-swift-storage: Reimage cookbook on new eqiad hosts stuck at PXE booting - https://phabricator.wikimedia.org/T350179#9692933 (ssingh) >>! In T350179#9692338, @MoritzMuehlenhoff wrote: > We could also consider to pass this over to Dell support? My onl...
[14:38:35] Traffic, DC-Ops, ops-codfw, ops-eqiad, SRE-swift-storage: Reimage cookbook on new eqiad hosts stuck at PXE booting - https://phabricator.wikimedia.org/T350179#9692988 (cmooney) >>! In T350179#9692885, @ssingh wrote: > All cp hosts in eqiad are in rows A, B, C, and D, so that does look worth t...
[15:19:25] Traffic, DC-Ops, ops-codfw, ops-eqiad, SRE-swift-storage: Reimage cookbook on new eqiad hosts stuck at PXE booting - https://phabricator.wikimedia.org/T350179#9693083 (ssingh) Continuing to trying to isolate the possible causes of this, I noticed when dumping the facter output between the dif...
[17:46:25] Traffic, Data Products, Data-Engineering, Observability-Logging, Patch-For-Review: Move analytics log from Varnish to HAProxy - https://phabricator.wikimedia.org/T351117#9693484 (Ottomata) Very cool!