[06:41:11] netops, Infrastructure-Foundations, SRE: Upgrade core routers to Junos 23.4R2 - https://phabricator.wikimedia.org/T364092#10295141 (Papaul)
[06:46:29] netops, Infrastructure-Foundations, SRE: Upgrade core routers to Junos 23.4R2 - https://phabricator.wikimedia.org/T364092#10295142 (Papaul) There will be some maintenance in magru sometime next week and the site will be de-pool we can take advantage of this maintenance window to upgrade the router th...
[08:12:22] netops, Infrastructure-Foundations, SRE, SRE-tools, Patch-For-Review: Setup zero touch provisioning (ZTP) for network devices - https://phabricator.wikimedia.org/T336485#10295209 (Volans) >>! In T336485#10294334, @cmooney wrote: > I don't see that forced in /etc/ssh/ssh_config though. Also w...
[08:45:50] Traffic: Provide Debian packetization - https://phabricator.wikimedia.org/T377613#10295261 (Fabfur) In progress→Resolved
[08:45:58] Traffic: Puppet configuration for haproxykafka - https://phabricator.wikimedia.org/T377614#10295263 (Fabfur) In progress→Resolved
[08:46:18] Traffic: Puppet spec tests for haproxykafka module - https://phabricator.wikimedia.org/T378330#10295265 (Fabfur) In progress→Resolved
[08:47:55] Traffic, Data-Engineering, Patch-For-Review: Rollout haproxykafka on all hosts - https://phabricator.wikimedia.org/T378578#10295269 (Fabfur)
[08:49:31] Traffic, Data-Engineering, Patch-For-Review: Rollout haproxykafka on all hosts - https://phabricator.wikimedia.org/T378578#10295271 (Fabfur) Ticket description has been updated to reflect the outcome of latest meeting with @gmodena and @Ahoelzl .
[10:58:55] Traffic, Data-Engineering, Patch-For-Review: Rollout haproxykafka on all hosts - https://phabricator.wikimedia.org/T378578#10295610 (Fabfur)
[11:19:04] XioNoX: I think I'm ready to enable BGP for lvs1013 on netbox
[11:22:11] vgutierrez: nice!
[11:22:51] BTW... scapy has its own routing table.. so it comes pretty handy to test an offline load balancer
[11:23:28] without messing with the system routing table
[11:24:05] vgutierrez: let's do it after lunch if that's ok for you
[11:24:07] but I guess that nowadays I could spawn a shell in a different netns
[11:24:10] XioNoX: #define lunch
[11:24:34] but sure, no problem, I'm not in a rusgh
[11:24:36] *rush
[11:24:50] ~now to ~1h later :) French time
[11:25:05] probably in ~2h then
[11:25:15] that collides with my daily walk with the dogs
[11:25:24] no pb!
[13:58:31] XioNoX: whenever you're ready :)
[13:58:42] cool, let's do it
[13:59:35] switched the BGP flag, running homer as diff right now
[14:00:09] diff looks good
[14:00:17] https://www.irccloud.com/pastebin/vZ8HtM7z/
[14:00:35] nice
[14:00:36] vgutierrez: should I push it now?
[14:00:50] let me depool ncredir@eqiad first :)
[14:01:08] sounds good :)
[14:07:32] ok... clients honoring TTL should be safe now
[14:07:36] XioNoX: go ahead :D
[14:08:02] vgutierrez: running
[14:09:03] I'll hit gobgp with a neighbor reset as soon as it's ready
[14:09:34] hmmm no need apparently... the switch went active :)
[14:09:38] vgutierrez: done, prefixes received
[14:09:56] Communities: 14907:11
[14:10:00] lovely
[14:10:57] but the core routers still prefer 1017 for 208.80.154.232
[14:11:13] just for the ipv4 one?
[14:11:36] I think I know what's up
[14:11:58] for v6 too
[14:12:09] https://www.irccloud.com/pastebin/VAy17FLF/
[14:12:16] things look sane on my side, let me know :)
[14:13:08] vgutierrez: yeah, just a missing rule, nothing bad, just need more work
[14:13:15] ack
[14:13:23] no requests have been harmed so far.. so we are ok ;P
[14:14:59] basically the CRs don't know they have to prioritize that prefix learned from the spine switches (and thus from lvs1013)
[14:16:02] topranks: if you have another minute: https://gerrit.wikimedia.org/r/c/operations/homer/public/+/1087918
[14:16:54] +1, but I think we may have missed it other places too
[14:18:21] ah?
[14:20:21] maybe not
[14:20:54] it needs to be matched on routes learnt by switches from CRs, but maybe that's covered by the same policy
[14:20:56] trying to find
[14:21:42] for that specific usecase it shouldn't be needed, but better not leave holes
[14:22:28] perhaps, depends on if the test LVS peers with CRs or switches
[14:22:49] ToR switch at the moment
[14:23:00] I can adjust it if needed
[14:23:13] seems that uses the BGP_Infra_In from the file you changed
[14:25:05] XioNoX: it's all good, the same policy is used on both sides (spine switches and CRs)
[14:25:10] so you're change covers both directions
[14:25:28] *your
[14:25:52] still waiting for homer to run on cr*eqiad* :)
[14:25:55] thx for checking
[14:29:32] vgutierrez: alright, lvs1013 is now preferred from cr1
[14:30:27] cool
[14:30:38] yeah.. it's already handling some traffic
[14:31:20] I'll repool it at DNS level
[14:38:09] vgutierrez: done on cr2 too
[14:41:30] Traffic: liberica puppetization - https://phabricator.wikimedia.org/T377127#10296386 (Vgutierrez) Open→Resolved
[14:42:19] netops, Infrastructure-Foundations: Testing liberica with ncredir@eqiad - https://phabricator.wikimedia.org/T378453#10296381 (Vgutierrez) Open→Resolved a:Vgutierrez Thx @ayounsi & @cmooney. lvs1013 running liberica is now the primary load balancer for ncredir@eqiad
[14:45:07] vgutierrez: what's the best way to not forget to rollback the changes from https://phabricator.wikimedia.org/T378453 once the tests are completed?
[14:46:52] XioNoX: let me create a task for deciding the final BGP config for liberica :)
[14:47:07] perfect
[14:50:36] netops, Traffic, Infrastructure-Foundations: BGP settings for liberica - https://phabricator.wikimedia.org/T379164 (Vgutierrez) NEW
[14:50:37] netops, Traffic, Infrastructure-Foundations: BGP settings for liberica - https://phabricator.wikimedia.org/T379164#10296452 (Vgutierrez) p:Triage→Medium
[14:51:24] vgutierrez: XioNoX: very cool <3
[15:04:09] I personally would be somewhat in favour of keeping the current config
[15:04:33] Liberica may not use it of course, but I don't think having the community there and the match/local-pref in place is a bad thing in general
[15:17:15] netops, Traffic, Infrastructure-Foundations: BGP settings for liberica - https://phabricator.wikimedia.org/T379164#10296519 (cmooney) I personally don't think the current config is a bad thing to have in general (we have a lower pref/normal pref/higher pref community defined). None of the community...
[15:51:23] Traffic, Data-Engineering (Q2 2024 October 1st - December 31th), Patch-For-Review: Rollout haproxykafka on all hosts - https://phabricator.wikimedia.org/T378578#10296649 (Ahoelzl)
[16:44:21] netops, Infrastructure-Foundations, SRE: Upgrade core routers to Junos 23.4R2 - https://phabricator.wikimedia.org/T364092#10296813 (Papaul)
[16:45:42] netops, Infrastructure-Foundations, SRE: Upgrade core routers to Junos 23.4R2 - https://phabricator.wikimedia.org/T364092#10296815 (Papaul)
[16:45:58] netops, Infrastructure-Foundations, SRE: Upgrade core routers to Junos 23.4R2 - https://phabricator.wikimedia.org/T364092#10296817 (Papaul)
[17:07:26] netops, Infrastructure-Foundations, SRE: Upgrade core routers to Junos 23.4R2 - https://phabricator.wikimedia.org/T364092#10296916 (ssingh) `cr1-eqiad` is stated for Nov 13 but note that T376737 is also scheduled for that period (Nov 13, 8 CT) and it might make tricky for both `magru` and `eqiad` to...
[17:12:45] netops, Infrastructure-Foundations, SRE: Upgrade core routers to Junos 23.4R2 - https://phabricator.wikimedia.org/T364092#10296934 (Papaul)
[17:13:40] netops, Infrastructure-Foundations, SRE: Upgrade core routers to Junos 23.4R2 - https://phabricator.wikimedia.org/T364092#10296938 (Papaul) @ssingh thanks i forgot about the 13th I update the dates.
[17:13:53] netops, Infrastructure-Foundations, SRE: Upgrade core routers to Junos 23.4R2 - https://phabricator.wikimedia.org/T364092#10296937 (Joe) I see there is a maintenance planned for codfw now, and that the plan is to depool the datacenter. Does this mean we're doing a datacenter switchover? Because oth...
[17:18:57] netops, Infrastructure-Foundations, SRE: Upgrade core routers to Junos 23.4R2 - https://phabricator.wikimedia.org/T364092#10296958 (akosiaris) > Upgrades should follow the standard process The standard process docs are outdated I fear. > Depool site (optional) > (optional) if codfw, drain mw traff...
[17:18:57] netops, Infrastructure-Foundations, SRE: Upgrade core routers to Junos 23.4R2 - https://phabricator.wikimedia.org/T364092#10296959 (Papaul) >>! In T364092#10296937, @Joe wrote: > I see there is a maintenance planned for codfw now, and that the plan is to depool the datacenter. Does this mean we're do...
[17:22:02] netops, Infrastructure-Foundations, SRE: Upgrade core routers to Junos 23.4R2 - https://phabricator.wikimedia.org/T364092#10296980 (Papaul) Thanks @akosiaris @Joe we can hold back on codfw for now and work on eqiad. when we switch back to eqiad we can schedule the upgrade for codfw.
[17:22:34] netops, Infrastructure-Foundations, SRE: Upgrade core routers to Junos 23.4R2 - https://phabricator.wikimedia.org/T364092#10296985 (Papaul)
[17:28:03] netops, Infrastructure-Foundations, SRE: Upgrade core routers to Junos 23.4R2 - https://phabricator.wikimedia.org/T364092#10297017 (akosiaris) >>! In T364092#10296980, @Papaul wrote: > Thanks @akosiaris @Joe we can hold back on codfw for now and work on eqiad. when we switch back to eqiad we can sche...
[17:30:46] netops, Infrastructure-Foundations, SRE: Upgrade core routers to Junos 23.4R2 - https://phabricator.wikimedia.org/T364092#10297026 (cmooney) >>! In T364092#10296958, @akosiaris wrote: > codfw will be the primary during that set of dates, it should NOT be depooled. Agreed. It should also be possible...
[17:33:18] netops, DC-Ops, fundraising-tech-ops, Infrastructure-Foundations, and 2 others: Frack eqiad network upgrade: design, installation and configuration - https://phabricator.wikimedia.org/T377381#10297053 (cmooney) @Jclark-ctr could you also let me know what ports on the fmsw these two were plugged i...
[18:28:38] netops, DC-Ops, fundraising-tech-ops, Infrastructure-Foundations, and 2 others: Frack eqiad network upgrade: design, installation and configuration - https://phabricator.wikimedia.org/T377381#10297301 (cmooney) >>! In T377381#10250655, @Jgreen wrote: > There are 6 servers being replaced: > {T3695...
[18:50:59] Domains, Traffic, SRE, Patch-For-Review: Acquire enwp.org - https://phabricator.wikimedia.org/T332220#10297394 (BCornwall) Hi, @violetwtf, has Thomas responded? Thanks for getting on this. :)
[19:21:06] netops, DC-Ops, fundraising-tech-ops, Infrastructure-Foundations, and 2 others: Frack eqiad network upgrade: design, installation and configuration - https://phabricator.wikimedia.org/T377381#10297478 (cmooney) All, just to be aware I hit another snag this evening which may be problematic. When...
[20:09:07] netops, DC-Ops, fundraising-tech-ops, Infrastructure-Foundations, and 2 others: Frack eqiad network upgrade: design, installation and configuration - https://phabricator.wikimedia.org/T377381#10297583 (Dwisehaupt) > Thanks @Jgreen . Looking at the existing ports on the switch I think it might ma...
[20:22:19] netops, DC-Ops, fundraising-tech-ops, Infrastructure-Foundations, and 2 others: Frack eqiad network upgrade: design, installation and configuration - https://phabricator.wikimedia.org/T377381#10297595 (Jclark-ctr) @cmooney replaced 1g dac cables with sfpt and cat6 cables. These two switches ha...
[20:54:45] netops, DC-Ops, fundraising-tech-ops, Infrastructure-Foundations, and 2 others: Frack eqiad network upgrade: design, installation and configuration - https://phabricator.wikimedia.org/T377381#10297675 (cmooney) >>! In T377381#10297595, @Jclark-ctr wrote: > These two switches have been removed fro...