[09:31:17] Traffic, Patch-For-Review: tcp-mss-clamper doesn't work on bullseye / kernel 5.10 - https://phabricator.wikimedia.org/T353657 (CodeReviewBot) vgutierrez merged https://gitlab.wikimedia.org/repos/sre/tcp-mss-clamper/-/merge_requests/13 Test IPv6 MSS clamping
[09:43:46] Traffic, Patch-For-Review: tcp-mss-clamper doesn't work on bullseye / kernel 5.10 - https://phabricator.wikimedia.org/T353657 (Vgutierrez) Open→Resolved
[09:56:02] netops, Infrastructure-Foundations, Prod-Kubernetes, SRE, and 3 others: Update puppet's topology.kubernetes.io/zone logic to take into account the new setup - https://phabricator.wikimedia.org/T352893 (ayounsi) >>! In T352893#9450804, @akosiaris wrote: > I've been fearing this and started thinki...
[10:47:23] Traffic, netops, Infrastructure-Foundations, SRE: Support PyBal routes announced with lower priority than "backup" - https://phabricator.wikimedia.org/T354839 (cmooney) p:Triage→Medium
[11:01:10] netops, Infrastructure-Foundations, Prod-Kubernetes, SRE, and 3 others: Update puppet's topology.kubernetes.io/zone logic to take into account the new setup - https://phabricator.wikimedia.org/T352893 (cmooney) >>! In T352893#9452446, @ayounsi wrote: >>>! In T352893#9450804, @akosiaris wrote: >>...
[11:02:21] Traffic, netops, Infrastructure-Foundations, SRE, ops-codfw: Migrate lvs2013 and lvs2014 codfw row A-B connections to new switches - https://phabricator.wikimedia.org/T348218 (cmooney)
[11:03:42] Traffic, netops, Infrastructure-Foundations, SRE, and 2 others: Move lvs2014 link to row A and connect to new row A/B vlans - https://phabricator.wikimedia.org/T352758 (cmooney) Open→Resolved All work completed on this, lvs2014 made active for several hours and no issues.
[11:08:39] netops, Infrastructure-Foundations, SRE, ops-codfw: Migrate mr1-codfw from asw-a1-codfw to lsw1-a2-codfw - https://phabricator.wikimedia.org/T348164 (cmooney) Traffic has now been re-routed over the new link. Old interfaces from mr1-codfw to asw-a1-codfw have been disabled, as have the sub-interf...
[12:40:15] netops, Infrastructure-Foundations, Prod-Kubernetes, SRE, and 3 others: Update puppet's topology.kubernetes.io/zone logic to take into account the new setup - https://phabricator.wikimedia.org/T352893 (ayounsi) > The problem remains that the switch name is not going to be enough to know what to...
[12:42:28] netops, Infrastructure-Foundations, Prod-Kubernetes, SRE, and 3 others: Update puppet's topology.kubernetes.io/zone logic to take into account the new setup - https://phabricator.wikimedia.org/T352893 (cmooney) >>! In T352893#9452969, @ayounsi wrote: > Yep, I mentioned it in the loooong Gerrit CR...
[13:18:10] Traffic, netops, Infrastructure-Foundations, SRE: Support PyBal routes announced with lower priority than "backup" - https://phabricator.wikimedia.org/T354839 (ayounsi) > Once agreed it probably makes sense to remove profile::pybal::override_bgp_med from the puppet class, and replace it with some...
[13:36:20] Traffic, netops, Infrastructure-Foundations, SRE: Support PyBal routes announced with lower priority than "backup" - https://phabricator.wikimedia.org/T354839 (cmooney) >>! In T354839#9453034, @ayounsi wrote: > On the implementation I'm wondering if instead of introducing a new BGP community, we...
[14:02:39] netops, Infrastructure-Foundations, SRE: Codfw row A/B top-of-rack switch refresh - https://phabricator.wikimedia.org/T327938 (cmooney)
[14:38:20] netops, Infrastructure-Foundations, SRE: Re-IP hosts on codfw row A and B to new per-rack vlans/subnets - https://phabricator.wikimedia.org/T354869 (cmooney) p:Triage→Medium
[14:38:59] netops, Infrastructure-Foundations, SRE: Re-IP hosts on codfw row A and B to new per-rack vlans/subnets - https://phabricator.wikimedia.org/T354869 (cmooney)
[14:39:09] netops, Infrastructure-Foundations, Prod-Kubernetes, SRE, serviceops: Test IP-renumbering on kubestage2002.codfw.wmnet - https://phabricator.wikimedia.org/T352883 (cmooney)
[14:39:17] netops, Infrastructure-Foundations, SRE: Codfw row A/B top-of-rack switch refresh - https://phabricator.wikimedia.org/T327938 (cmooney)
[14:39:29] netops, Infrastructure-Foundations, SRE: Codfw row A/B top-of-rack switch refresh - https://phabricator.wikimedia.org/T327938 (cmooney)
[14:39:37] netops, Infrastructure-Foundations, SRE: Re-IP hosts on codfw row A and B to new per-rack vlans/subnets - https://phabricator.wikimedia.org/T354869 (cmooney)
[14:39:57] netops, Infrastructure-Foundations, SRE: Re-IP hosts on codfw row A and B to new per-rack vlans/subnets - https://phabricator.wikimedia.org/T354869 (cmooney)
[14:40:05] netops, Infrastructure-Foundations, SRE: Codfw row A/B top-of-rack switch refresh - https://phabricator.wikimedia.org/T327938 (cmooney)
[14:40:15] netops, Infrastructure-Foundations, Prod-Kubernetes, SRE, and 3 others: Update puppet's topology.kubernetes.io/zone logic to take into account the new setup - https://phabricator.wikimedia.org/T352893 (cmooney)
[14:46:12] netops, Infrastructure-Foundations, SRE: Re-IP hosts on codfw row A and B to new per-rack vlans/subnets - https://phabricator.wikimedia.org/T354869 (cmooney)
[15:42:13] netops, Infrastructure-Foundations, SRE: Re-IP hosts on codfw row A and B to new per-rack vlans/subnets - https://phabricator.wikimedia.org/T354869 (Volans)
[15:48:57] netops, Infrastructure-Foundations, SRE: Re-IP hosts on codfw row A and B to new per-rack vlans/subnets - https://phabricator.wikimedia.org/T354869 (cmooney)
[15:55:30] netops, Infrastructure-Foundations, SRE: Re-IP hosts on codfw row A and B to new per-rack vlans/subnets - https://phabricator.wikimedia.org/T354869 (cmooney)
[19:17:10] Traffic, SRE: Show a better error page when returning an HTTP 429, not the "Our servers are currently under maintenance" one for 5xxs - https://phabricator.wikimedia.org/T354718 (A_smart_kitten) Admittedly I’m inexperienced here (and so may well be missing something), but in T354858, I received 429 error...
[20:17:40] (VarnishHighThreadCount) firing: (7) Varnish's thread count on cp2028:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[20:22:21] er
[20:22:41] (VarnishHighThreadCount) firing: (7) Varnish's thread count on cp2028:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[20:27:40] (VarnishHighThreadCount) resolved: (7) Varnish's thread count on cp2028:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[22:35:08] Traffic, SRE: Show a better error page when returning an HTTP 429, not the "Our servers are currently under maintenance" one for 5xxs - https://phabricator.wikimedia.org/T354718 (Tgr) @A_smart_kitten usually what happens is that the first few users get a HTTP 500, then the throttling logic detects that u...