[00:38:16] netops, Infrastructure-Foundations, ops-codfw, SRE: codfw: use old asw switches from row A and B as msw switches in row C and D - https://phabricator.wikimedia.org/T361871#9792816 (Papaul) Open→Resolved All the old mgmt switch are back in place
[09:28:36] netops, Infrastructure-Foundations, SRE: Cloud IPv6 subnets - https://phabricator.wikimedia.org/T187929#9793592 (ayounsi) @cmooney what do you think of duplicating the other POPs allocation scheme? For example looking at eqiad as example, keep 2a02:ec80:a000::/40 as "reserved for future growth" Then...
[10:06:49] Traffic, Data Products, Data-Engineering, Observability-Logging, Patch-For-Review: Move analytics log from Varnish to HAProxy - https://phabricator.wikimedia.org/T351117#9793797 (gmodena) >>! In T351117#9781136, @Ottomata wrote: >> adopt topic names that follow EP conventions: .
netops, Infrastructure-Foundations, SRE: Cloud IPv6 subnets - https://phabricator.wikimedia.org/T187929#9793972 (cmooney) >>! In T187929#9793592, @ayounsi wrote: > @cmooney what do you think of duplicating the other POPs allocation scheme? > For example looking at eqiad as example, keep 2a02:ec80:a00...
[12:08:06] Traffic, MoveComms-Support, MW-on-K8s, serviceops, and 2 others: Move 100% of external traffic to Kubernetes (excluding Votewiki and Commons) - https://phabricator.wikimedia.org/T362323#9794186 (Clement_Goubert) We are currently holding at 85% of global traffic, and as such not reimaging anymore...
[12:46:30] Traffic, Data Products, Data-Engineering, Observability-Logging, Patch-For-Review: Move analytics log from Varnish to HAProxy - https://phabricator.wikimedia.org/T351117#9794313 (Fabfur) >>! In T351117#9793797, @gmodena wrote: >>>! In T351117#9781136, @Ottomata wrote: >>> adopt topic names th...
[12:58:15] Traffic, Data Products, Data-Engineering, Observability-Logging, Patch-For-Review: Move analytics log from Varnish to HAProxy - https://phabricator.wikimedia.org/T351117#9794349 (gmodena) >>! In T351117#9794313, @Fabfur wrote: >>>! In T351117#9793797, @gmodena wrote: >>>>! In T351117#9781136,...
[13:04:11] moritzm: could you take a look to https://gerrit.wikimedia.org/r/c/operations/puppet/+/1031436 :?
[13:05:28] firewall::service isn't an option there cause ferm::service doesn't meet our requirements for lvs::realserver::ipip
[13:08:00] I'll have a look in a few
[13:17:56] thx <3
[13:58:05] Traffic, Data-Platform-SRE (2024.05.06 - 2024.05.26), Patch-For-Review: LVS hosts: Monitor/alert when pooled nodes are outside broadcast domain - https://phabricator.wikimedia.org/T363702#9794749 (Volans) I think that the proposed check covers only a very specific failure scenario that is unlikely to...
[13:58:38] FIRING: [4x] LVSRealserverMSS: Unexpected MSS value on 198.35.26.112:443 @ cp4049 - TODO - https://grafana.wikimedia.org/d/Y9-MQxNSk/ipip-encapsulated-services?orgId=1&viewPanel=2&var-site=ulsfo&var-cluster=cache_upload - https://alerts.wikimedia.org/?q=alertname%3DLVSRealserverMSS
[14:03:38] RESOLVED: [4x] LVSRealserverMSS: Unexpected MSS value on 198.35.26.112:443 @ cp4049 - TODO - https://grafana.wikimedia.org/d/Y9-MQxNSk/ipip-encapsulated-services?orgId=1&viewPanel=2&var-site=ulsfo&var-cluster=cache_upload - https://alerts.wikimedia.org/?q=alertname%3DLVSRealserverMSS
[14:15:56] Traffic, Data-Platform-SRE (2024.05.06 - 2024.05.26), Patch-For-Review: LVS hosts: Monitor/alert when pooled nodes are outside broadcast domain - https://phabricator.wikimedia.org/T363702#9794864 (cmooney) >>! In T363702#9794749, @Volans wrote: > I think that the proposed check covers only a very spe...
[14:53:05] Traffic, Data-Platform-SRE (2024.05.06 - 2024.05.26), Patch-For-Review: LVS hosts: Monitor/alert when pooled nodes are outside broadcast domain - https://phabricator.wikimedia.org/T363702#9795030 (bking) @Volans Can you share more details about the external command? One of the problems with this scen...
[14:59:56] Traffic, Data-Platform-SRE (2024.05.06 - 2024.05.26), Patch-For-Review: LVS hosts: Monitor/alert when pooled nodes are outside broadcast domain - https://phabricator.wikimedia.org/T363702#9795063 (bking) >>! In T363702#9794749, @Volans wrote: > I still think that a better solution could be achieved...
[15:29:49] Traffic, Data-Platform-SRE (2024.05.06 - 2024.05.26), Patch-For-Review: LVS hosts: Monitor/alert when pooled nodes are outside broadcast domain - https://phabricator.wikimedia.org/T363702#9795235 (Volans) Before any more digression I'd like a chime in from Traffic to clarify if an external binary mon...
[16:08:36] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Extend BGP peer automation via Netbox to include VMs - https://phabricator.wikimedia.org/T364480#9795483 (ops-monitoring-bot) Deployed homer to cumin2002.codfw.wmnet,cumin1002.eqiad.wmnet with reason: Release v0.6.5 update to add modified...
[16:08:42] netops, DC-Ops, Infrastructure-Foundations, ops-codfw, SRE: codfw row C/D upgrade racking task - https://phabricator.wikimedia.org/T360789#9795486 (Papaul)
[16:09:21] netops, DC-Ops, Infrastructure-Foundations, ops-codfw, SRE: codfw row C/D upgrade racking task - https://phabricator.wikimedia.org/T360789#9795489 (Papaul)
[16:10:53] netops, DC-Ops, Infrastructure-Foundations, ops-codfw, SRE: codfw row C/D upgrade racking task - https://phabricator.wikimedia.org/T360789#9795496 (Papaul)
[16:25:05] Traffic, Movement-Insights: Disable Chrome Private Prefetch Proxy - https://phabricator.wikimedia.org/T364126#9795849 (OSefu-WMF) p:Triage→High
[17:16:37] netops, DC-Ops, Infrastructure-Foundations, ops-codfw, SRE: codfw row C/D upgrade racking task - https://phabricator.wikimedia.org/T360789#9796326 (Papaul)
[17:24:21] netops, Infrastructure-Foundations, SRE: Extend BGP peer automation via Netbox to include VMs - https://phabricator.wikimedia.org/T364480#9796348 (cmooney) Open→Resolved Patch to Homer wmf plugin merged now, so BGP to VMs at POPs / on L3 switches now under automation too.
[17:40:44] Traffic, DC-Ops, ops-ulsfo: Q4: install PCIe NVMe SSDs into ulsfo text cp40(3[789]|4[01234] - https://phabricator.wikimedia.org/T364891 (RobH) NEW
[17:41:16] Traffic, DC-Ops, ops-ulsfo: Q4: install PCIe NVMe SSDs into ulsfo text cp40(3[789]|4[01234] - https://phabricator.wikimedia.org/T364891#9796419 (RobH)
[17:41:35] Traffic, DC-Ops, ops-ulsfo: Q4: install PCIe NVMe SSDs into ulsfo text cp40(3[789]|4[01234] - https://phabricator.wikimedia.org/T364891#9796420 (RobH)
[17:42:04] Traffic, DC-Ops, ops-ulsfo: Q4: install PCIe NVMe SSDs into ulsfo text cp40(3[789]|4[01234] - https://phabricator.wikimedia.org/T364891#9796422 (RobH)
[18:01:20] netops, Data-Platform-SRE, Infrastructure-Foundations: an-worker1165.eqiad.wmnet and increased network activity resulting in page on May 13 2024 - https://phabricator.wikimedia.org/T364893#9796533 (CDanis) To add some context: The ports that saturated weren't ports for individual machines on the acc...
[18:16:12] Traffic, MediaWiki-extensions-CentralAuth, MediaWiki-Platform-Team, Security, SUL3: Create a Wikimedia login domain that can be served by any wiki - https://phabricator.wikimedia.org/T363695#9796638 (Tgr)
[18:16:28] Traffic, MediaWiki-extensions-CentralAuth, MediaWiki-Platform-Team, Security, SUL3: Create a Wikimedia login domain that can be served by any wiki - https://phabricator.wikimedia.org/T363695#9796623 (Tgr) [[https://wikitech.wikimedia.org/wiki/Obsolete:Secure.wikimedia.org|secure.wikimedia.org...
[20:38:46] Traffic: Elevated 503 backend fetch failed reported by users - https://phabricator.wikimedia.org/T364691#9797328 (Ladsgroup) Third one from third user: {F53225493} I got a fourth one too, I'll send them over tomorrow.
[21:57:20] netops, Data-Platform-SRE, Infrastructure-Foundations: an-worker1165.eqiad.wmnet and increased network activity resulting in page on May 13 2024 - https://phabricator.wikimedia.org/T364893#9797744 (cmooney) Thanks for the task and analysis. > it seems like it was an-worker1165.eqiad.wmnet and 10.64....
[22:07:32] Traffic: Craft geo-maps file to create lowest-latency routes from south america - https://phabricator.wikimedia.org/T363722#9797776 (GreenReaper) For what it's worth, when I was setting up the São Paulo cache with Oracle Cloud for [Inkbunny](https://inkbunny.net), I found some ISPs used undersea cables along...