[00:06:38] (LVSRealserverMSS) firing: Unexpected MSS value on 198.35.26.98:443 @ ncredir4002 - TODO - https://grafana.wikimedia.org/d/Y9-MQxNSk/ipip-encapsulated-services?orgId=1&viewPanel=2&var-site=ulsfo&var-cluster=ncredir - https://alerts.wikimedia.org/?q=alertname%3DLVSRealserverMSS
[00:11:38] (LVSRealserverMSS) firing: (2) Unexpected MSS value on 198.35.26.98:443 @ ncredir4001 - TODO - https://grafana.wikimedia.org/d/Y9-MQxNSk/ipip-encapsulated-services?orgId=1&viewPanel=2&var-site=ulsfo&var-cluster=ncredir - https://alerts.wikimedia.org/?q=alertname%3DLVSRealserverMSS
[00:16:38] (LVSRealserverMSS) firing: (2) Unexpected MSS value on 198.35.26.98:443 @ ncredir4001 - TODO - https://grafana.wikimedia.org/d/Y9-MQxNSk/ipip-encapsulated-services?orgId=1&viewPanel=2&var-site=ulsfo&var-cluster=ncredir - https://alerts.wikimedia.org/?q=alertname%3DLVSRealserverMSS
[00:21:38] (LVSRealserverMSS) resolved: (2) Unexpected MSS value on 198.35.26.98:443 @ ncredir4001 - TODO - https://grafana.wikimedia.org/d/Y9-MQxNSk/ipip-encapsulated-services?orgId=1&viewPanel=2&var-site=ulsfo&var-cluster=ncredir - https://alerts.wikimedia.org/?q=alertname%3DLVSRealserverMSS
[10:05:09] (LVSHighRX) firing: Excessive RX traffic on lvs2013:9100 (eno12399np0) - https://bit.ly/wmf-lvsrx - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=lvs2013 - https://alerts.wikimedia.org/?q=alertname%3DLVSHighRX
[10:10:09] (LVSHighRX) resolved: Excessive RX traffic on lvs2013:9100 (eno12399np0) - https://bit.ly/wmf-lvsrx - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=lvs2013 - https://alerts.wikimedia.org/?q=alertname%3DLVSHighRX
[10:30:20] Traffic, Content-Transform-Team-WIP, RESTBase, RESTBase Sunsetting, and 6 others: PCS caching and pregeneration when restbase is decommissioned - https://phabricator.wikimedia.org/T319365#9566873 (CodeReviewBot) jgiannelos merged https://gitlab.wikimedia.org/repos/content-transform/nodejs-cassand...
[10:37:53] Traffic, netops, DC-Ops, Data-Persistence, and 5 others: Northward Datacentre Switchover (March 2024) - https://phabricator.wikimedia.org/T357547#9566904 (Marostegui) What is the idea? Will codfw remain depooled for a week or two? For DBAs this would be good so we can perform some maintenance in...
[10:43:08] Traffic, netops, DC-Ops, Data-Persistence, and 6 others: Northward Datacentre Switchover (March 2024) - https://phabricator.wikimedia.org/T357547#9566921 (Marostegui)
[10:50:07] Traffic, netops, DC-Ops, Data-Persistence, and 6 others: Northward Datacentre Switchover (March 2024) - https://phabricator.wikimedia.org/T357547#9566974 (Marostegui)
[12:47:48] Traffic, netops, DC-Ops, Data-Persistence, and 6 others: Northward Datacentre Switchover (March 2024) - https://phabricator.wikimedia.org/T357547#9567401 (Marostegui) I'd love if it can be a bit longer than 7 days as we can do lots of operational maintenance and save a bunch of time, but anyway,...
[13:01:55] netops, Infrastructure-Foundations, SRE, cloud-services-team, and 2 others: Move WMCS servers to 1 single NIC - https://phabricator.wikimedia.org/T319184#9567450 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by aborrero@cumin1002 for host cloudvirt1034.eqiad.wmnet with OS...
[13:08:04] Traffic, MW-on-K8s, SRE, serviceops, and 2 others: Migrate internal traffic to k8s - https://phabricator.wikimedia.org/T333120#9567484 (Clement_Goubert)
[13:42:27] netops, Infrastructure-Foundations, SRE, cloud-services-team, and 2 others: Move WMCS servers to 1 single NIC - https://phabricator.wikimedia.org/T319184#9567605 (aborrero)
[13:47:14] netops, Infrastructure-Foundations, SRE, cloud-services-team, and 2 others: Move WMCS servers to 1 single NIC - https://phabricator.wikimedia.org/T319184#9567631 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by aborrero@cumin1002 for host cloudvirt1034.eqiad.wmnet with OS book...
[13:54:41] netops, Infrastructure-Foundations, SRE: Control IPv6 RA generation on core routers - https://phabricator.wikimedia.org/T358220#9567655 (cmooney) p:Triage→Low
[13:55:17] netops, Infrastructure-Foundations, SRE: Control IPv6 RA generation on core routers - https://phabricator.wikimedia.org/T358220#9567676 (cmooney)
[13:55:25] netops, Infrastructure-Foundations, SRE, ops-codfw: Migrate hosts from codfw row A/B ASW to new LSW devices - https://phabricator.wikimedia.org/T355544#9567677 (cmooney)
[15:05:42] Traffic, netops, DC-Ops, Data-Persistence, and 6 others: ☂️ Northward Datacentre Switchover (March 2024) - https://phabricator.wikimedia.org/T357547#9568015 (jijiki)
[15:07:09] (LVSHighRX) firing: Excessive RX traffic on lvs2013:9100 (eno12399np0) - https://bit.ly/wmf-lvsrx - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=lvs2013 - https://alerts.wikimedia.org/?q=alertname%3DLVSHighRX
[15:11:11] Traffic, Content-Transform-Team, MW-on-K8s, SRE, and 3 others: Create parsoid mediawiki deployment and migrate parsoid-php.discovery.wmnet traffic to it - https://phabricator.wikimedia.org/T357392#9568026 (akosiaris)
[15:12:09] (LVSHighRX) resolved: Excessive RX traffic on lvs2013:9100 (eno12399np0) - https://bit.ly/wmf-lvsrx - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=lvs2013 - https://alerts.wikimedia.org/?q=alertname%3DLVSHighRX
[15:12:52] I love the smell of self-resolves in the morning
[15:37:37] netops, Infrastructure-Foundations, SRE, ops-codfw: Migrate hosts from codfw row A/B ASW to new LSW devices - https://phabricator.wikimedia.org/T355544#9568184 (cmooney)
[15:38:43] netops, DBA, Infrastructure-Foundations, SRE, ops-codfw: Migrate servers in codfw rack A8 from asw-a8-codfw to lsw1-a8-codfw - https://phabricator.wikimedia.org/T355874#9568181 (cmooney) Open→Resolved a:cmooney Closing this, thanks all for the help!
[15:42:44] Traffic, netops, DC-Ops, Data-Persistence, and 6 others: ☂️ Northward Datacentre Switchover (March 2024) - https://phabricator.wikimedia.org/T357547#9568203 (jijiki)
[15:43:32] Traffic, netops, DC-Ops, Data-Persistence, and 6 others: ☂️ Northward Datacentre Switchover (March 2024) - https://phabricator.wikimedia.org/T357547#9543213 (jijiki)
[15:47:19] topranks: cp2031 and 32 all yours, good luck!
[15:48:46] netops, Infrastructure-Foundations, SRE, SRE-swift-storage, ops-codfw: Migrate servers in codfw rack B2 from asw-b2-codfw to lsw1-b2-codfw - https://phabricator.wikimedia.org/T355868#9568216 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=93a3c441-2097-4840-a202-5694f260c1b5...
[15:49:07] sukhe: awesome thanks!
[15:49:13] I'll ping back shortly when we're done
[15:56:15] netops, Infrastructure-Foundations, SRE, SRE-swift-storage, ops-codfw: Migrate servers in codfw rack B2 from asw-b2-codfw to lsw1-b2-codfw - https://phabricator.wikimedia.org/T355868#9568300 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=90864fe1-6d91-45db-a2a5-2bb22463c114...
[16:08:19] netops, Infrastructure-Foundations, SRE, SRE-swift-storage, ops-codfw: Migrate servers in codfw rack B2 from asw-b2-codfw to lsw1-b2-codfw - https://phabricator.wikimedia.org/T355868#9568400 (cmooney) All hosts moved successfully and back responding to pings.
[16:08:38] sukhe: all done our side, you can repool when you want
[16:08:40] thanks :)
[16:09:24] thanks topranks!
[16:09:27] will do shortly
[16:13:04] netops, Infrastructure-Foundations, SRE, SRE-swift-storage, ops-codfw: Migrate servers in codfw rack B2 from asw-b2-codfw to lsw1-b2-codfw - https://phabricator.wikimedia.org/T355868#9568428 (MatthewVernon) Swift is back OK, thanks.
[16:32:58] netops, Infrastructure-Foundations, SRE, ops-codfw: Decom asw-a-codfw switch stack - https://phabricator.wikimedia.org/T358244#9568564 (cmooney) p:Triage→Medium
[16:33:40] netops, Infrastructure-Foundations, SRE, ops-codfw: Decom asw-a-codfw switch stack - https://phabricator.wikimedia.org/T358244#9568586 (cmooney)
[16:33:49] netops, Infrastructure-Foundations, SRE, SRE-swift-storage, ops-codfw: Migrate servers in codfw rack B2 from asw-b2-codfw to lsw1-b2-codfw - https://phabricator.wikimedia.org/T355868#9568588 (Fabfur) cp2031 and cp2032 are ok and repooled
[16:39:11] netops, Infrastructure-Foundations, SRE, ops-codfw: Decom asw-a-codfw switch stack - https://phabricator.wikimedia.org/T358244#9568622 (cmooney) All interfaces on asw-a-codfw are set to 'disabled' apart from the uplinks to ssw's, and no mac's learnt on SSW side so proceeding to delete those links...
[16:56:54] Traffic, MW-on-K8s, SRE, serviceops, Release-Engineering-Team (Seen): Serve production traffic via Kubernetes - https://phabricator.wikimedia.org/T290536#9568747 (dancy)
[17:02:02] netops, Infrastructure-Foundations, SRE, ops-codfw, Patch-For-Review: Decom asw-a-codfw switch stack - https://phabricator.wikimedia.org/T358244#9568765 (cmooney) Ok I've removed the configuration for the ESI-LAG between the codfw spine switches and asw-a-codfw both sides now. DC-Ops you can...
[17:02:53] netops, Infrastructure-Foundations, SRE, ops-codfw, Patch-For-Review: Decom asw-a-codfw switch stack - https://phabricator.wikimedia.org/T358244#9568799 (cmooney)
[17:43:50] Traffic, SRE: Disable acceptance of IPv6 router-advertisement on non-default LVS interface - https://phabricator.wikimedia.org/T358260#9569109 (cmooney) p:Triage→Low
[17:44:41] Traffic, SRE: Disable acceptance of IPv6 router-advertisement on non-default LVS interface - https://phabricator.wikimedia.org/T358260#9569128 (cmooney)
[18:01:49] Traffic, SRE: Disable acceptance of IPv6 router-advertisement on non-default LVS interface - https://phabricator.wikimedia.org/T358260#9569247 (cmooney) So I'm realising the RAs are how the LVS is determining the attached v6 subnet and creating the auto-assigned eui-64 addresses on each vlan interface....
[18:12:01] Traffic, SRE: Disable acceptance of IPv6 router-advertisement on non-default LVS interface - https://phabricator.wikimedia.org/T358260#9569280 (cmooney) >>! In T358260#9569247, @cmooney wrote: > I notice there is a //"net.ipv6.conf..accept_ra_defrtr"// which from what I can tell will not add a...
[18:19:54] netops, Infrastructure-Foundations, SRE: Do we need to generate aggregates for LVS service IP ranges? - https://phabricator.wikimedia.org/T350354#9569320 (cmooney) Open→Resolved a:cmooney >>! In T350354#9312533, @BBlack wrote: > I don't suspect it serves any real purpose at present, unles...
[18:52:17] Traffic, SRE: Disable acceptance of IPv6 router-advertisement on non-default LVS interface - https://phabricator.wikimedia.org/T358260#9569458 (cmooney)
[19:04:03] Traffic, SRE: Disable acceptance of IPv6 router-advertisement on non-default LVS interface - https://phabricator.wikimedia.org/T358260#9569509 (cmooney) FWIW this was the test I ran on one of our bookworm hosts. Starting with primary interface down, and vlan interface which is built on it also down, plu...
[20:07:45] Traffic: Fix ncredir nginx access log tailing - https://phabricator.wikimedia.org/T358281#9569849 (BCornwall)
[20:09:07] Traffic: Fix ncredir nginx access log tailing - https://phabricator.wikimedia.org/T358281#9569892 (BCornwall)
[20:09:31] Traffic: Fix ncredir nginx access log tailing - https://phabricator.wikimedia.org/T358281#9569894 (BCornwall) Open→In progress p:Triage→Medium
[20:24:18] Traffic: Fix ncredir nginx access log tailing - https://phabricator.wikimedia.org/T358281#9569921 (BCornwall) I suspect this has to do with the upgrade to Bookworm. The cp nodes are still on Bullseye/0.6.3 while ncredir is on Bookworm/0.6.4
[21:12:15] Traffic: Fix ncredir nginx access log tailing - https://phabricator.wikimedia.org/T358281#9570105 (BCornwall) It looks like perhaps some sort of race condition: It looks like ncredir4001's log socket is messed up (giving connection refused errors) and ncredir4002 is functioning as expected.