[02:52:17] Traffic, envoy, serviceops, SRE: Upgrade Envoy to v1.26.8 and drop buster - https://phabricator.wikimedia.org/T402584#11109389 (RLazarus) For posterity -- I fatfingered the `reprepro include` the first time and included the _source.changes without the _amd64.changes, so for a couple hours we had...
[06:18:52] Traffic, SRE: Add pageview information to turnilo's webrequest_sampled_live - https://phabricator.wikimedia.org/T402612 (Joe) NEW
[07:54:51] FIRING: FermMSS: Unexpected MSS value on 10.2.1.30:9200 @ cirrussearch2113 - https://wikitech.wikimedia.org/wiki/LVS#LVSRealserverMSS_alert - https://grafana.wikimedia.org/d/Y9-MQxNSk/ipip-encapsulated-services?orgId=1&viewPanel=4&var-site=codfw&var-cluster=elasticsearch - https://alerts.wikimedia.org/?q=alertname%3DFermMSS
[07:59:51] FIRING: [2x] FermMSS: Unexpected MSS value on 10.2.1.30:9200 @ cirrussearch2102 - https://wikitech.wikimedia.org/wiki/LVS#LVSRealserverMSS_alert - https://grafana.wikimedia.org/d/Y9-MQxNSk/ipip-encapsulated-services?orgId=1&viewPanel=4&var-site=codfw&var-cluster=elasticsearch - https://alerts.wikimedia.org/?q=alertname%3DFermMSS
[08:01:20] Hi sukhe is it ok to depool the dse-k8s servers till we figure out the calico block we have atm?
[08:14:51] FIRING: [2x] FermMSS: Unexpected MSS value on 10.2.1.30:9200 @ cirrussearch2102 - https://wikitech.wikimedia.org/wiki/LVS#LVSRealserverMSS_alert - https://grafana.wikimedia.org/d/Y9-MQxNSk/ipip-encapsulated-services?orgId=1&viewPanel=4&var-site=codfw&var-cluster=elasticsearch - https://alerts.wikimedia.org/?q=alertname%3DFermMSS
[08:19:51] RESOLVED: [2x] FermMSS: Unexpected MSS value on 10.2.1.30:9200 @ cirrussearch2102 - https://wikitech.wikimedia.org/wiki/LVS#LVSRealserverMSS_alert - https://grafana.wikimedia.org/d/Y9-MQxNSk/ipip-encapsulated-services?orgId=1&viewPanel=4&var-site=codfw&var-cluster=elasticsearch - https://alerts.wikimedia.org/?q=alertname%3DFermMSS
[08:52:00] vgutierrez: how do you think we are doing with the IPIP stuff in eqiad?
[08:52:26] specifically I'm thinking about T402432 - and our available fibre bundles between areas of the DC in eqiad
[08:52:27] T402432: Eqiad: new structured cabling needed between cages to eqiad 2025/6 switch refresh - https://phabricator.wikimedia.org/T402432
[08:53:25] but I'm wondering if we might be able to re-use the link that had previously been going to lvs1017 (currently unused - we didn't recable to lvs1016) and the one that goes to lvs1018
[08:54:14] no pressure - we need to order more fibres anyway, the only question is if we need to do that right now for current projects or have to wait a few months. so not a huge difference for us.
[09:41:25] FIRING: SystemdUnitCrashLoop: varnish-frontend-slowlog.service crashloop on cp3072:9100 - TODO - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitCrashLoop
[09:51:25] RESOLVED: SystemdUnitCrashLoop: varnish-frontend-slowlog.service crashloop on cp3072:9100 - TODO - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitCrashLoop
[10:07:25] FIRING: SystemdUnitCrashLoop: varnish-frontend-slowlog.service crashloop on cp3072:9100 - TODO - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitCrashLoop
[10:22:25] RESOLVED: SystemdUnitCrashLoop: varnish-frontend-slowlog.service crashloop on cp3072:9100 - TODO - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitCrashLoop
[10:23:56] Traffic, Data-Engineering, Patch-For-Review: Reduce noise from duplicate sequence-gap alerts on HaProxy-webrequests - https://phabricator.wikimedia.org/T401383#11110120 (Vgutierrez) p:High→Medium sampling on `webrequest_sampled` has been fixed by merging https://gerrit.wikimedia.org/r/1181033
[10:28:59] topranks: high-traffic LVS in eqiad are using IPIP for all their traffic
[10:29:25] topranks: low-traffic for non-k8s services too, k8s services are a WIP
[10:29:50] so we currently need L2 adjacency on lvs-low-traffic and lvs-secondary
[10:32:26] Traffic: varnish-frontend-slowlog service restarts with decoding error - https://phabricator.wikimedia.org/T402634 (Fabfur) NEW
[10:53:59] vgutierrez: yep, thanks!
[10:54:27] iirc the last time we spoke you said you'd prefer to keep those links connected to the high-traffic LVS "just in case" we had to revert
[10:54:41] I guess what I'm wondering is whether the confidence level has improved and we are now ready to consider removing them
[10:56:57] I think so
[10:57:12] it's been running with IPIP for months without known issues that can be pinned down to it
[11:35:40] netops, Infrastructure-Foundations, SRE: Investigate using BGP addpath for unicast IBGP spine/leaf pods - https://phabricator.wikimedia.org/T402640 (cmooney) NEW p:Triage→Medium
[11:42:25] netops, Infrastructure-Foundations, SRE: Investigate using BGP addpath for unicast IBGP spine/leaf pods - https://phabricator.wikimedia.org/T402640#11110357 (cmooney)
[13:30:14] stevemunene: hi. the ones in codfw you mean?
[13:34:16] I am not up to date with what the current status is, but there are some considerations here, such as the depool threshold
[16:32:49] Traffic, Hiddenparma: Add known-client-ingestion-source objects an logic - https://phabricator.wikimedia.org/T402014#11111174 (Scott_French) Thanks for writing this up, @JMeybohm. > Validation/Safeguards: > * We need to make sure imported networks don't overlap with already existing networks in other kn...
[17:21:28] Traffic, envoy, serviceops, SRE, Patch-For-Review: Upgrade Envoy to v1.26.8 and drop buster - https://phabricator.wikimedia.org/T402584#11111269 (Dzahn)
[19:36:59] Traffic, SRE: Intermittent access issues to English Wikipedia on desktop/laptop - https://phabricator.wikimedia.org/T402142#11111793 (Josve05a) I say that this ticket can be closed, as yet another of the VRT tickets has been confirmed to have had their issues fixed.
[19:40:28] Traffic, SRE, SecTeam-Processed: Intermittent access issues to English Wikipedia on desktop/laptop - https://phabricator.wikimedia.org/T402142#11111803 (sbassett) Open→Resolved a:sbassett