[00:02:16] Traffic: ncmonitor should set ncredir entries to their similar counterparts - https://phabricator.wikimedia.org/T368692 (BCornwall) NEW
[00:02:19] Traffic: ncmonitor should set ncredir entries to their similar counterparts - https://phabricator.wikimedia.org/T368692#9933145 (BCornwall) p:Triage→Low
[00:09:24] Traffic: Add prometheus metrics to ncmonitor - https://phabricator.wikimedia.org/T368693 (BCornwall) NEW
[00:09:35] Traffic: Add prometheus metrics to ncmonitor - https://phabricator.wikimedia.org/T368693#9933168 (BCornwall) p:Triage→Medium
[00:13:31] Traffic: ncmonitor should not submit new CRs if there are still some yet to be reviewed - https://phabricator.wikimedia.org/T368694 (BCornwall) NEW
[00:13:37] Traffic: ncmonitor should not submit new CRs if there are still some yet to be reviewed - https://phabricator.wikimedia.org/T368694#9933186 (BCornwall) p:Triage→Medium
[00:32:38] FIRING: [8x] LVSRealserverMSS: Unexpected MSS value on 103.102.166.224:443 @ cp5024 - https://wikitech.wikimedia.org/wiki/LVS#LVSRealserverMSS_alert - https://grafana.wikimedia.org/d/Y9-MQxNSk/ipip-encapsulated-services?orgId=1&viewPanel=2&var-site=eqsin&var-cluster=cache_text - https://alerts.wikimedia.org/?q=alertname%3DLVSRealserverMSS
[00:37:38] RESOLVED: [8x] LVSRealserverMSS: Unexpected MSS value on 103.102.166.224:443 @ cp5024 - https://wikitech.wikimedia.org/wiki/LVS#LVSRealserverMSS_alert - https://grafana.wikimedia.org/d/Y9-MQxNSk/ipip-encapsulated-services?orgId=1&viewPanel=2&var-site=eqsin&var-cluster=cache_text - https://alerts.wikimedia.org/?q=alertname%3DLVSRealserverMSS
[00:39:38] FIRING: [8x] LVSRealserverMSS: Unexpected MSS value on 103.102.166.224:443 @ cp5024 - https://wikitech.wikimedia.org/wiki/LVS#LVSRealserverMSS_alert - https://grafana.wikimedia.org/d/Y9-MQxNSk/ipip-encapsulated-services?orgId=1&viewPanel=2&var-site=eqsin&var-cluster=cache_text - https://alerts.wikimedia.org/?q=alertname%3DLVSRealserverMSS
[00:40:54] Traffic, DC-Ops, ops-eqsin, Patch-For-Review: Q4: install PCIe NVMe SSDs into eqsin text cp50(1[789]|2[01234] - https://phabricator.wikimedia.org/T365763#9933201 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by brett@cumin2002 for host cp5024.eqsin.wmnet with OS bullseye compl...
[00:44:38] RESOLVED: [8x] LVSRealserverMSS: Unexpected MSS value on 103.102.166.224:443 @ cp5024 - https://wikitech.wikimedia.org/wiki/LVS#LVSRealserverMSS_alert - https://grafana.wikimedia.org/d/Y9-MQxNSk/ipip-encapsulated-services?orgId=1&viewPanel=2&var-site=eqsin&var-cluster=cache_text - https://alerts.wikimedia.org/?q=alertname%3DLVSRealserverMSS
[00:46:05] Traffic, DC-Ops, ops-eqsin, Patch-For-Review: Q4: install PCIe NVMe SSDs into eqsin text cp50(1[789]|2[01234] - https://phabricator.wikimedia.org/T365763#9933202 (BCornwall)
[04:03:19] netops, DC-Ops, Infrastructure-Foundations, ops-codfw, SRE: codfw row C/D upgrade racking task - https://phabricator.wikimedia.org/T360789#9933337 (Papaul)
[07:13:24] netops, Traffic, Infrastructure-Foundations, serviceops: IPIP encapsulation considerations for low-traffic services - https://phabricator.wikimedia.org/T368544#9933423 (ayounsi) IPIP encapsulation is a necessary step in the good direction, whatever solution we decide on for load balancing, for th...
[07:18:26] Traffic: Replace ping offload servers with eBPF - https://phabricator.wikimedia.org/T367973#9933433 (ayounsi) Indeed, amazing ! Just a few lines of code to replace multiple VMs and router policies :) > Is part of the rationale for the ping servers not to reduce traffic (as well as load) on the LVS? I don't...
[07:50:02] Traffic, Data-Platform-SRE (2024.06.17 - 2024.07.07), Patch-For-Review: Migrate Cloudelastic load balancing to IPIP encapsulation (LVS) - https://phabricator.wikimedia.org/T367511#9933480 (Gehel) >>! In T367511#9915026, @Don-vip wrote: > Did this change fix T365154? video2commons became live agai...
[08:33:19] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Configure QoS marking and policy across network - https://phabricator.wikimedia.org/T339850#9933530 (cmooney)
[09:13:22] netops, Data-Persistence, DBA, Infrastructure-Foundations, and 2 others: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 - lsw1-e7-eqiad - https://phabricator.wikimedia.org/T365988#9933563 (cmooney) Open→Resolved Thanks all for the help with this one!
[10:28:50] netops, Traffic, Infrastructure-Foundations, serviceops: IPIP encapsulation considerations for low-traffic services - https://phabricator.wikimedia.org/T368544#9933828 (cmooney) >>! In T368544#9933423, @ayounsi wrote: > An `ip route 0/0` rule would be needed to "clamp" the outbound MTU or MSS (us...
[12:12:30] Traffic, Data-Platform-SRE (2024.06.17 - 2024.07.07), Patch-For-Review: Migrate Cloudelastic load balancing to IPIP encapsulation (LVS) - https://phabricator.wikimedia.org/T367511#9934107 (Don-vip) Yes, taavi pointed us towards the change that repaired video2commons: https://phabricator.wikimedia...
[12:34:21] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Add per-output queue monitoring for Juniper network devices - https://phabricator.wikimedia.org/T326322#9934200 (cmooney) @fgiunchedi I was perhaps a little cheeky and merged this, but it was clear the volume of new metrics was well withi...
[14:59:40] FIRING: VarnishHighThreadCount: Varnish's thread count on cp3066:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://grafana.wikimedia.org/d/wiU3SdEWk/cache-host-drilldown?viewPanel=99&var-site=esams&var-instance=cp3066 - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[14:59:50] hmmmmm
[15:04:40] FIRING: [6x] VarnishHighThreadCount: Varnish's thread count on cp3066:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[15:05:31] seems to have subsided?
[15:05:50] getting better yep
[15:09:40] FIRING: [7x] VarnishHighThreadCount: Varnish's thread count on cp3066:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[15:14:40] FIRING: [8x] VarnishHighThreadCount: Varnish's thread count on cp3066:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[15:24:40] FIRING: [8x] VarnishHighThreadCount: Varnish's thread count on cp3066:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[15:29:40] FIRING: [4x] VarnishHighThreadCount: Varnish's thread count on cp3066:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[15:34:41] RESOLVED: [3x] VarnishHighThreadCount: Varnish's thread count on cp3066:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[16:22:47] Traffic: [ncmonitor] Detect, ignore, and notify about duplicate domain name entries in MarkMonitor - https://phabricator.wikimedia.org/T368758 (BCornwall) NEW
[16:22:53] Traffic: [ncmonitor] Detect, ignore, and notify about duplicate domain name entries in MarkMonitor - https://phabricator.wikimedia.org/T368758#9935230 (BCornwall) p:Triage→Medium
[16:25:05] Traffic: [ncmonitor] Detect, ignore, and notify about duplicate domain name entries in MarkMonitor - https://phabricator.wikimedia.org/T368758#9935234 (BCornwall)
[18:25:06] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Add per-output queue monitoring for Juniper network devices - https://phabricator.wikimedia.org/T326322#9935568 (cmooney) I may have spoken too soon when I said things were working fine. It seems in codfw since the change we are only get...
[20:46:28] Traffic: [ncmonitor] Detect, ignore, and notify about duplicate domain name entries in MarkMonitor - https://phabricator.wikimedia.org/T368758#9936142 (BCornwall) Open→In progress
[21:58:27] Traffic, DNS, fundraising-tech-ops, SRE, Patch-For-Review: Cleanup unused DNS subdomains - https://phabricator.wikimedia.org/T367012#9936334 (Dwisehaupt) Adding @AKanji-WMF on this to coordinate with Major Gifts for the benefactors site. Anil: The previous tasks associated with this are: T10...