[00:02:56] (HAProxyEdgeTrafficDrop) firing: 69% request drop in text@ulsfo during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=ulsfo&var-cache_type=text - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop
[00:07:56] (HAProxyEdgeTrafficDrop) resolved: 69% request drop in text@ulsfo during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=ulsfo&var-cache_type=text - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop
[00:41:50] netops, Infrastructure-Foundations: Netbox DNS changes are not updating - https://phabricator.wikimedia.org/T315630 (Papaul)
[00:47:16] (VarnishTrafficDrop) firing: (3) Varnish traffic in eqsin has dropped 59.43635104608516% - https://wikitech.wikimedia.org/wiki/Varnish - https://grafana.wikimedia.org/d/000000180/varnish-http-requests?viewPanel=6 - https://alerts.wikimedia.org/?q=alertname%3DVarnishTrafficDrop
[00:47:56] (HAProxyEdgeTrafficDrop) firing: (4) 22% request drop in text@drmrs during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop
[00:52:16] (VarnishTrafficDrop) firing: (5) Varnish traffic in drmrs has dropped 15.572431413440153% - https://wikitech.wikimedia.org/wiki/Varnish - https://grafana.wikimedia.org/d/000000180/varnish-http-requests?viewPanel=6 - https://alerts.wikimedia.org/?q=alertname%3DVarnishTrafficDrop
[00:52:56] (HAProxyEdgeTrafficDrop) firing: (6) 58% request drop in text@codfw during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop
[00:57:16] (VarnishTrafficDrop) resolved: (8) Varnish traffic in drmrs has dropped 15.572431413440153% - https://wikitech.wikimedia.org/wiki/Varnish - https://grafana.wikimedia.org/d/000000180/varnish-http-requests?viewPanel=6 - https://alerts.wikimedia.org/?q=alertname%3DVarnishTrafficDrop
[00:57:56] (HAProxyEdgeTrafficDrop) resolved: (6) 58% request drop in text@codfw during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop
[03:26:57] (HAProxyEdgeTrafficDrop) firing: 24% request drop in text@esams during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=esams&var-cache_type=text - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop
[03:31:56] (HAProxyEdgeTrafficDrop) resolved: 24% request drop in text@esams during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=esams&var-cache_type=text - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop
[07:03:06] netops, Infrastructure-Foundations, SRE: Netbox DNS changes are not updating - https://phabricator.wikimedia.org/T315630 (cmooney) Hi @Papaul My apologies that's due to me, as soon as I can get https://gerrit.wikimedia.org/r/c/operations/dns/+/824572 merged I'll sort it.
[07:12:26] Traffic, Performance-Team, SRE, SRE-swift-storage, and 2 others: Progressive Multi-DC roll out - https://phabricator.wikimedia.org/T279664 (Ladsgroup)
[09:08:52] netops, Infrastructure-Foundations, SRE: Occasional high ICMP probe response from codfw to cr2-drmrs - https://phabricator.wikimedia.org/T315645 (cmooney) p:Triage→Low
[09:09:17] netops, Infrastructure-Foundations, SRE: Occasional high ICMP probe response from codfw to cr2-drmrs - https://phabricator.wikimedia.org/T315645 (cmooney) a:cmooney
[09:26:13] netops, Infrastructure-Foundations, SRE: Netbox DNS changes are not updating - https://phabricator.wikimedia.org/T315630 (cmooney) Open→Resolved Ok got the +1 and merged. All is good now. ` cmooney@cumin1001:~$ dig +short A kubernetes2024.codfw.wmnet @ns0.wikimedia.org 10.192.48.87 cmooney...
[09:29:57] netops, Infrastructure-Foundations, SRE: Occasional high ICMP probe response from codfw to cr2-drmrs - https://phabricator.wikimedia.org/T315645 (cmooney)
[11:47:14] netops, Infrastructure-Foundations, SRE: Occasional high ICMP probe response from codfw to cr2-drmrs - https://phabricator.wikimedia.org/T315645 (cmooney) Ok so I got some results back. Firstly 10,000 pings to cr1-drmrs from bast2002, starting at 08:14 UTC. Average RTT was 118ms, worst was 154ms: `...
[11:54:56] netops, Cloud-Services, Infrastructure-Foundations, SRE: Allow jumbo frames between cloud hosts in production realm - https://phabricator.wikimedia.org/T315446 (dcaro) That seemed to do the trick yes! Thanks!
[13:55:59] netops, Infrastructure-Foundations, SRE: Netbox DNS changes are not updating - https://phabricator.wikimedia.org/T315630 (Papaul) @cmooney thank you my changes are now on the DNS server.
[14:58:43] Traffic: Add DP cookie for pageview filtering - https://phabricator.wikimedia.org/T315676 (Isaac)
[15:00:16] Traffic, SRE, Patch-For-Review, Performance-Team (Radar), Sustainability (Incident Followup): Experiment with single backend CDN nodes - https://phabricator.wikimedia.org/T288106 (Krinkle)
[15:00:33] Traffic, SRE, Patch-For-Review, Performance-Team (Radar), Sustainability (Incident Followup): Experiment with single backend CDN nodes - https://phabricator.wikimedia.org/T288106 (Krinkle)
[15:25:54] Traffic, netops, Infrastructure-Foundations, SRE: Users of Jio ISP (India, AS 55836) unable to reach Wikimedia sites - https://phabricator.wikimedia.org/T260449 (Krinkle)
[15:26:05] Traffic, netops, Infrastructure-Foundations, SRE, Sustainability (Incident Followup): clean up workaround and measurements put in place during Jio RPKI error - https://phabricator.wikimedia.org/T260452 (Krinkle)
[16:10:32] netops, Infrastructure-Foundations, SRE: Overlay VRF / VXLAN traffic failure between lsw1-f2-eqiad and lsw1-f3-eqiad - https://phabricator.wikimedia.org/T315038 (cmooney) p:High→Low Thanks yep case opened with JTAC now will keep it open to document any information they may provide.
[17:44:58] netops, Infrastructure-Foundations, Observability-Metrics, SRE, and 2 others: LibreNMS seemingly not collecting data for many ports after migration to netmon1003 - https://phabricator.wikimedia.org/T314972 (CDanis) Looks like this is resolved...?
[19:43:32] netops, Infrastructure-Foundations, Observability-Metrics, SRE, and 2 others: LibreNMS seemingly not collecting data for many ports after migration to netmon1003 - https://phabricator.wikimedia.org/T314972 (andrea.denisse) Open→Resolved