[00:16:01] netops, Infrastructure-Foundations, SRE: Rancid on netmon1003 unable to login to network devices - https://phabricator.wikimedia.org/T314936 (andrea.denisse)
[00:16:51] netops, Infrastructure-Foundations, SRE: Rancid on netmon1003 unable to login to network devices - https://phabricator.wikimedia.org/T314936 (andrea.denisse) In progress→Resolved
[00:17:22] netops, Infrastructure-Foundations, SRE: Rancid on netmon1003 unable to login to network devices - https://phabricator.wikimedia.org/T314936 (andrea.denisse) Fixed in [[ https://gerrit.wikimedia.org/r/822196 | 822196 ]].
[04:29:57] (HAProxyEdgeTrafficDrop) firing: 54% request drop in text@drmrs during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=drmrs&var-cache_type=text - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop
[04:31:57] (VarnishTrafficDrop) firing: Varnish traffic in eqiad has dropped 68.85855831832225% - https://wikitech.wikimedia.org/wiki/Varnish - https://grafana.wikimedia.org/d/000000180/varnish-http-requests?viewPanel=6 - https://alerts.wikimedia.org/?q=alertname%3DVarnishTrafficDrop
[04:34:57] (HAProxyEdgeTrafficDrop) resolved: 52% request drop in text@drmrs during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=drmrs&var-cache_type=text - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop
[04:36:41] (VarnishTrafficDrop) resolved: Varnish traffic in eqiad has dropped 67.5369097678865% - https://wikitech.wikimedia.org/wiki/Varnish - https://grafana.wikimedia.org/d/000000180/varnish-http-requests?viewPanel=6 - https://alerts.wikimedia.org/?q=alertname%3DVarnishTrafficDrop
[08:56:18] netops, Infrastructure-Foundations, SRE: Rancid on netmon1003 unable to login to network devices - https://phabricator.wikimedia.org/T314936 (cmooney) Resolved→Open @andrea.denisse Hey, does your patch correct the other problem I observed above? With the prompt for accepting the host key cau...
[10:41:56] Traffic, netops, Infrastructure-Foundations: Increased latency between eqiad<-->eqsin beginning on August 15th ~08:30:00 - https://phabricator.wikimedia.org/T315429 (Vgutierrez)
[11:57:25] Traffic, Performance-Team, SRE, serviceops: multi-dc.lua ATS script failing in production - https://phabricator.wikimedia.org/T315434 (Vgutierrez)
[13:26:04] Traffic, Beta-Cluster-Infrastructure, Infrastructure-Foundations, SRE, and 3 others: Evaluation Error on deployment-cache-text06 puppet run - https://phabricator.wikimedia.org/T315351 (RhinosF1)
[13:31:43] Traffic, Performance-Team, SRE, serviceops: multi-dc.lua ATS script failing in production - https://phabricator.wikimedia.org/T315434 (Vgutierrez) Open→Resolved a:Vgutierrez Fix has been deployed. I'll reopen the task if we are still seeing errors
[13:31:53] Traffic, Performance-Team, SRE, SRE-swift-storage, and 2 others: Progressive Multi-DC roll out - https://phabricator.wikimedia.org/T279664 (Vgutierrez)
[13:38:01] netops, Infrastructure-Foundations, SRE: Rancid on netmon1003 unable to login to network devices - https://phabricator.wikimedia.org/T314936 (andrea.denisse) @cmooney Thanks for the heads-up, I missed that part, my bad.
[14:00:06] netops, Infrastructure-Foundations, Observability-Alerting, SRE: Enable OSPF Icinga check for EVPN based switches - https://phabricator.wikimedia.org/T315053 (lmata)
[14:07:36] netops, Cloud-Services, Infrastructure-Foundations, SRE: Allow jumbo frames between cloud hosts in production realm - https://phabricator.wikimedia.org/T315446 (cmooney) p:Triage→Medium
[14:28:21] Traffic, Beta-Cluster-Infrastructure, Infrastructure-Foundations, SRE, and 3 others: Evaluation Error on deployment-cache-text06 puppet run - https://phabricator.wikimedia.org/T315351 (Zabe) Open→Resolved Some cherry-picks made by ori made puppet run again, see T315394 for follow-up.
[15:29:30] netops, Cloud-Services, Infrastructure-Foundations, SRE: Allow jumbo frames between cloud hosts in production realm - https://phabricator.wikimedia.org/T315446 (cmooney) Ok so looking at this a bit closer it seems the omission was just that the MTU wasn't set high on cloudsw1-c8, on its links to...
[15:59:03] netops, Cloud-Services, Infrastructure-Foundations, SRE: Allow jumbo frames between cloud hosts in production realm - https://phabricator.wikimedia.org/T315446 (dcaro)
[16:22:37] Traffic, MediaWiki-General, SRE, MW-1.39-notes (1.39.0-wmf.23; 2022-08-01), Patch-For-Review: Roll out query parameter normalization - https://phabricator.wikimedia.org/T314868 (ori)
[16:57:18] netops, Cloud-Services, Infrastructure-Foundations, SRE: Allow jumbo frames between cloud hosts in production realm - https://phabricator.wikimedia.org/T315446 (cmooney) Open→Resolved Ok gonna close this one as the cloud team have confirmed things are now working for them. Apologies for...
[18:51:55] netops, Infrastructure-Foundations, SRE, ops-eqiad: eqiad: upgrade row C and D uplinks from 4x10G to 1x40G - https://phabricator.wikimedia.org/T313463 (Jclark-ctr)
[18:56:25] netops, Infrastructure-Foundations, SRE, ops-eqiad, Sustainability (Incident Followup): eqiad row C switch fabric recabling - https://phabricator.wikimedia.org/T313384 (Jclark-ctr)
[21:23:59] Traffic, MediaWiki-extensions-CentralNotice: Extremely outdated GeoIP cookie - https://phabricator.wikimedia.org/T315490 (nshahquinn-wmf)
[22:37:57] (HAProxyEdgeTrafficDrop) firing: (2) 41% request drop in text@eqsin during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop
[22:38:42] win 47
[22:42:56] (HAProxyEdgeTrafficDrop) resolved: (3) 68% request drop in text@drmrs during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop
[22:42:58] Traffic, MediaWiki-extensions-CentralNotice: Extremely outdated GeoIP cookie - https://phabricator.wikimedia.org/T315490 (Platonides) You are not the first one to notice it. This is basically a duplicate of T122097
[22:43:44] Traffic, MediaWiki-extensions-CentralNotice: Extremely outdated GeoIP cookie - https://phabricator.wikimedia.org/T315490 (Platonides)
[23:09:57] (HAProxyEdgeTrafficDrop) firing: 67% request drop in text@esams during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=esams&var-cache_type=text - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop
[23:14:56] (HAProxyEdgeTrafficDrop) resolved: 66% request drop in text@esams during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=esams&var-cache_type=text - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop
[23:16:56] (HAProxyEdgeTrafficDrop) firing: 69% request drop in text@esams during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=esams&var-cache_type=text - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop
[23:21:56] (HAProxyEdgeTrafficDrop) resolved: 69% request drop in text@esams during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=esams&var-cache_type=text - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop