[07:53:52] Traffic, netops, Infrastructure-Foundations: Upgrade to Bird 2 - https://phabricator.wikimedia.org/T310574 (ayounsi) p:Triage→Low
[09:01:56] (HAProxyEdgeTrafficDrop) firing: 52% request drop in text@eqiad during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=eqiad&var-cache_type=text - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop
[09:06:56] (HAProxyEdgeTrafficDrop) resolved: 52% request drop in text@eqiad during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=eqiad&var-cache_type=text - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop
[09:09:56] (HAProxyEdgeTrafficDrop) firing: 48% request drop in text@codfw during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=codfw&var-cache_type=text - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop
[09:14:56] (HAProxyEdgeTrafficDrop) resolved: (2) 66% request drop in text@codfw during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop
[12:03:07] Traffic, Community-Tech, MediaWiki-Parser, SRE, and 3 others: Show SVGs in page language if available - https://phabricator.wikimedia.org/T205040 (Winston_Sung)
[12:52:58] Traffic, SRE-OnFire, Sustainability (Incident Followup): (Re) evaluate effectiveness / usefulness of varnish/haproxy traffic drop alerts - https://phabricator.wikimedia.org/T310608 (fgiunchedi)
[12:58:41] Traffic, SRE, Wikimedia-Incident: 503 Service Unavailable - https://phabricator.wikimedia.org/T310368 (fgiunchedi) Please see https://wikitech.wikimedia.org/wiki/Incidents/2022-06-14_overload_varnish_/_haproxy for the public incident report (we know what's going on, the report is light on details on...
[14:02:00] Traffic, SRE, serviceops: fawiki user reports getting 503 errors with message "upstream connect error or disconnect before headers" - https://phabricator.wikimedia.org/T310450 (CDanis) This error message comes from [[ https://www.envoyproxy.io/ | Envoy ]], which we use for internal cross-service TLS...
[14:10:47] Domains, Analytics-Radar, SRE, Traffic-Icebox, WMF-General-or-Unknown: Don't set cookies in traffic layer for non-user facing domains (avoid false third-party cookie warning) - https://phabricator.wikimedia.org/T262996 (Nemo_bis) Is this related to https://phabricator.wikimedia.org/T255366 ?
[14:19:06] Traffic, Community-Tech, MediaWiki-Parser, SRE, and 3 others: Show SVGs in page language if available - https://phabricator.wikimedia.org/T205040 (Winston_Sung)
[17:27:47] Traffic, DC-Ops, SRE, ops-eqiad: cp1089 memory errors on DIMM_B1 - https://phabricator.wikimedia.org/T310387 (Cmjohnson) Open→Resolved a:Cmjohnson The server was out of warranty, I swapped DIMM B1 with a DIMM from a spare. Server booted, no issues.
[17:41:53] Traffic, DC-Ops, SRE, ops-eqiad: cp1089 memory errors on DIMM_B1 - https://phabricator.wikimedia.org/T310387 (ssingh) >>! In T310387#8003619, @Cmjohnson wrote: > The server was out of warranty, I swapped DIMM B1 with a DIMM from a spare. Server booted, no issues. Thanks for your help @Cmjohnson!
[19:02:57] (HAProxyEdgeTrafficDrop) firing: 23% request drop in text@eqsin during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=eqsin&var-cache_type=text - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop
[19:07:57] (HAProxyEdgeTrafficDrop) resolved: 40% request drop in text@eqsin during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=eqsin&var-cache_type=text - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop
[19:22:57] (HAProxyEdgeTrafficDrop) firing: 60% request drop in text@eqsin during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=eqsin&var-cache_type=text - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop
[19:27:56] (HAProxyEdgeTrafficDrop) resolved: 67% request drop in text@eqsin during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=eqsin&var-cache_type=text - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop
[20:33:56] (HAProxyEdgeTrafficDrop) firing: (2) 41% request drop in text@codfw during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop
[20:38:56] (HAProxyEdgeTrafficDrop) resolved: (2) 42% request drop in text@codfw during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop