[00:57:56] (HAProxyEdgeTrafficDrop) firing: 51% request drop in text@ulsfo during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=ulsfo&var-cache_type=text - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop
[00:59:23] Traffic, DC-Ops, SRE, ops-ulsfo, Patch-For-Review: Q1:rack/setup/install cp40[37-51] - https://phabricator.wikimedia.org/T317244 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by sukhe@cumin2002 for host cp4043.ulsfo.wmnet with OS buster
[01:11:46] Traffic, DC-Ops, SRE, ops-ulsfo, Patch-For-Review: Q1:rack/setup/install cp40[37-51] - https://phabricator.wikimedia.org/T317244 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by sukhe@cumin2002 for host cp4043.ulsfo.wmnet with OS buster executed with errors: - cp4043 (**FA...
[01:16:06] Traffic, DC-Ops, SRE, ops-ulsfo, Patch-For-Review: Q1:rack/setup/install cp40[37-51] - https://phabricator.wikimedia.org/T317244 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by sukhe@cumin2002 for host cp4043.ulsfo.wmnet with OS buster
[01:32:56] (HAProxyEdgeTrafficDrop) resolved: 61% request drop in text@ulsfo during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=ulsfo&var-cache_type=text - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop
[02:13:17] Traffic, DC-Ops, SRE, ops-ulsfo, Patch-For-Review: Q1:rack/setup/install cp40[37-51] - https://phabricator.wikimedia.org/T317244 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by sukhe@cumin2002 for host cp4043.ulsfo.wmnet with OS buster completed: - cp4043 (**PASS**) - R...
[09:01:20] Traffic: Upgrade HAProxy on cp nodes to 2.6.x LTS - https://phabricator.wikimedia.org/T321775 (Vgutierrez)
[09:01:34] Traffic: Upgrade HAProxy on cp nodes to 2.6.x LTS - https://phabricator.wikimedia.org/T321775 (Vgutierrez) p:Triage→Medium
[09:03:28] Traffic: Upgrade HAProxy on cp nodes to 2.6.x LTS - https://phabricator.wikimedia.org/T321775 (Vgutierrez)
[09:14:28] Traffic, SRE: Clean up after ATS 9.x upgrade - https://phabricator.wikimedia.org/T321776 (Vgutierrez)
[09:14:53] Traffic, SRE: Clean up after ATS 9.x upgrade - https://phabricator.wikimedia.org/T321776 (Vgutierrez) p:Triage→Medium
[09:17:17] Hello. Is there a rolling restart of haproxy or varnish or similar going on at the moment? We've got these false positive alerts from varnishkafka again in #wikimedia-analytics and it would be useful to tie them to a cause. Thanks.
[09:21:12] btullis: nope
[09:21:16] ulsfo is depooled
[09:21:33] but besides that.. nothing AFAIK
[09:23:48] OK, thanks. Most of them are indeed ulsfo, so that helps. But we also have a couple of brief blips from eqiad and codfw. I'll need to go back and tweak the alertmanager rule again.
[09:24:20] btullis: afaics there was a spike in requests handled, maybe the datapoints after that were calculated as a drop
[09:24:32] (at least for the two eqiad nodes that I checked)
[09:27:31] elukey: OK, thanks. Good to know. This is the rule that is firing: https://github.com/wikimedia/operations-alerts/blob/master/team-data-engineering/varnishkafka.yaml#L9 I tried to smooth out spikes caused by starts/stops/pools/depools but clearly I need to go back to it again.
[09:28:43] btullis: it is not a simple use case :(
[09:29:56] (HAProxyEdgeTrafficDrop) firing: (2) 62% request drop in text@codfw during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop
[09:34:56] (HAProxyEdgeTrafficDrop) resolved: (2) 63% request drop in text@codfw during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop
[10:30:16] (VarnishTrafficDrop) firing: Varnish traffic in esams has dropped 69.99595829728356% - https://wikitech.wikimedia.org/wiki/Varnish - https://grafana.wikimedia.org/d/000000180/varnish-http-requests?viewPanel=6 - https://alerts.wikimedia.org/?q=alertname%3DVarnishTrafficDrop
[10:32:14] (HAProxyEdgeTrafficDrop) firing: 38% request drop in text@ulsfo during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=ulsfo&var-cache_type=text - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop
[10:35:16] (VarnishTrafficDrop) firing: (3) Varnish traffic in eqsin has dropped 57.584549735414726% - https://wikitech.wikimedia.org/wiki/Varnish - https://grafana.wikimedia.org/d/000000180/varnish-http-requests?viewPanel=6 - https://alerts.wikimedia.org/?q=alertname%3DVarnishTrafficDrop
[10:37:13] (HAProxyEdgeTrafficDrop) resolved: 51% request drop in text@ulsfo during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=ulsfo&var-cache_type=text - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop
[10:40:16] (VarnishTrafficDrop) resolved: (4) Varnish traffic in eqsin has dropped 57.584549735414726% - https://wikitech.wikimedia.org/wiki/Varnish - https://grafana.wikimedia.org/d/000000180/varnish-http-requests?viewPanel=6 - https://alerts.wikimedia.org/?q=alertname%3DVarnishTrafficDrop
[10:43:56] Traffic, MW-on-K8s, SRE, serviceops, and 2 others: Deploy mediawiki kubernetes services - https://phabricator.wikimedia.org/T321786 (Clement_Goubert)
[10:44:59] Traffic, MW-on-K8s, SRE, serviceops, and 2 others: Deploy mediawiki kubernetes services - https://phabricator.wikimedia.org/T321786 (Clement_Goubert) Open→In progress p:Triage→High
[10:45:11] Traffic, MW-on-K8s, SRE, serviceops, and 2 others: Serve production traffic via Kubernetes - https://phabricator.wikimedia.org/T290536 (Clement_Goubert)
[10:59:56] (HAProxyEdgeTrafficDrop) firing: (2) 15% request drop in text@codfw during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop
[11:04:56] (HAProxyEdgeTrafficDrop) firing: (5) 19% request drop in text@codfw during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop
[11:09:56] (HAProxyEdgeTrafficDrop) resolved: (5) 37% request drop in text@codfw during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop
[13:21:01] Traffic, SRE: Upgrade HAProxy on cp nodes to 2.6.x LTS - https://phabricator.wikimedia.org/T321775 (CDanis)
[13:34:37] Traffic: Enterprise redirect for wikimediaenterprise.com to enterprise.wikimedia.com - https://phabricator.wikimedia.org/T321804 (HShaikh)
[16:07:34] Traffic, DC-Ops, SRE, ops-ulsfo, Patch-For-Review: Q1:rack/setup/install cp40[37-51] - https://phabricator.wikimedia.org/T317244 (ssingh)
[16:08:01] Traffic, DC-Ops, SRE, ops-ulsfo, Patch-For-Review: Q1:rack/setup/install cp40[37-51] - https://phabricator.wikimedia.org/T317244 (ssingh)
[16:29:15] netops, Infrastructure-Foundations, SRE, Patch-For-Review, cloud-services-team (Kanban): Move WMCS servers to 1 NIC - https://phabricator.wikimedia.org/T319184 (aborrero)
[17:12:57] Traffic, DC-Ops, SRE, ops-ulsfo, Patch-For-Review: Q1:rack/setup/install cp40[37-51] - https://phabricator.wikimedia.org/T317244 (RobH) cp4040 and cp4048 had the DAC cable clicked in on the NIC, but not pressed in quite all the way. Reseated and the link lights came up immediately.
[17:57:53] Traffic, DC-Ops, SRE, ops-ulsfo, Patch-For-Review: Q1:rack/setup/install ulsfo misc class hosts - https://phabricator.wikimedia.org/T317247 (RobH)
[18:00:32] Traffic, DC-Ops, SRE, ops-ulsfo, Patch-For-Review: Q1:rack/setup/install cp40[37-51] - https://phabricator.wikimedia.org/T317244 (ssingh)
[18:02:39] Traffic, DC-Ops, SRE, ops-ulsfo, Patch-For-Review: Q1:rack/setup/install cp40[37-51] - https://phabricator.wikimedia.org/T317244 (RobH)
[19:31:16] Traffic, DC-Ops, SRE, ops-ulsfo, Patch-For-Review: Q1:rack/setup/install ulsfo misc class hosts - https://phabricator.wikimedia.org/T317247 (RobH)
[20:08:54] Traffic, DC-Ops, SRE, ops-ulsfo, Patch-For-Review: Q1:rack/setup/install ulsfo misc class hosts - https://phabricator.wikimedia.org/T317247 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by robh@cumin2002 for host ganeti4006.ulsfo.wmnet with OS buster
[20:35:43] Traffic, DC-Ops, SRE, ops-ulsfo, Patch-For-Review: Q1:rack/setup/install ulsfo misc class hosts - https://phabricator.wikimedia.org/T317247 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by robh@cumin2002 for host ganeti4006.ulsfo.wmnet with OS buster executed with errors:...
[20:36:19] Traffic, DC-Ops, SRE, ops-ulsfo, Patch-For-Review: Q1:rack/setup/install ulsfo misc class hosts - https://phabricator.wikimedia.org/T317247 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by robh@cumin2002 for host ganeti4006.ulsfo.wmnet with OS bullseye
[21:13:29] Traffic, DC-Ops, SRE, ops-ulsfo, Patch-For-Review: Q1:rack/setup/install ulsfo misc class hosts - https://phabricator.wikimedia.org/T317247 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by robh@cumin2002 for host ganeti4006.ulsfo.wmnet with OS bullseye completed: - ganeti4...
[21:14:11] Traffic, DC-Ops, SRE, ops-ulsfo, Patch-For-Review: Q1:rack/setup/install ulsfo misc class hosts - https://phabricator.wikimedia.org/T317247 (RobH)
[21:18:54] Traffic, DC-Ops, SRE, ops-ulsfo, Patch-For-Review: Q1:rack/setup/install cp40[37-51] - https://phabricator.wikimedia.org/T317244 (RobH) When trying to run the sre.hosts.provision script on cp4052, I get the following issue: ` [1/30, retrying in 30.00s] Polling task: JID_669057087428 not co...
[22:37:30] netops, DC-Ops, Infrastructure-Foundations, SRE, and 3 others: Q3:(Need By: TBD) rack/setup/install 2 new labstore hosts - https://phabricator.wikimedia.org/T302981 (Andrew)
[22:59:56] (HAProxyEdgeTrafficDrop) firing: 43% request drop in text@ulsfo during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=ulsfo&var-cache_type=text - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop
[23:24:56] (HAProxyEdgeTrafficDrop) resolved: 63% request drop in text@ulsfo during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=ulsfo&var-cache_type=text - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop