[05:21:56] (HAProxyEdgeTrafficDrop) firing: 31% request drop in text@codfw during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=codfw&var-cache_type=text - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop
[05:22:16] (VarnishTrafficDrop) firing: (3) Varnish traffic in codfw has dropped 27.821256328284253% - https://wikitech.wikimedia.org/wiki/Varnish - https://grafana.wikimedia.org/d/000000180/varnish-http-requests?viewPanel=6 - https://alerts.wikimedia.org/?q=alertname%3DVarnishTrafficDrop
[05:26:56] (HAProxyEdgeTrafficDrop) resolved: (4) 64% request drop in text@codfw during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop
[05:27:16] (VarnishTrafficDrop) resolved: (4) Varnish traffic in codfw has dropped 26.5354538645019% - https://wikitech.wikimedia.org/wiki/Varnish - https://grafana.wikimedia.org/d/000000180/varnish-http-requests?viewPanel=6 - https://alerts.wikimedia.org/?q=alertname%3DVarnishTrafficDrop
[09:10:56] (HAProxyEdgeTrafficDrop) firing: 66% request drop in text@codfw during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=codfw&var-cache_type=text - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop
[09:15:56] (HAProxyEdgeTrafficDrop) resolved: (2) 67% request drop in text@codfw during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop
[09:41:39] Traffic, SRE, Patch-For-Review: Upgrade HAProxy on cp nodes to 2.6.x LTS - https://phabricator.wikimedia.org/T321775 (Vgutierrez) Open→In progress
[11:26:42] Traffic, DC-Ops, SRE, ops-ulsfo, Patch-For-Review: Q1:rack/setup/install ulsfo misc class hosts - https://phabricator.wikimedia.org/T317247 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by jmm@cumin2002 for hosts: `ganeti4003.ulsfo.wmnet` - ganeti4003.ulsfo.wmnet (**PASS**)...
[13:02:44] Traffic, MW-on-K8s, SRE, serviceops, and 2 others: Create mw-api-int helmfile deployment - https://phabricator.wikimedia.org/T321895 (Clement_Goubert)
[13:03:18] Traffic, MW-on-K8s, SRE, serviceops, and 2 others: Create mw-api-ext helmfile deployment - https://phabricator.wikimedia.org/T321896 (Clement_Goubert)
[13:03:45] Traffic, MW-on-K8s, SRE, serviceops, and 2 others: Create mw-jobrunner helmfile deployment - https://phabricator.wikimedia.org/T321897 (Clement_Goubert)
[13:04:21] Traffic, MW-on-K8s, SRE, serviceops, and 2 others: Create mw-videoscaler helmfile deployment - https://phabricator.wikimedia.org/T321899 (Clement_Goubert)
[13:04:44] Traffic, MW-on-K8s, SRE, serviceops, and 2 others: Create mw-web helmfile deployment - https://phabricator.wikimedia.org/T321900 (Clement_Goubert)
[14:04:10] Traffic, SRE: ATS should alert if the number of total or active connections reached maximum - https://phabricator.wikimedia.org/T292815 (Vgutierrez) Resolved→Open reopening this as I've found some issues: metric names aren't consistent with existing ones, all the previous metrics are named using...
[14:08:29] Traffic, SRE: ATS should alert if the number of total or active connections reached maximum - https://phabricator.wikimedia.org/T292815 (Vgutierrez) oh, and we're seeing some errors like: ` Oct 27 02:06:46 cp4043 prometheus-ats-config[1787]: Traffic Server: failed to fetch proxy.config.net.max_connection...
[14:33:11] Traffic, SRE, Performance-Team (Radar): Track TTFB per Cache Status Code in ATS - https://phabricator.wikimedia.org/T321484 (Vgutierrez) Open→Resolved a:Vgutierrez
[15:49:52] Traffic, DC-Ops, SRE, ops-ulsfo, Patch-For-Review: Q1:rack/setup/install cp40[37-51] - https://phabricator.wikimedia.org/T317244 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by robh@cumin2002 for host cp4052.ulsfo.wmnet with OS buster
[15:50:30] Traffic, DC-Ops, SRE, ops-ulsfo, Patch-For-Review: Q1:rack/setup/install cp40[37-51] - https://phabricator.wikimedia.org/T317244 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by robh@cumin2002 for host cp4052.ulsfo.wmnet with OS buster executed with errors: - cp4052 (**FAI...
[16:35:36] Traffic, DC-Ops, SRE, ops-ulsfo, Patch-For-Review: Q1:rack/setup/install cp40[37-51] - https://phabricator.wikimedia.org/T317244 (RobH)
[17:33:06] Traffic, DC-Ops, SRE, ops-ulsfo, Patch-For-Review: Q1:rack/setup/install cp40[37-51] - https://phabricator.wikimedia.org/T317244 (ssingh)
[17:33:26] Traffic, DC-Ops, SRE, ops-ulsfo, Patch-For-Review: Q1:rack/setup/install cp40[37-51] - https://phabricator.wikimedia.org/T317244 (ssingh)
[17:35:20] Traffic, DC-Ops, SRE, ops-ulsfo, Patch-For-Review: Q1:rack/setup/install cp40[37-51] - https://phabricator.wikimedia.org/T317244 (ssingh) Traffic update: all the new cp hosts in ulsfo are marked active and pooled. Rob: Feel free to mark this as resolved. Thanks to @RobH, @Papaul, @BBlack, @c...
[17:36:14] Traffic, DC-Ops, SRE, ops-ulsfo, Patch-For-Review: Q1:rack/setup/install cp40[37-51] - https://phabricator.wikimedia.org/T317244 (RobH) Open→Resolved
[22:58:58] netops, DC-Ops, Infrastructure-Foundations, SRE, and 3 others: Q3:(Need By: TBD) rack/setup/install 2 new labstore hosts - https://phabricator.wikimedia.org/T302981 (Dzahn)
[23:32:57] (HAProxyEdgeTrafficDrop) firing: 20% request drop in text@drmrs during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=drmrs&var-cache_type=text - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop
[23:37:57] (HAProxyEdgeTrafficDrop) resolved: 60% request drop in text@drmrs during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=drmrs&var-cache_type=text - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop