[05:49:56] (HAProxyEdgeTrafficDrop) firing: 69% request drop in text@codfw during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=codfw&var-cache_type=text - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop
[06:04:56] (HAProxyEdgeTrafficDrop) resolved: 69% request drop in text@codfw during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=codfw&var-cache_type=text - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop
[07:27:55] netops, Infrastructure-Foundations, Prod-Kubernetes, SRE: Agree strategy for Kubernetes BGP peering to top-of-rack switches - https://phabricator.wikimedia.org/T306649 (elukey) Quick question about how to proceed. Would it make sense to start testing adding manual labels in the ml-serve-eqiad clu...
[08:08:29] netops, Infrastructure-Foundations, Prod-Kubernetes, SRE: Agree strategy for Kubernetes BGP peering to top-of-rack switches - https://phabricator.wikimedia.org/T306649 (cmooney) @elukey yes I think that makes sense, no need to hold off on testing. Your suggested label naming makes sense so let's...
[11:17:57] (HAProxyEdgeTrafficDrop) firing: 48% request drop in text@ulsfo during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=ulsfo&var-cache_type=text - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop
[11:22:57] (HAProxyEdgeTrafficDrop) resolved: 59% request drop in text@ulsfo during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=ulsfo&var-cache_type=text - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop
[11:24:57] (HAProxyEdgeTrafficDrop) firing: 57% request drop in text@eqsin during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=eqsin&var-cache_type=text - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop
[11:29:57] (HAProxyEdgeTrafficDrop) resolved: (4) 69% request drop in text@codfw during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop
[14:14:43] Traffic: Restarting pybal caused icinga error - https://phabricator.wikimedia.org/T308174 (razzi) Open→Resolved a:razzi Thanks for the explanation @BBlack, nothing to do here so I'll close this. I saw your comment about no pybal restart being necessary, and that makes sense; I could even see bot...
[21:03:57] (VarnishPrometheusExporterDown) firing: Varnish Exporter on instance cp3061:9331 is unreachable - https://alerts.wikimedia.org/?q=alertname%3DVarnishPrometheusExporterDown
[21:13:57] (VarnishPrometheusExporterDown) resolved: Varnish Exporter on instance cp3061:9331 is unreachable - https://alerts.wikimedia.org/?q=alertname%3DVarnishPrometheusExporterDown
[21:24:19] netops, Infrastructure-Foundations, SRE, fundraising-tech-ops: Upgrade pfw to Junos 20+ - https://phabricator.wikimedia.org/T295691 (Papaul) The Junos image is now on both pfw ` root@pfw3-eqiad% ls /var/tmp/junos-srxentedge-x86-64-20.4R3-S1.3.tgz /var/tmp/junos-srxentedge-x86-64-20.4R3-S1.3.tgz...
[21:25:57] (VarnishPrometheusExporterDown) firing: Varnish Exporter on instance cp3062:9331 is unreachable - https://alerts.wikimedia.org/?q=alertname%3DVarnishPrometheusExporterDown
[21:30:57] (VarnishPrometheusExporterDown) firing: (2) Varnish Exporter on instance cp3062:9331 is unreachable - https://alerts.wikimedia.org/?q=alertname%3DVarnishPrometheusExporterDown
[21:35:57] (HAProxyEdgeTrafficDrop) firing: 20% request drop in text@eqsin during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=eqsin&var-cache_type=text - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop
[21:35:57] (VarnishPrometheusExporterDown) resolved: (2) Varnish Exporter on instance cp3062:9331 is unreachable - https://alerts.wikimedia.org/?q=alertname%3DVarnishPrometheusExporterDown
[21:40:57] (HAProxyEdgeTrafficDrop) firing: (6) 64% request drop in text@codfw during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop
[21:45:57] (HAProxyEdgeTrafficDrop) resolved: (6) 64% request drop in text@codfw during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop
[21:59:57] (VarnishPrometheusExporterDown) firing: Varnish Exporter on instance cp3064:9331 is unreachable - https://alerts.wikimedia.org/?q=alertname%3DVarnishPrometheusExporterDown
[22:04:57] (VarnishPrometheusExporterDown) firing: (2) Varnish Exporter on instance cp3064:9331 is unreachable - https://alerts.wikimedia.org/?q=alertname%3DVarnishPrometheusExporterDown
[22:09:57] (VarnishPrometheusExporterDown) resolved: (2) Varnish Exporter on instance cp3064:9331 is unreachable - https://alerts.wikimedia.org/?q=alertname%3DVarnishPrometheusExporterDown