[09:56:51] Traffic, SRE, Patch-For-Review, Performance-Team (Radar): Test haproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T290005 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by vgutierrez@cumin1001 for host cp2035.codfw.wmnet with OS buster
[10:02:56] (VarnishPrometheusExporterDown) firing: Varnish Exporter on instance cp2035:9331 is unreachable - https://alerts.wikimedia.org
[10:11:11] (VarnishPrometheusExporterDown) resolved: Varnish Exporter on instance cp2035:9331 is unreachable - https://alerts.wikimedia.org
[10:12:56] (VarnishPrometheusExporterDown) firing: Varnish Exporter on instance cp2035:9331 is unreachable - https://alerts.wikimedia.org
[10:26:11] (VarnishPrometheusExporterDown) resolved: Varnish Exporter on instance cp2035:9331 is unreachable - https://alerts.wikimedia.org
[10:27:56] (VarnishPrometheusExporterDown) firing: Varnish Exporter on instance cp2035:9331 is unreachable - https://alerts.wikimedia.org
[10:32:56] (VarnishPrometheusExporterDown) resolved: Varnish Exporter on instance cp2035:9331 is unreachable - https://alerts.wikimedia.org
[10:39:28] Traffic, SRE, Patch-For-Review, Performance-Team (Radar): Test haproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T290005 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by vgutierrez@cumin1001 for host cp2035.codfw.wmnet with OS buster c...
[10:46:49] Traffic, SRE, Patch-For-Review, Performance-Team (Radar): Test haproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T290005 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by vgutierrez@cumin1001 for host cp1083.eqiad.wmnet with OS buster
[10:52:56] (VarnishPrometheusExporterDown) firing: Varnish Exporter on instance cp1083:9331 is unreachable - https://alerts.wikimedia.org
[11:06:11] (VarnishPrometheusExporterDown) resolved: Varnish Exporter on instance cp1083:9331 is unreachable - https://alerts.wikimedia.org
[11:10:31] Traffic, SRE, Patch-For-Review, Performance-Team (Radar): Test haproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T290005 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by vgutierrez@cumin1001 for host cp1083.eqiad.wmnet with OS buster e...
[11:11:11] (VarnishPrometheusExporterDown) firing: Varnish Exporter on instance cp1083:9331 is unreachable - https://alerts.wikimedia.org
[11:11:20] Traffic, SRE, Patch-For-Review, Performance-Team (Radar): Test haproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T290005 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by vgutierrez@cumin1001 for host cp1083.eqiad.wmnet with OS buster
[11:47:56] (VarnishPrometheusExporterDown) resolved: Varnish Exporter on instance cp1083:9331 is unreachable - https://alerts.wikimedia.org
[11:49:27] btullis: re https://gerrit.wikimedia.org/r/c/operations/puppet/+/768668 you might wanna merge it like that and let puppet configure the VIP on the backend servers
[11:50:49] Traffic, SRE, Patch-For-Review, Performance-Team (Radar): Test haproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T290005 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by vgutierrez@cumin1001 for host cp1083.eqiad.wmnet with OS buster c...
[12:04:09] Traffic, SRE, Patch-For-Review, Performance-Team (Radar): Test haproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T290005 (Vgutierrez)
[12:04:40] vgutierrez: OK, thanks for the review. I'm still just a bit confused what I'm supposed to use for the service name in `profile::lvs::realserver::pools`. Should it be the name of the systemd unit instance?
[12:17:57] Afaik yes, cause that will be used by the safe restart script to restart the service
[12:38:16] OK, great. That's what I've updated it to now.
[16:40:47] netops, Data-Engineering, Infrastructure-Foundations, Product-Analytics, and 2 others: Maybe restrict domains accessible by webproxy - https://phabricator.wikimedia.org/T300977 (Milimetric) > Perhaps a way forward would be to find a way to serve those use cases by design instead of by accident....
[17:20:45] Traffic, Beta-Cluster-Infrastructure, SRE, Beta-Cluster-reproducible: Beta cluster down: Error: 502, Next Hop Connection Failed (Feb 2022) - https://phabricator.wikimedia.org/T302699 (AlexisJazz) >>! In T302699#7756446, @Vgutierrez wrote: > in this case a 502 is emitted by ats-backend cause it is...
[17:43:56] (EdgeTrafficDrop) firing: 46% request drop in text@eqsin during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=eqsin&var-cache_type=text - https://alerts.wikimedia.org
[17:47:50] Traffic: upstream connect error or disconnect/reset before headers. reset reason: overflow - https://phabricator.wikimedia.org/T303305 (AlexisJazz)
[17:48:31] Traffic: upstream connect error or disconnect/reset before headers. reset reason: overflow - https://phabricator.wikimedia.org/T303305 (AlexisJazz)
[17:51:18] Traffic: upstream connect error or disconnect/reset before headers. reset reason: overflow - https://phabricator.wikimedia.org/T303305 (AlexisJazz)
[17:53:56] (EdgeTrafficDrop) resolved: 59% request drop in text@eqsin during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=eqsin&var-cache_type=text - https://alerts.wikimedia.org
[18:18:36] Traffic: upstream connect error or disconnect/reset before headers. reset reason: overflow - https://phabricator.wikimedia.org/T303305 (AlexisJazz) https://test.wikipedia.org/wiki/User:Alexis_Jazz/sandbox just failed to load. Blank page. No response headers. Again only for a few seconds.
[18:20:56] (EdgeTrafficDrop) firing: 19% request drop in text@eqsin during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=eqsin&var-cache_type=text - https://alerts.wikimedia.org
[18:30:56] (EdgeTrafficDrop) resolved: 60% request drop in text@eqsin during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=eqsin&var-cache_type=text - https://alerts.wikimedia.org
[19:10:56] (EdgeTrafficDrop) firing: 67% request drop in text@eqiad during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=eqiad&var-cache_type=text - https://alerts.wikimedia.org
[19:15:56] (EdgeTrafficDrop) resolved: 66% request drop in text@eqiad during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=eqiad&var-cache_type=text - https://alerts.wikimedia.org
[19:40:11] Traffic, DC-Ops, SRE, ops-ulsfo: ganeti4002 dimm error - https://phabricator.wikimedia.org/T303318 (RobH)
[20:17:01] netops, Infrastructure-Foundations, SRE, Traffic-Icebox, User-jbond: varnish filtering: should we automatically update public_cloud_nets - https://phabricator.wikimedia.org/T270391 (jbond) brandon also just pointed me to `git grep netmapper` and https://gerrit.wikimedia.org/g/operations/softw...
[20:29:57] (EdgeTrafficDrop) firing: 51% request drop in text@drmrs during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=drmrs&var-cache_type=text - https://alerts.wikimedia.org
[20:34:56] (EdgeTrafficDrop) resolved: 58% request drop in text@drmrs during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=drmrs&var-cache_type=text - https://alerts.wikimedia.org