[05:44:00] hey traffic. we had wcqs in production as of a few weeks ago, then moved it all the way back to `service_setup` because the service was generating noise (and isn't serving production traffic yet, to be clear). We'd like to start moving it back into prod. I've got the patch to move it into `lvs_setup` here: https://gerrit.wikimedia.org/r/c/operations/puppet/+/742841
[05:46:15] the change will require a pybal restart. i've done it before so happy to do it myself if that's convenient. I'd probably do the restarts about 14-15 hours from now. I'll check in again tomorrow morning, but just wanted to mention in advance in case there's any work going on tomorrow that might make it not a good time for pybal restarts
[06:49:56] (EdgeTrafficDrop) firing: 69% request drop in text@codfw during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=codfw&var-cache_type=text - https://alerts.wikimedia.org
[06:54:56] (EdgeTrafficDrop) resolved: 69% request drop in text@codfw during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=codfw&var-cache_type=text - https://alerts.wikimedia.org
[09:26:58] netops, DC-Ops, Infrastructure-Foundations, SRE, ops-ulsfo: ulsfo: (2) mx80s to become temp cr[34]-drmrs - https://phabricator.wikimedia.org/T295819 (ayounsi) Open→Stalled Thanks I had a quick look and they both are healthy, all 8 interfaces show up as well. I'll wait that we make pr...
[09:44:43] ryankemper: ack
[12:07:12] netops, Infrastructure-Foundations, SRE, cloud-services-team (Kanban): cloud: decide on general idea for having cloud-dedicated hardware provide service in the cloud realm & the internet - https://phabricator.wikimedia.org/T296411 (aborrero) create and shared a spreadsheet trying to capture/compa...
[12:27:11] netops, Infrastructure-Foundations: Represent sub-interface and bridge device assocations in Netbox - https://phabricator.wikimedia.org/T296832 (cmooney) p:Triage→Low
[13:01:35] netops, Infrastructure-Foundations, SRE: Represent sub-interface and bridge device assocations in Netbox - https://phabricator.wikimedia.org/T296832 (cmooney)
[13:07:44] netops, Infrastructure-Foundations, SRE: Represent sub-interface and bridge device assocations in Netbox - https://phabricator.wikimedia.org/T296832 (cmooney)
[13:21:43] Traffic, SRE, Patch-For-Review, Performance-Team (Radar): Test haproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T290005 (Vgutierrez)
[13:46:18] ryankemper: I don't think we have any planned lvs work today, should be ok!
[13:55:22] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Represent sub-interface and bridge device assocations in Netbox - https://phabricator.wikimedia.org/T296832 (Volans) @cmooney we have the possibility to add custom facts to puppetdb, we already have a bunch of them, or modify existing one...
[17:40:47] netops, Infrastructure-Foundations, SRE, cloud-services-team (Kanban): cloud: decide on general idea for having cloud-dedicated hardware provide service in the cloud realm & the internet - https://phabricator.wikimedia.org/T296411 (aborrero) We had a meeting today, rough summary: The idea is rou...
[17:58:10] netops, Infrastructure-Foundations, SRE: Represent sub-interface and bridge device assocations in Netbox - https://phabricator.wikimedia.org/T296832 (cmooney) @volans thanks for the info. Sounds like we have a way forward if we want to do this. And certainly if we expand our use of bridges, sub-int...
[17:58:20] Traffic, SRE, Patch-For-Review, Performance-Team (Radar): Test haproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T290005 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by vgutierrez@cumin1001 for host cp1089.eqiad.wmnet with OS buster
[18:04:56] (VarnishPrometheusExporterDown) firing: Varnish Exporter on instance cp1089:9331 is unreachable - https://alerts.wikimedia.org
[18:08:16] (host being reimaged, nothing to worry about)
[18:34:56] (VarnishPrometheusExporterDown) resolved: Varnish Exporter on instance cp1089:9331 is unreachable - https://alerts.wikimedia.org
[18:41:11] Traffic, SRE, Patch-For-Review, Performance-Team (Radar): Test haproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T290005 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by vgutierrez@cumin1001 for host cp1089.eqiad.wmnet with OS buster c...
[18:41:17] Traffic, SRE, Patch-For-Review, Performance-Team (Radar): Test haproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T290005 (Vgutierrez)
[19:05:50] Traffic, SRE: HAProxy fails to reuse connections under some conditions - https://phabricator.wikimedia.org/T296874 (Vgutierrez)
[23:39:10] Planning on rolling pybal restarts in 30-45 mins or so. Any objections?