[06:39:44] Traffic, collaboration-services, Gerrit, Release-Engineering-Team, Patch-For-Review: gerrit: Add Envoy in Gerrit's stack - https://phabricator.wikimedia.org/T420909#11752448 (ABran-WMF)
[06:41:23] Traffic, collaboration-services, Gerrit, Release-Engineering-Team, Patch-For-Review: gerrit: Add Envoy in Gerrit's stack - https://phabricator.wikimedia.org/T420909#11752450 (ABran-WMF) In progress→Resolved >>! In T420909#11747537, @ABran-WMF wrote: >>>! In T420909#11742909, @ABra...
[08:21:37] Traffic, collaboration-services, Gerrit, Release-Engineering-Team, Patch-For-Review: ATS: align ATS and Gerrit Apache timeouts to reenable connection re-use - https://phabricator.wikimedia.org/T417998#11752564 (hashar) The connection-reuse was enabled as part of introducing Envoy between...
[09:18:45] Traffic, Patch-For-Review: Refresh trafficserver_backend_requests_seconds histogram - https://phabricator.wikimedia.org/T411584#11752683 (SLyngshede-WMF) In progress→Resolved
[14:11:07] netops, Infrastructure-Foundations: mr1-eqiad: move from OSPF to BGP - https://phabricator.wikimedia.org/T421238#11753882 (Papaul) @ayounsi please see below for the BGP config to setup BGP and remove OSPF between the mr router and the core routers. I will send out gerrit patch later today and merge it wh...
[15:01:56] hey folks, if someone has a moment to review https://gerrit.wikimedia.org/r/c/operations/dns/+/1261464, i'd appreciate it
[15:03:35] thanks!
[15:27:52] Traffic: Upgrade HAProxy to version 3.2 - https://phabricator.wikimedia.org/T421402 (Fabfur) NEW
[15:32:43] > Improvements to the Runtime API and Prometheus exporter make it easier to monitor your load balancers and inspect traffic 👀
[15:32:52] `Following this commit, we can now upgrade HAProxy on all cache hosts to version 3.2`
[15:32:56] you're missing something there fabfur :P
[15:33:24] you can pick whatever commit you want! :D
[15:33:28] tnx, fixing
[15:35:38] Traffic: Upgrade HAProxy to version 3.2 - https://phabricator.wikimedia.org/T421402#11754530 (Fabfur)
[16:42:33] Traffic, Patch-For-Review: Upgrade HAProxy to version 3.2 - https://phabricator.wikimedia.org/T421402#11754859 (Vgutierrez) p:Triage→Medium
[16:42:55] fabfur: ^^ don't leave tasks as untriaged plz
[16:43:13] ack
[16:43:29] or in backlog if you're working on them :)
[18:42:30] Traffic, DC-Ops, ops-eqiad: Revert lvs1017 Mellanox NIC to Broadcom - https://phabricator.wikimedia.org/T421421#11755335 (BCornwall)
[18:43:05] Traffic, DC-Ops, ops-eqiad: Revert lvs1017 Mellanox NIC to Broadcom - https://phabricator.wikimedia.org/T421421#11755340 (BCornwall)
[18:43:08] Traffic, DC-Ops, ops-eqiad, SRE: Q3:test NIC for lvs1017 - https://phabricator.wikimedia.org/T387145#11755341 (BCornwall)
[18:53:23] Traffic, DC-Ops, ops-eqiad: Revert lvs1017 Mellanox NIC to Broadcom - https://phabricator.wikimedia.org/T421421#11755363 (BCornwall)
[18:53:49] Traffic, DC-Ops, ops-eqiad: Revert lvs1017 Mellanox NIC to Broadcom - https://phabricator.wikimedia.org/T421421#11755365 (BCornwall)
[18:53:55] FIRING: [4x] SystemdUnitFailed: nic-saturation-exporter.service on dns5003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[18:58:55] FIRING: [5x] SystemdUnitFailed: nic-saturation-exporter.service on dns5003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[19:23:55] FIRING: [5x] SystemdUnitFailed: nic-saturation-exporter.service on dns5003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[20:51:40] Traffic, collaboration-services, Gerrit, Release-Engineering-Team: gerrit: Add Envoy in Gerrit's stack - https://phabricator.wikimedia.org/T420909#11756053 (hashar)
[20:53:13] Traffic, collaboration-services, Gerrit, Release-Engineering-Team: gerrit: Add Envoy in Gerrit's stack - https://phabricator.wikimedia.org/T420909#11756057 (DLynch) Resolved→Open Reopening since the merged issue is actively ongoing.
[21:11:23] Traffic, collaboration-services, Gerrit, Release-Engineering-Team: gerrit: Add Envoy in Gerrit's stack - https://phabricator.wikimedia.org/T420909#11756101 (hashar) We had reports of 502 issues on T420865 (which I have marked as a dupe of this one). After Gerrit was moved to Envoy I did notice A...
[21:19:58] Traffic, collaboration-services, Gerrit, Release-Engineering-Team: gerrit: Add Envoy in Gerrit's stack - https://phabricator.wikimedia.org/T420909#11756114 (hashar) Envoy configuration for Gerrit is in Puppet at `hieradata/role/common/gerrit.yaml`. It has configuration bits for ATS/CDN: ` lang=y...
[22:44:38] Traffic, collaboration-services, Gerrit, Release-Engineering-Team: gerrit: Add Envoy in Gerrit's stack - https://phabricator.wikimedia.org/T420909#11756344 (Dzahn) Looking directly at the file system, to bypass any possible issues with Hiera or Puppet, I see this: envoy: ` clusters.d/00-cluster...
[23:00:23] Traffic, collaboration-services, Gerrit, Release-Engineering-Team: gerrit: Add Envoy in Gerrit's stack - https://phabricator.wikimedia.org/T420909#11756356 (Dzahn) If that hypothesis was true the issues started with https://gerrit.wikimedia.org/r/c/operations/puppet/+/1241048/2/modules/gerrit/tem...
[23:22:29] Traffic, collaboration-services, Gerrit, Release-Engineering-Team: gerrit: Add Envoy in Gerrit's stack - https://phabricator.wikimedia.org/T420909#11756424 (hashar) @Dzahn the Jetty timeout is independent to the problem. It is for the {nav Apache > mod_proxy > Jetty} chain. We had that issue seve...
[23:24:10] FIRING: SystemdUnitFailed: wmf_auto_restart_nic-saturation-exporter.service on lvs3008:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed