[00:02:57] (HAProxyEdgeTrafficDrop) firing: 69% request drop in text@drmrs during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=drmrs&var-cache_type=text - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop
[00:07:57] (HAProxyEdgeTrafficDrop) resolved: 69% request drop in text@drmrs during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=drmrs&var-cache_type=text - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop
[08:19:32] netops, Infrastructure-Foundations, SRE, SRE-OnFire, and 2 others: asw2-c5-eqiad crash - https://phabricator.wikimedia.org/T313382 (dcaro)
[08:30:06] Traffic, DC-Ops, SRE, ops-eqsin: cp5001 memory errors on DIMM A2 - https://phabricator.wikimedia.org/T314256 (Vgutierrez) Open→Resolved >>! In T314256#8275011, @MoritzMuehlenhoff wrote: > Traffic folks, can be please go ahead and fully decom cp5001, then? Right now this is in a weird limb...
[08:51:40] Traffic, SRE, decommission-hardware, ops-eqsin: decommission cp5001.eqsin.wmnet - https://phabricator.wikimedia.org/T319166 (Vgutierrez) a:wiki_willy
[08:52:04] Traffic, SRE, decommission-hardware, ops-eqsin: decommission cp5001.eqsin.wmnet - https://phabricator.wikimedia.org/T319166 (Vgutierrez)
[09:32:48] Traffic, SRE: Implement SLI measurement for ATS - https://phabricator.wikimedia.org/T316921 (Vgutierrez) Open→Resolved SLO dashboard available in https://grafana.wikimedia.org/d/slo-trafficserver-tmpl/trafficserver-slos-grizzly-template?orgId=1
[11:06:58] netops, Data-Engineering, Infrastructure-Foundations, Product-Analytics, and 3 others: Maybe restrict domains accessible by webproxy - https://phabricator.wikimedia.org/T300977 (jbond)
[11:53:16] netops, Infrastructure-Foundations, cloud-services-team (Kanban): Move WMCS servers to 1 NIC - https://phabricator.wikimedia.org/T319184 (aborrero)
[11:57:15] netops, Infrastructure-Foundations, cloud-services-team (Kanban): Move WMCS servers to 1 NIC - https://phabricator.wikimedia.org/T319184 (aborrero)
[11:58:31] netops, Infrastructure-Foundations, cloud-services-team (Kanban): Move WMCS servers to 1 NIC - https://phabricator.wikimedia.org/T319184 (aborrero)
[12:03:36] netops, Infrastructure-Foundations, cloud-services-team (Kanban): Move WMCS servers to 1 NIC - https://phabricator.wikimedia.org/T319184 (aborrero) Open→In progress p:Triage→Medium
[13:39:38] (LVSHighCPU) firing: (8) The host lvs5002:9100 has at least its CPU 0 saturated - https://bit.ly/wmf-lvscpu - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=lvs5002 - https://alerts.wikimedia.org/?q=alertname%3DLVSHighCPU
[13:44:38] (LVSHighCPU) resolved: (8) The host lvs5002:9100 has at least its CPU 0 saturated - https://bit.ly/wmf-lvscpu - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=lvs5002 - https://alerts.wikimedia.org/?q=alertname%3DLVSHighCPU
[14:53:37] Traffic, SRE: CDN doesn't validate request-target - https://phabricator.wikimedia.org/T318676 (Vgutierrez)
[14:56:30] Traffic, SRE: CDN doesn't validate request-target - https://phabricator.wikimedia.org/T318676 (Vgutierrez) T317660 has been fixed by the shipping of trafficserver 9.1.3-1wm2 including https://gerrit.wikimedia.org/r/c/operations/debs/trafficserver/+/834045
[15:02:37] netops, Infrastructure-Foundations, SRE, ops-eqiad, Sustainability (Incident Followup): eqiad row C switch fabric recabling - https://phabricator.wikimedia.org/T313384 (ayounsi) Synced on IRC, we're aiming at Thursday 1pm UTC.
[15:07:13] netops, Infrastructure-Foundations, SRE: Upgrade management routers and switches to Junos 21 - https://phabricator.wikimedia.org/T316529 (Papaul)
[17:01:35] Traffic, decommission-hardware, ops-ulsfo: decommission dns4001 - https://phabricator.wikimedia.org/T319215 (RobH)
[17:04:31] Traffic, SRE, Patch-For-Review, Performance-Team (Radar): Package and deploy ATS 9.1.3 - https://phabricator.wikimedia.org/T309651 (ssingh) We are running ATS9 on all cp hosts in: codfw, ulsfo, drmrs, in addition to the existing hosts in eqiad, esams, eqsin, the site-wide deployment of which will...
[17:13:08] Traffic, decommission-hardware, ops-ulsfo, Patch-For-Review: decommission dns4001 - https://phabricator.wikimedia.org/T319215 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by robh@cumin2002 for hosts: `dns4001.wikimedia.org` - dns4001.wikimedia.org (**PASS**) - Downtimed host o...
[17:33:26] Traffic, SRE, decommission-hardware, ops-ulsfo: decommission dns4001 - https://phabricator.wikimedia.org/T319215 (RobH)
[17:33:31] Traffic, SRE, decommission-hardware, ops-ulsfo: decommission dns4001 - https://phabricator.wikimedia.org/T319215 (RobH) a:RobH→BBlack
[17:43:06] Traffic, SRE, decommission-hardware, ops-eqsin: decommission cp5001.eqsin.wmnet - https://phabricator.wikimedia.org/T319166 (wiki_willy) a:wiki_willy→RobH
[18:35:06] Traffic, DC-Ops, SRE, ops-ulsfo, Patch-For-Review: Q1:rack/setup/install ulsfo misc class hosts - https://phabricator.wikimedia.org/T317247 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by robh@cumin2002 for host dns4003.wikimedia.org with OS bullseye
[18:35:34] Traffic, DC-Ops, Infrastructure-Foundations, SRE, ops-ulsfo: add HBA355i support to installer - https://phabricator.wikimedia.org/T319067 (BBlack) Further updates on this thread: 1. The installation attempts and debugging above were on **bullseye**, but our cp puppetization is actually still...
[18:42:06] Traffic, DC-Ops, SRE, ops-ulsfo, Patch-For-Review: Q1:rack/setup/install ulsfo misc class hosts - https://phabricator.wikimedia.org/T317247 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by robh@cumin2002 for host dns4003.wikimedia.org with OS bullseye executed with errors:...
[18:45:57] Traffic, DC-Ops, Infrastructure-Foundations, SRE, ops-ulsfo: add HBA355i support to installer - https://phabricator.wikimedia.org/T319067 (BBlack) I see our buster actually has `linux-image-5.10.0-0.deb10.17-amd64` available in its repos. It may just be a matter of figuring out how to launch...
[18:49:05] Traffic, DC-Ops, SRE, ops-ulsfo, Patch-For-Review: Q1:rack/setup/install ulsfo misc class hosts - https://phabricator.wikimedia.org/T317247 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by robh@cumin2002 for host dns4003.wikimedia.org with OS bullseye
[18:57:59] Traffic, DC-Ops, SRE, ops-ulsfo: Q1:rack/setup/install ulsfo misc class hosts - https://phabricator.wikimedia.org/T317247 (RobH)
[19:12:39] Traffic, DC-Ops, Infrastructure-Foundations, SRE, ops-ulsfo: add HBA355i support to installer - https://phabricator.wikimedia.org/T319067 (BBlack) I've also found some other breadcrumbs. Runtime buster + 5.10 support is puppetized in `modules/profile/manifests/base/linux510.pp`. There's ins...
[19:15:50] Traffic, DC-Ops, SRE, ops-ulsfo: Q1:rack/setup/install ulsfo misc class hosts - https://phabricator.wikimedia.org/T317247 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by robh@cumin2002 for host dns4003.wikimedia.org with OS bullseye executed with errors: - dns4003 (**FAIL**)...
[20:47:29] Traffic, DC-Ops, Infrastructure-Foundations, SRE, ops-ulsfo: add HBA355i support to installer - https://phabricator.wikimedia.org/T319067 (Volans) The bits for the reimage cookbooks are trivial to do, Spicerack has already support for custom images, see the `media_type` argument to https://do...
[21:18:51] Traffic, DC-Ops, SRE, ops-ulsfo: Q1:rack/setup/install ulsfo misc class hosts - https://phabricator.wikimedia.org/T317247 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by robh@cumin2002 for host dns4003.wikimedia.org with OS bullseye
[21:42:09] Traffic, DC-Ops, SRE, ops-ulsfo: Q1:rack/setup/install ulsfo misc class hosts - https://phabricator.wikimedia.org/T317247 (RobH) dns4003 is getting stuck in the reimage at: ` 100.0% (1/1) success ratio (>= 100.0% threshold) for command: '/usr/local/sbin/...cludes -r commit'. 100.0% (1/1) succes...
[21:44:03] Traffic, DC-Ops, SRE, ops-ulsfo: Q1:rack/setup/install ulsfo misc class hosts - https://phabricator.wikimedia.org/T317247 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by robh@cumin2002 for host dns4003.wikimedia.org with OS bullseye executed with errors: - dns4003 (**FAIL**)...