[00:11:18] Traffic, DC-Ops, SRE, ops-ulsfo, Patch-For-Review: Q1:rack/setup/install cp40[37-52] - https://phabricator.wikimedia.org/T317244 (RobH)
[00:19:18] Traffic, DC-Ops, SRE, ops-ulsfo, Patch-For-Review: Q1:rack/setup/install cp40[37-52] - https://phabricator.wikimedia.org/T317244 (RobH)
[00:22:21] Traffic, DC-Ops, SRE, ops-ulsfo, Patch-For-Review: Q1:rack/setup/install cp40[37-52] - https://phabricator.wikimedia.org/T317244 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by robh@cumin2002 for host cp4045.ulsfo.wmnet with OS bullseye
[00:31:12] Traffic, DC-Ops, SRE, ops-ulsfo, Patch-For-Review: Q1:rack/setup/install cp40[37-52] - https://phabricator.wikimedia.org/T317244 (RobH) cp4045 failing to pxe boot. it could be firmware issue, as the NIC came with 6.x firmware. I'll have to mess with rolling it back tomorrow (Friday) ` PXELI...
[00:31:27] Traffic, DC-Ops, SRE, ops-ulsfo, Patch-For-Review: Q1:rack/setup/install cp40[37-52] - https://phabricator.wikimedia.org/T317244 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by robh@cumin2002 for host cp4045.ulsfo.wmnet with OS bullseye executed with errors: - cp4045 (**F...
[08:02:56] (HAProxyEdgeTrafficDrop) firing: 67% request drop in text@codfw during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=codfw&var-cache_type=text - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop
[08:07:56] (HAProxyEdgeTrafficDrop) resolved: 69% request drop in text@codfw during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=codfw&var-cache_type=text - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop
[08:45:23] netops, Infrastructure-Foundations, SRE, ops-eqiad, Sustainability (Incident Followup): eqiad row C switch fabric recabling - https://phabricator.wikimedia.org/T313384 (ayounsi) @Jclark-ctr Awesome thanks! We need to schedule a window to do the plugging/unplugging/reconfiguring. Would next Tu...
[10:46:55] Traffic, DC-Ops, SRE, ops-eqsin: cp5001 memory errors on DIMM A2 - https://phabricator.wikimedia.org/T314256 (MoritzMuehlenhoff) Traffic folks, can be please go ahead and fully decom cp5001, then? Right now this is in a weird limbo state between debmonitor/puppetdb/Netbox.
[13:03:00] Traffic, SRE, decommission-hardware, Patch-For-Review: decommission cp4021 &n cp4027 - https://phabricator.wikimedia.org/T318963 (BBlack)
[13:03:20] Traffic, SRE, decommission-hardware, Patch-For-Review: decommission cp4021 &n cp4027 - https://phabricator.wikimedia.org/T318963 (BBlack) a:BBlack→RobH >>! In T318963#8274300, @RobH wrote: > Brandon, > > Both of these hosts have had the decom script run, but they still have references in...
[13:05:38] netops, Infrastructure-Foundations, SRE, ops-eqiad, Sustainability (Incident Followup): eqiad row C switch fabric recabling - https://phabricator.wikimedia.org/T313384 (Jclark-ctr) @ayounsi is there a time window you prefer? I can be available 1pm UTC time I am available any day.
[15:34:23] Traffic, SRE, decommission-hardware: decommission cp4021 &n cp4027 - https://phabricator.wikimedia.org/T318963 (RobH) Open→Resolved
[15:34:25] Traffic, DC-Ops, SRE, ops-ulsfo: Q1:rack/setup/install cp40[37-52] - https://phabricator.wikimedia.org/T317244 (RobH)
[15:44:46] netops, Infrastructure-Foundations, SRE: Q4: esams atlas anchor - https://phabricator.wikimedia.org/T307021 (RobH)
[15:45:25] netops, Infrastructure-Foundations, SRE: Q4: esams atlas anchor - https://phabricator.wikimedia.org/T307021 (RobH)
[16:54:32] Traffic, DC-Ops, SRE, ops-ulsfo: Q1:rack/setup/install cp40[37-52] - https://phabricator.wikimedia.org/T317244 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by bblack@cumin2002 for host cp4045.ulsfo.wmnet with OS bullseye
[17:24:55] Traffic, DC-Ops, SRE, ops-ulsfo: Q1:rack/setup/install cp40[37-52] - https://phabricator.wikimedia.org/T317244 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by bblack@cumin2002 for host cp4045.ulsfo.wmnet with OS bullseye executed with errors: - cp4045 (**FAIL**) - Removed f...
[17:25:00] Traffic, DC-Ops, SRE, ops-ulsfo: Q1:rack/setup/install cp40[37-52] - https://phabricator.wikimedia.org/T317244 (RobH) cp4045 firmware inventory: bios is newest 1.6.5 10G nic is 22.00.07.60 , downgrading to 21.85.21.92 idrac is 5.10.30.00, cap at this and won't upgrade to 6.x which breaks http...
[17:26:16] Traffic, DC-Ops, SRE, ops-ulsfo: Q1:rack/setup/install cp40[37-52] - https://phabricator.wikimedia.org/T317244 (RobH)
[17:43:50] Traffic, DC-Ops, SRE, ops-ulsfo: Q1:rack/setup/install cp40[37-52] - https://phabricator.wikimedia.org/T317244 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by robh@cumin2002 for host cp4045.ulsfo.wmnet with OS bullseye
[18:01:24] Traffic, DC-Ops, SRE, ops-ulsfo: Q1:rack/setup/install cp40[37-52] - https://phabricator.wikimedia.org/T317244 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by robh@cumin2002 for host cp4045.ulsfo.wmnet with OS bullseye executed with errors: - cp4045 (**FAIL**) - Removed fro...
[18:08:58] Traffic, DC-Ops, SRE, ops-ulsfo: Q1:rack/setup/install cp40[37-52] - https://phabricator.wikimedia.org/T317244 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by robh@cumin2002 for host cp4045.ulsfo.wmnet with OS bullseye
[18:19:46] Traffic, DC-Ops, SRE, ops-ulsfo: add hbs330 support to installer - https://phabricator.wikimedia.org/T319067 (RobH)
[18:22:43] Traffic, DC-Ops, SRE, ops-ulsfo: add HBA355i support to installer - https://phabricator.wikimedia.org/T319067 (RobH)
[18:22:50] Traffic, DC-Ops, SRE, ops-ulsfo: add HBA355i support to installer - https://phabricator.wikimedia.org/T319067 (RobH) {F35541613} The last time I had an issue with driver support in the installer, I recall @MoritzMuehlenhoff being the person to help me out. Moritz is this still the case, and are...
[18:23:26] Traffic, DC-Ops, Infrastructure-Foundations, SRE, ops-ulsfo: add HBA355i support to installer - https://phabricator.wikimedia.org/T319067 (RobH)
[18:23:38] Traffic, DC-Ops, Infrastructure-Foundations, SRE, ops-ulsfo: add HBA355i support to installer - https://phabricator.wikimedia.org/T319067 (RobH) a:RobH→MoritzMuehlenhoff
[18:30:19] Traffic, DC-Ops, SRE, ops-ulsfo: Q1:rack/setup/install cp40[37-52] - https://phabricator.wikimedia.org/T317244 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by robh@cumin2002 for host cp4045.ulsfo.wmnet with OS bullseye executed with errors: - cp4045 (**FAIL**) - Removed fro...
[19:05:56] (HAProxyEdgeTrafficDrop) firing: 53% request drop in text@eqsin during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=eqsin&var-cache_type=text - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop
[19:10:56] (HAProxyEdgeTrafficDrop) resolved: 58% request drop in text@eqsin during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=eqsin&var-cache_type=text - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop
[19:37:54] Traffic, DC-Ops, Infrastructure-Foundations, SRE, ops-ulsfo: add HBA355i support to installer - https://phabricator.wikimedia.org/T319067 (BBlack) I did a little digging from the `install_console` shell on this host. lspci output for this adapter is: ` ~ # lspci -v -s 65:00.0 -nn 65:00.0 Ser...
[20:59:10] Traffic, DC-Ops, SRE, ops-ulsfo: ulsfo refresh scheduling - https://phabricator.wikimedia.org/T317249 (RobH) Update: cp4037 is racked, but I had to steal its optic for T280202, since its cp4021 was busted anyhow. cp4045 is racked and accessible, but we've run into an installer issue on its insta...
[21:11:16] Traffic, DC-Ops, Infrastructure-Foundations, SRE, ops-ulsfo: add HBA355i support to installer - https://phabricator.wikimedia.org/T319067 (Peachey88)