[08:48:08] Traffic, SRE: varnish-frontend-fetcherr: Assert error in vslc_vtx_next, 100% CPU usage - https://phabricator.wikimedia.org/T253093 (Aklapper) a:ema→None Removing inactive task assignee as this task got reopened after 3 years
[09:33:50] Traffic, Infrastructure-Foundations, SRE: Manual upload of iDRAC EXE results in broken web interface - https://phabricator.wikimedia.org/T334146 (jbond) @BCornwall I suspect this is T322419#8370970. The fix would be to run `racadm set IDRAC.WebServer.HostHeaderCheck 0`
[09:48:44] (VarnishHighThreadCount) firing: (8) Varnish's thread count is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[09:53:44] (VarnishHighThreadCount) firing: (16) Varnish's thread count is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[09:58:44] (VarnishHighThreadCount) firing: (16) Varnish's thread count is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[10:03:44] (VarnishHighThreadCount) firing: (16) Varnish's thread count is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[10:08:44] (VarnishHighThreadCount) firing: (17) Varnish's thread count is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[10:13:44] (VarnishHighThreadCount) firing: (19) Varnish's thread count is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[10:28:44] (VarnishHighThreadCount) firing: (13) Varnish's thread count is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[10:33:44] (VarnishHighThreadCount) firing: (19) Varnish's thread count is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[10:53:44] (VarnishHighThreadCount) resolved: (8) Varnish's thread count is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[15:24:16] Traffic, netops, DBA, Data-Engineering, and 9 others: codfw row C switches upgrade - https://phabricator.wikimedia.org/T334049 (MPhamWMF)
[15:24:38] Traffic, Infrastructure-Foundations, SRE: Manual upload of iDRAC EXE results in broken web interface - https://phabricator.wikimedia.org/T334146 (BCornwall) I'm aware of that workaround, but I'm not sure if that's related: Flashing the exe via the web uploader results in the broken web interface but...
[15:25:53] Traffic, Infrastructure-Foundations: Manual upload of iDRAC EXE results in broken web interface - https://phabricator.wikimedia.org/T334146 (BCornwall)
[15:30:53] Traffic, SRE: Upgrade lvs1013-1016 firmware - https://phabricator.wikimedia.org/T334259 (BCornwall)
[15:31:53] Traffic, SRE: Upgrade lvs1013-1016 firmware - https://phabricator.wikimedia.org/T334259 (BCornwall)
[15:37:25] Traffic, SRE, Patch-For-Review: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (BCornwall)
[16:04:19] Traffic, Infrastructure-Foundations: Manual upload of iDRAC EXE results in broken web interface - https://phabricator.wikimedia.org/T334146 (jbond) >>! In T334146#8768504, @BCornwall wrote: > I'm aware of that workaround, but I'm not sure if that's related: Flashing the exe via the web uploader results i...
[16:05:31] Traffic, SRE, Patch-For-Review: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by brett@cumin2002 for host lvs6001.drmrs.wmnet with OS bullseye
[16:07:23] Traffic, Infrastructure-Foundations: Manual upload of iDRAC EXE results in broken web interface - https://phabricator.wikimedia.org/T334146 (BCornwall) Cool. That sounds correct to me. Since all of our servers have been upgraded there's not a whole lot of testing to be had without intentionally setting t...
[16:07:33] Traffic, Infrastructure-Foundations: Manual upload of iDRAC EXE results in broken web interface - https://phabricator.wikimedia.org/T334146 (BCornwall) Open→Declined
[16:48:32] Traffic, SRE, Patch-For-Review: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by brett@cumin2002 for host lvs6001.drmrs.wmnet with OS bullseye completed: - lvs6001 (**PASS**) - Downtimed on Icinga/Aler...
[16:49:42] (SystemdUnitFailed) firing: varnishmtail@internal.service Failed on cp6015:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[16:54:42] (SystemdUnitFailed) firing: (2) varnishmtail@default.service Failed on cp6015:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[16:55:32] Traffic, SRE, Patch-For-Review: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (BCornwall)
[17:01:22] Traffic, SRE: Upgrade lvs1013-1016 firmware - https://phabricator.wikimedia.org/T334259 (BCornwall)
[17:04:42] (SystemdUnitFailed) resolved: (2) varnishmtail@default.service Failed on cp6015:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[17:25:04] Traffic, SRE, Patch-For-Review: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (BCornwall)
[17:41:23] Traffic, Infrastructure-Foundations, SRE: Receive network latency reports from the browsers - https://phabricator.wikimedia.org/T334417 (JameelKaisar)
[18:22:20] Traffic, SRE, Patch-For-Review: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by brett@cumin2002 for host lvs6002.drmrs.wmnet with OS bullseye
[18:31:20] Traffic, SRE: Upgrade lvs1013-1016 firmware - https://phabricator.wikimedia.org/T334259 (BCornwall)
[19:09:07] Traffic, SRE, Patch-For-Review: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by brett@cumin2002 for host lvs6002.drmrs.wmnet with OS bullseye completed: - lvs6002 (**WARN**) - Downtimed on Icinga/Aler...
[19:20:49] Traffic, SRE, Patch-For-Review: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (BCornwall)
[19:33:30] Traffic, SRE, Patch-For-Review: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (BCornwall)
[20:15:18] Traffic, SRE, Patch-For-Review: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by brett@cumin2002 for host lvs3005.esams.wmnet with OS bullseye
[20:57:22] netops, Cloud-VPS, Infrastructure-Foundations, cloud-services-team, Epic: CloudVPS: network architecture - https://phabricator.wikimedia.org/T209460 (Jhancock.wm)
[21:14:23] Traffic, SRE, Patch-For-Review: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by brett@cumin2002 for host lvs3005.esams.wmnet with OS bullseye executed with errors: - lvs3005 (**FAIL**) - Downtimed on...
[21:14:39] Traffic, SRE, Patch-For-Review: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by brett@cumin2002 for host lvs3005.esams.wmnet with OS bullseye
[21:40:26] Traffic, SRE: Upgrade lvs1013-1016 firmware - https://phabricator.wikimedia.org/T334259 (BCornwall)
[21:46:12] Traffic, SRE: Upgrade lvs1013-1016 firmware - https://phabricator.wikimedia.org/T334259 (BCornwall) @wiki_willy I'd love your advice on upgrading lvs1013-1016 NICs! These servers are r430s. I've been able to upgrade the [[ https://www.dell.com/support/home/en-us/drivers/driversdetails?driverid=rh05p&osco...
[21:53:25] Traffic, SRE, Patch-For-Review: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by brett@cumin2002 for host lvs3005.esams.wmnet with OS bullseye completed: - lvs3005 (**WARN**) - Downtimed on Icinga/Aler...
[22:13:17] Traffic, SRE, Patch-For-Review: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (BCornwall)
[22:53:45] (HAProxyRestarted) firing: HAProxy server restarted on cp2033:9100 - https://wikitech.wikimedia.org/wiki/HAProxy#HAProxy_for_edge_caching - https://grafana.wikimedia.org/d/gQblbjtnk/haproxy-drilldown?orgId=1&var-site=codfw%20prometheus/ops&var-instance=cp2033&viewPanel=10 - https://alerts.wikimedia.org/?q=alertname%3DHAProxyRestarted
[22:58:56] Traffic, SRE: Upgrade lvs1013-1016 firmware - https://phabricator.wikimedia.org/T334259 (wiki_willy) Hi @BCornwall - thanks for reaching out. I'm going to add @Papaul to the thread, for any input/suggestions that he might have on upgrading the firmware on these NICs >>! In T334259#8769600, @BCornwa...