[00:04:20] Traffic, SRE: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by brett@cumin2002 for host cp5022.eqsin.wmnet with OS bullseye completed: - cp5022 (**PASS**) - Removed from Puppet and PuppetDB if present -...
[00:06:43] Traffic, SRE: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (BCornwall)
[00:44:50] Traffic, SRE: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by brett@cumin2002 for host cp5023.eqsin.wmnet with OS bullseye
[01:05:15] Traffic, SRE: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (ssingh) Using a slight modification of @jbond's script in T328593, the list of cp nodes in eqiad with the oudated firmware (`3.15.17.15`) is basically all the cp nodes in eqiad: ` cp1076.eqiad.wmnet cp1077.eqiad...
[01:07:54] Traffic, SRE: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by sukhe@cumin2002 for host cp1075.eqiad.wmnet with OS bullseye
[01:49:26] Traffic, SRE: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by sukhe@cumin2002 for host cp1075.eqiad.wmnet with OS bullseye completed: - cp1075 (**PASS**) - Downtimed on Icinga/Alertmanager - Disabled Pu...
[01:50:36] Traffic, SRE: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (ssingh)
[01:55:30] Traffic, SRE: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by brett@cumin2002 for host cp5023.eqsin.wmnet with OS bullseye completed: - cp5023 (**PASS**) - Downtimed on Icinga/Alertmanager - Disabled Pu...
[01:56:35] Traffic, SRE: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by brett@cumin2002 for host cp5024.eqsin.wmnet with OS bullseye
[01:56:46] Traffic, SRE: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (BCornwall)
[02:14:15] Traffic, SRE: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by brett@cumin2002 for host cp5024.eqsin.wmnet with OS bullseye executed with errors: - cp5024 (**FAIL**) - Downtimed on Icinga/Alertmanager -...
[02:14:41] Traffic, SRE: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by brett@cumin2002 for host cp5024.eqsin.wmnet with OS bullseye
[02:27:37] Traffic, SRE: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (ssingh) Steps to follow for manual upgrade of the iDRAC firmwares for the cp hosts in eqiad for us and in case someone else stumbles on this issue. The TL;DR is that we need to manually update the iDRAC firmware...
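The 01:05:15 note finds the eqiad cp nodes still on firmware `3.15.17.15` using a modified version of @jbond's script from T328593. That script is not shown in this log, so the following is only a minimal, hypothetical Python sketch of the same filtering idea. It assumes an inventory has already been collected that maps each host to its reported firmware version; the `inventory` data, host names, and `hosts_with_firmware` function are illustrative and not part of any Wikimedia tooling.

```python
# Hypothetical sketch: list hosts that still report an outdated firmware version.
# The inventory below is made-up illustrative data, not real host state.
OUTDATED_VERSION = "3.15.17.15"


def hosts_with_firmware(inventory: dict[str, str], version: str) -> list[str]:
    """Return the sorted host names whose reported firmware equals `version`."""
    return sorted(host for host, fw in inventory.items() if fw == version)


if __name__ == "__main__":
    inventory = {
        "host-a.example": "3.15.17.15",
        "host-b.example": "3.15.17.15",
        "host-c.example": "newer-version-placeholder",  # hypothetical upgraded host
    }
    for host in hosts_with_firmware(inventory, OUTDATED_VERSION):
        print(host)
```

An exact string match is used deliberately: the log only identifies one specific outdated version, so the sketch avoids guessing at a version-ordering rule.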
[03:22:38] Traffic, SRE: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by brett@cumin2002 for host cp5024.eqsin.wmnet with OS bullseye completed: - cp5024 (**PASS**) - Removed from Puppet and PuppetDB if present -...
[04:01:14] Traffic, SRE: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (BCornwall)
[10:48:38] netops, Infrastructure-Foundations, SRE, serviceops: Optimize k8s same row traffic flows - https://phabricator.wikimedia.org/T328523 (cmooney) > BGP is smart about it (see '"first party" NEXT_HOP' in section 5.1.3.2 of the RFC), so it should just work on the router side. TIL didn't realise EBGP...
[11:04:35] (PurgedHighEventLag) firing: (3) High event process lag with purged on cp5018:2112 - https://wikitech.wikimedia.org/wiki/Purged#Alerts - https://alerts.wikimedia.org/?q=alertname%3DPurgedHighEventLag
[11:09:35] (PurgedHighEventLag) firing: (17) High event process lag with purged on cp5017:2112 - https://wikitech.wikimedia.org/wiki/Purged#Alerts - https://alerts.wikimedia.org/?q=alertname%3DPurgedHighEventLag
[11:14:35] (PurgedHighEventLag) firing: (22) High event process lag with purged on cp5017:2112 - https://wikitech.wikimedia.org/wiki/Purged#Alerts - https://alerts.wikimedia.org/?q=alertname%3DPurgedHighEventLag
[11:19:35] (PurgedHighEventLag) firing: (18) High event process lag with purged on cp5017:2112 - https://wikitech.wikimedia.org/wiki/Purged#Alerts - https://alerts.wikimedia.org/?q=alertname%3DPurgedHighEventLag
[11:24:35] (PurgedHighEventLag) resolved: (9) High event process lag with purged on cp5017:2112 - https://wikitech.wikimedia.org/wiki/Purged#Alerts - https://alerts.wikimedia.org/?q=alertname%3DPurgedHighEventLag
[11:49:45] Traffic, SRE, Patch-For-Review: Add DP cookie for pageview filtering - https://phabricator.wikimedia.org/T315676 (Vgutierrez) Initial sanity checks confirms that the daily key generated on two different hosts is the same: ` vgutierrez@cumin1001:~$ sudo -i cumin 'cp[6015,6016].*' 'sha512sum /etc/varni...
[12:23:26] Traffic, SRE: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by sukhe@cumin2002 for host cp1076.eqiad.wmnet with OS bullseye
[13:03:35] (PurgedHighEventLag) firing: High event process lag with purged on cp5021:2112 - https://wikitech.wikimedia.org/wiki/Purged#Alerts - https://grafana.wikimedia.org/d/RvscY1CZk/purged?var-datasource=eqsin%20prometheus/ops&var-instance=cp5021 - https://alerts.wikimedia.org/?q=alertname%3DPurgedHighEventLag
[13:08:35] (PurgedHighEventLag) firing: (8) High event process lag with purged on cp5017:2112 - https://wikitech.wikimedia.org/wiki/Purged#Alerts - https://alerts.wikimedia.org/?q=alertname%3DPurgedHighEventLag
[13:09:42] Traffic, SRE: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by sukhe@cumin2002 for host cp1076.eqiad.wmnet with OS bullseye completed: - cp1076 (**PASS**) - Downtimed on Icinga/Alertmanager - Disabled Pu...
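The 11:49:45 sanity check on T315676 confirms that two hosts generated the same daily key by running `sha512sum` on both (cp6015 and cp6016) through cumin and comparing the digests. The following minimal Python sketch shows the same comparison idea. It assumes the key contents have already been fetched from each host as bytes; the real file path is truncated in the log and is not reconstructed here, and `sha512_hex` and `all_digests_match` are hypothetical helper names.

```python
import hashlib


def sha512_hex(data: bytes) -> str:
    """Hex SHA-512 digest, the same digest `sha512sum` prints for a file's contents."""
    return hashlib.sha512(data).hexdigest()


def all_digests_match(contents_by_host: dict[str, bytes]) -> bool:
    """True when at least one host is given and every host's content hashes identically."""
    digests = {sha512_hex(data) for data in contents_by_host.values()}
    return len(digests) == 1


if __name__ == "__main__":
    # Illustrative placeholder bytes; in practice each value would be the key
    # file as read on that host.
    sample = {
        "cp6015": b"example-daily-key",
        "cp6016": b"example-daily-key",
    }
    print(all_digests_match(sample))
```

Comparing digests rather than raw contents mirrors the log's approach and avoids copying the key material itself between hosts.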
[13:13:35] (PurgedHighEventLag) firing: (12) High event process lag with purged on cp5017:2112 - https://wikitech.wikimedia.org/wiki/Purged#Alerts - https://alerts.wikimedia.org/?q=alertname%3DPurgedHighEventLag
[13:18:35] (PurgedHighEventLag) firing: (19) High event process lag with purged on cp5017:2112 - https://wikitech.wikimedia.org/wiki/Purged#Alerts - https://alerts.wikimedia.org/?q=alertname%3DPurgedHighEventLag
[13:23:35] (PurgedHighEventLag) firing: (18) High event process lag with purged on cp5017:2112 - https://wikitech.wikimedia.org/wiki/Purged#Alerts - https://alerts.wikimedia.org/?q=alertname%3DPurgedHighEventLag
[13:28:35] (PurgedHighEventLag) resolved: (12) High event process lag with purged on cp5017:2112 - https://wikitech.wikimedia.org/wiki/Purged#Alerts - https://alerts.wikimedia.org/?q=alertname%3DPurgedHighEventLag
[13:35:01] Traffic, SRE: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (ssingh)
[14:37:39] Traffic, DNS, Infrastructure-Foundations, Mail, and 3 others: Add SPF records for gitlab.wikimedia.org - https://phabricator.wikimedia.org/T328642 (eoghan) p:Triage→Medium a:eoghan
[14:43:56] netops, Cloud-VPS, Infrastructure-Foundations, SRE, and 2 others: Upgrade cloudsw1-c8-eqiad and cloudsw1-d5-eqiad to Junos 20+ - https://phabricator.wikimedia.org/T316544 (cmooney) >>! In T316544#8575464, @Andrew wrote: > We have a ton of rebalancing to do for each of these switches. The C8 deadl...
[15:07:03] Traffic, SRE, Patch-For-Review: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (MoritzMuehlenhoff) >>! In T321309#8581111, @ssingh wrote: > Steps to follow for manual upgrade of the iDRAC firmwares for the cp hosts in eqiad for us and in case someone else stumbles on th...
[15:20:22] Traffic, SRE, Patch-For-Review: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (ssingh) >>! In T321309#8582443, @MoritzMuehlenhoff wrote: >>>! In T321309#8581111, @ssingh wrote: >> Steps to follow for manual upgrade of the iDRAC firmwares for the cp hosts in eqiad for u...
[20:28:35] Traffic, SRE, Patch-For-Review: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by brett@cumin2002 for host cp1077.eqiad.wmnet with OS bullseye
[20:29:07] Traffic, SRE, Patch-For-Review: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by brett@cumin2002 for host cp1078.eqiad.wmnet with OS bullseye
[20:59:17] Traffic, SRE, Patch-For-Review: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by brett@cumin2002 for host cp1078.eqiad.wmnet with OS bullseye executed with errors: - cp1078 (**FAIL**) - Downtimed on Ic...
[20:59:30] Traffic, SRE, Patch-For-Review: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by brett@cumin2002 for host cp1078.eqiad.wmnet with OS bullseye
[21:22:01] Traffic, SRE, Patch-For-Review: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by brett@cumin2002 for host cp1077.eqiad.wmnet with OS bullseye completed: - cp1077 (**PASS**) - Downtimed on Icinga/Alertm...
[21:22:44] Traffic, SRE, Patch-For-Review: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (BCornwall)
[21:22:58] Traffic, SRE, Patch-For-Review: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by brett@cumin2002 for host cp1079.eqiad.wmnet with OS bullseye
[21:47:53] Traffic, SRE, Patch-For-Review: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by brett@cumin2002 for host cp1078.eqiad.wmnet with OS bullseye completed: - cp1078 (**WARN**) - Removed from Puppet and Pu...
[22:01:18] Traffic, SRE, Patch-For-Review: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (BCornwall)
[22:01:58] Traffic, SRE, Patch-For-Review: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by brett@cumin2002 for host cp1080.eqiad.wmnet with OS bullseye
[22:12:23] Traffic, SRE, Patch-For-Review: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by brett@cumin2002 for host cp1079.eqiad.wmnet with OS bullseye completed: - cp1079 (**PASS**) - Downtimed on Icinga/Alertm...
[22:16:18] Traffic, SRE, Patch-For-Review: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (BCornwall)
[22:58:31] Traffic, SRE, Patch-For-Review: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by brett@cumin2002 for host cp1080.eqiad.wmnet with OS bullseye executed with errors: - cp1080 (**FAIL**) - Downtimed on Ic...
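Several reimages in this log were re-run after a failure. For example, cp1078 went from **FAIL** at 20:59:17 to **WARN** at 21:47:53, and cp1080 ends the day on **FAIL**. The following hypothetical Python sketch reduces the ops-monitoring-bot completion messages to the latest outcome per host. It assumes the message wording seen in this log ("completed:" or "executed with errors:" followed by "- host (**STATUS**)") stays stable. `RESULT_RE` and `latest_status` are illustrative names and not part of the reimage cookbook.

```python
import re

# Matches completion messages as they appear in this log, e.g.
# "... for host cp1078.eqiad.wmnet with OS bullseye completed: - cp1078 (**WARN**) ..."
# "was started" lines do not match because they carry no result.
RESULT_RE = re.compile(
    r"for host (?P<fqdn>\S+) with OS \S+ (?:completed|executed with errors): "
    r"- \S+ \(\*\*(?P<status>PASS|WARN|FAIL)\*\*\)"
)


def latest_status(lines: list[str]) -> dict[str, str]:
    """Map each host FQDN to the status of its most recent reimage result line."""
    status: dict[str, str] = {}
    for line in lines:
        match = RESULT_RE.search(line)
        if match:
            # Later lines overwrite earlier ones, so a re-run replaces a prior FAIL.
            status[match.group("fqdn")] = match.group("status")
    return status


if __name__ == "__main__":
    sample = [
        "for host cp1078.eqiad.wmnet with OS bullseye executed with errors: - cp1078 (**FAIL**) - Downtimed on Ic...",
        "for host cp1078.eqiad.wmnet with OS bullseye",
        "for host cp1078.eqiad.wmnet with OS bullseye completed: - cp1078 (**WARN**) - Removed from Puppet and Pu...",
        "for host cp1080.eqiad.wmnet with OS bullseye executed with errors: - cp1080 (**FAIL**) - Downtimed on Ic...",
    ]
    for host, result in sorted(latest_status(sample).items()):
        print(host, result)
```

Keying on the FQDN and keeping only the last match gives a quick view of which hosts still need attention after retries. Because it depends on the exact bot wording, the pattern would break if that message format changed.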