[06:26:20] ryankemper: yeah agreed, it looks great to me! It's best to have the service owners take care of the depool/repool so we're sure it's done correctly
[06:30:18] netops, DBA, Data-Engineering, Data-Persistence, and 10 others: codfw row A switches upgrade - https://phabricator.wikimedia.org/T327925 (ayounsi)
[06:30:33] I updated the task description to make it clearer
[06:43:12] netops, DBA, Data-Engineering, Data-Persistence, and 10 others: codfw row A switches upgrade - https://phabricator.wikimedia.org/T327925 (ayounsi)
[06:49:08] netops, Infrastructure-Foundations, SRE: eqiad/codfw virtual-chassis upgrades - https://phabricator.wikimedia.org/T327248 (ayounsi) Script used to generate the servers lists: {P43345}
[06:49:25] netops, DBA, Data-Engineering, Data-Persistence, and 10 others: codfw row A switches upgrade - https://phabricator.wikimedia.org/T327925 (Marostegui)
[06:50:00] netops, DBA, Data-Persistence, Discovery-Search, and 8 others: codfw row B switches upgrade - https://phabricator.wikimedia.org/T327991 (ayounsi)
[06:50:17] netops, Infrastructure-Foundations, SRE: eqiad/codfw virtual-chassis upgrades - https://phabricator.wikimedia.org/T327248 (ayounsi)
[06:50:28] netops, DBA, Data-Persistence, Discovery-Search, and 8 others: codfw row B switches upgrade - https://phabricator.wikimedia.org/T327991 (ayounsi)
[06:51:46] netops, DBA, Data-Engineering, Data-Persistence, and 10 others: codfw row A switches upgrade - https://phabricator.wikimedia.org/T327925 (Marostegui)
[06:52:45] netops, Infrastructure-Foundations, SRE: eqiad/codfw virtual-chassis upgrades - https://phabricator.wikimedia.org/T327248 (ayounsi)
[06:53:09] netops, DBA, Data-Engineering, Data-Persistence, and 10 others: codfw row A switches upgrade - https://phabricator.wikimedia.org/T327925 (Marostegui) Adding Jaime for the backup related hosts
[07:28:37] netops, DBA, Data-Engineering, Data-Persistence, and 11 others: codfw row A switches upgrade - https://phabricator.wikimedia.org/T327925 (Marostegui)
[07:35:30] netops, DBA, Data-Engineering, Data-Persistence, and 10 others: codfw row A switches upgrade - https://phabricator.wikimedia.org/T327925 (Marostegui)
[07:45:32] netops, DBA, Data-Engineering, Data-Persistence, and 10 others: codfw row A switches upgrade - https://phabricator.wikimedia.org/T327925 (Marostegui)
[08:04:12] netops, DBA, Data-Engineering, Data-Persistence, and 10 others: codfw row A switches upgrade - https://phabricator.wikimedia.org/T327925 (Marostegui)
[08:15:11] netops, Infrastructure-Foundations, SRE, cloud-services-team (FY2022/2023-Q3): Configure cloudsw-b1-codfw and migrate cloud hosts in codfw B1 to it - https://phabricator.wikimedia.org/T327919 (ayounsi) @Papaul could you rename (Netbox, label, console, etc) the switch cloudsw**1**-b1-codfw? For co...
[08:21:57] netops, DBA, Data-Engineering, Data-Persistence, and 10 others: codfw row A switches upgrade - https://phabricator.wikimedia.org/T327925 (Marostegui)
[08:22:57] netops, DBA, Data-Engineering, Data-Persistence, and 10 others: codfw row A switches upgrade - https://phabricator.wikimedia.org/T327925 (Marostegui)
[08:23:47] netops, DBA, Data-Engineering, Data-Persistence, and 10 others: codfw row A switches upgrade - https://phabricator.wikimedia.org/T327925 (Marostegui)
[08:37:29] netops, DBA, Data-Engineering, Data-Persistence, and 10 others: codfw row A switches upgrade - https://phabricator.wikimedia.org/T327925 (Marostegui)
[08:38:31] netops, DBA, Data-Engineering, Data-Persistence, and 10 others: codfw row A switches upgrade - https://phabricator.wikimedia.org/T327925 (Marostegui)
[08:40:20] netops, DBA, Data-Engineering, Data-Persistence, and 10 others: codfw row A switches upgrade - https://phabricator.wikimedia.org/T327925 (Marostegui)
[08:44:25] netops, Infrastructure-Foundations, SRE, cloud-services-team (FY2022/2023-Q3): Configure cloudsw-b1-codfw and migrate cloud hosts in codfw B1 to it - https://phabricator.wikimedia.org/T327919 (ayounsi) @cmooney Thinking more about it... Your approach is great and careful and would suit well live...
[08:44:41] netops, DBA, Data-Engineering, Data-Persistence, and 10 others: codfw row A switches upgrade - https://phabricator.wikimedia.org/T327925 (Marostegui)
[08:49:08] netops, Infrastructure-Foundations, SRE, cloud-services-team (FY2022/2023-Q3): Configure cloudsw1-b1-codfw and migrate cloud hosts in codfw B1 to it - https://phabricator.wikimedia.org/T327919 (ayounsi)
[08:55:10] netops, Infrastructure-Foundations, SRE, cloud-services-team (FY2022/2023-Q3): Configure cloudsw1-b1-codfw and migrate cloud hosts in codfw B1 to it - https://phabricator.wikimedia.org/T327919 (cmooney) >>! In T327919#8560178, @ayounsi wrote: >> B connection is probably sufficient, this does mean...
[08:57:50] netops, Infrastructure-Foundations, SRE, cloud-services-team (FY2022/2023-Q3): Configure cloudsw1-b1-codfw and migrate cloud hosts in codfw B1 to it - https://phabricator.wikimedia.org/T327919 (cmooney)
[09:05:23] netops, DBA, Data-Engineering, Data-Persistence, and 10 others: codfw row A switches upgrade - https://phabricator.wikimedia.org/T327925 (Marostegui)
[09:06:01] netops, DBA, Data-Engineering, Data-Persistence, and 10 others: codfw row A switches upgrade - https://phabricator.wikimedia.org/T327925 (Marostegui)
[09:08:07] netops, Infrastructure-Foundations, SRE, cloud-services-team (FY2022/2023-Q3): Configure cloudsw1-b1-codfw and migrate cloud hosts in codfw B1 to it - https://phabricator.wikimedia.org/T327919 (aborrero) LGTM!
[09:24:01] netops, Infrastructure-Foundations, SRE: Plan codfw row A/B top-of-rack switch refresh - https://phabricator.wikimedia.org/T327938 (ayounsi) Thanks for the summary! Some additional notes/thoughts: * public1-a/b-codfw host might be better grouped in a single rack per row, providing still redundancy (...
[09:30:31] netops, DBA, Data-Engineering, Data-Persistence, and 10 others: codfw row A switches upgrade - https://phabricator.wikimedia.org/T327925 (Vgutierrez)
[09:35:10] netops, DBA, Data-Engineering, Data-Persistence, and 10 others: codfw row A switches upgrade - https://phabricator.wikimedia.org/T327925 (Marostegui)
[09:37:09] netops, DBA, Data-Engineering, Data-Persistence, and 10 others: codfw row A switches upgrade - https://phabricator.wikimedia.org/T327925 (Marostegui)
[09:38:12] netops, DBA, Data-Engineering, Data-Persistence, and 10 others: codfw row A switches upgrade - https://phabricator.wikimedia.org/T327925 (jcrespo)
[09:39:12] netops, DBA, Data-Engineering, Data-Persistence, and 10 others: codfw row A switches upgrade - https://phabricator.wikimedia.org/T327925 (jcrespo)
[10:07:35] netops, DBA, Data-Engineering, Data-Persistence, and 10 others: codfw row A switches upgrade - https://phabricator.wikimedia.org/T327925 (MoritzMuehlenhoff) We can't migrate the puppetdb2002 VM (it's being moved to baremetal, but that is unlikely completed by then), so we'll need to disable Puppe...
[11:02:52] FYI, upgrading postgresql on netbox DB hosts, there should be no impact
[11:35:48] netops, Infrastructure-Foundations: Decom flowspec1001 - https://phabricator.wikimedia.org/T328009 (ayounsi)
[11:48:13] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Decom flowspec1001 - https://phabricator.wikimedia.org/T328009 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by ayounsi@cumin1001 for hosts: `flowspec1001` - flowspec1001 (**PASS**) - Downtimed host on Icinga/Alertmanag...
[12:52:25] SRE-tools, Infrastructure-Foundations, Patch-For-Review: sre.swift.roll-restart-reboot-proxies fails on thanos hosts, which lack nginx - https://phabricator.wikimedia.org/T327783 (MoritzMuehlenhoff) Open→Resolved a:MoritzMuehlenhoff This has been fixed by splitting the restart cookbooks i...
[13:31:46] netops, DBA, Data-Persistence, Discovery-Search, and 9 others: codfw row B switches upgrade - https://phabricator.wikimedia.org/T327991 (Marostegui) Adding Jaime for the backup hosts.
[13:36:32] netops, DBA, Data-Persistence, Discovery-Search, and 9 others: codfw row B switches upgrade - https://phabricator.wikimedia.org/T327991 (Marostegui)
[13:39:01] netops, DBA, Data-Persistence, Discovery-Search, and 9 others: codfw row B switches upgrade - https://phabricator.wikimedia.org/T327991 (Marostegui)
[13:40:38] netops, DBA, Data-Persistence, Discovery-Search, and 9 others: codfw row B switches upgrade - https://phabricator.wikimedia.org/T327991 (Marostegui)
[13:43:17] netops, DBA, Data-Persistence, Discovery-Search, and 9 others: codfw row B switches upgrade - https://phabricator.wikimedia.org/T327991 (Marostegui)
[13:44:48] netops, DBA, Data-Persistence, Discovery-Search, and 9 others: codfw row B switches upgrade - https://phabricator.wikimedia.org/T327991 (Marostegui)
[13:59:18] topranks: just hit an error with homer https://phabricator.wikimedia.org/P43407 is this something you are aware of/working on?
[13:59:38] I sent a naive patch in https://gerrit.wikimedia.org/r/c/operations/homer/public/+/883947 but that may not be the correct fix
[13:59:40] jbond: yes, bad timing sorry
[14:00:15] topranks: no problem, I have a change to push to cr2-eqiad; it's not urgent and is deployed everywhere else
[14:00:28] if you could ping when you're done, or deploy it, either works with me
[14:00:32] I'll abandon the CR above
[14:00:36] we don't need to fix it at that level - the issue was an interface in netbox (which is being removed), but I had removed the IPs first and not the interface
[14:00:41] both are gone now
[14:00:55] I'm just running Homer against cr2-eqiad to remove the interface from the config
[14:01:03] ack thanks
[14:01:13] SRE-tools, Infrastructure-Foundations: Cookbook for rack depool - https://phabricator.wikimedia.org/T327300 (ayounsi)
[14:01:24] jbond: were you running homer to apply ACL changes?
[14:01:32] yes for confd
[14:01:35] and wmcs
[14:02:14] ok cool yes I see those diffs, pushed to cr2-eqiad now
[14:02:47] cheers
[14:08:58] netops, DBA, Data-Persistence, Discovery-Search, and 9 others: codfw row B switches upgrade - https://phabricator.wikimedia.org/T327991 (Marostegui)
[14:43:00] topranks: I think I'm crossing over with you again
[14:43:40] pushing a change to eqiad and I see a new gr interface gr-3/3/0.1 in ospf, do you want me to commit them?
[14:55:25] topranks: ??
[15:01:05] jbond: https://gerrit.wikimedia.org/r/c/operations/homer/public/+/883959 should solve it, re-run puppet on cumin
[15:01:25] to pick it up, then you shouldn't have the diff anymore
[15:02:22] XioNoX: thanks, trying now
[15:03:26] jbond: sorry for the confusion, thanks XioNoX for sorting it out
[15:03:43] I just clicked "merge" :)
[15:04:01] topranks: no problem. XioNoX: thanks, all good now
[16:14:17] netops, DBA, Data-Persistence, Discovery-Search, and 9 others: codfw row B switches upgrade - https://phabricator.wikimedia.org/T327991 (Marostegui)
[16:39:12] netops, DBA, Data-Persistence, Discovery-Search, and 9 others: codfw row B switches upgrade - https://phabricator.wikimedia.org/T327991 (colewhite)
[16:45:12] netops, DBA, Data-Persistence, Discovery-Search, and 9 others: codfw row B switches upgrade - https://phabricator.wikimedia.org/T327991 (jcrespo)
[16:52:50] netops, DBA, Data-Persistence, Discovery-Search, and 9 others: codfw row B switches upgrade - https://phabricator.wikimedia.org/T327991 (herron)
[17:14:57] netops, DBA, Data-Engineering, Data-Persistence, and 10 others: codfw row A switches upgrade - https://phabricator.wikimedia.org/T327925 (Eevans)
[17:17:48] netops, DBA, Data-Persistence, Discovery-Search, and 9 others: codfw row B switches upgrade - https://phabricator.wikimedia.org/T327991 (Eevans)
[17:23:25] netops, DBA, Data-Persistence, Discovery-Search, and 9 others: codfw row B switches upgrade - https://phabricator.wikimedia.org/T327991 (Eevans)
[18:16:52] netops, DBA, Data-Engineering, Data-Persistence, and 10 others: codfw row A switches upgrade - https://phabricator.wikimedia.org/T327925 (herron)
[18:17:29] netops, DBA, Data-Engineering, Data-Persistence, and 10 others: codfw row A switches upgrade - https://phabricator.wikimedia.org/T327925 (herron)
[20:40:37] netops, DBA, Data-Engineering, Data-Persistence, and 10 others: codfw row A switches upgrade - https://phabricator.wikimedia.org/T327925 (RKemper)