[07:20:13] netops, DBA, Data-Engineering-Planning, Data-Persistence, and 11 others: codfw row A switches upgrade - https://phabricator.wikimedia.org/T327925 (Marostegui)
[07:31:06] netops, DBA, Data-Engineering-Planning, Data-Persistence, and 11 others: codfw row A switches upgrade - https://phabricator.wikimedia.org/T327925 (Marostegui)
[08:44:00] netops, DBA, Data-Engineering-Planning, Data-Persistence, and 11 others: codfw row A switches upgrade - https://phabricator.wikimedia.org/T327925 (ayounsi)
[08:49:44] netops, DBA, Data-Engineering-Planning, Data-Persistence, and 11 others: codfw row A switches upgrade - https://phabricator.wikimedia.org/T327925 (MoritzMuehlenhoff)
[09:57:55] netops, DBA, Data-Engineering-Planning, Data-Persistence, and 12 others: codfw row A switches upgrade - https://phabricator.wikimedia.org/T327925 (akosiaris) >>! In T327925#8587186, @Marostegui wrote: >>>! In T327925#8587104, @Joe wrote: >> I would suggest that instead of handling individual syst...
[10:11:35] netops, DBA, Data-Engineering-Planning, Data-Persistence, and 12 others: codfw row A switches upgrade - https://phabricator.wikimedia.org/T327925 (Marostegui) Cool! I am going to repool the hosts then :)
[10:12:36] volans: quick q if you have a moment
[10:12:52] (and btw I am only half-working today as it's a bank holiday, but I'm on call so I'm around)
[10:15:11] I changed the status of new (racked but not yet live) switch cloudsw1-b1-codfw from "planned" to "staged" on Friday
[10:15:42] Intent was to make my local Homer configure it (I really should have changed my Homer conf to work with 'Planned' state devices instead)
[10:15:59] topranks: I'm oncall too :D
[10:16:02] The change from "planned" to "staged" resulted in the DNS cookbook wanting to remove the mgmt IP for it from DNS?
[10:16:27] interesting... and was there with planned?
[10:16:32] volans: yep on call buddies :)
[10:16:56] yes, I added the MGMT IPs earlier in the day (didn't change status, it was already at planned), and the dns cookbook added the entries
[10:17:42] The change from 'planned' to 'staged' also seemed to cause the dns cookbook to trigger sre.puppet.sync-netbox-hiera cookbook
[10:18:05] the dns cookbook always runs the hiera one too
[10:18:09] to keep things in sync
[10:18:11] Ah ok
[10:18:18] so that's not a prob, let me check the status
[10:18:23] I let that go through, the info was all correct
[10:18:45] So anyway, I didn't modify anything, I allowed DNS cookbook remove the entries for it
[10:19:15] was late Friday so figured best to leave it alone
[10:19:24] yeah but that's weird, trying to get from the code why
[10:19:29] we have NETBOX_DEVICE_STATUSES = ('active', 'planned', 'failed', 'inventory')
[10:19:50] Ok well that explains it, given I changed from 'planned' to 'staged'
[10:20:01] ahhhh yes
[10:20:09] Tbh it's better at 'planned' stage, that reflects its current status better to me
[10:20:13] T320696
[10:20:14] T320696: Reduce the count of Netbox devices with incorrect status - https://phabricator.wikimedia.org/T320696
[10:20:18] we're not using staged anymore
[10:20:23] (being a switch it doesn't exactly mirror our server lifecycle)
[10:20:32] but yeah that was mostly meant for servers
[10:20:57] and I didn't make the script treat non-server differently
[10:20:58] ok yep. My issue was I only had "active" and "staged" configured in my Homer config.yaml file
[10:21:14] so I was changing it to get Homer to connect to it. But I've now added "planned" there, which is what I should have done
[10:21:27] I think we can amend the script to include staged non-servers
[10:21:34] cool.... so I'm guessing if I change it back to planned now and re-run DNS cookbook it will be ok
[10:21:49] yes
[10:22:00] sorry about that, if you want to open a task for it to not forget
[10:22:05] we can improve that
[10:22:08] I'm not sure we (netops) need to have a difference between planned/staged, so probably we can do the same as the servers
[10:22:11] and not use staged
[10:22:26] oh that would simplify things
[10:22:30] I can open a task, but I'm not sure there is really something we want to change?
[10:22:36] feel free to discuss that with arzh.el and we can decide what to do next
[10:22:45] based on the outcome of your decision
[10:22:54] I'll talk to Arzhel, but I think for network devices ('active', 'planned', 'failed', 'inventory') is enough
[10:23:08] ok cool, thanks for the info!
[10:23:48] ack thanks
[10:24:19] sorry for the trouble on friday
[10:24:43] ah no worries it wasn't a big deal
[10:32:36] I think we can do without staged for now, and we will quickly see if it becomes a limitation when putting switches in prod
[10:33:01] +1 for me
[10:51:50] FYI I've created the pad for today's team meeting
[11:26:04] SRE-tools, SRE, Spicerack, serviceops, Datacenter-Switchover: Expose hosts from MysqlLegacyRemoteHosts in spicerack - https://phabricator.wikimedia.org/T328911 (Clement_Goubert) p:Triage→Low
[11:30:06] Puppet, Infrastructure-Foundations, Patch-For-Review: decomission puppetmaster[12]00[12] and replace them with puppetmaster[12]00[45] - https://phabricator.wikimedia.org/T314136 (jbond) Im going to add the puppetmasters[12]002 back into services. No that puppetserver 7 is out it would be nice to bui...
[12:47:40] netops, DBA, Data-Engineering-Planning, Data-Persistence, and 12 others: codfw row A switches upgrade - https://phabricator.wikimedia.org/T327925 (dcaro)
[12:51:48] netops, DBA, Data-Engineering-Planning, Data-Persistence, and 12 others: codfw row A switches upgrade - https://phabricator.wikimedia.org/T327925 (Marostegui) I am repooling all the databases since we are going to fully depool codfw for reads.
[13:47:37] netops, Cloud-VPS, Infrastructure-Foundations, SRE, and 2 others: Upgrade cloudsw1-c8-eqiad and cloudsw1-d5-eqiad to Junos 20+ - https://phabricator.wikimedia.org/T316544 (dcaro) So currently we can't take down all the osds on rack C8 (14), as we don't have enough space to allocate their data on...
[14:09:53] netops, DBA, Data-Engineering-Planning, Data-Persistence, and 12 others: codfw row A switches upgrade - https://phabricator.wikimedia.org/T327925 (ssingh)
[14:39:46] netops, DBA, Data-Engineering-Planning, Data-Persistence, and 12 others: codfw row A switches upgrade - https://phabricator.wikimedia.org/T327925 (MatthewVernon) If we're "just" depooling codfw it's worth noting we will still need to depool the affected ms-fe* nodes (since mw always tries to writ...
[16:45:54] SRE-tools, Infrastructure-Foundations, SRE, Spicerack, and 2 others: Expose hosts from MysqlLegacyRemoteHosts in spicerack - https://phabricator.wikimedia.org/T328911 (Clement_Goubert)
[17:22:01] netops, DBA, Data-Engineering-Planning, Data-Persistence, and 12 others: codfw row A switches upgrade - https://phabricator.wikimedia.org/T327925 (jbond)
[22:51:57] netops, DBA, Data-Engineering-Planning, Data-Persistence, and 12 others: codfw row A switches upgrade - https://phabricator.wikimedia.org/T327925 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=e0e96453-af13-467f-a75e-ebd1c4122a32) set by bking@cumin2002 for 1 day, 0:00:00 on 13...