[02:11:22] 10serviceops, 10Data-Persistence, 10SRE, 10Datacenter-Switchover, and 2 others: March 2023 Datacenter Switchover - https://phabricator.wikimedia.org/T327920 (10Dzahn)
[07:47:01] 10serviceops, 10Foundational Technology Requests, 10Prod-Kubernetes, 10Shared-Data-Infrastructure, 10Kubernetes: etcd cluster reimage strategies to use with the K8s upgrade cookbook - https://phabricator.wikimedia.org/T330060 (10elukey) @JMeybohm did you have to set the cluster's status to `new` by a c...
[11:01:06] Ok I was doing a last check of our appserver clusters
[11:01:24] 4 less parsoid hosts in codfw than in eqiad
[11:01:32] 10 less appservers
[11:01:40] 10 more jobrunners
[11:01:44] 10 more videoscalers
[11:14:04] not surprising, there's 30ish servers in mw2* in role::insetup::serviceops
[11:15:20] <_joe_> 10 more videoscalers?
[11:15:27] <_joe_> ok that doesn't sound right
[11:15:58] <_joe_> and given those 30 servers are replacements, that doesn't really change things in terms of balance
[11:16:10] This is from for cluster in parsoid appserver api_appserver jobrunner videoscaler; do for dc in eqiad codfw; do echo $cluster $dc; sudo confctl select "dc=$dc,cluster=$cluster" get | wc -l; done; done
[11:16:20] So those insetup would not be counted
[11:17:08] <_joe_> claime: uhm something is fishy
[11:17:18] <_joe_> I count 5 more jobrunners as far as servers go
[11:17:33] Let me check one by one
[11:17:49] <_joe_> and 7 less appservers
[11:18:04] Ah wait
[11:18:08] They have 2 services
[11:18:50] <_joe_> yes
[11:18:52] <_joe_> canary :)
[11:18:55] <_joe_> I was about to say
[11:19:04] Also nginx/apache2 for videoscaler
[11:20:14] With a split on hostname | sort | uniq, I have +5 jobrunners, +5 videoscalers, +2 api_appservers, -7 appservers, -4 parsoid
[11:26:53] <_joe_> the only thing slightly worrisome is the -4 parsoids imho
[11:26:59] <_joe_> but not really an issue
[11:34:18] we're at way less than 50% CPU and memory usage in eqiad, so we should be fine even losing 4 hosts (36 cores)
[11:35:27] 10serviceops, 10Data-Persistence, 10SRE, 10Datacenter-Switchover, and 2 others: March 2023 Datacenter Switchover - https://phabricator.wikimedia.org/T327920 (10Clement_Goubert)
[12:00:57] 10serviceops, 10Data-Persistence, 10SRE, 10Datacenter-Switchover, and 2 others: March 2023 Datacenter Switchover - https://phabricator.wikimedia.org/T327920 (10Clement_Goubert)
[12:19:10] Would I be safe to do a thumbor redeploy in k8s? Given that we reduced the number of replicas previously etc
[13:11:51] 10serviceops, 10Data-Persistence, 10SRE, 10Datacenter-Switchover, and 2 others: March 2023 Datacenter Switchover - https://phabricator.wikimedia.org/T327920 (10Clement_Goubert)
[13:13:24] 10serviceops, 10Data-Persistence, 10SRE, 10Datacenter-Switchover, and 2 others: March 2023 Datacenter Switchover - https://phabricator.wikimedia.org/T327920 (10Clement_Goubert)
[13:34:11] 10serviceops, 10Data-Persistence, 10SRE, 10Datacenter-Switchover, and 2 others: March 2023 Datacenter Switchover - https://phabricator.wikimedia.org/T327920 (10Clement_Goubert)
[13:57:26] 10serviceops, 10SRE, 10CommRel-Specialists-Support (Jan-Mar-2023), 10Datacenter-Switchover: CommRel support for March 2023 Datacenter Switchover - https://phabricator.wikimedia.org/T328287 (10Trizek-WMF)
[13:59:45] hnowlan: the cluster is back to full capacity, so go ahead
[14:08:59] jayme: cool, thanks!
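The host counts discussed above (11:16-11:20) come from counting confctl output lines, which yields one line per (host, service) pair; hosts exposing two services (a canary entry, or nginx and apache2 on videoscalers) are therefore counted twice. A minimal sketch of the deduplicated per-host count claime arrived at, assuming confctl's `get` prints one JSON object per line with the hostname as the first quoted key (adjust the field extraction if the output format differs):

    for cluster in parsoid appserver api_appserver jobrunner videoscaler; do
      for dc in eqiad codfw; do
        # Extract the first quoted field (assumed to be the hostname) and
        # de-duplicate it, so hosts are counted once even with two services.
        hosts=$(sudo confctl select "dc=$dc,cluster=$cluster" get \
          | cut -d'"' -f2 | sort -u | wc -l)
        echo "$cluster $dc: $hosts hosts"
      done
    done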
[14:17:49] 10serviceops, 10SRE, 10CommRel-Specialists-Support (Jan-Mar-2023), 10Datacenter-Switchover: CommRel support for March 2023 Datacenter Switchover - https://phabricator.wikimedia.org/T328287 (10Trizek-WMF) It happened. The next step, next week: debrief the process.
[14:26:37] 10serviceops, 10Data-Persistence, 10SRE, 10Datacenter-Switchover, and 2 others: March 2023 Datacenter Switchover - https://phabricator.wikimedia.org/T327920 (10Clement_Goubert)
[15:10:21] I found out the list of hosts for a dsh target (used by scap) can be populated automatically from Puppet DB. That was first introduced for the k8s workers in https://gerrit.wikimedia.org/r/c/operations/puppet/+/859466
[15:11:01] I have proposed a series of changes to rely on Puppet DB queries instead of a manually maintained list of hosts. That would slightly simplify the hieradata copy-pasting and ensure the list of targets is consistent
[15:11:13] no rush, it is merely an improvement =)
[15:38:41] 10serviceops, 10Citoid: citoid having stability issues - https://phabricator.wikimedia.org/T330768 (10JMeybohm)
[17:26:05] 10serviceops, 10Data-Engineering-Planning, 10Event-Platform Value Stream, 10Service-deployment-requests: New Service Request mediawiki-page-content-change-enrichment - https://phabricator.wikimedia.org/T330507 (10lbowmaker)
[22:51:00] 10serviceops, 10Data-Persistence, 10SRE, 10Datacenter-Switchover: Post March 2023 Datacenter Switchover Tasks - https://phabricator.wikimedia.org/T328907 (10Dzahn) After the switch of the apt servers we are getting alerting about bad systemd status on apt1001. ` <+icinga-wm> PROBLEM - Check systemd state...
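As a rough illustration of the Puppet DB idea mentioned at 15:10-15:11 (not the actual Gerrit changes): a dsh/scap target list can be derived from a PuppetDB query rather than a hand-maintained hieradata array. The sketch below uses the standard PuppetDB v4 resources endpoint to list the certnames of hosts that include a given role class; the PuppetDB hostname and the role class name are placeholders, not the real production values.

    # List every node that includes the (hypothetical) role class, then emit a
    # sorted, de-duplicated host list suitable for use as a dsh target file.
    curl -s -G http://puppetdb.example.wmnet:8080/pdb/query/v4/resources \
      --data-urlencode 'query=["and", ["=", "type", "Class"], ["=", "title", "Role::Mediawiki::Jobrunner"]]' \
      | jq -r '.[].certname' | sort -u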