[10:27:18] btullis joal and myself were talking about the kubernetes upgrade procedure, and especially the "wipe etcd" part. I understand why it was done this way in the context of wikikube: starting fully fresh in a cluster running in the depooled site. However, for dse-k8s-eqiad, we have 2 extra spicy toppings
[10:28:02] a) we rely on PVCs provisioning volumes in ceph. This means that deleting the PVC will result in the ceph volume being deleted, which we'd like to avoid
[10:28:46] b) we don't have a backup cluster of the same size, so while we can temporarily migrate small stateless apps such as superset, datahub, etc, to dse-k8s-codfw, we can't do it for our largest deployments, such as airflow
[10:29:37] point b) is something that aligns with our SLOs (mostly the lack thereof), as airflow or other streaming consumers will restart from "last checkpoint"
[10:29:55] however, I'd be keen on trying to _keep_ the etcd state, at least in the application namespaces
[10:30:51] I'm curious if any of you (pinging jayme because you know all of the YAML resources by heart) feels like it's a terrible idea
[10:31:21] we had this raised a couple of times by Ben in the SIG... it might work
[10:31:54] the k8s yaml is probably not the biggest of problems since we're already linting in CI for 1.31 compatibility
[10:32:19] the bigger problem is the version skew, since that's not officially supported
[10:32:58] the version skew in what context, sorry?
[10:32:59] you might get away with upgrading control planes first and hoping for nothing bad to happen until you have upgraded the workers
[10:33:28] version skew between the old and the new k8s version
[10:34:07] so, to be clear, I'm not talking about a rolling upgrade.
I'm more than happy to shut down all kubelets and worker nodes
[10:34:30] but I'd like to keep the data in etcd for application namespaces if possible
[10:44:13] ah, I see
[10:44:21] yeah... might work 🤷
[10:45:10] going at this question from another angle: are there some namespaces that we *have* to clean up?
[10:45:21] (in etcd, I mean)
[10:50:34] idk for sure, sorry. Never tested that in any of the clusters. In theory it should be fine, since that's what a rolling update would do as well.
[10:51:35] might be that some migrations will be applied to etcd data, but as far as I know there is no "you have to do step by step upgrades" rule for k8s
[10:56:29] is it worth testing the rolling upgrade procedure at this point? Feels way cleaner and possibly more future-proof
[11:11:36] for a "supported" rolling upgrade the version skew is too large. But we will be working on moving to rolling upgrades for the next k8s version
[11:12:22] that ofc. will probably require a more reactive upgrade cycle from all cluster maintainers in the future... since we would like to still stick with "two supported versions only" in our infra
[11:21:58] OK, this all sounds pretty hopeful. By the sounds of it, we can probably aim for a sort of 'stop the world' upgrade for dse-k8s-eqiad, as long as we are prepared for the nuclear option of wiping etcd if we need to.
[11:24:51] jayme: do you think that the jump 1.23 -> 1.31 wouldn't even be feasible to test as rolling? It seems safer than keeping things in etcd... We could always use kind or similar for a preliminary test
[11:29:07] I'm just saying the jump is not supported for a rolling upgrade and therefore makes it a suboptimal test. But I think there is no difference between a rolling upgrade and shutting down all k8s control planes and bringing up a new one with 1.31 on the 1.23 etcd backend.
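(Editor's sketch of the "1.31 control plane on the 1.23 etcd backend" test idea, assuming `etcdctl` v3; the endpoint, snapshot path and data dir are placeholders, not real infra values. It is a dry run that only prints the commands, so nothing touches a live etcd:)

```shell
#!/bin/sh
# Dry-run sketch: print, rather than execute, the commands for dumping an
# existing (1.23-era) etcd and restoring it for a 1.31 test control plane.
# Endpoint, paths and the missing TLS cert flags are placeholders.
run() { echo "+ $*"; }

# 1. Take a consistent snapshot from the existing etcd cluster.
run etcdctl --endpoints=https://etcd-test.example.org:2379 \
    snapshot save /tmp/dse-k8s-eqiad.snap

# 2. Restore it into a fresh data dir on the test system.
run etcdctl snapshot restore /tmp/dse-k8s-eqiad.snap \
    --data-dir=/var/lib/etcd-test

# 3. Point a 1.31 kube-apiserver at the restored etcd and watch for errors.
run kube-apiserver --etcd-servers=https://127.0.0.1:2379
```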
What I would try to avoid is having control planes on 1.23 and 1.31 in parallel, and having workers on 1.23 running against a 1.31 control plane
[11:29:39] the latter will probably work, but is unsupported as well (and the worker upgrade is rather quick, so no point in leaving them on 1.23)
[11:30:36] What I would probably try to do first is dump the etcd dataset to a test system and configure a 1.31 control plane to use it, to see if it fails right away
[11:31:33] if not, try to interact with the api a bit, changing standard objects like deployments etc. but especially CRDs
[11:31:57] if that does not raise any issues, it's most likely fine
[11:32:36] if it does, you will probably need to update some components first for the CRDs/manifests to be 1.31 compatible
[11:33:26] "storedVersion" is the magic keyword here, I guess. Since the apiVersion you are sending an object in need not be the version in which it will be stored in etcd
[11:40:57] okok makes sense :)
[11:41:45] When you say 'dump the etcd dataset to a test system', are you envisaging something like a 'kind' cluster, or something in the prod realm? Like more ganeti VMs for a stacked control plane/etcd node?
[11:47:37] I left that open on purpose :D
[11:48:33] I think setting something up manually should be good enough (so some kind cluster, or really just our kubernetes-master package in wmcs)
[11:49:02] I would refrain from copying the etcd dump out of our infra, though, since it holds a bunch of secrets
[11:52:36] Ack, thanks.
[14:15:57] Rolling upgrades are only officially supported from minor to minor, right?
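(Editor's note: the "storedVersion" check mentioned above can be done against a dumped CRD manifest without touching the live cluster; on a real cluster the equivalent would be `kubectl get crd <name> -o jsonpath='{.status.storedVersions}'`. A minimal sketch with fabricated sample data, not a real CRD from this infra:)

```shell
#!/bin/sh
# Sketch: find which API version a CRD's objects are still stored in.
# The manifest below is fabricated sample data for illustration only.
cat > /tmp/sample-crd.yaml <<'EOF'
kind: CustomResourceDefinition
metadata:
  name: example.wikimedia.org
status:
  storedVersions:
  - v1beta1
EOF

# Objects stored as an old version (e.g. v1beta1) would need a
# storage-version migration before that version can be dropped.
grep -A1 '^  storedVersions:' /tmp/sample-crd.yaml | grep '^  - ' | sed 's/^  - //'
```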
You can't skip
[14:16:43] I keep wondering why they still use minor versions for clear major version upgrades
[14:16:45] anyway
[14:27:37] elukey: Their minor isn't the usual minor, for sure
[15:01:23] claime: yes, "real" rolling upgrades (for the apiserver) are only supported within one minor version
[15:02:17] but kubelet and kube-proxy can be 3 minor versions behind the apiserver
[15:02:51] so we could (in theory) still aim for bigger version jumps with control plane downtime
[15:05:09] what is also not supported is an in-place minor upgrade for the kubelet. So draining is required before that
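(Editor's sketch of the skew rules above — kubelet at most 3 minor versions behind the apiserver, never ahead of it — as a toy helper; purely illustrative, not an official tool:)

```shell
#!/bin/sh
# Illustrative helper: succeed when a kubelet version is within the supported
# skew, i.e. at most 3 minor versions behind the apiserver and never ahead.
kubelet_skew_ok() {
  api_minor=$(echo "$1" | cut -d. -f2)
  kubelet_minor=$(echo "$2" | cut -d. -f2)
  skew=$((api_minor - kubelet_minor))
  [ "$skew" -ge 0 ] && [ "$skew" -le 3 ]
}

kubelet_skew_ok 1.31 1.28 && echo "1.28 kubelet vs 1.31 apiserver: supported"
kubelet_skew_ok 1.31 1.23 || echo "1.23 kubelet vs 1.31 apiserver: unsupported"
```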