[10:27:18] btullis joal and myself were talking about the kubernetes upgrade procedure, and especially the "wipe etcd" part. I understand why it was done this way in the context of wikikube: starting fully fresh in a cluster running in the depooled site. However, for dse-k8s-eqiad, we have 2 extra spicy toppings
[10:28:02] a) we rely on PVCs provisioning volumes in ceph. This means that deleting the PVC will result in the ceph volume being deleted, which we'd like to avoid
[10:28:46] b) we don't have a backup cluster of the same size, so while we can temporarily migrate small stateless apps such as superset, datahub, etc, to dse-k8s-codfw, we can't do it for our largest deployments, such as airflow
[10:29:37] point b) is something that aligns with our SLOs (mostly the lack thereof), as airflow or other streaming consumers will restart from "last checkpoint"
[10:29:55] however, I'd be keen on trying to _keep_ the etcd state, at least in the application namespaces
[10:30:51] I'm curious if any of you (pinging jayme because you know all of the YAML resources by heart) feels like it's a terrible idea
[10:31:21] we had this raised a couple of times by Ben in the SIG... it might work
[10:31:54] the k8s yaml is probably not the biggest of problems since we're already linting in CI for 1.31 compatibility
[10:32:19] the bigger problem is the version skew, since that's not officially supported
[10:32:58] the version skew in what context, sorry?
[10:32:59] you might get away with upgrading control planes first and hoping for nothing bad to happen until you have upgraded the workers
[10:33:28] version skew between the old and the new k8s version
[10:34:07] so, to be clear, I'm not talking about a rolling upgrade.
I'm more than happy to shut down all kubelets and worker nodes
[10:34:30] but I'd like to keep the data in etcd for application namespaces if possible
[10:44:13] ah, I see
[10:44:21] yeah... might work 🤷
[10:45:10] going at this question from another angle: are there some namespaces that we *have* to clean up?
[10:45:21] (in etcd, I mean)
[10:50:34] idk for sure, sorry. Never tested that in any of the clusters. In theory it should be fine, since that's what a rolling update would do as well.
[10:51:35] might be that some migrations will be applied to etcd data, but as far as I know there is no "you have to do step by step upgrades" rule for k8s
[10:56:29] is it worth testing the rolling upgrade procedure at this point? Feels way cleaner and possibly more future-proof
[11:11:36] for a "supported" rolling upgrade the version skew is too large. But we will be working on moving to rolling upgrades for the next k8s version
[11:12:22] that ofc. will probably require a more reactive upgrade cycle from all cluster maintainers in the future... since we would like to still stick with "two supported versions only" in our infra
[11:21:58] OK, this all sounds pretty hopeful. By the sounds of it, we can probably aim for a sort of 'stop the world' upgrade for dse-k8s-eqiad, as long as we are prepared for the nuclear option of wiping etcd if we need to.
[11:24:51] jayme: do you think that the jump 1.23 -> 1.31 wouldn't even be feasible to test as rolling? It seems safer than keeping things in etcd... We could always use kind or similar for a preliminary test
[11:29:07] I'm just saying the jump is not supported for a rolling upgrade and therefore makes it a suboptimal test. But I think there is no difference between a rolling upgrade and shutting down all k8s control planes and bringing up a new one with 1.31 on the 1.23 etcd backend.
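(Editor's sketch of the "1.31 control plane on the 1.23 etcd backend" test idea, assuming `etcdctl` v3; the endpoint, snapshot path and data dir are placeholders, not real infra values. It is a dry run that only prints the commands, so nothing touches a live etcd:)

```shell
#!/bin/sh
# Dry-run sketch: print, rather than execute, the commands for dumping an
# existing (1.23-era) etcd and restoring it for a 1.31 test control plane.
# Endpoint, paths and the missing TLS cert flags are placeholders.
run() { echo "+ $*"; }

# 1. Take a consistent snapshot from the existing etcd cluster.
run etcdctl --endpoints=https://etcd-test.example.org:2379 \
    snapshot save /tmp/dse-k8s-eqiad.snap

# 2. Restore it into a fresh data dir on the test system.
run etcdctl snapshot restore /tmp/dse-k8s-eqiad.snap \
    --data-dir=/var/lib/etcd-test

# 3. Point a 1.31 kube-apiserver at the restored etcd and watch for errors.
run kube-apiserver --etcd-servers=https://127.0.0.1:2379
```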
What I would try to avoid is having control planes on 1.23 and 1.31 in parallel, and having workers on 1.23 running against a 1.31 control plane
[11:29:39] the latter will probably work, but is unsupported as well (and the worker upgrade is rather quick, so no point in leaving them on 1.23)
[11:30:36] What I would probably try to do first is dump the etcd dataset to a test system and configure a 1.31 control plane to use it, to see if it fails right away
[11:31:33] if not, try to interact with the api a bit, changing standard objects like deployments etc. but especially CRDs
[11:31:57] if that does not raise any issues, it's most likely fine
[11:32:36] if it does, you will probably need to update some components first for the CRDs/manifests to be 1.31 compatible
[11:33:26] "storedVersion" is the magic keyword here, I guess. Since the apiVersion you are sending an object in need not be the version in which it will be stored in etcd
[11:40:57] okok makes sense :)
[11:41:45] When you say 'dump the etcd dataset to a test system', are you envisaging something like a 'kind' cluster, or something in the prod realm? Like more ganeti VMs for a stacked control plane/etcd node?
[11:47:37] I left that open on purpose :D
[11:48:33] I think setting something up manually should be good enough (so some kind cluster, or really just our kubernetes-master package in wmcs)
[11:49:02] I would refrain from copying the etcd dump out of our infra, though, since it holds a bunch of secrets
[11:52:36] Ack, thanks.
[14:15:57] Rolling upgrades are only officially supported from minor to minor, right?
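(Editor's note: the "storedVersion" check mentioned above can be done against a dumped CRD manifest without touching the live cluster; on a real cluster the equivalent would be `kubectl get crd <name> -o jsonpath='{.status.storedVersions}'`. A minimal sketch with fabricated sample data, not a real CRD from this infra:)

```shell
#!/bin/sh
# Sketch: find which API version a CRD's objects are still stored in.
# The manifest below is fabricated sample data for illustration only.
cat > /tmp/sample-crd.yaml <<'EOF'
kind: CustomResourceDefinition
metadata:
  name: example.wikimedia.org
status:
  storedVersions:
  - v1beta1
EOF

# Objects stored as an old version (e.g. v1beta1) would need a
# storage-version migration before that version can be dropped.
grep -A1 '^  storedVersions:' /tmp/sample-crd.yaml | grep '^  - ' | sed 's/^  - //'
```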
You can't skip
[14:16:43] I keep wondering why they still use minor versions for clear major version upgrades
[14:16:45] anyway
[14:27:37] elukey: Their minor isn't the usual minor, for sure
[15:01:23] claime: yes, "real" rolling upgrades (for the apiserver) are only supported within one minor version
[15:02:17] but kubelet and kube-proxy can be 3 minor versions behind the apiserver
[15:02:51] so we could (in theory) still aim for bigger version jumps with control plane downtime
[15:05:09] what is also not supported is an in-place minor upgrade for the kubelet. So draining is required before that
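(Editor's sketch of the skew rules above — kubelet at most 3 minor versions behind the apiserver, never ahead of it — as a toy helper; purely illustrative, not an official tool:)

```shell
#!/bin/sh
# Illustrative helper: succeed when a kubelet version is within the supported
# skew, i.e. at most 3 minor versions behind the apiserver and never ahead.
kubelet_skew_ok() {
  api_minor=$(echo "$1" | cut -d. -f2)
  kubelet_minor=$(echo "$2" | cut -d. -f2)
  skew=$((api_minor - kubelet_minor))
  [ "$skew" -ge 0 ] && [ "$skew" -le 3 ]
}

kubelet_skew_ok 1.31 1.28 && echo "1.28 kubelet vs 1.31 apiserver: supported"
kubelet_skew_ok 1.31 1.23 || echo "1.23 kubelet vs 1.31 apiserver: unsupported"
```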