[14:48:13] Is it possible to use the memory QoS feature in our current kubernetes install? https://kubernetes.io/blog/2021/11/26/qos-memory-resources/
[15:06:01] From a quick read, it seems it needs Unified support in the CRI runtime which is not supported by our current runtime (containerd 1.4)
[15:12:27] ah yes, Dockershim
[15:15:51] I didn't know we had moved to containerd already
[15:16:35] or maybe I'm just confused
[15:17:14] Hmm, I'm not sure we have. it's installed, but we're still using docker
[15:17:59] kubectl get nodes -o wide | awk '{print $13}'
[15:18:01] docker://20.10.5+dfsg1
[15:19:48] Yeah as far as I can tell we're still using dockershim with Docker Engine
[15:20:18] that's cool! dcausse and I have been trying to figure out why our containers keep getting oomkilled. Found the memory QOS stuff in this article, which has a pretty good explanation of how the kube manifest yaml maps to cgroups https://scribe.esmailelbob.xyz/cgroups-deep-dive-into-resource-management-in-kubernetes-5970e23620f2
[15:26:50] correct we still use the docker engine, we have discussed needing to migrate to another container engine, but that work has not been picked up by anyone
[15:31:08] What are the big concerns with the change? Based on https://kubernetes.io/docs/tasks/administer-cluster/migrating-from-dockershim/check-if-dockershim-removal-affects-you/ , I guess the first thing we'd need to do is audit our apps for depencies?
[15:31:20] errr....dependencies
[15:31:47] inflatador: I think the primary concern is whether we would see any changes in behavior
[15:32:31] it is a big enough change that some testing is warranted, we also need to decide which new engine to pick
[15:37:12] jhathaway Oh yeah, I'm sure we need to test, but I was just wondering about specifics. That article makes it sound easy (not that we should unconditionally accept that take). I'm guessing no one in this room has actually done such a migration?
[15:39:07] I assume the bulk of the work is choosing an engine and then adjusting our puppetry. I would be surprised if we run into any compatibility issues.
[15:39:55] but sre assumptions rarely pan out :)
[15:40:04] I think docker requires containerd and podman defaults to cri-o ,
[15:40:40] I have seen apps that naively assume a docker socket, such as https://github.com/woodpecker-ci/woodpecker/issues/757
[15:42:08] I have been using cri-o with minikube and it works fine, at least for my testing purposes
[15:43:06] if we choose cri-o we can use kata containers and migrate all our workloads to windows ;)
[15:43:16] >_>
[15:43:43] Isn't that the key to being a good SRE? Make it someone else's problem ;P
[15:45:15] at my old job, we used this guy and it worked surprisingly well https://developer.hashicorp.com/nomad/plugins/drivers/community/iis
[15:45:59] interesting
[15:46:04] (with "well" defined as "how much can I avoid touching Windows servers?"
[15:46:05] )
[15:47:46] but yeah, any app that spins up its own containers (CI?) probably needs to be checked
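
The two checks that come up in this log (which runtime each node reports, and which apps assume a Docker socket) could be scripted roughly as below. This is only a sketch, not something anyone in the channel ran. It assumes the input is the JSON printed by `kubectl get nodes -o json` and `kubectl get pods --all-namespaces -o json`; the function names `runtimes_by_node` and `pods_mounting_docker_socket` are made up for illustration. It only catches pods that mount the socket through a `hostPath` volume, so an app that reaches Docker some other way would be missed.

```python
import json
from typing import Dict, List


def runtimes_by_node(nodes_json: str) -> Dict[str, str]:
    """Map node name to the runtime string the kubelet reports.

    Reads status.nodeInfo.containerRuntimeVersion, the same value that
    `kubectl get nodes -o wide` shows in its CONTAINER-RUNTIME column,
    without depending on a column position as the awk one-liner does.
    """
    nodes = json.loads(nodes_json)
    return {
        item["metadata"]["name"]: item["status"]["nodeInfo"]["containerRuntimeVersion"]
        for item in nodes.get("items", [])
    }


def pods_mounting_docker_socket(pods_json: str) -> List[str]:
    """Return "namespace/name" for pods with a hostPath volume ending in docker.sock."""
    pods = json.loads(pods_json)
    hits = []
    for item in pods.get("items", []):
        for volume in item.get("spec", {}).get("volumes", []):
            # Volumes that are not hostPath have no "hostPath" key at all.
            path = (volume.get("hostPath") or {}).get("path", "")
            if path.endswith("docker.sock"):
                meta = item["metadata"]
                hits.append(f'{meta.get("namespace", "default")}/{meta["name"]}')
                break  # one hit per pod is enough
    return hits
```

Fed the two kubectl dumps, the first function gives a per-node view of the `docker://...` value pasted above, and the second gives a starting list of workloads (CI runners and similar) to look at before picking a new engine.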