[09:19:47] hello folks
[09:20:03] so on ml-staging I have deployed up to Istio this morning, all good
[09:20:15] now of course knative doesn't work, but it seems to be a webhook/TLS-generation issue
[09:20:30] (I didn't expect to jump from 0.18 to 1.7 without hitting any issues)
[09:40:55] o/
[09:45:04] hi akosiaris :)
[09:50:43] reminder to everyone that tomorrow we are upgrading WikiKube codfw
[09:50:47] fun times!
[09:53:26] I can help with reimages :)
[09:53:44] (the upgrade cookbook can only do one at a time, and there are too many workers)
[10:34:30] akosiaris: I created https://phabricator.wikimedia.org/T330060 since it is relevant for tomorrow
[10:34:40] any suggestion is appreciated :)
[11:42:13] elukey: akosiaris: I've added you as reviewers to the patches for tomorrow's upgrade (https://gerrit.wikimedia.org/r/c/operations/puppet/+/890390/ and https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/890392/)
[11:43:42] I did not yet reserve the IPv4 pools in netbox because I wasn't sure if there is anything to consider (netbox-wise)
[11:44:45] (aka feel free to do so if you feel confident :))
[11:55:30] back to bed now. I'll check back in a couple of hours
[12:16:17] cool, thanks
[12:16:53] elukey: I understand you ran into some pain with reimaging etcd; that mirrors some of my experiences
[12:17:03] I'll read up and see if I can come up with any smart ideas
[12:17:06] but first... lunch
[14:31:21] akosiaris: definitely no rush, I think that tomorrow we can probably reimage etcd beforehand (plus enable PKI etc.) and then proceed with the cookbook
[14:31:34] it would be nice in the future to have something ironed out and more automated
[14:32:59] +1
[15:53:23] very weird - the knative autoscaler pod tries to contact the kube API, but it fails (i/o timeout)
[15:53:45] I checked with nsenter and indeed curl hangs, but I verified that it should have egress rules to allow it
[15:54:07] the log is
[15:54:08] Failed to get k8s version Get "https://10.194.62.1:443/version": dial tcp 10.194.62.1:443: i/o timeout
[15:54:22] and the IP is the kubernetes ClusterIP
[15:54:28] (on other pods it works nicely)
[15:54:33] anything that I may be missing?
[16:12:34] it may be due to pod selector rules...
[16:18:17] I used to add labels to the Deployment resource and IIRC those were passed down to pods; maybe this has changed with 1.23?
[16:20:20] no, it is Luca's fault of course
[16:20:30] this doesn't change between k8s versions :D
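A minimal sketch of the label-selection pitfall discussed between 16:12 and 16:20. It assumes a standard Kubernetes NetworkPolicy whose podSelector uses matchLabels, since the log does not show the actual policy or manifests. In Kubernetes, pods created by a Deployment take their labels from spec.template.metadata.labels, plus a pod-template-hash label that Kubernetes adds. Labels set on the Deployment's own metadata are not copied to the pods, so a policy selecting on those labels matches no pods and the egress rule never applies. The manifest names, the `egress: kube-api` label, and the helpers `pod_labels` and `selector_matches` are all hypothetical. They model only matchLabels semantics, not matchExpressions or namespace selectors.

```python
import copy

# Hypothetical manifests (as parsed YAML): names and labels are illustrative only.
deployment = {
    "kind": "Deployment",
    "metadata": {
        "name": "autoscaler",
        # Labels on the Deployment object itself: NOT propagated to its pods.
        "labels": {"app": "autoscaler", "egress": "kube-api"},
    },
    "spec": {
        "template": {
            # Pod template labels: these are the labels the pods actually carry.
            "metadata": {"labels": {"app": "autoscaler"}},
        },
    },
}

# Hypothetical NetworkPolicy meant to allow egress for pods labelled egress=kube-api.
network_policy = {
    "kind": "NetworkPolicy",
    "spec": {"podSelector": {"matchLabels": {"egress": "kube-api"}}},
}


def pod_labels(deploy: dict) -> dict:
    """Return the labels a pod created by this Deployment would carry."""
    labels = dict(deploy["spec"]["template"]["metadata"].get("labels", {}))
    # Kubernetes also adds pod-template-hash; its value is a placeholder here.
    labels["pod-template-hash"] = "placeholder"
    return labels


def selector_matches(match_labels: dict, labels: dict) -> bool:
    """matchLabels semantics: every key/value pair must be present (empty selects all)."""
    return all(labels.get(key) == value for key, value in match_labels.items())


selector = network_policy["spec"]["podSelector"]["matchLabels"]
print(selector_matches(selector, deployment["metadata"]["labels"]))  # True: the Deployment object matches
print(selector_matches(selector, pod_labels(deployment)))  # False: its pods do not

# Fix: put the label on the pod template instead of (or as well as) the Deployment.
fixed = copy.deepcopy(deployment)
fixed["spec"]["template"]["metadata"]["labels"]["egress"] = "kube-api"
print(selector_matches(selector, pod_labels(fixed)))  # True
```

Putting the label under spec.template.metadata.labels, as in `fixed`, is what makes the pods selectable. As noted at 16:20:30, this behaviour does not differ between Kubernetes versions.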