[08:55:42] btullis: don't worry. I just wanted to inform you about the situation and maybe get feedback on how you would like to proceed
[08:56:11] elukey_: do you happen to have some time today to test the change on ml-staging?
[08:56:47] I'd probably just manually change the config and restart the kubelets so that we don't have to disable puppet on all k8s clusters
[09:02:12] jayme: sure, anytime
[09:03:35] XioNoX: o/ what did you deploy on the ml-serve clusters? (just curiosity/FYI, I noticed the logs in the SAL)
[09:05:00] elukey: that was https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/887945
[09:07:16] ah wow nice! Will try to read it later on, thanks
[09:08:05] I've created https://phabricator.wikimedia.org/T333302 out of that
[09:51:44] elukey: I'll go with disabling puppet on all k8s ...just to be sure
[09:51:59] will merge the change in a bit in case you don't object
[09:52:15] I saw the task, yes, will work on it today
[09:52:19] +1
[10:01:53] yeah...worked great
[10:01:57] kubelet[2385004]: E0328 09:57:53.748494 2385004 server.go:217] "Failed to set feature gates from initial flags-based config" err="cannot set feature gate IPv6DualStack to false, feature is locked to true"
[10:10:10] reverted
[10:10:16] jayme: would that mean having to reinitialise the cluster to go back to ipv4 only?
[10:10:57] I don't know yet
[10:19:18] <_joe_> ugh
[10:20:11] could also be that it's impossible to disable and just has no effect if the other components are not configured for ipv6. I need to check
[10:20:16] jayme: ok if I deploy a typha change in ml-staging-codfw?
[10:20:59] elukey: yeah, should be all good. All reverted and puppet re-enabled
[10:26:34] done, worked :)
[12:24:08] In preparation for merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/889069/ is there a cumin selector for ALL the k8s nodes? The idea is to disable puppet on all the impacted hosts and then re-enable it progressively (e.g. by cluster) (cc volans)
[12:24:52] checking
[12:25:13] XioNoX: you're touching profile::calico::kubernetes, so match on that one
[12:25:26] $ sudo cumin 'P:calico::kubernetes'
[12:25:27] 88 hosts will be targeted:
[12:25:56] or are the modified yaml variables also read by other puppet classes?
[12:26:37] volans: I think that will be enough, I forgot we could select on puppet profiles and not only roles, thanks!
[12:26:54] any class, resource, or their parameters too :D
[12:27:25] what can't you filter on?
[12:27:38] is there a ChatGPT selector?
[12:27:48] :)
[12:27:54] hiera values, structured facts (but it depends)
[12:46:31] XioNoX: fwiw I usually use P:kubernetes::node
[12:47:14] thanks!
[12:47:28] but as your change affects the calico profile, that seems pretty reasonable to use as a selector :)
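[Editor's sketch of the workflow discussed above, assuming WMF's standard disable-puppet/enable-puppet wrappers on the hosts; the reason string, batch size and per-DC glob are made up for illustration:]

    # disable puppet on every host carrying the calico profile before merging
    sudo cumin 'P:calico::kubernetes' 'disable-puppet "rollout of 889069 - xionox"'
    # after merging, re-enable and run puppet progressively, e.g. one DC at a
    # time in small batches (the reason should match the one used to disable)
    sudo cumin -b 2 'P:calico::kubernetes and *.codfw.wmnet' 'enable-puppet "rollout of 889069 - xionox"'
    sudo cumin -b 2 'P:calico::kubernetes and *.codfw.wmnet' 'run-puppet-agent'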
[12:50:07] akosiaris: do you recall if we at some point actively decided against ipv6 for k8s services? We do allocate prefixes in netbox but we never assigned them in k8s AIUI
[12:51:41] jayme: strictly against, no. But we did evaluate the situation of the feature and it wasn't ready. We only relied on the alpha/beta/GA status of the feature though
[12:51:51] I guess in 1.23 we can re-evaluate
[12:53:24] The only place we have ever enabled IPv6 is in the kubelet btw
[12:53:57] control plane and kube-proxy were never enabled, which explains what you see
[12:55:31] yeah, that was my recollection as well. With 1.23 I've almost removed the frankenstein mode, though (as long as profile::kubernetes::ipv6dualstack == true)
[12:55:56] apart from service-cluster-cidr apparently
[13:19:12] So my understanding now is that it's not possible to disable the IPv6DualStack feature gate since 1.23 (although the docs say it was removed in 1.24 - https://v1-25.docs.kubernetes.io/docs/reference/command-line-tools-reference/feature-gates-removed/). This only means that there is no difference in code path with the feature gate enabled/disabled; the actual IPv6 features (assigning IPv6 IPs to Pods/Services etc.) will only be enabled for
[13:19:13] clusters configured for dual stack (https://v1-23.docs.kubernetes.io/docs/concepts/services-networking/dual-stack/#configure-ipv4-ipv6-dual-stack)
[13:20:46] elukey || akosiaris: if one of you has a couple of spare minutes I'd love a second pair of eyes / another brain on this :)
[13:41:03] will check in a bit!
[13:43:47] <3
[13:54:20] jayme: yeah, from a quick check what you wrote seems sound
[13:54:35] let's hope it is :-p
[13:54:57] is it a problem for us? IIUC no, right?
[13:55:13] I mean we have it enabled and we'll keep it, so the feature flag in hiera can be removed
[13:56:03] no, there is no actual problem I know of
[13:56:47] the hiera key must stay, though, as it controls the ipv4/ipv6 dual-stack settings - not only the feature gate
[13:56:48] super
[13:56:49] looking now
[14:01:40] same understanding
[14:01:47] cool, cool.
[14:02:02] thanks for looking
[14:02:19] IPv6 is no longer guarded behind a feature gate, but it requires special configuration in all 4 components (apiserver, controller-manager, kubelet, kube-proxy) to enable
[14:02:26] of those, we only configure the kubelet currently
[14:02:51] btw, we need to fix the DNS for the new ip address ranges
[14:03:05] I don't think I've fixed that one, I forgot about it
[14:04:02] akosiaris: it is not correct that we only configure that for the kubelet. With 1.23 I configured IPv6 almost like it's supposed to be done (apart from the service ip range)
[14:04:18] I prepared an updated version of the patch at https://gerrit.wikimedia.org/r/c/operations/puppet/+/903560 - PCC running
[14:04:29] what did I miss? I went through a ps | grep on all components
[14:04:42] you probably missed config files
[14:04:47] sigh
[14:04:55] ofc, why wouldn't I?
[14:05:01] it's a messy mix of both now
[14:05:08] the worst of both worlds
[14:05:22] because obviously some things are arguments only and some are config only
[14:06:05] and as they plan to remove *all* arguments, I went the way of putting everything I can in config files
[14:08:24] (forgot a file in the CR, re-running PCC)
[14:09:52] ...if gerrit wasn't down
[15:01:34] FYI, all kubernetes hardware nodes have been switched to the performance CPU governor
[15:02:01] nice, thanks
[16:29:10] oops, I totally missed the -sig meeting
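[Editor's sketch of the dual-stack wiring discussed from 14:02 onwards: since 1.23 the locked feature gate changes nothing by itself; the behaviour comes from giving all four components dual-family address ranges, per the dual-stack doc linked at 13:19. The flags are upstream's documented ones, but every CIDR and address below is invented for illustration and is not WMF's real allocation:]

    # apiserver: a service ClusterIP range per address family
    kube-apiserver --service-cluster-ip-range=10.96.0.0/16,fd00:10:96::/112 ...
    # controller-manager: dual-family pod and service ranges, plus per-family node CIDR sizes
    kube-controller-manager --cluster-cidr=10.244.0.0/16,fd00:10:244::/56 \
        --service-cluster-ip-range=10.96.0.0/16,fd00:10:96::/112 \
        --node-cidr-mask-size-ipv4=24 --node-cidr-mask-size-ipv6=64 ...
    # kube-proxy: the same dual-family pod range
    kube-proxy --cluster-cidr=10.244.0.0/16,fd00:10:244::/56 ...
    # kubelet: one node address per family (or the equivalent keys in its config file)
    kubelet --node-ip=10.192.0.10,fd00:10:192::10 ...

[Going the other way on 1.23 fails exactly as in the 10:01 log line: passing --feature-gates=IPv6DualStack=false makes the kubelet refuse to start, because the gate is locked to true.]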
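[On the 15:01 governor switch, a quick way to apply and verify it on a single node by hand, assuming the linux-cpupower tooling is installed; the fleet-wide change was presumably rolled out via puppet rather than manually:]

    # switch all cores to the performance governor
    sudo cpupower frequency-set -g performance
    # verify via sysfs
    cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor    # -> performance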