[10:57:22] ok, so I tried to write "yesterday" up in https://phabricator.wikimedia.org/T329826
[10:57:54] this is obviously a hard blocker for cluster upgrades
[11:02:45] thanks!
[11:06:25] thanks for noticing! This could have easily become pretty unpleasant (more unpleasant than it already is :))
[11:40:34] good writeup thanks jayme
[11:40:52] elukey: cdanis: do you have an istio config CR ready for aux?
[11:40:55] cdanis: thanks
[11:41:22] I don't -- we had only configured ingress there, deliberately, decided we didn't need full Istio
[11:41:38] go back to sleep chris
[11:41:43] Ok. I'll prepare one then
[12:14:59] I've deployed the admin_ng stuff + istio to aux but I do see errors calling the webhook
[12:15:07] Feb 16 12:14:59 aux-k8s-ctrl1002 kube-apiserver[541371]: W0216 12:14:59.557715 541371 dispatcher.go:142] Failed calling webhook, failing open rev.validation.istio.io: failed calling webhook "rev.validation.istio.io": failed to call webhook: Post "https://istiod.istio-system.svc:443/validate?timeout=10s": context deadline exceeded
[12:15:38] those appeared in wikikube staging as well but then went away..which they don't now
[13:09:03] elukey: lmk if/when you have a minute (or 99) for a sync :)
[13:36:59] jayme: o/
[13:38:26] elukey: https://meet.google.com/fyx-wjfo-hyt ?
[15:16:12] elukey: this might be the hack until proper solution https://gerrit.wikimedia.org/r/c/operations/puppet/+/889808/
[15:17:50] that way we can continue using the existing cergen cert to sign/validate service account tokens (so no additional manual steps) and it should not disrupt clusters already on 1.23 as the pki public key will still be available for validation of the already existing tokens
[15:18:13] I think :-p
[15:25:22] need to run an errand, I'll be back in a bit
[16:14:00] jayme: sorry was in meetings, lemme check
[16:16:59] jayme: i have one question - the tokens already emitted and signed by the PKI masters on 1.23 clusters are going to be still present after this change right? If so, would we still see issues with them (like the ones with calico pods) or not?
[16:18:24] (going afk for a bit, back later)
[16:37:06] elukey: yes, we would still see issues as long as they've not been refreshed (which happens within an hour AIUI)
[16:37:29] or we've deleted all of them manually :)
[16:43:12] unfortunately this does not seem to be true. Tokens don't change
[16:47:39] maybe that's only true for default tokens... https://v1-23.docs.kubernetes.io/docs/reference/access-authn-authz/service-accounts-admin/#bound-service-account-token-volume says it's 1h by default
[16:53:43] hm...no. This means something else is odd as well
[17:02:14] no, it's me not being smart enough. The "legacy" tokens (those in type=kubernetes.io/service-account-token secret objects) don't expire, i.e. we'll run into issues with pods using those on existing clusters. The new "bound service account tokens" have a lifetime of 1h. So they will be auto-fixed
[17:07:12] but AIUI (since 1.22) the legacy tokens are no longer used (by default) so we //should// be safe after 1h
[17:08:00] (see the volumes list of pods containing "name: kube-api-access-*" volumes instead of mounting the secret)
[17:22:35] jayme: okok seems fine to test, we'll see if we find issues
[17:23:28] elukey: you're around tomorrow for a merge party?
[17:25:16] jayme: +1ed, the change looks good. Yep I'll be around :)
[17:25:28] if you are and up early it would be nice if you could take a look at the istio webhook error
[17:25:29] it seems the best compromise for the moment
[17:25:41] that just disappeared...
[17:25:44] damn
[17:25:47] is there a task and/or something that I can check?
[17:26:00] no, have not created anything yet
[17:26:01] but I am pretty sure that it will come up also on ml-staging
[17:26:36] just the "dispatcher.go:142] Failed calling webhook, failing open rev.validation.istio.io: failed calling webhook "rev.validation.istio.io": failed to call webhook: Post "https://istiod.istio-system.svc:443/validate?timeout=10s": context deadline exceeded" logs
[17:27:10] and this on the ingress pods right?
[17:27:31] istiod itself AIUI
[17:28:21] the errors did only show up on 1002...the successful request was now logged on 1001 ...so this might be another problem regarding multi control-plane
[17:28:51] weird, but I can check tomorrow for sure, pretty sure that istio will give us some "fun" in various clusters (and for me istio-cni will be even better)
[17:36:42] cool, thanks! tty tomorrow o/
[17:39:35] o/
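
Note on the legacy vs. bound token distinction discussed at 17:02-17:08: a rough sketch of how one might check which kind a given pod or token is. This was not part of the conversation. The function names are made up for illustration, the token payload is decoded without verifying the signature, and treating a missing "exp" claim as the marker of a legacy token is an assumption based on the "don't expire" observation above.

```python
import base64
import json


def jwt_claims(token: str) -> dict:
    """Decode the payload segment of a JWT WITHOUT verifying its signature."""
    payload = token.split(".")[1]
    # base64url segments in JWTs drop their padding; restore it before decoding.
    payload += "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(payload))


def is_legacy_token(token: str) -> bool:
    """Assumption: legacy secret-based tokens carry no 'exp' claim,
    while bound service account tokens do (1h lifetime by default)."""
    return "exp" not in jwt_claims(token)


def uses_bound_token_volume(pod: dict) -> bool:
    """True if the pod spec lists a projected 'kube-api-access-*' volume,
    rather than mounting the service account token secret directly."""
    volumes = pod.get("spec", {}).get("volumes", [])
    return any(
        vol.get("name", "").startswith("kube-api-access-") and "projected" in vol
        for vol in volumes
    )
```

The pod argument is expected to be the parsed JSON of a single pod (for example the output of "kubectl get pod NAME -o json"), and the token would be the contents of the token file mounted into the container.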