[08:56:31] it's fine without, I'd say
[10:54:23] super
[10:54:29] I am going to roll out to staging then
[10:54:47] jayme: for cert-manager I filed a change, not sure if it is the best approach or not
[10:55:15] in the chart I found the digest as an alternative, not a specific placeholder for the image's version (maybe I missed it somewhere)
[10:59:45] elukey: I think there is image.tag
[11:00:30] and webhook.image.tag and cainjector.image.tag :)
[11:00:42] okok, missed those - what do you prefer?
[11:01:29] I would prefer overriding it for staging only for now, leaving it over the weekend and then bumping the appVersion and chart version...
[11:01:34] probably the easiest
[11:03:00] +1 ok
[11:03:03] you can probably use some yaml anchor, like the ones helmfile.d/admin_ng/cert-manager/cert-manager-values.yaml contains
[11:03:20] * elukey nods
[11:03:32] last one - ok to proceed with the istio upgrade in wikikube staging?
[11:03:59] yeah, go ahead
[11:04:22] would you be so kind to do staging-codfw as well please?
[11:04:32] yep yep, already on my list
[11:05:32] cool, thanks
[11:06:42] lovely:
[11:06:42] Failed to pull image "docker-registry.discovery.wmnet/istio/proxyv2:1.15.7-2": rpc error: code = Unknown desc = Error response from daemon: Head "https://docker-registry.discovery.wmnet/v2/istio/proxyv2/manifests/1.15.7-2": no basic auth credentials
[11:07:11] on ml-staging-codfw it worked nicely
[11:35:46] the docker pull commands are the same in both clusters
[11:36:07] (from the kubelet)
[11:36:08] Pulling image "docker-registry.discovery.wmnet/istio/proxyv2:1.15.7-2"
[11:38:20] the change is https://gerrit.wikimedia.org/r/977214
[11:38:36] I had to use {{ registry }} since we have a special "istio" namespace in the registry
[11:38:49] so docker-pkg on my laptop complained without it
[11:43:09] ah wait, it fails on the control plane
[11:43:21] istio-ingressgateway-2rwc5 0/1 ImagePullBackOff 0 38m 10.64.75.131 kubestagemaster1001.eqiad.wmnet
[11:43:24] istio-ingressgateway-ckj9c 0/1 ImagePullBackOff 0 35m 10.64.75.195 kubestagemaster1002.eqiad.wmnet
[11:46:03] and they have the NoSchedule taint
[11:48:31] attempting another istioctl manifest apply
[11:48:42] that of course now hangs on the ingress
[11:48:56] I kinda hoped it would have cleared the current situation
[11:49:17] as an FYI I did
[11:49:33] root@deploy2002:/srv/deployment-charts/custom_deploy.d/istio/main# istioctl-1.15.7 manifest apply -f config.yaml
[11:49:48] (after kube_admin staging etc..)
[11:50:44] the desired pod count in the daemonset is 4
[11:52:43] jayme: I may need your help after lunch for --^, maybe there is a gotcha that I don't know about
[11:54:55] going afk for lunch, ttyl!
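
A minimal sketch of the staging-only override with a yaml anchor that jayme suggests above (11:01-11:03). The image.tag, webhook.image.tag and cainjector.image.tag keys are the ones named in the log; the anchor name and the tag value are invented for illustration, not the real contents of cert-manager-values.yaml:

  # Illustrative only: pin all three cert-manager images to one tag via a
  # yaml anchor, so only a single line needs bumping later. The tag value
  # below is an assumption.
  _anchors:
    cert_manager_tag: &certManagerTag "v1.12.0"
  image:
    tag: *certManagerTag
  webhook:
    image:
      tag: *certManagerTag
  cainjector:
    image:
      tag: *certManagerTag

Once the chart's appVersion and version are bumped as planned, an override like this can simply be dropped and the chart default applies again.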
[14:49:31] elukey: I'm looking at modules/ingress/istio_1.0.3.tpl where there is this special casing for staging and ml-staging. For staging it's no longer required because all certs do have proper SANs now - looks like there is still some work to do for ml to catch up... do you know what/where exactly?
[14:50:15] (sorry, did not see the ping earlier for whatever reason)
[15:06:33] jayme: I'd need to check but I'll do it today/tomorrow, is it blocking you now?
[15:06:44] it may very well not be needed
[15:07:10] elukey: nono, the next days are fine. I can also try to figure it out myself if you don't have time
[15:07:42] I'll tell you tomorrow whether it is needed or not, just written down :)
[15:07:54] any idea about the ingress on control plane nodes thing?
[15:10:09] no... but I bet it's something related to a registry change
[15:10:23] Failed to pull image "docker-registry.discovery.wmnet/istio/proxyv2:1.15.7-2": rpc error: code = Unknown desc = Error response from daemon: Head "https://docker-registry.discovery.wmnet/v2/istio/proxyv2/manifests/1.15.7-2": no basic auth credentials
[15:11:15] yeah I found it as well, but it is weird that it happens only in staging, and also -o wide reports control plane nodes
[15:11:19] the desired replica count is 4
[15:11:25] (for the daemonset)
[15:11:28] I'd have expected 2
[15:12:20] nono, 4 is fine I suppose
[15:12:31] I think they tolerate the master taint
[15:13:34] the control planes are unable to pull the old image as well... nice that you uncovered that now :)
[15:14:27] ah, the workers are not able to pull either... so at least it's consistent
[15:27:18] oof
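
For context on the "I think they tolerate the master taint" exchange above: a daemonset is scheduled onto NoSchedule-tainted control-plane nodes only if its pod template tolerates the taint, which would explain a desired count of 4 (two workers plus the two kubestagemaster nodes). Whether the istio ingressgateway daemonset carries exactly this toleration is an assumption, and the taint key differs across Kubernetes versions (node-role.kubernetes.io/master on older clusters, node-role.kubernetes.io/control-plane on newer ones); a sketch:

  # Hypothetical toleration on the ingressgateway daemonset pod template.
  # operator: Exists matches the taint regardless of its value, so the
  # NoSchedule effect on the masters no longer blocks scheduling there.
  tolerations:
    - key: node-role.kubernetes.io/master
      operator: Exists
      effect: NoSchedule

Inspecting .spec.template.spec.tolerations on the daemonset (kubectl describe, or -o jsonpath) would confirm which tolerations are actually set.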