[13:52:36] elukey: cert-manager in staging looks good I'd say. Guess we can continue rolling out to prod
[13:55:54] jayme: \o/
[15:13:47] hi all. I'm working on deploying a service in the dse-k8s cluster that will need ingress but not the service mesh. However, it seems that we mix ingress and mesh logic in the templates: https://gerrit.wikimedia.org/r/plugins/gitiles/operations/deployment-charts/+/master/modules/ingress/istio_1.0.3.tpl#19
[15:14:48] as a result, the default destination for my VirtualService is $app-tls-service.$ns.svc.cluster.local, when that $app-tls service is only defined when mesh is enabled
[15:16:39] I can work on a patch that would use the default service if mesh is disabled, but I just wanted to get some external eyes on the logic there. Am I missing something obvious?
[15:19:40] brouberol: sorry my bad, I totally forgot one bit in #analytics
[15:19:54] so we use "mesh" also for the TLS terminator in front of the application
[15:20:31] so envoy will be involved also in the traffic between istio gateway pods and the "application" pods
[15:21:10] this is why ingress uses that value, and it makes sense..
[15:21:36] if you don't add any "discovery" settings, the envoy proxy should only act as TLS terminator (in "mesh" I mean)
[15:22:10] so in theory you could use ingress + mesh now, without "discovery", and the spark history server use case should be good
[15:22:16] (sorry just thought about it now)
[15:22:17] I started to implement a possible "fix" for ^ this and realized that without the envoy sidecar, there wouldn't be any TLS termination, indeed
[15:22:30] yes yes my bad, totally forgot about it
[15:22:45] no worries at all, the more I dig, the more I understand
[15:36:50] and just so I know, where is this $app-tls-setup service defined? I was looking for a Service with that name somewhere in our templates/macros, to no avail
[15:37:59] so I'm guessing it's not a Service per se, but something created by istio which resolves like a service?
[15:42:51] brouberol: no, it's actually the service created by the mesh module when tls is enabled (modules/mesh/service_1.1.0.tpl)
[15:43:12] elukey: I might have found some cergen certs still in use in ml
[15:43:19] uffff
[15:43:27] certificates/istio-egressgateway.istio-system.svc.cluster.local/istio-egressgateway.istio-system.svc.cluster.local.crt.pem
[15:43:36] certificates/kserve-webhook-server-service.kserve.svc.cluster.local/kserve-webhook-server-service.kserve.svc.cluster.local.crt.pem
[15:43:39] ah lol the egress gateway!
[15:43:42] I forgot about that
[15:44:04] the kserve webhook is something that we can easily fix
[15:44:15] it's a different story ofc. but I'm currently going over private puppet to remove cergen stuff
[15:44:17] jayme: thanks, I don't have these in my chart, which I think comes from how I used ./create_new_service.sh initially
[15:44:51] brouberol: probably...if you did not select mesh initially I would suggest to create a new one and merge your already done changes...
[15:44:54] easier that way
[15:45:49] yep, that's what I've just done, and I now have the appropriate templates. Thanks!
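The destination choice brouberol ran into (and the fallback patch he first considered) can be sketched with Go's text/template, the engine Helm templates are built on. This is a simplified, hypothetical model, not the actual contents of modules/ingress/istio_1.0.3.tpl: the value names (`.Mesh.Enabled`, `.App`, `.Namespace`) and the plain Service name used in the fallback branch are assumptions for illustration. As the thread points out, the fallback would send gateway traffic straight to the app with no envoy sidecar to terminate TLS, which is why ingress + mesh without any "discovery" settings was suggested instead.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// meshValues is a hypothetical stand-in for .Values.mesh in a chart.
type meshValues struct {
	Enabled bool
}

// chartValues is a hypothetical, simplified subset of chart values.
type chartValues struct {
	App       string
	Namespace string
	Mesh      meshValues
}

// destinationTmpl picks the VirtualService destination host.
// With mesh enabled, traffic targets the "<app>-tls-service" Service so the
// envoy sidecar terminates TLS. The else branch is the hypothetical fallback
// discussed in the chat: a plain Service (name assumed to be the app name),
// which means no TLS termination in front of the application.
const destinationTmpl = `{{- if .Mesh.Enabled -}}
{{ .App }}-tls-service.{{ .Namespace }}.svc.cluster.local
{{- else -}}
{{ .App }}.{{ .Namespace }}.svc.cluster.local
{{- end -}}`

var tmpl = template.Must(template.New("destination").Parse(destinationTmpl))

// destinationHost renders the destination host for the given values.
func destinationHost(v chartValues) (string, error) {
	var sb strings.Builder
	if err := tmpl.Execute(&sb, v); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	for _, enabled := range []bool{true, false} {
		v := chartValues{App: "myapp", Namespace: "myns", Mesh: meshValues{Enabled: enabled}}
		host, err := destinationHost(v)
		if err != nil {
			panic(err)
		}
		// Prints e.g. "mesh.enabled=true -> myapp-tls-service.myns.svc.cluster.local"
		fmt.Printf("mesh.enabled=%v -> %s\n", enabled, host)
	}
}
```

The `{{-` / `-}}` trim markers strip the newlines around each branch so the rendered host has no stray whitespace, the same whitespace-control syntax Helm charts rely on.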
[15:48:15] elukey: no need to take action immediately - I just wanted to let you know
[15:49:05] jayme: yeah sure first you nerdsnipe me then you tell me "no need to worry" :D
[15:49:19] eheh, sorry
[15:49:28] ahahha no I am joking :D
[15:49:46] I meant I'm obviously not going to remove ml certs from private puppet
[15:49:47] I'll try to fix it before I am off so you are not blocked
[15:52:19] removing the egress gw one now
[15:52:46] I do also see usages of the DNS:default-ml-staging-certificate.wmnet, DNS:ml-staging.svc.codfw.wmnet, DNS:ml-staging.svc.eqiad.wmnet
[15:52:56] configmap/ores-legacy-main-tls-proxy-certs
[15:53:18] weird, I removed those certs
[15:53:21] configmap/recommendation-api-ng-main-tls-proxy-certs
[15:53:22] so for kserve I see {{- if .Values.kserve.webhook.cert_manager }}
[15:53:33] so we already use cert-manager
[15:53:43] I'll double check but I believe those are all leftovers
[15:55:16] I was just looking at staging tbh
[15:57:15] it would probably be best to drop everything (certs, cergen config, deployment-server config) in one commit from private puppet, run puppet on deploy* and then run a helmfile diff for everything :/
[15:57:31] just to make sure nothing was missed
[16:00:06] definitely yes
[16:00:17] 🤮
[16:00:24] so kserve should be gone in a min, checking also the ml-staging ones
[16:01:25] wow so in prod I don't see ores-legacy-main-tls-proxy-certs
[16:01:29] but I do see it on ml-staging
[16:01:36] hmmm...
[16:02:26] it's also still referenced in the deployment
[16:07:36] ok kserve is gone as well
[16:08:09] ah, now I get it, you actually dropped them from private puppet ;)
[16:08:28] yep yep
[16:08:49] cool, thanks :)
[16:09:12] hi Eric o/
[16:09:26] Luca just finished his work in puppet private
[16:09:31] you should be clash-free now
[16:09:38] oh, yeah, I just saw the same :)
[16:10:00] urandom: o/ I unstaged your files and proceeded with my commit :)
[16:11:39] urandom: one minute, I need to fix another file
[16:11:41] sorryyyy
[16:11:49] no it's ok, I haven't staged anything
[16:11:57] I'll wait :)
[16:12:20] done!
[16:12:26] optimistic concurrency control!
[16:12:40] I'm optimistic this time it will work
[16:14:58] (Narrator: and it did.)
[16:25:07] elukey: I'll circle back to this tomorrow (and chat with you before ;))
[16:25:14] ttyl o/