[06:03:35] o/
[07:45:16] I found this nice diagram: https://github.com/kubeflow/kfserving/blob/master/docs/diagrams/sklearn-iris-tracing.png
[07:45:40] that clears up some stuff (also it seems to be a super nice inspection tool from knative)
[07:46:05] there are istio virtual services created for multiple things
[07:46:20] from kfserving to knative to istio
[07:46:33] and surely all of them need TLS certificates
[07:46:44] (all injected into istio)
[07:50:02] the kfserving webhooks should take care of provisioning all of them
[07:51:11] but if the above is right, and TLS is optional for istio sidecars, I am not getting why cert-manager is needed
[07:57:21] the only explanation that I can give is that all the istio virtual services are bound to the same cluster-local-gateway, which needs TLS certs for all the different services to terminate TLS traffic
[07:57:40] but even in this case, if we decide to allow unencrypted internal traffic, cert-manager shouldn't be needed
[07:57:43] * elukey bbiab
[10:23:49] (I am following up with Theo on slack, the above is basically not correct, will try to get a more precise view)
[10:39:47] * elukey lunch
[10:51:25] Machine-Learning-Team, ORES, Continuous-Integration-Config: ORES wheels deployment repo Jenkins job should gate-and-submit - https://phabricator.wikimedia.org/T211041 (hashar) Open→Declined My understanding is that research/ores/wheels is abandoned or scheduled to be replaced entirely. So I d...
[12:33:48] Machine-Learning-Team, Continuous-Integration-Config: CI should check to see if our wheels are good - https://phabricator.wikimedia.org/T250746 (hashar) Open→Declined My understanding is that ORES is being phased out, so it does not seem we need to invest in adding CI for `research/ores/wheels`
[13:06:09] ok so what I was mentioning before is that the kfserving webhook certs are not related to istio
[13:07:09] but it seems that they are related to the kfserving-controller/manager pod
[13:07:36] so our understanding of having a self-managed TLS cert for ingress is fine
[15:06:17] I think I have it now
[15:06:28] it took a while but now everything that I read makes sense
[15:06:53] https://www.velotio.com/engineering-blog/managing-tls-certificate-for-kubernetes-admission-webhook was nice as an introduction (even the first part is good enough for our purposes)
[15:07:08] so there are two separate sets of TLS certificates that kfserving may need
[15:07:36] 1) TLS endpoints to deploy as virtual services on Istio via sidecar (that we don't use for the moment)
[15:09:06] 2) a TLS CA + certificates for the kfserving webhook service, which is a simple HTTP endpoint (backed by a Go binary) that is contacted by the kubernetes api as part of the admission controller extension workflow (basically extra checks made by kfserving for the various services deployed)
[15:10:40] the main caveat is that the HTTP endpoint needs to provide HTTPS, since the kubernetes api is HTTPS only. The k8s api needs to know what TLS certificates to trust, so part of the webhook config is which CA is to be trusted when validating the webhook's HTTPS endpoint
[15:10:47] this is basically what https://github.com/kubeflow/kfserving/blob/master/hack/self-signed-ca.sh does
[15:11:04] it creates a self-signed CA for the last use case
[15:11:21] and the TLS certificates that the HTTPS endpoint needs to expose
[15:11:55] the suggested way from upstream is to use cert-manager, but since we don't use the istio sidecars it is too much in my opinion
[15:12:57] wc -l cert-manager.yaml
[15:12:57] 7221 cert-manager.yaml
[15:13:11] so my understanding is that we can skip the above
[15:13:43] and either use a self-signed CA (probably not very elegant) or simply integrate cfssl without cert-manager
[15:18:39] or we could generate the certs via certmanager and add the puppet CA crt
[15:18:49] way easier
[15:18:57] am I crazy or does what I wrote above make sense?
[15:46:56] o/
[15:48:02] elukey: i think this makes sense, my guess is that generating the certs via certmanager and adding the puppet CA crt would be fairly straightforward
[15:50:17] going to read that blog post you linked in a bit here
[15:53:04] yes sorry the certs can be generated via "cergen" our internal tool, not certmanager (so we can ditch it entirely). I wrote the wrong name :D
[15:53:20] ohhhh i see, yeah even better!
[15:56:29] I was having a chat with John (SRE) and we could even use cfssl with some automation for auto-renew
[16:00:58] like a script on kubernetes masters
[16:20:57] elukey: that sklearn-iris-tracing diagram you shared is pretty nice. Looks like we can configure a similar visualization tool like zipkin or jaeger and trace the calls between all the different components
[16:21:49] I thought that viz was related to knative instrumentation
[16:22:04] but yes any tracing like that would be really helpful
[16:22:21] there are a lot of steps before hitting the "real" api
[16:22:27] istio knative etc..
[16:22:45] yeah I think it hooks in at the knative level: https://knative.dev/docs/serving/accessing-traces/
[16:23:27] https://github.com/kubeflow/kfserving/blob/master/docs/KFSERVING_DEBUG_GUIDE.md#investigate-performance-issues
[16:23:31] ah yes yes
[16:24:07] kubeflow/minikf ships with zipkin configured, i'm trying to spin it up on one of the sandbox clusters but the weird auth setup i have is giving me a 403 forbidden code
[16:24:51] keep in mind that to use all those tools in prod we'll need to add docker images etc..
[16:24:59] yeah....
[16:25:07] so if we decide to use one it is better for mental sanity :D
[16:25:38] lol good point
[16:28:50] btw the revscoring inference service runs fine on the wmf buster base image
[16:29:06] \o/
[16:29:26] at some point I can introduce you to the magic world of docker-pkg
[16:29:47] yes yes that would be greatly appreciated
[16:33:15] let's set up some time next week whenever you want
[16:51:31] Lift-Wing, Machine-Learning-Team, Patch-For-Review: Install certmanager on ml-serve cluster (if needed) - https://phabricator.wikimedia.org/T280661 (elukey) After some readings and help from the Kfserving community I think that cert-manager is not needed for our use case. I am going to add my underst...
[16:51:36] tried to summarize in --^
[17:01:35] going afk.. have a good weekend folks!
[17:02:06] see ya elukey, have a good one!
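
Note on the webhook trust setup discussed at 15:09-15:11: the point that the k8s api must be told which CA to trust can be illustrated with a rough Python sketch. It only builds the webhook configuration as a plain dict and embeds the CA certificate, base64-encoded, in the `caBundle` field. The function name `webhook_config`, the webhook and metadata names, and the `serving.kubeflow.org` / `inferenceservices` rule are illustrative assumptions, not values taken from the kfserving manifests or from `self-signed-ca.sh`.

```python
import base64
import json


def webhook_config(ca_pem: bytes, service_name: str, namespace: str, path: str) -> dict:
    """Build a ValidatingWebhookConfiguration dict that trusts the given CA.

    All names below are hypothetical placeholders for illustration only.
    """
    return {
        "apiVersion": "admissionregistration.k8s.io/v1",
        "kind": "ValidatingWebhookConfiguration",
        "metadata": {"name": "example-validating-webhook"},  # hypothetical name
        "webhooks": [
            {
                "name": "validator.example.org",  # hypothetical name
                "admissionReviewVersions": ["v1"],
                "sideEffects": "None",
                "clientConfig": {
                    # The API server reaches the webhook over HTTPS via this service.
                    "service": {
                        "name": service_name,
                        "namespace": namespace,
                        "path": path,
                    },
                    # PEM-encoded CA, base64-encoded: this is what tells the
                    # API server which CA to trust for the webhook endpoint.
                    "caBundle": base64.b64encode(ca_pem).decode("ascii"),
                },
                "rules": [
                    {
                        # Assumed resource; adjust to the real CRD in use.
                        "apiGroups": ["serving.kubeflow.org"],
                        "apiVersions": ["*"],
                        "operations": ["CREATE", "UPDATE"],
                        "resources": ["inferenceservices"],
                    }
                ],
            }
        ],
    }


def to_json(config: dict) -> str:
    """Serialize the configuration; JSON is also valid input for kubectl apply."""
    return json.dumps(config, indent=2)
```

Whichever tool issues the certificates (a self-signed CA, cfssl, or cergen), the only part the API server needs is the CA certificate in `caBundle`; the serving certificate and key stay with the webhook pod.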