[06:32:24] hello folks!
[06:32:40] just added enwiki-damaging to my test namespace, worked like a charm!
[06:32:57] now I have two models (goodfaith and damaging) running in parallel
[06:36:48] I am wondering what the best practice is to roll out new models though
[06:37:00] leveraging knative as much as possible
[06:37:52] https://github.com/kubeflow/kfserving/tree/master/docs/samples/v1beta1/rollout
[06:39:25] really nice --^
[10:10:24] * elukey lunch!
[10:14:42] 10artificial-intelligence, 10Community-Tech: Deploy Image content filtration model for Wikimedia Commons - https://phabricator.wikimedia.org/T279416 (10Aklapper) There seems to be https://github.com/HarshineeSriram/Image-Content-Filtration , I assume that is related?
[11:45:59] got to a good stage for https://github.com/kubeflow/kfserving/pull/1780, but it seems that tests are failing for something (I hope) unrelated
[11:46:40] while waiting I am going to try a canary rollout
[11:49:15] ah no, I'd need a different model
[11:49:28] (for example, enwiki-goodfaith v1 and v2)
[11:52:04] I deployed itwiki-goodfaith, all good
[11:52:37] we could also attempt serverless, in theory knative should be able to do it without the metrics server
[11:52:59] but, as a highlight, we have four models deployed and working on ml-serve-eqiad
[11:53:07] that IIRC was the goal for the MVP
[11:53:26] there is still a lot of config to add before reaching a production-ready state
[11:53:42] but probably time to celebrate a little :D
[12:09:30] so, in theory it should be as simple as setting minReplicas: 0 in the InferenceService specs
[12:17:02] serverless worked really nicely
[12:17:14] it took 10s to spin up the pod and be ready
[12:17:30] requests queued in the knative activator until the revision/pod was up and running
[12:21:23] so autoscaling for istio/knative itself needs the metrics server, meanwhile what we care about (namely kfserving pods) is handled very well by knative
[12:31:57] side note - it takes minutes to terminate kfserving pods
[15:19:44] I created https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/715747 as an improvement of the chart that we'll use to deploy the InferenceService specs
[15:20:07] it is currently missing the service account bits, which are IIUC tightly coupled with helm2 vs helm3
[15:20:38] see https://phabricator.wikimedia.org/T251305
[16:06:41] o/
[16:06:56] elukey: that's awesome you got all those models running!
[16:07:44] also yeah kfserving pods can take a little bit to scale down
[16:12:57] so glad to hear the serverless features work for the kfserving pod, i had just assumed that was off the table due to the metrics server issue
[16:23:59] accraze: o/
[16:24:33] yes I wasn't counting on it either, but today I realized that knative serving uses the number of requests as its metric, something that it knows
[16:24:53] and that the metrics server is basically only for autoscaling istio/knative/etc. pods, which we'll probably never use
[16:25:03] (it would be nice but static scaling for those pods is ok too)
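A minimal sketch of the scale-to-zero setup discussed above, assuming a KFServing v1beta1 InferenceService; the name, namespace, image, and replica bounds are illustrative placeholders, not the actual ml-serve-eqiad configuration:

    apiVersion: serving.kubeflow.org/v1beta1
    kind: InferenceService
    metadata:
      name: enwiki-goodfaith        # hypothetical service name
      namespace: test-namespace     # hypothetical namespace
    spec:
      predictor:
        minReplicas: 0              # scale to zero when idle; the knative activator queues requests
        maxReplicas: 1
        containers:
          - name: kfserving-container
            image: example.org/editquality-model-server:latest   # placeholder image

With minReplicas: 0, knative scales the predictor pod away after an idle period and spins it back up on the next request (the ~10s cold start mentioned at [12:17:14]).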
[16:26:03] accraze: not sure if you saw https://github.com/kubeflow/kfserving/tree/master/docs/samples/v1beta1/rollout, but I think that this is the kind of payoff that investing time in istio+knative brings
[16:26:24] the canary release seems so easy, I haven't tested it yet
[16:26:29] but it looks really nice
[16:26:42] and also something totally controllable from the helm deployment point of view
[16:26:51] it is just a matter of adding a line in the yaml config
[16:27:18] i did a tutorial for canary rollout when we first spun up the sandbox and it was really really nice
[16:27:49] that would be so cool to eventually have for ml-serve
[16:28:08] I think we should already have it
[16:28:41] ohhh actually yeah i think we should
[16:28:53] adding something like "canaryTrafficPercent: 10" will create a new revision (knative) and istio will be instructed to route traffic accordingly
[16:29:07] then when it reaches 100 the new revision will be the only one to get traffic
[16:29:16] rollback is super easy since the old revision is kept
[16:31:27] we'll see :)
[16:31:42] accraze: thanks a lot for the 4 models and unblocking the editquality pipeline!
[16:32:13] no prob! im glad we're getting this all sorted out
[16:32:52] i'm definitely impressed that the internal mw endpoint worked with minimal pain
[16:33:37] we'll eventually have to do the same for the other model-servers but for now we can just focus on editquality
[16:34:11] definitely yes
[16:34:49] I observed some occasional 503s from the api-ro endpoint, which were reflected in the score (so a final HTTP 503 returned by istio)
[16:35:17] this may need some follow-up, say a basic retry mechanism, but we can work on it later on
[16:39:34] oh wow, just reading the messages from earlier, i didn't realize you were making predictions with minReplicas: 0
[16:40:21] so then you're seeing knative queue the request, kfserving spins up the inference service and returns a prediction, and then it scales down again?
[16:42:05] accraze: yes exactly
[16:42:15] that's pretty impressive :)
[16:58:05] :)
[16:58:10] going to log off for today o/
[16:58:25] later elukey
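A sketch of the canary rollout described around [16:28:53], following the kfserving v1beta1 rollout sample linked earlier; the service name and storageUri are placeholders rather than the real ml-serve configuration:

    apiVersion: serving.kubeflow.org/v1beta1
    kind: InferenceService
    metadata:
      name: enwiki-goodfaith                     # hypothetical service name
    spec:
      predictor:
        canaryTrafficPercent: 10                 # istio routes ~10% of traffic to the newest knative revision
        sklearn:
          storageUri: s3://example-bucket/enwiki-goodfaith/v2   # placeholder for the new model version

Raising canaryTrafficPercent step by step and finally setting it to 100 promotes the new revision; rolling back is just a matter of reverting the spec, since the previous knative revision is kept.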