[06:32:24] hello folks!
[06:32:40] just added enwiki-damaging to my test namespace, worked like a charm!
[06:32:57] now I have two models (goodfaith and damaging) running in parallel
[06:36:48] I am wondering what the best practice is to roll out new models though
[06:37:00] leveraging knative as much as possible
[06:37:52] https://github.com/kubeflow/kfserving/tree/master/docs/samples/v1beta1/rollout
[06:39:25] really nice --^
[10:10:24] * elukey lunch!
[10:14:42] 10artificial-intelligence, 10Community-Tech: Deploy Image content filtration model for Wikimedia Commons - https://phabricator.wikimedia.org/T279416 (10Aklapper) There seems to be https://github.com/HarshineeSriram/Image-Content-Filtration , I assume that is related?
[11:45:59] got to a good stage for https://github.com/kubeflow/kfserving/pull/1780, but it seems that tests are failing for something (I hope) unrelated
[11:46:40] while waiting I am going to try a canary rollout
[11:49:15] ah no, I'd need a different model
[11:49:28] (for example, enwiki-goodfaith v1 and v2)
[11:52:04] I deployed itwiki-goodfaith, all good
[11:52:37] we could also attempt serverless, in theory knative should be able to do it without the metrics server
[11:52:59] but, as a highlight, we have four models deployed and working on ml-serve-eqiad
[11:53:07] that IIRC was the goal for the MVP
[11:53:26] there is still a lot of config to add before reaching a production-ready state
[11:53:42] but probably time to celebrate a little :D
[12:09:30] so, in theory it should be as simple as setting minReplicas: 0 in the InferenceService specs
[12:17:02] serverless worked really nicely
[12:17:14] it took 10s to spin up the pod and be ready
[12:17:30] requests queued in the knative activator until the revision/pod was up and running
[12:21:23] so autoscaling for istio/knative itself needs the metrics server, meanwhile what we care about (namely kfserving pods) is handled very well by knative
[12:31:57] side note - it takes minutes to terminate kfserving pods
[15:19:44] I created https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/715747 as an improvement of the chart that we'll use to deploy the InferenceService specs
[15:20:07] it is currently missing the service account bits, which are IIUC tightly coupled with helm2 vs helm3
[15:20:38] see https://phabricator.wikimedia.org/T251305
[16:06:41] o/
[16:06:56] elukey: that's awesome you got all those models running!
[16:07:44] also yeah kfserving pods can take a little bit to scale down
[16:12:57] so glad to hear the serverless features work for the kfserving pod, i had just assumed that was off the table due to the metrics server issue
[16:23:59] accraze: o/
[16:24:33] yes I wasn't counting on it either, but today I realized that knative serving uses the number of requests as its metric, something that it knows
[16:24:53] and that the metrics server is basically only for autoscaling istio/knative/etc. pods, which we'll probably never use
[16:25:03] (it would be nice but static scaling for those pods is ok too)
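A minimal sketch of the scale-to-zero setup discussed above, assuming a KFServing v1beta1 InferenceService; the name, namespace, image, and replica bounds are illustrative placeholders, not the actual ml-serve-eqiad configuration:

    apiVersion: serving.kubeflow.org/v1beta1
    kind: InferenceService
    metadata:
      name: enwiki-goodfaith        # hypothetical service name
      namespace: test-namespace     # hypothetical namespace
    spec:
      predictor:
        minReplicas: 0              # scale to zero when idle; the knative activator queues requests
        maxReplicas: 1
        containers:
          - name: kfserving-container
            image: example.org/editquality-model-server:latest   # placeholder image

With minReplicas: 0, knative scales the predictor pod away after an idle period and spins it back up on the next request (the ~10s cold start mentioned at [12:17:14]).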
[16:26:03] accraze: not sure if you saw https://github.com/kubeflow/kfserving/tree/master/docs/samples/v1beta1/rollout, but I think that this is the kind of payoff that investing time in istio+knative brings
[16:26:24] the canary release seems so easy, I haven't tested it yet
[16:26:29] but it looks really nice
[16:26:42] and also something totally controllable from the helm deployment point of view
[16:26:51] it is just a matter of adding a line in the yaml config
[16:27:18] i did a tutorial for canary rollout when we first spun up the sandbox and it was really really nice
[16:27:49] that would be so cool to eventually have for ml-serve
[16:28:08] I think we should already have it
[16:28:41] ohhh actually yeah i think we should
[16:28:53] adding something like "canaryTrafficPercent: 10" will create a new revision (knative) and istio will be instructed to route traffic accordingly
[16:29:07] then when it reaches 100 the new revision will be the only one to get traffic
[16:29:16] rollback is super easy since the old revision is kept
[16:31:27] we'll see :)
[16:31:42] accraze: thanks a lot for the 4 models and unblocking the editquality pipeline!
[16:32:13] no prob! im glad we're getting this all sorted out
[16:32:52] i'm definitely impressed that the internal mw endpoint worked with minimal pain
[16:33:37] we'll eventually have to do the same for the other model-servers but for now we can just focus on editquality
[16:34:11] definitely yes
[16:34:49] I observed some occasional 503s from the api-ro endpoint, which were reflected in the score (so a final HTTP 503 returned by istio)
[16:35:17] this may need some follow-up, say a basic retry mechanism, but we can work on it later on
[16:39:34] oh wow, just reading the messages from earlier, i didn't realize you were making predictions with minReplicas: 0
[16:40:21] so then you're seeing knative queue the request, kfserving spins up the inference service and returns a prediction, and then it scales down again?
[16:42:05] accraze: yes exactly
[16:42:15] that's pretty impressive :)
[16:58:05] :)
[16:58:10] going to log off for today o/
[16:58:25] later elukey
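A sketch of the canary rollout described around [16:28:53], following the kfserving v1beta1 rollout sample linked earlier; the service name and storageUri are placeholders rather than the real ml-serve configuration:

    apiVersion: serving.kubeflow.org/v1beta1
    kind: InferenceService
    metadata:
      name: enwiki-goodfaith                     # hypothetical service name
    spec:
      predictor:
        canaryTrafficPercent: 10                 # istio routes ~10% of traffic to the newest knative revision
        sklearn:
          storageUri: s3://example-bucket/enwiki-goodfaith/v2   # placeholder for the new model version

Raising canaryTrafficPercent step by step and finally setting it to 100 promotes the new revision; rolling back is just a matter of reverting the spec, since the previous knative revision is kept.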