[07:05:34] thanks for the wikitech update Aiko!
[07:05:46] (aiko --^)
[08:12:29] Lift-Wing, artificial-intelligence, Machine-Learning-Team (Active Tasks), Patch-For-Review: Create articletopic inference services - https://phabricator.wikimedia.org/T313307 (kevinbazira) Inference services were created for all 11 articletopic models and they are all up and running in produc...
[08:14:26] Lift-Wing, artificial-intelligence, Machine-Learning-Team (Active Tasks): Migrate articletopic models - https://phabricator.wikimedia.org/T313304 (kevinbazira) a: kevinbazira
[08:15:23] Lift-Wing, artificial-intelligence, Machine-Learning-Team (Active Tasks): Migrate articletopic models - https://phabricator.wikimedia.org/T313304 (kevinbazira)
[08:17:26] Lift-Wing, artificial-intelligence, Machine-Learning-Team (Active Tasks): Migrate articletopic models - https://phabricator.wikimedia.org/T313304 (kevinbazira) The migration of articletopic models has been completed. 11/11 articletopic [[ https://phabricator.wikimedia.org/T313305 | models were uploa...
[08:47:08] aiko: o/
[08:47:37] I am reading https://github.com/triton-inference-server/server/blob/main/Dockerfile.sdk and I have some doubts (mostly related to licensing) that we'll be able to build/import the triton server in our docker registry :(
[08:47:59] elukey: morning o/
[08:53:48] elukey: no worries! I don't think triton is necessary for us, and from yesterday's experiment it seems to require the CUDA driver and to be based on the NVIDIA architecture.
[08:54:04] ahh okok
[09:03:38] (PS1) Elukey: editquality: use a Ray worker for model serving [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/819500 (https://phabricator.wikimedia.org/T313915)
[09:04:15] the ray worker code is ready --^
[09:07:18] aiko: wrk is back on deploy1002
[09:37:05] elukey: thanks for bringing wrk back :)
[09:46:14] aiko: if you have time for a quick review - https://gerrit.wikimedia.org/r/c/machinelearning/liftwing/inference-services/+/819500
[09:46:28] (after that I'll try to do some perf tests on staging)
[09:56:21] elukey: I have a question - now that we move the model serving code to the ray worker, what job is left for the tornado loop? Because we don't just move pre/post-process to the ray worker (that seems not possible), we move the whole model serving code there.
[10:03:19] (CR) AikoChou: [C: +2] "LGTM. We can do some perf tests on staging to see how it goes!" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/819500 (https://phabricator.wikimedia.org/T313915) (owner: Elukey)
[10:06:34] aiko: IIUC the tornado loop should still run pre/postprocess
[10:06:35] in theory
[10:06:43] or do you think that it runs everything?
[10:08:39] (Merged) jenkins-bot: editquality: use a Ray worker for model serving [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/819500 (https://phabricator.wikimedia.org/T313915) (owner: Elukey)
[10:15:21] elukey: my understanding is that the ray worker is another python process, separate from the tornado http server, so I think the ray worker now runs everything
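For context, the Ray-worker pattern being discussed looks roughly like this. This is a minimal sketch modeled on the KServe parallel-inference example linked below; the deployment name, replica count, and model class are illustrative, not the actual Lift Wing code:

```python
from typing import Dict

import kserve
from ray import serve


# Each Ray Serve replica runs as a separate Python worker process;
# the tornado server in the main process routes requests to it by name.
@serve.deployment(name="enwiki-goodfaith", num_replicas=1)
class EditQualityModel(kserve.Model):
    def __init__(self):
        super().__init__("enwiki-goodfaith")
        self.load()

    def load(self):
        # The revscoring model binary would be loaded here.
        self.ready = True

    def predict(self, request: Dict) -> Dict:
        # With this layout, predict() (and pre/postprocess, if defined)
        # executes inside the Ray worker, not in the tornado IOLoop.
        return {"predictions": []}


if __name__ == "__main__":
    # Passing a dict of deployments (rather than a list of model
    # instances) starts the server in the parallel-inference mode.
    kserve.ModelServer().start({"enwiki-goodfaith": EditQualityModel})
```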
[10:16:52] elukey: but I'm not sure.. maybe I misunderstood
[10:17:15] I am confused by
[10:17:16] https://kserve.github.io/website/master/modelserving/v1beta1/custom/custom_model/#parallel-inference
[10:17:25] "Modify the Procfile to web: python -m model_remote and then run the above pack command, it builds the serving image which launches each model as separate python worker and tornado webserver routes to the model workers by name."
[10:18:00] ok so maybe tornado just routes requests
[10:18:12] uff ok so it is not super good
[10:18:29] going afk for lunch, I'll think about it :)
[10:19:37] elukey: ttl :)
[11:38:57] aiko: an alternative path could be to re-add transformers to revscoring models
[11:39:20] with kserve 0.9 we should be able to scale them separately, but it would mean a lot more pods
[12:01:02] elukey: that is one path, but a bit complicated. We need to investigate more about that feature in kserve 0.9.
[12:01:37] I think we can do some perf tests based on what we have now (async architecture, with or without the ray worker), so we know where we are and how much we want to improve
[12:07:30] I expect the async architecture already has at least some performance improvement.
[12:21:42] aiko: the ray worker image is broken (my bad), I am now testing yesterday's one (async preprocess() only)
[12:21:46] and indeed I see some improvements
[12:24:06] ohhh \o/
[12:25:12] on staging, for enwiki-goodfaith, I see Requests/sec: 16.37
[12:25:22] Latency Distribution
[12:25:22] 50% 292.89ms
[12:25:22] 75% 308.04ms
[12:25:22] 90% 350.22ms
[12:25:22] 99% 479.61ms
[12:25:42] I am going to execute tests with the new image and the old one
[12:27:46] wow the difference seems huge
[12:28:01] that's great!! I'm also testing the outlink model now and I see improvement
[12:29:59] (PS1) Elukey: Revert "editquality: use a Ray worker for model serving" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/819538
[12:30:48] going to revert it since it doesn't work, will do more testing. I thought it worked locally but for some reason I missed a bug
[12:30:59] I'll concentrate on testing async predict only
[12:31:02] err preprocess
[12:31:32] there is also the possibility to increase the number of tornado processes, now that I think about it; it should be fixed with kserve 0.8
[12:34:53] yeah, we can try that! We can set the number of tornado processes via an env variable
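That idea could look something like the sketch below. The env variable name is made up here, and it assumes kserve 0.8's ModelServer exposes a workers argument controlling how many tornado processes are forked:

```python
import os
from typing import Dict

import kserve


class EditQualityModel(kserve.Model):
    # Minimal stub model, just to keep the example self-contained.
    def __init__(self, name: str):
        super().__init__(name)
        self.ready = True

    def predict(self, request: Dict) -> Dict:
        return {"predictions": []}


if __name__ == "__main__":
    # MODEL_SERVER_WORKERS is a hypothetical variable name; it would be
    # set in the pod spec. Each worker is a forked tornado process, so
    # memory usage grows with the worker count.
    workers = int(os.environ.get("MODEL_SERVER_WORKERS", "1"))
    model = EditQualityModel("enwiki-goodfaith")
    kserve.ModelServer(workers=workers).start([model])
```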
[12:48:51] (CR) AikoChou: [C: +2] Revert "editquality: use a Ray worker for model serving" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/819538 (owner: Elukey)
[12:50:53] thanks :)
[12:53:11] Lift-Wing, Machine-Learning-Team (Active Tasks): Move revscoring isvcs to async architecture - https://phabricator.wikimedia.org/T313915 (elukey) Some high level numbers in staging for the non-async docker image of editquality-goodfaith, enwiki: ` elukey@deploy1002:~$ wrk -c 1 -t 1 --timeout 2s -s infer...
[12:53:26] aiko: --^
[12:53:40] async preprocess() seems to work really well
[12:53:46] especially as we scale up clients
[12:54:20] (Merged) jenkins-bot: Revert "editquality: use a Ray worker for model serving" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/819538 (owner: Elukey)
[12:56:42] prod seems to have a baseline latency lower than staging, that is weird
[12:57:19] not lower but worse
[12:57:44] anyway, I think the new editquality image scales up way better, we are on a good path!
[12:57:49] taking a little break
[14:02:14] Lift-Wing, Machine-Learning-Team (Active Tasks): Use non-blocking HTTP calls to get outlinks for Outlinks topic model - https://phabricator.wikimedia.org/T311043 (achou) **Performance test results** Test article: [[ https://en.wikipedia.org/wiki/Toni_Morrison | Toni Morrison ]] ` aikochou@deploy1002:~$...
[14:11:17] /11
[14:11:21] uff
[14:34:34] Lift-Wing, Machine-Learning-Team (Active Tasks): Deploy Outlinks topic model to production - https://phabricator.wikimedia.org/T287056 (calbon)
[14:34:37] Lift-Wing, Machine-Learning-Team (Active Tasks): Deploy Outlinks topic model to production - https://phabricator.wikimedia.org/T287056 (calbon)
[14:34:48] Lift-Wing, Machine-Learning-Team (Active Tasks): Upload outlink topic model to storage - https://phabricator.wikimedia.org/T313887 (calbon) Open→Resolved
[14:35:22] Lift-Wing, Machine-Learning-Team (Active Tasks), Patch-For-Review: Create outlink topic model inference service - https://phabricator.wikimedia.org/T313888 (calbon) Open→Resolved
[14:48:23] Lift-Wing, Machine-Learning-Team (Active Tasks): Use non-blocking HTTP calls to get outlinks for Outlinks topic model - https://phabricator.wikimedia.org/T311043 (calbon)
[14:48:27] Lift-Wing, Machine-Learning-Team (Active Tasks), Patch-For-Review: Test async preprocess on kserve - https://phabricator.wikimedia.org/T309623 (calbon)
[14:48:46] Lift-Wing, Machine-Learning-Team (Active Tasks): Use non-blocking HTTP calls to get outlinks for Outlinks topic model - https://phabricator.wikimedia.org/T311043 (calbon) Open→Resolved
[14:48:49] Lift-Wing, Machine-Learning-Team (Active Tasks): Deploy Outlinks topic model to production - https://phabricator.wikimedia.org/T287056 (calbon)
[14:49:19] Lift-Wing, artificial-intelligence, Machine-Learning-Team (Active Tasks): Upload articletopic model binaries to storage - https://phabricator.wikimedia.org/T313305 (calbon) Open→Resolved
[14:49:22] Lift-Wing, artificial-intelligence, Machine-Learning-Team (Active Tasks): Migrate articletopic models - https://phabricator.wikimedia.org/T313304 (calbon)
[14:49:48] Lift-Wing, artificial-intelligence, Machine-Learning-Team (Active Tasks), Patch-For-Review: Create articletopic inference services - https://phabricator.wikimedia.org/T313307 (calbon) Open→Resolved
[14:49:50] Lift-Wing, artificial-intelligence, Machine-Learning-Team (Active Tasks): Migrate articletopic models - https://phabricator.wikimedia.org/T313304 (calbon)
[14:50:18] Lift-Wing, artificial-intelligence, Machine-Learning-Team (Active Tasks): Fix wikidatawiki articletopic predictor Init:CrashLoopBackOff issue - https://phabricator.wikimedia.org/T314278 (calbon) Open→Resolved
[14:50:22] Lift-Wing, artificial-intelligence, Machine-Learning-Team (Active Tasks), Patch-For-Review: Create articletopic inference services - https://phabricator.wikimedia.org/T313307 (calbon)
[14:50:42] Lift-Wing, artificial-intelligence, Machine-Learning-Team (Active Tasks): Migrate articletopic models - https://phabricator.wikimedia.org/T313304 (calbon) Open→Resolved
[16:28:51] going afk folks, have a good rest of the day :)
[16:30:24] bye Luca! :)
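For reference, the async preprocess() pattern behind the editquality and outlink improvements discussed above (T309623, T311043) looks roughly like this. It is a sketch, not the team's actual code: the class name, request shape, and API parameters are illustrative.

```python
from typing import Dict

import aiohttp
import kserve


class OutlinkTopicModel(kserve.Model):
    def __init__(self, name: str):
        super().__init__(name)
        self.ready = True

    async def preprocess(self, inputs: Dict) -> Dict:
        # Non-blocking HTTP call: while this coroutine awaits the
        # MediaWiki API, the tornado IOLoop can serve other requests,
        # which is why throughput improves as clients scale up.
        params = {
            "action": "query",
            "format": "json",
            "prop": "links",
            "titles": inputs["article"],
            "pllimit": "max",
        }
        async with aiohttp.ClientSession() as session:
            async with session.get("https://en.wikipedia.org/w/api.php",
                                   params=params) as resp:
                data = await resp.json()
        inputs["outlinks"] = data
        return inputs

    def predict(self, request: Dict) -> Dict:
        # The topic model prediction over the fetched outlinks
        # would happen here.
        return {"topics": []}


if __name__ == "__main__":
    kserve.ModelServer().start([OutlinkTopicModel("outlink-topic-model")])
```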