[06:01:32] (03CR) 10Kevin Bazira: [C: 03+1] ores-legacy: return features in response [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/928055 (https://phabricator.wikimedia.org/T330414) (owner: 10Ilias Sarantopoulos) [08:31:59] hello folks :) [08:32:01] so I was reading https://lernapparat.de/pytorch-rocm/ [08:32:30] and if stuff didn't change, there is a script used to move CUDA-related things to HIP when releasing the pytorch rocm wheel [08:32:45] that is unfortunate since we cannot avoid the bloated wheel [08:34:42] I can confirm that the 10G size of our bloom image is mostly due to python packages [08:34:45] 23 hours ago COPY /opt/lib/python/site-packages /opt/lib/… 10.1GB buildkit.dockerfile.v0 [08:37:31] 10Machine-Learning-Team, 10ORES, 10Documentation, 10User-AKlapper: Update docs that ORES will be replaced by Lift Wing - https://phabricator.wikimedia.org/T305963 (10elukey) I opened two pull requests to bot owners asking if they want to migrate, let's see if we get some useful feedback! [09:06:59] o/ [09:07:50] that's a huge image :) at least we can try to maintain one image for all models [09:09:12] unless we mount some stuff via volumes. What I'm thinking is that in order for a kserve pod to start it will need to download the docker image (unless it already exists) and also the model [09:10:15] yeah.. 
[09:10:36] trying to mount a volume but the isvc specs + knative ones don't make the thing easy [09:11:02] I sorted out the issue with pods, but bloom-560m-gpu still shows the weird error [09:11:46] my theory is that it may be failing due to the missing rocm packages on the image [09:12:00] debian package I mean [09:24:52] I will try to debug that later today or tomorrow [09:28:48] and knative doesn't support hostPath volumes https://github.com/knative/serving/blob/main/pkg/apis/serving/k8s_validation.go#L124 [09:28:51] sigh [09:33:04] 10Lift-Wing, 10Machine-Learning-Team: Move model binaries hosted on Lift Wing to Analytics published space - https://phabricator.wikimedia.org/T334111 (10achou) [09:43:51] (03PS1) 10Elukey: bloom: move gpu pytorch device context bootstrap to load() [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/928464 (https://phabricator.wikimedia.org/T334583) [09:43:54] 10Lift-Wing, 10Machine-Learning-Team: Move model binaries hosted on Lift Wing to Analytics published space - https://phabricator.wikimedia.org/T334111 (10achou) All the model binaries in LW production have been mirrored to `/srv/published/wmf-ml-models`: ` aikochou@stat1008:~$ ls -al /srv/published/wmf-ml-mode... [09:44:30] isaranto: I created a code review for bloom, not sure if it makes sense or not, lemme know if it is worth testing [09:48:19] wow had no idea that the is_available function was actually doing sth [09:49:04] * elukey afk for a bit [09:49:31] without the spawn we were getting a different error (can't recall at the moment) [09:54:42] I found it: without spawn we were getting this error {"error":"RuntimeError : Cannot re-initialize CUDA in forked subprocess. 
To use CUDA with multiprocessing, you must use the 'spawn' start method"} [09:58:52] hi folks, all the models on LW are available at https://analytics.wikimedia.org/published/wmf-ml-models/ [10:00:35] elukey: jumping in a meeting and will review the patch afterwards [10:28:48] aiko: nice! [10:34:50] nice! [10:43:17] * elukey lunch! [11:15:38] (03CR) 10Ilias Sarantopoulos: "I'm a little hesitant whether this change will have any effect, but feel free to merge so we can try stuff out (after adding the device to" [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/928464 (https://phabricator.wikimedia.org/T334583) (owner: 10Elukey) [11:20:52] 10Lift-Wing, 10Machine-Learning-Team, 10Patch-For-Review: Move Revert-risk multilingual model from staging to production - https://phabricator.wikimedia.org/T333124 (10achou) Added return object https://api.wikimedia.org/wiki/API_reference/Service/Lift_Wing/Revert_risk_score_object [11:23:04] 10Lift-Wing, 10Machine-Learning-Team: Move Revert-risk multilingual model from staging to production - https://phabricator.wikimedia.org/T333124 (10achou) [12:14:24] 10Machine-Learning-Team, 10ORES: Update ORES to use the new HookContainer/HookRunner system - https://phabricator.wikimedia.org/T338444 (10Umherirrender) [12:14:33] 10Machine-Learning-Team, 10ORES: Update ORES to use the new HookContainer/HookRunner system - https://phabricator.wikimedia.org/T338444 (10Umherirrender) [12:14:38] (03PS3) 10Umherirrender: Create HookRunner class and the hook handler interfaces [extensions/ORES] - 10https://gerrit.wikimedia.org/r/926735 (https://phabricator.wikimedia.org/T338444) [12:30:58] (03CR) 10Elukey: bloom: move gpu pytorch device context bootstrap to load() (032 comments) [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/928464 (https://phabricator.wikimedia.org/T334583) (owner: 10Elukey) [12:35:07] earthquake in athens :) [12:35:45] or somewhere else [12:43:04] ouch :( [12:43:23] 
not that big though, all good [12:43:37] ok I was about to ask, good [12:45:38] (03CR) 10Ilias Sarantopoulos: [C: 03+2] ores-legacy: return features in response [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/928055 (https://phabricator.wikimedia.org/T330414) (owner: 10Ilias Sarantopoulos) [12:48:35] (03Merged) 10jenkins-bot: ores-legacy: return features in response [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/928055 (https://phabricator.wikimedia.org/T330414) (owner: 10Ilias Sarantopoulos) [12:50:13] isaranto: found https://github.com/kserve/kserve/tree/master/docs/samples/v1beta1/torchserve/v1/bloom [12:50:18] that looks promising [12:51:03] this is what I was initially trying [12:51:11] but it uses torchserve [12:52:34] (03PS1) 10Daimona Eaytoy: Replace deprecated MWException [extensions/ORES] - 10https://gerrit.wikimedia.org/r/928534 (https://phabricator.wikimedia.org/T328220) [12:53:21] hmm I had the tutorial somewhere [12:53:56] in theory kserve has special docker images to run non-custom things like torch etc.. [12:54:22] the trick would be to import them in our docker registry [12:56:09] but even the custom predictor should work, I am looking for examples about how to use the gpu in that case [12:56:19] we are definitely not the first ones [12:58:29] in https://github.com/kserve/kserve/blob/master/docs/samples/fluid/docker/models.py they use flask, but no mp code.. [12:58:47] there is the special @app.before_first_request [12:59:32] isaranto: what if we run the device code in the preprocess or predict, initializing it the first time? 
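(Editor's note) The "Cannot re-initialize CUDA in forked subprocess" error quoted earlier in the log comes from a CUDA context not surviving a fork() into a child process. A minimal stdlib-only sketch of the workaround the team describes (using the "spawn" start method); the torch/CUDA calls are replaced by a placeholder, and the function names here are made up for illustration:

```python
import multiprocessing as mp

def run_inference(payload, queue):
    # In the real predictor this is where the CUDA/HIP work would happen;
    # a CUDA context created in the parent cannot survive a fork() into a
    # child, which is what raises "Cannot re-initialize CUDA in forked
    # subprocess". The model call is replaced by a placeholder here.
    queue.put({"generated_text": payload.upper()})

def predict(payload):
    # "spawn" starts a fresh interpreter instead of fork()ing the parent,
    # so the child builds its own device context from scratch.
    ctx = mp.get_context("spawn")
    queue = ctx.Queue()
    proc = ctx.Process(target=run_inference, args=(payload, queue))
    proc.start()
    result = queue.get()  # read before join() to avoid a pipe deadlock
    proc.join()
    return result

if __name__ == "__main__":
    print(predict("hello bloom"))  # prints {'generated_text': 'HELLO BLOOM'}
```

With fork the child inherits the parent's half-initialized CUDA state; with spawn it re-imports the module and initializes cleanly, which is why PyTorch insists on it.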
[12:59:51] the first request pays the price, but the other ones should go smoother [12:59:55] and we skip init/load [13:01:21] then it would run every time [13:01:39] unless you mean having like a variable that will allow initialization only once [13:03:00] the latter yes [13:04:04] doesn't hurt to try [13:04:56] I am not fond of it either, I know [13:04:57] I found out that I had some problem with torchserve and cuda when trying out that example [13:05:17] yy we'll have to try out a lot of dirty stuff to debug [13:05:56] we need to set up our environment to allow for faster experimentation instead of going through CI/CD [13:07:29] (03PS2) 10Elukey: bloom: move gpu pytorch device context bootstrap to load() [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/928464 (https://phabricator.wikimedia.org/T334583) [13:08:21] (03CR) 10Elukey: bloom: move gpu pytorch device context bootstrap to load() (031 comment) [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/928464 (https://phabricator.wikimedia.org/T334583) (owner: 10Elukey) [13:09:22] isaranto: definitely, not sure how yet [13:09:29] updated the patch with the new horror [13:11:44] (03CR) 10Ilias Sarantopoulos: "Let's try!" [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/928464 (https://phabricator.wikimedia.org/T334583) (owner: 10Elukey) [13:12:16] I also need to rename everything related to bloom to something more generic e.g. 
transformer llm [13:12:38] i'm referring to deployment pipelines and python classes [13:20:03] (03PS3) 10Elukey: bloom: move gpu pytorch device context bootstrap to load() [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/928464 (https://phabricator.wikimedia.org/T334583) [13:20:17] (03CR) 10Elukey: [C: 03+2] bloom: move gpu pytorch device context bootstrap to load() [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/928464 (https://phabricator.wikimedia.org/T334583) (owner: 10Elukey) [13:20:34] (03CR) 10Elukey: bloom: move gpu pytorch device context bootstrap to load() [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/928464 (https://phabricator.wikimedia.org/T334583) (owner: 10Elukey) [13:20:58] not sure if I am in time, I wanted to change the commit msg [13:51:37] elukey: I am currently working on mediawiki extension again, probably tomorrow as well. I think next week I can focus a bit on llms along with ores-legacy [13:52:24] yesyes sure! 
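(Editor's note) The one-time lazy initialization floated above (a guard variable so the device bootstrap runs only on the first predict() call) could look roughly like this; the class and method names are hypothetical, and the real bootstrap would involve torch device calls rather than a string:

```python
class LLMPredictor:
    """Kserve-style predictor sketch: device setup happens lazily on the
    first predict() call, guarded by a sentinel attribute."""

    def __init__(self):
        self.device = None  # sentinel: nothing initialized yet

    def _init_device(self):
        # Placeholder for the real bootstrap (e.g. checking
        # torch.cuda.is_available() and moving the model to the GPU).
        return "cuda:0"

    def predict(self, payload):
        if self.device is None:
            # Only the first request pays the initialization cost;
            # later requests skip straight to inference.
            self.device = self._init_device()
        return {"device": self.device, "echo": payload}
```

As noted in the chat, the trade-off is a slow first request; moving the bootstrap into load() (the approach the patch took) pays that cost at pod startup instead.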
[14:01:20] isaranto: the invalid pointer issue is fixed, but there is an error in the code [14:37:50] (03PS1) 10Elukey: bloom: fix model reference missing self [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/928567 [14:41:56] (03CR) 10Ilias Sarantopoulos: [C: 03+2] "👍" [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/928567 (owner: 10Elukey) [14:42:59] (03Merged) 10jenkins-bot: bloom: fix model reference missing self [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/928567 (owner: 10Elukey) [15:03:28] 10Machine-Learning-Team, 10serviceops: Replace the current recommendation-api service with a newer version - https://phabricator.wikimedia.org/T338471 (10elukey) [15:03:58] elukey: I updated the image with the above change [15:05:08] isaranto: ack, testing it manually, if it works I'll merge [15:05:19] fingers crosse [15:05:21] *crossed [15:07:54] always! [15:09:08] it works! [15:09:28] woooooww [15:10:02] and it is way faster [15:10:05] like 10x [15:10:33] \o/ [15:11:04] if u try with a bigger result length the difference is significant [15:11:10] e.g. 15 seconds vs ~2s [15:12:47] shall we go for the bloom3b now? [15:16:20] (03PS1) 10AikoChou: events: remove content_slots field from prediction classification event [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/928583 (https://phabricator.wikimedia.org/T328899) [15:18:34] isaranto: sure! [15:19:20] great milestone, congrats! 🎉 [15:19:58] merged your change [15:20:05] well congrats to us! [15:21:53] isaranto: do you have a min to report in the task the results? [15:22:02] sure [15:22:21] wow that's great!! \o/ [15:22:34] 10Machine-Learning-Team, 10serviceops: Replace the current recommendation-api service with a newer version - https://phabricator.wikimedia.org/T338471 (10akosiaris) Adding some more information: The service was maintained by @bmansurov. It was deployed on the scb cluster. 
I am the one that moved it to Wikikub... [15:25:59] 10Machine-Learning-Team, 10Patch-For-Review: Host open source LLM (bloom, etc.) on Lift Wing - https://phabricator.wikimedia.org/T333861 (10isarantopoulos) We have successfully deployed bloom-560m with and without GPU on LiftWing 🎉 Preliminary results show an out-of-the-box (without additional inference optim... [15:26:03] 10Machine-Learning-Team, 10serviceops: Replace the current recommendation-api service with a newer version - https://phabricator.wikimedia.org/T338471 (10akosiaris) Oh I forgot to add that we have https://meta.wikimedia.org/wiki/Recommendation_API for explaining what it is. Finally the referers in turnilo impl... [15:27:07] 10Machine-Learning-Team, 10serviceops: Replace the current recommendation-api service with a newer version - https://phabricator.wikimedia.org/T338471 (10akosiaris) My personal take is btw that it is unowned. I'd say Code Stewardship request and maybe it's enough of a lost cause that we undeploy it? [15:35:50] (03PS1) 10Ilias Sarantopoulos: fix: rename bloom to llm [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/928591 [15:38:18] I finally did the rename, I couldn't see that anymore [15:39:03] ahahahah [15:39:29] does it need a CI change too? 
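(Editor's note) The "bloom: fix model reference missing self" patch merged earlier in the log names a very common Python bug class. The actual diff isn't shown here, so this is a hypothetical sketch of that kind of fix, with made-up class and method names:

```python
class BloomModel:
    """Illustration of the bug class behind 'fix model reference missing
    self': an attribute assigned in load() is later read without self."""

    def load(self):
        # Stand-in for loading the real transformer model.
        self.model = lambda text: text[::-1]

    def predict_buggy(self, text):
        return model(text)  # NameError: no local or global name 'model'

    def predict_fixed(self, text):
        return self.model(text)  # the fix: go through the instance attribute
```

Because Python only resolves names at call time, the buggy version imports and even constructs fine, and the NameError surfaces only when the first prediction runs, matching how the error showed up after deployment.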
as a follow-up step we'll also need to clean up the old images [15:39:44] (03CR) 10Ilias Sarantopoulos: "Will w8 for https://gerrit.wikimedia.org/r/c/integration/config/+/928592 to be merged first" [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/928591 (owner: 10Ilias Sarantopoulos) [15:40:18] yes I did the CI change here -> https://gerrit.wikimedia.org/r/c/integration/config/+/928592 [15:40:28] will w8 for that one to be merged first [15:40:32] super [15:40:58] (03CR) 10Elukey: [C: 03+1] fix: rename bloom to llm [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/928591 (owner: 10Ilias Sarantopoulos) [15:46:01] 10Machine-Learning-Team, 10serviceops: Replace the current recommendation-api service with a newer version - https://phabricator.wikimedia.org/T338471 (10elukey) Thanks for the info! >>! In T338471#8914520, @akosiaris wrote: > Oh I forgot to add that we have https://meta.wikimedia.org/wiki/Recommendation_API... [15:46:13] Going afk o/ [15:46:26] o/ [15:46:33] isaranto: shall I try bloom-3b? [15:46:36] to see the mess [15:47:01] Ofc [15:47:10] * elukey doing it [15:50:49] I'll try https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/928593 [16:08:33] bloom-3b works on gpu! [16:10:12] Very nice! [16:10:32] what's the difference in latency? [16:11:06] Ilias wrote something in the task for 560m, it is several x [16:11:11] really noticeable [16:11:58] for 3b I tried to generate something with result len 100: with gpu ~7s, without 1:30 mins [16:12:01] ahahahah [16:12:16] very happy [16:13:03] very nice! [16:13:05] 10Machine-Learning-Team, 10Epic: Experiment with GPUs in the Machine Learning infrastructure - https://phabricator.wikimedia.org/T333462 (10isarantopoulos) We have successfully deployed bloom-560m with and without GPU on LiftWing 🎉 Preliminary results show an out-of-the-box (without additional inference optimi... 
[16:15:27] 10Machine-Learning-Team, 10Epic: Experiment with GPUs in the Machine Learning infrastructure - https://phabricator.wikimedia.org/T333462 (10elukey) bloom-3b works as well! Tried to generate 100 tokens: with GPU ~7s, without 1:30 mins :D [16:19:01] \o/ [16:19:33] I ran for 200 result length - without GPU ~3m - with GPU 13s [16:20:05] for the specific model it seems proportional to the output requested [16:21:28] isaranto: we are probably ready for falcon :D [16:22:03] MOAR parameters [16:24:20] going afk folks, have a nice rest of the day! [16:24:42] l\o [17:06:45] (03CR) 10DannyS712: [C: 03+2] Replace deprecated MWException [extensions/ORES] - 10https://gerrit.wikimedia.org/r/928534 (https://phabricator.wikimedia.org/T328220) (owner: 10Daimona Eaytoy) [17:28:41] (03Merged) 10jenkins-bot: Replace deprecated MWException [extensions/ORES] - 10https://gerrit.wikimedia.org/r/928534 (https://phabricator.wikimedia.org/T328220) (owner: 10Daimona Eaytoy) [21:06:50] 10Machine-Learning-Team, 10Data-Engineering, 10Event-Platform Value Stream, 10Patch-For-Review: Create new mediawiki.page_links_change stream based on fragment/mediawiki/state/change/page - https://phabricator.wikimedia.org/T331399 (10Ottomata) Alright, in the latest patch for including redirect page link...