[06:01:32] (03CR) 10Kevin Bazira: [C: 03+1] ores-legacy: return features in response [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/928055 (https://phabricator.wikimedia.org/T330414) (owner: 10Ilias Sarantopoulos) [08:31:59] hello folks :) [08:32:01] so I was reading https://lernapparat.de/pytorch-rocm/ [08:32:30] and if stuff didn't change, there is a script used to move CUDA-related things to HIP when releasing the pytorch rocm wheel [08:32:45] that is unfortunate since we cannot avoid the bloated wheel [08:34:42] I can confirm that the 10G size of our bloom image is mostly due to python packages [08:34:45] 23 hours ago COPY /opt/lib/python/site-packages /opt/lib/… 10.1GB buildkit.dockerfile.v0 [08:37:31] 10Machine-Learning-Team, 10ORES, 10Documentation, 10User-AKlapper: Update docs that ORES will be replaced by Lift Wing - https://phabricator.wikimedia.org/T305963 (10elukey) I opened two pull requests to bot owners asking if they want to migrate, let's see if we get some useful feedback! [09:06:59] o/ [09:07:50] that's a huge image :) at least we can try to maintain one image for all models [09:09:12] unless we mount some stuff via volumes. What I'm thinking is that in order for a kserve pod to start it will need to download the docker image (unless it already exists) and also the model [09:10:15] yeah.. 
[09:10:36] trying to mount a volume but the isvc specs + knative ones don't make the thing easy [09:11:02] I sorted out the issue with pods, but bloom-560m-gpu still shows the weird error [09:11:46] my theory is that it may be failing due to the missing rocm packages on the image [09:12:00] debian package I mean [09:24:52] I will try to debug that later today or tomorrow [09:28:48] and knative doesn't support hostPath volumes https://github.com/knative/serving/blob/main/pkg/apis/serving/k8s_validation.go#L124 [09:28:51] sigh [09:33:04] 10Lift-Wing, 10Machine-Learning-Team: Move model binaries hosted on Lift Wing to Analytics published space - https://phabricator.wikimedia.org/T334111 (10achou) [09:43:51] (03PS1) 10Elukey: bloom: move gpu pytorch device context bootstrap to load() [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/928464 (https://phabricator.wikimedia.org/T334583) [09:43:54] 10Lift-Wing, 10Machine-Learning-Team: Move model binaries hosted on Lift Wing to Analytics published space - https://phabricator.wikimedia.org/T334111 (10achou) All the model binaries in LW production have been mirrored to `/srv/published/wmf-ml-models`: ` aikochou@stat1008:~$ ls -al /srv/published/wmf-ml-mode... [09:44:30] isaranto: I created a code review for bloom, not sure if it makes sense or not, lemme know if it is worth testing [09:48:19] wow had no idea that the is_available function was actually doing sth [09:49:04] * elukey afk for a bit [09:49:31] without the spawn we were getting a different error (can't recall at the moment) [09:54:42] I found it: without spawn we were getting this error {"error":"RuntimeError : Cannot re-initialize CUDA in forked subprocess. 
To use CUDA with multiprocessing, you must use the 'spawn' start method"} [09:58:52] hi folks, all the models on LW are available at https://analytics.wikimedia.org/published/wmf-ml-models/ [10:00:35] elukey: jumping in a meeting and will review the patch afterwards [10:28:48] aiko: nice! [10:34:50] nice! [10:43:17] * elukey lunch! [11:15:38] (03CR) 10Ilias Sarantopoulos: "I'm a little hesitant whether this change will have any effect, but feel free to merge so we can try stuff out (after adding the device to" [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/928464 (https://phabricator.wikimedia.org/T334583) (owner: 10Elukey) [11:20:52] 10Lift-Wing, 10Machine-Learning-Team, 10Patch-For-Review: Move Revert-risk multilingual model from staging to production - https://phabricator.wikimedia.org/T333124 (10achou) Added return object https://api.wikimedia.org/wiki/API_reference/Service/Lift_Wing/Revert_risk_score_object [11:23:04] 10Lift-Wing, 10Machine-Learning-Team: Move Revert-risk multilingual model from staging to production - https://phabricator.wikimedia.org/T333124 (10achou) [12:14:24] 10Machine-Learning-Team, 10ORES: Update ORES to use the new HookContainer/HookRunner system - https://phabricator.wikimedia.org/T338444 (10Umherirrender) [12:14:33] 10Machine-Learning-Team, 10ORES: Update ORES to use the new HookContainer/HookRunner system - https://phabricator.wikimedia.org/T338444 (10Umherirrender) [12:14:38] (03PS3) 10Umherirrender: Create HookRunner class and the hook handler interfaces [extensions/ORES] - 10https://gerrit.wikimedia.org/r/926735 (https://phabricator.wikimedia.org/T338444) [12:30:58] (03CR) 10Elukey: bloom: move gpu pytorch device context bootstrap to load() (032 comments) [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/928464 (https://phabricator.wikimedia.org/T334583) (owner: 10Elukey) [12:35:07] earthquake in athens :) [12:35:45] or somewhere else [12:43:04] ouch :( [12:43:23] 
not that big though, all good [12:43:37] ok I was about to ask, good [12:45:38] (03CR) 10Ilias Sarantopoulos: [C: 03+2] ores-legacy: return features in response [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/928055 (https://phabricator.wikimedia.org/T330414) (owner: 10Ilias Sarantopoulos) [12:48:35] (03Merged) 10jenkins-bot: ores-legacy: return features in response [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/928055 (https://phabricator.wikimedia.org/T330414) (owner: 10Ilias Sarantopoulos) [12:50:13] isaranto: found https://github.com/kserve/kserve/tree/master/docs/samples/v1beta1/torchserve/v1/bloom [12:50:18] that looks promising [12:51:03] this is what I was initially trying [12:51:11] but it uses torchserve [12:52:34] (03PS1) 10Daimona Eaytoy: Replace deprecated MWException [extensions/ORES] - 10https://gerrit.wikimedia.org/r/928534 (https://phabricator.wikimedia.org/T328220) [12:53:21] hmm I had the tutorial somewhere [12:53:56] in theory kserve has special docker images to run non-custom things like torch etc.. [12:54:22] the trick would be to import them in our docker registry [12:56:09] but even the custom predictor should work, I am looking for examples about how to use the gpu in that case [12:56:19] we are definitely not the first ones [12:58:29] in https://github.com/kserve/kserve/blob/master/docs/samples/fluid/docker/models.py they use flask, but no mp code.. [12:58:47] there is the special @app.before_first_request [12:59:32] isaranto: what if we run the device code in the preprocess or predict, initializing it the first time? 
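(Editor's note) The "Cannot re-initialize CUDA in forked subprocess" error quoted earlier in the log comes from a CUDA context not surviving a fork() into a child process. A minimal stdlib-only sketch of the workaround the team describes (using the "spawn" start method); the torch/CUDA calls are replaced by a placeholder, and the function names here are made up for illustration:

```python
import multiprocessing as mp

def run_inference(payload, queue):
    # In the real predictor this is where the CUDA/HIP work would happen;
    # a CUDA context created in the parent cannot survive a fork() into a
    # child, which is what raises "Cannot re-initialize CUDA in forked
    # subprocess". The model call is replaced by a placeholder here.
    queue.put({"generated_text": payload.upper()})

def predict(payload):
    # "spawn" starts a fresh interpreter instead of fork()ing the parent,
    # so the child builds its own device context from scratch.
    ctx = mp.get_context("spawn")
    queue = ctx.Queue()
    proc = ctx.Process(target=run_inference, args=(payload, queue))
    proc.start()
    result = queue.get()  # read before join() to avoid a pipe deadlock
    proc.join()
    return result

if __name__ == "__main__":
    print(predict("hello bloom"))  # prints {'generated_text': 'HELLO BLOOM'}
```

With fork the child inherits the parent's half-initialized CUDA state; with spawn it re-imports the module and initializes cleanly, which is why PyTorch insists on it.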
[12:59:51] the first request pays the price, but the other ones should go smoother [12:59:55] and we skip init/load [13:01:21] then it would run every time [13:01:39] unless you mean having like a variable that will allow initialization only once [13:03:00] the latter yes [13:04:04] doesn't hurt to try [13:04:56] I am not fond of it either, I know [13:04:57] I found out that I had some problem with torchserve and cuda when trying out that example [13:05:17] yy we'll have to try out a lot of dirty stuff to debug [13:05:56] we need to set up our environment to allow for faster experimentation instead of going through CI/CD [13:07:29] (03PS2) 10Elukey: bloom: move gpu pytorch device context bootstrap to load() [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/928464 (https://phabricator.wikimedia.org/T334583) [13:08:21] (03CR) 10Elukey: bloom: move gpu pytorch device context bootstrap to load() (031 comment) [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/928464 (https://phabricator.wikimedia.org/T334583) (owner: 10Elukey) [13:09:22] isaranto: definitely, not sure how yet [13:09:29] updated the patch with the new horror [13:11:44] (03CR) 10Ilias Sarantopoulos: "Let's try!" [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/928464 (https://phabricator.wikimedia.org/T334583) (owner: 10Elukey) [13:12:16] I also need to rename everything related to bloom to something more generic e.g. 
transformer llm [13:12:38] i'm referring to deployment pipelines and python classes [13:20:03] (03PS3) 10Elukey: bloom: move gpu pytorch device context bootstrap to load() [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/928464 (https://phabricator.wikimedia.org/T334583) [13:20:17] (03CR) 10Elukey: [C: 03+2] bloom: move gpu pytorch device context bootstrap to load() [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/928464 (https://phabricator.wikimedia.org/T334583) (owner: 10Elukey) [13:20:34] (03CR) 10Elukey: bloom: move gpu pytorch device context bootstrap to load() [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/928464 (https://phabricator.wikimedia.org/T334583) (owner: 10Elukey) [13:20:58] not sure if I am in time, I wanted to change the commit msg [13:51:37] elukey: I am currently working on mediawiki extension again, probably tomorrow as well. I think next week I can focus a bit on llms along with ores-legacy [13:52:24] yesyes sure! 
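(Editor's note) The one-time lazy initialization floated above (a guard variable so the device bootstrap runs only on the first predict() call) could look roughly like this; the class and method names are hypothetical, and the real bootstrap would involve torch device calls rather than a string:

```python
class LLMPredictor:
    """Kserve-style predictor sketch: device setup happens lazily on the
    first predict() call, guarded by a sentinel attribute."""

    def __init__(self):
        self.device = None  # sentinel: nothing initialized yet

    def _init_device(self):
        # Placeholder for the real bootstrap (e.g. checking
        # torch.cuda.is_available() and moving the model to the GPU).
        return "cuda:0"

    def predict(self, payload):
        if self.device is None:
            # Only the first request pays the initialization cost;
            # later requests skip straight to inference.
            self.device = self._init_device()
        return {"device": self.device, "echo": payload}
```

As noted in the chat, the trade-off is a slow first request; moving the bootstrap into load() (the approach the patch took) pays that cost at pod startup instead.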
[14:01:20] isaranto: the invalid pointer issue is fixed, but there is an error in the code [14:37:50] (03PS1) 10Elukey: bloom: fix model reference missing self [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/928567 [14:41:56] (03CR) 10Ilias Sarantopoulos: [C: 03+2] "👍" [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/928567 (owner: 10Elukey) [14:42:59] (03Merged) 10jenkins-bot: bloom: fix model reference missing self [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/928567 (owner: 10Elukey) [15:03:28] 10Machine-Learning-Team, 10serviceops: Replace the current recommendation-api service with a newer version - https://phabricator.wikimedia.org/T338471 (10elukey) [15:03:58] elukey: I updated the image with the above change [15:05:08] isaranto: ack, testing it manually, if it works I'll merge [15:05:19] fingers crosse [15:05:21] *crossed [15:07:54] always! [15:09:08] it works! [15:09:28] woooooww [15:10:02] and it is way faster [15:10:05] like 10x [15:10:33] \o/ [15:11:04] if u try with a bigger result length the difference is significant [15:11:10] e.g. 15 seconds vs ~2s [15:12:47] shall we go for the bloom3b now? [15:16:20] (03PS1) 10AikoChou: events: remove content_slots field from prediction classification event [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/928583 (https://phabricator.wikimedia.org/T328899) [15:18:34] isaranto: sure! [15:19:20] great milestone, congrats! 🎉 [15:19:58] merged your change [15:20:05] well congrats to us! [15:21:53] isaranto: do you have a min to report in the task the results? [15:22:02] sure [15:22:21] wow that's great!! \o/ [15:22:34] 10Machine-Learning-Team, 10serviceops: Replace the current recommendation-api service with a newer version - https://phabricator.wikimedia.org/T338471 (10akosiaris) Adding some more information: The service was maintained by @bmansurov. It was deployed on the scb cluster. 
I am the one that moved it to Wikikub... [15:25:59] 10Machine-Learning-Team, 10Patch-For-Review: Host open source LLM (bloom, etc.) on Lift Wing - https://phabricator.wikimedia.org/T333861 (10isarantopoulos) We have successfully deployed bloom-560m with and without GPU on LiftWing 🎉 Preliminary results show an out-of-the-box (without additional inference optim... [15:26:03] 10Machine-Learning-Team, 10serviceops: Replace the current recommendation-api service with a newer version - https://phabricator.wikimedia.org/T338471 (10akosiaris) Oh I forgot to add that we have https://meta.wikimedia.org/wiki/Recommendation_API for explaining what it is. Finally the referers in turnilo impl... [15:27:07] 10Machine-Learning-Team, 10serviceops: Replace the current recommendation-api service with a newer version - https://phabricator.wikimedia.org/T338471 (10akosiaris) My personal take is btw that it is unowned. I'd say Code Stewardship request and maybe it's enough of a lost cause that we undeploy it? [15:35:50] (03PS1) 10Ilias Sarantopoulos: fix: rename bloom to llm [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/928591 [15:38:18] I finally did the rename, I couldn't see that anymore [15:39:03] ahahahah [15:39:29] does it need a CI change too? 
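(Editor's note) The "bloom: fix model reference missing self" patch merged earlier in the log names a very common Python bug class. The actual diff isn't shown here, so this is a hypothetical sketch of that kind of fix, with made-up class and method names:

```python
class BloomModel:
    """Illustration of the bug class behind 'fix model reference missing
    self': an attribute assigned in load() is later read without self."""

    def load(self):
        # Stand-in for loading the real transformer model.
        self.model = lambda text: text[::-1]

    def predict_buggy(self, text):
        return model(text)  # NameError: no local or global name 'model'

    def predict_fixed(self, text):
        return self.model(text)  # the fix: go through the instance attribute
```

Because Python only resolves names at call time, the buggy version imports and even constructs fine, and the NameError surfaces only when the first prediction runs, matching how the error showed up after deployment.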
as a follow-up step we'll also need to clean up the old images [15:39:44] (03CR) 10Ilias Sarantopoulos: "Will w8 for https://gerrit.wikimedia.org/r/c/integration/config/+/928592 to be merged first" [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/928591 (owner: 10Ilias Sarantopoulos) [15:40:18] yes I did the CI change here -> https://gerrit.wikimedia.org/r/c/integration/config/+/928592 [15:40:28] will w8 for that one to be merged first [15:40:32] super [15:40:58] (03CR) 10Elukey: [C: 03+1] fix: rename bloom to llm [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/928591 (owner: 10Ilias Sarantopoulos) [15:46:01] 10Machine-Learning-Team, 10serviceops: Replace the current recommendation-api service with a newer version - https://phabricator.wikimedia.org/T338471 (10elukey) Thanks for the info! >>! In T338471#8914520, @akosiaris wrote: > Oh I forgot to add that we have https://meta.wikimedia.org/wiki/Recommendation_API... [15:46:13] Going afk o/ [15:46:26] o/ [15:46:33] isaranto: shall I try bloom-3b? [15:46:36] to see the mess [15:47:01] Ofc [15:47:10] * elukey doing it [15:50:49] I'll try https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/928593 [16:08:33] bloom-3b works on gpu! [16:10:12] Very nice! [16:10:32] what's the difference in latency? [16:11:06] Ilias wrote something in the task for 560m, it is several x [16:11:11] really noticeable [16:11:58] for 3b I tried to generate something with result len 100: with gpu ~7s, without 1:30 mins [16:12:01] ahahahah [16:12:16] very happy [16:13:03] very nice! [16:13:05] 10Machine-Learning-Team, 10Epic: Experiment with GPUs in the Machine Learning infrastructure - https://phabricator.wikimedia.org/T333462 (10isarantopoulos) We have successfully deployed bloom-560m with and without GPU on LiftWing 🎉 Preliminary results show an out-of-the-box (without additional inference optimi... 
[16:15:27] 10Machine-Learning-Team, 10Epic: Experiment with GPUs in the Machine Learning infrastructure - https://phabricator.wikimedia.org/T333462 (10elukey) bloom-3b works as well! Tried to generate 100 tokens: with GPU ~7s, without 1:30 mins :D [16:19:01] \o/ [16:19:33] I ran for 200 result length - without GPU ~3m - with GPU 13s [16:20:05] for the specific model it seems proportional to the output requested [16:21:28] isaranto: we are probably ready for falcon :D [16:22:03] MOAR parameters [16:24:20] going afk folks, have a nice rest of the day! [16:24:42] l\o [17:06:45] (03CR) 10DannyS712: [C: 03+2] Replace deprecated MWException [extensions/ORES] - 10https://gerrit.wikimedia.org/r/928534 (https://phabricator.wikimedia.org/T328220) (owner: 10Daimona Eaytoy) [17:28:41] (03Merged) 10jenkins-bot: Replace deprecated MWException [extensions/ORES] - 10https://gerrit.wikimedia.org/r/928534 (https://phabricator.wikimedia.org/T328220) (owner: 10Daimona Eaytoy) [21:06:50] 10Machine-Learning-Team, 10Data-Engineering, 10Event-Platform Value Stream, 10Patch-For-Review: Create new mediawiki.page_links_change stream based on fragment/mediawiki/state/change/page - https://phabricator.wikimedia.org/T331399 (10Ottomata) Alright, in the latest patch for including redirect page link...