[06:13:51] (CR) Kevin Bazira: [C:+1] docs: add info how to use a newly released hf model [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1056504 (owner: Ilias Sarantopoulos)
[06:24:59] (CR) Santhosh: [C:-1] "I had to do a minor fix to get it running." [research/recommendation-api] - https://gerrit.wikimedia.org/r/1056556 (https://phabricator.wikimedia.org/T370746) (owner: Nik Gkountas)
[06:46:32] aloha!
[07:48:31] Instead of just writing a script to make requests to the huggingface images, I started adding a locust file we can use
[07:49:36] I thought it would be easier to reuse, although the input will probably be gibberish. I'm planning to provide variant input/output sizes
[07:50:56] Let me know if you have any ideas or suggestions on this
[08:26:39] Lift-Wing, Machine-Learning-Team: [LLM] add locust entry for huggingfaceserver - https://phabricator.wikimedia.org/T370992 (isarantopoulos) NEW
[08:43:15] (PS1) Ilias Sarantopoulos: (WIP) locust entry for hf [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1056872 (https://phabricator.wikimedia.org/T370992)
[08:45:12] (PS2) Ilias Sarantopoulos: (WIP) locust entry for hf [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1056872 (https://phabricator.wikimedia.org/T370992)
[09:24:53] morning, and sorry for missing the standup. I was in the middle of a reboot cycle and missed the calendar notification
[09:26:01] morning Tobias! no worries
[09:26:37] klausman: I'm looking into some VRAM usage spikes that took place while I was making requests; this could also be the cause of the outage of the 27b model
[09:26:48] https://grafana.wikimedia.org/goto/gOeLKyXIR?orgId=1
[09:26:57] nothing to do, just an fyi
[09:27:03] ack
[09:27:06] I'll probably redeploy a couple of times
[09:28:00] aiko: kevinbazira: this is the updated kserve roadmap with a lot of work planned for GenAI inference https://github.com/kserve/kserve/pull/3810
[09:28:36] isaranto: o/ thanks for sharing. let me have a look
[09:57:57] (CR) Nik Gkountas: Add support for section translation recommendations (5 comments) [research/recommendation-api] - https://gerrit.wikimedia.org/r/1056556 (https://phabricator.wikimedia.org/T370746) (owner: Nik Gkountas)
[09:58:03] (PS4) Nik Gkountas: Add support for section translation recommendations [research/recommendation-api] - https://gerrit.wikimedia.org/r/1056556 (https://phabricator.wikimedia.org/T370746)
[10:11:36] (CR) Santhosh: [C:+2] Add support for section translation recommendations [research/recommendation-api] - https://gerrit.wikimedia.org/r/1056556 (https://phabricator.wikimedia.org/T370746) (owner: Nik Gkountas)
[10:12:53] (Merged) jenkins-bot: Add support for section translation recommendations [research/recommendation-api] - https://gerrit.wikimedia.org/r/1056556 (https://phabricator.wikimedia.org/T370746) (owner: Nik Gkountas)
[10:21:05] * isaranto lunch
[10:24:29] * klausman too
[12:02:59] Lift-Wing, Machine-Learning-Team, Patch-For-Review: [LLM] add locust entry for huggingfaceserver - https://phabricator.wikimedia.org/T370992#10014330 (isarantopoulos) I've managed to run the locust tests from stat1008 for `gemma2-9b-it` using the following process: # Checkout the isvc repo and th...
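Note on the locust entry discussed above (T370992, 07:48-08:45): a minimal locustfile for a huggingfaceserver isvc could look roughly like the sketch below. The request path, model name, and payload sizes are illustrative assumptions, not the actual contents of change 1056872; the target host would be passed on the command line.

    # locustfile.py -- minimal load-test sketch for a huggingfaceserver isvc.
    # The request path, model name, and payload sizes below are illustrative
    # assumptions, not the actual contents of change 1056872.
    import random
    import string

    from locust import HttpUser, between, task


    def gibberish(n_words: int) -> str:
        """Build a throwaway prompt of n_words random five-letter words."""
        return " ".join(
            "".join(random.choices(string.ascii_lowercase, k=5))
            for _ in range(n_words)
        )


    class HuggingFaceUser(HttpUser):
        wait_time = between(1, 3)

        @task
        def completion(self):
            # Vary input and output sizes per request, as suggested above.
            payload = {
                "model": "gemma2-9b-it",  # assumed model name
                "prompt": gibberish(random.choice([16, 128, 512])),
                "max_tokens": random.choice([32, 128]),
            }
            self.client.post("/openai/v1/completions", json=payload)

It could then be run headless from stat1008 with something like: locust -f locustfile.py --host https://<isvc-endpoint> --headless -u 4 -r 1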
[13:23:27] (CR) Ilias Sarantopoulos: "Apologies for the late reply, I haven't managed to solve that yet :(" [extensions/ORES] - https://gerrit.wikimedia.org/r/1035044 (https://phabricator.wikimedia.org/T218132) (owner: Rockingpenny4)
[13:29:07] Lift-Wing, Machine-Learning-Team: [articletopic-outlink] fetch data from mwapi using revid instead of article title - https://phabricator.wikimedia.org/T371021 (isarantopoulos) NEW
[13:29:59] I created a task for the outlink model preprocessing. I requested a bit more information on the reasoning that would justify this change and the work involved, and I'll update it once it is clearer
[13:43:17] I tried to use the memory util in huggingface spaces but can't get it to work for gemma2 https://huggingface.co/spaces/hf-accelerate/model-memory-usage
[13:57:11] the guide for inference on amd gpus in hf seems to be updated https://huggingface.co/docs/optimum/onnxruntime/usage_guides/amdgpu#accelerated-inference-on-amd-gpus although incomplete
[13:57:45] it uses optimum.onnxruntime, which would be tricky to test with our current implementation
[14:01:18] Good morning
[14:30:36] \o morning Chris
[14:43:00] so, putting the model on 2 gpus is as easy as assigning 2 gpus to the pod. I tried it and it works. However, running inference isn't that simple, as you can see in the related error https://phabricator.wikimedia.org/P66927
[14:43:56] by default everything (all tensors for model + input) is expected to be on the same device
[14:44:25] I'll open up a task with info so that we can add resources and track this properly
[14:45:32] it is probably something we'll do 2-3 steps down the road from now, but it seems that we'll need it
[14:45:38] thanks kevinbazira: for bringing this up today, it totally slipped my mind!
[14:55:08] Oh interesting!
[14:55:30] I definitely think it will be needed
[14:56:58] Perplexity runs on llama3 70b; we might get away with a smaller model, but I doubt a 7b model will do everything product and research want
[15:01:20] I'm confident we can manage to fit 70b; bringing down inference latency will be a different beast :)
[15:20:03] (PS1) KartikMistry: Fix accesslog file name [research/recommendation-api] - https://gerrit.wikimedia.org/r/1056966
[15:24:28] (CR) Abijeet Patro: [C:+2] Fix accesslog file name [research/recommendation-api] - https://gerrit.wikimedia.org/r/1056966 (owner: KartikMistry)
[15:25:12] (Merged) jenkins-bot: Fix accesslog file name [research/recommendation-api] - https://gerrit.wikimedia.org/r/1056966 (owner: KartikMistry)
[16:03:43] going afk folks o/
[16:07:55] \o
[16:59:48] o/
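Note on the model-memory-usage space (13:43): when the tool fails for a given checkpoint, a back-of-envelope estimate from the parameter count gets close enough for capacity planning. A minimal sketch, assuming fp16/bf16 weights and a rough 20% allowance for KV cache and activations:

    def estimate_vram_gb(n_params: float, bytes_per_param: int = 2) -> float:
        """Rough inference-time VRAM for the weights alone.

        bytes_per_param: 2 for fp16/bf16, 4 for fp32, 1 for int8.
        The 1.2 factor is a rule-of-thumb allowance for KV cache/activations.
        """
        return n_params * bytes_per_param * 1.2 / 1024**3

    # e.g. gemma2-9b in bf16: 9e9 params * 2 bytes * 1.2 -> roughly 20 GB
    print(f"{estimate_vram_gb(9e9):.1f} GB")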
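Note on the optimum.onnxruntime guide (13:57): the flow it describes boils down to roughly the sketch below. This is untested against our images; the model id is an example, ONNX export support varies by architecture, and the provider name follows the guide.

    # Sketch of the optimum.onnxruntime flow from the AMD GPU guide; untested
    # against our images. The model id is an example and ONNX export support
    # varies by architecture.
    from optimum.onnxruntime import ORTModelForCausalLM
    from transformers import AutoTokenizer

    model_id = "google/gemma-2-9b-it"  # example id
    tokenizer = AutoTokenizer.from_pretrained(model_id)

    # export=True converts the checkpoint to ONNX on the fly;
    # ROCMExecutionProvider targets AMD GPUs, per the guide.
    model = ORTModelForCausalLM.from_pretrained(
        model_id,
        export=True,
        provider="ROCMExecutionProvider",
    )

    # On ROCm builds of torch, the GPU is still addressed as "cuda".
    inputs = tokenizer("Hello", return_tensors="pt").to("cuda")
    outputs = model.generate(**inputs, max_new_tokens=16)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))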
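Note on the two-GPU inference error (14:43, P66927): with plain transformers, the usual way around the "all tensors on the same device" failure is to let accelerate shard the model via device_map="auto" and move the inputs to the device of the first shard. A minimal sketch, assuming transformers plus accelerate are available in the image; the model id is an example:

    # Sketch: sharding a model across the two GPUs assigned to the pod so
    # that inference avoids the "all tensors on the same device" error.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "google/gemma-2-27b-it"  # example id

    tokenizer = AutoTokenizer.from_pretrained(model_id)

    # device_map="auto" lets accelerate place layer groups on each GPU and
    # insert hooks that move hidden states between devices during forward().
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,
        device_map="auto",
    )

    # Inputs still need to be moved explicitly to the device holding the
    # embedding layer, which is what the error in P66927 complains about.
    inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=16)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))

With this layout only the input tensors need explicit placement; accelerate handles moving activations between the shards during generation.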