[05:21:12] <isaranto>	 Good morning!
[06:48:47] <wikibugs>	 Machine-Learning-Team: Add pyopencl requirements to images that use resource_utils - https://phabricator.wikimedia.org/T360212#9678582 (isarantopoulos) Open→Resolved
[06:49:38] <wikibugs>	 Machine-Learning-Team, Wikipedia-Android-App-Backlog: Investigate increased preprocessing latencies on LW of article-descriptions model - https://phabricator.wikimedia.org/T358195#9678584 (isarantopoulos) Open→Resolved
[08:13:20] <aiko>	 morning folks :) 
[08:26:19] <isaranto>	 hey aiko o/
[08:30:24] <isaranto>	 I think I have never built so many docker images in my life before :D
[09:30:40] <wikibugs>	 Machine-Learning-Team, Patch-For-Review: Create a Pytorch base image - https://phabricator.wikimedia.org/T360638#9679204 (isarantopoulos) The above "issue" with numpy seems not to be an issue after all. Numpy was removed as a requirement after torch 1.9 but they do maintain an aggressive warning as...
[11:26:46] <isaranto>	 I've made a "breakthrough" on the hf image size and the docker layers -> https://phabricator.wikimedia.org/T357986#9679664
[11:26:51] <isaranto>	 let me know what you think
[11:30:58] <elukey>	 o/
[11:49:24] <isaranto>	 Hey Luca!
[11:50:18] <isaranto>	 I'm going afk folks, will be back to check later
[11:54:33] <elukey>	 ttl!
[12:49:28] <wikibugs>	 Machine-Learning-Team, Patch-For-Review: Use Huggingface model server image for HF LLMs - https://phabricator.wikimedia.org/T357986#9679664 (isarantopoulos) I've built an image by explicitly defining all the requirements (instead of pip resolving the dependencies on its own). This resulted in a reduced i...
[12:51:08] <wikibugs>	 Lift-Wing, Machine-Learning-Team, ORES, ChangeProp, and 5 others: Selectively disable changeprop functionality that is no longer used - https://phabricator.wikimedia.org/T361483#9679703 (elukey) Hello!  There `ores_cache` job should be defined but disabled in the running config, we don't use it a...
[13:01:09] <wikibugs>	 Machine-Learning-Team, Patch-For-Review: Use Huggingface model server image for HF LLMs - https://phabricator.wikimedia.org/T357986#9679991 (elukey) I have seen the same behavior, namely pip trying to download the torch's cpu version and ending up only installing nvidia-related packages. I like the expli...
[13:10:03] <chrisalbon>	 Morning all
[13:11:45] <wikibugs>	 Lift-Wing, Machine-Learning-Team, ORES, ChangeProp, and 5 others: Selectively disable changeprop functionality that is no longer used - https://phabricator.wikimedia.org/T361483#9680024 (akosiaris) >>! In T361483#9679703, @elukey wrote: > Hello! >  > There `ores_cache` job should be defined but d...
[13:20:23] <wikibugs>	 (PS1) AikoChou: revertrisk: error handling for batch requests [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1016341 (https://phabricator.wikimedia.org/T360406)
[13:22:50] <wikibugs>	 Lift-Wing, Machine-Learning-Team, ORES, ChangeProp, and 5 others: Selectively disable changeprop functionality that is no longer used - https://phabricator.wikimedia.org/T361483#9680093 (elukey) >>! In T361483#9680024, @akosiaris wrote: >>>! In T361483#9679703, @elukey wrote: >> Hello! >>  >> The...
[13:26:01] <aiko>	 hi Chris o/
[13:52:50] <chrisalbon>	 My apologies for the last minute notice, my wife needs to go to the hospital (nothing super serious, she woke up with a painful ear infection) so I have to cancel today's team meeting and my 1:1s afterwards.
[13:55:58] <elukey>	 chrisalbon: +1 np!
[13:56:34] <chrisalbon>	 Thanks elukey, sorry. I hate when I cancel
[13:57:15] <elukey>	 np! Take care!
[14:02:07] <aiko>	 np!
[14:47:46] <wikibugs>	 (CR) Umherirrender: [C:-1] "looks good, small issues to improve" [extensions/ORES] - https://gerrit.wikimedia.org/r/1007862 (https://phabricator.wikimedia.org/T358831) (owner: MPGuy2824)
[14:50:57] <wikibugs>	 (CR) Kevin Bazira: "The expected results are returned when I run the base_model:" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1016341 (https://phabricator.wikimedia.org/T360406) (owner: AikoChou)
[15:43:22] <isaranto>	 o/ 
[15:43:24] <isaranto>	 back!
[15:46:19] <elukey>	 o/
[15:52:24] <elukey>	 isaranto: I modified the base image to install a couple of apt packages as well
[15:54:04] <isaranto>	 I'm looking at it now
[15:54:49] <isaranto>	 elukey: in my patch I have just copied what you have done, shall I keep also the comments etc? referring to this https://gerrit.wikimedia.org/r/c/operations/docker-images/production-images/+/1015297/5/images/amd/pytorch21/Dockerfile.template
[15:55:15] <elukey>	 sorry just fixed it, it didn't work at first :D
[15:55:37] <elukey>	 yeah I think we should, it is sad that we have to repeat all that blurb though
[15:56:03] <elukey>	 maybe we could have a base image that does everything, and smaller dockerfiles that just install pytorch?
[16:01:11] <wikibugs>	 Machine-Learning-Team, Patch-For-Review: Use Huggingface model server image for HF LLMs - https://phabricator.wikimedia.org/T357986#9681165 (isarantopoulos) I believe we would have the same issue even with poetry in the following scenario:     1. We have torch-rocm installed in the base image   2. the pa...
[16:02:45] <isaranto>	 good idea to include the apt packages!
[16:04:37] <isaranto>	 > maybe we could have a base image that does everything, and smaller dockerfiles that just install pytorch?
[16:04:37] <isaranto>	 the smaller image could be named something similar to python-bookworm or liftwing-bookworm (?)
[16:07:04] <wikibugs>	 (CR) AikoChou: "Thanks for testing it! Just to confirm, did you set USE_BATCHER=True when you ran the batch model server? It seems like the error is occur" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1016341 (https://phabricator.wikimedia.org/T360406) (owner: AikoChou)
[16:26:43] <isaranto>	 elukey: so I'll w8 for your patch to be merged and then we can merge the pytorch2.1 patch and the hf one, right?
[16:27:09] <elukey>	 ack!
[16:27:34] <isaranto>	 regarding the base image we'd still have to add the symbolic links after we pip install torch though
[16:28:34] <elukey>	 yeah
[16:28:43] <elukey>	 we can live with copy atm
[16:28:51] <elukey>	 I'll try to sort it out tomorrow
[16:28:59] <elukey>	 going afk for today folks! Have a nice rest of the day :)
[16:30:56] <isaranto>	 ciao elukey , have a nice evening!
[16:32:57] <wikibugs>	 (CR) Kevin Bazira: "I have run it with `USE_BATCHER=True` using the 2 scenarios below and still got the same error:" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1016341 (https://phabricator.wikimedia.org/T360406) (owner: AikoChou)
[17:10:12] <wikibugs>	 (CR) Ilias Sarantopoulos: "@kevinbazira you'd have to set the USE_BATCHER=true when starting the model server as it is an env var used" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1016341 (https://phabricator.wikimedia.org/T360406) (owner: AikoChou)
[17:41:22] <wikibugs>	 (CR) Kevin Bazira: "Thank you for the clarification Ilias. I have used the commands below and batch_model ran successfully:" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1016341 (https://phabricator.wikimedia.org/T360406) (owner: AikoChou)
[18:04:14] <wikibugs>	 Machine-Learning-Team, Patch-For-Review: Use Huggingface model server image for HF LLMs - https://phabricator.wikimedia.org/T357986#9681807 (isarantopoulos) I have built the "final" image that also includes `vllm==0.2.7` which will be used for GPU inference optimization. The final size is 11.4GB and cont...
[18:05:33] <isaranto>	 logging off folks, have a nice evening!