[00:09:36] (PS2) Kevin Bazira: article-country: normalize sums using a fixed minimum sum of 1 [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1100009 (https://phabricator.wikimedia.org/T371897)
[00:12:07] (CR) Kevin Bazira: article-country: normalize sums using a fixed minimum sum of 1 (1 comment) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1100009 (https://phabricator.wikimedia.org/T371897) (owner: Kevin Bazira)
[00:38:40] (PS2) Kevin Bazira: article-country: return wikidata_properties as a list [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1099524 (https://phabricator.wikimedia.org/T371897)
[00:40:45] (CR) Kevin Bazira: article-country: return wikidata_properties as a list (1 comment) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1099524 (https://phabricator.wikimedia.org/T371897) (owner: Kevin Bazira)
[01:17:04] (CR) Kevin Bazira: [C:+1] "was able to build this image locally without issues." [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1100078 (owner: Ilias Sarantopoulos)
[04:27:44] morning folks :)
[04:42:34] Lift-Wing, Machine-Learning-Team: [LLM] quantization: allow loading model weights as int8/int4 with HF - https://phabricator.wikimedia.org/T377848#10378230 (achou) **AWQ** I was able to quantize aya-expanse-8b using AWQ. The quantized model is saved in my home directory at `/home/aikochou/aya-expanse-8b...
[04:50:57] running an example from the AWQ repo that loads Meta-Llama-3.1-8B-Instruct-AWQ-INT4 for inference, but encountered a permission error
[04:50:59] PermissionError: [Errno 13] Permission denied: '/srv/hf-cache/hub/.locks/models--hugging-quants--Meta-Llama-3.1-8B-Instruct-AWQ-INT4/db88166e2bc4c799fd5d1ae643b75e84d03ee70e.lock'
[04:53:43] has anyone seen this error? I'm using my hf token and haven't had this problem downloading other models before
[05:01:54] (PS6) Santhosh: performance: Use asynchronous iterator for fetching from collections [research/recommendation-api] - https://gerrit.wikimedia.org/r/1100055 (https://phabricator.wikimedia.org/T381366)
[05:02:25] (CR) Santhosh: performance: Use asynchronous iterator for fetching from collections (1 comment) [research/recommendation-api] - https://gerrit.wikimedia.org/r/1100055 (https://phabricator.wikimedia.org/T381366) (owner: Santhosh)
[08:23:48] o/
[08:25:14] I've run latency benchmarks for aya-expanse-8b:
[08:25:14] 1. non-quantized succeeded: https://phabricator.wikimedia.org/P71521
[08:25:14] 2. GPTQ quantized failed: https://phabricator.wikimedia.org/P71523
[08:25:14] looks like it's failing because of a memory issue:
[08:25:14] ```
[08:25:14] Memory access fault by GPU node-1 (Agent handle: 0xbd86680) on address 0x6f3ef82bd000. Reason: Unknown.
[08:25:14] Fatal Python error: Aborted
[08:25:15] ```
[08:25:18] good morning!
[08:40:01] Looking at the errors above --^
[08:43:14] kevinbazira: I encountered the same error https://phabricator.wikimedia.org/T377848#10378230
[08:49:41] aiko: o/ interesting ...
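The PermissionError above comes from the hub client trying to create a lock file under the shared `/srv/hf-cache` directory, which this user cannot write to. A minimal workaround sketch (an assumption, not the fix the team actually applied; the path is illustrative) is to point the hub cache at a per-user directory before anything downloads:

```python
import os

# Workaround sketch: redirect the Hugging Face hub cache to a writable
# per-user directory so downloads and their .lock files no longer touch
# the shared /srv/hf-cache. Must be set BEFORE importing transformers or
# huggingface_hub. The path below is illustrative.
os.environ["HF_HUB_CACHE"] = os.path.expanduser("~/.cache/huggingface/hub")

# Most loaders also accept an explicit per-call cache location, e.g.:
#   AutoModelForCausalLM.from_pretrained(model_id, cache_dir="/tmp/hf-cache")
```

Alternatively, fixing group write permissions on the shared cache would keep a single copy of the weights on disk.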
[08:51:11] let's talk about these in a bit then
[08:51:16] okok
[09:27:09] (CR) Ilias Sarantopoulos: [C:+2] llm: use torch base image and update deps [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1100078 (owner: Ilias Sarantopoulos)
[09:29:57] (Merged) jenkins-bot: llm: use torch base image and update deps [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1100078 (owner: Ilias Sarantopoulos)
[09:39:37] (PS4) Ilias Sarantopoulos: llm: move dir under src/models [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1099166 (https://phabricator.wikimedia.org/T369344)
[09:40:23] (CR) CI reject: [V:-1] llm: move dir under src/models [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1099166 (https://phabricator.wikimedia.org/T369344) (owner: Ilias Sarantopoulos)
[09:44:33] aiko: in the meeting, you asked whether I was able to run inference after the GPU restart.
[09:44:33] I've tested and both `CohereForAI/aya-expanse-8b` and `2z299/aya-expanse-8b-GPTQ-4bit` are running fine, just like before the restart.
[09:44:53] thanks Kevin!!
[09:46:09] okok
[09:48:55] ahh I think I found it!! \o/
[09:49:00] I set the device_map to {'': 'cuda:0'}, then everything works!
[09:49:19] it was set to 'auto'
[09:51:27] ok!
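The device_map fix described above can be sketched as follows. This is an illustrative snippet rather than the service's actual code; the empty-string key in the map assigns the entire module tree to that device, whereas `"auto"` lets accelerate decide placement itself (which is what hit the GPU memory-access fault here):

```python
# Pin the whole model to one GPU instead of letting device_map="auto"
# shard it. The "" key means "the entire model".
device_map = {"": "cuda:0"}

# Hypothetical usage (model id taken from the log; needs transformers
# and a GPU, so it is commented out here):
#   from transformers import AutoModelForCausalLM
#   model = AutoModelForCausalLM.from_pretrained(
#       "CohereForAI/aya-expanse-8b", device_map=device_map)
```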
[09:55:18] (PS5) Ilias Sarantopoulos: llm: move dir under src/models [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1099166 (https://phabricator.wikimedia.org/T369344)
[09:56:03] (CR) CI reject: [V:-1] llm: move dir under src/models [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1099166 (https://phabricator.wikimedia.org/T369344) (owner: Ilias Sarantopoulos)
[09:56:38] (CR) Ilias Sarantopoulos: [C:+1] article-country: return wikidata_properties as a list (1 comment) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1099524 (https://phabricator.wikimedia.org/T371897) (owner: Kevin Bazira)
[09:56:41] (PS3) Kevin Bazira: article-country: return wikidata_properties as a list [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1099524 (https://phabricator.wikimedia.org/T371897)
[09:57:27] (CR) Ilias Sarantopoulos: [C:+1] article-country: normalize sums using a fixed minimum sum of 1 (1 comment) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1100009 (https://phabricator.wikimedia.org/T371897) (owner: Kevin Bazira)
[09:57:30] (PS3) Kevin Bazira: article-country: normalize sums using a fixed minimum sum of 1 [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1100009 (https://phabricator.wikimedia.org/T371897)
[09:59:16] (PS6) Ilias Sarantopoulos: llm: move dir under src/models [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1099166 (https://phabricator.wikimedia.org/T369344)
[10:07:26] finally looked at this carefully and fixed it https://gerrit.wikimedia.org/r/1099166
[10:07:40] just a move of the llm dir under src/models/
[10:22:31] device_map sure seems more and more magic to me :)
[10:23:22] (CR) Kevin Bazira: [C:+2] article-country: return wikidata_properties as a list [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1099524 (https://phabricator.wikimedia.org/T371897) (owner: Kevin Bazira)
[10:24:08] (Merged) jenkins-bot: article-country: return wikidata_properties as a list [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1099524 (https://phabricator.wikimedia.org/T371897) (owner: Kevin Bazira)
[10:29:16] (PS4) Kevin Bazira: article-country: normalize sums using a fixed minimum sum of 1 [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1100009 (https://phabricator.wikimedia.org/T371897)
[10:31:10] (CR) Kevin Bazira: [C:+2] article-country: normalize sums using a fixed minimum sum of 1 [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1100009 (https://phabricator.wikimedia.org/T371897) (owner: Kevin Bazira)
[10:31:56] (Merged) jenkins-bot: article-country: normalize sums using a fixed minimum sum of 1 [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1100009 (https://phabricator.wikimedia.org/T371897) (owner: Kevin Bazira)
[10:39:55] I think that device_map='auto' is supposed to be magic by design :)
[10:41:15] Something about sorcerer's apprentices and buckets and brooms :)
[11:25:50] (CR) Kevin Bazira: [C:+1] llm: move dir under src/models [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1099166 (https://phabricator.wikimedia.org/T369344) (owner: Ilias Sarantopoulos)
[11:27:17] (PS7) Ilias Sarantopoulos: llm: move dir under src/models [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1099166 (https://phabricator.wikimedia.org/T369344)
[11:33:00] (CR) Ilias Sarantopoulos: [C:+2] llm: move dir under src/models [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1099166 (https://phabricator.wikimedia.org/T369344) (owner: Ilias Sarantopoulos)
[11:33:45] (Merged) jenkins-bot: llm: move dir under src/models [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1099166 (https://phabricator.wikimedia.org/T369344) (owner: Ilias Sarantopoulos)
[11:58:15] (PS1) Ilias Sarantopoulos: llm: change local and docker dir to be the same [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1100432
[12:06:00] (PS2) Ilias Sarantopoulos: llm: change local and docker dir to be the same [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1100432
[12:34:21] (PS1) Ilias Sarantopoulos: (WIP) llm: add aya with bitsandbytes [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1100441
[12:34:49] * isaranto afk lunch o clock
[12:35:05] (CR) CI reject: [V:-1] (WIP) llm: add aya with bitsandbytes [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1100441 (owner: Ilias Sarantopoulos)
[13:52:33] I ran 2 `llama-3.1-8b-instruct` models on ml-lab GPUs with the same prompt, and here are their inference speeds:
[13:52:33] 1. non-quantized: ~6s as shown in https://phabricator.wikimedia.org/P71538
[13:52:33] 2. GPTQ quantized: <25s as shown in https://phabricator.wikimedia.org/P71539
[13:52:33] these results are similar to what I got with `aya-expanse`
[14:25:47] I see that uses a version of the model available from huggingface. did you try to quantize the llama/aya model yourself using gptq and the c4 dataset?
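The self-quantization flow being suggested (per the Hugging Face transformers GPTQ integration) can be sketched roughly as below. The settings, model id, and output path are illustrative, and the heavy calls are commented out because a real run needs a GPU plus the optimum/auto-gptq extras installed:

```python
# Sketch of quantizing a model yourself with GPTQ via transformers:
# 4-bit weights, calibrated on the "c4" dataset as discussed above.
gptq_settings = {"bits": 4, "dataset": "c4", "group_size": 128}

# Quantization happens during from_pretrained when a GPTQConfig is passed
# (illustrative; requires transformers, optimum, auto-gptq, and a GPU):
#   from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig
#   model_id = "CohereForAI/aya-expanse-8b"
#   tokenizer = AutoTokenizer.from_pretrained(model_id)
#   config = GPTQConfig(tokenizer=tokenizer, **gptq_settings)
#   model = AutoModelForCausalLM.from_pretrained(
#       model_id, device_map={"": "cuda:0"}, quantization_config=config)
#   model.save_pretrained("aya-expanse-8b-GPTQ-4bit")
```

Quantizing with your own calibration data, rather than downloading a pre-quantized checkpoint, also makes the benchmark comparison apples-to-apples.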
[14:27:39] I mean to follow the process documented here https://huggingface.co/docs/transformers/main/quantization/gptq#gptq
[14:30:54] please add your findings in the task https://phabricator.wikimedia.org/T377848
[14:38:28] (PS5) Ilias Sarantopoulos: llm: add aya with bitsandbytes [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1100441 (https://phabricator.wikimedia.org/T379052)
[16:57:36] (CR) Kevin Bazira: [C:+1] llm: change local and docker dir to be the same [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1100432 (owner: Ilias Sarantopoulos)
[16:58:26] (CR) Ilias Sarantopoulos: [C:+2] llm: change local and docker dir to be the same [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1100432 (owner: Ilias Sarantopoulos)
[16:59:50] logging off today! have a nice evening folks :)
[17:06:25] (Merged) jenkins-bot: llm: change local and docker dir to be the same [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1100432 (owner: Ilias Sarantopoulos)
[17:14:58] night Aiko! logging off as well folks, cu tomorrow o/
[17:45:53] night all!
[18:11:30] (PS1) Nik Gkountas: Use strategy pattern to support different recommendation usecases [research/recommendation-api] - https://gerrit.wikimedia.org/r/1100512