[03:07:41] 10Lift-Wing, 06Machine-Learning-Team, 07OKR-Work, 13Patch-For-Review: Request to host article-country model on Lift Wing - https://phabricator.wikimedia.org/T371897#10382107 (10kevinbazira) @Isaac thank you for sharing feedback and suggesting improvements. We have: - handled empty article-country results g... [03:08:30] o/ In order to share this --^ communication with Isaac to test the latest endpoint, I edited the deployment config for the article-country isvc directly in the experimental ns. [03:08:30] I pushed a patch for this change here: https://gerrit.wikimedia.org/r/1100571 [04:11:34] o/ +1 [05:43:36] 10Lift-Wing, 06Machine-Learning-Team: [LLM] quantization: allow loading model weights as int8/int4 with HF - https://phabricator.wikimedia.org/T377848#10382293 (10achou) In short, I was able to run inference using quantized models, but I’m still figuring out the right settings for the Aya model on ROCm GPU.... [07:17:46] thanks for the review, Aiko! [07:17:47] I've merged and synced the deployment. [08:19:11] Hello! [08:22:26] hello o/ [08:22:51] something is wrong with my IRC :) [08:23:16] kevinbazira: I deployed article-country to prod (eqiad & codfw) yesterday [08:23:55] I just synced the patch you had created [08:24:00] isaranto: o/ thanks for the update! [08:28:14] ok, I'm back as isaranto [08:33:42] 10Lift-Wing, 06Machine-Learning-Team: [LLM] quantization: allow loading model weights as int8/int4 with HF - https://phabricator.wikimedia.org/T377848#10382470 (10isarantopoulos) > I'm now testing an alternative way to load the quantized model: using AutoModelForCausalLM.from_pretrained with quantization_confi... [10:22:23] 10Lift-Wing, 06Machine-Learning-Team: [LLM] quantization: allow loading model weights as int8/int4 with HF - https://phabricator.wikimedia.org/T377848#10382809 (10kevinbazira) **GPTQ** I tested the inference performance of both non-quantized and [[ https://huggingface.co/docs/transformers/main/quantization/gp...
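For context on the int8/int4 weight loading discussed in T377848 above: a rough, illustrative back-of-envelope for the weight footprint of an 8B-parameter model such as `aya-expanse-8b`. The parameter count is an assumed round figure, and real memory usage also includes activations, the KV cache, and quantization overhead; this is a sketch of the arithmetic only, not a measurement.

```python
# Approximate weight-only memory footprint of an ~8B-parameter model
# (e.g. aya-expanse-8b) at different weight precisions.
# Illustrative arithmetic only; activations and KV cache come on top.
PARAMS = 8_000_000_000  # assumed round figure for an "8B" model


def weight_gb(bits_per_param: int) -> float:
    """Return approximate weight memory in GiB for the given precision."""
    return PARAMS * bits_per_param / 8 / 1024**3


for name, bits in [("fp16", 16), ("int8", 8), ("int4", 4)]:
    print(f"{name}: ~{weight_gb(bits):.1f} GiB")
```

This is the motivation for int8/int4 loading: halving or quartering the weight footprint relative to fp16, which is separate from the inference-latency question the GPTQ benchmarks address.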
[10:23:37] I've finally added the GPTQ update: https://phabricator.wikimedia.org/T377848#10382809 [10:23:37] tl;dr: I ran huggingface optimum benchmarks and compared the mean latency during each inference stage of the `aya-expanse-8b` model in its non-quantized version versus the GPTQ-4bit quantized version. The quantized model consistently shows slower inference speeds compared to its non-quantized counterpart. [10:51:16] thanks for the nice update! could you add the benchmark results as well as a short guide on how to run the benchmark? (just a chain of commands would do for now) [10:51:50] if it suits us we can then transfer this to a readme in the infservices repo and wikitech [11:15:28] * isaranto afk - bbl [12:30:23] 06Machine-Learning-Team: Debian hipcc package conflicts with hipcc from AMD's ROCm repository - https://phabricator.wikimedia.org/T381567#10383111 (10MoritzMuehlenhoff) One option would be to simply pin hipcc to a specific version, this should do the trick: ` package { 'hipcc': ensure => '1.0.0.60100-82~22.... [12:42:26] 06Machine-Learning-Team, 13Patch-For-Review: Debian hipcc package conflicts with hipcc from AMD's ROCm repository - https://phabricator.wikimedia.org/T381567#10383130 (10klausman) >>! In T381567#10383111, @MoritzMuehlenhoff wrote: > One option would be to simply pin hipcc to a specific version, this should do... [13:10:15] 06Machine-Learning-Team, 10MediaWiki-extensions-ORES, 10Edit-Review-Improvements-RC-Page, 10MediaWiki-Recent-changes, 06Moderator-Tools-Team: [SPIKE] How could we add topic filtering to Recent Changes? - https://phabricator.wikimedia.org/T381569 (10Samwalton9-WMF) 03NEW [13:32:25] * isaranto back! [13:38:07] isaranto: and I have a review fopor you :) [13:38:15] fopor?! [13:38:21] o/ great! [13:38:23] https://gerrit.wikimedia.org/r/c/operations/puppet/+/1100813 [13:38:40] fopor = new abbreviation :P [13:38:42] Turns out the hipcc fix was much simpler than feared.
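The pinning approach suggested in T381567 above looks roughly like the following Puppet resource. This is a sketch only: the exact version string is truncated in the quoted task comment, so `<pinned-version>` here is a placeholder for whatever `apt-cache policy hipcc` reports on the target host.

```puppet
# Sketch: pin the Debian hipcc package to a fixed version so apt does not
# replace it with the conflicting hipcc shipped in AMD's ROCm repository.
# '<pinned-version>' is a placeholder, not a real version string.
package { 'hipcc':
  ensure => '<pinned-version>',
}
```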
Now I just need to clean up the env stuff [13:40:18] I tried hipconfig just now and it can't be found (doesn't exist under /usr/bin) [13:40:38] just noting it, if it is expected as you're working on it plz disregard [13:40:39] yes, you need to relogin, the env is only set on login [13:42:04] 06Machine-Learning-Team: Debian hipcc package conflicts with hipcc from AMD's ROCm repository - https://phabricator.wikimedia.org/T381567#10383276 (10klausman) `$ hipconfig --check HIP version : 6.1.40091-a8dbc0c19 == hipconfig HIP_PATH : /opt/rocm-6.1.0 ROCM_PATH : /opt/rocm-6.1.0 HIP_COMPILER : clang... [13:42:30] ack, I just saw the patch as well so it makes sense [14:34:19] could I get a review here when somebody has some time? https://gerrit.wikimedia.org/r/c/machinelearning/liftwing/inference-services/+/1100441 [14:34:21] thanks! [14:53:07] I've +1'ed [14:53:17] (03CR) 10Kevin Bazira: [C:03+1] llm: add aya with bitsandbytes [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/1100441 (https://phabricator.wikimedia.org/T379052) (owner: 10Ilias Sarantopoulos) [15:05:28] 06Machine-Learning-Team: Debian hipcc package conflicts with hipcc from AMD's ROCm repository - https://phabricator.wikimedia.org/T381567#10383618 (10klausman) [16:57:42] (03CR) 10Ilias Sarantopoulos: [C:03+2] llm: add aya with bitsandbytes [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/1100441 (https://phabricator.wikimedia.org/T379052) (owner: 10Ilias Sarantopoulos) [16:59:20] thanks Kevin! [17:00:30] (03Merged) 10jenkins-bot: llm: add aya with bitsandbytes [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/1100441 (https://phabricator.wikimedia.org/T379052) (owner: 10Ilias Sarantopoulos) [17:45:31] (03CR) 10Nik Gkountas: "The requests I tested successfully:" [research/recommendation-api] - 10https://gerrit.wikimedia.org/r/1100512 (owner: 10Nik Gkountas) [19:08:56] * isaranto afk!
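On the hipconfig-not-found exchange above: as noted, the ROCm paths are only added to the environment by the login profile, so a session opened before the change won't see `hipconfig` until re-login. A small defensive check could look like this (a sketch; it assumes nothing about whether hipconfig is actually installed):

```shell
# The ROCm profile only runs for login shells, so hipconfig may be absent
# from PATH in sessions opened before the change. Check before using it:
if command -v hipconfig >/dev/null 2>&1; then
    hipconfig --check   # prints HIP version, HIP_PATH, ROCM_PATH, compiler
else
    echo "hipconfig not in PATH; log out and back in to pick up the ROCm profile"
fi
```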
[19:44:41] (03PS2) 10Nik Gkountas: Use strategy pattern to support different recommendation usecases [research/recommendation-api] - 10https://gerrit.wikimedia.org/r/1100512 (https://phabricator.wikimedia.org/T381366) [19:45:10] (03CR) 10Nik Gkountas: Use strategy pattern to support different recommendation usecases (031 comment) [research/recommendation-api] - 10https://gerrit.wikimedia.org/r/1100512 (https://phabricator.wikimedia.org/T381366) (owner: 10Nik Gkountas)