[00:43:25] (CR) Eamedina: [C:+1] fix return type for all __hash__ methods to be int [research/recommendation-api] - https://gerrit.wikimedia.org/r/1088377 (owner: Nik Gkountas)
[00:44:42] (CR) Eamedina: [C:+1] remove level 1 and 2 pages from "Vital articles" default collections [research/recommendation-api] - https://gerrit.wikimedia.org/r/1088382 (https://phabricator.wikimedia.org/T374597) (owner: Nik Gkountas)
[05:05:31] (PS4) Kevin Bazira: article-country: update response schema [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1088214 (https://phabricator.wikimedia.org/T371897)
[05:06:53] (CR) CI reject: [V:-1] article-country: update response schema [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1088214 (https://phabricator.wikimedia.org/T371897) (owner: Kevin Bazira)
[05:09:35] (PS5) Kevin Bazira: article-country: update response schema [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1088214 (https://phabricator.wikimedia.org/T371897)
[05:12:33] (CR) Kevin Bazira: article-country: update response schema (1 comment) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1088214 (https://phabricator.wikimedia.org/T371897) (owner: Kevin Bazira)
[08:32:45] mooorning o/
[08:49:23] (CR) Ilias Sarantopoulos: [C:+1] "LGTM, thanks!" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1088214 (https://phabricator.wikimedia.org/T371897) (owner: Kevin Bazira)
[08:59:08] (CR) Kevin Bazira: [C:+2] "Thanks for the reviews :)" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1088214 (https://phabricator.wikimedia.org/T371897) (owner: Kevin Bazira)
[08:59:54] (Merged) jenkins-bot: article-country: update response schema [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1088214 (https://phabricator.wikimedia.org/T371897) (owner: Kevin Bazira)
[11:55:08] * isaranto afk lunch
[14:05:45] (PS10) Sbisson: API Continue support [research/recommendation-api] - https://gerrit.wikimedia.org/r/1061713 (https://phabricator.wikimedia.org/T379037) (owner: Santhosh)
[14:06:10] (CR) Sbisson: API Continue support (2 comments) [research/recommendation-api] - https://gerrit.wikimedia.org/r/1061713 (https://phabricator.wikimedia.org/T379037) (owner: Santhosh)
[16:15:40] klausman: o/ Is there any possibility that we could increase memory in experimental ml-staging-codfw to 64GB?
[16:16:08] I want to deploy a bigger model and it is failing (getting OOMKilled)
[16:16:27] otherwise we can do it next week :D
[16:21:58] a single pod? :D
[16:23:35] Machine-Learning-Team, Data-Platform-SRE, Goal: Goal 2: People outside the ML team can ssh into an ml-lab machine, run a Jupyter Notebook, and run PyTorch powered by a GPU. - https://phabricator.wikimedia.org/T371396#10304516 (Ottomata)
[16:31:06] yes, a single pod :D
[16:36:45] it should be possible, it's just a matter of adding the pod limit ranges for experimental
[16:37:07] assuming you have a host with 64G of RAM to allocate
[16:37:15] if not, the kube scheduler will be really sad
[16:47:09] * isaranto nods
[16:48:29] I know, I was just asking if we could do it "on the fly" for experimental, but I see the values are in admin_ng, so I guess it would be best to go through CI/CD
[16:51:36] I'll check the resources - since I don't have permission to view the nodes, is there any way to tell the allocatable memory of a node? From Grafana I can see the sum (which is 644GB) https://grafana.wikimedia.org/d/pz5A-vASz/kubernetes-resources?orgId=1&var-ds=thanos&var-site=codfw&var-prometheus=k8s-mlstaging
[16:52:56] you have the new pod already scheduled, right?
[16:53:16] aya23-predictor?
[16:53:50] I just did - I had another one that was failing
[16:53:56] yes, aya23-predictor
[16:54:44] I mean I just created a new revision (0007) which has 64Gi
[16:55:22] bumped limitranges to 70GB
[16:55:27] for experimental, I mean
[16:55:47] that is awesome, thank you
[16:55:49] seems to have found a home at ml-staging2001.codfw.wmnet
[16:56:25] I guess the k8s devs never expected this would be a use case when they first thought of k8s :D
[16:57:52] I am curious to see how much time it takes to bootstrap
[16:58:06] does it take so much memory because it loads a huge model?
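[Editor's note: the question above — how to read a node's allocatable memory without `kubectl` node-view permission — can be answered from kube-state-metrics via the Prometheus endpoint backing the Grafana dashboard linked above. The sketch below is a minimal illustration, not a reviewed tool: `prometheus_url` is a placeholder, and the `kube_node_status_allocatable` metric name is the standard kube-state-metrics one, assumed to be exported in this cluster.]

```python
import re

# Multipliers for Kubernetes quantity suffixes (binary and decimal).
K8S_SUFFIXES = {
    "Ki": 1024, "Mi": 1024**2, "Gi": 1024**3, "Ti": 1024**4,
    "k": 1000, "M": 1000**2, "G": 1000**3, "T": 1000**4,
}

def parse_quantity(q: str) -> int:
    """Parse a Kubernetes quantity string like '64Gi' or '70G' into bytes."""
    m = re.fullmatch(r"(\d+(?:\.\d+)?)([A-Za-z]*)", q.strip())
    if not m:
        raise ValueError(f"unparseable quantity: {q!r}")
    value, suffix = m.groups()
    return int(float(value) * K8S_SUFFIXES.get(suffix, 1))

def allocatable_memory_by_node(prometheus_url: str) -> dict:
    """Per-node allocatable memory in bytes, read from kube-state-metrics
    via the Prometheus HTTP API (no `kubectl get node` permission needed)."""
    import requests  # third-party; assumed available
    resp = requests.get(
        f"{prometheus_url}/api/v1/query",
        params={"query": 'kube_node_status_allocatable{resource="memory"}'},
        timeout=10,
    )
    resp.raise_for_status()
    results = resp.json()["data"]["result"]
    return {r["metric"]["node"]: float(r["value"][1]) for r in results}

# The pure helper can be checked locally:
print(parse_quantity("64Gi"))  # 68719476736
```

Summing the returned dict should reproduce the ~644GB total visible on the Grafana dashboard, while the per-node values show whether any single host can fit a 64Gi pod.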
[16:58:53] yes, the model is 61GB on disk
[16:59:28] ahahahah
[16:59:42] I guess that maybe 70GB might not be enough
[17:00:02] https://huggingface.co/CohereForAI/aya-expanse-32b
[17:00:05] it's this one
[17:00:33] TIL Cohere For AI
[17:06:58] readiness probe failed, it probably needs the longer probes
[17:07:59] btw, in the end I don't think we'll need that much pod memory, as we'd work on loading directly to GPU (or streaming the weights from CPU to GPU)
[17:08:29] that would be great, yes
[17:09:43] yes, there is no reason to occupy resources that are used just for model load
[17:10:01] also -> https://www.amd.com/en/developer/resources/technical-articles/introducing-the-first-amd-1b-language-model.html
[17:10:30] I remember you shared the first OLMo model (by AllenAI). I guess these models will work great on AMD GPUs :P
[17:11:26] \o/
[17:13:20] it seems that the server has an error and is restarting: `2024-11-08 17:09:38.478 7 kserve ERROR [__main__.py:():259] Failed to start model server: You can't move a model that has some modules offloaded to cpu or disk`
[17:13:32] I'll look into it, thanks for helping, Luca!
[17:18:18] ack! good luck :)
[17:20:24] tbh for now I think I'll just revert everything
[17:37:22] https://m.mediawiki.org/wiki/Wikimedia_Hackathon_2025 !!
[17:46:48] Machine-Learning-Team: Test the feasibility of deployment of Aya-23 model in LiftWing - https://phabricator.wikimedia.org/T379052#10304826 (isarantopoulos) I made a first attempt to deploy the 32B model on LiftWing and I'm dumping some notes for future reference: It seems that the model couldn't fit on the...
[18:38:15] I uploaded the latest 8B Aya model to deploy that one instead of Aya-23: https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1088609
[18:38:49] going afk, have a nice weekend folks o/
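[Editor's note: the sizing being debated above follows from simple arithmetic — weight footprint is roughly parameter count times bytes per parameter, so a 32B-parameter model in bf16/fp16 needs about 64GB for weights alone, consistent with the 61GB on-disk size and with why a 70GB pod limit is marginal. The `kserve` error quoted at 17:13:20 is the message Hugging Face accelerate raises when `.to(device)` is called on a model whose modules were offloaded to CPU/disk (e.g. after loading with `device_map="auto"`); that interpretation is an inference from the message text, not from the service's code. A back-of-envelope sketch:]

```python
def model_footprint_bytes(n_params: float, bytes_per_param: int = 2) -> float:
    """Rough weight footprint: parameters times bytes per parameter
    (2 for fp16/bf16, 4 for fp32). Ignores activations and KV cache."""
    return n_params * bytes_per_param

aya_32b = model_footprint_bytes(32e9)  # ~64 GB in bf16
aya_8b = model_footprint_bytes(8e9)    # ~16 GB in bf16
print(f"32B model: ~{aya_32b / 1e9:.0f} GB, 8B model: ~{aya_8b / 1e9:.0f} GB")
```

This also explains the 18:38 decision: the 8B variant at roughly 16GB of weights fits comfortably within existing staging limits.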