[00:14:16] cool! [07:50:46] Good morning! [07:51:09] GPUUUU!!! [08:28:09] morning! [08:28:12] wow nice! [08:30:03] Just one item: [08:30:13] [drm:amdgpu_pci_probe [amdgpu]] *ERROR* amdgpu requires firmware installed [08:30:14] amdgpu: See https://wiki.debian.org/Firmware for information about missing firmware [08:30:20] But that should be easily fixable [08:30:30] also, good morning :) [08:30:44] klausman: puppet will take care of that, we'll probably need to reboot though [08:30:59] there is a flag to install all the configs for a kubernetes node [08:31:06] if you want we can configure it now [08:31:38] As for the install: two GPUs would fit, with one small and a bigger caveat. 1) DCops would need to order some more cabling. 2) heat output of the machine would likely mean that we can't have more than 1-2 multi-CPU machines on a rack [08:32:05] elukey: Is there a guide/docs on wikitech regarding that? [08:32:29] yeah IIRC 1) was anticipated by Papaul in another task, 2) was probably expected.. maybe supermicro will have some extra cooling solutions? [08:32:47] klausman: you can check puppet, ml-serve1001 already has one gpu (dse workers as well) [08:32:53] ack, will do [08:34:20] is it just `profile::amd_gpu::rocm_version: '54'`? [08:35:59] how can we verify if it is enough? [08:36:56] make a change, run pcc? [08:37:24] sure, but the puppet classes/profiles should also give us a hint if anything is needed or not [08:37:51] the above hiera setting is related to a profile, we can start from there [08:38:29] Well, role::ml_k8s::worker includes ::profile::amd_gpu, so we should be good [08:39:16] Since profile::amd_gpu has only if $rocm_version { as a gate, I doubt any other flags are needed [08:39:18] what role is used by the staging nodes? [08:40:01] role::ml_k8s::staging::worker which does not use the amd_gpu include, so we have to add that [08:40:22] ok perfect, then there is another bit in the profile::amd_gpu that is relevant for k8s [08:40:29] that needs to be taken into consideration as well [08:40:52] try to check what profile::amd_gpu does, we use it in two places [08:40:52] 'profile::amd_gpu::allow_gpu_broader_access'?
[08:41:01] 1) regular nodes (like stat100x) [08:41:04] 2) kubernetes nodes [08:41:26] 'profile::amd_gpu::is_kubernetes_node' is also used [08:41:43] in 2) we deploy more things if you recall, like the gpu device plufin [08:41:46] *plugin [08:41:48] yes [08:42:15] and you can see in the class amd_rocm that we install firmware-amd-graphics [08:43:05] basically what I am trying to suggest is a quick exploratory reading of profiles when we need to install something, so we have an idea about what they do [08:43:09] pcc will confirm it of course [08:43:52] (I didn't really remember what profile::amd_gpu did in detail, I rechecked it 10 mins ago) [08:44:18] I will try and make a change for review [08:46:47] (03CR) 10Elukey: [C: 03+2] Add resource_utils shared module [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/982401 (owner: 10Elukey) [08:50:24] https://puppet-compiler.wmflabs.org/output/982761/881/ml-staging2001.codfw.wmnet/index.html [08:51:19] looks fine to me [08:52:59] currently also running pcc against 2002, to see if anything changes there [08:53:21] (03Merged) 10jenkins-bot: Add resource_utils shared module [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/982401 (owner: 10Elukey) [08:53:35] Only a profile change, but ni on-disk changes [08:53:39] no* [08:55:21] perfect [08:57:11] kevinbazira: o/ [08:57:28] when you have a moment I'd like to discuss with you https://gerrit.wikimedia.org/r/c/machinelearning/liftwing/inference-services/+/982407 [08:57:42] elukey: o/ [08:57:57] I think that it should work fine, but we'd need to test if it does what we think it does [08:58:15] sure sure, thank you for working on this. [08:58:23] I can try to test locally, but I was wondering if you had a quicker way to test the new change [08:58:40] basically the idea is to unset OMP_NUM_THREADS [08:58:41] let me test it on the ml-sandbox and let you know how it goes [08:58:53] and check with various cpu settings [08:59:11] the caveat is that we should also check how many threads are created [08:59:28] with something like ps -eLf [09:00:21] once it is deployed I can jump on the node and check the threads [09:00:34] I'd expect them to be somehow in line with how many cpus we set [09:01:35] I read https://docs.python.org/3/library/os.html#os.putenv and afaics setting os.environ["something"] also forces a call to putenv(), and OMP_NUM_THREADS should be preserved [09:13:26] https://www.irccloud.com/pastebin/AeRl2MwN/ [09:13:38] elukey: ran puppet-agent on 2001 successfully, now rebooting it (using cookbook, of course) [09:13:57] elukey: it's throwing the error below: [09:14:02] https://www.irccloud.com/pastebin/CcGqjm4P/ [09:18:05] ah snap fixing! [09:19:21] (03PS7) 10Elukey: article-descriptions: set OMP_NUM_THREADS automatically [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/982407 (https://phabricator.wikimedia.org/T343123) [09:19:24] kevinbazira: --^ [09:19:31] checking ... [09:27:43] the error is gone and the test was able to return a prediction. it also showed the message below: [09:27:43] ``` [09:27:44] INFO:root:Not inside a Cgroup v2, defaulting to the host's cpu count [09:27:44] ``` [09:27:44] I've been using 1 CPU, so I'll run the same test with more CPUs and let you know how it goes. 
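For reference, the cgroup-aware CPU detection being tested above works roughly like the sketch below: read the container's CPU quota from the cgroup v2 `cpu.max` file and fall back to the host's CPU count when that file is missing, which is what produces the "Not inside a Cgroup v2" message on a cgroup v1 host. This is only an approximation of what `python/resource_utils.py` in the inference-services repo does, assuming it derives the count from `cpu.max`; the actual implementation may differ in details.

```
import math
import os


def get_cpu_count() -> int:
    """Approximate the number of CPUs available to the current container."""
    cpu_max = "/sys/fs/cgroup/cpu.max"  # only present on cgroup v2 hierarchies
    try:
        with open(cpu_max) as f:
            quota, period = f.read().split()
    except FileNotFoundError:
        # No cgroup v2 CPU controller visible (e.g. a cgroup v1 host):
        # fall back to the host's CPU count.
        print("Not inside a Cgroup v2, defaulting to the host's cpu count")
        return os.cpu_count()
    if quota == "max":
        # No CPU limit configured for this cgroup.
        return os.cpu_count()
    # docker run --cpus=1 -> "100000 100000", --cpus=2 -> "200000 100000", etc.
    return math.ceil(int(quota) / int(period))
```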
[09:28:05] elukey: interesting problem: the version of firmware-amd-graphics we have on the staging machines (bullseye, 20210315-3) does not have the required firmware files for the MI100 (/usr/lib/firmware/amdgpu/arcturus_gpu_info.bin). Checking whether bookworm does. [09:28:13] kevinbazira: mmm lemme check one thing, it shouldn't say that [09:28:25] it should recognize the right number of cpus [09:29:10] klausman: let's check first https://packages.debian.org/bullseye-backports/firmware-amd-graphics [09:29:33] same as bookworm afaics [09:29:38] It's in both, yes [09:31:20] nice :) [09:31:36] we just need to pin the package to backports for bullseye [09:31:47] Yes, using apt::package_from_bpo I suppose [09:32:09] never seen it, but if you see other examples in puppet yes [09:33:11] They are all in modules, so I'm not sure where we'd put it [09:34:02] the package is defined in a module IIRC [09:34:12] If we put it in modules/amd_rocm/manifests/init.pp, we'd also affect non-ML machines [09:34:31] yes but it is fine, we want exactly that [09:35:03] kevinbazira: ahhh wow the ml-sandbox uses cgroups v1! [09:35:11] that we don't support anymore [09:35:19] morning! gpuuuuu \o/ [09:35:26] this is why it is not recognizing the cgroup [09:35:30] aiko: o/ [09:36:07] klausman: any other machine, like a stat100x, will have the same problem if a MI100 is installed [09:39:38] True, I was just worried about changing other people's machines behind their back, so to speak. [09:40:28] It is fine, you can quickly ping folks in #wikimedia-analytics as FYI, but we are mostly in charge of the GPUs across nodes [09:41:42] kevinbazira: I'd need to reboot the ml-sandbox, is it ok? We lose all running containers, and we'll have to bootstrap minikube again [09:44:05] elukey: woah ... is there a way to quickly backup the files in my profile before the reboot? If I do it on my end the backup will take forever. [09:44:36] kevinbazira: what do you need to backup? [09:44:37] it's ok to lose the containers [09:45:06] the VM may crash anytime, I'd suggest not to save important stuff on it [09:46:20] nothing is important, just the workflow I set up in: [09:46:20] ``` [09:46:20] kevinbazira@ml-sandbox:~$ pwd [09:46:20] /home/kevinbazira [09:46:20] ``` [09:46:50] if it is only containers that will be lost then I can quickly rebuild those [09:48:36] I can try to do one, but you have some model binaries that are big [09:49:32] and the space under /srv is not big [09:49:33] mmm [09:50:15] yep, those are the binaries I've been using to run the tests [09:51:29] kevinbazira: I'd suggest one thing - copy only the text files to your laptop (the ones with scripts etc..), and add them to a gitlab repo later on (one under your username) [09:51:57] worst case we lose the binaries but we can recover them [09:52:06] and you have your logic backed up [09:52:39] if you keep the repo in sync then if the VM disappears for any reason you don't need to rebuild everything [09:52:45] just copy the model binaries again [09:53:02] (basically pull them from https://analytics.wikimedia.org/published/wmf-ml-models/) [09:53:05] does it make sense? [09:54:49] yes, it does. most of the text files I have locally on my laptop too. I mainly use the sandbox to run them for testing purposes as it has more resources than my machine. [09:55:19] we can go ahead and reboot. I'll set the test environment up again. [09:58:06] super, in theory it shouldn't cause anything weird, I hope it will reboot fine [09:58:21] is there any tutorial about how minikube runs?
[09:58:29] I wanted to stop it, but sudo minikube shows no cluster [09:59:06] IIRC: https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/ML-Sandbox/Configuration [10:01:35] and: https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/ML-Sandbox [10:04:53] kevinbazira: ah ok minikube runs under your username [10:05:06] I stopped it via [10:05:06] sudo -u kevinbazira minikube stop [10:05:08] okok [10:05:10] rebooting [10:11:59] o/ if anyone has time for a quick review https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/982770 [10:13:28] isaranto: 5 euros! [10:13:31] kevinbazira: the VM is up [10:13:48] elukey: that's cheap! [10:15:03] figured out we need to modify the model-function to be able to upload a directory [10:15:21] ? [10:15:53] one question - how is nllb using multi cpus? [10:16:03] if I'm not mistaken the command will upload one file [10:16:35] ahhh okok so you were talking about the upload script [10:17:56] sorry didn't write the context, I thought you were in my head 😛 [10:18:03] :D [10:18:23] so the change looks good, do we need any OMP_NUM_THREADS etc.. to be set? [10:19:06] kevinbazira: ok rechecked and now we are using cgroups v2, so in theory now if you test again article-descriptions it should tell you the right number of CPUs in the log msg [10:19:18] (basically the same ones that you set via --cpus=X) [10:19:37] okok, let me check ... [10:22:54] for nllb and cpus: I don't recall how we ended up requesting more cpus. I'll remove this for now and we can increase CPUs when we experiment with multiprocessing [10:23:51] sure sure, go ahead +1ed [10:24:04] I was just curious :) [10:25:10] elukey: stat1004 in PCC was a typo :D [10:25:18] I'm not sure if tokenizing is faster. But regarding the internals of the underlying frameworks afaik pytorch won't be faster out-of-the-box with more cpus [10:25:28] elukey: I am still getting the message below: [10:25:28] ``` [10:25:28] INFO:root:Not inside a Cgroup v2, defaulting to the host's cpu count [10:25:28] ``` [10:25:28] based on this: https://github.com/wikimedia/machinelearning-liftwing-inference-services/blob/main/python/resource_utils.py#L18-L20 [10:25:29] indeed that file doesn't exist in the container I am using on the ML sandbox: [10:25:29] ``` [10:25:30] somebody@d2a45cd90677:/srv/article_descriptions/model_server$ ls /sys/fs/cgroup/ [10:25:30] cgroup.controllers cgroup.max.descendants cgroup.stat cgroup.threads system.slice [10:25:31] cgroup.max.depth cgroup.procs cgroup.subtree_control init.scope user.slice [10:25:31] ``` [10:27:05] aiko: if you end up writing a summary for batch requests, you could add it to the readme.md file of the repo (either the main one or just for revertrisk) [10:27:26] just a suggestion so that we have the documentation next to the code [10:28:41] kevinbazira: mmmm I tested one docker run and it was using a cgroup v2, how do you run the container? [10:31:04] elukey: I used `docker exec -it d2a45cd90677 /bin/bash` [10:31:17] kevinbazira: no no I mean how did you start it [10:33:03] elukey: to start it I used `docker start d2a45cd90677` [10:33:03] to create it I used `docker run -it --cpus=1 --memory=4g --entrypoint=/bin/bash article-descriptions:local-run` [10:35:29] kevinbazira: aahhh okok so please use the docker run, with docker start you use the previously built container with the old cgroup settings [10:35:33] it should work now [10:36:02] okok let me check ... [10:45:18] isaranto: ok!
I'll do that [10:45:44] only if you find it useful as well [10:49:06] (03PS1) 10Ilias Sarantopoulos: llm: fix missing python utils [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/982776 (https://phabricator.wikimedia.org/T352834) [10:51:00] (03PS2) 10Ilias Sarantopoulos: llm: fix missing python utils [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/982776 (https://phabricator.wikimedia.org/T352834) [10:52:41] well, nllb was missing python directory. I plan to deploy the changes for the rest of the model servers either later today or tomorrow [10:57:23] elukey: the message is gone and it's now picking the right number of CPUs set: [10:57:23] ``` [10:57:23] >>> from python.resource_utils import get_cpu_count [10:57:23] >>> get_cpu_count() [10:57:23] 1 [10:57:23] ``` [10:57:33] AMD firmware change merged and machine rebooted: works fine! [10:57:38] kevinbazira: \o/ [10:57:48] ok so lemme check the OMP threads [10:58:01] radeontop also working and reporting credible numbers [10:58:06] (03CR) 10Kevin Bazira: [C: 03+1] article-descriptions: set OMP_NUM_THREADS automatically [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/982407 (https://phabricator.wikimedia.org/T343123) (owner: 10Elukey) [10:58:43] (03CR) 10Ilias Sarantopoulos: [C: 03+2] article-descriptions: set OMP_NUM_THREADS automatically [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/982407 (https://phabricator.wikimedia.org/T343123) (owner: 10Elukey) [10:58:57] (03CR) 10Ilias Sarantopoulos: article-descriptions: set OMP_NUM_THREADS automatically [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/982407 (https://phabricator.wikimedia.org/T343123) (owner: 10Elukey) [11:00:21] (03CR) 10CI reject: [V: 04-1] llm: fix missing python utils [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/982776 (https://phabricator.wikimedia.org/T352834) (owner: 10Ilias Sarantopoulos) [11:00:56] (03CR) 10Ilias Sarantopoulos: [C: 03+1] article-descriptions: set OMP_NUM_THREADS automatically [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/982407 (https://phabricator.wikimedia.org/T343123) (owner: 10Elukey) [11:01:40] klausman: nice! [11:01:45] (03CR) 10Ilias Sarantopoulos: "recheck" [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/982776 (https://phabricator.wikimedia.org/T352834) (owner: 10Ilias Sarantopoulos) [11:02:02] nice work! [11:02:30] elukey: I accidentally +2 and then reset the vote. all good though I put +1 [11:03:22] :D [11:04:36] kevinbazira: the article-description container on ml-sandbox is ready to predict (so model loaded etc) ? [11:05:16] Is there a somewhat self-contained "hello world" I could run to test the GPU working? [11:06:24] klausman: IIRC I used https://phabricator.wikimedia.org/P54375 in the default namespace [11:07:19] elukey: yes the one in the experimental namespace. we've been using this to query it: [11:07:19] ``` [11:07:19] time curl "https://inference-staging.svc.codfw.wmnet:30443/v1/models/article-descriptions:predict" -X POST -d '{"lang": "en", "title": "Clandonald", "num_beams": 2}' [11:07:19] ``` [11:08:01] elukey: thx, will try that after lunch [11:08:17] kevinbazira: ah nono I mean where you tested the various cpus, since I wanted to make sure that the OMP env variable is set with the new code [11:08:26] in theory we shouldn't see too many threads etc.. 
I've tested it with 1 CPU so far, going to test with more CPUs and share the results [11:10:17] kevinbazira: no rush, I wanted to know where you are testing though, since I'd need to check the number of threads [11:11:32] I am testing from the ml sandbox using a container with id: 790124dcddf9 [11:11:49] how are you checking the number of threads? [11:12:09] in theory via ps -eLf it should be sufficient [11:12:40] but I don't see any for the running container [11:13:02] so I was wondering if maybe we need to load the model or similar, or make the first request [11:17:16] To make a request, you can run: [11:17:17] ``` [11:17:17] $ docker exec -it 790124dcddf9 /bin/bash [11:17:17] $ time curl localhost:8080/v1/models/article-descriptions:predict -X POST -d '{"lang": "en", "title": "Clandonald", "num_beams": 2}' -H "Content-type: application/json" [11:17:17] ``` [11:17:43] ah yes now I see python threads :) [11:18:18] I see 32 of them, that are hopefully ok [11:23:21] 29 sorry [11:23:28] ps -eLf | grep [m]odel_server | wc -l [11:31:35] kevinbazira: ok so lemme know when you test other cpu values so I'll check threads [11:31:49] (or you can, I am interested in the output of ps) [11:32:05] with one cpu we have 29 threads, that is not what I expected but it may be ok-ish [11:32:15] if we double with two cpus etc.. it is ok [11:32:38] sure sure, let me run the test with more CPUs [11:36:55] anytime, no rush! [11:46:09] elukey: sure, I've checked 2 CPUs in container with id: f0fdf2409f41, and still get 29 threads [11:46:32] and two cpus recognized right? [11:47:09] yes, they are: [11:47:09] ``` [11:47:09] >>> from python.resource_utils import get_cpu_count [11:47:09] >>> get_cpu_count() [11:47:09] 2 [11:47:09] ``` [11:49:04] okok [11:50:07] (03CR) 10Elukey: [C: 03+2] article-descriptions: set OMP_NUM_THREADS automatically [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/982407 (https://phabricator.wikimedia.org/T343123) (owner: 10Elukey) [11:50:42] kevinbazira: I think that we can test --^ in staging with multiple cpus, to see how it goes [11:50:58] going afk for lunch, if you have time we can do it later [11:51:05] * elukey lunch! [11:51:07] (03Merged) 10jenkins-bot: article-descriptions: set OMP_NUM_THREADS automatically [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/982407 (https://phabricator.wikimedia.org/T343123) (owner: 10Elukey) [11:51:10] ok, that's fine. [11:51:17] enjoy your lunch! [11:51:22] thanks! [11:51:42] going for lunch as well! [11:52:04] enjoy your lunch too! [11:53:19] If I can get a review here from someone that would be awesome https://gerrit.wikimedia.org/r/c/machinelearning/liftwing/inference-services/+/982776/ [11:53:30] I'm going to open a patch afterwards to update all the images [11:54:19] (03CR) 10Kevin Bazira: [C: 03+1] llm: fix missing python utils [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/982776 (https://phabricator.wikimedia.org/T352834) (owner: 10Ilias Sarantopoulos) [11:56:31] thanks Kevin!
[11:56:37] (03CR) 10Ilias Sarantopoulos: [C: 03+2] llm: fix missing python utils [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/982776 (https://phabricator.wikimedia.org/T352834) (owner: 10Ilias Sarantopoulos) [11:56:45] (03CR) 10Ilias Sarantopoulos: llm: fix missing python utils [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/982776 (https://phabricator.wikimedia.org/T352834) (owner: 10Ilias Sarantopoulos) [11:56:48] (03PS3) 10Ilias Sarantopoulos: llm: fix missing python utils [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/982776 (https://phabricator.wikimedia.org/T352834) [11:56:57] (03CR) 10Ilias Sarantopoulos: [C: 03+2] llm: fix missing python utils [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/982776 (https://phabricator.wikimedia.org/T352834) (owner: 10Ilias Sarantopoulos) [12:08:07] (03Merged) 10jenkins-bot: llm: fix missing python utils [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/982776 (https://phabricator.wikimedia.org/T352834) (owner: 10Ilias Sarantopoulos) [12:13:41] (03PS1) 10AikoChou: outlink: upgrade kserve to 0.11.2 [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/982783 (https://phabricator.wikimedia.org/T347549) [12:18:09] ---^ I built the docker images locally and have tested them. The model server works well [12:23:42] isaranto: tested https://gerrit.wikimedia.org/r/c/machinelearning/liftwing/inference-services/+/982043 with kserve 0.11.2 without any issues :) [12:25:45] * aiko lunch! [13:14:04] I updated llm and readability images https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/982795 [13:19:24] elukey: how do nodes get labels like amd.com/gpu? 2001 doesn't have it, so I presume it's not happening automagically? [13:19:38] I dug around puppet, but there wasn't anything obvious [13:22:33] nvm, it's not a label [13:23:08] Hm. Still not scheduling for some reason [13:25:03] Ah. the GPU LLM pod stole it :D [13:26:22] isaranto: do we have an example query for the GPU nllb200? [13:36:06] https://phabricator.wikimedia.org/P54378 [13:36:48] klausman: I'll have to deploy the latest image first [13:36:56] ack [13:38:25] and we have two issues we need to solve (not now but generally) . 1) we'll have 1 GPU in staging so redeploying sth is going to be a challenge. [13:39:50] Yeah, I'd like to have two machines with a GPU in staging in the mid-term, to avoid a SPOF [13:39:58] cool! [13:41:21] And since we're planning on adding machines to staging anyway, I think that may happen simultaneously. As we found out yesterday, we need more than two staging machines anyway, orthogonal to having GPUs there. Fortunately, they're budgeted/in procurement [13:53:54] klausman: shall I sync changes in ml-staging as well? [13:54:03] please do [13:55:00] will do [13:55:16] I updated the paste above to include a sample request for ml-staging [13:55:37] merci! [14:01:55] That 404s, checking whether I got the right endpoint [14:02:30] shoot. the llm server is failing due to a bad import. I remember testing it but lost track with all the changes yesterday [14:02:49] klausman: bright side ml-staging is in an old image so it is running find [14:03:11] *fine. 
to try the GPU run the third request here https://phabricator.wikimedia.org/P54378 [14:03:42] yep, that works and I can see GPU usage [14:03:49] real 0m1.666s [14:03:51] Not bad [14:06:51] 1/10 of the time without the GPU [14:07:02] *than without [14:07:14] GPU VRAM use is about 4.7G [14:07:37] (03PS1) 10Ilias Sarantopoulos: llm: fix circular import [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/982802 [14:07:43] (03PS2) 10Ilias Sarantopoulos: llm: fix circular import [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/982802 [14:08:28] sorry folks for the above, one last push! I tested it locally and it works [14:10:49] (03PS3) 10Ilias Sarantopoulos: llm: fix circular import [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/982802 [14:11:44] (03PS4) 10Ilias Sarantopoulos: llm: fix circular import [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/982802 [14:12:57] (03PS5) 10Ilias Sarantopoulos: llm: fix circular import [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/982802 [14:13:09] import hell [14:15:58] the gpu works :) [14:17:07] (03Abandoned) 10Elukey: blubber: add the transformer dir to outlink's transformer image [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/982433 (owner: 10Elukey) [14:18:03] (03CR) 10Ilias Sarantopoulos: outlink: upgrade kserve to 0.11.2 (031 comment) [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/982783 (https://phabricator.wikimedia.org/T347549) (owner: 10AikoChou) [14:18:49] great work folks setting it up! that was really fast! [14:22:59] (03CR) 10Kevin Bazira: llm: fix circular import (031 comment) [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/982802 (owner: 10Ilias Sarantopoulos) [14:23:19] if somebody has time for a quick change: https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/982803/ [14:25:06] thanks :) [14:33:04] (03PS6) 10Ilias Sarantopoulos: llm: fix circular import [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/982802 [14:35:12] (03PS7) 10Ilias Sarantopoulos: llm: fix circular import [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/982802 [14:36:54] kevinbazira: the change doesn't work sadly [14:37:43] (03PS8) 10Ilias Sarantopoulos: llm: fix circular import [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/982802 [14:38:51] (03CR) 10Ilias Sarantopoulos: llm: fix circular import (031 comment) [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/982802 (owner: 10Ilias Sarantopoulos) [14:38:53] elukey: thank you for helping with this option. I am going to continue looking into other optimization options. :) [14:40:47] (03CR) 10Kevin Bazira: [C: 03+1] llm: fix circular import (031 comment) [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/982802 (owner: 10Ilias Sarantopoulos) [14:41:10] (03CR) 10Ilias Sarantopoulos: [C: 03+2] llm: fix circular import [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/982802 (owner: 10Ilias Sarantopoulos) [14:41:35] kevinbazira: I'll keep working on the solution, worst case I'll restore OMP_NUM_THREADS [14:42:02] okok, thank you! 
[14:56:54] (03CR) 10AikoChou: outlink: upgrade kserve to 0.11.2 (031 comment) [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/982783 (https://phabricator.wikimedia.org/T347549) (owner: 10AikoChou) [14:58:28] (03CR) 10Ilias Sarantopoulos: [V: 03+2 C: 03+2] llm: fix circular import [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/982802 (owner: 10Ilias Sarantopoulos) [14:59:28] (03CR) 10Ilias Sarantopoulos: outlink: upgrade kserve to 0.11.2 (031 comment) [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/982783 (https://phabricator.wikimedia.org/T347549) (owner: 10AikoChou) [14:59:32] (03PS2) 10Ilias Sarantopoulos: outlink: upgrade kserve to 0.11.2 [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/982783 (https://phabricator.wikimedia.org/T347549) (owner: 10AikoChou) [15:01:23] (03CR) 10Ilias Sarantopoulos: [C: 03+1] outlink: upgrade kserve to 0.11.2 (031 comment) [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/982783 (https://phabricator.wikimedia.org/T347549) (owner: 10AikoChou) [15:37:28] 10Machine-Learning-Team: Implement caching for revertrisk-multilingual - https://phabricator.wikimedia.org/T353333 (10isarantopoulos) [15:39:16] 10Machine-Learning-Team: Goal: Increase the number of models hosted on Lift Wing - https://phabricator.wikimedia.org/T353335 (10isarantopoulos) [15:54:40] 10Machine-Learning-Team: Goal: Inference Optimization for Hugging face models - https://phabricator.wikimedia.org/T353337 (10isarantopoulos) [15:55:24] 10Machine-Learning-Team: Goal: Inference Optimization for Hugging face/Pytorch models - https://phabricator.wikimedia.org/T353337 (10isarantopoulos) [15:55:40] 10Machine-Learning-Team: Goal: Implement caching for revertrisk-multilingual - https://phabricator.wikimedia.org/T353333 (10isarantopoulos) [15:59:21] 10Machine-Learning-Team: Goal: Expand Lift Wing Cluster and add GPU capacity to production - https://phabricator.wikimedia.org/T353338 (10isarantopoulos) [15:59:51] 10Machine-Learning-Team, 10Goal: Goal: Lift Wing users can request multiple predictions using a single request. - https://phabricator.wikimedia.org/T348153 (10isarantopoulos) p:05Medium→03Triage [16:07:10] one final image update ( famous last words) -> https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/982849 [16:10:52] (03CR) 10AikoChou: [C: 03+2] outlink: upgrade kserve to 0.11.2 [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/982783 (https://phabricator.wikimedia.org/T347549) (owner: 10AikoChou) [16:14:44] thanks for the review Luca! [16:15:36] (03Merged) 10jenkins-bot: outlink: upgrade kserve to 0.11.2 [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/982783 (https://phabricator.wikimedia.org/T347549) (owner: 10AikoChou) [16:16:52] np! [16:17:17] so in article-description my hack to add OMP_NUM_THREADS automatically doesn't work [16:17:33] namely torch creates a ton of threads that causes throttling [16:18:50] I've read some reports of people having the same issue with numpy, and the os.environ[] entry needed to be added before the import [16:19:20] in our case this would need to be done in __init__.py, in theory [16:21:28] or model.py [16:21:46] I am doing it in there [16:21:59] but I believe that we import ModelLoader from utils, that carries torch [16:22:07] or, run another script before the model.py command in the same container [16:22:54] wdyt? 
kind of like a hack but it mimics an init container behavior [16:22:57] but the os env variable will not stick, it is process-based [16:23:32] export works in a bash shell since the processes that you create are children of it [16:23:44] ok, I take it back! [16:25:35] ok, nllb is failing again. I'll work on it properly tomorrow. the issue I have is that I need to build a different image in order for it to run fast enough on M1, but I'll do that [16:26:03] now I'm missing a requirement 👎 [16:30:19] (03PS1) 10Elukey: article-descriptions: move the OMP_NUM_THREADS declaration sooner [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/982855 (https://phabricator.wikimedia.org/T352750) [16:30:53] I need to test it but --^ is the idea [16:32:08] (03PS2) 10Elukey: article-descriptions: move the OMP_NUM_THREADS declaration sooner [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/982855 (https://phabricator.wikimedia.org/T352750) [16:39:10] (03CR) 10Elukey: "Kevin, we'd need to test this on ml-sandbox if possible :(" [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/982855 (https://phabricator.wikimedia.org/T352750) (owner: 10Elukey) [17:06:25] 10Machine-Learning-Team: Upgrade outlink docker images to KServe 0.11 - https://phabricator.wikimedia.org/T347549 (10achou) Next steps: * Roll out the new docker images to ml-staging * Perform load testing to ensure consistent performance * Test the publishing of events from staging to EventGate Regarding the... [17:09:53] the space is almost all used on ml-sandbox, I can't build a docker img [17:10:54] ok nice I cleaned up the dangling build cache [17:10:58] reclaimed ~16G [17:11:45] elukey: o/ ----^ I'll need to test outlink's event publishing from staging to EventGate, so I'd like to go for the first option you proposed in https://phabricator.wikimedia.org/T349919 to create a new testing stream specifically for prediction-change events. Wdyt? [17:15:15] I'm going afk folks, will continue on the llm image tomorrow. have a nice evening! [17:15:24] \o heading out as well [17:16:00] 10Machine-Learning-Team: Apply common settings to publish events from Lift Wing staging to EventGate - https://phabricator.wikimedia.org/T349919 (10achou) a:03achou [17:16:20] (03CR) 10Kevin Bazira: article-descriptions: move the OMP_NUM_THREADS declaration sooner (031 comment) [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/982855 (https://phabricator.wikimedia.org/T352750) (owner: 10Elukey) [17:17:31] bye Ilias and Tobias, have a nice rest of the day :) [17:20:29] kevinbazira: (if you are still online) I don't see any log related to the cpu count from docker logs, do you see any? [17:23:21] elukey: how are you checking for this?
[17:23:33] docker logs $id-of-the-container [17:27:02] (03PS3) 10Elukey: article-descriptions: explictly set torch threads [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/982855 (https://phabricator.wikimedia.org/T352750) [17:27:07] new version :) [17:28:09] (03PS4) 10Elukey: article-descriptions: explictly set torch threads [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/982855 (https://phabricator.wikimedia.org/T352750) [17:29:24] (03PS5) 10Elukey: article-descriptions: explictly set torch threads [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/982855 (https://phabricator.wikimedia.org/T352750) [17:29:30] better --^ [17:30:28] ah no it is a function [17:31:06] (03PS6) 10Elukey: article-descriptions: explictly set torch threads [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/982855 (https://phabricator.wikimedia.org/T352750) [17:31:12] this should work kevinbazira --^ [17:31:41] (if it is late for you we can do it tomorrow) [17:31:41] ok, checking ... [17:32:32] nono since this was planned to be your last day before your leave, we can proceed and complete this today [17:32:48] I am in tomorrow afternoon, don't worry :) [17:32:57] (unless something changes in the meantime :D) [17:40:27] 10Machine-Learning-Team: Apply common settings to publish events from Lift Wing staging to EventGate - https://phabricator.wikimedia.org/T349919 (10Ottomata) My brain doesn't remember exactly what this did, but is > If the eventgate's chart is migrated to the ingress module done now that https://gerrit.wikimed... [17:41:44] aiko: new testing stream should be fine. https://wikitech.wikimedia.org/wiki/Event_Platform/Stream_Configuration#Stream_versioning [17:43:13] ottomata: o/ [17:43:18] maybe .dev0 suffix? or whatever you prefer. consider setting canary_events_enabled: false [17:43:23] elukey: hewo! [17:43:33] re: staging endpoint for eventgate - I think that we need to add the ingress module [17:43:39] to the chart I mean [17:43:57] it is a big change, I think it needs to be done very carefully [17:44:02] the VIP changes etc.. [17:45:14] oh lkay [17:45:15] okay [17:45:21] i think i don't know what 'ingress module' means then :) [17:45:50] basically you delegate all the routing etc.. to the Istio ingress [17:46:11] that has a single LVS VIP, and everything cnames to it [17:46:31] it is very handy, but it is a massive change in how the requests are routed [17:46:51] it also has a staging endpoint etc.. [17:47:37] elukey: yep, this change shows the OMP_NUM_THREADS log and the threads have dropped to 24 [17:47:57] oh wow [17:48:41] (03CR) 10Kevin Bazira: article-descriptions: explictly set torch threads (031 comment) [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/982855 (https://phabricator.wikimedia.org/T352750) (owner: 10Elukey) [17:51:34] I don't see the log entry but we can check tomorrow [17:51:36] thanks for testing! [17:52:34] I think that the last solution may be better [17:52:41] let's also see what others think [17:52:59] (with the assumption that torch.set_num_threads does the same as OMP_NUM_THREADS) [18:00:44] * elukey afk! [18:03:46] please see screenshot below for the log entry: [18:04:12] https://usercontent.irccloud-cdn.com/file/9w0rgNag/OMP_NUM_THREADS%20log.jpg [18:04:36] enjoy your evening o/ [18:06:53] ottomata: o/ ack, I'll work on that tomorrow!
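As a closing note on the thread-capping discussion above, a minimal sketch of the two approaches mentioned is shown below. OpenMP reads OMP_NUM_THREADS when its runtime initializes, so the environment variable has to be set before torch (or numpy) is imported; alternatively, torch.set_num_threads() can be called after import to cap torch's intra-op thread pool directly. The get_cpu_count import matches the one used in the test paste earlier; the placement and exact calls here are illustrative only, not the actual patch.

```
import os

from python.resource_utils import get_cpu_count

# Option 1: set OMP_NUM_THREADS before anything imports torch/numpy
# (e.g. at the very top of model.py or __init__.py); once the OpenMP
# runtime has initialized, changing the variable no longer resizes the pool.
os.environ["OMP_NUM_THREADS"] = str(get_cpu_count())

import torch  # noqa: E402  -- deliberately imported after the env var is set

# Option 2: cap torch's intra-op threads explicitly after import.
torch.set_num_threads(get_cpu_count())
```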