[00:14:16] cool! [07:50:46] Good morning! [07:51:09] GPUUUU!!! [08:28:09] morning! [08:28:12] wow nice! [08:30:03] Just one item: [08:30:13] [drm:amdgpu_pci_probe [amdgpu]] *ERROR* amdgpu requires firmware installed [08:30:14] amdgpu: See https://wiki.debian.org/Firmware for information about missing firmware [08:30:20] But that should be easily fixable [08:30:30] also, good morning :) [08:30:44] klausman: puppet will take care of that, we'll probably need to reboot though [08:30:59] there is a flag to install all the configs for a kubernetes node [08:31:06] if you want we can configure it now [08:31:38] As for the install: two GPUs would fit, with one small and a bigger caveat. 1) DCops would need to order some more cabling. 2) heat output of the machine would likely mean that we can't have more than 1-2 multi-CPU machines on a rack [08:32:05] elukey: Is there a guide/docs on wikitech regarding that? [08:32:29] yeah IIRC 1) was anticipated by Papaul in another task, 2) was probably expected.. maybe supermicro will have some extra cooling solutions? [08:32:47] klausman: you can check puppet, ml-serve1001 already has one gpu (dse workers as well) [08:32:53] ack, will do [08:34:20] is it just `profile::amd_gpu::rocm_version: '54'`? [08:35:59] how can we verify if it is enough? [08:36:56] make a change, run pcc? [08:37:24] sure, but the puppet classes/profiles should also give us a hint if anything is needed or not [08:37:51] the above hiera setting is related to a profile, we can start from there [08:38:29] Well, role::ml_k8s::worker includes ::profile::amd_gpu, so we should be good [08:39:16] Since profile::amd_gpu has only if $rocm_version { as a gate, I doubt any other flags are needed [08:39:18] what role is used by the staging nodes? [08:40:01] role::ml_k8s::staging::worker which does not use the amd_gpu include, so we have to add that [08:40:22] ok perfect, then there is another bit in the profile::amd_gpu that is relevant for k8s [08:40:29] that needs to be taken into consideration as well [08:40:52] try to check what profile::amd_gpu does, we use it in two places [08:40:52] 'profile::amd_gpu::allow_gpu_broader_access'?
[08:41:01] 1) regular nodes (like stat100x) [08:41:04] 2) kubernetes nodes [08:41:26] 'profile::amd_gpu::is_kubernetes_node' is also used [08:41:43] in 2) we deploy more things if you recall, like the gpu device plufin [08:41:46] *plugin [08:41:48] yes [08:42:15] and you can see in the class amd_rocm that we install firmware-amd-graphics [08:43:05] basically what I am trying to suggest is a quick exploratory reading of profiles when we need to install something, so we have an idea about what they do [08:43:09] pcc will confirm it of course [08:43:52] (I didn't really remember what profile::amd_gpu did in detail, I rechecked it 10 mins ago) [08:44:18] I will try and make a change for review [08:46:47] (03CR) 10Elukey: [C: 03+2] Add resource_utils shared module [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/982401 (owner: 10Elukey) [08:50:24] https://puppet-compiler.wmflabs.org/output/982761/881/ml-staging2001.codfw.wmnet/index.html [08:51:19] looks fine to me [08:52:59] currently also running pcc against 2002, to see if anything changes there [08:53:21] (03Merged) 10jenkins-bot: Add resource_utils shared module [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/982401 (owner: 10Elukey) [08:53:35] Only a profile change, but ni on-disk changes [08:53:39] no* [08:55:21] perfect [08:57:11] kevinbazira: o/ [08:57:28] when you have a moment I'd like to discuss with you https://gerrit.wikimedia.org/r/c/machinelearning/liftwing/inference-services/+/982407 [08:57:42] elukey: o/ [08:57:57] I think that it should work fine, but we'd need to test if it does what we think it does [08:58:15] sure sure, thank you for working on this. [08:58:23] I can try to test locally, but I was wondering if you had a quicker way to test the new change [08:58:40] basically the idea is to unset OMP_NUM_THREADS [08:58:41] let me test it on the ml-sandbox and let you know how it goes [08:58:53] and check with various cpu settings [08:59:11] the caveat is that we should also check how many threads are created [08:59:28] with something like ps -eLf [09:00:21] once it is deployed I can jump on the node and check the threads [09:00:34] I'd expect them to be somehow in line with how many cpus we set [09:01:35] I read https://docs.python.org/3/library/os.html#os.putenv and afaics setting os.environ["something"] also forces a call to putenv(), and OMP_NUM_THREADS should be preserved [09:13:26] https://www.irccloud.com/pastebin/AeRl2MwN/ [09:13:38] elukey: ran puppet-agent on 2001 successfully, now rebooting it (using cookbook, of course) [09:13:57] elukey: it's throwing the error below: [09:14:02] https://www.irccloud.com/pastebin/CcGqjm4P/ [09:18:05] ah snap fixing! [09:19:21] (03PS7) 10Elukey: article-descriptions: set OMP_NUM_THREADS automatically [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/982407 (https://phabricator.wikimedia.org/T343123) [09:19:24] kevinbazira: --^ [09:19:31] checking ... [09:27:43] the error is gone and the test was able to return a prediction. it also showed the message below: [09:27:43] ``` [09:27:44] INFO:root:Not inside a Cgroup v2, defaulting to the host's cpu count [09:27:44] ``` [09:27:44] I've been using 1 CPU, so I'll run the same test with more CPUs and let you know how it goes. 
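For reference, the cgroup-aware CPU detection being tested above works roughly like the sketch below: read the container's CPU quota from the cgroup v2 `cpu.max` file and fall back to the host's CPU count when that file is missing, which is what produces the "Not inside a Cgroup v2" message on a cgroup v1 host. This is only an approximation of what `python/resource_utils.py` in the inference-services repo does, assuming it derives the count from `cpu.max`; the actual implementation may differ in details.

```
import math
import os


def get_cpu_count() -> int:
    """Approximate the number of CPUs available to the current container."""
    cpu_max = "/sys/fs/cgroup/cpu.max"  # only present on cgroup v2 hierarchies
    try:
        with open(cpu_max) as f:
            quota, period = f.read().split()
    except FileNotFoundError:
        # No cgroup v2 CPU controller visible (e.g. a cgroup v1 host):
        # fall back to the host's CPU count.
        print("Not inside a Cgroup v2, defaulting to the host's cpu count")
        return os.cpu_count()
    if quota == "max":
        # No CPU limit configured for this cgroup.
        return os.cpu_count()
    # docker run --cpus=1 -> "100000 100000", --cpus=2 -> "200000 100000", etc.
    return math.ceil(int(quota) / int(period))
```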
[09:28:05] elukey: interesting problem: the version of firmware-amd-graphics we have on the staging machines (bullseye, 20210315-3) does not have the required firmware files for the MI100 (/usr/lib/firmware/amdgpu/arcturus_gpu_info.bin). Checking whether bookworm does. [09:28:13] kevinbazira: mmm lemme check one thing, it shouldn't say that [09:28:25] it should recognize the right number of cpus [09:29:10] klausman: let's check first https://packages.debian.org/bullseye-backports/firmware-amd-graphics [09:29:33] same as bookworm afaics [09:29:38] It's in both, yes [09:31:20] nice :) [09:31:36] we just need to pin the package to backports for bullseye [09:31:47] Yes, using apt::package_from_bpo I suppose [09:32:09] never seen it, but if you see other examples in puppet yes [09:33:11] They are all in modules, so I'm not sure where we'd put it [09:34:02] the package is defined in a module IIRC [09:34:12] If we put it in modules/amd_rocm/manifests/init.pp, we'd also affect non-ML machines [09:34:31] yes but it is fine, we want exactly that [09:35:03] kevinbazira: ahhh wow the ml-sandbox uses cgroups v1! [09:35:11] that we don't support anymore [09:35:19] morning! gpuuuuu \o/ [09:35:26] this is why it is not recognizing the cgroup [09:35:30] aiko: o/ [09:36:07] klausman: any other machine, like a stat100x, will have the same problem if a MI100 is installed [09:39:38] True, I was just worried about changing other people's machines behind their back, so to speak. [09:40:28] It is fine, you can quickly ping folks in #wikimedia-analytics as FYI, but we are mostly in charge of the GPUs across nodes [09:41:42] kevinbazira: I'd need to reboot the ml-sandbox, is it ok? We lose all running containers, and we'll have to bootstrap minikube again [09:44:05] elukey: woah ... is there a way to quickly backup the files in my profile before the reboot? If I do it on my end the backup will take forever. [09:44:36] kevinbazira: what do you need to backup? [09:44:37] it's ok to lose the containers [09:45:06] the VM may crash anytime, I'd suggest not to save important stuff on it [09:46:20] nothing is important, just the workflow I set up in: [09:46:20] ``` [09:46:20] kevinbazira@ml-sandbox:~$ pwd [09:46:20] /home/kevinbazira [09:46:20] ``` [09:46:50] if it is only containers that will be lost then I can quickly rebuild those [09:48:36] I can try to do one, but you have some model binaries that are big [09:49:32] and the space under /srv is not big [09:49:33] mmm [09:50:15] yep, those are the binaries I've been using to run the tests [09:51:29] kevinbazira: I'd suggest one thing - copy only the text files to your laptop (the ones with scripts etc..), and add them to a gitlab repo later on (one under your username) [09:51:57] worst case we lose the binaries but we can recover them [09:52:06] and you have your logic backed up [09:52:39] if you keep the repo in sync then if the VM disappears for any reason you don't need to rebuild everything [09:52:45] just copy the model binaries again [09:53:02] (basically pull them from https://analytics.wikimedia.org/published/wmf-ml-models/) [09:53:05] does it make sense? [09:54:49] yes, it does. most of the text files I have locally on my laptop too. I mainly use the sandbox to run them for testing purposes as it has more resources than my machine. [09:55:19] we can go ahead and reboot. I'll set the test environment up again. [09:58:06] super, in theory it shouldn't cause anything weird, I hope it will reboot fine [09:58:21] is there any tutorial about how minikube runs?
[09:58:29] I wanted to stop it, but sudo minikube shows no cluster [09:59:06] IIRC: https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/ML-Sandbox/Configuration [10:01:35] and: https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/ML-Sandbox [10:04:53] kevinbazira: ah ok minikube runs under your username [10:05:06] I stopped it via [10:05:06] sudo -u kevinbazira minikube stop [10:05:08] okok [10:05:10] rebooting [10:11:59] o/ if anyone has time for a quick review https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/982770 [10:13:28] isaranto: 5 euros! [10:13:31] kevinbazira: the VM is up [10:13:48] elukey: that's cheap! [10:15:03] figured out we need to modify the model-function to be able to upload a directory [10:15:21] ? [10:15:53] one question - how is nllb using multi cpus? [10:16:03] if I'm not mistaken the command will upload one file [10:16:35] ahhh okok so you were talking about the upload script [10:17:56] sorry didn't write the context, I thought you were in my head 😛 [10:18:03] :D [10:18:23] so the change looks good, do we need any OMP_NUM_THREADS etc.. to be set? [10:19:06] kevinbazira: ok rechecked and now we are using cgroups v2, so in theory now if you test again article-descriptions it should tell you the right number of CPUs in the log msg [10:19:18] (basically the same ones that you set via --cpus=X) [10:19:37] okok, let me check ... [10:22:54] for nllb and cpus: I don't recall how we ended up requesting more cpus. I'll remove this for now and we can increase CPUs when we experiment with multiprocessing [10:23:51] sure sure, go ahead +1ed [10:24:04] I was just curious :) [10:25:10] elukey: stat1004 in PCC was a typo :D [10:25:18] I'm not sure if tokenizing is faster. But regarding the internals of the underlying frameworks afaik pytorch won't be faster out-of-the-box with more cpus [10:25:28] elukey: I am still getting the message below: [10:25:28] ``` [10:25:28] INFO:root:Not inside a Cgroup v2, defaulting to the host's cpu count [10:25:28] ``` [10:25:28] based on this: https://github.com/wikimedia/machinelearning-liftwing-inference-services/blob/main/python/resource_utils.py#L18-L20 [10:25:29] indeed that file doesn't exist in the container I am using on the ML sandbox: [10:25:29] ``` [10:25:30] somebody@d2a45cd90677:/srv/article_descriptions/model_server$ ls /sys/fs/cgroup/ [10:25:30] cgroup.controllers cgroup.max.descendants cgroup.stat cgroup.threads system.slice [10:25:31] cgroup.max.depth cgroup.procs cgroup.subtree_control init.scope user.slice [10:25:31] ``` [10:27:05] aiko: if you end up writing a summary for batch requests, you could add it to the readme.md file of the repo (either the main one or just for revertrisk) [10:27:26] just a suggestion so that we have the documentation next to the code [10:28:41] kevinbazira: mmmm I tested one docker run and it was using a cgroup v2, how do you run the container? [10:31:04] elukey: I used `docker exec -it d2a45cd90677 /bin/bash` [10:31:17] kevinbazira: no no I mean how did you start it [10:33:03] elukey: to start it I used `docker start d2a45cd90677` [10:33:03] to create it I used `docker run -it --cpus=1 --memory=4g --entrypoint=/bin/bash article-descriptions:local-run` [10:35:29] kevinbazira: aahhh okok so please use the docker run, with docker start you use the previously built container with the old cgroup settings [10:35:33] it should work now [10:36:02] okok let me check ... [10:45:18] isaranto: ok!
I'll do that [10:45:44] only if you find it useful as well [10:49:06] (03PS1) 10Ilias Sarantopoulos: llm: fix missing python utils [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/982776 (https://phabricator.wikimedia.org/T352834) [10:51:00] (03PS2) 10Ilias Sarantopoulos: llm: fix missing python utils [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/982776 (https://phabricator.wikimedia.org/T352834) [10:52:41] well, nllb was missing python directory. I plan to deploy the changes for the rest of the model servers either later today or tomorrow [10:57:23] elukey: the message is gone and it's now picking the right number of CPUs set: [10:57:23] ``` [10:57:23] >>> from python.resource_utils import get_cpu_count [10:57:23] >>> get_cpu_count() [10:57:23] 1 [10:57:23] ``` [10:57:33] AMD firmware change merged and machine rebooted: works fine! [10:57:38] kevinbazira: \o/ [10:57:48] ok so lemme check the OMP threads [10:58:01] radeontop also working and reporting credible numbers [10:58:06] (03CR) 10Kevin Bazira: [C: 03+1] article-descriptions: set OMP_NUM_THREADS automatically [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/982407 (https://phabricator.wikimedia.org/T343123) (owner: 10Elukey) [10:58:43] (03CR) 10Ilias Sarantopoulos: [C: 03+2] article-descriptions: set OMP_NUM_THREADS automatically [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/982407 (https://phabricator.wikimedia.org/T343123) (owner: 10Elukey) [10:58:57] (03CR) 10Ilias Sarantopoulos: article-descriptions: set OMP_NUM_THREADS automatically [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/982407 (https://phabricator.wikimedia.org/T343123) (owner: 10Elukey) [11:00:21] (03CR) 10CI reject: [V: 04-1] llm: fix missing python utils [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/982776 (https://phabricator.wikimedia.org/T352834) (owner: 10Ilias Sarantopoulos) [11:00:56] (03CR) 10Ilias Sarantopoulos: [C: 03+1] article-descriptions: set OMP_NUM_THREADS automatically [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/982407 (https://phabricator.wikimedia.org/T343123) (owner: 10Elukey) [11:01:40] klausman: nice! [11:01:45] (03CR) 10Ilias Sarantopoulos: "recheck" [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/982776 (https://phabricator.wikimedia.org/T352834) (owner: 10Ilias Sarantopoulos) [11:02:02] nice work! [11:02:30] elukey: I accidentally +2 and then reset the vote. all good though I put +1 [11:03:22] :D [11:04:36] kevinbazira: the article-description container on ml-sandbox is ready to predict (so model loaded etc) ? [11:05:16] Is there a somewhat self-contained "hello world" I could run to test the GPU working? [11:06:24] klausman: IIRC I used https://phabricator.wikimedia.org/P54375 in the default namespace [11:07:19] elukey: yes the one in the experimental namespace. we've been using this to query it: [11:07:19] ``` [11:07:19] time curl "https://inference-staging.svc.codfw.wmnet:30443/v1/models/article-descriptions:predict" -X POST -d '{"lang": "en", "title": "Clandonald", "num_beams": 2}' [11:07:19] ``` [11:08:01] elukey: thx, will try that after lunch [11:08:17] kevinbazira: ah nono I mean where you tested the various cpus, since I wanted to make sure that the OMP env variable is set with the new code [11:08:26] in theory we shouldn't see too many threads etc.. 
I've tested it with 1 CPU so far, going to test with more CPUs and share the results [11:10:17] kevinbazira: no rush, I wanted to know where you are testing though, since I'd need to check the number of threads [11:11:32] I am testing from the ml sandbox using a container with id: 790124dcddf9 [11:11:49] how are you checking the number of threads? [11:12:09] in theory via ps -eLf it should be sufficient [11:12:40] but I don't see any for the running container [11:13:02] so I was wondering if maybe we need to load the model or similar, or make the first request [11:17:16] To make a request, you can run: [11:17:17] ``` [11:17:17] $ docker exec -it 790124dcddf9 /bin/bash [11:17:17] $ time curl localhost:8080/v1/models/article-descriptions:predict -X POST -d '{"lang": "en", "title": "Clandonald", "num_beams": 2}' -H "Content-type: application/json" [11:17:17] ``` [11:17:43] ah yes now I see python threads :) [11:18:18] I see 32 of them, that are hopefully ok [11:23:21] 29 sorry [11:23:28] ps -eLf | grep [m]odel_server | wc -l [11:31:35] kevinbazira: ok so lemme know when you test other cpu values so I'll check threads [11:31:49] (or you can, I am interested in the output of ps) [11:32:05] with one cpu we have 29 threads, that is not what I expected but it may be ok-ish [11:32:15] if we double with two cpus etc.. it is ok [11:32:38] sure sure, let me run the test with more CPUs [11:36:55] anytime, no rush! [11:46:09] elukey: sure, I've checked 2 CPUs in container with id: f0fdf2409f41, and still get 29 threads [11:46:32] and two cpus recognized right? [11:47:09] yes, they are: [11:47:09] ``` [11:47:09] >>> from python.resource_utils import get_cpu_count [11:47:09] >>> get_cpu_count() [11:47:09] 2 [11:47:09] ``` [11:49:04] okok [11:50:07] (03CR) 10Elukey: [C: 03+2] article-descriptions: set OMP_NUM_THREADS automatically [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/982407 (https://phabricator.wikimedia.org/T343123) (owner: 10Elukey) [11:50:42] kevinbazira: I think that we can test --^ in staging with multiple cpus, to see how it goes [11:50:58] going afk for lunch, if you have time we can do it later [11:51:05] * elukey lunch! [11:51:07] (03Merged) 10jenkins-bot: article-descriptions: set OMP_NUM_THREADS automatically [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/982407 (https://phabricator.wikimedia.org/T343123) (owner: 10Elukey) [11:51:10] ok, that's fine. [11:51:17] enjoy your lunch! [11:51:22] thanks! [11:51:42] going for lunch as well! [11:52:04] enjoy your lunch too! [11:53:19] If I can get a review here from someone that would be awesome https://gerrit.wikimedia.org/r/c/machinelearning/liftwing/inference-services/+/982776/ [11:53:30] I'm going to open a patch afterwards to update all the images [11:54:19] (03CR) 10Kevin Bazira: [C: 03+1] llm: fix missing python utils [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/982776 (https://phabricator.wikimedia.org/T352834) (owner: 10Ilias Sarantopoulos) [11:56:31] thanks Kevin!
[11:56:37] (03CR) 10Ilias Sarantopoulos: [C: 03+2] llm: fix missing python utils [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/982776 (https://phabricator.wikimedia.org/T352834) (owner: 10Ilias Sarantopoulos) [11:56:45] (03CR) 10Ilias Sarantopoulos: llm: fix missing python utils [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/982776 (https://phabricator.wikimedia.org/T352834) (owner: 10Ilias Sarantopoulos) [11:56:48] (03PS3) 10Ilias Sarantopoulos: llm: fix missing python utils [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/982776 (https://phabricator.wikimedia.org/T352834) [11:56:57] (03CR) 10Ilias Sarantopoulos: [C: 03+2] llm: fix missing python utils [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/982776 (https://phabricator.wikimedia.org/T352834) (owner: 10Ilias Sarantopoulos) [12:08:07] (03Merged) 10jenkins-bot: llm: fix missing python utils [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/982776 (https://phabricator.wikimedia.org/T352834) (owner: 10Ilias Sarantopoulos) [12:13:41] (03PS1) 10AikoChou: outlink: upgrade kserve to 0.11.2 [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/982783 (https://phabricator.wikimedia.org/T347549) [12:18:09] ---^ I built the docker images locally and have tested them. The model server works well [12:23:42] isaranto: tested https://gerrit.wikimedia.org/r/c/machinelearning/liftwing/inference-services/+/982043 with kserve 0.11.2 without any issues :) [12:25:45] * aiko lunch! [13:14:04] I updated llm and readability images https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/982795 [13:19:24] elukey: how do nodes get labels like amd.com/gpu? 2001 doesn't have it, so I presume it's not happening automagically? [13:19:38] I dug around puppet, but there wasn't anything obvious [13:22:33] nvm, it's not a label [13:23:08] Hm. Still not scheduling for some reason [13:25:03] Ah. the GPU LLM pod stole it :D [13:26:22] isaranto: do we have an example query for the GPU nllb200? [13:36:06] https://phabricator.wikimedia.org/P54378 [13:36:48] klausman: I'll have to deploy the latest image first [13:36:56] ack [13:38:25] and we have two issues we need to solve (not now but generally) . 1) we'll have 1 GPU in staging so redeploying sth is going to be a challenge. [13:39:50] Yeah, I'd like to have two machines with a GPU in staging in the mid-term, to avoid a SPOF [13:39:58] cool! [13:41:21] And since we're planning on adding machines to staging anyway, I think that may happen simultaneously. As we found out yesterday, we need more than two staging machines anyway, orthogonal to having GPUs there. Fortunately, they're budgeted/in procurement [13:53:54] klausman: shall I sync changes in ml-staging as well? [13:54:03] please do [13:55:00] will do [13:55:16] I updated the paste above to include a sample request for ml-staging [13:55:37] merci! [14:01:55] That 404s, checking whether I got the right endpoint [14:02:30] shoot. the llm server is failing due to a bad import. I remember testing it but lost track with all the changes yesterday [14:02:49] klausman: bright side ml-staging is in an old image so it is running find [14:03:11] *fine. 
to try the GPU run the third request here https://phabricator.wikimedia.org/P54378 [14:03:42] yep, that works and I can see GPU usage [14:03:49] real 0m1.666s [14:03:51] Not bad [14:06:51] 1/10 of the time without the GPU [14:07:02] *than without [14:07:14] GPU VRAM use is about 4.7G [14:07:37] (03PS1) 10Ilias Sarantopoulos: llm: fix circular import [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/982802 [14:07:43] (03PS2) 10Ilias Sarantopoulos: llm: fix circular import [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/982802 [14:08:28] sorry folks for the above, one last push! I tested it locally and it works [14:10:49] (03PS3) 10Ilias Sarantopoulos: llm: fix circular import [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/982802 [14:11:44] (03PS4) 10Ilias Sarantopoulos: llm: fix circular import [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/982802 [14:12:57] (03PS5) 10Ilias Sarantopoulos: llm: fix circular import [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/982802 [14:13:09] import hell [14:15:58] the gpu works :) [14:17:07] (03Abandoned) 10Elukey: blubber: add the transformer dir to outlink's transformer image [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/982433 (owner: 10Elukey) [14:18:03] (03CR) 10Ilias Sarantopoulos: outlink: upgrade kserve to 0.11.2 (031 comment) [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/982783 (https://phabricator.wikimedia.org/T347549) (owner: 10AikoChou) [14:18:49] great work folks setting it up! that was really fast! [14:22:59] (03CR) 10Kevin Bazira: llm: fix circular import (031 comment) [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/982802 (owner: 10Ilias Sarantopoulos) [14:23:19] if somebody has time for a quick change: https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/982803/ [14:25:06] thanks :) [14:33:04] (03PS6) 10Ilias Sarantopoulos: llm: fix circular import [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/982802 [14:35:12] (03PS7) 10Ilias Sarantopoulos: llm: fix circular import [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/982802 [14:36:54] kevinbazira: the change doesn't work sadly [14:37:43] (03PS8) 10Ilias Sarantopoulos: llm: fix circular import [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/982802 [14:38:51] (03CR) 10Ilias Sarantopoulos: llm: fix circular import (031 comment) [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/982802 (owner: 10Ilias Sarantopoulos) [14:38:53] elukey: thank you for helping with this option. I am going to continue looking into other optimization options. :) [14:40:47] (03CR) 10Kevin Bazira: [C: 03+1] llm: fix circular import (031 comment) [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/982802 (owner: 10Ilias Sarantopoulos) [14:41:10] (03CR) 10Ilias Sarantopoulos: [C: 03+2] llm: fix circular import [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/982802 (owner: 10Ilias Sarantopoulos) [14:41:35] kevinbazira: I'll keep working on the solution, worst case I'll restore OMP_NUM_THREADS [14:42:02] okok, thank you! 
[14:56:54] (03CR) 10AikoChou: outlink: upgrade kserve to 0.11.2 (031 comment) [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/982783 (https://phabricator.wikimedia.org/T347549) (owner: 10AikoChou) [14:58:28] (03CR) 10Ilias Sarantopoulos: [V: 03+2 C: 03+2] llm: fix circular import [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/982802 (owner: 10Ilias Sarantopoulos) [14:59:28] (03CR) 10Ilias Sarantopoulos: outlink: upgrade kserve to 0.11.2 (031 comment) [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/982783 (https://phabricator.wikimedia.org/T347549) (owner: 10AikoChou) [14:59:32] (03PS2) 10Ilias Sarantopoulos: outlink: upgrade kserve to 0.11.2 [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/982783 (https://phabricator.wikimedia.org/T347549) (owner: 10AikoChou) [15:01:23] (03CR) 10Ilias Sarantopoulos: [C: 03+1] outlink: upgrade kserve to 0.11.2 (031 comment) [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/982783 (https://phabricator.wikimedia.org/T347549) (owner: 10AikoChou) [15:37:28] 10Machine-Learning-Team: Implement caching for revertrisk-multilingual - https://phabricator.wikimedia.org/T353333 (10isarantopoulos) [15:39:16] 10Machine-Learning-Team: Goal: Increase the number of models hosted on Lift Wing - https://phabricator.wikimedia.org/T353335 (10isarantopoulos) [15:54:40] 10Machine-Learning-Team: Goal: Inference Optimization for Hugging face models - https://phabricator.wikimedia.org/T353337 (10isarantopoulos) [15:55:24] 10Machine-Learning-Team: Goal: Inference Optimization for Hugging face/Pytorch models - https://phabricator.wikimedia.org/T353337 (10isarantopoulos) [15:55:40] 10Machine-Learning-Team: Goal: Implement caching for revertrisk-multilingual - https://phabricator.wikimedia.org/T353333 (10isarantopoulos) [15:59:21] 10Machine-Learning-Team: Goal: Expand Lift Wing Cluster and add GPU capacity to production - https://phabricator.wikimedia.org/T353338 (10isarantopoulos) [15:59:51] 10Machine-Learning-Team, 10Goal: Goal: Lift Wing users can request multiple predictions using a single request. - https://phabricator.wikimedia.org/T348153 (10isarantopoulos) p:05Medium→03Triage [16:07:10] one final image update ( famous last words) -> https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/982849 [16:10:52] (03CR) 10AikoChou: [C: 03+2] outlink: upgrade kserve to 0.11.2 [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/982783 (https://phabricator.wikimedia.org/T347549) (owner: 10AikoChou) [16:14:44] thanks for the review Luca! [16:15:36] (03Merged) 10jenkins-bot: outlink: upgrade kserve to 0.11.2 [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/982783 (https://phabricator.wikimedia.org/T347549) (owner: 10AikoChou) [16:16:52] np! [16:17:17] so in article-description my hack to add OMP_NUM_THREADS automatically doesn't work [16:17:33] namely torch creates a ton of threads that causes throttling [16:18:50] I've read some reports of people having the same issue with numpy, and the os.environ[] entry needed to be added before the import [16:19:20] in our case this would need to be done in __init__.py, in theory [16:21:28] or model.py [16:21:46] I am doing it in there [16:21:59] but I believe that we import ModelLoader from utils, that carries torch [16:22:07] or, run another script before the model.py command in the same container [16:22:54] wdyt? 
kind of like a hack but it mimics an init container behavior [16:22:57] but the os env variable will not stick, it is process-based [16:23:32] export works in a bash shell since the processes that you create are children of it [16:23:44] ok, I take it back! [16:25:35] ok, nllb is failing again. I'll work on it properly tomorrow. the issue I have is that I need to build a different image in order for it to run fast enough on M1, but I'll do that [16:26:03] now I'm missing a requirement 👎 [16:30:19] (03PS1) 10Elukey: article-descriptions: move the OMP_NUM_THREADS declaration sooner [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/982855 (https://phabricator.wikimedia.org/T352750) [16:30:53] I need to test it but --^ is the idea [16:32:08] (03PS2) 10Elukey: article-descriptions: move the OMP_NUM_THREADS declaration sooner [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/982855 (https://phabricator.wikimedia.org/T352750) [16:39:10] (03CR) 10Elukey: "Kevin, we'd need to test this on ml-sandbox if possible :(" [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/982855 (https://phabricator.wikimedia.org/T352750) (owner: 10Elukey) [17:06:25] 10Machine-Learning-Team: Upgrade outlink docker images to KServe 0.11 - https://phabricator.wikimedia.org/T347549 (10achou) Next steps: * Roll out the new docker images to ml-staging * Perform load testing to ensure consistent performance * Test the publishing of events from staging to EventGate Regarding the... [17:09:53] the space is almost all used on ml-sandbox, I can't build a docker img [17:10:54] ok nice I cleaned up the dangling build cache [17:10:58] reclaimed ~16G [17:11:45] elukey: o/ ----^ I'll need to test outlink's event publishing from staging to EventGate, so I'd like to go for the first option you proposed in https://phabricator.wikimedia.org/T349919 to create a new testing stream specifically for prediction-change events. Wdyt? [17:15:15] I'm going afk folks, will continue on the llm image tomorrow. have a nice evening! [17:15:24] \o heading out as well [17:16:00] 10Machine-Learning-Team: Apply common settings to publish events from Lift Wing staging to EventGate - https://phabricator.wikimedia.org/T349919 (10achou) a:03achou [17:16:20] (03CR) 10Kevin Bazira: article-descriptions: move the OMP_NUM_THREADS declaration sooner (031 comment) [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/982855 (https://phabricator.wikimedia.org/T352750) (owner: 10Elukey) [17:17:31] bye Ilias and Tobias, have a nice rest of the day :) [17:20:29] kevinbazira: (if you are still online) I don't see any log related to the cpu count from docker logs, do you see any? [17:23:21] elukey: how are you checking for this?
[17:23:33] docker logs $id-of-the-container [17:27:02] (03PS3) 10Elukey: article-descriptions: explictly set torch threads [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/982855 (https://phabricator.wikimedia.org/T352750) [17:27:07] new version :) [17:28:09] (03PS4) 10Elukey: article-descriptions: explictly set torch threads [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/982855 (https://phabricator.wikimedia.org/T352750) [17:29:24] (03PS5) 10Elukey: article-descriptions: explictly set torch threads [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/982855 (https://phabricator.wikimedia.org/T352750) [17:29:30] better --^ [17:30:28] ah no it is a function [17:31:06] (03PS6) 10Elukey: article-descriptions: explictly set torch threads [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/982855 (https://phabricator.wikimedia.org/T352750) [17:31:12] this should work kevinbazira --^ [17:31:41] (if it is late for you we can do it tomorrow) [17:31:41] ok, checking ... [17:32:32] nono since this was planned to be your last day before your leave, we can proceed and complete this today [17:32:48] I am in tomorrow afternoon, don't worry :) [17:32:57] (unless something changes in the meantime :D) [17:40:27] 10Machine-Learning-Team: Apply common settings to publish events from Lift Wing staging to EventGate - https://phabricator.wikimedia.org/T349919 (10Ottomata) My brain doesn't remember exactly what this did, but is > If the eventgate's chart is migrated to the ingress module done now that https://gerrit.wikimed... [17:41:44] aiko: new testing stream should be fine. https://wikitech.wikimedia.org/wiki/Event_Platform/Stream_Configuration#Stream_versioning [17:43:13] ottomata: o/ [17:43:18] maybe .dev0 suffix? or whatever you prefer. consider setting canary_events_enabled: false [17:43:23] elukey: hewo! [17:43:33] re: staging endpoint for eventgate - I think that we need to add the ingress module [17:43:39] to the chart I mean [17:43:57] it is a big change, I think it needs to be done very carefully [17:44:02] the VIP changes etc.. [17:45:14] oh lkay [17:45:15] okay [17:45:21] i think i don't know what 'ingress module' means then :) [17:45:50] basically you delegate all the routing etc.. to the Istio ingress [17:46:11] that has a single LVS VIP, and everything cnames to it [17:46:31] it is very handy, but it is a massive change in how the requests are routed [17:46:51] it also has a staging endpoint etc.. [17:47:37] elukey: yep, this change shows the OMP_NUM_THREADS log and the threads have dropped to 24 [17:47:57] oh wow [17:48:41] (03CR) 10Kevin Bazira: article-descriptions: explictly set torch threads (031 comment) [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/982855 (https://phabricator.wikimedia.org/T352750) (owner: 10Elukey) [17:51:34] I don't see the log entry but we can check tomorrow [17:51:36] thanks for testing! [17:52:34] I think that the last solution may be better [17:52:41] let's also see what others think [17:52:59] (with the assumption that torch.set_num_threads does the same as OMP_NUM_THREADS) [18:00:44] * elukey afk! [18:03:46] please see screenshot below for the log entry: [18:04:12] https://usercontent.irccloud-cdn.com/file/9w0rgNag/OMP_NUM_THREADS%20log.jpg [18:04:36] enjoy your evening o/ [18:06:53] ottomata: o/ ack, I'll work on that tomorrow!
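As a closing note on the thread-capping discussion above, a minimal sketch of the two approaches mentioned is shown below. OpenMP reads OMP_NUM_THREADS when its runtime initializes, so the environment variable has to be set before torch (or numpy) is imported; alternatively, torch.set_num_threads() can be called after import to cap torch's intra-op thread pool directly. The get_cpu_count import matches the one used in the test paste earlier; the placement and exact calls here are illustrative only, not the actual patch.

```
import os

from python.resource_utils import get_cpu_count

# Option 1: set OMP_NUM_THREADS before anything imports torch/numpy
# (e.g. at the very top of model.py or __init__.py); once the OpenMP
# runtime has initialized, changing the variable no longer resizes the pool.
os.environ["OMP_NUM_THREADS"] = str(get_cpu_count())

import torch  # noqa: E402  -- deliberately imported after the env var is set

# Option 2: cap torch's intra-op threads explicitly after import.
torch.set_num_threads(get_cpu_count())
```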