[06:29:16] Good morning!
[06:30:02] I'll go through the above with Mercelis today as we're meeting later on.
[10:09:17] Machine-Learning-Team, MediaWiki-extensions-ORES, Growth-Team, Growth-Team-Filtering, and 2 others: Indicators for problematic changes (r) are missing from RC - https://phabricator.wikimedia.org/T248557#9672534 (matej_suchanek)
[10:09:31] (PS1) Matěj Suchánek: Show "r" flag regardless of UI if enabled [extensions/ORES] - https://gerrit.wikimedia.org/r/1015516 (https://phabricator.wikimedia.org/T248557)
[10:49:25] Machine-Learning-Team, Structured-Data-Backlog (Current Work): Host a logo detection model for Commons images - https://phabricator.wikimedia.org/T358676#9672568 (CodeReviewBot) mfossati merged https://gitlab.wikimedia.org/mfossati/scriptz/-/merge_requests/7 lw_prototype: validate input data
[10:55:44] Machine-Learning-Team, Patch-For-Review: Use Huggingface model server image for HF LLMs - https://phabricator.wikimedia.org/T357986#9672571 (isarantopoulos) It seems that the most prominent solution at the moment that would not require so many hacks and forks would be to use a pytorch 2.1.2 and rocm5.5 b...
[11:07:40] Machine-Learning-Team, Patch-For-Review, Structured-Data-Backlog (Current Work): Host a logo detection model for Commons images - https://phabricator.wikimedia.org/T358676#9672573 (CodeReviewBot) kevinbazira opened https://gitlab.wikimedia.org/mfossati/scriptz/-/merge_requests/8 lw_prototype: image...
[11:17:51] * isaranto lunch!
[12:29:06] I made hf server work with the pytorch 2.1.2 base image I created yesterday. Now I'm looking at what happens to the image size when I remove the torch dependency from all the other packages to avoid what Luca described yesterday
[12:29:09] referring to this https://phabricator.wikimedia.org/T360638#9670506
[13:02:09] Goood morning all
[13:03:25] Hey Chris!
[13:09:00] That feeling when you are getting better from feeling sick is the best feeling
[13:11:03] nice!
[13:11:11] (PS2) Matěj Suchánek: Show "r" flag regardless of UI if enabled [extensions/ORES] - https://gerrit.wikimedia.org/r/1015516 (https://phabricator.wikimedia.org/T248557)
[13:19:00] (CR) Jsn.sherman: Exclude first/only revision on page from scoring (1 comment) [extensions/ORES] - https://gerrit.wikimedia.org/r/1014572 (https://phabricator.wikimedia.org/T356281) (owner: Jsn.sherman)
[13:23:08] Machine-Learning-Team, Patch-For-Review: Create a Pytorch base image - https://phabricator.wikimedia.org/T360638#9672857 (isarantopoulos) I noticed something odd in the base image. When I import torch inside the image I get a warning about numpy missing: ` /usr/local/lib/python3.11/dist-packages/torch/nn...
[13:39:32] hello folks!
[13:41:44] \o/
[13:42:03] isaranto: re: numpy, I haven't checked but is it installed when you do pip install torch? (the vanilla version)
[13:43:09] in the meantime I am going to test a new idea for the base image, hope that it works
[13:44:01] basically I'll try to tell pip to install torch under /opt/lib/python/base-packages
[13:44:28] and I'll add a symlink in site-packages
[13:45:12] elukey: I just checked and it is not. sympy is. perhaps the pyproject.toml I linked is not the one used
[13:46:03] we can add numpy if needed to the base image
[13:46:33] ok, I'll have to investigate why this happens
[13:47:32] probably on Monday. for today I will "finish" this first work with the huggingface image
[13:47:50] the image I end up with is 15.7 GB
[13:47:58] I hope to give you a working pytorch image soon-ish
[13:48:14] namely that you can specify torch as dependency in the requirements.txt
[13:59:46] perhaps I should be helping you instead of working on the hf image
[13:59:55] lemme know if there is anything I can check/test/do on my side
[14:00:52] going to send you a patch soon, I am building rr-ml's image to see if I solved the 2x size issue
[14:00:56] and if it works :D
[14:01:39] pip seems to need to actually re-download the torch-rocm wheel before figuring out that torch is already there, not great but also not a huge deal
[14:04:33] iirc it shouldn't do that as it looks in the site-packages and would see that the version is already there, but going through what you have posted I don't know why exactly it happens
[14:05:48] also I think blubber first does pip install and then copies files, so we wouldn't be able to copy the site-packages before pip installing
[14:17:26] Lift-Wing, Machine-Learning-Team: Determine a structure for the python package repository - https://phabricator.wikimedia.org/T361370 (isarantopoulos) NEW
[14:18:04] Lift-Wing, Machine-Learning-Team: Determine a structure for the python package repository - https://phabricator.wikimedia.org/T361370#9672924 (isarantopoulos) @Mercelisvaughan Apart from the official documentation other resources may be also be useful, so feel free to add them here if you find something...
[14:28:17] isaranto: https://gerrit.wikimedia.org/r/c/operations/docker-images/production-images/+/1015530
[14:28:20] this is the idea
[14:28:39] rr-ml currently gives me some troubles when bootstrapping, it goes OOM when loading the model
[14:29:31] didn't happen before, weird
[14:30:44] but if I simply run bash and import torch it works
[14:31:25] I give up, let AI take over all writing.
[14:31:58] (goes back to writing.... sigh)
[14:32:33] * elukey sends some hugs to chrisalbon
[14:36:51] elukey: I'll check! seems nice though
[14:38:27] I am curious if it works for hf
[14:39:29] my main goal is to avoid the need of tweaking our requirements.txt to remove torch
[14:39:38] because it will surely lead to nightmares
[14:42:49] ok so I also did another experiment: set a completely different torch version in rr-ml's requirements.txt
[14:42:56] (a cpu-only one etc..)
[14:43:24] the torch that python imports is the rocm one (from the base image), the main downside is that some nvidia packages sneak in
[14:43:37] because only torch is not touched
[14:43:46] > my main goal is to avoid the need of tweaking our requirements.txt to remove torch
[14:43:47] > because it will surely lead to nightmares
[14:43:47] I agree with this. cause the issue will arise when we install other packages that have torch as a dependency
[14:44:22] yes this thing with the nvidia packages happens to me as well
[14:44:56] okok so maybe we have a good version now
[14:45:07] if I only make rr-ml work locally without oom
[14:51:37] the build that was previously successful is now failing because of a dependency that wasn't there before
[14:51:41] 🤯
[14:52:02] and they say it is the age of AI
[14:52:28] nevermind could be a cache issue, a git clone I have in blubber seems to be cloning an old version of repo
[14:54:13] ah I managed to make rr-ml work
[15:01:29] it is interesting that rr-ml now fails with
[15:01:30] ModuleNotFoundError: No module named 'torchgen'
[15:04:03] File "/opt/lib/python/site-packages/torch/utils/_python_dispatch.py", line 7, in <module>
[15:04:06] import torchgen
[15:04:42] ahhh no
[15:04:49] torchgen is in base-packages
[15:04:53] and I haven't linked it
[15:05:01] oh
[15:05:39] just fyi I am reviewing your change by making the same thing for torch2.1.2-rocm5.5
[15:06:01] yeah it is more complicated sigh, I need to link more things
[15:09:23] I am getting an error `groupadd: invalid group ID 'somebody'` https://phabricator.wikimedia.org/P59011
[15:09:56] assuming my copy paste from your patch is correct
[15:20:36] isaranto: you also have to modify config.yaml
[15:20:55] (with the new user)
[15:20:59] (sorry I was afk)
[15:21:13] oh yes
[15:21:26] I saw that in the patch but my mind didn't read it
[15:22:07] thanks!
[15:22:14] np :)
[15:27:25] (PS21) Ilias Sarantopoulos: huggingface: add huggingface image [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1009783 (https://phabricator.wikimedia.org/T357986)
[15:27:59] (PS22) Ilias Sarantopoulos: huggingface: add huggingface image [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1009783 (https://phabricator.wikimedia.org/T357986)
[15:29:06] I've marke the above patch as WIP again
[15:29:31] *marked
[15:38:11] still building the image...
[15:38:46] will continue with the hf image based on this one
[15:39:05] I believe in you
[15:39:26] elukey: I have to go now, is it ok if I don't +1 on this and leave it for Monday?
[15:40:18] isaranto: oh yes the change still needs some work, I'll be off on Monday but I'll keep working on it on Tue!
[15:41:25] ok, logging off then, have a nice weekend (and long weekend if that applies :)
[15:41:41] btw I'm here on Monday (orthodox easter is more than a month away)
[15:43:53] o/
[17:24:04] isaranto: https://gerrit.wikimedia.org/r/c/operations/docker-images/production-images/+/1015530 is ready for your tests on Monday, I was able to add the missing symlinks and I tested everything with RR-ML. Hopefully it works fine for HF, lemme know on Tue :)
[17:24:37] not sure if Janis will like the new Docker image, but I couldn't think of another way to solve the issue
[17:24:42] fingers crossed
[17:25:00] logging off for the long weekend folks, have a nice one!
[18:05:13] night elukey!
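
Note on the base-packages idea discussed between 13:44 and 17:24: the following is a minimal sketch, not the contents of the Gerrit change, of why linking only `torch` was not enough and why every top-level entry had to be symlinked. It assumes packages sit in one directory (standing in for /opt/lib/python/base-packages) and are exposed through symlinks in a second directory that is on `sys.path` (standing in for site-packages). The packages `faketorch` and `faketorchgen` and the helper `link_base_packages` are hypothetical names invented for this illustration.

```python
import importlib
import os
import sys
import tempfile


def link_base_packages(base_dir: str, site_dir: str) -> list[str]:
    """Symlink every top-level entry of base_dir into site_dir.

    Entries that already exist in site_dir are left untouched.
    Returns the names that were linked, in sorted order.
    """
    linked = []
    for name in sorted(os.listdir(base_dir)):
        target = os.path.join(site_dir, name)
        if os.path.lexists(target):
            continue
        os.symlink(os.path.join(base_dir, name), target)
        linked.append(name)
    return linked


def make_package(root: str, name: str, body: str) -> None:
    """Create a tiny importable package called `name` under `root`."""
    pkg_dir = os.path.join(root, name)
    os.makedirs(pkg_dir)
    with open(os.path.join(pkg_dir, "__init__.py"), "w") as fh:
        fh.write(body)


# Temporary stand-ins for the two directories named in the chat.
workdir = tempfile.mkdtemp()
base_packages = os.path.join(workdir, "base-packages")
site_packages = os.path.join(workdir, "site-packages")
os.makedirs(base_packages)
os.makedirs(site_packages)

# Hypothetical stand-ins: faketorch imports faketorchgen, mirroring the
# torch -> torchgen import that failed when only one symlink existed.
make_package(base_packages, "faketorch",
             "import faketorchgen\nVERSION = 'base-image'\n")
make_package(base_packages, "faketorchgen", "")

linked = link_base_packages(base_packages, site_packages)

# Only site_packages is put on sys.path; base_packages is reachable
# solely through the symlinks created above.
sys.path.insert(0, site_packages)
faketorch = importlib.import_module("faketorch")
print(linked, faketorch.VERSION)
```

If only `faketorch` were linked, the import would fail with a ModuleNotFoundError for `faketorchgen`, which is the same shape as the `torchgen` error seen at 15:01:30.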