[07:50:24] moooorning!
[09:19:25] Καλημέρα!
[09:25:32] 早上好 <3
[09:26:26] you know I just looked that up 🙂‍↕️
[09:34:16] hahaha same 🤓
[09:46:41] 06Machine-Learning-Team: Support building and running of articletopic-outlink model-server via Makefile - https://phabricator.wikimedia.org/T360177 (10kevinbazira) 03NEW
[09:46:45] Guten Morgen :)
[09:48:05] 06Machine-Learning-Team: Support building and running of articletopic-outlink model-server via Makefile - https://phabricator.wikimedia.org/T360177#9633442 (10kevinbazira) While building the articletopic-outlink model-server locally, the error below is thrown. We encountered a similar error in T357382#9536821 an...
[10:07:52] Morgen Tobias!
[11:15:05] reading the kserve batcher PR. one thing I noticed is that the current behaviour is that it will wait for the previous batch to finish before starting the next batch
[11:15:31] it would be better if it were async
[11:19:59] * aiko lunch :D
[11:20:38] do you mean that this is what the kserve batcher does?
[11:20:44] I mean the upstream code
[11:43:14] (03PS4) 10Ilias Sarantopoulos: huggingface: add huggingface image [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/1009783 (https://phabricator.wikimedia.org/T357986)
[11:46:52] (03PS5) 10Ilias Sarantopoulos: huggingface: add huggingface image [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/1009783 (https://phabricator.wikimedia.org/T357986)
[11:48:02] finally made all things work as described in the README. I ended up with a horrible entrypoint as I required conditionals for the final python command
[11:48:59] * klausman lunch
[11:49:19] (03PS6) 10Ilias Sarantopoulos: huggingface: add huggingface image [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/1009783 (https://phabricator.wikimedia.org/T357986)
[12:32:54] * isaranto lunch
[13:19:12] hello folks!
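The batcher behaviour discussed above (the upstream kserve batcher waits for the previous batch to finish before starting the next) could be made concurrent roughly as follows. This is a toy sketch only, not the actual kserve code; `handle_batch` is a made-up stand-in for a real predict call:

```python
import asyncio

async def handle_batch(batch_id: int, items: list) -> list:
    """Stand-in for one model-server predict call on a batch."""
    await asyncio.sleep(0.01)  # simulate inference latency
    return [f"batch{batch_id}:{item}" for item in items]

async def sequential(batches: list) -> list:
    # Roughly the current upstream behaviour: each batch blocks the next.
    results = []
    for batch_id, items in enumerate(batches):
        results.append(await handle_batch(batch_id, items))
    return results

async def concurrent(batches: list) -> list:
    # The suggested improvement: dispatch every batch as its own task,
    # so a slow batch does not delay the ones queued behind it.
    tasks = [asyncio.create_task(handle_batch(batch_id, items))
             for batch_id, items in enumerate(batches)]
    return await asyncio.gather(*tasks)
```

With two ~10 ms batches, `sequential` takes roughly the sum of the batch latencies while `concurrent` takes roughly the maximum; results still come back in submission order either way, since `asyncio.gather` preserves input ordering.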
[13:21:19] \o
[13:24:34] (03CR) 10Elukey: [C:04-1] "As always you are spot-on, I didn't think about this case. Indeed this is our entrypoint for something like revscoring (just an example):" [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/1011131 (https://phabricator.wikimedia.org/T360111) (owner: 10Elukey)
[13:24:46] (03CR) 10Elukey: [C:03+2] resource_utils.py: add a function to automatically set OMP_NUM_THREADS [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/1011130 (https://phabricator.wikimedia.org/T360111) (owner: 10Elukey)
[13:26:24] hi luca o/
[13:26:42] hello hello
[13:26:49] Buon giorno
[13:26:59] you are totally right about __init__.py aiko
[13:27:05] now I am thinking of another way
[13:28:41] elukey: So Keith +1'd the SLO buckets change. I'm fine with merging it today, but maybe you'd prefer we wait 'til Monday?
[13:28:48] yes please
[13:28:56] Which of the two? :D
[13:29:26] the latter, deploying on a Friday is not a great idea (especially since it is not a system that we own and the whole SRE team is in Poland)
[13:30:19] Good morning all
[13:30:28] morning!
[13:30:52] Hey Chris!
[13:30:57] * isaranto afk for a bit
[13:31:03] alright, will wait 'til Monday
[13:31:09] hi Chris!
[13:32:14] aiko: so I am trying to find a good place for set_omp_num_threads, before torch is imported.. I can't use __main__ (that would be ideal) since in theory it happens after all the imports are evaluated
[13:34:12] (03Merged) 10jenkins-bot: resource_utils.py: add a function to automatically set OMP_NUM_THREADS [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/1011130 (https://phabricator.wikimedia.org/T360111) (owner: 10Elukey)
[13:34:15] we could put it at the very top of model.py, but our linters rightfully complain that it is not a good idea
[13:36:34] elukey: mmm I can't think of another place we could put it
[13:38:35] I was also thinking of putting it in __main__, but yeah it happens after imports
[13:40:14] does it really need to be set before torch is imported?
[13:40:22] as far as I know yes
[13:40:30] same thing for other things like numpy etc..
[13:40:49] maybe some libraries do allow to set it after improt
[13:40:52] *import
[13:41:54] I see
[13:43:17] so there is a cleaner way, but not sure if you folks like it
[13:43:42] we have __main__ inside model.py, and we run it via blubber config
[13:44:13] now if we had a runner.py containing only __main__, we could easily set variables etc.. before anything from model.py is imported
[13:44:21] and of course we'd run runner.py
[13:45:54] so the entrypoint is runner.py and in runner.py we import model.py?
[13:46:02] exactly yes
[13:46:05] where do we put the runner.py?
[13:46:42] (03PS5) 10Elukey: [WIP] readability: set automatically OMP_NUM_THREADS [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/1011131 (https://phabricator.wikimedia.org/T360111)
[13:46:45] aiko: --^
[13:46:48] like this
[13:47:05] just an idea, you folks can tell me "no absolutely not!"
[13:47:07] :D
[13:48:22] I think it is a good idea!
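The runner.py idea above hinges entirely on import order: an environment variable that a library reads at import time must be exported before the `import` statement that pulls the library in. A self-contained way to convince yourself of this, using a throwaway module in place of model.py/torch (all names here are made up for the demo):

```python
import importlib
import os
import sys
import tempfile
import textwrap

# fake_model stands in for model.py: like torch, it reads
# OMP_NUM_THREADS once, at import time.
MODEL_SRC = textwrap.dedent("""
    import os
    THREADS_AT_IMPORT = os.environ.get("OMP_NUM_THREADS")
""")

def run_like_runner_py() -> str:
    """Sketch of the proposed runner.py: set the env var, *then* import."""
    with tempfile.TemporaryDirectory() as d:
        with open(os.path.join(d, "fake_model.py"), "w") as f:
            f.write(MODEL_SRC)
        sys.path.insert(0, d)
        try:
            os.environ["OMP_NUM_THREADS"] = "4"  # before the import
            fake_model = importlib.import_module("fake_model")
            return fake_model.THREADS_AT_IMPORT
        finally:
            sys.path.remove(d)
            sys.modules.pop("fake_model", None)
```

Because the variable is set before `import_module` runs, the module sees "4"; swap the two statements (which is effectively what happens when the `from model_server import ...` line sits at the top of runner.py) and it sees whatever was in the environment beforehand.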
[13:48:32] (03PS6) 10Elukey: [WIP] readability: set automatically OMP_NUM_THREADS [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/1011131 (https://phabricator.wikimedia.org/T360111)
[13:48:36] fixed one little thing
[13:48:55] aiko: can you tell me how you tested it on stat1008? So I can try to see if any of it works :D
[13:48:59] (if you have time)
[13:49:34] ah no wait
[13:49:46] from model_server import ReadabilityModel happens before set_omp_num_threads
[13:49:50] * elukey cries in a corner
[13:50:54] ahhh right
[13:51:41] it's an endless loop 😂
[13:52:20] /me back and catching up to jump in the conversation
[13:52:35] (03PS7) 10Elukey: [WIP] readability: set automatically OMP_NUM_THREADS [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/1011131 (https://phabricator.wikimedia.org/T360111)
[13:52:48] this is the only thing I can think of --^
[13:52:53] not really clean though
[13:53:19] isaranto: o/ re: batcher: yes, that is what the upstream code does
[13:55:56] elukey: nice! worth testing it
[13:56:19] let's see what others think, Ilias may not like it at all :D
[13:57:54] what I like is irrelevant. I'm an adult and I've learned that things won't go the way I like anywayz :P
[13:58:20] about testing on stat1008, I use conda to install required deps, download the model, and then run the model server "locally" on stat1008 (like how the makefile does)
[13:58:20] very diplomatic way to tell Luca that his code is garbage
[13:58:29] :D
[13:58:44] aiko: ahh ok we have a Makefile, nice
[13:58:53] nono it was a bad humor attempt
[13:59:02] ahahahh nono I am joking
[14:01:34] elukey: I'm all in if this works, it isn't bad at all
[14:01:50] elukey: Makefile uses virtualenv as default, but the problem is the python that the venv uses is too old on stat1008, so I turned to conda
[14:02:04] aiko: nice :)
[14:02:13] isaranto: thanks! Somehow I feel bad about proposing this code though
[14:02:20] I'll try to see if it works
[14:03:04] another alternative similar to what you're doing would be the following: create a module named `setcpucount` and in that file run the function. then on import it would be executed and the env var would be set
[14:05:57] * elukey nods
[14:05:59] lemme know if what I'm saying is clear. tbh not really different from what you're doing
[14:06:00] I am reading https://pytorch.org/docs/stable/notes/cpu_threading_torchscript_inference.html
[14:06:23] have we tried to set torch.set_num_threads?
[14:06:26] it will just hide all the ugliness in a module
[14:06:29] I don't recall
[14:06:43] that would be way easier
[14:06:47] I really really don't remember
[14:08:33] wow
[14:09:38] so in theory we should probably set set_num_interop_threads and set_num_threads to the same number
[14:12:50] I think we didn't try torch.set_num_threads before
[14:13:58] (03PS8) 10Elukey: readability: set torch threads using get_cpu_count [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/1011131 (https://phabricator.wikimedia.org/T360111)
[14:14:14] ok so this is my new proposal :D --^
[14:15:00] does it make sense?
[14:15:58] maybe with a little logging
[14:17:02] (03PS9) 10Elukey: readability: set torch threads using get_cpu_count [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/1011131 (https://phabricator.wikimedia.org/T360111)
[14:22:04] (03CR) 10AikoChou: [C:03+1] "LGTM!" [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/1011131 (https://phabricator.wikimedia.org/T360111) (owner: 10Elukey)
[14:26:02] (03CR) 10Ilias Sarantopoulos: [C:03+1] readability: set torch threads using get_cpu_count (031 comment) [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/1011131 (https://phabricator.wikimedia.org/T360111) (owner: 10Elukey)
[14:29:36] (03CR) 10Elukey: readability: set torch threads using get_cpu_count (031 comment) [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/1011131 (https://phabricator.wikimedia.org/T360111) (owner: 10Elukey)
[14:30:45] (03PS10) 10Elukey: readability: set torch threads using get_cpu_count [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/1011131 (https://phabricator.wikimedia.org/T360111)
[14:30:57] (03CR) 10Elukey: readability: set torch threads using get_cpu_count (031 comment) [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/1011131 (https://phabricator.wikimedia.org/T360111) (owner: 10Elukey)
[14:31:46] (03CR) 10Ilias Sarantopoulos: [C:03+1] readability: set torch threads using get_cpu_count [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/1011131 (https://phabricator.wikimedia.org/T360111) (owner: 10Elukey)
[15:18:44] 06Machine-Learning-Team: Support building and running of articletopic-outlink model-server via Makefile - https://phabricator.wikimedia.org/T360177#9634149 (10kevinbazira) Currently, the method that loads a model has a hardcoded model path. When we set the path through an environment variable, the error below...
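The approach settled on above (call the torch thread APIs at runtime instead of racing the imports with OMP_NUM_THREADS) might look like the sketch below. This is a hedged reconstruction, not the actual patch: the real `get_cpu_count` in resource_utils.py may read cgroup limits rather than the `CPU_COUNT` override assumed here.

```python
import os

def get_cpu_count() -> int:
    """Hypothetical stand-in for the repo's get_cpu_count helper:
    honour an explicit CPU_COUNT override (e.g. set from the pod spec),
    otherwise fall back to whatever the OS reports."""
    override = os.environ.get("CPU_COUNT")
    if override is not None:
        return max(1, int(override))
    return max(1, os.cpu_count() or 1)

def configure_torch_threads() -> None:
    """Set torch's intra-op and inter-op pools to the same size, per
    https://pytorch.org/docs/stable/notes/cpu_threading_torchscript_inference.html
    Unlike OMP_NUM_THREADS, these are runtime calls, so they work even
    after `import torch` has already happened elsewhere."""
    import torch

    n = get_cpu_count()
    torch.set_num_threads(n)          # intra-op (within single ops)
    torch.set_num_interop_threads(n)  # inter-op; must be called before
                                      # any inter-op parallel work starts
```

Note the one-shot nature of `set_num_interop_threads`: PyTorch raises if it is called after the inter-op pool has been started, so this should run early in model-server startup (e.g. in the model class `__init__`).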
[15:31:19] (03CR) 10Ilias Sarantopoulos: huggingface: add huggingface image (031 comment) [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/1009783 (https://phabricator.wikimedia.org/T357986) (owner: 10Ilias Sarantopoulos)
[15:47:50] (03PS1) 10Kevin Bazira: articletopic-outlink: load model path from environment variable [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/1010979 (https://phabricator.wikimedia.org/T360177)
[15:49:31] I am trying `make readability` from inference-services, and I got an error with the nltk downloader
[15:49:34] ModuleNotFoundError: No module named '_sqlite3'
[15:49:36] has this happened to somebody else?
[15:51:36] (03CR) 10Ilias Sarantopoulos: [C:03+1] "Nice!" [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/1010979 (https://phabricator.wikimedia.org/T360177) (owner: 10Kevin Bazira)
[15:51:55] lemme try it now Luca
[15:52:23] I am on stat1008, it may be py3.11 related
[15:52:27] thanks :)
[15:52:28] (03PS1) 10AikoChou: revertrisk: improve error messages [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/1011305 (https://phabricator.wikimedia.org/T351278)
[15:52:56] oh my
[15:53:08] I just ran an rm -rf /* just like last Friday
[15:53:12] what is going on???
[15:53:21] on your laptop?
[15:54:02] Yes :)
[15:54:20] At least now I know what happens
[15:54:27] I have to reinstall the browser
[15:56:18] I think it's a sign to end the week ;)
[15:57:40] iirc python 3.11 will surely fail cause there is no pytorch 1.13 available
[15:57:46] klausman: FYI I am trying some special telemetry settings in staging to reduce the number of labels via istio config
[15:58:04] shouldn't change much but in case something weird pops up lemme know
[15:58:07] no there is, nevermind, this is mac specific
[15:58:08] ah, doing it live from the deployment server or are there patches?
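Kevin's patch above replaces the hardcoded model path with one read from the environment. The usual shape of such a fix is sketched below; the variable name `MODEL_PATH` and the default are illustrative assumptions, not necessarily what the actual patch uses:

```python
import os

def get_model_path(default: str = "/mnt/models/model.bin") -> str:
    """Read the model path from the environment so local runs (Makefile,
    stat hosts) can point at a downloaded model without code changes,
    while the deployed container falls back to the default mount path.
    MODEL_PATH and the default here are illustrative names only."""
    return os.environ.get("MODEL_PATH", default)
```

Keeping a sensible default means production deployments need no extra configuration, while `MODEL_PATH=/path/to/local/model make articletopic-outlink` style invocations override it.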
[15:58:24] the former
[15:58:34] I am not sure if it works or not, need to tes tit
[15:58:36] *tes tit
[15:58:41] *test it
[15:59:03] need to relearn to type on a keyboard
[15:59:12] Is this basically the same approach as https://github.com/istio/istio/issues/38841 ?
[15:59:18] (03CR) 10Kevin Bazira: [C:03+2] "Thanks for the review :)" [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/1010979 (https://phabricator.wikimedia.org/T360177) (owner: 10Kevin Bazira)
[15:59:34] basically yes, I am checking if it works and if we have duplicated metrics
[15:59:44] :+1:
[15:59:46] https://istio.io/latest/docs/tasks/observability/metrics/telemetry-api/
[16:00:04] (03Merged) 10jenkins-bot: articletopic-outlink: load model path from environment variable [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/1010979 (https://phabricator.wikimedia.org/T360177) (owner: 10Kevin Bazira)
[16:08:11] from https://stackoverflow.com/questions/43993890/modulenotfounderror-no-module-named-sqlite3 it seems that they suggest rebuilding python, that seems a little too much
[16:08:30] https://github.com/nltk/nltk/issues/2521
[16:15:14] w8.. stat1008 has python 3.7
[16:16:01] (03CR) 10AikoChou: "*https://ores-legacy.wikimedia.org/v3/scores/rowiki/15925124?models=damaging" [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/1011305 (https://phabricator.wikimedia.org/T351278) (owner: 10AikoChou)
[16:19:00] elukey: I can't replicate it, but I'm sure I've encountered it in the past
[16:23:46] isaranto: good point, I had pyenv overriding it, lemme try with the system version
[16:23:54] could be a pebcak from me
[16:24:48] I found another issue though: we don't have pyopencl installed. other than that it runs fine for me
[16:27:14] exactly the same for me
[16:27:20] ERROR: No matching distribution found for pyopencl==2024.1
[16:29:40] pip install -r python/requirements.txt
[16:31:00] also no matching distro for kserve 0.11.2
[16:31:40] that would be python related but weird
[16:32:08] ah if it is python 3.7 then it would make sense. No support anymore
[16:32:11] 06Machine-Learning-Team: Add GPU check in all images - https://phabricator.wikimedia.org/T360212 (10isarantopoulos) 03NEW
[16:32:57] I'm going afk folks. I created a task --^ so that it doesn't fall through the cracks. Have a nice evening and cu next week <3
[16:33:02] <3
[16:33:05] o/
[16:35:03] I use conda to create an env, and then still use pip to install dependencies
[16:35:08] conda-analytics-clone my-venv
[16:35:13] source conda-analytics-activate my-venv
[16:35:17] the python version will be 3.10
[16:35:56] bye Ilias! have a nice weekend :)
[16:43:31] aiko: yep yep makes sense, but I think we may have the same issue with conda as well if libsqlite3-dev is not installed
[16:43:45] tried to add it on stat1008 now, and I am deploying pyenv 3.8
[16:43:57] if it works I'll propose a patch to DE to have it permanently
[16:44:12] it is a problem every time we run something that uses nltk
[16:56:49] ah lovely, I see that I cannot bind on port 8081 since Aiko is using it
[16:57:13] ahh sorry I closed it
[16:57:14] a feature worth adding: letting you specify the port
[16:57:33] aiko: nono please! I think we just need to use a different port if needed!
[16:57:41] please check if you are testing :)
[16:57:59] I managed to make it work with pyenv and python 3.10
[16:58:05] I'm not testing rn :)
[16:58:21] cool!
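The `_sqlite3` error chased above comes from the interpreter itself, not from pip-installable packages: the `_sqlite3` C extension is compiled into CPython only when the sqlite headers (libsqlite3-dev on Debian) are present at build time, which is why a pyenv-built Python on a minimal host can lack it while a conda or system Python has it. A quick probe to tell the two situations apart:

```python
def interpreter_has_sqlite() -> bool:
    """Check whether this Python was compiled with sqlite support.
    If this returns False, `pip install` can't fix it: the remedy is
    installing libsqlite3-dev and rebuilding the interpreter (or using
    one, like the conda-analytics env, that was built with it)."""
    try:
        import sqlite3  # thin stdlib wrapper around the _sqlite3 C extension
        return True
    except ModuleNotFoundError:
        return False
```

Running this in the failing pyenv environment versus the working conda one shows immediately whether the nltk downloader error is an interpreter-build problem rather than a dependency problem.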
[16:59:04] super, it works :)
[16:59:20] \o/
[17:02:38] and I see the number of threads vary when I change the cpu count (via env variable)
[17:02:49] more or less
[17:03:38] (03CR) 10Elukey: [C:03+2] "Tested on stat1008, it seems to work, but I'll have to see in staging if the threads trick really works." [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/1011131 (https://phabricator.wikimedia.org/T360111) (owner: 10Elukey)
[17:03:44] aiko: thanks for the support!
[17:06:27] (03PS11) 10Elukey: readability: set torch threads using get_cpu_count [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/1011131 (https://phabricator.wikimedia.org/T360111)
[17:08:08] 06Machine-Learning-Team, 13Patch-For-Review: Set automatically libomp's num threads when using Pytorch - https://phabricator.wikimedia.org/T360111#9634518 (10elukey) The main issue with the approach of setting the OMP_NUM_THREADS variable is that IIUC it needs to be set before torch is imported/initialized, th...
[17:11:43] elukey: anytime <3
[17:11:54] elukey: <3
[17:15:41] (03CR) 10Elukey: readability: set torch threads using get_cpu_count [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/1011131 (https://phabricator.wikimedia.org/T360111) (owner: 10Elukey)
[17:15:45] (03CR) 10Elukey: [C:03+2] readability: set torch threads using get_cpu_count [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/1011131 (https://phabricator.wikimedia.org/T360111) (owner: 10Elukey)
[17:16:34] (03Merged) 10jenkins-bot: readability: set torch threads using get_cpu_count [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/1011131 (https://phabricator.wikimedia.org/T360111) (owner: 10Elukey)
[17:22:45] going afk o/ enjoy the weekend folks :D
[17:23:24] you too!
[17:27:09] going afk as well!
[17:27:14] have a nice weekend all
[17:46:37] \o