[07:50:24] <isaranto>	 moooorning!
[09:19:25] <aiko>	 Καλημέρα!
[09:25:32] <isaranto>	 早上好 <3
[09:26:26] <isaranto>	 you know I just looked that up 🙂‍↕️
[09:34:16] <aiko>	 hahaha same 🤓
[09:46:41] <wikibugs>	 06Machine-Learning-Team: Support building and running of articletopic-outlink model-server via Makefile - https://phabricator.wikimedia.org/T360177 (10kevinbazira) 03NEW
[09:46:45] <klausman>	 Guten Morgen :)
[09:48:05] <wikibugs>	 06Machine-Learning-Team: Support building and running of articletopic-outlink model-server via Makefile - https://phabricator.wikimedia.org/T360177#9633442 (10kevinbazira) While building the articletopic-outlink model-server locally, the error below is thrown. We encountered a similar error in T357382#9536821 an...
[10:07:52] <isaranto>	 Morgen Tobias!
[11:15:05] <aiko>	 reading the kserve batcher PR. one thing I noticed is that the current behaviour is it will wait for previous batch to finish before starting the next batch
[11:15:31] <aiko>	 it would be better if it were async
[11:19:59] * aiko lunch :D
[11:20:38] <isaranto>	 do you mean that this is what kserve batch does?
[11:20:44] <isaranto>	 I mean the upstream code
[11:43:14] <wikibugs>	 (03PS4) 10Ilias Sarantopoulos: huggingface: add huggingface image [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/1009783 (https://phabricator.wikimedia.org/T357986)
[11:46:52] <wikibugs>	 (03PS5) 10Ilias Sarantopoulos: huggingface: add huggingface image [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/1009783 (https://phabricator.wikimedia.org/T357986)
[11:48:02] <isaranto>	 finally made all things work as described in the README. I ended up in a horrible entrypoint as I required conditionals for the final python command
[11:48:59] * klausman lunch
[11:49:19] <wikibugs>	 (03PS6) 10Ilias Sarantopoulos: huggingface: add huggingface image [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/1009783 (https://phabricator.wikimedia.org/T357986)
[12:32:54] * isaranto lunch
[13:19:12] <elukey>	 hello folks!
[13:21:19] <isaranto>	 \o
[13:24:34] <wikibugs>	 (03CR) 10Elukey: [C:04-1] "As always you are spot-on, I didn't think about this case. Indeed this is our entrypoint for something like revscoring (just an example):" [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/1011131 (https://phabricator.wikimedia.org/T360111) (owner: 10Elukey)
[13:24:46] <wikibugs>	 (03CR) 10Elukey: [C:03+2] resource_utils.py: add a function to automatically set OMP_NUM_THREADS [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/1011130 (https://phabricator.wikimedia.org/T360111) (owner: 10Elukey)
[13:26:24] <aiko>	 hi luca o/
[13:26:42] <elukey>	 hello hello
[13:26:49] <klausman>	 Buon giorno
[13:26:59] <elukey>	 you are totally right about __init__.py aiko 
[13:27:05] <elukey>	 now I am thinking another way
[13:28:41] <klausman>	 elukey: So Keith +1'd the SLo buckets change. I'm fine with merging it today, but maybe you'd prefer we wait 'til Monday?
[13:28:48] <elukey>	 yes please
[13:28:56] <klausman>	 Which of the two? :D
[13:29:26] <elukey>	 the latter, deploying on a friday is not a great idea (especially since it is not a system that we own and the whole SRE team is in Poland)
[13:30:19] <chrisalbon>	 Good morning alll
[13:30:28] <elukey>	 morning!
[13:30:52] <isaranto>	 Hey Chris!
[13:30:57] * isaranto afk for a bit
[13:31:03] <klausman>	 alright, will wait 'til Monday
[13:31:09] <aiko>	 hi Chris! 
[13:32:14] <elukey>	 aiko: so I am trying to find a good place for set_omp_num_threads, before torch is imported.. I can't use __main__ (that would be ideal) since in theory it happens after all the imports are evaluated
[13:34:12] <wikibugs>	 (03Merged) 10jenkins-bot: resource_utils.py: add a function to automatically set OMP_NUM_THREADS [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/1011130 (https://phabricator.wikimedia.org/T360111) (owner: 10Elukey)
[13:34:15] <elukey>	 we could put it at the very top of model.py, but our linters rightfully complain that is not a good idea
[13:36:34] <aiko>	 elukey: mmm I can't think of other place we can place it 
[13:38:35] <aiko>	 I was also thinking to put it in __main__, but yeah it happens after imports
[13:40:14] <aiko>	 does it really need to be set before torch is imported?
[13:40:22] <elukey>	 as far as I know yes
[13:40:30] <elukey>	 same thing for other things like numpy etc..
[13:40:49] <elukey>	 maybe some libraries do allow to set it after improt
[13:40:52] <elukey>	 *import
[13:41:54] <aiko>	 I see
[13:43:17] <elukey>	 so there is a cleaner way, but not sure if you folks like it
[13:43:42] <elukey>	 we have __main__ inside model.py, and we run it via blubber config
[13:44:13] <elukey>	 now if we had a runner.py containing only __main__, we could easily set variables etc.. before anything from model.py is imported
[13:44:21] <elukey>	 and of course we'd run runner.py
[13:45:54] <aiko>	 so the entrypoint is runner.py and in runner.py we import model.py?
[13:46:02] <elukey>	 exacly yes
[13:46:05] <aiko>	 where do we put the runner.py?
[13:46:42] <wikibugs>	 (03PS5) 10Elukey: [WIP] readability: set automatically OMP_NUM_THREADS [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/1011131 (https://phabricator.wikimedia.org/T360111)
[13:46:45] <elukey>	 aiko: --^
[13:46:48] <elukey>	 like this 
[13:47:05] <elukey>	 just an idea, you folks can tell me "no absolutely not!"
[13:47:07] <elukey>	 :D
[13:48:22] <aiko>	 I think it is a good idea!
[13:48:32] <wikibugs>	 (03PS6) 10Elukey: [WIP] readability: set automatically OMP_NUM_THREADS [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/1011131 (https://phabricator.wikimedia.org/T360111)
[13:48:36] <elukey>	 fixed one little thing
[13:48:55] <elukey>	 aiko: can you tell me how you tested it on stat1008? So I can try to see if any of it works :D
[13:48:59] <elukey>	 (if you have time)
[13:49:34] <elukey>	 ah no wait
[13:49:46] <elukey>	 from model_server import ReadabilityModel happens before set_omp_num_threads
[13:49:50] * elukey cries in a corner
[13:50:54] <aiko>	 ahhh right
[13:51:41] <aiko>	 it's an endless loop 😂
[13:52:20] <isaranto>	  /me back and catching up to jump in the conversation
[13:52:35] <wikibugs>	 (03PS7) 10Elukey: [WIP] readability: set automatically OMP_NUM_THREADS [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/1011131 (https://phabricator.wikimedia.org/T360111)
[13:52:48] <elukey>	 this is the only thing I can think of --^
[13:52:53] <elukey>	 not really clean though
[13:53:19] <aiko>	 isaranto: o/ re: batcher: yes that is how the upstream code does
[13:55:56] <aiko>	 elukey: nice! worth to test it 
[13:56:19] <elukey>	 let's see what others think, Ilias may not like it at all :D
[13:57:54] <isaranto>	 what I like is irrelevant. I'm an adult and I've learned that things won't go the way I like anywayz :P 
[13:58:20] <aiko>	 about testing on stat1008, I use conda to install required deps, download the model,  and then run the model server "locally" on stat1008 (like how the makefile does)
[13:58:20] <elukey>	 very diplomatic way to tell Luca that his code is gargabe
[13:58:29] <elukey>	 :D
[13:58:44] <elukey>	 aiko: ahh ok we have a Makefile, nice
[13:58:53] <isaranto>	 nono it was bad humor attempt
[13:59:02] <elukey>	 ahahahh nono I am joking
[14:01:34] <isaranto>	 elukey: I'm all in if this works, it isn't bad at all
[14:01:50] <aiko>	 elukey: Makefile uses virtualenv as default, but the problem is the python that venv uses is too old on stat1008, so I turned to use conda
[14:02:04] <elukey>	 aiko: nice :)
[14:02:13] <elukey>	 isaranto: thanks! Somehow I feel bad in proposing this code though
[14:02:20] <elukey>	 I'll try to see if it works
[14:03:04] <isaranto>	 another alternative similar to what you're doing would be the following: create a module named `setcpucount` and in that file run the function. then on import it would be executed and the env var would be set
[14:05:57] * elukey nods
[14:05:59] <isaranto>	 lemme know if what I'm saying is clear. tbh not really different from what you're doing 
[14:06:00] <elukey>	 I am reading https://pytorch.org/docs/stable/notes/cpu_threading_torchscript_inference.html
[14:06:23] <elukey>	 have we tried to set torch.set_num_threads ?
[14:06:26] <isaranto>	 it will just hide all the ugliness in a module
[14:06:29] <elukey>	 I don't recall
[14:06:43] <elukey>	 that would be way easier
[14:06:47] <isaranto>	 I really really don't remember
[14:08:33] <aiko>	 wow
[14:09:38] <elukey>	 so in theory we should probably set set_num_interop_threads and set_num_threads to the same number
[14:12:50] <aiko>	 I think we didn't try torch.set_num_threads before
[14:13:58] <wikibugs>	 (03PS8) 10Elukey: readability: set torch threads using get_cpu_count [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/1011131 (https://phabricator.wikimedia.org/T360111)
[14:14:14] <elukey>	 ok so this is my new proposal :D --^
[14:15:00] <elukey>	 does it make sense?
[14:15:58] <elukey>	 maybe with a little logging
[14:17:02] <wikibugs>	 (03PS9) 10Elukey: readability: set torch threads using get_cpu_count [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/1011131 (https://phabricator.wikimedia.org/T360111)
[14:22:04] <wikibugs>	 (03CR) 10AikoChou: [C:03+1] "LGTM!" [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/1011131 (https://phabricator.wikimedia.org/T360111) (owner: 10Elukey)
[14:26:02] <wikibugs>	 (03CR) 10Ilias Sarantopoulos: [C:03+1] readability: set torch threads using get_cpu_count (031 comment) [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/1011131 (https://phabricator.wikimedia.org/T360111) (owner: 10Elukey)
[14:29:36] <wikibugs>	 (03CR) 10Elukey: readability: set torch threads using get_cpu_count (031 comment) [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/1011131 (https://phabricator.wikimedia.org/T360111) (owner: 10Elukey)
[14:30:45] <wikibugs>	 (03PS10) 10Elukey: readability: set torch threads using get_cpu_count [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/1011131 (https://phabricator.wikimedia.org/T360111)
[14:30:57] <wikibugs>	 (03CR) 10Elukey: readability: set torch threads using get_cpu_count (031 comment) [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/1011131 (https://phabricator.wikimedia.org/T360111) (owner: 10Elukey)
[14:31:46] <wikibugs>	 (03CR) 10Ilias Sarantopoulos: [C:03+1] readability: set torch threads using get_cpu_count [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/1011131 (https://phabricator.wikimedia.org/T360111) (owner: 10Elukey)
[15:18:44] <wikibugs>	 06Machine-Learning-Team: Support building and running of articletopic-outlink model-server via Makefile - https://phabricator.wikimedia.org/T360177#9634149 (10kevinbazira) Currently, the method that loads a model has a hardcoded model path. When we set the path through an environmental variable, the error below...
[15:31:19] <wikibugs>	 (03CR) 10Ilias Sarantopoulos: huggingface: add huggingface image (031 comment) [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/1009783 (https://phabricator.wikimedia.org/T357986) (owner: 10Ilias Sarantopoulos)
[15:47:50] <wikibugs>	 (03PS1) 10Kevin Bazira: articletopic-outlink: load model path from environment variable [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/1010979 (https://phabricator.wikimedia.org/T360177)
[15:49:31] <elukey>	 I am trying `make readability` from inference-services, and I got an error with then nltk downloader
[15:49:34] <elukey>	 ModuleNotFoundError: No module named '_sqlite3'
[15:49:36] <elukey>	 happened to somebody else?
[15:51:36] <wikibugs>	 (03CR) 10Ilias Sarantopoulos: [C:03+1] "Nice!" [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/1010979 (https://phabricator.wikimedia.org/T360177) (owner: 10Kevin Bazira)
[15:51:55] <isaranto>	 lemme try it now Luca
[15:52:23] <elukey>	 I am on stat1008, it may be py3.11 related
[15:52:27] <elukey>	 thanks :)
[15:52:28] <wikibugs>	 (03PS1) 10AikoChou: revertrisk: improve error messages [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/1011305 (https://phabricator.wikimedia.org/T351278)
[15:52:56] <isaranto>	 oh my
[15:53:08] <isaranto>	 I just ran an rm -rf /* just like last Friday
[15:53:12] <isaranto>	 what is going on???
[15:53:21] <elukey>	 on your laptop?
[15:54:02] <isaranto>	 Yes :)
[15:54:20] <isaranto>	 At least now I know what happens
[15:54:27] <isaranto>	 I have to reinstall browser
[15:56:18] <klausman>	 I think it's a sign to end the week ;)
[15:57:40] <isaranto>	 iirc python 3.11 will surely fail cause there is no pytorch 1.13 available
[15:57:46] <elukey>	 klausman: FYI I am trying some special telemetry settings in staging to reduce the number of labels via istio config
[15:58:04] <elukey>	 shouldn't change much but in case something weird pops up lemme know
[15:58:07] <isaranto>	 no there is nevermind this is mac specific
[15:58:08] <klausman>	 ah, doing it live from deployment server or are there patches?
[15:58:24] <elukey>	 the former
[15:58:34] <elukey>	 I am not sure if it works or not, need to tes tit
[15:58:36] <elukey>	 *tes tit
[15:58:41] <elukey>	 *test it
[15:59:03] <elukey>	 need to relearn to type on a keyboard
[15:59:12] <klausman>	 Is this basically the same approach as https://github.com/istio/istio/issues/38841 ?
[15:59:18] <wikibugs>	 (03CR) 10Kevin Bazira: [C:03+2] "Thanks for the review :)" [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/1010979 (https://phabricator.wikimedia.org/T360177) (owner: 10Kevin Bazira)
[15:59:34] <elukey>	 basically yes, I am checking if it works and if we have duplicated metrics
[15:59:44] <klausman>	 :+1:
[15:59:46] <elukey>	 https://istio.io/latest/docs/tasks/observability/metrics/telemetry-api/
[16:00:04] <wikibugs>	 (03Merged) 10jenkins-bot: articletopic-outlink: load model path from environment variable [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/1010979 (https://phabricator.wikimedia.org/T360177) (owner: 10Kevin Bazira)
[16:08:11] <elukey>	 from https://stackoverflow.com/questions/43993890/modulenotfounderror-no-module-named-sqlite3 it seems that they suggest to rebuild python, that seems a little too much
[16:08:30] <elukey>	 https://github.com/nltk/nltk/issues/2521
[16:15:14] <isaranto>	 w8.. stat1008 has 3.7 python 
[16:16:01] <wikibugs>	 (03CR) 10AikoChou: "*https://ores-legacy.wikimedia.org/v3/scores/rowiki/15925124?models=damaging" [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/1011305 (https://phabricator.wikimedia.org/T351278) (owner: 10AikoChou)
[16:19:00] <isaranto>	 elukey: I can't replicate but I'm sure I've encountered it in the past
[16:23:46] <elukey>	 isaranto: good point, I had pyenv overriding it, lemme try with the system version
[16:23:54] <elukey>	 could be a pebcak from me
[16:24:48] <isaranto>	 I found another issue though we dont have pyopencl installed. other than that runs fine for me
[16:27:14] <elukey>	 exactly same for me
[16:27:20] <elukey>	 ERROR: No matching distribution found for pyopencl==2024.1
[16:29:40] <isaranto>	 pip install -r python/requirements.txt
[16:31:00] <elukey>	 also no matching distro for kserve 0.11.2
[16:31:40] <isaranto>	 that would be python related but weird
[16:32:08] <isaranto>	 a if it is python 3.7 then it would make sense. No support anymore
[16:32:11] <wikibugs>	 06Machine-Learning-Team: Add GPU check in all images - https://phabricator.wikimedia.org/T360212 (10isarantopoulos) 03NEW
[16:32:57] <isaranto>	 I'm going afk folks. I created a task --^ so that it doesn't fall through the cracks. Have a nice evening and cu next week <#
[16:33:02] <isaranto>	 <3
[16:33:05] <elukey>	 o/
[16:35:03] <aiko>	 I use conda to create an env, and then still use pip to install dependencies
[16:35:08] <aiko>	 conda-analytics-clone my-venv
[16:35:13] <aiko>	 source conda-analytics-activate my-venv
[16:35:17] <aiko>	 the python version will be 3.10
[16:35:56] <aiko>	 bye Ilias! have a nice weekend :)
[16:43:31] <elukey>	 aiko: yep yep makes sense, but I think we may have the same issue with conda as well if libsqlite3-dev is not installed
[16:43:45] <elukey>	 tried to add it on stat1008 now, and I am deploying pyenv 3.8
[16:43:57] <elukey>	 if it works I'll propose a patch to DE to have it permanently
[16:44:12] <elukey>	 it is a problem every time we run something that use nltk
[16:56:49] <elukey>	 ah lovely I see that I cannot bind on port 8081 since Aiko is using it
[16:57:13] <aiko>	 ahh sorry I closed it
[16:57:14] <elukey>	 feature worth to add, namely to specify the port
[16:57:33] <elukey>	 aiko: nono please! I think we just need to have a different port if needed!
[16:57:41] <elukey>	 please check if you are testing :)
[16:57:59] <elukey>	 I managed to make it work with pyenv and python 3.10
[16:58:05] <aiko>	 I'm not testing rn :)
[16:58:21] <aiko>	 cool!
[16:59:04] <elukey>	 super it works :)
[16:59:20] <aiko>	 \o/
[17:02:38] <elukey>	 and I see the number of threads that vary when I change the cpu count (via env variable)
[17:02:49] <elukey>	 more or less
[17:03:38] <wikibugs>	 (03CR) 10Elukey: [C:03+2] "Tested on stat1008, it seems to work, but I'll have to see in staging if the threads trick really work." [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/1011131 (https://phabricator.wikimedia.org/T360111) (owner: 10Elukey)
[17:03:44] <elukey>	 aiko: thanks for the support!
[17:06:27] <wikibugs>	 (03PS11) 10Elukey: readability: set torch threads using get_cpu_count [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/1011131 (https://phabricator.wikimedia.org/T360111)
[17:08:08] <wikibugs>	 06Machine-Learning-Team, 13Patch-For-Review: Set automatically libomp's num threads when using Pytorch - https://phabricator.wikimedia.org/T360111#9634518 (10elukey) The main issue with the approach of setting the OMP_NUM_THREADS variable is that IIUC it needs to be set before torch is imported/initialized, th...
[17:11:43] <aiko>	 elucky: anytime <3
[17:11:54] <aiko>	 elukey: <3
[17:15:41] <wikibugs>	 (03CR) 10Elukey: readability: set torch threads using get_cpu_count [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/1011131 (https://phabricator.wikimedia.org/T360111) (owner: 10Elukey)
[17:15:45] <wikibugs>	 (03CR) 10Elukey: [C:03+2] readability: set torch threads using get_cpu_count [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/1011131 (https://phabricator.wikimedia.org/T360111) (owner: 10Elukey)
[17:16:34] <wikibugs>	 (03Merged) 10jenkins-bot: readability: set torch threads using get_cpu_count [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/1011131 (https://phabricator.wikimedia.org/T360111) (owner: 10Elukey)
[17:22:45] <aiko>	 going afk o/ enjoy the weekend folks :D 
[17:23:24] <elukey>	 you too!
[17:27:09] <elukey>	 going afk as well!
[17:27:14] <elukey>	 have a nice weekend all
[17:46:37] <klausman>	 \o