[08:08:50] (PS1) Elukey: articlequality: improve aiohttp client session handling [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/844916 (https://phabricator.wikimedia.org/T320374)
[08:09:04] hello :)
[08:11:09] (CR) CI reject: [V: -1] articlequality: improve aiohttp client session handling [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/844916 (https://phabricator.wikimedia.org/T320374) (owner: Elukey)
[08:13:39] what
[08:13:42] ahh, black
[08:14:28] (PS2) Elukey: articlequality: improve aiohttp client session handling [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/844916 (https://phabricator.wikimedia.org/T320374)
[08:22:42] (PS1) Elukey: draftquality: improve aiohttp client session handling [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/844917 (https://phabricator.wikimedia.org/T320374)
[08:28:35] (PS1) Elukey: topic: improve aiohttp client session handling [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/844919 (https://phabricator.wikimedia.org/T320374)
[08:29:02] all right, the code reviews for the aiohttp client session are out
[08:42:56] (CR) AikoChou: articlequality: improve aiohttp client session handling (1 comment) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/844916 (https://phabricator.wikimedia.org/T320374) (owner: Elukey)
[08:45:17] (CR) AikoChou: [C: +1] draftquality: improve aiohttp client session handling [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/844917 (https://phabricator.wikimedia.org/T320374) (owner: Elukey)
[08:45:32] (CR) AikoChou: [C: +1] topic: improve aiohttp client session handling [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/844919 (https://phabricator.wikimedia.org/T320374) (owner: Elukey)
[08:45:52] morning :)
[08:54:42] aiko: o/
[08:54:47] I was reading https://stackoverflow.com/questions/55027940/is-run-in-executor-optimized-for-running-in-a-loop-with-coroutines
[08:55:08] I am wondering if we should run self.model.score via loop.run_in_executor
[08:56:21] just tested it locally and it seems to work
[09:01:07] (PS3) Elukey: articlequality: improve aiohttp client session handling [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/844916 (https://phabricator.wikimedia.org/T320374)
[09:01:09] (PS2) Elukey: draftquality: improve aiohttp client session handling [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/844917 (https://phabricator.wikimedia.org/T320374)
[09:01:11] (PS2) Elukey: topic: improve aiohttp client session handling [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/844919 (https://phabricator.wikimedia.org/T320374)
[09:01:26] (CR) Elukey: articlequality: improve aiohttp client session handling (1 comment) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/844916 (https://phabricator.wikimedia.org/T320374) (owner: Elukey)
[09:11:33] (PS1) Elukey: editquality: run the score method in a separate thread [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/844923 (https://phabricator.wikimedia.org/T320374)
[09:11:37] aiko: basically --^
[09:12:37] Lift-Wing, Machine-Learning-Team (Active Tasks): Connect Outlink topic model to eventgate - https://phabricator.wikimedia.org/T315994 (achou) @Isaac Got it! For the moment I only aligned the event output with the existing ORES model. I think it's not super urgent to change the output for on-demand reques...
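For reference, a minimal sketch of what running the synchronous score call via loop.run_in_executor looks like, assuming a kserve-style model class; the class name, the request layout, and "feature_values" are illustrative, not the actual inference-services code:

```python
import asyncio


class RevscoringModel:  # hypothetical name, not the actual repo class
    def __init__(self, model):
        self.model = model

    async def predict(self, request: dict) -> dict:
        loop = asyncio.get_running_loop()
        # None -> use the loop's default executor; kserve 0.8 installs a
        # ThreadPoolExecutor as the default at server startup.
        score = await loop.run_in_executor(
            None, self.model.score, request["feature_values"]
        )
        return {"score": score}
```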
[09:13:00] we could also use a process pool
[09:13:01] https://docs.python.org/3/library/asyncio-eventloop.html#executing-code-in-thread-or-process-pools
[09:13:32] (CR) CI reject: [V: -1] editquality: run the score method in a separate thread [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/844923 (https://phabricator.wikimedia.org/T320374) (owner: Elukey)
[09:13:36] uff
[09:13:54] (PS2) Elukey: editquality: run the score method in a separate thread [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/844923 (https://phabricator.wikimedia.org/T320374)
[09:21:48] could we use the thread pool that the kserve model server created?
[09:23:40] aiko: yeah, in theory we use the default executor set in https://github.com/kserve/kserve/blob/release-0.8/python/kserve/kserve/model_server.py#L129 IIUC
[09:24:00] (CR) AikoChou: [C: +1] articlequality: improve aiohttp client session handling (1 comment) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/844916 (https://phabricator.wikimedia.org/T320374) (owner: Elukey)
[09:27:12] the main issue is that Python's GIL will prevent the thread from running in parallel with the asyncio loop
[09:29:14] so yeah, as said in https://docs.python.org/3/library/asyncio-eventloop.html#executing-code-in-thread-or-process-pools I think having a process pool is probably better (separate from kserve's default loop)
[09:29:59] or we could stop the kserve default executor (the thread pool with max_asyncio_threads etc.)
[09:30:15] create a process pool instead, and set it as the default
[09:30:20] I'm reading https://www.reddit.com/r/Python/comments/8kzxm5/gil_and_asyncio/
[09:37:21] (CR) Elukey: [C: +2] articlequality: improve aiohttp client session handling [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/844916 (https://phabricator.wikimedia.org/T320374) (owner: Elukey)
[09:37:28] (CR) Elukey: [C: +2] draftquality: improve aiohttp client session handling [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/844917 (https://phabricator.wikimedia.org/T320374) (owner: Elukey)
[09:37:31] (CR) Elukey: [C: +2] topic: improve aiohttp client session handling [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/844919 (https://phabricator.wikimedia.org/T320374) (owner: Elukey)
[09:38:55] (Merged) jenkins-bot: articlequality: improve aiohttp client session handling [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/844916 (https://phabricator.wikimedia.org/T320374) (owner: Elukey)
[09:42:48] (Merged) jenkins-bot: draftquality: improve aiohttp client session handling [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/844917 (https://phabricator.wikimedia.org/T320374) (owner: Elukey)
[09:42:53] (Merged) jenkins-bot: topic: improve aiohttp client session handling [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/844919 (https://phabricator.wikimedia.org/T320374) (owner: Elukey)
[09:43:02] new docker images coming :)
[09:44:13] IIUC, the GIL doesn't affect the ability to run multiple IO operations concurrently. It affects CPU-bound tasks.
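A self-contained sketch of the process-pool pattern from the asyncio docs page linked above; cpu_bound_score is a hypothetical stand-in for a synchronous model.score() call:

```python
import asyncio
from concurrent.futures import ProcessPoolExecutor


def cpu_bound_score(features):
    # Stand-in for a synchronous, CPU-bound model.score() call.
    # Must live at module level so the process pool can pickle it.
    return sum(f * f for f in features)


async def main():
    loop = asyncio.get_running_loop()
    with ProcessPoolExecutor() as pool:
        # run_in_executor returns an awaitable future, so the event loop
        # stays free while the worker process does the CPU work.
        result = await loop.run_in_executor(pool, cpu_bound_score, range(10_000))
    print(result)


if __name__ == "__main__":
    asyncio.run(main())
```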
[09:46:38] aiko: the GIL also affects IO operations if they are blocking, for example
[09:46:52] basically only one thread can run
[09:47:34] if you use non-blocking IO, then concurrency between threads increases, but you don't have them running in parallel
[09:47:38] at least this is my understanding
[09:47:58] if you have multiple processes then there is one GIL each, and they can run in parallel
[09:48:20] (but my memories about the GIL may not be exact, lemme know if the above is not right)
[09:50:28] my point is that in our case, if the model.score code runs, the running thread will block the rest until it completes
[09:51:12] but are we still using a blocking function? even though we've changed it to async?
[09:53:12] yeah, model.score is not async
[09:53:22] it is like the time.sleep example
[09:53:34] (https://docs.python.org/3/library/asyncio-eventloop.html#executing-code-in-thread-or-process-pools)
[09:54:37] sorry, I said something incorrect above - the GIL is released when IO is performed, so multiple threads waiting on read() (to receive an HTTP response from a socket, for example) can wait in parallel
[09:54:46] with asyncio we do it with a single thread, which is more efficient
[09:54:55] but the blocking code issue remains
[09:56:05] so now I see the point of kserve's thread pool - we could have used the mwapi "blocking" Session in the thread pool
[09:56:06] Morning!
[09:56:27] o/
[09:56:44] elukey: your memories of the GIL are correct, as far as I am aware. There are some special cases like pypy3 and the like, but base Python3 is still not good at threading
[09:57:24] multi-processing is fine, but of course has other costs (forking isn't as cheap as creating a thread, and sharing data structures becomes much harder)
[09:57:33] but model.score is not an IO task, I think it won't take a lot of time to complete and block other tasks.
[09:58:50] aiko: but if it takes, say, X ms to run, the rest will have to wait, including the asyncio/tornado loop
[09:59:17] if we offload it to a process, then it will run in parallel with the asyncio loop
[10:00:36] klausman: in theory, with a process pool the cost of forking etc. should be minimal
[10:01:23] Yes, but it has the added complexity of process communication. Still might be the better choice, since MP comms are usually easier to get right than MT access to data/memory
[10:02:21] elukey: ok, makes sense. when the number of requests increases past some point, it'll still affect the performance
[10:02:28] _personally,_ I have always preferred MP approaches in Python, as my brain is not suited to MT in that language, and my projects worked fine with the MP overhead (COW forking is pretty cheap)
[10:02:28] aiko: exactly
[10:02:54] klausman: also, https://docs.python.org/3/library/asyncio-eventloop.html#executing-code-in-thread-or-process-pools seems to suggest that an MP process pool returns a future, and we can await on it
[10:03:06] so the added complexity is manageable
[10:03:36] (COW forking with Python is also weird IIRC, due to reference counting etc. that messes things up)
[10:04:33] Using the multiprocessing library, it is relatively easy _if_ you never pass back data from the forked process and just have it exit. But I am sure the asyncio lib has proper support for pools, as indicated by that page.
[10:05:11] If using processes instead of threads can be done relatively easily, I think we should give it a go.
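A runnable toy illustrating the blocking-code point above, with time.sleep standing in for a synchronous model.score(); calling it inline would stall the loop, while offloading keeps it responsive:

```python
import asyncio
import time


def blocking_score():
    time.sleep(1)  # stand-in for a synchronous model.score()
    return 0.9


async def heartbeat():
    # Simulates the event loop serving other requests.
    for _ in range(4):
        print("loop alive")
        await asyncio.sleep(0.3)


async def main():
    loop = asyncio.get_running_loop()
    # Calling blocking_score() directly in a coroutine would silence
    # heartbeat() for a full second; run_in_executor keeps the loop going.
    score, _ = await asyncio.gather(
        loop.run_in_executor(None, blocking_score),
        heartbeat(),
    )
    print("score:", score)


asyncio.run(main())
```

Swapping the run_in_executor call for a plain blocking_score() call shows the heartbeat pausing for the full sleep.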
[10:05:43] the remaining bit to figure out is how it plays out with the k8s cpu/memory constraints
[10:05:47] but we can tune it
[10:06:25] I'll try to work on it this afternoon
[10:06:46] I'd expect the memory hit to be negligible. Our main memory footprint is probably the runtime and code, and those are shared between forked processes
[10:10:41] elukey: I can test it in ml-sandbox first
[10:16:02] aiko: sure!
[10:16:13] going afk for lunch, ttl!
[10:25:37] oh. right. eating. I should do that as well :)
[12:12:41] (CR) Klausman: [C: +1] editquality: run the score method in a separate thread [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/844923 (https://phabricator.wikimedia.org/T320374) (owner: Elukey)
[12:44:02] Lift-Wing, Machine-Learning-Team (Active Tasks): Align ORES prediction output with Lift Wing's one (for revscoring models) - https://phabricator.wikimedia.org/T318932 (achou) The output change has been applied to all other revscoring model servers. Going to mark this as RESOLVED. Feel free to reopen if...
[12:44:35] Machine-Learning-Team: Migrate ORES clients to LiftWing - https://phabricator.wikimedia.org/T312518 (achou)
[12:44:43] Lift-Wing, Machine-Learning-Team (Active Tasks): Align ORES prediction output with Lift Wing's one (for revscoring models) - https://phabricator.wikimedia.org/T318932 (achou) Open→Resolved
[12:50:42] aiko: I checked with time.time() how long it takes to score a rev-id with editquality goodfaith, and indeed it is tiny
[12:50:56] Score time: 0.002106189727783203 seconds
[12:56:10] in local testing, the time seems dominated by the api calls
[12:56:21] elukey: yeah, I remember the number is tiny
[12:56:26] I expected worse timings from the models
[12:59:02] (PS3) Elukey: editquality: run the score method in a separate process [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/844923 (https://phabricator.wikimedia.org/T320374)
[12:59:23] the above works on my local docker env --^
[13:00:01] (CR) CI reject: [V: -1] editquality: run the score method in a separate process [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/844923 (https://phabricator.wikimedia.org/T320374) (owner: Elukey)
[13:01:31] (PS4) Elukey: editquality: run the score method in a separate process [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/844923 (https://phabricator.wikimedia.org/T320374)
[14:00:06] aiko: ah wow, no, the score time changes a lot if I run wrk (so with a load test)
[14:00:11] I can see latencies up to a second
[14:00:16] even more
[14:00:47] really!
[14:01:07] yeah, very weird!
[14:01:15] going afk for some errands, folks! bbiab
[14:01:31] elukey: Are you happy for me to do a rolling reboot of the dse-k8s worker nodes?
[14:58:32] elukey: o/ I ran your process pool code in ml-sandbox, but it seems the latency didn't change much. I tried allocating 2/4 cpus & 2Gi memory to the pod, but it looks like there's no difference 🤔
[15:07:13] btullis: I know you didn't ask me, but I'm fine with that :)
[15:07:56] klausman: Sincere apologies for the omission :-)
[15:08:35] I'll run the cookbook to roll-reboot the workers now.
[15:14:19] yep yep, anytime btullis!
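The "Score time" measurement above was presumably taken along these lines; a hypothetical helper, with the model and argument names purely illustrative:

```python
import logging
import time

logger = logging.getLogger(__name__)


def timed_score(model, feature_values):
    """Call a synchronous model.score() and log its wall-clock duration."""
    start = time.time()
    score = model.score(feature_values)
    logger.info("Score time: %s seconds", time.time() - start)
    return score
```

A single call being ~2 ms while wrk drives latencies up to a second is consistent with blocking work queueing up behind the event loop under load.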
[15:14:34] aiko: yeah, weird
[15:40:49] aiko: I think this is the issue: https://github.com/kserve/kserve/blob/release-0.8/python/kserve/kserve/model_server.py#L132-L134
[15:43:19] maybe
[15:45:09] so we can pass a specific exec
[15:45:23] err, a specific pool to the run_in_executor
[15:45:40] I am trying it, but it doesn't seem to affect performance
[15:47:09] (PS5) Elukey: editquality: run the score method in a separate process [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/844923 (https://phabricator.wikimedia.org/T320374)
[15:47:12] basically like --^
[15:57:37] with docker top $container-id I do see the processes though
[15:58:09] in my case the time is dominated by the http calls though
[16:01:05] yeah, and with the thread pool executor I only see one process
[16:11:10] ok, will restart tomorrow morning :)
[16:11:14] have a nice rest of the day, folks!
[16:32:19] klausman: elukey: FYI the cookbook didn't work, so I haven't rebooted the workers yet. Will try to fix it tomorrow and re-run.
[16:32:35] ack. What was the failure?
[16:36:45] It couldn't find the config file for kubectl, because naming things is hard: https://gerrit.wikimedia.org/r/plugins/gitiles/operations/cookbooks/+/refs/heads/master/cookbooks/sre/k8s/reboot-nodes.py#111
[16:37:14] aaah
[16:37:27] Well, that should be an easy enough fix, if mildly annoying
[16:39:23] Yeah, hopefully. Nice Friday fix :-)
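Circling back to the executor discussion: conceptually, passing an explicit pool to run_in_executor (as patch set 5 does) looks roughly like the sketch below. The class and field names are illustrative, not the actual inference-services code, and the picklability caveat in the comments is one possible reason the process pool showed no latency win:

```python
import asyncio
from concurrent.futures import ProcessPoolExecutor


class EditQualityModel:  # illustrative name
    def __init__(self, model):
        self.model = model
        # A dedicated pool, sized to fit the pod's k8s CPU limits.
        self.pool = ProcessPoolExecutor(max_workers=2)

    async def predict(self, request: dict) -> dict:
        loop = asyncio.get_running_loop()
        # Passing the pool explicitly bypasses kserve's default
        # ThreadPoolExecutor. Caveat: the callable and its arguments are
        # pickled for the worker process, and a bound method like
        # self.model.score drags the whole model object along on every
        # call, which can erase the parallelism win.
        score = await loop.run_in_executor(
            self.pool, self.model.score, request["feature_values"]
        )
        return {"score": score}
```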