[07:52:35] Lift-Wing, Machine-Learning-Team: Explore ingress filtering for Lift Wing - https://phabricator.wikimedia.org/T300259 (elukey) Deployed rate limits and new istio configs to surface their metrics. The only downside is that the new metrics require a restart of all the pods, so for the moment let's wait...
[08:05:58] I am reading https://tenthousandmeters.com/blog/python-behind-the-scenes-13-the-gil-and-its-effects-on-python-multithreading/, really well done
[08:06:03] (good morning :)
[08:06:32] we should probably start a collection of useful links to read in the kserve docs, to explain the gotchas about asyncio etc..
[08:10:16] sure, that would be great. what would be the best place? a document? a markdown file in the inference services repo?
[08:29:31] sorry I was afk :)
[08:30:19] I was thinking of adding it to https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/KServe#KServe_architecture
[08:31:27] but we should probably open a task to do a good write-up, so people know what we aim for
[08:39:25] (I am adding a few words in there)
[08:44:46] done, I added something
[08:45:05] Going to open a task to have a better set of examples etc.. (it will require a bit of time)
[08:47:06] Machine-Learning-Team: Add guidelines about how to write/use asyncio code in KServe - https://phabricator.wikimedia.org/T324313 (elukey)
[08:47:10] there --^
[09:05:41] I have also created https://github.com/kserve/kserve/discussions/2578
[09:06:05] to ask what others do with asyncio and cpu-bound code
[09:06:13] maybe there is something that we haven't tried yet
[09:07:16] Machine-Learning-Team: Add guidelines about how to write/use asyncio code in KServe - https://phabricator.wikimedia.org/T324313 (elukey) Opened https://github.com/kserve/kserve/discussions/2578 to ask upstream about the best practices for running cpu-bound code on asyncio.
[09:08:42] 👍 great approach!
[10:07:35] isaranto: how is it going with the templates? If you want I can help :)
[10:10:57] elukey: I got lost in the helm templating world 😅 help much appreciated :D
[10:11:20] i was also reading about asyncio previously
[10:11:21] isaranto: yeah I can imagine, and that one is very dense
[10:11:39] I have an idea, if you want I can send a code review and we can discuss it
[10:13:30] sure, otherwise I am also available for a short call if u want. I was trying to render the template by adding a section inside the custom_predictor config and accessing it with something like `$custom_predictor.resources`
[10:13:38] but no luck so far
[10:18:52] lemme check a couple of things
[10:21:00] ok! ping me if u want to pair-program
[10:31:12] isaranto: https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/863247/ this is my idea
[10:35:48] just also added a better unit test in the fixture
[10:37:45] from CI it seems to be working as we'd expect
[10:40:11] i rendered it locally, it works like a charm! I also tried adding resources in the generic predictor and it overrides the custom ones. great job!
[10:42:14] isaranto: does it make sense how it is done? Any questions? We can go over it if you want
[10:43:12] morning!
[10:43:30] hello Aiko :)
[10:43:47] elukey: thanks for opening a task for guidelines about asyncio. That's nice!
[10:44:32] hey Aiko o/
[10:47:43] elukey: the changes are pretty clear. I was failing with indentation while trying to just add the custom predictor config. I have 2 questions though:
[10:47:43] 1. we have to manually bump the chart versions each time we make a change, right?
[10:47:43] 2. under the fixtures we have the values for the unit tests? how does that work with CI?
[10:48:47] 1. yes, correct
[10:49:06] 2. basically CI will do something like the following for each fixture
[10:49:24] helm3 template -f 'charts/kserve-inference/.fixtures/inference_no_transformer.yaml' charts/kserve-inference/
[10:49:36] so it is only a "visual" unit test, nothing breaks if it changes
[10:49:43] you'll see it in the CI's diff
[10:51:19] isaranto: merged, you can deploy it if you want :)
[10:51:37] thanks!
[11:13:43] all good? Does it work?
[11:17:08] there is an issue with the integer (`ASYNCIO_AUX_WORKERS`): it needs to be in quotes. we can either do:
[11:17:08] ```
[11:17:08] - name: ASYNCIO_AUX_WORKERS
[11:17:08]   value: "5"
[11:17:08] ```
[11:17:08] or add the quotes in the templates like this (line 58) - but then all variables will be wrapped in quotes.
[11:17:08] ```
[11:17:09] value: {{ .value | quote }}
[11:17:09] ```
[11:17:10] I would suggest the first one so that we allow native yaml types. wdyt?
[11:17:55] +1 yes
[11:18:07] I figured it out from this error when I tried to sync
[11:18:07] `....v1beta1.InferenceService.Spec: v1beta1.InferenceServiceSpec.Predictor: v1beta1.PredictorSpec.PodSpec: Containers: []v1.Container: v1.Container.Env: []v1.EnvVar: v1.EnvVar.Value: ReadString: expects " or n, but found t, error found in #10 byte of ...|,"value":true},{...`
[11:18:23] ok I'm sending a patch then
[11:19:14] sigh
[11:20:46] a fix for the fix of the fix
[11:20:52] lol
[11:21:29] yaml engineers
[11:21:30] :D
[11:24:01] the future is bright
[11:24:06] toml is here :D
[11:26:02] i'm kidding
[11:26:39] :D I figured :D
[11:27:29] btw i was checking the code and luckily u do cast the `asyncio_aux_workers` var to int: `asyncio_aux_workers = int(asyncio_aux_workers)`. I'll change its type in the function declaration in another patch so that it reflects reality
[11:28:08] ah nice, thanks!
[11:32:04] going afk for lunch!
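For context, the pattern discussed above — the chart passing `ASYNCIO_AUX_WORKERS` as a quoted string and the model server casting it back to an int to size a pool of auxiliary workers for CPU-bound code — looks roughly like the sketch below. This is a minimal illustration, not the actual inference-services code: the `score` stand-in and the default of 5 are assumptions.

```python
import asyncio
import os
from concurrent.futures import ProcessPoolExecutor

# Env vars always arrive as strings (hence the quoting dance in the chart),
# so the value has to be cast back to an int on read.
ASYNCIO_AUX_WORKERS = int(os.environ.get("ASYNCIO_AUX_WORKERS", "5"))

# One shared pool for the lifetime of the server, sized by the env var.
POOL = ProcessPoolExecutor(max_workers=ASYNCIO_AUX_WORKERS)

def score(rev_id: int) -> dict:
    # Stand-in for the CPU-bound revscoring call.
    return {"rev_id": rev_id, "score": 0.5}

async def predict(rev_id: int) -> dict:
    # Offload the CPU-bound work to the process pool so the asyncio
    # event loop stays free to accept other requests.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(POOL, score, rev_id)

if __name__ == "__main__":
    print(asyncio.run(predict(12345)))
```

This is also why the type-hint discussion that follows matters: the value crosses a YAML → env string → Python int boundary, so the cast has to happen somewhere on read.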
[11:42:21] (PS1) Ilias Sarantopoulos: asyncio: fix type hint for asyncio_aux_workers var [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/863259 (https://phabricator.wikimedia.org/T323624)
[11:54:38] * klausman lunch
[12:22:54] * isaranto afk, lunch
[14:14:55] (CR) Elukey: "Ilias, I am wondering if it is clearer to keep the type hint as "int" here and modify the rest to use an int, like:" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/863259 (https://phabricator.wikimedia.org/T323624) (owner: Ilias Sarantopoulos)
[14:15:00] elukey: enclosing the int in quotes doesn't render the quotes, so I had to go with the second option and add the quoting explicitly in the templates
[14:16:54] isaranto: ack, makes sense :(
[14:17:02] I always confuse those corner cases
[14:17:06] (CR) Ilias Sarantopoulos: asyncio: fix type hint for asyncio_aux_workers var (1 comment) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/863259 (https://phabricator.wikimedia.org/T323624) (owner: Ilias Sarantopoulos)
[14:17:40] I just found https://www.python-httpx.org/, looks nice, let's keep it in mind
[14:17:44] cc: aiko: --^
[14:21:09] (PS2) Ilias Sarantopoulos: asyncio: cast asyncio_aux_workers env var to int on read [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/863259 (https://phabricator.wikimedia.org/T323624)
[14:22:19] (CR) Ilias Sarantopoulos: asyncio: cast asyncio_aux_workers env var to int on read (1 comment) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/863259 (https://phabricator.wikimedia.org/T323624) (owner: Ilias Sarantopoulos)
[14:24:22] morning all
[14:24:44] morning!
[14:27:33] (CR) Elukey: [C: +1] asyncio: cast asyncio_aux_workers env var to int on read [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/863259 (https://phabricator.wikimedia.org/T323624) (owner: Ilias Sarantopoulos)
[14:27:43] morning! ☀️
[14:29:05] elukey: I missed that you had merged it. I will submit another patch
[14:29:41] isaranto: yeah sorry :(
[14:30:11] my bad. I didn't check the jenkins output in time
[14:32:52] Lift-Wing, Machine-Learning-Team: Test batch prediction for revert-risk model - https://phabricator.wikimedia.org/T323023 (elukey) This is really a mystery - we are trying to remove as many variables as possible to find the simplest repro use case, but for the moment we have only managed to do it via PySp...
[14:33:00] I should have checked as well :)
[14:33:02] Fridays :)
[14:33:44] no it is MY bad :P
[14:33:49] thanks for being nice anyway
[14:35:01] elukey: o/ cool, I'll have a look!
[14:36:06] aiko: if you have time, could you show me how to start a batch prediction job via Spark to repro the missing responses?
[14:36:24] I am doing a couple of tests and I'd like to run a few spark jobs, but I don't want to ping you every time :)
[14:36:38] (otherwise you'll stop answering me :D)
[14:44:12] elukey: sure! I can share the jupyter notebook I use, or next Monday we can have a chat
[14:44:27] definitely yes
[14:44:33] the notebook is fine :)
[14:47:48] I am trying to use wrk to repro the use case, but I can't
[14:47:52] it is so weird
[15:09:50] isaranto: ready to go!
[15:10:15] 🤞
[15:22:02] elukey: finally it worked. resources and env vars all set correctly. thanks!
[15:24:51] super! nice work
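On the httpx link shared above: its appeal for KServe model servers is that it offers an async client, so network calls in preprocess/predict yield back to the event loop instead of blocking it the way `requests` does. A hedged sketch of what that could look like; the endpoint and payload here are illustrative, not the actual Lift Wing code:

```python
import asyncio
import httpx

async def fetch_revision(rev_id: int) -> dict:
    # An async HTTP call returns control to the event loop while waiting
    # on the network, unlike a blocking requests.get().
    async with httpx.AsyncClient(timeout=10.0) as client:
        resp = await client.get(
            f"https://en.wikipedia.org/w/rest.php/v1/revision/{rev_id}"
        )
        resp.raise_for_status()
        return resp.json()

if __name__ == "__main__":
    print(asyncio.run(fetch_revision(1186201)))
```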
[15:40:48] Machine-Learning-Team, Patch-For-Review: Test revscoring model servers on Lift Wing - https://phabricator.wikimedia.org/T323624 (elukey) @isarantopoulos made some high-level calculations to understand how many cores we can use for each pod with MP enabled. Lemme know if it makes sense :) We have 8 worke...
[15:49:41] Machine-Learning-Team, Patch-For-Review: Test revscoring model servers on Lift Wing - https://phabricator.wikimedia.org/T323624 (elukey) Dumped the current state of ml-serve-codfw (from `kubectl describe nodes`): ` Resource Requests Limits cpu 350m (17%) 0 (0%) memory...
[15:50:10] isaranto: added some thoughts to the task, lemme know (even next week) your thoughts
[15:54:19] cool, I think they are valid more or less. I agree that we can start small and experiment as u note
[15:56:23] more or less? :D Is that a nice way of saying it is all trash? :D
[15:56:41] kidding, I agree that those calculations are really high level, but at least we have an idea
[15:56:49] 5 cores as a limit may be a little too much
[16:11:24] ack
[16:29:55] Machine-Learning-Team, Foundational Technology Requests, Prod-Kubernetes, Kubernetes, Patch-For-Review: Import new knative-serving version (k8s 1.23 dependency for ML) - https://phabricator.wikimedia.org/T323793 (elukey) The auto-tls bits shouldn't concern us, since we use `helmfile_namespace...
[16:30:10] Machine-Learning-Team, Foundational Technology Requests, Prod-Kubernetes, Kubernetes, Patch-For-Review: Import new knative-serving version (k8s 1.23 dependency for ML) - https://phabricator.wikimedia.org/T323793 (elukey)
[16:33:20] Machine-Learning-Team, Foundational Technology Requests, Prod-Kubernetes, Kubernetes, Patch-For-Review: Import new knative-serving version (k8s 1.23 dependency for ML) - https://phabricator.wikimedia.org/T323793 (elukey) Current status: * New images https://gerrit.wikimedia.org/r/861349 * Ne...
[16:42:58] With the first test runs with wrk I am getting bad results and many timeouts..
[16:44:36] a specific model server or all?
[16:45:20] mostly all. I will start pasting some results on phabricator. I have gathered all of them on a Notion page and will transfer them there.
[16:45:51] in general I was thinking of creating some plots as an analysis that we can reproduce in most cases. what do u think?
[16:46:40] sure, makes sense
[16:47:02] it also depends on how many connections/threads you are using to hit those models
[16:47:29] (trying to get a grasp of the "bad results" in context)
[16:47:41] lemme paste some results
[16:50:48] also, is it the same rev-id or multiple ones?
[16:56:35] Machine-Learning-Team, Patch-For-Review: Test revscoring model servers on Lift Wing - https://phabricator.wikimedia.org/T323624 (isarantopoulos) **editquality-goodfaith** **With MP** ` isaranto@deploy1002:~/scripts$ wrk -c 1 -t 1 --timeout 2s -s inference-goodfaith.lua https://inference-staging.svc.codfw...
[16:58:00] same rev-id. So for editquality-goodfaith the results are good, but for the rest of the models I get many timeouts. I will investigate the logs a bit more
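As a rough Python equivalent of the wrk runs above (same 2s timeout, counting timeouts instead of aborting on them), something like the sketch below could reproduce the measurements. The URL and the `{"rev_id": ...}` payload are placeholders, not the real staging endpoint; that one is in the (truncated) wrk commands pasted on the task.

```python
import asyncio
import time
import httpx

# Placeholder endpoint; the real staging URLs are in the wrk commands on T323624.
URL = "https://inference-staging.example.wmnet/v1/models/enwiki-goodfaith:predict"

async def time_one(client: httpx.AsyncClient, rev_id: int) -> float:
    # Time a single scoring request; httpx raises on the 2s timeout.
    start = time.monotonic()
    resp = await client.post(URL, json={"rev_id": rev_id}, timeout=2.0)
    resp.raise_for_status()
    return time.monotonic() - start

async def run(rev_ids: list[int]) -> None:
    async with httpx.AsyncClient() as client:
        # return_exceptions=True keeps timeouts from aborting the whole run,
        # so failed requests can be counted the way wrk reports them.
        results = await asyncio.gather(
            *(time_one(client, r) for r in rev_ids), return_exceptions=True
        )
    ok = sorted(r for r in results if isinstance(r, float))
    print(f"{len(ok)}/{len(results)} requests succeeded")
    if ok:
        print(f"p50={ok[len(ok) // 2]:.3f}s p90={ok[int(len(ok) * 0.9)]:.3f}s")

if __name__ == "__main__":
    asyncio.run(run([123456, 234567, 345678]))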
[16:59:30] isaranto: so the same rev-id is misleading, since if you choose one that is served fast, it works fine even without MP
[16:59:42] Machine-Learning-Team, Patch-For-Review: Test revscoring model servers on Lift Wing - https://phabricator.wikimedia.org/T323624 (isarantopoulos) **article-quality with MP** ` isaranto@deploy1002:~/scripts$ wrk -c 1 -t 1 --timeout 2s -s inference-articlequality.lua https://inference-staging.svc.codfw.wm...
[16:59:44] the main issue is when we have different ones, with different latencies
[16:59:53] (like when we test with benthos)
[16:59:58] ok, I will run the tests I ran with benthos
[17:01:18] isaranto: I think that we should try the approach with a different set of rev-ids that we discussed during the team meeting
[17:01:28] we can chat about it on monday :)
[17:01:44] same rev-id tests are also good, I'd like to understand the articlequality results more
[17:02:17] great! I'd like the team's input on how to get representative rev-ids and from where
[17:02:18] there is something off, even with the overhead of MP I don't expect +600ms of latency
[17:02:32] we can probably either create them randomly
[17:02:39] or we can get them from kafka
[17:03:28] going afk now, have a good weekend folks!
[17:03:46] \o
[17:04:02] Machine-Learning-Team, Patch-For-Review: Test revscoring model servers on Lift Wing - https://phabricator.wikimedia.org/T323624 (isarantopoulos) # drafttopic ## **Non MP** `bash isaranto@deploy1002:~/scripts$ wrk -c 1 -t 1 --timeout 2s -s inference-drafttopic.lua https://inference-staging.svc.codfw.wmn...
[17:04:29] ciao, have a nice weekend!
[18:13:23] Machine-Learning-Team, Patch-For-Review: Test revscoring model servers on Lift Wing - https://phabricator.wikimedia.org/T323624 (isarantopoulos) ## Editquality with Benthos #### SP - Single process ` Total duration: 0 days 00:05:00, Total No of requests: 641 50.0% 368.48ms 75.0% 550.88ms 90.0% 1002.84...
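On getting representative rev-ids: besides sampling from Kafka, one cheap option (a sketch, not something the team settled on) is to pull recent revision ids from the Action API, which naturally mixes small and large edits and therefore different latencies:

```python
import random
import httpx

def sample_recent_rev_ids(n: int = 100) -> list[int]:
    # Recent changes give a realistic mix of revisions (and latencies)
    # compared to hammering a single hand-picked rev-id.
    resp = httpx.get(
        "https://en.wikipedia.org/w/api.php",
        params={
            "action": "query",
            "list": "recentchanges",
            "rcprop": "ids",
            "rclimit": "500",
            "format": "json",
        },
        timeout=10.0,
    )
    resp.raise_for_status()
    changes = resp.json()["query"]["recentchanges"]
    rev_ids = [rc["revid"] for rc in changes]
    return random.sample(rev_ids, k=min(n, len(rev_ids)))

if __name__ == "__main__":
    print(sample_recent_rev_ids(10))
```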