[07:52:35] Lift-Wing, Machine-Learning-Team: Explore ingress filtering for Lift Wing - https://phabricator.wikimedia.org/T300259 (elukey) Deployed rate limits and new istio configs to surface their metrics. The only downside is that the new metrics require a restart of all the pods, so for the moment let's wait...
[08:05:58] I am reading https://tenthousandmeters.com/blog/python-behind-the-scenes-13-the-gil-and-its-effects-on-python-multithreading/, really well done
[08:06:03] (good morning :)
[08:06:32] we should probably start a collection of useful links to read in the kserve docs, to explain the gotchas about asyncio etc..
[08:10:16] sure, that would be great. what would be the best place? a document? a markdown file in the inference services repo?
[08:29:31] sorry I was afk :)
[08:30:19] I was thinking of adding it to https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/KServe#KServe_architecture
[08:31:27] but we should probably open a task to do a good write-up, so people know what we aim for
[08:39:25] (I am adding a few words in there)
[08:44:46] done, I added something
[08:45:05] Going to open a task to have a better set of examples etc.. (it will require a bit of time)
[08:47:06] Machine-Learning-Team: Add guidelines about how to write/use asyncio code in KServe - https://phabricator.wikimedia.org/T324313 (elukey)
[08:47:10] there --^
[09:05:41] I have also created https://github.com/kserve/kserve/discussions/2578
[09:06:05] to ask what others do with asyncio and cpu-bound code
[09:06:13] maybe there is something that we haven't tried yet
[09:07:16] Machine-Learning-Team: Add guidelines about how to write/use asyncio code in KServe - https://phabricator.wikimedia.org/T324313 (elukey) Opened https://github.com/kserve/kserve/discussions/2578 to ask upstream about the best practices for running cpu-bound code on asyncio.
[09:08:42] 👍 great approach!
[10:07:35] isaranto: how is it going with the templates? If you want I can help :)
[10:10:57] elukey: I got lost in the helm templating world 😅 help much appreciated :D
[10:11:20] i was also reading about asyncio previously
[10:11:21] isaranto: yeah I can imagine, and that one is very dense
[10:11:39] I have an idea, if you want I can send a code review and we can discuss it
[10:13:30] sure, otherwise I am also available for a short call if u want. I was trying to render the template by adding a section inside the custom_predictor config and accessing it with something like `$custom_predictor.resources`
[10:13:38] but no luck so far
[10:18:52] lemme check a couple of things
[10:21:00] ok! ping me if u want to pair-program
[10:31:12] isaranto: https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/863247/ this is my idea
[10:35:48] just also added a better unit test in the fixture
[10:37:45] from CI it seems to be working as we'd expect
[10:40:11] i rendered it locally, it works like a charm! I also tried adding resources in the generic predictor and it overrides the custom ones. great job!
[10:42:14] isaranto: does it make sense how it is done? Any questions? We can go over it if you want
[10:43:12] morning!
[10:43:30] hello Aiko :)
[10:43:47] elukey: thanks for opening a task for guidelines about asyncio. That's nice!
[10:44:32] hey Aiko o/
[10:47:43] elukey: the changes are pretty clear. I was failing with indentation while trying to just add the custom predictor config. I have 2 questions though:
[10:47:43] 1. we have to manually bump the chart versions each time we make a change, right?
[10:47:43] 2. under the fixtures we have the values for the unit tests? how does that work with CI?
[10:48:47] 1. yes, correct
[10:49:06] 2. basically CI will do something like the following for each fixture
[10:49:24] helm3 template -f 'charts/kserve-inference/.fixtures/inference_no_transformer.yaml' charts/kserve-inference/
[10:49:36] so it is only a "visual" unit test, nothing breaks if it changes
[10:49:43] you'll see it in the CI's diff
[10:51:19] isaranto: merged, you can deploy it if you want :)
[10:51:37] thanks!
[11:13:43] all good? Does it work?
[11:17:08] there is an issue with the integer (`ASYNCIO_AUX_WORKERS`): it needs to be in quotes. we can either do:
[11:17:08] ```
[11:17:08] - name: ASYNCIO_AUX_WORKERS
[11:17:08]   value: "5"
[11:17:08] ```
[11:17:08] or add the quotes in the templates like this (line 58) - but then all variables will be wrapped in quotes.
[11:17:08] ```
[11:17:09] value: {{ .value | quote }}
[11:17:09] ```
[11:17:10] I would suggest the first one so that we allow native yaml types. wdyt?
[11:17:55] +1 yes
[11:18:07] I figured it out from this error when I tried to sync
[11:18:07] `....v1beta1.InferenceService.Spec: v1beta1.InferenceServiceSpec.Predictor: v1beta1.PredictorSpec.PodSpec: Containers: []v1.Container: v1.Container.Env: []v1.EnvVar: v1.EnvVar.Value: ReadString: expects " or n, but found t, error found in #10 byte of ...|,"value":true},{...`
[11:18:23] ok I'm sending a patch then
[11:19:14] sigh
[11:20:46] a fix for the fix of the fix
[11:20:52] lol
[11:21:29] yaml engineers
[11:21:30] :D
[11:24:01] the future is bright
[11:24:06] toml is here :D
[11:26:02] i'm kidding
[11:26:39] :D I figured :D
[11:27:29] btw i was checking the code and luckily u do cast the `asyncio_aux_workers` var to int: `asyncio_aux_workers = int(asyncio_aux_workers)`. I'll change its type in the function declaration in another patch so that it reflects reality
[11:28:08] ah nice, thanks!
[11:32:04] going afk for lunch!
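For context, the pattern discussed above — the chart passing `ASYNCIO_AUX_WORKERS` as a quoted string and the model server casting it back to an int to size a pool of auxiliary workers for CPU-bound code — looks roughly like the sketch below. This is a minimal illustration, not the actual inference-services code: the `score` stand-in and the default of 5 are assumptions.

```python
import asyncio
import os
from concurrent.futures import ProcessPoolExecutor

# Env vars always arrive as strings (hence the quoting dance in the chart),
# so the value has to be cast back to an int on read.
ASYNCIO_AUX_WORKERS = int(os.environ.get("ASYNCIO_AUX_WORKERS", "5"))

# One shared pool for the lifetime of the server, sized by the env var.
POOL = ProcessPoolExecutor(max_workers=ASYNCIO_AUX_WORKERS)

def score(rev_id: int) -> dict:
    # Stand-in for the CPU-bound revscoring call.
    return {"rev_id": rev_id, "score": 0.5}

async def predict(rev_id: int) -> dict:
    # Offload the CPU-bound work to the process pool so the asyncio
    # event loop stays free to accept other requests.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(POOL, score, rev_id)

if __name__ == "__main__":
    print(asyncio.run(predict(12345)))
```

This is also why the type-hint discussion that follows matters: the value crosses a YAML → env string → Python int boundary, so the cast has to happen somewhere on read.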
[11:42:21] (PS1) Ilias Sarantopoulos: asyncio: fix type hint for asyncio_aux_workers var [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/863259 (https://phabricator.wikimedia.org/T323624)
[11:54:38] * klausman lunch
[12:22:54] * isaranto afk, lunch
[14:14:55] (CR) Elukey: "Ilias, I am wondering if it is clearer to keep the type hint as "int" here and modify the rest to use an int, like:" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/863259 (https://phabricator.wikimedia.org/T323624) (owner: Ilias Sarantopoulos)
[14:15:00] elukey: enclosing the int in quotes doesn't render the quotes, so I had to go with the second option and add the quoting explicitly in the templates
[14:16:54] isaranto: ack, makes sense :(
[14:17:02] I always confuse those corner cases
[14:17:06] (CR) Ilias Sarantopoulos: asyncio: fix type hint for asyncio_aux_workers var (1 comment) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/863259 (https://phabricator.wikimedia.org/T323624) (owner: Ilias Sarantopoulos)
[14:17:40] I just found https://www.python-httpx.org/, looks nice, let's keep it in mind
[14:17:44] cc: aiko: --^
[14:21:09] (PS2) Ilias Sarantopoulos: asyncio: cast asyncio_aux_workers env var to int on read [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/863259 (https://phabricator.wikimedia.org/T323624)
[14:22:19] (CR) Ilias Sarantopoulos: asyncio: cast asyncio_aux_workers env var to int on read (1 comment) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/863259 (https://phabricator.wikimedia.org/T323624) (owner: Ilias Sarantopoulos)
[14:24:22] morning all
[14:24:44] morning!
[14:27:33] (CR) Elukey: [C: +1] asyncio: cast asyncio_aux_workers env var to int on read [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/863259 (https://phabricator.wikimedia.org/T323624) (owner: Ilias Sarantopoulos)
[14:27:43] morning! ☀️
[14:29:05] elukey: I missed that you had merged it. I will submit another patch
[14:29:41] isaranto: yeah sorry :(
[14:30:11] my bad. I didn't check the jenkins output in time
[14:32:52] Lift-Wing, Machine-Learning-Team: Test batch prediction for revert-risk model - https://phabricator.wikimedia.org/T323023 (elukey) This is really a mystery - we are trying to remove as many variables as possible to find the simplest repro use case, but for the moment we have only managed to do it via PySp...
[14:33:00] I should have checked as well :)
[14:33:02] Fridays :)
[14:33:44] no it is MY bad :P
[14:33:49] thanks for being nice anyway
[14:35:01] elukey: o/ cool, I'll have a look!
[14:36:06] aiko: if you have time, could you show me how to start a batch prediction job via Spark to repro the missing responses?
[14:36:24] I am doing a couple of tests and I'd like to run a few spark jobs, but I don't want to ping you every time :)
[14:36:38] (otherwise you'll stop answering me :D)
[14:44:12] elukey: sure! I can share the jupyter notebook I use, or next Monday we can have a chat
[14:44:27] definitely yes
[14:44:33] the notebook is fine :)
[14:47:48] I am trying to use wrk to repro the use case, but I can't
[14:47:52] it is so weird
[15:09:50] isaranto: ready to go!
[15:10:15] 🤞
[15:22:02] elukey: finally it worked. resources and env vars all set correctly. thanks!
[15:24:51] super! nice work
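On the httpx link shared above: its appeal for KServe model servers is that it offers an async client, so network calls in preprocess/predict yield back to the event loop instead of blocking it the way `requests` does. A hedged sketch of what that could look like; the endpoint and payload here are illustrative, not the actual Lift Wing code:

```python
import asyncio
import httpx

async def fetch_revision(rev_id: int) -> dict:
    # An async HTTP call returns control to the event loop while waiting
    # on the network, unlike a blocking requests.get().
    async with httpx.AsyncClient(timeout=10.0) as client:
        resp = await client.get(
            f"https://en.wikipedia.org/w/rest.php/v1/revision/{rev_id}"
        )
        resp.raise_for_status()
        return resp.json()

if __name__ == "__main__":
    print(asyncio.run(fetch_revision(1186201)))
```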
[15:40:48] Machine-Learning-Team, Patch-For-Review: Test revscoring model servers on Lift Wing - https://phabricator.wikimedia.org/T323624 (elukey) @isarantopoulos made some high-level calculations to understand how many cores we can use for each pod with MP enabled. Lemme know if it makes sense :) We have 8 worke...
[15:49:41] Machine-Learning-Team, Patch-For-Review: Test revscoring model servers on Lift Wing - https://phabricator.wikimedia.org/T323624 (elukey) Dumped the current state of ml-serve-codfw (from `kubectl describe nodes`): ` Resource Requests Limits cpu 350m (17%) 0 (0%) memory...
[15:50:10] isaranto: added some thoughts to the task, lemme know (even next week) your thoughts
[15:54:19] cool, I think they are valid more or less. I agree that we can start small and experiment as u note
[15:56:23] more or less? :D Is that a nice way of saying it is all trash? :D
[15:56:41] kidding, I agree that those calculations are really high level, but at least we have an idea
[15:56:49] 5 cores as a limit may be a little too much
[16:11:24] ack
[16:29:55] Machine-Learning-Team, Foundational Technology Requests, Prod-Kubernetes, Kubernetes, Patch-For-Review: Import new knative-serving version (k8s 1.23 dependency for ML) - https://phabricator.wikimedia.org/T323793 (elukey) The auto-tls bits shouldn't concern us, since we use `helmfile_namespace...
[16:30:10] Machine-Learning-Team, Foundational Technology Requests, Prod-Kubernetes, Kubernetes, Patch-For-Review: Import new knative-serving version (k8s 1.23 dependency for ML) - https://phabricator.wikimedia.org/T323793 (elukey)
[16:33:20] Machine-Learning-Team, Foundational Technology Requests, Prod-Kubernetes, Kubernetes, Patch-For-Review: Import new knative-serving version (k8s 1.23 dependency for ML) - https://phabricator.wikimedia.org/T323793 (elukey) Current status: * New images https://gerrit.wikimedia.org/r/861349 * Ne...
[16:42:58] With the first test runs with wrk I am getting bad results and many timeouts..
[16:44:36] a specific model server or all?
[16:45:20] mostly all. I will start pasting some results on phabricator. I have gathered all of them on a Notion page and will transfer them there.
[16:45:51] in general I was thinking of creating some plots as an analysis that we can reproduce in most cases. what do u think?
[16:46:40] sure, makes sense
[16:47:02] it also depends on how many connections/threads you are using to hit those models
[16:47:29] (trying to get a grasp of the "bad results" in context)
[16:47:41] lemme paste some results
[16:50:48] also, is it the same rev-id or multiple ones?
[16:56:35] Machine-Learning-Team, Patch-For-Review: Test revscoring model servers on Lift Wing - https://phabricator.wikimedia.org/T323624 (isarantopoulos) **editquality-goodfaith** **With MP** ` isaranto@deploy1002:~/scripts$ wrk -c 1 -t 1 --timeout 2s -s inference-goodfaith.lua https://inference-staging.svc.codfw...
[16:58:00] same rev-id. So for editquality-goodfaith the results are good, but for the rest of the models I get many timeouts. I will investigate the logs a bit more
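As a rough Python equivalent of the wrk runs above (same 2s timeout, counting timeouts instead of aborting on them), something like the sketch below could reproduce the measurements. The URL and the `{"rev_id": ...}` payload are placeholders, not the real staging endpoint; that one is in the (truncated) wrk commands pasted on the task.

```python
import asyncio
import time
import httpx

# Placeholder endpoint; the real staging URLs are in the wrk commands on T323624.
URL = "https://inference-staging.example.wmnet/v1/models/enwiki-goodfaith:predict"

async def time_one(client: httpx.AsyncClient, rev_id: int) -> float:
    # Time a single scoring request; httpx raises on the 2s timeout.
    start = time.monotonic()
    resp = await client.post(URL, json={"rev_id": rev_id}, timeout=2.0)
    resp.raise_for_status()
    return time.monotonic() - start

async def run(rev_ids: list[int]) -> None:
    async with httpx.AsyncClient() as client:
        # return_exceptions=True keeps timeouts from aborting the whole run,
        # so failed requests can be counted the way wrk reports them.
        results = await asyncio.gather(
            *(time_one(client, r) for r in rev_ids), return_exceptions=True
        )
    ok = sorted(r for r in results if isinstance(r, float))
    print(f"{len(ok)}/{len(results)} requests succeeded")
    if ok:
        print(f"p50={ok[len(ok) // 2]:.3f}s p90={ok[int(len(ok) * 0.9)]:.3f}s")

if __name__ == "__main__":
    asyncio.run(run([123456, 234567, 345678]))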
[16:59:30] isaranto: so the same rev-id is misleading, since if you choose one that is served fast, it works fine even without MP
[16:59:42] Machine-Learning-Team, Patch-For-Review: Test revscoring model servers on Lift Wing - https://phabricator.wikimedia.org/T323624 (isarantopoulos) **article-quality with MP** ` isaranto@deploy1002:~/scripts$ wrk -c 1 -t 1 --timeout 2s -s inference-articlequality.lua https://inference-staging.svc.codfw.wm...
[16:59:44] the main issue is when we have different ones, with different latencies
[16:59:53] (like when we test with benthos)
[16:59:58] ok, I will run the tests I ran with benthos
[17:01:18] isaranto: I think that we should try the approach with a different set of rev-ids that we discussed during the team meeting
[17:01:28] we can chat about it on monday :)
[17:01:44] same rev-id tests are also good, I'd like to understand the articlequality results more
[17:02:17] great! I'd like the team's input on how to get representative rev-ids and from where
[17:02:18] there is something off, even with the overhead of MP I don't expect +600ms of latency
[17:02:32] we can probably either create them randomly
[17:02:39] or we can get them from kafka
[17:03:28] going afk now, have a good weekend folks!
[17:03:46] \o
[17:04:02] Machine-Learning-Team, Patch-For-Review: Test revscoring model servers on Lift Wing - https://phabricator.wikimedia.org/T323624 (isarantopoulos) # drafttopic ## **Non MP** `bash isaranto@deploy1002:~/scripts$ wrk -c 1 -t 1 --timeout 2s -s inference-drafttopic.lua https://inference-staging.svc.codfw.wmn...
[17:04:29] ciao, have a nice weekend!
[18:13:23] Machine-Learning-Team, Patch-For-Review: Test revscoring model servers on Lift Wing - https://phabricator.wikimedia.org/T323624 (isarantopoulos) ## Editquality with Benthos #### SP - Single process ` Total duration: 0 days 00:05:00, Total No of requests: 641 50.0% 368.48ms 75.0% 550.88ms 90.0% 1002.84...
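On getting representative rev-ids: besides sampling from Kafka, one cheap option (a sketch, not something the team settled on) is to pull recent revision ids from the Action API, which naturally mixes small and large edits and therefore different latencies:

```python
import random
import httpx

def sample_recent_rev_ids(n: int = 100) -> list[int]:
    # Recent changes give a realistic mix of revisions (and latencies)
    # compared to hammering a single hand-picked rev-id.
    resp = httpx.get(
        "https://en.wikipedia.org/w/api.php",
        params={
            "action": "query",
            "list": "recentchanges",
            "rcprop": "ids",
            "rclimit": "500",
            "format": "json",
        },
        timeout=10.0,
    )
    resp.raise_for_status()
    changes = resp.json()["query"]["recentchanges"]
    rev_ids = [rc["revid"] for rc in changes]
    return random.sample(rev_ids, k=min(n, len(rev_ids)))

if __name__ == "__main__":
    print(sample_recent_rev_ids(10))
```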