[01:17:01] Machine-Learning-Team, Project-Admins: Archive #Test-Grounds Phabricator project? - https://phabricator.wikimedia.org/T353764 (Aklapper)
[07:40:37] Good morning!
[07:47:22] Machine-Learning-Team, Research: Allow to set Catboost's threads in readability-liftwing - https://phabricator.wikimedia.org/T353461 (elukey) @Miriam Hi! Yes we'd need some changes in https://gitlab.wikimedia.org/trokhymovych/readability-liftwing, basically to allow us to tune threads from the model-serv...
[07:55:07] * isaranto bbiab
[08:02:35] Machine-Learning-Team, Research: Allow to set Catboost's threads in readability-liftwing - https://phabricator.wikimedia.org/T353461 (MGerlach) >>! In T353461#9417689, @elukey wrote: > @Miriam Hi! Yes we'd need some changes in https://gitlab.wikimedia.org/trokhymovych/readability-liftwing, basically to a...
[08:29:42] I can send an MR for the above --^
[08:43:37] isaranto: o/
[08:43:37] following your recommendation in yesterday's meeting, please see the results of setting `low_cpu_mem_usage` to false by default: https://phabricator.wikimedia.org/P54500
[08:43:37] this configuration requires 10GB memory but still the response time remains similar to when `low_cpu_mem_usage` was true and running on 4GB memory as shown in: https://phabricator.wikimedia.org/T353127#9414797
[08:43:37] we benefit more from having `low_cpu_mem_usage=True` by default
[08:47:09] kevinbazira: o/ ok that's good to hear, otherwise we would need double the memory requirements for all our models
[08:47:40] I think that the +1s latency can be justified to some extent from what Isaac wrote about an older version of the model
[08:50:10] yep. just doing final experiments with less aggressive quantization that chrisalbon suggested in the meeting. if they don't yield performance gains then we might have to work with the +1s latency.
[08:50:39] that sounds good!
[09:19:11] Machine-Learning-Team, Research: Allow to set Catboost's threads in readability-liftwing - https://phabricator.wikimedia.org/T353461 (elukey) >>! In T353461#9417696, @MGerlach wrote: >>>! In T353461#9417689, @elukey wrote: >> @Miriam Hi! Yes we'd need some changes in https://gitlab.wikimedia.org/trokhymo...
[09:36:59] morning!
[09:40:42] o/
[09:40:49] I am doing some tests with Istio in staging folks
[09:40:58] if you notice anything weird lemme know
[09:41:05] (or if I am impacting your tests etc..)
[09:41:40] ack!
[09:44:55] hey!
[09:45:44] Machine-Learning-Team, Patch-For-Review: Upgrade Revert Risk Multilingual docker images to KServe 0.11 - https://phabricator.wikimedia.org/T347551 (CodeReviewBot) isaranto opened https://gitlab.wikimedia.org/repos/research/knowledge_integrity/-/merge_requests/31 feat: allow to set number of threads in c...
[09:47:32] * aiko afk for ~20m
[09:48:02] I opened a patch for the thread thingy for revertrisk models https://gitlab.wikimedia.org/repos/research/knowledge_integrity/-/merge_requests/31
[09:51:04] nice :)
[09:54:34] (PS1) Elukey: revert-risk: strip "http://" too before setting the Host header [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/984506
[09:55:14] (CR) Elukey: "Not sure if there is a better option, lemme know :)" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/984506 (owner: Elukey)
[09:55:15] Machine-Learning-Team, Research, Patch-For-Review: Allow to set Catboost's threads in readability-liftwing - https://phabricator.wikimedia.org/T353461 (CodeReviewBot) isaranto opened https://gitlab.wikimedia.org/trokhymovych/readability-liftwing/-/merge_requests/3 feat: allow to set number of thread...
[10:35:35] (PS2) Elukey: revert-risk: strip "http://" too before setting the Host header [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/984506
[10:40:38] isaranto: nice work on the gitlab's PRs
[10:40:48] they should unblock us if they got merged
[10:47:57] (CR) AikoChou: [C: +1] revert-risk: strip "http://" too before setting the Host header [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/984506 (owner: Elukey)
[10:48:07] thankssss :)
[10:48:22] (CR) Elukey: [C: +2] revert-risk: strip "http://" too before setting the Host header [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/984506 (owner: Elukey)
[10:48:47] rr-agnostic is currently broken in staging, I believe due to -^^
[10:49:05] I am getting a 400, so I think the Host header is not right
[10:49:30] I have deployed the config to circumvent the http redirect issue
[10:53:06] Machine-Learning-Team, Research: Allow to set Catboost's threads in readability-liftwing - https://phabricator.wikimedia.org/T353461 (isarantopoulos) @MGerlach The above Merge Request should do the trick!
[10:53:55] (Merged) jenkins-bot: revert-risk: strip "http://" too before setting the Host header [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/984506 (owner: Elukey)
[10:53:59] isaranto, aiko - if my tests work we'll have a longer term path to fix redirect issues etc.. but it is a long config change for us, so I would put in place a temporary solution to fix the issue
[10:54:10] like the map change that Ilias filed
[10:54:14] what do you think?
[10:55:13] agree!
[10:55:26] great Luca! yeah I agree with the map config fix. I was checking to change the mwapi async sessions but we'd have to change mwapi -> knowledge_integrity -> inf service
[11:21:54] https://arxiv.org/pdf/2312.11514.pdf
[11:22:14] impressive!
[11:28:06] * isaranto afk lunch!
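Editor's note on the Host header change (984506) discussed above: the log only says that "http://" is now stripped as well before the Host header is set. A minimal sketch of that idea follows, not the code from the actual patch. The function name `host_header_from_url` is hypothetical, and the sample hostnames are used only for illustration.

```python
def host_header_from_url(wiki_url: str) -> str:
    """Derive a Host header value from a wiki URL (hypothetical helper).

    Strips a leading "https://" or "http://" and anything after the host,
    so both schemes yield the same bare hostname.
    """
    for scheme in ("https://", "http://"):
        if wiki_url.startswith(scheme):
            # Remove only the leading scheme, then stop checking.
            wiki_url = wiki_url[len(scheme):]
            break
    # Drop any path component; a Host header carries only host[:port].
    return wiki_url.split("/", 1)[0]


if __name__ == "__main__":
    print(host_header_from_url("http://api-ro.discovery.wmnet"))
    print(host_header_from_url("https://en.wikipedia.org/w/api.php"))
```

If only "https://" were stripped, an "http://" URL would leave the scheme inside the Host value, which is one plausible way to end up with the 400 response mentioned above.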
[11:46:10] https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/984512 I want to add another batcher for rr in staging for testing
[11:57:43] Morning all
[12:01:40] Machine-Learning-Team, Research: Allow to set Catboost's threads in readability-liftwing - https://phabricator.wikimedia.org/T353461 (CodeReviewBot) trokhymovych merged https://gitlab.wikimedia.org/trokhymovych/readability-liftwing/-/merge_requests/3 feat: allow to set number of threads in catboost models
[12:04:09] o/ Chris!
[12:05:49] hi Chris o/
[12:07:43] Machine-Learning-Team, Research: Allow to set Catboost's threads in readability-liftwing - https://phabricator.wikimedia.org/T353461 (Trokhymovych) @isarantopoulos Thank you! I have checked and merged your changes.
[12:40:37] Machine-Learning-Team, Patch-For-Review: Increased latencies with Kserve 0.11.1 (cgroups v2) - https://phabricator.wikimedia.org/T349844 (isarantopoulos)
[12:40:42] Machine-Learning-Team, Research: Allow to set Catboost's threads in readability-liftwing - https://phabricator.wikimedia.org/T353461 (isarantopoulos) Open→Resolved a:isarantopoulos
[12:44:00] (PS1) Ilias Sarantopoulos: readability: set number of threads for readability [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/984524 (https://phabricator.wikimedia.org/T348664)
[12:49:13] I'd like your thoughts on --^ I don't really like introducing one more env variable, but if we do it should better be the same for all services
[13:21:36] Machine-Learning-Team: Optimize response performance for the article-descriptions model-server - https://phabricator.wikimedia.org/T353127 (kevinbazira) @Isaac, thank you for letting us know about the Cloud VPS API using an older library version. We have been wondering why the LiftWing latency is not matchin...
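Editor's note on the thread-count patch (984524): the idea in this thread is an optional environment variable that overrides the thread count, with a CPU-count-based default when it is unset. A rough sketch under stated assumptions follows. It is not the code from the patch: `resolve_num_threads` is a hypothetical name, `NUM_THREADS` is used as the variable name for illustration, and `os.cpu_count()` stands in for the repository's own CPU helper.

```python
import os


def resolve_num_threads(env_var: str = "NUM_THREADS") -> int:
    """Return the thread count to use (hypothetical helper).

    If the environment variable is unset or empty, fall back to the number
    of CPUs reported by the OS. Note that os.cpu_count() is only a stand-in
    here: it does not take container CPU limits into account.
    """
    raw = os.environ.get(env_var)
    if raw is None or raw.strip() == "":
        # Default path: no override configured.
        return os.cpu_count() or 1
    value = int(raw)  # raises ValueError on non-numeric input
    if value < 1:
        raise ValueError(f"{env_var} must be a positive integer, got {raw!r}")
    return value


if __name__ == "__main__":
    print(resolve_num_threads())
```

Keeping one variable name for every service means the deployment config looks the same everywhere; a per-library name would make it clearer which library a setting applies to, at the cost of more variables to track.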
[13:28:39] (CR) Elukey: [C: +1] readability: set number of threads for readability [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/984524 (https://phabricator.wikimedia.org/T348664) (owner: Ilias Sarantopoulos)
[13:28:45] LGTM!
[13:29:02] the NUM_THREADS variable seems good as standard from now on
[13:29:10] maybe it could be customized for each isvc
[13:29:21] in this case, CATBOOST_NUM_THREADS etc..
[13:29:32] or we can keep a generic NUM_THREADS
[13:29:48] but we should use the var only for testing, get_cpu_count() should be the default etc..
[13:33:32] I agree, it should just be optional. on the naming I'd prefer to have just one universal name everywhere. seems easier
[14:08:33] I'm testing readability now and if it works fine I'll send a patch to update kserve
[14:10:09] (PS1) Ilias Sarantopoulos: readability: update kserve to 0.11.2 [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/984608 (https://phabricator.wikimedia.org/T348664)
[14:10:28] opened it now as WIP
[14:22:01] btw this pytroch inference optimization guide/checklist is nice https://pytorch.org/serve/performance_checklist.html
[14:34:17] *pytorch
[14:35:12] (CR) AikoChou: [C: +1] readability: set number of threads for readability [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/984524 (https://phabricator.wikimedia.org/T348664) (owner: Ilias Sarantopoulos)
[14:46:45] Machine-Learning-Team: Goal: Inference Optimization for Hugging face/Pytorch models - https://phabricator.wikimedia.org/T353337 (isarantopoulos)
[14:49:38] TIL from python.resource_utils
[14:56:50] Machine-Learning-Team: Goal: Inference Optimization for Hugging face/Pytorch models - https://phabricator.wikimedia.org/T353337 (isarantopoulos)
[15:11:22] Machine-Learning-Team: Goal: A document describing a plan for a training infrastructure - https://phabricator.wikimedia.org/T353814 (calbon)
[15:18:26] Machine-Learning-Team: Goal: A plan for a training infrastructure - https://phabricator.wikimedia.org/T353814 (calbon)
[15:22:24] Machine-Learning-Team: Goal: A plan for a training infrastructure - https://phabricator.wikimedia.org/T353814 (calbon)
[15:26:41] Machine-Learning-Team: Fix istio gateway's PodDisruptionBudgets for ml-serve - https://phabricator.wikimedia.org/T352400 (elukey) a:elukey→None
[15:28:03] Machine-Learning-Team, ORES: Feature injection does not appear to work in ores-legacy - https://phabricator.wikimedia.org/T347194 (calbon) p:Medium→Triage
[15:29:05] Machine-Learning-Team: Investigate prediction bug in article-descriptions model-server - https://phabricator.wikimedia.org/T352750 (kevinbazira) Open→Resolved
[15:29:10] Machine-Learning-Team, Wikipedia-Android-App-Backlog (Android Release - FY2023-24): Migrate Machine-generated Article Descriptions from toolforge to liftwing. - https://phabricator.wikimedia.org/T343123 (kevinbazira)
[15:35:08] Machine-Learning-Team, Goal: Goal: Implement caching for revertrisk-multilingual - https://phabricator.wikimedia.org/T353333 (calbon)
[15:35:16] Machine-Learning-Team, Goal: Goal: Inference Optimization for Hugging face/Pytorch models - https://phabricator.wikimedia.org/T353337 (calbon)
[15:35:20] Machine-Learning-Team, Goal: Goal: Expand Lift Wing Cluster and add GPU capacity to production - https://phabricator.wikimedia.org/T353338 (calbon)
[15:35:24] Machine-Learning-Team, Goal: Goal: A plan for a training infrastructure - https://phabricator.wikimedia.org/T353814 (calbon)
[15:35:36] Machine-Learning-Team, Goal: Order 1 GPU for Lift Wing - https://phabricator.wikimedia.org/T341699 (calbon) p:Medium→Lowest
[15:35:40] Machine-Learning-Team, Goal: Goal: Users can query a large language model using the API Gateway and receive a response in a reasonable amount of time. - https://phabricator.wikimedia.org/T348154 (calbon) p:Medium→Low
[15:36:03] Machine-Learning-Team, Goal: Goal: Users can query a large language model using the API Gateway and receive a response in a reasonable amount of time. - https://phabricator.wikimedia.org/T348154 (calbon) p:Low→Lowest
[15:36:19] Machine-Learning-Team, Goal: Goal: Increase the number of models hosted on Lift Wing - https://phabricator.wikimedia.org/T348156 (calbon) p:Medium→Lowest
[15:36:43] Machine-Learning-Team, Goal: Goal: Decide on an optional Lift Wing caching strategy for model servers - https://phabricator.wikimedia.org/T348155 (calbon) p:Medium→Lowest
[15:37:26] Lift-Wing, Machine-Learning-Team, Patch-For-Review: Investigate increase p99 latencies in ml-serve-eqiad - https://phabricator.wikimedia.org/T352958 (isarantopoulos) p:High→Medium
[15:37:31] Machine-Learning-Team: Test the kserve batcher for Revert Risk LA isvc - https://phabricator.wikimedia.org/T348536 (achou) p:High→Medium
[16:26:23] https://news.ycombinator.com/item?id=38702783 😛
[16:36:12] oh my this is awesome
[16:36:13] (CR) Ilias Sarantopoulos: [C: +2] readability: set number of threads for readability [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/984524 (https://phabricator.wikimedia.org/T348664) (owner: Ilias Sarantopoulos)
[16:37:00] (Merged) jenkins-bot: readability: set number of threads for readability [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/984524 (https://phabricator.wikimedia.org/T348664) (owner: Ilias Sarantopoulos)
[16:39:14] (PS1) Ilias Sarantopoulos: readability: update kserve to 0.11.2 [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/984608 (https://phabricator.wikimedia.org/T348664)
[16:47:43] (CR) Elukey: [C: +1] readability: update kserve to 0.11.2 [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/984608 (https://phabricator.wikimedia.org/T348664) (owner: Ilias Sarantopoulos)
[16:51:24] Machine-Learning-Team, SRE Observability (FY2023/2024-Q2): Gap in metrics rendered from Thanos Rules - https://phabricator.wikimedia.org/T352756 (elukey) Summary: we removed the Pyrra config (related to Lift Wing) that caused some error in the Thanos logs, to remove a variable and see if the Gaps disappe...
[17:02:31] Machine-Learning-Team, observability, SRE Observability (FY2023/2024-Q2): Istio recording rules for Pyrra and Grizzly - https://phabricator.wikimedia.org/T351390 (elukey) Summary so far: The Grizzly setup seems to work, except for gaps in metrics as shown in T352756. The Thanos recording rules in pu...
[17:06:38] folks I have restored the previous staging state
[17:06:45] all config + revert risk etc..
[17:06:55] I'll make a summary in the tasks
[17:13:28] Ack!
[17:13:45] Going afk o/
[17:17:17] Machine-Learning-Team, Patch-For-Review: Improve Istio's mesh traffic transparent proxy capabilities for external domains accessed by Lift Wing - https://phabricator.wikimedia.org/T353622 (elukey) I was able to make Revert Risk agnostic to work without `WIKI_URL` set to `api-ro.discovery.wmnet`, with the...
[17:18:10] Lift-Wing, Machine-Learning-Team, Patch-For-Review: Investigate increase p99 latencies in ml-serve-eqiad - https://phabricator.wikimedia.org/T352958 (elukey) Added a summary in T353622#9419330 about the possibility to let Istio to handle these use cases. The answer is yes, it should be possible, but...
[17:26:55] logging off folks o/
[17:27:00] have a nice rest of the day :)
[18:13:04] Machine-Learning-Team: Test the kserve batcher for Revert Risk LA isvc - https://phabricator.wikimedia.org/T348536 (achou) I used the wrk tool to test the kserve batcher in order to better understand the batching process. The input for the wrk script is the same as before, with one revision ID per request....
[18:21:29] Machine-Learning-Team, Goal: Goal: Lift Wing users can request multiple predictions using a single request. - https://phabricator.wikimedia.org/T348153 (achou)
[18:21:32] Machine-Learning-Team: Test the kserve batcher for Revert Risk LA isvc - https://phabricator.wikimedia.org/T348536 (achou) Open→Resolved The next step is to add support for batch inference in knowledge-integrity (T352987). Going to resolve this task.
[18:56:37] night elukey!
[20:47:01] (CR) Ilias Sarantopoulos: [C: +2] readability: update kserve to 0.11.2 [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/984608 (https://phabricator.wikimedia.org/T348664) (owner: Ilias Sarantopoulos)
[20:47:47] (Merged) jenkins-bot: readability: update kserve to 0.11.2 [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/984608 (https://phabricator.wikimedia.org/T348664) (owner: Ilias Sarantopoulos)
[20:50:00] Machine-Learning-Team: Optimize response performance for the article-descriptions model-server - https://phabricator.wikimedia.org/T353127 (Isaac) > thank you for letting us know about the Cloud VPS API using an older library version. We have been wondering why the LiftWing latency is not matching that of th...