[06:31:39] (PS2) Kevin Bazira: article-country: containerize model-server [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1077391 (https://phabricator.wikimedia.org/T371897)
[06:44:11] (CR) Kevin Bazira: "Thank you for the comment, Aiko." [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1077391 (https://phabricator.wikimedia.org/T371897) (owner: Kevin Bazira)
[07:05:36] (CR) Santhosh: [C: -1] Use category search to find campaign pages instead of template (1 comment) [research/recommendation-api] - https://gerrit.wikimedia.org/r/1076020 (https://phabricator.wikimedia.org/T373132) (owner: Nik Gkountas)
[07:19:45] Good morning folks o/
[09:42:19] Guten Tag o/
[09:43:54] Guten Tag!
[09:48:13] I was just testing the new ref quality service with the 2 models. great idea! I was having the same thought for the GPU deployments - whether we could host several LLMs that don't have high utilization in the same service
[09:48:53] it is kind of a different topic though
[09:52:44] (PS5) Kevin Bazira: article-country: initial commit [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1075033 (https://phabricator.wikimedia.org/T371897)
[09:53:31] (PS6) Kevin Bazira: article-country: initial commit [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1075033 (https://phabricator.wikimedia.org/T371897)
[09:53:32] yesss I think so too! it's worth testing on gpu! so we might be able to share a gpu with multiple models
[09:54:27] (CR) Kevin Bazira: article-country: initial commit (5 comments) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1075033 (https://phabricator.wikimedia.org/T371897) (owner: Kevin Bazira)
[09:54:32] we might need to switch to triton or onnx runtimes, which according to the docs are optimized for this kind of work, but we'll see
[09:56:55] aiko: which python version did you use for the refquality model? I had an issue while unpickling in a virtualenv with python 3.10, while with 3.11 I hit the torch 1.13 issue on m1. Or shall I just try the docker image directly?
[10:01:20] nice! afaik research team did some experiments with onnx and it looked pretty promising
[10:02:27] isaranto: ah I think I only tested the docker image
[10:05:06] yes, I saw this on onnx https://phabricator.wikimedia.org/T368614#10202389
[10:06:13] ok I'll do the image then. We'll need to figure out the issue with the old torch version. I didn't follow up on that
[10:06:21] at least so the next model version is on a newer torch version
[10:12:12] cool I didn't see that :)
[10:12:58] ack! I'll test the refquality locally and see if I encounter the same issue
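(Editor's aside on the Triton/ONNX idea discussed above: a minimal sketch, assuming a torch model, of exporting it to ONNX and running it with onnxruntime. The file names, input shape, and opset are illustrative assumptions, not the actual reference-quality model interface.)

```python
# Hedged sketch: export a (hypothetical) torch model to ONNX and run it with
# onnxruntime. Paths and the input shape are assumptions for illustration only.
import torch
import onnxruntime as ort

model = torch.load("reference_quality.pt", map_location="cpu")  # hypothetical binary
model.eval()

dummy = torch.randn(1, 512)  # assumed input shape, used only for tracing
torch.onnx.export(model, dummy, "reference_quality.onnx", opset_version=17)

# Run the exported graph with onnxruntime on CPU.
session = ort.InferenceSession("reference_quality.onnx", providers=["CPUExecutionProvider"])
input_name = session.get_inputs()[0].name
outputs = session.run(None, {input_name: dummy.numpy()})
print(outputs[0])
```

The same exported graph could later be served by Triton or an ONNX-based runtime, which is what would allow several low-utilization models to share one GPU.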
[10:19:00] (CR) Nik Gkountas: Use category search to find campaign pages instead of template (1 comment) [research/recommendation-api] - https://gerrit.wikimedia.org/r/1076020 (https://phabricator.wikimedia.org/T373132) (owner: Nik Gkountas)
[10:19:13] (PS4) Nik Gkountas: Use category search to find campaign pages instead of template [research/recommendation-api] - https://gerrit.wikimedia.org/r/1076020 (https://phabricator.wikimedia.org/T373132)
[10:19:22] (CR) CI reject: [V: -1] Use category search to find campaign pages instead of template [research/recommendation-api] - https://gerrit.wikimedia.org/r/1076020 (https://phabricator.wikimedia.org/T373132) (owner: Nik Gkountas)
[10:20:03] (PS10) Nik Gkountas: Fetch campaign metadata and return them with recommendations [research/recommendation-api] - https://gerrit.wikimedia.org/r/1070308 (https://phabricator.wikimedia.org/T373132)
[10:20:13] (CR) CI reject: [V: -1] Fetch campaign metadata and return them with recommendations [research/recommendation-api] - https://gerrit.wikimedia.org/r/1070308 (https://phabricator.wikimedia.org/T373132) (owner: Nik Gkountas)
[10:22:31] iirc there is no pre-built python wheel for torch 1.13 for macos so you would face this issue
[10:22:43] (CR) Nik Gkountas: "recheck" [research/recommendation-api] - https://gerrit.wikimedia.org/r/1076020 (https://phabricator.wikimedia.org/T373132) (owner: Nik Gkountas)
[10:35:37] (CR) Nik Gkountas: "recheck" [research/recommendation-api] - https://gerrit.wikimedia.org/r/1070308 (https://phabricator.wikimedia.org/T373132) (owner: Nik Gkountas)
[11:07:07] * isaranto afk lunch
[12:55:09] Lift-Wing, Machine-Learning-Team: Implementing Team-Based Deployment Permissions in Lift Wing - https://phabricator.wikimedia.org/T376614 (isarantopoulos) NEW
[13:04:07] Lift-Wing, Machine-Learning-Team: Implementing Team-Based Deployment Permissions in Lift Wing - https://phabricator.wikimedia.org/T376614#10206661 (isarantopoulos) @klausman I wrote an initial suggestion, and we can reshape the solution based on what is easier/better configuration-wise
[13:17:13] aiko: o/ when I make a request to ref-need it hangs. did this happen to you?
[13:17:18] ref-risk works fine
[13:19:42] Good morning all
[13:21:15] mooorning Chris
[13:44:14] Machine-Learning-Team, SRE, SRE-Access-Requests, LPL Essential (LPL Essential 2024 Jul-Sep): Access to deploy recommendation API ML service for kartik - https://phabricator.wikimedia.org/T376585#10206774 (isarantopoulos)
[14:13:47] isaranto: ahh yes it also happened to me. let me look into that
[14:14:11] .. I forgot to test ref-need for the new image
[14:14:21] aiko: it may be m1 related so maybe just test it on ml-testing
[14:14:59] ack!
[14:16:41] seems like it worked fine for Kevin
[14:17:24] ohh interesting!
[14:22:11] I'm almost certain it has to do with the old torch version and m1, but I may be biased :D
[14:30:35] isaranto: did you test with the models downloaded from here https://analytics.wikimedia.org/published/wmf-ml-models/reference-quality/20240930095938/ ?
[14:30:57] yes
[14:31:17] it happens with the updated ref-need model.
[14:32:05] mm I'll figure out why
[14:33:32] I tested the old binary https://analytics.wikimedia.org/published/wmf-ml-models/reference-quality/reference-need/20240903095237/ and it works fine
[14:35:02] aiko: did it hang on ml-testing as well?
[14:35:30] I haven't tested it there yet
[14:36:42] ack!
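(Editor's aside on the hanging ref-need requests above: a minimal sketch of probing a locally running predictor with an explicit timeout so a hang fails fast instead of blocking. The host, port, model name, and payload are assumptions, not the real Lift Wing request schema.)

```python
# Hedged sketch: call a locally running predictor with a timeout so a hanging
# model (like the ref-need case above) surfaces as an error rather than blocking.
import requests

payload = {"rev_id": 12345, "lang": "en"}  # illustrative payload only

try:
    resp = requests.post(
        "http://localhost:8080/v1/models/reference-need:predict",  # assumed local endpoint
        json=payload,
        timeout=30,  # fail fast instead of waiting indefinitely
    )
    resp.raise_for_status()
    print(resp.json())
except requests.Timeout:
    print("request timed out -- the predictor appears to hang")
```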
[15:06:27] I need to find out when kserve 0.14 is going to be released
[15:06:40] we are in the rc stage https://github.com/kserve/kserve/releases and it should be expected soon
[15:18:54] isaranto: it didn't hang on ml-testing! both models work fine
[15:19:07] ack!
[15:28:03] (CR) Ilias Sarantopoulos: [C: +1] reference-quality: add reference-risk model (1 comment) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1076163 (https://phabricator.wikimedia.org/T372405) (owner: AikoChou)
[15:36:21] * isaranto afk!
[21:23:14] (CR) AikoChou: [C: +2] "Thanks for the review!" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1076163 (https://phabricator.wikimedia.org/T372405) (owner: AikoChou)
[21:26:51] (Merged) jenkins-bot: reference-quality: add reference-risk model [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1076163 (https://phabricator.wikimedia.org/T372405) (owner: AikoChou)
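(Editor's aside on the KServe 0.14 question above: a minimal sketch, using the public GitHub releases API, of listing the latest KServe tags to see whether 0.14 has moved past the rc stage. The number of entries printed is arbitrary.)

```python
# Hedged sketch: list the most recent KServe releases/pre-releases via the GitHub API.
import requests

releases = requests.get(
    "https://api.github.com/repos/kserve/kserve/releases", timeout=10
).json()

for rel in releases[:5]:  # show the five most recent entries
    kind = "pre-release" if rel["prerelease"] else "release"
    print(f"{rel['tag_name']:<16} {kind:<12} published {rel['published_at']}")
```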