[06:31:39] (PS2) Kevin Bazira: article-country: containerize model-server [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1077391 (https://phabricator.wikimedia.org/T371897)
[06:44:11] (CR) Kevin Bazira: "Thank you for the comment, Aiko." [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1077391 (https://phabricator.wikimedia.org/T371897) (owner: Kevin Bazira)
[07:05:36] (CR) Santhosh: [C: -1] Use category search to find campaign pages instead of template (1 comment) [research/recommendation-api] - https://gerrit.wikimedia.org/r/1076020 (https://phabricator.wikimedia.org/T373132) (owner: Nik Gkountas)
[07:19:45] Good morning folks o/
[09:42:19] Guten Tag o/
[09:43:54] Guten Tag!
[09:48:13] I was just testing the new ref quality service with the 2 models. great idea! I was having the same thought for the GPU deployments - whether we could host several LLMs that don't have high utilization in the same service
[09:48:53] it is kind of a different topic though
[09:52:44] (PS5) Kevin Bazira: article-country: initial commit [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1075033 (https://phabricator.wikimedia.org/T371897)
[09:53:31] (PS6) Kevin Bazira: article-country: initial commit [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1075033 (https://phabricator.wikimedia.org/T371897)
[09:53:32] yesss I think so too! it's worth testing on gpu! so we might be able to share a gpu with multiple models
[09:54:27] (CR) Kevin Bazira: article-country: initial commit (5 comments) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1075033 (https://phabricator.wikimedia.org/T371897) (owner: Kevin Bazira)
[09:54:32] we might need to switch to triton or onnx runtimes, which according to the docs are optimized for this kind of work, but we'll see
[09:56:55] aiko: which python version did you use for the refquality model? I had an issue while unpickling in a virtualenv with python 3.10, while with 3.11 I hit the torch 1.13 issue on m1. Or shall I just try the docker image directly?
[10:01:20] nice! afaik research team did some experiments with onnx and it looked pretty promising
[10:02:27] isaranto: ah I think I only tested the docker image
[10:05:06] yes, I saw this on onnx https://phabricator.wikimedia.org/T368614#10202389
[10:06:13] ok I'll do the image then. We'll need to figure out the issue with the old torch version. I didn't follow up on that
[10:06:21] at least so the next model version is on a newer torch version
[10:12:12] cool I didn't see that :)
[10:12:58] ack! I'll test the refquality locally and see if I encounter the same issue
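(Editor's aside on the Triton/ONNX idea discussed above: a minimal sketch, assuming a torch model, of exporting it to ONNX and running it with onnxruntime. The file names, input shape, and opset are illustrative assumptions, not the actual reference-quality model interface.)

```python
# Hedged sketch: export a (hypothetical) torch model to ONNX and run it with
# onnxruntime. Paths and the input shape are assumptions for illustration only.
import torch
import onnxruntime as ort

model = torch.load("reference_quality.pt", map_location="cpu")  # hypothetical binary
model.eval()

dummy = torch.randn(1, 512)  # assumed input shape, used only for tracing
torch.onnx.export(model, dummy, "reference_quality.onnx", opset_version=17)

# Run the exported graph with onnxruntime on CPU.
session = ort.InferenceSession("reference_quality.onnx", providers=["CPUExecutionProvider"])
input_name = session.get_inputs()[0].name
outputs = session.run(None, {input_name: dummy.numpy()})
print(outputs[0])
```

The same exported graph could later be served by Triton or an ONNX-based runtime, which is what would allow several low-utilization models to share one GPU.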
[10:19:00] (CR) Nik Gkountas: Use category search to find campaign pages instead of template (1 comment) [research/recommendation-api] - https://gerrit.wikimedia.org/r/1076020 (https://phabricator.wikimedia.org/T373132) (owner: Nik Gkountas)
[10:19:13] (PS4) Nik Gkountas: Use category search to find campaign pages instead of template [research/recommendation-api] - https://gerrit.wikimedia.org/r/1076020 (https://phabricator.wikimedia.org/T373132)
[10:19:22] (CR) CI reject: [V: -1] Use category search to find campaign pages instead of template [research/recommendation-api] - https://gerrit.wikimedia.org/r/1076020 (https://phabricator.wikimedia.org/T373132) (owner: Nik Gkountas)
[10:20:03] (PS10) Nik Gkountas: Fetch campaign metadata and return them with recommendations [research/recommendation-api] - https://gerrit.wikimedia.org/r/1070308 (https://phabricator.wikimedia.org/T373132)
[10:20:13] (CR) CI reject: [V: -1] Fetch campaign metadata and return them with recommendations [research/recommendation-api] - https://gerrit.wikimedia.org/r/1070308 (https://phabricator.wikimedia.org/T373132) (owner: Nik Gkountas)
[10:22:31] iirc there is no pre-built python wheel for torch 1.13 for macos so you would face this issue
[10:22:43] (CR) Nik Gkountas: "recheck" [research/recommendation-api] - https://gerrit.wikimedia.org/r/1076020 (https://phabricator.wikimedia.org/T373132) (owner: Nik Gkountas)
[10:35:37] (CR) Nik Gkountas: "recheck" [research/recommendation-api] - https://gerrit.wikimedia.org/r/1070308 (https://phabricator.wikimedia.org/T373132) (owner: Nik Gkountas)
[11:07:07] * isaranto afk lunch
[12:55:09] Lift-Wing, Machine-Learning-Team: Implementing Team-Based Deployment Permissions in Lift Wing - https://phabricator.wikimedia.org/T376614 (isarantopoulos) NEW
[13:04:07] Lift-Wing, Machine-Learning-Team: Implementing Team-Based Deployment Permissions in Lift Wing - https://phabricator.wikimedia.org/T376614#10206661 (isarantopoulos) @klausman I wrote an initial suggestion, and we can reshape the solution based on what is easier/better configuration-wise
[13:17:13] aiko: o/ when I make a request to ref-need it hangs. did this happen to you?
[13:17:18] ref-risk works fine
[13:19:42] Good morning all
[13:21:15] mooorning Chris
[13:44:14] Machine-Learning-Team, SRE, SRE-Access-Requests, LPL Essential (LPL Essential 2024 Jul-Sep): Access to deploy recommendation API ML service for kartik - https://phabricator.wikimedia.org/T376585#10206774 (isarantopoulos)
[14:13:47] isaranto: ahh yes it also happened to me. let me look into that
[14:14:11] .. I forgot to test ref-need for the new image
[14:14:21] aiko: it may be m1 related so maybe just test it on ml-testing
[14:14:59] ack!
[14:16:41] seems like it worked fine for Kevin
[14:17:24] ohh interesting!
[14:22:11] I'm almost certain it has to do with the old torch version and m1, but I may be biased :D
[14:30:35] isaranto: did you test with the models downloaded from here https://analytics.wikimedia.org/published/wmf-ml-models/reference-quality/20240930095938/ ?
[14:30:57] yes
[14:31:17] it happens with the updated ref-need model.
[14:32:05] mm I'll figure out why
[14:33:32] I tested the old binary https://analytics.wikimedia.org/published/wmf-ml-models/reference-quality/reference-need/20240903095237/ and it works fine
[14:35:02] aiko: did it hang on ml-testing as well?
[14:35:30] I haven't tested it there yet
[14:36:42] ack!
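(Editor's aside on the hanging ref-need requests above: a minimal sketch of probing a locally running predictor with an explicit timeout so a hang fails fast instead of blocking. The host, port, model name, and payload are assumptions, not the real Lift Wing request schema.)

```python
# Hedged sketch: call a locally running predictor with a timeout so a hanging
# model (like the ref-need case above) surfaces as an error rather than blocking.
import requests

payload = {"rev_id": 12345, "lang": "en"}  # illustrative payload only

try:
    resp = requests.post(
        "http://localhost:8080/v1/models/reference-need:predict",  # assumed local endpoint
        json=payload,
        timeout=30,  # fail fast instead of waiting indefinitely
    )
    resp.raise_for_status()
    print(resp.json())
except requests.Timeout:
    print("request timed out -- the predictor appears to hang")
```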
[15:06:27] I need to find out when kserve 0.14 is going to be released
[15:06:40] we are in the rc stage https://github.com/kserve/kserve/releases and it should be expected soon
[15:18:54] isaranto: it didn't hang on ml-testing! both models work fine
[15:19:07] ack!
[15:28:03] (CR) Ilias Sarantopoulos: [C: +1] reference-quality: add reference-risk model (1 comment) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1076163 (https://phabricator.wikimedia.org/T372405) (owner: AikoChou)
[15:36:21] * isaranto afk!
[21:23:14] (CR) AikoChou: [C: +2] "Thanks for the review!" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1076163 (https://phabricator.wikimedia.org/T372405) (owner: AikoChou)
[21:26:51] (Merged) jenkins-bot: reference-quality: add reference-risk model [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1076163 (https://phabricator.wikimedia.org/T372405) (owner: AikoChou)
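(Editor's aside on the KServe 0.14 question above: a minimal sketch, using the public GitHub releases API, of listing the latest KServe tags to see whether 0.14 has moved past the rc stage. The number of entries printed is arbitrary.)

```python
# Hedged sketch: list the most recent KServe releases/pre-releases via the GitHub API.
import requests

releases = requests.get(
    "https://api.github.com/repos/kserve/kserve/releases", timeout=10
).json()

for rel in releases[:5]:  # show the five most recent entries
    kind = "pre-release" if rel["prerelease"] else "release"
    print(f"{rel['tag_name']:<16} {kind:<12} published {rel['published_at']}")
```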