[07:12:42] (CR) Kevin Bazira: "Thank you for structuring the code to accommodate running tests for a specified model-server." [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/993078 (https://phabricator.wikimedia.org/T355394) (owner: Ilias Sarantopoulos)
[07:56:32] Good morning \o/
[08:55:27] Machine-Learning-Team, Wikipedia-Android-App-Backlog (Android Release - FY2023-24): Migrate Machine-generated Article Descriptions from toolforge to liftwing. - https://phabricator.wikimedia.org/T343123 (kevinbazira) @Seddon, in T353127 we were able to make significant improvements in response latency. F...
[09:30:18] Morning!
[09:30:39] ml-serve2004 fell over on the weekend, but is now back. Currently investigating if I can find a cause
[09:35:51] hey Tobias!
[11:33:26] Machine-Learning-Team: Debug GPU deployments on ml-staging - https://phabricator.wikimedia.org/T356038 (isarantopoulos)
[11:34:01] Lift-Wing, Machine-Learning-Team: Debug GPU deployments on ml-staging - https://phabricator.wikimedia.org/T356038 (isarantopoulos)
[11:50:21] o/
[11:52:11] hey aiko!
[12:02:02] * isaranto lunch!
[12:04:15] ditto
[12:44:21] Machine-Learning-Team: Test revertrisk-multilingual with GPU - https://phabricator.wikimedia.org/T356045 (achou)
[14:05:42] (CR) Ilias Sarantopoulos: "Both of your comments are totally valid!
However at the moment we just setup the example of how to work and these details will be finalize" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/993078 (https://phabricator.wikimedia.org/T355394) (owner: Ilias Sarantopoulos)
[14:16:15] (PS5) Ilias Sarantopoulos: locust: save separate results file per model [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/993078 (https://phabricator.wikimedia.org/T355394)
[14:17:05] (CR) Ilias Sarantopoulos: "I have updated the patch and set run time to 60s and modified the endpoints to be compatible with both internal and external endpoints." [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/993078 (https://phabricator.wikimedia.org/T355394) (owner: Ilias Sarantopoulos)
[14:22:29] Machine-Learning-Team, Add-Link, Growth-Team, Chinese-Sites, CommRel-Specialists-Support (Oct-Dec-2023): Support languages whose add-a-link models were not published - https://phabricator.wikimedia.org/T309263 (AKhatun_WMF)
[14:35:21] Good morning all
[14:35:40] heyo chris
[15:09:59] o\
[15:16:13] I hope the attempt to add a GPU to article desc works 🤞
[15:16:30] if any of you have a moment https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/993707
[15:19:29] Looking
[15:20:16] LGTMd
[15:21:11] Danke! hope it works
[15:28:55] ah, it isn't much faster so I suspect that the GPU is not utilized properly
[15:29:26] let me open radeontop and then you make some queries, that should answer it
[15:29:38] ok, go
[15:30:38] fwiw, I see only 7M of VRAM used, so it's likely not using the GPU, unless it loads the model ondemand
[15:36:31] lol yes
[15:38:26] I can check from grafana as well https://grafana.wikimedia.org/d/ZAX3zaIWz/amd-rocm-gpu
[15:38:40] well, I'll need to go through the code to check
[15:38:48] thanks Tobias!
[15:46:01] With your new attach powers, you could also try and start a python interpreter, see if there's any permission problem or similar
[15:51:07] aha! correct. although since we already use it in other isvcs there shouldn't be any such scenario
[15:55:32] taking your advice however I did that and it seems that torch can't find the gpu. I attached a shell, ran the python interpreter and then `import torch;torch.cuda.is_available()` which should be True but I got False :(
[15:58:56] ok I figured it out! it is because the image is using the cpu version of pytorch
[16:01:18] (PS1) Ilias Sarantopoulos: article-descriptions: add GPu version of pytorch [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/993715
[16:01:43] this will do it --^
[16:15:51] (CR) Klausman: [C: +1] article-descriptions: add GPu version of pytorch [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/993715 (owner: Ilias Sarantopoulos)
[16:28:15] (CR) Ilias Sarantopoulos: [C: +2] article-descriptions: add GPu version of pytorch [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/993715 (owner: Ilias Sarantopoulos)
[16:29:03] (Merged) jenkins-bot: article-descriptions: add GPu version of pytorch [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/993715 (owner: Ilias Sarantopoulos)
[16:30:40] (CR) Kevin Bazira: "Ack" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/993078 (https://phabricator.wikimedia.org/T355394) (owner: Ilias Sarantopoulos)
[16:55:49] kevinbazira: I'll check the above and let you know. However it seems normal if you tried to use the internal API from your localhost
[16:56:02] I have one more patch for today https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/993729
[16:56:24] going afk for the day. will join a bit later just to test the GPU. cu folks!
[16:57:43] isaranto: sure sure, I've +1'd.
[16:57:54] Enjoy your evening. o/
[17:22:36] heading out now as well, \o
[18:02:04] night!
[18:21:17] (PS6) Ilias Sarantopoulos: locust: save separate results file per model [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/993078 (https://phabricator.wikimedia.org/T355394)
[18:30:55] hmm I can't use the GPU because it is being used by the previous revision of article-descriptions model. So I figured out I also need to have permissions to delete revisions in ml-staging. I'll deal with it tomorrow then. o/
[18:32:34] ruh roh
[18:34:21] chrisalbon: TIL about "ruh roh" 😛
[18:34:27] I bet kids still love it
[18:34:58] ha, true
[19:49:39] Machine-Learning-Team, Research: Explore using revertrisk language agnostic API in a pre-save context - https://phabricator.wikimedia.org/T356102 (kostajh)
[19:57:03] Machine-Learning-Team, Research: Explore using revertrisk language agnostic API in a pre-save context - https://phabricator.wikimedia.org/T356102 (kostajh)
[22:28:10] Machine-Learning-Team, artificial-intelligence, Research ideas: [Epic] Paid editing (COI) detection model - https://phabricator.wikimedia.org/T120170 (Harej)
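
Note on the 15:55-15:58 diagnosis: `torch.cuda.is_available()` returned False because the image shipped the CPU build of PyTorch. A minimal sketch of how that check could be made a little more informative from an attached shell follows. The helper name `describe_torch_build` is hypothetical and not taken from the inference-services repo; the sketch assumes only that `torch` may or may not be installed, and it relies on ROCm builds of PyTorch exposing the GPU through the `torch.cuda` namespace.

```python
import importlib
import importlib.util


def describe_torch_build():
    """Summarise which accelerator backend the installed torch build targets.

    Hypothetical helper for illustration; not part of inference-services.
    """
    # Avoid an ImportError if torch is not installed in this environment.
    if importlib.util.find_spec("torch") is None:
        return {"installed": False}

    torch = importlib.import_module("torch")
    # torch.version.hip is set on ROCm builds, torch.version.cuda on CUDA
    # builds; both are None on a CPU-only build.
    hip = getattr(torch.version, "hip", None)
    cuda = getattr(torch.version, "cuda", None)
    return {
        "installed": True,
        "version": torch.__version__,
        "hip": hip,
        "cuda": cuda,
        "cpu_only_build": hip is None and cuda is None,
        # ROCm builds also report GPU visibility through torch.cuda.
        "gpu_visible": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    for key, value in describe_torch_build().items():
        print(f"{key}: {value}")
```

If `cpu_only_build` is True, the wheel itself lacks GPU support, which matches the cause found at 15:58. If it is False while `gpu_visible` is also False, the problem is more likely device access or permissions inside the container.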