[06:53:21] hey o/
[06:54:04] testing the mistral-7b-instruct locally and then I'll file a patch to try to deploy that one.
[06:55:53] I'll be afk for a bit (~1h), I have a doctor's appointment
[07:17:55] (PS3) Kevin Bazira: logo-detection: add KServe custom model-server [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1017453 (https://phabricator.wikimedia.org/T361803)
[07:49:42] morning! o/
[08:39:39] back! hey Aiko!
[10:03:56] Machine-Learning-Team: Test revertrisk-multilingual with GPU - https://phabricator.wikimedia.org/T356045#9700139 (achou) I built a RRML image locally using the Pytorch 2.2.x base image from T360638. The image size is 13.6GB. Here are the layers: ` % docker history rrml-gpu:1 IMAGE CREATED...
[10:47:41] (CR) Ilias Sarantopoulos: "We had discussed that it would be best to receive the image in the size (224x224) from the upload wizard as it would make things much fast" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1017453 (https://phabricator.wikimedia.org/T361803) (owner: Kevin Bazira)
[10:48:26] kevinbazira: o/ perhaps you missed my messages on IRC yesterday so I wrote on the patch
[11:02:24] * isaranto lunch
[11:25:29] isaranto, elukey: https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1018235
[11:26:31] +1!
[11:26:49] I lied about lunch, I'm going now :)
[11:29:54] XD
[11:30:49] thanksss
[11:32:20] artificial-intelligence, Reconciliation: Alternative, affordable, lower-barrier approach(es) to reconciliation - https://phabricator.wikimedia.org/T362149#9700452 (Spinster)
[11:44:17] (PS1) AikoChou: revertrisk: use the Pytorch base image for RRML GPU inference [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1018240 (https://phabricator.wikimedia.org/T356045)
[11:49:06] (PS2) AikoChou: revertrisk: use the Pytorch base image for RRML GPU inference [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1018240 (https://phabricator.wikimedia.org/T356045)
[11:54:19] * aiko lunch!
[12:34:15] hello folks!
[12:34:17] thanks aiko :)
[12:40:31] Hello!
[12:41:59] (CR) Elukey: [C:+1] "Great work! Left a comment but lgtm" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1018240 (https://phabricator.wikimedia.org/T356045) (owner: AikoChou)
[12:49:27] (CR) Elukey: "I agree with Ilias, the suggestions are sound! I would also involve either SRE (Application Security, like Moritz) or the Security team to" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1017453 (https://phabricator.wikimedia.org/T361803) (owner: Kevin Bazira)
[12:50:28] aiko: deployed your RR changes in ml-staging-codfw
[12:52:35] both pods are up, and the OMP_NUM_THREADS value seems sound
[12:54:47] nice!
[12:55:31] I'll wait for Aiko to test them, but so far everything is good
[12:59:40] ack
[13:00:24] I'll be sending patches to deploy 2 models in the experimental namespace. It is taking me longer to test locally as I ran out of disk space (1GB left!)
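For reference on the 224x224 point raised in the logo-detection review above: a minimal sketch of the kind of client-side preprocessing suggested there, i.e. resizing the image before it reaches the model server. The use of Pillow and the helper name are assumptions for illustration, not what the patch actually implements.

```python
# Illustrative sketch only: resize an upload to the 224x224 input size
# discussed in the logo-detection review. Pillow and the helper name are
# assumptions, not part of the actual model-server patch.
from io import BytesIO

from PIL import Image

TARGET_SIZE = (224, 224)

def resize_for_model(image_bytes: bytes) -> bytes:
    """Return the image resized to TARGET_SIZE, re-encoded as PNG bytes."""
    with Image.open(BytesIO(image_bytes)) as img:
        resized = img.convert("RGB").resize(TARGET_SIZE)
    out = BytesIO()
    resized.save(out, format="PNG")
    return out.getvalue()
```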
[13:00:55] probably after our meeting as I'm meeting with Mercelis in a bit
[13:48:57] I added the bert model which is small so that we can test the hf image https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1018274
[13:52:09] (PS4) Kevin Bazira: logo-detection: add KServe custom model-server [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1017453 (https://phabricator.wikimedia.org/T361803)
[13:59:08] (CR) Kevin Bazira: "> We had discussed that it would be best to receive the image in the size (224x224) from the upload wizard as it would make things much fa" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1017453 (https://phabricator.wikimedia.org/T361803) (owner: Kevin Bazira)
[13:59:35] elukey: I tested both models. they work without problems :)
[14:07:29] yessss (I tested ML not wikidata, thanks!)
[14:44:30] Machine-Learning-Team: Investigate the inconsistent load test results (locust) for revertrisk - https://phabricator.wikimedia.org/T361881#9701183 (isarantopoulos)
[14:45:30] Machine-Learning-Team, Patch-For-Review: Create logo-detection model-server to be hosted on LiftWing - https://phabricator.wikimedia.org/T361803#9701199 (isarantopoulos)
[14:49:30] Lift-Wing, Machine-Learning-Team: Determine a structure for the python package repository - https://phabricator.wikimedia.org/T361370#9701239 (isarantopoulos)
[14:51:46] Machine-Learning-Team: Update and fix locust load testing for revscoring models - https://phabricator.wikimedia.org/T361238#9701246 (isarantopoulos)
[15:00:02] Machine-Learning-Team, Patch-For-Review: Assess runtime performance impact of pydantic data models in the RRLA model-server - https://phabricator.wikimedia.org/T355742#9701320 (kevinbazira) Open→Resolved
[15:02:07] I forgot to celebrate the base pytorch image \o/
[15:02:10] Machine-Learning-Team: Optimize response performance for the article-descriptions model-server - https://phabricator.wikimedia.org/T353127#9701356 (kevinbazira) Open→Resolved
[15:05:11] nothing to celebrate, it is a hack :D
[15:14:14] Machine-Learning-Team, Patch-For-Review: Use Huggingface model server image for HF LLMs - https://phabricator.wikimedia.org/T357986#9701414 (isarantopoulos) Deployed bert-base-uncased model on ml-staging and it works! ` time curl "https://inference-staging.svc.codfw.wmnet:30443/v1/models/bert:predict" -X...
[15:14:40] isaranto: wow --^
[15:15:27] 10 seconds so slow :P
[15:15:38] It's great that it worked!
[15:26:33] (CR) AikoChou: revertrisk: use the Pytorch base image for RRML GPU inference (1 comment) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1018240 (https://phabricator.wikimedia.org/T356045) (owner: AikoChou)
[15:32:11] isaranto: nice! would like to see how many secs with gpu :D
[15:32:49] I'm going to do it with a bigger model
[15:33:57] ack!
[15:34:07] logging off earlier folks! see you tomorrow :)
[15:34:28] Guten Abend aiko: o/
[15:42:44] wǎnshàng hǎo! (hope I got it right)
[15:53:17] Machine-Learning-Team, Patch-For-Review: Set automatically libomp's num threads when using Pytorch - https://phabricator.wikimedia.org/T360111#9701683 (elukey) Thanks to Aiko, who fixed some issues with RR Wikidata and ML, the new code is now deployed to all the model servers that used to have OMP_NUM_TH...
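A note on T360111 above (setting libomp's thread count automatically for Pytorch): a minimal sketch, assuming the goal is to size OMP_NUM_THREADS and torch's intra-op pool from the CPUs actually available to the container. The helper and the ordering below are illustrative, not the code that was deployed.

```python
# Illustrative sketch only (not the deployed change): derive OMP_NUM_THREADS
# and torch's intra-op thread count from the CPUs the container can use,
# instead of hardcoding a value per model server.
import os

def _available_cpus() -> int:
    try:
        # Reflects the CPU affinity mask, i.e. what the pod is allowed to use.
        return len(os.sched_getaffinity(0))
    except AttributeError:
        # Fallback for platforms without sched_getaffinity.
        return os.cpu_count() or 1

# libomp reads OMP_NUM_THREADS when its thread pool is first created, so set
# it before importing torch (which initializes OpenMP).
os.environ.setdefault("OMP_NUM_THREADS", str(_available_cpus()))

import torch  # noqa: E402  -- imported after the env var on purpose

torch.set_num_threads(_available_cpus())  # keep torch's own pool consistent
```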
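The curl command for the bert:predict test at 15:14 is truncated in the log. As a rough guess at an equivalent request, assuming the standard KServe v1 `{"instances": [...]}` payload against the staging endpoint shown above (the real payload and any Host header required by the ingress are not visible here):

```python
# Rough guess at an equivalent of the truncated curl above, assuming the
# standard KServe v1 predict payload. The real payload (and any Host header
# needed for routing) is not shown in the log.
import requests

URL = "https://inference-staging.svc.codfw.wmnet:30443/v1/models/bert:predict"
payload = {"instances": [{"text": "LiftWing serves machine learning models."}]}

resp = requests.post(URL, json=payload, timeout=60)
resp.raise_for_status()
print(resp.json())
```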
[15:55:41] Lift-Wing, Machine-Learning-Team, ORES, ChangeProp, and 5 others: Selectively disable changeprop functionality that is no longer used - https://phabricator.wikimedia.org/T361483#9701705 (elukey) >>! In T361483#9688445, @akosiaris wrote: >>>! In T361483#9680093, @elukey wrote: >>>>! In T361483#968...
[16:15:55] wow, someone already attempted to fix the issue I raised yesterday! https://github.com/kserve/kserve/pull/3582
[16:16:14] I mean this issue https://github.com/kserve/kserve/issues/3580
[16:19:17] nice!
[16:24:23] going afk, have a nice rest of the day folks!
[16:25:16] me2, ciao folks, cu tomorrow!
[19:26:10] Machine-Learning-Team, Research: Add Article Quality Model to LiftWing - https://phabricator.wikimedia.org/T360455#9702164 (leila) We reviewed this task in the Research backlog refinement meeting today. @Miriam communicated that this is a task for the ML team. Moving the task to the Support Needed lane b...