[06:49:51] Good morning and have a nice week everyone!
[06:58:29] (PS29) Ilias Sarantopoulos: huggingface: add huggingface image [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1009783 (https://phabricator.wikimedia.org/T357986)
[07:00:03] (PS30) Ilias Sarantopoulos: huggingface: add huggingface image [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1009783 (https://phabricator.wikimedia.org/T357986)
[07:00:28] (CR) Ilias Sarantopoulos: "Made the changes and rebased!" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1009783 (https://phabricator.wikimedia.org/T357986) (owner: Ilias Sarantopoulos)
[07:55:40] kevinbazira: o/ is there anything I can help with the logo detection model?
[07:55:58] isaranto: o/
[07:56:15] I was trying to switch the backend to torch but it may need a lot of debugging to do so
[07:57:01] syre sure. I'll be pushing a patch for the model-server soon. you could help with the reviews. thanks!
[07:58:51] ok!
[08:46:36] (CR) Elukey: [C:+1] huggingface: add huggingface image (4 comments) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1009783 (https://phabricator.wikimedia.org/T357986) (owner: Ilias Sarantopoulos)
[09:54:02] Machine-Learning-Team, serviceops, Patch-For-Review: Rename the envoy's uses_ingress option to sets_sni - https://phabricator.wikimedia.org/T346638#9696700 (JMeybohm)
[09:57:31] (PS1) Kevin Bazira: logo-detection: add KServe custom model-server [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1017453 (https://phabricator.wikimedia.org/T361803)
[10:01:46] (CR) Kevin Bazira: "To make the review process easier, I have added only the KServe custom model-server for now.
Please find the commands I used to test it in" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1017453 (https://phabricator.wikimedia.org/T361803) (owner: Kevin Bazira)
[10:06:32] (PS2) Kevin Bazira: logo-detection: add KServe custom model-server [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1017453 (https://phabricator.wikimedia.org/T361803)
[10:25:07] (CR) AikoChou: [C:+1] huggingface: add huggingface image [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1009783 (https://phabricator.wikimedia.org/T357986) (owner: Ilias Sarantopoulos)
[10:27:19] Guten tag o/
[10:31:19] Guten tag \o/
[10:32:44] (CR) Ilias Sarantopoulos: [C:+2] "Done" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1009783 (https://phabricator.wikimedia.org/T357986) (owner: Ilias Sarantopoulos)
[10:42:30] (Merged) jenkins-bot: huggingface: add huggingface image [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1009783 (https://phabricator.wikimedia.org/T357986) (owner: Ilias Sarantopoulos)
[11:13:47] I'm thinking of removing nsfw model from publishing docker images.
[11:18:03] * isaranto lunch!
[12:00:27] hf image was published without having any issues \o/
[12:01:22] nice!! \o/
[12:28:56] hello folks!
[12:31:28] Machine-Learning-Team, Observability-Metrics, SRE Observability (FY2023/2024-Q4): Gap in metrics rendered from Thanos Rules - https://phabricator.wikimedia.org/T352756#9697175 (elukey) a:klausman→elukey
[12:32:43] Machine-Learning-Team: Run unit tests for the inference-services repo in CI - https://phabricator.wikimedia.org/T360120#9697180 (elukey) a:elukey→None
[12:33:06] Machine-Learning-Team, Patch-For-Review: Improve Istio's mesh traffic transparent proxy capabilities for external domains accessed by Lift Wing - https://phabricator.wikimedia.org/T353622#9697183 (elukey) a:elukey
[12:40:06] hey Luca1
[12:43:35] *!
[12:44:58] (CR) Elukey: "Hi Kevin!" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1017453 (https://phabricator.wikimedia.org/T361803) (owner: Kevin Bazira)
[12:46:01] elukey: kevinbazira --^ regarding the above. I also wanted to discuss some things.
[12:46:24] did Marco mention if they are going to send us the 224x224 images or shall I ask him on phabricator?
[13:12:28] kevinbazira: (wrong ping previously)
[13:14:13] If they do the resizing then we can accept the encoded image in some encoding (e.g. base64) and we won't be facing any clear security issues as the image will be decoded and converted to a tensor
[13:16:40] on the hf image front: I filed a patch to deploy falcon-7b-instruct model on experimental namespace https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1017858
[13:17:28] perhaps I should be trying first with a smaller model to validate that the hf image works...If it fails I'll jump to a smaller one directly and then focus on debugging etc
[13:51:28] (CR) Kevin Bazira: "Hi Luca," [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1017453 (https://phabricator.wikimedia.org/T361803) (owner: Kevin Bazira)
[13:57:00] isaranto: go big or go home :D
[13:57:05] +1ed, seems good to go
[13:57:16] I'm home already so....
[13:57:18] :P
[13:57:28] the requirements are big, hopefully there is space in staging :D
[13:57:30] ahahaha yes
[13:59:58] isaranto: do we have plans to test https://huggingface.co/allenai/OLMo-7B ?
[14:00:55] sure! why not?
[14:01:12] if it works fine it could be a very nice alternative to Falcon
[14:01:18] on paper it seems really open
[14:01:39] I mean we don't have a written plan, but it would be best to narrow down a subset/list of models that are the best candidates for production
[14:01:45] other than just testing I mean
[14:02:20] it is quite huge for a 7B param model
[14:02:53] ~27GB while others are close to 10-14GB
[14:03:00] wow didn't see it
[14:03:11] downloading 30G from swift will be nice
[14:04:49] if anybody has time for https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1017292
[14:05:28] +1
[14:05:29] I am planning to test the new images for models that need OMP_NUM_THREADS, and let the others to pick it up if needed
[14:05:48] thankss
[14:05:51] I can deploy the one on experimental namespace (wikidata)
[14:05:58] super
[14:06:00] I just deployed so I can just do it again
[14:06:08] ping me once u merg
[14:06:11] *e it
[14:07:07] already merged
[14:07:14] already deployed :D
[14:07:19] nice :)
[14:07:51] MI/MD instead of CI/CD
[14:07:59] manual integration, manual delivery
[14:08:01] :D
[14:10:02] oups it crashed cause of an error in kserve-container https://phabricator.wikimedia.org/P59848
[14:10:02] cc: aiko:
[14:14:05] isaranto: ahhh KI needs to be upgraded to v0.6 as well https://gerrit.wikimedia.org/r/plugins/gitiles/machinelearning/liftwing/inference-services/+/refs/heads/main/revert_risk_model/model_server/wikidata/requirements.txt
[14:14:30] ack!
[14:17:34] aiko: o/ is RR-ML ok to deploy to staging? Or does it need a 0.6 upgrade as well?
[14:20:39] isaranto: and the kserve needs to be a specific commit for a fix from upstream https://github.com/kserve/kserve/pull/3556
[14:20:44] elukey: let me check
[14:44:33] ouff ofc it failed
[14:44:55] elukey, isaranto: o/ I'm going to file a patch for RR-ML and RR-wikidata
[14:45:17] thanks! <3
[14:45:30] thanks Aiko!
[15:12:58] (PS1) AikoChou: revertrisk: update KI to v0.6 for RRML and RR-wikidata [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1017877
[15:13:08] Machine-Learning-Team, Patch-For-Review: Use Huggingface model server image for HF LLMs - https://phabricator.wikimedia.org/T357986#9698023 (isarantopoulos) I tried to deploy falcon-7b-instruct using the hf image but got the following error in the kserve-container: ` kubectl logs falcon-7b-instruct-gpu-p...
[15:14:50] elukey, isaranto: --^
[15:17:42] (CR) Ilias Sarantopoulos: "Just a nit regarding kserve, other than that LGTM!" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1017877 (owner: AikoChou)
[15:37:28] (PS2) AikoChou: revertrisk: update KI to v0.6 for RRML and RR-wikidata [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1017877
[15:37:38] Machine-Learning-Team, Patch-For-Review: Use Huggingface model server image for HF LLMs - https://phabricator.wikimedia.org/T357986#9698165 (isarantopoulos) Proceeding with [[ https://huggingface.co/google-bert/bert-base-uncased | google-bert-uncased ]] as an example model (the one we used during debuggi...
[15:38:09] I'm uploading new llm models on swift to deploy tomorrow
[15:38:17] logging off folks, have a nice evening!
[15:39:19] (CR) Ilias Sarantopoulos: [C:+1] revertrisk: update KI to v0.6 for RRML and RR-wikidata (2 comments) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1017877 (owner: AikoChou)
[15:39:24] (CR) Elukey: [C:+1] revertrisk: update KI to v0.6 for RRML and RR-wikidata (2 comments) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1017877 (owner: AikoChou)
[15:40:00] aiko: my mind was stuck and I couldn't remember the reason! and if I couldn't remember now in a week it would have been worse. thanks!
[15:41:04] (CR) AikoChou: [C:+2] revertrisk: update KI to v0.6 for RRML and RR-wikidata [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1017877 (owner: AikoChou)
[15:43:56] isaranto: it's always good for future reference :D
[15:44:51] Luca gave me a similar review this morning so I had the comment fresh in mind :)
[15:45:01] o/
[15:47:03] (Merged) jenkins-bot: revertrisk: update KI to v0.6 for RRML and RR-wikidata [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1017877 (owner: AikoChou)
[15:47:51] always blaming Luca
[16:28:31] Machine-Learning-Team, Foundational Technology Requests: Content Translation Recommendations API - https://phabricator.wikimedia.org/T293648#9698423 (Astinson) After discussions with @Isaac and @Pginer-WMF So one feature that I don't think we are paying attention to yet is to "relative importance" of th...
[16:53:42] have a nice rest of the day folks!
[16:56:44] bye Luca! have nice evening