[07:26:51] Good morning o/
[07:32:47] Morning everyone!
[08:11:17] hi Tobias!
[08:12:36] klausman: if you want you can merge this patch https://gerrit.wikimedia.org/r/c/operations/docker-images/production-images/+/1071630
[08:12:36] as I don't have +2 on prod images repo
[08:12:48] ah, right
[08:14:01] and done
[08:16:26] Thanks!
[08:17:10] Machine-Learning-Team, Patch-For-Review: Migrate the ownership of ML-Owned Docker images in production-images repo to mailing lists - https://phabricator.wikimedia.org/T374233#10132739 (isarantopoulos)
[08:17:23] I'll take another look afterwards, but I don't think there are other images that we should own
[08:17:29] Agreed
[08:18:06] There are some that I have worked on like golang, but they're far from ML exclusive (or even "mostly used by")
[08:18:56] ack
[08:21:17] thanks for the update folks, remember to kick off the image rebuild
[08:27:22] I am having trouble logging into build2001 atm, but should be fixed in a jiffy
[08:29:38] building now
[08:35:44] thanks for the reviews/help Luca!
[09:10:58] (PS1) AikoChou: ci: add blubber for reference-need [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1071818 (https://phabricator.wikimedia.org/T371902)
[09:15:06] hello :)
[09:16:23] Good day, Aiko
[09:17:04] good day, Ilias!
[09:18:37] Hello! :)
[09:19:52] isaranto: the blubber file for reference-need is ready for review https://gerrit.wikimedia.org/r/c/machinelearning/liftwing/inference-services/+/1071818 when you have a moment :)
[09:20:48] great! will do!
[09:22:35] I'm taking an early lunch break so I can go to an apt viewing, bbiab
[09:23:27] good luck Tobias!
[09:24:39] aiko: are we rebranding the ref need to ref quality?
[09:25:11] tbh ref quality seems more suitable, just asking so that we rename it everywhere
[09:40:37] isaranto: ref-quality is the project name containing ref-need and ref-risk.
Muniza and I think it would be better to start with a single service for ref-need and ref-risk, rather than two separate services
[09:43:44] ack
[09:50:40] (PS1) AikoChou: reference-quality: add CI pipelines to config.yaml [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1071824 (https://phabricator.wikimedia.org/T371902)
[09:51:54] (CR) AikoChou: "Related patch: https://gerrit.wikimedia.org/r/c/integration/config/+/1071825" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1071824 (https://phabricator.wikimedia.org/T371902) (owner: AikoChou)
[09:52:15] (CR) Ilias Sarantopoulos: [C:+1] "LGTM! tested both production and test images and they work great." [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1071818 (https://phabricator.wikimedia.org/T371902) (owner: AikoChou)
[09:53:04] I noticed some really long latencies while testing the image, but it can be due to os architecture and pytorch (macos I mean)
[09:54:11] pasting my local logs just as fyi https://phabricator.wikimedia.org/P68765, we'll figure things out when we do load testing
[09:56:26] yeah I got similar latencies as you https://phabricator.wikimedia.org/P68762
[10:05:12] * aiko lunch!
[10:33:32] * isaranto lunch!
[10:57:12] (CR) AikoChou: [C:+2] ci: add blubber for reference-need [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1071818 (https://phabricator.wikimedia.org/T371902) (owner: AikoChou)
[11:23:36] (CR) Ilias Sarantopoulos: [V:+2 C:+1] ci: add blubber for reference-need [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1071818 (https://phabricator.wikimedia.org/T371902) (owner: AikoChou)
[11:24:24] (CR) Ilias Sarantopoulos: [C:+1] reference-quality: add CI pipelines to config.yaml [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1071824 (https://phabricator.wikimedia.org/T371902) (owner: AikoChou)
[12:36:00] klausman: o/ regarding https://phabricator.wikimedia.org/T372405#10131224, do we need to do anything to make this work for new model servers? I think we can always see/filter UAs in the logstash dashboard, right?
[12:37:35] Yes, UAs should always be visible with our usual logging
[12:37:58] The key is of course that the _client_ sets a useful one :)
[12:39:13] I see, thanks for the answer!
[13:04:53] aiko: it was missing from the client that was using revertrisk (automoderator) so we couldn't check the traffic on our side
[13:05:06] nothing to do then on our side
[13:35:36] Machine-Learning-Team: [LLM] log input/output size per request - https://phabricator.wikimedia.org/T370775#10133946 (isarantopoulos) I just thought again about this: the input size of the prompt is the size of the request which we already have in bytes, while the output number of tokens is a POST parameter....
[13:41:44] isaranto: got it :)
[13:54:27] isaranto: o/ do you remember if we use the `log_slow_function` in any of our models?
I thought we used it in the revscoring model, but I don’t find it there
[13:54:56] I don't recall at the moment, but I'll take a look and let you know
[13:55:20] okay, thanks :)
[13:55:31] Machine-Learning-Team, Goal: Goal 1: Non-technical users can make a request to a Hugging Face Large Language Model that is fast in production. - https://phabricator.wikimedia.org/T371395#10134097 (isarantopoulos) - Continuing work on [[ https://phabricator.wikimedia.org/T370149 | vllm ]] as an inference...
[13:57:09] Machine-Learning-Team, Goal: Goal 4: Support product teams in deploying production models. - https://phabricator.wikimedia.org/T371398#10134113 (isarantopoulos) - Articlequality language agnostic model is up and running and ready for use by WME [[ https://api.wikimedia.org/wiki/Lift_Wing_API/Reference/Ge...
[14:30:09] Machine-Learning-Team, Goal: Goal 2: People outside the ML team can ssh into an ml-lab machine, run a Jupyter Notebook, and run PyTorch powered by a GPU. - https://phabricator.wikimedia.org/T371396#10134279 (klausman) Update: Deciding version of ROCm (and thus, Tensorflow or Pytorch), then packaging the...
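On the `log_slow_function` question at 13:54: the log does not show that helper's signature or where it is defined, so the following is only a sketch of the general idea (a decorator that logs a warning when a call exceeds a time threshold). The name `warn_if_slow`, the `threshold_s` parameter, the logger name, and `slow_preprocess` are all hypothetical and are not taken from the inference-services repo.

```python
import functools
import logging
import time

# Hypothetical logger name, chosen for this sketch only.
logger = logging.getLogger("slow_calls")


def warn_if_slow(threshold_s=1.0):
    """Hypothetical decorator: warn when the wrapped call takes longer than threshold_s seconds."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                # Measure even if the call raised, so slow failures are logged too.
                elapsed = time.perf_counter() - start
                if elapsed > threshold_s:
                    logger.warning(
                        "%s took %.3fs (threshold %.3fs)",
                        func.__name__, elapsed, threshold_s,
                    )

        return wrapper

    return decorator


@warn_if_slow(threshold_s=0.01)
def slow_preprocess(x):
    # Stand-in for a slow model-server step; sleeps longer than the threshold.
    time.sleep(0.05)
    return x * 2
```

Calling `slow_preprocess(21)` returns the doubled value and emits one warning on the `slow_calls` logger, because the sleep is longer than the 0.01 s threshold.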
[14:49:41] Lift-Wing, Machine-Learning-Team: Huggingface server run by kserve does not export any query metrics - https://phabricator.wikimedia.org/T371491#10134403 (klausman)
[14:50:21] Lift-Wing, Machine-Learning-Team: Log and export preprocess size in inference services as a prometheus metric - https://phabricator.wikimedia.org/T374034#10134404 (isarantopoulos)
[15:10:27] (CR) Hashar: "recheck after having deployed https://gerrit.wikimedia.org/r/c/integration/config/+/1071825" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1071824 (https://phabricator.wikimedia.org/T371902) (owner: AikoChou)
[15:50:28] (CR) AikoChou: [C:+2] reference-quality: add CI pipelines to config.yaml [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1071824 (https://phabricator.wikimedia.org/T371902) (owner: AikoChou)
[15:59:52] (Merged) jenkins-bot: reference-quality: add CI pipelines to config.yaml [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1071824 (https://phabricator.wikimedia.org/T371902) (owner: AikoChou)
[16:40:10] Machine-Learning-Team: Migrate the ownership of ML-Owned Docker images in production-images repo to mailing lists - https://phabricator.wikimedia.org/T374233#10134935 (isarantopoulos) Open→Resolved
[16:40:16] Machine-Learning-Team: [LLM] Run LLMs locally in ml-testing - https://phabricator.wikimedia.org/T370656#10134937 (isarantopoulos) Open→Resolved
[16:40:17] going afk folks, have a nice rest of day/evening!
[16:40:40] Machine-Learning-Team, Content-Transform-Team, Research, Patch-For-Review: Add Article Quality Model to LiftWing - https://phabricator.wikimedia.org/T360455#10134942 (isarantopoulos) Open→Resolved
[16:40:49] Lift-Wing, Machine-Learning-Team: [LLM] add locust entry for huggingfaceserver - https://phabricator.wikimedia.org/T370992#10134946 (isarantopoulos) Open→Resolved
[16:40:55] Machine-Learning-Team: [LLM] Allow additional cmd arguments in hf image - https://phabricator.wikimedia.org/T370670#10134959 (isarantopoulos) Open→Resolved
[16:43:57] Machine-Learning-Team, Data-Platform-SRE, Infrastructure-Foundations, serviceops: Migrate the ownership of Docker images in production-images repo to mailing lists - https://phabricator.wikimedia.org/T373526#10134961 (akosiaris) >>! In T373526#10102566, @elukey wrote: >>>! In T373526#10100630, @a...
[16:55:08] Machine-Learning-Team, Content-Transform-Team, Research, Patch-For-Review: Add Article Quality Model to LiftWing - https://phabricator.wikimedia.org/T360455#10135015 (FNavas-foundation) just to note that Enterprise is planning to integrate by November
[20:01:19] (PS1) Umherirrender: tests: Use multi-row insert [extensions/ORES] - https://gerrit.wikimedia.org/r/1071949
[20:21:37] (CR) Dreamy Jazz: [C:+2] tests: Use multi-row insert [extensions/ORES] - https://gerrit.wikimedia.org/r/1071949 (owner: Umherirrender)
[20:46:59] (Merged) jenkins-bot: tests: Use multi-row insert [extensions/ORES] - https://gerrit.wikimedia.org/r/1071949 (owner: Umherirrender)
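On the User-Agent point at 12:37 (traffic can only be filtered by UA if the client sets a useful one), a minimal client-side sketch using only the Python standard library might look like this. The endpoint URL, the UA string, and the payload fields are placeholders assumed for illustration; none of them come from the log, and the request is only built here, not sent.

```python
import json
import urllib.request

# Placeholder values, assumed for illustration only.
ENDPOINT = "https://example.org/v1/models/some-model:predict"
USER_AGENT = "my-tool/1.0 (https://example.org/my-tool; my-tool-admin@example.org)"


def build_request(payload: dict) -> urllib.request.Request:
    """Build a POST request that carries a descriptive User-Agent header."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        ENDPOINT,
        data=body,
        headers={
            "Content-Type": "application/json",
            # A specific UA lets the service side tell this client's traffic apart.
            "User-Agent": USER_AGENT,
        },
        method="POST",
    )


# Hypothetical payload; a real model server defines its own input schema.
req = build_request({"rev_id": 12345, "lang": "en"})
request_size_bytes = len(req.data)  # size of the request body in bytes
```

Passing `req` to `urllib.request.urlopen` would send it. `request_size_bytes` is the byte length of the body, which is the kind of request size mentioned in the 13:35 task comment.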