[07:24:32] Machine-Learning-Team: Issues with Reference Need and Reference Risk models - https://phabricator.wikimedia.org/T384172 (achou) NEW
[07:27:47] Howdy!
[07:29:47] Hola Ilias o/
[07:58:41] good morning folks
[08:23:53] How's it going? Does anybody need help/support or a review with anything?
[09:00:24] isaranto: I am doing ok, I will work on deployment with Kevin today, thnx. How about you?
[09:01:04] ack!
[09:08:37] Machine-Learning-Team: Issues with Reference Need and Reference Risk models - https://phabricator.wikimedia.org/T384172#10475328 (achou) Looking at the [[ https://gitlab.wikimedia.org/repos/research/knowledge_integrity/-/blob/main/knowledge_integrity/models/reference_risk/model.py?ref_type=heads#L121-125 | s...
[09:11:30] I'm doing some admin work and plan to work on the AWQ and GPTQ aya models to see which one would be easier to deploy (without building from source etc) https://phabricator.wikimedia.org/T382343#10419920
[09:19:03] aiko: do you need any help with ref quality models?
[09:25:05] isaranto: no thanks for asking. I can handle it :)
[09:25:42] awesome
[09:36:52] (PS1) AikoChou: reference-quality: catch Error objects returned from knowledge_integrity [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1112698 (https://phabricator.wikimedia.org/T384172)
[09:46:12] Machine-Learning-Team, Patch-For-Review: Issues with Reference Need and Reference Risk models - https://phabricator.wikimedia.org/T384172#10475450 (MunizaA) >>! In T384172#10475328, @achou wrote: > Looking at the [[ https://gitlab.wikimedia.org/repos/research/knowledge_integrity/-/blob/main/knowledge_int...
[10:28:04] (CR) Ilias Sarantopoulos: "The 422 code seems proper for this response, but in similar past discussions we have refrained from using too specific response codes as w" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1112698 (https://phabricator.wikimedia.org/T384172) (owner: AikoChou)
[11:30:40] * isaranto afk lunch
[13:59:35] o/ georgekyz and I are deploying articletopic-outlink ...
[14:00:16] staging looks ...
[14:00:16] ```
[14:00:16] $ kubectl get pods
[14:00:16] NAME READY STATUS RESTARTS AGE
[14:00:16] outlink-topic-model-predictor-default-00025-deployment-dfc52qgf 3/3 Running 0 2m7s
[14:00:16] outlink-topic-model-transformer-default-00024-deployment-6d5slx 3/3 Running 0 2m8s
[14:00:16] ```
[14:04:35] nice!
[14:07:33] klausman: o/ regarding prod access for georgekyz and https://gerrit.wikimedia.org/r/c/operations/puppet/+/1109414. Do we need reviews from Tyler and Moritz? If yes I can ping them
[14:08:08] Moritz maybe not, but Tyler is the sign-off for the analytics stuff (statboxes)
[14:08:19] I'll ping Tyler
[14:08:50] articletopic-outlink is up and running on both eqiad and codfw
[14:08:57] thank you!
[14:11:43] kevinbazira: georgekyz nice! did you get the chance to run load testing with wrk? iirc the outlink topic model isn't set up with locust yet. If not I can go through that with George
[14:16:13] isaranto: sure sure, we shall run load tests. a quick check shows that the inference speed remains fast:
[14:16:13] ```
[14:16:13] $ time curl https://api.wikimedia.org/service/lw/inference/v1/models/outlink-topic-model:predict -X POST -d '{"page_title": "Douglas_Adams", "lang": "en"}' -H "Content-type: application/json"
[14:16:13] {"prediction":{"article":"https://en.wikipedia.org/wiki/Douglas_Adams","results":[{"topic":"Culture.Media.Media*","score":0.6926519870758057},{"topic":"Culture.Biography.Biography*","score":0.5544804334640503},{"topic":"Culture.Literature","score":0.5312193632125854}]}}
[14:16:13] real 0m0.271s
[14:16:13] user 0m0.011s
[14:16:13] sys 0m0.010s
[14:16:14] ```
[14:18:55] ack! I asked cause I'm just trying to keep track of George's onboarding to make sure that we don't leave out any steps. thanks for taking care of everything!
[14:25:41] np! :)
[14:36:47] Lift-Wing, Machine-Learning-Team: [onboarding] Update articletopic outlink to kserve 0.14.1 - https://phabricator.wikimedia.org/T383312#10476438 (gkyziridis) The latest changes on articletopic outlink updating the `kserve 0.14.1` package for predictor and transformer are deployed correctly on eqiad and c...
[14:38:36] klausman, isaranto o/ - https://gerrit.wikimedia.org/r/c/operations/puppet/+/1109414 needs an SRE access task associated with it
[14:38:57] there are steps to follow etc..
[14:39:23] https://wikitech.wikimedia.org/wiki/SRE/Production_access
[14:39:53] elukey: o/ sorry for that. I'll create the task for it
[14:40:09] np, in the task there are also tickboxes to follow etc..
[14:40:18] https://wikitech.wikimedia.org/wiki/SRE/Production_access seems to be the right form. The preconditions (NDA, accounts, SSH keys) are already done
[14:40:22] mostly to prevent accidental misconfig etc..
[14:41:02] yep you are at step https://wikitech.wikimedia.org/wiki/SRE/Production_access#Filing_the_request
[14:41:02] Not sure about the L3 thing. It's mentioned in the form but not the wikitech page
[14:42:13] https://phabricator.wikimedia.org/L3 ah, here it is
[14:42:30] yes there is pretty important stuff :)
[15:37:31] (PS2) AikoChou: reference-quality: catch Error objects returned from knowledge_integrity [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1112698 (https://phabricator.wikimedia.org/T384172)
[15:38:35] (CR) Ilias Sarantopoulos: [C:+1] reference-quality: catch Error objects returned from knowledge_integrity [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1112698 (https://phabricator.wikimedia.org/T384172) (owner: AikoChou)
[15:44:29] (CR) AikoChou: "+1 makes sense! I updated the patch. And we'll also need to change the revert risk model server code since it still uses 422 error for thi" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1112698 (https://phabricator.wikimedia.org/T384172) (owner: AikoChou)
[15:45:19] (CR) AikoChou: [C:+2] "Thanks for the review :)" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1112698 (https://phabricator.wikimedia.org/T384172) (owner: AikoChou)
[15:46:04] (Merged) jenkins-bot: reference-quality: catch Error objects returned from knowledge_integrity [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1112698 (https://phabricator.wikimedia.org/T384172) (owner: AikoChou)
[16:18:32] filed a task for access https://phabricator.wikimedia.org/T384239
[16:25:06] georgekyz: plz input the information that is missing and sign the L3 if you haven't already. Thanks!
[17:08:17] * klausman afk
[18:24:11] Lift-Wing, Machine-Learning-Team, Patch-For-Review: Build and Publish ROCm-Compatible Python Packages - https://phabricator.wikimedia.org/T381859#10477526 (isarantopoulos) In an effort to build packages locally but with docker I've made the following attempt: ✅ use official rocm image based in ub...
[18:30:41] * isaranto afk!