[06:54:08] Good day o/
[08:27:56] (PS11) Ilias Sarantopoulos: articlequality: update to ordinal regression from statsmodels [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1055177 (https://phabricator.wikimedia.org/T360455)
[08:28:23] open for reviews!
[08:33:09] Good morning! :)
[08:34:59] \o
[09:02:30] now that we're adding the articlequality model I was wondering if we should create a new namespace for it, to which we can later add more models
[09:02:58] for example langagnostic-something-something
[09:04:54] I don't think we have a real issue at the moment, but I'm wondering how we are going to keep track of namespace resource limits in the future
[09:48:33] Machine-Learning-Team: Deploy Modernized Recommendation API to LiftWing - https://phabricator.wikimedia.org/T371465 (kevinbazira) NEW
[10:00:26] isaranto: yeah, resource limits are a bit all-over-the-place atm, especially considering the NS-level limits.
[10:01:08] isaranto: o/ I am going to have a look at the articlequality patch
[10:01:18] klausman: o/ I have noticed a number of undeployed changes in the rec-api staging and prod. these are related to: NetworkPolicy, Secrets, Certs, etc.
[10:01:26] for staging, I checked this by running `helmfile -e ml-staging-codfw diff` in `/srv/deployment-charts/helmfile.d/ml-services/recommendation-api-ng`
[10:01:34] will these undeployed changes affect https://gerrit.wikimedia.org/r/1058574 ?
[10:01:55] ok! kevinbazira: let me know if anything needs clarification
[10:02:13] sure sure
[10:02:16] I'll also add a load test for locust
[10:02:24] thanks!
[10:02:57] (PS12) Ilias Sarantopoulos: articlequality: update to ordinal regression from statsmodels [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1055177 (https://phabricator.wikimedia.org/T360455)
[10:03:40] kevinbazira: those _should_ be fine.
lmk if something breaks
[10:03:52] okok
[11:05:10] kevinbazira: I can push the pending changes now, to make sure they work, and then we do the updates from 1058574 later?
[11:06:22] klausman: yep. that would be great!
[11:06:33] alright, will do that in a minute
[11:08:13] ok pushed. feel free to test if the updated chart starts everything correctly
[11:14:36] thanks! checking now ...
[11:18:44] klausman: the rec-api pod on staging works as expected when I run: `curl https://recommendation-api-ng.k8s-ml-staging.discovery.wmnet:31443/api/spec`
[11:18:55] excellent
[11:20:08] I'll push it in prod-codfw, unless there are any objections
[11:29:43] * isaranto afk lunch!
[11:48:29] ack! no objections
[11:58:22] ok, done
[12:04:36] (CR) Kevin Bazira: [C: +1] "I have run this patch locally and the model-server works as shown in: https://phabricator.wikimedia.org/P67157" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1055177 (https://phabricator.wikimedia.org/T360455) (owner: Ilias Sarantopoulos)
[12:12:53] the rec-api pod in prod works as expected when I run: `curl "https://recommendation-api-ng.discovery.wmnet:31443/api/spec"`
[12:12:53] looks like the deployment has taken effect on codfw but not eqiad: `helmfile -e ml-serve-eqiad diff`
[12:20:05] isaranto: o/ I'll review the articlequality patch in a moment
[12:21:11] o/ Aiko, thanks
[12:21:32] kevinbazira: I missed the local makefile, let me see if I can fix it
[12:21:38] yes, hadn't pushed eqiad yet
[12:21:44] I'm going to upload the model on swift as well
[12:27:55] eqiad is now also pushed
[12:28:56] hi folks! I noticed the new rec-api, really nice :) Was there any effort to support the other rec-api's API?
I mean the one accessible from restbase - https://en.wikipedia.org/api/rest_v1/#/Recommendation
[12:29:11] if so we could point the code to Lift Wing, and deprecate the nodejs-rec-api
[12:38:51] Machine-Learning-Team: Huggingface server run by kserve does not export any query metrics - https://phabricator.wikimedia.org/T371491 (klausman) NEW
[12:46:13] elukey: I have no idea, but good point. we'll follow up to check
[12:46:41] <3
[12:46:52] it was in some task, I can find it if needed
[12:52:25] elukey: o/ the Language team rebuilt the new rec-api based on their needs in: https://phabricator.wikimedia.org/T369484
[12:52:25] based on the fields supported in the new rec-api: https://recommend.wmcloud.org/docs they will have to update endpoints accessible from restbase - https://en.wikipedia.org/api/rest_v1/#/Recommendation
[12:54:24] kevinbazira: thanks! Do they have a timeline, and/or tracking it somewhere?
[12:57:42] no timelines were shared with us. the new changes are being tracked in T369484.
[12:59:39] thanks!
[12:59:52] isaranto: are the httpbb tests for ores-legacy up to date? I am getting errors: Body: expected to contain 'probability', got '{\n "zhwiki": {\n "models": {\n "damaging": '... (270 characters total).
[13:00:12] Unfortunately, httpbb can't be told to log the full response
[13:00:29] w8 let me check
[13:01:01] I'm going to make a request with the specific rev_ids. what may happen occasionally is that if a revision is deleted then the tests won't work anymore
[13:01:47] ack. Note that I pushed the http-webapp chart update to staging that was also pending for rec-api. But I feel like that isn't the issue
[13:02:43] in prod I can get the results https://ores.wikimedia.org/v3/scores/zhwiki/851/damaging
[13:03:55] BUT you're right.
zhwiki damaging isn't deployed in staging, so ores-legacy running from ml-staging (and connecting to liftwing staging) would fail
[13:04:11] https://ores-legacy.k8s-ml-staging.discovery.wmnet:31443/v3/scores/viwiki/169/articletopic gives me an error/not found while https://ores-legacy.discovery.wmnet:31443/v3/scores/viwiki/169/articletopic works fine
[13:04:24] we only have enwiki and wikidata in revscoring-damaging in staging
[13:04:42] aha! then the staging test file could be trimmed, I guess
[13:05:03] I can take a look and open a patch to remove the relevant entries from httpbb in staging
[13:06:04] klausman: I'll have it ready in a bit!
[13:06:17] thank you!
[13:18:21] (CR) AikoChou: [C: +1] "LGTM!" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1055177 (https://phabricator.wikimedia.org/T360455) (owner: Ilias Sarantopoulos)
[13:19:29] wow I had to delete many entries
[13:19:43] tbh I had forgotten about the ores-legacy tests when we removed the liftwing ones
[13:20:03] Lift-Wing, Machine-Learning-Team: Request to update Readability model on Lift Wing - https://phabricator.wikimedia.org/T369712#10031829 (achou) a: AikoChou→achou
[13:48:58] klausman: https://gerrit.wikimedia.org/r/c/operations/puppet/+/1058602
[13:49:07] I removed all the entries and I tested it
[13:49:50] however I'm still getting an error for wp10, which actually redirects to articlequality. it works in prod but for some reason not in staging
[13:50:19] also I think we need to add in the docs the command to run ores-legacy tests - I couldn't find it anywhere
[13:50:38] which would be sth like this `httpbb --host ores-legacy.k8s-ml-staging.discovery.wmnet --https_port 31443 test_ores_staging.yaml`
[13:50:48] I'll add it later unless I'm mistaken and it exists somewhere
[13:51:27] ack: re command (both the command and that it should be documented somewhere)
[13:53:24] isaranto: good to merge?
[13:54:02] wp10 still fails but I would merge it as it is an improvement over the previous status
[13:54:07] ack!
[13:54:17] Thanks
[14:20:39] Machine-Learning-Team, MW-1.43-notes (1.43.0-wmf.16; 2024-07-30), Structured-Data-Backlog (Current Work): [SPIKE] Send an image thumbnail to the logo detection service - https://phabricator.wikimedia.org/T364551#10032095 (AUgolnikova-WMF)
[14:20:53] Machine-Learning-Team, MW-1.43-notes (1.43.0-wmf.16; 2024-07-30), Structured-Data-Backlog (Current Work): [SPIKE] Send an image thumbnail to the logo detection service - https://phabricator.wikimedia.org/T364551#10032097 (AUgolnikova-WMF)
[15:14:29] Machine-Learning-Team, Goal: Goal 1: Non-technical users can make a request to a Hugging Face Large Language Model that uses an inference optimization engine in production. - https://phabricator.wikimedia.org/T371395#10032310 (calbon)
[15:14:53] Machine-Learning-Team, Goal: Goal 2: People outside the ML team can ssh into an ml-lab machine, run a Jupyter Notebook, and run PyTorch powered by a GPU. - https://phabricator.wikimedia.org/T371396#10032313 (calbon)
[15:15:01] Machine-Learning-Team, Goal: Goal 3: Operational Excellence - Improve base monitoring, alerting and logging of Lift Wing services. - https://phabricator.wikimedia.org/T371397#10032314 (calbon)
[15:15:11] Machine-Learning-Team, Goal: Goal 4: Support product teams in deploying production models. - https://phabricator.wikimedia.org/T371398#10032315 (calbon)
[15:25:03] I have so many tabs open again!!
[15:27:12] lol
[15:34:06] it isn't funny :P
[15:37:29] I added the link to the "request to host a model" form in wikitech https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing#Hosting_a_model
[15:45:34] nice, I updated the link for updating models. it was a wrong link before
[15:45:41] thanks!
[16:33:27] Lift-Wing, Machine-Learning-Team: Request to update Readability model on Lift Wing - https://phabricator.wikimedia.org/T369712#10032804 (achou) @Trokhymovych I'm starting to work on this. Is the prediction time similar to the previous model? Or it takes more/less time? Just wanted to get some numbers on...
[16:42:15] (CR) Isaac Johnson: articlequality: update to ordinal regression from statsmodels (1 comment) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1055177 (https://phabricator.wikimedia.org/T360455) (owner: Ilias Sarantopoulos)
[16:44:45] Machine-Learning-Team, Content-Transform-Team, Research, Patch-For-Review: Add Article Quality Model to LiftWing - https://phabricator.wikimedia.org/T360455#10032872 (Isaac) > We're going to solve the numpy issue by relaxing the kserve restriction by using our wmf kserve fork. @isarantopoulos tha...
[17:59:09] (CR) Ilias Sarantopoulos: articlequality: update to ordinal regression from statsmodels (1 comment) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1055177 (https://phabricator.wikimedia.org/T360455) (owner: Ilias Sarantopoulos)
[17:59:44] * isaranto afk!
[18:26:13] (CR) Isaac Johnson: articlequality: update to ordinal regression from statsmodels (1 comment) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1055177 (https://phabricator.wikimedia.org/T360455) (owner: Ilias Sarantopoulos)