[00:23:48] Machine-Learning-Team, ORES: Add deprecation warnings to ORES-related repositories on Github - https://phabricator.wikimedia.org/T349632 (Aklapper) I've crossed many Wikimedia wiki pages last updated 10-15 years ago telling what "is being" done at that time so I'd appreciate an "As of late 2023, ..."
[07:30:40] Good morning folks!
[08:08:15] morning! :)
[08:12:27] morning!
[08:32:52] * elukey bbiab
[08:56:47] kevinbazira: o/ is it ok if I deploy article-desc?
[08:57:22] isaranto: yes, please go ahead.
[08:57:35] hope it works fine :)
[08:57:50] 🤞 :)
[09:02:09] deployed and works fine!
[09:02:33] kevinbazira: it was 14s for a single request before right?
[09:04:22] isaranto: yes, that's the range for 2 beams, it increases with more beams. we'll have to optimize it.
[09:40:11] (CR) AikoChou: [C: +2] "Thanks for the review :)" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/979315 (https://phabricator.wikimedia.org/T352181) (owner: AikoChou)
[09:50:34] (Merged) jenkins-bot: revert-risk: enable local run [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/979315 (https://phabricator.wikimedia.org/T352181) (owner: AikoChou)
[10:03:47] Morning!
[10:19:42] elukey: do you know at what frequency the pages at https://docker-registry.wikimedia.org are refreshed? E.g. Aiko's updated RR image (https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/980345) isn't visible there yet, and I wonder if it is useful to wait.
[10:20:23] klausman: I think there is a systemd timer every ~30 mins
[10:20:29] ah, ack,
[10:21:17] I can always just try and docker pull it, which is also arguably a more thorough test of availability
[10:28:16] as long as it is posted by pipelinebot in the merged patch (https://gerrit.wikimedia.org/r/c/machinelearning/liftwing/inference-services/+/979315), it should be available, right?
[10:30:09] It should yes, but I prefer to be thorough :)
[10:34:47] ack, thanks for +1 :)
[10:36:13] Machine-Learning-Team: Investigate prediction bug in article-descriptions model-server - https://phabricator.wikimedia.org/T352750 (kevinbazira)
[11:06:30] * klausman lunch
[11:10:47] Machine-Learning-Team: Investigate prediction bug in article-descriptions model-server - https://phabricator.wikimedia.org/T352750 (kevinbazira) a:kevinbazira I have narrowed the bug down to the predict method. Below is the preprocessed data for first_paragraphs, descriptions, lang, and num_beams that I...
[11:22:12] * isaranto lunch
[11:26:00] * aiko lunch too
[11:49:53] isaranto: wow I watched https://x.com/AndrewYNg/status/1731717066376536147 about LVM, crazy
[11:51:17] I can imagine an LVM trained on censorship, trained for specific segments of a population (and fed from CCTV cameras), scary
[12:04:55] Machine-Learning-Team, observability: Gap in metrics rendered from Thanos Rules - https://phabricator.wikimedia.org/T352756 (elukey)
[12:09:14] you went right to the scary scenario
[12:09:34] ahahahha yes
[12:09:37] I am an optimist
[12:09:59] (I cheer for Anthropic in the current battle)
[12:13:22] Morning all!
[12:13:40] +1
[12:14:47] * elukey lunch!
[12:31:04] isaranto: o/ I deployed the change of local-run to staging. revertrisk-ml has the issue of not finding python.preprocess_utils, while revertrisk-la works fine
[12:31:15] isaranto: the only difference is the blubber version used. revertrisk-ml uses the old version 0.15 and revertrisk-la uses 0.21
[12:32:43] I'm not aware what other changes have been done in blubber that would justify this
[12:33:09] aiko: try to add the directory to the PYTHONPATH as we did in the other images
[12:36:13] while at the same time update blubber version
[12:36:20] isaranto: yep I'll file a patch to add the pythonpath
[12:38:39] let me know if you need any help.
[12:39:08] or if you want me to test it locally or anything else
[12:40:00] aaaaand I may have caught Covid on the weekend.
[12:40:30] Currently trying to find a pharmacy that sells selftest kits
[12:40:38] (and has them _in stock_)
[12:50:49] oh, hope you don't have it 🤞
[13:01:24] FYI: gitlab is adding a model registry https://www.youtube.com/watch?v=Jr7Qi2tqo0s&ab_channel=GitLabUnfiltered
[13:01:46] it is still too early and don't know what will be available in the self hosted version of gitlab
[13:08:19] Ooof sorry Klausman
[13:08:45] A model registry? Hmmmm
[13:10:49] Machine-Learning-Team: Enable local runs for article-descriptions model - https://phabricator.wikimedia.org/T351940 (isarantopoulos) The model can now be run locally following the instructions in the README.md file. I added a couple of unit tests that assess that the correct url is created for the REST API r...
[13:27:11] elukey: --^ while trying to run the tests I ran into similar issues as in the past with tox and virtualenv versions. By removing virtualenv things seem to work like here https://gerrit.wikimedia.org/r/plugins/gitiles/machinelearning/liftwing/inference-services/+/refs/heads/main/langid/entrypoint.sh#4
[13:27:31] but it seems like a hack. I suggested we revisit CI upon moving to gitlab. edyt?
[13:27:35] *wdyt?
[13:46:09] makes sense yes!
[13:46:49] I suspect there is a clash between virtual-env on debian vs the one that tox wants/needs
[14:08:27] (PS1) AikoChou: revert-risk: add top level dir to PYTHONPATH [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/980394 (https://phabricator.wikimedia.org/T352181)
[14:17:40] Machine-Learning-Team, serviceops: Rename the envoy's uses_ingress option to sets_sni - https://phabricator.wikimedia.org/T346638 (elukey) a:elukey→None
[14:19:37] folks about https://phabricator.wikimedia.org/T342765 - we should probably review what's still pending and give the green light
[14:19:57] the only thing that I am wondering is if we'll need any model binary currently stored on Git LFS
[14:20:17] but I'd say no, since we have everything mirrored to https://analytics.wikimedia.org/published/wmf-ml-models/
[14:20:21] thoughts?
[14:21:21] Machine-Learning-Team, serviceops: Bump istio and Cert Manager Docker images to Bullseye - https://phabricator.wikimedia.org/T351933 (elukey) Cert Manager deployed in staging envs, the plan is to leave it running for 2/3 days to see new certs issued. Once done, we can rollout to prod and close.
[14:22:36] Machine-Learning-Team, MediaWiki-extensions-ORES, Growth-Team, Wikipedia-Android-App-Backlog, Patch-For-Review: Add revertrisk-language-agnostic to RecentChanges filters - https://phabricator.wikimedia.org/T348298 (Samwalton9-WMF)
[14:23:29] Machine-Learning-Team, MediaWiki-extensions-ORES, Growth-Team, Wikipedia-Android-App-Backlog, Patch-For-Review: Add revertrisk-language-agnostic to RecentChanges filters - https://phabricator.wikimedia.org/T348298 (Samwalton9-WMF)
[14:26:10] elukey: don't know if I'm missing anything but I'd give the green light
[14:28:52] isaranto: maybe we could add the deprecation banner and archive first all the repos
[14:32:13] makes sense
[14:36:00] Machine-Learning-Team: Enable local runs for article-descriptions model - https://phabricator.wikimedia.org/T351940 (isarantopoulos) Open→Resolved
[14:37:58] Machine-Learning-Team: Enable local runs for article-descriptions model - https://phabricator.wikimedia.org/T351940 (isarantopoulos)
[14:43:59] Machine-Learning-Team, Patch-For-Review: Deploy ctranslate2 version of nllb-200 - https://phabricator.wikimedia.org/T351740 (isarantopoulos) Using ctranslate2 with 8bit quantization I was able to create a model.bin of ~600MB size (from ~2.5GB of the original). I can't tell any difference in the quality...
[14:44:00] (PS6) AikoChou: revert-risk: add batch_model.py and USE_BATCHER env var [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/977135 (https://phabricator.wikimedia.org/T348536)
[14:45:59] (CR) Ilias Sarantopoulos: [C: +1] "Adding the __init__.py in python dir will affect many model servers so let's keep an eye when we next deploy them. Just running the httpbb" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/980394 (https://phabricator.wikimedia.org/T352181) (owner: AikoChou)
[15:20:25] Machine-Learning-Team, observability: Gap in metrics rendered from Thanos Rules - https://phabricator.wikimedia.org/T352756 (calbon) a:elukey
[15:22:27] Machine-Learning-Team: Investigate prediction bug in article-descriptions model-server - https://phabricator.wikimedia.org/T352750 (calbon)
[15:28:11] Machine-Learning-Team: Add a script for running the model server locally - https://phabricator.wikimedia.org/T352689 (calbon) a:AikoChou
[15:28:35] Machine-Learning-Team: Add a script for running the Revert Risk model server locally - https://phabricator.wikimedia.org/T352689 (calbon)
[15:31:03] Machine-Learning-Team, ORES: Review traffic on ores.wikimedia.org - https://phabricator.wikimedia.org/T352527 (calbon) a:isarantopoulos
[15:33:17] Machine-Learning-Team: Add a script for running the Revert Risk model server locally - https://phabricator.wikimedia.org/T352689 (achou) a:AikoChou→achou
[15:34:18] Machine-Learning-Team: Fix the link recommendation training pipeline - https://phabricator.wikimedia.org/T352525 (calbon) Open→Resolved
[15:36:23] Machine-Learning-Team: Fix istio gateway's PodDisruptionBudgets for ml-serve - https://phabricator.wikimedia.org/T352400 (calbon) a:elukey
[15:40:34] Machine-Learning-Team: Rethink aiohttp's session reuse in the isvc code - https://phabricator.wikimedia.org/T352290 (calbon) TL;DR review the current code and investigate if this simplification is useful
[15:43:14] Machine-Learning-Team: Rethink aiohttp's session reuse in the isvc code - https://phabricator.wikimedia.org/T352290 (calbon) a:elukey
[15:44:51] Machine-Learning-Team: Rethink aiohttp's session reuse in the isvc code - https://phabricator.wikimedia.org/T352290 (calbon) a:elukey→AikoChou
[15:45:25] Machine-Learning-Team: Discuss potential migration - https://phabricator.wikimedia.org/T344010 (calbon) Open→Resolved
[15:45:32] Machine-Learning-Team, Wikipedia-Android-App-Backlog (Android Release - FY2023-24): Migrate Machine-generated Article Descriptions from toolforge to liftwing. - https://phabricator.wikimedia.org/T343123 (calbon)
[15:53:32] Machine-Learning-Team, Patch-For-Review: Outlink returns 500 when EventGate returns 503 Service Unavailable - https://phabricator.wikimedia.org/T346136 (achou) p:Low→Medium
[16:03:21] Machine-Learning-Team, Goal: Goal: Increase the number of models hosted on Lift Wing - https://phabricator.wikimedia.org/T348156 (kevinbazira) Working on migrating the machine-generated article descriptions model from toolforge to LiftWing: - fixed the wikipedia api summary endpoint. now the model-server...
[16:18:29] (PS1) Ilias Sarantopoulos: llm: refactor directory structure to treat as python module. [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/980429
[16:22:24] aiko: let me know if you want me to review the batcher patch. I can do it first thing in the morning
[16:23:46] I added a patch to refactor a bit the llm/ directory cause in my new patch I'm adding more stuff for nllb
[16:23:58] going afk folks, cu tomorrow!
[16:32:38] isaranto: I'm thinking to add another change to the batcher patch, I'll let you know when it's ready to be reviewed tomorrow! ty
[16:33:49] isaranto: have a nice one!
[16:45:29] Cool cool! Just wanted to make sure you weren't waiting for me
[16:59:14] Machine-Learning-Team: Rethink aiohttp's session reuse in the isvc code - https://phabricator.wikimedia.org/T352290 (achou) a:AikoChou→achou
[17:23:37] Machine-Learning-Team, MediaWiki-extensions-ORES, Growth-Team, Wikipedia-Android-App-Backlog: CRS: Community rollout plan and discussion about adding revertrisk to RecentChanges filters - https://phabricator.wikimedia.org/T352217 (JTannerWMF)
[17:30:12] have a nice rest of the day folks!
[17:30:36] bye luca!
[17:38:07] Machine-Learning-Team, Add-Link, Growth-Team (Sprint 3 (Growth Team)), User-notice: Deploy "add a link" to 15th round of wikis - https://phabricator.wikimedia.org/T308141 (Etonkovidova) In progress→Resolved
[17:43:35] Machine-Learning-Team: Discuss potential migration from toolforge to liftwing - https://phabricator.wikimedia.org/T344010 (Aklapper)
[17:47:02] Machine-Learning-Team, Add-Link, Growth-Team (Sprint 3 (Growth Team)), Serbian-Sites, and 3 others: Deploy "add a link" to 16th round of wikis - https://phabricator.wikimedia.org/T308142 (Etonkovidova) Open→Resolved
[18:06:11] Machine-Learning-Team, Add-Link, Growth-Team (Sprint 3 (Growth Team)), Turkish-Sites, User-notice: Deploy "add a link" to 17th round of wikis - https://phabricator.wikimedia.org/T308143 (Etonkovidova) Open→Resolved
[19:20:35] Machine-Learning-Team, Wikipedia-Android-App-Backlog (Android Release - FY2023-24): Migrate Machine-generated Article Descriptions from toolforge to liftwing. - https://phabricator.wikimedia.org/T343123 (Dbrant) > We're returning a fairly verbose response right now because it was useful for debugging etc...
[20:40:01] (CR) Kosta Harlan: add revertrisk model to the list of models (13 comments) [extensions/ORES] - https://gerrit.wikimedia.org/r/971547 (https://phabricator.wikimedia.org/T348298) (owner: Ilias Sarantopoulos)
[20:43:58] (PS20) Kosta Harlan: add revertrisk model to the list of models [extensions/ORES] - https://gerrit.wikimedia.org/r/971547 (https://phabricator.wikimedia.org/T348298) (owner: Ilias Sarantopoulos)
[20:51:56] (CR) Kosta Harlan: "When I run the maintenance script to populate the DB with revert risk scores (`php maintenance/run.php ./extensions/ORES/maintenance/Popul" [extensions/ORES] - https://gerrit.wikimedia.org/r/971547 (https://phabricator.wikimedia.org/T348298) (owner: Ilias Sarantopoulos)
[20:54:14] (CR) Kosta Harlan: add revertrisk model to the list of models (1 comment) [extensions/ORES] - https://gerrit.wikimedia.org/r/971547 (https://phabricator.wikimedia.org/T348298) (owner: Ilias Sarantopoulos)
[21:02:16] (PS1) Kosta Harlan: LiftWingService: Simplify revertRiskLiftWingRequest invocation [extensions/ORES] - https://gerrit.wikimedia.org/r/980489
[21:09:06] (PS2) Kosta Harlan: LiftWingService: Simplify revertRiskLiftWingRequest invocation [extensions/ORES] - https://gerrit.wikimedia.org/r/980489
[21:15:01] (PS1) Kosta Harlan: LiftWingService: Extract API endpoint as a config value [extensions/ORES] - https://gerrit.wikimedia.org/r/980491
[21:15:37] (CR) Kosta Harlan: add revertrisk model to the list of models (1 comment) [extensions/ORES] - https://gerrit.wikimedia.org/r/971547 (https://phabricator.wikimedia.org/T348298) (owner: Ilias Sarantopoulos)
[21:16:53] (CR) CI reject: [V: -1] LiftWingService: Extract API endpoint as a config value [extensions/ORES] - https://gerrit.wikimedia.org/r/980491 (owner: Kosta Harlan)
[21:30:28] (PS21) Kosta Harlan: Add revertrisk model to the list of models [extensions/ORES] - https://gerrit.wikimedia.org/r/971547 (https://phabricator.wikimedia.org/T348298) (owner: Ilias Sarantopoulos)
[21:30:36] (CR) Kosta Harlan: [C: +2] Add revertrisk model to the list of models (1 comment) [extensions/ORES] - https://gerrit.wikimedia.org/r/971547 (https://phabricator.wikimedia.org/T348298) (owner: Ilias Sarantopoulos)
[21:34:48] (Merged) jenkins-bot: Add revertrisk model to the list of models [extensions/ORES] - https://gerrit.wikimedia.org/r/971547 (https://phabricator.wikimedia.org/T348298) (owner: Ilias Sarantopoulos)