[02:50:09] (CR) Scardenasmolinar: [C:+1] Fix RecentChanges straight join [extensions/ORES] - https://gerrit.wikimedia.org/r/1196539 (owner: Tim Starling)
[06:11:51] good morning
[07:26:12] good morning
[07:38:33] Machine-Learning-Team, Essential-Work: Merge tone-check pipeline DAGs into a single DAG for simplified orchestration - https://phabricator.wikimedia.org/T407212#11280133 (kevinbazira) Following T407212#11275074, I ran the tone-check training job locally with model-ready training data to determine memory...
[08:18:15] (PS1) Ozge: feat: upgrades article quality buildkit 1.x [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1196622
[08:26:02] (PS2) Ozge: feat: upgrades article quality buildkit 1.x [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1196622 (https://phabricator.wikimedia.org/T400446)
[08:29:03] (CR) Ozge: [C:+2] feat: upgrades article quality buildkit 1.x [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1196622 (https://phabricator.wikimedia.org/T400446) (owner: Ozge)
[08:45:56] Hello,
[08:45:56] I have a small patch to upgrade the blubber version.
[08:45:56] https://gerrit.wikimedia.org/r/c/machinelearning/liftwing/inference-services/+/1196622
[08:45:56] Interesting that no additional changes are required although this is a major version upgrade.
[08:45:56] I see it was already using some version 1.x annotations e.g. use-system-site-packages
[08:45:56] https://doc.wikimedia.org/releng/blubber/examples/06-python-builder.html
[08:45:57] I've tested it with local docker but I'd like to deploy it to staging.
[08:45:58] Previously, adding PipelineBot was creating a docker image that we can deploy to staging.
[08:45:58] Do you know if this is still possible without merging the patch?
[08:45:59] @georgekyz
[09:39:58] (CR) Gkyziridis: [C:+1] "Thank you for working on that one!" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1196622 (https://phabricator.wikimedia.org/T400446) (owner: Ozge)
[09:40:20] ozge_: Hey mate, I +1 the patch go ahead and merge it
[09:40:41] the post-merge process will generate the new image and you can deploy it on staging then
[09:54:54] (CR) Ladsgroup: [C:+2] Fix RecentChanges straight join [extensions/ORES] - https://gerrit.wikimedia.org/r/1196539 (owner: Tim Starling)
[10:13:50] Machine-Learning-Team: Export retrained Tone-check model to an S3 bucket - https://phabricator.wikimedia.org/T406217#11280641 (gkyziridis)
[10:32:47] 🙌
[10:33:06] (CR) Ozge: [C:+2] feat: upgrades article quality buildkit 1.x [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1196622 (https://phabricator.wikimedia.org/T400446) (owner: Ozge)
[10:38:41] Machine-Learning-Team, Essential-Work, Patch-For-Review: Update blubber version in inference services images - https://phabricator.wikimedia.org/T400446#11280701 (OKarakaya-WMF)
[10:40:07] Machine-Learning-Team, Essential-Work, Patch-For-Review: Update blubber version in inference services images - https://phabricator.wikimedia.org/T400446#11280703 (OKarakaya-WMF)
[10:45:50] https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1196650 cool, I've created another patch for staging deployment. Let's see how it will go and then we can proceed with the prod release @georgekyz
[10:55:43] (CR) CI reject: [V:-1] Fix RecentChanges straight join [extensions/ORES] - https://gerrit.wikimedia.org/r/1196539 (owner: Tim Starling)
[11:00:15] (CR) Ladsgroup: [C:+2] "again" [extensions/ORES] - https://gerrit.wikimedia.org/r/1196539 (owner: Tim Starling)
[11:17:16] ozge_: +1, you can merge it and deploy it. Ping me for any support
[11:58:28] Machine-Learning-Team, Essential-Work, Patch-For-Review: Update blubber version in inference services images - https://phabricator.wikimedia.org/T400446#11280923 (OKarakaya-WMF) articlequality deployed to staging successfully: ` ozge@deploy2002:/srv/deployment-charts/helmfile.d/ml-services/experime...
[11:59:22] Machine-Learning-Team, Essential-Work, Patch-For-Review: Update blubber version in inference services images - https://phabricator.wikimedia.org/T400446#11280925 (OKarakaya-WMF)
[11:59:33] Machine-Learning-Team, Essential-Work, Patch-For-Review: Update blubber version in inference services images - https://phabricator.wikimedia.org/T400446#11280927 (OKarakaya-WMF)
[12:00:34] deployed to staging and tested. looks all good 🎉
[12:01:06] (Merged) jenkins-bot: Fix RecentChanges straight join [extensions/ORES] - https://gerrit.wikimedia.org/r/1196539 (owner: Tim Starling)
[12:26:24] ozge_: Nicely Done!
[12:52:21] Machine-Learning-Team, Research-engineering, Research (FY2025-26-Research-October-December): Share code between Research & ML teams - https://phabricator.wikimedia.org/T398974#11281073 (Miriam)
[12:59:04] Machine-Learning-Team, Research-engineering, Research (FY2025-26-Research-October-December): Share code between Research & ML teams - https://phabricator.wikimedia.org/T398974#11281080 (Miriam)
[13:06:36] kevinbazira: I approved the MR for tone-check lets merge it and test it.
[13:07:23] kevinbazira: Whenever you find some time please leave your thoughts on https://phabricator.wikimedia.org/T406217 (if you have any)
[13:08:54] thanks, George. I've merged the MR and going to run it in prod! 🤞
[13:09:53] ack... I'll have a look at the task
[13:14:32] 🤞
[14:55:06] Machine-Learning-Team, Research-engineering, Research (FY2025-26-Research-October-December): Share code between Research & ML teams - https://phabricator.wikimedia.org/T398974#11281584 (fkaelin)
[14:59:56] Machine-Learning-Team: Experiment with amd-smi and the new AMD GPUs MI300x - https://phabricator.wikimedia.org/T403697#11281610 (elukey) After a chat with the AMD folks, it seems that amd-smi supports also the DPX partitioning for compute: ` elukey@ml-serve1012:/opt/rocm$ sudo /opt/rocm/bin/amd-smi set -C D...
[16:06:48] Machine-Learning-Team: Experiment with amd-smi and the new AMD GPUs MI300x - https://phabricator.wikimedia.org/T403697#11281822 (elukey) Really interesting: ` NAME STATUS ROLES AGE VERSION LABELS ml-serve1012.eqiad.wmnet Ready,SchedulingDisabled ...
[16:08:02] so we can split the MI300X into two afaics!
[16:08:57] I need to figure out how the compute partitions will reflect to the memory ones in this config, but so far it seems really nice
[17:01:51] whoaaa that's great!! \o/
[17:57:48] Machine-Learning-Team, Goal, Patch-For-Review: Q1 FY2025-26 Goal: Scaling Add-a-link to more wikis via production (airflow) pipelines - https://phabricator.wikimedia.org/T398950#11282404 (KStoller-WMF)
[18:03:32] Machine-Learning-Team, Add-Link-Structured-Task, Growth-Team: Introduce case sensitivity to machine learning model for Add a Link - https://phabricator.wikimedia.org/T405185#11282422 (KStoller-WMF) Thanks, @OKarakaya-WMF! >My short term suggestion is to make anchors case-sensitive and train/evaluate...