[06:35:22] good morning :)
[06:48:18] Good morning
[06:52:27] morning!
[06:55:37] I’m wondering if I should focus on trying to migrate our bullseye models to bookworm, or leave it for later and now re-deploy models on staging to test the current pre-commit changes. What do you think folks?
[07:05:10] If there is no straightforward way to upgrade the python version now I'd just proceed to finalize the current patch with python 3.9 and then deploy both.
[07:06:04] I'm fine to also try to upgrade 1-2 images to bookworm to see if that would be straightforward, but it would be good to decouple it from the current deployment as we would be deploying too many changes and it would be more difficult to roll back etc
[07:08:13] (PS2) Bartosz Wójtowicz: ci: Bump Python target-version from 3.7 to 3.9. [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1148282 (https://phabricator.wikimedia.org/T393865)
[07:08:33] good morning
[07:09:16] wdyt? feel free to disagree, I don't have a strong opinion about this. I might be too conservative :P
[07:10:27] I think I agree that it makes sense to finish current pre-commit changes and test them with staging deployment and take the bullseye->bookworm migration next to make it more decoupled
[07:11:45] If we do the uv migration in the near-ish future, uv could also provide an easy way to install and use different python versions instead of relying on the built-in python. Unless there is a very good reason we rely on it?
[07:13:00] I don't think there is a specific reason. It is just because it is easy to go with the default version
[07:13:41] migration to uv is blocked at the moment as it is not supported by blubber. George made an attempt to add support here https://gitlab.wikimedia.org/repos/releng/blubber/-/merge_requests/119
[07:20:04] (CR) Nikerabbit: [C:+1] Make SearchRecommender inherit from BaseRecommender [research/recommendation-api] - https://gerrit.wikimedia.org/r/1148932 (owner: Sbisson)
[07:24:19] If adding uv support to blubber falls on our team and it's high-enough priority for us, I'd be happy to take over George's work and try to push it over the finish line. It seems there is already some good progress on the merge request
[07:27:33] ok! let's discuss this after you go through the process of deploying, running load tests, checking logs etc. as they are more essential for your onboarding
[07:29:07] sounds good!
[07:31:08] the last pre-commit patch, bumping the Python target version to 3.9, is ready for review: https://gerrit.wikimedia.org/r/c/machinelearning/liftwing/inference-services/+/1148282. All of the changes are essentially updating type hints. Note that CI fails due to running out of storage, because too many jobs run simultaneously. Re-running those jobs within jenkins is successful
[07:35:16] ack!
[07:44:25] hey folks, FYI I am depooling ml-serve1002 to be reimaged to bookworm+containerd
[07:44:42] ack! thank you!
[08:15:20] artificial-intelligence, Lift-Wing, Machine-Learning-Team, ORES, and 2 others: Developing the `algo-accountability` repository - https://phabricator.wikimedia.org/T290746#10846902 (kevinbazira) a: kevinbazira→None
[08:17:16] Machine-Learning-Team, ORES: Investigate tools that use ORES - https://phabricator.wikimedia.org/T330854#10846914 (isarantopoulos) a: isarantopoulos→None
[08:17:37] Machine-Learning-Team: Investigate ModelMesh architecture - https://phabricator.wikimedia.org/T330408#10846915 (isarantopoulos) a: isarantopoulos→None
[08:37:29] o/ klausman when you have some time could you take a look at this task and let us know what is needed to tackle it? https://phabricator.wikimedia.org/T394778
[08:41:09] Machine-Learning-Team: Build and push images to the docker registry from ml-lab - https://phabricator.wikimedia.org/T394778#10847045 (klausman) For this to work, Appropriate credentials need to be on ml-lab1002 (or 1001). The future proof way to do this would be to either apply the relevant Puppet role(s) to...
[08:41:18] done :)
[08:44:57] bartosz: The uv support in blubber was almost finished, but there was a bug in how uv builds the environment and we could not use the `uv pip install -r ...` option, so the tests on blubber were failing. I am not a big fan of uv but I can find some time to check the latest updates because maybe they fixed it on their end.
[08:47:31] but in general I prefer the classic pip/venv way over adding extra abstraction layers. I think our setup is pretty clean
[08:48:29] ml-serve1002 back in service
[08:50:14] o/ folks, whenever you have time please cast an eye over this and review it: https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1148803
[08:50:19] klausman: elukey: Luca, you had mentioned that we should follow up with other SREs to make sure this is done in a proper way and that other teams are aware. What does that process entail? klausman could you kick that off?
[08:50:54] I'm just trying to figure out how to proceed with this as it is important for the work that Kevin is doing. thank you both!
[09:00:47] georgekyz: I definitely like and enjoy the simplicity of classic venv/pip, but where I see a huge benefit of uv is in big repositories, especially ones containing multiple sub-projects like our inference-services. Currently, when we use 2 or more requirements files with pip, the build logs show dependency conflicts, which are not breaking yet but could become so. Also the speed difference is
[09:00:48] really huge in my experience once the requirements list grows.
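To make the "dependency conflicts across 2+ requirements files" point concrete, here is a minimal sketch that flags packages pinned to different versions in different files. It assumes each file uses only simple exact `name==version` pins (real files may also contain ranges, extras, markers or `-r` includes, which it ignores). The helper names and the sample file contents are hypothetical, not taken from inference-services.

```python
import re
from collections import defaultdict

# Matches simple exact pins such as "numpy==1.26.4"; ranges, extras,
# environment markers and "-r" includes are deliberately not handled.
PIN_RE = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)\s*==\s*([^\s;#]+)")


def normalize(name: str) -> str:
    """Normalize a package name as in PEP 503 (lowercase, runs of -_. become -)."""
    return re.sub(r"[-_.]+", "-", name).lower()


def parse_pins(text: str) -> dict[str, str]:
    """Return {normalized_name: version} for every exact pin in a requirements text."""
    pins: dict[str, str] = {}
    for line in text.splitlines():
        match = PIN_RE.match(line)
        if match:
            pins[normalize(match.group(1))] = match.group(2)
    return pins


def find_pin_conflicts(files: dict[str, str]) -> dict[str, dict[str, str]]:
    """Report packages that are pinned to different versions in different files.

    `files` maps a label (e.g. a file path) to that file's contents.
    """
    seen: dict[str, dict[str, str]] = defaultdict(dict)  # package -> {label: version}
    for label, text in files.items():
        for pkg, version in parse_pins(text).items():
            seen[pkg][label] = version
    return {
        pkg: versions
        for pkg, versions in seen.items()
        if len(set(versions.values())) > 1
    }


if __name__ == "__main__":
    # Hypothetical file contents, not taken from inference-services.
    shared = "numpy==1.26.4\nrequests==2.31.0\n"
    model = "numpy==2.0.1  # model needs a newer numpy\nrequests==2.31.0\n"
    print(find_pin_conflicts({"shared.txt": shared, "model.txt": model}))
```

A single lock file resolved over all sub-projects (what the uv workspace idea aims for) would avoid this class of disagreement by construction, whereas a check like this only detects it after the fact.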
[09:02:31] georgekyz: I also love uv’s workspace concept (https://docs.astral.sh/uv/concepts/projects/workspaces/), which lets you resolve dependencies and create a single lock file for all your sub-packages (in our case models) in the repository, which would ensure all models run with the same dependency versions.
[09:12:10] isaranto: I think the k8s SIG meeting next Tuesday would be a good place to bring this up. Luca, wdyt?
[09:16:09] Machine-Learning-Team, MediaWiki-extensions-ORES, Moderator-Tools-Team, Wikimedia-Extension-setup, and 2 others: Install ORES extension on idwiki - https://phabricator.wikimedia.org/T382171#10847239 (isarantopoulos) > We need to run it for them @Ladsgroup sorry I'm confused :D. Can I run this? Ot...
[09:31:11] bartosz: yeah I see your point, especially on the speed difference, indeed uv is super fast. Probably I am a little bit biased from struggling to build the blubber rules for integrating it :P We can discuss it in the future if you like.
[09:33:57] georgekyz: Sure, I would definitely love to have pair programming session(s) on the blubber merge request :D
[09:41:21] georgekyz: I've +2'd the NS patch and will deploy it in a minute
[09:41:37] klausman: Thank you sooo much!
[09:52:23] Machine-Learning-Team, Patch-For-Review: Deploy peacock/tone check model to production - https://phabricator.wikimedia.org/T394779#10847385 (gkyziridis)
[09:52:30] Machine-Learning-Team, Patch-For-Review: Deploy peacock/tone check model to production - https://phabricator.wikimedia.org/T394779#10847386 (gkyziridis)
[09:52:34] pushed to staging and confirmed presence of edit-check NS
[09:53:43] now also pushing to serve-codfw
[09:54:50] and done
[09:55:27] I'll let it soak a bit and if nothing asplodes, will push to eqiad. There are some unrelated admin_ng changes in there as well (mostly external-services stuff), so I want to make sure it's all good
[10:01:06] Machine-Learning-Team, MediaWiki-Recent-changes, Moderator-Tools-Team: Run analysis to retrieve thresholds for high impact wikis to deploy recent changes revert risk language agnostic filters to - https://phabricator.wikimedia.org/T392148#10847423 (gkyziridis) >>! In T392148#10845099, @Kgraessle wrot...
[10:13:41] klausman: thnx
[10:21:19] klausman: ack on the k8s-sig. that sounds reasonable
[10:52:42] Lift-Wing, Machine-Learning-Team: Host an OpenVINO model in LiftWing - https://phabricator.wikimedia.org/T395012 (santhosh) NEW
[11:38:01] Machine-Learning-Team, MediaWiki-extensions-ORES, Moderator-Tools-Team, Wikimedia-Extension-setup, and 2 others: Install ORES extension on idwiki - https://phabricator.wikimedia.org/T382171#10847718 (Ladsgroup) >>! In T382171#10847239, @isarantopoulos wrote: >> We need to run it for them > @Ladsg...
[11:41:32] (CR) Ilias Sarantopoulos: [C:+1] "LGTM! the changes are really simple but I'd prefer someone else also takes a quick look since there are many changes." [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1148282 (https://phabricator.wikimedia.org/T393865) (owner: Bartosz Wójtowicz)
[11:42:28] isaranto: thnx for the fast review, should I proceed to the deployment?
[11:42:41] Machine-Learning-Team, MediaWiki-extensions-ORES, Moderator-Tools-Team, Wikimedia-Extension-setup, and 2 others: Install ORES extension on idwiki - https://phabricator.wikimedia.org/T382171#10847731 (isarantopoulos) Thanks for clearing that out! <3
[11:43:43] georgekyz: I didn't see any issue. In any case there is no deployment at the moment so no risk at all. Let's gooooo
[11:43:58] alrightyy
[11:55:47] ```
[11:55:47] STDERR:
[11:55:47] Error: Failed to get release main in namespace default: exit status 1: W0522 11:50:02.270509 169367 loader.go:222] Config not found: /etc/kubernetes/edit-check-deploy-ml-staging-codfw.config
[11:55:47] W0522 11:50:02.270604 169367 loader.go:222] Config not found: /etc/kubernetes/edit-check-deploy-ml-staging-codfw.config
[11:55:47] Error: Kubernetes cluster unreachable: Get "http://localhost:8080/version": dial tcp [::1]:8080: connect: connection refused
[11:55:47] Error: plugin "diff" exited with error
[11:55:47] ```
[11:57:24] Does anyone know what this could be?
[11:58:01] Well, that file is not there, so at least that's the first step. Lemme dig up what creates it
[12:00:32] ah, the puppet-side entry in hieradata/common/profile/kubernetes/deployment_server.yaml is missing. I'll make a patch
[12:01:03] klausman: thank you
[12:05:10] Machine-Learning-Team: Compare performance of KServe huggingfaceserver with HuggingFace vs vLLM backend - https://phabricator.wikimedia.org/T395019 (kevinbazira) NEW
[12:08:17] Machine-Learning-Team: Compare performance of KServe huggingfaceserver with HuggingFace vs vLLM backend - https://phabricator.wikimedia.org/T395019#10847861 (kevinbazira)
[12:09:24] georgekyz: https://gerrit.wikimedia.org/r/c/operations/puppet/+/1149369?tab=checks
[12:11:11] Machine-Learning-Team, Add-Link, Growth-Team, Goal: Q4 24-25 Goal: Investigate Add-a-link model training and deployment - https://phabricator.wikimedia.org/T393474#10847863 (OKarakaya-WMF) Hello @Michael and @Urbanecm_WMF , I've created some questions/investigation items for myself, but I'd be...
[12:22:23] klausman: any luck with MinT credential for s3?
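The `Config not found: /etc/kubernetes/edit-check-deploy-ml-staging-codfw.config` error earlier came from a per-service kubeconfig that had not been provisioned yet (the fix was a missing entry in `hieradata/common/profile/kubernetes/deployment_server.yaml`). A minimal pre-flight sketch for catching this before running `helmfile`, assuming the `<service>-deploy-<environment>.config` naming inferred from that single error message (not a documented convention). `kubeconfig_path`, `missing_kubeconfigs` and every environment name other than `ml-staging-codfw` are hypothetical.

```python
from pathlib import Path

# Assumed naming scheme, inferred only from the error message in this log:
#   /etc/kubernetes/edit-check-deploy-ml-staging-codfw.config
KUBECONFIG_DIR = Path("/etc/kubernetes")


def kubeconfig_path(service: str, environment: str, base: Path = KUBECONFIG_DIR) -> Path:
    """Build the kubeconfig path a helmfile run for `service` in `environment` expects."""
    return base / f"{service}-deploy-{environment}.config"


def missing_kubeconfigs(
    service: str, environments: list[str], base: Path = KUBECONFIG_DIR
) -> list[Path]:
    """Return the expected kubeconfig files that are not present on this host."""
    paths = (kubeconfig_path(service, env, base) for env in environments)
    return [path for path in paths if not path.exists()]


if __name__ == "__main__":
    # Environment names other than ml-staging-codfw are assumptions.
    envs = ["ml-staging-codfw", "ml-serve-codfw", "ml-serve-eqiad"]
    for path in missing_kubeconfigs("edit-check", envs):
        print(f"missing {path}: check the deployment_server hieradata entry")
```

Running a check like this on the deployment server before the first `helmfile -e <env> sync` for a new namespace would surface the missing file directly, instead of the indirect "Kubernetes cluster unreachable" error that helm reports when it falls back to `localhost:8080`.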
[12:27:28] I am working on it, Puppet is really warping my brain
[12:28:10] and gerrit being down for maintenance doesn't help :)
[12:29:51] Machine-Learning-Team, MediaWiki-extensions-ORES, Moderator-Tools-Team, Wikimedia-Extension-setup, and 2 others: Install ORES extension on idwiki - https://phabricator.wikimedia.org/T382171#10847914 (Kgraessle) @Ladsgroup just letting you know that we plan on running the backfill job today after...
[12:35:20] georgekyz: the file is there now, so the deployment should work
[12:35:32] thnx so much
[12:52:11] Machine-Learning-Team: Compare performance of KServe huggingfaceserver with HuggingFace vs vLLM backend - https://phabricator.wikimedia.org/T395019#10847999 (kevinbazira) To accurately compare the performance of the KServe huggingfaceserver with the `huggingface` vs `vllm` backends, the ideal testing ground...
[12:54:06] I've run a comparison between the way we used to serve LLMs (huggingface backend) and the new way we plan to serve them (vllm). The new vllm image we built gives us a speedup of roughly 6x based on a specific test case I ran on `ml-lab1002`.
[12:54:07] previously the aya-expanse-8b model would return a response in ~5s, now it returns a response in ~0.8s. see details in: https://phabricator.wikimedia.org/T395019#10847999
[12:57:26] woah that sounds amazing Kevin! 🎉
[12:59:10] The `helmfile -e ml-staging-codfw sync` ran successfully but I cannot see any pods for edit-check on staging
[12:59:27] gkyziridis@deploy1003:/srv/deployment-charts/helmfile.d/ml-services/edit-check$ helmfile -e ml-staging-codfw sync
[12:59:27] skipping missing values file matching "/etc/helmfile-defaults/private/ml-serve_services/edit-check/ml-staging-codfw.yaml"
[12:59:27] Upgrading release=service-secrets, chart=wmf-stable/secrets, namespace=edit-check
[12:59:27] Release "service-secrets" has been upgraded. Happy Helming!
[12:59:27] NAME: service-secrets
[13:01:06] https://www.irccloud.com/pastebin/hNx0vmYf/
[13:23:34] taking a look
[13:26:20] Machine-Learning-Team, Add-Link, Growth-Team, Goal: Q4 24-25 Goal: Investigate Add-a-link model training and deployment - https://phabricator.wikimedia.org/T393474#10848114 (Michael) >>! In T393474#10847863, @OKarakaya-WMF wrote: > Hello @Michael and @Urbanecm_WMF , > > I've created some quest...
[13:36:24] georgekyz: missing bit in the Puppet private repo, fixing it
[13:36:45] klausman: sorry for piling more on you mate
[13:36:52] Machine-Learning-Team, MediaWiki-extensions-ORES, Moderator-Tools-Team, Wikimedia-Extension-setup, Wikimedia-Site-requests: Install ORES extension on idwiki - https://phabricator.wikimedia.org/T382171#10848167 (Ladsgroup) >>! In T382171#10847914, @Kgraessle wrote: > @Ladsgroup just letting yo...
[13:37:27] no worries!
[13:37:38] klausman: Thanks :)
[13:48:25] georgekyz: # kubectl get pods -n edit-check
[13:48:27] NAME READY STATUS RESTARTS AGE
[13:48:29] edit-check-predictor-00001-deployment-66ccd7698b-w74wr 0/3 Init:1/2 0 18s
[13:48:45] And now: 3/3 Running
[13:49:48] klausman: Thnx so much!
[13:49:52] yw
[13:50:58] kevinbazira: amazing!
[13:51:17] \o/
[14:00:58] Machine-Learning-Team, MediaWiki-extensions-ORES, Moderator-Tools-Team, Wikimedia-Extension-setup, and 2 others: Install ORES extension on idwiki - https://phabricator.wikimedia.org/T382171#10848331 (isarantopoulos)
[14:04:59] klausman: I think you probably need to do the same on eqiad:
[14:05:02] https://www.irccloud.com/pastebin/22WVQ2ag/
[14:05:07] oh, oops, yes
[14:12:28] georgekyz: done & done, pods running
[14:27:40] Machine-Learning-Team, Add-Link, Growth-Team, Goal: Q4 24-25 Goal: Investigate Add-a-link model training and deployment - https://phabricator.wikimedia.org/T393474#10848501 (OKarakaya-WMF) Hello @fkaelin , I've compiled the previous discussions here. We can use it as agenda items for our meetin...
[14:28:08] thanks
[14:29:06] you're welcome
[14:30:43] Machine-Learning-Team, MediaWiki-extensions-ORES, Moderator-Tools-Team, Wikimedia-Extension-setup, and 2 others: Install ORES extension on idwiki - https://phabricator.wikimedia.org/T382171#10848523 (isarantopoulos) We have deployed the extension on id.wikipedia.org 🎉 Congrats everyone! It is now...
[14:35:25] Machine-Learning-Team, Patch-For-Review: Deploy peacock/tone check model to production - https://phabricator.wikimedia.org/T394779#10848567 (gkyziridis)
[14:38:22] Lift-Wing, Machine-Learning-Team, ML-Governance: Investigate storing model metadata on Wikidata - https://phabricator.wikimedia.org/T286508#10848588 (Htriedman) a: Htriedman→None
[14:46:34] Machine-Learning-Team, Patch-For-Review: Deploy peacock/tone check model to production - https://phabricator.wikimedia.org/T394779#10848678 (gkyziridis) ==== Edit-Check Deployment Edit-check name space is created on staging/production. The model is deployed using the: [[ https://docker-registry.wikimed...
[14:53:39] georgekyz: awesome --^ could you also do the API GW change? I can also do it or just help if needed (talking about tomorrow in either case)
[14:54:44] (CR) Gkyziridis: [C:+1] "Indeed too many changes." [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1148282 (https://phabricator.wikimedia.org/T393865) (owner: Bartosz Wójtowicz)
[14:55:12] isaranto: I pushed a patch here: https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1149375
[14:55:34] I missed that, you're one step ahead!
[14:56:10] ok it is clear you don't need any help on that :D
[14:57:22] 😇
[15:00:50] isaranto: merged!
[15:00:58] thnx for the review
[15:23:53] * isaranto afk!
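The `kubectl get pods -n edit-check` snapshot earlier showed the predictor pod at `0/3 Init:1/2` before it reached `3/3 Running`. Below is a minimal sketch of reading the READY and STATUS columns from that default table output, assuming whitespace-separated columns under a header row. For real scripting, structured output such as `kubectl get pods -o json` is more robust than parsing the table. `parse_pods` and `pod_is_ready` are hypothetical helpers.

```python
def parse_pods(output: str) -> list[dict[str, str]]:
    """Parse default `kubectl get pods` text output into one dict per pod.

    Assumes whitespace-separated columns under a header row. Only NAME, READY
    and STATUS are reliable here: a RESTARTS value like "2 (5m ago)" contains
    spaces and would shift the later columns.
    """
    lines = [line for line in output.strip().splitlines() if line.strip()]
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


def pod_is_ready(pod: dict[str, str]) -> bool:
    """True when every container is ready (e.g. "3/3") and STATUS is Running."""
    ready, total = (int(part) for part in pod["READY"].split("/"))
    return pod["STATUS"] == "Running" and total > 0 and ready == total


if __name__ == "__main__":
    # The first snapshot pasted in the channel, while the pod was still initializing.
    snapshot = """\
NAME READY STATUS RESTARTS AGE
edit-check-predictor-00001-deployment-66ccd7698b-w74wr 0/3 Init:1/2 0 18s
"""
    for pod in parse_pods(snapshot):
        print(pod["NAME"], "ready" if pod_is_ready(pod) else "not ready")
```

For the pasted snapshot this reports the predictor as not ready, because only 0 of its 3 containers are ready and the pod is still in `Init:1/2`.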
[15:23:53] Machine-Learning-Team, ORES, ContentTranslation: Show source article quality at Special:ContentTranslation's "translations in progress", "suggestions" and "for later" lists - https://phabricator.wikimedia.org/T258149#10848958 (Nikerabbit) p: Triage→Medium
[15:54:50] Machine-Learning-Team, I18n, Moderator-Tools-Team (Kanban): Ensure all ORES i18n messages are available for idwiki - https://phabricator.wikimedia.org/T394455#10849168 (BAPerdana-WMF) All done. Good to go.
[15:55:20] Machine-Learning-Team, MediaWiki-extensions-ORES, Moderator-Tools-Team, Wikimedia-Extension-setup, and 2 others: Install ORES extension on idwiki - https://phabricator.wikimedia.org/T382171#10849170 (Kgraessle) >>! In T382171#10848523, @isarantopoulos wrote: > We have deployed the extension on id...
[16:04:49] Machine-Learning-Team, I18n, Moderator-Tools-Team (Kanban): Ensure all ORES i18n messages are available for idwiki - https://phabricator.wikimedia.org/T394455#10849218 (Kgraessle) Open→Resolved
[16:06:08] Machine-Learning-Team, MediaWiki-extensions-ORES, Moderator-Tools-Team (Kanban), MW-1.45-notes (1.45.0-wmf.2; 2025-05-20): PopulateDatabase errors out and stops processing revisions when any revertRiskLiftWingRequest request fails - https://phabricator.wikimedia.org/T375280#10849223 (Kgraessle...
[17:19:04] (PS1) Sbisson: Fully encode URLs for GET requests [research/recommendation-api] - https://gerrit.wikimedia.org/r/1149449 (https://phabricator.wikimedia.org/T395026)
[18:28:49] Machine-Learning-Team, ORES, Moderator-Tools-Team, PageTriage, and 2 others: ParserFunctionsTest::testIfexist failure by run of ORESFetchScoreJob in CI - https://phabricator.wikimedia.org/T395074 (Umherirrender) NEW