[01:50:54] Machine-Learning-Team: Spark Job in airflow-devenv cannot access Hive Metastore because of Kerberos Authentication Failure - https://phabricator.wikimedia.org/T398907#10993909 (kevinbazira) >>! In T398907#10993364, @brouberol wrote: > I have merged a change into `airflow-dags` (see https://phabricator.wikime...
[06:23:17] good morning!
[06:43:49] Good morning
[07:06:18] Machine-Learning-Team, Data-Platform-SRE, Prod-Kubernetes, serviceops, and 2 others: Update kserve to v0.15.2* on ML clusters - https://phabricator.wikimedia.org/T380722#10994157 (isarantopoulos) Posting here also something that would be useful for us to try: We can use https://gerrit.wikimedia....
[09:52:06] Machine-Learning-Team, Essential-Work: Update knative's queue proxy image and the Swift/S3 accounts used on ml-serve clusters - https://phabricator.wikimedia.org/T398533#10994734 (isarantopoulos)
[09:52:07] Machine-Learning-Team, Goal: Q1 FY2025-26 Goal: Operational Excellence - LiftWing Platform Updates & Improvements - https://phabricator.wikimedia.org/T398948#10994733 (isarantopoulos)
[09:52:08] Machine-Learning-Team: ML Services causing log spam - https://phabricator.wikimedia.org/T393475#10994735 (isarantopoulos)
[09:52:11] Machine-Learning-Team, Data-Platform-SRE, Prod-Kubernetes, serviceops, and 2 others: Update kserve to v0.15.2* on ML clusters - https://phabricator.wikimedia.org/T380722#10994737 (isarantopoulos)
[09:52:17] Lift-Wing, Machine-Learning-Team, Patch-For-Review: Use rocm/vllm image on Lift Wing - https://phabricator.wikimedia.org/T385173#10994736 (isarantopoulos)
[10:15:39] Machine-Learning-Team, Data-Platform-SRE, Prod-Kubernetes, serviceops, and 3 others: Update knative-serving+net-istio to v1.12.x on ML clusters - https://phabricator.wikimedia.org/T380723#10994774 (isarantopoulos) The [[ https://kserve.github.io/website/0.15/admin/serverless/serverless/ | recomme...
[11:40:03] isaranto: o/
[11:40:26] do you know if ML manages the linkrecommendation service on Wikikube?
[11:40:43] it would need to be migrated to a more up to date os
[11:40:57] o/ no we don't
[11:41:19] afaik at least
[11:41:55] it is managed by growth
[11:42:04] yeah I was checking https://wikitech.wikimedia.org/wiki/SLO/linkrecommendation
[11:42:18] do you know who I can contact in that team?
[11:46:07] elukey: going by git log on helmfile.d/services/linkrecommendation, probably Sergio Gimeno and/or Martin Urbanec (the latter is right here in this channel)
[11:46:39] Machine-Learning-Team, Data-Platform-SRE, Prod-Kubernetes, serviceops, and 3 others: Update knative-serving+net-istio to v1.12.x on ML clusters - https://phabricator.wikimedia.org/T380723#10995074 (elukey) Istio 1.24 is already bundled with the 1.31 k8s upgrade, so that part should be ok :) We r...
[11:47:03] ack thanks I'll try, but I have the feeling that this will be handled eventually by ML :D
[11:47:05] Hello @Luca , Martin Urbanec and Michael Große from Growth team have helped me to understand link recommendation service in more detail. We can also consider upgrading it in scope of add-a-link next phase if it’s not urgent. We plan some updates on this service to be able to use the new models: https://phabricator.wikimedia.org/T393474#10963727
[11:48:06] ozge_: o/ - The service is currently running on Debian Buster that is EOL (so no security upgrades etc..) so in theory this should be semi-urgent
[11:48:41] but IIRC that is a python service and we'll bump the main Python version, so it may have some follow ups to do
[11:48:50] what is the timeline for the next phase?
[11:53:36] I think it will be this quarter but we can check priorities if it needs to be earlier. Otherwise, we can always ask some help from the Growth team. What do you think @isaranto ?
[11:55:23] Machine-Learning-Team, Add-Link, Growth-Team, Goal: FY2024-25 Q4 Goal: Investigate Add-a-link model training and deployment - https://phabricator.wikimedia.org/T393474#10995080 (OKarakaya-WMF) Sharing excalidraw for the add-a-link presentation. {F63902183}
[11:56:43] ozge_: if it is this quarter it shouldn't be a big problem! The target OS should be Bookworm, with py11
[11:57:06] elukey: we are working on this project in this quarter but our main focus is to provide new models to scale to new wikis. There is no concrete plan at the moment for a new architecture that would change the way that it is deployed (wikikube or not). So we can check on the amount of work required for the upgrade
[11:58:28] ozge_: could you use the new goal to provide updates instead? https://phabricator.wikimedia.org/T398950 I know I still need to fill in more information there!
[11:58:34] I'll ask for your help
[12:00:15] Awesome! Looking into it.
[12:01:11] Machine-Learning-Team, Goal: Q1 FY2025-26 Goal: Scaling Add-a-link to more wikis via production (airflow) pipelines - https://phabricator.wikimedia.org/T398950#10995092 (OKarakaya-WMF) Sharing excalidraw for the add-a-link presentation. {F63902958}
[12:06:41] thank you!
[12:33:12] Machine-Learning-Team, DC-Ops, ops-eqiad, SRE: Q4:rack/setup/install ml-serve101[2345] - https://phabricator.wikimedia.org/T393948#10995177 (Jclark-ctr)
[13:01:23] isaranto: ack understood. In theory the work required would be to instruct blubber to use a new image based on bookworm, and check that the service runs fine
[13:01:44] there may be some hiccups with python features not supported anymore etc..
[13:02:39] I suspect it will have several issues due to the python upgrade which will come together with updating the python packages which might cause compatibility issues with the models (hopefully not)
[14:13:20] Machine-Learning-Team: Enable isort in CI for inference-services repo - https://phabricator.wikimedia.org/T353281#10995520 (isarantopoulos) Open→Resolved a:isarantopoulos This has been implemented while updating the pre-commit hooks in {T393865}. import sorting is done using ruff now.
[15:28:09] Machine-Learning-Team, DC-Ops, ops-eqiad, SRE: Q4:rack/setup/install ml-serve101[2345] - https://phabricator.wikimedia.org/T393948#10995827 (elukey) @Jclark-ctr I have the feeling that we'll have to pause this work for a bit of time, I'll need to set some time off to figure out what's different a...
[16:57:59] have a nice weekend all!
[23:30:39] Machine-Learning-Team, EditCheck, VisualEditor, Editing-team (Tracking): Compile list of templates, jargon and policies relevant to NPOV - https://phabricator.wikimedia.org/T389445#10996861 (ppelberg) Open→Resolved I //think// we can consider work on this task complete. //Although//,...
[23:33:18] Machine-Learning-Team, EditCheck, VisualEditor, Chinese-Sites, Editing-team (Tracking): Prepare annotool for Tone Check model evaluation (v1) - https://phabricator.wikimedia.org/T392324#10996872 (ppelberg)
[23:35:03] Machine-Learning-Team, EditCheck, VisualEditor, Chinese-Sites, Editing-team (Tracking): Prepare annotool for Tone Check model evaluation - https://phabricator.wikimedia.org/T392324#10996875 (ppelberg)
[23:35:09] Machine-Learning-Team, EditCheck, VisualEditor, Chinese-Sites, Editing-team (Tracking): Prepare annotool for Tone Check model evaluation - https://phabricator.wikimedia.org/T392324#10996876 (ppelberg)
[23:35:58] Machine-Learning-Team, EditCheck, VisualEditor, Chinese-Sites, Editing-team (Tracking): Prepare annotool for Tone Check model evaluation - https://phabricator.wikimedia.org/T392324#10996879 (ppelberg) Annotool instances were prepared for the languages Tone Check (v1) will support. As such,...
[23:36:40] Machine-Learning-Team, EditCheck, VisualEditor, Chinese-Sites, Editing-team (Tracking): Prepare annotool for Tone Check model evaluation - https://phabricator.wikimedia.org/T392324#10996881 (ppelberg)