[05:00:13] (PS1) Kevin Bazira: llm: update fa2 and bnb packages [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1212358 (https://phabricator.wikimedia.org/T410906)
[07:03:14] (CR) Nik Gkountas: [C:+2] New endpoint to check if articles are part of a collection [research/recommendation-api] - https://gerrit.wikimedia.org/r/1211726 (https://phabricator.wikimedia.org/T408844) (owner: Sbisson)
[07:04:36] (Merged) jenkins-bot: New endpoint to check if articles are part of a collection [research/recommendation-api] - https://gerrit.wikimedia.org/r/1211726 (https://phabricator.wikimedia.org/T408844) (owner: Sbisson)
[08:22:28] (CR) Dpogorzelski: [C:+1] llm: update fa2 and bnb packages [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1212358 (https://phabricator.wikimedia.org/T410906) (owner: Kevin Bazira)
[08:24:48] (CR) Kevin Bazira: [C:+2] llm: update fa2 and bnb packages [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1212358 (https://phabricator.wikimedia.org/T410906) (owner: Kevin Bazira)
[08:29:23] (Merged) jenkins-bot: llm: update fa2 and bnb packages [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1212358 (https://phabricator.wikimedia.org/T410906) (owner: Kevin Bazira)
[08:35:27] (CR) Kevin Bazira: [C:+2] "recheck" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1212358 (https://phabricator.wikimedia.org/T410906) (owner: Kevin Bazira)
[08:55:03] dpogorzelski, klausman o/ reminder that ml-serve1001 needs some love :) https://phabricator.wikimedia.org/T411082
[09:15:01] yep! I'll reinstall it today
[09:15:20] (plus the puppet bits, of course)
[09:16:37] o/ CI postmerge job to publish llm model-server image is pending indefinitely: https://integration.wikimedia.org/zuul/#q=1212358
[09:16:37] I've contacted #wikimedia-releng
[09:19:17] elukey: qq: if I disable puppet manually on 1001, merge the patch and then reimage the machine, the disable-puppet-status will be wiped during reimage, right?
[09:21:04] (PS1) Bartosz Wójtowicz: revise-tone-task-generator: Fetch content only for matching topics. [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1212525 (https://phabricator.wikimedia.org/T408538)
[09:23:03] klausman: yep yep
[09:27:26] Machine-Learning-Team, DC-Ops, ops-eqiad, SRE, Patch-For-Review: Remove old GPUs from ml-serve1001 - https://phabricator.wikimedia.org/T411082#11415058 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by klausman@cumin1003 for host ml-serve1001.eqiad.wmnet with OS trixie
[09:41:39] (CR) AikoChou: [C:+1] "LGTM! Thanks for further optimisation. This would improve a lot! :)" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1212525 (https://phabricator.wikimedia.org/T408538) (owner: Bartosz Wójtowicz)
[09:50:20] (CR) Bartosz Wójtowicz: [C:+2] revise-tone-task-generator: Fetch content only for matching topics. [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1212525 (https://phabricator.wikimedia.org/T408538) (owner: Bartosz Wójtowicz)
[09:58:50] (Merged) jenkins-bot: revise-tone-task-generator: Fetch content only for matching topics. [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1212525 (https://phabricator.wikimedia.org/T408538) (owner: Bartosz Wójtowicz)
[10:04:58] Machine-Learning-Team, DC-Ops, ops-eqiad, SRE: Remove old GPUs from ml-serve1001 - https://phabricator.wikimedia.org/T411082#11415142 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by klausman@cumin1003 for host ml-serve1001.eqiad.wmnet with OS trixie completed: - ml-serve1001...
[10:05:17] ml-serve has been reimaged and uncordoned
[10:08:21] nice!
[10:08:31] Machine-Learning-Team, DC-Ops, ops-eqiad, SRE: Remove old GPUs from ml-serve1001 - https://phabricator.wikimedia.org/T411082#11415149 (klausman) Open→Resolved Machine has been reimaged and is back in the cluster, closing.
[10:35:21] CI job fixed, llm image built: https://gerrit.wikimedia.org/r/c/machinelearning/liftwing/inference-services/+/1212358?tab=checks
[11:13:05] Machine-Learning-Team, Goal, OKR-Work: Q2 FY2025-26 Goal: Deploy Add-a-link v2 models to production - https://phabricator.wikimedia.org/T408790#11415304 (OKarakaya-WMF) I've created a list of currently in use models. These models below got at least one suggestion accept or suggestion reject since 20...
[11:16:32] Machine-Learning-Team, Goal, OKR-Work: Q2 FY2025-26 Goal: Deploy Add-a-link v2 models to production - https://phabricator.wikimedia.org/T408790#11415325 (OKarakaya-WMF) - itwiki Looking into 17days periods: before: "2025-11-06" "2025-10-22" after: "2025-11-09" "2025-11-24" Now we get t-test is st...
[11:19:40] (PS1) Bartosz Wójtowicz: revise-tone-task-generator: Use BatchQuery to optimise Cass writes. [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1212555 (https://phabricator.wikimedia.org/T408538)
[12:00:36] Machine-Learning-Team, Essential-Work: Update Aya LLM model-server to run on LiftWing GPUs - https://phabricator.wikimedia.org/T410906#11415517 (kevinbazira) Finally, as shown below, the llm model-server using MI300X GPU in LiftWing production is able to serve the aya-expanse-8B model: ` $ kubectl get po...
[12:07:28] \o/ --^
[12:11:25] Machine-Learning-Team, Goal, OKR-Work: Q1 FY2025-26 Goal: Task generation engine for Revise Tone task - https://phabricator.wikimedia.org/T408341#11415553 (achou) **Weekly Report** Progress update on the hypothesis for the week, including if something has shipped: - We shipped the pilot wikis (en, f...
[12:33:59] Machine-Learning-Team, Discovery-Search (2025.10.20 - 2025.11.07): Initial task generation and ingestion to Cassandra and Search weight tags - https://phabricator.wikimedia.org/T408533#11415630 (achou)
[12:39:49] Machine-Learning-Team, Discovery-Search (2025.10.20 - 2025.11.07): Initial task generation and ingestion to Cassandra and Search weight tags - https://phabricator.wikimedia.org/T408533#11415670 (achou) >>! In T408533#11383565, @dcausse wrote: > [...] > it was designed to support large datasets but I susp...
[12:40:06] Machine-Learning-Team, Discovery-Search (2025.10.20 - 2025.11.07): Initial task generation and ingestion to Cassandra and Search weight tags - https://phabricator.wikimedia.org/T408533#11415671 (achou) Open→Resolved
[12:43:34] Machine-Learning-Team, Discovery-Search (2025.10.20 - 2025.11.07): Initial task generation and ingestion to Cassandra and Search weight tags - https://phabricator.wikimedia.org/T408533#11415686 (achou)
[12:57:01] kevinbazira: wow good job!! \o/
[13:14:03] Machine-Learning-Team: Build and push images to the docker registry from ml-lab - https://phabricator.wikimedia.org/T394778#11415852 (DPogorzelski-WMF) >>! In T394778#11412197, @kevinbazira wrote: >>>! In T394778#11412129, @DPogorzelski-WMF wrote: >> I would like to resume this discussion and take a practica...
[13:37:08] Machine-Learning-Team: Build and push images to the docker registry from ml-lab - https://phabricator.wikimedia.org/T394778#11415913 (kevinbazira) >>! In T394778#11415852, @DPogorzelski-WMF wrote: >>>! In T394778#11412197, @kevinbazira wrote: >>>>! In T394778#11412129, @DPogorzelski-WMF wrote: >>> I would li...
[13:43:45] Machine-Learning-Team, Patch-For-Review: Create a Revise Tone Task Generator in LiftWing - https://phabricator.wikimedia.org/T408538#11415929 (BWojtowicz-WMF) **Update** After some development time, the Revise Tone Task Generator service is happily running on LiftWing and is processing all edits on `enw...
[16:26:17] have a nice weekend folks :)
[21:45:22] (PS1) Sbisson: Support checking collection membership by language and titles [research/recommendation-api] - https://gerrit.wikimedia.org/r/1212678 (https://phabricator.wikimedia.org/T408845)