[07:02:02] good morning
[07:06:27] good morning :)
[07:06:33] good morning
[07:11:49] morning!!
[07:11:50] hi folks o/
[07:30:54] Machine-Learning-Team: Evaluate adding caching mechanism for article topic model to make data available at scale - https://phabricator.wikimedia.org/T401778#11109637 (BWojtowicz-WMF) We've had a discussion meeting yesterday with @Eevans, @AikoChou and @klausman, thank you all for attending! I'm sharing the...
[07:36:16] ^ I've shared some notes regarding caching from yesterday's discussion with Aiko, Tobias and Eric from the Data Persistence team
[07:39:34] Machine-Learning-Team, Editing-team (Tracking): Incorporate Tone-check Retraining Notebook in ml-pipelines - https://phabricator.wikimedia.org/T401007#11109666 (achou)
[08:51:07] Machine-Learning-Team: Evaluate adding caching mechanism for article topic model to make data available at scale - https://phabricator.wikimedia.org/T401778#11109846 (isarantopoulos) >>! In T401778#11109636, @BWojtowicz-WMF wrote: > We also have to think about invalidating cache entries for the deleted page...
[09:28:45] Machine-Learning-Team, Goal: Q1 FY2025-26 Goal: Scaling Add-a-link to more wikis via production (airflow) pipelines - https://phabricator.wikimedia.org/T398950#11109899 (OKarakaya-WMF) # Release plan - We create two new dags namely release-to-staging, release-to-prod. We can discuss if release-to-prod s...
[09:35:55] bartosz: thanks for the update!
[09:52:42] Machine-Learning-Team: Evaluate adding caching mechanism for article topic model to make data available at scale - https://phabricator.wikimedia.org/T401778#11110008 (achou) >>! In T401778#11109846, @isarantopoulos wrote: >>>! In T401778#11109636, @BWojtowicz-WMF wrote: > >> We also have to think about inva...
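The invalidation concern raised on T401778 can be illustrated with a minimal in-memory sketch. Everything in it is hypothetical: the `TopicCache` class, the `(wiki, page_id)` key, and the revision check are assumptions for illustration, not the design discussed with the Data Persistence team. It only shows one way to keep a deleted page from being served out of a cache: drop its entry when a delete signal arrives.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical key layout: (wiki database name, page id).
CacheKey = Tuple[str, int]


@dataclass
class TopicCache:
    """Hypothetical in-memory stand-in for a shared topic-prediction cache."""

    # Maps (wiki, page_id) -> (revision id the prediction was computed for, topics).
    entries: Dict[CacheKey, Tuple[int, List[str]]] = field(default_factory=dict)

    def get(self, wiki: str, page_id: int, rev_id: int) -> Optional[List[str]]:
        """Return cached topics only if they were computed for this exact revision."""
        hit = self.entries.get((wiki, page_id))
        if hit is not None and hit[0] == rev_id:
            return hit[1]
        return None  # miss or stale revision: the caller would run the model

    def put(self, wiki: str, page_id: int, rev_id: int, topics: List[str]) -> None:
        """Store (or overwrite) the prediction for the page's latest known revision."""
        self.entries[(wiki, page_id)] = (rev_id, topics)

    def on_page_delete(self, wiki: str, page_id: int) -> bool:
        """Invalidate a deleted page; returns True if an entry was actually removed."""
        return self.entries.pop((wiki, page_id), None) is not None
```

For example, after `put("enwiki", 42, 1001, ["example-topic"])`, a lookup for revision 1002 misses, and `on_page_delete("enwiki", 42)` removes the entry so later lookups miss as well. A shared store would need the same delete hook wired to whatever page-deletion signal is available.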
[10:26:56] Machine-Learning-Team, Goal: Q1 FY2025-26 Goal: Scaling Add-a-link to more wikis via production (airflow) pipelines - https://phabricator.wikimedia.org/T398950#11110134 (kevinbazira) Thank you for working on this release plan @OKarakaya-WMF. It LGTM. Could we add a step to ensure the model is only relea...
[10:35:36] Machine-Learning-Team, Goal: Q1 FY2025-26 Goal: Scaling Add-a-link to more wikis via production (airflow) pipelines - https://phabricator.wikimedia.org/T398950#11110175 (OKarakaya-WMF)
[10:36:22] Machine-Learning-Team, Goal: Q1 FY2025-26 Goal: Scaling Add-a-link to more wikis via production (airflow) pipelines - https://phabricator.wikimedia.org/T398950#11110177 (OKarakaya-WMF) thank you @kevinbazira, I've updated the description.
[10:54:01] (PS1) AikoChou: locust: update test result for readability model [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1181109 (https://phabricator.wikimedia.org/T400352)
[10:55:42] Machine-Learning-Team, Essential-Work, Patch-For-Review: Upgrade readability model servers from debian bullseye to bookworm - https://phabricator.wikimedia.org/T400352#11110256 (achou) Here are the load test results: * Staging (bookworm) ` Load test results are within the threshold [2025-08-22 10:50...
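The review thread on the readability load test later compares the previous and new runs (26 vs 20 requests; 0.22 vs 0.17 rps). A quick sanity check, assuming the reported rps is total requests divided by the run's wall-clock duration (an assumption about how the summary is computed): inverting that formula gives roughly 118 s for both runs, which suggests the runs lasted about the same time and the lower request count reflects lower throughput rather than a shorter test. With only about 20 samples per run, a few slow requests can shift these numbers noticeably. The helper names below are hypothetical.

```python
def throughput_rps(num_requests: int, duration_s: float) -> float:
    """Average requests per second over a run: total requests / wall-clock seconds."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    return num_requests / duration_s


def implied_duration_s(num_requests: int, rps: float) -> float:
    """Invert the throughput formula to estimate how long a run lasted."""
    if rps <= 0:
        raise ValueError("rps must be positive")
    return num_requests / rps


def relative_change(old: float, new: float) -> float:
    """Fractional change from old to new (negative means a drop)."""
    return (new - old) / old


# Figures quoted in the review thread (previous run vs new bookworm run).
previous = implied_duration_s(26, 0.22)  # about 118.2 s
new = implied_duration_s(20, 0.17)       # about 117.6 s
drop = relative_change(0.22, 0.17)       # about -0.227, i.e. ~23% lower rps
```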
[11:00:07] here are two patches that need review
[11:00:10] https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1180098
[11:00:20] https://gerrit.wikimedia.org/r/c/machinelearning/liftwing/inference-services/+/1181109
[11:30:55] (CR) Kevin Bazira: [C:+1] locust: update test result for readability model [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1181109 (https://phabricator.wikimedia.org/T400352) (owner: AikoChou)
[11:32:52] (CR) Bartosz Wójtowicz: "I wonder if we should make our load tests send more requests - in the new results we sent only 20 requests (vs 26 in the previous stat" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1181109 (https://phabricator.wikimedia.org/T400352) (owner: AikoChou)
[11:34:12] aiko: o/ +1 on the locust results
[11:34:12] regarding the readability image, hope this will be tested on staging too!
[12:44:13] I've added all wikis to Airflow for add-a-link training. @kevinbazira can you take a look at the MR? https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/1627
[12:44:51] ozge_: thank you for working on this.
[12:44:56] looking ...
[13:25:06] (CR) AikoChou: "Yeah, compared to the previous stats, the new results are slightly worse (26 vs 20 requests; 0.22 rps vs 0.17 rps). I think in the previou" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1181109 (https://phabricator.wikimedia.org/T400352) (owner: AikoChou)
[13:33:21] kevinbazira: o/ thanks for the review! The readability image has already been updated on staging; it was in another patch, https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1180082, which has been merged.
And the locust test is for the new deployment on staging
[13:44:51] ack
[13:54:30] ozge_: the add-a-link airflow pipeline is failing because blunderbuss has not downloaded the required artifact: hdfs:/wmf/cache/artifacts/airflow/ml/add_a_link-0.1.1-v0.0.3.conda.tgz
[13:54:30] I checked HDFS and it's missing:
[13:54:30] ```
[13:54:30] kevinbazira@stat1008:~$ hdfs dfs -ls /wmf/cache/artifacts/airflow/ml/
[13:54:30] ```
[13:54:31] tentatively we might have to manually upload the artifact into a personal dir as I did in: https://phabricator.wikimedia.org/T400902#11100246
[14:32:45] yes, the thing is we don't have permissions to write there anymore. I'll check with balthazar
[14:40:43] Machine-Learning-Team, Data-Platform-SRE (2025.08.16 - 2025.09.05), Patch-For-Review: Create an analytics service user for the ML team - https://phabricator.wikimedia.org/T400902#11110831 (OKarakaya-WMF) hi @brouberol , to use conda artifacts in airflow, we manually copy artifacts in gitlab to `hdf...
[15:00:01] hey @kevinbazira, I've switched to a personal folder and changed the email for now. Can you take a look? https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/1629
[15:03:11] actually anyone can approve.
[15:13:27] this just reminded me that I need to request access to that repo
[15:17:35] Going afk folks -- have a great weekend!
[15:17:50] let me see. What is your GitLab username?
[15:19:27] you should have permissions now @isaranto
[15:19:51] Thanks!
[15:29:58] Machine-Learning-Team, Data-Platform-SRE (2025.08.16 - 2025.09.05), Patch-For-Review: Create an analytics service user for the ML team - https://phabricator.wikimedia.org/T400902#11110998 (OKarakaya-WMF) ohh I guess we will have this permission after this patch is merged: https://gerrit.wikimedia.org...
[15:31:19] ozge_: lgtm!
please remember to add all 3 artifacts that were in the ml cache into `/tmp/ozge/artifacts/airflow/ml`
[15:31:20] i.e. `wmf-sparksqlclidriver-1.0.0.jar`, `hdfs-tools-0.0.6-shaded.jar`, and `add_a_link-0.1.1-v0.0.3.conda.tgz`
[15:32:04] 🙌 thank you!
[15:40:08] ozge_ georgekyz: Sorry, I was swamped all day by a ceph/kubernetes issue affecting the dumps. It's solved now. I'll have a look at implementing the PVC for the model training DAG as well as the HDFS permissions on Monday
[15:40:45] in the meantime ozge_, reach out to Aleksandar on Slack. He wrote blunderbuss, so he might know why the artifact didn't sync
[16:12:07] hi @brouberol. The feature was a work in progress when we last discussed it. I'll check again next week. Thank you, everyone, for helping with this, and have a great weekend.
[16:20:23] brouberol: No problem, let me know when you are ready. Thank you for taking that one on
[16:20:51] Enjoy your weekend all.
[16:26:49] Thanks!
[17:46:40] Machine-Learning-Team, Data-Platform-SRE (2025.08.16 - 2025.09.05), Patch-For-Review: Create an analytics service user for the ML team - https://phabricator.wikimedia.org/T400902#11111359 (xcollazo) >>! In T400902#11110831, @OKarakaya-WMF wrote: > hi @brouberol , > > to use conda artifacts in airfl...
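Before re-triggering the DAG, a check that all three artifacts listed above actually landed in the personal directory could be scripted. This is a minimal sketch. It assumes the `hdfs` CLI is on PATH, as on the stat hosts, and that `hdfs dfs -test -e` exits with status 0 for an existing path. The helper names are hypothetical. The existence check is injectable, so the logic can be exercised without a cluster.

```python
import subprocess
from typing import Callable, Iterable, List

# Directory and artifact names as given in the chat; the helpers below are hypothetical.
ARTIFACT_DIR = "/tmp/ozge/artifacts/airflow/ml"
REQUIRED_ARTIFACTS = [
    "wmf-sparksqlclidriver-1.0.0.jar",
    "hdfs-tools-0.0.6-shaded.jar",
    "add_a_link-0.1.1-v0.0.3.conda.tgz",
]


def hdfs_path_exists(path: str) -> bool:
    """True if `hdfs dfs -test -e <path>` exits with status 0 (path exists)."""
    result = subprocess.run(
        ["hdfs", "dfs", "-test", "-e", path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        check=False,
    )
    return result.returncode == 0


def missing_artifacts(
    directory: str,
    names: Iterable[str],
    exists: Callable[[str], bool] = hdfs_path_exists,
) -> List[str]:
    """Return the full paths of required artifacts that are not present."""
    base = directory.rstrip("/")
    paths = [f"{base}/{name}" for name in names]
    return [path for path in paths if not exists(path)]
```

On a stat host, `missing_artifacts(ARTIFACT_DIR, REQUIRED_ARTIFACTS)` should return an empty list once all three files have been copied. Any path it returns still needs to be uploaded.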