[06:55:53] good morning.
[07:27:37] good morning!
[08:42:26] morning! :)
[09:07:31] Machine-Learning-Team, Data-Persistence, Growth-Team, OKR-Work: Data Persistence Design Review: Improve Tone Suggested Edits newcomer task - https://phabricator.wikimedia.org/T401021#11059953 (Michael)
[09:21:18] Machine-Learning-Team, Data-Persistence, Growth-Team, Improve-Tone-Suggested-Edit, OKR-Work: Data Persistence Design Review: Improve Tone Suggested Edits newcomer task - https://phabricator.wikimedia.org/T401021#11060012 (Michael)
[09:59:32] Machine-Learning-Team: Error in revscoring-editquality-damaging - itwiki-damaging-predictor-default - https://phabricator.wikimedia.org/T401109#11060111 (OKarakaya-WMF) Thank you @Joe , When I set `x-request-id` in header, I don't see something related to maxRequestsPerConnection settings. But I think @el...
[11:18:47] hi @elukey could you help me [here](https://phabricator.wikimedia.org/T401109#11060110) to validate some settings as you've configured maxRequestsPerConnection before.
[11:36:54] Hey folks, there are some patches for review ready regarding the essential work (bookworm deployments on prod), please review when you have time:
[11:36:54] 1. https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1175477
[11:36:54] 2. https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1175478
[12:29:11] Machine-Learning-Team, Essential-Work, Patch-For-Review: Upgrade langid model server from debian bullseye to bookworm - https://phabricator.wikimedia.org/T400347#11060569 (gkyziridis) Open→Resolved
[12:36:27] Machine-Learning-Team, Essential-Work, Patch-For-Review: Upgrade articletopic-outlink model servers from debian bullseye to bookworm - https://phabricator.wikimedia.org/T400349#11060603 (gkyziridis) Open→Resolved
[12:37:44] FIRING: LiftWingServiceErrorRate: ...
[12:37:44] LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://grafana.wikimedia.org/d/G7yj84Vnk/istio?orgId=1&refresh=30s&var-cluster=eqiad%20prometheus/k8s-mlserve&var-namespace=revscoring-editquality-damaging&var-backend=itwiki-damaging-predictor-default.%2A - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[12:46:18] Folks thnx a lot for the reviews. This is my last patch for today :sweat_smile: regarding the bookworm prod deployments:
[12:46:18] https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1175888
[13:02:27] Machine-Learning-Team, Automoderator, Moderator-Tools-Team: Use multilingual revert risk model in Automoderator on supported wikis - https://phabricator.wikimedia.org/T365581#11060645 (Samwalton9-WMF) This is just waiting on T400727. Folks from es.wiki and uk.wiki have both expressed interest in swit...
[13:19:45] Machine-Learning-Team: Error in revscoring-editquality-damaging - itwiki-damaging-predictor-default - https://phabricator.wikimedia.org/T401109#11060694 (elukey) Yep in theory it should be applied! We should probably do more extensive tests, but it will require a bit of time and efforts. Before starting, do...
[13:29:29] Machine-Learning-Team: Error in revscoring-editquality-damaging - itwiki-damaging-predictor-default - https://phabricator.wikimedia.org/T401109#11060720 (OKarakaya-WMF) thank you @elukey We have 5M successful responses in the same period of time. ` https://logstash.wikimedia.org/app/dashboards#/view/7f88...
[13:57:52] Machine-Learning-Team, Goal: Q1 FY2025-26 Goal: Scaling Add-a-link to more wikis via production (airflow) pipelines - https://phabricator.wikimedia.org/T398950#11060880 (OKarakaya-WMF) follow-up task to have analytics-ml user: https://phabricator.wikimedia.org/T400902
[13:58:18] sorry I'll be ~5 mins late for the team meeting
[14:24:58] Machine-Learning-Team, Data-Platform-SRE (2025.07.26 - 2025.08.15), Patch-For-Review: Create an analytics service user for the ML team - https://phabricator.wikimedia.org/T400902#11060998 (brouberol) ` brouberol@krb1002:~$ sudo kadmin.local addprinc -randkey analytics-ml/airflow-ml.discovery.wmnet@WI...
[14:28:35] Machine-Learning-Team, Automoderator, Moderator-Tools-Team: Use multilingual revert risk model in Automoderator on supported wikis - https://phabricator.wikimedia.org/T365581#11061003 (Strainu) Awsome news, looking forward to the moment we can use Automoderator on rowiki as well.
[14:30:34] Machine-Learning-Team, Data-Platform-SRE (2025.07.26 - 2025.08.15), Patch-For-Review: Create an analytics service user for the ML team - https://phabricator.wikimedia.org/T400902#11061005 (brouberol) ` brouberol@an-launcher1002:~$ sudo kerberos-run-command hdfs hdfs dfs -chown -R analytics-ml:analyti...
[14:35:34] Machine-Learning-Team, Essential-Work: Upgrade revscoring model servers from debian bullseye to bookworm - https://phabricator.wikimedia.org/T400350#11061034 (OKarakaya-WMF) I've upgraded revscoring and I'm getting following error. I think we will need to re-train our models first with a newer version of...
[14:45:07] Machine-Learning-Team, Data-Platform-SRE (2025.07.26 - 2025.08.15), Patch-For-Review: Create an analytics service user for the ML team - https://phabricator.wikimedia.org/T400902#11061090 (brouberol) ` airflow@airflow-kerberos-6766cc659b-ns96p:/opt/airflow$ klist Ticket cache: FILE:/tmp/airflow_krb5_...
[14:47:26] Machine-Learning-Team, Data-Platform-SRE (2025.07.26 - 2025.08.15), Patch-For-Review: Create an analytics service user for the ML team - https://phabricator.wikimedia.org/T400902#11061096 (brouberol) ` brouberol@an-launcher1002:~$ sudo kerberos-run-command hdfs hdfs dfs -rm -r /wmf/cache/artifacts/ai...
[14:48:44] Machine-Learning-Team, Data-Platform-SRE (2025.07.26 - 2025.08.15), Patch-For-Review: Create an analytics service user for the ML team - https://phabricator.wikimedia.org/T400902#11061107 (brouberol) ` brouberol@stat1011:~$ sudo run-puppet-agent Info: Using environment 'production' Info: Retrieving p...
[14:49:07] Machine-Learning-Team, Data-Platform-SRE (2025.07.26 - 2025.08.15), Patch-For-Review: Create an analytics service user for the ML team - https://phabricator.wikimedia.org/T400902#11061109 (brouberol)
[14:49:16] Machine-Learning-Team, Data-Platform-SRE (2025.07.26 - 2025.08.15), Patch-For-Review: Create an analytics service user for the ML team - https://phabricator.wikimedia.org/T400902#11061112 (brouberol) In progress→Resolved
[15:02:12] Machine-Learning-Team, Essential-Work: Upgrade revscoring model servers from debian bullseye to bookworm - https://phabricator.wikimedia.org/T400350#11061170 (OKarakaya-WMF) a:BWojtowicz-WMF→OKarakaya-WMF
[15:02:25] Machine-Learning-Team, Essential-Work: Upgrade revscoring model servers from debian bullseye to bookworm - https://phabricator.wikimedia.org/T400350#11061172 (OKarakaya-WMF) ` (myenv_revscoring) ozge@wmf3658 inference-services % docker compose up damaging WARN[0000] The "DYLD_FALLBACK_LIBRARY_PATH" va...
[15:03:14] Machine-Learning-Team, Essential-Work, Patch-For-Review: Upgrade reability model servers from debian bullseye to bookworm - https://phabricator.wikimedia.org/T400352#11061176 (OKarakaya-WMF) a:BWojtowicz-WMF→OKarakaya-WMF
[15:03:47] Machine-Learning-Team, Essential-Work: Upgrade revscoring model servers from debian bullseye to bookworm - https://phabricator.wikimedia.org/T400350#11061190 (OKarakaya-WMF) Open→In progress
[16:37:59] FIRING: LiftWingServiceErrorRate: ...
[16:37:59] LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://grafana.wikimedia.org/d/G7yj84Vnk/istio?orgId=1&refresh=30s&var-cluster=eqiad%20prometheus/k8s-mlserve&var-namespace=revscoring-editquality-damaging&var-backend=itwiki-damaging-predictor-default.%2A - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[20:37:59] FIRING: LiftWingServiceErrorRate: ...
[20:37:59] LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://grafana.wikimedia.org/d/G7yj84Vnk/istio?orgId=1&refresh=30s&var-cluster=eqiad%20prometheus/k8s-mlserve&var-namespace=revscoring-editquality-damaging&var-backend=itwiki-damaging-predictor-default.%2A - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate