[05:52:01] Machine-Learning-Team, Essential-Work: Add ROCm build dependencies to wmf-debian-vllm image to support AITER kernel compilation - https://phabricator.wikimedia.org/T419650#11705161 (kevinbazira) Open→Resolved The wmf-debian-vllm image that includes ROCm build dependencies for AITER kernel com...
[06:40:49] Machine-Learning-Team, Semantic Search, OKR-Work: Migrate embeddings inference service from HF Transformers+CK FlashAttention to vLLM+AITER - https://phabricator.wikimedia.org/T418976#11705174 (kevinbazira) Thank you for running the load tests @OKarakaya-WMF. Below is a consolidated report of the opt...
[06:41:45] Machine-Learning-Team, Semantic Search, OKR-Work: Migrate embeddings inference service from HF Transformers+CK FlashAttention to vLLM+AITER - https://phabricator.wikimedia.org/T418976#11705175 (kevinbazira) Open→Resolved a:kevinbazira
[07:55:18] Machine-Learning-Team, Patch-For-Review: kserve helm status is broken across ml clusters - https://phabricator.wikimedia.org/T419040#11705307 (elukey) @DPogorzelski-WMF my idea was to follow what upstream did, namely remove caBundle occurrences in the CRD itself and then re-deploy. I think it is possibly...
[07:57:31] Machine-Learning-Team: Reduce logstash logs from machine learning infra - https://phabricator.wikimedia.org/T416384#11705311 (elukey) Kserve's logs look really better now: {F72829296}
[08:07:29] Machine-Learning-Team: Reduce logstash logs from machine learning infra - https://phabricator.wikimedia.org/T416384#11705342 (elukey) I think we are out of the woods, we have around ~20k/minute logs now mostly coming from the inference-service pods (kserve-container). I think that we could definitely improve...
[09:23:35] Machine-Learning-Team, Patch-For-Review: kserve helm status is broken across ml clusters - https://phabricator.wikimedia.org/T419040#11705551 (DPogorzelski-WMF) I can try again but as per screenshot above it's something i have tried and then reverted because it didn't have effect
[09:28:05] Machine-Learning-Team, Patch-For-Review: kserve helm status is broken across ml clusters - https://phabricator.wikimedia.org/T419040#11705561 (DPogorzelski-WMF) Sorry captured the wrong change there, but pretty sure did test on the side with removing the whole entry, can try again though
[09:37:44] Machine-Learning-Team, Patch-For-Review: kserve helm status is broken across ml clusters - https://phabricator.wikimedia.org/T419040#11705568 (elukey) So far in staging it looks good: ` 20 Thu Mar 5 13:57:23 2026 failed kserve-0.2.9 0.11.2 Rollback "kserve" fail...
[09:40:37] Machine-Learning-Team, Patch-For-Review: kserve helm status is broken across ml clusters - https://phabricator.wikimedia.org/T419040#11705573 (DPogorzelski-WMF) Awesome! Then I must have done something wrong
[09:43:52] Machine-Learning-Team, Patch-For-Review: kserve helm status is broken across ml clusters - https://phabricator.wikimedia.org/T419040#11705574 (elukey) I think the missing bit was to bump the chart's version, that must be it.
[14:37:04] Machine-Learning-Team, Goal, OKR-Work: Q1 FY2025-26 Goal: Make article topic data available at scale and within SLOs for Year in Review - https://phabricator.wikimedia.org/T392833#11706981 (BWojtowicz-WMF) **Weekly Update** 1. We've opened discussion regarding gRPC/HTTP choice for protocol between h...
[15:42:32] Machine-Learning-Team: Reduce logstash logs from machine learning infra - https://phabricator.wikimedia.org/T416384#11707359 (colewhite) >>! In T416384#11705342, @elukey wrote: > I think we are out of the woods, we have around ~20k/minute logs now mostly coming from the inference-service pods (kserve-contain...
[17:14:16] (CR) Dillon: [C:+2] "LGTM, thanks!" [extensions/ORES] - https://gerrit.wikimedia.org/r/1248799 (https://phabricator.wikimedia.org/T407552) (owner: Kgraessle)
[17:26:07] (Merged) jenkins-bot: Expose the revert risk language agnostic prediction boolean via the RecentChanges API [extensions/ORES] - https://gerrit.wikimedia.org/r/1248799 (https://phabricator.wikimedia.org/T407552) (owner: Kgraessle)
[17:45:44] FIRING: LiftWingServiceErrorRate: LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://grafana.wikimedia.org/d/G7yj84Vnk/istio?orgId=1&refresh=30s&var-cluster=eqiad%20prometheus/k8s-mlserve&var-namespace=edit-check&var-backend=edit-check-predictor.%2A - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[18:23:02] Machine-Learning-Team, Add-Link-Structured-Task, Community Feedback (Growth), Growth-Team: AI/ML model update request: Named Entity Recognition for Add-a-Link - https://phabricator.wikimedia.org/T405185#11708176 (KStoller-WMF) Note that [[ https://en.wikipedia.org/wiki/User_talk:Electronmore#c-Sc...
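The LiftWingServiceErrorRate alert above fires on a "high rate of non 2/3/400 error code responses". Reading that name literally (an assumption; the actual Prometheus rule and its threshold are not shown in this log), a response counts as an error when its status is outside 2xx, outside 3xx, and not exactly 400. The sketch below computes that fraction over a window of status codes. The function names and the sample window are hypothetical and for illustration only.

```python
def is_alerting_status(code: int) -> bool:
    """Assumed reading of 'non 2/3/400': anything that is not 2xx, 3xx or 400."""
    if 200 <= code < 400:  # 2xx and 3xx are treated as healthy
        return False
    if code == 400:        # 400 (bad request) is excluded, per the alert name
        return False
    return True            # e.g. 404, 500, 503 count as errors


def error_rate(status_codes: list[int]) -> float:
    """Fraction of responses counted as errors; 0.0 for an empty window."""
    if not status_codes:
        return 0.0
    errors = sum(1 for c in status_codes if is_alerting_status(c))
    return errors / len(status_codes)


# Hypothetical window of responses from an edit-check predictor
window = [200, 200, 400, 503, 200, 404, 500, 301]
print(f"{error_rate(window):.3f}")  # 3 of 8 counted as errors -> 0.375
```

Under this reading, client mistakes reported as 400 do not page anyone, while other 4xx codes and all 5xx codes do.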
[20:55:03] Machine-Learning-Team, ORES, AbuseFilter, AntiSpoof, and 9 others: Drop extensions adding database tables which are unused on closed wikis - https://phabricator.wikimedia.org/T420052#11708806 (Dreamy_Jazz)
[21:00:44] RESOLVED: LiftWingServiceErrorRate: LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://grafana.wikimedia.org/d/G7yj84Vnk/istio?orgId=1&refresh=30s&var-cluster=eqiad%20prometheus/k8s-mlserve&var-namespace=edit-check&var-backend=edit-check-predictor.%2A - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[21:44:43] Machine-Learning-Team, ORES, Abstract Wikipedia team, AbuseFilter, and 10 others: Drop extensions adding database tables which are unused on closed wikis - https://phabricator.wikimedia.org/T420052#11708949 (A_smart_kitten) >>!**from the task description** > Closed wikis are unlikely to be re-ope...
[21:46:11] Machine-Learning-Team, OKR-Work, Patch-For-Review: Deploy gpt-oss-safeguard-20b on LiftWing - https://phabricator.wikimedia.org/T418350#11708954 (ldelench_wmf)
[21:54:18] Machine-Learning-Team, ORES, Abstract Wikipedia team, AbuseFilter, and 10 others: Drop extensions adding database tables which are unused on closed wikis - https://phabricator.wikimedia.org/T420052#11708969 (Dreamy_Jazz) >>! In T420052#11708949, @A_smart_kitten wrote: >>>!**from the task descript...
[22:06:20] Machine-Learning-Team, ORES, Abstract Wikipedia team, AbuseFilter, and 10 others: Drop extensions adding database tables which are unused on closed wikis - https://phabricator.wikimedia.org/T420052#11709013 (Pppery) I'd keep AbuseFilter, just because I think dropping historical data is scary and...
[22:20:28] Machine-Learning-Team, ORES, Abstract Wikipedia team, AbuseFilter, and 10 others: Drop extensions adding database tables which are unused on closed wikis - https://phabricator.wikimedia.org/T420052#11709051 (Dreamy_Jazz) >>! In T420052#11709013, @Pppery wrote: > I'd keep AbuseFilter, just because...
[22:22:20] Machine-Learning-Team, ORES, Abstract Wikipedia team, AbuseFilter, and 10 others: Drop extensions adding database tables which are unused on closed wikis - https://phabricator.wikimedia.org/T420052#11709054 (Pppery) If the AbuseFilter db tables are all empty then feel free to drop them.
[22:22:37] Machine-Learning-Team, ORES, Abstract Wikipedia team, AbuseFilter, and 10 others: Drop extensions adding database tables which are unused on closed wikis - https://phabricator.wikimedia.org/T420052#11709055 (A_smart_kitten) >>! In T420052#11708969, @Dreamy_Jazz wrote [quoted in a mixed order]: >...
[22:46:30] Machine-Learning-Team, ORES, Abstract Wikipedia team, AbuseFilter, and 10 others: Drop extensions adding database tables which are unused on closed wikis - https://phabricator.wikimedia.org/T420052#11709096 (Dreamy_Jazz) So, for #abusefilter we have 113 closed wikis where they have no rows in `ab...
[22:47:41] Machine-Learning-Team, ORES, Abstract Wikipedia team, AbuseFilter, and 10 others: Drop extensions where database tables have no rows on closed wikis - https://phabricator.wikimedia.org/T420052#11709099 (Dreamy_Jazz)