[05:04:43] Lift-Wing, Machine-Learning-Team, ml-model-requests, OKR-Work (WE1 FY2025-26): Increase batch size in edit-check service - https://phabricator.wikimedia.org/T419527#11695887 (ppelberg)
[05:05:17] Lift-Wing, Machine-Learning-Team, Editing-team (Tracking), ml-model-requests, OKR-Work (WE1 FY2025-26): Increase batch size in edit-check service - https://phabricator.wikimedia.org/T419527#11695890 (ppelberg)
[05:12:06] Machine-Learning-Team, Semantic Search, OKR-Work: Migrate embeddings inference service from HF Transformers+CK FlashAttention to vLLM+AITER - https://phabricator.wikimedia.org/T418976#11695895 (kevinbazira)
[05:53:42] Machine-Learning-Team, Essential-Work: Add ROCm build dependencies to wmf-debian-vllm image to support AITER kernel compilation - https://phabricator.wikimedia.org/T419650 (kevinbazira) NEW
[05:55:01] Machine-Learning-Team, Essential-Work: Add ROCm build dependencies to wmf-debian-vllm image to support AITER kernel compilation - https://phabricator.wikimedia.org/T419650#11695951 (kevinbazira)
[07:08:11] (CR) CI reject: [V:-1] build: Updating composer dependencies [extensions/ORES] - https://gerrit.wikimedia.org/r/1246497 (owner: Libraryupgrader)
[08:44:51] (PS1) Gkyziridis: edit-check: Increase batchsize. [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1250513
[08:57:01] (CR) Gkyziridis: [C:+2] edit-check: Increase batchsize. [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1250513 (owner: Gkyziridis)
[09:04:43] (Merged) jenkins-bot: edit-check: Increase batchsize. [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1250513 (owner: Gkyziridis)
[10:34:22] Machine-Learning-Team: Explore gpt-oss-safeguard-20b - https://phabricator.wikimedia.org/T417860#11696611 (BWojtowicz-WMF) Closing this Task as exploration phase is complete.
The key outcomes from this task: - Deployed and tested gpt-oss-safeguard-20b on ml-lab1002 with our custom vLLM 0.14 Docker image - G...
[10:34:44] Machine-Learning-Team: Explore gpt-oss-safeguard-20b - https://phabricator.wikimedia.org/T417860#11696613 (BWojtowicz-WMF) Open→Resolved
[11:16:37] Machine-Learning-Team, MediaWiki-extensions-ORES, MediaWiki-Recent-changes, Moderator-Tools-Team, and 2 others: ORES/LiftWing infrastructure is not working for filtering Recent Changes edits - https://phabricator.wikimedia.org/T418223#11696826 (achou)
[11:16:41] Machine-Learning-Team: Incident: 2026-02-23 ml-serve - https://phabricator.wikimedia.org/T418722#11696827 (achou)
[11:16:49] Machine-Learning-Team, ORES, Regression: ORES API query is slow - https://phabricator.wikimedia.org/T418202#11696828 (achou)
[11:16:52] Machine-Learning-Team: Incident: 2026-02-23 ml-serve - https://phabricator.wikimedia.org/T418722#11696829 (achou)
[11:17:08] Machine-Learning-Team: Incident: 2026-02-23 ml-serve - https://phabricator.wikimedia.org/T418722#11696833 (achou)
[11:17:13] Machine-Learning-Team, EditCheck, Growth-Team, Revise-Tone-Structured-Task, and 2 others: LiftWing edit-check:predict model is 404ing - https://phabricator.wikimedia.org/T418173#11696832 (achou)
[11:20:22] Machine-Learning-Team: Incident: 2026-02-23 ml-serve - https://phabricator.wikimedia.org/T418722#11696858 (achou)
[11:20:23] Machine-Learning-Team: kserve helm status is broken across ml clusters - https://phabricator.wikimedia.org/T419040#11696857 (achou)
[11:31:18] Machine-Learning-Team: Fix revertrisk Pyrra SLO - https://phabricator.wikimedia.org/T419235#11696932 (achou) Open→Resolved
[12:02:03] Lift-Wing, Machine-Learning-Team, Editing-team (Tracking), ml-model-requests, and 2 others: Increase batch size in edit-check service - https://phabricator.wikimedia.org/T419527#11697089 (isarantopoulos) related patch
https://gerrit.wikimedia.org/r/c/machinelearning/liftwing/inference-services/+/...
[12:50:15] Machine-Learning-Team, Essential-Work: Investigate reference-need persistently unavailable replicas alert - https://phabricator.wikimedia.org/T400602#11697181 (BWojtowicz-WMF) Resolving this as this was a single time incident and the underlying concern about reference-need's high resource requests (22 CP...
[12:50:28] Machine-Learning-Team, Essential-Work: Investigate reference-need persistently unavailable replicas alert - https://phabricator.wikimedia.org/T400602#11697183 (BWojtowicz-WMF) Open→Resolved
[12:54:05] (PS2) Bartosz Wójtowicz: policy-violation: Add CoPE-A-9B model server alongside gpt-oss-safeguard-20b. [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1249948 (https://phabricator.wikimedia.org/T418832)
[13:02:26] Machine-Learning-Team, Semantic Search, OKR-Work: Migrate embeddings inference service from HF Transformers+CK FlashAttention to vLLM+AITER - https://phabricator.wikimedia.org/T418976#11697235 (OKarakaya-WMF) after the aiter changes. reference to compare is [vllm test2 (heavy)](https://phabricator.wi...
[13:13:54] Machine-Learning-Team, Prod-Kubernetes, ServiceOps new, Kubernetes: Upgrade ML clusters to kubernetes 1.31 - https://phabricator.wikimedia.org/T414485#11697303 (MLechvien-WMF)
[13:55:46] (PS2) Kgraessle: Expose the revert risk language agnostic prediction boolean via the RecentChanges API [extensions/ORES] - https://gerrit.wikimedia.org/r/1248799 (https://phabricator.wikimedia.org/T407552)
[13:57:44] (CR) Jsn.sherman: [C:+1] "This looks good to me, though I do suggest implementing @dhardy@wikimedia.org's suggestions to make this core code as legible as possible."
[extensions/ORES] - https://gerrit.wikimedia.org/r/1248799 (https://phabricator.wikimedia.org/T407552) (owner: Kgraessle)
[14:17:23] Lift-Wing, Machine-Learning-Team, Editing-team (Tracking), ml-model-requests, and 2 others: Increase batch size in edit-check service - https://phabricator.wikimedia.org/T419527#11697638 (gkyziridis) === Update === The `max_batch_size` parameter in the edit-check API server is now set to `max_bat...
[14:22:26] (CR) Nikerabbit: [C:-1] "Discussed in daily. There were concerns of adding more requests and the services are already struggling." [research/recommendation-api] - https://gerrit.wikimedia.org/r/1217170 (owner: Nik Gkountas)
[14:42:50] Machine-Learning-Team: kserve helm status is broken across ml clusters - https://phabricator.wikimedia.org/T419040#11697940 (DPogorzelski-WMF) i tested it and it always works on first sync, but the problem comes back on following syncs. will check again
[14:43:21] Machine-Learning-Team: kserve helm status is broken across ml clusters - https://phabricator.wikimedia.org/T419040#11697941 (DPogorzelski-WMF)
[14:43:22] Machine-Learning-Team: Incident: 2026-02-23 ml-serve - https://phabricator.wikimedia.org/T418722#11697942 (DPogorzelski-WMF)
[14:44:04] Machine-Learning-Team: Incident: 2026-02-23 ml-serve - https://phabricator.wikimedia.org/T418722#11697948 (DPogorzelski-WMF) Open→Resolved a:DPogorzelski-WMF
[15:26:58] Machine-Learning-Team: Experiment with new kserve version on stagin - https://phabricator.wikimedia.org/T419722 (DPogorzelski-WMF) NEW
[16:44:36] elukey provisioned ml-serve1014 & ml-serve1015, which necessitated this patch, https://gerrit.wikimedia.org/r/c/operations/cookbooks/+/1249973
[16:44:48] I would like to test the patch, is either host still available?
[16:47:42] jhathaway: they are in role insetup (kinda, with gpu capabilities but nowhere near production).
The IRC channel is not patrolled often, go ahead, 100% sure it won't be an issue
[16:48:04] elukey: thanks, will do
[16:53:37] (PS3) Kgraessle: Expose the revert risk language agnostic prediction boolean via the RecentChanges API [extensions/ORES] - https://gerrit.wikimedia.org/r/1248799 (https://phabricator.wikimedia.org/T407552)
[17:02:40] (CR) Kgraessle: Expose the revert risk language agnostic prediction boolean via the RecentChanges API (11 comments) [extensions/ORES] - https://gerrit.wikimedia.org/r/1248799 (https://phabricator.wikimedia.org/T407552) (owner: Kgraessle)
[17:03:19] (CR) Kgraessle: "while refactoring this, I found a bug where you cannot make an API request with both oresreview and revertrisklanguageagnostic, so that wi" [extensions/ORES] - https://gerrit.wikimedia.org/r/1248799 (https://phabricator.wikimedia.org/T407552) (owner: Kgraessle)
[17:31:31] (PS1) AikoChou: revise-tone-task-generator: always process edits from testwiki [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1250646 (https://phabricator.wikimedia.org/T416904)
[17:41:17] (CR) Scardenasmolinar: [C:-1] "Explicitly adding a -1 so we don't merge this until the bug fix based on @kgraessle@wikimedia.org's comments" [extensions/ORES] - https://gerrit.wikimedia.org/r/1248799 (https://phabricator.wikimedia.org/T407552) (owner: Kgraessle)