[06:39:48] good morning
[06:43:46] good morning
[06:58:21] good morning :)
[07:12:35] Hi folks o/
[07:18:02] Lift-Wing, Machine-Learning-Team, Essential-Work, Patch-For-Review: Update revertrisk to kserve 0.15.2 - https://phabricator.wikimedia.org/T383119#11166129 (BWojtowicz-WMF) Open→Resolved
[07:24:01] (CR) Bartosz Wójtowicz: edit-check: Update locust tests (1 comment) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1186482 (https://phabricator.wikimedia.org/T403378) (owner: Gkyziridis)
[07:26:43] good morning
[07:26:54] bartosz: o/ when you have a moment lemme know what you think about my comment in https://gerrit.wikimedia.org/r/c/operations/puppet/+/1180823
[07:26:58] I think we are ready to merge
[07:31:35] (CR) Ilias Sarantopoulos: edit-check: Update locust tests (2 comments) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1186482 (https://phabricator.wikimedia.org/T403378) (owner: Gkyziridis)
[07:31:56] elukey: o/ Thank you, I think it makes a lot of sense to add this note for future us, I've just pushed a new patchset with it 🙌
[07:33:12] elukey: I don’t know when it happened, but I also started seeing 80GB GPUs as something “easy” to run with :D I think it’s mainly because 80GB A100 GPUs got popular and quite cheap to rent at ~0.7$/hour
[07:39:40] bartosz: thanks! Merged :) very nice work!
Now please make sure that everybody in your team starts using it :D
[07:57:03] as FYI I am running provisioning on ml-serve1008/1009/1010 (previously depooled by Tobias) to fix some inconsistency in the BIOS settings that we detected, I'll repool them afterwards
[07:59:00] (CR) Abijeet Patro: [C:+2] add filtering based on lead section size for article suggestions [research/recommendation-api] - https://gerrit.wikimedia.org/r/1186558 (https://phabricator.wikimedia.org/T403730) (owner: Nik Gkountas)
[08:00:28] (Merged) jenkins-bot: add filtering based on lead section size for article suggestions [research/recommendation-api] - https://gerrit.wikimedia.org/r/1186558 (https://phabricator.wikimedia.org/T403730) (owner: Nik Gkountas)
[08:02:19] (CR) Gkyziridis: edit-check: Update locust tests (1 comment) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1186482 (https://phabricator.wikimedia.org/T403378) (owner: Gkyziridis)
[08:19:57] (all done and repooled)
[08:40:27] (PS2) Tim Starling: Improve filter group tests [extensions/ORES] - https://gerrit.wikimedia.org/r/1186924 (https://phabricator.wikimedia.org/T224672)
[08:46:24] Machine-Learning-Team: Experiment with amd-smi and the new AMD GPUs MI300x - https://phabricator.wikimedia.org/T403697#11166305 (elukey) Installed amd-smi-lib 6.3 manually on ml-serve1013 (we had it in our repos) and this is the result: ` elukey@ml-serve1013:~$ sudo /opt/rocm-6.3.0/bin/amd-smi set --memory-...
[08:59:57] (CR) Ilias Sarantopoulos: edit-check: Update locust tests (1 comment) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1186482 (https://phabricator.wikimedia.org/T403378) (owner: Gkyziridis)
[09:00:27] o/ we can now choose to use the MI210 GPU (which is compatible with our amd-pytorch image) instead of the older WX 9100 GPU by using `wmf_airflow_common.kubernetes.node_anti_affinity_for_hostnames()` in our ML training airflow pipelines: https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/1660
[09:07:38] Machine-Learning-Team: Experiment with amd-smi and the new AMD GPUs MI300x - https://phabricator.wikimedia.org/T403697#11166344 (elukey) Downloaded amd-smi-lib and rocm-core 6.4.3 from upstream, installed them on ml-serve1013 but no luck: ` elukey@ml-serve1013:~$ sudo /opt/rocm-6.4.3/bin/amd-smi set --memor...
[09:39:18] (CR) Ladsgroup: [C:+2] Improve filter group tests [extensions/ORES] - https://gerrit.wikimedia.org/r/1186924 (https://phabricator.wikimedia.org/T224672) (owner: Tim Starling)
[09:49:26] Machine-Learning-Team: Evaluate adding caching mechanism for article topic model to make data available at scale - https://phabricator.wikimedia.org/T401778#11166489 (BWojtowicz-WMF) >>! In T401778#11152043, @Eevans wrote: >>>! In T401778#11151147, @BWojtowicz-WMF wrote: >> Thank you for the discussion @Otto...
[09:53:33] (Merged) jenkins-bot: Improve filter group tests [extensions/ORES] - https://gerrit.wikimedia.org/r/1186924 (https://phabricator.wikimedia.org/T224672) (owner: Tim Starling)
[09:58:53] Machine-Learning-Team, Moderator-Tools-Team: Use new methods to surface edits to moderators which may require their review - https://phabricator.wikimedia.org/T404174 (Samwalton9-WMF) NEW
[11:33:05] Hello, I have two MRs to apply wiki-specific filters. For now, we exclude countries and continents from enwiki. I'll train enwiki after these changes.
After training, I'll deploy it to staging if the evaluation scores are still good. So it will be ready for prod release. Can you take a look when you have time? @kevinbazira https://gitlab.wikimedia.org/repos/machine-learning/ml-pipelines/-/merge_requests/45
[11:33:05] https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/1661
[11:33:30] ack..looking
[13:13:26] Machine-Learning-Team: Experiment with amd-smi and the new AMD GPUs MI300x - https://phabricator.wikimedia.org/T403697#11167270 (elukey) I found https://instinct.docs.amd.com/projects/amdgpu-docs/en/latest/gpu-partitioning/mi300x/requirements.html that lists a series of requirements, and the only one that do...
[13:23:37] Machine-Learning-Team: Experiment with amd-smi and the new AMD GPUs MI300x - https://phabricator.wikimedia.org/T403697#11167310 (elukey) I tried to install the packages but I ended up in https://github.com/ROCm/ROCm/issues/3036: > Consult /var/lib/dkms/amdgpu/6.12.12-2194681.22.04/build/make.log for more in...
[13:24:08] Machine-Learning-Team, Goal: Q1 FY2025-26 Goal: Airflow training pipeline for Tone check model - https://phabricator.wikimedia.org/T398970#11167313 (kevinbazira) I pushed two MRs to implement the ML training pattern from T396495#11151194: 1. [[ https://gitlab.wikimedia.org/repos/machine-learning/ml-pipel...
[13:26:30] georgekyz: o/ I have pushed an MR that adds the tone-check retraining DAG.
[13:26:30] please review it whenever you get a minute: https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/1662
[13:26:30] thanks!
[13:45:07] Machine-Learning-Team, Goal, OKR-Work: Q1 FY2025-26 Goal: Apply the Tone Check model to published articles, to learn whether we can build a pool of high-quality structured tasks for new editors - https://phabricator.wikimedia.org/T392283#11167427 (Eevans) >>! In T392283#11164941, @Ottomata wrote: > G...
[13:57:08] Machine-Learning-Team, Moderator-Tools-Team: Use new methods to surface edits to moderators which may require their review - https://phabricator.wikimedia.org/T404174#11167558 (Samwalton9-WMF)
[13:57:12] Machine-Learning-Team: Evaluate adding caching mechanism for article topic model to make data available at scale - https://phabricator.wikimedia.org/T401778#11167560 (Eevans) >>! In T401778#11166489, @BWojtowicz-WMF wrote: > > [ ... ] > > @Eevans How should we go forward now? We've discussed a few changes t...
[14:04:18] kevinbazira: I think we have some formatting issues in the tone-check Airflow DAG. I checked the code in the MR and it looks good. Let's reformat the code with black and ruff and then I will review again.
[14:04:26] kevinbazira: Thnx for working on this one!
[14:06:24] (black + isort should do the trick)
[14:16:00] Machine-Learning-Team, Goal, OKR-Work: Q1 FY2025-26 Goal: Apply the Tone Check model to published articles, to learn whether we can build a pool of high-quality structured tasks for new editors - https://phabricator.wikimedia.org/T392283#11167673 (Ottomata) > an event-based approach would produce a b...
[14:20:08] Hello, I have a small MR in airflow dags. https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/1663 Can you take a look when you have time? @kevinbazira
[14:20:23] ack...looking
[14:22:14] thanks brouberol <3
[14:22:16] georgekyz: formatting issues fixed: https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/1662
[14:58:30] Machine-Learning-Team, Goal, OKR-Work: Q1 FY2025-26 Goal: Apply the Tone Check model to published articles, to learn whether we can build a pool of high-quality structured tasks for new editors - https://phabricator.wikimedia.org/T392283#11167951 (dcausse) >>! In T392283#11167427, @Eevans wrote: >>>!...
[16:05:56] Machine-Learning-Team, Moderator-Tools-Team: Use new methods to surface edits to moderators which may require their review - https://phabricator.wikimedia.org/T404174#11168273 (Samwalton9-WMF)
[16:29:58] Machine-Learning-Team: Experiment with amd-smi and the new AMD GPUs MI300x - https://phabricator.wikimedia.org/T403697#11168485 (elukey) I tried to strace amd-smi and it gave some good insights: ` newfstatat(AT_FDCWD, "/sys/class/drm/renderD128/device/current_memory_partition", {st_mode=S_IFREG|0444, st_siz...