[06:54:45] good morning.
[06:58:02] good morning!
[07:06:56] good morning :)
[08:15:43] (PS9) Bartosz Wójtowicz: outlink-topic-model: Merge transformer and predictor pods. [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1187739 (https://phabricator.wikimedia.org/T404294)
[08:20:02] ^ the patch for combining the transformer and predictor pods is ready for review again. I have kept the transformer code as-is along with its blubber setup to keep the CI happy, but the preprocessing functionality has already been added to the predictor part, so the whole service can be run as a single pod.
[09:00:04] can someone review that please?
[09:00:48] bartosz: iiuc in https://phabricator.wikimedia.org/T401778 we need to provide a final summary of the discussion so that the Data Persistence team can proceed. is that correct?
[09:05:11] I can review the pipeline bits, but I'd rather have an MLE look at the Python parts. I know Python well enough, but not necessarily FastAPI et al.
[09:06:31] isaranto: Yes, I'm working on the final design proposal including all discussed points
[09:07:09] ack, thanks!
[09:08:33] (CR) Klausman: [C:+1] "LGTM for everything about the Python source changes, I defer to proper MLEs for review on those. 😊" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1187739 (https://phabricator.wikimedia.org/T404294) (owner: Bartosz Wójtowicz)
[09:21:38] klausman: o/ when you have a moment could you complete the rollout of https://phabricator.wikimedia.org/T398600 ?
[09:21:53] not urgent, even next week
[09:22:02] ack, will do
[09:22:43] super thanks
[09:25:02] isaranto: I think there's still one thing I'd like to discuss with the team. I was thinking about today's ML meeting, but we can also discuss it in IRC - using page_id vs page_title in the article topic model. We've talked about this and I explored it - technically it's easy to modify the model code to use page_id instead of the title when searching for outlinks.
[09:25:18] I'm wondering if we should do it before introducing caching as well - this would allow us to use page_id as the cache index, and it'd be easier to do backfilling as the current hive snapshots also use page_id as the index
[09:25:46] there are also questions about how we would roll out this change: do we want to support both the page_id and page_title parameters, or only page_id?
[09:33:06] bartosz: I don't think there is a reason to change the existing functionality. We can just allow both options via different POST arguments (page_title & page_id), so there would be no need to migrate existing users
[09:34:37] klausman: the kernel 6.16 should be available in backports for trixie, ok if I reimage ml-serve1012 to clean up the current state?
[09:35:03] I wrote a comment on that task a long while ago: https://phabricator.wikimedia.org/T371021#10170457. We can change the title & description of the task to reflect that we are not switching. If you tested it, please add your input on the task and we can go ahead and implement that
[09:35:17] https://packages.debian.org/trixie-backports/linux-image-amd64
[09:35:25] elukey: yep, sgtm
[09:41:40] isaranto: I see, will add a comment there! I can see one potential downside of this approach, not sure yet how big it is - for every request using page_title, we'd need to do 1 additional query to mwapi to get the page_id from the page_title so that we can use it in the cache. If YiR queries based on page_title, this additional query could slow down our throughput.
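For context on the trade-off discussed above, here is a minimal sketch of how a service could accept either page_id or page_title while always keying its cache on page_id, paying one extra MediaWiki API query only when a title is supplied. This is illustrative only: the function names, cache object, and payload shape are assumptions, not the actual outlink-topic-model code.

```python
# Illustrative sketch only, not the actual inference-services implementation.
# Shows one way to accept either page_id or page_title while keying the cache
# on page_id; resolving a title costs one extra MediaWiki API round trip.
import requests

CACHE = {}  # stand-in for the real cache backend under discussion (T402984)


def resolve_page_id(wiki_host: str, page_title: str) -> int:
    """Look up the page_id for a title via the MediaWiki Action API."""
    resp = requests.get(
        f"https://{wiki_host}/w/api.php",
        params={"action": "query", "titles": page_title, "format": "json"},
        timeout=5,
    )
    resp.raise_for_status()
    pages = resp.json()["query"]["pages"]
    # The "pages" object is keyed by page_id ("-1" means the title does not exist).
    return int(next(iter(pages)))


def get_topics(wiki_host: str, page_id: int | None = None, page_title: str | None = None) -> dict:
    if page_id is None:
        if page_title is None:
            raise ValueError("either page_id or page_title is required")
        page_id = resolve_page_id(wiki_host, page_title)  # the extra query
    if page_id in CACHE:
        return CACHE[page_id]  # cache hit: no further requests at all
    # Placeholder for fetching outlinks by page_id and running the model.
    prediction = {"page_id": page_id, "topics": []}
    CACHE[page_id] = prediction
    return prediction
```

If the caller queries by page_id directly, the title lookup is never needed and a cache hit can be served with no outgoing requests at all, which matches the goal mentioned just below.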
[09:49:12] let's coordinate with the Apps team and use whatever they are going to use
[09:50:08] when using the cache we don't want to make any requests at all
[09:52:23] Lift-Wing, Machine-Learning-Team: [articletopic-outlink] fetch data from mwapi using revid instead of article title - https://phabricator.wikimedia.org/T371021#11184095 (BWojtowicz-WMF) I've tested the option to use `page_id` in the model and found out that it's straightforward to modify the current outl...
[09:53:56] isaranto: I agree, will ask under the YiR goal ticket
[09:56:08] thanks! you can ping Dbrant, I think he is the lead engineer on that project
[10:10:41] Machine-Learning-Team, Goal: Q1 FY2025-26 Goal: Make article topic data available at scale and within SLOs for Year in Review - https://phabricator.wikimedia.org/T392833#11184216 (BWojtowicz-WMF) Hello @Dbrant! We have 1 technical question about the way the Apps side will query our LiftWing model to retriev...
[11:07:22] * klausman lunch
[12:02:17] ml-serve1012 seems stuck in booting and a powercycle gets stuck as well..
[12:02:20] I'll check after lunch sigh
[12:27:13] Machine-Learning-Team, Goal: Q1 FY2025-26 Goal: Enable volunteer evaluation of Tone Check model in additional languages - https://phabricator.wikimedia.org/T400423#11184761 (gkyziridis) ==Update== **Datasets uploaded for the following wikis:** | Wiki | Project number | Translations | Labels ad...
[12:54:22] Machine-Learning-Team: Fix CI/CD on ml-pipelines repository - https://phabricator.wikimedia.org/T404717 (gkyziridis) NEW
[13:54:26] Machine-Learning-Team, Essential-Work: Incorporate notebook into Tone-Check data generation ml-pipeline - https://phabricator.wikimedia.org/T404722 (kevinbazira) NEW
[14:05:19] Machine-Learning-Team, Essential-Work: Incorporate notebook into Tone-Check data generation ml-pipeline - https://phabricator.wikimedia.org/T404722#11185239 (kevinbazira) Since the test/dev iteration cycles take a really long time, I added a development limit ([[ https://gitlab.wikimedia.org/kevinbazira/ml...
[14:13:19] Machine-Learning-Team, Essential-Work: Incorporate notebook into Tone-Check data generation ml-pipeline - https://phabricator.wikimedia.org/T404722#11185284 (kevinbazira) I ended up removing the dev-limits as the small sample size results in no rows making it through the end of the pipeline as shown belo...
[14:40:00] Machine-Learning-Team, Goal: Q1 FY2025-26 Goal: Airflow training pipeline for Tone check model - https://phabricator.wikimedia.org/T398970#11185395 (kevinbazira) Started working on tone-check data generation job logic in T404722: * test/dev iteration cycles take a really long time * added logs at major p...
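As a side note on the dev-limit discussion in T404722 above, a tiny sketch of the trade-off: down-sampling speeds up test/dev iterations, but after aggressive filtering a small sample can leave zero rows for the rest of the pipeline. The function and filter below are hypothetical, not the actual ml-pipelines code.

```python
# Hypothetical illustration of the dev-limit trade-off, not the ml-pipelines code.
import random


def run_pipeline(rows: list[dict], dev_limit: int | None = None) -> list[dict]:
    """Optionally down-sample the input for faster dev runs, then filter."""
    if dev_limit is not None:
        rows = random.sample(rows, min(dev_limit, len(rows)))
    # Stand-in for the real tone-check filtering steps: with the full dataset
    # enough rows survive, but with a small dev_limit the same filters can
    # easily leave nothing for the downstream steps to work on.
    return [r for r in rows if r.get("label") is not None and len(r.get("text", "")) >= 200]
```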
[14:44:29] Machine-Learning-Team, Data-Persistence, Data-Persistence-Design-Review, Growth-Team, and 3 others: Data Persistence Design Review: Improve Tone Suggested Edits newcomer task - https://phabricator.wikimedia.org/T401021#11185422 (Eevans)
[14:44:51] Machine-Learning-Team, Data-Persistence, Data-Persistence-Design-Review: Data Persistence Design Review: Article topic model caching - https://phabricator.wikimedia.org/T402984#11185425 (Eevans)
[15:40:40] Machine-Learning-Team, Goal: Q1 FY2025-26 Goal: Make article topic data available at scale and within SLOs for Year in Review - https://phabricator.wikimedia.org/T392833#11185679 (Ottomata) @BWojtowicz-WMF we should probably sync up about this kind of requirement (and also data modeling when you work on...
[15:50:12] Machine-Learning-Team: Experiment with amd-smi and the new AMD GPUs MI300x - https://phabricator.wikimedia.org/T403697#11185767 (elukey) @klausman ml-serve1012 is up and running with 6.16 from backports, and nvtop seems to work without horrors in the dmesg. Also please note that `rocm-smi` is now `/opt/rocm-...
[19:24:46] Machine-Learning-Team, Goal: Q1 FY2025-26 Goal: Make article topic data available at scale and within SLOs for Year in Review - https://phabricator.wikimedia.org/T392833#11186679 (Dbrant) >>! In T392833#11184216, @BWojtowicz-WMF wrote: > To make sure we optimize our solution for Year in Review processing...