[08:53:26] Lift-Wing, Epic, Machine-Learning-Team (Active Tasks), Patch-For-Review: Set up the ml-cache clusters - https://phabricator.wikimedia.org/T302232 (elukey) a: elukey
[08:53:53] \o Hey everyone. I'll be out today (but I'll still do that interview with Kevin and Aiko)
[08:54:18] elukey: if you want anything reviewed, just ping me, and I'll take a look
[08:56:05] klausman: o/ nothing urgent, let's sync next week :)
[08:56:13] aye, captain
[08:57:03] I have started some work on the Cassandra cluster, so we'll be able to experiment with caching etc.
[08:57:15] the performance of the current pods/models is not great :D
[08:57:18] (but we knew that)
[09:02:59] Lift-Wing, Machine-Learning-Team (Active Tasks): Load test the Lift Wing cluster - https://phabricator.wikimedia.org/T296173 (elukey) Things to do (in my opinion): * work on T302232 to add score caching * figure out if we can have async connections to the mw-api (so not blocking the main tornado worker,...
[09:19:00] Lift-Wing, Machine-Learning-Team (Active Tasks): Load test the Lift Wing cluster - https://phabricator.wikimedia.org/T296173 (elukey) Keeping a note about https://github.com/kserve/kserve/blob/release-0.7/python/kserve/kserve/kfmodel.py#L52 We'd need to figure out if we are already using async or not.
[09:21:59] * elukey afk
[11:36:01] Machine-Learning-Team, Add-Link, Growth-Team, incubator.wikimedia.org: Integrate the model training and the deployment of "Add a link" to new Wikipedias exiting the Incubator - https://phabricator.wikimedia.org/T308146 (Trizek-WMF) A few small wikis have been trained in [[ https://phabricator.wik...
[14:06:21] Morning all!
[14:15:09] morning :)
[14:15:13] how did the presentation go?
[14:16:52] o/ morning!
[14:17:40] It went great! It was just a lightning talk, so I decided to just talk about model cards. If I had more time I'd have talked about our transition to Lift Wing, but alas.
[14:21:59] nice :)
[14:22:11] I'll take a look at the other presentations next week
[14:23:21] aiko: o/ I left some notes in https://phabricator.wikimedia.org/T296173#7943902 about possible perf bottlenecks in our kserve setup, nothing urgent of course, but (next week) when you have time lemme know your thoughts
[14:24:01] My theory is that we may not be using async when fetching from the MW API, ending up with blocking I/O
[14:24:35] it will probably not change our performance a lot (caching will), but I think it could be an interesting investigation
[14:25:32] Yeah
[14:26:10] I am very ignorant about asyncio, but IIUC tornado (which is used by kserve) uses asyncio under the hood (at least in recent versions)
[14:26:37] so maybe trying to figure out if/how `preprocess` could become a coroutine and leverage async could be good
[14:29:00] https://github.com/kserve/kserve/blob/release-0.7/python/kserve/test/test_server.py is interesting, `preprocess` is not labeled as async, but the tests don't do what we do, namely abusing it as if it were a transformer :D
[14:30:04] elukey: hmm that's very interesting. I'll look into it
[14:30:30] aiko: super, looking forward to hearing your thoughts :)
[14:30:40] I did some basic perf tests in https://phabricator.wikimedia.org/T296173#7941763
[14:30:46] the numbers are a little depressing :D
[14:31:30] but I suspect that we are not using kserve in the right way, or that we need to learn it better
[14:45:12] going to the vet with my cat again; if we don't hear from each other later, have a good weekend folks :)
[14:45:34] hope everything works out, elukey!
[14:45:52] it is a looong process :)
[16:46:04] Have a good weekend Luca :)
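
For reference, the change discussed around 14:26 could look roughly like the sketch below: a kserve 0.7-style custom model whose `preprocess` is an `async def` coroutine that fetches from the MediaWiki API with aiohttp instead of a blocking HTTP client, so the tornado event loop can serve other requests while waiting on the network. This is only a hedged illustration, not the team's actual code: the `RevscoringModel` class name, the `MW_API_URL` endpoint, the request payload, and the query parameters are made-up placeholders, and it assumes kserve's `KFModel.__call__` awaits `preprocess` when it is defined as a coroutine (which is exactly what the kfmodel.py link above was flagged to verify).

```python
# Hedged sketch only: illustrates an async preprocess() for a kserve 0.7
# custom model. Class name, endpoint, payload shape and params are invented.
import aiohttp
import kserve

MW_API_URL = "https://en.wikipedia.org/w/api.php"  # hypothetical endpoint


class RevscoringModel(kserve.KFModel):  # hypothetical model class
    def __init__(self, name: str):
        super().__init__(name)
        self.ready = True

    async def preprocess(self, request: dict) -> dict:
        # aiohttp yields control to the event loop while waiting on the
        # MW API, unlike a blocking requests.get() call inside preprocess.
        params = {
            "action": "query",
            "prop": "revisions",
            "revids": request.get("rev_id"),
            "rvprop": "content",
            "format": "json",
        }
        async with aiohttp.ClientSession() as session:
            async with session.get(MW_API_URL, params=params) as resp:
                request["mw_api_response"] = await resp.json()
        return request

    def predict(self, request: dict) -> dict:
        # Scoring itself is CPU-bound and left synchronous in this sketch.
        return {"predictions": []}


if __name__ == "__main__":
    model = RevscoringModel("enwiki-damaging")
    kserve.KFServer().start([model])
```

Even with a non-blocking `preprocess`, a CPU-heavy `predict` still occupies the worker while it runs, and, as noted in the chat, score caching (T302232) is expected to matter more for overall latency than async alone.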