[06:43:42] Good morning folks!
[06:47:10] morning!
[06:47:33] isaranto: how did the weekend go? (I also noticed major elections in Greece, quite a busy time for you :)
[06:49:37] Hey all well! (apart from the elections 🙃)
[07:09:07] I can definitely understand the feeling
[07:45:45] 10Machine-Learning-Team: Update to KServe 0.11 - https://phabricator.wikimedia.org/T337213 (10elukey)
[07:46:03] klausman: o/ interested in doing --^ ?
[07:46:31] \p
[07:46:33] 10Machine-Learning-Team: Update to KServe 0.11 - https://phabricator.wikimedia.org/T337213 (10elukey) https://github.com/kserve/kserve/releases/tag/v0.11.0-rc0
[07:47:24] elukey: do we have some more detailed docs for it? Then yes :)
[07:47:40] klausman: what do you mean?
[07:47:49] I've never done it before
[07:48:07] yep, I have outlined the steps in the task's description
[07:48:17] (plus there are the old tasks if you want to follow a trace)
[07:48:41] Ack. It's just that sometimes I am unaware of "unrelated repo #24" in such processes
[07:49:43] sure sure, but this is ok, you can ask/brainstorm/etc. in here anytime
[07:50:23] it is just to spread the knowledge about procedures, there is no real rush to upgrade
[07:50:51] Yeah, understood (and agreed re: spreading knowledge. this very conversation is a good indicator :))
[07:52:33] the procedure could be outlined in a wikitech page while we do the upgrade; in theory it may change every time, so following the previous tasks should be fine (and avoids stale wiki pages). But this is only my opinion, happy to add some docs if people feel strongly
[07:52:56] Nah, I think as long as the old tasks are discoverable, that's fine
[07:53:20] 0.11 is still not out, only release candidates, but hopefully something more stable will be out in the coming days
[07:54:39] reading the rc0 changelog atm
[07:57:03] 10Machine-Learning-Team: Update to KServe 0.11 - https://phabricator.wikimedia.org/T337213 (10klausman) a:03klausman
[08:40:19] elukey: also, thanks for the update re: reboots. I saw that the cookbook mentioned the daemonsets when draining the node, but I wasn't aware it would delay the reboot. In my experience, once all the non-daemonset pods were gone, there was one more 35s sleep and then it proceeded.
[08:41:24] klausman: it happened to me a couple of times, not every time; I think it depends on what the k8s API answers (and the back-off retry in the cookbook etc.)
[08:41:30] I was probably unlucky :)
[08:41:48] I am also unsure what the 35s sleep is for
[09:09:52] 10Machine-Learning-Team: create checklist before adding models to api gateway/prod - https://phabricator.wikimedia.org/T332711 (10elukey) @calbon I improved https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing#Hosting_a_model today, I think it should contain more-or-less a good starting point for whoeve...
[09:09:54] folks, I updated https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing#Hosting_a_model with more info
[09:10:07] lemme know if you like it and if anything is missing
[09:10:12] I can't think of more things
[09:19:19] Nice list! I have one more which I will add and then we can discuss it.
[09:20:41] yes please add/modify anything that you want
[09:26:56] I added this:
[09:26:57] ```Is there an expected frequency in which the model will have to be retrained with new data? What are the resources required to train the model and what was the dataset size?```
[09:41:20] +1 thanks!
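A rough sketch of the drain-wait pattern discussed at [08:40:19]-[08:41:48], assuming the kubernetes Python client. The node name, retry count, and sleep intervals are illustrative only, not taken from the actual reboot cookbook:

```python
# Hypothetical sketch: poll the k8s API until only DaemonSet-owned pods
# remain on a drained node, retrying with a back-off. This is NOT the real
# cookbook code; node name and timings are made-up examples.
import time

from kubernetes import client, config

NODE = "ml-serve-example.eqiad.wmnet"  # illustrative node name

config.load_kube_config()
core = client.CoreV1Api()

def non_daemonset_pods(node: str) -> list[str]:
    """Return names of pods on `node` that are not owned by a DaemonSet."""
    pods = core.list_pod_for_all_namespaces(
        field_selector=f"spec.nodeName={node}"
    ).items
    return [
        p.metadata.name
        for p in pods
        if not any(
            ref.kind == "DaemonSet" for ref in (p.metadata.owner_references or [])
        )
    ]

# Each retry re-queries the k8s API, so how long the loop takes before the
# reboot proceeds depends on what the API answers on each attempt — which
# would explain why the observed delay varies from run to run.
for attempt in range(1, 11):
    remaining = non_daemonset_pods(NODE)
    if not remaining:
        break
    print(f"attempt {attempt}: still waiting on {remaining}")
    time.sleep(5 * attempt)
```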
[10:08:39] 10Machine-Learning-Team, 10Epic: Lift Wing improvements to get out of MVP state - https://phabricator.wikimedia.org/T333453 (10elukey)
[10:08:52] 10Machine-Learning-Team, 10Epic: Add meaningful access logs to KServe's pods - https://phabricator.wikimedia.org/T333804 (10elukey) 05Open→03Stalled This is blocked until we upgrade to KServe 0.11 :)
[10:31:14] * elukey lunch!
[11:27:48] same
[12:39:12] 10Machine-Learning-Team, 10Data-Engineering, 10Event-Platform Value Stream: Create new mediawiki.page_links_change stream based on fragment/mediawiki/state/change/page - https://phabricator.wikimedia.org/T331399 (10Ottomata) @pfischer pasting some stuff from Slack here so it doesn't get lost. For the [[ htt...
[13:09:51] 10Machine-Learning-Team, 10ORES, 10artificial-intelligence, 10ML-Governance, 10Documentation: Create citation transclusion template for ORES model cards - https://phabricator.wikimedia.org/T337242 (10kevinbazira)
[13:24:14] 10Machine-Learning-Team, 10ORES, 10artificial-intelligence, 10ML-Governance, 10Documentation: Create citation transclusion template for ORES model cards - https://phabricator.wikimedia.org/T337242 (10kevinbazira) A citation template has been created and can be found here: https://meta.wikimedia.org/wiki/...
[14:34:25] 10Lift-Wing, 10Machine-Learning-Team, 10Patch-For-Review: Move Revert-risk multilingual model from staging to production - https://phabricator.wikimedia.org/T333124 (10elukey) Next steps: * Wait for https://gerrit.wikimedia.org/r/922073 to be reviewed, merged and deployed to the api gateway. * Test the new...
[15:59:11] 10Machine-Learning-Team, 10Patch-For-Review: Create ORES migration endpoint (ORES/Liftwing translation) - https://phabricator.wikimedia.org/T330414 (10elukey) @isarantopoulos I have a proposal to ease the transition from ORES to ORES legacy, lemme know what you think: * We add support in ores-legacy to fetch/...
[16:00:23] left some ideas in --^ about the ores cache etc.
[16:00:28] it is not pretty but it should work
[16:05:16] just saw it, SGTM!
[16:06:44] aha, I was thinking the same thing: I can't think of another way to meet the deadline for ORES deprecation other than adding a cache in ores-legacy
[16:07:52] isaranto: and also a priming stream, sigh
[16:07:57] we'll have to maintain both of them
[16:12:01] I can try to add support for the ORES cache in the coming days if you are ok with it
[16:12:29] we'll need to materialize the redis password somewhere in helm, then the python code to access the redis cache should be straightforward
[16:13:32] 10Machine-Learning-Team, 10Patch-For-Review: Create ORES migration endpoint (ORES/Liftwing translation) - https://phabricator.wikimedia.org/T330414 (10isarantopoulos) I find the above idea the best compromise at the moment, as I can't think of another way to meet the deadline for ORES deprecation other than...
[16:14:02] I added a comment --^. nice approach Luca
[16:14:36] what I believe we should avoid is increasing tech debt on the Lift Wing side, e.g. adding a custom cache for specific model servers etc.
[16:15:21] perhaps tech debt is the wrong term, but I mean adding components that do not apply to LW as a whole but rather only to specific deployments
[16:20:04] nono I agree 100%
[16:23:22] let's see what others think during the team meeting :)
[16:23:28] logging off for today folks! o/
[16:24:06] isaranto: btw, what kind of resource usage does that LLM you deployed in exp have? And what is its typical latency?
[16:25:26] ---> https://phabricator.wikimedia.org/T333861#8863963
[16:26:30] ah, thx!
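As a sketch of the "straightforward" Redis access mentioned at [16:12:29], reading the ORES cache from ores-legacy could look roughly like this, assuming the password is exposed as an environment variable materialized through helm. The variable names and the cache key format are assumptions for illustration; the real ORES key schema would need to be checked before implementing this:

```python
# Hypothetical sketch of ores-legacy reading scores from the existing ORES
# Redis cache. Env var names and the key format are assumptions, not the
# real ORES schema.
import json
import os

import redis

# Password materialized via helm (e.g. as an env var from a Secret).
r = redis.Redis(
    host=os.environ.get("ORES_REDIS_HOST", "localhost"),
    port=int(os.environ.get("ORES_REDIS_PORT", "6379")),
    password=os.environ.get("ORES_REDIS_PASSWORD"),
)

def cached_score(wiki: str, model: str, rev_id: int):
    """Return a cached score if present, else None (caller falls back to Lift Wing)."""
    key = f"ores:{wiki}:{model}:{rev_id}"  # hypothetical key format
    raw = r.get(key)
    return json.loads(raw) if raw else None
```

On a cache miss the endpoint would call the corresponding Lift Wing model server as it does today, which keeps the cache strictly an optimization inside ores-legacy rather than a Lift Wing-wide component.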
[16:26:36] approx 8s, but it depends on the length of the generated result. resources are the same as revertrisk (6GB), but this is a small version of the model (1-2 GB model)
[16:32:08] What do you think would be needed to drop latency to, say, 1s?
[16:41:07] that is the $1M question :D
[16:46:55] off the top of my head I can think of 3 things regarding latency:
[16:46:56] - Speed up through resources: here we're mostly talking about GPUs, but also utilizing more CPU and RAM
[16:46:56] - code improvements and inference optimization: we can try to see whether prepackaged model servers (serving runtimes -> https://kserve.github.io/website/0.8/modelserving/servingruntimes/) provide an improvement in this case over custom predictors, and also whether they can fit our use cases
[16:46:56] - there is a tradeoff to be decided for all these models: since most LLMs come in multiple versions ranging in size, one needs to experiment with model size vs quality of outputs. There is also a tradeoff in the way these models generate outputs, as there is a sampling phase at the end (more naive/greedy sampling means faster inference but worse results)
[16:47:10] will add the above thoughts on the ticket as well
[16:49:33] We talking about LLMs? I love LLMs
[16:49:48] Especially really big ones with dubious legal issues to resolve
[16:58:10] I really liked a tweet I remember seeing (don't remember the source unfortunately) saying that LLMs are the easiest technology to prototype with but the most difficult to turn into an actual product
[16:58:17] so many things to consider!
[16:58:41] I've been working on the chatgpt plugin and OMFG the fact that it doesn't act deterministically is driving me wild
[16:58:56] Sometimes ChatGPT just decides to ignore everything and say whatever it wants
[17:31:44] I'm guessing that it won't be as easy as setting a random seed :)
[17:32:03] due to the complexity of the system (several caching layers involved)
[17:45:17] 10Machine-Learning-Team, 10Add-Link, 10Growth-Team (Current Sprint), 10User-notice: Deploy "add a link" to 10th round of wikis - https://phabricator.wikimedia.org/T308135 (10KStoller-WMF)
[21:47:10] 10Machine-Learning-Team, 10Add-Link, 10Growth-Team (Current Sprint), 10User-notice: Deploy "add a link" to 10th round of wikis - https://phabricator.wikimedia.org/T308135 (10KStoller-WMF) p:05Triage→03Medium
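To illustrate the greedy-vs-sampling tradeoff from the third point at [16:46:56], a minimal sketch using Hugging Face transformers, with gpt2 as a small public stand-in model (not the model actually deployed on Lift Wing):

```python
# Minimal sketch of the decoding-strategy tradeoff: greedy decoding is
# deterministic and cheap, sampling is non-deterministic and typically
# yields better text. gpt2 here is an illustrative stand-in model.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("Lift Wing is", return_tensors="pt")

# Greedy decoding: picks the most likely token at each step; fast and
# reproducible, but often repetitive output.
greedy = model.generate(
    **inputs,
    max_new_tokens=40,
    do_sample=False,
    pad_token_id=tokenizer.eos_token_id,
)

# Nucleus (top-p) sampling: draws from the most likely tokens covering 90%
# of the probability mass; usually better quality, not reproducible.
sampled = model.generate(
    **inputs,
    max_new_tokens=40,
    do_sample=True,
    top_p=0.9,
    pad_token_id=tokenizer.eos_token_id,
)

print(tokenizer.decode(greedy[0], skip_special_tokens=True))
print(tokenizer.decode(sampled[0], skip_special_tokens=True))
```

Note that with either strategy the dominant latency factors remain model size and the number of generated tokens, which matches the ~8s-depends-on-output-length observation at [16:26:36].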