[08:41:51] 10Machine-Learning-Team: Add support for REDIS to ores-legacy - https://phabricator.wikimedia.org/T337287 (10elukey)
[08:44:54] isaranto:
[08:44:56] o/
[08:45:08] I found a nice way to pass the redis password to the ores-legacy pods
[08:45:26] it will be materialized as an env variable ("REDIS_AUTH")
[08:45:38] so in the code we just os.env.get() etc.. and we should be ok
[08:45:48] I'll check how ores uses the redis cache and then file a change
[08:46:02] Yeah sounds great
[08:46:11] I can help as well, let me know
[08:46:12] (first step read-only just to be sure)
[08:50:30] 10Machine-Learning-Team: Add support for REDIS to ores-legacy - https://phabricator.wikimedia.org/T337287 (10elukey) Found a nice way to materialize an env variable in the ores-legacy pods via the service scaffolding secret support. We just need to add the following bits to our ores-legacy private config (in pupp...
[08:55:06] and https://github.com/wikimedia/ores/blob/master/ores/score_caches/redis.py seems to be the ores implementation
[08:55:16] TTL = 60 * 60 * 24 * 365 * 16 # 16 years
[08:55:17] lol
[08:56:13] and the package used should be https://pypi.org/project/redis/
[09:00:35] also https://redis.readthedocs.io/en/stable/examples/asyncio_examples.html seems nice isaranto
[09:25:51] and the format in redis seems really straightforward
[09:26:04] localhost:6378> get ores:itwiki:goodfaith:133595563:0.5.0
[09:26:04] "{\"prediction\": true, \"probability\": {\"false\": 0.02953874448878846, \"true\": 0.9704612555112115}}"
[09:27:25] 10Machine-Learning-Team: Add support for REDIS to ores-legacy - https://phabricator.wikimedia.org/T337287 (10elukey) Checked the ores cache format: ` localhost:6378> get ores:itwiki:goodfaith:133595563:0.5.0 "{\"prediction\": true, \"probability\": {\"false\": 0.02953874448878846, \"true\": 0.9704612555112115}}...
[09:31:00] Amir1: o/ (if you have time) - I am checking the ores pool counter code, do you recall why it was used? To limit incoming traffic or for the various caches (to avoid stampedes etc..)
[09:37:20] elukey: hi. Still in Athens, it was needed for rate limiting. Up to four reqs per IP could be made, up to six would be slow (we would hold until one of the reqs frees up), and six onwards would simply be rejected
[09:39:35] Amir1: thanks a lot! Enjoy Athens :)
[10:08:23] folks Alex is rebooting some redis instances, including the ores cache ones, so we'll have a 5 min outage
[10:10:17] \o
[10:13:17] ack
[10:22:44] I am wondering if ores also caches aggregate responses (like multiple models etc..)
[10:23:09] so far it doesn't seem so, but it should follow this road given the cached values
[10:24:54] (03PS36) 10Ilias Sarantopoulos: feat: use Lift Wing instead of ORES [extensions/ORES] - 10https://gerrit.wikimedia.org/r/910439 (https://phabricator.wikimedia.org/T319170)
[10:26:21] if it doesn't cache the aggregate responses then it is much closer to lift wing
[10:26:24] (03CR) 10CI reject: [V: 04-1] feat: use Lift Wing instead of ORES [extensions/ORES] - 10https://gerrit.wikimedia.org/r/910439 (https://phabricator.wikimedia.org/T319170) (owner: 10Ilias Sarantopoulos)
[10:27:02] elukey: AIUI, classically, the ORES cache was primed by the stream of changes, and thus that's where I'd put generation of aggregate cache entries in that design. For LW, I'm less sure
[10:27:07] I mean as to how we could implement caching on the lift wing side. But for ores legacy ofc it would be faster to cache the request
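A minimal sketch of what the read-only first step mentioned at [08:46:12] could look like, using the redis asyncio client from the docs linked at [09:00:35]. The host/port and the helper name are illustrative assumptions, not the real ores-legacy configuration; only the key layout and the REDIS_AUTH env variable come from the discussion above.

```python
import json
import os

import redis.asyncio as redis  # https://pypi.org/project/redis/


async def get_cached_score(wiki: str, model: str, rev_id: int, version: str):
    """Return a cached ORES score from Redis, or None on a cache miss."""
    client = redis.Redis(
        host="localhost",  # assumption: the real host/port come from service config
        port=6378,
        password=os.environ.get("REDIS_AUTH"),  # env var materialized in the pod
        decode_responses=True,
    )
    # Key layout taken from the redis-cli example above:
    # ores:<wiki>:<model>:<rev_id>:<model_version>
    key = f"ores:{wiki}:{model}:{rev_id}:{version}"
    raw = await client.get(key)  # read-only first step: GET only, no SET
    await client.close()
    return json.loads(raw) if raw is not None else None
```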
[10:29:56] also it may have to do with the anticipated form of requests: by caching on the model level (instead of the request level) you will probably have a better overall effect on latency as you apply caching to more requests. If I understand correctly, the idea could have been the following: the stream makes a call to ores -> calculate response -> put in Redis, and then this cache will be used in any call, thus making it overall faster for all users
[10:32:28] as far as I know it is not changeprop that inserts into redis, only ORES
[10:33:10] changeprop simply calls the /precache endpoint with a rev-id and models/etc.. as parameters
[10:34:53] yes ORES inserts into redis the first time it gets a request that isn't cached
[10:37:23] Ah, so if Redis contains/should contain aggregates, it's ORES's job to put them there.
[10:37:48] so if changeprop is the first one to hit the precache endpoint for a specific model/wiki/revid, it triggers model scoring and redis insertion
[10:38:21] w8 need to check the code to verify
[10:45:56] there is probably some horror aggregation happening in /precache's code, I'll check later on
[10:48:32] I wonder if running a thin service in front of the actual inference services could help with this. I.e. you call endpoint A, asking for rev 123, and it _in parallel_ calls endpoints B and C with that rev. It could then put the results (including the aggregate) in some cache (and use that cache on the next call)
[10:57:34] 10Machine-Learning-Team, 10MediaWiki-extensions-ORES, 10Patch-For-Review: Move backend of ORES MediaWiki extension to Lift Wing - https://phabricator.wikimedia.org/T319170 (10PatchDemoBot) Test wiki **created** on [[ https://patchdemo.wmflabs.org | Patch demo ]] by ISarantopoulos-WMF using patch(es) linked t...
[10:58:27] here's a mediawiki deployment that uses Lift Wing 🎉 https://patchdemo.wmflabs.org/wikis/55c5454d6e/wiki/Special:RecentChanges?hidebots=1&limit=50&days=7&enhanced=1&urlversion=2
[11:13:21] <- lunch and a walk
[11:48:23] (03PS1) 10AikoChou: revert-risk: fix session host for the wikidata model [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/922498 (https://phabricator.wikimedia.org/T333125)
[12:10:28] (03CR) 10Kevin Bazira: [C: 03+1] revert-risk: fix session host for the wikidata model [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/922498 (https://phabricator.wikimedia.org/T333125) (owner: 10AikoChou)
[12:35:54] isaranto: niceeeeeee
[12:37:26] I got an in-person review from Amir in Athens regarding the other patch with the thresholds. I'll write the comments on the task
[12:37:58] basically he suggested creating configuration files instead of loading the responses from files
[12:38:34] but now I am wondering where I should commit these configuration files, and how I am going to test them 🤔
[12:39:01] probably these would have to go here and not in the ores extension repo https://github.com/wikimedia/operations-mediawiki-config/blob/e0d5e4c0a3ba258eb3ac7de2cba0cc71f62fdcf9/wmf-config/ext-ORES.php#L4
[12:40:35] Amir1: regarding our discussion and configuration files, should I add another file in here https://github.com/wikimedia/operations-mediawiki-config/blob/50cfeefd26383b8618ccb2ae7ff2cc1a553947fc/multiversion/MWConfigCacheGenerator.php#L100?
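Going back to the thin-shim idea from [10:48:32]: a rough sketch of how such a service could fan out to several model endpoints in parallel with asyncio and join the results. The endpoint URLs and the use of aiohttp are assumptions for illustration, not the actual Lift Wing setup.

```python
import asyncio

import aiohttp

# Hypothetical endpoints; real Lift Wing hostnames/paths would come from config.
MODEL_URLS = {
    "goodfaith": "https://inference.example/v1/models/itwiki-goodfaith:predict",
    "damaging": "https://inference.example/v1/models/itwiki-damaging:predict",
}


async def score_one(session: aiohttp.ClientSession, model: str, rev_id: int) -> dict:
    """Call a single model endpoint for one revision."""
    async with session.post(MODEL_URLS[model], json={"rev_id": rev_id}) as resp:
        return {model: await resp.json()}


async def score_all(rev_id: int) -> dict:
    """Call all model endpoints in parallel and merge the responses."""
    async with aiohttp.ClientSession() as session:
        results = await asyncio.gather(
            *(score_one(session, model, rev_id) for model in MODEL_URLS)
        )
    aggregate: dict = {}
    for partial in results:
        aggregate.update(partial)
    # The aggregate (and/or each single score) could be put in a cache here,
    # as discussed above.
    return aggregate
```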
[12:53:10] isaranto: so my first read of the ores code seems to suggest that ores stores single scores in redis, imports them as a dict (via json.loads()) and then reformats the final response
[12:53:17] so similar to Lift Wing
[12:54:22] adding the score cache to ores-legacy is probably straightforward
[12:54:55] so far I am thinking of expanding make_liftwing_calls to use the ores redis cache
[12:55:34] because IIUC you already sort out and aggregate response
[12:55:36] *responses
[12:55:42] does it make sense? Am I crazy?
[12:58:23] (03CR) 10Elukey: [C: 03+1] revert-risk: fix session host for the wikidata model [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/922498 (https://phabricator.wikimedia.org/T333125) (owner: 10AikoChou)
[12:58:27] if we want to do it per call then adding something like a decorator to get_liftwing_response would make sense https://github.com/wikimedia/machinelearning-liftwing-inference-services/blob/main/ores-migration/app/liftwing/response.py#L15
[12:58:29] wdyt?
[12:59:30] isaranto: much better yes, I can try
[12:59:53] isaranto: for read-only it works ok, but what about when we need to set a value?
[13:00:24] since I'd probably need to store https://github.com/wikimedia/machinelearning-liftwing-inference-services/blob/main/ores-migration/app/liftwing/response.py#L35
[13:00:31] (to follow what ores does)
[13:01:36] ah right it is returned by the function, so the decorator should see it
[13:01:38] so:
[13:01:42] 1) check redis
[13:01:46] 2) make lift wing call
[13:01:47] what I have in mind, to leave the rest of the code intact (def there are many ways to do this), is the following:
[13:01:49] 3) store redis result
[13:02:16] yes, the steps as u mention them are similar to this wrapper: https://github.com/wikimedia/machinelearning-liftwing-inference-services/blob/main/ores-migration/app/utils.py#L81
[13:02:55] okok, we'd need to avoid storing error responses but it should be doable and clean
[13:03:05] 1) check the redis result for the tuple (wikiID, revid, model); if it exists return the cached result, otherwise make the liftwing call and store the result in redis
[13:07:59] a similar approach could then be followed in lift wing - caching the predict step in a similar fashion (or even preprocess)
[13:14:21] (03CR) 10AikoChou: [C: 03+2] "Thanks for the review!" [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/922498 (https://phabricator.wikimedia.org/T333125) (owner: 10AikoChou)
[13:20:34] (03Merged) 10jenkins-bot: revert-risk: fix session host for the wikidata model [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/922498 (https://phabricator.wikimedia.org/T333125) (owner: 10AikoChou)
[14:00:08] (03PS37) 10Ilias Sarantopoulos: feat: use Lift Wing instead of ORES [extensions/ORES] - 10https://gerrit.wikimedia.org/r/910439 (https://phabricator.wikimedia.org/T319170)
[14:01:52] (03CR) 10CI reject: [V: 04-1] feat: use Lift Wing instead of ORES [extensions/ORES] - 10https://gerrit.wikimedia.org/r/910439 (https://phabricator.wikimedia.org/T319170) (owner: 10Ilias Sarantopoulos)
[14:04:13] 10Machine-Learning-Team, 10MediaWiki-extensions-ORES, 10Patch-For-Review: Move backend of ORES MediaWiki extension to Lift Wing - https://phabricator.wikimedia.org/T319170 (10isarantopoulos) I have provided a solution to the above issue in two different ways, both of which are available in: h...
[14:07:53] isaranto: stupid qs - how do I run the pytests for ores-legacy?
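A hedged sketch of the decorator idea and the three steps listed around [13:01:42] (check redis, make the Lift Wing call, store the result), while skipping error responses as noted at [13:02:55]. The names (cache_client, the wrapped function's signature) and the simplified key layout are illustrative, not the actual ores-legacy code; a real key would also carry the model version.

```python
import functools
import json


def redis_cached(cache_client):
    """Wrap an async Lift Wing call with a Redis check/set, assuming an async redis client."""

    def decorator(func):
        @functools.wraps(func)
        async def wrapper(wiki_id: str, model: str, rev_id: int, *args, **kwargs):
            # Simplified key; the real ORES keys also include the model version.
            key = f"ores:{wiki_id}:{model}:{rev_id}"
            cached = await cache_client.get(key)
            if cached is not None:
                return json.loads(cached)  # 1) cache hit: skip the Lift Wing call
            result = await func(wiki_id, model, rev_id, *args, **kwargs)  # 2) make the call
            if "error" not in result:  # avoid storing error responses
                await cache_client.set(key, json.dumps(result))  # 3) store for the next caller
            return result

        return wrapper

    return decorator
```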
[14:08:34] there is no such thing as a stupid question
[14:08:44] u know it but u make me say it :)
[14:08:51] w8
[14:10:01] I should have added it in README.md
[14:10:21] running `pytest` from the ores-migration dir should do it
[14:10:55] ah ok so I create a venv etc..
[14:11:21] not sure if we can have tox do it, or maybe it is too complex for our inference-service use case
[14:18:54] yes ofc we can
[14:19:11] I'll check, things are already there, perhaps just not configured to run the tests
[14:39:56] everything is there, it just needs a bit of tweaking
[14:40:23] currently I figured out some requirements issue
[14:40:52] but I can take the time another day to add running unit tests in CI properly.
[14:47:55] yes yes please, didn't mean to derail your work, not urgent, I can use a venv :)
[14:48:51] 10Lift-Wing, 10Machine-Learning-Team: Deploy Revert-risk wikidata model to ml-staging - https://phabricator.wikimedia.org/T333125 (10achou) The model inference server has been deployed to ml-staging: Test the model: ` aikochou@deploy1002:~$ curl "https://inference-staging.svc.codfw.wmnet:30443/v1/models/rever...
[14:48:53] what are your (plural) thoughts on having a thin shim service that can combine/parallelize multiple scores for one rev? Like the WME use case. It'd be a super simple service with no state (but potential for caching).
[14:50:38] klausman: atm I'd be in favor of building an ores-like cache in ores-legacy, and then figuring out a more generic and better caching (if any) for Lift Wing itself
[14:51:41] basically just add caching to what Ilias already created (that takes care of firing multiple lift wing calls, joining them into a single response etc..)
[14:52:04] so we don't have to write anything new for ores-legacy, which I hope we'll decom as well during this fiscal
[14:52:40] for a long-term vision of lift wing caching etc.. go ahead and propose it to the team, more than happy to support it
[14:52:50] (hope it makes sense)
[14:54:57] also Ilias please comment :)
[14:57:33] It's just an idea, I am not sure we should support it beyond making ORES transitions easier. Overall, I think clients making the calls in parallel and synthesizing what they need scales better than services doing it in 99% of cases (I can't think of a case for the 1%, but I can't rule it out, either).
[14:58:19] Another point is that the now-split Kafka topics indicate that this view (every score stands by itself, so to speak) is the cleaner approach.
[14:58:50] I feel there is some isomorphism to databases having orthogonal/normalized schemata here
[15:02:08] klausman: what I tried to say above is that Ilias' code already makes concurrent calls (via asyncio) in ores-legacy
[15:02:41] and aggregates the responses into one if needed
[15:03:37] Yeah, I think we should only do it for a non-ORES service if there is a compelling reason for it.
[15:04:03] sorry, rambling again :)
[15:05:49] nope talking out loud is good, we'll design better systems :)
[15:06:43] (03PS1) 10Elukey: WIP - Add read-only cache support to ores-legacy [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/922561 (https://phabricator.wikimedia.org/T337287)
[15:06:47] How do on-LW services connect to other on-LW services? Still through Discovery+PyBal? Or could you connect to a specific pod easily?
[15:08:13] discovery+pybal ends at the istio gateway "border", then everything is handled by the istio mesh
[15:08:42] so pod -> pod is more pod -> k8s svc -> pod most of the time, and external calls to services are handled transparently etc..
[15:09:04] kserve has a thing called "inference graph" to connect multiple isvcs to each other
[15:09:05] Neat
[15:09:22] https://kserve.github.io/website/0.10/modelserving/inference_graph/#headers-propagation
[15:09:32] it may be really interesting to explore for future use cases
[15:09:43] (afk for a bit)
[15:09:49] Another question re: caching: what is the expiry/caching strategy for redis? Expire the LRU entries?
[15:11:04] I am up for taking different approaches in ores caching and revisiting lift wing later on. LW caching is per use case. For some models a cache wouldn't make sense, but for others where some queries/calls/inputs repeat it is great
[15:11:17] AIUI, the freshness of the cache in the old scheme was mostly due to all (most) recent revisions being hit through /precache, and so the older revisions probably didn't survive in the cache for very long. So I wonder what the best strategy is for the future.
[15:12:16] klausman: it seems that there is a TTL in the ores cache set to 16 years in python code - but I am not sure if this is the actual TTL used
[15:12:37] Yeah, might be a default that should "never" be used.
[15:56:37] Logging off folks. Have a nice evening/rest of day
[16:00:18] IIUC the 16y TTL is used
[16:04:46] localhost:6378> TTL ores:itwiki:goodfaith:133595563:0.5.0
[16:04:47] (integer) 504318043
[16:04:47] Let's wait a bit to test it..
[16:04:49] lol
[16:04:51] :)
[16:05:26] but if the cache gets full then old (LRU) keys are dropped
[16:28:17] Yeah, I guess it's a last line of defense kinda thing, not something that regularly gets applied.
[16:41:45] going afk in a bit folks, have a nice rest of the day!
[17:37:35] 10Machine-Learning-Team, 10Patch-For-Review: Host open source LLM (bloom, etc.) on Lift Wing - https://phabricator.wikimedia.org/T333861 (10isarantopoulos) Added the above patch to deploy the bloom-3b model https://huggingface.co/bigscience/bloom-3b Since it requires additional resources I also increased the limi...
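For reference, the TTL value quoted at [16:04:47] is consistent with the 16-year constant from the ores code mentioned at [08:55:16]; a quick arithmetic check:

```python
# Quick arithmetic check of the TTL values quoted above.
ttl_config = 60 * 60 * 24 * 365 * 16  # the "16 years" constant: 504,576,000 seconds
ttl_observed = 504318043               # value returned by TTL in the redis-cli output above
age = ttl_config - ttl_observed        # ~257,957 seconds, i.e. the key is roughly 3 days old
print(ttl_config, ttl_observed, age)
```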