[04:02:10] 10Machine-Learning-Team, 10ORES, 10Wikidata, 10Wikidata-Query-Service, 10Discovery-Search (Current work): Estimate how many Wikidata items have low/no ORES score - https://phabricator.wikimedia.org/T288262 (10AKhatun_WMF) @ACraze Indeed! I was confusing the models for revision (item quality) with edits (...
[10:01:41] hello folks
[10:01:43] I found https://redis.com/blog/feast-with-redis-tutorial-for-machine-learning/
[10:01:46] that is very interesting
[10:02:23] and also https://redis.com/blog/building-feature-stores-with-redis-introduction-to-feast-with-redis/
[10:10:23] I have a clearer picture of what Feast needs. IIUC:
[10:11:57] - the online feature store part can handle multiple Redis nodes, but they need to be in a Redis Cluster. In our case we could use a proxy like https://github.com/twitter/twemproxy or similar that takes care of the sharding (so in the Feast config we could simply add a single endpoint). No idea how complicated it is to bootstrap a Redis Cluster though.
[10:12:46] - the Feast feature registry seems to need an object store to save info, like S3 (we could probably use Swift). There is also the "local" option, but I fear that it is not for production.
[10:13:38] - there is also the (optional) possibility to have the Feast Online Serving API part, which should be the light Java stack that Theo mentioned (a proxy between the Feast client and Redis IIUC)
[10:14:01] --
[10:14:24] in our case we could probably materialize data periodically to the online feature store using Airflow
[10:14:31] (not running on Kubernetes)
[10:16:37] (because of the Kerberos barrier)
[10:17:14] from what I can see, it seems that Feast needs the offline part to work as well, for example:
[10:17:50] 1) a Feast client running on Airflow loads data into a dataframe using Spark or similar (the example uses pandas, but Spark is supported as well).
[10:18:08] 2) the same client materializes the data to the online Redis nodes
[10:18:21] 3) the Feast client on Lift Wing pods fetches the data
[10:22:31] the above looks very nice and doable, the only horror that we'll need to solve is authentication of pods training models on the trainwing cluster
[10:22:52] because IIUC those pods will run a Feast client, which in turn uses Spark or similar to load data into dataframes
[10:23:21] the offline part in our case would basically be the DE infrastructure
[10:23:33] (either Spark loading dataframes or us fetching data from Hive)
[10:23:42] --
[10:24:18] my 2c - we need to spend time on the feature store, but for the MVP use case and the immediate concerns, like the ORES models, we don't really need it
[10:24:30] we need something like a score cache
[10:24:43] (that could live in the same Redis cluster(s))
[10:25:44] so on the procurement front, we could unblock the orders of 3+3 nodes (eqiad/codfw) for the online feature store / score cache
[10:26:14] and avoid any order (for the moment) for the offline use case (we have two generic nodes, but at this point I'm not sure if we need them)
[10:26:39] the online eqiad/codfw caches/clusters could easily sync via Redis replication
[10:26:50] so that we'd need to load only one cluster (likely the eqiad one)
[10:27:05] EOF
[10:27:06] :)
[10:27:24] I'll add my thoughts to the task
[11:39:36] 10Lift-Wing: Implement an online feature store - https://phabricator.wikimedia.org/T294434 (10elukey) Reporting some thoughts from IRC: ` hello folks I found https://redis.com/blog/feast-with-redis-tutorial-for-machine-learning/ that is very interesting and also https://redis.com/blog/building-feature-stores-wi...
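[Editor's note] Pulling together the setup sketched above (Redis behind a single proxy endpoint for the online store, an S3-compatible object store such as Swift for the registry), a Feast `feature_store.yaml` might look roughly like the following. This is a hedged sketch, not a tested config: the project name, bucket, and endpoint are hypothetical placeholders, and the exact keys should be checked against the Feast version in use.

```yaml
# feature_store.yaml - sketch only; names are placeholders
project: lift_wing
# registry stored on an S3-compatible object store (e.g. Swift with an S3 API)
registry: s3://feast-registry-bucket/registry.db
provider: local
online_store:
  type: redis
  # single endpoint, e.g. a twemproxy in front of the sharded Redis nodes
  connection_string: "redis-proxy.example.wmnet:6379"
```

With a config along these lines, the Airflow job would run the materialization step against this file, and the Lift Wing pods would point their Feast clients at the same registry to fetch online features.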
[12:03:44] * elukey lunch
[13:11:05] I am rebooting the orespoolcounter nodes in eqiad (requested by SRE)
[13:11:12] one at a time, with some delay
[13:16:44] https://www.applyconf.com/apply-meetup-february-2022/#agenda
[13:16:51] Using Redis as your Online Feature Store: 2021 highlights & 2022 directions
[13:17:08] Twitter's Feature Store Journey
[13:17:09] etc..
[13:17:11] :D
[13:17:24] worth joining
[13:33:54] 10Lift-Wing, 10Machine-Learning-Team (Active Tasks): Fix pipeline image publishing workflow - https://phabricator.wikimedia.org/T297823 (10hashar) I have manually triggered CI postmerge builds on the latest change that touched .pipeline/config.yaml: https://gerrit.wikimedia.org/r/c/machinelearning/liftwing/inf...
[14:15:49] I have just tried to deploy https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/753461 to ml-serve-eqiad
[14:15:52] and it seems to be working fine
[14:16:04] basically there will be one Istio ingress pod on each k8s node by default
[14:16:13] to avoid the extra kube-proxy hop
[14:33:22] (also done for the cluster-local-gateway, all working)
[15:22:49] another nice Feast feature: https://docs.feast.dev/reference/alpha-stream-ingestion
[15:22:57] (stream ingestion to the online feature store)
[15:45:44] morning!
[15:46:59] o/
[15:55:32] 10Lift-Wing: Implement an online feature store - https://phabricator.wikimedia.org/T294434 (10calbon) Thanks for this Luca. I thought about it yesterday and came to a similar conclusion (score cache >> online feature store) but for slightly different reasons. Mainly, I think that you have done well exposing is t...
[16:12:56] o/
[16:15:13] gm
[16:15:44] dang elukey, you're really digging deep on feast!
[16:16:03] thanks for the links :)
[16:21:19] i agree with your take on needing a score cache for the revscoring models
[16:22:03] accraze: sooo many things to read!
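[Editor's note] The score cache agreed on above (keep a model's score for a given revision, with an expiry, instead of standing up a full feature store) is simple enough to sketch in a few lines. The class name and `model:rev_id` key layout below are hypothetical illustrations; in production the dict would be replaced by Redis (`SET` with an `EX` expiry and `GET`), which is also what makes the eqiad→codfw sync via Redis replication come for free.

```python
import time

class ScoreCache:
    """In-memory sketch of the proposed score cache: one score per
    (model, revision) key, with a TTL. A production version would issue
    the equivalent Redis commands (SET key value EX ttl / GET key)."""

    def __init__(self, ttl_seconds=3600):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (expiry_timestamp, score)

    @staticmethod
    def _key(model, rev_id):
        # composite key, e.g. "enwiki-damaging:12345" (layout is illustrative)
        return f"{model}:{rev_id}"

    def set(self, model, rev_id, score):
        self._store[self._key(model, rev_id)] = (time.time() + self.ttl, score)

    def get(self, model, rev_id):
        entry = self._store.get(self._key(model, rev_id))
        if entry is None:
            return None
        expiry, score = entry
        if time.time() > expiry:
            # expired: drop it, as Redis would do automatically
            del self._store[self._key(model, rev_id)]
            return None
        return score
```

Usage would be the obvious read-through pattern on Lift Wing: check the cache by model and revision, and only run the model (and `set` the result) on a miss.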
[16:22:11] lololol
[16:22:31] feast looks very cool
[16:22:37] but also not super straightforward
[16:24:29] yeah i think we should give ourselves time to dive deep on feast/feature stores etc
[16:25:47] an online score cache won't be as complex and is pretty good for MVP
[16:26:57] a feature store is a nice-to-have for the revscoring models, but I imagine integrating it later with the transformers should be fairly straightforward
[16:27:07] famous last words
[16:27:08] :D
[16:31:07] haha too true!
[16:59:05] 10Lift-Wing, 10Machine-Learning-Team (Active Tasks): Fix pipeline image publishing workflow - https://phabricator.wikimedia.org/T297823 (10ACraze) 05Open→03Resolved a:03ACraze Awesome, thank you for all your help @hashar! Things look good on my end, going to mark this task as RESOLVED
[17:18:57] 10Lift-Wing, 10Machine-Learning-Team (Active Tasks), 10Release Pipeline (Blubber): Inference Service pipeline intermittent failures - https://phabricator.wikimedia.org/T298995 (10ACraze) I updated the base image to the most recent version of [[ https://docker-registry.wikimedia.org/buster/tags/ | buster ]] a...
[17:21:04] 10Lift-Wing, 10Machine-Learning-Team (Active Tasks), 10Patch-For-Review: Draftquality transformer - https://phabricator.wikimedia.org/T298989 (10ACraze) a:03ACraze
[21:00:23] So, a twist in the scoring cache conversation. I was talking to Dan and Olja, and they thought that the caching and serving of scores might be something that would fit better on their team. Let's continue with the server procurement as we discussed (long story short: DE's potential boxes for a scoring cache are not approved by finance, but ours are, so we should order them), but also, klausman, let's schedule a meeting with data engineering to see how things might work out.
[21:23:08] Ayup!
[21:23:34] Currently digging through old Blue Note 7"s, will do so tomorrow :)
[21:32:58] klausman: nice!
blue note 7-inches sound like a great way to chill out :)
[21:33:13] Some of them are noise fests, tho :)
[21:33:44] And Christ, they used to cut them hot. Nearly no headroom.
[21:34:29] yeah some of the older recordings are kinda brutal with the compression lol
[21:35:56] and gain
[22:03:11] wow! congrats to SiMaig for figuring out how to run revscoring on apple m1 https://github.com/wikimedia/revscoring/issues/310#issuecomment-1011277305
[23:28:28] nice!