[03:36:43] Lift-Wing: Workflow to upload models to Swift - https://phabricator.wikimedia.org/T294409 (ACraze)
[03:36:46] Lift-Wing, Machine-Learning-Team (Active Tasks), Patch-For-Review: Naming convention for the model storage structure - https://phabricator.wikimedia.org/T280467 (ACraze)
[07:20:34] Lift-Wing, Machine-Learning-Team: Sunset MiniKF sandboxes - https://phabricator.wikimedia.org/T293677 (elukey) >>! In T293677#7472727, @ACraze wrote: > Echoing thoughts from the ML team meeting today. I'd like to deprecate the sandbox clusters this week, but I think both Kevin and myself have similar qu...
[07:22:12] good morning!
[07:22:31] patch for kserve merged, in 0.8 we'll have better messages etc. for the storage initializer
[07:22:48] they are also working on a helm chart for kserve afaics
[08:40:30] Lift-Wing, Machine-Learning-Team (Active Tasks): API Gateway Integration - https://phabricator.wikimedia.org/T288789 (elukey) First set of patches merged, we have now a global map for URL/Host-header rewrites per cluster in the API-Gateway's envoy config. Next steps: 1) Try to understand how to add a si...
[09:15:34] let's see if https://github.com/tensorflow/io/issues/1548 gets reviewed/merged (maybe it needs a few more iterations)
[09:15:57] it should allow us to use tensorflow-rocm with tensorflow-io when we upgrade to 2.6
[09:16:31] the newest 4.3.1 ROCm drivers work only with tf 2.6, and there was a big split of functionality: the hdfs libs moved out to tensorflow-io
[09:16:48] tensorflow-io hardcodes a dep on tensorflow (not tensorflow-rocm), which doesn't play nice with tensorflow-rocm, etc.
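A rough way to see the tensorflow-io / tensorflow-rocm clash from the 09:16 messages: pip resolves requirements by distribution name, so `tensorflow-rocm` does not satisfy a plain `tensorflow` requirement even though both provide the `tensorflow` import package. The sketch below flags such a requirement. The helpers `normalize`, `requirement_name`, `pins_plain_tensorflow` and `installed_requirements` are hypothetical; only `importlib.metadata.requires` is a real stdlib call. Names are parsed with a simplified regex that ignores version specifiers, extras and environment markers, so treat it as an approximation of PEP 508, not a full parser.

```python
import re
from importlib import metadata


def normalize(name: str) -> str:
    """Canonicalize a project name per PEP 503: lowercase, runs of -_. become '-'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def requirement_name(req: str) -> str:
    """Return the normalized project name at the start of a requirement string.

    Simplified: version specifiers, extras and environment markers are ignored.
    """
    match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", req)
    if match is None:
        raise ValueError(f"cannot parse requirement: {req!r}")
    return normalize(match.group(1))


def pins_plain_tensorflow(requirements: list[str]) -> bool:
    """True if any requirement names the 'tensorflow' distribution itself.

    A 'tensorflow-rocm' install does not satisfy such a requirement, because
    pip matches on distribution names, not on the importable module.
    """
    return any(requirement_name(req) == "tensorflow" for req in requirements)


def installed_requirements(dist: str) -> list[str]:
    """Requirement strings declared by an installed distribution.

    Raises importlib.metadata.PackageNotFoundError if it is not installed.
    """
    return metadata.requires(dist) or []
```

For example, with made-up requirement lists, `pins_plain_tensorflow(["tensorflow>=2.6.0,<2.7.0", "numpy"])` is True while `pins_plain_tensorflow(["tensorflow-rocm==2.6.0"])` is False. On a real environment you would feed it `installed_requirements("tensorflow-io")`.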
[09:32:15] also opened https://github.com/magenta/magenta/issues/1958 for Magenta
[09:32:24] let's see what they say
[09:51:56] Lift-Wing, Machine-Learning-Team, Patch-For-Review: Add network policies to the ML k8s clusters - https://phabricator.wikimedia.org/T289834 (elukey) Added some labels to the eqiad cluster: - `node-role.kubernetes.io/master=""` and `node.kubernetes.io/disk-type=kvm` to ml-serve-ctrl1001 - `node.kuber...
[09:52:27] * elukey bbiab
[14:21:23] Lift-Wing, Machine-Learning-Team (Active Tasks), Patch-For-Review: API Gateway Integration - https://phabricator.wikimedia.org/T288789 (elukey) I was wrong about the Kubelet labels bit, see https://wikitech.wikimedia.org/wiki/Kubernetes/Clusters/New#Add_node_labels
[14:47:55] Lift-Wing: Factor out feature retrieve functionality to a transformer - https://phabricator.wikimedia.org/T294419 (elukey) Relevant for the discussion T294414 I am waiting a bit to see if we can share these functionalities (transformer + local proxy), or if every kserve pod will need to have both (I think t...
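The node labels added at 09:51 (e.g. `node.kubernetes.io/disk-type=kvm` on ml-serve-ctrl1001) only matter through whatever selects on them, such as a nodeSelector. As a toy illustration, assuming Kubernetes' documented equality-based selector semantics (`=`/`==` require the key with exactly that value, `!=` also matches objects without the key), here is a small matcher. `parse_term`, `matches` and `ML_SERVE_CTRL1001` are hypothetical names; set-based selectors (`in`, `notin`, bare keys) are not handled, and this is not how the scheduler or any network policy engine is actually implemented.

```python
def parse_term(term: str) -> tuple[str, str, str]:
    """Split one equality-based selector term into (key, operator, value)."""
    for op in ("!=", "==", "="):  # check two-char operators before '='
        if op in term:
            key, value = term.split(op, 1)
            # Accept the shell-quoted form used in the log, e.g. master="".
            return key.strip(), op, value.strip().strip('"')
    raise ValueError(f"unsupported selector term: {term!r}")


def matches(selector: str, labels: dict[str, str]) -> bool:
    """Return True if `labels` satisfies every comma-separated term (logical AND)."""
    for term in (t.strip() for t in selector.split(",")):
        if not term:
            continue
        key, op, value = parse_term(term)
        if op == "!=":
            # '!=' also matches objects that don't carry the key at all.
            if labels.get(key) == value:
                return False
        elif labels.get(key) != value:
            # '=' / '==' require the key to be present with exactly this value.
            return False
    return True


# Labels that the 09:51 message says were added to ml-serve-ctrl1001.
ML_SERVE_CTRL1001 = {
    "node-role.kubernetes.io/master": "",
    "node.kubernetes.io/disk-type": "kvm",
}
```

With these labels, `matches("node.kubernetes.io/disk-type=kvm", ML_SERVE_CTRL1001)` is True and `matches("node.kubernetes.io/disk-type=ssd", ML_SERVE_CTRL1001)` is False; note that an empty-valued label like `master=""` still has to be present to match.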
[15:42:48] I was reading https://www.waitingforcode.com/general-data-engineering/feature-stores-feast-example/read
[15:43:07] and feast + airflow may be really nice for us to work with
[15:43:25] the nice part is that feast itself is, IIUC:
[15:43:29] 1) client code
[15:43:50] 2) lightweight daemons for metadata (for example, a jvm in front of redis for online)
[15:44:30] so via airflow we'll be able to periodically use spark to create our feature datasets in the offline feature store (HDFS/Hive) and also to load datasets into the online feature store if we want (Redis)
[15:44:57] and this will not be on kubernetes, but basically on the DE platform, so no kerberos weirdness
[15:45:19] the main issue will be authenticating via kerberos from kubeflow on the ml-train k8s cluster
[15:45:45] so the feature store is, almost surely, something that we can work on ahead of time, before ml-train
[17:27:35] * elukey afk!
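A conceptual sketch of the offline to online step described at 15:44, assuming Feast-style materialization, i.e. copying the latest feature values per entity within a time window from the offline store into the online store. Everything here is hypothetical: `FeatureRow`, `materialize`, the half-open `[start, end)` window, and a plain dict standing in for Redis. It is not Feast's API; in the setup discussed, Airflow-scheduled Spark jobs would do the real work against HDFS/Hive and Redis.

```python
from collections.abc import Iterable
from dataclasses import dataclass
from datetime import datetime

# Hypothetical online-store layout: entity id -> (event timestamp, feature values).
# A plain dict stands in for Redis here.
OnlineStore = dict[str, tuple[datetime, dict[str, float]]]


@dataclass
class FeatureRow:
    """One row of a (hypothetical) offline feature table."""
    entity_id: str
    event_timestamp: datetime
    features: dict[str, float]


def materialize(offline_rows: Iterable[FeatureRow], online_store: OnlineStore,
                start: datetime, end: datetime) -> int:
    """Load the latest row per entity with start <= event_timestamp < end.

    Returns the number of entities written to the online store.
    """
    latest: dict[str, FeatureRow] = {}
    for row in offline_rows:
        if not (start <= row.event_timestamp < end):
            continue  # outside the materialization window
        current = latest.get(row.entity_id)
        if current is None or row.event_timestamp > current.event_timestamp:
            latest[row.entity_id] = row

    written = 0
    for entity_id, row in latest.items():
        existing = online_store.get(entity_id)
        # Never replace a fresher value already in the online store
        # (e.g. when an older window is re-materialized).
        if existing is None or existing[0] < row.event_timestamp:
            online_store[entity_id] = (row.event_timestamp, row.features)
            written += 1
    return written
```

The online store only ever holds one value per entity, which is what makes it cheap to serve at inference time; the full history stays in the offline store for building training sets.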