[02:45:32] Machine-Learning-Team, artificial-intelligence, Wikilabels, articlequality-modeling: Build article quality model for Dutch Wikipedia - https://phabricator.wikimedia.org/T223782 (Halfak) Here's the importance table. The higher the importance score, the more important the value is to the predictio...
[05:21:20] Lift-Wing, Machine-Learning-Team (Active Tasks): Production images for ORES/revscoring models - https://phabricator.wikimedia.org/T279004 (elukey) Little nit: Bullseye is still not officially released, so our images are based on what upstream offers right now. It seems fine to keep using those, but there...
[06:16:56] Lift-Wing, artificial-intelligence, revscoring, Machine-Learning-Team (Active Tasks): Create generic revscoring inference service - https://phabricator.wikimedia.org/T283526 (kevinbazira) Yes, this makes sense. Thank you for thinking through the repo structure @ACraze. The previous structure was...
[08:26:34] Hello all,
[08:26:36] I am a
[08:29:53] Anubhav Sharma. The Google Summer of Code intern for this year. I will be working on Lift Wing under the mentorship of chrisalbon. It would be a pleasure of mine to work with you all. Hope I could be productive and can contribute to the project. Thank you
[08:32:45] \nick anubhav-sharma13
[08:38:15] anubhav-sharma13: hi! Welcome! (Luca, SRE)
[08:38:35] please let us know if you need anything and/or if you are blocked with access requests etc..
[08:39:09] yes I was previously but now I got the access once I re-registered on irc. I am still new to it so might face some issues.
[08:42:17] yes yes please take your time, no rush :) If you encounter some wmf-access-specific trouble let us know
[08:42:34] (I imagine that you'll need an account etc.. not sure if Chris already shared some details)
[08:43:42] \nick anubhav_sharma
[08:44:38] Thank You elukey. I will let you know if I face any issue regarding that
[11:40:31] * elukey lunch!
[15:43:53] today I learned something interesting (could be obvious to people but I got it only today :D)
[15:44:14] istioctl is not coupled with istio-operator, they are different ways to do the same thin
[15:44:27] *thing (namely implement an operator)
[15:44:31] https://istio.io/latest/docs/setup/install/operator/
[15:44:57] so I created https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/697938 as a proposal for SRE
[15:45:14] it needs to be coupled with a debian package that provides istioctl binaries (for the various versions)
[15:45:25] (that should be very easy to make)
[15:45:33] let's see what SRE thinks about it
[16:15:23] ^ elukey: this is interesting, i think i was under the assumption that istioctl and the operator were much more tightly coupled
[16:17:06] accraze: o/ yes me too! Instead they are basically separate things doing the same
[16:17:33] and it makes sense - in minikube I don't see the istio-operator pod
[16:18:02] so istioctl is basically a glorified helm wrapper
[16:18:51] oh i see, that actually kinda makes sense now
[16:19:46] so in some sense, we don't really need istioctl now?
[16:20:15] in theory we could map the helm charts directly to our deployment-charts repository, and deploy those
[16:20:42] ahh ok i gotcha, yeah that would be cool!
[16:21:23] but after a chat with various people in SRE (Giuseppe, Tobias, Alex, Janis, etc..) I realized that there is always the risk that upstream changes something that is painful to map to helm charts
[16:21:46] istioctl for us is basically a black box, we know that helm is used but they could swap it with anything
[16:22:05] especially for upgrades, mapping the helm charts every time etc.. might not be fun
[16:22:25] so istioctl may be a good compromise for the moment (not sure if it makes sense)
[16:23:39] hmm, yeah that makes sense. also, there are plans for istio & knative to be optional in the kfserving stack in the future, so maintaining a custom map to helm charts might be more work than it's worth
[16:23:54] exactly that too
[16:24:27] knative and cert-manager worry me a bit since they have very long yaml files with a ton of custom resources rbacs etc..
[16:24:48] so those will probably need helm charts
[16:24:55] same thing for kfserving
[16:28:26] which helm version are we using at the foundation? helm3 (the one w/o tiller)?
[16:28:31] yep!
[16:28:52] I mean, most of the services in kubernetes are still using tiller + helm 2, but the standard is 3
[16:29:02] (there will be a slow move to helm 3)
[16:29:58] ok cool, i'm trying to start getting my dev tooling to match our internal setup
[16:30:41] also working on using a base image from the wmf docker registry for the inference services
[16:30:51] perfect :)
[16:31:39] iirc the ores models were mostly developed on jessie and had some issues upgrading
[16:32:16] for the pickle serialization?
[16:32:24] i believe so
[16:32:43] will be testing that a bit more today
[16:34:45] some of the python scientific computing stack (scipy/numpy/etc) is still not fully supported on the latest version of python (3.9+), so bullseye might be hard to target
[16:34:53] going to shoot for buster and see how it goes
[16:35:38] yes I think it is a good bet
[16:35:50] bullseye is still not released so there is no rush to upgrade
[16:50:08] accraze: one thing that we should figure out is if we could avoid cert-manager at all
[16:50:24] because in my mind we'd need just one certificate for the istio-ingress
[16:50:35] that we can generate manually
[16:51:04] agreed, it would be great if we can just use the internal pki service
[16:51:05] IIUC the webhooks are needed to allow kfserving to create separate certs for various services, deploying them to istio or similar
[16:51:27] accraze: nono even without the internal pki service
[16:51:58] there is a tool called cergen that we (SREs) use to create certs from the puppet ca (it is used for all k8s services basically)
[16:52:11] and then the key is deployed as secret to k8s
[16:52:34] oh that'd be even better!
[16:53:04] i'm still not 100% sure why cert-manager is needed if you run kfserving as standalone
[16:53:46] I think it is what is used to create the certs that the istio envoy sidecars use, if enabled
[16:54:08] ah ok that'd make sense, i don't see us needing that right away
[16:55:30] although i did run into some issues last week getting a transformer to communicate with a predictor when i had the sidecars disabled
[16:55:42] ah interesting
[16:55:52] what kind of issues?
[16:56:10] having encryption between pods in the same dc-cluster seems a lot
[16:56:18] ^ agreed
[16:56:47] the issue i ran into is that the transformer pod was unable to connect to the predictor pod using the cluster local gateway
[16:56:59] (using the local svc address)
[16:57:07] were you able to solve it or did you have to enable the sidecar?
[16:58:16] no i still haven't solved it, but it could also be related to how auth is set up on the sandboxes. minikf on aws uses dex for basic auth but requires some additional steps when calling a service endpoint
[16:58:42] it's on my list of todos for the rest of the week
[17:16:20] Lift-Wing, Machine-Learning-Team (Active Tasks): Prepare 4 ORES English models for Lift Wing - https://phabricator.wikimedia.org/T272874 (calbon) Open→Resolved
[17:25:52] AI-Governance, Lift-Wing, ORES, artificial-intelligence, Machine-Learning-Team (Active Tasks): Create Draft Model Deployment Guidelines - https://phabricator.wikimedia.org/T276598 (calbon)
[17:25:55] AI-Governance, Lift-Wing, ORES, artificial-intelligence, Machine-Learning-Team (Active Tasks): Create Draft Model Deployment Guidelines - https://phabricator.wikimedia.org/T276598 (calbon)
[17:25:57] AI-Governance, Machine-Learning-Team (Active Tasks): Talk with WMF Trust & Safety team - https://phabricator.wikimedia.org/T276599 (calbon) Open→Resolved
[17:25:59] AI-Governance, Lift-Wing, ORES, artificial-intelligence, Machine-Learning-Team (Active Tasks): Create Draft Model Deployment Guidelines - https://phabricator.wikimedia.org/T276598 (calbon)
[17:26:02] AI-Governance, Machine-Learning-Team (Active Tasks): Talk with WMF Research Team - https://phabricator.wikimedia.org/T276600 (calbon) Open→Resolved
[17:26:08] AI-Governance, Machine-Learning-Team (Active Tasks): Talk with WMF Product Team - https://phabricator.wikimedia.org/T276602 (calbon) Open→Resolved
[17:26:10] AI-Governance, Lift-Wing, ORES, artificial-intelligence, Machine-Learning-Team (Active Tasks): Create Draft Model Deployment Guidelines - https://phabricator.wikimedia.org/T276598 (calbon)
[17:26:15] AI-Governance, Machine-Learning-Team (Active Tasks): Have Public Comment Period - https://phabricator.wikimedia.org/T276604 (calbon) Open→Resolved
[17:26:23] accraze: very interesting https://github.com/kubeflow/kfserving/pull/1646
[17:26:46] istio 1.10 seems not to work yet (but there is a familiar committer solving the problem :)
[17:29:14] elukey: nice, theo to the rescue!
[17:29:49] I asked in their slack channel a question about webhooks and docs
[17:29:55] I should start following it
[17:30:00] \nick anubhav_sharma
[17:30:27] that's a good point, i should probably get on that slack channel too
[17:31:19] accraze: not sure if you have already met anubhav_sharma, we had a quick chat in here today!
[17:31:56] o/ hi anubhav_sharma
[17:32:25] hi accraze I am the google summer of code intern, nice to meet you.
[17:33:12] excited to see your work, the gsoc proposal sounded like an interesting approach :)
[17:34:34] Thank You !! I really wish it turns out great as well.
[17:39:49] feel free to ping me if you have any questions or need help with any wmf-related stuff
[17:43:50] Thank you accraze. Yeah surely will let you know
[17:43:57] Machine-Learning-Team, artificial-intelligence, Wikilabels, articlequality-modeling: Build article quality model for Dutch Wikipedia - https://phabricator.wikimedia.org/T223782 (RonnieV) Thanks for the great meeting we had today! Could the output of https://ores.wikimedia.org/v3/scores/nlwiki/12...
[18:00:44] Hey Anubhav!
[18:01:19] Yeah accraze we are just getting him set up with everything. Had a small issue with IRC but it looks like that was resolved
[18:03:41] nice
[18:06:53] * elukey afk! o/
[18:09:09] night elukey!
[18:44:32] cool, so it seems we can run the revscoring inference service using the wmf buster image just fine
[18:46:07] only difference is that we are now running on python 3.7 instead of python 3.8
[18:49:54] Lift-Wing, Machine-Learning-Team (Active Tasks), Patch-For-Review: Production images for ORES/revscoring models - https://phabricator.wikimedia.org/T279004 (ACraze) I ran into some issues upgrading the revscoring inference service base image to bullseye (mostly since scipy & numpy have some issues wi...
[23:35:45] very interesting, i've been monitoring the enwiki-goodfaith inference service while doing some load tests today and it seems that each pod only uses ~180MB memory and very little cpu. I honestly was expecting it to need _much_ more RAM...
[23:36:40] network i/o is all over the place though due to revscoring fetching entire article text over the wire
[23:37:10] might need to cache w/ redis or eventually do an online feature store or something
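
A minimal sketch of the caching idea from the last message, assuming revision text can be looked up by revision ID and that a repeated request for the same revision may reuse the earlier result. `RevisionTextCache` and `fake_fetch` are hypothetical names used only for illustration; they are not part of revscoring or kfserving, and the real fetch path is not shown in the log. The cache here is an in-process LRU; a Redis-backed variant would replace the `OrderedDict` with a shared store so that all pods benefit from the same entries.

```python
from collections import OrderedDict
from typing import Callable


class RevisionTextCache:
    """Tiny in-process LRU cache keyed by revision ID (hypothetical helper)."""

    def __init__(self, fetch_text: Callable[[int], str], max_entries: int = 1024):
        self._fetch_text = fetch_text  # the slow, over-the-wire lookup
        self._max_entries = max_entries
        self._entries: "OrderedDict[int, str]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, rev_id: int) -> str:
        if rev_id in self._entries:
            # Mark as most recently used and serve from memory.
            self._entries.move_to_end(rev_id)
            self.hits += 1
            return self._entries[rev_id]
        # Cache miss: fetch once, store, and evict the oldest entry if full.
        self.misses += 1
        text = self._fetch_text(rev_id)
        self._entries[rev_id] = text
        if len(self._entries) > self._max_entries:
            self._entries.popitem(last=False)
        return text


def fake_fetch(rev_id: int) -> str:
    """Stand-in for the real network fetch; returns placeholder text."""
    return f"text of revision {rev_id}"


cache = RevisionTextCache(fake_fetch, max_entries=2)
for rev_id in (1, 2, 1, 3, 2):
    cache.get(rev_id)
print(cache.hits, cache.misses)
```

The hit and miss counters make it easy to check during a load test whether the cache is actually absorbing repeated fetches before investing in a shared store.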