[10:39:35] * elukey lunch!
[12:25:46] Machine-Learning-Team, ORES: ORES worker icinga message not specific enough - https://phabricator.wikimedia.org/T181536 (fgiunchedi) -observability for backlog cleanup
[12:26:01] Machine-Learning-Team: Monitoring for top IPs and User-Agents hitting the ORES service - https://phabricator.wikimedia.org/T181542 (fgiunchedi) -observability for backlog cleanup
[16:02:13] this is a starting point for the kserve network policies https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/732939
[16:02:46] I had a chat with Janis and there may be a better way to do it, without adding the master ip nodes
[16:03:19] basically the webhook port (running as part of the kserve controller) needs to be reachable only from the Kubernetes API
[16:03:24] hosts
[16:08:48] ITS FRIDAY
[16:09:00] I'm so tired
[16:13:02] o/
[16:13:42] chrisalbon: i feel that -- happy friday y'all
[16:21:20] elukey: do the kserve network policies include cluster-local gateway?
[16:22:06] oh wait that's at the istio level nvm
[16:39:53] accraze: o/ we'll go block-by-block, the istio rules hopefully will be shared with service ops (even if they don't need the cluster local gw)
[16:40:37] I thought about upgrading knative to 0.19.x (to see if it could work on k8s 1.16)
[16:40:46] and then get rid of the cluster local gateway
[16:41:00] but better to freeze the stack right now :)
[16:42:01] lol yeah i agree let's freeze the stack for now and then upgrade k8s
[16:45:47] accraze: did you have the chance to go through the deployment docs? If so, do they make sense?
[16:46:19] elukey: i read through them yesterday and it made sense at a high level
[16:46:38] today i plan to try out articlequality and see how it goes :)
[16:47:19] assuming that goes well, we could probably migrate off the sandbox clusters next week
[16:50:37] accraze: so for the moment you don't have +2 to the deployment-charts repo (I need to investigate if SREs only can merge or not), so if you have time we can try now (or in a bit)
[16:50:42] otherwise next week :)
[16:52:36] elukey: yeah i was actually starting now but also dont want to make you stay late so whatever works best for you
[16:52:58] or i can send CRs today and merge next week
[16:54:11] accraze: ah article quality is another revscoring group/category right?
[16:54:22] if so I have to create some users/namespace/etc.. first
[16:54:28] yeahh.... its a bit more involved
[16:54:51] okok so before your code change I need to do some work in puppet :)
[16:55:20] ah ok that makes sense, we can hold off till next week if you want
[17:00:51] accraze: the process is basically what is listed in https://phabricator.wikimedia.org/T293858
[17:01:13] if you want you can definitely work on something like https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/732347/
[17:01:48] but it will need some pre-work before being mergeable
[17:02:07] and https://wikitech.wikimedia.org/wiki/User:Elukey/MachineLearning/Deploy#How_to_add_a_new_helmfile_config is the overall idea
[17:02:46] so let's proceed in parallel, and on Monday we can sync so you can tell me what are the confusing parts etc..
[17:02:52] how does it sound?
[17:03:08] elukey: cool that sounds great!
[17:42:38] Lift-Wing, articlequality-modeling, Machine-Learning-Team (Active Tasks): Add enwiki-articlequality inference service to LiftWing - https://phabricator.wikimedia.org/T294141 (ACraze)
[17:43:59] Lift-Wing, articlequality-modeling, Machine-Learning-Team (Active Tasks): Add enwiki-articlequality inference service to LiftWing - https://phabricator.wikimedia.org/T294141 (ACraze)
[17:47:14] have a good weekend folks!
[17:47:16] * elukey afk!
[18:40:15] Lift-Wing, artificial-intelligence, articlequality-modeling, Machine-Learning-Team (Active Tasks): Add enwiki-articlequality inference service to LiftWing - https://phabricator.wikimedia.org/T294141 (ACraze) Step One: Upload model binary to Thanos Swift - s3://wmf-ml-models/articlequality/enwiki/...
[23:12:49] Lift-Wing, Machine-Learning-Team, ORES, artificial-intelligence, and 2 others: Developing the `algo-accountability` repository - https://phabricator.wikimedia.org/T290746 (Htriedman) I have some more updates after working on the [[ https://gitlab.wikimedia.org/htriedman/algo-accountability | algo...