[06:33:55] wow nice job!!
[08:59:34] ORES restarts completed
[09:14:17] Machine-Learning-Team: ML Serve controller vms show a slowly increasing resource usage leak over time - https://phabricator.wikimedia.org/T287238 (MoritzMuehlenhoff) >>! In T287238#7242078, @elukey wrote: >>>! In T287238#7240394, @elukey wrote: >> Deployed the new iptables to all ML buster clusters, prelimin...
[10:15:23] Machine-Learning-Team: ML Serve controller vms show a slowly increasing resource usage leak over time - https://phabricator.wikimedia.org/T287238 (jbond) > I had to manually install iptables and related dependencies manually even after https://gerrit.wikimedia.org/r/c/operations/puppet/+/708258/8/modules/pro...
[10:37:44] Machine-Learning-Team: ML Serve controller vms show a slowly increasing resource usage leak over time - https://phabricator.wikimedia.org/T287238 (elukey) >>! In T287238#7245798, @jbond wrote: >> I had to manually install iptables and related dependencies manually even after https://gerrit.wikimedia.org/r/c/...
[10:41:46] Machine-Learning-Team: ML Serve controller vms show a slowly increasing resource usage leak over time - https://phabricator.wikimedia.org/T287238 (MoritzMuehlenhoff) This variant looks good to me, but it should be used rarely and with caution (since it effectively ties software updates to a git commit bumpi...
[12:40:27] Machine-Learning-Team, Patch-For-Review: ML Serve controller vms show a slowly increasing resource usage leak over time - https://phabricator.wikimedia.org/T287238 (jbond) >>! In T287238#7245838, @MoritzMuehlenhoff wrote: > > This variant looks good to me, but it should be used rarely and with caution (...
[15:00:08] Great, thanks elukey
[15:22:51] :)
[15:23:00] I started to import kfserving 0.6.0 with https://gerrit.wikimedia.org/r/c/operations/docker-images/production-images/+/708783/
[15:23:11] I think that it may be good to just target that version
[15:23:19] (last released)
[15:23:58] or do we prefer 0.5.x?
[15:28:00] All else equal, I'd prefer to lock it in at as current a version as reasonable. It is easier to "upgrade" from 0.5.x to 0.6.0 now than when we have a ton of dependencies
[15:29:08] ack I'll try to go for 0.6.0, unless Andy / Kevin / Tobias disagree
[15:37:59] elukey: yeah i agree let's go for 0.6.0
[15:58:09] accraze: perfect thanks for confirming, the docker images are being built as we speak
[15:58:31] I'll update the kfserving chart code review with the new yaml from upstream
[15:58:53] still looking how to add the certificate for the webhook, but it shouldn't take much
[15:59:00] (famous last words)
[16:04:27] lol
[16:11:14] are we abandoning the idea of having a separate directory in deployment-charts?
[16:12:49] starting to think about how our inference services will fit into helm etc..
[16:12:59] klausman thoughts on 0.5.x vs 0.6.0?
[16:18:32] I agree with the "latest useful" approach
[16:19:07] There's really no reason to use an older version if there are no showstopping bugs and asjunct software doesn't require an older version.
[16:19:16] adjunct*
[16:21:51] accraze: re: separate dir, the current status is https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/700470
[16:22:01] see kubeflow-kfserving-services
[16:22:11] err -inference
[16:22:49] (I was finally able to make the helm linting work)
[16:24:08] ah wait do you mean in the services dir in helmfile?
[16:24:26] ^ yeah services dir in helmfile
[16:25:18] ah ok that part is still unclear, but we may want to follow the same approach as in https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/708475
[16:25:24] that worked nicely for admin_ng
[16:26:28] so from the point of view of deploying models, I think that we'll just need to populate a simple list of values as indicated in the chart example
[16:26:59] (following basically what people do to deploy services in the deployment pipeline)
[16:27:36] does it make sense?
[16:28:16] we'll likely have a dir under "services" called "inference" or similar
[16:28:27] ahh ok i think that makes sense to me
[16:34:27] Machine-Learning-Team, artificial-intelligence, Wikilabels, articlequality-modeling: Build article quality model for Dutch Wikipedia - https://phabricator.wikimedia.org/T223782 (Ciell) PR is done, new quality scale proposed here: https://nl.wikipedia.org/wiki/Wikipedia:ORES/Kwaliteitsschaal_voor_...
[17:19:31] going afk! ttl :)
[17:20:00] see ya elukey
[17:27:03] night elukey! Great work this week
[18:04:25] Machine-Learning-Team, artificial-intelligence, Wikilabels, articlequality-modeling: Build article quality model for Dutch Wikipedia - https://phabricator.wikimedia.org/T223782 (Halfak) We're unblocked with new work. We have new code ready for modeling/testing that improved unsourced content det...
[18:34:30] Lift-Wing, ML-Governance, Machine-Learning-Team (Active Tasks): Outlinks model card - https://phabricator.wikimedia.org/T287527 (Isaac) @ACraze my first pass at some of these for outlinks-based topic classification model: ` == Basic Details – Title: Outlinks-based Wikipedia Topic Classification – De...
[18:38:04] Lift-Wing, ML-Governance, Machine-Learning-Team (Active Tasks): Outlinks model card - https://phabricator.wikimedia.org/T287527 (calbon) OH wow @Isaac this is great! We are currently discussing if Wikitech or MediaWiki.org is a better spot for the model cards. And @Htriedman is working on a model car...
[18:52:12] Lift-Wing, ML-Governance, Machine-Learning-Team (Active Tasks): Outlinks model card - https://phabricator.wikimedia.org/T287527 (Isaac) > OH wow @Isaac this is great! We are currently discussing if Wikitech or MediaWiki.org is a better spot for the model cards. And @Htriedman is working on a model ca...