[07:17:02] morning!
[07:17:18] accraze: of course the fix for the editquality transformer was https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/759381
[07:17:21] I just realized it :)
[07:17:23] all pods up!
[08:08:00] Lift-Wing, Machine-Learning-Team (Active Tasks): ML Sandbox Transformer Configuration - https://phabricator.wikimedia.org/T299972 (kevinbazira) Thank you for working on this, @ACraze. I logged into the ml-sandbox and first checked whether the enwiki-articlequality isvc is up and running: ` root@ml-san...
[08:47:25] Lift-Wing, Epic, Machine-Learning-Team (Active Tasks): Istio gateways on ml-serve clusters spam syslog with warnings - https://phabricator.wikimedia.org/T300707 (elukey) Merged also https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/759441 that should improve a little our logging.
[09:22:40] Lift-Wing, Epic, Machine-Learning-Team (Active Tasks): Improve ml-serve's Istio logs - https://phabricator.wikimedia.org/T300707 (elukey)
[10:10:38] kevinbazira: o/ if you want to deploy the article quality change you can go ahead (already on deploy1002)
[10:11:43] the only difference with previous times is that you'll also need to deploy to ml-serve-cdofw
[10:11:46] err codfw
[10:40:08] * elukey lunch!
[12:16:26] elukey: great ... I'll start the deploy shortly
[12:34:04] Machine-Learning-Team, ORES, crosswatch: Crosswatch should use "damaging" instead of "reverted" model when available for a given wiki - https://phabricator.wikimedia.org/T122306 (Aklapper) Open→Declined Crosswatch seems unmaintained and inactive (see `T269703`). You may want to use https://me...
[12:52:38] elukey: the articlequality deployment on deploy1002.eqiad has been completed
[13:48:56] kevinbazira: nice!
[14:49:23] Machine-Learning-Team, serviceops: Move Docker settings for kubernetes workers to overlay fs - https://phabricator.wikimedia.org/T300744 (elukey) There are new ml-serve200* nodes to add to our codfw cluster, so if everybody likes the idea we could start from those to test overlay. As far as I can see ov...
[14:54:41] elukey: I have added helmfile commands for ml-serve-codfw to the deployment documentation: https://wikitech.wikimedia.org/wiki/User:Elukey/MachineLearning/Deploy#How_to_deploy
[14:55:02] thanks!
[15:52:00] o/
[15:53:25] o/
[15:54:19] elukey: glad you were able to figure out the editquality transformer!
[15:55:33] i got the networking issue sorted on ml-sandbox yesterday so kevinbazira and i can debug transformers more easily
[15:55:52] I saw it very nice! \o/
[15:56:03] thank you for fixing this Andy!
[15:57:15] just got our model_upload script working with minio on the ml-sandbox too
[15:57:58] almost have a fully working dev environment that mirrors prod
[15:58:06] 👏👏👏
[15:58:58] last step is to setup the minikube registry add-on so we can push dev images into the stack and then we're good
[16:00:55] we have now very nice json logs for ingress/egress gateways
[16:01:14] we'd need to build a kibana dashboard but it shouldn't be hard
[16:01:36] I am watching timings for connections to the mw api, I see from 50 to 80 ms
[16:02:54] niiiice!
[16:03:56] I am currently trying to test the circuit breaking protection
[16:04:09] but it doesn't seem to work properly, we'll see
[16:16:45] Machine-Learning-Team, ORES, Technical-Debt: Inject Config to ORESService, convert tests to unit tests - https://phabricator.wikimedia.org/T232440 (Aklapper)
[16:20:24] Lift-Wing, Machine-Learning-Team (Active Tasks): ML Sandbox Transformer Configuration - https://phabricator.wikimedia.org/T299972 (ACraze) Excellent, networking issues have been resolved and we can now run transformers on ml-sandbox. Marking this as RESOLVED.
[16:21:08] Lift-Wing, Machine-Learning-Team (Active Tasks): ML Sandbox Transformer Configuration - https://phabricator.wikimedia.org/T299972 (ACraze) In progress→Resolved
[16:41:42] Lift-Wing, Machine-Learning-Team: Sunset MiniKF sandboxes - https://phabricator.wikimedia.org/T293677 (ACraze) I have installed a minio test instance on ml-sandbox and am able to use it for model storage. I have also configured s3cmd to use minio and can use our model_upload script. ` root@ml-sandbox:/s...
[17:12:01] Machine-Learning-Team, serviceops: Move Docker settings for kubernetes workers to overlay fs - https://phabricator.wikimedia.org/T300744 (JMeybohm) The plan sounds pretty complete to me. >>! In T300744#7675239, @elukey wrote: > The question mark that I have, for the moment, is if the kube-api control pl...
[17:25:12] A little late start for me today but morning all!
[17:55:23] morning! I am logging off for today, have a good day folks :)
[17:55:58] accraze: FYI I am playing with some egress gw settings on ml-serve-eqiad, so if you test it etc.. it may throttle requests earlier
[17:56:35] ahhh ok sounds good, thanks for the heads up elukey!
[18:48:22] Machine-Learning-Team, ORES, artificial-intelligence: Research Project Idea: Use AI to suggest improvements to patches uploaded to gerrit - https://phabricator.wikimedia.org/T195235 (Aklapper)
[19:58:33] Lift-Wing, Machine-Learning-Team (Active Tasks): Sunset MiniKF sandboxes - https://phabricator.wikimedia.org/T293677 (ACraze) Open→In progress
[20:11:20] Lift-Wing, Machine-Learning-Team (Active Tasks): Sunset MiniKF sandboxes - https://phabricator.wikimedia.org/T293677 (ACraze) @kevinbazira - I believe model storage is now ready on ml-sandbox. Can you try these steps to see if you can upload a model binary to our minio object store? 1. In separate termi...
[20:48:32] ORES, artificial-intelligence, articlequality-modeling, Machine-Learning-Team (Active Tasks), Patch-For-Review: ORES deployment - Winter 2022 - nlwiki articlequality/hiwiki editquality/ores observability - https://phabricator.wikimedia.org/T300195 (ACraze) The articlequality PR has been merge...