[08:16:20] Machine-Learning-Team, observability: Improve ORES observability - https://phabricator.wikimedia.org/T299137 (elukey) Created https://github.com/wikimedia/ores/pull/355 to add more logging.
[08:19:24] Machine-Learning-Team: ML Serve controller vms show a slowly increasing resource usage leak over time - https://phabricator.wikimedia.org/T287238 (elukey) Open→Resolved This task can be closed!
[08:20:24] Machine-Learning-Team, observability, Patch-For-Review: logstash schema for ORES logs - https://phabricator.wikimedia.org/T299999 (elukey) a:elukey
[08:27:12] Lift-Wing, Machine-Learning-Team (Active Tasks), Patch-For-Review: Integrate cert-manager/issuer in ml-serve clusters - https://phabricator.wikimedia.org/T298976 (elukey) This task is currently blocked by T299906
[12:43:20] * elukey lunch!
[15:09:33] morning all!
[15:10:56] morning!
[15:37:11] o/
[15:37:16] o/
[16:07:51] Lift-Wing, Epic, Machine-Learning-Team (Active Tasks): Lift Wing proof of concept - https://phabricator.wikimedia.org/T272917 (calbon)
[16:13:19] I added an epic tag
[16:19:44] Lift-Wing, Machine-Learning-Team (Active Tasks), Patch-For-Review: Draftquality transformer - https://phabricator.wikimedia.org/T298989 (ACraze) Deployment for the new transformer is currently blocked on T298976
[16:32:09] Lift-Wing, Machine-Learning-Team (Active Tasks), Patch-For-Review: Editquality Transformer - https://phabricator.wikimedia.org/T298943 (ACraze) It seems the editquality-transformer image has not been published yet. I think this is due to the integration/config patchset being merged after the pip...
[19:18:24] cert-manager works!!
[19:18:33] all certs now working
[19:18:40] and on the pki
[19:20:35] I am updating the codfw cluster as well
[19:20:48] nice one elukey!
[19:22:43] accraze: we can now deploy the draft quality transformers
[19:24:08] awesome!
the CR that adds it is https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/756064
[19:25:08] accraze: merged, puppet should finish in a sec, do you want to deploy?
[19:25:31] Lift-Wing, Epic, Machine-Learning-Team (Active Tasks): Lift Wing proof of concept - https://phabricator.wikimedia.org/T272917 (elukey)
[19:26:11] Lift-Wing, Machine-Learning-Team (Active Tasks), Patch-For-Review: Integrate cert-manager/issuer in ml-serve clusters - https://phabricator.wikimedia.org/T298976 (elukey) Open→Resolved Finally done!
[19:26:56] (in eqiad and codfw)
[19:27:12] I am going afk for a bit but I can check later, feel free to do it if you want
[19:29:48] elukey: ok cool i will give it a shot in just a bit!
[19:29:57] have a good one :)
[19:40:00] ok trying the draftquality transformer deployment now
[19:45:13] ok eqiad deployment seems good!
[19:47:02] https://www.irccloud.com/pastebin/1JvCgPD8/
[19:53:20] same with the codfw deploy, all good on the draftquality transformer!
[19:57:32] Lift-Wing, Machine-Learning-Team (Active Tasks), Patch-For-Review: Draftquality transformer - https://phabricator.wikimedia.org/T298989 (ACraze) The cert-manager blocker is gone (see: T298976). I was able to deploy the new transformer successfully to both eqiad and codfw ` accraze@ml-serve-ctrl1001:...
[19:57:59] Lift-Wing, Machine-Learning-Team (Active Tasks): Factor out feature retrieve functionality to a transformer - https://phabricator.wikimedia.org/T294419 (ACraze)
[19:58:27] Lift-Wing, Machine-Learning-Team (Active Tasks), Patch-For-Review: Draftquality transformer - https://phabricator.wikimedia.org/T298989 (ACraze) In progress→Resolved
[20:10:01] accraze: nice!
[20:10:38] I think that on the k8s side we have completed the things to add for the MVP
[20:10:44] \o/
[20:10:47] now we are missing the api-gateway, load testing, etc..
[20:11:04] things are looking good :)
[20:11:09] ^^^
[20:11:37] nice!
[20:26:50] ORES, articlequality-modeling, Machine-Learning-Team (Active Tasks): Deploy nlwiki articlequality model - https://phabricator.wikimedia.org/T300195 (ACraze)
[20:27:18] ORES, articlequality-modeling, Machine-Learning-Team (Active Tasks): Deploy nlwiki articlequality model - https://phabricator.wikimedia.org/T300195 (ACraze)
[20:27:21] Machine-Learning-Team, artificial-intelligence, Wikilabels, articlequality-modeling: Build article quality model for Dutch Wikipedia - https://phabricator.wikimedia.org/T223782 (ACraze)
[21:14:40] Lift-Wing, Machine-Learning-Team (Active Tasks): ML Sandbox Transformer Configuration - https://phabricator.wikimedia.org/T299972 (ACraze) Open→In progress
[21:21:33] Lift-Wing, Machine-Learning-Team (Active Tasks): ML Sandbox Transformer Configuration - https://phabricator.wikimedia.org/T299972 (ACraze) I can hit a transformer endpoint directly, but I get a 503 error. When I inspect the transformer logs, I see the following ` [E 220124 20:50:32 web:2243] 500 POST /v...