[07:08:02] morning folks!
[08:33:09] Machine-Learning-Team, revscoring: Improving Hindi Language Assets - https://phabricator.wikimedia.org/T299577 (Aklapper) Open→Resolved a: Aklapper Patch has been merged a while ago, but as `revscoring` uses GitHub there have not been notifications here about anything... closing
[08:34:14] Machine-Learning-Team, revscoring: Improve Czech Language assets - https://phabricator.wikimedia.org/T223383 (Aklapper) Open→Declined Declining as this task lacks clear criteria when to call it done (it made sense in a Google Code-in context though)
[08:34:16] Machine-Learning-Team, revscoring: Improving Hindi Language Assets - https://phabricator.wikimedia.org/T299577 (Aklapper) a: Aklapper→dgsahethi
[08:39:12] Morning :)
[08:47:40] I will start reimaging ores2003 to Buster in a minute or two (now's the time to stop me)
[08:48:04] I've looked at logs and stuff from last night, and 2001 and 2002 seemed to have worked just as expected
[08:48:18] \o/
[09:22:10] Now doing first puppet run
[09:53:29] running scap deploy
[10:05:21] Hmmm. celery-ores-worker.service doesn't want to run
[10:05:34] `May 10 10:04:59 ores2003 celery-ores-worker[5987]: Error: no such option: --app`
[10:14:33] For some reason ores2003 gets the wrong version of /lib/systemd/system/celery-ores-worker.service
[10:15:12] Ah!
[10:15:35] hieradata/hosts/ores2002.yaml and 2001 have a different version for celery, but not 2003, I totally forgot that bit
[10:18:59] elukey: if you could lgtm https://gerrit.wikimedia.org/r/c/operations/puppet/+/790634/2, that'd be great :)
[10:26:11] I'm gonna submit it as TBR, what could go wrong?
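The `Error: no such option: --app` failure is consistent with Celery 5's click-based CLI, in which `--app`/`-A` is a global option that has to come before the subcommand (`celery --app=... worker`); Celery 4 was more permissive about where it appeared. A minimal, hypothetical pre-flight check (not part of the ORES puppet code) that flags an `ExecStart` command using the old ordering could look like this; the function name, the subcommand list and the sample app name `myapp.celery` are illustrative assumptions.

```python
import shlex

# Subcommands of the celery CLI that follow the global options
# (illustrative, not exhaustive).
SUBCOMMANDS = {"worker", "beat", "events", "inspect", "control", "purge", "status", "shell"}


def _is_app_flag(token: str) -> bool:
    # Matches "-A", "--app", "--app=x" and the attached short form "-Ax".
    return token in ("-A", "--app") or token.startswith("--app=") or (
        token.startswith("-A") and len(token) > 2
    )


def app_after_subcommand(exec_start: str) -> bool:
    """Return True if -A/--app appears after the celery subcommand.

    Assumes Celery 5 semantics: --app is a global option, so
    `celery worker --app x` is rejected and `celery --app x worker` is required.
    """
    argv = shlex.split(exec_start)
    for i, token in enumerate(argv):
        if token in SUBCOMMANDS:
            return any(_is_app_flag(t) for t in argv[i + 1:])
    return False  # no recognised subcommand: nothing to check
```

For example, `app_after_subcommand("celery worker --app myapp.celery")` returns `True`, while `app_after_subcommand("celery --app=myapp.celery worker")` returns `False`. Checking the rendered unit file this way would catch a host whose hieradata still selects the old template.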
[10:31:37] Host is all green and not logging any more errors, pooling it
[10:46:58] <- Lunch (but keeping an eye on 2003)
[11:30:11] klausman: just seen the code change, all good, forgot to remind you :)
[11:30:52] np :)
[11:31:14] I'll carry on with 2004 after lunch and then do the other machines soonish (probably 1-2 even today)
[11:35:38] perfect
[11:35:47] I am going to do some cleanup of vms in horizon
[11:36:47] in my opinion the ores and ores-staging vms/projects can go away, they are all stretch vms and we don't really use them
[11:40:59] dropped a note on slack
[11:41:34] Ack, sgtm
[11:45:28] aiko: o/ I merged a change that should allow you to see pods on deploy1002 (the other deployment issue is still open)
[11:52:19] in the meantime, I am deploying the changes for the kserve-inference chart
[11:52:43] aiko: articlequality is now predictor only, lemme know if all works later on etc..
[11:52:59] elukey: I'll put you up as reviewer for the ores/celery v5 changes, but submit them myself, so you don't need to review every individual one.
[11:54:15] klausman: yep yep they are super simple ones, please go ahead without my +1 :)
[11:54:22] so nice that this is unblocked and we are proceeding
[11:56:27] Aye. And nice work on figuring out the git-lfs thing
[11:56:45] <3
[11:58:07] aiko: I deployed most of the pending things, there is a docker image diff for revscoring-editquality-goodfaith, let's see if you can deploy that
[11:58:58] Janis mentioned https://phabricator.wikimedia.org/T305729, I think that some perms got changed and we got affected too
[12:00:16] Lift-Wing, Machine-Learning-Team: Unable to run helmfile and check pods - https://phabricator.wikimedia.org/T307927 (elukey) The pod-checking permission should now be solved, the other one I believe should be due to https://phabricator.wikimedia.org/T305729. For the time being me and Tobias will deploy c...
[12:07:03] elukey: o/ yes I can see pods now :)
[12:08:27] Lift-Wing, Machine-Learning-Team: Unable to run helmfile and check pods - https://phabricator.wikimedia.org/T307927 (elukey) Given what is written in T305729#7879637, members of the ml-team may need to be placed inside the `deployment` group (even if in theory it is not needed). Will do more research :)
[12:08:36] aiko: all right one thing fixed :)
[12:12:40] I see articlequality and draftquality are new pods. I'm going to test if all work fine
[12:12:48] super
[12:13:34] aiko: let me know later on if you can deploy the diff for revscoring-editquality-goodfaith, I think yes (so only new helm charts are a problem) but let's see
[12:15:01] elukey: ok! I will try to deploy that and let you know
[12:21:39] * elukey afk for a bit!
[12:30:48] draftquality works well and the augmented feature output is correct
[12:34:36] hmm articlequality returns 500 Internal Server Error, checking logs.. there is a mwapi request error
[12:35:03] Could not find a suitable TLS CA certificate bundle, invalid path: /etc/ssl/certs/wmf-ca-certificates.crt
[12:40:20] will look into that later
[12:49:00] trying to deploy the diff for revscoring-editquality-goodfaith via helm
[12:51:20] nope.. I can't deploy :(
[12:55:47] 2004 is now pooled and serving requests
[12:56:02] Lift-Wing, Machine-Learning-Team: Unable to run helmfile and check pods - https://phabricator.wikimedia.org/T307927 (achou)
[13:20:33] so wmf-certificates is missing for articlequality in blubber.yaml, it was in transformer.yaml
[13:21:18] I will send a patch to fix it
[13:35:00] Morning all!
[13:36:48] o/
[13:36:53] aiko: super yes that is the fix
[13:51:49] Heyo Chris
[14:21:31] Lift-Wing, artificial-intelligence, editquality-modeling, Epic, Machine-Learning-Team (Active Tasks): Migrate editquality models - https://phabricator.wikimedia.org/T301409 (calbon)
[14:21:41] Lift-Wing, artificial-intelligence, editquality-modeling, Epic, Machine-Learning-Team (Active Tasks): Migrate editquality models - https://phabricator.wikimedia.org/T301409 (calbon)
[14:21:54] Lift-Wing, artificial-intelligence, editquality-modeling, Machine-Learning-Team (Active Tasks): Upload editquality model binaries to storage - https://phabricator.wikimedia.org/T301413 (calbon) In progress→Resolved
[14:22:11] Lift-Wing, artificial-intelligence, editquality-modeling, Machine-Learning-Team (Active Tasks), Patch-For-Review: Add editquality isvc configurations to ml-services helmfile - https://phabricator.wikimedia.org/T301415 (calbon) In progress→Resolved
[14:22:55] Lift-Wing, artificial-intelligence, editquality-modeling, Epic, Machine-Learning-Team (Active Tasks): Migrate editquality models - https://phabricator.wikimedia.org/T301409 (calbon) In progress→Resolved
[14:27:55] Lift-Wing, Machine-Learning-Team (Active Tasks): Deploy Outlinks topic model to production - https://phabricator.wikimedia.org/T287056 (calbon)
[14:28:13] Lift-Wing, Machine-Learning-Team (Active Tasks): HTTP error handling for Outlink topic model - https://phabricator.wikimedia.org/T306029 (calbon) Open→Resolved
[14:35:12] going afk for a bit folks :)
[14:40:55] 2005 is pooled and working well
[14:41:04] taking a break before the meeting as well
[14:44:21] (PS1) AikoChou: articlequality: add wmf-certificates to blubber.yaml [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/790700 (https://phabricator.wikimedia.org/T301766)
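The articlequality 500 surfaced through the `Could not find a suitable TLS CA certificate bundle, invalid path: ...` message, whose wording matches the `OSError` that `requests` (which `mwapi` builds on) raises. `requests` only validates the bundle path when a request is actually sent, so a missing bundle shows up as a per-request failure rather than at startup. A minimal fail-fast sketch follows; it assumes (the log does not say how the path is configured) that the bundle reaches `requests` through the `REQUESTS_CA_BUNDLE` or `CURL_CA_BUNDLE` environment variables, and `check_ca_bundle` is a hypothetical helper, not part of the inference-services repo.

```python
import os
from typing import Mapping, Optional


def check_ca_bundle(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Fail fast if the CA bundle that requests would pick up is missing.

    With trust_env enabled (the default), requests.Session uses
    REQUESTS_CA_BUNDLE, falling back to CURL_CA_BUNDLE, but only checks the
    path when a request is sent. Checking once at startup turns a
    per-request 500 into an immediate, explicit error.
    """
    env = os.environ if env is None else env
    path = env.get("REQUESTS_CA_BUNDLE") or env.get("CURL_CA_BUNDLE")
    if path and not os.path.exists(path):
        raise RuntimeError(
            f"CA bundle {path!r} does not exist; "
            "is the package that ships it installed in this image?"
        )
    # None means no override: requests uses its default bundle unless
    # `verify` is set explicitly on the session or request.
    return path
```

Called once when the model server starts, a check like this would have reported the missing `/etc/ssl/certs/wmf-ca-certificates.crt` immediately, before the first mwapi call; the actual fix is still making sure the image installs the package that provides the bundle.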