[04:24:53] (CR) Kevin Bazira: [C: +2] articlequality: update dependencies to use revscoring 2.11.4 [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/800025 (https://phabricator.wikimedia.org/T309102) (owner: Elukey)
[04:25:21] (CR) Kevin Bazira: [C: +2] draftquality: update dependencies to use revscoring 2.11.4 [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/800032 (https://phabricator.wikimedia.org/T309102) (owner: Elukey)
[04:36:18] (Merged) jenkins-bot: articlequality: update dependencies to use revscoring 2.11.4 [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/800025 (https://phabricator.wikimedia.org/T309102) (owner: Elukey)
[04:36:20] (Merged) jenkins-bot: draftquality: update dependencies to use revscoring 2.11.4 [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/800032 (https://phabricator.wikimedia.org/T309102) (owner: Elukey)
[06:06:11] thanks for the reviews and merge folks!
[06:06:14] I noticed "This change or one of its cross-repo dependencies was unable to be automatically merged with the current state of its repository. Please rebase the change and upload a new patchset."
[06:06:20] never seen it
[06:11:21] ahh https://phabricator.wikimedia.org/T309371
[06:21:57] Lift-Wing, Machine-Learning-Team (Active Tasks), Patch-For-Review: Unable to run helmfile and check pods - https://phabricator.wikimedia.org/T307927 (elukey) Kevin and Aiko's users are now in the `deployment` POSIX group, they should be able to deploy now. Let's try to do it before closing the task :)
[06:22:15] need to run some errands soon
[06:24:25] elukey o/
[06:24:25] Looks like aiko and I are now clear to deploy: https://phabricator.wikimedia.org/T308308#7961986
[06:24:25] I was thinking of proceeding with articlequalitle model deployments. Should I proceed? (Checking to make sure we don't destabilise anything)
[06:25:41] ***articlequality
[08:34:15] kevinbazira: o/ yep let's try it!
[08:34:41] great. let me push a patch.
[09:19:54] kevinbazira: green light to deploy when you prefer :)
[09:20:34] thanks for the merge. deploying now ...
[09:21:38] kevinbazira: keep an extra eye on the helmfile diff, there should be only two new isvcs and nothing more
[09:22:04] yep, will check the diff
[09:27:11] the diff showed two new isvcs as expected
[09:27:15] super
[09:27:21] both eqiad and codfw deployments have been completed successfully
[09:27:29] checking pods now ...
[09:28:02] Lift-Wing, Machine-Learning-Team (Active Tasks): Unable to run helmfile and check pods - https://phabricator.wikimedia.org/T307927 (elukey) Open→Resolved a:elukey Kevin was able to deploy successfully without issues, so I think that we can close for the moment!
[09:30:27] 1/2 new pods is up and running
[09:30:45] super
[09:31:19] the euwiki has run into a CrashLoopBackOff
[09:31:19] NAME READY STATUS RESTARTS AGE
[09:31:19] euwiki-articlequality-predictor-default-ft6c2-deployment-6275b6 1/3 CrashLoopBackOff 4 2m55s
[09:31:19] fawiki-articlequality-predictor-default-psxtw-deployment-7xglgm 3/3 Running 0 2m54s
[09:31:19] investigating the cause now
[09:32:44] lemme know if you need help
[09:40:53] ok, I will.
[09:40:53] the kserve-container logs show this:
[09:40:53] AttributeError: Can't get attribute 'UTF16EnchantStr' on
[09:40:53] looks like we have to do the dependency hell dance :)
[09:42:22] kevinbazira: the change in deps that you merged this morning failed to be published to the docker registry, I am trying to figure out what's best, so we can't try if the newer one works
[09:42:42] oh ... I see.
[09:43:59] I am wondering if UTF16EnchantStr got deprecated or similar in enchant
[09:44:10] checking ...
[09:53:24] kevinbazira: look at https://github.com/pyenchant/pyenchant/commit/a864273ea40aa1c19d4ce6d367605bfded3a336b, it seems that the first tag available (that I guess is the first release) is 1.6.7
[09:53:29] and we have 1.6.6
[09:53:49] 👀 👀 👀
[09:54:40] yeah editquality and draftquality have 1.6.11
[09:54:44] shall we bump?
[09:54:48] yep
[10:03:55] I'll leave you send the code change kevinbazira, ok?
[10:04:00] Or do you prefer me doing it?
[10:04:13] yep, I'll send it.
[10:04:18] (anytime, even next week, I wanted to avoid a deadlock you wait me and I wait you :D)
[10:07:03] np ... I'll tag you in the patch so you get a notification whenever it's sent
[10:27:39] ack! going afk for lunch :)
[10:27:45] elukey: o/
[10:28:00] elukey: I'm working on T302851 deploy to beta. I have a question about our beta cluster.
[10:28:33] elukey: on https://wikitech.wikimedia.org/wiki/ORES/Deployment#Deploy_to_the_test_server there says beta is deployment-ores01 but your comment says we will deploy the change to deployment-ores02. I’m a bit confused?
[10:33:42] but it seems deploy with `scap deploy` on deployment-deploy03 we don't need to specify deployment-ores01 or deployment-ores02
[10:37:00] (PS1) Kevin Bazira: articlequality: bump pyenchant from 1.6.6 to 1.6.11 in the model-server [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/800682 (https://phabricator.wikimedia.org/T307418)
[10:43:03] elukey: currently I cherry picked the change. But first I had to setup ssh keys in gerrit that I haven't done it from deployment-deploy03 before. (maybe I should add it to the onboarding doc?)
[11:19:58] Lift-Wing, Epic, Machine-Learning-Team (Active Tasks): Lift Wing proof of concept - https://phabricator.wikimedia.org/T272917 (achou)
[11:20:02] Lift-Wing, Machine-Learning-Team (Active Tasks), Patch-For-Review: Support (or not) the ORES augmented feature output in liftwing - https://phabricator.wikimedia.org/T301766 (achou) Open→Resolved
[11:20:05] Lift-Wing, Machine-Learning-Team (Active Tasks), Patch-For-Review: Support (or not) the ORES augmented feature output in liftwing - https://phabricator.wikimedia.org/T301766 (achou) This task is done. :) In T309102, we also applied the changes that we tested in arwiki-goodfaith to all revscoring-edit...
[11:32:47] (CR) AikoChou: [C: +1] articlequality: bump pyenchant from 1.6.6 to 1.6.11 in the model-server [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/800682 (https://phabricator.wikimedia.org/T307418) (owner: Kevin Bazira)
[12:55:54] aiko: o/
[12:55:59] you are definitely right
[12:56:16] so deployment-ores02 is the new host with Debian Buster and python37 that replaced 01
[12:57:35] fixed the references in the docs
[12:58:08] about scap - if you check in the "scap" directory of ores-deploy you'll see some "targets" mentioned for beta, that is the place used to specify where to deploy
[12:58:26] so if you don't tell to scap a specific hostname, it will deploy to its predefined targets
[12:58:57] aiko: about the ssh keys - when cherry picking it is convenient to use the gerrit link with anonymous credentials
[12:59:02] it will go through http and not ssh
[13:00:51] (CR) Elukey: [C: +2] articlequality: bump pyenchant from 1.6.6 to 1.6.11 in the model-server [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/800682 (https://phabricator.wikimedia.org/T307418) (owner: Kevin Bazira)
[13:04:43] (Merged) jenkins-bot: articlequality: bump pyenchant from 1.6.6 to 1.6.11 in the model-server [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/800682 (https://phabricator.wikimedia.org/T307418) (owner: Kevin Bazira)
[13:11:18] kevinbazira: new docker image for articlequality ready to go https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/800723
[13:13:24] elukey: ah I see. Thanks for the clarification!
[13:13:50] yep, the predefined target for ores-beta is deployment-ores02 :)
[13:14:15] elukey: the new docker image LGTM +1'd.
[13:14:19] super thanks :)
[13:19:13] elukey: nice I didn't notice there is anonymous http :D
[13:19:36] :)
[13:19:43] kevinbazira: new pods are up, euwiki looks good!
[13:19:47] at least it doesn't crash
[13:19:52] do you want to test them with some requests?
[13:20:24] (also deploy in codfw)
[13:22:40] now I need to figure out how to trigger the publish of the draftquality code review (with revscoring 2.11.4)
[13:22:45] since CI failed this morning
[13:23:29] woohoo... just checked and both pods are up and running
[13:23:29] NAME READY STATUS RESTARTS AGE
[13:23:29] euwiki-articlequality-predictor-default-c4f8b-deployment-7nj87q 3/3 Running 0 4m58s
[13:23:29] fawiki-articlequality-predictor-default-8zn5q-deployment-6lrml7 3/3 Running 0 4m57s
[13:23:29] thanks for your help elukey!
[13:25:04] (PS1) Elukey: draftquality: null change to trigger image publishing [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/800726
[13:25:14] kevinbazira: happy to help! \o/
[13:25:19] I created another change --^
[13:25:27] it is a hack to publish a new docker image
[13:25:55] lemme know if it is ok for you two, if you don't like it I'll drop it
[13:31:52] (CR) Kevin Bazira: [C: +2] "LGTM!" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/800726 (owner: Elukey)
[13:33:42] elukey: LGTM, I've +2'd. Hope it will trigger CI this time... fingers crossed :)
[13:35:40] deployed to beta!
ran the httpbb test and all passed
[13:38:55] (Merged) jenkins-bot: draftquality: null change to trigger image publishing [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/800726 (owner: Elukey)
[13:39:58] ohhh I just got my new laptop! gonna unbox it \o/
[13:45:53] nice... enjoy :)
[13:47:41] aiko: ah nice! enjoy! Also good job for the deployment in beta, I think that next week we'll be ready for an ORES deploy to make chrisalbon happy :D
[13:48:22] kevinbazira: new image published! Going to file a change for deployment-chart
[13:48:37] great. on standby...
[13:50:03] https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/800732/
[13:50:38] after this we should have revscoring 2.11.4 in all lift wing pods
[13:56:50] Looks like the latest draftquality image is 2022-05-03-135028-publish. I don't know whether this is a cache issue on my end: https://docker-registry.wikimedia.org/wikimedia/machinelearning-liftwing-inference-services-draftquality/tags/
[13:57:19] yeah I think it is a cache issue, I see the other one from the CI job
[13:57:43] ok ... it's good to know you can see it on your end.
[13:58:40] yep pod is up! all pods with revscoring 2.11.4 \o/
[13:59:20] Yay! Wow this is great news to wake up to
[13:59:37] woohoo \o/
[14:03:50] chrisalbon: we'll deploy 2.11.4 to ORES too next week, for your joy :)
[14:08:02] ORES, Machine-Learning-Team (Active Tasks), Patch-For-Review: revscoring feature extraction error for wikitext papes in Wikidata - https://phabricator.wikimedia.org/T302851 (elukey) Aiko deployed the change to deployment-prep, it looks very good: ` elukey@deployment-ores02:~$ curl localhost:8081/v3/...
[14:08:11] chrisalbon: this is the fix --^
[14:08:14] nice work aiko :)
[14:15:38] elukey: thanks for checking the fix!! Looks great \o/
[14:19:25] aiko: I'll let you organize the ORES prod deployment next week if you want/have-time, I'll be helping of course
[14:20:22] (CR) Elukey: [C: +1] Update wheels submodule with latest changes [services/ores/deploy] - https://gerrit.wikimedia.org/r/798894 (owner: AikoChou)
[14:22:22] elukey: sounds good. I'll do it :)
[14:24:24] super thanks
[14:40:59] going afk folks, have a nice weekend :)
[14:42:37] have a nice weekend Luca :)
[14:42:44] Bye luca!