[01:15:55] Lift-Wing, Machine-Learning-Team: Deploy NSFW model to production - https://phabricator.wikimedia.org/T314810 (Htriedman) @Aklapper it's the output of a project during Innovation Week to 1) retrain an image model for classifying nude, pornographic, gory, etc. imagery and 2) deploy it on the new ML infras...
[07:40:13] Lift-Wing, Machine-Learning-Team (Active Tasks): Deploy Outlinks topic model to production - https://phabricator.wikimedia.org/T287056 (elukey) @Isaac I can reproduce the error from stat1004, I think that you are going through the http(s) proxy for a .discovery.wmnet domain (internal one). Try with `unse...
[07:49:04] hello folks!
[07:49:22] trying to understand why editquality's event generation doesn't work in prod
[07:49:25] sigh
[07:49:27] it works in staging
[07:49:40] maybe different docker images
[08:05:02] ok I tested the same image on staging, it works fine
[08:05:37] in prod I see
[08:05:37] [E 220809 08:05:12 events:79] Unexpected error while trying to send a revision score event to EventGate: IOStream is not idle; cannot convert to SSL
[08:09:00] I also tested the image locally, and it works
[08:19:55] from https://www.tornadoweb.org/en/stable/_modules/tornado/iostream.html it looks like happening when it starts the tls conn
[08:20:01] but the error is a little cryptic
[08:26:55] ahhh yes it is my bad
[08:27:07] i haven't synced an admin setting for knative
[08:27:48] yeah works now!
[08:27:50] * elukey dances
[08:30:55] ok proceeding with ml-serve-eqiad
[08:31:29] (CR) Elukey: [C: +2] draftquality: add code to send events to EventGate [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/820401 (https://phabricator.wikimedia.org/T301878) (owner: Elukey)
[08:32:40] also start the build process for draftquality
[08:40:38] (Merged) jenkins-bot: draftquality: add code to send events to EventGate [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/820401 (https://phabricator.wikimedia.org/T301878) (owner: Elukey)
[08:47:07] going to take a little break!
[09:29:11] deployed also articlequality, all good, I can see events
[09:31:07] hello Luca! nice job :)
[09:33:19] and thanks for answering Isaac's question
[09:33:34] np! Sorry if I overstepped, I saw it passing by and I tried the "fix"
[09:38:12] I'm glad you step in! I would have asked you anyway because I couldn't reproduce the error. :)
[09:41:46] ack :)
[09:44:50] aiko: qq as brainstorm - while checking the preprocess code of revscoring models I noticed that from the models we only need the feature list, that is a fixed list of things IIRC. In theory we could have a transformer with the list of features hardcoded into a yaml file or similar, without the need of a model. Does it sound right? Not suggesting that we invest time on it, just wondering out loud
[10:19:03] (CR) Elukey: [C: +2] drafttopic: add code to send events to EventGate [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/820418 (https://phabricator.wikimedia.org/T301878) (owner: Elukey)
[10:23:33] elukey: yeah, that sounds right! From what I see, we only use model.features and model.version in preprocess code. If the feature list is definite for the given model binary, what you just said should work.
[10:27:06] (Merged) jenkins-bot: drafttopic: add code to send events to EventGate [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/820418 (https://phabricator.wikimedia.org/T301878) (owner: Elukey)
[10:40:45] hello.. any updates on the wikilabels db stretch upgrade work?
[10:42:13] taavi: o/ Tobias is on holidays, IIRC he'll be back next Monday
[10:42:36] I think that some work started to verify the dbs etc..
[10:42:40] so in progress :)
[11:00:35] draftquality event generation code works in prod!
[11:01:15] aiko: ack thanks for the confirmation.. this could allow us to use transformers, and scale them differently from predictors for revscoring, but maybe not worth it for the moment
[11:05:33] aiko: one thing - is it ok to proceed with https://gerrit.wikimedia.org/r/c/machinelearning/liftwing/inference-services/+/820456 ?
[11:05:42] it is basically a no-op, no need to deploy the new image
[11:05:50] it is just to include the new shared modules
[11:09:55] going afk for lunch!
[11:28:44] (CR) AikoChou: [C: +1] outlink: move Blubber config to the new standard [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/820456 (https://phabricator.wikimedia.org/T301878) (owner: Elukey)
[11:32:10] elukey: yes!
[13:18:28] (CR) Elukey: [C: +2] outlink: move Blubber config to the new standard [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/820456 (https://phabricator.wikimedia.org/T301878) (owner: Elukey)
[13:19:10] (CR) Elukey: [V: +2 C: +2] Update README.md files after the recent Blubber config refactor [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/820458 (owner: Elukey)
[13:19:20] (CR) Elukey: [V: +2 C: +2] python: Add more info about Docker image rebuild [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/821167 (https://phabricator.wikimedia.org/T301878) (owner: Elukey)
[13:19:44] (PS1) Elukey: articlequality: move preprocess() to async [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/821722 (https://phabricator.wikimedia.org/T313915)
[13:22:08] draftquality deployed in prod, drafttopic is the last one
[13:49:33] kevinbazira: o/ I merged your change, but I'd like to do the same for https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/821749
[13:49:37] and test in staging first
[13:49:42] let's coordinate when you have a moment
[13:51:24] thanks for the merge elukey. happy to coordinate ..
[13:51:24] would you like me to first deploy my changes to prod before we test yours in staging?
[13:54:29] kevinbazira: sure!
[13:54:53] ok. let me run the deployment now ...
[13:54:58] thanks!
[13:59:31] both eqiad and codfw prod deployments have been completed successfully.
[13:59:32] checking pods now ...
[14:00:27] nice!
[14:00:35] all new pods are up and running.
[14:00:35] NAME                                                              READY   STATUS    RESTARTS   AGE
[14:00:35] arwiki-drafttopic-predictor-default-zlbq7-deployment-78bfcnt68q   3/3     Running   0          2m14s
[14:00:35] cswiki-drafttopic-predictor-default-rhjq9-deployment-6c496d4n99   3/3     Running   0          2m10s
[14:00:35] enwiki-drafttopic-predictor-default-ctzb5-deployment-5b4c5bp2p2   3/3     Running   0          2m12s
[14:00:57] \o/
[14:01:20] after the meeting lemme know if you are ok with me proceeding with the above code review :)
[14:01:38] https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/821749
[14:39:41] elukey: I've added a small comment on the review :)
[14:41:00] kevinbazira: thanksss you are righttttt
[14:41:02] fixing :)
[14:45:52] great. I've +1'd.
[14:48:59] super merging and deploying
[14:59:22] the drafttopic event generation works nicely!
[15:01:20] fixed blubber references in https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/KServe
[15:02:09] ok event generation code rolled out \o/
[15:03:56] kevinbazira: the fawiki task looks very good in my opinion, let's see what the community member thinks about it
[15:04:10] kevinbazira: I had a quick look on the ORES issue you are working on. I think the score they pointed out is articlequality, because the prediction changes from GA to C for the example they gave.
[15:04:22] https://ores.wikimedia.org/v3/scores/fawiki?models=articlequality&revids=35130784%7C%2035130948
[15:05:13] Lift-Wing, Epic, Machine-Learning-Team (Active Tasks), Patch-For-Review: Send score to eventgate when requested - https://phabricator.wikimedia.org/T301878 (elukey) All revscoring-based models are now able to accept a revision-create event, generate a revision-score one and send it to EventGate! \o/
[15:06:46] kevinbazira: they didn't make it clear
[15:10:24] Lift-Wing, Epic, Machine-Learning-Team (Active Tasks), Patch-For-Review: Send score to eventgate when requested - https://phabricator.wikimedia.org/T301878 (elukey) >>! In T301878#8138283, @Ottomata wrote: > Hello!
Separate streams for different models seems fine, but perhaps what you want are s...
[15:12:38] aiko: oh wow I assumed editquality as well!
[15:12:58] let's follow up on the task asking explicitly what models they are using
[15:16:58] Lift-Wing, Epic, Machine-Learning-Team (Active Tasks), Patch-For-Review: Send score to eventgate when requested - https://phabricator.wikimedia.org/T301878 (Ottomata) > So this spark job would run on the Hadoop cluster as always, configured via puppet We don't currently maintain any streaming job...
[15:26:40] thanks elukey and aiko, I'll look into whether articlequality scores change when tags are changed to templates.
[15:33:18] going afk for today folks! Have a nice evening :)
[15:44:23] bye Luca :)
[16:54:33] Lift-Wing, Machine-Learning-Team (Active Tasks): Deploy Outlinks topic model to production - https://phabricator.wikimedia.org/T287056 (Isaac) > I think that you are going through the http(s) proxy for a .discovery.wmnet domain (internal one). Try with unset https_proxy, it should work afterwards! Confir...