[08:09:55] hello folks
[08:11:59] (PS3) AikoChou: nsfw: create model-server and blubberfile [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/822046 (https://phabricator.wikimedia.org/T314810)
[08:12:55] (PS4) AikoChou: nsfw: create model-server and blubberfile [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/822046 (https://phabricator.wikimedia.org/T314810)
[08:15:20] o/ morning
[08:31:44] Lift-Wing, Machine-Learning-Team, Patch-For-Review: Deploy NSFW model to production - https://phabricator.wikimedia.org/T314810 (achou) The nsfw model has been uploaded successfully to Thanos Swift. ` aikochou@stat1007:~$ s3cmd -c /etc/s3cmd/cfg.d/ml-team.cfg ls s3://wmf-ml-models/experimental/nsfw/2...
[08:32:37] Lift-Wing, Machine-Learning-Team, Patch-For-Review: Deploy NSFW model to production - https://phabricator.wikimedia.org/T314810 (achou)
[08:33:29] Lift-Wing, Machine-Learning-Team (Active Tasks), Patch-For-Review: Deploy NSFW model to production - https://phabricator.wikimedia.org/T314810 (achou)
[08:41:06] Lift-Wing, Machine-Learning-Team (Active Tasks): Create NSFW model inference service - https://phabricator.wikimedia.org/T314982 (achou)
[08:48:13] aiko: https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/822326 :)
[08:48:17] almost ready to go
[08:48:28] I am going to check the blubber+python config for nsfw
[08:50:51] (CR) Elukey: [C: +2] nsfw: create model-server and blubberfile (2 comments) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/822046 (https://phabricator.wikimedia.org/T314810) (owner: AikoChou)
[08:52:12] let's wait for the CI to kick off the image build :)
[09:06:35] docker-registry.wikimedia.org/wikimedia/machinelearning-liftwing-inference-services-nsfw:2022-08-11-085124-publish
[09:06:38] \o/
[09:07:00] at this point I can update my change to deployment-charts
[09:12:18] updated
[09:14:03] nice! :D
[09:31:44] aiko: fixed the commit msg!
[09:31:49] the CI diff looks good afaics
[09:47:11] thanks Luca, ready to merge :)
[09:48:25] wow CI is really slow
[09:48:30] I am waiting for the +2 verified
[10:03:51] being kicked off from the place where I have been working, taking an early lunch break!
[11:59:23] aiko: there is a problem with CI still not solved, once fixed we should be unblocked :)
[12:04:34] elukey: ack!
[12:05:22] aiko: just got the +2! Going to set up staging and let you deploy
[12:05:41] (if you have time, otherwise I can do it)
[12:10:07] aiko: staging is ready!
[12:11:50] deploying to staging since I am already on deploy etc..
[12:15:39] NAME READY STATUS RESTARTS AGE
[12:15:42] nsfw-model-predictor-default-vctcs-deployment-77986c5f96-f6tj7 3/3 Running 0 3m54s
[12:15:45] pod up!
[12:17:22] added the 'experimental' namespace in prod clusters as well
[12:17:30] (but I haven't deployed the nsfw pod yet)
[12:17:41] aiko: leaving the testing and rollout to prod to you when you prefer :)
[12:18:01] elukey: o/ I have a question
[12:18:11] sure
[12:19:26] Lift-Wing, Machine-Learning-Team (Active Tasks), Patch-For-Review: Move revscoring isvcs to async architecture - https://phabricator.wikimedia.org/T313915 (elukey) Next steps: - add async support for drafttopic - add async support for draftquality
[12:19:36] elukey: For deploying to staging, we don't need to add inference_services: - name: "nsfw-model" to the values-ml-staging-codfw.yaml?
[12:20:39] aiko: the helmfile config picks up the values.yaml file first, then the staging one, so unless you specifically override things in the staging yaml nothing will be picked up from it
[12:21:48] elukey: ohhh! I didn't know that
[12:21:55] (if you check helmfile.yaml in the experimental dir of deployment charts at line 22 "values" will explain what I am saying)
[12:22:26] (values are picked up from top to bottom)
[12:24:28] I see, thanks for the explanation :)
[12:27:27] aiko: super curious - does it work?
[12:27:45] testing..
[12:31:26] nice! it works fine \o/
[12:32:04] aikochou@deploy1002:~$ time curl "https://inference-staging.svc.codfw.wmnet:30443/v1/models/nsfw-model:predict" -d @input_sfw.json -H "Host: nsfw-model.experimental.wikimedia.org" --http1.1
[12:32:04] {"prob_nsfw": 6.397998884161149e-12, "prob_sfw": 1.0}
[12:34:17] woooowwww \o/
[12:34:26] great work aiko!
[12:35:13] we can choose to keep things deployed to staging only or to proceed to prod
[12:36:21] let's deploy to prod
[12:38:17] what do you think?
[12:41:29] +1 :)
[12:43:01] ok! proceed to prod
[12:52:43] pods up on codfw and eqiad!
[12:54:25] \o/
[12:55:30] elukey: should we delete the pod in staging?
[12:56:55] elukey: or just keep it there?
[13:04:00] aiko: we can keep it in my opinion, maybe we can think in the future how many of the "experimental" ones to keep in staging
[13:04:08] let's see what's best as we go
[13:04:13] what do you think?
[13:04:58] sounds good!
[13:05:51] super
[13:05:57] taking a break!
[13:06:10] \o/ thanks for your help Luca :)
[13:09:06] Lift-Wing, Machine-Learning-Team (Active Tasks): Deploy NSFW model to production - https://phabricator.wikimedia.org/T314810 (achou)
[13:09:13] Lift-Wing, Machine-Learning-Team (Active Tasks): Create NSFW model inference service - https://phabricator.wikimedia.org/T314982 (achou) Open→Resolved
[13:50:12] Morning!
[13:51:29] Congrats Aiko on the deployment!
[13:59:27] morning!
[14:09:59] chrisalbon: I sent an email to Tobias with a recap of what we discussed yesterday and the day before (events, async, experimental namespace, etc..)
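(Aside on the values ordering explained at 12:20:39 and 12:22:26: a minimal Python sketch of layering one values file on top of another, where the later file only changes what it explicitly overrides. It illustrates the idea only and is not helmfile's implementation. The `merge_values` helper and the `settings`, `log_level` and `replicas` keys are hypothetical; only `inference_services` and `nsfw-model` come from the log.)

```python
from copy import deepcopy


def merge_values(base, override):
    """Return base with override layered on top.

    Nested maps are merged recursively; any other value (including a list)
    is replaced wholesale by the override.
    """
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_values(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Hypothetical stand-in for values.yaml (picked up first).
values = {
    "inference_services": [{"name": "nsfw-model"}],
    "settings": {"log_level": "info", "replicas": 1},
}
# Hypothetical stand-in for the staging file (picked up second).
# It overrides a single key and says nothing about inference_services.
staging = {"settings": {"replicas": 2}}

effective = merge_values(values, staging)
print(effective["inference_services"])
print(effective["settings"])
```

(Under these assumptions the staging layer never mentions `inference_services`, so the entry from the first file carries through unchanged, which matches the answer given to the 12:19:36 question.)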
[14:10:15] so we should all be on the same page about the last changes
[14:10:40] I am not going to touch anything today to avoid problems :D
[14:10:57] (other than the experimental namespace)
[14:17:44] also verified that all my changes are deployed to staging/prod correctly (I missed a couple of deploys for staging, now done)
[14:20:34] chrisalbon: o/ thanks :)
[14:20:57] chrisalbon: I'll write some documentation on wikitech about the procedure of the deployment
[14:24:59] Thanks elukey!
[16:40:44] Machine-Learning-Team, Abstract Wikipedia team, Wikilabels, function-orchestrator: Discuss Not Running Static Validation - https://phabricator.wikimedia.org/T315026 (cmassaro)
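(Aside on the staging test at 12:32:04: a rough Python sketch of the same call shape as the curl command, plus a reading of the JSON reply. The URL, Host header and response are copied from the log. The helper names `build_request` and `most_likely_label` are hypothetical, the contents of `input_sfw.json` are not shown in the log so the payload is left to the caller, and the request is only built, never sent.)

```python
import json
import urllib.request

# Endpoint and Host header as they appear in the curl command in the log.
URL = "https://inference-staging.svc.codfw.wmnet:30443/v1/models/nsfw-model:predict"
HOST = "nsfw-model.experimental.wikimedia.org"


def build_request(payload: bytes) -> urllib.request.Request:
    """Build (but do not send) a POST with the given body and an explicit Host header."""
    return urllib.request.Request(URL, data=payload, headers={"Host": HOST}, method="POST")


def most_likely_label(response_body: str) -> str:
    """Return "nsfw" or "sfw", whichever probability is higher (ties resolve to "sfw")."""
    scores = json.loads(response_body)
    return "nsfw" if scores["prob_nsfw"] > scores["prob_sfw"] else "sfw"


# Response body copied from the log at 12:32:04.
logged = '{"prob_nsfw": 6.397998884161149e-12, "prob_sfw": 1.0}'
print(most_likely_label(logged))  # prints "sfw"
```

(The explicit Host header mirrors the `-H "Host: ..."` flag in the logged command. Resolving ties to "sfw" is an arbitrary choice made for this sketch.)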