[07:41:26] hello folks
[07:50:02] (CR) Elukey: Avoid sharing the same aiohttp session in rr and outlink (1 comment) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/884924 (owner: Elukey)
[08:11:10] ah lovely, to test revert risk I need to fetch a 2GB model :D
[08:29:47] heyy
[08:34:56] morning :)
[08:54:25] aiko: o/ I am testing the rr model with kserve 0.9, I noticed this when trying to get a score
[08:54:28] AttributeError: 'BertTokenizerFast' object has no attribute '_in_target_context_manager'
[08:54:31] does it ring a bell?
[08:59:40] elukey: hi luca!! sounds like a problem with version
[09:00:45] elukey: are you testing locally?
[09:01:05] elukey: which model file you used?
[09:01:44] aiko: hi! The last one (the one set in staging)
[09:02:39] the exception is raised in result = classify(self.model, request["revision"])
[09:11:22] I think it's because I haven't uploaded the latest model to swift
[09:12:54] elukey: can you try with this model https://drive.google.com/file/d/1aRMMk6feLXGrfxH5xojuq-hRo35WAApI/view or wait for me uploading it
[09:16:39] aiko: definitely, downloading it now :)
[09:19:57] we should probably start thinking about how to share these models
[09:20:03] with the outside world I mean
[09:21:23] yeah agree. they are quite big
[09:22:07] We have to eventually migrate away from our Swift cluster (observability is kindly giving us space but it is not the right one)
[09:22:54] the target is MOSS (misc object storage) handled by the Data Persistence team, we could ask a feature to expose buckets to the outside world
[09:35:28] sounds good!
[09:37:59] Machine-Learning-Team: Add access restriction to WikiGPT - https://phabricator.wikimedia.org/T328526 (kevinbazira)
[09:42:53] aiko: worked!
[09:42:55] * isaranto goes and reads about MOSS
[09:43:10] elukey: nice :D
[09:43:21] elukey: o/ I just saw the patch https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/883964/ while reading your task about changeprop.
[09:43:27] (CR) Elukey: "Tested locally with docker and it worked." [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/885286 (https://phabricator.wikimedia.org/T325528) (owner: Elukey)
[09:43:44] elukey: I found '*.revscoring-articletopic.wikimedia.org’ was missing in tlsExtraSANs. Isn't it?
[09:44:11] aiko: ah snap yes, will add it thanks!
[09:47:41] (CR) AikoChou: [C: +1] "Thank you for testing it!" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/885286 (https://phabricator.wikimedia.org/T325528) (owner: Elukey)
[09:48:26] also created https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/885740
[09:57:19] Machine-Learning-Team, Research: Update torch's settings in the Knowledge Integrity repo - https://phabricator.wikimedia.org/T325349 (achou) Open→Resolved The problem is resolved.
[09:59:27] Machine-Learning-Team: Enrich revertrisk image tag with model's package version - https://phabricator.wikimedia.org/T325295 (achou) Open→Resolved The task is done.
[10:10:16] Lift-Wing, Machine-Learning-Team: Deploy MultilingualRevertRiskModel to production - https://phabricator.wikimedia.org/T325218 (achou) A new model that works with transformers 4.25.1 and torch 1.13.1 is uploaded: (It is mainly because joblib serialisation specifics. It is needed to reload the model with...
[10:16:23] (PS1) Elukey: Upgrade nsfw's requirements.txt for Kserve 0.9 [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/885743 (https://phabricator.wikimedia.org/T325528)
[10:16:25] (PS1) Elukey: Upgrade outlink predictor and transformer to Kserve 0.9 [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/885744 (https://phabricator.wikimedia.org/T325528)
[10:22:32] (CR) Elukey: "Tested locally, worked fine!" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/885743 (https://phabricator.wikimedia.org/T325528) (owner: Elukey)
[10:23:09] (CR) Ilias Sarantopoulos: [C: +1] Update Revert Risk's requirements.txt to support kserve 0.9 [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/885286 (https://phabricator.wikimedia.org/T325528) (owner: Elukey)
[10:23:37] (CR) Ilias Sarantopoulos: [C: +1] Upgrade nsfw's requirements.txt for Kserve 0.9 [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/885743 (https://phabricator.wikimedia.org/T325528) (owner: Elukey)
[10:37:59] (PS2) Elukey: Avoid sharing the same aiohttp session in rr and outlink [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/884924
[10:38:01] (PS2) Elukey: Update Revert Risk's requirements.txt to support kserve 0.9 [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/885286 (https://phabricator.wikimedia.org/T325528)
[10:38:03] (PS2) Elukey: Upgrade nsfw's requirements.txt for Kserve 0.9 [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/885743 (https://phabricator.wikimedia.org/T325528)
[10:38:05] (PS2) Elukey: Upgrade outlink predictor and transformer to Kserve 0.9 [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/885744 (https://phabricator.wikimedia.org/T325528)
[10:38:28] (CR) Elukey: "I fixed some issues with outlink that I didn't spot before!" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/884924 (owner: Elukey)
[10:38:47] (CR) Elukey: "tested locally, worked fine (also the sending event part)" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/885744 (https://phabricator.wikimedia.org/T325528) (owner: Elukey)
[10:40:10] ok folks all the code reviews are out for kserve 0.9, should work fine
[10:40:25] I'll wait for aiko's final approval in https://gerrit.wikimedia.org/r/c/machinelearning/liftwing/inference-services/+/884924 before proceeding
[10:43:06] (CR) Ilias Sarantopoulos: [C: +1] Upgrade outlink predictor and transformer to Kserve 0.9 [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/885744 (https://phabricator.wikimedia.org/T325528) (owner: Elukey)
[10:47:01] (CR) AikoChou: [C: +1] Upgrade nsfw's requirements.txt for Kserve 0.9 [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/885743 (https://phabricator.wikimedia.org/T325528) (owner: Elukey)
[10:56:52] (CR) AikoChou: [C: +1] Avoid sharing the same aiohttp session in rr and outlink (1 comment) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/884924 (owner: Elukey)
[10:57:33] (CR) AikoChou: [C: +1] Upgrade outlink predictor and transformer to Kserve 0.9 [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/885744 (https://phabricator.wikimedia.org/T325528) (owner: Elukey)
[10:58:21] thanks all for the reviews <3
[11:27:21] I just realized ORES drafttopic only supports enwiki.. while checking out https://github.com/wikimedia/mediawiki-services-ores-deploy/blob/1c8b8a1b6280ce1a971aad3b9476ed96b9144665/config/00-main.yaml#L109-L624 and https://ores.wikimedia.org/v3/scores/
[11:27:58] there are models for other languages in the drafttopic repository but they are not in ORES production. But we've deployed all of them on Lift Wing.
[11:30:10] Maybe we should remove them
[11:50:43] good point, we don't really know how accurate they are..
[11:50:55] but we can say that for the enwiki one as well :D
[11:52:01] (CR) Elukey: [C: +2] Avoid sharing the same aiohttp session in rr and outlink [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/884924 (owner: Elukey)
[11:52:09] (CR) Elukey: [C: +2] Update Revert Risk's requirements.txt to support kserve 0.9 [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/885286 (https://phabricator.wikimedia.org/T325528) (owner: Elukey)
[11:52:12] (CR) Elukey: [C: +2] Upgrade nsfw's requirements.txt for Kserve 0.9 [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/885743 (https://phabricator.wikimedia.org/T325528) (owner: Elukey)
[11:52:18] (CR) Elukey: [C: +2] Upgrade outlink predictor and transformer to Kserve 0.9 [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/885744 (https://phabricator.wikimedia.org/T325528) (owner: Elukey)
[11:52:40] * elukey lunch!
[11:58:06] (Merged) jenkins-bot: Avoid sharing the same aiohttp session in rr and outlink [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/884924 (owner: Elukey)
[11:58:08] (Merged) jenkins-bot: Update Revert Risk's requirements.txt to support kserve 0.9 [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/885286 (https://phabricator.wikimedia.org/T325528) (owner: Elukey)
[11:58:10] (Merged) jenkins-bot: Upgrade nsfw's requirements.txt for Kserve 0.9 [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/885743 (https://phabricator.wikimedia.org/T325528) (owner: Elukey)
[11:58:12] (Merged) jenkins-bot: Upgrade outlink predictor and transformer to Kserve 0.9 [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/885744 (https://phabricator.wikimedia.org/T325528) (owner: Elukey)
[12:04:49] Machine-Learning-Team: Add access restriction to WikiGPT - https://phabricator.wikimedia.org/T328526 (kevinbazira) A login page has been added to WikiGPT. {F36571978} It has the message: "This project is still under the experimentation phase. Please enter password to access." When a user enters the correc...
[12:26:27] * isaranto afk for approx 1h lunch + errand
[13:43:35] (PS12) Ilias Sarantopoulos: test: liftwing manual testing on deployment server [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/884292 (https://phabricator.wikimedia.org/T327787)
[13:45:26] (just a rebase)--^
[14:08:48] I tried to be smart and +2 the four code reviews I had, to then come back after lunch with new docker images
[14:08:51] of course I was a fool
[14:11:54] logged in and tried a rebuild of the failed step for rr
[14:11:55] https://integration.wikimedia.org/ci/job/trigger-inference-services-pipeline-revertrisk-publish/21/
[14:16:28] (CR) Elukey: [C: +2] "Re-ran manually the job that failed:" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/885286 (https://phabricator.wikimedia.org/T325528) (owner: Elukey)
[14:17:35] aiko: o/ what should I do with the revert risk image?
[14:17:49] because I see that staging seems a little more advanced that prod
[14:18:00] and you told me about the new model to upload etc..
[14:20:55] elukey: yeah staging is more advanced than prod
[14:21:16] do we have a new docker image for rr?
[14:22:25] we can deploy the new image and new model to staging
[14:22:51] yep I have the new one! https://docker-registry.wikimedia.org/wikimedia/machinelearning-liftwing-inference-services-revertrisk/tags/
[14:23:09] if you have time to upload the model I'll change it as well in my code review
[14:25:12] mmm or I can upload it with your link
[14:25:14] doing it
[14:25:28] elukey: I've done it. :) New model's timestamps 20230201095010
[14:25:44] wow <3
[14:27:07] filed https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/885823
[14:30:50] I removed the nsfw config in staging since it seemed equal to the prod one, but lemme know if you want both
[14:32:01] remove +1
[14:32:15] elukey: what do u think about deploying and testing python 3.9 in prod for revscoring?
[14:32:52] isaranto: +1, we can start with one namespace in ml-serve-codfw
[14:33:00] should already be ready to go in theory
[14:34:58] ok, I will start rolling things out tomorrow morning then (if I start now I know how the evening will go ) 😛
[14:36:05] will also check to add some tests for outlink and rr so we can check everything
[14:36:36] httpbb doesnt support json yet. waiting for one more round of reviews
[14:36:48] nice job, I am following!
[14:37:03] anybody have handy a curl command for these models? (outlink+rr)
[14:38:11] I can give you the input.json files
[14:39:27] rr takes lang and rev_id right?
[14:40:09] yeah
[14:40:12] outlink a bit more
[14:40:30] remembered why we have nsfw in staging's yaml
[14:40:31] https://integration.wikimedia.org/ci/job/helm-lint/9212/console
[14:40:44] if we override the inference services list in staging we need to add all
[14:40:59] (see the diff, the isvc for nsfw gets removed)
[14:41:33] ok then , a sample call for outlink would be helpful
[14:41:55] I have one for local testing, not sure if it is the same for prod
[14:41:56] lemme check
[14:43:12] regarding the thing with nsfw: yeah it is the same thing with kustomize, it replaces the item in the yaml, doesnt append to the list
[14:44:18] isaranto: I keep saying that there is no joy in this line of work
[14:45:57] https://news.ycombinator.com/item?id=26681807
[14:45:57] haha
[14:52:10] Machine-Learning-Team: Host WikiGPT on Toolforge - https://phabricator.wikimedia.org/T328398 (calbon) Hey Zache! There is no documentation. It is just a proof-of-concept demo I made. But the Python code to run it yourself is here: https://github.com/chrisalbon/helpful_fountains/ Have fun!
[14:59:05] elukey: I thought you intended to remove nsfw isvc from staging haha
[15:00:05] aiko: ahahah no sorry should we do it?
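The nsfw exchange above turns on how value overrides are merged: maps are combined key by key, but a list in the override replaces the base list wholesale instead of appending to it. A minimal Python sketch of that behaviour follows, assuming Helm-style merge semantics. The `merge_values` helper, the `inference_services` key and the entries are illustrative only and are not taken from the actual deployment-charts values.

```python
import copy


def merge_values(base, override):
    """Merge two nested dicts: dicts merge recursively, everything else
    (lists included) is replaced by the value from `override`."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_values(merged[key], value)
        else:
            # Lists land here: the override list wins as a whole.
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical base values listing two services.
prod_values = {
    "inference_services": [
        {"name": "nsfw-model"},
        {"name": "revert-risk-model"},
    ]
}

# Hypothetical staging override that only mentions one service.
staging_override = {
    "inference_services": [
        {"name": "revert-risk-model", "model_version": "new"},
    ]
}

merged = merge_values(prod_values, staging_override)
service_names = [svc["name"] for svc in merged["inference_services"]]
print(service_names)  # nsfw-model is gone unless the override repeats it
```

Under these assumptions the staging file has to repeat every service it wants to keep, which matches the reason given for leaving nsfw in staging's yaml.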
[16:07:55] elukey: no no, all good :) It's good to keep it
[16:11:54] logging off, bye folks o/
[16:13:21] cu Aiko o/
[16:22:51] elukey: where do u think it is best I commit the httpbb yaml? inf services or deployment-scripts? I mean doesnt really fit in deployment scripts but it would end up on the deployment server as it is intended to
[16:24:35] isaranto: I think the best one is the puppet repo, since it will be deployed to the deploy1002 node automatically
[16:24:44] lemme find the change chat I did with Aiko
[16:25:08] https://gerrit.wikimedia.org/r/c/operations/puppet/+/766590
[16:26:04] ok, thanks, exactly what I wanted
[16:27:15] so for the calls to rr and outlink
[16:27:17] curl "https://inference-staging.svc.codfw.wmnet:30443/v1/models/revert-risk-model:predict" -X POST -d @input-rr.json -i -H "Host: revert-risk-model.experimental.wikimedia.org" --http1.1
[16:27:29] $ cat input-rr.json
[16:27:30] { "rev_id": 21774755, "lang": "en"}
[16:29:16] and
[16:29:17] curl "https://inference-staging.svc.codfw.wmnet:30443/v1/models/outlink-topic-model:predict" -X POST -d @outlink_test.json -i -H "Host: outlink-topic-model.articletopic-outlink.wikimedia.org" --http1.1
[16:30:31] and the outlink_test.json
[16:30:33] { "lang": "en", "page_title": "Victoria_Jubilee_Government_High_School", "features_str": "Q15987189 Q6604679 Q57137630 Q902 Q875538 Q13059501 Q4813767 Q4937781 Q19894466 Q25586928 Q13057539 Q2269756 Q7045494 Q6402931 Q2061316 Q25586927 Q22664 Q9610 Q16345577 Q9439 Q4860797 Q14980579 Q6947799 Q5151846 Q7121813"
[16:30:39] }
[16:30:41] isaranto: --^
[16:30:49] checked in staging and they all work fine
[16:31:36] great, thanks a lot!
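For scripting, the two curl calls above can be expressed with Python's standard library. This is a rough sketch, not an official client: `build_predict_request` is a hypothetical helper, the URLs, Host headers and payloads are copied from the chat, and actually sending the requests is assumed to work only from a host that can reach the staging endpoint and trusts its TLS certificate.

```python
import json
import urllib.request

# Endpoint taken from the curl commands in the chat.
STAGING_BASE = "https://inference-staging.svc.codfw.wmnet:30443"


def build_predict_request(model_name, host_header, payload, base_url=STAGING_BASE):
    """Build (but do not send) a POST to <base_url>/v1/models/<model_name>:predict."""
    url = f"{base_url}/v1/models/{model_name}:predict"
    body = json.dumps(payload).encode("utf-8")
    # The Host header selects the inference service, as with curl's -H "Host: ...".
    return urllib.request.Request(
        url, data=body, method="POST", headers={"Host": host_header}
    )


rr_request = build_predict_request(
    "revert-risk-model",
    "revert-risk-model.experimental.wikimedia.org",
    {"rev_id": 21774755, "lang": "en"},
)

outlink_request = build_predict_request(
    "outlink-topic-model",
    "outlink-topic-model.articletopic-outlink.wikimedia.org",
    {
        "lang": "en",
        "page_title": "Victoria_Jubilee_Government_High_School",
        "features_str": (
            "Q15987189 Q6604679 Q57137630 Q902 Q875538 Q13059501 Q4813767 "
            "Q4937781 Q19894466 Q25586928 Q13057539 Q2269756 Q7045494 Q6402931 "
            "Q2061316 Q25586927 Q22664 Q9610 Q16345577 Q9439 Q4860797 Q14980579 "
            "Q6947799 Q5151846 Q7121813"
        ),
    },
)

# Sending is left commented out because it needs network access to staging:
# with urllib.request.urlopen(rr_request) as response:
#     print(response.status, response.read().decode("utf-8"))
```

Building the request separately from sending it keeps the payloads easy to inspect and reuse in tests.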
[16:31:39] Machine-Learning-Team, Patch-For-Review: Upgrade ml clusters to kserve 0.9 - https://phabricator.wikimedia.org/T325528 (elukey) All model servers updated, the last step is to deploy the new models to production :)
[16:31:50] Machine-Learning-Team, Patch-For-Review: Upgrade ml clusters to kserve 0.9 - https://phabricator.wikimedia.org/T325528 (elukey)
[16:33:37] Machine-Learning-Team: Migrate ORES clients to LiftWing - https://phabricator.wikimedia.org/T312518 (elukey)
[16:34:06] Machine-Learning-Team, Data-Engineering-Planning, Research: Proposal: deprecate the mediawiki.revision-score stream in favour of more streams like mediawiki-revision-score- - https://phabricator.wikimedia.org/T317768 (elukey) Open→Resolved a:elukey I am going to close this task sin...
[16:37:40] Machine-Learning-Team: Implement new mediawiki.revision-score streams with Lift Wing - https://phabricator.wikimedia.org/T328576 (elukey)
[16:37:56] created the task to implement what we discussed during the last meeting --^
[16:40:55] one thing that troubles me with the httpbb tests is that they are all static and very big amount of work to maintain...
[16:41:23] what happens with old revision ids etc..
[16:41:50] then again nevermind, a script can create/update the tests 😄
[16:43:11] (just thinking aloud)
[16:46:58] * elukey nods
[16:47:08] going afk for today folks, have a nice rest of the day :)
[16:57:15] me2 folks, cu tomorrow!
[17:05:04] me three :)
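The idea of letting a script create or update the static tests could look roughly like the sketch below. It is only an illustration: `build_rr_cases` and the field names (`path`, `method`, `body`, `expect_status`) are hypothetical and are not the httpbb schema, and the source of fresh revision ids is left open.

```python
import json


def build_rr_cases(revisions, model_name="revert-risk-model"):
    """Turn (lang, rev_id) pairs into test-case dicts for the predict endpoint.

    The dict layout is a made-up intermediate format, not httpbb's own.
    """
    cases = []
    for lang, rev_id in revisions:
        cases.append(
            {
                "path": f"/v1/models/{model_name}:predict",
                "method": "POST",
                "body": {"lang": lang, "rev_id": rev_id},
                "expect_status": 200,
            }
        )
    return cases


# The single revision id below is the one used in the chat; a real script
# would refresh this list from some source of recent revisions.
cases = build_rr_cases([("en", 21774755)])
print(json.dumps(cases, indent=2))
```

Regenerating the cases from a list of revisions keeps the test file itself free of hand-maintained ids.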