[07:42:17] morning folks :)
[07:43:07] kevinbazira: o/
[07:43:27] qq - is drafttopic a new model kind? Namely, do you need a new k8s namespace etc.?
[07:43:34] (as we did for revscoring-articletopic)
[08:09:55] ---
[08:10:11] I checked the release notes for kserve 0.9 and there is an interesting thing
[08:10:15] https://github.com/kserve/kserve/releases/tag/v0.9.0
[08:10:29] they added nice support for scaling transformers separately from predictor pods
[08:10:34] morning o/
[08:10:42] so there is no longer a 1:1 correspondence
[08:10:56] we can have a few transformers (all using async http calls) and more predictors
[08:13:19] elukey: yes, drafttopic will need its own namespace revscoring-drafttopic ... it has 10 models.
[08:13:22] (PS9) Elukey: editquality - add MWAPICache to preprocess [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/818067 (https://phabricator.wikimedia.org/T313915)
[08:13:38] kevinbazira: ah perfect, I'll create all the configs asap so you'll be unblocked
[08:13:47] I am yet to create a task for migrating these models. Will let you know as soon as I've created it.
[08:14:07] Thank you for helping with the configs.
[08:17:08] (CR) Elukey: editquality - add MWAPICache to preprocess (1 comment) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/818067 (https://phabricator.wikimedia.org/T313915) (owner: Elukey)
[08:18:52] kevinbazira: no problem, I can proceed ahead of you anyway, I'd just need the name of the k8s namespace. Is revscoring-drafttopic ok? (double t)
[08:20:08] yep, revscoring-drafttopic is good.
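[Editor's sketch of the KServe 0.9 feature discussed above: in a v1beta1 InferenceService each component carries its own minReplicas/maxReplicas, so a deployment could run a few async-IO transformer pods in front of more predictor pods. The service name, images, and replica counts below are hypothetical placeholders, not a real Lift Wing config.]

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: example-isvc                        # hypothetical name
spec:
  transformer:                              # a few async-IO transformers...
    minReplicas: 1
    maxReplicas: 2
    containers:
      - name: kserve-container
        image: example/transformer:latest   # placeholder image
  predictor:                                # ...in front of more CPU-bound predictors
    minReplicas: 2
    maxReplicas: 6
    containers:
      - name: kserve-container
        image: example/predictor:latest     # placeholder image
```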
[08:22:52] super
[08:23:44] (PS10) Elukey: editquality - add MWAPICache to preprocess [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/818067 (https://phabricator.wikimedia.org/T313915)
[08:25:38] (CR) Elukey: editquality - add MWAPICache to preprocess (1 comment) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/818067 (https://phabricator.wikimedia.org/T313915) (owner: Elukey)
[08:34:29] starting to deploy the last set of articletopic isvcs to prod.
[08:38:24] super
[08:43:52] both eqiad and codfw prod deployments have been completed successfully.
[08:43:52] checking pods now ...
[08:47:42] all pods are up and running besides:
[08:47:42] NAME READY STATUS RESTARTS AGE
[08:47:42] wikidatawiki-articletopic-predictor-default-79n9q-deploymekd7rq 0/3 Init:CrashLoopBackOff 4 3m24s
[08:47:42] now investigating the cause of this CrashLoopBackOff issue
[08:53:45] the storage-initializer says it can't find the model in path articletopic/wikidatawiki/20220720074925/
[08:54:52] and that's true, because it is located in articletopic/wikidata/20220720074925/
[08:55:25] going to change the model's path in Thanos to articletopic/wikidatawiki/20220720074925/
[08:58:21] (CR) AikoChou: [C: +2] "LGTM! Let's see what happens in prod." [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/818067 (https://phabricator.wikimedia.org/T313915) (owner: Elukey)
[09:00:52] elukey: that's nice support for scaling transformers separately from predictor pods! I was wondering if that was possible.
[09:03:20] (Merged) jenkins-bot: editquality - add MWAPICache to preprocess [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/818067 (https://phabricator.wikimedia.org/T313915) (owner: Elukey)
[09:03:27] thanks aiko :)
[09:03:47] I am also testing Ray workers at the moment, to see if the model can go in its own separate process
[09:03:53] if so it would complete the picture
[09:04:28] the docs say "KServe integrates RayServe which provides a programmable API to deploy models as separate python workers so the inference can be ran in parallel."
[09:05:00] and in theory it is the same thing as I can see in https://phabricator.wikimedia.org/T309624#7994316 right?
[09:10:27] yes, we can use RayServe to deploy 2 replicas of one model (the link you pasted), or 2 different models (https://phabricator.wikimedia.org/T309624#8005409) running in parallel.
[09:11:03] aiko: I think that even a single replica per model would be fine
[09:11:19] the important bit in my opinion is to avoid running the model inside the tornado io loop
[09:14:35] elukey: completely agree
[09:17:25] aiko: where did you add https://phabricator.wikimedia.org/T309624#8002302 in our model.py code?
[09:20:28] Lift-Wing, artificial-intelligence, Machine-Learning-Team (Active Tasks): Fix wikidatawiki articletopic predictor Init:CrashLoopBackOff issue - https://phabricator.wikimedia.org/T314278 (kevinbazira)
[09:22:06] Lift-Wing, artificial-intelligence, Machine-Learning-Team (Active Tasks): Fix wikidatawiki articletopic predictor Init:CrashLoopBackOff issue - https://phabricator.wikimedia.org/T314278 (kevinbazira) The wikidata articletopic model has been moved to the location that the storage initializer expects i...
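[Editor's note: the point above about keeping the model out of the tornado io loop can be illustrated with a small stdlib-only asyncio sketch. `slow_model` is a stand-in, not the real revscoring predictor: the blocking call is pushed to an executor, and a heartbeat task shows the event loop stays responsive meanwhile. Real deployments would use separate worker processes, as with KServe's RayServe integration.]

```python
import asyncio
import time


def slow_model(features):
    # Stand-in for a blocking, CPU-heavy model call (e.g. a revscoring
    # predict); running this directly in the event loop would freeze it.
    time.sleep(0.2)
    return {"prediction": sum(features)}


async def heartbeat(ticks):
    # Keeps appending timestamps; if the loop were blocked by the model,
    # no ticks would accumulate while it runs.
    while True:
        ticks.append(time.monotonic())
        await asyncio.sleep(0.05)


async def main():
    loop = asyncio.get_running_loop()
    ticks = []
    hb = asyncio.ensure_future(heartbeat(ticks))
    # Off-loop execution: the blocking call runs in the default thread
    # pool while the event loop keeps serving the heartbeat task.
    result = await loop.run_in_executor(None, slow_model, [1, 2, 3])
    hb.cancel()
    return result, len(ticks)


result, tick_count = asyncio.run(main())
print(result)      # {'prediction': 6}
print(tick_count)  # several ticks: the loop was not blocked
```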
[09:24:08] elukey: the model.py looks like this https://phabricator.wikimedia.org/P32119
[09:24:29] Lift-Wing, artificial-intelligence, Machine-Learning-Team (Active Tasks): Upload articletopic model binaries to storage - https://phabricator.wikimedia.org/T313305 (kevinbazira) In T314278 the wikidata model has been moved to a new location as this is where the wikidatawiki-articletopic-predictor sto...
[09:27:34] the wikidatawiki articletopic predictor CrashLoopBackOff issue has been fixed.
[09:27:35] this means all articletopic pods are now up and running.
[09:29:34] nice!
[09:36:43] elukey: do you have time for a short meeting with me today? I encountered some problems when restarting minikube on ml-sandbox..
[09:38:34] aiko: sure!
[09:39:12] elukey: what time are you available?
[09:39:45] aiko: check my calendar, we can do it anytime after 3 pm
[09:41:36] I am currently trying to create Ray workers, the code seems fine but I don't see the new processes
[09:41:51] ahhh because I am stupid
[09:42:02] elukey: meeting invite sent. Thanks :)
[09:49:28] elukey: another question - how do I install wrk on deploy1002?
I want to do some load tests for the outlink model
[09:51:04] aiko: ah snap, they may have removed it
[09:51:21] there is `siege` if you want to use it, but it is limited
[09:51:25] I'll try to find an alternative
[09:54:54] elukey: I got -bash: siege: command not found, maybe it has been removed as well
[09:55:07] :(
[10:08:07] aiko: they got removed recently per https://phabricator.wikimedia.org/T230178
[10:10:31] I asked SRE what the policy is for those tools, let's see what the answer is
[10:13:38] thanks Luca :)
[10:17:43] aiko: for outlink I think the performance will not be super great, the model takes a ton of time to execute and IIRC it is still running on tornado
[10:25:46] going out for lunch, ttl
[12:11:01] Lift-Wing, Machine-Learning-Team (Active Tasks): Use non-blocking HTTP calls to get outlinks for Outlinks topic model - https://phabricator.wikimedia.org/T311043 (achou) I found out the response time for the outlink model highly depends on the input article. When the queried article is long and has many wi...
[12:12:31] elukey: ^^^ something I just learned
[13:01:46] aiko: ah nice!
[13:01:49] good find :)
[13:11:45] elukey: are you coming to the meeting? :)
[13:16:15] aiko: ah sorry! I was doing code reviews, for some reason I recalled 15:30
[13:16:17] joining
[14:13:02] Morning all!
[14:13:19] It's presentation-making day for me
[14:13:20] morning :)
[14:20:05] artificial-intelligence, WMF-Inspiration-Week-2022-ML-Collab: Deploy Image content filtration model for Wikimedia Commons - https://phabricator.wikimedia.org/T279416 (dmaza)
[14:23:22] chrisalbon: https://github.com/scikit-learn/scikit-learn/pull/23936 - we have a great committer on our team :)
[14:24:18] Amazing!!!!
[14:24:30] We really need to keep track of all of your open source contributions
[14:24:41] That is so cool Aiko!
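[Editor's sketch of the idea behind T311043 above, switching outlink retrieval to non-blocking HTTP calls. `fetch_outlink` is a stub standing in for the real MW API request: issuing the per-article lookups concurrently makes latency track the slowest single call rather than the sum of all calls, which matters for long articles with many wikilinks.]

```python
import asyncio
import time


async def fetch_outlink(title):
    # Stub for a non-blocking MW API request; a real client would use
    # an async HTTP library here instead of asyncio.sleep.
    await asyncio.sleep(0.1)
    return f"features:{title}"


async def get_outlinks(titles):
    # Fire all lookups at once; gather preserves input order.
    return await asyncio.gather(*(fetch_outlink(t) for t in titles))


titles = [f"Article_{i}" for i in range(20)]
start = time.monotonic()
results = asyncio.run(get_outlinks(titles))
elapsed = time.monotonic() - start
print(len(results))   # 20
print(elapsed < 2.0)  # concurrent: ~0.1 s total, not 20 x 0.1 s
```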
[14:29:54] kevinbazira: the revscoring-drafttopic deployment-charts settings are deployed :)
[14:30:03] you can start adding models to deployment-charts anytime
[14:30:52] * elukey taking a little break
[15:23:43] aiko: how is it going with the ml sandbox?
[15:31:50] elukey: Hi Luca, I re-installed minio, but it still has the same error. Keep investigating..
[15:40:21] chrisalbon: I also met scikit-learn folks in person at EuroPython! :)
[15:40:31] That is so cool!
[15:40:37] Love scikit-learn
[16:07:55] going afk for today, have a nice rest of the day folks :)
[16:52:41] the kserve-test namespace was missing for the minio-service in the documentation.. that's why it couldn't find the endpoint URL: "http://minio-service.kserve-test:9000/xxxxxx"
[16:53:29] updated the documentation https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/ML-Sandbox/Configuration#Minio
[17:44:23] Machine-Learning-Team, ORES, MediaWiki-Core-Preferences, Moderator-Tools-Team: 'Highlight likely problem edits' preference doesn't work in mobile web - https://phabricator.wikimedia.org/T314026 (Samwalton9)
[17:44:53] Machine-Learning-Team, ORES, MediaWiki-Core-Preferences, Moderator-Tools-Team: 'Highlight likely problem edits' preference doesn't work in mobile web - https://phabricator.wikimedia.org/T314026 (Samwalton9) These settings likely require the ORES extension to be enabled to test: https://www.mediaw...
[17:45:38] Machine-Learning-Team, ORES, MediaWiki-Core-Preferences, Moderator-Tools-Team (Kanban): 'Highlight likely problem edits' preference doesn't work in mobile web - https://phabricator.wikimedia.org/T314026 (Samwalton9) p:Triage→Low