[07:42:17] <elukey>	 morning folks :)
[07:43:07] <elukey>	 kevinbazira: o/
[07:43:27] <elukey>	 qq - is drafttopic a new model kind? Namely, do you need a new k8s namespace etc..?
[07:43:34] <elukey>	 (as we did for revscoring-articletopic)
[08:09:55] <elukey>	 ---
[08:10:11] <elukey>	 I checked the release details for kserve 0.9 and there is an interesting thing
[08:10:15] <elukey>	 https://github.com/kserve/kserve/releases/tag/v0.9.0
[08:10:29] <elukey>	 they added a nice support to scale transformers separately from predictor pods
[08:10:34] <kevinbazira>	 morning o/
[08:10:42] <elukey>	 so there is no more a 1:1 correspondence
[08:10:56] <elukey>	 we can have few transformers (all using async http calls) and more predictors
[08:13:19] <kevinbazira>	 elukey: yes, drafttopic will need it's own namespace revscoring-drafttopic ... it has 10 models.
[08:13:22] <wikibugs>	 (03PS9) 10Elukey: editquality - add MWAPICache to preprocess [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/818067 (https://phabricator.wikimedia.org/T313915)
[08:13:38] <elukey>	 kevinbazira: ah perfect, I'll create all configs asap so you'll be unblocked
[08:13:47] <kevinbazira>	 I am yet to create a task for migrating these models. will let you know soon as I've created it.
[08:14:07] <kevinbazira>	 Thank you for helping with the configs.
[08:17:08] <wikibugs>	 (03CR) 10Elukey: editquality - add MWAPICache to preprocess (031 comment) [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/818067 (https://phabricator.wikimedia.org/T313915) (owner: 10Elukey)
[08:18:52] <elukey>	 kevinbazira: no problem, I can proceed anyway ahead of you, I'd just need the name of the k8s namespace. Is revscoring-drafttopic ok? (double t)
[08:20:08] <kevinbazira>	 yep, revscoring-drafttopic is good.
[08:22:52] <elukey>	 super
[08:23:44] <wikibugs>	 (03PS10) 10Elukey: editquality - add MWAPICache to preprocess [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/818067 (https://phabricator.wikimedia.org/T313915)
[08:25:38] <wikibugs>	 (03CR) 10Elukey: editquality - add MWAPICache to preprocess (031 comment) [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/818067 (https://phabricator.wikimedia.org/T313915) (owner: 10Elukey)
[08:34:29] <kevinbazira>	 starting to deploy the last set of articletopic isvcs to prod.
[08:38:24] <elukey>	 super
[08:43:52] <kevinbazira>	 both eqiad and codfw prod deployments have been completed successfully.
[08:43:52] <kevinbazira>	 checking pods now ...
[08:47:42] <kevinbazira>	 all pods are up and running besides:
[08:47:42] <kevinbazira>	 NAME                                                              READY   STATUS                  RESTARTS   AGE
[08:47:42] <kevinbazira>	 wikidatawiki-articletopic-predictor-default-79n9q-deploymekd7rq   0/3     Init:CrashLoopBackOff   4          3m24s
[08:47:42] <kevinbazira>	 now investigating the cause of this CrashLoopBackOff issue
[08:53:45] <kevinbazira>	 the storage-initializer says it cant find the model in path articletopic/wikidatawiki/20220720074925/
[08:54:52] <kevinbazira>	 and that's true because is located in articletopic/wikidata/20220720074925/
[08:55:25] <kevinbazira>	 going to change the models path in Thanos to articletopic/wikidatawiki/20220720074925/
[08:58:21] <wikibugs>	 (03CR) 10AikoChou: [C: 03+2] "LGTM! Let's see what happens in prod." [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/818067 (https://phabricator.wikimedia.org/T313915) (owner: 10Elukey)
[09:00:52] <aiko>	 elukey: that's a nice support to scale transformers separately from predictor pods! I was wondering if that is possible.
[09:03:20] <wikibugs>	 (03Merged) 10jenkins-bot: editquality - add MWAPICache to preprocess [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/818067 (https://phabricator.wikimedia.org/T313915) (owner: 10Elukey)
[09:03:27] <elukey>	 thanks aiko :)
[09:03:47] <elukey>	 I am also testing Ray workers at the moment, to see if the model can go in its separate process
[09:03:53] <elukey>	 if so it would complete the picture
[09:04:28] <elukey>	 the docs say "KServe integrates RayServe which provides a programmable API to deploy models as separate python workers so the inference can be ran in parallel."
[09:05:00] <elukey>	 and in theory it is the same thing as I can see in https://phabricator.wikimedia.org/T309624#7994316 right?
[09:10:27] <aiko>	 yes, we can use RayServe to deploy 2 replicas for one model (the link you pasted), or 2 different models (https://phabricator.wikimedia.org/T309624#8005409) running in parallel.
[09:11:03] <elukey>	 aiko: I think that even a single replica for a model would be fine
[09:11:19] <elukey>	 the important bit in my opinion is to avoid running the model inside the tornado io loop
[09:14:35] <aiko>	 elukey: completely agree
[09:17:25] <elukey>	 aiko: where did you add https://phabricator.wikimedia.org/T309624#8002302 in our model.py code?
[09:20:28] <wikibugs>	 10Lift-Wing, 10artificial-intelligence, 10Machine-Learning-Team (Active Tasks): Fix wikidatawiki articletopic predictor Init:CrashLoopBackOff issue - https://phabricator.wikimedia.org/T314278 (10kevinbazira)
[09:22:06] <wikibugs>	 10Lift-Wing, 10artificial-intelligence, 10Machine-Learning-Team (Active Tasks): Fix wikidatawiki articletopic predictor Init:CrashLoopBackOff issue - https://phabricator.wikimedia.org/T314278 (10kevinbazira) The wikidata articletopic model has been moved to the location that the storage initializer expects i...
[09:24:08] <aiko>	 elukey: the model.py looks like this https://phabricator.wikimedia.org/P32119
[09:24:29] <wikibugs>	 10Lift-Wing, 10artificial-intelligence, 10Machine-Learning-Team (Active Tasks): Upload articletopic model binaries to storage - https://phabricator.wikimedia.org/T313305 (10kevinbazira) In T314278 the wikidata model has been moved to a new location as this is where the wikidatawiki-articletopic-predictor sto...
[09:27:34] <kevinbazira>	 the wikidatawiki articletopic predictor CrashLoopBackOff issue has been fixed.
[09:27:35] <kevinbazira>	 this means all articletopic pods are now up and running. 
[09:29:34] <elukey>	 nice!
[09:36:43] <aiko>	 elukey: do you have time for a short meeting with me today? I encounter some problems when restarting the minikube on ml-sandbox..
[09:38:34] <elukey>	 aiko: sure!
[09:39:12] <aiko>	 elukey: what time are you available?
[09:39:45] <elukey>	 aiko: check my calendar, we can do anytime after 3 pm
[09:41:36] <elukey>	 I am currently trying to create Ray workers, the code seems fine but I don't see the new processes
[09:41:51] <elukey>	 ahhh because I am stupid
[09:42:02] <aiko>	 elukey: meeting invite sent. Thanks :)
[09:49:28] <aiko>	 elukey: another question - how do I install wrk on deploy1002? I want to do some load tests for outlink model
[09:51:04] <elukey>	 aiko: ah snap they may have removed it
[09:51:21] <elukey>	 there is `siege` if you want to use it, but it is limited
[09:51:25] <elukey>	 I'll try to find an alternative
[09:54:54] <aiko>	 elukey: I got -bash: siege: command not found, maybe it has been removed as well
[09:55:07] <elukey>	 :(
[10:08:07] <elukey>	 aiko: they got removed recently for https://phabricator.wikimedia.org/T230178
[10:10:31] <elukey>	 I asked to SRE what is the policy for those tools, let's see what is the answer
[10:13:38] <aiko>	 thanks Luca :)
[10:17:43] <elukey>	 aiko: for outlink I think that the performances will not be super great, the model takes a ton of time to execute and IIRC it is still running on tornado 
[10:25:46] <elukey>	 going out for lunch, ttl
[12:11:01] <wikibugs>	 10Lift-Wing, 10Machine-Learning-Team (Active Tasks): Use non-blocking HTTP calls to get outlinks for Outlinks topic model - https://phabricator.wikimedia.org/T311043 (10achou) I found out the response time for outlink model highly depends on the input article.   When the queried article is long and has many wi...
[12:12:31] <aiko>	 elukey: ^^^ something I just learned
[13:01:46] <elukey>	 aiko: ah nice!
[13:01:49] <elukey>	 good finding :)
[13:11:45] <aiko>	 elukey: are you coming to the meeting? :)
[13:16:15] <elukey>	 aiko: ah sorry! I was doing code reviews, for some reason I recalled 15:30
[13:16:17] <elukey>	 joining
[14:13:02] <chrisalbon>	 Morning all!
[14:13:19] <chrisalbon>	 Its presentation making day for me
[14:13:20] <elukey>	 morning :)
[14:20:05] <wikibugs>	 10artificial-intelligence, 10WMF-Inspiration-Week-2022-ML-Collab: Deploy Image content filtration model for Wikimedia Commons - https://phabricator.wikimedia.org/T279416 (10dmaza)
[14:23:22] <elukey>	 chrisalbon: https://github.com/scikit-learn/scikit-learn/pull/23936 - we have a great committer in our team :)
[14:24:18] <chrisalbon>	 Amazing!!!!
[14:24:30] <chrisalbon>	 We really need to keep track of you all's open source contributions
[14:24:41] <chrisalbon>	 That is so cool Aiko!
[14:29:54] <elukey>	 kevinbazira: the revscoring-drafttopic deployment-charts settings are deployed :)
[14:30:03] <elukey>	 you can start anytime adding models to deployment-charts
[14:30:52] * elukey taking a little break
[15:23:43] <elukey>	 aiko: how is it going with the ml sandbox?
[15:31:50] <aiko>	 elukey: Hi Luca, I re-installed the minio, but it still has the same error. keep investigating..
[15:40:21] <aiko>	 chrisalbon: I also met scikit-learn folks in-person at europython! :)
[15:40:31] <chrisalbon>	 That is so cool!
[15:40:37] <chrisalbon>	 Love scikit-learn
[16:07:55] <elukey>	 going afk for today, have a nice rest of the day folks :)
[16:52:41] <aiko>	 kserve-test namespace was missing for the minio-service in the documentation.. that's why it couldn't find the endpoint URL: "http://minio-service.kserve-test:9000/xxxxxx"
[16:53:29] <aiko>	 updated the documentation https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/ML-Sandbox/Configuration#Minio
[17:44:23] <wikibugs>	 10Machine-Learning-Team, 10ORES, 10MediaWiki-Core-Preferences, 10Moderator-Tools-Team: 'Highlight likely problem edits' preference doesn't work in mobile web - https://phabricator.wikimedia.org/T314026 (10Samwalton9)
[17:44:53] <wikibugs>	 10Machine-Learning-Team, 10ORES, 10MediaWiki-Core-Preferences, 10Moderator-Tools-Team: 'Highlight likely problem edits' preference doesn't work in mobile web - https://phabricator.wikimedia.org/T314026 (10Samwalton9) These settings likely require the ORES extension to be enabled to test: https://www.mediaw...
[17:45:38] <wikibugs>	 10Machine-Learning-Team, 10ORES, 10MediaWiki-Core-Preferences, 10Moderator-Tools-Team (Kanban): 'Highlight likely problem edits' preference doesn't work in mobile web - https://phabricator.wikimedia.org/T314026 (10Samwalton9) p:05Triage→03Low