[05:51:50] (CR) Kevin Bazira: [C:+1] "LGTM! The two HTTP response status codes are returned in the two scenarios mentioned in the commit message. I just have one question about" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1016341 (https://phabricator.wikimedia.org/T360406) (owner: AikoChou)
[06:45:33] Guten tag!
[08:40:58] kevinbazira: o/ Do you by any chance know where I can find the training dataset for the logo detection model?
[08:42:23] just wondering if it has been shared. Otherwise I'll ask mfossati for it
[08:42:29] isaranto: o/
[08:44:10] In https://phabricator.wikimedia.org/T358676 Marco shared the code he used for training: https://gitlab.wikimedia.org/mfossati/scriptz/-/blob/fbbef87dbd7d1ac4958754a53cc3643376fcdf12/multiclass_efficientnet.py
[08:45:00] that code shows `INPUT_DIR = '/home/mfossati/{clazz}/'`
[08:47:48] yes, but that is just a local dir. I was just wondering if he had shared anything else with you. I'll ask him in the task if he can provide it then. Thanks!
[08:48:24] I'm exploring if we can use pytorch instead. I believe it will make hosting a lot easier
[08:51:51] sure sure, he didn't share the training set with me. It is likely in `/home/mfossati/` on stat1008.
[08:51:51] Please reach out to him, he'll share the exact location.
[09:00:18] a ok, I didn't think about looking in statbox. thanks!
[09:02:24] would it be easier to first ask Marco if it is easy for them to switch the backend to torch? we can let them know the work of pytorch base image we're working on
[09:03:19] good mooorning o/
[09:05:58] they use Keras 3 and it supports torch backend
[09:07:08] hey Aiko good morning!
[09:07:23] but they'll need to modify their training codes for sure
[09:07:33] hiii Ilias :)
[09:08:03] yeah that's a better idea.
I just wanted to check if it is feasible but probably it will be easier for them to check
[09:09:15] also sending my <3 to Taiwan, what a tragedy :(
[09:10:01] I just saw the news
[09:15:10] yeah that was a strong earthquake in many years.. luckily my family in Taiwan are safe. thank you! 🤗
[09:15:19] <3
[09:24:26] kevinbazira: the question about how end users access the batch isvc is a good one. thanks for raising it! I'll draft some proposals today
[09:26:14] aiko: o/
[09:26:23] good to know your family is safe
[09:26:54] isaranto: great. looking forward to the proposal :)
[09:34:20] kevinbazira: thanks :)
[09:34:32] isaranto: o/ have you had time to check the inconsistent load test results for revertrisk?
[09:37:37] Machine-Learning-Team, Structured-Data-Backlog (Current Work): Host a logo detection model for Commons images - https://phabricator.wikimedia.org/T358676#9683491 (CodeReviewBot) mfossati merged https://gitlab.wikimedia.org/mfossati/scriptz/-/merge_requests/8 lw_prototype: image download error handling
[10:01:55] Machine-Learning-Team, Patch-For-Review, Structured-Data-Backlog (Current Work): Host a logo detection model for Commons images - https://phabricator.wikimedia.org/T358676#9683574 (CodeReviewBot) kevinbazira opened https://gitlab.wikimedia.org/mfossati/scriptz/-/merge_requests/9 lw_prototype: add Lo...
[10:37:07] Machine-Learning-Team, Structured-Data-Backlog (Current Work): Host a logo detection model for Commons images - https://phabricator.wikimedia.org/T358676#9683661 (isarantopoulos) Hi @mfossati ! Thanks a lot for all this great work! I was wondering if you had tried to train the same model using pytorch as...
[10:55:13] aiko: shall we merge this one https://gerrit.wikimedia.org/r/c/machinelearning/liftwing/inference-services/+/1015341?
[11:08:34] yep!
[11:08:55] (CR) AikoChou: [C:+2] locust: fix missing host header for revertrisk load tests [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1015341 (https://phabricator.wikimedia.org/T361234) (owner: AikoChou)
[11:11:05] (CR) AikoChou: [V:+2 C:+2] locust: fix missing host header for revertrisk load tests [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1015341 (https://phabricator.wikimedia.org/T361234) (owner: AikoChou)
[11:12:20] (CR) Ilias Sarantopoulos: revertrisk: error handling for batch requests (1 comment) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1016341 (https://phabricator.wikimedia.org/T360406) (owner: AikoChou)
[11:13:26] aiko: regarding error handling for batch requests - sorry for the late review above --^ I was wondering if anything else stops us from serving multiple languages in the same request (maybe I am missing sth)
[11:21:56] isaranto: no there is no other reason other than reusing the async session. at first I was thinking to make it simple for the initial version
[11:22:15] I think it is good point given we'll limit the num of requests
[11:22:19] cool cool
[11:22:28] let's do it like that, one step at a time
[11:22:28] and we can group the requests for the same wiki
[11:23:38] (CR) Ilias Sarantopoulos: [C:+1] revertrisk: error handling for batch requests (1 comment) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1016341 (https://phabricator.wikimedia.org/T360406) (owner: AikoChou)
[11:26:07] thanks for the review :)
[11:34:51] * isaranto lunch!
[12:42:02] hello folks
[12:44:41] hey Luca!
[13:16:21] Good morning all
[13:42:17] Hey Chris!
hope everything is ok 🤞
[13:56:54] folks I may be a little late, there was a k8s (wikikube) outage
[13:59:21] ruh orh
[13:59:48] ack
[14:04:40] Machine-Learning-Team, Structured-Data-Backlog (Current Work): Host a logo detection model for Commons images - https://phabricator.wikimedia.org/T358676#9684510 (CodeReviewBot) mfossati merged https://gitlab.wikimedia.org/mfossati/scriptz/-/merge_requests/9 lw_prototype: add LogoDetectionModel class
[14:11:17] Machine-Learning-Team, Structured-Data-Backlog (Current Work): Host a logo detection model for Commons images - https://phabricator.wikimedia.org/T358676#9684535 (mfossati) >>! In T358676#9683661, @isarantopoulos wrote: > I was wondering if you had tried to train the same model using pytorch as a keras b...
[15:01:29] isaranto: I need to get a review from serviceops for https://gerrit.wikimedia.org/r/c/operations/puppet/+/1016798 and then we should be good to go with the pytorch base image
[15:02:01] for the moment let's duplicate the config, I'll study something to hopefully make it more generic
[15:03:01] ack
[15:03:12] elukey: what's the weekly rebuild here -> https://gerrit.wikimedia.org/r/c/operations/docker-images/production-images/+/1015530/8..9?
[15:03:36] isaranto: it is something that runs as cron on the build node, set up by SRE
[15:03:56] it happens every sunday IIRC
[15:04:13] in theory it should allow us to verify that our builds are not broken
[15:05:09] cool!
[15:11:42] shall I keep the `-1` at the end of the version name in my patch as well? e.g. `2.1.2rocm5.5-1` instead of `2.1.2rocm5.5`?
[15:12:42] yep, we'll use it in case we need to apply fixes etc.. to the Dockerfile
[15:13:12] I am currently reworking a little the comments: what do you think about adding a common README.md and point people to it (with a comment) in the Dockerfiles?
[15:13:19] so we don't copy/paste
[15:14:55] yeah that sounds much better
[15:15:02] filing a patch in a sec
[15:18:32] I'll update my patch accordingly.
You'll have to push the image though after we merge as I don't have root access
[15:18:45] 5 euros :)
[15:20:50] + tip for the review :D
[15:24:17] I'm logging off folks, have a nice evening /rest of day
[15:28:05] created the change https://gerrit.wikimedia.org/r/c/operations/docker-images/production-images/+/1016807
[15:32:32] I +1 and I update the pytorch21-rocm55 patch accordingly!
[15:32:53] now I am officially logging off :)
[15:37:39] thank youuu, merged
[15:41:04] logging off a little earlier folks to take care of Alessandro
[15:41:30] hopefully tomorrow we should be able to have pytorch 2.2 and 2.1 in the Docker registry
[15:41:34] fingers crossed.
[15:41:38] have a nice rest of the day!
[15:47:31] bye Luca and Ilias! have a nice evening :)
[16:20:55] Machine-Learning-Team, Patch-For-Review: Error handling in Batch Predictions for RevertRisk Models - https://phabricator.wikimedia.org/T360406#9685089 (achou) @kevinbazira posed a question - how can end users switch between batch and non-batch requests? First to clarify, the batch model can also handle...