[06:37:37] (CR) Kevin Bazira: Makefile: add support for revertrisk-multilingual (2 comments) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/995198 (https://phabricator.wikimedia.org/T356501) (owner: AikoChou)
[07:06:40] Good morning!
[07:57:10] * isaranto running an errand - bbl
[10:45:43] Morning! I'll do the roll-drain thing for serve-codfw today, unless there are any objections in the next few minutes
[11:07:15] Morning Tobias! Sure, go ahead!
[11:08:37] For some reason, readability-predictor-default is misbehaving
[11:08:45] It's stuck in PodInitializing
[11:10:13] Ah, it's stuck downloading the image
[11:14:17] Ok, cordoned the stuck machine, killed the pod, it started fine on another uncordoned machine. Still not sure how the docker pull got stuck
[11:14:29] uncordoned machine and now proceeding with the rest
[11:25:38] Machine-Learning-Team: Support building and running of article-descriptions model-server via Makefile - https://phabricator.wikimedia.org/T356176 (kevinbazira) Open→Resolved Support for building the article-descriptions model-server using the Makefile was added and it can be tested using: ` # first t...
[11:25:42] Machine-Learning-Team: Add a script for running the Revert Risk model server locally - https://phabricator.wikimedia.org/T352689 (kevinbazira)
[11:33:12] Machine-Learning-Team: Maintain models directory structure for model-server make builds to remain consistent with the analytics repo - https://phabricator.wikimedia.org/T356985 (kevinbazira)
[11:35:20] Machine-Learning-Team: Maintain models directory structure for model-server make builds to remain consistent with the analytics repo - https://phabricator.wikimedia.org/T356985 (kevinbazira) p:Triage→Medium
[11:36:15] Machine-Learning-Team: Maintain models directory structure for model-server make builds to remain consistent with the analytics repo - https://phabricator.wikimedia.org/T356985 (kevinbazira) Open→In progress
[11:36:17] Machine-Learning-Team: Add a script for running the Revert Risk model server locally - https://phabricator.wikimedia.org/T352689 (kevinbazira)
[11:40:15] (PS1) Kevin Bazira: Makefile: maintain models directory structure as analytics repo [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/998283 (https://phabricator.wikimedia.org/T356985)
[11:42:20] (CR) Kevin Bazira: "To make reviewing easier, here are the commands I used to test the 2 model-server builds:" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/998283 (https://phabricator.wikimedia.org/T356985) (owner: Kevin Bazira)
[12:40:56] roll-restart is done, but there are some non-LW issues (docker-registry is very slow for some unknown reason). I'll help SRE investigate.
[12:53:30] ack!
[12:53:57] * isaranto lunch!
[13:01:58] ditto
[14:24:00] * isaranto sighs
[14:24:32] the things I have been trying with dumb-init won't work. perhaps I got it wrong
[14:29:23] Jumping on revertrisk multilingual upgrade!
[14:51:20] Good luck!
[14:55:42] (CR) Ilias Sarantopoulos: "One small nit, other than that it works great!" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/998283 (https://phabricator.wikimedia.org/T356985) (owner: Kevin Bazira)
[15:05:00] (PS1) Ilias Sarantopoulos: rrml: upgrade kserve to 0.11.2 [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/998946 (https://phabricator.wikimedia.org/T347551)
[15:10:19] (CR) CI reject: [V: -1] rrml: upgrade kserve to 0.11.2 [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/998946 (https://phabricator.wikimedia.org/T347551) (owner: Ilias Sarantopoulos)
[15:35:20] (PS2) Ilias Sarantopoulos: rrml: upgrade kserve to 0.11.2 [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/998946 (https://phabricator.wikimedia.org/T347551)
[15:46:30] also I followed up on catboost releases https://github.com/catboost/catboost/discussions/2592
[15:48:09] Machine-Learning-Team, Patch-For-Review: Upgrade Revert Risk Multilingual docker images to KServe 0.11 - https://phabricator.wikimedia.org/T347551 (isarantopoulos) Since the latest catboost release is still pending we discussed in proceeding without it for now by manually limiting the number of threads....
[15:48:22] isaranto: o/ there is also a catboost community channel on telegram, one of the main devs is very active.. others are asking for the new release, it was scheduled last month but I think that they are waiting for new stuff to be merged/fixed
[15:50:30] Luca! thanks for mentioning that. I totally missed the telegram channel
[15:52:09] I found it by chance, tried to check every now and then but no mention of a release :(
[15:52:13] soooo long
[15:52:15] <3
[16:11:56] (PS2) Kevin Bazira: Makefile: maintain models directory structure as analytics repo [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/998283 (https://phabricator.wikimedia.org/T356985)
[16:13:27] (CR) Kevin Bazira: Makefile: maintain models directory structure as analytics repo (1 comment) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/998283 (https://phabricator.wikimedia.org/T356985) (owner: Kevin Bazira)
[16:34:48] (CR) Ilias Sarantopoulos: [C: +1] "Nice!" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/998283 (https://phabricator.wikimedia.org/T356985) (owner: Kevin Bazira)
[16:47:38] (CR) Kevin Bazira: [V: +2 C: +2] "Thanks for the review :)" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/998283 (https://phabricator.wikimedia.org/T356985) (owner: Kevin Bazira)
[16:57:55] (PS3) Ilias Sarantopoulos: rrml: upgrade kserve to 0.11.2 [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/998946 (https://phabricator.wikimedia.org/T347551)
[17:21:02] heading out now, see y'all tomorrow!
[17:31:19] good evening Tobias, I'm heading out as well o/
[23:27:54] Machine-Learning-Team, Wikipedia-Android-App-Backlog (Android Release - FY2023-24): Migrate Machine-generated Article Descriptions from toolforge to liftwing. - https://phabricator.wikimedia.org/T343123 (Isaac) This is very useful (and exciting) data -- thank you @isarantopoulos ! > shall I use the tes...