[07:13:57] (CR) Kevin Bazira: [C:+1] "I've tried the new error messages using this payload:" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1011305 (https://phabricator.wikimedia.org/T351278) (owner: AikoChou)
[08:44:55] Machine-Learning-Team: Add Dragonfly to the ML k8s clusters - https://phabricator.wikimedia.org/T359416#9636703 (JMeybohm) >>! In T359416#9624256, @elukey wrote: > Afaics from the logs the client were getting chunks of data every time from the registry (not the entire content at once), but I am wondering if...
[09:50:24] good morning :)
[10:33:03] Machine-Learning-Team: Support building and running of articletopic-outlink model-server via Makefile - https://phabricator.wikimedia.org/T360177#9636947 (kevinbazira) Running into the error below which is caused by a missing `events` module. This module is used to [[ https://github.com/wikimedia/machinelear...
[11:06:13] Machine-Learning-Team, Structured-Data-Backlog: Host a logo detection model for Commons images - https://phabricator.wikimedia.org/T358676#9637065 (mfossati) >>! In T358676#9629389, @kevinbazira wrote: > Thank you for providing details about the logo detection project, @mfossati! The ML team is excited t...
[12:24:57] * aiko lunch
[13:08:51] (PS2) AikoChou: revertrisk: improve error messages [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1011305 (https://phabricator.wikimedia.org/T351278)
[13:09:57] hello folks!
[13:12:43] o/ hi luca
[13:14:53] o/
[13:15:55] aiko: if you have time, I have a question - I tried the new readability docker image on friday (in staging) but the code failed since pyopencl was not present. Is it because we don't install the dependencies of python's requirement.txt?
[13:16:28] I know that Ilias opened a task about rolling out the code to all model servers, I wanted to know if the dep issue is known or not
[13:24:49] found it https://phabricator.wikimedia.org/T360212
[13:24:58] Machine-Learning-Team: Add Dragonfly to the ML k8s clusters - https://phabricator.wikimedia.org/T359416#9637830 (elukey) >>! In T359416#9636703, @JMeybohm wrote: >>>! In T359416#9624256, @elukey wrote: >> Afaics from the logs the client were getting chunks of data every time from the registry (not the entire...
[13:26:57] elukey: yeah I think that is the reason. we didn't install python's requirement.txt in blubber file
[13:27:34] should be added like this https://gerrit.wikimedia.org/r/plugins/gitiles/machinelearning/liftwing/inference-services/+/refs/heads/main/.pipeline/revertrisk/revertrisk.yaml#16
[13:29:33] (PS1) Elukey: Force readability's Blubber config to add python's dir deps [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1012375 (https://phabricator.wikimedia.org/T360212)
[13:29:36] aiko: --^ :)
[13:33:49] (CR) AikoChou: [C:+1] Force readability's Blubber config to add python's dir deps [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1012375 (https://phabricator.wikimedia.org/T360212) (owner: Elukey)
[13:35:20] thanksss
[13:35:44] (CR) Elukey: [C:+2] Force readability's Blubber config to add python's dir deps [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1012375 (https://phabricator.wikimedia.org/T360212) (owner: Elukey)
[13:53:14] Morning all
[13:54:12] o/
[13:55:32] \o
[14:59:04] aiko: the trick worked for readability! But now I tested the new image and torch's threads don't seem to work
[14:59:12] I see from the logs that they are set correctly
[14:59:36] but if I call the model server I see it hanging
[15:07:36] hmmm :(
[15:15:34] it may be https://github.com/pytorch/pytorch/issues/16894#issuecomment-461871456
[15:16:04] even if torch 1.13 (used by readability) was released in 2022
[15:17:20] Machine-Learning-Team: Add Dragonfly to the ML k8s clusters - https://phabricator.wikimedia.org/T359416#9638302 (JMeybohm) That's right. By default the supernode does act as a CDN in front of the docker-registry but I intentionally disabled that behavior as there's no benefit of that in our infra. I would as...
[15:19:35] maybe they haven't fixed it 0.0 so looks like we should go back to using OMP_NUM_THREADS
[15:33:45] (PS1) Elukey: readability: add entrypoint to set environment variables [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1012398 (https://phabricator.wikimedia.org/T360111)
[15:33:53] aiko: yeah, I filed another proposal :(
[16:00:47] elukey: I think the new proposal is good. do we want to set the NUM_THREADS for catboost in the entrypoint.sh as well?
[16:10:02] aiko: thanks! So in theory with the new version it should auto-recognize the number of threads by itself, but I don't recall if there was a specific env variable tbh
[16:10:17] but we can add anything to the entrypoint, maybe we can have a generic one shared by all
[16:18:02] elukey: ahh I meant line 133 in model.py. the param thread_count is set by the NUM_THREADS, and it also uses get_cpu_count()
[16:21:40] so it is the same as OMP_NUM_THREADS
[16:23:28] ah okok yes! We could in theory get rid of NUM_THREADS at this point, lemme amend the patch
[16:32:50] (PS2) Elukey: readability: add entrypoint to set environment variables [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1012398 (https://phabricator.wikimedia.org/T360111)
[16:32:53] done :)
[16:34:55] Machine-Learning-Team, Patch-For-Review: Set automatically libomp's num threads when using Pytorch - https://phabricator.wikimedia.org/T360111#9638729 (elukey) Tried setting the number of threads via torch's library directly in the code, but unfortunately it didn't work (at least with 1.13.0). Tried a di...
[16:45:18] (CR) AikoChou: [C:+1] "LGTM! only a small typo :)" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1012398 (https://phabricator.wikimedia.org/T360111) (owner: Elukey)
[16:47:09] (PS3) Elukey: readability: add entrypoint to set environment variables [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1012398 (https://phabricator.wikimedia.org/T360111)
[16:47:17] (CR) Elukey: readability: add entrypoint to set environment variables (1 comment) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1012398 (https://phabricator.wikimedia.org/T360111) (owner: Elukey)
[16:47:25] aiko: thanks for the review!
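The workaround discussed here was to export OMP_NUM_THREADS before torch loads, because setting threads through torch itself did not take effect with torch 1.13. A minimal Python sketch of that idea follows. It is hypothetical and is not the entrypoint.sh from change 1012398. The helper names, the cgroup v2 path /sys/fs/cgroup/cpu.max, and the precedence (explicit OMP_NUM_THREADS first, then the container's CPU quota, then os.cpu_count()) are all assumptions.

```python
import math
import os

# Assumption: the pod runs under cgroup v2, where the CPU limit is exposed here.
CPU_MAX_PATH = "/sys/fs/cgroup/cpu.max"


def cpus_from_cgroup(text):
    """Parse a cgroup v2 cpu.max value such as '200000 100000' (quota, period).

    Returns the limit rounded up to whole CPUs, or None when it is unlimited ('max').
    """
    parts = text.split()
    if len(parts) != 2 or parts[0] == "max":
        return None
    quota, period = int(parts[0]), int(parts[1])
    if period <= 0:
        return None
    return max(1, math.ceil(quota / period))


def resolve_num_threads(cpu_max_path=CPU_MAX_PATH):
    """Hypothetical helper: explicit env var, then cgroup quota, then host CPUs."""
    explicit = os.environ.get("OMP_NUM_THREADS", "")
    if explicit.isdigit() and int(explicit) > 0:
        return int(explicit)
    try:
        with open(cpu_max_path) as f:
            limit = cpus_from_cgroup(f.read())
    except (OSError, ValueError):
        limit = None  # no cgroup v2 file, or an unexpected format
    if limit is not None:
        return limit
    # os.cpu_count() reports the host's CPUs, not the container's quota.
    return max(1, os.cpu_count() or 1)


def configure_threads():
    """Export OMP_NUM_THREADS. This must run before torch is imported."""
    n = resolve_num_threads()
    os.environ["OMP_NUM_THREADS"] = str(n)
    return n


if __name__ == "__main__":
    print(f"OMP_NUM_THREADS={configure_threads()}")
    # import torch  # import torch (and its OpenMP runtime) only after this point
```

Under the same assumptions, the resolved value could also feed CatBoost's `thread_count` parameter. That would mirror the NUM_THREADS/get_cpu_count() logic in model.py mentioned at 16:18. This log does not settle whether readability actually needs that.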
[16:48:56] logging off for today, o/
[16:49:16] have a nice rest of the day folks
[16:51:11] o/ bye Luca :D
[16:59:26] night elukey
[17:09:38] Machine-Learning-Team, MediaWiki-extensions-ORES: breaking change to WatchedItemQueryServiceExtension causes ci failure for ORES - https://phabricator.wikimedia.org/T360352 (jsn.sherman) NEW
[17:11:57] (PS1) Jsn.sherman: Update WatchedItemQueryServiceExtension typehint [extensions/ORES] - https://gerrit.wikimedia.org/r/1012411 (https://phabricator.wikimedia.org/T360352)
[17:14:36] (CR) CI reject: [V:-1] Update WatchedItemQueryServiceExtension typehint [extensions/ORES] - https://gerrit.wikimedia.org/r/1012411 (https://phabricator.wikimedia.org/T360352) (owner: Jsn.sherman)
[17:16:49] (PS2) Jsn.sherman: Update WatchedItemQueryServiceExtension typehint [extensions/ORES] - https://gerrit.wikimedia.org/r/1012411 (https://phabricator.wikimedia.org/T360352)
[17:18:35] Machine-Learning-Team, MediaWiki-extensions-ORES, Patch-For-Review: change to WatchedItemQueryServiceExtension signature type hint causes phan error for ORES - https://phabricator.wikimedia.org/T360352#9639078 (jsn.sherman)
[17:21:54] (CR) Jsn.sherman: "Any chance I could get a quick review to resolve a phan error in CI?" [extensions/ORES] - https://gerrit.wikimedia.org/r/1012411 (https://phabricator.wikimedia.org/T360352) (owner: Jsn.sherman)
[17:22:57] (CR) Ladsgroup: [C:+2] Update WatchedItemQueryServiceExtension typehint [extensions/ORES] - https://gerrit.wikimedia.org/r/1012411 (https://phabricator.wikimedia.org/T360352) (owner: Jsn.sherman)
[17:26:19] (Merged) jenkins-bot: Update WatchedItemQueryServiceExtension typehint [extensions/ORES] - https://gerrit.wikimedia.org/r/1012411 (https://phabricator.wikimedia.org/T360352) (owner: Jsn.sherman)
[17:31:43] (PS2) Umherirrender: Type hint IReadableDatabase in WatchedItemQueryServiceExtension [extensions/ORES] - https://gerrit.wikimedia.org/r/1011172
[17:31:57] (Abandoned) Umherirrender: Type hint IReadableDatabase in WatchedItemQueryServiceExtension [extensions/ORES] - https://gerrit.wikimedia.org/r/1011172 (owner: Umherirrender)
[17:34:56] Machine-Learning-Team, MediaWiki-extensions-ORES: change to WatchedItemQueryServiceExtension signature type hint causes phan error for ORES - https://phabricator.wikimedia.org/T360352#9639165 (Umherirrender) Open→Resolved a:jsn.sherman https://gerrit.wikimedia.org/r/c/mediawiki/ext...
[17:37:33] Machine-Learning-Team, MediaWiki-extensions-ORES: change to WatchedItemQueryServiceExtension signature type hint causes phan error for ORES - https://phabricator.wikimedia.org/T360352#9639194 (jsn.sherman) >>! In T360352#9639165, @Umherirrender wrote: > https://gerrit.wikimedia.org/r/c/mediawi...
[17:56:13] logging off!
[17:57:44] night aiko!
[19:11:03] artificial-intelligence, Machine-Learning-Team, Diffusion, editquality-modeling, Release-Engineering-Team (Seen): Gerrit repo scoring/ores/editquality not mirroring - https://phabricator.wikimedia.org/T224996#9639489 (Aklapper)
[20:22:26] Machine-Learning-Team, MediaWiki-extensions-ORES, MediaWiki-Special-pages: Entries on Special:Version page not alphabetically sorted (as ORES extension is listed as "Machine Learning Platform") - https://phabricator.wikimedia.org/T356566#9639874 (Aklapper) p:Triage→Low