[04:53:00] (CR) Abijeet Patro: "recheck" [research/recommendation-api] - https://gerrit.wikimedia.org/r/1013978 (owner: L10n-bot)
[06:10:44] Good morning o/
[07:56:52] (PS17) Ilias Sarantopoulos: huggingface: add huggingface image [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1009783 (https://phabricator.wikimedia.org/T357986)
[07:57:23] (CR) Ilias Sarantopoulos: huggingface: add huggingface image (3 comments) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1009783 (https://phabricator.wikimedia.org/T357986) (owner: Ilias Sarantopoulos)
[08:12:22] (CR) CI reject: [V:-1] huggingface: add huggingface image [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1009783 (https://phabricator.wikimedia.org/T357986) (owner: Ilias Sarantopoulos)
[09:10:54] Machine-Learning-Team, MediaWiki-extensions-ORES: Use multilingual revert risk for anonymous edits - https://phabricator.wikimedia.org/T356280#9660607 (Samwalton9-WMF) I think I'm right in saying that the MLRR model isn't ready/available to use yet, is that right @diego?
[10:30:42] o/
[10:33:33] o/ aiko
[10:35:55] (PS4) AikoChou: revertrisk: improve error messages [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1011305 (https://phabricator.wikimedia.org/T351278)
[10:36:20] Machine-Learning-Team, drafttopic-modeling: drafttopic has two issue trackers - https://phabricator.wikimedia.org/T360990 (Aklapper) NEW
[10:39:25] (CR) AikoChou: "Thanks for the input!
I've changed the returned code to 422" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1011305 (https://phabricator.wikimedia.org/T351278) (owner: AikoChou)
[10:42:43] (CR) AikoChou: [C:+1] huggingface: add huggingface image (1 comment) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1009783 (https://phabricator.wikimedia.org/T357986) (owner: Ilias Sarantopoulos)
[10:45:02] Machine-Learning-Team, Structured-Data-Backlog: Host a logo detection model for Commons images - https://phabricator.wikimedia.org/T358676#9660836 (kevinbazira) Thank you for sharing this information, @mfossati. Based on the requirements you've shared so far, we have worked on a first pass of the [[ http...
[10:52:37] (CR) Ilias Sarantopoulos: [C:+1] "This seems better. I think 422 is much better than the 207 I suggested as we would want to be able to know that something is off. Nice!" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1011305 (https://phabricator.wikimedia.org/T351278) (owner: AikoChou)
[10:53:07] Morning :)
[10:53:56] (CR) Klausman: [C:+1] "Thanks!" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1011305 (https://phabricator.wikimedia.org/T351278) (owner: AikoChou)
[10:54:33] (PS18) Ilias Sarantopoulos: huggingface: add huggingface image [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1009783 (https://phabricator.wikimedia.org/T357986)
[10:54:42] Hey Tobias!
[10:57:07] Hi everyone. Which of the {drafttopic, articlequality, draftquality, editquality, ores} repositories are still deployed on WMF servers (and how could I find out myself)? If some are still deployed, is there an estimated undeployment date? Cannot find a Phab ticket about undeploying. If there is one, could you please share the ID? If there is none, could you please create one? TIA.
[11:02:13] (CR) AikoChou: [C:+2] revertrisk: improve error messages [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1011305 (https://phabricator.wikimedia.org/T351278) (owner: AikoChou)
[11:11:37] (Merged) jenkins-bot: revertrisk: improve error messages [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1011305 (https://phabricator.wikimedia.org/T351278) (owner: AikoChou)
[11:12:32] andre: Hi! Thanks for following up on this! There is no undeployment date or plan as we speak. At the moment we have proceeded with a code/model freeze and will not update the models. Some of the models will be replaced while others will remain in the same status until they are no longer used or are considered irrelevant/obsolete. However it is a good idea to create a task to track and discuss this work. I will do it!
[11:12:47] (PS19) Ilias Sarantopoulos: huggingface: add huggingface image [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1009783 (https://phabricator.wikimedia.org/T357986)
[11:22:10] andre: the list of available (as in: running on LiftWing) Revscoring models is here: https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing#Revscoring_models_(migrated_from_ORES)
[11:26:32] isaranto, klausman: Ah, thanks a lot for that info! I asked because of non-public https://phabricator.wikimedia.org/T213246 as we're wondering how to proceed there (if any of you are interested in commenting over there) but that's already helpful to know!
[11:28:03] andre: I don't have access to that task :(
[11:28:58] isaranto: Eh, sorry, fixed now!
[11:29:12] thaaank u
[11:31:09] naah, thank you :)
[11:49:22] * klausman lunch
[12:07:58] (CR) Ilias Sarantopoulos: [C:+2] huggingface: add huggingface image [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1009783 (https://phabricator.wikimedia.org/T357986) (owner: Ilias Sarantopoulos)
[12:24:54] (CR) CI reject: [V:-1] huggingface: add huggingface image [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1009783 (https://phabricator.wikimedia.org/T357986) (owner: Ilias Sarantopoulos)
[12:28:42] * aiko lunch!
[12:53:20] * isaranto lunch
[13:36:50] hello folks :)
[13:37:43] isaranto: o/ shouldn't we wait for the pytorch base image before merging the hf one?
[13:38:44] also there are some things that i'd have liked to review before the +2 :)
[13:41:48] (CR) Elukey: "Post merge review :D" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1009783 (https://phabricator.wikimedia.org/T357986) (owner: Ilias Sarantopoulos)
[13:46:27] hello luca o/
[13:47:39] I filed another commit based on KI v0.6 https://gitlab.wikimedia.org/repos/research/knowledge_integrity/-/commit/d7a01a5d6271f37b344f2706a57c5904a425553f
[13:48:03] (CR) Ilias Sarantopoulos: huggingface: add huggingface image [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1009783 (https://phabricator.wikimedia.org/T357986) (owner: Ilias Sarantopoulos)
[13:49:27] elukey: sure! you're right. I was a bit impatient so that I could try it out on the GPU. I will build the pytorch image you created and try it on top as there may be dependency issues
[13:49:44] ci saved us, the patch is not merged :)
[13:50:37] RRLA image publish failed.. weird :/ https://integration.wikimedia.org/ci/job/inference-services-pipeline-revertrisk-publish/100/console
[13:51:39] isaranto: ack np! Also please do not use inline sh scripts for entrypoints :D
[13:53:23] ack!
will change it to a bash script
[13:54:16] aiko: o/ thanksss
[13:56:55] aiko: there are some dep issues for pydantic :(
[13:57:09] #24 18.46 knowledge-integrity 0.6.0 depends on pydantic<3.0.0 and >=2.1.1
[13:57:12] #24 18.46 knowledge-integrity[revertrisk-multilingual] 0.6.0 depends on pydantic<3.0.0 and >=2.1.1
[13:57:15] #24 18.46 fastapi 0.95.0 depends on pydantic!=1.7, !=1.7.1, !=1.7.2, !=1.7.3, !=1.8, !=1.8.1, <2.0.0 and >=1.6.2
[13:58:22] elukey: ahhh so https://phabricator.wikimedia.org/P58922 that's how I put in requirement.txt
[13:59:43] we'll use a pre-release commit from kserve that fixes the pydantic issue
[14:00:33] ackkkk
[14:13:29] (PS1) Jsn.sherman: [WIP] update revertrisk-language-agnostic min & desc [extensions/ORES] - https://gerrit.wikimedia.org/r/1014519
[14:14:37] Machine-Learning-Team, drafttopic-modeling: drafttopic has two issue trackers - https://phabricator.wikimedia.org/T360990#9661674 (calbon) a:isarantopoulos
[14:16:46] Machine-Learning-Team: Investigate temporary high latency in revscoring service for wikidata - https://phabricator.wikimedia.org/T360894#9661690 (calbon) a:klausman
[14:16:58] Machine-Learning-Team: Investigate temporary high latency in revscoring service for wikidata - https://phabricator.wikimedia.org/T360894#9661691 (klausman)
[14:23:15] Machine-Learning-Team, Patch-For-Review: Create a Pytorch base image - https://phabricator.wikimedia.org/T360638#9661711 (calbon)
[14:40:12] Machine-Learning-Team, Goal: Goal: Expand Lift Wing Cluster and add GPU capacity to production - https://phabricator.wikimedia.org/T353338#9661811 (calbon) At risk because we don't have a GPU in the data centers yet.
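Editorial note on the pip output at 13:57: the two requirement sets cannot be satisfied together, because knowledge-integrity 0.6.0 needs pydantic >=2.1.1 while fastapi 0.95.0 needs pydantic <2.0.0. The sketch below checks this with a deliberately simplified comparison of dotted numeric versions. It is not how pip resolves dependencies, the helper names `parse_version` and `satisfies` are hypothetical, and the candidate version numbers are only illustrative.

```python
import operator

# Two-character operators are listed first so ">=" is not mistaken for ">".
OPERATORS = [
    (">=", operator.ge), ("<=", operator.le), ("!=", operator.ne),
    ("==", operator.eq), ("<", operator.lt), (">", operator.gt),
]


def parse_version(text, width=3):
    """Turn '1.7' into (1, 7, 0) so short and long versions compare cleanly."""
    parts = [int(p) for p in text.strip().split(".")]
    return tuple(parts + [0] * (width - len(parts)))


def satisfies(version, spec):
    """Return True if `version` meets every comma-separated clause in `spec`."""
    candidate = parse_version(version)
    for clause in spec.split(","):
        clause = clause.strip()
        for symbol, compare in OPERATORS:
            if clause.startswith(symbol):
                if not compare(candidate, parse_version(clause[len(symbol):])):
                    return False
                break
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
    return True


# Constraints copied from the pip output quoted in the log.
KNOWLEDGE_INTEGRITY = ">=2.1.1,<3.0.0"
FASTAPI_0_95_0 = "!=1.7,!=1.7.1,!=1.7.2,!=1.7.3,!=1.8,!=1.8.1,<2.0.0,>=1.6.2"

# Illustrative candidates only (assumption): one 1.x and two 2.x versions.
CANDIDATES = ["1.10.13", "2.1.1", "2.6.4"]

compatible = [
    v for v in CANDIDATES
    if satisfies(v, KNOWLEDGE_INTEGRITY) and satisfies(v, FASTAPI_0_95_0)
]
print(compatible)  # prints []
```

No candidate passes both checks, which is consistent with the plan stated at 13:59 to move to a pre-release kserve commit that fixes the pydantic issue.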
[14:41:47] Machine-Learning-Team, Goal: Goal: Implement caching for revertrisk-language-agnostic - https://phabricator.wikimedia.org/T353333#9661815 (calbon)
[14:47:25] Machine-Learning-Team: Investigate if it is possible to reduce torch's package size - https://phabricator.wikimedia.org/T359569#9661827 (klausman) Open→Resolved
[14:52:45] hey everyone, I had a question: I'm trying to push my latest code to my branch but git keeps saying everything is up to date when it's not
[14:53:58] does git status say you have uncommitted changes, maybe?
[14:56:57] I got it... sorry I was doing commit and not commit -m lol
[14:57:52] ah, classic
[15:00:36] bbiab, taking a short break
[15:10:37] aiko: tested the image locally, it all works!
[15:11:51] woooow \o/
[15:11:56] nice!
[15:12:37] Nice work!
[15:17:35] elukey: how did you solve the aiohttp build error? did you remove the lib from python's requirement?
[15:17:48] aiko: upgraded to 3.9.0
[15:19:32] ack!
[15:20:21] klausman: if you have time - I left a comment in https://gerrit.wikimedia.org/r/c/operations/docker-images/production-images/+/1013335 related to the various python paths, lemme know if it makes sense
[15:23:22] the TL;DR is that Python 3 (configured in blubber images) looks for libraries under /opt/etc.. (where we copy packages via Blubber statements) and also under /usr/lib/python3.11/etc..
[15:23:31] torch gets installed to the latter
[15:23:44] (by Debian's pip)
[15:24:06] I think it is nice since we'll be able to check straight away if anything weird is happening, like torch is installed in both places etc..
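Editorial note on the 15:23 messages: one way to check straight away whether a package such as torch ended up in more than one place is to scan every entry of `sys.path` for it. The sketch below is only an illustration and is not taken from the patch under review. The helper name `find_package_locations` is hypothetical, and it only looks for a top-level directory or `.py` file with the given name, so it will miss packages shipped in other forms.

```python
import sys
from pathlib import Path


def find_package_locations(name, search_paths=None):
    """Return every search-path directory that contains a package or module `name`."""
    paths = sys.path if search_paths is None else search_paths
    found = []
    for entry in paths:
        # An empty sys.path entry means the current working directory.
        base = Path(entry or ".")
        if not base.is_dir():
            continue  # skips missing directories and zip archives
        if (base / name).is_dir() or (base / f"{name}.py").is_file():
            resolved = str(base.resolve())
            if resolved not in found:  # the same directory may be listed twice
                found.append(resolved)
    return found


if __name__ == "__main__":
    locations = find_package_locations("torch")
    if len(locations) > 1:
        print("torch found in several places:", locations)
    else:
        print("torch locations:", locations)
```

Run inside the image, more than one reported location would be the "installed in both places" situation described above.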
[15:24:33] this is for the whole team, lemme know if you agree :)
[15:24:54] if it is too confusing I'll try to explain it again more verbosely
[15:25:45] I don't see a mention of paths in that change, only about --break-system-packages
[15:25:57] Oh, it's resolved, nvm :D
[15:26:55] (also for a moment I was worried that the 'etc' in your statement above was literal :D)
[15:27:30] so LGTM
[15:29:46] ahahahha nono
[15:36:22] Machine-Learning-Team, serviceops, Patch-For-Review: Bump memory for registry[12]00[34] VMs - https://phabricator.wikimedia.org/T360637#9662002 (elukey) High level plan for codfw: * Book a mw infrastructure maintenance in the deployments wikitech page. * When the time comes, disable puppet on regist...
[15:42:18] q - Is there a way to let CI re-run the image publish pipeline for RRLA? https://integration.wikimedia.org/ci/job/inference-services-pipeline-revertrisk-publish/ It failed in the last run
[15:42:31] the prod image was successfully built in https://integration.wikimedia.org/ci/job/inference-services-pipeline-revertrisk/522/ but very weird CI seemed to get the wrong image to publish, so 'the image does not exist locally'
[15:42:56] or should I create a dummy patch to trigger CI again?
[15:43:43] aiko: there is away - you can login and hit Rebuild
[15:43:49] *a way
[15:46:41] ah ok thanks! I'll try that
[15:48:13] lemme know if it works
[15:48:16] I can help in case
[15:48:31] isaranto: o/ for some reason I get pytorch with rocm5.7 in the base image
[15:50:35] so the image is >10G?
[15:52:10] elukey: yess rebuild works
[15:55:11] isaranto: it is yes
[15:55:16] and it has all the libs etc..
[15:55:20] ok, that's nice!
[15:55:23] you can check in the code review's comments
[15:55:30] no idea why it works on my side, even locally
[15:56:26] anyway, tomorrow I should be able to upgrade the docker registry
[15:56:45] and after that I'll build the pytorch image, so that you'll be unblocked
[15:56:49] (hopefully)
[16:01:59] I tried to find the code that builds this python index to fix it myself but I couldn't find it
[16:02:42] Machine-Learning-Team: Deploy RevertRisk language-agnostic with knowledge integrity v0.6.0 - https://phabricator.wikimedia.org/T360423#9662122 (achou)
[16:02:45] maybe I was looking in the wrong place. But anyway the good thing is that someone picked up that task to resolve the issue
[16:06:40] Machine-Learning-Team, Patch-For-Review: Deploy RevertRisk language-agnostic with knowledge integrity v0.6.0 - https://phabricator.wikimedia.org/T360423#9662148 (achou) a:achou
[16:16:58] Machine-Learning-Team, Structured-Data-Backlog: Host a logo detection model for Commons images - https://phabricator.wikimedia.org/T358676#9662204 (CodeReviewBot) mfossati updated https://gitlab.wikimedia.org/mfossati/scriptz/-/merge_requests/6 Improve functionality
[16:22:05] Machine-Learning-Team, Structured-Data-Backlog: Host a logo detection model for Commons images - https://phabricator.wikimedia.org/T358676#9662253 (mfossati) Hey @kevinbazira , I went through P58917, took the liberty of versioning it, and added some changes. Please have a look at https://gitlab.wikimedia...
[16:29:14] Machine-Learning-Team, Structured-Data-Backlog: Host a logo detection model for Commons images - https://phabricator.wikimedia.org/T358676#9662292 (mfossati) @kevinbazira: I'm hitting this ignored exception when running the code: ` Exception ignored in: Tr...
[16:45:55] Machine-Learning-Team, MediaWiki-extensions-ORES, Moderator-Tools-Team (Kanban): Exclude first revision on page from scoring - https://phabricator.wikimedia.org/T356281#9662420 (jsn.sherman) a:jsn.sherman
[16:46:18] I was looking to undeploy some revscoring models from staging to free up resources but we don't have thaaat many per model type
[16:46:57] We could be a bit more aggressive and keep just one deployment per model type e.g. for goodfaith only zhwiki, for damaging enwiki etc
[16:47:03] Machine-Learning-Team, MediaWiki-extensions-ORES, Moderator-Tools-Team (Kanban): Exclude first revision on page from scoring - https://phabricator.wikimedia.org/T356281#9662430 (jsn.sherman) Open→In progress
[16:54:03] going afk folks! have a nice evening/rest of day!
[17:02:09] bye Ilias o/
[17:02:13] o/
[17:37:01] (PS1) Jsn.sherman: [WIP] Exclude first revision on page from scoring [extensions/ORES] - https://gerrit.wikimedia.org/r/1014572 (https://phabricator.wikimedia.org/T356281)
[17:52:39] (PS3) Jsn.sherman: Exclude first revision on page from scoring [extensions/ORES] - https://gerrit.wikimedia.org/r/1014572 (https://phabricator.wikimedia.org/T356281)
[17:58:58] (PS4) Jsn.sherman: Exclude first revision on page from scoring [extensions/ORES] - https://gerrit.wikimedia.org/r/1014572 (https://phabricator.wikimedia.org/T356281)
[18:12:46] logging off o/
[18:12:50] have a nice rest of the day folks!
[18:28:03] night all!