[07:16:37] <wikibugs>	 (03PS1) 10Kevin Bazira: Makefile: add support for langid [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/1003032 (https://phabricator.wikimedia.org/T357382)
[07:25:29] <wikibugs>	 (03CR) 10Kevin Bazira: "To make reviewing easier, here are the commands I used to test the langid model-server build:" [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/1003032 (https://phabricator.wikimedia.org/T357382) (owner: 10Kevin Bazira)
[08:16:33] <isaranto>	 Good morning *
[10:12:03] <wikibugs>	 (03PS6) 10AikoChou: revertrisk: use GPU for revertrisk-multilingual [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/995214 (https://phabricator.wikimedia.org/T356045)
[10:15:50] <wikibugs>	 (03CR) 10AikoChou: [C: 03+2] "Thanks for the review :)" [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/995214 (https://phabricator.wikimedia.org/T356045) (owner: 10AikoChou)
[10:22:45] <wikibugs>	 (03Merged) 10jenkins-bot: revertrisk: use GPU for revertrisk-multilingual [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/995214 (https://phabricator.wikimedia.org/T356045) (owner: 10AikoChou)
[10:26:44] <klausman>	 Morning!
[10:39:15] <isaranto>	 o/ Tobias!
[10:41:41] <klausman>	 Hey Ilias :)
[11:01:29] <aiko>	 post-merge failed 0.0
[11:02:51] <aiko>	 waiting for it retrying..
[11:07:22] <wikibugs>	 10Machine-Learning-Team: Support running revertrisk-multilingual model-server via Makefile - https://phabricator.wikimedia.org/T356501 (10achou) 05Open→03Resolved
[11:13:36] <isaranto>	 it seems that it stopped while pushing the wikidata image
[11:35:07] <aiko>	 mmm but I saw the multilingual-publish failed https://integration.wikimedia.org/ci/job/inference-services-pipeline-revertrisk-multilingual-publish/66/execution/node/59/log/
[11:37:08] <aiko>	 I'm going to manually build
[11:38:35] <aiko>	 mm seems not possible
[11:42:13] <wikibugs>	 (03PS1) 10Ladsgroup: Migrate away from wfGetDB() [extensions/ORES] - 10https://gerrit.wikimedia.org/r/1003397 (https://phabricator.wikimedia.org/T330641)
[11:43:08] <isaranto>	 I started one - let's see https://integration.wikimedia.org/ci/job/inference-services-pipeline-revertrisk-multilingual-publish/
[11:44:02] <wikibugs>	 (03PS2) 10Ilias Sarantopoulos: Makefile: add support for langid [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/1003032 (https://phabricator.wikimedia.org/T357382) (owner: 10Kevin Bazira)
[11:44:15] <wikibugs>	 (03CR) 10Ilias Sarantopoulos: [C: 03+1] Makefile: add support for langid [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/1003032 (https://phabricator.wikimedia.org/T357382) (owner: 10Kevin Bazira)
[11:44:17] <aiko>	 ah thanks!
[11:44:36] <isaranto>	 kevinbazira: I just rebased the above patch. LGTM! nice job
[11:45:40] <kevinbazira>	 isaranto: great! thanks for the review :)
[11:45:58] <wikibugs>	 (03CR) 10Kevin Bazira: [C: 03+2] Makefile: add support for langid [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/1003032 (https://phabricator.wikimedia.org/T357382) (owner: 10Kevin Bazira)
[11:47:06] <wikibugs>	 (03Merged) 10jenkins-bot: Makefile: add support for langid [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/1003032 (https://phabricator.wikimedia.org/T357382) (owner: 10Kevin Bazira)
[11:47:16] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] Migrate away from wfGetDB() [extensions/ORES] - 10https://gerrit.wikimedia.org/r/1003397 (https://phabricator.wikimedia.org/T330641) (owner: 10Ladsgroup)
[11:48:58] * isaranto going for lunch!
[11:56:15] <wikibugs>	 (03PS2) 10Ladsgroup: Migrate away from wfGetDB() [extensions/ORES] - 10https://gerrit.wikimedia.org/r/1003397 (https://phabricator.wikimedia.org/T330641)
[11:58:35] * klausman lunch
[12:10:41] <isaranto>	 the image seems to take too long to build
[12:28:24] <isaranto>	 oh my these gpu images take a ton of time to build
[12:28:34] <klausman>	 Any idea why?
[12:28:46] <isaranto>	 I'm building one locally to validate the new requirements for llms
[12:29:05] <klausman>	 I mean, the whole ROCm/Torch stack is big, so I'd expect it to take a bit, but I wonder if something else is going on.
[12:30:35] <isaranto>	 mainly cause of rocm, can't think of any other reason. the python package by itself is 1.5GB , so downloading it takes time (let alone installing)
[12:31:11] <isaranto>	 I'm curious though if we can just use a smaller one since we already have the required drivers etc
[12:31:43] <klausman>	 Maybe that should be a layer on the WMF Docker Registry, so it can be fetched in one go (and no install scripts need to run
[12:32:53] <isaranto>	 you mean building a custom base image and start from there?
[12:34:20] <klausman>	 yeah, basically.
[12:34:37] <klausman>	 At least with the components that will rarely change
[12:35:29] <isaranto>	 I agree, that's a good idea. for the moment we use the same torch + rocm version so it is doable (2.0.1 and 5.4.2 respectively)
[12:56:28] <aiko>	 I tried to build one locally but failed as well 
[12:57:47] <isaranto>	 can I help? what was the issue when building locally?
[12:57:48] <aiko>	 https://phabricator.wikimedia.org/P56759 weird there are some nvidia stuff 
[12:57:54] <isaranto>	 I just saw my build failed
[12:59:42] <aiko>	 maybe we also need to change pyproject.toml in knowledge integrity? ~"~ https://gitlab.wikimedia.org/repos/research/knowledge_integrity/-/blob/v0.3.0/pyproject.toml?ref_type=tags#L21 
[13:00:13] <isaranto>	 hmm the sha mismatch. I've encountered this in the past a few times but I can't recall what I was doing
[13:01:15] <isaranto>	 a it is using 1.13... We'll need to find a way to facilitate both versions (cpu and gpu). You're right though we should do sth in pyproject.toml. lemme check
[13:23:22] <isaranto>	 I see in https://download.pytorch.org/whl/rocm5.4.2/torch/ that there is no package available for our architecture(x86_x64) and pytorch 1.13.1 with rocm...
[13:24:14] <isaranto>	 we need either a x86_x64 version or a "manylinux" version, if I am not mistaken. klausman: am I right?
[13:24:56] <klausman>	 No, aarch64 is ARM64
[13:25:29] <klausman>	 The topmost ones, e.g, torch-2.0.1+rocm5.4.2-cp39-cp39-linux_x86_64.whl are the right arch
[13:26:26] <klausman>	 cp39 stands for cPython 3.9 (cPython being the standard Python implementation, written in C)
[13:27:09] <klausman>	 In that list, I don't see a 1.13.1 for Linux x86_64
[13:27:19] <isaranto>	 ack! I mentioned linux_x86_x64 nor aarch64
[13:27:37] <isaranto>	 my question is: also the manylinux distros work right?
[13:27:38] <klausman>	 I don't see a "manylinux" for x86_64
[13:28:55] <isaranto>	 a just read only if they are for x86_x64 e.g. manylinux2014_x86_64
[13:29:30] <isaranto>	 but yes in this case we don't have an option for rocm 5.4.2 and torch 1.13.1
[13:30:13] <isaranto>	 aiko: we could try bumping torch to 2.0.1 in ki repo. wdyt?
[13:37:37] <klausman>	 I find the absence of a whl for linux_x86_64 of 1.13.1 a bit odd
[13:38:57] <klausman>	 There _is_ one for Rocm5.4
[13:39:01] <klausman>	 https://download.pytorch.org/whl/rocm5.5/torch-2.1.2%2Brocm5.5-cp311-cp311-linux_x86_64.whl
[13:39:06] <klausman>	 er 5.5
[13:39:33] <klausman>	 Also for 5.3
[14:30:34] <chrisalbon>	 hey all
[14:30:42] <klausman>	 Hey Chris
[14:33:26] <isaranto>	 Hola!
[14:35:31] <aiko>	 isaranto: yeah we could try that! if there is no option for rocm 5.4.2 and torch 1.13.1
[14:35:45] <aiko>	 hi Chris
[14:42:36] <klausman>	 I will also take a look at getting a newer rocm into WMFs apt repository. I think the newest might be a bit too much (because it has other deps), but we'll see
[17:04:14] <wikibugs>	 (03PS1) 10Ilias Sarantopoulos: llm: enable quantization with AutoGPTQ [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/1003488 (https://phabricator.wikimedia.org/T354870)
[17:20:18] <isaranto>	 logging off folks o/ 
[17:20:33] <isaranto>	 have a nice evening/rest of your day
[18:57:21] <wikibugs>	 10Machine-Learning-Team, 10Goal: Goal: Implement caching for revertrisk-multilingual - https://phabricator.wikimedia.org/T353333 (10calbon) Possibly change to LA-revertrisk
[19:20:22] <chrisalbon>	 night ilias!