[08:08:07] Good morning, have a great week everyone!
[08:16:45] (PS5) Kevin Bazira: Makefile: add support for article-descriptions [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/993481 (https://phabricator.wikimedia.org/T356176)
[08:17:47] (CR) Kevin Bazira: Makefile: add support for article-descriptions (1 comment) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/993481 (https://phabricator.wikimedia.org/T356176) (owner: Kevin Bazira)
[10:19:53] my internet router killed its root disk last night, currently trying to repair
[10:57:09] o/ Tobias!
[10:57:17] sounds like the revenge of the machines
[11:43:51] kevinbazira: I tried to run the article model server by running `make article_descriptions` and got an error for kserve
[11:43:51] https://phabricator.wikimedia.org/P56202
[11:44:00] have you encountered that?
[11:44:17] I'll take a look again after lunch
[11:44:19] * isaranto lunch
[11:44:34] isaranto: o/
[11:44:38] let me check ...
[11:50:30] I have not encountered this on Linux with Python 3.9.2
[11:51:10] are you suing Python 3.11?
[11:51:28] **using
[11:56:11] good morning, there are runc security updates for https://github.com/opencontainers/runc/security/advisories/GHSA-xr7r-f8xq-vfvv
[11:56:24] I'd first upgrade the ML staging nodes?
[11:57:01] moritzm: o/
[11:57:03] isaranto: KServe uses ray package <2.5.0 and >=2.4.0. Based on https://pypi.org/project/ray/2.4.0/, this version of the ray package supports Python 3.6 to 3.10.
[13:35:15] moritzm: yespls
[13:35:49] moritzm: I am currently getting my Internet back in working order (root disk went poof), but I will help as much as I can
[13:37:22] ack, staging nodes are upgraded npw
[13:37:24] ack, staging nodes are upgraded now
[13:37:42] Do I need to restart anything or do the upgrades happen in-place?
[13:57:22] existing pods are not affected, we have two options essentially:
[13:57:50] 1. 
let them refresh organically as deployments/updates are ongoing
[13:58:03] 2. drain/undrain the nodes to force refreshes
[13:58:56] if most ML deployments are refreshed on an ongoing basis anyway, we can just as well proceed with 1., but 2. should also be reasonably lightweight
[14:13:52] Yeah, I'll go with a rolling drain (probably tomorrow)
[14:21:43] sounds good, when this has proven to be fine, we can proceed with the main/non-staging nodes
[14:21:57] Ack
[14:30:13] Morning all
[14:33:26] Heyo Chris
[14:35:01] hey Chris!
[14:41:25] * klausman very late lunch
[14:48:11] finally figured out the issue with locust and multiple users. It needs python 3.9 to use the latest locust version
[14:48:22] so I wasn't getting the expected results on statbox
[14:54:49] well, still doesn't work :(
[14:56:32] (PS2) AikoChou: Makefile: add support for revertrisk-multilingual [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/995198 (https://phabricator.wikimedia.org/T356501)
[15:02:40] (CR) AikoChou: "I tested the Makefile with python 3.9 but ran into an asyncio/multiprocessing error (https://phabricator.wikimedia.org/P56220). 
After inve" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/995198 (https://phabricator.wikimedia.org/T356501) (owner: AikoChou)
[15:05:26] (CR) CI reject: [V: -1] Makefile: add support for revertrisk-multilingual [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/995198 (https://phabricator.wikimedia.org/T356501) (owner: AikoChou)
[15:32:37] (PS3) AikoChou: Makefile: add support for revertrisk-multilingual [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/995198 (https://phabricator.wikimedia.org/T356501)
[16:48:19] aiko: I ran the makefile for multilingual and got revertrisk-language-agnostic up and running
[16:48:20] lol
[16:51:15] Lift-Wing, Machine-Learning-Team: Debug GPU deployments on ml-staging - https://phabricator.wikimedia.org/T356038 (isarantopoulos) Using dumb-init with blubber seems to be a challenge as I can't find the equivalent of a docker CMD command. We'd like to do the following: ` ENTRYPOINT ["/usr/bin/dumb-init"...
[16:52:40] lol was on my side. For some reason I ran `make run revertrisk-multilingual` so `make run` will run rrla by default
[16:53:28] 2nd PEBCAK of the day
[16:53:55] earlier some requests for article-descriptions were failing as I was trying "num_beans" instead of "num_beams"
[17:06:22] (CR) Ilias Sarantopoulos: "I got the same error with kserve 0.10. I'm trying to check the differences with previous local runs." 
[machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/995198 (https://phabricator.wikimedia.org/T356501) (owner: AikoChou)
[17:07:50] (CR) Ilias Sarantopoulos: Makefile: add support for revertrisk-multilingual (1 comment) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/995198 (https://phabricator.wikimedia.org/T356501) (owner: AikoChou)
[17:12:42] Going afk for the afternoon, I know tomorrow will fix all of today's issues :)
[17:35:37] kevinbazira: I'm using python 3.9. After cleaning everything it passes that step (resolved after upgrading pip) but I ran into an error on model load. I'll run it again in the morning and will also review properly
[17:38:20] isaranto: good to know running with Python 3.6 clears the ray package issue.
[17:38:28] please share the stack trace for the model load error whenever you're ready.
[17:38:43] enjoy your evening o/
[17:39:15] ** Python 3.9 :)
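
Note on the ray/Python version question discussed above: a minimal sketch of a pre-flight check that could run before installing the model server dependencies. The 3.6 to 3.10 range is taken from the chat (quoted there for ray 2.4.0) and is not verified here; the names `MIN_SUPPORTED`, `MAX_SUPPORTED` and `is_supported` are hypothetical and not part of the inference-services repo.

```python
import sys

# Assumption: range quoted in the chat for ray 2.4.0 (Python 3.6 to 3.10).
MIN_SUPPORTED = (3, 6)
MAX_SUPPORTED = (3, 10)


def is_supported(version=None):
    """Return True if the (major, minor) version lies inside the range.

    `version` is any tuple-like starting with (major, minor); it defaults
    to the running interpreter.
    """
    major_minor = tuple((version or sys.version_info)[:2])
    return MIN_SUPPORTED <= major_minor <= MAX_SUPPORTED


if __name__ == "__main__":
    current = "{}.{}".format(*sys.version_info[:2])
    if is_supported():
        print(f"Python {current} is inside the assumed supported range")
    else:
        # Fail early instead of hitting a dependency resolution error later.
        sys.exit(f"Python {current} is outside the assumed supported range")
```

With these assumed bounds, Python 3.9 passes the check and Python 3.11 does not, which matches the question asked at 11:51.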