[08:06:46] Morning everyone. Has one of you managed to install and use the pyenchant library on an Apple Silicon (new M1)?
[08:06:57] I am stuck with this error: https://github.com/pyenchant/pyenchant/issues/265#issuecomment-996499863
[08:07:48] and have no idea how to solve it.
[08:09:31] Hi SiMaig, all the devs mostly have Linux, so we have little experience with Apple Silicon. Did you try to install enchant via homebrew with the x86 workaround?
[08:09:44] if so it will not work, the M1 is not x86
[08:10:19] Yes I did, and the error changed to this one: https://github.com/pyenchant/pyenchant/issues/265#issuecomment-964114762
[08:10:41] what is the original error?
[08:11:05] ImportError: The 'enchant' C library was not found and maybe needs to be installed.
[08:11:05] I mean if you run brew without the x86 restriction
[08:12:51] The install with "arch -x86_64 /usr/local/bin/brew install enchant" worked, but brought up the second error complaining about the incompatible architecture.
[08:13:44] I'll keep digging and ask around on GitHub if no one works on a Mac in this chat ;)
[08:14:12] ack, let us know if you manage to solve the issue!
[08:14:17] It's a good exercise for learning macOS (that's my first Mac)
[11:32:08] * elukey lunch
[15:28:26] o/
[15:31:19] o/
[15:31:38] accraze: good timing, I am deploying the transformer chart change :)
[15:32:34] niiiiice
[15:36:09] elukey: if you want, you can try adding the articlequality transformer, the newest image should be good to go
[15:36:12] https://docker-registry.wikimedia.org/wikimedia/machinelearning-liftwing-inference-services-articlequality-transformer/tags/
[15:36:54] accraze: ready to go, if you want to send a patch!
[15:37:06] (just updated all the revscoring namespaces on both clusters)
[15:37:18] woot, ack, creating patch now
[15:43:48] elukey: do i need the `kserve-transformer-override` service account?
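[Editor's note] A minimal sketch of the kind of workaround being discussed for pyenchant on Apple Silicon — this is NOT a confirmed fix from the conversation. It assumes a native arm64 Homebrew under the default `/opt/homebrew` prefix, and relies on pyenchant honouring the `PYENCHANT_LIBRARY_PATH` environment variable to locate the enchant C library instead of probing for an (x86) system copy:

```shell
# Install a native arm64 build of enchant with the arm64 brew
# (not the x86 one under /usr/local used in the failed attempt above).
brew install enchant

# Point pyenchant directly at the resulting dylib; the exact path is
# an assumption based on the default Homebrew-on-ARM prefix.
export PYENCHANT_LIBRARY_PATH=/opt/homebrew/lib/libenchant-2.dylib

# Smoke test: the import fails with the ImportError from the log
# if the C library still cannot be loaded.
python3 -c "import enchant; print('ok')"
```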
[15:45:41] CR: https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/748147
[15:51:53] nono I don't think so, but lemme check
[15:51:59] ah no wait a sec
[15:52:14] the account is needed to fetch the credentials for swift
[15:52:30] we don't really need it for the transformer, right?
[15:53:15] even the min replicas value, we can remove it
[15:53:17] accraze: --^
[15:53:38] aha ok, pushing up another patch, one sec
[15:54:47] I just realized that one bit is missing from my previous patch, namely network policies
[15:55:52] so in theory, with the transformer, the egress gw should not be called by the predictor, right?
[15:56:23] yes, for articlequality that is correct
[15:56:40] ok, new patch is up
[15:57:18] yeah so I'd need to think about it, that part is still related to the predictor only
[15:57:49] ahhh i see what you are saying
[16:00:22] so the transformer will spin up an entirely new pod, with queue-proxy etc..?
[16:00:33] i believe so
[16:03:18] accraze: do we have it deployed on the sandbox?
[16:04:27] lemme double check
[16:04:27] mmm seems not, can we deploy it in there?
[16:04:34] just to verify the pods/containers/etc..
[16:04:38] yeah, doing it now
[16:04:41] <3
[16:07:17] ok it's up, checking pods/containers/etc
[16:07:56] yeah it's got queue-proxy and kserve-container
[16:08:38] perfect
[16:08:50] is the port to open for the transformer the same?
[16:08:59] i believe so
[16:09:36] (i.e. I didn't change anything)
[16:12:06] for the moment we can simplify and state that transformers and predictors will listen on the default port, 8080
[16:13:07] that works for me
[16:17:52] creating a change
[16:28:22] https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/748153/
[16:33:30] LGTM +1
[16:41:10] ok, new policies deployed, but I think they don't work
[16:41:12] sigh
[16:43:58] :(
[16:46:27] it is weird, the storage initializer seems unable to contact swift
[16:49:46] accraze: my soul is in pain https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/748159
[16:54:53] lololol
[16:56:01] the burden of yaml engineers
[17:00:15] ok it seems to be working, we have a lot of confused pods that I hope will be deleted
[17:02:10] nice!
[17:05:08] for some reason there are some pods stuck bootstrapping, hopefully they will be killed
[17:08:10] hmmm we may need to manually remove them
[17:08:33] elukey@ml-serve-ctrl1001:~$ kubectl get pods -n revscoring-articlequality
[17:08:33] NAME READY STATUS RESTARTS AGE
[17:08:36] enwiki-articlequality-predictor-default-zswgw-deployment-5kt9rv 2/2 Running 0 9m13s
[17:08:39] enwiki-articlequality-transformer-default-5qnqh-deploymentfv94q 2/2 Running 0 26s
[17:08:42] all good now
[17:08:55] ahahah
[17:08:56] 500: Internal Server Error
[17:09:28] mwapi.errors.RequestError: Invalid URL 'None/w/api.php': No schema supplied. Perhaps you meant http://None/w/api.php?
[17:09:31] [E 211217 17:08:58 web:2243] 500 POST /v1/models/enwiki-articlequality:predict (127.0.0.1) 3.06ms
[17:09:38] ah snap accraze, we forgot one variable
[17:09:50] the egress gw endpoint
[17:10:43] oh, on the transformer?
[17:11:20] yeah
[17:11:37] I think we can have a generic transformer config for article quality
[17:11:39] lemme send a patch
[17:12:26] go for it!
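[Editor's note] The shape being worked out above — a transformer pod paired with a predictor pod, each carrying its own queue-proxy and kserve-container, both listening on the default port 8080 — corresponds roughly to a kserve `InferenceService` spec like the sketch below. This is an illustrative fragment, not the actual deployment-charts values; image tags and the predictor image are placeholders:

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: enwiki-articlequality
  namespace: revscoring-articlequality
spec:
  predictor:
    containers:
      - name: kserve-container
        image: <predictor-image>   # placeholder
  transformer:
    containers:
      - name: kserve-container
        # tag is a placeholder; see the registry link above for real tags
        image: docker-registry.wikimedia.org/wikimedia/machinelearning-liftwing-inference-services-articlequality-transformer:<tag>
# both components serve on the kserve default port, 8080, as agreed above
```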
[17:26:31] trying to track down a nasty "mapping values are not allowed in this context" error that sometimes occurs
[17:31:53] ah yes, a cryptic error :)
[17:40:11] accraze: https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/748172
[17:40:14] this is what I meant
[17:41:07] ahhhh ok i see
[17:42:00] the predictor doesn't need the WIKI_HOST var, right?
[17:59:04] correct
[17:59:34] mwapi.errors.RequestError: Could not find a suitable TLS CA certificate bundle, invalid path: /usr/share/ca-certificates/wikimedia/Puppet_Internal_CA.crt
[17:59:47] missing wmf-certificates package in the transformer image?
[17:59:53] ohhhhhh
[17:59:54] derp
[18:00:00] yeah i think so
[18:00:15] but progress!!
[18:02:55] patch incoming
[18:10:45] elukey: this should fix it: https://gerrit.wikimedia.org/r/c/machinelearning/liftwing/inference-services/+/748181
[18:11:41] +1
[18:15:35] need to step afk for a bit, feel free to merge and then file the deployment-charts change, I'll merge once back :)
[18:17:36] ok sounds good :)
[19:11:52] accraze: the change needs a rebase :(
[19:12:05] ahhh snap, hold on
[19:23:03] thanks for the merge elukey!
[19:29:05] mmmm now I see
[19:29:06] File "/opt/lib/python/site-packages/mwapi/session.py", line 126, in _request
[19:29:10] raise APIError.from_doc(doc['error'])
[19:29:12] mwapi.errors.APIError: badinteger: Invalid value "None" for integer parameter "revids". -- None
[19:31:41] if you want the full stacktrace, jump on ml-serve-ctrl1001 and:
[19:31:42] kubectl logs enwiki-articlequality-transformer-default-ldnf4-deploymentbht7r -n revscoring-articlequality kserve-container
[19:32:40] ah no wait, we need the input to be like:
[19:32:53] { "rev_id": 132421 }
[19:33:00] okok, not the full text
[19:33:10] yeah! was just going to say, the transformer takes the rev_id
[19:33:21] accraze: it works!
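[Editor's note] The endpoint path and the `{ "rev_id": ... }` payload above come straight from the log; putting them together, the successful request looks roughly like the sketch below. The host (`localhost:8080`, e.g. after a `kubectl port-forward` to the pod) is an assumption for illustration, not how the team actually reached the service:

```shell
# Send a rev_id (not the full article text) to the transformer's
# kserve v1 predict endpoint; the transformer fetches the features
# and forwards them to the predictor.
curl -s -X POST "http://localhost:8080/v1/models/enwiki-articlequality:predict" \
  -H "Content-Type: application/json" \
  -d '{ "rev_id": 132421 }'
```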
[19:33:57] I see logs on both transformer and predictor
[19:33:57] [I 211217 19:33:05 web:2243] 200 POST /v1/models/enwiki-articlequality:predict (127.0.0.1) 212.44ms
[19:34:07] \o/
[19:35:04] very nice :)
[19:36:50] YESSSSS
[19:37:11] we got the first transformer running on ml-serve :) :) :)
[19:37:55] yessss
[19:38:03] great way to end the working year :)
[19:38:27] yeah, seriously
[19:39:39] we will be a great spot to come back to next year :D
[19:40:09] * in a great spot
[19:40:33] I agree :)
[19:40:46] going to log off, have a nice break during the holidays folks!
[19:40:55] talk with you (hopefully) in 2022 :)
[19:41:13] yes, same to you, enjoy the break elukey!
[20:35:20] Lift-Wing, artificial-intelligence, articlequality-modeling, Machine-Learning-Team (Active Tasks), Patch-For-Review: Add enwiki-articlequality inference service to LiftWing - https://phabricator.wikimedia.org/T294141 (ACraze) Open→Resolved a: ACraze Confirming that we were able to...
[20:50:31] Lift-Wing, Machine-Learning-Team (Active Tasks): Factor out feature retrieve functionality to a transformer - https://phabricator.wikimedia.org/T294419 (ACraze) We were able to run the first transformer on ml-serve today. See: T294141 Additionally, we also have another transformer ready to deploy for th...