[07:21:52] one more of these: I'm temporarily switching ml-etcd2003 to DRBD to allow moving it to another host for the Ganeti bullseye update, latency will increase for a bit
[07:54:55] Lift-Wing, artificial-intelligence, Machine-Learning-Team (Active Tasks): Upload articletopic model binaries to storage - https://phabricator.wikimedia.org/T313305 (kevinbazira) a:kevinbazira
[07:57:31] hello folks :)
[08:10:06] Mornin'
[08:10:15] Why is it 28C at 10 am. Do not want.
[08:10:48] :)
[08:18:58] https://github.com/kserve/kserve/releases/download/v0.8.0/kserve.yaml - 15k lines of yaml
[08:19:01] * elukey cries
[08:40:33] interesting, a new step https://kserve.github.io/website/0.8/admin/serverless/#5-install-kserve-built-in-clusterservingruntimes
[08:41:56] we probably don't need it for the moment since we use custom images
[08:46:27] I wonder what's in there
[08:46:57] Ah, stuff like sklearnserver
[08:59:41] in theory we don't need those, they have special kserve images that we haven't ported yet etc..
[08:59:57] so I am inclined not to add them for the moment (I'll add a note in the chart's README though)
[09:03:10] wdyt?
[09:08:29] sounds good to me
[09:10:26] super, the initial change is https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/815691, now I need to test it :D
[09:10:55] but first we probably need to have the new inference service images, or at least revscoring working with the new numpy
[09:11:11] I've done some tests today and I think there is no problem in bumping the dep
[09:11:25] As we suspected
[09:18:10] we have a couple of changes pending for revscoring, we can also bump the numpy dep and release 2.11.5
[09:18:26] after that in theory we should be able to build with kserve 0.8
[09:18:57] :+1:
[10:07:04] ok so https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/815691 passes CI tests now :D
[10:07:10] and the diff looks reasonable
[10:07:43] aiko: o/ I sent an email to Aaron to see if we can get the mediawiki-utilities pypi account, to push the mwapi package (once ready)
[10:07:56] revscoring uses another account as maintainer, so we can't push for now :(
[10:07:59] I'll have a look at the yaml change
[10:08:39] klausman: ack thanks, also added some notes about what I did in README
[10:13:20] +1'd.
[10:14:09] thanks!
[10:14:58] kevinbazira: o/ if/when you have time (no rush), lemme know if you like https://github.com/wikimedia/revscoring/pull/522
[12:19:00] elukey: o/ lgtm too.
[12:31:46] Lift-Wing, artificial-intelligence, Machine-Learning-Team (Active Tasks): Upload articletopic model binaries to storage - https://phabricator.wikimedia.org/T313305 (kevinbazira) 11/11 articletopic models were uploaded successfully to Thanos Swift. Here are their storage uris: s3://wmf-ml-models/art...
[13:19:01] kevinbazira: thanks!
[13:27:55] created another pull request for revscoring to bump up numpy https://github.com/wikimedia/revscoring/pull/523
[13:28:19] elukey: I'll get to the numpy change in a Milan Minute
[13:28:48] :D
[13:29:28] LGTM'd
[13:29:49] seconds before you send more change :D
[13:30:08] ah snap sorry I changed only the commit msg!
[13:31:01] Re-approved
[13:33:11] merged thanks :)
[13:37:32] aaand also https://github.com/wikimedia/revscoring/pull/524 to release 2.11.5 :)
[13:40:53] Approved!
[13:43:41] <3
[13:49:41] (PS1) Elukey: Update Python model servers and requirements to KServe 0.8 [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/815721 (https://phabricator.wikimedia.org/T311982)
[13:51:02] (CR) Elukey: "This needs to be tested locally first :)" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/815721 (https://phabricator.wikimedia.org/T311982) (owner: Elukey)
[13:55:15] (CR) CI reject: [V: -1] Update Python model servers and requirements to KServe 0.8 [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/815721 (https://phabricator.wikimedia.org/T311982) (owner: Elukey)
[14:09:21] published the new revscoring 2.11.5
[14:09:36] (CR) Elukey: "recheck" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/815721 (https://phabricator.wikimedia.org/T311982) (owner: Elukey)
[14:11:01] I am thinking that the upgrade of kserve can be done in two steps
[14:11:10] 1) upgrade python code and docker images
[14:11:18] 2) upgrade the control plane in k8s
[14:11:26] in theory they should be separate things
[14:11:42] (I started to think about this after playing with docker and kserve)
[14:15:03] (PS2) Elukey: Update Python model servers and requirements to KServe 0.8 [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/815721 (https://phabricator.wikimedia.org/T311982)
[14:19:50] (CR) CI reject: [V: -1] Update Python model servers and requirements to KServe 0.8 [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/815721 (https://phabricator.wikimedia.org/T311982) (owner: Elukey)
[14:22:56] ahhh what a lovely dependency resolution hell
[14:45:07] elukey: one worry I have re: upgrades is that those new packages would be "visible" in production, right?
[14:45:26] as in: if we had to redeploy, it would pick up the latest images
[14:45:34] Or am I thinking wrong?
[14:47:41] klausman: we should explicitly set the new docker image version in deployment-charts' helmfile config
[14:47:50] Ack
[14:48:25] but yeah I have the same concern, I'd like to merge only when we are relatively confident that the images work, otherwise we block subsequent changes etc..
[14:50:49] the other alternative is to proceed one by one, namely one model type at a time
[14:51:26] I think we can go either way, as long as there is a way back
[14:52:01] ah yes it is sufficient to revert and merge, then new docker images will be created
[14:52:24] the idea would be to use staging and only move to production if things look fine in there
[14:53:35] sgtm
[15:00:55] (PS3) Elukey: Update Python model servers and requirements to KServe 0.8 [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/815721 (https://phabricator.wikimedia.org/T311982)
[15:11:01] (CR) CI reject: [V: -1] Update Python model servers and requirements to KServe 0.8 [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/815721 (https://phabricator.wikimedia.org/T311982) (owner: Elukey)
[15:24:55] (PS4) Elukey: Update Python model servers and requirements to KServe 0.8 [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/815721 (https://phabricator.wikimedia.org/T311982)
[15:33:46] (CR) CI reject: [V: -1] Update Python model servers and requirements to KServe 0.8 [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/815721 (https://phabricator.wikimedia.org/T311982) (owner: Elukey)
[16:09:27] going afk for today, have a nice rest of the day!