[09:19:21] aiko: o/
[09:19:48] do you recall what was the github issue with workers==n in kserve? The one resolved only in 0.8
[09:22:12] is it https://github.com/kserve/kserve/issues/1759 ?
[09:23:43] upstream is also tagging 0.9
[09:23:44] https://github.com/kserve/kserve/releases/tag/v0.9.0-rc0
[09:27:41] ouch https://github.com/kserve/kserve/pull/1969 this is not great
[09:27:57] in kserve 0.8 there seems to be a tight dependency on knative 1.0, and we have 0.18
[09:32:35] I left a comment to ask if we can upgrade or not
[09:34:24] \o
[09:34:39] elukey: in hieradata/role/common/ml_k8s/worker/staging.yaml we currently have:
[09:34:45] profile::lvs::realserver::pools:
[09:34:47] inference: {}
[09:34:56] I _think_ that should be inference-staging, no?
[09:35:03] elukey: I recall this one https://github.com/kserve/kserve/pull/1984
[09:37:00] klausman: o/ yep I see that we have inference.svc.codfw.wmnet's IP address in ml-staging2001's loopback
[09:37:10] :+1:
[09:37:21] switching to staging should change it so that the lvs machinery works
[09:37:32] Roger
[09:39:18] aiko: ack thanks, I see it listed in https://github.com/kserve/kserve/releases/tag/v0.8.0
[09:41:21] I am not quite sure if change 805329 should use lvs_setup vs service_setup.
[09:41:34] So I went with the latter as a more conservative approach :)
[09:44:16] yep yep service setup seems good
[09:44:19] the only doubt that I have is
[09:44:20] check_tcp_ssl!inference-staging.discovery.wmnet!30443
[09:44:46] elukey: btw, did you attend yesterday's SRE Staff Meeting? If not, I can give you a tl;dr
[09:45:36] nope I did not :(
[09:47:23] The most interesting part was the talk at the end: in the future, the service catalog will be used to automagically generate config for Prometheus to do checks via the blackbox exporter. The upside being that this makes aggregating alerts etc. (and creating silences, which the cookbooks can't do) a lot easier. Plus more frequent checks, and more resilience/adaptability for external breakage (e.g.
[09:47:24] Internet storm would not cause alerts since Prometheus knows about locality of services)
[09:48:27] ah all the work that Filippo is doing!
[09:53:33] I did mention during the meeting that Prom silences are a bit dangerous in that you can easily and accidentally make them too wide. But that is a risk I will absolutely accept in the quest to get away from Icinga
[10:05:34] yep
[10:15:24] need to go to a doctor appt, ttl!
[10:20:40] <- lunch :)
[12:27:00] Lift-Wing, Machine-Learning-Team (Active Tasks): Test Ray worker in Kserve - https://phabricator.wikimedia.org/T309624 (achou) Found out there is another way to pass arguments to model constructor which is `.options()` :) https://github.com/ray-project/ray/blob/f597e21ac8490b2344d66bbe2eceaa4d8210494c/py...
[13:38:59] Lift-Wing, Machine-Learning-Team (Active Tasks): Test async preprocess on kserve - https://phabricator.wikimedia.org/T309623 (elukey) >>! In T309623#7998640, @elukey wrote: > One clarification - let's try to time box the amount of time to spend on trying to run revscoring in async mode. I am a little ske...
[13:42:12] (PS1) Kevin Bazira: editquality: refactor setting of the HTTP host header into its own method [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/805388 (https://phabricator.wikimedia.org/T309623)
[13:43:36] Morning all!
[14:43:31] Lift-Wing, artificial-intelligence, articlequality-modeling, Machine-Learning-Team (Active Tasks): Migrate articlequality models - https://phabricator.wikimedia.org/T307416 (calbon)
[14:43:33] Lift-Wing, artificial-intelligence, articlequality-modeling, Machine-Learning-Team (Active Tasks): Upload articlequality model binaries to storage - https://phabricator.wikimedia.org/T307417 (calbon) Open→Resolved
[14:43:38] Lift-Wing, artificial-intelligence, articlequality-modeling, Machine-Learning-Team (Active Tasks): Migrate articlequality models - https://phabricator.wikimedia.org/T307416 (calbon)
[14:43:58] Lift-Wing, artificial-intelligence, articlequality-modeling, Machine-Learning-Team (Active Tasks): Migrate articlequality models - https://phabricator.wikimedia.org/T307416 (calbon) Open→Resolved
[14:44:14] Lift-Wing, artificial-intelligence, articlequality-modeling, Machine-Learning-Team (Active Tasks), Patch-For-Review: Create articlequality inference services - https://phabricator.wikimedia.org/T307418 (calbon) Open→Resolved
[14:51:38] Lift-Wing, Epic, Machine-Learning-Team (Active Tasks): Lift Wing proof of concept - https://phabricator.wikimedia.org/T272917 (achou)
[14:52:02] Lift-Wing, Machine-Learning-Team (Active Tasks): API Gateway Integration - https://phabricator.wikimedia.org/T288789 (achou) Open→Resolved
[15:01:24] * elukey brb
[15:13:55] Lift-Wing, Epic, Machine-Learning-Team (Active Tasks): Lift Wing proof of concept - https://phabricator.wikimedia.org/T272917 (calbon)
[15:14:20] Lift-Wing, Machine-Learning-Team (Active Tasks): API Gateway Integration - https://phabricator.wikimedia.org/T288789 (calbon) Resolved→Open p:Triage→Unbreak!
[15:56:10] (CR) Elukey: editquality: refactor setting of the HTTP host header into its own method (1 comment) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/805388 (https://phabricator.wikimedia.org/T309623) (owner: Kevin Bazira)
[16:26:55] * elukey afk!
[16:36:29] ditto