[05:33:02] Good morning o/
[09:08:02] morning folks o/
[09:18:48] hey Aiko!
[09:20:35] Machine-Learning-Team, Patch-For-Review: Return response time as part of the logo-detection response object - https://phabricator.wikimedia.org/T367962#9912809 (kevinbazira) When the [[ https://kserve.github.io/website/0.11/modelserving/v1beta1/custom/custom_model/#arguments | --enable_latency_logging ]]...
[09:25:25] aiko: when you have a moment -> https://gerrit.wikimedia.org/r/c/machinelearning/liftwing/inference-services/+/1048028
[09:25:44] I think it would work, I tried it from a pod in experimental ns
[09:26:09] (PS2) Kevin Bazira: logo-detection: show latency metrics in debug mode [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1047478 (https://phabricator.wikimedia.org/T367962)
[09:26:52] o/ looking at it
[09:31:24] isaranto: is the host header necessary for the transparent config?
[09:31:47] Machine-Learning-Team, Patch-For-Review: Return response time as part of the logo-detection response object - https://phabricator.wikimedia.org/T367962#9912825 (kevinbazira) For end-users who do not have access to KServe logs but would like to see latency metrics, we have added a boolean `debug` flag to...
[09:32:27] \o Morning!
[09:33:31] o/
[09:34:25] (CR) Kevin Bazira: "> log the measured latencies so that we have them for debugging" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1047478 (https://phabricator.wikimedia.org/T367962) (owner: Kevin Bazira)
[09:34:38] Well, I had an exciting morning of chasing a potential identity theft/supply chain attack. Turned out to be nothing, or rather a wild coincidence. But that was a tense few hours.
[09:36:24] glad it turned out to be nothing :)
[09:37:05] aiko: iirc from testing without the host header I don't get a response back. It is the same thing we do in all the services.
for example in revertrisk we always set the host header
[09:40:22] isaranto: ohhh I thought we needed the host header because we used http://api-ro.discovery.wmnet
[09:41:09] lemme recheck
[09:43:59] o/ hi Tobias
[09:44:44] aiko: you're right we don't need it, it works fine
[09:45:03] so I'll change the patch to just include the force_http flag
[09:45:18] ack!
[09:45:48] need any help with the model?
[09:47:35] should be fine. I just started hehe
[09:47:48] will let you know if I need any help!
[09:55:25] you reminded me that we should probably remove this part https://gerrit.wikimedia.org/r/plugins/gitiles/machinelearning/liftwing/inference-services/+/refs/heads/main/revert_risk_model/model_server/base_model.py#97
[09:55:29] when we move rr to the transparent config
[09:58:16] (PS4) Ilias Sarantopoulos: articlequality: add force_http option [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1048028 (https://phabricator.wikimedia.org/T360455)
[09:58:33] ack!
[09:59:42] (PS5) Ilias Sarantopoulos: articlequality: add force_http option [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1048028 (https://phabricator.wikimedia.org/T360455)
[10:00:11] I updated the patch to just have the force_http flag
[10:00:48] isaranto: I am trying to do some tests on the GPU hosts. Do you have a sample query for the llama3 isvc?
[10:01:00] (CR) AikoChou: [C:+1] articlequality: add force_http option [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1048028 (https://phabricator.wikimedia.org/T360455) (owner: Ilias Sarantopoulos)
[10:01:31] klausman: sure, here is an example https://phabricator.wikimedia.org/T354870#9906050
[10:01:44] merci!
[10:02:01] De rien!
[10:05:36] (CR) Ilias Sarantopoulos: [C:+1] "Done" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1047478 (https://phabricator.wikimedia.org/T367962) (owner: Kevin Bazira)
[10:05:46] (PS6) Ilias Sarantopoulos: articlequality: add force_http option [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1048028 (https://phabricator.wikimedia.org/T360455)
[10:09:18] (CR) Ilias Sarantopoulos: [C:+2] articlequality: add force_http option [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1048028 (https://phabricator.wikimedia.org/T360455) (owner: Ilias Sarantopoulos)
[10:10:04] (Merged) jenkins-bot: articlequality: add force_http option [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1048028 (https://phabricator.wikimedia.org/T360455) (owner: Ilias Sarantopoulos)
[10:11:42] (PS3) Kevin Bazira: logo-detection: show latency metrics in debug mode [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1047478 (https://phabricator.wikimedia.org/T367962)
[10:12:37] (CR) Kevin Bazira: [C:+2] "Thanks for the review :)" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1047478 (https://phabricator.wikimedia.org/T367962) (owner: Kevin Bazira)
[10:13:21] (Merged) jenkins-bot: logo-detection: show latency metrics in debug mode [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1047478 (https://phabricator.wikimedia.org/T367962) (owner: Kevin Bazira)
[10:14:07] aiko: can I test the new image in experimental/articlequality or are you working on it?
[10:31:35] (PS1) Ilias Sarantopoulos: articlequality: add FORCE_HTTP env var [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1048401 (https://phabricator.wikimedia.org/T360455)
[10:31:44] lol I forgot to add the env var :)
[10:49:06] * isaranto afk - lunch
[13:06:12] whenever someone has time I'd like a review here so I can test it https://gerrit.wikimedia.org/r/c/machinelearning/liftwing/inference-services/+/1048401. thanks!
[13:26:21] Machine-Learning-Team, Observability-Metrics, SRE Observability (FY2023/2024-Q4): Gap in metrics rendered from Thanos Rules - https://phabricator.wikimedia.org/T352756#9913414 (elukey) @herron something interesting: {F55528046} {F55528045} {F55528047} I checked multiple recording rules, and the...
[13:26:58] (CR) AikoChou: [C:+1] articlequality: add FORCE_HTTP env var [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1048401 (https://phabricator.wikimedia.org/T360455) (owner: Ilias Sarantopoulos)
[13:27:16] Danke!
[13:27:39] (CR) Ilias Sarantopoulos: [C:+2] articlequality: add FORCE_HTTP env var [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1048401 (https://phabricator.wikimedia.org/T360455) (owner: Ilias Sarantopoulos)
[13:27:40] isaranto: no problem! I'm not testing articlequality
[13:27:56] ok I'll test it once the image is created
[13:28:26] (Merged) jenkins-bot: articlequality: add FORCE_HTTP env var [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1048401 (https://phabricator.wikimedia.org/T360455) (owner: Ilias Sarantopoulos)
[13:41:24] Good morning all
[13:44:36] good morning Chris!
[13:44:44] aiko: ok I tested it and it works :)
[14:17:42] nice!!
:)
[14:17:52] o/ hi Chris
[15:08:26] (PS1) AikoChou: articlequality: add predict [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1048487 (https://phabricator.wikimedia.org/T360455)
[15:20:19] ---^ model predict
[15:20:23] currently only returns the predicted value
[15:22:56] don't know why I got an error from fastapi when trying to return a dict. need to investigate a bit
[15:23:12] https://phabricator.wikimedia.org/P65320
[15:25:03] but returning a value works without issues
[15:25:35] (CR) Ilias Sarantopoulos: [C:+1] "I left a suggestion but it LGTM anyway!" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1048487 (https://phabricator.wikimedia.org/T360455) (owner: AikoChou)
[15:31:36] nice aiko!
[15:32:39] I wrote a summary of today's struggles with ROCm/vllm https://phabricator.wikimedia.org/T354870#9913769. I'm trying to find:
[15:32:40] a) a way to do it
[15:32:40] b) a sustainable way to do it (being quite easy to build/update)
[15:33:45] aiko: if you try to test the articlequality model server make sure to merge this first https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1048455
[15:34:31] I have manually changed it in experimental ns so you won't have an issue (but if you delete the isvc and reapply it you will)
[15:35:07] going afk for the day folks, enjoy the weekend o/
[15:40:14] ack! bye Ilias, have a nice weekend :)
[15:45:48] thanks for the write up isaranto! Night!
[15:52:21] (CR) AikoChou: articlequality: add predict (1 comment) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1048487 (https://phabricator.wikimedia.org/T360455) (owner: AikoChou)
[19:25:36] (PS2) AikoChou: articlequality: add predict [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1048487 (https://phabricator.wikimedia.org/T360455)
[19:34:59] (CR) AikoChou: [C:+2] "Thanks for the review :)" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1048487 (https://phabricator.wikimedia.org/T360455) (owner: AikoChou)
[19:35:45] (Merged) jenkins-bot: articlequality: add predict [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1048487 (https://phabricator.wikimedia.org/T360455) (owner: AikoChou)