[06:31:12] Good morning
[06:59:09] morning!
[07:38:32] morning /o
[07:44:57] (03Abandoned) 10Gkyziridis: inference-services: edit-check locust tests [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/1138409 (https://phabricator.wikimedia.org/T392460) (owner: 10Gkyziridis)
[07:57:28] 06Machine-Learning-Team, 13Patch-For-Review: [FIX]: Edit-check peacock detection locust tests - https://phabricator.wikimedia.org/T392460#10763797 (10gkyziridis) **Updated Locust Tests for Edit-check** This version of loading tests for edit-check includes the statistics for malformed instances as well. {P75406}
[08:18:37] 06Machine-Learning-Team, 13Patch-For-Review: [FIX]: Edit-check peacock detection locust tests - https://phabricator.wikimedia.org/T392460#10763850 (10isarantopoulos) @gkyziridis Is this running on localhost? I'm asking to ensure there's no misunderstanding that this represents the actual service and the repor...
[08:24:37] (03PS1) 10Gkyziridis: inference-services: edit-check locust tests [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/1138678 (https://phabricator.wikimedia.org/T392460)
[08:28:57] (03PS2) 10Gkyziridis: inference-services: edit-check locust tests [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/1138678 (https://phabricator.wikimedia.org/T392460)
[08:30:36] 06Machine-Learning-Team, 13Patch-For-Review: [FIX]: Edit-check peacock detection locust tests - https://phabricator.wikimedia.org/T392460#10763894 (10gkyziridis) >>! In T392460#10763850, @isarantopoulos wrote: > @gkyziridis Is this running on localhost? I'm asking to ensure there's no misunderstanding that th...
[09:36:10] 06Machine-Learning-Team, 13Patch-For-Review: [FIX]: Edit-check peacock detection locust tests - https://phabricator.wikimedia.org/T392460#10764081 (10isarantopoulos) I would suggest trying to parse the response so that we use locust's reporting system properly. I think that something like this would work : [[...
[09:37:26] 06Machine-Learning-Team, 13Patch-For-Review: [FIX]: Edit-check peacock detection locust tests - https://phabricator.wikimedia.org/T392460#10764084 (10gkyziridis) **Edit-Check Locust Tests Updated** 200 users at rate of 200 per sec. ` [2025-04-24 09:28:12,262] stat1010/INFO/locust.main: Starting Locust 2.33.2 [...
[09:41:31] 10Lift-Wing, 06Machine-Learning-Team: [onboarding] Improving language agnostic articlequality model + service - https://phabricator.wikimedia.org/T391679#10764091 (10OKarakaya-WMF) curl -s localhost:8080/v1/models/articlequality:predict -X POST -d '{"instances": [{"rev_id": 12345, "lang": "en"}, {"rev_id": 128...
[11:14:41] 10Lift-Wing, 06Machine-Learning-Team: [onboarding] Improving language agnostic articlequality model + service - https://phabricator.wikimedia.org/T391679#10764393 (10OKarakaya-WMF) I share local locust test results from the prod model and new model below. Note that these scores are from my local, and I'll run...
[11:29:58] (03PS5) 10Ozge: feat: updates article quality model with new model. [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/1138337
[11:30:47] (03PS6) 10Ozge: feat: updates article quality model with new model. [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/1138337
[12:02:45] (03PS7) 10Ozge: feat: updates article quality model with new model. [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/1138337
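For reference, a minimal sketch of the response-parsing approach suggested in the [09:36:10] comment above, using locust's catch_response hook so malformed responses show up in locust's own statistics. The endpoint path and payload fields here are assumptions for illustration, not taken from the actual patch under review:

```python
from locust import HttpUser, between, task


class EditCheckUser(HttpUser):
    wait_time = between(0.1, 0.5)

    @task
    def predict(self):
        # Payload shape is assumed; the real edit-check request fields live in the
        # locust tests being reviewed in r/1138678.
        payload = {"instances": [{"original_text": "old text", "modified_text": "new text"}]}
        with self.client.post(
            "/v1/models/edit-check:predict", json=payload, catch_response=True
        ) as resp:
            try:
                body = resp.json()
            except ValueError:
                resp.failure("response is not valid JSON")
                return
            # Count malformed responses as failures so they are reported by locust.
            if "predictions" not in body:
                resp.failure(f"malformed response: {body}")
            else:
                resp.success()
```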
[12:12:07] Hello, the following PR is ready for review: https://gerrit.wikimedia.org/r/c/machinelearning/liftwing/inference-services/+/1138337 . We have an ongoing discussion here, but feel free to review: https://phabricator.wikimedia.org/T391679 . I've also added a single score recently, based on the analysis in the GitLab PR. I've shared local locust test results in the task. Thank you!
[12:14:22] (03PS8) 10Ozge: feat: updates article quality model with new model. [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/1138337
[12:15:20] (03PS9) 10Ozge: feat: updates article quality model with new model. [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/1138337
[12:24:05] (03PS3) 10Gkyziridis: inference-services: edit-check locust tests [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/1138678 (https://phabricator.wikimedia.org/T392460)
[12:36:30] ozge_: if you edit your comment in https://phabricator.wikimedia.org/T391679#10764393 and add the locust results in a code block using triple backticks (```) before and after, it will be more readable
[12:37:01] (03PS4) 10Gkyziridis: inference-services: edit-check locust tests [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/1138678 (https://phabricator.wikimedia.org/T392460)
[12:39:59] @isaranto: updated the locust comment. Thank you!
[12:40:23] I'm writing a comment on Phabricator about the model etc.
[12:40:37] very interesting discussion!
[12:52:38] (03PS5) 10Gkyziridis: inference-services: edit-check locust tests [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/1138678 (https://phabricator.wikimedia.org/T392460)
[13:02:36] 10Lift-Wing, 06Machine-Learning-Team: [onboarding] Improving language agnostic articlequality model + service - https://phabricator.wikimedia.org/T391679#10764738 (10isarantopoulos) Hi! @FNavas-foundation tagging you if you can give us a little more information on how the language agnostic article quality mod...
[13:04:20] ozge_: I added my thoughts on Phabricator. Would it be easy for you to create an additional model (perhaps one that inherits from the existing one) instead of changing everything on the existing one?
[13:04:58] I know you have already done great work to change the current service, but I think this is a way to unblock it and deliver the model to staging and test it, instead of getting stuck in the conversation
[13:07:35] georgekyz: is the above patch ready for review, or shall I wait before testing it? I saw you completely changed the way you parse the response -- seems great btw :D
[13:22:36] @isaranto: do you mean we keep the existing ordered model and normalization?
[13:23:20] And maybe re-train it with the new features
[13:29:36] isaranto:
[13:29:36] 10Lift-Wing, 06Machine-Learning-Team: [onboarding] Improving language agnostic articlequality model + service - https://phabricator.wikimedia.org/T391679#10764821 (10OKarakaya-WMF) Just to make the prediction output clear, we return following sample output. We still have label and score, but we lose the norma...
[13:29:55] isaranto: Yeap, it's ready for review
[13:33:51] ozge_: I mean to create a second service side by side with the existing one: keep the Ordered model as is and also deploy the catboost one
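To illustrate the side-by-side idea (not the team's actual implementation): a rough sketch of a separate catboost predictor that could be deployed next to the existing Ordered-model service, assuming the usual kserve.Model pattern. The class name, model path, and the "features" request field are hypothetical; the real service derives features from rev_id/lang rather than taking them in the request:

```python
import kserve
from catboost import CatBoostRegressor


class ArticleQualityCatboostModel(kserve.Model):
    """Hypothetical second predictor, served alongside the existing Ordered-model service."""

    def __init__(self, name: str, model_path: str = "/mnt/models/model.cbm"):
        super().__init__(name)
        self.model_path = model_path  # path is an assumption, not the real storage layout
        self.model = CatBoostRegressor()
        self.ready = False

    def load(self) -> bool:
        self.model.load_model(self.model_path)
        self.ready = True
        return self.ready

    async def predict(self, payload: dict, headers: dict = None) -> dict:
        # Assumes each instance already carries a numeric feature vector under "features";
        # in the real service, feature extraction happens from rev_id and lang.
        features = [inst["features"] for inst in payload["instances"]]
        scores = self.model.predict(features)
        return {"predictions": [{"score": float(s)} for s in scores]}


if __name__ == "__main__":
    model = ArticleQualityCatboostModel("articlequality")
    model.load()
    kserve.ModelServer().start([model])
```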
[13:34:10] jumping into a meeting -- lemme know if anything needs clarification
[13:39:32] 06Machine-Learning-Team, 13Patch-For-Review: [FIX]: Edit-check peacock detection locust tests - https://phabricator.wikimedia.org/T392460#10764847 (10gkyziridis) >>! In T392460#10764081, @isarantopoulos wrote: > I would suggest trying to parse the response so that we use locust's reporting system properly. > I...
[14:30:19] Sure, clear! @isaranto I think it should be easy to create a new service in the deployment-charts repo. Should we also keep the existing implementation in the inference service? So, create a new folder in inference-services/models/articlequalitygb with blubber.yaml, Makefile etc.
[14:38:25] Or would it be enough to keep the patch in the inference service as it is and create the new service in the deployment-charts repo?
[15:01:45] hmm, I didn't really think this through -- in a meeting, will respond later
[15:02:04] I didn't think about the issues with numpy etc.
[15:49:09] (03PS6) 10Gkyziridis: inference-services: edit-check locust tests [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/1138678 (https://phabricator.wikimedia.org/T392460)
[16:01:42] re: articlequality -- if we try to add another model, for example by defining another class etc. but using the same image, we have the numpy issue (we're using a kserve fork that allows us to install numpy <2.0.0, which is required by statsmodels)
[16:04:05] ozge_: if we update statsmodels to a newer version, could we use the same image and kserve 0.15?
[16:05:21] no need to respond now, we can discuss it tomorrow!
[16:06:55] * isaranto afk
[17:48:49] https://www.irccloud.com/pastebin/Z90fZ8WC
[17:50:49] FIRING: KubernetesDeploymentUnavailableReplicas: ...
[17:50:49] Deployment reference-need-predictor-00010-deployment in revision-models at eqiad has persistently unavailable replicas - https://wikitech.wikimedia.org/wiki/Kubernetes/Troubleshooting#Troubleshooting_a_deployment - https://grafana.wikimedia.org/d/a260da06-259a-4ee4-9540-5cab01a246c8/kubernetes-deployment-details?var-site=eqiad&var-cluster=k8s-mlserve&var-namespace=revision-models&var-deployment=reference-need-predictor-00010-deployment - ...
[17:50:49] https://alerts.wikimedia.org/?q=alertname%3DKubernetesDeploymentUnavailableReplicas
[17:50:51] Hey @isaranto, I've tested the newest version of statsmodels with kserve 0.15 and I get the same error: raise ValueError(str(bit_generator_name) + ' is not a known '
[17:50:51] ValueError: is not a known BitGenerator module.
[18:08:24] I think we can keep kserve_repository for now. One option could be to serve both models in the same endpoint and introduce a new variable in the request, model_version. If no version is specified we can use the old model, and if version in the input is set to 2 we can return results from the new model. Each instance in the request can have a model_version, because we also want to enable request batching (the current implementation allows one input per request). The output for each instance will be slightly different based on the version. Alternatively, we can check how we would create new endpoint versions: v2/models/articlequality:predict or v1/models/articlequality:predictv2 etc. Let's discuss tomorrow if we can find an easier solution
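One way the per-instance model_version idea above could look, sketched as a plain dispatch helper. The field name and the "default to the old model" behaviour are taken from the message; the two predictor callables are placeholders for the existing Ordered-model code and the new catboost code:

```python
from typing import Any, Callable, Dict, List

Instance = Dict[str, Any]


def dispatch_predictions(
    instances: List[Instance],
    predict_ordered: Callable[[Instance], Dict[str, Any]],
    predict_catboost: Callable[[Instance], Dict[str, Any]],
) -> List[Dict[str, Any]]:
    """Route each instance to the old or new model based on an optional model_version field."""
    predictions = []
    for instance in instances:
        # No model_version (or 1) keeps the current behaviour; 2 selects the new model.
        if instance.get("model_version", 1) == 2:
            predictions.append(predict_catboost(instance))
        else:
            predictions.append(predict_ordered(instance))
    return predictions
```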
[18:56:23] (03PS1) 10Umherirrender: Use namespaced classes [extensions/ORES] - 10https://gerrit.wikimedia.org/r/1138888
[21:13:53] (03CR) 10Reedy: [C:03+2] Use namespaced classes [extensions/ORES] - 10https://gerrit.wikimedia.org/r/1138888 (owner: 10Umherirrender)
[21:45:33] (03Merged) 10jenkins-bot: Use namespaced classes [extensions/ORES] - 10https://gerrit.wikimedia.org/r/1138888 (owner: 10Umherirrender)
[21:50:49] FIRING: KubernetesDeploymentUnavailableReplicas: ...
[21:50:49] Deployment reference-need-predictor-00010-deployment in revision-models at eqiad has persistently unavailable replicas - https://wikitech.wikimedia.org/wiki/Kubernetes/Troubleshooting#Troubleshooting_a_deployment - https://grafana.wikimedia.org/d/a260da06-259a-4ee4-9540-5cab01a246c8/kubernetes-deployment-details?var-site=eqiad&var-cluster=k8s-mlserve&var-namespace=revision-models&var-deployment=reference-need-predictor-00010-deployment - ...
[21:50:49] https://alerts.wikimedia.org/?q=alertname%3DKubernetesDeploymentUnavailableReplicas
[23:10:49] RESOLVED: KubernetesDeploymentUnavailableReplicas: ...
[23:10:49] Deployment reference-need-predictor-00010-deployment in revision-models at eqiad has persistently unavailable replicas - https://wikitech.wikimedia.org/wiki/Kubernetes/Troubleshooting#Troubleshooting_a_deployment - https://grafana.wikimedia.org/d/a260da06-259a-4ee4-9540-5cab01a246c8/kubernetes-deployment-details?var-site=eqiad&var-cluster=k8s-mlserve&var-namespace=revision-models&var-deployment=reference-need-predictor-00010-deployment - ...
[23:10:49] https://alerts.wikimedia.org/?q=alertname%3DKubernetesDeploymentUnavailableReplicas