[06:07:36] <georgekyz>	 good morning
[06:23:29] <ozge_>	 good morning
[06:57:41] <isaranto>	 hi folks!
[07:26:57] <wikibugs>	 06Machine-Learning-Team, 06Data-Persistence, 06Growth-Team, 10Improve-Tone-Structured-Task, 07OKR-Work: Data Persistence Design Review: Improve Tone Suggested Edits newcomer task - https://phabricator.wikimedia.org/T401021#11134981 (10achou) I like how @Ottomata separates this into data model and update...
[08:12:49] <ozge_>	 Hello, I've an MR in airflow. https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/1643 It removes two wikis from the list. I explained the reasons in the MR's description. We have successfully created models for all other wikis   (314 wikis in total excluding enwiki). 🚀 . Can you take a look when you have time? @kevinbazira 
[08:13:57] <kevinbazira>	 ozge_: o/ 314 wikis 🎉
[08:14:03] <kevinbazira>	 looking ...
[08:29:35] <ozge_>	 🙌
[08:31:24] <isaranto>	 wow!
[08:37:22] <elukey>	 isaranto: o/ I was chatting the other week in here about https://phabricator.wikimedia.org/T390706
[08:37:53] <elukey>	 I think that some work needs to be done on either the SLO target or on the service itself, it depends what you wanna do :)
[08:54:51] <isaranto>	 o/ elukey. thanks for the ping. We didn't really figure out what the issue was on the 24th of July that caused the big drop. But regardless of that I agree that we should revisit the target for latency. Looking at the dashboard https://grafana.wikimedia.org/goto/i0u96M9Hg?orgId=1 the error_rates show us how often the target is violated, right?
[08:54:53] <georgekyz>	 elukey: Hey, we need to discuss about the SLO as a team and decide (probably) to loosen the target. We will update you when we have a decision. Thank you for bringing this up.
[08:55:35] <elukey>	 isaranto: yep exactly
[08:55:38] <isaranto>	 if this is the case the error_rates give us a more realistic target for the latency target
[08:55:48] <elukey>	 georgekyz: ack! 
[08:56:08] <elukey>	 it may be that getting live traffic made the model show its real performances, compared to the other tests done before
[08:56:19] <elukey>	 this is a good way to iterate over an SLO target etc.
[08:57:03] <elukey>	 so at the moment I am only asking to follow up to mark everything as approved before end of quarter, nothing else :)
[08:57:25] <elukey>	 if the new performance data suggests a new SLO target, let's redo it, nothing really problematic
[09:06:21] <isaranto>	 elukey: I think in our case it makes more sense to tweak the required latency (e.g. to 2-3seconds) and keep the 90% target steady (or maybe even change both). 
[09:06:51] <isaranto>	 georgekyz: I think this dashboard will help us understand the actual performance https://grafana.wikimedia.org/goto/IjksR79NR?orgId=1
[09:10:12] <isaranto>	 hmm I wonder why the kserve latency is telling a totally different story https://grafana.wikimedia.org/goto/Gj8_g79Ng?orgId=1
[09:12:38] <elukey>	 there is also the istio sidecar view https://grafana.wikimedia.org/d/zsdYRV7Vk/istio-sidecar?orgId=1&from=now-12h&to=now&timezone=utc&var-cluster=aWotKxQMz&var-namespace=edit-check&var-backend=$__all&var-response_code=$__all&var-quantile=0.5&var-quantile=0.95&var-quantile=0.99
[09:12:46] <elukey>	 that seems inline with the gateway one
[09:14:20] <isaranto>	 georgekyz: we should investigate this --^
[09:15:19] <isaranto>	 in any case the istio dashboards tell the real story. The kserve dashboard just measures the metrics as they are emitted from the service
[09:22:48] <isaranto>	 but the question remains: where is all that time in between spent ¯\\(ツ)//¯ ?
[09:23:59] <isaranto>	 georgekyz: I think this task perfectly aligns with your Ops duty this week <3
[09:24:09] <georgekyz>	 Yeap I am on it 
[09:24:39] <georgekyz>	 isaranto: Should I arrange a meeting for the latency targets ? 
[09:24:44] <isaranto>	 elukey: if you have any clues on this discrepancy plz drop a hint!
[09:25:17] <georgekyz>	 isaranto: Do you have any idea why the kserve dashboard show different things from istio?
[09:25:42] <isaranto>	 georgekyz: I don't think we need a meeting atm. We'll need to investigate the latencies and then communicate with the editing team
[09:26:06] <isaranto>	 georgekyz: I have no idea. this is what I suggested that you'd investigate
[09:26:28] <isaranto>	 I'm opening a task about it
[09:29:31] <elukey>	 isaranto: does the model use pre-process at all? Because its latency is really constant/low
[09:29:41] <elukey>	 usually that is the expensive part
[09:30:20] <georgekyz>	 elukey: Yes the model has a pre-process step 
[09:30:52] <georgekyz>	 But the preprocess step is not a ver heavy operation in this case
[09:31:09] <elukey>	 okok
[09:31:25] <elukey>	 the other thing is that in the istio dashboard the p50 use case is zero, not sure why
[09:32:08] <isaranto>	 yeah all the pre-process is some manipulation to bring it to the proper format + some validation
[09:37:19] <wikibugs>	 (03CR) 10Nik Gkountas: Allows filtering suggestions based on size (032 comments) [research/recommendation-api] - 10https://gerrit.wikimedia.org/r/1183155 (owner: 10Sbisson)
[09:50:56] <wikibugs>	 06Machine-Learning-Team: Review Tone Check Latency SLO and its targets - https://phabricator.wikimedia.org/T403378 (10isarantopoulos) 03NEW
[09:52:24] <wikibugs>	 06Machine-Learning-Team: Review Tone Check Latency SLO and its targets - https://phabricator.wikimedia.org/T403378#11135601 (10isarantopoulos)
[09:52:45] <georgekyz>	 isaranto: Thnx for creating the ticket
[11:54:25] <wikibugs>	 (03PS3) 10Sbisson: Allows filtering suggestions based on size [research/recommendation-api] - 10https://gerrit.wikimedia.org/r/1183155
[11:54:53] <wikibugs>	 (03CR) 10Sbisson: Allows filtering suggestions based on size (032 comments) [research/recommendation-api] - 10https://gerrit.wikimedia.org/r/1183155 (owner: 10Sbisson)
[13:43:14] <wikibugs>	 (03PS1) 10Gkyziridis: edit-check: Update locust tests. [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/1183684 (https://phabricator.wikimedia.org/T400460)
[13:49:11] <wikibugs>	 06Machine-Learning-Team, 05Goal, 07OKR-Work: Q1 FY2025-26 Goal: Apply the Tone Check model to published articles, to learn whether we can build a pool of high-quality structured tasks for new editors - https://phabricator.wikimedia.org/T392283#11136518 (10achou) @Ottomata Thanks for the input! <3  First I wa...
[14:09:50] <wikibugs>	 10Lift-Wing, 06Machine-Learning-Team, 07Essential-Work, 13Patch-For-Review: Fix locust load test for edit-check - https://phabricator.wikimedia.org/T400460#11136583 (10gkyziridis) ==Update==  Locust tests for edit-check:      - Hit model on staging under `edit-check` namespace   - Use the latest request sc...
[14:16:10] <isaranto>	 klausman: o/ could you help georgekyz figure out the discrepancy between istio & kserve container latencies as described here https://phabricator.wikimedia.org/T403378?
[14:16:23] <klausman>	 taking a look
[14:16:27] <isaranto>	 we're trying to review the SLO targets
[14:16:29] <isaranto>	 thanks!
[14:27:09] <klausman>	 To clarify: the question is why kserve reports unrealistically low latencies when compared to the more realistic-looking Istio numbers?
[14:38:10] <georgekyz>	 yes 
[14:53:55] <klausman>	 Well, looking at the dashboard definition linked in the ticket, I notice the metric used is `request_preprocess_seconds_bucket`
[14:54:17] <klausman>	 So that is only the preprocess time, not end-to=end
[14:54:22] <klausman>	 s/=/-/
[14:55:01] <klausman>	 ah, the predict metrics were folded away %-)
[14:56:01] <klausman>	 One thing I don't know about these metrics is whether any of them contains the latency of our mediawiki API backend calls.
[15:01:29] <isaranto>	 this model doesnt have any mwapi calls. it performs inference on the input passed in the request
[15:01:42] <klausman>	 Right.
[15:02:06] <klausman>	 So we should normally be somewhere <300ms typically (preprocess+inference).
[15:02:26] <klausman>	 But we're well over 1000ms often enough that the SLO is toast
[15:03:19] <georgekyz>	 yeap
[15:04:00] <georgekyz>	 is anybody running any tests for edit-check right now on staging ?
[15:04:07] <klausman>	 let me see if I can find metrics about what Istio is doing. My gut tells me that istio is slow, not the inference service.
[15:04:31] * isaranto nods
[15:04:46] <georgekyz>	 right now the 90 percentile is around 10s
[15:04:46] * isaranto afk back in 1h
[15:04:50] <georgekyz>	 without running anything 
[15:04:59] <georgekyz>	 https://grafana-rw.wikimedia.org/d/G7yj84Vnk/istio?forceLogin=true&from=now-1h&orgId=1&refresh=30s&timezone=utc&to=now&var-backend=$__all&var-cluster=D-2kXvZnk&var-namespace=edit-check&var-quantile=0.90&var-response_code=$__all&viewPanel=panel-3-clone-0
[15:05:28] <elukey>	 georgekyz: could you put in the task an example of curl http call for the service? I'd like to test one thing
[15:05:29] <klausman>	 Another option is try and dig into logstash, see if Istio logs sth like a timeout or somesuch
[15:05:40] <elukey>	 also what do you mean "without running anything" ?
[15:05:50] <elukey>	 isn't edit check already used by some wikis?
[15:09:53] <wikibugs>	 06Machine-Learning-Team: Review Tone Check Latency SLO and its targets - https://phabricator.wikimedia.org/T403378#11136758 (10gkyziridis) Hey @klausman, you can hit the model on staging via: `lang=Shell curl -s -X \ POST "https://inference-staging.svc.codfw.wmnet:30443/v1/models/edit-check:predict" \ -H "Host:...
[15:12:45] <georgekyz>	 elukey: yes it is deployed and it is used for wikis, but what is strange is that when I was running the tests the latency was around 700ms (which is pretty good and normal) but suddenly you can see a spike going to 9seconds for 5 mins
[15:13:21] <georgekyz>	 I am talking only for staging.
[15:15:30] <elukey>	 so for staging, I can see some cpu usage spikes https://grafana.wikimedia.org/d/hyl18XgMk/kubernetes-container-details?orgId=1&var-datasource=000000026&var-site=codfw&var-cluster=k8s-mlstaging&var-namespace=edit-check&var-pod=edit-check-predictor-00007-deployment-5cfcc7476-mcl8q&var-container=$__all&from=now-6h&to=now&timezone=utc
[15:15:45] <elukey>	 that ended up with a tiny throttling, nothing dramatic
[15:17:01] <elukey>	 the other thing to remember is that kserve uses fastapi, that is based on a single thread ioloop
[15:17:32] <elukey>	 so when the predict starts using cpu for a bit, the rest is stalled (including accepting new connections etc..)
[15:19:05] <elukey>	 so I am wondering this - 700ms of latency for a single call is indeed great, but in our setup (not multi process) it will block all the other calls. Maybe kserve registers the relatively fast predict time, but istio gets queued up
[15:19:50] <klausman>	 That's a good hypothesis. I've tried to find an istio metric for "queue length" or similar, but have found nothing
[15:20:41] <elukey>	 there is also the knative queue proxy container in the middle to consider, but it hasn't given us problems so far (or better, I don't remember any)
[15:21:47] <elukey>	 georgekyz: the curl call that you provided is very fast, always below 100ms, but you mentioned 700ms - are there more expensive calls that you know of?
[15:21:59] <elukey>	 because with those it may be pretty easy  to trigger the queueing, if any
[15:22:45] <georgekyz>	 elukey: 700ms was shown in graphana for the 90 precentile latency during my locust tests (which took around 3 mins)
[15:24:43] <georgekyz>	 elukey: The example I posted is very very easy and really small due to the low number of tokens in the 'original_text' and 'modified_test'. When I am running the load_tests I am using a higher number of tokens between 15 - 90. 
[15:32:43] <elukey>	 ack perfect
[15:33:05] <elukey>	 another thing that I noticed is that the kubernetes pod/containers details dashboard shows a lot of short lived pods
[15:33:20] <elukey>	 and the autoscaler confirms it https://grafana.wikimedia.org/d/c6GYmqdnz/knative-serving?orgId=1&from=now-24h&to=now&timezone=utc&var-cluster=aWotKxQMz&var-knative_namespace=knative-serving&var-revisions_namespace=edit-check&viewPanel=panel-24
[15:33:33] <elukey>	 I am also wondering if the stalls are due to istio waiting for a pod to come up
[15:34:37] <elukey>	 as a test, would it be ok to leave 3 pods running as baseline? Rather than just one
[15:35:34] <georgekyz>	 yeah sure that seems nice
[15:36:12] <elukey>	 or maybe tune the autoscaling to be less sensitive, if the scale ups are not needed
[15:36:59] <klausman>	 I'd go with the extra pods first, that should make for a really good signal. Whereas the autoscaler is always a bit fiddly in my experience
[15:37:14] <elukey>	 +1 makes sense
[15:37:38] <elukey>	 the main issue with these pods is that they may take a while to spin up, and I think requests are queued up until the pod is ready
[15:38:26] <elukey>	 (going afk, I'll check tomorrow, have a good rest of the day folks!)
[15:38:44] <klausman>	 \o
[15:38:53] <georgekyz>	 To be honest... I have a strong feeling that the autoscaling is behind of all these issues... but it is just a feeling. It is mainly because we start seeing all these issues after we deployed the autoscaling
[16:59:22] <isaranto>	 ack. I agree to set minReplicas to 3 and re-check. Another thing I see is that  autoscaling target is set to 15 rps . this means that a new pod would be spawned when we are at 75% of that (if I remember correctly) which means ~11 rps. looking at istio we haven't had so many rps to this service over a month so I perhaps something is not set right wrt to that
[17:09:17] <isaranto>	 another thing I noticed is that the agent container is missing from both prod and staging -- it only exists in the experimental namespace which means that the batcher isn't working correctly and increased latencies would be indeed expected
[17:09:29] * isaranto cries 
[17:10:06] <isaranto>	 anyway, now we know that a couple of things are off so we can investigate them properly. I'll update the task as well
[17:12:23] <isaranto>	 actually I'll open another task for the batcher, cause although these 2 might be related it is a bug we need to investigate separately
[17:33:27] <isaranto>	 but the batcher issue is definitely the reason for the difference in the load tests that you experienced George
[17:39:37] <wikibugs>	 06Machine-Learning-Team: Kserve batcher doesn't seem to be properly configured for edit-check - https://phabricator.wikimedia.org/T403423 (10isarantopoulos) 03NEW
[17:42:01] * isaranto afk