[08:19:22] Machine-Learning-Team, Foundational Technology Requests, Prod-Kubernetes, Kubernetes: Import new knative-serving version (k8s 1.23 dependency for ML) - https://phabricator.wikimedia.org/T323793 (elukey)
[08:23:37] hello :)
[08:23:40] from https://github.com/knative/serving/releases/tag/knative-v1.8.0
[08:23:55] "Bump min-version to k8s 1.23"
[08:23:56] lol
[08:24:14] they are still aggressively dropping support
[08:25:56] need to run an errand, will be back in ~1h, ttyl!
[08:29:41] Machine-Learning-Team, Discovery-Search: Create Model Card for Search MLR - https://phabricator.wikimedia.org/T323794 (Gehel)
[09:25:11] aggressively, BUT the phrasing is nice :)
[09:28:38] the kserve website still recommends knative 1.0-1.4 even for k8s 1.24 (for now)
[09:59:43] isaranto: o/
[10:00:00] yeah but see https://github.com/kserve/kserve/pull/2431
[10:00:07] I think that the compatibility matrix is not up to date
[10:00:44] /o yeah, seems like 0.10 will have 1.7.2; when 0.9 was released only 1.5 or 1.6 was there
[10:01:56] elukey: how do u run load tests with benthos? did u use wrk?
[10:03:52] isaranto: nono, I just run benthos and let it contact the inference endpoint. It is not a lot of traffic, but the variation in rev-ids causes a big change in latency
[10:04:00] and sometimes it is enough to see errors
[10:04:08] connections piling up etc..
[10:04:36] with wrk we used only one rev-id at a time, so less variation and more predictable latency in the response (easier)
[10:05:53] ah ok then. Is there any way to record latencies, or do we just observe them through grafana?
[10:08:09] for the moment I just used grafana
[10:10:13] ack. will do the same for now
[10:16:52] isaranto: keep in mind that you are going to notice https://phabricator.wikimedia.org/T322196 when testing
[10:17:17] it is something that we haven't fully solved; I believe it depends heavily on the istio/envoy version that we run in the sidecar
[10:17:31] elukey: o/ I'm gonna run the tests we talked about yesterday
[10:17:57] aiko: o/ ack! good luck :)
[10:18:05] and morning! :)
[10:18:28] morning :)
[10:19:02] elukey: thanks, will keep an eye on it. Now I am running some tests on prod editquality goodfaith. Hi aiko!
[10:23:43] elukey: q - is there a grafana chart I can check for the spark -> mwapi test?
[10:24:23] isaranto: hiii Ilias :)
[10:25:03] aiko: it depends, what do you want to check?
[10:25:54] elukey: latencies maybe? something to compare with spark -> liftwing
[10:26:56] that will be difficult to isolate; there are specific dashboards for the mw api in grafana, but they cover the whole traffic those servers handle.. you can probably track the timings on your side, that's easier
[10:28:03] okok no prob
[10:28:30] where is the one for the whole traffic?
[10:29:35] https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard :)
[10:31:19] thanks!
[10:36:56] (CR) Thiemo Kreuz (WMDE): [C: +2] tests: Replace assertEmpty with assertSame [extensions/ORES] - https://gerrit.wikimedia.org/r/860655 (owner: Umherirrender)
[10:57:05] (Merged) jenkins-bot: tests: Replace assertEmpty with assertSame [extensions/ORES] - https://gerrit.wikimedia.org/r/860655 (owner: Umherirrender)
[11:37:44] * elukey lunch!
[12:07:43] the issue can't be reproduced in the spark -> mwapi tests; everything worked fine
[12:08:00] one difference I noticed: given the same test set of ~1000 rev_ids,
[12:08:32] spark -> mwapi - I need to wait 12 minutes to get the result
[12:09:57] spark -> lift wing only took 1.3 minutes to run, but 354 of the returned rev_ids were problematic
[12:11:14] https://grafana.wikimedia.org/d/zsdYRV7Vk/istio-sidecar?from=now-3h&orgId=1&to=now&var-backend=All&var-cluster=codfw%20prometheus%2Fk8s-mlstaging&var-namespace=experimental&var-quantile=0.5&var-quantile=0.95&var-quantile=0.99&var-quantile=0.75&var-response_code=All
[12:12:34] the peak is around 11:54
[12:17:56] * klausman lunch
[14:47:42] aiko: interesting!
[14:53:52] aiko: are we 100% sure that our code (that runs on liftwing) makes the same api calls as the ones you tested with the last udf? I mean, is there a possibility that for $bug/$reason/etc.. we make a mw api call with a different number of parameters, and those trigger the missing responses?
[16:15:04] Machine-Learning-Team: Test revscoring model servers on Lift Wing - https://phabricator.wikimedia.org/T323624 (isarantopoulos) Checked en-wiki-revscoring-editquality-goodfaith with benthos and wrk: with wrk for rev_id: 132421 with `wrk --timeout 5s -s inference.lua https://inference.svc.eqiad.wmnet:30443/v1...
[16:30:16] I ran some tests but will rerun them on Monday for more concrete results. On Monday I'd like to discuss it a bit to get an understanding of the previous tests. Heading out for the weekend! /o
[16:30:58] have a good weekend!
[16:49:26] \o
[16:54:43] elukey: 100% sure the api calls are the same
[16:55:58] elukey: the only difference is that one uses mwapi.AsyncSession with asyncio.gather and the other one uses mwapi.Session
[17:00:17] weird
[17:03:18] can't explain.. sigh
[17:06:00] aiko: not sure if I have asked, but if you hit the inference endpoint with the same requests outside spark, everything works
[17:06:07] right?
[17:06:43] elukey: yep
[17:10:07] aiko: another test that I have in mind, to isolate the problem further - does it reproduce if you use a single rev-id all the time?
[17:11:49] elukey: using spark?
[17:13:30] yes yes, using spark
[17:13:41] IIRC you hit the inference api with multiple requests, right?
[17:13:49] (different ones I mean, not the same)
[17:14:17] yes
[17:15:01] yeah, so we could try to see if it also happens with specific requests, to narrow down the repro case
[17:15:24] I recall that you mentioned that sometimes some requests lead to missing responses and sometimes not
[17:15:37] but I'd like to understand if this is related to traffic volume or not
[17:15:40] you mean hitting the inference api with multiple requests but all with the same rev_id?
[17:15:48] exactly, yes
[17:15:56] like we do with wrk
[17:16:13] ok I can do that
[17:32:15] aiko: going afk for the weekend, let's restart on monday :)
[17:32:20] I tested a rev_id that was problematic in the last test and hit the inference api with 1000 requests.. no problem, I got results
[17:32:36] ah, so same rev-id, no issue?
[17:32:37] I'm gonna try some other rev_ids
[17:32:43] yes
[17:32:49] mystery
[17:32:59] have a good weekend folks!
[17:33:19] have a nice weekend Luca! :)
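[Editor's note] For reference, a minimal Python sketch of the two load patterns elukey describes around 10:03-10:04: wrk-style traffic that repeats a single rev_id versus benthos-style traffic with a mix of rev_ids. The endpoint URL, Host header and rev_ids below are placeholders/assumptions, not the exact values used in these tests.

    # Sketch only: replay the two traffic patterns against a Lift Wing inference
    # endpoint. URL, Host header and rev_ids are placeholders, not the exact
    # values from the tests discussed above.
    import time
    import requests

    URL = "https://inference.svc.eqiad.wmnet:30443/v1/models/enwiki-goodfaith:predict"
    HEADERS = {
        "Host": "enwiki-goodfaith.revscoring-editquality-goodfaith.wikimedia.org",
        "Content-Type": "application/json",
    }

    def score(rev_id):
        """POST a single rev_id and return (status code, latency in seconds)."""
        start = time.monotonic()
        resp = requests.post(URL, headers=HEADERS, json={"rev_id": rev_id}, timeout=5)
        return resp.status_code, time.monotonic() - start

    # wrk-style: the same rev_id over and over -> fairly stable latency
    for _ in range(100):
        print(score(132421))

    # benthos-style: a mix of rev_ids -> much larger latency variation, which is
    # what surfaces the piled-up connections / errors mentioned in the chat
    for rev_id in (123456, 234567, 345678):  # placeholder rev_ids
        print(score(rev_id))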
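[Editor's note] For context on the 16:55 remark, a sketch of the two MW API call patterns being compared: the blocking mwapi.Session used in the spark-side test UDF versus mwapi.AsyncSession with asyncio.gather as used on Lift Wing. This assumes mwapi's AsyncSession exposes an awaitable get(), as in recent mwapi releases; host, user agent, rev_ids and query parameters are illustrative, not the exact calls made by the model servers.

    # Sketch of the two call patterns; parameters are illustrative only.
    import asyncio
    import mwapi

    USER_AGENT = "lift-wing-debug/0.1 (placeholder contact)"  # placeholder UA
    REV_IDS = [123456, 234567, 345678]                        # placeholder rev_ids

    def fetch_sync():
        """Blocking pattern (spark-side test UDF): one request after another."""
        session = mwapi.Session("https://en.wikipedia.org", user_agent=USER_AGENT)
        return [session.get(action="query", prop="revisions", revids=r) for r in REV_IDS]

    async def fetch_async():
        """Concurrent pattern (Lift Wing): AsyncSession + asyncio.gather."""
        session = mwapi.AsyncSession("https://en.wikipedia.org", user_agent=USER_AGENT)
        return await asyncio.gather(
            *(session.get(action="query", prop="revisions", revids=r) for r in REV_IDS)
        )

    # results_sync = fetch_sync()
    # results_async = asyncio.run(fetch_async())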