[03:30:57] Machine-Learning-Team, Data-Engineering, Observability-Logging, observability, Event-Platform Value Stream (Sprint 03): Evaluate Benthos as stream processor - https://phabricator.wikimedia.org/T319214 (lmata)
[07:18:14] elukey: yeah that makes sense! I think we should try to pass a max_asyncio_workers to the model server
[07:18:27] good morning :)
[07:18:30] elukey: currently we don't do it, so it is assigned by the formula min(32, utils.cpu_count()+4)
[07:18:41] morning!
[07:18:43] I am still trying to figure out if those threads are used or not :)
[07:18:59] in theory asyncio works on a single thread
[07:20:06] I am also checking https://docs.aiohttp.org/en/stable/client_quickstart.html
[07:20:16] Don’t create a session per request. Most likely you need a session per application which performs all requests altogether.
[07:20:19] More complex cases may require a session per site, e.g. one for Github and other one for Facebook APIs. Anyway making a session for every request is a very bad idea.
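The worker-count formula quoted above, min(32, cpu_count + 4), matches the default pool size of Python's ThreadPoolExecutor (Python 3.8 and later). The following is a minimal sketch, not the model server's actual code, of how an explicit worker count could be installed as the event loop's default executor. The helper names default_worker_count, run_blocking and main are hypothetical and chosen only for illustration; how kserve consumes max_asyncio_workers internally is not shown in the log.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor


def default_worker_count(cpu_count: int) -> int:
    """The formula quoted in the chat: min(32, cpu_count + 4)."""
    return min(32, cpu_count + 4)


async def run_blocking(func, *args):
    """Run a blocking callable in the loop's default executor.

    The event loop itself stays single-threaded; only work handed to
    run_in_executor() ends up on the pool's threads.
    """
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, func, *args)


async def main(max_workers: int) -> int:
    loop = asyncio.get_running_loop()
    # Assumption for illustration: override the default pool size explicitly
    # instead of relying on the formula above.
    loop.set_default_executor(ThreadPoolExecutor(max_workers=max_workers))
    return await run_blocking(sum, [1, 2, 3])
```

The pool threads are only used when something is explicitly dispatched to the executor, so raising the worker count would not be expected to help code that only awaits coroutines on the loop.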
[07:21:32] I checked the editquality's model-server code, I think that we do use a new session every time
[07:24:21] yeah we do :(
[07:31:07] in theory it is easy to refactor the code, for example we can create a session in __init__ and reuse it
[07:31:18] but I am wondering what happens if the session closes for some reason
[07:59:00] (PS1) Elukey: WIP - editquality: improve aiohttp and asyncio performances [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/843869
[07:59:41] (CR) CI reject: [V: -1] WIP - editquality: improve aiohttp and asyncio performances [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/843869 (owner: Elukey)
[08:01:12] the CI error is unrelated
[08:01:30] aiko: --^ the above wip patch works, but not 100% sure if it is the right move for aiohttp
[08:17:53] Machine-Learning-Team, Data-Engineering, Observability-Logging, observability, Event-Platform Value Stream (Sprint 03): Evaluate Benthos as stream processor - https://phabricator.wikimedia.org/T319214 (fgiunchedi) >>! In T319214#8321555, @Ottomata wrote: > By the way, any Event Platform produ...
[08:18:53] (PS2) Elukey: WIP - editquality: improve aiohttp and asyncio performances [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/843869
[08:20:09] (CR) jenkins-bot: WIP - editquality: improve aiohttp and asyncio performances [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/843869 (owner: Elukey)
[08:24:00] (PS3) Elukey: WIP - editquality: improve aiohttp and asyncio performances [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/843869
[08:24:08] elukey: I'm thinking maybe we could have something like this https://github.com/kserve/kserve/blob/release-0.8/python/kserve/kserve/model.py#L93 if there is no session, we create one because the session might be closed accidentally?
[08:24:58] (CR) CI reject: [V: -1] WIP - editquality: improve aiohttp and asyncio performances [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/843869 (owner: Elukey)
[08:25:05] (PS4) Elukey: WIP - editquality: improve aiohttp and asyncio performances [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/843869
[08:26:19] aiko: good point for the code. Do you think it is different from the solution above --^? IIUC the http client for kserve is created only if the attribute is None
[08:26:22] (CR) CI reject: [V: -1] WIP - editquality: improve aiohttp and asyncio performances [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/843869 (owner: Elukey)
[08:26:26] (so basically only upon first usage)
[08:29:28] aiko: or maybe there is a is_closed() method to check
[08:29:30] let's ses
[08:29:33] *see
[08:29:38] that would be perfect for the use case
[08:29:52] Mmm yeah it is created when the attribute is none
[08:29:59] if the TCP socket breaks etc.. then I am pretty sure that aiohttp session is done
[08:30:17] mmm or not, in theory there is a connection pool
[08:30:26] so no it should stay open in theory
[08:31:25] oh ok
[08:32:38] https://github.com/aio-libs/aiohttp/blob/master/aiohttp/client.py#L981
[08:32:40] nice
[08:34:08] and do we need to close the session at the end by ourself?
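The idea discussed above (keep one session per application, re-create it only if it is missing or closed, and close it explicitly when no context manager is used) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the code from the Gerrit change: FakeSession is a hypothetical stand-in that mimics only the `closed` attribute and the async `close()` of aiohttp.ClientSession so that the example runs without aiohttp installed, and ModelClient, session_factory and sessions_created are invented names.

```python
import asyncio


class FakeSession:
    """Hypothetical stand-in for aiohttp.ClientSession.

    It mimics only the `closed` attribute and an async `close()`.
    """

    def __init__(self):
        self.closed = False

    async def close(self):
        self.closed = True


class ModelClient:
    """Holds one long-lived session instead of one session per request."""

    def __init__(self, session_factory=FakeSession):
        self._session_factory = session_factory
        self._session = None
        self.sessions_created = 0  # illustration only: counts (re)creations

    @property
    def session(self):
        # Create the session lazily, and re-create it if it was closed.
        if self._session is None or self._session.closed:
            self._session = self._session_factory()
            self.sessions_created += 1
        return self._session

    async def close(self):
        # Without an `async with` block the session must be closed by hand.
        if self._session is not None and not self._session.closed:
            await self._session.close()


async def demo():
    client = ModelClient()
    first = client.session
    same = client.session       # reused, no new session
    await client.close()
    new = client.session        # previous one was closed, so re-created
    await client.close()
    return first is same, first is new, client.sessions_created
```

With a real aiohttp.ClientSession passed as session_factory, the property should be accessed from code running inside the event loop, as demo() does here.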
[08:35:05] if we don't use the context manager yes
[08:35:09] ahh I saw you've added it
[08:35:32] testing a new version based on your suggestions, I'll update the patch in a min :)
[08:39:28] (PS5) Elukey: WIP - editquality: improve aiohttp and asyncio performances [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/843869
[08:39:32] aiko: --^
[08:39:58] (CR) CI reject: [V: -1] WIP - editquality: improve aiohttp and asyncio performances [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/843869 (owner: Elukey)
[08:40:26] (bbiab)
[08:42:20] Morning!
[08:42:51] Looking at that CI failure, I am entirely unable to find a concrete error message. Does anyone know where to find it?
[09:27:19] klausman: morning :) https://phabricator.wikimedia.org/T321035
[09:27:36] there is a missing dockerfile error in the CI job
[09:27:45] I think they are working on it
[09:28:05] I see, so not a problem with the change itself but the CI
[09:28:44] yep yep
[09:30:11] Hello everyone, some time ago I started labeling far right hate symbols on wikimedia-commons images, using your labelstudio instance. Is there any interest on your side to use this data? I also can contribute user experience in regards to labeling commons images. best wishes Wolf aka Bindestrich
[09:35:21] (PS6) Elukey: editquality: improve aiohttp and asyncio performances [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/843869 (https://phabricator.wikimedia.org/T320374)
[09:35:58] (CR) CI reject: [V: -1] editquality: improve aiohttp and asyncio performances [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/843869 (https://phabricator.wikimedia.org/T320374) (owner: Elukey)
[09:36:44] Bindestrich: Hi! There is definitely interest, but we have so far not advanced in any decision about what to use for data labelling etc.. and what to do with labelled data (we are still trying to migrate away from ORES to a new platform). We can ping chrisalbon later on to have a follow up!
[09:38:42] aiko: sent a new version of the code review, still failing for unrelated CI issues, lemme know your thoughts
[09:38:52] (when you have time of course, no rush)
[09:50:44] (CR) AikoChou: [C: +1] editquality: improve aiohttp and asyncio performances (1 comment) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/843869 (https://phabricator.wikimedia.org/T320374) (owner: Elukey)
[09:54:11] (CR) Elukey: editquality: improve aiohttp and asyncio performances (1 comment) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/843869 (https://phabricator.wikimedia.org/T320374) (owner: Elukey)
[09:54:32] (PS7) Elukey: editquality: improve aiohttp and asyncio performances [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/843869 (https://phabricator.wikimedia.org/T320374)
[09:54:56] (CR) Elukey: editquality: improve aiohttp and asyncio performances (1 comment) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/843869 (https://phabricator.wikimedia.org/T320374) (owner: Elukey)
[09:55:46] (PS8) Elukey: editquality: improve aiohttp and asyncio performances [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/843869 (https://phabricator.wikimedia.org/T320374)
[09:56:26] aiko: thanks for the review, also added another logging entry (so that we know in the logs if the session is re-created)
[09:59:49] elukey: yeah that's good! I'm thinking do we really need to define a property for it, or we just check if the session closed before we use it (so in preprocess, before mw_http_cache)
[10:00:49] elukey thank you for your answer, I reached out to chrisalbon via Twitter, what's the best way to contact you all/him? I don't use irc much.
[10:13:20] Bindestrich: ah he'll read later on during the day on IRC :)
[10:13:46] aiko: property looks tidier in my opinion, it doesn't really hurt performances to have it
[10:14:45] ok if I merge?
[10:15:37] elukey: ok ok no problem :)
[10:16:12] thanksss
[10:16:15] (CR) Elukey: [C: +2] editquality: improve aiohttp and asyncio performances [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/843869 (https://phabricator.wikimedia.org/T320374) (owner: Elukey)
[10:16:59] aiko: will run the benthos tests with the new aiohttp connection pool alone to see if it changed anything, and then I'll try another test with doubled/tripled asyncio workers
[10:17:07] (Merged) jenkins-bot: editquality: improve aiohttp and asyncio performances [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/843869 (https://phabricator.wikimedia.org/T320374) (owner: Elukey)
[10:30:58] elukey: ack!
[10:51:25] new docker image published, going to set it manually on ml-serve-codfw
[11:10:28] so far the latencies for enwiki-goodfaith are way better
[11:10:41] but let's see in an hour if it is the same
[11:16:16] <- lunch
[11:45:05] * elukey lunch
[12:20:10] aiko: https://gitlab.com/quantlane/libs/aiodebug looks very nice
[13:50:01] mmmm with more asyncio threads it doesn't seem to be better than before, I see the usual breakages/spikes in latency
[13:50:15] I am going to rollout the new docker image to editquality-goodfaith: https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/843951/
[13:50:33] I'll just test the new client sessions for aiohttp with benthos (on all goodfaith models)
[14:50:07] one piece of info that I keep forgetting is that knative activator sits between istio and the isvc pods for low traffic
[14:50:24] I mean the activator pods act as reverse proxy I think
[15:56:38] (PS5) AikoChou: outlink: add code to send events to EventGate [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/828481 (https://phabricator.wikimedia.org/T315994)
[16:06:08] Machine-Learning-Team, ORES, MediaWiki-Core-Preferences, Moderator-Tools-Team (Kanban): When ORES quality filters are selected in mobile web, entries should be highlighted - https://phabricator.wikimedia.org/T314026 (eigyan) @Jdlrobson thank you for your feedback. I was hoping you might be able t...
[16:12:14] (CR) AikoChou: "Updated the event to follow the revision-score event schema. The research team would like to see this model used where the existing ORES a" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/828481 (https://phabricator.wikimedia.org/T315994) (owner: AikoChou)
[16:14:17] (PS4) AikoChou: events.py: fix prediction type in the revision_score_event [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/842414
[16:22:59] elukey: ^^^ I updated the outlink code sending events and a fix to the prediction type in event
[16:23:17] elukey: when you have a moment could you take a look? no rush :)
[16:27:44] aiko: I was about to log off! Ok if I do it tomorrow morning?
[16:53:30] sure! no problem
[16:56:50] elukey: have a nice rest of the day :)
[18:04:08] Machine-Learning-Team, Data Engineering Planning, Observability-Logging, observability, Event-Platform Value Stream (Sprint 03): Evaluate Benthos as stream processor - https://phabricator.wikimedia.org/T319214 (EChetty)