[11:21:43] * elukey lunch!
[15:32:47] the kserve-container has the following limits
[15:32:49] Limits:
[15:32:49]   cpu:     1
[15:32:49]   memory:  2Gi
[15:32:49] Requests:
[15:32:52]   cpu:     1
[15:32:54]   memory:  2Gi
[15:35:18] there seems to be a way to change it (via InferenceService settings) but I am wondering what's best, whether to beef up a pod or increase the replicas
[15:36:13] o/
[15:37:17] o/
[15:37:24] maybe 2 cpus is the minimum in our case
[15:37:30] elukey: good question, should we scale out or scale up?
[15:37:44] accraze: probably both :D
[15:37:56] i only increased the number of replicas on the old minikf sandbox during load tests
[15:38:10] but yeah maybe a bit of both to start out
[15:42:22] increasing cpu makes sense due to how max async workers is computed
[15:47:02] so I am reading https://www.tornadoweb.org/en/stable/guide/running.html#processes-and-ports
[15:47:36] and IIUC the "workers" (not async_workers) setting for kserve, by default #cpu, forks processes
[15:47:42] each one with one ioloop
[15:47:56] (and in the kserve use case, a thread pool executor)
[15:49:17] in my head, when a request comes in it hits one of the ioloops and gets processed
[15:49:30] sorry, poorly written
[15:49:52] a request comes in, and if we have 2 cpus/workers, then it is handled by one of them
[15:50:10] in turn, the process uses an ioloop with a thread pool
[15:50:54] https://github.com/kserve/kserve/blob/master/python/kserve/kserve/model_server.py#L130
[15:51:54] when the code reaches the point of calling the model, I guess that it is all cpu time, right? (so the thread pool is basically not very useful due to the GIL)
[15:52:20] ahh i see what you are saying
[15:52:36] yeah i don't understand why there is a thread pool now
[15:56:40] interesting, there is some code committed post-0.7 that renames KFModel
[15:57:20] https://github.com/kserve/kserve/blob/release-0.7/python/kserve/kserve/kfmodel.py
[15:57:39] there seems to be some coroutine-like code
[15:58:37] yeah it looks like they are using asyncio syntax in the KFModel class
[16:00:08] and there seems to be an async http client as well
[16:00:16] I am wondering how the model is called then
[16:00:34] I was convinced that it was basically in a format that allowed code to be executed
[16:01:05] the async http client is for communicating w/ transformers i think
[16:01:11] and explainers
[16:01:20] they all inherit the KFModel class
[16:01:53] okok
[16:03:10] it looks like we could try moving our mw-api calls to a function with the async annotation and then just do `await` on it
[16:03:27] in the transformer?
[16:03:37] yeah, but not sure if that will help much
[16:03:40] but in theory it is already executed in the ioloop, no?
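
(A minimal sketch of the "async annotation + await" idea discussed above, assuming the kserve server awaits preprocess() when it is declared as a coroutine; the class name and the MediaWiki endpoint below are illustrative placeholders, not the team's actual transformer code:)

    # Sketch only: assumes kserve awaits an async preprocess(); names here are hypothetical.
    from kserve.kfmodel import KFModel
    from tornado.httpclient import AsyncHTTPClient

    MW_API = "https://en.wikipedia.org/w/api.php"  # placeholder endpoint for the example


    class EditqualityTransformer(KFModel):
        def __init__(self, name: str, predictor_host: str):
            super().__init__(name)
            self.predictor_host = predictor_host
            self.client = AsyncHTTPClient()

        async def _get_rev(self, rev_id: int) -> bytes:
            # The fetch is awaited, so the ioloop can serve other requests
            # while the MediaWiki API round-trip is in flight.
            url = f"{MW_API}?action=query&revids={rev_id}&format=json"
            response = await self.client.fetch(url)
            return response.body

        async def preprocess(self, inputs: dict) -> dict:
            raw = await self._get_rev(inputs["rev_id"])
            # ...feature extraction from the revision payload would go here...
            return {"instances": [raw.decode("utf-8")]}

This only frees the ioloop during the network wait; the CPU-bound score() time is unaffected.
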
[16:04:06] exactly, i doubt it would improve performance
[16:05:04] what I fear is that we cannot really avoid that "heavy" (to be quantified) cpu time spent in the score() function
[16:07:33] ah yep that's the big one
[16:07:53] and the main trouble is that the thread pool will likely block and suffer during those moments
[16:08:30] `score` could probably use a bit more memory too
[16:21:01] very interesting graph
[16:21:02] https://grafana.wikimedia.org/d/hyl18XgMk/kubernetes-container-details?orgId=1&from=now-6h&to=now&var-datasource=eqiad%20prometheus%2Fk8s-mlserve&var-namespace=revscoring-editquality&var-pod=enwiki-damaging-predictor-default-tcqns-deployment-6777975sg6pj
[16:21:12] queue-proxy is heavily cpu-throttled
[16:21:18] the limits are super low
[16:21:21] I am going to raise them
[16:28:04] (if there is an easy way)
[16:35:29] I am testing new values on the fly :)
[16:53:58] better now
[16:53:59] https://grafana.wikimedia.org/d/hyl18XgMk/kubernetes-container-details?orgId=1&from=now-3h&to=now&var-datasource=eqiad%20prometheus%2Fk8s-mlserve&var-namespace=revscoring-editquality&var-pod=enwiki-goodfaith-predictor-default-92g8c-deployment-54f6865vxw6&var-container=All
[16:57:30] nice one!
[17:33:20] going afk for the weekend folks, have a nice rest of the day and weekend :)
[17:33:59] 10Machine-Learning-Team, 10observability: Improve ORES observability - https://phabricator.wikimedia.org/T299137 (10elukey) @Halfak Hi! If you have time I'd like to ask you a question about how to track down the errors behind `score_errored`: https://grafana.wikimedia.org/d/HIRrxQ6mk/ores?viewPanel=72&orgId=1&...
[17:35:06] have a good weekend elukey!
[21:06:04] hmmm running into some trouble building the draftquality-transformer image
[21:07:34] i need to install nltk to get stopwords and also then call setuptools to install the transformer module into /opt/lib/python/site-packages
[21:08:13] i was doing the setuptools call in the builder command of the base `build` variant of the blubberfile
[21:08:57] but now with loading revscoring in the transformer, i need to install nltk and run the downloader first
[21:09:50] might need to include an install.sh script like we've done in the past that handles everything and use it in the builder command
[21:56:23] 10Machine-Learning-Team, 10observability: Improve ORES observability - https://phabricator.wikimedia.org/T299137 (10Halfak) These are edits to Wikidata erroring. They might be edits to regular wiki pages. The damage and item quality models were made to assess edits to entities (items, properties) so it error...
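
(Relatedly, a sketch of what the install.sh step mentioned above would need to cover, written here in Python rather than shell just to keep one language in these notes; the nltk data directory and the pip --target invocation are assumptions, not the actual build configuration:)

    #!/usr/bin/env python3
    # Hypothetical build-time helper: fetch the nltk data revscoring needs,
    # then install the transformer package where the runtime image expects it.
    import subprocess
    import sys

    import nltk

    # revscoring's feature extraction needs the stopwords corpus; download it
    # into a directory that will exist (and be on NLTK_DATA) inside the image.
    NLTK_DATA_DIR = "/opt/lib/nltk_data"  # assumed location
    nltk.download("stopwords", download_dir=NLTK_DATA_DIR)

    # Install the transformer module into the path mentioned in the discussion.
    # pip's --target is one way to land the package directly in
    # /opt/lib/python/site-packages; a setup.py install with the right prefix
    # would be the alternative.
    subprocess.check_call(
        [sys.executable, "-m", "pip", "install", "--target",
         "/opt/lib/python/site-packages", "."]
    )

A script like this can then be referenced from the blubberfile's builder command so that both the nltk download and the package install happen in the build variant.
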