[05:26:24] Good morning!
[06:32:21] (ErrorBudgetBurn) firing: - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn
[06:48:04] (PS1) Ilias Sarantopoulos: readability: validate json input [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/981715 (https://phabricator.wikimedia.org/T352834)
[06:49:02] (PS2) Ilias Sarantopoulos: readability: validate json input [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/981715 (https://phabricator.wikimedia.org/T352834)
[06:56:05] (PS3) Ilias Sarantopoulos: readability: validate json input [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/981715 (https://phabricator.wikimedia.org/T352834)
[07:00:42] (PS1) Ilias Sarantopoulos: ci: add isort to pre-commit to order imports [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/981717
[07:03:58] (PS1) Ilias Sarantopoulos: ci: refactor import with isort [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/981718
[07:07:30] (PS2) Ilias Sarantopoulos: ci: refactor imports with isort [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/981718
[07:08:35] (PS2) Ilias Sarantopoulos: ci: add isort to pre-commit to order imports [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/981717
[07:10:52] (CR) Ilias Sarantopoulos: "This would have to be merged first before change https://gerrit.wikimedia.org/r/c/machinelearning/liftwing/inference-services/+/981717 in " [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/981718 (owner: Ilias Sarantopoulos)
[07:12:31] o/ I added isort to pre-commit and sent two different patches (one with the changes that are needed and one that adds isort to the list of pre-commit actions)
[08:08:39] (CR) Kevin Bazira: [C: +1] ci: add isort to pre-commit to order imports [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/981717 (owner: Ilias Sarantopoulos)
[08:19:15] (CR) Kevin Bazira: [C: +1] "This is a good one as it also follows most of the WMF Python coding conventions for imports:" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/981718 (owner: Ilias Sarantopoulos)
[08:45:04] Machine-Learning-Team: Optimize response performance for the article-descriptions model-server - https://phabricator.wikimedia.org/T353127 (kevinbazira)
[09:20:14] Machine-Learning-Team, Moderator-Tools-Team, Research, Temporary accounts, Trust and Safety Product Team: RevertRisk model readiness for temporary accounts - https://phabricator.wikimedia.org/T352839 (kostajh)
[09:21:25] hello folks!
[09:32:42] o/ Luca!
[09:53:20] Machine-Learning-Team, Wikipedia-Android-App-Backlog (Android Release - FY2023-24): Migrate Machine-generated Article Descriptions from toolforge to liftwing. - https://phabricator.wikimedia.org/T343123 (isarantopoulos) Regarding latency: As things are at the moment dropping down to ms latency is somethi...
[09:54:46] kevinbazira: let's sync here about how to tackle the performance issues
[09:57:41] isaranto: thank you for the suggestions you shared on the task. I am going to try them and share updates on the findings.
[09:58:16] let's discuss first, perhaps some of them are not worth trying
[10:01:05] ok, which suggestion is not worth trying?
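For context on what the two isort patches above do, here is an illustrative before/after on a Python import block, assuming isort's default grouping; the module names are made up for the example:

```python
# Hypothetical module, before isort: imports in arbitrary order.
#
#     import torch
#     import os
#     from app import utils   # "app" is a made-up local package
#     import json
#
# After running `isort`, the same imports come out grouped into
# sections (stdlib, third-party, local) and alphabetized within
# each group:
import json
import os

import torch

from app import utils
```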
[10:04:08] I'm just saying that it's worth having a discussion before jumping in and implementing things that might take a while. For example, I don't know if there is another model which is smaller and has similar results to this one, especially since it is a use case that has already been tested. So perhaps it is not worth trying to find a smaller model at the moment
[10:05:16] trying out cpu/memory is a good starting point for sure though
[10:09:37] suggestion before starting - is it something that we can replicate locally?
[10:09:55] for example, using docker's restrictions
[10:10:09] there is always the latency related to reaching (via Internet) the mw api
[10:10:26] but those 14s are a lot
[10:11:35] the other thing that we can do is try to make a lot of requests, and watch https://grafana.wikimedia.org/d/hyl18XgMk/kubernetes-container-details
[10:11:50] if the problem is CPU/Memory related it should be clearer
[10:13:16] profiling the code is a good way to start to see if there are any visible bottlenecks that we can improve (other than mwapi and model prediction).
[10:13:51] the predict step takes most of the time (approx 13s) and preprocess is ~400ms so we should focus on the first part to start with
[10:13:55] I mean predict
[10:16:01] isaranto: sure sure, I'll profile the code and then look into CPU/RAM options. for the smaller model, I haven't had a chance to look into it yet, as you have just suggested it now. I'll let you know which options I am exploring as we move along.
[10:16:14] elukey: yep, I am using a container on the ml-sandbox to run the local tests.
[10:20:24] kevinbazira: there is also another thing to consider, namely OMP_NUM_THREADS
[10:20:54] we currently set it to 1, so even if we change the number of CPUs we'll likely not gain anything
[10:21:49] there are some suggestions in https://pytorch.org/tutorials/recipes/recipes/tuning_guide.html#utilize-openmp
[10:22:16] but I am wondering why by default it is not auto-set (with the number of cpus stated in the cgroup)
[10:22:24] did we investigate this during the past days?
[10:24:34] I don't remember us investigating this recently.
[10:24:36] thanks elukey, I'll definitely keep OMP_NUM_THREADS in mind as I explore what option will work best.
[10:25:32] I think that we should probably figure out a good strategy for these use cases, keeping OMP_NUM_THREADS and cpu count in sync via helmfile is surely super error prone
[10:26:03] +1
[10:27:54] Morning!
[10:28:39] klausman: o/
[10:28:47] o/ Tobias!
[10:29:20] elukey: on the mesh upgrade: https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/980904 Did you update the modules using sextant?
[10:29:51] isaranto: yep!
[10:30:03] otherwise it is super error prone
[10:30:09] ok, thanks! wanted to know for future upgrades!
[10:31:06] It is very weird, I don't see any issue or similar related to cgroups in pytorch's gh
[10:31:18] and it is impossible that we are the first ones using a cgroup
[10:31:28] so I am missing something obvious
[10:32:21] (ErrorBudgetBurn) firing: - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn
[10:43:46] (PS1) Ilias Sarantopoulos: outlink: validate json input in transformer [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/982043 (https://phabricator.wikimedia.org/T352834)
[10:45:29] I set the outlink patch as WIP until I test it first. In that one the change is in the transformer, so I want to be sure it works
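On the profiling discussion above (predict ~13s vs preprocess ~400ms), a minimal timing and profiling harness could look like this; the handler stubs and the sample payload are stand-ins, since the real model server exposes these as async methods:

```python
import cProfile
import pstats
import time

# Stand-ins for the real handlers; in the actual model server these
# are (async) methods on the model class, so the harness would need
# minor adaptation.
def preprocess(payload: dict) -> dict:
    return payload  # e.g. fetch features from the MW API

def predict(features: dict) -> dict:
    return {"prediction": None}  # e.g. run the model's generate step

payload = {"lang": "en", "title": "Example"}  # hypothetical input

# Coarse timing first: confirm whether preprocess or predict dominates.
t0 = time.perf_counter()
features = preprocess(payload)
print(f"preprocess: {time.perf_counter() - t0:.3f}s")

t0 = time.perf_counter()
predict(features)
print(f"predict:    {time.perf_counter() - t0:.3f}s")

# Then drill into the slow step to find the hotspots.
with cProfile.Profile() as prof:
    predict(features)
pstats.Stats(prof).sort_stats("cumulative").print_stats(20)
```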
[10:58:11] (CR) Klausman: readability: validate json input (1 comment) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/981715 (https://phabricator.wikimedia.org/T352834) (owner: Ilias Sarantopoulos)
[10:59:04] * isaranto afk lunch and errand
[11:04:51] (CR) Klausman: [C: +1] ci: add isort to pre-commit to order imports [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/981717 (owner: Ilias Sarantopoulos)
[11:46:10] * elukey lunch!
[12:09:41] * klausman lunch and errands
[12:38:05] * isaranto back quicker than I thought!
[13:12:53] (CR) Ilias Sarantopoulos: readability: validate json input (1 comment) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/981715 (https://phabricator.wikimedia.org/T352834) (owner: Ilias Sarantopoulos)
[13:19:05] (CR) Klausman: [C: +1] readability: validate json input (1 comment) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/981715 (https://phabricator.wikimedia.org/T352834) (owner: Ilias Sarantopoulos)
[14:00:54] (CR) Ilias Sarantopoulos: [C: +2] readability: validate json input [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/981715 (https://phabricator.wikimedia.org/T352834) (owner: Ilias Sarantopoulos)
[14:01:41] (Merged) jenkins-bot: readability: validate json input [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/981715 (https://phabricator.wikimedia.org/T352834) (owner: Ilias Sarantopoulos)
[14:03:28] (PS10) Ilias Sarantopoulos: nllb: add cpu optimized version [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/980015 (https://phabricator.wikimedia.org/T351740)
[14:05:18] (PS11) Ilias Sarantopoulos: nllb: add cpu optimized version [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/980015 (https://phabricator.wikimedia.org/T351740)
[14:05:58] I opened up the nllb patch for reviews https://gerrit.wikimedia.org/r/c/machinelearning/liftwing/inference-services/+/980015
[14:15:18] (PS12) Ilias Sarantopoulos: nllb: add cpu optimized version [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/980015 (https://phabricator.wikimedia.org/T351740)
[14:16:03] you can w8 if you plan to review as I'm testing something else to check latencies and I'll write back here
[14:24:29] ok all good, it is ready for review. I'm also reporting on the task
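For readers following the nllb patch: the CTranslate2 approach follows roughly the pattern below, a sketch based on the public CTranslate2/NLLB examples. The converted-model path and the language codes are placeholders, not the patch's actual configuration:

```python
import ctranslate2
import transformers

# Assumes the NLLB checkpoint was converted with
# ct2-transformers-converter; int8 quantization keeps it CPU-friendly.
translator = ctranslate2.Translator(
    "nllb-200-distilled-600M-ct2",  # placeholder path
    device="cpu",
    compute_type="int8",
)
tokenizer = transformers.AutoTokenizer.from_pretrained(
    "facebook/nllb-200-distilled-600M", src_lang="eng_Latn"
)

text = "CTranslate2 runs quantized translation models efficiently on CPU."
source = tokenizer.convert_ids_to_tokens(tokenizer.encode(text))
# The target language code is passed as a decoding prefix.
results = translator.translate_batch([source], target_prefix=[["ell_Grek"]])
target = results[0].hypotheses[0][1:]  # drop the language-code token
print(tokenizer.decode(tokenizer.convert_tokens_to_ids(target)))
```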
[14:25:00] (CR) Klausman: nllb: add cpu optimized version (1 comment) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/980015 (https://phabricator.wikimedia.org/T351740) (owner: Ilias Sarantopoulos)
[14:27:00] (PS13) Ilias Sarantopoulos: nllb: add cpu optimized version [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/980015 (https://phabricator.wikimedia.org/T351740)
[14:27:10] (CR) Ilias Sarantopoulos: nllb: add cpu optimized version (1 comment) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/980015 (https://phabricator.wikimedia.org/T351740) (owner: Ilias Sarantopoulos)
[14:32:21] (ErrorBudgetBurn) firing: - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn
[14:38:21] Sorry all, I’m sick and out today
[14:38:36] It isn’t too bad, I should be back tomorrow
[14:38:46] (Famous last words)
[14:42:27] take it easy, Chris <3
[14:45:55] o/ Chris, hope you feel better soon <3
[14:50:34] isaranto: one qs - instead of https://gerrit.wikimedia.org/r/c/machinelearning/liftwing/inference-services/+/981715, should we simply inject content-type application/json in the api-gateway?
[14:51:16] (CR) Klausman: nllb: add cpu optimized version (1 comment) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/980015 (https://phabricator.wikimedia.org/T351740) (owner: Ilias Sarantopoulos)
[14:51:30] that would be better. but then what about internal requests?
[14:51:31] (an idea to avoid changing everything)
[14:51:58] in theory folks should know to use the content-type, most of the issues come from outside
[14:52:12] but it may be confusing
[14:52:20] for readability it is ok since it isn't being used, but for other models like revertrisk we did it this way to avoid issues
[14:52:40] I'd rather signal to external users how to do it right, unless not setting the C-T: header is really pervasive.
[14:56:39] Machine-Learning-Team, Patch-For-Review: Deploy ctranslate2 version of nllb-200 - https://phabricator.wikimedia.org/T351740 (isarantopoulos) I have added ctranslate2 support to the llm model server using the code sample found in the [[ https://forum.opennmt.net/t/nllb-200-with-ctranslate2/5090 | Op...
[15:01:56] elukey: we have already added it to other model servers, that's why I added it to readability as well. Since we don't accept other content types it makes sense to validate json input. if we find an alternative though it is easy to change
[15:05:41] +1
[15:32:36] (ErrorBudgetBurn) resolved: - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn
[15:52:06] (PS14) Ilias Sarantopoulos: nllb: add cpu optimized version [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/980015 (https://phabricator.wikimedia.org/T351740)
[15:52:21] (CR) Ilias Sarantopoulos: nllb: add cpu optimized version (1 comment) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/980015 (https://phabricator.wikimedia.org/T351740) (owner: Ilias Sarantopoulos)
[16:00:00] (CR) Klausman: [C: +1] nllb: add cpu optimized version (1 comment) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/980015 (https://phabricator.wikimedia.org/T351740) (owner: Ilias Sarantopoulos)
[16:00:15] * elukey bbiab!
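The json input validation discussed above (readability, outlink, revertrisk) is roughly this shape; a minimal, framework-agnostic sketch, where the helper and error names are assumptions rather than the actual inference-services code:

```python
import json

class InvalidJSONInput(Exception):
    """Raised when the request body is not valid JSON."""

def validate_json_input(raw_body: bytes) -> dict:
    """Parse the raw request body, rejecting anything that is not a
    JSON object, regardless of the Content-Type header the client sent."""
    try:
        payload = json.loads(raw_body)
    except (json.JSONDecodeError, UnicodeDecodeError) as exc:
        raise InvalidJSONInput(f"Request body is not valid JSON: {exc}")
    if not isinstance(payload, dict):
        raise InvalidJSONInput("Expected a JSON object as input")
    return payload
```

In the model servers the equivalent check would translate this into a 400-style error returned to the client instead of a raw exception.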
[16:02:07] (PS3) Ilias Sarantopoulos: ci: refactor imports with isort [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/981718
[16:13:07] (PS3) Ilias Sarantopoulos: ci: add isort to pre-commit to order imports [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/981717
[16:14:13] Tobias thanks for the review and the valuable input!
[16:16:56] Going afk folks, have a nice evening!
[16:32:47] \o
[16:33:07] taking a short break before the SRE staff meeting
[17:08:47] afaics libgomp (the openmp implementation that we use via pytorch) picks up the number of threads from the cpus listed, but I didn't see a mention of cgroups :(
[17:10:09] AIUI, the right syscall to use is nproc, which would return _available_ CPUs. I wonder why libgomp doesn't.
[17:11:47] er, get_nprocs(3)
[17:11:53] nproc is the cmdline tool
[17:12:11] klausman: I think nproc is not right, it checks the host's cpus
[17:12:31] just checked with docker
[17:12:47] I thought nproc respects cgroups, oh well
[17:13:36] https://github.com/pylint-dev/pylint/issues/5809 Oh look...
[17:21:03] we can probably add a function in the shared code that sets OMP_NUM_THREADS based on the cgroup v2 cpu.max file for example
[17:21:25] so for things that use pytorch we'll be covered
[17:23:44] I'll file a proposal tomorrow
[17:36:08] did we ever decide which machine the MI100 will go into?
[17:36:51] the idea was to use one of the ml-staging nodes
[17:37:22] the workers are a bit crowded, I am wondering if we can even drain
[17:38:12] hmm. I think even if this means some downtime, it isn't _too_ bad since it's only staging, but I'll try to keep it short.
[17:39:07] if we can drain, we can shut down one of the ml-staging workers so Papaul will be able to work on the node anytime
[17:39:16] rather than syncing with them
[17:39:38] if it becomes a problem for the team we can bring it online again
[17:39:53] I'll have a look at resources etc tomorrow, maybe drain right away to see what happens.
[17:41:15] (coordinating here, of course).
[17:41:36] sure
[17:45:43] have a nice rest of the day folks!
[17:47:45] \o
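As a footnote to the 17:21 idea of a shared helper that derives OMP_NUM_THREADS from the cgroup v2 CPU quota: a minimal sketch of what it could look like. The function name and fallback behaviour are assumptions, not the actual proposal:

```python
import os

def omp_threads_from_cgroup(default: int = 1) -> int:
    """Derive a thread count from the cgroup v2 CPU quota.

    /sys/fs/cgroup/cpu.max contains "<quota> <period>" in microseconds,
    or "max <period>" when the container is unthrottled.
    """
    try:
        quota, period = open("/sys/fs/cgroup/cpu.max").read().split()
    except (OSError, ValueError):
        return default  # not cgroup v2, or an unexpected format
    if quota == "max":
        return os.cpu_count() or default
    return max(1, int(quota) // int(period))

# Must run before torch (and thus libgomp) spins up its thread pool,
# i.e. before the first `import torch` in the process.
os.environ.setdefault("OMP_NUM_THREADS", str(omp_threads_from_cgroup()))
```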