[06:11:27] Good morning :D
[06:41:14] I added the env var OMP_NUM_THREADS for readability https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/985088
[06:43:48] Machine-Learning-Team: Deploy ctranslate2 version of nllb-200 - https://phabricator.wikimedia.org/T351740 (isarantopoulos) Open→Resolved
[06:44:05] Machine-Learning-Team: Deploy the recommendation-api-ng on LiftWing - https://phabricator.wikimedia.org/T347015 (isarantopoulos)
[06:44:08] Machine-Learning-Team: Create external endpoint for recommendation-api-ng hosted on LiftWing - https://phabricator.wikimedia.org/T347263 (isarantopoulos) Open→Resolved
[06:44:12] Machine-Learning-Team, ORES: Add deprecation warnings to ORES-related repositories on Github - https://phabricator.wikimedia.org/T349632 (isarantopoulos) Open→Resolved
[08:30:33] Lift-Wing, Machine-Learning-Team: Investigate increase p99 latencies in ml-serve-eqiad - https://phabricator.wikimedia.org/T352958 (isarantopoulos) I deployed the solution with the configuration that rewrites specific hosts but I still see some requests in [[ https://grafana.wikimedia.org/d/n3LJdTGIk/kse...
[08:40:00] mooorning :)
[08:42:16] (PS1) Ilias Sarantopoulos: revertrisk: log revid and lang in preprocess [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/985105 (https://phabricator.wikimedia.org/T352958)
[08:42:38] hi Aiko!
[08:44:34] readability uses transformers that depend on torch, maybe torch uses OpenMP
[08:44:45] hi Ilias :)
[08:45:51] spot on!
that should be it
[08:47:54] these types of models use 2 models - a transformers model for preprocessing the inputs and another one for the actual prediction
[08:49:46] same will happen with rr-multilingual
[08:56:50] yes
[09:01:24] isaranto: for logging revid and lang, we have something here https://gerrit.wikimedia.org/r/plugins/gitiles/machinelearning/liftwing/inference-services/+/refs/heads/main/revert_risk_model/model_server/base_model.py#102
[09:02:03] yes, but this is in case there is an error. If the request goes through but takes too long we have no clue
[09:03:06] in the kserve log, I saw a problematic case ERROR:root:An error has occurred while fetching info for revision: 399508 (lzh). Reason: Cannot connect to host zh-classical.wikipedia.org:443 ssl:default [Connection reset by peer]
[09:04:58] so we missed lzh -> zh-classical, and there may be more cases, but we would only know when one causes a problem
[09:09:12] yeah, I got your point. it'll help us debug other issues
[09:11:47] (CR) AikoChou: [C: +1] revertrisk: log revid and lang in preprocess [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/985105 (https://phabricator.wikimedia.org/T352958) (owner: Ilias Sarantopoulos)
[09:11:59] a good catch, we should add that as well
[09:12:26] I can create a script and score all wikis to check
[09:12:40] or we can also watch errors and fix them :)
[09:14:40] I think it's fine to watch errors and fix them.
it seems not to happen as much as before
[09:15:20] (CR) Kevin Bazira: [C: +1] revertrisk: log revid and lang in preprocess [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/985105 (https://phabricator.wikimedia.org/T352958) (owner: Ilias Sarantopoulos)
[09:27:07] Yeah I agree
[09:36:21] Machine-Learning-Team, Moderator-Tools-Team, Research, Temporary accounts, Trust and Safety Product Team: RevertRisk model readiness for temporary accounts - https://phabricator.wikimedia.org/T352839 (kostajh) >>! In T352839#9413084, @diego wrote: > Are you proposing to add the "user status"...
[09:54:45] (CR) Ilias Sarantopoulos: [C: +2] revertrisk: log revid and lang in preprocess [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/985105 (https://phabricator.wikimedia.org/T352958) (owner: Ilias Sarantopoulos)
[10:00:07] (Merged) jenkins-bot: revertrisk: log revid and lang in preprocess [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/985105 (https://phabricator.wikimedia.org/T352958) (owner: Ilias Sarantopoulos)
[10:17:25] when u have a minute https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/985114
[10:30:07] +1ed!
[10:35:38] (PS1) Ilias Sarantopoulos: revertrisk: add lzh redirect [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/985122 (https://phabricator.wikimedia.org/T352958)
[10:35:48] thanks!
[10:36:33] regarding the above, I'll check if there are other redirects we're missing and add them
[10:43:38] I deployed revertrisk!
[11:38:28] * isaranto afk lunch!
[12:46:50] * aiko lunch 2
[13:53:18] Morning all!
[13:54:43] o/ Chris!
[13:56:04] o/
[13:56:11] I need coffee
[14:16:25] aiko: I was checking the latest errors and I found lzh again
[14:16:32] so we can proceed with that one
[14:19:06] ack!
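A minimal sketch of the two ideas discussed above (log revid and lang for every request, and map lzh to its real host), assuming a simple lookup table. The names LANG_HOST_OVERRIDES, wiki_host and log_request are hypothetical and not taken from the inference-services repo; only the lzh -> zh-classical entry comes from this log, and any further entries would need to be verified.

```python
import logging

# Hypothetical override table: language code -> wiki subdomain.
# Only the lzh entry is taken from the discussion; others are unknown here.
LANG_HOST_OVERRIDES = {"lzh": "zh-classical"}


def wiki_host(lang: str) -> str:
    """Return the hostname to query for a language code, applying overrides."""
    subdomain = LANG_HOST_OVERRIDES.get(lang, lang)
    return f"{subdomain}.wikipedia.org"


def log_request(rev_id: int, lang: str) -> str:
    """Log revid and lang before any fetch, so slow requests can be traced too."""
    host = wiki_host(lang)
    logging.info("Received request for revision %s (%s), host %s", rev_id, lang, host)
    return host
```

Logging before the fetch rather than only in the error path is what makes requests that succeed slowly visible, which was the gap raised at 09:02:03.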
[14:19:19] (CR) AikoChou: [C: +1] revertrisk: add lzh redirect [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/985122 (https://phabricator.wikimedia.org/T352958) (owner: Ilias Sarantopoulos)
[14:43:33] (CR) Ilias Sarantopoulos: [C: +2] revertrisk: add lzh redirect [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/985122 (https://phabricator.wikimedia.org/T352958) (owner: Ilias Sarantopoulos)
[14:49:32] Machine-Learning-Team: Run load tests for the article-descriptions isvc - https://phabricator.wikimedia.org/T353952 (kevinbazira)
[15:00:37] I don't like that we have to make 2 patches for these types of changes (one in inference services repo and one in deployment charts). What I want to add is a configmap that we can attach to the pods that will read this config from the values.yaml and will save it into the file python/config.yaml. That way we can just override values from deployment-charts repo
[15:01:07] but this needs some work and testing so not today ¯\_(ツ)_/¯
[15:06:28] Machine-Learning-Team: Run load tests for the article-descriptions isvc - https://phabricator.wikimedia.org/T353952 (kevinbazira) I ran load tests using most languages supported by the model with 3 beams set based on T343123#9380779. All the inputs utilized for the request payload can be found in: P54507. Be...
[15:09:31] (CR) Ilias Sarantopoulos: [V: +2 C: +2] revertrisk: add lzh redirect [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/985122 (https://phabricator.wikimedia.org/T352958) (owner: Ilias Sarantopoulos)
[15:22:15] Maybe make a ticket for that and we can discuss after the new year?
[15:24:51] yep!
[15:26:19] on the redirects configuration patch: a new image build wasn't triggered because we need to also change CI triggers!
[15:26:27] I did so here -> https://gerrit.wikimedia.org/r/c/integration/config/+/985167
[15:59:20] (PS1) Kevin Bazira: test: add load test script and input for article-descriptions [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/985127 (https://phabricator.wikimedia.org/T353952)
[16:18:13] (CR) Ilias Sarantopoulos: [C: +1] "LGTM! Nice!" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/985127 (https://phabricator.wikimedia.org/T353952) (owner: Kevin Bazira)
[16:28:34] (CR) Kevin Bazira: [C: +2] "Thanks for the review :)" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/985127 (https://phabricator.wikimedia.org/T353952) (owner: Kevin Bazira)
[16:33:46] logging off folks, have a great time and cu soon <3
[16:41:11] Enjoy the holidays o/
[16:48:57] o/ happy holidays Ilias! see u next year :D
[16:55:20] Happy new year all!
[17:15:49] (CR) Kevin Bazira: [V: +2 C: +2] test: add load test script and input for article-descriptions [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/985127 (https://phabricator.wikimedia.org/T353952) (owner: Kevin Bazira)
[18:54:58] Machine-Learning-Team, Moderator-Tools-Team, Research, Temporary accounts, Trust and Safety Product Team: RevertRisk model readiness for temporary accounts - https://phabricator.wikimedia.org/T352839 (diego) Ok! I understand. Currently, Revert Risk uses several [[ https://gitlab.wikimedia.or...
[22:59:40] Machine-Learning-Team, artificial-intelligence: LLM that specializes in assisting Wikimedia/MediaWiki technical contributors - https://phabricator.wikimedia.org/T353974 (Novem_Linguae)