[02:24:46] Good morning! :)
[02:50:51] wait. what time is it?!!
[04:54:58] Machine-Learning-Team, Research: Upgrade xgboost in knowledge_integrity - https://phabricator.wikimedia.org/T350389 (fkaelin) @isarantopoulos the python 3.8 is merged, and here is the [[ https://gitlab.wikimedia.org/repos/research/knowledge_integrity/-/merge_requests/27 | MR ]] for the xgboost bump.
[04:59:15] aiko: o/ I was about to ask the same :)
[06:59:51] best exchange ever :D
[06:59:54] good morning :D
[07:57:43] morning! This way we have 24h coverage 😛
[09:18:11] (CR) Ilias Sarantopoulos: "Nice work! I added a couple of comments mainly around code readability and maintenance. Feel free to implement whatever you like from the " [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/970831 (https://phabricator.wikimedia.org/T343123) (owner: Kevin Bazira)
[10:10:29] isaranto: o/ one thing that we discussed on Tuesday when you were not online - I am not 100% sure what is the state of Kserve 0.11 + bullseye upgrade in production, I think that we are half way through its completion
[10:10:57] IIRC we didn't finish the work before the offsite, and it may be a problem if we have to do other rollouts
[10:11:29] (there were also performance concerns for xgboost-based models)
[10:11:44] do you have time to check the state of production?
[10:13:11] otherwise I can try to track down and see what is needed
[10:18:28] Machine-Learning-Team, artificial-intelligence, Bad-Words-Detection-System, revscoring: Add language support for Malay language (ms) - https://phabricator.wikimedia.org/T349968 (elukey) @Hakimi97 Hi! The ML team doesn't plan to support/expand revscoring-based models in the future, we and the Rese...
[10:22:34] elukey: you're right. I'll do the checks to report the status. iirc the servers we haven't done are the open tasks related to xgboost and catboost models.
I'll report back once I check
[10:23:33] (PS1) Elukey: Update KServe model servers from 0.11.1 to 0.11.2 [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/974944
[10:24:07] isaranto: we should figure out if we can stabilize perf regressions and then complete the rollout
[10:24:25] I also filed a change to upgrade to 0.11.2, it contains some CVE fixes :(
[10:24:32] should be a light upgrade though
[10:26:28] (CR) CI reject: [V: -1] Update KServe model servers from 0.11.1 to 0.11.2 [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/974944 (owner: Elukey)
[10:33:52] ahhh they didn't publish 0.11.2 on pypi
[10:34:37] sigh
[10:35:05] (CR) Elukey: "Pypi doesn't have the latest version yet, sigh." [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/974944 (owner: Elukey)
[10:36:40] also 0.12 is going to be out in dec: https://github.com/kserve/kserve/issues/3195
[10:37:04] at this point we can proceed with 0.11.1 and then upgrade to 0.11.2 later on
[10:37:06] One day, we'll catch up :)
[10:44:58] Machine-Learning-Team, artificial-intelligence, Bad-Words-Detection-System, revscoring: Add language support for Malay language (ms) - https://phabricator.wikimedia.org/T349968 (Hakimi97) >>! In T349968#9336577, @elukey wrote: > @Hakimi97 Hi! The ML team doesn't plan to support/expand revscoring-...
[10:51:58] Machine-Learning-Team, artificial-intelligence, Bad-Words-Detection-System, revscoring: Add language support for Malay language (ms) - https://phabricator.wikimedia.org/T349968 (elukey) @Hakimi97 in the following link you can find some examples: https://api.wikimedia.org/wiki/Lift_Wing_API/Refer...
[10:55:35] the GH action that publishes to pypi failed on the kserve repo https://github.com/kserve/kserve/actions/runs/6878455485/job/18708429594
[10:56:34] lovely..
[10:56:41] do you mind to open a github issue?
[11:01:15] on it!
[11:02:00] yuzisun should be aware as he's manually triggered the re-runs of the action
[11:02:11] super
[11:07:15] https://github.com/kserve/kserve/issues/3252
[11:09:16] <3
[11:12:32] (PS6) Kevin Bazira: article-descriptions: add article-descriptions model server [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/970831 (https://phabricator.wikimedia.org/T343123)
[11:22:47] (CR) Kevin Bazira: article-descriptions: add article-descriptions model server (9 comments) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/970831 (https://phabricator.wikimedia.org/T343123) (owner: Kevin Bazira)
[11:32:49] Machine-Learning-Team, Add-Link, Growth-Team (Sprint 3 (Growth Team)), User-notice: Deploy "add a link" to 15th round of wikis - https://phabricator.wikimedia.org/T308141 (Sgs) >>! In T308141#9335723, @Quiddity wrote: > Just to confirm for Tech News purposes: Is this releasing next week //even th...
[11:36:18] Machine-Learning-Team, Data-Engineering, Edit-Review-Improvements-Integrated-Filters, Growth-Team, and 2 others: Integration of Revert Risk Scores to Recent Changes as a filter - https://phabricator.wikimedia.org/T329071 (kostajh) There is active work on this in {T348298}
[11:46:33] * isaranto lunch!
[11:50:44] Machine-Learning-Team, observability: Istio recording rules for Pyrra - https://phabricator.wikimedia.org/T351390 (elukey)
[11:51:17] no joy with slos :( --^
[12:06:40] * elukey lunch
[12:58:50] Good morning all
[13:15:03] Machine-Learning-Team, artificial-intelligence, Bad-Words-Detection-System, revscoring: Add language support for Malay language (ms) - https://phabricator.wikimedia.org/T349968 (Hakimi97) @elukey I have tried using Python for both Revert Risk Language Agnostic and Multilingual Revert Risk models,...
[13:20:58] ^^^ that feels so good to see
[13:37:32] Machine-Learning-Team, Add-Link, Growth-Team (Sprint 3 (Growth Team)), User-notice: Deploy "add a link" to 15th round of wikis - https://phabricator.wikimedia.org/T308141 (Trizek-WMF) And (IIRC) when this code will be backported, activation is a config change, which is not impacted by the absence...
[13:51:37] Machine-Learning-Team: Test the kserve batcher for Revert Risk LA isvc - https://phabricator.wikimedia.org/T348536 (achou)
[14:20:23] Machine-Learning-Team: Test the kserve batcher for Revert Risk LA isvc - https://phabricator.wikimedia.org/T348536 (achou) Update: * When using a kserve batcher, the payload is expected to be `Request:{"instances": []}` and `Response:{"predictions": []}`. Therefore, I need to change the model server code to...
[14:45:22] I checked the production state regarding kserve 0.11. The board reflects the current status. Tasks referring to the kserve upgrade that are in blocked have all code-related changes set to updated, but the production apps have not been synced. To avoid an accidental sync I suggest we just roll back/revert to the previous image until we have new versions of xgboost/catboost. There is also the outlink task in the "Ready to go" column for
[14:45:23] which nothing has been done so it is not inconsistent
[15:00:34] yes yes I agree, let's get to a stable state
[15:01:22] there is also a mixup with the security updates for bullseye.
So I'll open a patch in inf services first and set kserve to 0.10 again so that everything will be consistent and we can follow up again after the xgboost merge
[15:10:08] (PS1) Ilias Sarantopoulos: revert kserve upgrades [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/975008 (https://phabricator.wikimedia.org/T347551)
[15:21:00] Machine-Learning-Team, observability: Istio recording rules for Pyrra - https://phabricator.wikimedia.org/T351390 (herron) One option that comes to mind is relabeling with something like labelkeep to ingest only the labels we want/need on the prometheus side. That'd let us cut down without modifying the...
[16:18:48] I created the patch to revert kserve upgrades for the aforementioned model servers
[16:18:50] (CR) AikoChou: [C: +1] revert kserve upgrades [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/975008 (https://phabricator.wikimedia.org/T347551) (owner: Ilias Sarantopoulos)
[16:43:53] (CR) Elukey: [C: +1] revert kserve upgrades [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/975008 (https://phabricator.wikimedia.org/T347551) (owner: Ilias Sarantopoulos)
[16:50:59] (CR) Ilias Sarantopoulos: [C: +2] revert kserve upgrades [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/975008 (https://phabricator.wikimedia.org/T347551) (owner: Ilias Sarantopoulos)
[16:51:50] going afk folks, will deploy the above patch first thing in the morning (hope not too early this time 😜). cu tomorrow!
[17:00:41] (Merged) jenkins-bot: revert kserve upgrades [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/975008 (https://phabricator.wikimedia.org/T347551) (owner: Ilias Sarantopoulos)
[17:01:20] (PS7) Kevin Bazira: article-descriptions: add article-descriptions model server [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/970831 (https://phabricator.wikimedia.org/T343123)
[17:11:45] (PS8) Kevin Bazira: article-descriptions: add article-descriptions model server [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/970831 (https://phabricator.wikimedia.org/T343123)
[17:13:47] kids dropped off at school!
[17:34:58] i heard y'all are working on an LLM. where can i learn more about it?
[18:54:41] lol hey ragesoss, right now we are working on the engineering side of hosting. NLLB-200 is hosted right now with GPUs, but it is only accessible via the internal API because we only have GPUs in eqiad and the api gateway can't target a specific datacenter
[18:55:18] But we already have budget for a good number of GPUs, they will go in both data centers and then we should be open for business so to speak
[18:56:47] If NLLB-200 was hosted, it means you could basically translate whatever you wanted between 200 languages
[18:57:22] s/hosted/put into production
[18:57:28] technically it is already hosted
[19:02:35] i'm especially interested in getting some intermediate things from it, like embeddings. not sure if it would be relevant for my project (an ML system to generate a list of articles in an arbitrary topic, without trying to replicate the not-super-flexible categorization we can get via WikiProjects or Categories), but maybe?
[19:03:38] and also, just seeing the code for how you put it together and trained it.
[21:52:12] Machine-Learning-Team, Research: Upgrade xgboost in knowledge_integrity - https://phabricator.wikimedia.org/T350389 (MunizaA) Knowledge Integrity [[ https://gitlab.wikimedia.org/repos/research/knowledge_integrity/-/releases/v0.5.0 | v0.5.0 ]] has been released which now depends on `xgboost` 2.x. Upgradin...
[21:54:26] Machine-Learning-Team: Increased latencies with Kserve 0.11.1 (cgroups v2) - https://phabricator.wikimedia.org/T349844 (MunizaA)
[21:54:28] Machine-Learning-Team, Research: Upgrade xgboost in knowledge_integrity - https://phabricator.wikimedia.org/T350389 (MunizaA) Open→Resolved