[03:22:40] (ErrorBudgetBurn) firing: - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn
[06:59:37] Good morning!
[07:22:40] (ErrorBudgetBurn) firing: - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn
[07:43:21] Machine-Learning-Team: Deploy ctranslate2 version of nllb-200 - https://phabricator.wikimedia.org/T351740 (isarantopoulos) The resources used by the pod have been increased to 4 cpus and a request with the above size takes ~5s while a request with the size of a sentence ~1s.
[08:05:16] hello folks :)
[08:05:34] I have removed the pyrra pilot but clearly some stuff is still registered
[08:09:32] z/10
[08:09:35] err :)
[08:20:03] isaranto: o/ I removed the 004 and 003 revisions for nllb-gpu in staging
[08:20:08] 005 is now up
[08:31:10] Hey, thanks!
[09:03:05] Machine-Learning-Team: Optimize response performance for the article-descriptions model-server - https://phabricator.wikimedia.org/T353127 (kevinbazira) @Isaac, bumping the RAM does not change much as shown in the request below that was run on 1 CPU and 8GB memory, it returns results similar to 1 CPU and 4GB...
[09:18:56] folks I am doing some tests with RR-agnostic in staging
[09:27:40] ok! slo related or sth else? (just asking out of curiosity)
[09:29:07] I am trying to figure out if we can specify stuff like "en.wikipedia.org" in http connections without the need of explicitly stating api-ro.discovery.wmnet
[09:29:25] I recall that at the time I didn't find a way, but I always promised to recheck
[09:29:31] it would simplify a lot of things for us
[09:32:26] because at the time we didn't use plain HTTP connections within our model server
[09:36:23] ack
[09:37:41] u mean making external requests from within the pod?
[09:40:06] basically we now have an istio virtual service that is able to read HTTP Host headers like "en.wikipedia.org", proxying them to api-ro.discovery.wmnet:443 via TLS
[09:40:16] but we don't use it, we specify api-ro directly
[09:43:24] ok
[09:46:37] I found a config that works but it takes 4 minutes in preprocess
[09:46:38] ahahahah
[09:46:42] not sure what's happening
[09:53:28] "I don't mind waiting"
[09:53:32] :D
[09:55:29] Machine-Learning-Team, Moderator-Tools-Team, Research, Temporary accounts, Trust and Safety Product Team: RevertRisk model readiness for temporary accounts - https://phabricator.wikimedia.org/T352839 (kostajh) I guess this may be more of a question for Research team, cc @diego
[10:24:40] morning!
[10:24:51] o/
[10:36:49] \o
[10:45:47] (PS1) Ilias Sarantopoulos: enable revertrisk [extensions/ORES] - https://gerrit.wikimedia.org/r/983681
[10:46:12] isaranto: sorry do you need revert risk in staging up and running to test stuff?
[10:46:15] forgot to ask
[10:47:02] no go ahead and do whatever! I'm deploying to patchdemo which can only contact API GW so I'm using the production one
[10:47:34] (PS2) Ilias Sarantopoulos: enable revertrisk [extensions/ORES] - https://gerrit.wikimedia.org/r/983681 (https://phabricator.wikimedia.org/T348298)
[10:48:40] Machine-Learning-Team, MediaWiki-extensions-ORES, Growth-Team, Wikipedia-Android-App-Backlog, and 2 others: Add revertrisk-language-agnostic to RecentChanges filters - https://phabricator.wikimedia.org/T348298 (PatchDemoBot) Test wiki on [[ https://patchdemo.wmflabs.org | Patch demo ]] by ISaran...
[10:48:59] Machine-Learning-Team, MediaWiki-extensions-ORES, MW-1.41-notes (1.41.0-wmf.22; 2023-08-15), Patch-For-Review: Move backend of ORES MediaWiki extension to Lift Wing - https://phabricator.wikimedia.org/T319170 (PatchDemoBot) Test wiki on [[ https://patchdemo.wmflabs.org | Patch demo ]] by ISarant...
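For readers following the Host header discussion from 09:29 to 09:40: a minimal sketch of what "specifying api-ro directly" can look like from a client's point of view. The helper name `build_mwapi_request` and the example query parameters are hypothetical; the only details taken from the chat are the internal endpoint and the "en.wikipedia.org" Host header. This is not the mwapi or KServe code, and it only builds the request without sending it.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Internal read-only API endpoint mentioned in the chat.
INTERNAL_ENDPOINT = "https://api-ro.discovery.wmnet"


def build_mwapi_request(wiki_host: str, params: dict) -> Request:
    """Build (but do not send) a MediaWiki API request.

    The connection targets the internal endpoint, while the Host
    header tells the backend which wiki the request is meant for.
    Hypothetical helper, for illustration only.
    """
    query = urlencode({**params, "format": "json"})
    url = f"{INTERNAL_ENDPOINT}/w/api.php?{query}"
    return Request(url, headers={"Host": wiki_host})


req = build_mwapi_request("en.wikipedia.org", {"action": "query", "meta": "siteinfo"})
print(req.full_url)            # URL points at the internal endpoint
print(req.get_header("Host"))  # Host header carries the public wiki name
```

The idea being explored in the chat is the reverse: the code would simply name "en.wikipedia.org" and the mesh would route it to the internal endpoint, so a helper like this would no longer need to know about api-ro at all.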
[10:49:13] Machine-Learning-Team, MW-1.41-notes (1.41.0-wmf.25; 2023-09-05): fiwiki RC filters classify all edits as 'very likely bad faith' - https://phabricator.wikimedia.org/T343308 (PatchDemoBot) Test wiki on [[ https://patchdemo.wmflabs.org | Patch demo ]] by ISarantopoulos-WMF using patch(es) linked to this...
[10:50:30] Machine-Learning-Team, MediaWiki-extensions-ORES, Growth-Team, Wikipedia-Android-App-Backlog, and 2 others: Add revertrisk-language-agnostic to RecentChanges filters - https://phabricator.wikimedia.org/T348298 (PatchDemoBot) Test wiki **created** on [[ https://patchdemo.wmflabs.org | Patch demo ]...
[11:13:05] ok so I found some interesting details
[11:13:16] the short answer is that we cannot easily do it now sigh
[11:13:34] it would require some invasive changes
[11:14:21] isaranto: I started checking this to help with the http redirect issue for revert risk, I noticed that the redirection happens in mwapi async right? Not super easy to fix
[11:22:41] (ErrorBudgetBurn) firing: - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn
[11:23:44] Machine-Learning-Team: Improve Istio's mesh traffic transparent proxy capabilities for external domains accessed by Lift Wing - https://phabricator.wikimedia.org/T353622 (elukey)
[11:24:21] yes that's what I remember
[11:24:53] okok then it is more difficult than anticipated
[11:25:04] we'd need to operate in mwapi's code
[11:27:55] as stop-gap we can go ahead with the map, sorry I didn't get it was that deep into our code
[11:29:37] unless there is a trick to make it work on our side
[11:29:49] but it is KI that calls mwapi
[11:29:54] I'll check a bit later today
[11:36:46] Machine-Learning-Team: Improve Istio's mesh traffic transparent proxy capabilities for external domains accessed by Lift Wing - https://phabricator.wikimedia.org/T353622 (elukey) A first step/test could be to modify the KServe's storage initializer Docker image with uid 1337, deploy it to staging and see
if...
[11:44:55] I mean I'll check if it's doable and I can open a patch to KI
[11:47:55] I'm trying to figure out the issue with RR in ores extension without success so far
[12:01:14] Goood morning all from a rainy SF
[12:09:04] morning!
[12:12:02] * elukey lunch!
[12:13:37] Good morning Chris!
[12:19:22] * aiko lunch 2
[12:27:50] * isaranto lunch 3
[13:37:53] What is more difficult than we anticipated?
[13:40:42] Machine-Learning-Team: Optimize response performance for the article-descriptions model-server - https://phabricator.wikimedia.org/T353127 (kevinbazira) I have explored dynamic quantization on the ML sandbox and this almost halved the response time we had in T353127#9406015: | **CPUs** | **Response Time** |...
[13:40:46] something to do with how the database table is initialized. I'm running it locally now but will ask for help if I don't figure it out
[13:42:49] isaranto: o/
[13:42:50] I have found a way to do quantization in the article-descriptions model-server and the 8CPU response time has been brought down from 0m3.707s to 0m1.859s: https://phabricator.wikimedia.org/T353127#9412764
[13:43:40] kevinbazira: nice work! 👏
[13:44:49] did you use https://pytorch.org/docs/stable/generated/torch.ao.quantization.quantize_dynamic.html or something else?
[13:45:36] yes, that's exactly what I used.
[14:10:58] Awesome! Nice Kevin
[14:21:20] Machine-Learning-Team: Optimize response performance for the article-descriptions model-server - https://phabricator.wikimedia.org/T353127 (Isaac) @kevinbazira thanks for checking on the RAM and these other experiments! > bumping the RAM does not change much Bummer but in some ways I'm glad the RAM usage is...
[15:01:29] Machine-Learning-Team, Moderator-Tools-Team, Research, Temporary accounts, Trust and Safety Product Team: RevertRisk model readiness for temporary accounts - https://phabricator.wikimedia.org/T352839 (diego) Hi @kostajh , I'm not sure if I'm understanding the question.
Are you proposing to ad...
[15:21:25] Machine-Learning-Team: Optimize response performance for the article-descriptions model-server - https://phabricator.wikimedia.org/T353127 (elukey) > It looks like you were able to get it to the ~2sec range too which is great to see and brings us to parity with the Cloud VPS instance at least. I am very cu...
[15:21:32] kevinbazira: o/ --^
[15:22:41] (ErrorBudgetBurn) firing: - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn
[15:23:19] One thing that we may need to set in our guidelines is to agree on what is the testing set, it looks like your work is way closer than what Research currently run in VPS
[15:23:31] *to what
[15:24:05] I manually bumped article-descriptions' cpus in staging to match the 8 assigned to the cloud vps instance
[15:24:26] I don't see the sub-2s performance from the cloud VPS API though
[15:34:06] elukey: o/
[15:36:31] Lift-Wing, Machine-Learning-Team, Patch-For-Review: Investigate increase p99 latencies in ml-serve-eqiad - https://phabricator.wikimedia.org/T352958 (elukey) I opened T353622 to track the work to have a more flexible Istio mesh setup (like not having to state api-ro.discovery.wmnet in our code explic...
[15:39:47] yes, I agree. a step in our process should be to benchmark with a defined existing (cloudVPS or any other) prototype API endpoint request vs LiftWing model-servers to conclude that parity has been met.
[15:43:33] definitely, otherwise we keep chasing performance goals that are not met elsewhere..
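The parity check proposed at 15:39 could be sketched roughly as below. Everything here is an assumption made for illustration: the function names, the use of the median over a handful of runs, and the 10% tolerance are not an agreed team process. The two timings in the usage lines come from the chat (3.707s before quantization, 1.859s after); the 2.0s baseline stands in for the "~2sec range" quoted from the ticket.

```python
import statistics
import time
from typing import Callable


def median_latency(call: Callable[[], object], runs: int = 5) -> float:
    """Time `call` several times and return the median duration in seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        call()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def has_parity(candidate_s: float, baseline_s: float, tolerance: float = 0.10) -> bool:
    """True if the candidate is no slower than the baseline plus a tolerance.

    The 10% default tolerance is an arbitrary placeholder.
    """
    return candidate_s <= baseline_s * (1 + tolerance)


# Illustrative comparison against an assumed ~2s baseline.
print(has_parity(1.859, 2.0))  # quantized, 8 CPUs
print(has_parity(3.707, 2.0))  # before quantization, 8 CPUs
```

In practice `call` would wrap the same fixed request against both the prototype endpoint and the Lift Wing model-server, which is why agreeing on the testing set first matters.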
[15:44:23] it would be also nice to figure out a good way to push back after some point, it feels that we should provide some help in optimizing but if a model doesn't perform as expected it may need more work from the requestor
[16:00:28] really interesting https://arxiv.org/pdf/2312.09993.pdf
[16:00:52] chrisalbon: this is the second time I see the Leonardo HPC mentioned, the first time was for the Mistral 7B model
[16:01:13] they have a ton of Nvidia GPUs deployed and anybody can apply to use it
[16:01:39] https://leonardo-supercomputer.cineca.eu/hpc-system/
[16:02:08] innnnnteresting
[16:02:23] I can reach out and see, good find
[16:02:23] I am not sure if in the future we'll want to fine-tune an LLM, but it could be something to keep in mind
[16:02:37] the super computer is in Bologna btw :)
[16:02:52] I think 100% we will want to fine tune an LLM
[16:02:58] many LLMS
[16:16:33] Machine-Learning-Team, MediaWiki-extensions-ORES, Growth-Team, Wikipedia-Android-App-Backlog, and 2 others: Add revertrisk-language-agnostic to RecentChanges filters - https://phabricator.wikimedia.org/T348298 (isarantopoulos) Turns out the ores extension was long broken on the beta cluster. I ma...
[16:18:11] elukey, ottomata: o/ mw config for mediawiki.page_prediction_change.rc0 is deployed. can anyone help with restarting eventgate main's pods?
[16:20:31] Machine-Learning-Team, MediaWiki-extensions-ORES, Growth-Team, Wikipedia-Android-App-Backlog, and 2 others: Add revertrisk-language-agnostic to RecentChanges filters - https://phabricator.wikimedia.org/T348298 (isarantopoulos) I noticed the order of the filters isn't as I expected it to be (Rever...
[16:21:41] Machine-Learning-Team, revscoring: tag being wrongly counted as a ref tag - https://phabricator.wikimedia.org/T353661 (Gabinaluz)
[16:22:23] aiko: sure! lemme check
[16:28:26] aiko: done!
[16:32:59] thanks!!!
:)
[16:36:11] Machine-Learning-Team, ORES: Add deprecation warnings to ORES-related repositories on Github - https://phabricator.wikimedia.org/T349632 (isarantopoulos) The PR for articlequality has been added as well https://github.com/wikimedia/articlequality/pull/176
[16:39:42] o/ Whenever someone has time I'll need a review to merge the deprecation warning on GH https://github.com/wikimedia/articlequality/pull/176
[16:47:07] Machine-Learning-Team, artificial-intelligence, revscoring: tag being wrongly counted as a ref tag - https://phabricator.wikimedia.org/T353661 (isarantopoulos) Hi @Gabinaluz! Thanks for spotting the issue above! The Machine Learning team is offering limited support for these models. This...
[16:52:25] isaranto: ready to go
[16:53:12] grazie signore
[16:53:27] haha, I bet it sounds really silly being that formal
[17:20:08] Machine-Learning-Team: Optimize response performance for the article-descriptions model-server - https://phabricator.wikimedia.org/T353127 (isarantopoulos) @Isaac Ctranslate2 supports specific models out of the box (nllb is one of them). Although mBART is supported and I was able to convert it, there are a l...
[17:22:55] isaranto: yes :D
[17:25:21] logging off folks, have a nice rest of day/evening!
[17:31:13] o/
[17:31:35] night isaranto!
[17:31:43] going afk in a bit as well o/
[17:34:03] aiko: elukey fwiw, you can also use a different eventgate if you prefer while you dev! eventgate-analytics-external does not require restarts, and messages only go to kafka jumbo (not kafka main)
[17:34:22] just set destination_event_service accordingly
[17:34:54] ack!
[18:32:01] Machine-Learning-Team, MediaWiki-extensions-ORES, Growth-Team, Wikipedia-Android-App-Backlog, and 2 others: Add revertrisk-language-agnostic to RecentChanges filters - https://phabricator.wikimedia.org/T348298 (Ladsgroup) Just put it at the top of the array in `OresModels`, that should be enoughTM
[19:22:41] (ErrorBudgetBurn) firing: - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn
[19:23:37] Every time I see these I have a little heart attack
[20:56:41] Machine-Learning-Team: Optimize response performance for the article-descriptions model-server - https://phabricator.wikimedia.org/T353127 (Isaac) > Ctranslate2 supports specific models out of the box (nllb is one of them). Although mBART is supported and I was able to convert it, there are a lot of custom f...
[23:22:41] (ErrorBudgetBurn) firing: - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn