[06:59:40] Morning folks!
[07:52:09] 10Machine-Learning-Team: Upgrade model servers to kserve 0.11.2 - https://phabricator.wikimedia.org/T351633 (10isarantopoulos) All revscoring models have been deployed on ml-staging. While running some load tests on ml-staging for enwiki-goodfaith I noticed some increased latencies compared to older load tests....
[07:52:20] hey! I'll be afk for ~1h
[07:53:07] I ran some load tests and witnessed the above. I want to verify whether there is an issue before proceeding further (it may be because of different inputs)
[07:54:57] * isaranto afk be back in 1h
[08:17:17] (03PS1) 10AikoChou: revert-risk: add batch_model.py and USE_BATCHER env var [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/977135 (https://phabricator.wikimedia.org/T348536)
[09:15:18] hello folks
[09:24:44] o/
[09:26:22] elukey: I saw some increased latencies in ml-staging for revscoring --^
[09:26:41] investigating whether this is an issue or it was just because of different input
[09:27:21] isaranto: o/ so from 0.11.1 to 0.11.2?
[09:27:48] the CVE that we are fixing is related to HTTP/2, so some webserver-related go packages changed
[09:27:57] but in theory we shouldn't observe anything impactful
[09:27:59] hopefully
[09:32:26] 10Machine-Learning-Team: Upgrade model servers to kserve 0.11.2 - https://phabricator.wikimedia.org/T351633 (10isarantopoulos) Running the same test on production gave the following results, so it doesn't seem to be such a big difference (although there is some) ` wrk -c 4 -t 2 --timeout 2s -s revscoring.lua http...
[09:34:53] yes, from 0.11.1 to 0.11.2.
[09:34:53] I have some results saved from old load testing; perhaps they were using different revids. I'm digging in Phabricator a bit to verify
[09:35:08] when I ran the same test on prod, results were more similar to staging
[09:41:17] I'm going to finish the other stuff that I'm supposed to do today and then jump on this again, otherwise I think it may take the rest of my day (investigating, running load tests etc)
[09:42:00] elukey: but we can see if we can deploy the langid model in order to check the SLO dashboards. would that work?
[09:42:33] isaranto: I have only added the revscoring dashboards for the pilot, but I can wait, no problem
[09:42:49] ok. I was about to ask that :)
[09:47:55] Morning!
[09:48:15] elukey: I'll take care of the art-desc push (and possible RAM increase)
[09:51:28] ack!
[09:52:11] check also the capacity left in the staging workers, I think that we don't have a lot
[09:52:15] unless we clean up a bit
[09:53:06] Ack.
[09:53:23] As predicted, 4Gi was not enough. Judging from the numbers I saw last time, trying with 12Gi
[09:53:30] (kubectl edit)
[09:56:22] Ok, with 12Gi it's now running. Looking at the graphs, maybe 10Gi would have been enough, but I also don't want to cut it too close.
[09:56:38] yes yes, makes sense. I am curious to know if it is only needed for loading or not. Requests could be kept lower in that case, with a higher limit
[09:56:53] (plus, we don't know yet how much RAM it uses during operation)
[09:57:11] (we'd also need to follow up with Isaac on why we need such an amount of RAM)
[09:57:40] kevinbazira: article-descriptions runs in staging/experimental now, feel free to test whether it works correctly
[09:58:12] It still has some RX network usage (~10Mi/s), not sure what that is about
[09:59:31] klausman: thank you. let me check and also communicate to Isaac and Seddon to test this isvc.
[09:59:32] Probably downloading from S3, now that I think of it
[10:05:28] kevinbazira: before pinging Isaac and Seddon we should run some basic tests and report what the current performance penalties are, so they know what to work on etc..
[10:05:51] sure sure
[10:05:52] otherwise I think that they will only check if the results are consistent
[10:05:54] okok
[10:05:59] I think I know what the issue is. If it is what I'm thinking, it is similar to what we had with other LLMs:
[10:05:59] During model load a copy of the model is created, so we require 2x memory usage
[10:06:32] using the HF memory calculator https://huggingface.co/spaces/hf-accelerate/model-memory-usage I added this on the model discussion https://huggingface.co/facebook/mbart-large-cc25/discussions/4
[10:08:25] kevinbazira: try to add low_cpu_mem_usage=True on from_pretrained as discussed here -> https://huggingface.co/docs/transformers/main_classes/model#large-model-loading
[10:09:23] we went through the same process with the llm model servers https://gerrit.wikimedia.org/r/plugins/gitiles/machinelearning/liftwing/inference-services/+/refs/heads/main/llm/model-server/model.py#45
[10:10:00] isaranto: thanks, I'll add the low_cpu_mem_usage parameter.
[10:10:19] additionally, iirc you may need to install accelerate as a requirement. Let me know if you need more information or have any further questions on that
[10:11:10] the model seems to be close to 5GB by itself + another 600MB for the bert model we use
[10:19:02] I think in that case, 8Gi might be fine as a base size to experiment with?
[10:21:36] the memory metrics in grafana should tell us what we'd need
[10:24:39] aye
[10:25:00] I just meant for the update patch to include the above changes for testing
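A minimal sketch of the loading pattern suggested above, following the Hugging Face `from_pretrained` docs linked in the chat; the checkpoint name is taken from the model discussion above, and `accelerate` needs to be installed for `low_cpu_mem_usage` to take effect:

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

MODEL_NAME = "facebook/mbart-large-cc25"

# With low_cpu_mem_usage=True the model is first created with empty
# weights and the checkpoint tensors are then loaded into it in place,
# avoiding the temporary second copy that roughly doubles peak RAM
# during a plain load (the 2x effect described above).
model = AutoModelForSeq2SeqLM.from_pretrained(
    MODEL_NAME,
    low_cpu_mem_usage=True,
)
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
```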
[11:07:41] ok, I ran the command below and the article-descriptions model-server is returning an error. are containers able to access wikidata on LiftWing or do we have to explicitly set this the way we did for the rec-api?
```
$ time curl "https://inference-staging.svc.codfw.wmnet:30443/v1/models/article-descriptions:predict" -X POST -d '{"lang": "en", "title": "Clandonald", "num_beams": 2}' -H "Host: article-descriptions.experimental.wikimedia.org" -H "Content-Type: application/json" --http1.1
{"error":"An error occurred while fetching info for title from the Wikidata API, please contact the ML-Team if the issue persists."}
real 2m10.064s
user 0m0.017s
sys 0m0.000s
```
[11:10:45] ahem, we discussed this yesterday during the hack time :)
[11:17:05] ah yes, external calls
[11:19:24] Mh, but the istio-upgrade-to-https stuff is not isvc-specific, is it?
[11:20:24] exactly, it is mesh wide
[11:20:35] And wikidata is in helmfile.d/admin_ng/values/ml-serve.yaml
[11:21:04] it is in the virtual service's host headers
[11:21:14] but the service allowed is api-ro.discovery.wmnet
[11:21:41] kevinbazira: does the isvc contact localhost and set the right Host: header?
[11:23:01] It should have the right WIKI_URL according to helmfile.d/ml-services/experimental/values-ml-staging-codfw.yaml
[11:23:37] (the isvcs don't contact localhost:port, those are the serviceops-like services)
[11:23:45] sorry, my bad.
[11:23:47] (like ores-legacy and rec-api-ng)
[11:23:57] yes yes, I am just adding suggestions here and there :)
[11:24:13] elukey: it uses `https://wikidata.org` here: https://github.com/wikimedia/machinelearning-liftwing-inference-services/blob/main/article-descriptions/model-server/model.py#L187
[11:24:21] as I said, we need to work on real use cases to have these things resonate properly
[11:25:08] ok so this needs another bit of information
[11:25:18] in theory, the istio mesh should be able to transparently handle this use case
[11:25:41] so you use wikidata.org directly, and behind the scenes the proxy contacts api-ro
[11:25:49] but I was never able to make it work
[11:26:14] what works is setting api-ro.discovery.wmnet and wikidata.org as host header
[11:26:18] at least now
[11:26:34] ok, let me do that
[11:26:38] From the logs:
[11:26:40] we could try to experiment to add a specific rule for wikidata
[11:26:40] mwapi.errors.ConnectionError: Cannot connect to host wikidata.org:443 ssl:default [Connection reset by peer]
[11:27:39] yep this makes sense, since the service entries in ml-serve.yaml (the services allowed to be contacted by the mesh) do not list wikidata.org
[11:27:56] but only api-ro.discovery.wmnet
[11:28:24] the virtual service (basically the proxy config for envoy) lists wikidata among the host headers that are recognized when calling api-ro
[11:28:29] ack. My main worry with experimenting with being able to contact wikidata "directly" is that we might break the existing uses of it in the revscoring isvcs
[11:28:49] in theory no, we don't use wikidata.org directly anywhere
[11:29:16] So we would allow both: wikidata via api-ro, and "directly"
[11:30:10] In essence, another virtual service like api-ro and eventgate, which only allows the wikidata host header patterns, and then wire it up like the other two
[11:30:35] we'd also need another destination rule and service entry
[11:30:46] yes, that's what I mean by "wire up"
[11:31:09] The network-level (IP level) stuff might already be covered
[11:32:51] it should, yes, since envoy should contact api-ro.discovery.wmnet in any case
[11:33:27] at the time (a long time ago) I recall that I struggled with it, so I ended up using api-ro + host header
[11:33:44] but we could try to experiment now and see if it works
[11:33:54] I mean, it is somewhat simpler config-wise, but more complex on the isvc code side
[11:33:54] then we eval what's best and choose
[11:34:50] yep yep, it is always a trade-off :)
[11:35:00] anyway, I am available as a consultant
[11:35:05] You can't get rid of complexity, just hide it somewhere else :)
[11:36:50] I'd be willing to give the wikidata-direct approach a try, but maybe not on a Friday
[11:41:29] klausman: we have an ml-staging-codfw-specific values.yaml, so we could copy/modify the net_istio bits in there
[11:41:48] Mh, good point
[11:41:50] or create them manually via kubectl apply, for example
[11:42:09] this is the workflow to follow independently of the day of the week
[11:44:01] I'm always a bit hesitant to work on the open heart, as it were
[11:45:01] but I gotta run to my doc's appt (and later lunch), bbiab
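To make the api-ro + host-header pattern above concrete, here is a hypothetical sketch (the API path and parameters are illustrative, not taken from the isvc code): the request is sent to the internal read-only endpoint, and the Host header tells the mesh, and ultimately MediaWiki, that the logical destination is wikidata.org.

```python
import requests

# Hypothetical sketch: inside the mesh only api-ro.discovery.wmnet is an
# allowed destination, and its virtual service recognizes wikidata.org
# among the host headers, so the call below gets routed to the MW API.
resp = requests.get(
    "https://api-ro.discovery.wmnet/w/api.php",
    headers={"Host": "wikidata.org"},
    params={"action": "query", "meta": "siteinfo", "format": "json"},
    timeout=10,
)
resp.raise_for_status()
print(resp.json())
```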
[11:51:09] 10Machine-Learning-Team: Upgrade model servers to kserve 0.11.2 - https://phabricator.wikimedia.org/T351633 (10isarantopoulos) I see a drop in performance (not as big as the one we experienced with other servers). This is for articlequality. On top we have the new version in staging and in the bottom the current...
[11:51:29] elukey: I'm holding off production deployment till Monday, if that's ok with you, until I figure out whether there is an issue and what it is
[11:52:51] I'm also thinking that it would be a good idea to commit load test results in the inference-services repo in separate markdown files for each service. That way we will have a single source of truth for the latencies (recorded alongside specific inputs) and know what to test
[11:53:15] pretty much like a runbook: run XYZ commands, get results etc
[11:53:29] I'll add a task and we can discuss it if needed
[11:53:35] * isaranto going afk for lunch
[12:39:07] * elukey lunch!
[13:18:47] (03PS1) 10Kevin Bazira: article-descriptions: update wikidata host header in model-server [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/976965 (https://phabricator.wikimedia.org/T343123)
[13:21:58] (03PS1) 10Kevin Bazira: article-descriptions: update wiki host headers in model-server [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/977226 (https://phabricator.wikimedia.org/T343123)
[13:56:23] folks, I created https://wikitech.wikimedia.org/wiki/Analytics/Cluster/AMD_GPU#Do_we_have_Nvidia_GPUs_available?
[13:56:48] sorry, renamed to https://wikitech.wikimedia.org/wiki/Analytics/Cluster/AMD_GPU#Do_we_have_Nvidia_GPUs?
[13:57:06] hopefully it summarizes what we discussed, and it should be a baseline for further improvements
[14:04:40] nicely put. Thanks for this. It puts in writing what we have been discussing here and there and it is a nice reference to point to (and reassess if needed)
[14:07:21] (I am adding a few more things)
[14:13:40] done :)
[14:29:42] 10Machine-Learning-Team, 10serviceops: Bump istio Docker images to Bookworm - https://phabricator.wikimedia.org/T351933 (10elukey)
[14:44:24] bookworm! 😅
[14:49:44] 10Machine-Learning-Team, 10serviceops: Bump istio Docker images to Bookworm - https://phabricator.wikimedia.org/T351933 (10elukey) Tried to build with golang 1.21 and got: ` /go/pkg/mod/github.com/lucas-clemente/quic-go@v0.28.0/internal/qtls/go120.go:6:13: cannot use "The version of quic-go you're using can't...
[14:54:50] isaranto: sigh --^
[14:56:12] I'm pretty sure any bookworm upgrade is going to be exciting!
[14:56:41] just checked, and the python version is 3.11 over there...
[14:56:58] that is great, it should give us some benefit
[14:57:00] in theory
[14:57:28] yeah it is nice, but upgrading the servers will be interesting, especially if we have to upgrade revscoring
[14:59:05] bullseye EOL is 7/2024, so we'll need to plan early next year
[15:06:13] (03CR) 10Ilias Sarantopoulos: [C: 03+1] "LGTM BUT:" [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/977226 (https://phabricator.wikimedia.org/T343123) (owner: 10Kevin Bazira)
[15:07:48] isaranto: about --^ in theory if you use WIKI_URL=https://wikidata.org it should work outside LW, no?
[15:08:02] (basically what we do in other isvcs)
[15:10:12] iiuc, a call to wikidata is made and then a call to wikipedia as well, as part of the same request
[15:10:39] so one of the requests will fail, as it will have the wrong url
[15:10:48] ahhh now I get it, you mean WIKI_URL would override both
[15:10:50] yes yes
[15:10:56] let's not merge it then
[15:11:16] your proposal sounds better in this case
[15:11:29] we have a similar thing in revertrisk, but there we only access one host
[15:11:38] yep yep
[15:11:46] so what we were discussing earlier on with Tobias may help
[15:11:51] we can merge it and solve this issue in the other patch, as soon as we reach an agreement
[15:12:00] if we find a way for Istio to proxy "wikidata.org" directly, we'll be ok
[15:12:07] no need for WIKI_URL
[15:12:17] (since it would be transparently proxied to api-ro on LW)
[15:12:59] I followed the conversation earlier, but I have a question: shouldn't we have the same failure in revertrisk?
[15:13:21] it is calling www.wikidata.org (www is the only difference afaik)
[15:13:48] kevinbazira: --^
[15:14:44] isaranto: not if WIKI_URL is provided; in that case it uses it
[15:14:46] (03CR) 10Kevin Bazira: [C: 03+2] "Thanks for the review :)" [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/977226 (https://phabricator.wikimedia.org/T343123) (owner: 10Kevin Bazira)
[15:15:11] it could be a solution for this use case, running it locally I mean
[15:15:22] if WIKI_URL is not provided (so not mandatory) one can use the fqdn
[15:15:29] Aiko already thought about this use case :)
[15:15:32] (03Merged) 10jenkins-bot: article-descriptions: update wiki host headers in model-server [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/977226 (https://phabricator.wikimedia.org/T343123) (owner: 10Kevin Bazira)
[15:15:59] (03PS15) 10Ilias Sarantopoulos: add revertrisk model to the list of models [extensions/ORES] - 10https://gerrit.wikimedia.org/r/971547 (https://phabricator.wikimedia.org/T348298)
[15:17:38] (03CR) 10CI reject: [V: 04-1] add revertrisk model to the list of models [extensions/ORES] - 10https://gerrit.wikimedia.org/r/971547 (https://phabricator.wikimedia.org/T348298) (owner: 10Ilias Sarantopoulos)
[15:19:35] would the fqdn work? that is what I'm wondering. I guess it would, otherwise we would have issues in revertrisk
[15:20:41] locally? Yes
[15:20:56] wikidata has its own api, but behind the scenes it is mapped to the mw api servers
[15:21:09] (maybe I am missing the use case)
[15:23:35] (03CR) 10Ilias Sarantopoulos: add revertrisk model to the list of models (0313 comments) [extensions/ORES] - 10https://gerrit.wikimedia.org/r/971547 (https://phabricator.wikimedia.org/T348298) (owner: 10Ilias Sarantopoulos)
[15:24:33] (03PS16) 10Ilias Sarantopoulos: add revertrisk model to the list of models [extensions/ORES] - 10https://gerrit.wikimedia.org/r/971547 (https://phabricator.wikimedia.org/T348298)
[15:25:13] isaranto: IIUC in the revert-risk model-server, `www.wikidata.org` is set as `mw_host` by the get_mediawiki_host() method when `api-ro.discovery.wmnet` has not been set, as shown here:
[15:25:13] https://github.com/wikimedia/machinelearning-liftwing-inference-services/blob/18327eecd3823f2eb6b921ecb456aeee60aea235/revert-risk-model/model-server/model.py#L75-L78
[15:26:31] in essence, the revert-risk model-server ends up using `api-ro.discovery.wmnet` when running on LiftWing.
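The fallback described here can be summarized with a rough sketch (names and the default are approximations of the linked revert-risk code, not a verbatim copy):

```python
import os

def get_mediawiki_host() -> str:
    # On Lift Wing, WIKI_URL is set to the internal read-only endpoint;
    # when it is absent (e.g. when running locally), fall back to the
    # public FQDN so the same code works outside the mesh.
    wiki_url = os.environ.get("WIKI_URL")
    if wiki_url:
        return wiki_url  # e.g. https://api-ro.discovery.wmnet
    return "https://www.wikidata.org"
```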
[15:26:33] kostajh: Thanks a lot for taking the time to review my changes <3. I updated the ores-extension patch. I've left the comments/issues open so you can review whether they are resolved or not. If you want me to resolve trivial ones lemme know (here or DM, any way you prefer), or if you want me to clarify anything else ofc
[15:26:40] (03CR) 10CI reject: [V: 04-1] add revertrisk model to the list of models [extensions/ORES] - 10https://gerrit.wikimedia.org/r/971547 (https://phabricator.wikimedia.org/T348298) (owner: 10Ilias Sarantopoulos)
[15:27:02] isaranto: thanks! will try to have a look on Monday
[15:27:04] kevinbazira: we should do the same on article-descriptions; not urgent, but a follow-up is worth it
[15:27:17] kostajh: anytime! not in a hurry
[15:27:18] why doesn't CI love me on a friday afternoon?
[15:27:46] elukey: sure sure
[15:27:47] it is like asking why puppet should do what you have in mind on a friday
[15:27:52] I stopped trying
[15:27:54] :D
[15:28:56] (03PS17) 10Ilias Sarantopoulos: add revertrisk model to the list of models [extensions/ORES] - 10https://gerrit.wikimedia.org/r/971547 (https://phabricator.wikimedia.org/T348298)
[15:35:32] I'm updating my article-desc patch now..
[15:35:53] so what shall I do, include a similar solution to aiko's or two env vars?
[15:36:22] I'd vote for Aiko's
[15:37:37] leaving earlier today folks, have a nice rest of the day and weekend! <3
[15:37:54] ciao, have a nice weekend!
[15:51:03] kevinbazira: do you have a sample request for article-desc I can use?
[15:53:55] nevermind, silly question!
[15:57:09] isaranto: it's not silly. here is the request I've been using:
```
{"lang": "en", "title": "Clandonald", "num_beams": 2}
```
[15:57:57] Thank you! I said it was silly cause I could just pick up any article, but better to verify this as well
[15:58:17] sure sure :)
[16:13:01] (03PS2) 10Ilias Sarantopoulos: revert-risk: add batch_model.py and USE_BATCHER env var [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/977135 (https://phabricator.wikimedia.org/T348536) (owner: 10AikoChou)
[16:52:30] (03CR) 10Ilias Sarantopoulos: revert-risk: add batch_model.py and USE_BATCHER env var (034 comments) [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/977135 (https://phabricator.wikimedia.org/T348536) (owner: 10AikoChou)
[16:53:45] aiko: I left some thoughts on the BATCHER patch. Don't know if they help, but I'm happy to discuss more around it on Monday.
[16:54:27] I also had this wild thought that the revertrisk class could directly extend the KI_module, but didn't dive into it to see if it is feasible
[17:00:26] 10Machine-Learning-Team, 10Observability-Alerting: Lift Wing alerting - https://phabricator.wikimedia.org/T346151 (10isarantopoulos) We have 2 alerts related to Lift Wing at the moment - ORESFetchScoreJobKafkaLag: when this fires it means that there is a lag between messages landing in Kafka topics and message...
[17:00:34] 10Machine-Learning-Team, 10Observability-Alerting: Lift Wing alerting - https://phabricator.wikimedia.org/T346151 (10isarantopoulos) 05In progress→03Resolved
[17:03:42] 10Machine-Learning-Team: Document load test results - https://phabricator.wikimedia.org/T351939 (10isarantopoulos)
[17:04:30] 10Machine-Learning-Team: Enable local runs for article-descriptions model - https://phabricator.wikimedia.org/T351940 (10isarantopoulos)
[17:07:19] I added some tasks in unsorted so I don't forget them (I'll make sure to enrich their descriptions)
[17:08:16] I couldn't get article-desc to run locally yet. The model server starts, but I'm having some issues with making requests (async requests fail). will continue Monday
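For local testing, the sample request shared earlier can be sent with a short script; this assumes the model server is running locally on KServe's default port 8080 and that the model is registered under the same name used on Lift Wing:

```python
import requests

# Sample payload from the chat; the path follows the KServe v1 predict
# protocol. Generation can be slow on CPU, hence the generous timeout.
payload = {"lang": "en", "title": "Clandonald", "num_beams": 2}
resp = requests.post(
    "http://localhost:8080/v1/models/article-descriptions:predict",
    json=payload,
    timeout=300,
)
print(resp.status_code, resp.json())
```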
[17:08:25] going afk as well, have a nice weekend!
[19:20:40] isaranto: o/ thanks for reviewing the batcher patch. It was definitely helpful. I would love to discuss more on Monday. (will be in Germany then :)
[19:21:32] Have a safe trip back!
[19:24:24] Have a wonderful weekend! :)
[21:04:45] 10Machine-Learning-Team, 10Add-Link, 10Growth-Team, 10Chinese-Sites, 10CommRel-Specialists-Support (Oct-Dec-2023): Support languages whose add-a-link models were not published - https://phabricator.wikimedia.org/T309263 (10RZamora-WMF)