[06:37:32] (CR) Elukey: revert-risk: handle unsupported edit types for wikidata model (1 comment) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/924912 (https://phabricator.wikimedia.org/T333125) (owner: AikoChou)
[07:10:40] hello folks
[08:03:55] FYI, installing containerd security updates on ml-serve* (already running on 2001)
[08:07:55] moritzm: Will the install trigger a restart? Not asking because of disruption, but rather if any action is needed from us afterward.
[08:07:59] Also: g'mornin'
[08:09:08] not needed, this doesn't affect running containers, this only affects new ones being started
[08:09:18] ack, thx!
[08:23:55] need to run some errands, back in a bit!
[08:25:07] klausman: o/ can you open today the task for the api-gateway rate limits?
[08:25:16] so we can start thinking about options etc.
[08:25:35] Yep!
[08:25:38] super
[08:36:55] Machine-Learning-Team: Investigate ad-hoc traffic class for API GW rate limits applied to Inference services as used by WME - https://phabricator.wikimedia.org/T338121 (klausman)
[08:38:34] Machine-Learning-Team: Investigate ad-hoc traffic class for API GW rate limits applied to Inference services as used by WME - https://phabricator.wikimedia.org/T338121 (klausman)
[08:43:15] (CR) Kevin Bazira: feat: reduce llm memory footprint (1 comment) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/926507 (https://phabricator.wikimedia.org/T333861) (owner: Ilias Sarantopoulos)
[10:07:11] <- lunch and groceries
[12:40:16] Machine-Learning-Team, API Platform: Investigate ad-hoc traffic class for API GW rate limits applied to Inference services as used by WME - https://phabricator.wikimedia.org/T338121 (elukey)
[12:45:23] Machine-Learning-Team, API Platform: Investigate ad-hoc traffic class for API GW rate limits applied to Inference services as used by WME - https://phabricator.wikimedia.org/T338121 (elukey) I think that we should try to figure out, from the API Platform perspective, what is the best road to follow, since...
[13:07:31] klausman: what next steps do you have in mind for --^ ?
[13:07:50] I am poking Hugh to find out what's involved in the ad-hoc class approach
[13:07:52] I think that it is critical to find a solution, otherwise people can't migrate
[13:08:49] let's try to find on phabricator if there are previous tasks in which people asked for different tiers/classes
[13:10:13] also let's ask in the api-platform slack channel, not only Hugh
[13:10:25] He's on PTO 'til the 12th
[13:10:29] Bill Pirkle may be able to give us some guidance
[13:10:42] (Hugh, that is)
[13:10:51] sure, do you have time to follow up on slack?
[13:10:56] Yep
[13:17:15] so the tiers are defined in wmf-config/CommonSettings.php
[13:17:22] maybe we could just add a class in there
[13:18:22] as defined in https://phabricator.wikimedia.org/T246271
[13:19:26] Machine-Learning-Team, API Platform: Investigate ad-hoc traffic class for API GW rate limits applied to Inference services as used by WME - https://phabricator.wikimedia.org/T338121 (elukey) I see that rate limit tiers were defined in T246271, and they live in `wmf-config/CommonSettings.php`. Should we...
[13:28:35] Question is how those tiers are tied to services
[13:29:48] judging from https://phabricator.wikimedia.org/T246271 it seems that they were created for the MVP 3 years ago
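(A minimal, hypothetical Python sketch of the per-class quota idea discussed here — not the actual MediaWiki or API Gateway implementation; the class names and numbers are placeholders, with 25k mirroring the "Preferred" figure mentioned below.)

```python
import time
from collections import defaultdict

# Hypothetical per-class request quotas (requests per hour). The real tiers
# live in wmf-config/CommonSettings.php and are not reproduced here.
TIER_LIMITS = {
    "anon": 500,
    "preferred": 25_000,   # mirrors the "Preferred==25k" figure from the chat
}

# Fixed-window counters keyed by (client_id, hour window).
_counters = defaultdict(int)

def allow_request(client_id, client_class):
    """Return True if the client is still within its hourly quota."""
    window = int(time.time() // 3600)          # one counting window per hour
    limit = TIER_LIMITS.get(client_class, TIER_LIMITS["anon"])
    key = (client_id, window)
    if _counters[key] >= limit:
        return False
    _counters[key] += 1
    return True
```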
[13:30:44] Hmm. It seems those classes are applied unconditionally per-service, so Preferred==25k is independent of the service being called.
[13:31:07] The class of a non-anon client is determined by their token (and can be changed on mwmaint).
[13:31:40] So I am not quite sure how to make a new class that is "Like preferred, but the limit is X"
[13:31:48] But _only_ on LW services
[13:36:01] probably we could have another class for 200k req/hour, calling it in another way ("internal-high"?) and use it
[13:37:51] Do we atm use the existing classes at all?
[13:37:55] I mean, by name
[13:38:21] I didn't get the question
[13:39:13] So in helmfile.d/services/api-gateway/values.yaml we refer to the limit for anon users as 10k.
[13:39:20] by using `anon_limit`
[13:39:44] Hrm. Let me turn this over in my head.
[13:39:55] this is the confusing bit, those are API-Gateway's limits
[13:39:59] not MW ones
[13:40:08] separate buckets
[13:40:11] So your proposal is to have WME generate a new token, tell us its audience ID, and then move it to this new 200k tier?
[13:40:34] (audience ID is what you use on mwmaint)
[13:41:19] yes exactly
[13:42:24] Yeah, that should work. Who is "custodian" of `wmf-config/CommonSettings.php`?
[13:44:36] a lot of folks, I think that api-platform should probably give us the green light first
[13:44:54] ack. I'll explain this plan to Bill, see what he has to say
[13:46:29] let's do it on slack so it is visible
[13:46:45] I have been doing so in a PM, oops
[13:47:16] I'll c7P
[13:47:20] er c&p
[13:49:16] super
[13:55:20] dcops is moving two gpus to ml-serve1001
[13:55:28] so we'll be able to test LLMs with AMD gpus soon
[13:55:29] What time window?
[13:55:36] now
[13:55:57] Neato. Do we have to add silences/downtimes?
[13:56:10] already drained, downtimed and turned off
[13:56:27] Excellent
[14:06:25] klausman: ok to merge https://gerrit.wikimedia.org/r/c/operations/puppet/+/927197 ?
[14:07:17] Wait, so DSE would have 0 GPUs?
[14:07:45] ah, you moved it to per-host files
[14:36:50] aiko: nice work on the changeprop change! Hugh is out of the office, so I added Kamila (the other SRE in api-platform).. After their review we can merge and deploy :)
[14:37:17] elukey: 200k, right? Or do we want to add more margin?
[14:37:39] klausman: I'd say 250k, and then we set 200k at the api gateway level (so we have some margin)
[14:37:47] ack, will do
[14:38:00] back in a bit
[14:53:10] (PS3) AikoChou: revert-risk: handle unsupported edit types for wikidata model [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/924912 (https://phabricator.wikimedia.org/T333125)
[15:15:22] (CR) AikoChou: revert-risk: handle unsupported edit types for wikidata model (1 comment) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/924912 (https://phabricator.wikimedia.org/T333125) (owner: AikoChou)
[15:16:40] elukey: yay! thanks for the help :D
[15:44:56] Machine-Learning-Team, Epic: Experiment with GPUs in the Machine Learning infrastructure - https://phabricator.wikimedia.org/T333462 (elukey)
[15:46:28] ok gpus on ml-serve1001 are working!
[15:48:35] but of course we have LLMs only on staging..
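(A quick way to sanity-check that the new GPUs on ml-serve1001 are actually visible from inside a pod — a sketch assuming a PyTorch build with ROCm support; whether PyTorch is available in the image is an assumption.)

```python
import torch

# With a ROCm-enabled PyTorch build, AMD GPUs show up through the regular
# torch.cuda API, so the same checks work as for NVIDIA hardware.
if torch.cuda.is_available():
    for i in range(torch.cuda.device_count()):
        print(f"GPU {i}: {torch.cuda.get_device_name(i)}")
else:
    print("No GPU visible to PyTorch (check node scheduling and resource requests)")
```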
[15:49:29] should have thought about it
[15:49:34] uffff
[15:49:52] (but ml-staging is in codfw so no opportunity to move gpus in there, right)
[15:52:07] Machine-Learning-Team, Spike: [Spike] Run models and frameworks on AMD GPU and identify challenges - https://phabricator.wikimedia.org/T334583 (elukey) @isarantopoulos we have two GPUs on ml-serve1001, ready to be tested :) We can probably think about rolling out again the experimental namespace in prod...
[15:52:32] at this point we may need to have experimental in production (again), so that we'll be able to experiment in there too
[15:52:35] thoughts?
[15:56:27] Fine by me
[15:56:49] as long as we only use it for things that can't be tested in staging-exp
[15:58:47] yep yep
[15:58:49] filed https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/927235
[15:59:41] in theory the diff should look ok
[15:59:59] LGTM
[16:00:12] Do we need private repo stuff for this as well?
[16:03:57] yep, already there, we didn't remove it from before
[16:05:28] ah, excellent :)
[16:08:11] deploying bloom models to ml-serve-eqiad
[16:13:41] ack
[16:23:42] both bloom models are now running on ml-serve-eqiad
[16:24:05] tomorrow I'll talk to Ilias to run them on a GPU, and see performance differences (hopefully)
[16:24:49] sgtm
[16:25:05] going afk, have a nice rest of the day folks
[16:26:19] Machine-Learning-Team, Spike: [Spike] Run models and frameworks on AMD GPU and identify challenges - https://phabricator.wikimedia.org/T334583 (elukey) After a chat with Tobias we thought to restore the experimental namespace in prod, but limited to bloom/llm models for now (we can do it with the new tem...
[16:34:50] \o same here
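(For the CPU vs GPU performance comparison mentioned at 16:24:05, something like the following sketch could be used to time the bloom endpoints; the URL, Host header, and payload fields are hypothetical placeholders following the generic KServe predict convention, not the exact Lift Wing configuration.)

```python
import statistics
import time
import requests

# All of these values are placeholders: point URL/HEADERS/PAYLOAD at the
# actual bloom deployment before running anything.
URL = "https://inference.example.org/v1/models/bloom:predict"
HEADERS = {"Host": "bloom.example.wikimedia.org", "Content-Type": "application/json"}
PAYLOAD = {"prompt": "The Eiffel Tower is located in", "result_length": 32}

def measure(n=10):
    """Send n identical requests and return mean/median latency in seconds."""
    latencies = []
    for _ in range(n):
        start = time.monotonic()
        resp = requests.post(URL, json=PAYLOAD, headers=HEADERS, timeout=120)
        resp.raise_for_status()
        latencies.append(time.monotonic() - start)
    return statistics.mean(latencies), statistics.median(latencies)

if __name__ == "__main__":
    mean, median = measure()
    print(f"mean={mean:.2f}s  median={median:.2f}s")
```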