[07:20:28] Machine-Learning-Team, API Platform, Anti-Harassment, Cloud-Services, and 19 others: Migrate PipelineLib repos to GitLab - https://phabricator.wikimedia.org/T332953 (dcausse)
[08:22:26] Lift-Wing, Machine-Learning-Team: Move Revert-risk language agnostic model from staging to production - https://phabricator.wikimedia.org/T332998 (elukey) Thanks! Updated https://api.wikimedia.org/wiki/API_reference/Service/Lift_Wing/Get_reverted_risk_language_agnostic_prediction
[08:26:50] Machine-Learning-Team, API Platform, Anti-Harassment, Cloud-Services, and 19 others: Migrate PipelineLib repos to GitLab - https://phabricator.wikimedia.org/T332953 (kostajh)
[08:46:12] klausman: o/ - just for confirmation - all deployed right?
[08:46:32] You mean the pod increase and GW limit bump? yep
[08:46:46] super :) - if you have time, can you check if the autoscaling works etc..?
[08:46:57] I'll give it a go :)
[08:47:03] I am wondering if we can scale up to 15 pods, or even ten, without hitting limit ranges
[08:47:06] thanks
[09:10:47] Just deployed this monster -> https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/922512 (mostly watching, others deployed it :) )
[09:13:39] \o/
[09:13:43] congrats Ilias :)
[09:28:06] elukey: I wrote a small http loadtest (of course) and with 100 threads, I can easily bump into the APIGW rate limit without the pods even breaking a sweat
[09:28:49] (cf.
[09:28:51] https://grafana.wikimedia.org/d/-D2KNUEGk/kubernetes-pod-details?orgId=1&var-datasource=eqiad%20prometheus%2Fk8s-mlserve&var-namespace=revertrisk&var-pod=revertrisk-language-agnostic-predictor-default-00002-deploxwntq&var-pod=revertrisk-language-agnostic-predictor-default-00002-deplom44zg&var-pod=revertrisk-language-agnostic-predictor-default-00002-deplolhv64&var-pod=revertrisk-language-agnos
[09:28:53] tic-predictor-default-00002-deplo8j2hd&var-pod=revertrisk-language-agnostic-predictor-default-00002-deplo6d5hd)
[09:28:57] urgh, grafana urls
[09:33:47] okok nice!
[10:02:06] I did get a lot of 500s just now. Investigating
[10:44:59] * elukey lunch!
[13:00:02] so folks I am trying to use the recommendation-api service deployed on wikikube from stat1004, but it doesn't seem really healthy
[13:06:07] no ok it kinda words
[13:06:10] *works
[13:18:59] ok this is VERY confusing
[13:19:00] https://gerrit.wikimedia.org/g/mediawiki/services/recommendation-api
[13:19:02] vs
[13:19:14] https://gerrit.wikimedia.org/r/plugins/gitiles/research/recommendation-api
[13:20:20] Do u have a sample call?
[13:20:24] for reference
[13:21:01] no idea, the service deployed on wikikube doesn't answer to any query indicated by https://gerrit.wikimedia.org/r/plugins/gitiles/research/recommendation-api
[13:21:13] but it makes sense, since the docker image is from https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/services/recommendation-api/
[13:21:17] that is a nodejs service
[13:21:58] so crazy
[13:35:28] elukey: there's some.... interesting things going on with the APIGW ratelimit for personal tokens. The rate limit configured in the APIGW is only a fallback, personal tokens have a builtin limit of 5k rq/h. Increasing that is possible, but I haven't managed to get it to work yet.
So the actual rate limit is 5k (since anon access doesn't work with POSTs)
[13:36:24] I've been chatting with Hugh about it, and his opinion is that the ratelimiting infra in the GW is outdated design-wise and needs reevaluation if it's fit for the intended purpose.
[13:52:10] Machine-Learning-Team: Host open source LLM (bloom, etc.) on Lift Wing - https://phabricator.wikimedia.org/T333861 (isarantopoulos) This is interesting [[ https://github.com/huggingface/transformers/pull/13466 | issue ]] related to resources (cpu memory) needed when loading a model. It loads the model 2 time...
[13:53:57] klausman: wait wait, I am not getting the 5k limit.. built-in means that they are hardcoded somewhere in the api-gateway code?
[13:54:29] Machine-Learning-Team: Host open source LLM (bloom, etc.) on Lift Wing - https://phabricator.wikimedia.org/T333861 (isarantopoulos) Will continue this investigation and will try to deploy [[ https://huggingface.co/tiiuae/falcon-7b-instruct | falcon-7b-instruct ]] which seems like a strong ready-to-use base mo...
[13:56:55] also anon access works with post, we added the support recently
[14:24:43] Machine-Learning-Team, Patch-For-Review: Create ORES migration endpoint (ORES/Liftwing translation) - https://phabricator.wikimedia.org/T330414 (isarantopoulos) Putting together a list of missing things from ores-legacy service related to ORES - Swagger UI: enrich the UI to bring it closer to https:/...
[14:28:25] elukey: I still got 401s
[14:28:31] httpCode":401,"httpReason":"Jwt is not in the form of Header.Payload.Signature with two dots and 3 sections"
[14:28:43] --^ I added some things about ores legacy - I can work on these next week
[14:28:55] Oh, I think I know what the prob is
[14:30:49] Yea, sending an empty-ish auth header is not the same as not sending one :)
[14:31:13] confirm rl 10k for anon, and I was able to bump my own token to "Preferred", i.e.
25k
[14:31:30] I still have to wait for the ratelimit reset on the hour to test that one fully, tho
[14:33:46] elukey: hmmm. do we have k8s-internal rate limits?
[14:46:25] klausman: yes, 100 reqs/s for each pod
[14:46:45] (PS38) Ilias Sarantopoulos: feat: use Lift Wing instead of ORES [extensions/ORES] - https://gerrit.wikimedia.org/r/910439 (https://phabricator.wikimedia.org/T319170)
[14:47:06] ah, it seems I may be hitting that
[14:47:44] it is easy to spot, you should see "local rate limit" or similar
[14:47:59] (you code reviewed that change btw :)
[14:48:19] (CR) CI reject: [V: -1] feat: use Lift Wing instead of ORES [extensions/ORES] - https://gerrit.wikimedia.org/r/910439 (https://phabricator.wikimedia.org/T319170) (owner: Ilias Sarantopoulos)
[14:51:49] klausman: at this point if we don't solve the problem we have to suggest the use of anon requests and raise its limit
[14:52:22] Working on it with Hugh.
[14:52:51] elukey: wait, body == local_rate_limited means k8s rate limit? Then I'm hitting that
[14:52:57] yes
[14:53:34] envoy keeps track of per-IP buckets, throwing 429s if we get to that stage (100 req/s)
[14:53:43] that is a lot, how are you hitting it?
[14:56:43] (PS39) Ilias Sarantopoulos: feat: use Lift Wing instead of ORES [extensions/ORES] - https://gerrit.wikimedia.org/r/910439 (https://phabricator.wikimedia.org/T319170)
[14:56:51] 100 threads, hitting the service as fast as it will go without giving 429s, but with a min delay between requests of 1ms
[14:57:09] if I get a 429, that thread will double the delay, with an upper bound of 1s.
[14:58:14] (CR) CI reject: [V: -1] feat: use Lift Wing instead of ORES [extensions/ORES] - https://gerrit.wikimedia.org/r/910439 (https://phabricator.wikimedia.org/T319170) (owner: Ilias Sarantopoulos)
[14:59:03] With 10 workers going as fast as it goes (1ms delay), running for 5s, I get 435 200s, and no 429.
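The backoff scheme described above (1 ms minimum delay between requests, doubled by a thread whenever it sees a 429, capped at 1 s) can be sketched roughly as follows. This is a hypothetical Python reconstruction, not the actual loadtest: the names `next_delay`, `run_worker`, `observed_rps` and the injected `send` callable are all assumptions, and since the log does not say whether the delay ever shrinks again, this sketch never lowers it.

```python
import time
from typing import Callable, Dict

MIN_DELAY_S = 0.001  # 1 ms floor between requests, as stated in the log
MAX_DELAY_S = 1.0    # 1 s upper bound on the delay, as stated in the log


def next_delay(current: float, status: int) -> float:
    """Double the delay after a 429 (capped at MAX_DELAY_S); otherwise keep it."""
    current = max(current, MIN_DELAY_S)
    if status == 429:
        return min(current * 2, MAX_DELAY_S)
    return current


def run_worker(send: Callable[[], int], requests: int,
               sleep: Callable[[float], None] = time.sleep) -> Dict[int, int]:
    """Issue `requests` calls via `send` (returns an HTTP status code).

    `send` is a hypothetical stand-in for the real HTTP call; a full test
    would start many of these workers in separate threads.
    """
    delay = MIN_DELAY_S
    counts: Dict[int, int] = {}
    for _ in range(requests):
        status = send()
        counts[status] = counts.get(status, 0) + 1
        delay = next_delay(delay, status)
        sleep(delay)
    return counts


def observed_rps(ok_responses: int, seconds: float) -> float:
    """Successful responses per second over the whole run."""
    return ok_responses / seconds
```

With the figures from the log, `observed_rps(435, 5)` gives 87 requests per second across the 10 workers.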
[14:59:27] Machine-Learning-Team, MediaWiki-extensions-ORES, Patch-For-Review: Move backend of ORES MediaWiki extension to Lift Wing - https://phabricator.wikimedia.org/T319170 (PatchDemoBot) Test wiki on [[ https://patchdemo.wmflabs.org | Patch demo ]] by ISarantopoulos-WMF using patch(es) linked to this task...
[14:59:37] So that works out to 87rps
[15:00:32] Machine-Learning-Team, MediaWiki-extensions-ORES, Patch-For-Review: Move backend of ORES MediaWiki extension to Lift Wing - https://phabricator.wikimedia.org/T319170 (PatchDemoBot) Test wiki **created** on [[ https://patchdemo.wmflabs.org | Patch demo ]] by ISarantopoulos-WMF using patch(es) linked t...
[15:03:35] klausman: keep in mind that the rate limit is per-ip
[15:03:47] I can increase that to about 50 or so, with linear rps increase.
[15:10:01] (PS40) Ilias Sarantopoulos: feat: use Lift Wing instead of ORES [extensions/ORES] - https://gerrit.wikimedia.org/r/910439 (https://phabricator.wikimedia.org/T319170)
[15:10:19] (CR) Ilias Sarantopoulos: "Switched to internal endpoint" [extensions/ORES] - https://gerrit.wikimedia.org/r/910439 (https://phabricator.wikimedia.org/T319170) (owner: Ilias Sarantopoulos)
[15:11:48] klausman: let's focus on the rate limit class problem - Hugh says that a way around it could be to create an ad-hoc class in mediawiki, shall we open a task?
[15:12:26] (CR) CI reject: [V: -1] feat: use Lift Wing instead of ORES [extensions/ORES] - https://gerrit.wikimedia.org/r/910439 (https://phabricator.wikimedia.org/T319170) (owner: Ilias Sarantopoulos)
[15:13:44] (PS41) Ilias Sarantopoulos: feat: use Lift Wing instead of ORES [extensions/ORES] - https://gerrit.wikimedia.org/r/910439 (https://phabricator.wikimedia.org/T319170)
[15:15:46] (CR) CI reject: [V: -1] feat: use Lift Wing instead of ORES [extensions/ORES] - https://gerrit.wikimedia.org/r/910439 (https://phabricator.wikimedia.org/T319170) (owner: Ilias Sarantopoulos)
[15:16:00] (PS2) AikoChou: revert-risk: handle unsupported edit types for wikidata model [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/924912 (https://phabricator.wikimedia.org/T333125)
[15:16:12] (PS42) Ilias Sarantopoulos: feat: use Lift Wing instead of ORES [extensions/ORES] - https://gerrit.wikimedia.org/r/910439 (https://phabricator.wikimedia.org/T319170)
[15:17:58] (CR) CI reject: [V: -1] feat: use Lift Wing instead of ORES [extensions/ORES] - https://gerrit.wikimedia.org/r/910439 (https://phabricator.wikimedia.org/T319170) (owner: Ilias Sarantopoulos)
[15:18:04] argh..
[15:18:28] (CR) AikoChou: revert-risk: handle unsupported edit types for wikidata model (1 comment) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/924912 (https://phabricator.wikimedia.org/T333125) (owner: AikoChou)
[15:18:53] seems like some more work is needed for ores extension LW patch. CI makes some calls to the API so I'll need to mock some calls as LW internal endpoint is not reachable
[15:20:54] :(
[15:20:56] keep going!
[15:38:56] elukey: yep (re: rate limits)
[15:41:44] ok super, lemme know if you have bandwidth to work on it too
[15:41:53] shouldn't be too long hopefully
[15:42:17] but we'll need to understand how to add the new limit to mediawiki and gather consensus from the community in doing so
[15:42:36] I think we may need to go full anon for this, but we'll see
[15:43:51] I'll have a chat about some of the tech detail with Hugh soonish. (I'd say tomorrow, but I think Hugh might be out then)
[15:44:48] klausman: sure, let's outline the options etc.. in the task, but since we need to let WME test our service I'd say that we give ourselves a timeline - if by Monday / Tuesday we don't have a clear path let's bump anon to 200k/hour
[15:44:52] does it make sense?
[15:45:08] yep
[15:47:58] super
[15:48:08] leaving for the long weekend folks, talk with you on Monday! o/
[15:49:46] \o
[15:50:43] elukey: also, happy Republic Day!
[16:04:11] Have a nice long weekend!
[17:18:01] (PS43) Ilias Sarantopoulos: feat: use Lift Wing instead of ORES [extensions/ORES] - https://gerrit.wikimedia.org/r/910439 (https://phabricator.wikimedia.org/T319170)
[17:20:05] (CR) CI reject: [V: -1] feat: use Lift Wing instead of ORES [extensions/ORES] - https://gerrit.wikimedia.org/r/910439 (https://phabricator.wikimedia.org/T319170) (owner: Ilias Sarantopoulos)
[17:41:05] (PS44) Ilias Sarantopoulos: feat: use Lift Wing instead of ORES [extensions/ORES] - https://gerrit.wikimedia.org/r/910439 (https://phabricator.wikimedia.org/T319170)