[06:08:18] Machine-Learning-Team, Add-Link, Growth-Team, User-notice: Deploy "add a link" to 7th round of wikis - https://phabricator.wikimedia.org/T304551 (kevinbazira)
[08:04:34] morning :)
[08:13:55] Good morning!
[08:15:48] o/
[08:16:11] I am starting to review knative serving's changelog since 0.18.1 (our version) that was released in late 2020
[08:16:14] lol
[08:17:01] ack
[08:17:41] elukey: in order to enable MP I just set the env var ASYNCIO_USE_PROCESS_POOL to true right?
[08:18:11] + ASYNCIO_AUX_WORKERS
[08:18:20] isaranto: yes correct
[08:18:48] isaranto: ah no something more I am afraid, namely the container's cpu/memory limits
[08:19:03] otherwise we create processes but they will not really be used
[08:19:08] (or at least, not fully)
[08:19:41] we'd need to verify how to add new limits settings to an InferenceService resource
[08:19:50] it should be straightforward in theory
[08:22:19] elukey: gotcha! I will start with the limits u point out in the previous task. thanks!
[08:24:25] elukey: btw does the deployment charts repo get synced automatically as well or only manually? I was thinking just manually changing the limits and env vars in staging so I could quickly enable/disable MP but I was wondering if they would be synced/overwritten at some point
[08:28:58] isaranto: the repo needs explicit helmfile sync actions to apply changes, we can definitely do some manual overrides, but I'll have to do them (not a problem, but for the moment you cannot edit any resource)
[08:29:30] 👍
[09:29:06] elukey: could u patch the en-revscoring-editquality-goodfaith pod with the stuff mentioned in this patch? https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/861345
[09:31:50] I mean in m-staging.
I am not really sure if the specific patch would work as is, just pasting it here to communicate the stuff I want to change
[09:32:05] isaranto: the patch seems to work, see the output of the CI job :)
[09:32:19] Since it is a template, I'll manually patch the isvc resources
[09:32:27] do you need both zh and enwiki?
[09:34:04] elukey: sure please do both. I will run stuff with MP and then sync so that I can run without MP
[09:34:59] isaranto: one thing to consider is also the amount of resources that we want to test/allocate, since 5 cpus is a lot for a single pod
[09:35:25] nproc on mlstaging2001 says 72, with hyperthreading on
[09:35:33] same hw for the "prod" workers as well
[09:40:32] isaranto: pods up!
[09:48:51] elukey: thanks! the env vars are there but I still see the old resources
[09:48:51] ```
[09:48:51] resources:
[09:48:51]   limits:
[09:48:51]     cpu: "1"
[09:48:52]     memory: 2Gi
[09:48:52]   requests:
[09:48:53]     cpu: "1"
[09:48:53]     memory: 2Gi
[09:48:54] ```
[09:48:54] in the kserve-container. I think the patch is not set up correctly. could u perhaps edit the pod manually this time?
[09:51:50] yes my bad sorry, I forgot those bits :D
[09:53:10] new pods are coming up
[09:54:15] should be good now
[09:55:17] perfect! Thanks Luca 😃
[10:17:19] Machine-Learning-Team, Foundational Technology Requests, Prod-Kubernetes, Kubernetes, Patch-For-Review: Import new knative-serving version (k8s 1.23 dependency for ML) - https://phabricator.wikimedia.org/T323793 (elukey) Things to keep in mind and to review: https://knative.dev/docs/serving/...
[10:30:16] ok now knative yamls to import in our templates, 4k+6k long :D
[10:55:02] Morning!
[10:55:27] Machine-Learning-Team, ORES, Advanced Mobile Contributions, Growth-Team, and 3 others: 'Highlight likely problem edits' preference doesn't select any filters in mobile web - https://phabricator.wikimedia.org/T318683 (Samwalton9) a:eigyan→None Taking this off our list of active work. We re...
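A minimal Python sketch of the fields the manual override discussed above has to touch: the two env vars plus the cpu/memory settings of the kserve-container. The `spec.predictor.containers` layout, the helper name `build_mp_override`, and the values in the usage line are assumptions for illustration (5 CPUs is the figure mentioned in the discussion; the memory size and worker count are placeholders), not the contents of the actual patch.

```python
import json


def build_mp_override(cpus: str, memory: str, aux_workers: int) -> dict:
    """Build a hypothetical InferenceService fragment that enables multi-processing.

    Without raising the cpu/memory settings, the extra worker processes
    are created but cannot be fully used.
    """
    return {
        "spec": {
            "predictor": {
                "containers": [
                    {
                        "name": "kserve-container",
                        "env": [
                            # Env var names as given in the discussion.
                            {"name": "ASYNCIO_USE_PROCESS_POOL", "value": "true"},
                            {"name": "ASYNCIO_AUX_WORKERS", "value": str(aux_workers)},
                        ],
                        "resources": {
                            # Requests mirror limits, as in the pasted pod spec.
                            "limits": {"cpu": cpus, "memory": memory},
                            "requests": {"cpu": cpus, "memory": memory},
                        },
                    }
                ]
            }
        }
    }


if __name__ == "__main__":
    # Placeholder values; only the 5 CPUs figure appears in the discussion.
    print(json.dumps(build_mp_override("5", "2Gi", 5), indent=2))
```

Checking the `resources` block of the running pod afterwards, as done above, confirms whether both halves of the change were applied.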
[10:55:33] Machine-Learning-Team, ORES, MediaWiki-Core-Preferences, Moderator-Tools-Team: When ORES quality filters are selected in mobile web, entries should be highlighted - https://phabricator.wikimedia.org/T314026 (Samwalton9) a:eigyan→None
[11:32:41] * elukey lunch!
[14:05:31] Hey all
[14:05:39] Hey Chris
[14:15:02] o/
[14:16:54] Why is it so cold every morning
[14:20:36] /o here it is afternoon but it is still cold (for here ofc)
[14:22:53] I was in the mountains yesterday but somehow the house feels much colder ¯\_(ツ)_/¯
[15:11:35] Lift-Wing, Machine-Learning-Team, Epic: API Gateway Integration - https://phabricator.wikimedia.org/T288789 (klausman)
[15:12:03] Lift-Wing, Machine-Learning-Team: Configure LW Inference services on API GW config - https://phabricator.wikimedia.org/T323916 (klausman)
[15:52:03] Machine-Learning-Team, Patch-For-Review: Test revscoring model servers on Lift Wing - https://phabricator.wikimedia.org/T323624 (isarantopoulos) Enabled MP and ran on ml-staging with benthos for 5 minutes for revscoring-editquality-goodfaith: for en wiki ` Total 5 minute duration: 50.0% 509.65ms...
[15:56:44] Machine-Learning-Team, Discovery-Search: Create Model Card for Search MLR - https://phabricator.wikimedia.org/T323794 (MPhamWMF) > The template seems to be more oriented towards models that can be reused for other applications (which is the case of the models deployed on LiftWing). The Search MLR models...
[16:06:47] * elukey bbiab
[16:16:07] trying to undo the patches to the pods but `helmfile -e ml-staging-codfw diff/sync` does nothing... any other way to force deploy the charts? I tried to run `kubectl apply` to the result of `helmfile -e ml-staging-codfw template` but I don't have permissions
[16:41:36] isaranto: I need to revert it manually, I didn't really apply your patch since helm is needed to generate templates etc..
I can revert if you want
[16:43:04] elukey: yes, plz revert it whenever u can
[16:44:41] isaranto: done, pods are spinning up :)
[17:00:28] isaranto: ok if I kill/restart enwiki-goodfaith in staging?
[17:00:32] or are you testing it?
[17:01:10] elukey: sure go ahead. I am testing but I can wait until the restart
[17:02:04] ah snap two pods are running for enwiki-goodfaith
[17:02:11] I think that knative didn't like the test that we did
[17:02:14] lemme clean up
[17:06:08] done!
[17:06:17] elukey: o/ testing two rev_ids with 1000 requests (500 for each) reproduced the issue, 19 missing responses!
[17:07:06] aiko: nice!
[17:11:53] Lift-Wing, Machine-Learning-Team, Patch-For-Review: Explore ingress filtering for Lift Wing - https://phabricator.wikimedia.org/T300259 (elukey) Tested also metrics: ` elukey@ml-staging2002:~$ sudo nsenter -t 1653602 -n curl -s localhost:15000/stats/prometheus | grep rate_limit # TYPE envoy_http_loc...
[17:11:59] Machine-Learning-Team, Discovery-Search: Create Model Card for Search MLR - https://phabricator.wikimedia.org/T323794 (calbon) I think the priority can be low for you all. The goal isn't to make it possible for other people to reuse the models, the goal is for the community and other stakeholders to ha...
[17:20:47] I think I found a good initial rate-limit compromise for lift wing, tests went really good
[17:21:06] (one less blocker for the MVP)
[17:49:54] (PS1) AikoChou: WIP - adding multilingual revertrisk model [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/861434 (https://phabricator.wikimedia.org/T323613)
[17:52:47] (CR) CI reject: [V: -1] WIP - adding multilingual revertrisk model [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/861434 (https://phabricator.wikimedia.org/T323613) (owner: AikoChou)
[18:02:14] * elukey afk!
[18:02:15] o/
[18:02:23] have a nice evening/rest-of-the-day folks
[18:15:50] heading out as well \o
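A rough Python sketch of how a "missing responses" count like the one reported above (two rev_ids, 500 requests each) could be obtained. The log does not say how the real test was run, so everything here is an assumption: `count_missing`, the `send` callable, and the worker count are hypothetical names and values, and a request counts as missing when `send` returns `None` or raises.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Optional, Sequence


def count_missing(
    send: Callable[[int], Optional[dict]],
    rev_ids: Sequence[int],
    per_rev_id: int,
    workers: int = 20,
) -> int:
    """Send `per_rev_id` concurrent requests for each rev_id.

    Returns how many of those requests produced no response.
    """
    # One job per request: each rev_id repeated per_rev_id times.
    jobs = [rev_id for rev_id in rev_ids for _ in range(per_rev_id)]

    def attempt(rev_id: int) -> bool:
        # True if a response came back, False if it is missing.
        try:
            return send(rev_id) is not None
        except Exception:
            return False

    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(attempt, jobs))
    return results.count(False)
```

In a real run, `send` would wrap a request to the model server for the given rev_id and return `None` on a timeout; calling `count_missing(send, [rev_id_a, rev_id_b], 500)` would then mirror the 1000-request test.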