[08:04:26] Lift-Wing, Machine-Learning-Team (Active Tasks): API Gateway Integration - https://phabricator.wikimedia.org/T288789 (elukey) The api-gateway config should now allow us to: 1) Add the inference service among the discovery_endpoints (reusing their config rather than creating a new cluster config in envoy...
[08:05:05] good morning :)
[10:03:12] Lift-Wing, Machine-Learning-Team (Active Tasks): API Gateway Integration - https://phabricator.wikimedia.org/T288789 (elukey) A lot of interesting info in https://wikitech.wikimedia.org/wiki/API_Gateway about rate limiting and authentication/authorization. Assuming that OAuth 2.0 works fine in API-Gatew...
[10:34:22] Lift-Wing, Machine-Learning-Team (Active Tasks): API Gateway Integration - https://phabricator.wikimedia.org/T288789 (elukey) The current API-Gateway implementation offers a global rate-limit, but with changes to `deployment-charts` it can be customized to offer a per-service limiting. We'll collaborate...
[11:23:24] * elukey lunch!
[14:47:34] Morning all
[14:47:45] o/
[15:21:26] just deployed the network rules for kserve
[15:21:34] if it doesn't work anymore you know why :D
[15:58:50] Machine-Learning-Team, DC-Ops, ops-codfw: (Need By: TBD) rack/setup/install ml-serve200[5-8] - https://phabricator.wikimedia.org/T294945 (RobH)
[15:59:02] Machine-Learning-Team, DC-Ops, ops-codfw: (Need By: TBD) rack/setup/install ml-serve200[5-8] - https://phabricator.wikimedia.org/T294945 (RobH)
[15:59:39] Machine-Learning-Team, DC-Ops, ops-codfw: (Need By: TBD) rack/setup/install ml-serve200[5-8] - https://phabricator.wikimedia.org/T294945 (RobH) a:Papaul
[16:00:52] Machine-Learning-Team, DC-Ops, ops-codfw: (Need By: TBD) rack/setup/install ml-staging200[12] - https://phabricator.wikimedia.org/T294946 (RobH)
[16:00:53] our servers!
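The 10:34 update on T288789 distinguishes the API Gateway's current global rate limit from a per-service one. The difference can be illustrated with a minimal Python token-bucket sketch. This is not the API Gateway's actual implementation. The `TokenBucket` and `RateLimiter` classes, the client name `bot-a`, and the model names `enwiki-goodfaith` and `enwiki-damaging` are hypothetical. They are chosen only to show how the choice of limiting key changes what a client experiences.

```python
from typing import Dict, Hashable, Tuple


class TokenBucket:
    """Holds up to `capacity` tokens, refilled continuously at `rate` tokens/second."""

    def __init__(self, capacity: float, rate: float, now: float) -> None:
        self.capacity = capacity
        self.rate = rate
        self.tokens = capacity
        self.last = now

    def allow(self, now: float) -> bool:
        # Refill in proportion to the time elapsed since the last call, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class RateLimiter:
    """Hypothetical limiter keeping one token bucket per key."""

    def __init__(self, capacity: float, rate: float, per_service: bool) -> None:
        self.capacity = capacity
        self.rate = rate
        self.per_service = per_service
        self.buckets: Dict[Hashable, TokenBucket] = {}

    def _key(self, client: str, service: str) -> Hashable:
        # Global scope: one budget per client, shared by every model/service.
        # Per-service scope: a separate budget for each (client, service) pair.
        return (client, service) if self.per_service else client

    def allow(self, client: str, service: str, now: float) -> bool:
        key = self._key(client, service)
        if key not in self.buckets:
            self.buckets[key] = TokenBucket(self.capacity, self.rate, now)
        return self.buckets[key].allow(now)


def demo(per_service: bool) -> Tuple[int, bool]:
    """Spend a client's whole budget on one model, then call a second model."""
    limiter = RateLimiter(capacity=5, rate=1.0, per_service=per_service)
    served = sum(limiter.allow("bot-a", "enwiki-goodfaith", now=0.0) for _ in range(10))
    other_model_ok = limiter.allow("bot-a", "enwiki-damaging", now=0.0)
    return served, other_model_ok


if __name__ == "__main__":
    print("global:", demo(per_service=False))
    print("per-service:", demo(per_service=True))
```

With the global key, a client that exhausts its budget on one model is also refused on every other model. This is the situation accraze raises at 16:15. Keying on (client, model) gives each model its own budget, at the cost of tracking more buckets.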
[16:01:13] Machine-Learning-Team, DC-Ops, ops-codfw: (Need By: TBD) rack/setup/install ml-staging200[12] - https://phabricator.wikimedia.org/T294946 (RobH)
[16:01:30] Machine-Learning-Team, DC-Ops, ops-codfw: (Need By: TBD) rack/setup/install ml-staging200[12] - https://phabricator.wikimedia.org/T294946 (RobH) a:Papaul
[16:01:58] o/
[16:04:20] o/
[16:14:19] elukey: i saw your note about global rate-limiting on api-gateway, i was hoping that wasn't the case :(
[16:15:09] iiuc, the way it is currently set up, a user would get rate-limited on all models, not on a per-model basis..?
[16:20:53] Machine-Learning-Team, DC-Ops, ops-eqiad: Q2:(Need By: TBD) rack/setup/install ml-serve100[5-8] - https://phabricator.wikimedia.org/T294949 (RobH)
[16:21:15] Machine-Learning-Team, DC-Ops, ops-eqiad: Q2:(Need By: TBD) rack/setup/install ml-serve100[5-8] - https://phabricator.wikimedia.org/T294949 (RobH)
[16:21:29] Machine-Learning-Team, DC-Ops, ops-eqiad: Q2:(Need By: TBD) rack/setup/install ml-serve100[5-8] - https://phabricator.wikimedia.org/T294949 (RobH) a:Jclark-ctr
[16:23:48] accraze: yeah, also the rate limit is global among all services afaics
[16:24:47] we'll need to work with platform to add the features that we need, but I also suspect that we'll have to implement stricter policies on our side
[18:15:27] folks can you give me your Wikitech usernames?
[18:15:38] so I can list them as project admins for the VPS project
[18:16:31] I can probably check LDAP
[18:20:58] elukey: `Klausman`
[18:25:59] created https://phabricator.wikimedia.org/T294964
[18:26:22] accraze: can you check --^ and see if it makes sense? (from the use case point of view)
[18:26:43] klausman: perfect thanks
[18:28:05] feel free to change the task as needed
[18:28:08] going afk o/
[18:29:51] sounds good, see ya elukey!
[21:46:17] elukey: accraze chrisalbon hey hey, I don't know if you know about the stretch migration in wmcs. https://wikitech.wikimedia.org/wiki/News/Stretch_deprecation This is gonna affect ores as basically everything is on stretch there (https://os-deprecation.toolforge.org/). I don't mind upgrading them, it's not that hard but the problem is that the pickles for them are the same as production and production is also on stretch.
[22:11:16] Amir1: hey! thanks for the reminder -- we've been discussing this off-and-on for the past couple of weeks, not sure if we've decided on a plan of action yet
[22:12:29] we ran into issues packaging different model classes for lift wing -- i think we'd only be able to upgrade from stretch to jessie and have everything still work
[22:13:21] the pickle issue is a big concern, also some of the model classes did not play nice with newer versions of python (3.7+)
[22:37:00] noted. Let me know if I can help with anything
[22:37:08] especially since it seems quite fun
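The stretch discussion at 21:46 and 22:13 turns on model pickles being tied to the environment that produced them. A pickle records the protocol it was written with and refers to classes only by module and name, which are resolved again at load time. The hedged standard-library sketch below lists both for a trusted pickle. The helper names `pickle_protocol`, `RecordingUnpickler` and `inspect_pickle` are hypothetical. This is not how the ORES or Lift Wing models were actually checked. Unpickling executes code, so run it only on files you trust, such as your own model binaries.

```python
import io
import pickle
import pickletools
from collections import OrderedDict
from typing import BinaryIO, List, Optional, Tuple


def pickle_protocol(data: bytes) -> Optional[int]:
    """Return the protocol named by a leading PROTO opcode (written for protocol >= 2).

    Protocol 0 and 1 pickles have no PROTO opcode, so None is returned for them.
    """
    first = next(pickletools.genops(data), None)
    if first is not None and first[0].name == "PROTO":
        return first[1]
    return None


class RecordingUnpickler(pickle.Unpickler):
    """Hypothetical helper: records every (module, name) global the pickle loads."""

    def __init__(self, file: BinaryIO) -> None:
        super().__init__(file)
        self.globals_seen: List[Tuple[str, str]] = []

    def find_class(self, module: str, name: str):
        self.globals_seen.append((module, name))
        # Normal lookup: a missing module or attribute raises here, which is the
        # failure a pickle hits when the target environment lacks that class.
        return super().find_class(module, name)


def inspect_pickle(data: bytes) -> Tuple[Optional[int], List[Tuple[str, str]]]:
    """Load a TRUSTED pickle and report (protocol, globals it references)."""
    unpickler = RecordingUnpickler(io.BytesIO(data))
    unpickler.load()
    return pickle_protocol(data), unpickler.globals_seen


if __name__ == "__main__":
    blob = pickle.dumps(OrderedDict(score=0.9), protocol=2)
    protocol, globals_seen = inspect_pickle(blob)
    print("protocol:", protocol)
    for module, name in globals_seen:
        print(f"needs {module}.{name}")
```

Every (module, name) pair listed must still resolve under the target Python and library versions. A missing module surfaces as an `ImportError` from `find_class`, and a missing or renamed class surfaces as an `AttributeError`. Separately, a pickle written with a protocol newer than the loading interpreter supports is rejected with a `ValueError`.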