[06:38:29] Good morning
[06:54:12] morning folks!
[07:07:10] morning morning o/
[07:09:08] both the article-country and outlink isvcs get every page-change from Wikipedia and run predictions for them using: `minReplicas: 1` and `maxReplicas: 5` in LW prod.
[07:09:17] adding the rrla event-stream to prod is a WIP in: https://gerrit.wikimedia.org/r/1133742
[07:09:32] given that rrla is a high traffic isvc, it already uses: `minReplicas: 5` and `maxReplicas: 15`.
[07:09:50] isaranto: klausman: what would be the ideal bump on these resources to enable the rrla isvc to sustain the new traffic for this event stream running predictions for every Wikipedia article page-change?
[07:16:31] the service seems to be able to handle many more requests than it currently is. in istio I see 12-20 rps and p95 latency is below 300ms. This is ongoing with 5-6 replicas so I assume that the current settings are more than enough
[07:38:19] (CR) Ilias Sarantopoulos: [C:+1] "Thanks for working on this kevin. LGTM with a minor suggestion!" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1133416 (https://phabricator.wikimedia.org/T386100) (owner: Kevin Bazira)
[07:38:50] isaranto: super! thank you for the clarification. the patch is ready for review.
[07:41:14] great, I've +1'd! Let's keep an eye on it while deploying
[07:54:08] (PS3) Kevin Bazira: Makefile: add support for edit-check [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1133416 (https://phabricator.wikimedia.org/T386100)
[07:55:41] (CR) Kevin Bazira: Makefile: add support for edit-check (1 comment) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1133416 (https://phabricator.wikimedia.org/T386100) (owner: Kevin Bazira)
[07:59:27] okok... I am going to deploy now and keep an eye on it, although changeprop has not yet been set to send traffic to LW production. I'll also monitor the resources closely when we reach that part.
[08:16:13] (CR) Ilias Sarantopoulos: [C:+1] Makefile: add support for edit-check (1 comment) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1133416 (https://phabricator.wikimedia.org/T386100) (owner: Kevin Bazira)
[08:17:32] (CR) Kevin Bazira: [C:+2] Makefile: add support for edit-check [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1133416 (https://phabricator.wikimedia.org/T386100) (owner: Kevin Bazira)
[08:18:24] (Merged) jenkins-bot: Makefile: add support for edit-check [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1133416 (https://phabricator.wikimedia.org/T386100) (owner: Kevin Bazira)
[08:22:19] the rrla model-server that supports event streams is now live in LW prod: https://phabricator.wikimedia.org/P74590
[08:22:45] \o/
[10:53:33] (PS7) AikoChou: edit-check: add SHAP values [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1133426 (https://phabricator.wikimedia.org/T387984)
[10:54:12] ---^ ready for review
[10:55:20] klausman: o/ lemme know when you apply the rate limit removal
[11:03:21] (PS1) Ilias Sarantopoulos: add docker compose for edit check cpu [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1133863
[11:08:20] (PS1) Ilias Sarantopoulos: fix: fix edit-check blubber file [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1133867
[11:09:51] aiko: I added the 2 patches above. I kinda messed up the docker image when I added unit tests :D
[11:10:26] the other patch makes it easier to run the model with torch on cpu to avoid any issues some of us (me!) experience with apple silicon
[11:14:25] Machine-Learning-Team, LDAP-Access-Requests, SRE, SRE-Access-Requests: Requesting access to analytics-privatedata-users & Kerberos identity & deployment POSIX group & ml-team-admins for Ozge Karakaya - https://phabricator.wikimedia.org/T390855#10707360 (Jelto)
[11:14:37] Machine-Learning-Team, LDAP-Access-Requests, SRE, SRE-Access-Requests: Requesting access to analytics-privatedata-users & Kerberos identity & deployment POSIX group & ml-team-admins for Ozge Karakaya - https://phabricator.wikimedia.org/T390855#10707361 (Jelto) Open→In progress p:Triage...
[11:28:20] isaranto: I'm finishing up something else, how about in 15m or so?
[11:30:47] I'm going for lunch and then meetings so feel free to do it whenever you can and I can test it in the evening
[11:36:10] Thanks!
[11:38:22] ack, will proceed.
[11:48:40] Machine-Learning-Team, LDAP-Access-Requests, SRE, SRE-Access-Requests: Requesting access to analytics-privatedata-users & Kerberos identity & deployment POSIX group & ml-team-admins for Ozge Karakaya - https://phabricator.wikimedia.org/T390855#10707471 (Jelto) a:Jelto This need approval from:...
[11:48:56] Machine-Learning-Team, LDAP-Access-Requests, SRE, SRE-Access-Requests: Requesting access to analytics-privatedata-users & Kerberos identity & deployment POSIX group & ml-team-admins for Ozge Karakaya - https://phabricator.wikimedia.org/T390855#10707473 (Jelto)
[12:18:51] Machine-Learning-Team, LDAP-Access-Requests, SRE, SRE-Access-Requests: Requesting access to analytics-privatedata-users & Kerberos identity & deployment POSIX group & ml-team-admins for Ozge Karakaya - https://phabricator.wikimedia.org/T390855#10707587 (isarantopoulos) I approve
[12:43:03] Machine-Learning-Team, LDAP-Access-Requests, SRE, SRE-Access-Requests: Requesting access to analytics-privatedata-users & Kerberos identity & deployment POSIX group & ml-team-admins for Ozge Karakaya - https://phabricator.wikimedia.org/T390855#10707701 (Jelto)
[12:45:28] Machine-Learning-Team, Add-Link, Growth-Team: Make airflow-dag for addalink training pipeline output compatible with deployed model - https://phabricator.wikimedia.org/T388258#10707704 (Ottomata) > the models need to be available in a specific format (mostly pickled dictionaries are saved as sqlite f...
[12:47:11] Machine-Learning-Team, Add-Link, Growth-Team: Make airflow-dag for addalink training pipeline output compatible with deployed model - https://phabricator.wikimedia.org/T388258#10707711 (Ottomata) Also: - What is the data need to train this model? - What are the inputs to this model at runtime?
[12:48:59] Machine-Learning-Team, Add-Link, Growth-Team: Make airflow-dag for addalink training pipeline output compatible with deployed model - https://phabricator.wikimedia.org/T388258#10707714 (Ottomata) Last fall, I did a solo deep dive to try and understand the (old, pre-airflow?) AddALink data and model d...
[12:49:05] Machine-Learning-Team, Add-Link, Growth-Team: Make airflow-dag for addalink training pipeline output compatible with deployed model - https://phabricator.wikimedia.org/T388258#10707715 (Ottomata) BTW, is https://analytics.wikimedia.org/published/datasets/one-off/research-mwaddlink/ actively used?
[12:56:13] isaranto: ratelimit change all pushed
[12:58:03] Machine-Learning-Team, LDAP-Access-Requests, SRE, SRE-Access-Requests, Patch-For-Review: Requesting access to analytics-privatedata-users & Kerberos identity & deployment POSIX group & ml-team-admins for Ozge Karakaya - https://phabricator.wikimedia.org/T390855#10707756 (Jelto) I reached out...
[12:59:42] (CR) Kevin Bazira: [C:+1] "LGTM!" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1133863 (owner: Ilias Sarantopoulos)
[13:51:09] Machine-Learning-Team, Add-Link, Growth-Team: Make airflow-dag for addalink training pipeline output compatible with deployed model - https://phabricator.wikimedia.org/T388258#10708029 (Michael) >>! In T388258#10707715, @Ottomata wrote: > BTW, is https://analytics.wikimedia.org/published/datasets/one...
[15:22:30] (PS2) Ilias Sarantopoulos: fix: fix edit-check blubber file [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1133867
[15:24:53] (CR) Ilias Sarantopoulos: [C:+2] fix: fix edit-check blubber file [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1133867 (owner: Ilias Sarantopoulos)
[15:58:32] going afk folks, cu tomorrow!
[15:59:46] o/
[20:55:48] (PS1) Umherirrender: Add documentation to undocumented parameterless functions [extensions/ORES] - https://gerrit.wikimedia.org/r/1134062
[22:14:55] (CR) Reedy: [C:+2] Add documentation to undocumented parameterless functions [extensions/ORES] - https://gerrit.wikimedia.org/r/1134062 (owner: Umherirrender)
[22:58:29] (Merged) jenkins-bot: Add documentation to undocumented parameterless functions [extensions/ORES] - https://gerrit.wikimedia.org/r/1134062 (owner: Umherirrender)
[23:14:12] Machine-Learning-Team, MediaWiki-extensions-ORES: Run PurgeScoreCache.php with the 'old' option on all wikis that have ORES enabled - https://phabricator.wikimedia.org/T391055 (jsn.sherman) NEW
[23:17:02] Machine-Learning-Team, MediaWiki-extensions-ORES: Run PurgeScoreCache.php with the 'old' option on all wikis that have ORES enabled - https://phabricator.wikimedia.org/T391055#10711117 (jsn.sherman) @Ladsgroup I've been poking around ORES and recent changes related things and thought this might be of int...