[06:28:35] good morning :)
[06:37:40] Machine-Learning-Team, ORES: ORES server does not start due to flask dependency conflicts - https://phabricator.wikimedia.org/T309862 (elukey) Hi! Thanks a lot for your interest in the ML team :) ORES is our current platform but we are building a new one called "Lift Wing", that will be entirely on Kube...
[06:41:25] very nice pull request - https://github.com/wikimedia/ores/pull/359
[06:41:34] added Aiko and Kevin to it --^
[06:43:23] Machine-Learning-Team, ORES: ORES gives internal error on an invalid model_info parameter - https://phabricator.wikimedia.org/T279271 (elukey) Thanks a lot for the pull request! My team is going to review it and get back to you :)
[06:52:33] kevinbazira: o/ I merged your changes for articlequality isvcs
[06:53:11] thanks for the review elukey. will deploy soon.
[07:54:17] ah lovely, I am adding the code to send a revision score event to eventgate in our model.pys
[07:54:34] and I was convinced that we fetched content from the mw api directly
[07:54:44] but no, we use revscoring
[07:54:45] sigh
[07:56:56] so the main issue is that to construct a revision-score event, I'd need rev-id metadata
[07:57:06] that afaics are not available from revscoring
[07:57:16] so the quick solution would be to make another HTTP call to the mw api
[07:57:21] increasing the latency
[08:01:05] also, using the async mwapi stuff may not be straightforward
[08:17:01] the ideal scenario would be to
[08:17:11] 1) get the mw content via async api
[08:17:19] 2) pass it to revscoring for feature extraction etc..
[08:17:27] 3) use metadata to create the revision-score event
[08:26:27] but from what I can see it seems very difficult to achieve the result
[08:47:07] ok I discovered something interesting
[08:47:33] mmm no nevermind
[08:51:06] good morning folks :)
[08:51:28] hello aiko :)
[08:58:54] aiko: one question from the editquality model.py code
[08:59:01] (just to understand if I got it correctly)
[08:59:25] when we use the revscoring extractor, behind the scenes it makes a call to the mw api
[09:00:22] from what I can see, we do
[09:00:32] 1) self.extractor.extract(rev_id, self.model.features) to get the list of features for the base use case
[09:00:46] 2) if extended_output is true, we do the same call but with a trimmed list of features
[09:01:00] so, IIUC, in the latter we call the mw api twice
[09:01:05] is it the right understanding?
[09:02:33] I am trying to see if we can call the mw api once (via async/await), and then instruct revscoring to just use that content
[09:14:43] elukey: yes, that's correct, in the latter case we call mw api twice.
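The double MediaWiki API call described above could, in principle, be avoided by fetching the revision data once and reusing it for both feature lists. The following is only a minimal sketch of that idea: `fetch_revision`, `extract`, and `score_inputs` are hypothetical stand-ins invented for illustration, not revscoring's API, and the fetch is stubbed so nothing touches the network.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Counter used only to show how many times the (pretend) mw api is hit.
CALLS = {"mw_api": 0}


def fetch_revision(rev_id: int) -> Dict[str, object]:
    """Hypothetical stand-in for the MediaWiki API call (stubbed, no network)."""
    CALLS["mw_api"] += 1
    return {"rev_id": rev_id, "text": "some wikitext"}


# Hypothetical feature: a name plus a function of the fetched revision data.
Feature = Tuple[str, Callable[[Dict[str, object]], object]]


def extract(rev: Dict[str, object], features: List[Feature]) -> Dict[str, object]:
    """Hypothetical stand-in for feature extraction over already-fetched data."""
    return {name: fn(rev) for name, fn in features}


def score_inputs(
    rev_id: int,
    base_features: List[Feature],
    trimmed_features: List[Feature],
    extended_output: bool,
) -> Tuple[Dict[str, object], Optional[Dict[str, object]]]:
    rev = fetch_revision(rev_id)  # single fetch, shared by both extractions
    base = extract(rev, base_features)
    extended = extract(rev, trimmed_features) if extended_output else None
    return base, extended
```

Whether revscoring's extractor can be fed pre-fetched data this way is exactly the open question in the conversation; the sketch only shows the shape of the goal.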
[09:15:14] not sure if we can add base features list and trimmed features list together and call self.extractor.extract only once
[09:18:06] https://github.com/wikimedia/revscoring/blob/master/revscoring/extractors/api/extractor.py#L58 seems it is a custom class ~revscoring.dependents.dependent.Dependent
[09:30:43] aiko: yeah I am wondering if we could use the same list without trimming, not sure if it is needed or not
[09:31:10] the other thing that I am wondering is if we could leverage the "cache" field of the extractor to pass what is retrieved by the mw api
[09:33:16] something like
[09:33:38] 1) we get the rev-id content/metadata from the mw api from a regular HTTP call, not from revscoring
[09:33:47] 2) we create the cache extractor parameter
[09:34:07] 3) we pass it to the revscoring extractor function, that hopefully will use it (without calling the mwapi)
[09:34:30] 4) data in 1) could be re-used to create the mediawiki-revisionscore event as well
[09:50:51] Machine-Learning-Team, ORES: ORES server does not start due to flask dependency conflicts - https://phabricator.wikimedia.org/T309862 (Gethan) Thanks for all the details on Lift Wing. I will go through it. Wish you all the best for the migration. I may review a few more tasks in ORES or revscoring to m...
[09:51:01] I found something like
[09:51:02] values = solve(features, cache={revision.text: "I think it is stupid."})
[09:57:04] caches : `dict`
[09:57:04] A rev_id-->cache pairs of call-specific pre-computed values to
[09:57:07] inject
[09:57:13] this is the pydoc for the extractor
[09:57:29] it doesn't really explain what the cache should look like
[09:57:41] but IIUC it should be something that replaces the mw api call
[09:58:09] so, if I am right, we could even have a separate transformer that calls the mw api and that encodes its result into a cache dict
[09:58:20] it would simplify our life a lot
[09:58:23] does it make any sense?
[10:22:12] yep, it makes sense.
that would be nice if we can use the cache field
[10:31:19] I wish there was an example
[10:38:54] * elukey lunch!
[13:08:34] elukey: one question: you mentioned we currently have some restrictions for memory/cpu in production. I wonder what are they specifically? how many cpu and how large memory for each pod?
[13:11:41] aiko: so in k8s we don't assign cpus directly, see https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/
[13:12:19] I am still not 100% familiar with how cpus are assigned to pods, but basically it is cpu time (shared among pods)
[13:12:32] there is a request (when the pod is created) and a limit (maximum amount)
[13:12:49] IIRC we use kserve's defaults, lemme see
[13:15:25] so the kserve container
[13:15:28] Limits:
[13:15:28]   cpu: 1
[13:15:28]   memory: 2Gi
[13:15:28] Requests:
[13:15:28]   cpu: 1
[13:15:30]   memory: 2Gi
[13:15:33] aiko: --^
[13:15:43] these can be changed in case
[13:17:19] elukey: I see.. thanks! :)
[14:52:16] starting deployment for svwiki & trwiki articlequality isvcs
[14:53:24] ack!
[14:53:36] the work on revscoring's cache for the extractor is a bit of a mess
[14:53:38] sigh
[15:00:07] yep ... we are bound to run into hairy challenges as we work towards feature parity
[15:00:10] both eqiad and codfw deployments have been completed successfully.
[15:00:24] super
[15:00:42] checking pods now ...
[15:00:49] kevinbazira: the main issue with revscoring atm is that we can't use http async conns to the mw api
[15:00:57] unless we change something in it
[15:03:07] the code that runs on the pod now is totally blocking, and it doesn't play well with the kserve architecture
[15:06:35] Lift-Wing, Epic, Machine-Learning-Team (Active Tasks), Patch-For-Review: Create the ml-serve-staging k8s cluster - https://phabricator.wikimedia.org/T302195 (elukey) Completed the basic networking work (calico, eventrouter, coredns) + BGP config. Next step: https://wikitech.wikimedia.org/wiki/Ku...
[15:08:05] OMW ...
we'll find ourselves changing so many little things on revscoring.
[15:08:05] all new pods are up and running.
[15:08:05] NAME                                                              READY   STATUS    RESTARTS   AGE
[15:08:05] svwiki-articlequality-predictor-default-k8rwc-deployment-5rgfp4   3/3     Running   0          5m52s
[15:08:05] trwiki-articlequality-predictor-default-lkmb5-deployment-5jg9vn   3/3     Running   0          5m50s
[15:08:15] Lift-Wing, Machine-Learning-Team (Active Tasks): Test async preprocess on kserve - https://phabricator.wikimedia.org/T309623 (elukey)
[15:12:00] Lift-Wing, Machine-Learning-Team (Active Tasks): Test async preprocess on kserve - https://phabricator.wikimedia.org/T309623 (elukey) While working on another task, I had to check some details of how revscoring is currently handling http connections to the mw api. For T301878 it would be nice to make a s...
[15:12:08] kevinbazira: yeah but this one seems very big, I added a note to the related task
[15:16:12] Lift-Wing, Epic, Machine-Learning-Team (Active Tasks): Send score to eventgate when requested - https://phabricator.wikimedia.org/T301878 (elukey) @Ottomata Hi! I am getting back to this task, and after some preliminary checks it seems that calling the eventgate endpoint is the best way to go with ks...
[15:22:32] all right logging off for today o/
[15:22:33] away afk!
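The plan floated earlier in the log (fetch from the mw api once with an async call, hand the result to the extractor through a cache, and reuse the same data for the revision-score event) might look roughly like the sketch below. Everything here is an assumption made for illustration: `fetch_revision`, `build_cache`, `blocking_extract`, and `build_event` are hypothetical names, the event fields are not the real eventgate schema, and the cache uses a plain string key whereas the `solve(...)` example quoted in the chat keys the cache on the `revision.text` datasource object. Running the blocking step via `run_in_executor` is one generic asyncio workaround for blocking code, not something the log says was adopted.

```python
import asyncio
from typing import Any, Dict, List, Tuple


async def fetch_revision(rev_id: int) -> Dict[str, Any]:
    """Hypothetical async mw api call (stubbed, placeholder data, no network)."""
    await asyncio.sleep(0)
    return {"rev_id": rev_id, "text": "some wikitext", "timestamp": "PLACEHOLDER"}


def build_cache(rev: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical cache builder; the real cache key format is an open question."""
    return {"revision.text": rev["text"]}


def blocking_extract(rev_id: int, cache: Dict[str, Any]) -> List[int]:
    """Stand-in for blocking feature extraction that only reads the cache."""
    return [len(cache["revision.text"])]


def build_event(rev: Dict[str, Any], features: List[int]) -> Dict[str, Any]:
    """Hypothetical event payload built from the already-fetched metadata."""
    return {
        "rev_id": rev["rev_id"],
        "rev_timestamp": rev["timestamp"],
        "features": features,
    }


async def preprocess(rev_id: int) -> Tuple[List[int], Dict[str, Any]]:
    rev = await fetch_revision(rev_id)  # 1) one async fetch
    cache = build_cache(rev)            # 2) encode it as a cache
    loop = asyncio.get_running_loop()
    # 3) keep the event loop free while the blocking extraction runs
    features = await loop.run_in_executor(None, blocking_extract, rev_id, cache)
    event = build_event(rev, features)  # 4) reuse the same data for the event
    return features, event
```

Calling `asyncio.run(preprocess(42))` exercises the whole flow with the stubs; swapping in real calls would depend on how revscoring actually consumes its `cache`/`caches` parameters.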