[07:09:43] Machine-Learning-Team, MediaWiki-extensions-ORES: EnWiki Recent Changes Page no longer displays damaging filters - https://phabricator.wikimedia.org/T331045 (elukey)
[07:21:05] good morning :)
[07:25:31] isaranto: o/ another bot/app is https://github.com/SWViewer/tool-swviewer
[07:38:53] Machine-Learning-Team, MediaWiki-extensions-ORES: EnWiki Recent Changes Page no longer displays damaging filters - https://phabricator.wikimedia.org/T331045 (elukey) I don't see the damaging filter in https://en.wikipedia.org/wiki/Special:RecentChanges, really strange. I quickly checked in SAL and gerrit...
[07:42:04] Machine-Learning-Team, Data-Engineering, Event-Platform Value Stream, Research: Proposal: Create a stream end point for Revision Risk Model - https://phabricator.wikimedia.org/T326179 (elukey) Just to clarify - the ML team is happy to support any input/output schema that is reasonable. We are try...
[08:04:02] Machine-Learning-Team, Data-Engineering, Edit-Review-Improvements-Integrated-Filters, Event-Platform Value Stream, and 2 others: Integration of Revert Risk Scores to Recent Changes as a filter - https://phabricator.wikimedia.org/T329071 (elukey) Thanks a lot! > Regarding the jobs, the reason ore...
[08:54:49] o/ thanks Luca
[08:55:13] I'll wrap up what I'm doing today and spend Monday documenting the bots/tools that use ORES
[08:57:06] isaranto: ack! I just rolled out a change to remove the usage of /v2/scores from our infra (now we use /v3/scores)
[08:57:13] to remove some noise
[08:59:18] Machine-Learning-Team, Patch-For-Review: Implement new mediawiki.revision-score streams with Lift Wing - https://phabricator.wikimedia.org/T328576 (elukey) >>! In T328576#8638356, @Ottomata wrote: > I know this is more work, and maybe not worth it since we want to eventually deprecate these ORES models (...
[09:11:00] Machine-Learning-Team: [nsfw] Upgrade python and debian in docker image - https://phabricator.wikimedia.org/T329612 (elukey) New Docker images deployed to staging (+ httpbb testing) and production.
[09:13:13] Machine-Learning-Team: [revertrisk] Upgrade python from 3.7 to 3.9 in docker images - https://phabricator.wikimedia.org/T328439 (elukey) Double checked after the prod upgrade to k8s 1.23: ` root@deploy2002:~# kubectl exec revertrisk-multilingual-predictor-default-00001-deploymenttmv5r -n experimental -- cat...
[09:15:01] Machine-Learning-Team: [outlink] Upgrade python from 3.7 to 3.9 in docker images - https://phabricator.wikimedia.org/T328438 (elukey) Double checked in prod after the upgrade to 1.23: ` root@deploy2002:~# kubectl exec outlink-topic-model-predictor-default-00001-deployment-76cx8kkc -n articletopic-outlink --...
[09:16:23] Lift-Wing, Machine-Learning-Team: Deploy revert-risk multilingual model to production - https://phabricator.wikimedia.org/T325218 (elukey) Models deployed to production as well, all good!
[09:17:37] aiko, isaranto - I took the liberty of completing some of the tasks that were waiting for the 1.23 upgrade
[09:18:23] Wow, thank u
[09:18:40] You are on 🔥 ⭐
[09:18:54] well you and Aiko did all the work, I just moved to done :D
[09:19:17] one thing that I am wondering is if we should finish https://phabricator.wikimedia.org/T329032 during the next weeks
[09:19:30] together with importing the new control plane
[09:19:36] so that we don't have mixed versions
[09:32:19] Agreedo
[09:32:32] I mean yeah we should get it out of the way
[09:32:39] going to see if I can send a couple of code changes today, it should be easy enough
[09:48:29] (PS1) Elukey: revscoring: remove unnecessary aiohttp session cleanup [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/894004 (https://phabricator.wikimedia.org/T329032)
[09:48:31] (PS1) Elukey: revert-risk: upgrade to Kserve 0.10 [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/894005 (https://phabricator.wikimedia.org/T329032)
[09:48:33] (PS1) Elukey: nsfw: upgrade to Kserve 0.10 [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/894006 (https://phabricator.wikimedia.org/T329032)
[09:48:35] (PS1) Elukey: outlink: upgrade to Kserve 0.10 [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/894007 (https://phabricator.wikimedia.org/T329032)
[09:52:20] (CR) CI reject: [V: -1] nsfw: upgrade to Kserve 0.10 [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/894006 (https://phabricator.wikimedia.org/T329032) (owner: Elukey)
[10:09:56] changes not ready for a review yet :)
[10:10:18] do u need me to take on one of these?
[10:25:21] nono
[10:25:55] I'll do more tests and add folks when they are ready
[10:54:18] isaranto: I am thinking about the revscoring-translation service, and the more I do the more I think that we shouldn't use istio/knative for it.. we should probably use what people follow to add a new service to wikikube
[10:54:29] so we don't need to mess with the mesh etc..
[10:54:42] but we keep it confined for kserve pods
[10:54:46] does it make sense?
[10:55:09] yeah, sure. it doesn't need istio anyway
[10:55:14] I mean too much complexity
[10:55:35] the only caveat is that to call inference.w.o we'll need a sidecar with a reverse proxy
[10:55:45] this is something that serviceops provides in their scaffold stuff
[10:55:50] plus metrics etc..
[10:56:07] we'd deploy the service on Lift Wing, but with ServiceOps' standards
[10:57:36] I'll follow up with the other sres to figure out what they think about it
[10:58:40] (PS2) Elukey: revert-risk: upgrade to Kserve 0.10 [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/894005 (https://phabricator.wikimedia.org/T329032)
[10:58:42] (PS2) Elukey: nsfw: upgrade to Kserve 0.10 [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/894006 (https://phabricator.wikimedia.org/T329032)
[10:58:44] (PS2) Elukey: outlink: upgrade to Kserve 0.10 [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/894007 (https://phabricator.wikimedia.org/T329032)
[11:01:55] (CR) CI reject: [V: -1] outlink: upgrade to Kserve 0.10 [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/894007 (https://phabricator.wikimedia.org/T329032) (owner: Elukey)
[11:05:09] ah of course
[11:12:06] Machine-Learning-Team: Upgrade Kserve's k8s control plane to 0.10 - https://phabricator.wikimedia.org/T331114 (elukey)
[11:41:11] * elukey lunch!
[11:50:48] aa, trying to install packages on deployment server on virtualenv gave me a headache :D
[11:50:53] * isaranto goes for lunch
[12:37:32] Machine-Learning-Team, DBA, Data-Engineering-Planning, Data-Persistence, and 10 others: eqiad row A switches upgrade - https://phabricator.wikimedia.org/T329073 (cmooney) a:cmooney
[12:55:07] Machine-Learning-Team, Language-Team, serviceops-radar: Hosting machine request for machine translation - https://phabricator.wikimedia.org/T329971 (akosiaris)
[12:55:20] Machine-Learning-Team, Language-Team, serviceops-radar: Hosting machine request for machine translation - https://phabricator.wikimedia.org/T329971 (akosiaris) Adding some things for transparency. We had a meeting that panned out the next few steps (some need to be done regardless of where we 'll hos...
[14:04:43] isaranto: mmm we shouldn't install anything on the deployment server, what is the use case?
[14:04:59] on stat then?
[14:05:10] i want to try out the api im building
[14:05:19] i was trying a virtualenv
[14:05:46] yes stat100x is better
[14:05:58] it is meant for experiments etc.. the deployment server is just to deploy
[14:06:30] (it is not a big host IIRC, so people tend not to execute anything on it etc..)
[14:06:40] Machine-Learning-Team, Add-Link, Growth-Team, User-notice: Deploy "add a link" to 11th round of wikis - https://phabricator.wikimedia.org/T308136 (kevinbazira)
[14:08:07] Machine-Learning-Team, Add-Link, Growth-Team, User-notice: Deploy "add a link" to 11th round of wikis - https://phabricator.wikimedia.org/T308136 (kevinbazira) a:kevinbazira→None @kostajh, we published datasets for all 21/22 models that passed the evaluation in this round.
[14:14:13] elukey: I can't install pip packages from a remote index in stat boxes. any idea?
[14:14:21] are they blocked?
[14:14:52] (a workaround is to download wheels and upload them manually - but...)
[14:15:00] Machine-Learning-Team, DBA, Data-Engineering-Planning, Data-Persistence, and 10 others: eqiad row A switches upgrade - https://phabricator.wikimedia.org/T329073 (MoritzMuehlenhoff)
[14:17:36] Machine-Learning-Team, DBA, Data-Engineering-Planning, Data-Persistence, and 11 others: eqiad row A switches upgrade - https://phabricator.wikimedia.org/T329073 (jbond)
[14:17:53] isaranto: I think that you need the http proxy
[14:17:56] Machine-Learning-Team, DBA, Data-Engineering-Planning, Data-Persistence, and 11 others: eqiad row A switches upgrade - https://phabricator.wikimedia.org/T329073 (ayounsi)
[14:18:12] I managed to do it on deployment (install wheels on virtualenv, access app through ssh tunneling). but I'll stop and remove everything related
[14:18:18] * isaranto being a good boy
[14:18:24] Machine-Learning-Team, DBA, Data-Engineering-Planning, Data-Persistence, and 11 others: eqiad row A switches upgrade - https://phabricator.wikimedia.org/T329073 (MoritzMuehlenhoff)
[14:18:58] Machine-Learning-Team, DBA, Data-Engineering-Planning, Data-Persistence, and 11 others: eqiad row A switches upgrade - https://phabricator.wikimedia.org/T329073 (MoritzMuehlenhoff)
[14:31:39] yes, forgot about the proxy. done! working on stat4 then. I made my life difficult for no reason :D
[14:33:37] Machine-Learning-Team, ORES, Analytics-Radar, Data Pipelines, and 2 others: Discuss Wikistats integration for ORES - https://phabricator.wikimedia.org/T184479 (lbowmaker) Open→Declined Marking this as declined for now. Looking at the history it seems like nothing happened since 2019 and t...
[14:44:52] Machine-Learning-Team, DBA, Data-Engineering-Planning, Infrastructure-Foundations, and 8 others: eqiad row B switches upgrade - https://phabricator.wikimedia.org/T330165 (lbowmaker)
[14:45:25] Machine-Learning-Team, DBA, Data Pipelines, Data-Engineering-Planning, and 9 others: eqiad row B switches upgrade - https://phabricator.wikimedia.org/T330165 (lbowmaker)
[14:51:02] isaranto: thanks, sorry for the extra move but we'll surely not raise eyebrows in the SRE area :D
[14:51:19] :D
[14:51:39] no no it makes sense
[14:55:16] Machine-Learning-Team, DBA, Data-Engineering-Planning, Data-Persistence, and 11 others: eqiad row A switches upgrade - https://phabricator.wikimedia.org/T329073 (elukey)
[14:57:36] Machine-Learning-Team, Data-Engineering-Planning, Edit-Review-Improvements-Integrated-Filters, Event-Platform Value Stream, and 2 others: Integration of Revert Risk Scores to Recent Changes as a filter - https://phabricator.wikimedia.org/T329071 (lbowmaker)
[14:57:54] Machine-Learning-Team, DBA, Data-Engineering-Planning, Data-Persistence, and 11 others: eqiad row A switches upgrade - https://phabricator.wikimedia.org/T329073 (fgiunchedi)
[14:59:35] it seems like it works :)
[14:59:54] needs a lot of unit testing though
[15:01:34] \o/
[15:03:05] (PS3) Elukey: outlink: upgrade to Kserve 0.10 [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/894007 (https://phabricator.wikimedia.org/T329032)
[15:10:27] Machine-Learning-Team, DBA, Data-Engineering-Planning, Data-Persistence, and 11 others: eqiad row A switches upgrade - https://phabricator.wikimedia.org/T329073 (herron)
[15:17:35] Machine-Learning-Team, DBA, Data-Engineering-Planning, Data-Persistence, and 11 others: eqiad row A switches upgrade - https://phabricator.wikimedia.org/T329073 (BTullis)
[15:22:12] Machine-Learning-Team, DBA, Data-Engineering-Planning, Data-Persistence, and 11 others: eqiad row A switches upgrade - https://phabricator.wikimedia.org/T329073 (hnowlan)
[15:22:45] going afk for a bit to visit the co-working/office, I'll check later on for pings :)
[15:35:34] for the endpoint, the first results I see are that a single call takes as much time as liftwing takes for a call, 2 calls take double, etc. so async calls will help a lot
[15:38:14] Machine-Learning-Team, artificial-intelligence, Research ideas, Research-Backlog, Wiki-Loves-Monuments: General image classifier for commons - https://phabricator.wikimedia.org/T155538 (Miriam)
[16:21:16] ending a week where it has taken me approx 4 hours debugging typos.. lifwing instead of liftwing and revid instead of rev_id.. 🤦‍♂️
[17:04:41] I updated the patch but it is not ready for review https://gerrit.wikimedia.org/r/c/machinelearning/liftwing/inference-services/+/892998
[17:04:41] More to follow on Monday. enjoy the weekend!
[17:04:47] \o
[17:04:50] ccccccteedgnbrffreruvgjilhbibjluilvijgivhjhu
[17:05:07] Gah. leaning on kbds is not great :)
[17:05:25] haha
[17:05:31] interesting sequence though
[17:06:04] I am sure it contains the answer to fife, the universe and all the rest
[17:06:07] life*
[17:06:12] what is it with kbds and me today
[17:25:42] Machine-Learning-Team, DBA, Data-Engineering-Planning, Data-Persistence, and 11 others: eqiad row A switches upgrade - https://phabricator.wikimedia.org/T329073 (colewhite)
[17:33:52] have a good weekend folks!
[17:38:29] didn't leave yet....
[17:40:27] I was super curious to try with async, so I used aiohttp instead of requests and for an ORES call that needs 5 liftwing calls with async the time drops from ~ 5.5 seconds to ~1.5 sec
[17:41:10] on the other hand we need to try to see what inference batcher would do in kserve. anyway that's all folks. bye again!
[17:42:55] bye all!
[17:49:30] isaranto: nice work!
[17:49:45] did you use asyncio gather?
[17:49:53] anyway we can chat about it on monday, super curious
[17:50:10] yes
[17:55:36] \o/
[17:56:20] seems like it is quite stable. e.g. for an ORES request that requires 15 liftwing calls it takes 1.5 sec while the synchronous one takes 9 seconds
[17:56:43] I updated the patch if you want to take a look https://gerrit.wikimedia.org/r/c/machinelearning/liftwing/inference-services/+/892998/8/ores-migration/app/main.py#94
https://gerrit.wikimedia.org/r/c/machinelearning/liftwing/inference-services/+/892998/8/ores-migration/app/liftwing/response.py#73
[17:56:49] more on Monday
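(editor's note) The sequential vs. asyncio.gather timings discussed above can be illustrated with a small sketch. This is not the code from the linked patch: `fake_liftwing_call`, the 0.1 s delay, and the `MODELS` list are hypothetical stand-ins, and `asyncio.sleep` replaces the real HTTP request (a real client would presumably await an aiohttp session call at that point). It only shows why total time grows with the number of calls when they are awaited one by one, and stays close to the slowest single call when they are gathered.

```python
import asyncio
import time

# Illustrative model names only; the real list depends on the ORES request.
MODELS = ["damaging", "goodfaith", "articlequality", "draftquality", "articletopic"]


async def fake_liftwing_call(model: str, rev_id: int, delay: float = 0.1) -> dict:
    """Hypothetical stand-in for one Lift Wing request (sleeps instead of doing I/O)."""
    await asyncio.sleep(delay)
    return {"model": model, "rev_id": rev_id}


async def score_sequential(models, rev_id):
    # Each call is awaited before the next one starts: total time ~ len(models) * delay.
    return [await fake_liftwing_call(m, rev_id) for m in models]


async def score_concurrent(models, rev_id):
    # All calls are scheduled together; gather returns results in input order.
    return await asyncio.gather(*(fake_liftwing_call(m, rev_id) for m in models))


def timed(coro):
    """Run a coroutine to completion and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = asyncio.run(coro)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    _, seq_t = timed(score_sequential(MODELS, 12345))
    _, con_t = timed(score_concurrent(MODELS, 12345))
    print(f"sequential: {seq_t:.2f}s, concurrent: {con_t:.2f}s")
```

With real network calls the gain depends on how the backend handles parallel requests, which is why the numbers reported in the channel do not shrink to the cost of exactly one call.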