[06:42:43] (CR) Elukey: [C: +2] events: support multiple source events (1 comment) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/888190 (https://phabricator.wikimedia.org/T328576) (owner: Elukey)
[09:36:10] Machine-Learning-Team: Upgrade the link recommendation algorithm from Spark 2 to Spark 3. - https://phabricator.wikimedia.org/T323493 (MGerlach) In my opinion, one of the main issues for the migration from spark2 to spark3 will be the following (there might be other issues though): Currently, for the spark j...
[10:26:50] * elukey lunch!
[10:47:34] Machine-Learning-Team, Epic: Lift Wing improvements to get out of MVP state - https://phabricator.wikimedia.org/T333453 (achou)
[10:48:51] Machine-Learning-Team, Epic: Lift Wing improvements to get out of MVP state - https://phabricator.wikimedia.org/T333453 (achou)
[10:48:53] Lift-Wing, Machine-Learning-Team: Investigate Explainer for Revert-Risk model - https://phabricator.wikimedia.org/T330131 (achou)
[10:51:23] Lift-Wing, Machine-Learning-Team: Move Revert-risk language agnostic model from staging to production - https://phabricator.wikimedia.org/T332998 (achou)
[10:51:25] Machine-Learning-Team, Epic: Lift Wing improvements to get out of MVP state - https://phabricator.wikimedia.org/T333453 (achou)
[10:51:56] Lift-Wing, Machine-Learning-Team: Move Revert-risk multilingual model from staging to production - https://phabricator.wikimedia.org/T333124 (achou)
[10:51:58] Machine-Learning-Team, Epic: Lift Wing improvements to get out of MVP state - https://phabricator.wikimedia.org/T333453 (achou)
[10:56:25] Machine-Learning-Team, CirrusSearch, Discovery-Search (Current work): Add outlink topic model predictions to CirrusSearch indices - https://phabricator.wikimedia.org/T328276 (achou)
[10:56:27] Machine-Learning-Team, Epic: Migrate ORES clients to LiftWing - https://phabricator.wikimedia.org/T312518 (achou)
[10:57:41] Machine-Learning-Team, Data-Engineering-Planning, Event-Platform Value Stream: Add a new outlink topic stream for EventGate main - https://phabricator.wikimedia.org/T328899 (achou)
[10:57:45] Machine-Learning-Team, CirrusSearch, Discovery-Search (Current work): Add outlink topic model predictions to CirrusSearch indices - https://phabricator.wikimedia.org/T328276 (achou)
[11:31:58] o/
[11:33:23] so I deployed fastapi on minikube both ways (manual chart creation and generation with sextant). The manual one worked immediately, while for the sextant one I had to change 2 things that are probably errors
[11:34:00] I'll open a new patch in deployment charts and I'll write a comment/review on the sextant repo about my experience
[11:35:06] Machine-Learning-Team, Data-Engineering, Research, Event-Platform Value Stream (Sprint 10): Design event schema for ML scores/recommendations on current page state - https://phabricator.wikimedia.org/T331401 (achou) > I guess this is more of the question: Do we want to ever be able to do this? If...
[11:56:58] my pov is that at the moment I'd prefer not to use a tool that is not ready. My suggestion would be to migrate when it is ready.
[11:56:58] BUT since I already tried it I'll open the patch to get some feedback and if it works let's go with it
[11:57:49] ofc my opinion has to do with my lack of expertise in this area, as I see a lot of boilerplate stuff that I think is not needed, so I may be totally wrong
[12:39:16] Machine-Learning-Team, Data-Engineering-Planning, Event-Platform Value Stream: Add a new outlink topic stream for EventGate main - https://phabricator.wikimedia.org/T328899 (Ottomata)
[12:45:53] Machine-Learning-Team, Data-Engineering, Research, Event-Platform Value Stream (Sprint 10): Design event schema for ML scores/recommendations on current page state - https://phabricator.wikimedia.org/T331401 (Ottomata) > Updating the schema later to do this will not be easy, as it would be an inc...
[13:04:37] isaranto: nono it is very nice feedback, especially for SRE folks, and it will help the design of the tool in the future. Thanks a lot!
[13:30:51] Machine-Learning-Team, SRE, serviceops, Language-Team (Language-2023-April-June ), Service-deployment-requests: New Service Deployment Request: NNLB-200 for machine translation - https://phabricator.wikimedia.org/T329971 (Pginer-WMF)
[13:31:05] Machine-Learning-Team, Data-Engineering, Research, Event-Platform Value Stream (Sprint 10): Design event schema for ML scores/recommendations on current page state - https://phabricator.wikimedia.org/T331401 (achou) > what are the user use cases for having multiple classifications / embedding pre...
[13:43:10] Machine-Learning-Team, DBA, Data-Engineering, Infrastructure-Foundations, and 9 others: eqiad row C switches upgrade - https://phabricator.wikimedia.org/T331882 (Ladsgroup)
[14:38:09] The patch is ready (sort of :smile:) https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/904777
[14:38:09] At least for the first review.
I did proceed with the chart built with sextant
[14:39:02] Machine-Learning-Team, Data-Engineering, Research, Event-Platform Value Stream (Sprint 11): Design event schema for ML scores/recommendations on current page state - https://phabricator.wikimedia.org/T331401 (JArguello-WMF)
[14:41:10] isaranto: nice! So I had a quick check and I think that we'll need to create some custom stuff
[14:41:21] for example, the lamp configs are not ok for our use case IIUC
[14:41:33] we don't need any php-fpm, but probably uvicorn
[14:41:41] we don't need any of these
[14:42:03] they are created automatically but never used
[14:42:24] ok then they are surely something to report to SRE
[14:43:00] but besides lamp, we need all that's created
[14:43:09] do we need the mesh?
[14:43:37] yeah IIUC it is to have a local sidecar with envoy, to be able to proxy to various endpoints (like the inference one)
[14:43:52] they don't use istio like us, the proxy part is explicit
[14:44:10] you have to connect to envoy on localhost, and every port has its own service associated
[14:44:21] we get configuration/metrics/etc. for free
[14:44:43] I added support for it at the time with https://gerrit.wikimedia.org/r/c/operations/puppet/+/894014
[14:45:08] the istio ingress, network policies, service, etc. are all needed
[14:46:04] isaranto: the only bit that I am not sure about is how we run the Python code, do we need a specific awsgi daemon or just python to run the uvicorn server?
[14:48:42] python code runs through uvicorn in the docker image https://gerrit.wikimedia.org/r/plugins/gitiles/machinelearning/liftwing/inference-services/+/refs/heads/main/.pipeline/ores-migration/blubber.yaml#41
[14:49:36] we just need fastapi with uvicorn
[14:53:07] nice
[14:53:16] TIL https://spectrum.ieee.org/python-compiler#toggle-gdpr
[14:53:36] promising but has a long way to go
[14:55:22] isaranto: qq - from your tests, do you see access logs from uvicorn?
Like IPs + response codes + etc.
[14:55:39] lemme check
[14:55:55] because I checked what kserve produces, and it is very weird
[14:55:58] I opened https://github.com/kserve/kserve/issues/2778
[14:56:14] I created a patch to override the default config that kserve sets
[14:56:30] but in theory even without it we should see some IPs logged
[14:56:33] and I don't see anything
[14:57:16] I see stuff like
[14:57:16] 2023-03-31 14:03:39.133 16 root INFO [timing():49] kserve.io.kserve.protocol.rest.v1_endpoints.predict 0.8189318180084229, ['http_status:200', 'http_method:POST', 'time:wall']
[14:57:21] but this is not the access log
[14:59:41] yes I see this
[14:59:41] ```
[14:59:41] INFO: 172.17.0.1:43486 - "GET /v3/scores HTTP/1.1" 200 OK
[14:59:41] 2023-03-31 14:58:56,769 app.utils INFO IP:172.17.0.1, User-Agent:kube-probe/1.23
[14:59:41] 2023-03-31 14:58:56,770 app.utils INFO response_time:0.0009589195251464844s
[14:59:42] INFO: 172.17.0.1:49808 - "GET / HTTP/1.1" 200 OK
[14:59:42] ```
[15:00:08] the first and last lines come from uvicorn, the ones in the middle are our custom logging (if I'm not mistaken)
[15:00:46] ok ok so there is something in kserve that is misconfigured for sure
[15:00:58] I hope that somebody from upstream will help
[15:01:40] any way we can try to apply the patch to ml-staging to check if it works?
[15:01:53] I don't think I have permissions
[15:02:37] let's first do a round of reviews
[15:02:50] cool, I can w8 :)
[15:03:56] isaranto: is package.json auto-generated?
[15:05:19] yes. everything in the charts directory is auto-generated from sextant. The only things I changed were removing an additional {{- end}} and a line in deployment regarding volumes + adding the docker registry
[15:05:28] okok
[15:05:44] my stuff is under helmfile.d/ml-services/ores-legacy
[15:10:26] we'll probably not get much feedback at this time of the day from SRE, let's ping them on monday
[15:11:49] Sure
[15:12:28] I just have a power outage here.
Perfect timing, so logging off for the weekend. Have a good time folks!
[15:12:37] lol o/
[15:12:42] have a good weekend :)
[15:15:42] Machine-Learning-Team, Data-Engineering, Research, Event-Platform Value Stream (Sprint 11): Design event schema for ML scores/recommendations on current page state - https://phabricator.wikimedia.org/T331401 (diego) > I mentioned embeddings + classifications because embeddings usually serve as t...
[15:25:10] bye Ilias :)
[15:42:30] Machine-Learning-Team, DBA, Data-Engineering, Infrastructure-Foundations, and 9 others: eqiad row C switches upgrade - https://phabricator.wikimedia.org/T331882 (BTullis)
[16:13:57] logging off as well, have a good weekend folks!
[16:36:26] Machine-Learning-Team, Data-Engineering, Research, Event-Platform Value Stream (Sprint 11): Design event schema for ML scores/recommendations on current page state - https://phabricator.wikimedia.org/T331401 (Ottomata) > I imagine it may be useful to have them in the same event stream. We coul...
[16:53:55] Machine-Learning-Team, Data-Engineering, Research, Event-Platform Value Stream (Sprint 11): Design event schema for ML scores/recommendations on current page state - https://phabricator.wikimedia.org/T331401 (diego) >>! In T331401#8746316, @Ottomata wrote: >> I imagine it may be useful to have t...
[16:56:23] Lift-Wing, Machine-Learning-Team, Research (FY2022-23-Research-January-March): Create a language agnostic model to predict reverts on Wikipedia - https://phabricator.wikimedia.org/T314385 (diego) **Updates** * We are coordinating with the ML team to have a public end-point for these models.
[16:58:23] Machine-Learning-Team, Data-Engineering, Research, Event-Platform Value Stream (Sprint 11): Design event schema for ML scores/recommendations on current page state - https://phabricator.wikimedia.org/T331401 (Ottomata) > having them in the same stream Just to be clear! 'same event' 'same stream...
[17:20:44] Machine-Learning-Team, Data-Engineering, Research, Event-Platform Value Stream (Sprint 11): Design event schema for ML scores/recommendations on current page state - https://phabricator.wikimedia.org/T331401 (diego) Yes, I was thinking on the same event. Like: ` scores: model_name: exam...
[23:52:32] Machine-Learning-Team, DBA, Data-Engineering, Infrastructure-Foundations, and 9 others: eqiad row C switches upgrade - https://phabricator.wikimedia.org/T331882 (colewhite)
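
Note on the custom logging quoted at 14:59 (client IP, User-Agent and response time per request): a minimal sketch of how such lines could be produced, written as a plain ASGI middleware so that it needs only the standard library. The names RequestLogMiddleware, hello_app and run_once are hypothetical and this is not the actual inference-services code; only the message shapes and the app.utils logger name are taken from the output pasted in the log.

```python
import asyncio
import logging
import time

# Logger name borrowed from the pasted output; everything else is illustrative.
logger = logging.getLogger("app.utils")


class RequestLogMiddleware:
    """Hypothetical pure-ASGI middleware logging IP, User-Agent and response time."""

    def __init__(self, app):
        self.app = app

    async def __call__(self, scope, receive, send):
        # Only HTTP requests are logged; lifespan/websocket scopes pass through.
        if scope["type"] != "http":
            await self.app(scope, receive, send)
            return
        client = scope.get("client") or ("unknown", 0)
        headers = dict(scope.get("headers") or [])
        user_agent = headers.get(b"user-agent", b"-").decode("latin-1")
        logger.info("IP:%s, User-Agent:%s", client[0], user_agent)
        start = time.perf_counter()
        try:
            await self.app(scope, receive, send)
        finally:
            logger.info("response_time:%ss", time.perf_counter() - start)


async def hello_app(scope, receive, send):
    """Tiny stand-in ASGI app that always answers 200 with the body b'ok'."""
    await send({
        "type": "http.response.start",
        "status": 200,
        "headers": [(b"content-type", b"text/plain")],
    })
    await send({"type": "http.response.body", "body": b"ok"})


def run_once(app, scope):
    """Drive one request through an ASGI app without a server; return sent messages."""
    sent = []

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        sent.append(message)

    asyncio.run(app(scope, receive, send))
    return sent
```

Under these assumptions the middleware would wrap whatever ASGI application uvicorn serves; with FastAPI that would typically be app.add_middleware(RequestLogMiddleware). The "INFO: 172.17.0.1:43486 - ..." lines in the paste come from uvicorn's own access logger and are not produced by this sketch.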