[01:03:09] (PS36) Kevin Bazira: Set up production and test images for the recommendation-api migration [research/recommendation-api] - https://gerrit.wikimedia.org/r/932810 (https://phabricator.wikimedia.org/T339890)
[01:04:14] (CR) CI reject: [V: -1] Set up production and test images for the recommendation-api migration [research/recommendation-api] - https://gerrit.wikimedia.org/r/932810 (https://phabricator.wikimedia.org/T339890) (owner: Kevin Bazira)
[01:12:07] (PS37) Kevin Bazira: Set up production and test images for the recommendation-api migration [research/recommendation-api] - https://gerrit.wikimedia.org/r/932810 (https://phabricator.wikimedia.org/T339890)
[01:37:05] (CR) CI reject: [V: -1] Set up production and test images for the recommendation-api migration [research/recommendation-api] - https://gerrit.wikimedia.org/r/932810 (https://phabricator.wikimedia.org/T339890) (owner: Kevin Bazira)
[02:58:07] (PS38) Kevin Bazira: Set up production and test images for the recommendation-api migration [research/recommendation-api] - https://gerrit.wikimedia.org/r/932810 (https://phabricator.wikimedia.org/T339890)
[03:09:40] (PS39) Kevin Bazira: Set up production and test images for the recommendation-api migration [research/recommendation-api] - https://gerrit.wikimedia.org/r/932810 (https://phabricator.wikimedia.org/T339890)
[03:17:32] (CR) Kevin Bazira: Set up production and test images for the recommendation-api migration (2 comments) [research/recommendation-api] - https://gerrit.wikimedia.org/r/932810 (https://phabricator.wikimedia.org/T339890) (owner: Kevin Bazira)
[03:20:10] (PS40) Kevin Bazira: Set up production and test images for the recommendation-api migration [research/recommendation-api] - https://gerrit.wikimedia.org/r/932810 (https://phabricator.wikimedia.org/T339890)
[03:45:07] (CR) CI reject: [V: -1] Set up production and test images for the recommendation-api migration
[research/recommendation-api] - https://gerrit.wikimedia.org/r/932810 (https://phabricator.wikimedia.org/T339890) (owner: Kevin Bazira)
[05:40:11] Machine-Learning-Team, Research: Index out of range in revert risk multi-lingual - https://phabricator.wikimedia.org/T340811 (elukey) @diego @MunizaA Hi! IIUC we'd need to bump mwedittypes to 2.1.0 in knowledge_integrity, do you have time to do it? @Isaac really nice work! Thanks!
[07:51:17] (PS41) Kevin Bazira: Set up production and test images for the recommendation-api migration [research/recommendation-api] - https://gerrit.wikimedia.org/r/932810 (https://phabricator.wikimedia.org/T339890)
[07:52:28] (CR) CI reject: [V: -1] Set up production and test images for the recommendation-api migration [research/recommendation-api] - https://gerrit.wikimedia.org/r/932810 (https://phabricator.wikimedia.org/T339890) (owner: Kevin Bazira)
[07:53:57] Machine-Learning-Team, Research: Index out of range in revert risk multi-lingual - https://phabricator.wikimedia.org/T340811 (isarantopoulos) Thanks a lot @Isaac for tackling this! Would this also solve the other revertrisk issues that crash in tree_differ.py of mwedittypes or should they be tackled inde...
[08:10:14] Machine-Learning-Team, API Platform, Anti-Harassment, Content-Transform-Team, and 18 others: Migrate PipelineLib repos to GitLab - https://phabricator.wikimedia.org/T332953 (dcaro)
[08:11:39] Machine-Learning-Team, API Platform, Anti-Harassment, Content-Transform-Team, and 18 others: Migrate PipelineLib repos to GitLab - https://phabricator.wikimedia.org/T332953 (dcaro)
[08:51:12] Morning!
[08:51:33] elukey: I see a pybal alert for ores2003, should I ack it while we work on removing it from config?
[08:52:21] ohwait, we hadn't finally decided re: decom yet.
[08:54:45] klausman: morning!
My bad since I only depooled it from the node, we can set it "inactive" and the alert will auto-solve
[08:54:53] via confctl I mean
[08:55:12] do you want to do it?
[08:55:41] sure, will do
[08:58:00] Machine-Learning-Team, Research (FY2023-24-Research-July-September): Deploy multilingual readability model to LiftWing - https://phabricator.wikimedia.org/T334182 (MGerlach)
[08:58:02] elukey: this look good? `confctl select name=ores2003.codfw.wmnet set/pooled=inactive`
[08:58:02] Machine-Learning-Team, Research (FY2023-24-Research-July-September): Deploy multilingual readability model to LiftWing - https://phabricator.wikimedia.org/T334182 (MGerlach)
[08:58:12] (plus adminlog of course)
[09:02:35] (CR) Kevin Bazira: "recheck" [research/recommendation-api] - https://gerrit.wikimedia.org/r/932810 (https://phabricator.wikimedia.org/T339890) (owner: Kevin Bazira)
[09:03:54] yep!
[09:10:20] (PS42) Kevin Bazira: Set up production and test images for the recommendation-api migration [research/recommendation-api] - https://gerrit.wikimedia.org/r/932810 (https://phabricator.wikimedia.org/T339890)
[09:11:53] (CR) CI reject: [V: -1] Set up production and test images for the recommendation-api migration [research/recommendation-api] - https://gerrit.wikimedia.org/r/932810 (https://phabricator.wikimedia.org/T339890) (owner: Kevin Bazira)
[09:13:56] done&done (in case you hadn't seen in #-ops
[09:40:47] (PS43) Kevin Bazira: Set up production and test images for the recommendation-api migration [research/recommendation-api] - https://gerrit.wikimedia.org/r/932810 (https://phabricator.wikimedia.org/T339890)
[10:03:36] (CR) Kevin Bazira: Set up production and test images for the recommendation-api migration (1 comment) [research/recommendation-api] - https://gerrit.wikimedia.org/r/932810 (https://phabricator.wikimedia.org/T339890) (owner: Kevin Bazira)
[10:06:41] Machine-Learning-Team, Research: Add ML team as developers to
research repos - https://phabricator.wikimedia.org/T341856 (isarantopoulos)
[10:12:41] Finally resolved the CI file size limitation issues on recommendation-api migration patch. elukey, klausman, isaranto: o/ when you get a minute please let me know whether this patch is good to go: https://gerrit.wikimedia.org/r/932810 thanks!
[10:18:57] Sure! thanks for all the work kevinbazira !
[10:21:51] yep!
[10:29:48] (CR) Ilias Sarantopoulos: [C: +1] "LGTM, thanks for all the great work Kevin!" [research/recommendation-api] - https://gerrit.wikimedia.org/r/932810 (https://phabricator.wikimedia.org/T339890) (owner: Kevin Bazira)
[10:44:35] * elukey lunch!
[10:49:13] Machine-Learning-Team, Research: Index out of range in revert risk multi-lingual - https://phabricator.wikimedia.org/T340811 (MunizaA) Hi @elukey, the dependency constraint we have for `mwedittypes` in KI is "1.2.1" so unfortunately this new version is not a drop-in replacement. There are some minor API c...
[10:49:15] (CR) Ilias Sarantopoulos: [C: +1] Set up production and test images for the recommendation-api migration (1 comment) [research/recommendation-api] - https://gerrit.wikimedia.org/r/932810 (https://phabricator.wikimedia.org/T339890) (owner: Kevin Bazira)
[10:50:31] * isaranto having a light lunch in 38 degrees :)
[10:57:11] It has thankfully been a lot more tolerable here recently
[11:13:20] (CR) Kevin Bazira: Set up production and test images for the recommendation-api migration (1 comment) [research/recommendation-api] - https://gerrit.wikimedia.org/r/932810 (https://phabricator.wikimedia.org/T339890) (owner: Kevin Bazira)
[12:10:01] (CR) Klausman: [C: +1] Set up production and test images for the recommendation-api migration [research/recommendation-api] - https://gerrit.wikimedia.org/r/932810 (https://phabricator.wikimedia.org/T339890) (owner: Kevin Bazira)
[12:20:45] * klausman late lunch and a few errands
[13:24:47] isaranto: did you know this?
https://grafana.wikimedia.org/d/b1jttnFMz/envoy-telemetry-k8s?from=now-1h&orgId=1&to=now&var-app=ores-legacy&var-datasource=thanos&var-destination=All&var-prometheus=k8s-mlserve&var-site=eqiad
[13:25:44] (this is from the envoy that we use as tls proxy on ores-legacy)
[13:26:11] aa nice
[13:26:31] and I wanted to ask the other day, was thinking there would be sth
[13:26:39] thanks!
[13:27:16] Machine-Learning-Team: [ores-legacy] Clienterror is returned in some responses - https://phabricator.wikimedia.org/T341479 (elukey) Ilias added DEBUG logs and now I can see: ` 2023-07-14 13:20:18,595 app.liftwing.response DEBUG URL:http://localhost:6031/v1/models/enwiki-goodfaith:predict, HOST:enwiki-goo...
[13:41:24] Machine-Learning-Team: [ores-legacy] Clienterror is returned in some responses - https://phabricator.wikimedia.org/T341479 (elukey) Another example: ` 2023-07-14 13:20:28,596 app.liftwing.response DEBUG URL:http://localhost:6031/v1/models/enwiki-articlequality:predict, HOST:enwiki-articlequality.revscori...
[14:10:34] Machine-Learning-Team, Patch-For-Review: [ores-legacy] Clienterror is returned in some responses - https://phabricator.wikimedia.org/T341479 (isarantopoulos) Good catch @elukey! You are right this is what we do. Although from what I checked the header in the response is application/json. This is the req...
[14:13:05] Machine-Learning-Team, Patch-For-Review: [ores-legacy] Clienterror is returned in some responses - https://phabricator.wikimedia.org/T341479 (elukey) @isarantopoulos the above error comes from kserve, but I suspect that the other 50x are coming from the istio proxy, that doesn't return json. There is som...
[14:17:58] Machine-Learning-Team, Patch-For-Review: [ores-legacy] Clienterror is returned in some responses - https://phabricator.wikimedia.org/T341479 (elukey) More precisely, there are three istio proxies: 1) local on the ores-legacy pod, that proxies to Lift Wing.
2) The Istio Gateway for Lift Wing 3) the istio...
[14:43:41] if anybody has time :)
[14:43:42] https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/938252
[14:45:08] I increased some replica count but we hit limits
[14:45:13] so I thought to refactor things a bit
[14:48:19] 👀
[14:51:44] I am also seeing some strange stuff with ores-legacy regarding the errors we get. https://ores-legacy.wikimedia.org/v3/scores/enwiki/949447964/goodfaith will return a response in a request with many models and revids we get the aforementioned clienterrors
[14:57:28] isaranto: if you are checking the tls proxy, I think that it logs both inbound and outbound requests
[15:01:36] Machine-Learning-Team, Research: Index out of range in revert risk multi-lingual - https://phabricator.wikimedia.org/T340811 (Isaac) > Would this also solve the other revertrisk issues that crash in tree_differ.py of mwedittypes or should they be tackled independently? I am talking about https://phabrica...
[15:03:17] Machine-Learning-Team: FeatureNotFound exception in revertrisk multi-lingual - https://phabricator.wikimedia.org/T340812 (Isaac) Just noting that this one now works in the newest version of the library: https://edit-types.wmcloud.org/diff-details?lang=ja&revid=95818540
[15:03:24] Machine-Learning-Team: Pop index out of range exception in revertrisk multi lingual - https://phabricator.wikimedia.org/T340813 (Isaac) Just noting that this one now works in the newest version of the library: https://edit-types.wmcloud.org/diff-details?lang=uk&revid=39814738
[15:10:08] isaacj: Thanks for the checks and all the responses!
[15:11:36] isaranto: happily!
i'm excited to get additional usage and feedback :) plus i'm working slowly on trying to make it easier to run the package on the cluster across historical data so any errors that are caught and speed-ups are super helpful for simplifying that work too
[15:30:31] Machine-Learning-Team: Define SLI/SLO for Lift Wing - https://phabricator.wikimedia.org/T327620 (klausman) https://grafana.wikimedia.org/goto/x7S0HpjVk?orgId=1 I've started an SLO dashboard here. It only has one metric (Latency) so far, but it's a start.
[15:34:59] (PS1) Ilias Sarantopoulos: ores-legacy: fix error due to response content type [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/938266 (https://phabricator.wikimedia.org/T341479)
[15:39:15] Machine-Learning-Team, MediaWiki-extensions-ORES, MW-1.41-notes (1.41.0-wmf.17; 2023-07-11), Patch-For-Review: Move backend of ORES MediaWiki extension to Lift Wing - https://phabricator.wikimedia.org/T319170 (isarantopoulos) I haven't had any luck trying to check the above errors when running th...
[15:41:51] (PS1) Ilias Sarantopoulos: add flag for host header [extensions/ORES] - https://gerrit.wikimedia.org/r/938267 (https://phabricator.wikimedia.org/T319170)
[15:46:44] Machine-Learning-Team, MediaWiki-extensions-ORES, MW-1.41-notes (1.41.0-wmf.17; 2023-07-11), Patch-For-Review: Move backend of ORES MediaWiki extension to Lift Wing - https://phabricator.wikimedia.org/T319170 (isarantopoulos) In the above patch I am attempting to resolve the issue that occurs when...
[15:48:40] Machine-Learning-Team, Patch-For-Review: [ores-legacy] Clienterror is returned in some responses - https://phabricator.wikimedia.org/T341479 (isarantopoulos) I also saw the following behavior: I make this request `curl https://api.wikimedia.org/service/lw/inference/v1/models/enwiki-goodfaith:predict -X...
[15:51:37] going afk for the weekend folks!
[16:05:49] same \o
[18:52:12] Machine-Learning-Team, Research: Index out of range in revert risk multi-lingual - https://phabricator.wikimedia.org/T340811 (MunizaA) @Isaac thanks so much for the pointers! It seems like this model is also using node edit info for [[https://gitlab.wikimedia.org/repos/research/knowledge_integrity/-/blob...
[20:25:37] Machine-Learning-Team, Research: Index out of range in revert risk multi-lingual - https://phabricator.wikimedia.org/T340811 (Isaac) > so I was wondering if this API also has timeout disabled? If not then the problem could just be with my test setup. Yeah, the API has timeout disabled. I prefer that sett...