[08:02:33] good morning :)
[08:07:06] o/
[08:26:37] \o
[10:13:24] isaranto: o/ something is moving, from stat1004:
[10:13:25] curl "https://ores-legacy.k8s-ml-staging.discovery.wmnet:31443/v3/scores" -i --http1.1 --resolve ores-legacy.k8s-ml-staging.discovery.wmnet:31443:10.192.0.201 -k
[10:13:36] (the -k is still needed since I have to fix a TLS cert problem yet)
[10:13:44] the main issue is that sometimes I get 404
[10:14:57] 🙌
[10:15:15] nice! I got 404 every second call I made
[10:16:05] in the uvicorn logs I only see 200s, so I guess that the istio gateway is doing something weird
[10:18:24] note to self: once we get this up and running I also need to improve error handling
[10:18:48] e.g. for unknown revision ids I got a client error in the response instead of the standard ores message
[10:19:38] atm contacting lift wing is not possible since we have to allow POSTs in the TLS proxy
[10:22:43] ack
[10:53:43] (PS1) Ilias Sarantopoulos: feat: hardcode threshold calls to switch to Lift Wing [extensions/ORES] - https://gerrit.wikimedia.org/r/915541 (https://phabricator.wikimedia.org/T332953)
[10:54:38] (PS2) Ilias Sarantopoulos: feat: hardcode threshold calls to switch to Lift Wing [extensions/ORES] - https://gerrit.wikimedia.org/r/915541 (https://phabricator.wikimedia.org/T332953)
[10:56:03] (CR) CI reject: [V: -1] feat: hardcode threshold calls to switch to Lift Wing [extensions/ORES] - https://gerrit.wikimedia.org/r/915541 (https://phabricator.wikimedia.org/T332953) (owner: Ilias Sarantopoulos)
[11:13:43] * elukey lunch!
[11:39:04] * isaranto lunch
[13:42:48] ok I found the issue of the random 404
[13:43:20] the K8s svc for the new ingress gateway targets all the istio gateway pods, including the kserve ones
[13:43:42] so when the load balancing heuristic decides to use a "kserve" ingress it returns 404
[13:43:47] or other errors
[13:44:28] ack
[13:58:40] Ah, the rare "too many backends" problem :)
[14:02:10] it is very weird since it may be a limitation of istio
[14:03:31] https://github.com/istio/istio/blob/release-1.15.7-patch/manifests/charts/gateway/templates/service.yaml#L49
[14:03:47] ok so I need to add that bit, let's see
[14:23:26] finally, now it works :)
[14:23:26] curl "https://ores-legacy.k8s-ml-staging.discovery.wmnet:31443/v3/scores" -i --http1.1 --resolve ores-legacy.k8s-ml-staging.discovery.wmnet:31443:10.192.0.201
[14:24:54] https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/915685
[14:25:13] ok so now the next problem is to allow envoy to make posts
[14:28:02] Machine-Learning-Team, Patch-For-Review: Create a staging ingress configuration for ml-staging-codfw - https://phabricator.wikimedia.org/T335756 (elukey) First result! from any node, like stat100x, one could use: ` curl "https://ores-legacy.k8s-ml-staging.discovery.wmnet:31443/v3/scores" -i --http1.1 --...
[15:09:20] magic... :)
[15:14:01] s/magic/horror
[15:35:47] isaranto: o/ so I found the source of the problem, in ores-legacy we define only "app.post"
[15:35:54] err sorry app.gets
[15:36:26] mmm wait, am I missing something? Is ORES only working with GETs?
[15:37:13] ahhhh of course I was trying to POST to the ores-legacy Ui
[15:37:35] ORES works with GET..
[15:38:10] so we issue a GET request to ores-legacy which then POST(s) to Lift Wing
[15:38:27] curl "https://ores-legacy.k8s-ml-staging.discovery.wmnet:31443/v3/scores/enwiki/1234331345/damaging" -i --http1.1 --resolve ores-legacy.k8s-ml-staging.discovery.wmnet:31443:10.192.0.201 works!
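The intermittent 404s diagnosed in this stretch of the log (a Service whose selector matches every istio gateway pod, including the kserve ones) can be reproduced with a small simulation. This is only a sketch under stated assumptions: the pod names, the label keys and values, and the strict round-robin balancing are all hypothetical, chosen to mirror the "404 every second call" symptom rather than the real cluster configuration.

```python
from collections import Counter
from itertools import cycle

# Hypothetical pod inventory: names and labels are illustrative only.
PODS = [
    {"name": "ores-legacy-gw-0",
     "labels": {"istio": "ingressgateway", "app": "ores-legacy-gateway"}},
    {"name": "kserve-gw-0",
     "labels": {"istio": "ingressgateway", "app": "kserve-gateway"}},
]


def select(pods, selector):
    """Return the pods whose labels contain every key/value pair in selector."""
    return [
        pod for pod in pods
        if all(pod["labels"].get(key) == value for key, value in selector.items())
    ]


def simulate(pods, selector, requests=10):
    """Send requests round-robin to the selected pods and tally status codes.

    Assumption: only the ores-legacy gateway knows the route, so any other
    gateway pod answers 404.
    """
    backends = cycle(select(pods, selector))
    statuses = Counter()
    for _ in range(requests):
        pod = next(backends)
        knows_route = pod["labels"]["app"] == "ores-legacy-gateway"
        statuses[200 if knows_route else 404] += 1
    return dict(statuses)


# Too-broad selector: matches both gateways, so half of the calls fail.
broad = simulate(PODS, {"istio": "ingressgateway"})
# Narrowed selector: only the gateway that owns the route is a backend.
narrow = simulate(PODS, {"istio": "ingressgateway", "app": "ores-legacy-gateway"})
print(broad)
print(narrow)
```

With the broad selector the tally splits evenly between 200 and 404; adding the second label to the selector leaves only 200s, which is the effect of narrowing the Service selector in the chart.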
[15:38:42] isaranto: yes yes I am stupid, too much confusion with Lift Wing!
[15:39:15] I mean sort of works, we still need to apply the correct URL for lift wing
[15:39:24] okok but the rest works fine
[15:41:27] 🎉
[15:41:59] on the other hand I lost some changes to a script I had in a jupyter notebook
[15:42:30] not checked in VCS and restarted the laptop
[15:42:42] oh well, I love notebooks when this happens :)
[15:43:30] ok, false alarm minor changes were not saved
[15:50:02] isaranto: https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/915727 should also fix the lift wing URL
[15:53:44] should/is it be the same for all deployments? staging/prod
[15:56:07] good point, we can definitely override for staging
[15:56:17] also, I firewalled prod from staging, so we have to
[15:56:19] sigh
[15:56:22] lemme check
[15:56:49] ahhh no wait
[15:57:20] on the mesh settings (the envoy proxy config that serviceops manages as sidecar) we have only the "inference" settings, that is prod
[15:57:43] so we'll need to add one for staging as well probably
[16:00:53] mmmm no in theory the settings should allow to call prod from staging (need to be fixed)
[16:01:41] isaranto: not sure why but I see a 200 contacting ores from the sidecar, but in the response I get "ClientError"
[16:03:08] elukey: don't mind the responses for now I need to figure out what they will be
[16:03:38] I mean it returns a 200 because the call succeeds (although the underlying LW calls fail)
[16:04:06] isaranto: sure sure but it seems that an error response is generated, and from the create_error_response() code it seems that it is due to lift wing
[16:04:54] I don't see logs in the pod except the access log, maybe also the python logging config for uvicorn needs to be set (or similar)
[16:05:40] I think we can postpone the tests for when I am back next week (when you have time), but feel free to do any tests and report in the task!
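The behaviour discussed here (a GET to ores-legacy that turns into a POST to Lift Wing, and still answers 200 with an error in the body when the upstream call fails) can be sketched roughly as follows. Everything in it is an assumption made for illustration: `ClientError`, `build_error_payload`, `handle_get_score`, the upstream path, and the payload layout are hypothetical and are not taken from the ores-legacy code. The `logging` call shows one way the failure could also surface in the pod logs instead of only in the response.

```python
import logging

logger = logging.getLogger("ores_legacy_sketch")


class ClientError(Exception):
    """Hypothetical stand-in for the error raised when the upstream POST fails."""


def build_error_payload(exc):
    # Hypothetical error shape; the real service's payload may differ.
    return {"error": {"type": type(exc).__name__, "message": str(exc)}}


def handle_get_score(context, rev_id, model, post_upstream):
    """Serve one GET by issuing a single POST upstream.

    post_upstream(path, body) is injected so the sketch runs without a network.
    The handler answers 200 even when the upstream call fails, wrapping the
    failure in the response body.
    """
    path = f"/v1/models/{context}-{model}:predict"  # assumed upstream path
    try:
        result = post_upstream(path, {"rev_id": rev_id})
    except ClientError as exc:
        # Log the failure so it is visible outside the access log.
        logger.warning("upstream call to %s failed: %s", path, exc)
        result = build_error_payload(exc)
    payload = {context: {"scores": {str(rev_id): {model: result}}}}
    return 200, payload


def failing_post(path, body):
    raise ClientError("upstream unreachable")


def working_post(path, body):
    return {"score": {"prediction": False}}


status_err, payload_err = handle_get_score("enwiki", 1234331345, "damaging", failing_post)
status_ok, payload_ok = handle_get_score("enwiki", 1234331345, "damaging", working_post)
print(status_err, payload_err)
print(status_ok, payload_ok)
```

Both calls report 200 at the HTTP level; only the body distinguishes a real score from a wrapped upstream failure, which is why a client looking at the status code alone would not notice the problem.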
[16:05:43] hmm I will check
[16:05:48] going afk for today folks, o/
[16:05:52] talk with you next week
[16:05:56] ok I will for sure
[16:05:59] have a nice rest of the day!
[16:06:08] ciao!
[16:51:04] logging off, more threshold debugging tomorrow!
[17:02:06] bye Ilias!
[17:05:07] Machine-Learning-Team, Add-Link, Growth-Team, User-notice: Deploy "add a link" to 17th round of wikis - https://phabricator.wikimedia.org/T308143 (kevinbazira)
[17:05:31] Machine-Learning-Team, Add-Link, Growth-Team, User-notice: Deploy "add a link" to 17th round of wikis - https://phabricator.wikimedia.org/T308143 (kevinbazira) The conclusion on the backtesting results is that most of the languages look fine besides: - tiwiki (0.54), tkwiki (0.74), urwiki (0.62)...