[06:08:49] (CR) Elukey: outlink: add WP code list and increase gpllimit for MW API call (1 comment) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/837642 (owner: AikoChou)
[09:36:41] morning :)
[09:41:41] morning :)
[10:39:57] * elukey lunch
[11:00:18] ditto
[12:54:03] (PS1) AikoChou: editquality: align ORES prediction output with Lift Wing's one [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/839516 (https://phabricator.wikimedia.org/T318932)
[13:06:14] (CR) Elukey: editquality: align ORES prediction output with Lift Wing's one (2 comments) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/839516 (https://phabricator.wikimedia.org/T318932) (owner: AikoChou)
[13:39:00] klausman: o/
[13:39:25] I successfully deployed a destination rule and service entry for eventgate-main.discovery.wmnet, the tls egress origination happens
[13:39:32] \o/
[13:39:44] and I have a similar config for api-ro, but I still see the weird msg from the mw api
[13:40:05] /o\
[13:40:12] so maybe it is a weird use case that mw doesn't handle correctly?
[13:43:36] is the MW API maybe redirecting us to http:// or something silly like that? The only other idea I have is that the https-ness is broken beyond the endpoint that we use, but then I'd expect it to happen for more people. We shouldn't be all that unique.
[13:47:09] my only fear is that at some point in the future we'll go https only and our code stops working
[13:48:47] mediawiki is behind httpd, so it cannot really tell if it is https or not by itself
[13:49:01] and envoy terminates TLS conns on the mw servers
[13:49:21] so I guess it adds a special header to the requests that land on httpd/mediawiki
[13:51:24] I suspect something is stripping SSL (or that header)
[13:51:38] maybe we need to poke the API guys, see if they can see something
[13:52:15] I am tcpdumping on mw1317, I am pretty sure it is x-forwarded-proto: https
[13:54:27] we issue an http request on our side, not https, that gets wrapped into a tls tunnel basically, so I guess that the information is picked up from envoy on the mw1317 side
[13:54:43] it sees that it received a http request, and it doesn't set the header
[13:55:29] even if in theory the HTTPS bit shouldn't really bubble up to L7
[13:56:34] https://www.envoyproxy.io/docs/envoy/latest/faq/debugging/xfp_vs_scheme has some related info
[13:56:54] "Generally users request https:// resources over TLS connections and http:// resources in the clear. However, it is entirely possible for a user to request http:// content over a TLS connection or in internal meshes to forward https:// requests in cleartext."
[14:12:33] ok this is messy, I think we can live with a warning for the moment :D
[14:27:12] waitaminute. Is this maybe because in-stream (i.e. what the model generates) are http:// requests (in the sense of URL notation), but they're transported inside TLS?
[14:27:34] But HTTP/1.1 doesn't use the scheme really in requests, does it?
[14:29:52] exactly I had the same thought, but https://www.envoyproxy.io/docs/envoy/latest/configuration/http/http_conn_man/headers#scheme
[14:31:03] does istio expose SchemeHeaderTransformation?
[14:31:31] ??
[14:32:10] That page points out that in Envoy, you can use that config to change scheme: headers
[14:32:16] what do you mean with SchemeHeaderTransformation? The headers mentioned above? (just to be on the same page, my brain is already melted :D)
[14:32:51] Yes
[14:33:07] klausman: the page says that envoy will add the x-forwarded-proto for HTTP 1.1 conns
[14:33:09] Envoy apparently can be configured to some degree in what it does
[14:33:29] "Envoy will always set the :scheme header while processing a request. It should always be available to filters, and should be forwarded upstream for HTTP/2 and HTTP/3, where x-forwarded-proto will be sent for HTTP/1.1."
[14:33:30] > This default behavior can be overridden via the scheme_header_transformation configuration option.
[14:34:00] ah ok got it
[14:34:02] I was wondering if that might give us a way to set a header that convinces api-ro that Everything Is Just Fine™
[14:34:20] we could try to set the header in the code
[14:34:41] in the meantime I filed https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/839587
[14:34:44] Might be the simplest thing to try first
[15:05:17] (CR) Elukey: [C: +1] "Left a comment that can be skipped, I'll let you decide. Thanks a lot for the patience!" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/839516 (https://phabricator.wikimedia.org/T318932) (owner: AikoChou)
[15:05:49] Morning all!
[15:06:38] o/
[15:12:03] o/
[15:22:10] Machine-Learning-Team, Gerrit, Release-Engineering-Team (Seen): gerrit: scoring/ores/editquality takes a long time to git gc - https://phabricator.wikimedia.org/T237807 (hashar) I ran the script today and it is still a thing: ` $ python /home/thcipriani/elapsed_gc_time.py |sort -rn|head 7:14:43 TOTAL...
[15:23:43] Machine-Learning-Team, Gerrit, Release-Engineering-Team (Seen): gerrit: scoring/ores/editquality takes a long time to git gc - https://phabricator.wikimedia.org/T237807 (hashar)
[15:26:40] (PS3) AikoChou: editquality: align ORES prediction output with Lift Wing's one [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/839516 (https://phabricator.wikimedia.org/T318932)
[15:27:45] (PS4) AikoChou: editquality: align ORES prediction output with Lift Wing's one [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/839516 (https://phabricator.wikimedia.org/T318932)
[15:36:30] Machine-Learning-Team, Gerrit, Release-Engineering-Team (Seen): gerrit: scoring/ores/editquality takes a long time to git gc - https://phabricator.wikimedia.org/T237807 (hashar)
[15:38:20] Machine-Learning-Team, Gerrit, Release-Engineering-Team (Seen): gerrit: scoring/ores/editquality takes a long time to git gc - https://phabricator.wikimedia.org/T237807 (elukey) Really sorry for the issue :( We are actively working on Lift Wing and we hope to deprecate any git-lfs usage in favor of s...
[15:38:51] (CR) AikoChou: editquality: align ORES prediction output with Lift Wing's one (1 comment) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/839516 (https://phabricator.wikimedia.org/T318932) (owner: AikoChou)
[15:43:49] Machine-Learning-Team, Gerrit, Release-Engineering-Team (Seen): gerrit: scoring/ores/editquality takes a long time to git gc - https://phabricator.wikimedia.org/T237807 (hashar) @elukey `scoring/ores/editquality.git` has a bunch of a large objects which somehow cause jGit gc to take four hours. I th...
[15:44:38] chrisalbon: --^ this is another example of the pain that we cause with git-lfs
[15:51:19] folks I am rolling out the changes to fix the istio egress tls config
[15:51:27] it will restart all revscoring pods on all clusters
[15:52:16] will do staging today and check that all is fine, then ml-serve clusters tomorro
[15:52:19] *tomorrow
[16:03:32] Machine-Learning-Team, Gerrit, Release-Engineering-Team (Seen): gerrit: scoring/ores/editquality takes a long time to git gc - https://phabricator.wikimedia.org/T237807 (hashar) From Luca: > You don't need to run the aggressive GC, ever. Unless the repo has massively changed its structure The gc ca...
[16:08:04] going afk folks!
[16:08:16] have a nice rest of the day :)
[16:34:16] elukey: \o when you do the rollout to prod tomorrow, please ping me, I'd like to follow along
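
Note on the 14:34 idea ("we could try to set the header in the code"): a minimal sketch of what that might look like, using only the Python standard library. The endpoint URL, the `/w/api.php` path, the `Host` header routing, and the function name are all assumptions made for illustration and are not taken from the log; whether the MW API would actually honor a client-supplied x-forwarded-proto header is also untested here. The request is only built, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical internal endpoint: plain http:// on the client side, with TLS
# assumed to be originated by the mesh egress (as described in the chat).
MW_API_ENDPOINT = "http://api-ro.discovery.wmnet/w/api.php"


def build_mw_api_request(wiki_host, params, endpoint=MW_API_ENDPOINT):
    """Build (but do not send) an MW API request that declares https upstream.

    wiki_host: the wiki to target, sent as the Host header (assumed routing).
    params: MW API query parameters as a dict.
    """
    headers = {
        "Host": wiki_host,
        # The header discussed in the chat; setting it from the client is the
        # experiment being proposed, not a confirmed fix.
        "X-Forwarded-Proto": "https",
    }
    query = urlencode({**params, "format": "json"})
    return Request(f"{endpoint}?{query}", headers=headers)


if __name__ == "__main__":
    req = build_mw_api_request(
        "en.wikipedia.org", {"action": "query", "meta": "siteinfo"}
    )
    print(req.full_url)
    print(req.header_items())
```

Keeping the header in one helper would make it easy to drop again if the Envoy/Istio side ends up setting it instead (for example via the scheme_header_transformation option quoted at 14:33).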