[06:08:49] <wikibugs>	 (CR) Elukey: outlink: add WP code list and increase gpllimit for MW API call (1 comment) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/837642 (owner: AikoChou)
[09:36:41] <aiko>	 morning :)
[09:41:41] <elukey>	 morning :)
[10:39:57] * elukey lunch
[11:00:18] <klausman>	 ditto
[12:54:03] <wikibugs>	 (PS1) AikoChou: editquality: align ORES prediction output with Lift Wing's one [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/839516 (https://phabricator.wikimedia.org/T318932)
[13:06:14] <wikibugs>	 (CR) Elukey: editquality: align ORES prediction output with Lift Wing's one (2 comments) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/839516 (https://phabricator.wikimedia.org/T318932) (owner: AikoChou)
[13:39:00] <elukey>	 klausman: o/
[13:39:25] <elukey>	 I successfully deployed a destination rule and service entry for eventgate-main.discovery.wmnet, the tls egress origination happens
[13:39:32] <klausman>	 \o/
[13:39:44] <elukey>	 and I have a similar config for api-ro, but I still see the weird msg from the mw api
[13:40:05] <klausman>	 /o\
[13:40:12] <elukey>	 so maybe it is a weird use case that mw doesn't handle correctly?
[13:43:36] <klausman>	 is the MW API maybe redirecting us to http:// or something silly like that? The only other idea I have is that the https-ness is broken beyond the endpoint that we use, but then I'd expect it to happen for more people. We shouldn't be all that unique.
[13:47:09] <elukey>	 my only fear is that at some point in the future we'll go https only and our code stops working
[13:48:47] <elukey>	 mediawiki is behind httpd, so it cannot really tell if it is https or not by itself
[13:49:01] <elukey>	 and envoy terminates TLS conns on the mw servers
[13:49:21] <elukey>	 so I guess it adds a special header to the requests that lands on httpd/mediawiki
[13:51:24] <klausman>	 I suspect something is stripping SSL (or that header)
[13:51:38] <klausman>	 maybe we need to poke the API guys, see if they can see something
[13:52:15] <elukey>	 I am tcpdumping on mw1317, I am pretty sure it is x-forwarded-proto: https
[13:54:27] <elukey>	 we issue an http request on our side, not https, that gets wrapped into a tls tunnel basically, so I guess that the information is picked up from envoy on the mw1317 side
[13:54:43] <elukey>	 it sees that it received a http request, and it doesn't set the header
[13:55:29] <elukey>	 even if in theory the HTTPS bit shouldn't really bubble up to L7
[13:56:34] <elukey>	 https://www.envoyproxy.io/docs/envoy/latest/faq/debugging/xfp_vs_scheme has some related info
[13:56:54] <elukey>	 "Generally users request https:// resources over TLS connections and http:// resources in the clear. However, it is entirely possible for a user to request http:// content over a TLS connection or in internal meshes to forward https:// requests in cleartext."
[14:12:33] <elukey>	 ok this is messy, I think we can live with a warning for the moment :D
[14:27:12] <klausman>	 waitaminute. Is this maybe because in-stream (i.e. what the model generates) are http:// requests (in the sense of URL notation), but they're transported inside TLS? 
[14:27:34] <klausman>	 But HTTP/1.1 doesn't use the scheme really in requests, does it?
[14:29:52] <elukey>	 exactly I had the same thought, but https://www.envoyproxy.io/docs/envoy/latest/configuration/http/http_conn_man/headers#scheme
[14:31:03] <klausman>	 does istio expose SchemeHeaderTransformation?
[14:31:31] <elukey>	 ??
[14:32:10] <klausman>	 That page points out that in Envoy, you can use that config to change scheme: headers
[14:32:16] <elukey>	 what do you mean with SchemeHeaderTransformation? The headers mentioned above? (just to be on the same page, my brain is already melted  :D)
[14:32:51] <klausman>	 Yes
[14:33:07] <elukey>	 klausman: the page says that envoy will add the x-forwarded-proto for HTTP 1.1 conns
[14:33:09] <klausman>	 Envoy apparently can be configured to some degree in what it does
[14:33:29] <elukey>	 "Envoy will always set the :scheme header while processing a request. It should always be available to filters, and should be forwarded upstream for HTTP/2 and HTTP/3, where x-forwarded-proto will be sent for HTTP/1.1."
[14:33:30] <klausman>	 > This default behavior can be overridden via the scheme_header_transformation configuration option.
[14:34:00] <elukey>	 ah ok got it
[14:34:02] <klausman>	 I was wondering if that might give us a way to set a header that convinces api-ro that Everything Is Just Fine™
[14:34:20] <elukey>	 we could try to set the header in the code
[14:34:41] <elukey>	 in the meantime I filed https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/839587
[14:34:44] <klausman>	 Might be the simplest thing to try first
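The "set the header in the code" idea above can be sketched as follows. This is only a minimal illustration, not the inference-services implementation: the URL, the wiki host, and the helper name `build_mw_request` are assumptions, and the chat does not confirm that MediaWiki would accept a client-supplied header (Envoy on the MW side may overwrite it).

```python
from urllib.request import Request

# Hypothetical values for illustration only; not taken from the chat or the repo.
MW_API_URL = "http://api-ro.discovery.wmnet/w/api.php"
WIKI_HOST = "en.wikipedia.org"


def build_mw_request(url: str, host: str) -> Request:
    """Build (but do not send) a plain-http request that declares https upstream.

    The request itself stays http:// because the mesh sidecar is expected to
    originate TLS; X-Forwarded-Proto is added by hand to signal that the
    connection is encrypted on the wire.
    """
    headers = {
        "Host": host,
        "X-Forwarded-Proto": "https",
    }
    return Request(url, headers=headers)


req = build_mw_request(MW_API_URL, WIKI_HOST)
# urllib stores header names capitalized, e.g. "X-forwarded-proto".
print(req.header_items())
```

Whether this silences the warning would still need checking with a capture on the MW side, as done earlier with tcpdump.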
[15:05:17] <wikibugs>	 (CR) Elukey: [C: +1] "Left a comment that can be skipped, I'll let you decide. Thanks a lot for the patience!" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/839516 (https://phabricator.wikimedia.org/T318932) (owner: AikoChou)
[15:05:49] <chrisalbon>	 Morning all!
[15:06:38] <elukey>	 o/
[15:12:03] <aiko>	 o/
[15:22:10] <wikibugs>	 Machine-Learning-Team, Gerrit, Release-Engineering-Team (Seen): gerrit: scoring/ores/editquality takes a long time to git gc - https://phabricator.wikimedia.org/T237807 (hashar) I ran the script today and it is still a thing: ` $ python /home/thcipriani/elapsed_gc_time.py |sort -rn|head 7:14:43 TOTAL...
[15:23:43] <wikibugs>	 Machine-Learning-Team, Gerrit, Release-Engineering-Team (Seen): gerrit: scoring/ores/editquality takes a long time to git gc - https://phabricator.wikimedia.org/T237807 (hashar)
[15:26:40] <wikibugs>	 (PS3) AikoChou: editquality: align ORES prediction output with Lift Wing's one [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/839516 (https://phabricator.wikimedia.org/T318932)
[15:27:45] <wikibugs>	 (PS4) AikoChou: editquality: align ORES prediction output with Lift Wing's one [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/839516 (https://phabricator.wikimedia.org/T318932)
[15:36:30] <wikibugs>	 Machine-Learning-Team, Gerrit, Release-Engineering-Team (Seen): gerrit: scoring/ores/editquality takes a long time to git gc - https://phabricator.wikimedia.org/T237807 (hashar)
[15:38:20] <wikibugs>	 Machine-Learning-Team, Gerrit, Release-Engineering-Team (Seen): gerrit: scoring/ores/editquality takes a long time to git gc - https://phabricator.wikimedia.org/T237807 (elukey) Really sorry for the issue :( We are actively working on Lift Wing and we hope to deprecate any git-lfs usage in favor of s...
[15:38:51] <wikibugs>	 (CR) AikoChou: editquality: align ORES prediction output with Lift Wing's one (1 comment) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/839516 (https://phabricator.wikimedia.org/T318932) (owner: AikoChou)
[15:43:49] <wikibugs>	 Machine-Learning-Team, Gerrit, Release-Engineering-Team (Seen): gerrit: scoring/ores/editquality takes a long time to git gc - https://phabricator.wikimedia.org/T237807 (hashar) @elukey `scoring/ores/editquality.git` has a bunch of a large objects which somehow cause jGit gc to take four hours.  I th...
[15:44:38] <elukey>	 chrisalbon: --^ this is another example of the pain that we cause with git-lfs
[15:51:19] <elukey>	 folks I am rolling out the changes to fix the istio egress tls config
[15:51:27] <elukey>	 it will restart all revscoring pods on all clusters
[15:52:16] <elukey>	 will do staging today and check that all is fine, then ml-serve clusters tomorrow
[16:03:32] <wikibugs>	 Machine-Learning-Team, Gerrit, Release-Engineering-Team (Seen): gerrit: scoring/ores/editquality takes a long time to git gc - https://phabricator.wikimedia.org/T237807 (hashar) From Luca:  > You don't need to run the aggressive GC, ever. Unless the repo has massively changed its structure  The gc ca...
[16:08:04] <elukey>	 going afk folks!
[16:08:16] <elukey>	 have a nice rest of the day :)
[16:34:16] <klausman>	 elukey: \o when you do the rollout to prod tomorrow, please ping me, I'd like to follow along