[06:10:59] Goood morning!
[09:52:01] Good morning o/
[09:57:39] \o Good morning
[09:57:45] :D
[10:20:43] * aiko lunch
[10:36:37] Machine-Learning-Team, MW-on-K8s, serviceops, SRE: Migrate ml-services to mw-api-int - https://phabricator.wikimedia.org/T362316 (Clement_Goubert) NEW
[10:36:51] Machine-Learning-Team, MW-on-K8s, serviceops, SRE: Migrate ml-services to mw-api-int - https://phabricator.wikimedia.org/T362316#9706397 (Clement_Goubert) p:Triage→Medium
[10:52:45] * isaranto lunch
[11:44:00] Machine-Learning-Team, MW-on-K8s, serviceops, SRE, Patch-For-Review: Migrate ml-services to mw-api-int - https://phabricator.wikimedia.org/T362316#9706666 (Clement_Goubert)
[11:49:13] Machine-Learning-Team, MW-on-K8s, serviceops, SRE, Patch-For-Review: Migrate ml-services to mw-api-int - https://phabricator.wikimedia.org/T362316#9706673 (Clement_Goubert) Aaaand I just realized they all use http and not https, so now I can change them all.
[11:51:46] Hello :) So err, I just sent a big batch of CRs for switching ml-services to using mw-api-int-ro (mw-on-k8s instead of bare-metal) but... I just realised they all use http and not https. So first of all, sorry for the CR spam for non-functional stuff. Second, can the services use https directly (before I redo them all and resend a bunch of PS)?
[12:01:01] hello! Dunno if this is important but I noticed that the rest-gateway is configured for this model with the wrong port https://gerrit.wikimedia.org/g/operations/deployment-charts/+/refs/changes/59/1018959/1/helmfile.d/ml-services/article-descriptions/values.yaml#74
[12:01:16] the correct port is 4113 and it uses https
[12:05:53] hnowlan: this port (41111) is the configuration for istio, which in turn will target the correct port (4113).
the relevant configuration part is here https://gerrit.wikimedia.org/r/plugins/gitiles/operations/deployment-charts/+/refs/heads/master/helmfile.d/admin_ng/values/ml-serve.yaml#358
[12:07:19] isaranto: oh, interesting! is the same true of api-ro.discovery.wmnet?
[12:08:43] I'm about to delete a lot of CRs if it's true
[12:08:52] hnowlan: yes it is https://gerrit.wikimedia.org/r/plugins/gitiles/operations/deployment-charts/+/refs/heads/master/helmfile.d/admin_ng/values/ml-serve.yaml#328. elukey: can verify when he joins
[12:08:52] cc: claime:
[12:09:02] ty isaranto
[12:09:15] I knew I should have checked in with y'all before wasting time x)
[12:09:29] just wait a bit, but we'll help in any case, you don't have to do all this by yourself
[12:20:54] if istio can use the same port on both sides then I *should* just have to add the right service entry and destination rule
[12:32:38] hello folks!
[12:33:31] Hey Luca!
[12:34:06] claime, hnowlan o/ - yes it is a "Feature" (with uppercase F) of the istio sidecars. If you start an https/TLS connection from within the python code, then the envoy/istio sidecar has no choice but to simply act as an L4 proxy. Since we want HTTP metrics, we use the trick of forcing http inside the python code, and then the sidecar takes care of proxying to the right TLS port.
[12:34:25] otherwise we wouldn't have http metrics :(
[12:34:49] makes sense!
the port on rest-gateway was just close enough to fool me into thinking it was a typo :D
[12:35:18] yes yes I totally understand, maybe we should add comments so it is clearer
[13:38:33] (CR) Elukey: [C:+1] revertrisk: use the Pytorch base image for RRML GPU inference (1 comment) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1018240 (https://phabricator.wikimedia.org/T356045) (owner: AikoChou)
[14:04:04] (PS3) AikoChou: revertrisk: use the Pytorch base image for RRML GPU inference [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1018240 (https://phabricator.wikimedia.org/T356045)
[14:06:51] (CR) AikoChou: [C:+2] "Thanks for the review!" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1018240 (https://phabricator.wikimedia.org/T356045) (owner: AikoChou)
[14:14:02] (CR) CI reject: [V:-1] revertrisk: use the Pytorch base image for RRML GPU inference [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1018240 (https://phabricator.wikimedia.org/T356045) (owner: AikoChou)
[14:29:58] (PS10) Jsn.sherman: Exclude first/only revision on page from scoring [extensions/ORES] - https://gerrit.wikimedia.org/r/1014572 (https://phabricator.wikimedia.org/T356281)
[14:39:59] (CR) Jsn.sherman: Exclude first/only revision on page from scoring (1 comment) [extensions/ORES] - https://gerrit.wikimedia.org/r/1014572 (https://phabricator.wikimedia.org/T356281) (owner: Jsn.sherman)
[15:11:15] (CR) AikoChou: [C:+2] "recheck" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1018240 (https://phabricator.wikimedia.org/T356045) (owner: AikoChou)
[15:12:12] (CR) CI reject: [V:-1] revertrisk: use the Pytorch base image for RRML GPU inference [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1018240 (https://phabricator.wikimedia.org/T356045)
(owner: AikoChou)
[15:34:43] (CR) AikoChou: [C:+2] "recheck" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1018240 (https://phabricator.wikimedia.org/T356045) (owner: AikoChou)
[15:35:32] (Merged) jenkins-bot: revertrisk: use the Pytorch base image for RRML GPU inference [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1018240 (https://phabricator.wikimedia.org/T356045) (owner: AikoChou)
[16:09:12] my .git directory disappeared from my local copy of the inference-services repo :(
[16:09:59] I have no idea when or how, but I lost all the branches I had locally. At this point I don't remember if there was anything important or not (probably not)
[16:11:08] :(
[16:15:25] (CR) Elukey: "First pass on the code, left some comments, lemme know your thoughts!" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1017453 (https://phabricator.wikimedia.org/T361803) (owner: Kevin Bazira)
[16:16:31] time for a fresh start
[17:55:02] (CR) Ilias Sarantopoulos: "Thanks for working on this Kevin!" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1017453 (https://phabricator.wikimedia.org/T361803) (owner: Kevin Bazira)
[17:57:31] good evening folks o/
[23:51:07] Machine-Learning-Team: Review Revert Risk reports from WME - https://phabricator.wikimedia.org/T347136#9708893 (leila)
[23:52:01] Machine-Learning-Team: Review Revert Risk reports from WME - https://phabricator.wikimedia.org/T347136#9708892 (leila) Hi folks. I don't see any further requests on this task. I'm going to remove it from Research's backlog. If this is incorrect, add us back or assign a subtask to us.
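The sidecar trick explained at 12:34 can be illustrated with a small sketch. The Python service keeps calling a plain `http://` URL on the istio-configured port, so the envoy sidecar sees HTTP traffic (and can export HTTP metrics) and then originates TLS to the real upstream port itself; if the code opened TLS directly, the sidecar could only forward opaque bytes as an L4 proxy. Only the 4113 (upstream TLS) to 41111 (istio-configured) port pair comes from the discussion; the hostname `rest-gateway.discovery.wmnet`, the `SIDECAR_HTTP_PORTS` table and the `sidecar_url` helper are hypothetical, not the actual Lift Wing code.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical mapping: (upstream host, upstream TLS port) -> plain-HTTP port that
# the istio/envoy sidecar is configured for; the sidecar then does TLS to the upstream.
# Only the 4113 -> 41111 pair comes from the chat; the hostname is an assumption.
SIDECAR_HTTP_PORTS = {
    ("rest-gateway.discovery.wmnet", 4113): 41111,
}


def sidecar_url(https_url: str) -> str:
    """Rewrite an https:// upstream URL into the plain-http URL the app should call.

    Speaking plain HTTP lets the sidecar parse requests (and emit HTTP metrics)
    instead of acting as a blind L4 proxy for an app-initiated TLS stream.
    """
    parts = urlsplit(https_url)
    if parts.scheme != "https":
        raise ValueError(f"expected an https:// URL, got {https_url!r}")
    port = parts.port or 443  # default HTTPS port when none is given
    try:
        http_port = SIDECAR_HTTP_PORTS[(parts.hostname, port)]
    except KeyError:
        raise KeyError(f"no sidecar mapping for {parts.hostname}:{port}") from None
    netloc = f"{parts.hostname}:{http_port}"
    # Keep path and query unchanged; only the scheme and port are rewritten.
    return urlunsplit(("http", netloc, parts.path, parts.query, parts.fragment))
```

For example, `sidecar_url("https://rest-gateway.discovery.wmnet:4113/some/path")` returns `"http://rest-gateway.discovery.wmnet:41111/some/path"`. Raising on an unmapped upstream, rather than falling back to direct TLS, avoids silently losing the HTTP metrics the trick exists to provide.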