[06:37:39] good morning :)
[07:06:42] Machine-Learning-Team, Data Engineering Planning, Research: Proposal: deprecate the mediawiki.revision-score stream in favour of more streams like mediawiki-revision-score- - https://phabricator.wikimedia.org/T317768 (elukey) >>! In T317768#8281631, @Ottomata wrote: > We put it in our current...
[08:38:23] Machine-Learning-Team, serviceops, Patch-For-Review: Fix calico, cfssl-issuer and knative-serving Helm dependencies - https://phabricator.wikimedia.org/T303279 (JMeybohm) Fixed for calico with v3.23.3
[08:52:11] going afk for some errands, ttl!
[09:38:51] kevinbazira: o/
[09:41:58] https://phabricator.wikimedia.org/T317531#8279625 > I think you put the result in wrong cells in the table? The 2nd row of the old model should be exchanged with the 1st row of the new model? Could you have a look?
[09:49:43] The old model's result drops when changing ref tags to sfn templates because the model doesn't recognize sfn templates. The new model's result should be consistent for ref tags and sfn templates.
[09:51:15] right?
[10:19:52] Morning!
[11:42:07] I am checking the rps values reported in https://grafana.wikimedia.org/d/HIRrxQ6mk/ores?orgId=1&refresh=1m
[11:42:49] and I think that we should really be able to sustain that workload with the current Lift Wing config, maybe just tuning the wikidata/enwiki pods a little (like 2/3 pods each)
[12:13:04] Agreed. Have we done load tests that are long enough to see how e.g. https://grafana.wikimedia.org/d/000000519/kubernetes-overview?orgId=1&var-datasource=thanos&var-site=codfw&var-cluster=k8s-mlserve "reacts"?
[12:15:09] especially the bottom-most graphs and of course the container details page
[12:17:47] aiko, just for my curiosity, what does the GPL in gpllimit stand for?
[12:18:18] (CR) Klausman: [C: +1] outlink: add WP code list and increase gpllimit for MW API call [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/837642 (owner: AikoChou)
[12:24:21] klausman: should be the pagination limit, i.e. how many results at most to return in a single response (mwapi handles the multiple calls for each pagination transparently)
[12:24:52] ah so something like "general page length" or somesuch?
[12:28:03] (CR) Elukey: outlink: add WP code list and increase gpllimit for MW API call (2 comments) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/837642 (owner: AikoChou)
[12:32:40] yep
[12:36:42] it is very weird, I see these messages in the kserve logs
[12:36:43] 'HTTP used when HTTPS was expected.\nSubscribe to the mediawiki-api-announce mailing list at for notice of API deprecations and breaking changes. Use [[Special:ApiFeatureUsage]] to see usage of deprecated features by your application.'
[12:37:36] Hmm. So we speak http to istio, and it _should_ then do TLS to the MW API, right?
[12:38:20] I am wondering if I have to set something like "https://api-ro.discovery.wmnet:80"
[12:38:38] api-ro speaks TLS on port 80?
[12:38:44] TIL.
[12:39:06] no it doesn't, it is to trick istio
[12:39:46] I would've thought that you can tell it to elevate certain requests from plain to TLS
[12:40:07] Without trickery, that is
[12:40:26] in theory the sidecar does it behind the scenes, but if the mw api servers answer with "this is HTTP etc.." then something is off
[12:40:36] Agreed
[12:41:13] how does istio know to do the elevation? Does it just try it for every request it gets?
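To make the 12:36-12:41 exchange concrete: the model servers speak plain HTTP to their local istio sidecar and rely on the mesh to originate TLS towards the MW API. Below is a minimal sketch of that client side, assuming a call through the api-ro.discovery.wmnet endpoint with a Host header selecting the target wiki; the endpoint, headers, query parameters and use of requests are illustrative assumptions, not the actual inference-service code.

```python
# Minimal sketch (not the actual inference-service code): call the MW API over
# plain HTTP and let the istio sidecar originate TLS towards api-ro.
# The endpoint, Host header and query parameters are assumptions for illustration.
import requests

MW_API = "http://api-ro.discovery.wmnet/w/api.php"  # plain HTTP on purpose

params = {
    "action": "query",
    "prop": "info",
    "titles": "Earth",
    "format": "json",
    "formatversion": 2,
}

resp = requests.get(
    MW_API,
    params=params,
    # Select the target wiki while hitting the internal discovery endpoint.
    headers={"Host": "en.wikipedia.org", "User-Agent": "liftwing-debug/0.1"},
    timeout=10,
)
data = resp.json()
print(resp.status_code, data.get("query", {}).get("pages", []))
```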
[12:41:15] also the same trick for eventgate doesn't work
[12:41:36] it is stored in the destination-rule/service-entry config
[12:42:12] https://istio.io/latest/docs/tasks/traffic-management/egress/egress-tls-origination/
[12:42:44] klausman: https://www.mediawiki.org/wiki/API:Links > this is the related doc, but it's not very clear. It mentions "pllimit" is how many links to return. prop=links (pl)
[12:43:55] elukey: I can't find any active rule for that
[12:44:30] klausman: where did you check?
[12:44:51] the deployment-charts repo
[12:44:59] klausman: I guess "g" means generator, because we set generator=links
[12:45:10] klausman: it is in admin_ng's ml-serve.yaml
[12:45:12] aiko: ah, makes sense. Thanks!
[12:46:49] elukey: argh. destinationRule vs. destination_rules
[12:48:02] klausman: sure, feel free to change it
[12:49:40] nah it's fine
[12:50:29] I am not sure I understand the docs on the `port` item correctly
[12:51:02] or rather, if I am looking at the right thing.
[12:51:55] if ./helmfile.d/admin_ng/values/ml-serve.yaml:201 ff are instructing istio to accept plain requests on port 80, but elevate them to TLS, I am not quite sure where the 80->443 thing would come from
[12:52:23] from the service entry
[12:52:26] in theory
[12:52:42] see the example in https://istio.io/latest/docs/tasks/traffic-management/egress/egress-tls-origination/#tls-origination-for-egress-traffic
[12:53:42] aaah, the missing piece was the serviceEntry
[12:53:58] ok, then we got the right config
[12:54:20] Are we maybe hitting an API endpoint that isn't api-ro?
[12:54:38] there is also another bit, the virtualservice
[12:55:04] it is not mentioned in the above docs since I think it changed (got simplified) in recent istio versions
[12:55:20] but in https://github.com/istio/istio/issues/33105 for example, there is a config with virtual service - destination rule - service entry
[12:55:29] and in the vs we explicitly map 80->443
[13:08:06] (PS2) AikoChou: outlink: add WP code list and increase gpllimit for MW API call [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/837642
[13:09:46] ah interesting - I am using the wrong port in api-ro's destination rule (443 instead of 80) and for some reason it worked
[13:10:02] If I use port 80 I see
[13:10:02] {"error": "Could not decode as JSON:\nupstream connect error or disconnect/reset before headers. reset reason: connection termination"}
[13:10:13] that seems to be the issue in https://github.com/istio/istio/issues/33105
[13:10:42] no mmm
[13:10:45] istio is so clear :D
[13:11:00] "Clear as mud", as Stevie says
[13:11:25] ok yeah now I see, the istio upstream docs are different from https://github.com/istio/istio/issues/33105
[13:12:56] Do they maintain the old docs somewhere?
[13:15:54] there is https://istio.io/v1.9/docs/tasks/traffic-management/egress/egress-tls-origination/
[13:16:26] it doesn't mention the vs, weird
[13:16:36] I remember that in my tests without the vs it didn't work
[13:16:48] time to do some tests again on staging
[13:16:51] sigh
[13:16:52] Machine-Learning-Team, Observability-Metrics, serviceops, Kubernetes: Don't scrape every containerPort for metrics - https://phabricator.wikimedia.org/T318707 (bking) a: bking→None Unassigning for now, will circle back later this week or next week to discuss further.
[13:17:40] afk for a bit
[13:36:33] kevinbazira: did you see Aiko's comment in IRC about https://github.com/wikimedia/articlequality/pull/174 ?
[13:36:54] nope ... let me check
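To make the gpllimit discussion above (12:17-12:45) concrete, here is a rough sketch of a generator=links query where the g-prefixed gpllimit caps how many linked pages come back per response, and the query is continued until the generator is exhausted. mwapi handles this continuation loop transparently; the sketch uses requests directly, and the wiki, page title and limit are placeholders, not the values used by the outlink service.

```python
# Rough sketch of a paginated generator=links query; the wiki, title and limit
# are illustrative, not the values used by the outlink model server.
import requests

API = "https://en.wikipedia.org/w/api.php"
session = requests.Session()
session.headers["User-Agent"] = "liftwing-debug/0.1"

params = {
    "action": "query",
    "generator": "links",   # the "g" prefix in gpllimit comes from this
    "titles": "Toni Morrison",
    "gpllimit": 500,        # max linked pages returned per response
    "format": "json",
    "formatversion": 2,
}

outlinks = []
while True:
    data = session.get(API, params=params, timeout=10).json()
    for page in data.get("query", {}).get("pages", []):
        outlinks.append(page["title"])
    # The API returns a "continue" block while more results are available;
    # mwapi runs this loop transparently when continuation is enabled.
    if "continue" not in data:
        break
    params.update(data["continue"])

print(len(outlinks), "outlinks")
```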
[13:39:05] I've just seen it in the logs ...
[13:40:43] I figured :)
[13:48:09] thanks for the notice elukey
[13:48:21] klausman: I tried to follow the istio tutorial to the letter, I believe that the errors I found at the time (when testing) were https://github.com/istio/istio/issues/33105
[13:48:25] sigh
[13:48:27] kevinbazira: np!
[13:50:09] elukey: so it's "just" a cipher problem?
[13:50:13] Machine-Learning-Team: Migrate ORES clients to LiftWing - https://phabricator.wikimedia.org/T312518 (Isaac) > Do you think that we need the Mariadb tables for the dumps, or would it be ok to explore alternatives like the mediawiki.revision-score dataset in Hive/HDFS? If the latter is viable we could partner...
[13:52:04] aiko, sorry, when my internet drops I can miss some IRC messages. To respond to your question - the results in https://phabricator.wikimedia.org/T317531#8279625 are from text_35130784 and text_35130948, which are substrings of revids 35130784 and 35130948. The results were not put in the wrong cells. You could also simulate it using the same workflow and you will get the same results.
[13:52:53] klausman: still need to figure it out, but the problem described in the gh issue is very similar to what I see now
[13:53:30] klausman: I also checked the istio-proxy logs more carefully, and indeed now (with the new config from the istio docs) I see the correct upstream host being used
[13:53:48] so for some reason, I hit the bug and found a config that worked with api-ro
[13:53:58] but failing a little between the logs
[13:54:02] so lucky
[13:54:35] yeah, complete, loud failure is better than this fuzzy half-working thing
[13:54:55] And if the GW didn't yell about not using crypto, we might have never noticed.
[13:55:20] imagine if we were talking to an outside service, and had sent/received sensitive stuff :(
[13:56:03] not sure why the config worked, but api-ro doesn't accept non-tls conns
[13:56:09] so it was surely encrypted
[13:56:18] this is why I was relatively sure it worked
[13:57:00] But.... why were we getting the "http used when you should use https" message?
[13:57:32] there may be something weird happening between istio -> lvs -> nginx -> apache (on mw servers)
[13:57:37] with the current config
[16:29:05] Lift-Wing, Machine-Learning-Team (Active Tasks): API Gateway Integration - https://phabricator.wikimedia.org/T288789 (hnowlan)
[16:29:17] Machine-Learning-Team, Platform Team Initiatives (API Gateway), Platform Team Workboards (Platform Engineering Reliability): Proposal: add a per-service rate limit setting to API Gateway - https://phabricator.wikimedia.org/T295956 (hnowlan) Open→Resolved a: hnowlan
[16:32:28] * elukey afk!
[16:53:41] Machine-Learning-Team, MediaWiki-extensions-ORES: Move backend of ORES MediaWiki extension to Lift Wing - https://phabricator.wikimedia.org/T319170 (Umherirrender)
[23:30:58] Machine-Learning-Team, ORES, MediaWiki-Core-Preferences, Moderator-Tools-Team (Kanban), Patch-For-Review: When ORES quality filters are selected in mobile web, entries should be highlighted - https://phabricator.wikimedia.org/T314026 (Jdlrobson) if highlighting is not working then it's likely...
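On the 13:56-13:57 question (api-ro only accepts TLS connections, yet MediaWiki still warned about plain HTTP): one way to check whether the sidecar is really originating TLS is to look for that exact warning in an API response. A small debugging sketch, assuming the message surfaces in the response's warnings block the same way it did in the kserve logs quoted at 12:36; the endpoint and Host header are the same illustrative assumptions as in the first sketch above.

```python
# Debugging sketch: make a cheap MW API call through the istio sidecar and
# check whether MediaWiki thinks the request arrived over plain HTTP.
# Endpoint and Host header are assumptions, matching the discussion above.
import json
import requests

MW_API = "http://api-ro.discovery.wmnet/w/api.php"

resp = requests.get(
    MW_API,
    params={"action": "query", "meta": "siteinfo", "format": "json"},
    headers={"Host": "en.wikipedia.org", "User-Agent": "liftwing-debug/0.1"},
    timeout=10,
)
warnings = resp.json().get("warnings", {})

# Flatten whatever structure the warnings come back in and look for the
# message seen in the kserve logs at 12:36.
if "HTTP used when HTTPS was expected" in json.dumps(warnings):
    print("MediaWiki saw plain HTTP -> TLS origination is not happening")
else:
    print("no plain-HTTP warning -> TLS origination looks fine")
```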