[05:52:13] Machine-Learning-Team, MediaWiki-extensions-ORES, CheckUser, Growth-Team, and 27 others: WikiPage::doEditContent falls back to $wgUser - https://phabricator.wikimedia.org/T255507 (DannyS712)
[06:09:33] Machine-Learning-Team, MediaWiki-extensions-ORES, CheckUser, Growth-Team, and 27 others: WikiPage::doEditContent falls back to $wgUser - https://phabricator.wikimedia.org/T255507 (DannyS712)
[06:23:47] hi!
[06:23:54] from kfserving's slack channel: https://docs.google.com/presentation/d/1JFI0lY_M5NOnRVZpH9wFDY5ZIIdylAemyOPX-_cSzLg/edit#slide=id.gdfed017b30_1_6
[07:07:13] Machine-Learning-Team, MediaWiki-extensions-ORES, CheckUser, Commons, and 29 others: WikiPage::doEditContent falls back to $wgUser - https://phabricator.wikimedia.org/T255507 (DannyS712)
[07:21:24] really interesting:
[07:21:25] https://towardsdatascience.com/lessons-on-ml-platforms-from-netflix-doordash-spotify-and-more-f455400115c7
[07:43:02] running errand for ~1h/1:30h, ttl!
[09:27:25] Machine-Learning-Team, MediaWiki-extensions-ORES, CheckUser, Commons, and 30 others: WikiPage::doEditContent falls back to $wgUser - https://phabricator.wikimedia.org/T255507 (DannyS712)
[10:55:36] Machine-Learning-Team, MediaWiki-extensions-ORES, CheckUser, Commons, and 30 others: WikiPage::doEditContent falls back to $wgUser - https://phabricator.wikimedia.org/T255507 (Nikerabbit)
[10:56:26] Machine-Learning-Team, MediaWiki-extensions-ORES, CheckUser, Commons, and 30 others: WikiPage::doEditContent falls back to $wgUser - https://phabricator.wikimedia.org/T255507 (Nikerabbit)
[12:18:04] wow the knative istio config is intricate
[12:18:31] but I was able to add an HTTPS gateway to ingressgateway
[12:19:38] but it is still not working due to the TLS name mismatch
[12:28:18] something is working but still looking a little weird
[12:28:25] so far I got this working
[12:28:40] echo $SERVICE_HOSTNAME
[12:28:40] enwiki-goodfaith.default.example.com
[12:28:46] VICE_HOSTNAME}" https://inference-service.wikimedia.org:31137/v1/models/${MODEL_NAME}:predict -X POST -d @input.json
[12:28:58] err sorry
[12:28:59] curl --resolve inference-service.wikimedia.org:31137:$(./minikube ip) --cacert ingress-tls/wikimedia.org.crt -v -H "Host: ${SERVICE_HOSTNAME}" https://inference-service.wikimedia.org:31137/v1/models/${MODEL_NAME}:predict -X POST -d @input.json
[12:29:58] I added a gateway listening on port 443 (mapped to node port 31137) with a TLS cert for inference-service.wikimedia.org
[12:30:34] but the gateway needs the Host: header properly set up (in this case, enwiki-goodfaith etc.) to find the target
[12:31:49] ah but this makes sense, TLS uses SNI etc..
[12:32:56] well no sorry scratch that, in this case the TLS handshake succeeds because I manually added a "fake" DNS entry
[12:33:04] via --resolve
[12:33:33] all right then it looks like it's working, we need to brainstorm a little about how this API will be called
[12:33:59] (.example.com is of course not great)
[12:34:49] going to try to start from scratch again in minikube to see if the updated settings work
[12:35:29] the multi-gateway config between istio/knative is a little confusing
[12:35:34] and intricate
[13:51:25] ok so remaining things to do:
[13:51:45] 1) add fixed nodePorts to the istio config for ingress (atm they are randomly assigned)
[13:52:02] 2) figure out what to use as endpoint for health checking
[13:52:16] 2) will be useful when creating the config for the LVS load balancer
[14:02:56] ok found something, there is a health URI for the istio gateway as a whole
[14:03:00] not for each individual service
[14:03:07] may be a compromise, should work
[14:04:01] oooook I have to retry this from scratch just to be sure, but it looks like we have something
[14:04:34] once we have the TLS certs created and added as secrets to k8s (one for the ingress gw, the other one for the kfserving webhook) we should be ok
[15:00:55] Machine-Learning-Team, artificial-intelligence, Wikilabels, articlequality-modeling: Build article quality model for Dutch Wikipedia - https://phabricator.wikimedia.org/T223782 (RonnieV) I am just looking at the table @Halfak gave on June 3. Two of the parameters say 'enwiki' instead of 'nlwiki'...
[16:20:07] o/
[16:20:20] o/
[16:20:23] elukey: sounds like things are going well
[16:20:40] i agree about figuring out our DNS for the api
[16:20:44] accraze: if you mean my mental sanity I'd say no, but the rest looks better :D
[16:20:47] hahaha
[16:21:32] does what I wrote above sound ok? Basically the LVS endpoint will be inference-service.wikimedia.org (or any other name that we want), and it will have a TLS cert
[16:21:44] then we'll use a Host: blabla http header to target a specific backend
[16:21:53] this way it seems to work fine
[16:21:59] yeah that sounds reasonable to me
[16:23:57] perfect, I am now trying to translate all of this into something suitable for deployment charts
[16:24:56] haha yeah that's similar to where i'm at right now, will be spending the day figuring out blubber for the revscoring images and then looking at making charts for the inference services
[16:25:25] things are going well on the ORES migration front, kevinbazira has created a model server for our article quality models which are running well, so now we have 2/3 model classes that can run on Lift Wing
[16:25:37] very nice!
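The routing scheme discussed above uses one LVS/TLS endpoint (inference-service.wikimedia.org), with the Host header selecting the InferenceService. It can be sketched as a small shell wrapper around the 12:28:59 curl call. This is a sketch under assumptions, not the team's tooling. The helper names `service_hostname`, `predict_url` and `predict` are hypothetical. The gateway name, node port 31137 and CA path come from the minikube test in the log. The hostname shape `name.namespace.domain` with `example.com` matches the echoed `$SERVICE_HOSTNAME`, which is consistent with Knative's default domain template and default domain. Using the InferenceService name as MODEL_NAME is also an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helpers sketching Host-header routing behind a single TLS gateway.
# Assumed values below are taken from the minikube test, not from a real deployment.
set -euo pipefail

GATEWAY_HOST="inference-service.wikimedia.org"  # name on the gateway's TLS cert
GATEWAY_PORT="31137"                             # node port mapped to the gateway's 443
CA_CERT="ingress-tls/wikimedia.org.crt"          # CA used to validate the gateway cert

# Build the per-service hostname in Knative's default {{.Name}}.{{.Namespace}}.{{.Domain}}
# shape; "default" and "example.com" are the defaults seen in the log.
service_hostname() {
  local name="$1" namespace="${2:-default}" domain="${3:-example.com}"
  printf '%s.%s.%s' "$name" "$namespace" "$domain"
}

# The URL always points at the shared gateway; only the path names the model
# (KFServing v1 predict path, as in the curl call from the log).
predict_url() {
  local model_name="$1"
  printf 'https://%s:%s/v1/models/%s:predict' "$GATEWAY_HOST" "$GATEWAY_PORT" "$model_name"
}

# TLS is validated against GATEWAY_HOST (pinned to node_ip via --resolve, i.e. the
# "fake" DNS entry), while the Host header tells the gateway which backend to route to.
predict() {
  local model_name="$1" host_header="$2" node_ip="$3" input_file="$4"
  curl --resolve "${GATEWAY_HOST}:${GATEWAY_PORT}:${node_ip}" \
       --cacert "$CA_CERT" \
       -H "Host: ${host_header}" \
       -X POST -d @"${input_file}" \
       "$(predict_url "$model_name")"
}
```

Usage would look like `predict enwiki-goodfaith "$(service_hostname enwiki-goodfaith)" "$(./minikube ip)" input.json`. The design keeps one certificate and one LVS service for all models. The trade-off is that the gateway-wide health URI is the only health signal; individual services are not checked, as noted at 14:03.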
[16:26:15] I guess that when we put it all together it will explode for sure, but after some debugging we should be able to have a good lift wing prototype
[16:26:53] haha yeah it will be interesting to see the first pass of putting everything together
[16:31:07] I now feel a little more comfortable navigating/debugging the whole stack, but istio and knative are definitely intricate
[16:31:17] and upgrading the settings will be very delicate
[16:31:56] one good thing is that, in theory, since we'll be active/active in eqiad/codfw we should be able to drain one cluster at a time and operate on it freely
[16:32:54] yeah, i kinda hope they make those pieces of the stack optional sooner rather than later
[16:33:47] but good point about draining clusters
[16:34:50] istio and knative are very powerful, I doubt that they will replace them easily
[16:37:38] the only bit that I don't like is that kfserving doesn't provide much documentation about how things work internally
[16:38:10] either you use the upstream configs and apply them without asking questions, or if you need to make changes you need to check a ton of yaml things
[16:56:57] going out for a run, ttl :)
[16:57:13] have a good evening elukey!
[22:12:37] Lift-Wing, artificial-intelligence, articlequality-modeling, revscoring, Machine-Learning-Team (Active Tasks): Create a KFServing model server for articlequality models - https://phabricator.wikimedia.org/T284678 (ACraze) @kevinbazira excellent work on this so far! I have tested it out on the...
[22:29:32] Lift-Wing, artificial-intelligence, editquality-modeling, revscoring, Machine-Learning-Team (Active Tasks): Create migration plan for editquality models from ORES to Lift Wing - https://phabricator.wikimedia.org/T284689 (ACraze)
[22:43:49] Lift-Wing, artificial-intelligence, editquality-modeling, revscoring, Machine-Learning-Team (Active Tasks): Create migration plan for editquality models from ORES to Lift Wing - https://phabricator.wikimedia.org/T284689 (ACraze) We have been discussing how best to handle deployment and CI/CD...