[06:51:38] accraze: o/ I will check later on to see what the error is
[08:32:57] Machine-Learning-Team, Platform Team Initiatives (API Gateway): Proposal: add a per-service rate limit setting to API Gateway - https://phabricator.wikimedia.org/T295956 (elukey)
[11:01:11] Machine-Learning-Team, Platform Team Initiatives (API Gateway): Proposal: add a per-service rate limit setting to API Gateway - https://phabricator.wikimedia.org/T295956 (Urbanecm) No problem with this (if there's a use case here), but note that individual clients with an acceptable need for higher rate...
[12:16:56] Machine-Learning-Team, Platform Team Initiatives (API Gateway): Proposal: add a per-service rate limit setting to API Gateway - https://phabricator.wikimedia.org/T295956 (elukey) Hi @Urbanecm! Thanks for the link, very interesting, I didn't know it. My understanding of the API-Gateway is still very high...
[12:21:56] istio network policies deployed! They seem to be working fine
[12:22:01] tried to kill some pods too
[12:22:21] * elukey lunch
[12:50:29] the last step is to add default-deny to the global network policies
[13:27:56] deployed! I am deleting pods to make sure that they can be re-created correctly
[13:28:25] it will also be good for https://phabricator.wikimedia.org/T289578 (I restarted docker on the ml-serve nodes yesterday)
[13:40:32] all pods restarted
[13:43:19] all metrics look good
[13:43:23] \o/
[13:59:35] Lift-Wing, Machine-Learning-Team, Patch-For-Review: Add network policies to the ML k8s clusters - https://phabricator.wikimedia.org/T289834 (elukey) Istio policies applied, plus global default-deny (same used in other clusters) applied. Deleted all the pods, they came back up correctly!
[13:59:56] Lift-Wing, Machine-Learning-Team (Active Tasks): Lift Wing proof of concept - https://phabricator.wikimedia.org/T272917 (elukey)
[14:00:05] Lift-Wing, Machine-Learning-Team, Patch-For-Review: Add prometheus metrics collection for Istio and Knative - https://phabricator.wikimedia.org/T289841 (elukey) Open→Resolved a:elukey
[14:02:45] Lift-Wing: Bootstrap the ml-serve-codfw cluster - https://phabricator.wikimedia.org/T294412 (elukey) The base network policies have been deployed to eqiad, so we can proceed with codfw (to double check that everything will go fine when starting from scratch). I reviewed the puppet private and public config,...
[14:27:42] going to run an errand for a couple of hours, ttl!
[15:51:12] Machine-Learning-Team, artificial-intelligence, Wikidata, Wikidata-Query-Service, articlequality-modeling: Add ORES article quality predictions to the WDQS - https://phabricator.wikimedia.org/T257341 (Gehel)
[16:37:19] o/
[17:03:36] accraze: o/
[17:03:45] still haven't checked the ml sandbox, doing it now!
[17:04:52] mmm how do you check pods etc..?
[17:04:56] I guess there is minikube
[17:05:29] ahh it is installed okok
[17:06:21] elukey: yeah it's minikube, i think the issue is related to the istio gateway but unsure
[17:06:36] accraze: how do you check pods?
[17:06:45] k get po -A
[17:08:14] and those are aliases, yeah I see them in your home
[17:08:54] ahh whoops yeah.... i aliased `minikube kubectl` to be `k`
[17:13:54] accraze: so I think that it works for you since you have a .kube dir in your homedir
[17:13:58] with the config etc..
[17:14:53] Jade, Machine-Learning-Team: Investigate MCR support gap for Jade purposes - https://phabricator.wikimedia.org/T204303 (CBogen)
[17:19:11] how did you and Kevin share minikube on the previous sandbox?
[17:20:40] the minikf distro did it all for us :/
[17:21:10] ahhh
[17:24:32] oh i remembered i installed minikube to /usr/local/bin
[17:25:31] oof
[17:30:13] accraze: the main issue is that if I run minikube etc.. I get that no cluster is running
[17:30:30] I can try a hack and copy your .kube dir
[17:30:51] ah i see, yeah copying the .kube dir might work
[17:32:19] also the .minikube dir
[17:32:24] plus some tweaks
[17:36:44] accraze: ok I made it :D
[17:37:29] one thing that I noticed is that there is no cluster-local-gateway pod
[17:37:40] what istioctl config did you apply?
[17:39:44] when i do `k get gw -A` it shows cluster-local-gateway in the knative-serving namespace
[17:41:22] i used istioctl from the wmf APT repo and then applied istio-minimal-operator.yaml
[17:41:26] https://wikitech.wikimedia.org/wiki/User:Accraze/MachineLearning/Local_Kserve#Istio
[17:44:28] serving.knative.dev/release=v0.22.0
[17:44:30] :)
[17:44:42] you are in the future!! :D
[17:44:50] omggg it should be v0.18.x??
[17:45:15] exactly yes! From 0.19 onward we don't need the cluster-local-gateway pod in the istio namespace
[17:45:23] ahhhhhh
[17:46:03] okok that makes a lot of sense
[17:52:14] i'll think about how to share the cluster for all users, i think minikf ran everything inside of virtualbox
[17:57:43] oh actually when i check the knative version i get v0.18.1
[17:58:00] `kubectl get namespace knative-serving -o 'go-template={{index .metadata.labels "serving.knative.dev/release"}}'`
[18:03:52] how did you deploy knative?
[18:06:06] i downloaded the crd/core/release yaml from github and then changed the images to use the ones from the wmf registry
[18:08:09] and then just did k apply -f ... for each of those
[18:09:52] ah weird then
[18:10:15] ahhh ok wait a min, you picked the latest release yaml files right?
[18:10:20] not the 0.18.1 ones
[18:12:15] nah i actually did the 0.22.0 yaml files last week and then re-did it on monday using the 0.18.1 files
[18:12:56] kinda confused how you are seeing the 0.22.0 versions still
[18:18:38] very weird then
[18:18:49] I just did kubectl describe pod blabla on one
[18:20:44] the docker image is Image: docker-registry.wikimedia.org/knative-serving-activator:0.18.1-3-20211107
[18:21:20] so I can't explain the 0.22, maybe when you re-applied the 0.18.1 config it was not from a clean state?
[18:22:06] anyway, for istio we surely need to add the cluster-local-gateway
[18:22:13] haha yeah maybe that was it, i'll try tearing down and starting over today now that i have a better understanding of things
[18:22:34] the one that we have in deployment-charts explicitly creates it
[18:22:51] otherwise local comms between pods will not happen (since we don't use the mTLS configs)
[18:36:59] going afk!
[18:37:15] accraze: please post any questions/doubts/etc.. here if needed, I'll try to answer tomorrow morning!
[18:38:30] ok cool thanks, have a good evening elukey!
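A possible reading of the ".kube + .minikube copy" hack that elukey describes above (17:30-17:36), as a minimal sketch: the paths, the `accraze` source homedir, and the exact "tweaks" are assumptions, since the log does not spell them out.

```bash
# Hypothetical sketch: share accraze's existing minikube cluster with another
# sandbox user by copying his client config and cluster state dirs.
# Assumes ~/.kube and ~/.minikube do not already exist for the current user.
sudo cp -r /home/accraze/.kube "$HOME"/.kube
sudo cp -r /home/accraze/.minikube "$HOME"/.minikube
sudo chown -R "$USER": "$HOME"/.kube "$HOME"/.minikube

# The copied kubeconfig still references cert paths under the original
# homedir, so rewrite them (one plausible interpretation of "some tweaks"):
sed -i "s|/home/accraze|$HOME|g" "$HOME"/.kube/config

# Sanity check, equivalent to the `k get po -A` alias in the conversation:
minikube kubectl -- get po -A
```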
[18:44:23] accraze: I just realized one thing while walking away from the keyboard :D You were talking about a dev chart, but we could try this compromise
[18:44:38] when I test charts locally I usually do
[18:45:26] helm3 template 'charts/knative-serving'
[18:45:33] in the deployment-charts repo
[18:45:55] if you don't pass anything to it, it will use the values.yaml contained in the chart, with default values
[18:46:15] but we set other values via helmfile
[18:46:59] we can try
[18:47:00] helm3 template -f helmfile.d/admin_ng/knative-serving/values.yaml charts/knative-serving
[18:47:59] if you save it as a yaml file it will be basically what we apply in production
[18:48:03] the same goes for kserve
[18:48:12] (but changing file paths of course)
[18:48:24] for kserve-inference
[18:49:11] helm3 template -f helmfile.d/ml-services/revscoring-editquality/values.yaml charts/kserve-inference
[18:49:14] etc..
[18:49:55] there are some things to tweak since you'll see a lot of RELEASE-NAME here and there (in prod helm replaces them)
[18:50:02] but with a quick sed they can be changed
[18:50:25] it would be nice to find something like the above that works so it would be easy-ish to re-use our prod charts
[18:50:43] EOF :)
[18:50:46] * elukey afk again
[18:51:40] elukey: awesome thanks! i will give this a try this afternoon
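One way to string elukey's suggestion together end to end, as a sketch: the output paths, the release names substituted for RELEASE-NAME, and applying the result with the `k` alias are assumptions for illustration, not part of the original instructions.

```bash
# Run from a checkout of the deployment-charts repo.

# Render the knative-serving chart with the production helmfile values:
helm3 template -f helmfile.d/admin_ng/knative-serving/values.yaml \
  charts/knative-serving > /tmp/knative-serving.yaml

# Same idea for kserve-inference, using the revscoring-editquality values:
helm3 template -f helmfile.d/ml-services/revscoring-editquality/values.yaml \
  charts/kserve-inference > /tmp/kserve-inference.yaml

# helm3 leaves RELEASE-NAME placeholders (as noted in the log); swap in a
# name before applying. The names chosen here are hypothetical:
sed -i 's/RELEASE-NAME/knative-serving/g' /tmp/knative-serving.yaml
sed -i 's/RELEASE-NAME/revscoring-editquality/g' /tmp/kserve-inference.yaml

# Apply to the local minikube cluster, e.g. via the `k` alias from earlier:
k apply -f /tmp/knative-serving.yaml
k apply -f /tmp/kserve-inference.yaml
```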