[15:45:06] ^^ Still having DNS resolution issues on dse-k8s: https://phabricator.wikimedia.org/T346048 Any suggestions? Right now I'm just tcpdumping and comparing configs with staging
[15:49:00] btullis any objections to me deploying spark on dse-k8s? I think rdf-streaming-updater is the only non-control-plane service that has a helmfile ATM
[15:52:41] inflatador: what kind of DNS resolution issues? Do you have specific queries that hang?
[15:53:43] the pods should use CoreDNS, which is correctly deployed on the nodes
[15:53:59] (plus if it didn't work we'd probably have seen explosions earlier on)
[15:56:45] elukey the containers appear to mount the resolv.conf from the host, which doesn't point at a CoreDNS IP, and they can't resolve using the host's resolver. I've one-offed a host to add a CoreDNS resolver and the container can resolve properly with that IP, but the containers I looked at in staging were just mounting resolv.conf from the host
[15:59:50] inflatador: do you have an example pod to check?
[16:01:09] elukey flink-app-wdqs-64c5576cc5-pwqgp on dse-k8s-worker1001.eqiad.wmnet
[16:02:04] are the containers expected to have a full routing table? That ctr only has an apipa route
[16:02:28] no server is expected to have a full routing table in our env
[16:02:32] or container
[16:02:37] nameserver 10.67.32.3
[16:02:45] a full routing table btw is >200k routes
[16:02:47] it seems the correct one
[16:03:09] akosiaris sorry, what I mean is a non-apipi route
[16:03:13] apipa that is
[16:03:47] inflatador: the pod doesn't mount any resolv.conf afaics
[16:04:30] dnsPolicy: ClusterFirst so yeah it should use CoreDNS
[16:09:57] inflatador: btw, regarding the route thing. It's a nice surprise the first time you encounter it. The logic is explained here: https://docs.tigera.io/calico/latest/reference/faq#why-does-my-container-have-a-route-to-16925411
[16:11:36] akosiaris Ah, makes sense. I've seen AWS doing magic with APIPA too
[16:19:53] a quick check btw: flink@flink-app-wdqs-64c5576cc5-pwqgp:/usr/local/lib/python3.7/dist-packages/pyflink$ host swift.discovery.wmnet
[16:19:53] swift.discovery.wmnet has address 10.2.2.27
[16:20:19] maybe it's not DNS but some network policies, add some more info in that task and maybe we can help more
[16:20:54] must be... thanks for taking a look!
[16:21:54] FWIW that def wasn't working earlier
[16:22:41] working now though... y'all are magic ;P
[16:25:46] I just did a kubectl exec, nothing more
[16:26:30] I've been using `nsenter -t 2660329 -n` to check, and I'm seeing the DNS issue again, should I use `kubectl exec` instead?
[16:26:49] nsenter -n only enters the network namespace
[16:26:59] not the mount namespace, so you use the host's mounts
[16:27:11] and that's why you saw the resolv.conf of the host
[16:27:17] pass -m too to nsenter
[16:27:42] but then you are faced with the problem of not having the tools in the container that you have on the host ofc.
[16:27:53] thankfully the host still exists
[16:28:05] yeah, but our containers are still way better for that than most ;)
[16:29:12] anyway, DNS doesn't seem to be the issue, sorry for the trouble
[16:31:05] * inflatador should've just used good ol' docker exec
[16:48:09] I think this is related to the weird helmfile.d/dse-k8s-services/rdf-streaming-updater/values-dse-k8s-eqiad.yaml override stuff we're doing. e-lukey already advised against doing this, but I got some pushback on that from my SWE. I'll talk it over w/him again
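(For context, a minimal sketch of how a helmfile release layers a per-cluster values file over shared defaults; the chart reference and exact file contents below are assumptions for illustration, not the actual deployment-charts layout.)

```yaml
# helmfile.yaml -- illustrative sketch only, not the real deployment-charts file
releases:
  - name: rdf-streaming-updater
    chart: wmf-stable/flink-app          # assumed chart reference for illustration
    values:
      - values.yaml                      # shared defaults
      - values-dse-k8s-eqiad.yaml        # cluster-specific overrides; later files win
```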
[16:53:32] at times i think templating in values files would be useful, and helmfile supports it with templates suffixed in .gotmpl, but i notice we don't use it anywhere so i've been avoiding it. Is there a particular reason it's not used anywhere?
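(A minimal sketch of the .gotmpl templating mentioned above: helmfile renders any values file with a .gotmpl suffix through Go templates before passing the result to helm. The keys below are hypothetical; `.Environment.Name` is a standard helmfile template variable.)

```yaml
# values.yaml.gotmpl -- hypothetical example, not a file from deployment-charts
app:
  cluster_name: {{ .Environment.Name }}
  # per-environment tweak without maintaining a separate values-<env>.yaml
  taskmanager_replicas: {{ if eq .Environment.Name "dse-k8s-eqiad" }}2{{ else }}1{{ end }}
```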