[08:47:00] Good morning. I could do with a little help to troubleshoot a networking problem with datahub on staging, if anyone has time. Here's a summary of where I'm blocked: https://phabricator.wikimedia.org/T329514#8966597
[08:52:01] I'm trying to get two migration jobs running during a deployment of datahub, to help with the upgrade. One of the jobs reaches out to MariaDB (outside the cluster) and that's fine. The second also needs to talk to an in-cluster service and it seems blocked. I'm drawing a bit of a blank, so I'd appreciate a second pair of eyes if possible. Thanks.
[09:21:17] btullis: as that's pod-to-pod traffic I would assume it just works as long as gms has a corresponding ingress rule (which it has)
[09:22:07] btullis: did you try to verify this via curl from the job's namespace, for example? maybe the error message is misleading
[09:22:56] jayme: No, not yet. That's a good idea.
[09:27:28] jayme: Would that be with an `nsenter` command like this? https://wikitech.wikimedia.org/wiki/Kubernetes/Troubleshooting#Exec_into_a_pod_and_run_commands
[09:27:49] yep
[09:28:45] OK, thanks. I will try now.
[10:16:13] https://www.irccloud.com/pastebin/ulich6mf/
[10:18:33] jayme: So it looks like at a network level I can get through. I had to use the IP of `datahub-gms-main-tls-service` because I had no DNS resolution. It's a bit tricky because the pod and namespace go away when the job fails, so I have to be quick.
[10:22:50] hm...DNS should work as well ofc
[10:24:12] well..maybe not like in the container, because nsenter will still use the node's resolv.conf if no netns-specific one is provided
[10:26:39] yeah, nsenter and DNS resolution don't play well together, cause you need to pass -m to get the mount namespace, but then you are in the container and you lose all the niceties of the tools on the host (and you could get the same experience anyway with either docker exec or kubectl exec)
[10:27:02] the mount namespace is needed for getting the pod's /etc/resolv.conf, just in case it isn't clear
[10:27:39] but one could copy the resolv.conf from the container fs and place it in /etc/netns/... right?
[10:29:38] hmm hadn't thought of that
[10:29:49] I don't know, try it out and let me know?
[10:30:25] 302 Location: b.tullis :-)
[10:34:39] Ah, thanks :) I might have a go at that. I'll see if I can get some way to get the container to hang around a bit longer as well.
[13:48:00] jayme: I'm wondering if it's related to TLS now. I see that a `ca.crt` certificate is being removed from the `datahub-gms-main-tls-proxy-certs` ConfigMap and the envoy image is updated to `1.18.3-2-s2`
[13:58:24] yeah, because we ship the wmf-certificates package with the envoy image now and use that
[13:59:28] the cert that gms tls-proxy uses is untouched
[14:20:57] OK, thanks. Unrelated then. I have something to try, so I'll see how it goes.
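
Note on the resolv.conf idea from 10:27:39: the following is a minimal sketch of what that copy step could look like, and it was not tried in this conversation. It assumes the container filesystem is reachable on the node (for example through `/proc/<pid>/root`) and that the pod's network namespace has been given a name, since the `/etc/netns/<name>/` directory is consulted by `ip netns exec <name>`, not by plain `nsenter`. The function name `stage_resolv_conf`, its parameters, and the `etc_root` override (there only so the sketch can be exercised without root) are hypothetical.

```python
import shutil
from pathlib import Path


def stage_resolv_conf(container_root, netns_name, etc_root="/etc"):
    """Copy a container's resolv.conf to <etc_root>/netns/<netns_name>/.

    container_root: path to the container's root filesystem as seen from
                    the node (assumption: something like /proc/<pid>/root).
    netns_name:     name under which the pod's network namespace is
                    registered with `ip netns` (assumption).
    etc_root:       normally /etc; overridable for dry runs and tests.
    Returns the path of the staged file.
    """
    src = Path(container_root) / "etc" / "resolv.conf"
    dest_dir = Path(etc_root) / "netns" / netns_name
    # Create /etc/netns/<name>/ if it does not exist yet.
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / "resolv.conf"
    # Copy file contents only; ownership and mode are left at defaults.
    shutil.copyfile(src, dest)
    return dest
```

With the file staged, a command run through `ip netns exec <name> ...` should see the pod's resolver configuration while still using the node's tools. Whether that fits this particular staging setup would need to be checked on a node.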