[12:59:57] btullis: o/ trying to deploy istio on dse
[13:01:24] elukey: much appreciated.
[13:01:46] worked!
[13:01:47] root@deploy2002:/srv/deployment-charts/custom_deploy.d/istio/dse-k8s# kubectl describe pods istiod-658c96c486-244lt -n istio-system | grep seccomp seccomp.security.alpha.kubernetes.io/pod: runtime/default
[13:01:54] well horrible paste sorry
[13:02:03] but the violation should be gone
[13:02:48] do you want also to do the cni removal?
[13:03:07] after that we should be able to also clean up via puppet etc..
[13:04:30] btullis: --^
[15:16:47] btullis: trying to ping again :)
[15:17:04] Oh, sorry. Looking now.
[15:18:29] That's excellent. Thank you. I think that removing the cni is a good idea, but not sure how to proceed.
[15:19:10] btullis: I can do it, my idea is to: 1) istioctl etc.. 2) merge the puppet change to clean up
[15:19:22] we can test 2) on one kubelet in isolation to be sure
[15:19:38] but the istio cni is only used by the istio sidecars
[15:19:42] so it should be really safe
[15:21:55] That would be excellent. Many thanks.
[15:26:31] all right going to do it in a bit
[15:37:45] Thank you. I'll keep an eye out for the puppet patches./
[15:37:54] patch
[15:40:30] btullis: istioctl executed, all good afaics
[15:40:40] ✔ Istio core installed
[15:40:43] ✔ Istiod installed
[15:40:46] ✔ Ingress gateways installed
[15:40:49] - Pruning removed resources
[15:40:52] Removed Deployment:istio-system:istio-cni-node.
[15:40:55] Removed ConfigMap:istio-system:istio-cni-config.
[15:40:57] Removed ServiceAccount:istio-system:istio-cni.
[15:41:00] Removed ClusterRole::istio-cni.
[15:41:02] Removed ClusterRole::istio-cni-repair-role.
[15:41:05] Removed ClusterRoleBinding::istio-cni.
[15:41:07] Removed ClusterRoleBinding::istio-cni-repair-rolebinding.
[15:41:10] ✔ Installation complete
[15:41:58] so we can now proceed with the puppet patch
[15:42:11] klausman: o/ I've updated the istio config on ml-staging-codfw, all good as well
[15:42:28] excellent, thank you!
[15:42:46] btullis: proceeding with https://gerrit.wikimedia.org/r/c/operations/puppet/+/1114753?forceReload=true
[15:43:04] klausman: the next step is to run it for prod, but we can wait
[15:43:13] it touches only istiod pods and they are fine on staging
[15:43:23] I'm currently in a meeting, so maybe tomorrow?
[15:43:50] yes yes anytime
[15:43:58] it is for the PSS migration, so even next week
[15:47:11] btullis: dse-k8s-worker1001 updated, and kubelet restarted just to be sure, all good! I'll let puppet finish the work
[15:47:24] lemme know if you notice anything weird later on
[15:48:03] Great. FYI, this one is bullseye (because of rocm packages that we haven't upgraded to bookworm) - Maybe it would be good to apply and restart a bookworm kubelet, too?
[15:50:19] nah it should be fine
[16:05:16] does anyone know how I should proceed to debug the 404 I'm getting when reaching out to mw-misc via the mesh? Thanks!
[16:05:16] >>> requests.get("http://envoy:6508/conf/dblists/open.dblist", headers={"Host": "noc.wikimedia.org"})
[16:05:16]
[16:05:16] >>> requests.get("https://mw-misc.discovery.wmnet:30443/conf/dblists/open.dblist", headers={"Host": "noc.wikimedia.org"})
[16:05:16]
[16:06:25] brouberol: what's the namespace?
[16:06:36] that'd be dse-k8s-eqiad/airflow-test-k8s
[16:09:19] note that _I_ added mw-misc / 6508 to the mesh, so it's more than probable that I have messed up something
[16:10:24] the discovery DNS points to a k8s ingress IIRC, which differs from other mw-api services, which have their dedicated VIP and pybal entries
[16:11:37] i think it may be auto_host_rewrite: true in the envoy config
[16:11:53] Indicates that during forwarding, the host header will be swapped with the hostname of the upstream host chosen by the cluster manager.
This option is applicable only when the destination cluster for a route is of type strict_dns or logical_dns, or when hostname field is not empty.
[16:12:01] where can I test this?
[16:12:14] do you have a host with a pod I can nsenter into?
[16:17:01] yes, 2s
[16:17:32] do you need the envoy pod, or pod that talks to envoy?
[16:17:51] pod that talks to envoy is fine
[16:17:54] airflow-scheduler-6877b455d7-dxs8x 2/2 Running 0 16m 10.67.26.70 dse-k8s-worker1006.eqiad.wmnet
[16:18:11] (I use python3/requests as the image does not have curl)
[16:27:32] Can i edit the envoy configmap or am I going to break stuff?
[16:28:26] I bet it's sets_sni that strips the host header, in which case we may need to set http_host in the service definition maybe
[16:28:33] if I can read envoy config and doc correctly
[16:29:55] I feel like a few of the times I've edited things like configmaps or deployments by hand, I've been able to cause changes that `helmfile diff` doesn't see
[16:30:02] (but not all of the times)
[16:30:35] i'd put it back as it was after :p
[16:33:20] > Can i edit the envoy configmap or am I going to break stuff?
[16:33:20] please do
[16:33:43] this is my team's sandbox
[16:34:05] oh god it's full of \n
[16:34:46] oh, you're trying to edit the config by hand?
If you tell me what you need to change, I can do that in an easier way (https://wikitech.wikimedia.org/wiki/User:Brouberol#Test_out_local_changes_in_a_test_application_in_Kubernetes)
[16:34:56] it's easier than kubectl edit
[16:35:16] (source: me, who had to kubectl edit some inlined python code defined in a configmap)
[16:38:09] I wanted to put append_x_forwarded_host: true in there to see if my theory was correct
[16:38:40] and also try without auto_host_rewrite and with host_rewrite_literal: noc.wikimedia.org
[16:44:41] _stares in the 100000 lines of envoy configuration_
[16:45:05] well that's not working anyways
[16:45:17] (i did the manual edit, and it's not adding the header)
[16:45:36] I have a family thing rn, and I'm taking the train to brussels tomorrow, so don't lose sleep over it FWIW
[16:47:14] Ah HA it works
[16:47:23] ok so
[16:49:04] What I think happens is that mw-misc.discovery.wmnet doesn't match any virtualhost in apache
[16:49:04] envoy with sets_sni replaces your host header with mw-misc.discovery.wmnet
[16:49:08] so you have to make a custom listener just for noc that doesn't put in sets_sni: true, but http_host: 'noc.wikimedia.org' instead
[16:49:48] that will force envoy to serve that as a host header no matter what
[16:50:43] gotcha, thanks. I'll do that probably on friday. <3<3
[16:50:55] now what happens if i don't set any of this
[16:51:52] ah, I crash envoy, that's what happens
[17:04:41] ok I fixed it, it's back to the original config
[17:06:14] brouberol: there's probably a "better" way to do this, but that's how I can get you to a working state with what I know of envoy
[17:06:20] maybe j.ayme has a better idea
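
Editor's sketch (not part of the log): the Host header behaviour discussed above, modelled as a small pure Python function. It assumes the Envoy route options named in the chat behave as described there: `auto_host_rewrite` swaps the Host header for the upstream hostname, `host_rewrite_literal` forces a fixed value, and otherwise the client's Host passes through unchanged. The function name `upstream_host_header` and its parameters are hypothetical, and how the mesh-level `sets_sni` / `http_host` settings map onto these options is an assumption drawn from the conversation, not from the actual templates.

```python
from typing import Optional


def upstream_host_header(
    client_host: str,
    upstream_hostname: str,
    auto_host_rewrite: bool = False,
    host_rewrite_literal: Optional[str] = None,
) -> str:
    """Return the Host header the upstream would see (illustrative model only)."""
    if auto_host_rewrite and host_rewrite_literal is not None:
        # Envoy defines these as alternatives of one "oneof" field,
        # so a route cannot set both at once.
        raise ValueError("set only one of auto_host_rewrite / host_rewrite_literal")
    if host_rewrite_literal is not None:
        # Forced value, regardless of what the client sent.
        return host_rewrite_literal
    if auto_host_rewrite:
        # Host is replaced by the hostname of the chosen upstream.
        return upstream_hostname
    # No rewrite configured: the client's Host header is forwarded as-is.
    return client_host


# Case from the chat: the client sends Host: noc.wikimedia.org via the mesh.
print(upstream_host_header(
    "noc.wikimedia.org", "mw-misc.discovery.wmnet", auto_host_rewrite=True
))  # mw-misc.discovery.wmnet (matches no Apache virtualhost, per the chat)
print(upstream_host_header(
    "anything.example", "mw-misc.discovery.wmnet",
    host_rewrite_literal="noc.wikimedia.org",
))  # noc.wikimedia.org
```

If this model is right, the 404 comes from the first case, and a dedicated listener that forces the Host value corresponds to the second.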