[12:25:32] brouberol: o/
[12:25:45] I think something is off on dse's istio-system
[12:26:04] get events says
[12:26:05] 6m26s Warning FailedCreate replicaset/istiod-74b5fd75f Error creating: pods "istiod-74b5fd75f-66lhm" is forbidden: violates PodSecurity "restricted:latest": seccompProfile (pod or container "discovery" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
[12:26:49] I also tried
[12:27:03] istioctl-1.15.7 manifest generate > /tmp/dse-generated.yaml and then istioctl-1.15.7 verify-install /tmp/dse-generated.yaml
[12:27:06] and I see
[12:27:18] ✘ Deployment: istiod.istio-system: Istio installation failed, incomplete or does not match "generated from in cluster operator installed-state": waiting for deployment "istiod" rollout to finish: 1 out of 2 new replicas have been updated
[12:28:53] at this point I'd try to re-run istioctl manifest apply; it seems to me that istiod is not running the right config
[12:59:36] I think that b.rouberol is going to be out today. Should I look into it, elukey? Or shall we leave it?
[13:00:20] btullis: o/ if you have time we can check together; my impression is that istiod's config is borked and I'd just re-run istioctl manifest apply -f etc.
[13:00:23] I don't believe that I've ever run the istio custom deployment, apart from maybe once when initially building the cluster.
[13:01:00] if you want I can do it, otherwise I can give you the command
[13:01:18] are we going to impact anything that runs on dse if it fails? Like, say, superset?
[13:03:18] Yeah, superset, several airflow instances, mpic.wikimedia.org - they're OK to fail for a bit, though of course Superset is critical when under DDoS.
[13:03:51] If you give me the command, I'll try it and we can see how it goes.
[13:05:19] ack! So it needs first `kube-env admin dse-k8s-eqiad`
[13:05:45] then
[13:05:46] istioctl-1.15.7 manifest apply -f /srv/deployment-charts/custom_deploy.d/istio/dse-k8s/config.yaml
[13:06:18] if you see above, I checked with verify-install what would be changed, and istiod seems to be the only borked one
[13:06:37] so even if it doesn't fix things, it should not cause any harm
[13:06:45] lemme know if/when you run it :)
[13:08:13] I have repeated your verify steps. Applying now.
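For reference, the PodSecurity "restricted" profile rejects any pod that doesn't declare a seccomp profile, which is exactly what the FailedCreate event above says. A minimal sketch of the shape the istiod pod template would need, assuming you were patching the Deployment by hand; this patch is illustrative only and was not run during the incident (istio manages the Deployment, so a manual patch would be overwritten):

```sh
# Hedged sketch: give the istiod pod template the pod-level seccomp
# profile that PodSecurity "restricted" demands. Alternatively, the
# same seccompProfile block can be set per-container on "discovery".
kubectl -n istio-system patch deployment istiod --type=strategic -p \
  '{"spec":{"template":{"spec":{"securityContext":{"seccompProfile":{"type":"RuntimeDefault"}}}}}}'
```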
[13:11:50] I'm still getting `Waiting for Deployment/istio-system/istiod`
[13:11:53] 94s Warning FailedCreate replicaset/istiod-74b5fd75f Error creating: pods "istiod-74b5fd75f-dxzlm" is forbidden: violates PodSecurity "restricted:latest": seccompProfile (pod or container "discovery" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
[13:14:28] https://www.irccloud.com/pastebin/LpDlnDHM/
[13:23:59] okok
[13:28:36] istio-system Active 704d app.kubernetes.io/managed-by=Helm,app=raw,chart=raw-0.3.0,heritage=Helm,istio-injection=disabled,kubernetes.io/metadata.name=istio-system,pod-security.kubernetes.io/audit=restricted,pod-security.kubernetes.io/enforce=restricted,pod-security.kubernetes.io/warn=restricted,release=namespaces
[13:29:01] so it seems to already have the enforce label set
[13:30:51] ahhh wait ok
[13:31:16] in admin-ng I see that all three PodSecurityStandard labels are already set
[13:32:36] in theory the pods should already have the right settings
[13:32:53] but I am wondering if the migration to PSS was done while istio was still in this weird state
[13:33:45] so maybe we could try disabling the "enforce" label temporarily, redeploying istio via istioctl and re-enabling it
[13:35:51] context in https://phabricator.wikimedia.org/T369492
[13:35:59] jayme: o/ any ideas/suggestions?
[13:41:31] 👀
[13:44:27] yeah you can def. remove the annotation from the namespace temporarily to unbreak it
[13:46:05] but very much no idea how it came to this. IIRC the dse and ml istio config is pretty different from wikikube/aux
[13:46:12] maybe there is something missing?
[13:53:54] I rechecked as well but didn't find much
[13:56:00] so to remove the label: kubectl label namespaces istio-system pod-security.kubernetes.io/enforce
[13:56:07] - (at the end)
[13:56:29] OK, doing it now.
[13:56:35] nono wait sorry :D
[13:56:37] it was a question
[13:57:05] yeah in theory it should work, btullis
[13:57:10] OK, not doing it now :-)
[13:57:16] let's do it: kubectl label namespaces istio-system pod-security.kubernetes.io/enforce
[13:57:22] OK.
[13:57:26] uff, - at the end after "enforce"
[13:58:02] So: `kubectl label namespaces istio-system pod-security.kubernetes.io/enforce-` ?
[13:58:10] yeah
[13:58:26] `namespace/istio-system unlabeled`
[13:58:40] Now try again with: `istioctl-1.15.7 manifest apply -f /srv/deployment-charts/custom_deploy.d/istio/dse-k8s/config.yaml` ?
[13:58:52] I don't see the label anymore, good, let's retry the apply!
[13:59:14] Proceeding.
[13:59:36] https://www.irccloud.com/pastebin/xLn3tDUw/
[14:00:54] I think this is due to some istio hiccup - https://github.com/istio/istio/issues/39573
[14:01:06] we can remove the mutating webhook and retry
[14:01:08] lemme check its name
[14:02:48] root@deploy2002:~# kubectl delete MutatingWebhookConfiguration istio-sidecar-injector -n istio-system
[14:02:51] warning: deleting cluster-scoped resources, not scoped to the provided namespace
[14:02:54] mutatingwebhookconfiguration.admissionregistration.k8s.io "istio-sidecar-injector" deleted
[14:02:57] btullis: let's retry :D
[14:03:04] Ack.
[14:03:34] Same error.
[14:04:45] btullis: one last time, I removed another webhook
[14:05:14] Looking good.
[14:05:24] https://www.irccloud.com/pastebin/Ersz9bHk/
[14:05:36] all righhhhttt
[14:05:52] the webhooks have been recreated
[14:06:09] two istiod running now
[14:06:19] Should we redeploy the namespaces from admin_ng to get the annotation back?
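Pulling the commands from the exchange above together, the unbreak-now sequence was roughly the following, all run from deploy2002 after `kube-env admin dse-k8s-eqiad`:

```sh
# 1. Temporarily lift PodSecurity enforcement on the namespace
#    (the trailing "-" removes the label):
kubectl label namespaces istio-system pod-security.kubernetes.io/enforce-

# 2. Delete the stale sidecar-injector webhook blocking the apply;
#    it is cluster-scoped (hence the warning in the log), and istiod
#    recreates it afterwards. See istio/istio#39573 for context.
kubectl delete MutatingWebhookConfiguration istio-sidecar-injector

# 3. Re-run the custom istio deployment:
istioctl-1.15.7 manifest apply -f /srv/deployment-charts/custom_deploy.d/istio/dse-k8s/config.yaml
```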
[14:07:11] I think we can do it manually: "kubectl label namespaces istio-system pod-security.kubernetes.io/enforce=restricted"
[14:07:44] https://www.irccloud.com/pastebin/q9WsqpEq/
[14:08:03] Hmm.
[14:08:09] what the...
[14:08:15] is it the old pod?
[14:08:25] no, the new one
[14:10:48] root@deploy2002:~# kubectl describe pod istiod-74b5fd75f-w7wsr -n istio-system | grep seccomp
[14:10:51] root@deploy2002:~# kubectl describe pod istiod-74b5fd75f-4ffmh -n istio-system | grep seccomp
           seccomp.security.alpha.kubernetes.io/pod: runtime/default
[14:11:05] ok so the new pod somehow doesn't carry the annotation
[14:11:33] trying to delete the old one carrying the label, to see what happens
[14:12:14] all right, now it fails to start as well, for the same reason
[14:12:23] so at this point the dse istio manifest is missing something
[14:14:16] btullis: I may have a patch in 5 mins, I'll ping you when done
[14:15:19] elukey: Ack.
[14:21:33] I think that the ml one is not working either
[14:22:08] mmm no, it appears to be working
[14:46:03] btullis: I'd try https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1114743
[14:46:18] I think addonComponents is not the right one
[14:46:32] if so, I have no idea why it works on ML
[14:46:49] but maybe the pods have been running for a ton of time
[14:51:15] another thing to probably delete is the CNI config
[14:51:26] we have it on ML since we use the istio sidecars, but otherwise it is not needed
[14:54:39] basically https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1114749
[14:54:46] but it makes sense to keep it separate
[14:58:29] (it is also not enabled via puppet in kubernetes.yaml, so we don't deploy its config etc. on dse)
[14:59:24] or we do, mmm
[15:01:07] uff, yeah, we need to change puppet as well
[15:01:13] this requires a task I think
[15:05:36] https://gerrit.wikimedia.org/r/c/operations/puppet/+/1114753
[15:05:59] I shouldn't complain, since this is partially my fault :D
[15:12:58] Oh sorry, got distracted. Just coming back to this.
[15:23:38] btullis: sorry for all the noise, I added a bigger cleanup to the pipeline :D Anyway, for the immediate issue: https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1114743
[17:51:26] btullis: slides looked pretty comprehensive for 'slapped together in a hurry' :) - thanks for taking the time
[17:52:14] Thanks very much, but you're right. I should have had a demo :-)
[17:53:47] eheh
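The durable fix is the deployment-charts patch linked at 14:46, which moves the seccomp settings out of `addonComponents`, where they have no effect on istiod. A minimal sketch of the idea, assuming the IstioOperator `components.pilot.k8s.securityContext` field is where per-component pod settings belong; the overlay file and its path are illustrative, not a copy of the actual change:

```sh
# Hedged sketch of the durable fix: declare the seccomp profile on the
# istiod (pilot) component in the IstioOperator spec. In practice the
# setting lives directly in the dse-k8s config.yaml, not an overlay.
cat > /tmp/istiod-seccomp-overlay.yaml <<'EOF'
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  components:
    pilot:
      k8s:
        securityContext:
          seccompProfile:
            type: RuntimeDefault
EOF
istioctl-1.15.7 manifest apply \
  -f /srv/deployment-charts/custom_deploy.d/istio/dse-k8s/config.yaml \
  -f /tmp/istiod-seccomp-overlay.yaml
```

Once istiod carries the profile itself, enforcement can go back on and the result spot-checked; both commands are from the log above (the pod name is the one seen at 14:10 and would need updating):

```sh
# Re-enable PodSecurity enforcement on the namespace:
kubectl label namespaces istio-system pod-security.kubernetes.io/enforce=restricted

# Verify that a running istiod pod actually carries a seccomp profile:
kubectl describe pod istiod-74b5fd75f-4ffmh -n istio-system | grep -i seccomp
```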