[13:31:02] elukey: when applying https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1017935, I did not see the change for the new tls setting in the diff.
[13:31:35] maybe the templating doesn't work that way?
[13:31:40] urandom: o/
[13:31:44] hi!
[13:32:27] nono my bad I haven't checked the helm lint before the +1
[13:32:32] lemme check the chart
[13:32:50] oh man, and the apply is just hanging there too
[13:32:54] it's not completing
[13:33:07] this is weird, maybe the new pod is failing
[13:33:37] yeah it is crashlooping
[13:33:52] {"msg":"Error connecting to Cassandra: gocql: unable to create session: unable to discover protocol version: x509: certificate is not valid for any names, but wanted to match cassandra-dev2003-b.codfw.wmnet","appname":"sessionstore","time":"2024-04-10T13:32:18Z","level":"FATAL"}
[13:33:56] should I ctrl+c?
[13:34:14] nono let it run, it will automatically rollback after 5/10 mins
[13:34:17] fun.
[13:34:29] ah ok the above makes sense!
[13:34:33] so even though it defaults to false, and I set it false, it's not...false
[13:34:37] it does?
[13:35:08] by default we se the new option true, and if we didn't see a change in helm lint it means that the templating needs to be tweaked to allow tls options
[13:35:12] *we set
[13:35:41] oooh, right
[13:35:44] yes, ofc
[13:36:01] but, why doesn't verify?
[13:36:19] I mean, one problem at a time I guess, but I also expected verification to work
[13:37:36] urandom: so the new verification is very strict, it checks cert expiry etc.. but also its CN/SANs, if they don't contain the target hostname "cassandra-dev2003..." then it fails
[13:37:47] we probably set a common CN: field for all the nodes
[13:37:57] like session-store-staging
[13:39:24] urandom: I am checking the kask chart's _config.yaml
[13:39:47] IIUC we basically manually inject the tls->ca option
[13:40:40] if you want I can rework it quickly and send a patch
[13:40:59] Subject: C = US, O = WMF, OU = cassandra-dev, CN = cassandra-dev2003-b
[13:41:29] :/
[13:41:45] ah so close! :D
[13:41:49] not the FQDN
[13:41:56] PKI will solve this don't worry
[13:47:53] elukey: the injection you're referring to happens in configmap.yaml? is that how you were thinking of doing this setting?
[13:48:22] urandom: it happens in _config.yaml afaics
[13:48:36] {{- with .Values.main_app.certs -}}{{- if .cassandra -}}
[13:48:36] {{- $tls := dict "ca" "/etc/cassandra-certs/ca.crt" -}}
[13:48:36] {{- $cassandra := set $cassandra "tls" $tls -}}
[13:48:37] {{- end }}{{ end }}
[13:49:34] I am not a big fan of the code above, I'd make it more simple and explicitly add the option to values.yaml
[13:52:10] or we can tweak it with another if
[13:53:47] From https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1017320, I assume the ca from values-staging.yaml is/was being injected somehow, is that not the case?
[13:53:58] how was that going to work?
[13:54:32] yes that is saved and configured in kask pods
[13:54:44] my point is that tls->ca is the only option that will be rendered
[13:55:00] because the cassandra->tls dict is overridden in the above code IIUC
[13:55:41] I know, I was just trying to understand how the former works, in order to understand how that might be transferred to the latter
[13:56:21] I can see why my tls change was ignored, I was wondering how to go about making that override-able
[13:56:38] honestly, all of this is so opaque to me...I need to fix that
[13:56:40] ah okok, filing a patch in a sec so you can tell me if you like it
[13:59:58] urandom: sorry I didn't see the above patch, now I understand
[14:00:37] so with https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1017320/1/helmfile.d/services/sessionstore/values-staging.yaml I modified main_app->certs->etc.. that is rendered by configmap.yaml, as you pointed out
[14:01:06] but the cassandra->tls value modified by you should have been rendered in /etc/kask/config.yaml, that is rendered by _config.yaml
[14:01:28] so in the former case, everything was overridden correctly, in the latter no due to the code that I pasted above
[14:01:32] basically different templates
[14:01:53] I hope to have got your question, if not we can meet later and go through the code!
[14:04:42] (meeting brb)
[14:09:13] swfrench-wmf: hey! I just have T360332 that's still running but it'll be over by the end of the week, it runs only on S1 atm. I'll pause schema changes after that to let you work and wait for your ping to pick it back up :) (ack marostegui !)
[14:09:13] T360332: Make the cupe_actor column nullable on WMF wikis - https://phabricator.wikimedia.org/T360332
[15:56:22] urandom: eventually I made it https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1018722
[15:56:29] I saw!
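Note on the override discussed above: the pasted `_config.yaml` snippet builds a fresh `tls` dict holding only `ca` and sets it on `$cassandra`, so any other `tls` key coming from values never reaches the rendered config. The following is a minimal Go sketch of that behaviour using plain maps rather than Helm itself. The function names `overrideTLS` and `mergeTLS` are hypothetical, and the merge variant is only one possible way to make the setting overridable; it is not necessarily what change 1018722 does.

```go
package main

import "fmt"

// caPath is the path hard-coded in the chart snippet pasted in the log.
const caPath = "/etc/cassandra-certs/ca.crt"

// overrideTLS mimics the pasted template: it replaces the whole "tls"
// entry, so keys such as enable_host_verification are dropped.
func overrideTLS(cassandra map[string]any) {
	cassandra["tls"] = map[string]any{"ca": caPath}
}

// mergeTLS (hypothetical alternative) keeps whatever "tls" keys were
// already supplied and only adds or updates "ca".
func mergeTLS(cassandra map[string]any) {
	tls, ok := cassandra["tls"].(map[string]any)
	if !ok {
		tls = map[string]any{}
	}
	tls["ca"] = caPath
	cassandra["tls"] = tls
}

// userValues stands in for what a values file might provide.
func userValues() map[string]any {
	return map[string]any{
		"tls": map[string]any{"enable_host_verification": false},
	}
}

func main() {
	a := userValues()
	overrideTLS(a)
	fmt.Println("override:", a["tls"])

	b := userValues()
	mergeTLS(b)
	fmt.Println("merge:   ", b["tls"])
}
```

With the override variant the user-supplied key disappears, which matches the "my tls change was ignored" observation; with the merge variant both keys survive.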
[15:56:56] one extra bit - I think that the tls -> enable_host_verification setting was in the wrong place, in theory it should go in /etc/kask/config right?
[15:56:56] arnaudb: thank you! it _might_ be that we need to push it out to the week after next (depends on some coordination for cookbooks as well). I'll keep you posted.
[15:56:59] Ok, maybe I have a better question: what is configmap being used for? I know from reading upstream docs, that's just arbitrary data sent to k8s
[15:57:22] uh, yes, the kask config
[15:57:55] it seems the verbatim certs in values.yaml are being templated into configmap, and nowhere else, how does that do...well anything?
[15:58:22] the configmaps are useful to store data that can be passed as file/env-variable/etc.. to a pod
[15:58:31] at least this is how I have seen it used so far
[15:59:03] but it's not contributing to kask config.yaml, is it?
[15:59:07] re: certs and configmap, there is surely something that renders them as files
[15:59:11] exactly yes
[15:59:19] in the chart, _config.yaml renders it afaics
[15:59:30] wha..?
[15:59:55] the verbatim/in-lined certificate?
[16:00:21] nono I mean kask's config.yaml
[16:00:25] the last question that you asked
[16:01:17] right, so rephrased: the only place I see the in-lined certs used is configmap.yaml, and not anywhere that would directly contribute to kask's config.yaml
[16:01:28] correct yes
[16:01:32] which seems to be templated with a path
[16:01:42] this is my understanding yes
[16:02:16] so... do the keys/certs in configmap somehow become that file (those files?)?
[16:02:46] /etc/cassandra-certs/ca.crt and /etc/kask-certs/{cert,key}.pem
[16:03:02] presumably by some magic not represented in the chart?
[16:03:53] check deployment.yaml line 65 onward
[16:04:31] oooooh, ok, I missed that
[16:04:44] that is the glue that renders the files
[16:05:03] right.
[16:05:05] * urandom sighs
[16:05:31] yaml engineering to the limit
[16:05:43] it's like my cat has been walking over the keyboard
[16:06:35] swfrench-wmf: ack!
[16:06:36] Ok, so then https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1018722 is a no-op
[16:07:32] but lays the groundwork for us to be able to switch to PKI by simply altering the path, and then I guess later cleaning up the chart of the inlined cassandra ca?
[16:07:36] nono there is a change buried in there
[16:07:56] + enable_host_verification: false
[16:07:58] for staging
[16:08:08] all the rest yes correct!
[16:08:56] oh, I needed to refresh
[16:14:46] urandom: thanks for the review! Ok to merge and test staging?
[16:15:17] elukey: ready; want me to deploy?
[16:15:27] I can do it np
[16:15:40] 👍
[16:16:06] urandom: since we are here, do you mind to review https://gerrit.wikimedia.org/r/c/operations/puppet/+/1018309 when you have time ?
[16:19:57] done
[16:20:07] thanksss
[16:20:14] session store deployed in staging, now the pod works
[16:20:18] \o/
[16:20:52] Ok, so I guess we don't need to test that the setting is working :)
[16:21:41] I didn't think about moving cassandra-dev to pki, probably worth to do it as well
[16:22:25] when we deploy PKI to it we can turn on enable_host_verification and check
[16:22:28] wdyt?
[16:22:35] yup
[16:22:48] sessionstore and echostore staging already point there
[16:23:11] ok I'll prep the changes tomorrow
[16:23:28] going afk for today, thanks for the help!
[16:27:04] elukey: thank you!
[16:27:23] I'm going to owe you a bunch of ${beverage} when next we meet!
[16:27:33] enjoy the rest of your day
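
Note on the host verification failure from earlier in the log: the check can be reproduced in isolation with Go's standard `crypto/x509` package, which ignores the Common Name and matches the hostname only against SANs. The sketch below is a rough illustration under assumptions: `selfSigned` is a hypothetical helper that builds throwaway self-signed certificates, and the real cassandra-dev certificate may differ in ways the log does not show (only its Subject was pasted).

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"time"
)

// selfSigned (hypothetical helper) returns a short-lived self-signed
// certificate with the given Common Name and DNS SANs.
func selfSigned(cn string, sans []string) (*x509.Certificate, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(1),
		Subject:      pkix.Name{CommonName: cn},
		NotBefore:    time.Now().Add(-time.Hour),
		NotAfter:     time.Now().Add(time.Hour),
		DNSNames:     sans,
	}
	// Template and parent are the same, so the certificate signs itself.
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, err
	}
	return x509.ParseCertificate(der)
}

func main() {
	const host = "cassandra-dev2003-b.codfw.wmnet"

	// Short CN and no SANs, like the Subject pasted in the log.
	shortCN, err := selfSigned("cassandra-dev2003-b", nil)
	if err != nil {
		panic(err)
	}
	fmt.Println("CN only: ", shortCN.VerifyHostname(host))

	// Same CN, but the FQDN is listed as a DNS SAN.
	withSAN, err := selfSigned("cassandra-dev2003-b", []string{host})
	if err != nil {
		panic(err)
	}
	fmt.Println("with SAN:", withSAN.VerifyHostname(host))
}
```

The first call is expected to return a non-nil error and the second nil, which is the difference certificates carrying the FQDN as a SAN should make once enable_host_verification is turned back on.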