[07:42:57] jhathaway: I am testing during the iteration phase, but mostly the service itself and that the nature of the chart is correct (as in, it does what I intend it to do). The yaml part is taken care of by CI, which can be run locally. Make sure you have docker and Ruby installed and just do rake run_locally['default'] to get the entire gamut of tests that
[07:42:57] will be run by CI if you upload the patch to gerrit. So, my iteration part usually is editing a file like one of the templates or values.yaml, running helm upgrade --install .
[07:43:21] then run checks on the service (e.g. a curl call), rinse, repeat
[07:43:44] when done, I bump the version in Chart.yaml, commit and upload to gerrit
[07:45:09] I am not sure what you mean by spinning up aux-k8s-eqiad in minikube. But iterating on the jaeger chart locally using minikube, until you are satisfied, is the same approach I use.
[09:05:50] akosiaris: is it a fair assumption that all the k8s nodes will be in private vlans?
[09:06:23] XioNoX: Yes. Always.
[09:06:49] workloads could at some point end up not being in private vlans, but if that is ever to happen, it is far away.
[09:07:30] yeah, still looking at it from the angle of simplifying the BGP config
[09:11:02] Why does this line for example have `|| has(node-role.kubernetes.io/master)`? https://github.com/wikimedia/operations-deployment-charts/blob/master/helmfile.d/admin_ng/values/ml-serve-eqiad/calico-values.yaml#L14
[09:11:30] wouldn't the previous selectors (region/zone) be enough?
[09:16:01] it might have been merged before the netbox zone selector population was added
[09:17:55] I see, so the labels get applied to the master nodes regardless
[09:18:04] but, yes, the former should suffice now
[09:18:39] yes, now the labels also get applied to those. It used to not be the case.
[09:19:00] cool, will send a patch when I find time
[09:19:09] also in the past, masters did not host pods and did not bgp with the crs
[09:19:21] istio changed that
[09:19:51] XioNoX: IIRC it was done for https://phabricator.wikimedia.org/T306649
[09:20:41] yeah https://phabricator.wikimedia.org/T306649#8466962
[09:22:52] elukey: nah, that's not where the `|| has(node-role.kubernetes.io/master)` got added
[09:23:03] but Alex gave context
[09:26:33] okok
[11:15:39] Before I spend more time on it, I'm wondering what you think of the approach taken in https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/887945 ? (/cc akosiaris)
[11:16:20] centralize all the BGPPeers, as right now they're defined for each cluster, and for racks E/F without any selectors (just the nodes hardcoded)
[11:23:41] <_joe_> XioNoX: I love the idea, but I'd go further
[11:23:58] <_joe_> the nodeselectors in https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/887945/3/helmfile.d/admin_ng/values/bgppeer-values.yaml have a lot of repetition
[11:24:10] <_joe_> we should template that out :)
[11:30:01] good idea :)
[11:32:31] <_joe_> I added a comment, but I think that with some annoying templating we can make it easier to manage
[11:32:45] <_joe_> easier than what my comment suggests
[11:33:03] <_joe_> I'd take a stab at it if I had any time :)
[11:34:13] I would add that if the data is available in puppet, we could maybe auto-generate bgppeer-values.yaml as a /etc/helmfile-defaults file on deployment hosts
[11:34:57] oh...there is a comment about this - sorry
[13:28:58] thanks! I'll spend more time on it then ;)
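For context, the idea in the CR is to describe router peers once, using Calico nodeSelector expressions instead of hardcoded node names. A selector-based BGPPeer looks roughly like the sketch below; the name, peer IP, AS number and selector are made-up placeholders, not the actual cr1/cr2 values from the patch.

```yaml
# Minimal sketch of a selector-based Calico BGPPeer (all values are placeholders).
apiVersion: projectcalico.org/v3
kind: BGPPeer
metadata:
  name: cr-example-ipv4
spec:
  peerIP: 192.0.2.1   # router IP (placeholder)
  asNumber: 64512     # placeholder ASN
  # Peer from every node in a given zone, plus the control-plane nodes,
  # instead of listing node names explicitly.
  nodeSelector: topology.kubernetes.io/zone == 'row-a' || has(node-role.kubernetes.io/master)
```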
[13:29:24] I might need help to fix the CI error though
[13:39:22] XioNoX: I think you're just missing the bgppeer's name (cr1-codfw-ipv4, cr2-codfw-ipv4, ...) in the bgppeer.spec you generate
[13:40:10] oh nice! I was also wondering about the ordering of https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/887945/3/helmfile.d/admin_ng/calico/helmfile.yaml#22
[13:40:43] they override each other, top to bottom
[13:40:46] bottom wins
[13:41:10] is charts/calico/values.yaml "out of the picture" ?
[13:41:20] no, that is the first of them all
[13:41:41] so before "values/common.yaml", right?
[13:41:45] even if not defined
[13:41:55] yes, correct. Check https://gerrit.wikimedia.org/r/plugins/gitiles/operations/deployment-charts/+/refs/heads/master/helmfile.d/services/README.MD
[13:42:31] that's related to services but the same rules apply to admin_ng
[13:43:39] oh, I now see you were trying to add one BGPPeer object per peer
[13:43:50] cool, yeah that's what I thought but wanted to make sure
[13:43:58] jayme: that was already the case, no?
[13:44:34] yes, sorry...I'm obviously not back to 100% capacity
[13:46:19] XioNoX: then it's just that the BGPPeer spec does not allow the keys ipv4, ipv6
[13:46:21] https://docs.tigera.io/archive/v3.23/reference/resources/bgppeer
[13:47:15] I don't see where in my code those keys get created
[13:47:20] https://integration.wikimedia.org/ci/job/helm-lint/9283/console
[13:48:36] yeah those errors "12:11:44 BGPPeer cr1-codfw is invalid: For field spec: Additional property ipv4 is not allowed - For field spec: Additional property ipv6 is not allowed"
[13:48:59] but afaik my template doesn't create those keys
[13:49:06] oh, maybe I need to bump the version
[13:49:11] if you scroll past that, you will see the diff of the yaml generated
[13:50:34] in general I'd say it's easier to work with when separating the chart change (e.g. stuff in charts/*) from the deployment change (e.g. stuff in helmfile.d/*)
[13:51:27] you could add a proper fixture for your new data structure and then iterate on just the chart change until it works. Then adapt the helmfile.d part
[13:52:37] I see the wrong diff, but dunno how my template can possibly generate that
[13:54:20] XioNoX: I'd say $ipfamily is not set properly
[13:54:35] see the object names in the diff, they don't include it
[13:55:13] so the index call later does something unexpected as it probably gets a null or undefined key to look up
[13:55:27] hmmm, I see
[13:55:34] makes sense
[13:56:29] I think it's again a matter of variable scoping
[13:58:05] learning go templating the hard way :)
[13:58:20] there is only that one way to learn go templating :)
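The chart template itself isn't quoted in the log, but the scoping issue jayme points at usually looks like the sketch below: a `range` block rebinds the context, so the IP family has to be captured in a variable that is actually in scope where the object name and the `index` lookup use it. The `BGPPeers` structure assumed here (a per-peer map with `ipv4`/`ipv6`/`asNumber` keys) is a guess based on the CI error, not the exact layout of the patch.

```yaml
{{- /* Illustrative sketch only, not the actual calico chart template. */}}
{{- range $peerName, $peer := .Values.BGPPeers }}
{{- range $ipfamily := list "ipv4" "ipv6" }}
{{- if hasKey $peer $ipfamily }}
---
apiVersion: projectcalico.org/v3
kind: BGPPeer
metadata:
  # If $ipfamily were not set in this scope, the name would render as just
  # "cr1-codfw" and the index lookup below would hit a null/undefined key.
  name: {{ $peerName }}-{{ $ipfamily }}
spec:
  peerIP: {{ index $peer $ipfamily }}
  asNumber: {{ $peer.asNumber }}
{{- end }}
{{- end }}
{{- end }}
```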
[14:02:21] jayme: in that case how can I get the same output as CI, without having to run the whole of it? `helm template --debug calico` doesn't generate "bgppeer"
[14:03:24] XioNoX: try re-adding the fixture you removed and run with helm template -f calico/.fixture/base.yaml --debug calico
[14:03:51] jayme: in the new format, though, right?
[14:03:58] exactly, yes
[14:04:22] so a random selection of helmfile.d/admin_ng/values/bgppeer-values.yaml ?
[14:04:58] yeah - or totally make something up
[14:05:34] it's just a matter of having the data structure really. I think I was just lazy, so I copied some real values
[14:06:39] yeah that works just fine
[14:06:49] https://www.irccloud.com/pastebin/qXezRwrS/
[14:06:55] could it be a version issue?
[14:07:00] like helm/go version?
[14:07:54] ah, snap...no
[14:08:42] it might
[14:09:22] XioNoX: in the CI output, ctrl+f for "Diff for test case admin_ng/staging-codfw"
[14:10:23] the thing is that only the 1.23 clusters use the latest version of the chart. The 1.16 clusters are pinned to an older version
[14:11:08] so you need to keep the old data structure around for now in helmfile.d
[14:11:25] ohh
[14:12:52] is it possible to have both old and new data structures?
[14:13:42] the order of the values files in helmfile.yaml should come into play again here
[14:14:04] otherwise that seems like a hard blocker for this change, which can be fine depending on when the upgrade to 1.23 is due
[14:15:02] if you revert the removal of the structure from all the values/*/calico-values.yaml files (all but staging-codfw and staging-eqiad) those should override helmfile.d/admin_ng/values/bgppeer-values.yaml
[14:15:22] hope that makes sense :)
[14:15:36] I need to go for a walk...back in ~20min
[14:15:39] yeah, it's messy though
[14:15:45] no pb!
[14:16:09] it is, yes...but it allows a gradual rollout
[14:16:18] when is 1.23 due?
[14:16:46] wikikube codfw before the switchover, eqiad after
[14:17:00] but no clue if there is a timeline for the others
[14:17:41] don't let me hold you from your walk!
[14:17:54] tbh it's fine to roll it out that way. We just need to make sure it's known to all cluster operators :)
[14:18:06] nono, going now :-) ttyl
[14:36:08] CR updated but still getting a similar error. Old template vs. new data structure somewhere
[15:06:41] XioNoX: looks like I was wrong and the value is not overwritten but deepmerge'd. As you can see in the diff, the new peers are added
[15:11:36] jayme: ok, I hadn't looked closely enough
[15:13:39] how do I cleanly work around that?
[15:14:29] rename BGPPeers to BGPPeers-new? and iterate over that in the new template?
[15:22:49] hmm...I'm not sure tbh
[15:23:14] but yes, using a different key would definitely help ofc.
[15:24:47] jayme: would it be possible to force-update the 1.16 template?
[15:31:44] I think it's possible, although we never tested that. It would require (temporarily) reverting the chart back to 3785d88ad45e946cdc914ee7f30cce9a60b9f2b2, making that change + version bump, merging that to deployment-charts to have a new helm chart packaged and released to the repo, then forwarding "back" to the current version.
[15:31:58] feels a bit complicated as well
[15:33:44] indeed
[15:34:51] a) wait for 1.23 b) use a new key c) backport the change
[15:35:03] that's all I can think of for now
[15:36:46] there is another
[15:38:01] you could conditionally include values/bgppeer-values.yaml
[15:38:41] ah?
[15:38:44] the helmfile.yaml is a go template as well. You can see the $version checks a couple of lines above the values include
[15:39:26] hahaha. Do not trust the file extension
[15:40:09] you could only include the values/bgppeer-values.yaml if $version is undefined (which means latest) or <= 0.1.18
[15:40:42] yes, indeed. more or less every yaml file in that repo is a go template, I would assume :-D
[15:42:01] if that works, please add something to https://phabricator.wikimedia.org/T328291 so that we don't forget to clean that up after 1.23
[15:56:15] Added the above as a comment to the CR
[16:00:40] jayme: jenkins-bot Verified +2 \o/ thanks a lot!
[16:13:55] yay
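The conditional include jayme describes would live in helmfile.d/admin_ng/calico/helmfile.yaml, which is itself rendered as a go template. A rough sketch, assuming `$version` is the per-cluster chart version pin already set earlier in that file (the exact condition and surrounding layout would need to match the real helmfile):

```yaml
values:
  - values/common.yaml
  {{- if not $version }}
  # No version pin means the cluster runs the latest chart, which understands
  # the new centralized BGPPeers structure; clusters pinned to an older chart
  # keep their per-cluster calico-values.yaml definitions instead.
  - values/bgppeer-values.yaml
  {{- end }}
```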