[05:27:23] <_joe_> maryyang, ori as I already offered, I'm up for doing a short intro to our infra for all of your team if you think it's useful
[06:46:19] serviceops, Recommendation-API, SRE: recommendation-api alerting and api errors - https://phabricator.wikimedia.org/T262587 (Marostegui) Open→Resolved I am going to close this, it's been 1.5y and it is of course impossible to troubleshoot this specific issue anymore
[09:27:54] FYI kubestagetcd1004 was just reported DOWN by icinga
[09:28:23] ah, might be related to a ganeti reboot
[09:30:59] back up
[10:47:52] yeah, those etcd instances don't use DRBD for latency reasons, so they go temporarily down as part of the reboots
[10:52:27] ah, so they can't be live-migrated I guess
[13:40:53] serviceops, Prod-Kubernetes, Kubernetes: Update Kubernetes clusters to v1.23 - https://phabricator.wikimedia.org/T307943 (JMeybohm)
[13:40:56] serviceops, Kubernetes, Patch-For-Review: Replace kubeyaml in deployment-charts CI - https://phabricator.wikimedia.org/T306165 (JMeybohm)
[14:46:35] serviceops, Beta-Cluster-Infrastructure, Patch-For-Review: Automatically update Docker containers on Beta Cluster - https://phabricator.wikimedia.org/T308598 (ori) I'd like to have an idempotent way of checking whether the image is up-to-date, but you can't get that out of the docker CLI. Came up wit...
[15:01:29] serviceops, Beta-Cluster-Infrastructure, Patch-For-Review: Automatically update Docker containers on Beta Cluster - https://phabricator.wikimedia.org/T308598 (Legoktm) fwiw podman has https://manpages.debian.org/bullseye/podman/podman-auto-update.1.en.html which AIUI does basically what you're lookin...
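The approach ori describes in T308598 is cut off above ("Came up wit..."), so it is not reproduced here. As an illustration only, one common way to make an "is the image up to date?" check idempotent is to compare content digests instead of pulling. The minimal Python sketch below assumes you already have the local image's RepoDigests list (for example from `docker image inspect --format '{{json .RepoDigests}}' IMAGE`) and the digest the registry currently serves for the tag. How that remote digest is fetched is left out and is an assumption. `local_digests` and `is_up_to_date` are hypothetical helper names.

```python
"""Hypothetical idempotent "is this image current?" check (illustration only).

Assumption: the caller supplies the local image's RepoDigests entries
(strings like 'registry/repo@sha256:...') and the registry's current
digest for the tag; fetching the remote digest is out of scope here.
"""


def local_digests(repo_digests: list[str]) -> set[str]:
    """Extract the 'sha256:...' part from RepoDigests entries like 'repo@sha256:abc'."""
    digests = set()
    for entry in repo_digests:
        _, sep, digest = entry.rpartition("@")
        if sep:  # skip malformed entries that contain no '@'
            digests.add(digest)
    return digests


def is_up_to_date(repo_digests: list[str], remote_digest: str) -> bool:
    """Return True if the registry's digest matches one of the local image's digests.

    Comparing digests is read-only, so the check can run repeatedly
    (e.g. from a timer) without changing any state.
    """
    return remote_digest in local_digests(repo_digests)


if __name__ == "__main__":
    local = ["docker-registry.example.org/foo@sha256:aaa"]
    print(is_up_to_date(local, "sha256:aaa"))
    print(is_up_to_date(local, "sha256:bbb"))
```

Only when the check returns False would a pull and restart be needed. The podman-auto-update tool Legoktm links to does the registry-side part of this for you.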
[15:16:23] serviceops, Continuous-Integration-Infrastructure, SRE: contint/releases/hosts with helm installed: puppet - Could not find group deployment - https://phabricator.wikimedia.org/T307740 (hashar) Open→Resolved a: jbond With https://gerrit.wikimedia.org/r/791565 deployed, the CI servers have...
[15:24:24] serviceops, Beta-Cluster-Infrastructure, Patch-For-Review: Automatically update Docker containers on Beta Cluster - https://phabricator.wikimedia.org/T308598 (ori) Open→Resolved a: ori
[15:34:45] hello! I am still getting this rake error now that the chart has been merged, any ideas? https://integration.wikimedia.org/ci/job/helm-lint/7376/console anything to do with the chart name/helmfile name mismatch?
[15:43:05] hnowlan: I think it might be about the comments in helmfile.yaml. I pasted both of the large yaml files into https://yamlchecker.com/ and while values.yaml is reported as valid, for helmfile.yaml it doesn't like line 19
[15:43:11] bad indentation of a mapping entry (19:71)
[15:43:28] ment.Values "roll_restart") }}{{ eq .Environment.Values.roll_ ...
[15:43:33] doesn't like the }}{{ part
[15:45:17] a bit confused though about line 19 vs line 20. it says 19 but the code above is line 20
[15:45:59] ah, it just removed empty line 14 when pasting
[15:46:34] hmm, odd. that file is almost entirely boilerplate
[15:51:14] there seem to be more people asking about "bad indentation of a mapping entry" in odd contexts. like https://stackoverflow.com/questions/66634843/yamlexception-bad-indentation-of-a-mapping-entry or https://github.com/yarnpkg/berry/issues/3416 but that doesn't make things obvious either
[15:51:57] "Possible suspects for such an issue to pop-out again some day are as follows:" Inconsistent line-endings (like CRLF instead of LF) when a config file has been curl'ed.
[15:52:37] <_joe_> hnowlan: uh didn't I fix the boilerplate?
[15:52:55] <_joe_> hnowlan: what is the patch, sorry?
[15:53:14] https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/791324
[15:53:18] <_joe_> also you should stop finding bugs in ci at 6 pm :P
[15:53:53] was checking for line endings but they seem consistent. bug sounds 'good'
[15:54:01] <_joe_> the problem is again https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/791324/3/helmfile.d/services/image-suggestion/helmfile.yaml#18
[15:54:28] <_joe_> you need to point to the right chart (and set a service name)
[15:55:25] D'oh, my bad :/ completely forgot about that
[15:55:26] Apologies
[15:55:42] <_joe_> np :)
[15:55:52] <_joe_> I'm glad it's not something worse
[15:56:05] <_joe_> but yeah I need to see to fix the output of CI for that mistake
[16:11:49] that fixed it!
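The line-ending suspect mentioned above (CRLF instead of LF) can be ruled in or out mechanically rather than by eye. Below is a minimal Python sketch that counts each line-ending style in a file's raw bytes. It does not parse YAML, and the script name `check_eol.py` used in the usage note is hypothetical. In this thread the real cause turned out to be the chart reference in helmfile.yaml, so a check like this only eliminates one candidate.

```python
"""Sketch: report which line-ending styles a file uses (does not parse YAML)."""
import sys


def line_ending_counts(data: bytes) -> dict[str, int]:
    """Count CRLF, bare LF, and bare CR line endings in raw bytes."""
    crlf = data.count(b"\r\n")
    lf = data.count(b"\n") - crlf  # LFs not preceded by a CR
    cr = data.count(b"\r") - crlf  # CRs not followed by an LF
    return {"crlf": crlf, "lf": lf, "cr": cr}


def is_consistent(data: bytes) -> bool:
    """True if at most one line-ending style appears in the data."""
    return sum(1 for n in line_ending_counts(data).values() if n) <= 1


if __name__ == "__main__":
    for path in sys.argv[1:]:
        with open(path, "rb") as fh:  # binary mode so CRs are not translated
            data = fh.read()
        status = "consistent" if is_consistent(data) else "MIXED"
        print(f"{path}: {line_ending_counts(data)} ({status})")
```

Usage, with a hypothetical file name: `python3 check_eol.py helmfile.d/services/image-suggestion/helmfile.yaml`. The file is read in binary mode so that Python's universal-newline handling cannot hide stray CR characters.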