[08:07:12] I just had an interesting failure mode from the reimaging cookbook that I have seen before: `New OS is trixie but bookworm was requested`. I'll try another run and see if it happens a second time. The host is dse-k8s-wdqs2001, just in case anyone is also interested.
[08:19:08] which OS did you request?
[08:19:26] basically is `bookworm was requested` correct?
[08:20:54] Yes, I did request bookworm. `sudo cookbook sre.hosts.reimage --os bookworm --new dse-k8s-wdqs2001` from cumin1003. It happened a second time.
[08:47:11] free upgrade with every reimage ;)
[09:15:07] hey folks, I just added basic ACLs to kafka logging codfw. It is part of an effort to harmonize our configs after the 3.7 upgrade. Tiziano is aware of the change, and dashboards look good. If I am not around and you see something weird, https://phabricator.wikimedia.org/T425528#12028264 is the rollback
[09:15:28] something weird == clients not able to push/pull data for example, with clear error messages like "you don't have perm"
[09:21:38] btullis: wait a sec, did you get Trixie installed?
[09:22:41] elukey: The cookbook bailed out at the debian installer stage.
[09:23:22] https://usercontent.irccloud-cdn.com/file/z1GpcSmz/image.png
[09:24:11] I can get to a console.
[09:24:36] https://www.irccloud.com/pastebin/9vBRVHA7/
[09:31:56] In case it helps, dse-k8s-wdqs200[2-4] are sitting and waiting patiently to be reimaged. I can create a ticket if it helps, or I can kick off one or more of the others.
[09:33:37] But dse-k8s-wdqs1001 reimaged perfectly into bookworm.
[10:42:08] headsup; I'm disabling the Debian mirror on mirrors.wikimedia.org now, everything should be adapted to use deb.debian.org, but if you see anything, let me know
[11:20:47] I found the problem. It was two stray files in `/etc/dhcp/automation` on install2004, from before the server rename. I will remove them.
[11:20:54] https://www.irccloud.com/pastebin/LEi3jYEU/
[11:21:37] wdqs202[89] were the old names, before they were renamed to the much more snappy dse-k8s-wdqs100[12]
[11:29:21] Correction: *200[1-2]
[11:47:43] ahhh wow ok thanks for letting us know!
[14:31:32] suppose I'd like to keep tabs on a gitlab-ci pipeline, in terms of the timestamp of the most recent failure and success, with the ultimate goal to alert if the pipeline has been busted for X amount of time -- is that something that's available as a metric already?
[14:31:44] case in point, recurring pipelines such as https://gitlab.wikimedia.org/repos/sre/wikitech-static-docker/-/pipelines
[14:41:09] godog: the current set of CI metrics for GitLab is very limited and does not contain any kind of project-related metrics. I opened https://phabricator.wikimedia.org/T347038 some years ago, you could state your use-case there probably
[14:42:48] jelto: ack thank you, will do! and yes with my o11y hat on I understand not wanting project-related metrics, so as not to explode cardinality, I assume
[14:43:52] yep, but it's probably possible to push metrics from CI to the push gateway (at least from Trusted Runners). Then you could alert if these metrics become too old
[14:43:55] can we poll it though?
[14:44:43] jelto: interesting, yeah maybe that could be an option
[14:45:51] volans: not sure what's "it"? the push gateway?
[14:46:23] poll the status of the pipeline and either alert on it directly or publish it as a metric
[14:48:06] ah, I'm sure we can, easier with https://phabricator.wikimedia.org/T347038 in place of course
[14:48:58] Yes the status should be in the API (even publicly, without API token)
[14:48:58] https://gitlab.wikimedia.org/api/v4/projects/repos%2Fsre%2Fwikitech-static-docker/pipelines/latest
[14:49:21] nice
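
The polling idea from the end of the discussion could look roughly like the sketch below. It is only an illustration, not something anyone in the channel wrote or deployed. Assumptions: the project's pipeline list endpoint (`.../pipelines`, the sibling of the `.../pipelines/latest` URL quoted above) is readable without a token, it returns pipelines newest first, and each entry carries `status` and `updated_at` fields. The function names and the 24 hour threshold are hypothetical.

```python
import json
import urllib.request
from datetime import datetime, timedelta, timezone

# Assumption: list endpoint next to the ".../pipelines/latest" URL from the log.
PIPELINES_URL = (
    "https://gitlab.wikimedia.org/api/v4/projects/"
    "repos%2Fsre%2Fwikitech-static-docker/pipelines?per_page=50"
)


def fetch_pipelines(url=PIPELINES_URL, timeout=10):
    """Fetch the pipeline list as parsed JSON (unauthenticated GET)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def parse_timestamp(value):
    """Parse an ISO 8601 timestamp such as 2024-01-01T06:00:00.000Z."""
    # Older Python versions reject a trailing "Z", so normalise it first.
    parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        # Assumption: timestamps without an offset are UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def last_success_age(pipelines, now):
    """Return the time since the most recent successful pipeline, or None."""
    successes = [
        parse_timestamp(p["updated_at"])
        for p in pipelines
        if p.get("status") == "success"
    ]
    if not successes:
        return None
    return now - max(successes)


def is_busted(pipelines, now, max_age=timedelta(hours=24)):
    """True if no pipeline in the list succeeded within max_age."""
    age = last_success_age(pipelines, now)
    return age is None or age > max_age


def check_pipeline(url=PIPELINES_URL, max_age=timedelta(hours=24)):
    """Poll the API once and report whether an alert would fire."""
    return is_busted(fetch_pipelines(url), datetime.now(timezone.utc), max_age)
```

Calling `check_pipeline()` from a cron job or a small exporter would cover the "poll the status and alert on it directly" variant. Measuring the age of the last success, rather than looking only at the latest pipeline's status, keeps the check meaningful for recurring pipelines that fail on every run. The same age could instead be published as a metric (for example via the push gateway mentioned above) and alerted on from there.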