[16:05:01] Where does our bookworm base image live? I'm not seeing it in https://gerrit.wikimedia.org/r/plugins/gitiles/operations/docker-images/production-images/+/refs/heads/master#go but maybe I'm missing something?
[16:10:46] It lives on the build host, like all production base images
[16:11:30] https://wikitech.wikimedia.org/wiki/Kubernetes/Images#Base_images
[16:35:49] <_joe_> on the topic of base images - I think moritz was trying to understand if we could "just" import the official ones
[16:36:26] <_joe_> because now there's some guarantees about their update rate, but I'm kinda resisting because I'm worried about support lifecycle
[16:57:45] I would love a small module/mesh networkpolicy review from someone on https://gerrit.wikimedia.org/r/1174768
[16:58:39] Ah, I was wondering since the opensearch operator wants golang-1.24 and it's available in Trixie, but not bookworm. Ref https://gerrit.wikimedia.org/r/c/operations/docker-images/production-images/+/1174751
[16:59:30] cdanis :eyes
[17:02:38] cdanis possibly dumb question, but why does this need to be a patchset? Is there a deploy that needs to happen between the changes or something?
[17:03:24] doing the copy patch thing makes the code review much easier
[17:04:04] Ah, so it just shows the diff instead of a net-new file. ACK
[17:04:43] +1'd ... I'll have to use that trick in the future ;)
[17:05:00] yeah it's typical for these versioned files in deployment-charts/modules
[17:10:29] thanks for the review!
[17:17:48] np, anytime
[17:52:20] taavi or anyone else, what are y'all's feelings on merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/1140695 (creating a Trixie docker image)? It's not urgent
[17:52:54] cc James_F
[17:54:08] inflatador: I'm in favour, but Alex said to wait until the release.
[17:54:25] (Which is still in the future; can you wait 10 days?)
[17:54:44] * inflatador takes deep breath
[17:56:48] in all seriousness, I'm not in a hurry. If we need to wait until the official release, that's fine w/me
[17:57:00] Ack.
[17:57:20] James_F: my apologies for your tracing difficulties
[17:57:31] I'm keen to make progress too, but I know it creates / pulls forward extra work (e.g. packaging our custom releases of PHP).
[17:58:14] cdanis: Thanks! Your patch looks probably-fine, but historically we've never done any helm template changes ourselves, and always relied on ServiceOps… Not sure about deploying it.
[17:58:34] I'd be happy to deploy it
[17:58:54] I can deploy and self-revert if needed, no pressure on you!
[17:59:23] Adding a general service alert for "when we hit the service we see a response in OTel" is probably premature.
[17:59:44] ok cool, from the helm-lint diffs it looked very safe
[17:59:48] Yeah.
[18:00:01] I didn't realize the pod-to-pod thing until today 🤦
[18:00:38] Welcome to AW, where there's the right way, the wrong way, and the Wikifunctions way.
[18:00:55] 😅
[18:58:54] cdanis: BTW, is there a future plan/idea to support tracing in staging-k8s to test this kind of thing?
[19:00:36] James_F: yeah, task open but no work yet
[19:00:51] I'd also love to wind up in a spot where we have tracing integrated into some sort of minikube-esque environment
[19:01:08] <3
[19:01:10] Yes.
[19:02:37] cdanis: Aha. We now have https://trace.wikimedia.org/trace/e035625bf4885b617e9f4d8704e9cc11 — "unknown_service" is new. I guess we've failed to set a field somewhere?
[19:03:00] it would also help if I, y'know, finished writing the docs about how to get all these things right
[19:04:57] James_F: setting env vars like `OTEL_SERVICE_NAME=function-orchestrator` will fix that
[19:05:13] (the other option is configuring that stuff from your own main(), basically)
[19:05:22] Yeah, I'll set the ENV now and we can do that later.
[19:07:00] cdanis: https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1174799 look OK to you?
[19:08:12] James_F: the javascript file says python, aside from that lgtm :)
[19:08:31] Oh, blah.
[19:10:02] +1!
[19:26:07] James_F: https://trace.wikimedia.org/trace/1851d4608d32e47b3cb614208b3220c4 🎉
[19:26:45] Yup! I especially like the MW -> WF -> MW loop being exposed like that.
[19:27:22] Amongst other reasons, because that's fallback-behaviour that shouldn't happen (we have an actively-pushed cache that should always be up-to-date).
[19:27:28] 👀
[19:34:01] James_F: btw, any requests to mwdebug are traced at 100% -- so I triggered that trace by putting `tacocat` into https://www.wikifunctions.org/wiki/Z10096 with WikimediaDebug enabled
[19:35:34] not sure if that explains the fallback/uncached behavior
[19:39:41] oho https://trace.wikimedia.org/trace/9dee4728436c045e8886ec108b03e4b0
[20:09:13] James_F, cdanis: RelEng will have a hypothesis next quarter that moves test.wikipedia.org to a new MediaWiki deployment in the staging wikikube cluster, so maybe that will help prioritize getting tracing support in that environment?
[20:09:49] good to know, thanks!
[20:10:00] (cc hnowlan ^)
[20:10:21] https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/Pretrain is the project that will do that work.
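(Editor's note: a minimal sketch of the "configure that stuff from your own main()" option mentioned at [19:05:13], using the public OpenTelemetry Node SDK. The package names, exporter choice, and service name shown here are assumptions for illustration, not taken from function-orchestrator's actual setup; either this or exporting `OTEL_SERVICE_NAME` in the deployment's environment avoids the `unknown_service` label seen in the trace at [19:02:37].)

```typescript
// Hedged sketch only: API and package names vary across OpenTelemetry SDK versions.
import { NodeSDK } from '@opentelemetry/sdk-node';
import { Resource } from '@opentelemetry/resources';
import { SemanticResourceAttributes } from '@opentelemetry/semantic-conventions';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';

const sdk = new NodeSDK({
  // Setting service.name here from main() is equivalent to setting the
  // OTEL_SERVICE_NAME env var; without either, spans show up as "unknown_service".
  resource: new Resource({
    [SemanticResourceAttributes.SERVICE_NAME]: 'function-orchestrator',
  }),
  traceExporter: new OTLPTraceExporter(),
});

// Start the SDK before the rest of the application is loaded so that
// auto-instrumentation (if any) can hook the relevant libraries.
sdk.start();
```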