[07:09:22] greetings
[08:05:47] morning!
[08:15:49] morning
[08:44:33] dhinus: can I get a review of https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/158 ? it's blocking moving the default buildpack to the newer one (if we do, components will rebuild on every deployment)
[08:45:04] dcaro: looking!
[08:45:37] thanks!
[08:54:43] I +1d it if you need to merge it now, ideally I would split the unrelated changes/cleanups to a separate MR
[09:42:02] they are related though, I have to update the models to bring in the new builds-api flag, so I can use it in the config
[09:42:21] the only unrelated one is the small fix in values.yaml to fix the local deployment
[09:47:21] ack, thanks for the replies, I thought JobsUpdateResponse was not required
[09:49:48] it would fail, as JobsJobResponse does not have the job_changed property anymore
[09:50:15] yes, I tried locally and got the failing tests...
[09:50:59] I could have done the model update first, then added the flag to the config though
[09:55:06] I added a follow-up comment on the change to the default value... I'm still not fully understanding the flow :)
[09:55:59] yes, maybe doing the model update in a separate commit can make reviews easier, not a big issue anyway
[10:07:33] ooohhh, the local values fix actually breaks prod deployment, looking
[11:20:01] i don't like how often toolforge prometheus has been restarting
[11:20:53] dcaro: when you cleared the data earlier this week, did you do it on both servers or on one of them only?
[11:21:55] One only, the one that was crashing
[11:26:23] which one was that? -8?
[12:13:41] Hmmm, not sure now, let me look at the logs (sorry for the delay, was having lunch)
[12:15:12] Hmm, SAL does not have it, maybe I did both, not sure; if both were having issues I did both
[12:15:31] I am looking at T421242, anyone have opinions on the 24G/32G of RAM question?
[12:15:32] T421242: New flavor for the integration project with more vCPU and ephemeral disk space - https://phabricator.wikimedia.org/T421242
[12:18:33] no really strong opinions, no
[12:20:18] LGTM, no concerns, our cloudvirts are quite big
[12:26:31] alright, sent an MR for 32 on the assumption that it would actually improve things for CI
[12:43:25] hmm, istio in lima-kilo is not starting up on a new instance, complaining it does not have enough memory
[12:43:26] │ Warning  FailedScheduling  2m29s (x5 over 3m5s)  default-scheduler  0/1 nodes are available: 1 Insufficient memory. preemption: 0/1 nodes are available: 1 No preemption victims found for incoming pod. │
[12:43:29] looking
[12:43:53] it requests 5G
[12:47:36] I also noticed istio was not starting on my lima-kilo, but did not investigate the cause
[12:47:46] can we lower the request for lima-kilo only?
[12:48:25] hmm, there's a configmap with the limits embedded, I'll try to see if I can template the limits out of it, or we'll have to copy the whole thing for local with the resource difference
[12:49:39] yeah, we should do that, but the templating might be a bit tricky
[13:43:40] what are tools-cumin-1 and toolsbeta-cumin-1 used for nowadays?
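(Editor's note: for context on lowering the istio memory request discussed above — one common way to override istio's defaults, assuming the install goes through the IstioOperator API rather than a raw manifest, is a per-component `k8s.resources` overlay like the sketch below. Whether lima-kilo's install path supports this, and the 512Mi value itself, are assumptions for illustration.)

```yaml
# Hypothetical IstioOperator overlay lowering istiod's memory request
# for a small local node; values are illustrative, not lima-kilo's actual config.
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  components:
    pilot:
      k8s:
        resources:
          requests:
            cpu: 100m
            memory: 512Mi   # default request is much larger; lowered for local use
```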
[13:45:27] volans: queries that need puppetdb access
[13:45:49] local puppetdb I guess
[13:46:04] from the tools master
[13:46:30] yes
[13:47:16] I'm deciding how to release cumin 6.0.0 and was thinking to push it to apt.w.o and install it on cloudcumin* so that we can test it a bit (including cookbooks), and if all goes well push to prod's cumin hosts
[13:47:25] so trying to understand how critical the instances in cloudvps are
[13:48:04] the others being: beta is not critical, for integration I'll ping antoine, for mariadbtest federico
[13:49:12] not very
[13:49:55] if there is any concern with my plan
[13:51:12] according to my own .bash_history on tools-cumin-1, the last time I used it was 2 years ago :D
[13:51:30] yep, I have not used it in quite a long time either (I use cloudcumin mostly)
[13:52:18] great, thanks for the feedback
[13:53:34] are there any recent logs/jobs-api changes which might explain T421929?
[13:53:35] T421929: `toolforge jobs logs` misplaces my logs - https://phabricator.wikimedia.org/T421929
[14:01:30] there's one adding since/until options, might explain that, not sure if it was deployed before or after
[14:01:38] re: istio and lima-kilo, I tried lowering the mem request, but the pod is now failing with "Failed to create temporary file"
[14:02:27] my istio is up and running without errors (that I can see at least)
[14:45:43] ok, I've released cumin 6.0.0 to pypi, apt.w.o and the cloudcumin hosts. On the latter, the previous deb is in my home in case you need to revert in an emergency.
[18:35:03] * dcaro off
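(Editor's note: the since/until change suspected above in T421929 is the kind of time-window filter where off-by-one or timezone mistakes can drop or misplace log lines. The sketch below is purely illustrative — the function name and log format are hypothetical, not the actual jobs-api code.)

```python
from datetime import datetime, timezone

def filter_logs(lines, since=None, until=None):
    """Keep log lines whose leading ISO-8601 timestamp is within [since, until].

    Assumes each line starts with an ISO-8601 timestamp followed by a space;
    this is an illustrative format, not the real jobs-api log schema.
    """
    kept = []
    for line in lines:
        ts = datetime.fromisoformat(line.split(" ", 1)[0])
        if since is not None and ts < since:
            continue
        if until is not None and ts > until:
            continue
        kept.append(line)
    return kept

logs = [
    "2025-01-01T10:00:00+00:00 job started",
    "2025-01-01T11:00:00+00:00 job finished",
]
# Only entries at or after 10:30 UTC survive the filter.
print(filter_logs(logs, since=datetime(2025, 1, 1, 10, 30, tzinfo=timezone.utc)))
```

A bug report like "logs misplaced" would be consistent with such a filter comparing naive and aware timestamps, or defaulting the window incorrectly when neither option is given.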