[08:52:53] For https://phabricator.wikimedia.org/T355997 do we modify harbor quotas? If so, how is it done?
[08:57:25] rook: was just looking at the same task... we don't have any policy around harbor quota increase requests yet. the only way to change a tool's quota right now is via the harbor UI with an admin user account
[08:59:52] hmm, I also see we have 36 pending jobs https://tools-harbor.wmcloud.org/harbor/job-service-dashboard/pending-jobs
[09:02:38] What is the scope of harbor? All of SRE?
[09:03:23] harbor is specific to the toolforge buildservice
[09:03:51] Ah ok, that makes things easier
[09:04:55] 1gb seems like a reasonable amount to bump up on request. Though in our current situation, would we be digging ourselves into a manual-update mess, where upgrades and other things may wipe away custom quotas?
[09:05:50] custom quotas should not be wiped away, afaik
[09:06:47] for now, it's the only method we have until upstream fixes robot permissions https://phabricator.wikimedia.org/T352417
[09:07:07] manual increases, that is
[09:07:25] dcaro: do you see any issues with (at least temporarily) increasing lucas' harbor quota for now? we should probably figure out/document quota increase requests for harbor sooner rather than later
[09:12:12] rook: I can follow up on this request
[09:12:29] If you like, thank you
[10:08:12] blancadesal: `toolforge build delete` deletes the logs for a specific build, and not the build artifacts themselves, right?
[10:08:53] taavi: correct; `toolforge build clean` deletes the artifacts
[10:09:48] ok. as old build logs are automatically cleaned up when new ones are started, what's the point of the option to delete those logs?
[10:12:13] iirc, they are not all automatically deleted. you can still see them with `toolforge build logs`. I think we keep the X (5?) most recent.
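A minimal sketch of the retention behaviour described above (keep only the most recent builds' logs, prune the rest when a new build starts). The cut-off of 5 is the guessed "X (5?)" from the conversation, not a confirmed setting, and `prune_builds` is a hypothetical helper, not the actual builds-api code:

```python
# Hypothetical sketch of the automatic log-retention logic discussed above.
# keep=5 mirrors the guessed "X (5?)" from the conversation, not a confirmed
# toolforge configuration value.

def prune_builds(builds: list, keep: int = 5) -> tuple:
    """Split builds (ordered oldest-first) into (kept, pruned)."""
    if keep <= 0:
        return [], list(builds)
    return builds[-keep:], builds[:-keep]

builds = [f"build-{i}" for i in range(8)]  # build-0 .. build-7
kept, pruned = prune_builds(builds)
# kept holds the 5 most recent (build-3 .. build-7); the logs of the
# pruned ones (build-0 .. build-2) would be cleaned up automatically.
```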
[10:15:38] that's what I mean by automatically deleted
[10:17:12] I think (but am not entirely sure) that the delete command was introduced before the automatic deletion of older runs, so that might be the reason
[10:19:25] I wonder if we should just remove that; it seems that during the weekend Lucas thought it was deleting the artifacts, and I can easily see others getting confused the same way
[10:22:47] eventually the goal was to somewhat merge both, but yes, deleting a specific build probably does not make sense with the automatic cleanup now (the cleanup was introduced later, and the delete was a strong request for us to be able to start the build service beta, so that's why the delete command was introduced)
[10:22:58] I agree it's somewhat confusing, but from what Lucas pasted in the task and his command history, it does seem like he was using the right command to delete images
[10:23:36] blancadesal: no issues with extending the quota right now; we will have to take it into account when we automate/manage them
[10:23:53] blancadesal: not at the start:
[10:23:57] 18:31:43 <+wm-bb> how do I clear out my storage in harbor?
[10:23:57] 18:31:54 I already `toolforge build delete`d all my old images, but still got this error in a new build:
[10:23:57] 18:32:00 DENIED: adding 236.4 MiB of storage resource, which when updated to current usage of 958.9 MiB will exceed the configured upper limit of 1.0 GiB.
[10:23:57] 18:37:06 looks like `toolforge build clean` helped
[10:24:05] he started with the delete, but moved to the clean
[10:24:06] yep
[10:24:18] ok, I missed those
[10:24:53] dcaro: ok, I'll bump his quota
[10:25:06] i was able to reproduce the issue he sees too, as in the quota would not go to 0
[10:25:41] interesting. did the quota also not go to 0 in the UI?
[10:25:56] I suspect it might be related to cleanup jobs taking some time to clean up unreferenced images/objects
[10:26:00] yep
[10:26:04] ui and cli
[10:26:35] https://github.com/goharbor/harbor/issues/17255
[10:26:42] ^ that might be related
[10:26:53] do you think it has to do with the pending EXECUTION_SWEEP jobs?
[10:26:59] maybe yes
[10:27:20] https://www.irccloud.com/pastebin/gBi5TnSJ/
[10:27:58] that's interesting
[10:28:57] I had started looking into the +3k non-deleted scheduled jobs (from the projects we deleted; it seems that the cleanup jobs are still there and getting triggered), might be related too
[10:31:46] I don't see the quota issue anymore btw. (now it says 0 when I clean)
[10:35:51] same, I wasn't able to reproduce it
[10:38:22] I've manually bumped the quota on the two tools lucas asked for. should that be logged somewhere?
[10:38:41] tools project
[10:38:48] and tools. project xd
[10:39:53] like !log tools quota bump for ..., and !log tools. quota bump I'd say (no strong need though, just a recommendation)
[10:43:59] https://www.irccloud.com/pastebin/zkULs32J/
[10:44:01] xd
[10:48:11] that's... a few?
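As an aside on the manual quota bumps above: until the robot-permission fix lands upstream (T352417), the same change could in principle be scripted against Harbor's REST API (`PUT /api/v2.0/quotas/{id}` with a `{"hard": {"storage": <bytes>}}` body). A rough sketch; the quota id, credentials, and new size are placeholders, not values from this conversation:

```python
# Rough sketch of a Harbor storage-quota bump via the v2 REST API.
# The endpoint shape follows upstream Harbor's API docs; the quota id,
# credentials, and 2 GiB size below are illustrative placeholders only.
import json

GIB = 1024 ** 3

def quota_body(storage_gib: int) -> str:
    """JSON body for a storage quota update; -1 means unlimited in Harbor."""
    storage = -1 if storage_gib < 0 else storage_gib * GIB
    return json.dumps({"hard": {"storage": storage}})

body = quota_body(2)  # bump the limit to 2 GiB
# An admin would then send it with something like:
#   curl -u <admin>:<password> -X PUT -H 'Content-Type: application/json' \
#     -d "$BODY" https://tools-harbor.wmcloud.org/api/v2.0/quotas/<quota-id>
```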
xd
[10:50:01] that might explain why the db ran out of disk at some point :D
[10:54:08] yep, I think that's also what makes redis use so much memory
[10:54:23] (as it seems to cache the logs for the executions there)
[10:55:59] I wonder if we can just truncate the tasks + execution tables
[10:56:13] I don't mind so much losing the logs of those
[10:56:42] https://www.irccloud.com/pastebin/I4Qu5aFr/
[10:57:14] so only a quarter (bit less) of the executions have a related existing task, it seems; I guess the rest are leftovers
[11:00:31] this seems related https://github.com/goharbor/harbor/issues/17611
[11:00:36] I'll open a task to follow up
[11:03:06] T356037
[11:03:07] T356037: [harbor] cleanup execution + task tables - https://phabricator.wikimedia.org/T356037
[11:08:37] btw I'll replace some more of the older k8s workers with bookworm-based ones, as I have not seen or heard about any issues
[11:21:49] nice :)
[12:47:15] hrmh. google's requirements mean that toolforge needs to support https://en.wikipedia.org/wiki/Authenticated_Received_Chain, but that is an experimental feature in exim and would require a custom exim build to enable, which I'd rather not do :-/
[12:53:59] that's unfortunate
[14:35:58] reviews welcome: https://gerrit.wikimedia.org/r/c/operations/puppet/+/993693 https://gerrit.wikimedia.org/r/c/operations/puppet/+/993697
[15:48:07] I want to edit the description of https://phabricator.wikimedia.org/project/view/2880/ -- do I need a special membership to do that, or am I missing a hidden 'edit' button?
[15:49:10] try "manage" from the left sidebar and then you should see an "edit details" on the right
[15:50:47] that's it! thank you
[17:21:01] * dcaro off
[18:25:31] taavi: next time you're about to delete a set of old worker nodes, can you let me do it instead? Something interesting is happening in designate and that'll give me a set of test subjects.
[18:26:08] andrewbogott: sure, I can give you a few workers to delete right now if you want
[18:26:15] yes please!
[18:26:24] Do they need to be deleted via cookbook or just horizon?
[18:29:10] there's a cookbook, wmcs.toolforge.remove_k8s_node, and if you're running that yourself you want to have https://gerrit.wikimedia.org/r/c/cloud/wmcs-cookbooks/+/993670 and its parents applied. you can start from tools-k8s-worker-36 and work your way up in the numbers; just let me know how many you'll end up removing in total and I'll spin up a matching amount of the newer nodes
[18:39:45] thanks! I'll give it a try
[18:40:03] are those patches applied on cloudcumin1001?
[18:40:48] no, they're still pending review
[18:41:55] ok
[19:57:46] Could I get a +1 for T356089 and T356090?
[19:57:46] T356089: Request temporary quota increase for videowiki - https://phabricator.wikimedia.org/T356089
[19:57:47] T356090: Request temporary quota increase for owidm - https://phabricator.wikimedia.org/T356090
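A rough illustration of the execution/task cleanup idea filed earlier as T356037: drop execution rows that no longer have a related task (roughly three quarters of them, per the counts pasted in the channel). Harbor's database is PostgreSQL; SQLite stands in here so the query logic can run standalone, and the table and column names are simplified assumptions rather than Harbor's exact schema:

```python
# Sketch of the "leftover executions" cleanup from T356037: delete execution
# rows with no surviving task. SQLite stands in for Harbor's PostgreSQL, and
# the two-table schema here is a simplified assumption.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE execution (id INTEGER PRIMARY KEY);
    CREATE TABLE task (id INTEGER PRIMARY KEY, execution_id INTEGER);
    -- 4 executions, only the first still has a task (mirroring the observed
    -- "only a quarter have a related existing task")
    INSERT INTO execution (id) VALUES (1), (2), (3), (4);
    INSERT INTO task (id, execution_id) VALUES (1, 1);
    """
)
# Count the orphans first, then delete them (their logs are expendable).
(orphans,) = conn.execute(
    "SELECT COUNT(*) FROM execution"
    " WHERE id NOT IN (SELECT execution_id FROM task)"
).fetchone()
conn.execute(
    "DELETE FROM execution WHERE id NOT IN (SELECT execution_id FROM task)"
)
(remaining,) = conn.execute("SELECT COUNT(*) FROM execution").fetchone()
```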