[09:05:47] rook taavi: I guess even after that log fix it's still having issues?
[09:05:50] https://www.irccloud.com/pastebin/loLOgh3m/
[09:23:46] Aside from that, I get a bunch of these messages whenever running any kubectl command
[09:23:48] E1118 09:23:27.208950 884446 memcache.go:121] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
[09:23:50] is that a known issue?
[10:47:07] proc: I feel like sometimes when a cluster underwent disk pressure from the log issue, it didn't really recover after a control node restart to clear the disk. I don't have much data on this, as I only saw it on a few test clusters that I then ignored for several months, though I recall similar "couldn't get resource..." errors from any kubectl command. I would recommend deploying a new cluster, updating the podman log retention after deploying it, and going from there.
[11:40:23] Rook: gotcha. Can I reuse my old cluster template, or should I create a new one? (My old one is probably from June or so; I don't know if anything has changed.)
[12:06:57] https://www.irccloud.com/pastebin/2u1qcRex/
[12:07:00] time to take a break :D
[12:37:28] proc: the template should be fine. Though in full disclosure, I usually deploy with Terraform (or more recently OpenTofu) and deploy a fresh template with each new cluster. But I don't see a reason that reusing the template would cause any issue.
[14:09:21] !log metricsinfra reboot metricsinfra-alertmanager-1 to see if it stops flapping a puppet alert
[14:09:24] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Metricsinfra/SAL
[15:43:45] would it be possible to get my disk/volume quota on WMCS temporarily increased (by 30 GB?) to spin up a new cluster? I'd destroy the existing one once it's operational
[17:31:49] !log tools.lexeme-forms deployed 8c123e032e (l10n updates: br, he, ko)
[17:31:52] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.lexeme-forms/SAL
[18:46:51] Ooke
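The `memcache.go` error in the log above means the API server's aggregation layer cannot reach the service backing the `metrics.k8s.io/v1beta1` API (typically metrics-server), so kubectl's discovery step fails for that group on every command. A minimal diagnosis sketch, assuming a stock metrics-server deployment in `kube-system` with the upstream `k8s-app=metrics-server` label (names may differ on a Magnum-built cluster):

```shell
# Is the aggregated metrics API registered, and does it report Available?
kubectl get apiservice v1beta1.metrics.k8s.io

# If Available=False, inspect the backing pods and their recent logs
kubectl -n kube-system get pods -l k8s-app=metrics-server
kubectl -n kube-system logs -l k8s-app=metrics-server --tail=50

# Nodes that hit disk pressure expose a matching node condition
kubectl describe nodes | grep -A1 DiskPressure
```

If the APIService shows `FailedDiscoveryCheck` or `MissingEndpoints`, the error is about the metrics backend being down rather than the API server itself, which fits the disk-pressure history described above.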
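The "update the podman log retention" step recommended above can be sketched as a `containers.conf` drop-in; podman reads `log_size_max` (in bytes) from the `[containers]` table. The drop-in filename and the 50 MB limit here are illustrative assumptions, not a WMCS default:

```shell
# Run as root on each node. Cap per-container log size so runaway logs
# can't fill the disk (50 MB is an assumed, illustrative limit).
mkdir -p /etc/containers/containers.conf.d
cat > /etc/containers/containers.conf.d/50-log-limit.conf <<'EOF'
[containers]
# log_size_max is in bytes; -1 (the default) means unlimited
log_size_max = 52428800
EOF
```

Existing containers keep their old limit; the setting applies to containers created after the change.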