[00:01:56] andrewbogott: quarry is down again
[00:04:22] My house is flooding and I can't do anything else right now sorry
[00:08:15] andrewbogott: yikes! good luck with that. I'm poking at the redis
[00:24:28] quarry is back up for the moment. I logged what I did on the task/in SAL. It was taavi's `kubectl delete pod -n quarry --all` that got things restarted. I tried just the redis pod first, but things were still wedged.
[01:02:48] thanks bd808! I assume this will just happen every few hours now until we rebuild :/
[01:02:58] I still wonder if adding a new worker would help
[01:25:12] The problem both times has been that the Redis container’s ephemeral storage has filled up.
[01:25:54] Which seems like a “something about what’s happening in redis changed problem”
[01:26:25] Meaning I would guess it is usage related not platform related
[01:27:45] The redis pod could certainly be deployed differently which might help things go on longer.
[07:08:40] quarry is up now (or at least replying), but node1 is failing to start/run pods
[07:08:46] I think it might be out of inodes
[07:08:50] https://www.irccloud.com/pastebin/BCjkcCOs/
[07:22:38] having ssh access to the workers would be really useful :/
[07:59:29] dcaro, dhinus: good morning, FYI with https://gerrit.wikimedia.org/r/c/operations/puppet/+/1136973 we've enabled the IRC notifications on awaiting user input also for the cloudcumin hosts (they share the same config). If there will be problems we'll disable it globally but in case there is need to have different values between cumin* and cloudcumin* it's easy to do it, just let us know :)
[08:21:27] nice, I'll keep an eye, thanks!
[08:42:48] thanks volans, I think we can test the feature and if we find good use cases we might want to redirect messages from cloudcumin to a different IRC channel. but no need to do it immediately.
[08:44:47] yeah, it might need some changes in logmsgbot too
[10:53:00] dhinus: i'm going to deploy the few quarry PRs i've just merged
[10:53:16] taavi: ack
[10:55:49] hrm, now a bunch of pods are stuck in `Terminating` again
[11:09:54] i had forgotten how terrible the manual `mariadb` interface for editing metricsinfra config and alerts is
[11:10:14] if only the person who implemented that could have come up with a better one
[11:10:24] :)
[11:22:15] yeah i think quarry-127a-g4ndvpkr5sro-node-1 is still broken, probably with its disk full
[11:22:26] everything's running on the other node, but it's still not great
[16:15:27] taavi: yep! metricsinfra is another "we want to do" project in case you are interested :)
[17:39:20] * dcaro off
[17:39:23] cya \o
[18:53:09] taavi: still around? I'm trying to understand the web_proxy_resource.go tf provider
[18:53:44] somewhat
[18:55:07] Read() is erroring out when it doesn't find the proxy it's looking for. But isn't a 404 from the remote expected? It means the resource isn't there, which is what we're checking.
[18:55:43] https://gitlab.wikimedia.org/repos/cloud/cloud-vps/terraform-cloudvps/-/blob/77e85852500c717605d39484b90347eeedd39ccf/internal/provider/web_proxy_resource.go#L191
[18:56:29] or maybe proxyClient() is supposed to return success and [] instead?
[18:58:36] hmmm
[18:59:22] updating https://gitlab.wikimedia.org/repos/cloud/cloud-vps/terraform-cloudvps/-/blob/77e85852500c717605d39484b90347eeedd39ccf/internal/provider/web_proxy_resource.go#L223 to return nil instead of the http request returning an error makes some sense to me
[18:59:37] and looking at https://developer.hashicorp.com/terraform/plugin/framework/resources/read#define-read-method it seems like the provider needs to call `RemoveResource()` in that case
[19:04:03] does the resource need a change? Won't the 404 show up for the provider?
[19:05:02] not sure if i understand that question
[19:05:51] oh no this file uses TABs
[19:07:15] I mean like this:
[19:07:27] https://www.irccloud.com/pastebin/bobQ2WN4/
[19:08:08] i guess that (but properly formatted) would work as well
[19:08:09] try it?
[19:12:48] This will be the first lines of go that I have ever written. I need to rebuild, don't I?
[19:14:15] `make build` will produce a binary for you, and then you need something like this https://phabricator.wikimedia.org/P75224 in your `~/.terraformrc` to actually use it
[19:15:50] 'make build' is making a lot of assumptions but I will try to figure it out.
[19:16:50] I also can't make sense out of the gitlab pipeline failures, unless this just never passed the gitlab tests
[19:19:21] it definitely did before
[19:20:05] ok
[19:20:21] * andrewbogott will now spend the next day and a half configuring a build environment