[07:47:00] please review: https://gerrit.wikimedia.org/r/c/cloud/wmcs-cookbooks/+/1142547
[08:02:26] also https://gerrit.wikimedia.org/r/c/operations/puppet/+/1142546
[08:15:31] taavi: LGTM
[09:35:37] taavi: please review https://gitlab.wikimedia.org/repos/cloud/cicd/gitlab-ci/-/merge_requests/57
[09:36:37] arturo: can you update https://gitlab.wikimedia.org/repos/cloud/cloud-vps/networktests-tofu-provisioning/-/merge_requests/25 to show it working?
[09:37:05] taavi: https://gitlab.wikimedia.org/repos/cloud/cloud-vps/networktests-tofu-provisioning/-/jobs/508968
[09:37:36] thanks, lgtm
[09:37:45] thanks
[10:04:36] Morning! I have this change that affects `ceph.conf` - https://gerrit.wikimedia.org/r/c/operations/puppet/+/1144583
[10:06:58] This touches your clusters too, and will need a rolling restart to take effect, I believe. Would you like to co-ordinate around the timing and testing?
[10:13:36] btullis: could the patch be merged without a restart of the daemons?
[10:15:13] Oh, sorry. Yes, I just meant that it won't take effect until a restart happens. Which is fine. It would just be nice to know that it restarts cleanly afterwards, but you could do that at any time you like.
[10:16:52] I plan to do a rolling restart of the cephosd100[1-5] cluster as soon as it is merged, but I am running reef.
[10:16:54] so I guess the answer to your original question is yes -- we would like to coordinate the timing
[10:18:27] I suppose I could re-work the patch to make it select on clusters. But I'm not sure it is worth it for this change. What do you think?
[10:18:34] btullis: I've sent a calendar invite for tomorrow
[10:18:55] Ack, nice.
[10:19:12] david is out today. I guess we can do the rolling restart tomorrow in that slot, if that works for you
[10:19:31] Perfect. Thanks.
[10:20:23] thank you :-)
[10:51:45] taavi: https://gitlab.wikimedia.org/repos/cloud/cloud-vps/networktests-tofu-provisioning/-/merge_requests/26 and https://gitlab.wikimedia.org/repos/cloud/cicd/gitlab-ci/-/merge_requests/60
[10:53:42] arturo: lgtm
[10:56:17] thanks
[10:57:10] taavi: please also approve this one: https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/26
[10:57:55] done
[10:58:00] thanks
[11:14:48] taavi: are you interested in me migrating https://gitlab.wikimedia.org/repos/cloud/metricsinfra/tofu-provisioning to the new layout used by the toolforge one?
[12:03:34] created https://gitlab.wikimedia.org/repos/cloud/metricsinfra/tofu-provisioning/-/merge_requests/2 but it is missing the creds, which I will only generate if you agree with the change
[12:04:35] arturo: sure. the main question is figuring out how to handle the various database credentials etc it provisions, which currently just live in a gitignored file in my local checkout of that repo
[12:04:57] i think we want to structure that in a way where we can at some point provision that at codfw1dev
[12:05:12] I guess puppet is the way to go for such secrets
[12:06:22] the opentofu code needs those secrets as they're fed to the trove api, how do you get puppet to do that?
[12:06:36] mmm right
[12:06:41] so they need to live in the repo
[12:06:43] also if we're going to have a lot more of those service accounts soon we're in need of something more scalable than https://gerrit.wikimedia.org/r/plugins/gitiles/operations/puppet/+/production/modules/base/files/labs/notify_maintainers.py#31
[12:06:55] i guess you could use gitlab secrets
[12:07:07] but creating them by hand and then copying them to puppet is not the best
[12:09:05] yes gitlab secrets could be nice
[12:09:47] could you deploy a secret into a VM filesystem from opentofu? :-S
[12:11:05] no idea
[12:11:21] this may just be another instance of not having a good secrets solution overall
[12:12:34] didn't andrew try to deploy openstack barbican at some point?
[12:32:54] I just created https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/OpenTofu with the intention of it being the entry point for how we use tofu across projects
[12:36:52] arturo: nice, thanks!
[12:39:20] yw
[12:53:46] please review https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-kubeusers/-/merge_requests/69
[13:13:54] Hi, FYI cloudbackup200[3-4] and cloudrabbit200[1-3]-dev have had puppet disabled for almost a week. The disable message links T390914. We shouldn't leave hosts with puppet disabled for long periods of time.
[13:13:54] T390914: Upgrade cloud-vps openstack to version 'Epoxy' - https://phabricator.wikimedia.org/T390914
[13:17:34] andrewbogott: ^
[13:17:55] volans: thanks for the poke, I will resolve that shortly
[13:18:02] thanks!
[13:21:12] arturo: I have a couple of codfw1dev networking questions. First, new VMs created there with the dual-stack network look like this:
[13:21:17] https://www.irccloud.com/pastebin/snkhSLfY/
[13:21:32] My very sophisticated question is: what's the deal with having two v6 addresses?
[13:22:35] And my followup question is... is there any chance that's related to me getting a 503 from the cloudlb when that VM tries to talk to radosgw?
[13:22:51] one of them is a link-local address and the other is the globally routable "real" address
[13:23:15] unlikely
[13:23:28] which 503 are you getting exactly, and from where?
[13:25:00] ok but eqiad1 VMs don't seem to have that link-local address do they?
[13:25:19] taavi: the 503s are happening here:
[13:25:20] root@tfbastion:~/tf-infra-test# TF_LOG=DEBUG tofu apply -var datacenter=codfw1dev
[13:25:56] it can talk to everything except radosgw. And I /can/ talk to radosgw from labtesthorizon
[13:26:06] they do? in general their ipv6 connectivity would be totally broken without it?
[13:26:17] can you just paste the error?
[13:28:07] So even if a VM is only set up in the legacy network it still has the v6 link-local address
[13:28:15] I think that's what was confusing me
[13:29:08] OK, so I will ignore ipv6 as a candidate for this
[13:29:16] Here's a snip of a tofu debug output:
[13:30:10] https://www.irccloud.com/pastebin/9iQVIpiN/
[13:30:21] the same action works in eqiad1.
[13:30:41] Last night I was sure that the 503 was coming from haproxy and not from radosgw, but today I'm no longer sure about that
[13:39:26] trying the url mentioned in the stack trace with curl manually results in a 403
[13:39:38] and i don't see anything strange in any of the haproxy metrics
[13:39:48] so that to me suggests an issue with one of the rados backend services
[13:41:18] ok. I spent ages trying to pry logs out of rados and never saw evidence that anything was hitting it other than health checks. But I can take another stab at that.
[13:41:42] (that lack of rados logs was why I started to blame the proxy)
[13:47:59] huh, when I curl I see it in the rados logs. But when tofu tries the same thing... no logs.
[13:53:38] * andrewbogott wants to read hidden RH docs for the first time ever https://access.redhat.com/solutions/6986506
[14:12:10] same creds and same action work with the openstack cli.
[15:51:57] * arturo offline
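
On the credentials question from the 12:04–12:09 exchange, a minimal sketch of the "gitlab secrets" approach, assuming a hypothetical `trove_db_password` variable (the real metricsinfra provisioning code may be organised differently). OpenTofu reads any environment variable named `TF_VAR_<name>` as the value of the matching input variable, so a masked GitLab CI/CD variable can feed the Trove credentials into the plan without ever committing them to the repo:

```hcl
# variables.tf (sketch) -- the password never lives in the repository.
# In GitLab, define a masked project-level CI/CD variable named
# TF_VAR_trove_db_password; OpenTofu picks it up automatically when the
# pipeline runs `tofu plan` / `tofu apply`.
variable "trove_db_password" {
  type      = string
  sensitive = true # redacts the value from plan/apply output
}
```

The variable would then be referenced by whatever Trove database/user resources the repo provisions. The gap noted at 12:07 remains: the masked variable still has to be created by hand and mirrored into puppet (or wherever else the same secret is needed).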