[09:13:02] hi folks
[09:13:28] I'm seeing some issues in deployment-acme-chief05 to validate DNS challenges against 208.80.154.148
[09:13:57] maybe there was some DNS infrastructure change and acme-chief config wasn't updated there?
[09:14:41] beta.wmflabs.org. NS records currently point to ns0.openstack.eqiad1.wikimediacloud.org. (185.15.56.162) and ns1.openstack.eqiad1.wikimediacloud.org. (185.15.56.163) so it looks to me like 208.80.154.148 isn't there anymore :)
[09:17:20] sounds likely! iirc we now set authdns_servers on cloud-wide hiera, so beta probably has an outdated override that can just be removed
[09:18:09] I've updated the override and now acme-chief is back to issuing certs as expected there
[09:19:20] taavi: I see on cloud/eqiad1.yaml that it's using the FQDN and for some obscure reason that I don't remember at the moment in cloud we have the IP as the key in authdns_servers as well
[09:19:49] so 185.15.56.162: 185.15.56.162 rather than ns0.o.e.wc.org: 185.15.56.162
[09:20:07] hmm, I'm very curious what that reason would be
[09:20:34] probably related to avoid acme-chief avoid attempting to reach the DNS servers via IPv6?
[09:21:15] but I'm not seeing any AAAA records there
[09:26:08] yeah. and it's been that way for a while now and everything seems to be working
[11:45:05] !log tools reboot tools-sgeexec-10-8 which had high load average
[11:45:08] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[15:15:12] Hi there, I was having some trouble working with volumes in a cloud vps project the other day and was hoping to get some help with it: https://phabricator.wikimedia.org/T350586
[15:34:32] andrewbogott: ^ fun with cinder volumes if you have some time to help JSherman
[15:34:45] * andrewbogott looks
[15:37:17] JSherman: I responded, please follow up on the task and I'll see what I can do.
[15:38:07] andrewbogott: ack; thanks!
[15:38:20] thx for the ping bd808
[15:40:13] * bd808 is brilliant at volunteering others to help
[16:05:33] JSherman: I did the 'requires root' bits and assigned the task back to you, lmk if you wind up stuck again.
[16:12:13] andrewbogott: thanks!
[17:53:16] Hello, poofy cloud apparitions. Our traffic namespace is still trying to use the since-removed ns-recursor1.openstack.eqiad1.wikimediacloud.org by default when launching an instance. Where is this default set so that we can use the anycast stuff by default? I can't seem to locate it
[18:00:28] brett: hi, which Debian version? it might be embedded in the base images, although I'd imagined we'd rebuilt them at some point after flipping the defaults (cc andrewbogott)
[18:00:46] taavi: Thanks for the help. It's Debian 12
[18:00:54] I can check other versions if that's helpful
[18:10:10] Not sure, I may need to built a new one. But of course puppet ought to be taking care of it unless you're running w/out puppet
[18:20:59] andrewbogott: I am running with puppet: It's failing on run-puppet-agent: Error: Could not retrieve catalog from remote server: Error 500 on SERVER: Server Error: Evaluation Error: Error while evaluating a Method call, DNS lookup failed for ns-recursor1.openstack.eqiad1.wikimediacloud.org Resolv::DNS::Resource::IN::A (file: /etc/puppet/modules/resolvconf/manifests/init.pp, line: 25,
[18:21:02] column: 34) on node traffic-acmechief01.traffic.eqiad1.wikimedia.cloud
[18:21:28] That said, I tested a random instance on both debian 11 and debian 12 and they're running run-puppet-agent successfully after boot
[18:21:36] ok. I can't look at that right this second, can you make me a ticket?
[18:21:55] Wait, I'm confused... if it works for 'a random instance' then where is it not working?
[18:22:18] It's not working on traffic-acmechief01 which required a little finnagling to get going
[18:22:43] ok, but to be clear that's not a new VM right? (You initially said this was happening 'when launching an instance')
[18:22:46] But that's using the same debian 12 image on launch....
[18:23:01] It is a new one
[18:24:13] brett: random thought: does that project have a local puppetmaster? if so, is the puppet tree up to date?
[18:24:48] taavi: It does, and it's an old one. I'll check to make sure it's up to date
[18:24:56] Sounds like a likely culprit
[18:27:03] Ugh, looks like it's last updated in July and it's a dirty tree. Lovely
[18:27:15] I imagine this is the issue. Thanks for the help, both of you
[18:28:08] * andrewbogott looked away and found the problem solved on his return :)
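
The catalog failure in the log came from a puppet tree that still referenced a removed recursor hostname. As a rough sketch only, assuming Python and a hypothetical helper named `unresolvable_hosts` (not part of puppet or any tooling mentioned in the log), one way to spot configured nameserver hostnames that no longer resolve:

```python
import socket
from typing import Callable, Iterable, List


def unresolvable_hosts(
    hosts: Iterable[str],
    resolve: Callable[[str], object] = socket.gethostbyname,
) -> List[str]:
    """Return the hostnames from `hosts` that fail to resolve.

    `resolve` is injectable so the check can be exercised without
    touching the network; by default it does an IPv4 lookup.
    """
    failed = []
    for host in hosts:
        try:
            resolve(host)
        except OSError:
            # socket.gaierror (lookup failure) is a subclass of OSError.
            failed.append(host)
    return failed


if __name__ == "__main__":
    # Hostnames taken from the log above; results depend on live DNS.
    candidates = [
        "ns-recursor1.openstack.eqiad1.wikimediacloud.org",
        "ns0.openstack.eqiad1.wikimediacloud.org",
    ]
    for name in unresolvable_hosts(candidates):
        print(f"does not resolve: {name}")
```

Any hostname this reports that is still present in a project's hiera or local puppet tree is a candidate for the kind of stale reference found here.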