[13:41:53] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Upgrade core routers to Junos 21+ - https://phabricator.wikimedia.org/T295690 (cmooney) cr2-eqdfw upgrade completed successfully today.
[13:42:24] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Upgrade core routers to Junos 21+ - https://phabricator.wikimedia.org/T295690 (cmooney)
[15:50:26] topranks: would be a good time in a few minutes to deploy the homer upgrade with the fix?
[15:50:41] I suspect after running sre.dns.netbox the dns servers are out of sync. Sometimes the AAAA addresses are reported and sometimes not. I'm unsure how to proceed.
[15:51:43] volans: yep fire away
[15:51:59] cwhite: did you query the AAAA records before adding them via the cookbook?
[15:52:25] the cookbook updates the authdns servers, but the recursor might have cached the negative result
[15:52:36] *you or any software, that is
[15:53:07] indeed volans.
[15:53:17] I did not query the IPs, but I did query the fqdns prior to adding them to netbox.
[15:53:18] cwhite: what hostnames are you seeing the issue with?
[15:53:40] example host: logstash2001.codfw.wmnet
[15:53:41] one thing you can easily do is to wipe the recursors' cache for those hostnames
[15:54:16] see the sre.dns.wipe-cache cookbook's help
[15:55:34] * cwhite gives that a try
[15:55:45] All the auth servers return records for that
[15:55:48] https://www.irccloud.com/pastebin/J0rhwewJ/
[15:56:13] It must have been cached. I'm getting consistent responses now. Thanks!
[15:56:16] So like a cached negative entry as volans suggested
[15:56:21] *likely
[16:06:20] topranks: Connecting to device mr1-eqsin.wikimedia.org (user=homer ssh_config=None timeout=120)
[16:06:42] yay
[16:06:42] That looks good :)
[16:06:57] There is a diff to be applied right?
[16:07:04] still running
[16:07:06] So we can test the commit works?
[16:07:14] yes
[16:07:27] netmon groups are changed
[16:09:16] topranks: can I leave it to you to do the commit so you can validate the diff?
[16:09:24] I'm not familiar with that diff
[16:09:36] sure, I think it's safe (based on diff yesterday), but let me run it to be sure
[16:10:34] thanks
[16:12:30] worked on mr1-codfw.wikimedia.org :)
[16:13:17] yay
[16:14:19] same with drmrs - I think we're good :)
[16:14:58] great, I'll do a full diff against *, but if you want to commit all the mr* first feel free
[16:15:15] or I can do it if you tell me the diffs are fine :D
[16:20:15] It's still running through the last of the mr* there
[16:20:31] When I'm done you can run against "*" should be ok
[16:24:45] perfect, thanks a lot
[16:25:52] cool all done now
[16:37:02] great
[16:37:07] running diff
[16:41:16] SRE-tools, Infrastructure-Foundations, Release-Engineering-Team: Investigate sharing releng common python code to pywmflib - https://phabricator.wikimedia.org/T316757 (thcipriani) Open→Declined For the time being, I don't think we have anything appropriate to upstream.
[17:23:12] yay, Changes for 53 devices: No diff
[17:23:53] SRE-tools, Infrastructure-Foundations, Release-Engineering-Team: Investigate sharing releng common python code to pywmflib - https://phabricator.wikimedia.org/T316757 (Volans) Ok, no problem. Feel free to re-open if that changes.
[20:56:11] SRE-tools, Infrastructure-Foundations, Release-Engineering-Team: Investigate sharing releng common python code to pywmflib - https://phabricator.wikimedia.org/T316757 (thcipriani) >>! In T316757#8236998, @Volans wrote: > Ok, no problem. Feel free to re-open if that changes. Thank you for answering a...
[21:34:35] Puppet, Infrastructure-Foundations, SRE: Facter is slow on a few hosts - https://phabricator.wikimedia.org/T251293 (colewhite) raid_mgmt_tools does not detect raid on `clouddb1021` ` cwhite@clouddb1021:~$ sudo /usr/bin/ruby /var/lib/puppet/lib/facter/raid.rb | jq . { "raid": [ "megaraid" ] }...