[04:03:16] Mail, collaboration-services, Infrastructure-Foundations, SRE, and 2 others: VRTS e-mail address unreachable / e-mail routing issue - https://phabricator.wikimedia.org/T380009#10334553 (Krd) I'd say this or any such problem should not occur again, as we definitely lost tickets, and the actual imp...
[08:59:08] SRE-tools, Infrastructure-Foundations, SRE, IPv6: Some Foundation clusters do not appear to support IPv6 - https://phabricator.wikimedia.org/T271136#10334896 (Volans) Full list of hosts without AAAA records for `A:owner-infrastructure-foundations` ` ganeti[2017-2024].codfw.wmnet,ganeti[1009,1011-...
[09:32:47] SRE-tools, cloud-services-team, Infrastructure-Foundations, IPv6: Some WMCS clusters apparently do not support IPv6 - https://phabricator.wikimedia.org/T271139#10334985 (Volans) I think it can be resolved, list updated as of today: ` an-redacteddb1001.eqiad.wmnet,clouddb2002-dev.codfw.wmnet,cloud...
[10:22:39] I'm upgrading postgresql on netbox-dev
[10:31:05] Puppet, cloud-services-team, Cloud-VPS: Preserve formatting and comments etc. in ENC Hiera - https://phabricator.wikimedia.org/T250622#10335164 (dcaro) Currently not supported by the pyyaml https://github.com/yaml/pyyaml/issues/90
[12:41:21] moritzm: are you finished working on netbox-dev?
[12:53:02] yes, go ahead!
[12:55:35] CAS-SSO, Infrastructure-Foundations, SRE: Registry of multiple webauthn devices - https://phabricator.wikimedia.org/T380180#10335531 (SLyngshede-WMF) After trying, and failing, to register a passkey, I've been digging through CAS and the java-webauthn-server source code. If we want passkeys we'll nee...
[13:05:53] Mail, collaboration-services, Infrastructure-Foundations, SRE, and 2 others: VRTS e-mail address unreachable / e-mail routing issue - https://phabricator.wikimedia.org/T380009#10335562 (Ruthven) Hi, I've got the information that someone wrote to `permissions-it@wikimedia.org` on 15/11/2024-11-15...
[13:14:03] thanks!
[13:14:36] for some reason netbox complained but restarting rq-netbox.service made it happy again
[13:19:37] Mail, collaboration-services, Infrastructure-Foundations, SRE, and 2 others: VRTS e-mail address unreachable / e-mail routing issue - https://phabricator.wikimedia.org/T380009#10335590 (revi) >>! In T380009#10335562, @Ruthven wrote: > Hi, > I've got the information that someone wrote to `permissi...
[13:35:18] Puppet, SRE-tools, DC-Ops, Infrastructure-Foundations, observability: RAID monitoring on new hardware spec requires new or updated user space cli tool - https://phabricator.wikimedia.org/T377853#10335639 (jcrespo) With BBU: ` root@backup1012:~$ ./storcli64 show all J { "Controllers":[ {...
[14:10:15] netops, Infrastructure-Foundations, serviceops, Kubernetes: Reimage one of the wikikube-worker1240 to wikikube-worker1304 node in eqiad as a replacement for wikikube-ctrl1001 - https://phabricator.wikimedia.org/T379790#10335773 (ops-monitoring-bot) depool host wikikube-worker1290.eqiad.wmnet by a...
[14:10:57] netops, Infrastructure-Foundations, serviceops, Kubernetes: Reimage one of the wikikube-worker1240 to wikikube-worker1304 node in eqiad as a replacement for wikikube-ctrl1001 - https://phabricator.wikimedia.org/T379790#10335776 (ops-monitoring-bot) Cookbook cookbooks.sre.k8s.pool-depool-node star...
[14:19:29] netops, Infrastructure-Foundations, serviceops, Kubernetes, Patch-For-Review: Reimage one of the wikikube-worker1240 to wikikube-worker1304 node in eqiad as a replacement for wikikube-ctrl1001 - https://phabricator.wikimedia.org/T379790#10335855 (akosiaris) >>! In T379790#10330660, @cmooney w...
[15:33:41] o/ elukey, mostly failure with the thanos node, but I will do some more testing today
[15:33:49] How do I reach out to our supermicro support?
[15:38:48] jhathaway: o/ sent in pvt the email address of our repr
[15:39:07] Cc also Willy if you can so he is aware
[15:39:48] I asked Valerie to give me the BMC pass for thanos-be1005, so I'll provision it in the meantime
[15:39:55] anything that I can do to help in your tests?
[15:48:00] elukey: not sure yet, I think the issue is in the BMC firmware, but black box testing the firmware is tricky, especially with long reboots and the difficulty of resetting to the initial factory settings.
[15:48:20] :(
[15:49:56] as an example I am unable to even delete debian from the boot options in the bmc. delete seems to work, but the entry is still present
[16:02:58] do you mean via redfish?
[16:23:31] no, manually via the bios screen
[16:31:27] ahh okok
[17:47:59] netops, Infrastructure-Foundations, SRE: Consolidate Automation Templates for DC Switches - https://phabricator.wikimedia.org/T312635#10337207 (Aklapper) a:cmooney→None @cmooney: Removing task assignee as this open task has been assigned for more than two years - See the email sent to task as...
[18:00:58] netops, DC-Ops, Infrastructure-Foundations, ops-eqiad, and 3 others: Reimage one of the wikikube-worker1240 to wikikube-worker1304 node in eqiad as a replacement for wikikube-ctrl1001 - https://phabricator.wikimedia.org/T379790#10337359 (cmooney) Ok. So I've tested the "[[ https://netbox.wikimed...
[18:16:22] re: T378835 , if I wanted to auto-detect the NICs using existing spicerack capabilities, is that easier via netbox or redfish (or something else)?
[18:16:22] T378835: Test 1G NIC compatibility, default to TFTP in sre.hosts.reimage cookbook - https://phabricator.wikimedia.org/T378835
[19:14:50] inflatador: Netbox is largely correct but on the server side we don't rely on the port speed setting for anything in our automation, so I'm not sure I'd 100% trust it
[19:15:18] one of the trickier parts here is a host with a PCIe NIC will also have on-board 1G ports
[19:15:36] so we need to know what the cabled-up port is, and what speed it is
[19:15:54] what's the exact use-case? the switch-side of the interface is probably accurate in netbox
[19:19:46] topranks It's just so the cook-book can run successfully on 10G Broadcom NICs (setting TFTP). It's not the end of the world if we accidentally use TFTP on 1G, but as of now the reimage cook-book doesn't work out of the box for hosts w/10G NICs
[19:19:50] In Pynetbox you can probably get it from:
[19:19:51] nb_server_object.primary_ip4.assigned_object.connected_endpoints[0].type.value
[19:20:18] inflatador: I believe this is being worked on to set tftp as the default for 10/25G ports
[19:20:20] That being said, I know UEFI is moving very quickly, so if y'all think it's better to wait, that's fine too
[19:20:32] yes there is progress on both of these fronts
[19:20:34] oh, is someone already working this, nice
[19:20:53] https://gerrit.wikimedia.org/r/c/operations/cookbooks/+/1092802
[19:21:02] Luca has been working on it today yes
[19:21:53] topranks excellent, it's in much better hands than mine ;P
[19:22:21] yep, mine too :)
[19:22:35] in which case I will happily defer to that experts =)
[19:22:44] errr...the experts
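
Editor's note: the pynetbox attribute chain quoted at [19:19:51] can be walked defensively, so that a host with no primary IPv4 address or no cabled port returns None instead of raising. The sketch below is illustrative only and is not the code in the linked Gerrit change. The helper names `connected_port_type` and `is_10g_or_25g` are hypothetical, the `SimpleNamespace` object stands in for a real pynetbox device record, and the type strings (`10gbase-x-sfpp`, `25gbase-x-sfp28`, `1000base-t`) are assumed to follow NetBox's interface type slugs.

```python
from types import SimpleNamespace


def connected_port_type(server):
    """Follow primary_ip4 -> assigned_object -> connected_endpoints[0].type.value.

    Returns the type string of the switch port cabled to the interface that
    carries the primary IPv4 address, or None if any link is missing.
    """
    obj = server
    for attr in ("primary_ip4", "assigned_object", "connected_endpoints"):
        obj = getattr(obj, attr, None)
        if obj is None:
            return None
    if not obj:  # empty list: nothing is cabled to this interface
        return None
    port_type = getattr(obj[0], "type", None)
    return getattr(port_type, "value", None)


def is_10g_or_25g(port_type):
    """Hypothetical rule: treat 10G/25G port types as the ones wanting TFTP."""
    return bool(port_type) and port_type.startswith(("10gbase-", "25gbase-"))


# Stand-in for a pynetbox device record (assumption, for illustration only).
fake_server = SimpleNamespace(
    primary_ip4=SimpleNamespace(
        assigned_object=SimpleNamespace(
            connected_endpoints=[
                SimpleNamespace(type=SimpleNamespace(value="10gbase-x-sfpp"))
            ]
        )
    )
)

if __name__ == "__main__":
    port = connected_port_type(fake_server)
    print(port, is_10g_or_25g(port))
```

This reads the switch side of the link, which the chat describes as the part of Netbox that is probably accurate, rather than the server-side port speed setting.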