[09:46:29] netops, Infrastructure-Foundations, Observability-Alerting, Patch-For-Review: Migrate network icinga alerts to gNMI/prometheus - https://phabricator.wikimedia.org/T388641#10720712 (cmooney) @ayounsi I've noticed a few gaps starting to appear in the gnmic graphs in Grafana since the newer devices...
[10:05:39] netops, DC-Ops, Infrastructure-Foundations, ops-ulsfo, SRE: Link down between cr3-ulsfo and cr4-ulsfo - https://phabricator.wikimedia.org/T390731#10720747 (cmooney) It seems the work yesterday has not stopped the carrier transitions reported, although the number has decreased: {F59013584 wid...
[11:29:59] volans: maybe you could advise me, I need to do some testing of DHCP on the Nokia switches, within current limitations
[11:30:52] basically I need to somehow manually adjust the option-82 string our install server will try to match on so it will reply to a request formatted the only way the Nokia switches will send it right now
[11:31:12] *or* I could adjust the snippet to match on the MAC if that is better
[11:31:27] *or*
[11:31:42] but basically I want to make sure the rest of the workflow completes as expected
[11:33:01] running the sre.hosts.dhcp cookbook, then manually editing the snippet file it adds is the dumb/simple way forward, not sure if that's really good practice though
[11:35:02] topranks: sure, I'm at lunch, can we do this after ?
[11:35:09] no rush at all
[11:35:20] also see the sre.hosts.dhcp cookbook if by any chance suits your needs
[12:15:25] topranks: I'm back
[12:16:47] volans: thanks
[12:17:07] so yeah is manually editing the snippet created by the cookbook an option? Or is there a better way to achieve this?
[12:17:09] yes the sre.hosts.dhcp is meant for debugging dhcp issues so editing the file generated by it to find the right config is ok I think
[12:17:22] depends what you need to edit
[12:17:29] if we can make the software do it for you or not :)
[12:17:34] ok cool I will do that, I think it should be fine I'm familiar with the workflow
[12:17:48] you need to run 'sudo dhcpincludes -r commit'
[12:17:48] I'll just edit the option-82 string to match what the switch will send
[12:18:06] on the install server I should run 'dhcpincludes -r commit' ?
[12:18:17] after editing the string?
[12:18:26] yes after editing, or probably just restart the isc-dhcp as the sre.hosts.dhcp would have probably already setup the include
[12:18:42] but safer with dhcpincludes -r commit, one less thing to worry about
[12:19:03] ok will do, I wasn't aware of that I think I did this way back and just restarted dhcpd
[12:19:08] but indeed sounds like the better way
[12:22:02] potentially you could also just add the snippet manually and run dhcpincludes, but realistically the cookbook saves you some steps
[12:22:21] it might complain on exit that the file changed, I don't recall by heart the behavior, we'll see
[12:22:34] using the cookbook and editing is less manual messing about so probably a little safer
[12:22:58] I'll copy it somewhere first and can move it back when I'm done just in case
[15:02:51] netops, fundraising-tech-ops, Infrastructure-Foundations: Test prototype fundraising pybal replacement based on haproxy + anycast-healthchecker. - https://phabricator.wikimedia.org/T373942#10722080 (Jgreen) Open→Resolved Closing this as completed because we know it works. There is still m...
[15:51:45] topranks: I didn't follow up on the DHCP/Supermicro discussion, but I had a chat only with Riccardo. From Redfish we can get the mac addresses afaics, so there shouldn't be any blocker
[15:52:17] the main "issue" is the fact that if we don't find a NIC with LinkUp, we'll have to figure out what MAC address to save
[15:52:23] with some heuristic
[15:52:27] and/or ask to the operator
[15:53:02] because the naming for BIOS/PXE setting is different compared to what we get via Redfish for the Network settings/values
[15:53:16] but best if we do it during provision for sure
[15:53:25] lemme know if there are other things that you want me to check
[15:54:36] elukey: ok thanks for the info!
[15:55:10] it sounds sort-of workable, we need to discuss with dc-ops at what stage of the process the provision normally takes place
[15:55:42] if they've already run the netbox provision script - to allocate IPs and switch interfaces - and run the cookbook to configure the switch then the port should have a link
[15:55:47] if they do it before that then we might have an issue
[15:58:31] yes yes the port should have a link but sometimes we don't see the LinkUp status, I noticed it when testing the auto-discovery for PXE in provision
[15:58:37] it happens with Supermicro and also for Dell
[15:58:59] for Supermicro sometimes for the onboard 1g nic ports there is no "LinkStatus"
[15:59:02] no idea why
[15:59:13] in that case, the provision cookbook asks to the operator of the cookbook
[16:09:07] ok yeah it's not ideal but it could work
[16:09:30] The sensible defaults are probably "first 10G port if secondary 10G NIC present" or "first on-board 1G if no second NIC present"
[16:09:51] but I know we've a load of existing servers connected to 1G that do have a 10G NIC in them unused, so this is far from universal either
[17:59:47] Puppet, Infrastructure-Foundations: Improve the user experience adding new nodes to puppet - https://phabricator.wikimedia.org/T389932#10722979 (bking) @jhathaway in addition to site.pp (which everyone uses), we are also using it to add row/rack awareness to our Elastic ([[ https://phabricator.wikimedia...
[20:05:44] FIRING: NetboxAccounting: Netbox - Accounting job failed - https://wikitech.wikimedia.org/wiki/Netbox#Report_Alert - https://netbox.wikimedia.org/extras/scripts/12/jobs/ - https://alerts.wikimedia.org/?q=alertname%3DNetboxAccounting
[20:10:44] FIRING: [2x] NetboxAccounting: Netbox - Accounting job failed - https://wikitech.wikimedia.org/wiki/Netbox#Report_Alert - https://netbox.wikimedia.org/extras/scripts/12/jobs/ - https://alerts.wikimedia.org/?q=alertname%3DNetboxAccounting
[20:25:44] RESOLVED: [2x] NetboxAccounting: Netbox - Accounting job failed - https://wikitech.wikimedia.org/wiki/Netbox#Report_Alert - https://netbox.wikimedia.org/extras/scripts/12/jobs/ - https://alerts.wikimedia.org/?q=alertname%3DNetboxAccounting
[20:32:14] FIRING: [2x] NetboxAccounting: Netbox - Accounting job failed - https://wikitech.wikimedia.org/wiki/Netbox#Report_Alert - https://netbox.wikimedia.org/extras/scripts/12/jobs/ - https://alerts.wikimedia.org/?q=alertname%3DNetboxAccounting
[20:33:44] FIRING: NetboxPhysicalHosts: Netbox - Report parity errors between PuppetDB and Netbox for physical devices. - https://wikitech.wikimedia.org/wiki/Netbox#Report_Alert - https://netbox.wikimedia.org/extras/scripts/18/jobs/ - https://alerts.wikimedia.org/?q=alertname%3DNetboxPhysicalHosts
[20:38:44] FIRING: [2x] NetboxPhysicalHosts: Netbox - Report parity errors between PuppetDB and Netbox for physical devices. - https://wikitech.wikimedia.org/wiki/Netbox#Report_Alert - https://netbox.wikimedia.org/extras/scripts/18/jobs/ - https://alerts.wikimedia.org/?q=alertname%3DNetboxPhysicalHosts
[20:42:14] RESOLVED: [2x] NetboxAccounting: Netbox - Accounting job failed - https://wikitech.wikimedia.org/wiki/Netbox#Report_Alert - https://netbox.wikimedia.org/extras/scripts/12/jobs/ - https://alerts.wikimedia.org/?q=alertname%3DNetboxAccounting
[20:43:44] RESOLVED: [2x] NetboxPhysicalHosts: Netbox - Report parity errors between PuppetDB and Netbox for physical devices. - https://wikitech.wikimedia.org/wiki/Netbox#Report_Alert - https://netbox.wikimedia.org/extras/scripts/18/jobs/ - https://alerts.wikimedia.org/?q=alertname%3DNetboxPhysicalHosts
[20:48:14] FIRING: [2x] NetboxAccounting: Netbox - Accounting job failed - https://wikitech.wikimedia.org/wiki/Netbox#Report_Alert - https://netbox.wikimedia.org/extras/scripts/12/jobs/ - https://alerts.wikimedia.org/?q=alertname%3DNetboxAccounting
[20:53:59] FIRING: [2x] NetboxPhysicalHosts: Netbox - Report parity errors between PuppetDB and Netbox for physical devices. - https://wikitech.wikimedia.org/wiki/Netbox#Report_Alert - https://netbox.wikimedia.org/extras/scripts/18/jobs/ - https://alerts.wikimedia.org/?q=alertname%3DNetboxPhysicalHosts
[20:58:14] RESOLVED: [2x] NetboxAccounting: Netbox - Accounting job failed - https://wikitech.wikimedia.org/wiki/Netbox#Report_Alert - https://netbox.wikimedia.org/extras/scripts/12/jobs/ - https://alerts.wikimedia.org/?q=alertname%3DNetboxAccounting
[21:03:59] RESOLVED: [2x] NetboxPhysicalHosts: Netbox - Report parity errors between PuppetDB and Netbox for physical devices. - https://wikitech.wikimedia.org/wiki/Netbox#Report_Alert - https://netbox.wikimedia.org/extras/scripts/18/jobs/ - https://alerts.wikimedia.org/?q=alertname%3DNetboxPhysicalHosts
[21:15:44] FIRING: [2x] NetboxAccounting: Netbox - Accounting job failed - https://wikitech.wikimedia.org/wiki/Netbox#Report_Alert - https://netbox.wikimedia.org/extras/scripts/12/jobs/ - https://alerts.wikimedia.org/?q=alertname%3DNetboxAccounting
[21:25:44] RESOLVED: [2x] NetboxAccounting: Netbox - Accounting job failed - https://wikitech.wikimedia.org/wiki/Netbox#Report_Alert - https://netbox.wikimedia.org/extras/scripts/12/jobs/ - https://alerts.wikimedia.org/?q=alertname%3DNetboxAccounting
[21:45:44] FIRING: [2x] NetboxAccounting: Netbox - Accounting job failed - https://wikitech.wikimedia.org/wiki/Netbox#Report_Alert - https://netbox.wikimedia.org/extras/scripts/12/jobs/ - https://alerts.wikimedia.org/?q=alertname%3DNetboxAccounting
[21:55:44] RESOLVED: [2x] NetboxAccounting: Netbox - Accounting job failed - https://wikitech.wikimedia.org/wiki/Netbox#Report_Alert - https://netbox.wikimedia.org/extras/scripts/12/jobs/ - https://alerts.wikimedia.org/?q=alertname%3DNetboxAccounting
[22:15:44] FIRING: [2x] NetboxAccounting: Netbox - Accounting job failed - https://wikitech.wikimedia.org/wiki/Netbox#Report_Alert - https://netbox.wikimedia.org/extras/scripts/12/jobs/ - https://alerts.wikimedia.org/?q=alertname%3DNetboxAccounting
[22:25:44] RESOLVED: [2x] NetboxAccounting: Netbox - Accounting job failed - https://wikitech.wikimedia.org/wiki/Netbox#Report_Alert - https://netbox.wikimedia.org/extras/scripts/12/jobs/ - https://alerts.wikimedia.org/?q=alertname%3DNetboxAccounting
[22:45:44] FIRING: [2x] NetboxAccounting: Netbox - Accounting job failed - https://wikitech.wikimedia.org/wiki/Netbox#Report_Alert - https://netbox.wikimedia.org/extras/scripts/12/jobs/ - https://alerts.wikimedia.org/?q=alertname%3DNetboxAccounting
[22:55:44] RESOLVED: [2x] NetboxAccounting: Netbox - Accounting job failed - https://wikitech.wikimedia.org/wiki/Netbox#Report_Alert - https://netbox.wikimedia.org/extras/scripts/12/jobs/ - https://alerts.wikimedia.org/?q=alertname%3DNetboxAccounting
[23:15:44] FIRING: [2x] NetboxAccounting: Netbox - Accounting job failed - https://wikitech.wikimedia.org/wiki/Netbox#Report_Alert - https://netbox.wikimedia.org/extras/scripts/12/jobs/ - https://alerts.wikimedia.org/?q=alertname%3DNetboxAccounting
[23:25:44] RESOLVED: [2x] NetboxAccounting: Netbox - Accounting job failed - https://wikitech.wikimedia.org/wiki/Netbox#Report_Alert - https://netbox.wikimedia.org/extras/scripts/12/jobs/ - https://alerts.wikimedia.org/?q=alertname%3DNetboxAccounting
[23:45:44] FIRING: [2x] NetboxAccounting: Netbox - Accounting job failed - https://wikitech.wikimedia.org/wiki/Netbox#Report_Alert - https://netbox.wikimedia.org/extras/scripts/12/jobs/ - https://alerts.wikimedia.org/?q=alertname%3DNetboxAccounting
[23:55:44] RESOLVED: [2x] NetboxAccounting: Netbox - Accounting job failed - https://wikitech.wikimedia.org/wiki/Netbox#Report_Alert - https://netbox.wikimedia.org/extras/scripts/12/jobs/ - https://alerts.wikimedia.org/?q=alertname%3DNetboxAccounting
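
Note on the MAC-selection discussion from 15:52 to 16:09: the fallback order described there (use a port reporting LinkUp if there is one, otherwise "first 10G port if secondary 10G NIC present", otherwise "first on-board 1G") could be sketched roughly as below. This is only an illustration, not the provision cookbook's actual code. The `Nic` class, its fields, and `pick_pxe_mac` are hypothetical names, and the sketch assumes the NIC list has already been collected from Redfish and is given in port order.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Nic:
    """Hypothetical, simplified view of one port as read from Redfish."""
    name: str
    mac: str
    speed_gbps: int
    onboard: bool
    # "LinkUp", another status string, or None when no LinkStatus is reported
    link_status: Optional[str] = None


def pick_pxe_mac(nics: list[Nic]) -> Optional[str]:
    """Return the MAC to record, or None if the operator has to be asked."""
    # 1. A port that reports LinkUp wins outright.
    for nic in nics:
        if nic.link_status == "LinkUp":
            return nic.mac
    # 2. No LinkUp seen: first 10G port on a secondary (add-on) NIC.
    for nic in nics:
        if not nic.onboard and nic.speed_gbps >= 10:
            return nic.mac
    # 3. No secondary 10G NIC: first on-board 1G port.
    for nic in nics:
        if nic.onboard and nic.speed_gbps == 1:
            return nic.mac
    # 4. Nothing matched: leave the decision to the operator.
    return None


if __name__ == "__main__":
    ports = [
        Nic("onboard-1", "aa:aa:aa:aa:aa:01", 1, True),
        Nic("onboard-2", "aa:aa:aa:aa:aa:02", 1, True),
        Nic("slot1-1", "bb:bb:bb:bb:bb:01", 10, False),
    ]
    print(pick_pxe_mac(ports))
```

As pointed out at 16:09:51, servers cabled on 1G with an unused 10G NIC would get the wrong answer from step 2, so a result from steps 2 or 3 is best treated as a suggestion for the operator to confirm, not a final choice.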