[05:52:28] I see a lot of ssh-rsa keys in data.yaml, is there a plan to migrate away from them (to ssh-ed25519)? Or have I been too strong to force people to migrate on network devices with https://phabricator.wikimedia.org/T336769 ?
[06:27:43] RSA keys are still fine, ed25519 have a lot of technical benefits (that's why we are recommending them on wikitech), but even to this day ssh-keygen still defaults to RSA keys as well
[06:27:54] the reason is mostly compatibility
[06:28:44] older sshd (especially embedded on hardware) might not yet support ed25519, like those old JunOS we recently got rid of :-)
[06:29:15] also older security tokens often don't support it
[06:30:03] Bitu (the software behind the emerging IDM) tracks SSH key types specifically, this will allow us to eventually better track this
[06:30:31] e.g. by showing people a warning to upgrade to a newer key type etc.
[06:31:37] but specifically for your question, I think for SRE (and those with network access is an even smaller set) it's fine to trigger a change
[06:32:26] if folks have a large RSA key and the Juniper hardware is slow, there'll likely also be a notable speedup in login times
[07:02:37] thanks for the detailed answer :)
[08:53:21] SRE-tools, Infrastructure-Foundations: Upgrade Fastnetmon to 1.2.4 - https://phabricator.wikimedia.org/T330884 (MoritzMuehlenhoff) @ayounsi There's now netflow2003 running Bookworm with FNM 1.2.4. If that works fine, we can reimage the other netflow* VMs in-place once Bookworm is stable. I copied over s...
[09:23:51] netops, Infrastructure-Foundations, SRE: Netbox network report failing - timeout error getting connected_endpoint prefix - https://phabricator.wikimedia.org/T321704 (cmooney) This appears to be happening more often now, and is starting to cause considerable noise in the dc-ops irc channel. @volans...
[09:33:57] netops, Infrastructure-Foundations, SRE: Netbox network report failing - timeout errors - https://phabricator.wikimedia.org/T321704 (cmooney)
[12:22:19] SRE-tools, netops, Infrastructure-Foundations, SRE: Setup zero touch provisioning (ZTP) for network devices - https://phabricator.wikimedia.org/T336485 (cmooney) @volans I had a word with @ayounsi on this and we both feel if we can make it work via HTTP to the apt server that's probably best. I...
[12:39:31] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Update network SSH keys to ssh-ed25519 - https://phabricator.wikimedia.org/T336769 (ayounsi)
[13:07:39] CAS-SSO, Infrastructure-Foundations: Add ApereoCAS as SSO provider for Semgrep Cloud Dashboard - https://phabricator.wikimedia.org/T336688 (MoritzMuehlenhoff) So far we've been only using Apereo CAS for authentication against our self-hosted infrastructure. Given that SemGrep is more along the lines of ot...
[13:34:59] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Migrate row E/F network aggregation to dedicated Spine switches - https://phabricator.wikimedia.org/T322937 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=c4ef01af-e7d5-458f-ae46-17500f124165) set by cmooney@cumin1001 f...
[14:03:35] CAS-SSO, Infrastructure-Foundations: Add ApereoCAS as SSO provider for Semgrep Cloud Dashboard - https://phabricator.wikimedia.org/T336688 (sbassett) Open→Declined p:Triage→Low >>! In T336688#8864603, @MoritzMuehlenhoff wrote: > So far we've been only using Apereo CAS for authentication a...
[14:50:59] netops, Infrastructure-Foundations, SRE: Netbox network report failing - timeout errors - https://phabricator.wikimedia.org/T321704 (ayounsi) Upgrade to 3.2.9 didn't help, but we were expecting it a bit. At this point I guess that it's related to the steady increase of Netbox usage and we should loo...
[16:08:42] (SystemdUnitFailed) firing: httpbb_kubernetes_mw-api-ext_hourly.service Failed on cumin2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[16:10:10] SRE-tools, Infrastructure-Foundations, Patch-For-Review: Upgrade Fastnetmon to 1.2.4 - https://phabricator.wikimedia.org/T330884 (ayounsi) >>! In T330884#8863775, @MoritzMuehlenhoff wrote: > @ayounsi There's now netflow2003 running Bookworm with FNM 1.2.4. If that works fine, we can reimage the other...
[17:08:42] (SystemdUnitFailed) resolved: httpbb_kubernetes_mw-api-ext_hourly.service Failed on cumin2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[17:11:44] netops, Infrastructure-Foundations, SRE: Create mechanism to disable IPv6 RA generation on irb interfaces when required - https://phabricator.wikimedia.org/T337057 (cmooney) p:Triage→Medium
[17:15:39] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Create mechanism to disable IPv6 RA generation on irb interfaces when required - https://phabricator.wikimedia.org/T337057 (cmooney)
[17:15:47] netops, Infrastructure-Foundations, SRE: Migrate row E/F network aggregation to dedicated Spine switches - https://phabricator.wikimedia.org/T322937 (cmooney)
[17:37:55] have a hiccup with Phabricator. msw-c5-eqiad was just replaced. I was able to do "Move devices attributes" script without any issues. When I manually tried to set rack / U it would not let me with error "Invalid procurement ticket (must start with RT or T then digits) error" and was not able to get around it manually. The script did bypass the error
[17:42:18] netops, DC-Ops, Infrastructure-Foundations, SRE, ops-eqiad: Q2:(Need By: TBD) Rows E/F network racking task - https://phabricator.wikimedia.org/T292095 (cmooney) @Jclark-ctr all went well with that today thank you for your help. For the next phase we need to move the following links: |No|Ra...
[17:44:03] netops, Infrastructure-Foundations, SRE: Migrate row E/F network aggregation to dedicated Spine switches - https://phabricator.wikimedia.org/T322937 (cmooney) The migration went fine today, very quick move and all came up as expected. EVPN MAC-move BGP signalling worked flawlessly was nice to see in...
[17:46:33] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Create mechanism to disable IPv6 RA generation on irb interfaces when required - https://phabricator.wikimedia.org/T337057 (cmooney) Another option, similar to the above patch, would maybe to make it a global toggle for a device. So like...
[17:46:47] I can repro it on netbox-next, checking the code looks correct
[17:46:52] I'm having a look
[17:52:32] netops, DC-Ops, Infrastructure-Foundations, SRE, ops-eqiad: Q2:(Need By: TBD) Rows E/F network racking task - https://phabricator.wikimedia.org/T292095 (ssingh) >>! In T292095#8865511, @cmooney wrote: > @Jclark-ctr all went well with that today thank you for your help. > > For the next phase...
[17:56:58] jclark-ctr: ok I've found the issue, with the new validation we had custom fields validated twice and somehow didn't work, so I removed the one in the custom field definition for now
[17:58:28] it should be solved
[17:59:39] FYI XioNoX I've removed the regex validation from the procurement ticket custom field as it's also in the validator's code. Seemed better to keep the validation all in one place
[18:01:32] agreed
[18:01:58] thx :)
[18:28:07] SRE-tools, netops, Infrastructure-Foundations, SRE: Setup zero touch provisioning (ZTP) for network devices - https://phabricator.wikimedia.org/T336485 (Volans) Ok that sounds like a plan, let's try first if the FQDN link works and if not we'll fallback to the IP. Based on the test we might add t...
[18:30:58] netops, Infrastructure-Foundations, SRE: Netbox network report failing - timeout errors - https://phabricator.wikimedia.org/T321704 (Volans) I'll try to have a look next week, but for now I downtimed the alert so it doesn't spam too much until the end of the month. https://icinga.wikimedia.org/cgi-bi...
[18:51:04] topranks: FYI you need to run the sre.dns.netbox cookbook because of the addition of irb-1035 on ssw1-f1-eqiad
[21:18:44] volans: my bad sorry, added and removed that a bunch of times, must have forgotten the last one
[21:18:51] running now thanks