[07:17:10] netops, Infrastructure-Foundations, SRE, Traffic, User-jbond: fetch_external_clouds_vendors_nets.py fails to update DigitalOcean network ranges - https://phabricator.wikimedia.org/T313206 (Vgutierrez) Open→Resolved a:Vgutierrez DigitalOcean restored the CSV and it's now working as...
[07:17:18] netops, Infrastructure-Foundations, SRE, Traffic-Icebox, and 2 others: varnish filtering: should we automatically update public_cloud_nets - https://phabricator.wikimedia.org/T270391 (Vgutierrez)
[07:17:24] netops, Infrastructure-Foundations: asw2-c5-eqiad crash - https://phabricator.wikimedia.org/T313382 (ayounsi) p:Triage→High
[07:30:00] netops, Infrastructure-Foundations, SRE, Patch-For-Review: asw2-c5-eqiad crash - https://phabricator.wikimedia.org/T313382 (Marostegui) Critical DB infra there: - dbproxy1020 (m3 current proxy): needs failover. - pc1013 active pc3 master: needs failover - db1181 s7 master: needs failover T313383...
[07:30:24] netops, Infrastructure-Foundations, SRE, Patch-For-Review: asw2-c5-eqiad crash - https://phabricator.wikimedia.org/T313382 (Marostegui)
[07:31:45] netops, Infrastructure-Foundations, ops-eqiad: eqiad row C switch fabric recabling - https://phabricator.wikimedia.org/T313384 (ayounsi) p:Triage→High
[07:33:29] netops, Infrastructure-Foundations, SRE, Patch-For-Review, cloud-services-team (Kanban): asw2-c5-eqiad crash - https://phabricator.wikimedia.org/T313382 (Marostegui)
[07:47:55] netops, Infrastructure-Foundations, SRE, Patch-For-Review, cloud-services-team (Kanban): asw2-c5-eqiad crash - https://phabricator.wikimedia.org/T313382 (ayounsi) This didn't get caught by monitoring. We have a LibreNMS alert that triggers when any "emergency" log is sent by a device, but loo...
[07:49:21] netops, Infrastructure-Foundations, ops-eqiad: eqiad row C switch fabric recabling - https://phabricator.wikimedia.org/T313384 (Peachey88)
[08:14:07] netops, Infrastructure-Foundations, ops-eqiad: eqiad row C switch fabric recabling - https://phabricator.wikimedia.org/T313384 (ayounsi)
[08:14:13] netops, Infrastructure-Foundations, ops-eqiad: eqiad row C switch fabric recabling - https://phabricator.wikimedia.org/T313384 (ayounsi)
[08:14:19] netops, Infrastructure-Foundations, SRE, Patch-For-Review, cloud-services-team (Kanban): asw2-c5-eqiad crash - https://phabricator.wikimedia.org/T313382 (ayounsi)
[08:43:39] netops, Infrastructure-Foundations, SRE, ops-eqiad: cr2-eqiad:FPC3 partial failure (PIC2/3) - https://phabricator.wikimedia.org/T312745 (ayounsi) Resolved→Open Since the replacement, the error rate on one of the interfaces went through the roof: https://librenms.wikimedia.org/graphs/to=1658306...
[08:50:29] puppet-compiler, Infrastructure-Foundations: investigate state of puppet 7 - https://phabricator.wikimedia.org/T313387 (jbond) p:Triage→Medium
[08:53:04] puppet-compiler, Infrastructure-Foundations: investigate state of puppet 7 - https://phabricator.wikimedia.org/T313387 (MoritzMuehlenhoff) Since puppetdb missed the Bullseye release, we should work on a backport to Bullseye? Seems doable, possibly needs backporting various Clojure libs alongside. We can...
[08:54:11] puppet-compiler, Infrastructure-Foundations: investigate state of puppet 7 - https://phabricator.wikimedia.org/T313387 (MoritzMuehlenhoff)
[08:54:30] puppet-compiler, Infrastructure-Foundations: investigate state of puppet 7 - https://phabricator.wikimedia.org/T313387 (MoritzMuehlenhoff)
[09:15:37] netops, Infrastructure-Foundations, SRE, Patch-For-Review, cloud-services-team (Kanban): asw2-c5-eqiad crash - https://phabricator.wikimedia.org/T313382 (ayounsi) Opened high severity JTAC case 2022-0720-513915. In the meantime we need to discuss if we want to preemptively replace FPC5 with a...
[10:22:59] puppet-compiler, Infrastructure-Foundations: investigate state of puppet 7 - https://phabricator.wikimedia.org/T313387 (fgiunchedi) Just a note to add that this will also move host certificates to use SAN instead of CN and thus we can revert https://gerrit.wikimedia.org/r/c/operations/puppet/+/815304 whe...
[10:37:01] puppet-compiler, Infrastructure-Foundations: investigate state of puppet 7 - https://phabricator.wikimedia.org/T313387 (jbond) >>! In T313387#8090619, @fgiunchedi wrote: > Just a note to add that this will also move host certificates to use SAN instead of CN and thus we can revert https://gerrit.wikimedi...
[10:37:18] puppet-compiler, Infrastructure-Foundations: investigate state of puppet 7 - https://phabricator.wikimedia.org/T313387 (jbond)
[10:37:20] CAS-SSO, Puppet, Infrastructure-Foundations, Orchestrator, and 2 others: Puppet host certs do not contain Subject Alt Name entries - https://phabricator.wikimedia.org/T273637 (jbond)
[11:17:55] netops, Infrastructure-Foundations, SRE, Patch-For-Review, cloud-services-team (Kanban): asw2-c5-eqiad crash - https://phabricator.wikimedia.org/T313382 (Marostegui) m3-master dbproxy has been failed over.
[11:34:48] netops, Infrastructure-Foundations, SRE, cloud-services-team (Kanban): asw2-c5-eqiad crash - https://phabricator.wikimedia.org/T313382 (dcaro)
[13:20:22] netops, Infrastructure-Foundations, SRE, ops-eqiad: cr2-eqiad:FPC3 partial failure (PIC2/3) - https://phabricator.wikimedia.org/T312745 (ayounsi) Open→Resolved Nevermind, tracked in T313337
[14:02:48] netops, Infrastructure-Foundations, SRE, ops-eqiad: eqiad row C switch fabric recabling - https://phabricator.wikimedia.org/T313384 (cmooney) Agreed this is a good idea. I can see why it may have been "left alone" previously but given we'd had issues best to bite the bullet and do it. The 40G u...
[14:04:03] reprepro now has a new maintainer (and also new upstream) https://tracker.debian.org/news/1347210/accepted-reprepro-531-1-source-into-unstable/
[14:04:23] this is great news since the former maintainer was also the original author and has been inactive for ~3 years
[14:04:40] nice
[14:05:05] and there are various third-party patches which can now end up in the archive again (e.g. to support multiple versions across components or SHA-512 support)
[14:10:20] are they going to rename it repreprepro?
[14:15:09] nice!
[14:15:51] moritzm: just to make sure i understand -- does reprepro currently not support multiple versions of the same package each in a different component? or by across components did you mean multiple versions of the same package inside the same component?
[14:22:01] currently the versions are stored simplistically as a "tuple" of (distribution, arch, component, software)
[14:22:22] I haven't looked at the patch yet, but it should allow things like rollback and also to host two versions within the same component
[14:22:40] awesome
[14:22:51] there's a very long task at https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=570623 with all the gory details, but the old maintainer was reluctant to change it
[14:23:14] and the Ionos people then forked reprepro to carry these patches: https://github.com/ionos-cloud/reprepro
[14:23:32] and that will eventually allow to properly upstream all of these and retire all those forks
[14:31:38] moritzm: that is exciting!
[14:32:02] yeah the multiple versions thing seems quite nice for if we want to do some longer-lived canarying of new versions
[14:32:41] yeah, we explicitly did not use reprepro at my last job, due to the lack of multi version support
[20:39:06] netops, Infrastructure-Foundations, SRE, ops-eqiad: eqiad row C switch fabric recabling - https://phabricator.wikimedia.org/T313384 (wiki_willy) a:Jclark-ctr
[21:41:52] netops, Infrastructure-Foundations, SRE, cloud-services-team (Kanban): asw2-c5-eqiad crash - https://phabricator.wikimedia.org/T313382 (nskaggs) >>! In T313382#8090176, @Marostegui wrote: > - dbproxy1018 and dbproxy1019 are active WMCS proxies, need to be handled by them cc @nskaggs (they should...
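
Editorial sketch for the reprepro discussion at 14:22 above: a toy Python model of why an index keyed only by (distribution, arch, component, software) can hold a single version per package, and how a per-key version list would permit two versions in one component plus rollback. This is not reprepro's actual storage code or the patch mentioned in the chat; the class names, the package name "examplepkg", and the use of insertion order as "newest" are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Key layout as described in the chat: (distribution, arch, component, software).
Key = Tuple[str, str, str, str]


class SingleVersionIndex:
    """Hypothetical index holding exactly one version per key."""

    def __init__(self) -> None:
        self._versions: Dict[Key, str] = {}

    def add(self, distribution: str, arch: str, component: str,
            software: str, version: str) -> None:
        # Same key, so any previously stored version is overwritten.
        self._versions[(distribution, arch, component, software)] = version

    def versions(self, distribution: str, arch: str, component: str,
                 software: str) -> List[str]:
        key = (distribution, arch, component, software)
        return [self._versions[key]] if key in self._versions else []


class MultiVersionIndex:
    """Hypothetical index keeping every added version per key."""

    def __init__(self) -> None:
        self._versions: Dict[Key, List[str]] = defaultdict(list)

    def add(self, distribution: str, arch: str, component: str,
            software: str, version: str) -> None:
        stored = self._versions[(distribution, arch, component, software)]
        if version not in stored:
            stored.append(version)  # assumption: insertion order means "newest last"

    def versions(self, distribution: str, arch: str, component: str,
                 software: str) -> List[str]:
        key = (distribution, arch, component, software)
        return list(self._versions.get(key, []))

    def rollback(self, distribution: str, arch: str, component: str,
                 software: str) -> str:
        """Drop the newest version and return the one that is now current."""
        stored = self._versions.get((distribution, arch, component, software), [])
        if len(stored) < 2:
            raise ValueError("no earlier version to roll back to")
        stored.pop()
        return stored[-1]


if __name__ == "__main__":
    where = ("bullseye", "amd64", "main", "examplepkg")  # hypothetical package

    single = SingleVersionIndex()
    multi = MultiVersionIndex()
    for version in ("1.0-1", "1.1-1"):
        single.add(*where, version)
        multi.add(*where, version)

    print(single.versions(*where))  # only the last added version survives
    print(multi.versions(*where))   # both versions are retained
    print(multi.rollback(*where))   # the earlier version becomes current again
```

A real archive tool would compare versions using Debian version ordering instead of insertion order; that part is left out to keep the sketch short.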