[08:39:10] netops, Infrastructure-Foundations, SRE: cr1-esams:fpc0 errors - https://phabricator.wikimedia.org/T346779 (ayounsi) Open→Resolved All good.
[08:41:01] netops, Infrastructure-Foundations, SRE: Upgrade asw1-eqsin - https://phabricator.wikimedia.org/T332395 (ayounsi) a:ayounsi
[10:02:48] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Upgrade asw1-eqsin - https://phabricator.wikimedia.org/T332395 (ayounsi) Latest Junos recommended has been copied to /var/tmp/ Next steps: downtime the site and proceed with the upgrade : https://wikitech.wikimedia.org/wiki/Juniper_switch...
[10:39:37] SRE-tools, DBA, Infrastructure-Foundations, Puppet-Core, and 2 others: puppet7 on cumin breaks database connections - https://phabricator.wikimedia.org/T352974 (Marostegui) p:Medium→High This can also prevent schema changes to be fully applied to all the replicas.
[10:41:40] SRE-tools, Infrastructure-Foundations, Spicerack, Patch-For-Review: spicerack: tox fails to install PyYAML using python 3.11 on bookworm - https://phabricator.wikimedia.org/T345337 (dcaro) Currently the only workaround I've found (as we don't use elasticsearch itself) is to install in the local v...
[10:49:37] SRE-tools, Infrastructure-Foundations, Spicerack, Patch-For-Review: spicerack: tox fails to install PyYAML using python 3.11 on bookworm - https://phabricator.wikimedia.org/T345337 (dcaro) Note that as of jan 2024, you will need also to workaround that python-kafka<=2.0.2 does not work with pytho...
[10:53:53] SRE-tools, Infrastructure-Foundations, Spicerack: [spicerack] python-kafka does not support python 3.12, there's a fix but there has not been any releases since 2020 - https://phabricator.wikimedia.org/T354410 (dcaro)
[10:55:44] SRE-tools, Infrastructure-Foundations, Spicerack: [spicerack] python-kafka does not support python 3.12, there's a fix but there has not been any releases since 2020 - https://phabricator.wikimedia.org/T354410 (dcaro)
[10:55:58] SRE-tools, DBA, Infrastructure-Foundations, Puppet-Core, and 2 others: Revert dbstore migration to puppet7 - https://phabricator.wikimedia.org/T354411 (Marostegui)
[10:56:23] SRE-tools, DBA, Infrastructure-Foundations, Puppet-Core, and 2 others: Revert dbstore migration from puppet7 to puppet5 - https://phabricator.wikimedia.org/T354411 (Marostegui)
[11:29:35] netops, Ganeti, Infrastructure-Foundations, SRE: Investigate Ganeti in routed mode - https://phabricator.wikimedia.org/T300152 (ayounsi) Next steps to create a production grade routed cluster: # {T353935} # Assign a private and optionally public IPv4 and v6 range for codfw # Add a Hiera key `pro...
[11:34:16] netops, Ganeti, Infrastructure-Foundations, SRE: Investigate Ganeti in routed mode - https://phabricator.wikimedia.org/T300152 (MoritzMuehlenhoff) Let's start the routed ganeti setup directly on Bookworm (IOW reimage ganeti2033/2024 after the move); the regular Ganeti clusters are still on Bullse...
[12:22:10] (SystemdUnitFailed) firing: netbox_report_accounting_run.service Failed on netbox1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[12:46:59] netops, Ganeti, Infrastructure-Foundations, SRE: Investigate Ganeti in routed mode - https://phabricator.wikimedia.org/T300152 (ayounsi) Will do, thanks.
For prefix allocation I'm suggesting the following, let me know what you think (especially @cmooney ! ) * eqiad * 10.64.24.0/23 - private1-v...
[12:47:10] (SystemdUnitFailed) resolved: netbox_report_accounting_run.service Failed on netbox1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[13:22:55] Mail, Infrastructure-Foundations, SRE, Znuny, collaboration-services: OTRS/mail: investigate why "T=remote_smtp_signed: all hosts for 'ticket.wikimedia.org' have been failing for a long time" - https://phabricator.wikimedia.org/T297160 (LSobanski) a:Arnoldokoth→None
[14:10:53] SRE-tools, DBA, Data-Platform-SRE, Infrastructure-Foundations, and 3 others: Revert dbstore migration from puppet7 to puppet5 - https://phabricator.wikimedia.org/T354411 (BTullis) a:BTullis I've got no problem with this. I think that I can run the **rollback** steps from T349619.
[14:12:06] SRE-tools, DBA, Infrastructure-Foundations, Puppet-Core, and 3 others: Revert dbstore migration from puppet7 to puppet5 - https://phabricator.wikimedia.org/T354411 (BTullis)
[14:12:16] SRE-tools, DBA, Infrastructure-Foundations, Puppet-Core, and 3 others: Revert dbstore migration from puppet7 to puppet5 - https://phabricator.wikimedia.org/T354411 (Marostegui) I am not sure if that'll bring us everything back or we'll need to do something with the mariadb certificates too cc @AB...
[15:59:00] SRE-tools, DBA, Infrastructure-Foundations, Puppet-Core, and 2 others: puppet7 on cumin breaks database connections - https://phabricator.wikimedia.org/T352974 (jbond) >>! In T352974#9392688, @ABran-WMF wrote: > it appears that most of our hosts are still using `/etc/ssl/certs/Puppet_Internal_CA....