[02:48:12] FIRING: SystemdUnitFailed: generate_os_reports.service on puppetdb2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[06:48:12] FIRING: SystemdUnitFailed: generate_os_reports.service on puppetdb2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[07:23:57] SRE-tools, collaboration-services, Infrastructure-Foundations, Puppet-Core, and 5 others: Migrate roles to puppet7 - https://phabricator.wikimedia.org/T349619#9788897 (MoritzMuehlenhoff)
[10:07:48] netops, Infrastructure-Foundations, SRE: Cloud IPv6 subnets - https://phabricator.wikimedia.org/T187929#9789580 (taavi) >>! In T187929#9748100, @cmooney wrote: > The aggregate that is used for the cloud-private allocations should come from IPv6 space not announced to the internet/DFZ, or space that i...
[10:25:11] netops, cloud-services-team, Infrastructure-Foundations, SRE: CloudVPS: enable BGP in the neutron transport network - https://phabricator.wikimedia.org/T245606#9789660 (taavi) Stalled→Declined Closing this in favour of the slightly different approach in {T358868} that's likely going t...
[10:28:35] netops, cloud-services-team, Cloud-VPS, Infrastructure-Foundations, Epic: CloudVPS: network architecture - https://phabricator.wikimedia.org/T209460#9789669 (taavi) Open→Resolved Closing this task since I don't see a clear end goal here. Current ongoing and planned work is already...
[10:48:12] FIRING: SystemdUnitFailed: generate_os_reports.service on puppetdb2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[11:53:07] netbox, ChangeProp, collaboration-services, Infrastructure-Foundations, and 10 others: Figure out a plan to move forward with regarding Redis License changes - https://phabricator.wikimedia.org/T360596#9789858 (MoritzMuehlenhoff) Redict is now packaged in Debian: https://tracker.debian.org/pkg/re...
[11:53:48] netbox, ChangeProp, collaboration-services, Infrastructure-Foundations, and 10 others: Figure out a plan to move forward with regarding Redis License changes - https://phabricator.wikimedia.org/T360596#9789859 (MoritzMuehlenhoff)
[14:48:31] netops, cloud-services-team, Infrastructure-Foundations, ops-codfw, SRE: Create (or teach Andrew how to create) private connections+dns entries for new cloudcontrols - https://phabricator.wikimedia.org/T364559#9790593 (cmooney) Open→Resolved p:Triage→Medium >>! In T364559#...
[14:48:51] FIRING: SystemdUnitFailed: generate_os_reports.service on puppetdb2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[15:18:15] btw the other day there was a force pu
[15:18:20] *forced
[15:19:02] puppet run on more than 64 hosts (cumin's concurrent ssh connections limit) without --batch and it seems it didn't affect in any obvious way the puppetservers
[15:20:08] while it's good to use smaller batches it's probably worth resurrecting T280622 and checking if maybe, in particular for emergency runs, we don't need to use batch anymore
[15:20:08] T280622: Determine safe concurrent puppet run batches via cumin - https://phabricator.wikimedia.org/T280622
[15:31:53] volans: we could probably calculate the right number based on number of puppetservers + cores + jruby threads configured
[15:39:50] I guess also what helped in that case was that the servers were spread across all DCs
[15:40:01] so hitting all puppetmasters
[15:40:17] while the same batch within the same DC would hit half of them
[15:40:54] nod, can we batch per dc?
[15:41:06] using different aliases :D
[15:41:27] :)
[18:53:02] FIRING: SystemdUnitFailed: generate_os_reports.service on puppetdb2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[22:53:02] FIRING: SystemdUnitFailed: generate_os_reports.service on puppetdb2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
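
Editor's note on the 15:31 to 15:41 exchange: the idea of deriving a batch size from the number of puppetservers, their cores and their configured JRuby threads could be sketched roughly as below. Nothing here comes from the log beyond those three inputs and the per-DC observation. The `PuppetServer` class, its field names, the `dc-a`/`dc-b` fleet and the rule of taking the smaller of cores and JRuby instances per server are all assumptions made for illustration, not the formula the participants settled on.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set


@dataclass(frozen=True)
class PuppetServer:
    """Hypothetical description of one puppetserver (all fields are assumptions)."""
    name: str
    datacenter: str
    cores: int
    jruby_instances: int


def server_capacity(server: PuppetServer) -> int:
    # Assumption: one JRuby instance serves one agent request at a time, and
    # instances beyond the core count only contend for CPU, so take the smaller.
    return min(server.cores, server.jruby_instances)


def safe_batch_size(servers: Iterable[PuppetServer],
                    datacenters: Optional[Set[str]] = None) -> int:
    """Sum the capacity of the servers a batch would actually reach.

    Pass `datacenters` to model a batch confined to some DCs; None means the
    target hosts are spread across every DC.
    """
    reachable = [s for s in servers
                 if datacenters is None or s.datacenter in datacenters]
    return sum(server_capacity(s) for s in reachable)


# Hypothetical fleet: two servers per DC, 8 cores and 6 JRuby instances each.
FLEET = [
    PuppetServer("server-a1", "dc-a", cores=8, jruby_instances=6),
    PuppetServer("server-a2", "dc-a", cores=8, jruby_instances=6),
    PuppetServer("server-b1", "dc-b", cores=8, jruby_instances=6),
    PuppetServer("server-b2", "dc-b", cores=8, jruby_instances=6),
]

if __name__ == "__main__":
    print(safe_batch_size(FLEET))            # hosts spread across all DCs
    print(safe_batch_size(FLEET, {"dc-a"}))  # batch confined to one DC
```

With this made-up fleet, a batch confined to one DC gets half the capacity of a batch spread across both, which matches the 15:40 remark that a single-DC batch would hit half of the servers. A real figure would need measured compile times and the actual server configuration.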