[01:05:32] (SystemdUnitFailed) firing: generate_os_reports.service Failed on puppetdb2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[05:05:33] (SystemdUnitFailed) firing: generate_os_reports.service Failed on puppetdb2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[08:45:34] (SystemdUnitFailed) firing: (3) ifup@eno12399np0.service Failed on ganeti1037:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[09:36:24] who is the PKI expert now ?
[09:41:59] Chris
[09:48:04] CFSSL-PKI, Infrastructure-Foundations: CFSSL gencert "remote error: tls: certificate require" - https://phabricator.wikimedia.org/T355750 (ayounsi)
[09:48:25] noted, thx, not sure where to start to debug this : https://phabricator.wikimedia.org/T355750
[10:04:07] CFSSL-PKI, Infrastructure-Foundations: CFSSL gencert "remote error: tls: certificate require" - https://phabricator.wikimedia.org/T355750 (taavi)
[12:49:20] (SystemdUnitFailed) firing: generate_os_reports.service Failed on puppetdb2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[13:59:21] (SystemdUnitFailed) firing: (2) prometheus-ganeti-exporter.service Failed on ganeti1038:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[14:14:20] (SystemdUnitFailed) firing: (2) prometheus-ganeti-exporter.service Failed on ganeti1038:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[15:26:26] Puppet, Infrastructure-Foundations, Toolforge, Goal, cloud-services-team (Kanban): Fully puppetize Grid Engine - https://phabricator.wikimedia.org/T88711 (dcaro)
[15:27:02] Puppet, Toolforge, Documentation: Document our GridEngine set up - https://phabricator.wikimedia.org/T88733 (dcaro) Open→Declined No more grid work is going to be done, we are retiring it :)
[15:29:06] SRE-tools, Infrastructure-Foundations, Puppet-Core, SRE, and 5 others: Migrate roles to puppet7 - https://phabricator.wikimedia.org/T349619 (MoritzMuehlenhoff)
[15:55:57] cdanis: do you have any thoughts on pki2002 in relation to our network move in codfw rack b5 tomorrow?
[15:56:49] we're just moving the host's uplink so will be a brief outage while it's moved, I was thinking possibly we could just go ahead but I'm not too familiar with it
[16:03:06] topranks: i am pretty sure a brief outage will be fine
[16:03:11] and hey if it isn't I'll learn something ;)
[16:03:46] haha. yeah it'll be fairly brief, I expect most requests will retry. and yes, free education otherwise :)
[16:13:06] CFSSL-PKI, Infrastructure-Foundations: CFSSL gencert "remote error: tls: certificate require" - https://phabricator.wikimedia.org/T355750 (CDanis) p:Triage→Medium a:CDanis
[17:58:10] netops, Data-Persistence, Infrastructure-Foundations, SRE, ops-codfw: Migrate servers in codfw rack B5 from asw-b5-codfw to lsw1-b5-codfw - https://phabricator.wikimedia.org/T355549 (cmooney)
[18:03:46] netops, Data-Persistence, Infrastructure-Foundations, SRE, ops-codfw: Migrate servers in codfw rack B5 from asw-b5-codfw to lsw1-b5-codfw - https://phabricator.wikimedia.org/T355549 (Marostegui) @cmooney will you issue a downtime before the maintenance for each host?
[18:15:33] (SystemdUnitFailed) firing: generate_os_reports.service Failed on puppetdb2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[21:02:57] netops, Data-Persistence, Infrastructure-Foundations, SRE, ops-codfw: Migrate servers in codfw rack B5 from asw-b5-codfw to lsw1-b5-codfw - https://phabricator.wikimedia.org/T355549 (hashar) + @jnuche from release engineering who knows even more about Jenkins than me :-) `contint2002` hosts...
[22:15:33] (SystemdUnitFailed) firing: generate_os_reports.service Failed on puppetdb2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed