[02:14:00] (MDRAIDFailedDisk) firing: MD RAID - Failed disk(s) - https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook#Hardware_Raid_Information_Gathering - TODO - https://alerts.wikimedia.org/?q=alertname%3DMDRAIDFailedDisk
[06:14:00] (MDRAIDFailedDisk) firing: MD RAID - Failed disk(s) - https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook#Hardware_Raid_Information_Gathering - TODO - https://alerts.wikimedia.org/?q=alertname%3DMDRAIDFailedDisk
[08:55:26] CFSSL-PKI, Ganeti, Infrastructure-Foundations, Patch-For-Review: Migrate Ganeti-rapi to use pki - https://phabricator.wikimedia.org/T350686 (MoritzMuehlenhoff)
[09:31:20] SRE-tools, Infrastructure-Foundations, Puppet-Core, SRE, and 3 others: Migrate roles to puppet7 - https://phabricator.wikimedia.org/T349619 (MoritzMuehlenhoff)
[09:48:16] CFSSL-PKI, Ganeti, Infrastructure-Foundations, Patch-For-Review: Migrate Ganeti-rapi to use pki - https://phabricator.wikimedia.org/T350686 (MoritzMuehlenhoff)
[10:14:01] (MDRAIDFailedDisk) firing: MD RAID - Failed disk(s) on aqs1013:9100 - https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook#Hardware_Raid_Information_Gathering - TODO - https://alerts.wikimedia.org/?q=alertname%3DMDRAIDFailedDisk
[11:52:58] topranks: thanks for the patch for the provision cookbook, out of caution could you try to test it (after running puppet on the cumin host of your choice) on both an old and new model? lmk if you need help with the flags to pass
[11:54:22] volans: hey thanks yes. I'll try to find one example of each, I guess it's safe to run any time right?
[12:13:10] volans: I'm assuming for existing host just run it with "--no-dhcp --no-switch --no-users"?
[12:13:15] if it's a noop yes, if it changes things it could reboot the host
[12:13:40] sretest anytime, any insetup role hosts anytime (like one provisioned yesterday by pap.aul for example)
[12:13:46] flags are correct
[12:14:12] sretest1001 is a poweredge r440 so maybe I'll use that, and one of the new r750s not yet in service
[12:14:19] +1
[12:14:43] thanks
[12:56:05] SRE-tools, Infrastructure-Foundations, Puppet-Core, SRE, and 3 others: Migrate roles to puppet7 - https://phabricator.wikimedia.org/T349619 (MoritzMuehlenhoff)
[13:08:04] volans: that seems to be ok based on tests with sretest1001 (r440) and elastic2108 (r450)
[13:08:14] R440: Skipped set of attribute BIOS.Setup.1-1 -> SerialComm, has already the correct value: OnConRedirCom2
[13:08:14] R450: Skipped set of attribute BIOS.Setup.1-1 -> SerialComm, has already the correct value: OnConRedir
[13:08:54] R450: Skipped set of attribute BIOS.Setup.1-1 -> SerialPortAddress, has already the correct value: Com2
[13:11:09] topranks: great, thanks
[13:57:38] volans: https://gitlab.wikimedia.org/slyngshede/apereo_cas_social_pipeline <- Then we can use either the CAS backend I wrote or the default OIDC and just install the module into the virtualenv running Netbox. No need for patching netbox
[14:04:17] CFSSL-PKI, Ganeti, Infrastructure-Foundations, Patch-For-Review: Migrate Ganeti-rapi to use pki - https://phabricator.wikimedia.org/T350686 (MoritzMuehlenhoff)
[14:14:59] (MDRAIDFailedDisk) firing: MD RAID - Failed disk(s) on aqs1013:9100 - https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook#Hardware_Raid_Information_Gathering - TODO - https://alerts.wikimedia.org/?q=alertname%3DMDRAIDFailedDisk
[14:54:01] (MDRAIDFailedDisk) resolved: MD RAID - Failed disk(s) on aqs1013:9100 - https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook#Hardware_Raid_Information_Gathering - TODO - https://alerts.wikimedia.org/?q=alertname%3DMDRAIDFailedDisk
[15:00:22] (MDRAIDFailedDisk) firing: MD RAID - Failed disk(s) on aqs1013:9100 - https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook#Hardware_Raid_Information_Gathering - TODO - https://alerts.wikimedia.org/?q=alertname%3DMDRAIDFailedDisk
[15:00:40] topranks: I'm back
[15:01:51] do we still support the CSV? I think arzh.el was asking dcops if any script is not used anymore
[15:02:56] and yes with the current refactor from arzhel I think we added the filter to the dropdown
[15:02:56] volans: csv yes
[15:03:18] so that would mean it's not needed in general, but might be needed if the same code is called from the csv code
[15:03:30] yep
[15:05:50] thanks - I was wondering that - the CSV input is still used then, dc-ops confirmed?
[15:08:23] yup
[17:49:59] (SystemdUnitFailed) firing: netbox_report_accounting_run.service Failed on netbox1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[18:19:02] (SystemdUnitFailed) resolved: netbox_report_accounting_run.service Failed on netbox1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[19:00:38] (MDRAIDFailedDisk) firing: MD RAID - Failed disk(s) on aqs1013:9100 - https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook#Hardware_Raid_Information_Gathering - TODO - https://alerts.wikimedia.org/?q=alertname%3DMDRAIDFailedDisk
[23:00:39] (MDRAIDFailedDisk) firing: MD RAID - Failed disk(s) on aqs1013:9100 - https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook#Hardware_Raid_Information_Gathering - TODO - https://alerts.wikimedia.org/?q=alertname%3DMDRAIDFailedDisk
[23:49:02] (SystemdUnitFailed) firing: netbox_report_accounting_run.service Failed on netbox1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed