[09:49:15] cdanis: not that I can think of right now. Probably indeed an oversight
[11:51:25] FIRING: SystemdUnitFailed: update-tails-mirror.service on mirror1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[12:51:25] RESOLVED: SystemdUnitFailed: update-tails-mirror.service on mirror1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[15:02:28] elukey: per Matthew's email on a supermicro ticket for the JBOD issue, shall I craft one, or would you rather do it, given your additional context?
[15:04:33] jhathaway: o/
[15:05:17] I added some extra context in https://phabricator.wikimedia.org/T382874#10466701
[15:05:24] I think that we have two issues
[15:05:50] thanks and sorry, you did not get the email, it was sent to the wrong luca :(, forwarding
[15:06:04] lol
[15:06:13] ah sorry I thought you meant the email thread with supermicro
[15:06:22] but you are not in Cc
[15:06:24] me too
[15:06:25] * elukey cries in a corner
[15:11:48] jhathaway: okok so the extra context that I highlighted above is still valuable :D
[15:11:56] basically we have two issues:
[15:12:19] yes thanks, I didn't realize these boxes had drives only accessible by pulling them out
[15:12:28] me too :(
[15:12:36] 1) cabling on the dcops side etc..
[15:13:04] 2) assuming that hot swap can be done without a reboot, what happens between the controller and the OS?
[15:13:34] from the ms-be1090 use case it seems to me that the new drive was recognized correctly after the shutdown/power-up
[15:14:09] but, Papaul and others mentioned that it may happen that the new disk gets into "Foreign" mode (like for RAID) and a tool like megactl is needed to set the proper mode
[15:14:45] in this case we'd be in trouble since neither megactl nor storcli seems to support JBOD for the config j controller
[15:15:57] who makes the config j controller?
[15:17:51] broadcom, 3908 series IIRC, we bought all the new ms-beXXXX with it without extensively reviewing options
[15:18:03] so we now have that controller in a lot of ms-be nodes
[15:18:32] it can be extended with proper BBU and memory for caching, but the bulk of the functionality stays the same (for JBOD I mean)
[15:19:08] Supermicro seems to indicate that the controller is not meant to be used like that, or IIUC it is not its primary goal, so the JBOD support is limited in the firmware
[15:20:09] my take is that Data Persistence needs to decide what to do, we cannot take this decision
[15:20:47] there may be the possibility of swapping the controllers to get something that properly supports JBOD at the OS level
[15:20:57] but the high density rack issue would remain
[15:22:48] nod, thanks that helps
[15:23:22] lemme know your thoughts, this is my understanding, I am braindumping to get opinions
[15:23:35] I don't see many other solutions :(
[15:29:34] seems like resolving the cabling is the first priority, then running some more jbod swap tests to confirm the behavior
[15:29:49] after which we can help data persistence make a decision
[15:31:01] would it be valuable to have a supermicro tag in phabricator? what is the policy on tags of that nature in phabricator?
[15:34:29] no idea, but probably we could ask Andre to create one in case
[15:34:38] I can open a couple of tasks for cabling and testing
[15:34:46] and then we can decide about the tag etc..
[15:34:48] how does it sound?
[15:41:57] sounds good
[15:59:57] created https://phabricator.wikimedia.org/T383903 that summarizes both, if there is the need to have two tasks I'll file subtasks
[16:00:21] great, thanks
[17:16:05] netops, Infrastructure-Foundations, SRE: Netbox: execute interface validator in provision script for switch interfaces - https://phabricator.wikimedia.org/T383915 (cmooney) NEW p:Triage→Low