[10:29:41] GitLab needs a short maintenance break in a few minutes
[10:46:42] GitLab maintenance finished
[14:12:25] andrewbogott: :hugops:
[14:13:35] is partman the tool of choice around the world or does everyone but us use something else? Whenever I search for docs on google I just wind up back on wikitech, so we're definitely the only ones who have ever written docs
[14:15:50] I always assumed it's the only tool, I've just used that in the past, but I'd be glad if other tools exist for this
[14:16:18] it's the only method supported in d-i, there was some push to replace it with https://nick-black.com/dankwiki/index.php/Growlight, but that didn't
[14:16:28] materialise so far
[14:16:45] andrewbogott: I've got to ask: the standard recipes don't work in this case? me and moritzm spent significant time making sure to take partman pain away :|
[14:16:50] painman
[14:17:17] godog: This is for ceph storage nodes, weird drive setup
[14:18:05] btullis reworked things and I was following up on his work. He might have an explanation for why the standard recipes didn't work, but I think it has to do with the unpredictable drive label thing
[14:18:05] andrewbogott: ah I see, yeah now I'm wondering how the other two ceph clusters we have address the same issue
[14:18:13] I don't believe anyone voluntarily avoids the standard recipes at this point :-)
[14:18:25] godog: in theory they all use the same recipe, but the recipe was extremely broken when I tried to use it
[14:18:57] sighsob
[14:19:03] Anyway it only took a day to get it working, not bad for partman
[14:19:23] all things considered indeed
[14:19:37] godog: your work is not in vain! I use the standard recipes most of the time
[14:19:52] swift and apus-ceph do something odd too
[14:20:04] thank you, yeah it was some painful and rewarding work
[14:20:14] And it might be that we can standardize ceph + some other use cases, since the pattern is basically "two OS drives and then a bunch of other drives that need to be totally ignored by partman"
[14:20:20] The cephosd* nodes have an HBA instead of a RAID controller, so it made the unpredictable drive thing even worse.
[14:20:50] btullis: tragically some of our OSDs have raid controllers and some don't. Although in theory if the drives opt out of hw raid the partman behavior is the same for both
[14:20:53] andrewbogott: Sorry that I didn't get back to you much over the past day or so.
[14:21:10] btullis: no problem, it would've just ruined two people's days instead of one :)
[14:21:32] but also, is it possible that the recipe as you left it never worked? Because it sure looked like it never could have.
[14:23:21] (I ask because if it /did/ work before then I maybe broke it for your use case)
[14:23:25] Last known to work on April 30th. https://phabricator.wikimedia.org/T362993#9759769
[14:24:38] ok! so that means the last round of changes weren't tested, that's good, probably means I made things better rather than worse
[14:28:08] I had wondered whether you could just set your RAID controllers to HBA mode, or use a RAID0 volume for each device.
[14:28:20] But maybe it's too late now, if it's working for you.
[14:29:35] yeah, we had some OSDs where every drive was individually raid0
[14:29:45] but I think we don't actually need that... we'll see when we image more hosts
[14:35:00] I found individual-raid0 a total pain in the backside and have been moving away from it wherever possible
[14:35:03] YMMV :)
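
For reference, a minimal sketch of the "two OS drives, everything else ignored" pattern mentioned at 14:20:14, using stock d-i partman-auto / partman-auto-raid preseeding. The device names, sizes, and RAID1 layout below are illustrative assumptions, not the actual Wikimedia recipe, and it sidesteps the unpredictable device-naming problem discussed above by hard-coding /dev/sda and /dev/sdb:

    # Hypothetical preseed sketch: mirror the OS across two drives and leave
    # every other disk untouched by listing only the OS drives for partman.
    # Device names, sizes and mountpoints are assumptions for illustration.
    d-i partman-auto/disk string /dev/sda /dev/sdb
    d-i partman-auto/method string raid
    d-i partman-auto/expert_recipe string \
          multiraid ::                                      \
                  1000 1000 1000 raid $primary{ }           \
                          method{ raid }                    \
                  .                                         \
                  50000 60000 -1 raid $primary{ }           \
                          method{ raid }                    \
                  .
    d-i partman-auto-raid/recipe string \
        1 2 0 ext4 /boot /dev/sda1#/dev/sdb1 . \
        1 2 0 ext4 /     /dev/sda2#/dev/sdb2 .
    d-i partman-md/confirm boolean true
    d-i partman-partitioning/confirm_write_new_label boolean true
    d-i partman/choose_partition select finish
    d-i partman/confirm boolean true
    d-i partman/confirm_nooverwrite boolean true

Because partman-auto/disk names only the two OS drives, partman never writes to the data disks; the hard part in the conversation above is picking those two devices reliably when enumeration order (HBA vs. RAID controller) is not stable.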