I am setting up a 5-node Cassandra cluster and will have ten 1.9 TB SSDs available per node. I want to use LVM to combine the ten drives and allocate the disk space needed on each node.
The only documentation that I can find for Cassandra says JBOD or RAID. Can I use LVM, or is that going to cause issues within Cassandra?
This is more of an informational question before I get started; I haven't actually tried anything yet.
Yes, you can use LVM; we have done so with one of our clusters. If you use LVM, be sure the logical volume is striped rather than linear. With a linear volume, the first device gets consumed, then the second, then the third, and so on, so many devices can sit idle while others are very busy. The downside of using LVM in striped mode is that changing the configuration later (growing or shrinking the volume) is a problem: you can't simply expand a striped volume.
We have also used JBOD. With JBOD, you'll have the same directories duplicated on every device, and a given sstable will reside on one device or another, which is unpredictable and somewhat "messy". Because each sstable resides on a single device, you don't really get "striping" per se, either; Cassandra only attempts to distribute sstables evenly across the devices. Also, because the individual devices are smaller, you could run into a space/compaction problem if, say, one of the devices does not have enough room to compact the sstables that exist on it.
So, personally, I would choose LVM, as it's much cleaner and less "messy". You might see some slight overhead with LVM, as I believe it may batch up some operations before performing them, but it hasn't seemed significant to me.
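To illustrate the linear vs. striped point, here is a rough Python sketch of which device a given logical offset lands on in each layout. This is a toy model only, not LVM's actual allocator: the function names are made up for illustration, and the 64 KiB stripe size is an assumed value, not a recommendation.

```python
from collections import Counter

NUM_DEVICES = 10
DEVICE_SIZE = 1_900_000_000_000   # 1.9 TB per SSD, as in the question
STRIPE_SIZE = 64 * 1024           # assumed stripe size (64 KiB), illustration only


def linear_device(offset, device_size=DEVICE_SIZE):
    """Device index for `offset` when devices are simply concatenated."""
    return offset // device_size


def striped_device(offset, stripe_size=STRIPE_SIZE, num_devices=NUM_DEVICES):
    """Device index for `offset` when fixed-size stripes rotate round-robin."""
    return (offset // stripe_size) % num_devices


def device_usage(offsets, locate):
    """Count how many chunk writes land on each device."""
    return Counter(locate(offset) for offset in offsets)


# Simulate writing the first 1 GiB of the volume in stripe-sized chunks.
offsets = range(0, 1024 ** 3, STRIPE_SIZE)
linear_usage = device_usage(offsets, linear_device)
striped_usage = device_usage(offsets, striped_device)

print("linear :", dict(linear_usage))
print("striped:", dict(sorted(striped_usage.items())))
```

In this model the linear layout sends every chunk to device 0 until it fills, while the striped layout spreads the same chunks almost evenly over all ten devices.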
-Jim