
Cassandra - HDD vs. SSD usage makes no difference in throughput


The Context
I'm currently running tests with Apache Cassandra on a single-node cluster. I've verified that the cluster is up and running with nodetool status, I've run many reads and writes that confirm it, and I'm confident the cluster is set up properly. I am now trying to improve throughput by mounting an SSD on the directory where Cassandra writes its data.

My Solution
By default Cassandra writes its data to /var/lib/cassandra/data, but I've changed this in cassandra.yaml to point to another location, where I've mounted my SSD. I've confirmed that Cassandra is writing to this location by checking the size of the data directory's contents with watch du -h and other methods. The directory on which I've mounted the SSD contains the table data, commitlog, hints, a nested data directory, and saved_caches.
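One rough way to double-check that every Cassandra directory really sits on the same mounted device is to compare device IDs. The following is a minimal Python sketch, not part of my actual setup: the function names are made up for illustration, and the paths you pass in would be whatever you configured for data_file_directories, commitlog_directory, hints_directory and saved_caches_directory in cassandra.yaml.

```python
import os


def group_by_device(paths):
    """Group directory labels by the device ID (st_dev) their path lives on.

    `paths` maps a label (e.g. "commitlog") to a directory path.
    The labels and paths are supplied by the caller; nothing here is
    read from cassandra.yaml.
    """
    groups = {}
    for label, path in paths.items():
        device_id = os.stat(path).st_dev  # same value for paths on one filesystem
        groups.setdefault(device_id, []).append(label)
    return groups


def same_device(paths):
    """Return True if all given directories are on a single device."""
    return len(group_by_device(paths)) == 1


# Example call with hypothetical paths (adjust to your own mount point):
# same_device({
#     "data": "/mnt/ssd/cassandra/data",
#     "commitlog": "/mnt/ssd/cassandra/commitlog",
#     "hints": "/mnt/ssd/cassandra/hints",
#     "saved_caches": "/mnt/ssd/cassandra/saved_caches",
# })
```

If same_device returns False, at least one directory (for example the commit log) is still on a different disk than the rest.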

The Problem
I've been using the YCSB benchmark (see https://github.com/brianfrankcooper/YCSB) to measure Cassandra's average throughput in ops/sec. I see no difference in average throughput between mounting an HDD and mounting an SSD at the location where Cassandra writes its data. I've monitored disk access with dstat -cd --disk-util --disk-tps and found that the HDD caps out on CPU usage in multiple instances, whereas the SSD only spikes to around 80% on several occasions.
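To compare two runs side by side, the summary lines YCSB prints at the end of a run can be parsed. The following is a minimal Python sketch that assumes the usual comma-separated summary format, such as a line starting with [OVERALL], followed by Throughput(ops/sec) and a number; the function names are hypothetical, and the input would be the saved output of an HDD run and an SSD run.

```python
def parse_ycsb_summary(text):
    """Parse YCSB summary lines of the assumed form '[SECTION], Metric, value'.

    Returns a dict keyed by (section, metric) with float values.
    Lines that do not match this three-field shape are skipped.
    """
    results = {}
    for line in text.splitlines():
        parts = [p.strip() for p in line.split(",")]
        if len(parts) != 3 or not parts[0].startswith("["):
            continue
        try:
            value = float(parts[2])
        except ValueError:
            continue  # non-numeric value, not a summary metric
        results[(parts[0].strip("[]"), parts[1])] = value
    return results


def throughput_ratio(ssd_text, hdd_text):
    """Return SSD throughput divided by HDD throughput (ops/sec)."""
    key = ("OVERALL", "Throughput(ops/sec)")
    return parse_ycsb_summary(ssd_text)[key] / parse_ycsb_summary(hdd_text)[key]
```

A ratio close to 1.0 would match what I'm seeing, namely that the disk is probably not the limiting factor in this test.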

The Question
How can I speed up Cassandra's throughput by using an SSD instead of an HDD? I assume this is the correct place to mount my SSD, but does Cassandra not take advantage of its extra processing power? Any help would be greatly appreciated!


Solution

  • An SSD should always beat an HDD on latency and similar metrics; that follows from how the hardware works. I suspect your test simply didn't put enough load on the system. Another possibility is that you mounted only the data directory on the SSD, but not the commit log. On HDDs the commit log should always be placed on a separate disk so that it doesn't contend with data I/O; on SSDs it can share a disk with the data. Please point all directories to the SSD to see a difference.

    I recommend running the comparison with the following tools: