All of the keys in rocksDB's level 1 are already sorted. Therefore we can get key quickly in this level. Why does rocksDB still need to compact the files in level 1 to level 2?
I find an explanation on LevelDB's doc: Open file in one directory is slow if too many files are in the same directory. However, as mentioned in the doc, we can use sharding to solve this problem. I think sharding is much eaiser than compaction. Am I right?
Thanks in advance!
Because the data on level 0
are not sorted, so it's likely overlap with majority of data on level 1
. Which means compaction from level0 -> level1
needs to compact majority of the data, which happens pretty often.
If you have 1 more level: level2
, then >90% data is on level 2, so level0->level1
compaction is going to much smaller, which reduces write amplification from compaction. The more level you have, in generally, the smaller the write amplification is, but higher the read amplification is, as it's need to read more levels.
It's not just RocksDB, but for all LSM-tree. And in RocksDB you can actually configure the number of level to be 2, by setting num_levels
, which we don't recommend for normal usecase: https://github.com/facebook/rocksdb/blob/0ed8cb666de61d2e18bbb1fc90e05b734cee02de/include/rocksdb/advanced_options.h#L433-L434