indexingkey-value-storerange-querylsm-tree

how do range queries work on a LSM (log structure merge tree)?


Recently I've been studying common indexing structures in databases, such as B+-trees and LSM. I have a solid handle on how point reads/writes/deletes/compaction would work in an LSM.

For example (in RocksDB/levelDB), on a point query read we would first check an in-memory index (memtable), followed by some amount of SST files starting from most to least recent. On each level in the LSM we would use binary search to help speed up finding each SST file for the given key. For a given SST file, we can use bloom filters to quickly check if the key exists, saving us further time.

What I don't see is how a range read specifically works. Does the LSM have to open an iterator on every SST level (including the memtable), and iterate in lockstep across all levels, to return a final sorted result? Is it implemented as just a series of point queries (almost definitely not). Are all potential keys pulled first and then sorted afterwards? Would appreciate any insight someone has here.

I haven't been able to find much documentation on the subject, any insight would be helpful here.


Solution

  • RocksDB has a variety of iterator implementations like Memtable Iterator, File Iterator, Merging Iterator, etc.

    During range reads, the iterator will seek to the start range similar to point lookup (using Binary search with in SSTs) using SeekTo() call. After seeking to start range, there will be series of iterators created one for each memtable, one for each Level-0 files (because of overlapping nature of SSTs in L0) and one for each level later on. A merging iterator will collect keys from each of these iterators and gives the data in sorted order till the End range is reached.

    Refer to this documentation on iterator implementation.