[SOLVED] Why can't Bloom filters handle range queries?

Context: I'm reading about RocksDB and LSM trees. From my understanding, a Bloom filter is used to avoid multiple I/Os when looking up an item across all the storage levels. I'm OK with that.

Apparently, one of the challenges is that a Bloom filter cannot be used for range queries. What is the reason? If I want to check whether there is a key between 32 and 200, I can do a single-key lookup for each value in between (and stop at the first "true" response). Is it really that inefficient?

Solution

You can do that, but it is inefficient: single-point lookups are slow (even with Bloom filters) compared to seeking to the first value (32) and iterating towards 200. LevelDB and RocksDB are optimized for such iterations.
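To make the "seek" idea concrete, here is a minimal Python sketch. It is not how LevelDB or RocksDB are implemented; it just models one sorted run of keys as a sorted list and uses a binary search to stand in for the seek. The function name first_key_in_range is made up for illustration.

```python
import bisect

def first_key_in_range(sorted_keys, lo, hi):
    """Return the first key k with lo <= k <= hi, or None if there is none."""
    # The "seek": one binary search to the first key >= lo.
    i = bisect.bisect_left(sorted_keys, lo)
    # One comparison tells us whether that key is still inside the range.
    if i < len(sorted_keys) and sorted_keys[i] <= hi:
        return sorted_keys[i]
    return None

keys = [5, 17, 150, 420]                  # keys are kept in sorted order
print(first_key_in_range(keys, 32, 200))  # 150
print(first_key_in_range(keys, 18, 149))  # None
```

The cost is one search no matter how wide the range is, and it works for any ordered key type, not just integers.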

Furthermore, in your case you just want the first key between 32 and 200, so you do one seek and that's it. Otherwise you'd have to do, in the worst case, 200 - 32 + 1 = 169 lookups (counting both endpoints, and assuming integer keys). A Bloom filter can quickly tell you that a key is definitely not present, but whenever it answers "maybe present" (either a real key or a false positive caused by hash collisions), a disk lookup is still required to confirm.
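For comparison, here is a rough sketch of the per-key approach from the question, using a toy Bloom filter. Everything here is an assumption for illustration: the BloomFilter class, its sizes, the SHA-256 based hashing, and the range_probe helper are not RocksDB's actual filter or API, and a Python set stands in for the data on disk.

```python
import hashlib

class BloomFilter:
    """Toy Bloom filter (illustrative only, not RocksDB's implementation)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for simplicity

    def _positions(self, key):
        # Derive num_hashes bit positions by hashing the key with different seeds.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        # False means "definitely absent"; True only means "maybe present".
        return all(self.bits[pos] for pos in self._positions(key))

def range_probe(bloom, stored, lo, hi):
    """Probe every integer in [lo, hi]; return (found_key, filter_probes, disk_reads)."""
    probes = disk_reads = 0
    for k in range(lo, hi + 1):
        probes += 1
        if bloom.might_contain(k):
            disk_reads += 1  # a "maybe" has to be confirmed against the real data
            if k in stored:  # stand-in for the actual disk lookup
                return k, probes, disk_reads
    return None, probes, disk_reads

stored = {5, 17, 420}  # no key falls inside [32, 200]
bloom = BloomFilter()
for key in stored:
    bloom.add(key)

found, probes, disk_reads = range_probe(bloom, stored, 32, 200)
print(found, probes, disk_reads)
```

With no key in the range, the loop has to probe the filter for all 169 candidates, and each false positive adds a wasted disk read on top. This only works at all because the keys are small integers. With string or floating-point keys there is no finite list of "values in between" to enumerate.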