bloom-filter

Is a bloom filter with only one input hash still a bloom filter?


If I implement a bloom filter where only one hashing algorithm is used (e.g. murmur), is this still considered a bloom filter?

For example, if a hashes to 5, then bit 5 of the filter will be set. If b hashes to 1, then bit 1 of the filter will be set, and so on...

For something to be considered a bloom filter, do at least two bits in the filter have to be set? If only one bit is set, is it called something else?


Solution

  • It is still a Bloom filter: one with k=1. Depending on the bits per element, it is probably not be the most space-saving one. But there are various reasons why one might pick a k that is not round(bitsPerKey * log(2)), the main ones are:

    By the way, you can still pick k to be the most space-saving one, even if you only use one "application hash function" (like Murmur hash with 64 bits). You just pick the "Bloom hash functions" to be a function of this "application hash function" (64-bit Murmur hash), like so (assuming int is 32 bit and long is 64 bit):

    long m = murmur(x)
    h(x, i) = (int) (m >> 32) + i * (int) m 
    

    And that's actually easier and faster than calculating multiple "application hash functions". In a loop, that looks like this:

    long m = murmur(x)
    int hash = (int) (m >> 32);
    int add = (int) m;
    for (int i = 0; i < k; i++) {
        ... test / set the bit depending on "hash" ...
        hash += add;
    }
    

    Many Bloom filter libraries do it like this, for example the Bloom filter implementation in Guava.