If I implement a bloom filter where only one hashing algorithm is used (e.g. murmur), is this still considered a bloom filter?
For example, if a
hashes to 5
, then bit 5 of the filter will be set. If b
hashes to 1
, then bit 1 of the filter will be set, and so on...
For something to be considered a bloom filter, do at least two bits in the filter have to be set? If only one bit is set, is it called something else?
It is still a Bloom filter: one with k=1
. Depending on the bits per element, it is probably not be the most space-saving one. But there are various reasons why one might pick a k
that is not round(bitsPerKey * log(2))
, the main ones are:
k=1
is best. See also the paper "Compressed Bloom Filters" from Michael Mitzenmacher.k
is faster.By the way, you can still pick k
to be the most space-saving one, even if you only use one "application hash function" (like Murmur hash with 64 bits). You just pick the "Bloom hash functions" to be a function of this "application hash function" (64-bit Murmur hash), like so (assuming int
is 32 bit and long
is 64 bit):
long m = murmur(x)
h(x, i) = (int) (m >> 32) + i * (int) m
And that's actually easier and faster than calculating multiple "application hash functions". In a loop, that looks like this:
long m = murmur(x)
int hash = (int) (m >> 32);
int add = (int) m;
for (int i = 0; i < k; i++) {
... test / set the bit depending on "hash" ...
hash += add;
}
Many Bloom filter libraries do it like this, for example the Bloom filter implementation in Guava.