guavabloom-filterfalse-positive

Why actual false positives are much less than desired false positive probability in Guava's BloomFilter?


I use a Bloom Filter with a small desired false positive probability (fpp) and get much less result:

    BloomFilter<Long> bloomFilter = BloomFilter.create(Funnels.longFunnel(), 1_000_000, .001);
    int c = 0;
    for (int i = 0; i < 1_000_000; i ++) {
        // can replace with random.nextLong() because 1M random.nextLong() can hardly make collision
        if (!bloomFilter.put(Long.valueOf(i))) {
            // There is no duplicated elements so put returns false means false-positive
            c ++;
        }
    }
    System.out.println(c);

I expect 1000 (1M * 0.001) false positives but the result is 127 (If I use large random numbers the result will also near 120 but not 1000).

=== UPDATE ===

Here is my test:

desired actual    a/d 
0.3     0.12      40%
0.1     0.03      30%
0.03    0.006     20%    (guava's default fpp)
0.01    0.0017    17%
0.003   0.0004    13%
0.001   0.00012   12%
0.0003  0.00003   10%
0.0001  0.000009   9%
0.00003 0.000002   7%
0.00001 0.0000005  5%

Solution

  • The false positive probability is lower if there are less entries in the filter. In your test, you calculate the probability starting with a set that is empty, and then while adding entries. This is not the right way.

    You need to first add 1 million entries to the Bloom filter, and then calculate the false positive probability, for example by checking if entries are in the set that you didn't add.

    for (int i = 0; i < 1_000_000; i ++) {
        bloomFilter.put(Long.valueOf(i));
    }
    for (int i = 0; i < 1_000_000; i ++) {
        // negative entries are not in the set
        if (!bloomFilter.mightContain(Long.valueOf(-(i + 1)))) {
            c++;
        }
    }