I use a Bloom Filter with a small desired false positive probability (fpp) and get much less result:
BloomFilter<Long> bloomFilter = BloomFilter.create(Funnels.longFunnel(), 1_000_000, .001);
int c = 0;
for (int i = 0; i < 1_000_000; i ++) {
// can replace with random.nextLong() because 1M random.nextLong() can hardly make collision
if (!bloomFilter.put(Long.valueOf(i))) {
// There is no duplicated elements so put returns false means false-positive
c ++;
}
}
System.out.println(c);
I expect 1000 (1M * 0.001) false positives but the result is 127 (If I use large random numbers the result will also near 120 but not 1000).
=== UPDATE ===
Here is my test:
desired actual a/d
0.3 0.12 40%
0.1 0.03 30%
0.03 0.006 20% (guava's default fpp)
0.01 0.0017 17%
0.003 0.0004 13%
0.001 0.00012 12%
0.0003 0.00003 10%
0.0001 0.000009 9%
0.00003 0.000002 7%
0.00001 0.0000005 5%
The false positive probability is lower if there are less entries in the filter. In your test, you calculate the probability starting with a set that is empty, and then while adding entries. This is not the right way.
You need to first add 1 million entries to the Bloom filter, and then calculate the false positive probability, for example by checking if entries are in the set that you didn't add.
for (int i = 0; i < 1_000_000; i ++) {
bloomFilter.put(Long.valueOf(i));
}
for (int i = 0; i < 1_000_000; i ++) {
// negative entries are not in the set
if (!bloomFilter.mightContain(Long.valueOf(-(i + 1)))) {
c++;
}
}