rediscountdistinctredis-clusterhyperloglog

Redis - Count distinct problem (without hyper log log)


I should solve a count-distinct problem in Redis without the use of HyperLogLog (because of the 0.81% of known error).

I got different requests with a list of objects [O1, O2, ... On] for a specific Key A. For each list of objects received, Redis should memorize the Objects not still saved and return the number of new objects saved.

For Example:

I have tried to solve this problem with the Hyper Log Log and it's working perfectly but with a growing dataset of objects, the number of new objects saved is not so accurate. With the sets and the hashmap, the memory is growing too much.

I have read some stuff about Bitmaps but is not too clear. Do you have any reference to projects that are already facing this problem?

Thanks in advance


Solution

  • You might want to consider using a bloom filter. This is available as a module https://redis.com/redis-best-practices/bloom-filter-pattern/.

    Bloom filters allow quick tests for membership with 0 false negatives and a very low false positive, provided you know in advance what the maximum number of elements are. You would need to write code of the sort:

    result = bf.exists(key, item)
    if result == 0:
        bf.add(key, item)
        bf.inc(key_count)