I should solve a count-distinct problem in Redis without the use of HyperLogLog (because of the 0.81% of known error).
I got different requests with a list of objects [O1, O2, ... On] for a specific Key A. For each list of objects received, Redis should memorize the Objects not still saved and return the number of new objects saved.
For Example:
I have tried to solve this problem with the Hyper Log Log and it's working perfectly but with a growing dataset of objects, the number of new objects saved is not so accurate. With the sets and the hashmap, the memory is growing too much.
I have read some stuff about Bitmaps but is not too clear. Do you have any reference to projects that are already facing this problem?
Thanks in advance
You might want to consider using a bloom filter. This is available as a module https://redis.com/redis-best-practices/bloom-filter-pattern/.
Bloom filters allow quick tests for membership with 0 false negatives and a very low false positive, provided you know in advance what the maximum number of elements are. You would need to write code of the sort:
result = bf.exists(key, item)
if result == 0:
bf.add(key, item)
bf.inc(key_count)