We have a requirement to maintain distinct counts for every hour of every day of the month, for various combinations (users meeting a criterion). We are thinking of using HyperLogLog for this. Another requirement is to provide counts of the union and the intersection of the matching conditions (criteria).
We have to do these operations over a day/week/month. As far as I have read, unions are supported by HyperLogLog, but intersections of more than two HyperLogLogs seem to have high error rates. Is there another data structure we could use for intersections alone that meets the low space requirement at high cardinality, or something that supports both intersection and union for counting large numbers of distinct occurrences?
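For concreteness, here is a minimal Python sketch of the setup described above, assuming a hand-rolled HyperLogLog rather than any particular library (the names `hash64`, `HyperLogLog`, `intersect_2` and the hourly user data are hypothetical). One sketch per hour is merged by register-wise max into a daily count, and a two-set intersection is estimated by inclusion-exclusion, |A ∩ B| ≈ |A| + |B| − |A ∪ B|.

```python
import hashlib
import math


def hash64(item: str, tag: bytes = b"hll") -> int:
    """Deterministic 64-bit hash via blake2b; `tag` (personalization) selects a hash family."""
    digest = hashlib.blake2b(item.encode("utf-8"), digest_size=8, person=tag).digest()
    return int.from_bytes(digest, "big")


class HyperLogLog:
    """Minimal HLL with 2**p registers; standard error is about 1.04 / sqrt(2**p)."""

    def __init__(self, p: int = 14) -> None:
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, item: str) -> None:
        h = hash64(item)
        idx = h >> (64 - self.p)                       # top p bits choose the register
        rest = h & ((1 << (64 - self.p)) - 1)          # remaining 64 - p bits
        rank = (64 - self.p) - rest.bit_length() + 1   # 1-based position of the leftmost 1-bit
        self.registers[idx] = max(self.registers[idx], rank)

    def merge(self, other: "HyperLogLog") -> "HyperLogLog":
        """Union = register-wise max; lossless, so hourly sketches roll up to day/week/month."""
        if other.p != self.p:
            raise ValueError("precision mismatch")
        out = HyperLogLog(self.p)
        out.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return out

    def count(self) -> float:
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros:
            return m * math.log(m / zeros)             # linear counting for small cardinalities
        return raw


def intersect_2(a: HyperLogLog, b: HyperLogLog) -> float:
    """Inclusion-exclusion: |A ∩ B| ~= |A| + |B| - |A ∪ B|."""
    return a.count() + b.count() - a.merge(b).count()


# Hypothetical data: during hour h the matching users are user{h*500} .. user{h*500 + 1999}.
hourly = []
for h in range(24):
    sketch = HyperLogLog()
    for u in range(h * 500, h * 500 + 2000):
        sketch.add(f"user{u}")
    hourly.append(sketch)

day = hourly[0]
for sketch in hourly[1:]:
    day = day.merge(sketch)

print(round(day.count()))                          # true value: 13500
print(round(intersect_2(hourly[0], hourly[2])))    # true value: 1000
```

Since the intersection is the difference of three noisy estimates, its absolute error is on the order of the error of the union estimate, so its relative error grows as the overlap gets smaller. For n sets, inclusion-exclusion needs 2^n − 1 union terms, which is why the error gets worse beyond two sets.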
Any pointers would be helpful. Thanks!!
Check out augmenting HyperLogLog with MinHash.
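The idea, as I read it: keep an HLL to estimate the size of the union and a MinHash signature to estimate the Jaccard index J of the sets, then take |A1 ∩ ... ∩ An| ≈ J · |A1 ∪ ... ∪ An|. Below is a minimal Python sketch under those assumptions. It uses the bottom-k (single hash function) MinHash variant instead of k independent hash functions, and the class, functions and data (`HllMinHash`, `hll_count`, `intersection_estimate`, `build`) are hypothetical, not an existing library's API.

```python
import hashlib
import heapq
import math


def hash64(item: str, tag: bytes) -> int:
    """Deterministic 64-bit hash; different `tag`s give independent hash functions."""
    digest = hashlib.blake2b(item.encode("utf-8"), digest_size=8, person=tag).digest()
    return int.from_bytes(digest, "big")


class HllMinHash:
    """Hypothetical combined sketch: HLL registers (union size) + bottom-k MinHash (Jaccard)."""

    def __init__(self, p: int = 14, k: int = 1024) -> None:
        self.p, self.k = p, k
        self.registers = [0] * (1 << p)
        self.mins: set[int] = set()      # the k smallest MinHash values seen so far
        self._heap: list[int] = []       # same values, negated, as a max-heap

    def add(self, item: str) -> None:
        # HLL part: top p bits pick a register, the rest give the leftmost-1-bit rank.
        h = hash64(item, b"hll")
        idx = h >> (64 - self.p)
        rest = h & ((1 << (64 - self.p)) - 1)
        self.registers[idx] = max(self.registers[idx], (64 - self.p) - rest.bit_length() + 1)
        # MinHash part (bottom-k variant): keep the k smallest values of a second hash.
        v = hash64(item, b"minhash")
        if v in self.mins:
            return
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, -v)
            self.mins.add(v)
        elif v < -self._heap[0]:
            evicted = -heapq.heapreplace(self._heap, -v)
            self.mins.discard(evicted)
            self.mins.add(v)


def hll_count(registers: list[int]) -> float:
    m = len(registers)
    alpha = 0.7213 / (1 + 1.079 / m)
    raw = alpha * m * m / sum(2.0 ** -r for r in registers)
    zeros = registers.count(0)
    if raw <= 2.5 * m and zeros:
        return m * math.log(m / zeros)   # linear counting for small cardinalities
    return raw


def intersection_estimate(sketches: list[HllMinHash]) -> float:
    """|S1 ∩ ... ∩ Sn| ~= J(S1, ..., Sn) * |S1 ∪ ... ∪ Sn|."""
    # Union size from the merged HLL (register-wise max).
    union_size = hll_count([max(rs) for rs in zip(*(s.registers for s in sketches))])
    # Bottom-k of the union = the k smallest values across all the sketches.
    k = min(s.k for s in sketches)
    union_mins = heapq.nsmallest(k, set().union(*(s.mins for s in sketches)))
    # A value in the union's bottom-k lies in every set iff it is in every sketch,
    # so the shared fraction estimates the Jaccard index.
    shared = sum(1 for v in union_mins if all(v in s.mins for s in sketches))
    return shared / len(union_mins) * union_size


def build(lo: int, hi: int) -> HllMinHash:
    sketch = HllMinHash()
    for u in range(lo, hi):
        sketch.add(f"user{u}")
    return sketch


# Hypothetical criteria: A, B, C are users matching three different conditions.
a, b, c = build(0, 12000), build(4000, 16000), build(8000, 20000)
print(round(intersection_estimate([a, b])))     # true value: 8000
print(round(intersection_estimate([a, b, c])))  # true value: 4000
```

The Jaccard estimate has a relative standard error of about sqrt((1 − J) / (k·J)), so small overlaps remain hard, but only one Jaccard estimate is needed however many sets are intersected, instead of 2^n − 1 inclusion-exclusion terms. Both parts also merge losslessly (register-wise max for the HLL, the k smallest combined values for the MinHash), so hourly sketches can still be rolled up to a day, week or month.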