redishyperloglog

Has a single HyperLogLog the same accuracy than merging several ones?


If I create a HyperLogLog per day to count unique visitors, and then the 1st of January I merge the last 365 ones will I get the same value than if I keep a single HyperLogLog for the whole 365 days?

I guess not. But how different would those values be?


Solution

  • Short answer: Yes, both solutions have the same accuracy.

    The basis of the HyperLogLog algorithm is the observation that the cardinality of a multiset of uniformly distributed random numbers can be estimated by calculating the maximum number of leading zeros in the binary representation of each number in the set. If the maximum number of leading zeros observed is n, an estimate for the number of distinct elements in the set is 2**n.

    Referred from wiki

    No matter it's a single hyperloglog or 365 hyperloglogs, the numbers of leading zeros are the same. So they have the same accuracy, and that's why hyperloglog can be merged.