I am trying to understand this sketch but am not able to understand. Correct me if I am wrong but basically, lets say I have a text data.. words.. I have a hash function.. which takes a word and create an integer hash and then I convert that hash to binary bit vector?? Right.. Then I keep a track of the first 1 I see from left.. And the position where that 1 is (say , k)... the cardinality of this set is 2^k?
http://ravi-bhide.blogspot.com/2011/04/flajolet-martin-algorithm.html
But ... say I have just one word. and the hash function of it is such that hash it generates is 2^5, then I am guessing there are 5 (??) trailing 0's?? so it will predict 2^5 (??) cardinality? That doesnt sounds right? What am I missing
For a single word the distribution of R is a geometric distribution with p = 1/2, and its standard deviation is sqrt(2) ≈ 1.41.
So for a word with hash ending in 100000b the algorithm will, indeed, yield 25/0.77351 = 41.37. But the probability of that is only 1/64, which is consistent with the statement that the standard deviation of R is close to 1.