I have come across multiple algorithms, such as the Flajolet-Martin algorithm and HyperLogLog, for finding the number of unique elements in a list, and I became curious: how does Java calculate it? And what is the time complexity in each of these cases to store and find unique values?
Flajolet-Martin and HyperLogLog are algorithms for getting an *approximate* count of the distinct elements (the count-distinct problem) in one pass over a stream of `N` elements, in `O(N)` time and with modest (much better than `O(N)`) memory usage.
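For intuition, here is a minimal single-hash sketch of the Flajolet-Martin idea (illustrative only, not the HyperLogLog algorithm used in production libraries, and the bit-mixing function is just an assumption for the demo): hash each element, remember the largest number of trailing zero bits seen in any hash, and estimate the distinct count as 2^R.

```java
import java.util.List;

// Minimal single-hash Flajolet-Martin-style sketch. A single hash function
// gives a very noisy estimate; real implementations combine many such
// observations (and HyperLogLog refines the estimator further).
public class FlajoletMartinSketch {

    private int maxTrailingZeros = 0;

    public void add(Object element) {
        int h = mix(element.hashCode());
        int trailingZeros = Integer.numberOfTrailingZeros(h);
        // Guard against h == 0 (numberOfTrailingZeros returns 32 in that case).
        if (trailingZeros < Integer.SIZE && trailingZeros > maxTrailingZeros) {
            maxTrailingZeros = trailingZeros;
        }
    }

    public long estimateDistinctCount() {
        return 1L << maxTrailingZeros;  // the 2^R estimator (bias correction omitted)
    }

    // Murmur3-style finalizer used as a cheap bit mixer (an assumption for this demo).
    private static int mix(int x) {
        x ^= x >>> 16;
        x *= 0x85ebca6b;
        x ^= x >>> 13;
        x *= 0xc2b2ae35;
        x ^= x >>> 16;
        return x;
    }

    public static void main(String[] args) {
        FlajoletMartinSketch sketch = new FlajoletMartinSketch();
        for (String s : List.of("a", "b", "c", "a", "b", "a")) {
            sketch.add(s);
        }
        System.out.println("Estimated distinct count: " + sketch.estimateDistinctCount());
    }
}
```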
An implementation of the `Map` API does not need a solution to the "count-distinct" problem.
(Aside: `TreeMap` and `HashMap` already keep a precomputed count of the number of entries in the map¹; i.e. `Map.size()`. Provided that you don't get into thread-safety problems, the result is accurate (not approximate). The cost of calling `size()` is `O(1)`. The cost of maintaining it is `O(U)`, where `U` is the number of map addition and removal operations performed.)
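For contrast with the approximate sketches above, counting distinct elements *exactly* in Java is normally just a matter of putting them into a `HashSet`; a minimal sketch:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ExactDistinctCount {
    public static void main(String[] args) {
        List<String> elements = List.of("a", "b", "c", "a", "b", "a");

        // Each add() updates the set's internal size field as it goes, so the
        // final size() call is O(1); building the set is O(N) expected time
        // but uses O(D) memory, where D is the number of distinct elements.
        Set<String> distinct = new HashSet<>(elements);
        System.out.println("Exact distinct count: " + distinct.size());   // prints 3
    }
}
```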
More generally, the Flajolet-Martin algorithm and HyperLogLog do not / cannot form the basis of a `Map` data structure: they do not address the dictionary problem.
The algorithms used by `HashMap` and `TreeMap` are (respectively) hash table and binary tree algorithms. There are more details in the respective javadocs, and the full source code (with comments) is readily available for you to look at. (Google for `"java.util.HashMap" source` ... for example.)
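A quick illustration of the practical difference between the two (relying only on the documented behaviour: `HashMap` gives `O(1)` expected lookups with no ordering guarantee, `TreeMap` is a red-black tree giving `O(log N)` lookups with sorted keys):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class MapComparison {
    public static void main(String[] args) {
        Map<String, Integer> hashMap = new HashMap<>();   // hash table
        Map<String, Integer> treeMap = new TreeMap<>();   // red-black tree

        for (String key : new String[] {"banana", "apple", "cherry"}) {
            hashMap.put(key, key.length());
            treeMap.put(key, key.length());
        }

        System.out.println("HashMap keys (unspecified order): " + hashMap.keySet());
        System.out.println("TreeMap keys (sorted order):      " + treeMap.keySet());
    }
}
```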
¹ - Interestingly, `ConcurrentHashMap` doesn't work this way ... because updating a shared `size` field on every modification would be a concurrency bottleneck. Instead, the `size()` operation aggregates internal counters on demand, and the result is only an estimate if the map is being modified concurrently.
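If you do need the entry count of a `ConcurrentHashMap`, the javadoc recommends `mappingCount()` over `size()`, and notes that the result is an estimate when the map is being updated concurrently; a small sketch:

```java
import java.util.concurrent.ConcurrentHashMap;

public class ConcurrentSizeDemo {
    public static void main(String[] args) {
        ConcurrentHashMap<String, Integer> map = new ConcurrentHashMap<>();
        map.put("a", 1);
        map.put("b", 2);

        // Both calls aggregate internal counters on demand; with concurrent
        // writers the result is an estimate rather than an exact snapshot.
        System.out.println("size():         " + map.size());
        System.out.println("mappingCount(): " + map.mappingCount());
    }
}
```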