In an algorithm I have to calculate the 75th percentile of a data set whenever I add a value. Right now I am doing this:
1. Get a value `x`
2. Insert `x` in an already sorted array at the back
3. Swap `x` down until the array is sorted
4. Read the element at `array[array.size * 3/4]`
Point 3 is O(n), and the rest is O(1), but this is still quite slow, especially if the array gets larger. Is there any way to optimize this?
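In code, the current approach amounts to something like this (a Python sketch; `data` and `add_value` are illustrative names, not from the question):

```python
import bisect

data = []  # kept sorted at all times

def add_value(x):
    bisect.insort(data, x)           # steps 2-3: insert and shift, O(n)
    return data[len(data) * 3 // 4]  # step 4: read the percentile, O(1)
```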
You can do it with two heaps. I'm not sure if there's a less 'contrived' solution, but this one provides O(log n) time per insertion, and heaps are included in the standard libraries of most programming languages.
The first heap (heap A) contains the smallest 75% of the elements; the other heap (heap B) contains the rest (the biggest 25%). Heap A has its biggest element on top (a max-heap); heap B has its smallest on top (a min-heap).
Adding an element: see if the new element `x` is `<= max(A)`. If it is, add it to heap A; otherwise, add it to heap B.
Now, if we added `x` to heap A and it became too big (holds more than 75% of the elements), we need to remove the biggest element from A (O(log n)) and add it to heap B (also O(log n)). Similarly if heap B became too big.
Finding the percentile: just take the largest element from A (or the smallest from B; the edit below pins down which one matches the array definition). This requires O(log n) or O(1) time, depending on the heap implementation.
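As a concrete illustration, here is a minimal sketch in Python using the standard `heapq` module (which only provides min-heaps, so heap A stores negated values). The class name `PercentileTracker` is hypothetical, and the sizing rule follows the convention made precise in the edit below:

```python
import heapq

class PercentileTracker:
    """Sketch of the two-heap scheme. Heap A (max-heap, stored negated)
    holds the smallest floor(n * 3/4) elements; heap B (min-heap) holds
    the rest, so the 75th percentile is always min(B)."""

    def __init__(self):
        self._a = []  # smaller elements, negated so heapq acts as a max-heap
        self._b = []  # larger elements, plain min-heap

    def add(self, x):
        # Route x: it belongs in A iff x <= max(A).
        if self._a and x <= -self._a[0]:
            heapq.heappush(self._a, -x)
        else:
            heapq.heappush(self._b, x)
        # Restore the size invariant size(A) == floor(n * 3/4);
        # at most one element moves, costing O(log n).
        n = len(self._a) + len(self._b)
        target = n * 3 // 4
        if len(self._a) > target:
            heapq.heappush(self._b, -heapq.heappop(self._a))
        elif len(self._a) < target:
            heapq.heappush(self._a, -heapq.heappop(self._b))

    def percentile(self):
        # Same element as sorted_data[n * 3 // 4]; needs n >= 1.
        return self._b[0]
```

Each `add` performs at most a constant number of heap operations, so the whole update is O(log n), and reading the percentile is O(1).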
Edit: as Dolphin noted, we need to specify precisely how big each heap should be for every n (if we want a precise answer). For example, if `size(A) = floor(n * 0.75)` and `size(B)` is the rest, then for every `n > 0`, `array[array.size * 3/4] = min(B)`.
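A quick sanity check of that claim, assuming the hypothetical `PercentileTracker` sketch above: insert random values and compare `min(B)` against the sorted-array definition after every step:

```python
import bisect
import random

tracker = PercentileTracker()
sorted_data = []
for _ in range(1000):
    x = random.random()
    tracker.add(x)
    bisect.insort(sorted_data, x)
    # min(B) must equal the element at index floor(n * 3/4).
    assert tracker.percentile() == sorted_data[len(sorted_data) * 3 // 4]
```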