I am asked to find the run time of the general form of the median of medians algorithm for groups of size g. It seems from common examples g=3,5,7
with T(n)=T(n/5)+T(2n/3)+cn
, T(n)=T(n/5)+T(7n/10)+cn
, and T(n)=T(n/7)+T(5n/7)+cn
, respectively, that the general for an odd number would be T(n)=T(n/g)+T(1-(n/(2g)*(g+1)/2))+cn
.
However, I am struggling with what to use as median for an even number g where the median will exist between two elements and not actually in the set itself. I am told that I can ignore floors and ceilings. Intuition tells me it should simply be T(n)=T(n/g)+T(1-(n/(2g)*(g)/2))+cn
, but I can't help but feel that I'm missing something.
Can anyone give advice on how the algorithm might work with groups of even numbers? I think I should be able to find the run time once I understand the algorithm.
Your analysis is correct; when g is even, you can express the run-time as T(n) = T(n/g) + T(3n/4) + cn, which is your expression simplified; the inductive proof that this is O(n) whenever g > 4 is the same regardless of whether g is even or odd.
To see why this equation is true, it's easiest to think about how the expression for T(n) with odd g is derived. We can assume that our input list A has no duplicate elements, without loss of generality (by either slightly modifying the algorithm, or by replacing every element A[i] with the tuple (A[i], i) and using lexicographic comparisons). This makes the analysis much easier.
Now, Median-of-Medians Quickselect's run-time is based on the three things it does:
ceil(n/g)
to find the median-of-medians M
cn
work to group items, partition the list around M
, and find the median of each small group-of-g, andM
, the partition with all elements greater than M
, or immediately return.Ignoring the ceil and an additive O(1)
constant, this gives T(n) = T(n/g) + T(New Partition Size) + cn
. What's the largest the New Partition Size can be? It's max(# elements less than M, # elements greater than M)
.
When g
was odd, we had that half of the n/g
groups had medians less than M
(so (g-1)/2
, plus 1
for the median, elements in that group are less than M
), and half had medians greater than M
(again, giving (g+1)/2
'greater than M
' elements for each such group).
Since you're defining median of an even list as the arithmetic mean of the two middle elements, this gets even simpler: half of the n/g
groups have medians less than M
, and exactly half the elements of each such group is less than its median and thus less than M
; the same logic works for greater than. This means we have eliminated at least (half of n/g times g/2
), or n/4
elements, leaving us with 3n/4
as the maximum New Partition Size and T(n) = T(n/g) + T(3n/4) + cn
.