algorithmrecurrencedivide-and-conquermedian-of-medians

Generalizing the median of medians algorithm


I am asked to find the run time of the general form of the median of medians algorithm for groups of size g. It seems from common examples g=3,5,7 with T(n)=T(n/5)+T(2n/3)+cn, T(n)=T(n/5)+T(7n/10)+cn, and T(n)=T(n/7)+T(5n/7)+cn, respectively, that the general for an odd number would be T(n)=T(n/g)+T(1-(n/(2g)*(g+1)/2))+cn.

However, I am struggling with what to use as median for an even number g where the median will exist between two elements and not actually in the set itself. I am told that I can ignore floors and ceilings. Intuition tells me it should simply be T(n)=T(n/g)+T(1-(n/(2g)*(g)/2))+cn, but I can't help but feel that I'm missing something.

Can anyone give advice on how the algorithm might work with groups of even numbers? I think I should be able to find the run time once I understand the algorithm.


Solution

  • Your analysis is correct; when g is even, you can express the run-time as T(n) = T(n/g) + T(3n/4) + cn, which is your expression simplified; the inductive proof that this is O(n) whenever g > 4 is the same regardless of whether g is even or odd.

    To see why this equation is true, it's easiest to think about how the expression for T(n) with odd g is derived. We can assume that our input list A has no duplicate elements, without loss of generality (by either slightly modifying the algorithm, or by replacing every element A[i] with the tuple (A[i], i) and using lexicographic comparisons). This makes the analysis much easier.

    Now, Median-of-Medians Quickselect's run-time is based on the three things it does:

    1. Call itself recursively on the 'medians' list of size ceil(n/g) to find the median-of-medians M
    2. cn work to group items, partition the list around M, and find the median of each small group-of-g, and
    3. Call itself recursively on either the partition with all elements less than M, the partition with all elements greater than M, or immediately return.

    Ignoring the ceil and an additive O(1) constant, this gives T(n) = T(n/g) + T(New Partition Size) + cn. What's the largest the New Partition Size can be? It's max(# elements less than M, # elements greater than M).

    When g was odd, we had that half of the n/g groups had medians less than M (so (g-1)/2, plus 1 for the median, elements in that group are less than M), and half had medians greater than M (again, giving (g+1)/2 'greater than M' elements for each such group).

    Since you're defining median of an even list as the arithmetic mean of the two middle elements, this gets even simpler: half of the n/g groups have medians less than M, and exactly half the elements of each such group is less than its median and thus less than M; the same logic works for greater than. This means we have eliminated at least (half of n/g times g/2), or n/4 elements, leaving us with 3n/4 as the maximum New Partition Size and T(n) = T(n/g) + T(3n/4) + cn.