arraysalgorithmsortingmedianmedian-of-medians

correctness of fast small order statistic algorithm for odd-length array


Problem 9-3 of the textbook Intro to Algorithms (CLRS) describes a fast O(n) algorithm for finding the k-th order statistic (k-th element in the array when sorted) of a length-n array, for the particular case that k is much smaller than n. I am not certain about the correctness of this algorithm when n is odd, and want to see a way to prove that it is correct.

The basic idea is that we first split the array into two halves, the first with floor(n/2) elements, and the second with ceil(n/2) elements. Then, we "partner" each element in the first half with the corresponding element in the second half. When n is odd this leaves a remaining unpartnered element.

For each pair of partners, we make sure that the left partner is >= the right partner, swapping the two if not. Then, recursively find the k-th order statistic of the second half, mirroring any swaps made in the second half with corresponding swaps in the first half. After this, the k-th order statistic of the entire array must be either in the first k elements in the first half, or the first k elements in the second half.

My confusion comes from the case when the array length n is odd, and there is a lone element in the second half that has no partner. Since the recursion is performed on the second half, consisting of the last ceil(n/2) elements of the array, including the lone partnerless last element, and we are supposed to mirror all swaps made in second half with swaps made within the corresponding partners in the first half, it is unclear what to do when one of the swaps involves the final element, since it has no partner.

The textbook doesn't seem to take particular care on this issue, so I'm assuming that when a swap involves the final element, then just don't make any mirror moves of the partner in the first half at all. As a result, the final element simply "steals" the partner of whoever it got swapped with. However, in this case, is there an easy way to see if the algorithm is still correct? What if when the last element steals someone else's partner, the partner is actually the k-th order statistic, and gets swapped later on to an inaccessible location? The mechanics of the recursion and partitioning involving in order-statistic selection are sufficiently opaque to me such that I cannot confidently rule out that scenario.


Solution

  • I don't think your description of the algorithm is entirely accurate (but then the explanation you linked to is far from clear). As I understand it, the reason why the algorithm is correct for an odd-length array is as follows:

    Let's first look at a few examples of even-length arrays, with n=10 and k=3 (i.e. we're looking for the third-smallest element, which is 2):

    a.  5 2 7 6 1 9 3 8 4 0  
    b.  5 1 7 6 2 9 3 8 4 0  
    c.  5 0 7 6 2 9 3 8 4 1  
    d.  5 0 7 6 2 9 3 8 1 4  
    

    If we split the arrays into two parts, we get:

    a.  5 2 7 6 1    9 3 8 4 0  
    b.  5 1 7 6 2    9 3 8 4 0  
    c.  5 0 7 6 2    9 3 8 4 1  
    d.  5 0 7 6 2    9 3 8 1 4  
    

    and these couples:

    a.  (5,9) (2,3) (7,8) (6,4) (1,0)  <- 0 coupled with 1
    b.  (5,9) (1,3) (7,8) (6,4) (2,0)  <- 0 coupled with 2
    c.  (5,9) (0,3) (7,8) (6,4) (2,1)  <- 1 coupled with 2
    d.  (5,9) (0,3) (7,8) (6,1) (2,4)  <- 0, 1 and 2 not coupled with each other
    

    After comparing and swapping the couples so that their smallest element is in the first group, we get:

    a.  5 2 7 4 0    9 3 8 6 1  
    b.  5 1 7 4 0    9 3 8 6 2  
    c.  5 0 7 4 1    9 3 8 6 2  
    d.  5 0 7 1 2    9 3 8 6 4  
    

    You'll see that the smallest element 0 will always be in the first group. The second-smallest element 1 will be either in the first group, or in the second group if it was coupled with the smallest element 0. The third-smallest element 2 will be either in the first group, or in the second group if it was coupled with either the smallest element 0 or the second-smallest element 1.

    So the smallest element is in the first group, and the second- and third-smallest elements can be in either group. That means that the third-smallest element is either one of the 3 smallest elements in the first group, or one of the 2 (!) smallest elements in the second group.

    a.  5 2 7 4 0    9 3 8 6 1  ->  0 2 4 + 1 3  
    b.  5 1 7 4 0    9 3 8 6 2  ->  0 1 4 + 2 3  
    c.  5 0 7 4 1    9 3 8 6 2  ->  0 1 4 + 2 3  
    d.  5 0 7 1 2    9 3 8 6 4  ->  0 1 2 + 3 4  
    

    So if we say that the k-th smallest element of the whole array is now one of the k-th smallest elements in either of the groups, there is an available spot in the the second group, and that's why, in an odd-length array, we'd add the uncoupled element to the second group. Whether or not the uncoupled element is the element we're looking for, it will certainly be one of the k-th smallest elements in either of the groups.

    It is in fact more correct to say that the k-th smallest element is either one of the k smallest elements in the first group, or one of the k/2+1 smallest elements in the second group. I'm actually not sure that the algorithm is optimal, or even correct. There's a lot of repeated comparing and swapping going on, and the idea of keeping track of the couples and swapping elements in one group when their corresponding elements in the other group are swapped doesn't seem to make sense.