algorithmsortinginsertion-sort

When get the maximum number of comparisons, in insertion-sort algorithm?


Examine the pseudo-code from the Wikipedia page on insertion sort:

i ← 1
while i < length(A)
    j ← i
    while j > 0 and A[j-1] > A[j]
        swap A[j] and A[j-1]
        j ← j - 1
    end while
    i ← i + 1
end while

Please tell why it is stated in literature:

The number of exchanges used by insertion sort is equal to the number of 
inversions in the array, and the number of compares is at least equal to 
the number of inversions and at most equal to the number of inversions
plus the array size minus 1.

One adjacent element swap (i.e. of the 'current selected element', with those in the sorted sub-array (i.e., starting from the left end of the original array)) would remove only one inversion.
Though to a given list of element, introducing one swap can add many inversions; say to the list: 123, by swapping the element 3 with the element 1, that introduces 3 inversions.
And the number of comparisons needed are three:

i=1:<2,3>, i=2:<1,3>, <1,2>

Also, the number of inversions are three in the list:321

This number of inversions is also given by nC2, where n is the list size.

Say, for a reverse-sorted list of size 4: 4321, have six inversions: <4,3>,<4,2>,<4,1>,<3,2>,<3,1>,<2,1>, which is also the number of 2-subsets, of an n-set. In both problems, due to no ordering involved, have the number of combinations to be counted.

The steps (comparisons+exchanges) involved in application of insertion-sort, to the reverse-sorted list <4321> are:

1. i ← 1, j ← 1, A[0] > A[1]:    swap A[1] and A[0], j ← 0:     <3421>
2. i ← 2, j ← 2, A[1] > A[2]:    swap A[2] and A[1], j ← 1:     <3241>
          j ← 1, A[0] > A[1]:    swap A[1] and A[0], j ← 0:     <2341>
3. i ← 3, j ← 3, A[2] > A[3]:    swap A[3] and A[2], j ← 2:     <2314>
          j ← 2, A[1] > A[2]:    swap A[2] and A[1], j ← 1:     <2134>
          j ← 1, A[0] > A[1]:    swap A[1] and A[0], j ← 0:     <1234>

But, cannot understand when the maximum number of comparisons are equal to the number of inversions + array size -1; as for the worst-case of a reverse-sorted array, of size n; have the number of inversions equal nC2.

Where would be the rest of comparisons coming from, i.e. 'n-1' comparisons?

The literature referred to above, states the source of these 'n-1' comparisons, to be:

additional compare might happen for the value of `i` from `1` to `n-1` (when `a[i]` does not reach the left end of the array).

There is confusion caused by the part, in the brackets; as what is meant by: (when a[i] does not reach the left end of the array).
This case might mean the case of lists, where the 'first element' (i=0) is in its final position, say: 1423. Or in other words, does it talk about all the possible comparisons, including the one in which there is no inversion.

Say, for 1423, there are two inversions, but need to have extra comparisons of: <1,4>, <1,2>, <1,3>, <2,3>. But, these are then four comparisons, not three (array size : 4 minus 1).

Am also unclear about the wording used in the literature, as no such comparison is shown in the algorithm explicitly.

The same literature lists its C++ code, for Insertion-sort here, which also not explicitly shows any such comparison.


Solution

  • Suppose the array is already sorted. Then there are 0 inversions, but you still have to perform n-1 comparisons.

    The variant of insertion sort you're looking at will perform one comparison for every swap it makes. But for each element, unless that element gets moved all the way over to the left end of the array, there will be one comparison that does not correspond to a swap. That's the comparison that tells the algorithm it's done with that element.