algorithmdata-structuresbig-opriority-queueinsertion-sort

Priority Queue using Insertion Sort - Best case scenario O(n)?


I am studying Priority Queues using Data Structures and Algorithms in C++, 2nd Ed. (Goodrich, Tamassia, Mount) and I'm reading the page on insertion sort used for Priority Queues:

enter image description here

Briefly, it talks about O(š‘›Ā²) in the worst case scenario, which I understand. However the last paragraph has this:

Ā  Ā  Ā  Alternately, we could change our definition of insertion-sort so that we insert elements starting from the end of the priority-queue sequence in the first phase, in which case performing insertion-sort on a list that is already sorted would run in O(š‘›) time. Indeed, the running time of insertion-sort is O(š‘› + š¼) in this case, where š¼ is the number of inversions in the input list, that is, the number of pairs of elements that start out in the input list in the wrong relative order.

I just couldn't understand what the author is trying to say. It seems it has to do with best case scenario which has O(š‘›).

How can I picture what the author is trying to say?


Solution

  • Before getting into the meaning of that paragraph, note that if the original input is already sorted, the described insertion sort algorithm will hit its worst case: the first phase -- which the author already identified as the bottleneck -- will have to traverse the linked list to the end at each insertion. That's the worst case. Let's picture this particular scenario with the same diagram that the author used before:

    List šæ Priority Queue š‘ƒ
    Input (2,3,4,5,7,8,9) ()
    Phase 1 (a) (3,4,5,7,8,9) (2)
    (b) (4,5,7,8,9) (2,3)
    (c) (5,7,8,9) (2,3,4)
    (d) (7,8,9) (2,3,4,5)
    (e) (8,9) (2,3,4,5,7)
    (f) (9) (2,3,4,5,7,8)
    (g) () (2,3,4,5,7,8,9)
    Phase 2 (a) (2) (3,4,5,7,8,9)
    (b) (2,3) (4,5,7,8,9)
    ... ... ...
    (g) (2,3,4,5,7,8,9) ()

    See how in the first phase the value that is taken from list šæ has to be compared with every value in the (growing) priority queue š‘ƒ, only to find that the insertion should happen at the end of the priority queue (linked list): this is a worst case scenario.

    Now it might be undesirable that an input that is already sorted would give the worst running time. That is counter-intuitive. You'd hope or even expect that a sorting algorithm would need less time to output a sorted list when that happend to already be sorted. And that is what the author wants to give a solution for.

    The suggested alternative will in the first phase consume values from the right side of the input list šæ, instead of its left side. The above input scenario will then lead to a best case performance:

    List šæ Priority Queue š‘ƒ
    Input (2,3,4,5,7,8,9) ()
    Phase 1 (a) (2,3,4,5,7,8) (9)
    (b) (2,3,4,5,7) (8,9)
    (c) (2,3,4,5) (7,8,9)
    (d) (2,3,4) (5,7,8,9)
    (e) (2,3) (4,5,7,8,9)
    (f) (2) (3,4,5,7,8,9)
    (g) () (2,3,4,5,7,8,9)
    Phase 2 (a) (2) (3,4,5,7,8,9)
    (b) (2,3) (4,5,7,8,9)
    ... ... ...
    (g) (2,3,4,5,7,8,9) ()

    Because the values are extracted from the end of the (shrinking) list šæ, and we are dealing with a case where list šæ is sorted, these extracted values are never greater than the first value that is already in the priority queue š‘ƒ. So only one comparison is needed to find the position of the value to be inserted: that position is always the first position. That means every step of the first phase takes constant time, and the first phase in total has a time complexity of O(š‘›). This is a best case.

    As a side note: this doesn't make it a better variant of the algorithm. Both variants will have the same average time complexity. Note how the original insertion algorithm will have its best case when the input list šæ is sorted in descending order, while that will represent a worst case for the alternative algorithm.