algorithmsortingtimsort

TimSort : Optimal Algorithm for identifying natural runs within a given single block


Question : How does timsort identify during chunking or generating runs of block size [32-64 items] that the most optimal approach is to search for ascending or descending run ?

Known Fact : A run in timsort is a natural occurring sorted order of data items [ either ascending or descending ].

Example : Assume an array which has block size of 6 for sorting. A block of 6 encountered in the array of data provided is as follows. Further assume that the array is being sorted in ascending order.

arr = {..............7, 11, 9, 6, 5, 2, ..................}. 

If we see the hint provided by first two items 7, 11 this is increasing subportion of the run. At this point considering next 4 items to ensure that we maintain or look for a increasing run would be inefficient since its actually a decreasing run [ 9, 6, 5, 2 ]. A binaryInsertionSort algorithm would sort this in increasing order like below.

7

7, 11

7, 9, 11

6, 7, 9, 11

5, 6, 7, 9, 11

Result=2, 5, 6, 7, 9, 11---------(A)

In fact the most efficient method would have been.

7, 11 -> A complete ascending run.

9, 6, 5, 2 -> A complete descending run.

merge ([ 7, 11 ], [9, 6, 5, 2])

merge([7, 11], [2, 5, 6, 9]) //in place reversing of second array.

Result=2, 5, 6, 7, 9, 11 -------(B)

Fact: (B) is more efficient then (A) from computing perspective.

Question:

Does Tim sort actually do (A) or (B) for a given block of size [ Ex:- 32 to 64 items ]. i.e until the block size is met does it identify progressively looking at each item and performing binaryInsertionSort of each item to build the sorted Run ?

or

does it actually scan the whole block first to identify already occurring natural sorted sub sequences to merge them efficiently ?


Solution

  • It does (A). Identify the next run and then if short, extend it with binary insertion sort