Question : How does timsort identify during chunking or generating runs of block size [32-64 items] that the most optimal approach is to search for ascending or descending run ?
Known Fact : A run in timsort is a natural occurring sorted order of data items [ either ascending or descending ].
Example : Assume an array which has block size of 6 for sorting. A block of 6 encountered in the array of data provided is as follows. Further assume that the array is being sorted in ascending order.
arr = {..............7, 11, 9, 6, 5, 2, ..................}.
If we see the hint provided by first two items 7, 11 this is increasing subportion of the run. At this point considering next 4 items to ensure that we maintain or look for a increasing run would be inefficient since its actually a decreasing run [ 9, 6, 5, 2 ]. A binaryInsertionSort algorithm would sort this in increasing order like below.
7
7, 11
7, 9, 11
6, 7, 9, 11
5, 6, 7, 9, 11
Result=2, 5, 6, 7, 9, 11---------(A)
In fact the most efficient method would have been.
7, 11 -> A complete ascending run.
9, 6, 5, 2 -> A complete descending run.
merge ([ 7, 11 ], [9, 6, 5, 2])
merge([7, 11], [2, 5, 6, 9]) //in place reversing of second array.
Result=2, 5, 6, 7, 9, 11 -------(B)
Fact: (B) is more efficient then (A) from computing perspective.
Question:
Does Tim sort actually do (A) or (B) for a given block of size [ Ex:- 32 to 64 items ]. i.e until the block size is met does it identify progressively looking at each item and performing binaryInsertionSort of each item to build the sorted Run ?
or
does it actually scan the whole block first to identify already occurring natural sorted sub sequences to merge them efficiently ?
It does (A). Identify the next run and then if short, extend it with binary insertion sort