shellsort

How to choose the lengths of my sub sequences for a shell sort?


Let's assume we have a sequence a_i of length n and we want to sort it using shell sort. To do so, we would choose sub sequences of out a_i's of length k_i.

I'm now wondering how to choose those k_i's. You usually see that if n=16 we would choose k_1=8, k_2=4, k_3=2, k_4=1. So we would pair-wise compare the number's for each k_i and at the end use insertionSort to finish our sorting.

The idea of first sorting sub sequences of length k_i is to "pre-sort" the sequence for the insertionSort. Right?

Questions:

  1. Now, depending on how we choose our k_i, we get a better performance. Is there a rule I can use here to choose the k_i's?

  2. Could I also choose e.g. n=15, k_1=5, k_2=3, k_3=2?

  3. If we have n=10 and k_1=5, would we now go with {k_2=2, k_3=1} or {k_2=3, k_2=2, k_3=1} or {k_2=3, k_3=1}?


Solution

  • The fascinating thing about shellsort is that for a sequence of n (unique) entries a unique set of gaps will be required to sort it efficiently, essentially f(n) => {gap/gaps}

    For example, to most efficiently - on average - sort a sequence containing

    As you can see, 6-9 require 2 gaps, 10 and 11 three and 12 two. This is typical of shellsort's gaps: from one n to the next (i e n+1) you can be fairly sure that the number and makeup of gaps will differ.

    A nasty side-effect of shellsort is that when using a set of random combinations of n entries (to save processing/evaluation time) to test gaps you may end up with either the best gaps for n entries or the best gaps for your set of combinations - most likely the latter.

    I speculate that it is probably possible to create algorithms where you can plug in an arbitrary n and get the best gap sequence computed for you. Many high-profile computer scientists have explored the relationship between n and gaps without a lot to show for it. In the end they produce gaps (more or less by trial and error) that they claim perform better than those of others who have explored shellsort.

    Concerning your foreword given n=16 ...

    Then, to (try to) answer your questions ...

    Q1: a rule can probably be formulated

    Q2: you could ... but you should test it (for a given n there are n! possible sequences to test)

    Q3: you can compare it with the correct answer (above). Or you can test it against all 10! possible sequences when n=10 (comes out to 3628800 of them) - doable