In the past I had an interview question that went something like this:
[0, 9, 3, 7, 15] -- sort with the fewest swaps.
I am realizing that both selection sort AND insertion sort only require 2 swaps here. I'm thinking for future reference that insertion sort is still preferred because it needs fewer comparisons and the array is almost sorted. BUT the typical insertion sort implementation looks like this:
const sortArr = (arr) => {
  for (let i = 1; i < arr.length; i++) {
    let curr = arr[i]
    let j
    // shift every larger element one slot to the right
    for (j = i - 1; j > -1 && arr[j] > curr; j--) {
      arr[j+1] = arr[j]
    }
    // drop the saved element into the gap
    arr[j+1] = curr
  }
  return arr
}
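(For reference, this is the selection sort I was comparing against, with a counter bolted on for actual two-element swaps; the counter is just my own instrumentation. On [0, 9, 3, 7, 15] it reports 2.)

const selectionSortCountingSwaps = (arr) => {
  let swaps = 0
  for (let i = 0; i < arr.length - 1; i++) {
    // find the index of the smallest remaining element
    let min = i
    for (let j = i + 1; j < arr.length; j++) {
      if (arr[j] < arr[min]) min = j
    }
    // only swap when the minimum isn't already in place
    if (min !== i) {
      const tmp = arr[i]
      arr[i] = arr[min]
      arr[min] = tmp
      swaps++
    }
  }
  return swaps
}

console.log(selectionSortCountingSwaps([0, 9, 3, 7, 15])) // 2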
I have a few reservations about insertion sort. First, why doesn't everyone wrap the arr[j+1] = curr portion in something like this so it doesn't trigger when the location is already correct?
if (j !== i - 1) {
  arr[j+1] = curr
}
Or does the arr[j+1] = curr line do nothing if the indexes are the same, i.e. arr[0] = arr[0]?
Also, why would people say this takes 2 swaps when it's really more like 4 (in this case), since arr[j+1] = curr has to run every time the inner loop runs at least once and then finishes?
Aren't both arr[j+1] = arr[j] and arr[j+1] = curr considered a "swap"? Or do we just consider them two sides of the same coin or something?
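(To show where my counting comes from, here is the same insertion sort with write counters bolted on; the counters and the guard are my own additions, purely for illustration. On [0, 9, 3, 7, 15] it reports 2 shift writes plus 2 final placements, which is the 4 I'm referring to, versus the 2 classic swaps selection sort makes.)

const sortArrCounting = (arr) => {
  let shiftWrites = 0      // counts arr[j+1] = arr[j]
  let placementWrites = 0  // counts arr[j+1] = curr when curr actually moved
  for (let i = 1; i < arr.length; i++) {
    let curr = arr[i]
    let j
    for (j = i - 1; j > -1 && arr[j] > curr; j--) {
      arr[j+1] = arr[j]
      shiftWrites++
    }
    if (j !== i - 1) {
      arr[j+1] = curr
      placementWrites++
    }
  }
  console.log(shiftWrites, placementWrites) // 2 2 for [0, 9, 3, 7, 15]
  return arr
}

sortArrCounting([0, 9, 3, 7, 15])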
First, you're challenging opinions that "everybody has". But in fact not everybody has those opinions.
Second, both insertion sort and selection sort are quadratic, so there is no need to worry about their efficiency on large lists; for those we have far better algorithms.
BUT in fact insertion sort is heavily used for a single reason. That reason is that the pattern of accesses it makes, and the predictability of its conditions, is really good for micro-optimization within CPUs. This makes it the fastest way to sort a small list, particularly a small list of easy-to-compare things like integers.
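For a concrete picture of that use, hybrid sorts commonly recurse with quicksort (or similar) on large ranges and switch to insertion sort once a range gets small. This is only a sketch of the pattern; the cutoff of 16 is an arbitrary placeholder, and real libraries tune it by benchmarking.

const insertionSortRange = (arr, lo, hi) => {
  // ordinary insertion sort, restricted to arr[lo..hi]
  for (let i = lo + 1; i <= hi; i++) {
    const curr = arr[i]
    let j
    for (j = i - 1; j >= lo && arr[j] > curr; j--) {
      arr[j+1] = arr[j]
    }
    arr[j+1] = curr
  }
}

const partition = (arr, lo, hi) => {
  // simple Lomuto partition using the last element as the pivot
  const pivot = arr[hi]
  let i = lo
  for (let j = lo; j < hi; j++) {
    if (arr[j] < pivot) {
      const tmp = arr[i]; arr[i] = arr[j]; arr[j] = tmp
      i++
    }
  }
  const tmp = arr[i]; arr[i] = arr[hi]; arr[hi] = tmp
  return i
}

const CUTOFF = 16  // placeholder; real implementations pick this empirically

const hybridSort = (arr, lo = 0, hi = arr.length - 1) => {
  if (hi <= lo) return arr                // 0 or 1 elements: nothing to do
  if (hi - lo + 1 <= CUTOFF) {
    insertionSortRange(arr, lo, hi)       // small range: insertion sort wins
    return arr
  }
  const p = partition(arr, lo, hi)        // large range: quicksort step
  hybridSort(arr, lo, p - 1)
  hybridSort(arr, p + 1, hi)
  return arr
}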
The two CPU characteristics that it uses are as follows:
CPUs need to know when to prefetch data into caches. The more predictable your pattern of accesses, the easier it is for the CPU to guess that right. Insertion sort is very predictable.
CPUs engage in speculative execution. CPUs arrange upcoming instructions in a pipeline. If one of the instructions is an if condition, then the CPU will load the if, guess which branch will be taken, and then load the instructions after it. If it guesses wrong, it has to throw away the partially completed work, back up, load the RIGHT instructions, and continue. This is called a pipeline stall. Insertion sort has a lot of comparisons, but they are all really predictable, and therefore it has very few pipeline stalls. (Compare to quicksort, where the CPU guesses wrong on about half the comparisons.)
Therefore IF the code is really written the way you've described, THEN the most likely reasoning is that an occasionally unnecessary write is faster than a pipeline stall from taking the wrong branch of an if condition. And therefore it is faster not to do the if.
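To make the trade-off concrete, here are the two variants side by side (the function names are mine, purely for illustration). The guarded version skips one redundant store on passes where nothing moved; the unguarded version avoids the extra conditional branch that, as argued above, is the riskier cost on small inputs.

const insertionSortUnguarded = (arr) => {
  for (let i = 1; i < arr.length; i++) {
    const curr = arr[i]
    let j
    for (j = i - 1; j > -1 && arr[j] > curr; j--) {
      arr[j+1] = arr[j]
    }
    arr[j+1] = curr            // may rewrite the same value, but no branch
  }
  return arr
}

const insertionSortGuarded = (arr) => {
  for (let i = 1; i < arr.length; i++) {
    const curr = arr[i]
    let j
    for (j = i - 1; j > -1 && arr[j] > curr; j--) {
      arr[j+1] = arr[j]
    }
    if (j !== i - 1) {         // extra branch on every outer iteration
      arr[j+1] = curr
    }
  }
  return arr
}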