c++, performance, sorting, complexity-theory, introsort

Introsort (quicksort + heapsort) implementation and complexity


I've read that C++ uses introsort (introspective sort) for its built-in std::sort, where it starts off with quicksort and switches to heapsort once the recursion depth hits a limit.

I've also read that the depth limit is supposed to be 2*log(2,N).

Is this value purely experimental? Or is there some mathematical theory behind it?


Solution

  • If you have an interval (a range or array), the number of times you have to split it in half before you end up with an empty (or one-element) interval is log(2,N); that's a simple mathematical fact, and you can work it out easily if you want. If all goes perfectly well with quicksort, it should recurse log(2,N) times, for the same reason (and at each recursion level it has to process all values of the interval, which gives the overall algorithm its O(N*log(2,N)) complexity). The problem is that quicksort can require many more recursion levels if it keeps getting "unlucky" with its pivot choices, meaning the pivot doesn't split the interval in half but in a badly imbalanced way instead. At worst, quicksort can end up recursing N times, which degrades to O(N^2) behavior and is definitely not acceptable for a production-quality implementation.

    Switching to heapsort at a depth of 2*log(2,N) is just a good general heuristic for detecting an excessively deep recursion.

    Technically, you could base this limit on the empirical performance of heapsort versus quicksort, to figure out which value works best. But such tests are highly dependent on the application (What are you sorting? How are you comparing elements? How cheap are element swaps? etc.). So most one-size-fits-all implementations, like std::sort, just pick a reasonable limit such as 2*log(2,N). A minimal sketch of how this fits together is shown below.
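
    For illustration, here is a minimal C++ sketch of the idea, assuming a simple median-of-three pivot and a depth limit of 2*log2(N). The names introsort and introsort_impl are hypothetical; real library implementations (e.g. libstdc++'s std::sort) also switch to insertion sort for small sub-ranges, which is omitted here for brevity.

    ```cpp
    #include <algorithm>
    #include <cmath>
    #include <iostream>
    #include <vector>

    template <typename It>
    void introsort_impl(It first, It last, int depth_limit) {
        if (last - first <= 1)
            return;

        // Depth budget exhausted: quicksort is degenerating, so finish
        // this sub-range with heapsort (guaranteed O(N*log N)).
        if (depth_limit == 0) {
            std::make_heap(first, last);
            std::sort_heap(first, last);
            return;
        }

        // Median-of-three pivot selection reduces the chance of bad splits.
        It mid = first + (last - first) / 2;
        auto pivot = std::max(std::min(*first, *mid),
                              std::min(std::max(*first, *mid), *(last - 1)));

        // Three-way partition: [first, left) < pivot, [left, right) == pivot,
        // [right, last) > pivot. The middle part is non-empty, so we always
        // make progress.
        It left  = std::partition(first, last,
                                  [&](const auto& v) { return v < pivot; });
        It right = std::partition(left, last,
                                  [&](const auto& v) { return !(pivot < v); });

        introsort_impl(first, left, depth_limit - 1);
        introsort_impl(right, last, depth_limit - 1);
    }

    template <typename It>
    void introsort(It first, It last) {
        auto n = last - first;
        // The depth limit from the question: 2*log2(N).
        int depth_limit = (n > 1) ? 2 * static_cast<int>(std::log2(n)) : 0;
        introsort_impl(first, last, depth_limit);
    }

    int main() {
        std::vector<int> v = {5, 2, 9, 1, 7, 3, 8, 6, 4};
        introsort(v.begin(), v.end());
        for (int x : v) std::cout << x << ' ';
        std::cout << '\n';
    }
    ```

    The exact cutover point matters less than the guarantee it buys: even if every pivot choice is bad, the heapsort fallback caps the total work at O(N*log(2,N)).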