There are a lot of claims on StackOverflow and elsewhere that nth_element
is O(n) and that it is typically implemented with Introselect: http://en.cppreference.com/w/cpp/algorithm/nth_element
I want to know how this can be achieved. I looked at Wikipedia's explanation of Introselect and that just left me more confused. How can an algorithm switch between QSort and Median-of-Medians?
I found the Introsort paper here: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.14.5196&rep=rep1&type=pdf But that says:
In this paper, we concentrate on the sorting problem and return to the selection problem only briefly in a later section.
I've tried to read through the STL itself to understand how nth_element
is implemented, but that gets hairy real fast.
Could someone show me pseudo-code for how Introselect is implemented? Or even better, actual C++ code other than the STL of course :)
You asked two questions, the titular one
How is nth_element Implemented?
Which you already answered:
There are a lot of claims on StackOverflow and elsewhere that nth_element is O(n) and that it is typically implemented with Introselect.
Which I also can confirm from looking at my stdlib implementation. (More on this later.)
And the one where you don't understand the answer:
How can an algorithm switch between QSort and Median-of-Medians?
Lets have a look at pseudo code that I extracted from my stdlib:
nth_element(first, nth, last)
{
if (first == last || nth == last)
return;
introselect(first, nth, last, log2(last - first) * 2);
}
introselect(first, nth, last, depth_limit)
{
while (last - first > 3)
{
if (depth_limit == 0)
{
// [NOTE by editor] This should be median-of-medians instead.
// [NOTE by editor] See Azmisov's comment below
heap_select(first, nth + 1, last);
// Place the nth largest element in its final position.
iter_swap(first, nth);
return;
}
--depth_limit;
cut = unguarded_partition_pivot(first, last);
if (cut <= nth)
first = cut;
else
last = cut;
}
insertion_sort(first, last);
}
Without getting into details about the referenced functions heap_select
and unguarded_partition_pivot
we can clearly see, that nth_element
gives introselect 2 * log2(size)
subdivision steps (twice as much as needed by quickselect in the best case) until heap_select
kicks in and solves the problem for good.