c++algorithmstlselectionnth-element

How is nth_element Implemented?


There are a lot of claims on StackOverflow and elsewhere that nth_element is O(n) and that it is typically implemented with Introselect: http://en.cppreference.com/w/cpp/algorithm/nth_element

I want to know how this can be achieved. I looked at Wikipedia's explanation of Introselect and that just left me more confused. How can an algorithm switch between QSort and Median-of-Medians?

I found the Introsort paper here: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.14.5196&rep=rep1&type=pdf But that says:

In this paper, we concentrate on the sorting problem and return to the selection problem only briefly in a later section.

I've tried to read through the STL itself to understand how nth_element is implemented, but that gets hairy real fast.

Could someone show me pseudo-code for how Introselect is implemented? Or even better, actual C++ code other than the STL of course :)


Solution

  • You asked two questions, the titular one

    How is nth_element Implemented?

    Which you already answered:

    There are a lot of claims on StackOverflow and elsewhere that nth_element is O(n) and that it is typically implemented with Introselect.

    Which I also can confirm from looking at my stdlib implementation. (More on this later.)

    And the one where you don't understand the answer:

    How can an algorithm switch between QSort and Median-of-Medians?

    Lets have a look at pseudo code that I extracted from my stdlib:

    nth_element(first, nth, last)
    { 
      if (first == last || nth == last)
        return;
    
      introselect(first, nth, last, log2(last - first) * 2);
    }
    
    introselect(first, nth, last, depth_limit)
    {
      while (last - first > 3)
      {
          if (depth_limit == 0)
          {
              // [NOTE by editor] This should be median-of-medians instead.
              // [NOTE by editor] See Azmisov's comment below
              heap_select(first, nth + 1, last);
              // Place the nth largest element in its final position.
              iter_swap(first, nth);
              return;
          }
          --depth_limit;
          cut = unguarded_partition_pivot(first, last);
          if (cut <= nth)
            first = cut;
          else
            last = cut;
      }
      insertion_sort(first, last);
    }
    

    Without getting into details about the referenced functions heap_select and unguarded_partition_pivot we can clearly see, that nth_element gives introselect 2 * log2(size) subdivision steps (twice as much as needed by quickselect in the best case) until heap_select kicks in and solves the problem for good.