c++algorithmstlnth-element

nth_element implementations complexities


Does anyone know both the expected running times and worst case running times for different implementations of std::nth_element? I use this algorithm nearly every day.

I'm specifically interested in the STL versions shipping with the recent Microsoft Compilers, but any information on this topic is helpful.

Please note that this is not a duplicate of this question. I understand which algorithms exist, but I'm interested in which implementations use which algorithms.

For background, there are well-known algorithms to do this. One is O(n) average case and O(n log n) worst case, one is O(n) worst case but slow in practice (median of medians). Also note that there is talk of interesting implementation strategies to get worst-case O(n) running times that are fast in practice. The standard says that this must be at worse O(n) average time.


Solution

  • The expected running time is O(N) The worst case running time for most implemententations is O(N * N) because most implementations use QuickSelect and it could be that QuickSelect runs into bad partitions. That is true for Microsoft VS2008, VS2010 & VS2012.

    Now with the new ISO C++ 2011 standard, the complexity for std::sort has been tightened up - it is guaranteed to be O(N * log N) and has no worse case as David Musser's IntroSort is used: - use QuickSort and if parts of the array experience bad partitioning, swap to heapsort.

    Ideally exactly the same should apply std::nth_element but the ISO C++ 2011 standard has not tightened up the complexity requirements. So std::nth_element could be O(N * N) in the worst case. This could be because in David Musser's original paper (see here) he did not mention what algorithm should be swapped to if QuickSelect goes bad.

    In the worst case, the median-of-medians using groups of 5 could be used (I have seen a paper the recommended groups of 7 but cannot find it). So a quality implementation of std::nth_element could use QuickSelect and swap to median-of-medians if partitioning goes bad. This would guarantee O(N) behaviour. QuickSelect can be improved by using sampling making the worst case unlikely but not impossible.