performancealgorithmselectiontime-complexityspace-complexity

Floyd–Rivest vs. Introselect algorithm performance


Google couldn't help me so here it goes: which of the two selection algorithms, FloydRivest algorithm and Introselect, has better performance.

I'm assuming It's FloydRivest algorithm, but want to be 100% sure.

Also, if there exist even better algorithm for this purpose, I'd be glad to hear about them.


Solution

  • TLDR; I think Floyd-Rivest is better

    I recently did some research on selection algorithms for a project I am working on. Here is a basic description of each algorithm:

    As you can see, both are quite similar. Introselect starts with simple pivots, falling back to a complicated one; the Floyd-Rivest algorithm does just the opposite. The main difference is that introselect uses median-of-medians, whereas Floyd-Rivest uses a recursive sampling technique. So, I think a better comparison is median-of-medians vs Floyd-Rivest.

    Which is better? From my research, it appears the hidden constants for Floyd-Rivest are smaller than median-of-medians. If I remember correctly, median-of-medians requires something like 5n comparisons (worst case), whereas Floyd-Rivest only needs 3.5n. Floyd-Rivest also uses a quintary scheme, which is better when the data can have lots of duplicates. Both introselect and Floyd-Rivest reduce to the same algorithm for small inputs, so you should get similar performance there (so long as you implement them the same). In my tests, Floyd-Rivest was 20%+ faster than all the other selection algorithms I tried. Though, I must admit, I did not test against a proper implementation of introselect that falls back to median-of-medians (I just tested the pseudo-introselect of libstdc++). In the original Floyd-Rivest paper, they themselves (who were co-authors of the median-of-medians approach) said median-of-medians "is hardly practical", and that the Floyd-Rivest algorithm was "probably the best practical choice".

    So, it seems to me, Floyd-Rivest's pivoting technique is better than median-of-medians. You could implement introselect using Floyd-Rivest's pivoting, but then you might as well just do a pure Floyd-Rivest algorithm. I would recommend Floyd-Rivest as the best selection method.

    Be warned! The original Floyd-Rivest paper gives an example implementation of their algorithm (this is the implementation listed on Wikipedia, at the time of writing this). However, this is a simplified version. From my tests, the simplified version is actually pretty slow! If you want a fast implementation, I think you'll need to implement the full algorithm. I recommend reading the paper "On Floyd and Rivest's SELECT algorithm" by Krzysztof C. Kiwiel. He gives a pretty good description of how to implement a fast Floyd-Rivest selection.