performancealgorithmselectiontime-complexityspace-complexity# Floyd–Rivest vs. Introselect algorithm performance

Google couldn't help me so here it goes: which of the two selection algorithms, FloydRivest algorithm and Introselect, has better performance.

I'm assuming It's FloydRivest algorithm, but want to be 100% sure.

Also, if there exist even better algorithm for this purpose, I'd be glad to hear about them.

Solution

*TLDR; I think Floyd-Rivest is better*

I recently did some research on selection algorithms for a project I am working on. Here is a basic description of each algorithm:

**Introselect:**Performs a bipartitioning of the data, with a single pivot. Initially, a simple pivot (e.g. middle, median-of-3, etc) is chosen. The simple pivot is usually O(n^2) in the worst case, but O(n) on average. If the recursion level goes above a certain threshold, we fallback to a median-of-medians pivot. This is more costly to compute, but guarantees O(n) worst case.**Floyd-Rivest**: Performs a quintary partition of the data, with two pivots. (Technical note: The partitioning starts out as quintary, and finally resolves to three partitions once the algorithm completes). The two pivots are chosen so that the kth element lies between them, with high probability (this involves randomly sampling the data, and selecting two elements, through recursion, above and below what would be the nth-element). When the size of the partition becomes small enough, we select the pivots using a simpler method (e.g. median-of-3, etc)

As you can see, both are quite similar. Introselect starts with simple pivots, falling back to a complicated one; the Floyd-Rivest algorithm does just the opposite. The main difference is that introselect uses median-of-medians, whereas Floyd-Rivest uses a recursive sampling technique. So, I think a better comparison is median-of-medians vs Floyd-Rivest.

Which is better? From my research, it appears the hidden constants for Floyd-Rivest are smaller than median-of-medians. If I remember correctly, median-of-medians requires something like *5n* comparisons (worst case), whereas Floyd-Rivest only needs *3.5n*. Floyd-Rivest also uses a quintary scheme, which is better when the data can have lots of duplicates. Both introselect and Floyd-Rivest reduce to the same algorithm for small inputs, so you should get similar performance there (so long as you implement them the same). In my tests, Floyd-Rivest was 20%+ faster than all the other selection algorithms I tried. Though, I must admit, I did not test against a proper implementation of introselect that falls back to median-of-medians (I just tested the pseudo-introselect of libstdc++). In the original Floyd-Rivest paper, they themselves (who were co-authors of the median-of-medians approach) said median-of-medians *"is hardly practical"*, and that the Floyd-Rivest algorithm was *"probably the best practical choice"*.

So, it seems to me, Floyd-Rivest's pivoting technique is better than median-of-medians. You could implement introselect using Floyd-Rivest's pivoting, but then you might as well just do a pure Floyd-Rivest algorithm. I would recommend Floyd-Rivest as the best selection method.

**Be warned!** The original Floyd-Rivest paper gives an example implementation of their algorithm (this is the implementation listed on Wikipedia, at the time of writing this). However, this is a simplified version. From my tests, the simplified version is actually pretty slow! If you want a fast implementation, I think you'll need to implement the full algorithm. I recommend reading the paper "On Floyd and Rivest's SELECT algorithm" by Krzysztof C. Kiwiel. He gives a pretty good description of how to implement a fast Floyd-Rivest selection.

- Fastest and lightest way to get the current time in milliseconds with JS Date object
- Is there a faster way to copy bitmap data when creating a new image?
- Is there a faster way to find the first value that is not NA in a large vector using base R?
- VS Code is not responding (Slow Startup and Unresponsive) even with extensions disabled
- do java classes compiled with backward compatability use JavaVM optimalization of newer versions?
- disable internal scikit input validation checks
- How does React's Virtual DOM improve performance compared to direct manipulation of the actual DOM?
- How to show maps first and then download image markers?
- Why is it slower to compare a nullable value type to null on a generic method with no constraints?
- Fast file copy with progress
- C: What is the best and fastest way to concatenate strings
- Is there any side effects for Runtime.getRuntime().gc()
- Fastest way to escape characters in a string in C#
- JMeter: What is the purpose of tearDown Thread Group
- iOS5 & OpenGL ES 2.0 Best Compiler
- Replace outliers from big data
- Parallel bitonic sort implementation speedup
- Efficiently Calculating a Euclidean Distance Matrix Using Numpy
- Fuzzy comparison of strings in lists of huge length (taking into account performance)
- Prisoner's Dilemma Algorithm
- What extra optimisation does g++ do with -Ofast?
- How to efficiently count the number of keys/properties of an object in JavaScript
- How to use Three.js InstancedBufferGeometry & InstancedBufferAttribute?
- Escape HTML to PHP or Use Echo? Which is better?
- PHP - To echo or not to echo?
- Activity performance with conditional visibility gone/visible
- Access efficiency of C++ 2D array
- Why does the Bulma CSS framwork change the z-index on hover?
- Why is Pandas itertuples slower than iterrows on dataframes with many (>100) columns?
- what is the best way to in powershell for finding only folder in term of time performance