algorithmsortingfloating-pointbit-manipulationradix-sort

Viable to sort IEEE754 floats by MSB?


I've been doing some low-level bit manipulation, and ended up creating an algorithm that sorts 64-bit floats by octets of descending significance (LE = 7 -> 0; BE = 0 -> 7) as a byproduct. When logging the output, I expected an arbitrary order. However, somewhat to my surprise, they were sorted (with the caveat that negative numbers are placed after positive)! Here's an example of random floats (-50, 50) being implicitly sorted:

[
  4.2217695176666155,
  4.6478025698022165,
  25.500103628472857,
  26.819343028582594,
  43.402122471148246,
  44.150586938484324,
  -12.313937254813531,
  -21.429343402208257,
  -27.2049115791126,
  -38.48913575841195
]

I was wondering to what extent this output is a coincidence or computational truth, because it would be convenient to rely on this data being sorted. I also recognize that this sounds very similar to radix sort, but as mentioned before I understand the mantissa to be arbitrary in relation to numeric order.

Here is the TS class that exhibits this behavior and below is how I am testing it.

const ns = new NumberSet();

for (let i=0; i < 10; i++) {
    ns.add(100 * Math.random() - 50);
}

for (let n of ns) console.log(n);

Solution

  • You can "compare" floats using lexicographic ordering, i.e., treat the stored bits as a string or an unsigned integer, if you do the following:

    1. Check for NaN. If so, the procedure doesn't apply. Abort. (NaN's don't compare to others--including themselves--the comparison result is always false whether it's equality or greater/less than etc.)
    2. If negative-zero, then treat it as positive 0 (i.e., flip the sign bit)
    3. If negative, flip all the bits (including the sign bit)
    4. Otherwise, flip the sign bit only (leave other bits the same)

    The above assumes you don't have any NaN values, hence step-1 above. Provided that is the case, and you apply the above transformation, then you can compare two floats as unsigned integer values (or lexicographically if you like), and the result will be the correct ordering of the original floats. This is true even in the presence of infinities, positive or negative, leading to correct IEEE754 compliant comparison semantics.

    In particular, the reason float exponents are stored biased is precisely because you can do this trick, i.e., comparisons become easier. What you are doing essentially exploits this structure, and thus the output coming out as it does in your case isn't surprising.