
IEEE 754: rationale for format: subnormal and normal numbers


Can someone please clarify:

  1. Why exactly is the format of subnormal numbers ±(0.F) × 2^-126 and not ±(1.F) × 2^-127?
  2. Why exactly is the format of normal numbers ±(1.F) × 2^exp and not, say, ±(11.F) × 2^exp or ±(10.F) × 2^exp?

Solution

  • I checked the properties of both formats using a simplified example. For the sake of simplicity I use the decimal formats 0.F × 10^-2 and 1.F × 10^-3, where F has 2 decimal digits, the leading digit of 1.F ranges over 1 to 9, and there is no ± sign.

    Min (non-zero) / max values:

    Format          Min value (non-zero)           Max value
    0.F × 10^-2     0.01 × 10^-2 = 0.0001          0.99 × 10^-2 = 0.0099
    1.F × 10^-3     1.00 × 10^-3 = 0.001           9.99 × 10^-3 = 0.00999
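The table entries can be reproduced by enumerating every non-zero value of each toy format. The following is only a minimal Python sketch: the helper name `toy_values`, its parameters, and the use of `decimal.Decimal` for exact decimal arithmetic are illustrative assumptions, not part of the original analysis.

```python
from decimal import Decimal

def toy_values(leading_digits, exponent):
    """Enumerate all non-zero values d.FF x 10^exponent of a toy decimal format.

    leading_digits (hypothetical parameter): digits allowed before the point,
    [0] for the 0.F format and range(1, 10) for the 1.F format.
    """
    scale = Decimal(10) ** exponent
    values = []
    for d in leading_digits:
        for f in range(100):  # F has two decimal digits: 00..99
            significand = Decimal(d) + Decimal(f) / 100
            if significand != 0:  # skip zero so that min() is the smallest non-zero value
                values.append(significand * scale)
    return values

subnormal_like = toy_values([0], -2)        # 0.F x 10^-2
normal_like = toy_values(range(1, 10), -3)  # 1.F x 10^-3

# Smallest non-zero and largest value of each format
print("0.F x 10^-2:", min(subnormal_like), max(subnormal_like))
print("1.F x 10^-3:", min(normal_like), max(normal_like))
```

Using `Decimal` rather than binary floats keeps values such as 0.0001 exact, so the comparison with the table is not blurred by rounding.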
    

    Here is the graphical representation:

    [Figure: graphical representation of the values representable by formats 0.F × 10^-2 and 1.F × 10^-3]

    Here we see that format 1.F × 10^-3 cannot represent any non-zero value below 0.001, whereas format 0.F × 10^-2 can represent smaller values, down to 0.0001. Here is the zoomed-in version:

    [Figure: zoomed-in view of the same graphical representation]

    Conclusion: from the graphical representation we see that, compared with format 1.F × 10^-3, format 0.F × 10^-2:

    1. gives more dynamic range, log10(max_real / min_real): log10(99) ≈ 1.996 vs log10(9.99) ≈ 0.9996
    2. gives less precision, since fewer values can be represented: 100 (including zero) vs 900
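The two figures in this conclusion can be checked numerically. The sketch below is an illustration under stated assumptions: values are stored as integers in units of 10^-5 (a choice made here to keep the arithmetic exact), and the names `fmt_0F`, `fmt_1F`, `dynamic_range`, and `spacing` are hypothetical helpers. Zero is left out of the 0.F list, which is why it holds 99 values rather than 100.

```python
import math

# Every value expressed as an integer multiple of 10^-5 (illustrative scaling).
# 0.F x 10^-2: 0.01..0.99 times 10^-2, i.e. 10, 20, ..., 990 units
fmt_0F = [f * 10 for f in range(1, 100)]
# 1.F x 10^-3: 1.00..9.99 times 10^-3, i.e. 100, 101, ..., 999 units
fmt_1F = list(range(100, 1000))

def dynamic_range(values):
    """log10 of the ratio between the largest and smallest non-zero value."""
    return math.log10(max(values) / min(values))

def spacing(values):
    """Gap between neighbouring values, in units of 10^-5 (uniform in both formats)."""
    return values[1] - values[0]

for name, values in (("0.F x 10^-2", fmt_0F), ("1.F x 10^-3", fmt_1F)):
    print(name,
          "count:", len(values),
          "dynamic range:", round(dynamic_range(values), 4),
          "spacing (x 10^-5):", spacing(values))
```

The spacing column is one way to read "less precision": in this toy setup, neighbouring 0.F values are ten times further apart than neighbouring 1.F values.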

    It seems that for subnormals IEEE 754 preferred more dynamic range at the cost of less precision. That would explain why the format of subnormal numbers is ±(0.F) × 2^-126 and not ±(1.F) × 2^-127.
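The same comparison can be tried on real single-precision bit patterns. The sketch below assumes the binary32 layout (1 sign bit, 8 exponent bits, 23 fraction bits) and uses Python's standard `struct` module to reinterpret bits. `decode_subnormal` evaluates ±(0.F) × 2^-126 by hand, while `decode_alternative` is a purely hypothetical ±(1.F) × 2^-127 reading of the same bits, included only for contrast. All function names are illustrative.

```python
import struct

FRACTION_MASK = 0x7FFFFF  # low 23 bits: the F field of binary32

def bits_to_float32(bits):
    """Reinterpret a 32-bit pattern as an IEEE 754 binary32 value."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]

def decode_subnormal(bits):
    """Evaluate +/-(0.F) x 2^-126 by hand; valid only when the exponent field is 0."""
    sign = -1.0 if bits >> 31 else 1.0
    fraction = bits & FRACTION_MASK
    return sign * (fraction / 2**23) * 2.0**-126

def decode_alternative(bits):
    """Hypothetical +/-(1.F) x 2^-127 reading of the same bits (NOT IEEE 754)."""
    sign = -1.0 if bits >> 31 else 1.0
    fraction = bits & FRACTION_MASK
    return sign * (1 + fraction / 2**23) * 2.0**-127

smallest_subnormal = bits_to_float32(0x00000001)
largest_subnormal = bits_to_float32(0x007FFFFF)
smallest_normal = bits_to_float32(0x00800000)

# The hand-written 0.F formula agrees with the hardware interpretation.
print(decode_subnormal(0x00000001) == smallest_subnormal)
print(decode_subnormal(0x007FFFFF) == largest_subnormal)

# Smallest non-zero magnitude under each reading of the bits.
print("0.F x 2^-126:", decode_subnormal(0x00000001))
print("1.F x 2^-127:", decode_alternative(0x00000000))
```

Under the 0.F reading the smallest non-zero magnitude is 2^-149 and F = 0 gives exactly zero. Under the hypothetical 1.F reading nothing below 2^-127 is reachable and F = 0 no longer encodes zero, which is consistent with the dynamic-range argument above.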