floating-point, binary, numbers, double

Double precision floating point: why is the maximum exponent 1023?


On a binary Intel processor, a double precision float is represented by 64 bits: 1 bit for the sign, 52 bits for the mantissa, and 11 bits for the exponent.

I do not understand why e_max = 2^10 - 1 = 1023. Shouldn't it be 2^11, since 11 bits are dedicated to the exponent?

How does it follow from this that the smallest representable float is on the order of 10^(-308) and the largest on the order of 10^(308)?

Thanks for any clarification or explanation!


Solution

  • ... why the e_max = 2^10 - 1 = 1023 shouldn't it be 2^11 since 11 bits are dedicated to it.

    No. Consider negative exponents.

    With binary64 encoding, the encoded number has a biased exponent, an 11-bit unsigned integer from 0 to 2047. (Biased exponents of 0 and 2047 have special meaning.)
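    As a sketch, the biased exponent field can be read directly out of a double's bit pattern. This small C program (an illustration, not part of the original answer; the helper name `biased_exponent` is mine) extracts bits 52..62 and shows that the stored field is the true exponent plus 1023:

    ```c
    #include <assert.h>
    #include <stdint.h>
    #include <string.h>

    /* Extract the 11-bit biased exponent field from a binary64 value. */
    static unsigned biased_exponent(double d) {
        uint64_t bits;
        memcpy(&bits, &d, sizeof bits);          /* reinterpret bytes safely */
        return (unsigned)((bits >> 52) & 0x7FF); /* bits 52..62 */
    }

    int main(void) {
        /* 1.0 = 1.0 * 2^0, so the stored field is 0 + 1023 = 1023. */
        assert(biased_exponent(1.0) == 1023);
        /* 2.0 = 1.0 * 2^1, so 1 + 1023 = 1024. */
        assert(biased_exponent(2.0) == 1024);
        /* 0.5 = 1.0 * 2^-1, so -1 + 1023 = 1022: negative exponents fit too. */
        assert(biased_exponent(0.5) == 1022);
        /* 0.0 uses the all-zeros biased exponent (one of the special values). */
        assert(biased_exponent(0.0) == 0);
        return 0;
    }
    ```

    Note how a negative true exponent (for 0.5) still encodes as a plain unsigned field; that is the point of the bias.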

    After applying an offset of -1023, the encoding has an (unbiased) exponent in the range [-1023 ... 1024]. (Again, the end-points have additional special meaning.) This encodes very large values like 10^308 and tiny ones like 10^-308.
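    The decimal magnitudes follow because 2^1024 ≈ 1.8 * 10^308 and the smallest normal value, 2^-1022, is ≈ 2.2 * 10^-308. A quick check against the C standard library's `<float.h>` constants (a sketch of the arithmetic, not part of the original answer):

    ```c
    #include <assert.h>
    #include <float.h>
    #include <math.h>
    #include <stdio.h>

    int main(void) {
        /* DBL_MAX is just under 2^1024, i.e. about 1.8e308. */
        printf("DBL_MAX = %g\n", DBL_MAX);
        /* DBL_MIN is the smallest *normal* double, 2^-1022, about 2.2e-308. */
        printf("DBL_MIN = %g\n", DBL_MIN);

        /* Taking log2 of the extremes recovers the exponent range above. */
        assert(log2(DBL_MAX) > 1023.0 && log2(DBL_MAX) < 1024.0);
        assert(fabs(log2(DBL_MIN) + 1022.0) < 1e-9);
        return 0;
    }
    ```

    (Subnormal doubles reach even smaller magnitudes, down to about 5 * 10^-324, by giving up precision; the ~10^-308 figure in the question is the smallest normal value.)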


    The original design of the binary64 encoding could have used an 11-bit signed integer for the exponent, in some signed-integer encoding, but that has disadvantages. A biased exponent offers advantages: