javascript, floating-point

How is MAX_SAFE_INTEGER 2 ** 53 - 1?


We know that JavaScript uses the standard IEEE 754 double-precision format, and that the mantissa section is 52 bits, plus an implicit 1 bit that is not stored.

But if that bit isn't stored, how is it actually represented? We can only store 52 bits, so what is that extra 1 bit used for?


Solution

  • It might help to work through an example showing every step.

    Let us construct an IEEE-754 double-precision representation of the number 21.3125.

    First, convert it to an ordinary binary fraction:

    10101.0101
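
    If you'd like to double-check that conversion from JavaScript itself, Number.prototype.toString accepts a radix and prints the fraction bits too:

        // Ask JavaScript for the binary expansion of 21.3125.
        console.log((21.3125).toString(2));   // "10101.0101"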
    

    Next convert it to binary exponential notation:

    10101.0101 × 2⁰
    

    Next adjust the significand and exponent so that the significand is normalized:

    1.01010101 × 2⁴
    

    (What does "normalized" mean? Simply that the part to the left of the decimal point is exactly 1, or stated another way, that the significand is in the range [1, 2).)
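
    You can see the normalization numerically, too: dividing by 2⁴ lands the significand squarely in [1, 2):

        // Dividing out 2 ** 4 yields the normalized significand.
        console.log(21.3125 / 2 ** 4);   // 1.33203125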

    Next, to make things clear later on, pad the significand out to 53 full bits of precision:

    1.0101010100000000000000000000000000000000000000000000 × 2⁴
    

    (This obviously doesn't change the value at all.)

    Before proceeding, there are two things we have to check: (1) Is the significand equal to 0 (this would only happen if our original number was 0.0), and (2) is the exponent outside of the range [-1022, 1023]? In our case neither condition holds true, so we may proceed.

    A floating-point number consists of three distinct parts: the sign, the significand, and the exponent. In our case the sign is 'positive', the significand is 1.0101010100000000000000000000000000000000000000000000, and the exponent is 4. Each of these parts needs to be encoded a little further before we are done:

    The sign is encoded as a single bit: 0 for positive, 1 for negative.

    The exponent is encoded by adding the bias of 1023: 4 + 1023 = 1027, which is 10000000011 in binary (11 bits).

    The significand is encoded by dropping the leading "1." (this is the implicit bit the question asks about) and keeping only the 52 fraction bits: 0101010100000000000000000000000000000000000000000000.
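
    As a quick check on the exponent encoding (the bias of 1023 is fixed by the double-precision format), you can compute the 11-bit field directly:

        // True exponent 4 plus the bias 1023 gives the stored exponent field.
        console.log((4 + 1023).toString(2).padStart(11, "0"));   // "10000000011"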

    Now assemble these three parts together, sign first, fraction bits last:

      0 10000000011 0101010100000000000000000000000000000000000000000000
    

    Finally jam all the bits together, and then, if you like, convert to hexadecimal to make it more compact:

      0 10000000011 0101010100000000000000000000000000000000000000000000
    = 0100000000110101010100000000000000000000000000000000000000000000
    = 0100 0000 0011 0101 0101 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
    =  4    0    3    5    5    0    0    0    0    0    0    0    0    0    0    0 
    = 0x4035500000000000
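
    You don't have to take that bit pattern on faith: JavaScript can reinterpret the bytes of a double directly. Here's a small sketch using a DataView:

        // Write the double into a buffer, then read the same 8 bytes back
        // as a 64-bit unsigned integer to expose the raw bit pattern.
        const buf = new ArrayBuffer(8);
        const view = new DataView(buf);
        view.setFloat64(0, 21.3125);
        console.log(view.getBigUint64(0).toString(16));   // "4035500000000000"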
    

    To confirm that this has all worked, we can reverse the procedure:

    1. Break the 64-bit value back up into the three parts 0, 10000000011, and 0101010100000000000000000000000000000000000000000000.
    2. The sign bit is 0, so our eventual number will be positive, or +1.
    3. The exponent field 10000000011 is 0x403, or 1027 in decimal. Make sure it is not at its minimum possible value (0), or its maximum possible value (2047 or 0x7ff). It is not, so proceed with steps 4–6.
    4. Subtract the bias 1023 from 1027, yielding a true exponent of 4.
    5. Prepend "1." to the fraction bits, giving 1.0101010100000000000000000000000000000000000000000000.
    6. Finally, put it all together:
        +1 × 0b1.0101010100000000000000000000000000000000000000000000 × 2⁴
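
    The same reconstruction can be written as a short JavaScript sketch, slicing the three fields out of the 64-bit pattern with BigInt shifts (the constant below is the pattern we assembled above):

        // Decode 0x4035500000000000 back into sign, exponent, and fraction.
        const bits = 0x4035500000000000n;
        const sign = bits >> 63n;                    // 0n, so the number is positive
        const expField = (bits >> 52n) & 0x7ffn;     // 1027n
        const fraction = bits & 0xfffffffffffffn;    // the 52 stored fraction bits
        // Prepend the implicit 1 and unbias the exponent.
        const significand = 1 + Number(fraction) / 2 ** 52;   // 1.33203125
        const value = (sign ? -1 : 1) * significand * 2 ** (Number(expField) - 1023);
        console.log(value);                          // 21.3125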
    

    Now, 0b1.0101010100000000000000000000000000000000000000000000 is 1.33203125, and 1.33203125 × 2⁴ = 1.33203125 × 16 = 21.3125, which is the number we started with. (Hooray!)

    And the bottom line is that although the double-precision format we've been working with does encode 53 significant bits of precision, we only "explicitly" stored the fraction bits 0101010100000000000000000000000000000000000000000000, not the leading 1. As Eric has pointed out, though, the fact that the leading bit was 1 is encoded in a different way: by the fact that the exponent field was in its normal range. If the exponent field had been at its minimum value, the leading bit would have been 0, not 1, and this is how it's possible to encode a value of 0.0. (It also lets you handle subnormal numbers, but that's a topic for another day.)

    Even when you work through it like this, I know, it still seems somehow magic that you can throw away the leading "1." and not store it. How can that work? The answer is simply that every single value in the normal range (there are precisely 9,214,364,837,600,034,816 of them) starts with "1.", so that leading 1 bit adds no information; it's redundant, and you don't need to store it. You can remember it was there, "add it back in" like we did in step 5 of the reconstruction, and you'll always get your original value back out.
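
    And this is exactly where the 53 in Number.MAX_SAFE_INTEGER comes from: 52 stored fraction bits plus the 1 implicit bit give 53 significant bits, so every integer up to 2 ** 53 is exactly representable, and 2 ** 53 - 1 is the largest integer n for which n and n + 1 are both representable and distinct:

        // 53 significant bits: integers are exact only up to 2 ** 53.
        console.log(Number.MAX_SAFE_INTEGER === 2 ** 53 - 1);   // true
        console.log(2 ** 53 === 2 ** 53 + 1);                   // true: 2 ** 53 + 1 rounds back down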

    There's some more discussion of this point at the related question Is it 52 or 53 bits of floating point precision?.