c++ floating-point double uint64

Why can't uint64_t show pow(2, 64) - 1 properly?


I'm trying to understand why the uint64_t type cannot show pow(2, 64) - 1 properly. My compiler reports __cplusplus as 199711L (C++98).

I checked the pow() overloads in the C++98 standard, which are:

double pow(double base, double exponent);
float pow(float base, float exponent);
long double pow(long double base, long double exponent);
double pow(double base, int exponent);
long double pow(long double base, int exponent);

So I wrote the following snippet:

double max1 = (pow(2, 64) - 1);
cout << max1 << endl;

uint64_t max2 = (pow(2, 64) - 1);
cout << max2 << endl;

uint64_t max3 = -1;
cout << max3 << endl;

The outputs are:

max1: 1.84467e+019
max2: 9223372036854775808
max3: 18446744073709551615

Solution

  • Floating point numbers have finite precision.

    On your system (and typically, assuming the IEEE-754 binary64 format), 18446744073709551615 has no representation in the double format. A double carries a 53-bit significand, so just below 2^64 adjacent representable values are 2048 apart. The closest number that does have a representation is 18446744073709551616.

    Adding or subtracting two floating point numbers of wildly different magnitudes usually produces a rounding error, and this error can be significant relative to the smaller operand. In the case of 18446744073709551616. - 1. -> 18446744073709551616., the error of the subtraction is 1, which is exactly the value of the smaller operand.

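    When a floating point value is converted to an integer type and the truncated value cannot fit in the integer type, the behaviour of the program is undefined, even when the integer type is unsigned. Here pow(2, 64) - 1 evaluates to 18446744073709551616, which is one more than the largest uint64_t value (18446744073709551615), so the initialisation of max2 has undefined behaviour and the printed 9223372036854775808 is only one possible outcome.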