c++floating-pointprecisionlogarithm

log of unsigned long long int argument


I need to do the following calculation in a c++ code:

(((n*log(n)) / log(4)) + 1)

Where n is of type 'unsigned long long int' (and is a power of 2, so result should be integer).

For very large numbers i get some errors e.g for n = 9007199254740992 result should be 238690780250636289, but when i run the code i get 238690780250636288.

Could this be the result of 'log' function not having an implementation with 'unsigned long long int' argument? If so is there a way to circumvent it this without implementing a new log function?

unsigned long long int upToBit(unsigned long long int n) {
    unsigned long long int check = (((n*log(n)) / log(4)) + 1);
    return check;
}

Solution

  • Could this be the result of 'log' function not having an implementation with 'unsigned long long int' argument?

    Yes and no.

    You use std::log which returns double. double cannot represent 238690780250636289 because of the extended range. If you simply convert that number to long long, you'll get exactly the same error:

    int main()
    {
        volatile double dd = 238690780250636289.0;
        printf("%lld\n", (long long)dd);
    }
    

    Output:

    238690780250636288
    

    To understand why that happens, there is a good paper about floating point numbers. You may have luck with long double version of log if sizeof(long double) is > 8 on your compiler. You may also test "correctness" of your computation:

    bool resultOk = upToBit(9007199254740992) == 238690780250636289.0;
    

    In general, double has 52-bit mantissa and because of extra hidden bit maximum reliable integer that double can represent is 2 power 53 or 9007199254740992. If your resulting double has higher integer value then simple integer math sometimes "stops working":

    #include <stdio.h>
    
    int main()
    {
        long long l = 9007199254740992;
        double d = (double)l;
        d += 1;
        printf("9007199254740992 + 1 = %lld\n", (long long)d);
    }
    

    Output:

    9007199254740992 + 1 = 9007199254740992
    

    To get better precision you can use some of the multiple precision arithmetic libraries that are designed to do that. GCC for example uses GMP / MPFR for its internal calculations.