python floating-point precision

Precision of repr(f), str(f), print(f) when f is float


If I run:

>>> import math
>>> print(math.pi)
3.141592653589793

Then pi is printed with 16 digits.

However, according to:

>>> import sys
>>> sys.float_info.dig 
15

My precision is 15 digits.

So, should I rely on the last digit of that value (i.e. that the value of π is indeed 3.141592653589793nnnnnn)?


Solution

  • TL;DR

    The last digit of str(float) or repr(float) can be "wrong" in that it seems that the decimal representation is not correctly rounded.

    >>> 0.100000000000000040123456
    0.10000000000000003
    

    But that value is still closer to the original than 0.1000000000000000 (with 1 digit less) is.

    In the case of math.pi, the decimal expansion of pi is 3.141592653589793238463..., so in this case the last printed digit is right.

    sys.float_info.dig tells how many decimal digits are always guaranteed to survive a conversion to float and back.


    The default output of repr(float) in Python 2.7 and 3.1+, and of str(float) in Python 3.2+, is the shortest string that, when converted back to float, returns the original value; if several strings of that length qualify, the one closest to the stored value is chosen. A float provides ~15.9 decimal digits of precision, but up to 17 decimal digits are required to represent a 53-binary-digit floating point number unambiguously.
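The "shortest string that round-trips" rule can be imitated in pure Python by trying increasing precisions until the string converts back to the same float. This is only a sketch: shortest_roundtrip is a hypothetical helper, not what CPython does internally, and its output may differ from repr() in formatting (exponent style, trailing ".0") or in rare edge cases.

```python
import math


def shortest_roundtrip(x: float) -> str:
    """Return the shortest %g-style string that converts back to exactly x.

    Hypothetical helper for illustration only; CPython uses a dedicated
    C routine rather than this trial-and-error loop.
    """
    for digits in range(1, 18):  # 17 significant digits always suffice
        s = format(x, ".%dg" % digits)
        if float(s) == x:
            return s
    raise ValueError("no round-tripping string found (is x a NaN?)")


for value in (0.1, 0.1 + 0.2, math.pi):
    print(repr(value), shortest_roundtrip(value))
```

For these three values the helper and repr() should agree; math.pi needs 16 significant digits, while 0.1 + 0.2 needs all 17.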

    For example 0.10000000000000004 is between 0x1.999999999999dp-4 and 0x1.999999999999cp-4, but the latter is closer; these 2 have the decimal expansions

    0.10000000000000004718447854656915296800434589385986328125
    

    and

    0.100000000000000033306690738754696212708950042724609375
    

    respectively. Clearly the latter is closer, so that binary representation is chosen.

    Now when these are converted back to a string with str() or repr(), the shortest string that yields exactly the same value is chosen; for these 2 values they are 0.10000000000000005 and 0.10000000000000003 respectively.
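The neighbouring doubles and their exact expansions can be inspected with float.hex(), float.fromhex() and decimal.Decimal, which converts a float to its exact decimal value. A small check along those lines (the variable names are just for illustration):

```python
from decimal import Decimal

x = float("0.10000000000000004")
lower = float.fromhex("0x1.999999999999cp-4")
upper = float.fromhex("0x1.999999999999dp-4")

print(x.hex())         # which of the two neighbours the literal was rounded to
print(Decimal(lower))  # exact decimal expansion of the lower neighbour
print(Decimal(upper))  # exact decimal expansion of the upper neighbour
print(repr(lower), repr(upper))  # shortest round-tripping strings
```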


    The precision of a double in IEEE-754 is 53 binary digits; in decimal you can calculate the precision by taking the base-10 logarithm of 2^53,

    >>> math.log(2 ** 53, 10)
    15.954589770191001
    

    meaning almost 16 digits of precision. The float_info precision tells how many digits you can always expect to be representable, and this number is 15, because there are some numbers with 16 decimal digits that are indistinguishable as floats.
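To see the difference between 15 and 16 digits concretely, here is a small check. The two sample strings are my own picks: the first has 15 significant digits, the second is 2**53 + 1, a 16-digit integer that falls exactly between two representable doubles.

```python
import sys

fifteen = "0.123456789012345"  # 15 significant digits
sixteen = "9007199254740993"   # 16 significant digits, equals 2**53 + 1

print(sys.float_info.dig)
# A 15-digit decimal survives the trip string -> float -> string.
print(format(float(fifteen), ".15g") == fifteen)
# This 16-digit decimal does not: the nearest double is 2**53.
print(format(float(sixteen), ".16g"))
```

The last line prints 9007199254740992, so the final digit of the 16-digit input was lost.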


    However, this is not the whole story. Internally, what happens in Python 3.2+ is that float.__str__ and float.__repr__ end up calling the same C function, float_repr:

    static PyObject *
    float_repr(PyFloatObject *v)
    {
        PyObject *result;
        char *buf;
    
        buf = PyOS_double_to_string(PyFloat_AS_DOUBLE(v),
                                    'r', 0,
                                    Py_DTSF_ADD_DOT_0,
                                    NULL);
        if (!buf)
            return PyErr_NoMemory();
        result = _PyUnicode_FromASCII(buf, strlen(buf));
        PyMem_Free(buf);
        return result;
    }
    

    PyOS_double_to_string then, for the 'r' mode (standing for repr), calls either _Py_dg_dtoa with mode 0, which is an internal routine to convert the double to a string, or snprintf with %.17g on those platforms where _Py_dg_dtoa wouldn't work.

    The behaviour of snprintf is entirely platform dependent, but if _Py_dg_dtoa is used (as far as I understand, it should be used on most machines), the output should be predictable.
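To get a feel for what a 17-significant-digit fallback would print, Python's own %-formatting can serve as a stand-in for the C call (an assumption on my part: the platform's snprintf need not produce identical digits). Seventeen digits always round-trip, but the string is often longer than the shortest repr:

```python
import math

for value in (0.1, math.pi):
    seventeen = "%.17g" % value
    # Show the shortest repr, the 17-digit form, and whether it round-trips.
    print(repr(value), seventeen, float(seventeen) == value)
```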

    The _Py_dg_dtoa mode 0 is specified as follows:

    0 ==> shortest string that yields d when read in and rounded to nearest.

    So, that is what happens - the yielded string must exactly reproduce the double value when read in, and it must be the shortest representation possible, and among multiple decimal representations that would be read in, it would be the one that is closest to the binary value. Now, this might also mean that the last digit of decimal expansion does not match the original value rounded at that length, only that the decimal representation is as close to the original binary representation as possible. Thus YMMV.
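The mismatch described above can be made visible by rounding the original decimal literal to the same number of digits that repr() prints, using the decimal module. The literal is the one from the TL;DR; the variable names are only illustrative.

```python
from decimal import Decimal

literal = "0.100000000000000040123456"
shown = repr(float(literal))

# Number of digits after "0." in the repr output.
digits_after_point = len(shown) - 2

# Round the original decimal literal itself to that many places.
rounded_literal = Decimal(literal).quantize(Decimal("1e-%d" % digits_after_point))

print(shown)            # what Python prints for the stored double
print(rounded_literal)  # the literal correctly rounded at the same length
```

The two printed values differ in the last digit: repr() describes the stored double, not the decimal text that was typed in.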