c++ floating-point max decimal numeric-limits

What Are the Maximum Number of Base-10 Digits in the Fractional Part of a Floating Point Number


If a floating point number could be output with no truncation of its value (say, with setprecision) and in fixed notation (say, with fixed), what buffer size would be required to guarantee that the entire fractional part of the floating point number could be stored in the buffer?

I'm hoping there is something in the standard, like a #define or something in numeric_limits, which would tell me the maximum number of base-10 decimal places in the fractional part of a floating point type.

I asked about the maximum number of base-10 digits in the integral part of a floating point type here: What Are the Maximum Number of Base-10 Digits in the Integral Part of a Floating Point Number

But I realize this may be more complex. For example, 1.0 / 3.0 is an infinitely repeating series of numbers. When I output that using fixed formatting I get this many places before repeating 0s:

0.333333333333333314829616256247390992939472198486328125

But I can't necessarily say that's the maximum precision, because I don't know how many of those trailing 0s were actually represented in the floating point's fraction, and the value hasn't been shifted down by a negative exponent.

I know we have min_exponent10. Is that what I should be looking at for this?
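For reference, here is a minimal sketch of how output like the above might be produced. `to_fixed` is a hypothetical helper name, not part of the standard library, and whether the digits past the 17th significant digit are exact depends on the standard library implementation.

```cpp
#include <cassert>
#include <iomanip>
#include <sstream>
#include <string>

// Formats `value` in fixed notation with `places` digits after the decimal point.
// (Hypothetical helper, for illustration only.)
std::string to_fixed(double value, int places) {
    assert(places >= 0);
    std::ostringstream out;
    out << std::fixed << std::setprecision(places) << value;
    return out.str();
}
```

Calling `to_fixed(1.0 / 3.0, 60)` returns "0." followed by 60 digits; any digits past the last one actually stored in the double show up as trailing 0s.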


Solution

  • If you consider the 32 and 64 bit IEEE 754 numbers, it can be calculated as described below.

    It is all about negative powers of 2. So let's see how each exponent contributes:

    2^-1 = 0.5         i.e. 1 digit
    2^-2 = 0.25        i.e. 2 digits
    2^-3 = 0.125       i.e. 3 digits
    2^-4 = 0.0625      i.e. 4 digits
    ....
    2^-N = 0.0000..    i.e. N digits
    

    Because each of these base-10 expansions ends in 5, halving it always produces one more digit (again ending in 5). So the number of base-10 digits increases by 1 each time the exponent decreases by 1, and 2^(-N) requires N digits.

    Also notice that when adding those contributions, the number of resulting digits is determined by the smallest contribution, i.e. the most negative exponent. So what you need to find out is the smallest exponent that can contribute.
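The pattern in the table can be checked with exact decimal arithmetic. The sketch below is only an illustration (`pow2_neg_digits` is a made-up helper name): it halves a decimal digit string repeatedly, so no floating point rounding is involved.

```cpp
#include <cassert>
#include <string>

// Returns the decimal digits after "0." in the exact expansion of 2^-n,
// computed by halving a decimal digit string n - 1 times, starting from 0.5.
// (Hypothetical helper, for illustration only.)
std::string pow2_neg_digits(int n) {
    assert(n >= 1);
    std::string digits = "5";  // 2^-1 = 0.5
    for (int i = 1; i < n; ++i) {
        std::string next;
        next.reserve(digits.size() + 1);
        int carry = 0;  // remainder carried into the next lower digit
        for (char c : digits) {
            const int cur = carry * 10 + (c - '0');
            next.push_back(static_cast<char>('0' + cur / 2));
            carry = cur % 2;
        }
        // The last digit is always 5, so a remainder of 1 is left over;
        // it becomes exactly one extra trailing digit, another 5.
        assert(carry == 1);
        next.push_back('5');
        digits = next;
    }
    return digits;
}
```

For example, `pow2_neg_digits(4)` returns "0625", and the length of the returned string equals the `n` that was passed in.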

    For 32 bit IEEE 754 you have:

    Smallest exponent -126

    Fraction bits 23

    So the lowest bit position is -126 + (-23) = -149, so the smallest contribution will come from 2^-149, i.e.

    For 32 bit IEEE 754 printed in base-10 there can be up to 149 fractional digits

    For 64 bit IEEE 754 you have:

    Smallest exponent -1022

    Fraction bits 52

    So the lowest bit position is -1022 + (-52) = -1074, so the smallest contribution will come from 2^-1074, i.e.

    For 64 bit IEEE 754 printed in base-10 there can be up to 1074 fractional digits
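
The same counts can be read off `std::numeric_limits` rather than hard-coded. This is a minimal sketch, assuming a radix-2 type with subnormal support (as in IEEE 754); `max_fractional_digits` and `agrees_with_denorm_min` are hypothetical helper names. Note that `min_exponent` is defined one higher than the IEEE exponent (-125 for float) and `digits` counts the implicit leading bit (24 for float), so the difference is again 149.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Maximum number of base-10 fractional digits needed to write any finite
// value of T exactly, assuming a binary format with subnormals.
// (Hypothetical helper, for illustration only.)
template <typename T>
constexpr int max_fractional_digits() {
    static_assert(std::numeric_limits<T>::radix == 2,
                  "derivation assumes a binary floating point format");
    // The lowest representable bit is 2^(min_exponent - digits).
    return std::numeric_limits<T>::digits - std::numeric_limits<T>::min_exponent;
}

// Cross-check: 2^-max_fractional_digits should equal the smallest positive
// subnormal value. Returns false if the type has no subnormals.
template <typename T>
bool agrees_with_denorm_min() {
    const T smallest = std::numeric_limits<T>::denorm_min();
    // denorm_min() falls back to min() when subnormals are unsupported.
    assert(smallest > T(0) && smallest <= std::numeric_limits<T>::min());
    return std::ldexp(T(1), -max_fractional_digits<T>()) == smallest;
}
```

On IEEE 754 platforms, `max_fractional_digits<float>()` evaluates to 149 and `max_fractional_digits<double>()` to 1074. A buffer for the fractional part would need that many characters, plus room for any terminator.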