floating-point, precision, numerical-computing

How many digits can I rely on?


Context

I am trying to improve my ability to recognize the number of correct decimal digits after performing short calculations involving floating-point numbers. In particular, I find it challenging to develop a process that, simply by analyzing the calculation, can determine how many digits are reliable.

I am not looking for approaches that involve using multi-precision packages, as seen here. Instead, I am seeking an approach that relies solely on the IEEE standard (single or double precision), the arithmetic operations used, and the output, to infer the number of correct decimal digits.

Solution Attempt

So far, my approach has been to ask the following questions:

  1. Which IEEE format is being used? If single precision, there are about 8 decimal digits available; if double precision, about 16.

  2. Given that format, what do the inputs look like inside the computer?

  3. Is there a subtraction involved? (If yes, this is where the error-prone part is located, since subtracting nearly equal numbers is ill-conditioned.)

  4. How many ZEROS (before & after the decimal point) does the output have?

Finally:

Number of correct decimal digits = (available digits of the IEEE format) - (number of ZEROS in the output)
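
As a rough sketch, the heuristic could be written down like this (a minimal illustration in Julia; the function name and the log10-based zero count are my own ad-hoc choices, not an established rule):

    # Estimate the number of correct decimal digits of x - y by subtracting the
    # number of leading digits that cancel from the digits the format offers.
    function estimated_correct_digits(x::T, y::T) where {T<:AbstractFloat}
        available = T === Float32 ? 8 : 16     # the "available digits" of the heuristic
        d = x - y
        d == 0 && return 0                     # complete cancellation: nothing left to trust
        # ZEROS before & after the decimal point, i.e. how many leading digits cancel
        cancelled = floor(Int, log10(max(abs(x), abs(y)) / abs(d)))
        return available - cancelled
    end

    estimated_correct_digits(900000.00000, 900000.00012)   # returns 7, matching Example 1 below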


Examples

Some examples to illustrate my problem are given below. For all of them I am asking the question: "How many decimal digits are correct in the Output?"

  1. Example
    x = 900000.00000
    y = 900000.00012
    x - y

Output: -0.00011999998241662979 

My solution:

Here we are using double precision (the default in Julia), so there are 16 digits available. From the computer's perspective, x and y both look like approximately 9*10^5. From our "human perspective", the subtraction x - y yields -000000.00012, so there are 9 ZEROS (before & after the decimal point) and consequently 16 - 9 = 7 decimal digits are correct. Having 7 correct digits means that, even though the computer's solution prints more digits, namely -0.00011999998241662979, only the first 7 digits are correct, namely -0.000119, and the rest is garbage.

Does this make sense? I am unsure how to account for the sign bit (+ or -): does it make a difference, leaving only -0.00011?
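
As a sanity check on Example 1 (verification only, not part of the "analyze the calculation by hand" approach I am after), one can redo the subtraction with BigFloat and look at the spacing of Float64 numbers near 9*10^5:

    x = 900000.00000
    y = 900000.00012
    x - y                                    # -0.00011999998241662979 (Float64 result)
    big"900000.00000" - big"900000.00012"    # ≈ -0.00012 (higher-precision reference)
    eps(900000.0)                            # ≈ 1.16*10^-10: spacing of Float64 near 9*10^5

Since each input can be off by about eps(9*10^5)/2 ≈ 6*10^-11, a difference of order 10^-4 carries roughly 6-7 trustworthy digits, which is consistent with the count above.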

  2. Example
    x = 9876.34f0
    sqrt(x) - sqrt(x + 1)

Output: -0.005027771f0

My solution:

Here we are using the single precision standard (denoted by the "f0" suffix), so there are 8 digits available. From the computer's perspective, x looks like approximately 9.87634*10^3. It is not so clear what a = sqrt(x) and b = sqrt(x + 1) look like in the computer. To estimate them, we observe, first, a = sqrt(9.87634*10^3) = 9.87634^(1/2) * 10^(3/2), and hence a lies in (10^1, 10^2). Second, b = sqrt(x + 1) = sqrt(9.87634*10^3 + 1.0*10^0) = sqrt(9.87734*10^3) = 9.87734^(1/2) * 10^(3/2). Both a and b therefore lie somewhere between 3.something * 10^1 and 3.something * 10^2. From this point onwards I find it hard to proceed. My approach would look at the output -0.005027771f0, notice that there are 3 ZEROS (before & after the decimal point), and conclude that 8 - 3 = 5 decimal digits are correct - suggesting, consequently, that only the first 5 digits of the computer's solution, -0.0050, are correct and the rest is garbage.

I don't think my solution is correct here, because it misses the "human perspective" of the subtraction a - b = 9.87634^(1/2)*10^(3/2) - 9.87734^(1/2)*10^(3/2), which in the first example is what gave me the right number of ZEROS before and after the decimal point.
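
For what it's worth, Example 2 can also be checked against a reference computed in Float64 (verification only; the reference value in the comment is approximate):

    x32 = 9876.34f0
    sqrt(x32) - sqrt(x32 + 1)                # -0.005027771f0 (Float32, after cancellation)
    x64 = Float64(x32)                       # the very same input value, wider precision
    sqrt(x64) - sqrt(x64 + 1)                # ≈ -0.0050311 (reference)

Only about the first two or three significant digits of the Float32 result agree with the reference, which reflects how many leading digits of a and b cancel in the subtraction.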

Update (Jan. 27 2025)

First of all, thanks for all the answers so far; I think I am moving in the right direction. Based on the comments so far, I came up with a small Julia notebook example, HMDCIRO.jl (with commentary), in this GitHub repository. In the repo, the PDF file documents the work better than the README.md file does.


Solution

  • Your example 1 is basic stuff that isn't at all interesting, but example 2 belongs to a much more interesting class of computational formulae where a bit of clever algebra can eliminate loss of precision almost entirely. These are useful to recognise because the rewrite can make a lot of difference to the accuracy of the final result.

    x = 9876.34f0
    sqrt(x) - sqrt(x + 1)
    

    When x is large, subtracting two numbers of very similar magnitude risks a nasty catastrophic cancellation error. The trick to avoid it is to multiply top and bottom by sqrt(x) + sqrt(x + 1).

    (sqrt(x) - sqrt(x + 1)) * (sqrt(x) + sqrt(x + 1)) / (sqrt(x) + sqrt(x + 1))
    = (x - (x + 1)) / (sqrt(x) + sqrt(x + 1))
    = -1 / (sqrt(x) + sqrt(x + 1))
    

    That simple algebraic transformation results in an answer computed to full numerical precision.

    That modest value of x isn't especially challenging - it needs to be bigger: try 10^6, 10^7 and 10^8 instead. The alternative method gets it right in all cases, whereas the naive method comes unstuck.
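
    To illustrate that last point, here is a quick comparison one could run (my own test loop in Float32; the values 10^6 to 10^8 are the ones suggested above):

    # naive difference versus the algebraically rewritten form, for increasing x
    naive(x)     = sqrt(x) - sqrt(x + 1)
    rewritten(x) = -1 / (sqrt(x) + sqrt(x + 1))

    for x in (9876.34f0, 1f6, 1f7, 1f8)
        println((x, naive(x), rewritten(x)))
    end
    # At x = 1f8 the Float32 sum x + 1 rounds back to x, so the naive form
    # returns 0.0f0, while the rewritten form still gives about -5.0f-5.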