floating-point ieee-754 numerical-stability

What is the worst case error for (a - b) + b?


When evaluating (a - b) + b with IEEE 754 floating-point numbers a and b, what is the worst-case error in terms of the magnitudes of a and b? How close to a can I expect the result to be?


Solution

  • The relative error can be 100%. b may be so large that a-b rounds to -b, and then (a-b)+b produces zero.

    For example, with IEEE-754 basic 64-bit binary, (1−2^54)+2^54 yields 0 with round-to-nearest-ties-to-even, because 1−2^54 rounds to −2^54. We can also have 100% error in the other direction. If a is 1 and b is 2^53+2, then a-b is −(2^53+1), which rounds to −2^53, so (a-b)+b produces 2.

    Also, if b is infinity, (a-b)+b produces a NaN.
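The three cases above can be checked with a short Python sketch. It assumes that Python's float is IEEE-754 binary64 with the default round-to-nearest-ties-to-even mode, which holds on common platforms but is not guaranteed by the language. The helper name round_trip is made up for this illustration.

```python
import math


def round_trip(a: float, b: float) -> float:
    """Evaluate (a - b) + b in float arithmetic (hypothetical helper)."""
    return (a - b) + b


# Case 1: 1 - 2^54 rounds to -2^54, so adding b back gives 0 instead of 1.
lost = round_trip(1.0, 2.0**54)

# Case 2: 1 - (2^53 + 2) = -(2^53 + 1) rounds to -2^53, so the result is 2.
doubled = round_trip(1.0, 2.0**53 + 2.0)

# Case 3: a - inf is -inf, and -inf + inf is NaN.
undefined = round_trip(1.0, math.inf)

print(lost, doubled, undefined)
```

In the first two cases the absolute error is 1, which equals |a|, so the relative error with respect to a is 100% in each direction.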