When evaluating the sum `(a - b) + b` with IEEE 754 floating-point numbers a and b, what is the worst-case error in terms of the magnitudes of a and b? How close to a can I expect the result to be?
100%. `b` may be so large that `a-b` produces `-b`, and then `(a-b)+b` produces zero.
For example, with IEEE-754 basic 64-bit binary and round-to-nearest-ties-to-even, (1−2^54)+2^54 yields 0. We can also have 100% error in the other direction: if `a` is 1 and `b` is 2^53+2, then `(a-b)+b` produces 2.
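
A quick check of both cases, as a minimal sketch assuming a C implementation where `double` is IEEE-754 binary64 and the rounding mode is round-to-nearest-ties-to-even (the common default):

```c
#include <stdio.h>

int main(void)
{
    /* 1 - 2^54 is exactly -(2^54 - 1), which needs 54 significand bits;
       it is halfway between -(2^54 - 2) and -2^54, and the tie goes to
       the even significand, -2^54. Adding 2^54 back then yields 0. */
    double a = 1.0;
    double b = 0x1p54;            /* 2^54 as a hex float literal */
    printf("%g\n", (a - b) + b);  /* prints 0 */

    /* 1 - (2^53 + 2) is exactly -(2^53 + 1), halfway between -2^53 and
       -(2^53 + 2); the tie goes to the even significand, -2^53.
       Adding b back then yields 2, twice the original a. */
    b = 0x1p53 + 2;               /* 2^53 + 2, exactly representable */
    printf("%g\n", (a - b) + b);  /* prints 2 */

    return 0;
}
```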
Also, if `b` is infinity, `(a-b)+b` produces a NaN.
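
The infinity case, under the same assumptions:

```c
#include <math.h>
#include <stdio.h>

int main(void)
{
    double a = 1.0;
    double b = INFINITY;
    /* a - b is -infinity, and -infinity + infinity is NaN per IEEE-754. */
    printf("%g\n", (a - b) + b);  /* prints nan */
    return 0;
}
```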