c floating-point rounding numerical-methods underflow

How can I detect loss of precision due to rounding in both floating-point addition and multiplication?


From Computer Systems: A Programmer's Perspective:

With single-precision floating point:

  • the expression (3.14f+1e10f)-1e10f evaluates to 0.0: the value 3.14 is lost due to rounding.

  • the expression (1e20f*1e20f)*1e-20f evaluates to +∞, while 1e20f*(1e20f*1e-20f) evaluates to 1e20f.
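The book's expressions can be reproduced with a few small C functions. This is a minimal sketch, assuming IEEE 754 single and double precision, round-to-nearest, FLT_EVAL_METHOD == 0 (for example x86-64 with SSE) and no -ffast-math; the function names are hypothetical. Each intermediate result is stored in a variable so that it is rounded before the next operation, and a double version of the first expression is included for comparison.

```c
/* Hypothetical helpers reproducing the book's expressions.
 * Every intermediate result is stored in a variable of the operand type,
 * so it is rounded before the next operation (assumes FLT_EVAL_METHOD == 0). */

float sum_then_sub(float x, float big)
{
    float sum = x + big;        /* 3.14f + 1e10f rounds to 1e10f */
    return sum - big;           /* so the result is 0.0f */
}

double sum_then_sub_double(double x, double big)
{
    double sum = x + big;       /* doubles near 1e10 are 2^-19 apart */
    return sum - big;           /* close to x, but the sum was still rounded */
}

float mul_left_first(float a, float b, float c)
{
    float ab = a * b;           /* 1e20f * 1e20f overflows to +inf */
    return ab * c;              /* +inf * 1e-20f is still +inf */
}

float mul_right_first(float a, float b, float c)
{
    float bc = b * c;           /* 1e20f * 1e-20f rounds to 1.0f */
    return a * bc;              /* 1e20f * 1.0f == 1e20f */
}
```

Under those assumptions, sum_then_sub(3.14f, 1e10f) returns 0.0f, mul_left_first(1e20f, 1e20f, 1e-20f) returns +∞ and mul_right_first(1e20f, 1e20f, 1e-20f) returns 1e20f, matching the book, while sum_then_sub_double(3.14, 1e10) returns 3.1399993896484375 rather than exactly 3.14.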


Solution

  • In mathematics, addition and multiplication of real numbers are associative, but these operations are not associative when performed on floating-point types such as float, because of their limited precision and range.

    So the order of evaluation matters.

    Consider the first example: 10000000003.14 can't be represented exactly as a 32-bit float (adjacent floats near 1e10 are 1024 apart), so 3.14f + 1e10f rounds to 1e10f, the closest representable number, and subtracting 1e10f then gives 0.0. Of course, 3.14f + (1e10f - 1e10f) would yield 3.14f instead.

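    Note that I used the f suffix because in C the expression (3.14+1e10)-1e10 involves double literals, so its result would be very close to 3.14, though still not exact: with IEEE 754 doubles the intermediate sum is rounded to a multiple of 2^-19, and the result is 3.1399993896484375.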

    Something similar happens in the second example: 1e20f * 1e20f (about 1e40) is already beyond the range of float, whose largest finite value is about 3.4e38 (but not beyond the range of double), so it overflows to +∞, and multiplying +∞ by 1e-20f still gives +∞. In the other expression, (1e20f * 1e-20f) is performed first and has a well-defined result (1), so the subsequent multiplication yields the correct answer.

    In practice, there are some precautions you can adopt
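Since the question asks how to detect the loss, here is a minimal sketch of two well-known checks, assuming IEEE 754 float and double, round-to-nearest, FLT_EVAL_METHOD == 0 and no -ffast-math. For addition it uses Knuth's TwoSum error-free transformation, which recovers the exact rounding error of a float sum; for multiplication it computes the product of two floats exactly in double and compares it with the float result. The names two_sum and float_mul_is_inexact are hypothetical. An alternative is the standard <fenv.h> interface (feclearexcept and fetestexcept with FE_INEXACT, FE_OVERFLOW, FE_UNDERFLOW), which reports the same conditions through status flags but depends on the compiler honoring FENV_ACCESS.

```c
#include <stdbool.h>
#include <stddef.h>

/* TwoSum (Knuth): returns the rounded sum s = fl(a + b) and stores in *err
 * the exact rounding error, so that a + b == s + *err holds exactly
 * (assuming round-to-nearest and no overflow).
 * A nonzero *err means the addition lost precision. */
float two_sum(float a, float b, float *err)
{
    float s  = a + b;
    float ap = s - b;           /* portion of s contributed by a */
    float bp = s - ap;          /* portion of s contributed by b */
    float da = a - ap;          /* part of a that was rounded away */
    float db = b - bp;          /* part of b that was rounded away */
    *err = da + db;
    return s;
}

/* Hypothetical helper: true if the float product a * b was rounded,
 * including overflow to infinity and underflow. Floats have 24-bit
 * significands, so the product needs at most 48 bits and stays far inside
 * double's exponent range: (double)a * (double)b is exact.
 * Finite inputs only. */
bool float_mul_is_inexact(float a, float b, float *product)
{
    double exact = (double)a * (double)b;   /* exact product */
    float  r     = a * b;                   /* rounded to single precision */
    if (product != NULL)
        *product = r;
    return (double)r != exact;
}
```

For the book's first example, two_sum(3.14f, 1e10f, &err) returns 1e10f and sets err to 3.14f, exactly the amount that was lost; float_mul_is_inexact(1e20f, 1e20f, &p) returns true with p equal to +∞. The double trick does not carry over to addition, because the exact sum of two floats with very different exponents can need far more than 53 significant bits.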