Tags: c, floating-point, ieee-754, single-precision

C - adding two single-precision floating-point normal numbers, can't get the result to overflow to infinity


I'm playing around with floating-point arithmetic, and I encountered something which needs explaining.

When I set the rounding mode to 'toward zero' with

fesetround(FE_TOWARDZERO);

and add various normal positive numbers, the result never reaches infinity.

However, according to IEEE 754, adding finite numbers can overflow to infinity.

For instance:

#include <fenv.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

// Reinterpret a 32-bit pattern as a float.
// memcpy avoids the strict-aliasing violation of casting the pointer.
float hex2float (uint32_t hex_num) {
  float f;
  memcpy(&f, &hex_num, sizeof f);
  return f;
}

int main(void) {
  uint32_t a_int = 0x7f7fffff; // Maximum finite single precision number, about 3.4E38
  uint32_t b_int = 0x7f7fffff;
  float a = hex2float(a_int);
  float b = hex2float(b_int);
  float res_add;

  fesetround(FE_TOWARDZERO);  // need to include fenv.h for that
  printf("Calculating... %+e + %+e\n",a,b);
  res_add = a + b;
  printf("Res = %+e\n",res_add);
  return 0;
}

However, if I change the rounding mode to something else, I may get +INF as the answer.

Can someone explain this?


Solution

  • The explanation for the observed behavior is that it is mandated by the IEEE 754-2008 floating-point standard:

    7.4 Overflow

    The overflow exception shall be signaled if and only if the destination format’s largest finite number is exceeded in magnitude by what would have been the rounded floating-point result (see 4) were the exponent range unbounded. The default result shall be determined by the rounding-direction attribute and the sign of the intermediate result as follows:

    [...]

    b) roundTowardZero carries all overflows to the format’s largest finite number with the sign of the intermediate result.

    So for the rounding mode used here (truncation, or rounding towards zero), the result in case of overflow is the largest finite number, not infinity.