c gcc x86 floating-point x87

Why does this simple program, compiled with gcc, -mfpmath=387, and an optimization level of -O2 or -O3, produce NaN values?


I have a short program that performs a numerical computation and obtains an incorrect NaN result when some specific conditions hold. I cannot see how this NaN result can arise. Note that I am not using compiler options that allow the reordering of arithmetic operations, such as -ffast-math.

Question: I am looking for an explanation of how the NaN result arises. Mathematically, there is nothing in the computation that leads to division by zero or similar. Am I missing something obvious?

Note that I am not asking how to fix the problem—that is easy. I am simply looking for an understanding of how the NaN appears.

Minimal example

Note that this example is very fragile and even minor modifications, such as adding printf() calls in the loop to observe values, will change the behaviour. This is why I was unable to minimize it further.

// prog.c

#include <stdio.h>
#include <math.h>

typedef long long myint;

void fun(const myint n, double *result) {
    double z = -1.0;
    double phi = 0.0;
    for (myint i = 0; i < n; i++) {
        double r = sqrt(1 - z*z);

        /* avoids division by zero when r == 0 */
        if (i != 0 && i != n-1) {
            phi += 1.0 / r;
        }

        double x = r*cos(phi);
        double y = r*sin(phi);

        result[i + n*0] = x;
        result[i + n*1] = y;
        result[i + n*2] = z;

        z += 2.0 / (n - 1);
    }
}

#define N 11

int main(void) {
    // perform computation
    double res[3*N];
    fun(N, res);

    // output result
    for (int i=0; i < N; i++) {
        printf("%g %g %g\n", res[i+N*0], res[i+N*1], res[i+N*2]);
    }

    return 0;
}

Compile with:

gcc -O3 -mfpmath=387 prog.c -o prog -lm

The last line of the output is:

nan nan 1

Instead of NaN, I expect a number close to zero.

Critical features of the example

The following must all hold for the NaN output to appear:

Without these features, I do get the expected output, i.e. something like 1.77993e-08 -1.12816e-08 1 or 0 0 1 as the last line.

Explanation of the program

Even though it doesn't really matter to the question, I give a short explanation of what the program does, to make it easier to follow. It computes the three-dimensional coordinates x, y, z of n points on the surface of a sphere in a specific arrangement. The z values go from -1 to 1 in equal increments; however, the last value won't be precisely 1 due to numerical round-off errors. The coordinates are written into an n-by-3 matrix, result, stored in column-major order. r and phi are polar coordinates in the (x, y) plane.

Note that when z is -1 or 1 then r becomes 0. This happens in the first and last iteration steps. This would lead to division by 0 in the 1.0 / r expression. However, 1.0 / r is excluded from the first and last iteration of the loop.


Solution

  • This is caused by the interplay of the x87's 80-bit internal precision, non-conforming behavior of GCC, and optimization decisions that differ between compiler versions.

    x87 supports IEEE binary32 and binary64 only as storage formats, converting to and from its 80-bit representation on loads and stores. To make program behavior predictable, the C standard requires that extra precision be dropped on assignments, and lets you check the intermediate precision via the FLT_EVAL_METHOD macro. With -mfpmath=387, FLT_EVAL_METHOD is 2, so you know that intermediate precision corresponds to the long double type.

    Unfortunately, GCC does not drop extra precision on assignments unless you request stricter conformance via -std=cNN (as opposed to -std=gnuNN) or explicitly pass -fexcess-precision=standard.

    In your program, the z += 2.0 / (n - 1); statement should be computed by:

    1. Computing 2.0 / (n - 1) in the intermediate 80-bit precision.
    2. Adding it to the previous value of z (still in 80-bit precision).
    3. Rounding to the declared type of z (i.e. to binary64).

    In the version that ends up with NaNs, GCC instead does the following:

    1. Computes 2.0 / (n - 1) just once before the loop.
    2. Rounds this fraction from binary80 to binary64 and stores it on the stack.
    3. In the loop, it reloads this value from the stack and adds it to z.

    This is non-conforming, because 2.0 / (n - 1) undergoes rounding twice (first to binary80, then to binary64).


    The above explains why you saw different results depending on compiler version and optimization level. In general, however, you cannot expect your computation to avoid NaNs in the last iteration. When n - 1 is not a power of two, 2.0 / (n - 1) is not exactly representable and may be rounded up. In that case, z may grow slightly faster than the true sum -1.0 + 2.0 / (n - 1) * i and end up above 1.0 for i == n - 1, so that sqrt(1 - z*z) produces a NaN because its argument is negative.

    In fact, if you change #define N 11 to #define N 12 in your program, you will deterministically get a NaN both with 80-bit and 64-bit intermediate precision.