cfloating-pointnanavx2icx

Nan problem with Intel 2022 compiler using AVX2 & /fp:fast


I was playing with a test harness described in another post by @njaffa to try and make a faster more accurate erfc() function when I hit a peculiar bug that I am still unable to understand fully and would like some help with. I think the code is perfectly well behaved whilst pushing at some boundaries.

A purist might object to overflowing the counter argi for loop termination but that is not the issue. When compiled with the usual go faster options and specifically /fp:fast the code generated is wrong. I have added a few diagnostics that don't affect the problem and count the number of nans detected (but give inconsistent results).

NB It only fails on the Intel compiler and it must be set to /fp:fast and release (all other options work correctly as do all the other compilers that I have tried). FP strict and precise both work correctly on Intel.

I have simplified his code as much as possible whilst still retaining the errant behaviour I see here and it is reproduced below in its failing state:

#include <stdlib.h>
#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <math.h>

// njuffa's fast erfc(x) and test harness (stripped down as far as I can) see 
// https://stackoverflow.com/questions/77741402/fast-implementation-of-complementary-error-function-with-5-digit-accuracy/

// It only fails on the latest Intel 2022 compiler when /fp:fast and /O2 /Ob2 /Ot

/* Fast computation of the complementary error function. For argument x > 0
erfc(x) = exp(-x*x) * P(x) / Q(x), where P(x) and Q(x) are polynomials. 
If expf() is faithfully rounded, the following error bounds should hold:
Maximum relative error < 1.065e-5, maximum absolute error < 9.50e-6, and 
maximum ulp error < 176.5
*/
float fast_erfcf (float x)
{
  float a, c, e, p, q, r, s;
  a = fabsf (x);
  c = fminf (a, 10.5f);
  s = -c * c;
  e = expf (s);
  q =         3.82346272e-1f; //  0x1.8785c8p-2f
  p =        -4.38243151e-5f; // -0x1.6fa000p-15
  q = q * c + 1.30382371e+0f; //  0x1.4dc764p+0
  p = p * c + 2.16852218e-1f; //  0x1.bc1d04p-3
  q = q * c + 1.85278797e+0f; //  0x1.da5050p+0
  p = p * c + 7.23953605e-1f; //  0x1.72aa0cp-1
  q = q * c + 9.99991596e-1f; //  0x1.fffee6p-1
  p = p * c + 1.00000000e+0f; //  0x1.000000p+0
  r = e / q;
  r = r * p;
  if (x < 0.0f) r = 2.0f - r;
  if (isnan(x)) r = x + x;
  return r;
}

float uint32_as_float (uint32_t a)
{
  float r;
  memcpy (&r, &a, sizeof r);
  return r;
}

uint32_t float_as_uint32 (float a)
{
  uint32_t r;
  memcpy (&r, &a, sizeof r);
  return r;
}

uint64_t double_as_uint64 (double a)
{
  uint64_t r;
  memcpy (&r, &a, sizeof r);
  return r;
}

double floatUlpErr (float res, double ref)
{
  uint64_t refi, i, j, err;
  int expoRef;

  /* ulp error cannot be computed if either operand is NaN, infinity, zero */
  if (isnan (res) || isnan (ref) || isinf(res) || isinf (ref) ||
     (res == 0.0f) || (ref == 0.0)) {
    return 0.0;
  }
  i = ((int64_t)float_as_uint32 (res)) << 32;
  expoRef = (int)(((double_as_uint64 (ref) >> 52) & 0x7ff) - 1023);
  refi = double_as_uint64 (ref);
  if (expoRef >= 129) {
      j = (refi & 0x8000000000000000ULL) | 0x7fffffffffffffffULL;
  } else if (expoRef < -126) {
      j = ((refi << 11) | 0x8000000000000000ULL) >> 8;
      j = j >> (-(expoRef + 126));
      j = j | (refi & 0x8000000000000000ULL);
  } else {
      j = ((refi << 11) & 0x7fffffffffffffffULL) >> 8;
      j = j | ((uint64_t)(expoRef + 127) << 55);
      j = j | (refi & 0x8000000000000000ULL);
  }
  err = (i < j) ? (j - i) : (i - j);
  return err / 4294967296.0;
}

int main (void)
{
  uint32_t argi, last_argi;
  float arg, res, reff, abserrloc = NAN, ulperrloc = NAN;
  double ref, abserr, ulperr;
  double maxabserr = -0x1.2468Ap-127, maxulperr = -0x1.13579p-127; // tagged to spot in MMXreg
  unsigned int nancount = 0, loopcount = 0;

  //    argi = 0; // original code to test entire range - uses XMM8h,XMM10h & fails
  argi = float_as_uint32(-4.0); // this is a convenient hard failing value aka 0xc0800000 - uses XMM10h, XMM13h & fails
  //    argi = 0xff810000-0x007f0000; // this is the closest start value I found that fails with increment 0x007f0000

  do {
     if (argi == 0xff810000)
        printf("This time it will fail\n");
     arg = uint32_as_float (argi);
     ref = erfc ((double)arg);
     res = fast_erfcf (arg);
     ulperr = floatUlpErr (res, ref);
     if (ulperr > maxulperr) {
         ulperrloc = arg;
         maxulperr = ulperr;
     }
     abserr = fabs (res - ref);
     if (isnan(abserr)) {
        nancount++; // testing for nan here the bug persists
        printf("NaN @ %8x  ", argi); // this line will trigger breakpoint here
     }
     if (abserr > maxabserr) {
 //       if (isnan(abserr)) printf("abserr is Nan/n"); // testing for nan here makes results correct
        abserrloc = arg;
        maxabserr = abserr;
 //            printf("abserr %g @ %g ", abserr, arg);  // this line also makes results correct 

     }

     last_argi = argi;
//      argi++; // original code with Nan (0xffffffff)
//      argi += 0x007f0000; // the "nan" reported depends on this increment!
     argi += 0x00400000;

                        // 0x007F0000 is the largest increment that is still a Nan (0x7fc100000)
                        // 0x00400000 is Nan (0xffc00000)
                        // 0x00200000 is Nan (0xffe00000)
                        // 0x00100000 is Nan (0xfff00000) etc. 
     loopcount++;
  } while (argi>last_argi);  (false); // changing this line to prevent looping generates correct code
                          // NB it is radically different to the looping code (that fails).

  printf ("max abs err = %.6e %.6a %x @ % 15.8e %.6a\n"
          "max ulp err = %.6e @ % 15.8e\n",
          maxabserr, maxabserr, float_as_uint32(maxabserr), abserrloc, abserrloc,
          maxulperr, ulperrloc);
  printf("Nan count %i / %i\n", nancount, loopcount);
  return EXIT_SUCCESS;
}

If it worked as expected then the output should be (give or take rounding errors):

NaN @ ffc00000
max abs err = 1.541726e-08 0x1.08ddd1p-26 32846ee9 @ -4.00000000e+00 -0x1.000000p+2
max ulp err = 1.293293e-01 @ -4.00000000e+00

The command line options using MS VC IDE are:

 /GS /W3 /Gy /Zi /O2 /Ob2 /fp:fast /D "NDEBUG" /D "_CONSOLE" /D "_UNICODE" /D "UNICODE"
 /Qipo /Zc:forScope /arch:CORE-AVX2 /Oi /MD /Fa"x64\Release\" /EHsc /nologo 
 /Fo"x64\Release\" //fprofile-instr-use "x64\Release\" /Ot 
 /Fp"x64\Release\erfc_njaffa.pch" 

Since it might be relevant the CPU is i5-12600K AlderLake Family 6 Model 7 Step 2 Ext Model 97 Rev C0 and I'm running under Win 11 Pro.

The symptom is that absmaxerr eventually ends up with a nan in it due to some weird compiler quirk when optimisation and fp:fast are used together. I can follow disassembly code execution to some extent and have tagged the max____ variables with distinctive negative denorms to make them stand out in the XMMn register set. But I haven't been able to figure out quite how it happens. I think it may be because of executing library code with nans (testing all values).

Tiny changes to the code cause radical shifts in code generated (and/or register allocation). In particular the loop branch must be taken at least once for the problem to show. Compiler generates entirely different (and working) code if while(false) is used at the end, likewise if defensive tests or a printf are put in key locations.

There are two tests for abserr is nan - one active at the outer loop level which triggers for 0x7f800000<argi<=0x7fffffff and 0xff800000<argi<=0xffffffff (IOW when argi itself is a nan). The other is commented out to put the code into a failing state and is inside the abserr>maxabserr path. If enabled there it never triggers and the code works correctly whilst the outer test continues to see nan's.

My hunch is that there is some peculiar interaction between the maximum abs value update logic and the storing of maxabserr when argi is a nan but I just can't pin it down. abserrloc doesn't get altered after the first loop and stays at -4.0 but the initially correct maxabserr gets trampled on. Thanks for any enlightenment.


Solution

  • Potential problem: > with NAN

    C does not specify what happens with if (abserr > maxabserr) and a NAN is involved. IEEE 754 does have specified functionality on such compares, yet C is not obliged to follow them.

    Enabling FP performance options can reduce portability.

    Suggested change to only perform relational math when the arguments are not NAN for more portable functionality:

     // Delay subtraction to later
     // abserr = fabs (res - ref);
     // if (isnan(abserr)) {
     if (isnan(res) || isnan(ref)) {
        nancount++;
        printf("NaN @ %8x  ", argi);
     }
    
     // if (abserr > maxabserr) {
     else {
        abserr = fabs (res - ref);
        if (abserr > maxabserr) {
          abserrloc = arg;
          maxabserr = abserr;
        }
     }
    

    Note: some compilers (like GCC and clang) treat isnan(x) as false with -ffast-math because it implies -ffinite-math-only which lets them assume any float is non-NaN non-Inf. ICX and classic ICC don't, though, even with their respective fast-math options enabled (-ffp-model=fast and -fp-model fast=2). Godbolt