I have been benchmarking some fast numerical code on various compilers recently and was struck by a systematic variation in speed with certain compilers at maximum optimisation (-O2) with AVX/AVX2 code generation. I have narrowed some of it down to a curious behaviour that sets the fastest code generators apart from the rest.
Namely, with AVX code generation enabled and -O2, Clang and ICX will inline calls to `fminf`/`fmin` as `minss`, whereas the rest of the pack (GCC, ICC and MSVC) stubbornly continue to call `fminf`. They all quite happily inline `fabsf`/`fabs`, though.
The code for an MRE is below and if I have done it right this is a link to it on Godbolt where you can try it on various compilers. Clang and Intel's ICX seem to inline it going as far back as I could find compilers to test.
```c
#include <math.h>
#include <stdio.h>
#include <stdlib.h>

int main()
{
    float y, x = 2.0;
    y = 10 * rand() - 5;
    y = fabs(y);
    if (x < y) y = x;
    printf("%g", y);
    y = 10 * rand();
    y = fminf(x, y);
    printf("%g", y);
}
```
A summary table is as follows:
| Compiler | Inlines `fminf`? |
|---|---|
| GCC 13.2 | no |
| ICC latest | no |
| MSVC 2022 x64 | no |
| Clang 17.0.1 | yes |
| ICX | yes |
`fmin` and `fmax` appear in quite a lot of numerical optimisation code, so the slowdown can be fairly significant. It can be worked around by defining macros `FMIN` and `FMAX` once you know. It would be nice if the other compilers inlined it though.
I can't think of any reason why this particular optimisation is missing in some compilers... does anyone have an explanation? `fabs` is a more complex case, and that one does inline in all of them.
The basic issue is that the `minss` instruction does not do the same thing as the `fminf` function when the second operand is NaN: `minss` returns its second source operand whenever the comparison is unordered, whereas `fminf` is required to return the non-NaN operand. A call to `fminf` therefore cannot be safely replaced by (just) that instruction.
You can use `-ffast-math` to enable optimizations that may not be strictly IEEE-correct (particularly in the presence of NaNs, as here).