c++ g++ compiler-optimization gcc6

g++: optimization -march=haswell and newer changes numerical result


I have been optimizing performance, running regression tests along the way, when I noticed that g++ seems to change numerical results depending on the chosen optimization flags. Until now I assumed that -O2 -march=[whatever] would yield exactly the same results for numerical computations regardless of which architecture is selected. This does not seem to be the case for g++. With older architectures up to ivybridge, g++ gives the same results as clang does for any architecture, but with haswell and newer the g++ results differ. Is this a bug in gcc, or have I misunderstood something about optimizations? I am surprised because clang does not show this behavior.

Note that I am well aware that the differences are within machine precision, but they still break my simple regression checks.

Here is some example code:

#include <iostream>
#include <armadillo>

int main(){
    arma::arma_rng::set_seed(3);
    arma::sp_cx_mat A = arma::sprandn<arma::sp_cx_mat>(20,20, 0.1);
    arma::sp_cx_mat B = A + A.t();
    arma::cx_vec eig;
    arma::eigs_gen(eig, B, 1, "lm", 0.001);
    std::cout << "eigenvalue: " << eig << std::endl;
}

Compiled using:

g++ -march=[architecture] -std=c++14 -O2 -o test example.cpp -larmadillo

gcc version: 6.2.1

clang version: 3.8.0

Compiled for 64 bit, executed on an Intel Skylake processor.


Solution

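  • This happens because GCC uses fused multiply-add (FMA) instructions by default when the target supports them. Clang, by contrast, doesn't use them by default, even when they are available.

    The result of a*b+c can differ depending on whether FMA is used: an FMA rounds only once, after the addition, whereas a separate multiply and add rounds twice. That's why you get different results with -march=haswell (Haswell is the first Intel CPU that supports FMA).

    You can control this with -ffp-contract=XXX. For example, -ffp-contract=off disables the contraction of a*b+c into an FMA, and -ffp-contract=fast allows it. To keep -march=haswell but get the same results as on older architectures, compile with:

    g++ -march=haswell -ffp-contract=off -std=c++14 -O2 -o test example.cpp -larmadillo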