[SOLVED] ARM NEON vectorization failure

I would like to enable NEON vectorization on my ARM Cortex-A9, but I get this output at compile time:

"not vectorized: relevant stmt not supported: D.14140_82 = D.14143_77 * D.14141_81"

Here is my loop:

void my_mul(float32_t * __restrict data1, float32_t * __restrict data2, float32_t * __restrict out){    
    for(int i=0; i<SIZE*4; i+=1){
        out[i] = data1[i]*data2[i];
    }
}

And the options used at compile:

-march=armv7-a -mcpu=cortex-a9 -mfpu=neon -mfloat-abi=softfp -ftree-vectorize -mvectorize-with-neon-quad -ftree-vectorizer-verbose=2

I am using the arm-linux-gnueabi compiler (v4.6).

It is important to note that the problem only appears with float32 vectors. If I switch to int32, the loop is vectorized. Maybe vectorization for float32 is not yet available…

Does anyone have an idea? Did I forget something on the command line or in my implementation?

Thanks in advance for your help.

Guix

Solution

From GCC's ARM options page:

-mfpu=name

...

If the selected floating-point hardware includes the NEON extension (e.g. -mfpu=`neon'), note that floating-point operations are not generated by GCC's auto-vectorization pass unless -funsafe-math-optimizations is also specified. This is because NEON hardware does not fully implement the IEEE 754 standard for floating-point arithmetic (in particular denormal values are treated as zero), so the use of NEON instructions may lead to a loss of precision.
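To make the "denormal values are treated as zero" part concrete, here is a minimal sketch in plain C that only emulates the flush in software. The helper name flush_denormal is hypothetical and is not a GCC or NEON API; it just shows which inputs would be affected.

```c
#include <float.h>

/* Hypothetical helper: software emulation of flush-to-zero, for illustration only.
 * Any nonzero value whose magnitude is below FLT_MIN (the smallest normal float)
 * is a denormal; the GCC note says NEON treats such values as zero. */
float flush_denormal(float x)
{
    float ax = (x < 0.0f) ? -x : x;          /* magnitude of x */
    if (ax < FLT_MIN) {
        return (x < 0.0f) ? -0.0f : 0.0f;    /* denormals (and zeros) become signed zero */
    }
    return x;                                /* normal values pass through unchanged */
}
```

For example, FLT_MIN / 2.0f is a denormal and would come back as 0.0f, while FLT_MIN itself and 1.0f are returned unchanged. If your data never gets that close to zero, the loss of precision described in the note should not affect you.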

If you specify -funsafe-math-optimizations, it should work, but reread the note above if your code depends on full IEEE 754 precision.
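As a rough sketch of what that looks like in practice, here is the loop from the question as a self-contained file, with the adjusted flags in the header comment. The SIZE value and the host fallback typedef are my assumptions (the question does not show them); the flags are the ones from the question plus -funsafe-math-optimizations.

```c
/* Flags from the question, plus the one the GCC note asks for:
 *   -march=armv7-a -mcpu=cortex-a9 -mfpu=neon -mfloat-abi=softfp
 *   -ftree-vectorize -mvectorize-with-neon-quad
 *   -funsafe-math-optimizations
 *   -ftree-vectorizer-verbose=2
 */
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>       /* provides float32_t on NEON targets */
#else
typedef float float32_t;    /* assumed fallback so the file also builds on a host PC */
#endif

#define SIZE 256            /* assumed value; not given in the question */

/* Element-wise product; __restrict tells the compiler the arrays do not overlap. */
void my_mul(float32_t * __restrict data1, float32_t * __restrict data2, float32_t * __restrict out)
{
    for (int i = 0; i < SIZE * 4; i += 1) {
        out[i] = data1[i] * data2[i];
    }
}
```

With the extra flag, the verbose vectorizer output should no longer report the multiply as "not supported" for this loop. I have not re-run this on a 4.6 toolchain, so treat it as a starting point rather than a verified build.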