c++ gcc compiler-optimization simd auto-vectorization

Is `-ftree-loop-vectorize` not enabled by `-O2` in GCC v12?


Example: https://www.godbolt.org/z/ahfcaj7W8

The GCC 12.2 documentation at https://gcc.gnu.org/onlinedocs/gcc-12.2.0/gcc/Optimize-Options.html says:

-ftree-loop-vectorize
     Perform loop vectorization on trees. This flag is enabled by default at -O2 and by -ftree-vectorize, -fprofile-use, and -fauto-profile.

However, it seems I have to pass a flag explicitly to turn on loop unrolling and SIMD. Did I misunderstand something here? Vectorization does happen at -O3, though.


Solution

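  • It is enabled at -O2 in GCC 12, but with a much more conservative cost model than at -O3: -O2 uses -fvect-cost-model=very-cheap, which only allows vectorization when the vector code would entirely replace the scalar loop, while -O3 uses -fvect-cost-model=dynamic. In practice, -O2 often vectorizes only when the loop trip count is a compile-time constant and known to be a multiple of the vector width (e.g. 8 for 32-bit elements with AVX2 vectors). See https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=2b8453c401b699ed93c085d0413ab4b5030bcdb8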

    https://godbolt.org/z/3xjdrx6as compares some loops at -O2 and -O3: a sum over an array of integers vectorizes at -O2 only with a constant count, not with a runtime variable. Even writing `for (int i=0 ; i < (len&-16) ; i++) sum += arr[i]` to make the trip count a multiple of 16 doesn't make gcc -O2 auto-vectorize.

    Before GCC12, -ftree-vectorize wasn't enabled at all by -O2.
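
To try this locally, here is a minimal sketch of the three loop shapes described above. The function names and the array size of 1024 are hypothetical, chosen only for illustration; they are not taken from the Godbolt links. Which loops actually vectorize depends on your exact GCC version and target flags, so treat the comments as what the answer above leads you to expect, not as guaranteed output.

```cpp
// Build with vectorization reports to see which loops GCC vectorizes, e.g.:
//   g++ -O2 -fopt-info-vec-optimized -c sum.cpp
//   g++ -O2 -fvect-cost-model=dynamic -fopt-info-vec-optimized -c sum.cpp
//   g++ -O3 -fopt-info-vec-optimized -c sum.cpp

// Trip count is a compile-time constant and a multiple of the vector width:
// the case the answer describes as vectorizing at -O2 in GCC 12.
int sum_fixed(const int *arr) {
    int sum = 0;
    for (int i = 0; i < 1024; i++)
        sum += arr[i];
    return sum;
}

// Trip count is a runtime value: per the answer, expect scalar code at
// plain -O2, and vector code at -O3 or with -fvect-cost-model=dynamic.
int sum_runtime(const int *arr, int len) {
    int sum = 0;
    for (int i = 0; i < len; i++)
        sum += arr[i];
    return sum;
}

// Trip count rounded down to a multiple of 16 at runtime: per the answer,
// this is still not enough for GCC 12 at plain -O2.
int sum_masked(const int *arr, int len) {
    int sum = 0;
    for (int i = 0; i < (len & -16); i++)
        sum += arr[i];
    return sum;
}
```

If you want to stay at -O2 but get behavior closer to -O3 for loops like `sum_runtime`, adding -fvect-cost-model=dynamic is one option to experiment with; check the -fopt-info-vec-optimized output to confirm what your compiler actually did.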