Example: https://www.godbolt.org/z/ahfcaj7W8
From https://gcc.gnu.org/onlinedocs/gcc-12.2.0/gcc/Optimize-Options.html
It says:
"-ftree-loop-vectorize
Perform loop vectorization on trees. This flag is enabled by default at -O2 and by -ftree-vectorize, -fprofile-use, and -fauto-profile."
However, it seems I have to pass a flag explicitly to turn on loop unrolling and SIMD at -O2. Did I misunderstand something here? It is enabled at -O3, though.
It is enabled at -O2 in GCC 12, but only with a much lower cost threshold than at -O3 (-O2 uses -fvect-cost-model=very-cheap), e.g. often only vectorizing when the loop trip count is a compile-time constant and known to be a multiple of the vector width (e.g. 8 for 32-bit elements with AVX2 vectors). See https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=2b8453c401b699ed93c085d0413ab4b5030bcdb8
https://godbolt.org/z/3xjdrx6as shows some loops at -O2 vs. -O3, with a sum of an array of integers only vectorizing with a constant count, not a runtime variable. Even writing for (int i=0 ; i < (len&-16) ; i++) sum += arr[i] to make the length a multiple of 16 doesn't make gcc -O2 auto-vectorize.
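For anyone who wants to reproduce this locally, here is a minimal sketch of the three loop shapes discussed above. The function names and the constant 1024 are my own choices for illustration, not taken from the Godbolt link. Compiling with something like gcc -O2 -march=x86-64-v3 -S and comparing against -O3 (or -O2 -fvect-cost-model=dynamic) should show the difference; adding -fopt-info-vec makes GCC report which loops it vectorized. The exact outcome may vary with GCC version and target.

```c
/* Constant trip count that is a multiple of the vector width:
   the kind of loop GCC 12 is often willing to vectorize at -O2. */
int sum_fixed(const int *arr)
{
    int sum = 0;
    for (int i = 0; i < 1024; i++)
        sum += arr[i];
    return sum;
}

/* Runtime trip count: typically left scalar at -O2, vectorized at -O3. */
int sum_runtime(const int *arr, int len)
{
    int sum = 0;
    for (int i = 0; i < len; i++)
        sum += arr[i];
    return sum;
}

/* Runtime trip count rounded down to a multiple of 16. As noted above,
   this hint alone did not make gcc -O2 vectorize the loop. */
int sum_masked(const int *arr, int len)
{
    int sum = 0;
    for (int i = 0; i < (len & -16); i++)
        sum += arr[i];
    return sum;
}
```

All three compute the same kind of reduction, so any difference in the generated asm comes from what the compiler knows about the trip count, not from the loop body.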
Before GCC 12, -ftree-vectorize wasn't enabled at all by -O2.