I have a very basic question about the optimizations the compiler (in my case gcc) performs with the -O flags. I would like to focus only on loop vectorization here. Assume a simple for-loop with no danger of pointer aliasing or race conditions. Is it possible to rewrite this loop so that the code the compiler generates with -O0 is as fast as it would be if the compiler had vectorized the loop?
More than likely, no. Using -O0 gives the compiler carte blanche to make the code as inefficient as it wants. Of course, it is not some evil beast that wants to mess with you, but it won't try hard unless you tell it to.
Some optimizations must generally be done by the compiler. You cannot, in general, get the same performance through micro-optimizations in your source code as you could get using aggressive compiler optimizations.
Regarding your specific example: yes, you can write explicit vector code (for example, SIMD intrinsics) in your source to force the use of vector instructions. Such code probably won't work on all platforms, though, unless you know very well what you're doing and always provide fallbacks.
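As a rough illustration of explicit vector code with a fallback, here is a minimal sketch that assumes an x86 target where gcc defines `__SSE__`. The function name `add_arrays` is made up for this example. It adds two float arrays four elements at a time with SSE intrinsics, and uses a plain scalar loop for the leftover elements and on targets without SSE.

```c
#include <stddef.h>

#if defined(__SSE__)
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_loadu_ps, _mm_add_ps, ... */
#endif

/* Hypothetical helper: out[i] = a[i] + b[i] for i in [0, n). */
void add_arrays(const float *a, const float *b, float *out, size_t n)
{
    size_t i = 0;
#if defined(__SSE__)
    /* Process four floats per iteration using unaligned loads and stores. */
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
    }
#endif
    /* Scalar tail; also the complete fallback when SSE is unavailable. */
    for (; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}
```

Even with intrinsics, code built with -O0 is unlikely to match an optimized build: the compiler still emits the vector instructions, but it typically spills every intermediate value to the stack and does not inline or unroll anything around them.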