I have a C++ source file that is compiled with the -mavx/-mavx2 flags using the Clang compiler.
Some functions have AVX2 implementations, but others are just plain std calls.
I'm wondering whether std::sort (1), std::memcpy (2), and std::accumulate can be vectorized because of the -mavx flag, in which case removing that flag could affect the performance of those functions. Thanks.
Yes, they all can. But the details differ.
The memcpy, memmove and memset functions are implemented as part of the C standard library, which is a separately pre-compiled static or shared library. The implementation there should already be vectorized. However, compilers know the semantics of these functions and can replace calls to the C standard library with an inline implementation, vectorized where vectorization is applicable.
Substituting memcpy with an inline implementation is especially helpful when the size is known in advance. In this case the inline implementation can be expected to outperform the library one, even if the library one is vectorized, because knowing the size is a big benefit.
For very small known sizes there is nothing to vectorize; the copy is just a few moves through general-purpose registers. For medium sizes the copy would be vectorized and the loop fully unrolled. Finally, for large sizes, the compiler may decide to call the C standard library implementation, as the inline one would not be much better.
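As a rough illustration of the known-size versus unknown-size cases, here is a minimal sketch. The type Block32 and the function names are hypothetical, and the comments describe what compilers typically do at -O2, not a guarantee; inspect the generated assembly to see what your toolchain actually emits.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical 32-byte plain-data type, used only for illustration.
struct Block32 {
    unsigned char bytes[32];
};

// The size is a compile-time constant (32 bytes), so an optimizing compiler
// is free to expand the copy inline. Under -mavx this is typically a single
// 256-bit load/store pair; without it, typically two 128-bit pairs.
inline void copy_fixed(Block32& dst, const Block32& src) {
    std::memcpy(&dst, &src, sizeof(Block32));
}

// The size is only known at run time, so the compiler will typically emit
// a call to the separately compiled memcpy in the C library instead.
inline void copy_dynamic(void* dst, const void* src, std::size_t n) {
    std::memcpy(dst, src, n);
}
```

Compiling this once with and once without -mavx and comparing the assembly of copy_fixed is a quick way to see whether the flag matters for a given call site.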
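Algorithms like std::accumulate, reduce or inner_product are templates, so they don't have a pre-compiled implementation. Usually, with plain data types and no suspected aliasing, compilers auto-vectorize them well. (Floating-point sums are a common exception: vectorizing changes the order of the additions, so compilers generally do it only when an option such as -ffast-math permits reassociation.)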
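A minimal sketch of such a reduction follows. The function name sum_u32 is hypothetical; unsigned integers are used because their addition can be reordered freely, which makes this an easy candidate for auto-vectorization at -O2 or higher.

```cpp
#include <cstdint>
#include <numeric>
#include <vector>

// Sums a contiguous range of unsigned 32-bit integers (wrapping on overflow).
// The template is instantiated in this translation unit, so the loop is
// compiled with whatever -m flags this file is built with.
inline std::uint32_t sum_u32(const std::vector<std::uint32_t>& v) {
    return std::accumulate(v.begin(), v.end(), std::uint32_t{0});
}
```

With -mavx2 the vectorized loop would typically use 256-bit integer adds; without it, the compiler falls back to the baseline 128-bit SSE2 instructions on x86-64.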
For more complex algorithms, the compiler may not be capable of auto-vectorizing them. In this case, the STL implementation may still be specialized to vectorize them manually. For example, manual vectorization has recently been added to some STLs, such as for mismatch in libc++ and MSVC STL. Note that the two approaches differ for the purposes of your question: the libc++ manual vectorization is inlined, whereas the MSVC manual vectorization is separately compiled.
std::sort is one of the most complex algorithms, but it can still be vectorized. Check your STL implementation or your compiler output.
The -mavx / -mavx2 flag is likely to affect algorithms that are auto-vectorized, but not ones that are separately compiled, so it may or may not affect your case.
An inline implementation (whether auto-vectorized or manually vectorized) will usually follow the compilation flags.
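One way to see what "following the compilation flags" means in practice: the flags change the predefined macros that inline and header-only code can test. The sketch below uses a hypothetical helper name; the __AVX__ and __AVX2__ macros themselves are defined by Clang and GCC when the corresponding -m flags are active.

```cpp
#include <string>

// Hypothetical helper: reports the vector level this translation unit was
// compiled for, based on the compiler's predefined macros.
inline std::string compiled_vector_level() {
#if defined(__AVX2__)
    return "avx2";   // built with -mavx2 (or a -march that implies it)
#elif defined(__AVX__)
    return "avx";    // built with -mavx
#elif defined(__SSE2__) || defined(_M_X64)
    return "sse2";   // x86-64 baseline
#else
    return "unknown";
#endif
}
```

Header-only code with manual vectorization often selects its code path with checks of this kind, which is why removing the flag can change which path gets compiled.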
A separately compiled implementation may be built with its own flags and so target a specific vector instruction level, or it may use runtime dispatch across multiple levels. Hypothetically, a different separately compiled implementation could also be picked depending on the flags, but I'm not aware of any such examples.
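Runtime dispatch can be sketched roughly as follows. This is only an illustration under stated assumptions, not how any particular library does it: the function names and the SKETCH_X86_DISPATCH macro are hypothetical, and the code relies on the GCC/Clang extensions __builtin_cpu_supports and __attribute__((target(...))) on x86, falling back to the baseline elsewhere.

```cpp
#include <cstddef>
#include <cstdint>

#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
#define SKETCH_X86_DISPATCH 1
#else
#define SKETCH_X86_DISPATCH 0
#endif

// Baseline version: compiled with the flags of this translation unit.
static std::uint32_t sum_baseline(const std::uint32_t* p, std::size_t n) {
    std::uint32_t s = 0;
    for (std::size_t i = 0; i < n; ++i) s += p[i];
    return s;
}

#if SKETCH_X86_DISPATCH
// Same loop, but this one function may use AVX2 instructions regardless of
// the command-line flags, so it must only be called on CPUs that have AVX2.
__attribute__((target("avx2")))
static std::uint32_t sum_avx2(const std::uint32_t* p, std::size_t n) {
    std::uint32_t s = 0;
    for (std::size_t i = 0; i < n; ++i) s += p[i];
    return s;
}
#endif

// Picks an implementation based on what the running CPU supports.
inline std::uint32_t sum_dispatch(const std::uint32_t* p, std::size_t n) {
#if SKETCH_X86_DISPATCH
    if (__builtin_cpu_supports("avx2")) return sum_avx2(p, n);
#endif
    return sum_baseline(p, n);
}
```

A library built this way behaves the same no matter which -m flags the calling code is compiled with, which is why the flag may have no effect on separately compiled functions.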