I am planning to implement runtime detection of SIMD extensions. If I find that the processor supports AVX2, is it also guaranteed to support SSE4.2 and AVX?
Support for a more recent Intel SIMD ISA extension implies support for the earlier ones.
AVX2 definitely implies AVX1.
I think AVX1 implies that all of the SSE / SSE2 / SSE3 / SSSE3 / SSE4.1 / SSE4.2 feature bits must also be set in CPUID. Even if that isn't formally guaranteed, a lot of software makes this assumption, and a CPU that violated it would probably not be commercially viable for general use.
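Since the question is about runtime detection, here is a minimal sketch of a dispatcher that tests each level explicitly instead of inferring the older ones from AVX2. It assumes GCC or clang on x86 and uses their `__builtin_cpu_init` / `__builtin_cpu_supports` builtins. The `simd_level` enum and both function names are made up for this example.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical dispatch levels for this sketch; not from any library. */
enum simd_level { SIMD_BASELINE = 0, SIMD_SSE42, SIMD_AVX, SIMD_AVX2 };

/* Returns the highest level for which every required feature is reported.
 * Each level is tested explicitly instead of being inferred from AVX2. */
enum simd_level detect_simd_level(void)
{
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();

    bool sse42 = __builtin_cpu_supports("sse4.2") != 0;
    bool avx   = __builtin_cpu_supports("avx") != 0;
    bool avx2  = __builtin_cpu_supports("avx2") != 0;

    if (sse42 && avx && avx2) return SIMD_AVX2;
    if (sse42 && avx)         return SIMD_AVX;
    if (sse42)                return SIMD_SSE42;
#endif
    /* Non-x86 target, another compiler, or none of the levels reported. */
    return SIMD_BASELINE;
}

/* Human-readable name for logging which code path was selected. */
const char *simd_level_name(enum simd_level level)
{
    static const char *const names[] = { "baseline", "sse4.2", "avx", "avx2" };
    assert((unsigned)level <= (unsigned)SIMD_AVX2);
    return names[level];
}
```

Calling `detect_simd_level()` once at startup and caching the result is usually enough. The extra checks cost nothing measurable and mean the dispatcher doesn't depend on the implication holding on an unusual CPU or VM.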
Note that popcnt has its own feature bit, so in theory, you could have a CPU with AVX2 and SSE4.2, but not popcnt, but many things treat SSE4.2 as implying popcnt. So it's more like you can advertise support for popcnt without SSE4.2.
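To make the separate feature bits concrete, here is a small sketch that reads CPUID leaf 1 directly and reports the SSE4.2 bit (ECX bit 20) and the popcnt bit (ECX bit 23) independently. It assumes GCC or clang on x86, where `<cpuid.h>` provides `__get_cpuid`. The struct and function names are invented for illustration.

```c
#include <stdbool.h>

#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>
#define SKETCH_HAVE_CPUID_H 1  /* hypothetical macro local to this sketch */
#endif

/* Hypothetical result type for this sketch. */
struct leaf1_bits {
    bool sse42;   /* CPUID.01H:ECX bit 20 */
    bool popcnt;  /* CPUID.01H:ECX bit 23 */
};

/* Reads the two feature bits; both stay false where CPUID is unavailable. */
struct leaf1_bits read_leaf1_bits(void)
{
    struct leaf1_bits bits = { false, false };
#ifdef SKETCH_HAVE_CPUID_H
    unsigned int eax, ebx, ecx, edx;
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
        bits.sse42  = ((ecx >> 20) & 1u) != 0;
        bits.popcnt = ((ecx >> 23) & 1u) != 0;
    }
#endif
    return bits;
}

/* Gate popcnt code on the popcnt bit itself, not on SSE4.2. */
bool can_use_popcnt(void)
{
    return read_leaf1_bits().popcnt;
}
```

These raw bits only describe the CPU. For AVX and later you also need the OS to save the wider register state, which this sketch does not check.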
In theory, you could make a CPU (or virtual machine) with AVX but which didn't accept the non-VEX legacy-SSE encoding of SSE4.2 instructions like pcmpistri, but I think you'd be violating Intel's guarantees about what the AVX feature bit implies. Not sure if that's formally written down in a manual, but most software will assume that. (SSE1 and SSE2 are baseline for x86-64, but not for 32-bit mode.)
But AVX1 does imply support for the VEX encoding of all SSE4.2 and earlier SIMD instructions, e.g. vpcmpistri or vminss.
gcc -mavx2 definitely implies AVX1 and previous extensions (use -Q --help=target to see), but will only emit code that uses the VEX encoding. It will define the __SSE4_2__ macro and so on, though, so GCC does treat AVX2 as implying earlier SSE extensions and popcnt, but not FMA, AES-NI, or PCLMUL. Those are separate features even for GCC.
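A quick way to see which macros a given set of flags defines is to collect them into a bitmask at compile time. This is only a sketch: the `CT_*` flag names and the function are made up, and the macros tested are the usual GCC/clang ones (`__SSE4_2__`, `__POPCNT__`, `__AVX__`, `__AVX2__`, `__FMA__`).

```c
/* Hypothetical flag names for this sketch. */
enum {
    CT_SSE42  = 1 << 0,
    CT_POPCNT = 1 << 1,
    CT_AVX    = 1 << 2,
    CT_AVX2   = 1 << 3,
    CT_FMA    = 1 << 4
};

/* Reports which SIMD feature macros the compiler defined for this
 * translation unit. This reflects the compile flags, not the CPU the
 * program happens to run on. */
unsigned compile_time_simd_flags(void)
{
    unsigned flags = 0;
#ifdef __SSE4_2__
    flags |= CT_SSE42;
#endif
#ifdef __POPCNT__
    flags |= CT_POPCNT;
#endif
#ifdef __AVX__
    flags |= CT_AVX;
#endif
#ifdef __AVX2__
    flags |= CT_AVX2;
#endif
#ifdef __FMA__
    flags |= CT_FMA;
#endif
    return flags;
}
```

Built with gcc -mavx2, I'd expect the result to include CT_AVX2, CT_AVX, CT_SSE42 and CT_POPCNT but not CT_FMA, matching the description above. Adding -mfma should set CT_FMA as well.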
(In practice you should use gcc -march=native or gcc -march=znver1 or whatever to enable all the features your CPU has and set tuning options for it, not just -mavx2 -mfma: that leaves tuning settings at bad defaults like splitting every possibly-unaligned 256-bit load/store into 128-bit halves. Or, with recent GCC/clang, use -march=x86-64-v3 (https://en.wikipedia.org/wiki/X86-64#Microarchitecture_levels), which leaves -mtune=generic, but these days that's appropriate for AVX2 CPUs: it's been a decade since Haswell.)
(Note that MSVC doesn't have as many SIMD ISA detection macros; it has one for AVX but not for all of the earlier SSE* extensions. MSVC's model is designed around the assumption that programs will do runtime CPU detection instead of being compiled for the local machine. Although MSVC does now have -arch:AVX, -arch:AVX2, and -arch:AVX512 options to use those as baselines.)
Note that AVX512 kind of breaks the traditions. AVX512F implies support for AVX2 and everything before it, but beyond that AVX512DQ doesn't come "before" or "after" AVX512ER, for example. You can (in theory) have either, both, or neither. (In practice, Skylake-X/Cannonlake/etc. have only a bit of overlap with Xeon Phi (Knights Landing / Knights Mill), beyond AVX512F. See https://en.wikipedia.org/wiki/AVX-512#CPUs_with_AVX-512)
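Because the AVX-512 subsets aren't ordered, runtime detection has to test each subset a kernel needs. Here is a rough sketch, again assuming GCC or clang on x86 and their `__builtin_cpu_supports` builtin. The struct, the function names, and the "needs F + DQ + VL" requirement are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical report type for this sketch. */
struct avx512_report {
    bool avx2;  /* checked too, although AVX512F should imply it */
    bool f;     /* AVX512F, the foundation subset */
    bool dq;    /* AVX512DQ */
    bool bw;    /* AVX512BW */
    bool vl;    /* AVX512VL */
};

/* Queries each subset independently; nothing beyond F is inferred. */
struct avx512_report detect_avx512(void)
{
    struct avx512_report r = { false, false, false, false, false };
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    r.avx2 = __builtin_cpu_supports("avx2") != 0;
    r.f    = __builtin_cpu_supports("avx512f") != 0;
    r.dq   = __builtin_cpu_supports("avx512dq") != 0;
    r.bw   = __builtin_cpu_supports("avx512bw") != 0;
    r.vl   = __builtin_cpu_supports("avx512vl") != 0;
#endif
    return r;
}

/* Example requirement for an imaginary kernel that uses AVX512DQ
 * instructions on 128/256-bit vectors, so it needs F, DQ and VL. */
bool can_use_dq_vl_kernel(void)
{
    struct avx512_report r = detect_avx512();
    return r.f && r.dq && r.vl;
}
```

The point of the design is that each kernel states the exact set of subsets it needs, rather than comparing against a single "AVX-512 level".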