I have written a library, where I use CMake for verifying the presence of headers for MMX, SSE, SSE2, SSE4, AVX, AVX2, and AVX-512. In addition to this, I check for the presence of the instructions and if present, I add the necessary compiler flags, -msse2 -mavx -mfma etc.
This is all very good, but I would like to deploy a single binary, which works across a range of generations of processors.
Question: Is it possible to tell the compiler (GCC) that whenever it optimizes a function using SIMD, it must generate code for a list of architectures? And of of course introduce high-level branches
I am thinking similar to how the compiler generates code for functions, where input pointers are either 4 or 8 byte aligned. To prevent this, I use the __builtin_assume_aligned
macro.
What is best practice? Multiple binaries? Naming?
As long as you don't care about portability, yes.
Recent versions of GCC make this easier than any other compiler I'm aware of by using the target_clones function attribute. Just add the attribute, with a list of targets you want to create versions for, and GCC will automatically create the different variants, as well as a dispatch function to choose a version automatically at runtime.
If you want a bit more portability you can use the target attribute, which clang and icc also support, but you'll have to write the dispatch function yourself (which isn't difficult), and emit the function multiple times (generally using a macro, or repeatedly including a header).
AFAIK, if you want your code to work with MSVC you'll need multiple compiler invocations with different options.