I'm starting to learn a little bit about SIMD intrinsics. I noticed that for some functions there is an aligned and an unaligned version, for example _mm_store_si128 and _mm_storeu_si128. My question is, do these functions perform differently, and if not, why are there two different versions?
I'd say "always align (wherever possible)"; that way you are covered no matter what. Some platforms do not support unaligned access at all, and others suffer a substantial performance penalty for it. If you go for aligned access, you get optimal performance in either case. Alignment may cost a little memory on some platforms, but it is well worth it, because if you go SIMD, you are going for performance. I can think of no reason to implement an unaligned code path, except maybe when you have to deal with some old design that wasn't built with SIMD in mind, and I'd say the odds of that are slim.
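To make the difference between the two stores concrete, here is a minimal sketch, assuming an x86 target with SSE2 and a compiler whose intrinsics headers provide _mm_malloc/_mm_free (GCC, Clang and MSVC do). _mm_store_si128 requires a 16-byte-aligned address and may fault otherwise, while _mm_storeu_si128 accepts any address. The helper names (is_aligned16, alloc_i32_aligned16, fill_aligned, fill_unaligned) are made up for illustration.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics; pulls in _mm_malloc/_mm_free as well */
#include <stddef.h>
#include <stdint.h>

/* Nonzero if p lies on a 16-byte boundary. */
int is_aligned16(const void *p) {
    return ((uintptr_t)p & 15u) == 0;
}

/* Hypothetical helper: allocate n ints starting on a 16-byte boundary.
   The result must be released with _mm_free, not free. */
int32_t *alloc_i32_aligned16(size_t n) {
    return (int32_t *)_mm_malloc(n * sizeof(int32_t), 16);
}

/* Fill n ints (n must be a multiple of 4) using the aligned store.
   dst must be 16-byte aligned; the assert documents that precondition. */
void fill_aligned(int32_t *dst, size_t n, int32_t value) {
    const __m128i v = _mm_set1_epi32(value);
    assert(is_aligned16(dst));
    for (size_t i = 0; i < n; i += 4)
        _mm_store_si128((__m128i *)(dst + i), v);
}

/* Same loop with the unaligned store; dst may be any valid int32_t address. */
void fill_unaligned(int32_t *dst, size_t n, int32_t value) {
    const __m128i v = _mm_set1_epi32(value);
    for (size_t i = 0; i < n; i += 4)
        _mm_storeu_si128((__m128i *)(dst + i), v);
}
```

If you control the allocation, as in alloc_i32_aligned16, the aligned variant is always safe to use and the unaligned path is only needed for buffers you did not allocate yourself.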
I'd say the same applies to scalars as well: proper alignment is the right choice in any case, and it saves you some trouble when you are after optimal performance.
As for why unaligned access might be slower or even unsupported: it is because of how the hardware works. Say you have a 64-bit integer and a 64-bit memory controller. If your integer is properly aligned, the memory controller can access it in a single operation. But if it is offset, the memory controller has to do two operations, and the CPU may need to shift data around to compose the value properly. And since that is suboptimal, some platforms don't even support it implicitly, as a means of enforcing efficiency.
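The "one operation versus two" picture can be sketched as simple arithmetic. This is a simplified model, not a description of any particular CPU (real hardware also involves caches and cache-line boundaries), and the function name words_touched is made up for illustration. It counts how many aligned memory words an access overlaps.

```c
#include <stddef.h>
#include <stdint.h>

/* Number of aligned word_size-byte words that an access of `size` bytes
   starting at address `addr` overlaps. Assumes size > 0 and word_size > 0. */
size_t words_touched(uintptr_t addr, size_t size, size_t word_size) {
    uintptr_t first = addr / word_size;               /* index of the word holding the first byte */
    uintptr_t last  = (addr + size - 1) / word_size;  /* index of the word holding the last byte */
    return (size_t)(last - first + 1);
}
```

With 8-byte words, an 8-byte integer at address 0x1000 overlaps one word, while the same integer at 0x1004 overlaps two, so in this model the memory system has to fetch both words and stitch the value together.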