I found that the following code (a C file) compiles successfully on x86_64 with gcc 10.1.0.
#include <immintrin.h>
#include <stdint.h>
#include <stdio.h>
typedef union {
    __m64 x;
#if defined(__arm__) || defined(__aarch64__)
    int32x2_t d[1];
#endif
    uint8_t i8u[8];
} u_m64;
int main()
{
    u_m64 a, b, c;
    c.x = a.x + b.x;
    return 0;
}
But there are lots of add functions for __m64, such as _mm_add_pi16, _mm_hadd_pi16, _mm_add_si64 and so on (the same applies to __m128, __m256, ...). So which one is called by the '+' operator? And how can operator overloading be used in a C file?
Yeah, gcc and clang provide basic operators for builtin SIMD types, which is frankly so beyond stupid that it's not even remotely funny :(
Anyhow, this mechanism doesn't work the same way as operator overloading in C++. What it actually does is make __m64 a true built-in vector type (on a par with int/float) through the compilers' vector extensions, so the operators exist at the language level rather than the overload level. That's why it works in C.
In this case I would assume it performs a plain element-wise add (rather than a horizontal add).
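As a rough illustration of that language-level mechanism, here is a minimal sketch using the generic vector_size attribute that GCC and Clang both accept. The names v2si, add_v2si and demo_add are made up for this example, and the two-lane layout is just an assumption picked for the demo, not a statement about how any particular header defines __m64.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical type name: two 32-bit lanes in 8 bytes, declared with the
   GCC/Clang vector_size attribute (not an MMX-specific type). */
typedef int32_t v2si __attribute__((vector_size(8)));

/* '+' here is a built-in, element-wise operator supplied by the compiler.
   No function is called and nothing is overloaded. */
static v2si add_v2si(v2si a, v2si b)
{
    return a + b;
}

/* Adds {1, 2} and {10, 20} lane by lane and stores both lanes in out. */
static void demo_add(int32_t out[2])
{
    v2si a = {1, 2};
    v2si b = {10, 20};
    v2si c = add_v2si(a, b);

    assert(sizeof c == 8);  /* same size as an __m64 */
    out[0] = c[0];          /* vector subscripting is also an extension */
    out[1] = c[1];
}
```

Calling demo_add from main with a two-element int32_t array leaves 11 and 22 in it. The lane width comes entirely from the element type in the typedef, which is the only information the compiler has when it sees '+'.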
However, we now hit the biggest problem: how you intend the contents of an __m64 to be interpreted is NOT known to the compiler!
Within any given __m64, we could be storing any of:
8 x int8
4 x int16
2 x int32
8 x uint8
4 x uint16
2 x uint32
For addition (ignoring the saturating variants), that means the addition operator could be calling any one of these perfectly valid choices:
_mm_add_pi8
_mm_add_pi16
_mm_add_pi32
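The compiler has to settle on a single one of these, based on whatever element type its headers use when declaring __m64 (as far as I can tell, GCC's mmintrin.h declares it as a vector of two ints, so there '+' behaves like _mm_add_pi32). Whichever one it picks, it's going to be the wrong instruction for the other two lane widths, i.e. 66.66% of the time :(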
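To make the lane-width problem concrete, here is a small sketch that adds the same 8 bytes three ways. It uses the generic GCC/Clang vector_size attribute instead of __m64 so that it isn't tied to MMX. The typedefs and the add_as_u8, add_as_u16 and add_as_u32 helpers are hypothetical names invented for this example, and unsigned lanes are used so that wrap-around is well defined.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Three hypothetical 8-byte vector types that differ only in lane width. */
typedef uint8_t  v8u8  __attribute__((vector_size(8)));
typedef uint16_t v4u16 __attribute__((vector_size(8)));
typedef uint32_t v2u32 __attribute__((vector_size(8)));

/* Add x and y as eight independent 8-bit lanes. */
static uint64_t add_as_u8(uint64_t x, uint64_t y)
{
    v8u8 a, b, c;
    uint64_t r;
    assert(sizeof a == sizeof x);  /* both are 8 bytes */
    memcpy(&a, &x, sizeof a);
    memcpy(&b, &y, sizeof b);
    c = a + b;                     /* carries stop at every byte boundary */
    memcpy(&r, &c, sizeof r);
    return r;
}

/* Add x and y as four independent 16-bit lanes. */
static uint64_t add_as_u16(uint64_t x, uint64_t y)
{
    v4u16 a, b, c;
    uint64_t r;
    memcpy(&a, &x, sizeof a);
    memcpy(&b, &y, sizeof b);
    c = a + b;                     /* carries stop at every 16-bit boundary */
    memcpy(&r, &c, sizeof r);
    return r;
}

/* Add x and y as two independent 32-bit lanes. */
static uint64_t add_as_u32(uint64_t x, uint64_t y)
{
    v2u32 a, b, c;
    uint64_t r;
    memcpy(&a, &x, sizeof a);
    memcpy(&b, &y, sizeof b);
    c = a + b;                     /* carries stop at every 32-bit boundary */
    memcpy(&r, &c, sizeof r);
    return r;
}
```

With x = 0xFFFF and y = 1, the three helpers return 0xFF00, 0x0 and 0x10000 respectively: identical input bits, three different sums. That is why spelling out the intrinsic you mean (_mm_add_pi8, _mm_add_pi16 or _mm_add_pi32) is safer than relying on '+'.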