Here is the code:
#include <immintrin.h>
#include <stdio.h>
#include <memory>
#include <iostream>
__m256i foo();
__m256i foo2();
int main() {
    __m256i vec1 = foo();
    __m256i vec2 = foo2();
    __m256i result = _mm256_add_epi32(vec1, vec2);
    /* Display the elements of the result vector */
    int32_t* res = (int32_t*)&vec1;
    std::cout << res[0] << std::endl;
    std::cout << res[1] << std::endl;
    std::cout << res[2] << std::endl;
    std::cout << res[3] << std::endl;
    std::cout << res[4] << std::endl;
    std::cout << res[5] << std::endl;
    std::cout << res[6] << std::endl;
    std::cout << res[7] << std::endl;
    system("pause");
    return 0;
}
__m256i foo() {
    __m256i v = { 1, 2, 3, 4, 5, 6, 7, 8 };
    return v;
}
__m256i foo2() {
    __m256i w = { 1, 2, 3, 4, 5, 6, 7, 8 };
    return w;
}
I have delved deep and dug hard on the internet to try to learn how this AVX stuff works, but I still don't fully get it. Shouldn't the above code display this:
2, 4, 6, 8, 10, 12, 14, 16? After all, I am adding two vectors of eight 32-bit integers together.
The output I get in the console when running this is:
67305985 134678021 0 0 0 0 0 0
How could this be? I am an experienced C# programmer and am trying to learn C++ now. Thanks in advance for any response/explanation!
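To initialize a __m256i register, use the intrinsic intended for it rather than a brace initializer. For 32-bit values, that is _mm256_set_epi32.
The brace initializer does not do what you expect: with MSVC, __m256i is a union whose first member is the 32-byte array m256i_i8, so { 1, 2, 3, 4, 5, 6, 7, 8 } fills only the first eight bytes and zeroes the rest. Read back as 32-bit integers on little-endian x86, bytes 01 02 03 04 become 0x04030201 = 67305985 and bytes 05 06 07 08 become 0x08070605 = 134678021, which is exactly the output you saw. Other compilers define __m256i differently, so brace initialization is not portable either.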
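Note that _mm256_set_epi32 takes its arguments in the opposite order from what you probably expect: the first argument goes into the highest element (7) and the last argument into element 0. Alternatively, use _mm256_setr_epi32, which accepts them in reverse order, element 0 first.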
Therefore your foo and foo2 should be:
__m256i foo() {
    __m256i v = _mm256_set_epi32(8, 7, 6, 5, 4, 3, 2, 1);
    return v;
}
__m256i foo2() {
    __m256i w = _mm256_set_epi32(8, 7, 6, 5, 4, 3, 2, 1);
    return w;
}
Another issue in your code is probably a typo: you initialized int32_t* res from vec1 instead of from result.
Note:
Accessing the __m256i through a pointer to int32_t, as you do for printing, violates the strict aliasing rule. MSVC tolerates it (it behaves as if compiled with GCC's -fno-strict-aliasing), but proper access requires copying the data into a buffer of int32_t values, for example with _mm256_storeu_si256 or memcpy, and printing from that buffer.
You can see more info in this post: print a __m128i variable.