Can someone please explain why tr2 and tr4 show different results:
auto test1 = _mm256_set1_epi8(-1);
uint64_t tr2 = _mm256_movemask_epi8(test1);
uint32_t tr3 = _mm256_movemask_epi8(test1);
uint64_t tr4 = tr3;
_mm256_movemask_epi8(test1) returns a 32-bit int, so I expected that assigning it to a uint64_t would just set the lower 32 bits.
Instead, tr2 prints 0xFFFFFFFFFFFFFFFF and tr4 prints 0x00000000FFFFFFFF.
Is there any performance cost in doing it the way tr4 does?
I'm new to both C++ and intrinsics so maybe I'm missing something obvious.
I'm using Visual Studio 2019 C++ compiler.
As Paul said above, this comes down to how signed and unsigned values are converted to wider integer types: a signed value is sign-extended, while an unsigned value is zero-extended. Here's an example:
#include <cstdint>
#include <iomanip>
#include <iostream>

int main()
{
    std::int32_t negInt = -1;
    // Same width, only the interpretation changes: 0xffffffff
    std::uint32_t unInt = static_cast<std::uint32_t>(negInt);
    // Signed source converted to a wider type: sign-extended
    std::int64_t negBigInt = static_cast<std::int64_t>(negInt);
    std::uint64_t unBigInt = static_cast<std::uint64_t>(negInt);
    // Unsigned source converted to a wider type: zero-extended
    std::uint64_t fromUnsigned = static_cast<std::uint64_t>(unInt);

    std::cout << std::hex;
    std::cout << "0x" << std::setfill('0') << std::setw(16) << negInt << "\n";
    std::cout << "0x" << std::setfill('0') << std::setw(16) << unInt << "\n";
    std::cout << "0x" << std::setfill('0') << std::setw(16) << negBigInt << "\n";
    std::cout << "0x" << std::setfill('0') << std::setw(16) << unBigInt << "\n";
    std::cout << "0x" << std::setfill('0') << std::setw(16) << fromUnsigned << "\n";
}
This prints:
0x00000000ffffffff
0x00000000ffffffff
0xffffffffffffffff
0xffffffffffffffff
0x00000000ffffffff
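So Paul is right: the upper bits get set whenever a negative signed value is converted to a wider type, whether the destination is signed or unsigned. Notably, this doesn't happen if you first convert the value to an unsigned type of the same width and only then widen it, which is exactly what tr3 and tr4 do.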
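To apply this to the original question without a temporary variable, here is a minimal sketch. The helper names widen_sign_extended and widen_zero_extended are made up for illustration, and the int parameter stands in for the value returned by _mm256_movemask_epi8, so the snippet compiles without AVX2 support. It assumes int is 32 bits wide, as it is with Visual Studio.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration only; `mask` stands in for the
// int returned by _mm256_movemask_epi8.

// Direct conversion: the signed value is sign-extended, so a mask with
// bit 31 set also gets bits 32..63 set (this is what happens to tr2).
constexpr std::uint64_t widen_sign_extended(int mask) {
    return static_cast<std::uint64_t>(mask);
}

// Convert to uint32_t first: widening an unsigned value zero-extends,
// so bits 32..63 stay clear (this is what happens to tr4).
constexpr std::uint64_t widen_zero_extended(int mask) {
    return static_cast<std::uint32_t>(mask);
}

// Checked at compile time.
static_assert(widen_sign_extended(-1) == 0xFFFFFFFFFFFFFFFFull, "sign-extended");
static_assert(widen_zero_extended(-1) == 0x00000000FFFFFFFFull, "zero-extended");
// With bit 31 clear, both conversions give the same result.
static_assert(widen_sign_extended(0x7FFFFFFF) == widen_zero_extended(0x7FFFFFFF),
              "no difference when the value is non-negative");
```

In the original code this would amount to writing uint64_t tr = static_cast<uint32_t>(_mm256_movemask_epi8(test1));. As for performance, on x86-64 writing a 32-bit register already clears the upper half of the 64-bit register, so the zero-extending version should normally cost nothing extra. Check the generated assembly if it matters for your case.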