I found a bug in my program caused by misusing the SSE '_mm_extract_epi16' intrinsic, as in the code below:
#include <smmintrin.h>
#include <cstdint>
#include <iostream>
int main(int argc, const char * argv[]) {
    int16_t test_input[8] = {-1, 2, -3, -4, -5, -6, -7, -8};
    // Unaligned load: test_input is not guaranteed to be 16-byte aligned.
    __m128i v_input = _mm_loadu_si128((const __m128i *)test_input);
    int32_t extract = (int32_t)(_mm_extract_epi16(v_input, 1));
    return 0;
}
If the extracted value is positive, I get the correct value, 2. If it is negative, I get a wrong value such as '65533'. Alternatively, the code below gives the correct value:
#include <smmintrin.h>
#include <cstdint>
#include <iostream>
int main(int argc, const char * argv[]) {
    int16_t test_input[8] = {-1, 2, -3, -4, -5, -6, -7, -8};
    // Unaligned load: test_input is not guaranteed to be 16-byte aligned.
    __m128i v_input = _mm_loadu_si128((const __m128i *)test_input);
    int16_t extract = (_mm_extract_epi16(v_input, 1));
    int32_t result = extract;
    return 0;
}
I don't know why it happens.
int _mm_extract_epi16(__m128i a, int imm) matches the asm behaviour of the pextrw instruction, which zero-extends the 16-bit element into a 32-bit register.
Intel's intrinsics API uses int all over the place, even when an unsigned type would be more appropriate.
If you want to do 16-bit sign-extension on the result, use (int16_t)_mm_extract_epi16(v,1). Or assign it to an int16_t variable so the upper bytes of the result are ignored to start with.
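To make the difference concrete, here is a minimal sketch that extracts the same lane both ways. The two helper names are hypothetical and only for illustration, and lane 2 is used because it holds -3 in the question's test data.

```cpp
#include <emmintrin.h>  // SSE2: _mm_loadu_si128, _mm_extract_epi16
#include <cstdint>

// Hypothetical helper: returns lane 2 exactly as the intrinsic delivers it,
// i.e. the 16-bit element zero-extended into an int.
inline int extract_lane2_zero_extended(const int16_t (&lanes)[8]) {
    __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i *>(lanes));
    return _mm_extract_epi16(v, 2);
}

// Hypothetical helper: narrows the result to int16_t first, so the
// conversion back to int32_t sign-extends from bit 15.
inline int32_t extract_lane2_sign_extended(const int16_t (&lanes)[8]) {
    __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i *>(lanes));
    return static_cast<int16_t>(_mm_extract_epi16(v, 2));
}
```

With the array from the question, the first helper should give 65533 and the second -3. The lane index has to be a compile-time constant, which is why it is hard-coded here rather than passed as a function argument.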
Unsigned 65533 = 2's complement -3. This is normal. (2^16 - 3 = 65533 = 0xfffd)
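The same arithmetic can be written out without any intrinsics. This is only a sketch of the 16-bit two's complement rule, and sign_extend_16 is a hypothetical name, not part of any library.

```cpp
#include <cstdint>

// Hypothetical helper: interprets the low 16 bits of a zero-extended value
// as a signed 16-bit number. If bit 15 is set, subtract 2^16.
constexpr int32_t sign_extend_16(uint32_t zero_extended) {
    return (zero_extended & 0x8000u)
        ? static_cast<int32_t>(zero_extended & 0xFFFFu) - 0x10000
        : static_cast<int32_t>(zero_extended & 0xFFFFu);
}

// 65533 = 0xfffd has bit 15 set, so it maps to 65533 - 65536 = -3.
static_assert(sign_extend_16(65533u) == -3, "0xfffd is -3 as int16_t");
// Values below 0x8000 are unchanged.
static_assert(sign_extend_16(2u) == 2, "positive values pass through");
```

In practice the cast to int16_t does this for you; the explicit version just shows where 65533 comes from.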