I'm writing some SSE/AVX code and there's a task to divide packed signed 32-bit integers (2's complement) using a right shift. When the values are positive the shift works fine, but it produces wrong results for negative values because the sign bit gets shifted too.
Is there any SIMD operation that lets me shift preserving the position of the sign bit? Thanks
SSE2/AVX2 have a choice of arithmetic (see footnote 1) vs. logical right shifts for 16 and 32-bit element sizes. (For 64-bit elements, only logical is available until AVX512.)
Use _mm_srai_epi32 (psrad) instead of _mm_srli_epi32 (psrld).
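To make the difference concrete, here is a minimal sketch in C using only SSE2. The helper names halve_arithmetic and halve_logical are made up for this illustration; the intrinsics are the two named above.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: shift four signed 32-bit lanes right by 1
 * with an arithmetic shift (psrad), which copies the sign bit in. */
static void halve_arithmetic(const int32_t in[4], int32_t out[4])
{
    __m128i v = _mm_loadu_si128((const __m128i *)in);
    v = _mm_srai_epi32(v, 1);
    _mm_storeu_si128((__m128i *)out, v);
}

/* Hypothetical helper: the same shift done logically (psrld),
 * which shifts in a zero and so destroys the sign of negative lanes. */
static void halve_logical(const int32_t in[4], int32_t out[4])
{
    __m128i v = _mm_loadu_si128((const __m128i *)in);
    v = _mm_srli_epi32(v, 1);
    _mm_storeu_si128((__m128i *)out, v);
}
```

With the input {8, -8, 7, -7}, the arithmetic version keeps negative lanes negative, while the logical version turns them into large positive numbers.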
See Intel's intrinsics guide, and other links in the SSE tag wiki https://stackoverflow.com/tags/sse/info. (Filter it to exclude AVX512 if you want, because it's pretty cluttered these days with all the masked versions for all 3 sizes...)
Or just look at the asm instruction-set reference, which includes intrinsics for instructions that have them. Searching for "arithmetic" in http://felixcloutier.com/x86/index.html finds the shifts you want.
Note the a=arithmetic vs. l=logical, instead of the usual intrinsics naming scheme of epu32 for unsigned. The asm mnemonics are simple and consistent (e.g. Packed Shift Right Arithmetic Dword = psrad).
Arithmetic right shifts are also available for AVX2 variable shifts (vpsravd), and for the one-variable-for-all-elements version of the immediate shifts.
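As a rough sketch of the one-count-for-all-elements form, the SSE2 intrinsic _mm_sra_epi32 takes the shift count from the low 64 bits of a second vector, so the count can be a runtime value. The function name shift_right_runtime is hypothetical. (The AVX2 per-element form is _mm_srav_epi32; it is not shown here because it needs AVX2 enabled at compile time.)

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: arithmetic right shift of four signed 32-bit
 * lanes by the same runtime count (psrad with an xmm count operand). */
static void shift_right_runtime(const int32_t in[4], int32_t out[4], int count)
{
    __m128i v   = _mm_loadu_si128((const __m128i *)in);
    __m128i cnt = _mm_cvtsi32_si128(count);  /* count in the low element */
    v = _mm_sra_epi32(v, cnt);
    _mm_storeu_si128((__m128i *)out, v);
}
```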
Footnote 1:
Arithmetic right shifts shift in copies of the sign bit, instead of zero.
This correctly implements 2's complement signed division by powers of 2 with rounding towards negative infinity, unlike the truncation toward zero you get from C signed division. Look at the asm output for int foo(int a){return a/4;} to see how compilers implement signed division semantics in terms of shifts.
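If you do want C-style truncation toward zero, one common approach (similar in spirit to what compilers emit for a/4) is to add a bias of divisor-1 to negative lanes before the arithmetic shift. The sketch below is an illustration for dividing by 4 only; the function names div4_floor and div4_trunc are hypothetical.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: plain arithmetic shift, rounds toward -infinity. */
static void div4_floor(const int32_t in[4], int32_t out[4])
{
    __m128i v = _mm_loadu_si128((const __m128i *)in);
    _mm_storeu_si128((__m128i *)out, _mm_srai_epi32(v, 2));
}

/* Hypothetical helper: rounds toward zero, matching C's in[i] / 4. */
static void div4_trunc(const int32_t in[4], int32_t out[4])
{
    __m128i v    = _mm_loadu_si128((const __m128i *)in);
    __m128i sign = _mm_srai_epi32(v, 31);     /* 0 for x >= 0, -1 for x < 0 */
    __m128i bias = _mm_srli_epi32(sign, 30);  /* 0 for x >= 0,  3 for x < 0 */
    __m128i q    = _mm_srai_epi32(_mm_add_epi32(v, bias), 2);
    _mm_storeu_si128((__m128i *)out, q);
}
```

For the input {-7, 7, -8, -1}, div4_floor rounds the negative lanes down, while div4_trunc agrees with scalar in[i] / 4 in every lane.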