Looking through the intel intrinsics guide, I saw this instruction. Looking through the naming pattern, the meaning should be clear: "Shift 128-bit register left by a fixed number of bits", but it is not. In actuality it shifts by a fixed number of bytes, which makes it exactly the same as _mm_bslli_si128
.
_mm_slli_epi32
or _mm_slli_epi64
?_mm_bslli_si128
?1 that’s not an oversight. That instruction indeed shifts by bytes, i.e. multiples of 8 bits.
2 doesn’t matter, _mm_slli_si128
and _mm_bslli_si128
are equivalents, both compile into pslldq
SSE2 instruction.
As for the emulation, I’d do it like that, assuming you have C++/17. If you’re writing C++/14, replace if constexpr
with normal if
, also add a message to the static_assert
.
template<int i>
inline __m128i shiftLeftBits( __m128i vec )
{
static_assert( i >= 0 && i < 128 );
// Handle couple trivial cases
if constexpr( 0 == i )
return vec;
if constexpr( 0 == ( i % 8 ) )
return _mm_slli_si128( vec, i / 8 );
if constexpr( i > 64 )
{
// Shifting by more than 8 bytes, the lowest half will be all zeros
vec = _mm_slli_si128( vec, 8 );
return _mm_slli_epi64( vec, i - 64 );
}
else
{
// Shifting by less than 8 bytes.
// Need to propagate a few bits across 64-bit lanes.
__m128i low = _mm_slli_si128( vec, 8 );
__m128i high = _mm_slli_epi64( vec, i );
low = _mm_srli_epi64( low, 64 - i );
return _mm_or_si128( low, high );
}
}