c sse simd intrinsics sse2

How to best emulate the logical meaning of _mm_slli_si128 (128-bit bit-shift), not _mm_bslli_si128


While looking through the Intel Intrinsics Guide, I came across _mm_slli_si128. Going by the naming pattern, its meaning should be clear: "Shift 128-bit register left by a fixed number of bits". It is not. It actually shifts by a fixed number of bytes, which makes it exactly the same as _mm_bslli_si128.


Solution

  • 1. That’s not an oversight. The instruction really does shift by bytes, i.e. by multiples of 8 bits.

    2. It doesn’t matter: _mm_slli_si128 and _mm_bslli_si128 are equivalent, and both compile to the SSE2 pslldq instruction.

    As for the emulation, here is how I’d do it, assuming you have C++17. If you’re writing C++14, replace if constexpr with a plain if, and add a message to the static_assert.

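    #include <emmintrin.h> // SSE2 intrinsics

    // Shift the whole 128-bit vector left by `i` bits, shifting in zeros.
    template<int i>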
    inline __m128i shiftLeftBits( __m128i vec )
    {
        static_assert( i >= 0 && i < 128 );
        // Handle a couple of trivial cases
        if constexpr( 0 == i )
            return vec;
        if constexpr( 0 == ( i % 8 ) )
            return _mm_slli_si128( vec, i / 8 );
    
        if constexpr( i > 64 )
        {
            // Shifting by more than 8 bytes, the lowest half will be all zeros
            vec = _mm_slli_si128( vec, 8 );
            return _mm_slli_epi64( vec, i - 64 );
        }
        else
        {
            // Shifting by less than 8 bytes.
            // Need to propagate a few bits across 64-bit lanes.
            __m128i low = _mm_slli_si128( vec, 8 );
            __m128i high = _mm_slli_epi64( vec, i );
            low = _mm_srli_epi64( low, 64 - i );
            return _mm_or_si128( low, high );
        }
    }
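
To sanity-check the lane logic above, it can help to have a plain scalar model of the same shift. The sketch below is only an illustration, not part of the answer's code: the `U128` struct and the `shiftLeftBitsScalar` name are hypothetical, and it takes the shift count at run time instead of as a template argument. It follows the same cases as the vector version: a shift of 64 bits or more moves the low lane into the high lane, and a smaller shift carries the top bits of the low lane into the high lane.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical scalar stand-in for __m128i: two 64-bit lanes.
struct U128
{
    std::uint64_t lo; // bits 0..63
    std::uint64_t hi; // bits 64..127
};

// Scalar model of a 128-bit left shift by n bits, 0 <= n < 128.
inline U128 shiftLeftBitsScalar( U128 v, int n )
{
    assert( n >= 0 && n < 128 );
    if( n == 0 )
        return v;
    if( n >= 64 )
    {
        // The low lane moves into the high lane; the low lane becomes zero.
        return U128{ 0, v.lo << ( n - 64 ) };
    }
    // 0 < n < 64: the top n bits of the low lane carry into the high lane.
    const std::uint64_t carry = v.lo >> ( 64 - n );
    return U128{ v.lo << n, ( v.hi << n ) | carry };
}
```

One way to compare the two would be to write the vector result into a `U128` with `_mm_storeu_si128` and check both lanes against the scalar model for every shift count you care about.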