c x86 simd sse sse4

Intrinsic inverse to _mm_movemask_epi8


So first I'll just describe the task:

I need to:

  1. Compare two __m128i.
  2. Somehow do the bitwise and of the result with a certain uint16_t value (probably using _mm_movemask_epi8 first and then just &).
  3. Do the blend of the initial values based on the result of that.

So the problem, as you might have guessed, is that blend takes an __m128i as its mask and I will have a uint16_t. So I either need some sort of inverse instruction for _mm_movemask_epi8 or have to do something else entirely.

Some points: I probably cannot change that uint16_t value to some other type, it's complicated. I am doing this on SSE4.2, so no AVX. There's a similar question, How to perform the inverse of _mm256_movemask_epi8 (VPMOVMSKB)?, but it's about AVX and I'm very inexperienced with this, so I cannot adapt the solution.

PS: I might need to do this for ARM as well, so I would appreciate any suggestions.


Solution

  • When you do _mm_movemask_epi8 after a vector comparison, which produces -1 (0xFF) for true and 0 for false in each byte, you'll get a 16-bit integer (assuming 128-bit SSE vectors) whose nth bit is set exactly when the nth byte of the vector is -1.

    The following is the inverse operation. It needs SSSE3 for _mm_shuffle_epi8, which any SSE4.2 target provides.

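    #include <tmmintrin.h>  /* SSSE3, for _mm_shuffle_epi8 */

    static inline __m128i bitMaskToByteMask16(int m) {
      /* Lane n of each 8-byte half holds the single bit 1 << n. */
      __m128i sel = _mm_set1_epi64x((long long)0x8040201008040201ULL);
      return _mm_cmpeq_epi8(
        _mm_and_si128(
          /* Copy mask byte 0 to lanes 0-7 and mask byte 1 to lanes 8-15. */
          _mm_shuffle_epi8(_mm_cvtsi32_si128(m),
            _mm_set_epi64x(0x0101010101010101, 0)),
          sel),
        sel);  /* 0xFF where the lane's bit was set, 0x00 otherwise */
    }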
    

    Note that you might not need _mm_movemask_epi8 at all here: convert the uint16_t to a vector mask once and AND it with the comparison result using _mm_and_si128, instead of going back and forth between integer ops and vector ops.
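
As a minimal sketch, the three steps from the question can be done entirely in the vector domain by combining the conversion above with _mm_and_si128 and _mm_blendv_epi8 (SSE4.1). The names masked_select, masked_select_bytes and the SSE41_FN macro are hypothetical helpers introduced for this example, and the signed greater-than comparison is an assumption, since the question does not say which comparison is used.

```c
#include <stdint.h>
#include <immintrin.h>  /* SSE2, SSSE3 (_mm_shuffle_epi8), SSE4.1 (_mm_blendv_epi8) */

/* On GCC/Clang, enable SSE4.1 for the marked functions so the file
   builds even without -msse4.1; expands to nothing on other compilers. */
#if defined(__GNUC__)
#define SSE41_FN __attribute__((target("sse4.1")))
#else
#define SSE41_FN
#endif

/* Inverse of _mm_movemask_epi8: byte n becomes 0xFF if bit n of m is set,
   0x00 otherwise (same technique as the function above). */
SSE41_FN static __m128i bit_mask_to_byte_mask16(int m) {
    const __m128i sel = _mm_set1_epi64x((long long)0x8040201008040201ULL);
    /* Copy mask byte 0 to lanes 0-7 and mask byte 1 to lanes 8-15. */
    const __m128i spread = _mm_shuffle_epi8(
        _mm_cvtsi32_si128(m),
        _mm_set_epi64x(0x0101010101010101LL, 0));
    return _mm_cmpeq_epi8(_mm_and_si128(spread, sel), sel);
}

/* Hypothetical helper: lane i of the result is b[i] when a[i] > b[i]
   (signed) AND bit i of keep is set; otherwise it is a[i]. */
SSE41_FN static __m128i masked_select(__m128i a, __m128i b, uint16_t keep) {
    /* Step 1: compare; 0xFF in every lane where a > b. */
    const __m128i cmp = _mm_cmpgt_epi8(a, b);
    /* Step 2: AND with the integer mask, expanded to a byte mask. */
    const __m128i m = _mm_and_si128(cmp, bit_mask_to_byte_mask16(keep));
    /* Step 3: blend; takes b where the mask byte's top bit is set. */
    return _mm_blendv_epi8(a, b, m);
}

/* Array wrapper (also hypothetical) so callers can work with plain bytes. */
SSE41_FN static void masked_select_bytes(const int8_t *a, const int8_t *b,
                                         uint16_t keep, int8_t *out) {
    const __m128i va = _mm_loadu_si128((const __m128i *)a);
    const __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, masked_select(va, vb, keep));
}
```

With keep = 0x00FF, for example, only lanes 0-7 can ever take the value from b, whatever the comparison says for lanes 8-15. Since _mm_blendv_epi8 only inspects the top bit of each mask byte, any mask whose bytes are all-ones or all-zeros works as the selector.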