For this question, I will use the notation 1 for a byte with all ones (0xFF) and 0 for a byte with all zeros.
I am looking for a way to zero all remaining bytes in an SSE register after the first zero byte, using SSE4.2 intrinsics:
Example Input:
1111'1101'1011'1000
Desired Output:
1111'1100'0000'0000
Please note that the data should remain in the SSE register. This is a trivial task with a plain byte array!
SSE4.2 provides string instructions returning masks or indexes. In your case this should work:
// generate a mask of `0xff` until first `0` entry of `a`:
__m128i mask_until_first_zero(__m128i a)
{
    // checks where `a` and `a` are equal and valid (i.e., until the first `0` byte):
    return _mm_cmpistrm(a, a, _SIDD_UNIT_MASK); // <-- _SIDD_UNIT_MASK to create byte mask instead of bit mask
    //            ^   ^-- `m` to return a mask
    //            `------ `i` for implicit length string
}
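To get from that mask to the output asked for in the question, one option is to AND the mask with the original register. The following is only a minimal sketch: the names `zero_after_first_zero`, `zero_after_first_zero_bytes`, and the `SSE42_FN` macro are made up for illustration, and the `target("sse4.2")` attribute assumes GCC or Clang (it can be dropped when compiling with `-msse4.2`). The byte-array wrapper exists only to make the function easy to try out; the actual work stays in the register.

```c
#include <stdint.h>
#include <nmmintrin.h>  /* SSE4.2 intrinsics, including _mm_cmpistrm */

/* Assumption: GCC/Clang function attribute so this builds without -msse4.2. */
#if defined(__GNUC__)
#define SSE42_FN __attribute__((target("sse4.2")))
#else
#define SSE42_FN
#endif

/* Hypothetical helper: keep the bytes before the first zero byte of `a`
   and zero every byte from there on. The data never leaves the register. */
SSE42_FN static __m128i zero_after_first_zero(__m128i a)
{
    /* Byte mask: 0xff for every byte before the first zero byte, 0x00 after. */
    __m128i mask = _mm_cmpistrm(a, a, _SIDD_UNIT_MASK);
    /* Clear everything that lies outside the mask. */
    return _mm_and_si128(a, mask);
}

/* Hypothetical wrapper for experimenting with plain byte arrays. */
SSE42_FN static void zero_after_first_zero_bytes(const uint8_t in[16],
                                                 uint8_t out[16])
{
    __m128i v = _mm_loadu_si128((const __m128i *)in);
    _mm_storeu_si128((__m128i *)out, zero_after_first_zero(v));
}
```

Reading the example from the question with the leftmost byte as byte 0, the wrapper should keep the first six bytes and clear the other ten. A register without any zero byte should pass through unchanged, since all 16 bytes then count as valid.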