I wrote a program that uses _mm_cmpistri to find the next \n (newline) character. While this works great on my computer, it fails on a server that lacks SSE 4.2 support.
Is there a good alternative that only uses instructions from SSE 4.1 or earlier?
Ok, actual code it is. This hasn't been tested, it's just to give you the idea.
__m128i lf = _mm_set1_epi8('\n');

// unaligned part: check the first 16 bytes starting at ptr
__m128i data = _mm_loadu_si128((__m128i *)ptr);
int mask = _mm_movemask_epi8(_mm_cmpeq_epi8(data, lf));
if (mask != 0)
    return ffs(mask) - 1; // ffs is 1-based; subtract 1 for a zero-based index like _mm_cmpistri

// offset of the first 16-byte-aligned address after ptr
int index = 16 - ((size_t)ptr & 15);

// aligned part, possibly overlaps unaligned part but that's ok
for (; index < length; index += 16) {
    data = _mm_load_si128((__m128i *)(ptr + index));
    mask = _mm_movemask_epi8(_mm_cmpeq_epi8(data, lf));
    if (mask != 0)
        return index + ffs(mask) - 1;
}
// falling out of the loop means no newline was found
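For reference, here is a self-contained variant of the same compare-and-movemask idea, as a sketch rather than a tuned implementation. The function name find_newline and the -1 "not found" return value are made up for illustration. Unlike the fragment above, it uses unaligned loads throughout and copies the last partial block into a zero-padded buffer, so it never reads past ptr + length. It assumes GCC or Clang, since it uses __builtin_ctz to get the zero-based position of the lowest set bit.

```c
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: zero-based offset of the first '\n' in
   ptr[0..length), or -1 if there is none. Needs only SSE2. */
static ptrdiff_t find_newline(const char *ptr, size_t length)
{
    const __m128i lf = _mm_set1_epi8('\n');
    size_t index = 0;

    /* Full 16-byte blocks; unaligned loads keep every read in bounds. */
    for (; index + 16 <= length; index += 16) {
        __m128i data = _mm_loadu_si128((const __m128i *)(ptr + index));
        int mask = _mm_movemask_epi8(_mm_cmpeq_epi8(data, lf));
        if (mask != 0)
            return (ptrdiff_t)index + __builtin_ctz((unsigned)mask);
    }

    /* Tail of fewer than 16 bytes: copy into a zero-padded block.
       The padding bytes are 0, so they can never match '\n'. */
    if (index < length) {
        char tail[16] = {0};
        memcpy(tail, ptr + index, length - index);
        __m128i data = _mm_loadu_si128((const __m128i *)tail);
        int mask = _mm_movemask_epi8(_mm_cmpeq_epi8(data, lf));
        if (mask != 0)
            return (ptrdiff_t)index + __builtin_ctz((unsigned)mask);
    }
    return -1;
}
```

The aligned-load loop from the fragment above is likely a little faster on older CPUs; this version trades that for simpler bounds handling.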
For MSVC, ffs can be defined in terms of _BitScanForward.
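A possible shim, as a sketch: the name ffs_compat is made up here to avoid clashing with the POSIX ffs declaration. Keep in mind that ffs returns a 1-based bit position (and 0 when no bit is set), while _BitScanForward reports a 0-based index, hence the + 1. The non-MSVC branch assumes GCC or Clang and uses __builtin_ffs.

```c
#if defined(_MSC_VER)
#include <intrin.h>

/* Hypothetical shim: same contract as POSIX ffs, i.e. the 1-based
   position of the lowest set bit, or 0 if mask is 0. */
static int ffs_compat(int mask)
{
    unsigned long pos;
    /* _BitScanForward returns nonzero if a set bit was found and
       stores its 0-based index in pos. */
    if (_BitScanForward(&pos, (unsigned long)mask))
        return (int)pos + 1;
    return 0;
}
#else
/* Assumes GCC or Clang, where __builtin_ffs has the ffs contract. */
static int ffs_compat(int mask)
{
    return __builtin_ffs(mask);
}
#endif
```

With this in place, ffs_compat(mask) - 1 gives the zero-based byte offset within the 16-byte block.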