c++ sse sse4

SSE 4.2: alternative to _mm_cmpistri


I wrote a program that runs _mm_cmpistri to get the next \n (newline) character. While this works great on my computer, it fails on a server due to missing SSE 4.2 support.

Is there a good alternative that uses only SSE 4.1 or earlier instructions?


Solution

  • Ok, actual code it is. This hasn't been tested; it's just to give you the idea.

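    // Needs only SSE2 (<emmintrin.h>). Loads work on whole 16-byte blocks, so a
    // match at an offset >= length has to be rejected by the caller.
    __m128i lf = _mm_set1_epi8('\n');
    // unaligned part: check the first 16 bytes with an unaligned load
    __m128i data = _mm_loadu_si128((const __m128i *)ptr);
    int mask = _mm_movemask_epi8(_mm_cmpeq_epi8(data, lf));
    if (mask != 0)
        return ffs(mask) - 1;  // ffs is 1-based, so subtract 1 for a 0-based offset
    // offset of the next 16-byte-aligned address after ptr
    int index = 16 - ((size_t)ptr & 15);
    // aligned part, possibly overlaps unaligned part but that's ok
    for (; index < length; index += 16) {
        data = _mm_load_si128((const __m128i *)(ptr + index));
        mask = _mm_movemask_epi8(_mm_cmpeq_epi8(data, lf));
        if (mask != 0)
            return index + ffs(mask) - 1;
    }
    return -1;  // no newline found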
    

    For MSVC, ffs can be defined in terms of _BitScanForward.
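
    To make the idea concrete, here is a minimal self-contained sketch under a few assumptions. The names find_newline and lowest_set_bit are hypothetical, not from the question. Unlike the aligned loop above, this variant uses unaligned loads for every full 16-byte block and a scalar loop for the tail, so it never reads outside the buffer. The bit scan uses _BitScanForward on MSVC and assumes the GCC/Clang builtin __builtin_ctz elsewhere.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics only, no SSE 4.x required
#include <cassert>
#include <cstddef>
#ifdef _MSC_VER
#include <intrin.h>     // _BitScanForward
#endif

// Hypothetical helper: 0-based index of the lowest set bit of a nonzero mask.
static inline int lowest_set_bit(unsigned mask) {
    assert(mask != 0);
#ifdef _MSC_VER
    unsigned long idx;
    _BitScanForward(&idx, mask);
    return static_cast<int>(idx);
#else
    return __builtin_ctz(mask);  // assumes GCC or Clang
#endif
}

// Hypothetical function: 0-based offset of the first '\n' in ptr[0..length),
// or -1 if there is none.
static inline std::ptrdiff_t find_newline(const char *ptr, std::size_t length) {
    const __m128i lf = _mm_set1_epi8('\n');
    std::size_t i = 0;
    // Full 16-byte blocks, compared 16 bytes at a time with unaligned loads.
    for (; i + 16 <= length; i += 16) {
        const __m128i data =
            _mm_loadu_si128(reinterpret_cast<const __m128i *>(ptr + i));
        const int mask = _mm_movemask_epi8(_mm_cmpeq_epi8(data, lf));
        if (mask != 0)
            return static_cast<std::ptrdiff_t>(i) +
                   lowest_set_bit(static_cast<unsigned>(mask));
    }
    // Scalar tail: fewer than 16 bytes remain.
    for (; i < length; ++i)
        if (ptr[i] == '\n')
            return static_cast<std::ptrdiff_t>(i);
    return -1;
}
```

    Using unaligned loads throughout trades a little speed on older CPUs for simpler bounds handling. If the aligned loop is preferred, the same lowest_set_bit helper can stand in for ffs(mask) - 1.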