assembly x86 simd sse2 sse4

SSE4.1 unsigned integer comparison with overflow


Is there any way to perform a comparison like C >= (A + B) with SSE2/4.1 instructions, given that 16-bit unsigned addition (_mm_add_epi16()) can overflow?

The code snippet looks like this:

#define _mm_cmpge_epu16(a, b) _mm_cmpeq_epi16(_mm_max_epu16(a, b), a)

__m128i *a = (__m128i *)&ptr1;
__m128i *b = (__m128i *)&ptr2;
__m128i *c = (__m128i *)&ptr3;
            
__m128i xa = _mm_lddqu_si128(a);
__m128i xb = _mm_lddqu_si128(b);
__m128i xc = _mm_lddqu_si128(c);

__m128i res = _mm_add_epi16(xa, xb);
__m128i xmm3 = _mm_cmpge_epu16(xc, res);

The issue is that when the 16-bit addition overflows (wraps around), the greater-than-or-equal comparison yields false positives. I can't use saturated addition for my purpose. I have looked at the mechanism for detecting overflow of unsigned addition described in "SSE2 integer overflow checking", but how do I use it for a greater-than comparison?


Solution

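  • You can build the missing primitives from what is available in the instruction set. Here’s one possible implementation, untested. It detects overflow by comparing the wrapping sum against the saturating sum, which differ only when a + b does not fit in 16 bits, and uses that as a mask on the comparison result. Note that _mm_max_epu16 requires SSE4.1; the other intrinsics used are SSE2. Disassembly.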

    // Compare uint16_t lanes for a >= b
    inline __m128i cmpge_epu16( __m128i a, __m128i b )
    {
        const __m128i max = _mm_max_epu16( a, b );
        return _mm_cmpeq_epi16( max, a );
    }
    
    // Compare uint16_t lanes for c >= a + b, with overflow handling
    __m128i cmpgeSum( __m128i a, __m128i b, __m128i c )
    {
        // Compute c >= a + b, ignoring overflow issues
        const __m128i sum = _mm_add_epi16( a, b );
        const __m128i ge = cmpge_epu16( c, sum );
    
        // Detect overflow of a + b
        const __m128i sumSaturated = _mm_adds_epu16( a, b );
        const __m128i sumInRange = _mm_cmpeq_epi16( sum, sumSaturated );
    
        // Combine the two
        return _mm_and_si128( ge, sumInRange );
    }
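
If SSE4.1 is not available, the same idea can probably be expressed with SSE2 alone. The sketch below is an assumption-laden variant, not part of the answer above: the names cmpge_epu16_sse2, cmpgeSum_sse2, geSumScalar and countMismatches are hypothetical. It replaces the _mm_max_epu16 comparison with a saturating subtraction (b - a saturates to 0 exactly when a >= b) and checks the result against a scalar reference computed in 32 bits.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics only
#include <cstdint>

// Compare uint16_t lanes for a >= b using SSE2 only:
// the saturating difference b - a is zero exactly when b <= a.
static inline __m128i cmpge_epu16_sse2( __m128i a, __m128i b )
{
    return _mm_cmpeq_epi16( _mm_subs_epu16( b, a ), _mm_setzero_si128() );
}

// Compare uint16_t lanes for c >= a + b; lanes where a + b overflows yield 0.
static inline __m128i cmpgeSum_sse2( __m128i a, __m128i b, __m128i c )
{
    const __m128i sum = _mm_add_epi16( a, b );            // wrapping sum
    const __m128i ge = cmpge_epu16_sse2( c, sum );        // c >= wrapped sum
    const __m128i sumSaturated = _mm_adds_epu16( a, b );  // saturating sum
    const __m128i sumInRange = _mm_cmpeq_epi16( sum, sumSaturated );
    return _mm_and_si128( ge, sumInRange );
}

// Scalar reference: do the arithmetic in 32 bits so it cannot wrap.
static inline bool geSumScalar( uint16_t a, uint16_t b, uint16_t c )
{
    return uint32_t( c ) >= uint32_t( a ) + uint32_t( b );
}

// Hypothetical self-check: counts edge-case inputs where lane 0 of the
// SIMD result disagrees with the scalar reference.
int countMismatches()
{
    const uint16_t edge[] = { 0, 1, 2, 0x7FFF, 0x8000, 0xFFFE, 0xFFFF };
    int mismatches = 0;
    for( uint16_t a : edge )
        for( uint16_t b : edge )
            for( uint16_t c : edge )
            {
                const __m128i r = cmpgeSum_sse2(
                    _mm_set1_epi16( (short)a ),
                    _mm_set1_epi16( (short)b ),
                    _mm_set1_epi16( (short)c ) );
                const uint16_t lane0 = (uint16_t)_mm_extract_epi16( r, 0 );
                const uint16_t expected = geSumScalar( a, b, c ) ? 0xFFFF : 0;
                if( lane0 != expected )
                    ++mismatches;
            }
    return mismatches;
}
```

Calling countMismatches() from a test and checking that it returns 0 is a cheap way to gain some confidence before using either variant on real data; an exhaustive loop over all 16-bit inputs would be a stronger check.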