Is there any way to perform a comparison like C >= (A + B) with SSE2/4.1 instructions, considering that 16-bit unsigned addition (_mm_add_epi16()) can overflow?
The code snippet looks like this:
#define _mm_cmpge_epu16(a, b) _mm_cmpeq_epi16(_mm_max_epu16(a, b), a)
__m128i *a = (__m128i *)&ptr1;
__m128i *b = (__m128i *)&ptr2;
__m128i *c = (__m128i *)&ptr3;
__m128i xa = _mm_lddqu_si128(a);
__m128i xb = _mm_lddqu_si128(b);
__m128i xc = _mm_lddqu_si128(c);
__m128i res = _mm_add_epi16(xa, xb);
__m128i xmm3 = _mm_cmpge_epu16(xc, res);
The issue is that when the 16-bit addition overflows (wraps around), the greater-than-or-equal comparison produces false positives. I can't use saturated addition for my purpose. I have looked at the mechanism for detecting overflow of unsigned addition described in "SSE2 integer overflow checking", but how do I use it for a greater-than-or-equal comparison?
You can build the missing primitives from what the instruction set provides. Here’s one possible implementation, untested; note that _mm_max_epu16 requires SSE4.1, while the other intrinsics used are SSE2. Disassembly.
// Compare uint16_t lanes for a >= b
inline __m128i cmpge_epu16( __m128i a, __m128i b )
{
    const __m128i max = _mm_max_epu16( a, b );
    return _mm_cmpeq_epi16( max, a );
}

// Compare uint16_t lanes for c >= a + b, with overflow handling
__m128i cmpgeSum( __m128i a, __m128i b, __m128i c )
{
    // Compute c >= a + b, ignoring overflow issues
    const __m128i sum = _mm_add_epi16( a, b );
    const __m128i ge = cmpge_epu16( c, sum );
    // Detect overflow of a + b
    const __m128i sumSaturated = _mm_adds_epu16( a, b );
    const __m128i sumInRange = _mm_cmpeq_epi16( sum, sumSaturated );
    // Combine the two
    return _mm_and_si128( ge, sumInRange );
}
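If SSE4.1 is not available, the same idea can probably be expressed with SSE2 alone. The sketch below is a variation, not the code above: it assumes an x86 target with SSE2 enabled and replaces the _mm_max_epu16 comparison with a saturating subtraction, since the saturated difference b - a is zero exactly when a >= b. The names cmpge_epu16_sse2, cmpge_sum_sse2, count_mismatches and self_test are hypothetical helpers introduced here for illustration. The overflow check is the same: a wrapped sum can never equal the saturated sum, so any lane where the two differ has overflowed and is forced to false.

```c
#include <assert.h>     /* assert, for callers of self_test */
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Unsigned a >= b per 16-bit lane, SSE2 only (hypothetical helper).
   The saturating difference b - a is 0 exactly when b <= a. */
static inline __m128i cmpge_epu16_sse2(__m128i a, __m128i b)
{
    return _mm_cmpeq_epi16(_mm_subs_epu16(b, a), _mm_setzero_si128());
}

/* c >= a + b per 16-bit lane; lanes where a + b overflows yield false. */
static inline __m128i cmpge_sum_sse2(__m128i a, __m128i b, __m128i c)
{
    const __m128i sum = _mm_add_epi16(a, b);           /* wrapping sum */
    const __m128i ge = cmpge_epu16_sse2(c, sum);       /* c >= wrapped sum */
    const __m128i in_range =
        _mm_cmpeq_epi16(sum, _mm_adds_epu16(a, b));    /* all ones if no overflow */
    return _mm_and_si128(ge, in_range);
}

/* Count lanes where the vector result disagrees with a scalar reference
   computed in 32-bit arithmetic, where the sum cannot wrap. */
static int count_mismatches(const uint16_t a[8], const uint16_t b[8],
                            const uint16_t c[8])
{
    uint16_t out[8];
    const __m128i r = cmpge_sum_sse2(_mm_loadu_si128((const __m128i *)a),
                                     _mm_loadu_si128((const __m128i *)b),
                                     _mm_loadu_si128((const __m128i *)c));
    _mm_storeu_si128((__m128i *)out, r);

    int bad = 0;
    for (int i = 0; i < 8; ++i) {
        const uint32_t wide_sum = (uint32_t)a[i] + (uint32_t)b[i];
        const uint16_t expect = ((uint32_t)c[i] >= wide_sum) ? 0xFFFF : 0;
        bad += (out[i] != expect);
    }
    return bad;
}

/* Exercise a few edge cases, including sums that wrap around. */
static int self_test(void)
{
    const uint16_t a[8] = {1, 65535, 40000, 0, 32768, 100, 65535, 0};
    const uint16_t b[8] = {2, 1, 40000, 0, 32767, 200, 65535, 65535};
    const uint16_t c[8] = {3, 0, 20000, 0, 65535, 299, 65534, 65535};
    return count_mismatches(a, b, c);
}
```

Calling assert(self_test() == 0) from a test driver checks the vector result against the scalar reference for these inputs. Lanes 1, 2 and 6 are the interesting ones: their wrapped sums are small enough that a plain comparison would report true, and only the overflow mask turns them into false.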