simdavxavx2avx512

x86-64 SIMD mechanism to "compare" 8-bit unsigned integers, giving a vector of +1 / 0 / -1 results (signum)?


Let's say I have two unsigned integer (8-bit) packed registers a and b. I'd like to compare them and get back +1 for a > b, 0 for a=b, or -1 for a < b. Alternatively, distance also works (i.e. instead of -1/+1, the actual difference is returned).

What's the SIMD (presumably AVX2) approach for accomplishing this efficiently? I can't use AVX512 but would be good to know if this is an AVX512 feature.


Solution

  • a <=> b for signed bytes would be _mm256_subs_epi8(a, b) (subtract using signed saturation). The result can be normailzed to -1 / 0 / 1 using _mm256_sign_epi8(_mm256_set1_epi8(-1), res)

    Unsigned bytes could be "range shifted"[1] to signed bytes:

    const __m256i x80 = _mm256_set1_epi8(-128);
    __m256i res = _mm256_subs_epi8(_mm256_xor_si256(a, x80), _mm256_xor_si256(b, x80));
    

    Absolute difference can be found by

    _mm256_or_si256(_mm256_subs_epu8(a, b), _mm256_subs_epu8(b, a)) or

    _mm256_sub_epi8(_mm256_max_epu8(a, b), _mm256_min_epu8(a, b)).


    [1] Range Shifting:

    SSE2 (or later) often have instructions for signed integers without any corresponding instruction for unsigned integers (or vice versa).

    To convert 8-bit integers between signed and unsigned ranges just add or subtract 128.

    Signed 8-bit integers have the range -128..127
    Unsigned 8-bit integers have the range 0..255
    

    A micro-optimization for this conversion is to flip the most significant bit of a lane using XOR. 128 is 0b10000000 (0x80) in two's complement. There is no 9th bit so there is no need for a borrow or carry. Logical ops may run on more execution ports than arithmetic ops, on some architectures. Most SSE2 instructions have destructive two-operand forms so XOR allows the compiler to freely reorder x ^= y into y ^= x, unlike subtraction.