[SOLVED] Saturate 16-bit signed integer to 12-bits signed

I'm working with an SDR that has a 12-bit signed ADC/DAC and stores its data as 16-bit IQ samples. I want to ensure that, after all the DSP is done, the samples saturate at 12 bits instead of getting truncated by the SDR.

This is the equivalent C++ code:

        for (int i = 0; i < block_size_with_header; i++) {
            if (floatSamples[i].real() > 2047)
                floatSamples[i].real(2047);
            if (floatSamples[i].imag() > 2047)
                floatSamples[i].imag(2047);
            if (floatSamples[i].real() < -2048)
                floatSamples[i].real(-2048);
            if (floatSamples[i].imag() < -2048)
                floatSamples[i].imag(-2048);
        }

Is there a faster way to do this using SIMD or assembly? I've seen questions on here about saturating at 16 bits or 8 bits, but not 12.

Thanks.

Solution

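One interesting property of clamping is that it is idempotent: applying it twice doesn’t change the output, i.e. clamp( clamp( x ) ) == clamp( x ) for all x. This greatly simplifies handling of the remainder, because the final vector can simply overlap elements that were already clamped. Here’s an AVX2 example, untested.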

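#include <stddef.h>     // size_t
#include <stdint.h>     // int16_t
#include <algorithm>    // std::min, std::max
#include <immintrin.h>  // AVX2 intrinsics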

// Clamp 16 int16_t numbers in memory to the specified min/max values
inline void clamp16( int16_t* ptr, __m256i min, __m256i max )
{
    __m256i v = _mm256_loadu_si256( ( const __m256i* )ptr );
    v = _mm256_min_epi16( v, max );
    v = _mm256_max_epi16( v, min );
    _mm256_storeu_si256( ( __m256i* )ptr, v );
}

void saturate12bits_avx2( int16_t* ptr, size_t length )
{
    if( length >= 16 )
    {
        const __m256i max = _mm256_set1_epi16( 2047 );
        const __m256i min = _mm256_set1_epi16( -2048 );

        // We want a remainder of length [ 1 .. 16 ],
        // saves a branch testing for no remainder
        int16_t* const last = ptr + length - 16;
        for( ; ptr < last; ptr += 16 )
            clamp16( ptr, min, max );
        clamp16( last, min, max );
    }
    else
    {
        // Very small input, can't load AVX vectors
        int16_t* const end = ptr + length;
        for( ; ptr < end; ptr++ )
        {
            int16_t i = *ptr;
            i = std::min( i, (int16_t)2047 );
            i = std::max( i, (int16_t)-2048 );
            *ptr = i;
        }
    }
}
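
The overlapping final block is only safe because clamping an already clamped value leaves it unchanged. Below is a minimal portable sketch for checking that loop structure without AVX2 hardware. It swaps the intrinsics for a scalar stand-in, and the names clampBlock16, saturate12bits_blocks and overlapMatchesReference are made up for this illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Scalar stand-in for one 16-lane vector clamp (hypothetical helper, not AVX2)
inline void clampBlock16( int16_t* ptr )
{
    for( size_t i = 0; i < 16; i++ )
    {
        int16_t v = ptr[ i ];
        v = std::min( v, (int16_t)2047 );
        v = std::max( v, (int16_t)-2048 );
        ptr[ i ] = v;
    }
}

// Same loop structure as the AVX2 version: whole blocks first, then one
// final block that ends exactly at the end of the buffer. That final block
// may overlap elements the loop has already clamped.
inline void saturate12bits_blocks( int16_t* ptr, size_t length )
{
    assert( length >= 16 );  // shorter inputs need the scalar fallback
    int16_t* const last = ptr + length - 16;
    for( ; ptr < last; ptr += 16 )
        clampBlock16( ptr );
    clampBlock16( last );
}

// Returns true when the block version matches a plain element-wise clamp
inline bool overlapMatchesReference( size_t length )
{
    std::vector<int16_t> data( length );
    for( size_t i = 0; i < length; i++ )
        data[ i ] = (int16_t)( (int)( i * 7919 % 65536 ) - 32768 );

    std::vector<int16_t> expected = data;
    for( int16_t& x : expected )
        x = std::max( std::min( x, (int16_t)2047 ), (int16_t)-2048 );

    saturate12bits_blocks( data.data(), data.size() );
    return data == expected;
}
```

Calling overlapMatchesReference with lengths such as 16, 17 and 31 exercises the no-overlap case, the maximum-overlap case and a typical partial overlap.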

The input pointer doesn’t need to be aligned. Still, when it is aligned to 32 bytes, the function may run slightly faster, because no load or store then straddles a cache line.
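
If you control the buffer, one way to get that alignment is alignas. This is a rough sketch, assuming a fixed-size block of samples; SampleBlock, its size of 1024 and isAligned32 are hypothetical names for illustration, not part of any SDR API.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical fixed-size sample buffer. alignas( 32 ) asks the compiler to
// place every automatic or static instance on a 32-byte boundary.
struct alignas( 32 ) SampleBlock
{
    int16_t samples[ 1024 ];
};

// True when the pointer sits on a 32-byte boundary
inline bool isAligned32( const void* p )
{
    return reinterpret_cast<uintptr_t>( p ) % 32 == 0;
}
```

For heap allocations, over-aligned types are only guaranteed to be honored by plain new from C++17 onwards, so it is worth checking with isAligned32 rather than assuming.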