I'm working with an SDR that has a 12-bit signed ADC/DAC and stores its data as 16-bit IQ samples. I want to ensure that, after all the DSP is done, the samples saturate at 12 bits instead of getting truncated by the SDR.
This is the equivalent C++ code:
for (int i = 0; i < block_size_with_header; i++) {
    if (floatSamples[i].real() > 2047)
        floatSamples[i].real(2047);
    if (floatSamples[i].imag() > 2047)
        floatSamples[i].imag(2047);
    if (floatSamples[i].real() < -2048)
        floatSamples[i].real(-2048);
    if (floatSamples[i].imag() < -2048)
        floatSamples[i].imag(-2048);
}
Is there a faster way to do this using SIMD or assembly? I've seen questions here about saturating to 16 bits or 8 bits, but not 12.
Thanks.
One interesting property of clamping: applying it twice doesn't change the output, i.e. clamp( clamp( x ) ) == clamp( x ) for all x. This greatly simplifies handling of the remainder. Here's an AVX2 example, untested.
#include <stddef.h>
#include <stdint.h>
#include <immintrin.h>
#include <algorithm> // std::min, std::max in the scalar fallback
// Clamp 16 int16_t numbers in memory to the specified min/max values
inline void clamp16( int16_t* ptr, __m256i min, __m256i max )
{
    __m256i v = _mm256_loadu_si256( ( const __m256i* )ptr );
    v = _mm256_min_epi16( v, max );
    v = _mm256_max_epi16( v, min );
    _mm256_storeu_si256( ( __m256i* )ptr, v );
}

void saturate12bits_avx2( int16_t* ptr, size_t length )
{
    if( length >= 16 )
    {
        const __m256i max = _mm256_set1_epi16( 2047 );
        const __m256i min = _mm256_set1_epi16( -2048 );
        // We want a remainder of length [ 1 .. 16 ],
        // saves a branch testing for no remainder
        int16_t* const last = ptr + length - 16;
        for( ; ptr < last; ptr += 16 )
            clamp16( ptr, min, max );
        clamp16( last, min, max );
    }
    else
    {
        // Very small input, can't load AVX vectors
        int16_t* const end = ptr + length;
        for( ; ptr < end; ptr++ )
        {
            int16_t i = *ptr;
            i = std::min( i, (int16_t)2047 );
            i = std::max( i, (int16_t)-2048 );
            *ptr = i;
        }
    }
}
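The tail handling relies on the idempotence mentioned earlier: the last block is anchored at the end of the buffer, so it may overlap elements that an earlier block already clamped, and clamping them again changes nothing. A portable scalar sketch of the same pattern may make this easier to see and to test on a machine without AVX2. The names `kBlock`, `clampBlock` and `clampOverlapped` are hypothetical, and the inner loop over one block merely stands in for a single vector load/min/max/store.

```cpp
#include <algorithm>
#include <stddef.h>
#include <stdint.h>

// Hypothetical block width; 16 matches the int16_t lane count of a 256-bit vector.
constexpr size_t kBlock = 16;

// Clamp exactly kBlock values in place (stand-in for one SIMD load/min/max/store).
inline void clampBlock( int16_t* p, int16_t lo, int16_t hi )
{
    for( size_t i = 0; i < kBlock; i++ )
        p[ i ] = std::max( lo, std::min( hi, p[ i ] ) );
}

// Clamp `length` values in place. Requires length >= kBlock.
// The final block starts at `last`, so it may re-clamp elements
// that the loop already handled; that is harmless because
// clamp( clamp( x ) ) == clamp( x ).
inline void clampOverlapped( int16_t* ptr, size_t length, int16_t lo, int16_t hi )
{
    int16_t* const last = ptr + length - kBlock;
    for( ; ptr < last; ptr += kBlock )
        clampBlock( ptr, lo, hi );
    clampBlock( last, lo, hi );
}
```

With a length of 21, for example, the loop clamps elements 0..15 and the final call clamps elements 5..20, so elements 5..15 are processed twice with no effect on the result.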
The pointer passed to saturate12bits_avx2 doesn't need to be aligned. Still, when it is aligned to 32 bytes, the function will run slightly faster.
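The buffer in the question holds `std::complex<float>` rather than `int16_t`, so if the clamp has to run before the conversion to integers, a float version is needed. Below is a minimal portable sketch of that case, assuming the samples sit in one contiguous array. The C++ standard guarantees that an array of `std::complex<float>` can be accessed as an array of `float` with real and imaginary parts interleaved, so one flat loop covers both components. A plain min/max loop like this is something optimizing compilers can usually auto-vectorize, though that is worth checking in the generated code. The function name `saturate12bits_float` is hypothetical.

```cpp
#include <algorithm>
#include <complex>
#include <stddef.h>

// Clamp the real and imaginary parts of `count` complex samples to [ -2048, 2047 ].
void saturate12bits_float( std::complex<float>* samples, size_t count )
{
    // Array-oriented access: real/imag parts are laid out as interleaved floats.
    float* f = reinterpret_cast<float*>( samples );
    const size_t n = count * 2;
    for( size_t i = 0; i < n; i++ )
        f[ i ] = std::max( -2048.0f, std::min( 2047.0f, f[ i ] ) );
}
```

For the question's buffer the call would look like `saturate12bits_float( floatSamples, block_size_with_header )` if `floatSamples` is a raw array, or with `.data()` if it is a `std::vector`.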