Environment : STM32H7 and GCC
Working with a flow of data : 1 sample received from SPI every 250 us
I do a "triangle" weighted moving average with 256 samples, like this but middle sample is weighted 1 and it forms a triangle around it
My samples are stored in uint32_t val[256]
circular buffer, it works with a uint8_t write_index
The samples are 24 bits, the max value of a sample is 0x00FFFFFF
uint8_t write_idx =0;
uint32_t val[256];
float coef[256];
void init(void)
{
uint8_t counter=0;
// I calculate my triangle coefs
for(uint16_t c=0;c<256;c++)
{
coef[c]=(c>127)?--counter:++counter;
coef[c]/=128;
}
}
void ACQ_Complete(void)
{
uint32_t moy=0;
// write_idx is meant to wrap
val[write_idx++]= new_sample;
// calc moving average (uint8_t)(c-write_idx) is meant to wrap
for(uint16_t c=0;c<256;c++)
moy += (uint32_t)(val[c]*coef[(uint8_t)(c-write_idx)]);
moy/=128;
}
I have to do the calcs during a 250 us time span, but I measured with a debug GPIO pin that the "moy" part takes 252 us
Code is simulated here
Interesting fact : If I remove the (uint32_t)
cast near the end it takes 274 us instead of 252 us
I was thinking of using uint32
instead of float
for coef
(by multiply by 1000 for example) but my uint32
would overflow
This should unquestionably be done in integer. It will be both faster and more accurate.
This processor can do 32x32+64=64 multiply accumulate in a single cycle!
Multiply all your coefficients by a power of 2 (not 1000 mentioned in the comments), and then shift down at the end rather than divide.
uint32_t coef[256];
uint64_t moy = 0;
for(unsigned int c = 0; c < 256; c++)
{
moy += (val[c] * (uint64_t)coef[(c - write_idx) & 0xFFu]);
}
moy >>= N;