In the following line:
aData[i] = aData[i] + ( aOn * sin( i ) );
If aOn
is 0
or 1
, does the processor actually perform the multiplication, or does it conditionally work out the result (0
for 0
, other-value for 1
)?
I'm looking into algorithm performance consistency, which partly involves a look into the effect of Branch Prediction.
The hypothesis is that this code:
for ( i = 0; i < iNumSamples; i++ )
aData[i] = aData[i] + ( aOn * sin( i ) );
will provide more stable performance than this code (where branch prediction may destabilise performance):
for ( i = 0; i < iNumSamples; i++ )
{
if ( aOn )
aData[i] = aData[i] + sin( i );
}
with aOn
being either 0
or 1
, and it can toggle during the loop execution by another thread.
The actual conditional calculation (+ sin( i )
in the example above) involves more processing and the if condition must be within the loop (there are multitude of conditions, not just one like in the example above; also, changes to aOn
should have effect immediately and not per loop).
Ignoring performance consistency, the performance tradeoff between the two options is in the time it takes to execute the if
statement and that of a multiplication.
Regardless, it is easy to spot that if a processor would not perform the actual multiplication for values like 1
and 0
, the first option could be a win-win solution (no branch prediction, better performance).
Processors perform regular multiplication with 0
s and 1
s.
Reason is, that if the processor would check for 0
and 1
before each calculation, the introduction of the condition will take more cycles. While you would gain performance for 0
and 1
multipliers, you will lose performance for any other values (which are much more probable).
A simple program can prove this:
#include <iostream>
#include "cycle.h"
#include "time.h"
void Loop( float aCoefficient )
{
float iSum = 0.0f;
clock_t iStart, iEnd;
iStart = clock();
for ( int i = 0; i < 100000000; i++ )
{
iSum += aCoefficient * rand();
}
iEnd = clock();
printf("Coefficient: %f: %li clock ticks\n", aCoefficient, iEnd - iStart );
}
int main(int argc, const char * argv[])
{
Loop( 0.0f );
Loop( 1.0f );
Loop( 0.25f );
return 0;
}
For which the output is:
Coefficient: 0.000000: 1380620 clock ticks
Coefficient: 1.000000: 1375345 clock ticks
Coefficient: 0.250000: 1374483 clock ticks