I've come across a very strange behavior in gcc regarding operators and functions marked with __attribute((const)). Logical and arithmetic operators lead to different optimizations, and I don't understand why.
It's not really a bug, since __attribute((const)) is only a hint and there's no guarantee of its effect, but it is still very surprising. Does anyone have an explanation?
Here's the code. I define an __attribute((const)) function:
int f(int & counter) __attribute((const));
int f(int & counter) {
    ++counter;
    return 0;
}
Then I define an operator testing macro. This is done with macros and not templates/functors to present simple code to the compiler and simplify the optimization:
#include <iostream> // needed for std::cout in the macro below
int global = 0; // forces results to be computed
#define TestOp(OP) \
{ \
    int n = 0; \
    global += (f(n) OP f(n)); \
    std::cout << "op" #OP " calls f " << n << " times" << std::endl; \
}
And finally, I test different operators as follows. The comments match the output with g++-4.8 -std=c++11 -O2 -Wall -pedantic; the output is the same at -O3 and -Ofast.
int main() {
    // all calls optimized away
    TestOp(^)  // 0
    TestOp(-)  // 0
    // one call is optimized away
    TestOp(|)  // 1
    TestOp(&)  // 1
    TestOp(||) // 1
    TestOp(&&) // 1
    // no optimization
    TestOp(+)  // 2
    TestOp(*)  // 2
    return global;
}
My question is: why do arithmetic operators yield two calls? Why couldn't f()+f() be optimized as 2*f()? Is there a way to help or force this optimization?
At first I thought multiplication might be more expensive, but I tried with f()+....+f() and 10 additions still don't reduce to 10*f(). Also, since it's int arithmetic, operation order is irrelevant (contrary to floats).
I also checked the asm but it doesn't help: all ints seem to be pre-computed at compile-time.
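The compiler doesn't trust you.
Since you have a reference argument, the compiler doesn't seem to trust your const attribute: a const function is supposed to only look at the values passed as arguments (not at memory reached through references or dereferenced pointers). Your f goes further and writes through the reference, which is a side effect that a const function must not have.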
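For contrast, here is a minimal sketch of a function that does satisfy the const contract: it takes its argument by value and touches no other state. The names ATTR_CONST, scaled and twice are made up for this illustration, and the macro assumes GCC or a compatible compiler such as Clang.

```cpp
// Assumption: GCC-compatible compiler; elsewhere the macro expands to nothing.
#if defined(__GNUC__)
#define ATTR_CONST __attribute__((const))
#else
#define ATTR_CONST
#endif

// Hypothetical example function: the result depends only on the by-value
// argument, and nothing outside the function is read or written.
ATTR_CONST int scaled(int x);

int scaled(int x) {
    return 3 * x + 1;
}

// The compiler is allowed (but not required) to evaluate scaled(x) only once here.
int twice(int x) {
    return scaled(x) + scaled(x);
}
```

Whether the second call is actually removed is up to the optimizer, so the only thing this sketch guarantees is the returned value.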
Another way to test this is to break the const function out into a separate compilation unit:
test.cpp:
#include <stdio.h>
int global = 0; // forces results to be computed
int f(int i) __attribute((const));
void print_count(void);
#define TestOp(OP) \
{ \
    int n = 0; \
    global += (f(n) OP f(n)); \
    printf("op %s ", #OP); \
    print_count(); \
}
int main() {
    // all calls optimized away
    TestOp(^)  // 0
    TestOp(-)  // 0
    // one call is optimized away
    TestOp(|)  // 1
    TestOp(&)  // 1
    TestOp(||) // 1
    TestOp(&&) // 1
    // no optimization
    TestOp(+)  // 2
    TestOp(*)  // 2
    return global;
}
counter.cpp:
#include <stdio.h>
static int counter = 0;
int f(int i) {
    ++counter;
    return 0;
}
void print_count(void)
{
    printf("counter %d\n", counter);
    counter = 0;
}
Now the compiler figures out that there's no need to call f(0) until f(0) | f(0), and the result of that one call to f(0) is re-used for the other cases.
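A likely reason why no call is needed before the | case, sketched as plain integer identities rather than anything taken from GCC's internals: once both operands are known to be the same value x, x ^ x and x - x are 0 whatever x is, while x | x and x & x are x itself, so at least one real value is required. The helper name identities_hold is made up for this illustration.

```cpp
// Checks the identities for one value of x; valid for every int,
// since none of these operations can overflow.
constexpr bool identities_hold(int x) {
    return (x ^ x) == 0    // XOR of a value with itself is always 0
        && (x - x) == 0    // so is the difference
        && (x | x) == x    // OR needs the actual value of x
        && (x & x) == x;   // and so does AND
}

// Verified at compile time for a few sample values.
static_assert(identities_hold(0), "identities must hold for 0");
static_assert(identities_hold(-7), "identities must hold for -7");
static_assert(identities_hold(12345), "identities must hold for 12345");
```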
$ g++ -O2 -c counter.cpp && g++ -O2 -c test.cpp && g++ counter.o test.o && ./a.out
op ^ counter 0
op - counter 0
op | counter 1
op & counter 0
op || counter 0
op && counter 0
op + counter 0
op * counter 0
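As for helping or forcing the optimization: since the attribute is only a hint, the dependable way to get a single call is to hoist it by hand and reuse the result. A minimal sketch, with counted_zero and sum_of_two as hypothetical stand-ins for f and the f()+f() expression:

```cpp
// Hypothetical stand-in for f: counts its own calls so the effect is visible.
static int call_count = 0;

int counted_zero() {
    ++call_count;   // side effect, so this function is deliberately not marked const
    return 0;
}

// Equivalent in value to counted_zero() + counted_zero(),
// but the call happens exactly once regardless of optimization level.
int sum_of_two() {
    const int r = counted_zero();
    return r + r;
}
```

This does not depend on what the optimizer decides, at the cost of writing the reuse explicitly.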