c++gcccompiler-optimization

How to get GCC to optimize long XOR chain


I have a loop like:

uint32_t result = 0;

for ( int i = 0; i < CONSTANT; ++i )
{
    result ^= expr;
}
return result;

Overall, GCC is doing a beautiful job with this code. It fully unrolls the loop and generates optimal code for expr. However, it does the result XOR CONSTANT times. It could be accumulating partial results and XOR'ing them together hierarchically.

I suspect if I hand-unroll this with macros I can do it manually (CONSTANT isn't large), but I'm wondering why it doesn't see this, or if I'm doing something that's preventing it due to some arcane C++ language rule.


Solution

  • There is likely no benefit to accumulating partial results here. If you use a divide and conquer strategy (XOR evens with odds to halve size, then repeat, halving number of operands each time), you still end up doing O(CONSTANT) work (one half the work plus one quarter the work plus one eighth the work, etc., eventually performing CONSTANT - 1 operations).

    Accumulating partial results in chunks behaves the same. Fundamentally, you must have CONSTANT - 1 XOR operations. And since these are fixed width registers, not growing arbitrary precision integers, the work for each XOR is identical. You're highly unlikely to realize any gains at all from a more complicated approach barring parallelizing the expr work.