javajvmjitmachine-instruction

What is the loop inversion technique?


I was going through a document which talks about just-in-time compiler (JIT) optimization techniques for Java. One of them was "loop inversion". And the document says:

You replace a regular while loop with a do-while loop. And the do-while loop is set within an if clause. This replacement leads to two fewer jumps.

How does loop inversion work and how does it optimize our code path?

N.B.: It would be great if somebody can explain with an example of Java code and how JIT optimizes it to native code and why is it optimal in modern processors.


Solution

  • while (condition) { 
      ... 
    }
    

    Workflow:

    1. check condition;
    2. if false, jump to outside of loop;
    3. run one iteration;
    4. jump to top.

    if (condition) do {
      ...
    } while (condition);
    

    Workflow:

    1. check condition;
    2. if false, jump to beyond the loop;
    3. run one iteration;
    4. check condition;
    5. if true, jump to step 3.

    Comparing these two you can easily see that the latter may not do any jumps at all, provided that there is exactly one step through the loop, and generally the number of jumps will be one less than the number of iterations. The former will have to jump back to check the condition, only to jump out of the loop when the condition is false.

    Jumps on modern pipelined CPU architectures can be quite expensive: as the CPU is finishing the execution of the checks before the jump, the instructions beyond that jump are already in the middle of the pipeline. All this processing must be discarded if the branch prediction fails. Further execution is delayed while the pipeline is being reprimed.

    Explaining the mentioned branch prediction: for each kind of conditional jump the CPU has two instructions, each including a bet on the outcome. For example, you would put an instruction saying "jump if not zero, betting on not zero" at the end of a loop because the jump will have to be made on all iterations except the last one. That way the CPU starts pumping its pipeline with the instructions following the jump target instead of those following the jump instruction itself.

    Important note

    Please do not take this as an example of how to optimize at the source code level. That would be entirely misguided since, as already clear from your question, the transformation from the first form into the second is something the JIT compiler does as a matter of routine, completely on its own.