gcc clang compiler-optimization control-flow-graph pgo

How are PGOs applied to the source code? How do they affect the CFG?

Recently I've been reading about PGO (profile-guided optimization) and started wondering how these optimizations are applied to the source code, and how applying one of them affects another that is applied afterwards.

I mean, if you enable PGO in GCC or Clang, for example, it will apply all of the optimizations (inlining, virtual call speculation, dead code separation, etc.), right?

Even if not all of them are applied to the source code, let's suppose that some of them are. Then I guess they are applied sequentially, right?

So, can they modify the CFG (Control Flow Graph) to the point where some Basic Block frequencies are lost?

For example, if a PGO named "B" is applied after a PGO named "A", and "A" has modified the source code so that some basic block frequencies are lost, how is "B" applied (supposing that both are PGOs that depend on the basic block frequencies)?

(Sorry for my bad English)

Solution

PGO and most other optimizations are not applied to the source code; they are applied to the compiler's intermediate code. The source code itself remains the same. Only the generated binary code changes and is (hopefully) better optimized.

The purpose of PGO is to improve the effectiveness of traditional optimizations, including inlining, virtual call speculation and separation of rarely executed code. The profile does not replace these optimizations; it gives them real execution counts to base their decisions on, so they are all still applied. You guessed correctly: they are applied in some sequential order.
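To make the paragraph above concrete, here is a minimal sketch of how recorded execution counts could steer two such decisions: separating cold blocks and picking call sites to inline. It is a toy model only. The block names, the counts, the two thresholds and the helper functions are all made up for illustration, and neither GCC nor Clang uses heuristics this simple.

```python
# Toy model: profile counts guiding two optimization decisions.
# All names and thresholds are hypothetical; real compilers use far more
# elaborate heuristics than these.

# Execution counts per basic block, as a profiling run might record them.
block_counts = {"entry": 1000, "loop_body": 98000, "error_path": 2, "exit": 1000}

# Execution counts per call site, keyed by (caller, callee).
call_site_counts = {("main", "parse"): 50000, ("main", "report_error"): 2}

COLD_FRACTION = 0.01    # assumed: blocks below 1% of the entry count are cold
HOT_CALL_COUNT = 10000  # assumed: call sites at or above this count get inlined


def split_hot_cold(counts, entry="entry"):
    """Partition blocks into hot and cold lists relative to the entry count."""
    limit = counts[entry] * COLD_FRACTION
    hot = [block for block, count in counts.items() if count >= limit]
    cold = [block for block, count in counts.items() if count < limit]
    return hot, cold


def inline_candidates(call_counts):
    """Return the call sites whose profiled count meets the inlining threshold."""
    return [site for site, count in call_counts.items() if count >= HOT_CALL_COUNT]


hot, cold = split_hot_cold(block_counts)
print("hot blocks:", hot)
print("cold blocks:", cold)
print("inline candidates:", inline_candidates(call_site_counts))
```

Without a profile, the same decisions would have to rely on static guesses, such as assuming that loops are hot and error handling is cold.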

Some of these optimizations change the CFG of the code. However, the compiler keeps track of the original basic blocks that have been profiled, even if their locations in the intermediate code have changed. The profile information is not simply thrown away when the CFG is modified. The compiler may keep a basic block as it is, move it, split it into more than one basic block, or insert new basic blocks. Whatever it does, it carries the execution statistics of the original profiled basic blocks along, so that later optimizations can still make use of the profile. If a new basic block has been inserted, it is optimized normally, without using the profile.
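The bookkeeping described above can be sketched with a small toy model, in which pass "A" changes the CFG and pass "B" still finds a count for every block that came from profiled code. The helper names are hypothetical and the data structure is a plain dictionary, not what GCC or LLVM use internally. Scaling a callee's counts by the call site's share of its executions is an assumption about how a compiler might do this, shown here only as an illustration.

```python
# Toy model of profile bookkeeping across CFG transformations.
# Helper names are hypothetical; this is not how GCC or LLVM store profiles.

def split_block(counts, block, first, second):
    """Split `block` into two straight-line halves; each runs as often as it did."""
    new = dict(counts)
    count = new.pop(block)
    new[first] = count
    new[second] = count
    return new


def inline_callee(caller_counts, callee_counts, callee_entry, call_block, prefix):
    """Copy the callee's blocks into the caller, scaling counts to this call site."""
    new = dict(caller_counts)
    site_count = caller_counts[call_block]
    entry_count = callee_counts[callee_entry]
    for block, count in callee_counts.items():
        # Scale by the share of callee executions that came from this call site.
        scaled = count * site_count // entry_count if entry_count else 0
        new[prefix + block] = scaled
    return new


def insert_unprofiled(counts, block):
    """Add a compiler-created block that has no profile data of its own."""
    new = dict(counts)
    new[block] = None  # None marks "no profile information available"
    return new


caller = {"entry": 100, "call_site": 40, "exit": 100}
callee = {"f_entry": 200, "f_rare": 10}

# Pass "A": split the calling block, inline the callee, add a new block.
step_a = split_block(caller, "call_site", "call_site.a", "call_site.b")
step_b = inline_callee(step_a, callee, "f_entry", "call_site.a", "inl.")
step_c = insert_unprofiled(step_b, "new_guard")

# Pass "B" would now read these counts instead of the original ones.
print(step_c)
```

In this model only `new_guard` lacks a count, which corresponds to the newly inserted block that is optimized without using the profile.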