How to build dominance frontier for control flow graph?

I would like to understand what a common principle is used for building Ф-functions for nodes. I read about "dominance frontier (DF)" relationship in a graph that allow to build Ф-functions. Here is an example with control-flow graph for simple code snippet: control-flow-graph

Let's consider the definition for DF:

DF is a set of nodes w such that x dominates predecessor of w, but x does not strictly dominate w

Okay, here is my understanding of this definition. Let's consider: DF(B1) = { B3, B5, B6, B7 } because:

dom(B1, B2) & !strictly_dom(B1, B3) & is_predecessor(B2, B3);
dom(B1, B3) & !strictly_dom(B1, B5) & is_predecessor(B3, B5);
dom(B1, B3) & !strictly_dom(B1, B6) & is_predecessor(B3, B6); 
dom(B1, B6) & !strictly_dom(B1, B7) & is_predecessor(B6, B7);

Is this right understanding of DF? Could you give me more detailed explanation, please?

Solution

The best way to understand the role of dominance frontiers in the context of constructing SSA form is to identify join points in your graph. This is the intuition people use when constructing SSA form on paper.

If a node in your control flow graph has less than 2 predecessors, then it won't be in the dominance frontier of any node - as it cannot be a merge point for competing definitions. The concept of a "dominance frontier" is precisely the nodes where the "dominance" of some node ends (i.e. where its definitions may no longer be the dominant ones; the ones that definitely reach those points, without competition from other alternate paths where other definitions may occur and, thus, reach the same points).

So, just by looking at your control flow graph, we can see that the only nodes with at least 2 predecessors are B2 and B7. So, these nodes will be the ones that form the frontiers of some nodes.

You must appeal to the definition of dominance to know whose frontier is composed of these nodes. By definition, every node dominates itself. So, if we look at the predecessors of the blocks B2 and B7, we can eliminate the ones that don't strictly dominate those nodes.

For B7, we have that B5 and B6 dominate a predecessor of B7 (namely, themselves). However, they do not strictly dominate B7 (in fact, they do not dominate B7 at all). The fact that they both compete for dominance over the definitions that reach B7's is precisely why the dominance frontiers of B5 and B6 are both {B7}. Intuitively, citing your original graph, you can see that these two predecessors both define j and k, so when control flow reaches B7, whose definitions do you use? You can't tell, since you could go through either to reach B7. So, you'd place phi nodes: j <- ϕ((j, B5), (j, B6)) and k <- ϕ((k, B5), (k, B6)). Keeping track of the source of each variable definition you're talking about is important to identify which (re)name you use when you perform renaming (you place phi nodes, then rename).

For B2, B7 is a predecessor and dominates itself by definition. However, it doesn't strictly dominate B2 as it competes with initial definitions from the entry block, B1. Therefore, you'd also place phi nodes to merge those definitions: j <- ϕ((j, B1), (j, B7)), k <- ϕ((k, B1), (k, B7)). Notice how there's no competing definition for i because i is never redefined. As far as I can tell, you could constant propagate i's value, 1, to its usage site and remove the definition.

I recommend reading "A Simple, Fast Dominance Algorithm" by Cooper et al. (https://www.cs.rice.edu/~keith/EMBED/dom.pdf), they give a simple algorithm that computes the dominator tree and also an intuitive algorithm (fig 5) for computing the frontiers by walking up the dominator tree.

You're best thinking about dominance frontiers intuitively; which competing definitions can reach a join point? In the suggested DF(B1) you give, you have that B3 ∈ DF(B1). This cannot be right. The reason being that the only definitions that can reach B3 are the ones that leave B2 (its single predecessor). However, the definitions that enter B2 are competitive because you could reach B2 by coming from B7 or B1 (hence, as described above, B2 has to have a leading phi node - the definition created by the phi node is the one that will end up reaching B3 without competition).