I am working on a compiler project and wondering about the meaning and implementation of "basic blocks" in a control-flow graph (CFG). As I understand it, a basic block is a linear sequence of steps with no branching inside it. A few questions about that:
For example, say I have this:
let a = 10;
let b = 0;
if ((foo() && bar()) || baz()) {
  b = 20;
  if (hello > world) {
    b *= 3;
  }
} else {
  b = 2 * a;
}
let c = a * b;
log(a);
Here, what are the "basic blocks"?
const blocks = [
  {
    inputs: [],
    statements: [
      'let a = 10',
      'let b = 0',
    ],
    outputs: ['a', 'b']
  },
  {
    // ...?
  }
]
I am not asking how exactly to implement a data-flow analyzer; roughly speaking, what should the blocks be in an example like this? Some pseudocode would be helpful. The conditional expression (foo() && bar()) || baz() even hides some branching logic of its own. In my code, I do something like this:
if
  or
    and
      test foo
      test bar
    baz
Or even:
andResult =
  and
    test foo
    test bar
orResult = baz | andResult
if orResult, ...
That shows that the condition itself is a series of computation steps. And where should the if statement go? Does it start the next block? Basically, I am wondering how the blocks should generally be structured, and then how to account for the nested conditional branch.
The basic blocks are effectively the segments of code you would get if you were to rewrite your program in terms of labels and gotos (of the unconditional form goto L and the conditional form if (t0) then goto L else goto L').
Care must be taken to lower conditional constructs correctly so as to preserve the short-circuiting semantics that comes with them.
This transformation is a common intermediate step before constructing a control-flow graph. The so-called "leader" algorithm (where you designate as leaders the first instruction, every jump target, and every instruction that immediately follows a jump, and then construct blocks by taking the instructions from one leader up to, but not including, the next) is described in "Compilers: Principles, Techniques, and Tools".
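To make those leader rules concrete, here is a minimal sketch in TypeScript, assuming a hypothetical flat instruction representation; the names Instr, basicBlocks, and jumpTargets are made up for illustration and not taken from any particular compiler:

// A hypothetical flat instruction: it may carry a label and may name
// the labels it can jump to. Purely illustrative, not tied to any real IR.
type Instr = {
  label?: string;          // e.g. "L3"
  text: string;            // e.g. "b = 20" or "if (t0 != 0) goto L1; else goto L2"
  jumpTargets?: string[];  // labels this instruction may branch to
};

function basicBlocks(code: Instr[]): Instr[][] {
  const leaders = new Set<number>([0]);          // rule 1: the first instruction
  const indexOfLabel = new Map<string, number>();
  code.forEach((ins, i) => {
    if (ins.label !== undefined) indexOfLabel.set(ins.label, i);
  });

  code.forEach((ins, i) => {
    if (ins.jumpTargets && ins.jumpTargets.length > 0) {
      // rule 2: every target of a jump is a leader
      // (we assume every jump target has a matching label)
      for (const target of ins.jumpTargets) {
        leaders.add(indexOfLabel.get(target)!);
      }
      // rule 3: the instruction immediately after a jump is a leader
      if (i + 1 < code.length) leaders.add(i + 1);
    }
  });

  // A basic block runs from one leader up to (but not including) the next.
  const starts = [...leaders].sort((a, b) => a - b);
  return starts.map((start, k) => code.slice(start, starts[k + 1] ?? code.length));
}

Running something like this over a lowered, goto-style instruction list gives you the block boundaries; wiring up the edges between the blocks is then a separate pass.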
So, let's look at your example:
let a = 10;
let b = 0;
if ((foo() && bar()) || baz()) {
  b = 20;
  if (hello > world) {
    b *= 3;
  }
} else {
  b = 2 * a;
}
let c = a * b;
log(a);
A compiler would first want to normalise this into something such as the following (being mindful that short-circuiting semantics must be retained):
L0:
  a = 10;
  b = 0;
  t0 = foo();
  if (t0 != 0) goto L1; else goto L2;
L1:
  t1 = bar();
  if (t1 != 0) goto L3; else goto L2;
L2:
  t2 = baz();
  if (t2 != 0) goto L3; else goto L4;
L3:
  b = 20;
  if (hello > world) goto L5; else goto L6;
L5:
  b = b * 3;
L6:
  goto L7;
L4:
  b = 2 * a;
L7:
  c = a * b;
  log(a);
  return;
From here, we can apply the aforementioned "leader" algorithm to identify the leading instructions of the basic blocks. Then we can segment the code into blocks (from one leader up until the next) and figure out the control flow, making sure to identify "fall-through" edges (where a block's last instruction falls onto a leader without an explicit branch).
In the end, we are left with something like:
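(Rendered here as a plain textual listing rather than a diagram: each line is one block, named by its label, followed by its outgoing edges.)

block L0: a = 10; b = 0; t0 = foo();    branches to L1 or L2
block L1: t1 = bar();                   branches to L3 or L2
block L2: t2 = baz();                   branches to L3 or L4
block L3: b = 20;                       branches to L5 or L6
block L5: b = b * 3;                    falls through to L6
block L6: (nothing but a jump)          goto L7
block L4: b = 2 * a;                    falls through to L7
block L7: c = a * b; log(a); return;    exit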
You can see that L6 is a block that does nothing but unconditionally jump to L7; this is an artefact of the normalisation/linearisation step. Removing such superfluous blocks is known as "jump threading" and is done later.
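Roughly, jump threading redirects every edge that lands on a block containing nothing but an unconditional goto straight to that goto's target, and then deletes the now-unreachable block. A minimal TypeScript sketch under that assumption (Block, threadJumps, and isGotoOnly are made-up names, and the entry block is assumed not to be goto-only):

// A hypothetical block shape: a name, its instructions, and the names
// of the blocks its edges lead to.
type Block = {
  name: string;            // e.g. "L6"
  instructions: string[];
  successors: string[];    // names of the blocks this one can flow into
};

function threadJumps(blocks: Block[]): Block[] {
  const byName = new Map<string, Block>();
  for (const b of blocks) byName.set(b.name, b);

  const isGotoOnly = (b: Block) =>
    b.instructions.length === 1 &&
    b.instructions[0].startsWith("goto") &&
    b.successors.length === 1;

  // Follow chains of goto-only blocks until a "real" block is reached
  // (the visited set guards against pathological goto cycles).
  const resolve = (name: string, seen = new Set<string>()): string => {
    const b = byName.get(name)!;
    if (!isGotoOnly(b) || seen.has(name)) return name;
    seen.add(name);
    return resolve(b.successors[0], seen);
  };

  // Redirect every edge past goto-only blocks, then drop those blocks.
  const kept = blocks.filter(b => !isGotoOnly(b));
  for (const b of kept) {
    b.successors = b.successors.map(s => resolve(s));
  }
  return kept;
}

Applied to the blocks above, the edges from L3 and L5 into L6 would be redirected straight to L7, and the L6 block would disappear.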
If you wish to play around with an extant compiler that exposes this control-flow transformation in one of its IRs, you can invoke GCC with the flag -fdump-tree-all and then look at the dump file whose name ends in .gimple.