compiler-constructioncontrol-flow-graph

How exactly to construct "basic blocks" for a compiler (using JavaScript as an example)?


I am working on a compiler project and wondering about the meaning and implementation of "basic blocks" of a control-flow graph (CFG). They say that the basic block is for the linear sequence of steps which don't have any branching. But first, a few questions there:

  1. What if there is nested branching, how does that work?
  2. What about the logic inside the conditional branch statement, is that part of the previous block or the current block (or a third block)?

For example, say I have this:

let a = 10;
let b = 0;
if ((foo() && bar()) || baz()) {
  b = 20;
  if (hello > world) {
    b *= 3
  }
} else {
  b = 2 * a;
}
let c = a * b
log(a);

Here, what are the "basic blocks"?

const blocks = [
  {
    inputs: [],
    statements: [
      'let a = 10',
      'let b = 0',
    ],
    outputs: ['a', 'b']
  },
  {
    // ...?
  }
]

I am not asking how exactly to implement a data-flow analyzer, just roughly speaking what the blocks should be in an example like this? Some pseudocode would be helpful. The conditional expression (foo() && bar()) || baz() even hides some branching logic. In my code, I do something like this:

if
  or
    and
      test foo
      test bar
    baz

Or even:

andResult = 
  and
    test foo
    test bar
orResult = baz | andResult
if orResult, ...

That shows it itself is steps in computation. And where should the if statement go, does it start the next block? Basically wondering how they should generally be structured. And then to account for the nested conditional branch.


Solution

  • The basic blocks are effectively the segments of code you would get if you were to rewrite your program in terms of labels and gotos (of unconditional form: goto L and conditional form: if (t0) then goto L else L').

    Care must be taken to lower conditional constructs correctly so as to preserve the short-circuiting semantics that comes with them.

    This transformation is a common intermediary step before constructing a control flow graph. The so-called "leader" algorithm (where you identify the first statement and and jump targets, designate those as leaders, and then construct blocks by taking the instructions from one leader up until the next) is described in "Compilers: Principles, Techniques, and Tools".

    So, let's look at your example:

    let a = 10;
    let b = 0;
    if ((foo() && bar()) || baz()) {
      b = 20;
      if (hello > world) {
        b *= 3
      }
    } else {
      b = 2 * a;
    }
    let c = a * b
    log(a);
    

    A compiler would first want to normalise this into something such as the following (being mindful that short-circuiting semantics must be retained):

    L0:
      a = 100;
      b = 0;
      t0 = foo();
      if (t0 != 0) goto L1; else goto L2;
    L1:
      t1 = bar();
      if (t1 != 0) goto L3; else goto L2;
    L2:
      t2 = baz();
      if (t2 != 0) goto L3; else goto L4;
    L3:
      b = 20;
      if (hello > world) goto L5; else goto L6;
    L5:
      b = b * 3;
    L6:
      goto L7;
    L4:
      b = a * 2;
    L7:
      c = a * b;
      log (a);
      return
    

    From here, we can apply the aforementioned "leaders" algorithm to identify the leading instructions of basic blocks. Then, we can segment the code into blocks (from one leader up until the next), then figure out the control flow; ensuring to identify "fallthrough" edges (when a block's last instruction falls onto a leader without an explicit branch).

    In the end, we are left with something like: control flow graph

    You can see that L6 is a block that just unconditionally jumps to L7, this is an artefact arising from the normalisation/linearisation algorithm. Removing such superfluous blocks is known as "jump threading" and is done later.

    If you wish to play around with an extant compiler that exposes this control flow transformation in one of its IRs, you can invoke GCC with the flag -fdump-tree-all and then look at the file with the extension .gimple.