assemblycompiler-optimizationcontrol-flow-graph

Jump in the middle of basic block


A basic block is defined as a sequence of (non-jump) instructions ending with a jump (direct or indirect) instruction. The jump target address should be the start of another basic block. Consider I have the following assembly code :

106ac:       ba00000f        blt     106f0 <main+0xb8>
106b0:       e3099410        movw    r9, #37904      ; 0x9410
106b4:       e3409001        movt    r9, #1
106b8:       e79f9009        ldr     r9, [pc, r9]
106bc:       e3a06000        mov     r6, #0
106c0:       e1a0a008        mov     sl, r8
106c4:       e30993fc        movw    r9, #37884      ; 0x93fc
106c8:       e3409001        movt    r9, #1
106cc:       e79f9009        ldr     r9, [pc, r9]
106d0:       e5894000        str     r4, [r9]
106d4:       e7941105        ldr     r1, [r4, r5, lsl #2]
106d8:       e1a00007        mov     r0, r7
106dc:       e12fff31        blx     r1
106e0:       e0806006        add     r6, r0, r6
106e4:       e25aa001        subs    sl, sl, #1
106e8:       e287700d        add     r7, r7, #13
106ec:       1afffff4        bne     106c4 <main+0x8c>
106f0:       e30993d0        movw    r9, #37840      ; 0x93d0
106f4:       e3409001        movt    r9, #1

bb1

106a4:       ...
106ac:       ba00000f        blt     106f0 <main+0xb8>

The first basic block bb1 has a target address which is the start of bb4.

bb2

106b0:       e3099410        movw    r9, #37904      ; 0x9410
....        All other instructions
106c4:       e30993fc        movw    r9, #37884      ; 0x93fc
....        All other instructions
106d8:       e1a00007        mov     r0, r7
106dc:       e12fff31        blx     r1

The second basic block bb2 has an indirect branch so the target address is known only at runtime.

bb3

106e0:       e0806006        add     r6, r0, r6
106e4:       e25aa001        subs    sl, sl, #1
106e8:       e287700d        add     r7, r7, #13
106ec:       1afffff4        bne     106c4 <main+0x8c>

The third basic block has a target address which is not the start of another basic block but it is in the middle of bb2. According to the definition of a basic block, it is not possible. But, in practice, I am seeing this behavior (jumps in the middle of basic blocks) in multiple places. How to explain this behavior ? Is it possible to force a compiler (LLVM) to generate code that does not jump anywhere else except at the beginning of a basic block ?

bb4

106f0:       e30993d0        movw    r9, #37840      ; 0x93d0
106f4:       e3409001        movt    r9, #1
....
Ends with a branch (direct or indirect)

I am generating basic blocks using a tool (SPEDI) and the compiler used is LLVM (CLANG front end) and the targeted architecture is ARM V7 Cortex-A9.


Solution

  • Basic blocks are the nodes in the control flow graph, which means that once control enters the block, it can't do anything else apart from running through the whole block and exiting it. It doesn't mean that they have to start or end with a jump instruction. For better understanding refer to this excerpt from Wikipedia:

    Because of its construction procedure, in a CFG, every edge A→B has the property that:

    outdegree(A) > 1 or indegree(B) > 1 (or both).

    The CFG can thus be obtained, at least conceptually, by starting from the program's (full) flow graph—i.e. the graph in which every node represents an individual instruction—and performing an edge contraction for every edge that falsifies the predicate above, i.e. contracting every edge whose source has a single exit and whose destination has a single entry.

    According to this definition I would analyze code between 106b0 and 106ec differently: one block B1 from 106b0 to 106c0, and one block B2 from 106c4 to 106ec. This code is a loop, B1 is the setup part of the loop and B2 is the body.

    In ARM a bl instruction such as the one at 106dc is a function call, meaning that control will be passed to the called function but then returned to the instruction right after the bl. So if we're only constructing the CFG for the calling function I wouldn't consider this instruction as a block boundary. If we're doing the CFG for the whole program there should be an edge towards the called function here and then another edge back from the called function to the next instruction.