loopsgraph-theorycontrol-flow-graph

Control Flow Graph : properly identify loop "condition"


I have this C# code sample (but language is absolutely not important) :

 public static void NestedSimple(int[] a, int n)
    {
        for(int i = 0; i < n && i < 12; i++)
        {
            a[i] += 1;
        }
    }

Once compiled, I get this IL :

IL_0000: nop
    IL_0001: ldc.i4.0
    IL_0002: stloc.0
    IL_0003: br.s IL_0017
    // loop start (head: IL_0017)
        IL_0005: nop
        IL_0006: ldarg.0
        IL_0007: ldloc.0
        IL_0008: ldelema [mscorlib]System.Int32
        IL_000d: dup
        IL_000e: ldind.i4
        IL_000f: ldc.i4.1
        IL_0010: add
        IL_0011: stind.i4
        IL_0012: nop
        IL_0013: ldloc.0
        IL_0014: ldc.i4.1
        IL_0015: add
        IL_0016: stloc.0

        IL_0017: ldloc.0
        IL_0018: ldarg.1
        IL_0019: bge.s IL_0022

        IL_001b: ldloc.0
        IL_001c: ldc.i4.s 12
        IL_001e: clt
        IL_0020: br.s IL_0023

        IL_0022: ldc.i4.0

        IL_0023: stloc.1
        IL_0024: ldloc.1
        IL_0025: brtrue.s IL_0005
    // end loop

    IL_0027: ret

which can be represented by this Control Flow Graph (reconstructed from IL, basic blocks are properly identified and domination relations computed by Lengauer-Tarjan) :

Control Flow Graph

The back edge is IL_0005 -> IL_0017 as IL_0005 dominates IL_0017 which means loop header is IL_0017.

How can I identify that every block between IL_0017 and IL_0023 belong to the loop exit condition and not the plain body ?

My intuition is that IL_0005 is the actual body and the only edge going to IL_0005 starts from IL_0023, meaning that all blocks between IL_0017 and IL_0023 are part of the condition.

However -> it doesn't rely on theory and depends on C# -> MSIL compiler behavior.

Is there something solid that I can implement ?

So far, my code looks like :

// if back edge (n->d where d dominates n) then we have a loop
            if ((block.JumpTarget != null && block.JumpTarget.Dominates(block)) ||
                (block.Next != null && block.Next.Dominates(block)))
            {
                var header = block.JumpTarget != null ? block.JumpTarget : block.Next;
                var tail = block;
                var visited = new HashSet<BasicBlock> { header, tail};
                var stack = new Stack<BasicBlock>();
                
                // populate body
                stack.Push(block);
                while (stack.Any())
                {
                    var n = stack.Pop();
                    if(!visited.Contains(n))
                    {
                        visited.Add(n);
                        foreach(var p in n.Predecessors)
                        {
                            stack.Push(p);
                        }
                    }
                }
                var body = visited.OrderBy(b => b.Id).ToList();
           /// HERE : how can I identify loop condition ??

Solution

  • I don't if this answer your question, but it's possible to unroll the loop like the following (practically "extracting" the boolean condition)

       IL_000
        |
       IL_0017
        |     \
       IL_001b  IL_0022
        |      /
     - IL_0023 ----------.
    '   |                |
    |  IL_0005           |
    |   |                |
    |  IL_0017           |
    |   |     \          |
    |  IL_001b  IL_0022  |
    '  /       /         |
     \--------/          |
                         |
                        IL_0027
    

    you see that (i < n && i <12) is encoded inside the 17, 1b, 22 and the if by the 23 block, becoming the head of the loop.

    My reasoning is that a while loop is encoded by the following graph

    enter image description here

    I don't know an algorithm with only basic blocks, decompilers usually "simplify" them: in your case all the blocks I have extracted could be collapsed into a single statemement and merged into IL_0023.