llvm, llvm-ir

How LLVM mem2reg pass works


mem2reg is an important optimization pass in LLVM. I want to understand how this optimization works, but I haven't found any good articles, books, or tutorials about it.

I found these two links:

Both links explain that one can use Cytron's classical SSA construction algorithm to implement this pass, but reading the original paper I didn't see how alloca instructions are converted to registers.

As alloca is an instruction specific to LLVM IR, I wonder whether the algorithm that converts alloca instructions to registers is an ad-hoc algorithm that only works for LLVM, or whether there is a general theoretical framework, whose name I just don't know yet, that explains how to promote memory variables to register variables.

The official documentation says: "mem2reg: This file promotes memory references to be register references. It promotes alloca instructions which only have loads and stores as uses. An alloca is transformed by using dominator frontiers to place phi nodes, then traversing the function in depth-first order to rewrite loads and stores as appropriate. This is just the standard SSA construction algorithm to construct 'pruned' SSA form."

By this description, it seems that one just needs to check whether all users of the variable are load and store instructions; if so, it can be promoted to a register.

So if you can link me to relevant articles, books, or algorithms, I would appreciate it.


Solution

  • Efficiently Computing Static Single Assignment Form and the Control Dependence Graph by Cytron, Ferrante et al.

    The alloca instruction is named after the alloca() function in C, the rarely used counterpart to malloc() that allocates memory on the stack instead of the heap, so that the memory disappears when the function returns without needing to be explicitly freed.

    In the paper, figure 2 shows an example of "straight-line code":

      V ← 4
        ← V + 5
      V ← 6
        ← V + 7
    

    That text isn't valid LLVM IR; if we wanted to rewrite the same example in pre-mem2reg LLVM IR, it would look something like this:

      ; V ← 4
      store i32 4, i32* %V.addr
      ;   ← V + 5
      %tmp1 = load i32, i32* %V.addr
      %tmp2 = add i32 %tmp1, 5
      store i32 %tmp2, i32* %V.addr
      ; V ← 6
      store i32 6, i32* %V.addr
      ;   ← V + 7
      %tmp3 = load i32, i32* %V.addr
      %tmp4 = add i32 %tmp3, 7
      store i32 %tmp4, i32* %V.addr
    

    It's easy enough to see in this example how you could always replace %tmp1 with i32 4 using store-to-load forwarding, but you can't remove the final store: knowing nothing about %V.addr, we must assume it may be used elsewhere.
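Within a single basic block, store-to-load forwarding amounts to remembering the last value stored to each address. Here is a minimal Python sketch over a toy instruction list, with tuples standing in for LLVM instructions; the representation and all names here are invented for illustration, not LLVM's actual implementation:

```python
# Toy store-to-load forwarding over one basic block.
# Instructions are tuples: ("store", value, addr), ("load", dest, addr),
# ("add", dest, lhs, rhs). Values are ints or SSA-name strings.
def forward_stores(block):
    last = {}   # addr -> last value stored to it in this block
    env = {}    # load result name -> value it was forwarded to
    out = []
    for inst in block:
        if inst[0] == "store":
            _, val, addr = inst
            last[addr] = env.get(val, val)
            out.append(("store", last[addr], addr))
        elif inst[0] == "load":
            _, dest, addr = inst
            if addr in last:
                env[dest] = last[addr]  # forward the stored value; drop the load
            else:
                out.append(inst)        # unknown memory: keep the load
        else:
            # rewrite operands that were forwarded load results
            out.append(tuple(env.get(op, op) for op in inst))
    return out

# The straight-line example from above, with "V" standing in for %V.addr:
block = [
    ("store", 4, "V"),
    ("load", "t1", "V"), ("add", "t2", "t1", 5), ("store", "t2", "V"),
    ("store", 6, "V"),
    ("load", "t3", "V"), ("add", "t4", "t3", 7), ("store", "t4", "V"),
]
```

Note that the final `("store", "t4", "V")` survives the rewrite: without knowing that `V` is an alloca, the rewriter cannot prove that store is dead.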

    If you know that %V.addr is an alloca, you can simplify a lot of things. Because you can see the alloca instruction, you can see all of its uses, so you never have to worry that a store to some other pointer %ptr might alias %V.addr. Because you see the allocation, you know its alignment. You know accesses through the pointer cannot fault anywhere in the function, and there is no free() equivalent for an alloca that anyone could call. If you see a store whose value isn't loaded before the end of the function, you can eliminate it as a dead store, since you know the alloca's lifetime ends when the function returns. Finally, you may delete the alloca itself once you've removed all its uses.
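The promotability test itself is the easy part. Below is a Python sketch of the idea behind LLVM's isAllocaPromotable check (toy data structures and hypothetical field names; the real code in PromoteMemoryToRegister.cpp also handles further cases such as volatile accesses):

```python
# Toy promotability check: an alloca can be turned into SSA values only if
# every use is a plain load from it or a store *to* it. Anything else
# (a call taking the address, pointer arithmetic, storing the address
# itself into memory) lets the pointer escape, so promotion is unsafe.
def is_promotable(alloca, uses):
    for inst in uses:
        if inst["op"] == "load" and inst["addr"] == alloca:
            continue
        if inst["op"] == "store" and inst["addr"] == alloca \
                and inst["value"] != alloca:
            continue
        return False
    return True
```

The store case must check that the alloca appears as the address and not as the value being stored; storing the alloca's own address into memory is exactly the kind of escape that makes all the guarantees above unavailable.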

    Thus if you start with an alloca whose only users are loads and stores, the problem has little in common with most memory optimization problems: there's no possibility of pointer aliasing and no concern about faulting memory accesses. What you do need to do is place the ϕ functions (phi instructions in LLVM) in the right places given the control flow graph, and that's what the paper describes how to do.
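The ϕ-placement step can be sketched in a few dozen lines. The sketch below computes dominator sets with the naive iterative data-flow algorithm, derives dominance frontiers straight from the definition, and then iterates the frontiers to find the blocks needing phis; this is a simplified model for illustration, not Cytron et al.'s efficient algorithm and not LLVM's code:

```python
# cfg maps each block name to its list of successors; entry is the entry block.
def dominators(cfg, entry):
    preds = {b: [] for b in cfg}
    for b, succs in cfg.items():
        for s in succs:
            preds[s].append(b)
    dom = {b: set(cfg) for b in cfg}   # start from "dominated by everything"
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for b in cfg:
            if b == entry or not preds[b]:
                continue
            new = {b} | set.intersection(*(dom[p] for p in preds[b]))
            if new != dom[b]:
                dom[b], changed = new, True
    return dom, preds

def dominance_frontiers(cfg, dom, preds):
    # DF(n) = blocks y that have a predecessor dominated by n,
    # where n does not strictly dominate y itself.
    df = {b: set() for b in cfg}
    for y in cfg:
        for p in preds[y]:
            for n in dom[p]:
                if n == y or n not in dom[y]:
                    df[n].add(y)
    return df

def phi_blocks(defs, df):
    # Iterated dominance frontier of the blocks that store the variable:
    # exactly where mem2reg must place a phi instruction for it.
    phis, work = set(), list(defs)
    while work:
        for y in df[work.pop()]:
            if y not in phis:
                phis.add(y)
                work.append(y)
    return phis

# Diamond CFG: the variable is stored in both branches,
# so a phi belongs at the join block.
cfg = {"entry": ["then", "else"], "then": ["join"],
       "else": ["join"], "join": []}
dom, preds = dominators(cfg, "entry")
df = dominance_frontiers(cfg, dom, preds)
print(phi_blocks({"then", "else"}, df))   # → {'join'}
```

The join block is in the dominance frontier of both branches, so it gets the phi; the branches themselves, which each branch block strictly dominates nothing beyond, get none. This is the "pruned" part left out of the sketch: real mem2reg also avoids phis for variables that are dead at the join.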