
How can I map multiple loops of the same type in a function to their generated basic blocks in LLVM IR?


If the loops are of different types, I can easily identify them by name. But if there are multiple loops of the same type (say, 5 while loops), how can I identify which basic block in the LLVM IR corresponds to which loop in the source code?

Manually this is easy, since we can walk through the source and the LLVM IR side by side, but I am looking for a way to do the same programmatically.

For example, I have the following C source code:

int main()
{
   int count=1;
   while (count <= 4)
   {
        count++;
   }
   while (count > 4)
   {
        count--;
   }
   return 0;
}

When I run the command clang -S -emit-llvm fileName.c, I get a fileName.ll file with the following content:

; ModuleID = 'abc.c'
source_filename = "abc.c"
target datalayout = "e-m:w-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-pc-windows-msvc19.0.23026"

; Function Attrs: noinline nounwind uwtable
define i32 @main() #0 {
entry:
  %retval = alloca i32, align 4
  %count = alloca i32, align 4
  store i32 0, i32* %retval, align 4
  store i32 1, i32* %count, align 4
  br label %while.cond

while.cond:                                       ; preds = %while.body, %entry
  %0 = load i32, i32* %count, align 4
  %cmp = icmp sle i32 %0, 4
  br i1 %cmp, label %while.body, label %while.end

while.body:                                       ; preds = %while.cond
  %1 = load i32, i32* %count, align 4
  %inc = add nsw i32 %1, 1
  store i32 %inc, i32* %count, align 4
  br label %while.cond

while.end:                                        ; preds = %while.cond
  br label %while.cond1

while.cond1:                                      ; preds = %while.body3, %while.end
  %2 = load i32, i32* %count, align 4
  %cmp2 = icmp sgt i32 %2, 4
  br i1 %cmp2, label %while.body3, label %while.end4

while.body3:                                      ; preds = %while.cond1
  %3 = load i32, i32* %count, align 4
  %dec = add nsw i32 %3, -1
  store i32 %dec, i32* %count, align 4
  br label %while.cond1

while.end4:                                       ; preds = %while.cond1
  ret i32 0
}

attributes #0 = { noinline nounwind uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }

!llvm.module.flags = !{!0}
!llvm.ident = !{!1}

!0 = !{i32 1, !"PIC Level", i32 2}
!1 = !{!"clang version 4.0.0 (tags/RELEASE_400/final)"}

There are now two loop-header basic blocks for the given source file, while.cond and while.cond1. How can I identify which basic block belongs to which while loop in the source code?


Solution

  • Before answering, note that depending on the selected optimization level, or on the passes you run manually with opt, this information may be missing or inaccurate (e.g. because of inlining, cloning, etc.).

    Now, the way to associate low-level representations with source code is through debugging information (e.g. in the DWARF format). To produce it, pass the -g command-line flag during compilation.

    For LLVM IR, the Loop API has relevant calls such as getStartLoc. So you could do something like this (e.g. inside the runOnFunction method of an llvm::FunctionPass):

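    // Needs llvm/Analysis/LoopInfo.h and llvm/IR/DebugInfoMetadata.h, and the
    // pass must request LoopInfoWrapperPass in getAnalysisUsage().
    // From a ModulePass, pass the function instead: getAnalysis<...>(F).
    auto &LI = getAnalysis<llvm::LoopInfoWrapperPass>().getLoopInfo();

    // Top-level loops only; nested loops are reachable through getSubLoops().
    llvm::SmallVector<llvm::Loop *, 8> workList(LI.begin(), LI.end());

    for (auto *e : workList) {
      llvm::DebugLoc loc = e->getStartLoc();
      if (!loc)
        continue; // no debug info for this loop (was the file compiled with -g?)

      auto line = loc.getLine();
      auto *scope = llvm::dyn_cast<llvm::DIScope>(loc.getScope());
      if (!scope)
        continue;
      auto filename = scope->getFilename();

      // do stuff here
    }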
    

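    Moreover, for a BasicBlock you can use the debug-related methods of Instruction (e.g. getDebugLoc) and combine them with other Loop methods such as getHeader. In the example above, the header blocks are while.cond and while.cond1, so the first instruction in each header that has a valid DebugLoc points at the source line of the corresponding while condition.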

    Also, note that there is a getLoopID method, which returns the loop's llvm.loop metadata node and can serve as an internal identifier for the loop. That metadata is not always present (the method then returns a null pointer), and it is subject to the potential elisions I mentioned at the start. Anyhow, if you need to manipulate it, look at examples in the LLVM source that call the setLoopID method (e.g. in lib/Transforms/Scalar/LoopRotation.cpp).
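
To see concretely what the -g flag mentioned above adds, and as a rough cross-check that does not need an LLVM build, the sketch below scans the textual .ll file directly instead of going through the Loop API. It relies on two things clang emits when compiling with -g: instructions carry a "!dbg !N" attachment, and each "!N = !DILocation(line: ..., column: ..., scope: ...)" node records the source line. The function name blockLines is made up for this illustration. The code assumes that block labels are kept in the IR (as in the listing in the question) and that label names are unique within the file.

```cpp
#include <cassert>
#include <map>
#include <regex>
#include <sstream>
#include <string>

// Hypothetical helper (not part of LLVM): maps each labelled basic block in
// textual LLVM IR to the source line of the first instruction in that block
// that has a !dbg attachment. Blocks without any !dbg are left out.
// Assumption: label names are unique across the whole file.
std::map<std::string, unsigned> blockLines(const std::string &ir) {
  static const std::regex labelRe(R"(^([-a-zA-Z$._0-9]+):)");
  static const std::regex dbgUseRe(R"(!dbg !(\d+))");
  static const std::regex dilocRe(R"(^!(\d+) = !DILocation\(line: (\d+))");

  std::map<std::string, std::string> firstDbg; // block label -> metadata id
  std::map<std::string, unsigned> lineOfNode;  // metadata id -> source line
  std::string current;                         // label of the block being read

  std::istringstream in(ir);
  std::string text;
  std::smatch m;
  while (std::getline(in, text)) {
    if (!text.empty() && text.back() == '\r')
      text.pop_back(); // tolerate CRLF line endings

    if (std::regex_search(text, m, dilocRe)) {
      assert(m.size() == 3); // whole match plus the two capture groups
      lineOfNode[m[1].str()] = static_cast<unsigned>(std::stoul(m[2].str()));
    } else if (text.rfind("define", 0) == 0 || text == "}") {
      // Function boundary. The !dbg on a define line names the subprogram,
      // not an instruction location, so it must not be attributed to a block.
      current.clear();
    } else if (std::regex_search(text, m, labelRe)) {
      current = m[1].str();
    } else if (!current.empty() && !firstDbg.count(current) &&
               std::regex_search(text, m, dbgUseRe)) {
      firstDbg[current] = m[1].str();
    }
  }

  // Resolve each block's first !dbg reference to a line number.
  std::map<std::string, unsigned> result;
  for (const auto &entry : firstDbg) {
    auto it = lineOfNode.find(entry.second);
    if (it != lineOfNode.end())
      result[entry.first] = it->second;
  }
  return result;
}
```

For the program in the question, compiled with -g, while.cond should map to line 4 and while.cond1 to line 8, i.e. the two while statements. Inside a pass, the getStartLoc and getDebugLoc calls described above remain the more robust route, since they do not depend on the textual format.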