clang, llvm, llvm-ir

How does the LLVM compiler parse a C struct?


For a C program like this:

struct test_struct1 {
  int *a;
  int b;
};

int main(int argc, char *argv[]) {
  int a = 1;
  struct test_struct1 t1 = {&a, 0};

  return 0;
}

The generated IR code is as follows:

; ModuleID = 'test.c'
source_filename = "test.c"
target datalayout = "e-m:e-p:32:32-p10:8:8-p20:8:8-i64:64-n32:64-S128-ni:1:10:20"
target triple = "wasm32"

%struct.test_struct1 = type { ptr, i32 } ; ①

; Function Attrs: noinline nounwind optnone
define hidden i32 @__main_argc_argv(i32 noundef %0, ptr noundef %1) #0 {
  %3 = alloca i32, align 4
  %4 = alloca i32, align 4
  %5 = alloca ptr, align 4
  %6 = alloca i32, align 4
  %7 = alloca %struct.test_struct1, align 4
  store i32 0, ptr %3, align 4
  store i32 %0, ptr %4, align 4
  store ptr %1, ptr %5, align 4
  store i32 1, ptr %6, align 4 ; ②

  %8 = getelementptr inbounds %struct.test_struct1, ptr %7, i32 0, i32 0
  store ptr %6, ptr %8, align 4
  %9 = getelementptr inbounds %struct.test_struct1, ptr %7, i32 0, i32 1
  store i32 0, ptr %9, align 4
  ret i32 0
}

attributes #0 = { noinline nounwind optnone "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="generic" "target-features"="+mutable-globals,+sign-ext" }

!llvm.module.flags = !{!0}
!llvm.ident = !{!1}

!0 = !{i32 1, !"wchar_size", i32 4}
!1 = !{!"clang version 16.0.6"}

In the IR above, ① is the type generated for the struct, and ② marks the generated memory-access instructions (load/store). I would like to know which parts of the LLVM source code are responsible for ① and ②.

I hope someone familiar with the LLVM source code can give me the answer, or some pointers on which part of the code to look at. Many thanks for your help.


Solution

  • The starting point for learning about the Clang+LLVM design is the documentation pages for the respective components, Clang and LLVM.

    For the task of parsing and code generation of code that uses a C struct, the "Clang" CFE Internals Manual would be the place to start, specifically the sections The Lexer and Preprocessor Library, The Parser Library (brief though that is), and The AST Library. The job of the lexer and parser is to convert C/C++ source code into an abstract syntax tree (AST); this is the compiler "front end".

    The code generator (the "back end" from Clang's point of view) is responsible for generating LLVM IR from the AST. Unfortunately, the internals documentation for The CodeGen Library is just a single sentence with a couple of links. More generally, these "internals" documents tend to provide conceptual overviews rather than code details.

    Fortunately, both components have Doxygen-generated documentation that describes the namespaces, classes, and functions in detail, and has pointers directly into the source code.

    For both of these, the index page itself does not have much information, but the menu at the top has links to namespaces, classes, etc., that can be browsed. As with any large piece of software, it takes a while to get used to the naming conventions, but once you do, text search on the list of classes is often the fastest way to find something. (For Clang, my go-to bookmark is the clang namespace page.)

    Things relevant to parsing a struct include:

    Even more specific to struct:

    Things relevant to generating LLVM IR to manipulate a struct include: