For a C program like this:
```c
struct test_struct1 {
    int *a;
    int b;
};

int main(int argc, char *argv[]) {
    int a = 1;
    struct test_struct1 t1 = {&a, 0};
    return 0;
}
```
The generated IR code is as follows:
```llvm
; ModuleID = 'test.c'
source_filename = "test.c"
target datalayout = "e-m:e-p:32:32-p10:8:8-p20:8:8-i64:64-n32:64-S128-ni:1:10:20"
target triple = "wasm32"

%struct.test_struct1 = type { ptr, i32 }   ; ①

; Function Attrs: noinline nounwind optnone
define hidden i32 @__main_argc_argv(i32 noundef %0, ptr noundef %1) #0 {
  %3 = alloca i32, align 4
  %4 = alloca i32, align 4
  %5 = alloca ptr, align 4
  %6 = alloca i32, align 4
  %7 = alloca %struct.test_struct1, align 4
  store i32 0, ptr %3, align 4
  store i32 %0, ptr %4, align 4
  store ptr %1, ptr %5, align 4
  store i32 1, ptr %6, align 4   ; ②
  %8 = getelementptr inbounds %struct.test_struct1, ptr %7, i32 0, i32 0
  store ptr %6, ptr %8, align 4
  %9 = getelementptr inbounds %struct.test_struct1, ptr %7, i32 0, i32 1
  store i32 0, ptr %9, align 4
  ret i32 0
}

attributes #0 = { noinline nounwind optnone "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="generic" "target-features"="+mutable-globals,+sign-ext" }

!llvm.module.flags = !{!0}
!llvm.ident = !{!1}

!0 = !{i32 1, !"wchar_size", i32 4}
!1 = !{!"clang version 16.0.6"}
```
In the generated IR above, ① is the generated struct type and ② is a generated memory-access instruction (load/store). I would like to know which part of the LLVM source code is responsible for ① and ②.
I hope someone familiar with the LLVM source code can tell me the answer, or give me some ideas on where to look, such as which part of the code to start from. Many thanks for your help.
The starting point for learning about the Clang+LLVM design is the documentation for the respective components.
For parsing code that uses a C `struct` and generating IR for it, the "Clang" CFE Internals Manual would be the place to start, specifically the sections "The Lexer and Preprocessor Library", "The Parser Library" (brief though that is), and "The AST Library".
The job of the lexer and parser is to convert C/C++ source code into an
abstract syntax tree (AST); this is the compiler "front end".
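The code generator (Clang's CodeGen library) is responsible for generating LLVM IR from the AST. It is sometimes loosely called the "back end", but it lives in Clang (`clang/lib/CodeGen`), not in LLVM proper, so that is where the decisions behind both ① and ② are made, using LLVM's IR-building APIs; LLVM's own back end only takes over once the IR exists. Unfortunately, the internals documentation for The CodeGen Library is just a single sentence with a couple of links, and more generally these "internals" documents tend to provide conceptual overviews rather than code details.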
Fortunately, both components have doxygen-generated documentation that describes the namespaces, classes, and functions in detail, with pointers directly into the source code.
For both of these, the index page itself does not have much information, but the menu at the top has links to namespaces, classes, etc., that can be browsed. As with any large piece of software, it takes a while to get used to the naming conventions, but once you do, text search on the list of classes is often the fastest way to find something. (For Clang, my go-to bookmark is the `clang` namespace page.)
Things relevant to parsing a `struct` include:

- `clang::Sema`, which performs semantic analysis and builds the "real" AST used by later stages as the `Parser` calls into it.
- `clang::RecordDecl`, the AST node that represents a C `struct`.
Even more specific to `struct`:

- `clang::Parser::ParseStructDeclaration` parses the member declarations inside a `struct` body (in C grammar terms, a "struct-declaration" is a declaration in a struct, not the declaration of a struct). For this one I'd go to the GitHub repo rather than the doxygen documentation, since the latter omits private methods and this method happens to be private. At a certain point, there's no substitute for directly searching the code.
- `clang::Sema::ActOnTag` is, I think, the main function that builds `RecordDecl`s.
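Things relevant to generating LLVM IR to manipulate a `struct` include:

- The `clang::CodeGen` namespace generally.
- `clang::CodeGen::CGRecordLayout`, which deals with translating Clang record types to LLVM record types, i.e. ①. The layout is computed in `CGRecordLayoutBuilder.cpp`, `CodeGenTypes::ConvertRecordDeclType` creates the resulting `llvm::StructType`, and `CGRecordLayout::getLLVMFieldNo` gives the field index that ends up in each `getelementptr`.
- `clang::CodeGen::CodeGenFunction`, which emits LLVM IR instructions for reading and writing LLVM record types, among other things, i.e. ②. Field addresses come from `EmitLValueForField` (in `CGExpr.cpp`), scalar loads and stores typically go through `EmitLoadOfScalar`/`EmitStoreOfScalar`, and brace initializers such as `{&a, 0}` are handled in `CGExprAgg.cpp`.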