closures clang llvm segmentation-fault trampolines

LLVM Trampoline causing SIGSEGV?

After reading up on generating closures in LLVM using trampolines, I tried compiling some of the trampoline examples floating around the internet (specifically this one). The LLVM IR given in the gist is as follows:

declare void @llvm.init.trampoline(i8*, i8*, i8*);
declare i8* @llvm.adjust.trampoline(i8*);

define i32 @foo(i32* nest %ptr, i32 %val) {
    %x = load i32* %ptr
    %sum = add i32 %x, %val
    ret i32 %sum
}

define i32 @main(i32, i8**) {
    %closure = alloca i32
    store i32 13, i32* %closure
    %closure_ptr = bitcast i32* %closure to i8*

    %tramp_buf = alloca [32 x i8], align 4
    %tramp_ptr = getelementptr [32 x i8]* %tramp_buf, i32 0, i32 0
    call void @llvm.init.trampoline(
            i8* %tramp_ptr,
            i8* bitcast (i32 (i32*, i32)* @foo to i8*),
            i8* %closure_ptr)
    %ptr = call i8* @llvm.adjust.trampoline(i8* %tramp_ptr)
    %fp = bitcast i8* %ptr to i32(i32)*
    %res = call i32 %fp (i32 13)

    ret i32 %res
}

Compiling this with clang trampolines.ll and running the result, however, ends in a SIGSEGV (the exact error fish gives is fish: Job 1, './a.out ' terminated by signal SIGSEGV (Address boundary error)).

After some testing, it turned out that the call through the "trampolined" function pointer (%res = call i32 %fp (i32 13)) is the instruction causing the SIGSEGV: with that call commented out (and a dummy value returned instead), the program runs fine.

The problem does not seem to lie with the clang driver either, because running llvm-as, llc and the like by hand produces the same crash, and so does compiling on another machine. This leads me to believe that either my machine or LLVM is doing something wrong.

My clang version:

Apple LLVM version 6.1.0 (clang-602.0.49) (based on LLVM 3.6.0svn)
Target: x86_64-apple-darwin14.3.0
Thread model: posix

Solution

Alright, more than a year later, and with the help of @user855, I finally have a working example.

As user855 noted in the comments, the code fails because the memory used to store the trampoline is not executable. This can be circumvented by using mmap to allocate executable memory instead (note that this is not memory on the stack, as opposed to before).

The code:

declare void @llvm.init.trampoline(i8*, i8*, i8*)
declare i8* @llvm.adjust.trampoline(i8*)
declare i8* @"\01_mmap"(i8*, i64, i32, i32, i32, i64)

define i32 @foo(i32* nest %ptr, i32 %val) {
    %x = load i32, i32* %ptr
    %sum = add i32 %x, %val
    ret i32 %sum
}

define i32 @main(i32, i8**) {
    %closure = alloca i32
    store i32 13, i32* %closure
    %closure_ptr = bitcast i32* %closure to i8*

    %mmap_ptr = call i8* @"\01_mmap"(i8* null, i64 72, i32 7, i32 4098, i32 0, i64 0)

    call void @llvm.init.trampoline(
            i8* %mmap_ptr,
            i8* bitcast (i32 (i32*, i32)* @foo to i8*),
            i8* %closure_ptr)

    %ptr = call i8* @llvm.adjust.trampoline(i8* %mmap_ptr)
    %fp = bitcast i8* %ptr to i32(i32)*
    %res = call i32 %fp (i32 13)

    ret i32 %res
}

The mmap call corresponds to mmap(NULL, 72, PROT_READ | PROT_WRITE | PROT_EXEC, MAP_ANONYMOUS | MAP_PRIVATE, 0, 0); the constants 7 and 4098 in the IR are the values of those flag combinations on my platform. Note that the mmap function name is "\01_mmap" on my platform; both the name and the flag values may differ on yours. To check, simply compile some C code that calls mmap using clang -S -emit-llvm and look at the emitted mmap call.
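A small C file for that check could look like the sketch below. The helper name alloc_trampoline_page and the file name mmap_probe.c are made up for illustration, and I pass -1 as the file descriptor, which is the conventional value for anonymous mappings (the IR above passes 0). Compiling it with clang -S -emit-llvm mmap_probe.c -o - should show which symbol name clang uses for mmap and which integer values the flag expressions turn into.

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS on glibc under strict -std= modes */
#include <stddef.h>
#include <sys/mman.h>

/* Some systems only spell the flag MAP_ANON. */
#ifndef MAP_ANONYMOUS
#define MAP_ANONYMOUS MAP_ANON
#endif

/* Hypothetical helper (not part of LLVM or libc): maps `size` bytes that are
 * readable, writable and executable. Returns NULL if the mapping fails. */
void *alloc_trampoline_page(size_t size) {
    void *p = mmap(NULL, size,
                   PROT_READ | PROT_WRITE | PROT_EXEC,
                   MAP_ANONYMOUS | MAP_PRIVATE,
                   -1, 0);
    return p == MAP_FAILED ? NULL : p;
}
```

On x86-64 Linux, for example, MAP_ANONYMOUS | MAP_PRIVATE is 34 rather than 4098, so the constants should be taken from the IR emitted on the target platform rather than copied from this post.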

Also note that, unlike the stack buffer from before, the mapped trampoline memory is not freed automatically: it should be released once the trampoline is no longer needed, using munmap(ptr, 72).
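To illustrate the pairing of mmap and munmap, here is a rough C sketch of the buffer's lifetime. The function name trampoline_buffer_roundtrip and the TRAMP_SIZE macro are invented for this example, and the single byte write only stands in for what llvm.init.trampoline would do; it is not a real trampoline.

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS on glibc under strict -std= modes */
#include <stddef.h>
#include <sys/mman.h>

/* Some systems only spell the flag MAP_ANON. */
#ifndef MAP_ANONYMOUS
#define MAP_ANONYMOUS MAP_ANON
#endif

#define TRAMP_SIZE 72 /* same length as in the mmap call above */

/* Hypothetical demo: map the buffer, touch it, then release it.
 * Returns 0 on success and -1 if mmap or munmap failed. */
int trampoline_buffer_roundtrip(void) {
    unsigned char *buf = mmap(NULL, TRAMP_SIZE,
                              PROT_READ | PROT_WRITE | PROT_EXEC,
                              MAP_ANONYMOUS | MAP_PRIVATE,
                              -1, 0);
    if (buf == MAP_FAILED)
        return -1;

    /* Placeholder for the real work: in the IR, llvm.init.trampoline fills
     * the buffer and the adjusted pointer is then called. */
    buf[0] = 0xC3;

    /* munmap gets the same address and length that mmap was given. */
    return munmap(buf, TRAMP_SIZE) == 0 ? 0 : -1;
}
```

Once munmap has returned, the function pointer obtained from llvm.adjust.trampoline points into unmapped memory, so the release has to happen after the last call through it.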