Tags: c, assembly, x86-64, abi, fibers

C fibers crashing on printf


I am in the process of creating a fiber threading system in C, following https://graphitemaster.github.io/fibers/ . I have a function to set and restore context, and what I am trying to accomplish is launching a function as a fiber with its own stack. Linux, x86_64 SysV ABI.

extern void restore_context(struct fiber_context*);
extern void create_context(struct fiber_context*);

void foo_fiber()
{
    printf("Called as a fiber");
    exit(0);
}

int main()
{
    const uint32_t stack_size = 4096 * 16;
    const uint32_t red_zone_abi = 128;

    char* stack = aligned_alloc(16, stack_size);
    char* sp = stack + stack_size - red_zone_abi;

    struct fiber_context c = {0};
    c.rip = (void*)foo_fiber;
    c.rsp = (void*)sp;

    restore_context(&c);
}

where restore_context code is as follows:

.type restore_context, @function
.global restore_context
restore_context:
  movq 8*0(%rdi), %r8

  # Load new stack pointer.
  movq 8*1(%rdi), %rsp

  # Load preserved registers.
  movq 8*2(%rdi), %rbx
  movq 8*3(%rdi), %rbp
  movq 8*4(%rdi), %r12
  movq 8*5(%rdi), %r13
  movq 8*6(%rdi), %r14
  movq 8*7(%rdi), %r15

  # Push RIP to stack for RET.
  pushq %r8

  xorl %eax, %eax
  ret

So basically I am creating a new stack on the heap, and since the stack grows downwards, I take the end address minus 128 bytes of red zone (which is necessary in the ABI). What restore_context does is simply switch %rsp to my new stack, push the address of foo_fiber onto it, and then ret to jump into foo_fiber. (It also loads some registers from the fiber_context structure, but that should not matter now.)

From what I'm seeing in GDB, the program manages to properly jump to foo_fiber and into printf, and then it crashes in __vfprintf_internal on movaps %xmm1, 0x10(%rsp).

   0x7ffff7e2f389 <__vfprintf_internal+153>    movdqu (%rax),%xmm1
   0x7ffff7e2f38d <__vfprintf_internal+157>    movups %xmm1,0x128(%rsp)
   0x7ffff7e2f395 <__vfprintf_internal+165>    mov    0x10(%rax),%rax
  >0x7ffff7e2f399 <__vfprintf_internal+169>    movaps %xmm1,0x10(%rsp)

I find that extremely odd, since it managed movups %xmm1, 0x128(%rsp), which is a much higher offset from the stack pointer. What is going on there?

If I change the code of foo_fiber to do something else, for example allocate and randomly fill a char[100], it works.

I am kind of at a loss about what is going on. At first I thought I might have alignment issues, since the vector xmm instructions are crashing, so I changed malloc to aligned_alloc. The crash I am getting is a SIGSEGV, but 0x10


Solution

  • Agree with comments: your stack alignment is incorrect.

    It is true that the stack must be aligned to 16 bytes. However, the question is when? The normal rule is that the stack pointer must be a multiple of 16 at the site of a call instruction that calls an ABI-compliant function.

    Well, you don't use a call instruction, but what that really means is that on entry to an ABI-compliant function, the stack pointer must be 8 less than a multiple of 16, or in other words an odd multiple of 8, since it assumes it was called with a call instruction that pushed an 8-byte return address. That is just the opposite of what your code does: your sp is a multiple of 16, and the pushq followed by ret leaves %rsp equal to sp again on entry to foo_fiber. So the stack is misaligned for the rest of your program, which makes printf crash when it tries to use aligned move instructions such as movaps (movups has no alignment requirement, which is why the earlier store succeeded).

    You could subtract 8 from the sp computed in your C code.

    Or, I'm not really sure why you go to the trouble of loading the destination address into a register, then pushing and ret, when an indirect jump or call would do. (Unless you are deliberately trying to fool the indirect branch predictor?) An indirect call will also kill the stack-alignment bird, by pushing the return address (even though it will never be used). So you could leave the rest of your code alone, and replace all the r8/ret stuff in restore_context with just

    callq *(8*0)(%rdi)