c stack stack-overflow disassembly variable-length

What are a stack guard page and stack probing?


I'm analysing how the compiler implements C99 variable-length arrays. Below are my C code and its disassembly, annotated with comments that reflect my understanding. The code was compiled with "-O3 -fomit-frame-pointer -fno-stack-protector -fpie".

C code:

#include <stdio.h>

int main() {
  size_t sz; // unsigned, so it can never be negative
  scanf("%zd", &sz);
  volatile char s[sz+1]; // volatile keeps the array from being optimized away
  s[sz] = '\0';
}

disassembly:

Reading symbols from a.out...
(gdb) disass main
Dump of assembler code for function main():
   0x0000000000001060 <+0>:     endbr64                  
   0x0000000000001064 <+4>:     push   %rbp              # save the current frame pointer.
   0x0000000000001065 <+5>:     lea    0xf98(%rip),%rdi  # rdi = "%zd". 1st param
   0x000000000000106c <+12>:    xor    %eax,%eax         # eax = 0. 
   0x000000000000106e <+14>:    mov    %rsp,%rbp         # set the new frame pointer. 
   0x0000000000001071 <+17>:    sub    $0x10,%rsp        # allocate 16 bytes; rsp stays 16-byte aligned.

   0x0000000000001075 <+21>:    lea    -0x8(%rbp),%rsi              # rsi = &sz. 2nd param.
   0x0000000000001079 <+25>:    callq  0x1050 <__isoc99_scanf@plt>  # call __isoc99_scanf

   # volatile char s[sz+1]; // prevent to be optimized away.
   0x000000000000107e <+30>:    mov    -0x8(%rbp),%rcx           # rcx = sz
   0x0000000000001082 <+34>:    mov    %rsp,%rdi                 # rdi = rsp.
   0x0000000000001085 <+37>:    lea    0x10(%rcx),%rax           # rax = sz + 1 + 15
   0x0000000000001089 <+41>:    mov    %rax,%rdx                 # rdx = sz + 1 + 15
   0x000000000000108c <+44>:    and    $0xfffffffffffff000,%rax  # round down to a multiple of 4096
   0x0000000000001092 <+50>:    sub    %rax,%rdi                 # rdi = rsp minus the whole-page part
   0x0000000000001095 <+53>:    and    $0xfffffffffffffff0,%rdx  # round down to a multiple of 16
   0x0000000000001099 <+57>:    mov    %rdi,%rax                 # rax = target rsp for the page loop
   0x000000000000109c <+60>:    cmp    %rax,%rsp                 # if sz+16 is less than 4096,
   0x000000000000109f <+63>:    je     0x10b6 <main()+86>        # then jump to main+86 and skip the loop

   # the stack grows by one page on every iteration of the loop.
   0x00000000000010a1 <+65>:    sub    $0x1000,%rsp        # grow the stack. 
   0x00000000000010a8 <+72>:    orq    $0x0,0xff8(%rsp)    # probe stack(???).
   0x00000000000010b1 <+81>:    cmp    %rax,%rsp           # if rsp isn't equal to rax,
   0x00000000000010b4 <+84>:    jne    0x10a1 <main()+65>  # then loop.

   0x00000000000010b6 <+86>:    and    $0xfff,%edx         # be less than 4096
   0x00000000000010bc <+92>:    sub    %rdx,%rsp           # allocate the remainder.
   0x00000000000010bf <+95>:    test   %rdx,%rdx           # if the remainder is not zero,
   0x00000000000010c2 <+98>:    jne    0x10cc <main()+108> # then, jump to probe stack(?).

   0x00000000000010c4 <+100>:   movb   $0x0,(%rsp,%rcx,1)  # s[sz] = '\0'
   0x00000000000010c8 <+104>:   xor    %eax,%eax           # eax = 0. 
   0x00000000000010ca <+106>:   leaveq                     # restore the previous stack frame.
   0x00000000000010cb <+107>:   retq                       # return 0;

   0x00000000000010cc <+108>:   orq    $0x0,-0x8(%rsp,%rdx,1)  # probe stack(??).
   0x00000000000010d2 <+114>:   jmp    0x10c4 <main()+100>     # jump back.
End of assembler dump.
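
As a cross-check on the comments above, the size arithmetic between main+37 and main+92 can be written out in C. This is only a sketch: `split_vla_request` and `struct vla_split` are names made up for illustration, and the 4096-byte page and 16-byte alignment constants are simply the values that appear in the listing.

```c
#include <assert.h>
#include <stddef.h>

/* Assumptions taken from the listing: 0x1000-byte pages, 16-byte alignment. */
#define PAGE_BYTES  ((size_t)4096)
#define STACK_ALIGN ((size_t)16)

/* Hypothetical helper type: how one VLA request is split up. */
struct vla_split {
    size_t rounded;    /* request rounded up to a multiple of 16 (rdx) */
    size_t page_part;  /* bytes handled by the 4096-byte loop (rax) */
    size_t remainder;  /* bytes allocated after the loop (rdx & 0xfff) */
};

/* array_bytes corresponds to sz + 1 in the C source. */
static struct vla_split split_vla_request(size_t array_bytes)
{
    /* Assumption: the request is small enough that adding 15 cannot wrap. */
    assert(array_bytes <= (size_t)-1 - (STACK_ALIGN - 1));

    struct vla_split s;
    s.rounded   = (array_bytes + (STACK_ALIGN - 1)) & ~(STACK_ALIGN - 1);
    s.page_part = s.rounded & ~(PAGE_BYTES - 1);
    s.remainder = s.rounded & (PAGE_BYTES - 1);
    return s;
}
```

For example, a request of 5001 bytes is rounded to 5008, of which 4096 bytes go through the page loop and 912 bytes are the remainder.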

According to https://nullprogram.com/blog/2019/10/27/, first, -fomit-frame-pointer is ignored because a function with a VLA has to track its stack frame dynamically. Second, when -fstack-clash-protection is enabled, the compiler generates extra code to probe every page of the allocation in case one of those pages is a guard page.

But I don't understand these lines in my disassembly:

   # the stack grows by one page on every iteration of the loop.
   0x00000000000010a1 <+65>:    sub    $0x1000,%rsp        # grow the stack. 
   0x00000000000010a8 <+72>:    orq    $0x0,0xff8(%rsp)    # probe stack(???).
   0x00000000000010b1 <+81>:    cmp    %rax,%rsp           # if rsp isn't equal to rax,
   0x00000000000010b4 <+84>:    jne    0x10a1 <main()+65>  # then loop.

What does "orq $0x0, 0xff8(%rsp)" mean, and what is stack probing?


Solution

  • In normal operation the stack is accessed sequentially as it grows. The OS places a guard page (a page marked as not present) at the end of the stack space, so when the stack overflows, the process tries to write to the protected page and gets a segmentation fault. But when the stack grows by more than one page at once, the stack pointer can jump over the guard page; the overflow is then not detected, and data outside the stack may be overwritten. By probing every allocated page, the process ensures that either the stack has not overflowed or a segmentation fault occurs on overflow.

    With -fstack-clash-protection and an array larger than the stack limit, your program is terminated by a segmentation fault:

    $ ./stack-protected
    16777216
    Segmentation fault (core dumped)
    

    With -fno-stack-clash-protection it continues to run:

    $ ./stack-unprotected
    16777216
    $
    

    but some data outside the stack is corrupted.

    orq $0x0, 0xff8(%rsp) ORs the 64-bit word at offset 0xff8 from the new stack pointer with 0. The loop executes it once per page, so every newly allocated page is written to without any data actually changing.
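
The same idea can be sketched in C on an ordinary buffer: walk a region downwards one page at a time and do a read-modify-write with 0 at the top of each page. This is an illustration, not what the compiler emits. `probe_region` and `PROBE_STEP` are made-up names, the 4096-byte step is taken from the 0x1000 in the listing, and the sketch touches a single byte per page rather than a 64-bit word to stay clear of alignment concerns.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 4 KiB pages, matching the 0x1000 step in the listing. */
#define PROBE_STEP ((size_t)4096)

/*
 * Touch one byte in every PROBE_STEP-sized chunk of [base, base + len),
 * starting at the high end and moving down, the way a stack grows.
 * A final probe covers any remainder smaller than one chunk.
 * Returns the number of probes performed.
 */
static size_t probe_region(unsigned char *base, size_t len)
{
    assert(base != NULL);

    size_t probes = 0;
    size_t offset = len;

    while (offset >= PROBE_STEP) {
        offset -= PROBE_STEP;  /* "sub $0x1000,%rsp" */
        volatile unsigned char *p = base + offset + PROBE_STEP - 1;
        *p |= 0;               /* load and store back: the value is unchanged */
        probes++;
    }

    if (offset > 0) {          /* remainder, like the probe at main+108 */
        volatile unsigned char *p = base + offset - 1;
        *p |= 0;
        probes++;
    }
    return probes;
}
```

The pointer is declared volatile so that the compiler has to keep the load and the store even though OR with 0 cannot change the value. On a real stack, that store is what faults if the page being touched is the guard page.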