I'm analysing how the compiler implements variable-length arrays in C99. Below are my C code and its disassembly, annotated with my understanding. The code is compiled with "-O3 -fomit-frame-pointer -fno-stack-protector -fpie".
C code:
#include <stdio.h>
int main(void) {
    size_t sz; // size_t is unsigned
    scanf("%zu", &sz); // %zu is the conversion for size_t
    volatile char s[sz+1]; // volatile keeps the array from being optimized away
    s[sz] = '\0';
}
Disassembly:
Reading symbols from a.out...
(gdb) disass main
Dump of assembler code for function main():
0x0000000000001060 <+0>: endbr64
0x0000000000001064 <+4>: push %rbp # save the current frame pointer.
0x0000000000001065 <+5>: lea 0xf98(%rip),%rdi # rdi = format string (1st argument).
0x000000000000106c <+12>: xor %eax,%eax # eax = 0 (al = number of vector registers used by the variadic call).
0x000000000000106e <+14>: mov %rsp,%rbp # set the new frame pointer.
0x0000000000001071 <+17>: sub $0x10,%rsp # allocate 16 bytes (sz lives at -0x8(%rbp)); rsp stays 16-byte aligned.
0x0000000000001075 <+21>: lea -0x8(%rbp),%rsi # rsi = &sz. 2nd param.
0x0000000000001079 <+25>: callq 0x1050 <__isoc99_scanf@plt> # call __isoc99_scanf
# volatile char s[sz+1]; // prevent to be optimized away.
0x000000000000107e <+30>: mov -0x8(%rbp),%rcx # rcx = sz
0x0000000000001082 <+34>: mov %rsp,%rdi # rdi = rsp.
0x0000000000001085 <+37>: lea 0x10(%rcx),%rax # rax = sz + 1 + 15
0x0000000000001089 <+41>: mov %rax,%rdx # rdx = sz + 1 + 15
0x000000000000108c <+44>: and $0xfffffffffffff000,%rax # rax = whole-page part (multiple of 4096)
0x0000000000001092 <+50>: sub %rax,%rdi # rdi = rsp - whole-page part (where the page loop stops)
0x0000000000001095 <+53>: and $0xfffffffffffffff0,%rdx # rdx = sz + 1 rounded up to a multiple of 16 (total size)
0x0000000000001099 <+57>: mov %rdi,%rax # rax = loop target for rsp
0x000000000000109c <+60>: cmp %rax,%rsp # if sz + 16 is less than 4096 (no whole page),
0x000000000000109f <+63>: je 0x10b6 <main()+86> # then skip the page loop and jump to main+86.
# the stack grows by one page (4096 bytes) on every iteration of the loop.
0x00000000000010a1 <+65>: sub $0x1000,%rsp # grow the stack.
0x00000000000010a8 <+72>: orq $0x0,0xff8(%rsp) # probe stack(???).
0x00000000000010b1 <+81>: cmp %rax,%rsp # if rsp isn't equal to rax,
0x00000000000010b4 <+84>: jne 0x10a1 <main()+65> # then loop.
0x00000000000010b6 <+86>: and $0xfff,%edx # rdx = remainder: total size modulo 4096
0x00000000000010bc <+92>: sub %rdx,%rsp # allocate the remainder.
0x00000000000010bf <+95>: test %rdx,%rdx # if the remainder is not zero,
0x00000000000010c2 <+98>: jne 0x10cc <main()+108> # then, jump to probe stack(?).
0x00000000000010c4 <+100>: movb $0x0,(%rsp,%rcx,1) # s[sz] = '\0'
0x00000000000010c8 <+104>: xor %eax,%eax # eax = 0.
0x00000000000010ca <+106>: leaveq # restore the previous stack frame.
0x00000000000010cb <+107>: retq # return 0;
0x00000000000010cc <+108>: orq $0x0,-0x8(%rsp,%rdx,1) # probe stack(??).
0x00000000000010d2 <+114>: jmp 0x10c4 <main()+100> # jump back.
End of assembler dump.
"https://nullprogram.com/blog/2019/10/27/"
says that, first, -fomit-frame-pointer is ignored because a VLA requires the stack frame to be tracked dynamically, and second, that when -fstack-clash-protection is enabled, the compiler generates extra code to probe every page of the allocation in case one of those pages is a guard page.
But I don't understand these lines in my disassembly:
# the stack grows by one page (4096 bytes) on every iteration of the loop.
0x00000000000010a1 <+65>: sub $0x1000,%rsp # grow the stack.
0x00000000000010a8 <+72>: orq $0x0,0xff8(%rsp) # probe stack(???).
0x00000000000010b1 <+81>: cmp %rax,%rsp # if rsp isn't equal to rax,
0x00000000000010b4 <+84>: jne 0x10a1 <main()+65> # then loop.
What does "orq $0x0, 0xff8(%rsp)" mean? And what is stack probing?
In normal operation the stack is accessed sequentially as it grows. The OS places a guard page (marked as not present) at the end of the stack space, so when the stack overflows, the process tries to write to the protected page and gets a segmentation fault. But when the stack grows by more than one page at once, the stack pointer can jump over the guard page: the overflow is not detected and data beyond the stack may be overwritten. By probing (touching) every allocated page in order, the process guarantees that either the stack does not overflow or the overflow hits the guard page and causes a segmentation fault.
With -fstack-clash-protection and an array larger than the stack limit, your program is terminated by a segmentation fault:
$ ./stack-protected
16777216
Segmentation fault (core dumped)
With -fno-stack-clash-protection it continues to run:
$ ./stack-unprotected
16777216
$
but some data outside the stack is silently corrupted.
orq $0x0, 0xff8(%rsp)
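ORs the 64-bit word at rsp + 0xff8 (the top 8 bytes of the 4096 bytes just subtracted from rsp) with the value 0. It reads the word and writes it back unchanged, so it touches the page, faulting if that page is the guard page, without actually modifying any data.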