I am attempting to code a simple stack canary in 64-bit assembly using NASM version 2.15.04 on Ubuntu 20.10. Executing the code below results in a segmentation fault when it is assembled and linked with the command nasm -felf64 canary.asm && ld canary.o.
global _start
section .text
_start: endbr64
push rbp ; Save base pointer
mov rbp, rsp ; Set the base pointer
call _func ; Call _func
mov rdi, rax ; Save return value of _func in RDI
mov rax, 0x3c ; Specify exit syscall
syscall ; Exit
_func: endbr64
push rbp ; Save the base pointer
mov rbp, rsp ; Set the base pointer
sub rsp, 0x8 ; Adjust the stack pointer
mov rax, qword fs:[0x28] ; Get stack canary
mov qword [rbp - 0x8], rax ; Save stack canary on the stack
xor eax, eax ; Clear RAX
mov rax, 0x1 ; Specify write syscall
mov rdi, 0x1 ; Specify stdout
mov rsi, msg ; Char* buffer to print
mov rdx, 0xd ; Length of the buffer
syscall ; Write msg
mov rax, qword [rbp - 0x8] ; Retrieve the stack canary
xor rax, qword fs:[0x28] ; Compare to original value
je _return ; Jump to _return if canary matched original
xor eax, eax ; Clear RAX
mov rax, 0x1 ; Specify write syscall
mov rdi, 0x1 ; Specify stdout
mov rsi, stack_fail ; Char* buffer to print
mov rdx, 0x18 ; Length of the buffer
syscall ; Write stack_fail
mov rax, 0x3c ; Specify exit syscall
mov rdi, 0x1 ; Specify error code 1 (exit status goes in RDI, not RAX)
syscall ; Exit
_return: xor eax, eax ; Set return value to 0
add rsp, 0x8 ; Reset stack pointer
pop rbp ; Get original base pointer
ret ; Return
section .data
msg: db "Hello, World", 0xa, 0x0
stack_fail db "Stack smashing detected", 0xa, 0x0
Debugging with GDB shows that the segmentation fault happens on line 16: mov rax, qword fs:[0x28].
─────────────────────────────────────────────────────────────────────────────────── code:x86:64 ────
0x40101b <_func+4> push rbp
0x40101c <_func+5> mov rbp, rsp
0x40101f <_func+8> sub rsp, 0x8
→ 0x401023 <_func+12> mov rax, QWORD PTR fs:0x28
0x40102c <_func+21> mov QWORD PTR [rbp-0x8], rax
0x401030 <_func+25> xor eax, eax
0x401032 <_func+27> mov eax, 0x1
0x401037 <_func+32> mov edi, 0x1
0x40103c <_func+37> movabs rsi, 0x402000
─────────────────────────────────────────────────────────────────────────────────────── threads ────
[#0] Id 1, Name: "a.out", stopped 0x401023 in _func (), reason: SIGSEGV
However, assembling and dynamically linking with libc via nasm -felf64 canary.asm && ld canary.o -lc -dynamic-linker /usr/lib64/ld-linux-x86-64.so.2 results in execution succeeding, with no segmentation fault.
Using Radare2 to compare the final binaries shows that both versions assembled the problem instruction identically as:
0x00401023 64488b042528. mov rax, qword fs:[0x28]
GDB in both cases also shows that the FS register is 0x0000 at execution time for that instruction.
So the instruction bytes and the FS register are identical whether or not the binary is linked with libc, and the code uses no external symbols from libc. Why does linking with libc cause execution to succeed, while not linking libc causes a segmentation fault? Is it possible to implement this without linking libc, and if so, how?
NOTE: The relevance or need of a stack canary in this example is not the focus of the question.
Accessing a segment register is no problem, just mov eax, fs. But what you're trying to do is access thread-local storage at a small offset from the FS segment base, which libc init stuff will have asked the kernel to set up.
The simplest thing would be to just access your stack canary with a normal RIP-relative addressing mode, not relative to FS base, like GCC will do when targeting other ISAs. Only if you want to make it harder for some other exploit to reach the canary (and for its address to be separately randomizable) do you need TLS. (Or so library code can access it without the indirection of loading a pointer from the GOT, instead of only being efficient for code in the main executable.)
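As a rough sketch of that RIP-relative approach, the program below keeps the canary in an ordinary global in .bss instead of at fs:0x28. The names canary, func and msg_len are made up for illustration, and filling the canary with the getrandom system call is an assumption of this sketch, not something the answer prescribes.

```nasm
; Build: nasm -felf64 canary_rip.asm && ld canary_rip.o
default rel                     ; make [symbol] operands RIP-relative
global _start

section .text
_start:
    ; getrandom(&canary, 8, 0): fill the global canary with random bytes.
    ; The return value is not checked here; on failure the canary stays 0.
    mov   eax, 318              ; __NR_getrandom
    lea   rdi, [canary]
    mov   esi, 8
    xor   edx, edx
    syscall

    call  func
    mov   edi, eax              ; exit status = return value of func
    mov   eax, 60               ; __NR_exit
    syscall

func:
    push  rbp
    mov   rbp, rsp
    sub   rsp, 16               ; room for the canary, RSP stays 16-byte aligned
    mov   rax, [canary]         ; load the global canary (RIP-relative)
    mov   [rbp - 8], rax        ; store a copy in the stack frame

    mov   eax, 1                ; __NR_write
    mov   edi, 1                ; stdout
    lea   rsi, [msg]
    mov   edx, msg_len
    syscall

    mov   rax, [rbp - 8]        ; reload the copy from the stack
    sub   rax, [canary]         ; zero if and only if it is unchanged
    jne   .fail
    leave                       ; RAX is already 0, which is the return value
    ret
.fail:
    mov   eax, 60               ; __NR_exit
    mov   edi, 1                ; status 1: canary mismatch
    syscall

section .data
msg:     db "Hello, World", 0xa
msg_len  equ $ - msg

section .bss
canary:  resq 1
```

Because nothing touches FS, this runs in a static executable with no setup at all; the trade-off is that the canary lives at a fixed offset from the code.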
You can of course make the same system calls libc does to set up thread-local storage and use it, if you want to copy GCC's stack-canary code.
Fun fact: sub rax, qword fs:[0x28] is a more efficient way to check the canary than XOR - it can macro-fuse with the JCC into a single uop. That's why current GCC changed to using sub. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90568 - fixed in GCC10+.
My GCC bug report actually included self-contained microbenchmark code (to prove that sub can macro-fuse even with an FS: addressing mode).
The code below works without libc in a static executable: it sets up the FS segment so that its base address is the address of a buffer, so [fs: 0x28] will work. This is a basic form of TLS.
global _start
_start:
cookie equ 12345
mov eax, 158 ; __NR_arch_prctl
mov edi, 0x1002 ; ARCH_SET_FS
lea rsi, [buf]
syscall
mov qword [fs: 0x28], cookie
...
section .bss
buf: resb 4096 ; fs.base will point at this buffer
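For a complete program along those lines, here is a minimal sketch that points the FS base at a buffer with arch_prctl and then runs a canary-protected function. The names tls_block, func and msg_len are hypothetical, and seeding the slot at offset 0x28 with getrandom is an assumption of this sketch (the snippet above uses a fixed cookie instead).

```nasm
; Build: nasm -felf64 canary_tls.asm && ld canary_tls.o
global _start

ARCH_SET_FS equ 0x1002

section .text
_start:
    mov   eax, 158                     ; __NR_arch_prctl
    mov   edi, ARCH_SET_FS
    lea   rsi, [rel tls_block]         ; new FS base
    syscall
    test  rax, rax
    jnz   .die                         ; arch_prctl returns 0 on success

    mov   eax, 318                     ; __NR_getrandom
    lea   rdi, [rel tls_block + 0x28]  ; the slot that fs:0x28 refers to
    mov   esi, 8
    xor   edx, edx
    syscall
    cmp   rax, 8
    jne   .die                         ; did not get all 8 bytes

    call  func
    mov   edi, eax                     ; exit status = return value of func
    mov   eax, 60                      ; __NR_exit
    syscall
.die:
    mov   eax, 60                      ; __NR_exit
    mov   edi, 2                       ; status 2: setup failed
    syscall

func:
    push  rbp
    mov   rbp, rsp
    sub   rsp, 16                      ; keep RSP 16-byte aligned
    mov   rax, [fs:0x28]               ; read the canary through FS
    mov   [rbp - 8], rax

    mov   eax, 1                       ; __NR_write
    mov   edi, 1                       ; stdout
    lea   rsi, [rel msg]
    mov   edx, msg_len
    syscall

    mov   rax, [rbp - 8]
    sub   rax, [fs:0x28]               ; zero if and only if it is unchanged
    jne   .smashed
    leave                              ; RAX is already 0
    ret
.smashed:
    mov   eax, 60                      ; __NR_exit
    mov   edi, 1                       ; status 1: canary mismatch
    syscall

section .data
msg:      db "Hello, World", 0xa
msg_len   equ $ - msg

section .bss
tls_block: resb 4096                   ; fs.base points here
```

If everything works this prints the message and exits with status 0; a failed setup exits with 2, so it is distinguishable from a canary mismatch.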
If the kernel enabled wrfsbase for user-space use, you could use wrfsbase rsi instead of making a system call. I think the most recent Linux kernel (5.10) maybe has started using wrfsbase itself, but I don't know if it enables user-space use of it.
(It probably doesn't toggle FSGSBASE on/off every time it uses it, so kernel usage would mean user-space can use it; the fault conditions in the manual don't mention privilege level, only the CPUID feature bit and a bit in the CR4 control register. And only in 64-bit mode; it will #UD in other modes including compat mode.)
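One possible way to choose between the two at run time is sketched below. It assumes the kernel advertises user-space FSGSBASE through the HWCAP2_FSGSBASE bit (bit 1) of the AT_HWCAP2 auxiliary-vector entry, which is my understanding of how Linux reports it; the constants are written out by hand and the label names are made up for this example.

```nasm
; Build: nasm -felf64 fsbase.asm && ld fsbase.o
global _start

AT_HWCAP2       equ 26
HWCAP2_FSGSBASE equ 2                  ; bit 1 (assumed meaning, see above)
ARCH_SET_FS     equ 0x1002
cookie          equ 12345

section .text
_start:
    ; Initial stack: argc, argv[0..argc-1], NULL, envp..., NULL, auxv pairs
    mov   rcx, [rsp]                   ; argc
    lea   rbx, [rsp + rcx*8 + 16]      ; skip argc, argv[] and its NULL
.skip_env:
    mov   rax, [rbx]
    add   rbx, 8
    test  rax, rax
    jnz   .skip_env                    ; stop once envp's NULL is consumed
.scan:                                 ; RBX now points at the auxv
    mov   rax, [rbx]                   ; a_type
    test  rax, rax
    jz    .use_syscall                 ; AT_NULL: no AT_HWCAP2 entry
    cmp   rax, AT_HWCAP2
    je    .found
    add   rbx, 16                      ; next (type, value) pair
    jmp   .scan
.found:
    test  qword [rbx + 8], HWCAP2_FSGSBASE
    jz    .use_syscall
    lea   rsi, [rel buf]
    wrfsbase rsi                       ; set the FS base without a syscall
    jmp   .ready
.use_syscall:
    mov   eax, 158                     ; __NR_arch_prctl
    mov   edi, ARCH_SET_FS
    lea   rsi, [rel buf]
    syscall
.ready:
    mov   rax, cookie
    mov   [fs:0x28], rax               ; store through the new FS base
    mov   rax, [fs:0x28]               ; read it back
    xor   edi, edi
    cmp   rax, cookie
    setne dil                          ; exit status 0 if the round trip worked
    mov   eax, 60                      ; __NR_exit
    syscall

section .bss
buf: resb 4096                         ; fs.base will point at this buffer
```

Checking the auxv bit first avoids a #UD on kernels or CPUs where wrfsbase is not usable from user space; the arch_prctl path remains the fallback.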