segmentation-fault x86-64 nasm thread-local-storage memory-segmentation

How to access segment register without linking libc.so?

I am attempting to code a simple stack canary in 64-bit assembly using NASM version 2.15.04 on Ubuntu 20.10. The code below segfaults at run time after being assembled and linked with the command nasm -felf64 canary.asm && ld canary.o.

            global  _start

            section .text
_start:     endbr64
            push    rbp                     ; Save base pointer
            mov     rbp, rsp                ; Set the base pointer
            call    _func                   ; Call _func
            mov     rdi, rax                ; Save return value of _func in RDI
            mov     rax, 0x3c               ; Specify exit syscall 
            syscall                         ; Exit

_func:      endbr64
            push    rbp                     ; Save the base pointer
            mov     rbp, rsp                ; Set the base pointer
            sub     rsp, 0x8                ; Adjust the stack pointer
            mov     rax,  qword fs:[0x28]   ; Get stack canary
            mov     qword [rbp - 0x8], rax  ; Save stack canary on the stack
            xor     eax, eax                ; Clear RAX
            mov     rax, 0x1                ; Specify write syscall
            mov     rdi, 0x1                ; Specify stdout
            mov     rsi, msg                ; Char* buffer to print
            mov     rdx, 0xd                ; Length of the buffer
            syscall                         ; Write msg
            mov     rax, qword [rbp - 0x8]  ; Retrieve the stack canary
            xor     rax, qword fs:[0x28]    ; Compare to original value    
            je      _return                 ; Jump to _return if canary matched original
            xor     eax, eax                ; Clear RAX
            mov     rax, 0x1                ; Specify write syscall 
            mov     rdi, 0x1                ; Specify stdout
            mov     rsi, stack_fail         ; Char* buffer to print
            mov     rdx, 0x18               ; Length of the buffer 
            syscall                         ; Write stack_fail
            mov     rax, 0x3c               ; Specify exit syscall
            mov     rdi, 0x1                ; Specify error code 1
            syscall                         ; Exit

_return:    xor     eax, eax                ; Set return value to 0
            add     rsp, 0x8                ; Reset stack pointer
            pop     rbp                     ; Get original base pointer
            ret                             ; Return 

            section .data
msg:        db      "Hello, World", 0xa, 0x0
stack_fail: db      "Stack smashing detected", 0xa, 0x0

Debugging with GDB shows that the segmentation fault happens on line 16: mov rax, qword fs:[0x28].

─────────────────────────────────────────────────────────────────────────────────── code:x86:64 ────
     0x40101b <_func+4>        push   rbp
     0x40101c <_func+5>        mov    rbp, rsp
     0x40101f <_func+8>        sub    rsp, 0x8
 →   0x401023 <_func+12>       mov    rax, QWORD PTR fs:0x28
     0x40102c <_func+21>       mov    QWORD PTR [rbp-0x8], rax
     0x401030 <_func+25>       xor    eax, eax
     0x401032 <_func+27>       mov    eax, 0x1
     0x401037 <_func+32>       mov    edi, 0x1
     0x40103c <_func+37>       movabs rsi, 0x402000
─────────────────────────────────────────────────────────────────────────────────────── threads ────
[#0] Id 1, Name: "a.out", stopped 0x401023 in _func (), reason: SIGSEGV

However, assembling and dynamically linking against libc via nasm -felf64 canary.asm && ld canary.o -lc -dynamic-linker /usr/lib64/ld-linux-x86-64.so.2 makes execution succeed, with no segmentation fault.

Using Radare2 to compare the final binaries shows that both versions assembled the problem instruction identically as:

0x00401023 64488b042528. mov rax, qword fs:[0x28]

GDB in both cases also shows that the FS register is 0x0000 at execution time for that instruction.

So the instruction bytes and the FS register are identical whether or not the binary is linked with libc, and the code uses no external symbols from libc. Why does linking with libc make execution succeed, while not linking it causes a segmentation fault? Is it possible to implement this without linking libc, and if so, how?

NOTE: The relevance or need of a stack canary in this example is not the focus of the question.

Solution

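Accessing a segment register is no problem, just mov eax, fs. But what you're trying to do is access thread-local storage at a small offset from the FS segment base, which libc init stuff (run by the dynamic linker in your second build) will have asked the kernel to set up. The 0x0000 that GDB shows is only the FS selector; in 64-bit mode the FS base is held separately from the selector, and it stays 0 until something sets it. With a base of 0, fs:[0x28] is just absolute address 0x28, which isn't mapped, hence the segfault.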
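
One way to see the difference between the two builds is to ask the kernel for the current FS base instead of looking at the selector. This is a minimal sketch, assuming Linux x86-64; the fsbase label and the choice of exit statuses are made up for illustration. It uses arch_prctl with ARCH_GET_FS (0x1003), which writes the base to the address passed in RSI.

```nasm
; Sketch: exit with status 1 if the FS base is 0, otherwise 0.
            global  _start

            section .text
_start:     mov     eax, 158                ; __NR_arch_prctl
            mov     edi, 0x1003             ; ARCH_GET_FS
            lea     rsi, [rel fsbase]       ; kernel stores the FS base here
            syscall
            xor     edi, edi                ; Exit status 0 by default
            cmp     qword [rel fsbase], 0   ; Is the base still 0?
            setz    dil                     ; Exit status 1 if so
            mov     eax, 60                 ; __NR_exit
            syscall

            section .bss
fsbase:     resq    1                       ; Receives the FS base (illustrative name)
```

Built with plain ld this should exit with status 1 (check with echo $?), and built with -lc and the dynamic linker it should exit with 0, since TLS has been set up before _start runs.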

The simplest thing would be to just access your stack canary with a normal RIP-relative addressing mode, not relative to FS base, like GCC will do when targeting other ISAs. Only if you want to make it harder for some other exploit to reach the canary (and for its address to be separately randomizable) do you need TLS. (Or so library code can access it without the indirection of loading a pointer from the GOT, instead of only being efficient for code in the main executable.)
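
A rough sketch of that non-TLS approach, assuming a static no-libc build like the one in the question. The canary label and its fixed placeholder value are illustrative only; a real canary should be randomized at startup, which this sketch does not do.

```nasm
; Sketch: stack canary kept in an ordinary global, addressed RIP-relative.
            global  _start

            section .text
_start:     call    _func
            mov     edi, eax                ; Return value of _func is the exit status
            mov     eax, 60                 ; __NR_exit
            syscall

_func:      push    rbp
            mov     rbp, rsp
            sub     rsp, 0x10               ; Room for the canary, keeps RSP 16-byte aligned
            mov     rax, [rel canary]       ; Load the canary from the global
            mov     [rbp - 0x8], rax        ; Save it on the stack
            ; the function body would go here
            mov     rax, [rbp - 0x8]        ; Reload the saved copy
            sub     rax, [rel canary]       ; Zero if it is unchanged
            jne     .fail
            leave                           ; RAX is already 0, the return value
            ret
.fail:      mov     edi, 1                  ; Exit status 1 on a mismatch
            mov     eax, 60                 ; __NR_exit
            syscall

            section .data
canary:     dq      0x0123456789abcdef      ; Placeholder value, not random
```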

You can of course make the same system calls libc does to set up thread-local storage and use it, if you want to copy GCC's stack-canary code.

Fun fact: sub rax, qword fs:[0x28] is a more efficient way to check the canary than XOR - it can macro-fuse with the JCC into a single uop. That's why current GCC changed to using sub. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90568 - fixed in GCC10+.

My GCC bug report actually included self-contained microbenchmark code (to prove that sub can macro-fuse even with an FS: addressing mode).

Without libc, in a static executable, the code below sets the FS base to the address of a buffer so that [fs: 0x28] will work. This is a basic form of TLS.

global _start
_start:

cookie equ 12345
    mov  eax, 158       ; __NR_arch_prctl
    mov  edi, 0x1002    ; ARCH_SET_FS
    lea  rsi, [rel buf] ; RIP-relative address of the buffer
    syscall

    mov  qword [fs: 0x28], cookie

...


section .bss
buf:    resb 4096         ; fs.base will point at this buffer
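
For reference, here is one way the pieces could fit together into a complete program. It is only a sketch under the same assumptions (Linux x86-64, static, no libc); the _func body, the local labels and the exit statuses are illustrative, and the fixed cookie would need to be replaced by a random value for this to be a useful canary.

```nasm
; Sketch: set the FS base with arch_prctl, then use fs:[0x28] as a canary.
            global  _start

cookie      equ     12345                   ; Fixed demo value

            section .text
_start:     mov     eax, 158                ; __NR_arch_prctl
            mov     edi, 0x1002             ; ARCH_SET_FS
            lea     rsi, [rel buf]          ; New FS base
            syscall
            test    rax, rax                ; arch_prctl returns 0 on success
            jnz     .fail

            mov     qword [fs: 0x28], cookie ; Store the canary in the buffer

            call    _func
            mov     edi, eax                ; Return value of _func is the exit status
            mov     eax, 60                 ; __NR_exit
            syscall

.fail:      mov     edi, 2                  ; Exit status 2 if arch_prctl failed
            mov     eax, 60                 ; __NR_exit
            syscall

_func:      push    rbp
            mov     rbp, rsp
            sub     rsp, 0x10               ; Room for the canary, keeps RSP 16-byte aligned
            mov     rax, [fs: 0x28]         ; Get stack canary
            mov     [rbp - 0x8], rax        ; Save it on the stack
            ; the function body would go here
            mov     rax, [rbp - 0x8]        ; Retrieve the saved copy
            sub     rax, [fs: 0x28]         ; Zero if it is unchanged
            jne     .smashed
            leave                           ; RAX is already 0, the return value
            ret
.smashed:   mov     edi, 1                  ; Exit status 1 on a mismatch
            mov     eax, 60                 ; __NR_exit
            syscall

            section .bss
buf:        resb    4096                    ; fs.base will point at this buffer
```

Assembled and linked with nasm -felf64 and plain ld, this should run without a segfault and exit with status 0.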

If the kernel has enabled FSGSBASE for user-space use, you could use wrfsbase rsi instead of making a system call. Linux 5.9 and later enable it on CPUs that support it, and report that to user-space with the HWCAP2_FSGSBASE bit in AT_HWCAP2.

(The fault conditions in the manual don't mention privilege level, only the CPUID feature bit and the FSGSBASE bit in the CR4 control register, so once the kernel sets that bit user-space can use the instruction too. It only works in 64-bit mode; it will #UD in other modes including compat mode.)
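
A minimal sketch of the wrfsbase variant, assuming a CPU and kernel where FSGSBASE is enabled for user-space. Where it is not, the instruction raises #UD and the process is killed with SIGILL, so real code would want to check before using it, which this sketch does not. The buffer name and cookie value are illustrative.

```nasm
; Sketch: set the FS base with wrfsbase instead of arch_prctl.
            global  _start

            section .text
_start:     lea     rsi, [rel buf]          ; Address of our TLS buffer
            wrfsbase rsi                    ; Faults (SIGILL) if FSGSBASE isn't enabled
            mov     qword [fs: 0x28], 12345 ; Store a demo cookie
            mov     rax, [fs: 0x28]         ; Read it back through FS
            xor     edi, edi                ; Exit status 0 by default
            cmp     rax, 12345
            setne   dil                     ; Exit status 1 if the read-back differs
            mov     eax, 60                 ; __NR_exit
            syscall

            section .bss
buf:        resb    4096                    ; fs.base will point at this buffer
```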