This is the code I am playing with right now:
```asm
# file-name: test.s
# 64-bit GNU as source code.
.global main
.section .text
main:
    lea message, %rdi
    push %rdi
    call puts
    lea message, %rdi
    push %rdi
    call printf
    push $0
    call _exit
.section .data
message: .asciz "Hello, World!"
```

Compilation instructions: `gcc test.s -o test`
Revision 1:

```asm
.global main
.section .text
main:
    lea message, %rdi
    call puts
    lea message, %rdi
    call printf
    mov $0, %rdi
    call _exit
.section .data
message: .asciz "Hello, World!"
```
Final Revision (Works):

```asm
.global main
.section .text
main:
    lea message, %rdi
    call puts
    mov $0, %rax
    lea message, %rdi
    call printf
    # flush stdout buffer.
    mov $0, %rdi
    call fflush
    # put newline to offset PS1 prompt when the program ends.
    # - ironically, doing this makes the flush above redundant and can be removed.
    # - The call to fflush is retained for display and
    #   to keep the block self-contained.
    mov $'\n', %rdi
    call putchar
    mov $0, %rdi
    call _exit
.section .data
message: .asciz "Hello, World!"
```
I am struggling to understand why the call to puts succeeds but the call to printf results in a Segmentation fault.
Can somebody explain this behavior and how printf is intended to be called?
Thanks ahead of time.
Summary:
`puts` appends a newline implicitly, and stdout is line-buffered (by default on terminals). So the text from `printf` may just be sitting there in the buffer. Your call to `_exit(2)` doesn't flush buffers, because it's the `exit_group(2)` system call, not the `exit(3)` library function. (See my version of your code below.)
Your call to `printf(3)` is also not quite right, because you didn't zero `%al` before calling a var-args function with no FP arguments. (Good catch @RossRidge, I missed that.) `xor %eax,%eax` is the best way to do that. `%al` will be non-zero (from `puts()`'s return value), which is presumably why `printf` segfaults: with a non-zero `%al`, the var-args prologue in `printf` saves `%xmm0` through `%xmm7` to the stack with aligned `movaps` stores, and those fault when RSP is misaligned. I tested on my system, and `printf` doesn't seem to mind a misaligned stack when `%al` is zero. (Yours is misaligned, since you pushed twice before calling `printf`, unlike `puts`.)

(Update: newer builds of glibc will segfault in `printf` with a misaligned RSP even with AL=0, since gcc makes more use of SSE to load or store 16 bytes at a time, and of course takes advantage of the ABI-guaranteed alignment. See an example from scanf and how to avoid it.)
Also, you don't need any `push` instructions in that code. The first 6 integer/pointer args go in registers (`%rdi`, `%rsi`, `%rdx`, `%rcx`, `%r8`, `%r9`, in that order), so the first arg goes in `%rdi`. The 7th and later args go on the stack. You're also neglecting to pop the stack after the functions return, which only works because your function never tries to return after messing up the stack.
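The ABI does require RSP to be 16-byte aligned before every `call`. On entry to a function, RSP is 8 bytes off that alignment because the caller's `call` pushed an 8-byte return address. A single `push` is therefore one way to realign it. That can actually be more efficient than `sub $8, %rsp` on recent Intel CPUs with a stack engine, and it takes fewer bytes. (See the x86-64 SysV ABI, and other links in the x86 tag wiki.)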
Improved code:
```asm
.text
.global main
main:
    lea message(%rip), %rdi   # or mov $message, %edi if you don't need the code to be position-independent:
                              # in a non-PIE, the default code model puts all labels in the low 2GiB, so shorter 32-bit instructions work
    push %rbx                 # save call-preserved rbx; this also realigns RSP to 16B for the calls below
    mov %rdi, %rbx            # save the pointer for later
    call puts
    xor %eax,%eax             # %al = 0 = number of FP (vector-register) args for var-args functions
    mov %rbx, %rdi            # or mov %ebx, %edi in a non-PIE executable, since the pointer points to static storage in the low 2GiB
    call printf
    # optionally putchar a '\n', or include it in the string you pass to printf
    #xor %edi,%edi            # exit with 0 status
    #call exit                # exit(3) does an fflush and other cleanup
    pop %rbx                  # restore caller's rbx, and restore the stack
    xor %eax,%eax             # return 0 from main is equivalent to exit(0)
    ret

.section .rodata              # constants should go in .rodata
message: .asciz "Hello, World!"
```
`lea message(%rip), %rdi` is cheap, and doing it twice is fewer instructions than the two `mov` instructions needed to make use of `%rbx`. But since we needed to adjust the stack by 8B to strictly follow the ABI's 16B alignment guarantee, we might as well do it by saving a call-preserved register. `mov reg,reg` is very cheap and small, so taking advantage of the call-preserved reg is natural.
Modern distros now default to making PIE executables. That means the addresses of static data are full 64-bit values, not guaranteed to fit in 32 bits. You need a RIP-relative LEA to get them, and 64-bit operand-size to copy them. See How to load address of function or label into register for that vs. `mov $message, %edi` in a non-PIE. There's never a reason to use `lea message, %rdi` with a 32-bit absolute addressing mode; only ever use RIP-relative LEA or mov-immediate.