I use NASM as my assembler. I am trying to make a program print "hello world", but it prints nothing. I pass all arguments as the calling convention says: the first argument goes in rcx, the second in rdx, the third in r8, the fourth in r9, and the fifth at [rsp + 32].
So I wrote this program:
section .data
phrase db 'Hello World' ; the string to print (db defines bytes)
charsWritten dd 0 ; receives the number of characters written
section .text
global _start
extern ExitProcess ; imported function used to exit with code 0
extern WriteConsoleA ; imported function that prints to the console
extern GetStdHandle ; imported function that returns a standard handle
_start:
mov rcx, -10 ; first argument
call GetStdHandle ; the result is returned in rax
mov r8, rax ; save the result
mov rcx, r8 ; pass the saved result as first argument
mov rdx, phrase ; pointer to the string as second argument
mov r8, 11 ; length (11) as third argument
mov r9, charsWritten ; pointer to the written-characters counter as fourth argument
mov qword [rsp + 32], 0 ; zero on the stack as fifth argument
call WriteConsoleA ; print
mov rcx, 0 ; first argument: exit code 0
call ExitProcess ; exit with code 0
This program doesn't print anything. No errors are reported. I ran the resulting .exe file, and the error level after execution is 0.
os: windows 11 pro
system: 64 bit system
processor: intel i7
the result of executing: result
Kai Burghardt does not recommend using Microsoft Windows. This answer may contain errors.
You may be tempted to start with a Hello, World program, but actually your very first W64 program written in assembly should be this:
global _start
section .text
_start:
ret
In UNIX‑like environments, the loader usually jmps to the entry point.
However, in Winblows the entry point is essentially called†, placing a return address on top of the stack you can return to.
As a sidebar, an infinite loop program can be written like this:
global _start
section .text
_start:
jmp rax ; rax containing the address of the entry point
Now returning from _start is not guaranteed to work properly because it is apparently not officially documented.
The official way uses ExitProcess.
Let’s rewrite the first program using ExitProcess instead.
First of all, before making any call the ABI expects you to align the stack on 16‑byte boundaries.
That means the stack pointer written as a hexadecimal number should end in 0.
Since the stack grows downward (toward numerically smaller addresses) a straightforward way of achieving 16‑byte alignment is:
and RSP, 0xFFFFFFFFFFFFFFF0
It is fair to assume that an “operating system” adheres to its own calling conventions.
When our program is called† the stack has been aligned, too.
However, the call† puts a return address on top of the stack, destroying the alignment (implicit push(RIP) by call).
Therefore when our program commences with the first instruction following _start, the stack pointer (written as a hexadecimal number) ends in 8 (or expressed arithmetically, RSP mod 16 = 8 is true).
Thus the above and instruction effectively does the same as a sub RSP, 8.
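To make the arithmetic concrete, here is a minimal sketch that uses the and variant instead of a fixed adjustment. The entry value of RSP in the comments (0x14FF58) is made up purely for illustration; the only assumption that matters is RSP mod 16 = 8 on entry.

```nasm
global _start
extern ExitProcess

section .text
_start:
    ; assumed entry state (illustrative value): RSP = 0x14FF58, RSP mod 16 = 8
    and RSP, -16        ; -16 = 0xFFFFFFFFFFFFFFF0, RSP = 0x14FF50
                        ; same net effect here as: sub RSP, 8
    xor ECX, ECX        ; ECX = 0, the exit code
    sub RSP, 4 * 8      ; shadow space, RSP = 0x14FF30, still RSP mod 16 = 0
    call ExitProcess    ; does not return
```

The and form works regardless of the incoming alignment, whereas a fixed sub RSP, 8 relies on the entry point having been called.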
As we do not intend to use the return address, we can simply discard the return address by popping the stack:
pop RAX ; discard return address, re‑align stack to 16 B
The Win64 ABI asks you to (always) allocate 4 × 8 bytes of shadow space on the stack.
This space may be used by the callee (the function you call) to spill the first four arguments RCX, RDX, R8 and R9.
Spilling means storing the datum somewhere else so the register is freed up for a different use; the original argument value can be retrieved later from said shadow space.
global _start ; export _start
extern ExitProcess ; tell NASM the reference is resolved later
section .text
_start:
pop RAX ; discard return address, re‑align stack to 16 B
exit:
xor ECX, ECX ; ECX ≔ EXIT_SUCCESS
sub RSP, 4 * 8 ; reserve 32 bytes of shadow space
call ExitProcess ; 32 mod 16 = 0, so stack pointer still aligned
It is your duty to clean up the stack after a call has finished.
While ExitProcess is (ideally) non‑returning, GetStdHandle does return, so we reverse the effects of sub RSP, 32:
global _start
extern GetStdHandle, ExitProcess
STD_OUTPUT_HANDLE equ -11 ; a more meaningful constant instead of −11
section .text
_start:
pop RAX ; discard return address, re‑align stack to 16 B
get_output:
mov ECX, STD_OUTPUT_HANDLE ; retrieve HANDLE for standard output
sub RSP, 4 * 8 ; reserve 32 bytes of shadow space
call GetStdHandle ; RAX ≔ GetStdHandle(STD_OUTPUT_HANDLE)
add RSP, 4 * 8 ; unreserve 32 bytes of shadow space
exit:
xor ECX, ECX ; ECX ≔ EXIT_SUCCESS
sub RSP, 4 * 8 ; reserve 32 bytes of shadow space
call ExitProcess
You may notice half the instructions are just moving the stack pointer back and forth. You can cut down the juggling, but I recommend postponing that step until your algorithm works as expected, until you do not want to insert or remove anything anymore.
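Once everything works, one possible way to cut down the juggling is to reserve the shadow space a single time and reuse it for every call. The following is only a sketch of that idea, under the same assumption as before that RSP mod 16 = 8 on entry:

```nasm
global _start
extern GetStdHandle, ExitProcess

STD_OUTPUT_HANDLE equ -11

section .text
_start:
    pop RAX                     ; discard return address, RSP mod 16 = 0
    sub RSP, 4 * 8              ; reserve shadow space once, alignment unchanged
    mov ECX, STD_OUTPUT_HANDLE
    call GetStdHandle           ; callee may scribble on the shadow space
    xor ECX, ECX                ; ECX = 0, the exit code
    call ExitProcess            ; same shadow space is reused, no add/sub in between
```

This is safe only as long as you never keep your own data in the shadow space across a call, because every callee is free to overwrite it.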
As you already know the first four parameters are passed in registers. Any additional parameters are pushed on the stack from right to left. While doing so you must take account of the shadow space and alignment.
Also, it is preferred to let the assembler calculate the string length instead of doing it by yourself (→ hello_world_length).
global _start
extern GetStdHandle, WriteConsoleA, ExitProcess
default rel ; if not specified use RIP‑relative addressing
STD_OUTPUT_HANDLE equ -11 ; a more meaningful constant instead of −11
; --- initialized data ---------------------------------------------------------
section .data
hello_world:
db 'Hello, world!', `\r\n`
hello_world_length equ $ - hello_world
; --- uninitialized data -------------------------------------------------------
section .bss
result:
resd 1
; --- executable text ----------------------------------------------------------
section .text
_start:
pop RAX ; discard return address, re‑align stack to 16 B
get_output:
mov ECX, STD_OUTPUT_HANDLE ; retrieve HANDLE for standard output
sub RSP, 4 * 8 ; reserve 32 bytes of shadow space
call GetStdHandle ; RAX ≔ GetStdHandle(STD_OUTPUT_HANDLE)
add RSP, 4 * 8 ; unreserve 32 bytes of shadow space
write_hello_world:
sub RSP, 1 * 8 ; keep alignment after push
mov ECX, EAX ; write destination
LEA RDX, [hello_world] ; source string start address
mov R8, hello_world_length ; length of source string
LEA R9, [result] ; destination for number of characters written
push 0 ; reserved, must be zero
sub RSP, 4 * 8 ; reserve 32 bytes of shadow space
call WriteConsoleA
add RSP, 6 * 8 ; reclaim 48 bytes of stack space
exit:
xor ECX, ECX ; ECX ≔ EXIT_SUCCESS
sub RSP, 4 * 8 ; reserve 32 bytes of shadow space
call ExitProcess
; vim: set filetype=nasm:
On Win64 a HANDLE is pointer-sized (64 bits), but apparently only its lower 32 bits are significant, so using 32‑bit registers is sufficient here.
The used I/O functions may fail so you should check the return values.
GetStdHandle may return INVALID_HANDLE_VALUE which you should not pass on to WriteConsoleA.
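A minimal sketch of such checks could look like the following. The labels message, written and failure, as well as the exit code 1, are hypothetical names and choices made for this illustration. The sketch reserves all stack space once and stores the fifth argument with a mov rather than a push. It assumes RSP mod 16 = 8 on entry.

```nasm
global _start
extern GetStdHandle, WriteConsoleA, ExitProcess

default rel

STD_OUTPUT_HANDLE    equ -11
INVALID_HANDLE_VALUE equ -1

section .data
message:
    db 'Hello, world!', `\r\n`
message_length equ $ - message

section .bss
written:
    resd 1                          ; receives the number of characters written

section .text
_start:
    pop RAX                         ; discard return address, RSP mod 16 = 0
    sub RSP, 6 * 8                  ; 32 B shadow space + 8 B fifth argument + 8 B padding
    mov ECX, STD_OUTPUT_HANDLE
    call GetStdHandle
    cmp RAX, INVALID_HANDLE_VALUE   ; the call itself failed
    je failure
    test RAX, RAX                   ; NULL: no standard output handle available
    jz failure
    mov RCX, RAX                    ; write destination
    lea RDX, [message]              ; source string start address
    mov R8D, message_length         ; number of characters to write
    lea R9, [written]
    mov qword [RSP + 4 * 8], 0      ; fifth argument (reserved), just above the shadow space
    call WriteConsoleA
    test EAX, EAX                   ; BOOL result: zero means failure
    jz failure
    xor ECX, ECX                    ; exit code 0
    call ExitProcess
failure:
    mov ECX, 1                      ; hypothetical non-zero exit code
    call ExitProcess
```

Note that WriteConsoleA can also fail when standard output is not a console, for example when it is redirected to a file, so a non-zero exit code here does not necessarily point to a bug in the program.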
†: For explanation purposes we presume it was called. An intervening stub jmping to _start can be ignored.