I use the Netwide Assembler as my assembler.
I am trying to write a program that prints the string Hello World, but it does not work.
I pass all arguments as the calling convention dictates.
The calling convention says the first argument needs to be placed in RCX, the second one in RDX, the third one in R8, the fourth in R9 and the fifth at [RSP + 32].
So I wrote this program:
section .data
phrase db 'Hello World' ; create new var with size of db(Double World)
charsWritten dd 0 ; set the var up is as things which will be printed out
section .text
global _start
extern ExitProcess ; import the function ExitProcess to return 0
extern WriteConsoleA ; import the command of printing out
extern GetStdHandle ; descriptor
_start:
mov rcx, -10 ; set 1st argument
call GetStdHandle ; call the result was saved in rax
mov r8, rax ; save result
mov rcx, r8 ; set the saved result up as 1st argument
mov rdx, phrase ; set the pointer as 2nd argument
mov r8, 11 ; set r8 as 3rd argument with data 11 (length)
mov r9, charsWritten ; set the pointer of printed words out as 4th arg.
mov qword [rsp + 32], 0 ; set zero into stack as 5th argument
call WriteConsoleA ; call printing out
mov rcx, 0 ; its first argument of returning 0
call ExitProcess ; call return with 0 as first argument of it
Yet this program seemingly does nothing. There are no errors when I assemble and link it. I ran the resulting .exe file, and the exit status is 0 (success).
C:\Users\Мила ПК\Desktop>nasm -f win64 test.asm -o test.o
C:\Users\Мила ПК\Desktop>ld -o test.exe test.o -lkernel32
C:\Users\Мила ПК\Desktop>test.exe
C:\Users\Мила ПК\Desktop>echo %ERRORLEVEL%
0
Other details I deem relevant to the question:
| datum | value |
|---|---|
| operating system | Windows 11 Pro |
| system | 64‑bit system |
| processor | Intel i7 |
Kai Burghardt does not recommend using Microsoft Windows. This answer may contain errors.
You may be tempted to start with a Hello, World program, but your very first W64 program written in assembly should actually be this:
global _start
section .text
_start:
ret
In UNIX‑like environments, the loader usually jmps to the entry point.
However, in Winblows the entry point is essentially called†, placing a return address on top of the stack that you can return to.
As a sidebar, an infinite loop program can be written like this:
global _start
section .text
_start:
jmp rax ; rax containing the address of the entry point
Now, returning from _start is not guaranteed to work properly because it is apparently not officially documented.
The official way uses ExitProcess.
Let’s rewrite the first program using ExitProcess instead.
First of all, before making any call the ABI expects you to align the stack on a 16‑byte boundary.
That means the stack pointer written as a hexadecimal number should end in 0.
Since the stack grows downward (toward numerically smaller addresses) a straightforward way of achieving 16‑byte alignment is:
and RSP, 0xFFFFFFFFFFFFFFF0
It is fair to assume that an “operating system” adheres to its own calling conventions.
When our program is called† the stack has been aligned, too.
However, the call† puts a return address on top of the stack, destroying the alignment (implicit push(RIP) by call).
Therefore, when our program commences with the first instruction following _start, the stack pointer (written as a hexadecimal number) ends in 8 (or, expressed arithmetically, RSP mod 16 = 8 is true).
Thus the above and effectively does the same as a sub RSP, 8.
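To make the arithmetic concrete, here is a small worked example. The address 0x14FF58 is a made-up value chosen only because it ends in 8, and the label align_demo is hypothetical. The mask is applied to RAX rather than RSP so the snippet does not disturb the real stack.

```nasm
; Worked example; 0x14FF58 is an invented stack pointer value.
section .text
align_demo:
        mov RAX, 0x14FF58       ; pretend this is RSP at _start (ends in 8)
        and RAX, -16            ; -16 is the same bit pattern as 0xFFFFFFFFFFFFFFF0
                                ; RAX is now 0x14FF50, i.e. 0x14FF58 - 8
        ret
```

Writing the mask as -16 is merely a shorter spelling of the same constant.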
As we do not intend to use the return address, we can simply discard the return address by popping the stack:
pop RAX ; discard return address, re‑align stack to 16 B
The Win64 ABI asks you to (always) allocate 4 × 8 bytes of shadow space on the stack.
This space may be used by the callee (the function you call) to spill the first four arguments RCX, RDX, R8 and R9.
Spilling means storing the datum somewhere else so the register is freed up for a different use; the original argument value can be retrieved later from said shadow space.
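As an illustration of spilling, here is a sketch of how a callee could use the shadow space. The function sum_of_four is hypothetical and not part of any library. On entry to the callee, [RSP] holds the return address, so the caller's shadow space begins at [RSP + 8].

```nasm
; Hypothetical callee, for illustration only: RAX ≔ RCX + RDX + R8 + R9
section .text
sum_of_four:
        mov [RSP +  8], RCX     ; spill 1st argument into shadow space
        mov [RSP + 16], RDX     ; spill 2nd argument
        mov [RSP + 24], R8      ; spill 3rd argument
        mov [RSP + 32], R9      ; spill 4th argument
        ; RCX, RDX, R8 and R9 could be reused for something else here
        mov RAX, [RSP +  8]     ; retrieve the original argument values
        add RAX, [RSP + 16]
        add RAX, [RSP + 24]
        add RAX, [RSP + 32]
        ret
```

Whether a real callee actually spills is up to that callee; the caller has to reserve the space either way.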
global _start ; export _start
extern ExitProcess ; tell NASM the reference is resolved later
section .text
_start:
pop RAX ; discard return address, re‑align stack to 16 B
exit:
xor ECX, ECX ; ECX ≔ EXIT_SUCCESS
sub RSP, 4 * 8 ; reserve 32 bytes of shadow space
call ExitProcess ; 32 mod 16 = 0, so stack pointer still aligned
It is your duty to clean up the stack after a call has finished.
While ExitProcess is (ideally) non‑returning, GetStdHandle does return, so we reverse the effects of sub RSP, 32:
global _start
extern GetStdHandle, ExitProcess
STD_OUTPUT_HANDLE equ -11 ; a more meaningful constant instead of −11
section .text
_start:
pop RAX ; discard return address, re‑align stack to 16 B
get_output:
mov ECX, STD_OUTPUT_HANDLE ; retrieve HANDLE for standard output
sub RSP, 4 * 8 ; reserve 32 bytes of shadow space
call GetStdHandle ; RAX ≔ GetStdHandle(STD_OUTPUT_HANDLE)
add RSP, 4 * 8 ; unreserve 32 bytes of shadow space
exit:
xor ECX, ECX ; ECX ≔ EXIT_SUCCESS
sub RSP, 4 * 8 ; reserve 32 bytes of shadow space
call ExitProcess
You may notice half the instructions are just moving the stack pointer back and forth. You can cut down the juggling, but I recommend postponing that step until your algorithm works as expected, until you do not want to insert or remove anything anymore.
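For later reference, this is a sketch of what reduced juggling could look like for the GetStdHandle program above. It assumes you keep the return address on the stack instead of popping it, and reserve the stack space once for all calls. This is one common idiom, not the only way.

```nasm
global _start
extern GetStdHandle, ExitProcess
STD_OUTPUT_HANDLE equ -11
section .text
_start:
        ; At entry RSP mod 16 = 8. Subtracting 40 bytes re-aligns the stack
        ; (8 bytes) and reserves the shadow space (32 bytes) in one step.
        sub RSP, 5 * 8
get_output:
        mov ECX, STD_OUTPUT_HANDLE
        call GetStdHandle       ; shadow space is already in place
exit:
        xor ECX, ECX            ; ECX ≔ EXIT_SUCCESS
        call ExitProcess        ; reuses the same shadow space
```

The shadow space is simply reused by every call, so no further sub or add is needed between them.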
As you already know the first four parameters are passed in registers. Any additional parameters are pushed on the stack from right to left. While doing so you must take account of the shadow space and alignment.
Also, it is preferable to let the assembler calculate the string length instead of doing it yourself (→ hello_world_length).
global _start
extern GetStdHandle, WriteConsoleA, ExitProcess
default rel ; if not specified use RIP‑relative addressing
STD_OUTPUT_HANDLE equ -11 ; a more meaningful constant instead of −11
; --- initialized data ---------------------------------------------------------
section .data
hello_world:
db 'Hello, world!', `\r\n`
hello_world_length equ $ - hello_world
; --- uninitialized data -------------------------------------------------------
section .bss
result:
resd 1
; --- executable text ----------------------------------------------------------
section .text
_start:
pop RAX ; discard return address, re‑align stack to 16 B
get_output:
mov ECX, STD_OUTPUT_HANDLE ; retrieve HANDLE for standard output
sub RSP, 4 * 8 ; reserve 32 bytes of shadow space
call GetStdHandle ; RAX ≔ GetStdHandle(STD_OUTPUT_HANDLE)
add RSP, 4 * 8 ; unreserve 32 bytes of shadow space
write_hello_world:
sub RSP, 1 * 8 ; padding to keep alignment after the push
mov ECX, EAX ; write destination
LEA RDX, [hello_world] ; source string start address
mov R8, hello_world_length ; length of source string
LEA R9, [result] ; destination for number of characters written
push 0 ; fifth argument: reserved, must be zero
sub RSP, 4 * 8 ; reserve 32 bytes of shadow space
call WriteConsoleA
add RSP, 6 * 8 ; reclaim 48 bytes of stack space
exit:
xor ECX, ECX ; ECX ≔ EXIT_SUCCESS
sub RSP, 4 * 8 ; reserve 32 bytes of shadow space
call ExitProcess
; vim: set filetype=nasm:
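Instead of pushing the fifth argument, you can reserve all the stack space first and then store the argument at [RSP + 32], which is where the question placed it. The following is a sketch of that variant, not a definitive version. The names message, message_length and written are chosen for this example, and the full RAX is copied into RCX as a cautious assumption about the handle width.

```nasm
global _start
extern GetStdHandle, WriteConsoleA, ExitProcess
default rel
STD_OUTPUT_HANDLE equ -11
section .data
message:
        db 'Hello, world!', `\r\n`
message_length equ $ - message
section .bss
written:
        resd 1
section .text
_start:
        pop RAX                         ; discard return address, re-align stack
        mov ECX, STD_OUTPUT_HANDLE
        sub RSP, 4 * 8                  ; shadow space
        call GetStdHandle
        add RSP, 4 * 8
        mov RCX, RAX                    ; 1st argument: console handle
        lea RDX, [message]              ; 2nd argument: string address
        mov R8D, message_length         ; 3rd argument: string length
        lea R9, [written]               ; 4th argument: characters written
        sub RSP, 6 * 8                  ; 32 B shadow + 8 B fifth argument + 8 B padding
        mov qword [RSP + 4 * 8], 0      ; 5th argument sits right above the shadow space
        call WriteConsoleA
        add RSP, 6 * 8
        xor ECX, ECX                    ; ECX ≔ EXIT_SUCCESS
        sub RSP, 4 * 8
        call ExitProcess
```

It should assemble and link with the same nasm and ld commands as in the question. The 8 bytes of padding keep the total at 48, a multiple of 16, so the stack stays aligned at the call.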
On Win64 a HANDLE is pointer‑sized (64 bits), but apparently only the lower 32 bits are significant, so using 32‑bit registers is sufficient here.
The I/O functions used may fail, so you should check the return values.
GetStdHandle may return INVALID_HANDLE_VALUE, which you should not pass on to WriteConsoleA.
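A minimal sketch of such a check follows. It assumes INVALID_HANDLE_VALUE is −1 cast to a HANDLE and that a NULL result means the process has no standard output attached. The labels failure and exit, and the exit status 1, are choices made for this example.

```nasm
global _start
extern GetStdHandle, ExitProcess
STD_OUTPUT_HANDLE    equ -11
INVALID_HANDLE_VALUE equ -1
section .text
_start:
        pop RAX                         ; discard return address, re-align stack
        mov ECX, STD_OUTPUT_HANDLE
        sub RSP, 4 * 8                  ; shadow space
        call GetStdHandle
        add RSP, 4 * 8
        cmp RAX, INVALID_HANDLE_VALUE   ; did the call fail?
        je failure
        test RAX, RAX                   ; NULL: no standard output attached
        jz failure
        ; ... the handle in RAX is usable here, e.g. for WriteConsoleA ...
        xor ECX, ECX                    ; ECX ≔ EXIT_SUCCESS
        jmp exit
failure:
        mov ECX, 1                      ; non-zero exit status signals an error
exit:
        sub RSP, 4 * 8
        call ExitProcess
```

WriteConsoleA reports failure by returning zero, so a test EAX, EAX after that call can be handled the same way.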
†: For explanation purposes we presume it was called. An intervening stub jmping to _start can be ignored.