winapi nasm win64

Hello World! using WinAPI does not do anything


I use the Netwide Assembler as my assembler. I am attempting to create a program printing the string Hello World but I don’t succeed.

I pass all arguments as the calling convention dictates. The calling convention says the first argument needs to be placed in RCX, the second one in RDX, the third one in R8, the fourth in R9 and the fifth at [RSP + 32].

So I wrote this program:

section .data
    phrase db 'Hello World'     ; create new var with size of db(Double World)
    charsWritten dd 0           ; set the var up is as things which will be printed out
section .text
    global _start
    extern ExitProcess          ; import the function ExitProcess to return 0
    extern WriteConsoleA        ; import the command of printing out
    extern GetStdHandle         ; descriptor
    _start:
        mov  rcx, -10           ; set 1st argument
        call GetStdHandle       ; call the result was saved in rax
        mov r8, rax             ; save result

        mov rcx, r8             ; set the saved result up as 1st argument
        mov rdx, phrase         ; set the pointer as 2nd argument
        mov r8, 11              ; set r8 as 3rd argument with data 11 (length)
        mov r9, charsWritten    ; set the pointer of printed words out as 4th arg.
        mov qword [rsp + 32], 0 ; set zero into stack as 5th argument
        call WriteConsoleA      ; call printing out

        mov rcx, 0              ; its first argument of returning 0
        call ExitProcess        ; call return with 0 as first argument of it

Yet this program seemingly doesn’t do anything. The assembler and the linker report no errors. I built and ran the .exe file; the exit status is 0 (success).

C:\Users\Мила ПК\Desktop>nasm -f win64 test.asm -o test.o

C:\Users\Мила ПК\Desktop>ld -o test.exe test.o -lkernel32
C:\Users\Мила ПК\Desktop>test.exe
C:\Users\Мила ПК\Desktop>echo %ERRORLEVEL%
0

Other details I deem relevant to the question:

datum             value
operating system  Windows 11 Pro
system            64‑bit system
processor         Intel i7

Solution

  • Kai Burghardt does not recommend using Microsoft Windows. This answer may contain errors.

    Nop

    You may be tempted to start with a Hello, World program, but actually your very first Win64 program written in assembly should be this:

    global _start
    section .text
    _start:
        ret
    

    In UNIX‑like environments, the loader usually jmps to the entry point. However, in Winblows the entry point is essentially called, which places a return address on top of the stack that you can return to.


    As a sidebar, an infinite loop program can be written like this:

    global _start
    section .text
    _start:
        jmp rax   ; rax containing the address of the entry point
    

    Exit

    Now returning from _start is not guaranteed to work properly because it is apparently not officially documented. The official way uses ExitProcess. Let’s rewrite the first program using ExitProcess instead.

    Stack alignment

    First of all, before making any call the ABI expects you to align the stack on 16‑byte boundaries. That means the stack pointer written as a hexadecimal number should end in 0. Since the stack grows downward (toward numerically smaller addresses) a straightforward way of achieving 16‑byte alignment is:

        and RSP, 0xFFFFFFFFFFFFFFF0
    

    It is fair to assume that an “operating system” adheres to its own calling conventions. When our program is called the stack has been aligned, too. However, the call puts a return address on top of the stack destroying the alignment (implicit push(RIP) by call).

    Therefore, when our program commences with the first instruction following _start, the stack pointer (written as a hexadecimal number) ends in 8 (or expressed arithmetically, RSP mod 16 = 8 is true). Thus the and instruction above effectively does the same as a sub RSP, 8.

    As we do not intend to use the return address, we can simply discard the return address by popping the stack:

        pop RAX                     ; discard return address, re‑align stack to 16 B
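
    To make the arithmetic concrete, here is a minimal sketch that traces the stack pointer through the and variant. The entry value 0x14FF58 is invented purely for illustration (the real value differs from run to run), and the 32 bytes subtracted before the call are the shadow space explained in the next section.

    ```nasm
    global _start
    extern ExitProcess

    section .text
    _start:                             ; ASSUMED on entry: RSP = 0x14FF58 (ends in 8)
        and RSP, -16                    ; RSP = 0x14FF50 (-16 is the mask 0xFFFFFFFFFFFFFFF0)
        xor ECX, ECX                    ; ECX ≔ 0, the exit status
        sub RSP, 4 * 8                  ; RSP = 0x14FF30, still ends in 0
        call ExitProcess
    ```

    With pop RAX in place of the and, the same assumed entry value would give RSP = 0x14FF60 instead of 0x14FF50. Both end in 0, so either variant satisfies the alignment requirement.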
    

    Shadow space

    The Win64 ABI asks you to (always) allocate 4 × 8 bytes of shadow space on the stack. This space may be used by the callee (the function you call) to spill the first four arguments RCX, RDX, R8 and R9. Spilling means storing the datum somewhere else so the register is freed up for a different use; the original argument value can be retrieved later from said shadow space.

    global _start                   ; export _start
    extern ExitProcess              ; tell NASM the reference is resolved later
    
    section .text
    _start:
        pop RAX                     ; discard return address, re‑align stack to 16 B
        
    exit:
        xor ECX, ECX                ; ECX ≔ EXIT_SUCCESS
        sub RSP, 4 * 8              ; reserve 32 bytes of shadow space
        call ExitProcess            ; 32 mod 16 = 0, so stack pointer still aligned
    

    Get output

    It is your duty to clean up the stack after a call has finished. While ExitProcess is (ideally) non‑returning, GetStdHandle does return, so we reverse the effects of sub RSP, 32:

    global _start
    extern GetStdHandle, ExitProcess
    
    STD_OUTPUT_HANDLE   equ  -11    ; a more meaningful constant instead of −11
    
    section .text
    _start:
        pop RAX                     ; discard return address, re‑align stack to 16 B
    
    get_output: 
        mov ECX, STD_OUTPUT_HANDLE  ; retrieve HANDLE for standard output
        sub RSP, 4 * 8              ; reserve 32 bytes of shadow space
        call GetStdHandle           ; RAX ≔ GetStdHandle(STD_OUTPUT_HANDLE)
        add RSP, 4 * 8              ; unreserve 32 bytes of shadow space
        
    exit:
        xor ECX, ECX                ; ECX ≔ EXIT_SUCCESS
        sub RSP, 4 * 8              ; reserve 32 bytes of shadow space
        call ExitProcess
    

    You may notice half the instructions are just moving the stack pointer back and forth. You can cut down the juggling, but I recommend postponing that step until your algorithm works as expected, until you do not want to insert or remove anything anymore.
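
    As a rough sketch of what cutting down the juggling could look like, the variant below adjusts the stack pointer once at the start and never again. It keeps the return address on the stack instead of popping it, so it assumes RSP mod 16 = 8 on entry as described under Stack alignment; subtracting 5 × 8 = 40 bytes then yields an aligned stack with 32 bytes of shadow space on top.

    ```nasm
    global _start
    extern GetStdHandle, ExitProcess

    STD_OUTPUT_HANDLE   equ  -11

    section .text
    _start:
        sub RSP, 5 * 8              ; 8 bytes re-align the stack, 32 bytes are shadow space

        mov ECX, STD_OUTPUT_HANDLE  ; retrieve HANDLE for standard output
        call GetStdHandle           ; RAX ≔ GetStdHandle(STD_OUTPUT_HANDLE)

        xor ECX, ECX                ; ECX ≔ EXIT_SUCCESS
        call ExitProcess            ; does not return, so RSP is never restored
    ```

    The same shadow space is reused by every call. The price is that adding a call with more than four arguments later means revisiting the constant in the first instruction.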

    Write output

    As you already know the first four parameters are passed in registers. Any additional parameters are pushed on the stack from right to left. While doing so you must take account of the shadow space and alignment.

    Also, it is preferred to let the assembler calculate the string length instead of doing it by yourself (→ hello_world_length).

    global _start
    extern GetStdHandle, WriteConsoleA, ExitProcess
    default rel                     ; if not specified use RIP‑relative addressing
    
    STD_OUTPUT_HANDLE   equ  -11    ; a more meaningful constant instead of −11
    
    ; --- initialized data ---------------------------------------------------------
    section .data
    hello_world:
        db 'Hello, world!', `\r\n`
    hello_world_length  equ  $ - hello_world
    
    ; --- uninitialized data -------------------------------------------------------
    section .bss
    result:
        resd 1
    
    ; --- executable text ----------------------------------------------------------
    section .text
    _start:
        pop RAX                     ; discard return address, re‑align stack to 16 B
        
    get_output:
        mov ECX, STD_OUTPUT_HANDLE  ; retrieve HANDLE for standard output
        sub RSP, 4 * 8              ; reserve 32 bytes of shadow space
        call GetStdHandle           ; RAX ≔ GetStdHandle(STD_OUTPUT_HANDLE)
        add RSP, 4 * 8              ; unreserve 32 bytes of shadow space
    
    write_hello_world:
        sub RSP, 1 * 8              ; keep alignment after push
        
        mov ECX, EAX                ; write destination
        LEA RDX, [hello_world]      ; source string start address
        mov R8, hello_world_length  ; length of source string
        LEA R9, [result]            ; destination for number of bytes written
        push 0                      ; reserved, must be zero
        
        sub RSP, 4 * 8              ; reserve 32 bytes of shadow space
        call WriteConsoleA
        add RSP, 6 * 8              ; reclaim 48 bytes of stack space
        
    exit:
        xor ECX, ECX                ; ECX ≔ EXIT_SUCCESS
        sub RSP, 4 * 8              ; reserve 32 bytes of shadow space
        call ExitProcess
    
    ; vim: set filetype=nasm:
    

    On Win64 a HANDLE is pointer‑sized (64 bits), but apparently only the lower 32 bits are significant, so using 32‑bit registers is sufficient here.

    Errors

    The I/O functions used may fail, so you should check the return values. GetStdHandle may return INVALID_HANDLE_VALUE, which you should not pass on to WriteConsoleA.
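
    One possible way of adding such checks is sketched below. It is not the only arrangement: the label failure and the exit status 1 are arbitrary choices made for this illustration, and INVALID_HANDLE_VALUE is defined by hand as −1. The stack is adjusted once (32 bytes of shadow space plus 8 bytes for the fifth argument), which assumes RSP mod 16 = 8 on entry. GetStdHandle can also return NULL when the process has no standard output, so that case is treated as a failure too.

    ```nasm
    global _start
    extern GetStdHandle, WriteConsoleA, ExitProcess
    default rel

    STD_OUTPUT_HANDLE       equ  -11
    INVALID_HANDLE_VALUE    equ  -1     ; defined here by hand for the comparison

    section .data
    hello_world:
        db 'Hello, world!', `\r\n`
    hello_world_length  equ  $ - hello_world

    section .bss
    result:
        resd 1

    section .text
    _start:
        sub RSP, 5 * 8                  ; 32 B shadow space + 8 B fifth argument, re-aligns RSP

        mov ECX, STD_OUTPUT_HANDLE
        call GetStdHandle               ; RAX ≔ GetStdHandle(STD_OUTPUT_HANDLE)
        cmp RAX, INVALID_HANDLE_VALUE   ; did the call itself fail?
        je failure
        test RAX, RAX                   ; NULL: there is no standard output
        jz failure

        mov RCX, RAX                    ; write destination
        lea RDX, [hello_world]          ; source string start address
        mov R8D, hello_world_length     ; length of source string
        lea R9, [result]                ; destination for number of characters written
        mov qword [RSP + 4 * 8], 0      ; fifth argument: reserved, must be zero
        call WriteConsoleA
        test EAX, EAX                   ; BOOL result: zero signals failure
        jz failure

        xor ECX, ECX                    ; ECX ≔ EXIT_SUCCESS
        call ExitProcess

    failure:
        mov ECX, 1                      ; arbitrary non-zero exit status
        call ExitProcess
    ```

    Run with echo %ERRORLEVEL% afterwards, a non‑zero status then tells you one of the calls failed. That can happen, for instance, when standard output is redirected to a file, because WriteConsoleA only works on console handles.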

    Note: For explanation purposes we presume the entry point was called. An intervening stub jmping to _start can be ignored.