NASM x86_64 code on Apple Silicon Mac outputs 0.0 instead of correct floating-point result

I’m writing a simple NASM assembly program targeting x86_64 macOS (Apple Silicon M1). The code calculates the sum of 0.1 + 0.2 as a double and prints the result using printf. Here is the relevant code snippet:

section .data
    fmt db "0.1 + 0.2 = %f", 10, 0
    num1 dq 0.1
    num2 dq 0.2
    result dq 0.0

section .text
    extern _printf
    global _main

_main:
    fld qword [rel num1]
    fadd qword [rel num2]
    fstp qword [rel result]

    lea rdi, [rel fmt]
    movsd xmm0, [rel result]
    xor eax, eax
    call _printf

    ret

I assemble and link it with:

nasm -f macho64 main.asm -o main.o
clang -arch x86_64 main.o -o main
./main

The program runs but outputs:

0.1 + 0.2 = 0.000000

I also get this linker warning:

ld: warning: no platform load command found in 'main.o', assuming: macOS

I suspect the issue might be related to the Apple Silicon (ARM64) machine running x86_64 code or that I’m not passing parameters to printf correctly.

How can I properly write NASM code calling printf with a double argument on a Mac M1? Do I need to compile for ARM64 and how do I do that?

Output from a debugging session, using lldb on macOS:

(lldb) breakpoint set --name main
Breakpoint 2: where = float_bug`main, address = 0x0000000100000f6f
(lldb) run
Process 31221 launched: '/Users/dmitroparhomenko/FlowyROS/float_bug' (x86_64)
warning: libobjc.A.dylib is being read from process memory. This indicates that LLDB could not read from the host's in-memory shared cache. This will likely reduce debugging performance.

Process 31221 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 2.1
    frame #0: 0x0000000100000f6f float_bug`main
float_bug`main:
->  0x100000f6f <+0>:  movsd  0x1099(%rip), %xmm0 ; num1, xmm0 = mem[0],zero 
    0x100000f77 <+8>:  movsd  0x1099(%rip), %xmm1 ; num2, xmm1 = mem[0],zero 
    0x100000f7f <+16>: addsd  %xmm1, %xmm0
    0x100000f83 <+20>: movsd  %xmm0, 0x1095(%rip) ; result
Target 0: (float_bug) stopped.
(lldb) register read xmm0
    xmm0 = {0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00}
(lldb) expr *(double*)&$xmm0
(double) $0 = 0
(lldb)

Solution

The bug in your program

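When calling a variadic function such as printf under the x86-64 SysV calling convention, al is expected to contain the number of xmm registers being used to pass arguments (strictly, an upper bound on that number). You're setting it to zero, probably because you copied this from code which printed a string or something. But you are passing one floating-point argument in xmm0, so you need to set al to 1 instead of 0. With al = 0, printf is entitled to assume that no floating-point arguments arrived in registers and may never look at xmm0, which is consistent with the 0.000000 you are seeing.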

Change xor eax, eax to mov eax, 1 and the program should work correctly.
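
To put the fix in context, here is a minimal sketch of the corrected program. It keeps the x87 code from the question and changes only the value loaded into al. The sub rsp, 8 / add rsp, 8 pair is an addition not mentioned above: the SysV ABI requires rsp to be 16-byte aligned at every call, and on entry to _main it is off by 8 because of the pushed return address, so this sketch assumes the adjustment is needed for printf to run reliably.

```nasm
section .data
    fmt    db "0.1 + 0.2 = %f", 10, 0
    num1   dq 0.1
    num2   dq 0.2
    result dq 0.0

section .text
    extern _printf
    global _main

_main:
    sub rsp, 8                  ; assumption: realign rsp to 16 bytes before the call

    fld qword [rel num1]        ; st0 = 0.1
    fadd qword [rel num2]       ; st0 = 0.1 + 0.2
    fstp qword [rel result]     ; store the sum as a double and pop the x87 stack

    lea rdi, [rel fmt]          ; first integer argument: the format string
    movsd xmm0, [rel result]    ; first floating-point argument: the sum
    mov eax, 1                  ; al = 1: one xmm register carries an argument
    call _printf

    add rsp, 8                  ; undo the alignment adjustment
    ret
```

It should assemble and link with the same nasm and clang commands shown in the question.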

General code review

It's a bit silly to use x87 instructions for floating point in this day and age, especially given that you have to move the result into an SSE xmm register afterward anyway. The modern and more sensible approach would be to do your computation with SSE registers and instructions, e.g. using addsd to add. That leaves the sum in xmm0, exactly where printf expects it, so the fld/fadd/fstp sequence and the later load from result could become simply:

movsd xmm0, [rel num1]
addsd xmm0, [rel num2]

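You should set eax to the desired exit status of your program before returning, since the value in eax is what _main returns. As written, eax still holds printf's return value (the number of characters printed) when you reach ret. In this case, it would make sense to use 0, indicating success.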
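
Combining these suggestions, a sketch of the whole program using SSE might look like the following. The labels and format string are taken from the question, and the result variable is dropped because the sum stays in xmm0. The stack adjustment around the call is an assumption based on the SysV ABI's 16-byte alignment rule rather than something stated above.

```nasm
section .data
    fmt  db "0.1 + 0.2 = %f", 10, 0
    num1 dq 0.1
    num2 dq 0.2

section .text
    extern _printf
    global _main

_main:
    sub rsp, 8                  ; assumption: realign rsp to 16 bytes before the call

    movsd xmm0, [rel num1]      ; xmm0 = 0.1
    addsd xmm0, [rel num2]      ; xmm0 = 0.1 + 0.2, already in the argument register

    lea rdi, [rel fmt]          ; first integer argument: the format string
    mov eax, 1                  ; al = 1: one xmm register carries an argument
    call _printf

    xor eax, eax                ; return 0 from _main (exit status: success)
    add rsp, 8                  ; undo the alignment adjustment
    ret
```

After running it, echo $? in the shell is one way to check that the exit status is 0.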

Background info on x86-64 vs ARM64

Your M1 CPU uses the ARM64 architecture. Therefore, the x86-64 instructions in this code are not being executed directly by the CPU; instead, they're translated by Rosetta 2. So you can successfully build and run x86-64 code, much as if you actually had a physical x86-64 CPU, but you should be aware that it isn't running natively, and may be slow.

If you do want to write native ARM64 code instead, it's not just a matter of changing compilation settings; you have to rewrite the code from scratch. It's an entirely different architecture, with a different instruction set, register set, assembler syntax, etc. (For instance, instead of having sixteen 64-bit integer registers named rax, rbx, ..., you have 31 of them named x0, x1, ..., x30.) https://github.com/below/HelloSilicon is a tutorial you could look at.

In particular, NASM only supports x86, so you'll have to switch to a different assembler that supports ARM64. Most people use the one built into clang.