I am looking to take two double variables, pass them to a function, subtract one from the other, return the difference and print the result.
My example:
#include <stdio.h>
extern double dev(double a, double b);
int main() {
double a = 10.2;
double b = 3.6;
double result = dev(a, b);
printf("%.2f", result);
return 0;
}
Asm:
section .text
global dev
dev:
movsd xmm0, [rdi]
movsd xmm1, [rsi]
subsd xmm0, xmm1
ret
It compiles; however, when I run the code I get a Segmentation fault (core dumped).
I believe I have identified the problem; I am just not sure how to fix it.
A double is a 64-bit value, whereas xmm0 is a 128-bit register. It seems logical (at least to me) that I am attempting to return 128 bits, but I see no way around that. I tried movsd rax, xmm0, but I got the same issue.
In this case, one approach is to look at how the compiler implements a similar function: https://godbolt.org/z/zv8cY3GfG (no optimization) or https://godbolt.org/z/zn8ecKT1f (optimization level "O2").
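The C source behind those Compiler Explorer links is not quoted here. A minimal equivalent, assumed to match the extern declaration in the question, is simply:

```c
/* Plain C equivalent of the assembly routine `dev`: it receives two
 * doubles by value and returns their difference. Under the x86-64
 * System V calling convention, a arrives in xmm0, b arrives in xmm1,
 * and the double result is returned in xmm0. */
double dev(double a, double b) {
    return a - b;
}
```

Linked against the question's main, this prints 6.60, which is a handy reference when checking a hand-written assembly version.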
The first one gives the following, which makes it apparent that the parameters are passed in registers xmm0 and xmm1 rather than through pointers in rdi and rsi:
dev:
push rbp
mov rbp, rsp
movsd QWORD PTR [rbp-8], xmm0
movsd QWORD PTR [rbp-16], xmm1
movsd xmm0, QWORD PTR [rbp-8]
subsd xmm0, QWORD PTR [rbp-16]
pop rbp
ret
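This also explains the crash. Because a and b are passed by value in xmm0 and xmm1, rdi and rsi do not point at the arguments; they contain whatever leftover values the caller had there, so movsd xmm0, [rdi] reads from an arbitrary address. The question's assembly would instead fit a function taking two pointers, since under the System V AMD64 convention the first two pointer arguments arrive in rdi and rsi. A C sketch of such a variant (the name `dev_ptr` is hypothetical, not from the post):

```c
/* Hypothetical pointer-based variant. Under the System V AMD64 ABI,
 * a arrives in rdi and b in rsi, so loading through [rdi] and [rsi]
 * as in the question's assembly would be correct for this prototype.
 * The double result is still returned in xmm0. */
double dev_ptr(const double *a, const double *b) {
    return *a - *b;
}
```

With this prototype the caller would have to write dev_ptr(&a, &b). Keeping the original by-value prototype and dropping the two loads, as the optimized listing does, is the simpler fix.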
The optimized version is as follows:
dev:
subsd xmm0, xmm1
ret
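The 128-bit concern from the question does not arise here: the caller reads a double return value from the low 64 bits of xmm0 and ignores the upper half, and a double is exactly 64 bits wide on x86-64. A small C sketch that checks the width and extracts the raw bit pattern (the helper name `double_bits` is illustrative, not from the post):

```c
#include <stdint.h>
#include <string.h>

/* A double is 64 bits wide on x86-64, the same as the low lane of xmm0. */
_Static_assert(sizeof(double) == sizeof(uint64_t),
               "double is expected to be 64 bits");

/* Copy the IEEE 754 bit pattern of d into a 64-bit integer. This is
 * roughly what movq rax, xmm0 does; memcpy avoids strict-aliasing
 * problems that a pointer cast would cause. */
uint64_t double_bits(double d) {
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    return bits;
}
```

For example, double_bits(1.0) yields 0x3FF0000000000000. Moving the value into rax like this is only needed to inspect the bits; it is not part of returning a double.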
This illustrates one reason why embedded assembly is perhaps increasingly uncommon: compilers keep getting better at generating optimized assembly, and processors are increasingly tuned toward compiler-generated code, which makes it difficult for humans to write better assembly by hand. The effort, and the restriction to a specific target CPU, is seldom worth it.