I am trying to familiarise myself with GNU inline assembly. I wrote a single-line inline asm statement to reinterpret an int as a float. While this prints the correct result, I am wondering whether it is correct.
#include <stdio.h>
int main() {
int x = 0x80000000; // -0.0
float y;
asm("vmovd %0, %%xmm0"
: "=r"(y)
: "r"(x));
printf("%f\n", y);
}
I am a beginner in assembly.
Just for the record, inline asm is a bad way to type-pun (aka bit_cast), except as a learning exercise for inline asm syntax. Just use memcpy. (https://gcc.gnu.org/wiki/DontUseInlineAsm)
And if you do use inline asm, generally don't write your own move instructions; use the power of operand constraints to tell the compiler where you want the input and where your asm leaves the output, and let it invent the data movement.
I am wondering whether this is correct.
No, "=r" lets the compiler pick its choice of any integer register for that output operand, which will vary depending on the surrounding code and optimization options. You're writing XMM0 without having told the compiler about it through an operand or clobber.
%0 is the first operand, the output "=r", and you're using it as the source for vmovd. (This is AT&T syntax, so the destination is on the right.) So it's actually %1 (the source operand) that you aren't referencing. The compiler will often pick the same GPR for an input and an output operand (if you don't specify "=&r" to make an early-clobber output, for cases where an output is written before the last time all inputs are read).
If you put %0 and %1 in the format string, e.g. as a comment, you can look at the compiler's asm output and see what it picked for each operand.
float buggy_foo() {
int x = 0x80000000; // -0.0
float y;
asm("vmovd %0, %%xmm0 # op0 picked %0 op1 picked %1"
: "=r"(y)
: "r"(x));
return y; // will be in XMM0 per the calling convention
}
From the asm output with GCC -O3 targeting Linux (on Godbolt):
buggy_foo:
movl $-2147483648, %eax # materialize x in EAX
# from your asm statement
vmovd %eax, %xmm0 # op0 was %eax op1 was %eax
# end of your asm statement
movd %eax, %xmm0 # compiler-generated copy of y to retval reg (XMM0 for a float)
ret
Keeping the function minimal reduces the noise we have to sift through to find exactly what GCC thought was necessary just for the asm statement itself. A return of a float in x86-64 needs it in an XMM register, and can use a float directly without needing to convert it to double like for printf.
If we compile your main using this asm statement, we see why it happened to work there, too, since x and y again pick the same register (EAX).
movl $-2147483648, %eax
pxor %xmm0, %xmm0
# from your asm statement. I compiled without -march=x86-64-v3 so the compiler-generated insns are legacy-SSE
vmovd %eax, %xmm0 # op0 was %eax op1 was %eax
movd %eax, %xmm1 # compiler-generated
leaq .LC0(%rip), %rdi
movl $1, %eax
cvtss2sd %xmm1, %xmm0 # this is why GCC used pxor to zero XMM0 earlier
call *printf@GOTPCREL(%rip)
...
You can do asm("" : "=rx"(y) : "0"(x)) to ask for the input in the same register as the output, and give the compiler a choice of integer (r) or vector (x) registers. That's like "+r"(y) but allowing a different variable of a different type for the input half.
That avoids ever materializing the integer in a general-purpose integer register if the compiler doesn't think that's optimal. FP constants are often just loaded from .rodata.
Although since GCC 12 or 13 or so, GCC has preferred mov-immediate to a GPR and vmovd plus vpbroadcastd for vector constants with duplicate elements, instead of loading them from static storage. But for x as an input to an asm statement, even older GCC wants to start with it in a GPR.
float matching_constraint() {
int x = 0x80000000; // -0.0
float y;
asm(".space 0 # op0 was %0 op1 was %1" // .space 0 is zero byte, only used to avoid Godbolt filtering comments. Empty string would be fine.
: "=x"(y)
: "0"(x));
// Older GCC: with just "=x", the "0" input is loaded from .rodata with movd
// but with "=rx", it does mov-immediate + movd reg, xmm like newer GCC always wants to do
return y;
}
With this, the compiler picked XMM0 for y (and thus also for x), and did a movd itself to get the value there. Older GCC versions like GCC6 do a mov-immediate to a GPR, then store/reload through stack space with the default -mtune, only using movd reg, xmm with -mtune=intel or a specific Intel CPU like -mtune=haswell. Using "=x" instead of "=rx" for some reason makes GCC6 do movd .LC1(%rip), %xmm0, even though it still picks %xmm0 for the operand in both cases.
matching_constraint:
movl $-2147483648, %eax
movd %eax, %xmm0
.space 0 # op0 was %xmm0 op1 was %xmm0
ret
Again, .space 0 is just a zero-byte nop which is only there so I don't have to uncheck "filter comments" on the Godbolt compiler explorer. I could have used any GAS statement like .Ldummy=42 # comment to define a symbol, or .global main. Or I could have written asm("" : "=rx"(y) : "0"(x)); with the empty string as the template. For correctness at least, it doesn't matter what register the compiler picked (integer or XMM), because "0" guarantees that input will pick the same register as output operand 0. (https://gcc.gnu.org/onlinedocs/gcc/Simple-Constraints.html)
You could use register float y asm("xmm0"); to force "=x"(y) to pick XMM0 specifically if you want to hard-code a register name into the asm template (an "=r" constraint can only ever pick a general-purpose integer register), but it's usually better to let the compiler do register allocation. In this case where you're passing it to printf, the x86-64 System V calling convention will indeed want it in XMM0. Windows x64 would want it in XMM1 and EDX.
Except it needs a cvtss2sd anyway because variadic functions promote their float args to double, so it doesn't matter what register the float is in, other than avoiding a false dependency (for performance reasons) by writing the same register it reads. GCC is being silly here pxor-zeroing XMM0; if it had done movd %eax, %xmm0 ; cvtss2sd %xmm0, %xmm0, that would have an output dependency on the same register that's already an input dependency, making it a non-problem.
See also https://stackoverflow.com/tags/inline-assembly/info for more guides and useful Q&As.
The manual is also quite good these days: https://gcc.gnu.org/onlinedocs/gcc/Using-Assembly-Language-with-C.html