Tags: c, assembly, gcc, x86, inline-assembly

Is this inline asm correct for bitcasting int as float?


I am trying to familiarise myself with GNU inline assembly. I wrote a single-line inline asm statement to reinterpret an int as a float. While this prints the correct result, I am wondering whether it is actually correct.

#include <stdio.h> 

int main() {
    int x = 0x80000000; // -0.0
    float y;
    asm("vmovd %0, %%xmm0"
     : "=r"(y)
     : "r"(x));
    printf("%f\n", y); 
}

I am a beginner in assembly.


Solution

  • Just for the record, inline asm is a bad way to type-pun aka bit_cast except as a learning exercise for inline asm syntax. Just use memcpy. (https://gcc.gnu.org/wiki/DontUseInlineAsm).
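    A minimal sketch of the memcpy approach, assuming float is a 32-bit IEEE-754 type (true on x86). The function name float_from_bits is made up for this example. Compilers normally optimize the memcpy into a single register move or a constant.

```c
#include <stdint.h>
#include <string.h>

// Reinterpret the 32 bits of an integer as a float without changing them.
// float_from_bits is a hypothetical name, not a library function.
static inline float float_from_bits(uint32_t bits) {
    float f;
    _Static_assert(sizeof f == sizeof bits, "float must be 32 bits");
    memcpy(&f, &bits, sizeof f);   // well-defined in C, unlike pointer casting
    return f;
}
```

    Calling float_from_bits(0x80000000u) gives the -0.0 from the question, with no asm involved.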

    And if you do use inline asm, generally don't write your own move instructions; use the power of operand constraints to tell the compiler where you want the input and where your asm leaves the output, and let it invent the data movement.


    I am wondering whether this is correct.

    No, "=r" lets the compiler pick its choice of any integer register for that output operand, which will vary depending on the surrounding code and optimization options. You're writing XMM0 without having told the compiler about it through an operand or clobber.

    %0 is the first operand, the output "=r", and you're using it as the source for vmovd. (This is AT&T syntax, so the destination is on the right.) So it's actually %1 (the input operand) that you aren't referencing. The compiler will often pick the same GPR for an input and an output operand (unless you specify "=&r" to make the output early-clobber, which is needed when an output is written before all inputs have been read for the last time.)
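    For illustration only, a version of the statement that references both operands and asks for the output in a vector register might look like the sketch below. It uses movd rather than vmovd so that it doesn't assume AVX, the function name is made up, and the memcpy branch is an assumed fallback for builds that aren't x86-64. As a way to type-pun, plain memcpy is still the better tool.

```c
#include <string.h>

// Hypothetical helper: copy the bits of x into a float.
float int_bits_to_float(int x) {
    float y;
#if defined(__GNUC__) && defined(__x86_64__)
    // %1 is the input (any integer register), %0 is the output
    // (any XMM register).  AT&T syntax: source first, destination last.
    // __asm__ is the spelling that also works with -std=c11.
    __asm__("movd %1, %0" : "=x"(y) : "r"(x));
#else
    memcpy(&y, &x, sizeof y);   // portable fallback, assumption of this sketch
#endif
    return y;
}
```

    Because the template names no register directly, nothing is written behind the compiler's back and no clobber list is needed.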

    If you put %0 and %1 in the format string, e.g. as a comment, you can look at the compiler's asm output and see what it picked for that operand.

    float buggy_foo() {
        int x = 0x80000000; // -0.0
        float y;
        asm("vmovd %0, %%xmm0   # op0 picked %0  op1 picked %1"
         : "=r"(y)
         : "r"(x));
        return y;  // will be in XMM0 per the calling convention
    }
    

    From the asm output with GCC -O3 targeting Linux (on Godbolt):

    buggy_foo:
        movl    $-2147483648, %eax    # materialize x in EAX
    # from your asm statement
        vmovd %eax, %xmm0   # op0 was %eax  op1 was %eax
    # end of your asm statement
        movd    %eax, %xmm0    # compiler-generated copy of y to retval reg (XMM0 for a float)
        ret
    

    Keeping the function minimal reduces the noise we have to sift through to find exactly what GCC thought was necessary just for the asm statement itself. Returning a float in x86-64 requires it to be in an XMM register, and the function can return the float directly without converting it to double as passing it to printf requires.

    If we compile your main using this asm statement, we see why it happened to work there, too, since x and y again pick the same register (EAX).

        movl    $-2147483648, %eax
        pxor    %xmm0, %xmm0
     # from your asm statement.  I compiled without -march=x86-64-v3 so the compiler-generated insns are legacy-SSE
        vmovd %eax, %xmm0   # op0 was %eax  op1 was %eax
    
        movd    %eax, %xmm1  # compiler-generated
    
        leaq    .LC0(%rip), %rdi
        movl    $1, %eax
        cvtss2sd        %xmm1, %xmm0   # this is why GCC used pxor to zero XMM0 earlier
        call    *printf@GOTPCREL(%rip)
       ...
    

    You can do asm("" : "=rx"(y) : "0"(x)) to ask for the input in the same register as the output, and give the compiler a choice of integer (r) or vector (x) registers. That's like "+r"(y) but allowing a different variable of a different type for the input half.

    That avoids ever materializing the integer in a general-purpose integer register if the compiler doesn't think that's optimal. FP constants are often just loaded from .rodata.

    Since GCC 12 or 13 or so, though, GCC has preferred mov-immediate to a GPR and vmovd plus vpbroadcastd for vector constants with duplicate elements, instead of loading them from static storage. But for x as an input to an asm statement, even older GCC wants to start with it in a GPR.

    float matching_constraint() {
        int x = 0x80000000; // -0.0
        float y;
        asm(".space 0   # op0 was %0  op1 was %1"  // .space 0  is zero byte, only used to avoid Godbolt filtering comments.  Empty string would be fine.
         : "=x"(y)
         : "0"(x));
         // Older GCC: with just "=x", the "0" input is loaded from .rodata with movd
         // but with "=rx", it does mov-immediate + movd reg, xmm like newer GCC always wants to do
        return y;
    }
    

    With this, the compiler picked XMM0 for y (and thus also for x), and did a movd itself to get the value there. Older GCC versions like GCC6 do a mov-immediate to a GPR, then store/reload through stack space with the default -mtune, only using movd reg, xmm with -mtune=intel or a specific Intel CPU like -mtune=haswell. Using "=x" instead of "=rx" for some reason makes GCC6 do movd .LC1(%rip), %xmm0, even though it still picks %xmm0 for the operand in both cases.

    matching_constraint:
            movl    $-2147483648, %eax
            movd    %eax, %xmm0
            .space 0   # op0 was %xmm0  op1 was %xmm0
            ret
    

    Again, .space 0 is just a zero-byte no-op which is only there so I don't have to uncheck "filter comments" on the Godbolt compiler explorer. I could have used any GAS statement, like .Ldummy=42 # comment to define a symbol, or .global main.

    I could have written asm("" : "=rx"(y) : "0"(x)); with the empty string as the template. For correctness at least, it doesn't matter what register the compiler picked (integer or XMM), because "0" guarantees that input will pick the same register as output operand 0. (https://gcc.gnu.org/onlinedocs/gcc/Simple-Constraints.html)
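    As a complete function, the empty-template version could be sketched like this. The name bitcast_by_constraint is hypothetical, __asm__ is used so the code also compiles with -std=c11, and the #else branch is an assumed fallback for targets that aren't x86-64.

```c
#include <string.h>

// Hypothetical helper: reinterpret the bits of x as a float using only
// operand constraints; the asm template itself is empty.
float bitcast_by_constraint(int x) {
    float y;
#if defined(__GNUC__) && defined(__x86_64__)
    // "=rx": output may live in an integer or an XMM register.
    // "0":   the input must be in the same register as operand 0.
    __asm__("" : "=rx"(y) : "0"(x));
#else
    memcpy(&y, &x, sizeof y);   // portable fallback, assumption of this sketch
#endif
    return y;
}
```

    The compiler generates whatever moves it needs around the statement, so the result is the same whichever register class it picks.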


    You could use register float y asm("xmm0"); to force an "=x"(y) output to pick XMM0 specifically (the constraint has to be one that allows an XMM register, so "x" rather than "r") if you want to hard-code a register name into the asm template, but it's usually better to let the compiler do register allocation. In this case where you're passing it to printf, the x86-64 System V calling convention will indeed want it in XMM0. Windows x64 would want the promoted double in XMM1 and RDX.
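    A sketch of the register-variable approach, for the case where you really do want a hard-coded register name in the template. It uses the "x" constraint so the operand constraint is compatible with the XMM register the variable is pinned to. The function name is made up, and the #else branch is an assumed fallback for targets that aren't x86-64.

```c
#include <string.h>

// Hypothetical helper: bit-cast with the output pinned to XMM0.
float bitcast_in_xmm0(int x) {
#if defined(__GNUC__) && defined(__x86_64__)
    register float y __asm__("xmm0");   // pin y to XMM0 for asm operands
    // The template may now name %xmm0 directly; the "=x"(y) output
    // operand tells the compiler that this register is written.
    __asm__("movd %1, %%xmm0" : "=x"(y) : "r"(x));
    return y;
#else
    float y;
    memcpy(&y, &x, sizeof y);           // portable fallback
    return y;
#endif
}
```

    This is still more rigid than letting the compiler choose, since it may have to copy the result out of XMM0 if it wanted the value somewhere else.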

    Except it needs a cvtss2sd anyway, because variadic functions promote their float args to double. So it doesn't matter what register the float is in, other than avoiding a false dependency (a performance issue, not a correctness one) by writing the same register it reads. GCC is being silly here by pxor-zeroing XMM0; if it had done movd %eax, %xmm0 ; cvtss2sd %xmm0, %xmm0, the output dependency would be on the same register that's already an input dependency, making it a non-problem.
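    The promotion itself can be seen from the C side. In this small sketch (first_vararg is a hypothetical helper, not part of any library), the callee has to read the argument as a double even though the caller passes a float:

```c
#include <stdarg.h>

// Hypothetical helper: return the first variadic argument.
// count is only there because a variadic function needs a named parameter.
double first_vararg(int count, ...) {
    va_list ap;
    va_start(ap, count);
    // A float passed through "..." is promoted to double by the caller,
    // so reading it as float here would be undefined behaviour.
    double d = va_arg(ap, double);
    va_end(ap);
    return d;
}
```

    A call like first_vararg(1, 1.5f) makes the compiler emit the same kind of float-to-double conversion as the cvtss2sd before the printf call.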

    See also https://stackoverflow.com/tags/inline-assembly/info for more guides and useful Q&As.

    The manual is also quite good these days: https://gcc.gnu.org/onlinedocs/gcc/Using-Assembly-Language-with-C.html