c++ assembly x86-64 sse asmjit

Set XMM register via address location for X86-64


I have a float value at some address in memory, and I want to set an XMM register to that value by using the address. I'm using asmjit.

This code works in a 32-bit build and sets the XMM register v to the correct value, *f:

using namespace asmjit;
using namespace x86;

void setXmmVarViaAddressLocation(X86Compiler& cc, X86Xmm& v, const float* f)
{
   cc.movq(v, X86Mem(reinterpret_cast<std::uintptr_t>(f)));
}

When I compile in 64 bits, though, I get a segfault when trying to use the register. Why is that?

(And yes, I am not very strong in assembly... Be kind... I've been on this for a day now...)


Solution

  • The simplest solution is to avoid passing an absolute address to ptr(). The reason is that an x86/x86_64 memory operand can only carry a 32-bit displacement, which cannot represent every user address in a 64-bit process. The displacement is calculated from the current instruction pointer and the target address, and if that difference does not fit in a signed 32-bit integer, the instruction is not encodable (this is an architecture constraint). Loading the address into a general-purpose register first and reading through that register avoids the problem.

    Example code:

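    #include <cstdint>
    #include <asmjit/asmjit.h>

    using namespace asmjit;
    
    void setXmmVarViaAddressLocation(x86::Compiler& cc, x86::Xmm& v, const float* f)
    {
        // Load the full address into a pointer-sized virtual register first...
        x86::Gp tmpPtr = cc.newIntPtr("tmpPtr");
        cc.mov(tmpPtr, reinterpret_cast<std::uintptr_t>(f));
        // ...then read through it, so no absolute address has to be encoded.
        cc.movq(v, x86::ptr(tmpPtr));
    }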
    

    If you want to optimize this code for 32-bit mode, which doesn't have the problem, you would have to check the target architecture first, something like:

    using namespace asmjit;
    
    void setXmmVarViaAddressLocation(x86::Compiler& cc, x86::Xmm& v, const float* f)
    {
        // Ideally, abstract this out so the code doesn't repeat.
        x86::Mem m;
        if (cc.is32Bit() || reinterpret_cast<std::uintptr_t>(f) <= 0xFFFFFFFFu) {
            m = x86::ptr(reinterpret_cast<std::uintptr_t>(f));
        }
        else {
            x86::Gp tmpPtr = cc.newIntPtr("tmpPtr");
            cc.mov(tmpPtr, reinterpret_cast<std::uintptr_t>(f));
            m = x86::ptr(tmpPtr);
        }
    
        // Do the move, now the content of `m` depends on target arch.
        cc.movq(v, m);
    }
    

    This way you would save one register in 32-bit mode, which is always precious.
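
To make the displacement constraint concrete, here is a minimal sketch of the "does the difference fit in a signed 32-bit integer" check described above. The helper name `fitsInSignedDisp32` and its parameters are made up for illustration and are not part of asmjit. It only does the arithmetic and says nothing about how asmjit decides what to encode.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper (not an asmjit API): returns true when `target` can be
// reached from `nextInstruction` using a signed 32-bit displacement.
bool fitsInSignedDisp32(std::uint64_t target, std::uint64_t nextInstruction)
{
    // Unsigned subtraction wraps around; reinterpreting the result as signed
    // yields the two's-complement distance between the two addresses.
    const std::int64_t diff = static_cast<std::int64_t>(target - nextInstruction);

    return diff >= std::numeric_limits<std::int32_t>::min() &&
           diff <= std::numeric_limits<std::int32_t>::max();
}
```

In a 64-bit process, JIT-generated code and a heap or stack variable can easily be more than 2 GiB apart, so a check like this would fail for them. That is why the register-based version above is the safe default.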