assembly x86-64 asmjit

Moving 64bit constants to memory


I am playing around with asmjit and generating assembly. In doing so, I noticed that one cannot use 64-bit constants as immediates for instructions (excluding mov, which makes sense).

Because of that, I push 64-bit constants to the stack and use them by accessing the stack instead of using the constant as an operand. Several resources say it is fine to use memory as an operand for the and instruction (e.g., [1], [2]).

However, I noticed that the and instruction does not work as expected. I will give you an example from my code:

mov r14, qword ptr [r15+32]   ; r14 holds a masked pointer now
mov qword ptr [rsp], 281474976710655    ; 0xFFFFFFFFFFFF is the mask for the pointer
and r14, [rsp]                ; Using pointer&mask I want to unmask the pointer

After that and instruction, the value in r14 is unchanged.

When using a register instead, everything works as expected:

mov r14, qword ptr [r15+32]   ; r14 holds a masked pointer now
mov r13, 281474976710655      ; 0xFFFFFFFFFFFF is the mask for the pointer
and r14, r13                  ; Using pointer&mask I want to unmask the pointer

Of course, I could use a register instead of accessing the stack, but I would be interested in why this behaves differently.


Solution

  • Looks like you didn't check for asmjit errors. The docs say there's a kErrorInvalidImmediate - Invalid immediate (out of bounds on X86 and invalid pattern on ARM).

    The only x86-64 instruction that can use a 64-bit immediate is mov-immediate to register: the special no-ModRM opcode that gives us the 5-byte mov eax, 12345 or the 10-byte mov rax, 0x0123456789abcdef, where a REX.W prefix changes that opcode to look for a 64-bit immediate. See https://www.felixcloutier.com/x86/mov and the related question "why we can't move a 64-bit immediate value to memory?"


    Your title is a red herring. It's nothing to do with having an m64 operand for and, it's the constant that's the problem. You can verify that by single-stepping the asm with a debugger and checking both operands before the and, including the one in memory. (It's probably -1 from 0xFFFFFFFF as an immediate for mov m64, sign_extended_imm32, which would explain AND not changing the value in R14).

    Also disassembly of the JITed machine code should show you what immediate is actually encoded; again a debugger could provide that as you single-step through it.
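    To make that guess concrete, here is a minimal C++ sketch of the arithmetic, assuming the constant really was truncated to its low 32 bits and then sign-extended by mov m64, imm32. The helper names are hypothetical and are not part of asmjit; this only models the bit pattern, it does not show what asmjit actually emitted.

    ```cpp
    #include <cassert>
    #include <cstdint>

    // Hypothetical model of `mov qword ptr [mem], imm32`:
    // keep only the low 32 bits of the constant, then sign-extend to 64 bits.
    std::uint64_t store_sign_extended_imm32(std::uint64_t constant) {
        const std::uint64_t low32 = constant & 0xFFFFFFFFull;
        // Classic sign-extension trick: flip bit 31, then subtract 2^31.
        return (low32 ^ 0x80000000ull) - 0x80000000ull;
    }

    // The mask from the question: 0xFFFFFFFFFFFF == 281474976710655.
    constexpr std::uint64_t kQuestionMask = 0xFFFFFFFFFFFFull;

    // Models `and r14, [rsp]` when [rsp] holds the truncated constant.
    std::uint64_t and_with_stored_mask(std::uint64_t tagged_pointer) {
        return tagged_pointer & store_sign_extended_imm32(kQuestionMask);
    }

    // Small self-check of the claim in the text.
    void check_truncated_mask() {
        // Low 32 bits are 0xFFFFFFFF, which sign-extends to all-ones (-1).
        assert(store_sign_extended_imm32(kQuestionMask) == 0xFFFFFFFFFFFFFFFFull);
        // AND with all-ones leaves the value untouched.
        assert(and_with_stored_mask(0xABCD7FFF12345678ull) == 0xABCD7FFF12345678ull);
    }
    ```

    Under that assumption the stored qword is all-ones, so the AND is a no-op, which matches what the question observed.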


    Use your temporary register for the constant (like mov r14, 0xFFFFFFFFFFFF), then and reg,mem to load-and-mask.

    Or better, if the target machine you're JITing for has BMI1 andn, construct the inverted constant once outside a loop with mov r13, ~0xFFFFFFFFFFFF, then inside the loop use andn r14, r13, [r15+32], which does a load+and without destroying the mask, all with one instruction that can decode to a single uop on Intel/AMD CPUs.
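    As a sanity check on the operand order, here is a small C++ model of andn (dst = ~src1 & src2). The function and constant names are made up for illustration; in JITed code this would be the single andn instruction, not a function call.

    ```cpp
    #include <cassert>
    #include <cstdint>

    // Software model of BMI1 `andn dst, src1, src2`: dst = ~src1 & src2.
    std::uint64_t andn_model(std::uint64_t src1, std::uint64_t src2) {
        return ~src1 & src2;
    }

    constexpr std::uint64_t kPointerMask  = 0xFFFFFFFFFFFFull;  // low 48 bits
    constexpr std::uint64_t kInvertedMask = ~kPointerMask;       // what r13 would hold

    // Models `andn r14, r13, [r15+32]` with r13 = ~mask.
    std::uint64_t unmask_with_andn(std::uint64_t tagged_pointer) {
        return andn_model(kInvertedMask, tagged_pointer);
    }

    // Small self-check: inverting the mask twice gives a plain AND with the mask.
    void check_andn_model() {
        assert(kInvertedMask == 0xFFFF000000000000ull);
        assert(unmask_with_andn(0xABCD7FFF12345678ull) == 0x00007FFF12345678ull);
    }
    ```

    Because andn inverts its first source, the register has to hold the inverted mask for the result to equal pointer & mask.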

    Or if you can't reuse a constant register over a loop, maybe mov reg,imm64, then push reg or mov mem,reg and use that in future AND instructions. Or emit some constant data somewhere near enough to reference with a RIP-relative addressing mode, although that takes a bit more code-size at every and instruction. (ModRM + 4 byte rel32, vs. ModRM + SIB + 0 or 1 bytes for data on the stack close to RSP).


    BTW, if you're just truncating instead of sign-extending, you're also assuming this address is in the low half of virtual address space (i.e. user-space). That's fine, though. Fun fact: future x86 CPUs (first Sapphire Rapids) will have an optional feature that OSes can enable to transparently ignore the high bits, except for the MSB: LAM = Linear Address Masking. See Intel's future-extensions manual.

    So if this feature is enabled with 48-bit masking for user-space, you can skip the AND masking entirely. (If your code makes sure bit 47 matches bit 63; you might want to keep the top bit unmodified or 0 so your code can take advantage of LAM when available to save instructions).
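    A rough C++ sketch of a tagging scheme that leaves the top bit alone, so that the bit-47-matches-bit-63 condition mentioned above can hold. The 15-bit tag layout and all names here are assumptions for illustration, not something taken from Intel's manual or from the question's code.

    ```cpp
    #include <cassert>
    #include <cstdint>

    // Hypothetical layout: a 15-bit tag in bits 62:48, bit 63 left unmodified.
    constexpr std::uint64_t kTagShift = 48;
    constexpr std::uint64_t kTagField = std::uint64_t{0x7FFF} << kTagShift;

    // Store a 15-bit tag without touching bit 63 or the low 48 address bits.
    std::uint64_t set_tag(std::uint64_t pointer, std::uint64_t tag15) {
        return (pointer & ~kTagField) | ((tag15 & 0x7FFF) << kTagShift);
    }

    // The condition described in the text: bit 47 must match bit 63.
    bool bit47_matches_bit63(std::uint64_t pointer) {
        return ((pointer >> 47) & 1) == ((pointer >> 63) & 1);
    }

    // Small self-check with a user-space style address (bit 47 clear).
    void check_tagging() {
        const std::uint64_t tagged = set_tag(0x00007FFF12345678ull, 0x7FFF);
        assert(tagged == 0x7FFF7FFF12345678ull);
        assert(bit47_matches_bit63(tagged));
        // A tag that also sets bit 63 breaks the condition.
        assert(!bit47_matches_bit63(0xABCD7FFF12345678ull));
    }
    ```

    With a layout like this, code running without LAM still needs the AND (or an equivalent), but the same pointers stay usable unmasked if the feature is available and enabled.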


    If you were masking to keep the low 32, you could just mov r14d, [r15+32] to zero-extend the low dword of the value into 64-bit R14. But for keeping the low 48 or 57 bits, you need a mask or BMI2 bzhi with 48 in a register.
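    For reference, a small C++ model of both options: the implicit zero-extension of a 32-bit register write, and bzhi clearing every bit from the index upward. The function names are invented for this sketch; real code would use the instructions themselves (or a compiler intrinsic for bzhi).

    ```cpp
    #include <cassert>
    #include <cstdint>

    // Models `mov r14d, [mem]`: a 32-bit register write zero-extends into the
    // full 64-bit register, so only the low dword survives.
    std::uint64_t load_low_dword(std::uint64_t value) {
        return static_cast<std::uint32_t>(value);
    }

    // Software model of BMI2 `bzhi dst, src, index` for 64-bit operands:
    // clears bits [63:index]. Only the low 8 bits of the index are used, and
    // an index of 64 or more leaves the source unchanged.
    std::uint64_t bzhi64_model(std::uint64_t src, unsigned index) {
        const unsigned n = index & 0xFFu;
        if (n >= 64) {
            return src;
        }
        return src & ((std::uint64_t{1} << n) - 1);
    }

    // Small self-check: bzhi with 48 equals AND with 0xFFFFFFFFFFFF.
    void check_bzhi_model() {
        assert(load_low_dword(0xABCD7FFF12345678ull) == 0x12345678ull);
        assert(bzhi64_model(0xABCD7FFF12345678ull, 48) == 0x00007FFF12345678ull);
        assert(bzhi64_model(0xABCD7FFF12345678ull, 64) == 0xABCD7FFF12345678ull);
    }
    ```

    The index 48 fits in any immediate-sized mov, so bzhi avoids needing the 64-bit mask constant at all.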