assembly x86-64 nasm yasm

yasm movsx, movsxd invalid size for operand 2


I am trying to assemble the code below using yasm. I have put 'here' comments on the lines where yasm reports "error: invalid size for operand 2". Why does this error happen?

segment .data
    a db 25
    b dw 0xffff
    c dd 3456
    d dq -14

segment .bss
    res resq 1

segment .text
    global _start

_start:
    movsx rax, [a] ; here
    movsx rbx, [b] ; here 
    movsxd rcx, [c] ; here
    mov rdx, [d]
    add rcx, rdx
    add rbx, rcx
    add rax, rbx
    mov [res], rax
    ret

Solution

  • For most instructions, the width of the register operand implies the width of the memory operand, because both operands have to be the same size. e.g. mov rdx, [d] implies mov rdx, qword [d] because you used a 64-bit register.

    But the same movsx / movzx mnemonics are used for the byte-source and word-source opcodes, so it's ambiguous unless the source is a register (like movzx eax, cl). Another example is crc32 r32, r/m8 vs. r/m16 vs. r/m32. (Unlike movsx/zx, its source size can be as wide as the operand-size.)

    movsx / movzx with a memory source always need the width of the memory operand specified explicitly.
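
As a sketch of why the size keyword can't be inferred, the snippet below loads from one address with each movsx / movzx form and gets different results. The label val and its value 0x0180 are made up for illustration, and the opcode bytes in the comments are the ones for a 32-bit destination. It assumes x86-64 Linux and a plain static (non-PIE) link.

```nasm
section .rodata
    val dw 0x0180            ; hypothetical test value; bytes in memory: 80 01

section .text
    global _start

_start:
    movsx   eax, byte [val]  ; 0F BE: loads 0x80,   eax = 0xffffff80 (-128)
    movsx   ecx, word [val]  ; 0F BF: loads 0x0180, ecx = 0x00000180 (384)
    movzx   edx, byte [val]  ; 0F B6: loads 0x80,   edx = 0x00000080 (128)
    movzx   esi, word [val]  ; 0F B7: loads 0x0180, esi = 0x00000180 (384)

    xor     edi, edi         ; exit status 0
    mov     eax, 231         ; exit_group
    syscall
```

The program produces no output; single-stepping it in a debugger is the easiest way to watch the four registers take different values from the same memory.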

    The movsxd mnemonic is supposed to imply a 32-bit source size. movsxd rcx, [c] assembles with NASM, but apparently not with YASM. YASM requires you to write dword, even though it doesn't accept byte, word, or qword there, and it doesn't accept movsx rcx, dword [c] either (i.e. it requires the movsxd mnemonic for 32-bit source operands).

    In NASM, movsx rcx, dword [c] assembles to movsxd, but movsxd rcx, word [c] is still rejected. i.e. in NASM, plain movsx is fully flexible, but movsxd is still rigid. I'd still recommend using dword to make the width of the load explicit, for the benefit of humans.

    movsx    rax,  byte [a]
    movsx    rbx,  word [b]
    movsxd   rcx, dword [c]
    

    Note that the "operand size" of the instruction (as determined by the operand-size prefix to make it 16-bit, or REX.W=1 to make it 64-bit) is the destination width for movsx / movzx. Different source sizes use different opcodes.


    In case it's not obvious, there's no movzxd because a 32-bit mov already zero-extends into the full 64-bit register implicitly. movsxd eax, ecx is encodable, but not recommended (use mov instead).
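
A minimal sketch of that difference, assuming x86-64 Linux and a static (non-PIE) link. The label n and its value are hypothetical, chosen so the zero-extended and sign-extended results differ.

```nasm
section .rodata
    n dd 0xffffffff          ; hypothetical value: -1 as a signed dword

section .text
    global _start

_start:
    mov     eax, [n]         ; writing eax zeroes the upper half: rax = 0x00000000ffffffff
    movsxd  rcx, dword [n]   ; sign-extends bit 31:               rcx = 0xffffffffffffffff (-1)

    xor     edi, edi         ; exit status 0
    mov     eax, 231         ; exit_group
    syscall
```

So a plain 32-bit mov already does the job a movzxd would do, and movsxd is only needed when the dword should be treated as signed.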

    In AT&T syntax, you need to explicitly specify both the source and destination width in the mnemonic, like movsbq (%rsi), %rax. GAS won't let you write movsb (%rsi), %eax to infer a destination width (operand-size) because movsb/movsw/etc are the mnemonics for string-move instructions with implicit (%rsi), (%rdi) operands.

    Fun fact: GAS and clang do allow inferring the destination width for zero-extension, accepting movzb (%rsi), %eax as movzbl. GAS only has extra logic for disambiguating (not just inferring a size) based on operands where that's necessary, like movsd (%rsi), %xmm0 vs. the string-move movsd. (Clang 12.0.1 actually does accept movsb (%rcx), %eax as movsbl, but GAS 2.36.1 doesn't, so for portability it's best to be explicit with sign-extension, and it's not a bad idea for zero-extension too.)


    Other stuff about your source code:

    NASM/YASM allow you to use the segment keyword instead of section, but really you're giving ELF section names, not executable segment names. Also, you can put read-only data in section .rodata (which is linked as part of the text segment). See "What's the difference of section and segment in ELF file format" for more on the distinction.

    You can't ret from _start. It's not a function, it's your ELF entry point. The first thing on the stack is argc, not a valid return address. Use this to exit cleanly:

    xor    edi, edi    ; exit status 0
    mov    eax, 231    ; exit_group system call number on x86-64 Linux
    syscall            ; sys_exit_group(0)
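
Putting the pieces together, here is one way the program from the question could look with explicit source sizes, section instead of segment, the inputs moved to .rodata, and an exit_group call in place of ret. Treat it as a sketch for x86-64 Linux with a static (non-PIE) link; the file name sum.asm used below is made up.

```nasm
section .rodata              ; the inputs are never written, so .rodata works
    a db 25
    b dw 0xffff
    c dd 3456
    d dq -14

section .bss
    res resq 1

section .text
    global _start

_start:
    movsx   rax, byte [a]    ; rax = 25
    movsx   rbx, word [b]    ; 0xffff sign-extends to rbx = -1
    movsxd  rcx, dword [c]   ; rcx = 3456
    mov     rdx, [d]         ; qword implied by rdx: rdx = -14
    add     rcx, rdx         ; rcx = 3442
    add     rbx, rcx         ; rbx = 3441
    add     rax, rbx         ; rax = 3466
    mov     [res], rax       ; store the sum

    xor     edi, edi         ; exit status 0
    mov     eax, 231         ; exit_group
    syscall
```

It should build with yasm -f elf64 sum.asm -o sum.o (or nasm with the same options) followed by ld sum.o -o sum. The program exits with status 0 and prints nothing, so checking res needs a debugger.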
    

    See the tag wiki for links to more useful guides (and debugging tips at the bottom).