I am trying to assemble the code below using yasm. I have put 'here' comments on the lines where yasm reports "error: invalid size for operand 2". Why is this error happening?
segment .data
a db 25
b dw 0xffff
c dd 3456
d dq -14
segment .bss
res resq 1
segment .text
global _start
_start:
movsx rax, [a] ; here
movsx rbx, [b] ; here
movsxd rcx, [c] ; here
mov rdx, [d]
add rcx, rdx
add rbx, rcx
add rax, rbx
mov [res], rax
ret
For most instructions, the width of the register operand implies the width of the memory operand, because both operands have to be the same size. For example, mov rdx, [d] implies mov rdx, qword [d] because you used a 64-bit register.
But the same movsx / movzx mnemonics are used for both the byte-source and the word-source opcodes, so the size is ambiguous unless the source is a register (like movzx eax, cl). Another example is crc32 r32, r/m8 vs. r/m16 vs. r/m32. (Unlike movsx/zx, its source size can be as wide as the operand-size.)
movsx / movzx with a memory source always needs the width of the memory operand specified explicitly.
The movsxd mnemonic is supposed to imply a 32-bit source size. movsxd rcx, [c] assembles with NASM, but apparently not with YASM. YASM requires you to write dword, even though it doesn't accept byte, word, or qword there, and it doesn't accept movsx rcx, dword [c] either (i.e. it requires the movsxd mnemonic for 32-bit source operands).
In NASM, movsx rcx, dword [c] assembles to movsxd, but movsxd rcx, word [c] is still rejected. That is, in NASM plain movsx is fully flexible, but movsxd is still rigid. Even so, I'd recommend writing dword to make the width of the load explicit, for the benefit of human readers.
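movsx rax, byte [a]      ; sign-extend the 8-bit a to 64 bits
movsx rbx, word [b]      ; sign-extend the 16-bit b to 64 bits
movsxd rcx, dword [c]    ; sign-extend the 32-bit c to 64 bits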
Note that the "operand size" of the instruction (set by the operand-size prefix to make it 16-bit, or by REX.W=1 to make it 64-bit) is the destination width for movsx / movzx. Different source sizes use different opcodes.
In case it's not obvious, there's no movzxd because a 32-bit mov already zero-extends to 64 bits implicitly. movsxd eax, ecx is encodable, but not recommended (use mov instead).
In AT&T syntax, you need to explicitly specify both the source and destination widths in the mnemonic, like movsbq (%rsi), %rax. GAS won't let you write movsb (%rsi), %eax and infer the destination width (operand-size) from the register, because movsb / movsw / etc. are the mnemonics for the string-move instructions with implicit (%rsi), (%rdi) operands.
Fun fact: GAS and clang do allow the destination size to be inferred for things like movzb (%rsi), %eax (as movzbl), but GAS only has extra logic to allow disambiguation based on operands (not just inferring size) when it's necessary, like movsd (%rsi), %xmm0 vs. the operand-less string instruction movsd. (Clang 12.0.1 actually does accept movsb (%rcx), %eax as movsbl, but GAS 2.36.1 doesn't, so for portability it's best to be explicit with sign-extension, and it's not a bad idea for zero-extension too.)
Other stuff about your source code:
NASM/YASM allow you to use the segment keyword instead of section, but really you're giving ELF section names, not executable segment names. Also, you can put read-only data in section .rodata, which is traditionally linked as part of the text segment (newer linkers may give it a separate read-only, non-executable segment). See also: What's the difference of section and segment in ELF file format.
You can't ret from _start. It's not a function; it's your ELF entry point. The first thing on the stack is argc, not a valid return address. Use this to exit cleanly:
xor edi, edi     ; exit status = 0 (first syscall argument)
mov eax, 231     ; __NR_exit_group on x86-64 Linux
syscall          ; sys_exit_group(0)
See the x86 tag wiki for links to more useful guides (and debugging tips at the bottom).