I am a little bit confused by how `movzx` behaves in the following example. (Please note that I am assuming the `print_int` function used in my code sample works and that the problem lies not there but in my understanding of `movzx`. The function was recommended by our professor, and we were told that it simply prints whatever is in the register as a decimal number.)
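```nasm
%include "../linux-ex/asm_io.inc"   ; course I/O library, provides print_int
extern printf

section .text
global main
main:
    push ebp
    mov  ebp, esp
    xor  eax, eax          ; clear eax so its upper 16 bits are zero
    mov  ax, [n1]          ; load the 16-bit word stored at n1 into ax
    call print_int         ; print the register as a decimal number
    leave
    ret

section .data
n1  dw 01234h              ; 16-bit value 1234h = 4660
```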
This piece of NASM code on a 32-bit architecture prints out 4660, as expected. If I change `mov ax, [n1]` to `movzx ax, [n1]`, I get an output of 52. I know it doesn't make much sense to zero-extend into `ax`, since `n1` is 16 bits wide too, but I was surprised that it yields a different output. It seems that the upper 8 bits are cut off and set to zero in the second example, leaving the hex number 34, which is 52 in decimal. Why is this happening?
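When the assembler sees `movzx ax, [n1]`, it looks for an instruction that fits the mnemonic and operands used here, i.e. an instruction with mnemonic `movzx` whose first operand is a 16-bit register and whose second operand is a memory operand. The only variant of `movzx` that fits the bill is `movzx r16, r/m8`, and this is indeed what NASM assembles this code to. As a result, the CPU reads a single byte from `n1` and zero-extends it into `ax`. Because x86 is little-endian, `dw 01234h` is stored as the bytes `34h 12h`, so that byte is `34h`: `ax` becomes `0034h`, and since `xor eax, eax` already cleared the upper half of `eax`, `print_int` prints 52.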
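The difference between the two loads can be modelled in a few lines. This is only a sketch in Python, not the CPU's or NASM's actual behaviour spelled out by any documentation: it assumes a little-endian x86 target, represents the contents of `n1` as a byte string, and the helper names `mov_ax` and `movzx_ax_byte` are hypothetical stand-ins for `mov ax, [n1]` and `movzx ax, [n1]` (assembled as `movzx r16, r/m8`).

```python
# dw 01234h as it sits in .data on little-endian x86: low byte first.
N1 = bytes([0x34, 0x12])

def mov_ax(mem: bytes, eax: int = 0) -> int:
    """Model of `mov ax, [n1]`: read 2 bytes, replace only the low 16 bits of eax."""
    ax = int.from_bytes(mem[:2], "little")  # 0x1234
    return (eax & 0xFFFF0000) | ax

def movzx_ax_byte(mem: bytes, eax: int = 0) -> int:
    """Model of `movzx ax, [n1]` (movzx r16, r/m8): read 1 byte,
    zero-extend it to 16 bits, replace only the low 16 bits of eax."""
    ax = mem[0]  # 0x34; the upper 8 bits of ax become 0
    return (eax & 0xFFFF0000) | ax

# eax starts at 0 because of `xor eax, eax`.
print(mov_ax(N1))         # 4660
print(movzx_ax_byte(N1))  # 52
```

Only the number of bytes read differs: the 16-bit `mov` picks up both `34h` and `12h`, while the byte-sized `movzx` never sees `12h` at all.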
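In contrast to MASM, NASM does not track the types of symbols. While MASM might warn or refuse to assemble the line because `[n1]` has type word but is used as a byte operand, NASM does no such thing. Instead, the operand size must be specified explicitly with a keyword like `byte`, `word`, or `dword` whenever it is ambiguous (i.e. when there are multiple instructions with the same mnemonic and operands, but at different operand sizes). Here there was no ambiguity, since only one form of `movzx` accepts a 16-bit destination, so NASM silently picked the byte-sized source. To load the whole word, write `movzx eax, word [n1]` (which prints 4660), or simply keep `mov ax, [n1]`.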
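A minimal Python sketch of what the size keyword changes, again assuming little-endian byte order. The `movzx` function and the `SIZE_KEYWORDS` table are hypothetical illustrations, not NASM features: the only input that varies between the two calls is the keyword, and it alone decides whether one or two bytes of `n1` are read before zero-extension.

```python
# Bytes of n1 (dw 01234h) in little-endian order.
N1 = bytes([0x34, 0x12])

# movzx only accepts 8-bit or 16-bit sources.
SIZE_KEYWORDS = {"byte": 1, "word": 2}

def movzx(mem: bytes, size_kw: str) -> int:
    """Model of `movzx <reg>, <size_kw> [n1]`: the keyword fixes how many
    bytes are read; the value is zero-extended, never sign-extended."""
    width = SIZE_KEYWORDS[size_kw]
    return int.from_bytes(mem[:width], "little")

print(movzx(N1, "byte"))  # movzx ax, byte [n1]  -> 52
print(movzx(N1, "word"))  # movzx eax, word [n1] -> 4660
```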