I want to know how, by looking through sample source code, to recognise whether the syntax used is emu8086, TASM or NASM. I am new to assembly. I would like to know more about emu8086, please.
NASM/YASM is easy to distinguish from MASM/TASM/emu8086. YASM uses NASM syntax, with a few minor differences in what it accepts for constants and directives.
I don't know how to distinguish MASM from TASM, or TASM from emu8086, or FASM, so I'll leave that for another answer to address.
In NASM, explicit sizes on things like memory operands use dword or byte. In TASM/MASM style, you have to write dword ptr or byte ptr.
In MASM (and I think TASM/emu8086), a bare symbol name refers to the contents. You have to use offset foo to get the address of foo. In NASM, you have to use [foo] to create a memory operand, and foo is the address.
There are probably other differences in syntax, too (e.g. in segment overrides), but these should be enough to tell by looking whether something is NASM-style or MASM-style.
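To make the segment-override difference concrete, here is a minimal sketch in NASM syntax, with what I believe is the MASM/TASM spelling in the comments. The register choices are arbitrary and only there for illustration.

```nasm
bits 16                     ; 16-bit code, as you'd see in emu8086-era examples

    mov ax, [es:di]         ; NASM: the segment override goes inside the brackets
                            ; MASM/TASM style would be: mov ax, es:[di]
    mov [es:bx+2], al       ; same rule for a store
                            ; MASM/TASM style would be: mov es:[bx+2], al
```

So an override written outside the brackets, like es:[di], is another hint that you are looking at MASM-style source.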
NASM:
global foo
foo:                                    ; a function called foo()
    add   dword [ecx], 2
    add   dword [counter], 1            ; Error without "dword", because neither operand implies an operand-size for the instruction. And the [] is required.
    mov   eax, [static_var]
    mov   eax, [static_array + ecx*4]   ; Everything *must* be inside the []
    mov   esi, static_var               ; mov esi,imm32 with the address of static_var
    ret

section .data
static_var:   dd 0xdeadbeef             ; NASM can use 0x... constants. MASM only allows 0DEADBEEFh style
static_array: dd 1, 2, 3, 4             ; example array indexed above (values are arbitrary)

section .bss
counter: resd 1                         ; reserve space for one dword (initialized to zero)
buf:     resb 256                       ; reserve 256 bytes
Note the : after label names here, even for data. This is recommended but not required: any unknown token at the start of a line is assumed to be a label, so counter resd 1 will assemble. But loop resd 1 won't, because loop is a valid instruction mnemonic.
MASM/TASM (I may have some of this wrong, I don't use MASM or TASM):
GNU GAS .intel_syntax noprefix is mostly the same, but without the magic operand-size association for labels. And GAS directives / pseudo-instructions are totally different, like .byte 0x12 vs. db 12h.
.CODE
foo PROC                                ; PROC/ENDP definitely means not NASM
    add   dword ptr [ecx], 2
    add   counter, 1                    ; operand-size magically implied by the dd after the counter label. [] is optional
    mov   eax, static_var               ; mov eax, [static_var] is the same, and recommended by some for clarity
    mov   eax, static_array[ecx*4]      ; [ static_array + ecx*4 ] is also allowed, but not required.
    mov   esi, OFFSET static_var        ; mov esi,imm32 with the address.
    ret
foo ENDP                                ; MASM wants the procedure name repeated before ENDP

.data                                   ; no SECTION directive, just .data directly
static_var   dd 0deadbeefH
static_array dd 1, 2, 3, 4              ; example array indexed above (values are arbitrary)
    ;;; With a : after the name, it would be just a label, not a "variable" with a size associated.

.data?                                  ; uninitialized data (ends up in the BSS); MASM/TASM have no .bss directive
    ; (In most OSes, the BSS is initialized to zero, so counter still starts at 0.)
counter dd ?                            ; reserve space for one dword. No colon, so add counter, 1 knows the size
buf     db 256 dup(?)                   ; reserve 256 bytes (uninitialized).
Except where I commented otherwise, any of these differences is a guaranteed sign that it's NASM/YASM or MASM/TASM/emu8086.
E.g. if you ever see a bare symbol as the destination operand (e.g. mov foo, eax), it's definitely not NASM, because mov imm32, r32 makes no sense. Unless the symbol is actually a macro definition for a register, e.g. %define result eax would allow mov result, 5. (Good catch, @MichaelPetch.) If the source is full of macros, then look for the definitions: %define means NASM, while MACRO means MASM/TASM.
MASM/TASM doesn't have resb / resd directives. Instead, they have count DUP(value), where value can be ?.
NASM has times 30 db 0x10 to repeat the byte 0x10 30 times. You can use it on anything, even instructions. It also has %rep directives to repeat a block.
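A small sketch of both repetition mechanisms, assuming NASM. The label names pad and squares and the loop variable i are made up for this example.

```nasm
bits 32

section .data
pad:    times 30 db 0x10        ; 30 bytes, each 0x10
                                ; MASM/TASM style would be: pad db 30 dup(10h)

squares:                        ; table built with a preprocessor loop
%assign i 0
%rep 4
    dd i*i                      ; emits dd 0, then dd 1, dd 4, dd 9
%assign i i+1
%endrep

section .text
    times 4 nop                 ; times also works on instructions
```

Seeing times or %rep / %endrep in a file is a strong sign of NASM/YASM, just as DUP is a strong sign of MASM/TASM.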
MASM and NASM have significant macro capabilities, but they use different syntax.
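For a rough idea of what the macro syntax looks like, here is a minimal NASM sketch. The names result and push2 are hypothetical, and the MASM/TASM counterpart in the trailing comment is from memory rather than tested.

```nasm
bits 32

%define result eax              ; single-line macro: "result" is replaced by eax

%macro push2 2                  ; multi-line macro taking 2 parameters
    push %1
    push %2
%endmacro

    mov   result, 5             ; assembles as mov eax, 5
    push2 ebx, ecx              ; expands to push ebx / push ecx

; The MASM/TASM counterpart would look roughly like:
;   push2 MACRO a, b
;       push a
;       push b
;   ENDM
```

So %macro / %endmacro and %1-style parameters point to NASM, while name MACRO ... ENDM with named parameters points to MASM/TASM.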
The x86 tag wiki has links to assembler manuals and much more.
In MASM, dword by itself (not dword ptr) evaluates as the number 4, because that's the width of a dword. So mov dword [foo], 123 will disastrously assemble as mov 4[foo], 123, which is the same as [foo+4]. And the operand-size will be whatever size is implied by how you declared foo, e.g. foo db 1,2,3,4 is an array of bytes, so mov dword [foo], 123 assembled by MASM is actually mov byte ptr [foo+4], 123.
See also Confusing brackets in MASM32 for the disaster of syntax-design that is MASM. mov eax, [const] is a mov-immediate if const was declared like const = 0B8000h.