Tags: assembly, x86, tasm, att, intel-syntax

How can I tell which syntax (emu8086, NASM, TASM, ...) an assembly source file uses?


By looking through a sample of source code, how can I recognise whether the syntax used is emu8086, TASM or NASM? I am new to assembly, and I would also like to know more about emu8086, please.


Solution

  • NASM/YASM is easy to distinguish from MASM/TASM/emu8086. YASM uses NASM syntax, with a few minor differences in what it accepts for constants and directives.

    I don't know how to distinguish MASM from TASM, or TASM from emu8086, or FASM, so I'll leave that for another answer to address.


    In NASM, explicit sizes on things like memory operands use dword or byte. In TASM/MASM style, you have to write dword ptr or byte ptr.

    In MASM (and I think TASM/emu8086), a bare symbol name refers to the contents of memory at that symbol. You have to use offset foo to get the address of foo. In NASM, you have to use [foo] to create a memory operand, and a bare foo is the address.

    There are probably other differences in syntax, too (e.g. in segment overrides), but these should be enough to tell by looking whether something is NASM-style or MASM-style.
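The checks above can be turned into a rough score. This is a minimal Python sketch that counts NASM-only and MASM/TASM-only features in a source file. `guess_dialect` and the hint lists are hypothetical names. Matching regexes against raw text is only a heuristic, so macros, strings, or unusual formatting can fool it.

```python
import re

# Hypothetical feature lists drawn from the differences described above.
# Regexes over raw text are a rough heuristic, not a real parser.
NASM_HINTS = [
    r"\b(?:byte|word|dword|qword)\s*\[",  # size keyword right before [mem]
    r"^\s*section\s+\.",                  # section .data / section .bss
    r"\bres[bwdq]\b",                     # resb / resd reservations
    r"\btimes\b",                         # times N ...
    r"^\s*%(?:define|macro|rep)\b",       # %-prefixed preprocessor directives
    r"\b0x[0-9a-f]+\b",                   # 0x... constants (MASM rejects these)
]
MASM_HINTS = [
    r"\b(?:byte|word|dword|qword)\s+ptr\b",  # dword ptr [mem]
    r"\boffset\b",                           # OFFSET symbol
    r"\b(?:proc|endp)\b",                    # foo PROC ... foo ENDP
    r"\bdup\s*\(",                           # 256 DUP(?)
]

def guess_dialect(source: str) -> str:
    """Return 'NASM', 'MASM' (meaning MASM/TASM/emu8086 style) or 'unknown'."""
    flags = re.IGNORECASE | re.MULTILINE
    # Strip ';' comments so words inside comments are not counted.
    code = "\n".join(line.split(";", 1)[0] for line in source.splitlines())
    nasm = sum(len(re.findall(p, code, flags)) for p in NASM_HINTS)
    masm = sum(len(re.findall(p, code, flags)) for p in MASM_HINTS)
    if nasm > masm:
        return "NASM"
    if masm > nasm:
        return "MASM"
    return "unknown"
```

For example, `guess_dialect(open("prog.asm").read())` gives a first guess. A close score means you should look at the matching lines by hand.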

    NASM:

    global foo
    foo:         ; a function called foo()
        add    dword [ecx], 2
        add    dword [counter], 1   ; Error without "dword", because neither operand implies an operand-size for the instruction.  And the [] is required.
        mov    eax, [static_var]
        mov    eax, [static_array + ecx*4] ; Everything *must* be inside the []
    
        mov    esi, static_var      ; mov esi,imm32 with the address of the static_var
        ret
    
    section .data
     static_var: dd 0xdeadbeef     ; NASM can use 0x... constant.  MASM only allows 0DEADBEEFh style
     static_array: times 4 dd 0    ; small array for the indexed load in foo
    
    section .bss
     counter: resd 1    ; reserve space for one dword (initialized to zero)
     buf:     resb 256  ; reserve 256 bytes
    
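The constant-syntax difference noted in the static_var comment above is mechanical, so it can be converted automatically. Here is a small Python sketch that converts in each direction; both function names are hypothetical. It also shows why the MASM form needs a leading 0 when the first hex digit is a letter.

```python
HEX_DIGITS = "0123456789ABCDEF"

def nasm_hex_to_masm(literal: str) -> str:
    """Convert a 0x... literal (NASM/C style) to MASM/TASM h-suffix form."""
    if not literal.lower().startswith("0x"):
        raise ValueError(f"not a 0x literal: {literal!r}")
    digits = literal[2:].upper()
    if not digits or any(c not in HEX_DIGITS for c in digits):
        raise ValueError(f"bad hex digits in {literal!r}")
    # MASM numbers must start with a decimal digit; otherwise DEADBEEFh
    # would be parsed as a symbol name. Hence the leading 0.
    if not digits[0].isdigit():
        digits = "0" + digits
    return digits + "h"

def masm_hex_to_nasm(literal: str) -> str:
    """Convert an h-suffix literal such as 0DEADBEEFh to 0x... form."""
    if not literal.lower().endswith("h"):
        raise ValueError(f"not an h-suffix literal: {literal!r}")
    return hex(int(literal[:-1], 16))
```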

    Note the : after label names here, even for data. This is recommended but not required: NASM assumes any unknown token at the start of a line is a label, so counter resd 1 will assemble. But loop resd 1 won't, because loop is a valid instruction mnemonic.
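The label rule above can be modelled in a few lines of Python. This is a rough sketch: `leading_label` and the `RESERVED` set are hypothetical, and the set is only a tiny subset of the words NASM actually reserves.

```python
# Hypothetical, heavily reduced list of reserved words; real NASM knows every
# instruction mnemonic, register name and directive.
RESERVED = {
    "mov", "add", "ret", "loop", "push", "pop", "jmp", "call",
    "db", "dw", "dd", "dq", "resb", "resw", "resd", "resq", "times",
}

def leading_label(line: str):
    """Return the label this sketch thinks starts `line`, or None.

    A first token that is not a reserved word is taken as a label, with
    or without a trailing ':'. A reserved word such as `loop` is treated
    as an instruction or directive instead.
    """
    tokens = line.split(";", 1)[0].split()
    if not tokens:
        return None
    name = tokens[0]
    if name.endswith(":"):
        name = name[:-1]
    if name.lower() in RESERVED:
        return None
    return name
```

With this model, `counter resd 1` yields the label `counter`, while `loop resd 1` yields no label. The second line is then parsed as a `loop` instruction with a nonsensical operand.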

    GNU GAS .intel_syntax noprefix is mostly the same as the MASM/TASM style below, but without the magic operand-size association for labels. And GAS directives / pseudo-instructions are totally different, like .byte 0x12 vs. db 12h.

    MASM/TASM (I may have some of this wrong; I don't use MASM or TASM):

    .386
    .MODEL FLAT
    .CODE
    foo PROC      ; PROC/ENDP definitely means not NASM
        add    dword ptr [ecx], 2
        add    counter, 1            ; operand-size magically implied by the dd after the counter label.  [] is optional
        mov    eax, static_var       ; mov  eax, [static_var] is the same, and recommended by some for clarity
        mov    eax, static_array[ecx*4] ; [ static_array + ecx*4 ] is also allowed, but not required.
    
        mov    esi, OFFSET static_var   ; mov esi,imm32 with the address.
        ret
    foo ENDP      ; MASM expects the procedure name before ENDP
    
    .data       ; no SECTION directive, just .data directly
    
      static_var dd 0deadbeefH
      static_array dd 4 dup(0)   ; small array for the indexed load in foo
    ;;; With a : after the name, it would be just a label, not a "variable" with a size associated.
    
    .data?      ; uninitialized data: MASM's equivalent of .bss
      ; (Most OSes zero this section; ? just means "no particular initial value".)
    
     counter dd ?         ; reserve space for one dword; no : so counter keeps its dword size
     buf   db 256 dup(?)  ; reserve 256 bytes (uninitialized).
    
    END
    

    Except where I commented otherwise, any of these differences is a reliable sign of whether the code is NASM/YASM or MASM/TASM/emu8086.

    For example, if you ever see a bare symbol as the destination operand (e.g. mov foo, eax), it's definitely not NASM, because mov imm32, r32 makes no sense. The exception is when the symbol is actually a macro definition for a register: e.g. %define result eax would allow mov result, 5. (Good catch, @MichaelPetch.) If the source is full of macros, look for their definitions: %define (or %macro) means NASM, while MACRO means MASM/TASM.
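The bare-destination check can also be automated. This Python sketch uses a hypothetical function name and a reduced register list. It expands single-line %define macros first, as suggested above, then looks for a mov whose destination is a plain, non-register symbol. A True result suggests MASM/TASM-style source. It ignores multi-line macros and instructions other than mov.

```python
import re

# Hypothetical, reduced register list (16/32-bit general purpose + segment).
REGISTERS = {
    "al", "ah", "ax", "eax", "bl", "bh", "bx", "ebx",
    "cl", "ch", "cx", "ecx", "dl", "dh", "dx", "edx",
    "si", "esi", "di", "edi", "sp", "esp", "bp", "ebp",
    "cs", "ds", "es", "ss", "fs", "gs",
}
IDENT = re.compile(r"[A-Za-z_.?@$][\w.?@$]*")

def has_bare_symbol_destination(source: str) -> bool:
    """True if some `mov sym, ...` writes to a bare, non-register symbol.

    Single-line `%define name value` macros are expanded first, so
    `%define result eax` followed by `mov result, 5` is not flagged.
    """
    defines = {}
    for raw in source.splitlines():
        line = raw.split(";", 1)[0].strip()
        m = re.match(r"%define\s+(\w+)\s+(\w+)\s*$", line)
        if m:
            defines[m.group(1)] = m.group(2)
            continue
        m = re.match(r"mov\s+([^,\s]+)\s*,", line, re.IGNORECASE)
        if not m:
            continue  # not a mov, or destination starts with a size keyword
        dest = defines.get(m.group(1), m.group(1))
        if dest.lower() in REGISTERS or "[" in dest:
            continue  # register or explicit memory operand
        if IDENT.fullmatch(dest):
            return True
    return False
```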

    MASM/TASM don't have resb / resd directives. Instead, they use count DUP(value), where value can be ? (uninitialized).

    NASM has times 30 db 0x10 to repeat the byte 0x10 30 times. You can use it on anything, even instructions. It also has %rep directives to repeat a block.
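Going from NASM's reservation and times directives to MASM's DUP form is mostly a renaming: resb/resw/resd/resq N become db/dw/dd/dq N DUP(?). Here is a minimal Python sketch that handles single labelled data lines only; the function names are hypothetical. It does not handle times applied to instructions, which NASM allows. Numeric values are emitted in decimal to sidestep MASM's lack of 0x constants.

```python
import re

# Data directive with the same element size as each NASM reservation.
RES_TO_DATA = {"resb": "db", "resw": "dw", "resd": "dd", "resq": "dq"}

def masm_value(value: str) -> str:
    """Rewrite a numeric literal for MASM, which has no 0x prefix."""
    try:
        return str(int(value, 0))  # 0x10 -> 16, 42 -> 42
    except ValueError:
        return value               # symbols, 10h, ... passed through unchanged

def nasm_to_masm_data(line: str) -> str:
    """Translate one labelled NASM resX / times-data line to MASM DUP form.

    Only two shapes are handled (a sketch, not a translator):
      name: resd N         ->  name dd N DUP(?)
      name: times N db V   ->  name db N DUP(V)
    """
    m = re.match(r"\s*(\w+):?\s+(res[bwdq])\s+(\w+)\s*$", line, re.IGNORECASE)
    if m:
        name, res, count = m.groups()
        return f"{name} {RES_TO_DATA[res.lower()]} {masm_value(count)} DUP(?)"
    m = re.match(r"\s*(\w+):?\s+times\s+(\w+)\s+(d[bwdq])\s+(\S+)\s*$",
                 line, re.IGNORECASE)
    if m:
        name, count, size, value = m.groups()
        return f"{name} {size.lower()} {masm_value(count)} DUP({masm_value(value)})"
    raise ValueError(f"unsupported line: {line!r}")
```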

    MASM and NASM have significant macro capabilities, but they use different syntax.

    The tag wiki has links to assembler manuals and much more.


    Other things that go wrong when you assemble code with the wrong assembler:

    In MASM, dword by itself (not dword ptr) evaluates to the number 4, because that's the width of a dword in bytes. So mov dword [foo], 123 will disastrously assemble as mov 4[foo], 123, which is the same as mov [foo+4], 123. And the operand size will be whatever size is implied by how you declared foo. For example, foo db 1,2,3,4 is an array of bytes, so mov dword [foo], 123 assembled by MASM is actually mov byte ptr [foo+4], 123.
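Here is a tiny Python model of the example above. It assumes a hypothetical layout where foo db 1,2,3,4 sits at offset 0, followed by four zero bytes. It models only the address and operand size each assembler picks for mov dword [foo], 123, then performs the little-endian store.

```python
# Hypothetical layout: `foo db 1,2,3,4` at offset 0, followed by four more bytes.
FOO_ADDR = 0
FOO_DECLARED_SIZE = 1  # foo was declared with db, so MASM associates byte size

def operand_for(assembler: str):
    """(address, size in bytes) chosen for the destination of `mov dword [foo], 123`."""
    if assembler == "nasm":
        # dword is a size keyword; [foo] is the memory at foo.
        return FOO_ADDR, 4
    if assembler == "masm":
        # Bare dword is the constant 4: dword [foo] == 4[foo] == [foo+4],
        # and the size comes from foo's declaration, not from "dword".
        return FOO_ADDR + 4, FOO_DECLARED_SIZE
    raise ValueError(assembler)

def simulate(assembler: str) -> list:
    memory = bytearray([1, 2, 3, 4, 0, 0, 0, 0])
    addr, size = operand_for(assembler)
    # x86 is little-endian: the low byte of 123 is stored first.
    memory[addr:addr + size] = (123).to_bytes(size, "little")
    return list(memory)

print(simulate("nasm"))  # [123, 0, 0, 0, 0, 0, 0, 0]
print(simulate("masm"))  # [1, 2, 3, 4, 123, 0, 0, 0]
```

NASM overwrites all four bytes of foo. MASM leaves foo untouched and clobbers the byte just past it.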

    See also Confusing brackets in MASM32 for the disaster of syntax design that is MASM: mov eax, [const] is a mov-immediate if const was declared as an assemble-time constant like const = 0B8000h.
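The bracket rule can be stated as a small decision function. This Python sketch uses hypothetical names; `symbol_kinds` stands in for the assembler's symbol table. It contrasts MASM, where the meaning depends on how the symbol was declared, with NASM, where only the brackets matter.

```python
def masm_mov_eax_kind(symbol_kinds: dict, operand: str) -> str:
    """Classify MASM `mov eax, <operand>` as 'load' or 'immediate'.

    symbol_kinds maps each symbol to 'equate' (declared like `name = value`)
    or 'data' (declared like `name dd ...`). MASM ignores brackets around an
    equate, so `[const]` is still an immediate.
    """
    text = operand.strip()
    if text.lower().startswith("offset "):
        return "immediate"  # OFFSET sym is always the address as an immediate
    if text.startswith("[") and text.endswith("]"):
        text = text[1:-1].strip()
    return "immediate" if symbol_kinds[text] == "equate" else "load"

def nasm_mov_eax_kind(operand: str) -> str:
    """NASM is purely syntactic: brackets mean memory, no brackets mean immediate."""
    return "load" if operand.strip().startswith("[") else "immediate"
```

For example, with `{"const": "equate", "static_var": "data"}`, MASM treats both `[const]` and `const` as immediates and both `static_var` and `[static_var]` as loads. NASM treats `[const]` as a load from address const.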