
Differences between Assembly Languages


I'm very new to assembly language, and I know there are many, many types of assembly language, such as ARM, MIPS, x86, etc. So I have some questions.

Below is more information.

Hello World in ARM

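.text
.global _start
_start:
    mov r0, #1          @ fd 1 = stdout
    ldr r1, =message    @ address of the string
    ldr r2, =len        @ length of the string
    mov r7, #4          @ syscall 4 = write (32-bit ARM Linux, EABI)
    swi 0

    mov r0, #0          @ exit status 0 (r0 still holds write's return value otherwise)
    mov r7, #1          @ syscall 1 = exit
    swi 0

.data
message:
    .ascii "hello world\n"    @ .ascii, not .asciz, so len does not count a NUL byte
len = . - message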

Hello World in NASM

section .data                        ; section to define memory
    msg db  "Hello, World!", 10      ; define string to the name 'msg', 10 is '\n' character
    len equ $ - msg                  ; assign the length of the string to the name 'len'
        
section .text                        ; section to put code
    global _start                    ; a global label to be declared for the linker (ld)
        
_start:
    mov rax, 1                       ; syscall ID number (sys_write)
    mov rdi, 1                       ; file descriptor (stdout)
    mov rsi, msg                     ; address of string to write
    mov rdx, len                     ; length of string
    syscall                          ; call kernel
        
    mov rax, 60                      ; syscall ID number (sys_exit)
    mov rdi, 0                       ; error code 0
    syscall                          ; call kernel

Hello World in GAS Syntax

.data
hello:
    .string "Hello world!\n"

.text
.globl _start
_start:
    movl $4, %eax # write(1, hello, strlen(hello))
    movl $1, %ebx
    movl $hello, %ecx
    movl $13, %edx
    int  $0x80

    movl $1, %eax # exit(0)
    movl $0, %ebx
    int  $0x80

Hello World in MIPS Assembly

        .data
msg:   .asciiz "Hello World"
    .extern foobar 4

        .text
        .globl main
main:   li $v0, 4       # syscall 4 (print_str)
        la $a0, msg     # argument: string
        syscall         # print the string
        lw $t1, foobar
        
        jr $ra          # return to caller

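I searched the internet for a lot of information on these, and this is what I found:

  • GAS and Intel assembly are the same, but the syntax is different.
  • Syntax is only different for copyright reasons.
  • ARM has its own syntax type.

So I have some questions for you: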

  1. Although the assembly languages differ, isn't the order in which the code is written different too?
  2. Is the only difference between assembly languages the syntax they use?

More information

GAS: AT&T syntax

movq    $2, %r8                 # %r8 = 2
movq    $3, %r9                 # %r9 = 3
movq    $5, %r10                # %r10 = 5
imulq   %r9, %r10               # %r10 = 3 * 5 = 15
addq    %r8, %r10               # %r10 = 2 + 15 = 17

NASM: Intel syntax

mov r8, 2               ; r8 = 2
mov r9, 3               ; r9 = 3
mov r10, 5              ; r10 = 5
imul r10, r9            ; r10 = 5 * 3 = 15 (two-operand imul; plain mul takes one operand)
add r10, r8             ; r10 = 15 + 2 = 17 (destination first)
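For simple register and immediate operands, going from AT&T to Intel notation is largely mechanical: drop the comment, drop the operand-size suffix, remove the `%` and `$` sigils and reverse the operand order. The Python sketch below is a toy (the name `att_to_intel` is made up for illustration) that assumes only register and immediate operands; real AT&T-to-Intel translation, with memory operands and other special cases, is much more involved.

```python
# Toy AT&T -> Intel converter for simple lines like the ones above.
# Assumption: only register and immediate operands, and only a few mnemonics.
SIZE_SUFFIXES = "bwlq"
KNOWN = {"mov", "add", "imul"}


def att_to_intel(line: str) -> str:
    code = line.split("#", 1)[0].strip()          # drop the trailing comment
    mnemonic, _, rest = code.partition(" ")
    # Drop the operand-size suffix (movq -> mov) for the mnemonics we know.
    if mnemonic[:-1] in KNOWN and mnemonic[-1] in SIZE_SUFFIXES:
        mnemonic = mnemonic[:-1]
    # AT&T marks registers with % and immediates with $; Intel uses neither.
    operands = [op.strip().lstrip("%$") for op in rest.split(",")]
    # AT&T writes the destination last; Intel writes it first.
    operands.reverse()
    return f"{mnemonic} {', '.join(operands)}"


for att in ["movq    $5, %r10", "imulq   %r9, %r10", "addq    %r8, %r10"]:
    print(att_to_intel(att))
```

Applied to the last three AT&T lines, this gives `mov r10, 5`, `imul r10, r9` and `add r10, r8`: the same instructions, just written destination first.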

Solution

  • Assembly language is defined by the tool, not the target. You can have 100 assemblers for an architecture that is no longer changing, say the PDP-11, and that means somewhere between 2 and 100 incompatible assembly languages, all at the whim of their authors.

    ARM, x86, RISC-V, MIPS, etc. are different architectures with incompatible machine code, so naturally the assembly language choices made by the tool authors reflect the architecture.

    The processor vendor (the creator/inventor of the architecture) will have documentation, and in some form that documentation will include the machine code as well as example assembly language and, hopefully, a description. That assembly language tends to follow the tool created by or for that processor vendor. Not all vendors have tools; RISC-V, for example, simply caused GNU ports to be made, directly or indirectly. But in the old days it was not uncommon for the vendor to have some tools, even if third-party tools came later.

    Intel vs. AT&T is not a single "syntax": there is a vast array of incompatible Intel-style assemblers as well as AT&T-style assemblers. For example,

    mov 5h,ax
    
    mov ax,5h 
    

    would each have worked on various tools over the years, but they do not necessarily work on today's tools; on a given tool, one will and the other will not. Both ideally generate the same machine code if built for the same target (the desired variant of x86).
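To make the "same machine code" point concrete, here is a toy Python sketch of two hypothetical 16-bit x86 assembler front ends, one destination-first and one destination-last, sharing a single encoder. It supports only `mov r16, imm16`, which in 16-bit mode encodes as opcode `B8+reg` followed by the immediate in little-endian order (in 32- or 64-bit mode a 0x66 operand-size prefix would also be needed). The function names and the `dest_first` flag are made up for illustration.

```python
# Two hypothetical operand orders, one encoder: MOV r16, imm16 in 16-bit mode.
REG16 = {"ax": 0, "cx": 1, "dx": 2, "bx": 3, "sp": 4, "bp": 5, "si": 6, "di": 7}


def parse_imm(text: str) -> int:
    # Accept the old "5h" hex-suffix style as well as plain decimal.
    text = text.strip().lower()
    return int(text[:-1], 16) if text.endswith("h") else int(text, 10)


def assemble_mov(line: str, dest_first: bool) -> bytes:
    mnemonic, _, rest = line.strip().partition(" ")
    if mnemonic.lower() != "mov":
        raise ValueError(f"unsupported instruction: {line!r}")
    a, b = (op.strip().lower() for op in rest.split(","))
    # The only difference between the two "dialects" is which operand is which.
    reg, imm = (a, b) if dest_first else (b, a)
    value = parse_imm(imm)
    return bytes([0xB8 + REG16[reg]]) + value.to_bytes(2, "little")


print(assemble_mov("mov ax,5h", dest_first=True).hex())   # b80500
print(assemble_mov("mov 5h,ax", dest_first=False).hex())  # b80500
```

Both calls produce the bytes `B8 05 00`; only the parser's expectation about operand order differs.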

    There is no reason why one assembler could not support

    mov r15,r14
    

    another

    mov pc,lr
    

    and another

    move pickle,banana
    

    and another

    return
    

    All of that is fine as long as they generate the same machine code (i.e. they are intended to implement the same instruction). One would hope that at least mov r15,r14 would be supported across ARM assemblers (pre-AArch64), but there is no real guarantee that it will work; each tool is its own deal. Certainly, some of those syntaxes would not attract many users compared to the others. But there is absolutely nothing wrong with them, since the job of an assembler (or at least the assumption of what an assembler does) is to take some syntax and generate machine code from it.
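A toy Python sketch of that idea, assuming 32-bit ARM (A32) encoding: every spelling of "return via the link register" is parsed into the same (Rd, Rm) pair and handed to one encoder for `MOV Rd, Rm`. The `pickle`/`banana` register names and the bare `return` pseudo-instruction are invented here purely for illustration; no real assembler is implied to accept them.

```python
# One encoder for A32 "MOV Rd, Rm", several made-up front-end spellings.
ALIASES = {"sp": 13, "lr": 14, "pc": 15,
           "banana": 14, "pickle": 15}   # pickle/banana: hypothetical names


def reg_number(name: str) -> int:
    name = name.strip().lower()
    if name in ALIASES:
        return ALIASES[name]
    if name.startswith("r") and name[1:].isdigit() and 0 <= int(name[1:]) <= 15:
        return int(name[1:])
    raise ValueError(f"unknown register: {name!r}")


def encode_mov(rd: int, rm: int) -> int:
    # cond=1110 (always), data-processing, I=0, opcode=1101 (MOV), S=0,
    # Rn=0, then Rd, then the shifter operand = Rm with no shift.
    return 0xE1A00000 | (rd << 12) | rm


def assemble(line: str) -> int:
    words = line.replace(",", " ").split()
    if words == ["return"]:                     # hypothetical pseudo-instruction
        return encode_mov(15, 14)
    if words[0] in ("mov", "move") and len(words) == 3:
        return encode_mov(reg_number(words[1]), reg_number(words[2]))
    raise ValueError(f"cannot assemble {line!r}")


for src in ["mov r15,r14", "mov pc,lr", "move pickle,banana", "return"]:
    print(f"{src:20} -> {assemble(src):08x}")
```

All four spellings encode to `e1a0f00e`, the word you see for `mov pc, lr` in ARM disassemblies.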

    x86 is one of the biggest nightmares, and not only because of Intel vs. AT&T.

    A fair number of different, incompatible assembly languages exist within the Intel style as well as within the AT&T style, without even crossing over between the two. In the Intel-style world you had Borland's TASM and Microsoft's MASM, which somewhat conformed to the Intel documentation and, as a result, to Intel's own assembler (there was at least one good shareware one out at that time too), and NASM has attempted to carry that syntax, along with some percentage of compatibility, forward. While each port of an ISA to GNU binutils is possibly done by a different person or team, in my experience the GNU assembler seems proud of its incompatibilities with other assembly languages for the same target; otherwise why would it do this so often? The first one many of us find is

    mov r0,r1 ; this is a comment
    

    but with gas for ARM it is not; there the comment character is @:

    mov r0,r1 @now this is a comment
    

    Sure, we know why, but it is not consistent across all of the targets. The mrs/msr instructions were originally not compatible with the ARM docs, and there were also whitespace rules you had to conform to; now, while still being gas, that is a bit more relaxed and easier to cope with.
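As a rough illustration of how much one character matters, this Python sketch splits a source line under two conventions. The "semicolon" rule (`;` starts a comment) is a simplification of the vendor-style tools; the "gnu-arm" rule follows the GNU as documentation for ARM, where `@` starts a comment and `;` separates statements. Strings, `#` at the start of a line and other rules are ignored, and the function name is hypothetical.

```python
# Same line, two lexical conventions (simplified; see the assumptions above).
def split_statements(line: str, dialect: str) -> list[str]:
    if dialect == "semicolon":
        parts = [line.split(";", 1)[0]]       # ';' starts a comment
    elif dialect == "gnu-arm":
        code = line.split("@", 1)[0]          # '@' starts a comment
        parts = code.split(";")               # ';' separates statements
    else:
        raise ValueError(f"unknown dialect: {dialect!r}")
    return [p.strip() for p in parts if p.strip()]


line = "mov r0,r1 ; this is a comment"
print(split_statements(line, "semicolon"))  # ['mov r0,r1']
print(split_statements(line, "gnu-arm"))    # ['mov r0,r1', 'this is a comment']
```

Under the GNU ARM rules the "comment" becomes a second statement, which the assembler would then reject as an unknown instruction.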

    Assembly language is not just the mnemonics, it is the whole deal. You can only go so far with the mnemonics: as soon as you need a label, is there a colon or not, is there some other marker or not, does it have to start in the first column, are there reserved words other than mnemonics that you can't use, etc.? All of this is specific to the tool author's choices, not the architecture. That includes pseudo-instructions, which are not assumed to port from one tool to another, such as

    ldr r0,=0x12345678
    ldr r1,=message
    pop {r0,r1}
    nop
    

    and some others (for ARM in this example). There are also some fun GNU ones that I think people use to happily confuse others:

    1:
    
       beq 1b
    
       b 1f
    
    1:
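Those numeric labels can be redefined as often as you like: `1b` means the nearest `1:` looking backward and `1f` the nearest one looking forward. Here is a small Python sketch of that resolution step. The helper is hypothetical (it is not how GNU as is implemented internally), and it rewrites each reference to the line index of its target instead of an address, to keep things short.

```python
import re

# A line consisting only of a numeric label such as "1:".
LABEL = re.compile(r"^\s*(\d+):\s*$")
# A reference such as "1b" (backward) or "1f" (forward).
REF = re.compile(r"\b(\d+)([bf])\b")


def resolve_local_labels(lines: list[str]) -> list[str]:
    # Record every line index at which each numeric label is defined.
    defs: dict[str, list[int]] = {}
    for i, line in enumerate(lines):
        m = LABEL.match(line)
        if m:
            defs.setdefault(m.group(1), []).append(i)

    def target(num: str, direction: str, here: int) -> int:
        positions = defs.get(num, [])
        if direction == "b":
            earlier = [p for p in positions if p < here]
            if earlier:
                return earlier[-1]      # nearest definition going backward
        else:
            later = [p for p in positions if p > here]
            if later:
                return later[0]         # nearest definition going forward
        raise ValueError(f"no target for {num}{direction} on line {here}")

    resolved = []
    for i, line in enumerate(lines):
        if LABEL.match(line):
            resolved.append(line)       # keep label definitions unchanged
        else:
            resolved.append(REF.sub(
                lambda m, here=i: f"line{target(m.group(1), m.group(2), here)}",
                line))
    return resolved


print(resolve_local_labels(["1:", "   beq 1b", "   b 1f", "1:"]))
```

For the four lines above, `beq 1b` resolves to the first `1:` (line 0) and `b 1f` to the second one (line 3).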
    

    The PDP-11 (LSI-11, whatever), which I mentioned above, actually falls into this category: it is currently supported by gcc and binutils. Coming from an "octal" company, almost everything about it is octal-based, down to how the instruction is broken into fields, but the GNU port is not octal; the disassemblies are in hex, which makes it harder to parse the instruction. Beyond that, there does not seem to have been any attempt to be remotely compatible with the assembly language(s) that have been in use for decades for that target (oh, yeah, PDP-11s are still a major part of our lives right now, as are 8051s, which are a massive part of our lives right now).
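To see why octal suits the PDP-11: in a double-operand instruction each addressing-mode field and each register field is 3 bits wide, i.e. exactly one octal digit. This Python sketch decodes `MOV R1,R2` (octal 010102) and prints the same word in octal and in hex; the decoder function is illustrative only.

```python
def decode_double_operand(word: int) -> dict[str, int]:
    # Each field below is 3 bits (one octal digit), except the top field,
    # which holds the byte/word flag plus the opcode (0o01 = MOV, 0o11 = MOVB).
    return {
        "opcode": (word >> 12) & 0o17,
        "src_mode": (word >> 9) & 0o7,
        "src_reg": (word >> 6) & 0o7,
        "dst_mode": (word >> 3) & 0o7,
        "dst_reg": word & 0o7,
    }


word = 0o010102  # MOV R1,R2 (mode 0 = register; source first, destination last)
print(f"octal {word:06o}  hex {word:04x}")  # octal 010102  hex 1042
print(decode_double_operand(word))
```

In octal the fields can be read straight off the digits (01 | 01 | 02); in hex, `1042` hides them.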

    GAS and Intel assembly are the same, but the syntax is different

    Definitely not, especially since "Intel assembly" has been incompatible even within itself.

    Syntax is only different for copyright reasons

    Nope, that is not the reason; it is personal choice. Why create a new assembler when a perfectly good one is already there? Because you want to change something, and now that good free ones exist, what you change can't be the price.

    ARM has its own syntax type

    ARM is a different architecture from x86 with its own history. ARM's own tools have gone through three or more generations, and then there are the GNU tools, with at least two generations of assembly language of their own. And they are quite incompatible.
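One concrete example of incompatibility between ARM's own generations: in the older "divided" syntax the condition code comes before the S (set flags) suffix (`ADDEQS`), while the Unified Assembler Language (UAL) puts S first (`ADDSEQ`). A toy Python sketch of that single rewrite, handling only a few data-processing mnemonics (the function name is made up):

```python
# Reorder <op><cond>S (pre-UAL "divided" syntax) into <op>S<cond> (UAL).
CONDITIONS = {"eq", "ne", "cs", "hs", "cc", "lo", "mi", "pl",
              "vs", "vc", "hi", "ls", "ge", "lt", "gt", "le", "al"}
OPS = ("add", "adc", "sub", "sbc", "rsb", "rsc",
       "and", "orr", "eor", "bic", "mov", "mvn")


def divided_to_ual(mnemonic: str) -> str:
    m = mnemonic.lower()
    for op in OPS:
        if not m.startswith(op):
            continue
        rest = m[len(op):]
        if rest in ("", "s") or rest in CONDITIONS:
            return m                        # no condition + S pair to reorder
        if rest.startswith("s") and rest[1:] in CONDITIONS:
            return m                        # already in UAL order
        if rest.endswith("s") and rest[:-1] in CONDITIONS:
            return op + "s" + rest[:-1]     # <op><cond>s -> <op>s<cond>
    raise ValueError(f"not handled by this sketch: {mnemonic!r}")


print(divided_to_ual("ADDEQS"))  # addseq
print(divided_to_ual("movlss"))  # movsls
```

Both spellings describe the same instruction; a tool that only knows one of them will reject the other.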

    Although the assembly languages differ, isn't the order in which the code is written different too?

    Destination on the left or right is in no way an x86-only thing. Which one feels natural probably has to do with the first assembly language you struggled through (if you even struggled, since asm is the easiest of all the programming languages). Some of the original tools for the original targets are destination first, others are destination last. The AT&T one baffles me, as most x86 machines were PCs, not Unix systems, but whatever: because they did not use the Intel documentation as a starting point, and because of the GNU port's choices, it is what it is. When I went to school we learned

    y = x + 5
    

    not

    x + 5 = y
    

    Oh, and ponder this:

    ldr r0,[r1]
    str r0,[r2]
    

    instead of

    str [r2],r0
    

    Is the only difference between assembly languages the syntax they use?

    Yes, of course. Assembly language is a programming language; the difference between Python and C is... the syntax. The difference between C and asm is the syntax, since they are different programming languages, and the difference between one assembly language and another from another tool is the syntax. I could have two languages where x = y + 3 is valid syntax in both; that doesn't mean they are or should be compatible. I can take Rust, C, C++, Pascal and even Python and make machine code for a target out of them; that does not mean the languages need to be compatible. Specifically, the difference between assembly languages for the same target is the syntax.

    When you then compare an assembly language for one target with one for another, the differences also reflect the target's instruction set architecture. There is no actual requirement for it, but x86 syntaxes overuse mov, where others have separate load and store instructions to reflect those operations. Those instruction sets could use mov as well; likewise, there is no reason not to have a load and a store in an x86 syntax.
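For example, Intel-syntax x86 uses `mov` for what a load/store architecture spells as several instructions. This toy Python classifier is a sketch that assumes memory operands are recognisable by `[` and immediates are plain integer literals; the function names are made up, and the comments name the rough ARM and MIPS counterparts.

```python
def is_integer_literal(text: str) -> bool:
    # Accepts e.g. "5", "-3", "0x10"; assembler-specific forms like "5h" are not handled.
    try:
        int(text, 0)
        return True
    except ValueError:
        return False


def classify_mov(line: str) -> str:
    mnemonic, _, rest = line.strip().partition(" ")
    if mnemonic.lower() != "mov":
        raise ValueError(f"not a mov: {line!r}")
    dst, src = (op.strip() for op in rest.split(",", 1))
    if "[" in dst:
        return "store"            # ARM: str, MIPS: sw
    if "[" in src:
        return "load"             # ARM: ldr, MIPS: lw
    if is_integer_literal(src):
        return "immediate"        # ARM: mov r0, #5, MIPS: li (pseudo-instruction)
    return "register copy"        # ARM: mov r0, r1, MIPS: move (pseudo-instruction)


for insn in ["mov eax, [ebx]", "mov [ebx], eax", "mov eax, 5", "mov eax, ebx"]:
    print(f"{insn:16} -> {classify_mov(insn)}")
```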

    Instruction sets are as alike as they are different. Once you learn one, especially if you start with a good one (PDP-11, MSP430, ARM's original Thumb and a small list of others), the next instruction set is that much easier, and eventually you can learn 20 or 30 or 100 with no problem, because they all tend to have an add, a subtract, an or, an xor, a load and a store, a push and a pop, a register-to-register move/copy and some other things. Just the flags, the number and size of the registers, and the size and shape of the immediates vary, as of course does the machine code. So basic programming using primitive operations is one hurdle for some folks; beyond that it is "just a matter of syntax".
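The "same primitives, different syntax" point can be shown with a toy Python generator that emits one register-to-register add (dst = dst + src) for a few of the assembly languages discussed here. The templates are a sketch: register names are passed in already spelled for the target, 32-bit operands are assumed for the AT&T form (`addl`), and the target keys are made-up labels.

```python
# One primitive, dst = dst + src, spelled for a few assemblers.
TEMPLATES = {
    "nasm":    "add {dst}, {src}",              # x86, Intel syntax: destination first
    "gas-x86": "addl %{src}, %{dst}",           # x86, AT&T syntax: destination last
    "arm":     "add {dst}, {dst}, {src}",       # ARM: three operands
    "mips":    "addu ${dst}, ${dst}, ${src}",   # MIPS: addu = add without overflow trap
}


def emit_add(target: str, dst: str, src: str) -> str:
    return TEMPLATES[target].format(dst=dst, src=src)


print(emit_add("nasm", "eax", "ebx"))     # add eax, ebx
print(emit_add("gas-x86", "eax", "ebx"))  # addl %ebx, %eax
print(emit_add("arm", "r0", "r1"))        # add r0, r0, r1
print(emit_add("mips", "t0", "t1"))       # addu $t0, $t0, $t1
```

The operation is identical in every case; only the mnemonic spelling, the operand order and the register sigils change.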