assembly cpu-architecture machine-code

Do assembly instructions map 1-1 to machine language?


I'm reading, in parallel, various books on computer architecture and I'm confused. Some books state that assembly instructions are just mnemonics for machine instructions, and that each assembly instruction corresponds to exactly one machine instruction. However, Tanenbaum's Structured Computer Organization puts assembly on the layer above the operating system, and seems to imply that assembly somehow uses the operating system (I haven't read the whole book yet...)

Which one is true? Are assembly instructions simply machine instructions? Can they also be system calls which are interpreted by the OS to machine instructions? Can they be something else?


Solution

  • Mostly yes, one line of assembly corresponds to one CPU instruction. But there are some caveats.

    Label definitions don't correspond to any instructions - they just mark a location in memory so that you can refer to it elsewhere. That holds even under assemblers where labels occupy separate lines.

    Data directives like db 0x90 or .byte 0x90 manually assemble bytes into the output file. Using such directives in a region that will be reached by execution lets you manually encode instructions, or create bugs if you did that by accident.
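    To make the label and data-directive caveats concrete, here is a minimal sketch of a toy assembler. The function name assemble and the tiny opcode table are made up for illustration; the three byte values are the real single-byte x86 encodings of nop, ret and int3, but nothing else about a real assembler is modelled.

```python
# Toy assembler for a hypothetical three-mnemonic subset of x86.
# Only the single-byte encodings below are real; the rest is illustrative.
OPCODES = {"nop": b"\x90", "ret": b"\xc3", "int3": b"\xcc"}

def assemble(lines):
    """Return (machine_code, labels) for a list of source lines."""
    out = bytearray()
    labels = {}
    for line in lines:
        line = line.split(";", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if line.endswith(":"):
            # Label definition: remember the current offset, emit nothing.
            labels[line[:-1]] = len(out)
        elif line.startswith("db "):
            # Data directive: copy the listed byte values straight to the output.
            out += bytes(int(tok.strip(), 0) for tok in line[3:].split(","))
        else:
            # Instruction: each mnemonic here has one fixed encoding.
            out += OPCODES[line]
    return bytes(out), labels

code, labels = assemble(["start:", "nop", "db 0x90", "end:", "ret"])
```

    In this sketch the two label lines add no bytes, and nop and db 0x90 produce the same byte, so the CPU cannot tell which one the programmer wrote.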

    Assemblers often support directives - lines that provide some guidance to the assembler itself. Those don't correspond to CPU instructions, and they can sometimes be mistaken for genuine commands.

    Some assemblers support macros - think inline functions.


    Some RISC assemblers, notably MIPS, have a notion of combined instructions - one line of assembly corresponds to a handful of instructions. (These are called pseudo-instructions.) Those are like built-in macros, provided by the assembler.

    Depending on the operand, though, a pseudo-instruction might need only one machine instruction. For example, li $t0, 1 can assemble to ori $t0, $zero, 1, but li $t0, 0x55555555 needs both lui and ori (or addiu).
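    The operand-dependent expansion of li can be sketched as follows. This is an assumption about how an assembler might choose, not a description of any particular assembler; the function name expand_li is hypothetical and the instructions are returned as text rather than encoded.

```python
def expand_li(reg, imm):
    """Expand `li reg, imm` into real MIPS32 instructions (as text)."""
    if not -0x80000000 <= imm <= 0xFFFFFFFF:
        raise ValueError("immediate does not fit in 32 bits")
    value = imm & 0xFFFFFFFF                     # the constant as a 32-bit pattern
    signed = value - (1 << 32) if value & 0x80000000 else value
    hi, lo = value >> 16, value & 0xFFFF
    if hi == 0:
        # Fits in 16 unsigned bits: ori zero-extends its immediate.
        return [f"ori {reg}, $zero, {lo:#x}"]
    if -0x8000 <= signed < 0:
        # Small negative value: addiu sign-extends its immediate.
        return [f"addiu {reg}, $zero, {signed}"]
    if lo == 0:
        # Low half is zero: lui alone sets the upper 16 bits.
        return [f"lui {reg}, {hi:#x}"]
    # General case: upper half with lui, then OR in the lower half.
    return [f"lui {reg}, {hi:#x}", f"ori {reg}, {reg}, {lo:#x}"]
```

    With this sketch, expand_li("$t0", 1) yields a single ori, while expand_li("$t0", 0x55555555) yields a lui followed by an ori.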

    On ARM, ldr r0, =0x5555 can assemble to either a PC-relative load from a literal pool or, when assembling for an ARM CPU that supports it, a movw with a 16-bit immediate. You wouldn't see ldr r0, =0x5555 in disassembly; you'd see whichever machine instruction(s) the assembler picked to implement it. (Editor's note: I'm not sure if any ARM assemblers will ever pick 2 instructions (movw + movt) for a wider constant for ldr reg, =value)
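    A rough sketch of that choice for 32-bit ARM is below. The function names and the has_movw flag are hypothetical, and the order of checks is an assumption. The mov/mvn cases are not mentioned above; they are included because ARM data-processing instructions can encode an 8-bit constant rotated right by an even amount, so an assembler can use them when the value (or its bitwise inverse) fits.

```python
def _is_modified_immediate(value):
    """True if value is an 8-bit constant rotated right by an even amount."""
    for rot in range(0, 32, 2):
        # Undo a right-rotation by rotating left the same amount.
        rotated = ((value << rot) | (value >> (32 - rot))) & 0xFFFFFFFF
        if rotated < 0x100:
            return True
    return False

def choose_ldr_constant(value, has_movw):
    """Pick how `ldr rX, =value` might be implemented (a sketch only)."""
    value &= 0xFFFFFFFF
    if _is_modified_immediate(value):
        return "mov"
    if _is_modified_immediate(~value & 0xFFFFFFFF):
        return "mvn"
    if has_movw and value <= 0xFFFF:
        return "movw"
    # Fallback: place the constant in a literal pool, load it PC-relative.
    return "literal pool"
```

    Under these assumptions, 0x5555 becomes a movw when has_movw is true and a literal-pool load otherwise.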


    Do you count a procedure call as "multiple instructions per line"? There's CALL on Intel, BL on ARM. As far as the CPU docs are concerned, those are single instructions. They're just branches that also store a return address somewhere.

    But if you're debugging and stepping over function calls instead of into them, they invoke a procedure/function/subroutine that may contain arbitrarily many instructions. Same goes for syscalls: an instruction like syscall or svc #0 is basically a function call into the kernel.


    Assembly programs can definitely consume services from the operating system. How do you think regular programs do that? Whatever a high-level program can do, assembly can do too. The specifics vary, though.
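    As one illustration of that last point, here is a high-level program consuming an OS service. Python's os.write hands its arguments to the kernel's write system call; the helper name write_all is made up for this sketch. The comments describe the x86-64 Linux convention as an example of what the assembly version would do; other platforms use different registers and call numbers.

```python
import os

def write_all(fd, data):
    """Write every byte of `data` to `fd` using the OS write service."""
    view = memoryview(data)
    while view:
        # os.write is a thin wrapper over the kernel's write system call.
        # On x86-64 Linux, assembly would put 1 (the write call number) in
        # rax, the arguments in rdi, rsi and rdx, then execute `syscall`.
        written = os.write(fd, view)
        view = view[written:]      # retry with whatever was not written yet
    return len(data)
```

    Calling write_all(1, b"hello\n") sends the bytes to standard output. An assembly program gets the same result by issuing the system call itself instead of going through a library wrapper.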