assembly

How does a PC know what is code and what is data in assembly?


I am learning MIPS assembly (a version slightly different from the one used at my university).

When I translate my code to binary, how does the computer know which part of the binary file is code and which part is data? To give an example of what I am asking: in C, I learned that strings end in "\0". It is a convention that tells us where a string ends and something else begins. In order to load code and data into different parts of memory, I suppose the computer has to know somehow where data ends and where code begins in my binary file.

It is true that there are directives in assembly like ".data" or ".text". However, as far as I know, those directives are removed when I assemble (they are not in the binary file); in theory they are only for the assembler. So they do not answer my question: those directives belong to the assembly language, not to the binary file.


Solution

  • Tommaso Bianchi's answer is very detailed, but maybe a bit too technical for what you expected. So, let me explain in a simpler way.

    1. Assembler and Binary Structure
      When you assemble your code, the assembler converts it into a binary file. This binary is not just a random bunch of bytes; it has a structured format. The .text section (instructions) and the .data section (data, more precisely static variables) are placed in different parts of the binary, and the file records where each one is. Actually, a program called the linker does some extra processing to turn the assembler's output into the final file, but I’ll skip that since it’s not essential here. In the end, you get an executable file, like a .exe on Windows.

    2. How Does the Computer Know?
      When your OS runs the executable, a system component called the loader reads the binary format. Since the binary includes section info, the loader can properly separate instructions (.text) from data (.data, static variables) and put them in the right memory locations.
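
To make the two steps above concrete, here is a minimal sketch in Python. The file layout is a made-up toy format (the `TOY1` magic, the 12-byte header, and the two load addresses are all assumptions for illustration, not ELF or PE), but it shows the idea: the tool that writes the file records the sizes, and the loader reads them back to know which bytes are code and which are data.

```python
import struct

# Hypothetical toy format (NOT ELF or PE): a 12-byte header followed by
# the raw .text bytes and then the raw .data bytes.
HEADER = struct.Struct("<4sII")   # magic, text_size, data_size
MAGIC = b"TOY1"

TEXT_BASE = 0x0000   # assumed load address for code
DATA_BASE = 0x1000   # assumed load address for data


def build_image(text: bytes, data: bytes) -> bytes:
    """Roughly what the assembler/linker emits: header + text + data."""
    return HEADER.pack(MAGIC, len(text), len(data)) + text + data


def load(image: bytes, memory_size: int = 0x2000) -> bytearray:
    """Roughly what a loader does: read the header, copy each part to its place."""
    magic, text_size, data_size = HEADER.unpack_from(image, 0)
    if magic != MAGIC:
        raise ValueError("not a TOY1 image")
    text_start = HEADER.size
    data_start = text_start + text_size
    memory = bytearray(memory_size)
    memory[TEXT_BASE:TEXT_BASE + text_size] = image[text_start:data_start]
    memory[DATA_BASE:DATA_BASE + data_size] = image[data_start:data_start + data_size]
    return memory


text = bytes.fromhex("2008000a")   # one MIPS instruction: addi $t0, $zero, 10
data = b"Hi\x00"                   # a static C-style string

memory = load(build_image(text, data))
print(memory[TEXT_BASE:TEXT_BASE + 4].hex())   # the code bytes
print(bytes(memory[DATA_BASE:DATA_BASE + 3]))  # the data bytes
```

Note that the `.text` and `.data` names themselves never need to appear in the file; only the sizes do.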

    Does this make sense? If you want to go deeper, I recommend the book Linkers & Loaders by John R. Levine.

    Hope this helps!

    P.S. Answer to the questioner's comment below.

    How does the computer know, in the binary file, where one part ends and another begins?

    That’s a very fundamental topic in computer science. There are generally two common approaches:

    1. Storing the length in a header
      One way is to store the size of each section at the beginning of the file (in a header). For example, if the .text section is 1000 bytes and the .data section is 200 bytes, the binary can store this information at the start. Then, when reading the file, the system knows exactly where each section starts and ends.

    2. Using a special marker at the end
      Another method is placing a special byte sequence at the end of each section, something that never appears inside the data itself. For example, in the C programming language, a string like "Hello World!" is stored with an extra null byte (0x00) at the end. Since C strings never contain 0x00 inside them, this marks the end clearly.
      But for things like .text sections in assembly, it’s hard to guarantee that a specific byte never appears, so this method is not practical.
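
A small Python sketch of why the marker approach breaks down for machine code, while the length approach does not. The helper names `read_until_zero` and `read_with_length` are hypothetical, chosen just for this illustration. The example relies on the fact that the MIPS `nop` instruction is encoded as the all-zero word, so zero bytes appear inside perfectly normal code.

```python
import struct

NOP = bytes(4)                      # MIPS nop is the all-zero word 0x00000000
ADDI = bytes.fromhex("2008000a")    # addi $t0, $zero, 10
code = ADDI + NOP + ADDI            # 12 bytes of machine code


def read_until_zero(blob: bytes) -> bytes:
    """Marker approach: stop at the first 0x00 byte, like a C string."""
    end = blob.find(b"\x00")
    return blob if end == -1 else blob[:end]


def read_with_length(blob: bytes) -> bytes:
    """Header approach: a 4-byte length comes first, then that many bytes."""
    (size,) = struct.unpack_from("<I", blob, 0)
    return blob[4:4 + size]


via_marker = read_until_zero(code + b"\x00")
via_length = read_with_length(struct.pack("<I", len(code)) + code)

print(len(via_marker))      # 2: cut off inside the very first instruction
print(via_length == code)   # True: all 12 bytes recovered
```

The marker version stops after two bytes because the `addi` encoding itself already contains a zero byte.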

    Executable formats like ELF (Linux) and PE (Windows) use the first approach: headers in the file record the offset and size of each part. If you want to study this more deeply, you can check books about executable formats like ELF, but honestly, this level of detail is not necessary for most programmers.
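
If you are curious what this looks like in a real format, here is a hedged sketch that reads the fixed-size header at the start of an ELF file using only Python's `struct` module. The function name `read_elf_header` is my own, and the bytes in `fake` are a hand-built 32-bit big-endian header with made-up offsets and counts, used only so the example runs without a real file. The field order follows the ELF header definition.

```python
import struct

FIELD_NAMES = (
    "e_type", "e_machine", "e_version", "e_entry", "e_phoff", "e_shoff",
    "e_flags", "e_ehsize", "e_phentsize", "e_phnum",
    "e_shentsize", "e_shnum", "e_shstrndx",
)


def read_elf_header(blob: bytes) -> dict:
    """Decode the ELF header fields that follow the 16-byte e_ident."""
    if blob[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    is_64 = blob[4] == 2                    # EI_CLASS: 1 = 32-bit, 2 = 64-bit
    endian = "<" if blob[5] == 1 else ">"   # EI_DATA: 1 = little, 2 = big
    addr = "Q" if is_64 else "I"            # address/offset width
    fmt = f"{endian}HHI{addr}{addr}{addr}IHHHHHH"
    return dict(zip(FIELD_NAMES, struct.unpack_from(fmt, blob, 16)))


# Hand-built example header (values are invented for illustration):
# 32-bit, big-endian, e_machine = 8 (MIPS).
e_ident = b"\x7fELF" + bytes([1, 2, 1]) + bytes(9)
fake = e_ident + struct.pack(
    ">HHIIIIIHHHHHH",
    2, 8, 1, 0x400000, 52, 1000, 0, 52, 32, 2, 40, 5, 4,
)

hdr = read_elf_header(fake)
print(hdr["e_phoff"], hdr["e_phnum"])   # where the program header table is, and its entry count
print(hdr["e_shoff"], hdr["e_shnum"])   # where the section header table is, and its entry count
```

To try it on a real executable, pass the result of `open(path, "rb").read()` instead of `fake`. The tables that `e_phoff` and `e_shoff` point to are where the per-part offsets and sizes are stored.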