arm, cpu-architecture, instruction-set, program-counter, risc

How many bits do ARM instructions have?


When working with ARM, we commonly understand that memory is byte-addressable, i.e. each address holds 8 bits of data (I hope this assumption is correct).

How does the program counter increment? Does it increment by 4 every time, implying that instructions are all 32 bits? I also read somewhere about the Thumb instruction set, with some mention of 16-bit instructions, which would imply that the program counter should increment by 2 each time.

So, the other day I was looking at a disassembly and saw that the program counter does not always increment uniformly. This is confusing, because I always thought that for RISC processors (ARM in this case) all instructions are the same width.

How would the program counter know how much to increment by each time? By looking at the opcode of the previous instruction? That seems complicated. I always thought the program counter was just a simple counter incrementing by some fixed value (obviously my underlying assumptions were wrong).


Solution

  • It sounds like you are trying to overcomplicate this. Also note that you can download the instruction set documentation yourself.

    ARM is a bit generic (as are MIPS, RISC-V and so on); ARM has a number of instruction sets. If we think back to the traditional Acorn ARM days, it was a fixed-length 32-bit instruction set, so the program counter moves four bytes per instruction. Starting with ARMv4T you also have Thumb mode, which at that time consisted of fixed-length 16-bit instructions: two bytes per instruction in Thumb mode, four bytes in ARM mode.
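
    As a minimal sketch in C (the names here are mine, not from any ARM document), that pre-Thumb-2 fetch step depends only on the current mode, never on the instruction itself:

        #include <stdint.h>

        enum mode { MODE_ARM, MODE_THUMB };

        /* ARMv4T-style fixed step: 4 bytes in ARM mode, 2 in Thumb mode. */
        static uint32_t next_fetch_pc(uint32_t pc, enum mode m)
        {
            return pc + (m == MODE_ARM ? 4u : 2u);
        }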

    The Cortex-Ms, with ARMv6-M and (initially) ARMv7-M, are fixed in Thumb mode; there is no ARM mode. The all-Thumb-variant instructions are again 16 bits, so two bytes each. But once you start to decode an instruction there are Thumb-2 extensions, made from formerly invalid Thumb instructions, so you basically need to fetch two more bytes, for a total of one 32-bit instruction. That makes it variable length, like x86 and so many others (on x86 you fetch a byte, decode it, maybe fetch another byte, decode that, and only then know how many more bytes in total you need to fetch).
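
    The length rule itself is cheap, though. As a sketch (the function name is hypothetical; the encoding rule is the one in the ARM documentation): if bits [15:11] of the first halfword are 0b11101, 0b11110 or 0b11111, a second halfword follows; otherwise the instruction is 16 bits.

        #include <stdint.h>

        /* Size in bytes of a Thumb/Thumb-2 instruction, given its first
           halfword. 0x1D/0x1E/0x1F in the top five bits mark the
           32-bit encodings. */
        static unsigned thumb_insn_size(uint16_t first_halfword)
        {
            unsigned top5 = first_halfword >> 11;
            return (top5 == 0x1D || top5 == 0x1E || top5 == 0x1F) ? 4 : 2;
        }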

    I assume folks do not know this, but MIPS has a 16-bit mode as well in some of their products; just like ARM, you switch modes and then switch back.

    ARMv7 (the full size, not Cortex-M) also supports a list of Thumb-2 instructions, so you have normal 32-bit ARM instructions, you have 16-bit Thumb instructions, and you have Thumb-2 extensions that add another 16 bits to specific instructions in Thumb mode.

    AArch64, which came with ARMv8, is a completely new instruction set, incompatible with the former (called AArch32 in this context). Its instructions are fixed at 32 bits, so four bytes each.

    Jazelle is a Java thing; Java compiles down to bytecode, so you decode a byte and go from there.

    RISC-V is mostly 32-bit instructions, but there is a compressed extension and those are 16-bit instructions. In RISC-V, 32- and 16-bit instructions can coexist back to back; you do not switch modes. The lower bits of each instruction are used to determine its size. You can easily get the RISC-V docs and read this for yourself. In RV32I, for example, the instructions are aligned, but if you add compressed (RV32IC) then, obviously, the 32-bit instructions can be unaligned. It is up to whoever implements this to choose whether to fetch 16 bits at a time all the time, or 32 at a time and do extra work when unlucky...
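
    A sketch of that rule, assuming only RV32I plus the C extension (the spec reserves longer encodings, which this ignores):

        #include <stdint.h>

        /* Size in bytes of a RISC-V instruction from its low halfword:
           if bits [1:0] are not 0b11 it is a 16-bit compressed
           instruction, otherwise a 32-bit one. */
        static unsigned rv_insn_size(uint16_t low_halfword)
        {
            return ((low_halfword & 0x3u) == 0x3u) ? 4 : 2;
        }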

    I cannot imagine any modern (implementation of a) processor would simply move the PC one byte at a time. That is great for textbooks, the 6502, 8051, Z80, x86, and homework assignments/semester projects, but it would be painfully inefficient, and the processors you use would run significantly slower. Memory is not even implemented as 8-bit bytes: your internal SRAMs (think caches) are not 8 bits wide; they are going to be multiples of 32 or 64 bits wide, or 32+parity or 32+ECC, depending on the design. If you want to write a byte, the controller has to read the 32-bit value, modify 8 of those bits, then write it back. With all the overhead you cannot see this performance hit on an x86, but you can see it on ARMs and other high-performance processors. On an x86 your cache lines and cache widths are pretty big, the fetches are big, and there are stages that decode this variable-length instruction set.
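
    A sketch of that read-modify-write, assuming a 32-bit-wide RAM (the array and the addressing here are invented for illustration):

        #include <stdint.h>

        static uint32_t ram[1024];          /* 32 bits per location */

        /* A one-byte store against a 32-bit-wide memory: read the whole
           word, splice in the byte, write the whole word back. */
        static void store_byte(uint32_t addr, uint8_t value)
        {
            uint32_t word  = ram[addr >> 2];        /* read 32 bits   */
            unsigned shift = (addr & 3u) * 8u;      /* pick byte lane */
            word &= ~(0xFFu << shift);              /* clear old byte */
            word |= (uint32_t)value << shift;       /* merge new byte */
            ram[addr >> 2] = word;                  /* write back     */
        }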

    We can assume that the ARMv1 may really have had one actual program counter used for both fetching and execution; by the time you get to execution the program counter is two instructions ahead, and the instruction set is designed around that. Just as we assume the very first MIPS pipeline keeps going and cannot stop on a branch, so you have the branch shadow (delay slot) that has to be executed. No one should assume that implementations of ARM processors today have one program counter used for both fetching and execution. You can write an emulator in a weekend, and a one-instruction-at-a-time emulator would likely have a "program counter" variable that you use to fetch the next instruction; for execution you do the math, based on mode, for what the program counter would read as during execution. You would possibly also compute the conditional branch destination, which is another program counter. At some point in the execution of a conditional branch you have two possible next addresses: the address of the next instruction linearly, and the address of the branch destination. Before you fetch the next instruction, you pick one.
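
    A sketch of that emulator structure for ARM (A32) mode; fetch32(), is_branch(), condition_passes() and branch_offset() are hypothetical helpers, not any real emulator's API:

        #include <stdint.h>
        #include <stdbool.h>

        extern uint32_t fetch32(uint32_t addr);
        extern bool     is_branch(uint32_t insn);
        extern bool     condition_passes(uint32_t insn);
        extern int32_t  branch_offset(uint32_t insn);

        void run(uint32_t pc)
        {
            for (;;) {
                uint32_t insn       = fetch32(pc);
                uint32_t visible_pc = pc + 8;  /* what r15 reads as during execution */
                uint32_t next_pc    = pc + 4;  /* candidate 1: next instruction linearly */
                if (is_branch(insn) && condition_passes(insn))
                    next_pc = visible_pc + branch_offset(insn);  /* candidate 2 */
                /* ... execute the rest of the instruction here ... */
                pc = next_pc;                  /* pick one, then fetch again */
            }
        }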

    You then need to think about prefetching and branch prediction in all their forms, adding more "program counters" that are used to fetch instructions at the same time.

    Do the same for any instruction set.

    RISC/CISC does not matter here. For a specific XYZ instruction set, there are the rules for that instruction set, and then for each implementation the author chooses how to implement them. How many things called a program counter, or that function like a program counter, there are is up to that author/implementation.

    Look at x86 and how many different implementations have happened over the years. There was a period when they had two teams that would leapfrog, and you could see that the designs from one team would sometimes resemble a prior design from that same team, but would not necessarily resemble ones from the other team (in performance; clearly they would all execute the same instruction set).

    In short, this is one of those cases where you move from the textbook to the real world (the textbook five-stage pipeline is another one).

    Registers like r0 in MIPS/RISC-V, and the program counter in any processor where you can access the program counter directly: without seeing the implementation we do not know whether these actually exist in the register file (if it is even implemented that way) or are faked through an if-then-else. You have to do extra work either way: on a write, an if-then-else decides whether the register file actually gets the value; on a read, if the register is the pc then fake it, else read the file.
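
    A sketch of the if-then-else version (names invented; the MIPS/RISC-V r0 case has the same shape, returning zero instead):

        #include <stdint.h>

        struct cpu {
            uint32_t regs[16];
            uint32_t pc;        /* maintained by the fetch logic */
        };

        /* Reads of r15 never touch the array; they are synthesized from
           the fetch-side pc (ARM-mode execution-time value is pc + 8). */
        static uint32_t read_reg(const struct cpu *c, unsigned n)
        {
            if (n == 15)
                return c->pc + 8;
            return c->regs[n];
        }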