x86 cpu-architecture riscv instruction-set program-counter

x86 Program Counter abstracted from microarchitecture?


I'm reading the book The RISC-V Reader: An Open Architecture Atlas. To explain the isolation of an ISA (Instruction Set Architecture) from a particular implementation (i.e., microarchitecture), the authors write:

The temptation for an architect is to include instructions in an ISA that helps performance or cost of one implementation at a particular time, but burden different or future implementations.

As far as I understand, this means that an ISA should ideally refrain from exposing the details of any particular microarchitecture that implements it.


Keeping the quote above in mind: in the RISC-V ISA, the program counter (pc) points to the instruction currently being executed. In the x86 ISA, on the other hand, the program counter (eip) does not contain the address of the instruction currently being executed, but the address of the instruction that follows it.

Is the x86 Program Counter abstracted away from the microarchitecture?


Solution

  • I'm going to answer this in terms of MIPS instead of x86, because (1) MIPS and x86 are similar in this area, and (2) RISC-V was developed by Patterson et al. after decades of experience with MIPS.  I feel these statements from their books are best understood through this comparison, because x86 and MIPS both encode branch offsets relative to the end of the instruction (pc+4 in MIPS).

    In both MIPS and x86, PC-relative addressing was found only in branches in early ISA versions. Later revisions added PC-relative address calculation (e.g. MIPS auipc, or x86-64's RIP-relative addressing mode for LEA and loads/stores). These are all consistent with each other: the offset is encoded relative to the byte just past the end of the instruction (i.e. the start of the next instruction). In RISC-V, as you note, the encoded branch offset (and that of auipc, etc.) is instead relative to the start of the instruction.
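
    As a minimal sketch of the two conventions described above, the target of a PC-relative branch could be computed as follows. The function names are illustrative only (they come from no manual), and the 32-bit address mask is an assumption made for simplicity.

```python
MASK32 = 0xFFFFFFFF  # assumption: 32-bit addresses for this sketch

def x86_target(insn_addr, insn_len, rel):
    """x86: the displacement is relative to the end of the instruction."""
    return (insn_addr + insn_len + rel) & MASK32

def mips_target(insn_addr, offset_words):
    """Classic MIPS branch: a word offset relative to pc+4."""
    return (insn_addr + 4 + (offset_words << 2)) & MASK32

def riscv_target(insn_addr, imm):
    """RISC-V: the immediate is relative to the start of the instruction."""
    return (insn_addr + imm) & MASK32

# A branch-to-self needs a different encoded offset under each convention.
print(hex(x86_target(0x1000, 2, -2)))  # jmp short to itself (bytes EB FE)
print(hex(mips_target(0x1000, -1)))    # beq $0, $0 with word offset -1
print(hex(riscv_target(0x1000, 0)))    # jal x0 with immediate 0
```

    All three calls print 0x1000: the same branch-to-self is encoded as -2 bytes on x86, -1 word on MIPS, and 0 on RISC-V.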

    The value of the RISC-V choice is that it removes an adder from certain datapaths. One of these datapaths can sometimes be on the critical path, so for some implementations this minor shortening of the datapath allows a higher clock rate.

    (RISC-V, of course, still has to produce pc+4 for pc-next and for the return address of call instructions, but that is much less likely to be on the critical path.  Note that neither of the diagrams below shows the capture of pc+4 as a return address.)
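
    To make the adder argument concrete, here is a toy model (not a description of any real implementation) that counts how many adders sit in series on the way to the branch target. The Signal class and the function names are hypothetical, and a real design may well hide the extra adder, for example by carrying pc+4 along in a pipeline register.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Signal:
    value: int
    adders: int = 0  # adders in series needed to produce this value

def add(a, b):
    """Model one hardware adder: the result is one adder deeper than its inputs."""
    return Signal((a.value + b.value) & 0xFFFFFFFF, max(a.adders, b.adders) + 1)

def end_relative(pc, offset):
    """MIPS/x86 style: the target is computed from pc-next."""
    pc_next = add(pc, Signal(4))
    target = add(pc_next, offset)  # waits for the +4 adder
    return pc_next, target

def start_relative(pc, offset):
    """RISC-V style: the target is computed from pc directly."""
    pc_next = add(pc, Signal(4))   # still needed for sequential flow and calls
    target = add(pc, offset)       # bypasses the +4 adder
    return pc_next, target

pc, offset = Signal(0x1000), Signal(0x20)
print(end_relative(pc, offset)[1].adders)    # 2
print(start_relative(pc, offset)[1].adders)  # 1
```

    In this model both styles still pay for one adder to form pc-next; only the branch-target path differs.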


    Let's compare hardware block diagrams:

    [Figure: MIPS datapath (simplified)]


    [Figure: RISC V datapath (simplified)]

    On the RISC V datapath diagram, you can see that the line tagged #5 (in red, just above the control oval) bypasses the adder (#4, which adds 4 to the pc to form pc-next).


    Attribution for diagrams


    Why did x86 / MIPS make that different choice back in their initial versions?

    Of course, I can't say for sure.  It looks to me like there was a choice to be made and it simply didn't matter for the earliest implementations, so the designers were probably not even aware of the potential issue.  Almost every instruction needs to compute the address of the next instruction anyway, so this probably seemed like the logical choice.

    At best, they might have saved a few wires, as pc-next is indeed required by other instructions (e.g. call) and pc+0 is not necessarily otherwise needed.

    An examination of prior processors might show that this was just the way things were done back then, so it may have been a carryover of existing methods rather than a deliberate design choice.

    8086 is not pipelined (other than the instruction prefetch buffer) and variable-length decoding has already found the end of an instruction before it starts to execute.

    With years of hindsight, this datapath issue is now addressed in RISC V.

    I doubt the original designers made as conscious a decision about this as was made, for example, about branch delay slots (MIPS).


    As per the discussion in comments, 8086 may not have had any exceptions that push the instruction start address. Unlike on later x86 models, divide exceptions pushed the address of the instruction after div/idiv. And on 8086, interrupt-resume after cs rep movsb (or another string instruction) pushed the address of the last prefix, not the start of the whole instruction including multiple prefixes. This "bug" is documented in Intel's 8086 manual (scanned PDF). So it's quite possible that 8086 really didn't record the instruction start address or length, only the address where decoding finished before execution started. This was fixed by the 286 at the latest (maybe the 186), but it applies to all 8086 / 8088 CPUs.
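
    The string-instruction case can be sketched as follows. This only models the behaviour described in the paragraph above; it is not taken from Intel documentation, and the function name resume_address is hypothetical. The byte values are the standard 8086 prefix encodings (2E is the cs override, F3 is rep, A4 is movsb).

```python
# 8086 prefix bytes: ES, CS, SS, DS overrides, LOCK, REPNE, REP
PREFIXES = {0x26, 0x2E, 0x36, 0x3E, 0xF0, 0xF2, 0xF3}

def resume_address(start, code, is_8086):
    """Address pushed when an interrupt arrives in the middle of a string instruction."""
    n = 0
    while n < len(code) and code[n] in PREFIXES:
        n += 1  # count the leading prefix bytes
    if is_8086 and n > 0:
        return start + n - 1  # address of the last prefix only
    return start              # later CPUs: start of the whole instruction

code = bytes([0x2E, 0xF3, 0xA4])  # cs rep movsb
print(hex(resume_address(0x100, code, is_8086=True)))   # 0x101
print(hex(resume_address(0x100, code, is_8086=False)))  # 0x100
```

    Resuming at 0x101 re-executes only F3 A4 (rep movsb), so in this model the cs override is lost after the interrupt.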

    MIPS had virtual memory from the start, so it did need to be able to record the address of a faulting instruction so that the instruction could be rerun after exception-return. Software TLB-miss handling also required re-running a faulting instruction. But exceptions are slow and flush the pipeline anyway, and they aren't detected until well after fetch, so presumably some calculation would be needed regardless.