I've recently been looking into a concept for a CPU architecture called the Mill.
The Mill (though it may be vaporware) uses metadata for various things in the CPU. For example, a software-speculative load produces a value tagged as Not a Result (NaR). If a later instruction tries to store that result non-speculatively, the hardware detects it and faults.
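To make the NaR idea concrete, here is a minimal Python sketch of "deferred fault via metadata". It assumes a simplified model, not the Mill's real ISA: `Operand`, `speculative_load`, `add`, `store` and `NaRStoreFault` are hypothetical names made up for illustration. A speculative load from a bad address yields a NaR-tagged operand instead of faulting, NaR propagates through arithmetic, and only the store actually faults.

```python
# Toy model of Mill-style NaR ("Not a Result") metadata.
# All names are hypothetical; this is a sketch of the concept, not the Mill's ISA.
from dataclasses import dataclass


@dataclass(frozen=True)
class Operand:
    value: int = 0
    nar: bool = False  # metadata bit: True means "Not a Result"


class NaRStoreFault(Exception):
    """Raised when a NaR reaches a non-speculative store."""


def speculative_load(memory: dict[int, int], addr: int) -> Operand:
    # Instead of faulting on a bad address, produce a NaR-tagged operand.
    if addr not in memory:
        return Operand(nar=True)
    return Operand(memory[addr])


def add(a: Operand, b: Operand) -> Operand:
    # NaR propagates through ordinary arithmetic instead of faulting.
    if a.nar or b.nar:
        return Operand(nar=True)
    return Operand(a.value + b.value)


def store(memory: dict[int, int], addr: int, x: Operand) -> None:
    # The store is where the deferred fault finally happens.
    if x.nar:
        raise NaRStoreFault(f"store of NaR to address {addr:#x}")
    memory[addr] = x.value


mem = {0x10: 5}
good = add(speculative_load(mem, 0x10), Operand(1))
store(mem, 0x20, good)                               # fine: mem[0x20] == 6
bad = add(speculative_load(mem, 0x99), Operand(1))   # NaR, but no fault yet
try:
    store(mem, 0x30, bad)
except NaRStoreFault as e:
    print("fault:", e)  # prints: fault: store of NaR to address 0x30
```

The point of the design is that the compiler can hoist a load above the branch that guards it; if the branch goes the other way, the NaR is simply never stored and nothing faults.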
I was wondering if any other CPUs are similar in the sense of using metadata in the architecture.
A few random examples I know of, certainly not an exhaustive list. IDK if there are any that use metadata for all the things the Mill does. Some of what the Mill does is unique, but some of the ideas have appeared in similar forms in other ISAs.
Yes, IA-64 Itanium also had not-a-thing (NaT) load results that would fault if you used them non-speculatively (e.g. stored them), for the same software-speculation reason as the Mill. Its architects described it as an EPIC ISA (EPIC = Explicitly Parallel Instruction Computing, as opposed to CISC or RISC; it's also a VLIW). From Wikipedia:
The architecture implements a large number of registers:
128 general integer registers, which are 64-bit plus one trap bit ("NaT", which stands for "not a thing") used for speculative execution. 32 of these are static, the other 96 are stacked using variably-sized register windows, or rotating for pipelined loops. gr0 always reads 0.
128 floating point registers. The floating point registers are 82 bits long to preserve precision for intermediate results. Instead of a dedicated "NaT" trap bit like the integer registers, floating point registers have a trap value called "NaTVal" ("Not a Thing Value"), similar to (but distinct from) NaN. These also have 32 static registers and 96 windowed or rotating registers. fr0 always reads +0.0, and fr1 always reads +1.0.
So for integer registers there truly is separate metadata (the NaT bit), while for FP the metadata is encoded in-band (the NaTVal value).
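The difference between out-of-band and in-band metadata can be sketched in a few lines of Python. This is a hypothetical illustration loosely modeled on the Itanium description above: `IntReg`, `FpReg` and `NATVAL` are made-up names, the real NaTVal bit pattern is not modeled (a sentinel object stands in for it), and what happens to the integer value bits when NaT is set is not modeled either (this sketch just leaves them alone).

```python
# Two ways to attach "not a thing" metadata to a register.
# Hypothetical sketch: names and representation are illustrative, not real encodings.
from dataclasses import dataclass


@dataclass
class IntReg:
    """Out-of-band: 64 data bits plus a separate 1-bit NaT flag (65 bits of state)."""
    value: int = 0
    nat: bool = False

    def set_nat(self) -> None:
        self.nat = True  # in this sketch the value bits are untouched; the flag carries the meaning

    def is_nat(self) -> bool:
        return self.nat


# In-band: reserve one otherwise-unused encoding of the register's own format.
# Placeholder sentinel; the real NaTVal bit pattern is not modeled here.
NATVAL = object()


@dataclass
class FpReg:
    """In-band: no extra bit; 'not a thing' is just a special value stored in the register."""
    value: object = 0.0

    def set_nat(self) -> None:
        self.value = NATVAL  # the special value replaces the data

    def is_nat(self) -> bool:
        return self.value is NATVAL


r = IntReg(42)
r.set_nat()   # r.is_nat() is True, r.value is still 42
f = FpReg(1.5)
f.set_nat()   # f.is_nat() is True, the 1.5 is gone
```

The trade-off: every 64-bit pattern is a legitimate integer, so there is no spare encoding to steal and an extra bit is needed. An FP format has spare encodings, so one can be given away at no storage cost, at the price of that pattern no longer meaning a number.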
Other examples of metadata that aren't related to software-visible speculation include:
The x87 FPU has 8 architectural registers, but normal instructions access them as a register stack where the underlying register for st(0) is determined by the TOP field in the x87 status word. That metadata is architecturally visible and can be modified with fincstp to rotate the "revolver barrel". See https://masm32.com/masmcode/rayfil/tutorial/fpuchap1.htm for a good diagram and intro to the x87 design. Also, x87 has a free / in-use flag for each register (in the tag word); trying to load into an already in-use register produces an FP exception (stack overflow), and a NaN if exceptions are masked. Normally the in-use flag is cleared by "popping" the register stack with fstp to store and pop, or whatever, but there's also ffree to mark any x87 register as free.
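The x87 mechanics above (TOP selecting which physical register is st(0), plus per-register empty/in-use tags) can be modeled roughly like this. It is a simplified sketch, not a faithful emulator: `X87Stack` and its method names are hypothetical, real tags are 2 bits (valid/zero/special/empty) while only empty vs. in-use is modeled, and exception delivery is reduced to raising a Python error or writing a NaN when masked.

```python
# Simplified model of the x87 register stack: 8 physical registers, a 3-bit TOP
# field (in hardware it lives in the FPU status word) and a per-register empty tag.
# Hypothetical names; exception behavior is heavily simplified.
import math


class X87Stack:
    def __init__(self, invalid_masked: bool = True) -> None:
        self.regs = [0.0] * 8
        self.empty = [True] * 8   # tag metadata: True means "free"
        self.top = 0              # TOP field: which physical register is st(0)
        self.invalid_masked = invalid_masked

    def _phys(self, i: int) -> int:
        # st(i) maps to physical register (TOP + i) mod 8.
        return (self.top + i) % 8

    def st(self, i: int = 0) -> float:
        p = self._phys(i)
        if self.empty[p]:
            # Reading an empty register is stack underflow (masked behavior not modeled).
            raise FloatingPointError(f"stack underflow reading st({i})")
        return self.regs[p]

    def fld(self, x: float) -> None:
        # Push: the register that would become the new st(0) must be free.
        new_top = (self.top - 1) % 8
        if not self.empty[new_top]:
            if not self.invalid_masked:
                raise FloatingPointError("stack overflow: register already in use")
            x = math.nan          # masked: a NaN is written instead
        self.top = new_top
        self.regs[new_top] = x
        self.empty[new_top] = False

    def fstp(self) -> float:
        # Store-and-pop: read st(0), mark it free, increment TOP.
        x = self.st(0)
        self.empty[self.top] = True
        self.top = (self.top + 1) % 8
        return x

    def fincstp(self) -> None:
        # Rotate the "revolver barrel": TOP changes, tags do not.
        self.top = (self.top + 1) % 8

    def ffree(self, i: int) -> None:
        # Mark st(i) free without touching TOP.
        self.empty[self._phys(i)] = True


s = X87Stack()
for v in range(1, 9):
    s.fld(float(v))   # fills all 8 physical registers
s.fld(9.0)            # 9th push hits an in-use register: masked overflow
print(s.st(0))        # nan
```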
Obviously a microarchitecture has to keep lots of info about instructions that are in flight, like whether they've finished executing or not. But there is at least one interesting case of metadata about data, not code:
In AMD Bulldozer-family and Bobcat/Jaguar, the SIMD FPUs apparently keep some extra metadata alongside the actual architectural register value. As Agner Fog explains in his microarchitecture PDF (Bulldozer-family section, 19.11 "Data delay between different execution domains"):
There is a large penalty when the output of a floating point calculation is input to a floating point calculation with a different precision, for example if the output of a double precision floating point addition is input to a single precision addition. This has hardly any practical significance since such a sequence is most likely to be a programming error, but it indicates that the processor stores extra information about floating point numbers beyond the 128 bits in an XMM register. This effect is not seen on Intel processors.
This might be related to the fact that Bulldozer has 1 cycle lower FP latency when forwarding from one FMA-unit instruction to another, like mulps forwarding to addps with no sqrtps or xorps in between.
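As a purely conceptual illustration of what Agner Fog's observation implies, here is a toy Python model in which each XMM register carries a hidden format tag next to its 128 data bits, and a consumer expecting a different format is counted as a penalty. What AMD actually stores is not documented in the quote: the `Xmm`/`ToyCore` names, the tag values `"int"`/`"ps"`/`"pd"`, and the penalty counter (a count of mismatches, not cycles) are all hypothetical placeholders, not measurements.

```python
# Toy model of hidden per-register format metadata. What the real hardware stores
# is unknown here; tags and the penalty counter are illustrative placeholders.
from dataclasses import dataclass, field


@dataclass
class Xmm:
    data: bytes = bytes(16)   # the architecturally visible 128 bits
    fmt: str = "int"          # hypothetical hidden tag: "int", "ps" or "pd"


@dataclass
class ToyCore:
    regs: list[Xmm] = field(default_factory=lambda: [Xmm() for _ in range(16)])
    penalties: int = 0        # number of format mismatches seen, not cycles

    def execute(self, dst: int, srcs: list[int], fmt: str) -> None:
        for s in srcs:
            if self.regs[s].fmt != fmt:
                self.penalties += 1   # consumer's format differs from the producer's tag
        # A real core would compute new data bits here; only the tag is tracked.
        self.regs[dst].fmt = fmt


core = ToyCore()
core.execute(0, [1, 2], "pd")   # like addpd: sources still tagged "int", 2 mismatches
base = core.penalties
core.execute(3, [0], "ps")      # like addps reading a "pd" result: +1 mismatch
core.execute(4, [3], "ps")      # "ps" feeding "ps": no mismatch
```

Note that the visible `data` never changes meaning here; only the hidden tag decides whether a mismatch is counted, which is exactly why such metadata shows up as a timing effect rather than an architectural one.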
Also, various AMD uarches have marked instruction boundaries in L1 I-cache, reducing the latency of repeatedly decoding the same code. Intel Silvermont also does this.
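The instruction-boundary marking can be sketched as "predecode once on cache fill, store one bit per byte, reuse the bits on every later fetch". This Python sketch assumes a hypothetical toy ISA in which the low 2 bits of an instruction's first byte give its length minus one (1 to 4 bytes); real x86 length decoding is far messier, and instructions that spill into the next line are simply truncated here. `insn_length`, `predecode` and `ICacheLine` are made-up names.

```python
# Sketch of predecode metadata: one boundary bit per byte of a cache line.
# Hypothetical toy ISA: low 2 bits of the first byte = instruction length - 1.


def insn_length(first_byte: int) -> int:
    return (first_byte & 0b11) + 1


def predecode(line: bytes, start: int = 0) -> list[bool]:
    """Return one boundary bit per byte: True where an instruction starts."""
    marks = [False] * len(line)
    pc = start
    while pc < len(line):
        marks[pc] = True
        pc += insn_length(line[pc])
    return marks


class ICacheLine:
    def __init__(self, line: bytes) -> None:
        self.data = line
        self.boundaries = predecode(line)   # computed once, when the line is filled

    def instructions(self) -> list[bytes]:
        # Later fetches read the stored marks instead of re-decoding lengths.
        starts = [i for i, b in enumerate(self.boundaries) if b]
        ends = starts[1:] + [len(self.data)]
        return [self.data[s:e] for s, e in zip(starts, ends)]


line = bytes([0x00, 0x01, 0xAA, 0x03, 0x11, 0x22, 0x33, 0x02, 0x44, 0x55])
cached = ICacheLine(line)
print(cached.instructions())
# [b'\x00', b'\x01\xaa', b'\x03\x11"3', b'\x02DU']
```

The expensive serial step (each instruction's start depends on the previous instruction's length) runs only once per fill; after that, finding boundaries is just reading bits, which is what makes decoding the same code again cheaper.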