I'm working on a project to implement a dual-processor system with cache coherency (I chose MESI) in VHDL. I just want to confirm one thing: a write hit on a shared cache line should cause the cache controller to send invalidation messages on the shared bus, and depending on the contention, it should stall the processor for some time, right?
I was thinking of this scenario; suppose a processor does something like this:
for (int i = 0; i < 5; ++i)
    arr[i * 10] = 0; // just so each write is in a different cache line
If the array is entirely resident in the cache and is shared with other processors, each write will generate an invalidation message, each of which takes multiple cycles to complete. For the processor to continue execution, all these invalidation messages would have to be buffered, and the buffer wouldn't be bounded, so the write hit will have to stall the CPU for some time. Am I right about this?
Do not confuse latency with throughput. Invalidation messages take multiple cycles to complete, but you can pipeline the process. It is possible to build a pipelined cache that can start processing new invalidation messages before the previous ones have completed.
The MESI protocol does not require that all previous messages to different cache lines have been completed before a new message can be started.
The number of invalidation messages in flight will be bounded as long as the cache offers enough throughput. If you can generate 1 invalidation message per cycle and each message takes 10 cycles to process, but your cache can also accept 1 new invalidation message per cycle, then at most 10 invalidation messages will be in flight and your processor does not have to stall on a write hit to a shared line.
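To make the latency versus throughput point concrete, here is a minimal cycle-count sketch in C. It is only a toy model under stated assumptions, not a description of any real cache controller: the processor tries to issue one invalidation per cycle, every message has the same fixed latency, and the cache accepts a new message every `initiation_interval` cycles. The names `simulate_invalidations`, `SimResult`, and `initiation_interval` are hypothetical and chosen just for this illustration.

```c
#include <assert.h>

#define MAX_WRITES 1024

typedef struct {
    int peak_in_flight; /* most invalidations outstanding at once */
    int stall_cycles;   /* cycles in which the CPU could not issue */
} SimResult;

/* Toy model (assumption): the CPU wants to issue one invalidation per
 * cycle; the cache accepts a new one every `initiation_interval` cycles
 * (1 = fully pipelined) and each takes `latency` cycles to complete. */
SimResult simulate_invalidations(int writes, int latency, int initiation_interval)
{
    int accept_cycle[MAX_WRITES];
    SimResult r = {0, 0};
    int issued = 0;    /* messages accepted so far */
    int oldest = 0;    /* index of the oldest message still in flight */
    int next_free = 0; /* first cycle the cache can accept a new message */

    assert(writes >= 0 && writes <= MAX_WRITES);
    assert(latency >= 1 && initiation_interval >= 1);

    for (int cycle = 0; issued < writes; ++cycle) {
        /* Retire every message whose latency has elapsed. */
        while (oldest < issued && accept_cycle[oldest] + latency <= cycle)
            ++oldest;

        if (cycle >= next_free) {
            /* The cache has room: start a new invalidation this cycle. */
            accept_cycle[issued++] = cycle;
            next_free = cycle + initiation_interval;
        } else {
            /* The cache is busy: the write hit stalls the CPU. */
            ++r.stall_cycles;
        }

        if (issued - oldest > r.peak_in_flight)
            r.peak_in_flight = issued - oldest;
    }
    return r;
}
```

In this model, `simulate_invalidations(100, 10, 1)` (fully pipelined, 10-cycle latency) peaks at 10 messages in flight with 0 stall cycles, which matches the example above. With no pipelining, `simulate_invalidations(5, 10, 10)` never has more than 1 message in flight but stalls for 36 cycles over the five writes from the question.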