caching, cpu-architecture, cpu-cache, write-through

In a multilevel cache system, does the write-through policy write to all caches down to main memory?


Consider a multi-level cache system containing L1, L2, and L3 caches and finally main memory. The L1 cache follows a write-through policy; the L2 and L3 caches follow a write-back policy in case of a hit.

Case 1: some address A is present in the L1 cache, present in the L2 cache, and present in the L3 cache.

Case 2: some address A is present in the L1 cache, absent in the L2 cache, and present in the L3 cache.

We write to address A, and it is a hit in L1. So is the data to be stored at A written to the L1, L2, and L3 caches as well as main memory, or just to the L1 and L2 caches, or just to the L1 cache and main memory?

For the above situation, what is the proper flow of data through the caches down to main memory, given the hit-case write policies, for both cases?

I tried looking through the Hennessy and Patterson textbook, which says "write through more viable for the upper-level caches, as the writes need only propagate to the next lower level rather than all the way to main memory", but many references say it writes to DRAM as well.

Also, in the flowchart for write policies on Wikipedia (Write Policies), does the term "lower memory" mean just the next lower cache, all the lower caches, or main memory only? That is, when the flowchart says "Write data into lower memory", does it mean write to just the next lower cache, or to all lower caches plus main memory?


Solution

  • Write-through caches don't have to be write-allocate; on miss they could just pass the write along to the outer level without doing anything else.

    Hit or miss, by definition they always pass the write along to the next level, which might be a write-back cache that on hit will avoid bothering any further levels of the memory hierarchy. How far a write propagates outward depends on which levels of cache are write-back.

    (On miss, a write-back cache does a read-for-ownership to get the values for the rest of the cache line, so it can merge this write into it and have the line in Modified state, with the dirty bit set. It might also have to evict a dirty line, so a store miss on a WB cache can result in a different line being sent to the next level outward.) The sketch at the end of this answer traces both of your cases through such a hierarchy.


    write through more viable for the upper-level caches, as the writes need only propagate to the next lower level rather than all the way to main memory

    Patterson & Hennessy are saying that write-through is more usable for inner caches (like L1) because there's still another cache which can be write-back to act as a backstop, to avoid limiting cache write speeds to DRAM speed.

    Generally write-through caches suck for CPU workloads and aren't used these days. (Bulldozer-family's L1d cache was a notable exception, although it used a small 4K write-combining buffer to make it less bad. Even then it was one of the Bulldozer family's design choices that didn't prove to be very good. AMD's Zen was much more successful.)

    A write-through L2 or L3 would be much stranger than a small write-through L1 paired with a larger write-back L2 that's also per-core private. (In that case the L1 mostly exists as a bandwidth filter for reads but not writes, with the L2 still necessary to cache writes before they have to go outside the CPU core.)
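
To make the propagation concrete, here's a minimal sketch (plain Python, not a real cache simulator) that traces a store to address A through a write-through L1 backed by write-back, write-allocate L2/L3 levels, matching the two cases in the question. The Level/Dram classes and the trace format are invented for illustration; only hit/miss and policy decisions are modeled, not actual data, cache lines, or evictions.

```python
class Level:
    def __init__(self, name, policy, next_level, initially_present):
        self.name = name
        self.policy = policy                   # "write-through" or "write-back"
        self.next = next_level                 # next-outer level (or DRAM)
        self.lines = set(initially_present)    # addresses currently cached
        self.dirty = set()                     # addresses with the dirty bit set

    def store(self, addr, trace):
        hit = addr in self.lines
        trace.append(f"{self.name}: {'hit' if hit else 'miss'}")
        if self.policy == "write-through":
            # A write-through level always forwards the store to the next
            # level outward, hit or miss (no write-allocate modeled here).
            self.next.store(addr, trace)
        else:  # write-back
            if not hit:
                # Write-allocate: read-for-ownership fetches the rest of the
                # line from the outer level; evicting a dirty victim to the
                # outer level could also happen here (not modeled).
                trace.append(f"{self.name}: RFO from {self.next.name}")
                self.lines.add(addr)
            # Absorb the write: set the dirty bit, don't forward it outward.
            self.dirty.add(addr)


class Dram:
    name = "DRAM"
    def store(self, addr, trace):
        trace.append("DRAM: written")


dram = Dram()
# Case 1: A present in L1, L2 and L3.  Case 2: A present in L1 and L3, absent in L2.
for case, l2_contents in (("Case 1", {"A"}), ("Case 2", set())):
    l3 = Level("L3", "write-back", dram, {"A"})
    l2 = Level("L2", "write-back", l3, l2_contents)
    l1 = Level("L1", "write-through", l2, {"A"})
    trace = []
    l1.store("A", trace)
    print(case, "->", "; ".join(trace))

# Case 1 -> L1: hit; L2: hit
# Case 2 -> L1: hit; L2: miss; L2: RFO from L3
```

In both cases the store stops at L2: the write-through L1 forwards it, and the first write-back level absorbs it by setting its dirty bit. L3 is only read (for the RFO in Case 2), and L3 and DRAM only receive the data later, when that dirty line is eventually evicted outward.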