linux-kernel shared-memory memory-model

How can a write after a barrier be visible before a write preceding the barrier?


The Linux kernel's memory barrier documentation (Documentation/memory-barriers.txt) has examples in which a write issued after a memory barrier becomes visible to other CPUs before a write issued ahead of that barrier. How can this happen? Why is the write barrier not sufficient to order these writes?

In particular the following:

        CPU 1                   CPU 2
        ======================= =======================
                { B = 7; X = 9; Y = 8; C = &Y }
        STORE A = 1
        STORE B = 2
        <write barrier>
        STORE C = &B            LOAD X
        STORE D = 4             LOAD C (gets &B)
                                LOAD *C (reads B)

Without intervention, CPU 2 may perceive the events on CPU 1 in some
effectively random order, despite the write barrier issued by CPU 1:

        +-------+       :      :                :       :
        |       |       +------+                +-------+  | Sequence of update
        |       |------>| B=2  |-----       --->| Y->8  |  | of perception on
        |       |  :    +------+     \          +-------+  | CPU 2
        | CPU 1 |  :    | A=1  |      \     --->| C->&Y |  V
        |       |       +------+       |        +-------+
        |       |   wwwwwwwwwwwwwwww   |        :       :
        |       |       +------+       |        :       :
        |       |  :    | C=&B |---    |        :       :       +-------+
        |       |  :    +------+   \   |        +-------+       |       |
        |       |------>| D=4  |    ----------->| C->&B |------>|       |
        |       |       +------+       |        +-------+       |       |
        +-------+       :      :       |        :       :       |       |
                                       |        :       :       |       |
                                       |        :       :       | CPU 2 |
                                       |        +-------+       |       |
            Apparently incorrect --->  |        | B->7  |------>|       |
            perception of B (!)        |        +-------+       |       |
                                       |        :       :       |       |
                                       |        +-------+       |       |
            The load of X holds --->    \       | X->9  |------>|       |
            up the maintenance           \      +-------+       |       |
            of coherence of B             ----->| B->2  |       +-------+
                                                +-------+
                                                :       :


In the above example, CPU 2 perceives that B is 7, despite the load of *C
(which would be B) coming after the LOAD of C.

Solution

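  • The write barrier does order the writes correctly.

    As the text following that example explains, the problem is on the reading side. CPU 2 places no barrier between LOAD C and LOAD *C, so it can in effect perceive *C before C: it sees the new value of C (&B) and still perceives the old value of B (7). The documentation's fix is a data dependency barrier, or a stronger read barrier, between those two loads.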
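
    To make the pairing concrete, here is a minimal userspace sketch of the same pattern. It uses C11 atomics and pthreads rather than kernel primitives, so it is an analogue and not kernel code. The names B, Y and C mirror the example; writer, reader and run_once are hypothetical helpers. It assumes that a release store can stand in for the write barrier plus STORE C, and that an acquire load can stand in for LOAD C plus the read-side barrier (acquire is stronger than a data dependency barrier, which keeps the sketch simple).

    ```c
    #include <pthread.h>
    #include <stdatomic.h>
    #include <stddef.h>

    /* Same initial state as the example: { B = 7; Y = 8; C = &Y } */
    static int B = 7;
    static int Y = 8;
    static _Atomic(int *) C = &Y;

    /* "CPU 1": write B, then publish its address through C. */
    static void *writer(void *arg)
    {
        (void)arg;
        B = 2;                                    /* STORE B = 2 */
        /* Release store: stands in for <write barrier> + STORE C = &B. */
        atomic_store_explicit(&C, &B, memory_order_release);
        return NULL;
    }

    /* "CPU 2": load the pointer, then dereference it. */
    static void *reader(void *arg)
    {
        int *seen = arg;
        /* Acquire load: stands in for LOAD C + the missing read-side barrier. */
        int *p = atomic_load_explicit(&C, memory_order_acquire);
        *seen = *p;                               /* LOAD *C */
        return NULL;
    }

    /* Runs both threads once and returns the value read through C. */
    int run_once(void)
    {
        pthread_t w, r;
        int seen = 0;

        B = 7;                                    /* reset to the initial state */
        atomic_store(&C, &Y);

        if (pthread_create(&r, NULL, reader, &seen) != 0)
            return -1;
        if (pthread_create(&w, NULL, writer, NULL) != 0) {
            pthread_join(r, NULL);
            return -1;
        }
        pthread_join(w, NULL);
        pthread_join(r, NULL);
        return seen;
    }
    ```

    With the acquire load in place, run_once() returns 8 if the reader still saw C == &Y, or 2 if it saw C == &B. It should never return 7. Weakening the load to memory_order_relaxed removes the read-side guarantee, which corresponds to the situation in the diagram.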