In the memory barrier documentation of the Linux kernel (Documentation/memory-barriers.txt), there are examples showing that a write after a memory barrier can become visible to other CPUs before a write preceding the barrier. How can this happen? Why is the write barrier not sufficient to order these writes?
In particular the following:
    CPU 1                   CPU 2
    ======================= =======================
            { B = 7; X = 9; Y = 8; C = &Y }
    STORE A = 1
    STORE B = 2
    <write barrier>
    STORE C = &B            LOAD X
    STORE D = 4             LOAD C (gets &B)
                            LOAD *C (reads B)

    Without intervention, CPU 2 may perceive the events on CPU 1 in some
    effectively random order, despite the write barrier issued by CPU 1:

    +-------+       :      :                :       :
    |       |       +------+                +-------+  | Sequence of update
    |       |------>| B=2  |-----       --->| Y->8  |  | of perception on
    |       |  :    +------+     \          +-------+  | CPU 2
    | CPU 1 |  :    | A=1  |      \     --->| C->&Y |  V
    |       |       +------+       |        +-------+
    |       |   wwwwwwwwwwwwwwww   |        :       :
    |       |       +------+       |        :       :
    |       |  :    | C=&B |---    |        :       :       +-------+
    |       |  :    +------+   \   |        +-------+       |       |
    |       |------>| D=4  |    ----------->| C->&B |------>|       |
    |       |       +------+       |        +-------+       |       |
    +-------+       :      :       |        :       :       |       |
                                   |        :       :       |       |
                                   |        :       :       | CPU 2 |
                                   |        +-------+       |       |
        Apparently incorrect --->  |        | B->7  |------>|       |
        perception of B (!)        |        +-------+       |       |
                                   |        :       :       |       |
                                   |        +-------+       |       |
        The load of X holds --->    \       | X->9  |------>|       |
        up the maintenance           \      +-------+       |       |
        of coherence of B             ----->| B->2  |       +-------+
                                            +-------+
                                            :       :


    In the above example, CPU 2 perceives that B is 7, despite the load of *C
    (which would be B) coming after the LOAD of C.
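The write barrier does order the writes correctly: CPU 1 commits A and B before C and D.
As the text that follows this excerpt in the document explains, the problem is on the reading side. CPU 2 can effectively read *C before C, and so still see the old value of B, because it does not use any kind of read barrier between the two loads. A write barrier only gives the expected result when it is paired with a read barrier or data dependency barrier on the CPU doing the loads.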
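
To make the pairing concrete, here is a minimal userspace sketch of the same pattern using C11 atomics and pthreads. It is an analogue and not kernel code: the release fence stands in for the write barrier, and the acquire fence stands in for the read-side barrier (it is stronger than a data dependency barrier). The names `cpu1`, `cpu2` and `run_once` are made up for this illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Shared state mirroring the excerpt: B = 7, Y = 8, C = &Y initially. */
static int B, Y;
static _Atomic(int *) C;

/* Writer, playing the role of CPU 1 (hypothetical name). */
static void *cpu1(void *arg)
{
    (void)arg;
    B = 2;                                                /* STORE B = 2     */
    atomic_thread_fence(memory_order_release);            /* <write barrier> */
    atomic_store_explicit(&C, &B, memory_order_relaxed);  /* STORE C = &B    */
    return NULL;
}

/* Reader, playing the role of CPU 2 (hypothetical name). */
static void *cpu2(void *result)
{
    int *p;

    /* Spin until the new pointer value is visible. */
    do {
        p = atomic_load_explicit(&C, memory_order_relaxed);  /* LOAD C */
    } while (p != &B);

    /* Read-side barrier: without it, the load below could see B == 7. */
    atomic_thread_fence(memory_order_acquire);
    *(int *)result = *p;                                      /* LOAD *C */
    return NULL;
}

/* Runs one writer/reader pair; returns the value the reader saw for *C,
 * or -1 if a thread could not be created. */
int run_once(void)
{
    pthread_t writer, reader;
    int seen = -1;

    B = 7;
    Y = 8;
    atomic_store(&C, &Y);

    if (pthread_create(&writer, NULL, cpu1, NULL) != 0)
        return -1;
    if (pthread_create(&reader, NULL, cpu2, &seen) != 0) {
        pthread_join(writer, NULL);
        return -1;
    }
    pthread_join(writer, NULL);
    pthread_join(reader, NULL);

    /* Guaranteed by the release/acquire fence pair. */
    assert(seen == 2);
    return seen;
}
```

Calling `run_once()` from a `main` function (built with `-pthread`) should always return 2. If the acquire fence in `cpu2` is removed, the C11 model no longer guarantees that, and the program then has a data race on `B`, which is the userspace counterpart of the "apparently incorrect perception of B" in the diagram.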