c++ multithreading atomic memory-fences memory-barriers

Which fences exactly are provided by std::memory_order in C++?


As far as I know, the std::memory_order enum provides memory fences, but I need to be sure which fences are provided by each std::memory_order element. Below I explain each element of the std::memory_order enum as I understand it:

  1. std::memory_order_relaxed - no fence provided
  2. std::memory_order_acquire - LoadLoad_LoadStore
  3. std::memory_order_release - LoadStore_StoreStore
  4. std::memory_order_consume - usually equals to memory_order_acquire
  5. std::memory_order_acq_rel - LoadLoadLoadStore_LoadStoreStoreStore ???
  6. std::memory_order_seq_cst - StoreLoad_StoreLoad ???

I am not sure about the first four elements, and I don't know anything about the last two.

Does anyone know this exactly?

Also, I need to know where the compiler puts the memory fence when using std::atomic or std::atomic_flag.

As I understand it, using fences with atomics means: apply the fence, then perform the operation. Am I right? For example:

atomic.load(std::memory_order_acquire);

means: apply a memory_order_acquire fence and then load the data atomically?


Solution

  • Does anyone know this exactly?

    Sure, there are a variety of sources, for example C++ Reference:

    memory_order_relaxed — Relaxed operation: there are no synchronization or ordering constraints imposed on other reads or writes, only this operation's atomicity is guaranteed.
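As a sketch of when relaxed ordering is enough, consider a simple event counter (the names here are illustrative, not from the question): each increment only needs to be atomic, with no ordering against surrounding memory operations.

```cpp
#include <atomic>

// Illustrative sketch: a statistics counter where only atomicity matters.
std::atomic<long> hits{0};

void record_hit() {
    // Relaxed is sufficient: no other data is published through this
    // counter, so no acquire/release ordering is needed.
    hits.fetch_add(1, std::memory_order_relaxed);
}
```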

    memory_order_consume — A load operation with this memory order performs a consume operation on the affected memory location: no reads or writes in the current thread dependent on the value currently loaded can be reordered before this load. Writes to data-dependent variables in other threads that release the same atomic variable are visible in the current thread. On most platforms, this affects compiler optimizations only.

    memory_order_acquire — A load operation with this memory order performs the acquire operation on the affected memory location: no reads or writes in the current thread can be reordered before this load. All writes in other threads that release the same atomic variable are visible in the current thread.

    memory_order_release — A store operation with this memory order performs the release operation: no reads or writes in the current thread can be reordered after this store. All writes in the current thread are visible in other threads that acquire the same atomic variable and writes that carry a dependency into the atomic variable become visible in other threads that consume the same atomic.

    memory_order_acq_rel — A read-modify-write operation with this memory order is both an acquire operation and a release operation. No memory reads or writes in the current thread can be reordered before the load, nor after the store. All writes in other threads that release the same atomic variable are visible before the modification, and the modification is visible in other threads that acquire the same atomic variable.
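A common use of memory_order_acq_rel is a read-modify-write that both publishes and observes data, e.g. a reference-count decrement. This release_ref helper is a hypothetical sketch, not anything from the question:

```cpp
#include <atomic>

std::atomic<int> refcount{1};

// Hypothetical sketch: the thread that drops the last reference must see
// every write other threads made before dropping theirs, so the RMW acts
// as a release (publish our writes) and an acquire (observe theirs).
bool release_ref() {
    // Returns true when this call dropped the last reference.
    return refcount.fetch_sub(1, std::memory_order_acq_rel) == 1;
}
```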

    memory_order_seq_cst — A load operation with this memory order performs an acquire operation, a store performs a release operation, and read-modify-write performs both an acquire operation and a release operation, plus a single total order exists in which all threads observe all modifications in the same order.
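The extra guarantee of seq_cst, the single total order, matters for store-then-load patterns such as Dekker-style mutual exclusion, which release/acquire alone cannot make safe. A sketch with illustrative names:

```cpp
#include <atomic>

std::atomic<bool> want_a{false}, want_b{false};

// With seq_cst there is one total order over all four operations, so at
// most one of these functions can return true when run concurrently. With
// mere acquire/release, both loads could observe false and both threads
// could enter the critical section.
bool try_enter_a() {
    want_a.store(true, std::memory_order_seq_cst);
    return !want_b.load(std::memory_order_seq_cst);
}

bool try_enter_b() {
    want_b.store(true, std::memory_order_seq_cst);
    return !want_a.load(std::memory_order_seq_cst);
}
```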

    Please also have a look at the atomic<> Weapons presentation by Herb Sutter, which explains a lot.

    Also, I need to know where the compiler puts the memory fence

    This is architecture-dependent. On some architectures it is a no-op, on some it is an instruction prefix, and on some it is a special instruction placed before/after the load/store.

    There is a paper called "Memory Barriers: a Hardware View for Software Hackers", which analyses barriers on many architectures if you are interested.

    For example: atomic.load(std::memory_order_acquire); means: apply a memory_order_acquire fence and then load the data atomically?

    This is architecture-dependent too, but for an acquire barrier I would say it is quite the opposite: we load the variable first, and then make sure no later reads/writes are moved before that load, i.e. we put a fence after it.

    But on some platforms it can be a single processor instruction. For example, ARM has load-acquire (LDA) and store-release (STL) instructions.
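The "load first, then fence" view can be spelled out with std::atomic_thread_fence: a relaxed load followed by an acquire fence gives ordering at least as strong as an acquire load (a sketch, with an illustrative variable):

```cpp
#include <atomic>

std::atomic<int> x{0};

int load_then_fence() {
    int v = x.load(std::memory_order_relaxed);           // load first
    std::atomic_thread_fence(std::memory_order_acquire); // then fence:
    // later reads/writes in this thread cannot be reordered before it
    return v;
}
```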