c++11 atomic lock-free memory-fences memory-barriers

lock-free synchronization, fences and memory order (store operation with acquire semantics)

I am migrating a project that was run on bare-bone to linux, and need to eliminate some {disable,enable}_scheduler calls. :)

So I need a lock-free sync solution in a single writer, multiple readers scenario, where the writer thread cannot be blocked. I came up with the following solution, which does not fit to the usual acquire-release ordering:

class RWSync {
    std::atomic<int> version; // incremented after every modification
    std::atomic_bool invalid; // true during write
public:
  RWSync() : version(0), invalid(0) {}
  template<typename F> void sync(F lambda) {
    int currentVersion;
    do {
      do { // wait until the object is valid
        currentVersion = version.load(std::memory_order_acquire);
      } while (invalid.load(std::memory_order_acquire));
      lambda();
      std::atomic_thread_fence(std::memory_order_seq_cst);
      // check if something changed
    } while (version.load(std::memory_order_acquire) != currentVersion
        || invalid.load(std::memory_order_acquire));
  }
  void beginWrite() {
    invalid.store(true, std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_seq_cst);
  }
  void endWrite() {
    std::atomic_thread_fence(std::memory_order_seq_cst);
    version.fetch_add(1, std::memory_order_release);
    invalid.store(false, std::memory_order_release);
  }
}

I hope the intent is clear: I wrap the modification of a (non-atomic) payload between beginWrite/endWrite, and read the payload only inside the lambda function passed to sync().

As you can see, here I have an atomic store in beginWrite() where no writes after the store operation can be reordered before the store. I did not find suitable examples, and I am not experienced in this field at all, so I'd like some confirmation that it is OK (verification through testing is not easy either).

Is this code race-free and work as I expect?
If I use std::memory_order_seq_cst in every atomic operation, can I omit the fences? (Even if yes, I guess the performance would be worse)
Can I drop the fence in endWrite()?
Can I use memory_order_acq_rel in the fences? I don't really get the difference -- the single total order concept is not clear to me.
Is there any simplification / optimization opportunity?

+1. I happily accept any better idea as the name of this class :)

Solution

The code is basically correct.

Instead of having two atomic variables (version and invalid) you may use single version variable with semantic "Odd values are invalid". This is known as "sequential lock" mechanism.

Reducing number of atomic variables simplifies things a lot:

class RWSync {
    // Incremented before and after every modification.
    // Odd values mean that object in invalid state.
    std::atomic<int> version; 
public:
  RWSync() : version(0) {}
  template<typename F> void sync(F lambda) {
    int currentVersion;
    do {
      currentVersion = version.load(std::memory_order_seq_cst);
      // This may reduce calls to lambda(), nothing more
      if(currentVersion | 1) continue;

      lambda();

      // Repeat until something changed or object is in an invalid state.
    } while ((currentVersion | 1) ||
        version.load(std::memory_order_seq_cst) != currentVersion));
  }
  void beginWrite() {
    // Writer may read version with relaxed memory order
    currentVersion = version.load(std::memory_order_relaxed);
    // Invalidation requires sequential order
    version.store(currentVersion + 1, std::memory_order_seq_cst);
  }
  void endWrite() {
    // Writer may read version with relaxed memory order
    currentVersion = version.load(std::memory_order_relaxed);
    // Release order is sufficient for mark an object as valid
    version.store(currentVersion + 1, std::memory_order_release);
  }
};

Note the difference in memory orders in beginWrite() and endWrite():

endWrite() makes sure that all previous object's modifications have been completed. It is sufficient to use release memory order for that.
beginWrite() makes sure that reader will detect object being in invalid state before any futher object's modification is started. Such garantee requires seq_cst memory order. Because of that reader uses seq_cst memory order too.

As for fences, it is better to incorporate them into previous/futher atomic operation: compiler knows how to make the result fast.

Explanations of some modifications of original code:

1) Atomic modification like fetch_add() is intended for cases, when concurrent modifications (like another fetch_add()) are possible. For correctness, such modifications use memory locking or other very time-costly architecture-specific things.

Atomic assignment (store()) does not use memory locking, so it is cheaper than fetch_add(). You may use such assignment because concurrent modifications are not possible in your case (reader does not modify version).

2) Unlike to release-acquire semantic, which differentiate load and store operations, sequential consistency (memory_order_seq_cst) is applicable to every atomic access, and provide total order between these accesses.