c++ multithreading concurrency memory-barriers lockless

Sharing two variables between threads without a lock


I have a question about sharing variables between multiple threads. The two variables look like this:

{
    void* a;
    uint64_t b;
}

Only one thread modifies the two variables; other threads read them frequently. I want to change a and b at the same time, so that other threads see both changes together (the new value of a and the new value of b). Because many threads read these variables frequently, I don't want to add a lock. Is there a way to combine the updates to a and b so that they behave like a single atomic operation? For example, would using a memory fence work? Thank you!


Solution

  • You're looking for a SeqLock.

    It's ideal for this use-case, especially with infrequently-changed data (e.g. a time variable updated by a timer interrupt and read all over the place).

    SeqLock advantages include perfect read-side scaling (readers don't need to get exclusive ownership of any cache lines, they're truly read-only not just lock-free), so any number of readers can read as often as they like with zero contention with each other. The downside is occasional retry, if a reader happens to try to read at just the wrong time. That's rare, and doesn't happen when the writer hasn't just written something.

    So readers aren't quite wait-free, and in fact if the writer sleeps at just the wrong time, the readers are stuck retrying until it wakes up again! So overall the algorithm isn't even lock-free or obstruction-free. But the very common fast-path is just two extra reads from the same cache line as the data, and whatever is necessary for LoadLoad ordering in the reader. If there's been no write since the last read, the loads can all be L1d cache hits.
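The read and write paths described above can be sketched roughly as follows. This is a minimal sketch, assuming a single writer thread and C++11 atomics; the names Payload and SeqLock are hypothetical, not from the question. The payload fields are relaxed atomics so that a read racing with a write is not a data race in the C++ memory model; the sequence counter is what detects a torn read.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical names for illustration: Payload mirrors the {a, b} pair from the question.
struct Payload {
    void*         a;
    std::uint64_t b;
};

class SeqLock {
    // Even value: data is stable. Odd value: a write is in progress.
    std::atomic<std::uint32_t> seq_{0};
    // Relaxed atomics so concurrent read/write of the fields is not undefined behavior.
    std::atomic<void*>         a_{nullptr};
    std::atomic<std::uint64_t> b_{0};

public:
    // Assumption: exactly one thread ever calls write().
    void write(Payload p) {
        std::uint32_t s = seq_.load(std::memory_order_relaxed);
        seq_.store(s + 1, std::memory_order_relaxed);         // mark "write in progress" (odd)
        std::atomic_thread_fence(std::memory_order_release);  // counter store ordered before data stores
        a_.store(p.a, std::memory_order_relaxed);
        b_.store(p.b, std::memory_order_relaxed);
        seq_.store(s + 2, std::memory_order_release);         // data stores ordered before "done" (even)
    }

    // Any number of threads may call read(); it never writes shared memory.
    Payload read() const {
        for (;;) {
            std::uint32_t s0 = seq_.load(std::memory_order_acquire);
            if (s0 & 1u) continue;                             // writer is mid-update: retry
            Payload p;
            p.a = a_.load(std::memory_order_relaxed);
            p.b = b_.load(std::memory_order_relaxed);
            std::atomic_thread_fence(std::memory_order_acquire); // data loads ordered before re-check
            std::uint32_t s1 = seq_.load(std::memory_order_relaxed);
            if (s0 == s1) return p;                            // counter unchanged: snapshot is consistent
        }
    }
};
```

A reader that overlaps a write sees either an odd counter or a changed counter and retries, which is the "occasional retry" cost mentioned above. In the uncontended case the reader does two loads of the counter plus the two data loads, with no stores at all.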


    The only thing better is if you have efficient 16-byte atomic stores and loads, like Intel (but not AMD yet) CPUs with AVX, if your compiler / libatomic uses it for 16-byte loads of std::atomic<struct_16bytes> instead of x86-64 lock cmpxchg16b. (In practice most AMD CPUs are thought to have atomic 16-byte load/store as well, but only Intel has officially put it in their manuals that the AVX feature bit implies atomicity for aligned 128-bit load/store such as movaps, so compilers can safely start using it.)

    AArch64 also guarantees 16-byte atomicity for plain stp / ldp as of ARMv8.4, I think.

    But without those hardware features, and compiler+options to take advantage of them, 16-byte loads often get implemented as an atomic RMW, meaning each reader takes exclusive ownership of the cache line. That means reads contend with other reads, instead of the cache line staying in shared state, hot in the cache of every core that's reading it.


    like use memory fence, will it work?

    No, memory fences can't create atomicity (glue multiple operations into a larger transaction), only create ordering between operations.
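To make that concrete, here is a small sketch that steps through a fenced two-store update by hand. All names (g_a, g_b, Snapshot, store_a_then_fence, store_b, read_pair) are hypothetical, and the "interleaving" is simulated by calling the writer's two halves separately from one thread, so the outcome is deterministic. The fences guarantee that a reader which sees the new b also sees the new a, but nothing stops a reader from running between the two stores.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical globals standing in for the {a, b} pair from the question.
std::atomic<void*>         g_a{nullptr};
std::atomic<std::uint64_t> g_b{0};

struct Snapshot {
    void*         a;
    std::uint64_t b;
};

// First half of the update: store a, then a release fence so that
// the store to a is ordered before the later store to b.
void store_a_then_fence(void* a) {
    g_a.store(a, std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_release);
}

// Second half of the update: store b.
void store_b(std::uint64_t b) {
    g_b.store(b, std::memory_order_relaxed);
}

// Reader: load b first, acquire fence, then load a.
// Ordering guarantee: if this sees the new b, it also sees the new a.
// It does NOT guarantee that a and b come from the same update.
Snapshot read_pair() {
    Snapshot s;
    s.b = g_b.load(std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_acquire);
    s.a = g_a.load(std::memory_order_relaxed);
    return s;
}
```

Calling store_a_then_fence(&x), then read_pair(), then store_b(7) models a reader that happens to run in the gap: it gets the new a paired with the old b. The fences were obeyed the whole time; they just do not make the two stores one indivisible step.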

    That said, the idea behind a SeqLock is to carefully order the writes and reads with respect to the sequence variable, in order to detect torn reads and retry when one happens. So yes, barriers are important for that.