[SOLVED] Memory barriers in virtual environments - do they interrupt other cores?

Let's say I call a memory barrier like:

std::atomic_thread_fence(std::memory_order_seq_cst);

From the documentation I read that this implements strong ordering among all cores, even for non-atomic operations, and that it's very expensive, so it should be used sparingly.

My questions are:

If I'm running in a VM on a cloud provider, do my fences interrupt other guests on the machine?
If not, how's that possible since this is an op implemented in hardware and not software?
Does this depend on the specific virtualization technology? Does KVM/QEMU implement this differently from GCP or AWS machines?

Solution

Fences are local, affecting only the current thread. In hardware terms, that means only the logical core currently executing this thread. The cost of one thread executing a fence doesn't scale with the number of cores in the machine. (It does potentially scale with the size of the core, i.e. the number of in-flight loads and stores it has to wait for.)

"this implements strong ordering among all cores"

Only if all threads of your program use seq_cst memory order for operations and fences. If you look at the C++ standard, an operation or operation+fence only synchronizes with another non-relaxed operation, or an operation+fence. (See https://preshing.com/20130922/acquire-and-release-fences/ for example.)

The C++ guarantee of sequential consistency for data-race free programs only applies if all atomic operations are seq_cst, or if you use equivalent fences. One thread using a fence can't necessarily recover sequential consistency when other threads are using relaxed or release and acquire operations on std::atomic. std::mutex operations are only acquire and release, but that's fine because the semantics of a lock provide additional constraints on what orders can happen.
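As a rough illustration of the "operation+fence" pairing described above, here is a minimal sketch in which a release fence followed by a relaxed store synchronizes with a relaxed load followed by an acquire fence. The names payload, ready, and message_passing_demo are made up for this example and do not come from the question.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical names, for illustration only.
int payload = 0;                  // plain, non-atomic data
std::atomic<bool> ready{false};   // flag accessed only with relaxed operations

int message_passing_demo() {
    payload = 0;
    ready.store(false, std::memory_order_relaxed);
    int seen = -1;

    std::thread producer([] {
        payload = 42;                                          // plain store
        std::atomic_thread_fence(std::memory_order_release);   // fence first...
        ready.store(true, std::memory_order_relaxed);          // ...then relaxed store
    });
    std::thread consumer([&seen] {
        while (!ready.load(std::memory_order_relaxed)) {       // relaxed load first...
            std::this_thread::yield();
        }
        std::atomic_thread_fence(std::memory_order_acquire);   // ...then fence
        seen = payload;                                         // ordered after the producer's write
    });

    producer.join();
    consumer.join();
    return seen;
}
```

Calling message_passing_demo() should return 42: the two fences synchronize through the relaxed flag, so the plain read of payload is not a data race. Each fence only orders the accesses of the thread that runs it; the guarantee comes from both sides cooperating.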

An SC fence (full barrier) is local to the (logical) core executing it, draining the store buffer and finishing earlier loads before any later loads can execute or later stores can commit. It doesn't even have to block out-of-order exec of ALU work on that core. But usually memory loads are part of most dependency chains, so a full barrier is pretty expensive, hurting instruction-level parallelism around it a lot on the one logical core which executed it.
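The store-buffer effect is usually demonstrated with a "store buffering" litmus test. The sketch below is one possible version, with hypothetical names (x, y, count_both_zero): each thread stores to one variable, runs a full fence, then loads the other variable. Each fence only makes its own core drain its store buffer before the later load, yet with a fence in both threads the C++ memory model forbids both loads returning 0.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical names, for illustration only.
std::atomic<int> x{0};
std::atomic<int> y{0};

// Runs the litmus test `iterations` times and counts how often both
// threads read 0. With a seq_cst fence in each thread this must be 0.
int count_both_zero(int iterations) {
    int both_zero = 0;
    for (int i = 0; i < iterations; ++i) {
        x.store(0, std::memory_order_relaxed);
        y.store(0, std::memory_order_relaxed);
        int r1 = -1;
        int r2 = -1;

        std::thread a([&r1] {
            x.store(1, std::memory_order_relaxed);
            std::atomic_thread_fence(std::memory_order_seq_cst);  // store, fence, load
            r1 = y.load(std::memory_order_relaxed);
        });
        std::thread b([&r2] {
            y.store(1, std::memory_order_relaxed);
            std::atomic_thread_fence(std::memory_order_seq_cst);  // store, fence, load
            r2 = x.load(std::memory_order_relaxed);
        });

        a.join();
        b.join();
        if (r1 == 0 && r2 == 0) {
            ++both_zero;
        }
    }
    return both_zero;
}
```

If the two fences are removed, the both-zero outcome becomes allowed and can show up on real hardware (including x86), because each store may still be sitting in its core's store buffer when the other core's load executes. How often it actually appears depends on timing.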

It has zero effect on other cores, though, so it doesn't interrupt anything. Fence instructions also run in user-space; they aren't system calls.

What you're thinking of is like the Linux membarrier(2) system call which does indeed have to interrupt every other core to run a barrier there, allowing you to make some threads very fast (requiring only compiler barriers against compile-time reordering, in C++ terms like atomic_signal_fence), at the cost of making the slow path very costly.
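For contrast, here is a Linux-only sketch of that asymmetric pattern. It assumes kernel headers that define the private expedited commands (Linux 4.14 or later); the wrapper and function names (membarrier_call, register_heavy_barrier, light_barrier, heavy_barrier) are hypothetical. glibc has no membarrier() wrapper, so the call goes through syscall(2).

```cpp
#include <atomic>
#include <cassert>
#include <linux/membarrier.h>
#include <sys/syscall.h>
#include <unistd.h>

// Hypothetical helper: raw system call, since glibc provides no wrapper.
static long membarrier_call(int cmd, unsigned int flags) {
    return syscall(__NR_membarrier, cmd, flags, 0);
}

// One-time setup. Returns false if the kernel does not support
// (or does not permit) the private expedited command.
bool register_heavy_barrier() {
    long supported = membarrier_call(MEMBARRIER_CMD_QUERY, 0);
    if (supported < 0 || !(supported & MEMBARRIER_CMD_PRIVATE_EXPEDITED)) {
        return false;
    }
    return membarrier_call(MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED, 0) == 0;
}

// Fast path: only blocks compile-time reordering, emits no fence instruction.
inline void light_barrier() {
    std::atomic_signal_fence(std::memory_order_seq_cst);
}

// Slow path: a system call that makes the kernel run a full barrier on
// every core currently running a thread of this process.
bool heavy_barrier() {
    return membarrier_call(MEMBARRIER_CMD_PRIVATE_EXPEDITED, 0) == 0;
}
```

The frequently running threads would call light_barrier() where they would otherwise need a full fence, and the rare thread would call heavy_barrier() instead of std::atomic_thread_fence. This only makes sense when the slow path is rare, and a real program needs a fallback to ordinary fences when register_heavy_barrier() returns false.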

https://preshing.com/20120710/memory-barriers-are-like-source-control-operations/ - Real machines are like this model, with coherent cache and local reordering. So fences only need to have local effects to satisfy the C++ memory model.
Does a memory barrier acts both as a marker and as an instruction? - some discussion of microarchitectural effects / implementation of fence instructions.
How is load->store reordering possible with in-order commit? - some discussion of how CPUs naturally create memory reordering.
Can a speculatively executed CPU branch contain opcodes that access RAM? - yes, because of store buffers. A full barrier has to wait for the store buffer to drain, unlike acquire, release, or acq_rel barriers. (e.g. acquire barriers just wait for earlier loads before later loads/stores, not earlier stores.)
How does memory reordering help processors and compilers? - to hide cache miss latency, and for StoreLoad, to hide even cache hit latency.
Does hardware memory barrier make visibility of atomic operations faster in addition to providing necessary guarantees? - no, they just wait (blocking execution of later work) until earlier loads and/or stores complete on their own.
Out of Order Execution and Memory Fences - some other people's explanations of what memory fences are.