atomic, shared-memory, memory-barriers, interprocess, posix-mq

Should I handle memory ordering when using POSIX MQ and SHM across multiple processes?


I'm using a block of SHM to share data among multiple Linux processes. When the producer puts some data into the SHM, it sends a message to the consumer through a POSIX MQ. This message carries an offset into the SHM, with which the consumer can locate the data exactly.
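My producer side looks roughly like this (the names /data_shm and /data_q, the sizes, and the "one size_t offset per message" format are simplified for illustration; error handling omitted):

```cpp
#include <cstring>
#include <fcntl.h>
#include <mqueue.h>
#include <sys/mman.h>
#include <unistd.h>

int main()
{
    // Map the shared block and open the queue (both created elsewhere).
    const size_t shm_size = 1 << 20;
    int shm_fd = shm_open("/data_shm", O_RDWR, 0600);
    char* shm = static_cast<char*>(
        mmap(nullptr, shm_size, PROT_READ | PROT_WRITE, MAP_SHARED, shm_fd, 0));

    mqd_t mq = mq_open("/data_q", O_WRONLY);

    // Producer: write the payload into SHM, then tell the consumer where it is.
    const char payload[] = "hello";
    size_t offset = 4096;
    std::memcpy(shm + offset, payload, sizeof payload);
    mq_send(mq, reinterpret_cast<const char*>(&offset), sizeof offset, 0);

    mq_close(mq);
    munmap(shm, shm_size);
    close(shm_fd);
}
```

The consumer mirrors this: it mq_receives the offset and then reads from shm + offset. My question is whether anything more (fences, atomics) is needed between those steps.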

I'm worried that there is a memory order/barrier problem here, just like the one we often face in a multi-threaded scheme. The difference is that we are using a POSIX MQ. Is there any synchronizing mechanism inside it, like the std::memory_order operations on an atomic variable in the C++ standard library? If so, I don't need any special treatment. In other words, from the consumer's point of view, is the data definitely ready/clean as soon as the consumer has received the message?

I have tested this in my own program, but I don't have enough confidence to let it land in my production code.


Solution

  • The POSIX spec unfortunately doesn't include a formal memory model, and isn't very careful about specifying things like memory ordering. The best we get is a list in 4.12 of various system calls that "synchronize memory with respect to other threads" (a term which is not itself defined in the spec). The list includes the obvious pthread_* and sem_* calls, but not mq_*.

    That said, it seems to me like an entirely reasonable assumption that the message queue calls would act as appropriate memory barriers. I've tended to interpret POSIX as intending to say that there is sequential consistency for system calls that have some globally visible effect on the system, since I think that's what everyone would have assumed in the days when it was written. So for instance, if you write a byte to a file in one process, and then read it in another, that should likewise act as an acquire/release pair, and any shared memory accesses done before the file write should be globally visible after the file read. I'd expect the same to apply to message queues.

    On a more practical level, it seems to me that an implementation would have to include appropriate barriers in order for the mq_* functions to operate correctly in and of themselves, and so you get the benefit of that as well.

    If you want to use extra caution, you could precede your mq_send with an atomic_thread_fence(memory_order_release), and follow your mq_receive with an atomic_thread_fence(memory_order_acquire). The overhead should be negligible compared to the cost of the message queue operations themselves.
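    A minimal sketch of that belt-and-braces variant (the single-size_t message format, and the queue having been created with mq_msgsize == sizeof(size_t), are my assumptions; error handling omitted):

    ```cpp
    #include <atomic>
    #include <cstddef>
    #include <mqueue.h>

    // Producer: all SHM stores for this item are done before calling this.
    void send_offset(mqd_t mq, size_t offset)
    {
        std::atomic_thread_fence(std::memory_order_release);  // extra-caution fence before the send
        mq_send(mq, reinterpret_cast<const char*>(&offset), sizeof offset, 0);
    }

    // Consumer: do all SHM loads for this item only after this returns.
    // Assumes the queue was created with mq_msgsize == sizeof(size_t).
    size_t receive_offset(mqd_t mq)
    {
        size_t offset = 0;
        mq_receive(mq, reinterpret_cast<char*>(&offset), sizeof offset, nullptr);
        std::atomic_thread_fence(std::memory_order_acquire);  // pairs with the producer's release fence
        return offset;
    }
    ```

    If the fences turn out to be redundant on a given implementation, they cost essentially nothing, as the next point notes.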

    On strongly ordered systems like x86, there'd be no overhead at all. There, normal loads and stores already have acquire/release semantics, so the fence doesn't emit any barrier instructions. It only acts as a compiler barrier, and since the mq_* system call functions are presumably opaque to the compiler, it wouldn't have reordered memory accesses around them anyway.