My understanding of the semantics of volatile in C and C++ is that it turns memory accesses into (observable) side effects. Whenever reading from or writing to a memory-mapped file (or shared memory) I would expect the pointer to be volatile-qualified, to indicate that this is in fact I/O. (John Regehr wrote a very good article on the semantics of volatile.)
Furthermore, I would expect using functions like memcpy() to access shared memory to be incorrect, since its signature implies the volatile qualifier is cast away, so the memory accesses will not be treated as I/O.
In my mind, this is an argument in favor of std::copy(), where the volatile qualifier is not cast away and the memory accesses are correctly treated as I/O.
However, my experience of using pointers to volatile objects and std::copy() to access memory-mapped files is that it is orders of magnitude slower than just using memcpy(). I am tempted to conclude that perhaps clang and GCC are overly conservative in their treatment of volatile. Is that the case?
What guidance is there for accessing shared memory with regard to volatile, if I want to follow the letter of the standard and have it back the semantics I rely on?
Relevant quote from the standard [intro.execution] §14:
Reading an object designated by a volatile glvalue, modifying an object, calling a library I/O function, or calling a function that does any of those operations are all side effects, which are changes in the state of the execution environment. Evaluation of an expression (or a subexpression) in general includes both value computations (including determining the identity of an object for glvalue evaluation and fetching a value previously assigned to an object for prvalue evaluation) and initiation of side effects. When a call to a library I/O function returns or an access through a volatile glvalue is evaluated the side effect is considered complete, even though some external actions implied by the call (such as the I/O itself) or by the volatile access may not have completed yet.
I think that you're overthinking this. I don't see any reason for mmap or equivalent (I'll use the POSIX terminology here) memory to be volatile.
From the point of view of the compiler, mmap returns an object that is modified and then given to msync or munmap, or implicitly unmapped during _Exit. Those functions need to be treated as I/O; nothing else does.
You could pretty much replace mmap with malloc+read and munmap with write+free, and you would get most of the guarantees of when and how I/O is done.
Note that this doesn't even require the data to be fed back to munmap; it was just easier to demonstrate it that way. You can have mmap return a piece of memory and also save it internally in a list, then have a function (let's call it msyncall) that takes no arguments and writes out all the memory that previous calls to mmap returned. We can then build on that, saying that any function that performs I/O has an implicit msyncall. We don't need to go that far, though. From the point of view of the compiler, libc is a black box where some function returned some memory; that memory has to be in sync before any other call into libc, because the compiler can't know which bits of memory previously returned from libc are still referenced and in active use inside it.
The above paragraph is how it works in practice, but how can we approach it from the point of view of the standard? Let's look at a similar problem first. For threads, shared memory is only synchronized at some very specific function calls. This is quite important because modern CPUs reorder reads and writes, memory barriers are expensive, and old CPUs could need explicit cache flushes before written data was visible to others (be it other threads, processes, or I/O). The specification for mmap says:
The application must ensure correct synchronization when using mmap() in conjunction with any other file access method
but it doesn't specify how that synchronization is done. I know that in practice the synchronization pretty much has to be msync, because there are still systems out there where read/write do not use the same page cache as mmap.