Suppose a program has a caching mechanism where, at the end of some specific calculation, it writes the output of that calculation to disk to avoid re-computing it later, when the program is re-run. It does so for a large number of calculations, saving each output to a separate file (one per calculation, with filenames determined by hashing the computation parameters). The data is written to the file with standard C++ streams:
#include <cstddef>
#include <fstream>
#include <string>

void* data = /* result of computation */;
std::size_t dataSize = /* size of the result in bytes */;
std::string cacheFile = /* unique filename for this computation */;

std::ofstream out(cacheFile, std::ios::binary);
out << dataSize;                                      // the size, as decimal text
out.write(static_cast<const char*>(data), dataSize); // then the raw bytes
The calculation is deterministic, hence the data written to a given file will always be the same.
Question: is it safe for multiple threads (or processes) to attempt this simultaneously, for the same calculation, and with the same output file? It does not matter if some threads or processes fail to write the file, as long as at least one succeeds, and as long as all programs are left in a valid state.
In the manual tests I ran, no program failure or data corruption occurred, and the file was always created with the correct content, but this may be platform-dependent. For reference, in our specific case, the size of the data ranges from 2 to 50 kilobytes.
"is it safe for multiple threads (or processes) to attempt this simultaneously, for the same calculation, and with the same output file?"
Multiple threads writing to the same file is a race condition, and you may end up with a corrupted file: there is no guarantee that ofstream::write is atomic, and whether it is depends on the particular platform and filesystem.
The robust solution to your problem (it works both with multiple threads and/or processes):

1. Write the data into a temporary file with a unique name, in the same directory as the final file (on the same filesystem, so that the rename in the next step does not move data).
2. rename the temporary file to its final name. It replaces the existing file if one is there. The non-portable renameat2 is more flexible (both are sketched below).
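
A minimal sketch of the write-then-rename pattern, assuming POSIX rename semantics (std::rename atomically replaces an existing target on POSIX; on Windows it fails if the target exists, and MoveFileEx with MOVEFILE_REPLACE_EXISTING would be needed instead). The writeCacheFile name and the PID-plus-thread-id suffix scheme are illustrative:

#include <cstddef>
#include <cstdio>      // std::rename, std::remove
#include <fstream>
#include <sstream>
#include <string>
#include <thread>
#include <unistd.h>    // getpid (POSIX)

bool writeCacheFile(const std::string& cacheFile,
                    const void* data, std::size_t dataSize)
{
    // Temporary name in the same directory as cacheFile, so the rename
    // below stays on one filesystem; PID plus thread id keeps the name
    // unique across concurrent writers.
    std::ostringstream tmp;
    tmp << cacheFile << ".tmp." << getpid() << '.'
        << std::this_thread::get_id();
    const std::string tmpName = tmp.str();

    std::ofstream out(tmpName, std::ios::binary);
    out << dataSize;                                      // size, as in the question
    out.write(static_cast<const char*>(data), dataSize);
    out.close(); // flush; the stream state records any write error

    // Atomically publish the complete file. Racing writers each install
    // a full, valid copy, so whichever rename lands last simply wins.
    if (!out || std::rename(tmpName.c_str(), cacheFile.c_str()) != 0) {
        std::remove(tmpName.c_str()); // clean up our temporary on failure
        return false;
    }
    return true;
}

Since the calculation is deterministic, every writer installs identical bytes, so it does not matter which rename wins; readers only ever see either no file or a complete one.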
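
And a Linux-only sketch of the renameat2 variant (the call needs Linux 3.15 or later and glibc 2.28 or later with _GNU_SOURCE defined; publishOnce is a hypothetical helper name). With RENAME_NOREPLACE the first writer wins, and later writers get EEXIST instead of replacing the already published file:

#include <cerrno>
#include <cstdio>   // std::remove; glibc also declares renameat2 and the
                    // RENAME_* flags here when _GNU_SOURCE is defined
#include <fcntl.h>  // AT_FDCWD

bool publishOnce(const char* tmpPath, const char* cachePath)
{
    if (renameat2(AT_FDCWD, tmpPath, AT_FDCWD, cachePath,
                  RENAME_NOREPLACE) == 0)
        return true;              // we published the cache entry
    if (errno == EEXIST)
        std::remove(tmpPath);     // another writer won; discard ours
    return false;
}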