I'm setting up IO for a large-scale CFD code using the MPI library, and the file IO is starting to eat into computation time as my problems scale.
As far as I can find, the "done" thing in the modern context is heavy utilisation of collective IO operations (Performance of Parallel IO on ARCHER, whitepaper from 2015).
My problem is there appear to be three ways of calling a collective write:

- `MPI_File_write_all` (blocking)
- `MPI_File_iwrite_all` (non-blocking)
- and, somewhat speculatively, `MPI_File_iwrite` followed by a call to `MPI_File_sync` (non-blocking, then blocking?)
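For concreteness, here is a minimal sketch of the three call patterns I mean (the file handle, buffer and count are placeholder names, and the per-rank file view is assumed to have been set up elsewhere):

```c
#include <mpi.h>

/* Sketch only: fh is an open file handle with a per-rank view already set;
 * buf and count are placeholder names for the local data to be written. */
void write_variants(MPI_File fh, const double *buf, int count)
{
    MPI_Request req;

    /* 1) Blocking collective write: returns once buf may be reused locally. */
    MPI_File_write_all(fh, buf, count, MPI_DOUBLE, MPI_STATUS_IGNORE);

    /* 2) Non-blocking collective write: completed later with MPI_Wait/MPI_Test. */
    MPI_File_iwrite_all(fh, buf, count, MPI_DOUBLE, &req);
    MPI_Wait(&req, MPI_STATUS_IGNORE);

    /* 3) Non-blocking independent write, locally completed, then the
     *    collective MPI_File_sync. */
    MPI_File_iwrite(fh, buf, count, MPI_DOUBLE, &req);
    MPI_Wait(&req, MPI_STATUS_IGNORE);
    MPI_File_sync(fh);
}
```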
I say speculatively because the former call is explicitly non-collective, but the latter (which to my knowledge is what actually pushes the data to storage) is collective.
My question is: are multiple `MPI_File_iwrite`s followed by an `MPI_File_sync` equivalent to an `MPI_File_write_all`, in that the file sync makes the non-collective writes effectively collective?
Edit: for clarity, I am aware that sync is a collective routine; I'm asking whether the IO that happens when sync is called is analogous to the collective IO of a write_all.
Follow-up: does an `MPI_File_iwrite_all` call require an `MPI_File_sync` call, and if it does, what is the purpose of a collective non-blocking write if it just becomes blocking down the line?
I'm focusing quite a bit on blocking vs. non-blocking here because I'm trying to fully remove all synchronisation from my code to improve CPU utilisation (i.e. processes only wait if they lack the information they need from their neighbours, as opposed to waiting for all processes to sync up), but obviously this becomes somewhat problematic when it comes to outputting.
Your question concerns three orthogonal MPI concepts: local completion of operations, process synchronization, and data consistency.
The main difference between blocking and non-blocking operations concerns the process-local state of the operation. A blocking operation completes before the blocking call returns; a non-blocking operation completes with a successful completion call (e.g. `MPI_Wait` or `MPI_Test`). Until the operation completes locally, the MPI library "owns" the buffers you pass into the function.
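As a minimal sketch of what local completion means for the non-blocking collective write (buffer and function names here are illustrative, not from the question):

```c
#include <mpi.h>

/* Sketch: overlap computation with a non-blocking collective write.
 * Until MPI_Wait returns, the MPI library still owns buf. */
void overlap_write(MPI_File fh, double *buf, int count)
{
    MPI_Request req;

    MPI_File_iwrite_all(fh, buf, count, MPI_DOUBLE, &req);

    /* ... computation that does not touch buf ... */

    MPI_Wait(&req, MPI_STATUS_IGNORE);  /* local completion: buf may now be reused */
}
```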
Only a small subset of MPI functions imply synchronization. In particular, collective communication does not necessarily imply synchronization.
Completion of File-IO functions does not establish data consistency (or global visibility of the operation's impact).
`MPI_File_sync` establishes data consistency for file accesses. It is only necessary if data written to a file should be visible to a subsequent read from a different process. Example 14.6 in MPI-4.1 points out that a sequence equivalent to `MPI_File_sync` + `MPI_Barrier` + `MPI_File_sync` is actually necessary to establish data consistency between writing and reading from a file. The reason is that `MPI_File_sync` is collective but not synchronizing.
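A sketch of that sequence, modelled loosely on Example 14.6 (ranks, offsets and names are illustrative):

```c
#include <mpi.h>

/* Sketch: rank 0 writes, rank 1 reads the same bytes afterwards.
 * The sync + barrier + sync sequence makes the written data visible. */
void write_then_read(MPI_File fh, MPI_Comm comm)
{
    int rank;
    double value = 42.0;
    MPI_Comm_rank(comm, &rank);

    if (rank == 0)
        MPI_File_write_at(fh, 0, &value, 1, MPI_DOUBLE, MPI_STATUS_IGNORE);

    MPI_File_sync(fh);   /* collective: writer's data reaches the storage system */
    MPI_Barrier(comm);   /* orders the write before the read */
    MPI_File_sync(fh);   /* collective: reader must not use stale cached data */

    if (rank == 1)
        MPI_File_read_at(fh, 0, &value, 1, MPI_DOUBLE, MPI_STATUS_IGNORE);
}
```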
Whether you need `MPI_File_sync` at all depends on how your application accesses the file. If you need `MPI_File_sync`, you need it independent of the flavor of write call: you will need it with the collective write as well as the non-collective write functions. Using non-blocking writes, you need to locally complete (test/wait) all active file-IO operations on the file handle before you can call `MPI_File_sync`.
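A minimal sketch of that ordering, assuming a hypothetical array of pending write requests on the same file handle:

```c
#include <mpi.h>

/* Sketch: locally complete every outstanding non-blocking write on fh
 * before entering the collective MPI_File_sync. */
void complete_then_sync(MPI_File fh, MPI_Request reqs[], int nreqs)
{
    MPI_Waitall(nreqs, reqs, MPI_STATUSES_IGNORE);  /* local completion of all writes */
    MPI_File_sync(fh);                              /* collective consistency point */
}
```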