I have a working serial code and a working single-GPU code parallelized with OpenACC. Now I am trying to increase the parallelism by running on multiple GPUs using the MPI+OpenACC paradigm. The code is written in Fortran 90 and compiled with the nvfortran compiler from NVIDIA's HPC SDK.
I have a few beginner-level questions:
As @Vladimir F pointed out, your question is very broad, so if you have further questions about specific points you should consider posting each point individually. That said, I'll try to answer each.
Use `mpif90` for everything. For instance, `mpif90 -acc=gpu` will build the files that contain OpenACC with GPU support, and files that don't contain OpenACC will compile normally. The MPI module should be found automatically during compilation, and the MPI libraries will be linked in.

Use the `acc host_data use_device` directive to pass the GPU version of your data to MPI. I don't have a Fortran example with MPI, but it looks similar to the call in this file: https://github.com/jefflarkin/openacc-interoperability/blob/master/openacc_cublas.f90#L19

This code uses both OpenACC and MPI, but it doesn't use the `host_data` directive I referenced in 3: https://github.com/UK-MAC/CloverLeaf_OpenACC. If I find another, I'll update this answer. It's a common pattern, but I don't have an open code handy at the moment.
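
In the meantime, here is a minimal sketch of how `host_data use_device` can be combined with an MPI call. It is not taken from either of the linked codes. The program name, array names, and ring-exchange pattern are made up for illustration, and it assumes your MPI library is CUDA-aware, so that it accepts device addresses directly. If yours is not, you would need `!$acc update host(...)` and `!$acc update device(...)` around the MPI call instead.

```fortran
! Hypothetical sketch: each rank fills an array on the GPU and sends it to its
! right-hand neighbour in a ring, receiving from its left-hand neighbour.
! Assumption: the MPI library is CUDA-aware.
! Possible build line (file name is illustrative): mpif90 -acc=gpu ring_sketch.f90
program host_data_mpi_sketch
  use mpi
  implicit none
  integer, parameter :: n = 1024
  double precision :: sendbuf(n), recvbuf(n)
  integer :: rank, nranks, left, right, ierr, i
  integer :: status(MPI_STATUS_SIZE)

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, nranks, ierr)

  ! Neighbours in a periodic ring (with one rank, both are rank 0 itself)
  right = mod(rank + 1, nranks)
  left  = mod(rank - 1 + nranks, nranks)

  ! Keep both buffers on the device; recvbuf is copied back at the end
  !$acc data create(sendbuf) copyout(recvbuf)

  ! Fill the send buffer on the GPU
  !$acc parallel loop
  do i = 1, n
    sendbuf(i) = dble(rank)
  end do

  ! Inside host_data, sendbuf and recvbuf refer to their device addresses,
  ! so MPI reads from and writes to GPU memory directly
  !$acc host_data use_device(sendbuf, recvbuf)
  call MPI_Sendrecv(sendbuf, n, MPI_DOUBLE_PRECISION, right, 0, &
                    recvbuf, n, MPI_DOUBLE_PRECISION, left,  0, &
                    MPI_COMM_WORLD, status, ierr)
  !$acc end host_data

  !$acc end data

  ! recvbuf is now valid on the host; every element should equal the left rank
  if (all(recvbuf == dble(left))) then
    print *, 'rank', rank, 'received the expected data from rank', left
  else
    print *, 'rank', rank, 'received unexpected data'
  end if

  call MPI_Finalize(ierr)
end program host_data_mpi_sketch
```

If you compile without `-acc`, the directives are treated as comments and the same exchange runs on host memory, which can be a handy way to check the MPI logic on its own first.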