c++ mpi mpich

Calculating sum of array with nonblocking operations


Every process needs to calculate its partial sum and send it to process 0, which then computes the sum of the whole array. I wrote this code:

    double* a;
    a = new double[N];
    for (int i = 0; i < N; i++)
        a[i] = 1.0;
    int k = (N - 1) / proc_count + 1;
    int ibeg = proc_this * k;
    int iend = (proc_this + 1) * k - 1;
    if (ibeg >= N)
        iend = ibeg - 1;
    else if(iend >= N)
        iend = N - 1;
    double s = 0;
    for (int i = ibeg; i <= iend; i++)
        s += a[i];
    MPI_Status* stats = new MPI_Status[proc_count];
    MPI_Request* reqs = new MPI_Request[proc_count];
    double* inmes = new double[proc_count];
    inmes[0] = s;
    if (proc_this != 0)
        MPI_Isend(&s, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, &reqs[proc_this]);
    else
        for (int i = 1; i < proc_count; i++) 
            MPI_Irecv(&inmes[i], 1, MPI_DOUBLE, i, 0, MPI_COMM_WORLD, &reqs[0]);
    MPI_Waitall(proc_count, reqs, stats);
    MPI_Finalize();
    if (proc_this == 0) {
        for (int i = 1; i<proc_count; i++)
            inmes[0] += inmes[i];
        printf("sum = %f", inmes[0]);
    }
    delete[] a;

but it keeps giving this error:

Fatal error in PMPI_Waitall: Invalid MPI_Request, error stack:
PMPI_Waitall(274): MPI_Waitall(count=1, req_array=00000212B7B24A40, status_array=00000212B7B34740) failed
PMPI_Waitall(250): The supplied request in array element 0 was invalid (kind=3)

Could you explain what I am doing wrong?


Solution

  • In short, you need to set all elements of reqs to MPI_REQUEST_NULL right after allocating it.

    The longer answer is that MPI programs run as multiple instances of one or more source programs and each instance (rank) has its own set of variables that aren't shared. When you have:

        if (proc_this != 0)
            MPI_Isend(&s, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, &reqs[proc_this]);
        else
            for (int i = 1; i < proc_count; i++) 
                MPI_Irecv(&inmes[i], 1, MPI_DOUBLE, i, 0, MPI_COMM_WORLD, &reqs[0]);
    

    you expect reqs to end up filled with valid requests:

    reqs: [ Irecv req | Isend req 1 | Isend req 2 | ... ]
    

    The reality is that you'll have:

    reqs in rank 0: [ Irecv req |    ???    |    ???    | ... ??? ... ]
    reqs in rank 1: [    ???    | Isend req |    ???    | ... ??? ... ]
    reqs in rank 2: [    ???    |    ???    | Isend req | ... ??? ... ]
    etc.
    

    where ??? stands for uninitialised memory. MPI_Waitall() is a local operation and it only sees the local copy of reqs. It cannot complete requests posted by other ranks.

    Uninitialised memory can have any value in it and if that value results in an invalid request handle, MPI_Waitall() will abort with an error. If you set all the requests to MPI_REQUEST_NULL, this will not happen as null requests are ignored.
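
    As a minimal sketch of that initialisation (the helper name wait_on_null_requests is hypothetical, and std::vector stands in for the new[] allocation used in the question), every slot starts out as MPI_REQUEST_NULL, so MPI_Waitall() has nothing invalid to trip over even if no request is ever posted:

    ```cpp
    #include <mpi.h>
    #include <vector>

    // Hypothetical helper, not part of the original code.
    // Builds an array of `count` request slots, all set to MPI_REQUEST_NULL,
    // and waits on it. Null requests are skipped, so the call returns at once.
    int wait_on_null_requests(int count) {
        // Equivalent for a raw array allocated with new[] (needs <algorithm>):
        //   std::fill(reqs, reqs + count, MPI_REQUEST_NULL);
        std::vector<MPI_Request> reqs(count, MPI_REQUEST_NULL);

        // A rank would post its own MPI_Isend/MPI_Irecv into some slots here;
        // the slots it never touches stay null and are ignored below.
        return MPI_Waitall(count, reqs.data(), MPI_STATUSES_IGNORE);
    }
    ```

    The function has to be called between MPI_Init() and MPI_Finalize(), like any other MPI call.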

    There is also a semantic error in your code:

    MPI_Irecv(&inmes[i], 1, MPI_DOUBLE, i, 0, MPI_COMM_WORLD, &reqs[0]);
    

    stores every receive request in reqs[0], with each new request overwriting the previous one. You will therefore never be able to wait for any request except the last one, and the handles of the earlier receives are lost.


    Given that you are calling MPI_Waitall() right after MPI_Isend(), there is no point in using non-blocking sends. A much cleaner version of the code would be:

        if (proc_this != 0)
            MPI_Send(&s, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
        else {
            MPI_Request* reqs = new MPI_Request[proc_count - 1];
            for (int i = 1; i < proc_count; i++) 
                MPI_Irecv(&inmes[i], 1, MPI_DOUBLE, i, 0, MPI_COMM_WORLD, &reqs[i-1]);
            MPI_Waitall(proc_count-1, reqs, MPI_STATUSES_IGNORE);
            delete [] reqs;
        }
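
    The same pattern could be wrapped in a function along the following lines. This is only a sketch: the name sum_on_root is hypothetical, std::vector replaces the manual new[]/delete[], and inmes is made local to the function, which is an assumption about how the surrounding code would be organised.

    ```cpp
    #include <mpi.h>
    #include <vector>

    // Hypothetical helper, not part of the original code.
    // Rank 0 returns the sum of all partial sums; every other rank sends its
    // partial sum to rank 0 and returns its own value unchanged.
    double sum_on_root(double s, int proc_this, int proc_count) {
        if (proc_this != 0) {
            // Blocking send: nothing else to overlap with, so no request needed.
            MPI_Send(&s, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
            return s;
        }

        std::vector<double> inmes(proc_count, 0.0);
        inmes[0] = s;  // rank 0's own contribution

        // One request slot per receive, so no handle is overwritten.
        std::vector<MPI_Request> reqs(proc_count - 1, MPI_REQUEST_NULL);
        for (int i = 1; i < proc_count; i++)
            MPI_Irecv(&inmes[i], 1, MPI_DOUBLE, i, 0, MPI_COMM_WORLD, &reqs[i - 1]);

        // Skip the wait when running on a single process (no receives posted).
        if (!reqs.empty())
            MPI_Waitall(proc_count - 1, reqs.data(), MPI_STATUSES_IGNORE);

        double total = 0.0;
        for (double v : inmes)
            total += v;
        return total;
    }
    ```

    Every rank calls sum_on_root(s, proc_this, proc_count) after computing its partial sum s and before MPI_Finalize(); only the value returned on rank 0 is the full sum.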