I am having some problems with an MPI code (written by me to test another program where different workloads are associated with different processors). The problem is that when I use a number of processors other than 1 or arraySize (4 in this case), the program blocks during MPI_Send. In particular, when I run
mpirun -np 2 MPItest
the program blocks during the call. I am not using a debugger for now; I just want to understand why the program works with 1 and 4 processors but not with 2 processors (2 spots in the array per processor). The code is below:
#include <mpi.h>
#include <iostream>
int main(int argc, char** argv) {
int rank, size;
const int arraySize = 4;
MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &size);
// every processor has a different workload (1 or more spots in the array to send to the other processors)
// every processor sends to every other processor its designated spots
int* sendbuf = new int[arraySize];
int* recvbuf = new int[arraySize];
int istart = arraySize/size * rank;
int istop = (rank == size) ? arraySize : istart + arraySize/size;
for (int i = istart; i < istop; i++) {
sendbuf[i] = i;
}
std::cout << "Rank " << rank << " sendbuf :" << std::endl;
//print the sendbuf before receiving its other values
for (int i = 0; i < arraySize; i++) {
std::cout << sendbuf[i] << ", ";
}
std::cout << std::endl;
// sending designated spots of sendbuf to other processors
for(int i = istart; i < istop; i++){
for(int j = 0; j < size; j++){
MPI_Send(&sendbuf[i], 1, MPI_INT, j, i, MPI_COMM_WORLD);
}
}
// receiving the full array
for(int i = 0; i < arraySize ; i++){
int recvRank = i/(arraySize/size);
MPI_Recv(&recvbuf[i], 1, MPI_INT, recvRank, i, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
}
// print the recvbuf after receiving its other values
std::cout << "Rank " << rank << " recvbuf :" << std::endl;
for (int i = 0; i < arraySize; i++) {
std::cout << recvbuf[i] << ", ";
}
std::cout << std::endl;
delete[] sendbuf;
delete[] recvbuf;
MPI_Finalize();
return 0;
}
I am using the tags to differentiate between the different spots in the array (maybe that is the problem?).
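For reference, the same exchange can also be expressed with a collective such as MPI_Allgather, which avoids manual tags and matching entirely. A minimal sketch, assuming arraySize is divisible by size:

#include <mpi.h>
#include <iostream>

int main(int argc, char** argv) {
    const int arraySize = 4;
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    int chunk = arraySize / size; // spots per rank; assumes arraySize % size == 0
    int* sendbuf = new int[chunk];
    int* recvbuf = new int[arraySize];
    for (int i = 0; i < chunk; i++) {
        sendbuf[i] = rank * chunk + i; // this rank's designated spots
    }
    // every rank contributes its chunk; every rank receives the full array
    MPI_Allgather(sendbuf, chunk, MPI_INT, recvbuf, chunk, MPI_INT, MPI_COMM_WORLD);
    std::cout << "Rank " << rank << " recvbuf: ";
    for (int i = 0; i < arraySize; i++) {
        std::cout << recvbuf[i] << ", ";
    }
    std::cout << std::endl;
    delete[] sendbuf;
    delete[] recvbuf;
    MPI_Finalize();
    return 0;
}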
I tried different numbers of processors: with 1 processor the program works, with 4 processors it also works, with 3 processors it crashes (likely a separate issue: arraySize/size is 4/3 = 1, so recvRank = i/(arraySize/size) evaluates to 3 for the last element, which is not a valid rank, and the rank == size guard never fires because ranks run from 0 to size-1), and with 2 processors the program blocks. I also tried MPI_Isend, but it doesn't work either (the flag is 0); the modified code with MPI_Isend is below:
#include <mpi.h>
#include <iostream>
int main(int argc, char** argv) {
int rank, size;
const int arraySize = 4;
MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &size);
// every processor has a different workload (1 or more spots in the array to send to the other processors)
// every processor sends to every other processor its designated spots
int* sendbuf = new int[arraySize];
int* recvbuf = new int[arraySize];
int istart = arraySize/size * rank;
int istop = (rank == size) ? arraySize : istart + arraySize/size;
for (int i = istart; i < istop; i++) {
sendbuf[i] = i;
}
std::cout << "Rank " << rank << " sendbuf :" << std::endl;
//print the sendbuf before receiving its other values
for (int i = 0; i < arraySize; i++) {
std::cout << sendbuf[i] << ", ";
}
std::cout << std::endl;
// sending designated spots of sendbuf to other processors
for(int i = istart; i < istop; i++){
for(int j = 0; j < size; j++){
MPI_Request request;
//MPI_Send(&sendbuf[i], 1, MPI_INT, j, i, MPI_COMM_WORLD);
MPI_Isend(&sendbuf[i], 1, MPI_INT, j, i, MPI_COMM_WORLD, &request);
// check whether the send has completed
int flag = 0;
MPI_Test(&request, &flag, MPI_STATUS_IGNORE);
const int numberOfRetries = 10;
if(flag == 0){ // operation not completed
std::cerr << "Error in sending, waiting" << std::endl;
for(int k = 0; k < numberOfRetries; k++){
MPI_Test(&request, &flag, MPI_STATUS_IGNORE);
if(flag == 1){
break;
}
}
if(flag == 0){
std::cerr << "Error in sending, aborting" << std::endl;
MPI_Abort(MPI_COMM_WORLD, 1);
}
}
}
}
// receiving the full array
for(int i = 0; i < arraySize ; i++){
int recvRank = i/(arraySize/size);
MPI_Recv(&recvbuf[i], 1, MPI_INT, recvRank, i, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
}
// print the recvbuf after receiving its other values
std::cout << "Rank " << rank << " recvbuf :" << std::endl;
for (int i = 0; i < arraySize; i++) {
std::cout << recvbuf[i] << ", ";
}
std::cout << std::endl;
//MPI_Alltoall(sendbuf, 1, MPI_INT, recvbuf, 1, MPI_INT, MPI_COMM_WORLD);
delete[] sendbuf;
delete[] recvbuf;
MPI_Finalize();
return 0;
}
With this code, -np 4 doesn't work either.
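As far as I understand, flag == 0 from MPI_Test right after MPI_Isend is not an error: a nonblocking send is allowed to stay pending until the matching receive is posted, so polling a fixed number of times and then aborting is not a reliable completion check. The usual pattern is to post all the sends, post the receives, and only then block in MPI_Waitall. A minimal sketch of the same exchange written that way (still assuming arraySize is divisible by size):

#include <mpi.h>
#include <iostream>
#include <vector>

int main(int argc, char** argv) {
    int rank, size;
    const int arraySize = 4;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    int* sendbuf = new int[arraySize];
    int* recvbuf = new int[arraySize];
    int istart = arraySize / size * rank;
    int istop = istart + arraySize / size; // assumes arraySize % size == 0
    for (int i = istart; i < istop; i++) {
        sendbuf[i] = i;
    }
    // post all nonblocking sends and keep every request
    std::vector<MPI_Request> requests;
    for (int i = istart; i < istop; i++) {
        for (int j = 0; j < size; j++) {
            MPI_Request req;
            MPI_Isend(&sendbuf[i], 1, MPI_INT, j, i, MPI_COMM_WORLD, &req);
            requests.push_back(req);
        }
    }
    // post the matching blocking receives; every send is already posted,
    // so no receive can wait on a send that doesn't exist yet
    for (int i = 0; i < arraySize; i++) {
        int recvRank = i / (arraySize / size);
        MPI_Recv(&recvbuf[i], 1, MPI_INT, recvRank, i, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }
    // block until every send has completed instead of polling MPI_Test
    MPI_Waitall(static_cast<int>(requests.size()), requests.data(), MPI_STATUSES_IGNORE);
    std::cout << "Rank " << rank << " recvbuf: ";
    for (int i = 0; i < arraySize; i++) {
        std::cout << recvbuf[i] << ", ";
    }
    std::cout << std::endl;
    delete[] sendbuf;
    delete[] recvbuf;
    MPI_Finalize();
    return 0;
}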
Since I have not received any answer yet, I want to add some insight into the problem to help anyone who finds themselves in the same situation.
I tested another program to see whether the Open MPI installation on my laptop was working correctly, since too many things were failing that should be valid according to the standard, and even example code from the internet would not run on my laptop. The following code is a very simple program that sends part of an array between two processes:
#include <mpi.h>
#include <iostream>
int main(int argc, char** argv) {
int rank, size;
const int arraySize = 5;
MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &size);
// initialize sendbuf
int* sendbuf = new int[arraySize];
for(int i = 0; i < arraySize; i++){
sendbuf[i] = -1; // initialize so the first print does not read uninitialized memory
}
for(int iteration = 0; iteration < 3; iteration++){
if(rank){
std::cout << "Rank " << rank << " sendbuf :" << std::endl;
for (int i = 0; i < arraySize; i++) {
std::cout << sendbuf[i] << ", ";
}
std::cout << std::endl;
}
// first process send first three elements to second process
if(rank == 0){
for(int i = 0; i < 3; i++){
sendbuf[i] = i;
}
MPI_Send(&sendbuf[0], 3, MPI_INT, 1, 0, MPI_COMM_WORLD);
} else {
for(int i = 3; i < 5; i++){
sendbuf[i] = i;
}
}
// receive the first three elements with a blocking MPI_Recv
if(rank){
// second process receive the first three elements from first process
MPI_Recv(&sendbuf[0], 3, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
}
// print the full array
if(rank){
std::cout << "Rank " << rank << " sendbuf after:" << std::endl;
for (int i = 0; i < arraySize; i++) {
std::cout << sendbuf[i] << ", ";
}
std::cout << std::endl;
}
// reset the buffer for the next iteration
for(int i = 0; i < arraySize; i++){
sendbuf[i] = -1;
}
}
delete[] sendbuf;
MPI_Finalize();
return 0;
}
I wanted to see whether a single send and a single receive would work in a loop on my laptop, and to my surprise (after two days of trying everything), the problem is with my laptop's Open MPI setup. To check whether it was my machine or not, I ran the same code on a cluster I have access to, where the MPI installation is known to work. The code works on the cluster but not on my laptop.
To conclude: this is not a full solution, but it answers my question of why the code was not working. As pointed out by @GillesGouaillardet, it seems it was a problem with the default network interface used by mpirun, and specifying a network interface with no firewall rules on it seems to be the solution.
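For anyone hitting the same symptom: assuming the TCP transport is the one being blocked by the firewall, Open MPI lets you restrict it to a specific interface via an MCA parameter, for example the loopback interface (the interface name is system-dependent):
mpirun --mca btl_tcp_if_include lo -np 2 MPItest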