Tags: c++, parallel-processing, mpi, communication, broadcasting

What's the point of using MPI_Bcast when all ranks can already see the data that will be broadcast?


I'm wondering about the reason for using MPI_Bcast, because even when I do not broadcast the integer N to all ranks, they can still see N. Look at the code and its result: both before and after broadcasting, the integer N is visible to all ranks. So what's the point? Also, does it make sense that using MPI_Bcast changes the order in which the ranks print?

#include <iostream>
#include "mpi.h"
using namespace std;

int main()
{
    MPI_Init(NULL, NULL);
    int rank, size;
    int N = 9;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    cout << " Hello from rank : " << rank << " N is: " << N << endl;
    MPI_Bcast(&N, 1, MPI_INT, 0, MPI_COMM_WORLD);
    cout << " Hello from rank : " << rank << " N is: " << N << endl;
    MPI_Finalize();
}

Result:

 Hello from rank : 1 N is: 9
 Hello from rank : 3 N is: 9
 Hello from rank : 0 N is: 9
 Hello from rank : 2 N is: 9
 Hello from rank : 0 N is: 9
 Hello from rank : 1 N is: 9
 Hello from rank : 2 N is: 9
 Hello from rank : 3 N is: 9

Solution

  • MPI_Bcast becomes useful when N in your example is actually changed by the broadcasting rank. Consider this instead:

    #include <iostream>
    #include "mpi.h"
    
    using namespace std;
    
    int main()
    {
        MPI_Init(NULL, NULL);
        int rank, size;
        int N = 9;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        cout << " Hello from rank : " << rank << " N is: " << N << endl;
    
        // Change N on the rank that will broadcast
        if (rank == 0) {
            N = 18;
        }
    
        MPI_Bcast(&N, 1, MPI_INT, 0, MPI_COMM_WORLD);
        cout << " Hello from rank : " << rank << " N is: " <<N<<endl;
        MPI_Finalize();
    }
    

    Rank 0 changed N and wants to send its own value to all other ranks. It can send it to each rank individually using MPI_Send and MPI_Recv, or it can use some form of collective communication, like MPI_Bcast.

    MPI_Bcast is the right choice, though, because it is optimized for the functionality it provides: sending information from one rank to all. I advise you to look into its communication algorithm to understand why.

    Since communication is the bottleneck of CPU parallelization, one wants to choose communication algorithms carefully. The rule of thumb is that whatever was designed for collective communication should be used for it, instead of the point-to-point send-receive approach (i.e., rank 0 sending to every other rank individually).

    One-to-all broadcasting is actually a common situation in practice, at least in scientific computing, where, for example, one rank reads the input parameters, prepares the ground for the simulation, and then broadcasts the necessary information to all other ranks. Broadcasting may also be needed during the computation, where one rank needs to share its results with a subset of the others, or with all of them.