parallel-processing, mpi, distributed-computing, message-passing

All-to-all broadcast implementation in MPI


I am learning MPI and am currently trying to implement an all-to-all broadcast using send and receive operations. I know that the number of processors is going to be a power of 2, and I wanted to implement an efficient solution for this problem. I structured my thoughts as follows.

all to all in a balanced binary tree with 8 processors

First, the red copy and exchange operations are done. Then the green operations should be completed and finally, the purple operations would conclude the all to all message sharing.

Assuming each processor starts with a single message (Processor 0 has A, Processor 1 has B, ..., Processor 7 has H), some of the steps are:

  1. Processor 0 shares its message with Processor 7 and vice versa. So they both have messages A and H.
  2. Processor 1 shares its message with Processor 6 and vice versa. So they both have messages B and G.
  3. Processor 2 shares its message with Processor 5 and vice versa. So they both have messages C and F.
  4. Processor 3 shares its message with Processor 4 and vice versa. So they both have messages D and E. (The red operations are done at this step.)
  5. Processor 0 exchanges all of its messages with Processor 3 and vice versa. So they both have messages A, H, D and E.
  6. Processor 1 exchanges all of its messages with Processor 2 and vice versa. So they both have messages B, G, C and F.

...

I could not write satisfying code for finding the pair given the rank of the current processor. I thought of recursively computing all the possible pairs at the beginning, as a look-up table to be used later. But I would like to ask if there is a better way to continue. Also, is the approach correct?


Solution

  • Based on the comment of Gilles Gouaillardet and other research I conducted on this problem, I solved it and wanted to share the methodology I used.

    First of all, before implementing an all-to-all broadcast, I highly suggest you take a look at this page. Since I knew the number of processors for my problem would be a power of 2, I implemented the algorithm using the hypercube approach.

    procedure ALL_TO_ALL_BC_HCUBE(my_id, my_msg, d, result) 
    begin 
       result := my_msg; 
       for i := 0 to d - 1 do 
       partner := my_id XOR 2^i; 
           send result to partner; 
           receive msg from partner; 
           result := result U msg; //U for union
       endfor; 
    end ALL_TO_ALL_BC_HCUBE 
    

    The algorithm is taken from the page mentioned above.
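    In case a concrete MPI version is useful, here is a minimal C sketch of that pseudocode using MPI_Sendrecv. The names and buffer layout are my own assumptions, not code from the original solution: each process contributes a single int, and the gathered values end up in hypercube exchange order rather than rank order.

    ```c
    #include <mpi.h>
    #include <stdio.h>

    /* Hypercube all-to-all broadcast sketch for p = 2^d processes.
     * Each process starts with one int in result[0]; after d exchange
     * rounds every process holds all p values. */
    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        int d = 0;
        while ((1 << d) < size) d++;   /* size assumed to be exactly 2^d */

        int result[64];                /* big enough for size <= 64     */
        result[0] = rank * 100;        /* this process's own message    */
        int count = 1;                 /* ints currently held           */

        for (int i = 0; i < d; i++) {
            int partner = rank ^ (1 << i);
            /* Exchange everything gathered so far; append what we get. */
            MPI_Sendrecv(result, count, MPI_INT, partner, 0,
                         result + count, count, MPI_INT, partner, 0,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            count *= 2;                /* held data doubles every round */
        }

        printf("rank %d holds %d values\n", rank, count);
        MPI_Finalize();
        return 0;
    }
    ```

    Using MPI_Sendrecv avoids the deadlock that can occur when both partners issue a blocking MPI_Send first; the pseudocode's separate send and receive would need a careful ordering (or non-blocking calls) to be safe.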

    I compared my results with the built-in MPI_Alltoall routine. My implementation appears to run around 1.5x-2x faster than the built-in version.