algorithmperformanceparallel-processingmpi

Solving for cost-optimal processes for bucket sort using Ts/pTp = 1


I am doing a class on parallelization. We're given an MPI program and told to determine the maximum big O for processes so that the program is cost-optimal.

Cost optimality is defined where big theta of efficency is 1. Efficency is defined as Tserial/pTparallel.

The algorithm we're solving is a bucket sort implementation using MPI. I won't share the code for academic integrity reasons. n is defined as the length of teh array in which to sort. p is defined as total processes used to run the program. The program consists of an O(n) operation to generate an array. It then creates p processes giving them n/p of the array to sort into buckets. This an O(n) operation. Then it performs a selection sort which is O(n^2).

During the sorting into buckets the main thread sends values to the other processes based on which bucket they are in. After they use an all gather. This gets us

Tp = n^2/p^2 + ts * log(p) + tw * (n/p) * (p-1) where ts is the overhead to start communication and tw is the time it takes to send a word over communication. We assume hypercube network topology in this class. n^2/p^2 is derived as the selection sort does an outter loop of n/p and an inner loop of n/p which multiplies to n^2/p^2.

My issue comes up when the solution solves for this problem.

To solve for cost-optimal the big theta of serial and of cost should be equal as E(Ts/pTp). Cost being pTp. We know big theta of serial is n^2 due to the selection sort being n^2.

If we take my Tp formula: Tp = n^2/p^2 + ts * log(p) + tw * (n/p) * (p-1) and multiply be p we get n^2/p + ts * p * log(p) + tw * n * (p-1).

If we solve for big theta here we get theta(n^2/p)+theta(plog(p))+theta(np) because p cannot be bigger than n we can drop the plog(p) term. So we get theta(n^2/p)+theta(np) as our cost.

From here I solved theta(n^2) = theta(n^2/p)+theta(np) where if p = O(n) you get theta(n^2)=theta(n)+theta(n^2).

The solution got the same cost function as me. But it derives Ts/pTp as O(n^2/p)+O(n) / (O(n^2/p) + pO(n)) which solves for p = O(sqrt(n)). I cannot for the life of me derive the same equation and the TA's have dodged my questions indefinitely. So I cannot for the life of me determine how to get O(n^2/p)+O(n) / (O(n^2/p) + pO(n)).


Solution

  • When calculating the parallel cost, you assume that keys will be roughly evenly distributed among buckets. This is a powerful assumption that makes bucket sort more efficient. You should make the same assumption when calculating the serial cost.

    Serially, if you evenly distribute n keys into p buckets, then selection sort each bucket, each selection sort takes O((n/p)²) time, for a total cost of Ts = O(n + n²/p).