csortingmergesortsorting-network

How to fix this non-recursive odd-even-merge sort algorithm?


I was searching for non-recursive odd-even-merge sort algorithm and found 2 sources:

Both algorithms are identical but false. The resulting sorting network is not an odd-even-merge sort network.

Here is an image of the resulting network with 32 inputs. A vertical line between 2 horizontal lines means compare value a[x] with a[y], if greater then swap the values in the array.

odd-even-merge sort for 32 inputs
(source: flylib.com)
(clickable)

I copied the code from Java to C and replaced the exch function by a printf to print the exchange candidates.

When one draws a diagram of the pairs, it can be seen that too many pairs are generated.

Has anyone an idea how to fix this algorithm?

Why do I need non-recursive version?
I want to transform this sorting network into hardware. It's easy to insert pipeline stages into a non-recursive algorithms.

I also investigated the recursive version, but it's too complex to transform the algorithm into pipelined hardware.

My C code:

#include <stdlib.h>
#include <stdio.h>

void sort(int l, int r)
{ int n = r-l+1;

  for (int p=1; p<n; p+=p)
    for (int k=p; k>0; k/=2)
      for (int j=k%p; j+k<n; j+=(k+k))
        for (int i=0; i<n-j-k; i++)
          if ((j+i)/(p+p) == (j+i+k)/(p+p))
              printf("%2i cmp %2i\n", l+j+i, l+j+i+k);
}
int main(char* argv, int args)
{ const int COUNT = 8;
  sort(0, COUNT);
}

The result:

0 -o--------o-------------------------o---------------o-------------------------
   |        |                         |               |
1 -o--------|-o------o----------------|-o-------------o-o-----------------------
            | |      |                | |               |
2 -o-o------o-|------o-o--------------|-|-o----o--------o-o---------------------
   | |        |        |              | | |    |          |
3 -o-o--------o--------o--------------|-|-|-o--|-o--------o-o-------o-----------
                                      | | | |  | |          |       |
4 -o-o-o----o---o----o-----o----------o-|-|-|--o-|-o--------o-o-----o-o---------
   | | |    |   |    |     |            | | |    | |          |       |
5 -o-o-o----|-o-|-o--o-o---o-o---o------o-|-|----o-|-o--------o-o-----o-o---o---
            | | | |    |     |   |        | |      | |          |       |   |
6 -o-o-o-o--o-|-o-|----o-o---o-o-o-o------o-|------o-|----------o-o-----o-o-o-o-
   | | | |    |   |      |     |   |        |        |            |       |   |
7 -o-o-o-o----o---o------o-----o---o--------o--------o------------o-------o---o-

When I know the correct exchange pairs and the algorithm is equal to the image, I'll translate it into VHDL for tests on my hardware platform.

Other open source hardware sorting network implementations:


Appendix:
Odd-even mergesort (a.k.a Batcher's sort) is like bitonic sort (not to confuse with Batcher's bitonic sort). But in hardware this algorithm has a better size complexity than bitonic sort, while latency is the same.

These algorithm can be implemented with good resource usage compared to fast software algorithms like quicksort.

Wikipedia: odd-even mergesort

Note:
Because sorting networks are static and independent of the input values, no compare and swap is needed to generate the network. That's one reason why it can be transformed into hardware. My code generates the indices for the compare operations. In hardware, these vertical connections will be replaced by compare and swap circuits. So unsorted data will travel throught the network and on the output side it will be sorted.


Solution

  • I think I found a solution. I checked it for length = 2, 4, 8, 16.

    Here is my code:

    void sort(int length)
    { int G = log2ceil(length);                      // number of groups
      for (int g = 0; g < G; g++)                    // iterate groups
      { int B = 1 << (G - g - 1);                    // number of blocks
        for (int b = 0; b < B; b++)                  // iterate blocks in a group
        { for (int s = 0; s <= g; s++)               // iterate stages in a block
          { int d = 1 << (g - s);                    // compare distance
            int J = (s == 0) ? 0 : d;                // starting point
            for (int j = J; j+d < (2<<g); j += 2*d)  // iterate startpoints
            { for (int i = 0; i < d; i++)            // shift startpoints
              { int x = (b * (length / B)) + j + i;  // index 1
                int y = x + d;                       // index 2
                printf("%2i cmp %2i\n", x, y);
              }
            }
          }
       }
    }
    

    This solution introduces a fifth for-loop to handle subblocks in a group. The j loop has a changed start and abort value to handle odd counts of post merge steps without generating doubled compare steps.