cquicksortcilkcilk-plus

Why does my quicksort crash on large inputs?


I have created a median-of-3 standard Quick-sort implementation which sorts a large array of random integers. I would like to go up to at least a 100 million elements but preferably a billion. To increase speed, I am attempting to parallelize the algorithm in Cilk++. The algorithm uses double recursion, and I spawn Cilk tasks to perform each recursive sort.

My algorithm works for arrays up to size 10 000 000. Without the Cilk keywords, my sequential algorithm handles 100 million elements easily but when I try to use Cilk the program crashes to desktop. I would now like to find out the reason for this. Am I generating too many Cilk tasks too quickly?

I am using Windows 7 64bit, Intel++ compiler and Intel Parallel Studio XE 2013 integrated in Visual Studio 2010. The program is compiled as a 32-bit application. The memory where the random data is stored is allocated as a single call to malloc, and the pointer is checked. In median calculation integer overflow is also guarded against when calculating the middle element.

This is mostly likely a Buffer Overrun issue.

This is my partition:

int pivotvalue = medianOf3(data, low, high);
// pivot is placed next to the last element

int left = low;
int right = high - 1;
while (left < right) {
    while (data[left] < pivotvalue) {
        left++;
        if (left > high) {
            break;
        }
    }
    while (data[right] >= pivotvalue) {
        right--;
        if (right < low) {
            break;
        }
    }

    if (left < right) {
        swap(data, left, right);
    }
}

// restore pivot
swap(data, left, high - 1);
return left;

Solution

  • I don't know how Cilk works, but I recall needing to fix a quicksort implementation on an embedded platform that could overflow the stack with recursion. The fix was to use a recursive call for the smaller "half" of the data and process the larger "half" inside the function (i.e., tail recursion). Sorted (or reverse sorted) lists would always trigger overflow since the depth of the call graph was equal to the number of elements in the list.

    Can you use the debugger to find out what causes the crash? Can you dump data to a log file on each entry into swap(), and then see what the function calls before crash looked like? Is it possible to dump the size of the stack on each call? Does each Cilk task have its own stack that is possibly smaller than the stack used in the non-Cilk version?

    You could track stack usage by adding a parameter to swap() with a stack address. The first call and any new Cilk thread uses NULL. Each recursive call uses that parameter if it wasn't NULL, or the address of one of its local variables to establish where its stack is. Dump the difference in addresses, or track it in a global and only report it when it exceeds the previous maximum.