I am working on a problem in which I have 80GB of data to sort, but only 1GB of main memory to sort it in. Obviously external sorting applies here. But my question is: which k-way merge will be more efficient?
The complexity of a k-way merge is O(nk^2), where n is the number of elements. Suppose I use this method to calculate the complexity:
8 way merge followed by 10 way merge
8 way merge - O(n*8^2) => O(64n)
10 way merge - O(8n*10^2) => O(800n)
Total time complexity => O(64n) + O(800n)
5 way merge followed by 16 way merge
5 way merge - O(n*5^2) => O(25n)
16 way merge - O(5n*16^2) => O(1280n)
Total time complexity => O(25n) + O(1280n)
Looking at the time complexity, the 5-way merge followed by a 16-way merge seems to take more time. Do you think my process is right? I don't feel very confident about it.
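As a sanity check on the arithmetic, the two schedules can be compared under the question's own O(n·k^2) cost model, taking the coefficients at face value (cost units per element moved in each pass):

```python
# Compare the two merge schedules using the question's O(n * k^2) model.
# Each pass moves all the data, so the per-element cost of a pass is k^2.
def merge_cost(k):
    return k ** 2  # per-element cost of one k-way merge pass

plan_8_10 = merge_cost(8) + merge_cost(10)   # 64 + 100 = 164 units of n
plan_5_16 = merge_cost(5) + merge_cost(16)   # 25 + 256 = 281 units of n
print(plan_8_10, plan_5_16)
```

Under this model the 5/16 schedule indeed comes out more expensive, matching the conclusion above, though (as the answer below notes) this model does not reflect the real cost of an external sort.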
UPDATE: @rcgldr Since you are saying a larger block size takes less time to read/write, what do you think about this formula:

Time to read/write 1 block = average seek time + average rotational latency + block size / maximum transfer rate

According to this formula, if the block size is small, then the overall read/write time will also be less. Do you think there is something wrong here, or do we need to multiply this by the total number of blocks to get an accurate picture of the total time required?
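To see why multiplying by the number of blocks matters, here is a rough sketch of the total transfer time for 80GB at several block sizes. The drive parameters (8 ms seek, 4 ms rotational latency, 100 MB/s transfer rate) are hypothetical illustrative numbers, not measurements:

```python
# Total time to move 80 GB = (number of blocks) * (per-block time),
# using the formula from the update. Drive parameters are assumptions.
SEEK = 0.008    # average seek time, seconds (assumed)
ROT = 0.004     # average rotational latency, seconds (assumed)
RATE = 100e6    # maximum transfer rate, bytes/second (assumed)
TOTAL = 80e9    # total data, bytes

def total_time(block_size):
    blocks = TOTAL / block_size
    return blocks * (SEEK + ROT + block_size / RATE)

for size in (64e3, 1e6, 64e6):   # 64 KB, 1 MB, 64 MB blocks
    print(f"{size/1e6:g} MB blocks: {total_time(size):.0f} s")
```

Per-block time is indeed smaller for small blocks, but the seek and rotational terms are paid once per block, so the total time grows as blocks shrink; this is the effect the answer below is referring to.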
It's an external sort, so the normal time complexity doesn't apply, at least not in a real world case.
To clarify the first statement, a k-way merge (not a merge sort) in memory (not external) merges k arrays of size n, so k·n data is moved, and in a simple version, k-1 compares are made for each move, so (k n)(k-1) = n k^2 - n k => O(n k^2). If a heap / priority queue / ... is used to keep track of the current set of minimum elements, then the number of compares per move is reduced to log2(k), so O(n k log2(k)).
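A minimal sketch of the heap-based k-way merge just described, using Python's heapq: the heap holds the current head of each run, so each of the n·k moves costs O(log2 k) compares.

```python
import heapq

def kway_merge(runs):
    """Merge k sorted lists using a min-heap of each run's current head."""
    # Heap entries: (value, run index, position within run).
    heap = [(run[0], i, 0) for i, run in enumerate(runs) if run]
    heapq.heapify(heap)
    out = []
    while heap:
        value, i, j = heapq.heappop(heap)   # smallest of the current heads
        out.append(value)
        if j + 1 < len(runs[i]):            # advance within run i
            heapq.heappush(heap, (runs[i][j + 1], i, j + 1))
    return out

print(kway_merge([[1, 4, 9], [2, 3, 8], [5, 6, 7]]))
```

The standard library's heapq.merge does the same job lazily over iterators, which is closer to how an external merge streams its input runs.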
A k-way merge sort of n elements in memory takes ⌈logk(n)⌉ == ceil(logk(n)) passes, moving n elements on each pass, so n ⌈logk(n)⌉. Using a heap / priority queue / ... to implement the compare of k elements takes ⌊log2(k)⌋ == floor(log2(k)), so (n ⌈logk(n)⌉) (⌊log2(k)⌋). If n is a power of k, and k is a power of 2, then the complexity is n logk(n) log2(k) = n log2(n). The k doesn't matter for complexity, but actual run time may be better or worse: k > 2 means more compares but fewer moves, so the issue is compare overhead versus move overhead, such as when sorting an array of pointers to objects.
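The pass count ⌈logk(n)⌉ can be checked with a small sketch; integer run-counting is used here rather than math.log to avoid floating-point rounding at exact powers of k:

```python
def passes(n, k):
    """Number of k-way merge-sort passes over n elements: ceil(log_k(n))."""
    count, runs = 0, n
    while runs > 1:
        runs = -(-runs // k)   # ceiling division: runs shrink by factor k
        count += 1
    return count

# Total compares ~ n * passes(n, k) * floor(log2(k)); e.g. with n = 2**20:
for k in (2, 4, 16):
    print(k, passes(2**20, k))
```

For n = 2^20 this gives 20, 10, and 5 passes for k = 2, 4, and 16, and multiplying by ⌊log2(k)⌋ = 1, 2, 4 gives the same n·log2(n) compare count in each case, illustrating why k drops out of the complexity.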
Back to external sort, assuming the process is I/O bound, then the complexity is related to I/O, not cpu operations.
Both methods will take 3 passes reading / writing 80GB of data. The first pass creates 80 instances of 1GB clusters. The next pass merges 5 or 8 clusters at a time, and the last pass merges 16 or 10 clusters at a time. The main issue is reading or writing large chunks of data at a time to reduce random access overhead. The 16 way merge will break up the 1GB of memory into smaller chunks than the 10 way merge, which could make a slight difference in random access overhead, so the 8 / 10 method might be a bit faster. If using a solid state drive for the external sort, then random access / chunk size is much less of an issue.
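To make the chunk-size point concrete, here is a rough sketch of how 1GB of memory divides up during each merge. The accounting is a simplification (one input buffer per run plus one output buffer; real implementations may use double buffering):

```python
GB = 2**30

def chunk_size(memory, k):
    """Approximate I/O chunk size for a k-way merge: memory split across
    k input buffers plus one output buffer (a simplification)."""
    return memory // (k + 1)

for k in (10, 16):
    print(f"{k}-way merge: ~{chunk_size(GB, k) / 2**20:.0f} MiB per chunk")
```

The 16-way merge reads in noticeably smaller chunks than the 10-way merge, which means more seeks per gigabyte on a rotating drive; that is the slight edge for the 8/10 schedule mentioned above.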
The other issue is whether the 10-way or 16-way merge operates in memory fast enough that the merge process remains I/O bound.