algorithmsortingrecursionclosest-points

Closest pair algorithm from (n log^2 n) to (n log n) time


I am trying to understand how to go from n log^2 n time to n log n time for the closet pair algorithm. I get the below part (from http://www.cs.mcgill.ca/~cs251/ClosestPair/ClosestPairDQ.html)

  1. Divide the set into two equal sized parts by the line l, and recursively compute the minimal distance in each part.
  2. Let d be the minimal of the two minimal distances.
  3. Eliminate points that lie farther than d apart from l
  4. Sort the remaining points according to their y-coordinates
  5. Scan the remaining points in the y order and compute the distances of each point to its five neighbors.
  6. If any of these distances is less than d then update d.

Step 4 is a sort that takes O(n log n) time, which dominates all other steps and this is the one that needs to be reduced to O(n) in order for the overall algorithm to achieve O(n log n) time. And this is the part I am having a hard time understanding. The author proposes

Step 1: Divide the set into..., and recursively compute the distance in each part, returning the points in each set in sorted order by y-coordinate. Step 4: Merge the two sorted lists into one sorted list in O(n) time.

You still have to sort the points by the y-coordinate in the recursive step, which takes O(n log n) time. How can we avoid this? The merging is O(n), but we still have to sort somewhere.


Solution

  • The reason that O(n log n) is a problem is that we do it over and over and over again: if we partition the set into two subsets, and partition each of those into two subsets, then that involves seven sorts (one over the whole set, two over halves of it, four over quarters of it).

    So the proposal fixes this by reusing the results of previous sorts: so we do full merge-sorts of the smallest partitions (each of which is O(1), totaling O(n)), but for larger partitions we just need to do a single O(n) merge pass to combine the results of those. So we only pay the O(n log n) price a total of once, which is fine.