I have a set of points in the plane that I want to sort based on when they encounter an arbitrary sweepline. An alternative definition is that I want to be able to sort them based on any linear combination of the x-
and y-
coordinates. I want to do the sorting in linear time, but am allowed to perform precomputation on the set of points in quadratic time (but preferably O(n log(n))
). Is this possible? I would love a link to a paper that discusses this problem, but I could not find it myself.
For example, if I have the points (2,2)
, (3,0)
, and (0,3)
I want to be able to sort them on the value of 3x+2y
, and get [(0,3), (3,0), (2,2)]
.
Edit: In the comments below the question a helpful commenter has shown me that the naive algorithm of enumerating all possible sweeplines will give a O(n^2 log(n))
preprocessing algorithm (thanks again!). Is it possible to have a O(n log(n))
preprocessing algorithm?
First note, that enumerating all of the sweeplines takes O(n^2 log(n))
, but then you have to sort the n^2
sweeplines. Doing that naively will take time O(n^3 log(n))
and space O(n^3)
.
I think I can get average performance down to O(n)
with O(n^2 log*(n))
time and O(n^2)
space spent on preprocessing. (Here log*
is the iterated logarithm and for all intents and purposes it is a constant.) But this is only average performance, not worst case.
The first thing to note is that there are n choose 2 = n*(n-1)/2
pairs of points. As we rotate 360 degrees, each pair will cross the other twice, for at most O(n^2)
different orderings and O(n^2)
pair crossings between them. Also note that after a pair crosses, it does not cross again for 180 degrees. Over any range of less than 180 degrees, a given pair either will cross once or won't.
Now the idea is that we'll store a random O(n)
of those possible orderings and which sweepline they correspond to. Between any sweepline and the next, we'll see O(n^2 / n) = O(n)
pairs of points cross. Therefore both sorts are correct to on average O(1)
, and every inversion between the first and the order we want is an inversion between the first and second sorts. We'll use this to find our final sort in O(n)
.
Let me fill in details backwards.
We have our O(n)
sweeplines precalculated. In time O(log(n))
we find the two nearest. Let's assume we find the following data structures.
pos1
: Lookup from point to its position in sweepline 1.points1
: Lookup from position to the point there in sweepline 1.points2
: Lookup from position to the point there in sweepline 2.We will now try to sort in time O(n)
.
We initialize the following data structures:
upcoming
: Priority queue of points that could be next.is_seen
: Bitmap from position to whether we've added the point to upcoming.answer
: A vector/array/whatever you language calls it that will hold the answer at the end.max_j
: The farthest point in line 2 that we have added to upcoming. Starts at -1
.And now we do the following.
for i in range(n):
while is_seen[i] == 0:
# Find another possible point
max_j++
point = points2[max_j]
upcoming.add(point with where it is encountered as priority)
is_seen[pos1[point]] = 1
# upcoming has points1[i] and every point that can come before it.
answer.append(upcoming.pop())
Waving my hands vigorously, every point is put into upcoming
once, and taken out once. On average, upcoming
has O(1)
points in it, so all operations average out to O(1)
. Since there are n
points, the total time is O(n)
.
OK, how do we set up our sweeplines? Since we only care about average performance, we cheat. We randomly choose O(n)
pairs of points. Each pair of points defines a sweepline. We sort those sweeplines in O(n log(n))
.
Now we have to sort O(n)
sweeplines. How do we do this?
Well we can sort a fixed number of them by any method we want. Let's pick 4 evenly chosen sweeplines and do that. (We actually only need to do the calculation 2x. We pick 2 pairs of points. We pick the sweepline where the first 2 cross, then the second 2 cross, then the other 2 sweeplines are at 180 degrees from the first 2, and therefore are just reversed order.) After that, we can use the algorithm above to sort a sweepline between 2 others. And do that through bisection to smaller and smaller intervals.
Now, of course, the sweeplines will not be as close as they were above. But let's note that if we expect the points to agree to within an average O(f(n))
places between the sweepline, then the heap will have O(f(n))
elements in it, and operations on it will take O(log(f(n)))
time, and so we get the intermediate sweepline in O(n log(f(n))
. How long is the whole calculation?
Well, we have kind of a tree of calculations to do. Let's divide the sweeplines by what level they are, the group them. The grouping will be the top:
1 .. n/log(n)
n/log(n) .. n/log(log(n))
n/log(log(n)) .. n/log(log(log(n)))
...and so on.
In each group we have O(n / log^k(n))
sweeplines to calculate. Each sweepline takes O(n log^k(n))
time to calculate. Therefore each level takes O(n^2)
. The number of levels is the iterated logarithm, log*(n)
. So total preprocessing time is O(n^2 log*(n))
.