I'm looking for an algorithm for finding largest subset of points (by largest i mean in number) that form a convex polygon from the given set of point. I think this might be solvable using DP but i'm not sure. Is it possible to do this in O(n^3) ? Actually i just need the size of the largest subset, so it doesn't need to have unique solution
Edit :
just to keep this simple,
Given input : a set of points in 2D
Desired output : maximum number of points that form a convex polygon, like in the example the output is 5 (ABHCD is one of the possible convex polygon)
There's a similar problem spoj.com/problems/MPOLY which is solvable using DP in O(N^3), my question is about the generalization of that problem which doesn't have to contain (0,0)
This problem has been already published in these competitions:
And its solution (O(N3) algorithm) could be found on this page: "USACO DEC08 Problem 'fence' Analysis" by Bruce Merry and Jacob Steinhardt.
The following is an attempt to explain this algorithm. Also here is my implementation of this algorithm in C++11. This code works for N<=250 and integer coordinates in range 0 .. 1023. No three points should be on the same line. Here is Python script that generates test data satisfying these requirements.
Simplified problem: find largest subset of points that form a convex polygon and contain the leftmost point of given set (or if there are several leftmost points, this polygon should contain topmost of them).
To solve this problem we could connect each pair of points by directed line segment, then (implicitly) treat each segment as a graph node as shown in diagram:
Here parent node and corresponding segment have blue color. Line segment corresponding to its left child (red color) starts at the end of "parent" segment and it is the line segment making least possible left turn relative to the direction of "parent" segment. Line segment corresponding to its right child (green color) starts at the start of "parent" segment and it is also the line segment making least possible left turn relative to the direction of "parent" segment.
Any segment ending at leftmost point corresponds to a "leaf" node, i.e. it has no child nodes. Such construction of the graph guarantees that there are no cycles, in other words this graph is a DAG.
Now to find largest convex subset of points it is enough to find the longest path in this graph. And the longest path in DAG could be found with dynamic programming algorithm in time O(E), where E is number of edges in the graph. Since each node has only 2 outgoing edges, each corresponding to a pair of points, this problem could be solved in O(N2) time.
All this is possible to implement if we preprocess input data, sorting directed line segments starting at the same point by angle. This allows to perform depth-first traversal in the graph. We should memorize the longest path starting from each processed node, so that each graph edge is visited at most once, and we have O(E) = O(N2) time algorithm (ignoring preprocessing time for now). Space requirements are also O(N2) because we need to store sorted directions for each pair of points and memorization uses one value per node (which is also a pair of points).
Here is C++ implementation of this dynamic programming algorithm:
unsigned dpStep(ind_t i_left, ind_t from, ind_t to_ind)
{
ind_t child = sorted[from][to_ind];
while (child < i_left)
child = sorted[from][++to_ind];
if (child == i_left)
return 1;
if (auto v = memorize[from][child])
return v;
const auto x = max(dpStep(i_left, child, lefts[from][child]) + 1,
dpStep(i_left, from, static_cast<ind_t>(to_ind + 1)));
memorize[from][child] = static_cast<ind_t>(x);
return x;
}
Input parameters are the leftmost point and a pair of points forming current line segment (where starting point of the segment is given directly but ending point is given as an index in sorted by angle array of points). The word "left" in this code fragment is slightly overused: it means both the leftmost point (i_left
) and preprocessed array containing left childs for the graph (lefts[][]
).
General problem (where the leftmost point is not fixed) could be solved this way:
Step 4 is optional. It does not improve O(N3) time complexity. But it slightly improves speed of DP algorithm by excluding unneeded points. In my tests this gives 33% speed boost.
There are several alternatives for preprocessing. We could compute every angle (with atan2
) and sort them, as it is done in sample code by Bruce Merry and Jacob Steinhardt. This is a simplest way but atan2
may be too expensive, especially for smaller point sets.
Or we could exclude atan2
and sort tangents instead of angles, as it is done in my implementation. This is a little bit more complicated but faster.
Both these alternatives need O(N2 log N) time (unless we use some distribution sorting). Third alternative is to use well known computational geometry methods (arrangements and duality) to get O(N2) preprocessing. But I don't think it is useful for our problem: it is probably too slow for smaller point sets because of large constant factor, as for larger point sets, it might give some speed improvement, but too insignificant because step 3 will greatly outweigh step 2. Also it is much more difficult to implement.
Some other results: I tried to implement iterative DP instead of recursion; this did not noticeably change the speed. Also I tried to perform two DP searches at once, interleaving steps of each search (something similar to fibers or HyperThreading implemented in software); this might help because DP consists mostly of memory accesses having high latency and preventing full utilization of CPU throughput; the results are not very impressive, only about 11% speed boost, most likely with real HyperThreading it would be much faster.