I have n
points and have to find the maximum united area between k
points (k <= n
). So, its the sum of those points area minus the common area between them.
]1
Suppose we have n=4, k=2
. As illustrated in the image above, the areas are calculated from each point to the origin and, the final area is the sum of the B area with the D are (only counting the area of their intersection once). No point is dominated
I have implemented a bottom-up dynamic programming algorithm, but it has an error somewhere. Here is the code, that prints out the best result:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
typedef struct point {
double x, y;
} point;
struct point *point_ptr;
int n, k;
point points_array[1201];
point result_points[1201];
void qsort(void *base, size_t nitems, size_t size,
int (*compar)(const void *, const void *));
int cmpfunc(const void *a, const void *b) {
point *order_a = (point *)a;
point *order_b = (point *)b;
if (order_a->x > order_b->x) {
return 1;
}
return -1;
}
double max(double a, double b) {
if (a > b) {
return a;
}
return b;
}
double getSingleArea(point p) {
return p.x * p.y;
}
double getCommonAreaX(point biggest_x, point new_point) {
double new_x;
new_x = new_point.x - biggest_x.x;
return new_x * new_point.y;
}
double algo() {
double T[k][n], value;
int i, j, d;
for (i = 0; i < n; i++) {
T[0][i] = getSingleArea(points_array[i]);
}
for (j = 0; j < k; j++) {
T[j][0] = getSingleArea(points_array[0]);
}
for (i = 1; i < k; i++) {
for (j = 1; j < n; j++) {
for (d = 0; d < j; d++) {
value = getCommonAreaX(points_array[j - 1], points_array[j]);
T[i][j] = max(T[i - 1][j], value + T[i - 1][d]);
}
}
}
return T[k - 1][n - 1];
}
void read_input() {
int i;
fscanf(stdin, "%d %d\n", &n, &k);
for (i = 0; i < n; i++) {
fscanf(stdin, "%lf %lf\n", &points_array[i].x, &points_array[i].y);
}
}
int main() {
read_input();
qsort(points_array, n, sizeof(point), cmpfunc);
printf("%.12lf\n", algo());
return 0;
}
with the input:
5 3
0.376508963445 0.437693410334
0.948798695015 0.352125307881
0.176318878234 0.493630156084
0.029394902328 0.951299438575
0.235041868262 0.438197791997
where the first number equals n
, the second k
and the following lines the x
and y
coordinates of every point respectively, the result should be: 0.381410589193
,
whereas mine is 0.366431740966
. So I am missing a point?
This is a neat little problem, thanks for posting! In the remainder, I'm going to assume no point is dominated, that is, there are no points c
such that there exists a point d
with c.x < d.x
and c.y < d.y
. If there are, then it is never optimal to use c
(why?), so we can safely ignore any dominated points. None of your example points are dominated.
Your problem exhibits optimal substructure: once we have decided which item is to be included in the first iteration, we have the same problem again with k - 1
, and n - 1
(we remove the selected item from the set of allowed points). Of course the pay-off depends on the set we choose - we do not want to count areas twice.
I propose we pre-sort all point by their x-value, in increasing order. This ensures the value of a selection of points can be computed as piece-wise areas. I'll illustrate with an example: suppose we have three points, (x1, y1), ..., (x3, y3)
with values (2, 3), (3, 1), (4, .5)
. Then the total area covered by these points is (4 - 3) * .5 + (3 - 2) * 1 + (2 - 0) * 3
. I hope it makes sense in a graph:
By our assumption that there are no dominated points, we will always have such a weakly decreasing figure. Thus, pre-sorting solves the entire problem of "counting areas twice"!
Let us turn this into a dynamic programming algorithm. Consider a set of n
points, labelled {p_1, p_2, ..., p_n}
. Let d[k][m]
be the maximum area of a subset of size k + 1
where the (k + 1)
-th point in the subset is point p_m
. Clearly, m
cannot be chosen as the (k + 1)
-th point if m < k + 1
, since then we would have a subset of size less than k + 1
, which is never optimal. We have the following recursion,
d[k][m] = max {d[k - 1][l] + (p_m.x - p_l.x) * p_m.y, for all k <= l < m}.
The initial cases where k = 1
are the rectangular areas of each point. The initial cases together with the updating equation suffice to solve the problem. I estimate the following code as O(n^2 * k)
. The term squared in n
can probably be lowered as well, as we have an ordered collection and might be able to apply a binary search to find the best subset in log n
time, reducing n^2
to n log n
. I leave this to you.
In the code, I have re-used my notation above where possible. It is a bit terse, but hopefully clear with the explanation given.
#include <stdio.h>
typedef struct point
{
double x;
double y;
} point_t;
double maxAreaSubset(point_t const *points, size_t numPoints, size_t subsetSize)
{
// This should probably be heap allocated in your program.
double d[subsetSize][numPoints];
for (size_t m = 0; m != numPoints; ++m)
d[0][m] = points[m].x * points[m].y;
for (size_t k = 1; k != subsetSize; ++k)
for (size_t m = k; m != numPoints; ++m)
for (size_t l = k - 1; l != m; ++l)
{
point_t const curr = points[m];
point_t const prev = points[l];
double const area = d[k - 1][l] + (curr.x - prev.x) * curr.y;
if (area > d[k][m]) // is a better subset
d[k][m] = area;
}
// The maximum area subset is now one of the subsets on the last row.
double result = 0.;
for (size_t m = subsetSize; m != numPoints; ++m)
if (d[subsetSize - 1][m] > result)
result = d[subsetSize - 1][m];
return result;
}
int main()
{
// I assume these are entered in sorted order, as explained in the answer.
point_t const points[5] = {
{0.029394902328, 0.951299438575},
{0.176318878234, 0.493630156084},
{0.235041868262, 0.438197791997},
{0.376508963445, 0.437693410334},
{0.948798695015, 0.352125307881},
};
printf("%f\n", maxAreaSubset(points, 5, 3));
}
Using the example data you've provided, I find an optimal result of 0.381411
, as desired.