calgorithmdynamic-programmingbottom-up

Find k out of n subset with maximal area


I have n points and have to find the maximum united area between k points (k <= n). So, its the sum of those points area minus the common area between them.

enter image description here]1

Suppose we have n=4, k=2. As illustrated in the image above, the areas are calculated from each point to the origin and, the final area is the sum of the B area with the D are (only counting the area of their intersection once). No point is dominated

I have implemented a bottom-up dynamic programming algorithm, but it has an error somewhere. Here is the code, that prints out the best result:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct point {
    double x, y;
} point;
struct point *point_ptr;

int n, k;
point points_array[1201];
point result_points[1201];

void qsort(void *base, size_t nitems, size_t size,
           int (*compar)(const void *, const void *));

int cmpfunc(const void *a, const void *b) {
    point *order_a = (point *)a;
    point *order_b = (point *)b;
    if (order_a->x > order_b->x) {
        return 1;
    }
    return -1;
}

double max(double a, double b) {
    if (a > b) {
        return a;
    }
    return b;
}

double getSingleArea(point p) {
    return p.x * p.y;
}

double getCommonAreaX(point biggest_x, point new_point) {
    double new_x;
    new_x = new_point.x - biggest_x.x;
    return new_x * new_point.y;
}

double algo() {
    double T[k][n], value;
    int i, j, d;
    for (i = 0; i < n; i++) {
        T[0][i] = getSingleArea(points_array[i]);
    }
    for (j = 0; j < k; j++) {
        T[j][0] = getSingleArea(points_array[0]);
    }
    for (i = 1; i < k; i++) {
        for (j = 1; j < n; j++) {
            for (d = 0; d < j; d++) {
                value = getCommonAreaX(points_array[j - 1], points_array[j]);
                T[i][j] = max(T[i - 1][j], value + T[i - 1][d]);
            }
        }
    }
    return T[k - 1][n - 1];
}

void read_input() {
    int i;
    fscanf(stdin, "%d %d\n", &n, &k);
    for (i = 0; i < n; i++) {
        fscanf(stdin, "%lf %lf\n", &points_array[i].x, &points_array[i].y);
    }
}

int main() {
    read_input();
    qsort(points_array, n, sizeof(point), cmpfunc);
    printf("%.12lf\n", algo());
    return 0;
}

with the input:

5 3
0.376508963445 0.437693410334
0.948798695015 0.352125307881
0.176318878234 0.493630156084
0.029394902328 0.951299438575
0.235041868262 0.438197791997

where the first number equals n, the second k and the following lines the x and y coordinates of every point respectively, the result should be: 0.381410589193,

whereas mine is 0.366431740966. So I am missing a point?


Solution

  • This is a neat little problem, thanks for posting! In the remainder, I'm going to assume no point is dominated, that is, there are no points c such that there exists a point d with c.x < d.x and c.y < d.y. If there are, then it is never optimal to use c (why?), so we can safely ignore any dominated points. None of your example points are dominated.

    Your problem exhibits optimal substructure: once we have decided which item is to be included in the first iteration, we have the same problem again with k - 1, and n - 1 (we remove the selected item from the set of allowed points). Of course the pay-off depends on the set we choose - we do not want to count areas twice.

    I propose we pre-sort all point by their x-value, in increasing order. This ensures the value of a selection of points can be computed as piece-wise areas. I'll illustrate with an example: suppose we have three points, (x1, y1), ..., (x3, y3) with values (2, 3), (3, 1), (4, .5). Then the total area covered by these points is (4 - 3) * .5 + (3 - 2) * 1 + (2 - 0) * 3. I hope it makes sense in a graph:

    figure

    By our assumption that there are no dominated points, we will always have such a weakly decreasing figure. Thus, pre-sorting solves the entire problem of "counting areas twice"!

    Let us turn this into a dynamic programming algorithm. Consider a set of n points, labelled {p_1, p_2, ..., p_n}. Let d[k][m] be the maximum area of a subset of size k + 1 where the (k + 1)-th point in the subset is point p_m. Clearly, m cannot be chosen as the (k + 1)-th point if m < k + 1, since then we would have a subset of size less than k + 1, which is never optimal. We have the following recursion,

    d[k][m] = max {d[k - 1][l] + (p_m.x - p_l.x) * p_m.y, for all k <= l < m}.
    

    The initial cases where k = 1 are the rectangular areas of each point. The initial cases together with the updating equation suffice to solve the problem. I estimate the following code as O(n^2 * k). The term squared in n can probably be lowered as well, as we have an ordered collection and might be able to apply a binary search to find the best subset in log n time, reducing n^2 to n log n. I leave this to you.

    In the code, I have re-used my notation above where possible. It is a bit terse, but hopefully clear with the explanation given.

    #include <stdio.h>
    
    typedef struct point
    {
        double x;
        double y;
    } point_t;
    
    
    double maxAreaSubset(point_t const *points, size_t numPoints, size_t subsetSize)
    {
        // This should probably be heap allocated in your program.
        double d[subsetSize][numPoints];
    
        for (size_t m = 0; m != numPoints; ++m)
            d[0][m] = points[m].x * points[m].y;
    
        for (size_t k = 1; k != subsetSize; ++k)
            for (size_t m = k; m != numPoints; ++m)
                for (size_t l = k - 1; l != m; ++l)
                {
                    point_t const curr = points[m];
                    point_t const prev = points[l];
    
                    double const area = d[k - 1][l] + (curr.x - prev.x) * curr.y;
    
                    if (area > d[k][m])  // is a better subset
                        d[k][m] = area;
                }
    
        // The maximum area subset is now one of the subsets on the last row.
        double result = 0.;
    
        for (size_t m = subsetSize; m != numPoints; ++m)
            if (d[subsetSize - 1][m] > result)
                result = d[subsetSize - 1][m];
    
        return result;
    }
    
    int main()
    {
        // I assume these are entered in sorted order, as explained in the answer.
        point_t const points[5] = {
                {0.029394902328, 0.951299438575},
                {0.176318878234, 0.493630156084},
                {0.235041868262, 0.438197791997},
                {0.376508963445, 0.437693410334},
                {0.948798695015, 0.352125307881},
        };
    
        printf("%f\n", maxAreaSubset(points, 5, 3));
    }
    

    Using the example data you've provided, I find an optimal result of 0.381411, as desired.