Find the minimum sum of squares of a set partition into k clusters


Problem

Given a set of n positive integers, partition them into k subsets so that the sum of the squares of the subset sums is as small as possible. For example, let the set be [1, 2, 3] and k be 2; then the solution is [1, 2] and [3]. The square of the sum of the first subset is (1+2)^2=9, and the square of the sum of the second subset is 3^2=9. The total is 9+9=18, which is the minimum (the other two splits give 16+4=20 and 25+1=26).
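To make the objective concrete, here is a minimal sketch that evaluates the cost of one given assignment of elements to subsets. The function name `partition_cost`, the `assign` array, and the cap of 16 subsets are assumptions for illustration, not part of the original problem statement.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: assign[i] is the index (0..k-1) of the subset
 * that holds values[i]. Returns the sum of squared subset sums. */
uint64_t partition_cost(const uint64_t *values, const int *assign, int n, int k)
{
    uint64_t sums[16] = {0};   /* assumed upper bound on k for this sketch */
    uint64_t cost = 0;

    assert(k >= 1 && k <= 16);
    for (int i = 0; i < n; i++) {
        assert(assign[i] >= 0 && assign[i] < k);
        sums[assign[i]] += values[i];
    }
    for (int j = 0; j < k; j++)
        cost += sums[j] * sums[j];
    return cost;
}
```

For the set [1, 2, 3] with k=2, the assignment {0, 0, 1} (that is, [1, 2] and [3]) evaluates to 18, while {0, 1, 0} and {1, 0, 0} evaluate to 20 and 26.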

Sample input

n=10, k=2 [63230795, 3521578, 37513838, 37860789, 30498450, 29795141, 41263743, 5815341, 19046274, 20919844] -> 41895269854617569

n=10, k=5 [42566460, 61080136, 12375813, 29881559, 61767889, 60645182, 22105410, 17262225, 34309213, 38950048] -> 29098109328960071

Constraints

My code

#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>
#include <limits.h>
#include <stdint.h>

bool used[20] = {0};
int n, m;
uint64_t arr[20], min = UINT64_MAX;
int find(int nset, uint64_t sum);
int subset(uint64_t subsum, int cur, int sum, int nset){
    if (cur == n){
        find(nset+1, sum+subsum*subsum);
        return 0;
    }
    subset(subsum, cur+1, sum, nset);
    if (!used[cur]){
        used[cur] = 1;
        subset(subsum+arr[cur], cur+1, sum, nset);
        used[cur] = 0;
    }
    return 0;
}
int find(int nset, uint64_t sum){
    if (sum >= min)
        return 0;
    if (nset == m-1){
        uint64_t setsum = 0;
        for (int i = 0; i < n; i++)
            if (!used[i])
                setsum += arr[i];
        sum += setsum*setsum;
        if (sum < min)
            min = sum;
        return 0;
    }else{
        subset(0, 0, sum, nset);
        return 0;
    }
}
int main(){
    scanf("%d %d", &n, &m);
    for (int i = 0; i < n; i++)
        scanf("%llu", &arr[i]);
    uint64_t z = 0;
    find(0, z);
    printf("%llu", min);
}

My idea is a brute-force search: build one subset at a time, add the square of its sum to a running total, and prune whenever the running total is already at least as large as the best answer found so far. However, it gives wrong answers. Am I missing something? Thank you for answering.
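One thing that stands out in the code above, offered as a likely culprit rather than a confirmed diagnosis: `subset` declares its `sum` parameter as `int`, while `find` passes it a `uint64_t` running total. With k=2 the value passed is always 0, so the effect would only show for k of 3 or more. The sketch below isolates that narrowing; the helper names `through_int` and `through_u64` are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors the shape of subset(): the running total arrives as an int.
 * Converting an out-of-range 64-bit value to int is implementation-defined
 * and on common platforms keeps only the low 32 bits. */
uint64_t through_int(int sum)
{
    return (uint64_t)sum;
}

/* Same hand-off with a 64-bit parameter: the value survives unchanged. */
uint64_t through_u64(uint64_t sum)
{
    assert(sum <= UINT64_MAX);  /* trivially true; documents the full range */
    return sum;
}
```

With a squared subset sum such as 63230795*63230795 (about 4e15), `through_u64` returns the value intact, whereas `through_int` cannot, because no `int` can represent it.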


Solution

  • Sol

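    #include <stdio.h>
    #include <stdlib.h>
    #include <stdbool.h>
    #include <limits.h>
    #include <stdint.h>
    #include <inttypes.h>
    
    /* befsq[i]: current sum of subset i; len[i]: number of elements in subset i */
    uint64_t befsq[10] = {0}, arr[20], min = UINT64_MAX, avg = 0;
    int n, m, len[10] = {0};
    int find(int cur){
        /* sum of squares of the partial subset sums; it can only grow as elements are added */
        uint64_t s = 0;
        for (int i = 0; i < m; i++) s += befsq[i]*befsq[i];
        if (s >= min) return 0;
        if (cur == n){
            min = s;
            return 0;
        }
        for (int i = 0; i < m; i++){
            /* heuristic cut: skip subsets whose sum already exceeds the average */
            if (befsq[i] > avg)
                continue;
            /* heuristic cut: skip if adding arr[cur] would overshoot the average
               by more than the subset currently falls short of it */
            if (befsq[i]+arr[cur] > avg){
                if (befsq[i]+arr[cur]-avg > (avg-befsq[i]))
                    continue;
            }
            len[i]++;
            befsq[i] += arr[cur];
            find(cur+1);
            befsq[i] -= arr[cur];
            len[i]--;
            /* subset i was empty before this element: the remaining subsets are not tried */
            if (!len[i]) return 0;
        }
        return 0;
    }
    
    int main(){
        scanf("%d %d", &n, &m);
        for (int i = 0; i < n; i++){
            scanf("%" SCNu64, &arr[i]);
            avg += arr[i];
        }
        avg /= m;
        find(0);
        printf("%" PRIu64, min);
        return 0;
    }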
    

    Finally, I came up with a solution. The idea is to try placing every element into every subset, again with some "cuts" to shrink the search tree. The first cut rests on the observation that the closer each subset sum is to the average, the smaller the total. The second rests on the observation that using more subsets gives a smaller total, so once an element has been tried in a subset that was empty, the function returns without trying the remaining subsets. Both cuts come from my own observation, and I am not sure they hold for all similar problems. I hope someone can confirm whether my idea is correct. Thanks.
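One way to check these observations is to compare against an exhaustive search on small inputs. The sketch below is not the code from the answer: it keeps only the two cuts that are safe for positive integers (the partial sum of squares never decreases, and empty subsets are interchangeable) and drops the average-based cut. The names `min_sq_partition`, `place`, and `MAX_K` are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_K 10                 /* assumed cap on k, as in the answer's arrays */

static uint64_t best;            /* best total found so far */
static uint64_t bucket[MAX_K];   /* current sum of each subset */

/* Place a[cur..n-1]; `used` counts the subsets that already hold an element. */
static void place(const uint64_t *a, int n, int k, int cur, int used)
{
    uint64_t s = 0;
    for (int i = 0; i < k; i++)
        s += bucket[i] * bucket[i];
    if (s >= best)               /* safe cut: s cannot shrink later */
        return;
    if (cur == n) {
        best = s;
        return;
    }
    /* Symmetry breaking: try every subset in use, plus at most one empty one. */
    int limit = used < k ? used + 1 : k;
    for (int i = 0; i < limit; i++) {
        bucket[i] += a[cur];
        place(a, n, k, cur + 1, i == used ? used + 1 : used);
        bucket[i] -= a[cur];
    }
}

/* Minimum sum of squared subset sums over all splits into at most k subsets. */
uint64_t min_sq_partition(const uint64_t *a, int n, int k)
{
    assert(k >= 1 && k <= MAX_K);
    best = UINT64_MAX;
    for (int i = 0; i < k; i++)
        bucket[i] = 0;
    place(a, n, k, 0, 0);
    return best;
}
```

On [1, 2, 3] with k=2 this returns 18, matching the example in the question. On the made-up input [100, 1, 1, 1] with k=3 it returns 10005, whereas the average-based cut appears to reject every subset for the value 100 (the average is 34, and 100-34=66 exceeds 34), so the pruned search would never reach a complete assignment there. That suggests the average cut is a heuristic that can fail when one element is much larger than the average, while the empty-subset cut looks sound.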