javaalgorithmnp-complete

How to solve the closest subset sum problem in Java for 100+ element arrays?


I have came across a subset sum problem recently. I was able to solve it for smaller arrays using Java earlier, but in this case I have really no idea what should I do. Brute force and recurrence is probably not an option, as I came across out of memory problem.

So, let's say we have an array of {2500, 3200, 3300}. We are looking for the sum closest to the desired number K = 135000. The main difference is that we can use numbers from the array multiple times.

Ok, if we can use them multiple times, then we can change it to more "traditional" way - just divide K by each of these numbers - that is 54, 42 and 40 - and create a new array, which has those numbers the number of times received from dividing. It would be {2500, 2500, 2500, ... , ... 3300, 3300} and the new array would have the length of 136. Now this is much more than 3.

So - how to solve the closest subset sum problem, where we can pick more than 2 numbers from the array of 136 elements or more using Java?

The thing I want to get is not only the closest sum, but also a list of elements which gave that sum.

I heard and was reading about dynamic programing, approximation algorithms and genetic algorithms, but unfortunately I have no idea about those. I did genetic algorithm for a different case some time ago, but I am not sure how to use it in this case.

Any ideas? I will be really glad for help.


Solution

  • I am not going to solve it for you. but I'll give you the key ideas in pseudocode (aka Python).

    We start with a state that represents the following statement: "I don't know how to arrive to any numbers. The best thing I can generate is 0. I have not yet processed the fact that I could get to 0."

    In data:

    can_generate = set()
    todo = [0]
    best = 0
    K = 135000
    

    What we will do is, while anything is in todo, take off a value, see if it is new to us. If it is, we might update best, and possibly add new values to todo. Like this:

    while len(todo):
        value = todo.pop()
        if value not in can_generate:
            can_generate.add(value)
            if abs(K-value) < abs(K-best):
                best = value
            if value < K:
                for term in [2500, 3200, 3300]:
                    todo.append(value + term)
    

    Now that we know the values in can_generate, we search backwards to find how to get there.

    answer = []
    while 0 < best:
        for term in [3300, 3200, 2500]:
            if best - term in can_generate:
                answer.append(term)
                best -= term
                break