setpermutationsubset-sum

For a Set of Integers, Get All Sets Whose Sum is Greater Than Some Value


Let me start by saying I have found many questions and answers that answer things which seem somewhat related or close to this question, but are not actually this question. It seems related to the coin change / subset sum problem, but I think this is distinctly different enough that subset sum answers do not cover this question.

The Problem

Let's say I have a set of 4 integers called S : [1, 2, 5, 1]

The goal is to devise a set of sets, G , that is comprised of each subset of S where the sum of the set is greater than some value N.

In this example, if N = 6, then G would be [ [5,2,1,1], [5,2,1], [5,2], [5,1,1] ]

Considerations

I chose the word set here specifically because the order of each set does not matter at all. [5, 2, 1, 1] is identical to [5, 1, 1, 2].

When you create your first set that exceeds N, lets say G1, you can also immediately add all other sets which contain G1 as a subset. Likewise, if you find another new set G2 that has not already added and also exceeds N, you can immediately add all sets which contain G2 as a subset. You don't need to check if these sets sum to greater than N.

If you sum all the members of S and the result is not greater than N, then you can conclude there are no sets which meet the criteria.

Actual Use-case

The real-world goal that is trying to be met here is that I have a group of items which have a numerical weight associated with them, and another value which represents a threshold. I'm trying find out if its even feasible to routinely find all the sub-groups that can be made of the items that exceed the threshold when their weights are summed.

What is the most efficient way to generate all sets that meet this criteria? What is complexity of this?


Solution

  • Here's a more efficient implementation. Worthwhile notes:

    from collections import Counter
    
    def solve(nums, target):
        counts = sorted(Counter(nums).items())
        reserve = sum(nums) - target
        
        if reserve <= 0:
            return []
        return list(_solve(counts, reserve, []))
    
    def _solve(counts, reserve, prefix):
        if not counts:
            yield tuple(prefix)
            return
        
        val, max_count = last = counts.pop()
        
        prefix.extend([val] * max_count)
        yield from _solve(counts, reserve, prefix)
        
        for count in range(1, max_count + 1):
            prefix.pop()
            if reserve - count * val > 0:
                yield from _solve(counts, reserve - count * val, prefix)
        
        counts.append(last)
    

    Timing it:

    from timeit import timeit
    
    runs = 10
    statement = "solve([randrange(15) for _ in range(25)], 100)"
    setup = "from __main__ import solve; from random import randrange"
    print(timeit(statement, setup=setup, number=runs)/runs)
    

    Output:

    0.22455865999218078
    

    Meaning its able to solve problems of 25 elements where the solution contains ~100k distinct subsets in about 225ms.

    I did test this against a naive solution to ensure correctness as well.

    Regarding time complexity, its hard to bound this, because its run time really depends on the value of the numbers (or rather, the output size). You could bound it as O(n log n + s * n) where n is the size of the input list, and s is the size of the output list.