pythoncombinationsbest-fit

Find the maximum number of combinations from a data set


I need to write a function that finds the maximum number of combinations from a data set, where the sum of each of the subset combinations are >= a target value. Once numbers are taken from the data_set and put into a combination they cannot be reused in another combination For example data_set = [1,2,3,4,5,6,7,8,9,10] target_value = 5 the combinations would be 1: [10] 2: [9] 3: [8] 4: [7] 5: [6] 6: [5] 7: [4,1] 8: [3,2] an example of bad grouping would be 1: [1,2,3] 2: [4,5] 3: [6] 4: [7] 5: [8] 6: [9] or 1: [1,2,3] 2: [4,5] 3: [6,7,8,9]. I believe a constraint to be the number of subsets cannot be greater than sum(data_set)/target_value i.e. data_set = [5,5,5,5,5] target_value = 5 yields [5],[5],[5],[5],[5].

For context I have a large set of items I need to buy from a store, the store gives a coupon every time your purchase is over $150, if you buy everything at once you receive one coupon but if you split your items into small purchases as close to $150 as possible you will receive the maximum coupons possible.

This is the current code but it is obviously not optimized I'm having trouble scanning ahead for better matches

from numpy import random


def get_groups(item_list=[], target=0):
    groups = []

    def recurse(current_group, remaining_item_list):
        for index in range(len(remaining_item_list)):
            if sum(current_group) + remaining_item_list[index] < target:
                current_group.append(remaining_item_list[index])
                if index+1 == len(remaining_item_list):
                groups.append(current_group)
            else:
                current_group.append(remaining_item_list[index])
                groups.append(current_group)
                recurse([], remaining_item_list[index+1:])
                break

    item_list.sort(reverse=True)
    recurse([], item_list)

    return groups

items = [ random.randint(50) for i in range(21)]
target = 150

groups = get_groups(items, target)

print("Items: {}".format(items))

for index, group in enumerate(groups, start=1):
print("Group {}: {}, total: {}, length: {}".format(index, group, sum(group), len(group)))

EDIT: here is some much more optimized code I am sure it is not 100% correct but its better

from numpy import random


def get_groups(item_list=[], target=0):
    groups = []

    def recurse(current_group, remaining_item_list):
        for index in range(len(remaining_item_list)):
            remaining_item_list.sort(reverse=True)
            if sum(current_group) + remaining_item_list[index] < target:
                current_group.append(remaining_item_list[index])
                if index+1 == len(remaining_item_list):
                    groups.append(current_group)
            elif sum(current_group) + remaining_item_list[index] > target and current_group:
                reverse_search(current_group, remaining_item_list)
                remaining_item_list.sort(reverse=True)
                recurse([], remaining_item_list[index:])
                break
            else:
                current_group.append(remaining_item_list[index])
                groups.append(current_group)
                recurse([], remaining_item_list[index+1:])
                break
    
    def reverse_search(current_group, remaining_item_list):
        for index in range(len(remaining_item_list)):
            remaining_item_list.sort()
            if sum(current_group) + remaining_item_list[index] < target:
                current_group.append(remaining_item_list.pop(index))
                if index+1 == len(remaining_item_list):
                    groups.append(current_group)
            else:
                current_group.append(remaining_item_list.pop(index))
                groups.append(current_group)
                current_group = []
                break


    recurse([], item_list)

    return groups

items = [ random.randint(50) for i in range(20)]
target = 150

items.sort(reverse=True)

print("Items: {}".format(items))

groups = get_groups(items, target)

for index, group in enumerate(groups, start=1):
    print("Group {}: {}, total: {}, length: {}".format(index, group, sum(group), len(group)))```

Solution

  • The following approach does not guarantee an optimal solution but it will produce one in almost all cases.

    The main function starts by determining the maximum number of groups that can be formed and builds the groups by adding the largest item value that fits within the target boundary. This will produce groups that only exceed the target value by a portion of their smallest item. If the last group formed reaches the target value, then, all preceding groups also reach it and there is no additional optimization needed. If the last group does not reach the target, then an optimization function is called to swap items between groups to spread some of the extras down to the last group(s).

    from bisect import bisect_right
    from itertools import combinations
    
    def bonusGroups(numbers,target):
        numbers  = sorted(numbers)
        nbGroups = sum(numbers)//target
        if not nbGroups: return [numbers]
        groups   = [[] for _ in range(nbGroups)]
        
        # build initial groups
        for ig,g in enumerate(groups[:-1]):
            gap = target - sum(g)
            while numbers and gap > 0:
                i = max(0,bisect_right(numbers,gap) - 1)
                g.append(numbers.pop(i))
                gap -= g[-1]
            while g and min(g) <= target-sum(g):
                i = g.index(min(g))
                groups[ig+1].append(g.pop(i))
        groups[-1].extend(numbers)
    
        # last group reaches target, no spreading required
        if sum(groups[-1]) >= target:
            return groups
    
        return spreadGroups([*filter(None,groups)],target)
    

    The optimization function will try to remove items from the first groups and replace them with items of subsequent groups (starting from last). The items are selected and matched up in a way that increase the subsequent group's sum without making the first group go below the target. To keep this efficient, the logic starts by removing as few items as possible and increases the removal combination size progressively until the new spread makes all groups reach the target (or removal size exceeds the largest group's size):

    def spreadGroups(groups,target):
        groups.sort(key=sum,reverse=True)
        spreading = True
        remSize   = 1
        while spreading and any(sum(g) < target for g in groups):
            spreading = False
            for ig,g in enumerate(groups[:-1]):
                if remSize >= len(g): continue
                extra   = sum(g)-target
                if not extra: continue
                remSums = { sum(g[i] for i in idx):idx
                            for idx in combinations(range(len(g)),remSize) }
                for subSum,indexes in remSums.items():
                    for tg in groups[-1:ig:-1]:
                        idx = matchSum(tg,max(0,subSum-extra),subSum-1)
                        if not idx and subSum>extra: continue
                        g.extend(tg.pop(i) for i in sorted(idx,reverse=True))
                        tg.extend(g.pop(i) for i in sorted(indexes,reverse=True))
                        if remSize>1: ig,remSize = -1,1
                        groups[ig+1:] = sorted(groups[ig+1:],key=sum,reverse=True)
                        spreading = True
                        break
                    if spreading: break
                if spreading: break
            if not spreading and remSize < max(map(len,groups))-1:
                remSize  += 1
                spreading = True
                groups.sort(key=sum,reverse=True)           
        return groups
    

    In order to optimally match the total value of removed items, the target "swapping" group needs a combination of item (indexes) that adds up to the smallest total that is within the specified range. By getting the combination with the smallest eligible total, the reduction of the source group's total is maximized and excess item values are progressively pushed down to the last groups. This function produces a list of indexes that meet that criteria:

    # find group indexes producing smallest sum in range
    def matchSum(g,minSum,maxSum):
        gsums = {0:[]}
        for i,n in enumerate(g):
            newSums = { s+n:[*ss,i] for s,ss in gsums.items() if s+n<=maxSum }
            gsums.update(newSums)
        sumVal  = min(gsums,key=lambda s:s if s>=minSum else maxSum+1)
        return gsums.get(sumVal,[])
    

    Testing:

    items = [ random.randint(50) for i in range(21)]
    target = 150
    
    groups = bonusGroups(items, target)
    
    print("Items: {}".format(items))
    
    for index, group in enumerate(groups, start=1):
        print("Group {}: {}, total: {}, length: {}".format(index, group, sum(group), len(group)))
    

    Test outputs:

    Items: [38, 38, 31, 9, 42, 25, 22, 42, 44, 46, 43, 47, 14, 20, 38, 34, 35, 3, 49, 36, 30]
    Group 1: [49, 47, 46, 3, 9], total: 154, length: 5
    Group 2: [44, 43, 42, 20, 14], total: 163, length: 5
    Group 3: [42, 38, 38, 31, 22], total: 171, length: 5
    Group 4: [25, 30, 34, 35, 36, 38], total: 198, length: 6
    
    Items: [34, 5, 13, 30, 27, 15, 4, 44, 33, 13, 25, 41, 33, 23, 14, 8, 39, 9, 33, 8, 4]
    Group 1: [44, 41, 39, 25, 4], total: 153, length: 5
    Group 2: [34, 33, 33, 33, 15, 4], total: 152, length: 6
    Group 3: [5, 8, 8, 9, 13, 13, 14, 23, 27, 30], total: 150, length: 10
    
    Items: [25, 22, 44, 42, 43, 22, 25, 16, 42, 24, 21, 5, 4, 16, 3, 22, 46, 31, 11, 24, 14]
    Group 1: [46, 44, 43, 16, 3], total: 152, length: 5
    Group 2: [42, 42, 31, 25, 5, 4, 11], total: 160, length: 7
    Group 3: [14, 16, 21, 22, 22, 22, 24, 24, 25], total: 190, length: 9