pythonnumpyoptimizationscipyassignment-problem

Linear sum assignment (SciPy) and balancing the costs


I am having difficulty using scipy.optimize.linear_sum_assignment to evenly distribute tasks (costs) to workers, where each worker can be assigned multiple tasks. The cost matrix represents the workload of each task for each worker.

We want to minimize the total costs of all workers, while evenly distributing the costs of each worker.

In this example, we have 3 workers named a, b and c. Each worker can be assigned a total of 4 tasks, so in the cost matrix we have the agents a_1, a_2 and so forth.

linear_sum_assignment does give us the assignment with the total costs minimized. To simply things, our example uses a cost matrix such that any assignment will give us the same total costs.

However, is the costs is not evenly distributed across the 3 workers. In our example, the costs for the 3 workers are 65, 163 and 192 respectively.

Is it possible to minimize the costs as much as possible, while distributing the costs per worker more evenly across the 3 workers?

from scipy.optimize import linear_sum_assignment
import numpy as np

worker_capacities = [
    "a_1", "a_2", "a_3", "a_4",
    "b_1", "b_2", "b_3", "b_4",
    "c_1", "c_2", "c_3", "c_4",
]
n_tasks = len(worker_capacities)
c = np.array([
 [27, 42, 65, 33, 67, 45, 60, 76,  6,  6, 43, 26],
 [27, 42, 65, 33, 67, 45, 60, 76,  6,  6, 43, 26],
 [27, 42, 65, 33, 67, 45, 60, 76,  6,  6, 43, 26],
 [27, 42, 65, 33, 67, 45, 60, 76,  6,  6, 43, 26],
 [27, 42, 65, 33, 67, 45, 60, 76,  6,  6, 43, 26],
 [27, 42, 65, 33, 67, 45, 60, 76,  6,  6, 43, 26],
 [27, 42, 65, 33, 67, 45, 60, 76,  6,  6, 43, 26],
 [27, 42, 65, 33, 67, 45, 60, 76,  6,  6, 43, 26],
 [27, 42, 65, 33, 67, 45, 60, 76,  6,  6, 43, 26],
 [27, 42, 65, 33, 67, 45, 60, 76,  6,  6, 43, 26],
 [27, 42, 65, 33, 67, 45, 60, 76,  6,  6, 43, 26],
])

_, assignments = linear_sum_assignment(c)
print("Costs for worker a:", sum(c[i][j] for i, j in enumerate(assignments[0:4])))
print("Costs for worker b:", sum(c[i+4][j] for i, j in enumerate(assignments[4:8])))
print("Costs for worker c:", sum(c[i+8][j] for i, j in enumerate(assignments[8:12])))

gives the output

Costs for worker a: 65
Costs for worker b: 163
Costs for worker c: 192

Solution

  • The linear_sum_assignment method doesn't support constraints or a custom objective, so I don't think this is possible.

    However, you could formulate your problem as a mixed-integer linear programming problem (MILP) and solve it by means of PuLP1. In order to evenly distribute the total costs per worker, you could minimize the difference between the maximum and the minimum total costs per worker. Here's a possible formulation:

    Sets:
    
    - workers = ["a", "b", "c"]
    - tasks   = [1, 2, ..., 12]
    
    Variables:
    
    - x[w,t] = 1 iff worker w is assigned to task t, 0 otherwise
    - min_val
    - max_val
    
    Model:
    
    min  max_val - min_val
    
    s.t. 
    
    # each worker is assigned to exactly n_tasks_per_worker tasks
    sum(x[w,t] for t in tasks) == n_tasks_per_worker for all w in workers
    
    # each task can only be assigned once
    sum(x[w,t] for w in workers) == 1 for all t in tasks
    
    # evenly distribute the tasks
    sum(x[w,t] for t in tasks) <= max_val for all w in workers
    sum(x[w,t] for t in tasks) >= min_val for all w in workers
    

    The code is straightforward:

    import pulp
    import numpy as np
    
    workers = ["a", "b", "c"]
    n_workers = len(workers)
    n_tasks_per_worker = 4
    n_tasks = n_workers * n_tasks_per_worker
    
    c = np.array([[27, 42, 65, 33, 67, 45, 60, 76,  6,  6, 43, 26],
                  [27, 42, 65, 33, 67, 45, 60, 76,  6,  6, 43, 26],
                  [27, 42, 65, 33, 67, 45, 60, 76,  6,  6, 43, 26]])
    
    # create the model
    mdl = pulp.LpProblem("even_assignment")
    
    # decision variables
    x = {}
    for w in workers:
        for t in range(n_tasks):
            x[w, t] = pulp.LpVariable(f"x[{w}, {t}]", cat="Binary")
    
    max_val = pulp.LpVariable("max_val", cat="Continuous")
    min_val = pulp.LpVariable("min_val", cat="Continuous")
    
    # objective: minimize the difference between the maximum and the minimum
    #            costs per worker
    mdl.setObjective(max_val - min_val)
    
    # constraint: each worker gets assigned exactly n_tasks_per_worker
    for w in workers:
        mdl.addConstraint(sum(x[w, task] for task in range(n_tasks)) == n_tasks_per_worker)
    
    # constraint: each task can only be assigned once
    for task in range(n_tasks):
        mdl.addConstraint(sum(x[w, task] for w in workers) == 1)
    
    # constraint: evenly distribute the tasks
    for i_w, w in enumerate(workers):
        assignment_cost = sum(x[w,task]*c[i_w,task] for task in range(n_tasks))
        mdl.addConstraint(assignment_cost <= max_val)
        mdl.addConstraint(assignment_cost >= min_val)
    
    # solve the problem
    mdl.solve()
    
    # Output
    for i_w, w in enumerate(workers):
        worker_cost = sum(x[w, t].varValue*c[i_w, t] for t in range(n_tasks))
        print(f"costs for worker {w}: {worker_cost:.2f}")
    

    This gives me

    costs for worker a: 165.00
    costs for worker b: 167.00
    costs for worker c: 164.00
    

    [1] To be exact, PuLP isn't a solver, it's just a modelling framework that can pass MILPs to MILP Solvers like CBC, SCIP, HiGHS etc.