pythonmathematical-optimizationlinear-programmingpyomoglpk

Pyomo Objective takes too long to construct


I have a Pyomo objective function that consists of 2 nested loops (total 1,000,000 loops) and it takes about 40 secs for Pyomo to construct it on my computer.

m.obj = Objective(
    expr=sum(
        costs[int(w)][int(t)] * m.Assign[w, t] * (int(w) + int(t))
        for w in workers # len 1000
        for t in tasks   # len 1000
    )
)

Is there a way for us to speed up the creation of the objective function? Such as using multiprocessing, or manually construct the objective expression ourselves instead then passing it to Pyomo instead of having Pyomo perform the construction?

My example Pyomo code:

from pyomo.environ import *
import numpy as np

SIZE = 1000
costs = np.random.randint(1, 15, (SIZE, SIZE))

m = ConcreteModel()
m.W = Set(initialize=[str(i) for i in range(1, SIZE)])
m.T = Set(initialize=[str(i) for i in range(1, SIZE)])
m.Assign = Var(m.W, m.T, domain=Binary)

m.obj = Objective(
    expr=sum(
        costs[int(w)][int(t)] * m.Assign[w, t] * (int(w) + int(t))
        for w in m.W
        for t in m.T
    )
)


def c1_rule(m, t):
    return sum(m.Assign[w, t] for w in m.W) == 1


def c2_rule(m, w):
    return sum(m.Assign[w, t] for t in m.T) == 1


m.c1 = Constraint(m.T, rule=c1_rule)
m.c2 = Constraint(m.W, rule=c2_rule)

solver = SolverFactory("glpk")
results = solver.solve(m)

Solution

  • First, note that you have a slight off-by-one error in your example (the sizes of W, and T do not match SIZE because Python's range(1, SIZE) stops 1 before the second argument).

    You are getting hit by a known performance "quirk" when working with NumPy. NumPy is very fast when all the operations can stay on the "C" (compiled) side. However, the way you have written your objective, each element of the costs array is brought back from the "C" side into a Python float before hitting the Pyomo operator overloading.

    Given your example:

    from pyomo.common.timing import tic, toc
    from pyomo.environ import *
    import numpy as np
    
    SIZE = 1000
    tic(None)
    
    m = ConcreteModel()
    m.W = Set(initialize=[str(i) for i in range(1, SIZE+1)])
    m.T = Set(initialize=[str(i) for i in range(1, SIZE+1)])
    m.Assign = Var(m.W, m.T, domain=Binary)
    toc("Constructed Var")
    costs = np.random.randint(1, 15, (SIZE, SIZE))
    toc("numpy generated random numbers")
    @m.Objective()
    def obj(m):
        return sum(
            costs[int(w)-1][int(t)-1] * m.Assign[w, t] * (int(w) + int(t))
            for w in m.W
            for t in m.T
        )
    toc("Constructed Objective")
    

    I see

    [+   1.45] Constructed Var
    [+   0.01] numpy generated random numbers
    [+  10.63] Constructed Objective
    

    If instead we avoided numpy entirely, we could save some time:

    import random
    
    m = ConcreteModel()
    m.W = Set(initialize=[str(i) for i in range(1, SIZE+1)])
    m.T = Set(initialize=[str(i) for i in range(1, SIZE+1)])
    m.Assign = Var(m.W, m.T, domain=Binary)
    toc("Constructed Var")
    tic(None)
    costs = {(w, t): random.randint(1, 15) for w in m.W for t in m.T}
    toc("python generated random numbers")
    @m.Objective()
    def obj(m):
        return sum(
            costs[w,t] * m.Assign[w, t] * (int(w) + int(t))
            for w in m.W
            for t in m.T
        )
    toc("Constructed Objective")
    

    gives

    [+   1.45] Constructed Var
    [+   0.79] python generated random numbers
    [+   5.59] Constructed Objective
    

    The time savings are solely due to not having to cross the Python / C boundary within the loop. We can save a little more time if we know how Pyomo's operator overloading works and restructure the expression to combine float terms directly and not rely on the Pyomo expression system to simplify things. This example also switches to using 0-based int indexing to avoid converting things from / to strings (mostly for convenience - the conversions were not a huge part of the time - only about 0.2 seconds):

    m = ConcreteModel()
    m.W = Set(initialize=range(SIZE))
    m.T = Set(initialize=range(SIZE))
    m.Assign = Var(m.W, m.T, domain=Binary)
    toc("Constructed Var")
    costs = {(w, t): random.randint(1, 15) for w in m.W for t in m.T}
    toc("python generated random numbers")
    @m.Objective()
    def obj(m):
        return sum(
            costs[w, t] * (2 + w + t) * m.Assign[w, t]
            for w in m.W
            for t in m.T
        )
    toc("Constructed Objective")
    

    Giving us another modest performance gain:

    [+   1.43] Constructed Var
    [+   0.79] python generated random numbers
    [+   4.06] Constructed Objective
    

    Finally, recent Pyomo versions support better (although not perfect) integration with NumPy. In particular some vectorized operations are supported:

    m = ConcreteModel()
    m.W = Set(initialize=range(SIZE))
    m.T = Set(initialize=range(SIZE))
    m.Assign = Var(m.W, m.T, domain=Binary)
    toc("Constructed Var")
    costs = np.random.randint(1, 15, (SIZE, SIZE))
    factor = np.array(range(SIZE)).repeat(SIZE).reshape(SIZE, SIZE)
    factor = 2 + factor + factor.transpose()
    toc("numpy generated random numbers")
    @m.Objective()
    def obj(m):
        return sum(sum(costs * factor * m.Assign))
    toc("Constructed Objective")
    

    Giving a solution that is almost 3x faster than your original model:

    [+   1.44] Constructed Var
    [+   0.01] numpy generated random numbers
    [+   3.62] Constructed Objective