pythonrlinear-programmingsolvermixed-integer-programming

Why is my Mixed-Integer Linear Programming problem running slow on python compared to R?


I have a very large mixed-integer linear programming problem that I need to run thousands of times, so speed is a priority.

I need it to run in both R and in python. My R code runs very fast, the solver takes about 0.002 seconds to solve the problem. My python code is a different story, it takes anywhere from of 0.05 to 0.80 seconds and often times much more.

I highly doubt that R is 25x more efficient than python, especially considering that (to my knowledge) both are using C code to solve the problem. I'm looking for help to determine how to make my python code faster. I assume it starts with getting a different package than the one I have now? I really need the python code to consistently be below .01 seconds.

Here is sample R code and python code that solve the same problem. In this example of my real problem, I'm building "fantasy basketball" lineups with certain constraints. Player pool is 500, must choose exactly 8 players with at least 1 and no more than 3 from each position. Salary must be under 50000. I randomly sample Positions/Salary since the actual solution isn't important as much as the speed.

R Code

library(lpSolve)
library(dplyr)

# Make dummy dataframe with player positions, salaries, and projected points
Positions <- c('PG', 'SG', 'SF', 'PF', 'C', 'PG/SG', 'SG/SF', 'PF/C')
Position <- sample(Positions, 500, replace = TRUE)
Salary <- sample(seq(3000, 12000, by=100), 500, replace=TRUE)
playerTable <- data.frame(ID = 1:500, Position, Salary) %>% 
  mutate(ProjPts = round(Salary/1000*5))

# Make positional constraint vectors
ConVec_Position <- playerTable %>% 
  select(Position) %>% 
  mutate(AnyPG =  grepl('PG', Position)*1,
         AnySG = grepl('SG', Position)*1,
         AnySF = grepl('SF', Position)*1,
         AnyPF = grepl('PF', Position)*1,
         AnyC = grepl('C', Position)*1,
         
         orPGSGSFPFC = (AnyPG + AnySG + AnySF + AnyPF + AnyC > 0)*1
  ) %>% 
  select(-Position)

# Make salary constraint vectors
ConVec_Salary <- playerTable$Salary

# Get Objective (projected points)
ProjectedPoints <- playerTable$ProjPts

# Make constraint values and directions
ConValGT <- c(1,1,1,1,1, 8) # Must have at least one of each position, and 8 total
ConValLT <- c(3,3,3,3,2, 8, 50000) # Can't have more than 3 of each position, 8 total, and 50000 salary
ConDirGT <- c(rep(">=", ncol(ConVec_Position))) # Make >= directions for each position constraint
ConDirLT <- c(rep("<=", ncol(ConVec_Position) + 1)) # Make <= directions for each position constraint, plus 1 for Salary

# Compile constraint vectors, values, and directions
ConVec_Final <- cbind(ConVec_Position, ConVec_Position, ConVec_Salary)
ConDir_Final <- c(ConDirGT, ConDirLT)
ConVal_Final <- c(ConValGT, ConValLT)

start_time <- proc.time()

sol <- lp("max",
          objective.in = ProjectedPoints,
          const.mat = t(ConVec_Final),
          const.dir = ConDir_Final,
          const.rhs = ConVal_Final,
          binary.vec = 1:length(ProjectedPoints) # All decisions are binary
)

cat("\n--- It took ", round((proc.time() - start_time)[3], 4), " seconds to optimize lineups ---\n", sep = "")

python Code

import pandas as pd
import numpy as np
from lp_solve import *
import time

# Make dummy dataframe with player positions, salaries, and projected points
ID = range(500)
Positions = ['PG', 'SG', 'SF', 'PF', 'C', 'PG/SG', 'SG/SF', 'PF/C']
Position = np.random.choice(Positions,500)
Salary = np.random.choice(range(3000,12000,100),500)
playerTable = pd.DataFrame({'ID': ID, 'Position': Position, 'Salary': Salary})
playerTable['ProjPts'] = playerTable['Salary']/1000*5

# Make positional constraint vectors
AnyPG = playerTable.Position.str.contains('PG')*1
AnySG = playerTable.Position.str.contains('SG')*1
AnySF = playerTable.Position.str.contains('SF')*1
AnyPF = playerTable.Position.str.contains('PF')*1
AnyC = playerTable.Position.str.contains('C')*1
AnyPos = [1]*500
ConVec_Position = pd.DataFrame({'AnyPG': AnyPG, 'AnySG': AnySG, 'AnySF': AnySF, 'AnyPF': AnyPF, 'AnyC': AnyC, 'AnyPos': AnyPos})

# Make salary constraint vectors
ConVec_Salary = playerTable[['Salary']]

# Get Objective (projected points)
ProjectedPoints = playerTable['ProjPts'].tolist()

# Make constraint values and directions
ConValGT = [1,1,1,1,1, 8] # Must have at least one of each position, and 8 total
ConValLT = [3,3,3,3,2, 8, 50000] # Can't have more than 3 of each position, 8 total, and 50000 salary
ConDirGT = [1] * (len(ConVec_Position.columns)) # Make >= directions for each position constraint
ConDirLT = [-1] * (len(ConVec_Position.columns) + 1) # Make <= directions for each position constraint, plus 1 for Salary

# Compile constraint vectors, values, and directions
ConVec_Final = pd.concat([ConVec_Position, ConVec_Position, ConVec_Salary], axis=1)
ConVal_Final = ConValGT + ConValLT
ConDir_Final = ConDirGT + ConDirLT

# Force all decisions to be binary
vLB = [0] * len(ProjectedPoints) # lower bound 0
vUB = [1] * len(ProjectedPoints) # upper bound 1
xint = [i+1 for i in range(len(ProjectedPoints))] # all decisions are integers (aka all are binary)

solveTimefix = time.time()
[obj, sol, duals] = lp_solve(ProjectedPoints, 
                             ConVec_Final.T.values.tolist(), 
                             ConVal_Final, 
                             ConDir_Final, 
                             vLB, 
                             vUB, 
                             xint)

solveTimeCurrRun = (time.time() - solveTimefix)
print("\n--- It took %s seconds to get all LUs ---\n" % round(solveTimeCurrRun, 2))

Any ideas on how to make my python code run the same speed as R?


Solution

  • On my craptop this takes about 4 ms, but importantly it is not apples-to-apples. I believe that you are missing a (large) constraint: each individual player can only be assigned to one position.

    from time import perf_counter
    
    import pandas as pd
    import numpy as np
    import scipy.sparse
    from scipy.optimize import milp, LinearConstraint, Bounds
    
    from numpy.random._generator import default_rng
    
    
    rand = default_rng(seed=0)
    
    # Make dummy dataframe with player positions, salaries, and projected points
    n = 500
    position_combos = ('PG', 'SG', 'SF', 'PF', 'C', 'PG/SG', 'SG/SF', 'PF/C')
    salary_k = rand.integers(low=3, high=12, size=n)
    players = pd.DataFrame(
        {
            'Positions': rand.choice(position_combos, n),
            'Salary': salary_k * 1_000,
            'ProjPts': salary_k * 5,
        },
        index=pd.RangeIndex(name='ID', stop=n),
    )  # 500x3
    
    # Convert from slash-separated position option strings to a dense ID/Position multi-index
    exploded_positions = players.Positions.str.split('/', expand=True)  # 500x2
    position_values = np.array(('PG', 'SG', 'SF', 'PF', 'C'))
    position_sparse = pd.DataFrame(
        (
            exploded_positions.values[:, :, np.newaxis]
            == position_values[np.newaxis, np.newaxis, :]
        ).any(axis=1),
        index=players.index,
        columns=pd.Index(name='Position', data=position_values),
    )  # 500x5
    position_sparse[~position_sparse] = np.nan
    positions = position_sparse.stack().index  # 685x2
    
    # Sparse 2D position assignment matrix with guaranteed single one per column
    position_assignments = np.equal.outer(
        positions.get_level_values('Position'),
        position_values,
    ).astype(float).T  # 5x685
    
    # Sparse 2D player assignment matrix with guaranteed single one per column
    player_assignments = np.equal.outer(
        positions.get_level_values('ID'),
        players.index,
    ).astype(float).T  # 500x685
    
    # Dense salary assignment (row) with repeats according to player ID
    salary_assignments = players.Salary[positions.get_level_values('ID')]  # 685
    
    # Cost values: negative projected points, only where a player can take a given position;
    # multi-indexed by ID and Position
    c = -players.ProjPts[positions.get_level_values('ID')].astype(float)
    c.index = positions  # 685
    
    integrality = np.ones(shape=c.size, dtype=np.uint8)  # 685
    bounds = Bounds(
        lb=np.zeros_like(c),
        ub=np.ones_like(c),  # 685
    )
    
    A = scipy.sparse.csc_array(
        scipy.sparse.vstack((
            scipy.sparse.csr_array(np.ones_like(c)),       # The total number of selected players must be fixed
            scipy.sparse.csr_array(salary_assignments),    # There is a maximum salary
            scipy.sparse.csr_array(position_assignments),  # Each position count has a min and max
            scipy.sparse.csr_array(player_assignments),    # One player cannot occupy multiple positions
        ), format='csc')  # this is what milp uses; see _milp_iv()
    )  # 507x685
    lb, ub = np.block([
        [8, -np.inf, 1, 1, 1, 1, 1, np.full(shape=n, fill_value=-np.inf)],
        [8,  50_000, 3, 3, 3, 3, 2, np.ones(n)],
    ])
    
    t0 = perf_counter()
    result = milp(
        c=c, integrality=integrality, bounds=bounds,
        constraints=(LinearConstraint(A=A, lb=lb, ub=ub),),
    )
    t1 = perf_counter()
    assert result.success, result.message
    
    print(result.message)
    print(f'Solution in {1e3*(t1-t0):.1f} ms')
    
    outputs = pd.Series(name='Assigned', data=result.x, index=positions)
    selections = outputs[outputs > 0.5].index
    merged_outputs = players.loc[selections.get_level_values('ID')].set_index(selections)
    print(merged_outputs, end='\n\n')
    
    print('Total:')
    print(merged_outputs[['Salary', 'ProjPts']].sum().to_string())
    

    This uses HiGHS in the backend; I have not attempted lpsolve55. The output is

    Optimization terminated successfully. (HiGHS Status 7: Optimal)
    Solution in 3.4 ms
                 Positions  Salary  ProjPts
    ID  Position                           
    314 SF              SF   10000       50
    390 PG              PG    3000       15
    447 SG              SG    3000       15
    467 SF              SF   10000       50
    480 PF              PF    7000       35
    484 SG              SG    3000       15
    487 C                C    7000       35
    494 C                C    7000       35
    
    Total:
    Salary     50000
    ProjPts      250
    

    You could also try running on PuLP/CBC (I have not shown that here).