python, optimization, pulp, minimization

How to minimize a problem using PuLP when the quantity to minimize is evaluated by a function?


I have a class that evaluates a result for a given list of input parameters. I would like to find the set of parameters that minimises the result produced by the class. Since the inputs must remain integers, I read that PuLP might be a better fit than scipy.optimize.minimize().

I tried to implement it, but the line prob += ... doesn't work. Here is a simplified version of the code that reproduces my problem. It seems that I don't understand something important about PuLP.

import pandas as pd
import pulp as pl

class ComplexClass():
    
    def __init__(self, path_data:str):
        # self.data = pd.read_csv(path_data)
        d = {'col1': [i for i in range(100)], 'col2': [i**2 for i in range(100)]}
        self.df = pd.DataFrame(data=d)
        
    def eval_value(self, param: list):
        self.p1 = param[0]
        self.p2 = param[1]
        
        # [...] Complex stuff manipulating the pandas DataFrame
        self.df['col3'] = self.df['col1'].rolling(window=self.p1).mean()
        self.df['col4'] = self.df['col2'].rolling(window=self.p2).mean()
        self.df['col5'] = self.df['col3']+self.df['col4']
        self.best_value = self.df['col5'].iloc[-1] 
        
def func_to_minimize(input_parm:list, initialized_class):
    initialized_class.eval_value(input_parm)
    return initialized_class.best_value

path = './path2data/data.csv'
my_class = ComplexClass(path_data=path)

# ===== INIT TEST =====
my_param = [2, 10]
print(f'For input {my_param}\n    => Value to minimize = {func_to_minimize(input_parm=my_param, initialized_class=my_class)}')

# ====== OPTIMIZATION WITH PULP =====
# Create the 'prob' variable to contain the problem data
prob = pl.LpProblem("Best_Param", pl.LpMinimize)
    
# The two integer decision variables param1 and param2 are created with their bounds
x1 = pl.LpVariable("param1", lowBound=2, upBound=10, cat=pl.LpInteger)
x2 = pl.LpVariable("param2", lowBound=2, upBound=20, cat=pl.LpInteger)

# The objective function is added to 'prob' first
prob += func_to_minimize([x1, x2], my_class)    # <= doesn't work

# The problem data is written to an .lp file
prob.writeLP("Best_Param.lp")

# The problem is solved using PuLP's choice of Solver
prob.solve()

The error is raised when evaluating self.df['col3'] = self.df['col1'].rolling(window=self.p1).mean() and reads: ValueError: window must be an integer 0 or greater. Indeed, the self.p1 used as window is an LpVariable, not an integer.

How can I minimize a problem using PuLP when the result to be minimised must be evaluated by a function rather than written as an explicit expression?

Thank you.


Solution

  • The problem is that you're trying to use PuLP, which is a linear programming (LP) optimization library, with a function that involves operations that aren't inherently linear or integer-programmable. PuLP is designed specifically for linear and integer programming, meaning that the objective function and constraints must be formulated as linear (or affine) relationships in the decision variables. Here's how you can address the issue you're encountering:

    Understanding the Problem

    PuLP can't directly handle objective functions that are evaluated by non-linear operations or involve calls to other Python functions or methods that manipulate data dynamically, such as rolling window calculations in pandas. PuLP variables (LpVariable) are not actual numbers until after the optimization problem is solved. Therefore, they can't be used directly in calculations or functions that require concrete numeric inputs during problem definition.
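
    For contrast, here is a minimal sketch of the kind of objective PuLP can handle: a purely linear (affine) expression built from LpVariable objects, with no calls into pandas or other Python code. The coefficients 3 and 2 are arbitrary illustration values, not derived from your data.

    import pulp as pl

    prob = pl.LpProblem("Linear_Example", pl.LpMinimize)
    x1 = pl.LpVariable("param1", lowBound=2, upBound=10, cat=pl.LpInteger)
    x2 = pl.LpVariable("param2", lowBound=2, upBound=20, cat=pl.LpInteger)

    # A linear objective: PuLP builds a symbolic affine expression here and
    # does not need numeric values for x1 and x2 at definition time.
    prob += 3 * x1 + 2 * x2

    # Linear constraints work the same way.
    prob += x1 + x2 >= 6

    prob.solve()
    print(pl.value(x1), pl.value(x2), pl.value(prob.objective))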

    Possible Solution

    Given that you need to optimize parameters based on evaluations done by a class method that processes data dynamically, one possible approach would be to use metaheuristic algorithms such as Genetic Algorithms, Simulated Annealing, or others that do not require the gradient of the objective function or even its linearity. Libraries like deap, simanneal, or pyeasyga can be useful here. These algorithms can handle arbitrary objective functions and can work with integer constraints.
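
    If you would rather stay in the SciPy ecosystem, differential evolution is one such derivative-free metaheuristic, and SciPy 1.9+ lets you mark variables as integer via the integrality argument of scipy.optimize.differential_evolution. Here is a minimal sketch, assuming SciPy >= 1.9 and reusing func_to_minimize and my_class from your code (the seed value is just an illustrative choice):

    from scipy.optimize import differential_evolution

    def objective(x):
        # x arrives as an array of integer-valued floats; cast to int before
        # handing the values to pandas' rolling(), which requires true ints.
        return func_to_minimize([int(x[0]), int(x[1])], my_class)

    res = differential_evolution(
        objective,
        bounds=[(2, 10), (2, 20)],   # same bounds as the LpVariables above
        integrality=[True, True],    # requires SciPy >= 1.9
        seed=42,                     # illustrative seed, for repeatable runs
        polish=False,                # skip the gradient-based polishing step
    )
    print(res.x, res.fun)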

    Example Implementation Using a Genetic Algorithm

    Here's a conceptual outline using a genetic algorithm (via the deap library) to handle the problem:

    NOTE: Make sure you have the deap library installed, or install it using pip install deap before running the example below.

    import random
    import pandas as pd
    from deap import base, creator, tools, algorithms
    
    class ComplexClass():
    
        def __init__(self, path_data:str):
            # self.data = pd.read_csv(path_data)
            d = {'col1': [i for i in range(100)], 'col2': [i**2 for i in range(100)]}
            self.df = pd.DataFrame(data=d)
            
        def eval_value(self, param: list):
            self.p1 = param[0]
            self.p2 = param[1]
            
            # [...] Complex stuff manipulating the pandas DataFrame
            self.df['col3'] = self.df['col1'].rolling(window=self.p1).mean()
            self.df['col4'] = self.df['col2'].rolling(window=self.p2).mean()
            self.df['col5'] = self.df['col3'] + self.df['col4']
            self.best_value = self.df['col5'].iloc[-1]
    
    
    def func_to_minimize(input_parm:list, initialized_class):
        initialized_class.eval_value(input_parm)
        return initialized_class.best_value
    
    
    # Define the evaluation function for the genetic algorithm
    def evalGA(params):
        result = func_to_minimize(params, my_class)
        return (result,)
    
    
    # Setting up the genetic algorithm
    my_class = ComplexClass("")
    creator.create("FitnessMin", base.Fitness, weights=(-1.0,))
    creator.create("Individual", list, fitness=creator.FitnessMin)
    
    toolbox = base.Toolbox()
    
    # Register a function named "attribute_param1" that generates random numbers
    # between 2 and 10. This function will be used to generate values for the
    # variable "param1"
    toolbox.register("attribute_param1", random.randint, 2, 10)
    
    # Do the same process to register a function that generates values for the
    # variable "param2". Change only the interval that for "param2" should be
    # between 2 and 20
    toolbox.register("attribute_param2", random.randint, 2, 20)
    
    # Register another function that is equivalent to generating values for
    # "param1" and "param2" one time
    toolbox.register("individual", tools.initCycle, creator.Individual,
                     (toolbox.attribute_param1, toolbox.attribute_param2),
                     n=1)
    
    toolbox.register("population", tools.initRepeat, list, toolbox.individual)
    toolbox.register("evaluate", evalGA)
    toolbox.register("mate", tools.cxTwoPoint)
    toolbox.register("mutate", tools.mutUniformInt, low=[2, 2], up=[10, 20], indpb=0.2)
    toolbox.register("select", tools.selTournament, tournsize=3)
    
    # Create the population and run the genetic algorithm; eaSimple evolves the
    # population in place and returns the final population plus a logbook
    population = toolbox.population(n=50)
    population, logbook = algorithms.eaSimple(population, toolbox, cxpb=0.5, mutpb=0.2, ngen=40, verbose=True)
    
    best_individual = tools.selBest(population, 1)[0]
    
    print('Best Parameters:', best_individual)
    print('Best Score:', best_individual.fitness.values[0])
    # Prints:
    # 
    # Best Parameters: [10, 20]
    # Best Score: 8138.0
    

    This approach directly evaluates your class method within the genetic algorithm's fitness function, allowing for any complexity in the computation that you need.
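
    As a small usage note, you can seed Python's random module before toolbox.population(...) is called if you want repeatable runs, and re-run the winning parameters through the class directly as a sanity check (the seed value below is just an illustrative choice):

    random.seed(42)  # illustrative seed; place this before toolbox.population(n=50)

    # Sanity check: feed the best parameters straight back into the class
    my_class.eval_value(list(best_individual))
    print('Re-evaluated value:', my_class.best_value)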

    You can find more information on how to use deap here: https://deap.readthedocs.io/en