I have a class that evaluates a result for a given input list of parameters, and I would like to find the set of parameters that minimises that result. Since the inputs must remain integers, I read that PuLP might be a better fit than scipy.optimize.minimize().
I tried to implement it, but the line prob += ... doesn't work. Here is a simplified version of the code that reproduces the problem. It seems I am missing something important about PuLP.
import pandas as pd
import pulp as pl

class ComplexClass():
    def __init__(self, path_data: str):
        # self.data = pd.read_csv(path_data)
        d = {'col1': [i for i in range(100)], 'col2': [i**2 for i in range(100)]}
        self.df = pd.DataFrame(data=d)

    def eval_value(self, param: list):
        self.p1 = param[0]
        self.p2 = param[1]
        # [...] Complex stuff manipulating the pandas DataFrame
        self.df['col3'] = self.df['col1'].rolling(window=self.p1).mean()
        self.df['col4'] = self.df['col2'].rolling(window=self.p2).mean()
        self.df['col5'] = self.df['col3'] + self.df['col4']
        self.best_value = self.df['col5'].iloc[-1]

def func_to_minimize(input_parm: list, initialized_class):
    initialized_class.eval_value(input_parm)
    return initialized_class.best_value
path = './path2data/data.csv'
my_class = ComplexClass(path_data=path)
# ===== INIT TEST =====
my_param = [2, 10]
print(f'For input {my_param}\n => Value to minimize = {func_to_minimize(input_parm=my_param, initialized_class=my_class)}')
# ====== OPTIMIZATION WITH PULP =====
# Create the 'prob' variable to contain the problem data
prob = pl.LpProblem("Best_Param", pl.LpMinimize)
# The two integer variables param1 and param2 are created with their bounds
x1 = pl.LpVariable("param1", lowBound=2, upBound=10, cat=pl.LpInteger)
x2 = pl.LpVariable("param2", lowBound=2, upBound=20, cat=pl.LpInteger)
# The objective function is added to 'prob' first
prob += func_to_minimize([x1, x2], my_class) # <= doesn't work
# The problem data is written to an .lp file
prob.writeLP("Best_Param.lp")
# The problem is solved using PuLP's choice of Solver
prob.solve()
The error occurs when evaluating self.df['col3'] = self.df['col1'].rolling(window=self.p1).mean() and is: ValueError: window must be an integer 0 or greater. Indeed, the self.p1 used as window is an LpVariable, not an integer.
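You can confirm this directly; these two checks are only illustrative and not part of my actual script:

# Illustrative checks: before prob.solve(), an LpVariable carries no numeric value
print(type(x1))    # <class 'pulp.pulp.LpVariable'>, not int
print(x1.value())  # None until the problem has been solved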
How can I minimise a result with PuLP when the value to be minimised has to be evaluated by a function rather than written as a single linear expression?
Thank you.
The problem is that you're trying to use PuLP, a linear programming (LP) and integer programming library, with an objective that is not a linear expression of the decision variables. PuLP requires the objective function and constraints to be formulated as linear (or affine) relationships of its LpVariables. Here's what is going wrong, and then a way to address it:
- PuLP can't directly handle objective functions that are evaluated by non-linear operations or that involve calls to other Python functions or methods that manipulate data dynamically, such as rolling-window calculations in pandas.
- PuLP variables (LpVariable) are not actual numbers until after the optimization problem has been solved, so they can't be used in calculations or functions that require concrete numeric inputs during problem definition. An objective must instead be written as a linear expression of the variables themselves (see the sketch just below).
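For contrast, this is the kind of objective and constraint PuLP can accept: linear expressions built directly from the LpVariables. The coefficients and the bound below are purely illustrative placeholders, not values taken from your problem.

# Purely illustrative: PuLP only accepts expressions that are linear in the LpVariables
prob += 3 * x1 + 5 * x2, "illustrative_linear_objective"
prob += x1 + x2 <= 25, "illustrative_linear_constraint"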
Given that you need to optimize parameters based on evaluations done by a class method that processes data dynamically, one possible approach is to use metaheuristic algorithms such as Genetic Algorithms, Simulated Annealing, or others that require neither the gradient of the objective function nor its linearity. Libraries like deap, simanneal, or pyeasyga can be useful here. These algorithms can handle arbitrary objective functions and work with integer constraints (a short simanneal sketch is included at the end of this answer).
Here's a conceptual outline using a genetic algorithm (via the deap library) to handle the problem:
NOTE: Make sure you have the deap library installed, or install it with pip install deap before running the example below.
import random
import pandas as pd
from deap import base, creator, tools, algorithms

class ComplexClass():
    def __init__(self, path_data: str):
        # self.data = pd.read_csv(path_data)
        d = {'col1': [i for i in range(100)], 'col2': [i**2 for i in range(100)]}
        self.df = pd.DataFrame(data=d)

    def eval_value(self, param: list):
        self.p1 = param[0]
        self.p2 = param[1]
        # [...] Complex stuff manipulating the pandas DataFrame
        self.df['col3'] = self.df['col1'].rolling(window=self.p1).mean()
        self.df['col4'] = self.df['col2'].rolling(window=self.p2).mean()
        self.df['col5'] = self.df['col3'] + self.df['col4']
        self.best_value = self.df['col5'].iloc[-1]

def func_to_minimize(input_parm: list, initialized_class):
    initialized_class.eval_value(input_parm)
    return initialized_class.best_value

# Define the evaluation function for the genetic algorithm
def evalGA(params):
    result = func_to_minimize(params, my_class)
    return (result,)
# Setting up the genetic algorithm
my_class = ComplexClass("")
creator.create("FitnessMin", base.Fitness, weights=(-1.0,))
creator.create("Individual", list, fitness=creator.FitnessMin)
toolbox = base.Toolbox()
# Register a function named "attribute_param1" that generates random numbers
# between 2 and 10. This function will be used to generate values for the
# variable "param1"
toolbox.register("attribute_param1", random.randint, 2, 10)
# Do the same process to register a function that generates values for the
# variable "param2". Change only the interval that for "param2" should be
# between 2 and 20
toolbox.register("attribute_param2", random.randint, 2, 20)
# Register another function that is equivalent to generating values for
# "param1" and "param2" one time
toolbox.register("individual", tools.initCycle, creator.Individual,
(toolbox.attribute_param1, toolbox.attribute_param2),
n=1)
toolbox.register("population", tools.initRepeat, list, toolbox.individual)
toolbox.register("evaluate", evalGA)
toolbox.register("mate", tools.cxTwoPoint)
toolbox.register("mutate", tools.mutUniformInt, low=[2, 2], up=[10, 20], indpb=0.2)
toolbox.register("select", tools.selTournament, tournsize=3)
# Create population and run the genetic algorithm
population = toolbox.population(n=50)
result = algorithms.eaSimple(population, toolbox, cxpb=0.5, mutpb=0.2, ngen=40, verbose=True)
best_individual = tools.selBest(population, 1)[0]
print('Best Parameters:', best_individual)
print('Best Score:', best_individual.fitness.values[0])
# Prints:
#
# Best Parameters: [10, 20]
# Best Score: 8138.0
This approach directly evaluates your class method within the genetic algorithm's fitness function, allowing for any complexity in the computation that you need.
You can find more information on how to use deap here: https://deap.readthedocs.io/en
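If you prefer simulated annealing over a genetic algorithm, here is a minimal sketch using the simanneal library mentioned above. It reuses func_to_minimize and my_class from the example, keeps the same integer bounds (2-10 and 2-20), and relies on simanneal's default annealing schedule apart from the step count, so treat it as a starting point rather than a tuned solution.

import random
from simanneal import Annealer

class ParamAnnealer(Annealer):
    """Simulated annealing over the two integer parameters."""
    def __init__(self, initial_state, evaluator):
        self.evaluator = evaluator
        super().__init__(initial_state)   # e.g. [2, 10]

    def move(self):
        # Perturb one parameter at random, staying within the integer bounds
        if random.random() < 0.5:
            self.state[0] = random.randint(2, 10)
        else:
            self.state[1] = random.randint(2, 20)

    def energy(self):
        # Lower energy is better, which matches the value we want to minimise
        return func_to_minimize(self.state, self.evaluator)

annealer = ParamAnnealer([2, 10], my_class)
annealer.steps = 2000
best_state, best_value = annealer.anneal()
print('Best Parameters:', best_state)
print('Best Score:', best_value)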