python numpy-ndarray scipy-optimize differential-evolution

Using a genetic algorithm inside a function (locally) to minimize a function


I have been trying to use a genetic algorithm to fit a power law to data from different experiments, and the problem is that I do not completely understand the code. I think I understand everything except the sumOfSquaredError(parameterTuple) function.

The code I have been trying to understand and use is one commonly shared in the community; it is built from the functions shown below.

Then I call a main function that takes a pandas.DataFrame as input. This function selects specific rows and columns from the dataframe (to isolate a single experiment) and processes them.

What differs from the generally posted code is that my function takes a pandas DataFrame as input, which is nothing more than an indexed dataset. I then impose conditions on it so that each iteration extracts exactly one experiment from the whole dataset, and I try to define that experiment's data as xData and yData.


Update

If I try to pass xData and yData explicitly:

yData = np.asarray(y_eff)
xData = np.asarray(x)

# diff_evolution completes by calling curve_fit() using param. bounds
geneticParameters = generate_Initial_Parameters(xData, yData)

and, with sumOfSquaredError redefined as sumOfSquaredError(xData, yData, parameterTuple), I call:

    # "seed" the numpy random number generator for repeatable results
result = scipy.optimize.differential_evolution(
    sumOfSquaredError(xData, yData, parameterBounds),
    parameterBounds, seed=3)

It returns:

File "/home/josep/programa.py", line 361, in get_Results geneticParameters = generate_Initial_Parameters(xData, yData)

File "/home/josep/programa.py", line 267, in generate_Initial_Parameters parameterBounds, seed=3)

File "/home/josep/anaconda3/lib/python3.7/site-packages/scipy/optimize/_differentialevolution.py", line 276, in differential_evolution ret = solver.solve()

File "/home/josep/anaconda3/lib/python3.7/site-packages/scipy/optimize/_differentialevolution.py", line 688, in solve self.population)

File "/home/josep/anaconda3/lib/python3.7/site-packages/scipy/optimize/_differentialevolution.py", line 794, in _calculate_population_energies raise RuntimeError("The map-like callable must be of the"

RuntimeError: The map-like callable must be of the form f(func, iterable), returning a sequence of numbers the same length as 'iterable'


My reading of the error is this:

I think the problem comes from calling sumOfSquaredError(parameterTuple) inside another function, because sumOfSquaredError relies on variables (xData, yData) that are defined as globals. It might also have to do with how scipy.optimize.differential_evolution() expects its objective function to be supplied.

If I instead try to give xData and yData as input parameters, it then also asks for parameterTuple, and I do not understand where that comes from because it is not defined anywhere in the code.
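From what I can tell, parameterTuple is not something I am supposed to define myself: differential_evolution generates candidate parameter vectors inside parameterBounds and passes each one to the objective function as its first argument, while any extra data can be forwarded through the args keyword. Below is a minimal self-contained sketch of that calling convention (my_model and the demo arrays are just placeholders, not my real data):

import numpy as np
from scipy.optimize import differential_evolution

def my_model(x, a, b):
    # placeholder power-law model, only to illustrate the mechanism
    return a * np.power(x, b)

x_demo = np.linspace(1.0, 10.0, 50)
y_demo = my_model(x_demo, 2.0, 1.5)

def sum_of_squared_error(parameterTuple, xData, yData):
    # differential_evolution calls this as f(candidate, *args);
    # 'candidate' is one (a, b) vector drawn from parameterBounds
    val = my_model(xData, *parameterTuple)
    return np.sum((yData - val) ** 2.0)

parameterBounds = [(0.0, 100.0), (0.0, 100.0)]
result = differential_evolution(sum_of_squared_error, parameterBounds,
                                args=(x_demo, y_demo), seed=3)
print(result.x)  # best (a, b) found by the genetic algorithm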

def sumOfSquaredError(parameterTuple):
    """
    Objective function for the genetic algorithm to minimize (sum of squared error).

    Parameters
    ----------
    parameterTuple : candidate model parameters supplied by the optimizer
        (xData, yData and func are taken from the global scope)

    Returns
    -------
    Sum of squared differences between experimental and predicted data
    """
    # do not print warnings raised by the genetic algorithm
    warnings.filterwarnings("ignore")
    val = func(xData, *parameterTuple)
    return np.sum((yData - val) ** 2.0)
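The func referenced here is not shown anywhere in the snippets; since I am fitting a power law, it is something along these lines (the exact functional form is my assumption):

import numpy as np

def func(x, a, b):
    # assumed power-law form: y = a * x**b
    return a * np.power(x, b)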

The other functions that I am using are:

def generate_Initial_Parameters():
    """
    Generate initial parameters based on SciPy's genetic algorithm.

    Returns
    -------
result.x : the best parameter vector found by differential_evolution

    """
    parameterBounds = []
    parameterBounds.append([0, 100.0])  # search bounds for a
    parameterBounds.append([0, 100.0])  # search bounds for b

    # "seed" the numpy random number generator for repeatable results
    result = differential_evolution(sumOfSquaredError, parameterBounds, seed=3)
    return result.x
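For reference, this is roughly how I imagine generate_Initial_Parameters would look if the data were passed in explicitly instead of read from globals (a sketch only; it assumes sumOfSquaredError is redefined to accept (parameterTuple, xData, yData), in line with the calling-convention sketch further up):

def generate_Initial_Parameters(xData, yData):
    """Run differential_evolution with the data forwarded via 'args'."""
    parameterBounds = [(0.0, 100.0),   # search bounds for a
                       (0.0, 100.0)]   # search bounds for b

    # the candidate parameter vector is passed first, then *args
    result = differential_evolution(sumOfSquaredError, parameterBounds,
                                    args=(xData, yData), seed=3)
    return result.x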

And this is more or less the main function I call, where I pass in the DataFrame and select the specific rows at each iteration. (I also include the imports here in case anyone wants them.)

import numpy
import numpy as np  # the np alias is used in sumOfSquaredError above
import pandas as pd
import matplotlib.pyplot as plt
import scipy
from scipy.optimize import differential_evolution
from sklearn import linear_model
from sklearn.metrics import mean_squared_error, r2_score
import warnings

def get_Results(df):
    """
    Process df.

    Parameters
    ----------
    df : DataFrame with the data

    Returns
    -------
    results : DataFrame whose columns describe the regression coefficients
    """
    columns_results = 'Full_Name', 'Slope', 'Y-intercept', 'MSE', 'R2_score'

    results = pd.DataFrame(columns=columns_results)

    # Processing - Ht(%) = 0
    sample_names = df['Sample'][df['Ht(%)'] == 0].drop_duplicates()

    for i in range(len(sample_names)):
        df_i = df[(df['Sample'] == sample_names.values[i])
                     & (df['Ht(%)'] == 0.0)]

        name = 'Ht(%)=' + str(df_i['Ht(%)'].values[i]) \
            + '    ' + 'Sample: ' + df_i['Sample'].values[i]

        y_o = pd.DataFrame(df_i['Pressure(Pa)'].values)
        x_o = pd.DataFrame(df_i['Velocity(um/s)'].values)

        # Create linear regression object
        regr = linear_model.LinearRegression()
        # Train the model using the dataset
        regr.fit(x_o, y_o)  # fit(self, X, y[, sample_weight]) - Fit linear model.
        # Make predictions using the fitted estimator
        y_pred = regr.predict(x_o)

        yData = numpy.asarray(y_o)
        xData = numpy.asarray(x_o)

        # diff_evolution completes by calling curve_fit() using param. bounds
        geneticParameters = generate_Initial_Parameters()

        # now call curve_fit without passing bounds from the genetic algorithm,
        # just in case the best-fit parameters are outside those bounds
        fittedParameters, pcov = scipy.optimize.curve_fit(
            func, xData, yData, geneticParameters)
        print('Fitted parameters:', fittedParameters)
        print()
        modelPredictions = func(xData, *fittedParameters)

        absError = modelPredictions - yData

        SE = numpy.square(absError)  # Squared Errors
        MSE = numpy.mean(SE)  # Mean Squared Errors
        RMSE = numpy.sqrt(MSE)  # Root Mean Squared Error, RMSE
        Rsquared = 1.0 - (numpy.var(absError) / numpy.var(y_o))

        print()
        print('RMSE:', RMSE)
        print('R-squared:', Rsquared)


        # Fill up results DataFrame (use a new name so the input df is not overwritten)
        df_row = pd.DataFrame(
            [[name,
              regr.coef_[0, 0], regr.intercept_[0],
              mean_squared_error(y_o, y_pred),
              r2_score(y_o, y_pred)]], columns=columns_results)

        results = results.append(df_row, ignore_index=True)

    return results
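For completeness, this is roughly how I call it (the file name is just an example; the CSV is assumed to contain the Sample, Ht(%), Pressure(Pa) and Velocity(um/s) columns used above):

df = pd.read_csv('experiments.csv')  # hypothetical input file
results = get_Results(df)
print(results)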

Update - Solution

As @jeremy_rutman suggested, it is just a question of defining xData and yData globally inside the function. The remaining problem was that the arrays I was defining did not have the dimensions that were needed.

The correct code to use when defining them is as follows:

global xData
global yData

yData = np.asarray(y_eff).reshape(len(y_eff))
xData = np.asarray(x).reshape(len(x)) 

(In my case the problem was that, when defining them globally without the reshape, it returned ValueError: object too deep for desired array and error: Result from function call is not a proper array of floats.)
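As far as I can tell, the shape issue comes from converting a one-column pandas.DataFrame: numpy.asarray then yields a 2-D array of shape (n, 1), while curve_fit and the model function expect 1-D vectors; the reshape (or equivalently .ravel()) flattens it. A quick illustration of what I mean:

import numpy as np
import pandas as pd

y_o = pd.DataFrame([1.0, 2.0, 3.0])
print(np.asarray(y_o).shape)          # (3, 1) -> "object too deep for desired array"
print(np.asarray(y_o).ravel().shape)  # (3,)   -> the 1-D shape curve_fit expects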


Solution

  • The problem may be entirely due to the declaration

    def sumOfSquaredError(parameterTuple):
    

    which apparently should be

    def sumOfSquaredError(xdata,ydata,parameterTuple):
    

    although it seems that func is also used here without being defined anywhere in the posted code.
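
    One way (just a sketch, not verified against the original code) to hand the new signature to differential_evolution without globals is to close over the data, since the optimizer itself only supplies the parameter vector:

    result = differential_evolution(
        lambda parameterTuple: sumOfSquaredError(xdata, ydata, parameterTuple),
        parameterBounds, seed=3)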