Tags: python, scipy, curve-fitting, genetic-algorithm, scipy-optimize

Differential evolution fails when the workers argument is added


To set up my differential_evolution curve-fit function, I borrowed heavily from https://bitbucket.org/zunzuncode/ramanspectroscopyfit/src/master/RamanSpectroscopyFit.py. My implementation works perfectly when I don't use "workers".

My minimal reproducible example:

#### Python 3.9 ####
#### Windows 10 ####

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
import warnings
from scipy.optimize import differential_evolution as DE


def LCR(f, L, C, R):
    TPF = 2*np.pi*f
    return np.sqrt(R**2 + (TPF*L - 1/(TPF*C))**2)

def sumOfSquaredError(parameterTuple):
    warnings.filterwarnings("ignore")
    return np.sum((ydata - LCR(xdata, *parameterTuple))**2)

def generateParameterBounds():
    parameterBounds = []
    parameterBounds.append([1,5]) # parameter bounds for L
    parameterBounds.append([1,5]) # parameter bounds for C
    parameterBounds.append([1,5]) # parameter bounds for R
    result = DE(sumOfSquaredError, parameterBounds, popsize=30, init='sobol', polish=False, seed=3)
    # result = DE(sumOfSquaredError, parameterBounds, popsize=30, init='sobol', polish=False, workers=2, seed=3)
    return result.x

xdata = np.linspace(1e-3,1,1000)
ydata = LCR(xdata,1.5,2,2.5) + np.random.randn(len(xdata))
plt.plot(xdata, ydata, 'b-', label='Measured')

#### REGULAR CURVE FIT ####
popt, pcov = curve_fit(LCR, xdata, ydata, bounds=([1, 1, 1], [5, 5, 5]))
print(*popt)
fit = LCR(xdata, *popt)
plt.plot(xdata, fit, 'g-')

#### DIFFERENTIAL EVOLUTION CURVE FIT ####
ParameterRanges = generateParameterBounds()
geneticParameters, pcov = curve_fit(LCR, xdata, ydata, ParameterRanges, maxfev=1000000)
print(*geneticParameters)
g_fit = LCR(xdata, *geneticParameters)
plt.plot(xdata, g_fit, 'r-')

plt.show()

When I swap which of the two result = DE(...) lines is commented out, i.e. add the workers=2 argument, I get the following error.

Error:

RuntimeError:
    An attempt has been made to start a new process before the
    current process has finished its bootstrapping phase.

    This probably means that you are not using fork to start your
    child processes and you have forgotten to use the proper idiom
    in the main module:

        if __name__ == '__main__':
            freeze_support()
            ...

    The "freeze_support()" line can be omitted if the program
    is not going to be frozen to produce an executable.

I am not sure how to use this error message to fix my code. I'd like to stay within scipy.optimize, if possible.


Solution

  • You need to change a few different things to make this work.

    When a Python program uses multiprocessing with the spawn start method, any code that should run only in the parent process needs to be wrapped in an if __name__ == '__main__': guard. For example:

    def main():
        # code that you had outside of any functions before
        # ...
    
    if __name__ == '__main__':
        main()
    

    (There's an exception to this, which is that for Python versions before 3.14 on Linux, the fork start method is used by default, which does allow you to put code in the module scope. This is because fork operates by copying the parent process to create the child process, so your code will not be imported multiple times. Using if __name__ == '__main__': is still a best-practice, however.)
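    If you want to check which start method your interpreter defaults to, you can ask multiprocessing directly. A minimal sketch (the result depends on your OS and Python version):

```python
import multiprocessing

if __name__ == '__main__':
    # 'spawn' on Windows and macOS; 'fork' on Linux before Python 3.14
    print(multiprocessing.get_start_method())
```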

    If you don't do this, differential_evolution() will create a new process to run your code. This new process won't have your code loaded, so it will import it. Importing your code runs everything that is not inside a function or an if __name__ == '__main__': guard — including the call to differential_evolution() itself, which would spawn yet another process, and so on forever. multiprocessing detects this and aborts with the RuntimeError you saw.

    The second thing you need to change: the objective function sumOfSquaredError() uses xdata and ydata as global variables. If you apply the advice above, those globals won't exist in the child process. (This is due to two factors: those variables were created in a different process, and they are now locals of main().) Instead, you need to pass them using the args argument.

    Example:

    def LCR(f, L, C, R):
        TPF = 2*np.pi*f
        return np.sqrt(R**2 + (TPF*L - 1/(TPF*C))**2)
    
    def sumOfSquaredError(parameterTuple, xdata, ydata):
        warnings.filterwarnings("ignore")
        return np.sum((ydata - LCR(xdata, *parameterTuple))**2)
    
    def generateParameterBounds(xdata, ydata):
        parameterBounds = []
        parameterBounds.append([1,5]) # parameter bounds for L
        parameterBounds.append([1,5]) # parameter bounds for C
        parameterBounds.append([1,5]) # parameter bounds for R
        result = DE(
            sumOfSquaredError,
            parameterBounds,
            args=(xdata, ydata),
            popsize=30,
            init='sobol',
            polish=False,
            seed=3,
            workers=2,
        )
        return result.x
    

    Notice that I have changed sumOfSquaredError() to take xdata and ydata as arguments, and added args=(xdata, ydata) to the call to differential_evolution(). When differential_evolution() calls sumOfSquaredError(), it appends those extra arguments after the parameter vector. You can read the documentation to learn more.
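    As an aside, the same binding can be done with functools.partial instead of args — and unlike a lambda, a partial of a module-level function still pickles, so it remains compatible with workers. A minimal sketch using a toy linear model (not the LCR model above):

```python
from functools import partial

def sum_of_squares(params, xdata, ydata):
    # residual sum of squares for a hypothetical linear model y = a*x + b
    a, b = params
    return sum((y - (a * x + b)) ** 2 for x, y in zip(xdata, ydata))

xdata = [0.0, 1.0, 2.0]
ydata = [1.0, 3.0, 5.0]

# bind the data once; the result takes only the parameter tuple
objective = partial(sum_of_squares, xdata=xdata, ydata=ydata)
print(objective((2.0, 1.0)))  # exact fit for y = 2x + 1, so 0.0
```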

    Here is the full corrected code.

    #### Python 3.9 ####
    #### Windows 10 ####
    
    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.optimize import curve_fit
    import warnings
    from scipy.optimize import differential_evolution as DE
    
    
    def LCR(f, L, C, R):
        TPF = 2*np.pi*f
        return np.sqrt(R**2 + (TPF*L - 1/(TPF*C))**2)
    
    def sumOfSquaredError(parameterTuple, xdata, ydata):
        warnings.filterwarnings("ignore")
        return np.sum((ydata - LCR(xdata, *parameterTuple))**2)
    
    def generateParameterBounds(xdata, ydata):
        parameterBounds = []
        parameterBounds.append([1,5]) # parameter bounds for L
        parameterBounds.append([1,5]) # parameter bounds for C
        parameterBounds.append([1,5]) # parameter bounds for R
        result = DE(
            sumOfSquaredError,
            parameterBounds,
            args=(xdata, ydata),
            popsize=30,
            init='sobol',
            polish=False,
            seed=3,
            workers=2,
        )
        return result.x
    
        
    def main():
        xdata = np.linspace(1e-3,1,1000)
        ydata = LCR(xdata,1.5,2,2.5) + np.random.randn(len(xdata))
        plt.plot(xdata, ydata, 'b-', label='Measured')
    
        #### REGULAR CURVE FIT ####
        popt, pcov = curve_fit(LCR, xdata, ydata, bounds=([1, 1, 1], [5, 5, 5]))
        print(*popt)
        fit = LCR(xdata, *popt)
        plt.plot(xdata, fit, 'g-', label='curve_fit only')
    
        #### DIFFERENTIAL EVOLUTION CURVE FIT ####
        ParameterRanges = generateParameterBounds(xdata, ydata)
        geneticParameters, pcov = curve_fit(LCR, xdata, ydata, ParameterRanges, maxfev=1000000)
        print(*geneticParameters)
        g_fit = LCR(xdata, *geneticParameters)
        plt.plot(xdata, g_fit, 'r-', label='genetic + curve_fit')
        plt.legend()
        plt.show()
    
    
    if __name__ == '__main__':
        # I used these two lines to reproduce your problem on Linux
        # Not required on Windows
        # import multiprocessing
        # multiprocessing.set_start_method('spawn')
        main()
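    One further note: workers also accepts any map-like callable, not just an integer, so you can manage the parallelism yourself (for example by passing a multiprocessing.Pool's .map method). A minimal sketch using the built-in map and a toy quadratic objective (not the LCR model above); with workers set, updating='deferred' is required:

```python
from scipy.optimize import differential_evolution

def objective(p):
    # simple quadratic bowl with minimum at (1, -2)
    x, y = p
    return (x - 1.0) ** 2 + (y + 2.0) ** 2

result = differential_evolution(
    objective,
    bounds=[(-5, 5), (-5, 5)],
    workers=map,          # any map-like callable; e.g. a Pool's .map for parallelism
    updating='deferred',  # population is updated per generation when workers is used
    seed=3,
)
print(result.x)  # should be close to [1, -2]
```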