python matplotlib curve-fitting yfinance

Can’t get exponential curve_fit to work with dates


I’m trying to plot a curve_fit for the S&P 500.

I’m successful (I think) at performing a linear fit/plot. When I try to get an exponential curve_fit to work, I get this error:

Optimal parameters not found: Number of calls to function has reached maxfev = 800.

import numpy as np
import matplotlib.pyplot as plt
import yfinance as yf
from scipy.optimize import curve_fit

# get data
df = yf.download("SPY", interval = '1mo')
df = df.reset_index()

def func(x, a, b):
     return a * x + b
# ??     return a * np.exp(-b * x) + c
# ??     return a*(x**b)+c
# ??     return a*np.exp(b*x)

# create data arrays
# convert Date to numeric for curve_fit ??
xdata = df['Date'].to_numpy().astype(np.int64)//10**9
ydata = df['Close'].to_numpy()

# p0 = (?, ?, ?) use guesses?
popt, pcov = curve_fit(func, xdata, ydata)

print(popt)

y_pred = func(xdata, *popt)

plt.plot(xdata, ydata)
plt.plot(xdata, y_pred, '-')
plt.show()


Am I dealing with dates correctly?

Should I be doing a p0 initial guess?

This question/solution may provide some clues.

It would be nice to have the x-axis labeled in a date format (but not important right now).


Solution

  • In addition to normalizing the data, it is important to choose a suitable model function. In your example you had:

    def func(x, a, b):
         return a * x + b
    # ??     return a * np.exp(-b * x) + c
    # ??     return a*(x**b)+c
    # ??     return a*np.exp(b*x)
    

    If you want to fit exponential growth, I think the right one is this:

    # exponential growth for b > 0 (your commented-out version used -b, which is decay for b > 0)
    def ExponentialGrowth(x, a, b, c):
        return a * np.exp(b * x) + c  # + c accounts for a vertical offset
    

    A power function might work as well; I did not check. Anyway, here's the full code:

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.optimize import curve_fit

    # exponential growth for b > 0; + c accounts for a vertical offset
    def ExponentialGrowth(x, a, b, c):
        return a * np.exp(b * x) + c

    # get data (df is the DataFrame from the question, after reset_index())
    x = df['Date'].to_numpy().astype(np.int64) // 10**9  # seconds, assuming nanosecond datetimes
    # ravel() keeps y one-dimensional in case df['Close'] comes back as a one-column DataFrame
    y = df['Close'].to_numpy().ravel()
    # apply z normalization so both axes are of order 1
    xNorm = (x - x.mean()) / x.std()
    yNorm = (y - y.mean()) / y.std()
    # get the optimal parameters (in normalized units)
    popt, pcov = curve_fit(ExponentialGrowth, xNorm, yNorm)
    # get the predicted values, still in the normalized range
    yPredNorm = ExponentialGrowth(xNorm, *popt)
    # reverse the normalization of the predicted values
    yPred = yPredNorm * y.std() + y.mean()
    plt.figure()
    # plotting against df['Date'] gives a date-formatted x-axis
    plt.scatter(df['Date'], y, s=1, label="Raw")
    plt.plot(df['Date'], yPred, 'r-', label="Fitted")
    plt.grid()
    plt.legend()
    plt.xlabel("Year")
    plt.ylabel("Close")
    plt.show()
    

    And the results:

    [Figure: raw closes as a scatter plot, with the fitted exponential curve drawn in red]
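
The parameters in popt describe the curve in normalized units, so they cannot be read directly as a growth rate per second or a price offset. If you need them in the original units, the z-normalization can be undone algebraically by substituting xNorm = (x - mean_x) / std_x and yNorm = (y - mean_y) / std_y into the model. A minimal sketch, assuming synthetic data in place of the SPY download (the numbers are made up for illustration) and a hypothetical snake_case copy of the model function:

```python
import numpy as np
from scipy.optimize import curve_fit

def exponential_growth(x, a, b, c):
    return a * np.exp(b * x) + c

# Synthetic stand-in for the price series (assumption: not real SPY data).
x = np.linspace(7.3e8, 1.7e9, 360)  # Unix seconds, roughly 1993 to 2023
y = 40.0 + 30.0 * np.exp(2.5 * (x - x[0]) / (x[-1] - x[0]))

# z-normalize both axes, as in the code above
x_mean, x_std = x.mean(), x.std()
y_mean, y_std = y.mean(), y.std()
x_norm = (x - x_mean) / x_std
y_norm = (y - y_mean) / y_std

popt, _ = curve_fit(exponential_growth, x_norm, y_norm, maxfev=10000)
a, b, c = popt

# Undo the normalization: y = A * exp(B * x) + C in original units
B = b / x_std
A = y_std * a * np.exp(-b * x_mean / x_std)
C = y_std * c + y_mean

# Both routes should give the same predictions
y_pred = exponential_growth(x_norm, *popt) * y_std + y_mean
y_direct = A * np.exp(B * x) + C
print(np.allclose(y_pred, y_direct))
print("doubling time in years:", np.log(2) / B / (365.25 * 24 * 3600))
```

B is a growth rate per second here, which is why np.log(2) / B gives a doubling time in seconds. Note that A can be a very small number when x is a raw Unix timestamp, so evaluating the curve in normalized units is usually the safer route.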

    If you do end up needing initial guesses, you can look up how to derive them for the function you are fitting, or base them on what you know about the data. For example, if I am fitting an exponential growth function and I know that the data has an offset of 100, I can set the initial guess for c to 100 through curve_fit's p0 argument.

    Hope this helps you.
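
On the initial-guess point: one common way to get starting values for a * exp(b * x) + c is to fix a rough offset, take logs, and fit a straight line. This is a swapped-in technique, not something the code above does. A minimal sketch under stated assumptions: the data is synthetic, the offset of 100 is assumed known (as in the example above), and x stands in for the z-normalized dates:

```python
import numpy as np
from scipy.optimize import curve_fit

def exponential_growth(x, a, b, c):
    return a * np.exp(b * x) + c

# Synthetic data (assumption): offset 100, mild noise, x already of order 1
rng = np.random.default_rng(0)
x = np.linspace(-1.7, 1.7, 200)
y = 30.0 * np.exp(0.8 * x) + 100.0 + rng.normal(0.0, 1.0, x.size)

# Offset assumed known from the data; y - c0 must stay positive for the log
c0 = 100.0

# log(y - c0) is roughly log(a) + b * x, so a straight-line fit gives a0 and b0
slope, intercept = np.polyfit(x, np.log(y - c0), 1)
p0 = (np.exp(intercept), slope, c0)

# Refine all three parameters with the non-linear fit
popt, pcov = curve_fit(exponential_growth, x, y, p0=p0)
print("initial guess:", p0)
print("fitted       :", popt)
```

If the offset is not known, a value slightly below the smallest observation is a reasonable starting point, since curve_fit refines c along with a and b.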