I have a dataset that follows a sigmoid curve, and I've observed that the sigmoid function becomes steeper over time. I'm looking for guidance on creating a model to fit this curve using Python. This question is similar to this post expect that the data curve becomes steeper over time. I've attempted to use a logistic regression to fit the data (as the other post suggests), but I need to incorporate a time-dependent parameter for the steepness.
The data is across a single day. In the images below, I've time segmented the data by hour. Meaning, each color represents data from a specific hour in the day. First, observe that for each color, the data follows a sigmoid curve (this is especially apparent for the orange data and is a property of the data I'm working with). Second, notice that the different colors follow different sigmoid steepnesses. For example, the blue curve follows a steeper sigmoid curve than the orange one does. From what I observed, the later in the day the data is from, the steeper the sigmoid curves become. I've attempted to show this property by drawing out the sigmoid curves in the second image.
Here was my attempt to fit this data:
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
# Define a logistic function with a time-dependent parameter
def time_dependent_logistic(x, k, t):
return 1 / (1 + np.exp(-k * (x - t)))
# TODO: I replaced the variables below with my actual data
X = ... # Example: np.random.rand(100) * 10
time = ... # Example: np.linspace(0, 1, 100)
y = time_dependent_logistic(X, k=5 * time, t=2) + 0.05 * np.random.randn(100)
# Fit the time-dependent logistic function to the data
popt, pcov = curve_fit(time_dependent_logistic, X, y, bounds=([0, 0], [np.inf, np.inf]))
# Generate predictions using the fitted parameters
X_test = np.linspace(min(X), max(X), 300)
y_pred = time_dependent_logistic(X_test, *popt)
# Plot the original data and the fitted logistic curve
plt.scatter(X, y, label='Original data')
plt.plot(X_test, y_pred, label='Fitted time-dependent logistic curve', color='red', linewidth=2)
plt.xlabel('Input')
plt.ylabel('Output')
plt.legend()
plt.show()
However, the graph was not fitting the data properly. It simply produces a flat line:
I've researched ways into fitting the data properly so that the curve does not produce a flat line, but this is my first time attempting to model data like this, so I'm unsure if I'm even taking the right approach. Any guidance would be great. Thank you very in advance for your help. I've been very stuck on this problem for a long time, and I'm open to other methods of fitting the data as well.
EDIT:
I uploaded my data to Google Drive in case anyone wants to play around with it: https://drive.google.com/file/d/1aDB8U6Cn8lFo1TWFSRXWX-X0aEpCQ4sT/view?usp=sharing. Just a note, I shifted the X and Time columns to make the minimum 0, so it's not exactly the same data shown in the graphs.
Well, I tried to fit a "switch" curve y=y0+A.tanh(c(x-x0)) to each hour of data (individually). The parameters x0, y0, A, c appear to depend on the hour, but it's difficult to say what hour of the day they correspond to because you've re-zeroed your time axis.
To get the optimize routine to work I had to supply decent initial guess (p0) and bounds based on the data for each hour.
Code:
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
# Curve to be fitted (for ONE dataset)
def func( xval, x0, y0, A, c ):
return y0 + A * np.tanh( c * ( xval - x0 ) )
# Read data from csv file
x, t, y = np.loadtxt( "data.csv", delimiter=',', skiprows=1, unpack=True )
hours_to_plot = [ 0, 1, 2, 3, 4, 5 ]
colours = [ "black", "red", "orange", "yellow", "magenta", "blue" ]
xmin, xmax = 0.0, 200.0
fmthead = "{:4}" + 4 * " {:>11}"
fmtdata = "{:4}" + 4 * " {:>11.6}"
print( fmthead.format( "hour", "x0", "y0", "A", "c" ) )
for hour in range( 0, int( max( t ) / 3600 ) + 1 ):
xhour = []
yhour = []
for i in range( len( t ) ):
if hour <= t[i] / 3600.0 < hour + 1: # 3600 seconds in an hour
xhour.append( x[i] )
yhour.append( y[i] )
if len( xhour ) == 0: continue
# Fit a switch curve to this hour's data
x1, x2, y1, y2 = min( xhour ), max( xhour ), min( yhour ), max( yhour )
p0 = [ 0.5*(x1+x2), 0.5*(y1+y2), 0.5*(y2-y1), 1.0 ]
params, cv = curve_fit( func, xhour, yhour, p0=p0, bounds=( [ x1, y1, 0.0, 0.0 ], [ x2, y2, (y2-y1), np.inf ] ) )
print( fmtdata.format( hour, *params ) )
# Generate predictions using the fitted parameters
xtest = np.linspace( xmin, xmax, 201 )
ytest = func( xtest, *params )
if hour in hours_to_plot:
# Plot the original data and the fitted logistic curve for selected hours
plt.scatter( xhour, yhour, s=4, color=colours[hour%6] )
plt.plot( xtest, ytest, linewidth=2, label=f"hour {hour}", color=colours[hour%6] )
plt.xlabel('Input')
plt.ylabel('Output')
plt.xlim( xmin, xmax )
plt.legend()
plt.show()
Parameters:
hour x0 y0 A c
0 35.9892 32.7941 8.3595 0.0310962
1 138.025 56.978 30.2371 0.0108192
2 185.048 76.6886 14.0 0.013368
3 102.521 47.7351 29.0443 0.0152189
4 102.606 47.2297 13.29 0.0394483
5 103.609 45.9815 26.7875 0.0283607
6 109.932 56.2847 31.2823 0.044357