pythonnumpyscipycurve-fittingsigmoid

Modeling Sigmoid Curve with Time-Dependent Steepness in Python


I have a dataset that follows a sigmoid curve, and I've observed that the sigmoid function becomes steeper over time. I'm looking for guidance on creating a model to fit this curve using Python. This question is similar to this post expect that the data curve becomes steeper over time. I've attempted to use a logistic regression to fit the data (as the other post suggests), but I need to incorporate a time-dependent parameter for the steepness.

The data is across a single day. In the images below, I've time segmented the data by hour. Meaning, each color represents data from a specific hour in the day. First, observe that for each color, the data follows a sigmoid curve (this is especially apparent for the orange data and is a property of the data I'm working with). Second, notice that the different colors follow different sigmoid steepnesses. For example, the blue curve follows a steeper sigmoid curve than the orange one does. From what I observed, the later in the day the data is from, the steeper the sigmoid curves become. I've attempted to show this property by drawing out the sigmoid curves in the second image.

enter image description here

enter image description here

Here was my attempt to fit this data:

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

# Define a logistic function with a time-dependent parameter
def time_dependent_logistic(x, k, t):
    return 1 / (1 + np.exp(-k * (x - t)))

# TODO: I replaced the variables below with my actual data 
X = ... # Example: np.random.rand(100) * 10  
time = ... # Example: np.linspace(0, 1, 100)  
y = time_dependent_logistic(X, k=5 * time, t=2) + 0.05 * np.random.randn(100)

# Fit the time-dependent logistic function to the data
popt, pcov = curve_fit(time_dependent_logistic, X, y, bounds=([0, 0], [np.inf, np.inf]))

# Generate predictions using the fitted parameters
X_test = np.linspace(min(X), max(X), 300)
y_pred = time_dependent_logistic(X_test, *popt)

# Plot the original data and the fitted logistic curve
plt.scatter(X, y, label='Original data')
plt.plot(X_test, y_pred, label='Fitted time-dependent logistic curve', color='red', linewidth=2)
plt.xlabel('Input')
plt.ylabel('Output')
plt.legend()
plt.show()

However, the graph was not fitting the data properly. It simply produces a flat line:

enter image description here

I've researched ways into fitting the data properly so that the curve does not produce a flat line, but this is my first time attempting to model data like this, so I'm unsure if I'm even taking the right approach. Any guidance would be great. Thank you very in advance for your help. I've been very stuck on this problem for a long time, and I'm open to other methods of fitting the data as well.

EDIT:

I uploaded my data to Google Drive in case anyone wants to play around with it: https://drive.google.com/file/d/1aDB8U6Cn8lFo1TWFSRXWX-X0aEpCQ4sT/view?usp=sharing. Just a note, I shifted the X and Time columns to make the minimum 0, so it's not exactly the same data shown in the graphs.


Solution

  • Well, I tried to fit a "switch" curve y=y0+A.tanh(c(x-x0)) to each hour of data (individually). The parameters x0, y0, A, c appear to depend on the hour, but it's difficult to say what hour of the day they correspond to because you've re-zeroed your time axis.

    To get the optimize routine to work I had to supply decent initial guess (p0) and bounds based on the data for each hour.

    enter image description here

    Code:

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.optimize import curve_fit
    
    # Curve to be fitted (for ONE dataset)
    def func( xval, x0, y0, A, c ):
        return y0 + A * np.tanh( c * ( xval - x0 ) )
    
    # Read data from csv file
    x, t, y = np.loadtxt( "data.csv", delimiter=',', skiprows=1, unpack=True )
    
    hours_to_plot = [ 0, 1, 2, 3, 4, 5 ]
    colours = [ "black", "red", "orange", "yellow", "magenta", "blue" ]
    
    xmin, xmax = 0.0, 200.0
    fmthead = "{:4}" + 4 * " {:>11}"
    fmtdata = "{:4}" + 4 * " {:>11.6}"
    
    print( fmthead.format( "hour", "x0", "y0", "A", "c" ) )
    for hour in range( 0, int( max( t ) / 3600 ) + 1 ):
        xhour = []
        yhour = []
        for i in range( len( t ) ):
            if hour <= t[i] / 3600.0 < hour + 1:          # 3600 seconds in an hour
                xhour.append( x[i] )
                yhour.append( y[i] )
        if len( xhour ) == 0: continue
    
        # Fit a switch curve to this hour's data
        x1, x2, y1, y2 = min( xhour ), max( xhour ), min( yhour ), max( yhour )
        p0 = [ 0.5*(x1+x2), 0.5*(y1+y2), 0.5*(y2-y1), 1.0 ]
        params, cv = curve_fit( func, xhour, yhour, p0=p0, bounds=( [ x1, y1, 0.0, 0.0 ], [ x2, y2, (y2-y1), np.inf ] ) )
    
        print( fmtdata.format( hour, *params ) )
    
        # Generate predictions using the fitted parameters
        xtest = np.linspace( xmin, xmax, 201 )
        ytest = func( xtest, *params )
    
        if hour in hours_to_plot:
            # Plot the original data and the fitted logistic curve for selected hours
            plt.scatter( xhour, yhour, s=4, color=colours[hour%6] )
            plt.plot( xtest, ytest, linewidth=2, label=f"hour {hour}", color=colours[hour%6] )
            plt.xlabel('Input')
            plt.ylabel('Output')
            plt.xlim( xmin, xmax )
            plt.legend()
    
    plt.show()
    

    Parameters:

    hour          x0          y0           A           c
       0     35.9892     32.7941      8.3595   0.0310962
       1     138.025      56.978     30.2371   0.0108192
       2     185.048     76.6886        14.0    0.013368
       3     102.521     47.7351     29.0443   0.0152189
       4     102.606     47.2297       13.29   0.0394483
       5     103.609     45.9815     26.7875   0.0283607
       6     109.932     56.2847     31.2823    0.044357