
Calibrating one timeseries with another


I have two time series in two different dataframes. What is the best way to find the calibration coefficient between them?

I was thinking of subtracting one dataframe from the other and dividing by the number of data points. But I am not sure this works, because the two dataframes come from different measurement devices and therefore have different timestamps.

(df1[] - df2[]) / (number of data points)
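The expression above gives an average offset between the two series, not a scaling factor. If concentration is roughly proportional to intensity, a single coefficient k can instead be estimated by least squares through the origin: minimizing sum((c - k*i)**2) gives k = sum(i*c) / sum(i*i). The sketch below assumes both series have already been put on common timestamps and uses synthetic arrays, not real measurements.

```python
import numpy as np

# Synthetic, already time-aligned readings (assumption: intensity is about
# 10x concentration, plus a little noise).
rng = np.random.default_rng(0)
concentration = rng.uniform(0.1, 1.0, 50)
intensity = 10 * concentration + rng.normal(0, 0.01, 50)

# The idea from the question: mean difference, which is an offset, not a scale.
mean_difference = np.mean(intensity - concentration)

# Least-squares scale factor through the origin: k = sum(i*c) / sum(i*i).
k = np.sum(intensity * concentration) / np.sum(intensity * intensity)

print(f"mean difference: {mean_difference:.3f}")
print(f"scale factor k:  {k:.4f}")  # close to 0.1, since concentration ≈ 0.1 * intensity
```

This is the same model that `LinearRegression(fit_intercept=False)` fits; allowing an intercept adds an offset term on top of the scale.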


For example, the bottom graph (black) is concentration and the red graph is intensity. If I later measure only the intensity, I would like to use the calibration value or formula to derive the concentration.


Solution

  • Store intensity and concentration in one dataframe:

    import pandas as pd
    import numpy as np
    
    # MAKE MEASUREMENTS  (I'll admit I'm just measuring sine waves)
    t0, t1 = 0, 32
    
    n_points_intensity = 50
    time_intensity = np.random.uniform(t0, t1, n_points_intensity)
    intensity = 10 * np.sin(time_intensity) + np.random.normal(0,0.1,n_points_intensity)
    
    n_points_concentration = 50
    time_concentration = np.random.uniform(t0, t1, n_points_concentration)
    concentration = np.sin(time_concentration) + np.random.normal(0,0.1,n_points_concentration)
    
    # STORE MEASUREMENTS IN ONE DATAFRAME WITH MISSING VALUES
    df = pd.DataFrame({'time': np.concatenate((time_intensity, time_concentration)),
                       'intensity': np.concatenate((intensity, [np.nan]*n_points_concentration)),
                       'concentration': np.concatenate(([np.nan]*n_points_intensity, concentration))
    }).sort_values(by='time', ignore_index=True)
    print(df)
    
    # OUTPUT
             time  intensity  concentration
    0    0.337827        NaN       0.309016
    1    0.402861   3.920522            NaN
    2    0.544032   5.175905            NaN
    3    0.615081        NaN       0.542628
    4    0.726053   6.639234            NaN
    ..        ...        ...            ...
    95  60.408141  -6.577897            NaN
    96  61.571468  -9.522081            NaN
    97  61.899227        NaN      -0.729130
    98  62.182529  -6.046479            NaN
    99  62.857848        NaN       0.093763
    
    [100 rows x 3 columns]
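If the measurements already live in two separate dataframes, as in the question, the same combined frame can probably be built more directly with `pd.concat`, which fills columns missing from one frame with NaN. The frame names `df1`/`df2` and the column names below are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical per-device dataframes, each with its own timestamps.
rng = np.random.default_rng(1)
t_int = rng.uniform(0, 32, 50)
t_conc = rng.uniform(0, 32, 50)
df1 = pd.DataFrame({'time': t_int, 'intensity': 10 * np.sin(t_int)})
df2 = pd.DataFrame({'time': t_conc, 'concentration': np.sin(t_conc)})

# Stack the rows (outer join on columns), then order everything by time.
df = pd.concat([df1, df2], ignore_index=True).sort_values('time', ignore_index=True)
print(df.head())
print(df.shape)  # (100, 3)
```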
    

  • Fill missing values with pandas.DataFrame.interpolate:

    df.set_index('time', inplace=True)
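    df.interpolate(method='index', inplace=True)  # linear in 'time'; NaNs after a series' last reading get that last value
    df.dropna(inplace=True)  # drops leading rows before the other device's first reading (here only the first row)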
    df.reset_index(inplace=True)
    print(df)
    
    # OUTPUT
    time       intensity  concentration                               
    0.402861    3.920522       0.363813
    0.544032    5.175905       0.482763
    0.615081    5.747088       0.542628
    0.726053    6.639234       0.577253
    ...              ...            ...
    60.408141  -6.577897      -0.601758
    61.571468  -9.522081      -0.701132
    61.899227  -7.657847      -0.729130
    62.182529  -6.046479      -0.485939
    62.857848  -6.046479       0.093763
    
    [99 rows x 3 columns]
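Interpolation invents values between readings. An alternative worth considering is `pd.merge_asof`, which pairs each intensity reading with the nearest concentration reading in time and drops pairs that are too far apart. The data and the `tolerance` of 0.5 time units below are hypothetical; choose the tolerance from your devices' sampling rates.

```python
import numpy as np
import pandas as pd

# Hypothetical per-device dataframes; merge_asof needs both sorted on the key.
rng = np.random.default_rng(2)
t_int = np.sort(rng.uniform(0, 32, 50))
t_conc = np.sort(rng.uniform(0, 32, 50))
df1 = pd.DataFrame({'time': t_int, 'intensity': 10 * np.sin(t_int)})
df2 = pd.DataFrame({'time': t_conc, 'concentration': np.sin(t_conc)})

paired = pd.merge_asof(
    df1, df2, on='time',
    direction='nearest',  # closest concentration reading, earlier or later
    tolerance=0.5,        # maximum allowed time gap; larger gaps give NaN
).dropna()                # keep only intensity readings that found a partner
print(paired.head())
```

The trade-off is a timing error of up to `tolerance` per pair instead of interpolated values.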
    

  • Fit a linear regression to get the relationship between intensity and concentration:

    from sklearn.linear_model import LinearRegression
    
    # scikit-learn expects X as a 2-D array of shape (n_samples, 1)
    reg = LinearRegression(fit_intercept=True).fit(  # fit_intercept=False would force b = 0
        df['intensity'].to_numpy().reshape(-1, 1),
        df['concentration']
    )
    a, b = reg.coef_[0], reg.intercept_
    # concentration = a * intensity + b
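Once `a` and `b` are known, concentration can be derived from intensity-only measurements, which is the goal stated in the question. The self-contained sketch below uses synthetic calibration data (the true relation 0.1 * intensity + 0.2 is an assumption) and also reports R² as a rough check of how well the line fits.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic paired calibration data (assumption: concentration = 0.1 * intensity + 0.2 + noise).
rng = np.random.default_rng(3)
intensity = rng.uniform(-10, 10, 99)
concentration = 0.1 * intensity + 0.2 + rng.normal(0, 0.05, 99)

X = intensity.reshape(-1, 1)
reg = LinearRegression().fit(X, concentration)
a, b = reg.coef_[0], reg.intercept_
print(f"concentration = {a:.3f} * intensity + {b:.3f}")
print(f"R^2 on calibration data: {reg.score(X, concentration):.3f}")

# Later, only intensity is measured: convert it with the fitted formula.
new_intensity = np.array([-5.0, 0.0, 5.0])
predicted = reg.predict(new_intensity.reshape(-1, 1))  # equals a * new_intensity + b
print(predicted)
```

`reg.predict` and the explicit `a * intensity + b` give the same numbers; the explicit formula is handy if you want to apply the calibration without scikit-learn.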