I have two time series in two different DataFrames. What is the best way to find the calibration coefficient between them?
I was thinking of subtracting one DataFrame from the other and dividing by the number of data points. But I am not sure this works, because the two DataFrames come from different measurement devices and therefore have different time values.
(df1[] - df2[]) / (number of data points)
For example, the bottom graph (black) is concentration and the red graph is intensity. If I later measure only the intensity, I would like to derive the concentration values from it using the calibration value or formula.
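One option is to put both series on a common time axis with pandas.DataFrame.interpolate and then fit a linear regression. First, generate some example measurements: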
import pandas as pd
import numpy as np
# MAKE MEASUREMENTS (I'll admit I'm just measuring sine waves)
t0, t1 = 0, 32
n_points_intensity = 50
time_intensity = np.random.uniform(t0, t1, n_points_intensity)
intensity = 10 * np.sin(time_intensity) + np.random.normal(0,0.1,n_points_intensity)
n_points_concentration = 50
time_concentration = np.random.uniform(t0, t1, n_points_concentration)
concentration = np.sin(time_concentration) + np.random.normal(0,0.1,n_points_concentration)
# STORE MEASUREMENTS IN ONE DATAFRAME WITH MISSING VALUES
# (np.nan rather than np.NaN: the np.NaN alias was removed in NumPy 2.0)
df = pd.DataFrame({
    'time': np.concatenate((time_intensity, time_concentration)),
    'intensity': np.concatenate((intensity, [np.nan] * n_points_concentration)),
    'concentration': np.concatenate(([np.nan] * n_points_intensity, concentration)),
}).sort_values(by='time', ignore_index=True)
print(df)
# OUTPUT
         time  intensity  concentration
0    0.337827        NaN       0.309016
1    0.402861   3.920522            NaN
2    0.544032   5.175905            NaN
3    0.615081        NaN       0.542628
4    0.726053   6.639234            NaN
..        ...        ...            ...
95  60.408141  -6.577897            NaN
96  61.571468  -9.522081            NaN
97  61.899227        NaN      -0.729130
98  62.182529  -6.046479            NaN
99  62.857848        NaN       0.093763
[100 rows x 3 columns]
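Then fill the missing values with pandas.DataFrame.interpolate, using time as the index so the interpolation is weighted by the actual time gaps: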
df.set_index('time', inplace=True)
df.interpolate(method='index', inplace=True)
df.dropna(inplace=True)  # drops leading rows that have no earlier value to interpolate from (here only the first row)
df.reset_index(inplace=True)
print(df)
# OUTPUT
     time  intensity  concentration
 0.402861   3.920522       0.363813
 0.544032   5.175905       0.482763
 0.615081   5.747088       0.542628
 0.726053   6.639234       0.577253
      ...        ...            ...
60.408141  -6.577897      -0.601758
61.571468  -9.522081      -0.701132
61.899227  -7.657847      -0.729130
62.182529  -6.046479      -0.485939
62.857848  -6.046479       0.093763
[99 rows x 3 columns]
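If you would rather not build a combined DataFrame, a rough alternative is to resample one signal onto the other signal's timestamps with `np.interp`. This is only a sketch under assumptions: the data below is synthetic (noise-free sine waves, like the example above), and the variable name `concentration_on_intensity_times` is made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-ins for the two devices, each with its own timestamps.
# np.interp needs the known x-values (xp) in increasing order, hence the sort.
time_intensity = np.sort(rng.uniform(0, 32, 50))
intensity = 10 * np.sin(time_intensity)
time_concentration = np.sort(rng.uniform(0, 32, 50))
concentration = np.sin(time_concentration)

# Linearly interpolate concentration at the intensity timestamps.
# Outside the concentration time range, np.interp holds the edge values.
concentration_on_intensity_times = np.interp(
    time_intensity, time_concentration, concentration
)

# Both arrays now share one time axis and can be compared point by point.
print(intensity.shape, concentration_on_intensity_times.shape)
```

This only interpolates in one direction, so the intensity values stay exactly as measured.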
from sklearn.linear_model import LinearRegression
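# Fit concentration as a linear function of intensity, forced through the origin
reg = LinearRegression(fit_intercept=False).fit(
    df['intensity'].to_numpy().reshape(-1, 1),  # scikit-learn expects a 2-D feature matrix
    df['concentration']
)
a, b = reg.coef_[0], reg.intercept_
# concentration = a * intensity + b
# (b is 0.0 here because fit_intercept=False; use fit_intercept=True to also fit an offset)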
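Once you have the coefficient, you can apply it to intensity-only measurements, which is what the question is after. The sketch below is self-contained and uses assumptions throughout: the data is synthetic with an assumed true scale of 0.1, the slope is computed with the closed-form least-squares formula for a line through the origin (instead of scikit-learn), and `intensity_to_concentration` is a hypothetical helper name.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic, already time-aligned calibration data (assumed true scale: 0.1).
true_scale = 0.1
intensity = rng.uniform(-10, 10, 200)
concentration = true_scale * intensity + rng.normal(0, 0.05, 200)

# Least-squares slope for a line through the origin: a = sum(x*y) / sum(x*x).
a = float(np.dot(intensity, concentration) / np.dot(intensity, intensity))


def intensity_to_concentration(new_intensity, scale=a):
    """Hypothetical helper: convert intensity readings to concentration."""
    return scale * np.asarray(new_intensity, dtype=float)


# Apply the calibration to new intensity-only measurements.
predicted = intensity_to_concentration([2.0, -5.0])
print(a, predicted)
```

If your devices have an offset between them, a through-origin fit will be biased, and fitting an intercept as well is probably the safer choice.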