I have two pandas dataframe as follow:
df1:
dt | z |
---|---|
1970-01-17 21:39:10.304 | 0.87602 |
1970-01-17 21:39:10.344 | 0.99907 |
... | ... |
df2: | dt | z | | --------------------------| -------------- | | 1970-01-17 21:39:23.312 | 0.84904 | | 1970-01-17 21:39:23.352 | 1.00542 | | ... | ... |
where dt is the index and z (that has values between 0 and 2) is the only column, in order to plot timeseries. I have plotted the two time series:
Both dataframe should have approximately the same values since both registers the same event. The problem is the time delay with which they record the values.
I need to calculate the delta between the time series in time order, is it possible starting from the dataframes?
Edit accorind to OPs update.
Comments in code.
import pandas as pd
from datetime import timedelta
# sample data for two data frames
df1 = pd.DataFrame(
data=np.random.rand(20) + 1,
columns=["col"],
index=pd.date_range(
start="2023-04-01T00:00",
end="2023-04-07T00:00",
periods=20)
)
df2 = pd.DataFrame(
data=np.random.rand(20) + 1,
columns=["col"],
index=pd.date_range(
start="2023-04-01T00:00",
end="2023-04-07T00:00",
periods=20)
)
# choose according to your needs
offset_threshold = timedelta(days=1, hours=6)
# choose according to your needs
peak_strength_threshold = 1.8
# highlight peaks by applying some threshold or "peak detection"
df1["peak"] = df1["col"] > peak_strength_threshold
df2["peak"] = df2["col"] > peak_strength_threshold
for index1, row1 in df1.iterrows():
if row1.peak:
# find matching peak in df2
df2_peaks = df2[df2.peak]
best_match_in_df2 = df2_peaks.iloc[np.abs((df2_peaks.index - index1)).argmin()]
delta = index1 - best_match_in_df2.name
# check if the best match is close enough to be published
if np.abs(delta) < offset_threshold:
print(delta)