pythonpandastime-seriesdelta

Calculate Delta between time series peaks with pandas


I have two pandas dataframe as follow:

df1:

dt z
1970-01-17 21:39:10.304 0.87602
1970-01-17 21:39:10.344 0.99907
... ...

df2: | dt | z | | --------------------------| -------------- | | 1970-01-17 21:39:23.312 | 0.84904 | | 1970-01-17 21:39:23.352 | 1.00542 | | ... | ... |

where dt is the index and z (that has values between 0 and 2) is the only column, in order to plot timeseries. I have plotted the two time series: time series plot

Both dataframe should have approximately the same values since both registers the same event. The problem is the time delay with which they record the values.

I need to calculate the delta between the time series in time order, is it possible starting from the dataframes?


Solution

  • Edit accorind to OPs update.

    Comments in code.

    import pandas as pd
    from datetime import timedelta
    
    
    # sample data for two data frames
    df1 = pd.DataFrame(
        data=np.random.rand(20) + 1,
        columns=["col"],
        index=pd.date_range(
            start="2023-04-01T00:00",
            end="2023-04-07T00:00",
            periods=20)
    )
    
    
    df2 = pd.DataFrame(
        data=np.random.rand(20) + 1,
        columns=["col"],
        index=pd.date_range(
            start="2023-04-01T00:00",
            end="2023-04-07T00:00",
            periods=20)
    )
    
    # choose according to your needs
    offset_threshold = timedelta(days=1, hours=6)
    
    # choose according to your needs
    peak_strength_threshold = 1.8
    
    # highlight peaks by applying some threshold or "peak detection"
    df1["peak"] = df1["col"] > peak_strength_threshold
    df2["peak"] = df2["col"] > peak_strength_threshold
    
    for index1, row1 in df1.iterrows():
        if row1.peak:
            # find matching peak in df2
            df2_peaks = df2[df2.peak]
            best_match_in_df2 = df2_peaks.iloc[np.abs((df2_peaks.index - index1)).argmin()]
    
            delta = index1 - best_match_in_df2.name
    
            # check if the best match is close enough to be published
            if np.abs(delta) < offset_threshold:
                print(delta)