python pandas datetime time-difference

Subtract consecutive rows based on binary condition


I have a DataFrame like the one below:

import pandas as pd

data={'time':['2021-01-01 22:00:12','2021-01-05 22:49:12','2021-01-06 21:00:00','2021-01-06 23:59:15','2021-01-07 05:00:55','2021-01-07 12:00:39'],
    'flag':['On','Off','On','Off','On','Off']}
df=pd.DataFrame(data)

I want the difference between consecutive rows, which I computed with:

df['diff']=pd.to_datetime(df['time'])-pd.to_datetime(df['time'].shift(1))

But this computes a difference for every pair of consecutive rows, and most of those differences are meaningless. I only need the difference at rows where the flag switches to Off. Also, how can I convert the difference to hours?
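A side note on the code in the question: subtracting the shifted column is the same row-to-row difference that `Series.diff()` computes. A minimal check on the sample data (re-created here so the snippet runs on its own):

```python
import pandas as pd

data = {'time': ['2021-01-01 22:00:12', '2021-01-05 22:49:12', '2021-01-06 21:00:00',
                 '2021-01-06 23:59:15', '2021-01-07 05:00:55', '2021-01-07 12:00:39'],
        'flag': ['On', 'Off', 'On', 'Off', 'On', 'Off']}
df = pd.DataFrame(data)

times = pd.to_datetime(df['time'])
# Explicit subtraction of the previous row, as in the question
manual = times - times.shift(1)
# Series.diff() gives the same result; the first row has no predecessor, so it is NaT
built_in = times.diff()

print(manual.equals(built_in))  # True
```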



Solution

  • Keep the difference only where the flag switches from On to Off, and convert it to hours

    df['time'] = pd.to_datetime(df['time'])
    
    mask = df['flag'].eq('Off') & df['flag'].shift().eq('On')
    df['diff'] = df['time'].sub(df['time'].shift()).where(mask).dt.total_seconds() / 3600
    

                     time flag       diff
    0 2021-01-01 22:00:12   On        NaN
    1 2021-01-05 22:49:12  Off  96.816667
    2 2021-01-06 21:00:00   On        NaN
    3 2021-01-06 23:59:15  Off   2.987500
    4 2021-01-07 05:00:55   On        NaN
    5 2021-01-07 12:00:39  Off   6.995556
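A variant, sketched under the assumption that the goal is to subtract only on the masked rows: select those rows with `.loc`, subtract their previous timestamps, and convert to hours by dividing by `pd.Timedelta(hours=1)` instead of using `total_seconds() / 3600`. The `shift()` still touches every row, so the saving over the masked version above is likely negligible; this mainly shows the alternative hour conversion and index alignment.

```python
import pandas as pd

data = {'time': ['2021-01-01 22:00:12', '2021-01-05 22:49:12', '2021-01-06 21:00:00',
                 '2021-01-06 23:59:15', '2021-01-07 05:00:55', '2021-01-07 12:00:39'],
        'flag': ['On', 'Off', 'On', 'Off', 'On', 'Off']}
df = pd.DataFrame(data)
df['time'] = pd.to_datetime(df['time'])

# Rows where the flag switches from On to Off (first row compares against NaN -> False)
mask = df['flag'].eq('Off') & df['flag'].shift().eq('On')
prev_time = df['time'].shift()

# Subtract only on the masked rows; assigning a Series aligns on the index,
# so the unmasked rows are filled with NaN. Dividing a timedelta Series by
# pd.Timedelta(hours=1) yields float hours.
df['diff'] = (df.loc[mask, 'time'] - prev_time[mask]) / pd.Timedelta(hours=1)
print(df)
```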