I have a DataFrame like the one below:
import pandas as pd
data = {'time': ['2021-01-01 22:00:12', '2021-01-05 22:49:12', '2021-01-06 21:00:00',
                 '2021-01-06 23:59:15', '2021-01-07 05:00:55', '2021-01-07 12:00:39'],
        'flag': ['On', 'Off', 'On', 'Off', 'On', 'Off']}
df = pd.DataFrame(data)
I want the time difference between consecutive rows, which I computed with:
df['diff'] = pd.to_datetime(df['time']) - pd.to_datetime(df['time'].shift(1))
But this computes a difference for every pair of consecutive rows, and most of those differences are meaningless to me. I only want the difference on rows where the flag switches to Off. Also, how can I convert the difference into hours?
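Mask the difference so that it is kept only on rows where the flag goes from On to Off, then convert it to hours with dt.total_seconds() / 3600: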
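# Parse the strings into datetimes so they can be subtracted
df['time'] = pd.to_datetime(df['time'])
# True only on rows where the flag switches from On to Off
mask = df['flag'].eq('Off') & df['flag'].shift().eq('On')
# Difference to the previous row, kept only where mask is True, converted to hours
df['diff'] = df['time'].sub(df['time'].shift()).where(mask).dt.total_seconds() / 3600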
                 time flag       diff
0 2021-01-01 22:00:12   On        NaN
1 2021-01-05 22:49:12  Off  96.816667
2 2021-01-06 21:00:00   On        NaN
3 2021-01-06 23:59:15  Off   2.987500
4 2021-01-07 05:00:55   On        NaN
5 2021-01-07 12:00:39  Off   6.995556
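The where(mask) approach still subtracts every pair of rows and only hides the unwanted results afterwards. If you want the subtraction itself to run only on the On to Off transitions, here is a minimal sketch on the same sample data. It restricts both operands with the mask before subtracting, and it converts to hours by dividing by a one-hour pd.Timedelta instead of using total_seconds() / 3600. The names prev_time and elapsed are only illustrative.

```python
import pandas as pd

data = {'time': ['2021-01-01 22:00:12', '2021-01-05 22:49:12', '2021-01-06 21:00:00',
                 '2021-01-06 23:59:15', '2021-01-07 05:00:55', '2021-01-07 12:00:39'],
        'flag': ['On', 'Off', 'On', 'Off', 'On', 'Off']}
df = pd.DataFrame(data)
df['time'] = pd.to_datetime(df['time'])

# True only on rows where the flag switches from On to Off
mask = df['flag'].eq('Off') & df['flag'].shift().eq('On')

# Previous row's timestamp (shifting is cheap; no subtraction happens here)
prev_time = df['time'].shift()

# Subtract only on the masked rows; both operands share the index labels 1, 3, 5
elapsed = df.loc[mask, 'time'] - prev_time[mask]

# Dividing a timedelta Series by a one-hour Timedelta yields float hours.
# Assignment aligns on the index, so unmasked rows get NaN.
df['diff'] = elapsed / pd.Timedelta(hours=1)
print(df)
```

Because the new column is aligned on the index when assigned, rows outside the mask are filled with NaN automatically. This gives the same table as the masked version.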