pythonpandasdateoperation

find the time difference between adjacent value in padas


I have this dataframe

date    qq  q_t l_t
1956-01-01  1   4   True
1956-01-02  2   5   True
1956-01-03  3   1   False
1956-01-04  4   1   False
1956-01-05  5   1   False
1956-01-06  6   10  True
1956-01-07  7   11  True
1956-01-08  8   12  True
1956-01-09  9   5   False
1956-01-10  10  3   False
1956-01-11  11  3   False
1956-01-12  12  3   False
1956-01-13  13  50  True
1956-01-14  14  51  True
1956-01-15  15  52  True
1956-01-16  16  53  True
1956-01-17  17  1   False
1956-01-18  18  23  True
1956-01-19  19  1   False

I would like to find the number of successive days with true values and store the initial and final date. In other words, I would like to create a new dataframe as:

index days start        end 
1     2    1956-01-01   1956-01-02
2     3    1956-01-06   1956-01-08
3     4    1956-01-50   1956-01-53
4     1    1956-01-18   1956-01-18

I thought to work with np.where and then do a cycle all over the dataframe, a sort of

  _fref = np.where(dfr_['l_t'] == 'True')
    
    for i in range(0, len(_fref)-1):
        
        i_1 = _fref[i]
        i_2 = _fref[i+1]
        
        deltat = i_2-i_1
        
        if deltat == 1:

...
...

It seems not very elegant and I am pretty sure that there are different way to do that. What do you think? Is it better to stay stick with the cycle strategy?


Solution

  • Here's my try

    df['date'] = pd.to_datetime(df['date'])
    df['group'] = (df['l_t'] != df['l_t'].shift()).cumsum()
    true_values = df[df['l_t']]
    
    # Calculate the duration (number of successive days) for each group of True values
    consecutive_days = true_values.groupby('group')['date'].agg(['min', 'max', 'count'])
    consecutive_days = consecutive_days.reset_index()
    print(consecutive_days)
    

    Hope this works for you