Tags: python, pandas, datetime, group-by, running-count

Find equal timestamps and incrementally add a constant offset


I have a dataframe df containing some timestamps:

df['Date'].values
Out[16]: 
array(['2015-03-25T14:36:39.199994000', '2015-03-25T14:36:39.199994000',
       '2015-03-26T10:05:03.699999000', '2015-04-19T16:01:49.680009000',
       '2015-04-19T16:36:10.040007000', '2015-04-19T16:36:10.040007000',
       '2015-04-19T16:36:10.040007000'], dtype='datetime64[ns]')

As you can see, the first and second timestamps are equal, and so are the last three.

I would like to scan the dataframe and, wherever timestamps are equal, keep the first one unchanged and add 5 seconds incrementally to each of the following duplicates (5 seconds to the second occurrence, 10 seconds to the third, and so on).

The new dataframe should look like this:

df['Date'].values
Out[16]: 
array(['2015-03-25T14:36:39.199994000', '2015-03-25T14:36:44.199994000',
       '2015-03-26T10:05:03.699999000', '2015-04-19T16:01:49.680009000',
       '2015-04-19T16:36:10.040007000', '2015-04-19T16:36:15.040007000',
       '2015-04-19T16:36:20.040007000'], dtype='datetime64[ns]')

Is there a pythonic way to do this without looping? I was thinking of grouping by the timestamps, but then I don't know how to proceed.


Solution

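  • Use groupby with cumcount to number each timestamp within its group of equal values (0, 1, 2, ...), then multiply that count by a 5-second Timedelta and add it to the original column, i.e.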

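    import numpy as np
    import pandas as pd

    df = pd.DataFrame({'Date': np.array(['2015-03-25T14:36:39.199994000', '2015-03-25T14:36:39.199994000',
       '2015-03-26T10:05:03.699999000', '2015-04-19T16:01:49.680009000',
       '2015-04-19T16:36:10.040007000', '2015-04-19T16:36:10.040007000',
       '2015-04-19T16:36:10.040007000'], dtype='datetime64[ns]')})

    # cumcount() numbers the rows within each group of equal timestamps: 0, 1, 2, ...
    # Assign the result back to df['Date'] to update the column in place.
    df['Date'] + df.groupby(df['Date']).cumcount() * pd.Timedelta('5 seconds')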
    

    Output:

    0   2015-03-25 14:36:39.199994
    1   2015-03-25 14:36:44.199994
    2   2015-03-26 10:05:03.699999
    3   2015-04-19 16:01:49.680009
    4   2015-04-19 16:36:10.040007
    5   2015-04-19 16:36:15.040007
    6   2015-04-19 16:36:20.040007
    dtype: datetime64[ns]
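
To make the intermediate step visible, here is a minimal sketch that prints the per-group counter before applying it and then writes the shifted values back to the column. The variable name `rank` is only illustrative and is not part of the answer above. Note two assumptions: duplicates are matched across the whole column (not only adjacent rows), and a shifted value is not checked against timestamps that already exist elsewhere in the data.

```python
import pandas as pd

df = pd.DataFrame({"Date": pd.to_datetime([
    "2015-03-25T14:36:39.199994000", "2015-03-25T14:36:39.199994000",
    "2015-03-26T10:05:03.699999000", "2015-04-19T16:01:49.680009000",
    "2015-04-19T16:36:10.040007000", "2015-04-19T16:36:10.040007000",
    "2015-04-19T16:36:10.040007000",
])})

# Position of each row within its group of equal timestamps.
# The first occurrence gets 0, so it is left unchanged.
rank = df.groupby("Date").cumcount()
print(rank.tolist())  # [0, 1, 0, 0, 0, 1, 2]

# Scale the counter by the 5-second step and store the result in the column.
df["Date"] = df["Date"] + rank * pd.Timedelta(seconds=5)
print(df["Date"])
```

If a shifted timestamp could coincide with another existing one in your data, a check such as `df["Date"].is_unique` after the assignment would reveal it.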