pandas datetime timestamp

Python Pandas: Rounding a datetime down to the nearest 15 minutes and extracting the time, with NA in the column


I have been struggling to convert a datetime to a rounded time, especially when dealing with NAs. I keep getting errors such as 'Cannot Convert to Int' and 'Cannot Convert to Datetime'.

Say I have data like so:

ColnName=Start_Time
2020-01-01 19:43:32
2020-01-11 12:23:12
NA
2020-03-01 06:23:32

This column is queried from a db and automatically imported as datetime64[ns].

To round down, I know I want 15*(minutes//15). So I would think I could trivially do something like the following (the actual syntax might not be correct, but something along these lines):

pd.to_datetime(hour=df['Start_Time'].dt.hour,minute=15*(df['Start_Time'].dt.minute//15)).dt.time

But no combination of those things seems to work, because of the issues below:

  1. Because there are NAs in the original column, the .dt.hour and .dt.minute accessors return float values.
  2. I consistently get a float-to-integer conversion error, even when using 'Int32' or 'Int64'.
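
The float result in point 1 can be reproduced on a small frame. This is a minimal sketch, assuming a hand-built DataFrame standing in for the queried data (the values come from the sample above; building the frame this way is only for illustration):

```python
import pandas as pd

# Illustrative stand-in for the queried column; None becomes NaT,
# playing the role of the NA row.
df = pd.DataFrame({
    "Start_Time": pd.to_datetime(
        ["2020-01-01 19:43:32", "2020-01-11 12:23:12", None, "2020-03-01 06:23:32"]
    )
})

minutes = df["Start_Time"].dt.minute

# With a missing value present, the component comes back as a float
# column holding NaN, so a plain .astype(int) fails on that row.
print(minutes.dtype)
print(minutes.tolist())
```

The missing row is why integer arithmetic on the components needs a fill value or a nullable dtype before it can proceed.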

A bunch of other methods I've tried have also failed. I did get it to work using [Edit: This actually produces an incorrect answer]:

pd.to_datetime(((df['Start_Time'].dt.hour.fillna(999).astype(int)).astype(str)+':'+ (15*(df['Start_Time'].dt.minute.fillna(999).astype(int)//15)).astype(str)), errors='coerce').dt.time

But performance using this method seems very slow, and the code just looks terrible.

Any help would be appreciated; this will eventually need to be applied to ~200 different datasets, each consisting of 100k+ records, so speed is valuable.


Solution

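  • Use .dt.floor, which leaves missing values as NaT. Passing errors='coerce' to pd.to_datetime first turns any unparseable entry (such as the string 'NA') into NaT:

    df['round_time'] = pd.to_datetime(df['Start_Time'], errors='coerce').dt.floor('15min')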
    

    Output:

                Start_Time          round_time
    0  2020-01-01 19:43:32 2020-01-01 19:30:00
    1  2020-01-11 12:23:12 2020-01-11 12:15:00
    2                  NaN                 NaT
    3  2020-03-01 06:23:32 2020-03-01 06:15:00
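
The question also asks for the time of day. Below is a minimal sketch of one way to get it, assuming the column is already datetime64[ns] as described in the question (so no pd.to_datetime conversion is needed on the real data). The sample frame and the column names round_time and round_clock are illustrative, not part of the original answer:

```python
import pandas as pd

# Illustrative stand-in for the queried column; None becomes NaT.
df = pd.DataFrame({
    "Start_Time": pd.to_datetime(
        ["2020-01-01 19:43:32", "2020-01-11 12:23:12", None, "2020-03-01 06:23:32"]
    )
})

# Floor each timestamp to the start of its 15-minute bucket; NaT stays NaT.
df["round_time"] = df["Start_Time"].dt.floor("15min")

# Extract the time of day. The result is an object column of
# datetime.time values, with the missing row still missing.
df["round_clock"] = df["round_time"].dt.time

print(df)
```

Both steps are vectorized and avoid the string concatenation and re-parsing of the earlier attempt, which should help with the 100k+ row datasets mentioned in the question. Note that the extracted column has object dtype, so it is usually best to do any further datetime arithmetic on round_time before taking .dt.time.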