pythondatetimepandasepoch

Convert a column of datetimes to epoch in Python


I'm currently having an issue with Python. I have a Pandas DataFrame and one of the columns is a string with a date. The format is :

"%Y-%m-%d %H:%m:00.000". For example : "2011-04-24 01:30:00.000"

I need to convert the entire column to integers. I tried to run this code, but it is extremely slow and I have a few million rows.

for i in range(calls.shape[0]):
    calls['dateint'][i] = int(time.mktime(time.strptime(calls.DATE[i], "%Y-%m-%d %H:%M:00.000")))

Do you guys know how to convert the whole column to epoch time?


Solution

  • convert the string to a datetime using to_datetime and then subtract datetime 1970-1-1 and call dt.total_seconds():

    In [2]:
    import pandas as pd
    import datetime as dt
    df = pd.DataFrame({'date':['2011-04-24 01:30:00.000']})
    df
    
    Out[2]:
                          date
    0  2011-04-24 01:30:00.000
    
    In [3]:
    df['date'] = pd.to_datetime(df['date'])
    df
    
    Out[3]:
                     date
    0 2011-04-24 01:30:00
    
    In [6]:    
    (df['date'] - dt.datetime(1970,1,1)).dt.total_seconds()
    
    Out[6]:
    0    1303608600
    Name: date, dtype: float64
    

    You can see that converting this value back yields the same time:

    In [8]:
    pd.to_datetime(1303608600, unit='s')
    
    Out[8]:
    Timestamp('2011-04-24 01:30:00')
    

    So you can either add a new column or overwrite:

    In [9]:
    df['epoch'] = (df['date'] - dt.datetime(1970,1,1)).dt.total_seconds()
    df
    
    Out[9]:
                     date       epoch
    0 2011-04-24 01:30:00  1303608600
    

    EDIT

    better method as suggested by @Jeff:

    In [3]:
    df['date'].astype('int64')//1e9
    
    Out[3]:
    0    1303608600
    Name: date, dtype: float64
    
    In [4]:
    %timeit (df['date'] - dt.datetime(1970,1,1)).dt.total_seconds()
    %timeit df['date'].astype('int64')//1e9
    
    100 loops, best of 3: 1.72 ms per loop
    1000 loops, best of 3: 275 µs per loop
    

    You can also see that it is significantly faster