pythondatetimepython-datetimedate-manipulationpy-datatable

Converting string column to date format in datatable frame in python


For an easy example :

import datatable as dt
import pandas as pd
from datetime import datetime

d_t = dt.Frame(pd.DataFrame({"Date": ["04/05/2020", "04/06/2020"]}))

There is only a column named Date with two values in str32 type.

How could I convert the Date column into a Date Format in datatable frame.

I have tried

dates_list = [datetime.strptime(date, "%m/%d/%Y") for date in d_t["Date"].to_list()[0]]
d_t[:,"NewDate"] = dt.Frame(dates_list)
d_t["NewDate"].max()
# The code can run successfully so far.

But the result was shown like this :

     NewDate
    ▪▪▪▪▪▪▪▪
  0     NA

I think it was still not a date format.

Even I looked up the type of each column, I still have no idea:

d_t.stypes

[Out]: (stype.str32, stype.obj64)

Is there any way to solve the problem or any alternatives?

Thanks for answering.


Solution

  • datatable 1.0 introduced new column types datatable.Type.date32 and datatable.Type.time64.

    In this case conversion from the string column Date to date32 type works like this:

    d_t[:, dt.update(Date_date = 
                 dt.time.ymd(dt.as_type(dt.str.slice(dt.f.Date, 6, 10), int), 
                             dt.as_type(dt.str.slice(dt.f.Date, 3, 5), int), 
                             dt.as_type(dt.str.slice(dt.f.Date, 0, 2), int)))]
    
    d_t
    
        Date        Date_date
        ▪▪▪▪        ▪▪▪▪
    0   04/05/2020  2020-05-04
    1   04/06/2020  2020-06-04
    

    Although a bit verbose it performs all operations inside datatable without overhead and penalty of intermediate lists or pandas frames. Hopefully, future releases will add more functions beyond datatable.time.ymd() to simplify such conversion ops.

    Have you picked the ISO 8601 standard to represent dates (i.e. string in the format %Y-%m-%d) then conversion to date32 would have been considerably simpler:

    d_t = dt.Frame({"Date": ["2020-04-05", "2020-04-06"]})
    d_t[0] = dt.Type.date32
    d_t.stypes
    
    > (stype.date32,)