pythondatetimepython-polars

Python Polars: Read Column as Datetime


How does one read a csv into a polars DataFrame and parse one of the columns as a datetime?

Alternatively, how does one convert a column to a pl.Datetime?


Solution

  • I would first try try_parse_dates=True in the read_csv call.

    For example, let's say we have the following data:

    import polars as pl
    
    my_csv = (
    b"""
    ID,start,last_updt,end
    1,2008-10-31,2020-11-28 12:48:53,12/31/2008
    2,2007-10-31,2021-11-29 01:37:20,12/31/2007
    3,2006-10-31,2021-11-30 23:22:05,12/31/2006
    """
    )
    
    pl.read_csv(my_csv, try_parse_dates=True)
    
    shape: (3, 4)
    ┌─────┬────────────┬─────────────────────┬────────────┐
    │ ID  ┆ start      ┆ last_updt           ┆ end        │
    │ --- ┆ ---        ┆ ---                 ┆ ---        │
    │ i64 ┆ date       ┆ datetime[μs]        ┆ str        │
    ╞═════╪════════════╪═════════════════════╪════════════╡
    │ 1   ┆ 2008-10-31 ┆ 2020-11-28 12:48:53 ┆ 12/31/2008 │
    │ 2   ┆ 2007-10-31 ┆ 2021-11-29 01:37:20 ┆ 12/31/2007 │
    │ 3   ┆ 2006-10-31 ┆ 2021-11-30 23:22:05 ┆ 12/31/2006 │
    └─────┴────────────┴─────────────────────┴────────────┘
    

    The start column parsed as a Date, and the last_updt column parsed as a Datetime. But notice that the end column did not parse as a date because it is not in ISO 8601 format. (I've come across plenty of csv files where Date/Datetime fields were non-standard.)

    To parse this column, we can use the .str.to_date() function and supply the appropriate format.

    pl.read_csv(my_csv, try_parse_dates=True).with_columns(pl.col('end').str.to_date('%m/%d/%Y'))
    
    shape: (3, 4)
    ┌─────┬────────────┬─────────────────────┬────────────┐
    │ ID  ┆ start      ┆ last_updt           ┆ end        │
    │ --- ┆ ---        ┆ ---                 ┆ ---        │
    │ i64 ┆ date       ┆ datetime[μs]        ┆ date       │
    ╞═════╪════════════╪═════════════════════╪════════════╡
    │ 1   ┆ 2008-10-31 ┆ 2020-11-28 12:48:53 ┆ 2008-12-31 │
    │ 2   ┆ 2007-10-31 ┆ 2021-11-29 01:37:20 ┆ 2007-12-31 │
    │ 3   ┆ 2006-10-31 ┆ 2021-11-30 23:22:05 ┆ 2006-12-31 │
    └─────┴────────────┴─────────────────────┴────────────┘