Tags: python-datetime, python-polars

Polars: string column to pl.Datetime conversion issue


I'm working with a CSV file with the following schema:

'Ticket ID': polars.datatypes.Int64, .. 'Created time': polars.datatypes.Utf8, 'Due by Time': polars.datatypes.Utf8, ..

Converting to Datetime:

df = (
    df.lazy()
      .select(list_cols)
      .with_columns([
          pl.col(convert_to_date).str.strptime(pl.Date, fmt='%d-%m-%Y %H:%M', strict=False).alias("Create_date")  # .cast(pl.Datetime)
      ])
)

Here is the output. 'Created time' is the original str and 'Create_date' is the conversion:

Created time        Create_date
str                 date
04-01-2021 10:26    2021-01-04
04-01-2021 10:26    2021-01-04
04-01-2021 10:26    2021-01-04
04-01-2021 11:48    2021-01-05
...                 ...
22-09-2022 22:44    null
22-09-2022 22:44    null
22-09-2022 22:44    null
22-09-2022 22:47    null

I'm getting a bunch of nulls, and some of the date conversions seem to be incorrect (see the 4th row in the output above). Also, how can I keep the time values?

I'm sure I'm doing something wrong; any help would be greatly appreciated.

import polars as pl
from datetime import datetime
from datetime import date, timedelta
import pyarrow as pa
import pandas as pd

convert_to_date = ['Created time','Due by Time','Resolved time','Closed time','Last update time','Initial response time']

url = 'https://raw.githubusercontent.com/DOakville/PolarsDate/main/3000265945_tickets-Dates.csv'

df = pl.read_csv(url, parse_dates=True)

df = df.with_column(
    pl.col(convert_to_date).str.strptime(pl.Date, fmt='%d-%m-%Y %H:%M', strict=False).alias("Create_date")  # .cast(pl.Datetime)
)

Solution

  • Ahhh... I think I see what is happening - your with_columns expression is successfully converting all of the columns in the "convert_to_date" list, but it assigns the result of each conversion to the same name: "Create_date".

    So, the values you finally get are coming from the last column to be converted ("Initial response time"), which does have nulls where you see them.

    If you want each column to be associated with a separate date-converted entry, you can use the name.suffix expression to ensure that each conversion is mapped to its own distinct output column (based on the original name).

    For example:

    df.with_columns(
        pl.col(convert_to_date).str.to_date(format="%d-%m-%Y %H:%M")
          .name.suffix(" date")  # << adds " date" to the existing column name
    )
    

    Or, if you prefer to overwrite the existing columns with the converted ones, you could keep the existing column names:

    df.with_columns(
        pl.col(convert_to_date).str.to_date(format="%d-%m-%Y %H:%M")
          .name.keep()  # << keeps original name (effectively overwriting it)
    )
    

    Finally, if you actually want datetimes (not dates), you can use .str.to_datetime().
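
    To answer the "how can I keep the time values?" part directly: parsing with .str.to_datetime() preserves the time component instead of truncating to a date. A minimal sketch with made-up sample rows (the column name matches the question, but the data is invented, not the real CSV):

    ```python
    import polars as pl

    # Hypothetical sample in the question's '%d-%m-%Y %H:%M' format
    df = pl.DataFrame({"Created time": ["04-01-2021 10:26", "22-09-2022 22:44"]})

    out = df.with_columns(
        pl.col("Created time")
          .str.to_datetime(format="%d-%m-%Y %H:%M")  # parse str -> pl.Datetime, time kept
          .name.suffix(" date")                      # output column: "Created time date"
    )
    print(out)
    ```

    The same .name.suffix / .name.keep naming options shown above apply here as well, so you can convert the whole convert_to_date list in one expression without the columns colliding.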