Working with a CSV file with the following schema:
'Ticket ID': polars.datatypes.Int64, .. 'Created time': polars.datatypes.Utf8, 'Due by Time': polars.datatypes.Utf8, ..
Converting to Datetime:
df = (
    df.lazy()
    .select(list_cols)
    .with_columns([
        pl.col(convert_to_date).str.strptime(pl.Date, fmt='%d-%m-%Y %H:%M', strict=False).alias("Create_date")  # .cast(pl.Datetime)
    ])
)
Here is the output. 'Created time' is the original str and 'Create_date' is the conversion:
Created time | Create_date
---|---
str | date
04-01-2021 10:26 | 2021-01-04
04-01-2021 10:26 | 2021-01-04
04-01-2021 10:26 | 2021-01-04
04-01-2021 11:48 | 2021-01-05
... | ...
22-09-2022 22:44 | null
22-09-2022 22:44 | null
22-09-2022 22:44 | null
22-09-2022 22:47 | null
I am getting a bunch of nulls, and some of the date conversions seem to be incorrect (see the 4th row in the output above). Also, how can I keep the time values?
I'm sure I am doing something wrong, and any help would be greatly appreciated.
import polars as pl
from datetime import datetime
from datetime import date, timedelta
import pyarrow as pa
import pandas as pd

convert_to_date = ['Created time', 'Due by Time', 'Resolved time', 'Closed time', 'Last update time', 'Initial response time']

url = 'https://raw.githubusercontent.com/DOakville/PolarsDate/main/3000265945_tickets-Dates.csv'

df = (
    pl.read_csv(url, parse_dates=True)
)

df = df.with_column(
    pl.col(convert_to_date).str.strptime(pl.Date, fmt='%d-%m-%Y %H:%M', strict=False).alias("Create_date")  # .cast(pl.Datetime)
)
Ahhh... I think I see what is happening - your with_columns expression is successfully converting all of the columns given in the "convert_to_date" list, but assigning the result of each conversion to the same name: "Create_date".
So the values you finally get are coming from the last column to be converted ("Initial response time"), which does have nulls where you see them.
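If you want to confirm that, here is a rough diagnostic sketch (it assumes the column list, format string, and df from your snippet, plus a recent Polars version with the .name namespace): parse each column under its own suffixed name and count the nulls per resulting column, so you can see which of the original string columns actually fail to parse.

import polars as pl

# convert_to_date and df are the list and frame from the question
null_counts = (
    df.with_columns(
        pl.col(convert_to_date)
        .str.to_date('%d-%m-%Y %H:%M', strict=False)  # failed parses become null
        .name.suffix(' parsed')                       # one distinct output per source column
    )
    .select(pl.col('^.* parsed$').null_count())       # null count per parsed column
)
print(null_counts)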
If you want each column to be associated with a separate date-converted entry, you can use the name.suffix
expression to ensure that each conversion is mapped to its own distinct output column (based on the original name).
For example:
df.with_columns(
    pl.col(convert_to_date).str.to_date()
    .name.suffix(" date")  # << adds " date" to the existing column name
)
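With the suffix approach, each original string column is kept as-is and a new date column is added next to it. A quick (hypothetical) way to see the added names, assuming the question's column list and an explicit format string:

print(
    df.with_columns(
        pl.col(convert_to_date)
        .str.to_date('%d-%m-%Y %H:%M', strict=False)
        .name.suffix(' date')
    ).schema
)
# e.g. {'Created time': Utf8, ..., 'Created time date': Date, 'Due by Time date': Date, ...}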
Or, if you prefer to overwrite the existing columns with the converted ones, you could keep the existing column names:
df.with_columns(
    pl.col(convert_to_date).str.to_date()
    .name.keep()  # << keeps original name (effectively overwriting it)
)
Finally, if you actually want datetimes (not dates), you can use .str.to_datetime().
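For the "keep the time values" part of the question, a minimal sketch (the format string is assumed from sample values like "04-01-2021 10:26"):

df = df.with_columns(
    pl.col(convert_to_date)
    .str.to_datetime('%d-%m-%Y %H:%M', strict=False)  # keeps both date and time
    .name.suffix(' datetime')                         # or .name.keep() to overwrite the originals
)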