I am trying to move from pandas to polars but I am running into the following issue.
df = pl.DataFrame(
{
"integer": [1, 2, 3],
"date": [
"2010-01-31T23:00:00+00:00",
"2010-02-01T00:00:00+00:00",
"2010-02-01T01:00:00+00:00"
]
}
)
df = df.with_columns(
pl.col("date").str.to_datetime().dt.convert_time_zone("Europe/Amsterdam")
)
Yields the following dataframe:
shape: (3, 2)
┌─────────┬────────────────────────────────┐
│ integer ┆ date │
│ --- ┆ --- │
│ i64 ┆ datetime[μs, Europe/Amsterdam] │
╞═════════╪════════════════════════════════╡
│ 1 ┆ 2010-02-01 00:00:00 CET │
│ 2 ┆ 2010-02-01 01:00:00 CET │
│ 3 ┆ 2010-02-01 02:00:00 CET │
└─────────┴────────────────────────────────┘
As you can see, I transformed the datetime string from UTC to CET succesfully.
However, if I try to cast to pl.Date
(as suggested here), it seems to extract the date from the UTC string even though it has been transformed, e.g.:
df = df.with_columns(
pl.col("date").cast(pl.Date).alias("valueDay")
)
shape: (3, 3)
┌─────────┬────────────────────────────────┬────────────┐
│ integer ┆ date ┆ valueDay │
│ --- ┆ --- ┆ --- │
│ i64 ┆ datetime[μs, Europe/Amsterdam] ┆ date │
╞═════════╪════════════════════════════════╪════════════╡
│ 1 ┆ 2010-02-01 00:00:00 CET ┆ 2010-01-31 │ # <- NOT OK
│ 2 ┆ 2010-02-01 01:00:00 CET ┆ 2010-02-01 │
│ 3 ┆ 2010-02-01 02:00:00 CET ┆ 2010-02-01 │
└─────────┴────────────────────────────────┴────────────┘
The valueDay
should be 2010-02-01 for all 3 values.
Can anyone help me fix this? A pandas dt.date like way to approach this would be nice.
By the way, what is the best way to optimize this code? Do I constantly have to assign everything to df
or is there a way to chain all of this?
Update: .dt.date()
has since been added to Polars.
import polars as pl
df = pl.DataFrame(
{
"integer": [1, 2, 3],
"date": [
"2010-01-31T23:00:00+00:00",
"2010-02-01T00:00:00+00:00",
"2010-02-01T01:00:00+00:00",
],
}
)
df = df.with_columns(
pl.col("date").str.to_datetime().dt.convert_time_zone("Europe/Amsterdam")
)
df = df.with_columns(
pl.col("date").dt.date().alias("valueDay"),
pl.col("date").dt.day().alias("day"),
pl.col("date").dt.month().alias("month"),
pl.col("date").dt.year().alias("year"),
)
shape: (3, 6)
┌─────────┬────────────────────────────────┬────────────┬─────┬───────┬──────┐
│ integer ┆ date ┆ valueDay ┆ day ┆ month ┆ year │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ datetime[μs, Europe/Amsterdam] ┆ date ┆ i8 ┆ i8 ┆ i32 │
╞═════════╪════════════════════════════════╪════════════╪═════╪═══════╪══════╡
│ 1 ┆ 2010-02-01 00:00:00 CET ┆ 2010-02-01 ┆ 1 ┆ 2 ┆ 2010 │
│ 2 ┆ 2010-02-01 01:00:00 CET ┆ 2010-02-01 ┆ 1 ┆ 2 ┆ 2010 │
│ 3 ┆ 2010-02-01 02:00:00 CET ┆ 2010-02-01 ┆ 1 ┆ 2 ┆ 2010 │
└─────────┴────────────────────────────────┴────────────┴─────┴───────┴──────┘