Is there a built-in function in Polars, or a better way, to convert time durations to a numeric type at a chosen time resolution (e.g. days, hours, minutes)?
import polars as pl

df = pl.DataFrame({
    "from": ["2023-01-01", "2023-01-02", "2023-01-03"],
    "to": ["2023-01-04", "2023-01-05", "2023-01-06"],
})
My current approach:
# Convert to date and calculate the time difference
df = (
    df.with_columns(
        pl.col("to", "from").str.to_date().name.suffix("_date")
    )
    .with_columns((pl.col("to_date") - pl.col("from_date")).alias("time_diff"))
)

# Convert the time difference to int (in days)
df = df.with_columns(
    ((pl.col("time_diff") / (24 * 60 * 60 * 1000)).cast(pl.Int8)).alias("time_diff_int")
)
Output:
shape: (3, 6)
┌────────────┬────────────┬────────────┬────────────┬──────────────┬───────────────┐
│ from       ┆ to         ┆ to_date    ┆ from_date  ┆ time_diff    ┆ time_diff_int │
│ ---        ┆ ---        ┆ ---        ┆ ---        ┆ ---          ┆ ---           │
│ str        ┆ str        ┆ date       ┆ date       ┆ duration[ms] ┆ i8            │
╞════════════╪════════════╪════════════╪════════════╪══════════════╪═══════════════╡
│ 2023-01-01 ┆ 2023-01-04 ┆ 2023-01-04 ┆ 2023-01-01 ┆ 3d           ┆ 3             │
│ 2023-01-02 ┆ 2023-01-05 ┆ 2023-01-05 ┆ 2023-01-02 ┆ 3d           ┆ 3             │
│ 2023-01-03 ┆ 2023-01-06 ┆ 2023-01-06 ┆ 2023-01-03 ┆ 3d           ┆ 3             │
└────────────┴────────────┴────────────┴────────────┴──────────────┴───────────────┘
The dt accessor has methods that return a duration as a whole number of days, hours, minutes, and so on. Is that what you're looking for?
df.select(
    total_days = pl.col.time_diff.dt.total_days(),
    total_hours = pl.col.time_diff.dt.total_hours(),
    total_minutes = pl.col.time_diff.dt.total_minutes()
)
shape: (3, 3)
┌────────────┬─────────────┬───────────────┐
│ total_days ┆ total_hours ┆ total_minutes │
│ ---        ┆ ---         ┆ ---           │
│ i64        ┆ i64         ┆ i64           │
╞════════════╪═════════════╪═══════════════╡
│ 3          ┆ 72          ┆ 4320          │
│ 3          ┆ 72          ┆ 4320          │
│ 3          ┆ 72          ┆ 4320          │
└────────────┴─────────────┴───────────────┘
docs: Temporal API reference