I have a polars dataframe with a timestamp column of type datetime[ns] whose value is 2023-03-08 11:13:07.831.
I want to use polars' efficiency to floor the timestamp to 5 minutes.
Right now I do:
import arrow
def timestamp_5minutes_floor(ts: int) -> int:
    return int(arrow.get(ts).timestamp() // 300000 * 300000)
df.with_columns([
    pl.col("timestamp").apply(lambda x: timestamp_5minutes_floor(x)).alias("ts_floor")
])
It is slow. How can I improve it?
You could try using .dt.truncate. With the sample dataframe
df = pl.DataFrame({
    "ts": ["2023-03-08 11:01:07.831", "2023-03-08 18:09:01.007"]
}).select(pl.col("ts").str.strptime(pl.Datetime, "%Y-%m-%d %H:%M:%S%.3f"))
┌─────────────────────────┐
│ ts                      │
│ ---                     │
│ datetime[ms]            │
╞═════════════════════════╡
│ 2023-03-08 11:01:07.831 │
│ 2023-03-08 18:09:01.007 │
└─────────────────────────┘
this
df = df.select(pl.col("ts").dt.truncate("5m"))
results in
┌─────────────────────┐
│ ts                │
│ ---               │
│ datetime[ms]      │
╞═════════════════════╡
│ 2023-03-08 11:00:00 │
│ 2023-03-08 18:05:00 │
└─────────────────────┘