
polars datetime 5 minutes floor


I have a polars DataFrame with a timestamp column of type datetime[ns], with values such as 2023-03-08 11:13:07.831. I want to use polars' native (vectorized) operations to floor each timestamp to the start of its 5-minute interval.

Right now I do:

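import arrow
import polars as pl

def timestamp_5minutes_floor(ts: int) -> int:
    # Called once per row from Python, which is what makes this slow
    return int(arrow.get(ts).timestamp() // 300000 * 300000)

# .apply runs the Python function element by element instead of in
# polars' native engine (newer polars versions call this .map_elements)
df = df.with_columns([
    pl.col("timestamp").apply(timestamp_5minutes_floor).alias("ts_floor")
])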

This is slow. How can I improve it?


Solution

  • You could use .dt.truncate. With the sample dataframe

    import polars as pl

    df = pl.DataFrame({
        "ts": ["2023-03-08 11:01:07.831", "2023-03-08 18:09:01.007"]
    }).select(pl.col("ts").str.strptime(pl.Datetime, "%Y-%m-%d %H:%M:%S%.3f"))
    
    ┌─────────────────────────┐
    │ ts                      │
    │ ---                     │
    │ datetime[ms]            │
    ╞═════════════════════════╡
    │ 2023-03-08 11:01:07.831 │
    │ 2023-03-08 18:09:01.007 │
    └─────────────────────────┘
    

    this

    df = df.select(pl.col("ts").dt.truncate("5m"))
    

    results in

    ┌─────────────────────┐
    │ ts                  │
    │ ---                 │
    │ datetime[ms]        │
    ╞═════════════════════╡
    │ 2023-03-08 11:00:00 │
    │ 2023-03-08 18:05:00 │
    └─────────────────────┘