window-functionspython-polars

rolling per group in polars


Let's say I have the following df:

import numpy as np
import polars as pl

df = pl.DataFrame({
    "date": pl.date_range(pl.date(2022, 1, 1), pl.date(2022, 1, 30), interval="1d", eager=True),
    "group": ["A", "B", "C"]*10,
    "total": np.random.randint(100, size=30)
})

Now I want to sum the total per group, in a time window of 5 days. Something like combining groupby with rolling.

I wonder if there more elegant and efficient way to do it than the following code:

df_list = []

for table_group in df["group"].unique():
    temp_df = (
       df
        .filter(pl.col("group")==table_group)
        .sort("date")
        .rolling("date", period="5d")
        .agg(pl.col("total").sum().name.prefix("roll_"))
    )
    
    temp_df2 = (
        df
        .filter(pl.col("group")==table_group)
        .sort("date")
        .select(
            pl.all(),
            temp_df.get_column("roll_total")
        )
    )
    df_list.append(temp_df2)

pl.concat(df_list)

My main issue is that I have millions of rows and over 10,000 groups(and more than 1 column that I want to sum)... So this code is not so elegant and consumes more memory than I want it to.


Solution

  • It appears that the piece you're missing is the group_by parameter.

    This allows you to specify grouping.

    (df.rolling(index_column="date", group_by="group", period="5d")
       .agg(
          pl.col("total").last(),
          roll_total = pl.sum("total")
       )
    )
    
    shape: (30, 4)
    ┌───────┬────────────┬───────┬────────────┐
    │ group ┆ date       ┆ total ┆ roll_total │
    │ ---   ┆ ---        ┆ ---   ┆ ---        │
    │ str   ┆ date       ┆ i64   ┆ i64        │
    ╞═══════╪════════════╪═══════╪════════════╡
    │ A     ┆ 2022-01-01 ┆ 96    ┆ 96         │
    │ A     ┆ 2022-01-04 ┆ 16    ┆ 112        │
    │ A     ┆ 2022-01-07 ┆ 17    ┆ 33         │
    │ A     ┆ 2022-01-10 ┆ 4     ┆ 21         │
    │ …     ┆ …          ┆ …     ┆ …          │
    │ C     ┆ 2022-01-21 ┆ 17    ┆ 78         │
    │ C     ┆ 2022-01-24 ┆ 84    ┆ 101        │
    │ C     ┆ 2022-01-27 ┆ 78    ┆ 162        │
    │ C     ┆ 2022-01-30 ┆ 54    ┆ 132        │
    └───────┴────────────┴───────┴────────────┘