How to replicate df.groupby('some_column').resample('Q').agg('total':'count') in polars with group_by_dynamic

Given i want to group quarterly, then in pandas i would write (given that the index contains the 'date' column)

df = df.groupby('some_column').resample('Q').agg({"total": "count"})

I tried to replicate that with polars:

df = df.group_by_dynamic('date', every='3mo', group_by='some_column', include_boundaries=True).agg(pl.len())

In pandas i get the end of the quarter back. In polars the lower boundary because truncate is default set to True. How would i get the _upper_boundary back as my column 'date' ?

Obviously i could just run a selecte after setting include_boundaries=True but i actually want to set it to false because the docs say it will impact performance if set to true

Is every='3mo' equal to pandas resample('Q') ?

Examples:

df = pd.DataFrame({'date': pd.date_range("2018-01-01", periods=365*24, freq='H')})
# Create second column 'pokemon_type':
df['pokemon_type'] = 'fire'
df.loc[400:, 'pokemon_type'] = 'water'

                    date pokemon_type
0    2018-01-01 00:00:00         fire
1    2018-01-01 01:00:00         fire
2    2018-01-01 02:00:00         fire
3    2018-01-01 03:00:00         fire
4    2018-01-01 04:00:00         fire
                  ...          ...
8755 2018-12-31 19:00:00        water
8756 2018-12-31 20:00:00        water
8757 2018-12-31 21:00:00        water
8758 2018-12-31 22:00:00        water
8759 2018-12-31 23:00:00        water
[8760 rows x 2 columns]

Now groupby:
df['total'] = 0
df = df.set_index('date') 
df = df.groupby('pokemon_type').resample('Q').agg({'total':'count'})

df
pokemon_type date             
fire         2018-03-31    400
water        2018-03-31   1760
             2018-06-30   2184
             2018-09-30   2208
             2018-12-31   2208
df.index
MultiIndex([( 'fire', '2018-03-31'),
            ('water', '2018-03-31'),
            ('water', '2018-06-30'),
            ('water', '2018-09-30'),
            ('water', '2018-12-31')],
           names=['pokemon_type', 'date'])

For Polars: Sadly i don't know how to slice a column and assign sub values. Tried to figure it out but i am still on it.

The polars df will look like this

┌──────────────┬─────────────────────┬─────────────────────┬────────────┬───────┐
│ pokemon_type ┆ _lower_boundary     ┆ _upper_boundary     ┆ date       ┆ count │
│ ---          ┆ ---                 ┆ ---                 ┆ ---        ┆ ---   │
│ str          ┆ datetime[μs]        ┆ datetime[μs]        ┆ date       ┆ i64   │
╞══════════════╪═════════════════════╪═════════════════════╪════════════╪═══════╡
│ fire         ┆ 1994-01-01 00:00:00 ┆ 1994-04-01 00:00:00 ┆ 1994-01-01 ┆ 58    │
│ water        ┆ 1994-01-01 00:00:00 ┆ 1994-04-01 00:00:00 ┆ 1994-01-01 ┆ 38    │
└──────────────┴─────────────────────┴─────────────────────┴────────────┴───────┘

What i need is the _upper_boundary directly without having performance issues.

Solution

First, let's recreate your data (including the slice) using Polars. We'll take this in steps.

Calculating Dates

First, we'll use the datetime_range function in Polars to create our dates.

import polars as pl
from datetime import datetime

df = pl.DataFrame({
    'date': pl.datetime_range(datetime(2018, 11, 28), datetime(2019, 11, 28), '1h', closed='left', eager=True),
})
df

shape: (8_760, 1)
┌─────────────────────┐
│ date                │
│ ---                 │
│ datetime[μs]        │
╞═════════════════════╡
│ 2018-11-28 00:00:00 │
│ 2018-11-28 01:00:00 │
│ 2018-11-28 02:00:00 │
│ 2018-11-28 03:00:00 │
│ 2018-11-28 04:00:00 │
│ …                   │
│ 2019-11-27 19:00:00 │
│ 2019-11-27 20:00:00 │
│ 2019-11-27 21:00:00 │
│ 2019-11-27 22:00:00 │
│ 2019-11-27 23:00:00 │
└─────────────────────┘

Slicing

Next, we'll use the int_range expression, along with a when/then/otherwise expression to slice our data and assign the pokemon_type.

df = df.with_columns(
    pl.when(pl.int_range(pl.len()) < 400)
    .then(pl.lit('fire'))
    .otherwise(pl.lit('water'))
    .alias('pokemon_type')
)
df

shape: (8_760, 2)
┌─────────────────────┬──────────────┐
│ date                ┆ pokemon_type │
│ ---                 ┆ ---          │
│ datetime[μs]        ┆ str          │
╞═════════════════════╪══════════════╡
│ 2018-11-28 00:00:00 ┆ fire         │
│ 2018-11-28 01:00:00 ┆ fire         │
│ 2018-11-28 02:00:00 ┆ fire         │
│ 2018-11-28 03:00:00 ┆ fire         │
│ 2018-11-28 04:00:00 ┆ fire         │
│ …                   ┆ …            │
│ 2019-11-27 19:00:00 ┆ water        │
│ 2019-11-27 20:00:00 ┆ water        │
│ 2019-11-27 21:00:00 ┆ water        │
│ 2019-11-27 22:00:00 ┆ water        │
│ 2019-11-27 23:00:00 ┆ water        │
└─────────────────────┴──────────────┘

Calculating quarter-end dates

To obtain our end-quarter dates, we can use the when/then/otherwise and the datetime expression to create them from the date field.

Note: I'll set include_bondaries=True, just to compare our calculated end-quarter date to the _upper_boundary date.

(
    df
    .group_by_dynamic("date", every="3mo", group_by="pokemon_type", include_boundaries=True)
    .agg(pl.len())
    .with_columns(
        pl.when(pl.col("date").dt.month() < 4)
        .then(pl.datetime(pl.col('date').dt.year(), 4, 1))
        .when(pl.col("date").dt.month() < 7)
        .then(pl.datetime(pl.col("date").dt.year(), 7, 1))
        .when(pl.col("date").dt.month() < 10)
        .then(pl.datetime(pl.col("date").dt.year(), 10, 1))
        .otherwise(pl.datetime(pl.col("date").dt.year() + 1, 1, 1))
        .alias("quarter")
    )
)

shape: (6, 6)
┌──────────────┬─────────────────────┬─────────────────────┬─────────────────────┬──────┬─────────────────────┐
│ pokemon_type ┆ _lower_boundary     ┆ _upper_boundary     ┆ date                ┆ len  ┆ quarter             │
│ ---          ┆ ---                 ┆ ---                 ┆ ---                 ┆ ---  ┆ ---                 │
│ str          ┆ datetime[μs]        ┆ datetime[μs]        ┆ datetime[μs]        ┆ u32  ┆ datetime[μs]        │
╞══════════════╪═════════════════════╪═════════════════════╪═════════════════════╪══════╪═════════════════════╡
│ fire         ┆ 2018-10-01 00:00:00 ┆ 2019-01-01 00:00:00 ┆ 2018-10-01 00:00:00 ┆ 400  ┆ 2019-01-01 00:00:00 │
│ water        ┆ 2018-10-01 00:00:00 ┆ 2019-01-01 00:00:00 ┆ 2018-10-01 00:00:00 ┆ 416  ┆ 2019-01-01 00:00:00 │
│ water        ┆ 2019-01-01 00:00:00 ┆ 2019-04-01 00:00:00 ┆ 2019-01-01 00:00:00 ┆ 2160 ┆ 2019-04-01 00:00:00 │
│ water        ┆ 2019-04-01 00:00:00 ┆ 2019-07-01 00:00:00 ┆ 2019-04-01 00:00:00 ┆ 2184 ┆ 2019-07-01 00:00:00 │
│ water        ┆ 2019-07-01 00:00:00 ┆ 2019-10-01 00:00:00 ┆ 2019-07-01 00:00:00 ┆ 2208 ┆ 2019-10-01 00:00:00 │
│ water        ┆ 2019-10-01 00:00:00 ┆ 2020-01-01 00:00:00 ┆ 2019-10-01 00:00:00 ┆ 1392 ┆ 2020-01-01 00:00:00 │
└──────────────┴─────────────────────┴─────────────────────┴─────────────────────┴──────┴─────────────────────┘

Likewise, if you wanted to use closed="left", you could also express your quarter-ending dates like this:

(
    df
    .group_by_dynamic("date", every="3mo", group_by="pokemon_type", closed='left', include_boundaries=True)
    .agg(pl.len())
    .with_columns(
        pl.when(pl.col("date").dt.month() < 4)
        .then(pl.datetime(pl.col('date').dt.year(), 3, 31))
        .when(pl.col("date").dt.month() < 7)
        .then(pl.datetime(pl.col("date").dt.year(), 6, 30))
        .when(pl.col("date").dt.month() < 10)
        .then(pl.datetime(pl.col("date").dt.year(), 9, 30))
        .otherwise(pl.datetime(pl.col("date").dt.year(), 12, 31))
        .alias("quarter")
    )
)

shape: (6, 6)
┌──────────────┬─────────────────────┬─────────────────────┬─────────────────────┬──────┬─────────────────────┐
│ pokemon_type ┆ _lower_boundary     ┆ _upper_boundary     ┆ date                ┆ len  ┆ quarter             │
│ ---          ┆ ---                 ┆ ---                 ┆ ---                 ┆ ---  ┆ ---                 │
│ str          ┆ datetime[μs]        ┆ datetime[μs]        ┆ datetime[μs]        ┆ u32  ┆ datetime[μs]        │
╞══════════════╪═════════════════════╪═════════════════════╪═════════════════════╪══════╪═════════════════════╡
│ fire         ┆ 2018-10-01 00:00:00 ┆ 2019-01-01 00:00:00 ┆ 2018-10-01 00:00:00 ┆ 400  ┆ 2018-12-31 00:00:00 │
│ water        ┆ 2018-10-01 00:00:00 ┆ 2019-01-01 00:00:00 ┆ 2018-10-01 00:00:00 ┆ 416  ┆ 2018-12-31 00:00:00 │
│ water        ┆ 2019-01-01 00:00:00 ┆ 2019-04-01 00:00:00 ┆ 2019-01-01 00:00:00 ┆ 2160 ┆ 2019-03-31 00:00:00 │
│ water        ┆ 2019-04-01 00:00:00 ┆ 2019-07-01 00:00:00 ┆ 2019-04-01 00:00:00 ┆ 2184 ┆ 2019-06-30 00:00:00 │
│ water        ┆ 2019-07-01 00:00:00 ┆ 2019-10-01 00:00:00 ┆ 2019-07-01 00:00:00 ┆ 2208 ┆ 2019-09-30 00:00:00 │
│ water        ┆ 2019-10-01 00:00:00 ┆ 2020-01-01 00:00:00 ┆ 2019-10-01 00:00:00 ┆ 1392 ┆ 2019-12-31 00:00:00 │
└──────────────┴─────────────────────┴─────────────────────┴─────────────────────┴──────┴─────────────────────┘