I'm trying to upsample a Polars dataframe while grouping by a particular column. In the following example, I wish to group by 'fruit' and then upsample by date.
import polars as pl

df = pl.from_repr("""
┌───────┬─────────────────────┬───────┐
│ fruit ┆ date ┆ count │
│ --- ┆ --- ┆ --- │
│ str ┆ datetime[ns] ┆ i64 │
╞═══════╪═════════════════════╪═══════╡
│ apple ┆ 2022-06-01 00:00:00 ┆ 5 │
│ apple ┆ 2022-06-03 00:00:00 ┆ 6 │
│ apple ┆ 2022-06-04 00:00:00 ┆ 2 │
│ apple ┆ 2022-06-07 00:00:00 ┆ 1 │
│ pear ┆ 2022-06-01 00:00:00 ┆ 9 │
│ pear ┆ 2022-06-07 00:00:00 ┆ 12 │
└───────┴─────────────────────┴───────┘
""")
This is what the output should look like:
shape: (14, 3)
┌───────┬─────────────────────┬───────┐
│ fruit ┆ date ┆ count │
│ --- ┆ --- ┆ --- │
│ str ┆ datetime[ns] ┆ i64 │
╞═══════╪═════════════════════╪═══════╡
│ apple ┆ 2022-06-01 00:00:00 ┆ 5 │
│ apple ┆ 2022-06-02 00:00:00 ┆ 5 │
│ apple ┆ 2022-06-03 00:00:00 ┆ 6 │
│ apple ┆ 2022-06-04 00:00:00 ┆ 2 │
│ apple ┆ 2022-06-05 00:00:00 ┆ 2 │
│ apple ┆ 2022-06-06 00:00:00 ┆ 2 │
│ apple ┆ 2022-06-07 00:00:00 ┆ 1 │
│ pear ┆ 2022-06-01 00:00:00 ┆ 9 │
│ pear ┆ 2022-06-02 00:00:00 ┆ 9 │
│ pear ┆ 2022-06-03 00:00:00 ┆ 9 │
│ pear ┆ 2022-06-04 00:00:00 ┆ 9 │
│ pear ┆ 2022-06-05 00:00:00 ┆ 9 │
│ pear ┆ 2022-06-06 00:00:00 ┆ 9 │
│ pear ┆ 2022-06-07 00:00:00 ┆ 12 │
└───────┴─────────────────────┴───────┘
Without grouping, the following command gives me the result I need:
df.upsample('date', every='1d').fill_null(strategy="forward")
However, I have not been able to get it working when a group-by is involved.
PS: Here is a similar question, but using pandas: "Pandas: resample timeseries with groupby".
I realized that the upsample function has a 'group_by' parameter that gives me the result I need. Here is a link to the API docs for the .upsample() method.