pythonpython-polars

Polars: Expand dataframe so that each id vars have the same num of rows


I have a dataframe that has id and week. I want to expand the dataframe so that each id have the same number of rows or four weeks.

import pandas as pd

data = {
    'id': ['a', 'a', 'b', 'c', 'c', 'c'],
    'week': ['1', '2', '3', '4', '3', '1'],
    'num1': [1, 3, 5, 4, 3, 6],
    'num2': [4, 5, 3, 4, 6, 6]
}
df = pd.DataFrame(data)
  id week  num1  num2
0  a    1     1     4
1  a    2     3     5
2  b    3     5     3
3  c    4     4     4
4  c    3     3     6
5  c    1     6     6

In pandas, I can just do:

df = (
    df.set_index(['id', 'week'])
      .unstack().stack(dropna=False)
      .reset_index()
)
   id week  num1  num2
0   a    1   1.0   4.0
1   a    2   3.0   5.0
2   a    3   NaN   NaN
3   a    4   NaN   NaN
4   b    1   NaN   NaN
5   b    2   NaN   NaN
6   b    3   5.0   3.0
7   b    4   NaN   NaN
8   c    1   6.0   6.0
9   c    2   NaN   NaN
10  c    3   3.0   6.0
11  c    4   4.0   4.0

How do you do this with polars?


Solution

  • It kind of looks like an .upsample but week is a str in this case which wont work.

    It also looks like a cross join of unique id -> week values joined with the original df.

    (df.select("id").unique()
       .join(df.select("week").unique(), how="cross")
       .join(
          df,
          on=["id", "week"],
          how="left"
       )
    )   
    
    shape: (12, 4)
    ┌─────┬──────┬──────┬──────┐
    │ id  ┆ week ┆ num1 ┆ num2 │
    │ --- ┆ ---  ┆ ---  ┆ ---  │
    │ str ┆ str  ┆ i64  ┆ i64  │
    ╞═════╪══════╪══════╪══════╡
    │ a   ┆ 1    ┆ 1    ┆ 4    │
    │ a   ┆ 2    ┆ 3    ┆ 5    │
    │ a   ┆ 3    ┆ null ┆ null │
    │ a   ┆ 4    ┆ null ┆ null │
    │ …   ┆ …    ┆ …    ┆ …    │
    │ c   ┆ 1    ┆ 6    ┆ 6    │
    │ c   ┆ 2    ┆ null ┆ null │
    │ c   ┆ 3    ┆ 3    ┆ 6    │
    │ c   ┆ 4    ┆ 4    ┆ 4    │
    └─────┴──────┴──────┴──────┘
    

    There's also .to_dummies() which could potentially be of use depending on what it is you're actually doing.

    We make a copy of the original column first:

    df.with_columns(_week = pl.col("week")).to_dummies("week")
    
    shape: (6, 8)
    ┌─────┬────────┬────────┬────────┬────────┬──────┬──────┬───────┐
    │ id  ┆ week_1 ┆ week_2 ┆ week_3 ┆ week_4 ┆ num1 ┆ num2 ┆ _week │
    │ --- ┆ ---    ┆ ---    ┆ ---    ┆ ---    ┆ ---  ┆ ---  ┆ ---   │
    │ str ┆ u8     ┆ u8     ┆ u8     ┆ u8     ┆ i64  ┆ i64  ┆ str   │
    ╞═════╪════════╪════════╪════════╪════════╪══════╪══════╪═══════╡
    │ a   ┆ 1      ┆ 0      ┆ 0      ┆ 0      ┆ 1    ┆ 4    ┆ 1     │
    │ a   ┆ 0      ┆ 1      ┆ 0      ┆ 0      ┆ 3    ┆ 5    ┆ 2     │
    │ b   ┆ 0      ┆ 0      ┆ 1      ┆ 0      ┆ 5    ┆ 3    ┆ 3     │
    │ c   ┆ 0      ┆ 0      ┆ 0      ┆ 1      ┆ 4    ┆ 4    ┆ 4     │
    │ c   ┆ 0      ┆ 0      ┆ 1      ┆ 0      ┆ 3    ┆ 6    ┆ 3     │
    │ c   ┆ 1      ┆ 0      ┆ 0      ┆ 0      ┆ 6    ┆ 6    ┆ 1     │
    └─────┴────────┴────────┴────────┴────────┴──────┴──────┴───────┘
    

    You can then do things like:

    (df.with_columns(_week = pl.col("week"))
       .to_dummies("week")
       .with_columns(
          pl.when(pl.col("^week_\d+$") == 0)
            .then(pl.col("^week_\d+$"))
            .fill_null(pl.col("_week"))
       )
    )
    
    shape: (6, 8)
    ┌─────┬────────┬────────┬────────┬────────┬──────┬──────┬───────┐
    │ id  ┆ week_1 ┆ week_2 ┆ week_3 ┆ week_4 ┆ num1 ┆ num2 ┆ _week │
    │ --- ┆ ---    ┆ ---    ┆ ---    ┆ ---    ┆ ---  ┆ ---  ┆ ---   │
    │ str ┆ str    ┆ str    ┆ str    ┆ str    ┆ i64  ┆ i64  ┆ str   │
    ╞═════╪════════╪════════╪════════╪════════╪══════╪══════╪═══════╡
    │ a   ┆ 1      ┆ 0      ┆ 0      ┆ 0      ┆ 1    ┆ 4    ┆ 1     │
    │ a   ┆ 0      ┆ 2      ┆ 0      ┆ 0      ┆ 3    ┆ 5    ┆ 2     │
    │ b   ┆ 0      ┆ 0      ┆ 3      ┆ 0      ┆ 5    ┆ 3    ┆ 3     │
    │ c   ┆ 0      ┆ 0      ┆ 0      ┆ 4      ┆ 4    ┆ 4    ┆ 4     │
    │ c   ┆ 0      ┆ 0      ┆ 3      ┆ 0      ┆ 3    ┆ 6    ┆ 3     │
    │ c   ┆ 1      ┆ 0      ┆ 0      ┆ 0      ┆ 6    ┆ 6    ┆ 1     │
    └─────┴────────┴────────┴────────┴────────┴──────┴──────┴───────┘