I have a dataframe that has id and week. I want to expand the dataframe so that each id have the same number of rows or four weeks.
import pandas as pd
data = {
'id': ['a', 'a', 'b', 'c', 'c', 'c'],
'week': ['1', '2', '3', '4', '3', '1'],
'num1': [1, 3, 5, 4, 3, 6],
'num2': [4, 5, 3, 4, 6, 6]
}
df = pd.DataFrame(data)
id week num1 num2
0 a 1 1 4
1 a 2 3 5
2 b 3 5 3
3 c 4 4 4
4 c 3 3 6
5 c 1 6 6
In pandas, I can just do:
df = (
df.set_index(['id', 'week'])
.unstack().stack(dropna=False)
.reset_index()
)
id week num1 num2
0 a 1 1.0 4.0
1 a 2 3.0 5.0
2 a 3 NaN NaN
3 a 4 NaN NaN
4 b 1 NaN NaN
5 b 2 NaN NaN
6 b 3 5.0 3.0
7 b 4 NaN NaN
8 c 1 6.0 6.0
9 c 2 NaN NaN
10 c 3 3.0 6.0
11 c 4 4.0 4.0
How do you do this with polars?
It kind of looks like an .upsample
but week is a str
in this case which wont work.
It also looks like a cross join of unique id -> week
values joined with the original df.
(df.select("id").unique()
.join(df.select("week").unique(), how="cross")
.join(
df,
on=["id", "week"],
how="left"
)
)
shape: (12, 4)
┌─────┬──────┬──────┬──────┐
│ id ┆ week ┆ num1 ┆ num2 │
│ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ str ┆ i64 ┆ i64 │
╞═════╪══════╪══════╪══════╡
│ a ┆ 1 ┆ 1 ┆ 4 │
│ a ┆ 2 ┆ 3 ┆ 5 │
│ a ┆ 3 ┆ null ┆ null │
│ a ┆ 4 ┆ null ┆ null │
│ … ┆ … ┆ … ┆ … │
│ c ┆ 1 ┆ 6 ┆ 6 │
│ c ┆ 2 ┆ null ┆ null │
│ c ┆ 3 ┆ 3 ┆ 6 │
│ c ┆ 4 ┆ 4 ┆ 4 │
└─────┴──────┴──────┴──────┘
There's also .to_dummies()
which could potentially be of use depending on what it is you're actually doing.
We make a copy of the original column first:
df.with_columns(_week = pl.col("week")).to_dummies("week")
shape: (6, 8)
┌─────┬────────┬────────┬────────┬────────┬──────┬──────┬───────┐
│ id ┆ week_1 ┆ week_2 ┆ week_3 ┆ week_4 ┆ num1 ┆ num2 ┆ _week │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ u8 ┆ u8 ┆ u8 ┆ u8 ┆ i64 ┆ i64 ┆ str │
╞═════╪════════╪════════╪════════╪════════╪══════╪══════╪═══════╡
│ a ┆ 1 ┆ 0 ┆ 0 ┆ 0 ┆ 1 ┆ 4 ┆ 1 │
│ a ┆ 0 ┆ 1 ┆ 0 ┆ 0 ┆ 3 ┆ 5 ┆ 2 │
│ b ┆ 0 ┆ 0 ┆ 1 ┆ 0 ┆ 5 ┆ 3 ┆ 3 │
│ c ┆ 0 ┆ 0 ┆ 0 ┆ 1 ┆ 4 ┆ 4 ┆ 4 │
│ c ┆ 0 ┆ 0 ┆ 1 ┆ 0 ┆ 3 ┆ 6 ┆ 3 │
│ c ┆ 1 ┆ 0 ┆ 0 ┆ 0 ┆ 6 ┆ 6 ┆ 1 │
└─────┴────────┴────────┴────────┴────────┴──────┴──────┴───────┘
You can then do things like:
(df.with_columns(_week = pl.col("week"))
.to_dummies("week")
.with_columns(
pl.when(pl.col("^week_\d+$") == 0)
.then(pl.col("^week_\d+$"))
.fill_null(pl.col("_week"))
)
)
shape: (6, 8)
┌─────┬────────┬────────┬────────┬────────┬──────┬──────┬───────┐
│ id ┆ week_1 ┆ week_2 ┆ week_3 ┆ week_4 ┆ num1 ┆ num2 ┆ _week │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ str ┆ str ┆ str ┆ str ┆ i64 ┆ i64 ┆ str │
╞═════╪════════╪════════╪════════╪════════╪══════╪══════╪═══════╡
│ a ┆ 1 ┆ 0 ┆ 0 ┆ 0 ┆ 1 ┆ 4 ┆ 1 │
│ a ┆ 0 ┆ 2 ┆ 0 ┆ 0 ┆ 3 ┆ 5 ┆ 2 │
│ b ┆ 0 ┆ 0 ┆ 3 ┆ 0 ┆ 5 ┆ 3 ┆ 3 │
│ c ┆ 0 ┆ 0 ┆ 0 ┆ 4 ┆ 4 ┆ 4 ┆ 4 │
│ c ┆ 0 ┆ 0 ┆ 3 ┆ 0 ┆ 3 ┆ 6 ┆ 3 │
│ c ┆ 1 ┆ 0 ┆ 0 ┆ 0 ┆ 6 ┆ 6 ┆ 1 │
└─────┴────────┴────────┴────────┴────────┴──────┴──────┴───────┘