I have this dataframe:
df = pl.DataFrame({
'date':['date1','date1','date1','date2','date3','date3'],
'factor':['A','B','C','B','B','C'], 'val':[1,2,3,3,1,5]
})
shape: (6, 3)
┌───────┬────────┬─────┐
│ date ┆ factor ┆ val │
│ --- ┆ --- ┆ --- │
│ str ┆ str ┆ i64 │
╞═══════╪════════╪═════╡
│ date1 ┆ A ┆ 1 │
│ date1 ┆ B ┆ 2 │
│ date1 ┆ C ┆ 3 │
│ date2 ┆ B ┆ 3 │
│ date3 ┆ B ┆ 1 │
│ date3 ┆ C ┆ 5 │
└───────┴────────┴─────┘
Some of the factors are missing. I'd like to fill in the gaps with values 0.
shape: (9, 3)
┌───────┬────────┬───────┐
│ date ┆ factor ┆ value │
│ --- ┆ --- ┆ --- │
│ str ┆ str ┆ i64 │
╞═══════╪════════╪═══════╡
│ date1 ┆ A ┆ 1 │
│ date1 ┆ B ┆ 2 │
│ date1 ┆ C ┆ 3 │
│ date2 ┆ A ┆ 0 │
│ date2 ┆ B ┆ 3 │
│ date2 ┆ C ┆ 0 │
│ date3 ┆ A ┆ 0 │
│ date3 ┆ B ┆ 1 │
│ date3 ┆ C ┆ 5 │
└───────┴────────┴───────┘
For pure readability/"polars"icity, I think
(df.pivot(on="factor", index="date")
.unpivot(index="date", variable_name="factor", value_name="value")
)
┌───────┬────────┬───────┐
│ date ┆ factor ┆ value │
│ --- ┆ --- ┆ --- │
│ str ┆ str ┆ i64 │
╞═══════╪════════╪═══════╡
│ date1 ┆ A ┆ 1 │
│ date2 ┆ A ┆ null │
│ date3 ┆ A ┆ null │
│ date1 ┆ B ┆ 2 │
│ date2 ┆ B ┆ 3 │
│ date3 ┆ B ┆ 1 │
│ date1 ┆ C ┆ 3 │
│ date2 ┆ C ┆ null │
│ date3 ┆ C ┆ 5 │
└───────┴────────┴───────┘
is good, since it makes most clear what you are trying to do, make the dataframe look like a usual "melted"/"unpivoted" one. I haven't benchmarked it though.