I have a polars dataframe with columns a_0, a_1, a_2, b_0, b_1, b_2. I want to convert it to a longer and thinner dataframe (3x the rows, but just 2 columns, a and b), so that a contains a_0[0], a_1[0], a_2[0], a_0[1], a_1[1], a_2[1], ... and the same for b. How can I do that?
You can use concat_list() to join the columns you want together and then use explode() to convert them into rows.
Let's take a simple data frame as an example:
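import polars as pl

df = pl.DataFrame(
    data=[list(range(6))],  # a single row holding 0..5
    schema=[f"a_{i}" for i in range(3)] + [f"b_{i}" for i in range(3)],
    orient="row",  # state the orientation explicitly instead of relying on inference
)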
┌─────┬─────┬─────┬─────┬─────┬─────┐
│ a_0 ┆ a_1 ┆ a_2 ┆ b_0 ┆ b_1 ┆ b_2 │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 ┆ i64 ┆ i64 ┆ i64 │
╞═════╪═════╪═════╪═════╪═════╪═════╡
│ 0 ┆ 1 ┆ 2 ┆ 3 ┆ 4 ┆ 5 │
└─────┴─────┴─────┴─────┴─────┴─────┘
Now you can reshape it. First, concatenate the columns into lists and give them the names you want in the final result:
import polars.selectors as cs

df.select(
    pl.concat_list(cs.starts_with(x)).alias(x) for x in ['a', 'b']
)
┌───────────┬───────────┐
│ a ┆ b │
│ --- ┆ --- │
│ list[i64] ┆ list[i64] │
╞═══════════╪═══════════╡
│ [0, 1, 2] ┆ [3, 4, 5] │
└───────────┴───────────┘
Now, explode the lists into rows:
df.select(
    pl.concat_list(cs.starts_with(x)).alias(x) for x in ['a', 'b']
).explode(pl.all())
┌─────┬─────┐
│ a ┆ b │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 0 ┆ 3 │
│ 1 ┆ 4 │
│ 2 ┆ 5 │
└─────┴─────┘