Tags: python, reshape, python-polars

Merge groups of columns in a Polars DataFrame into single columns


I have a Polars DataFrame with columns a_0, a_1, a_2, b_0, b_1, b_2. I want to convert it into a longer, thinner DataFrame (3 times as many rows, but only 2 columns, a and b), so that a contains a_0[0], a_1[0], a_2[0], a_0[1], a_1[1], a_2[1], ... and likewise for b. How can I do that?


Solution

  • You can use concat_list() to combine each group of columns into a single list column, and then use explode() to turn each list element into its own row.

    Let's take a simple DataFrame as an example:

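    import polars as pl

    df = pl.DataFrame(
        data=[list(range(6))],  # a single row: 0, 1, 2, 3, 4, 5
        schema=[f"a_{i}" for i in range(3)] + [f"b_{i}" for i in range(3)],
        orient="row",  # treat the inner list as one row, not as one column
    )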
    
    ┌─────┬─────┬─────┬─────┬─────┬─────┐
    │ a_0 ┆ a_1 ┆ a_2 ┆ b_0 ┆ b_1 ┆ b_2 │
    │ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
    │ i64 ┆ i64 ┆ i64 ┆ i64 ┆ i64 ┆ i64 │
    ╞═════╪═════╪═════╪═════╪═════╪═════╡
    │ 0   ┆ 1   ┆ 2   ┆ 3   ┆ 4   ┆ 5   │
    └─────┴─────┴─────┴─────┴─────┴─────┘
    

    Now you can reshape it. First, concatenate each group of columns into a list and name the resulting column after the group's prefix:

    import polars.selectors as cs
    
    # One list column per prefix: "a" collects a_0, a_1, a_2 (in schema order), likewise "b"
    df.select(
        pl.concat_list(cs.starts_with(f"{x}_")).alias(x) for x in ['a', 'b']
    )
    
    ┌───────────┬───────────┐
    │ a         ┆ b         │
    │ ---       ┆ ---       │
    │ list[i64] ┆ list[i64] │
    ╞═══════════╪═══════════╡
    │ [0, 1, 2] ┆ [3, 4, 5] │
    └───────────┴───────────┘
    

    Now, explode the lists into rows:

    df.select(
        pl.concat_list(cs.starts_with(f"{x}_")).alias(x) for x in ['a', 'b']
    ).explode(pl.all())  # each list element becomes its own row
    
    ┌─────┬─────┐
    │ a   ┆ b   │
    │ --- ┆ --- │
    │ i64 ┆ i64 │
    ╞═════╪═════╡
    │ 0   ┆ 3   │
    │ 1   ┆ 4   │
    │ 2   ┆ 5   │
    └─────┴─────┘