python-polars

Unstack columns by groups in parallel with polars


Hy!

I need to unstack some columns in polars, but i want to group the columns by column 'a'

seq_ = df['a'].unique()
l = len(seq_)
dfs = []

for idx, a in enumerate(seq_):
    df_tmp = df.filter(pl.col('a') == a)
    steps = df_tmp.shape[0]
    df_tmp = df_tmp.unstack(step=steps, how="horizontal",columns=['b', 'c'])
    df_tmp = df_tmp.with_columns(pl.lit(a).alias('a'))
    dfs.append(df_tmp)

df_tmp = pl.concat(dfs, how='diagonal')

this code is too much slower. Is there any way to do this in polars more faster?

from this:

df = pl.from_repr("""
┌─────┬─────┬─────┐
│ a   ┆ b   ┆ c   │
│ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ str │
╞═════╪═════╪═════╡
│ 1   ┆ a   ┆ f   │
│ 2   ┆ b   ┆ g   │
│ 1   ┆ c   ┆ h   │
│ 1   ┆ d   ┆ i   │
│ 2   ┆ e   ┆ j   │
└─────┴─────┴─────┘
""")

to this:

┌─────┬─────┬─────┬──────┬─────┬─────┬──────┐
│ a   ┆ b_0 ┆ b_1 ┆ b_2  ┆ c_0 ┆ c_1 ┆ c_2  │
│ --- ┆ --- ┆ --- ┆ ---  ┆ --- ┆ --- ┆ ---  │
│ i64 ┆ str ┆ str ┆ str  ┆ str ┆ str ┆ str  │
╞═════╪═════╪═════╪══════╪═════╪═════╪══════╡
│ 1   ┆ a   ┆ c   ┆ d    ┆ f   ┆ h   ┆ i    │
│ 2   ┆ b   ┆ e   ┆ null ┆ g   ┆ j   ┆ null │
└─────┴─────┴─────┴──────┴─────┴─────┴──────┘

Solution

  • df = df.group_by('a', maintain_order=True).all()
    
    for col in 'bc':
        df = df.with_columns(
            pl.col(col).list.to_struct(
                n_field_strategy='max_width', fields=lambda i: f'{col}_{i}'
            )
        ).unnest(col)
    
    shape: (2, 7)
    ┌─────┬─────┬─────┬──────┬─────┬─────┬──────┐
    │ a   ┆ b_0 ┆ b_1 ┆ b_2  ┆ c_0 ┆ c_1 ┆ c_2  │
    │ --- ┆ --- ┆ --- ┆ ---  ┆ --- ┆ --- ┆ ---  │
    │ i64 ┆ str ┆ str ┆ str  ┆ str ┆ str ┆ str  │
    ╞═════╪═════╪═════╪══════╪═════╪═════╪══════╡
    │ 1   ┆ a   ┆ c   ┆ d    ┆ f   ┆ h   ┆ i    │
    │ 2   ┆ b   ┆ e   ┆ null ┆ g   ┆ j   ┆ null │
    └─────┴─────┴─────┴──────┴─────┴─────┴──────┘
    

    There might be an alternate, cleaner solution with pivot, but at a glance I don't think this quite fits the mold of a pivot.