Expand Columns of Structs to Rows in polars

Say we have this dataframe:

import polars as pl
df = pl.DataFrame({'EU': {'size': 10, 'GDP': 80},
                   'US': {'size': 100, 'GDP': 800},
                   'AS': {'size': 80, 'GDP': 500}})

shape: (1, 3)
┌───────────┬───────────┬───────────┐
│ EU        ┆ US        ┆ AS        │
│ ---       ┆ ---       ┆ ---       │
│ struct[2] ┆ struct[2] ┆ struct[2] │
╞═══════════╪═══════════╪═══════════╡
│ {10,80}   ┆ {100,800} ┆ {80,500}  │
└───────────┴───────────┴───────────┘

I am looking for a function like df.expand_structs(column_name='metric') that gives

shape: (2, 4)
┌────────┬─────┬─────┬─────┐
│ metric ┆ EU  ┆ US  ┆ AS  │
│ ---    ┆ --- ┆ --- ┆ --- │
│ str    ┆ i64 ┆ i64 ┆ i64 │
╞════════╪═════╪═════╪═════╡
│ size   ┆ 10  ┆ 100 ┆ 80  │
│ GBP    ┆ 80  ┆ 800 ┆ 500 │
└────────┴─────┴─────┴─────┘

I've tried other functions like unnest, explode but no luck. Any help appreciated!

Solution

TL;DR

Performance comparison at the end.

Both @etrotta's method and @DeanMacGregor's adjustment perform well on a pl.lazyframe with small Structs (e.g., struct[2]) and columns N <= 15 (not collected). Other methods fail lazily.

With bigger Structs and/or columns N > 15, both unpivot options below start to outperform. Other suggested methods thus far slower in general.

Option 1

out = (df.unpivot()
       .unnest('value')
       .select(pl.exclude('variable'))
       .transpose(include_header=True)
       .pipe(
           lambda x: x.rename(
               dict(zip(x.columns, ['metric'] + df.columns))
               )
           )
       )

Output:

shape: (2, 4)
┌────────┬─────┬─────┬─────┐
│ metric ┆ EU  ┆ US  ┆ AS  │
│ ---    ┆ --- ┆ --- ┆ --- │
│ str    ┆ i64 ┆ i64 ┆ i64 │
╞════════╪═════╪═════╪═════╡
│ size   ┆ 10  ┆ 100 ┆ 80  │
│ GDP    ┆ 80  ┆ 800 ┆ 500 │
└────────┴─────┴─────┴─────┘

Explanation / Intermediates

Use df.unpivot:

shape: (3, 2)
┌──────────┬───────────┐
│ variable ┆ value     │
│ ---      ┆ ---       │
│ str      ┆ struct[2] │
╞══════════╪═══════════╡
│ EU       ┆ {10,80}   │
│ US       ┆ {100,800} │
│ AS       ┆ {80,500}  │
└──────────┴───────────┘

So that we can apply df.unnest on new 'value' column:

shape: (3, 3)
┌──────────┬──────┬─────┐
│ variable ┆ size ┆ GDP │
│ ---      ┆ ---  ┆ --- │
│ str      ┆ i64  ┆ i64 │
╞══════════╪══════╪═════╡
│ EU       ┆ 10   ┆ 80  │
│ US       ┆ 100  ┆ 800 │
│ AS       ┆ 80   ┆ 500 │
└──────────┴──────┴─────┘

Use df.select to exclude 'variable' column (pl.exclude) and df.transpose with include_header=True:

shape: (2, 4)
┌────────┬──────────┬──────────┬──────────┐
│ column ┆ column_0 ┆ column_1 ┆ column_2 │
│ ---    ┆ ---      ┆ ---      ┆ ---      │
│ str    ┆ i64      ┆ i64      ┆ i64      │
╞════════╪══════════╪══════════╪══════════╡
│ size   ┆ 10       ┆ 100      ┆ 80       │
│ GDP    ┆ 80       ┆ 800      ┆ 500      │
└────────┴──────────┴──────────┴──────────┘

Now, we just need to rename the columns. Here done via df.pipe + df.rename. Without the chained operation, that can also be:

out.columns = ['metric'] + df.columns

Option 2

out2 = (df.unpivot()
        .unnest('value')
        .unpivot(index='variable', variable_name='metric')
        .pivot(on='variable', index='metric')
        )

Equality check:

out.equals(out2)
# True

Explanation / Intermediates

Same start as option 1, but followed by a second df.unpivot to get:

shape: (6, 3)
┌────────┬────────┬───────┐
│ column ┆ metric ┆ value │
│ ---    ┆ ---    ┆ ---   │
│ str    ┆ str    ┆ i64   │
╞════════╪════════╪═══════╡
│ EU     ┆ size   ┆ 10    │
│ US     ┆ size   ┆ 100   │
│ AS     ┆ size   ┆ 80    │
│ EU     ┆ GDP    ┆ 80    │
│ US     ┆ GDP    ┆ 800   │
│ AS     ┆ GDP    ┆ 500   │
└────────┴────────┴───────┘

Followed by df.pivot on 'column' with 'metric' as the index to get desired shape.

Performance comparison (gist)

Columns: n_range=[2**k for k in range(12)]

Struct: 2, 20, 100

Methods compared:

unpivot_unnest_t (option 1), #@ouroboros1
unpivot_unnest_t2 (option 1, adj)
unpivot_pivot (option 2)
concat_list_expl, #@etrotta
concat_list_expl_lazy, #lazy
concat_list_expl2, #@etrotta, #@DeanMacGregor
concat_list_expl2_lazy, #lazy
map_batches, #@DeanMacGregor
loop, #@sammywemmy

Results: