I have a pl.DataFrame with a column that is a list of struct entries. The lengths of the lists might differ:
import polars as pl

df = pl.DataFrame(
    {
        "id": [1, 2, 3],
        "s": [
            [
                {"a": 1, "b": 1},
                {"a": 2, "b": 2},
                {"a": 3, "b": 3},
            ],
            [
                {"a": 10, "b": 10},
                {"a": 20, "b": 20},
                {"a": 30, "b": 30},
                {"a": 40, "b": 40},
            ],
            [
                {"a": 100, "b": 100},
                {"a": 200, "b": 200},
                {"a": 300, "b": 300},
                {"a": 400, "b": 400},
                {"a": 500, "b": 500},
            ],
        ],
    }
)
This looks like this:
shape: (3, 2)
┌─────┬─────────────────────────────────┐
│ id  ┆ s                               │
│ --- ┆ ---                             │
│ i64 ┆ list[struct[2]]                 │
╞═════╪═════════════════════════════════╡
│ 1   ┆ [{1,1}, {2,2}, {3,3}]           │
│ 2   ┆ [{10,10}, {20,20}, … {40,40}]   │
│ 3   ┆ [{100,100}, {200,200}, … {500,… │
└─────┴─────────────────────────────────┘
I've tried various versions of unnest and explode, but I am failing to turn this into a long pl.DataFrame where the list is turned into rows and the struct entries into columns. This is what I want to see:
pl.DataFrame(
    {
        "id": [1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3],
        "a": [1, 2, 3, 10, 20, 30, 40, 100, 200, 300, 400, 500],
        "b": [1, 2, 3, 10, 20, 30, 40, 100, 200, 300, 400, 500],
    }
)
Which looks like this:
shape: (12, 3)
┌─────┬─────┬─────┐
│ id  ┆ a   ┆ b   │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 │
╞═════╪═════╪═════╡
│ 1   ┆ 1   ┆ 1   │
│ 1   ┆ 2   ┆ 2   │
│ 1   ┆ 3   ┆ 3   │
│ 2   ┆ 10  ┆ 10  │
│ 2   ┆ 20  ┆ 20  │
│ …   ┆ …   ┆ …   │
│ 3   ┆ 100 ┆ 100 │
│ 3   ┆ 200 ┆ 200 │
│ 3   ┆ 300 ┆ 300 │
│ 3   ┆ 400 ┆ 400 │
│ 3   ┆ 500 ┆ 500 │
└─────┴─────┴─────┘
Is there a way to manipulate the first pl.DataFrame into the second pl.DataFrame?
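Explode the list column into rows first, then unnest the resulting struct column into separate columns:
df.explode('s').unnest('s')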
Output:
┌─────┬─────┬─────┐
│ id  ┆ a   ┆ b   │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 │
╞═════╪═════╪═════╡
│ 1   ┆ 1   ┆ 1   │
│ 1   ┆ 2   ┆ 2   │
│ 1   ┆ 3   ┆ 3   │
│ 2   ┆ 10  ┆ 10  │
│ 2   ┆ 20  ┆ 20  │
│ …   ┆ …   ┆ …   │
│ 3   ┆ 100 ┆ 100 │
│ 3   ┆ 200 ┆ 200 │
│ 3   ┆ 300 ┆ 300 │
│ 3   ┆ 400 ┆ 400 │
│ 3   ┆ 500 ┆ 500 │
└─────┴─────┴─────┘