I have a dictionary of nested columns, where each inner dictionary is keyed by the row index. When I try to convert it to a Polars DataFrame, the column names and values are picked up correctly, but each column ends up with a single element, a struct holding all of that column's values, instead of being "expanded" into a series.
For example, let's say I have:
d = {'col1': {'0':'A','1':'B','2':'C'}, 'col2': {'0':1,'1':2,'2':3}}
Then, when I call pl.DataFrame(d) or pl.from_dict(d), I get:
col1 col2
--- ---
struct[3] struct[3]
{"A","B","C"} {1,2,3}
Instead of the regular dataframe.
Any idea how to fix this?
Thanks in advance!
There isn't a particularly straightforward way to do that. You essentially have to take each column one at a time, unnest and unpivot it, and then join the columns back together.
import polars as pl

d = {'col1': {'0':'A','1':'B','2':'C'}, 'col2': {'0':1,'1':2,'2':3}}
df = pl.DataFrame(d)

df_final = None
for col in df.columns:
    # Expand the struct into one column per key, then melt it to long form
    df_new = df[col].to_frame().unnest(col)
    df_new = df_new.unpivot(variable_name="index", value_name=col)
    if df_final is None:
        df_final = df_new
    else:
        # A full join keeps keys that appear in only some of the columns
        df_final = df_final.join(df_new, on="index", how="full", coalesce=True)
df_final
shape: (3, 3)
┌───────┬──────┬──────┐
│ index ┆ col1 ┆ col2 │
│ --- ┆ --- ┆ --- │
│ str ┆ str ┆ i64 │
╞═══════╪══════╪══════╡
│ 0 ┆ A ┆ 1 │
│ 1 ┆ B ┆ 2 │
│ 2 ┆ C ┆ 3 │
└───────┴──────┴──────┘
If you can be sure that the keys of your nested columns are always uniform and sorted, you can use map_batches instead of a for loop with joins.
df.select(pl.all().map_batches(lambda s: (
s.to_frame().unnest(s.name).unpivot()['value']
)))
shape: (3, 2)
┌──────┬──────┐
│ col1 ┆ col2 │
│ --- ┆ --- │
│ str ┆ i64 │
╞══════╪══════╡
│ A ┆ 1 │
│ B ┆ 2 │
│ C ┆ 3 │
└──────┴──────┘
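If the data is still available as the original Python dict, another option is to reshape it before it ever reaches Polars. The following is only a minimal sketch: `flatten_nested_columns` and its `index_name` parameter are hypothetical names made up for illustration, not part of Polars, and it assumes no column is itself called "index". It aligns every column on the union of the inner keys and fills missing keys with None.

```python
def flatten_nested_columns(nested, index_name="index"):
    """Turn {column: {key: value}} into {column: [values]} aligned on one key list.

    Hypothetical helper for illustration; not a Polars API.
    """
    # Collect every inner key once, preserving first-seen order.
    keys = list(dict.fromkeys(k for inner in nested.values() for k in inner))
    flat = {index_name: keys}
    for name, inner in nested.items():
        # dict.get returns None for keys this column does not have.
        flat[name] = [inner.get(k) for k in keys]
    return flat


d = {'col1': {'0': 'A', '1': 'B', '2': 'C'}, 'col2': {'0': 1, '1': 2, '2': 3}}
flat = flatten_nested_columns(d)
print(flat)
# {'index': ['0', '1', '2'], 'col1': ['A', 'B', 'C'], 'col2': [1, 2, 3]}
```

Passing the result to pl.DataFrame(flat) should then give ordinary columns rather than structs, since each value is now a plain list.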