I have pandas daraframe where I have multiple column which has list of dictionary. Ex:
col1 col2 combine_col1 combine_col2
val1 val2 [{'x1':'v1','x2':'v2'}] [{'x3':'v3','x4':'v4','x5':'v5'}]
val11 val22 [{'x1':'v11','x2':'v22'}] [{'x3':'v33','x4':'v44','x5':'v55'}]
Inside the combined column the number of keys and their name I don't know.It can be anything.
I want to explode those specific columns which has list of dict and get all the keys as new columns. For this example the output dataframe column names should be col1, col2, x1, x2,x3,x4,x5
. Currently I am hardcoding the combine cols and then doing explode operation. But I want that to happen automatically.
If there are only one element lists like in sample data use json_normalize
with str[0]
for select them:
import ast
cols = ['combine_col1','combine_col2']
#if necessary
#df[cols] = df[cols].applymap(ast.literal_eval)
df1 = (df.drop(cols, axis=1)
.join(pd.concat([pd.json_normalize(df[x].str[0]) for x in cols], axis=1)))
print (df1)
col1 col2 x1 x2 x3 x4 x5
0 val1 val2 v1 v2 v3 v4 v5
1 val11 val22 v11 v22 v33 v44 v55
EDIT: Solution with append prefix
:
df1 = (df.drop(cols, axis=1)
.join(pd.concat([pd.json_normalize(df[x].str[0]).add_prefix(f'{x}.')
for x in cols], axis=1)))
print (df1)
col1 col2 combine_col1.x1 combine_col1.x2 combine_col2.x3 \
0 val1 val2 v1 v2 v3
1 val11 val22 v11 v22 v33
combine_col2.x4 combine_col2.x5
0 v4 v5
1 v44 v55
If possible multiple values per lists use Series.explode
:
df1 = (df.drop(cols, axis=1)
.join(pd.concat([pd.json_normalize(df[x].explode()) for x in cols], axis=1)))
df1 = (df.drop(cols, axis=1)
.join(pd.concat([pd.json_normalize(df[x].explode()).add_prefix(f'{x}.')
for x in cols], axis=1)))