[SOLVED] Is there a way to flatten a pandas column of nested json strings with a varying number of objects?

Is there a way to flatten a pandas column of nested json strings with a varying number of objects?

I have a pandas dataframe that is formatted similarly to this:

order_guid	selections
ef23a23d-5e2c-42a0-952b-cf43732c55b8	[{"id":"7233","item":"Smoothie"},{"guid":"6317","item":"Juice"},{"guid":"6941","item":"Chips"}]
0fb69525-7b71-4dec-ab77-36a698d519f6	[{"id":"5425","item":"Wrap"}]
59b5f8e9-96f1-483b-a31d-659492765e06	[{"id":"7233","item":"Smoothie"},{"guid":"3803","item":"Water"},{"guid":"7961","item":"Salad"}]
d8656ba2-4076-4036-9515-6fea1f9e58d4	[{"id":"7961","item":"Salad"},{"guid":"5425","item":"Wrap"}]

What I've been trying to do is unnest the selections field such that each selection occupies its own row with the order_guid as a foreign key to a separate orders table, similar to this:

order_guid	selection_id	selection_item
ef23a23d-5e2c-42a0-952b-cf43732c55b8	7233	Smoothie
ef23a23d-5e2c-42a0-952b-cf43732c55b8	6317	Juice
ef23a23d-5e2c-42a0-952b-cf43732c55b8	6941	Chips
0fb69525-7b71-4dec-ab77-36a698d519f6	5425	Wrap
59b5f8e9-96f1-483b-a31d-659492765e06	7233	Smoothie
59b5f8e9-96f1-483b-a31d-659492765e06	3803	Water
59b5f8e9-96f1-483b-a31d-659492765e06	7961	Salad
d8656ba2-4076-4036-9515-6fea1f9e58d4	7961	Salad
d8656ba2-4076-4036-9515-6fea1f9e58d4	5425	Wrap

The main thing giving me trouble is the varying number of nested json elements. My initial instinct was that I could do something like to_json() ->json.loads()->json_normalize, but I can't make that work with the order_guid and the multiple selections. Looping seems silly. It feels like this is a simple enough problem that someone else should've encountered it, but this is the closest I've seen to the issue.

Is there a pandas method for this? Or something obvious I'm missing?

Solution

You can first use df.explode() before using json_normalize(). Hope this is what you meant.

df = df.explode('selections').reset_index(drop=True)
result_df = pd.concat([df['order_guid'], pd.json_normalize(df.selections)], axis=1)
result_df['guid'] = result_df['guid'].fillna(result_df['id'])
result_df = result_df[['order_guid', 'guid', 'item']].rename(columns={'guid': 'selection_guid', 'item': 'selection_item'})