How to convert a nested JSON to a Dataframe?

A response.json() returns a JSON in the following format:

{
    "workbooks": [
        {
            "name": "WORKBOOK_A",
            "embeddedDatasources": [
                {
                    "upstreamTables": [
                        {"name": "WORKBOOK_A_TABLE_A"}]},
                {
                    "upstreamTables": [
                        {"name": "WORKBOOK_A_TABLE_B"},
                        {"name": "WORKBOOK_A_TABLE_C"}]},
                {
                    "upstreamTables": []}]},
        {
            "name": "WORKBOOK_B",
            "embeddedDatasources": [
                {
                    "upstreamTables": [
                        {"name": "WORKBOOK_B_TABLE_A"},
                        {"name": "WORKBOOK_B_TABLE_B"}]},
                {
                    "upstreamTables": [
                        {"name": "WORKBOOK_B_TABLE_C"},
                        {"name": "WORKBOOK_B_TABLE_D"}]}]}]}

I am trying to convert it to a dataframe like this:

workbooks	upstreamTables
WORKBOOK_A	WORKBOOK_A_TABLE_A
WORKBOOK_A	WORKBOOK_A_TABLE_B
WORKBOOK_A	WORKBOOK_A_TABLE_C
WORKBOOK_B	WORKBOOK_B_TABLE_A
WORKBOOK_B	WORKBOOK_B_TABLE_B
WORKBOOK_B	WORKBOOK_B_TABLE_C
WORKBOOK_B	WORKBOOK_B_TABLE_D

"upstreamTables": [] should be ignored in this case.

Playing around with json_normalize

df = pd.json_normalize(json_data)

didn't play out so far and extracting the data as separate dataframes and rejoining them seems too convulsive.

Solution

Here's one approach:

Pass resp (i.e., response.json()) to pd.json_normalize with both record_path and meta. Add meta_prefix to avoid a ValueError: Conflicting metadata. Cf. this post. Otherwise we would end up with 2 name columns.
Use df.rename to rename the columns and re-order them.

import pandas as pd

# resp = {...}

df = (pd.json_normalize(resp['workbooks'], 
                        record_path=['embeddedDatasources', 'upstreamTables'], 
                        meta='name', 
                        meta_prefix='meta_'
                        )
      .rename(columns={'name': 'upstreamTables',
                       'meta_name': 'workbooks'})
      [['workbooks', 'upstreamTables']]
      )

Output:

    workbooks      upstreamTables
0  WORKBOOK_A  WORKBOOK_A_TABLE_A
1  WORKBOOK_A  WORKBOOK_A_TABLE_B
2  WORKBOOK_A  WORKBOOK_A_TABLE_C
3  WORKBOOK_B  WORKBOOK_B_TABLE_A
4  WORKBOOK_B  WORKBOOK_B_TABLE_B
5  WORKBOOK_B  WORKBOOK_B_TABLE_C
6  WORKBOOK_B  WORKBOOK_B_TABLE_D