pythonjsonpandasdictionaryjson-normalize

json_normalize produces a KeyError when trying to extract certain attributes


Here is a subset of my json file:

d = {'data': {'questions': [{'id': 6574,
                             'text': 'Question #1',
                             'instructionalText': '',
                             'minimumResponses': 0,
                             'maximumResponses': None,
                             'sortOrder': 1,
                             'answers': [{'id': 362949, 'text': 'Answer #1', 'parentId': None},
                                         {'id': 362950, 'text': 'Answer #2', 'parentId': None},
                                         {'id': 362951, 'text': 'Answer #3', 'parentId': None},
                                         {'id': 362952, 'text': 'Answer #4', 'parentId': None}]}]}}

I want to place this into a dataframe with each question and a row for each answer.

Python code:

from pandas import json_normalize
import json

fields = ['text','answers.text']

with open(R'response.json') as f:
    d = json.load(f)

data = json_normalize(d['data'],['questions'],errors='ignore')
data = data[fields]

print(data)

Which produces a KeyError:

KeyError: "['answers.text'] not in index"

Been at this for a few hours and absolutely cannot figure this out. I feel like it should be quite simple, but it never is.


Solution

  • import pandas as pd
    
    # normalize d
    df = pd.json_normalize(data=d['data']['questions'],
                           record_path= ['answers'],
                           meta=['id', 'text', 'instructionalText', 'minimumResponses', 'maximumResponses', 'sortOrder'],
                           record_prefix='answers_')
    
    # display(df)
       answers_id answers_text answers_parentId    id         text     instructionalText minimumResponses maximumResponses sortOrder
    0      362949    Answer #1             None  6574  Question #1                                      0             None         1
    1      362950    Answer #2             None  6574  Question #1                                      0             None         1
    2      362951    Answer #3             None  6574  Question #1                                      0             None         1
    3      362952    Answer #4             None  6574  Question #1                                      0             None         1
    4      262949    Answer #1             None  4756  Question #2  No cheating, cheater                0             None         1
    5      262950    Answer #2             None  4756  Question #2  No cheating, cheater                0             None         1
    6      262951    Answer #3             None  4756  Question #2  No cheating, cheater                0             None         1
    7      262952    Answer #4             None  4756  Question #2  No cheating, cheater                0             None         1
    

    Expanded Test Data

    d = {'data': {'questions': [{'id': 6574,
                                 'text': 'Question #1',
                                 'instructionalText': '',
                                 'minimumResponses': 0,
                                 'maximumResponses': None,
                                 'sortOrder': 1,
                                 'answers': [{'id': 362949, 'text': 'Answer #1', 'parentId': None},
                                             {'id': 362950, 'text': 'Answer #2', 'parentId': None},
                                             {'id': 362951, 'text': 'Answer #3', 'parentId': None},
                                             {'id': 362952, 'text': 'Answer #4', 'parentId': None}]},
                                {'id': 4756,
                                 'text': 'Question #2',
                                 'instructionalText': 'No cheating, cheater',
                                 'minimumResponses': 0,
                                 'maximumResponses': None,
                                 'sortOrder': 1,
                                 'answers': [{'id': 262949, 'text': 'Answer #1', 'parentId': None},
                                             {'id': 262950, 'text': 'Answer #2', 'parentId': None},
                                             {'id': 262951, 'text': 'Answer #3', 'parentId': None},
                                             {'id': 262952, 'text': 'Answer #4', 'parentId': None}]}]}}