How to create pandas dataframe from nested json with dictionary

I'm trying to create a pandas dataframe form json file. I've seen a multiple solutions to this problem which uses built in functions from_dict/json_normalize yet I'm unable to apply it to my code. Here's how my data is structured in json file:

     "data": [
   {
      "groups": {
         "data": [
               {
               "group": "Math",
               "year_joined": "2009"
               },
               {
               "group_name": "History",
               "year_joined": "2011"
               },
               {
               "group_name": "Biology",
               "year_joined": "2010"
               }
         ]
      },
      "id": "12512"
   },

When I'm trying to normalize this data with pandas function like this:

path = 'mypath'
f = open(path)
data = json.load(f)

test = pd.json_normalize(
            data['data'], 
            errors='ignore')

I just receive something like this:

    id      groups.data
0   12512   [{'group_name': 'Math', 'year_joined': '2009', 'gr...
1   23172   [{'group_name': 'Chemistry', 'year_joined': '2005'...

I want this data to look like this (solution 1):

    id      group     year_joined
0   12512   group1    year1
1   12512   group2    year2
2   12512   group3    year3

Or like this (solution 2):

    id      group                   year_joined
0   12512   group1,group2,group3    year1,year2,year3
1   23172   group4,group5           year4,year5

How can i achieve it? I tried passing 'record_path' parameter to 'json_normalize' function but it doesn't change anything. I tried to use 'DataFrame.from_dict' function to work around this but I failed. The only way I was able to get to solution 1 was to just create multiple loops that iterated through everything in json file and add it to separate list. It kinda works but takes a lot of time on bigger datasets.

How could i use built-in pandas tools to process files which are nested as dictionaries in 3rd layer of the file as presented above?

Solution

You need to collect the information from the data dictionary

solution 1

d = {}
for group in data["data"]:
    groups = [x["group_name"] for x in group['groups']["data"]]
    d['id'] = d.get('id', []) + [group['id']] * len(groups)
    d['group'] = d.get('group', []) + groups
    d['year_joined'] = d.get('year_joined', []) + [x["year_joined"] for x in group['groups']["data"]]

df = pd.DataFrame(d)

Output

      id      group year_joined
0  12512       Math        2009
1  12512    History        2011
2  12512    Biology        2010
3  23172  Chemistry        2007
4  23172  Economics        2008

solution 2

d = {}
for group in data["data"]:
    d['id'] = d.get('id', []) + [group['id']]
    d['group'] = d.get('group', []) + [','.join(x["group_name"] for x in group['groups']["data"])]
    d['year_joined'] = d.get('year_joined', []) + [','.join(x["year_joined"] for x in group['groups']["data"])]

df = pd.DataFrame(d)

Output

      id                 group     year_joined
0  12512  Math,History,Biology  2009,2011,2010
1  23172   Chemistry,Economics       2007,2008