pythonjson

Python script exports only half the information from the JSON


I have a Python script but when it is exporting the JSON, it prints only "y", "meas", and "label" without any data. I have no errors when I run the script. Can you advise to fix that? I cant post the link and the JSON.

import requests
from pandas.io.json import json_normalize
import json
import pandas as pd
import numpy as np

# try this

total_df = pd.DataFrame()
request_list = ["LInk",]

# read each link
for r in request_list:
    resp = requests.get(url=r)
    df = json_normalize(resp.json())

    item_list = df['dataset']
    current_df = pd.DataFrame()
    
    

    for i in item_list:
        try:
            current_df.loc[0,'y'] = i['data'][0]['y']
        except:
            current_df.loc[0,'y'] = np.nan
        try:
            current_df.loc[0,'meas'] = i['meas']
        except:
            current_df.loc[0,'meas'] = np.nan
        try:
            current_df.loc[0,'label'] = i['label']
        except:
            current_df.loc[0,'label'] = np.nan

    total_df = pd.concat([total_df,current_df])


total_df.to_excel('C:/Users/svetl/Onedrive/Desktop/work/output.xlsx',index=False)  

Solution

  • so the issue is with the way you handle the JSON. The key dataset has a list of dict items as value. You try to access the dict item directly while ignoring they are part of a list.

    Look at the edited part below, and notice the change from
    i['data'][0]['y']
    to
    i[0]['data'][0]['y']
    (and the other changes when accessing i)

        for i in item_list:
            try:
                current_df.loc[0,'y'] = i[0]['data'][0]['y']
            except Exception as e:
                print(e)
                current_df.loc[0,'y'] = np.nan
            try:
                current_df.loc[0,'meas'] = i[0]['meas']
            except Exception as e:
                print(e)
                current_df.loc[0,'meas'] = np.nan
            try:
                current_df.loc[0,'label'] = i[0]['label']
            except Exception as e:
                print(e)
                current_df.loc[0,'label'] = np.nan
    

    Sidenote:

    this is deprecated:

    from pandas.io.json import json_normalize
    

    you need to change it to:

    from pandas import json_normalize