pythonjsoncsvpandasijson

Write JSON values from array and nested array to single CSV


I have a JSON output, where I want to create a csv file, that contains two columns. The first column should contain the userId and the second column should contain the value of videoSeries. The output looks like this:

{
  "start": 1490383076,
  "stop": 1492975076,
  "events": [
    {
      "time": 1491294219,
      "customParameters": [
        {
          "group": "channelId",
          "item": "dr3"
        },
        {
          "group": "videoGenre",
          "item": "unknown"
        },
        {
          "group": "videoSeries",
          "item": "min-mor-er-pink"
        },
        {
          "group": "videoSlug",
          "item": "min-mor-er-pink"
        }
      ],
      "userId": "cx:hr1y0kcbhhr61qj7kspglu767:344xy3wb5bz16"
    }
  ],
}

My csv should look like this:

--------------------------------------------------------------
User ID                                       videoSeries
--------------------------------------------------------------
cx:hr1y0kcbhhr61qj7kspglu767:344xy3wb5bz16    min-mor-er-pink
--------------------------------------------------------------

I have tried using ijson and pandas to get the desired output, but I am unable to get values from two different arrays into a single csv

import ijson
import pandas as pd

with open('MY JSON FILE', 'r') as f:
    objects = ijson.items(f, 'events.item')
    pandaReadable = list(objects)

df = pd.DataFrame(pandaReadable, columns=['userId', 'customParameters'])
df.to_csv('C:/Users/.../Desktop/output.csv', columns=['userId', 'customParameters'], index=False)

Solution

  • Try this approach:

    d is a dictionary built from your JSON:

    In [150]: d
    Out[150]:
    {'events': [{'customParameters': [{'group': 'channelId', 'item': 'dr3'},
        {'group': 'videoGenre', 'item': 'unknown'},
        {'group': 'videoSeries', 'item': 'min-mor-er-pink'},
        {'group': 'videoSlug', 'item': 'min-mor-er-pink'}],
       'time': 1491294219,
       'userId': 'cx:hr1y0kcbhhr61qj7kspglu767:344xy3wb5bz16'}],
     'start': 1490383076,
     'stop': 1492975076}
    

    Solution:

    In [153]: pd.io.json.json_normalize(d['events'], 'customParameters', ['userId']) \
         ...:   .query("group in ['videoSeries']")[['userId','item']]
         ...:
    Out[153]:
                                           userId             item
    2  cx:hr1y0kcbhhr61qj7kspglu767:344xy3wb5bz16  min-mor-er-pink
    

    if you need to have videoSeries as a column name:

    In [154]: pd.io.json.json_normalize(d['events'], 'customParameters', ['userId']) \
         ...:   .query("group in ['videoSeries']")[['userId','item']] \
         ...:   .rename(columns={'item':'videoSeries'})
         ...:
    Out[154]:
                                           userId      videoSeries
    2  cx:hr1y0kcbhhr61qj7kspglu767:344xy3wb5bz16  min-mor-er-pink