
Exploding a large json-file using ijson


I have a large JSON file (too large to fit in memory) with the following structure:

{
    "commonVal": "foo",
    "entries": [
        { "bar": "baz1" },
        { "bar": "baz2" },
        ...,
    ]
}

I would like to process this file as a stream of flattened objects, one per entry, like this:

commonVal: foo, bar: baz1
commonVal: foo, bar: baz2
...

That is, I want to:

  1. Read commonVal (outside of the list) and store it in a variable.
  2. Iterate over the entries one by one.

I am using the ijson library for this.

I can perform step 1 using kvitems and step 2 using items, but I can't figure out how to combine the two on the same stream. (I would very much like to avoid dropping down to the low-level events API, because the entries in the list are more complex than in the example.)


Solution

  • Use ijson.parse, ideally in combination with ijson's event interception mechanism.

    ijson.parse lets you iterate over every single event in the JSON file. With it you can both read the common data and detect where your big list starts. You can then forward the remaining events to ijson.items.

    Check the example in the event interception section of the ijson docs; with a few simple modifications it should be pretty much what you need.