I'm working with a JSON web response that looks like this (simplified; I can't change the format):
[
{ "type": "0","key1": 3, "key2": 5},
{ "type": "1","key3": "a", "key4": "b"},
{ "type": "2", "data": [<very big array here>] }
]
I want to do two things:
1. Inspect the third object without loading it all into memory. I read the first two objects like this:
parsed = ijson.items(res.raw, 'item')
next(parsed)  # first item
next(parsed)  # second item
If I call next(parsed) again, the whole "data" array is read into memory as part of the resulting dict, and I want to avoid that.
2. Iterate over the data array without loading it all into memory. If I didn't care about the other keys, I could do this:
parsed = ijson.items(res.raw, 'item.data.item')  # iterator over data's items
The problem is that I need to do all of this on the same stream.
Ideally it would have been great to receive the third object as a file-like object that I can pass to ijson again, but that seems out of scope for that API.
I'm also fine with replacing ijson with a library that can do this better.
You need to use ijson's event interception mechanism. Basically, go one level down in the parsing logic by using ijson.parse until you hit the big array, then switch to ijson.items and feed it the rest of the parse events. This example parses a bytes literal rather than a web response, but it should illustrate the point:
import ijson

s = b'''[
{ "type": "0","key1": 3, "key2": 5},
{ "type": "1","key3": "a", "key4": "b"},
{ "type": "2", "data": [1, 2, 3] }
]'''

# Low-level stream of (prefix, event, value) tuples
parse_events = ijson.parse(s)
while True:
    prefix, event, value = next(parse_events)
    # do stuff with prefix, event, value, until...
    if event == 'map_key' and value == 'data':
        break

# Hand the remaining events to ijson.items, which yields the array's elements one at a time
for value in ijson.items(parse_events, 'item.data.item'):
    print(value)