I'm trying to stream through a large JSON file using ijson in Python. This is my first time trying this.
My code is really simple right now:
import ijson

with open('file.json', 'rb') as f:
    j = ijson.items(f, 'item')
    for item in j:
        print('x')
This raises a "trailing garbage" error. Essentially, the second item in the file is treated as garbage, I think because of the file format.
My JSON file is this one from Kaggle, and it is formatted like this:
{"_id":{"$oid":"6457879fd1187d621cbbba9c"},"sourceCC":"us",...etc...}
{"_id":{"$oid":"6457879fd1187d621cbddd8a"},"sourceCC":"us",...etc...}
It is about 3 GB in size, so I'm unable to open it.
If I use 'multiple_items=True', I believe it treats all the items as multiple values for the same item, so it does not raise an error, but it also does not return anything.
What can I do?
Thanks.
That's not actually a JSON document. It is a series of JSON documents, one per line, separated by newlines. You don't need ijson to read it; you can instead read it line by line and use the built-in json module:
import json

with open('myfile.json') as fd:
    for line in fd:
        obj = json.loads(line)
        # do something with obj here
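If the file might contain blank lines or an occasional malformed record, the same line-by-line idea can be wrapped in a generator. This is only a sketch: the function name iter_json_lines and the UTF-8 encoding are assumptions, not something taken from the dataset.

```python
import json
from typing import Any, Iterator


def iter_json_lines(path: str) -> Iterator[Any]:
    """Yield one parsed object per non-empty line of a newline-delimited JSON file.

    Hypothetical helper: the name and the UTF-8 encoding are assumptions.
    """
    with open(path, encoding="utf-8") as fd:
        # Iterating over the file object reads one line at a time,
        # so the whole file is never held in memory.
        for line_number, line in enumerate(fd, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines, e.g. a trailing newline
            try:
                yield json.loads(line)
            except json.JSONDecodeError as exc:
                # Report which line failed so a bad record is easy to locate.
                raise ValueError(f"{path}, line {line_number}: {exc}") from exc
```

You would then loop with something like `for obj in iter_json_lines('myfile.json'):` and handle each obj as it arrives, so memory use stays at roughly one record at a time regardless of the file size.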