I am trying to use ijson to retrieve an element from a JSON dict object.
The JSON string is stored in a file, and it is the only content of that file:
{"categoryTreeId":"0","categoryTreeVersion":"127","categoryAspects":[1,2,3]}
(This string is heavily simplified; the real one is over 2GB long.)
I need help doing the following:
1/ Open that file
2/ Use ijson to load that JSON data into some object
3/ Retrieve the list "[1,2,3]" from that object
Why not just use the following simple code?
my_json = json.loads('{"categoryTreeId":"0","categoryTreeVersion":"127","categoryAspects":[1,2,3]}')
my_list = my_json['categoryAspects']
Well, imagine that this "[1,2,3]" list is in fact over 2GB long, so using json.loads() will not work (it would just crash).
I tried a lot of combinations (A LOT) and they all failed. Here are some examples of the things I tried:
ij = ijson.items(fd,'') -> this does not give any error, but the ones below do
my_list = ijson.items(fd,'').next()
-> error = '_yajl2.items' object has no attribute 'next'
my_list = ijson.items(fd,'').items()
-> error = '_yajl2.items' object has no attribute 'items'
my_list = ij['categoryAspects']
-> error = '_yajl2.items' object is not subscriptable
This should work:
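import ijson
# binary read mode: ijson consumes the file as bytes
with open('your_file.json', 'rb') as f:
    # the '.item' suffix yields each element of the list, one at a time
    for n in ijson.items(f, 'categoryAspects.item'):
        print(n)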
Additionally, if you know your numbers are "normal" numbers (that is, regular floats are precise enough for them), you can pass use_float=True
as an extra argument to items
for extra speed, i.e. ijson.items(f, 'categoryAspects.item', use_float=True)
in the code above; more details are in the ijson documentation.
EDIT: Answering a further question: to simply get a list with all the numbers, you can create one directly from the items function like so:
with open('your_file.json', 'rb') as f:
    numbers = list(ijson.items(f, 'categoryAspects.item'))
Mind you that if there are too many numbers you might still run out of memory, which defeats the purpose of streaming parsing.
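If the numbers do not all have to be in memory at once, one way around that is to consume the stream in fixed-size batches. The following is only a minimal sketch under some assumptions: iter_batches and sum_category_aspects are hypothetical helper names (they are not part of ijson), the batch size is arbitrary, and the sum/count is just a stand-in for whatever processing you actually need.

```python
from itertools import islice


def iter_batches(items, batch_size):
    """Yield lists of at most batch_size elements from any iterable."""
    it = iter(items)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def sum_category_aspects(fileobj, batch_size=100_000):
    """Hypothetical example: sum and count the list without storing all of it."""
    import ijson  # third-party; imported here so iter_batches works without it

    total = 0
    count = 0
    stream = ijson.items(fileobj, 'categoryAspects.item')
    for batch in iter_batches(stream, batch_size):
        # only one batch is held in memory at any time
        total += sum(batch)
        count += len(batch)
    return total, count
```

You would call it as sum_category_aspects(f) inside a with open('your_file.json', 'rb') as f: block. Only batch_size numbers are alive at any point, so memory use stays bounded regardless of how long the list is.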
EDIT2: An alternative to using a list is to create a numpy array with all the numbers, which should give a more compact representation in memory of all the numbers at once, in case they are needed:
import ijson
import numpy

with open('your_file.json', 'rb') as f:
    numbers = numpy.fromiter(
        ijson.items(f, 'categoryAspects.item', use_float=True),
        dtype='float'  # or int, if these are integers
    )