
Ijson parse from list


I have a list in which each item contains JSON data. Since the data volume will be huge, I am trying to parse it with ijson.

[Image: the contents of the list]

This is what I am trying to achieve:

article_data = ...  # variable which contains the list
parser = ijson.parse(article_data)
for id in ijson.items(parser, 'item'):
    if(id['article_type'] != "Monthly Briefing" and id['article_type']!="Conference"):
        data_article_id.append(id['article_id'])
        data_article_short_desc.append(id['short_desc'])
        data_article_long_desc.append(id['long_desc'])

This is the error I get:

AttributeError: 'generator' object has no attribute 'read'

I also thought of converting the list into a string and then parsing that with ijson, but it fails with the same error.

Any suggestions please?

data_article_id=[] 
data_article_short_desc=[] 
data_article_long_desc=[] 

for index in article_data: 
    parser = ijson.parse(index)
    for id in ijson.items(parser, 'item'):
        if(id['article_type'] != "Monthly Briefing" and id['article_type']!="Conference"):
            data_article_id.append(id['article_id'])
            data_article_short_desc.append(id['short_desc'])
            data_article_long_desc.append(id['long_desc'])

Since the data is a list, I also tried parsing each item separately (the code above), but it gives me the same error:

'generator' object has no attribute 'read'


Solution

  • I am assuming that you have a list of byte strings, each containing a JSON object, that you want to parse.

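    ijson.items(f, prefix) expects a readable input, that is, an opened file or a file-like object with a read() method. Specifically, the input should be a binary (bytes) file-like object. That is where the error in the question comes from: ijson calls .read() on the object it is given, and that object has no such method.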

    In Python 3, you can wrap each byte string in io.BytesIO, which creates an in-memory binary stream.

    Example

    Suppose the input is [b'{"id": "ab"}', b'{"id": "cd"}']:

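    import io
    import ijson

    list_json = [b'{"id": "ab"}', b'{"id": "cd"}']
    for raw in list_json:
        # Wrap each byte string in an in-memory binary stream;
        # the prefix "" selects the top-level JSON value.
        for obj in ijson.items(io.BytesIO(raw), ""):
            print(obj['id'])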
    Output: 
        ab
        cd
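The following is a small standard-library-only sketch of why the question's code fails. It does not use ijson itself. The hypothetical read_first_chunk function stands in for what a streaming parser does with its input, which is to call .read() on it. A BytesIO object supports that call, but a list, a plain str and a generator do not.

```python
import io


def read_first_chunk(source, size=4):
    """Hypothetical stand-in for a streaming parser's first step:
    pull a few bytes from its input by calling source.read(size)."""
    return source.read(size)


# A BytesIO object is a binary file-like object, so .read() works.
print(read_first_chunk(io.BytesIO(b'{"id": "ab"}')))  # b'{"id'

# A list, a plain str and a generator have no .read() method, so each
# call raises AttributeError ("... object has no attribute 'read'").
for candidate in ([b'{"id": "ab"}'], '{"id": "ab"}', (x for x in [])):
    try:
        read_first_chunk(candidate)
    except AttributeError as exc:
        print(exc)
```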
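The answer assumes the list holds byte strings. If it holds str values instead, one option is to encode each item before wrapping it. The helper below, as_binary_stream, is hypothetical and is not part of ijson or io. It assumes UTF-8 text, encodes str items, and returns an io.BytesIO. The chunked read loop only shows that the result is a stream that can be consumed a few bytes at a time. The 5-byte chunk size is arbitrary.

```python
import io


def as_binary_stream(item):
    """Hypothetical helper: wrap one JSON document in an in-memory
    binary stream, encoding str items as UTF-8 (an assumption) first."""
    if isinstance(item, str):
        item = item.encode("utf-8")
    return io.BytesIO(item)


stream = as_binary_stream('{"id": "ab"}')

# Read the stream in small chunks, roughly how an incremental parser
# consumes input; read() returns b'' once the stream is exhausted.
chunks = []
while chunk := stream.read(5):
    chunks.append(chunk)

print(chunks)  # [b'{"id"', b': "ab', b'"}']
```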
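To connect this back to the question, the filtering loop can be written as a function over already-parsed article dicts. This sketch relies only on the field names from the question (article_type, article_id, short_desc, long_desc). The name split_articles and the sample data are hypothetical. The function iterates over its input without building an intermediate list, so it can be fed straight from an ijson generator.

```python
EXCLUDED_TYPES = {"Monthly Briefing", "Conference"}


def split_articles(articles):
    """Collect ids and descriptions of articles whose type is not excluded.

    `articles` is any iterable of dicts, e.g. objects produced by
    ijson.items(...); it is iterated once, item by item.
    """
    ids, short_descs, long_descs = [], [], []
    for article in articles:
        if article["article_type"] in EXCLUDED_TYPES:
            continue  # skip "Monthly Briefing" and "Conference" articles
        ids.append(article["article_id"])
        short_descs.append(article["short_desc"])
        long_descs.append(article["long_desc"])
    return ids, short_descs, long_descs


sample = [
    {"article_id": 1, "article_type": "News", "short_desc": "s1", "long_desc": "l1"},
    {"article_id": 2, "article_type": "Conference", "short_desc": "s2", "long_desc": "l2"},
]
print(split_articles(sample))  # ([1], ['s1'], ['l1'])
```

With ijson, the input could be a generator such as (obj for raw in article_data for obj in ijson.items(io.BytesIO(raw), "")). The prefix "" selects the top-level value, which fits the case where each byte string is a single JSON object. If each string instead holds a JSON array of articles, as the 'item' prefix in the question implies, use 'item' to get each element of that array.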