pythonjsonstringpython-requestsmalformed

ValueError: malformed string using ast.literal_eval


I'm doing a loop to get json api, here is what I have in my loop:

response_item = requests.request('GET',url_item,params=None,verify=False)
response_item = json.loads(response_item.text)
response_item = ast.literal_eval(json.dumps(response_item, ensure_ascii=False).encode('utf8'))

I scan around 45000 json objects, I generate "url_item" variable for each iteration. Each object is the same, I can get something like 7000 object and I have the following error when I reach the 7064th:

Traceback (most recent call last):
  File "C:\Python27\tools\api_item.py", line 47, in <module>
    response_item = ast.literal_eval(json.dumps(response_item, ensure_ascii=False).encode('utf8'))
  File "C:\Python27\lib\ast.py", line 80, in literal_eval
    return _convert(node_or_string)
  File "C:\Python27\lib\ast.py", line 63, in _convert
    in zip(node.keys, node.values))
  File "C:\Python27\lib\ast.py", line 62, in <genexpr>
    return dict((_convert(k), _convert(v)) for k, v
  File "C:\Python27\lib\ast.py", line 63, in _convert
    in zip(node.keys, node.values))
  File "C:\Python27\lib\ast.py", line 62, in <genexpr>
    return dict((_convert(k), _convert(v)) for k, v
  File "C:\Python27\lib\ast.py", line 79, in _convert
    raise ValueError('malformed string')
ValueError: malformed string

I used to print the second and third "response_item". Of course in this case the third one isn't displayed since I have the error just before, here what I have for the print after the json.load:

{u'restrictions': [], u'name': u'Sac \xe0 dos de base', u'level': 0, u'rarity': u'Basic', u'vendor_value': 11, u'details': {u'no_sell_or_sort': False, u'size': 20}, u'game_types': [u'Activity', u'Wvw', u'Dungeon', u'Pve'], u'flags': [u'NoSell', u'SoulbindOnAcquire', u'SoulBindOnUse'], u'icon': u'https://render.guildwars2.com/file/80E36806385691D4C0910817EF2A6C2006AEE353/61755.png', u'type': u'Bag', u'id': 8932, u'description': u'Un sac de 20 emplacements pour les personnages d\xe9butants.'}

Every item I get before this one has the same type, same format, and I don't have any error except for the 7064th !

Thank you for your help!


Solution

  • You should not use ast.literal_eval() on JSON data. JSON and Python literals may look like the same thing, but they are very much not.

    In this case, your data contains a boolean flag, set to false in JSON. A proper Python boolean uses title-case, so False:

    >>> import json, ast
    >>> s = '{"no_sell_or_sort": false, "size": 20}'
    >>> json.loads(s)
    {u'no_sell_or_sort': False, u'size': 20}
    >>> ast.literal_eval(s)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/Users/mj/Development/Library/buildout.python/parts/opt/lib/python2.7/ast.py", line 80, in literal_eval
        return _convert(node_or_string)
      File "/Users/mj/Development/Library/buildout.python/parts/opt/lib/python2.7/ast.py", line 63, in _convert
        in zip(node.keys, node.values))
      File "/Users/mj/Development/Library/buildout.python/parts/opt/lib/python2.7/ast.py", line 62, in <genexpr>
        return dict((_convert(k), _convert(v)) for k, v
      File "/Users/mj/Development/Library/buildout.python/parts/opt/lib/python2.7/ast.py", line 79, in _convert
        raise ValueError('malformed string')
    ValueError: malformed string
    

    Other differences include using null instead of None, and Unicode escape sequences in what to Python 2 looks like a plain (bytes) string, using UTF-16 surrogates when escaping non-BMP codepoints.

    Load your data with json.loads(), not ast.literal_eval(). Not only will it handle proper JSON just fine, it is also faster.

    In your case, it appears you are using json.dumps() then try to load the data again with ast.literal_eval(). There is no need for that step, you already had a Python object.

    In other words, the line:

    response_item = ast.literal_eval(json.dumps(response_item, ensure_ascii=False).encode('utf8'))
    

    is redundant at best, and very, very wrong, at worst. Re-encoding response_item to a JSON string does not produce something that can be interpreted as a Python literal.