python json parsing ujson

Python fails to parse JSON containing the \" escape sequence


I'm trying to parse a JSON string that contains an escape character (of some sort, I guess):

{
    "publisher": "\"O'Reilly Media, Inc.\""
}

The parser works fine if I remove the \" sequences from the string.

The exceptions raised by the different parsers are:

json

  File "/usr/lib/python2.7/json/__init__.py", line 338, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python2.7/json/decoder.py", line 366, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python2.7/json/decoder.py", line 382, in raw_decode
    obj, end = self.scan_once(s, idx)
ValueError: Expecting , delimiter: line 17 column 20 (char 392)

ujson

ValueError: Unexpected character in found when decoding object value

How do I make the parser handle these characters?

Update: (screenshot not preserved) P.S. json is imported as ujson in this example.

(screenshot not preserved) This is what my IDE shows.

The comma was added accidentally; the actual JSON has no trailing comma and is valid.

(screenshot not preserved) The string definition.


Solution

  • You almost certainly did not escape the backslashes properly when defining the string in Python. If the string is defined correctly, the JSON parses just fine:

    >>> import json
    >>> json_str = r'''
    ... {
    ...     "publisher": "\"O'Reilly Media, Inc.\""
    ... }
    ... '''  # raw string to prevent the \" from being interpreted by Python
    >>> json.loads(json_str)
    {u'publisher': u'"O\'Reilly Media, Inc."'}
    

    Note that I used a raw string literal to define the string in Python; if I did not, the \" would be interpreted by Python and a regular " would be inserted. You'd have to double the backslash otherwise:

    >>> print '\"'
    "
    >>> print '\\"'
    \"
    >>> print r'\"'
    \"
    

    Re-encoding the parsed Python structure back to JSON shows the backslashes reappearing; the repr() output of the resulting string displays each of them doubled:

    >>> json.dumps(json.loads(json_str))
    '{"publisher": "\\"O\'Reilly Media, Inc.\\""}'
    >>> print json.dumps(json.loads(json_str))
    {"publisher": "\"O'Reilly Media, Inc.\""}
    

    If you do not escape the backslash itself (or use a raw string), Python consumes it and you end up with unescaped quotes inside the JSON:

    >>> json_str_improper = '''
    ... {
    ...     "publisher": "\"O'Reilly Media, Inc.\""
    ... }
    ... '''
    >>> print json_str_improper
    
    {
        "publisher": ""O'Reilly Media, Inc.""
    }
    
    >>> json.loads(json_str_improper)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/Users/mj/Development/Library/buildout.python/parts/opt/lib/python2.7/json/__init__.py", line 338, in loads
        return _default_decoder.decode(s)
      File "/Users/mj/Development/Library/buildout.python/parts/opt/lib/python2.7/json/decoder.py", line 366, in decode
        obj, end = self.raw_decode(s, idx=_w(s, 0).end())
      File "/Users/mj/Development/Library/buildout.python/parts/opt/lib/python2.7/json/decoder.py", line 382, in raw_decode
        obj, end = self.scan_once(s, idx)
    ValueError: Expecting , delimiter: line 3 column 20 (char 22)
    

    Note that the \" sequences are now printed as a plain "; the backslash is gone, so the JSON string value ends right after its opening quote and the parser fails.
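
The three cases above can be put side by side in one small script. This is only a sketch: the variable names (`raw_form`, `doubled_form`, `broken_form`, `generated`) are made up for illustration, and it uses the standard `json` module with Python 3 `print()` syntax rather than the Python 2.7 session shown in the answer. It also shows one way to avoid hand-written escapes, by letting `json.dumps` produce the JSON text.

```python
import json

# Raw string literal: Python leaves every backslash in place.
raw_form = r'''{"publisher": "\"O'Reilly Media, Inc.\""}'''

# Regular literal with each backslash doubled: same characters as raw_form.
doubled_form = '''{"publisher": "\\"O'Reilly Media, Inc.\\""}'''

# Regular literal with single backslashes: Python turns \" into a bare ",
# so the JSON string value is terminated too early.
broken_form = '''{"publisher": "\"O'Reilly Media, Inc.\""}'''

# The raw and doubled forms are the same text and parse to the same value.
assert raw_form == doubled_form
parsed = json.loads(raw_form)
print(parsed["publisher"])

# The broken form is rejected by the parser.
try:
    json.loads(broken_form)
except ValueError as exc:  # json.JSONDecodeError subclasses ValueError
    print("broken_form failed:", exc)

# Generating the JSON text with json.dumps sidesteps manual escaping.
generated = json.dumps({"publisher": "\"O'Reilly Media, Inc.\""})
assert json.loads(generated) == parsed
```

If the JSON comes from a file or a network response instead of a string literal in source code, Python never interprets the backslashes, so no extra escaping is needed in that case.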