I'm reading a bunch of JSON lines from a file and getting an error while parsing them using json.loads(). The file was encoded using us-ascii, and I've tried the utf-8, unicode, us-ascii, and utf-16 encodings in json.loads(), but each time I get the same error. I suspect the JSON is malformed. It looks like it contains a spurious character in the first element itself, but I can't see it. Here are the details:
The JSON string is:
{'asin': '5555991584', 'title': 'Memory of Trees', 'price': 9.49, 'imUrl': 'http://ecx.images-amazon.com/images/I/51b5WDjdhPL._SX300_.jpg', 'related': {'also_bought': ['B000002LRT', 'B000002LRR', 'B000050XEI', 'B000002MSM', 'B000B8QEYC', 'B001GQ2TGA', 'B000008FEA', 'B002RV01QI', 'B000002NJH', 'B000024V8E', 'B000JVSUXY', 'B00005S8ME', 'B00005K8EC', 'B000J233U8', 'B000J233SK', 'B000JFF2WW', 'B000J233TE', 'B0060ANYZ2', 'B002ZDOXLW', 'B0043ZDU1E', 'B000002U6E', 'B003LN9DRE', 'B000000WFU', 'B00002MG3U', 'B000002URV', 'B00006LJ72', 'B006C4P7BU', 'B00005UF2F', 'B001G9LVGG', 'B000CNF4LU', 'B000CNF4L0', 'B0007GAEGC', 'B00003OP2L', 'B005RYF5H2', 'B0069BUX0G', 'B00006JIAN', 'B000000WC1', 'B000000WF7', 'B001662F64', 'B000060O30', 'B003Y35H44', 'B009CSVPLY', 'B00DD348M2', 'B00005J9UN', 'B000TSQCHS', 'B000J233TY', 'B000001GBJ', 'B00005UE4B', 'B00000IL1K', 'B000BI1YJC', 'B0007M22TI', 'B004BBDHEK', 'B000GRUS22', 'B00J3V97NS', 'B0000062I1', 'B00005QZWI', 'B000UZ4GXC', 'B000WSRPOO', 'B002UZXJA6', 'B0000248JR', 'B000002MHL', 'B0002RUAAQ', 'B00AJLHVB6', 'B004M8SQB6', 'B000ELJAW4', 'B00000I609', 'B000AXWHPI', 'B001932LMW', 'B0007GAEVC', 'B00417HV6E', 'B00020HEH0', 'B0002YCVQK', 'B00E1C4SJC', 'B005FYCF2M', 'B00004UDNP', 'B00DJYJWTO', 'B000002VUC', 'B006ZZANFG', 'B000003BR4', 'B0000CC85G', 'B000KRNCYY', 'B000005J7X', 'B001662F6Y', 'B00004OCQG', 'B0000C7PQK', 'B001C4E6DA', 'B001662F7S', 'B00005QZCS', 'B000000NGH', 'B00063F8BC', 'B002HMHXLS', 'B0012GMY6Y', 'B000EMG9YU', 'B001662F8C', 'B000003F39', 'B000001CZE', 'B0001IXTIG', 'B000FOQ0KA', 'B00000DGUY', 'B0000000JS'], 'buy_after_viewing': ['B000002LRR', 'B002RV01QI', 'B000050XEI', 'B000002LRT']}, 'salesRank': {'Music': 939190}, 'categories': [['CDs & Vinyl', 'New Age', 'Celtic New Age'], ['CDs & Vinyl', 'New Age', 'Meditation'], ['CDs & Vinyl', 'Pop'], ['CDs & Vinyl', 'Rock'], ['Digital Music', 'New Age', 'Celtic New Age']]}
The relevant chunk of code is:
try:
    f = open(filepath, 'r')
except IOError as e:
    print "Cannot open the file named %s at location %s" % (name, filepath)
    sys.exit(1)
tup = []
for line in f:
    print line
    line_elem = json.loads(line)
    print line_elem
And the exception is:
{'asin': '5555991584', 'title': 'Memory of Trees', 'price': 9.49, 'imUrl': .... <I've omitted for brevity>
  File "/usr/local/Cellar/python/2.7.10_2/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/decoder.py", line 366, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/local/Cellar/python/2.7.10_2/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/decoder.py", line 382, in raw_decode
    obj, end = self.scan_once(s, idx)
ValueError: Expecting property name: line 1 column 2 (char 1)
JSON Lint gives me the following error when I paste the line in it:
Parse error on line 1:
{ 'asin': '5555991584'
-----^
Expecting 'STRING', '}'
What could be going wrong?
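This is simply not valid JSON, as the linter shows. JSON requires strings, including object keys, to be enclosed in double quotes, not single quotes. That is why the parser stops at line 1 column 2: it expects a double-quoted property name right after the opening brace and finds ' instead. The encoding has nothing to do with it.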
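The cleanest fix is to have whatever produced the file write real JSON. If that is not possible, one possible workaround is sketched below. It assumes each line is actually a Python dict literal (which is what the single-quoted sample looks like), so ast.literal_eval can read it. The sample line is a shortened version of the one in the question, and parse_record is a hypothetical helper name, not part of any library.

```python
import ast
import json

# Shortened sample in the same style as the line from the question.
line = "{'asin': '5555991584', 'title': 'Memory of Trees', 'price': 9.49}"


def parse_record(text):
    """Parse one line as JSON, falling back to a Python literal.

    Hypothetical helper: assumes non-JSON lines are Python dict literals.
    """
    try:
        return json.loads(text)
    except ValueError:
        # Single-quoted keys and strings are not JSON, but they are valid
        # Python literal syntax. ast.literal_eval only evaluates literals,
        # so it does not execute arbitrary code the way eval() would.
        return ast.literal_eval(text)


record = parse_record(line)

# Re-serialise with json.dumps to get valid, double-quoted JSON.
valid_json = json.dumps(record)
```

json.loads raises ValueError on Python 2.7 and a subclass of it on Python 3, so the except clause covers both. Once the data has been re-serialised with json.dumps, the original json.loads loop should work unchanged.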