python, json, pandas, dataframe, jupyter

How to solve an OverflowError when exporting a pandas DataFrame to JSON


In Jupyter, I have a DataFrame of 400,000 rows that I can't export in full to a JSON file without hitting the following error.

The export works fine as long as I limit it to the first 141,000 rows, regardless of the order of those rows.

Is there a size limitation I should be aware of when dealing with large JSON files? Thank you.

OverflowError                             Traceback (most recent call last)
<ipython-input-254-b59373f1eeb2> in <module>
----> 1 df4.to_json('test.json', orient = 'records')

~/anaconda3/lib/python3.7/site-packages/pandas/core/generic.py in to_json(self, path_or_buf, orient, date_format, double_precision, force_ascii, date_unit, default_handler, lines, compression, index)
   1889                             default_handler=default_handler,
   1890                             lines=lines, compression=compression,
-> 1891                             index=index)
   1892 
   1893     def to_hdf(self, path_or_buf, key, **kwargs):

~/anaconda3/lib/python3.7/site-packages/pandas/io/json/json.py in to_json(path_or_buf, obj, orient, date_format, double_precision, force_ascii, date_unit, default_handler, lines, compression, index)
     56         double_precision=double_precision, ensure_ascii=force_ascii,
     57         date_unit=date_unit, default_handler=default_handler,
---> 58         index=index).write()
     59 
     60     if lines:

~/anaconda3/lib/python3.7/site-packages/pandas/io/json/json.py in write(self)
     99         return self._write(self.obj, self.orient, self.double_precision,
    100                            self.ensure_ascii, self.date_unit,
--> 101                            self.date_format == 'iso', self.default_handler)
    102 
    103     def _write(self, obj, orient, double_precision, ensure_ascii,

~/anaconda3/lib/python3.7/site-packages/pandas/io/json/json.py in _write(self, obj, orient, double_precision, ensure_ascii, date_unit, iso_dates, default_handler)
    154                                                double_precision,
    155                                                ensure_ascii, date_unit,
--> 156                                                iso_dates, default_handler)
    157 
    158 

~/anaconda3/lib/python3.7/site-packages/pandas/io/json/json.py in _write(self, obj, orient, double_precision, ensure_ascii, date_unit, iso_dates, default_handler)
    110             date_unit=date_unit,
    111             iso_dates=iso_dates,
--> 112             default_handler=default_handler
    113         )
    114 

OverflowError: int too big to convert

Solution

  • There is no inherent limitation on data size in JSON itself, so that isn't your problem: the message points to trouble with a particular integer value. pandas serializes JSON through a C extension that handles integers as 64-bit values, so a Python int outside that range raises OverflowError: int too big to convert (reproduced below).

    This underlines the difficulty of working with such large files: you now have to isolate the particular record that's causing the problem in the middle of the to_json call.

    Since you know roughly where the problem is, you could convert subsets of your DataFrame with a bisection technique to home in on the row causing the issue; a sketch of that search follows below.
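
    The behaviour is easy to reproduce on a tiny frame (a minimal sketch; the column name and values here are made up for illustration):

        import pandas as pd

        # 2**64 does not fit in a signed (or unsigned) 64-bit integer,
        # so the column is stored with object dtype and to_json fails
        # while converting the oversized Python int
        df = pd.DataFrame({'value': [1, 2, 2**64]})
        df.to_json(orient='records')
        # OverflowError: int too big to convert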
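
    And here is one way to automate the bisection (a sketch, assuming df4 from the traceback above and a single offending row; with several bad rows it will still find one of them, so rerun it after fixing each row it reports):

        def find_bad_row(df):
            # Invariant: the slice df.iloc[lo:hi] contains a row that
            # makes to_json raise OverflowError
            lo, hi = 0, len(df)
            while hi - lo > 1:
                mid = (lo + hi) // 2
                try:
                    # Probe by serializing to a string instead of a file
                    df.iloc[lo:mid].to_json(orient='records')
                except OverflowError:
                    hi = mid   # the offender is in the first half
                else:
                    lo = mid   # the first half is clean; search the second
            return df.iloc[lo]

        bad_row = find_bad_row(df4)
        print(bad_row)

    Once the offending row is located, casting the oversized value (for example to a string, or to float if some precision loss is acceptable) should let the full export go through.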