I currently use scipy.io.readsav() to import IDL .sav files to Python, which is working well, eg:
data = scipy.io.readsav('data.sav', python_dict=True, verbose=True)
However, if the .sav file is large (say > 1 GB), I get a MemoryError when trying to import into Python.
Usually, iterating through the data rather than loading it all in at once would of course solve this (if it were a .txt or .csv file), but I don't see how I can do that with a .sav file, considering the only method I know of to import one is readsav.
Any ideas how I can avoid this memory error?
You expressed interest in iterating over a .sav file. One (not too onerous) way to do this would be to write a lightweight wrapper class or function to use instead of SciPy's readsav(), using the slightly lower-level functions in the scipy.io.idl module, such as _read_record().
Using that function, one could do something like the following:
from scipy.io import idl

def sav_iterator(file_path):
    with open(file_path, "rb") as fp:  # open the file for reading in binary mode
        signature = fp.read(2)  # should be b'SR'
        recfmt = fp.read(2)  # should be b'\x00\x04' for uncompressed
        while True:
            record_dict = idl._read_record(fp)  # parses one record into a dict and advances the file pointer
            yield record_dict
            if record_dict["rectype"] == "END_MARKER":
                break  # stop iteration

for record in sav_iterator("my_data.sav"):
    do_something_with(record)  # placeholder
With this method, only one record's worth of data ever needs to be held in memory at a time.
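For example, if you only need a few of the variables stored in the file, you could skim past everything else and keep just the records you care about. The sketch below assumes that the record dicts returned by _read_record() expose 'rectype', 'varname', and 'data' keys for simple (non-pointer) variables; that is an internal detail of scipy.io.idl and may change between SciPy versions, and the variable names here are hypothetical:

wanted = {"temperature", "pressure"}  # hypothetical variable names to keep

selected = {}
for record in sav_iterator("my_data.sav"):
    # 'varname' and 'data' are assumed to be present only on VARIABLE records;
    # other record types (TIMESTAMP, VERSION, END_MARKER, ...) are skipped
    if record["rectype"] == "VARIABLE" and record["varname"].lower() in wanted:
        selected[record["varname"].lower()] = record["data"]

Bear in mind that _read_record() is a private helper (note the leading underscore), so it is worth pinning the SciPy version you test against, and this sketch does not handle compressed .sav files (the b'\x00\x06' record format).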