python, csv, spss-files

How to convert a large .sav file into a CSV file


I am trying to convert a large (~2 GB) SPSS (.sav) file into CSV using Python.

For files smaller than about 500 MB, the following works without problems:

import pandas as pd
df = pd.read_spss('stdFile.sav')
df.to_csv("stdFile.csv", encoding = "utf-8-sig")

but in this case I get a MemoryError.

I'm open to solutions, not necessarily in Python, but I don't have an SPSS license, so I have to convert the file with another tool.
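
(For reference, the size of the table can be checked without loading any data, using pyreadstat, the library that pandas' read_spss relies on. This is just a minimal sketch; metadataonly is a read_sav option and number_rows / number_columns are attributes of the returned metadata object:

import pyreadstat
# Read only the metadata (no data rows), which needs almost no memory
_, meta = pyreadstat.read_sav("stdFile.sav", metadataonly=True)
print(meta.number_rows, "rows x", meta.number_columns, "columns")
)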


Solution

  • You can use Python's pyreadstat package to read the SPSS file in chunks and append each chunk to the CSV:

    import pyreadstat
    fpath = "path/to/stdFile.sav"
    outpath = "stdFile.csv"
    # chunksize determines how many rows are read per chunk
    reader = pyreadstat.read_file_in_chunks(pyreadstat.read_sav, fpath, chunksize=10000)
    
    cnt = 0
    for df, meta in reader:
        # write (with header) on the first iteration, then append without header
        if cnt>0:
            wmode = "a"
            header = False
        else:
            wmode = "w"
            header = True
        # write
        df.to_csv(outpath, mode=wmode, header=header)
        cnt+=1

    More information here: https://github.com/Roche/pyreadstat#reading-rows-in-chunks
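
    If you also want the utf-8-sig encoding from the question and no extra pandas index column in the output, the write call inside the loop can be extended accordingly (index and encoding are standard pandas to_csv parameters):

        df.to_csv(outpath, mode=wmode, header=header, index=False, encoding="utf-8-sig")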