python pandas spss-files

How to recode SYSTEM missing values from nan to an empty string when saving SPSS system (sav) files from a pandas DataFrame?


I use the savReaderWriter module to save an SPSS file from a pandas DataFrame with the following code:

import savReaderWriter as srw
savFileName = 'Outfile name.sav'

# Build a real list of rows; in Python 3, map() returns an iterator that cannot be indexed
records = df.values.tolist()

varNames = list(df.columns)
varTypes = {}

# 0 marks a numeric variable; 255 marks a string variable 255 bytes wide
for varName, dtype in zip(varNames, df.dtypes):
    if dtype == 'float64':
        varTypes[varName] = 0
    else:
        varTypes[varName] = 255

with srw.SavWriter(savFileName, varNames, varTypes, ioUtf8=True) as writer:
    writer.writerows(records)

The problem is that empty values in string variables show up as "nan" in the SPSS file. According to the documentation, the default for SavWriter is missingValues=None, but changing None to '' or any other string doesn't help. How can I get empty strings instead of nan?

Thank you very much in advance


Solution

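  • If you want "nan" values to be written as empty strings, I think the best way is to replace them in the source df. Note that fillna returns a new DataFrame rather than changing df in place, so assign the result back:

    df = df.fillna('')

    and save after that.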

    P.S. Keep in mind how SPSS itself handles missing data: those settings are stored in the file's header.
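
    One caveat with filling the whole frame: a float64 column that contains NaN would also receive '' and stop being float64, so the dtype check in the question would then treat it as a string variable. A minimal sketch of filling only the non-float columns is below. The example frame and its column names (score, name) are made up for illustration, and the "anything that is not float64 is a string column" rule is simply borrowed from the question's code, not something savReaderWriter requires.

```python
import numpy as np
import pandas as pd

# Hypothetical example frame: one numeric column and one string column,
# each with a missing value.
df = pd.DataFrame({
    "score": [1.5, np.nan, 3.0],
    "name": ["a", np.nan, "c"],
})

# Same rule as in the question: every column that is not float64
# is treated as a string column.
str_cols = [col for col in df.columns if df[col].dtype != "float64"]

# fillna returns a new object, so the result is assigned back.
# Numeric columns are left untouched and keep their NaN values.
df[str_cols] = df[str_cols].fillna("")

# Rows as plain lists, in the shape passed to writer.writerows(...)
records = df.values.tolist()
```

    After this, records can be passed to writer.writerows(records) as in the question, and the varTypes loop still sees the numeric columns as float64.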