To read an SPSS .sav file using pandas/pyreadstat, you use:
df, meta = pyreadstat.read_sav()
To write a DataFrame, you use:
pyreadstat.write_sav(df)
How can I read, edit, and write a .sav file without losing any metadata, such as labels and other properties that can be changed in SPSS?
If this is not entirely possible, what is the closest I can get to not losing data this way?
Talk is cheap, here's the code. :-)
# using pyreadstat
import logging
import os
import pathlib
from io import BytesIO
from uuid import uuid4

from pandas import DataFrame
from pyreadstat import metadata_container, write_sav

logger = logging.getLogger(__name__)


class TempFile(type(pathlib.Path())):  # type: ignore
    """A path whose file is deleted when the with-block exits."""

    def __enter__(self):
        # Defined explicitly because newer Python versions no longer provide it on Path.
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        filepath = str(self.absolute())
        try:
            os.remove(filepath)
        except OSError:
            logger.exception('remove temporary file: %s failed!', filepath)
        self._closed = True


class SpssTool:
    @classmethod
    def to_spss(cls, df: DataFrame, io: BytesIO, metadata: metadata_container, *, compress: bool = False):
        """Writes a pandas dataframe to a BytesIO object.

        Parameters
        ----------
        df : pandas.DataFrame
            pandas data frame to write to sav or zsav
        io : BytesIO
            the buffer to save the spss file to
        metadata : metadata_container
            spss file meta data container
        compress : bool
            whether to compress to zsav.
        """
        # get_legal_column_names is not shown here; see the explanation below.
        df.columns = SpssTool.get_legal_column_names(df.columns.to_list())
        with TempFile(f'/tmp/{uuid4().hex}.{"zsav" if compress else "sav"}') as fp:
            write_sav(
                df=df,
                dst_path=fp,
                column_labels=metadata.column_labels if metadata else None,
                variable_value_labels=dict(metadata.variable_value_labels) if metadata else {},
                variable_measure=metadata.variable_measure if metadata else None,
                compress=compress,  # without this the .zsav file is not actually compressed
            )
            io.write(fp.read_bytes())
Some explanations:
SpssTool.get_legal_column_names
This is needed because SPSS restricts which variable (column) names are allowed; see the official documentation for details: https://www.ibm.com/docs/en/spss-statistics/27.0.0?topic=view-variable-names
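The body of get_legal_column_names is not shown above, so here is a minimal sketch of what such a helper might look like. It is an assumption, not the actual implementation: the rules it applies (allowed characters, a letter or @ as first character, no trailing period, a 64-byte limit, reserved keywords, case-insensitive uniqueness) are my reading of the IBM page linked above, and the "V" prefix and "_1" suffix conventions are arbitrary choices.

```python
import re

# Reserved keywords that SPSS does not accept as variable names.
RESERVED = {"ALL", "AND", "BY", "EQ", "GE", "GT", "LE", "LT", "NE", "NOT", "OR", "TO", "WITH"}
MAX_BYTES = 64  # variable names are limited to 64 bytes


def _truncate(name, limit):
    """Cut name so that its UTF-8 encoding fits in `limit` bytes."""
    return name.encode("utf-8")[:limit].decode("utf-8", errors="ignore")


def get_legal_column_names(names):
    """Hypothetical sanitizer: return SPSS-safe, unique names in the same order."""
    legal, seen = [], set()
    for raw in names:
        # Replace anything other than letters, digits, and . _ $ # @ with "_".
        name = re.sub(r"[^\w.$#@]", "_", str(raw))
        # The first character should be a letter or @; prefix otherwise.
        if not name or not (name[0].isalpha() or name[0] == "@"):
            name = "V" + name
        # Enforce the length limit and drop trailing periods.
        name = _truncate(name, MAX_BYTES).rstrip(".")
        # Reserved keywords get a prefix so they are no longer reserved.
        if name.upper() in RESERVED:
            name = "V_" + name
        # SPSS names are case-insensitive, so deduplicate on the upper-cased form.
        candidate, n = name, 1
        while candidate.upper() in seen:
            suffix = f"_{n}"
            candidate = _truncate(name, MAX_BYTES - len(suffix)) + suffix
            n += 1
        seen.add(candidate.upper())
        legal.append(candidate)
    return legal


example = get_legal_column_names(["age", "Age", "1st score", "by", "q1."])
```

Note that renaming columns this way means any metadata keyed by the old names (labels, value labels, measures) has to be renamed in the same way before writing, otherwise it will no longer match.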
metadata_container
This is imported with
from pyreadstat import metadata_container
and is the container holding information about the dataset. You can find more detail at: https://ofajardo.github.io/pyreadstat_documentation/_build/html/index.html#metadata-object-description
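To get closer to a lossless round trip, more of the metadata object can be passed back to write_sav than the three fields used above. The sketch below is a hedged illustration, not part of the original code: write_kwargs_from_meta is a hypothetical helper, the attribute names are the ones I understand the pyreadstat metadata object to expose, and reusing original_variable_types as variable_format is an assumption that may need adjusting for some formats. A SimpleNamespace stands in for the real metadata object so the snippet runs without an SPSS file.

```python
from types import SimpleNamespace


def write_kwargs_from_meta(meta, columns):
    """Hypothetical helper: build write_sav keyword arguments from read_sav metadata.

    Only entries for columns that still exist are kept, so dropped columns
    do not leave stale labels behind.
    """
    keep = set(columns)

    def only_kept(attr):
        # Attributes may be missing or None depending on the file and version.
        mapping = getattr(meta, attr, None) or {}
        return {k: v for k, v in mapping.items() if k in keep and v is not None}

    return {
        "file_label": getattr(meta, "file_label", None) or "",
        "column_labels": only_kept("column_names_to_labels"),
        "variable_value_labels": only_kept("variable_value_labels"),
        "missing_ranges": only_kept("missing_ranges"),
        "variable_display_width": only_kept("variable_display_width"),
        "variable_measure": only_kept("variable_measure"),
        # Assumption: the original SPSS formats can be reused as write formats.
        "variable_format": only_kept("original_variable_types"),
    }


# Stand-in for the `meta` object returned by pyreadstat.read_sav.
fake_meta = SimpleNamespace(
    file_label=None,
    column_names_to_labels={"age": "Age in years", "old": "Dropped column"},
    variable_value_labels={"sex": {1: "Male", 2: "Female"}},
    missing_ranges={},
    variable_display_width={"age": 8},
    variable_measure={"age": "scale", "sex": "nominal"},
    original_variable_types={"age": "F8.0", "sex": "F1.0"},
)
kwargs = write_kwargs_from_meta(fake_meta, ["age", "sex"])
```

With pyreadstat installed, the round trip would then be roughly: df, meta = pyreadstat.read_sav(path, user_missing=True), edit df, and finally pyreadstat.write_sav(df, out_path, **write_kwargs_from_meta(meta, df.columns)). As far as I know, missing_ranges is only filled in when user_missing=True is passed, in which case the user-defined missing values appear in df as they are stored rather than as NaN. Metadata that write_sav has no parameter for cannot be carried over this way.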
Those may be what you need.