python pandas dataframe dictionary spss

Why does converting a nested Python dictionary into a pandas DataFrame result in a "has no attribute 'items'" error?


I have a nested dictionary stored in the variable nested_dict_variable. Its inner values are retrieved using the SPSS valueLabels property (Python).

type(nested_dict_variable) results in dict.

print(nested_dict_variable) results in {0: {1.0: '1 - low', 2.0: '2', 3.0: '3', 4.0: '4', 5.0: '5 - high', 99.0: "99 - don't know"}, 1: {0.0: '0 - no', 1.0: '1 - yes'}, 2: {1.0: '1 - A', 2.0: '2 - B', 3.0: '3 - C'}}

I am trying to convert this nested dictionary into a pandas DataFrame, but receive the following error. I don't understand why this attribute error is raised given that nested_dict_variable is (or seems to be) a dictionary!?

AttributeError                            Traceback (most recent call last)
File c:\mypythonfile.py:38
     36 data_list = []
     37 for outer_key, inner_dict in nested_dict_variable.items():
---> 38     for inner_key, value in inner_dict.items():
     39         data_list.append({'Outer Key': outer_key, 'Inner Key': inner_key, 'Value': value})
     41 df = pd.DataFrame(data_list)

AttributeError: 'ValueLabel' object has no attribute 'items'

Here is my code:

# see: https://www.ibm.com/docs/en/spss-statistics/28.0.0?topic=programs-running-spss-statistics-from-external-python-process#d10392e74
import spss
# import pandas
import pandas as pd

# read spss-data
file = r"C:\SPSS-SampleData1.sav"
spss.Submit(
    f"""
GET FILE='{file}'.
"""
)

var_index = []
nested_dict_variable= {}

# initialise the handling of spss commands
spss.StartDataStep()

# access active dataset (the one that was read above)
datasetObj = spss.Dataset()

# get a list of variable objects
varListObj = datasetObj.varlist
for var in varListObj:
    var_index.append(var.index)
    nested_dict_variable[var.index] = var.valueLabels


spss.EndDataStep()


##### CREATE DATAFRAMES #####

# convert nested dictionary to Pandas DataFrame
data_list = []
for outer_key, inner_dict in nested_dict_variable.items():
    for inner_key, value in inner_dict.items():
        data_list.append({'Outer Key': outer_key, 'Inner Key': inner_key, 'Value': value})

df = pd.DataFrame(data_list)



# end spss process
spss.StopSPSS()

Solution

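  • As user2357112 pointed out, var.valueLabels looks like a dict when printed, but it isn't one. The outer nested_dict_variable is a real dict, which is why type() reports dict, but each inner value is a ValueLabel object, as the error message says.

    I had a quick look at the documentation of the Python spss package, and it says: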

    You can iterate through the set of value labels for a variable using the data property, as in:

    varObj = datasetObj.varlist['origin']
    for val, valLab in varObj.valueLabels.data.iteritems():
       print val, valLab
    

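    So you could try rewriting your code to go through the data property. Note that the documentation example above uses Python 2 syntax; your code uses f-strings, so you are on Python 3, where dict.iteritems() no longer exists and items() is used instead:

    data_list = []
    for outer_key, inner_dict in nested_dict_variable.items():
        # inner_dict is a ValueLabel object; its labels are exposed via .data
        for inner_key, value in inner_dict.data.items():
            data_list.append({'Outer Key': outer_key, 'Inner Key': inner_key, 'Value': value})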
    

    I haven't tried it though. Good luck! ;)
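
To see why the printed output is misleading, here is a minimal sketch that runs without SPSS. FakeValueLabel and flatten_value_labels are hypothetical names made up for this illustration; the stand-in class only assumes what the documentation excerpt suggests, namely that the labels are exposed as a plain dict through a data attribute. It is not the real spss class.

```python
class FakeValueLabel:
    """Hypothetical stand-in for the spss ValueLabel object (not the real class)."""

    def __init__(self, labels):
        # Assumption: the real object exposes its labels as a dict via .data
        self.data = dict(labels)

    def __repr__(self):
        # Prints exactly like a dict, which is why print() output is misleading
        return repr(self.data)


def flatten_value_labels(nested):
    """Turn {var_index: labels} into a list of row dicts suitable for a DataFrame."""
    rows = []
    for outer_key, label_obj in nested.items():
        # Accept plain dicts as well as objects that expose a dict via .data
        inner = label_obj if isinstance(label_obj, dict) else label_obj.data
        for inner_key, value in inner.items():
            rows.append({"Outer Key": outer_key, "Inner Key": inner_key, "Value": value})
    return rows


nested_dict_variable = {
    0: FakeValueLabel({1.0: "1 - low", 2.0: "2"}),
    1: FakeValueLabel({0.0: "0 - no", 1.0: "1 - yes"}),
}

# The outer container is a dict, but the inner values are not
inner_types = {type(v).__name__ for v in nested_dict_variable.values()}
has_items = hasattr(nested_dict_variable[0], "items")

data_list = flatten_value_labels(nested_dict_variable)
```

Checking type() on the inner values rather than on the outer dictionary is what reveals the problem. The resulting data_list can be passed to pd.DataFrame(data_list) as in the question. If the real ValueLabel objects turn out not to be usable after spss.EndDataStep(), storing dict(var.valueLabels.data) inside the loop instead of var.valueLabels is an alternative worth trying; that part is untested.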