python pandas dataframe defaultdict

How to convert a defaultdict(list) to Pandas DataFrame


I have a defaultdict(list) object with this structure:

{id: [list[list]]}

for example,

'a1': [[0.01, 'cat']],

'a2': [[0.09, 'cat']],

'a3': [[0.5, 'dog']],

...

I'd like to convert this defaultdict(list) into a Pandas DataFrame object.

I tried with the following:

df = pd.DataFrame(list(my_dict.items()), columns=['id', 'category'])

However, the 'category' column ends up holding a list of lists. I'd like to split the two values in each entry into two separate columns, so that the final DataFrame has the columns ['id', 'score', 'category'].

When I tried the apply call below:

df['category'].apply(lambda x: x[0][0])

I got an IndexError: 'list index out of range'.

What could be wrong with my code? How should I create the 2 new columns from a list of lists?

Thank you.


Solution

  • I believe you need to prepend each key to its inner list, so that every row has three values:

    df = pd.DataFrame([[k] + v[0] for k, v in my_dict.items()], 
                       columns=['id', 'score', 'category'])
    

    Or unpack the two values explicitly:

    df = pd.DataFrame([(k, v[0][0], v[0][1]) for k, v in my_dict.items()], 
                       columns=['id', 'score', 'category'])
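
As for the 'list index out of range' error: the full data isn't shown, so this is only a guess, but one possible cause is that at least one key maps to an empty list. With a defaultdict(list), merely reading a missing key stores an empty list under it, and x[0] then fails for that row. Both snippets above would hit the same error on such a key. Below is a minimal sketch, assuming that is the situation. The sample data and the extra 'a4' key are made up for illustration. It builds one row per inner list, so empty lists simply contribute no rows:

```python
from collections import defaultdict

import pandas as pd

# Hypothetical sample data shaped like the question's {id: [[score, category]]}.
my_dict = defaultdict(list)
my_dict['a1'].append([0.01, 'cat'])
my_dict['a2'].append([0.09, 'cat'])
my_dict['a3'].append([0.5, 'dog'])
my_dict['a4']  # merely reading a missing key stores an empty list under it

# One output row per inner list. A key whose list is empty yields no rows,
# so nothing is ever indexed on an empty list.
rows = [(k, score, category)
        for k, v in my_dict.items()
        for score, category in v]

df = pd.DataFrame(rows, columns=['id', 'score', 'category'])
print(df)
```

This also copes with ids that have more than one inner list, since each inner list becomes its own row. If you would rather keep ids with empty lists as rows with missing values, you would need to add those rows yourself.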