python pandas dataframe dictionary nested

Convert nested Python dict to a pandas DataFrame


I have weather data in a Python dict that I'm trying to convert to a pandas DataFrame. (From there I'll load it into SQL Server, but I've got that part working.)

my_dict = {
'data.outdoor.temperature': {'unit': '℃', 'list': 
{'datetime.datetime(2025, 4, 23, 10, 0): 22.3', 
'datetime.datetime(2025, 4, 23, 14, 0): 21.3', 
'datetime.datetime(2025, 4, 23, 18, 0): 18.2', }}, 
'data.indoor.temperature': {'unit': '℃', 'list': 
{'datetime.datetime(2025, 4, 23, 10, 0): 23.2', 
'datetime.datetime(2025, 4, 23, 14, 0): 23.5', 
'datetime.datetime(2025, 4, 23, 18, 0): 22.9'}}}

The output I'm after is:

outdoor.temperature | 2025_04_20 | 14 | 28.1 | ℃  
outdoor.temperature | 2025_04_20 | 18 | 23.8 | ℃  
.....etc  
indoor.temperature | 2025_04_20 | 14 | 23.5 | ℃  
indoor.temperature | 2025_04_20 | 18 | 23.8 | ℃  
.....etc

The list of datetimes and temperatures could run to hundreds of records.

I've tried from_dict, concat / unstack, normalize, flatten, and combinations of the lot, but I'm beaten.


Solution

  • I changed your dictionary constructor so that each 'list' value is a dictionary keyed by datetime objects, rather than a set of strings of the form datetime.datetime(#, #, #, #, #): ##.#, since I took a dictionary to be the intended structure:

    import datetime

    my_dict1 = {
        'data.outdoor.temperature': {'unit': '℃', 'list': {
            datetime.datetime(2025, 4, 23, 10, 0): '22.3',
            datetime.datetime(2025, 4, 23, 14, 0): '21.3',
            datetime.datetime(2025, 4, 23, 18, 0): '18.2',
        }},
        'data.indoor.temperature': {'unit': '℃', 'list': {
            datetime.datetime(2025, 4, 23, 10, 0): '23.2',
            datetime.datetime(2025, 4, 23, 14, 0): '23.5',
            datetime.datetime(2025, 4, 23, 18, 0): '22.9',
        }},
    }
    

    Afterwards, I built a DataFrame from the dictionary, appended the unit column to the index, expanded the list column into one column per timestamp (apply(pd.Series) unpacks each inner dictionary), then stacked the result and reset the index.

    import pandas as pd

    df = pd.DataFrame.from_dict(my_dict1, orient='index').set_index('unit', append=True)['list'].apply(pd.Series).stack().reset_index()
    

    This yielded the following:

                        level_0 unit             level_2     0
    0  data.outdoor.temperature    ℃ 2025-04-23 10:00:00  22.3
    1  data.outdoor.temperature    ℃ 2025-04-23 14:00:00  21.3
    2  data.outdoor.temperature    ℃ 2025-04-23 18:00:00  18.2
    3   data.indoor.temperature    ℃ 2025-04-23 10:00:00  23.2
    4   data.indoor.temperature    ℃ 2025-04-23 14:00:00  23.5
    5   data.indoor.temperature    ℃ 2025-04-23 18:00:00  22.9
    

    (It might appear misaligned because the ℃ character is wider than a standard monospaced character.) With the result assigned to df, you can then rename the columns by assigning to df.columns; I chose df.columns = ['where', 'unit', 'date', 'temp'].

    You can clean up the where column pretty simply: df['where'] = df['where'].str.replace(r'^data\.', '', regex=True).

    The constructor also leaves temp as a string; I would do df['temp'] = df['temp'].pipe(pd.to_numeric) to make it numeric.
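
To get closer to the layout asked for in the question (name | date | hour | temp | unit), here is a rough end-to-end sketch. It assumes the same datetime-keyed structure as my_dict1 above. Instead of apply(pd.Series) it flattens the nesting with a list comprehension, which is a swapped-in alternative rather than the method described above. The column names date and hour and the 2025_04_23 date format are my guesses from the desired output, so adjust as needed.

```python
import datetime

import pandas as pd

# Same shape as my_dict1 above: datetime keys, temperatures as strings.
my_dict1 = {
    'data.outdoor.temperature': {'unit': '℃', 'list': {
        datetime.datetime(2025, 4, 23, 10, 0): '22.3',
        datetime.datetime(2025, 4, 23, 14, 0): '21.3',
        datetime.datetime(2025, 4, 23, 18, 0): '18.2',
    }},
    'data.indoor.temperature': {'unit': '℃', 'list': {
        datetime.datetime(2025, 4, 23, 10, 0): '23.2',
        datetime.datetime(2025, 4, 23, 14, 0): '23.5',
        datetime.datetime(2025, 4, 23, 18, 0): '22.9',
    }},
}

# Flatten to one record per (sensor, timestamp) reading.
records = [
    {'where': name, 'timestamp': ts, 'temp': temp, 'unit': info['unit']}
    for name, info in my_dict1.items()
    for ts, temp in info['list'].items()
]
df = pd.DataFrame(records)

# Drop the leading "data." prefix and make the temperatures numeric.
df['where'] = df['where'].str.replace(r'^data\.', '', regex=True)
df['temp'] = pd.to_numeric(df['temp'])

# Split the timestamp into a date string and an hour (assumed column names).
df['timestamp'] = pd.to_datetime(df['timestamp'])
df['date'] = df['timestamp'].dt.strftime('%Y_%m_%d')
df['hour'] = df['timestamp'].dt.hour

# Reorder to match the layout in the question.
df = df[['where', 'date', 'hour', 'temp', 'unit']]
print(df)
```

Building plain records first keeps the reshaping step easy to follow and should cope fine with a few hundred readings per sensor.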