
Sort and select from a complex nested dictionary using heapq


I have the following nested dictionary:

a={'2020-12-08':
        {'navi.o_efx': {'coint_value': 0.923033, 'hl_value': 0.475025},
        'stm.n_efx': {'coint_value': 0.915424, 'hl_value': 0.294162},
        'kioo.o_efx': {'coint_value': 0.92575, 'hl_value': 0.369817}},
   '2020-09-24':
        {'navi.o_qrvo.o': {'coint_value': 0.919749, 'hl_value': 0.215322},
        'qrvo.o_efx': {'coint_value': 0.976447, 'hl_value': 0.11208},
        'navi.o_stm.n': {'coint_value': 0.974414, 'hl_value': 0.168408},
        'qrvo.o_stm.n': {'coint_value': 0.964797, 'hl_value': 0.14407},
        'stm.n_efx': {'coint_value': 0.935519, 'hl_value': 0.166952}},
   '2020-11-01':
       {'qrvo.o_stm.n': {'coint_value': 0.95096, 'hl_value': 0.104152}}
   }

I want to use heapq to sort on 'hl_value' and keep, for each date, the 2 sub-dictionaries with the smallest values. For example, the final output should look like this:

a={'2020-12-08':
        {'stm.n_efx': {'coint_value': 0.915424, 'hl_value': 0.294162},
        'kioo.o_efx': {'coint_value': 0.92575, 'hl_value': 0.369817}},
   '2020-09-24':
        {'qrvo.o_efx': {'coint_value': 0.976447, 'hl_value': 0.11208},
        'qrvo.o_stm.n': {'coint_value': 0.964797, 'hl_value': 0.14407}},
   '2020-11-01':
       {'qrvo.o_stm.n': {'coint_value': 0.95096, 'hl_value': 0.104152}}
   }

I tried the code below, but it doesn't work:

for k, v in a.items():
    for i_k, i_v in v.items():
        a[k][i_k] = dict(heapq.nsmallest(2, i_v.items(), key=i_v['hl_value']))

Solution
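
The attempt in the question fails for two reasons. First, key must be a function, but i_v['hl_value'] is a float, so Python raises a TypeError when it tries to call it. Second, the inner loop goes one level too deep: nsmallest is handed the fields of a single entry rather than all entries for a date. A minimal sketch of the difference, using one date's data from the question (the names pairs, failed and two_smallest are only illustrative):

```python
import heapq

# One date's worth of data, copied from the question.
pairs = {
    'navi.o_efx': {'coint_value': 0.923033, 'hl_value': 0.475025},
    'stm.n_efx': {'coint_value': 0.915424, 'hl_value': 0.294162},
    'kioo.o_efx': {'coint_value': 0.92575, 'hl_value': 0.369817},
}

# The question's call, reduced to one inner entry: key is a float, not a function.
i_v = pairs['navi.o_efx']
try:
    heapq.nsmallest(2, i_v.items(), key=i_v['hl_value'])
    failed = False
except TypeError:
    failed = True

# Passing all entries for the date and a callable key works as intended.
# Each item is a (pair_name, stats) tuple, so item[1] is the stats dict.
two_smallest = dict(
    heapq.nsmallest(2, pairs.items(), key=lambda item: item[1]['hl_value'])
)
print(failed)
print(list(two_smallest))
```

So the fix is to call nsmallest once per date, on the items of that date's dictionary, with a key function that reads 'hl_value' from each entry.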

  • Here is a solution using heapq.nsmallest, called once per date on that date's items:

    import heapq
    
    a = {
        '2020-12-08':
            {
                'navi.o_efx': {'coint_value': 0.923033, 'hl_value': 0.475025},
                'stm.n_efx': {'coint_value': 0.915424, 'hl_value': 0.294162},
                'kioo.o_efx': {'coint_value': 0.92575, 'hl_value': 0.369817}
            },
       '2020-09-24':
            {
                'navi.o_qrvo.o': {'coint_value': 0.919749, 'hl_value': 0.215322},
                'qrvo.o_efx': {'coint_value': 0.976447, 'hl_value': 0.11208},
                'navi.o_stm.n': {'coint_value': 0.974414, 'hl_value': 0.168408},
                'qrvo.o_stm.n': {'coint_value': 0.964797, 'hl_value': 0.14407},
                'stm.n_efx': {'coint_value': 0.935519, 'hl_value': 0.166952}
            },
       '2020-11-01':
            {
                'qrvo.o_stm.n': {'coint_value': 0.95096, 'hl_value': 0.104152}
            }
    }
    
    
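    # For each date, keep only the 2 entries with the smallest 'hl_value'.
    # Each item is a (pair_name, stats) tuple, so item[1] is the stats dict.
    for key in a:
        result = heapq.nsmallest(2, a[key].items(), key=lambda item: item[1]['hl_value'])
        a[key] = dict(result)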
    
    print(a)
    

    Output:

    {'2020-12-08': {'stm.n_efx': {'coint_value': 0.915424, 'hl_value': 0.294162}, 'kioo.o_efx': {'coint_value': 0.92575, 'hl_value': 0.369817}}, '2020-09-24': {'qrvo.o_efx': {'coint_value': 0.976447, 'hl_value': 0.11208}, 'qrvo.o_stm.n': {'coint_value': 0.964797, 'hl_value': 0.14407}}, '2020-11-01': {'qrvo.o_stm.n': {'coint_value': 0.95096, 'hl_value': 0.104152}}}
    

    Output (pretty printed via print(json.dumps(a, indent=4))):

    {
        "2020-12-08": {
            "stm.n_efx": {
                "coint_value": 0.915424,
                "hl_value": 0.294162
            },
            "kioo.o_efx": {
                "coint_value": 0.92575,
                "hl_value": 0.369817
            }
        },
        "2020-09-24": {
            "qrvo.o_efx": {
                "coint_value": 0.976447,
                "hl_value": 0.11208
            },
            "qrvo.o_stm.n": {
                "coint_value": 0.964797,
                "hl_value": 0.14407
            }
        },
        "2020-11-01": {
            "qrvo.o_stm.n": {
                "coint_value": 0.95096,
                "hl_value": 0.104152
            }
        }
    }
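
  • If the original dictionary should stay untouched, the same idea can be wrapped in a function that builds a new dictionary. This is only a sketch: the function name keep_n_smallest and its parameters n and field are hypothetical, not part of heapq, and the sample data is a subset of the question's.

```python
import heapq


def keep_n_smallest(data, n=2, field='hl_value'):
    """Return a new dict keeping, per date, the n entries with the smallest `field`."""
    return {
        date: dict(heapq.nsmallest(n, entries.items(), key=lambda item: item[1][field]))
        for date, entries in data.items()
    }


# Subset of the question's data, used only for illustration.
sample = {
    '2020-12-08': {
        'navi.o_efx': {'coint_value': 0.923033, 'hl_value': 0.475025},
        'stm.n_efx': {'coint_value': 0.915424, 'hl_value': 0.294162},
        'kioo.o_efx': {'coint_value': 0.92575, 'hl_value': 0.369817},
    },
    '2020-11-01': {
        'qrvo.o_stm.n': {'coint_value': 0.95096, 'hl_value': 0.104152},
    },
}

trimmed = keep_n_smallest(sample)
print(trimmed)
```

    A date with fewer than n entries simply keeps everything it has. Because nsmallest returns its results in ascending order and dictionaries preserve insertion order (Python 3.7+), each inner dictionary ends up ordered by 'hl_value'.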