python dictionary jagged-arrays awkward-array

Filtering elements of a jagged list in a nested dictionary using Awkward


I have a large nested dictionary in which 'c' is a jagged list:

x = {'first_block':
        {'unit1': {'a': (3,5,4), 'b': 23, 'c': [10]},
         'unit2': {'a': (5,8,7), 'b': 15, 'c': [20,10]},
         'unit10k': {'a': (2,4,9), 'b': 10, 'c': [6,10,20,5]}},

     'second_block':
        {'unit1': {'a': (8,20,14), 'b': 10, 'c': [17,12,9]},
         'unit2': {'a': (9,25,50), 'b': 15, 'c': [17,15,9,4,12]},
         'unit12k': {'a': (12,24,9), 'b': 23, 'c': [12,22,15,4]}},

     'millionth_block':
        {'unit1': {'a': (35,64,85), 'b': 64, 'c': [50]},
         'unit2': {'a': (56,23,34), 'b': 55, 'c': [89,59,77]},
         'unit5k': {'a': (90,28,12), 'b': 85, 'c': [48,90,27,59]}}}

The elements of 'c' are point labels.

For every unique point label in 'c', I want to produce a list of the corresponding values of 'b'.

For example, the unique elements of 'c' in 'first_block' are 5, 6, 10 and 20. For each block, I want to extract one list per point label, containing every value of 'b' whose 'c' includes that label, e.g.:

first_block:
5: [10]
6: [10]
10: [10,15,23]
20: [10,15]
second_block:
4: [15,23]
9: [10,15]
12: [10,15,23]
15: [15,23]
17: [10,15]
22: [23]
etc.
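The keys of each block in the listing above are simply the distinct labels found in that block's 'c' lists once they are flattened. A minimal sketch that computes them with a set comprehension over the example x (the name unique_labels is only illustrative):

```python
# The example dictionary from the question.
x = {
    'first_block': {
        'unit1': {'a': (3, 5, 4), 'b': 23, 'c': [10]},
        'unit2': {'a': (5, 8, 7), 'b': 15, 'c': [20, 10]},
        'unit10k': {'a': (2, 4, 9), 'b': 10, 'c': [6, 10, 20, 5]},
    },
    'second_block': {
        'unit1': {'a': (8, 20, 14), 'b': 10, 'c': [17, 12, 9]},
        'unit2': {'a': (9, 25, 50), 'b': 15, 'c': [17, 15, 9, 4, 12]},
        'unit12k': {'a': (12, 24, 9), 'b': 23, 'c': [12, 22, 15, 4]},
    },
    'millionth_block': {
        'unit1': {'a': (35, 64, 85), 'b': 64, 'c': [50]},
        'unit2': {'a': (56, 23, 34), 'b': 55, 'c': [89, 59, 77]},
        'unit5k': {'a': (90, 28, 12), 'b': 85, 'c': [48, 90, 27, 59]},
    },
}

# Flatten each block's jagged 'c' lists into a set (dropping duplicates), then sort.
unique_labels = {
    block: sorted({label for unit in units.values() for label in unit['c']})
    for block, units in x.items()
}

for block, labels in unique_labels.items():
    print(block, labels)
```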

How can I produce this result, given that 'c' is jagged?

I have been trying to do this by converting to Awkward Arrays, but the documentation is currently sparse and I don't really understand how to do it in Awkward.

I'm also open to Pythonic suggestions that don't involve Awkward.
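A columnar way to think about the jagged 'c' is to flatten it into one label per row and broadcast each unit's scalar 'b' to the same length (in Awkward, roughly what ak.flatten and ak.broadcast_arrays provide), then group equal labels. The sketch below mimics those steps with plain lists so it runs without Awkward; it illustrates the idea and is not Awkward code, and the helper name group_b_by_label is hypothetical.

```python
from itertools import groupby

# The example dictionary from the question.
x = {
    'first_block': {
        'unit1': {'a': (3, 5, 4), 'b': 23, 'c': [10]},
        'unit2': {'a': (5, 8, 7), 'b': 15, 'c': [20, 10]},
        'unit10k': {'a': (2, 4, 9), 'b': 10, 'c': [6, 10, 20, 5]},
    },
    'second_block': {
        'unit1': {'a': (8, 20, 14), 'b': 10, 'c': [17, 12, 9]},
        'unit2': {'a': (9, 25, 50), 'b': 15, 'c': [17, 15, 9, 4, 12]},
        'unit12k': {'a': (12, 24, 9), 'b': 23, 'c': [12, 22, 15, 4]},
    },
    'millionth_block': {
        'unit1': {'a': (35, 64, 85), 'b': 64, 'c': [50]},
        'unit2': {'a': (56, 23, 34), 'b': 55, 'c': [89, 59, 77]},
        'unit5k': {'a': (90, 28, 12), 'b': 85, 'c': [48, 90, 27, 59]},
    },
}


def group_b_by_label(units):
    """Map each point label in the jagged 'c' lists to the 'b' values of its units."""
    # "Flatten": one entry per label across all units of the block.
    labels = [label for unit in units.values() for label in unit['c']]
    # "Broadcast": repeat each unit's 'b' once per entry in its 'c'.
    b_values = [unit['b'] for unit in units.values() for _ in unit['c']]
    # Sorting the (label, b) pairs makes equal labels adjacent (and b ascending),
    # so groupby can collect them.
    pairs = sorted(zip(labels, b_values))
    return {label: [b for _, b in group]
            for label, group in groupby(pairs, key=lambda pair: pair[0])}


results = {block: group_b_by_label(units) for block, units in x.items()}

for block, labels in results.items():
    print(f"{block}:")
    for label, bs in labels.items():
        print(f"{label}: {bs}")
```

Because the pairs are sorted as whole tuples, both the labels and the 'b' values within each label come out in ascending order, matching the listing in the question.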


Solution

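  • Try this; it produces the output you want, with the point labels in each block sorted. Note that the 'b' values for each label stay in unit order (e.g. 10 [23, 15, 10]) rather than ascending as in the question.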

    x = {'first_block':
            {'unit1': {'a': (3,5,4), 'b': 23, 'c': [10]},
             'unit2': {'a': (5,8,7), 'b': 15, 'c': [20,10]},
             'unit10k': {'a': (2,4,9), 'b': 10, 'c': [6,10,20,5]}},

         'second_block':
            {'unit1': {'a': (8,20,14), 'b': 10, 'c': [17,12,9]},
             'unit2': {'a': (9,25,50), 'b': 15, 'c': [17,15,9,4,12]},
             'unit12k': {'a': (12,24,9), 'b': 23, 'c': [12,22,15,4]}},

         'millionth_block':
            {'unit1': {'a': (35,64,85), 'b': 64, 'c': [50]},
             'unit2': {'a': (56,23,34), 'b': 55, 'c': [89,59,77]},
             'unit5k': {'a': (90,28,12), 'b': 85, 'c': [48,90,27,59]}}}
    
    results = {}

    for block, units in x.items():  # block-level key, e.g. 'first_block'
        block_result = {}

        for unit in units.values():  # each unit's sub-dict
            for label in unit['c']:  # point labels in the jagged list 'c'
                # Create the list the first time a label is seen,
                # then append this unit's 'b'
                block_result.setdefault(label, []).append(unit['b'])

        # Sort the block's dict by point label
        results[block] = dict(sorted(block_result.items()))

    for block, labels in results.items():
        print(block)
        for label, b_values in labels.items():
            print(label, b_values)
    

    Output:

    first_block
    5 [10]
    6 [10]
    10 [23, 15, 10]
    20 [15, 10]
    second_block
    4 [15, 23]
    9 [10, 15]
    12 [10, 15, 23]
    15 [15, 23]
    17 [10, 15]
    22 [23]
    millionth_block
    27 [85]
    48 [85]
    50 [64]
    59 [55, 85]
    77 [55]
    89 [55]
    90 [85]
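If the 'b' values should also be ascending, as in the question's listing (10: [10,15,23] rather than 10 [23, 15, 10]), a small variant of the loop above can sort each list as well. This sketch swaps the manual "create the list if missing" step for collections.defaultdict and prints in the question's "label: [values]" layout:

```python
from collections import defaultdict

# The example dictionary from the question.
x = {
    'first_block': {
        'unit1': {'a': (3, 5, 4), 'b': 23, 'c': [10]},
        'unit2': {'a': (5, 8, 7), 'b': 15, 'c': [20, 10]},
        'unit10k': {'a': (2, 4, 9), 'b': 10, 'c': [6, 10, 20, 5]},
    },
    'second_block': {
        'unit1': {'a': (8, 20, 14), 'b': 10, 'c': [17, 12, 9]},
        'unit2': {'a': (9, 25, 50), 'b': 15, 'c': [17, 15, 9, 4, 12]},
        'unit12k': {'a': (12, 24, 9), 'b': 23, 'c': [12, 22, 15, 4]},
    },
    'millionth_block': {
        'unit1': {'a': (35, 64, 85), 'b': 64, 'c': [50]},
        'unit2': {'a': (56, 23, 34), 'b': 55, 'c': [89, 59, 77]},
        'unit5k': {'a': (90, 28, 12), 'b': 85, 'c': [48, 90, 27, 59]},
    },
}

results = {}
for block, units in x.items():
    by_label = defaultdict(list)  # a label seen for the first time starts as []
    for unit in units.values():
        for label in unit['c']:
            by_label[label].append(unit['b'])
    # Sort the labels, and also the 'b' values collected under each label.
    results[block] = {label: sorted(bs) for label, bs in sorted(by_label.items())}

for block, labels in results.items():
    print(f"{block}:")
    for label, bs in labels.items():
        print(f"{label}: {bs}")
```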