python recursive-datastructures

Get values from an unknown hierarchy of lists and dicts


Let's say I have some data whose structure is not known in advance, except that it is some combination of lists, dictionaries, and string values. I would like to extract only the string values (items of a list, values of a dict, and plain strings) and store them in a list.

So it could be:

d = {
    'key1': {
        'key2': {
           'key3': [
              {'key4': 'val1', 'key5': 'val2'}, {'key6': 'val3', 'key7': 'val4'}, 
              {'key8': 'val5', 'key9': 'val6'}, {'key10': 'val7', 'key11': 'val8'},
              'val9',                
           ]
        }, 
        'key12': 'val10'
    }
}

It could even have another list under the lowest dict. I have the following helper functions, which I find useful for flattening a nested list and for traversing a dict. Is there a nice way of accomplishing this, perhaps recursively?

def traverse(value, key=None):
    if isinstance(value, dict):
        for k, v in value.items():
            yield from traverse(v, k)
    else:
        yield key, value

def flatten(_2d_list):
    flat_list = []
    for element in _2d_list:
        if type(element) is list:
            for item in element:
                flat_list.append(item)
        else:
            flat_list.append(element)
    return flat_list

Solution

  • You can use recursion to traverse your data and, based on the type of each element, yield the string values it contains. Since you only want the string values, you don't need to pass the dict keys to your function. Here's a simple recursive generator that yields the string values from your data:

    def traverse(value):
        if isinstance(value, dict):
            for v in value.values():
                yield from traverse(v)
        elif isinstance(value, list):
            for v in value:
                yield from traverse(v)
        elif isinstance(value, str):
            yield value
    
    d = {
        'key1': {
            'key2': {
               'key3': [
                  {'key4': 'val1', 'key5': 'val2'}, {'key6': 'val3', 'key7': 'val4'}, 
                  {'key8': 'val5', 'key9': 'val6'}, {'key10': 'val7', 'key11': 'val8'},
                  'val9',                
               ]
            }, 
            'key12': 'val10'
        }
    }
    
    print(list(traverse(d)))
    

    Output:

    ['val1', 'val2', 'val3', 'val4', 'val5', 'val6', 'val7', 'val8', 'val9', 'val10']
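
If the data might also contain tuples or other dict-like and list-like containers, the same idea can be written against the abstract base classes in `collections.abc`. This is only a sketch under that assumption, not part of the original answer; the name `iter_strings` and the sample `data` are made up for illustration. The `str` check has to come first, because a string is itself a `Sequence` and iterating over it would recurse without end.

```python
from collections.abc import Mapping, Sequence


def iter_strings(value):
    """Yield every string found in a nested structure of mappings and sequences."""
    if isinstance(value, str):
        # Check str first: a str is itself a Sequence, and iterating a
        # one-character string yields that same string again.
        yield value
    elif isinstance(value, Mapping):
        # Only the values are of interest, so the keys are ignored.
        for v in value.values():
            yield from iter_strings(v)
    elif isinstance(value, Sequence) and not isinstance(value, (bytes, bytearray)):
        # Covers lists, tuples and other sequence types.
        for v in value:
            yield from iter_strings(v)
    # Anything else (numbers, None, bytes, ...) is silently skipped.


# Hypothetical sample data mixing a tuple, a list, a dict and non-string values.
data = {'a': ('x', ['y', {'b': 'z'}]), 'c': 1, 'd': None}
print(list(iter_strings(data)))  # ['x', 'y', 'z']
```

Like the generator above, this still relies on recursion, so very deeply nested input could hit Python's recursion limit.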