python dictionary recursion nested depth-first-search

Get list of all sub-keys from a dictionary


I have some dictionaries (JSON output). I want to get the base element of each one, which can be a string or a list of strings. Currently I am doing it like this:

import json
import os

folder = "shared/"
files = os.listdir(folder)

for name in files:
    path = folder + name
    print(path)
    with open(path) as fh:
        data = json.load(fh)
    # data is a dict now with sub-keys
    for key, value in data.items():
        if value.keys():
            print(value)
    break

This is the input dictionary that the Python code read:

{
  "shortshirt": {
    "ralphlauren": {
      "classic": [
        "That Ralph Lauren classic fit is a timeless look!",
        "Nice choice. Can’t go wrong with Ralph Lauren"
      ]
    }
  },
  "socks": {
    "": {
      "": ["Have to find the right socks to keep your feet cozy"]
    }
  }
}

And this is the output that I am getting:

{'ralphlauren': {'classic': ['That Ralph Lauren classic fit is a timeless look!', 'Nice choice. Can’t go wrong with Ralph Lauren']}}
{'': {'': ['Have to find the right socks to keep your feet cozy']}}

But this is what I want:

keys=[["shortshirt","ralphlauren","classic"],["socks"]]

value=[['That Ralph Lauren classic fit is a timeless look!', 'Nice choice. Can’t go wrong with Ralph Lauren'], ['Have to find the right socks to keep your feet cozy']]

But I don't know whether to use 2 or 3 levels of nested loops. If I add an inner loop and there is actually no nested key at that level, I get a value error. I want to collect all the nested keys in one list and the base value or values (those at the lowest level) in another list. Any help with this would be highly appreciated.


Solution

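  • Generators are useful for this problem: a recursive generator can walk the dictionary depth-first and yield one result per leaf, so the nesting depth does not need to be known in advance.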

    Code:

    def getitems(obj):
    
      def getkeys(obj, stack):
        for k, v in obj.items():
          # append to the path so it reads outermost key first; don't return empty keys
          k2 = stack + ([k] if k else [])
          if v and isinstance(v, dict):
            yield from getkeys(v, k2)
          else: # leaf
            yield k2
    
      def getvalues(obj):
        for v in obj.values():
          if not v: continue
          if isinstance(v, dict):
            yield from getvalues(v)
          else: # leaf
            yield v if isinstance(v, list) else [v] # wrap a bare string in a list
    
      return list(getkeys(obj,[])), list(getvalues(obj))
    

    Input:

    {
      "shortshirt": {
        "ralphlauren": {
          "classic": [
            "That Ralph Lauren classic fit is a timeless look!",
            "Nice choice. Can't go wrong with Ralph Lauren"
          ]
        }
      },
      "socks": {
        "": {
          "": ["Have to find the right socks to keep your feet cozy"]
        }
      }
    }
    

    Output:

    # keys
    [['shortshirt', 'ralphlauren', 'classic'], ['socks']]
    
    # values
    [['That Ralph Lauren classic fit is a timeless look!', "Nice choice. Can't go wrong with Ralph Lauren"], ['Have to find the right socks to keep your feet cozy']]
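
As a possible variation, the two traversals can be folded into a single generator that yields each key path together with its leaf value, which keeps the two lists aligned by construction. This is only a sketch: the names `walk_leaves` and `split_items` are made up for illustration, and it assumes that empty keys should be dropped from the path and that empty leaves (empty list, empty string, empty dict) should be skipped entirely.

```python
def walk_leaves(obj, path=()):
    """Yield (key_path, value) for every non-empty leaf, depth-first."""
    for key, value in obj.items():
        # Drop empty keys so "socks" -> "" -> "" collapses to ["socks"].
        new_path = (path + (key,)) if key else path
        if isinstance(value, dict):
            # Recurse into nested dicts; an empty dict yields nothing.
            yield from walk_leaves(value, new_path)
        elif value:
            # Non-empty leaf: wrap a bare string so every value is a list.
            leaf = value if isinstance(value, list) else [value]
            yield list(new_path), leaf


def split_items(obj):
    """Return two parallel lists: key paths and their leaf values."""
    pairs = list(walk_leaves(obj))
    keys = [path for path, _ in pairs]
    values = [leaf for _, leaf in pairs]
    return keys, values


data = {
    "shortshirt": {
        "ralphlauren": {
            "classic": [
                "That Ralph Lauren classic fit is a timeless look!",
                "Nice choice. Can't go wrong with Ralph Lauren",
            ]
        }
    },
    "socks": {"": {"": ["Have to find the right socks to keep your feet cozy"]}},
}

keys, values = split_items(data)
print(keys)    # [['shortshirt', 'ralphlauren', 'classic'], ['socks']]
print(values)
```

Because each path is yielded in the same step as its value, `keys[i]` always describes `values[i]`, even when some branches are empty.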