python, json, jsonpath, json-query

Get a list of JSON paths in Python


I'm looking to get a list of all possible JSON paths in a JSON file. Can anyone recommend a way to do this?

E.g., if the input is:

{
   "_id":{
      "$oid":""
   },
   "aa":false,
   "bb":false,
   "source":"",
   "email":"",
   "createdAt":{
      "$date":""
   },
   "updatedAt":{
      "$date":""
   },
   "cc":"",
   "vv":"",
   "metadata":{
      "vv":"",
      "xx":[{}]
   }
}

then the output should be:

obj
obj._id
obj._id.$oid
obj.aa
obj.bb
obj.source
obj.email
obj.createdAt
obj.createdAt.$date
obj.updatedAt
obj.updatedAt.$date
obj.cc
obj.vv
obj.metadata
obj.metadata.vv
obj.metadata.xx
obj.metadata.xx[0]

I'm basically looking for a Python version of this: https://www.convertjson.com/json-path-list.htm

I want to build a general solution that works for any JSON file. Each input will be a single JSON value used for schema generation (i.e. one line of a newline-delimited JSON file). Any suggestions?


Solution

  • You can do this in a reasonably succinct way with a recursive generator. The string "obj" is a little awkward since it doesn't occur in the data structure. On the other hand, adding it at the end is simple:

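    def get_paths(d):
        # d is the already-parsed JSON value (e.g. the result of json.loads)
        if isinstance(d, dict):
            for key, value in d.items():
                yield f'.{key}'  # path to this key
                # paths below this key, each prefixed with the key
                yield from (f'.{key}{p}' for p in get_paths(value))

        elif isinstance(d, list):
            for i, value in enumerate(d):
                yield f'[{i}]'  # path to this list element
                yield from (f'[{i}]{p}' for p in get_paths(value))

    # prepend the root name to every relative path
    paths = ['obj'+s for s in get_paths(d)]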
    

    This gives you the paths as a list of strings:

    ['obj._id',
     'obj._id.$oid',
     'obj.aa',
     'obj.bb',
     'obj.source',
     'obj.email',
     'obj.createdAt',
     'obj.createdAt.$date',
     'obj.updatedAt',
     'obj.updatedAt.$date',
     'obj.cc',
     'obj.vv',
     'obj.metadata',
     'obj.metadata.vv',
     'obj.metadata.xx',
     'obj.metadata.xx[0]']
    

    Of course, you can wrap that last step in a function like this and accept a root object string:

    def get_paths(d, root="obj"):
        def recur(d):
            # recurse with recur(), not get_paths(), so the root is only added once
            if isinstance(d, dict):
                for key, value in d.items():
                    yield f'.{key}'
                    yield from (f'.{key}{p}' for p in recur(value))

            elif isinstance(d, list):
                for i, value in enumerate(d):
                    yield f'[{i}]'
                    yield from (f'[{i}]{p}' for p in recur(value))

        return (root + p for p in recur(d))
    
    list(get_paths(d))
    # same result
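
Since the question mentions newline-delimited JSON, here is a rough sketch of how the same recursive idea could be applied across many records. It collects the distinct paths in first-seen order and also includes the bare root, as in the question's expected output. The helper names `iter_paths` and `ndjson_paths` and the two-line `sample` input are made up for illustration; they are not from the original answer.

```python
import json

def iter_paths(node, prefix="obj"):
    """Yield the full path of every key and list index below node."""
    if isinstance(node, dict):
        for key, value in node.items():
            path = f"{prefix}.{key}"
            yield path
            yield from iter_paths(value, path)
    elif isinstance(node, list):
        for i, value in enumerate(node):
            path = f"{prefix}[{i}]"
            yield path
            yield from iter_paths(value, path)

def ndjson_paths(lines, root="obj"):
    """Return the root plus the distinct paths seen across all records."""
    seen = {}  # a dict keeps insertion order, so paths stay in first-seen order
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        for path in iter_paths(json.loads(line), root):
            seen.setdefault(path, None)
    return [root, *seen]

# Hypothetical two-record input; in practice `lines` could be an open file.
sample = [
    '{"_id": {"$oid": ""}, "aa": false}',
    '{"aa": true, "metadata": {"xx": [{}]}}',
]
print(ndjson_paths(sample))
# ['obj', 'obj._id', 'obj._id.$oid', 'obj.aa',
#  'obj.metadata', 'obj.metadata.xx', 'obj.metadata.xx[0]']
```

Passing the accumulated prefix down the recursion avoids rebuilding strings on the way back up, but either approach gives the same paths.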