python json json-flattener

Python - Method needed to flatten highly nested JSON, e.g. "class.properties.name.properties.firstname"


I need to take a highly nested JSON file (e.g. an Elasticsearch mapping for an index) and produce a flat list of dotted key paths, one per field.
Example Elasticsearch mapping:

{
    "mappings": {
        "properties": {
            "class": {
                "properties": {
                    "name": {
                        "properties": {
                            "firstname": {
                                "type": "text"
                            },
                            "lastname": {
                                "type": "text"
                            }
                        }
                    },
                    "age": {
                        "type": "text"
                    }
                }
            }
        }
    }
}

Example desired result:

["mappings.properties.class.properties.name.properties.firstname",
 "mappings.properties.class.properties.name.properties.lastname",
 "mappings.properties.class.properties.age"]

pandas.json_normalize() doesn't quite do what I want, and neither does glom().


Solution

  • You should be able to do this with a fairly short recursive generator. I'm assuming you want every key path down to the first dict that contains a "type" key:

    d = {
        "mappings": {
            "properties": {
                "class": {
                    "properties": {
                        "name": {
                            "properties": {
                                "firstname": {
                                    "type": "text"
                                },
                                "lastname": {
                                    "type": "text"
                                }
                            }
                        },
                        "age": {
                            "type": "text"
                        }
                    }
                }
            }
        }
    }
    
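    def all_keys(d, path=None):
        """Yield the dotted key path to every leaf of a nested mapping."""
        if path is None:
            path = []
        # A leaf is any non-dict value, or a dict that has a 'type' key.
        if not isinstance(d, dict) or 'type' in d:
            yield '.'.join(path)
            return
        for k, v in d.items():
            # Build a new list per branch so siblings don't share state.
            yield from all_keys(v, path + [k])
    
    print(list(all_keys(d)))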
    

    Which gives:

    ['mappings.properties.class.properties.name.properties.firstname',
     'mappings.properties.class.properties.name.properties.lastname',
     'mappings.properties.class.properties.age']
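
One edge case follows from the "stop at a dict with type in it" assumption: if the mapping has a field that is itself named "type", the dict that holds it also contains a "type" key, so the walk stops one level too early. A minimal sketch of a stricter variant, assuming a leaf is a dict whose "type" value is a string (the function name all_field_paths and the sample mapping are made up for illustration):

```python
def all_field_paths(node, path=()):
    """Yield the dotted path to every leaf field definition.

    Assumption: a leaf is a dict whose "type" value is a string.
    A "type" key that holds a dict is treated as a field named "type".
    """
    if not isinstance(node, dict):
        # Scalars and lists end the walk at the current path.
        yield ".".join(path)
        return
    if isinstance(node.get("type"), str):
        # Looks like a field definition such as {"type": "text"}.
        yield ".".join(path)
        return
    for key, value in node.items():
        yield from all_field_paths(value, path + (key,))


# Hypothetical mapping with a field that is literally called "type".
mapping = {
    "mappings": {
        "properties": {
            "type": {"type": "keyword"},
            "name": {"properties": {"first": {"type": "text"}}},
        }
    }
}

paths = list(all_field_paths(mapping))
print(paths)
# ['mappings.properties.type', 'mappings.properties.name.properties.first']
```

With the simpler check ('type' in d), the same input would stop at 'mappings.properties'. Neither version descends into a field that declares both a string "type" (for example "nested") and its own "properties"; such a field is reported as a single leaf.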