Giving Hierarchy to a list of Titles


I have this list in Python:

titles = [
    '13.3. Risk',
    '13.3.1. Strategy',
    'SubStrategy',
    '13.3.2. Token',
    'Material',
    'Impact',
    'Aling'
]

and I would like to turn it into a hierarchy like this:

|_'13.3. Risk'
   |____'13.3.1. Strategy' 
   |     |____'SubStrategy' 
   |____'13.3.2. Token' 
        |____'Material' 
        |____'Impact' 
        |____'Aling'

I have tried regular expressions and also the anytree library, but I am not getting the result I want. I want to give the list some structure, as a tree or something similar.


Solution

  • I would reshape your data into the hierarchy in two passes, which I think is the easiest approach to follow. In practice I would probably do it in one pass, but two passes make each step clearer.

    Given:

    titles = [
        "13.3. Risk",
        "13.3.1. Strategy",
        "SubStrategy",
        "13.3.2. Token",
        "Material",
        "Impact",
        "Aling"
    ]
    

    Step 1: Build a dictionary of the numbered items, each holding its direct (unnumbered) children.

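    results = {}
    for title in titles:
        # title[:1] avoids an IndexError on an empty string
        if title[:1].isdigit():
            # "13.3.1. Strategy" -> "13.3.1"; split only once so
            # multi-word names such as "13.3. Risk Register" still work
            key = title.split(" ", 1)[0].strip(".")
            current = results.setdefault(key, {"name": title, "children": []})
            continue
        # An unnumbered title belongs to the most recent numbered title
        # (assumes the list starts with a numbered title)
        current["children"].append(title)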
    

    This will result in a dictionary that looks like:

    {
        "13.3": {
            "name": "13.3. Risk",
            "children": []
        },
        "13.3.1": {
            "name": "13.3.1. Strategy",
            "children": [
                "SubStrategy"
            ]
        },
        "13.3.2": {
            "name": "13.3.2. Token",
            "children": [
                "Material",
                "Impact",
                "Aling"
            ]
        }
    }
    
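The question mentions trying regular expressions. As an alternative to the `title[0].isnumeric()` check in step 1, a regex can extract the numeric key and distinguish numbered headings from plain ones in a single step. This is a minimal sketch: the helper name `split_numbered` and the pattern (dot-separated digits, an optional trailing dot, then whitespace) are assumptions about the title format, not part of the answer above.

```python
import re

# Assumed heading format: dot-separated digits, an optional trailing dot,
# then whitespace, e.g. "13.3.1. Strategy".
NUMBERED = re.compile(r"^(\d+(?:\.\d+)*)\.?\s+")


def split_numbered(title):
    """Return the numeric key (e.g. "13.3.1"), or None for an unnumbered title."""
    match = NUMBERED.match(title)
    return match.group(1) if match else None


print(split_numbered("13.3.1. Strategy"))     # 13.3.1
print(split_numbered("13.3. Risk Register"))  # 13.3
print(split_numbered("SubStrategy"))          # None
print(split_numbered("3D Model"))             # None
```

Unlike a check on the first character, this does not treat a title such as `"3D Model"` as numbered, because the digits must be followed by an optional dot and then whitespace.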

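    Step 2: Now iterate over that dictionary and attach each item to its parent's children, collecting the root(s) of the tree(s) along the way. A root is any node whose parent cannot be found. For example, the parent key of "13.3.1" is "13.3", which exists, while the parent key of "13.3" would be "13", which does not, so "13.3" becomes a root.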

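    roots = []
    for key, value in results.items():
        # "13.3.1" -> "13.3"; a key with no dot (e.g. "13") has no parent.
        # Without this check, rsplit would return the key itself and the
        # node would be appended to its own children.
        parent_key = key.rsplit(".", 1)[0] if "." in key else None
        if parent_key not in results:
            roots.append(value)
            continue
        results[parent_key]["children"].append(value)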
    
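As noted at the start of this answer, the two steps can also be folded into a single pass. This sketch assumes, as in the sample list, that every numbered parent appears before its children. The function name `build_tree` is only illustrative. Plain titles that appear before the first numbered heading are collected as roots rather than raising an error.

```python
import json


def build_tree(titles):
    """Single-pass version of steps 1 and 2.

    Assumes each numbered parent (e.g. "13.3") appears before its
    children (e.g. "13.3.1"), as in the sample list.
    """
    nodes = {}      # numeric key -> node dict
    roots = []
    current = None  # most recent numbered node
    for title in titles:
        if title[:1].isdigit():
            key = title.split(" ", 1)[0].strip(".")
            current = {"name": title, "children": []}
            nodes[key] = current
            parent_key = key.rsplit(".", 1)[0] if "." in key else None
            if parent_key in nodes:
                nodes[parent_key]["children"].append(current)
            else:
                roots.append(current)  # no known parent: start a new tree
        elif current is not None:
            current["children"].append(title)  # plain title: child of last heading
        else:
            roots.append(title)  # plain title before any numbered heading
    return roots


titles = [
    "13.3. Risk",
    "13.3.1. Strategy",
    "SubStrategy",
    "13.3.2. Token",
    "Material",
    "Impact",
    "Aling",
]
print(json.dumps(build_tree(titles), indent=4))
```

The trade-off is the ordering assumption. The two-pass version does not care whether a child heading appears before its parent, because all numbered nodes exist before any linking happens.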

    Now we can use our root(s):

    import json

    for root in roots:
        print("----------------------")
        print(json.dumps(root, indent=4))
        print("----------------------")
    

    Giving us:

    ----------------------
    {
        "name": "13.3. Risk",
        "children": [
            {
                "name": "13.3.1. Strategy",
                "children": [
                    "SubStrategy"
                ]
            },
            {
                "name": "13.3.2. Token",
                "children": [
                    "Material",
                    "Impact",
                    "Aling"
                ]
            }
        ]
    }
    ----------------------
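The JSON dump shows the structure, but the question asked for a drawn tree. A small recursive generator can walk a root like the one above and draw it in the question's `|____` style. The `render` function and its connector strings are illustrative choices. The `roots` value is copied from the output above so that the block runs on its own.

```python
# `roots` as printed above: nested {"name": ..., "children": [...]} dicts
# whose leaf children are plain strings.
roots = [
    {
        "name": "13.3. Risk",
        "children": [
            {"name": "13.3.1. Strategy", "children": ["SubStrategy"]},
            {"name": "13.3.2. Token", "children": ["Material", "Impact", "Aling"]},
        ],
    }
]


def render(node, prefix="", is_last=True):
    """Yield one text line per node, drawn in the question's |____ style."""
    if isinstance(node, dict):
        name, children = node["name"], node["children"]
    else:  # leaf titles are stored as bare strings
        name, children = node, []
    yield f"{prefix}|____{name!r}"
    # Continue this node's vertical bar only if more siblings follow it.
    child_prefix = prefix + ("      " if is_last else "|     ")
    for i, child in enumerate(children):
        yield from render(child, child_prefix, i == len(children) - 1)


for root in roots:
    print("\n".join(render(root)))
```

Yielding lines instead of printing them directly keeps the drawing reusable, for example for writing it to a file or joining it into a single string.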