pythondata-structuresgraphnestedadjacency-list

How to build a nested adjacency list from an adjacency list and a hierarchy?


I have a simple adjacency list representation of a graph like this

{
    1: [2, 3, 4],
    2: [5],
    3: [6, 9],
    4: [3],
    5: [3],
    6: [7, 8],
    7: [],
    8: [],
    9: []
}

which looks like this

enter image description here

But each node also has a "type ancestry" which tells us what "type" of node it is, and this is completely different from the hierarchy present in the adjacency list above. For example, it would tell us that "node 6 is of type 'BA', and all type 'BA' nodes are type 'B'".

For example, consider this ancestry:

{
    1: [], # no ancestry
    2: ['AA', 'A'], # Read this as "2 is a type AA node, and all type AA nodes are type A nodes"
    3: ['B'], # 3 is directly under type B
    4: [],
    5: ['AA', 'A'],
    6: ['BA', 'B'],
    7: ['BA', 'B'],
    8: ['BA', 'B'],
    9: ['BB', 'B']
}

which when visualized would look like this

enter image description here

However, instead of connecting nodes directly as per the adjacency list, we must use their "representative types" when available, where the representative types of the nodes would be the nodes just below the lowest common ancestor of their type ancestries. When visualized with this adjustment, it would look like this

enter image description here

So what I want to produce programmatically is a "hierarchical/nested" adjacency list for such a visualization, which would look like below. The main idea is to introduce a subtree for each key in the adjacency list (along with the edges field), which would in turn contain its own adjacency list and so forth (recursively).

{1: {'edges': ['A', 'B', 4], 'subgraphs': {}},
 4: {'edges': ['B'], 'subgraphs': {}},
 'A': {'edges': ['B'],
       'subgraphs': {'AA': {'edges': [],
                            'subgraphs': {2: {'edges': [5], 'subgraphs': {}},
                                          5: {'edges': [], 'subgraphs': {}}}}}},
 'B': {'edges': [],
       'subgraphs': {3: {'edges': ['BA', 'BB'], 'subgraphs': {}},
                     'BA': {'edges': [],
                            'subgraphs': {6: {'edges': [7, 8], 'subgraphs': {}},
                                          7: {'edges': [], 'subgraphs': {}},
                                          8: {'edges': [], 'subgraphs': {}}}},
                     'BB': {'edges': [],
                            'subgraphs': {9: {'edges': [], 'subgraphs': {}}}}}}}

What is an elegant way of transforming the original adjacency list + the separate "ancestry" map to produce such a data structure?


Solution

  • Basing this off of @trincot's answer (who wanted me to write my own answer instead of editing theirs as the change is non-trivial) with a critical change on how edges are made.

    def transform(adjacency, ancestry):
        # utility function to get the element in the list before the target, if one is present
        def get_element_before(lst, target):
            try:
                index = lst.index(target)
                return lst[index - 1] if index - 1 >= 0 else None
            except ValueError:
                return None
    
        # Finds the lowest common ancestor between 2 paths, assuming that the paths
        # are ordered from bottom to top
        def find_lca(path1, path2):
            lca = None
            for a, b in zip(path1[::-1],  path2[::-1]):
                if a == b:
                    lca = a
                else:
                    break
            return lca
    
        # Given two nodes, it adds a link between the correct pair of "representative" nodes
        # in the newly constructed nested graph
    
        def add_link(node1, node2, nodes):
            ancestry1, ancestry2 = ancestry[node1], ancestry[node2]
            # Special cases when the LCA cannot be found
            if not ancestry1 and not ancestry2:
                nodes[node1]['edges'].append(node2)
            elif not ancestry1:
                nodes[node1]['edges'].append(ancestry2[-1])
            elif not ancestry2:
                nodes[ancestry1[-1]]['edges'].append(node2)
            else:
                # When LCA is likely to be present
                lca = find_lca(ancestry1, ancestry2)
                if not lca:
                    # This can happen if the 2 nodes have completely disjoint hierarchy paths
                    nodes[ancestry1[-1]]['edges'].append(ancestry2[-1])
                    return
        
                # The node just below the LCA in each node serves as the "representative" node of that node in the newly built graph
                representative_node1 = get_element_before(ancestry1, lca)
                representative_node2 = get_element_before(ancestry2, lca)
                # If the two nodes are in the same subtree at the same level, they
                # will act as their own representative nodes
                representative_node1 = node1 if representative_node1 is None else representative_node1
                representative_node2 = node2 if representative_node2 is None else representative_node2
                nodes[representative_node1]['edges'].append(representative_node2)
    
    
        # Create the basic object (dict) for each node:
        nodes = { 
            subgraph: { 'edges': [], 'subgraphs': {} } 
                for node, subgraphs in ancestry.items()
                for subgraph in (node, *subgraphs)
        }
        # populate the "edges" attributes between basic nodes (or their "representative" nodes)
        for node, children in adjacency.items():
            for child in children:
                add_link(node, child, nodes)
        
        # keep track of the nodes that are to stay at the root level
        root = dict(nodes)
        # populate the "subgraphs" attributes
        for node, ancestors in ancestry.items():
            for child, parent in zip((node, *ancestors), ancestors):
                nodes[parent]['subgraphs'][child] = nodes[child]
                root.pop(child, None)
    
        return root
    
    

    Testing it

    adj_list = {
        1: [2, 3, 4],
        2: [5],
        3: [6, 9],
        4: [3],
        5: [3],
        6: [7, 8],
        7: [],
        8: [],
        9: []
    }
    
    ancestry = {
        1: [],
        2: ['AA', 'A'],
        3: ['B'],
        4: [],
        5: ['AA', 'A'],
        6: ['BA', 'B'],
        7: ['BA', 'B'],
        8: ['BA', 'B'],
        9: ['BB', 'B']
    }
    
    pprint.pprint(transform(adj_list, ancestry))
    

    produces

    {1: {'edges': ['A', 'B', 4], 'subgraphs': {}},
     4: {'edges': ['B'], 'subgraphs': {}},
     'A': {'edges': ['B'],
           'subgraphs': {'AA': {'edges': [],
                                'subgraphs': {2: {'edges': [5], 'subgraphs': {}},
                                              5: {'edges': [], 'subgraphs': {}}}}}},
     'B': {'edges': [],
           'subgraphs': {3: {'edges': ['BA', 'BB'], 'subgraphs': {}},
                         'BA': {'edges': [],
                                'subgraphs': {6: {'edges': [7, 8], 'subgraphs': {}},
                                              7: {'edges': [], 'subgraphs': {}},
                                              8: {'edges': [], 'subgraphs': {}}}},
                         'BB': {'edges': [],
                                'subgraphs': {9: {'edges': [], 'subgraphs': {}}}}}}}