algorithmgraphgraph-theoryconnected-componentsunion-find

Convert list of lists into connected components of a graph to find node degrees


I have an input of list of lists (in Python) as the following:

lst = [['1', '2'], ['1', '3', '2'], ['5'], ['4', '6']]

Where each integer string represents a node in the graph, and each list within the list represents connectedness between the nodes. So, in the above example, nodes {1, 2, 3} are all connected, {4, 6} are connected, and {5} is connected. I want to figure out the node degrees of the graph above. So, the output would be the number of edges connected to each node, namely:

node_degrees = {'1': 2, '2': 2, '3': 2, '4': 1, '5': 0, '6': 1}

since 2 nodes connect to node '1', 2 to node '2', and so on. It can be noted duplicates are ignored since all we care about is connected components and node degrees for each node.

In order to minimize the effort of re-counting, I was thinking of an algorithm to go through each list of lists, create a set, then somehow count the size of the set, but this wouldn't work since the pathological case where

lst = [['1', '2'], ['2', '3'], ['3', '4'], ['4', '5']]

which would yield {'1': 1, '2': 2, '3': 2, '4': 2, '5': 1}.

I am not sure how I can minimize the work done - set intersections may work, and I also explored union-find but I believe this is not necessary as all I need to do is iterate through the list of lists, find and keep track in a dictionary the amount of edges connected to a node, and keep updating this if the node was not encountered before. However, the brute-force is O(n^2), which is not ideal.

Is there an algorithm that can efficiently go through this input data format and do this without needing to create unnecessary data structures in the process?

I am also trying to see if this can be modeled in a different way besides a graph algorithm - perhaps somehow seeing an equivalent formulation of set intersections or a more simpler, linked list implementation, but I am not sure how to think of another method besides graphs. Another way to formulate this is to think of associations between the different elements within the list of lists.


Solution

  • Just build the adjacency list for your graph.

    Here in python:

    from itertools import combinations
    
    lst = [['1', '2'], ['1', '3', '2'], ['5'], ['4', '6']]
    
    adj = {}
    for clique in lst:
        for u,v in combinations(clique, 2):
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
        if len(clique) == 1: # a bit overkill, just for lonely ['5']
            adj.setdefault(clique[0], set())
    # adj = {'1': {'2', '3'}, '2': {'1', '3'}, '3': {'2', '1'}, '5': set(), '4': {'6'}, '6': {'4'}}
    
    degrees = {v: len(neighbours) for v,neighbours in adj.items()}
    # degrees = {'1': 2, '2': 2, '3': 2, '5': 0, '4': 1, '6': 1}