pythongraphnetworkxdepth-first-searchconnected-components

How to find the connected components of a networkx graph? (not using the nx.connected_components() command)


I have created an undirected graph using networkx, I need to find a list of all the connected components.

connected_components = {}
def dfs(node):
    global connected_components, G  
    if node not in connected_components:
        connected_components[node] = set()
        for next in G.adj[node]:
            dfs(next)
            connected_components[node] = connected_components[next]
        connected_components[node].add(node)

for node_ in G:
    dfs(node_)

connected_comp_as_tuples = map(tuple, connected_components.values())
unique_components = set(connected_comp_as_tuples)
CC=list(unique_components)

I've tried using this code but the result is not the same as the one given using the nx.connected_components() command. What am I doing wrong?


Solution

  • connected_components[node] = connected_components[next] is executed also when next is actually the "parent" from which the DFS call came. This way you can lose out of the set of nodes that were already collected in a set that were really "descendants" of the current node.

    For instance, if you have a graph with 4 nodes, like this:

    enter image description here

    and the edges were added in an order such that G.adj looks like this:

    [
        [2],
        [2],
        [3, 1, 0],
        [2]
    ]
    

    ...then the DFS calls will first go via 0, to 2, to 3, then from 2 to 1. At that moment all is OK, and connected_components[2] has {1, 3}. But then the problem arises: from 2 we visit 0. But node 0 is still associated with an empty set, and so node 2 also gets associated with it, losing the connection with the component that has nodes 1 and 3.

    You should just implement the standard algorithm, using a visited marker:

    def find_components(G):
        visited = set()
        
        def dfs(node):
            if node not in visited:
                visited.add(node)
                yield node
                for nxt in G.adj[node]:
                    yield from dfs(nxt)
        
        for node in G:
            if node not in visited:
                yield tuple(dfs(node))  # a single component
    
    connected_components = list(find_components(G))
    print(connected_components)