pythonalgorithmgraphkosaraju-algorithm

Kosaraju’s algorithm for scc


Can anyone explain me the logic behind Kosaraju’s algorithm for finding connected component?

I have read the description, though I can't understand how the DFS on reversed graph can detect the number of strongly connected components.

def dfs(visited, stack, adj, x):
    visited[x] = 1

    for neighbor in adj[x]:
        if (visited[neighbor]==0):
            dfs(visited, stack, adj, neighbor)

    stack.insert(0, x)
    return stack, visited

def reverse_dfs(visited, adj, x, cc):
    visited[x] = 1

    for neighbor in adj[x]:
        if (visited[neighbor]==0):
            cc += 1
            reverse_dfs(visited, adj, neighbor,cc)
    print(x)
    return cc


def reverse_graph(adj):
    vertex_num = len(adj)
    new_adj = [ [] for _ in range(vertex_num)]
    for i in range(vertex_num):
        for j in adj[i]:
            new_adj[j].append(i)
    return new_adj


def find_post_order(adj):
    vertex_num = len(adj)
    visited = [0] * vertex_num
    stack = []

    for vertex in range(vertex_num):
        if visited[vertex] == 0:
            stack, visited = dfs(visited, stack, adj, vertex)

    return stack


def number_of_strongly_connected_components(adj):
    vertex_num = len(adj)
    new_adj = reverse_graph(adj)
    stack = find_post_order(adj)

    visited = [0] * vertex_num
    cc_num = 0
    while (stack):
        vertex = stack.pop(0)
        print(vertex)
        print('------')
        if visited[vertex] == 0:
            cc_num = reverse_dfs(visited, new_adj, vertex, cc_num)
    return cc_num

if __name__ == '__main__':
    input = sys.stdin.read()
    data = list(map(int, input.split()))
    n, m = data[0:2]
    data = data[2:]
    edges = list(zip(data[0:(2 * m):2], data[1:(2 * m):2]))
    adj = [[] for _ in range(n)]
    for (a, b) in edges:
        adj[a - 1].append(b - 1)
    print(number_of_strongly_connected_components(adj))

Above you can find my implementation. I guess that initial DFS and reverse operation are done correctly, but I can't get how to perform the second DFS properly. Thanks in advance.


Solution

  • The first thing that you should notice is that the set of strongly connected components is the same for a graph and its reverse. In fact, the algorithm actually finds the set of strongly connected components in the reversed graph, not the original (but it's alright, because both graphs have the same SCC).

    The first DFS execution results in a stack that holds the vertices in a particular order such that when the second DFS is executed on the vertices in this order (on the reversed graph) then it computes SCC correctly. So the whole purpose of running the first DFS is to find an ordering of the graph vertices that serves the execution of the DFS algorithm on the reversed graph. Now I'll explain what is the property that the order of vertices in the stack have and how it serves the execution of DFS on the reversed graph.

    So what is the property of the stack? Imagine S1 and S2 are two strongly connected components in the graph, and that in the reversed graph, S2 is reachable from S1. Obviously, S1 cannot be reachable from S2 as well, because if that was the case, S1 and S2 would collapsed into a single component. Let x be the top vertex among the vertices in S1 (that is, all other vertices in S1 are below x in the stack). Similarly, let y be the top vertex among the vertices in S2 (again, all other vertices in S2 are below y in the stack). The property is that y (which belongs to S2) is above x (which belongs to S1) in the stack.

    Why is this property helpful? When we execute DFS on the reversed graph, we do it according to the stack order. In particular, we explore y before we explore x. When exploring y we explore every vertex of S2, and since no vertex of S1 is reachable from S2 we explore all the vertices of S2 before we explore any vertex of S1. But this holds for any pair of connected components such that one is reachable from the other. In particular, the vertex at the top of the stack belongs to a sink component, and after we're done exploring that sink component, the top vertex again belongs to a sink relative to the graph induced by the not-yet-explored vertices.

    Therefore, the algorithm correctly computes all the strongly connected components of the reversed graph, which, as aforesaid, are the same as in the original graph.