scalaclojurefunctional-programminggraph-algorithmtarjans-algorithm

Functional implementation of Tarjan's Strongly Connected Components algorithm


I went ahead and implemented the textbook version of Tarjan's SCC algorithm in Scala. However, I dislike the code - it is very imperative/procedural with lots of mutating states and book-keeping indices. Is there a more "functional" version of the algorithm? I believe imperative versions of algorithms hide the core ideas behind the algorithm unlike the functional versions. I found someone else encountering the same problem with this particular algorithm but I have not been able to translate his Clojure code into idomatic Scala.

Note: If anyone wants to experiment, I have a good setup that generates random graphs and tests your SCC algorithm vs running Floyd-Warshall


Solution

  • The following functional Scala code generates a map that assigns a representative to each node of a graph. Each representative identifies one strongly connected component. The code is based on Tarjan's algorithm for strongly connected components.

    In order to understand the algorithm it might suffice to understand the fold and the contract of the dfs function.

    def scc[T](graph:Map[T,Set[T]]): Map[T,T] = {
      //`dfs` finds all strongly connected components below `node`
      //`path` holds the the depth for all nodes above the current one
      //'sccs' holds the representatives found so far; the accumulator
      def dfs(node: T, path: Map[T,Int], sccs: Map[T,T]): Map[T,T] = {
        //returns the earliest encountered node of both arguments
        //for the case both aren't on the path, `old` is returned
        def shallowerNode(old: T,candidate: T): T = 
          (path.get(old),path.get(candidate)) match {
            case (_,None) => old
            case (None,_) => candidate
            case (Some(dOld),Some(dCand)) =>  if(dCand < dOld) candidate else old
          }
    
        //handle the child nodes
        val children: Set[T] = graph(node)
        //the initially known shallowest back-link is `node` itself
        val (newState,shallowestBackNode) = children.foldLeft((sccs,node)){
          case ((foldedSCCs,shallowest),child) =>
            if(path.contains(child))
              (foldedSCCs, shallowerNode(shallowest,child))
            else {
              val sccWithChildData = dfs(child,path + (node -> path.size),foldedSCCs)
              val shallowestForChild = sccWithChildData(child)
              (sccWithChildData, shallowerNode(shallowest, shallowestForChild))
            }
        }
    
        newState + (node -> shallowestBackNode)
      }
    
      //run the above function, so every node gets visited
      graph.keys.foldLeft(Map[T,T]()){ case (sccs,nextNode) =>
        if(sccs.contains(nextNode))
          sccs
        else
          dfs(nextNode,Map(),sccs)
      }
    }
    

    I've tested the code only on the example graph found on the Wikipedia page.

    Difference to imperative version

    In contrast to the original implementation, my version avoids explicitly unwinding the stack and simply uses a proper (non tail-) recursive function. The stack is represented by a persistent map called path instead. In my first version I used a List as stack; but this was less efficient since it had to be searched for containing elements.

    Efficiency

    The code is rather efficient. For each edge, you have to update and/or access the immutable map path, which costs O(log|N|), for a total of O(|E| log|N|). This is in contrast to O(|E|) achieved by the imperative version.

    Linear Time implementation

    The paper in Chris Okasaki's answer gives a linear time solution in Haskell for finding strongly connected components. Their implementation is based on Kosaraju's Algorithm for finding SCCs, which basically requires two depth-first traversals. The paper's main contribution appears to be a lazy, linear time DFS implementation in Haskell.

    What they require to achieve a linear time solution is having a set with O(1) singleton add and membership test. This is basically the same problem that makes the solution given in this answer have a higher complexity than the imperative solution. They solve it with state-threads in Haskell, which can also be done in Scala (see Scalaz). So if one is willing to make the code rather complicated, it is possible to implement Tarjan's SCC algorithm to a functional O(|E|) version.