haskelldepth-first-search

Can this generic DFS implementation in Haskell be used to detect cycles in an adjacency list?


Consider the following polymorphic DFS function from here:

-- | Depth-first search.
--
-- Generates the list of unique visited states from a
-- given starting state. States are unique up to the
-- characterizing function.
dfsOnN ::
  Ord r =>
  (a -> r)   {- ^ state characterization              -} ->
  (a -> [a]) {- ^ successors function                 -} ->
  [a]        {- ^ initial states                      -} ->
  [a]        {- ^ visited states in depth-first order -}
dfsOnN rep next = loop S.empty
  where
    loop _ [] = []
    loop !seen (x:xs)
      | S.member r seen =     loop seen xs
      | otherwise       = x : loop seen1 (next x ++ xs)
      where
        r     = rep x
        seen1 = S.insert r seen

Is it possible to detect cycles with this? I'm guessing no because of the way we handle the "already-seen" case. If not, is there a way we can generalize it so it retains all of its existing functionality, but can also detect cycles?

For concreteness, let us say we have:

neighbors :: Int -> [Int]
vertices :: [Int]

And we want to call it as dfsOnN id neighbors vertices and somehow use the output to detect cycles.


Solution

  • A DFS that outputs a list discards a lot of information that is learned by a DFS. A "DFS tree" is a more informative output for a DFS. A DFS tree is a spanning tree of the visited nodes, such that edges of the original graph that do not belong to the tree can be classified as either back edges or cross edges depending on whether they form a directed cycle with edges of the tree (the linked Wikipedia article also defines a class forward edges but here we count them as cross edges for simplicity).

    In the case of a DFS with multiple starting vertices, you get a forest of DFS trees.

    data DfsTree a
        = Tree a [DfsTree a]
        | BackEdge a   -- ancestor in the tree
        | CrossEdge a  -- already visited but not an ancestor in the tree
        deriving (Show)
    

    To construct a DFS tree (or forest), we must keep track of (1) previously visited vertices, and (2) ancestors of the current vertex (which is a subset of visited). The main difference between visited and ancestors is that the inner recursive function dfsFrom returns the updated visited, whereas ancestors is discarded so its changes don't persist across children of each vertex.

    dfs :: Ord r => (a -> r) -> (a -> [a]) -> [a] -> [DfsTree a]
    dfs rep next initial = forest
      where
        (_s, forest) = dfsForestFrom Set.empty Set.empty initial
        -- Definition using mapAccumL:
        -- dfsForestFrom ancestors visited = mapAccumL (dfsFrom ancestors) visited
        -- or use the State monad.
        dfsForestFrom ancestors visited [] = (visited, [])
        dfsForestFrom ancestors visited (a : as) = (visited2, tree : trees)
          where
            (visited1, tree) = dfsFrom ancestors visited a
            (visited2, trees) = dfsForestFrom ancestors visited1 as
        dfsFrom ancestors visited a
            | ra `Set.member` ancestors = (visited, BackEdge a)
            | ra `Set.member` visited = (visited, CrossEdge a)
            | otherwise =
                let (visited', branches) = dfsForestFrom (Set.insert ra ancestors) (Set.insert ra visited) (next a) in
                (visited', Tree a branches)
            where ra = rep a
    

    Many operations can then be defined by structural recursion on the DFS tree. dfs guarantees that each reachable vertex appears exactly once in a Tree constructor, that the list of branches in Tree x branches has as many branches as the adjacency list next x, and that every edge corresponds to one constructor (Tree, BackEdge, or CrossEdge). In other words, a DFS tree is a "cycle-free" representation of the graph.

    For example, the code below implements cycle finding and connected component enumeration (only for undirected graphs (where the adjacency matrix is symmetric); connected components are trickier to compute for directed graphs, but it is still possible to do so without modifying dfs).

    A cycle is simply indicated by the presence of a back edge. To make the exercise a bit more interesting, findCycle also returns the vertices in the cycle.

    findCycle :: Ord r => (a -> r) -> (a -> [a]) -> [a] -> Maybe [a]
    findCycle rep next initial = findMaybe (findCycleTree rep []) forest
        where
            forest = dfs rep next initial
    
    findCycleTree :: Eq r => (a -> r) -> [a] -> DfsTree a -> Maybe [a]
    findCycleTree rep ancestors (Tree a branches) = findMaybe (findCycleTree rep (a : ancestors)) branches
    findCycleTree rep ancestors (BackEdge a) = Just (a : reverse (takeWhile (\a' -> rep a /= rep a') ancestors))
    findCycleTree _ _ (CrossEdge _) = Nothing
    
    findMaybe :: (a -> Maybe b) -> [a] -> Maybe b
    findMaybe f = foldr ((<|>) . f) Nothing
    

    Connected components in an undirected graph simply correspond to each tree of the DFS forest. connected just flattens the trees into lists, ignoring back edges and cross edges.

    -- Assumes undirected graph: x ∈ next y <-> y ∈ next x
    connected :: Ord r => (a -> r) -> (a -> [a]) -> [a] -> [[a]]
    connected rep next initial = [c | c@(_ : _) <- collectComponent <$> forest]
        where
            forest = dfs rep next initial
            collectComponent (Tree a branches) = a : (branches >>= collectComponent)
            collectComponent (BackEdge _) = []
            collectComponent (CrossEdge _) = []
    

    Full code with test cases:

    import Control.Applicative ((<|>))
    import Data.Traversable (mapAccumL)
    import qualified Data.Map as Map
    import Data.Map (Map)
    import qualified Data.Set as Set
    
    -- General DFS, outputs a DFS forest
    -- Reference: https://en.wikipedia.org/wiki/Depth-first_search#Output_of_a_depth-first_search
    
    data DfsTree a
        = Tree a [DfsTree a]
        | BackEdge a
        | CrossEdge a
        deriving (Show)
    
    dfs :: Ord r => (a -> r) -> (a -> [a]) -> [a] -> [DfsTree a]
    dfs rep next initial = forest
      where
        (_s, forest) = dfsForestFrom Set.empty Set.empty initial
        -- Definition using mapAccumL:
        -- dfsForestFrom ancestors visited = mapAccumL (dfsFrom ancestors) visited
        -- or use the State monad.
        dfsForestFrom ancestors visited [] = (visited, [])
        dfsForestFrom ancestors visited (a : as) = (visited2, tree : trees)
          where
            (visited1, tree) = dfsFrom ancestors visited a
            (visited2, trees) = dfsForestFrom ancestors visited1 as
        dfsFrom ancestors visited a
            | ra `Set.member` ancestors = (visited, BackEdge a)
            | ra `Set.member` visited = (visited, CrossEdge a)
            | otherwise =
                let (visited', branches) = dfsForestFrom (Set.insert ra ancestors) (Set.insert ra visited) (next a) in
                (visited', Tree a branches)
            where ra = rep a
    
    -- Finding cycles
    
    findCycle :: Ord r => (a -> r) -> (a -> [a]) -> [a] -> Maybe [a]
    findCycle rep next initial = findMaybe (findCycleTree rep []) forest
        where
            forest = dfs rep next initial
    
    findCycleTree :: Eq r => (a -> r) -> [a] -> DfsTree a -> Maybe [a]
    findCycleTree rep ancestors (Tree a branches) = findMaybe (findCycleTree rep (a : ancestors)) branches
    findCycleTree rep ancestors (BackEdge a) = Just (a : reverse (takeWhile (\a' -> rep a /= rep a') ancestors))
    findCycleTree _ _ (CrossEdge _) = Nothing
    
    findMaybe :: (a -> Maybe b) -> [a] -> Maybe b
    findMaybe f = foldr ((<|>) . f) Nothing
    
    exampleCycle :: Maybe [Int]
    exampleCycle = findCycle rep next initial
        where
            rep = id
            next 0 = [1]
            next 1 = []
            next 2 = [3,5]
            next 3 = [4]
            next 4 = []
            next 5 = [6,7]
            next 6 = [2]
            next _ = []
            initial = [0,2]
    
    -- Finding connected components
    
    -- Assumes undirected graph: x ∈ next y <-> y ∈ next x
    connected :: Ord r => (a -> r) -> (a -> [a]) -> [a] -> [[a]]
    connected rep next initial = [c | c@(_ : _) <- collectComponent <$> forest]
        where
            forest = dfs rep next initial
            collectComponent (Tree a branches) = a : (branches >>= collectComponent)
            collectComponent (BackEdge _) = []
            collectComponent (CrossEdge _) = []
    
    exampleConnected :: [[Int]]
    exampleConnected = connected rep next initial
        where
            rep = id
            next 0 = [1]
            next 1 = [0]
            next 2 = [3,4]
            next 3 = [2,4]
            next 4 = [2,3]
            next 5 = [6]
            next 6 = [5]
            next _ = []
            initial = [0,2,5]
    
    main :: IO ()
    main = do
        print exampleCycle      -- cycle: [2,5,6]
        print exampleConnected  -- connected components: [[0,1],[2,3,4],[5,6]]