pythonalgorithmgraphdirected-graph

The problem of reachability in a directed graph, but all predecessors must be reached to reach a node


The problem (tl;dr)

It's similar to the problem of finding the minimal set of vertices in a directed graph from which all vertices can be reached, except that a node must have all its predecessors reached in order to be reached itself.

More formally, let S be a set of nodes belonging to a directed graph G. A node n of G is said to be reachable from S if and only if n belongs to S or if all predecessors of n are reachable from S.

We'll say that S is a generator of G if all nodes of G are reachable from S.

We can be sure that such a generator always exists for every graph (i.e. the set containing all nodes will always work).

The question is to design an algorithm that returns a minimum-length generator of G, given a graph G.

Example

Let's take this graph.

{1,3,6} is a generator because {1} unlocks 2, {2,3} unlocks 5 and {1,3,6} unlocks 4.

Now, if you take S = {1,5}, nodes 3, 4 and 6 will remain unreached. S is therefore not a generator of this graph.

In fact, we can see that, if S is a generator, every cycle of the graph must contain at least one node in S.

My attempt to design an algorithm (skip this part if it's too long; it's not essential to my question)

I've written an algorithm that returns such (almost minimal?) sets, given a graph G. The idea is, given a node n, to construct a kind of “dependency tree” whose leaves form a set from which n is reachable. To build the dependency tree, we'll need to add the predecessors of n, then all the predecessors of the predecessors, and so on, until no new nodes are discovered.

In this way, the dependency tree will also contain all the dependency trees of the predecessors, and so on. So, for the sake of convenience, we'll return a hash table that associates the discovered nodes with their dependency trees.

Algorithm: DependencyTree
Inputs: G (a graph), n (a node of G)
Outputs: a hash table mapping the nodes discovered to their dependency trees

sub_trees <- empty hash table
sub_trees[n] <- leaf with value n
Q <- an empty queue
enqueue n in Q

While Q is not empty
    node <- dequeue Q
    node_tree <- sub_trees[node]

    For pred belonging to {predecessors of n in G}
        pred_tree <- leaf with value pred
        add pred_tree to children of node_tree
        
        If pred does not belong to sub_tree Then
            sub_tree[pred] <- pred_tree
            enqueue pred in Q
        End If
    End For
End While
       
Return sub_trees

What we're going to do now is take a node of G, make its dependency tree, see if all the nodes appear in it; if so, stop there; if not, take a node that doesn't appear and repeat the operation until all the nodes have been discovered.

The union of the leaves of the different trees will form a generator of G, a set from which all the nodes of G are reachable.

During the process, we will keep a set containing all the roots (the nodes passed as inputs to DependencyTree). In this way, we only need to return the union of the leaves of the root dependency trees instead of the union of the leaves of all the trees.

Algorithm: ConstructGenerator
Inputs: G (a graph)
Outputs: a generator of G

trees <- empty hash table
roots <- empty set

While some node n of G does not belong to trees:
    add n to roots
    n_tree <- DependencyTree(n)
    merge trees with n_tree
End While

generator <- empty set

For root in roots:
    root_leaves <- leaves of tree[root]
    merge generator with root_leaves
End For

Return generator 

That's all very well, but the resulting set will contain a lot of useless nodes. In fact, a node will be useless if it doesn't need to belong to the set in order to be reached, i.e. if the leaves of its dependency tree already belong to the set. However, there is a special case: if a node is a leaf of its own tree, removing the node from the set will prevent the set from reaching it, which is bad.

We can then define a final algorithm to reduce the resulting set:

Algorithm: ReduceSet
Input: tree (a hash table linking nodes to their dependency tree), s (a set of nodes).
Results: the reduced set

reduced <- empty set

For node in s
    node_leaves <- leaves of trees[node]
    If node belongs to node_leaves or node_leaves is not a subset of s Then
        add node to reduced
    End If
End For

Return reduced

Finally, we edit the algorithm ConstructGenerator to return ReduceSet(generator) instead of just generator.

Python implementation

First, we will define a class Node that will be used to represent a tree:

from __future__ import annotations

class Node:
    def __init__(self, value: int, children: set[Node] = None):
        self._value = value

        if not children:
            children = set()

        self._children = children

    @property
    def children(self):
        return self._children

    @property
    def value(self):
        return self._value
    
    def add_child(self, child: Node):
        self._children.add(child)

And we are ready to write the function:

import networkx as nx
from collections import deque

def construct_generator(graph: nx.DiGraph) -> set:
    trees: dict[int, Node] = {}
    roots = set()

    def dependency_tree(start_node: int):
        trees[start_node] = Node(start_node)
        queue = deque([start_node])

        while queue:
            node = queue.popleft()
            node_tree = trees[node]
            
            for pred in graph.predecessors(node):
                pred_tree = Node(pred)
                node_tree.add_child(pred_tree)

                if pred not in trees:
                    trees[pred] = pred_tree
                    queue.append(pred)

    graph_nodes = set(graph.nodes)
    while len(trees) < len(graph): # while there are undiscovered nodes
        node = graph_nodes.pop()
        if node not in trees:
            roots.add(node)
            dependency_tree(node)
            
    leaves: dict[int,set] = {} # a dictionary mapping a node to the leaves of its dependency tree

    def get_leaves(node: Node) -> set: # returns the leaves of node's tree and populates the leaves dictionary at the same time
        node_leaves = set()
        
        if not node.children or len(node.children) == 0:
            node_leaves.add(node.value)
        else:
            for child in node.children:
                node_leaves.update(get_leaves(child))
        
        if node.value not in leaves:
            leaves[node.value] = node_leaves
        return node_leaves

    system = set()
    for root in roots:
        root_leaves = get_leaves(trees[root])
        system.update(root_leaves) # construction of the system of nodes as the union of leaves

    reduced_system = set()
    for node in system:
        node_leaves = leaves[node]
        if node in node_leaves or not node_leaves.issubset(system):
            reduced_system.add(node)

    return reduced_system

If we take the graph mentioned as an example at the beginning:

G = nx.DiGraph()
G.add_edges_from([(1,2),(1,3),(1,4),(2,3),(2,5),(3,4),(3,5),(4,3),(5,1),(6,4)])
print(construct_generator(G)) # will show {1,3,6}

I have found that for large graphs (hundreds of edges), the sets returned by the function are not of the same length, probably because it uses sets that are not ordered and the order of the iterations of the sets can have an impact on the result. However, this means that this method does not return the minimum-length generator (at least not all the times).

Finally

Is there any algorithm which returns a generator of minimal length of a directed graph?


Solution

  • This problem is NP hard. That can be seen by reducing the minimum feedback vertex set problem to your reachability problem.

    Start with any directed graph G. Construct a new graph H by adding a single vertex v with an edge connecting it to every vertex in G.

    Let S be any solution to your reachability problem on H. It is obvious that v must be in S because it has no predecessor, and is a predecessor of everything else. It is also clear that if you take S away from G, it cannot contain any cycles. And therefore S must contain a vertex feedback set for G.

    Conversely, any subset S of H that contains both v and a vertex feedback set for G will solve your reachability problem. (You can prove this by induction on the size of G minus S. Just prove that there is always an element with no predecessor in G, remove that element, then apply the induction hypothesis.)

    Therefore a solver for your minimum reachability problem allows us to solve the minimum feedback vertex problem for arbitrary graphs. Since that problem is NP hard, so is your problem.