It's similar to the problem of finding the minimal set of vertices in a directed graph from which all vertices can be reached, except that a node must have all its predecessors reached in order to be reached itself.
More formally, let S be a set of nodes belonging to a directed graph G. A node n of G is said to be reachable from S if and only if n belongs to S or if all predecessors of n are reachable from S.
We'll say that S is a generator of G if all nodes of G are reachable from S.
We can be sure that such a generator always exists for every graph (i.e. the set containing all nodes will always work).
The question is to design an algorithm that returns a minimum-length generator of G, given a graph G.
Let's take this graph.
{1,3,6} is a generator because {1} unlocks 2, {2,3} unlocks 5 and {1,3,6} unlocks 4.
Now, if you take S = {1,5}, nodes 3, 4 and 6 will remain unreached. S is therefore not a generator of this graph.
In fact, we can see that, if S is a generator, every cycle of the graph must contain at least one node in S.
I've written an algorithm that returns such (almost minimal?) sets, given a graph G. The idea is, given a node n, to construct a kind of “dependency tree” whose leaves form a set from which n is reachable. To build the dependency tree, we'll need to add the predecessors of n, then all the predecessors of the predecessors, and so on, until no new nodes are discovered.
In this way, the dependency tree will also contain all the dependency trees of the predecessors, and so on. So, for the sake of convenience, we'll return a hash table that associates the discovered nodes with their dependency trees.
Algorithm: DependencyTree
Inputs: G (a graph), n (a node of G)
Outputs: a hash table mapping the nodes discovered to their dependency trees
sub_trees <- empty hash table
sub_trees[n] <- leaf with value n
Q <- an empty queue
enqueue n in Q
While Q is not empty
node <- dequeue Q
node_tree <- sub_trees[node]
For pred belonging to {predecessors of n in G}
pred_tree <- leaf with value pred
add pred_tree to children of node_tree
If pred does not belong to sub_tree Then
sub_tree[pred] <- pred_tree
enqueue pred in Q
End If
End For
End While
Return sub_trees
What we're going to do now is take a node of G, make its dependency tree, see if all the nodes appear in it; if so, stop there; if not, take a node that doesn't appear and repeat the operation until all the nodes have been discovered.
The union of the leaves of the different trees will form a generator of G, a set from which all the nodes of G are reachable.
During the process, we will keep a set containing all the roots (the nodes passed as inputs to DependencyTree
). In this way, we only need to return the union of the leaves of the root dependency trees instead of the union of the leaves of all the trees.
Algorithm: ConstructGenerator
Inputs: G (a graph)
Outputs: a generator of G
trees <- empty hash table
roots <- empty set
While some node n of G does not belong to trees:
add n to roots
n_tree <- DependencyTree(n)
merge trees with n_tree
End While
generator <- empty set
For root in roots:
root_leaves <- leaves of tree[root]
merge generator with root_leaves
End For
Return generator
That's all very well, but the resulting set will contain a lot of useless nodes. In fact, a node will be useless if it doesn't need to belong to the set in order to be reached, i.e. if the leaves of its dependency tree already belong to the set. However, there is a special case: if a node is a leaf of its own tree, removing the node from the set will prevent the set from reaching it, which is bad.
We can then define a final algorithm to reduce the resulting set:
Algorithm: ReduceSet
Input: tree (a hash table linking nodes to their dependency tree), s (a set of nodes).
Results: the reduced set
reduced <- empty set
For node in s
node_leaves <- leaves of trees[node]
If node belongs to node_leaves or node_leaves is not a subset of s Then
add node to reduced
End If
End For
Return reduced
Finally, we edit the algorithm ConstructGenerator
to return ReduceSet(generator)
instead of just generator
.
First, we will define a class Node
that will be used to represent a tree:
from __future__ import annotations
class Node:
def __init__(self, value: int, children: set[Node] = None):
self._value = value
if not children:
children = set()
self._children = children
@property
def children(self):
return self._children
@property
def value(self):
return self._value
def add_child(self, child: Node):
self._children.add(child)
And we are ready to write the function:
import networkx as nx
from collections import deque
def construct_generator(graph: nx.DiGraph) -> set:
trees: dict[int, Node] = {}
roots = set()
def dependency_tree(start_node: int):
trees[start_node] = Node(start_node)
queue = deque([start_node])
while queue:
node = queue.popleft()
node_tree = trees[node]
for pred in graph.predecessors(node):
pred_tree = Node(pred)
node_tree.add_child(pred_tree)
if pred not in trees:
trees[pred] = pred_tree
queue.append(pred)
graph_nodes = set(graph.nodes)
while len(trees) < len(graph): # while there are undiscovered nodes
node = graph_nodes.pop()
if node not in trees:
roots.add(node)
dependency_tree(node)
leaves: dict[int,set] = {} # a dictionary mapping a node to the leaves of its dependency tree
def get_leaves(node: Node) -> set: # returns the leaves of node's tree and populates the leaves dictionary at the same time
node_leaves = set()
if not node.children or len(node.children) == 0:
node_leaves.add(node.value)
else:
for child in node.children:
node_leaves.update(get_leaves(child))
if node.value not in leaves:
leaves[node.value] = node_leaves
return node_leaves
system = set()
for root in roots:
root_leaves = get_leaves(trees[root])
system.update(root_leaves) # construction of the system of nodes as the union of leaves
reduced_system = set()
for node in system:
node_leaves = leaves[node]
if node in node_leaves or not node_leaves.issubset(system):
reduced_system.add(node)
return reduced_system
If we take the graph mentioned as an example at the beginning:
G = nx.DiGraph()
G.add_edges_from([(1,2),(1,3),(1,4),(2,3),(2,5),(3,4),(3,5),(4,3),(5,1),(6,4)])
print(construct_generator(G)) # will show {1,3,6}
I have found that for large graphs (hundreds of edges), the sets returned by the function are not of the same length, probably because it uses sets that are not ordered and the order of the iterations of the sets can have an impact on the result. However, this means that this method does not return the minimum-length generator (at least not all the times).
Is there any algorithm which returns a generator of minimal length of a directed graph?
This problem is NP hard. That can be seen by reducing the minimum feedback vertex set problem to your reachability problem.
Start with any directed graph G
. Construct a new graph H
by adding a single vertex v
with an edge connecting it to every vertex in G
.
Let S
be any solution to your reachability problem on H
. It is obvious that v
must be in S
because it has no predecessor, and is a predecessor of everything else. It is also clear that if you take S
away from G
, it cannot contain any cycles. And therefore S
must contain a vertex feedback set for G
.
Conversely, any subset S
of H
that contains both v
and a vertex feedback set for G
will solve your reachability problem. (You can prove this by induction on the size of G
minus S
. Just prove that there is always an element with no predecessor in G
, remove that element, then apply the induction hypothesis.)
Therefore a solver for your minimum reachability problem allows us to solve the minimum feedback vertex problem for arbitrary graphs. Since that problem is NP hard, so is your problem.