As we all know, topological sorts are not unique. A graph can have many topological sorts. For example, the graph {A, B, C}, {(A,C)} has exactly three sorts: {A,B,C}, {B,A,C}, and {A,C,B}.
My problem is this, using the NetworkX library in Python, I am trying to select a random topological sort from a directed graph. NetworkX has two functions that deal with topological sorting: nx.topological_sort() and nx.all_topological_sorts(). all_topological_sorts() will return all possible topological sorts. For the graph I mentioned above, it returns all three possible sorts.
import networkx as nx
nodes = ["A", "B", "C"]
G = nx.DiGraph()
G.add_nodes_from(nodes)
G.add_edges_from([("A","C")])
print(list(nx.all_topological_sorts(G)))
nx.topological_sort() will return only a single sort using Kahn's Algorithm. nx.all_topological_sorts() does not use Kahn's algorithm. It Just based on how Kahn is implemented, if I change the order the nodes are inserted in the DiGraph, I change the topological sort Kahn returns. However, if I generated all possible permutations of {A, B, C} and pass that into Kahn's, it only ever returns the topological sort {A, B, C} and {B, A, C}. No permutation returns {A, C, B} despite it being a valid sort. I would like all the possible topological sorts to be equally weighted, but at minimum I do not want one to be completely overlooked. I did some playing around with larger graphs, and in that case thousands of possible topological sorts are completely overlooked.
Now, I could just generate all topological sorts and then select one randomly. That would be equal; however, I am dealing with graphs much larger than three vertices, and that is just practically infeasible for me. I was wondering if there is a more computationally efficient way of getting a random topological sort, or if there some hackery I can do to the Graph to equally weight the possible topological sorts from Kahn's.
I read this answer, however the poster points out that these are not equally weighted. So I expect I will run into the same problem as above. There is also this answer, but many comments disagree that the proposed algorithm is equally weighted. Could someone confirm? If it at all possible topological sorts have a non-zero chance of being selected, that's sufficient. However, I would like to stay within the NetworkX library if I can.
Any insight is greatly appreciated!
This is an implementation of the approach I outlined in my comment above:
import numpy as np
import networkx as nx
def get_random_topological_sort(graph):
"""Compute a random topological sort for a directed graph."""
# Wrapper around _get_random_topological_sort that handles graphs with non-integer nodes.
mapping = dict(zip(sorted(graph), range(len(graph))))
integer_graph = nx.relabel_nodes(graph, mapping=mapping, copy=True)
integer_nodes = _get_random_topological_sort(integer_graph)
inverse_mapping = dict(zip(mapping.values(), mapping.keys()))
return [inverse_mapping[ii] for ii in integer_nodes]
def _get_random_topological_sort(graph):
"""Compute a random topological sort for a directed graph with integer nodes."""
order = np.arange(len(graph))
np.random.shuffle(order)
# compute a random topological sort for each component
ptr = 0
sorted_nodes = np.zeros((len(graph)), dtype=int)
for component in nx.connected_components(graph.to_undirected()):
# re-initialize subgraph with random node order
np.random.shuffle(list(component))
subgraph = nx.DiGraph()
subgraph.add_nodes_from(component)
subgraph.add_edges_from(nx.subgraph(graph, component).edges())
# combine results across components
indices = list(sorted(order[ptr:ptr+len(component)]))
sorted_nodes[indices] = list(nx.topological_sort(subgraph))
ptr += len(component)
return sorted_nodes
if __name__ == "__main__":
# nodes = range(6)
# edges = [
# (0, 1),
# (2, 3),
# (2, 4),
# ]
nodes = ["a", "b", "c", "d", "e", "f"]
edges = [
("a", "b"),
("c", "d"),
("c", "e"),
]
G = nx.DiGraph()
G.add_nodes_from(nodes)
G.add_edges_from(edges)
for ii in range(10):
print(get_random_topological_sort(G))
# ['c', 'd', 'a', 'f', 'e', 'b']
# ['c', 'a', 'b', 'd', 'f', 'e']
# ['a', 'b', 'c', 'd', 'e', 'f']
# ['a', 'f', 'c', 'd', 'e', 'b']
# ['a', 'b', 'f', 'c', 'd', 'e']
# ['a', 'b', 'c', 'd', 'f', 'e']
# ['c', 'a', 'd', 'b', 'e', 'f']
# ['c', 'f', 'a', 'b', 'd', 'e']
# ['a', 'c', 'b', 'd', 'e', 'f']
# ['c', 'd', 'a', 'f', 'b', 'e']