Splitting DAG dependencies in Python
I tried implementing a script for splitting DAG dependencies. My goal is to split the node dependencies so that for each node that has more than one dependency, I duplicate the node until each copy has exactly one (the sub-dependencies of the duplicated node also need to be duplicated). I tried to use the breadth-first search algorithm for that, but I'm not sure if it's the right way.
For example, for this DAG: [example DAG](https://i.sstatic.net/WqgJoDwX.png)
I expect to get the following result: [expected result](https://i.sstatic.net/829jW9yT.png)
Assume that I'm provided with the following DAG dependencies in a dict format:
dag_dependencies = {
    'A': ['B', 'C'],
    'B': ['D'],
    'C': ['D'],
    'D': ['E', 'F'],
    'E': [],
    'F': [],
}
I expect to get:
expected_dag_dependencies = {
    'A': ['B', 'C'],
    'B': ['D1'],
    'C': ['D2'],
    'D1': ['E1', 'F1'],
    'D2': ['E2', 'F2'],
    'E1': [],
    'F1': [],
    'E2': [],
    'F2': [],
}
How can I implement this in Python?
You can use a depth-first traversal from each "root" of your DAG, collecting the tree that can be reached from that root.
Then see which node names occur more than once, and number those while creating the new data structure:
from collections import Counter

def split_dag(dag):
    def dfs(node):
        # Count every visit so we know which names are reached more than once
        counter[node] += 1
        return [node, *map(dfs, dag[node])]

    def save(tree):
        node, *children = tree
        if node in dupes:
            # Rename each occurrence of a duplicated node with a running suffix
            dupes[node] += 1
            node = f"{node}{dupes[node]}"
        forest[node] = list(map(save, children))
        return node

    counter = Counter()
    forest = {}
    # Roots are keys that never appear in anyone's dependency list
    roots = set(dag.keys()).difference(node for neighbors in dag.values()
                                            for node in neighbors)
    # First pass: expand each root into a tree, counting node visits
    trees = list(map(dfs, roots))
    # Nodes visited more than once need numbered copies
    dupes = {node: 0 for node, freq in counter.items() if freq > 1}
    # Second pass: write the renamed trees back into a flat dict
    for tree in trees:
        save(tree)
    return forest
Here is how you can run it with your example:
dag_dependencies = {
    'A': ['B', 'C'],
    'B': ['D'],
    'C': ['D'],
    'D': ['E', 'F'],
    'E': [],
    'F': [],
}
forest = split_dag(dag_dependencies)
print(forest)
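To confirm the result matches the expected_dag_dependencies from the question, here is a self-contained sanity check (it repeats the split_dag function above so the snippet runs on its own; note that dict equality in Python ignores key order, so this only checks the structure):

```python
from collections import Counter

def split_dag(dag):
    def dfs(node):
        counter[node] += 1
        return [node, *map(dfs, dag[node])]

    def save(tree):
        node, *children = tree
        if node in dupes:
            dupes[node] += 1
            node = f"{node}{dupes[node]}"
        forest[node] = list(map(save, children))
        return node

    counter = Counter()
    forest = {}
    roots = set(dag.keys()).difference(node for neighbors in dag.values()
                                            for node in neighbors)
    trees = list(map(dfs, roots))
    dupes = {node: 0 for node, freq in counter.items() if freq > 1}
    for tree in trees:
        save(tree)
    return forest

dag_dependencies = {
    'A': ['B', 'C'],
    'B': ['D'],
    'C': ['D'],
    'D': ['E', 'F'],
    'E': [],
    'F': [],
}

expected_dag_dependencies = {
    'A': ['B', 'C'],
    'B': ['D1'],
    'C': ['D2'],
    'D1': ['E1', 'F1'],
    'D2': ['E2', 'F2'],
    'E1': [],
    'F1': [],
    'E2': [],
    'F2': [],
}

# Children are processed in list order, so the 'B' branch claims D1/E1/F1
# and the 'C' branch claims D2/E2/F2, matching the question's expectation.
assert split_dag(dag_dependencies) == expected_dag_dependencies
```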