I am trying to get from an (potentially) unordered edge list
let edge_list: Vec<(i32, i32)> = vec![(2, 4), (1, 2), (1, 3), (2, 5), (3, 6), (3, 7), (4, 8)];
to a "json tree"
Node {
id: 1,
children: vec![
Node {
id: 2,
children: vec![
Node {
id: 4,
children: vec![
Node {
id: 8,
children: vec![],
}
]
},
Node {
id: 5,
children: vec![],
}
]
},
Node {
id: 3,
children: vec![
Node {
id: 6,
children: vec![],
},
Node {
id: 7,
children: vec![],
}
]
}
]
}
before serialization
struct Node {
id: i32,
children: Vec<Node>
}
I've tried a bunch of stuff but they end up being extremely inefficient. Can anyone help out?
I guess you are talking about huge trees, because for the size of your example, any algorithm is sufficient ;)
But for huge numbers, the only thing that really matters is asymptotic time complexity, normally expressed in Big-O notation.
It's kinda complicated to calculate, so I won't go into detail. Just know that if you manage to write an algorithm that manages to get in a smaller complexity class, you expect speedups in the order of thousands, depending on your problem size.
Just as a quick rundown, those are the most important complexity classes in computer science:
O(1
) < O(log(n)
) < O(n
) < O(n*log(n)
) < O(n^2
) < O(n^k
) < O(k^n
)
1
) means that the calculation always takes the same duration, no matter how large the input isk^n
) means it scales so badly that usually you can't compute more than 10-11 itemsn*log(n)
.Now let's see how we can solve the problem.
The simplest idea is to have a bunch of trees and for every element, search the child and the parent and connect them. That means for every item (n
) we need to search through the already existing trees (n
, because we have no ordering in the trees) and then search for the parent (n
) and connect them (1
). That would give us a complexity of n^2
, which is terrible.
The best idea I could come up with is following:
n
) and inserting each of it in a HashMap
(1
, amortized); bringing this step to a complexity of n
.HashSet
(n
), collect all children (n
) and then take the difference (n
). Every node that is a parent but not a child is a root node. This step has a complexity of n + n + n
, which is still n
.n
) to know the order in which we have to create children, and then create a node for each (1
) followed by adding it to its parent (1
). This step therefore has the complexity n
.Those steps together have the complexity n + n + n
, which is the complexity class of n
. Meaning, if we compare them at 1000 items, this algorithm should be roughly 1000x times faster than the primitive n^2
algorithm.
Here is some code with it:
use std::collections::{HashMap, HashSet};
#[derive(Debug)]
pub struct Node {
pub id: i32,
pub children: Vec<Node>,
}
fn build_children_map(edge_list: &[(i32, i32)]) -> HashMap<i32, Vec<i32>> {
let mut children_map: HashMap<i32, Vec<i32>> = HashMap::new();
for &(parent, child) in edge_list {
children_map.entry(parent).or_default().push(child);
}
children_map
}
fn find_roots(edge_list: &[(i32, i32)]) -> Vec<i32> {
let parents = edge_list
.iter()
.map(|(parent, _)| parent)
.cloned()
.collect::<HashSet<_>>();
let children = edge_list
.iter()
.map(|(_, child)| child)
.cloned()
.collect::<HashSet<_>>();
parents.difference(&children).cloned().collect()
}
fn build_node_recursively(id: i32, children_map: &HashMap<i32, Vec<i32>>) -> Node {
let children = children_map
.get(&id)
.map(|children| {
children
.iter()
.map(|child_id| build_node_recursively(*child_id, children_map))
.collect::<Vec<_>>()
})
.unwrap_or_default();
Node { id, children }
}
pub fn build_tree(edge_list: &[(i32, i32)]) -> Node {
let children_map = build_children_map(edge_list);
let [root]: [i32; 1] = find_roots(edge_list).as_slice().try_into().unwrap();
build_node_recursively(root, &children_map)
}
fn main() {
let edge_list: Vec<(i32, i32)> = vec![(2, 4), (1, 2), (1, 3), (2, 5), (3, 6), (3, 7), (4, 8)];
let tree = build_tree(&edge_list);
println!("{:#?}", tree);
}
Node {
id: 1,
children: [
Node {
id: 2,
children: [
Node {
id: 4,
children: [
Node {
id: 8,
children: [],
},
],
},
Node {
id: 5,
children: [],
},
],
},
Node {
id: 3,
children: [
Node {
id: 6,
children: [],
},
Node {
id: 7,
children: [],
},
],
},
],
}
Note that we don't really perform any sanity checks on the input. This algorithm will crash with an infinite recursion if the input contains loops, for example.
There are of course ways to correct for that, but that is out of scope for this question, in my opinion.