Tree from edge list

I am trying to get from an (potentially) unordered edge list

let edge_list: Vec<(i32, i32)> = vec![(2, 4), (1, 2), (1, 3), (2, 5), (3, 6), (3, 7), (4, 8)];

to a "json tree"

Node { 
    id: 1, 
    children: vec![
        Node { 
            id: 2, 
            children: vec![
                Node { 
                    id: 4, 
                    children: vec![
                        Node {
                            id: 8,
                            children: vec![],
                        }
                    ]
                }, 
                Node { 
                    id: 5, 
                    children: vec![],
                }
            ] 
        }, 
        Node { 
            id: 3, 
            children: vec![
                Node { 
                    id: 6, 
                    children: vec![],
                }, 
                Node { 
                    id: 7, 
                    children: vec![],
                }
            ] 
        }
    ] 
}

before serialization

struct Node {
   id: i32,
   children: Vec<Node>
}

I've tried a bunch of stuff but they end up being extremely inefficient. Can anyone help out?

Solution

I guess you are talking about huge trees, because for the size of your example, any algorithm is sufficient ;)

But for huge numbers, the only thing that really matters is asymptotic time complexity, normally expressed in Big-O notation.

It's kinda complicated to calculate, so I won't go into detail. Just know that if you manage to write an algorithm that manages to get in a smaller complexity class, you expect speedups in the order of thousands, depending on your problem size.

Just as a quick rundown, those are the most important complexity classes in computer science:

O(1) < O(log(n)) < O(n) < O(n*log(n)) < O(n^2) < O(n^k) < O(k^n)

O(1) means that the calculation always takes the same duration, no matter how large the input is
O(k^n) means it scales so badly that usually you can't compute more than 10-11 items
everything else in between is where most problems lie in. For example, sorting an array (via comparisons) falls under n*log(n).

Now let's see how we can solve the problem.

The simplest idea is to have a bunch of trees and for every element, search the child and the parent and connect them. That means for every item (n) we need to search through the already existing trees (n, because we have no ordering in the trees) and then search for the parent (n) and connect them (1). That would give us a complexity of n^2, which is terrible.

The best idea I could come up with is following:

Build a map of parent -> children mappings. This requires iterating through all pairs (n) and inserting each of it in a HashMap (1, amortized); bringing this step to a complexity of n.
Find all root nodes by collecting all parents in a HashSet (n), collect all children (n) and then take the difference (n). Every node that is a parent but not a child is a root node. This step has a complexity of n + n + n, which is still n.
Recurse through the parent-child map (n) to know the order in which we have to create children, and then create a node for each (1) followed by adding it to its parent (1). This step therefore has the complexity n.

Those steps together have the complexity n + n + n, which is the complexity class of n. Meaning, if we compare them at 1000 items, this algorithm should be roughly 1000x times faster than the primitive n^2 algorithm.

Here is some code with it:

use std::collections::{HashMap, HashSet};

#[derive(Debug)]
pub struct Node {
    pub id: i32,
    pub children: Vec<Node>,
}

fn build_children_map(edge_list: &[(i32, i32)]) -> HashMap<i32, Vec<i32>> {
    let mut children_map: HashMap<i32, Vec<i32>> = HashMap::new();

    for &(parent, child) in edge_list {
        children_map.entry(parent).or_default().push(child);
    }

    children_map
}

fn find_roots(edge_list: &[(i32, i32)]) -> Vec<i32> {
    let parents = edge_list
        .iter()
        .map(|(parent, _)| parent)
        .cloned()
        .collect::<HashSet<_>>();

    let children = edge_list
        .iter()
        .map(|(_, child)| child)
        .cloned()
        .collect::<HashSet<_>>();

    parents.difference(&children).cloned().collect()
}

fn build_node_recursively(id: i32, children_map: &HashMap<i32, Vec<i32>>) -> Node {
    let children = children_map
        .get(&id)
        .map(|children| {
            children
                .iter()
                .map(|child_id| build_node_recursively(*child_id, children_map))
                .collect::<Vec<_>>()
        })
        .unwrap_or_default();

    Node { id, children }
}

pub fn build_tree(edge_list: &[(i32, i32)]) -> Node {
    let children_map = build_children_map(edge_list);

    let [root]: [i32; 1] = find_roots(edge_list).as_slice().try_into().unwrap();

    build_node_recursively(root, &children_map)
}

fn main() {
    let edge_list: Vec<(i32, i32)> = vec![(2, 4), (1, 2), (1, 3), (2, 5), (3, 6), (3, 7), (4, 8)];

    let tree = build_tree(&edge_list);
    println!("{:#?}", tree);
}

Node {
    id: 1,
    children: [
        Node {
            id: 2,
            children: [
                Node {
                    id: 4,
                    children: [
                        Node {
                            id: 8,
                            children: [],
                        },
                    ],
                },
                Node {
                    id: 5,
                    children: [],
                },
            ],
        },
        Node {
            id: 3,
            children: [
                Node {
                    id: 6,
                    children: [],
                },
                Node {
                    id: 7,
                    children: [],
                },
            ],
        },
    ],
}

Note that we don't really perform any sanity checks on the input. This algorithm will crash with an infinite recursion if the input contains loops, for example.

There are of course ways to correct for that, but that is out of scope for this question, in my opinion.