rusttreeedges

Tree from edge list


I am trying to get from an (potentially) unordered edge list

let edge_list: Vec<(i32, i32)> = vec![(2, 4), (1, 2), (1, 3), (2, 5), (3, 6), (3, 7), (4, 8)];

to a "json tree"

Node { 
    id: 1, 
    children: vec![
        Node { 
            id: 2, 
            children: vec![
                Node { 
                    id: 4, 
                    children: vec![
                        Node {
                            id: 8,
                            children: vec![],
                        }
                    ]
                }, 
                Node { 
                    id: 5, 
                    children: vec![],
                }
            ] 
        }, 
        Node { 
            id: 3, 
            children: vec![
                Node { 
                    id: 6, 
                    children: vec![],
                }, 
                Node { 
                    id: 7, 
                    children: vec![],
                }
            ] 
        }
    ] 
}

before serialization

struct Node {
   id: i32,
   children: Vec<Node>
}

I've tried a bunch of stuff but they end up being extremely inefficient. Can anyone help out?


Solution

  • I guess you are talking about huge trees, because for the size of your example, any algorithm is sufficient ;)

    But for huge numbers, the only thing that really matters is asymptotic time complexity, normally expressed in Big-O notation.

    It's kinda complicated to calculate, so I won't go into detail. Just know that if you manage to write an algorithm that manages to get in a smaller complexity class, you expect speedups in the order of thousands, depending on your problem size.

    Just as a quick rundown, those are the most important complexity classes in computer science:

    O(1) < O(log(n)) < O(n) < O(n*log(n)) < O(n^2) < O(n^k) < O(k^n)

    Now let's see how we can solve the problem.

    The simplest idea is to have a bunch of trees and for every element, search the child and the parent and connect them. That means for every item (n) we need to search through the already existing trees (n, because we have no ordering in the trees) and then search for the parent (n) and connect them (1). That would give us a complexity of n^2, which is terrible.

    The best idea I could come up with is following:

    Those steps together have the complexity n + n + n, which is the complexity class of n. Meaning, if we compare them at 1000 items, this algorithm should be roughly 1000x times faster than the primitive n^2 algorithm.

    Here is some code with it:

    use std::collections::{HashMap, HashSet};
    
    #[derive(Debug)]
    pub struct Node {
        pub id: i32,
        pub children: Vec<Node>,
    }
    
    fn build_children_map(edge_list: &[(i32, i32)]) -> HashMap<i32, Vec<i32>> {
        let mut children_map: HashMap<i32, Vec<i32>> = HashMap::new();
    
        for &(parent, child) in edge_list {
            children_map.entry(parent).or_default().push(child);
        }
    
        children_map
    }
    
    fn find_roots(edge_list: &[(i32, i32)]) -> Vec<i32> {
        let parents = edge_list
            .iter()
            .map(|(parent, _)| parent)
            .cloned()
            .collect::<HashSet<_>>();
    
        let children = edge_list
            .iter()
            .map(|(_, child)| child)
            .cloned()
            .collect::<HashSet<_>>();
    
        parents.difference(&children).cloned().collect()
    }
    
    fn build_node_recursively(id: i32, children_map: &HashMap<i32, Vec<i32>>) -> Node {
        let children = children_map
            .get(&id)
            .map(|children| {
                children
                    .iter()
                    .map(|child_id| build_node_recursively(*child_id, children_map))
                    .collect::<Vec<_>>()
            })
            .unwrap_or_default();
    
        Node { id, children }
    }
    
    pub fn build_tree(edge_list: &[(i32, i32)]) -> Node {
        let children_map = build_children_map(edge_list);
    
        let [root]: [i32; 1] = find_roots(edge_list).as_slice().try_into().unwrap();
    
        build_node_recursively(root, &children_map)
    }
    
    fn main() {
        let edge_list: Vec<(i32, i32)> = vec![(2, 4), (1, 2), (1, 3), (2, 5), (3, 6), (3, 7), (4, 8)];
    
        let tree = build_tree(&edge_list);
        println!("{:#?}", tree);
    }
    
    Node {
        id: 1,
        children: [
            Node {
                id: 2,
                children: [
                    Node {
                        id: 4,
                        children: [
                            Node {
                                id: 8,
                                children: [],
                            },
                        ],
                    },
                    Node {
                        id: 5,
                        children: [],
                    },
                ],
            },
            Node {
                id: 3,
                children: [
                    Node {
                        id: 6,
                        children: [],
                    },
                    Node {
                        id: 7,
                        children: [],
                    },
                ],
            },
        ],
    }
    

    Note that we don't really perform any sanity checks on the input. This algorithm will crash with an infinite recursion if the input contains loops, for example.

    There are of course ways to correct for that, but that is out of scope for this question, in my opinion.