functional-programmingocamlbinary-search-tree

The correct way to build a Binary Search Tree in OCaml


Ok, I have written a binary search tree in OCaml.

type 'a bstree = 
    |Node of 'a * 'a bstree * 'a bstree
    |Leaf


let rec insert x = function
    |Leaf -> Node (x, Leaf, Leaf)
    |Node (y, left, right) as node -> 
        if x < y then
            Node (y, insert x left, right)
        else if x > y then
            Node (y, left, insert x right)
        else
            node

The above code was said to be good in The right way to use a data structure in OCaml

However, I found a problem. This insert will only work when building a bst from a list in one go, such as

let rec set_of_list = function
     [] > empty
   | x :: l > insert x (set_of_list l);;

So if we build a bst from a list continuously, no problem, we can get a complete bst which has all nodes from the list.

However, if I have a bst built previously and now I wish to insert a node, then the resulting bst won't have complete nodes from the previous tree, am I right?

Then how should I write a bst in OCaml so that we create a new bst with all nodes from previous tree to keep the previous tree immutable? If every time I need to copy all nodes from old bst, will that impact the performance?


Edit:

So let's say initially, a bst is created with one node t1 = (10, Leaf, Leaf).

then I do let t2 = insert 5 t1, then I get t2 = (10, (5, Leaf, Leaf), Leaf), right? inside t2, let's give a variable c1 to the child node (5, Leaf, Leaf)

then I do let t5 = insert 12 t2, then I get t3 = (10, (5, Leaf, Leaf), (15, Leaf, Leaf)). let's give a variable c2 to the child node (5, Leaf, Leaf)

So my question is whether c1 == c2? Are the two (5, Leaf, Leaf)s in t2 and t3 exactly the same?


Solution

  • Look at the accepted answer to the linked question. To be specific this line here:

    let tree_of_list l = List.fold_right insert l Leaf
    

    Work out the chain of what is happening. Take the list 1,2,3.

    First we have no tree and the result of insert 1 Leaf.

    call this T1

    Next is the tree generated by insert 2 T1

    call this T2

    Then the tree generated by insert 3 T2

    This is what is returned as the result of Tree_of_list.

    If we call the result T3 then somewhere else in code call insert 4 T3 there is no difference in the result returned from insert than in calling Tree_of_list with the list 1,2,3,4.