c++treecontiguous

Memory continuity and search tree


I'm trying to build a search tree on top of my results. Some kind of k-ary tree with n leafs at the end. I'm looking for a C++ solution and experimenting with std::vector but can't get it done as I need memory consistency. It could be done by nested vectors but I can't do that.

Let me explain the details by example:

A unsorted result could be

Result R = { 4, 7, 8, 3, 1, 9, 0, 2, 2, 9, 6 }

On top of that I need a tree with nodes which in my specific problem are centroids. But to keep it simple I will use artificial values here.

I define the search tree dimensions as

Height H = 2
Branch B = 3

The tree at first

4 7 8 3 1 9 0 2 2 9 6

Second step

layer_0          1.6    5      8.2
                   |    |        |
         +-+-+-+-+-+  +-+  +-+-+-+
         | | | | | |  | |  | | | |
layer_1  3 1 0 2 2 3  4 6  7 8 9 9

Last step

layer_0                 1.6          5           8.2
                          |          |             |
                  +---+---+    +-+---+    +---+----+
layer_1         0.8 1.6 2.4  4.2 5 5.8  6.4 8.2  8.4
                  |       |    |     |    |   |    |
                +-+   +-+-+    |     |    |   |  +-+
layer_2         1 0   2 2 3    4     6    7   8  9 9

This last tree is not a k-ary tree as the end-leafs sizes are 0 <= size <= |R|.

At this moment I'm experimenting with two vectors.

std::vector<size_t> layer_2;

std::vector<float> leafs;

std::size_t width, height;

With help of width and height it would be possible to navigate through leafs. But I'm questioning myself how to elegantly connect leafs and layer_2?

How would a good solution look like?


Solution

  • Note: This solution is continuous in the sense that it uses a contiguous data structure (vector or array) instead of a Node pointer style tree, but has the potential to have unused space within the data structure depending on the application.

    A case where this approach wastes a lot of space: The max amount of branches per node is large but most nodes actually have far fewer children. This will have no effect on how long it takes to find the leafs though. In fact it is a trade off to make that bit quite fast.

    consider a 3 branch tree with 4 levels in continuous memory: R,a,b,c,aa,ab,ac,ba,bb,bc,ca,cb,cc,aaa,aab,aac,aba,abb,abc,baa.... where the index of a node's children ranges from (parent_index*3)+1 to (parent_index*3)+3

    The important caveat I alluded to is that every node must always have it's three child spaces in the vector, array, whatever. If a node has say only 2 children, just fill that extra space with a null_child value to hold the space. (this is where the wasted space comes from)

    The advantage is that now, finding all of the leaves is easy.

    first_leaf_index = 0
    for(i=0;i<(4-1);i++)//in this example 4 is the depth
        first_leaf_index += 3^(i) //three is max branches per node
    

    At that point just iterate to the end of the data structure