sqldatabasedata-structuresrelational-databasespace-partitioning

How can I store a binary space partitioning tree in a relational database?


I'm trying to store the data in a binary space partitioning tree in a relational database. The tricky part about this data structure is it has two different types of nodes. The first type, which we call a data node, simply holds a certain number of items. We define the maximum number of items able to be held as t. The second type, which we refer to as a container node, holds two other child nodes. When an item is added to the tree, the nodes are recursed until a data node is found. If the number of items in the data node are less than t, then the item is inserted into the data node. Otherwise the data node is split into two other data nodes, and is replaced by one of the container nodes. When an element is deleted, a reverse process must happen.

I'm a little bit lost. How am I supposed to make this work using a relational model?


Solution

  • Why not have two tables, one for nodes and one for items? (Note that I used the term "leaf" instead of "data" nodes below when I wrote my answer; a "leaf" node has data items, a non-"leaf" node contains other nodes.)

    The node table would have columns like this: id primary key, parentid references node, leaf boolean and in addition some columns to describe the spatial bounaries of the node and how it will/has been split. (I don't know if you're working in 2D or 3D so I haven't given details on the geometry.)

    The data table would have id primary key, leafid references node and whatever data.

    You can traverse the tree downward by issuing SELECT * FROM node WHERE parentid = ?queries at each level and checking which child to descend into. Adding a data item to a leaf is a simple INSERT. Splitting a node requires unsetting the leaf flag, inserting two new leaf nodes, and updating all the data items in the node to point to the appropriate child node by changing their leafid values.

    Note that SQL round trips can be expensive, so if you're looking to use this for a real application, consider using a relatively large t in the DB constructing a finer-grained tree in memory of the leaves you are interested in after you have the data items.