databaseb-tree

Order of B+ Tree


Looking at different sources, it seems to me that there are two definitions to B+ tree order. The first one is talking about number of entries per node, where the second one is talking about the number of children (pointers). Are they referring to the same definition? or maybe I couldn't see how they are related.

  1. (https://cs186berkeley.net/notes/note4/)

"The number d is the order of a B+ tree. Each node (with the exception of the root node) must have d ≤ x ≤ 2d entries assuming no deletes happen (it’s possible for leaf nodes to end up with < d entries if you delete data). The entries within each node must be sorted."

  1. (https://www.cs.cornell.edu/courses/cs3110/2012sp/recitations/rec25-B-trees/rec25.html)

"A B-tree of order m is a search tree in which each nonleaf node has up to m children. The actual elements of the collection are stored in the leaves of the tree, and the nonleaf nodes contain only keys. Each leaf stores some number of elements; the maximum number may be greater or (typically) less than m."


Solution

  • First, note that this topic is the same for B+ trees as for B trees. In fact, the second quote deals with B trees. But that is not relevant for the definition of the term order.

    Are they referring to the same definition? or maybe I couldn't see how they are related.

    No, they use a different definition.

    Wikipedia's article on B-tree has a paragraph covering this difference in terminology:

    The literature on B-trees is not uniform in its terminology.

    Bayer and McCreight (1972), Comer (1979), and others define the order of B-tree as the minimum number of keys in a non-root node. Folk and Zoellick points out that terminology is ambiguous because the maximum number of keys is not clear. An order 3 B-tree might hold a maximum of 6 keys or a maximum of 7 keys. Knuth (1998) avoids the problem by defining the order to be the maximum number of children (which is one more than the maximum number of keys).

    So the first quote you made, uses the Bayer and McCreight's definition (and "x ≤ 2d" implies that the maximum number of keys is always even), while the second quote uses Knuth's definition.

    Unrelated to your question, but the remark in the first quote that "it’s possible for leaf nodes to end up with < d entries if you delete data" assumes a "lazy" variant. In that variant the tree could become relatively inefficient when a lot is deleted from it. The usual procedure is to rebalance the tree after deletion (Deletion from a leaf node), which may involve merging nodes. This way the constraint on the minimum number of keys is maintained throughout the tree.