data-structurestreeb-treeb-plus-tree

Should data in B+Tree be ordered and how much data should I be loading at a time?


Within my B+Tree I have ordered keys at the leaf levels, with data pointers to a separate data file. This question refers to the order of data within this data file.

As I see it, When ordered:

when not ordered:

Alternatively, should I just load entire nodes into memory when I need to access data from an entry within that node?

Looking for a second opionion on the best way I should be storing data (ordered/unordered) and how many data pointers should I be loading when performing a simple "get" for one value?

Thanks!


Solution

  • Alternatively, should I just load entire nodes into memory when I need to access data from an entry within that node?

    Absolutely. That is the idea behind the B-tree family of data structures. It is not intended for reading/writing partial nodes (blocks) from/to slow storage. When a node is needed, it is read in its entirety into memory (if not already loaded), manipulated, and written back entirely to persist it.

    As the manipulation of the data in the node itself happens in memory, the choice of whether to keep the node's content sorted or not is of less importance. The read/write operations to the slow(er) storage will be much more determining for the overall performance.

    Making considerations of the time complexity of either choice are irrelevant too, since there is a preset maximum to the number of keys in a node. So all single-node operations can be considered to have a constant time complexity.

    You could time the different implementations and see if that leads to a clear choice. This choice will depend on the degree of the B-tree.