jena indices tdb

What indices does Jena TDB2 use?


I am trying to find out what indices TDB2 builds. From the code I can see that it uses B+ trees to store them on disk, but I didn't understand what they contain or how they are used.

So my detailed questions are:

It would also help me if you could point me to a white paper or something similar about TDB2's software design. I searched for it but couldn't find anything.


Solution

  • TDB2 has an "id" for each RDF term (literals, URIs, blank nodes). The id is fixed length: 64 bits. Another way of saying it is that TDB2 keeps a dictionary.

    For triples it keeps three indexes: SPO, POS, and OSP (this is configurable, but that's the default). A triple is stored in an index as those ids, so 3 ids per triple, which is again fixed length.

    Indexes are memory-mapped files, held outside the Java heap, by default. This gives good usability.

    That's the current default setup. The code isolates these choices so they can change, e.g. the 64-bit ids could be made longer, or different index choices could be made.
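
To make the layout described above concrete, here is a minimal in-memory sketch in Java. It is not TDB2's code: the class and method names (`TripleIndexSketch`, `idFor`, `subjectsOf`) are hypothetical, ids are handed out by a simple counter (an assumption for illustration only), and a sorted `TreeSet` stands in for the on-disk B+ tree. It only shows the idea of a dictionary plus three orderings of the same id tuples, and why a pattern with a bound predicate and object can be answered by a range scan on POS.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Illustrative sketch only: hypothetical names, not Jena/TDB2 classes.
class TripleIndexSketch {
    // Dictionary: RDF term <-> fixed-width 64-bit id (a Java long).
    // Assumption: ids come from a counter; a real store may assign them differently.
    private final Map<String, Long> termToId = new HashMap<>();
    private final List<String> idToTerm = new ArrayList<>();

    // Each index holds every triple as 3 ids, sorted in its own column order.
    // A TreeSet stands in for the on-disk B+ tree.
    private static final Comparator<long[]> BY_COLUMNS = (a, b) -> Arrays.compare(a, b);
    private final TreeSet<long[]> spo = new TreeSet<>(BY_COLUMNS);
    private final TreeSet<long[]> pos = new TreeSet<>(BY_COLUMNS);
    private final TreeSet<long[]> osp = new TreeSet<>(BY_COLUMNS);

    // Return the id for a term, allocating a new one on first sight.
    long idFor(String term) {
        Long id = termToId.get(term);
        if (id == null) {
            id = (long) idToTerm.size();
            termToId.put(term, id);
            idToTerm.add(term);
        }
        return id;
    }

    // Store one triple: the same 3 ids go into all three indexes, permuted.
    void add(String s, String p, String o) {
        long si = idFor(s), pi = idFor(p), oi = idFor(o);
        spo.add(new long[] {si, pi, oi});
        pos.add(new long[] {pi, oi, si});
        osp.add(new long[] {oi, si, pi});
    }

    // Pattern (?s, p, o): the bound terms form a prefix of POS,
    // so all matches are contiguous and a range scan finds them.
    List<String> subjectsOf(String p, String o) {
        List<String> out = new ArrayList<>();
        Long pi = termToId.get(p), oi = termToId.get(o);
        if (pi == null || oi == null) {
            return out; // unknown term: nothing can match
        }
        long[] low = {pi, oi, Long.MIN_VALUE};
        long[] high = {pi, oi, Long.MAX_VALUE};
        for (long[] entry : pos.subSet(low, true, high, true)) {
            out.add(idToTerm.get((int) entry[2])); // third column of POS is S
        }
        return out;
    }

    public static void main(String[] args) {
        TripleIndexSketch store = new TripleIndexSketch();
        store.add("ex:alice", "ex:knows", "ex:bob");
        store.add("ex:carol", "ex:knows", "ex:bob");
        store.add("ex:alice", "ex:name", "\"Alice\"");
        System.out.println(store.subjectsOf("ex:knows", "ex:bob")); // [ex:alice, ex:carol]
    }
}
```

The same reasoning applies to the other two orderings: a bound subject (with or without a predicate) is a prefix of SPO, and a bound object (with or without a subject) is a prefix of OSP, so each common pattern has some index where its matches sit next to each other.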