[SOLVED] What indices does Jena TDB2 use?

I am trying to find out which indices TDB2 builds. From the code I can see that it uses B+ trees to store them on disk, but I could not work out what they contain or how they are used.

So my detailed questions are:

For which collation orders of RDF triples (such as SPO, SOP, POS, PSO, ...) does it build indices?
How are RDF terms encoded and stored?
What strategy is used to load the indices into main memory? (I would expect paging.)

It would also help me if you could point me to a white paper or something similar about TDB2's software design. I searched for it but couldn't find anything.

Solution

TDB2 assigns an "id" to each RDF term (literals, URIs, blank nodes). The id has a fixed length of 64 bits. Another way of saying this is that it keeps a dictionary.
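
To make the dictionary idea concrete, here is a minimal Java sketch of a two-way mapping between terms and 64-bit ids. The class and method names are hypothetical and are not TDB2's API. Handing out ids from a simple counter is an assumption made for illustration; how TDB2 actually allocates ids is not covered in this answer.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; these are not TDB2 classes.
final class TermDictionarySketch {
    private final Map<String, Long> termToId = new HashMap<>();
    private final List<String> idToTerm = new ArrayList<>();

    /** Returns the existing id for a term, or allocates the next free one. */
    long idFor(String term) {
        Long existing = termToId.get(term);
        if (existing != null) {
            return existing;
        }
        long id = idToTerm.size(); // assumption: ids are a simple counter
        termToId.put(term, id);
        idToTerm.add(term);
        return id;
    }

    /** Looks up the term that was registered under the given id. */
    String termFor(long id) {
        return idToTerm.get((int) id);
    }
}
```

The point is only that every term, whatever its length, is replaced by a fixed-size number before it reaches an index.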

For triples it keeps three indexes: SPO, POS, and OSP (this is configurable, but that's the default). A triple is stored in each index as those ids, so an index entry is 3 ids per triple, fixed length.
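
As a rough illustration of why three orderings are useful, the Java sketch below stores each triple of ids in a sorted set per ordering and answers a pattern by range-scanning the ordering whose leading columns are bound. Everything here is hypothetical: the names, the in-memory `TreeSet` standing in for the on-disk B+ tree, the `ANY` wildcard, and the "longest bound prefix" selection rule. It is not presented as TDB2's actual lookup code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical illustration only; a TreeSet stands in for a B+ tree.
final class TripleIndexSketch {
    // Column order of each index: 0 = subject, 1 = predicate, 2 = object.
    static final int[] SPO = {0, 1, 2};
    static final int[] POS = {1, 2, 0};
    static final int[] OSP = {2, 0, 1};
    static final long ANY = -1; // wildcard; ids are assumed non-negative here

    private final int[] order;
    private final NavigableSet<long[]> entries =
            new TreeSet<long[]>((a, b) -> Arrays.compare(a, b));

    TripleIndexSketch(int[] order) {
        this.order = order;
    }

    /** Stores the triple with its ids permuted into this index's order. */
    void add(long s, long p, long o) {
        long[] t = {s, p, o};
        entries.add(new long[] {t[order[0]], t[order[1]], t[order[2]]});
    }

    /** Number of leading index columns that the pattern binds. */
    int boundPrefix(long s, long p, long o) {
        long[] t = {s, p, o};
        int n = 0;
        while (n < 3 && t[order[n]] != ANY) {
            n++;
        }
        return n;
    }

    /** Range scan over the bound prefix; results come back in S, P, O order. */
    List<long[]> find(long s, long p, long o) {
        long[] t = {s, p, o};
        int n = boundPrefix(s, p, o);
        long[] lo = new long[3];
        long[] hi = new long[3];
        for (int i = 0; i < 3; i++) {
            lo[i] = i < n ? t[order[i]] : 0L;
            hi[i] = i < n ? t[order[i]] : Long.MAX_VALUE;
        }
        List<long[]> out = new ArrayList<>();
        for (long[] e : entries.subSet(lo, true, hi, true)) {
            long[] spo = new long[3];
            for (int i = 0; i < 3; i++) {
                spo[order[i]] = e[i];
            }
            // Filter on bound terms that fall outside the scanned prefix.
            if ((s == ANY || spo[0] == s) && (p == ANY || spo[1] == p)
                    && (o == ANY || spo[2] == o)) {
                out.add(spo);
            }
        }
        return out;
    }

    /** Picks the index with the longest bound prefix for the pattern. */
    static TripleIndexSketch best(List<TripleIndexSketch> indexes,
                                  long s, long p, long o) {
        TripleIndexSketch best = indexes.get(0);
        for (TripleIndexSketch ix : indexes) {
            if (ix.boundPrefix(s, p, o) > best.boundPrefix(s, p, o)) {
                best = ix;
            }
        }
        return best;
    }
}
```

With SPO, POS, and OSP available, a pattern that binds only the predicate is served by a prefix scan on POS, and one that binds only the object by a prefix scan on OSP, instead of a full scan of SPO.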

Indexes are memory-mapped files, held outside the Java heap by default, so the operating system pages them in and out of memory on demand. This gives good usability.
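
For readers unfamiliar with memory mapping in Java, the sketch below writes fixed-length records of three 64-bit ids into a mapped temporary file and reads one back, using only standard `java.nio` calls. The class name, the flat record layout, and the temp file are assumptions for illustration; TDB2's real files are B+ trees with their own block layout, which this does not reproduce.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration only; a flat record file, not a B+ tree.
final class MappedRecordsSketch {
    static final int RECORD_BYTES = 3 * Long.BYTES; // three 64-bit ids

    /** Writes the triples through a mapping, then reads record `index` back. */
    static long[] roundTrip(long[][] triples, int index) {
        try {
            Path file = Files.createTempFile("index-sketch", ".dat");
            file.toFile().deleteOnExit();
            try (FileChannel ch = FileChannel.open(file,
                    StandardOpenOption.READ, StandardOpenOption.WRITE)) {
                // The mapped region lives outside the Java heap; the OS
                // decides which pages of it are resident in memory.
                MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_WRITE,
                        0, (long) triples.length * RECORD_BYTES);
                for (int i = 0; i < triples.length; i++) {
                    for (int j = 0; j < 3; j++) {
                        buf.putLong(i * RECORD_BYTES + j * Long.BYTES, triples[i][j]);
                    }
                }
                long[] out = new long[3];
                for (int j = 0; j < 3; j++) {
                    out[j] = buf.getLong(index * RECORD_BYTES + j * Long.BYTES);
                }
                return out;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because every record has the same size, the byte offset of record `i` is simply `i * RECORD_BYTES`, which is one practical benefit of fixed-length ids.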

That's the current default setup. The code isolates these choices so that they can be changed: for example, the 64-bit ids could be made longer, or different index choices could be made.