[SOLVED] Traversing BerkleyDB database in store order

Traversing BerkleyDB database in store order

When using cursors in BerkleyDB JE I found that traversing a dataset generate a lot of random read IO. It happens because BDB traverse dataset in primary key ascending order.

In my application I have not any requirements to process dataset in order (mathematically speaking, my operation is commutative) and I interested in maximizing throughput.

Is there any way to process dataset with cursor in store order and not in primary key order.

Solution

I would guess not ; BDBJE is a log-structured database - ie, all writes are appended to the end of a log. This means that records are always appended to the last log, and may supercede records in previous logs. Because BDBJE cannot by design write to old logs, it cannot mark old records as superceded, so you cannot walk forward through the storage processing records because you are unaware of whether the record is current without having processed records from later in the log.

BDBJE will clean old logs as their "live" record count diminishes by copying the live records forward into new logs and deleting the old files, which shuffles the ordering yet more.

I found the Java binding of Kyoto Cabinet to be faster than BDB for raw insert performance, and you have a choice of storage formats, which may allow you to optimize your cursor-ordered record traverse performance. The license is similar (Kyoto Cabinet is GPL3, BDB is the Oracle BDB License (copyleft)) unless you pay for a commercial license in either case.

Update : As of version 5.0.34, BDBJE includes the DiskOrderedCursor class which addresses the required use case - it traverses records in log sequence, which in an unfragmented log file should be the same as disk order.