berkeley-db-je

Berkeley DB File compression


This is in conjunction with my previous question. We are using Berkeley DB for temporary storage before the data is processed and stored in a relational DB. The problem arises when the size grows beyond a certain point. At that point we have to either split the files into smaller ones or compress the existing files. In this question I want to ask about the compression part: does Berkeley DB have any built-in compression utility, or do we have to do it programmatically? If it is built in, it will presumably be faster than doing it ourselves.


Solution

  • From here:

    According to the Berkeley DB FAQ there are two ways of optimizing it before you resort to compression (a rough code sketch follows the list):

    1. Compact
    2. Vacuum
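
    For the core (C-based) Berkeley DB, both of these map onto Database.compact() in the com.sleepycat.db Java binding: plain compaction reorganizes under-filled B-tree pages in place, while the vacuum-like behavior (setFreeSpace) additionally truncates freed pages off the end of the file. A minimal sketch; the environment home and file name are invented for illustration:

    ```java
    import com.sleepycat.db.*;

    import java.io.File;

    public class CompactSketch {
        public static void main(String[] args) throws Exception {
            EnvironmentConfig envConfig = new EnvironmentConfig();
            envConfig.setAllowCreate(true);
            envConfig.setInitializeCache(true);
            Environment env = new Environment(new File("/tmp/bdb-env"), envConfig);

            DatabaseConfig dbConfig = new DatabaseConfig();
            dbConfig.setType(DatabaseType.BTREE);
            dbConfig.setAllowCreate(true);
            Database db = env.openDatabase(null, "staging.db", null, dbConfig);

            // 1. Compact: re-organize under-filled B-tree pages in place.
            CompactConfig compact = new CompactConfig();
            compact.setFillPercent(80); // target page fill factor
            db.compact(null, null, null, null, compact);

            // 2. Vacuum-like: also return freed pages at the end of the
            //    file to the filesystem.
            CompactConfig vacuum = new CompactConfig();
            vacuum.setFreeSpace(true);
            CompactStats stats = db.compact(null, null, null, null, vacuum);
            System.out.println("pages truncated: " + stats.getPagesTruncated());

            db.close();
            env.close();
        }
    }
    ```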

    You can also implement your own compression algorithm as shown here.
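
    If you end up doing it programmatically with Berkeley DB Java Edition (this question's tag), one straightforward approach is to deflate each value with the JDK's java.util.zip before putting it, and inflate it after each get. A minimal sketch, assuming an already-open JE Database handle; the class and method names are made up:

    ```java
    import com.sleepycat.je.Database;
    import com.sleepycat.je.DatabaseEntry;

    import java.io.ByteArrayOutputStream;
    import java.util.zip.Deflater;
    import java.util.zip.Inflater;

    public final class ZipValues {

        // Deflate a value before storing it.
        static byte[] compress(byte[] raw) {
            Deflater deflater = new Deflater(Deflater.BEST_SPEED);
            deflater.setInput(raw);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream(raw.length);
            byte[] buf = new byte[4096];
            while (!deflater.finished()) {
                out.write(buf, 0, deflater.deflate(buf));
            }
            deflater.end();
            return out.toByteArray();
        }

        // Inflate a value read back from the database.
        static byte[] decompress(byte[] packed) throws Exception {
            Inflater inflater = new Inflater();
            inflater.setInput(packed);
            ByteArrayOutputStream out = new ByteArrayOutputStream(packed.length * 4);
            byte[] buf = new byte[4096];
            while (!inflater.finished()) {
                out.write(buf, 0, inflater.inflate(buf));
            }
            inflater.end();
            return out.toByteArray();
        }

        // Store a record with a compressed value; 'db' is an open JE Database.
        static void putCompressed(Database db, byte[] key, byte[] value) {
            db.put(null, new DatabaseEntry(key), new DatabaseEntry(compress(value)));
        }
    }
    ```

    This only pays off when values are large or redundant enough to deflate well; for small records the per-value overhead can make the files bigger, not smaller.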

    How different is the Berkeley DB VACUUM from SQLite's?

    SQLite implements the VACUUM command as a database dump followed by a complete reload from that dump. It is an expensive operation, locking the entire database for the duration of the operation. It is also an all-or-nothing operation. Either it works, or it fails and you have to try again sometime. When SQLite finishes, the database is frequently smaller in size (file size is smaller) and the btree is better organized (shallower) than before due to in-order key insertion of the data from the dump file. SQLite, when it works and when you can afford locking everyone out of the database, does a good job of VACUUM.

    Berkeley DB approaches this in a completely different way. For many releases now Berkeley DB's B-Tree implementation has had the ability to compact while other operations are in-flight. Compacting is a process wherein the B-Tree nodes are examined and, when less than optimal, they are re-organized (reverse split, etc.). The more shallow your B-Tree, the fewer lookups required to find the data at a leaf node. Berkeley DB can compact sections of the tree, or the whole tree at once. For 7x24x365 (five-nines) operation this is critical. The BDB version of compact won't adversely impact ongoing database operations whereas SQLite's approach does.

    But compaction doesn't address empty sections of the database (segments of the database file where deleted data once lived). Berkeley DB also supports compression of database files by moving data within the file, then truncating the file, returning that space to the filesystem. As of release 5.1 of Berkeley DB, the VACUUM command will compact and compress the database file(s). This operation takes more time than the dump/load approach of SQLite because it is doing more work to allow for the database to remain operational. We believe this is the right trade-off, but if you disagree you can always dump/load the database in your code.
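
    Note that the quoted FAQ describes the core (C-based) Berkeley DB and its SQL layer. Berkeley DB Java Edition, which this question's tag refers to, is log-structured: dead space in its .jdb log files is reclaimed by the background cleaner rather than by a compact/vacuum call. You can tune the cleaner or drive it by hand; the sketch below uses the documented cleanLog/checkpoint pattern, with the environment path invented for illustration:

    ```java
    import com.sleepycat.je.CheckpointConfig;
    import com.sleepycat.je.Environment;
    import com.sleepycat.je.EnvironmentConfig;

    import java.io.File;

    public class ForceClean {
        public static void main(String[] args) {
            EnvironmentConfig envConfig = new EnvironmentConfig();
            envConfig.setAllowCreate(true);
            // Clean a log file once its live data drops below 50%.
            envConfig.setConfigParam("je.cleaner.minUtilization", "50");
            Environment env = new Environment(new File("/tmp/je-env"), envConfig);

            // Each cleanLog() call cleans eligible log files; loop until none remain.
            while (env.cleanLog() > 0) {
                // keep cleaning
            }

            // A forced checkpoint allows the cleaned .jdb files to be deleted.
            CheckpointConfig ckpt = new CheckpointConfig();
            ckpt.setForce(true);
            env.checkpoint(ckpt);

            env.close();
        }
    }
    ```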