databasenosqlberkeley-dbleveldbrocksdb

Choosing the right DBM-like C++ library for sequential data


I am trying to choose a database for a newly developing application. There are so many alternatives and it’s so easy to choose a wrong one. First of all, there is a requirement to not use database servers. A required database should be a static or dynamic C++ library. The data that needs to be stored is an array of records. They vary but are fixed for a given dataset (so they can be stored in a table). The information in each row could be from several hundred bytes up to several megabytes. And a number of rows may be millions for now and expected to grow.

The index of the row could be used as a key. No need to maintain a separate key column.

Data is inserted sequentially. Read access will be performed only by iterating all the data or some segment of it sequentially (May need to iterate with steps like each 5th).

  1. I don’t think that relational DBs are good feet for many reasons. a. They are mostly server-based. I know about SQLite but as far as I know, it stores data in one file which I assume may lead to issues related to maximum file size. b. We don’t need the power that SQL provides instead we would like to have more flexibility in stored data types.
  2. There are Key/Value non-SQL dbms like BerkeleyDB, RocksDB, or something like luxio for lighter alternatives. The functionality they provide is more than enough for the task. And this might be the right choice however I don’t know how well they are optimized for such case where we have continuous integer keys. The associative key access (which is not required for us) may have some overhead in performance.
  3. I know there are some type of non-SQL databases called “wide-column” which I am not familiar with. However, the name sounds like it is perfect for our task. All databases I can find are server of claud based. If you know dbm-like library for such type of database please advise. I am not experienced in databases so please correct me if I am wrong in any of 3 above stamens.

Solution

  • If your row data can grow to megabytes, and you're talking about only millions of records, maybe just figure out a way to lay it out in a filesystem? If you need a more database-like index, use SQLite for the keys, and have the data records point to a location on the filesystem. This kind of thing will be far quicker to implement and get right than trying to do it all in one giant database.