
How to write (long) integer values to Berkeley DB using bsddb3?

I am trying to use Berkeley DB to store a frequency table (i.e. a hash table with string keys and integer values). The table will be written, updated, and read from Python, so I am currently experimenting with bsddb3. It looks like it will do most of what I want, except that it appears to support only string values?

If I understand correctly, Berkeley DB supports any kind of binary key and value. Is there a way to efficiently pass raw long integers into and out of Berkeley DB using bsddb3? I know I can convert the values to and from strings, and this is probably what I will end up doing, but is there a more efficient way, i.e. storing 'raw' integers?
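A minimal sketch of the "raw integer" idea, assuming every count fits in a signed 64-bit integer: the standard `struct` module packs each value into exactly 8 bytes, and those bytes can be stored as the Berkeley DB value. The helper names `encode_count` and `decode_count` are hypothetical. With bsddb3 the bytes would be handed to something like `DB.put(key, value)` and read back with `DB.get(key)`; that part is not exercised here.

```python
import struct

# Hypothetical fixed-width codec: '<q' = little-endian signed 64-bit integer.
# Assumption: every count lies in the range [-2**63, 2**63 - 1];
# struct.error is raised for anything outside it.
_COUNT = struct.Struct("<q")


def encode_count(n):
    """Pack an integer into exactly 8 raw bytes for use as a DB value."""
    return _COUNT.pack(n)


def decode_count(raw):
    """Unpack 8 raw bytes (as read back from the DB) into an integer."""
    return _COUNT.unpack(raw)[0]
```

The explicit `<` byte order keeps the file readable on machines with a different native endianness. Note that 8 bytes is not always smaller than a decimal string (a count of `5` is one character as text); the main gains are a fixed value size and skipping `int()`/`str()` parsing on every update.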

Background: I am currently working with a large frequency table (potentially tens, if not hundreds, of millions of keys). It is currently implemented as a Python dictionary, but I abort the script when it starts to swap into virtual memory. Yes, I looked at Redis, but it keeps the entire database in memory. So I'm about to try Berkeley DB. I should be able to speed up creation with short-term in-memory caching, i.e. accumulating counts in an in-memory Python dictionary and periodically merging it into the master Berkeley DB frequency table.
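The caching scheme described above could look roughly like this sketch. `BufferedCounter` is a hypothetical helper, and `store` is assumed to be any mapping of bytes to bytes that supports `store.get(key)` (returning `None` for a missing key) and `store[key] = value`; a plain dict stands in here, with a bsddb3 `DB` handle as the intended target. Values use the same fixed 64-bit encoding assumption as a `struct` `'<q'` format.

```python
import struct
from collections import Counter

# Fixed-width value encoding (assumption: counts fit in a signed 64-bit int).
_COUNT = struct.Struct("<q")


class BufferedCounter(object):
    """Hypothetical helper: count in memory, periodically merge into `store`."""

    def __init__(self, store, max_pending=1000000):
        self.store = store              # bytes -> bytes mapping (e.g. a DB handle)
        self.max_pending = max_pending  # flush once this many distinct keys are buffered
        self.pending = Counter()

    def add(self, key, n=1):
        self.pending[key] += n
        if len(self.pending) >= self.max_pending:
            self.flush()

    def flush(self):
        # One read-modify-write per buffered key, then empty the buffer.
        for key, delta in self.pending.items():
            k = key.encode("utf-8")
            old = self.store.get(k)
            total = delta if old is None else _COUNT.unpack(old)[0] + delta
            self.store[k] = _COUNT.pack(total)
        self.pending.clear()
```

Call `flush()` once more at the end of the run so the last partial buffer reaches the database. The benefit comes from collapsing many increments of a hot key into a single disk update per flush.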


  • Do you need to read the data back from a language other than Python? If not, you can just pickle the Python long integers and unpickle them when you read them back in. You can probably use the shelve module, which does this automatically. Even if not, you can pickle and unpickle the values manually.

    >>> import cPickle as pickle  # Python 2; on Python 3 use "import pickle"
    >>> n = 19999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999
    >>> s = pickle.dumps(n, pickle.HIGHEST_PROTOCOL)  # bytes to store as the DB value
    >>> pickle.loads(s) == n  # round trip restores the original long
    True
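For the shelve route, a minimal Python 3 sketch follows; the file name `freq` and the sample words are made up for illustration. A shelf pickles values automatically, so integers of any size go in directly, but keys must be strings. It uses whatever default `dbm` backend is available, which is not necessarily Berkeley DB.

```python
import os
import shelve
import tempfile

# Hypothetical on-disk location for the shelf.
path = os.path.join(tempfile.mkdtemp(), "freq")

with shelve.open(path) as freq:
    for word in ["spam", "eggs", "spam"]:
        # Values are pickled and unpickled transparently.
        freq[word] = freq.get(word, 0) + 1
    freq["big"] = 10 ** 40  # arbitrarily large ints work too

# Reopen to confirm the counts were persisted.
with shelve.open(path) as freq:
    counts = dict(freq)
```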