pythonmmaprocksdbdata-serialization

Best way to load a mmap dictionary in Python without deserializing


Context:

I've got Python processes running on the same container and I want to be able to share a read-only key-value object between them.

I'm aware I could use something like Redis to share that info, but I'm looking for optimal solution in regards to latency as well as memory usage.

My idea was to generate a binary object on disk and open that file using mmap

Question: That brings me to my question, is there a binary format or library that would load a read-only file in ram an offer a dictionary interface, without the need to deserialize the file content? This way I could mmap the file in every process, all process would be re-using the same RAM for that file and I would be able to access the file's content with a dict-like interface?

I'm looking a file/object format for dictionaries, similar to what Parquet is to columnar storage, that could be consumed in read-only mode by python.


Solution

  • Since your data is read-only, and you do not need to iterate over keys in lexicographic order: use constant database cdb. They are bindings for python. You might want to look into an easy grab in the stdlib called dbm.