python database lmdb arcticdb

Is there a way to specify library size when using arcticdb with lmdb?


I am working on a program that uses arcticdb with a local instance of lmdb. I want to create multiple libraries for different types of data. Because the size of the data will differ, I want the library sizes to differ too, but as best I can tell each library takes the map size given when the original database connection is created. A minimal reproduction of the problem:

from os import getcwd
from arcticdb import Arctic
# Define the Arctic instance; map_size applies to every library it creates
ac = Arctic("lmdb://" + getcwd() + "/database?map_size=2GB")
# Create libraries
ac.create_library('test1')
ac.create_library('test2')

This will result in the following structure:

database
|_arctic_cfg
|_test1
| |_data.mdb ~2GB
| |_lock.mdb
|_test2
  |_data.mdb ~2GB
  |_lock.mdb

As can be seen, each library takes the original map size, whereas in my program I want some libraries to be e.g. 100MB, others 20MB, and so on. How can I achieve that?

I looked through the documentation and could not find anything on this. One approach might be a master class holding multiple Arctic instances, but as I understand it they would then connect to different arctic_cfg folders and be tricky to work with.


Solution

  • There's no mechanism to do this with ArcticDB currently.

    The behaviour does depend on your operating system. On Windows, the disk space is allocated "eagerly", so the two libraries in your post would indeed take up 2GB of disk space each. On Linux, however, the disk space is allocated lazily, so each of your libraries would use only the space it needs.

    The workaround you suggest of using a separate Arctic instance for each library size you need is sensible.

    Please feel free to raise an issue against ArcticDB for this as I agree that it would be useful to specify the map size during the get_library call instead of having it at the Arctic level.
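
The workaround of one Arctic instance per size can be sketched roughly as follows. This is only an illustration, not an ArcticDB feature: the group names, the `database_<group>` directory layout, and the helpers `lmdb_uri` and `open_stores` are all made up for this example. It assumes the `map_size` URI parameter behaves as in the question, and that `get_library` accepts `create_if_missing=True`.

```python
from pathlib import Path

# Hypothetical grouping: one LMDB root directory (and map size) per data category.
MAP_SIZES = {"large": "2GB", "medium": "100MB", "small": "20MB"}


def lmdb_uri(base_dir, group, map_size):
    """Build an LMDB URI whose root directory is specific to `group`."""
    root = Path(base_dir) / f"database_{group}"
    return f"lmdb://{root.as_posix()}?map_size={map_size}"


def open_stores(base_dir, map_sizes):
    """Return a dict of Arctic instances, one per group, each with its own map size."""
    # Imported lazily so that lmdb_uri can be used without arcticdb installed.
    from arcticdb import Arctic
    return {group: Arctic(lmdb_uri(base_dir, group, size))
            for group, size in map_sizes.items()}


# Example usage (requires arcticdb):
#   stores = open_stores(Path.cwd(), MAP_SIZES)
#   lib = stores["small"].get_library("test2", create_if_missing=True)
```

Each instance keeps its own arctic_cfg folder, so a library has to be looked up through the instance for its group; the dictionary above is one simple way to keep that lookup in a single place.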
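
To see whether the map size is really being allocated on your system, you can compare a data.mdb file's apparent size with the space actually allocated for it. The helper below is a minimal sketch using only the standard library; `disk_usage` is a name invented here, and it relies on `st_blocks`, which is reported on POSIX systems but not on Windows (where the sketch falls back to the apparent size).

```python
import os


def disk_usage(path):
    """Return (apparent_bytes, allocated_bytes) for the file at `path`."""
    st = os.stat(path)
    # st_blocks counts 512-byte units on POSIX systems. It is absent on
    # Windows, in which case we fall back to the apparent size.
    blocks = getattr(st, "st_blocks", None)
    allocated = blocks * 512 if blocks is not None else st.st_size
    return st.st_size, allocated


# Example usage:
#   apparent, allocated = disk_usage("database/test1/data.mdb")
#   print(f"apparent={apparent} bytes, allocated={allocated} bytes")
```

If the allocated figure is much smaller than the apparent one, the file is sparse and the ~2GB shown in a directory listing is not actually being consumed on disk.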