
Why does the shelve module in python sometimes create files with different extensions?


I'm running a Python program which uses the shelve module on top of pickle. After running this program sometimes I get one output file as a.data but at other times I get three output files as a.data.bak, a.data.dir and a.data.dat.

Why is that?


Solution

  • There is quite some indirection here. Follow me carefully.

    The shelve module is implemented on top of the dbm module. This module acts as a facade for 3(*) different specific DBM implementations, and when creating a new database it picks the first one available, in the following order:

    1. dbm.gnu, a Python module for the GNU DBM library; you would use it directly if you needed the extra functionality it offers over the base dbm module (it lets you iterate over the keys in stored order and 'pack' the database to free up space from deleted objects).
    2. dbm.ndbm, a proxy module using one of the ndbm, BSD DB, or GNU DBM libraries (chosen when Python is compiled).
    3. dbm.dumb, a pure-Python implementation.

    It is this range of choices that makes shelve files appear to grow extra extensions on different platforms.
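    To see which backend was picked on a given machine, something like the following sketch may help. It writes a small shelf into a temporary directory, lists the files that appeared, and asks dbm.whichdb() to identify the format. The base name "a.data" mirrors the question; the result depends on your Python version and on how it was built, so no particular output is assumed here.

```python
import dbm
import os
import shelve
import tempfile

# Create a throwaway shelf and look at what ended up on disk.
with tempfile.TemporaryDirectory() as tmp:
    base = os.path.join(tmp, "a.data")

    # shelve.open() delegates to dbm.open(), which chooses the backend.
    with shelve.open(base) as db:
        db["key"] = [1, 2, 3]

    # File names produced for this base name (one file or several).
    created = sorted(os.listdir(tmp))

    # whichdb() guesses the backend module name from the files it finds.
    backend = dbm.whichdb(base)

print("files:  ", created)
print("backend:", backend)
```

    Running this on two machines that produce different file sets should show two different backend names.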

    The dbm.dumb module is the one that adds the .bak, .dat and .dir extensions:

    Open a dumbdbm database and return a dumbdbm object. The filename argument is the basename of the database file (without any specific extensions). When a dumbdbm database is created, files with .dat and .dir extensions are created.

    The .dir file is moved to .bak as new index dicts are committed for changes made to the data structures (when adding a new key, deleting a key, or by calling .sync() or .close()).
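    The three-file layout can be reproduced on any platform by bypassing the automatic selection and handing a dbm.dumb database to shelve.Shelf directly. This is only a sketch: the helper name dumb_shelf_files is made up for illustration, and the shelf is opened twice so that an index file already exists when the second commit moves it to .bak.

```python
import dbm.dumb
import os
import shelve
import tempfile


def dumb_shelf_files():
    """Write to a dbm.dumb-backed shelf twice and return the file names created."""
    with tempfile.TemporaryDirectory() as tmp:
        base = os.path.join(tmp, "a.data")
        for value in (1, 2):
            # Force the pure-Python backend instead of letting dbm choose.
            with shelve.Shelf(dbm.dumb.open(base, "c")) as db:
                db[f"key{value}"] = value
        return sorted(os.listdir(tmp))


files = dumb_shelf_files()
print(files)
```

    Pinning the backend like this is also one way to get the same file names everywhere, at the cost of using the slowest implementation.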

    Seeing those three files means that the other two options, dbm.gnu and dbm.ndbm, are not available on that platform, so dbm fell back to dbm.dumb.

    The other formats may give you other extensions. The dbm.ndbm module may use .dir, .pag or .db, depending on which library that module was built against.


    (*) Python 2 had four dbm modules; it would default to the deprecated dbhash module, which in turn was built on top of the bsddb module. Both were removed in Python 3.