I'm running a Python program which uses the shelve module on top of pickle. After running this program I sometimes get a single output file, a.data, but at other times I get three output files: a.data.bak, a.data.dir and a.data.dat.
Why is that?
There is quite some indirection here. Follow me carefully.
The shelve module is implemented on top of the dbm module. This module acts as a facade for three(*) different specific DBM implementations, and when creating a new database it picks the first module available, in the following order:
- dbm.gnu, the Python module for the GNU DBM library. You would use it directly if you needed the extra functionality it offers over the base dbm module (it lets you iterate over the keys in stored order and 'pack' the database to free up space from deleted objects).
- dbm.ndbm, a proxy module using the ndbm, BSD DB or GNU DBM library (chosen when Python is compiled).
- dbm.dumb, a pure-Python implementation.

It is this range of choices that makes shelve files appear to grow extra extensions on different platforms.
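To see which of these backends your own Python build provides, you can try importing each one in the order listed above. This is a minimal sketch that only probes the three modules named here (Python 3.13 added dbm.sqlite3, which dbm tries before these and which this sketch ignores); the helper name available_backends is purely illustrative.

```python
import importlib

# The three backends named above, in the order the dbm facade tries them.
CANDIDATES = ["dbm.gnu", "dbm.ndbm", "dbm.dumb"]


def available_backends():
    """Return the candidate backends importable on this Python, in order."""
    found = []
    for name in CANDIDATES:
        try:
            importlib.import_module(name)
        except ImportError:
            # The underlying C extension was not built for this Python
            # (dbm.gnu is typically missing on Windows, for example).
            continue
        found.append(name)
    return found


backends = available_backends()
print(backends)
```

dbm.dumb is pure Python, so it is always the last entry. If it is the only entry (and you are on a Python older than 3.13), new shelves will be stored as the .dat/.dir/.bak trio.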
The dbm.dumb module is the one that adds the .bak, .dat and .dir extensions:
"Open a dumbdbm database and return a dumbdbm object. The filename argument is the basename of the database file (without any specific extensions). When a dumbdbm database is created, files with .dat and .dir extensions are created."
The .dir file is moved to .bak whenever the in-memory index dict is committed to disk after the data has changed, which happens when a key is deleted and when .sync() or .close() is called.
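The file juggling described above can be observed directly. The sketch below assumes CPython's Python 3 dbm.dumb and works in a throwaway temporary directory: it creates an empty database and closes it, then reopens it, stores one key, and closes it again. The files_for helper is only illustrative.

```python
import dbm.dumb
import os
import tempfile


def files_for(directory):
    """List the files in a directory, sorted for stable output."""
    return sorted(os.listdir(directory))


with tempfile.TemporaryDirectory() as tmp:
    base = os.path.join(tmp, "a.data")

    # Creating a new database writes the .dat file; closing it writes .dir.
    db = dbm.dumb.open(base, "c")
    db.close()
    after_create = files_for(tmp)

    # Reopen, store a key and close: the commit on close first renames
    # the existing .dir to .bak, then writes a fresh .dir.
    db = dbm.dumb.open(base, "w")
    db[b"key"] = b"value"
    db.close()
    after_write = files_for(tmp)

print(after_create)  # ['a.data.dat', 'a.data.dir']
print(after_write)   # ['a.data.bak', 'a.data.dat', 'a.data.dir']
```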
If you see these three files, it means the other options for dbm (dbm.gnu and dbm.ndbm) are not available on your platform.
The other backends may give you other extensions. dbm.ndbm may use .dir and .pag, or .db, depending on which library it was built with, while dbm.gnu stores everything in the single file you named.
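Rather than guessing from the file names, you can ask dbm.whichdb() which module produced a given database. A small sketch, using a temporary directory and a made-up key, that creates a shelf and reports both the backend and the files left on disk:

```python
import dbm
import os
import shelve
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "a.data")

    # shelve hands the name straight to dbm.open, which picks the backend.
    with shelve.open(path) as shelf:
        shelf["answer"] = 42

    # whichdb() inspects the files on disk and names the module that made them.
    backend = dbm.whichdb(path)
    created = sorted(os.listdir(tmp))

print(backend)
print(created)
```

Run it on two machines and the output can differ; that difference is the whole answer to the question.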
(*) Python 2 had four DBM modules; anydbm would default to the deprecated dbhash module, which in turn was built on top of the bsddb module. Both were removed in Python 3.