
Python dumbdbm, when will data be written back to disk?


I'm using Python 2.7's dumbdbm, but this question also applies to Python 3's dbm.dumb.

The documentation says:

dumbdbm.sync()
Synchronize the on-disk directory and data files. This method is called by the sync() method of Shelve objects.

I've got three questions:

  1. If I don't call sync, will the disk file get updated?
  2. Does this function always write data back to disk, and never the reverse?
  3. What if I call close?

Solution

  • One way to answer questions like this that the documentation doesn't specifically address, perhaps the best way if not the only one, is to read the source code (when it's available, as it is here).

    The dumbdbm.py file should be in your /Python/Lib directory and can also be viewed online in your browser through the Mercurial source code revision control system at:

        https://hg.python.org/cpython/file/2.7/Lib/dumbdbm.py
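
As a rough sketch of how to find and read that source from within Python itself, the snippet below uses the standard inspect module. It assumes Python 3, where the module is named dbm.dumb (on Python 2.7 you would import dumbdbm instead), and it reaches into the private _Database class, which is an implementation detail that could change between versions.

```python
import inspect
import dbm.dumb  # on Python 2.7 this would be: import dumbdbm

# Path of the module's source file in the local installation.
source_path = inspect.getsourcefile(dbm.dumb)

# Source text of the private _commit() method (implementation detail).
commit_source = inspect.getsource(dbm.dumb._Database._commit)

print(source_path)
print(commit_source)
```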

    The first thing to notice is the longish comment at the beginning of the private _Database class, which is what a dumbdbm database really is, because it deals with what seems to be the overall theme of your questions:

    class _Database(UserDict.DictMixin):
    
        # The on-disk directory and data files can remain in mutually
        # inconsistent states for an arbitrarily long time (see comments
        # at the end of __setitem__).  This is only repaired when _commit()
        # gets called.  One place _commit() gets called is from __del__(),
        # and if that occurs at program shutdown time, module globals may
        # already have gotten rebound to None.  Since it's crucial that
        # _commit() finish successfully, we can't ignore shutdown races
        # here, and _commit() must not reference any globals.
    

    In-depth information about specific methods can be found by reading the source code for them. Given that, here's what I think the answers to your questions would be for version 2.7 of Python:

    1. If I don't call sync, will the disk file get updated?

      From the preceding comment, it sounds like it will as long as your program shuts down gracefully.

      Beyond that, it depends on which methods have been called. Some update the disk, but only partially. For instance, it looks like __setitem__() always writes the value to the data file, but whether it also updates the directory file depends on whether the item is for an entirely new key or an existing one. For the latter case, there's a comment at the end of the part that deals with it that says (bearing in mind that _commit() is just another name for sync()):

      Note that _index may be out of synch with the directory file now: _setval() and _addval() don't update the directory file. This also means that the on-disk directory and data files are in a mutually inconsistent state, and they'll remain that way until _commit() is called. Note that this is a disaster (for the database) if the program crashes (so that _commit() never gets called).

    2. Does this function always write data back to disk, and never the reverse?

      sync() / _commit() does not appear to load any data back into memory from the disk.

    3. What if I call close?

      close() just calls _commit() and then sets all internal data structures to None, preventing any further database operations.
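
The three answers above can be checked with a small experiment. This is only a sketch, assuming Python 3's dbm.dumb (dumbdbm on 2.7) and the current on-disk layout, in which the directory file is the database path plus a ".dir" suffix. The helper read_dir_file and the variable names are made up for illustration. If the source comments hold, overwriting an existing key should leave the directory file untouched until sync() is called, and the value should survive close() and a reopen.

```python
import dbm.dumb  # on Python 2.7 this would be: import dumbdbm
import os
import tempfile


def read_dir_file(path):
    """Return the raw text of the directory file (hypothetical helper)."""
    with open(path + ".dir", encoding="latin-1") as f:
        return f.read()


with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "example")
    db = dbm.dumb.open(path, "c")

    # New key: the directory file gets an entry appended right away.
    db[b"k"] = b"short"
    dir_after_insert = read_dir_file(path)

    # Existing key: the data file is written, the directory file is not.
    db[b"k"] = b"a noticeably longer value"
    dir_after_overwrite = read_dir_file(path)

    # sync() (an alias of _commit()) rewrites the directory file.
    db.sync()
    dir_after_sync = read_dir_file(path)

    # close() commits once more and disables further operations.
    db.close()

    reopened = dbm.dumb.open(path, "r")
    value_after_reopen = reopened[b"k"]
    reopened.close()

print(dir_after_insert == dir_after_overwrite)
print(dir_after_overwrite == dir_after_sync)
print(value_after_reopen)
```

Comparing the raw directory file text before and after each step avoids depending on the exact entry format, which is not documented.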

    In conclusion, for a somewhat humorous take on the meta-subject here, I suggest you read Learn to Read the Source, Luke.