Does shelve write to disk on every change?


I wish to use shelve in an asyncio program and I fear that every change will cause the main event loop to stall.

While I don't mind the occasional slowdown of the pickling operation, the disk writes may be substantial.

How often does shelve sync to disk? Is syncing a blocking operation? Do I have to call .sync() myself?

If I schedule sync() to run in a different thread, another asyncio task may modify the shelf at the same time, which violates the requirement that only a single thread writes.


Solution

  • By default, shelve is backed by the dbm module, which is in turn backed by whichever dbm implementation is available on the system. Neither shelve nor dbm makes any effort to minimize writes: assigning a value to a key causes a write every time. Even with writeback=True, a new assignment is both placed in the cache and immediately written to the backing dbm. The write makes sure the original value is stored. The cache entry is made because the assigned object might change after assignment, so it has to be handled just like a freshly read object (meaning it will be written again on sync or close, in case it changed).

    While it's possible that some underlying dbm libraries do some caching of their own, AFAICT most try to write immediately, in the sense of pushing data to the kernel without user-mode buffering. They just don't necessarily force immediate synchronization to disk (though that can be requested, e.g. with gdbm_sync).

    writeback=True makes it worse, because when it does sync, it's a major effort: it rewrites every object read or written since the last sync, because it has no way of knowing which of them might have been modified. Without writeback, each write is the small effort of rewriting a single key/value pair.

    In short, if you're really concerned about blocking writes, you can't use shelve from unthreaded async code without potential blocking. That blocking is likely short-lived as long as writeback=True is not involved (or as long as you don't sync/close the shelf until performance no longer matters). If you need truly non-blocking async behavior, all shelve interactions must occur under a lock in worker threads, and either writeback must be False (to avoid race conditions while pickling data) or, if writeback is True, you must take care not to modify any object that might be in the cache during the sync/close.
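
The worker-thread approach could look roughly like the sketch below. It is only an illustration under a few assumptions: AsyncShelf and its method names are made up for this example and are not part of shelve, and instead of an explicit lock plus arbitrary worker threads it uses a single-worker ThreadPoolExecutor. One worker serializes every call just as a lock would, and it also keeps every call (including the open) on the same thread, which some dbm backends may require. writeback is left False, so each set() pickles and writes one key and there is no cache to race against.

```python
import asyncio
import os
import shelve
import tempfile
from concurrent.futures import ThreadPoolExecutor


class AsyncShelf:
    """Hypothetical helper (not part of shelve): runs every shelf call on one worker thread."""

    def __init__(self, path):
        self._path = path
        # A single worker means calls run one at a time, always on the same thread.
        self._executor = ThreadPoolExecutor(max_workers=1)
        self._db = None

    async def _run(self, func, *args):
        # Hand the blocking call to the worker thread; the event loop keeps running.
        loop = asyncio.get_running_loop()
        return await loop.run_in_executor(self._executor, func, *args)

    async def open(self):
        # Open on the worker thread too, so the backend only ever sees that thread.
        self._db = await self._run(lambda: shelve.open(self._path, writeback=False))
        return self

    async def get(self, key, default=None):
        return await self._run(self._db.get, key, default)

    async def set(self, key, value):
        # Pickling happens on the worker thread, so other tasks should not
        # mutate `value` until this call has returned.
        await self._run(self._db.__setitem__, key, value)

    async def close(self):
        await self._run(self._db.close)
        self._executor.shutdown(wait=False)


async def main():
    with tempfile.TemporaryDirectory() as tmp:
        shelf = await AsyncShelf(os.path.join(tmp, "demo")).open()
        await shelf.set("counter", 1)
        value = await shelf.get("counter")
        await shelf.close()
        return value


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Each await gives control back to the event loop while the worker thread does the pickling and the dbm write, so other tasks keep running. The cost is that shelf operations queue up behind one another, which is the same serialization a lock would impose.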