python multithreading flush

Why is another thread used here to flush the data?


I am reading the code of pickledb.

In this function:

def dump(self):
    '''Force dump memory db to file'''
    json.dump(self.db, open(self.loco, 'wt'))
    self.dthread = Thread(
        target=json.dump,
        args=(self.db, open(self.loco, 'wt')))
    self.dthread.start()
    self.dthread.join()
    return True

I don't understand why a thread is started to call json.dump() again right after json.dump() has already been called directly.

I'm new to this, and I don't know whether this is a way to ensure the data is flushed.


Solution

  • Seems like a merge error by the author of the codebase.

    The original pull request by dron22 removed the main thread call to dump(): https://github.com/patx/pickledb/issues/18

    But when maintainer patx merged the patch, the main-thread dump was kept: https://github.com/patx/pickledb/commit/658fdda86abbd1b5a37fb0f9fc678ca145cc25c6

    It's all moot anyway: writing the file in a separate thread isn't going to save you from all cases of data corruption. To prevent corruption when writing structured data in a database, you need either an atomic filesystem operation or journalling, neither of which is trivial to implement.

    This is yet another reason why you shouldn't use someone else's pet database project for important projects, especially one that's practically unmaintained. Keep to mainstream databases.
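
To make the "atomic filesystem operation" point above concrete, here is a minimal sketch of the common write-to-a-temp-file-then-rename pattern. It is not pickledb's code: the function name atomic_json_dump and its parameters are made up for illustration, and it assumes the temporary file can be created in the same directory as the target.

```python
import json
import os
import tempfile


def atomic_json_dump(obj, path):
    """Hypothetical helper: write obj as JSON to path via a temp file and rename.

    Readers of path should see either the complete old file or the complete
    new file, not a half-written one.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file lives in the target's directory so that the final
    # rename stays on a single filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as tmp:
            json.dump(obj, tmp)
            tmp.flush()
            # Ask the OS to push the file contents to disk before renaming.
            os.fsync(tmp.fileno())
        # Swap the finished file into place, replacing any existing file.
        os.replace(tmp_path, path)
    except BaseException:
        # On any failure, remove the temp file and leave path untouched.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

If serialization fails partway through, the existing file at path is left as it was, which is not the case when the target is opened directly with 'wt' as in the question's code. This still does not cover everything (for example, making the rename itself durable across a power loss on POSIX systems generally also requires syncing the containing directory), which is part of why it is not trivial.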