python, database, shelve

Python shelve module: why is this code fast, but the other slow?


The output of the code below is

fast: 0.018553733825683594
slow: 7.0305609703063965

Moreover, slow.dat is 10,252 KB while fast.dat is only 32 KB. Why is the fast version so fast and small, while the slow one is so slow and large?

import shelve
import random
import time

start = time.time()
db = shelve.open('fast') 

db["answers"] = []
answers = []

for i in range(1000):
    answer = {
        "foo0": random.randint(1,10),
        "foo1": random.randint(1,10),
        "foo2": random.randint(1,10),
        "foo3": random.randint(1,10),
        "foo4": random.randint(1,10),
        "foo5": random.randint(1,10)
    }
    answers.append(answer)

db['answers'] = answers
db.close()
print("fast:", time.time() - start)


start = time.time()
db = shelve.open('slow') # slow and uses !!!!WAY MORE SPACE!!!!
db["answers"] = []

for i in range(1000):
    answer = {
        "foo0": random.randint(1,10),
        "foo1": random.randint(1,10),
        "foo2": random.randint(1,10),
        "foo3": random.randint(1,10),
        "foo4": random.randint(1,10),
        "foo5": random.randint(1,10)
    }
    db['answers'] = db['answers'] + [answer]

db.close()
print("slow:", time.time() - start)

Solution

  • The docs note that shelve cannot reliably detect when a mutable value (such as a list or dict) has been modified in place. They suggest opening the shelf with writeback=True, which caches the structures in memory and writes them back to disk on .sync() and .close().

    This improved the required time and space somewhat, and the OP reported that also using .append() on the cached list (instead of rebuilding it on every iteration) solves the problem; see the sketch after this list.

    If problems remain, I would suggest switching to a database better suited to your situation than shelve.
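
    For reference, here is a minimal sketch of the combined fix (writeback=True plus .append() on the cached list). The 'fixed' filename is just a placeholder, not part of the original code.

    import shelve
    import random
    import time

    start = time.time()

    # Open the shelf with writeback=True so the cached list can be mutated
    # in place, and append to it instead of rebuilding the whole list on
    # every iteration.
    with shelve.open('fixed', writeback=True) as db:
        db["answers"] = []
        for i in range(1000):
            answer = {f"foo{k}": random.randint(1, 10) for k in range(6)}
            db["answers"].append(answer)  # mutates the in-memory cache only
        # the cached value is written back to disk once, on close (or sync)

    print("fixed:", time.time() - start)

    Because each iteration only touches the in-memory cache, the shelf is written once at close instead of re-pickling an ever-growing list on every loop pass.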