python orm database-connection ponyorm premature-optimization

Should I minimize the usage of db_session in Pony ORM? What is the purpose of db_session?

I am wondering, whether I should minimize usage of db_session or not? Let's consider these two equivalent examples:

def do_stuff():
    with db_session:
        task = orm.make_proxy(Task.select().first())
        task.mark_started()
    ...
    this_function_might_take_a_long_time(task)
    ...
    with db_session:
        task.mark_done()

@db_session
def do_stuff():
    task = Task.select().first()
    task.mark_started()
    commit()
    ...
    this_function_might_take_a_long_time(task)
    ...
    task.mark_done()

From reading the documentation I can tell that Pony doesn't encourage micro-managing db_sessions

With this code each of view function you will define will be wrapped with db_session so you should not care about them.

However, here ~~it suggests that there might be a cost of having it open~~ (EDIT: it doesn't, read the answer)

Before sending the first query, Pony gets a database connection from the connection pool.

Am I correct to say that anything beyond B is premature optimization, and A should be considered only in limited DB connection count scenarios?

Solution

Pony ORM author Alexander Kozlovsky @metaprogrammer answered this in the Official Pony ORM Telegram chat.

The purpose of db_session is to manage three things:

Database connection

Pony associates separate connection to each thread. If process does not use threads, then only one connection will be used. When db_session ends, it returns its connection to the connection pool. This means that connection remains open and keeps for the future usage. Next db_session will use the same connection. So, regarding conection usage, there is no difference between a single db_session and several sequential db_sessions

Transaction state

When db_session ends, it perform implicit commit. There is no difference between implicit and explicit commits, so if you have a single db_session with manual commit() call it is the same as several sequential db_sessions. However, if you don't use explicit commit(), then long db_session may hold database lock and prevent other processes to work with database or specific table until commit is performed

In-memory cache of loaded objects

The main difference between a single db_session and several sequential db_sessions is in managing in-memory cache of objects loaded from the database. Each db_session has separate cache. Objects inside a cache are cross-linked by relations. If you load a bunch od Course, Student and Group objects, they are all interlinked to each other via relationship attributes. Because of this, it is impossible to unload some objects off the cache and keep the rest. Garbage collector cannot collect just some objects from the cache, because they all are point to each other with circular references. So the cache can be deleted only as a whole when db_session is finished. So, if you have a single long-standing db_session, it will not free memory until the end. But several smaller db_session may require load the same object from the database several times. So there is a memory/performance trade-off

P.S. Even if db_session is finished, connection will still be hold until program end or explicit db.disconnect() call. Pony closes connection implicitly only if some database exception inside db_session leads to rollback.