python, sqlalchemy, fork, connection-pooling

Will calling Engine.dispose() in a forked process cause errors in another process?


When using SQLAlchemy in a forked process, the recommended approach per the SQLAlchemy documentation (EDIT: originally linked the 1.3 docs) is to call engine.dispose() immediately upon initializing the forked process. This prevents child processes from sharing connections with the parent process. If you are using a Pool object, it will look something like this:

from multiprocessing import Pool

from sqlalchemy import create_engine, text

engine = create_engine("mysql+mysqldb://user:pass@host/dbname")

def run_in_process(some_data_record):
    with engine.connect() as conn:
        conn.execute(text("..."))


def initializer():
    """ensure the parent proc's database connections are not touched
    in the new connection pool"""
    engine.dispose(close=False)


data = [...]  # an iterable of records to process, defined elsewhere

with Pool(10, initializer=initializer) as p:
    p.map(run_in_process, data)

However, disposing of the entire connection pool seems a bit extreme to me. In my mind, it amounts to a child process telling the parent process to drop all of its connections. That works just fine in a single-user application, but I am unsure how calling Engine.dispose() will behave in a web application being accessed by many users.

Let's say I have forked process A, which is currently doing a long-running transaction with my engine. When I create a new process B, and call Engine.dispose(), will this cause disruptions in process A?


Solution

  • If you fork() and then call Engine.dispose() in the child's initializer, as you and the latest docs suggest (or right away, before using the pool):

    Let's say I have forked process A, which is currently doing a long-running transaction with my engine. When I create a new process B, and call Engine.dispose(), will this cause disruptions in process A?

    Assuming you call .dispose(close=False), the new process B shouldn't affect A. Normally .dispose() would try to close A's connections, causing badness, but setting close=False avoids this, leaving the eventual closing to the parent. .dispose() then creates a new connection pool as usual.
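
    As a minimal sketch of that behavior (using os.fork() directly instead of multiprocessing; the connection URL is just a placeholder):

        import os

        from sqlalchemy import create_engine, text

        engine = create_engine("mysql+mysqldb://user:pass@host/dbname")

        pid = os.fork()
        if pid == 0:
            # child: drop the inherited pool WITHOUT closing its connections -
            # the parent's sockets are left alone, and the child gets a fresh,
            # empty pool that opens its own connections on first checkout
            engine.dispose(close=False)
            with engine.connect() as conn:
                conn.execute(text("SELECT 1"))
            os._exit(0)
        else:
            # parent: its pool and any in-flight transactions are untouched
            os.waitpid(pid, 0)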

    This feature was added in v1.4.33 for exactly this case:

    Added new parameter Engine.dispose.close, defaulting to True. When False, the engine disposal does not touch the connections in the old pool at all, simply dropping the pool and replacing it. This use case is so that when the original pool is transferred from a parent process, the parent process may continue to use those connections.

    Internally, Engine.dispose() works by literally creating a new connection pool and assigning it to the .pool property, clobbering the old one. The core of the commit simply enables skipping .pool.dispose(), which is what closes out the connections:

    -    def dispose(self) -> None:
    +    def dispose(self, close: bool = True) -> None:
             """ [[docstring omitted]]
             """
    -        self.pool.dispose()
    +        if close:
    +            self.pool.dispose()
             self.pool = self.pool.recreate()
             self.dispatch.engine_disposed(self)
    

    QueuePool (the default Pool) logic, from lib/sqlalchemy/pool/impl.py:

        def recreate(self) -> QueuePool:
            self.logger.info("Pool recreating")
            return self.__class__([...])
    
        def dispose(self) -> None:
            while True:
                try:
                    conn = self._pool.get(False)
                    conn.close()
                except sqla_queue.Empty:
                    break
            [...]
            self.logger.info("Pool disposed. %s", self.status())
    

    You originally linked to an old version of the docs (1.3), but your example is from the latest (2.x): https://docs.sqlalchemy.org/en/latest/core/pooling.html#using-connection-pools-with-multiprocessing-or-os-fork

    When you call fork(), the child process soft-copies the parent process's memory and inherits all of its file descriptors (but not its threads). This is generally both the huge benefit (it's very efficient) and the source of issues (whoops, multiple stdout writers, etc.) in the fork()-based multiprocessing model.
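
    A tiny illustration of the file descriptor sharing (nothing SQLAlchemy-specific; /tmp/fork-demo.txt is just a scratch file, and this is POSIX-only):

        import os

        f = open("/tmp/fork-demo.txt", "w")
        pid = os.fork()
        if pid == 0:
            # child: `f` wraps the same underlying file descriptor as the
            # parent's copy, so both processes write to the same open file
            f.write("hello from the child\n")
            f.flush()
            os._exit(0)
        os.waitpid(pid, 0)
        f.write("hello from the parent\n")
        f.close()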

    If you have a long-running transaction, it can't be happening in the main process's thread, or you'd be blocked on it and unable to fork(). Conversely, you may be calling fork() from another thread in order to work in parallel! In either case, you're still safe, as the other working thread won't be copied during your fork() call (see: Are threads copied when calling fork?).
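
    A quick sketch confirming that only the thread calling fork() survives in the child:

        import os
        import threading
        import time

        # stand-in for a worker thread holding a long-running transaction
        threading.Thread(target=lambda: time.sleep(60), daemon=True).start()
        print("parent threads:", threading.active_count())  # 2

        pid = os.fork()
        if pid == 0:
            # the worker thread was not copied into the child process
            print("child threads:", threading.active_count())  # 1
            os._exit(0)
        os.waitpid(pid, 0)

    (Note that CPython 3.12+ emits a DeprecationWarning when fork()ing a multi-threaded process, which underscores how delicate this combination is.)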