python, asynchronous, redis, garbage-collection, connection-pooling

Can I rely on garbage collector to close asynchronous database connections in Python?


My team is working on an asynchronous HTTP web server implemented in Python (CPython 3.11 to be exact). We're using Redis for data storage and connect to it with the help of the redis-py library. Since the HTTP server is asynchronous, we use the redis.asyncio.Redis client class - it creates a connection pool internally and manages it automatically.

The Redis server is hosted in AWS and will have password rotation configured. Currently, we're trying to come up with a way to deal with this in our Python code automatically. There are two steps we have to perform:

  1. Create a new connection pool as soon as we know that new credentials are available
  2. Close the existing connection pool as soon as we know that it will not be used anymore

The problem here is step #2. It's not guaranteed that we'll be able to introduce a synchronization mechanism that would tell us when the connection pool can be safely closed manually (i.e., when no HTTP requests that rely on the old connection pool are being handled at that very moment), so we're looking for an automated alternative first. Right now I'd like to know whether we can rely on the garbage collector to safely close any existing connections for us.
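To make the intent concrete, here is a minimal sketch of the rotation pattern we have in mind; the hook name, host, and passwords are placeholders, not our real code:

import redis.asyncio as redis

REDIS_HOST = "my-cache.example.amazonaws.com"  # placeholder

# The client currently serving traffic; its pool was built with the old password.
client = redis.Redis(host=REDIS_HOST, password="old-password")

async def on_credentials_rotated(new_password: str) -> None:
    """Hypothetical hook invoked when AWS rotates the Redis password."""
    global client
    # Step 1: a fresh client creates a fresh connection pool internally.
    client = redis.Redis(host=REDIS_HOST, password=new_password)
    # Step 2 is the open question: rebinding drops *our* reference to the
    # old client, but in-flight requests may still hold references of their
    # own, so we can't simply await old_client.aclose() here.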

According to the documentation, redis.asyncio.Redis instances must be closed explicitly, because the __del__ magic method, being synchronous by nature, cannot execute await self.aclose() itself.
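In other words, the documented cleanup path looks roughly like this (connection parameters elided):

client = Redis(...)
try:
    value = await client.get("some_key")  # issue commands as usual
finally:
    # __del__ cannot await anything, so this close has to be explicit.
    await client.aclose()

At the same time, I'm wondering what happens if these objects are simply destroyed by the GC. In theory, the cleanup process should go something like this: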

  1. GC destroys a redis.asyncio.Redis instance (a.k.a. client) together with all its fields
  2. GC destroys the connection-pool-class instance stored inside that client
  3. GC destroys the connection-list stored inside that connection pool
  4. GC destroys all the connection-class instances stored inside that list

I performed an artificial test similar to this:

import asyncio
from redis.asyncio import Redis

client = Redis(...)

# Force the pool to open many connections by running 100 commands concurrently.
await asyncio.gather(*(
    client.get(f"key_{i}") for i in range(100)
))

# checkpoint 1
client = Redis(...)  # rebinding drops the last reference to the old client
# checkpoint 2

The Redis server reported 100 connections open at checkpoint #1 and 0 connections open at checkpoint #2 (in one case immediately; in a slightly different case I had to make another request to the Redis server with the new client instance first). It seems that (ab)using the GC like this doesn't leave any connections hanging on the Redis server, but can we be sure that everything will be properly cleaned up on the HTTP server as well, and that we won't end up with memory leaks or dangling system resources?
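For reference, the server-side count can also be read from within the test itself via the INFO command. This is a sketch assuming redis-py's info() wrapper; note that the probing client's own connection is included in the number:

from redis.asyncio import Redis

async def count_server_connections(probe: Redis) -> int:
    # "connected_clients" comes from the server's INFO clients section.
    info = await probe.info("clients")
    return int(info["connected_clients"])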


Solution

  • Short answer: "Can I rely...?"

    Yes.


    There are two terms you keep using which we should define carefully.

    Where you say "python", I choose to interpret that as "the cPython 3.12 interpreter" (or pretty much any modern 3.x interpreter).

    Where you say "GC", I mostly view that as "ancient variable goes out-of-scope".

    The "python language" encompasses several implementations, including Jython and Iron Python. Each implementation has its own approach to managing and reclaiming memory allocations.

    The CPython bytecode interpreter certainly does have a garbage collector, but it runs infrequently and handles only the special case of cyclic data structures. Usually reference counting is what's of interest: when an object's refcount drops to zero, for example because it goes out of scope, the CPython interpreter immediately reclaims its storage. This is quite predictable, synchronous behavior. Even so, very few Python apps rely heavily on __del__ methods to reclaim resources, since on other implementations (or for objects caught in a reference cycle) __del__ execution may be deferred for a long time or even indefinitely.
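    A tiny demonstration of that immediacy; this is CPython-specific behavior, and other implementations may legitimately defer the __del__ call:

    class Noisy:
        def __del__(self):
            print("reclaimed")

    def demo():
        x = Noisy()
        print("inside demo")
        # On CPython, x's refcount hits zero as the function returns,
        # so "reclaimed" prints before "after demo".

    demo()
    print("after demo")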

    In Java we commonly see x = null; to tell the GC that x is no longer needed, making whatever it used to point to fair game for collection. In Python we can write del x, but that is seldom useful: if the caller still holds a reference, decrementing our reference count won't drive it to zero, so nothing happens.

    The situation where explicit del tends to be most useful is del mydict[some_key], to prevent a dictionary's storage from growing without bound.
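    To make both points concrete (the names here are purely illustrative):

    cache = {"stale_key": "big payload"}
    alias = cache["stale_key"]

    del cache["stale_key"]  # useful: keeps the dict from growing without bound
    print(alias)            # the payload survives; alias still references it

    del alias               # the last reference is gone; storage is reclaimed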


    You are not abusing any GC facilities.

    As long as your async routines eventually return, then yes, the redis-py objects will be reclaimed without memory leaks.
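    If you want to verify that empirically in your own app, weakref.finalize offers a cheap probe. This sketch assumes the old client is not kept alive anywhere else, and the gc.collect() call matters only if a reference cycle is involved:

    import gc
    import weakref

    from redis.asyncio import Redis

    client = Redis(...)
    events = []
    weakref.finalize(client, events.append, "old client reclaimed")

    client = Redis(...)  # rebind, as in your test
    gc.collect()         # needed only if the old client sat in a cycle
    print(events)        # ["old client reclaimed"] once it's truly gone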