python, postgresql, psycopg2, psycopg3

psycopg3 pool: all connections lost at once after a long idle time


With a standalone persistent connection I have no problem: the connection lasts for hours.
With the psycopg(3) ConnectionPool, the first requests are fine and the pool size stays at 5, but at some point the pool size decreases and I get a PoolTimeout when the client makes a new request.
Then I tried the following: start the pool, make no requests, just wait. After some time (around 1 h), pg_stat_activity shows 5 idle connections (= pool size). Then I make a request from the client: all connections vanish at the same time (visible in pg_stat_activity), I get a PoolTimeout, and the pool never recovers.

I also tried decreasing max_timeout to 900, but the issue is the same.
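
For reference, here is a minimal sketch of the kind of pool configuration I am experimenting with: explicit lifetime/idle limits plus libpq TCP keepalives. The values and the connection string are illustrative, not the exact settings I ran:

from psycopg_pool import ConnectionPool

# Sketch only: illustrative values, placeholder connection string.
conninfo = (
    "host=127.0.0.1 port=5432 dbname=mydb user=myuser password=... "
    # libpq TCP keepalives, in case idle connections are silently dropped
    "keepalives=1 keepalives_idle=60 keepalives_interval=10 keepalives_count=3"
)

pool = ConnectionPool(
    conninfo=conninfo,
    min_size=5,
    max_lifetime=900,   # recycle connections after 15 minutes
    max_idle=300,       # close surplus idle connections after 5 minutes
    check=ConnectionPool.check_connection,  # verify liveness before handing a connection out
)

My actual code: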

# imports used by these methods
import psycopg
from psycopg_pool import ConnectionPool


def init_pool(self, min_cnx=5):
    cnx_str = f"host={DB_HOST} port={DB_PORT} dbname={DB_NAME} user={DB_USERNAME} password={DB_USERPWD}"
    # check=ConnectionPool.check_connection verifies a connection before handing it out
    self.pool = ConnectionPool(conninfo=cnx_str, min_size=min_cnx, open=True,
                               check=ConnectionPool.check_connection)

def query(self, q, dbv=None, debug=False) -> list:

    # _pool is a private attribute and only holds the currently *available* connections
    print("pool size: ", len(self.pool._pool))
    print("pool stats before: ", self.pool.get_stats())

    with self.pool.connection() as cnx:

        if cnx.closed:
            self.pool.check()
            raise ConnectionError("ERROR: PostgreSQL cnx from pool is closed.")

        cnx.autocommit = True
        cnx.row_factory = self.row_factory

        with psycopg.ClientCursor(cnx) as rdc:
            if dbv:
                rdc.execute(q, dbv)
            else:
                rdc.execute(q)

            if debug and rdc._query:
                print(rdc._query.query)

            # SELECT-like statements have a description; other statements return no rows
            if rdc.description:
                data = rdc.fetchall()
            else:
                data = []

        print("pool stats after query: ", self.pool.get_stats())
    print("pool stats after: ", self.pool.get_stats())
    return data
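
(Side note: the same warm-up and monitoring can be done without touching the private _pool attribute; a small sketch using only public psycopg_pool methods, with a placeholder connection string:)

from psycopg_pool import ConnectionPool

# Sketch: warm up and inspect the pool through the public API only.
pool = ConnectionPool(conninfo="host=127.0.0.1 dbname=mydb user=myuser", min_size=5, open=True)
pool.wait(timeout=30)         # block until min_size connections are established
stats = pool.get_stats()      # the same dict that is printed in the logs below
print("available:", stats["pool_available"], "of", stats["pool_size"])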

And logs:

[pid: 236344|app: 0|req: 26/26] () {56 vars in 1083 bytes} [Mon Feb  5 11:41:56 2024] POST /v1/user => generated 933 bytes in 109 msecs (HTTP/1.1 200) 8 headers in 749 bytes (1 switches on core 0)
pool size:  3
pool stats before:  {'connections_num': 5, 'requests_num': 3, 'requests_queued': 1, 'connections_ms': 268, 'requests_wait_ms': 34, 'usage_ms': 34, 'pool_min': 5, 'pool_max': 5, 'pool_size': 5, 'pool_available': 3, 'requests_waiting': 0}
pool stats after query:  {'connections_num': 5, 'requests_num': 4, 'requests_queued': 1, 'connections_ms': 268, 'requests_wait_ms': 34, 'usage_ms': 34, 'pool_min': 5, 'pool_max': 5, 'pool_size': 5, 'pool_available': 2, 'requests_waiting': 0}
pool stats after:  {'connections_num': 5, 'requests_num': 4, 'requests_queued': 1, 'connections_ms': 268, 'requests_wait_ms': 34, 'usage_ms': 49, 'pool_min': 5, 'pool_max': 5, 'pool_size': 5, 'pool_available': 2, 'requests_waiting': 0}
[pid: 236344|app: 0|req: 28/28] () {56 vars in 1087 bytes} [Mon Feb  5 11:41:58 2024] POST /v1/iobjs => generated 4788 bytes in 29 msecs (HTTP/1.1 200) 6 headers in 302 bytes (1 switches on core 0)
[pid: 236344|app: 0|req: 29/29] () {54 vars in 816 bytes} [Mon Feb  5 11:42:05 2024] OPTIONS /v1/user/quit => generated 0 bytes in 0 msecs (HTTP/1.1 200) 6 headers in 307 bytes (0 switches on core 0)
pool size:  0
pool stats before:  {'connections_num': 5, 'requests_num': 6, 'requests_queued': 1, 'connections_ms': 268, 'requests_wait_ms': 34, 'usage_ms': 62, 'pool_min': 5, 'pool_max': 5, 'pool_size': 5, 'pool_available': 0, 'requests_waiting': 0}
Traceback (most recent call last):                                                          
  File "/var/srvr/log.py", line 68, in process
    self.db.query(
  File "/var/srvr/pg3p.py", line 71, in query
    with self.pool.connection() as cnx:
  File "/usr/local/lib/python3.12/contextlib.py", line 137, in __enter__
    return next(self.gen)
           ^^^^^^^^^^^^^^
  File "/var/srvr/lib/python3.12/site-packages/psycopg_pool/pool.py", line 170, in connection
    conn = self.getconn(timeout=timeout)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/var/srvr/lib/python3.12/site-packages/psycopg_pool/pool.py", line 204, in getconn
    raise PoolTimeout(
psycopg_pool.PoolTimeout: couldn't get a connection after 30.00 sec  
pool size:  0
pool stats before:  {'connections_num': 5, 'requests_num': 7, 'requests_queued': 2, 'connections_ms': 268, 'requests_wait_ms': 30035, 'usage_ms': 62, 'requests_errors': 1, 'pool_min': 5, 'pool_max': 5, 'pool_size': 5, 'pool_available': 0, 'requests_waiting': 1}

EDIT:
I switched back to a single persistent connection and it is very stable (days). Then, following advice in the comments, I moved back to the pool with min_size=10 and max_size=20.
No change in behaviour: the pool keeps losing connections without opening new ones to replace the lost ones (I also tried min/max of 20 and 50, no difference).

pool stats after:  {'connections_num': 11, 'requests_num': 34, 'requests_queued': 1, 'connections_ms': 641, 'requests_wait_ms': 37, 'usage_ms': 67, 'pool_min': 10, 'pool_max': 20, 'pool_size': 11, 'pool_available': 5, 'requests_waiting': 0}
[pid: 282017|app: 0|req: 39/39] () {56 vars in 1087 bytes} [Thu Feb  8 11:02:17 2024] POST /v1/iobjs => generated 30081 bytes in 10 msecs (HTTP/1.1 200) 6 headers in 303 bytes (1 switches on core 0)
pool size:  5
pool stats before:  {'connections_num': 11, 'requests_num': 34, 'requests_queued': 1, 'connections_ms': 641, 'requests_wait_ms': 37, 'usage_ms': 67, 'pool_min': 10, 'pool_max': 20, 'pool_size': 11, 'pool_available': 5, 'requests_waiting': 0}
pool stats after query:  {'connections_num': 11, 'requests_num': 35, 'requests_queued': 1, 'connections_ms': 641, 'requests_wait_ms': 37, 'usage_ms': 67, 'connections_lost': 1, 'pool_min': 10, 'pool_max': 20, 'pool_size': 11, 'pool_available': 3, 'requests_waiting': 0}
pool stats after:  {'connections_num': 11, 'requests_num': 35, 'requests_queued': 1, 'connections_ms': 641, 'requests_wait_ms': 37, 'usage_ms': 70, 'connections_lost': 1, 'pool_min': 10, 'pool_max': 20, 'pool_size': 11, 'pool_available': 3, 'requests_waiting': 0}
[pid: 282017|app: 0|req: 40/40] () {56 vars in 1087 bytes} [Thu Feb  8 11:02:17 2024] POST /v1/iobjs => generated 4788 bytes in 5 msecs (HTTP/1.1 200) 6 headers in 302 bytes (1 switches on core 0)
[pid: 282017|app: 0|req: 41/41] () {54 vars in 808 bytes} [Thu Feb  8 11:02:26 2024] OPTIONS /v1/iobjs => generated 0 bytes in 0 msecs (HTTP/1.1 200) 6 headers in 307 bytes (0 switches on core 0)
[pid: 282017|app: 0|req: 42/42] () {54 vars in 814 bytes} [Thu Feb  8 11:02:26 2024] OPTIONS /v1/settings => generated 0 bytes in 0 msecs (HTTP/1.1 200) 6 headers in 307 bytes (0 switches on core 0)
pool size:  3
pool stats before:  {'connections_num': 11, 'requests_num': 35, 'requests_queued': 1, 'connections_ms': 641, 'requests_wait_ms': 37, 'usage_ms': 70, 'connections_lost': 1, 'pool_min': 10, 'pool_max': 20, 'pool_size': 11, 'pool_available': 3, 'requests_waiting': 0}
pool stats after query:  {'connections_num': 11, 'requests_num': 36, 'requests_queued': 1, 'connections_ms': 641, 'requests_wait_ms': 37, 'usage_ms': 70, 'connections_lost': 1, 'pool_min': 10, 'pool_max': 20, 'pool_size': 11, 'pool_available': 2, 'requests_waiting': 0}
pool stats after:  {'connections_num': 11, 'requests_num': 36, 'requests_queued': 1, 'connections_ms': 641, 'requests_wait_ms': 37, 'usage_ms': 73, 'connections_lost': 1, 'pool_min': 10, 'pool_max': 20, 'pool_size': 11, 'pool_available': 2, 'requests_waiting': 0}
[pid: 282017|app: 0|req: 43/43] () {56 vars in 1087 bytes} [Thu Feb  8 11:02:26 2024] POST /v1/iobjs => generated 4788 bytes in 6 msecs (HTTP/1.1 200) 6 headers in 302 bytes (1 switches on core 0)
Traceback (most recent call last):
  File "main.py", line 326, in v1_settings
    row = db.query_row(
          ^^^^^^^^^^^^^

and the PostgreSQL logs (debug3) show nothing special, as far as I understand:

2024-02-08 11:02:17.075 CET [282007] pguser@maindb DEBUG:  proc_exit(0): 2 callbacks to make                                                
2024-02-08 11:02:17.075 CET [282007] pguser@maindb DEBUG:  exit(0)                                                                          
2024-02-08 11:02:17.075 CET [282007] pguser@maindb DEBUG:  shmem_exit(-1): 0 before_shmem_exit callbacks to make                            
2024-02-08 11:02:17.075 CET [282007] pguser@maindb DEBUG:  shmem_exit(-1): 0 on_shmem_exit callbacks to make                                
2024-02-08 11:02:17.075 CET [282007] pguser@maindb DEBUG:  proc_exit(-1): 0 callbacks to make                                               
2024-02-08 11:02:17.079 CET [281970] DEBUG:  server process (PID 282007) exited with exit code 0                                            
2024-02-08 11:02:26.183 CET [282006] pguser@maindb DEBUG:  shmem_exit(0): 4 before_shmem_exit callbacks to make                             
2024-02-08 11:02:26.183 CET [282006] pguser@maindb DEBUG:  shmem_exit(0): 6 on_shmem_exit callbacks to make                                 
2024-02-08 11:02:26.183 CET [282006] pguser@maindb DEBUG:  proc_exit(0): 2 callbacks to make                                                
2024-02-08 11:02:26.183 CET [282006] pguser@maindb DEBUG:  exit(0)                                                                          
2024-02-08 11:02:26.183 CET [282006] pguser@maindb DEBUG:  shmem_exit(-1): 0 before_shmem_exit callbacks to make                            
2024-02-08 11:02:26.183 CET [282006] pguser@maindb DEBUG:  shmem_exit(-1): 0 on_shmem_exit callbacks to make                                
2024-02-08 11:02:26.183 CET [282006] pguser@maindb DEBUG:  proc_exit(-1): 0 callbacks to make                                               
2024-02-08 11:02:26.188 CET [282009] pguser@maindb DEBUG:  shmem_exit(0): 4 before_shmem_exit callbacks to make                             
2024-02-08 11:02:26.189 CET [282009] pguser@maindb DEBUG:  shmem_exit(0): 6 on_shmem_exit callbacks to make
2024-02-08 11:02:26.189 CET [282009] pguser@maindb DEBUG:  proc_exit(0): 2 callbacks to make
2024-02-08 11:02:26.189 CET [282009] pguser@maindb DEBUG:  exit(0)
2024-02-08 11:02:26.189 CET [282009] pguser@maindb DEBUG:  shmem_exit(-1): 0 before_shmem_exit callbacks to make
2024-02-08 11:02:26.189 CET [282009] pguser@maindb DEBUG:  shmem_exit(-1): 0 on_shmem_exit callbacks to make
2024-02-08 11:02:26.189 CET [282009] pguser@maindb DEBUG:  proc_exit(-1): 0 callbacks to make
2024-02-08 11:02:26.189 CET [281970] DEBUG:  server process (PID 282006) exited with exit code 0
2024-02-08 11:02:26.191 CET [282011] pguser@maindb DEBUG:  shmem_exit(0): 4 before_shmem_exit callbacks to make
2024-02-08 11:02:26.191 CET [282011] pguser@maindb DEBUG:  shmem_exit(0): 6 on_shmem_exit callbacks to make
2024-02-08 11:02:26.191 CET [282011] pguser@maindb DEBUG:  proc_exit(0): 2 callbacks to make
2024-02-08 11:02:26.192 CET [282011] pguser@maindb DEBUG:  exit(0)
2024-02-08 11:02:26.192 CET [282011] pguser@maindb DEBUG:  shmem_exit(-1): 0 before_shmem_exit callbacks to make
2024-02-08 11:02:26.192 CET [282011] pguser@maindb DEBUG:  shmem_exit(-1): 0 on_shmem_exit callbacks to make
2024-02-08 11:02:26.192 CET [282011] pguser@maindb DEBUG:  proc_exit(-1): 0 callbacks to make
2024-02-08 11:02:26.193 CET [281970] DEBUG:  server process (PID 282009) exited with exit code 0
2024-02-08 11:02:26.194 CET [281970] DEBUG:  server process (PID 282011) exited with exit code 0
2024-02-08 11:02:33.979 CET [281970] DEBUG:  postmaster received pmsignal signal
2024-02-08 11:02:33.983 CET [282844] DEBUG:  InitPostgres
2024-02-08 11:02:33.985 CET [282844] DEBUG:  autovacuum: processing database "template1"

query_row() is essentially the same as query():

def query_row(self, q, dbv=None, debug=False):

    with self.pool.connection() as cnx:
        cnx.autocommit = True
        cnx.row_factory = self.row_factory

        with psycopg.ClientCursor(cnx) as c:
            if dbv:
                c.execute(q, dbv)
            else:
                c.execute(q)

            if debug and c._query:
                print(c._query.query)

            # return the row only when exactly one row matched
            if c.rowcount == 1:
                return c.fetchone()
            else:
                return None
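
(Typical usage of these helpers, for context; the table and column names are made up:)

rows = db.query("SELECT id, name FROM users WHERE active = %s", (True,))
row = db.query_row("SELECT id, name FROM users WHERE id = %s", (42,))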

Solution

  • The PostgreSQL connection logs gave a hint: an SSL error right before all the connections go down:

    2024-02-09 10:38:05.627 CET [297036] LOG:  checkpoint starting: time                                                                        
    2024-02-09 10:38:05.739 CET [297036] LOG:  checkpoint complete: wrote 2 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.103 s, sync=0.002 s, total=0.112 s; sync files=2, longest=0.001 s, average=0.001 s; distance=1 kB, estimate=1 kB; lsn=0/1C4BB70, redo lsn=0/1C4BB38
    2024-02-09 10:46:46.488 CET [297518] user@db LOG:  SSL error: decryption failed or bad record mac
    2024-02-09 10:46:46.489 CET [297518] user@db LOG:  could not receive data from client: Connection reset by peer
    2024-02-09 10:46:46.489 CET [297518] user@db LOG:  disconnection: session time: 0:10:00.165 user=user database=db host=127.0.0.1 port=56266
    2024-02-09 12:18:21.905 CET [297528] user@db LOG:  disconnection: session time: 1:41:35.416 user=user database=db host=127.0.0.1 port=56356
    2024-02-09 12:18:21.909 CET [297519] user@db LOG:  disconnection: session time: 1:41:35.587 user=user database=db host=127.0.0.1 port=56276
    2024-02-09 12:18:24.998 CET [297520] user@db LOG:  disconnection: ...
    2024-02-09 12:18:25.033 CET [297521] user@db LOG:  disconnection: session time: 1:41:38.672 user=user database=db host=127.0.0.1 port=56296
    2024-02-09 12:18:33.726 CET [297522] user@db LOG:  disconnection: session time: 1:41:47.360 user=user database=db host=127.0.0.1 port=56302
    2024-02-09 12:18:33.739 CET [297523] user@db LOG:  disconnection: session time: 1:41:47.372 user=user database=db host=127.0.0.1 port=56308
    2024-02-09 12:18:33.745 CET [297524] user@db LOG:  disconnection: session time: 1:41:47.316 user=user database=db host=127.0.0.1 port=56310
    2024-02-09 12:18:33.760 CET [297525] user@db LOG:  disconnection: session time: 1:41:47.325 user=user database=db host=127.0.0.1 port=56320
    2024-02-09 12:18:35.793 CET [297526] user@db LOG:  disconnection: session time: 1:41:49.353 user=user database=db host=127.0.0.1 port=56336
    2024-02-09 12:18:35.847 CET [297527] user@db LOG:  disconnection: session time: 1:41:49.365 user=user database=db host=127.0.0.1 port=56340
    

    The SSL error comes from uWSGI's default behaviour: with 2 workers, the application context is created once at init (variables initialized, among them the pool and its connections), and the workers are then forked from that shared memory. This is the classic case of connections being shared between 2 workers: at some point the shared SSL state becomes inconsistent, and all the connections are closed at once.
    The solution was to set lazy-apps=true so that uWSGI creates each worker with its own context (and therefore its own pool and connections); see the sketch below.
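
    For reference, a minimal sketch of the fix, plus an alternative that avoids sharing the pool across forks without lazy-apps: the pool is created closed at import time and opened per worker via uWSGI's postfork hook (uwsgidecorators ships with uWSGI; the connection string is illustrative):

    # The simple fix is in uwsgi.ini:  lazy-apps = true
    # Alternative sketch: open the pool only after the worker process has forked,
    # so no connection (and no SSL session) is ever shared between workers.
    from psycopg_pool import ConnectionPool
    from uwsgidecorators import postfork   # available when running under uWSGI

    pool = ConnectionPool(conninfo="host=127.0.0.1 dbname=maindb user=pguser",
                          min_size=5, open=False)   # do not connect in the master process

    @postfork
    def open_pool():
        # runs in each worker right after fork: every worker gets its own connections
        pool.open()
        pool.wait(timeout=30)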