If it's a standalone persistent connection, I have no problem: the connection lasts for hours.
If I use the psycopg(3) ConnectionPool, the first requests are fine and my pool size stays at 5, but at some point the pool size decreases and I get a PoolTimeout when the client makes a new request.
Then I tried: start the pool, do not request anything, just wait. After some time (around 1 h) I looked at PostgreSQL (pg_stat_activity) and saw 5 idle connections (= pool size). Then I made a request from the client: all connections vanished at the same time (visible in pg_stat_activity), I got a PoolTimeout, and the situation stayed stuck.
I also tried to decrease max_timeout to 900, but the issue is the same.
import psycopg
from psycopg_pool import ConnectionPool

def init_pool(self, min_cnx=5):
    cnx_str = f"host={DB_HOST} port={DB_PORT} dbname={DB_NAME} user={DB_USERNAME} password={DB_USERPWD}"
    # Open the pool right away and verify each connection when it is checked out.
    self.pool = ConnectionPool(conninfo=cnx_str, min_size=min_cnx, open=True,
                               check=ConnectionPool.check_connection)

def query(self, q, dbv=None, debug=False) -> list:
    print("pool size: ", len(self.pool._pool))
    print("pool stats before: ", self.pool.get_stats())
    with self.pool.connection() as cnx:
        if cnx.closed:
            self.pool.check()
            raise ConnectionError("ERROR: PostgreSQL cnx from pool is closed.")
        cnx.autocommit = True
        cnx.row_factory = self.row_factory
        with psycopg.ClientCursor(cnx) as rdc:
            rdc.execute(q, dbv) if dbv else rdc.execute(q)
            if debug and rdc._query:
                print(rdc._query.query)
            # Only fetch when the statement returned a result set.
            if rdc.description:
                data = rdc.fetchall()
            else:
                data = []
            print("pool stats after query: ", self.pool.get_stats())
    print("pool stats after: ", self.pool.get_stats())
    return data
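As a diagnostic aid (not a fix), the pool can also be built with a reconnect_failed callback, so a lost connection shows up in the application log instead of only as a shrinking pool_available in get_stats(). A minimal sketch, assuming the same conninfo string as init_pool() above; make_pool and log_reconnect_failure are hypothetical names, and the timing parameters are only made explicit to rule out lifetime/idle recycling:

import logging
from psycopg_pool import ConnectionPool

logger = logging.getLogger(__name__)

def log_reconnect_failure(pool: ConnectionPool) -> None:
    # Invoked by psycopg_pool when it gives up re-creating a lost connection.
    logger.error("pool %r lost a connection and could not reconnect; stats=%s",
                 pool.name, pool.get_stats())

def make_pool(cnx_str: str, min_cnx: int = 5) -> ConnectionPool:
    # Hypothetical variant of init_pool() with the timing knobs spelled out.
    return ConnectionPool(
        conninfo=cnx_str,
        min_size=min_cnx,
        open=True,
        check=ConnectionPool.check_connection,   # validate connections on checkout
        reconnect_failed=log_reconnect_failure,  # surface silent connection losses
        max_lifetime=3600,                       # recycle connections after 1 h
        max_idle=600,                            # close surplus idle connections after 10 min
        timeout=30,                              # client wait before PoolTimeout (the 30.00 sec below)
    )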
And logs:
[pid: 236344|app: 0|req: 26/26] () {56 vars in 1083 bytes} [Mon Feb 5 11:41:56 2024] POST /v1/user => generated 933 bytes in 109 msecs (HTTP/1.1 200) 8 headers in 749 bytes (1 switches on core 0)
pool size: 3
pool stats before: {'connections_num': 5, 'requests_num': 3, 'requests_queued': 1, 'connections_ms': 268, 'requests_wait_ms': 34, 'usage_ms': 34, 'pool_min': 5, 'pool_max': 5, 'pool_size': 5, 'pool_available': 3, 'requests_waiting': 0}
pool stats after query: {'connections_num': 5, 'requests_num': 4, 'requests_queued': 1, 'connections_ms': 268, 'requests_wait_ms': 34, 'usage_ms': 34, 'pool_min': 5, 'pool_max': 5, 'pool_size': 5, 'pool_available': 2, 'requests_waiting': 0}
pool stats after: {'connections_num': 5, 'requests_num': 4, 'requests_queued': 1, 'connections_ms': 268, 'requests_wait_ms': 34, 'usage_ms': 49, 'pool_min': 5, 'pool_max': 5, 'pool_size': 5, 'pool_available': 2, 'requests_waiting': 0}
[pid: 236344|app: 0|req: 28/28] () {56 vars in 1087 bytes} [Mon Feb 5 11:41:58 2024] POST /v1/iobjs => generated 4788 bytes in 29 msecs (HTTP/1.1 200) 6 headers in 302 bytes (1 switches on core 0)
[pid: 236344|app: 0|req: 29/29] () {54 vars in 816 bytes} [Mon Feb 5 11:42:05 2024] OPTIONS /v1/user/quit => generated 0 bytes in 0 msecs (HTTP/1.1 200) 6 headers in 307 bytes (0 switches on core 0)
pool size: 0
pool stats before: {'connections_num': 5, 'requests_num': 6, 'requests_queued': 1, 'connections_ms': 268, 'requests_wait_ms': 34, 'usage_ms': 62, 'pool_min': 5, 'pool_max': 5, 'pool_size': 5, 'pool_available': 0, 'requests_waiting': 0}
Traceback (most recent call last):
File "/var/srvr/log.py", line 68, in process
self.db.query(
File "/var/srvr/pg3p.py", line 71, in query
with self.pool.connection() as cnx:
File "/usr/local/lib/python3.12/contextlib.py", line 137, in __enter__
return next(self.gen)
^^^^^^^^^^^^^^
File "/var/srvr/lib/python3.12/site-packages/psycopg_pool/pool.py", line 170, in connection
conn = self.getconn(timeout=timeout)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/var/srvr/lib/python3.12/site-packages/psycopg_pool/pool.py", line 204, in getconn
raise PoolTimeout(
psycopg_pool.PoolTimeout: couldn't get a connection after 30.00 sec
pool size: 0
pool stats before: {'connections_num': 5, 'requests_num': 7, 'requests_queued': 2, 'connections_ms': 268, 'requests_wait_ms': 30035, 'usage_ms': 62, 'requests_errors': 1, 'pool_min': 5, 'pool_max': 5, 'pool_size': 5, 'pool_available': 0, 'requests_waiting': 1}
EDIT:
I switched back to a single persistent connection and it is very stable (days). Then, following the advice in the comments, I moved back to a pool with min_size=10 and max_size=20.
No change in behaviour: the pool keeps losing connections without trying to open new ones to replace the lost ones (I also tried 20 and 50 for min/max, no difference).
pool stats after: {'connections_num': 11, 'requests_num': 34, 'requests_queued': 1, 'connections_ms': 641, 'requests_wait_ms': 37, 'usage_ms': 67, 'pool_min': 10, 'pool_max': 20, 'pool_size': 11, 'pool_available': 5, 'requests_waiting': 0}
[pid: 282017|app: 0|req: 39/39] () {56 vars in 1087 bytes} [Thu Feb 8 11:02:17 2024] POST /v1/iobjs => generated 30081 bytes in 10 msecs (HTTP/1.1 200) 6 headers in 303 bytes (1 switches on core 0)
pool size: 5
pool stats before: {'connections_num': 11, 'requests_num': 34, 'requests_queued': 1, 'connections_ms': 641, 'requests_wait_ms': 37, 'usage_ms': 67, 'pool_min': 10, 'pool_max': 20, 'pool_size': 11, 'pool_available': 5, 'requests_waiting': 0}
pool stats after query: {'connections_num': 11, 'requests_num': 35, 'requests_queued': 1, 'connections_ms': 641, 'requests_wait_ms': 37, 'usage_ms': 67, 'connections_lost': 1, 'pool_min': 10, 'pool_max': 20, 'pool_size': 11, 'pool_available': 3, 'requests_waiting': 0}
pool stats after: {'connections_num': 11, 'requests_num': 35, 'requests_queued': 1, 'connections_ms': 641, 'requests_wait_ms': 37, 'usage_ms': 70, 'connections_lost': 1, 'pool_min': 10, 'pool_max': 20, 'pool_size': 11, 'pool_available': 3, 'requests_waiting': 0}
[pid: 282017|app: 0|req: 40/40] () {56 vars in 1087 bytes} [Thu Feb 8 11:02:17 2024] POST /v1/iobjs => generated 4788 bytes in 5 msecs (HTTP/1.1 200) 6 headers in 302 bytes (1 switches on core 0)
[pid: 282017|app: 0|req: 41/41] () {54 vars in 808 bytes} [Thu Feb 8 11:02:26 2024] OPTIONS /v1/iobjs => generated 0 bytes in 0 msecs (HTTP/1.1 200) 6 headers in 307 bytes (0 switches on core 0)
[pid: 282017|app: 0|req: 42/42] () {54 vars in 814 bytes} [Thu Feb 8 11:02:26 2024] OPTIONS /v1/settings => generated 0 bytes in 0 msecs (HTTP/1.1 200) 6 headers in 307 bytes (0 switches on core 0)
pool size: 3
pool stats before: {'connections_num': 11, 'requests_num': 35, 'requests_queued': 1, 'connections_ms': 641, 'requests_wait_ms': 37, 'usage_ms': 70, 'connections_lost': 1, 'pool_min': 10, 'pool_max': 20, 'pool_size': 11, 'pool_available': 3, 'requests_waiting': 0}
pool stats after query: {'connections_num': 11, 'requests_num': 36, 'requests_queued': 1, 'connections_ms': 641, 'requests_wait_ms': 37, 'usage_ms': 70, 'connections_lost': 1, 'pool_min': 10, 'pool_max': 20, 'pool_size': 11, 'pool_available': 2, 'requests_waiting': 0}
pool stats after: {'connections_num': 11, 'requests_num': 36, 'requests_queued': 1, 'connections_ms': 641, 'requests_wait_ms': 37, 'usage_ms': 73, 'connections_lost': 1, 'pool_min': 10, 'pool_max': 20, 'pool_size': 11, 'pool_available': 2, 'requests_waiting': 0}
[pid: 282017|app: 0|req: 43/43] () {56 vars in 1087 bytes} [Thu Feb 8 11:02:26 2024] POST /v1/iobjs => generated 4788 bytes in 6 msecs (HTTP/1.1 200) 6 headers in 302 bytes (1 switches on core 0)
Traceback (most recent call last):
  File "main.py", line 326, in v1_settings
    row = db.query_row(
          ^^^^^^^^^^^^^
and the PostgreSQL logs (at debug3) show nothing special (AFAIU):
2024-02-08 11:02:17.075 CET [282007] pguser@maindb DEBUG: proc_exit(0): 2 callbacks to make
2024-02-08 11:02:17.075 CET [282007] pguser@maindb DEBUG: exit(0)
2024-02-08 11:02:17.075 CET [282007] pguser@maindb DEBUG: shmem_exit(-1): 0 before_shmem_exit callbacks to make
2024-02-08 11:02:17.075 CET [282007] pguser@maindb DEBUG: shmem_exit(-1): 0 on_shmem_exit callbacks to make
2024-02-08 11:02:17.075 CET [282007] pguser@maindb DEBUG: proc_exit(-1): 0 callbacks to make
2024-02-08 11:02:17.079 CET [281970] DEBUG: server process (PID 282007) exited with exit code 0
2024-02-08 11:02:26.183 CET [282006] pguser@maindb DEBUG: shmem_exit(0): 4 before_shmem_exit callbacks to make
2024-02-08 11:02:26.183 CET [282006] pguser@maindb DEBUG: shmem_exit(0): 6 on_shmem_exit callbacks to make
2024-02-08 11:02:26.183 CET [282006] pguser@maindb DEBUG: proc_exit(0): 2 callbacks to make
2024-02-08 11:02:26.183 CET [282006] pguser@maindb DEBUG: exit(0)
2024-02-08 11:02:26.183 CET [282006] pguser@maindb DEBUG: shmem_exit(-1): 0 before_shmem_exit callbacks to make
2024-02-08 11:02:26.183 CET [282006] pguser@maindb DEBUG: shmem_exit(-1): 0 on_shmem_exit callbacks to make
2024-02-08 11:02:26.183 CET [282006] pguser@maindb DEBUG: proc_exit(-1): 0 callbacks to make
2024-02-08 11:02:26.188 CET [282009] pguser@maindb DEBUG: shmem_exit(0): 4 before_shmem_exit callbacks to make
2024-02-08 11:02:26.189 CET [282009] pguser@maindb DEBUG: shmem_exit(0): 6 on_shmem_exit callbacks to make
2024-02-08 11:02:26.189 CET [282009] pguser@maindb DEBUG: proc_exit(0): 2 callbacks to make
2024-02-08 11:02:26.189 CET [282009] pguser@maindb DEBUG: exit(0)
2024-02-08 11:02:26.189 CET [282009] pguser@maindb DEBUG: shmem_exit(-1): 0 before_shmem_exit callbacks to make
2024-02-08 11:02:26.189 CET [282009] pguser@maindb DEBUG: shmem_exit(-1): 0 on_shmem_exit callbacks to make
2024-02-08 11:02:26.189 CET [282009] pguser@maindb DEBUG: proc_exit(-1): 0 callbacks to make
2024-02-08 11:02:26.189 CET [281970] DEBUG: server process (PID 282006) exited with exit code 0
2024-02-08 11:02:26.191 CET [282011] pguser@maindb DEBUG: shmem_exit(0): 4 before_shmem_exit callbacks to make
2024-02-08 11:02:26.191 CET [282011] pguser@maindb DEBUG: shmem_exit(0): 6 on_shmem_exit callbacks to make
2024-02-08 11:02:26.191 CET [282011] pguser@maindb DEBUG: proc_exit(0): 2 callbacks to make
2024-02-08 11:02:26.192 CET [282011] pguser@maindb DEBUG: exit(0)
2024-02-08 11:02:26.192 CET [282011] pguser@maindb DEBUG: shmem_exit(-1): 0 before_shmem_exit callbacks to make
2024-02-08 11:02:26.192 CET [282011] pguser@maindb DEBUG: shmem_exit(-1): 0 on_shmem_exit callbacks to make
2024-02-08 11:02:26.192 CET [282011] pguser@maindb DEBUG: proc_exit(-1): 0 callbacks to make
2024-02-08 11:02:26.193 CET [281970] DEBUG: server process (PID 282009) exited with exit code 0
2024-02-08 11:02:26.194 CET [281970] DEBUG: server process (PID 282011) exited with exit code 0
2024-02-08 11:02:33.979 CET [281970] DEBUG: postmaster received pmsignal signal
2024-02-08 11:02:33.983 CET [282844] DEBUG: InitPostgres
2024-02-08 11:02:33.985 CET [282844] DEBUG: autovacuum: processing database "template1"
query_row() is essentially the same as query():
def query_row(self, q, dbv=None, debug=False):
    with self.pool.connection() as cnx:
        cnx.autocommit = True
        cnx.row_factory = self.row_factory
        with psycopg.ClientCursor(cnx) as c:
            c.execute(q, dbv) if dbv else c.execute(q)
            if debug and c._query:
                print(c._query.query)
            if c.rowcount == 1:
                return c.fetchone()
            else:
                return None
The connection logs from PostgreSQL gave a hint: an SSL error right before all the connections go down:
2024-02-09 10:38:05.627 CET [297036] LOG: checkpoint starting: time
2024-02-09 10:38:05.739 CET [297036] LOG: checkpoint complete: wrote 2 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.103 s, sync=0.002 s, total=0.112 s; sync files=2, longest=0.001 s, average=0.001 s; distance=1 kB, estimate=1 kB; lsn=0/1C4BB70, redo lsn=0/1C4BB38
2024-02-09 10:46:46.488 CET [297518] user@db LOG: SSL error: decryption failed or bad record mac
2024-02-09 10:46:46.489 CET [297518] user@db LOG: could not receive data from client: Connection reset by peer
2024-02-09 10:46:46.489 CET [297518] user@db LOG: disconnection: session time: 0:10:00.165 user=user database=db host=127.0.0.1 port=56266
2024-02-09 12:18:21.905 CET [297528] user@db LOG: disconnection: session time: 1:41:35.416 user=user database=db host=127.0.0.1 port=56356
2024-02-09 12:18:21.909 CET [297519] user@db LOG: disconnection: session time: 1:41:35.587 user=user database=db host=127.0.0.1 port=56276
2024-02-09 12:18:24.998 CET [297520] user@db LOG: disconnection: ...
2024-02-09 12:18:25.033 CET [297521] user@db LOG: disconnection: session time: 1:41:38.672 user=user database=db host=127.0.0.1 port=56296
2024-02-09 12:18:33.726 CET [297522] user@db LOG: disconnection: session time: 1:41:47.360 user=user database=db host=127.0.0.1 port=56302
2024-02-09 12:18:33.739 CET [297523] user@db LOG: disconnection: session time: 1:41:47.372 user=user database=db host=127.0.0.1 port=56308
2024-02-09 12:18:33.745 CET [297524] user@db LOG: disconnection: session time: 1:41:47.316 user=user database=db host=127.0.0.1 port=56310
2024-02-09 12:18:33.760 CET [297525] user@db LOG: disconnection: session time: 1:41:47.325 user=user database=db host=127.0.0.1 port=56320
2024-02-09 12:18:35.793 CET [297526] user@db LOG: disconnection: session time: 1:41:49.353 user=user database=db host=127.0.0.1 port=56336
2024-02-09 12:18:35.847 CET [297527] user@db LOG: disconnection: session time: 1:41:49.365 user=user database=db host=127.0.0.1 port=56340
The reason for this SSL error is uWSGI's default behaviour: with 2 workers, the application is initialized once in the master process (module-level variables, among them the pool and its connections) and the workers are then forked from it, sharing that memory. This is the classic case of database connections being shared between 2 workers.
At some point this produces an SSL inconsistency, which results in the mass-closing.
The solution was to set lazy-apps=true, telling uWSGI to initialize each worker with its own context (and therefore its own pool and connections), as sketched below.
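For reference, a minimal sketch of the fix. The comment at the top shows the lazy-apps setting itself; the postfork variant is an alternative that keeps uWSGI's default preforking model but opens the pool per worker. The app.db module and the init_pool() call are hypothetical, standing in for the wrapper class shown above:

# Option 1 -- uwsgi.ini: load the application in each worker instead of the
# master, so the pool and its SSL connections are never shared across forks.
#
#   [uwsgi]
#   lazy-apps = true
#
# Option 2 -- keep the default preforking model, but open the pool only after
# the fork, once per worker, using uWSGI's postfork hook.
from uwsgidecorators import postfork

from app import db  # hypothetical module holding the DB wrapper shown above

@postfork
def open_pool_per_worker():
    # Runs in every worker right after fork(): each worker ends up with its
    # own ConnectionPool and its own SSL sessions to PostgreSQL.
    db.init_pool(min_cnx=5)

Either way, the important property is that no libpq/SSL socket created before the fork is ever used by more than one process (for option 2, make sure the pool is not also opened at import time in the master).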